Difference between revisions of "Input.interp"

Revision as of 17:35, 5 April 2012

This page is a method. The page identifier is Op_en5364
Moderator:Jouni (see all)
This page is a stub. You may improve it into a full page, and then a rating bar will appear here.
Upload data Show results

input.interp is an R function that interprets model inputs from a user-friendly format into explicit and exact mathematical format. The purpose is to make it easy for a user to give input without a need to worry about technical modelling details.

Question

What should be a list of important user input formats, and how should they be interpreted?

Answer

The basic feature is that if a text string can be converted to a meaningful numeric object, it will be. This function can be used when data is downloaded from Opasnet Base: if Result.Text contains this kind of numeric information, it is converted to numbers and fused with Result.

n is the number of iterations in the model. # is any numeric character in the text string.

Example	Regular expression	Interpretation	Output in R
12 000	# #	12000. Text is interpreted as number if space removal makes it a number.	as.numeric(gsub(" ", "", Result.text))
12,345	#,#	12.345. Commas are interpreted as decimal points.	as.numeric(gsub(",", ".", Result.text)) # Note! Do not use comma as a thousand separator!
-14,23	-#	-14.23. Minus in the beginning of entry is interpreted as minus, not a sign for a range.
50 - 125	# - #	Uniform distribution between 50 and 125	data.frame(iter=1:n, result=runif(n,50,125))
-12 345 - -23,56		Uniform distribution between -12345 and -23.56.
1 - 50	# - #	Loguniform distribution between 1 and 50 (Lognormality is assumed if the ratio of upper to lower is => 30)
3.1 ± 1.2 or 3.1 +- 1.2	# ± # or # +- #	Normal distribution with mean 3.1 and SD 1.2	data.frame(iter=1:n, result=rnorm(n,3.1,1.2))
2.4 (1.8 - 3.0)	# (# - #)	Normal distribution with mean 2.4 and 95 % confidence interval from 1.8 to 3.0	data.frame(iter=1:n, result=rnorm(n,2.4,(3.0-1.8)/2/1.96))
2.4 (2.0 - 3.2)	# (# - #)	Lognormal distribution with mean 2.4 and 95 % confidence interval from 2.0 to 3.0. Lognormality is assumed if the difference from mean to upper limit is => 50 % greater than from mean to lower limit.
24 - 35 (odds 5:1)	# - # (odds #:#)	Odds is five to one that the truth is between 24 and 35. How to calculate this, I don't know yet, but there must be a prior.	⇤# : I am not sure whether this is actually needed. Who expresses uncertainties in this way? --Jouni 14:00, 28 December 2011 (EET)
2;4;7		Each entry (2, 4, and 7 in this case) are equally likely to occur. Entries can also be text.
* (in index, or explanatory, columns)		The result applies to all locations of this index.	With merge() function, this column is not used as a criterion when these rows are merged.

How to actually make this happen in R?

Make a temporary result temp by removing all spaces from Result.Text. Columns: Indices,Result.Result.Text,temp (Indices contains all explanatory columns.)
Replace all "," with "."
Check if there are parentheses "()". If yes, assume that they contain 95 % CI.
Check if there are ranges "#-#".
Divide the rows of the data.frame into two new data.frames with the same list of columns (Indices,Result).
- If temp is a syntactically correct distribution, take the row to data.frame A and replace Result with temp.
- Otherwise, take the row to data.frame B and replace Result with Result.Text if that is not NA.
Create a new data.frame with index Iter = 1:n.
Make a random sample from each probability distribution in data.frame A using Iter.
Merge the data.frame B with Iter.
Join data.frames A and B with rbind(). Columns: Iter,Index,Result.

Rcode

+ Show code - Hide code

lmean <- function(parmean, parsd) {return(log(parmean)-log(1+(parsd^2)/(parmean^2))/2)}
lsd <- function(parmean, parsd) {return(log(1+(parsd^2)/(parmean^2)))}

input.interp <- function(res.char, n = 1000) {
	res.char <- gsub(" ", "", res.char)
	res.char <- gsub(",", ".", res.char)
	plusminus <- gregexpr(paste("\\+-|", rawToChar(as.raw(177)), sep = ""), res.char) # saattaa osoittautua ongelmaksi enkoodauksen vuoksi
	plusminus.length <- sapply(plusminus, length)
	plusminus.exists <- unlist(plusminus)[cumsum(c(0, plusminus.length[-length(plusminus.length)])) + 1] > 0
	minus <- gregexpr("-", res.char)
	minus.length <- sapply(minus, length)
	minus.exists <- unlist(minus)[cumsum(c(0, minus.length[-length(minus.length)])) + 1] > 0
	brackets <- gregexpr("\\(.*\\)", res.char) # matches for brackets "(...)"
	brackets.length <- as.numeric(unlist(lapply(brackets, attributes)))
	brackets.pos <- unlist(brackets)
	out <- list()
	for(i in 1:length(res.char)) {
		if(brackets.pos[i] >= 0) {
			minus.relevant <- unlist(minus)[(cumsum(c(0, minus.length)) + 1)[i]:cumsum(minus.length)[i]] # Meni hieman monimutkaiseksi ylla olevan vektorisoinnin vuoksi
			n.minus.inside.brackets <- sum(minus.relevant > brackets.pos[i] & minus.relevant < brackets.pos[i] + brackets.length[i])
			imean <- as.numeric(substr(res.char[i], 1, brackets.pos[i] - 1))
			if(n.minus.inside.brackets == 1) {
				ici <- c(as.numeric(substr(res.char[i], brackets.pos[i] + 1, minus.relevant[minus.relevant > brackets.pos[i]] - 1)), as.numeric(substr(res.char[i], 
					minus.relevant[minus.relevant > brackets.pos[i]] + 1, brackets.pos[i] + brackets.length[i] - 2)))
				isd <- sum(abs(ici - imean) / 2) / qnorm(0.975)
				if((ici[2] - imean) / (ici[1] - imean) < 1.5) {
					out[[i]] <- rnorm(n, imean, isd)
				} else {
					out[[i]] <- rlnorm(n, lmean(imean, isd), lsd(imean, isd)) # menee vaarin koska isd on laskettu normaalijakaumalle
				}
			} else 
			if(n.minus.inside.brackets == 2|n.minus.inside.brackets == 3) {
				# consecutive.minuses <-  minus.relevant + 1 == c(minus.relevant[2:length(minus.relevant)], 0) # turha jos oletetaan etta ensimmainen luku sulkujen sisalla on aina pienempi
				ici <- c(as.numeric(substr(res.char[i], brackets.pos[i] + 1, minus.relevant[minus.relevant > brackets.pos[i]][2] - 1)), as.numeric(substr(res.char[i], 
					minus.relevant[minus.relevant > brackets.pos[i]][2] + 1, brackets.pos[i] + brackets.length[i] - 2)))
				isd <- sum(abs(ici - imean) / 2) / qnorm(0.975)
				out[[i]] <- rnorm(n, imean, isd)
			} else {
				out[[i]] <- NA
				warning(paste("Unable to interpret \"", res.char[i], "\"", sep = ""))
			}
		} else {
			if(minus.exists[i]) {
				minus.relevant <- unlist(minus)[(cumsum(c(0, minus.length)) + 1)[i]:cumsum(minus.length)[i]]
				if(length(minus.relevant)==1) {if(as.numeric(substr(res.char[i], 1, minus.relevant - 1)) / as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))) >= 1/100) {
					out[[i]] <- runif(n, as.numeric(substr(res.char[i], 1, minus.relevant - 1)), as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))))} else {
					out[[i]] <- exp(runif(n, log(as.numeric(substr(res.char[i], 1, minus.relevant - 1))), log(as.numeric(substr(res.char[i], minus.relevant + 1, nchar(res.char[i]))))))}
				} else {out[[i]] <- runif(n, as.numeric(substr(res.char[i], 1, minus.relevant[2] - 1)), as.numeric(substr(res.char[i], minus.relevant[2] + 1, nchar(res.char[i]))))}
			} else {
				if(plusminus.exists[i]) {
					out[[i]] <- rnorm(n, as.numeric(substr(res.char[i], 1, plusminus[[i]][1] - 1)), as.numeric(substr(res.char[i], plusminus[[i]][1] + 1, nchar(res.char[i]))))
				}
			}
		} else {
			if(sum(unlist(strsplit(res.char[i], ""))==";") > 0) out[[i]] <- sample(sapply(strsplit(res.char[i], ";"), as.numeric), N, replace = TRUE)
		} else {out[[i]] <- NA; warning(paste("Unable to interpret \"", res.char[i], "\"", sep = ""))}
	}
	out
}

iter.f <- function(x) {
	1:x
}

input.interp.df <- function(idata, rescol = "Result.text", N = 1000) {
	temp <- input.interp(idata[, rescol], N)
	temp.lengths <- sapply(temp, length)
	out <- idata[rep(1:nrow(idata), times = temp.lengths),]
	out$Interp.Result <- unlist(temp)
	dim(temp.lengths) <- length(temp.lengths)
	out$Iteration <- c(apply(temp.lengths, 1, iter.f))
	out
}

--# : Koodi on vielä vaiheessa, ottaa character vectorin alkiot ja antaa tulkinnat listana. Virhetoleranssi hyvin huono. --Teemu R 03:09, 24 January 2012 (EET) --# : Data.framelle oma wrapperi. Testaamaton, ottaa ja antaa data.framen. --Teemu R 14:15, 24 January 2012 (EET)

⇤# : Koodi näyttää rakenteellisesti monimutkaiselta. Olisiko helpompaa yrittää tällaista lähestymistä: 1) regexpressioneilla selvitetään, mitä tyyppiä sisältö on (normaalijakauma, tasajakauma, luokittelumuuttuja, ...). 2) Erotetaan stringistä lukuarvot vektoriksi. 3) Vektorin lukuja käytetään parametreina jakaumafunktioissa. 4) Sämplätyt arvot pistetään tulosvektorin perään pötköksi. 5) Tulosvektori liitetään data.framen Result-kentäksi. Katso esimerkki Object-oriented programming in Opasnet, jossa on käytetty tätä taktiikkaa mutta on sallittu vain yksi syötetyyppi. --Jouni 20:35, 5 April 2012 (EEST)

@@ Line 53: / Line 53: @@
 # Merge the data.frame B with Iter.
 # Join data.frames A and B with rbind(). Columns: ''Iter,Index,Result''.
+===Rcode===
 <rcode name="answer">
@@ Line 132: / Line 134: @@
 {{comment|# |Koodi on vielä vaiheessa, ottaa character vectorin alkiot ja antaa tulkinnat listana. Virhetoleranssi hyvin huono.|--[[User:Teemu R|Teemu R]] 03:09, 24 January 2012 (EET)}}
 {{comment|# |Data.framelle oma wrapperi. Testaamaton, ottaa ja antaa data.framen.|--[[User:Teemu R|Teemu R]] 14:15, 24 January 2012 (EET)}}
+: {{attack|# |Koodi näyttää rakenteellisesti monimutkaiselta. Olisiko helpompaa yrittää tällaista lähestymistä: 1) regexpressioneilla selvitetään, mitä tyyppiä sisältö on (normaalijakauma, tasajakauma, luokittelumuuttuja, ...). 2) Erotetaan stringistä lukuarvot vektoriksi. 3) Vektorin lukuja käytetään parametreina jakaumafunktioissa. 4) Sämplätyt arvot pistetään tulosvektorin perään pötköksi. 5) Tulosvektori liitetään data.framen Result-kentäksi. Katso esimerkki [[Object-oriented programming in Opasnet]], jossa on käytetty tätä taktiikkaa mutta on sallittu vain yksi syötetyyppi.|--[[User:Jouni|Jouni]] 20:35, 5 April 2012 (EEST)}}

Difference between revisions of "Input.interp"

Revision as of 17:35, 5 April 2012

Question

Answer

Rcode

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Page Tools

Tools

In other websites