Difference between revisions of "OpasnetBaseUtils"

From Testiwiki
 
(39 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{tool|moderator=Teemu R|stub=Yes}}
 
OpasnetBaseUtils is a collection of [[R]] functions for interaction with the [[Opasnet Base]] and manipulating data of multiple variables with multiple matching or unmatching dimensions, fitted into a neat package.  
+
[[Category:Code under inspection]]
 +
 
 +
{{attack|# |The code on this page should be built into the OpasnetUtils package and then the page should be merged with [[OpasnetUtils]]. |--[[User:Jouni|Jouni]] 18:45, 15 May 2013 (EEST)}}
 +
 
 +
==Question==
 +
 
 +
OpasnetBaseUtils is a collection of [[R]] functions for interaction with the [[Opasnet Base]] and manipulating data of multiple variables with multiple matching or unmatching dimensions, fitted into a neat package. What should such a package contain?
 +
 
 +
==Answer==
 +
 
 +
OpasnetBaseUtils contains the following functions. The functions are described in detail elsewhere (follow links).
 +
* [[Opasnet Base Connection for R#Downloading data|op_baseGetData()]]
 +
* [[Opasnet Base Connection for R#Finding index data|op_baseGetLocs()]]
 +
* [[Opasnet Base Connection for R#Uploading data|op_baseWrite()]]
 +
* The following functions are outdated and are kept only for compatibility with old code.
 +
** [http://en.opasnet.org/en-opwiki/index.php?title=Operating_intelligently_with_multidimensional_arrays_in_R&oldid=18412 IntArray()] (and related [http://en.opasnet.org/en-opwiki/index.php?title=Talk:Operating_intelligently_with_multidimensional_arrays_in_R&oldid=17403 discussion]) This function has been replaced by merge().
 +
** [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_Base_Connection_for_R&oldid=22176#Manipulating_data DataframeToArray()]. This function was used earlier because many calculations were performed on arrays. Calculations are now done directly on data.frames, which are rarely converted into arrays. The reverse conversion, from an array to a data.frame, is more common and can be done with as.data.frame(as.table(array)).
 +
 
 +
===Rcode generic===
 +
 
 +
* Functions: dropall, PTable, opasnet.data, tidy, summary.bring
 +
 
 +
<rcode name="generic">
 +
######################################
 +
## dropall drops from a data.frame all factor levels that are not used.
 +
## parameters: x = data.frame
 +
 
 +
dropall <- function(x){
 +
    isFac <- NULL
 +
    for (i in 1:dim(x)[2]){isFac[i] = is.factor(x[ , i])}
 +
 
 +
    for (i in 1:length(isFac)){
 +
        if (isFac[i]) x[, i] <- x[, i][ , drop = TRUE]  # only factor columns have levels to drop
 +
        }
 +
    return(x)
 +
    }
 +
########################################
 +
 
 +
#########################################
 +
## PTable converts the probability table of an assessment into a form suitable for the assessment.
 +
## Parameters: P = probability table retrieved from the Opasnet Base.
 +
##            n = number of iterations in the Monte Carlo simulation
 +
## The columns of the probability table must be: Muuttuja (variable), Selite (index), Lokaatio (location), P
 +
## The output is a table for Monte Carlo use whose columns are
 +
## n (iteration) and every index (Selite) found in the probability table; on each row the
 +
## locations are drawn according to the probabilities given in the probability table.
 +
 
 +
PTable <- function(P, n) {
 +
Pt <- unique(P[,c("Muuttuja", "Selite")])
 +
Pt <- data.frame(Muuttuja = rep(Pt$Muuttuja, n), Selite = rep(Pt$Selite, n), obs = rep(1:n, each = nrow(Pt)), P = runif(n*nrow(Pt), 0, 1))
 +
for(i in 2:nrow(P)){P$Result[i] <- P$Result[i] + ifelse(P$Muuttuja[i] == P$Muuttuja[i-1] & P$Selite[i] == P$Selite[i-1], P$Result[i-1], 0)}
 +
P <- merge(P, Pt)
 +
P <- P[P$P <= P$Result, ]
 +
Pt <- as.data.frame(as.table(tapply(P$Result, as.list(P[, c("Muuttuja", "Selite", "obs")]), min)))
 +
colnames(Pt) <- c("Muuttuja", "Selite", "obs", "Result")
 +
Pt <- Pt[!is.na(Pt$Result), ]  # filter Pt by its own Result column; P has a different number of rows
 +
P <- merge(P, Pt)
 +
P <- P[, !colnames(P) %in% c("Result", "P", "Muuttuja")]
 +
P <- reshape(P, idvar = "obs", timevar = "Selite", v.names = "Lokaatio", direction = "wide")
 +
colnames(P) <- ifelse(substr(colnames(P), 1, 9) == "Lokaatio.", substr(colnames(P), 10,30), colnames(P))
 +
return(P)
 +
}
 +
 
 +
######################################
 +
## opasnet.data downloads a file from Finnish Opasnet wiki, English Opasnet wiki, or Opasnet File.
 +
## Parameters: filename is the URL without the first part (see below), wiki is "opasnet_en", "opasnet_fi", or "M-files".
 +
## If table is TRUE then a table file for read.table function is assumed; all other parameters are for this read.table function.
 +
 
 +
opasnet.data <- function(filename, wiki = "opasnet_en", table = FALSE, ...)
 +
{
 +
if (wiki == "opasnet_en") {
 +
file <- paste("http://en.opasnet.org/en-opwiki/images/", filename, sep = "")
 +
}
 +
if (wiki == "opasnet_fi") {
 +
file <- paste("http://fi.opasnet.org/fi_wiki/images/", filename, sep = "")
 +
}
 +
if (wiki == "M-files") {
 +
file <- paste("http://fi.opasnet.org/fi_wiki/extensions/mfiles/", filename, sep = "")
 +
}
 +
 
 +
#if(table == TRUE) {
 +
#file <- re#ad.table(file, header = FALSE, sep = "", quote = "\"'",
 +
#          dec = ".", row.names, col.names,
 +
#          as.is = !stringsAsFactors,
 +
#          na.strings = "NA", colClasses = NA, nrows = -1,
 +
#          skip = 0, check.names = TRUE, fill = !blank.lines.skip,
 +
#          strip.white = FALSE, blank.lines.skip = TRUE,
 +
#          comment.char = "#",
 +
#          allowEscapes = FALSE, flush = FALSE,
 +
#          stringsAsFactors = default.stringsAsFactors(),
 +
#          fileEncoding = "", encoding = "unknown")
 +
#return(file)
 +
#}
 +
#else {return(ge#tURL(file))}
 +
}
 +
 
 +
############ tidy: a function that cleans the tables from Opasnet Base
 +
# data is a table from op_baseGetData function
 +
tidy <- function (data, idvar = "obs", direction = "long") {
 +
 
 +
data$Result <- ifelse(!is.na(data$Result.Text), as.character(data$Result.Text), data$Result)
 +
if("Observation" %in% colnames(data)){test <- data$Observation != "Description"} else {test <- TRUE}
 +
data <- data[test, !colnames(data) %in% c("id", "Result.Text")]
 +
if("obs.1" %in% colnames(data)) {data[, "obs"] <- data[, "obs.1"]} # this line is temporarily needed until the obs.1 bug is fixed.
 +
data <- data[colnames(data) != "obs.1"]
 +
if("Row" %in% colnames(data)) { # If user has given Row, it is used instead of automatic obs.
 +
data <- data[, colnames(data) != "obs"]
 +
colnames(data)[colnames(data) == "Row"] <- "obs"
 +
}
 +
if(direction == "wide" && "Observation" %in% colnames(data))
 +
{
 +
data <- reshape(data, idvar = idvar, timevar = "Observation", v.names = "Result", direction = "wide")
 +
data <- data[colnames(data) != "obs"]
 +
colnames(data) <- gsub("^Result.", "", colnames(data))
 +
colnames(data)[colnames(data) == "result"] <- "Result"
 +
colnames(data)[colnames(data) == "Amount"] <- "Result"
 +
}
 +
else
 +
{
 +
data <- data[colnames(data) != "obs"]
 +
}
 +
return(data)
 +
}
 +
 
 +
############### summary.bring: Bring parts of summary table
 +
#  page is the page identifier for the summary table.
 +
summary.bring <- function(page, base = "opasnet_base"){
 +
data <- tidy(op_baseGetData(base, page))
 +
pages <- levels(data$Page)
 +
 
 +
## temp contains the additional information that is not on the actual data table.
 +
temp <- data[, !colnames(data) == "Observation"]
 +
temp <- reshape(temp, idvar = "Page", timevar = "Index", direction = "wide")
 +
colnames(temp) <- ifelse(substr(colnames(temp), 1, 7) == "Result.", substr(colnames(temp), 8, 50), colnames(temp))
 +
 
 +
## Get all data tables one at a time and combine them.
 +
for(i in 1:length(pages)){
 +
out <- op_baseGetData(base, pages[i])  # use the base argument instead of a hard-coded database name
 +
out <- tidy(out)
 +
cols <- colnames(out)[!colnames(out) %in% c("Observation", "Result")]
 +
out <- reshape(out, timevar = "Observation", idvar = cols, direction = "wide")
 +
colnames(out) <- ifelse(substr(colnames(out), 1, 7) == "Result.", substr(colnames(out), 8, 50), colnames(out))
 +
out <- merge(temp[temp$Page == pages[i], ][colnames(temp) != "Page"], out)
 +
 
 +
## Check that all data tables have all the same columns before you combine them with rbind.
 +
if(i == 1){out2 <- out} else {
 +
addcol <- colnames(out2)[!colnames(out2) %in% colnames(out)]
 +
if(length(addcol) > 0) {
 +
temp <- as.data.frame(array("*", dim = c(1,length(addcol))))
 +
colnames(temp) <- addcol
 +
out <- merge(out, temp)}
 +
addcol <- colnames(out)[!colnames(out) %in% colnames(out2)]
 +
if(length(addcol) > 0) {
 +
temp <- as.data.frame(array("*", dim = c(1,length(addcol))))
 +
colnames(temp) <- addcol
 +
out2 <- merge(out2, temp)}
 +
 
 +
## Combine data tables.
 +
out2 <- rbind(out2, out)}
 +
}
 +
return(out2)
 +
}
 +
##########
 +
 
 +
</rcode>
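
The sampling idea described in the PTable comments can be illustrated with a much simplified sketch. It is not the PTable implementation: it uses sample() instead of cumulative probabilities, and the probability table below is made up. Only the column names Selite, Lokaatio and Result are taken from the code above.

```r
set.seed(42)

# Made-up probability table: two indices (Selite), each with its own locations
# (Lokaatio) and probabilities (Result) that sum to 1 within an index.
ptab <- data.frame(
  Selite   = c("Fuel", "Fuel", "Fuel", "Policy", "Policy"),
  Lokaatio = c("Wood", "Oil", "Gas", "BAU", "Strict"),
  Result   = c(0.5, 0.3, 0.2, 0.6, 0.4),
  stringsAsFactors = FALSE
)

n <- 1000  # number of Monte Carlo iterations

# Draw n locations for one index, weighted by the given probabilities.
draw <- function(tab) {
  sample(tab$Lokaatio, size = n, replace = TRUE, prob = tab$Result)
}

# One column per index, one row per iteration.
mc <- as.data.frame(lapply(split(ptab, ptab$Selite), draw),
                    stringsAsFactors = FALSE)
mc$obs <- seq_len(n)

head(mc)
```

Each row of mc is one iteration (obs), so the table can be merged with other variables that carry the same obs column.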
 +
 
 +
[[op_fi:OpasnetBaseUtils]]
 +
[[Heande:OpasnetBaseUtils]]
 +
 
 +
==Rationale==
 +
 
 +
A suggestion about the structure and content:
 +
 
 +
There should be just one package (at least for the time being) from Opasnet developers, namely ''OpasnetUtils''. This contains different things:
 +
* OpasnetBaseUtils for connections to and from [[Opasnet Base]].
 +
** Suggested function names: opbase.read (previously op_baseGetData), opbase.write (previously op_baseWrite).
 +
{{comment|# |The original distinction between Write and GetData arose from the fact that data isn't the only thing read from the base. GetLocs also exists for getting location info on a particular data set. Of course GetLocs could be renamed locs or locations, but that loses some of the information contained in the function names.|--[[User:Teemu R|Teemu R]] 09:25, 9 May 2011 (EEST)}}
 +
* Functions for some particular tasks needed in Opasnet assessments, such as functions for calculating health impacts from ERF (the function takes in RR or OR or both and automatically calculates a synthesis), exposure and background disease.
 +
** Suggested function names: ophia.lifetable (for life table calculation), ophia.hia (for simple impact calculation), opgis.population (for slicing population data from a database for a case), opmath.sip and opmath.unsip (for turning a random sample into a [[SIPs and SLURPs|SIP]] and a SIP into a random sample, respectively), etc.
 +
* Outdated functions for compatibility reasons, such as [[Operating intelligently with multidimensional arrays in R|IntArray]].
 +
* Functions or practices for handling uncertain variables: how to merge run/obs index into a data.frame.
 +
 
 +
If the suggestion is accepted, the following things could be done to organise pages:
 +
* [[:File:OpasnetBaseUtils 0.8.0.zip]] is moved to [[:File:OpasnetUtils.zip]] (version numbers should NOT be in the filename).
 +
{{attack|# |This does not seem wise, my previous experience is that files downloaded from Opasnet are cached in some very special place and I was unable to download the most recent version of a certain file, because of the similar filename. Also I think any programmer would agree that it'd be bad practice to not include an easily accessible version number on the file. Instead we should consider use of some version management system e.g. SVN. |--[[User:Teemu R|Teemu R]] 09:25, 9 May 2011 (EEST)}}
 +
* The content of [[OpasnetBaseUtils]] is copied and the page is redirected to [[:file:OpasnetUtils.zip]].
 +
* [[OpasnetUtils]] is redirected to [[:file:OpasnetUtils.zip]].
 +
* [[File:OpasnetUtils.zip]] contains an explanation and links back to the archived pages mentioned above.
  
 
== Instructions ==
 
  
# Download [[File:OpasnetBaseUtils 0.8.0.zip]]
+
# Download [[File:OpasnetBaseUtils 0.8.4.zip]] (Save it in a location you can easily find)
 
# Open [[R]]
 
 
# Click "Packages" on the topbar and choose "Install package(s) from local zip files..." from the drop-down menu
 
 
# Locate the downloaded .zip file and install
 
  
*For usage notes see pages listed below.
+
{{mfiles}}
 +
 
 +
=== Usage ===
 +
 
 +
library(OpasnetBaseUtils)
 +
 
 +
*For function usage notes see the following pages:
 +
**[[Opasnet Base Connection for R]]
 +
**[[Operating intelligently with multidimensional arrays in R]]
  
 
=== Dependencies ===
 
  
*You need to have installed another package called RODBC which in turn requires the utils package. These functions are available from the CRAN repositories and can be easily installed from within R.  
+
*OpasnetBaseUtils requires the RODBC package, which in turn requires the utils package. RODBC is available from the CRAN repositories and can be installed from within R; utils ships with R itself.
 +
[[Category:R tool]]
 +
[[Category:Open assessment]]
 +
[[Category:Opasnet Base]]
 +
 
 +
=== Change log ===
 +
 
 +
Forgot about this earlier so I'll add a change log now.
 +
 
 +
*0.8.4 - New versions of the database upload and download functions added; they now support special characters properly in both Opasnet and Heande.
  
 
== See also ==
 
  
*[[Opasnet Base Connection for R]]
+
*[[File:OpasnetBaseUtils v.0.8.4 source.zip|OpasnetBaseUtils sources]]
*[[Operating intelligently with multidimensional arrays in R]]
+
**To build from source use '''R CMD build <src folder>''' & '''R CMD INSTALL <src folder>''' in a command line on properly configured machines (most Unix systems require no configuration)
 +
 
 +
{{Opasnet training}}

Latest revision as of 11:02, 26 August 2013

# : The code on this page should be built into the OpasnetUtils package and then the page should be merged with OpasnetUtils. --Jouni 18:45, 15 May 2013 (EEST)

Question

OpasnetBaseUtils is a collection of R functions for interaction with the Opasnet Base and manipulating data of multiple variables with multiple matching or unmatching dimensions, fitted into a neat package. What should such a package contain?

Answer

OpasnetBaseUtils contains the following functions. The functions are described in detail elsewhere (follow links).

  • op_baseGetData()
  • op_baseGetLocs()
  • op_baseWrite()
  • The following functions are outdated and are kept only for compatibility with old code.
    • IntArray() (and related discussion) This function has been replaced by merge().
    • DataframeToArray(). This function was used earlier because many calculations were performed on arrays. Calculations are now done directly on data.frames, which are rarely converted into arrays. The reverse conversion, from an array to a data.frame, is more common and can be done with as.data.frame(as.table(array)).
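
As a minimal sketch of the two points above, the following base R code converts an array into a data.frame and then combines it with a second variable using merge(). The data and the names emis, factor_df, Emission and Factor are made up for illustration and are not part of OpasnetBaseUtils.

```r
# A small 2-D array with named dimensions (made-up data).
emis <- array(1:6, dim = c(2, 3),
              dimnames = list(Sex = c("F", "M"),
                              Year = c("2010", "2011", "2012")))

# Array -> long data.frame: one row per cell, one column per dimension.
emis_df <- as.data.frame(as.table(emis), responseName = "Emission")

# A second variable that has only the Year dimension.
factor_df <- data.frame(Year = c("2010", "2011", "2012"),
                        Factor = c(1.0, 1.1, 1.2))

# merge() joins on the shared column (Year) and repeats Factor over Sex,
# which is the kind of dimension matching IntArray() was written for.
combined <- merge(emis_df, factor_df)
combined$Result <- combined$Emission * combined$Factor
combined
```

Dimensions that only one variable has (here Sex) are kept as they are, while shared dimensions (here Year) are matched row by row.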

Rcode generic

  • Functions: dropall, PTable, opasnet.data, tidy, summary.bring
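
According to its code comments, dropall removes unused factor levels from a data.frame. The sketch below shows the effect on made-up data using base R's droplevels(), which does the same job for data frames in current R versions. It illustrates the behaviour and is not the dropall code itself.

```r
# Made-up data: a factor column and a numeric column.
d <- data.frame(Area = factor(c("a", "b", "c")), Value = 1:3)

# Subsetting keeps the unused level "c" in the factor.
sub <- d[d$Area != "c", ]
levels(sub$Area)

# droplevels() removes unused levels from every factor column.
clean <- droplevels(sub)
levels(clean$Area)
```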


Rationale

A suggestion about the structure and content:

There should be just one package (at least for the time being) from Opasnet developers, namely OpasnetUtils. This contains different things:

  • OpasnetBaseUtils for connections to and from Opasnet Base.
    • Suggested function names: opbase.read (previously op_baseGetData), opbase.write (previously op_baseWrite).

--# : The original distinction between Write and GetData arose from the fact that data isn't the only thing read from the base. GetLocs also exists for getting location info on a particular data set. Of course GetLocs could be renamed locs or locations, but that loses some of the information contained in the function names. --Teemu R 09:25, 9 May 2011 (EEST)

  • Functions for some particular tasks needed in Opasnet assessments, such as functions for calculating health impacts from ERF (the function takes in RR or OR or both and automatically calculates a synthesis), exposure and background disease.
    • Suggested function names: ophia.lifetable (for life table calculation), ophia.hia (for simple impact calculation), opgis.population (for slicing population data from a database for a case), opmath.sip and opmath.unsip (for turning a random sample into a SIP and a SIP into a random sample, respectively), etc.
  • Outdated functions for compatibility reasons, such as IntArray.
  • Functions or practices for handling uncertain variables: how to merge run/obs index into a data.frame.
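
One possible practice for the last point, sketched with made-up data: keep the iteration number in an obs column and let merge() do the matching. All variable names and numbers below are assumptions for illustration only.

```r
set.seed(1)
n <- 5  # number of Monte Carlo iterations

# Uncertain variable: one row per iteration (obs).
exposure <- data.frame(obs = 1:n,
                       Exposure = rlnorm(n, meanlog = 0, sdlog = 0.5))

# Deterministic variable indexed by Area only.
population <- data.frame(Area = c("North", "South"),
                         Population = c(1000, 2500))

# No shared columns: merge() returns every Area combined with every obs.
combined <- merge(population, exposure)

# A second uncertain variable shares the obs index and is matched on it.
slope <- data.frame(obs = 1:n, Slope = rnorm(n, mean = 0.01, sd = 0.002))
combined <- merge(combined, slope)

combined$Impact <- combined$Population * combined$Exposure * combined$Slope
head(combined)
```

Because both uncertain variables are joined on obs, each iteration uses one consistent draw of Exposure and Slope.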

If the suggestion is accepted, the following things could be done to organise pages:

# : This does not seem wise, my previous experience is that files downloaded from Opasnet are cached in some very special place and I was unable to download the most recent version of a certain file, because of the similar filename. Also I think any programmer would agree that it'd be bad practice to not include an easily accessible version number on the file. Instead we should consider use of some version management system e.g. SVN. --Teemu R 09:25, 9 May 2011 (EEST)

Instructions

  1. Download File:OpasnetBaseUtils 0.8.4.zip (Save it in a location you can easily find)
  2. Open R
  3. Click "Packages" on the topbar and choose "Install package(s) from local zip files..." from the drop-down menu
  4. Locate the downloaded .zip file and install
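
The same installation can probably also be done from the R console instead of the menu. The sketch below assumes the file was saved in a Downloads folder under that name; both the path and the file name are hypothetical and must be adjusted. A binary .zip package of this kind normally installs only on Windows.

```r
# Hypothetical location of the downloaded file; change it to the real one.
zip_path <- path.expand(file.path("~", "Downloads", "OpasnetBaseUtils 0.8.4.zip"))

if (file.exists(zip_path)) {
  # repos = NULL tells install.packages() to install from a local file.
  install.packages(zip_path, repos = NULL)
} else {
  message("File not found: ", zip_path)
}
```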




Usage

library(OpasnetBaseUtils)

Dependencies

  • OpasnetBaseUtils requires the RODBC package, which in turn requires the utils package. RODBC is available from the CRAN repositories and can be installed from within R; utils ships with R itself.
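
A minimal sketch of installing a missing dependency from within R. The helper ensure_package is hypothetical and not part of OpasnetBaseUtils; it only wraps the standard requireNamespace() and install.packages() calls.

```r
# Hypothetical helper: install a CRAN package only if it is not yet available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded now.
  requireNamespace(pkg, quietly = TRUE)
}

# utils ships with R, so this call installs nothing and returns TRUE.
ensure_package("utils")
```

Calling ensure_package("RODBC") before library(OpasnetBaseUtils) would fetch RODBC from CRAN if it is missing.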

Change log

Forgot about this earlier so I'll add a change log now.

  • 0.8.4 - New versions of the database upload and download functions added; they now support special characters properly in both Opasnet and Heande.

See also

  • File:OpasnetBaseUtils v.0.8.4 source.zip
    • To build from source use R CMD build <src folder> & R CMD INSTALL <src folder> in a command line on properly configured machines (most Unix systems require no configuration)