Difference between revisions of "Opasnet Base Connection for R"

From Testiwiki
Jump to: navigation, search
m (Downloading data)
(See also)
 
(9 intermediate revisions by 3 users not shown)
Line 5: Line 5:
 
[[Category:Open assessment]]
 
[[Category:Open assessment]]
 
[[Heande:Opasnet Base Connection for R]]
 
[[Heande:Opasnet Base Connection for R]]
 +
[[Category:Code under inspection]]
 
{{tool|moderator=Teemu R}}
 
{{tool|moderator=Teemu R}}
 +
{{comment|# |Should we merge this page with [[OpasnetBaseUtils]]?|--[[User:Jouni|Jouni]] 20:18, 28 December 2011 (EET)}}
 +
 
Code for [[R]] for the purpose of interacting with the [[Opasnet Base]] is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session. Or alternatively install the [[OpasnetBaseUtils]] package.  
 
Code for [[R]] for the purpose of interacting with the [[Opasnet Base]] is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session. Or alternatively install the [[OpasnetBaseUtils]] package.  
  
Line 16: Line 19:
 
==Downloading data==
 
==Downloading data==
  
<nowiki>
+
'''op_baseGetData
op_baseGetData <- function(dsn, ident, include = NULL, exclude = NULL, series_id = NULL, iterations = NULL, use.utf8 = TRUE, utf8.conv = TRUE) {
+
 
 +
<rcode name="op_baseGetData">
 +
op_baseGetData <- function(dsn, ident, include = NULL, exclude = NULL, series_id = NULL, iterations = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
 
if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 
if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 
obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
 
obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
Line 61: Line 66:
 
levels(dataframe[,2 + i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,2 + i])))
 
levels(dataframe[,2 + i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,2 + i])))
 
colnames(dataframe)[2 + i] <- as.character(Data[i, 3])
 
colnames(dataframe)[2 + i] <- as.character(Data[i, 3])
 +
if(apply.utf8) Encoding(levels(dataframe[,2 + i])) <- "UTF-8"
 
}
 
}
 
dataframe[,1:2 + 2 + nind] <- Data[1:nres*nind, 5:6]
 
dataframe[,1:2 + 2 + nind] <- Data[1:nres*nind, 5:6]
 
colnames(dataframe)[1:2 + 2 + nind] <- c("Result", "Result.Text")
 
colnames(dataframe)[1:2 + 2 + nind] <- c("Result", "Result.Text")
 +
if(apply.utf8) {if(is.factor(dataframe[,"Result.Text"])) {Encoding(levels(dataframe[,"Result.Text"])) <- "UTF-8"} else if(is.character(dataframe[,"Result.Text"])) {
 +
Encoding(dataframe[,"Result.Text"]) <- "UTF-8"}}
 
rownames(dataframe) <- 1:nres
 
rownames(dataframe) <- 1:nres
 
return(dataframe)
 
return(dataframe)
}</nowiki>
+
}
 +
</rcode>
  
 
===Usage===
 
===Usage===
Line 84: Line 93:
 
===Finding index data===
 
===Finding index data===
  
<nowiki>
+
'''op_baseGetLocs
op_baseGetLocs <- function(dsn, ident, series_id = NULL, use.utf8 = TRUE) {
+
 
 +
<rcode name="op_baseGetLocs">
 +
op_baseGetLocs <- function(dsn, ident, series_id = NULL, use.utf8 = TRUE, apply.utf8 = TRUE) {
 
if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 
if (use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 
obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
 
obj_id <- sqlQuery(db, paste('SELECT id FROM obj WHERE ident = "', ident, '"', sep=''))[1,1]
Line 95: Line 106:
 
series_id, sep = ""))
 
series_id, sep = ""))
 
odbcClose(db)
 
odbcClose(db)
Locs <- Locs[order(Locs[,1]),]
+
Locs <- Locs[order(Locs$ind, Locs$loc_id),]
 
rownames(Locs) <- 1:nrow(Locs)
 
rownames(Locs) <- 1:nrow(Locs)
 +
if(apply.utf8) {Encoding(levels(Locs$ind)) <- "UTF-8"; Encoding(levels(Locs$loc)) <- "UTF-8"}
 
return(Locs)
 
return(Locs)
}</nowiki>
+
}
 +
</rcode>
  
 
*Returns all indexes and locations and their ids in a table of format: ind, loc, loc_id.
 
*Returns all indexes and locations and their ids in a table of format: ind, loc, loc_id.
Line 104: Line 117:
 
==Manipulating data==
 
==Manipulating data==
  
<nowiki>
+
'''DataframeToArray
 +
 
 +
<rcode name="DataframeToArray">
 
DataframeToArray <- function(dataframe, rescol = NULL) {
 
DataframeToArray <- function(dataframe, rescol = NULL) {
 
ColNames <- colnames(dataframe[(colnames(dataframe) == "id") == FALSE])
 
ColNames <- colnames(dataframe[(colnames(dataframe) == "id") == FALSE])
Line 135: Line 150:
 
array[as.matrix(dataframe[,ColNames])] <- dataframe[,rescol]
 
array[as.matrix(dataframe[,ColNames])] <- dataframe[,rescol]
 
return(array)
 
return(array)
}</nowiki>
+
}
 +
</rcode>
  
 
===Usage===
 
===Usage===
Line 148: Line 164:
 
==Uploading data==
 
==Uploading data==
  
<nowiki>
+
'''op_baseWrite
 +
 
 +
<rcode name="op_baseWrite">
 
op_baseWrite <- function(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL,  
 
op_baseWrite <- function(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL,  
rescol = NULL, n.obs.const = FALSE, maxrows = 50000, use.utf8 = TRUE) {
+
rescol = NULL, n.obs.const = FALSE, maxrows = 50000, use.utf8 = TRUE, use.utf8.read = TRUE, latin1.2.utf8.conv.write = TRUE, utf8.2.latin1.conv.read = TRUE) {
 
 
 
# Coerce input into a data frame if it isn't one already; get rid of empty cells
 
# Coerce input into a data frame if it isn't one already; get rid of empty cells
Line 168: Line 186:
 
dataframe[,i] <- factor(dataframe[,i])
 
dataframe[,i] <- factor(dataframe[,i])
 
levels(dataframe[,i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,i])))
 
levels(dataframe[,i]) <- gsub(" *$", "",gsub("^ *", "", levels(dataframe[,i])))
if(use.utf8) if(sum(Encoding(levels(dataframe[,i]))=="latin1")!=0) levels(dataframe[,i]) <- iconv(levels(dataframe[,i]), "latin1", "UTF-8")
+
if(latin1.2.utf8.conv.write) if(sum(Encoding(levels(dataframe[,i]))=="latin1")!=0) levels(dataframe[,i]) <- iconv(levels(dataframe[,i]), "latin1", "UTF-8")
 
}
 
}
 
 
Line 175: Line 193:
 
# Open database connection
 
# Open database connection
 
if(use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 
if(use.utf8) db <- odbcConnect(dsn, DBMSencoding = "UTF-8") else db <- odbcConnect(dsn)
 +
if(!use.utf8.read) db2 <- odbcConnect(dsn)
 
 
 
# Add page to database (if it doesn't already exist)
 
# Add page to database (if it doesn't already exist)
Line 230: Line 249:
 
i, '", 6)', sep = ''))
 
i, '", 6)', sep = ''))
 
}
 
}
IndIds <- sqlQuery(db, paste('SELECT id, ident FROM obj WHERE ident IN("', paste(gsub(" ", "_", ColNames),  
+
IndIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, ident FROM obj WHERE ident IN("', paste(gsub(" ", "_", ColNames),  
 
collapse = '","'), '")', sep = ''))
 
collapse = '","'), '")', sep = ''))
 +
if(utf8.2.latin1.conv.read) levels(IndIds$ident) <- iconv(levels(IndIds$ident), "UTF-8", "latin1")
 
IndIdMap <- IndIds$id
 
IndIdMap <- IndIds$id
 
names(IndIdMap) <- tolower(IndIds$ident)
 
names(IndIdMap) <- tolower(IndIds$ident)
Line 244: Line 264:
 
}
 
}
 
}
 
}
LocIds <- sqlQuery(db, paste('SELECT id, obj_id_i, location FROM loc WHERE obj_id_i IN("', paste(ColIds, collapse = '","'),  
+
LocIds <- sqlQuery((if(use.utf8.read) db else db2), paste('SELECT id, obj_id_i, location FROM loc WHERE obj_id_i IN("', paste(ColIds, collapse = '","'),  
 
'")', sep = ''))
 
'")', sep = ''))
 
+
if(utf8.2.latin1.conv.read) levels(LocIds$location) <- iconv(levels(LocIds$location), "UTF-8", "latin1")
 +
 
for (i in ColIds) {
 
for (i in ColIds) {
 
LocIdMap <- LocIds[LocIds$obj_id_i == i, 1]
 
LocIdMap <- LocIds[LocIds$obj_id_i == i, 1]
Line 260: Line 281:
 
#Writing cell
 
#Writing cell
 
n <- tapply(dataframe[,rescol], dataframe[,ColIds], length)
 
n <- tapply(dataframe[,rescol], dataframe[,ColIds], length)
ncell <- sum(n, na.rm = TRUE)
+
ncell <- sum(!is.na(n))
 
if (is.numeric(dataframe[,rescol])) means <- tapply(dataframe[,rescol], dataframe[,ColIds], mean) else means <- rep(0, ncell)
 
if (is.numeric(dataframe[,rescol])) means <- tapply(dataframe[,rescol], dataframe[,ColIds], mean) else means <- rep(0, ncell)
 
if (is.numeric(dataframe[,rescol])) {
 
if (is.numeric(dataframe[,rescol])) {
Line 318: Line 339:
 
return(character())
 
return(character())
 
}
 
}
</nowiki>
+
</rcode>
  
 
===Usage===
 
===Usage===
Line 334: Line 355:
 
***Indexes should preferably match an earlier entry: [[Special:OpasnetBaseIndices]].
 
***Indexes should preferably match an earlier entry: [[Special:OpasnetBaseIndices]].
 
***Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uploaded again. Names and more specific details can be edited into the indexes separately.
 
***Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uploaded again. Names and more specific details can be edited into the indexes separately.
 +
 +
== Regarding special characters and character encoding ==
 +
 +
Using special characters like ä and ö when our database is encoded in latin1 while wiki is in UTF-8 is a bit complicated. New parameters -- for forcing the odbc connection to use UTF-8 and in-R conversion of one encoding to the other prior to writing or after reading -- have been created. The defaults have been made to work with opasnet_base. For use with heande_base, forcing UTF-8 on the odbc connection when reading should be disabled, meaning that ''use.utf8'' should be set to ''FALSE'' when using ''op_baseGetData'' and ''use.utf8.read'' should be set to ''FALSE'' when using ''op_baseWrite''.
  
 
== See also ==
 
== See also ==
 +
 +
{{Opasnet Base}}
  
 
*[[A Tutorial on R]]
 
*[[A Tutorial on R]]

Latest revision as of 20:29, 10 April 2015


--# : Should we merge this page with OpasnetBaseUtils? --Jouni 20:18, 28 December 2011 (EET)

Code for R for the purpose of interacting with the Opasnet Base is collected on this page. To use it, copy paste the code you need to the R console; this defines the functions, after which they can be called for in that R session. Or alternatively install the OpasnetBaseUtils package.

Package dependencies

These packages are required for most of the code to work. To install: from the top bar menu Packages>Install. To load: copy-paste.

library(RODBC)

Downloading data

op_baseGetData

+ Show code

Usage

variable <- op_baseGetData("opasnet_base", "page identifier", include = vector_of_loc_ids, exclude = vector_of_loc_ids)
  • Assuming "opasnet_base" is a correctly defined DSN (Data Service Name; in Windows XP: Control Panel\Administrative tools\Data Sources (ODBC)).
  • Include and exclude are optional.
    • Include picks all cells in the locations given.
      • The clearest case is when all the included locations belong to the same index: Any cells in the non-included locations of the index will be left out.
      • In case given locations are in multiple indices: The effect produced will be the same as picking separately for each index and removing duplicates.
    • Exclude unpicks any cells which are indexed by the locations given. Slower than include.
    • They can be used in unison.
  • Result will be in a table format with columns: id, obs, ind1, ind2 ... indn, Result.
  • series_id is an optional parameter, if it is not given the most current upload of the data will be downloaded.

Finding index data

op_baseGetLocs

+ Show code

  • Returns all indexes and locations and their ids in a table of format: ind, loc, loc_id.

Manipulating data

DataframeToArray

+ Show code

Usage

variable2 <- DataframeToArray(variable1, rescol = NULL)
  • variable1 must be in similar format as the result when downloading.
  • Columns named "id" and various versions of "Result" are ignored for dimension creation.
    • "obs" column will also be ignored if there's only one.
  • The column containing the values may be defined in the parameters, otherwise it is assumed to be either "Freq", "Result" or "result" in that order.

Uploading data

op_baseWrite

+ Show code

Usage

op_baseWrite(dsn, input, ident = NULL, name = NULL, unit = NULL, objtype_id = NULL, who = NULL, acttype = NULL, rescol = NULL)
  • dsn and input must be defined, the rest of the object and act parameters if not defined are prompted for by the function as needed.
    • For uploading the DSN defined must have writers permissions.
  • rescol defines the column from which the values are chosen from, both numerical and textual data are allowed, if left undefined the function will check column matches for "Freq", "Result" and "result" in that order.

Restrictions

  • Input may only be given in either array or data.frame form.
    • Indexes used may not exceed the character limit of 20.
      • Indexes should preferably match an earlier entry: Special:OpasnetBaseIndices.
      • Indexes are treated as identifiers for indexes in the database, spaces in the indexes are converted to _. This ensures maximum compatibility and ease in operations in which data is downloaded and uploaded again. Names and more specific details can be edited into the indexes separately.

Regarding special characters and character encoding

Using special characters like ä and ö when our database is encoded in latin1 while wiki is in UTF-8 is a bit complicated. New parameters -- for forcing the odbc connection to use UTF-8 and in-R conversion of one encoding to the other prior to writing or after reading -- have been created. The defaults have been made to work with opasnet_base. For use with heande_base, forcing UTF-8 on the odbc connection when reading should be disabled, meaning that use.utf8 should be set to FALSE when using op_baseGetData and use.utf8.read should be set to FALSE when using op_baseWrite.

See also

Pages related to Opasnet Base

Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll

Pages related to the 2008-2011 version of Opasnet Base

Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils