Difference between revisions of "Uploading to Opasnet Base"

From Testiwiki
(instructions from file:Opasnet base connection.ANA moved here without editing (yet))
 
(edited but not finished)
Line 9: Line 9:
  
 
==Definition==
 
==Definition==
 +
 +
===Inputs===
 +
 +
This section describes what the inputs are and the different ways they can be formatted for a successful upload.
 +
 +
'''Data
 +
 +
The data must be formatted as a two-dimensional table with one observation in each row. Columns contain either explanations (called independent variables in statistics) or the actual observations (called dependent variables in statistics). This difference is important. Explanations are things that are fixed before the actual observation, while observations are those that are actually measured or observed. See the following table for an example.
 +
 +
{| {{prettytable}}
 +
|+An example of a data table.
 +
!City
 +
!Year
 +
!Sex
 +
! Body mass index BMI
 +
! Blood cholesterol
 +
|----
 +
|| London|| 2010|| Male|| 20|| 3.31
 +
|----
 +
|| London|| 2010|| Male|| 25|| 6.83
 +
|----
 +
|| London|| 2010|| Female|| 20|| 5.55
 +
|----
 +
|| London|| 2010|| Female|| 30|| 5.42
 +
|----
 +
|| London|| 2010|| Female|| 25|| 4.19
 +
|----
 +
|| New York|| 2010|| Male|| 22|| 3.33
 +
|----
 +
|| New York|| 2010|| Male|| 26|| 5.84
 +
|----
 +
|| New York|| 2010|| Female|| 28|| 5.67
 +
|----
 +
|| New York|| 2010|| Female|| 26|| 4.52
 +
|----
 +
|| New York|| 2010|| Female|| 24|| 5.67
 +
|----
 +
|}
 +
 +
The table seems unambiguous at first glance, but it is impossible to interpret without knowing which columns are explanations and which are observations. This may be a study performed in London and New York in 2010, where random people were asked for a blood test. In this case, city and year are explanations, and the other columns are observations. However, the data may just as well be summary statistics from a larger study, where the studied individuals were grouped in these cities based on their sex and body mass index (BMI), and the mean cholesterol in each group is the only observation column. (The data suggest that London used 5-unit groups for BMI while New York used 2-unit groups, but you cannot know this from the data alone.)
 +
 +
However, if you know which columns are explanations and which are observations, you can deduce many important aspects of the design of the study from which the data came. Of course, many things must still be explained elsewhere, such as how people were selected and what the base population was.
 +
 +
Another important feature of the data is whether it is deterministic or probabilistic.
  
 
8.7.2010 Jouni Tuomisto
 
8.7.2010 Jouni Tuomisto
 
If the variable is deterministic, Obs should be 0. This must be in all upload methods. They are corrected accordingly, see Indexify.
 
If the variable is deterministic, Obs should be 0. This must be in all upload methods. They are corrected accordingly, see Indexify.
  
; Findid: This function gets an id from a table.
 
in: the property for which the id is needed. In MUST be unique in cond and it must contain index i.
 
table: the table from where the id is brought. The table MUST have .j as the column index, .i as the row index, and a column named 'id'.
 
cond: the name of the field that is compared with in. Cond must be text.
 
  
;Textify: Changes a number to a text value with up to 15 significant digits. This bypasses the number formatting problem that tends to convert e.g. 93341 to '93.34K'. If the input is null, the result is ''.
+
===Some key parts of the Opasnet Base connection===
 +
 
 +
'''Overview
 +
 
 +
This module saves original data or model results (a study or a variable, respectively) into the Opasnet Base. You need your Opasnet username and password to do that. You must fill in all tables and fields below before the process can be completed.  
  
This module saves original data or model results (a study or a variable, respectively) into the Opasnet Base. You need your Opasnet username and password to do that. You must fill in all tables and fields below before the process can be completed. Fill in the data below from top to bottom.
 
 
If an object with the same Ident already exists in the Opasnet Base, the information will be added to that object.
 
If an object with the same Ident already exists in the Opasnet Base, the information will be added to that object.
 
Before you start, make sure that you have created an object page in the Opasnet wiki for each object (study or variable) you want to upload.
 
Before you start, make sure that you have created an object page in the Opasnet wiki for each object (study or variable) you want to upload.
  
Data structure:
+
The first row must contain the names of the columns. These names are used when creating the list of explanations in the Opasnet Base.
* Data must be uploaded in the format of a two-dimensional table. The table has rows, one observation in each row, and columns (fields).
+
 
* There are two kinds of columns. A) Index columns (also called independent variables in statistics) contain determinants of the actual data, such as sex of the observed individuals, or the observation year. B) Parameter columns (also called dependent variables) contain the actual data about the observations, given the index information.
+
You must give your Opasnet username to upload data. The username will be stored together with the upload information.
* The first row must contain the names of the columns, i.e. the indices and parameters. These names are used when creating indices in the Opasnet Base.
+
 
 +
'''Inputs
 +
 
 +
Writerpsswd:
 +
You must know the writer password for the Opasnet Base if you are not using the AWP web interface.
 +
 
  
Object info:
 
* You must give your Opasnet username and password to upload data. The username will be stored together with the upload information.
 
 
*Object info contains the most important metadata about your data.
 
*Object info contains the most important metadata about your data.
- Data source must be 1 when using AWP.
 
- Analytica identifier is ignored when using AWP.
 
 
- Ident is the page identifier in Opasnet. If your study or variable does not already have a page, you must create one. The identifier is found in the metadata box in the top right corner of the Opasnet page.
 
- Ident is the page identifier in Opasnet. If your study or variable does not already have a page, you must create one. The identifier is found in the metadata box in the top right corner of the Opasnet page.
 
- Number of indices is the number of columns that contain explanatory information (see below).
 
- Number of indices is the number of columns that contain explanatory information (see below).
Line 40: Line 84:
 
- Append to upload: Typically, each data upload event is given a separate identifier. If you want to continue an existing upload of the same object, you can give the number of that upload, and the new data will be appended.
 
- Append to upload: Typically, each data upload event is given a separate identifier. If you want to continue an existing upload of the same object, you can give the number of that upload, and the new data will be appended.
  
Observations:
+
'''Key parts of OBC
* The data are copy-pasted into the field 'Observations'. The source of the data can be any spreadsheet or text processor, as long as each column is separated by a tab, and each row by a line break. Note that the pasted data should be between 'quotation marks'.
+
 
 +
; Findid: This function gets an id from a table. It has the following parameters: '''in''': the property for which the id is needed. In MUST be unique in cond and it must contain index i. '''table''': the table from where the id is brought. The table MUST have .j as the column index, .i as the row index, and a column named 'id'. '''cond''': the name of the field that is compared with in. Cond must be text.
 +
 
 +
;Textify: Changes a number to a text value with up to 15 significant digits. This bypasses the number formatting problem that tends to convert e.g. 93341 to '93.34K'. If the input is null, the result is ''.
 +
 
 +
 
 +
 
 +
;Input format type 1 (Copy-paste): The data are copy-pasted into the field 'Observations'. The source of the data can be any spreadsheet or text processor, as long as each column is separated by a tab, and each row by a line break.  
 +
 
 +
Note that the pasted data should be between 'quotation marks'.
  
Data info:
 
 
Fill in the additional information about the data. These are asked for the object, and also for all the indices and the parameter. Note that if an entry with the identical Ident already exists in the Opasnet Base, this information will NOT be uploaded; the existing information will be used instead. All information should be between 'quotation marks' so that it is not mistakenly interpreted as Analytica node identifiers.
 
Fill in the additional information about the data. These are asked for the object, and also for all the indices and the parameter. Note that if an entry with the identical Ident already exists in the Opasnet Base, this information will NOT be uploaded; the existing information will be used instead. All information should be between 'quotation marks' so that it is not mistakenly interpreted as Analytica node identifiers.
 
- Name: a description that may be longer than an identifier. This is typically identical to the respective page in Opasnet.  
 
- Name: a description that may be longer than an identifier. This is typically identical to the respective page in Opasnet.  
 
- Unit: unit of measurement.
 
- Unit: unit of measurement.
  
Uploading:
+
'''Advanced options:
* There are two ways of uploading data. A) 'Upload data' is a public format, and all details are openly available. B) 'Upload non-public data' stores the actual data (the values in the parameter columns) into a database that requires a password for reading. However, all other information (including upload metadata and the data in  the index fields) are openly available.
 
  
Follow these instructions if you have Analytica Enterprise and have an ODBC connection to the Opasnet Base. Read also the simplified help; not everything is repeated here.
+
There are two ways of uploading data. A) 'Upload data' is a public format, and all details are openly available. B) 'Upload non-public data' stores the actual data (the values in the parameter columns) into a database that requires a password for reading. However, all other information (including upload metadata and the data in  the index fields) are openly available.
  
Platform:
+
Platform: You must choose THL computer if you are not using the AWP web interface.
You must choose THL computer if you are not using the AWP web interface.
 
  
Writerpsswd:
 
You must know the writer password for the Opasnet Base if you are not using the AWP web interface.
 
  
 
Object info:
 
Object info:

Revision as of 06:57, 11 July 2010


Uploading to Opasnet Base helps you understand what data could and should be uploaded to Opasnet Base and what the recommended data structures and formats are. For technical instructions on how to use the current upload software, see Opasnet Base connection. For a general description of the database, see Opasnet Base, and for technical details about the database, see Opasnet Base structure.

Scope

What data could and should be uploaded to Opasnet Base, and what are the recommended data structures and formats?

Definition

Inputs

This section describes what the inputs are and the different ways they can be formatted for a successful upload.

Data

The data must be formatted as a two-dimensional table with one observation in each row. Columns contain either explanations (called independent variables in statistics) or the actual observations (called dependent variables in statistics). This difference is important. Explanations are things that are fixed before the actual observation, while observations are those that are actually measured or observed. See the following table for an example.

An example of a data table.
City      Year  Sex     Body mass index (BMI)  Blood cholesterol
London    2010  Male    20                     3.31
London    2010  Male    25                     6.83
London    2010  Female  20                     5.55
London    2010  Female  30                     5.42
London    2010  Female  25                     4.19
New York  2010  Male    22                     3.33
New York  2010  Male    26                     5.84
New York  2010  Female  28                     5.67
New York  2010  Female  26                     4.52
New York  2010  Female  24                     5.67

The table seems unambiguous at first glance, but it is impossible to interpret without knowing which columns are explanations and which are observations. This may be a study performed in London and New York in 2010, where random people were asked for a blood test. In this case, city and year are explanations, and the other columns are observations. However, the data may just as well be summary statistics from a larger study, where the studied individuals were grouped in these cities based on their sex and body mass index (BMI), and the mean cholesterol in each group is the only observation column. (The data suggest that London used 5-unit groups for BMI while New York used 2-unit groups, but you cannot know this from the data alone.)

However, if you know which columns are explanations and which are observations, you can deduce many important aspects of the design of the study from which the data came. Of course, many things must still be explained elsewhere, such as how people were selected and what the base population was.
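The two readings of the example table can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of the Opasnet Base software; the names `COLUMNS`, `ROWS`, `SURVEY`, `SUMMARY` and `split_columns` are hypothetical, and the three rows are copied from the example table.

```python
# Illustrative sketch only: the column roles are supplied by the reader,
# they cannot be derived from the data itself.
COLUMNS = ["City", "Year", "Sex", "BMI", "Cholesterol"]
ROWS = [
    ("London", 2010, "Male", 20, 3.31),
    ("London", 2010, "Male", 25, 6.83),
    ("New York", 2010, "Female", 24, 5.67),
]


def split_columns(row, explanations):
    """Split one row into (explanations, observations) dictionaries."""
    record = dict(zip(COLUMNS, row))
    expl = {k: v for k, v in record.items() if k in explanations}
    obs = {k: v for k, v in record.items() if k not in explanations}
    return expl, obs


# Reading 1: people sampled in each city; only City and Year are fixed beforehand.
SURVEY = ["City", "Year"]
# Reading 2: group means; everything except cholesterol defines the group.
SUMMARY = ["City", "Year", "Sex", "BMI"]

for roles in (SURVEY, SUMMARY):
    expl, obs = split_columns(ROWS[0], roles)
    print("explanations:", expl, "observations:", obs)
```

The same row yields three observation columns under the first reading and only one under the second, which is why the roles must be stated alongside the data.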

Another important feature of the data is whether it is deterministic or probabilistic.

8.7.2010 Jouni Tuomisto: If the variable is deterministic, Obs should be 0. This must be the case in all upload methods. They are corrected accordingly; see Indexify.


Some key parts of the Opasnet Base connection

Overview

This module saves original data or model results (a study or a variable, respectively) into the Opasnet Base. You need your Opasnet username and password to do that. You must fill in all tables and fields below before the process can be completed.

If an object with the same Ident already exists in the Opasnet Base, the information will be added to that object. Before you start, make sure that you have created an object page in the Opasnet wiki for each object (study or variable) you want to upload.

The first row must contain the names of the columns. These names are used when creating the list of explanations in the Opasnet Base.

You must give your Opasnet username to upload data. The username will be stored together with the upload information.

Inputs

Writerpsswd: You must know the writer password for the Opasnet Base if you are not using the AWP web interface.


  • Object info contains the most important metadata about your data.

- Ident is the page identifier in Opasnet. If your study or variable does not already have a page, you must create one. The identifier is found in the metadata box in the top right corner of the Opasnet page.
- Number of indices is the number of columns that contain explanatory information (see below).
- Parameter name is a common name for all data columns. If omitted, 'Parameter' is used. See below for more details.
- If "Probabilistic?" is 1, then each row of the data table is considered a random draw from a data pool. Note that the index values are assumed to be the same in all rows, and only the index values of the first row are stored.
- Append to upload: Typically, each data upload event is given a separate identifier. If you want to continue an existing upload of the same object, you can give the number of that upload, and the new data will be appended.
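The effect of "Number of indices" and "Probabilistic?" on how rows are read can be sketched as follows. This is a Python illustration, not the actual upload code; the function `split_rows` is hypothetical, and it assumes (the text above does not say so) that the index columns are the leftmost columns of the table.

```python
def split_rows(rows, n_indices, probabilistic):
    """Group rows into (index values, list of parameter value tuples).

    Assumption: the first `n_indices` columns are the index columns.
    """
    if probabilistic:
        # Every row is one random draw; index values come from the first row only.
        index_values = tuple(rows[0][:n_indices])
        draws = [tuple(r[n_indices:]) for r in rows]
        return [(index_values, draws)]
    # Deterministic: each row keeps its own index values and a single value set.
    return [(tuple(r[:n_indices]), [tuple(r[n_indices:])]) for r in rows]


rows = [("London", 2010, 3.31), ("London", 2010, 6.83)]
print(split_rows(rows, 2, probabilistic=True))
print(split_rows(rows, 2, probabilistic=False))
```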

Key parts of OBC

Findid
This function gets an id from a table. It has the following parameters:
- in: the property for which the id is needed. In MUST be unique in cond and it must contain index i.
- table: the table from where the id is brought. The table MUST have .j as the column index, .i as the row index, and a column named 'id'.
- cond: the name of the field that is compared with in. Cond must be text.
Textify
Changes a number to a text value with up to 15 significant digits. This bypasses the number formatting problem that tends to convert e.g. 93341 to '93.34K'. If the input is null, the result is '' (an empty text).
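As a rough analogue of the two functions described above, the following Python sketch mimics their behaviour. It is not the Analytica code of Findid or Textify; the names `find_id`, `textify` and `TABLE` are hypothetical, and the table is modelled as a list of dictionaries rather than an array indexed by .i and .j.

```python
def find_id(value, table, cond):
    """Return the 'id' of the single row whose `cond` field equals `value`."""
    matches = [row["id"] for row in table if row[cond] == value]
    if len(matches) != 1:
        # The description requires the value to be unique in `cond`.
        raise LookupError(
            f"expected exactly one row with {cond}={value!r}, found {len(matches)}"
        )
    return matches[0]


def textify(number):
    """Format a number as text with up to 15 significant digits; None gives ''."""
    if number is None:
        return ""
    return format(float(number), ".15g")


# Hypothetical lookup table with an 'id' column and one text field.
TABLE = [
    {"id": 1, "ident": "A"},
    {"id": 2, "ident": "B"},
]

print(find_id("B", TABLE, "ident"))  # 2
print(textify(93341))                # 93341
print(repr(textify(None)))           # ''
```

The `.15g` format keeps at most 15 significant digits and drops trailing zeros, so whole numbers are written without a suffix such as 'K'.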


Input format type 1 (Copy-paste)
The data are copy-pasted into the field 'Observations'. The source of the data can be any spreadsheet or text processor, as long as each column is separated by a tab, and each row by a line break.

Note that the pasted data should be between 'quotation marks'.
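The copy-paste format (tab between columns, line break between rows, column names in the first row) can be illustrated with a short Python sketch. It is not the parser used by the upload software; `parse_pasted` is a hypothetical name, and the quotation marks required by the 'Observations' field are not modelled here.

```python
import csv
import io


def parse_pasted(text):
    """Parse tab-separated text whose first row holds the column names."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip empty lines
    header, data = rows[0], rows[1:]
    # All values stay as text; converting them to numbers is left to the caller.
    return [dict(zip(header, row)) for row in data]


pasted = "City\tYear\tCholesterol\nLondon\t2010\t3.31\nNew York\t2010\t3.33\n"
records = parse_pasted(pasted)
print(records[1]["City"])  # New York
```

Because the separator is a tab, values that contain spaces, such as "New York", stay in one column.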

Fill in the additional information about the data. These are asked for the object, and also for all the indices and the parameter. Note that if an entry with the identical Ident already exists in the Opasnet Base, this information will NOT be uploaded; the existing information will be used instead. All information should be between 'quotation marks' so that it is not mistakenly interpreted as Analytica node identifiers.
- Name: a description that may be longer than an identifier. This is typically identical to the respective page in Opasnet.
- Unit: unit of measurement.

Advanced options:

There are two ways of uploading data. A) 'Upload data' is a public format, and all details are openly available. B) 'Upload non-public data' stores the actual data (the values in the parameter columns) into a database that requires a password for reading. However, all other information (including upload metadata and the data in the index fields) is openly available.

Platform: You must choose THL computer if you are not using the AWP web interface.


Object info:
- Data source: 1 means that you are copy-pasting data to the 'Observations' field. 2 means that you have a 2D table in an Analytica node. The node must have column index .j (note: it is a local index!) and row index .i. The names of the columns must be in the index .j, and the first row must contain data. 3 means that you have a typical Analytica node with n indices; one of the indices may be Run. The node is transformed into a 2D table using MDArrayToTable.
- Analytica identifier is the identifier of the node to be used. The name must be given between 'quotation marks', i.e. as text.
- Ident: like in the simplified upload.
- Number of indices: like in the simplified upload if data source 2 is used; for 3, the number of indices comes from the node, and this entry is ignored.
- Parameter name: like in the simplified upload if data source 2 is used; for 3, the parameter is implicit, and this entry is ignored.
- Probabilistic?: like in the simplified upload if data source 2 is used; for 3, if this entry is 1, the sample mode is used and the full distribution is saved; if the entry is not 1, the mid mode is used.
- Append to upload: like in the simplified upload.
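For data source 3, the idea of turning a node with n indices into a 2D table can be sketched in Python. This only illustrates the general transformation and is not Analytica's MDArrayToTable; the function `array_to_table`, the column name "Value" and the numbers in the example are hypothetical.

```python
from itertools import product


def array_to_table(values, indexes):
    """Flatten nested lists into rows of index labels followed by the cell value.

    `indexes` maps each index name to its list of labels, in nesting order.
    """
    names = list(indexes)
    rows = []
    for positions in product(*(range(len(indexes[n])) for n in names)):
        cell = values
        for p in positions:
            cell = cell[p]  # walk down one dimension per index
        labels = [indexes[n][p] for n, p in zip(names, positions)]
        rows.append(labels + [cell])
    return names + ["Value"], rows


indexes = {"City": ["London", "New York"], "Sex": ["Male", "Female"]}
values = [[1.0, 2.0], [3.0, 4.0]]  # made-up numbers, values[city][sex]
header, rows = array_to_table(values, indexes)
print(header)
for row in rows:
    print(row)
```

Each combination of index labels becomes one row, so the result has the same shape as the copy-pasted tables: index columns first, then the value.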