Uploading to Opasnet Base

Revision as of 22:01, 11 July 2010


Uploading to Opasnet Base helps you understand what data could and should be uploaded to Opasnet Base and what the recommended data structures and formats are. For technical instructions on how to use the current upload software, see Opasnet Base connection. For a general description of the database, see Opasnet Base, and for technical details about the database, see Opasnet Base structure. For details about downloading data, see Opasnet Base UI.

Scope

What data could and should be uploaded to Opasnet Base, and what are the recommended data structures and formats?

Definition

This section describes what the inputs are and the different ways they can be formatted for a successful upload.

What are the topics about which data can be uploaded into the Base? Well, basically any topic that provides useful information for any decision-making situation that has societal relevance. This sounds like a very wide definition, and it is. The data may be about which car models are environmentally friendly. It may be about pollutants in food. It may be new ideas about a societally just value-added tax.

What are, then, the data structures that are allowed? Although not all structures are allowed, almost any data can be easily transformed into the structure that is used in Opasnet Base. The data must be formatted as a two-dimensional table with one observation on each row. Cells of the table may contain either text or numerical values. Columns contain either explanations (or independent variables in statistics) or the actual observations (or dependent variables in statistics). This difference is important. Explanations are things that are fixed before the actual observation, while observations are those that are actually measured or observed. See the following table as an example.

An example of a data table.

 City     | Year | Sex    | Body mass index BMI (kg/m²) | Blood cholesterol (mM)
 London   | 2010 | Male   | 20                          | 3.31
 London   | 2010 | Male   | 25                          | 6.83
 London   | 2010 | Female | 20                          | 5.55
 London   | 2010 | Female | 30                          | 5.42
 London   | 2010 | Female | 25                          | 4.19
 New York | 2010 | Male   | 22                          | 3.33
 New York | 2010 | Male   | 26                          | 5.84
 New York | 2010 | Female | 28                          | 5.67
 New York | 2010 | Female | 26                          | 4.52
 New York | 2010 | Female | 24                          | 5.67

The table seems unambiguous at first glance, but it is impossible to interpret it correctly without knowing which columns are explanations and which are observations. This may be a study performed in London and New York in 2010, where random people were asked for a blood test. In this case, city and year are explanations, and the other columns are observations. However, the data may just as well be summary statistics from a larger study, where the studied individuals were grouped in these cities based on their sex and body mass index (BMI), and the mean cholesterol in each group is the only observation column. (The data implies that London used BMI groups 5 kg/m² wide, while New York used BMI groups 2 kg/m² wide, but you cannot know this based on the data alone.)

However, if you know which columns are explanations and which are observations, you can actually deduce many important aspects of the design of the study from which the data came. Of course, many things must be explained elsewhere, such as how people were selected and what the studied base population was.

Another important feature of the data is whether it is deterministic or probabilistic. With deterministic data, it is assumed that each row is an independent piece of information. With probabilistic data, the rows are assumed to be random draws from a pool of potential observations (like an urn full of balls with different colours, each having the same probability of being picked). However, not all observations are picked from the same pool; rather, they come from several pools, each uniquely defined by the explanation columns.

Let's assume that we look at the table above and learn that it is probabilistic data with two explanation columns. We then know that people were randomly picked from either London or New York in 2010 (the explanation columns are always the first ones on the left). In London, two happened to be male and three female, with measured BMIs and cholesterol levels. Thus, there are five observations from the pool defined by "London 2010", and also five observations from "New York 2010". These are numbered 1..5. If the data is deterministic, the observation number is 0 for all rows.
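
The numbering of observations within explanation pools can be illustrated with a short sketch. The following Python snippet is purely illustrative and is not part of the upload software; the helper function and variable names are invented. It splits the example table into explanation and observation columns and assigns observation numbers: 1..N within each pool for probabilistic data, 0 for every row of deterministic data.

 # Hypothetical sketch: number observations within each explanation pool.
 # Probabilistic data: rows in the same pool get numbers 1..N.
 # Deterministic data: every row gets observation number 0.
 from collections import defaultdict
 
 rows = [
     # explanation columns: City, Year; observation columns: Sex, BMI, cholesterol
     ("London",   2010, "Male",   20, 3.31),
     ("London",   2010, "Male",   25, 6.83),
     ("London",   2010, "Female", 20, 5.55),
     ("London",   2010, "Female", 30, 5.42),
     ("London",   2010, "Female", 25, 4.19),
     ("New York", 2010, "Male",   22, 3.33),
     ("New York", 2010, "Male",   26, 5.84),
     ("New York", 2010, "Female", 28, 5.67),
     ("New York", 2010, "Female", 26, 4.52),
     ("New York", 2010, "Female", 24, 5.67),
 ]
 
 def number_observations(rows, n_explanation_cols, probabilistic):
     """Return (obs_number, row) pairs; pools are defined by the explanation columns."""
     counters = defaultdict(int)
     numbered = []
     for row in rows:
         pool = row[:n_explanation_cols]          # e.g. ("London", 2010)
         if probabilistic:
             counters[pool] += 1
             numbered.append((counters[pool], row))
         else:
             numbered.append((0, row))
     return numbered
 
 for obs, row in number_observations(rows, n_explanation_cols=2, probabilistic=True):
     print(obs, row)   # London rows get numbers 1..5, New York rows get numbers 1..5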


Inputs

[Figure: User interface for uploading data to Opasnet Base.]

Overview

This method is used to save original data or model results (a study or a variable, respectively) into the Opasnet Base.

If an object with the same Ident already exists in the Opasnet Base, the information will be added to that object. Before you start, make sure that you have created an object page in the Opasnet wiki for each object (study or variable) you want to upload.

You must give your Opasnet username to upload data. The username will be stored together with the upload information. You must also know the writer password for the Opasnet Base if you are not using the AWP web interface.


Object info contains the most important metadata about your data. All entries should be given between 'quotation marks' so that they are not mistakenly interpreted as Analytica node identifiers. (A rough sketch of these fields follows the list below.)

  • Analytica identifier (only needed for type 2, i.e. when Analytica nodes are uploaded): the identifier(s) of the node(s).
  • Ident is the page identifier in Opasnet. If your study or variable does not already have a page, you must create one before uploading the data. The identifier is found in the metadata box in the top right corner of the Opasnet page. Note that if an entry with the identical Ident already exists in the Opasnet Base, the metadata will NOT be updated but the existing metadata will be used instead.
  • Name: a description that may be longer than an identifier. This is typically identical to the respective page in Opasnet.
  • Unit: unit of measurement. This is the same for the whole object. However, if different observation columns have different units, this must be clearly explained in the respective Opasnet page.
  • # Explanation cols: the number of columns that contain explanatory information (see below).
  • Observation name: a common name for all observation columns. If omitted, 'Observation' is used. See below for more details.
  • If "Probabilistic?" is Yes or 1, then each row of the data table is considered a random draw from a data pool with the explanation values that are listed on that row.
  • Append to upload: Typically, a data upload happens when corrected or improved data is entered, and the already existing data (if any) is replaced by the new data. (The old data is not deleted, and it can be retrieved if needed.) In this case, the answer to the question "Replace data?" should be Yes. On the other hand, sometimes the data is entered in smaller parts, and the whole data set consists of several uploads. In that case, answer No, and the data will be shown to the user as one piece of data together with the previous upload(s).
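
As a rough illustration only (the real upload is done through the form or the Analytica module described here, not through code), the Object info fields listed above could be collected into a structure like the following Python sketch. The field names mirror the list; the values, including the Ident, are invented examples.

 # Hypothetical sketch of the Object info metadata described above.
 # These are the same fields as in the list; the values are invented examples.
 object_info = {
     "Analytica identifier": None,           # only needed when Analytica nodes are uploaded
     "Ident":              "Op_en1234",      # page identifier of an existing Opasnet page (made up)
     "Name":               "Blood cholesterol study example",
     "Unit":               "mM",             # one unit for the whole object
     "# Explanation cols": 2,                # City and Year in the example table
     "Observation name":   "Observation",    # default used if omitted
     "Probabilistic?":     "Yes",            # each row is a random draw from its pool
     "Replace data?":      "Yes",            # replace the previous upload instead of appending
 }
 
 # Quoting entries as text, as instructed above, so that they are not
 # mistaken for Analytica node identifiers:
 quoted = {key: f"'{value}'" for key, value in object_info.items()}
 print(quoted["Ident"])   # -> 'Op_en1234'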


The data can be entered in four different ways. All of them are equivalent; it only depends on which is the easiest way to enter the data table(s).

  1. Copy-paste a table: The data are copy-pasted into the field 'Copy-paste'. The source of the data can be any spreadsheet or text processor, as long as each column is separated by a tab and each row by a line break. The first row must contain the column names. Note that the whole block of data must be between 'quotation marks'. The limitation is that this method can only handle fairly small tables (less than about 30 kiB). (A sketch of this format follows the list.)
  2. Analytica model: Give identifiers of Analytica model nodes, and the node results will be uploaded. You can upload a whole model at one time.
  3. Node to be formatted as data table: Give the column names and number of rows, and a table with the right size will be created. Then, you can copy-paste your data into this table. The benefit is that larger data can be copied.
  4. Ready-made data table node: If you already have your data in the right format for a modified data table, you can simply give the name of that Analytica node, and it will be used directly. This requires advanced knowledge about the specifications of a modified data table. This is NOT the same as the input data table format described above.
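
For the copy-paste method (option 1 above), here is a minimal sketch of producing a tab-separated block with the column names on the first row and the whole block wrapped in quotation marks, as required. It only illustrates the expected text format and assumes nothing about the upload interface itself.

 # Hypothetical sketch: build a tab-separated copy-paste block (option 1 above).
 # Columns are separated by tabs, rows by line breaks, the first row holds the
 # column names, and the whole block is wrapped in quotation marks.
 header = ["City", "Year", "Sex", "Body mass index BMI (kg/m2)", "Blood cholesterol (mM)"]
 rows = [
     ["London", "2010", "Male", "20", "3.31"],
     ["New York", "2010", "Female", "24", "5.67"],
 ]
 
 lines = ["\t".join(header)] + ["\t".join(row) for row in rows]
 block = "'" + "\n".join(lines) + "'"      # quoted so it is not read as an identifier
 print(block)
 
 # Rough size check against the ~30 kiB limit mentioned above:
 assert len(block.encode("utf-8")) < 30 * 1024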


Advanced options:

There are advanced methods for uploading data. 'Upload non-public data' stores the actual data (the values in the observation columns) into a database that requires a password for reading. However, all other information (including upload metadata and the explanation data) are openly available. For very large data, you can also create data files that match the data format and import those to Opasnet Base (although you need a specific password to be able to import files).
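
Purely as an illustration of the public/non-public split described above (not of the actual database layout), the following sketch separates each row into openly available explanation values and observation values that would sit behind a password.

 # Hypothetical sketch of the public/non-public split described above.
 # Explanation columns stay openly readable; observation values go to a store
 # that would require a password to read. The stores here are just Python objects.
 rows = [
     ("London", 2010, "Male", 20, 3.31),
     ("New York", 2010, "Female", 24, 5.67),
 ]
 n_explanation_cols = 2
 
 public_store = []       # explanation values and row ids: openly available
 protected_store = {}    # observation values: conceptually behind a password
 
 for row_id, row in enumerate(rows, start=1):
     public_store.append((row_id,) + row[:n_explanation_cols])
     protected_store[row_id] = row[n_explanation_cols:]
 
 print(public_store)       # [(1, 'London', 2010), (2, 'New York', 2010)]
 print(protected_store)    # {1: ('Male', 20, 3.31), 2: ('Female', 24, 5.67)}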

Platform: You must choose THL computer if you are not using the AWP web interface.

Outputs

To learn how to access the data in Opasnet Base, read Opasnet Base; for technical details, see Opasnet Base UI.

Procedure

When you know what to put in the input fields, the rest is very straightforward. Check that you have entered the data correctly by pressing the 'Check that your data looks sensible: Data table' button. You may correct any errors at this point. Then, press 'Upload to Opasnet Base'. It may take several minutes, especially with large data.

Note! Currently you can only upload data if you have Analytica Enterprise on your machine and your computer is located in the THL network. You need the file Opasnet base connection.ANA. You can even upload a whole Analytica model at once. You have to contact Juha Villman or Jouni to get the password for uploading data.

If you don't have Analytica Enterprise, you can still provide data to Opasnet Base. Just create a study or variable page that describes your data, upload a data file to Opasnet (or simply copy the data to the page if it is small), link to the file from the page you created, and contact Juha Villman or Jouni so that they can upload the data for you.