Data structures in Opasnet

From Testiwiki

Question

What data structures are there in Opasnet, and how should they be used?

Answer

This topic was previously covered in Opasnet base structure. Now there is only a brief description of the topic.

All data should be convertible into the following format (shown here in the wide format, i.e. each observation type as one column):

                   Observation
Year  Sex     Age  Height  Weight  Description
2009  Male    20   178     70      An optional column for descriptive text about each row.
2009  Male    30   174     79
2010  Male    25   183     84
2010  Female  22   168     65

where

Year, Sex and Age are the names of explanation columns, also known as indices.
The values in these columns (e.g. 2009, Male, 20) are explanation data, also known as locations. You can use these columns as search criteria.
"Observation" is the observation index, a common name for all observation columns.
Height, Weight and Description are the names of observation columns. These are the parameters of interest.
The values in these columns (e.g. 178, 70) are observation data. These are the actual measurements.
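
As a rough illustration of using explanation columns as search criteria, the wide table above can be held as plain Python records. This is only a sketch and not how Opasnet Base stores data; the names INDICES, OBSERVATIONS, rows and select are hypothetical.

```python
# Hypothetical sketch: the wide-format example table as plain Python records.
INDICES = ["Year", "Sex", "Age"]                     # explanation columns (indices)
OBSERVATIONS = ["Height", "Weight", "Description"]   # observation columns

rows = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178, "Weight": 70,
     "Description": "An optional column for descriptive text about each row."},
    {"Year": 2009, "Sex": "Male", "Age": 30, "Height": 174, "Weight": 79, "Description": ""},
    {"Year": 2010, "Sex": "Male", "Age": 25, "Height": 183, "Weight": 84, "Description": ""},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168, "Weight": 65, "Description": ""},
]


def select(table, **locations):
    """Return the rows whose explanation columns match every given location."""
    unknown = set(locations) - set(INDICES)
    if unknown:
        raise KeyError(f"not an explanation column: {sorted(unknown)}")
    return [row for row in table
            if all(row[name] == value for name, value in locations.items())]


males_2009 = select(rows, Year=2009, Sex="Male")
print([row["Height"] for row in males_2009])  # prints [178, 174]
```

Restricting the search to the explanation columns mirrors the split described above: locations identify a row, observations are what is read from it.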


Other object information. It varies slightly depending on the format you use for uploading data.

Info table
Parameter          Example                      How entered in Table2Base
ident              Op_en2693                    Automatically taken from the wiki page id.
name               Testvariable                 Automatically taken from the wiki page name.
unit               cm,kg                        unit="cm,kg"
explanation cols   Year, Sex, Age, Observation  index="Year,Sex,Age,Observation"
observation index  Observation                  Given as the last index.
observation #      1, 2, 3                      If indices together don't uniquely identify the row, use an additional index column "obs" with row numbers.
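
The rule in the last row of the info table can be sketched as follows: count how often each combination of index values occurs and add an "obs" column with row numbers only when some combination repeats. The function name add_obs_if_needed and the example measurements are invented for illustration; Table2Base itself may handle this differently.

```python
from collections import Counter


def add_obs_if_needed(table, indices):
    """Return (rows, index names), adding an "obs" row-number index if needed.

    `indices` lists the explanation columns without the observation index,
    which would still be given last when the table is uploaded.
    """
    counts = Counter(tuple(row[name] for name in indices) for row in table)
    if all(n == 1 for n in counts.values()):
        # Indices already identify every row uniquely: nothing to add.
        return [dict(row) for row in table], list(indices)
    numbered = [dict(row, obs=n) for n, row in enumerate(table, start=1)]
    return numbered, list(indices) + ["obs"]


# Invented data: the first two rows share the same Year, Sex and Age.
repeated = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178},
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 181},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168},
]
numbered, index_names = add_obs_if_needed(repeated, ["Year", "Sex", "Age"])
print(index_names)                       # prints ['Year', 'Sex', 'Age', 'obs']
print([row["obs"] for row in numbered])  # prints [1, 2, 3]
```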


Below is the same table in long format, where all observations have been put into a single column. An additional column, "Observation", tells which parameter is in which row. In this example, the indices Year, Sex and Age uniquely define a row, and therefore there is no need for an obs column. When a table is used in calculations, all rows where the index Observation has the value Description are removed first.

Year  Sex     Age  Observation  Result
2009  Male    20   Height       178
2009  Male    30   Height       174
2010  Male    25   Height       183
2010  Female  22   Height       168
2009  Male    20   Weight       70
2009  Male    30   Weight       79
2010  Male    25   Weight       84
2010  Female  22   Weight       65
2009  Male    20   Description  An optional column for descriptive text about each row.
2009  Male    30   Description
2010  Male    25   Description
2010  Female  22   Description

(The tables above have been created with File:Opasnet base explanation.ods.)
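
The conversion from the wide to the long format, and the removal of Description rows before calculations, can be sketched in Python as below. The function names to_long and for_calculation are hypothetical, and this is an illustration of the idea rather than the code Opasnet uses.

```python
INDICES = ["Year", "Sex", "Age"]
OBSERVATIONS = ["Height", "Weight", "Description"]

wide = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178, "Weight": 70,
     "Description": "An optional column for descriptive text about each row."},
    {"Year": 2009, "Sex": "Male", "Age": 30, "Height": 174, "Weight": 79, "Description": ""},
    {"Year": 2010, "Sex": "Male", "Age": 25, "Height": 183, "Weight": 84, "Description": ""},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168, "Weight": 65, "Description": ""},
]


def to_long(table, indices, observations):
    """Stack every observation column into a single Result column."""
    long_rows = []
    for obs_name in observations:          # one block of rows per parameter
        for row in table:
            long_row = {name: row[name] for name in indices}
            long_row["Observation"] = obs_name
            long_row["Result"] = row[obs_name]
            long_rows.append(long_row)
    return long_rows


def for_calculation(long_rows):
    """Drop the descriptive rows, as is done before a table is used in calculations."""
    return [row for row in long_rows if row["Observation"] != "Description"]


long_table = to_long(wide, INDICES, OBSERVATIONS)
numeric = for_calculation(long_table)
print(len(long_table), len(numeric))  # prints 12 8
```

Looping over parameters first reproduces the row order of the long table above: all Height rows, then Weight, then Description.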

Protecting non-public data

Work through this list in order until you have reached an adequate level of protection.

  1. Remove personal information (names, social security numbers etc.) from the data and use person-specific identifiers instead. Keep the key linking names and identifiers in a safe place.
  2. Remove other sensitive information (e.g. the name of an endangered species or of a drug being studied) and use an identifier instead.
  3. Make the information coarser, e.g. by giving relative values instead of absolute ones: do not give the exact operation date but the time elapsed from a reference date, and state the reference date only to the precision of one year. Similarly, instead of giving the exact location of an endangered species, give its location relative to a reference point that is not exactly revealed.
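
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a vetted anonymisation procedure: the function protect, the field names, the identifier formats and the example records are all invented here.

```python
from datetime import date


def protect(records, reference_date):
    """Apply the three protection steps to a list of patient records."""
    person_key = {}   # step 1: name -> identifier; keep this key in a safe place
    method_key = {}   # step 2: sensitive method name -> neutral label
    public = []
    for rec in records:
        person = person_key.setdefault(rec["name"], f"P{len(person_key) + 1:03d}")
        method = method_key.setdefault(
            rec["method"], f"method {chr(ord('A') + len(method_key))}")
        public.append({
            "person": person,
            "method": method,
            # step 3: time from a reference date instead of the exact date
            "days_from_reference": (rec["operation_date"] - reference_date).days,
        })
    # Only the year of the reference date is meant to be published.
    return public, person_key, method_key, reference_date.year


# Invented records and reference date, for illustration only.
records = [
    {"name": "John Doe", "method": "experimental method",
     "operation_date": date(2012, 1, 6)},
    {"name": "Jane Roe", "method": "standard method",
     "operation_date": date(2012, 3, 1)},
]
public, person_key, method_key, reference_year = protect(records, date(2011, 12, 7))
print(public[0])
# prints {'person': 'P001', 'method': 'method A', 'days_from_reference': 30}
```

Only public and reference_year would be released; person_key and method_key are the linking keys that stay with the investigator.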

Rationale

Protecting data

Opasnet is a workspace for the open sharing and use of data. However, sometimes it is necessary to restrict the use of data. For example, the data may contain personal patient information or unpublished research data.

The guiding principle is always to open as much data as possible.

The question is not whether a piece of data can be opened, but which parts of it can be and how the other parts should be handled. For example, "John Doe has lung cancer, which was operated on using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012" is a piece of data that clearly must not be published by the principal investigator. However, a simple change in the information content turns this sensitive patient information into neutral medical information: "A lung cancer was operated on using an experimental method with radical lymph node removal in Mass General Hospital on 6 Jan 2012". In addition, the investigator may not want to reveal which specific treatments they are testing, but may still want to release as much of the data as possible to get comments on the study design and statistical analysis from colleagues: "A lung cancer was operated on using method B in Mass General Hospital on 6 Jan 2012."

With these two changes, the data can be released in a machine-readable format on the Internet. It is still very useful for health economists and possibly for many other people as well.

See also

Pages related to Opasnet Base

Opasnet Base · Uploading to Opasnet Base · Data structures in Opasnet · Opasnet Base UI · Modelling in Opasnet · Special:Opasnet Base Import · Opasnet Base Connection for R (needs updating) · Converting KOPRA data into Opasnet Base · Poll

Pages related to the 2008-2011 version of Opasnet Base

Opasnet base connection for Analytica · Opasnet base structure · Related Analytica file (old version File:Transferring to result database.ANA) · Analytica Web Player · Removed pages and other links · Standard run · OpasnetBaseUtils