Difference between revisions of "Opasnet base structure"

Latest revision as of 12:01, 10 January 2014

This page is a variable. The page identifier is Op_en1913
Moderator:Jouni (see all)
Citation of this page: Juha Villman, Einari Happonen, Jouni T. Tuomisto: Opasnet Base structure. Opasnet 2010. [1]. Accessed 23 May 2024.

This page is about the old structure of Opasnet Base used in ca. 2008-2011. For a description of the current database, see Opasnet base 2.

Question

Structure and connections (lines) of the tables (boxes) in the Opasnet Base. The identifier field of every table is called id (so it can be referred to as, for example, obj.id). When obj.id is referred to in another table such as actobj, the field is called obj_id (actobj.obj_id). The "many" end of a one-to-many relationship is marked with a ring. Important substantive fields are listed inside the table boxes.

Opasnet Base is a storage and retrieval system for results of variables and data from studies. What is the structure of Opasnet Base such that it enables the following functionalities?

Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.
Automatic retrieval of results when called from Opasnet wiki or other platforms or modelling systems.
Description and handling of the indices that a variable may take.
It is possible to protect some results and data from reading by unauthorised persons.
It is possible to build user interfaces for easily entering observations into the Base.

Answer

Opasnet Base is a MySQL database located at http://base.opasnet.org.

Data structure

Main article: Data structures in Opasnet

All data should be convertible into the following format:

			Observation
Year	Sex	Age	Height	Weight	Description
2009	Male	20	178	70	An optional column for descriptive text about each row.
2009	Male	30	174	79
2010	Male	25	183	84
2010	Female	22	168	65

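where

Year, Sex, and Age (header row) are names of explanation columns, also known as indices.

The values in those columns (e.g. 2009, Male, 20) are explanation data, also known as locations. You can use these columns as search criteria.

Observation (top header row) is the observation index, typically called "Observation". It is the common name for all observation columns.

Height, Weight, and Description (header row) are names of observation columns. These are the parameters of interest.

The values in those columns (e.g. 178, 70) are observation data. These are the actual measurements.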
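
The format above can be read as a "wide" table in which every observation value is identified by its locations on the explanation indices plus a location on the Observation index. The following Python sketch shows one possible way to flatten the example table accordingly. The names `INDEX_COLUMNS`, `OBSERVATION_COLUMNS`, and `to_long` are illustrative assumptions, not part of Opasnet Base, and the optional Description column is left out for brevity.

```python
# Sketch: flatten the example table into one record per observation value.
INDEX_COLUMNS = ["Year", "Sex", "Age"]        # explanation columns (indices)
OBSERVATION_COLUMNS = ["Height", "Weight"]    # observation columns

rows = [
    {"Year": 2009, "Sex": "Male", "Age": 20, "Height": 178, "Weight": 70},
    {"Year": 2009, "Sex": "Male", "Age": 30, "Height": 174, "Weight": 79},
    {"Year": 2010, "Sex": "Male", "Age": 25, "Height": 183, "Weight": 84},
    {"Year": 2010, "Sex": "Female", "Age": 22, "Height": 168, "Weight": 65},
]


def to_long(rows):
    """Return one (locations, result) pair per row and observation column."""
    long_rows = []
    for row in rows:
        for obs_name in OBSERVATION_COLUMNS:
            # Locations on the explanation indices identify the row ...
            locations = {name: row[name] for name in INDEX_COLUMNS}
            # ... and the observation column name becomes a location
            # on the "Observation" index.
            locations["Observation"] = obs_name
            long_rows.append((locations, row[obs_name]))
    return long_rows


long_rows = to_long(rows)
for locations, result in long_rows:
    print(locations, result)
```

Each printed pair holds the locations that identify one value together with the value itself.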

Table structure in the database

All tables

act
Uploads, updates, and other actions
Field	Type	Null	Extra	Key
id	int(10) unsigned	NO	auto_increment	PRI
acttype_id	tinyint(3) unsigned	NO		MUL
who	varchar(50)	NO
comments	varchar(250)	YES
time	timestamp	NO
temp_id	int(10) unsigned	NO		MUL

actloc
Locations of an act
Field	Type	Null	Extra	Key
actobj_id	int(10) unsigned	NO		PRI
loc_id	int(10) unsigned	NO		PRI

actobj
Acts of an object
Field	Type	Null	Extra	Key
id	int(10) unsigned	NO	auto_increment	PRI
act_id	int(10) unsigned	NO		MUL
obj_id	int(10) unsigned	NO		MUL
series_id	int(10) unsigned	NO		MUL
unit	varchar(64)	YES

acttype
List of action types
Field	Type	Null	Extra	Key
id	int(10) unsigned	NO	auto_increment	PRI
acttype	varchar(250)	NO		UNI

cell
Cells of an object
Field	Type	Null	Extra	Key
id	int(12) unsigned	NO	auto_increment	PRI
actobj_id	int(10) unsigned	NO		MUL
mean	float	YES
sd	float	NO
n	int(10)	NO
sip	varchar(2000)	YES

loc
Location information
Field	Type	Null	Extra	Key
id	int(10) unsigned	NO	auto_increment	PRI
std_id	int(10) unsigned	NO		MUL
obj_id_i	int(10) unsigned	NO		MUL
location	varchar(100)	NO
roww	mediumint(8) unsigned	NO
description	varchar(150)	NO

loccell
Locations of a cell
Field	Type	Null	Extra	Key
cell_id	int(10) unsigned	NO		PRI
loc_id	int(10) unsigned	NO		PRI

obj
Object information (all objects)
Field	Type	Null	Extra	Key
id	int(10) unsigned	NO	auto_increment	PRI
ident	varchar(20)	NO		UNI
name	varchar(200)	NO
objtype_id	tinyint(3) unsigned	NO		MUL
page	int(10) unsigned	NO
wiki_id	tinyint(3) unsigned	NO

objtype
Types of objects
Field	Type	Null	Extra	Key
id	tinyint(3)	NO		PRI
objtype	varchar(30)	NO

res
Result distribution (actual values)
Field	Type	Null	Extra	Key
id	bigint(20) unsigned	NO	auto_increment	PRI
cell_id	int(12) unsigned	NO		MUL
obs	int(10) unsigned	NO
result	float	NO
restext	varchar(250)	YES
implausible	binary(1)	YES

wiki
Wiki information
Field	Type	Null	Extra	Key
id	tinyint(3)	NO		PRI
url	varchar(255)	NO
wname	varchar(20)	NO
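
To illustrate how the tables connect, the sketch below rebuilds a simplified subset of them and retrieves the results of one variable together with their locations. It rests on several assumptions: SQLite (from the Python standard library) stands in for MySQL, column types are simplified, the tables act, acttype, objtype, and wiki are omitted, and all inserted rows, including the identifier `Op_en0001`, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obj     (id INTEGER PRIMARY KEY, ident TEXT UNIQUE, name TEXT, objtype_id INTEGER);
CREATE TABLE actobj  (id INTEGER PRIMARY KEY, act_id INTEGER, obj_id INTEGER, series_id INTEGER, unit TEXT);
CREATE TABLE cell    (id INTEGER PRIMARY KEY, actobj_id INTEGER, mean REAL, sd REAL, n INTEGER);
CREATE TABLE loc     (id INTEGER PRIMARY KEY, obj_id_i INTEGER, location TEXT);
CREATE TABLE loccell (cell_id INTEGER, loc_id INTEGER, PRIMARY KEY (cell_id, loc_id));
CREATE TABLE res     (id INTEGER PRIMARY KEY, cell_id INTEGER, obs INTEGER, result REAL);

-- Hypothetical content: one variable (objtype 1) and one index (objtype 6).
INSERT INTO obj VALUES (1, 'Op_en0001', 'Height', 1), (2, 'Sex', 'Sex', 6);
INSERT INTO actobj VALUES (1, 1, 1, 1, 'cm');
INSERT INTO loc VALUES (1, 2, 'Male'), (2, 2, 'Female');
INSERT INTO cell VALUES (1, 1, 178.0, 0.0, 1), (2, 1, 168.0, 0.0, 1);
INSERT INTO loccell VALUES (1, 1), (2, 2);
INSERT INTO res VALUES (1, 1, 1, 178.0), (2, 2, 1, 168.0);
""")

# Follow the foreign keys: obj -> actobj -> cell -> res, and
# cell -> loccell -> loc -> obj (the index that owns the location).
query = """
SELECT var.ident, idx.name AS index_name, loc.location,
       res.obs, res.result, actobj.unit
FROM obj AS var
JOIN actobj     ON actobj.obj_id = var.id
JOIN cell       ON cell.actobj_id = actobj.id
JOIN res        ON res.cell_id = cell.id
JOIN loccell    ON loccell.cell_id = cell.id
JOIN loc        ON loc.id = loccell.loc_id
JOIN obj AS idx ON idx.id = loc.obj_id_i
WHERE var.ident = ?
ORDER BY res.id
"""
rows = conn.execute(query, ("Op_en0001",)).fetchall()
for row in rows:
    print(row)
```

A cell with locations on several indices would return one row per location and result, so a real client would probably pivot the locations into columns after the query.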

Contents of selected tables

Table objtype
id	objtype
1	Variable
2	Study
3	Method
4	Assessment
5	Class
6	Index
7	Nugget
8	Encyclopedia article
9	Run

Table acttype
id	acttype
1	Start object
2	Finish assessment
3	Update formula
4	Upload data (replace)
5	Upload data (append)
6	Review scope
7	Review definition
8	Add object info
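
Client code that reads the objtype_id and acttype_id fields could mirror the two lookup tables as enumerations. The sketch below is only an illustration: the class names, member names, and the helper `is_upload` are assumptions, while the numeric values are copied from the tables above.

```python
from enum import IntEnum


class ObjType(IntEnum):
    """Mirror of the objtype table."""
    VARIABLE = 1
    STUDY = 2
    METHOD = 3
    ASSESSMENT = 4
    CLASS = 5
    INDEX = 6
    NUGGET = 7
    ENCYCLOPEDIA_ARTICLE = 8
    RUN = 9


class ActType(IntEnum):
    """Mirror of the acttype table."""
    START_OBJECT = 1
    FINISH_ASSESSMENT = 2
    UPDATE_FORMULA = 3
    UPLOAD_DATA_REPLACE = 4
    UPLOAD_DATA_APPEND = 5
    REVIEW_SCOPE = 6
    REVIEW_DEFINITION = 7
    ADD_OBJECT_INFO = 8


def is_upload(acttype_id: int) -> bool:
    """Return True if the act type is one of the two data uploads."""
    return ActType(acttype_id) in (
        ActType.UPLOAD_DATA_REPLACE,
        ActType.UPLOAD_DATA_APPEND,
    )


print(ObjType(6).name, is_upload(4), is_upload(3))
```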

Rationale

Data

Software

Because Opasnet Base will contain very large amounts of mostly numerical information, an SQL database is the appropriate structure. Among SQL software, MySQL is a well-suited choice because of its flexibility, ease of use, and cost. In addition to the database software, a variable transfer protocol is needed on top of it, so that the results of variables can be retrieved and new results stored either automatically by calculation software or manually by the user. Presentation software can be built on top of the database, but that is not the topic of this page.

Storage and retrieval of results of variables

The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as a natural constant to an uncertain spatio-temporal concentration field over the whole of Europe), the database must be very flexible. The basic solution is described on the variable page and is only briefly summarised here. The result is described as

  P(R|x₁,x₂,...)

where P(R) is the probability distribution of the result and x₁, x₂, ... are the defining locations along the indices where a particular P(R) applies. Typically locations are operationalised as discrete indices. A variable must have at least one index. Uncertainty about the true value of the variable is operationalised as a random sample from the probability distribution: the samples are located along an index called Sample, which is a list of integers 1, 2, 3, ..., n, where n is the number of samples.
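
The idea of storing uncertainty as a sample along the Sample index can be sketched in Python as follows. The sketch assumes that each draw becomes one row of the res table (obs = sample number, result = drawn value) and that the mean, sd, and n fields of the cell table summarise those rows. That reading of the fields, the function `sample_cell`, and the normal distribution used are all assumptions made for illustration.

```python
import random
import statistics


def sample_cell(draw, n, seed=0):
    """Draw n samples for one cell.

    Returns (res_rows, summary), where res_rows is a list of
    (obs, result) pairs with obs = 1..n, and summary holds the
    mean, sd, and n that a cell row could carry.
    """
    rng = random.Random(seed)
    res_rows = [(obs, draw(rng)) for obs in range(1, n + 1)]
    values = [result for _, result in res_rows]
    summary = {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values) if n > 1 else 0.0,
        "n": n,
    }
    return res_rows, summary


# Hypothetical uncertain result: height ~ Normal(178 cm, 7 cm).
res_rows, summary = sample_cell(lambda rng: rng.gauss(178.0, 7.0), n=1000)
print(res_rows[:3])
print(summary)
```

A variable without uncertainty would then simply be the special case n = 1.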

Old description of the structure

Dependencies

Replacing some cells

A large dataset may need an update to only a few cells while all others remain the same. How should this be done? There are a few potential alternatives.

Use the current replace functionality. Replace all cells, most of them with their original value.
Use a new action type (acttype) that is similar to the current append functionality. It should be understood so that if there are two (or more) identical cells (based on cell indices and locations), the newest result is used and all older ones are discarded. (If the old append is used, the new information is seen just as a new row in the data table, not as a replacement of an existing row.)
Add a new field into the cell (?) table with an updated cell_id (in the same way as act_id and series_id are used in the actobj table). This way, the new cell can automatically inherit all locations of the old cell.
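
The second alternative (identical cells, newest result wins) can be sketched as below. This is not an existing Opasnet Base function: `latest_cells` and the tuple layout are hypothetical, and the sketch assumes that a larger act_id always means a newer upload.

```python
def latest_cells(uploads):
    """Keep only the newest result for each set of locations.

    uploads: iterable of (act_id, locations, result) tuples, where
    locations is a dict mapping index name to location.
    """
    newest = {}
    # Process uploads from oldest to newest so later ones overwrite earlier ones.
    for act_id, locations, result in sorted(uploads, key=lambda u: u[0]):
        key = frozenset(locations.items())  # identical locations -> identical cell
        newest[key] = result
    return newest


uploads = [
    (1, {"Year": 2009, "Sex": "Male"}, 178.0),
    (1, {"Year": 2010, "Sex": "Female"}, 168.0),
    (2, {"Year": 2009, "Sex": "Male"}, 179.5),  # later correction of one cell
]
current = latest_cells(uploads)
for key, result in current.items():
    print(dict(key), result)
```

Only the corrected cell has to be uploaded again. The cost is that every read must resolve which result is the newest.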

Formula structure

Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the Opasnet Base. There are different approaches:

Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.

This approach requires new tables, namely Formula and Language.

Comment 11: Do we need tables DIF and DIP like Uninet? --Jouni 21:50, 30 December 2009 (UTC)

DIP
- DIP_node_id
- DIP_parent_node_id
- DIP_corr_coeff
- DIP_parent_index
DIF
- DIF_node_id
- DIF_formula
- DIF_varnames_in_formula
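
The deterministic approach, in which a result is recalculated from a formula that refers to parent variables, can be sketched as follows. The dictionary layout loosely mirrors the proposed DIF fields (a formula plus the names of the variables it uses), but the function `evaluate`, the variable names, and the numbers are invented for the example. The probabilistic approach with rank correlations is not covered.

```python
def evaluate(name, formulas, cache=None):
    """Recalculate variable `name` from its formula and its parents.

    formulas maps a variable name to (parent_names, function); the
    function receives the parents' results in the listed order.
    """
    cache = {} if cache is None else cache
    if name not in cache:
        parents, func = formulas[name]
        parent_results = [evaluate(parent, formulas, cache) for parent in parents]
        cache[name] = func(*parent_results)
    return cache[name]


# Hypothetical variables: two without parents and one that depends on both.
formulas = {
    "concentration": ((), lambda: 12.0),
    "intake_rate": ((), lambda: 0.5),
    "exposure": (("concentration", "intake_rate"), lambda c, r: c * r),
}
exposure = evaluate("exposure", formulas)
print(exposure)
```

The cache ensures that a parent shared by several children is calculated only once per run.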

Universal Opasnet Base

The idea of universal Opasnet Base says that it should be possible to store results in such a way that the results themselves are public but their interpretation is limited. For example, patient symptoms and clinical test results should be openly available for research, but information about whose results they are should be private. This can be achieved with the following database structure.

Universal Opasnet Base has some parts that exist in different versions depending on the privacy level. The yellow areas are e.g. a public area and a private area. The parts that are white are public.

Let's say that it is enough to have two security levels, public and private. A person wants to record personal health information into the database. She logs in with her personal user name. The private profile gives the name (say, Liisa) and social security number of the person, while the public profile says only "30-40-year-old woman in Finland". Liisa writes down her symptoms or medical information and saves them. This is what is stored in the databases:

**Information stored in the public and private databases. The private database can read tables from the public one but not vice versa.**
Table, field	Private database	Public database
act.who	Liisa, 010175-1024	Woman, 30-40 a
act.when	2011-03-09 22:09:10	2011-03
obj.name	N/A. Data is taken from public side.	Pregnancy test
loccell.loc_id (locations and indices explained)	Person = 010175-1024; Time = 2011-03-09; Test = Clearblue digital test	Age = 30-40; Sex = Female; Country = Finland; Time = 2011-03; Test = Clearblue digital test
res.restext	N/A. Data is taken from public side.	Pregnant 1-2 weeks.

Based on this information, anyone can see that there is a woman in Finland who has used a Clearblue pregnancy test and that the result was positive. But there is no way an outsider could connect this information to any particular person, because all information that could be used for linking is located in the private database. However, an authorised person from health care could see the data in the private database and connect Liisa and the test result.
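
The split between the private and public records in the example can be sketched as a function that coarsens the identifying fields. Everything in the sketch is an assumption made for illustration: the function names, the dictionary layout, the ten-year age bands, and the birth date 1 January 1975, which is inferred from the example identity number.

```python
from datetime import date, datetime


def age_band(birth_date, on_date, width=10):
    """Return an age band such as '30-40' for a person on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    years = on_date.year - birth_date.year - (0 if had_birthday else 1)
    low = years // width * width
    return f"{low}-{low + width}"


def to_public(private):
    """Derive the public record: drop the identity, coarsen age and time."""
    when = private["when"]
    return {
        "who": f"{private['sex']}, {age_band(private['birth_date'], when.date())} a",
        "when": when.strftime("%Y-%m"),
        "test": private["test"],
        "restext": private["restext"],
    }


private = {
    "name": "Liisa",
    "birth_date": date(1975, 1, 1),  # assumed from the example identity number
    "sex": "Woman",
    "when": datetime(2011, 3, 9, 22, 9, 10),
    "test": "Clearblue digital test",
    "restext": "Pregnant 1-2 weeks.",
}
public = to_public(private)
print(public)
```

The name and identity number never enter the public record, so linking back to the person requires access to the private side.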

@@ Line 2: / Line 2: @@
 [[Category:Open assessment]]
 [[Category:Tool]]
-{{variable|moderator = Jouni}}
+{{variable|moderator = Jouni
+| reference = {{publication
+| authors        = Juha Villman, Einari Happonen, Jouni T. Tuomisto
+| page           = Opasnet Base structure
+| explanation    =
+| publishingyear = 2010
+| urn            =
+| elsewhere      =
+}}
+}}
-This page is about the '''structure of Opasnet Base'''. For a general description, see [[Opasnet base]].
+:''This page is about the '''old structure of Opasnet Base''' used in ca. 2008-2011. For a description about the current database, see [[Opasnet base 2]].
-==Scope==
+==Question==
-'''Opasnet base''' is a storage and retrieval system for [[result]]s of [[variable]] and [[data]] from [[study|studies]]. What is the structure of [[Opasnet base]] such that it enables the following functionalities?
+[[image:Opasnet Base structure.png|thumb|400px|Structure and connections (lines) of the tables (boxes) in the [[Opasnet Base]]. All table identifiers are called id (so they can be called by like obj.id). When obj.id is referred to in another table such as actobj, it is called actobj.obj_id. The latter end of a one-to-many relationship is marked with a ring. Important substantive fields are listed inside table boxes.]]
+'''Opasnet Base''' is a storage and retrieval system for [[result]]s of [[variable]] and [[data]] from [[study|studies]]. What is the structure of [[Opasnet Base]] such that it enables the following functionalities?
 # Storage of results of variables with uncertainties when necessary, and as multidimensional arrays when necessary.{{reslink|Should all variables go to result distribution database?}}
 # Automatic retrieval of results when called from [[Opasnet wiki]] or other platforms or modelling systems.
-# Description and handling of the [[dimension]]s that a [[variable]] may take.
+# Description and handling of the [[index|indices]]s that a [[variable]] may take.
 # It is possible to protect some results and data from reading by unauthorised persons.
 # If is possible to build user interfaces for easily entering observations into the Base.
+==Answer==
-==Definition==
-===Data===
-====Software====
-Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a [[:en:SQL|SQL]] database. Because of its flexibility, ease of use, and cost, [[:en:MySQL|MySQL]] is an optimal choice among SQL software. In addition to the database software, a [[variable transfer protocol]] is needed on top of that so that the results of variables can be retrieved and  new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.
-====Storage and retrieval of results of variables====
-The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the [[variable]] page, and it is only briefly summarised here. The result is described as
-   P(R|x<sub>1</sub>,x<sub>2</sub>,...)
-where P(R) is the probability distribution of the result and x<sub>1</sub> and x<sub>2</sub> are defining [[location]]s of a [[dimension]] where a particular P(R) applies. Typically locations are operationalised as discrete [[Index|indices]]. A variable must have at least one dimension. [[Uncertainty]] about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index ''Sample'', which is a list of integers 1,2,3...n, where n=number of samples.
-* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7856 Old description of the structure]
-===Dependencies===
-* [[Opasnet structure]]
-* [[Open assessment]]
-==Result==
 Opasnet base is a [[:en:MySQL|MySQL]] database located at http://base.opasnet.org.
-===Table structure===
+===Data structure===
-====Formula structure====
+:''Main article: '''[[Data structures in Opasnet]]'''''
-Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the [[Opasnet Base]]. There are different approaches:
+All data should be convertible into the following format:
-* Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
-* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.
+{| {{prettytable}}
+! colspan="3"| || colspan="3"  style="background-color: #FFD8F0;"|Observation
+|-----
+!  style="background-color: #CCDFC8;"|Year ||  style="background-color: #CCDFC8;"|Sex ||  style="background-color: #CCDFC8;"|Age ||  style="background-color: #DFB8D0;"|Height ||  style="background-color: #DFB8D0;"|Weight ||  style="background-color: #DFB8D0;"|Description
+|-----
+|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|20 ||  style="background-color: #ECDFE8;"|178 ||  style="background-color: #ECDFE8;"|70 ||  style="background-color: #ECDFE8;"|An optional column for descriptive text about each row.
+|-----
+|  style="background-color: #ECFFE8;"|2009 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|30 ||  style="background-color: #ECDFE8;"|174 ||  style="background-color: #ECDFE8;"|79 ||  style="background-color: #ECDFE8;"|
+|-----
+|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Male ||  style="background-color: #ECFFE8;"|25 ||  style="background-color: #ECDFE8;"|183 ||  style="background-color: #ECDFE8;"|84 ||  style="background-color: #ECDFE8;"|
+|-----
+|  style="background-color: #ECFFE8;"|2010 ||  style="background-color: #ECFFE8;"|Female ||  style="background-color: #ECFFE8;"|22 ||  style="background-color: #ECDFE8;"|168 ||  style="background-color: #ECDFE8;"|65 ||  style="background-color: #ECDFE8;"|
+|}
-This approach requires new tables, namely:
+where
-* Formula
-** id (automatic incremental integer)
-** Obj_id_v
-** Obj_id_r
-** When (what is the relationship between upload and formula/When? Is there always a new formula for a new upload? No, because the upload may change even if formula doesn't, if the parent change. Is there always a new upload for a new formula? Yes, because it is necessary to make a new upload.
-** Language (of the formula code)
-** Code (a large text or memo field for the formula)
-: {{comment|# |Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
-* DIP
-** DIP_node_id
-** DIP_parent_node_id
-** DIP_corr_coeff
-** DIP_parent_index
-* DIF
-** DIF_node_id
-** DIF_formula
-** DIF_varnames_in_formula
+{| {{prettytable}}
+|  style="background-color: #CCDFC8;"|Names of explanation columns, also known as indices.
+|-----
+|  style="background-color: #ECFFE8;"|Explanation data, also known as locations. You can use these columns as search criteria.
+|-----
+|  style="background-color: #FFD8F0;"|Observation index, typically called "Observation". Common name for all observation columns
+|-----
+|  style="background-color: #DFB8D0;"|Names of observation columns. These are the parameters of interest.
+|-----
+|  style="background-color: #ECDFE8;"|Observation data. These are the actual measurements.
+|-----
+|}
-====Objinfo: new structure====
+===Table structure in the database===
-The structure of Objinfo should be changed. The original plan was that there is at most one row of Objinfo per Object. Now it is clear that this does not have all functionalities we need. Instead, there should be a possibility to add any number of actions per object. Therefore, even the name of the table should be changed to Act. The structure should be changed accordingly:
+==== All tables ====
-* The field id is the primary field for the table. It is NOT the Obj.id any longer.
-* A new field Obj_id should be added. This is the old field id.
-* End field should be removed, it is not used.
-* Url should be changed to Comment, as it may contain also other info.
-* The length of Comment should be 250 characters (at least).
-* Begin should be replaced by When, which is the current timestamp of the row addition.
-* A new field Act_id should be added.
-* A new table Acttype for actions should be added. It would contain only fields id and Act, and the following rows:
-*# Start the object
-*# Finish the assessment
-*# Add a reference
-*# Add an URL
-*# Peer review the object definition: accept based on the discussion
-*# Peer review the object definition: reject based on the discussion
-*# Peer review the object definition: accept (personal opinion)
-*# Peer review the object definition: reject (personal opinion)
-*# Clairvoyant test for the scope: pass
-*# Clairvoyant test for the scope: fail
-*# Save a run of the object.
-====Merging Res and Resinfo -tables====
-These tables should be merged. Discussion is here {{disclink|Res and Resinfo -tables should be merged}}.
-====All tables: Overview====
-* We need '''Ressec''' (Result secure) and '''Resinfosec''' (Result info secure) tables for secure information. All other tables are openly readable except these two. They have the same structure as Res and Resinfo tables, respectively.
 {| VALIGN="top" BORDER="0"
 |-
-|
+|VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|----
+|colspan=5|'''act'''
-|COLSPAN="3"|'''Obj'''
+|-
+|colspan=5|'''Uploads, updates, and other actions'''
+|-
+|Field||Type||Null||Extra||Key
+|-
+|id||int(10) unsigned||NO||auto_increment||PRI
+|-
+|acttype_id||tinyint(3) unsigned||NO||||MUL
+|-
+|who||varchar(50)||NO||||
+|-
+|comments||varchar(250)||YES||||
 |-
-|COLSPAN="3"|''Describes all objects''
+|time||timestamp||NO||||
-|----
-| '''FIELD'''
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| int(10)
-| primary
-|----
-| Ident
-| varchar(20)
-| unique
-|----
-| Name
-| varchar(200)
-|
-|----
-| Unit
-| varchar(16)
-|
-|----
-| Objtype_id
-| tinyint(3)
-|
-|----
-| Page
-| int(10)
-|
 |-
-| Wiki_id
+|temp_id||int(10) unsigned||NO||||MUL
-| tinyint(3)
-|
-|----
 |}
-{{attack|1|Unit should have at least 32 characters.|--[[User:Jouni|Jouni]] 19:29, 17 September 2009 (EEST)}}
-{{comment|2|We can increase it to 64 at once.|--[[User:Juha Villman|Juha Villman]] 07:52, 18 September 2009 (EEST)}}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Cell''' (previously Res)
+|colspan=5|'''actloc'''
+|-
+|colspan=5|'''Locations of an act'''
+|-
+|Field||Type||Null||Extra||Key
 |-
-|COLSPAN="3"|''Cells of an object''
+|actobj_id||int(10) unsigned||NO||||PRI
 |-
-| '''FIELD'''
+|loc_id||int(10) unsigned||NO||||PRI
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| int(12)
-| primary
-|----
-| Obj_id_v (variable id)
-| int(10)
-|
-|----
-| Obj_id_r (run id)
-| int(10)
-|
-|----
-| Mean (mean of the cell)
-| float
-|
-|----
-| N (samplesize)
-| int(10)
-|
-|----
 |}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Loc'''
+|colspan=5|'''actobj'''
 |-
-|COLSPAN="3"|''Location information''
+|colspan=5|'''Acts of an object'''
+|-
+|Field||Type||Null||Extra||Key
+|-
+|id||int(10) unsigned||NO||auto_increment||PRI
+|-
+|act_id||int(10) unsigned||NO||||MUL
+|-
+|obj_id||int(10) unsigned||NO||||MUL
+|-
+|series_id||int(10) unsigned||NO||||MUL
+|-
+|unit||varchar(64)||YES||||
+|}
 |-
-| '''FIELD'''
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| int(10)
-| primary
-|----
-| Obj_id_i (index id)
-| int(10)
-|
-|----
-| Location
-| varchar(1000)
-|
-|----
-| Roww (row # of index)
-| Mediumint(8)
 |
-|----
+{| {{prettytable}}
-| Description
+|colspan=5|'''acttype'''
-| varchar(150)
+|-
-|
+|colspan=5|'''List of action types'''
-|----
-|}
 |-
-|VALIGN="top"|
+|Field||Type||Null||Extra||Key
-{| WIDTH="250px" {{prettytable}}
-|COLSPAN="3"|'''Item'''
 |-
-|COLSPAN="3"|''Items of a set''
+|id||int(10) unsigned||NO||auto_increment||PRI
 |-
-| '''FIELD'''
+|acttype||varchar(250)||NO||||UNI
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| int(10)
-| primary
-|----
-| Sett_id (set to which the item belongs)
-| int(10)
-|
-|----
-| Obj_id (item id)
-| int(10)
-|
-|----
-| Fail (membership not valid?)
-| tinyint(1)
-|
-|----
 |}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Loccell''' (previously Locres)
+|colspan=5|'''cell'''
+|-
+|colspan=5|'''Cells of an object'''
+|-
+|Field||Type||Null||Extra||Key
 |-
-|COLSPAN="3"|''Locations of a cell''
+|id||int(12) unsigned||NO||auto_increment||PRI
 |-
-| '''FIELD'''
+|actobj_id||int(10) unsigned||NO||||MUL
-| '''TYPE'''
+|-
-| '''EXTRA'''
+|mean||float||YES||||
-|----
+|-
-| id
+|sd||float||NO||||
-| int(10)
+|-
-| primary
+|n||int(10)||NO||||
-|----
+|-
-| Cell_id
+|sip||varchar(2000)||YES||||
-| int(10)
-|
-|----
-| Loc_id
-| int(10)
-|
-|----
 |}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Res''' (previously Sam)
+|colspan=5|'''loc'''
+|-
+|colspan=5|'''Location information'''
+|-
+|Field||Type||Null||Extra||Key
+|-
+|id||int(10) unsigned||NO||auto_increment||PRI
+|-
+|std_id||int(10) unsigned||NO||||MUL
+|-
+|obj_id_i||int(10) unsigned||NO||||MUL
+|-
+|location||varchar(100)||NO||||
 |-
-|COLSPAN="3"|''Result distribution (actual values)''
+|roww||mediumint(8) unsigned||NO||||
 |-
-| '''FIELD'''
+|description||varchar(150)||NO||||
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| bigint(20)
-| primary
-|----
-| Cell_id
-| int(12)
-|
-|----
-| Obs (previously Sample)
-| int(10)
-|
-|----
-| Result
-| float
-|
-|----
 |}
 |-
-|VALIGN="top"|
+|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Sett'''
+|colspan=5|'''loccell'''
+|-
+|colspan=5|'''Locations of a cell'''
+|-
+|Field||Type||Null||Extra||Key
 |-
-|COLSPAN="3"|''List of sets''
+|cell_id||int(10) unsigned||NO||||PRI
 |-
-| '''FIELD'''
+|loc_id||int(10) unsigned||NO||||PRI
-| '''TYPE'''
-| '''EXTRA'''
-|----
-| id
-| int(10)
-| primary
-|----
-| Obj_id
-| int(10)
-|
-|----
-| Settype_id
-| tinyint(3)
-|
-|----
 |}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Settype''' (previously Sty)
+|colspan=5|'''obj'''
 |-
-|COLSPAN="3"|''Types of set-item memberships''
+|colspan=5|'''Object information (all objects)'''
 |-
-| '''FIELD'''
+|Field||Type||Null||Extra||Key
-| '''TYPE'''
+|-
-| '''EXTRA'''
+|id||int(10) unsigned||NO||auto_increment||PRI
-|----
+|-
-| id
+|ident||varchar(20)||NO||||UNI
-| tinyint(3)
+|-
-|
+|name||varchar(200)||NO||||
-|----
+|-
-| Settype (previously Stype)
+|objtype_id||tinyint(3) unsigned||NO||||MUL
-| varchar(30)
+|-
-|
+|page||int(10) unsigned||NO||||
-|----
+|-
+|wiki_id||tinyint(3) unsigned||NO||||
 |}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Objtype''' (previously Typ)
+|colspan=5|'''objtype'''
 |-
-|COLSPAN="3"|''Types of objects''
+|colspan=5|'''Types of objects'''
 |-
-| '''FIELD'''
+|Field||Type||Null||Extra||Key
-| '''TYPE'''
+|-
-| '''EXTRA'''
+|id||tinyint(3)||NO||||PRI
-|----
+|-
-| id
+|objtype||varchar(30)||NO||||
-| tinyint(3)
-| primary
-|----
-| Objtype (previously Type)
-| varchar(30)
-|
-|----
 |}
+|-
+|
+{| {{prettytable}}
+|colspan=5|'''res'''
+|-
+|colspan=5|'''Result distribution (actual values)'''
+|-
+|Field||Type||Null||Extra||Key
+|-
+|id||bigint(20) unsigned||NO||auto_increment||PRI
+|-
+|cell_id||int(12) unsigned||NO||||MUL
+|-
+|obs||int(10) unsigned||NO||||
+|-
+|result||float||NO||||
+|-
+|restext||varchar(250)||YES||||
 |-
+|implausible||binary(1)||YES||||
+|}
 |VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
+{| {{prettytable}}
-|COLSPAN="3"|'''Wiki''' (previously Wik)
+|colspan=5|'''wiki'''
+|-
+|colspan=5|'''Wiki information'''
 |-
-|COLSPAN="3"|''Wiki information''
+|Field||Type||Null||Extra||Key
 |-
-| '''FIELD'''
+|id||tinyint(3)||NO||||PRI
-| '''TYPE'''
+|-
-| '''EXTRA'''
+|url||varchar(255)||NO||||
+|-
+|wname||varchar(20)||NO||||
+|}
+|}
+====Contents of selected tables====
+{|
+|
+{| {{prettytable}}
+|+ Table objtype
+! id!! objtype
 |----
-| id
+|| 1|| Variable
-| tinyint(3)
-| primary
 |----
-| Url
+|| 2|| Study
-| varchar(255)
-|
 |----
-| Wname
+|| 3|| Method
-| varchar(20)
+|----
-|
+|| 4|| Assessment
-|}
+|----
-|VALIGN="top"|
+|| 5|| Class
-{| WIDTH="250px" {{prettytable}}
+|----
-|COLSPAN="3"|'''Resinfo''' (previously Descr)
+|| 6|| Index
-|-
+|----
-|COLSPAN="3"|''Additional description of the result''
+|| 7|| Nugget
-|-
-| '''FIELD'''
-| '''TYPE'''
-| '''EXTRA'''
 |----
-| id
+|| 8|| Encyclopedia article
-| bigint(20)
-| primary
 |----
-| Restext (previously Description)
+|| 9|| Run
-| varchar(250)
-|
 |----
-| Who
+|}
-| varchar(50)
 |
+{| {{prettytable}}
+|+ Table acttype
+! id|| acttype
 |----
-| When
+|| 1|| Start object
-| timestamp
+|----
-|
+|| 2|| Finish assessment
-|}
+|----
-:{{defend|# |We should add Formula_id to Res.|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
+|| 3|| Update formula
-|VALIGN="top"|
-{| WIDTH="250px" {{prettytable}}
-|COLSPAN="3"|'''Objinfo''' (previously Inf)
-|-
-|COLSPAN="3"|''Additional information about the object''
-|-
-| '''FIELD'''
-| '''TYPE'''
-| '''EXTRA'''
 |----
-| id
+|| 4|| Upload data (replace)
-| int(10)
-| primary
 |----
-| Begin
+|| 5|| Upload data (append)
-| date
-|
 |----
-| End
+|| 6|| Review scope
-| date
-|
 |----
-| Who
+|| 7|| Review definition
-| varchar(50)
-|
 |----
-| Url
+|| 8|| Add object info
-| varchar(250)
-|
 |----
 |}
 |}
-==See also==
+==Rationale==
+===Data===
+====Software====
+Because Opasnet base will contain very large amounts of mostly numerical information, the state-of-the-art structure is a [[:en:SQL|SQL]] database. Because of its flexibility, ease of use, and cost, [[:en:MySQL|MySQL]] is an optimal choice among SQL software. In addition to the database software, a [[variable transfer protocol]] is needed on top of that so that the results of variables can be retrieved and  new results stored either automatically by a calculating software, or manually by the user. Fancy presenting software can be built on top of the database, but that is not the topic of this page.
+====Storage and retrieval of results of variables====
+The most important functionality is to store and retrieve the results of variables. Because variables may take very different forms (from a single value such as natural constant to an uncertain spatio-temporal concentration field over the whole Europe), the database must be very flexible. The basic solution is described in the [[variable]] page, and it is only briefly summarised here. The result is described as
+   P(R|x<sub>1</sub>,x<sub>2</sub>,...)
+where P(R) is the probability distribution of the result and x<sub>1</sub> and x<sub>2</sub> are defining [[location]]s of an [[index]] where a particular P(R) applies. Typically locations are operationalised as discrete [[Index|indices]]. A variable must have at least one [[index]]. [[Uncertainty]] about the true value of the variable is operationalised as a random sample from the probability distribution, in such a way that the samples are located along an index ''Sample'', which is a list of integers 1,2,3...n, where n=number of samples.
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7856 Old description of the structure]
+===Dependencies===
+* [[Opasnet structure]]
+* [[Open assessment]]
+====Replacing some cells====
+It is possible that there is a large data, where there is a need to update only a few cells while all others remain the same. How should this be done? There are a few potential alternatives.
+# Use the current replace functionality. Replace all cells but most of them with the original value.
+# Use a new act_type that is similar to the current append functionality. This should be understood in a way that if there are two (or more) identical cells (based on cell indices and locations), then the newest result is used and all older ones are discarded. (If the old ''append'' is used, then new info is just seen as a new row in the data table, not a replacement of an existing row.
+# Add a new field into the cell (?) table with an updated cell_id (in a similar way than act_id and series_id are used in the actobj table). This way, the new cell can automatically inherit all locations of the old cell.
-===Some useful syntax===
+===Formula structure===
-* http://www.baycongroup.com/sql_join.htm
+Now it has become clear that it is not enough to have samples of the result distributions. It must be possible to completely recalculate the result based on the information in the [[Opasnet Base]]. There are different approaches:
-* [[:image:Opasnet base connection.ANA|Opasnet base connection.ANA]] for Analytica: for writing and reading variable results into and from the database. Writing requires a password. For SQL used in the model, see the model page.
+* Calculate the result based on a formula that may refer to other variables called parents. This is a deterministic approach.
-* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7181#Other_queries Some historical queries]
+* Calculate the result based on the marginal distribution and (conditional) rank correlations with parent variables. This is a probabilistic approach.
-<sql-query display=1>
+This approach requires new tables, namely Formula and Language.
-SELECT Obj.id, Obj.Ident, Obj.Name, Obj.Typ_id, Sty_id, Itemm.Ident as Iident, Itemm.Name as Iname
-FROM Obj
-LEFT JOIN Sett ON Obj.id = Sett.Obj_id
-LEFT JOIN Item ON Sett.id = Item.Sett_id
-LEFT JOIN Obj AS Itemm ON Item.Obj_id = Itemm.id
-</sql-query>
-'''NOTE! The queries below work in the new database "opasnet_base", not "resultdb" as the old versions.
+: {{comment|11|Do we need tables DIF and DIP like Uninet?|--[[User:Jouni|Jouni]] 21:50, 30 December 2009 (UTC)}}
+* DIP
+** DIP_node_id
+** DIP_parent_node_id
+** DIP_corr_coeff
+** DIP_parent_index
+* DIF
+** DIF_node_id
+** DIF_formula
+** DIF_varnames_in_formula
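The deterministic approach can be sketched as follows: each variable has a formula text and a list of parent variables, and a result is recalculated by evaluating the parents first. The dictionary below merely stands in for a possible Formula table; its layout, the variable names and the use of Python expressions as the formula language are all assumptions.

```python
# Hypothetical contents of a Formula table: variable -> (formula text, parents).
FORMULAS = {
    "bmi": ("weight / (height / 100) ** 2", ["weight", "height"]),
    "weight": ("70", []),
    "height": ("178", []),
}

def evaluate(name, formulas, cache=None):
    """Recalculate a variable by recursively evaluating its parents."""
    cache = {} if cache is None else cache
    if name not in cache:
        formula, parents = formulas[name]
        env = {p: evaluate(p, formulas, cache) for p in parents}
        # eval is used only for brevity; formulas from untrusted users
        # would need a proper, safe expression parser instead.
        cache[name] = eval(formula, {"__builtins__": {}}, env)
    return cache[name]

bmi = evaluate("bmi", FORMULAS)
```

The probabilistic approach (marginal distributions with rank correlations) would need additional information of the kind listed under DIP and is not covered by this sketch.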
-{{#sql-query:
+===Universal Opasnet Base===
-SELECT Var.Ident, Var.Name, Var.Unit, Run.Ident, Begin, Who, Run.Name as Method
-FROM Obj as Var, Obj as Run, Cell, Objinfo
-WHERE Var.Ident = "Op_en{{PAGEID}}"
-AND Var.id = Cell.Obj_id_v
-AND Run.id = Cell.Obj_id_r
-AND Run.id = Objinfo.id
-GROUP BY Var.id, Run.id
-|Runs}}
-{{#sql-query:
+The idea of the universal Opasnet Base is that it should be possible to store results in such a way that the results themselves are public but their interpretation is limited. For example, patient symptoms and clinical test results should be openly available for research, but information about whose results they are should be private. This can be achieved with the following database structure.
-SELECT Var.Ident, Var.Name, Cell.id, N, Begin, Mean, Var.Unit
-FROM Obj as Var, Obj as Run, Cell, Objinfo
-WHERE Var.Ident = "Op_en{{PAGEID}}"
-AND Var.id = Cell.Obj_id_v
-AND Run.id = Cell.Obj_id_r
-AND Run.id = Objinfo.id
-GROUP BY Cell.id
-ORDER BY Run.id DESC, Var.Ident
-|Means and samplesizes (N)}}
-{{#sql-query:
-SELECT Var.Ident, Cell.id, Cell.Obj_id_r as Run, Obs, Result, Var.Unit
-FROM Obj as Var, Cell, Res
-WHERE Var.Ident = "Op_en{{PAGEID}}"
-AND Var.id = Cell.Obj_id_v
-AND Cell.id = Res.Cell_id
-ORDER BY Cell.Obj_id_r, Var.Ident, Cell.id
-|Full sample}}
+[[File:Universal Opasnet Base structure.png|thumb|400px|Universal Opasnet Base has some parts that exist in different versions depending on the privacy level. The yellow areas are e.g. a public area and a private area. The parts that are white are public.]]
-'''List all dimensions that have indices, and the indices concatenated:
+Let's say that it is enough to have two security levels, public and private. A person wants to record personal health information into the database. She logs in with her personal user name. The private profile gives the name (say, Liisa) and social security number of the person, while the public profile says only "30-40-year-old woman in Finland". Liisa writes down her symptoms or medical information and saves them. This is what is stored in the databases:
-<sql-query display="1">
+{| {{prettytable}}
-SELECT Dim.Ident, Dim.Name, Dim.Unit, Group_concat(Ind.Ident
+|+ '''Information stored in the public and private databases. The private database can read tables from the public one but not vice versa.'''
-ORDER BY Ind.Name SEPARATOR ', ') as Indices
+! Table, field
-FROM Obj AS Dim, Obj as Ind, Sett, Item
+! Private database
-WHERE Dim.id = Sett.Obj_id
+! Public database
-AND Sett.Settype_id=1
+|----
-AND Sett.id = Item.Sett_id
+| act.who
-AND Item.Obj_id = Ind.id
+| Liisa, 010175-1024
-GROUP BY Dim.Name
+| Woman, 30-40 years
-ORDER BY Dim.id
+|----
-</sql-query>
+| act.when
+| 2011-03-09 22:09:10
+| 2011-03
+|----
+| obj.name
+| N/A. Data is taken from public side.
+| Pregnancy test
+|----
+| loccell.loc_id (locations and indices explained)
+| Person = 010175-1024 <br>Time = 2011-03-09 <br>Test = Clearblue digital test
+| Age = 30-40 <br>Sex = Female <br>Country = Finland <br>Time = 2011-03 <br>Test = Clearblue digital test
+|----
+| res.restext
+| N/A. Data is taken from public side.
+| Pregnant 1-2 weeks.
+|----
+|}
+Based on this information, anyone can see that there is a woman in Finland who has used a Clearblue pregnancy test and that the result was positive. But an outsider cannot connect this information to any particular person, because all information that could be used for linking is located in the private database. However, an authorised person from health care could see the data in the private database and connect Liisa with the test result.
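How a private record might be coarsened into its public counterpart is sketched below. The field names, the ten-year age bands and the function names are assumptions for illustration; the page does not specify how the generalisation is actually done. Age is approximated from the birth year alone.

```python
from datetime import datetime

def age_band(birth_year, year, width=10):
    """Return a coarse age band such as '30-40' (approximate, year-based)."""
    low = (year - birth_year) // width * width
    return f"{low}-{low + width}"

def public_view(private):
    """Derive the public profile; identifying fields are simply left out."""
    when = private["when"]
    return {
        "who": f"{private['sex']}, {age_band(private['birth_year'], when.year)}",
        "when": when.strftime("%Y-%m"),  # month precision only
        "country": private["country"],
    }

# Hypothetical private record corresponding to the example in the table.
private = {
    "name": "Liisa",
    "birth_year": 1975,
    "sex": "Woman",
    "country": "Finland",
    "when": datetime(2011, 3, 9, 22, 9, 10),
}
public = public_view(private)
```

The public record contains neither the name nor the exact time, so it cannot be used on its own to link a result back to a person.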
-'''List all indices, and their locations concatenated:
+==See also==
-<sql-query display="1">
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#All_tables:_Overview A previous discussion about the structure]
-SELECT Ident, Name, Unit, GROUP_CONCAT(Location ORDER BY Roww SEPARATOR ', ') AS Locations
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#Main_tables A previous structure and related discussions]
-FROM Obj AS Ind, Loc
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=18900#Tasks_performed Previous tasks performed]
-WHERE Ind.id = Loc.Obj_id_i
-GROUP BY Name
-ORDER BY Name
-</sql-query>
+; A basic query for retrieving the full result of a variable upload (an example):
-'''List all variables and their runs, and also list all indices (concatenated) used for each variable for each run.
+{{#sql-query:
+SELECT obj.ident, obj.name, obj.unit, obj.page, obj.wiki_id, comments, mean, sd, n, location, ind.ident, obs, result, restext
+FROM obj
+LEFT JOIN actobj ON actobj.obj_id = obj.id
+LEFT JOIN act ON actobj.act_id = act.id
+LEFT JOIN cell ON cell.actobj_id = actobj.id
+LEFT JOIN loccell ON loccell.cell_id = cell.id
+LEFT JOIN loc ON loccell.loc_id = loc.id
+LEFT JOIN obj AS ind ON loc.obj_id_i = ind.id
+LEFT JOIN res ON res.cell_id = cell.id
+WHERE obj.ident = "Op_en1912"
+AND actobj.series_id = 190
+LIMIT 0,100
+}}
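The join chain of the query above can be tried out on a toy copy of the schema. The sketch uses an in-memory SQLite database instead of MySQL, includes only the columns needed for the joins, leaves out the act table, and fills the tables with invented rows (including the identifier Op_en0000); none of this is real Opasnet Base data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE obj (id INTEGER PRIMARY KEY, ident TEXT, name TEXT);
CREATE TABLE actobj (id INTEGER PRIMARY KEY, obj_id INTEGER, series_id INTEGER);
CREATE TABLE cell (id INTEGER PRIMARY KEY, actobj_id INTEGER, mean REAL, n INTEGER);
CREATE TABLE loc (id INTEGER PRIMARY KEY, obj_id_i INTEGER, location TEXT);
CREATE TABLE loccell (cell_id INTEGER, loc_id INTEGER);
CREATE TABLE res (cell_id INTEGER, obs INTEGER, result REAL);
-- Invented example rows: one variable (obj 1) indexed by Sex (obj 2).
INSERT INTO obj VALUES (1, 'Op_en0000', 'Height'), (2, 'Sex', 'Sex');
INSERT INTO actobj VALUES (10, 1, 190);
INSERT INTO cell VALUES (100, 10, 178.0, 1), (101, 10, 165.0, 1);
INSERT INTO loc VALUES (1000, 2, 'Male'), (1001, 2, 'Female');
INSERT INTO loccell VALUES (100, 1000), (101, 1001);
INSERT INTO res VALUES (100, 1, 178.0), (101, 1, 165.0);
""")

# Same join chain as in the wiki query, with parameters instead of literals.
rows = con.execute("""
SELECT obj.ident, ind.ident, loc.location, res.obs, res.result
FROM obj
LEFT JOIN actobj ON actobj.obj_id = obj.id
LEFT JOIN cell ON cell.actobj_id = actobj.id
LEFT JOIN loccell ON loccell.cell_id = cell.id
LEFT JOIN loc ON loccell.loc_id = loc.id
LEFT JOIN obj AS ind ON loc.obj_id_i = ind.id
LEFT JOIN res ON res.cell_id = cell.id
WHERE obj.ident = ? AND actobj.series_id = ?
ORDER BY cell.id
""", ("Op_en0000", 190)).fetchall()

for row in rows:
    print(row)
```

Each output row is one observation of one cell, together with the index and location that place the cell in the result array. With several indices per cell, the join returns one row per cell, location and observation.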
-<sql-query display="1">
+;Some useful syntax
-SELECT Var_id, Run_id, Ident, Name, GROUP_CONCAT(Indic SEPARATOR ', ') AS Indices, N, Method
+* http://www.baycongroup.com/sql_join.htm
-FROM
+* [[:image:Opasnet base connection.ANA|Opasnet base connection.ANA]] for Analytica: for writing and reading variable results into and from the database. Writing requires a password. For SQL used in the model, see the model page.
-   (SELECT Var.id as Var_id, Run.id as Run_id, Var.Ident AS Ident, Var.Name as Name, Ind.Ident AS Indic, N, Run.Name AS Method
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base&oldid=7181#Other_queries Some historical queries]
-   FROM Obj AS Var, Obj AS Run, Obj AS Ind, Loccell, Loc, Cell
+* [http://en.opasnet.org/en-opwiki/index.php?title=Opasnet_base_structure&oldid=14214#Some_useful_syntax Some historical queries 2]
-   WHERE Var.id = Cell.Obj_id_v
+[[Category:Opasnet Base]]
-   AND Run.id = Cell.Obj_id_r
+[[Category:Data]]
-   AND Cell.id = Loccell.Cell_id
+[[Category:Opasnet]]
-   AND Loc.id = Loccell.Loc_id
-   AND Ind.id = Loc.Obj_id_i
-   GROUP BY Var_id, Run_id, Ind.Ident ) AS Temp1
-GROUP BY Var_id, Run_id
-</sql-query>