Difference between revisions of "Object-oriented programming in Opasnet"

Revision as of 16:32, 3 April 2012

This page is a method. The page identifier is Op_en5529
Moderator: Jouni
This page is a stub. You may improve it into a full page, and then a rating bar will appear here.

Object-oriented programming is an approach in which programs (in Opasnet, typically assessment models) have a modular structure. Each part is treated as a separate object that has specific properties and interacts with other objects in standard ways.

Question

How should object-oriented programming be used in Opasnet so that

it has seamless connections to R-tools,
it is easy to understand by non-expert users and contributors,
it uses the variable structure and other information structures (e.g. universal object) used in open assessment, and
it enables standards for typical processes in environmental health assessments (such as distribution modelling, life tables, decision optimising, etc.).

Answer

Structure of objects

Objects have two implementations: a wiki page in Opasnet, and an S4 class object called oavariable (open assessment variable) in R-tools. The wiki page is the user-friendly interface, and oavariable is the versatile format for efficient, standardised modelling. The default direction for data is long (using the terminology in the merge function).
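
To illustrate what the long direction means for a data table, here is a minimal sketch in base R. The table contents and the column names (Year, Male, Female, Sex, Result) are invented for this example, and the use of reshape() is an assumption; the R-tools code may do the conversion differently.

```r
# Hypothetical wide table: one result column per location of the index "Sex".
wide <- data.frame(
  Year   = c(2010, 2011),
  Male   = c(1.2, 1.4),
  Female = c(1.1, 1.3)
)

# Long direction: one row per combination of index locations,
# and a single result column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Male", "Female"),  # wide columns to stack
  v.names   = "Result",             # name of the stacked result column
  timevar   = "Sex",                # name of the new index column
  times     = c("Male", "Female"),  # locations of the new index
  idvar     = "Year"
)
rownames(long) <- NULL
long
```

In the long table every index is an ordinary column, so variables with different indices can later be combined by matching on the columns they share.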

Attribute	What it contains	How implemented in the wiki	How implemented in R-tools as an S4 class object (oavariable)
question	A research question that defines the topic of the object	First main heading	Slot question = "character". Contains the question as text.
answer	The current best answer to the question, shown as text, data table, or distribution.	Second main heading; contains a single data table. NOTE! The data table is actually under rationale/data, but it is often the same as the answer. The actual answer is precisely described by distribution and sample (see below).	Only sub-attributes are implemented.
index	List of indices that are used to specify the answer.	Index columns in the data table	Slot index = "vector" (or factor?). A character vector with all indices used. The content is the same as index parameter in t2b tag.
marginal	A Boolean vector with the same size as index. TRUE if an index is indexing a marginal distribution in sample, FALSE if joint distribution. The difference is that in a marginal distribution there are n iterations for each location of the index, while in joint distribution, there are altogether n iterations in such a way that the frequencies of locations match their probabilities.	Not implemented in wiki.	Slot marginal = "vector". Especially with indices with lots of locations, joint distribution needs much less memory.
observation	An identifier of an individual when the answer consists of a group of individuals.	Obs column (usually implicit because the default is that each row is an observation) in the data table.	Obs column in data.frames data and sample. Not explicitly needed as a slot in S4 object.
iteration	An identifier of a probabilistic run or iteration. Sometimes it is also called a possible world or realisation.	Iter column in the data table (data usually not shown probabilistically in wiki).	Iter column in the data.frames sample (and rarely in data). Not explicitly needed as a slot in S4 object.
distribution	A joint probability distribution (with indices as dimensions) describing the answer mathematically.	Not shown	Slot distribution = "distribution?". A distribution created with e.g. dnorm(0,1). We don't know yet how to actually implement this and how the indices are included.
sample	A random sample from the distribution (default is 10000 iterations).	Not shown	Slot sample = "data.frame". The data frame must contain columns Iter, one column for each index, and at least one result column. There may also be a column Obs.
rationale	Any information that is needed to convince a critical reader that the answer is good.	Third main heading	Only sub-attributes are implemented.
data	Observations, expert judgement, discussions, and other pieces of information.	Subheading under Rationale	Slot data = "data.frame". The data frame must contain at least one index column, an Obs column, and at least one result column.
unit	The measurement unit(s) that are used in the answer to measure the topic.	Subheading under Rationale with plain text. Also mentioned in the data table with parameter unit.	Slot unit = "vector". The format used is kg m^2 /s^2 where a space implies a multiplication. Unit is a vector with length > 1, if different rows in data have different units.
dependencies	List of upstream objects that are causally related to this object.	Subheading under Rationale with a list of links to upstream (and sometimes downstream) wiki pages.	Slot dependencies = "vector". A character vector where entries have the format "Op_fi:Vaativuusluokkien keskipalkat". If the wiki identifier is omitted, the default is op_en.
formula	Computer code or an algorithm to derive the answer from rationale and the objects listed in dependencies. The formula may assume a deterministic dependency (e.g. y <- k*x + b), a conditional probability structure (y ~ dnorm(x, sd)), or a rank correlation matrix.	Subheading under Rationale, often using <rcode> tags.	Slot formula = "list". There may be several competing algorithms. Each of them is described (as a function?) as one entry in the list. When the formula is run, the algorithm to use is selected at random for each iteration, with equal probability unless otherwise specified in formula.prob.
formula.prob	A list of probabilities assigned to the competing algorithms in formula. The default is that each has an equal probability.	A detail in <rcode> code.	Slot formula.prob = "vector". Should have the same size as formula.
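
The slot list in the table can be sketched as an S4 class definition. This is only a draft under stated assumptions: the slot types follow the table, the distribution slot is typed "ANY" because the table leaves its implementation open, and the validity checks and all example values (question text, data, unit) are invented for illustration.

```r
library(methods)

setClass(
  "oavariable",
  representation(
    question     = "character",
    index        = "vector",
    marginal     = "vector",
    distribution = "ANY",         # assumption: type not decided in the table
    sample       = "data.frame",
    data         = "data.frame",
    unit         = "vector",
    dependencies = "vector",
    formula      = "list",
    formula.prob = "vector"
  )
)

# Assumed validity rules derived from the size requirements in the table.
setValidity("oavariable", function(object) {
  msgs <- character(0)
  if (length(object@marginal) != length(object@index)) {
    msgs <- c(msgs, "marginal must have the same size as index")
  }
  # An empty formula.prob is taken to mean equal probabilities (assumption).
  if (length(object@formula.prob) > 0 &&
      length(object@formula.prob) != length(object@formula)) {
    msgs <- c(msgs, "formula.prob must have the same size as formula")
  }
  if (length(msgs) == 0) TRUE else msgs
})

# Hypothetical example object.
v <- new(
  "oavariable",
  question     = "What is the mean result by sex?",
  index        = "Sex",
  marginal     = TRUE,
  sample       = data.frame(Iter = c(1, 1, 2, 2),
                            Sex = c("M", "F", "M", "F"),
                            Result = c(1.2, 1.1, 1.3, 1.0)),
  data         = data.frame(Obs = 1:2, Sex = c("M", "F"),
                            Result = c(1.2, 1.1)),
  unit         = "kg",
  dependencies = "Op_fi:Vaativuusluokkien keskipalkat",
  formula      = list(function(x) x),
  formula.prob = 1
)
validObject(v)
```

Keeping the size rules in a validity function means that an inconsistent object is rejected when it is created, not later in the middle of a model run.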

Methods

R code should be developed in such a way that there are object-specific implementations of critical functions. The user should see straightforward content, and all messy indexing etc. should happen behind the scenes.
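
As a minimal sketch of what an object-specific implementation could look like, the code below defines a show method and a GetData accessor for a reduced version of the class. The reduced class with only two slots, the printed layout, and the example data are assumptions made for this illustration.

```r
library(methods)

# Reduced, hypothetical version of the class with only two slots.
setClass("oavariable",
         representation(question = "character", data = "data.frame"))

# show is called automatically when the object is printed at the console,
# so the user sees the data table and not the internal structure.
setMethod("show", "oavariable", function(object) {
  cat("Question:", object@question, "\n")
  print(object@data)
})

# Accessor so that users do not need to know the slot syntax.
setGeneric("GetData", function(object) standardGeneric("GetData"))
setMethod("GetData", "oavariable", function(object) object@data)

v <- new("oavariable",
         question = "What is the mean result by sex?",
         data = data.frame(Obs = 1:2, Sex = c("M", "F"),
                           Result = c(1.2, 1.1)))
v                   # dispatches to the show method
GetData(v)$Result   # plain vector of results
```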

These methods should be implemented for oavariable objects.

show, print: show the data slot.
plot: plot the sample, showing one (the first by default) marginal index with all locations and all other marginal indices with the first location only.
tidy: applies to data: remove id column; add Obs and Iter columns if they do not exist; change the direction from wide to long.
createSample: create sample directly from data using interp.input.
GetSample, GetData: extract sample and data from the object, respectively.
Ops: applies to sample: merge two oavariables based on index columns, then perform the Ops operation to the result columns.
standardUnits: based on unit, transform the result column of data to SI units using the Unit transformations table; then update unit.
demarginalize: turn one specified index from marginal to joint format.
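
The Ops method in the list above can be sketched as an S4 group method. This is a draft under assumptions: the class is reduced to the index and sample slots, each sample is assumed to have exactly one result column named Result, and merging is done on Iter plus the indices that the two variables share.

```r
library(methods)

# Reduced, hypothetical version of the class for this example.
setClass("oavariable",
         representation(index = "character", sample = "data.frame"))

# Group method: covers +, -, *, /, ==, < and the other Ops operators.
setMethod("Ops", signature("oavariable", "oavariable"), function(e1, e2) {
  # Merge on the iteration column and on the indices both variables share.
  by.cols <- c("Iter", intersect(e1@index, e2@index))
  merged <- merge(e1@sample, e2@sample, by = by.cols,
                  suffixes = c(".x", ".y"))
  # Apply the operator that was actually called to the two result columns.
  result <- callGeneric(merged$Result.x, merged$Result.y)
  merged$Result.x <- NULL
  merged$Result.y <- NULL
  merged$Result <- result
  new("oavariable", index = union(e1@index, e2@index), sample = merged)
})

# Two hypothetical variables indexed by Sex, two iterations each.
grid <- expand.grid(Iter = 1:2, Sex = c("M", "F"), stringsAsFactors = FALSE)
a <- new("oavariable", index = "Sex",
         sample = cbind(grid, Result = c(1, 2, 3, 4)))
b <- new("oavariable", index = "Sex",
         sample = cbind(grid, Result = c(10, 20, 30, 40)))

total <- a + b
total@sample
```

Because the rows are matched by merge and not by position, the two samples do not need to be sorted in the same order, which is the kind of indexing the user should not have to handle by hand.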

Rationale

References

Related files

@@ Line 14: / Line 14: @@
 ===Structure of objects===
-Objects have two different implementations: wiki page in Opasnet, and S4 class object called ''oavar'' in R-tools. The wiki page is the user-friendly interface for users, and oavar is the versatile format for efficient, standardised modelling.
+Objects have two different implementations: wiki page in Opasnet, and S4 class object called ''oavariable'' (open assessment variable) in [[R-tools]]. The wiki page is the user-friendly interface for users, and oavariable is the versatile format for efficient, standardised modelling. The default direction for data is long (using the terminology in the merge function).
 {| {{prettytable}}
@@ Line 35: / Line 35: @@
 | List of indices that are used to specify the answer.
 | Index columns in the data table
-| Slot index = "data.frame". A data frame where the first column is a factor with all indices used, and the second column is a factor with the index type (text, real, date) where text is a discrete index type.
+| Slot index = "vector" (or factor?). A character vector with all indices used. The content is the same as index parameter in t2b tag.
+|----
+| '''marginal
+| A Boolean vector with the same size as index. TRUE if an index is indexing a marginal distribution in sample, FALSE if joint distribution. The difference is that in a marginal distribution there are n iterations for each location of the index, while in joint distribution, there are altogether n iterations in such a way that the frequencies of locations match their probabilities.
+| Not implemented in wiki.
+| Slot marginal = "vector". Especially with indices with lots of locations, joint distribution needs much less memory.
 |----
 | '''observation
 | An identifier of an individual when the answer consists of a group of individuals.
-| Obs column (usually implicit because the default is that each row is an observation) in the data table
+| ''Obs'' column (usually implicit because the default is that each row is an observation) in the data table.
-| Slot observation = "vector". A vector from 1..o where o is the number of observations.
+| ''Obs'' column in data.frames data and sample. Not explicitly needed as a slot in S4 object.
 |----
 | '''iteration
 | An identifier of a probabilistic run or iteration. Sometimes it is also called a possible world or realisation.
-| Iter column in the data table (data usually not shown probabilistically)
+| ''Iter'' column in the data table (data usually not shown probabilistically in wiki).
-| Slot iteration = "vector". A vector from 1..n where n is the number of iterations. Default n = 10000.
+| ''Iter'' column in the data.frames sample (and rarely in data). Not explicitly needed as a slot in S4 object.
 |----
 | '''distribution
@@ Line 65: / Line 70: @@
 | Observations, expert judgement, discussions, and other pieces of information.
 | Subheading under Rationale
-| Slot data = "data.frame". The data frame must contain at least one index column, and at least one result column. If column Obs is omitted, it is assumed that each row is a separate observation.
+| Slot data = "data.frame". The data frame must contain at least one index column Obs column, and at least one result column.
 |----
 | '''unit
 | The measurement unit(s) that are used in the answer to measure the topic.
 | Subheading under Rationale with plain text. Also mentioned in the data table with parameter unit.
-| Slot unit = "character". The format used is kg m^2 /s^2 where a space implies a multiplication.
+| Slot unit = "vector". The format used is ''kg m^2 /s^2'' where a space implies a multiplication. Unit is a vector with length > 1, if different rows in data have different units.
 |----
 | '''dependencies
 | List of upstream objects that are causally related to this object.
 | Subheading under Rationale with a list of links to upstream (and sometimes downstream) wiki pages.
-| Slot dependencies = "factor". The format is Op_fi:Vaativuusluokkien keskipalkat. If the wiki identifier is omitted, the default is op_en.
+| Slot dependencies = "vector". A character vector where entries have the format "Op_fi:Vaativuusluokkien keskipalkat". If the wiki identifier is omitted, the default is op_en.
 |----
 | '''formula
-| A computer code or algorithm to derive the answer from rationale and objects listed in dependencies. The formula may assume a deterministic dependency (e.g. y = k*x + b), a conditional probability structure (y ~ dnorm(x, sd), or a rank correlation matrix.
+| A computer code or algorithm to derive the answer from rationale and objects listed in dependencies. The formula may assume a deterministic dependency (e.g. y <- k*x + b), a conditional probability structure (y ~ dnorm(x, sd), or a rank correlation matrix.
 | Subheading under Rationale, often using &lt;rcode&gt; tags.
 | Slot formula = "list". There may be several competing algorithms. Each of them is described (as a function?) as one entry in the list. When implementing the formula, the algorithm that is implemented is randomly selected for each iteration with equal probability unless otherwise specified in formula.prob.
@@ Line 89: / Line 94: @@
 |}
-===Coding===
+===Methods===
 R code should be developed in such a way that there are object-specific implementations of critical functions. The user should see straightforward content, and all messy indexing etc should happen behind scenes.
+These methods should be implemented for oavariable objects.
+* show, print: show the data slot.
+* plot: plot the sample, showing one (the first by default) marginal index with all locations and all other marginal indices with the first location only.
+* tidy: applies to data: remove id column; add Obs and Iter columns if they do not exist; Change the direction from wide to long.
+* createSample: create sample directly from data using interp.input.
+* GetSample, GetData: extract sample and data from the object, respectively.
+* Ops: applies to sample: merge two oavariables based on index columns, then perform the Ops operation to the result columns.
+* standardUnits: Based on units, transform the result column of data to SI units using [[Unit transformations]] table; then update unit.
+* demarginalize: turn one specified index from marginal to joint format.
 ==Rationale==
