Ovariable

From Testiwiki
Jump to: navigation, search
For updates about modelling instructions, see Portal:Modelling with Opasnet and Modelling in Opasnet.

Ovariable is an object in R software. It is the basic building block of an open assessment model.

Question

What is the structure of an ovariable such that

  • it complies with the requirements of variable and
  • it is able to implement probabilistic descriptions of multidimensional variables and
  • it is able to implement different scenarios?

Answer

The ovariable is a class (S4 object) defined by OpasnetUtils in R software system. Its purpose is to contain the current best answer in a machine-readable format (including uncertainties when relevant) to the question asked by the respective variable. In addition, it contains information about how to derive the current best answer. The respective variable may have an own page in Opasnet, or it may be implicit so that it is only represented by the ovariable and descriptive comments within a code.

It is useful to clarify terms here. Answer is the overall answer to the question asked, so it is the reason for producing the Opasnet page in the first place. This is why it is typically located near the top of an Opasnet page. The answer may contain text, tables, or graphs on the web page. It typically also contains an R code with a respective ovariable, and the code produces these representations of the answer when run. (However, the ovariable is typically defined and stored under Rationale/Calculations, and the code under Answer only evaluates and plots the result.) Output is the key part (or slot) of the answer within an ovariable. All other parts of the ovariable are needed to produce the output, and the output contains what the reader wants to know about the answer. Finally, Result is the key column of the Output table (or data.frame) and contains the actual numerical values for the answer.

Slots

The ovariable has seven separate slots that can be accessed using X@slot:

@name
  • Name of <self> (the ovariable object) is a requirement since R doesn't support self reference.
@output
  • The current best answer to the question asked.
  • A single data.frame (a 2D table type in R)
  • Not defined until <self> is evaluated.
  • Possible types of columns:
    • Result is the column that contains the actual values of the answer to the question of the variable. There is always a result column, but its name may vary; it is of type ovariablenameResult.
    • Indices are columns that define or restrict the Result in some way. For example, the Result can be given separately for males and females, and this is expressed by an index column Sex, which contains values Male and Female. So, the Result contains one row for males and one for females.
    • Iter is a specific kind of index. In Monte Carlo simulation, Iter is the number of the iteration.
    • Unit contains the unit of the Result. It may be the same for all rows, but it may also vary from one row to another. Unit is not an index.
    • Other, non-index columns can exist. Typically, they are information that were used for some purpose during the evolution of the ovariable, but they may be useless in the current ovariable. Due to these other columns, the output may sometimes be a very wide data.frame.
@data
  • A single data.frame that defines <self> as such.
  • data slot answers this question: What measurements are there to answer the question? Typically, when data is used, the result can be directly derived from the information given (with possibly some minimal manipulation such as dropping out unnecessary rows).
  • May include textual regular expressions that describe probability distributions which can be interpreted by OpasnetUtils/Interpret.
@marginal
  • A logical vector that indicates full marginal indices (and not parts of joint distributions, result columns or other row-specific descriptions) of @output.
@formula
  • A function that defines <self>.
  • Should return either a data.frame or an ovariable.
  • @formula and @dependencies slots are always used together. They answer this question: How can we estimate the answer indirectly? This is the case if we have knowledge about how the result of this variable depends on the results of other variables (called parents). The @dependencies is a table of parent variables and their identifiers, and @formula is a function that takes the results of those parents, applies the defined code to them, and in this way produces the @output for this variable.
@dependencies
  • A data.frame that contains names and Rtools or Opasnet tokens/identifiers of variables required for <self> evaluation (list of causal parents). The following columns may be used:
    • Name: name of an ovariable or a constant (one-row numerical vector) found in the global environment (.GlobalEnv).
    • Key: the run key (typically a 16-character alphanumeric string) to be used in objects.get() function where the run contains the dependent object.
    • Ident: Page identifier and rcode name to be used in objects.latest() function where the newest run contains the dependent object. Syntax: "Op_en6007/answer".
  • A way of enabling references in R (for in ovariables at least) by virtue of OpasnetUtils/ComputeDependencies which creates variables in .GlobalEnv so that they are available to expressions in @formula.
  • Variables are fetched and evaluated (only once by default) upon <self> evaluation.
@ddata
  • A string containing an Opasnet identifier e.g. "Op_en1000". May also contain a subset specification e.g. "Op_en1000/dataset".
  • This identifier is used to download data from the Opasnet database for the @data slot (only if empty by default) upon <self> evaluation.
  • By default, the data defined by @ddata is downloaded when an ovariable is created. However, it is also possible to create and save an ovariable in such a way that the data is downloaded only when the ovariable is evaluated.

Decisions and other upstream orders

The general idea of ovariables is such that they should not be modified to match a specific model but rather define the variable in question as extensively as possible under it's scope. In other words, it should answer its question in a re-usable way so that the question and answer would be useful in many different situations. (Of course, this should be kept in mind already when the question is defined.) To match the scope of specific models, ovariables can be modified by supplying orders upstream (outwards in the recursion tree). These orders are checked for upon evaluation. For example decisions in decision analysis can be supplied this way:

  1. pick an endpoint
  2. make decision variables for any upstream variables (this means that you create new scenarios with particular deviations from the actual or business-as-usual answer of that variable)
  3. evaluate endpoint
  4. optimize between options defined in decisions.

Other orders include: collapse of marginal columns by sums, means or sampling to reduce data size and passing input from model level without redefining the whole variable. It is also possible to redefine any specific variable before starting the recursive evaluation, in which case the recursion stops at the defined variable (dependencies are only fetched if they do not already exist; this is to avoid unnecessary computation).

Rationale

Basic properties and ideas

The objectives of ovariable modelling are to

  • Separate actual code and parameter values, so that whenever possible, parameters are shown in tables on Opasnet pages.
  • Enable straightforward equations with multidimensional objects in the same was as in AnalyticaTM "Intelligent arrays".
  • Enable uncertainty propagation using standard Monte Carlo.
  • Enable object-oriented modelling in such a way that a model itself "knows" what input it needs, so that the user does not need to worry about that.
  • Enable easy scenario analysis so that the user can change any value within a model and compare the original model with the changed model, also called a "scenario" or "counterfactual world".
  • Enable intuitive expression of uncertainties. For example, an input value "5 - 7" is easily understood by a reader as an uncertain thing with possible values between five and seven. The same input is understood by the modelling system as a uniform probability distribution with min 5 and max 7, sampled by default 1000 times.

See also

Related files