Variable

From Testiwiki

<accesscontrol>members of projects</accesscontrol>

[[Category:Universal product]]

[[Category:Guidebook]]

Revision as of 21:02, 29 January 2008

Variable is a description of a particular piece of reality. It is the basic building block of a risk assessment. It can be a description of a physical phenomenon, e.g. the yearly average of PM2.5 concentration in Kuopio in 2006, or a description of a value judgement, e.g. willingness to pay to avoid lung cancer. Variables (more precisely, their scopes) can be more general or more specific and hierarchically related, e.g. the yearly average of PM2.5 concentration in Finland in 2006 (a general variable) and the daily average of PM2.5 concentration in Kuopio on January 1st, 2006 (a more specific one).

In order to make coherent descriptions of reality in assessments, the assessments must have a clear structure. As we also want descriptions that are coherent between assessments, there must be a universal structure for all assessments. Variables with a certain set of attributes, together with the linkages between these variables, form this universal structure. For further details, see Guidance and methods for indicator selection and specification. The universal assessment structure is essential for the coherent inclusion of causality in assessments, for collective structured learning and collaborative work, and for combining value judgements with descriptions of physical reality.

Variable structure

In the new risk assessment method, variables have a specified structure with four basic attributes (and possibly some sub-attributes). The attributes of variables are the same as for the other objects in the information structure of the pyrkilo method, i.e. assessments and classes.


Name attribute is the identifier of the variable; it should already indicate, at least roughly, which real-world entity the variable describes. Variable names should be chosen so that they are descriptive, unambiguous and not easily confused with the names of other variables. An example of a good variable name is daily average of PM2.5 concentration in Helsinki.

Scope attribute defines the boundaries of the variable: what does it describe and what not? The boundaries can be e.g. spatial, temporal or abstract. In the example variable above, the geographical boundary restricts the coverage of the variable to Helsinki, and the considered phenomenon is restricted to daily averages of PM2.5. Further boundaries may be defined in the scope of the variable without being explicitly mentioned in its name.

Help:Variable definition

Result attribute is an answer to the question presented in the scope of the variable. A result is preferably a probability distribution (which can, as a special case, be a single number), but it can also be non-numerical, such as "very good". Note that the result is the distribution itself, although it can be expressed as some description of the distribution, such as a mean and standard deviation. The result should be described in enough detail that the full distribution can be reproduced from the information presented under this attribute. A technically straightforward way to do this is to provide a large random sample from the distribution.
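The sample-based representation of a result can be sketched as follows. This is a minimal illustration, not the pyrkilo implementation; the example variable and its lognormal distribution are assumptions chosen purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical variable: daily average PM2.5 concentration (ug/m3).
# The result is stored as a large random sample from its probability
# distribution, so the full distribution can be reproduced from it.
result_sample = rng.lognormal(mean=2.0, sigma=0.5, size=10_000)

# Any summary (mean, sd, percentiles) can be derived from the sample,
# but the sample itself is the result.
summary = {
    "mean": result_sample.mean(),
    "sd": result_sample.std(),
    "p95": np.percentile(result_sample, 95),
}
```

Storing the sample rather than a summary keeps the full distribution available to any downstream variable that uses this result.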

The result may take different values at different locations, such as geographical positions, population subgroups, or other determinants. The result is then described as

  R|x1,x2,... 

where R is the result and x1 and x2 define the locations. A dimension is a property along which there are multiple locations, such that the result of the variable may take different values as the location changes. In this case, x1 and x2 are dimensions, and particular values of x1 and x2 are locations. A variable can have zero, one, or more dimensions. Even if a dimension is continuous, it is usually operationalised in practice as a list of discrete locations. Such a list is called an index, and each location in it is called a row of the index.

Uncertainty about the true value of the variable is one dimension. The index of the uncertainty dimension is called the Sample index, and it contains the list of integers 1, 2, 3, ... . Uncertainty is operationalised as a sequence of random samples from the probability distribution of the result, so that the i-th random sample is located in the i-th row of the Sample index.
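A result R|x1 with one substantive dimension plus the Sample index can be sketched as a two-dimensional array. The index name, locations, and distributions below are hypothetical, not taken from any actual assessment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimension "area" with three locations; Sample index with rows 1..n.
area_index = ["Kuopio", "Helsinki", "Tampere"]
n_samples = 1000

# result[i, j] = i-th random sample of the result for area j.
# Each area gets its own (assumed) mean concentration.
result = rng.normal(loc=[8.0, 10.0, 9.0], scale=1.0,
                    size=(n_samples, len(area_index)))

# The i-th random sample sits in the i-th row of the Sample index;
# averaging over that dimension gives the expected result per location.
mean_by_area = dict(zip(area_index, result.mean(axis=0)))
```

Any other dimension (time, population subgroup) would add another axis to the array in the same way.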


General attribute structure

Each attribute may contain three kinds of information:

  • Actual content (only this has an impact on other objects)
  • Narrative description (helps the reader understand the actual content; includes uncertainty analysis)
  • Discussion (argumentation about issues in the actual content)

For a detailed description of discussions, see Help:Dispute.

Connection to the PSSP structure

A universal information structure called PSSP (Purpose, Structure, State, Performance) has been suggested. PSSP describes the attributes of universal objects, whereas the pyrkilo method is intended for describing particular objects in the context of risk assessment. The variable structure is closely connected to PSSP, and the relationship can be described in the following way.

  • Purpose: The purpose of a variable is to describe a particular piece of reality.
  • Structure: Scope, Unit, and Definition describe the structure of the variable.
  • State: Result is an expression of the state of the variable.
  • Performance: Performance is an expression of the uncertainty of the variable, i.e. how well the variable fulfills its purpose of describing the piece of reality defined in the scope. On the variable level, performance is evaluated separately for the result (parameter uncertainty) and the definition (model uncertainty). The performance of the scope of a variable cannot be evaluated on the variable level, however, but only on the assessment level.

There are different kinds of variables

Although all variables share the same basic structure, it is useful to distinguish different kinds of variables based on their use or position in a risk assessment.

  • Endpoint variables describe phenomena that are outcomes of the assessed causal network, i.e. there are no variables downstream from an endpoint variable within the scope of the assessment. In practice, endpoint variables are most often also chosen as indicators.
  • Intermediate variables include all other variables besides endpoint variables.
  • Key variable is a variable which is particularly important for carrying out the assessment successfully and/or assessing the endpoints adequately.
  • Indicator is a variable that is particularly important in relation to the interests of the intended users of the assessment output or other stakeholders. Indicators are used as means of effective communication of the assessment results. Communication here refers to conveying information about certain phenomena of interest to the intended target audience of the assessment output, but also to monitoring the status of those phenomena, e.g. in evaluating the effectiveness of actions taken to influence them. In the context of integrated assessment, indicators can generally be considered pieces of information that communicate the most essential aspects of a particular risk assessment to meet the needs of its users. Indicators can be endpoint variables, but also any other variables located anywhere in the causal network.
  • Decision variables describe the possible decisions that are under consideration within a risk assessment. The main interest of the assessment is then the comparison of outcomes resulting from the different decision options. More about decision variables can be found on a separate page.


Variables are versatile objects. They are able to describe all of the following aspects of reality:

  • Causal relationships linking variables in the different steps in the causal chain from source to impact (mainly in the definition/causality attribute);
  • Different environmental, social, economic and infrastructural contexts in which risks might arise and play out (mainly in the scope attribute);
  • Physical and chemical processes that generate, transform and transport the hazards (agents) from source to the target organs in the human body (mainly as variables that are defined as functions);
  • Indicators to describe and communicate the causal chain and impacts (variables selected for reporting);
  • Different policy measures that might be taken to address the risks, and thus different assessment scenarios that might be compared (decision variables);
  • Appraisal of the impacts (and the policy scenarios to which they relate), in the light of agreed value systems and rules for evaluation (variables describing value judgements or derived from value judgement variables);
  • Adaptation and feedback loops arising as a result of adaptation to the risks, at both the individual and institutional level. A feedback loop is described as a variable that indirectly depends on its own result at a previous time point.
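The feedback-loop case in the list above can be sketched as a variable whose value at time t depends on its own value at t-1. The variable names, the adaptation mechanism, and all numbers below are illustrative assumptions, not part of any actual assessment.

```python
def step(exposure_prev: float, adaptation_rate: float = 0.2,
         baseline: float = 10.0) -> float:
    """Hypothetical feedback: exposure falls as people adapt to the
    exposure observed at the previous time point."""
    adaptation = adaptation_rate * exposure_prev  # response to t-1
    return max(baseline - adaptation, 0.0)

exposure = [12.0]                  # initial value at t = 0
for t in range(1, 20):
    exposure.append(step(exposure[-1]))

# With |adaptation_rate| < 1 the loop converges toward the fixed point
# baseline / (1 + adaptation_rate) = 10 / 1.2.
```

In the variable structure, such a loop is expressed by letting the definition of the variable refer to its own result indexed at an earlier time location.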

Ideally, all variables in the full chain can be expressed quantitatively. In order to use the full-chain approach quantitatively in an integrated assessment, it is necessary to acquire data for the variables or to estimate them by modelling the underlying causal processes.

Proxies are not indicators

The term indicator is sometimes also used (mistakenly, in the eyes of the new risk assessment method) to mean a proxy. Proxies are used as replacements for the actual objects of interest in a description when adequate information about the actual object of interest is not available. They are indirect representations that usually have some identified correlation with the actual object of interest. At least within the context of the new risk assessment method, proxy and indicator have clearly different meanings and should not be confused with each other. The figure below attempts to clarify the difference:


(Figure: a causal network illustrating the difference between a proxy and an indicator; thumbnail unavailable.)


In the example, a proxy (PM10 site concentration) is used to indirectly represent and replace the actual object of interest (exposure to traffic PM2.5). Mortality due to traffic PM2.5 is identified as a variable of specific interest to be reported to the target audience, i.e. it is selected as an indicator. The other two nodes in the graph are considered ordinary variables. The graph above was made with Analytica; here is the original Analytica file.

Specifying indicators and other variables

When the endpoints, indicators and key variables have been identified, they should be specified in more detail. Additional variables are created and specified as necessary to complete the causal network. Specifying these variables means defining the contents of the attributes of each variable. The four plausibility tests are very useful in specifying variables.

Help:Plausibility tests

The specification of variables proceeds in iterative steps, going into more detail as the overall understanding of the assessed phenomena increases. First, it is most crucial to specify the scopes (and names) of the variables and their causal relations. As part of the specification process, the clairvoyant test can be applied, in particular to the name and scope attributes. The test helps to ensure the clarity and unambiguity of the variable scope.

Addressing causalities means in practice that any change in a variable description should be reflected in all the variables that the particular variable is causally linked to. At this point, the causality test can be used, although not necessarily quantitatively. In the early phases of the process, it is probably most convenient to describe causal networks as diagrams, representing the indicators, endpoints, key variables and other variables as nodes (or boxes) and causal relations as arrows pointing from upstream variables to downstream variables. In such graphical representations the arrows are only statements of the existence of a causal relation between particular variables; the relations themselves should be described in more detail within the definition attribute of each variable, according to how well the causal relation is known or understood.

Once a relatively complete and coherent graphical representation of the causal network has been created, the specification of the identified indicators may continue in more detail. The indicators, as the leading variables, are of crucial importance in the assessment process. If, during the specification process, it turns out that an indicator conflicts with one or more of the properties of good indicators, such as calibration, it may be necessary to revise the scope of the indicator or to choose another leading variable in the source-impact chain to replace it. This may naturally bring about a partial revision of the whole causal network, affecting several key variables, endpoints and indicators. For example, it may happen that no applicable exposure-response function is available for calculating the health impact from the intake of ozone. In this case, the exposure-response indicator may be replaced with an intake fraction indicator, which affects both the downstream and upstream variables in the causal network, e.g. by bringing about a need to change the units in which the variables are described.

The description, unit and definition attributes are specified as explained in the previous section. The unit test can be applied to check the calculability, and thus the descriptive coherence, of the causal network. When all the variables in the network appear to pass the required tests, the indicator and variable results can be computed across the network, and the first round of iteration is done. The description is then improved through deliberation and re-specification of the variables, especially their definition and result attributes, until an adequate level of quality has been reached throughout the network. The discussion attribute provides the place for deliberation and for documenting it throughout the process.
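Computing results across the network can be sketched as sample-wise propagation: each upstream result is a sample over the shared Sample index, and a downstream variable applies its definition formula row by row. All variable names, distributions, and numbers below are hypothetical, and this is not the actual assessment software.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # rows of the Sample index, shared across all variables

# Hypothetical upstream variables, each stored as a random sample.
exposure = rng.lognormal(mean=1.5, sigma=0.4, size=n)    # ug/m3
er_slope = rng.normal(loc=0.008, scale=0.002, size=n)    # deaths per person per (ug/m3)
population = 100_000                                      # persons (assumed fixed)

# Downstream variable computed sample-wise from its definition:
# mortality = exposure * exposure-response slope * population.
mortality = exposure * er_slope * population              # deaths per year

# Because the computation is done per row of the Sample index,
# uncertainty propagates automatically: mortality is itself a sample.
```

Because every variable carries its full sample, the downstream result needs no separate uncertainty analysis step.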

Importance of indicators in the assessment process

Indicators have a special role in making the assessment. As mentioned above, indicators are the variables of most interest from the point of view of the use, users and other audiences of the assessment. The idea behind indicator selection, specification and use is to highlight the most important and/or significant parts of the source-impact chain that are assessed and subsequently reported. The selected set of indicators guides the assessment process to address the relevant issues within the assessment scope according to the purpose of the assessment. It could be said that indicators are the leading variables in carrying out the assessment; other variables are subsidiary to specifying the indicators.

However, within the context of integrated risk assessment, selecting and specifying indicators may sound more straightforward than it actually is. Perhaps "identification of indicators and specification of the causal network in line with the identified indicators" would grasp the essence of the process better. Instead of merely picking from a predefined set of indicators, selection here refers to identifying the most interesting phenomena within the scope of the assessment in order to describe and report them as indicators. The specification of indicators is then similar to the specification of all other variables, although indicators are considered primarily, while other variables are considered secondarily and mainly in relation to the indicators.

In principle, any variable could be chosen as an indicator, and the set(s) of indicators could be composed of any types of indicators across the full-chain description. In practice, the generally relevant types of indicators, such as performance indicators, can be somewhat predefined, and even some detailed indicators can be defined in relation to commonly occurring purposes and user needs. This kind of generality also helps to bring coherence between assessments.

On the generalizability of variables

Aim: Variables must be generalizable so that they can be used without additional knowledge of the context. In other words, the context must be described well enough inside the variable.

→ Because of this, the estimates in variables must be estimates of the truth, not deliberate under- or overestimates. Biased estimates are common in risk assessment because assessors usually want to avoid false negative results much more than false positive results. In other words, it is considered much worse to miss an existing risk than to suspect a risk where there is none.

→ Decisions may be based on risk aversion, but the estimates in variables must be best estimates, because you cannot know which decisions will be based on the variable.

Technical issues in Mediawiki

  • Each variable is a page in the Variable namespace. The name of the variable is also the name of the page. However, draft variables may be parts of other pages.
  • The scope is the first paragraph(s) on the page, before the first sub-title. The scope starts with the word Scope on the previous line (wiki code '''Scope'''<br>). The name should be repeated in bold at the beginning of the scope, followed by the text "describes..." and then a description of the scope (whenever the content fits this format). Sub-titles are NOT used with the scope; this way it is located above the table of contents.
  • All other attributes (unit, definition, result) are second-level (==) sub-titles on the page.
  • Description of the attribute content is added at the end of that content; discussions on the content are added to the Talk page, each discussion under an own descriptive title.
  • References to external sources are added to the text with the <ref>Reference information</ref> tag. The references are listed at the end of the page under the subtitle References. However, reference is not an attribute of the variable, although it is technically similar.
  • In the formula, computer code for a specific software may be used. The following are in use.
    • Analytica_id: Identifier of the respective node in an Analytica model. <anacode>Place your Analytica code here. Use double Enter to make a line break.</anacode>
    • <rcode>Place your R code here. Use double Enter to make a line break.</rcode>
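Under these conventions, a variable page might look roughly like the following skeleton. The variable name and all attribute contents are illustrative only:

```
'''Scope'''<br>
'''Daily average of PM2.5 concentration in Helsinki''' describes the daily
average concentration of fine particulate matter in the outdoor air of Helsinki.

==Unit==
µg/m3

==Definition==
<rcode>Place your R code here. Use double Enter to make a line break.</rcode>

==Result==
A large random sample from the distribution (or a description detailed enough
to reproduce it).

==References==
<references/>
```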

See also