Difference between revisions of "Quality evaluation criteria"

From Testiwiki
(format of the result)
(new idea of coverage, first half)
[[Category:Quality control]]
{{variable|moderator=Jouni}}
==Scope==
==Definition==
===Data===
* [[Properties of a good assessment]]
* NUSAP criteria could be used here?
====Amount of data====
Amount of data can be measured with this question: '''How many sets of independent observations are used as data for defining the object?'''
Typically, each individual scientific article is one set of observations. However, if several articles are derived from the same observations, they should be counted as one set. There are also intermediate situations. For example, several follow-ups may be published from the same cohort. They are clearly not independent, but a later follow-up includes observations that the earlier one does not. In this case, the question is whether the earlier follow-up adds anything beyond the later one. For example, it might be better for describing the impacts of exposures occurring at early stages of the follow-up.
===Dependencies===
* [[Properties of good assessments]]
== Result ==
===Format of the result===
This technical classification can be used for quantitative object results.
:0.&nbsp;&nbsp; Unspecified.
# The result is only a placeholder that enables technical use of the object in an assessment. It may, for example, contain the indices to be used, but it contains little or no information about the result itself.
# The result is a point estimate without any uncertainty information.
# The result defines the [[result domain]], i.e. the range in which all plausible values fall. It contains no probabilistic information beyond, for example, a uniform distribution across the whole range. This does not imply that all values are equally likely, only that any value within the result domain is possible, in a sense similar to the P-Box<ref>McGill Research blog on P-Box [http://www.professormcgill.com/blog/category/p-box/]</ref><ref>Scott Ferson, W. Troy Tucker: Sensitivity analysis using probability bounding. Reliability Engineering and System Safety 91 (2006) 1435–1442. [http://www.ramas.com/wttreprints/FersonTucker06.pdf]</ref> approach.
# The result is defined as a marginal probability distribution without explicit correlations with other objects.
# The result is defined as a marginal probability distribution with rank correlations (vines<ref>Delft University of Technology, 2nd Vine Copula Workshop, 16-17 December 2008 [http://dutiosc.twi.tudelft.nl/~risk/index.php?view=article&catid=4%3Aall-workshops&id=18%3Aworkshop-2008&option=com_content&Itemid=7]</ref>) to objects that are causally linked or correlated with it.
# The result is defined as a full joint distribution with the objects that are causally linked or correlated with it.
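Because the levels are ordered from least to most informative, the classification can be sketched as an integer enumeration, so that a required minimum format can be checked with an ordinary comparison. This is a minimal Python sketch, not part of the original page: `ResultFormat`, `meets_minimum` and `describe` are hypothetical names, and the values 1 to 6 assume that the numbered list continues from the explicit level 0.

```python
from enum import IntEnum


class ResultFormat(IntEnum):
    """Hypothetical encoding of the technical classification of quantitative results."""

    UNSPECIFIED = 0
    PLACEHOLDER = 1              # indices only, little or no information on the result
    POINT_ESTIMATE = 2           # a single value without uncertainty information
    RESULT_DOMAIN = 3            # plausible range only, in the spirit of a P-box
    MARGINAL = 4                 # marginal distribution, no explicit correlations
    MARGINAL_WITH_RANK_CORR = 5  # marginal distribution plus rank correlations (vines)
    JOINT = 6                    # full joint distribution with linked objects


def meets_minimum(level: ResultFormat, required: ResultFormat) -> bool:
    """Return True if a result is at least as informative as the required format."""
    return level >= required


def describe(level: ResultFormat) -> str:
    """Return a short human-readable label such as '2: point estimate'."""
    return f"{level.value}: {level.name.lower().replace('_', ' ')}"


print(describe(ResultFormat.POINT_ESTIMATE))                             # 2: point estimate
print(meets_minimum(ResultFormat.RESULT_DOMAIN, ResultFormat.MARGINAL))  # False
```

An `IntEnum` is used rather than plain integers so that each level keeps a readable name while still comparing as a number.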
===Amount of data===
'''Amount of data''' is a quantitative measure with the following [[result domain]]:
* The number of independent sources of information. It need not be an integer if the sources are only partly independent.
* 0 means a "guesstimate", where the object result is based on general knowledge without any citable sources of information.
* -1 means a placeholder that contains little or no information about the topic; it is used only to make a technically complete object for testing the usage of the object in, e.g., an assessment model.
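The counting rule for amount of data (articles reporting the same observations count once, partly independent sets count fractionally, 0 for a guesstimate, -1 for a placeholder) can be sketched in Python as below. The function name, the article-to-observation-set mapping and the fractional independence weights are all hypothetical; the page does not say how partial independence should be quantified, so the 0.5 in the example is purely illustrative.

```python
from typing import Mapping, Optional


def amount_of_data(
    articles: Mapping[str, str],
    independence: Optional[Mapping[str, float]] = None,
    placeholder: bool = False,
) -> float:
    """Count independent sets of observations behind an object (hypothetical helper).

    articles:      article id -> id of the set of observations it reports
    independence:  observation-set id -> fraction (0..1) credited as independent
                   of the other sets; unlisted sets count as fully independent
    """
    if placeholder:
        return -1.0  # technically complete object with little or no information
    observation_sets = set(articles.values())  # articles on the same data count once
    if not observation_sets:
        return 0.0  # "guesstimate": no citable sources of information
    independence = independence or {}
    total = 0.0
    for obs_set in observation_sets:
        weight = independence.get(obs_set, 1.0)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"independence of {obs_set!r} must be within [0, 1]")
        total += weight
    return total


# Hypothetical example: two articles on one follow-up, a later follow-up of the
# same cohort credited as half independent, and one unrelated cohort.
articles = {
    "smith2005": "cohort_A_followup1",
    "smith2006": "cohort_A_followup1",  # same observations, counted once
    "jones2010": "cohort_A_followup2",  # later follow-up of the same cohort
    "lee2008": "cohort_B",
}
print(amount_of_data(articles, independence={"cohort_A_followup2": 0.5}))  # 2.5
```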
==See also==
* [http://en.opasnet.org/en-opwiki/index.php?title=Quality_evaluation_criteria&oldid=7317#Amount_of_data Previous idea about "Amount of data"]
* [http://en.opasnet.org/en-opwiki/index.php?title=Quality_evaluation_criteria&oldid=7317#Format_of_the_result Previous idea about a technical classification of quantitative results] (placeholder, guesstimate, result range, marginal distribution, joint distribution)
==References==
<references/>

Latest revision as of 18:26, 23 May 2010



Quality evaluation criteria are a set of measures that together describe the quality of content of an information object.

Scope

What is a set of quality evaluation measures such that it fulfils the following criteria:

Definition

Previous references: P-Box approach [1][2]; vines and copulas [3].

The quality of the content of a variable can be divided into two main parts based on the gold standard: the goodness of the result estimate compared with the truth, and the goodness of the description of existing data compared with the data that actually exist.

In practice, the dependencies and result can be compared with the truth, while the data and formula can be compared with the actually existing data. This is because the result and dependencies typically contain few references, while the data and formula contain many. However, this varies considerably from one variable to another and should not be used as a strict rule.

Comparison with the truth

The truth is, of course, never known precisely, but people can still give their personal estimates (subjective probabilities) of the result of a variable. They can also estimate which variables are causally related to the variable under consideration (i.e., its parent variables). To operationalise this, two concepts are defined.

Result range
Result range R = [r_l, r_u] is a range of plausible values within which the true value of the variable is located; r_l and r_u are its lower and upper bounds. It is described as a part of the result of a variable. What "plausible" means exactly is somewhat fuzzy, as it is based on the evaluation by the group of people who have produced the current version of the variable result.
Coverage
Coverage is defined as two subjective probabilities given by a user: the probability that the truth is actually below the result range, and the probability that the truth is actually above the result range. If the variable is not quantitative but, e.g., a discrete probability distribution with non-ordered values, coverage refers to the probability that the truth is a value not defined in the discrete distribution; in this case, coverage is described by a single probability estimate.
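The page defines coverage as probabilities that a user states directly. As a hedged illustration only, the sketch below assumes that the user's belief about the true value is available as equally weighted samples (for a quantitative variable) or as a probability table (for a variable with non-ordered values), and derives the coverage from it. The function names and example numbers are hypothetical.

```python
from typing import Iterable, Mapping, Sequence, Tuple


def coverage_quantitative(
    samples: Sequence[float], r_l: float, r_u: float
) -> Tuple[float, float]:
    """Return (P(truth < r_l), P(truth > r_u)) from samples of a user's belief."""
    if r_l > r_u:
        raise ValueError("result range must satisfy r_l <= r_u")
    if not samples:
        raise ValueError("at least one sample is needed")
    n = len(samples)
    below = sum(1 for x in samples if x < r_l) / n
    above = sum(1 for x in samples if x > r_u) / n
    return below, above


def coverage_categorical(belief: Mapping[str, float], defined: Iterable[str]) -> float:
    """Return the user's probability that the truth is a value not in the result."""
    defined_values = set(defined)
    return sum(p for value, p in belief.items() if value not in defined_values)


# A user's belief about the true value, here as ten equally weighted samples.
belief_samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(coverage_quantitative(belief_samples, r_l=2.5, r_u=8.5))  # (0.2, 0.2)
print(coverage_categorical({"a": 0.7, "b": 0.2, "c": 0.1}, defined=["a", "b"]))  # 0.1
```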

Coverage can be estimated for the result, and this is what the word usually means. However, coverage can also be estimated for upstream dependencies. In that case, it means the probability that the rank correlation between this variable and a parent is smaller (or larger) than the current estimate (described in dependencies). Note that if a dependency is not mentioned at all, this implies a rank correlation of exactly 0 between the variable and that parent.
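Coverage for a dependency works the same way, with the current rank-correlation estimate in place of the result range. The sketch below also encodes the rule that a parent not mentioned in the dependencies implies a rank correlation of exactly 0. It assumes that a user's belief about the true rank correlation is given as equally weighted samples; the function names, the parents "exposure" and "age", and all numbers are hypothetical.

```python
from typing import Mapping, Sequence, Tuple


def rank_correlation(dependencies: Mapping[str, float], parent: str) -> float:
    """Current rank-correlation estimate with a parent; an unlisted parent implies 0."""
    return dependencies.get(parent, 0.0)


def dependency_coverage(
    believed: Sequence[float], estimate: float
) -> Tuple[float, float]:
    """Return (P(true rank correlation < estimate), P(true rank correlation > estimate)).

    `believed` holds equally weighted samples of a user's subjective belief.
    """
    if not believed:
        raise ValueError("at least one sample is needed")
    n = len(believed)
    smaller = sum(1 for r in believed if r < estimate) / n
    larger = sum(1 for r in believed if r > estimate) / n
    return smaller, larger


dependencies = {"exposure": 0.4}                  # hypothetical parent and estimate
estimate = rank_correlation(dependencies, "age")  # "age" is not mentioned, so 0.0
print(estimate)                                               # 0.0
print(dependency_coverage([0.0, 0.1, 0.2, 0.3], estimate))    # (0.0, 0.75)
```

In this example the user believes an unmentioned parent is positively correlated with the variable, which shows up as a high probability that the true rank correlation is larger than the implied 0.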

The individual coverage estimates can be aggregated into a probability distribution that is wide enough to capture the true value with a high aggregated subjective probability, as estimated by the group. This distribution is clearly wider than the aggregated best estimate of the distribution. The usefulness of coverage is that, in a draft assessment, it can be used in value-of-information (VOI) analysis, where it is unlikely to produce a false negative (a distribution that is too narrow and falsely implies that no further research is needed).

Subjective coverages can be aggregated into what is called group coverage.
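The page does not specify how individual coverages are aggregated into group coverage. One simple option, assumed here and not taken from the source, is an equal-weight linear opinion pool: average each user's probability below and above the result range. Weighted pooling schemes would be equally possible; the function name and numbers are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def group_coverage(individual: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Equal-weight average (linear opinion pool) of individual coverage estimates.

    Each element is one user's (p_below, p_above) for the same result range.
    """
    if not individual:
        raise ValueError("at least one individual estimate is needed")
    for below, above in individual:
        if below < 0 or above < 0 or below + above > 1:
            raise ValueError("need p_below >= 0, p_above >= 0 and p_below + p_above <= 1")
    below_mean = fmean([below for below, _ in individual])
    above_mean = fmean([above for _, above in individual])
    return below_mean, above_mean


estimates = [(0.10, 0.05), (0.30, 0.15)]  # two users' coverage estimates
print(group_coverage(estimates))
```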

Result

Coverage

See also

References

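  1. McGill Research blog on P-Box. http://www.professormcgill.com/blog/category/p-box/
  2. Scott Ferson, W. Troy Tucker: Sensitivity analysis using probability bounding. Reliability Engineering and System Safety 91 (2006) 1435–1442. http://www.ramas.com/wttreprints/FersonTucker06.pdf
  3. Delft University of Technology: 2nd Vine Copula Workshop, 16-17 December 2008. http://dutiosc.twi.tudelft.nl/~risk/index.php?view=article&catid=4%3Aall-workshops&id=18%3Aworkshop-2008&option=com_content&Itemid=7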