Evaluating assessment performance

Evaluating performance is an important issue in open risk assessment as well. In the pyrkilo method this means that all objects within the information structure, as well as the process of creating them, are evaluated with regard to their purpose. This presumes that the purpose of each object is identified and defined. The pyrkilo method considers performance evaluation an essential, continuous and integral part of carrying out a risk assessment, not an add-on analysis conducted separately from the risk assessment itself.

The basis for evaluating the performance of an assessment is its purpose. Each assessment should have an identified and explicated purpose, which also influences the purpose of each component (variable) of the assessment (see Purpose and properties of good assessments). Evaluating performance means evaluating how well the assessments, and their parts, fulfill their purpose. The overall performance of an assessment can be considered a function of the performance of the process (efficiency) and the performance of the output (effectiveness). In the pyrkilo method, the Purpose and properties of good assessments are used for evaluating the performance of risk assessments and their outputs. The properties of good risk assessment provide the framework for evaluating, as well as designing and managing, open risk assessments in terms of their quality of content, applicability and process efficiency.

Efficiency

Evaluating efficiency is, in principle, relatively straightforward. It can be written as an equation as follows:

efficiency = output/effort

Efficiency is thus merely a technical measure that compares the performance of the output (effectiveness) to the amount of effort spent in creating that output. The latter determinant (the denominator) is, in principle, quite simply measured or estimated, but the former (the numerator) is a more complicated matter (see the effectiveness section below).
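
As an illustration of the equation above, the following minimal sketch (not part of the pyrkilo method itself) assumes that effectiveness has already been condensed into a single numeric score and that effort is measured, for example, in person-hours:

 def efficiency(effectiveness_score: float, effort: float) -> float:
     """Compute efficiency = output / effort.
 
     Both inputs are hypothetical numeric condensations; in practice
     effectiveness is multifaceted (see the effectiveness section).
     """
     if effort <= 0:
         raise ValueError("effort must be positive")
     return effectiveness_score / effort
 
 # Example: an assessment output with effectiveness score 8.0,
 # produced with 200 person-hours of effort.
 print(efficiency(8.0, 200.0))  # 0.04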

According to the pyrkilo method, evaluating the efficiency of the process is in practice meaningful only on the level of whole assessments, not their parts (variables). This is because assessments are discrete objects with defined starting and ending points in time and specific contextually and situationally defined goals, whereas variables are seen as continuously existing descriptions of reality that develop over time as knowledge about them increases. Efficiency is thus evaluated only for assessments or groups of assessments, not for individual variables. It should be noted, however, that if a variable is developed in relation to a particular assessment, the effort spent in developing it is counted as effort spent within that assessment, and the effectiveness of that variable is likewise evaluated in relation to the goals of that assessment.

Effectiveness

Evaluating effectiveness is a more complicated and multifaceted issue than evaluating efficiency. Effectiveness is evaluated for the outputs created in an assessment according to their properties in relation to the assessment purpose. It consists of both the quality of content of the outputs and their applicability (see Purpose and properties of good assessments for more detailed descriptions of these properties). Effectiveness thus covers the issues most often discussed under the labels of knowledge quality and uncertainty, and even goes beyond them.

Evaluating performance in terms of quality of content and applicability applies to both assessments and variables. Although the evaluation of performance for both object types proceeds along the same lines, there are certain differences that derive from the differences in the primary purposes of these object types and, consequently, their roles within the information structure (see Universal products and Variable for more on this issue).

Quality of content is a measure that describes the relation between real-world phenomena and descriptions of them. The point of reference in evaluating the performance of an object in terms of quality of content is therefore the part of reality that the object attempts to describe. The sub-properties of quality of content are calibration, informativeness and relevance. Quality of content can be considered an objective measure because it is, in principle, merely a comparison between a part of reality and a description of it, and therefore does not involve value judgments.

Applicability is a measure of the capability of a chunk of information to convey its message, the content of the object, to its intended uses (and users). The point of reference in evaluating the performance of an object in terms of applicability is thus the intended use purpose of the object. The sub-properties of applicability are usability, availability and acceptability. Because the reference point is the (instrumental) use purpose, evaluating applicability is sensible only for assessments, not for individual variables. Applicability is related to contextual and situational, and thus also value-laden, settings and is therefore considered a subjective measure.
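
The structure of effectiveness described above can be summarised as a simple data structure. The following is an illustrative sketch only, with hypothetical names and numeric scores that the pyrkilo method itself does not prescribe:

 from dataclasses import dataclass
 from typing import Optional
 
 @dataclass
 class QualityOfContent:
     """Objective measure: description compared against reality."""
     calibration: float
     informativeness: float
     relevance: float
 
 @dataclass
 class Applicability:
     """Subjective measure: evaluable on the assessment level only."""
     usability: float
     availability: float
     acceptability: float
 
 @dataclass
 class Effectiveness:
     quality_of_content: QualityOfContent
     # None for individual variables, since applicability is
     # defined only on the assessment level.
     applicability: Optional[Applicability] = None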


  • differences in evaluating the performance of a) assessments and b) variables
  • addressing the quality of content properties through different attributes
  • addressing the quality of content properties on the variable and assessment levels
  • addressing applicability through different attributes of assessments


warning: old text beyond this point...

The general purpose of a variable is to describe a particular piece of reality, and the general purpose of an assessment, being a collection of variables, is similarly to describe a particular, but broader, piece of reality as a collection of variables. The particular piece of reality that each object refers to is defined under the scope attribute of that object. The performance of these objects, in terms of quality of content, is therefore evaluated in relation to the general purpose of the object type and the defined scope of the particular object.

As mentioned above, evaluating the quality of content means comparing a description against reality. This evaluation thus happens as a comparison against the general purpose of assessments and variables, given the defined scope; in other words, how good are the definition and result of an object at describing the particular piece of reality defined in the scope of that object? This goodness can be evaluated in terms of informativeness, calibration and relevance (see Purpose and properties of good assessments for an explanation of these sub-properties).

Variables and assessments can also have (and in practice do have) an instrumental purpose. This relates to the use purpose of the object and is evaluated in terms of applicability and its sub-properties, i.e. usability, availability and acceptability. Although the instrumental purpose of an individual variable can be defined, it is always defined on the assessment level. Therefore the applicability properties are only evaluable for assessments, not for individual variables. The reasoning behind this is explained in more detail in the following paragraphs.

The hierarchical information structure of pyrkilo brings about some interesting aspects for evaluation. As in all ontological structures, a system cannot be defined within the system itself: setting the boundaries or defining the purpose of a system (e.g. an object) can only be done from a higher level. Within the pyrkilo information structure this means in practice that the content of the scope attribute is always defined, and therefore also evaluated, from the point of view of an object higher in the hierarchy. Variable scopes are thus defined and evaluated in relation to the assessment(s) the variable belongs to, and similarly assessment scopes are defined and evaluated in relation to the context the assessment belongs to. This explains, for example, why applicability can only be defined on the assessment level: the use purpose of an assessment, and of all the variables included in it, is defined by the context in which the assessment (output) is intended to be used.
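
The context–assessment–variable hierarchy can be pictured as nested objects, each carrying a scope that is defined one level up. The following is a minimal illustrative sketch with hypothetical names, not a data model prescribed by the source:

 from dataclasses import dataclass, field
 
 @dataclass
 class Variable:
     name: str
     scope: str  # defined and evaluated from the assessment level
 
 @dataclass
 class Assessment:
     name: str
     scope: str  # defined and evaluated from the context level
     variables: list = field(default_factory=list)
 
 @dataclass
 class Context:
     """The context defines the use purpose of its assessments,
     which is why applicability exists on the assessment level only."""
     name: str
     assessments: list = field(default_factory=list)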

Variables are, however, objects that are supposed to be defined so that they are usable in several assessments (and consequently in several contexts). This is possible because the general purpose (to describe the reality within the defined scope) is constant and does not change, whereas the instrumental purpose may change according to changes in the use purpose within different assessments. This does not mean, however, that the reference point for evaluating performance against the general purpose (i.e. the quality of content) is constant: naturally, as the reality within the scope changes, the description of it should change accordingly.

In a somewhat similar manner as was explained for applicability, efficiency can only be evaluated from outside an object itself, because the effort and resources spent in creating or maintaining a variable or an assessment are external to the object. Therefore, efficiency measures are only relevant for assessments, and for variables as parts of assessments, not for individual variables.

What is here referred to as quality of content is somewhat similar to what is often referred to as uncertainty in other contexts. However, the use of the term uncertainty varies a great deal, and sometimes issues seen here as belonging to applicability are also considered under the label of uncertainty. Most considerations of uncertainty are compatible with this approach, which builds on an ontological, hierarchical information structure, and can therefore be attached to it. This approach attempts to provide a more solid basis for identifying the sources of uncertainty and their points of reference, as well as for identifying the important differences in the nature of different types of uncertainty.

Quality of content can (and should) be evaluated for any attribute of variables and assessments. Definition and result are evaluated within the object itself, in relation to the reality defined by the scope of the object. Scope is always evaluated as a part of the definition attribute of the higher-level object to which the particular object belongs. Scope is thus always evaluated in relation to the instrumental purpose, not to reality. This makes sense because a boundary in a description has no true counterpart in reality; it could even be said that any boundary definition is arbitrary, although it may be practically sensible.
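
The reference points described in this paragraph can be condensed into a simple mapping. This is an illustrative summary only; the attribute names follow the text above:

 # Which reference point each attribute is evaluated against,
 # condensing the paragraph above (illustrative summary).
 REFERENCE_POINTS = {
     "definition": "reality within the object's own scope",
     "result": "reality within the object's own scope",
     "scope": "instrumental purpose, evaluated from the higher-level object",
 }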