Evaluating assessment performance

<accesscontrol>Members of projects</accesscontrol>
[[category:Open assessment]]
{{Lecture}}
  
'''Evaluating assessment performance''' is a lecture about the factors that constitute the performance, the ''goodness'', of assessments and how they can be evaluated within a single general framework.

This page was converted from an encyclopedia article to a lecture because other descriptive pages about the same topic already exist. The old content was archived and can be found [http://en.opasnet.org/en-opwiki/index.php?title=Evaluating_assessment_performance&oldid=4383 here]; it is also reproduced in the Archived content section below.
  
==Scope==

'''Purpose:''' To summarize what assessments are about overall, which factors constitute the overall performance of an assessment, how those factors are interrelated, and how the performance, the ''goodness'', of an assessment can be evaluated.

'''Intended audience:''' Researchers (especially at the doctoral student level) in any field of science (mainly natural rather than social scientists).

'''Duration:''' 1 hour 15 minutes

==Definition==

In order to understand this lecture it is recommended to also acquaint oneself with the following lectures:
* [[Open assessment in research]]
* [[Assessments - science-based decision support]]
* [[Variables - evolving interpretations of reality]]
* [[Science necessitates collaboration]]

==Result==

[[Image:Evaluating assessment performance.ppt|slides]]

* Introduction through analogy: what makes a mobile phone good?
** chain from production to use
** goodness from whose perspective, producer or user? Can they fit into one framework?
** phone functionalities (quality of content)
** user interface, appearance design, use context, packaging, logistics, marketing, sales (applicability)
** mass production/customization: components, code, assembly (efficiency)
* Assessments
** serve two masters: truth (science) and practical need (societal decision making, policy)
** must meet the needs of their use
** must strive for truth
** both requirements must be met, which is not easy, but possible
** a business of creating understanding about reality
*** asking the right questions, providing good answers, and getting the (questions and) answers where they are needed
*** getting the questions right (according to need) is primary; getting the answers right is conditional on that
* Contemporary conventions of addressing performance:
** quality assurance/control: process approach
** uncertainty assessment: product approach
* Performance in the context of assessments as science-based decision support: properties of good assessment
** takes the production point of view
** quality of content
*** informativeness, calibration, relevance
** applicability
*** availability, usability, acceptability
** efficiency
** different properties have different points of reference and criteria
** a means of managing design and execution or evaluating past work
** not an orthogonal set: applicability is conditional on quality of content, and efficiency is conditional on both quality of content and applicability
* Methods of evaluation (see the sketch after this list)
** informativeness and calibration - uncertainty analysis, discrepancy analysis between the result and another estimate (assumed to be a gold standard)
** relevance - scope vs. need
** usability - (assumed) intended user opinion, participant rating of the (technical) quality of information objects
** availability - observed access to assessment information by intended users
** acceptability of premises - (assumed) acceptance of premises by intended users, or by others interested or affected
** acceptability of process - scientific acceptability of the definition, given the scope &rarr; peer review
** efficiency - estimation of spent effort (given the outcome)
* How much addressing of these properties is built into open assessment and Opasnet?
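
The evaluation methods listed above can be made concrete with a small numerical sketch. The following Python snippet is illustrative only; the function names, thresholds and numbers are hypothetical and are not prescribed by open assessment. It shows one way to quantify the calibration and informativeness of a probabilistic result, its discrepancy against an estimate assumed to be a gold standard, and an efficiency ratio of output per unit of effort.

```python
# Illustrative only: toy metrics for the evaluation methods listed above.
# All names, levels and figures are hypothetical, not part of open assessment.
import numpy as np

def credible_interval(samples, level=0.9):
    """Equal-tailed credible interval of a sampled assessment result."""
    lo, hi = np.percentile(samples, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return lo, hi

def calibration(samples, reference, level=0.9):
    """1.0 if the gold-standard reference falls inside the credible interval, else 0.0."""
    lo, hi = credible_interval(samples, level)
    return float(lo <= reference <= hi)

def informativeness(samples, level=0.9):
    """Tighter intervals are more informative; report the inverse of the interval width."""
    lo, hi = credible_interval(samples, level)
    return 1.0 / (hi - lo)

def discrepancy(samples, reference):
    """Relative difference between the result's mean and the gold-standard estimate."""
    return abs(np.mean(samples) - reference) / abs(reference)

def efficiency(effectiveness_score, effort_person_days):
    """Efficiency as output (an effectiveness score) per unit of effort."""
    return effectiveness_score / effort_person_days

# Hypothetical result distribution for one variable and an assumed reference value.
rng = np.random.default_rng(1)
result = rng.normal(loc=12.0, scale=2.0, size=10_000)  # e.g. an exposure estimate
reference = 11.5                                        # assumed gold standard

print(calibration(result, reference))   # 1.0 -> the interval covers the reference
print(informativeness(result))          # larger value means a tighter interval
print(discrepancy(result, reference))   # ~0.04 relative discrepancy
print(efficiency(0.8, 40))              # 0.02 "effectiveness per person-day"
```

Which metrics, reference estimates and effort units are appropriate depends on the scope of the assessment, so this should be read as one possible operationalization rather than as the method of open assessment.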
  
Exercises:

{|{{prettytable}}
! Topic
! Task description
|-----
| Quality of content
| Consider the concepts informativeness, calibration and relevance. What is their point of reference: practical need or truth? Which attributes and sub-attributes of assessments and variables do they relate to? Now choose one variable that was developed in the Tuesday and Wednesday exercises. What can you say about its informativeness, calibration and relevance? Then consider the case assessment (ET-CL). What can you say about its informativeness, calibration and relevance? What implications does this have for future work on the variable and the assessment? Give suggestions for improving their quality of content. Think of ways in which assessments are or could be evaluated against these properties in practical terms. Prepare a presentation including a brief introduction to the concepts, an evaluation of the quality of content of the variable and the assessment, practical means of evaluation against these properties, and suggestions for improving the quality of content, as well as reasoning to back up your suggestions.
|-----
| Usability and availability
| Consider the concepts usability and availability. What is the reference point they relate to: practical need or truth? Assuming that the information content of a variable or an assessment were complete, correct and exact, what would make it usable and available, or not usable or available, for someone who needs that particular chunk of information? Now choose one variable that was developed in the Tuesday and Wednesday exercises. What can you say about its usability and availability given the purpose and users defined in the scope of the case assessment (ET-CL)? Then consider the case assessment. What can you say about its usability and availability given the purpose and users defined in the scope? What implications does this have for future work on the variable and the assessment? Give suggestions for improving their usability and availability. Think of ways in which assessments are or could be evaluated against these properties in practical terms. Prepare a presentation including a brief presentation of the concepts and why they are an important aspect of assessment performance, an evaluation of the usability and availability of the variable and the assessment, practical means of evaluation against these properties, and suggestions for improving the usability and availability, as well as reasoning to back up your suggestions.
|-----
| Acceptability
| Consider the concept acceptability and its disaggregation into acceptability of premises and acceptability of process. Which attributes and sub-attributes of variables and assessments does acceptability relate to? Now choose one variable that was developed in the Tuesday and Wednesday exercises. What can you say about its acceptability given the scope of the variable? Then consider the case assessment. What can you say about its acceptability given the scope of the assessment? What aspects are most crucial regarding the acceptability of the assessment? Give suggestions for improving the acceptability of the assessment. Think of ways in which assessments are or could be evaluated against these properties in practical terms. Prepare a presentation including a brief introduction to the concept and why it is an important aspect of assessment performance, an evaluation of the acceptability of the variable and the assessment, practical means of evaluation against these properties, and suggestions for improving the acceptability, as well as reasoning to back up your suggestions.
|-----
| Efficiency
| Consider the concept efficiency. How is efficiency dependent on quality of content and applicability? What else influences the efficiency of assessments? Now explore the case assessment, how it is designed and being executed, and think about its efficiency. If you were responsible for making the assessment and were provided with scarce resources, what would you do in order to get the best outcome? What if you were responsible for making a somewhat related and partially overlapping assessment, e.g. [[Bioher]], [[Claih]], TAPAS or some other, again with scarce resources: what would you tell the manager of ET-CL in order to maximize the efficiency of your assessment? What if you were responsible for a group of overlapping assessments, say ET-CL, [[Claih]] and [[Bioher]], would that change your management decisions? Think of ways in which assessments are or could be evaluated against these properties in practical terms. Prepare a presentation including a brief introduction to the concept and why it is an important aspect of assessment performance, practical means of evaluation against these properties, an explanation of how you as an assessment manager would try to maximize efficiency in the different situations mentioned above, as well as reasoning to back up your hypothetical management decisions.
|-----
| Peer review
| Acquaint yourself with how peer review is defined in the context of open assessment. Consider its similarities with and deviations from your prior perception of what peer review is. Which attributes of variables and assessments does peer review (in open assessment) address, and how? Which properties of good assessments does peer review relate to? Take the position of a peer reviewer and review one variable and the case assessment. Would you accept them? Prepare a presentation including a brief introduction to the method and how it is used in evaluating assessment performance.
|}
 
 
==Archived content==

Evaluating performance is an important issue also in open risk assessment. In the pyrkilo method this means that all objects within the information structure, as well as the process of creating them, are evaluated against their purpose. This presumes that the purpose of each object is identified and defined. The pyrkilo method considers performance evaluation an essential, continuous and integral part of carrying out a risk assessment, not an add-on analysis conducted separately from the risk assessment itself.

The basis of evaluating the performance of an assessment is its purpose. Each assessment should have an identified and explicated purpose, which also influences the purpose of each piece or component (variable) of the assessment (see [[Purpose and properties of good assessments]]). Evaluating performance is in effect evaluating how well the assessments, and their parts, fulfill their purpose. The overall performance of an assessment can be considered a function of the process performance (efficiency) and the performance of the output (effectiveness). In the pyrkilo method, the [[Purpose and properties of good assessments]] are used for evaluating the performance of risk assessments and their outputs. The properties of good risk assessment provide the framework for evaluating, as well as designing and managing, open risk assessments in terms of quality of content, applicability and process efficiency.

'''Efficiency'''

Evaluating efficiency is in principle relatively straightforward. It can be written down as an equation as follows:

::efficiency = output/effort

Efficiency is thus merely a technical measure comparing the performance of the output (effectiveness) to the amount of effort spent in creating that output. The latter determinant (the denominator) is, in principle, quite simply measured or estimated, but the former determinant (the numerator) is a more complicated matter (see the effectiveness section below).

According to the pyrkilo method, evaluating the efficiency of the process is in practice meaningful only at the level of whole assessments, not their parts (variables). This is because assessments are discrete objects with defined starting and ending points in time and with specific contextually and situationally defined goals, whereas variables are seen as continuously existing descriptions of reality that develop over time as knowledge about them increases. Efficiency is thus evaluated only for assessments or groups of assessments, not for individual variables. It should be noted, however, that if a variable is developed in relation to a particular assessment, the effort spent in developing it is considered effort spent within that assessment, and the effectiveness of that variable is likewise evaluated in relation to the goals of that assessment.

'''Effectiveness'''

Evaluating effectiveness is a more complicated and multifaceted issue than evaluating efficiency. Effectiveness is evaluated for the outputs created in an assessment according to their properties with regard to the assessment purpose. It consists of both the quality of content of the outputs and the applicability of the outputs (see [[Purpose and properties of good assessments]] for more detailed descriptions of the properties). Effectiveness thus covers the issues most often discussed under the labels of knowledge quality and uncertainty, and even goes beyond them.

Evaluating performance in terms of quality of content and applicability relates to both assessments and variables. Although the evaluation of performance for both types of objects follows the same lines, there are certain differences that derive from the differences in the primary purposes of these object types and, consequently, their roles within the information structure (see [[Universal products]] and [[Variable]] for more about this issue).

'''Quality of content''' is a measure that describes the relation between real-world phenomena and descriptions of them. The point of reference in evaluating the performance of an object in terms of quality of content is therefore the part of reality that the object attempts to describe. The sub-properties of quality of content are called '''calibration''', '''informativeness''' and '''relevance'''. Quality of content can be considered an objective measure because it is, in principle, merely a comparison between a part of reality and a description of it and does not therefore involve any value judgments.

'''Applicability''' is a measure of the capability of a chunk of information to convey its message, the content of the object, to its intended uses (and users). The point of reference in evaluating the performance of an object in terms of applicability is thus the intended use purpose of the object. The sub-properties of applicability are '''usability''', '''availability''' and '''acceptability'''. Because the reference point is the (instrumental) use purpose, evaluating applicability is sensible only for assessments, not for individual variables. Applicability is related to contextual and situational, and thus also value-laden, settings and is therefore considered a subjective measure.

* differences in evaluating performance of a) assessments and b) variables
* addressing of quality-of-content properties through different attributes
* addressing quality-of-content properties on the variable and assessment levels
* addressing applicability through different attributes of assessments

''warning: old text beyond this point...''

The general purpose of a variable is to describe a particular piece of reality, and the general purpose of an assessment, being a collection of variables, is similarly to describe a particular, but broader, piece of reality as a collection of variables. The ''particular piece of reality'' that each object refers to is defined under the scope attribute of that object. The performance of these objects, in terms of quality of content, is therefore evaluated in relation to the general purpose of the object type and the defined scope of the particular object.

As mentioned above, evaluating the quality of content means comparing a description against reality. This evaluation thus happens as a comparison against the general purpose of assessments and variables, given the defined scope: in other words, how good are the definition and result of an object in describing the particular piece of reality defined in the scope of that object? The goodness can be evaluated or measured in terms of informativeness, calibration and relevance (see [[Help:General properties of good risk assessments]] for an explanation of these sub-properties).

Variables and assessments can also have (and in practice do have) an instrumental purpose. This is related to the use purpose of the object and is evaluated or measured in terms of applicability and its sub-properties, i.e. usability, availability and acceptability. Although the instrumental purpose of an individual variable can be defined, it is always defined on the assessment level. Therefore the applicability properties are only evaluable or measurable for assessments, not for individual variables. The reasoning behind this is explained in more detail in the following paragraphs.

The hierarchical information structure of pyrkilo brings about some interesting aspects for evaluation. As in all ontological structures, a system cannot be defined within the system itself: setting the boundaries or defining the purpose of a system (e.g. an object) can only be done from a higher level. Within the pyrkilo information structure this means in practice that the content of the '''scope''' attribute is always defined, and therefore also evaluated or measured, from the point of view of an object higher in the hierarchy. Variable scopes are thus defined and evaluated in relation to the assessment(s) the variable belongs to, and assessment scopes are similarly defined and evaluated in relation to the context the assessment belongs to. This explains, for example, why applicability can only be defined on the assessment level: the use purpose of an assessment, and of all the variables included in it, is defined by the context where the assessment (output) is intended to be used.

Variables are, however, objects that are supposed to be defined so that they are usable in several assessments (and consequently in several contexts). This is possible because the general purpose (to describe the reality within the defined scope) is constant and does not change, whereas the instrumental purpose may change according to changes in the use purpose within different assessments. This does not mean that the reference point for evaluating performance against the general purpose (i.e. the quality of content) is constant: naturally, as the reality within the scope changes, the description of it should change accordingly.

In a somewhat similar manner as was explained for applicability, efficiency can only be evaluated or measured from outside an object itself, because the effort and resources spent in creating or maintaining a variable or assessment are external to the object. Therefore, efficiency measures are only relevant for assessments, and for variables as parts of assessments, not for individual variables.

What is here referred to as quality of content is somewhat similar to what is often referred to as '''uncertainty''' in other contexts. However, the use of the term uncertainty varies a lot, and sometimes issues seen here as belonging to applicability are also considered under the label of uncertainty. Most considerations of uncertainty are compatible with this approach, which builds on an ontological, hierarchical information structure, and can therefore be attached to it. This approach attempts to provide a more solid basis for identifying the sources of uncertainty and their points of reference, as well as the important differences in the nature of different types of uncertainty.

Quality of content can (and should) be evaluated or measured for any attribute of variables and assessments. Definition and result are evaluated within the object itself and in relation to the reality defined by the scope of the object. Scope is always evaluated as a part of the definition attribute of the higher-level object that the particular object belongs to. Scope is thus always evaluated in relation to the instrumental purpose, not to reality. This makes sense, because a boundary in a description does not have a true counterpart in reality; it could even be said that any boundary definition is arbitrary, although it may be practically sensible.
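
The archived text above argues that efficiency is meaningful only at the assessment level and that the effort spent on a variable counts toward the assessment in which it was developed. A minimal bookkeeping sketch of that idea, with hypothetical class names and figures that are not part of the pyrkilo method itself, could look like this:

```python
# Illustrative sketch of assessment-level efficiency bookkeeping.
# Class names, fields and figures are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str
    effort_person_days: float      # effort spent developing this variable

@dataclass
class Assessment:
    name: str
    effectiveness_score: float     # judged against the assessment's own purpose
    overhead_person_days: float    # scoping, management, reporting, etc.
    variables: list = field(default_factory=list)

    def total_effort(self):
        # Variable effort counts toward the assessment it was developed in.
        return self.overhead_person_days + sum(v.effort_person_days for v in self.variables)

    def efficiency(self):
        # efficiency = output/effort, evaluated only at the assessment level.
        return self.effectiveness_score / self.total_effort()

et_cl = Assessment("ET-CL (example)", effectiveness_score=0.8, overhead_person_days=15)
et_cl.variables += [Variable("exposure estimate", 20), Variable("dose-response", 10)]

print(et_cl.total_effort())   # 45 person-days
print(et_cl.efficiency())     # ~0.018
```

Comparing such ratios across partially overlapping assessments is the setting of the Efficiency exercise above.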
 