Difference between revisions of "SIPs and SLURPs"

From Testiwiki

Revision as of 08:56, 28 October 2010

A stochastic information packet (SIP) is a format for describing random samples from probability distributions. A SIP is essentially a Monte Carlo sample of possible values, using a standard sample size, with a distribution representative of the possible outcomes. Importantly, the SIP is treated as the representation of both the value and the uncertainty of the quantity. To capture relationships between quantities, multiple SIPs are bundled into a SLURP (Stochastic Library Unit with Relationship Preserved). SIPs and SLURPs may be exchanged between people within an organization and used directly in decision models. By managing a standardized set of SIPs and SLURPs, an organization can combine probabilistic estimates from different groups coherently within its models.

The sample values within SIPs and SLURPs appear in a random order, as would be the case in a Monte Carlo sample, but the specific ordering of the samples is critical: it captures the relationships between quantities. Suppose one SIP represents the remaining cost to complete a construction project and another SIP is the remaining time to completion. In scenarios, or samples, with an exceptionally low cost, the remaining time will also usually be small. Likewise, cost overruns usually coincide with delays. These two SIPs are coherent when the ordering of samples captures this relationship, meaning that the nth point of remaining_cost corresponds to the same scenario as the nth point of remaining_time. Coherence in this sense captures correlation between the quantities as well as more subtle dependencies that correlations alone may not reveal. Remaining cost and time are SIPs that should be bundled within the same SLURP. [1]
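The coherence requirement can be illustrated with a minimal Python sketch. The function name `shuffle_together` and the sample numbers are hypothetical and chosen only for illustration; the point is that reordering is harmless as long as every SIP in the SLURP is reordered with the same permutation.

```python
import random

def shuffle_together(slurp, seed=0):
    """Reorder every SIP in a SLURP with one shared permutation.

    `slurp` is assumed to be a dict mapping SIP names to equal-length lists.
    """
    n = len(next(iter(slurp.values())))
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return {name: [values[i] for i in order] for name, values in slurp.items()}

# Illustrative numbers only: scenario k pairs the k-th cost with the k-th time.
slurp = {
    "remaining_cost": [1.0, 2.5, 4.0, 7.5],
    "remaining_time": [3, 5, 8, 14],
}
shuffled = shuffle_together(slurp)

pairs_before = set(zip(slurp["remaining_cost"], slurp["remaining_time"]))
pairs_after = set(zip(shuffled["remaining_cost"], shuffled["remaining_time"]))
```

Shuffling each SIP independently would keep both marginal distributions intact but destroy the pairing between cost and time, which is exactly the information a SLURP is meant to preserve.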

Scope

SIPs and SLURPs are based on the commercial DIST 1.1 standard by ProbiliTech. However, the same idea of packing random samples while retaining their original order can be implemented by other means. What is a good way of packing random samples that is not bound by commercial standards?

Definition

Input

The method should take in a random sample of values (or text) and pack it efficiently with minimal loss of information. The user should be able to adjust the critical parameters, for example:

  • The rounding precision prec (2 = two decimal digits, 0 = integer, -1 = rounded to tens)
  • The smallest value sampled min
  • The largest value sampled max
  • Number n of bins used. The default is 256 (2^8), which is used if n is omitted. However, prec, min, and max may constrain the number of possible values, and if that number is smaller than n, the smaller number will be used.
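How prec, min, and max can cap the number of bins is sketched below in Python. The function name `effective_bins` and the counting rule (the number of distinct values between min and max at the given rounding precision) are assumptions made for illustration, not part of the method as specified.

```python
def effective_bins(prec, lo, hi, n=256):
    """Return the number of bins actually used.

    Assumption: the number of possible values is the count of distinct
    numbers between lo and hi when rounded to `prec` decimal digits.
    """
    possible = round((hi - lo) * 10 ** prec) + 1
    return min(n, possible)

default_case = effective_bins(2, 260.47, 294.37)   # many possible values, n wins
coarse_case = effective_bins(-1, 0, 100)           # only 0, 10, ..., 100 exist
```

With two decimal digits the range 260.47 to 294.37 allows far more than 256 values, so the default n applies; rounding to tens between 0 and 100 leaves only 11 possible values, so 11 bins are used.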

Output

A text string with all necessary information to unpack the sample should be the output.

  • An identifier for a SIP: SAMPLE
  • The parameter values for the four parameters.
  • If the distribution is a probability table with text values, a list of all possible values is given with sequence numbers.
  • The packed sequence of random draws. If n is 256, 8 bits will be used for each draw. These are mapped to characters with code values 33-288. Only codes 33-126 are printable ASCII, so how the higher codes are handled depends on the character encoding in use. Some codes may need to be excluded, for example asc(256) and the separator "|". For efficient packing, n should be equal to or slightly smaller than some power of 2, as n = 257 and n = 512 both take 9 bits.
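The relation between n and the number of bits per draw can be checked with a short Python sketch. The helper name `bits_per_draw` is hypothetical.

```python
def bits_per_draw(n):
    """Smallest number of bits that can encode the bin indices 0 .. n-1."""
    return max(1, (n - 1).bit_length())

sizes = {n: bits_per_draw(n) for n in (16, 256, 257, 512)}
```

Here n = 257 costs as many bits as n = 512, which is why n just above a power of 2 wastes almost half of the available codes.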

For example, the output can look like this:

SAMPLE|prec=2|min=260.47|max=294.37|n=16||eKu8W)=εñ"-$§▼eT4i.Mî║|

With n = 16, each draw takes 4 bits, i.e. two draws per character. This example has 23 characters and therefore contains 46 draws. The bar | is used as a separator between parameters.
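Reading such a string back could look like the following Python sketch. The function name `parse_sip`, the returned structure, and the short ASCII payload `AbC` are assumptions for illustration; the sketch also assumes that the double bar `||` separates the parameters from the packed draws and that a single trailing bar closes the string, as in the example above.

```python
def parse_sip(text):
    """Split a packed SIP string into its parameters and its payload."""
    header, sep, rest = text.partition("||")
    fields = header.split("|")
    if not sep or fields[0] != "SAMPLE":
        raise ValueError("not a packed SAMPLE string")
    raw = dict(field.split("=", 1) for field in fields[1:])
    params = {
        "prec": int(raw["prec"]),
        "min": float(raw["min"]),
        "max": float(raw["max"]),
        "n": int(raw.get("n", 256)),  # default of 256 bins if n is omitted
    }
    payload = rest[:-1] if rest.endswith("|") else rest
    return params, payload

# Hypothetical payload "AbC" stands in for the packed draws.
params, payload = parse_sip("SAMPLE|prec=2|min=260.47|max=294.37|n=16||AbC|")
```

This only works if the payload itself never contains the bar character, which is one reason the output definition excludes "|" from the packed sequence.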

Procedure

The method is based on an assumption of Latin hypercube sampling. This means that the numbers drawn from a distribution are not treated as free random values; instead, the whole distribution is divided into n bins that are equally spaced and have different probabilities. In effect, the distribution is treated as a frequency distribution with x_1 observations from bin 1, x_2 from bin 2, ... and x_n from bin n. These counts are deterministic given the distribution, but the observations are shuffled randomly. When the minimum, the maximum, and the number of bins are known, the values can be deduced. The packed part of the SIP therefore only contains the order in which values from the different bins occur.
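A round trip through packing and unpacking might be sketched in Python as follows, for the n = 16 case with two draws per character. All function names are hypothetical. The sketch assumes that bin i stands for the value min + i * (max - min) / (n - 1), that the first draw of each pair occupies the high 4 bits, and that the number of draws is even; it does not handle excluded characters such as "|".

```python
OFFSET = 33  # first character code used, as in the output definition

def to_bins(sample, lo, hi, n):
    """Map each value to the index of the nearest of n equally spaced points."""
    step = (hi - lo) / (n - 1)
    return [min(n - 1, max(0, round((x - lo) / step))) for x in sample]

def from_bins(bins, lo, hi, n, prec):
    """Recover the representative value of each bin index."""
    step = (hi - lo) / (n - 1)
    return [round(lo + b * step, prec) for b in bins]

def pack16(bins):
    """Pack pairs of 4-bit bin indices into single characters."""
    if len(bins) % 2:
        raise ValueError("this sketch needs an even number of draws")
    pairs = zip(bins[0::2], bins[1::2])
    return "".join(chr(OFFSET + (a << 4 | b)) for a, b in pairs)

def unpack16(text):
    """Inverse of pack16: two bin indices per character."""
    bins = []
    for ch in text:
        code = ord(ch) - OFFSET
        bins.extend((code >> 4, code & 15))
    return bins

bins = to_bins([260.47, 294.37, 275.0, 280.0], 260.47, 294.37, 16)
packed = pack16(bins)
restored = from_bins(unpack16(packed), 260.47, 294.37, 16, prec=2)
```

Only the bin indices survive the round trip, so the restored values are the bin representatives rather than the original draws; the loss of information is bounded by half the bin spacing.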

See also

References

1. SIPs and SLURPs with Analytica. http://www.lumina.com/ana/SIPsandSLURPs.htm