Regenerating distributions
From Testiwiki
This page is a method.
The page identifier is Op_en6224 |
---|
Moderator:Jouni (see all) |
Give your opinion to the peer rating of the content of this page. |
Upload data
|
Contents
Question
How to regenerate a probability distribution from data that only contains mean, sd or variance, and a few quantiles?
Answer
Rationale
This Matlab code was obtained from Patrycja Gradowska
[1].
% This script recreates the probability distribution of a random variable whose % only mean, variance/standard deviation and a couple of percentiles are known. % We start by constructing the initial cumulative distribution function (CDF) % of the variable by linear interpolation between known percentiles % (including lower and upper limits defined by the intrinsic range). % In general, this distribution will not reproduce the mean and % standard deviation as provided by the data. Therefore, the constrained optimization % is further used to find the CDF that satisfies % the percentiles given and whose mean and variance are as close as possible % to the mean and variance from the data. clear all; clc; % ----- Input data: known summary statistics of the variable distribution ----- % (1) mean and standard deviation EX = 3.538520795; SdX = 4.046127233; VarX = SdX^2; % (2) 5th, 25th, 50th, 75th and 95th percentiles Q = [0.486401325 0.606605988 1.945751342 4.944405412 10.52461027]; q05 = Q(1); q25 = Q(2); q50 = Q(3); q75 = Q(4); q95 = Q(5); % ----- Determine the intrinsic range ----- ql=Q(1)-0.1*(Q(5)-Q(1)); qu=Q(5)+0.1*(Q(5)-Q(1)); if ql<0 %Intake is positive among users/consumers ql=0.0001; end % ----- Linear interpolation among successive percentiles ----- Y = zeros(1001,1); Y(1:51) = interp1([0.03 50],[ql,q05],[0.03 1:50],'linear'); Y(51:251) = interp1([50 250],[q05,q25],[50:250],'linear'); Y(251:501) = interp1([250 500],[q25,q50],[250:500],'linear'); Y(501:751) = interp1([500 750],[q50,q75],[500:750],'linear'); Y(751:951) = interp1([750 950],[q75,q95],[750:950],'linear'); Y(951:1001) = interp1([950 999.98],[q95,qu],[950:999 999.98],'linear'); % ----- Optimization part ----- diff=@(Y)((mean(Y)-EX)^2+(var(Y)-VarX)^2); %Objective function % ----- Optimization constraints ----- u=-1*ones(1000,1); A=eye(1001)+diag(u,1); A(1001,1001)= 0; b=zeros(1,1001); bb=[zeros(50,1);1;zeros(199,1);1;zeros(249,1);1;zeros(249,1);1;zeros(199,1);1;zeros(50,1)]; Aeq=diag(bb); beq=[zeros(50,1);Q(1);zeros(199,1);Q(2);zeros(249,1);Q(3);zeros(249,1);Q(4);zeros(199,1);Q(5);zeros(50,1)]; Y0=Y; %Starting point for optimization lb=0.0001*ones(1,1001); %Lower bound for distribution percentiles ub=qu*ones(1,1001); %Upper bound for distribution percentiles options=optimset('LargeScale','off','Display','iter', 'MaxFunEval', 500000, 'MaxIter', 500000);% Options for Matlab solver [sol,fval,exitflag,output] = fmincon(diff,Y0,A,b,Aeq,beq,lb,ub,[],options); % ----- Saving data into a .dis file ----- data = [[0.03 1:999 999.98]' sol]; save BS_Herring_Pregnant_Women.dis data /ascii
See also
References
- ↑ Patrycja Gradowska: Regenerating distributions. Personal communication, email 21.4.2009