Characteristics of the Variance Estimator used for HPR MC


Date: 17/07/09

When the noise sigma is estimated on the HPR of white noise (see MC HPR), a systematic bias is observed. Hereafter is a toy model to see whether this effect can be reproduced and whether another, unbiased estimator can be found.

CONCLUSION:
The effect has been reproduced, even though its origin is not understood. It appears that the best way to estimate sigma is to use the variance routine of IDL.

Characteristics of the data:

The code generates an array of Gaussian white noise of roughly 500000 elements, corresponding to one ring of the TOI.
The HPR is also modelled, with 4500 bins.
These are the characteristic values for a TOI and an HPR.
The number of TOI samples per HPR bin is drawn randomly from a Gaussian distribution.
For each bin, the TOI data corresponding to that HPR bin are averaged and this mean value is the noise in the HPR. In the code these TOI data are consecutive; this is not the case in Planck simulations, because the TOI data in a bin come from different circles of a ring and so are not time-consecutive.
After that, the HPR is unweighted before the noise sigma is estimated on the HPR data. This unweighting step consists in multiplying the HPR value in a bin by the square root of the number of samples (hits) in that bin. The sigma of this rescaled random variable is then the per-sample sigma (the TOI noise sigma).
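
To make the construction concrete, here is a minimal IDL sketch of the steps described above. It is illustrative only, not the original code: the procedure name, the width of the hit-count distribution and the bookkeeping details are assumptions of this sketch.

  ; Toy model sketch (illustrative): build a TOI of Gaussian white noise,
  ; average it into an HPR and "unweight" the HPR bins.
  pro make_toy_hpr, seed, toi, hpr, hpr_u
    nsample = 500000L                 ; number of TOI samples in one ring
    nbin    = 4500L                   ; number of HPR bins

    toi = randomn(seed, nsample)      ; N(0,1) white noise TOI

    ; Hits per bin: Gaussian around the mean hit count, forced to be >= 1
    ; (the width of 10 is an arbitrary choice for this sketch)
    hits = round(float(nsample)/nbin + 10.*randomn(seed, nbin)) > 1L

    hpr   = dblarr(nbin)
    hpr_u = dblarr(nbin)
    i0 = 0L
    for ibin = 0L, nbin-1 do begin
       n = hits[ibin] < (nsample - i0)              ; do not run past the end of the TOI
       if n le 0 then break
       hpr[ibin]   = mean(toi[i0:i0+n-1], /double)  ; binned (averaged) noise: sigma ~ 1/sqrt(n)
       hpr_u[ibin] = hpr[ibin]*sqrt(n)              ; unweighted bin: back to the TOI sigma
       i0 = i0 + n
    endfor
  end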

These random variables follow a normal law N(0,1). Sigma can be estimated with two methods (a code sketch of both follows the list):

  • Method 1: make a histogram of the noise and fit it with a Gaussian. This is what was done on the real HPR first, but this estimator seems to be biased.
  • Method 2: simpler: compute the variance of the noise directly.
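
As an illustration, the two estimators could be written in IDL as follows (the bin size and variable names are arbitrary choices for this sketch; x stands for the noise vector, e.g. the TOI or the unweighted HPR):

  ; Method 1: histogram of the noise + Gaussian fit
  h   = histogram(x, binsize=0.1, locations=edges)
  xc  = edges + 0.05                       ; bin centres
  fit = gaussfit(xc, h, coeff, nterms=3)   ; coeff = [height, centre, sigma]
  sigma_m1 = coeff[2]

  ; Method 2: direct variance of the noise
  sigma_m2 = sqrt(variance(x, /double))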

Hereafter are the characteristics of the Toy Model:

Data simulated with IDL
The noise is Gaussian white noise of mean 0 and standard deviation 1: N(0,1)

nsample in TOI: 500000
nbin in HPR: 4500
initial seed for the Gaussian noise generator: 12345

Example of noise created:

[Figures: TOI white noise | Number of samples per bin in the simulated HPR | HPR of white noise | Unweighted HPR]
[Figures: Histogram of TOI WN | Histogram of HPR | Histogram of unweighted HPR]
[Figures: Residual of the TOI WN histogram | Residual of the HPR histogram | Residual of the unweighted HPR histogram]

Comment:
-The plot of the number of samples per bin in an HPR differs from the real HPR, but this difference of distribution should not affect the results. (The results do not change much when the distribution is uniform instead of Gaussian.)
-The distribution of the number of samples per bin in an HPR is not exactly Gaussian because it must be greater than or equal to 1; this is enforced here.
-The histogram of the TOI noise is very close to a Gaussian. One can see on the residual that the difference between the fit and the histogram has the amplitude of the expected error.
-Same observation for the unweighted HPR. The fit is not as good as for the TOI because there are fewer elements in the unweighted HPR (roughly 100 times fewer; you can check this by looking at the histogram amplitudes, which differ by a factor of ~100).
-The histogram of the (weighted) HPR is not Gaussian at all, because the value in each bin follows a different random law due to the varying number of samples per bin.



Behaviour of the Estimator:

In order to compare with what is obtained in MC HPR, different white noise realisations are created by changing the seed of the IDL Gaussian random generator; for each one, sigma is estimated with the two methods, both on the TOI and on the unweighted HPR.

With 1000 simulations of white noise (each simulation has 500000 elements in the TOI and 4500 bins in the HPR):
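
A sketch of the corresponding Monte-Carlo loop, reusing the make_toy_hpr procedure and the two estimators sketched above (all names are illustrative assumptions of this sketch):

  nsim   = 1000L
  seed   = 12345L                      ; initial seed, updated by randomn at each call
  sig_m1 = dblarr(nsim)
  sig_m2 = dblarr(nsim)
  for isim = 0L, nsim-1 do begin
     make_toy_hpr, seed, toi, hpr, hpr_u
     ; method 1 (histogram + Gaussian fit) on the unweighted HPR
     h   = histogram(hpr_u, binsize=0.1, locations=edges)
     fit = gaussfit(edges + 0.05, h, coeff, nterms=3)
     sig_m1[isim] = coeff[2]
     ; method 2 (variance) on the unweighted HPR
     sig_m2[isim] = sqrt(variance(hpr_u, /double))
  endfor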

[Figures, one panel per cell: rows = Data of the TOI, Data of the HPR (unweighted); columns = Sigma with Method 1, Error on sigma estimation, Variance with Method 2, Sigma with Method 2]

Comment:
-The first conclusion concerns the TOI data: the two methods give good results, with no bias. The dispersion is the same (~1e-3) for both methods and is consistent with the error on sigma estimated by method 1.
-BUT, for the HPR data, the first method is biased while the second is not. More importantly, sigma is under-estimated by 1-2%, exactly as observed on real data.
-The second method (using the variance routine of IDL) is therefore FINE, and this method will be used for the real data!
-With the first method, there is a bias for the HPR but not for the TOI. It may be linked to the smaller number of elements in the HPR compared to the TOI (this is tested below).

TO DO NEXT:
- See whether the bias of the first method appears on the TOI data when there are fewer data points
- Accumulate the data, as in MC HPR, to see how the bias evolves.



Bias in the first method:

The same simulation is done (1000 noise realisations) but with a smaller and a larger number of samples in the TOI and in the HPR:

nsample in TOI: 5000 (small case) and 50000000 (large case)
nbin in HPR: 450 (small case) and 4500000 (large case)
[Figures, one panel per cell: rows = Data of the TOI, Data of the HPR (unweighted); columns = Sigma with Method 1 and Sigma with Method 2, for nsample in TOI = 5000 (left) and nsample in TOI = 50000000 (right)]

Comment:
- When the TOI has few elements, the estimation of sigma with method 1 is biased, and the bias is the same as the one seen above for the HPR, because they have roughly the same number of elements (4500 for the HPR above, 5000 for the TOI here).
- When the HPR and TOI have more elements, the estimation of sigma with method 1 is better.
- BUT one can see that the estimation of sigma with method 2 (variance) is very bad when the number of samples is large, whereas it should be better. The same problem appears when data are accumulated. It is due to numerical precision; using double precision solves the problem. See below: precision problem.



Accumulation of Data:

This time there are 150 noise realisations, but each new realisation is concatenated with all the previous ones, so one can follow the convergence of the estimated sigma. Each TOI noise realisation has 500000 elements and each HPR noise realisation has 4500 bins.
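
A sketch of this accumulation loop (illustrative; acc_toi and acc_hpr are hypothetical names holding the concatenation of all previous realisations):

  nreal = 150L
  seed  = 12345L
  for ireal = 0L, nreal-1 do begin
     make_toy_hpr, seed, toi, hpr, hpr_u
     if ireal eq 0 then begin
        acc_toi = toi  &  acc_hpr = hpr_u
     endif else begin
        acc_toi = [acc_toi, toi]  &  acc_hpr = [acc_hpr, hpr_u]
     endelse
     ; sigma estimated on all the data accumulated so far (method 2 shown here)
     sig_toi = sqrt(variance(acc_toi, /double))
     sig_hpr = sqrt(variance(acc_hpr, /double))
  endfor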

[Figures, one panel per cell: rows = Data of the TOI, Data of the HPR (unweighted); columns = Sigma with Method 1, Error on sigma estimation, Variance with Method 2, Sigma with Method 2]

Comment:
- This time the first method seems better than the second one: for the TOI, the second method diverges. This is actually due to numerical precision (see below: precision problem). Nevertheless, method 1 seems to converge, but not toward the expected value; this time there is a positive bias. The same behaviour is seen on real data.
- The results for the HPR are less biased and seem to converge to the correct value.

When the precision problem is solved (the '/double' keyword has to be added in every IDL routine: variance, total, mean, stddev), method 2 gives:

[Figures, one panel per cell: rows = Data of the TOI, Data of the HPR (unweighted); columns = Sigma with Method 1, Error on sigma estimation, Variance with Method 2, Sigma with Method 2]

Comment:
- The second method is better for the TOI and equivalent for the HPR.
- The dispersion for the TOI and the HPR with method 2 is perfectly understandable:
One can see on the graphs that the dispersion is around 1e-4 for the TOI and 1e-3 for the HPR. This corresponds to the expected dispersion:
For the TOI, the number of samples used to work out the variance is N_TOI ~ 150*500000 = 7.5e7, taken as ~5e7 for the rough estimate below.
For the HPR it is N_HPR ~ 150*4500 = 6.75e5.
The variance of the variance estimator is known to be Var{variance estimator} ~ 2/N.
Thus, for the TOI: V{V} ~ 2/5e7 ~ 4e-8, so Sig{V} ~ sqrt(V{V}) ~ 2e-4.
On the graph, the TOI dispersion for sigma was estimated at 1e-4, so the dispersion for the variance is 2e-4:
indeed, sigma_estimated = 1 + u (see plot above) with u ~ 1e-4, so variance_estimated = sigma_estimated^2 = (1+u)^2 ~ 1 + 2u. The dispersion of the variance is therefore twice the dispersion of sigma, i.e. 2*1e-4 = 2e-4.
One can conclude that the two estimates of the dispersion agree.
The same holds for the HPR: there are two orders of magnitude (a factor 100) between the numbers of samples in the TOI and the HPR, so one order of magnitude (a factor 10) between the dispersions: 1e-4 for the TOI and 1e-3 for the HPR.
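
The 2/N scaling and the factor of two between the sigma and variance dispersions can be checked numerically with a small side test; the sample size and number of realisations below are arbitrary choices, much smaller than in the toy model above:

  n    = 10000L          ; samples per realisation
  nmc  = 2000L           ; number of realisations
  seed = 42L
  v = dblarr(nmc)  &  s = dblarr(nmc)
  for i = 0L, nmc-1 do begin
     x    = randomn(seed, n)
     v[i] = variance(x, /double)      ; variance estimate
     s[i] = sqrt(v[i])                ; sigma estimate
  endfor
  print, 'measured Var{V}:', variance(v, /double), '  expected 2/N:', 2d/n
  print, 'stddev(sigma):  ', stddev(s, /double), '  stddev(V):    ', stddev(v, /double)
  ; stddev(V) should come out roughly twice stddev(sigma), since V = (1+u)^2 ~ 1 + 2u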

NOW, the sigma will be estimated by the Variance routine of IDL.



Precision problem:

The method based on the IDL variance routine was not good when the number of samples is very large. This is probably due to numerical truncation; when double precision is used, the problem is solved. See below another toy model showing the effect of the data type.

In this toy model, the number of samples in the white noise realisation increases and each time the variance is worked out, once without the '/double' keyword and once with it (a sketch is given below).
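
A sketch of this test in IDL (the range of sample sizes is an arbitrary choice for this sketch; the noise is kept in single precision so that only the accumulation inside the variance computation changes):

  seed = 12345L
  for p = 4, 7 do begin
     n = 10L^p                                      ; 1e4, 1e5, 1e6, 1e7 samples
     x = randomn(seed, n)                           ; single-precision white noise
     print, n, variance(x), variance(x, /double)    ; without and with /double
  endfor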

[Figures: Without /double | With /double]

Comment:
- One can see on the first graph that the estimated value diverges from the expected value of 1, whereas on the second graph the estimator stays around 1, does not appear biased, and its dispersion decreases as expected.