Figure 2: Parameter estimates for variables under the H1 hypothesis, for varying sample size n. The figure shows only estimates obtained on identification sets. Each of the four panels corresponds to one sample size, n ∈ {100, 200, 400, 1000}. Every panel shows two distributions: (1) the estimates for the Ωp1 variables, computed over 200 identification sets (grey histogram); (2) the estimates for the ΩS set of variables, computed over 200 identification sets (histogram with horizontal hatching). The vertical continuous line indicates 0.2.
With n=100, the estimates for Ωp1 vary widely, as shown by the broad distribution. Variables are selected from the extremes of the distribution of Ωp1 estimates, so the mean of the selected estimates lies far from the true mean. As the sample size increases, the mean of the ΩS estimates tends toward the mean of the Ωp1 estimates. This illustrates the regression-to-the-mean phenomenon, which leads to over-estimation of the strength of association for true positives.
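The selection effect described above can be reproduced with a small simulation. The Python sketch below is an illustration only, not the procedure used to produce Figure 2. The single-predictor linear model, the selection threshold `t_crit`, and the function name `simulate_selection` are assumptions introduced here. The sample sizes and the 200 identification sets come from the caption, and 0.2 is assumed to be the true effect size.

```python
import numpy as np

def simulate_selection(n, beta=0.2, n_sets=200, t_crit=3.0, seed=0):
    """Return (all estimates, estimates that pass the selection rule)."""
    rng = np.random.default_rng(seed)
    # One predictor per identification set; y depends on x with slope beta.
    x = rng.standard_normal((n_sets, n))
    y = beta * x + rng.standard_normal((n_sets, n))

    # Ordinary least-squares slope and its standard error, one per set.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    slope = (xc * yc).sum(axis=1) / sxx
    resid = yc - slope[:, None] * xc
    se = np.sqrt((resid ** 2).sum(axis=1) / (n - 2) / sxx)

    # Hypothetical selection rule (an assumption): keep a set only if
    # its t-statistic exceeds t_crit.
    selected = slope / se > t_crit
    return slope, slope[selected]

for n in (100, 200, 400, 1000):
    all_est, sel_est = simulate_selection(n)
    sel_mean = sel_est.mean() if sel_est.size else float("nan")
    print(f"n={n:4d}  mean(all)={all_est.mean():.3f}  "
          f"mean(selected)={sel_mean:.3f}  kept={sel_est.size}/{all_est.size}")
```

Under these assumptions, the mean of the selected estimates should exceed the mean of all estimates at small n, and the gap should shrink as n grows, which mirrors the convergence of the ΩS and Ωp1 distributions across the panels.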