Figure 2

Figure 2: Parameter estimates with varying n for variables under the H₁ hypothesis. This figure concerns only estimates obtained on identification sets. Each of the four panels corresponds to one sample size, n ∈ {100, 200, 400, 1000}. In every panel, two distributions are plotted: (1) the distribution of the estimates for the Ω_p1 variables computed over 200 identification sets (grey histogram); (2) the distribution of the estimates for the Ω_S set of variables computed over 200 identification sets (histogram with horizontal hatching). The continuous vertical line indicates 0.2.
With n=100, estimates for Ω_p1 are highly variable, as shown by the wide distribution. Variables are selected from the extremes of the distribution of the Ω_p1 estimates, so the mean estimate of the selected variables is far from its true mean. As the sample size increases, the mean of the Ω_S estimates tends toward the mean of the Ω_p1 estimates. This illustrates the regression-to-the-mean phenomenon, which leads to overestimation of the strength of association for true positives.
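
The selection effect described above can be reproduced with a small simulation. The sketch below is not the procedure used to produce Figure 2. It rests on several assumptions: the true effect of every H₁ variable is 0.2 (the value marked by the vertical line), each estimate is normally distributed around that value with standard error 1/√n, and a variable enters the selected set when a two-sided z-test at the 5% level rejects. The names `TRUE_EFFECT`, `N_VARS`, `Z_CRIT`, `simulate` and `run` are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = 0.2   # assumption: the value marked by the vertical line
N_SETS = 200        # number of identification sets, as in the caption
N_VARS = 50         # hypothetical number of H1 variables per set
Z_CRIT = 1.96       # hypothetical selection rule: two-sided 5% z-test


def simulate(n, rng):
    """Return (mean of all estimates, mean of selected estimates) for sample size n."""
    se = 1.0 / n ** 0.5  # assumed standard error of a standardized effect
    all_estimates = []
    selected = []
    for _ in range(N_SETS * N_VARS):
        estimate = rng.gauss(TRUE_EFFECT, se)
        all_estimates.append(estimate)
        # Keep the variable only if its estimate is "significant".
        if abs(estimate) / se > Z_CRIT:
            selected.append(estimate)
    return statistics.fmean(all_estimates), statistics.fmean(selected)


def run(seed=0):
    """Run the simulation for the four sample sizes shown in the figure."""
    rng = random.Random(seed)
    return {n: simulate(n, rng) for n in (100, 200, 400, 1000)}


if __name__ == "__main__":
    for n, (mean_all, mean_sel) in run().items():
        print(f"n={n:4d}  mean(all)={mean_all:.3f}  mean(selected)={mean_sel:.3f}")
```

Under these assumptions, the mean over all estimates stays close to 0.2 at every sample size, while the mean over the selected estimates is clearly above 0.2 for n=100 and approaches 0.2 as n grows, because almost every variable passes the selection threshold once the standard error is small.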