Figure 3: Parameter estimates for variables under the H0 hypothesis, for varying sample size n. This figure concerns only estimates obtained on identification sets. Each of the four panels corresponds to one sample size: n = 100, 200, 400, or 1000. Each panel shows two distributions: (1) the estimates for the Ωp0 variables obtained over 200 identification sets (grey histogram); (2) the estimates for the variables in the ΩV sets obtained over 200 identification sets (histogram with horizontal hatching). The vertical continuous line indicates 0.2.
With n=100, the estimates for the Ωp0 variables fluctuate strongly, as shown by the wide distribution. Variables are selected from the extremes of the distribution of Ωp0 estimates, so the mean estimates of the ΩV variables are far from their true means. As the sample size increases, the distribution of the estimates for the Ωp0 variables gets narrower and the mean of the distribution of the ΩV variable estimates decreases. This illustrates the regression to the mean phenomenon, which leads to the inappropriate selection of some FP variables that in fact have no effect on survival.
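The selection effect described above can be reproduced with a small simulation. The sketch below is not the procedure used to produce Figure 3; it is a simplified illustration under stated assumptions: each null estimate is drawn from a normal distribution centred on 0 with standard error 1/sqrt(n), and a variable is "selected" when its absolute estimate exceeds 1.96 standard errors. The function name `mean_selected_null_estimate` and its parameters are hypothetical.

```python
import random
import statistics


def mean_selected_null_estimate(n, n_draws=20000, z_threshold=1.96, seed=0):
    """Mean absolute estimate among null variables that pass selection.

    Assumptions (illustrative only, not taken from the paper):
    - the true effect of every variable is 0 (H0 holds);
    - each estimate is normal with standard error 1 / sqrt(n);
    - a variable is selected when |estimate| > z_threshold * standard error.
    """
    rng = random.Random(seed)
    se = 1.0 / n ** 0.5
    # Estimates for null variables: centred on 0, narrower as n grows.
    estimates = [rng.gauss(0.0, se) for _ in range(n_draws)]
    # Selection keeps only the tails of that distribution.
    selected = [abs(b) for b in estimates if abs(b) > z_threshold * se]
    return statistics.fmean(selected)


results = {n: mean_selected_null_estimate(n) for n in (100, 200, 400, 1000)}
for n, mean_abs in results.items():
    print(f"n={n:5d}  mean |estimate| among selected null variables: {mean_abs:.3f}")
```

Under these assumptions the selected estimates are always away from 0 even though the true effect is 0, and their mean shrinks as n grows, which is the qualitative pattern described for the ΩV variables.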