Figure 5: Parameter estimates for all selected variables. Top: n={100; 1000}; bottom: n={1000; 100}, for the identification and validation sets, respectively. Two distributions are plotted: (1) the distribution of the estimates for the ΩR sets over the 200 identification datasets (histogram with horizontal hatching); (2) the distribution of the estimates for the ΩR sets over the 200 × 50 validation datasets (histogram with diagonal hatching). The vertical dotted line indicates the mean of the latter distribution. The vertical continuous line indicates 0.2.
With n=100 in the identification datasets, the strength of association is poorly estimated, and the estimates are far lower on the validation datasets, even on large ones (n=1000). With n=1000 in the identification datasets, however, the strength of association is correctly estimated; as a consequence, it is confirmed on the validation sets, whatever their size.
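The contrast between the two panels can be illustrated with a small simulation. The sketch below is not the procedure used in this study: it assumes a single predictor, a linear model with unit-variance Gaussian noise, a true coefficient of 0.2 (taken from the reference line in the figure), and selection by a two-sided z-test at |z| > 1.96. The names `ols_slope` and `simulate` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

TRUE_BETA = 0.2  # assumed true effect, matching the 0.2 reference line


def ols_slope(x, y):
    """Return the no-intercept OLS slope and its standard error."""
    sxx = np.sum(x * x)
    b = np.sum(x * y) / sxx
    resid = y - b * x
    se = np.sqrt(np.sum(resid ** 2) / (len(x) - 1) / sxx)
    return b, se


def simulate(n_ident, n_valid, n_ident_sets=200, n_valid_sets=50, seed=0):
    """Mean estimate on identification sets where the variable was
    selected, and mean estimate on the matching validation sets."""
    rng = np.random.default_rng(seed)
    ident_est, valid_est = [], []
    for _ in range(n_ident_sets):
        x = rng.standard_normal(n_ident)
        y = TRUE_BETA * x + rng.standard_normal(n_ident)
        b, se = ols_slope(x, y)
        # Assumed selection rule: keep the variable only if significant.
        if abs(b / se) < 1.96:
            continue
        ident_est.append(b)
        # Re-estimate the selected variable on fresh validation sets.
        for _ in range(n_valid_sets):
            xv = rng.standard_normal(n_valid)
            yv = TRUE_BETA * xv + rng.standard_normal(n_valid)
            valid_est.append(ols_slope(xv, yv)[0])
    return float(np.mean(ident_est)), float(np.mean(valid_est))


if __name__ == "__main__":
    for n_ident, n_valid in [(100, 1000), (1000, 100)]:
        ident_mean, valid_mean = simulate(n_ident, n_valid)
        print(n_ident, n_valid, round(ident_mean, 3), round(valid_mean, 3))
```

Under these assumptions, conditioning on selection inflates the identification estimates when n=100, because only the datasets with a large enough estimate pass the threshold, whereas the validation estimates stay close to the assumed true value. With n=1000 almost every identification dataset passes the threshold, so the two means nearly coincide.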