Author(s): Bakker B, Heskes T
Abstract We show that large ensembles of (neural network) models, obtained e.g. by bootstrapping or by sampling from (Bayesian) probability distributions, can be effectively summarized by a relatively small number of representative models. In some cases this summary may even yield better function estimates. We present a method to find representative models through clustering based on the models' outputs on a data set. We apply the method to an ensemble of neural network models obtained from bootstrapping on the Boston housing data, and use the results to discuss bootstrapping in terms of bias and variance. A parallel application is the prediction of newspaper sales, where we learn a series of parallel tasks. The results indicate that it is not necessary to store all samples in the ensembles: a small number of representative models generally matches, or even surpasses, the performance of the full ensemble. The clustered representation of the ensemble thus obtained is much better suited to qualitative analysis, and will be shown to yield new insights into the data.
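The idea of clustering an ensemble by its members' outputs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' procedure: the models are straight-line fits instead of neural networks, the data are synthetic instead of the Boston housing set, the clustering is plain k-means with squared Euclidean distance, and all function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (stand-in for a neural network)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx

def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit one model per bootstrap resample (sampling with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def outputs(model, xs):
    """Evaluate a model on the data set; this vector is what gets clustered."""
    a, b = model
    return [a * x + b for x in xs]

def sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))

def kmeans(vectors, k, rng, n_iter=50):
    """Plain k-means (assumed here; not the paper's algorithm)."""
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign each output vector to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Synthetic data (an assumption for this sketch).
rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

models = bootstrap_ensemble(xs, ys, n_models=200, rng=rng)
out = [outputs(m, xs) for m in models]

k = 3
centers, labels = kmeans(out, k, rng)
weights = [labels.count(c) / len(models) for c in range(k)]

# Prediction of the k-model summary versus the full 200-model ensemble.
summary = [sum(w * ctr[j] for w, ctr in zip(weights, centers))
           for j in range(len(xs))]
full = [sum(col) / len(out) for col in zip(*out)]
```

In this sketch each cluster center acts as a representative model, weighted by the fraction of ensemble members assigned to it. Because each center is the mean of its members, the weighted summary reproduces the mean prediction of the full ensemble while storing only `k` output vectors.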
This article was published in Neural Netw
and referenced in Journal of Geology & Geophysics