Use of the Multiple Imputation Strategy to Deal with Missing Data in the ISBSG Repository
Abdalla Bala and Alain Abran*
École de Technologie Supérieure (ÉTS)-University of Québec, Montréal, Québec, Canada
*Corresponding Author:
Alain Abran
École de Technologie Supérieure (ÉTS)-University of Québec, Montréal, Québec, Canada
Tel: +1 (514) 396
Received Date: February 07, 2016; Accepted Date: February 18, 2016; Published Date: February 29, 2016
Citation: Bala A, Abran A (2016) Use of the Multiple Imputation Strategy to Deal with Missing Data in the ISBSG Repository. J Inform Tech Softw Eng 6:171. doi:10.4172/2165-7866.1000171
Copyright: © 2016 Bala A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Multi-organizational repositories, in particular those based on voluntary data contributions such as the repository of the International Software Benchmarking Standards Group (ISBSG), may have a large number of missing values in many of their data fields and may also contain outliers. This paper identifies a number of data quality issues in the ISBSG repository that can compromise the results obtained by users who exploit it for benchmarking or for building estimation models. We propose criteria and techniques for preprocessing the data to improve the quality of the samples selected for detailed statistical analysis, and present a multiple imputation (MI) strategy for handling datasets with missing values.
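To make the general idea of multiple imputation concrete, here is a minimal Python sketch. It is not the authors' procedure. It assumes a single numeric field (for example, effort) whose values are missing at random, uses the approximate Bayesian bootstrap (ABB) as the imputation model, and pools the m estimates of the field's mean with Rubin's rules. The function names (`abb_impute`, `rubin_pool`, `mi_mean`) and the toy effort values are hypothetical and serve only as illustration. They are not ISBSG data.

```python
import random
import statistics
from typing import List, Optional, Tuple


def abb_impute(values: List[Optional[float]], rng: random.Random) -> List[float]:
    """One approximate Bayesian bootstrap (ABB) imputation; None marks a missing entry."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    # Step 1: bootstrap the observed values to reflect uncertainty about their distribution.
    donors = [rng.choice(observed) for _ in observed]
    # Step 2: fill each missing entry with a random draw from the bootstrapped donor pool.
    return [v if v is not None else rng.choice(donors) for v in values]


def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine m point estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


def mi_mean(values: List[Optional[float]], m: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Estimate the mean of a field with missing values using m ABB imputations."""
    if m < 2:
        raise ValueError("need at least two imputations to estimate between-imputation variance")
    if len(values) < 2:
        raise ValueError("need at least two records")
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean for this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)


# Hypothetical toy effort values (person-hours); None marks a missing field.
efforts = [520.0, None, 1310.0, 845.0, None, 2100.0, 640.0, None, 975.0, 1500.0]
est, var = mi_mean(efforts, m=20, seed=42)
print(f"pooled mean = {est:.1f}, pooled SE = {var ** 0.5:.1f}")
```

In practice, an imputation model for repository data would typically condition on other project attributes (for instance, through regression-based imputation) rather than draw donors from the field's marginal distribution as ABB does here. The Rubin's-rules pooling step remains the same in either case.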