Extending UML for Modeling Data Mining Projects (DM-UML)
Óscar Marbán and Javier Segovia*
Polytechnic University of Madrid, Montegancedo, Spain
*Corresponding Author:
Javier Segovia
Informática Faculty, Polytechnic University of Madrid
Montegancedo Campus s/n. 28660 Boadilla del Monte (Madrid) Spain
E-mail: [email protected]
Received Date: July 03, 2013; Accepted Date: September 16, 2013; Published Date: September 30, 2013
Citation: Marbán Ó, Segovia J (2013) Extending UML for Modeling Data Mining Projects (DM-UML). J Inform Tech Softw Eng 3:121. doi:10.4172/2165-7866.1000121
Copyright: © 2013 Marbán Ó, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Existing data mining process models propose ways of developing projects in a structured manner, aiming to reduce their complexity through effective project management. It is well known in any engineering environment that systematic project documentation is one of the management tasks that helps to reduce project problems, yet few of the existing data mining processes call for documentation. Furthermore, those few stress the need to produce documentation at each phase as an input for the next, but they do not show how to do it. The literature does contain examples of UML extensions for data mining projects, but they always focus on the model implementation side and fail to take into account the remainder of the process. In this paper, we present an extension of the UML modeling language for data mining projects (DM-UML) that covers all the documentation needs of a project conforming to a standard process, namely CRISP-DM, ranging from business understanding to deployment. We also show an example of a real application of the proposed DM-UML modeling. Besides the advantages of a standardized way of producing documentation, the approach provides a useful and transparent tool for modeling the business understanding or modeling phase and connecting it with the remainder of the project right through to deployment. It also facilitates communication with the non-technical stakeholders involved in the project. Both problems have long been open questions in data mining.