Application of Data Mining Techniques to Predict Urinary Fistula Surgical Repair Outcome: The Case of Addis Ababa Fistula Hospital, Addis Ababa, Ethiopia

Background
Since the 1990s, the social and economic structure of the world has changed from an industrial, product-oriented environment to one that depends on information and knowledge. The rapid growth of information technologies and their integration with digital networks, software, and database systems are the main characteristics of the information and knowledge society [1]. The explosive growth in raw data accumulation has in turn widened the gap between raw data that is not yet analyzed and meaningful information available for decision making. Because of the high volume of data, summarizing it with simple quantitative models has become a great challenge of the information age. Turning data into information and information into knowledge has led to a demand for specialized tools to view and analyze the data [2].
Data mining has accordingly been applied to summarize large volumes of data on maternal health problems. Maternal outcomes are good in most countries of the developed world, while the same is not true in many developing and resource-poor countries. This disparity can easily be seen from the maternal mortality ratio (MMR) and the lifetime risk of maternal death. For instance, the 2008 estimate of the MMR is 14 per 100,000 live births for developed regions and 290 per 100,000 live births for developing regions. Similarly, the lifetime risk of maternal death is 1 per 4,300 births for developed regions and 1 per 120 births for developing regions. For Sub-Saharan Africa the figures rise to an MMR of 640 per 100,000 live births and a lifetime risk of maternal death of 1 per 31 births [3].
Generally, throughout the world, half a million women die from complications of pregnancy or childbirth every year, most of them in resource-poor countries. In 2008 alone, an estimated 358,000 maternal deaths occurred worldwide because of complications related to pregnancy and childbirth, of which developing countries accounted for 99%. Furthermore, analysis of the maternal mortality data has shown that Sub-Saharan Africa and South Asia alone accounted for 87% of global maternal deaths [3].
A fistula is an abnormal opening between the vagina and the bladder, known as vesico-vaginal fistula (VVF), which is the most common type and dominates the clinical presentations, and/or between the vagina and the rectum, known as recto-vaginal fistula (RVF) [4][5][6].
Despite its devastating effects, the exact prevalence of obstetric fistula is unknown, although it is estimated to affect thousands of women in developing countries. The most frequently reported global estimate is that approximately two million women have untreated fistula in Asia and sub-Saharan Africa alone and that an additional 50,000 to 100,000 women develop obstetric fistula each year [3]. In Ethiopia, too, obstetric fistula is a health challenge, affecting 9,000 women each year [4].

Objective
The purpose of the research is therefore to apply data mining techniques and build a model that maps clinical examination attributes to the outcome of surgical repair for urinary fistula. The research also compares the performance of multinomial logistic regression with that of decision trees, decision rules, and Naïve Bayes so as to arrive at a model with a relatively higher area under the ROC (Receiver Operating Characteristic) curve. To this end, the research tries to answer the following questions:
1. What values of predictive factors (attributes) are associated with each outcome of urinary fistula repair?
2. Would it be possible to draw association rules among the attributes and the classes of urinary fistula surgical repair outcomes?
3. Can models from other algorithms predict urinary fistula surgical repair outcome with better sensitivity and specificity, expressed as area under the ROC curve, than logistic regression?

Methods (Data Mining Modelling)
The hybrid model (six-step KDP model) is used as a framework to guide the overall activities in the current study. The hybrid process model was selected because it combines the best features of the CRISP-DM and KDD methodologies and identifies and describes several explicit feedback loops, which are helpful in attaining the research objectives. The hybrid methodology involves six steps (Figure 1).

The Weka GUI chooser
The Weka GUI chooser provides a starting point for launching Weka's main GUI applications and supporting tools. It gives access to Weka's four main applications: Explorer, Experimenter, Knowledge Flow and Simple CLI.

Classifier accuracy measures
Using the same dataset both to derive a classifier or predictor and to estimate the accuracy of the resulting model gives misleading, overoptimistic estimates because the learning algorithm over-specializes to the data. Accuracy is therefore estimated on a test set that was not used for training: the classifier is applied to the test set, and the numbers of instances assigned to their actual classes and to different classes are counted, a result that is effectively represented by a confusion matrix [7].

Confusion matrix
A confusion matrix is a useful tool for analyzing how well a classifier recognizes the classes. An entry $CM_{i,j}$ in the first $m$ rows and $m$ columns indicates the number of tuples of class $i$ that were labeled by the classifier as class $j$ [8].
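The definition of the confusion matrix can be illustrated with a minimal sketch. The function names and the eight toy labels below are assumptions made for illustration only; they are not data or code from the study, which relied on Weka's built-in evaluation output.

```python
def confusion_matrix(actual, predicted, classes):
    """Return an m x m matrix whose entry [i][j] counts the instances
    of class i that the classifier labeled as class j."""
    index = {label: k for k, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of instances on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy labels using the four outcome classes of the study.
CLASSES = ["cured", "failed", "stress", "residual"]
actual = ["cured", "cured", "cured", "failed",
          "stress", "residual", "cured", "failed"]
predicted = ["cured", "cured", "stress", "failed",
             "cured", "cured", "cured", "cured"]

cm = confusion_matrix(actual, predicted, CLASSES)
for label, row in zip(CLASSES, cm):
    print(f"{label:>8}: {row}")
print("accuracy:", accuracy(cm))
```

Rows correspond to actual classes and columns to predicted classes, so off-diagonal entries in a row show which outcomes a given class is most often mistaken for.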

Selection of instances
Building a predictive model for victims of urinary fistula requires selecting instances in which no additional type of fistula is identified. Out of the 19,929 victims, 15,961 were affected by urinary fistula (VVF) and have undergone urinary fistula repair. Instances with missing values for the outcome class are not useful for predictive model building because classification algorithms learn from how instances were classified under the different classes [7]. As this study uses classification algorithms for predictive model building, the 220 records without class information were removed from subsequent analyses. The remaining dataset then had 15,741 records, each falling into one of the outcome categories. Thus, the statistical summaries of attributes relevant to the data mining objectives are based on these 15,741 records.

Exploratory data analysis
For each selected attribute, the description, data type, unit of measure, and list or range of values are given. The frequency tables show the original distribution of attribute values in the dataset before any preprocessing.

Number of previous repairs at other hospital
This attribute shows the number of previous repairs done at other hospitals. It is a nominal attribute with the values 1, 2, 3, >3, not applicable, and no information (Table 1).

Type of urinary fistula
This attribute indicates the site at which the fistula has occurred. Like the other attributes, it takes nominal values: Urethral, Circumferential, Combined, Juxta-urethral, Mid Vaginal, juxta-cervical, vesico uterine, vault, uretheric, Torn urethra, Absent Urethra, No bladder, Other, and no information (Table 2).

VVF length
VVF length is a measure of fistula size that gives the length of the fistula in centimeters. It takes only a limited, pre-specified number of values, so the attribute is treated as nominal. The values of this attribute are 1, 2, 3, 4, 5, and >5 (Table 3).

VVF width
VVF width is the second measure of fistula size and gives the width of the fistula in centimeters. It takes only a limited, pre-specified number of values, so the attribute is treated as nominal. The values of this attribute are 1, 2, 3, 4, 5, and >5 (Table 4).
The hospital's Access database is designed to store information on the social and demographic background, obstetric and medical history, preoperative care, and operation date of each victim who comes to the hospital seeking treatment. The datasets in the Access files were exported to Excel files, whose size amounted to 10.5 MB before any processing. Data in electronic format was preferred to the manual records on the more than 35,000 victims that the hospital has treated over the past 38 years. Because of the short period of time available, the study therefore considered only the 19,929 instances found in the Access database. Finally, access was obtained to analyze the dataset for the objectives specified in this research.

Data selection
The 63 attributes left in the dataset are organized under five general headings: social and demographic variables, medical and obstetric history, preoperative care, operation date, and postoperative course.
Socio-demographic attributes indicate the social and demographic background of these women with childbirth injuries. The attributes under this heading are serial number, age at marriage, age at causative delivery, current age, height (cm), weight (kg), parity, number of living children, days to AAFH (Addis Ababa Fistula Hospital) on foot, days to AAFH by transport, educational status, marital status, accompanying person, distance to the nearest health facility, source of information, and how many days before the woman could walk.
The second group of attributes covers the medical and obstetric history. For each case, values are recorded for antenatal care, duration of incontinence (months), number of previous repairs done at other hospital, cause of fistula, other illness, duration of labour, place of delivery, mode of delivery, fetal outcome, other major illness, and menstruation history.
The third group, preoperative care, comprises pre-operative stay (days), antibiotic given pre-operatively, type of antibiotics, pre-operative care provided, and nerve and musculoskeletal injury.
The fourth group consists of attributes whose values are recorded on the operation date: anesthesia, approach for urinary fistula repair, circumcision, type of procedure (repair), number of urinary fistula, type of urinary fistula (site), VVF length, VVF width, scarring, bladder size, status of bladder neck, status of urethra, status of ureters, ureteric catheters, bladder fistula closure, graft, flaps, RVF location (rectal-injury type), RVF length, RVF width, rectal fistula closure (layers), sphincter status, intra-operative complications, duration of surgery, surgery outcome urinary, and surgery outcome bowel.
Finally, information is captured during the post-operative course. The attributes recorded during this course are transfusion, antibiotics post-operative type, pack in (days), postoperative complications, duration of bladder urethral catheters, and total length of stay.

Attribute subset selection
The major criterion for selecting an attribute at this initial stage is whether it is relevant to the data mining objective. Two Crows Corporation likewise states that usefulness to the data mining objective is the major criterion in selecting attributes at the initial stage [10].
Weka's ChiSquaredAttributeEval evaluator was also used to rank the attributes by their chi-square statistics, which is appropriate because the selected attributes are all nominal. The distribution of the values of each attribute in the dataset was then examined to identify errors and to determine whether missing values exist.
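The ranking of nominal attributes by chi-square statistic can be sketched as follows. This is only an illustrative computation of the Pearson chi-square statistic between one attribute and the class; the function names and the four toy records are assumptions, and the study itself used Weka's evaluator rather than this code.

```python
from collections import Counter


def chi_square(attribute_values, class_values):
    """Pearson chi-square statistic of a nominal attribute against the class."""
    n = len(attribute_values)
    observed = Counter(zip(attribute_values, class_values))
    attribute_totals = Counter(attribute_values)
    class_totals = Counter(class_values)
    statistic = 0.0
    for a, a_count in attribute_totals.items():
        for c, c_count in class_totals.items():
            expected = a_count * c_count / n  # count expected under independence
            statistic += (observed[(a, c)] - expected) ** 2 / expected
    return statistic


def rank_attributes(columns, class_values):
    """Sort attributes from most to least associated with the class."""
    scores = {name: chi_square(values, class_values)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "scarring" separates the classes, "bladder size" does not.
outcome = ["cured", "cured", "failed", "failed"]
columns = {
    "bladder size": ["good", "small", "good", "small"],
    "scarring": ["none", "none", "severe", "severe"],
}
ranking = rank_attributes(columns, outcome)
print(ranking)
```

An attribute whose values are distributed identically across the outcome classes scores zero, while larger scores indicate a stronger association with the class.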

Bladder size
It indicates the size of the bladder in terms of its volume, expressed with nominal values: small, good, fair, none, and no information (Table 6).

Status of bladder neck
This attribute indicates the degree to which obstruction has affected the bladder neck. Its values are complete destruction, partially damaged, intact, no information, and not applicable. Only 3.36% of the instances have no value, and no errors were made when entering values in this field (Table 7).

Status of urethra
It is an attribute used to indicate the level of the effect of obstruction on the urethra. The values to this attribute are complete destruction,

Number of fistula
This attribute records the number of fistulas at different sites. It is considered nominal because the values none and >3 cannot be treated as numeric. The values a record can take are pre-specified: 1, 2, 3, >3, and none (Table 9).

Status of ureters
This attribute records which ureters are affected. It takes one of three nominal values: one outside, both inside, and both outside (Table 10).

Surgical outcome of urinary fistula repair
It indicates the restoration of urinary continence after surgical intervention. The valid values of this attribute are cured, failed, stress, and residual. Missing values were handled in each case by replacing them with the most frequent value (Table 11).

Noise correction
Noise refers to random error, mostly characterized by a deviation from the valid values of an attribute. Errors in nominal attributes are resolved with the methods used for handling missing values [7]. First, the erroneous values are removed manually, and then they are replaced by the modal value (Table 12).
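The replacement of erroneous or missing entries by the modal value can be sketched as below. The helper name and the sample column are hypothetical; in the study the erroneous values were removed manually before the modal value was substituted, whereas the sketch detects them automatically against a list of valid values.

```python
from collections import Counter


def replace_with_mode(values, valid_values):
    """Replace every entry that is not a valid value (errors and blanks)
    with the most frequent valid value of the attribute."""
    valid = [v for v in values if v in valid_values]
    mode = Counter(valid).most_common(1)[0][0]
    return [v if v in valid_values else mode for v in values]


# Hypothetical "VVF length" column with a blank, a typing error and a missing entry.
VALID_LENGTHS = {"1", "2", "3", "4", "5", ">5"}
raw = ["2", "2", "3", "", "22", "2", ">5", None]
clean = replace_with_mode(raw, VALID_LENGTHS)
print(clean)
```

The sketch assumes that at least one valid value is present in the column; an attribute with no valid entries would need separate handling.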

Resolving inconsistencies
The two possible causes of the inconsistencies detected in the fields of the selected attributes are human error in data entry and a database design in which attribute fields have no predefined values. Inconsistencies reduce the quality of the final model and make learning difficult for the algorithms. Discrepancies were detected while extracting statistical summaries of attribute values: although the manual form used in actual data collection lists the valid values of the attributes, invalid values were entered in the database. Han and Kamber [7] also state that knowledge about the properties of the data can be used to detect discrepancies that may exist in databases (Tables 13 and 14).

Description of preprocessed and prepared data
Different activities were performed on the dataset with the objective of making it suitable for the data mining algorithms and producing a representative model. In the process, instances and attributes were removed, reducing the 19,929 instances of the Access database to the 15,546 instances and 11 attributes used in the experiments (Table 14).

Experimentation, Analysis and Evaluation of Discovered Knowledge
Experimentation, in this study, represents the data mining step in the six-step hybrid KDP model, where five data mining algorithms (including the association algorithm) are applied to the dataset. The experiments fall into two groups: association rule mining experiments, which extract association rules from the attribute values of urinary fistula assessment, and predictive model building experiments, which build a model for predicting the outcome of urinary fistula surgical repair. The latter make use of different classification algorithms and are intended to yield a predictive model with relatively better sensitivity and specificity than the others.

Experimental design
All the experiments discussed in the subsequent sections are carried out on 15,546 instances and 11 attributes. The attribute set includes "previous repairs at other hospital", "type of urinary fistula", "VVF length", "VVF width", "bladder size", "status of bladder neck", "status of urethra", "scarring", "status of ureters", "number of fistula" and "surgical outcome of urinary fistula repair". The last attribute in the list is the class attribute, which is mandatory in developing predictive models. To build predictive models for urinary fistula surgical repair outcome, four algorithms were used: J48, PART, Naïve Bayes, and multinomial logistic regression. The models are evaluated with 10-fold cross-validation, one of the test options Weka offers for this purpose, in which the dataset is split into 10 equal parts and each part is used once for testing while the remaining nine are used for training. The Explorer window is opened with the "Explorer" button of the Weka GUI chooser.
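The 10-fold split described above can be sketched in a few lines. The function name and the random seed are assumptions made for illustration. The sketch shuffles the instances and deals them into ten nearly equal folds; it does not stratify the folds by outcome class, which Weka's own cross-validation is understood to do for nominal classes.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# The 15,546 instances of the experimental dataset, identified only by position.
splits = list(k_fold_indices(15546, k=10))
for number, (train, test) in enumerate(splits, start=1):
    print(f"fold {number}: {len(train)} training, {len(test)} test instances")
```

A model would be built on each training part and evaluated on the corresponding test part, and the ten results would then be averaged.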

Experimentation with Apriori Algorithm to Discover Association Rules
The association rule mining algorithm Apriori is used to identify attribute values that co-occur with urinary fistula surgical repair outcomes (Tables 15 and 16).

Association rules by the number of fistula
The number of fistula is a characteristic identified by counting the fistulas that have occurred at different sites of the birth canal and bladder. Of the 29 best rules obtained after eliminating redundant ones, only two have an antecedent that starts with the number of fistula [11] (Table 17).

Association rules by the number of previous repairs at other hospitals
The number of repairs at other hospitals is one of the predictors of the outcome of urinary fistula surgical repair. It indicates the number of repair attempts that have been made without enabling the victim to regain complete continence (Table 18).

Association rules by the status of ureters
As discussed in the literature, obstructed labour affects multiple organ systems, one of which is the ureters. Obstructed labour may affect only one ureter or both (Table 19).

Summary of the dataset used in the experiments
Number of instances: 15546
Number of attributes: 11
Number of classes: 4 (Cured, Stress, Failed, Residual)
Size of the data: 2 MB

Apriori parameters
minMetric: Minimum metric score. Consider only rules with scores higher than the specified value. Minimum confidence by default is 0.9.
delta (numeric): The delta by which the minimum support is decreased in each iteration (default: 0.05).
lowerBoundMinSupport (numeric): Lower bound for minimum support (default: 0.1).
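The role of the Apriori parameters listed above can be illustrated with a simplified sketch. It is not Weka's implementation: it enumerates candidate antecedents by brute force, considers only rules whose consequent is the outcome class, and starts from an upper bound of 1.0 for the minimum support. The function names, the upper bound, the `wanted_rules` and `max_antecedent` arguments and the ten toy records are all assumptions made for illustration.

```python
from itertools import combinations


def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    covered = [r for r in records if antecedent <= r]
    matched = [r for r in covered if consequent <= r]
    support = len(matched) / len(records)
    confidence = len(matched) / len(covered) if covered else 0.0
    return support, confidence


def mine_class_rules(records, class_attribute, min_confidence=0.9, delta=0.05,
                     lower_bound=0.1, upper_bound=1.0, wanted_rules=10,
                     max_antecedent=2):
    """Lower the minimum support by delta until enough rules are found
    or the lower bound for the minimum support is reached."""
    candidates = set()
    for r in records:
        items = sorted(i for i in r if i[0] != class_attribute)
        for size in range(1, max_antecedent + 1):
            candidates.update(frozenset(c) for c in combinations(items, size))
    consequents = {frozenset([i]) for r in records for i in r
                   if i[0] == class_attribute}
    min_support = upper_bound
    while True:
        rules = []
        for antecedent in candidates:
            for consequent in consequents:
                s, c = rule_metrics(records, antecedent, consequent)
                if s >= min_support and c >= min_confidence:
                    rules.append((antecedent, consequent, s, c))
        next_support = round(min_support - delta, 10)  # avoid float drift
        if len(rules) >= wanted_rules or next_support < lower_bound:
            return min_support, rules
        min_support = next_support


# Hypothetical toy records: each is a set of (attribute, value) items.
records = (
    [frozenset({("scarring", "none"), ("outcome", "cured")})] * 7
    + [frozenset({("scarring", "severe"), ("outcome", "failed")})] * 2
    + [frozenset({("scarring", "severe"), ("outcome", "cured")})]
)
min_support, rules = mine_class_rules(records, "outcome", wanted_rules=1)
print(min_support, rules)
```

In this toy example the only rule that reaches the default minimum confidence of 0.9 links absence of scarring to the cured outcome, and it is found once the minimum support has been lowered far enough to admit it.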

Association rules by the status of urethra
Solbjorg Sjoveian, Siri Vangen, Denis Mukwege, and Mathias Onsrud state that published reports identify the degree of involvement of the urethra (status of urethra) as one of the main prognostic factors for surgical outcome [11] (Table 20).

Association rules by status of bladder neck
Status of bladder neck ranks, on a nominal scale, the degree of injury that the obstruction has caused to the bladder neck. Clinical assessment at the outpatient department or immediately before surgical repair reveals the status of the bladder neck (Table 21).

Association rules by scarring
Scarring refers to fibrosis or dead tissue around the fistula margins. Where it exists, it may vary from minimal, when the fistula margins are soft and mobile, to extreme, when the fistula margins are rigid and fixed. For a fresh fistula, scarring is none (Table 22).

Experimentation for Predictive Model Building
Developing a predictive model on a dataset with high class imbalance and multiple classes requires some means of countering the imbalance. In this study, SMOTE (Synthetic Minority Over-sampling Technique) is used for this purpose (Figure 2).

Experimentation with J48 Algorithm
J48 is Weka's implementation of the C4.5 algorithm, which can work on multi-valued attributes. As observed in the data description, the attributes that affect the surgical repair outcome of urinary fistula are multi-valued. In addition to building a predictive model with the default parameter settings of J48, an attempt was made to find a better classifier by varying its important parameters (Table 23).
The binarySplits parameter is set to "False" by default. If it is changed to "True", the generated model is forced to be a binary decision tree rather than a general decision tree. The confidence factor sets a limit that makes the algorithm prune more or less; its default value is 0.25, and smaller values lead to more pruning. The confidence factor takes effect only when the unpruned parameter is set to "False". The subtree raising parameter is set to "True" by default so that, during pruning, a node in the decision tree can be replaced by one of its subtrees.
After building four predictive models by modifying the parameters of J48, it was observed that their performances differ. Thus, as indicated in the methodology, the models are compared on the basis of performance measures.
The first comparison is made between experiments 1, 2, and 3. The common feature of these experiments is that they all return pruned trees. The second and third experiments resulted in a predictive accuracy of 79.24% with a WROC of 0.50, which shows that these experiments have very low sensitivity and specificity. The greatest sensitivity and specificity among these experiments is observed in experiment one, with a WROC of 0.568 (Table 25).
As sensitivity and specificity are more important than the general accuracy of a classifier in clinical and medical fields, models are better compared on the basis of WROC area. Another challenge with the use of SMOTE, however, is the question of where to set the threshold. Here, the researcher has taken 300% SMOTE as the threshold because, beyond the third experiment, oversampling the minority classes would leave the previously majority classes under-sampled, despite the continuous decrease in accuracy and continuous increase in WROC area (Figures 3 and 4).

Experimentation with PART Algorithm
The PART algorithm extracts rules, and for this reason it is categorized under classification by rule induction. The rules are put together to give a complete rule set. PART has almost the same set of parameters as J48, which can be adjusted to build a better model from a dataset (Table 26).
The second and third experiments were done by decreasing the confidence factor to 0.1 and 0.05, respectively. Decreasing the confidence factor enforces more pruning. The fourth experiment shows the results of setting the unpruned parameter to "True" while keeping the default values of the other parameters. The last experiment applies reduced error pruning, i.e. the value of this parameter is set to "True". Performance measures such as accuracy, WROC, and the number of rules are better in the third experiment than in the other experiments. The model from the third experiment, PART-M2-C0.05-Q1, has an accuracy of 78.66% and a WROC of 0.728. The experiments in Table 27 were performed before applying SMOTE. Additional comparison of the classifiers from the best schemes after SMOTE was applied shows a continuous decrease in accuracy and a continuous increase in area under the ROC curve. The results of PART-M2-C0.05-Q1 after successive SMOTEs are shown in Table 28.

Experimentation with Naïve Bayes Algorithm
Bayesian methods are based on probability. The Naïve Bayes algorithm assumes that the attributes are independent given the class. The class of a new instance is then computed by multiplying, for each class, the prior probability of the class by the conditional probabilities of the values the instance takes under each attribute, and choosing the class with the highest product (Tables 29-31).
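The multiplication of probabilities described above can be sketched for nominal attributes as follows. The function names and the six toy instances are assumptions made for illustration, and the add-one (Laplace) smoothing used to avoid zero probabilities is a choice of this sketch rather than a setting reported in the study.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> observed values
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(k, label)][value] += 1
            distinct[k].add(value)
    return class_counts, value_counts, distinct


def predict(model, row):
    """Return the class with the highest product of prior and conditional probabilities."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for k, value in enumerate(row):
            # Conditional probability of the value given the class (add-one smoothing).
            score *= (value_counts[(k, label)][value] + 1) / (count + len(distinct[k]))
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy instances: (scarring, VVF length) with the repair outcome.
rows = [("none", "1"), ("none", "2"), ("none", "1"),
        ("severe", ">5"), ("severe", "4"), ("none", "1")]
labels = ["cured", "cured", "cured", "failed", "failed", "cured"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("severe", ">5")))
print(predict(model, ("none", "1")))
```

The scores are unnormalized; dividing each by their sum would give the class probabilities for the instance.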
The most important parameter in relation to this study is displayModelInOldFormat. There are, however, other parameters that can be adjusted according to the needs of the data used in different research areas. Table 28 shows the description of the parameter and the type of values it takes. The default value of this parameter is "False".
The researcher has changed this value to "True", as displaying the model in the old format is recommended for outputting the classifier's result in multi-valued class classification.

Experimentation with logistic regression
In traditional statistics, logistic regression is applicable only when the outcome attribute is binary. In Weka, logistic regression can learn from a dataset with multiple outcome classes. As urinary fistula surgical repair can result in more than two outcome classes, experiments were done with multinomial logistic regression. When there is much collinearity among the attributes of a dataset, a ridge estimator is used to limit the range of values that the coefficients of the regression function can take.
The experiments shown in Table 32 were performed to develop a model with higher performance measures by decreasing the ridge parameter from $10^{-8}$ to $10^{-10}$ and increasing it up to $10^{-4}$. The default value of the ridge parameter in logistic regression is $10^{-8}$. When there is much collinearity, a very small ridge value makes it possible to estimate the coefficients of the values of each attribute. All the models from logistic regression showed 79.4% accuracy and a WROC area of 0.762. As the experiments do not differ in performance, the default scheme (Logistic-R1.0E-8-M-1) is selected.
As with the effect of successive SMOTE observed in Naïve Bayes-O, the performance of the model from logistic regression decreases when SMOTE is increased successively from 100% to 500%. After 300% SMOTE, the model from Logistic-R1.0E-8-M-1 has an accuracy of 76.8% and a WROC area of 0.752. Comparison of the performance of the models before and after SMOTE shows that the models before SMOTE are better in both predictive accuracy and WROC area (Table 33).

Findings from the classification algorithms
The researcher has experimented with four algorithms, namely J48, PART, Naïve Bayes, and logistic regression, with the purpose of developing a model for urinary fistula surgical repair outcome.
Under each algorithm, multiple schemes were tested for their ability to predict outcomes with better sensitivity and specificity, expressed as WROC. This measure is selected as the basis for comparing the performance of schemes because accuracy alone is not a good criterion for selecting models in medical areas. The last activity is to compare the best scheme from each algorithm with the best schemes from the other algorithms.
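How a weighted ROC area can be obtained for a multi-class problem is sketched below. The sketch assumes that WROC is the average of the one-versus-rest ROC areas of the outcome classes, weighted by the share of instances in each class, which is how the weighted average in Weka's evaluation output is understood here. The function names and the six toy instances with their class scores are hypothetical.

```python
def auc(scores, is_positive):
    """One-versus-rest ROC area: the probability that a positive instance
    receives a higher score than a negative one (ties count one half)."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def weighted_auc(class_scores, actual):
    """Average the per-class ROC areas, weighted by class frequency."""
    n = len(actual)
    per_class = {}
    total = 0.0
    for label, scores in class_scores.items():
        flags = [a == label for a in actual]
        per_class[label] = auc(scores, flags)
        total += per_class[label] * sum(flags) / n
    return total, per_class


# Hypothetical predicted class scores for six instances.
actual = ["cured", "cured", "cured", "failed", "failed", "residual"]
class_scores = {
    "cured":    [0.90, 0.80, 0.40, 0.30, 0.50, 0.20],
    "failed":   [0.05, 0.10, 0.50, 0.60, 0.30, 0.20],
    "residual": [0.05, 0.10, 0.10, 0.10, 0.20, 0.60],
}
wroc, per_class = weighted_auc(class_scores, actual)
print(per_class)
print(round(wroc, 3))
```

Because the weights follow the class frequencies, a model can reach a high weighted area while still having a low ROC area for a small class, which is why the per-class areas are also worth inspecting.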
At first glance, Table 34 suggests that logistic regression is better than the others in WROC area. Closer investigation of the models based on the area under the ROC curve for each outcome class, as shown in Table 35, reveals that logistic regression is relatively insensitive to the "residual" outcome of urinary fistula repair (ROC Residual = 0.669). The same drawback is observed in Naïve Bayes-O (ROC Residual = 0.677). Although a large compromise is made in the ROC area for failed outcomes in PART-M2-C0.05-Q1 as compared to the logistic regression and Naïve Bayes models, PART-M2-C0.05-Q1 with no SMOTE is more sensitive to the residual outcome than the models from logistic regression and Naïve Bayes. Additional comparison with J48-U-M2 after 300% SMOTE, based on each outcome's ROC area, shows that PART-M2-C0.05-Q1 with no SMOTE is better in the ROC areas for all outcomes except the residual outcome (Table 35). For these reasons it can be inferred that the PART-M2-C0.05-Q1 scheme after 300% SMOTE is relatively better than the models from the other schemes (Figure 5).

Classifier's error
In classification or prediction tasks, the accuracy of the resulting model is measured either as the percentage of instances correctly classified or as the "error rate". The classification error rate on a pre-classified test set is commonly used as an estimate of the expected error rate when classifying new records [12]. To make the procedure valid, 10-fold cross-validation is used, so that the model is built and tested 10 times. The errors from the tests are averaged to give the average error rate of the model. The classification error rate of the selected model is 23.8%, which means that the model has, on average, misclassified about 23.8% of the instances in the test sets. Several reasons may account for the error rate of the models. First, algorithms differ in their capability, as observed from the comparisons of performance measures. Second, attributes in the preoperative, operative, and postoperative course that are not included in the study might have influenced the outcome. In fact, a particular victim regains her continence not because of the clinical examination but because of the treatments and the surgical repair.

Classification rules for predicting stress incontinence after surgical repair
Each rule in Table 37 should be taken independently, and no form of relationship can be created among these rules. A rule can be applied in situations in which a new instance takes the attribute values indicated by the rule. All the rules shown in the table apply to a small number of instances in the dataset; however, stress incontinence is observed in a large proportion of the instances to which the rules apply.

Classification rules for predicting failure after a surgical repair
Each rule in Tables 38 and 39 should be taken independently and no form of relationship can be created among these rules.

Ethical Considerations
The instances in the dataset include the victims' identifying information, their health information, and all other services provided by the hospital. Beyond its explicit importance and use in the therapeutic process, this information is also used in research such as this thesis. The use of this medical information for research and other purposes, however, raises ethical issues such as patients' privacy and confidentiality. This research is intended as a professional contribution to assist obstetric fistula treatment, and it will not attempt to harm anybody in any way. Identifying information was removed from the dataset to protect the privacy and confidentiality of the victims treated in the center and of those now on treatment. Ethical clearance to carry out the study and analyze the dataset was obtained from the research and ethics committee of the School of Public Health of Addis Ababa University.

Conclusion and Recommendation
Conclusion
Prediction of outcomes of urinary fistula surgical repair intervention is of paramount importance for both during surgical decision making and for special post-operative care that particular victims may require. Browning A [13] has indicated the purpose of predicting victims who are more likely to suffer post-repair complication because of residual outcome. According to him identifying these victims can enable to tailor surgical techniques to try and decrease complication rate and to make the surgery be done by more experienced fistula surgeon. The results from predictive models could also be used in post-operative consultations with the victim who has undergone repair surgery.
Association rules are extracted from the clean dataset with the use of Apriori algorithm which showed attribute values that frequently co-occur together with specific classes. All of the rules showed that less severity of injury co-occurring more with "cured" outcome than any other outcome. The reverse of which indicates stress, residual, and failed surgical outcomes may occur in cases of higher severity of an injury. Moreover, the addition of an attribute value decreases the coverage rules indicating cured surgical outcome, which means that instances with additional injury have a decreased chance of cure than a victim with only one injury of same type.
The study has shown the necessity of experimenting with as many classification algorithms as possible before selecting a single algorithm for prediction. On the way to the major objective, i.e. developing a predictive model, the performance of models from the best schemes of the J48, PART, and Naïve Bayes algorithms was compared with the performance of the best scheme from logistic regression. The comparison revealed that PART-M2-C0.05-Q1 after 300% SMOTE predicted better than logistic regression in terms of the ROC area for the residual outcome.

The model that the PART-M2-C0.05-Q1 scheme learns after 300% SMOTE is better in area under the ROC curve for the residual outcome than Naïve Bayes and logistic regression, and better than J48 in the ROC area for the other outcome classes.
PART-M2-C0.05-Q1 after 300% SMOTE, which resulted in 76.81% accuracy and a weighted area under the ROC curve of 0.742, was used to build the predictive model. At first glance these performance measures seem very low compared with the very high accuracy, sensitivity, and specificity needed in surgical decision making. However, given that the prediction disregards the preoperative care provided, the intra-operative complexities that may occur during surgery, and the post-operative care and complexities, this level of accuracy and ROC area is encouraging.
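For readers unfamiliar with the weighted area under the ROC curve, the following is a minimal sketch of one common way to compute it. It assumes that each class gets a one-vs-rest ROC area and that these areas are averaged with weights equal to the class proportions. The text does not spell out the exact computation used in the study, so this is an assumption, and the scores and labels below are invented for illustration only.

```python
from typing import Dict, List


def auc_one_vs_rest(scores: List[float], labels: List[str], positive: str) -> float:
    """ROC area for one class: probability that a random positive instance
    receives a higher score than a random negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_auc(scores_by_class: Dict[str, List[float]], labels: List[str]) -> float:
    """Average of per-class ROC areas, weighted by class frequency (assumed scheme)."""
    total = len(labels)
    result = 0.0
    for cls, scores in scores_by_class.items():
        weight = labels.count(cls) / total
        result += weight * auc_one_vs_rest(scores, labels, cls)
    return result


if __name__ == "__main__":
    # Hypothetical predicted class probabilities for four instances.
    labels = ["cured", "cured", "residual", "failed"]
    scores_by_class = {
        "cured": [0.90, 0.60, 0.70, 0.10],
        "residual": [0.05, 0.30, 0.20, 0.40],
        "failed": [0.05, 0.10, 0.10, 0.50],
    }
    print("weighted ROC area: %.3f" % weighted_auc(scores_by_class, labels))
```

Under this weighting, a frequent class such as "cured" dominates the weighted figure, which is one reason the ROC area for the rarer residual outcome is also reported separately.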
To sum up, consultation with domain experts on the rules and models that remained after objective evaluation also confirms that increasing severity of fistula diminishes the chance of being cured after surgical repair. Lower severity, on the other hand, favors "cure" as an outcome. This shows that the findings of this research agree with existing knowledge about urinary fistula surgical repair outcomes.

Recommendation
Before the data were used for predictive model building and association rule mining, a number of preprocessing and preparation steps were carried out. The activities that resulted in clean data were cleaning for errors and handling missing values. As indicated in the summary statistics during data preparation, the dataset has some erroneous entries that could be prevented by predefining the values a particular attribute can take. The dataset also contains non-applicable attribute values. This is because the hospital provides holistic treatment to victims of obstetric fistula and birth tract injuries, so the database was made to include all the variables for all the different types of injuries. Thus, variables that apply to one injury type are non-applicable to another. These attribute values make it difficult to extract meaningful knowledge from the database. One solution to this problem could be to create different forms and tables to record victims based on the type of surgical repair performed. Important benefits of this solution are easier report generation in simple statistical tools and less effort spent filling in non-applicable attribute values when a case is of only one specific type.
The predictive model can assist urinary fistula surgical repair outcome prediction with the given levels of accuracy and weighted area under the ROC curve. The model can also be used to provide post-operative advice and consultation to a victim who has already undergone surgical repair. With the development of a small knowledge base system, the usability of the model can extend to the time of actual surgery by making the system available on small hand-held portable computers. But before moving to the construction of a knowledge base system (KBS) that contains knowledge of the domain area as depicted by the model obtained, the researcher would like to give some recommendations about the data attribute values captured. First, the columns of the database should be protected against erroneous entries by predefining the valid values each attribute can take. Second, to eliminate values that are inapplicable to a particular case, it would be better to capture data based on the type of surgical intervention needed by the victims who come for treatment.
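The two data-capture recommendations can be sketched as a simple entry check. This is only an illustration under stated assumptions: the attribute names (`repair_type`, `bladder_size`, `sphincter_tear`) and their value lists are hypothetical and do not come from the hospital database. Only the four outcome classes (cured, stress, residual, failed) are taken from this study.

```python
from typing import Dict, List

# Hypothetical value domains; only the outcome classes come from the study.
VALID_VALUES: Dict[str, set] = {
    "repair_type": {"urinary", "rectal"},
    "outcome": {"cured", "stress", "residual", "failed"},
    "bladder_size": {"small", "fair", "good"},
    "sphincter_tear": {"partial", "complete"},
}

# Hypothetical mapping of which attributes apply to which type of repair.
FIELDS_BY_REPAIR_TYPE: Dict[str, set] = {
    "urinary": {"repair_type", "outcome", "bladder_size"},
    "rectal": {"repair_type", "outcome", "sphincter_tear"},
}


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one data-entry record."""
    repair_type = record.get("repair_type")
    if repair_type not in FIELDS_BY_REPAIR_TYPE:
        return ["repair_type: invalid value %r" % (repair_type,)]

    allowed_fields = FIELDS_BY_REPAIR_TYPE[repair_type]
    errors: List[str] = []
    for field, value in record.items():
        if field not in allowed_fields:
            # Second recommendation: reject attributes of other repair types.
            errors.append("%s: not applicable to %s repair" % (field, repair_type))
        elif value not in VALID_VALUES[field]:
            # First recommendation: accept only predefined values.
            errors.append("%s: invalid value %r" % (field, value))
    for field in sorted(allowed_fields - set(record)):
        errors.append("%s: missing" % field)
    return errors


if __name__ == "__main__":
    entry = {"repair_type": "urinary", "outcome": "cure", "sphincter_tear": "partial"}
    for problem in validate_record(entry):
        print(problem)
```

In a real database the same constraints would more likely be enforced through separate forms or tables per repair type with restricted value lists, as recommended above; the function form is used here only to keep the example self-contained.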
Finally, it has been observed that classification algorithms differ in the performance of the models they build. Given the short period of time available for this research, it was not possible to experiment with more than four algorithms for predictive model building. Therefore, to obtain a model that may perform better than the model used here to extract predictive rules, classification algorithms such as support vector machines (SVM), multi-layer perceptrons (MLP), and many others can be tried. This will help to compare the performance of those models with the model from this research and to move on to the level of deployment.