Received Date: August 30, 2017; Accepted Date: September 11, 2017; Published Date: September 15, 2017
Citation: Zhao P, Yoo I (2017) A Systematic Review of Highly Generalizable Risk Factors for Unplanned 30-Day All-Cause Hospital Readmissions. J Health Med Informat 8:283. doi: 10.4172/2157-7420.1000283
Copyright: © 2017 Zhao P, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: Hospital readmissions are common and expensive. Numerous global efforts have been devoted to predicting readmissions. However, for many reasons including the variations in the studied populations and inconsistent definitions of readmissions, the outcomes of some studies can hardly be generalized to other studies inside or outside the same country or region.
Objective: The objective was to identify highly generalizable risk factors for unplanned 30-day all-cause hospital readmissions to guide the selection of baseline predictor variables in different readmission studies.
Methods: In July 2017, PubMed was searched to identify articles pertaining to the risk factors for unplanned 30-day all-cause hospital readmissions. To identify potentially eligible risk factors, characteristics of the selected studies were manually extracted. The generalizability of the risk factors was assessed with predefined criteria.
Results: 13 articles were eligible for the review. A total of 42 risk factors were identified and 34 of them were found to be highly generalizable.
Conclusions: The 34 risk factors are not specific to any populations or places, and the corresponding predictor variables can serve as baseline variables in readmission prediction studies. No major difference has been observed between the risk factors identified inside and outside the United States except that US studies appeared to prefer composite comorbidity measures. All the reviewed studies have used traditional statistical regression-based methods to identify risk factors and more applications of data mining techniques are expected in this field.
Keywords: Unplanned 30-day hospital readmission; Patient readmission; Generalizable risk factors; All-cause
Hospital readmissions have emerged as a global concern due to their high frequency and high associated cost. In the United States, 15.5% of Medicare beneficiaries had an unplanned readmission within 30 days of discharge between July 2014 and June 2015. It has been estimated that unplanned readmissions account for $17.4 billion in Medicare expenditure annually. In England, the 28-day emergency readmission rate was 11.5% during the fiscal year 2011 to 2012. Although the causes have not been well studied, unplanned hospital readmissions are frequently seen as related to substandard quality of care received during index admissions. The pressure to reduce cost and improve healthcare quality has triggered the implementation of financial penalty programs targeting readmissions in different countries, including England, Germany, and the United States. In the fiscal year 2017, 79% of hospitals in the United States will be penalized by the Hospital Readmissions Reduction Program (HRRP), with total estimated penalties of $528 million.
Recent years have seen a growing body of literature on hospital readmissions with the goal of improving healthcare quality and lowering cost. Predictive modelling of readmissions is one of the most common study types and aims to help providers better identify high-risk patients. Unfortunately, studies in this area are highly fragmented, especially in their target populations. The study outcomes range from models that are specific to populations with particular diseases or surgeries to general-purpose all-cause models. Currently, the HRRP in the United States only considers the index conditions or surgeries of acute myocardial infarction, heart failure, pneumonia, chronic obstructive pulmonary disease, elective total hip or knee arthroplasty, and coronary artery bypass graft in the calculation of readmission penalties due to their high prevalence and cost [5,7]. Largely spurred by the HRRP, many studies have focused only on readmissions occurring after index admissions for these conditions or surgeries. The choice between condition-specific and all-cause readmission models has long been under debate. However, condition-specific models have been criticized for their poor generalizability, especially in patients with multiple conditions [8,9]. In addition, readmissions are not always clinically relevant to the index admissions. One retrospective analysis of 217,767 unplanned readmissions found that 58.5% of them were not assigned the same principal diagnoses, diagnosis-related groups (DRG), or all-patient DRG as the index admissions.
Attempts to predict readmissions were further complicated by the lack of consensus on data inclusion criteria. Many studies focused on unplanned readmissions, while others included all available readmissions without removing scheduled readmissions. The definitions of unplanned readmissions were also highly inconsistent. Some studies restricted unplanned readmissions to those occurring in certain departments or specialties, and some identified them by diagnosis codes. In some studies, unplanned readmissions were further classified as either potentially avoidable or unavoidable. Readmissions due to progression of existing conditions or to newly developed conditions after discharge are deemed unavoidable [11,12]. It has been argued that including unavoidable readmissions in quality measures is unfair because they are not directly related to the quality of healthcare services during index admissions. However, there is no agreement at present on the criteria to identify avoidable readmissions. In many studies, the avoidable readmissions were determined by medical experts, and the inclusion eligibility can be subjective. According to a systematic review of 34 articles in 2011, the measured proportions of avoidable readmissions varied from 5% to 79%.
To exacerbate the situation, diverse time frames were used to capture readmissions. In a systematic review of 26 readmission prediction models developed in six countries, the intervals between discharges and readmissions ranged from 14 days to four years. The Centers for Medicare and Medicaid Services (CMS) in the United States adopted the 30-day time window. In the United Kingdom, both 28-day and 30-day periods were used by the National Health Service (NHS) to measure readmission rates. The 30-day window is currently the most widely used time frame globally. Possible reasons are that older patients are more vulnerable during this period and that readmissions occurring within 30 days are more likely to be influenced by the quality of care.
Given the complex nature of hospital readmissions, it is challenging to conduct meaningful readmission prediction studies without good knowledge of existing evidence from both domestic and global research communities. However, due to the heterogeneous target populations and inconsistent definitions of readmissions, the outcomes of some studies can hardly be generalized to other studies. The purpose of this study was to identify the generalizable outcomes of readmission prediction studies at the risk factor level to guide the selection of baseline predictor variables in different readmission studies. In particular, we are interested in the risk factors for unplanned 30-day all-cause hospital readmissions because of the better-validated time frame and the broader target.
In the past few years, several attempts have been made to review risk factors or predictor variables for hospital readmissions, yet none of them has focused on generalizability. The review by Vest et al. in 2010 was limited to US studies from 2000 to 2009, and the time frame for all-cause readmissions varied from seven days to six months. The focus of the review by Kansagara et al. in 2011 was on readmission prediction models derived in developed countries before 2011. Predictor variables of the reviewed models were tabulated without differentiating condition-specific and all-cause models. Zhou et al. also placed emphasis on reviewing readmission prediction models developed between 2011 and 2015. The significant predictor variables were summarized without further analysis. In addition, many studies have reviewed readmission risk factors for specific conditions or surgeries. To the best of our knowledge, this is the first systematic review of highly generalizable risk factors for unplanned 30-day all-cause hospital readmissions.
Data source and search strategy
In this study, we present a systematic literature review of highly generalizable risk factors for unplanned 30-day all-cause hospital readmissions following the PRISMA statement for systematic reviews.
A literature search was performed in PubMed to identify articles relating to the risk factors for unplanned 30-day all-cause hospital readmissions. The search keywords have four components reflecting the interest of this review: “unplanned”, “30-day”, “hospital readmission”, “risk factors”. “All-cause” was not included in the keyword because some all-cause readmission studies do not explicitly mention their scopes. Synonyms and hyphenations were included to account for variations in different studies. Wildcards were used to match the verb and noun forms of “readmission”. The logical relationships among the search keywords are shown in Figure 1.
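As a rough illustration of how the four keyword components could be combined, the sketch below joins synonyms with OR inside each component and joins the components with AND. The synonym lists, the wildcard stems, and the `build_query` helper are all hypothetical; the review's actual terms and logic are the ones shown in Figure 1, and the resulting string has not been checked against PubMed's search syntax.

```python
def build_query(components):
    """Join synonyms with OR inside a component and join components with AND."""
    groups = []
    for synonyms in components:
        # Quote plain phrases; leave wildcard terms (ending in *) unquoted.
        terms = [t if t.endswith("*") else f'"{t}"' for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical synonym lists, one list per keyword component.
components = [
    ["unplanned", "unscheduled"],
    ["30-day", "30 day", "thirty-day"],
    ["readmi*", "rehospitali*"],
    ["risk factor", "predictor"],
]

query = build_query(components)
print(query)
```

Keeping each component as its own list makes it easy to add or drop a synonym without touching the boolean structure.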
Study inclusion and exclusion criteria
In this study, only peer-reviewed articles written in English were considered. We included articles focusing on identifying statistically significant predictor variables or risk factors for 30-day unplanned all-cause hospital readmissions. Articles were excluded if they met any of the following criteria:
(1) The readmission time frame is other than 30 days, such as 90-day readmission,
(2) Studies focusing on planned readmissions or not differentiating planned and unplanned readmissions,
(3) Studies specific to narrow patient populations with particular medical conditions or who underwent certain surgeries,
(4) The study outcome is more than 30-day unplanned all-cause hospital readmission, such as mortality in combination with 30-day unplanned all-cause hospital readmission,
(5) Studies of paediatric and new-born readmissions. Paediatric and new-born readmissions were filtered out because the risk factors may be distinct from adult readmissions [14,20] and the readmissions could be influenced by parent-side factors [20,23].
To reduce redundancy and bias, external validations of existing prediction tools were removed and only the original articles of the cited tools were included if eligible.
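The exclusion criteria above can be read as a set of independent checks applied to each candidate article. The sketch below is one possible encoding, not the authors' screening procedure; the `Study` fields and the `exclusion_reasons` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    window_days: int           # readmission time frame in days
    separates_unplanned: bool  # planned and unplanned readmissions are distinguished
    narrow_population: bool    # limited to one condition or surgery
    composite_outcome: bool    # e.g. mortality combined with readmission
    paediatric: bool           # paediatric or new-born cohort
    external_validation: bool  # validates an existing tool without major changes


def exclusion_reasons(study):
    """Return the criteria a study fails; an empty list means it is kept."""
    checks = [
        (study.window_days != 30, "time frame other than 30 days"),
        (not study.separates_unplanned, "planned readmissions not separated"),
        (study.narrow_population, "narrow patient population"),
        (study.composite_outcome, "composite outcome"),
        (study.paediatric, "paediatric or new-born cohort"),
        (study.external_validation, "external validation of an existing tool"),
    ]
    return [reason for failed, reason in checks if failed]


kept = Study(30, True, False, False, False, False)
dropped = Study(90, True, False, True, False, False)
print(exclusion_reasons(kept))
print(exclusion_reasons(dropped))
```

Returning the list of failed criteria, instead of a single yes/no flag, keeps a record of why each article was removed, which is what a flow diagram needs.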
Data extraction process
The characteristics of the studies, including the publication year, study region, data source, study design, cohort definition and definition of unplanned readmissions, analysis method, predictor variables, and statistically identified risk factors (P<0.05), were extracted from all the included studies. The risk factors were summarized by category and were grouped if they shared the same corresponding predictor variable. For each predictor variable, the number of studies that analysed it and the number of studies that found it significant were recorded.
In this study, we define “high generalizability” as the capability of being applied to other hospital readmission prediction studies regardless of target populations and residing places. Although the studies specific to narrow populations were filtered out during the article selection step, some identified risk factors may still be tied to a sub-population. For example, findings from studies with Medicare patients are less generalizable because those patients are predominantly 65 years old or older in the United States. Also, some risk factors identified in one place may not work in other places if they are closely related to unique local health care systems (e.g., insurance, medical social welfare). In addition, it may be impractical to apply some risk factors to other types of studies because of differences in study exposures related to designs. As a result, we chose to assess the generalizability of risk factors by three questions:
(1) Whether a risk factor is specific to a narrow population or not,
(2) Whether it is specific to a place or not,
(3) Whether it is specific to a study exposure related to one particular study design or not.
If a risk factor is not specific to any of them, we deem that the risk factor is generalizable.
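The three questions amount to a simple rule: a risk factor is generalizable only if every answer is NO. The sketch below encodes that rule; the `RiskFactor` type and the yes/no answers assigned to the three examples are an illustrative reading of the text, not data extracted from the reviewed studies.

```python
from typing import NamedTuple


class RiskFactor(NamedTuple):
    name: str
    population_specific: bool  # question 1
    place_specific: bool       # question 2
    design_specific: bool      # question 3


def is_generalizable(rf):
    """A risk factor is generalizable only if all three answers are NO."""
    return not (rf.population_specific or rf.place_specific or rf.design_specific)


# Illustrative answers; the assignments below are assumptions for the example.
examples = [
    RiskFactor("Length of stay", False, False, False),
    RiskFactor("Admitted to a Veterans Affairs hospital", False, True, False),
    RiskFactor("Health literacy", False, False, True),
]

generalizable = [rf.name for rf in examples if is_generalizable(rf)]
print(generalizable)
```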
Figure 2 shows the process of identifying eligible articles. The initial query was performed on July 21, 2017, and returned 370 articles. After removal of one duplicated article and two non-English articles, the remaining 367 articles were reviewed based on titles and abstracts. Of these, 331 met the exclusion criteria and were filtered out. The remaining 36 articles were then reviewed in full text. One article was removed because it was not a peer-reviewed article. Four studies were excluded because they were external validations of two existing readmission prediction models without major modifications (two articles validated the HOSPITAL score, one article validated the LACE score, and one article validated both the HOSPITAL and LACE scores). The original article of the HOSPITAL score was included in this review, while the LACE score article was not because its study outcome combined mortality and 30-day unplanned readmission. Four articles were specific to certain diagnoses and thus were removed. Five studies were filtered out because they did not differentiate unplanned readmissions. Nine articles were excluded because they did not report statistically significant predictor variables or risk factors. The remaining 13 highly relevant articles were included in this literature review.
Table 1 in the appendix summarizes the characteristics of the 13 studies. The literature on this topic is very recent. Although we did not intentionally limit the publication date, the earliest eligible article was published in 2009, reflecting a growing interest in predicting 30-day unplanned all-cause readmissions in the past decade. Of the 13 studies, over half (7/13) are based in the United States, two in Israel, two in Singapore, one in Sweden, and one in Taiwan. The majority (12/13) of the studies are retrospective, and only one study adopted a prospective design. Multivariate logistic regression is the most used analysis method (12/13) to identify significant predictor variables, and only one study used Poisson regression.
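The counts reported in the selection process can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are chosen for illustration.

```python
retrieved = 370
screened = retrieved - 1 - 2      # one duplicate and two non-English articles removed
full_text = screened - 331        # excluded on title and abstract

# Full-text exclusions, as reported in the text.
full_text_exclusions = {
    "not peer-reviewed": 1,
    "external validation of an existing model": 4,
    "specific to certain diagnoses": 4,
    "unplanned readmissions not differentiated": 5,
    "no significant predictors reported": 9,
}

included = full_text - sum(full_text_exclusions.values())
print(screened, full_text, included)
```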
|Categories||Predictor variables||# Significant / # analyzed|
|Sociodemographic factors||Age ^ ~||5/11||2/6||3/5|
|Gender ^ ~||2/10||1/6||1/4|
|Race or ethnicity ~||2/6||2/4||0/2|
|Required financial assistance*||1/1||-||1/1|
|Index admission in a subsidized ward*||1/1||1/1|
|Healthcare utilizations||Number of hospital admissions ^ ~||7/7||3/3||4/4|
|Number of emergency department visits ^ ~||2/2||2/2|
|Home care services||1/1||1/1|
|Nursing home resident||1/1||1/1|
|Index admission characteristics||Length of stay ^ ~||5/8||4/5||1/3|
|Admission type ^ ~||2/2||1/1||1/1|
|Admission itself is a readmission||1/1||1/1|
|Discharged from oncology service||1/1||1/1|
|Required inpatient dialysis||1/1||1/1|
|Comorbidities and conditions||Comorbidity indices ^ ~||3/5||2/4||1/1|
|Number of comorbidities||2/3||2/3|
|Cancer/malignancy ^ ~||2/3||2/3|
|Chronic obstructive pulmonary disease ~||1/2||1/2|
|Diabetes mellitus ~||1/2||1/2|
|Heart diseases ~||1/2||1/2|
|Acute kidney injury||1/1||1/1|
|Chronic renal failure||1/1||1/1|
|Chronic kidney disease||1/1||1/1|
|Lab tests||Hemoglobin level at discharge||2/2||2/2|
|Albumin level ~||1/2||0/1||1/1|
|Sodium level at discharge||1/1||1/1|
|Medication||Treatment with anti-depressants||1/1||1/1|
|Functional status and health literacy||At-admission activities of daily living*||1/1||1/1|
|In-hospital activities of daily living decline*||1/1||1/1|
|Hospital factors||Bed occupancy||1/1||1/1|
|Admitted to a Veterans Affairs hospital*||1/1||1/1|
^Predictor variables found to be significant in more than one country or region
~Predictor variables studied in more than one country or region
*Predictor variables with low generalizability
Table 1: Summary of the corresponding predictor variables of the identified risk factors.
The studies are highly heterogeneous in data type and data sources. Two studies [28,29] used claims data only and five studies [24,30-33] used clinical and/or administrative data from electronic health records (EHR) only. Four studies [26,34-36] combined data from various sources, including proprietary EHR, validated questionnaires, hospital information systems, a Veterans Affairs database, and a Medicare dataset. One article studied state-level discharge summary data and one study retrospectively analysed the control group of a clinical trial.
The definitions of unplanned readmissions also vary widely. Four studies [26,32,33,37] directly used the data of unplanned readmissions without any definitions. Two studies [31,35] only included readmissions to emergency departments within 30 days of discharge because emergency department visits are not scheduled in advance. Three studies [28,30,36] excluded planned readmissions based on Clinical Classification Software (CCS) or Diagnosis Related Group (DRG) codes, including transplantations, psychiatric issues, maintenance chemotherapy, dental procedures, pregnancy-related procedures, and other planned procedures. Two studies [27,34] excluded admissions to the specialties of obstetrics, gynaecology, dentistry, otolaryngology, ophthalmology, orthopaedic surgery, general surgery, or psychiatry. One study excluded admissions with a principal diagnosis of cancer because cancer patients may have planned stays for cancer treatments. One study separated readmissions into potentially avoidable and unavoidable based on administrative data using a validated algorithm, SQLape. Unavoidable readmissions include planned readmissions and any unforeseen readmissions for new conditions not related to known diseases during the index admissions. The unavoidable readmissions were excluded from the analysis.
From the 13 studies, a total of 42 risk factors were identified and their corresponding predictor variables were aggregated and summarized in Table 1. They belong to eight major categories: sociodemographic factors, healthcare utilizations, index admission characteristics, comorbidities and conditions, lab tests, medication, functional status and health literacy, and hospital factors. For each predictor variable, the number of studies that found it significant was reported along with the number of studies that included it in the analysis. 13 predictor variables were found to be statistically significant (P<0.05) in more than one study (age, gender, race, rurality, the insurance payer, the number of hospital admissions in six months or one year before the index admission, the number of emergency department visits in six months or one year before the index admission, the length of stay of the index admission, the type of the index admission, the comorbidity indices, the number of comorbidities, cancer, and the haemoglobin level at discharge). 17 predictor variables were studied in more than one country or region and eight of them were found to be significant (P<0.05) in more than one country or region (age, gender, the number of hospital admissions in six months or one year before the index admission, the number of emergency department visits in six months or one year before the index admission, the length of stay of the index admission, the type of the index admission, the comorbidity indices, and cancer). Of the 42 risk factors, 34 meet our generalizability requirements (with answers of NO to the three questions) and were found to be highly generalizable.
The corresponding predictor variables of the eight risk factors with low generalizability were labelled with an asterisk (*) in Table 1 (the insurance payer, required financial assistance, index admission class, index admission was in a Veterans Affairs hospital, index admission was in a subsidized ward, at-admission activities of daily living, in-hospital activities of daily living decline, and health literacy).
Although it was not the intention of this study to review risk factors only applicable to the United States, just over half (7/13) of the studies were based in the United States. To account for the potential bias towards US studies, it is meaningful to compare the risk factors identified within and outside the United States. For each variable, the number of studies that found it significant and the number of studies that analysed it were further classified by study region (either US or non-US) (Table 1). For predictor variables studied in only one region, the corresponding numbers for the other region were left blank for the sake of clarity. No obvious regional difference was observed for the eight categories, except that the studies in the United States preferred composite comorbidity measures (comorbidity indices and the number of comorbidities) to the presence of individual comorbidities. However, this observation cannot be confirmed by significance tests because of the small sample size.
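One way to read the "# significant / # analyzed" counts is as the share of studies in which a variable, once analysed, turned out to be significant. The sketch below computes that share for a handful of variables using the overall counts from Table 1; the `share` dictionary and the ranking are illustrative additions, not an analysis performed in the review.

```python
# (studies finding the variable significant, studies analysing it), from Table 1
counts = {
    "Age": (5, 11),
    "Gender": (2, 10),
    "Race or ethnicity": (2, 6),
    "Number of hospital admissions": (7, 7),
    "Length of stay": (5, 8),
    "Comorbidity indices": (3, 5),
}

# Share of analysing studies in which the variable was significant.
share = {name: sig / analysed for name, (sig, analysed) in counts.items()}

# Rank variables from most to least consistently significant.
ranked = sorted(share, key=share.get, reverse=True)
for name in ranked:
    sig, analysed = counts[name]
    print(f"{name}: {sig}/{analysed} = {share[name]:.2f}")

# Risk factors remaining after removing the eight with low generalizability.
highly_generalizable = 42 - 8
```

With so few studies per variable, these shares are descriptive only and should not be read as effect sizes.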
From the 13 studies, 42 risk factors have been identified with 34 being highly generalizable. Their rationale, generalizability, and identification methods will be discussed in this section.
Sociodemographic factors
In this review, sociodemographic factors were reported by most studies. Age, gender, race, and socioeconomic status are normally used as predictor variables to account for demographic and social influences on readmissions.
Older age has been reported to be associated with higher readmission rates [19,41]. A possible reason is that older patients are often frailer and face more health issues than younger patients, such as comorbidities and polypharmacy. Studies have also observed significant differences in readmission rates between genders [42-44]. Besides biological differences, gender-related social behaviours may play a role in the different readmission patterns. Race and ethnicity can also potentially affect readmissions because they are dimensions of a society’s stratification system to distribute resources, risks, and rewards.
Socioeconomic status measures an individual's or a group’s economic and social position by considering income, education, and occupation. Evidence showed that poor physical and psychological health outcomes, including hospital readmissions, were associated with socioeconomic disadvantage (e.g., low income, limited education, substandard neighbourhood) [47-49]. Although the mechanism is still under debate, lower socioeconomic status was reported to indirectly affect health by causing more stress, exposure to worse physical or social environments, unhealthy lifestyles, or limited access to healthcare resources.
Age was considered in 11 studies, among which five found that increasing age or older age was significantly associated with readmissions. Two studies found that male gender was a risk factor. African American race was found to carry a higher readmission risk in two studies. Living in a rural area, having certain insurance payers, having an education level of high school or lower, and requiring medical financial assistance are other reported risk factors.
It is worth noting that some factors under this category may depend on or interact with each other. One example is that, in the United States, most people need to reach age 65 to qualify for Medicare, a national insurance program administered by the US government. In this case, Medicare insurance depends on age. These factors can further interact with each other in more implicit ways. Therefore, studies with these factors may need more careful planning and design.
In addition, factors in this category are unmodifiable. It has long been argued in the United States that using readmission rate as a quality indicator without adjusting for unmodifiable socioeconomic factors is unfair because they are beyond the control of hospitals. In 2016, socioeconomic risk adjustment in hospital readmission measures was finally enforced by the “21st Century Cures Act”.
Many studies have incorporated patients’ previous healthcare utilizations into readmission prediction models. The assumption is that higher utilization prior to the index admission, such as repeated hospital admissions or emergency department visits, may reflect the total burden of illness, which can potentially relate to readmissions.
Six months and one year are the most common look-back periods for counting previous hospital admissions or emergency department visits. A longer look-back period may include utilizations less relevant to the readmission of interest and dilute the impact of more recent utilizations. Besides higher numbers of previous hospital admissions and emergency department visits, “received home care services” and “being a nursing home resident” were also found to be associated with higher readmission risks.
Healthcare utilizations
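The effect of the look-back period on a utilization count can be seen in a small sketch. The `prior_admissions` helper, the 180-day and 365-day approximations of six months and one year, and the admission dates are all hypothetical and serve only to illustrate the counting rule.

```python
from datetime import date, timedelta


def prior_admissions(index_admission, earlier_admissions, lookback_days=180):
    """Count admissions that fall within the look-back window before the index admission."""
    window_start = index_admission - timedelta(days=lookback_days)
    return sum(1 for d in earlier_admissions if window_start <= d < index_admission)


# Hypothetical patient history.
index_admission = date(2017, 7, 1)
history = [date(2016, 8, 15), date(2016, 12, 1), date(2017, 5, 1)]

six_month_count = prior_admissions(index_admission, history, lookback_days=180)
one_year_count = prior_admissions(index_admission, history, lookback_days=365)
print(six_month_count, one_year_count)
```

The same patient contributes one prior admission under the shorter window and three under the longer one, which is why the look-back period has to be reported alongside the predictor.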
Index admission characteristics
It has been shown that the length of stay of index admissions can influence readmissions. A longer stay may indicate a more complicated underlying situation and may expose the patient to more risks. However, a shorter stay may also be linked to a higher readmission risk because the patient may not be ready for early discharge [13,55]. The relationship between the readmission risk and the length of stay has been found to be U-shaped rather than monotonic. In this review, we did not observe a large discrepancy in the effect of length of stay between the studies, as they all agreed that longer index admissions were related to a higher risk of readmission.
Besides the length of stay, the risk factors of acute admission type, the admission itself being a readmission, discharge from an oncology service, required inpatient dialysis, and required procedures during the index admission all indicate that patients were in severe situations during the index admissions.
Comorbidities, conditions, lab tests, and medications
It is well established that comorbidities are associated with undesired healthcare outcomes [56-58]. There is still no consensus on the definition of comorbidity, but the core concept is the coexistence of more than one condition in the same patient. Evidence shows that the top primary diagnoses of potentially avoidable readmissions are often possible complications of a comorbidity, and higher comorbidity has been linked to increased readmission risks [61,62]. In readmission predictions, comorbidities are represented as the number of comorbidities, a comorbidity index, or the presence of a comorbid condition.
The reviewed studies found that the readmission risk rises as the number of comorbidities increases. Beyond counting comorbidities, a comorbidity index further accounts for the different contributions of individual comorbidities. Charlson and Elixhauser are the most commonly used comorbidity indices. The Charlson index was originally developed based on a medical record review of 19 comorbid conditions, with each condition assigned a weight of 1, 2, 3, or 6 depending on the associated mortality risk. A higher total index indicates a greater chance of one-year mortality. The Charlson/Deyo index is a widely cited variant that adapts the original index to 17 categories of comorbid conditions with ICD-9-CM codes. The Elixhauser index includes a more comprehensive list of 30 comorbidities but has little overlap with the Charlson index. According to a systematic review of 54 articles in 2012, the Elixhauser index generally outperforms other available indices.
The presence of some chronic or acute conditions is also related to readmissions. In particular, cancer, chronic obstructive pulmonary disease, heart diseases, renal diseases, diabetes mellitus, and sepsis, all found in this review, are among the conditions associated with the most readmissions. The included lab tests and medication are closely related to some conditions on the list, such as anaemia, renal diseases, and depression.
It has been argued that inclusion of comorbidity measures or diagnosis codes in readmission prediction models may reduce the timeliness of the predictions. The reason is that in practice they are normally only available after discharge.
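The difference between counting comorbidities and weighting them can be illustrated with a short sketch. The weight table below is an illustrative subset assumed to follow the original Charlson weighting (1, 2, 3, or 6) and should be verified against the original publication before any real use; the function names and the two patients are hypothetical.

```python
# Illustrative subset of Charlson-style weights; not the complete index.
CHARLSON_WEIGHTS = {
    "myocardial infarction": 1,
    "congestive heart failure": 1,
    "diabetes": 1,
    "moderate or severe renal disease": 2,
    "any tumor": 2,
    "moderate or severe liver disease": 3,
    "metastatic solid tumor": 6,
}


def comorbidity_count(conditions):
    """Unweighted measure: number of distinct comorbid conditions."""
    return len(set(conditions))


def charlson_index(conditions):
    """Weighted measure: sum of the weights of the distinct conditions."""
    return sum(CHARLSON_WEIGHTS[c] for c in set(conditions))


# Two hypothetical patients with the same number of comorbidities.
patient_a = ["myocardial infarction", "diabetes"]
patient_b = ["diabetes", "metastatic solid tumor"]

print(comorbidity_count(patient_a), charlson_index(patient_a))
print(comorbidity_count(patient_b), charlson_index(patient_b))
```

Both patients have two comorbidities, yet the weighted index separates them, which is the extra information an index carries over a simple count.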
Functional status and health literacy
According to Leidy’s definition, functional status measures a person’s ability to provide for the necessities of life, including daily activities to meet basic needs, fulfil usual roles, and maintain health and well-being. The impairment of functional status has been reported to be associated with an increased risk of readmission.
Health literacy was defined as “the degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions” by Ratzan et al. in 2000, and this definition was adopted by the Institute of Medicine (IOM) of the United States. Although not considered a social factor, it is more distally influenced by social factors. Low health literacy may contribute to non-adherence to treatment plans, compromised communication with clinicians, and limited self-care skills, and it is associated with many poor health outcomes, including hospital readmissions. To assess health literacy, questionnaire-based tests are administered and several tools are available.
Evidence showed that the inclusion of functional status or health literacy can increase the predictive performance of readmission models. However, they are seldom used because of the difficulty of data collection, especially in retrospective studies.
Hospital factors
Factors from the hospital side may also contribute to readmissions in many ways. For example, pressure on hospital resources (e.g., beds) may cause premature discharges of existing patients, which have been shown to be related to readmissions. There is also evidence that medical errors are associated with higher readmission risks.
However, similar to the finding of another study, most of the identified risk factors are patient-side factors or clinical factors, and only two hospital-side risk factors (inpatient bed occupancy >95%, index admission was in a Veterans Affairs hospital) were found. This could be attributed to the small sample size, but a more plausible reason is that most studies followed the single-center retrospective cohort design. For single-center retrospective studies, it is harder to collect encounter-level hospital-side factors. If possible, it is recommended to collect multi-center data or combine with other data sources, such as claims data, to account for the variances in hospital-side factors.
Another possible reason is that, unlike patient-side factors and clinical factors, which are usually well-defined and readily available in administrative and clinical databases, hospital-side factors are harder to collect. More efforts are needed to define and quantify hospital-side factors at a finer granularity beyond the basic hospital characteristics, such as geolocation, hospital type, teaching status, and number of beds, especially for studies measuring readmission rates for quality comparison purposes.
Generalizability of the risk factors
The objective of this study was to review risk factors that can be widely generalized regardless of target populations and their residing countries. The generalizability of the 42 identified risk factors was assessed by the three questions detailed in the methods.
34 of the risk factors meet our generalizability criteria (with answers of NO to the three questions). The corresponding predictor variables of the eight risk factors with low generalizability include the insurance payer, required financial assistance, index admission class, index admission was in a Veterans Affairs hospital, index admission was in a subsidized ward, at-admission activities of daily living, in-hospital activities of daily living decline, and health literacy.
Health insurance is country-specific. “Medicare/Medicaid as insurance” and “Medi-Cal as insurance” are significant, but they are only applicable in the United States. The insurance payer is uninformative as a predictor variable in countries with universal health care coverage. “Requiring financial assistance from Medifund”, “index admission was in a subsidized ward”, and “index admission class A” may indicate a lower socioeconomic status, but they all relate closely to the financial regulations and social welfare systems of the patients’ countries of residence. The difficulty of collecting these data may also differ across countries. “Index admission was in a Veterans Affairs hospital” is only valid in the United States. We excluded functional status and health literacy because they are often harder to collect in retrospective studies (e.g., they require interviews or self-reporting).
We kept comorbidity indices in the list of highly generalizable risk factors. Although the Charlson/Deyo and Elixhauser indices were originally built on ICD-9-CM codes, which are the United States adaptation of ICD-9 codes, they have been successfully translated to work with ICD-10 codes in Canada and Switzerland [76,77].
Methods to identify risk factors
The reviewed studies are highly consistent in analytical methods. Twelve studies used logistic regression and one used Poisson regression. Logistic regression and Poisson regression both belong to the family of generalized linear models, which estimate model parameters by maximizing the likelihood. Poisson regression assumes the response variable follows a Poisson distribution, while in logistic regression the response variable can be binomial, ordinal, or multinomial. Binomial logistic regression is usually used in readmission predictions because the outcome is dichotomous (either readmitted or not readmitted). In binomial logistic regression, the binary response variable is linked to the linear combination of independent predictor variables through a logit function. Poisson regression models a discrete count response variable with the logarithm as the link function. In these studies, the adjusted odds ratio was the most commonly used metric to assess a variable’s degree of association with the response variable. The odds ratio compares the odds of an outcome of interest occurring under different exposures. The significance level was set to 0.05 in all the studies.
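As a minimal illustration of the logit link and the odds ratio described above, the following Python sketch computes an unadjusted odds ratio from a 2x2 table. The counts and function names are hypothetical and are not taken from any reviewed study; the adjusted odds ratios reported in the studies would additionally require fitting a multivariable model.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(exposed_events, exposed_nonevents,
               unexposed_events, unexposed_nonevents):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    odds_exposed = exposed_events / exposed_nonevents
    odds_unexposed = unexposed_events / unexposed_nonevents
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed patients were readmitted,
# compared with 15 of 100 unexposed patients.
or_value = odds_ratio(30, 70, 15, 85)

# With a single binary predictor, the logistic regression coefficient
# equals the difference in log-odds, so exp(beta) recovers the odds ratio.
beta = logit(30 / 100) - logit(15 / 100)
print(round(or_value, 3), round(math.exp(beta), 3))
```

An odds ratio above 1 indicates higher odds of readmission in the exposed group; in a multivariable model, the exponentiated coefficient of a predictor plays the same role after adjusting for the other predictors.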
Surprisingly, none of the 13 reviewed studies used methods other than traditional statistical analysis. In recent years, data mining has been an active research area and there have been many successful applications in healthcare. Unlike statistics, which is hypothetico-deductive, data mining uses more flexible and more inductive ways to find patterns hidden in data. Decision trees are a family of supervised classifiers especially suitable for identifying risk factors. The process of assigning a label to the response variable can be visualized in a straightforward tree-like structure. The critical cut-off values of predictor variables associated with readmissions can be obtained directly from decision trees. Association rule mining is another data mining technique appropriate for identifying risk factors. This technique aims to discover strong rules, derived from frequent item sets, in data based on predefined criteria. The risk factors can be extracted from the highly ranked rules.
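To illustrate how a cut-off value can be read from a tree, the sketch below finds the single best split (a one-level decision tree) on one predictor by minimizing the weighted Gini impurity. The data, the variable names, and the choice of Gini impurity as the split criterion are illustrative assumptions, not a method used by the reviewed studies.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_cutoff(values, labels):
    """Return (cutoff, weighted_gini) for the best split 'value <= cutoff'."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate cut-offs are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2.0
        left = [y for x, y in pairs if x <= cutoff]
        right = [y for x, y in pairs if x > cutoff]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (cutoff, score)
    return best

# Hypothetical data: length of stay in days and a 0/1 readmission flag.
length_of_stay = [1, 2, 2, 3, 4, 6, 7, 8, 9, 12]
readmitted = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

cutoff, impurity = best_cutoff(length_of_stay, readmitted)
print(cutoff, round(impurity, 2))
```

A full decision tree repeats this search recursively within each branch and across all predictor variables, so the cut-offs along a path to a high-risk leaf can be reported as candidate risk thresholds.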
Another concern is that some studies reported results without evaluation and/or internal validation of the prediction models. To reduce bias and improve the usefulness of a prediction model, it is recommended to report prediction models following the guidelines in the TRIPOD statement and the statement from the American Heart Association. In particular, it is important to evaluate and report the model’s performance in both the derivation and validation datasets.
The most popular model evaluation metric is the area under the receiver operating characteristic curve (AUC, or AUROC), also called the c-statistic. The receiver operating characteristic curve is a graphical representation of a binary classifier’s performance as the discrimination threshold is varied. The AUC measures the model’s discriminative ability and can be interpreted as the probability that the model will rank a randomly selected positive sample higher than a randomly selected negative sample. In practice, the AUC ranges from 0.5, indicating discrimination no better than chance, to 1, indicating a perfect classifier; values below 0.5 are possible in principle and indicate a ranking worse than chance.
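The probabilistic interpretation of the AUC can be sketched directly in Python by comparing every positive sample with every negative sample. The function name, the convention of counting ties as one half, and the example scores are assumptions made for illustration only.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores contribute 0.5 to the count of correctly ranked pairs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted readmission risks and observed outcomes.
predicted_risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
readmitted = [1, 1, 1, 0, 0, 0]

print(round(auc(predicted_risk, readmitted), 3))
```

This pairwise form runs in quadratic time and is meant only to make the definition concrete; statistical packages typically compute the same quantity from sorted ranks.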
The two widely used validation methods are hold-out cross-validation and k-fold cross-validation. The hold-out method splits the dataset into a derivation dataset and a validation dataset. The derivation dataset is used to build the prediction model and the validation dataset is used to test the model. The disadvantage of the hold-out method is that the partition of the original dataset might be biased, and the resulting derivation and validation datasets might follow different local distributions. To overcome this issue, the k-fold cross-validation method randomly splits the original dataset into k equal-sized partitions and, in each round, uses one partition as the validation dataset and the remaining partitions as the derivation dataset. This process is repeated k times, so that each partition serves as the validation dataset exactly once, and the k validation results are averaged to give the final validation result.
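The k-fold procedure described above can be sketched with the Python standard library alone. The function names, the fixed random seed, and the `evaluate` callback are hypothetical; in a real study, `evaluate` would fit a model on the derivation indices and return a metric such as the AUC on the validation indices.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (derivation_idx, validation_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        derivation = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield derivation, validation

def cross_validate(n_samples, k, evaluate, seed=0):
    """Average evaluate(derivation_idx, validation_idx) over the k folds."""
    results = [evaluate(d, v) for d, v in k_fold_indices(n_samples, k, seed)]
    return sum(results) / len(results)

# Usage with a placeholder metric: the size of each validation fold.
mean_fold_size = cross_validate(10, 5, lambda d, v: len(v))
print(mean_fold_size)
```

Because every sample appears in exactly one validation fold, the averaged result depends less on a single random partition than a one-off hold-out split does.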
This study has a couple of limitations. First, due to the strict inclusion criteria, only 13 articles were included in the final literature review, and 15 of the 34 highly generalizable risk factors were reported in only one study. Because of the small sample size, it is infeasible to conduct statistical significance tests. However, the intent of this study was not to review risk factors that are shared by most studies. Instead, the objective was to provide a list of highly generalizable risk factors to guide the selection of baseline predictor variables in different readmission studies. Even if some risk factors were reported by only one study, we chose to keep them in the list because they were reported to be statistically significant in the prediction of readmissions and can be easily applied to other studies.
Second, the articles are imbalanced in study regions, with about half (7/13) based in the United States. This may introduce bias and potentially weaken the generalizability of some risk factors. However, after comparing the US and non-US studies, we did not find an apparent difference in most risk factor categories. Studies in the United States are more likely to use composite comorbidity measures, such as comorbidity indices and the number of comorbidities, rather than individual comorbidities. Although the reported comorbidity indices were originally developed in the United States based on ICD-9-CM codes, they have been translated to work with ICD-10 codes and have been applied globally.
In this work, we have identified 34 highly generalizable risk factors for unplanned 30-day all-cause hospital readmissions. They are not specific to any population or place, and the corresponding predictor variables can potentially serve as baseline predictor variables in readmission prediction studies around the world. The majority of the identified risk factors are patient-side factors and clinical factors. Only two hospital-side factors have been identified. This could be due to the limitations of the study designs and the difficulty of data collection. No major difference has been observed between the risk factors identified inside and outside the United States, except that US studies appeared to prefer composite comorbidity measures. However, this assertion should be validated by significance tests when more eligible studies become available. All the reviewed studies used traditional statistical regression-based methods to identify risk factors. More applications of modern data mining techniques in readmission prediction studies are expected. Because the reviewed studies explored only the association, not causation, between different variables and readmissions, the identified risk factors should be used only for the predictive modelling of readmissions, not for clinical purposes. When more eligible studies become available, this review will be updated by extending the list of highly generalizable risk factors and incorporating statistical analysis to study the variation across studies. Overall, the literature suggests a growing interest in developing hospital readmission models in the past decade. The findings of this review can guide the selection of baseline readmission predictor variables and potentially provide the foundation for international collaborations on readmission predictions.