Joint Generation of Binary and Nonnormal Continuous Data

The use of joint models that are capable of handling different data types is becoming increasingly popular in biomedical practice. Evaluation of various statistical techniques that have been developed for mixed data in simulated environments requires concurrent generation of multiple variables. In this article, I comprehensively evaluate the unified framework proposed by Demirtas et al. for simultaneously generating binary and nonnormal continuous data given the marginal characteristics and correlation structure. I conduct this assessment in three simulated settings with synthetic bivariate and multivariate data as well as real depression score data from psychiatric research. Considering close agreement between the specified and empirically computed quantities on average, as measured by some key biasand precision-related quantities, the methodology appears to have prospects to address the need of generating intensive data that have binary and continuous parts for simulation purposes. Citation: Demirtas H (2014) Joint Generation of Binary and Nonnormal Continuous Data. J Biomet Biostat S12: 001. doi:10.4172/2155-6180.S12-001


Introduction and Motivation
Most data sets include different types of variables. For instance in clustered or longitudinal designs, multiple variables are often measured or observed for each individual or at each occasion. This work is concerned with the evaluation of the framework proposed by Demirtas et al. [1] for simultaneously simulating binary and possibly nonnormal continuous data given the marginal characteristics of each variable as well as the linear association structure among variables in the system. The mechanism is built upon a combination of a few random variate generation routines that involve simulation of correlated binary data, multivariate normal data, a mix of binary and normal data, and multivariate nonnormal continuous data. Multivariate normal data generation is well-studied; and correlated binary data generation routine we utilize is predicated on computing tetrachoric correlations via solving a series of double integration equations assuming underlying normality before dichotomization at thresholds that correspond to the specified marginal proportions [2]. A computational routine for joint generation of a binary and normal data, proposed by Demirtas and Doganay [3], constitutes an intermediate stage before augmenting the continuous part so nonnormal data can be accommodated; and is driven by the relationship among phi coefficient, point-biserial and tetrachoric (biserial) correlations. Nonnormality is handled by an extension of Fleishman's [4] power polynomials procedure of expressing any given variable by the sum of linear combinations of powers of a standard univariate normal variate to the multivariate case by finding intermediate correlations that reflect the correlation structure of multivariate normal data whose components yield the nonnormal data through the coefficients of powers of normal variates [4][5][6]. Once such data are simulated, variables in the subsequent analyses can be treated as predictors or outcomes.
The method assumes that all variables before any transformation jointly follow a multivariate normal density. Some components are designed to be dichotomized to obtain binary variables, and some components form a basis for generating continuous data with the intended distributional features. In this random number generation (RNG) system, the proportion parameters for binary data, symmetry and elongation parameters for continuous data (as measured by the third and fourth moments) and a linear association structure among all variables need to be specified.
The following relationships among correlations should be established: 1) for the binary-binary pairs, correlations before and after dichotomization (former is computed, latter is specified); 2) for the continuous-continuous pairs, correlations before and after power polynomial transformation (former is specified, latter is computed); 3) for the binary-continuous pairs, correlations before and after dichotomizing one of the variables (former is computed, latter is specified). Unknown quantities are tetrachoric correlations, intermediate correlations and biserial correlations for Items 1, 2 and 3 above, respectively. The first one is computed through integration, the second one is involved with fairly rudimentary algebra, and the third one comes from a simple formula from the dichotomization literature. Once these operations are performed, one can form an overall correlation matrix for a multivariate normal distribution that would lead to the specified correlations after dichotomizing some of the variables via thresholds that are determined by marginal proportions for the binary part; and after applying the power transformation procedure for the continuous part. More detailed prescription is given in algorithm section. As long as some conditions outlined in algorithm hold, this method is capable of generating data that follow the specified linear association structure for all variables, means of the binary variables, and skewness and kurtosis behavior for continuous variables.
The organization of the rest of the article is as follows. In Getting Ready: Fundamentals, I provide background information on how to generate multivariate normal data, multivariate binary data through dichotomization of an underlying bivariate normal distribution whose correlation is computed by solving a numerical integration problem. Repeating this process for each possible pair gives us the overall correlation matrix. The dichotomized versions are obtained by the specified marginal proportions. I also discuss how the magnitude of the correlation changes when only one variable is dichotomized for bivariate data. In addition, the technique of power polynomials for generating multivariate continuous data is delineated in detail. In Algorithm, I outline the methodology proposed by Demirtas et al. [1] to generate mixed data. In Simulations, I present three simulation studies that encompass a broad range of distributional setups for bivariate and multivariate cases for evaluating the performance of this RNG technique by commonly accepted accuracy and precision measures. Two of these simulated scenarios use synthetic data, and one is devised around a real depression score data from psychiatric research. Discussion includes discussion, concluding remarks and future directions.

Getting Ready: Fundamentals
In this section, I give key characteristics of multivariate normal (MVN) and multivariate binary (MVB) data generation, dichotomization as well as univariate and multivariate Fleishman polynomials.

MVN data generation
Suppose Z ∼ N d (μ,Σ), where μ is the mean vector, and Σ is symmetric, positive semidefinite, d×d variance-covariance matrix. A random draw from a MVN distribution can be obtained using the Cholesky decomposition of Σ and a vector of univariate normal draws. The Cholesky decomposition of Σ produces a lower-triangular matrix A for which AA T =Σ. If z=(z 1 , ...,z d ) are d independent standard normal random variables, then Z=μ+Az is a random draw from the MVN distribution with mean vector μ and covariance matrix Σ.

MVB data generation
Although several multivariate binary data simulation routines appeared in the literature [7], the one that fits into our framework was proposed by Emrich and Piedmonte [2] . We could generate multivariate normal outcomes (Z's) whose correlation parameters are obtained by solving the equation for ρ jk (j=1,...,d − 1; k=2,...,d) where z(p) denotes the p th quantile of the standard normal distribution. As long as δ jk is within the feasible range, the solution is unique. Repeating this numerical integration process d(d−1)/2 times, one can obtain the overall correlation matrix (say Σ) for the d-variate standard normal distribution with mean 0. However, it should be noted that positive semidefiniteness of Σ cannot be guaranteed. To create dichotomous outcomes (Y j ) from the generated normal outcomes (Z j ), we set Y j =1 if Z j ≥ z(1−p j ) and 0 otherwise for j=1,...,d. This produces a vector with the desired properties.

Fleishman polynomials
Fleishman [4] argued that real-life distributions of variables are typically characterized by their first four moments. He presented a moment-matching procedure that simulates nonnormal distributions often used in Monte Carlo studies. It is based on the polynomial transformation, Y=a+bZ+cZ 2 +dZ 3 , where Z follows a standard normal distribution, and Y is standardized (zero mean and unit variance). The distribution of Y depends on the constants a, b, c and d, whose values were tabulated for selected values of skewness (ν 1 =E[Y 3 ]) and kurtosis (ν 2 =E[Y 4 ]-3) in the original paper [4]. This procedure of expressing any given variable by the sum of linear combinations of powers of a standard normal variate is capable of covering a wide area in the skewness-elongation plane whose bounds are given by the general expression , by utilizing the first 12 moments of the standard normal distribution, the following set of equations can be derived after simple but tedious algebra: 24[bd+c 2 (1+b 2 +28bd)+d 2 (12+48bd+141c 2 +225d 2 )]−ν 2 =0 These equations can be solved by the Newton-Raphson method, or any other plausible root-finding or nonlinear optimization routine. More details for the Newton-Raphson algorithm for this particular setting is given by Demirtas et al. [1], and a computer implementation can be found in Demirtas and Hedeker [8]. Note that the parameters are estimated under the assumption that the mean is 0, and the standard deviation is 1; the resulting data set should be back-transformed to the original scale by multiplying every data point by the standard deviation and adding the observed data mean.
Fleishman's method has been extended in several ways in the literature. One extension uses the fifth-order polynomials in the spirit of controlling for higher-order moments [9]. The other one is in regard to a multivariate version of the power method [5] that plays a central role for the remainder of this paper. The procedure for generating multivariate continuous data begins with computation of the constants given in Equations 2-5, for each variable independently. The multivariate case can be formulated in matrix notation as shown below. First, let Z 1 and Z 2 be variables drawn from standard normal populations; let z′ be the vector of powers 0 through 3, z′=[1, Z, Z 2 , Z 3 ]; and let w′ be the weight vector that contains the power function weights a, b, c and d, w′=[a, b, c, d]. The nonnormal variable Y is then defined as the product of these two vectors, Y=w′z. Let 1 2 Y Y r be the correlation between two nonnormal variables Y 1 and Y 2 that correspond to the normal variables Z 1 and Z 2 , respectively. As the variables are standardized, meaning where  is the expected matrix product of z 1 and z′ 2 : is the correlation between Z 1 and Z 2 . After algebraic operations, the following relationship between in terms of polynomial coefficients ensues: r . Clearly, correlations for each pair of variables should be assembled into a matrix of intercorrelations that will be used in multivariate normal data generation. For a definitive source and in-depth coverage of Fleishman polynomials [6].

Correlation and dichotomization
A correlation between two continuous variables is conventionally computed as the common Pearson correlation. A correlation between one continuous and one dichotomous variable is a point-biserial correlation, and a correlation between two dichotomous variables is a phi coefficient (δ jk in Equation 1). The point-biserial and phi coefficients are special cases of the Pearson correlation. That is, if we apply the Pearson formula to data involving one continuous and one dichotomous variable, the result will be identical to that obtained using a formula for a point-biserial correlation. Similarly, if we apply the Pearson formula to data involving two dichotomous variables, the result will be identical to that obtained using a formula for a phi coefficient. The point-biserial and phi coefficients are typically used in practice for analyses of relationships involving variables that are true dichotomies [10]. For example, one could use a point-biserial correlation to assess the relationship between sex and cholesterol level, and one could use a phi coefficient to measure the relationship between sex and smoking status (smoker vs. nonsmoker).
Some variables that are measured as dichotomous variables are not true dichotomies. An example would be a situation where the measured variable is dichotomous (obese vs. non-obese), but the underlying variable is continuous (body mass index). Special types of correlations, specifically biserial and tetrachoric correlations, are used to measure relationships involving such artificial dichotomies. Use of these correlations is based on the assumption that underlying a dichotomous measure is a normally distributed continuous variable. For the case of one continuous and one dichotomous variable, a biserial correlation provides an estimate of the relationship between the continuous variable and the other continuous variable underlying the dichotomy. For the case of two dichotomous variables, the tetrachoric correlation (ρ jk in Equation 1) describes the relationship between the two continuous variables underlying the measured dichotomies.
Suppose that X and Y follow a bivariate normal distribution with a correlation of ρ XY . If X is dichotomized to produce X D , then the resulting correlation between X D and Y can be designated as where p and q are the proportions of the population above and below the point of dichotomization, respectively, and h is the ordinate of the normal curve at the same point. The sign of correlation in Equation 7 should not change with dichotomization, so high and low values of X are assigned 1 and 0, respectively. Of note, for the purposes of this work, artificial versus true dichotomies or terminology differences such as "biserial" versus "tetrachoric" are inconsequential.
Obviously, this is all based on normality. However, the specified correlations among nonnormal continuous variables, along with marginal characteristics of nonnormal data are used to compute the corresponding intermediate correlations among normal variables that underlie the nonnormal continuous ones through Equation 6.
The way of handling the effect of Fleishman's [4] transformation on the correlations between binary-normal and binary-nonnormal continuous pairs is described in Algorithm. Pearson correlations for the normal-normal pairs, phi coefficient for the binary-binary pairs and point-biserial correlation for the binary-normal pairs are obtained within the algorithmic stages, consistent with the dichotomization terminology given in this subsection. After performing calculations given in Equations 1 and 7, one finds an overall Pearson correlation matrix for the underlying multivariate normal realizations before dichotomizing some variables in the system.
After laying the groundwork, I summarize the methodology proposed by Demirtas et al. [1] for generating mixed data in the next section.

Algorithm
Finding the coefficients of powers of normal components of any continuous distribution can be performed by solving a set of nonlinear equations, and employing these coefficients in determining the intermediate correlations among those normal components are explained in Fleishman polynomials. MVN and MVB generation with underlying normal distribution are well-understood, and along with the mathematical connection between point biserial and tetrachoric correlations described in MVN data generation and MVB data generation, one can generate a set of binary and normal variables in a unified manner given marginal proportions and a set of correlations before conducting a transformation from normal to nonnormal variates. More specifically, algorithmic steps are given below. In what follows, B, N and C stand for set of binary, normal and nonnormal continuous variables, respectively. Let X 1 ,X 2 , ...,X j be a set of binary variables with proportion parameters p 1 ,p 2 ...,p j , and let Y m represent the set of continuous variables with known or calculable skewness (ν 1m ) and kurtosis (ν 2m ), where m=1,2,...,k. The (j+k)×(j+k) correlation matrix is Σ. Without loss of generality, assume that variables are arranged in a certain order where similar types of variables are grouped together. Then, Σ is comprised of three components: Σ BB , Σ BC , and Σ CC , where B and C correspond to binary and continuous parts, respectively. In this setup, Σ BB is a j×j submatrix and Σ CC is a k×k submatrix of Σ that stand for the correlations between the binary-binary and continuous-continuous combinations, respectively. Similarly, Σ BC represents a j×k submatrix whose elements are the correlations between binary and continuous variables.
Required parameter values (p for binary variables, ν 1 and ν 2 for continuous variables, and the correlation matrix Σ whose partitions are Σ BC , Σ BB and Σ CC ) are either specified or estimated from a real data set that is to be mimicked.
2. Find the upper and lower correlation bounds for the BB part using the information given in MVB data generation.

Repeat
Step 2 for the BC and CC parts by the approximation method proposed by Demirtas and Hedeker [11]

Bivariate case
The bivariate simulations consist of one binary and five different continuous distributions whose densities, key shape characteristics and population values for skewness (ν 1 ) and kurtosis (ν 2 ) are given below. Uniform distribution: f(y|a,b)=(b−a) −1 , a ≤ y ≤ b, where a and b are the lower and upper bounds of the support of y. We take (0,1) for (a,b). Here, the density is flat and symmetric around its mean (ν 1 =0 and ν 2 =−1.2).

χ 2 distribution:
is the degrees of freedom. k is chosen to be 32. The density is positively skewed and leptokurtic (ν 1 =0.5 and ν 2 =0.375).
This set of continuous distributions covers flat, unimodal and bimodal symmetric, right-and left-skewed shapes in terms of the third moments (skewness); and leptokurtic and platykurtic shapes in terms of the fourth moments (elongation).
2×2×2=8 combinations were employed where the binary proportion p, the correlation Σ 12 and sample size n take the values (0.3,0.5), (−0.3,0.4) and (100,10000), respectively, for each of the five continuous distributions above. The experiment was repeated N=1000 times for each of the 2×2×2×5=40 scenarios. Results for the large sample case were tabulated in Table 1 due to space limitations as the small sample case yielded little or indiscernible deviations. The parameters of interest are three marginal quantities (p,ν 1 ,ν 2 ) and one associational quantity (Σ 12 ). The mean estimates that were obtained by averaging the results across N=1000 simulation replicates are reported in Table  1, focusing on the accuracy aspects (the other two simulation studies that we describe in Multivariate case and Real data-driven case include a precision measure as well). Very close resemblance between the specified and empirically computed quantities on average throughout all scenarios strongly indicates that the procedure is working properly.

Multivariate case
Following the notation in Algorithm, there are two binary (X 1 and X 2 ) with respective proportions p 1 =0.4 and p 2 =0.7, and four continuous variables (Y 1 , Y 2 , Y 3 and Y 4 ) that follow Laplace, Normal mixture, Beta and χ 2 , respectively, in our simulated multivariate setting. The specified variables (the mean and standard deviation could be specified via a distribution or come from a real data set). Note that correlations remain unchanged with a linear transformation. Estimate the power coefficients (a,b,c,d)   13. Generate multivariate normal data with a mean vector of (0,...,0) k+j and correlation matrix of Σ * .
14. Obtain binary variables by the thresholds determined by marginal proportions using quantiles of the normal distribution.
15. Obtain continuous variables by the sum of linear combinations of standard normal using the corresponding (a,b,c,d) coefficients. 16. Go back to the original scale for continuous variables by reverse centering and scaling.
There are a few operational issues that need to be addressed. First, two specification violations can occur if the set of parameter values is specified by the user. In Step 1, the correlation matrix Σ should pass the positive semidefiniteness check. In case of failure, the whole process is aborted. Steps 2 and 3 are designed to protect against correlation bound violations. Correlations among variables are typically not free to vary between −1 and 1, with bounds determined by the marginal distributions. The sorting method of Demirtas and Hedeker [11] can be employed to identify any bound violations that arise from a specification error. If the parameter values are estimated from a complete real data set, positive semidefiniteness condition for Σ must hold and unfeasible correlation range can never ensue. Second, Fleishman polynomials do not cover the entire skewness-elongation plane. Therefore, most but not all not (ν 1 ,ν 2 ) specifications are plausible. Third, even when no above-mentioned possible complications exist, the final correlation matrix, Σ*, is not guaranteed to be positive semidefinite. In such cases, I recommend the method of Higham [12] to proceed with the nearest positive semidefinite correlation matrix. Caveats aside, these days many software packages are capable of performing these algorithmic steps with relative ease from a practical standpoint.

Simulations
The performance of the method has been evaluated in bivariate and After applying the algorithm, the final, overall correlation correlation matrix Σ * of the multivariate normal distribution that plays a central role in obtaining subsequent dichotomization and power transformation to yield the data with desired properties turned out to be Denoting the elements of * ij ∑ where i=1,2,...,6 and j > i, Σ 12 (the BB combination) is computed through the relationship between the tetrachoric correlation and phi coefficient given in Equation 1 (Step 5 in the algorithm). The entries The parameters of interest are the odds ratio between X 1 and X 2 and 15 nonredundant correlation parameters. 1000 simulated data sets were generated with n=1000 rows. The true values (TV), average estimates (AE), raw biases (RB), percentage biases (PB) and 95% coverage rates (CR) across 1000 replicates are shown in Table 2. If the parameter of interest isˆ, AE, RB and PB are accuracy measures, and CR is a hybrid measure of accuracy and precision. Subject to unbiasedness, CRs that are close to the nominal level (95%) suggest that the standard errors are neither too small nor too large; Type 1 and Type II error rates are properly controlled [13]. Of note, we worked with logarithm of odds ratio and Fisher transformed versions of correlations, since they are more likely to follow a normal distribution, which gives more accurate results when we construct confidence intervals. The results tabulated in Table 2 reveal that the average estimates are very similar to the specified values; raw biases are negligibly small and percentage biases are within 5%  (overwhelming majority of them are less than 2% in either direction); and coverage rates are in the neighborhood of the ideal (expected) 95% level.

Real data-driven case
The real data-based simulation study comes from the National Institute of Mental Health Schizophrenia Collaborative Study [14]. Patients were randomly assigned to receive one of three anti-psychotic medications or a placebo. As mentioned by the authors, performance of the three drugs was quite similar; following their approach, I collapsed the subjects from the three drug treatments into a single group. The out-come of interest, severity of illness, was measured on an ordinal scale ranging from 1 (normal) to 7 (extremely ill), which I treat as continuous. Measurements were planned for weeks 0, 1, 3, and 6, but missing values occurred primarily due to drop-out. A few subjects had missing measurements and subsequently returned; for simplicity I have removed these. A small number of measurements were also taken at intermediate time points (weeks 2, 4, and 5) which I also ignore. The sample contains 312 patients who received a drug and 101 who received a placebo. For the purposes of this work, I focus on the complete data. There were 248 and 64 completers in the drug and placebo groups, respectively. Hedeker and Gibbons [14] noted Although the total number of subjects in this study was 437, of these subjects, only 312 had complete data at all time points. I only considered complete cases as this article is concerned with generating data that resemble complete data. Obviously, one can simulate incomplete data set by first generating full data, and then imposing missing values by some nonresponse mechanism [3,15,[23][24][25].
In the data generation process, I simulated one dichotomous variable (DRUG) and four continuous variables (Y 0 ,Y 1 ,Y 3 ,Y 6 ) that correspond to the severity of illness at weeks 0, 1, 3 and 6, respectively, with n=312 subjects as in the original data set.
Specified true values of marginal (proportion of drug patients, skewness and kurtosis of Y s) and associational parameters (correlation matrix of five variables in the data) directly come from the data themselves. In addition to these descriptive measures, model-based quantities (regression coefficients) were also considered.
The regression model is based on well-known linear mixed-effects model [16]. Let where X i (n i ×p) and Z i (n i ×q) contain covariates, β contains fixed effects, b i ∼ N(0,ψ) contains random effects, and ε i ∼ N(0, σ 2 V i ). Times of measurement are often incorporated into X i and Z i , allowing the response trajectories to vary by subject. In the current context, i=1,2,...,312, n i =4, outcome Y denotes illness severity, X i consists of an intercept, DRUG, WEEK, DRUG * WEEK, Z i includes an intercept and WEEK (random slope model), and V i is the identity matrix. The parameters of interest are the β coefficients of the terms that appear in the fixed effects regressor matrix. Of note, this may not be the best analysis model, one may build more complex models. However, as long as the same model is used for finding the true parameter values from the original complete data and for analysis after simulating data, using a suboptimal model does not have any impact on the plausibility of data-generation method which is a key theme in this work.
In total, four regression coefficients, four skewness and kurtosis parameters, one proportion and 10 correlation parameters were examined. N=1000 data sets were generated by the characteristics of the data; and the evaluation criteria in Multivariate case were used to assess how unbiased and precise the estimates are. The results are tabulated in Table 3. As before, discrepancies between the average estimates and specified values are almost nonexistent, leading to extremely small biases. Coverage rates are strikingly close to the nominal levels, demonstrating that the magnitude of variability in the system is almost ideal 1 .

Some logistical and computational details
• All computations were carried out in R [17].
• Misspecification check in Step 1 was done by is.positive.definite function in R package corpcor [18]. 1 Coverage rates for ν 1 and ν 2 in the real data example were not computed due to heavily non normal sampling distribution of these quantities.
• All specified correlation terms should be within the feasible range; this has been checked by the method of Demirtas and Hedeker [8] for each pair of variables (Steps 2 and 3 in the algorithm).
• Tetrachoric correlations in Step 5 were found by tetrachoric function in R package psych [19].
• Power transformation coefficients in Step 6 were found via the R function given in Demirtas and Hedeker [9].
• Although it was not needed in this particular work, working with the closest positive semidefinite matrix can be done by nearPD function in R package Matrix [20], which is an application of Higham [12].
• Multivariate normal data generation in Step 13 was conducted by rmvnorm function in R package mvtnorm [21].
• Implementation of linear mixed effects model in the real data application (Real data-driven case) was performed by lmer function in R package lme4 [22].

Discussion
The method is concerned with repeatedly generating synthetic data with specified distributional features or data that on average mimic real  data with a mix of binary and continuous variables to assess validity and plausibility of statistical techniques. Parameters that govern the hypothetical process that leads to observed data are either specified by users or preferably estimated from a real data set. The technique relies on well-established multivariate data generation techniques for binary and normal data with added operational utility of power polynomials to preserve marginal characteristics of data as well as the association structure among the variables. There are some points that deserve attention.
• This method does not require specialized tools, it can be implemented by existing software.
• True versus artificial dichotomies and terminological complexities such as phi coefficient, biserial, point-biserial, tetrachoric correlations are unimportant and just procedural. These are different variants of the Pearson correlation, historically have been used to differentiate correlations among variables of different nature, and computational formula is the same for all.
• As long as initial and final correlation matrices (Σ and Σ*, respectively) are positive semidefinite; correlation bounds among variables are not violated; and symmetry-peakedness (skewnesselongation) behavior for continuous variables is within the region that can be handled by power polynomials, this approach works well (including situations where the binary variables are highly skewed with major proportion of 1s or 0s). If input parameters come from a real complete data set, the initial positive semidefiniteness (Step 1 in the algorithm) and correlation boundary problems (Steps 2 and 3) can never be encountered. However, the final positive semidefiniteness condition (Step 12) may not hold for all given specifications. In this case, one can resort to the technique proposed by Higham [12], to work with the nearest positive semidefinite correlation matrix.
• The original real data in Real data-driven case have some missing values, but I considered complete cases for the purpose of illustration without regard to missing data issues which are beyond the scope of this manuscript. This is not a limitation, because one can simulate incomplete data set by first generating full data, and then imposing missing values by some nonresponse mechanism [3,15,[23][24][25].
• One appealing feature of this methodology is that simulated variables can be treated as outcomes or predictors in subsequent statistical analyses as the variables are being generated jointly.
• More comprehensive simulation studies with a broader range of parameters could have been investigated, but I believe that these designs adequately accommodate salient marginal parameters such as proportion, skewness and kurtosis, and association parameters such as odds ratio, correlation and regression coefficients.
• As far as continuous part of the data is considered, the method can handle nonnormal features such as skewness, multimodality, boundary at the mode, low or high peakedness.
• This technique is currently designed to accommodate linear associations; modelling nonlinear associations is an important future direction.
• Ideas presented in this paper can be incorporated into the RNG algorithms that involve multivariate ordinal data, proposed by Demirtas [26] and Demirtas and Yavuz [27], to produce binaryordinal-continuous combinations.
Given its computational simplicity, generality and flexibility, the method is likely to be a handy addition to practitioners' toolkit. It is particularly useful for studies that involve longitudinal or clustered designs as well as other situations where multiple binary and continuous variables are collected. When biomedical researchers need to regenerate the original data trends in simulated environments, they could implement this technique in their favorite platform and software with ease.