Background: Alternative splicing of messenger RNAs allows cells to produce protein isoforms with diverse functions from a single gene by excluding or including exons during post-transcriptional processing. Reconstructing the contribution of each splice variant to the total expression of a gene remains difficult.
Methods: We introduce a probabilistic formulation of the alternative splicing reconstruction problem using a finite mixture model and provide a solution based on the maximum likelihood principle. Our model assumes that the expected expression level is the same for all exons in a given splice variant, while allowing for measurement error. Under this model, the expression in a patient is written as a weighted sum of multivariate Gaussian densities, one per splice variant. We estimated the model parameters by maximizing the total likelihood with the Nelder-Mead optimization algorithm in R. To evaluate our algorithm, we compared the AIC/BIC values of six models: an established optimal normal mixture modeling method; a model in which all exons are equally transcribed; a model with the currently known splice variants; a model with all possible splice variants; a model with the known variants supplemented by the highly prevalent variants from the all-possible-variants model; and a model with manually selected splice variants.
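To make the mixture formulation concrete, the following is a minimal sketch of the log-likelihood and the AIC/BIC computation, written in Python rather than the R used in the study. It rests on assumptions that the abstract does not state: each splice variant is encoded as a 0/1 exon-inclusion vector, excluded exons have an expected expression of zero, and measurement error is isotropic Gaussian with a single shared standard deviation. All function and parameter names are hypothetical, and the Nelder-Mead optimization step is omitted.

```python
import math


def log_gaussian_isotropic(x, mean, sigma):
    """Log density of a multivariate normal N(mean, sigma^2 * I) at x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq_dist / (2 * sigma ** 2)


def mixture_log_likelihood(data, variants, weights, levels, sigma):
    """Total log-likelihood of exon expression vectors under the mixture.

    data     : list of per-patient exon expression vectors
    variants : list of 0/1 exon-inclusion vectors, one per splice variant
    weights  : mixture weights (strictly positive, summing to 1)
    levels   : expected expression level per variant, shared by its exons
    sigma    : shared measurement-error standard deviation (assumption)
    """
    total = 0.0
    for x in data:
        terms = []
        for z, w, mu in zip(variants, weights, levels):
            # Assumption: included exons share the level mu, excluded exons have mean 0.
            mean = [mu * zj for zj in z]
            terms.append(math.log(w) + log_gaussian_isotropic(x, mean, sigma))
        # Log-sum-exp over variants for numerical stability.
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def aic_bic(log_lik, n_params, n_obs):
    """Akaike and Bayesian information criteria (lower is better)."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aic, bic


if __name__ == "__main__":
    # Toy, made-up data: 3 exons, 2 hypothetical variants, 4 "patients".
    toy_data = [[5.1, 4.9, 0.2], [5.0, 5.2, 4.8], [4.8, 5.1, 0.1], [5.2, 5.0, 5.1]]
    toy_variants = [[1, 1, 0], [1, 1, 1]]
    ll = mixture_log_likelihood(toy_data, toy_variants, [0.5, 0.5], [5.0, 5.0], 0.3)
    # Under these assumptions: (K - 1) weights + K levels + 1 sigma = 2K parameters.
    print(aic_bic(ll, n_params=2 * len(toy_variants), n_obs=len(toy_data)))
```

In a full fit, the negative of `mixture_log_likelihood` would be handed to a derivative-free optimizer such as Nelder-Mead, and the resulting AIC/BIC values compared across candidate sets of splice variants.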
Results: We applied the models to three genes associated with Marfan’s syndrome (SLC2A10, TGFβR2 and FBN1, with 25, 29 and 265 possible splice variants, respectively), using gene/exon expression data from 63 patients with Marfan’s syndrome. The models combining the known splice variants with the highly prevalent variants from the all-possible-variants model had the best AIC/BIC values for all three genes. SLC2A10 and FBN1 each had one predominant splice variant, and TGFβR2 had two.
Conclusion: We found four possible new splice variants in three genes associated with Marfan’s syndrome.