Simultaneous Determination of Clarithromycin, Tinidazole and Omeprazole in Helicure Tablets Using Reflectance Near-Infrared Spectroscopy with the Aid of Chemometry

A near-infrared (NIR) spectroscopic method was developed for the simultaneous determination of the active principles clarithromycin, tinidazole and omeprazole in a pharmaceutical preparation. The three active principles are quantified using partial least-squares regression. The proposed method is applicable over a wide analyte concentration range (80–120% of labeled content), so it requires careful selection of the calibration set and thorough homogenization of the product. The method was validated in accordance with the ICH standard validation guidelines for NIR spectroscopy by determining its selectivity, linearity, accuracy, precision and stability. Based on the results, it is an effective alternative to the existing HPLC method for the same purpose.


Introduction
Proton pump inhibitors (PPIs), mainly omeprazole (OMP), combined with an antiparasitic agent such as tinidazole (TNZ) and an antibacterial agent such as clarithromycin (CLA), are used as a triple therapy in which all three drugs are given twice a day for one week according to the European recommendations, whereas guidelines from the United States recommend 10 or more days of therapy [1]. They are used for the eradication of gastrointestinal tract infection with Helicobacter pylori [2].
Quality control in the pharmaceutical industry involves analyses of raw materials, products prior to dosing and end products, which entails a large number of analyses for each preparation. A proper control process has to meet the following criteria: it should be sufficiently accurate and precise; it should require little or no sample pretreatment; it should allow several analytes to be determined simultaneously; and it should enable expeditious control of the manufacturing process. NIR is among the analytical techniques that most closely meet these requisites [21,22]. Advances in software and improved NIR instruments have also led to wide application of NIR in the pharmaceutical industry [23]. NIR is a rapid, non-destructive technique that saves time and cost and avoids hazardous chemicals. The weak absorbance of molecules in the NIR range makes it possible to carry out determinations directly on the end product, without sample preparation.
The NIR region covers the transition from the visible spectral range to the mid-infrared region (800-2500 nm, or 12821-4000 cm⁻¹) [24]. NIR has been widely used in the food, agricultural and pharmaceutical industries [25-28]. It is also ideal for the analysis of tablets and solid pharmaceutical formulations [29]. The non-destructive nature of NIR is only exploited to the full when determinations are carried out on intact tablets. This opens up new fields of application, such as determination of degradation products [30], qualification of clinical batches [31], determination of film-coated tablet parameters [32] and quantitative in-process determinations. Unlike UV spectroscopy, NIR can be applied to the determination of drugs in mixtures even when, like clarithromycin, they lack a suitable chromophore.
The direct identification of specific bands in NIR spectra is difficult or even impossible. In addition, tablet and capsule formulations frequently contain various excipients (fillers, binders, disintegrants, lubricants, glidants, solubilizers, etc.). These inert excipients, even in small quantities, can affect the characteristics, quality, stability and/or performance of the final drug product. The simultaneous determination of active principles in multicomponent preparations in the presence of excipients is therefore carried out using multivariate calibration techniques, with partial least-squares (PLS) models as the basic tool. Unfortunately, NIR spectra are often subject to significant noise and to chemical and physical interferences, leading to less than optimal calibration. To improve the predictive ability and robustness of the regression model, suitable data pre-processing and/or selection of the most relevant wavelengths can be applied.
Variable selection identifies the most useful wavelengths and removes non-informative, noisy wavelength zones and zones strongly affected by the formulation excipients, giving a robust, precise and accurate regression model [33]. Numerous approaches exist for variable selection: knowledge-based selection, forward-backward procedures and interval selection methods [34].
Spectral pre-processing [35], for example multiplicative scatter correction (MSC), standard normal variate (SNV), offset correction or Savitzky-Golay derivatives, can remove non-relevant sources of variation and non-linear behavior, and may lead to more parsimonious models (fewer latent variables in PLS models) [36]. However, the choice of pre-processing method is generally based on trial and error, on the analyst's experience and on an understanding of what each technique does. MSC and SNV correct scatter effects, while derivatives remove additive and multiplicative effects. A PLS model built on pre-processed spectra therefore generally performs better than one built without pre-processing, but it is not necessarily optimal.
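As an illustration of the two scatter corrections named above, the following is a minimal NumPy sketch, not the PLS_Toolbox implementation used in this work. The function names `snv` and `msc`, the use of the mean spectrum as the MSC reference, and the synthetic spectra are assumptions made for the example.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ≈ intercept + slope * ref, then undo the offset and the gain
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic data (assumption): one band recorded with different offsets and gains
band = np.exp(-0.5 * ((np.arange(100) - 50) / 8.0) ** 2)
raw = np.vstack([0.1 + 1.0 * band, 0.3 + 1.5 * band, -0.2 + 0.7 * band])
snv_spectra = snv(raw)
msc_spectra = msc(raw)
```

In this idealized case both corrections collapse the three spectra onto a single curve, because the spectra differ only by an offset and a gain. Real spectra also differ in composition, which is the variation the calibration is meant to keep.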
The aim of this work is to describe the simultaneous determination of clarithromycin, tinidazole and omeprazole in Helicure® tablets, which are used for the eradication of gastrointestinal tract infection with Helicobacter pylori. The new method offers an alternative to the previously proposed multi-wavelength detection HPLC method, which has several drawbacks: it uses excessive amounts of solvents, produces polluting waste and is time-consuming. The proposed alternative is a fast and efficient NIR spectroscopic method that allows the pharmaceutical preparation to be characterized and its three active principles to be quantified simultaneously, including clarithromycin, which lacks a suitable chromophore. The method was validated for use in routine analysis in accordance with the guidelines of the International Conference on Harmonisation (ICH) [37,38].

Experimental

Equipment and software
NIR spectra were recorded in reflectance mode using a Vector 22 N-T FT-NIR spectrometer (Bruker Optik GmbH, Karlsruhe, Germany) suited to both transmission and diffuse reflection analysis and equipped with a 30-position automatic sample tray. The measurements were performed in diffuse reflectance mode using a fiber optic probe. Before each sample measurement, a background spectrum was recorded by inserting the probe into the fiber optic sampling module containing the internal reflectance reference. The powders were measured by inserting the probe directly into the samples. The method was built with reflectance spectra collected with optimized parameters: each spectrum was the average of triplicate spectra, each acquired with 32 scans at a resolution of 2 cm⁻¹ over the range from 12500 to 4000 cm⁻¹. Spectra were visualized using OPUS Viewer version 6.5 (Bruker Optik GmbH, Karlsruhe, Germany). The chemometric processing was done using Matlab® 7.10.0.499 (R2010a) with the aid of PLS_Toolbox 7.3. The chemometric procedures are discussed below.
The reference method was performed on an HPLC instrument (Hitachi LaChrom Elite, Tokyo, Japan) equipped with a series L-2000 organizer box, an L-2300 column oven, an L-2130 pump with built-in degasser, a Rheodyne 7725i injector with a 20 µl loop and an L-2455 photodiode array detector (DAD). Separation and quantitation were carried out on a 250 x 4.6 mm (i.d.), 5 µm ODS column (Inertsil, Tokyo, Japan). UV detection was performed in scan mode. The HPLC was operated by EZChrom Elite version 3.3.2 SP1 (Agilent).

Materials and reagents
All experiments were performed using pharmaceutical-grade authentic standards of tinidazole (TNZ) and omeprazole (OMP) (Egyptian International Pharmaceutical Industries Company, 10th of Ramadan City, Egypt) and clarithromycin (CLA) (Kahira Pharmaceuticals, Egypt). All standards were certified to have a purity of 99.6-99.9% (w/w) on a dried basis.
The dosage form studied was Helicure® enteric-coated tablets, each containing 20 mg OMP, 500 mg TNZ and 250 mg CLA. These tablets contain talc and Eudragit L100 as major coat excipients, and Avicel PH-101, magnesium stearate and croscarmellose as major core excipients. The excipients were kindly supplied by Medical Union Pharmaceuticals (MUP), Egypt.

Samples
The samples used for calibration and validation were of two different origins, production samples and laboratory samples.

Production samples:
The production samples were obtained from 10 different batches and were used to capture the usual variability of the pharmaceutical preparation. For each batch, a single unit dose was ground in a mill for 15 s to ensure complete homogeneity and to minimize irreproducibility in the spectra and analytical results for solid samples. A single tablet contains 22.73 wt.% CLA, 45.45 wt.% TNZ and 1.82 wt.% OMP as label content values.
The concentrations of the active principles were also determined using the proposed multi wavelength detection HPLC method and expressed as percentages relative to their nominal concentrations.

Laboratory samples:
The production samples spanned a very narrow concentration range (± 5% around the nominal value for each active principle) that was inadequate for proper calibration. To expand this range to at least ± 20% around the nominal concentration of each analyte, some production samples were doped with excipients and active pharmaceutical ingredients (APIs): adding small amounts of excipients reduced the API concentrations, while adding different quantities of APIs increased each concentration. This under-dosing/over-dosing procedure was carried out so as to minimize correlation between the concentrations of the active principles and to avoid spurious correlations among constituents. A D-optimal experimental design with three factors and five levels (80, 90, 100, 110 and 120%) was used to prepare 60 samples; it was generated as a mixture optimal design with Design-Expert version 7 (Stat-Ease, Inc., E. Hennepin Avenue, Minneapolis, USA). It was previously suggested that the optimum number of calibration samples would be around 40, so a calibration set with that number of samples was selected for these NIR calibrations [39]. Samples were prepared in amounts of 1.1 g each. The new mixtures were homogenized in the mill for 15 s prior to analysis. The training (calibration) set obtained is given in Table 1. The calibration design covered concentration ranges of 0.2-0.3 g for CLA, 0.4-0.6 g for TNZ and 0.016-0.024 g for OMP.
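The label weight percentages and the calibration ranges quoted above follow from simple arithmetic, sketched below. The sketch assumes that the 1.1 g sample mass also represents the nominal tablet mass, which is what the stated weight percentages imply; the variable names are illustrative only.

```python
# Label content per tablet in mg, as given in the text
label_mg = {"CLA": 250.0, "TNZ": 500.0, "OMP": 20.0}
# Assumption: nominal mass of one ground tablet / laboratory sample is 1.1 g
sample_mass_mg = 1100.0
# Design levels as a percentage of label content
levels = [80, 90, 100, 110, 120]

# Weight percent of each API at 100% of label content
label_wt_percent = {api: round(100 * mg / sample_mass_mg, 2)
                    for api, mg in label_mg.items()}

# Mass of each API (in g) at every design level
level_mass_g = {api: [round(mg * lv / 100 / 1000, 4) for lv in levels]
                for api, mg in label_mg.items()}

for api in label_mg:
    print(api, label_wt_percent[api], level_mass_g[api][0], level_mass_g[api][-1])
```

The first and last entries of each list reproduce the limits of the calibration design: 0.2-0.3 g for CLA, 0.4-0.6 g for TNZ and 0.016-0.024 g for OMP.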

Data processing and modeling
The spectral pretreatments tested with a view to constructing the calibration models included the first and second derivatives, multiplicative scatter correction (MSC) and the standard normal variate (SNV). First and second spectral derivatives were obtained by applying the Savitzky-Golay algorithm with an 11-point moving window and a second-order polynomial.
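The Savitzky-Golay derivative with an 11-point window and a second-order polynomial can be sketched with NumPy alone, as below. This is an illustrative implementation rather than the routine used in the study: the helper names, the choice to drop the five edge points on each side, and the quadratic test signal are all assumptions.

```python
import math
import numpy as np

def savgol_coefficients(window=11, polyorder=2, deriv=1, delta=1.0):
    """Filter weights giving the deriv-th derivative of a local least-squares polynomial."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # columns x^0, x^1, ...
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of x**deriv
    return math.factorial(deriv) * np.linalg.pinv(design)[deriv] / delta ** deriv

def savgol_derivative(spectrum, window=11, polyorder=2, deriv=1, delta=1.0):
    """Apply the filter; the output is shorter by window - 1 points (edges dropped)."""
    coeffs = savgol_coefficients(window, polyorder, deriv, delta)
    # np.convolve flips its kernel, so reverse the weights to obtain a correlation
    return np.convolve(np.asarray(spectrum, dtype=float), coeffs[::-1], mode="valid")

# Test signal (assumption): a quadratic, for which a second-order fit is exact
x = np.linspace(0.0, 10.0, 201)
y = 2.0 + 3.0 * x + 0.5 * x ** 2
first = savgol_derivative(y, deriv=1, delta=x[1] - x[0])
second = savgol_derivative(y, deriv=2, delta=x[1] - x[0])
```

For this quadratic the first derivative returned at the interior points equals 3 + x and the second derivative equals 1, which is a convenient check that the window and polynomial order are wired correctly.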
The models used to determine the pharmaceutical components were constructed with different partial least-squares regression approaches: PLS2 (all outputs modeled together), PLS1 (a single output per model) and a genetic algorithm (GA) combined with PLS1 (GA-PLS1), using the leave-one-out method for cross-validation.

Variable selection
A NIR spectrum is a complex result of overtone and combination absorbances. Nevertheless, whatever understanding of the spectral signals is possible helps the rational selection of variables, i.e. wavelengths, so as to choose regions that are related to the API and to avoid the water-active regions. This helps in the development of a specific and robust model. Only spectral ranges related to the API should be considered for variable selection. Although the identification of these API-related spectral ranges can be very challenging, the aromatic nature of most APIs can help to solve this problem.

Knowledge based variable selection (Manual selection):
Water shows high absorption in the NIR region, which is due to the anharmonicity of O-H vibrations. Small variations in the water content of the samples can have strong effects on the performance of the method. Water exhibits multiple absorption maxima in the NIR region (10300, 8400, 6900 and 5150 cm⁻¹). To gather information on the sample, the pure APIs (CLA, TNZ and OMP) and the excipients used to formulate the tablet (talc, Eudragit L100, Avicel PH-101, magnesium stearate and croscarmellose) were analyzed by NIR. These spectra are shown in Figure 1. In the API spectra, the absorption band at 5500-6500 cm⁻¹ can be attributed to overlapping first overtones of C-C and C-H, with the two peaks at 8000-9500 cm⁻¹ being the respective second overtones. Two regions are related to the APIs: the first, at 5700-6340 cm⁻¹, shows good absorbance for the three APIs, and the second, at 8550-9240 cm⁻¹, shows good absorbance for the three APIs, especially OMP (the API with the lowest content). In addition, the excipients show low absorption in these regions and are not expected to interfere with the API peaks. These regions are also not expected to show water absorption maxima. This makes these wavenumber ranges potential API-specific regions. For the correct assignment of absorption bands to functional groups in NIR, see [40].
Figure 2 presents the spectra of the powdered drug formulation (production sample) and the powdered matching excipient mixture. The effect of the API-specific bands identified in Figure 1 is clearly seen in the specified regions. To further investigate the effect of the APIs on the drug formulation spectrum, the spectrum of the matching excipient mixture was subtracted from the drug formulation spectrum. The difference spectrum showed the presence of quantitative information on the API content in these ranges. Based on these observations, it can be concluded that the ranges 5700-6340 cm⁻¹ and 8550-9240 cm⁻¹ are API-specific regions. Because they are also water-independent regions, these spectral ranges were selected to build the identification method.
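Restricting the spectra to the two API-specific regions amounts to a boolean mask on the wavenumber axis. The sketch below assumes a hypothetical axis running from 12500 to 4000 cm⁻¹ in 2 cm⁻¹ steps and random placeholder spectra; the actual data spacing of the instrument is not stated in the text.

```python
import numpy as np

# API-specific, water-independent regions named in the text (cm^-1)
API_RANGES_CM1 = [(5700.0, 6340.0), (8550.0, 9240.0)]

def select_ranges(wavenumbers, spectra, ranges=API_RANGES_CM1):
    """Keep only the columns whose wavenumber falls inside one of the ranges."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in ranges:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectra)[:, mask]

# Hypothetical axis and placeholder spectra (assumptions for the example)
axis = np.arange(12500.0, 3999.0, -2.0)
spectra = np.random.default_rng(0).normal(size=(3, axis.size))
kept_axis, kept_spectra = select_ranges(axis, spectra)
```

Pre-processing and regression would then be applied to `kept_spectra` only, so that water bands and excipient-dominated zones never enter the model.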
Figure 3 presents the spectra of the powdered production sample and the matching laboratory-prepared mixture. There is nearly no difference between the two spectra, indicating minimal physical difference between the two samples.

Genetic algorithm:
The genetic algorithm (GA) is an intelligent selection technique based on the principles of natural evolution and selection. It was developed by Holland in 1975 [41]. To analyze the data using GA, a first population of random subsets of the wavelength variables was generated. The subset of wavelength variables whose values are to be optimized is termed a chromosome, and the individual wavelength variables are called genes. Each chromosome represents a possible or candidate solution of the problem. To apply GA, the chromosomes are generally coded as binary strings. A value of "1" means that the wavelength was selected as being important, while a value of "0" means that the wavelength could be omitted.
For each chromosome, a PLS model is generated using the wavelengths represented by the chromosome and the optimal number of components determined by cross-validation. A fitness value, such as the PRESS value, is thus calculated for each chromosome. The members of the population are ranked according to their fitness values. Fit solutions are allowed to "live" and breed, while unfit solutions "die" and are discarded. The chromosomes with the best fitness values are then selected to generate a new set of child chromosomes through recombination and mutation. The new population formed with the child chromosomes replaces the original, and the chromosomes with the best fitness values in the new population are again selected to reproduce through recombination and mutation. This procedure is an iterative evolutionary process in search of the chromosome with the highest fitness (minimum PRESS). The formation of each new population represents one iteration of the algorithm and is termed a generation. The configuration of the GA parameters is shown in Table 2. GA selected 86, 106 and 94 out of 350 wavelengths for CLA, TNZ and OMP, respectively (Figure 4).

Spectral pre-processing
Spectral pre-processing comprises mathematical corrections that reduce, eliminate or standardize the effect of variable physical sample properties or instrumental effects on the spectra. Correct selection of spectral pre-processing can significantly improve the specificity and robustness of a model. For PLS2, the selected wavelength ranges were treated with SNV followed by a first derivative with Savitzky-Golay smoothing (11-point moving window, second-order polynomial) prior to chemometric treatment.
For PLS1 and GA-PLS1, the selected wavelength ranges were treated with SNV (for CLA) or MSC (for TNZ and OMP) followed by a first derivative with Savitzky-Golay smoothing (11-point moving window, second-order polynomial) prior to chemometric treatment.
Savitzky-Golay smoothing was combined with differentiation of the spectra to reduce the spectral noise that differentiation amplifies.

Principal component analysis (PCA) and sample selection
The difficult or impossible univariate interpretation of NIR spectra, together with the multicollinearity found between wavelengths, makes it necessary to use multivariate data analysis techniques (chemometrics) to interpret spectral data.
Figure 5 presents the PCA scores plot of the first two components (PC2 versus PC1) for the 40 selected laboratory samples and the 10 production samples (different batches). The variance accounted for by the first two factors was 97.18 and 1.96%, respectively, for the pre-processed selected ranges of the spectra.
The samples were split into a calibration set (used to construct the model) and a validation set (used to estimate the predictive ability of the model). The samples were chosen from the PCA scores plot (Figure 5). The calibration set encompassed the maximum variability in the scores plot and the whole concentration range of the analyte. It was observed that the production samples from 10 different batches showed [...] (Table 3). The validation set (samples not used during development) was used once the calibration model was defined. Table 3 contains the features of the three selected models. Some samples were excluded from the model as outliers. This was explained by the presence of OMP in low concentration and the poor homogeneity of some laboratory-prepared samples.
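The PCA scores and explained-variance fractions used for a plot such as Figure 5 can be obtained from a singular value decomposition of the mean-centred spectra. The following is a minimal sketch on synthetic data; the function name `pca_scores` and the simulated three-component spectra are assumptions, and the numbers it produces are unrelated to the values reported above.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores and explained-variance fractions from an SVD of the mean-centred data."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in for 50 pre-processed spectra (assumption)
rng = np.random.default_rng(1)
spectra = (rng.normal(size=(50, 3)) @ rng.normal(size=(3, 120))
           + 0.01 * rng.normal(size=(50, 120)))
scores, explained = pca_scores(spectra)
```

Plotting `scores[:, 1]` against `scores[:, 0]` gives the PC2 versus PC1 map from which calibration samples spanning the full variability can be picked.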

Data modeling
The PLS method involves the decomposition of the experimental data (here, the NIR spectra) into systematic variations (latent variables) that explain the observed variance in the data. The purpose of the PLS method is to build a calibration model between the concentrations of the analytes under study (CLA, TNZ and OMP) and the latent variables (LVs) of the data matrix. PLS performs the decomposition using both the spectral data matrix and the analyte concentrations [42]. Including extra latent variables in the model increases the risk of overfitting. Optimizing the number of latent variables is therefore a critical issue in the PLS method.
Cross-validation (CV) [43] is applied to determine the optimum number of PLS latent variables. CV involves repeatedly dividing the data into two sets, a calibration set used to build a model and a validation set used to determine how well the model performs, so that each sample (or portion of the data) is left out of the calibration set once only.
Leave-one-out (LOO) CV was used in this study to optimize the number of PLS components: the model is built using n − 1 samples and used to predict the one sample left out (the validation sample). The concentration of the sample left out during the calibration process was determined. This process was repeated n times until each calibration sample had been left out once. The predicted concentrations were compared with the known concentrations of the compounds in each calibration sample. The root mean square error of CV (RMSECV) is calculated as:

$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\mathrm{pred},i} - Y_{\mathrm{true},i}\right)^{2}}$$

where n is the number of training samples, and Y_pred and Y_true are the predicted and true concentrations in µg ml⁻¹, respectively.
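The leave-one-out procedure and the RMSECV can be sketched as follows. The PLS1 regression here is a compact NIPALS implementation written for the example, not the PLS_Toolbox routine used in this work, and the simulated three-component mixture spectra are an assumption.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1: returns the regression vector and the centring constants."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        p = E.T @ t / (t @ t)           # X loadings
        c = f @ t / (t @ t)             # y loading
        E, f = E - np.outer(t, p), f - c * t   # deflation
        W.append(w); P.append(p); q.append(c)
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return coef, x_mean, y_mean

def rmsecv_loo(X, y, n_lv):
    """Leave each sample out once, predict it from the other n - 1, return RMSECV."""
    n = len(y)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_lv)
        residuals[i] = (X[i] - x_mean) @ coef + y_mean - y[i]
    return float(np.sqrt(np.mean(residuals ** 2)))

# Simulated mixtures (assumption): 40 samples, 3 components, 60 wavelengths
rng = np.random.default_rng(0)
conc = rng.uniform(0.8, 1.2, size=(40, 3))
pure = rng.uniform(0.0, 1.0, size=(3, 60))
X = conc @ pure + 0.001 * rng.normal(size=(40, 60))
y = conc[:, 0]
rmsecv = {n_lv: rmsecv_loo(X, y, n_lv) for n_lv in range(1, 7)}
```

Inspecting `rmsecv` as a function of the number of latent variables shows the typical behavior: a steep drop until the chemically meaningful components are captured, followed by a plateau.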
The RMSECV was used as a diagnostic test for examining the errors in the predicted concentrations. It indicates both the precision and the accuracy of the predictions. It was recalculated upon the addition of each new latent variable to the PLS1, PLS2 and GA-PLS models. The usual procedure involves choosing the number of factors that results in the minimum RMSECV. However, this criterion is subject to some constraints because, occasionally, the RMSECV does not reach a sharp minimum but decreases gradually above a given number of factors. Moreover, because it is calculated from a finite number of samples, it is prone to error. For these reasons, the method developed by Haaland and Thomas [44] was used for selecting the optimum number of latent variables; it involves selecting the model with the smallest number of latent variables for which the difference between the corresponding RMSECV and the minimum RMSECV is insignificant.
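The Haaland and Thomas criterion is usually written as an F-test on the ratio of prediction error sums of squares. The formulation below is a sketch in the notation of this section; the symbols h, h* and h_opt are introduced here for illustration, and the significance level α is the analyst's choice (0.25 is the value commonly associated with this criterion, and the text does not state which value was used).

```latex
\begin{align}
\mathrm{PRESS}(h) &= \sum_{i=1}^{n}\bigl(Y_{\mathrm{pred},i}(h) - Y_{\mathrm{true},i}\bigr)^{2}
                   = n\,\mathrm{RMSECV}(h)^{2} \\
F(h) &= \frac{\mathrm{PRESS}(h)}{\mathrm{PRESS}(h^{*})}, \qquad h = 1, \dots, h^{*} \\
h_{\mathrm{opt}} &= \min\bigl\{\, h : F(h) < F_{\alpha;\,n,\,n} \,\bigr\}
\end{align}
```

Here h* is the number of latent variables giving the minimum PRESS, and F_{α; n, n} is the critical value of the F distribution with n and n degrees of freedom.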
For PLS1, the optimum number of latent variables was found to be four, three and six for CLA, TNZ and OMP, respectively. The number of LVs increased as the analyte concentration decreased. The higher number of latent variables for CLA and OMP (4 and 6 LVs, respectively) could be explained by the fact that the major absorbance bands of both lie in the same region as those of TNZ (Figure 1). In addition, the low concentration of OMP meant that the model needed more factors to explain the OMP concentration adequately (Table 3).
For GA-PLS1, the optimum number of latent variables was found to be three, three and four for CLA, TNZ and OMP, respectively. Variable selection using GA thus reduced the number of latent variables, which is consistent with the fact that GA chooses the wavelengths that are best related to each analyte. For PLS2, the [...] (Table 3).

Table 2: Genetic algorithm specification and parameters
| Parameter | Value |
| --- | --- |
| Population size | 64 |
| Maximum number of generations | 100 |
| Mutation rate | 0.005 |
| Window width for spectral band | 1 |
| Percent of population the same at convergence | 50 |
| Cross-over type | double |
| Percentage wavelength used at initiation | 30 |
| Maximum number of latent variables | 6 |
| Number of subsets to divide data into for cross validation | 4 |
| Number of iterations for cross validation at each generation | 2 |

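The GA wavelength selection with the settings of Table 2 can be sketched as below. This is a simplified illustration and not the PLS_Toolbox algorithm used in the study: the survival rule (the fitter half is kept), the interleaved fold assignment and the compact NIPALS PLS1 are assumptions, and the convergence test, window width and repeated cross-validation iterations of Table 2 are not implemented. The demonstration runs on random data with a smaller population so that it finishes quickly.

```python
import numpy as np

def pls1_coef(X, y, n_lv):
    """NIPALS PLS1 regression vector for already-centred X and y."""
    E, f = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        p, c = E.T @ t / (t @ t), f @ t / (t @ t)
        E, f = E - np.outer(t, p), f - c * t
        W.append(w); P.append(p); q.append(c)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def press(X, y, mask, max_lv=6, n_splits=4):
    """Cross-validated PRESS of one chromosome, minimised over the number of LVs."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf
    folds = np.arange(len(y)) % n_splits          # assumption: interleaved folds
    best = np.inf
    for n_lv in range(1, min(max_lv, cols.size) + 1):
        total = 0.0
        for k in range(n_splits):
            tr, te = folds != k, folds == k
            xm, ym = X[tr][:, cols].mean(axis=0), y[tr].mean()
            b = pls1_coef(X[tr][:, cols] - xm, y[tr] - ym, n_lv)
            total += np.sum(((X[te][:, cols] - xm) @ b + ym - y[te]) ** 2)
        best = min(best, total)
    return best

def ga_select(X, y, pop_size=64, generations=100, mutation_rate=0.005,
              init_fraction=0.30, seed=0):
    """Return the best binary chromosome (True = wavelength kept) and its PRESS."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < init_fraction
    for _ in range(generations):
        fitness = np.array([press(X, y, chrom) for chrom in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]   # fitter half survives
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            lo, hi = np.sort(rng.choice(n_var + 1, 2, replace=False))
            child = a.copy()
            child[lo:hi] = b[lo:hi]                          # two-point cross-over
            child ^= rng.random(n_var) < mutation_rate       # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    fitness = np.array([press(X, y, chrom) for chrom in pop])
    return pop[np.argmin(fitness)], float(fitness.min())

# Random demonstration data (assumption): only columns 5 and 12 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = X[:, 5] + 0.5 * X[:, 12] + 0.05 * rng.normal(size=40)
mask, best_press = ga_select(X, y, pop_size=20, generations=10)
```

Because the fitter half of each generation is carried over unchanged, the best PRESS can only stay the same or decrease from one generation to the next.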
Once the calibration model was constructed, the validation set was tested and the RMSEP and bias values were calculated and used as an estimate of efficiency for the tested model (Table 3).
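The RMSEP and bias of an external validation set follow directly from the prediction residuals. The sketch below uses hypothetical predicted and reference values, expressed as percentages of label content, purely to show the calculation.

```python
import numpy as np

def rmsep_and_bias(y_pred, y_ref):
    """Root mean square error of prediction and mean residual (bias)."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2))), float(np.mean(residuals))

# Hypothetical validation results (not taken from Table 3)
y_ref = [98.0, 100.0, 102.0, 105.0]
y_pred = [99.0, 100.0, 101.0, 107.0]
rmsep, bias = rmsep_and_bias(y_pred, y_ref)
```

A bias close to zero indicates that the model neither over- nor under-predicts systematically, while the RMSEP summarizes the overall size of the prediction errors.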

Validation of the method
Linearity: A straight-line calibration should be demonstrated over the working range of the assay for NIR predicted values. This may be accomplished during the calibration and validation stage of the NIR method. Unlike the usual calibrations (e.g. in HPLC), where a plot such as peak area versus concentration over a linear range of 80% to 120% of the labeled concentration is used, NIR calibrations are best represented two-dimensionally as a plot of NIR predicted assay values versus reference values [45]. The correlation coefficient (R²), y-intercept, slope of the regression line and standard error of the slope are shown in Table 3.
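The linearity statistics listed above can be computed from a least-squares line of NIR predicted values against reference values. The sketch below uses hypothetical data; the function name `linearity_stats` and the numbers are assumptions and do not reproduce Table 3.

```python
import numpy as np

def linearity_stats(reference, predicted):
    """Slope, intercept, standard error of the slope and R^2 of predicted vs reference."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    sxx = np.sum((x - x.mean()) ** 2)
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    se_slope = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2) / sxx)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return {"slope": slope, "intercept": intercept, "se_slope": se_slope, "r2": r2}

# Hypothetical values in % of label content
reference = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
predicted = np.array([80.6, 89.5, 100.3, 109.8, 120.2])
stats = linearity_stats(reference, predicted)
```

For an ideal method the slope approaches 1 and the intercept approaches 0, so the confidence interval built from `se_slope` is a useful check alongside R².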

Range: The calibration range was established by considering the practical range needed, according to the concentration of each compound present in the pharmaceutical product, to give accurate, precise and linear results. According to the USP [4], the range of analyte reference values in the calibration set defines the range of the NIR method. The calibration range was determined simultaneously with linearity, and the samples used were confirmed to span at least 80-120% of the nominal concentration of each analyte; the good results obtained over this range indicate the accuracy of the method.
Selectivity: Method selectivity was evaluated by analysis of laboratory-prepared mixtures of the compounds at various concentrations within the calibration range. This was carried out as external validation for the PLS1, GA-PLS1 and PLS2 models. The laboratory-prepared mixtures were analyzed according to the proposed models. Satisfactory results were obtained, indicating the high selectivity of the proposed method for the simultaneous determination of CLA, TNZ and OMP.
Accuracy: The accuracy study was performed on spiked drug powder samples in the range of 80-120% of the nominal concentration of each analyte, using the standard addition technique. The resulting mixtures were assayed, and the mean percentage recoveries and their standard deviations were obtained for CLA, TNZ and OMP and compared with the results of the proposed method. Satisfactory results were obtained, indicating the high accuracy of the proposed method for the simultaneous determination of CLA, TNZ and OMP.
Precision: Precision was evaluated as repeatability and intermediate precision at 3 concentration levels for each compound; measurements on different days involved separate weighings and dilutions. Intermediate precision was evaluated by analysis of the prepared samples by 2 different analysts using different instruments. The samples were prepared by each analyst.
The short-term precision was determined by measuring the CLA, TNZ and OMP contents of a single batch 6 times within one day by NIR method. The standard deviation and coefficient of variation for the NIR method were calculated to judge the quality and precision of the method. The precision of the method, expressed as C.V. (%), was determined by analysis of CLA, TNZ and OMP on six samples from the same batch of each product.
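The coefficient of variation for six replicate determinations is the sample standard deviation divided by the mean. The replicate values below are hypothetical and serve only to show the calculation.

```python
import numpy as np

def coefficient_of_variation(values):
    """C.V. (%) using the sample standard deviation (n - 1 in the denominator)."""
    values = np.asarray(values, dtype=float)
    return float(100.0 * values.std(ddof=1) / values.mean())

# Hypothetical six replicate results for one API, in % of label content
replicates = [99.0, 100.0, 101.0, 100.0, 99.0, 101.0]
cv_percent = coefficient_of_variation(replicates)
```

The same function applies unchanged to the intermediate-precision data, where the replicates come from different days, analysts or instruments.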

Conclusion
A NIR method was developed that allows a pharmaceutical preparation to be identified and its three active principles to be determined accurately, precisely and expeditiously with minimal sample treatment.
Because the concentrations of the three active principles span a wide range, the calibration model was constructed using an experimental sample preparation design that minimizes correlation between the components of the samples. The proposed method showed major advantages over conventional methods, including simplified analytical procedures, no need for reagents and no production of polluting waste. This results in substantial savings in money and time. The only sample pre-treatment required is grinding to ensure proper homogeneity.
The proposed method was validated in accordance with the ICH guidelines and found to be an effective alternative to the existing HPLC method.