Selvaraj Raja*, Varadavenkatesan Thivaharan, Vinayagam Ramesh and Vytla Ramachandra Murty  
Department of Biotechnology, Manipal Institute of Technology, Manipal, Karnataka, India  
Corresponding Author: Selvaraj Raja, Department of Biotechnology, Manipal Institute of Technology, Manipal, Karnataka 576104, India; Tel: +9108202924322; Fax: +9108202571071 

Received March 21, 2014; Accepted April 26, 2014; Published April 30, 2014  
Citation: Raja S, Thivaharan V, Ramesh V, Murty VR (2014) Prediction of Viscosities of Aqueous Two Phase Systems Containing Protein by Artificial Neural Network. J Chem Eng Process Technol 5:192. doi:10.4172/2157-7048.1000192  
Copyright: © 2014 Raja S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
The viscosities of aqueous two-phase systems containing bovine serum albumin (BSA) were predicted with an artificial neural network (ANN) as a function of polyethylene glycol (PEG) concentration, BSA concentration and temperature. A three-layer feed-forward neural network trained with the Levenberg-Marquardt (LM) algorithm, consisting of three input neurons, 10 hidden neurons and one output neuron (3:10:1), was developed. Its performance parameters were calculated and compared with those of the conventional Grunberg-Nissan empirical model. The lower prediction errors (overall AARD of 1.71% versus 6.39%) suggest that the proposed ANN model predicts viscosity better than the conventional empirical model.
Keywords  
Aqueous two phase system; Artificial neural network; Levenberg-Marquardt (LM) algorithm; Grunberg-Nissan empirical model  
Introduction  
The aqueous two-phase system (ATPS) is a separation method widely used to purify various biomolecules in a single step [1]. It is a liquid-liquid extraction method in which the phases are prepared by mixing aqueous solutions of a water-soluble polymer and a salt, or of two water-soluble polymers, with water [2]. An ATPS can also be formed by mixing a water-soluble polymer and a protein with water; one such system was demonstrated by Johansson [3] for the partitioning of lactate dehydrogenase and some mitochondrial enzymes.  
Viscosity is a transport property whose magnitude strongly affects mass transfer between the phases. Several interrelated properties, such as molecular size, shape, structure, degree of polymerization and polymer-solvent interactions, can be inferred from viscosity data [4]. Studying and predicting viscosity is therefore essential for the design and optimization of large-scale ATPS. These data are needed to design large-scale extraction processes, where large volumes of phase components have to be handled and separated, and to develop mathematical models that predict the partitioning behavior of biomolecules in aqueous two-phase systems [5].  
The viscosity of an ATPS depends largely on the concentrations of the phase components and on temperature. Determining viscosity experimentally over the entire range of phase compositions involves substantial cost, so equations that can predict this physical property are needed. A few polynomial and empirical equations are available in the literature that express viscosity as a function of solute concentration at different temperatures [6]. However, empirical equations are valid only over a narrow range of the variables, and the relative errors and deviations between their calculated values and experimental values can be very high.  
One such empirical equation for viscosity is the Grunberg-Nissan model [7], which was used successfully by Gunduz [8] for an ATPS consisting of polyethylene glycol (PEG), bovine serum albumin (BSA) and water. The model (Eqn. 1) can be written as follows:  
$$\ln(\eta_{sys}) = a_{1}\ln(\eta_{PEG}) + a_{2}\ln(\eta_{BSA}) + a_{1}a_{2}A \tag{1}$$
where A is an adjustable parameter that characterizes the intermolecular interactions between PEG and BSA, a_{1} and a_{2} are the weight fractions of PEG and BSA, respectively, and η_{PEG} and η_{BSA} are the viscosities of the aqueous PEG and BSA solutions, respectively.  
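Because Eqn. 1 is linear in the interaction parameter A, A can be estimated in closed form by least squares once the component viscosities are known. The sketch below assumes the interaction term uses the same weight fractions a_{1} and a_{2} as the logarithmic terms (the usual Grunberg-Nissan form); the function names and the synthetic numbers are hypothetical and are not data from [8].

```python
import math

def gn_viscosity(a1, a2, eta_peg, eta_bsa, A):
    """Grunberg-Nissan system viscosity, assuming the interaction term is a1*a2*A."""
    ln_eta = a1 * math.log(eta_peg) + a2 * math.log(eta_bsa) + a1 * a2 * A
    return math.exp(ln_eta)

def fit_interaction_parameter(samples):
    """Closed-form least-squares estimate of A.

    samples: iterable of (a1, a2, eta_peg, eta_bsa, eta_sys) tuples.
    The model is linear in A: r = ln(eta_sys) - a1*ln(eta_peg) - a2*ln(eta_bsa) = (a1*a2)*A,
    so minimizing sum((r - x*A)**2) with x = a1*a2 gives A = sum(x*r) / sum(x*x).
    """
    num = den = 0.0
    for a1, a2, eta_peg, eta_bsa, eta_sys in samples:
        x = a1 * a2
        r = math.log(eta_sys) - a1 * math.log(eta_peg) - a2 * math.log(eta_bsa)
        num += x * r
        den += x * x
    return num / den

# Hypothetical synthetic check: generate viscosities with A = 3.0 and recover it.
points = [(0.06, 0.02, 3.1, 1.2), (0.10, 0.05, 6.4, 1.5), (0.08, 0.04, 4.7, 1.4)]
synthetic = [(a1, a2, ep, eb, gn_viscosity(a1, a2, ep, eb, 3.0)) for a1, a2, ep, eb in points]
A_hat = fit_interaction_parameter(synthetic)
print(round(A_hat, 6))  # 3.0
```

With real data, the fitted A is what makes the empirical model specific to one PEG/protein pair, which is the parameter the ANN approach does without.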
In recent years, the artificial neural network (ANN), an artificial intelligence technique, has been widely used to model and predict linear and nonlinear systems. An ANN is a simplified model of the biological nervous system that learns the relationship between input and output variables from a given data set; each artificial neuron simulates the basic functions of a biological neuron [9].  
The major advantages of ANNs are their high learning capacity and their ability to predict from the limited information fed to them. Moreover, unlike conventional empirical models, an ANN can accommodate multiple input and output variables, and researchers have shown that a well-trained ANN can predict various processes better than such models [10-13].  
Recently, ANNs have been used to predict the viscosity of PEG solutions [14], the density of ionic liquids [15] and vapor-liquid equilibrium data of ionic liquids [16].  
To the best of our knowledge, no reports are available on predicting the viscosities of aqueous two-phase systems containing protein with artificial neural networks. Therefore, the objective of the present study is to model the experimental viscosity of the aforementioned ATPS with an ANN and to compare it with the Grunberg-Nissan empirical model.  
Methodology  
Collection of viscosity data from literature  
For the present study, the dynamic viscosity data set was taken from the literature [8]; it gives the viscosity as a function of PEG concentration (4-11 % w/w), BSA concentration (1-8 % w/w) and process temperature (15-35 °C). A total of 35 experimental data points were used.  
Neural network topology  
The ANN modeling code was developed in MATLAB R2013a. ANN models are normally multilayered, and the simplest form is the three-layer model, which contains an input layer (I), one hidden layer (H) and an output layer (O). Each layer consists of nodes (neurons); the nodes of adjacent layers are connected through weights (w), and each node has a bias (b). The commonly used feed-forward multilayer neural network (FFMNN) was chosen for the present study. Choosing the number of neurons in the hidden layer and the transfer functions of the hidden and output layers is crucial in developing an ANN model. For the present system, 10 neurons were used in the hidden layer, which was found to be optimum.  
In the present study, a hyperbolic tangent transfer function was used for the neurons of the hidden layer and a linear transfer function for the neuron of the output layer. The output of the j^{th} neuron is calculated by

$$O_{j} = F(I_{j}) \tag{2}$$

where I_{j}, O_{j} and F are the input of the j^{th} neuron, the output of the j^{th} neuron and the transfer function, respectively. The input of each neuron (I_{j}) is calculated from the outputs of the neurons in the previous layer (O_{i}), the weights connecting the i^{th} neuron to the j^{th} neuron (w_{ij}) and the bias of the j^{th} neuron (b_{j}):

$$I_{j} = \sum_{i} w_{ij} O_{i} + b_{j} \tag{3}$$
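Applied layer by layer, Eqns. 2 and 3 give the whole forward pass of the 3:10:1 network. The sketch below is a minimal illustration in plain Python; the function names and the toy weights are hypothetical, and it assumes the three inputs have already been scaled the way the trained network expects.

```python
import math

def neuron_input(prev_outputs, weights_to_j, bias_j):
    """Eqn. 3: I_j = sum_i w_ij * O_i + b_j."""
    return sum(w * o for w, o in zip(weights_to_j, prev_outputs)) + bias_j

def forward(x, W1, b1, W2, b2):
    """Forward pass of a 3:H:1 feed-forward network.

    x:  the three inputs (PEG, BSA, temperature), already scaled.
    W1: H rows of 3 weights, W1[j][i] connecting input i to hidden neuron j; b1: H biases.
    W2: H weights from the hidden neurons to the output neuron; b2: output bias.
    """
    # Hidden layer: Eqn. 2 with F = tanh applied to Eqn. 3.
    hidden = [math.tanh(neuron_input(x, W1[j], b1[j])) for j in range(len(W1))]
    # Output layer: Eqn. 2 with a linear (identity) F.
    return neuron_input(hidden, W2, b2)

# Toy check with one hidden neuron and hypothetical weights:
# output = 2 * tanh(0.5 * 1.0) + 0.1
y = forward([1.0, 0.0, 0.0], W1=[[0.5, 0.0, 0.0]], b1=[0.0], W2=[2.0], b2=0.1)
print(y)
```

For the network in this study, W1 would hold 10 rows of 3 weights and W2 10 weights.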
The weights and biases were optimized with the Levenberg-Marquardt (LM) backpropagation algorithm, one of the most popular training algorithms [17].  
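The LM method solves the damped normal equations (J^{T}J + μI)Δp = -J^{T}e at each epoch, where e is the error vector and J its Jacobian with respect to the parameters, and it adapts μ: a successful step decreases μ (toward Gauss-Newton) and a failed step increases it (toward gradient descent). The sketch below is a generic illustration on a toy two-parameter curve fit, not the MATLAB implementation: it uses a finite-difference Jacobian, whereas trainlm computes the Jacobian of the network errors by backpropagation over all weights and biases. The damping constants mirror the defaults documented for trainlm (mu = 0.001, mu_dec = 0.1, mu_inc = 10, mu_max = 1e10); all function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def levenberg_marquardt(residuals, p, mu=1e-3, mu_dec=0.1, mu_inc=10.0,
                        mu_max=1e10, epochs=100, h=1e-7):
    """Minimize the sum of squared residuals(p) with Levenberg-Marquardt steps."""
    p = list(p)
    e = residuals(p)
    sse = sum(r * r for r in e)
    n_p = len(p)
    for _ in range(epochs):
        # Forward-difference Jacobian: J[k][m] = d e_k / d p_m.
        J = [[0.0] * n_p for _ in e]
        for m in range(n_p):
            q = list(p)
            q[m] += h
            for k, (rq, r) in enumerate(zip(residuals(q), e)):
                J[k][m] = (rq - r) / h
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(n_p)] for a in range(n_p)]
        Jte = [sum(row[a] * r for row, r in zip(J, e)) for a in range(n_p)]
        while mu <= mu_max:
            # Damped normal equations: (J^T J + mu I) dp = -J^T e
            lhs = [[JtJ[a][b] + (mu if a == b else 0.0) for b in range(n_p)] for a in range(n_p)]
            dp = solve(lhs, [-g for g in Jte])
            trial = [pi + di for pi, di in zip(p, dp)]
            e_trial = residuals(trial)
            sse_trial = sum(r * r for r in e_trial)
            if sse_trial < sse:
                # Error decreased: accept the step and move toward Gauss-Newton.
                p, e, sse = trial, e_trial, sse_trial
                mu *= mu_dec
                break
            # Error did not decrease: reject the step and move toward gradient descent.
            mu *= mu_inc
        else:
            break  # mu exceeded mu_max without an improving step
    return p, sse

# Hypothetical example: recover p0 = 2.0, p1 = -0.5 in y = p0 * exp(p1 * x) from exact data.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
params, final_sse = levenberg_marquardt(
    lambda p: [p[0] * math.exp(p[1] * x) - y for x, y in zip(xs, ys)], [1.0, 0.0])
print(params)
```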
The data set was divided into three subsets: a training set (70%), a validation set (15%) and a testing set (15%). The training set is used to train the network, whose weights and biases are adjusted according to its error. The validation set is used to measure the generalization of the network and to stop training when generalization stops improving. The testing set is used to evaluate the trained model against new, unseen data and therefore gives an independent measure of network performance during and after training.  
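A 70/15/15 division of the 35 data points can be sketched as a random permutation of sample indices. This is a plain stand-in written for illustration (MATLAB's default dividerand may round the subset sizes differently), and the function name and seed are hypothetical; with Python's rounding it yields 24 training, 5 validation and 6 test points.

```python
import random

def divide_random(n, train=0.70, val=0.15, seed=0):
    """Randomly partition sample indices 0..n-1 into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed so the split is reproducible
    n_train = round(train * n)
    n_val = round(val * n)
    # Whatever remains after the training and validation sets becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

tr, va, te = divide_random(35)
print(len(tr), len(va), len(te))  # 24 5 6
```

With so few points, the particular random split can noticeably change the reported subset metrics, so fixing the seed (or repeating the split) makes comparisons easier to reproduce.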
For the developed ANN model, the prediction accuracy and degree of fit were evaluated with the following measures: mean square error (MSE), root mean square error (RMSE), standard error of prediction (SEP), average absolute relative deviation (AARD), bias factor (B_{f}) and accuracy factor (A_{f}) [18].  
(4)  
(5)  
(6)  
(7)  
(8)  
(9)  
where μ_{i,exp} and μ_{i,pred} are the experimental and predicted viscosities of the i^{th} data point and n is the number of experiments.  
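The six measures can be computed together from paired experimental and predicted viscosities. The sketch below uses the forms these metrics commonly take in the literature; these forms are an assumption, since the exact normalizations used in Eqns. 4 to 9 (for example, for SEP) may differ. Here SEP is taken as RMSE relative to the mean experimental viscosity (in %), and B_{f} and A_{f} use base-10 logarithms of the predicted-to-experimental ratio.

```python
import math

def accuracy_metrics(exp, pred):
    """Return MSE, RMSE, SEP (%), AARD (%), Bf and Af (assumed common definitions)."""
    n = len(exp)
    mse = sum((e - p) ** 2 for e, p in zip(exp, pred)) / n
    rmse = math.sqrt(mse)
    sep = 100.0 * rmse / (sum(exp) / n)  # RMSE relative to the mean experimental value
    aard = 100.0 / n * sum(abs(e - p) / e for e, p in zip(exp, pred))
    log_ratios = [math.log10(p / e) for e, p in zip(exp, pred)]
    bf = 10 ** (sum(log_ratios) / n)                   # 1.0 means no systematic bias
    af = 10 ** (sum(abs(r) for r in log_ratios) / n)   # 1.0 means perfect agreement
    return {"MSE": mse, "RMSE": rmse, "SEP": sep, "AARD": aard, "Bf": bf, "Af": af}

# Hypothetical values: one prediction 10 % high, one 10 % low.
m = accuracy_metrics([2.0, 4.0], [2.2, 3.6])
print(m)
```

Because over- and under-predictions cancel in B_{f} but not in A_{f}, A_{f} is always at least B_{f}-like in distance from unity, which is why both are usually reported together.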
Results and Discussions  
Neural network modeling of viscosity  
The code for the proposed ANN model was written in MATLAB R2013a with the aim of minimizing the mean squared error (MSE) and maximizing the correlation coefficient (R^{2}). A low MSE and a high R^{2} are preferred for the best model.  
Different numbers of hidden neurons (5, 10 and 15) were tested, and the resulting network performance is given in Table 1. The lowest MSE (0.0511) and the highest R^{2} (0.9979) were obtained with 10 hidden neurons, which was therefore taken as optimum.  
As discussed earlier, the experimental points from the literature [8] were used to train the 3:10:1 ANN topology. The selected network consists of three input neurons, namely PEG (% w/w), BSA (% w/w) and temperature (°C), 10 hidden neurons and one output neuron (viscosity). The connections between the layers of the developed ANN are shown in Figure 1.  
The training function “trainlm” was used in this work; it updates the weight and bias values according to Levenberg-Marquardt (LM) optimization and is considered one of the fastest backpropagation algorithms available. The network was trained on the training set by adjusting the strengths of the connections between neurons, with the objective of bringing the network output closer to the desired target and minimizing the performance function. The performance function was the MSE, which measures network performance as the mean of the squared errors. The training process was halted once the goal was met (i.e., the smallest value of MSE was reached); in the present study, training stopped after 16 iterations (epochs). Figure 2 shows the MSE during training, validation and testing. The MSE reached a minimum value of 0.0129 for the training set; a lower MSE indicates a more accurate model. The best validation MSE (0.052937) was obtained at epoch 10, which is marked by dotted lines in the figure.  
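Validation-based early stopping, as described for the validation set, ends training after a fixed number of consecutive epochs without improvement in validation error. The reported figures (best validation MSE at epoch 10, training stopped at epoch 16) would be consistent with a patience of six epochs, which is MATLAB's default max_fail value; this is an inference from the numbers, not something stated in the text. The sketch below replays a hypothetical validation curve (the values are invented for illustration) and reports where training would stop.

```python
def early_stop(val_mse_per_epoch, max_fail=6):
    """Return (best_epoch, stop_epoch) for validation-based early stopping.

    Epochs are numbered from 1. Training stops after `max_fail` consecutive
    epochs in which the validation MSE fails to improve on its best value;
    the weights from best_epoch are the ones typically kept.
    """
    best, best_epoch, fails = float("inf"), 0, 0
    for epoch, v in enumerate(val_mse_per_epoch, start=1):
        if v < best:
            best, best_epoch, fails = v, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_mse_per_epoch)

# Hypothetical validation MSE curve with its minimum at epoch 10.
curve = [0.90, 0.50, 0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.06, 0.053,
         0.054, 0.055, 0.056, 0.058, 0.060, 0.062, 0.065]
print(early_stop(curve))  # (10, 16)
```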
The optimal values of the weights and biases of the 3:10:1 ANN topology trained with the LM algorithm are shown in Table 2.  
In Table 2, w_{1} and w_{2} are the input-to-hidden and hidden-to-output weight matrices, and b_{1} and b_{2} are the bias vectors of the hidden and output layers, respectively. The viscosity outputs were predicted with Eqns. 2 and 3 (data not shown), and the accuracy parameters were calculated.  
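Reusing the weights and biases of Table 2 outside MATLAB also requires the data scaling applied during training. Networks created with MATLAB's feedforwardnet apply mapminmax to inputs and targets by default, mapping each variable to [-1, 1]; whether the tabulated weights refer to such scaled variables is an assumption here, since the text does not say. The sketch below wraps the forward pass in that scaling. The input ranges (4-11 % w/w PEG, 1-8 % w/w BSA, 15-35 °C) are placeholders taken from the data description; the true minima and maxima would be those of the training data, and the viscosity range and all weights are hypothetical.

```python
import math

def mapminmax_apply(x, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Scale each value from [xmin, xmax] to [ymin, ymax] (the mapminmax formula)."""
    return [(ymax - ymin) * (v - lo) / (hi - lo) + ymin for v, lo, hi in zip(x, xmin, xmax)]

def mapminmax_reverse(y, tmin, tmax, ymin=-1.0, ymax=1.0):
    """Map a scaled output in [ymin, ymax] back to the original units."""
    return (y - ymin) * (tmax - tmin) / (ymax - ymin) + tmin

def predict_viscosity(x, w1, b1, w2, b2, xmin, xmax, tmin, tmax):
    """Scale inputs, apply the tanh hidden layer and linear output, then unscale."""
    xs = mapminmax_apply(x, xmin, xmax)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xs)) + b) for row, b in zip(w1, b1)]
    y_scaled = sum(w * h for w, h in zip(w2, hidden)) + b2
    return mapminmax_reverse(y_scaled, tmin, tmax)

# Placeholder input ranges: PEG (% w/w), BSA (% w/w), temperature (°C).
XMIN, XMAX = [4.0, 1.0, 15.0], [11.0, 8.0, 35.0]
print(mapminmax_apply([7.5, 4.5, 25.0], XMIN, XMAX))  # [0.0, 0.0, 0.0]
```

Omitting this scaling step is a common reason why weights copied from a trained MATLAB network give nonsensical predictions elsewhere.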
The results obtained during training, validation and testing are shown in Figure 3. In the plots of predicted versus experimental viscosity, the points lie close to the solid 45° line, showing close agreement between predicted and experimental data and supporting the validity of the proposed model. For all data sets, R^{2} was above 0.99 (training: 0.9945, validation: 0.99647, testing: 0.99697, overall: 0.99790), which confirms good agreement with the experimental data.  
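A high correlation between predicted and experimental values does not by itself show that points lie on the 45° line: a prediction that is uniformly 20 % too high is perfectly correlated with the data. The sketch below contrasts the Pearson correlation coefficient R with a coefficient of determination computed about the identity line, 1 - SSE/SST. MATLAB's regression plots report R, so whether a given value is R or R^{2} matters when reading such figures; the function names and the viscosity values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2_about_identity(exp, pred):
    """Coefficient of determination 1 - SSE/SST of predictions against the 45-degree line."""
    mean = sum(exp) / len(exp)
    sse = sum((e - p) ** 2 for e, p in zip(exp, pred))
    sst = sum((e - mean) ** 2 for e in exp)
    return 1.0 - sse / sst

exp_visc = [2.0, 3.0, 5.0, 8.0]          # hypothetical experimental viscosities
biased = [1.2 * v for v in exp_visc]      # every prediction 20 % too high
print(pearson_r(exp_visc, biased), r2_about_identity(exp_visc, biased))
```

Here R is 1 while the identity-line R^{2} is only about 0.81, which is why bias-sensitive measures such as B_{f} and AARD complement the correlation values.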
In addition, the prediction accuracy and degree of fit were calculated (Eqns. 4 to 9) and are shown in Table 3.  
The error measures of the ANN model are lower than those of the Grunberg-Nissan model, indicating a better fit. The B_{f} (1.01) and A_{f} (1.02) values of the ANN model are close to unity, which indicates that the model is “fail-safe” and shows good concordance between the predicted and experimental values [19,20]. Because of this better fit and accuracy, ANN models can be a better alternative to polynomial or empirical equations for predicting and modeling linear or nonlinear systems such as this one.  
Table 4 compares the AARD values of the literature model [8] with those of the proposed ANN model. It is evident from the table that AARD_{ANN} < AARD_{Eqn} at all the temperatures studied. Moreover, the overall AARD of the Grunberg-Nissan model and the ANN model was 6.39% and 1.71%, respectively, which confirms that the ANN model has a smaller viscosity prediction error.  
In addition, the applicability of the ANN model under general conditions can be demonstrated by comparing the absolute relative errors (%) of the proposed ANN model and the Grunberg-Nissan model; sample values are shown in Table 5. Notably, the ANN does not use the adjustable interaction parameter A, yet its absolute relative errors (%) are very low compared with those of the Grunberg-Nissan model. This strongly supports the validity of the ANN model.  
Hence, the acceptable values of these parameters indicate that the developed ANN model captures the actual behavior of the system [18]. Therefore, the ANN model developed in the present study can be used to predict the viscosity of the PEG-BSA-water ATPS at varying compositions and temperatures within the studied ranges.  
Conclusions  
An ANN model based on the LM algorithm was developed to predict the viscosity of the PEG-BSA-water ATPS. The topology of the developed ANN model was 3:10:1. Various accuracy parameters were calculated and compared with those of the existing empirical equation (Grunberg-Nissan model). The overall AARD of the developed model was 1.71%, which supports its validity. The very high R^{2} value (>0.99) and low MSE justify the selected ANN model for viscosity prediction. Therefore, ANN can be used as a predictive tool to evaluate the viscosities of aqueous two-phase systems containing protein.  
Acknowledgements  
The authors gratefully acknowledge the Department of Biotechnology, MIT, Manipal University for providing the facilities to carry out the research work.  
References  

Table 1  Table 2  Table 3  Table 4  Table 5 
Figure 1  Figure 2  Figure 3 