Prediction of Viscosities of Aqueous Two Phase Systems Containing Protein by Artificial Neural Network

The viscosities of an aqueous two-phase system containing bovine serum albumin (BSA) were predicted by an artificial neural network (ANN) as a function of the concentration of polyethylene glycol (PEG), the concentration of BSA and temperature. A three-layer feed-forward neural network trained with the Levenberg-Marquardt (LM) algorithm, consisting of three input neurons, 10 hidden neurons and one output neuron (3:10:1), was developed. The performance parameters were calculated and compared with those of the conventional Grunberg-Nissan empirical model. The values obtained suggest that the proposed ANN model predicts viscosity better than the conventional empirical model.


Introduction
An aqueous two-phase system (ATPS) is a separation method widely used to purify various biomolecules in a single step [1]. It is a liquid-liquid extraction method that can be prepared by mixing aqueous solutions of a water-soluble polymer and a salt, or of two water-soluble polymers, with water [2]. An ATPS can also be formulated by mixing a water-soluble polymer and a protein with water. One such system was demonstrated by Johansson [3] for the partitioning of lactate dehydrogenase and some mitochondrial enzymes.
Viscosity is a transport property whose magnitude determines the mass transfer between phases. Various inter-related properties such as size, shape, structure, degree of polymerization and polymer-solvent interactions can be obtained from viscosity data [4]. Studying and predicting viscosity is therefore essential for the design and optimization of an ATPS at large scale. These data are helpful in designing the large-scale extraction process (since large volumes of phase components have to be handled and separated) and in developing mathematical models that predict the partitioning behavior of biomolecules in aqueous two-phase systems [5].
To a large extent, the viscosity of an ATPS depends on the concentrations of the phase components and on temperature. Determining viscosity experimentally over the entire range of phase compositions involves substantial cost, so mathematical equations that predict the physical properties are needed. A few polynomial and empirical equations are available in the literature as a function of solute concentration at different temperatures [6]. However, empirical equations are valid only over a narrow range of variables. Moreover, the relative errors and deviations between the experimental values and the values calculated by these empirical models can be very high.
One such empirical equation for viscosity is the Grunberg-Nissan model [7], which was successfully used by Gunduz [8] for an ATPS consisting of polyethylene glycol (PEG), bovine serum albumin (BSA) and water. The model (Eqn. 1) can be written as follows:

$$\ln \eta_{sys} = a_1 \ln \eta_{PEG} + a_2 \ln \eta_{BSA} + w_1 w_2 A \qquad (1)$$

where $A$ is an adjustable parameter characteristic of the intermolecular interactions between PEG and BSA, $a_1$ and $a_2$ (written $w_1$ and $w_2$ in the interaction term) are the weight fractions of PEG and BSA, respectively, and $\eta_{PEG}$ and $\eta_{BSA}$ are the viscosities of aqueous PEG and BSA solutions, respectively.
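As an illustration of how the Grunberg-Nissan relation is evaluated, the following minimal Python sketch computes the system viscosity from the two solution viscosities, the two weight fractions and the adjustable parameter A. It assumes that the interaction term uses the same weight fractions as the logarithmic terms; the function name `grunberg_nissan` and all numerical inputs are hypothetical and are not taken from [8].

```python
import math


def grunberg_nissan(eta_peg, eta_bsa, w_peg, w_bsa, A):
    """Viscosity of the mixture from the Grunberg-Nissan relation.

    ln(eta_sys) = w_peg*ln(eta_peg) + w_bsa*ln(eta_bsa) + w_peg*w_bsa*A
    Assumption: the same weight fractions appear in all three terms.
    """
    ln_eta = (w_peg * math.log(eta_peg)
              + w_bsa * math.log(eta_bsa)
              + w_peg * w_bsa * A)
    return math.exp(ln_eta)


# Hypothetical inputs chosen only to exercise the function (not data from [8]).
eta_sys = grunberg_nissan(eta_peg=3.0, eta_bsa=1.2, w_peg=0.08, w_bsa=0.04, A=50.0)
print(f"eta_sys = {eta_sys:.4f}")
```

In practice A would be fitted to measured viscosities, for example by least squares on the logarithmic form, since the relation is linear in A.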
In recent years, the artificial neural network (ANN), an artificial intelligence technique, has been widely used to model and predict linear and non-linear systems. It is a simplified artificial model of the biological neuron system that can derive the relationship between input and output variables from a given data set. The artificial neuron simulates the basic functions of biological neurons [9].
The major advantage of an ANN is its high learning capacity and its ability to predict from the limited information fed to the system. Moreover, in contrast to conventional empirical models, an ANN can accommodate multiple input and output variables, and researchers have shown that a well-trained ANN can predict various processes better [10][11][12][13].
Recently, ANN has been used to predict viscosity of PEG solutions [14], density of ionic liquids [15] and vapor liquid equilibrium data of ionic liquids [16].
To the best of our knowledge, there are no reports on predicting the viscosities of aqueous two-phase systems containing protein using an artificial neural network. Therefore, the objective of the current study is to model the experimental viscosity of the aforementioned ATPS using an ANN and to compare the results with the Grunberg-Nissan empirical model.

Collection of viscosity data from literature
For the current study, the dynamic viscosity data set was taken from the literature [8]; it gives the viscosity as a function of PEG concentration (4-11 % w/w), BSA concentration (1-8 % w/w) and process temperature (15-35 °C). A total of 35 experimental data points were used.

Neural network topology
The ANN modeling code was developed in Matlab R2013a. ANN models normally use multiple layers, and the simplest form is the three-layer model, which contains an input layer (I), one hidden layer (H) and an output layer (O). Each layer consists of nodes whose connections are characterized by two kinds of coefficients, weights (w) and biases (b). The commonly used feed-forward multilayer neural network (FFMNN) was chosen for the present study. Choosing the number of neurons in the hidden layer and the transfer functions for the hidden and output layers is crucial in developing an ANN model. For the present system, the number of neurons in the hidden layer was set to 10, which was found to be optimum.
Hyperbolic tangent and linear transfer functions were chosen for the neurons in the hidden and output layers, respectively. The output of a neuron is calculated by

$$O_j = F(I_j) \qquad (2)$$

where $I_j$, $O_j$ and $F$ are the input of the j-th neuron, the output of the j-th neuron and the transfer function, respectively. The input of each neuron ($I_j$) is calculated from the outputs of the previous layer ($O_i$), the weights connecting the i-th neuron to the j-th neuron ($w_{ij}$) and the bias of the j-th neuron ($b_j$):

$$I_j = \sum_i w_{ij} O_i + b_j \qquad (3)$$

To optimize the weights and biases, the Levenberg-Marquardt (LM) back-propagation algorithm, one of the most popular tools for this purpose [17], was used.
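The neuron equations described above (a weighted sum plus bias, followed by a transfer function) can be sketched as a forward pass through a 3:10:1 network. This is a minimal Python illustration, not the Matlab code used in the study: the weights are random placeholders rather than the fitted values of Table 2, the helper name `forward` is hypothetical, and the inputs are assumed to be already scaled to a small range, a preprocessing step the text does not describe.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Forward pass: tanh hidden layer followed by a single linear output neuron.

    w1[j][i] connects input i to hidden neuron j; b1[j] is that neuron's bias.
    w2[j] connects hidden neuron j to the output neuron; b2 is the output bias.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)  # O_j = tanh(I_j)
        for row, b in zip(w1, b1)
    ]
    # Linear transfer function at the output: O = I
    return sum(w * h for w, h in zip(w2, hidden)) + b2


random.seed(0)
n_in, n_hidden = 3, 10
# Placeholder weights and biases (assumption: NOT the values reported in Table 2).
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
b2 = random.uniform(-1, 1)

# Hypothetical scaled inputs: PEG (% w/w), BSA (% w/w), temperature (deg C).
x = [0.0, -0.5, 0.5]
y = forward(x, w1, b1, w2, b2)
print(y)
```

Because the output neuron is linear, the network output is unbounded, which suits a regression target such as viscosity; the bounded tanh units supply the non-linearity.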
The data set was divided into three sets: a training set (70%), a validation set (15%) and a testing set (15%). The training set is used to train the network, which is adjusted according to its error. The validation set is used to measure network generalization and to stop training when generalization stops improving. The testing set is used to evaluate the trained model against new, "unseen" data and hence gives an independent measure of network performance during and after training. For the ANN model developed, the prediction accuracy and degree of fitness were evaluated using the following factors: Mean Square Error (MSE), Root Mean Square Error (RMSE), Standard Error of Prediction (SEP), Average Absolute Relative Deviation (AARD), Bias factor ($B_f$) and Accuracy factor ($A_f$) [18].
In the expressions for these factors, $\mu_{i,\mathrm{exp}}$ and $\mu_{i,\mathrm{pred}}$ are the experimental and predicted viscosities and $n$ is the number of experiments.
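The six accuracy factors named above can be computed from paired experimental and predicted viscosities. The formulas are not spelled out in the text here, so the sketch below uses commonly cited definitions as an assumption: SEP as RMSE relative to the mean experimental value, and the bias and accuracy factors as base-10 geometric means of the predicted-to-experimental ratio. These may differ in detail from the forms used in [18]. The function name `accuracy_metrics` and the sample numbers are hypothetical.

```python
import math


def accuracy_metrics(exp, pred):
    """Return MSE, RMSE, SEP (%), AARD (%), Bf and Af for paired positive values."""
    n = len(exp)
    mse = sum((e - p) ** 2 for e, p in zip(exp, pred)) / n
    rmse = math.sqrt(mse)
    sep = 100.0 * rmse / (sum(exp) / n)            # RMSE relative to mean of exp
    aard = 100.0 / n * sum(abs((p - e) / e) for e, p in zip(exp, pred))
    log_ratios = [math.log10(p / e) for e, p in zip(exp, pred)]
    bf = 10 ** (sum(log_ratios) / n)               # bias factor
    af = 10 ** (sum(abs(r) for r in log_ratios) / n)  # accuracy factor
    return {"MSE": mse, "RMSE": rmse, "SEP": sep, "AARD": aard, "Bf": bf, "Af": af}


# Hypothetical viscosities (not data from [8]), used only to show the call.
mu_exp = [2.10, 3.45, 5.20, 7.80]
mu_pred = [2.05, 3.50, 5.10, 7.95]
for name, value in accuracy_metrics(mu_exp, mu_pred).items():
    print(f"{name}: {value:.4f}")
```

Under these definitions a perfect model gives MSE = 0, AARD = 0 and Bf = Af = 1, which is why values of the bias and accuracy factors close to unity are read as good agreement.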

Neural network modeling of viscosity
The code for the proposed ANN model was written in Matlab R2013a with the aim of minimizing the mean squared error (MSE) and maximizing the correlation coefficient ($R^2$). A low MSE and a high $R^2$ indicate the best model.
Different numbers of hidden neurons (5, 10 and 15) were tested for network performance, and the results are given in Table 1. The lowest MSE (0.0511) and the highest $R^2$ (0.9979) were obtained with 10 hidden neurons, which was therefore taken as the optimum.
As discussed earlier, the experimental points from the literature [8] were used to train the ANN with the 3:10:1 topology. The selected network consists of three input neurons, namely PEG (% w/w), BSA (% w/w) and temperature (°C), 10 hidden neurons and one output neuron (viscosity). The connections between the layers of the developed ANN are shown in Figure 1.
The "trainlm" training function, which updates the weight and bias values according to Levenberg-Marquardt (LM) optimization, was used in this work. It is considered one of the fastest back-propagation algorithms available. The network was trained by adjusting the strengths of the connections between neurons so that the output of the network fits the desired target as closely as possible, that is, so that the performance function is minimized. The performance function used was the MSE. Training was halted once the goal was met (i.e., the smallest value of MSE was reached); in the present study, it stopped after 16 iterations (epochs). Figure 2 shows the trend of MSE during training, testing and validation. The MSE reaches a minimum of 0.0129 for training. The lower the MSE, the higher the accuracy of the model. The best MSE (0.052937) was obtained after 10 epochs, which is marked by the dotted lines in the figure.
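The stopping behavior described above, with a best MSE at epoch 10 and training halted at epoch 16, is consistent with validation-based early stopping that tolerates a fixed number of non-improving epochs. The following minimal Python sketch shows that logic. The patience of six epochs (`max_fail=6`) is an assumption, as is the MSE curve, which is synthetic and only shaped to reproduce the two epoch numbers; the function name `early_stopping` is hypothetical.

```python
def early_stopping(val_mse, max_fail=6):
    """Return (best_epoch, stop_epoch) for validation-based early stopping.

    val_mse[k] is the validation MSE at epoch k (epochs counted from 0).
    Training stops once the validation MSE has failed to improve on its
    best value for `max_fail` consecutive epochs.
    """
    best_epoch, best, fails = 0, float("inf"), 0
    for epoch, mse in enumerate(val_mse):
        if mse < best:
            best_epoch, best, fails = epoch, mse, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_mse) - 1  # ran out of epochs without stopping


# Synthetic curve (assumption): falls until epoch 10, then rises slowly.
curve = [11.0 - k for k in range(11)] + [1.0 + 0.1 * k for k in range(1, 10)]
best_epoch, stop_epoch = early_stopping(curve)
print(best_epoch, stop_epoch)
```

The weights kept at the end are those from the best epoch, not the last one, so the extra epochs cost only computation time.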
The optimal values of the weights and biases of the ANN topology (3:10:1) based on the LM algorithm are shown in Table 2.
In Table 2, $w_1$ and $w_2$ are the input-layer and hidden-layer weight matrices, and $b_1$ and $b_2$ are the biases of the input and output layers, respectively. The viscosity outputs were predicted using Eqn. 2 (data not shown), and the accuracy parameters were calculated.
The results obtained during training, validation and testing are shown in Figure 3. The points on the plot of predicted versus experimental viscosities lie along the solid 45° line, which shows that the predicted and experimental data coincide closely. This supports the validity of the proposed model. For all data sets, $R^2$ was greater than 0.99 (training: 0.9945, validation: 0.99647, testing: 0.99697 and overall: 0.99790), which confirms good agreement with the experimental data.
In addition, the prediction accuracy and degree of fitness were calculated (Equations 4 to 9) and are shown in Table 3.
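The agreement between predicted and experimental values on a parity plot can be quantified with a squared correlation coefficient. The sketch below computes the squared Pearson correlation in plain Python. Whether the reported values are this quantity or a coefficient of determination is not stated in the text, so the choice is an assumption; the function name `r_squared` and the sample data are hypothetical.

```python
def r_squared(exp, pred):
    """Squared Pearson correlation between experimental and predicted values."""
    n = len(exp)
    mean_e = sum(exp) / n
    mean_p = sum(pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(exp, pred))
    var_e = sum((e - mean_e) ** 2 for e in exp)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_e * var_p)


# Hypothetical viscosities (not data from [8]), used only to show the call.
mu_exp = [2.10, 3.45, 5.20, 7.80]
mu_pred = [2.05, 3.50, 5.10, 7.95]
print(f"R^2 = {r_squared(mu_exp, mu_pred):.5f}")
```

A squared correlation of 1 only means the points fall on a straight line, not necessarily on the 45° line, which is why it is read together with error-based measures such as MSE and AARD.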
The accuracy parameters of the ANN model are lower than those of the Grunberg-Nissan model, which indicates a better fit. The $B_f$ (1.01) and $A_f$ (1.02) values are closer to unity for the ANN model, which indicates that the model is "fail-safe" and shows good concordance between the predicted and experimental values [19,20]. Because of this higher fitness and accuracy, ANN models can be used to predict and model linear or non-linear systems better than polynomial or empirical equations.
In addition, the applicability of the ANN model under generalized conditions can be demonstrated by comparing the absolute relative error (%) values of the proposed ANN model and the Grunberg-Nissan model. Sample values are shown in Table 5. Notably, the adjustable parameter A was not used in the ANN topology; even so, the absolute relative errors (%) are very low compared with those of the Grunberg-Nissan model. This corroborates the validity of the ANN model.
Hence, the acceptable values of these parameters confirm that the developed ANN model describes the actual behavior of the system and the diversity of the model developed [18]. Therefore, the ANN model developed in the present study can be employed to predict the viscosity of the PEG-BSA-water ATPS at varying compositions and different temperatures.

Conclusions
An ANN model for predicting the viscosity of an ATPS consisting of PEG, BSA and water was developed based on the LM algorithm. The topology of the ANN model was 3:10:1. Various accuracy parameters were calculated and compared with those of the existing empirical equation (Grunberg-Nissan model). The overall AARD (%) of the developed model was 1.71, compared with 6.39 for the Grunberg-Nissan model, which supports the validity of the model. A very high $R^2$ (> 0.99) and a low MSE justify the selected ANN model for the prediction of viscosities. Therefore, it is concluded that an ANN can be used as a predictive tool to evaluate the viscosities of aqueous two-phase systems containing protein.