Ivan Bangov

Konstantin Preslavski Shumen University, Bulgaria

Title: QSAR/QSPR investigations and cluster analysis based on descriptor fingerprints


Ivan Bangov is a professor at Konstantin Preslavski Shumen University, Bulgaria.


Structural fingerprints are widely used to assess structural similarity in both QSAR and QSPR investigations. Recently, a novel approach based on descriptor fingerprints was proposed. A descriptor fingerprint is built from a defined number of descriptors. For each descriptor, an interval is defined by an initial value initVal and an end value endVal, both extracted from the whole set of studied objects having this descriptor, together with the precision (resolution) resVal with which the descriptor is measured. The interval of each descriptor is thus divided into N = (endVal - initVal)/resVal subintervals (the elements of the descriptor array), and the arrays of all descriptors are concatenated into a single array that forms the fingerprint of the object. During fingerprint formation, the software determines the element into which the descriptor value falls; this element takes the integer value 1, while the other elements remain 0. Thus, every object is characterized by a descriptor fingerprint, which is an array of 1 and 0 elements. The QSAR/QSPR investigations and the cluster analysis of the objects are based on pairwise similarity, determined by the Tanimoto index T = NC/(NA + NB - NC), where NA is the number of 1s in the fingerprint of the first object A, NB is the number of 1s in the fingerprint of the second object B, and NC is the number of 1s occupying the same positions in the two fingerprints. The value of this index lies between 0 and 1.0; the closer it is to 1.0, the more similar the two objects are. Accordingly, objects whose fingerprints give the highest Tanimoto index are assumed to have similar biological activity or similar physical, chemical, or user properties. The similarity between fingerprints also allows a cluster analysis of the objects; the method of Butina was used in this work.
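The fingerprint construction and the Tanimoto index described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the rounding of N to an integer, and the clamping of the maximum value into the last subinterval are assumptions made for illustration, and the numbers in the usage note are invented.

```python
from typing import List, Sequence, Tuple

# Each descriptor is described by (initVal, endVal, resVal).
DescriptorRange = Tuple[float, float, float]


def n_bins(init_val: float, end_val: float, res_val: float) -> int:
    """Number of subintervals N = (endVal - initVal) / resVal, at least 1.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return max(1, round((end_val - init_val) / res_val))


def descriptor_fingerprint(values: Sequence[float],
                           ranges: Sequence[DescriptorRange]) -> List[int]:
    """Concatenate one one-hot block per descriptor into a single 0/1 array."""
    fingerprint: List[int] = []
    for value, (init_val, end_val, res_val) in zip(values, ranges):
        n = n_bins(init_val, end_val, res_val)
        # Index of the subinterval that contains the descriptor value.
        index = int((value - init_val) / res_val)
        # Assumption: out-of-range values (including value == endVal)
        # are clamped into the first or last subinterval.
        index = min(max(index, 0), n - 1)
        block = [0] * n
        block[index] = 1
        fingerprint.extend(block)
    return fingerprint


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """T = NC / (NA + NB - NC) for two equal-length 0/1 fingerprints."""
    na = sum(fp_a)
    nb = sum(fp_b)
    nc = sum(1 for a, b in zip(fp_a, fp_b) if a == 1 and b == 1)
    denominator = na + nb - nc
    return nc / denominator if denominator else 0.0
```

With two hypothetical descriptors ranging over (0, 10) with resolution 2 and (100, 200) with resolution 25, the fingerprint has 5 + 4 = 9 elements. Two objects with descriptor values (3.0, 130.0) and (3.5, 180.0) share the set element of the first descriptor but not of the second, so NA = NB = 2, NC = 1 and T = 1/3.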
The new method of descriptor fingerprints has been successfully applied to the perception and clustering of series of objects such as allergen and non-allergen food proteins, some toxic proteins, biodiesels, esters, and other objects.
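The clustering step can be sketched in the same spirit. The following is a minimal Python sketch of Butina-style clustering on 0/1 fingerprints using the Tanimoto index; the similarity threshold, the function names, and the toy fingerprints are assumptions made for illustration, not parameters or data from the work described here.

```python
from typing import List, Sequence, Set


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """T = NC / (NA + NB - NC) for two equal-length 0/1 fingerprints."""
    na, nb = sum(fp_a), sum(fp_b)
    nc = sum(1 for a, b in zip(fp_a, fp_b) if a == 1 and b == 1)
    denominator = na + nb - nc
    return nc / denominator if denominator else 0.0


def butina_clusters(fingerprints: Sequence[Sequence[int]],
                    threshold: float) -> List[List[int]]:
    """Return clusters as lists of object indices, centroid first."""
    n = len(fingerprints)
    # Step 1: neighbour list of every object at the chosen threshold.
    neighbours: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)
    # Step 2: candidate centroids ordered by decreasing neighbour count.
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    # Step 3: each unassigned candidate becomes a centroid and takes its
    # still-unassigned neighbours; leftovers end up as singletons.
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + sorted(
            m for m in neighbours[centroid] if m not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints: three descriptors with three subintervals each.
toy = [
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 1, 0],
]
print(butina_clusters(toy, threshold=0.5))  # [[0, 1, 2], [3, 4]]
```

In this toy set, object 0 shares two of three set elements with objects 1 and 2 (T = 0.5), so it has the most neighbours and becomes the first centroid; objects 3 and 4 form the second cluster.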