Yoshimasa Takahashi

Toyohashi University of Technology, Japan

Title: Active QSAR modeling for environmental toxicity prediction by partial least squares


Yoshimasa Takahashi received his Ph.D. in chemometrics from Kyoto University in 1984. In 1988 he received the Niwa Memorial Award, presented by the Japan Information Center of Science and Technology (JICST), “For studies on information management and computer-aided design system for chemical research”. He spent 1997-1998 as a visiting researcher in Prof. Peter Willett's laboratory (University of Sheffield, UK), funded by the Ministry of Education, Japan. He has been a Professor of Molecular Information Engineering at Toyohashi University of Technology since 2001. He is also a past chair of the Division of Structure-Activity Studies, Pharmaceutical Society of Japan. His current research interests center on intelligent information processing based on structural similarity.


QSAR models built from a data set of structurally diverse compounds often predict poorly. In previous work, we proposed a technique called active QSAR modeling, which is based on active sampling of a temporary training set. In this method, compounds structurally similar to the query are searched for and collected as a training set, and a local model is built around the query. The results suggested that this approach often gives better prediction performance than ordinary QSAR modeling. In this paper, we applied the partial least squares (PLS) method to QSAR modeling for fish toxicity prediction. We used Topological Fragment Spectra (TFS) to describe the structural features of individual compounds. TFS is a numerical representation of chemical structure information in the form of a multidimensional vector. We used a data set of fish 96h-LC50 values for 330 chemicals. The toxicity data were taken from the results of ecotoxicity tests conducted by the Ministry of the Environment, Japan. The toxicity values were converted from milligrams per litre (mg/L) to moles per litre (mol/L) and then to the corresponding logarithmic values. The TFS-based PLS model with a single latent variable fitted the experimental values with R = 0.931, R² = 0.866, and RMSE = 0.341. However, leave-one-out testing on the same data set gave a considerably larger RMSE of 0.886.
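The workflow described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each compound is already represented by a numeric descriptor vector (standing in for TFS), that similarity is measured by Euclidean distance, and that the local training set is simply the k nearest compounds; the abstract specifies none of these choices. The function names (`log_molar`, `pls1_one_component`, `active_predict`, `loo_rmse`) and the parameter `k` are hypothetical.

```python
import math

def log_molar(mg_per_l, mol_weight):
    """Convert a concentration in mg/L to log10(mol/L); mol_weight in g/mol."""
    return math.log10(mg_per_l / 1000.0 / mol_weight)

def pls1_one_component(X, y):
    """Fit a PLS1 model with one latent variable; return a predict function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # No covariance with y: fall back to predicting the mean.
        return lambda x: y_mean
    w = [v / norm for v in w]
    # Scores of the training compounds on the latent variable.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regression coefficient of y on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

def euclidean(a, b):
    """Distance used here as an assumed stand-in for structural dissimilarity."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def active_predict(X, y, query, k):
    """Build a local model from the k compounds nearest to the query."""
    nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], query))[:k]
    model = pls1_one_component([X[i] for i in nearest], [y[i] for i in nearest])
    return model(query)

def loo_rmse(X, y, predict_fn):
    """Leave-one-out RMSE; predict_fn(X_train, y_train, query) -> prediction."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        errors.append(predict_fn(X_train, y_train, X[i]) - y[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

With descriptor vectors `X` and log-transformed toxicities `y`, a global model would be evaluated as `loo_rmse(X, y, lambda Xt, yt, q: pls1_one_component(Xt, yt)(q))`, and a local one as `loo_rmse(X, y, lambda Xt, yt, q: active_predict(Xt, yt, q, k))` for some chosen `k`.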