Multivariate PCA Analysis Combined with Ward ’ s Clustering for Verification of Psychological Characterization in Visually and Acoustically Social Contexts

The development of the nervous system is a process of gene-environment interaction through prenatal and neonatal stages. The chemical environment may cause mal-development of the social brain network leading to developmental disorders. Thus, we need to develop a new behavioral method which distinguishes the subtle difference in early stage development of sensory-motor, socio-emotional, and cognitive processing in humans and animal models. Here, we have developed an innovative tool for signal processing to structuralize the behavioral elements from a large body of complex information via the multivariate analysis (principal components analysis combined with Ward’s clustering) using an animal model (domestic chicks) and found dynamical differences of combined behavioral parameters over social contexts between socially affiliated chicks and isolated chicks. We prepared two groups each containing six subjects, each group had its peer-social environments regulated, either isolated or grouped, and then their social interaction was tested through a series of social contexts consisting of four serial contexts, 1) firstly isolation, 2) acoustical interaction, 3) acoustical and visual meeting with the reference peers, and lastly 4) isolation again. Results show the juveniles’ social cognition and emotion was visualized as specific clustering structures with active and affective behavioral factors that were seen only in context three with unfamiliar peers visually and acoustically but were not seen in isolation or the social auditory without visual context. These results suggest the usefulness of the combined multivariate analysis under sensory cued-restricted contexts to quantitatively reveal a subtle difference of socio-emotional behavior.


Introduction
Increased diagnosis of developmental disorders such as autism is mostly due to the shift from categorical to spectrum views of autism, better recognition, and the inclusion of new subgroups [1]. Another possibility is gene-environment interaction through prenatal and neonatal stages. In fact, chemical environments have been the cause of severe mal-developments such as attention-deficit and hyperactive neurobehavioral characteristics; this can be via Organophosphate (OP) pesticides [2] or hypothyroidism induced by an anti-thyroid drug, propylthiouracil [3]. In order to evaluate the effect of chemical pollutants, we need to develop an innovative behavioral method which quantitatively distinguishes the subtle difference in early development of sensory-motor, emotional, and cognitive function in humans and animal models especially in the social domain because the core symptom of developmental disorders is social communication [1].
A core symptom of Autism Spectrum Disorders (ASD) and other social psychiatric illnesses is social disability. As a result neurobiological understanding of social development is crucial for diagnosis and therapeutic intervention in these disorders. Recent advances in brain imaging [4,5], genome studies [6], and neurobiology in animal models [7,8] have revealed neuronal substrates of the social brain network and a role for environmental factors in the inception and lifelong modulation of ASD [9]. Domestic chick (Gallus gallus) studies have contributed to the understanding of early developmental sensory-motor processes in face-recognition [7,10], attention towards biological motion [11], self-propelled agency [12], and call-recognition in kinship [13][14][15][16], which provide a developmental foundation for what later becomes the adult cortical brain network. It is however still uncertain how these sensory-motor modules may be expressed in a social context as a behavioral phenotype of the over development of the integrated neuronal substrates.
We have previously suggested in the social development model of chicks that social behaviors are learnt in their neonatal social environment with similar aged peers [17]. A multivariate correlation method, Principal Components Analysis (PCA) was successfully applied to our marmoset studies [18,19] and also contributed to visually identify the chicks' acquired social memory [17]. At this next stage we have established the social affiliation learning model together with the socially deprived model. Since hatching the chicks have either been raised in a group (Grp) or socially isolated (Iso). Their behaviors have been quantitatively evaluated in a series of sensory-cued restricted contexts via multivariate analysis based on PCA and Ward's agglomerative hierarchical clustering [20]. Ward's method compiles data objects into clusters and then those clusters into larger cluster recursively leading to a dendrogram, described in the terms of a Lance & Williams updating formula in R [20,21]. Ward's method has been successfully used in the analysis of ASD subtypes [22], genetic patternssubcortical volumetric map correlation [23], the social interaction of chimpanzees [24], and adult Zebra fish behavior [25]. In this study, we used the inner product of factor loading vectors derived by PCA of behavior parameters and could therefore identify the transition in the correlational structure of behavior along sensory-cued restricted contexts, which was found to be sharply different between Grp and Iso chicks in a particular sensory context.

Animals
This experimental protocol was approved by the Ethics review Committee for Animal Experiments of Tokyo University of Agriculture and Technology, TUAT  that follows the animal care and experimental guidelines of Japan Neuroscience Society and NIH, in USA.
Fertilized eggs from domestic chicks (Gallus gallus domestics), White Leghorn, Maria strain, were purchased from a local breeder, GHEN Corporation (Gifu, Japan). They were kept in a dark incubator (Showa Furanki) at 37.7 degrees centigrade with approximately 50% humidity and automatic rolling every hour. On embryonic day 21 (E21, the start of incubation was defined as E 1, which usually coincided with one day before hatching (the day of hatching was defined as post natal day 1, P1), five to fifteen eggs were moved to incubator-boxes as Grp raised condition ((width) × (depth) × (height) in mm: 450 × 450 × 450) surrounded by opaque walls to avoid communication between the other sets of birds with the air circulation. Iso raised condition, each chick was isolated in a sound attenuated incubator surrounded opaque walls with the air circulation (230-270 × 250-270 × 220-300 in mm). Iso birds were regarded as peer-social deficit models without any visual or auditory interaction. To minimize compared conditions, we observed only males. Each incubator box was kept at around 30 plus-minus 2 degrees centigrade, with a light bulb (10 watt). Constant lighting was maintained from E21 until P3, after which a 12h dark-light cycle was set. Chicks were transferred using a small opaque container during daily incubator cleaning and feeding the optimized volume of experimental food and distilled water at 9-10 am.
We regulated each subject to one of two rearing conditions, Iso (subject number: 3 male and 3 female), and Grp (3 male and 3 female) from embryonic E21 through P13. We avoided social interaction with chicks, since handling can induce stress or affiliation effects on chicks [8,26].

Behavioral test
A behavioral test per subject was once conducted on P13 in the different sound proof room from the raised room. The subject chick was put into one of the two transparent metal net boxes covered with plastic sheets (Figure 1a) (250 (w) × 250 (d) × 250 (h) mm) and the three unfamiliar reference grouped chicks (of both genders) into another. The two boxes were placed adjacent to each other, separated only by a masking board. The box containing the subject chick was covered with a transparent acrylic board, in order to reduce olfactory cues from the reference birds and researchers, and also to increase the specificity of call recordings from subject birds through a microphone in the test box. Subject chicks underwent the following four serial peersocial contexts: context 1; initial isolation period with no reference chicks, and a masking board in place for 60 seconds, context 2; presented with acoustic only cues, ensured by a separation board for 60 seconds, context 3; the reference chicks were presented with both visual and acoustic cues for 120 seconds, after removing the separation board, context 4 (final); second period of isolation for 60 second. All behavior was recorded using a top video camera (SONY handycam) with an external microphone in the test box.

Behavioral analysis
The recorded WMV files were transferred into WAVE and JPEG files per second using TMPGEnc-4.0XPress software (Pegasys Inc., Tokyo, Japan). The subject's behavior in context 3 was analyzed for this study. In the 13 parameters of Figure 1d, the x, y coordinate of head center and forehead (in most cases, beak head) position (shown at context 3 in Figure1a) were sampled and used to calculate horizontal velocity (V), face horizontal angle, its rotation velocity (delta Theta) and local preference to define social distance (four equally divided areas, Figure  1b) by Excel (Microsoft, USA). Freezing (Freeze) was defined as V less than 25 mm/second for more than 5 seconds. The parameter "facing to peers" defined as beak angle to adjacent cage within 45 degrees (beakto-separating wall) refers to subject chick's preferred head position towards the reference chicks. We further defined pecking behavior, floor (pk-floor) and cage wall (pk-wall) expressed as frequency of pecking behavior per specified time. We used free software (Syrinx, version 2.4i) distributed by Dr. John Burt (University of Washington) to analyze chick calls. Spectrograms were used to identify three call types ( Figure 1c; d-(distress), dj-(transient), and trill-calls) by analyzing morphology of the first component. Except trill-call, we defined the two call types based on the frequency change of the first component over time, as follows: First, we read the first time point (t1) where the frequency was maximum/minimum (f1), and the second time point (t2) where the frequency minimum/maximum (f2). Next, we calculated (delta) f=(f2-f1). Finally, if (delta) f<-2, we designated this call as a D-call. If (delta) f>-2, as a dj-call.

Heat map visualization
Normalized data from each behavioral parameter (Figure 1d) per subject was visually identified by gray-scale density from top (high score) to bottom (low score) (Figure 2), and within each condition (the rearing condition and the context), the subjects (n=6) were aligned sequentially in the order of V (1)-left to right, from the lowest to highest scores.

Principal components analysis
The Principal Components Analysis (PCA) was performed with all the data used in the heat map method (the total subjects and parameter numbers were 48 and 13, respectively) using free software R. In order to integrate 13 behavioral parameters by BOUQUET [23,24], we used PCA based on a correlation matrix. The calculated PCA scores were plotted in a two dimensional (2D) plane defined by the first and second PCA components because the most significant difference between Grp and Iso over peer-meeting contexts was context three and two with the first component and second component, respectively via oneway ANOVA. To compare behavioral patterns among the different groups, we applied the second PCA to each group of data and fitted the covariance as an ellipse, where the center of the approximation ellipse was the average of the plots, the long axis was derived from the first component eigenvector multiplied by the eigenvalue squared and the short axis from the second component one of sub-PCA, using the variance-covariance matrix (Figure 3c). To visualize the contextual modulation over four contexts, we overlaid four contexts PCA results in a 3-D space such that the bottom plane (s-y plane) was the plane of the first and second components of PCA and the vertical axis (z) was the context order from the first and fourth context. The approximated ellipses and connected lines between the ellipses centers were also illustrated ( Figure 3a). Note that PCA was conducted with all context data (both the grouped and isolated chicks) and each context PCA result plotted on the same PCA coordinate to enable comparison.

Clustering analysis
The aggregative hierarchical clustering with Ward's method based on inner products between Factor Loading vectors (FLv) to principal components whose cumulative proportion of variance was more than 80 % in each four context was performed using free software R. The calculation method is as follows. Euclidean distance between factor i and j was defined as twice the value of 1 minus inner product between FLv i and j. This convert represents no correlation between FLv i and j when the Euclidean distance is 2 because the inner product is 0, which means the inner angle of FLv i and j is vertical. Distances between clusters are recomputed by the Lance-Williams dissimilarity update formula. We used both FLv and inverse FLv (iFLv) to reflect the positive and negative correlation between factors. The defined functions calculate a pair of optimized dendrograms by using either FLv or iFLv as three following R, sample codes are shown below.
First, the code1 define function1, "Clust.Vect", which calculates hierarchical clustering analysis with the status "method" based on inner product matrix of given vectors. Second, the Code 2 define R source code of function2, "Clust.FL", which calculates FLv and iFLv from given data matrix of principal components, then hierarchical clustering analysis by using function1, "Clust.Vect". If status "type" is set as 1, the function "Clust.FL" returns the dendrogram with FLv and iFLv. As a default status, when "type" is 2, the function returns a pair of dendrograms based on the optimized set of FLv and iFLv.

Statistics
Statistical analysis was performed using R for Student's T-test with both side (Figure 2), and Wilks' lambda ( Figure 3).

Results
The behavior feature of each subject (the grouped or isolated chick) under the specified context was illustrated as a heat map according to the reference scale ( Figure 2). The overview of these heat map patterns implied context-dependent modulation. Particularly, Iso showed the highest intensity of depressive factors 4 (Freeze) and 5 (LP-C) whereas Grp showed generally a higher expression of active or socially motivated factors 1 (V), 2 (delta Theta) and 3 (facing to peers). The pattern modulation seemed gradual from context 1 (isolation), 2 (an acoustic social cue: V-A+) and 3 (a visual and acoustic cue: V+A+) then were altered at context 4 (isolation again). The most different patterns between Grp and Iso were significantly confirmed in these factors 1-5 within context 3.
Next, we examined the correlation of the behavior parameters using principal component analysis (PCA) (Figure 3). To verify our multivariate correlation analysis for the contextual pattern modulation of social behaviors in two different groups of social characteristics, we visually represented each data plot distribution with an ellipse approximation per context in the 3 dimensional graphs. Its horizontal plane x-y consists of the first and second PCA components and vertical axis z is context (Figure 3a, see more detail in Materials and Methods) with each contextual plane shown in Figure 3c. The trajectory lines connected between the average center of either Grp or Iso gradually dissociated each other from context 1 to 3 and then crossed in context 4. The factor loading positive vectors of Figure 3b suggested that the most contributed factors were 1 (V), 2 (delta Theta) and 4 (Freeze) such that the grouped chicks behaved with high V and delta Theta, whereas the isolated chicks behaved with high Freeze. The statistical difference at each context confirmed, via Wilks' lambda (Figure 3e), that the difference in context two was significant and became most significant in context three, then disappeared in context 4.
Finally, we attempted to run our R based clustering analysis software utilizing the inner product of the factor loading vectors as statistical distance (see Materials and Methods) and resulted in Figure 3d. Given that the two vectors were highly correlated, their inner product reaches one (maximum), thus the statistical distance becomes zero (closest distance by our definition) which is equivalent to high clustering in the dendrogram. The previously described set of factor, 1(V), 2 (delta Theta) and 4 (Freeze: note the black triangle that means negative correlation) were closely clustered with each other in all contexts. On the other hand, factor 3 (facing to peers) and 6 (LP-G, Grouping area) were closely clustered only in context 3. This multiple correlation structure, i.e., more active 1(V), 2(delta Theta) and socially motivated 3(facing to peers) and 6 (LP-G, Grouping area, i.e., close social distance) in Grp than Iso was shown only in the visual and acoustic social interaction with unfamiliar peers. In addition, this multivariate analysis also expressed the detailed difference of the same isolation but on different terms, context 1 and 4 as the closest factor to the set of factors 1, 2 and 4 was 10 (pecking at wall) and 12 (dj-call), respectively.

Discussion
In this study, the behavioral development of chicks was quantitatively compared between chicks reared as a group and socially isolated chicks within the meeting context. Grp chick approached and focused visual attention to reference chicks, while Iso chick escaped or froze in the visual and acoustic cues context (context three). The meeting test was performed at postnatal day (P) 13, when Grp chicks had fully developed social affiliation to conspecific chicks through predisposition and visual and acoustic imprinting [7,8,10,[26][27][28][29][30][31].
In the first context (context 1), a novel cage without the reference cage, both Grp and Iso chicks behaved similarly. This was contrasting to the sharp behavioral difference in context three. This result suggested that no experience of social interaction until P13 led the Iso chicks to freeze or escape. Interestingly, Iso chicks showed active locomotion in context four; in contrast Grp chicks froze and stayed in the center of the test cage, suggesting that Iso chicks in context 4 were relieved from anxiety or stress in a novel cage (context 1) [32] and unfamiliar animates (context 3).
The shift of these context dependent behavioral patterns was associated with the correlation structure of behavior parameters, i.e., five behavior parameters including two parameters relating to social affiliation (LP-G and |θ|<45ºC) had formed a cluster in the Ward's dendrogram. This result suggested that both of Grp and Iso chicks expressed most integrated behavior phenotype in the visual and acoustic cues context (context three). Even in context four, the correlation between Local Preference (LP-C) and visual attention parameter (facing to peers, |θ|<45ºC) was kept, suggesting that the emotional state in the previous context (context three) had been sustained. Taken together, PCA simplified the behavior parameter correlation in a featured space and Ward's analysis based on the factor loading vectors derived via the PCA further characterized quantitatively similarity among the behavior parameters such as a clustering structure. Thus, the combination of PCA and Ward's analysis allows a subtle difference of socio-emotional attitude, acquired through a neonatal social environment.
In the literature, the quantitative analysis of behavior phenotype has been conducted by statistical analysis based on behavior parameter correlation, for instance, time series analysis of the changing behavior correlation [25] and the analysis of complex behavior parameters by observers with high expertise [33]. Our multivariate analysis could subjectively and quantitatively extract behavior pattern shift according to a series of social context and be applicable to marmosets [18,19] and ASD children [34] by modification of the social context. Starting from obtained behavior correlation structure, we could propose a working hypothesis concerning neuronal network model, compatible with neurobiological data, thus predicting the development and severity of the mental state of research subjects.