Scanning QTLs for Grain Shape Using a Whole Genome SNP Array in Rice


Introduction
Rice is one of the most important food crops in the world and is considered a model plant for the genomics and genetics of cereals [1,2]. Asia grows 90% of the world's rice and is home to 60% of the world's population. It has been forecast that 40% more rice will be required to meet consumer demand [3]. Rice grain shape (grain size) is defined by three dimensions: grain length (GL), grain width (GW) and grain thickness (GT). GL and GW have high heritability in rice. GT, by contrast, is largely determined by grain filling, which is affected by environmental factors such as temperature after pollination; the heritability of GT is therefore relatively low. Rice grain shape is frequently associated with 1000-grain weight (KW), a component of grain yield [4]. In addition, grain shape is an important appearance quality trait and affects cooking quality [5,6].
Grain shape is regarded as a typical quantitative trait [7,8]. Its genetic basis has been studied extensively since the 1990s [9-12]. Hundreds of quantitative trait loci (QTLs) underlying grain shape have been detected using several types of populations [13-17], and many of them have been fine mapped, such as qGL3, qGL3.1, qGL7, qGL7-2 and qGW8.1 [13,18-21]. Near isogenic lines (NILs) have contributed greatly to QTL fine mapping and cloning. Besides the conventional backcrossing strategy, NILs can be developed by a selfing-and-selection scheme. In this approach, NILs are selected from an inbred line that is not yet entirely homozygous. The progeny of such a line segregate at the loci that are not yet fixed and form a heterogeneous inbred family (HIF) of nearly isogenic individuals [22]. In general, a few HIFs for major QTLs can easily be obtained in an F6 or F7 population of 200 individuals [23].
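Why a few HIFs per locus are expected in an F6 or F7 population of 200 lines follows from the decay of heterozygosity under selfing. The sketch below assumes single seed descent from a fully heterozygous F1, with no selection and no segregation distortion. Under those conditions the expected heterozygous fraction at any locus in generation F_n is (1/2)^(n-1). The same expression gives (1/2)^6 ≈ 1.6% for F7, which is the expected value quoted in the genotypic-constitution results. The function names are hypothetical, and the calculation is per locus: an HIF for a whole QTL region needs heterozygosity that spans the region.

```python
from fractions import Fraction


def expected_het_fraction(generation: int) -> Fraction:
    """Expected heterozygous fraction per locus in an F_n line under single
    seed descent from an F1 (assumes F1 = 100% heterozygous, no selection)."""
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return Fraction(1, 2) ** (generation - 1)


def expected_segregating_lines(n_lines: int, generation: int) -> float:
    """Expected number of lines still heterozygous at one given locus."""
    return n_lines * float(expected_het_fraction(generation))


for gen in (6, 7):
    h = expected_het_fraction(gen)
    print(f"F{gen}: expected heterozygosity {float(h):.2%}, "
          f"about {expected_segregating_lines(200, gen):.1f} of 200 lines "
          f"heterozygous at a given locus")
```

With 200 lines this yields roughly six candidate lines per locus at F6 and three at F7, consistent with "a few HIFs" per major QTL.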
In recent years, much progress has been made in isolating QTLs for grain shape. Several genes controlling grain shape have been cloned, including GS3, GS5, GW2, GW8, qSW5 and GW5 [4,23-28]. GS3, the first major QTL to be cloned, largely controls GL and regulates grain weight [4]. GS3 consists of five exons and encodes a 232-amino-acid (aa) protein with a putative PEBP-like domain. A C-to-A mutation in the second exon, which distinguishes short-grain (TGC) from long-grain (TGA) alleles, changes a cysteine codon to a termination codon and results in a 178-aa truncation at the C-terminus of the predicted protein [4,24]. GW2, a major QTL for GW, encodes a RING-type protein with E3 ubiquitin ligase activity. This ligase may target substrates for degradation through the ubiquitin-proteasome pathway in the cytoplasm, and GW2 negatively regulates cell division [26]. The major GW QTL qSW5/GW5 carries a 1212-bp deletion associated with increased grain width. The qSW5 deletion can increase sink size through an increase in cell number in the outer glume of the rice flower [25]. GW5 probably functions like GW2, that is, by regulating cell division through the ubiquitin-proteasome pathway during seed development [28]. GS5 is a minor QTL controlling GW and grain filling. It encodes a putative serine carboxypeptidase and acts as a positive regulator upstream of cell cycle genes; evidence suggests that its overexpression may increase cell number by promoting mitotic division [23]. Most recently, GW8, a positive regulator of GW, was cloned; it encodes a protein that regulates cell proliferation [27].
Although several genes controlling grain shape have been cloned, they are not sufficient to explain the large natural variation in grain shape. New QTLs for grain shape must be identified to support design breeding for grain shape improvement. A population derived from a cross between two genetically distant genotypes is recommended for QTL analysis. This is especially important for the previously reported grain shape QTL hotspot regions on chromosomes 3 and 5, which harbor many QTLs [29,9,18,10,11,30,20]. Determining how many genes in these regions regulate the target traits will help design breeding for the genetic improvement of rice grain shape. A high-density linkage map is a powerful tool for scanning QTLs [16,31-33]. Because SNPs are abundant and evenly distributed throughout the genome, the single nucleotide polymorphism (SNP) is considered the most desirable molecular marker for constructing ultra-high-density genetic linkage maps [34,35]. Whole-genome sequencing and oligonucleotide microarrays are the two main methods for genotyping SNP markers [36,37].
In the present study, we used a SNP array to genotype a recombinant inbred line (RIL) population derived from a cross between the indica rice Zhenshan 97 (ZS97) and the japonica rice Xizang 2 (XZ2), a Tibetan cultivar. A high-density SNP genetic linkage map was constructed and used to map QTLs for grain shape in the RILs. A series of QTLs was identified, including new QTLs for GW and GT.

Plant materials and field experiment
A RIL population consisting of 197 lines was developed from a cross between ZS97 (Oryza sativa L. ssp. indica) and XZ2 (O. sativa L. ssp. japonica) using the single seed descent method. The field experiments followed a randomized complete block design with two replicates at the experimental farm of Huazhong Agricultural University in Wuhan, China. The F7 and F8 generations of the RILs and the two parents, ZS97 and XZ2, were planted in the 2011 and 2012 rice growing seasons, respectively (Table 2). Seven plants of each RIL were transplanted into a one-row plot, with 16.5 cm between plants within a row and 26.5 cm between rows. Field management essentially followed normal local rice production practices, with fertilizer applied (per hectare) as follows: 48.75 kg N, 58.5 kg P and 93.75 kg K as basal fertilizer; 86.25 kg N at the tillering stage; and 27.6 kg N at the booting stage. The five plants in the middle of each row were harvested individually for measurement of grain shape and KW.

Trait measurement
Of the 197 RILs, 153 with normally filled grains were selected for trait measurement. For each RIL, twenty fully filled grains were randomly chosen from the bulked grains of the five harvested plants. GL was measured twice, each time by placing 10 grains end to end in a straight line along a ruler. The same 20 grains were measured individually for GW and GT with an electronic digital caliper (Guanglu Measuring Instrument Co. Ltd., China) with a precision of 0.01 mm. The mean GL, GW and GT values of the 20 grains were used as the trait values of each line. KW was calculated as the grain weight per plant divided by the grain number per plant, multiplied by 1000 [11,16]. Trait values averaged across the two replicates within each year were used as the input data for QTL analysis.

Analysis method
Genotype data obtained by hybridizing F7 DNA to the RICE6K SNP array were analyzed in R (http://www.r-project.org/), and a high-density genetic map consisting of 1495 bins was constructed. QTLs were scanned with the composite interval mapping (CIM) method implemented in the R package R/qtl. LOD thresholds for declaring QTLs were determined from 1000 permutation tests, giving LOD > 2.5 at a genome-wide significance level of P = 0.05. Three marker covariates were included in the CIM model, and the window size was set to 10 cM. The confidence interval of each detected QTL was determined with the function lodint as the region within 1 LOD unit of the peak. The proportion of phenotypic variance explained and the additive effect of each QTL were estimated by linear regression with lm, using the model $y = \mu + bx + \varepsilon$, where $y$ is the phenotype analyzed and $x$ is the genotype of the SNP nearest to the identified QTL, coded as $x = -0.5$ for the ZS97 genotype and $x = 0.5$ otherwise. Broad-sense heritability was calculated as $h^2 = (V_R - V_E)/V_R$, where $V_R$ is the phenotypic variance in the RIL population and $V_E$ is the environmental variance, estimated as $V_E = (V_{ZS97} + V_{XZ2})/2$.

Grain shape variation of the RIL population
The RILs showed strong transgressive segregation, in both directions, for all four traits studied. The parents, ZS97 and XZ2, did not differ significantly in GL, GW, GT or grain weight (Table 1), yet grain shape varied widely among the RILs in both years. The phenotypic values of all traits in both years had very small skewness and kurtosis, indicating approximately normal distributions suitable for QTL analysis (Table 1). The RIL population means were very close to the parental values. For example, in 2011 GL ranged from 6.91 to 9.61 mm among the RILs, with a mean of 8.21 mm, similar to the parental values of 8.29 mm for ZS97 and 8.12 mm for XZ2.

Heritability and correlation
Broad-sense heritability varied widely among the four traits, from 60.6% to 96.1% (Table 1). KW had the highest heritability (96.1%), GL and GW had intermediate heritabilities above 85%, and GT had the lowest (60.6%). KW was positively correlated with all the other traits investigated, consistent with grain weight being determined largely by the three grain dimensions. GW and GT were positively correlated, whereas GW and GL were negatively correlated.

Genotypic constitution of the RIL population
Of the 5102 SNP loci on the RICE6K array, 65% were polymorphic between the parents. Among the 197 RILs, 1495 bins were identified on the basis of the 5102 SNP sites [11]. At the population level, 2.9% of all readable SNP genotypes were heterozygous (Table 3). At the level of individual lines, the heterozygous rate at SNP sites ranged from 2.2% to 16.6% (mean 2.9%), and the heterozygous rate measured over genetic distance ranged from 0 to 28.1% (mean 3.4%). This mean is higher than the value of about 1.6% expected for F7 lines under single seed descent, $(1/2)^6 \approx 1.6\%$. At the bin level, 1-7 lines were heterozygous for each bin. On average, each line had about 51 heterozygous bins.

QTLs for grain shape
Grain length: A total of seven QTLs were identified for GL in the population. Five QTLs were detected in each of 2011 and 2012, and three major QTLs, qGL1, qGL2 and qGL5, were detected in both years. The QTLs individually explained 5.3-12.6% of the phenotypic variance. XZ2 alleles at qGL3, qGL5 and qGL12 increased GL, whereas ZS97 alleles at the other four QTLs increased GL.
Grain width: Six QTLs were detected for GW on chromosomes 2, 3, 5, 6 and 7 (Table 4). Four QTLs were detected in each year, and qGW5 and qGW6 were identified in both years. These QTLs individually explained 6.5-23.3% of the phenotypic variance. qGW5 had the largest effect, explaining 22.9% and 23.3% of the GW variance in the two years. ZS97 alleles increased GW at qGW5 but decreased it at qGW6.

Grain thickness:
A total of four QTLs were detected for GT on chromosomes 1, 3, 5 and 6, individually explaining 11.5-29.1% of the GT variation. All four were identified in 2012. In 2011, qGT1 and qGT3 were identified together with qGT5, which explained about 20% of the trait variation in both years. XZ2 alleles increased GT at qGT3 and qGT6, while ZS97 alleles decreased GT at qGT1 and qGT5.

1000-grain weight:
Five QTLs were detected for KW. Four of them were detected in both years, and qKW1 was detected only in 2011. The five QTLs individually explained 3.5-12.5% of the phenotypic variance. The KW-increasing alleles came from ZS97 at some QTLs and from XZ2 at others. qKW5 and qKW6 had comparatively large effects, and the regions harboring them also contained QTLs for GT and GW.
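The Analysis method section defines two calculations behind these results: broad-sense heritability, $h^2 = (V_R - V_E)/V_R$ with $V_E = (V_{ZS97} + V_{XZ2})/2$, and a single-marker regression, $y = \mu + bx + \varepsilon$, on the SNP nearest each QTL. That regression yields the additive effect $b$ and the proportion of variance explained ($R^2$). The authors used R (lm). The Python sketch below only illustrates the arithmetic and makes several assumptions. Sample variances are used. The ZS97 genotype is labelled "A" (a hypothetical convention). The function names and toy data are invented and are not values from this study.

```python
from statistics import mean, variance


def broad_sense_heritability(ril_values, zs97_values, xz2_values):
    """h^2 = (V_R - V_E) / V_R, with V_E = (V_ZS97 + V_XZ2) / 2.
    Uses sample variances (an assumption; the estimator is not stated)."""
    v_r = variance(ril_values)
    v_e = (variance(zs97_values) + variance(xz2_values)) / 2
    return (v_r - v_e) / v_r


def single_marker_regression(phenotypes, genotypes):
    """Least-squares fit of y = mu + b*x + e at one SNP.
    x = -0.5 for the ZS97 genotype ("A", hypothetical label), 0.5 otherwise.
    Returns (mu, b, r2); r2 is the proportion of variance explained."""
    x = [-0.5 if g == "A" else 0.5 for g in genotypes]
    y = [float(v) for v in phenotypes]
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    b = sxy / sxx
    mu = y_bar - b * x_bar
    r2 = sxy ** 2 / (sxx * syy)
    return mu, b, r2


# Hypothetical toy data (grain length, mm), for illustration only.
mu, b, r2 = single_marker_regression([8.0, 8.2, 8.6, 8.8], ["A", "A", "B", "B"])
h2 = broad_sense_heritability([7.9, 8.1, 8.3, 8.5, 8.7],
                              [8.28, 8.30, 8.29], [8.10, 8.14, 8.12])
print(f"mu={mu:.2f}  b={b:.2f}  R^2={r2:.2f}  h^2={h2:.4f}")
```

Under this coding, a positive $b$ means the XZ2 allele raises the trait, and $b$ equals the difference between the two homozygous class means.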

Heterogeneous inbred families for major QTLs:
The progeny of an HIF constitute a near isogenic F2 population for the segregating region. To obtain near isogenic lines quickly for all detected QTLs, we examined the genotypes of the RILs in the QTL regions. HIFs were found for all 14 bin intervals containing the 22 detected QTLs, with 1-6 HIFs per QTL region (Table 5). Six HIFs each were found in the RIL population for the three QTL regions of qGL2, qGL10-2 and qGW3-2. Four and five HIFs, respectively, were found in the QTL hotspot regions on chromosomes 5 and 6, which affect GT, GW and KW.
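Finding HIFs amounts to scanning the RIL genotype matrix for lines that are still heterozygous across a QTL's bin interval. The same matrix gives the per-line heterozygous rates reported under the genotypic constitution of the population. The sketch below assumes a hypothetical encoding: "A" for ZS97, "B" for XZ2, "H" for heterozygous and "-" for missing. It treats a line as an HIF for an interval only if every bin in that interval is heterozygous, which is our assumption. The RIL names, bins and calls are invented.

```python
# Hypothetical genotype encoding: "A" = ZS97, "B" = XZ2, "H" = heterozygous,
# "-" = missing call.
READABLE = {"A", "B", "H"}


def heterozygous_rate(calls):
    """Fraction of readable (non-missing) calls that are heterozygous."""
    readable = [c for c in calls if c in READABLE]
    if not readable:
        return 0.0
    return sum(c == "H" for c in readable) / len(readable)


def find_hifs(genotypes, start_bin, end_bin):
    """Lines heterozygous at every bin from start_bin to end_bin (inclusive),
    i.e. candidate HIFs for a QTL located in that interval."""
    hifs = []
    for line, calls in genotypes.items():
        window = calls[start_bin:end_bin + 1]
        complete = len(window) == end_bin - start_bin + 1
        if complete and all(c == "H" for c in window):
            hifs.append(line)
    return sorted(hifs)


# Hypothetical RILs scored at six bins (indices 0-5).
genotypes = {
    "RIL001": ["A", "A", "H", "H", "B", "B"],
    "RIL002": ["B", "H", "H", "A", "A", "-"],
    "RIL003": ["A", "B", "B", "B", "H", "H"],
}
for line, calls in genotypes.items():
    print(line, f"heterozygous rate = {heterozygous_rate(calls):.1%}")
print("HIFs for bins 2-3:", find_hifs(genotypes, 2, 3))
print("HIFs for bins 4-5:", find_hifs(genotypes, 4, 5))
```

Selfing an HIF found this way produces progeny that segregate only in the target interval, which is what makes it useful for validation.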

Novel QTLs in this study
In this study, a total of 22 QTLs were identified for grain shape. We compared the detected QTLs with those reported in previous studies (http://www.gramene.org/). Although QTLs for GL have repeatedly been mapped on chromosome 3, a comparison of the physical position of the bin interval containing qGL3 distinguished it from the cloned QTLs GS3 and GL3 [4,24,21]. In addition, the GS3 functional marker SF28 revealed no genotypic difference between ZS97 and XZ2; both parents carried the functional GS3 allele (data not shown). qGT6/qGW6 was a major QTL in the bin interval E29-E32, where no GT or GW QTL had previously been reported (http://www.gramene.org/), so these are likely to be novel QTLs. qGT5/qGW5 was located in the qSW5 region, and qSW5 showed large effects on GW and KW [23]. We therefore believe that qGT5/qGW5 is allelic to qSW5. Similarly, we suggest that qGL5, qGT1 and qGT3 are new QTLs. These newly identified QTLs should be informative for marker-aided selection, and the HIFs in the target QTL regions would be ideal materials for QTL validation and fine mapping [22].

Pleiotropic QTLs
In the present study, six chromosomal regions on chromosomes 1, 2, 3, 5 and 6 had pleiotropic effects on two or more traits. qGW5, qGT5 and qKW5 were located in neighboring regions on chromosome 5, and their confidence intervals overlapped largely, suggesting a single pleiotropic QTL; indeed, qSW5 has pleiotropic effects on GW, GT and KW. We therefore propose that one pleiotropic QTL on chromosome 5 controls GT, GW and KW in this population. Likewise, qGT6, qGW6 and qKW6 are regarded as one pleiotropic QTL in the region F45-F69. In addition, the pairs qGL1/qKW1, qGL3/qKW3 and qGW3-1/qGT3 are probably pleiotropic QTLs, because each pair shared overlapping confidence intervals and had additive effects in the same direction on positively correlated traits. The LOD peaks of qGL2 and qGW2 pointed to the same small region, and the two QTLs had additive effects in opposite directions. Although GL and GW were not significantly correlated, their correlation coefficients were negative in both years, so qGL2 and qGW2 are also likely to be one pleiotropic QTL. Thus, the pleiotropic QTLs provide a genetic explanation for the correlations among traits in this population.
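The criterion used above for calling two QTLs pleiotropic has two parts. The two QTLs must have overlapping confidence intervals, and the relative direction of their additive effects must match the sign of the trait correlation. That criterion can be written as a small predicate. This is a minimal sketch, not the authors' procedure. It assumes 1-LOD intervals in cM and the convention that a positive additive effect means the XZ2 allele increases the trait. The QTL names, positions and correlations are hypothetical. Like the text, the sketch uses only the sign of the correlation, not its significance. The test is also necessary rather than sufficient, since two tightly linked genes would pass it too.

```python
from dataclasses import dataclass


@dataclass
class QTL:
    name: str
    chrom: int
    ci_start: float  # start of the 1-LOD support interval (cM)
    ci_end: float    # end of the 1-LOD support interval (cM)
    additive: float  # > 0: the XZ2 allele increases the trait


def intervals_overlap(q1: QTL, q2: QTL) -> bool:
    """True if both QTLs lie on the same chromosome and their intervals overlap."""
    return (q1.chrom == q2.chrom
            and q1.ci_start <= q2.ci_end
            and q2.ci_start <= q1.ci_end)


def candidate_pleiotropic(q1: QTL, q2: QTL, trait_correlation: float) -> bool:
    """Overlapping intervals plus effect directions consistent with the sign of
    the trait correlation: same direction for positively correlated traits,
    opposite direction for negatively correlated ones."""
    if trait_correlation == 0 or not intervals_overlap(q1, q2):
        return False
    same_direction = (q1.additive > 0) == (q2.additive > 0)
    return same_direction == (trait_correlation > 0)


# Hypothetical QTLs for illustration only.
gl = QTL("qGL_x", 2, 10.0, 20.0, 0.10)
gw = QTL("qGW_x", 2, 15.0, 25.0, -0.05)
far = QTL("qGW_y", 2, 30.0, 40.0, 0.20)
print(candidate_pleiotropic(gl, gw, trait_correlation=-0.2))
print(candidate_pleiotropic(gl, far, trait_correlation=0.5))
```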

Complementary gene action is the genetic basis of transgressive segregation
The parents of the mapping population did not differ significantly in GL, GW, GT or KW. Nevertheless, many QTLs were identified, indicating that the genetic constitutions of the parents differed. In addition, all traits showed clear transgressive segregation in both directions. This suggests that the favorable alleles for grain shape are dispersed between the two parents. Indeed, for every investigated trait, both ZS97 and XZ2 contributed alleles with positive as well as negative additive effects. Consequently, lines that accumulated most of the positive or most of the negative alleles had much larger or smaller trait values than the parents. For example, RIL100, which had the longest grains, carried seven positive GL alleles, whereas RIL007, which had the shortest grains, carried only one (Figure 1). Complementary action of additive QTLs can therefore explain the genetic basis of the transgressive segregation. A similar result was also reported in another population [38].

In summary, we detected 22 QTLs for grain shape in the present study, six of which are probably novel. Pleiotropic QTLs were the major genetic basis of the correlations among traits. HIFs were found for all QTLs and would be good materials for QTL validation. These findings will enrich the understanding of the genetics of grain shape in rice [39].

Table notes: The data in parentheses indicate the average of the RIL population. Intervals are given in the format chromosome: genetic distance. Peak (cM) is the genetic position of the LOD peak. A positive additive effect means that Xizang 2 alleles increased phenotypic values, and a negative additive effect means that Zhenshan 97 alleles increased phenotypic values. Var: proportion of phenotypic variance explained by the QTL.
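The complementary-gene-action explanation of transgressive segregation can be illustrated with a purely additive model. Each parent carries increasing alleles at some QTLs and decreasing alleles at others. The sketch below assumes four hypothetical QTLs of equal size. XZ2 alleles increase the trait at two of them and ZS97 alleles at the other two. It uses the same $-0.5$/$0.5$ coding as the regression model. The mean, effects and line genotypes are invented and are not fitted to this study.

```python
from itertools import product

# Hypothetical additive model y = MU + sum(b_i * x_i) at four QTLs.
# b > 0: the XZ2 allele ("B") increases the trait; b < 0: the ZS97 allele ("A") does.
MU = 8.2
EFFECTS = [0.2, 0.2, -0.2, -0.2]
CODING = {"A": -0.5, "B": 0.5}  # same coding as the single-marker regression


def genotypic_value(genotype, mu=MU, effects=EFFECTS):
    """Expected trait value of a homozygous line under the additive model."""
    return mu + sum(b * CODING[g] for b, g in zip(effects, genotype))


def count_increasing_alleles(genotype, effects=EFFECTS):
    """Number of QTLs at which the line carries the trait-increasing allele."""
    return sum((b > 0) == (g == "B") for b, g in zip(effects, genotype) if b != 0)


zs97, xz2 = "AAAA", "BBBB"
all_lines = ["".join(g) for g in product("AB", repeat=len(EFFECTS))]
longest = max(all_lines, key=genotypic_value)
shortest = min(all_lines, key=genotypic_value)

for label, line in [("ZS97", zs97), ("XZ2", xz2),
                    ("highest RIL", longest), ("lowest RIL", shortest)]:
    print(f"{label:11s} {line}  value={genotypic_value(line):.2f}  "
          f"increasing alleles={count_increasing_alleles(line)}")
```

Both parents carry two of the four increasing alleles and have identical values. Recombinant lines that pyramid all four increasing alleles, or none, fall outside the parental range in both directions.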