What do Semen Parameters Mean? How to Define a Normal Semen Analysis


Introduction
Although abnormal semen parameters are generally considered poor predictors of fertility, semen analysis (SA) parameters are often used as a screening test to detect male factor infertility. Certainly, men with azoospermia or severe oligozoospermia have markedly impaired fertility potential [1][2][3]. Because men with impaired fertility can undergo various medical or surgical treatments to improve semen parameters and fertility, it is clinically relevant to have a test for infertility. According to the American Society for Reproductive Medicine and American Urological Association guidelines, semen analysis and male reproductive history are considered the primary screening evaluation for male fertility potential [3,4]. Published guidelines for normal semen parameters therefore have substantial clinical relevance. The World Health Organization (WHO) published an updated 5th edition of its reference values for human semen characteristics in 2010 [5,6]. This version presented substantially different numbers for the definition of abnormal semen parameters compared with prior manuals. Applying these new reference values would reclassify a substantial number of men as not having male factor infertility and would limit couples' fertility options. Controversy over the changes is heated; some andrology laboratories have elected to ignore the new reference values and instead apply older standards, or even arbitrary SA reference values, because of concerns about the validity of these numbers [7].

Defining Semen Analysis Reference Limits
One would expect that the identified prevalence of abnormal semen parameters would approximate the known prevalence of clinical infertility due to male factor. Most observational studies identify the need for medical consultation for infertility in approximately 15% of all couples. Since not all couples will seek medical evaluation for a condition, it is likely that the prevalence of impaired fertility is actually greater than 15% of couples. In addition, it is known that about 50% of the 15% of couples struggling with infertility will have a male factor contributing to difficulty achieving a pregnancy [8]. Therefore, one would expect that at least the lower 7.5% of semen parameters established from a pool of men who represent the general population would be associated with male infertility. This is, unfortunately, not seen with current reference limits.
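The arithmetic behind the 7.5% figure can be written out explicitly. The following is a minimal sketch that uses only the two prevalence figures quoted above; the variable names are illustrative, and the 5% figure simply reflects that a cut-off placed at the 5th centile labels 5% of its own reference population as abnormal by construction.

```python
# Prevalence figures quoted in the text (approximate, from observational studies).
couples_with_infertility = 0.15   # share of couples needing consultation for infertility
male_factor_share = 0.50          # share of those couples with a contributing male factor

# Minimum expected share of men whose semen parameters are associated with infertility.
expected_male_factor = couples_with_infertility * male_factor_share

# A 5th-centile cut-off labels 5% of its own reference population "abnormal" by construction.
labeled_abnormal_by_cutoff = 0.05

print(f"Expected male factor prevalence: {expected_male_factor:.1%}")
print(f"Labeled abnormal by a 5th-centile cut-off: {labeled_abnormal_by_cutoff:.1%}")
```

Because not every couple seeks evaluation, the 7.5% figure is best read as a lower bound, which only widens the gap between the two numbers.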
The WHO 2010 reference limits for semen analysis were determined by assessing SA data from men who had a known history of recent paternity. A "normal" parameter was arbitrarily defined as any value that fell above the 5th centile; anything below the 5th centile would be labeled "abnormal." This is a remarkable decision because the WHO 5th edition only included SA values from men who had fathered at least one child after ≤12 months of trying. There is no justification for drawing any reference limit cut-off for semen parameters relevant to infertility from this population of men. No matter what cut-point is selected, 100% of men below that cut-point were actually fertile in this population. Every man below the 5th centile in this study fathered a child in 12 months or less, yet clinicians are expected to use these parameters to identify "male factor" in infertile couples. Clearly, the decision to use this population of men, and the reference limits they provide, needs to be reconsidered. Furthermore, the "normal values" proposed in the 2010 WHO report were derived from a small number of men (n=1953) who each provided a single semen sample. Not all results from the 1953 specimens were used to determine every "normal" semen parameter; in some instances (for example, morphology and vitality), only a subset of specimens was used. Remarkably, there are no studies from Asia, Africa, the Middle East, or Latin America, which likely limits the ethnic diversity of the subjects included. Furthermore, only a handful of investigators and laboratories generated the SA results. This limitation raises the possibility of laboratory-specific bias, especially for subjective semen parameter evaluations such as morphology. It also calls into question the broad applicability of these values to men outside of Paris, Turku, Edinburgh, and Copenhagen (55% of the data used came from those four cities) [9].
In addition, the short time to initiation of pregnancy suggests that men in this group had partners who were highly fecund. The SA results of a small, homogeneous subset of highly fertile couples cannot be applied to the general population, and an arbitrary cut-off within these parameters is not clinically useful for defining potential for infertility.

Parameter Changes
Several critiques of the changes in the WHO SA criteria from 1999 to 2010 have been published previously [9][10][11]. Notably, the changes almost uniformly decreased the reference value for each of the SA parameters, and many men previously considered infertile or subfertile would now fall in the "normal" category, despite having a history of infertility that is not explained by a female factor. In fact, 15-39% of men who were found to have an abnormal SA using WHO 1999 criteria would be reclassified as "normal" under the WHO 2010 criteria [11,12]. The following sections outline the changes made to each parameter in the most recent WHO manual, the 5th edition published in 2010, compared with the preceding manual, the 4th edition published in 1999.

Semen Volume
The lower reference limit for semen volume decreased from 2 mL in the 4th edition (1999) to 1.5 mL in the 5th edition (2010). Interestingly, the 2010 criteria state that the 1.5 mL limit was derived from specimens that had a 95% confidence interval (CI) of 1.4-1.7 mL. Therefore, because over 95% of the "normal" semen samples used in the 2010 criteria were less than 2 mL, they all would have been considered abnormal according to the 1999 criteria. The fact that such a large discrepancy exists between WHO editions calls into question which version can be believed. Importantly, the most accurate method for assessing volume is to weigh the specimen in a container that has itself been weighed prior to collection, and to subtract the container weight from the combined weight of specimen and container. Only 1582/1953 (81%) of the specimens were analyzed using this preferred method. The remaining specimens were analyzed by decanting the semen into a graduated cylinder, even though the 2010 manual specifically recommends against this method because of the large volume (up to 0.9 mL) that can be lost [5]. Although not used for the 5th edition, the other acceptable method for assessing volume is to collect the specimen directly into a graduated cylinder with graduations of 0.1 mL or finer; this allows technicians to read the volume directly off the primary collection container without the need for transfer. In addition to decanting, any other means of transferring semen from one container to a graduated cylinder, such as aspirating with a syringe or pipette, is not recommended and should not be used by any laboratory.
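The weighing method described above reduces to a single subtraction. The sketch below is illustrative only: the function name and the example weights are hypothetical, and the conversion from grams to millilitres assumes a semen density of 1 g/mL, which is a simplifying assumption rather than a measured value.

```python
def semen_volume_ml(container_g: float, container_plus_specimen_g: float,
                    density_g_per_ml: float = 1.0) -> float:
    """Estimate semen volume by weight.

    The default density of 1 g/mL is an assumption made for illustration.
    """
    if container_plus_specimen_g < container_g:
        raise ValueError("combined weight cannot be less than the empty container weight")
    specimen_g = container_plus_specimen_g - container_g
    return specimen_g / density_g_per_ml


# Hypothetical weights, chosen only to illustrate the calculation.
volume = semen_volume_ml(container_g=12.40, container_plus_specimen_g=14.10)
print(f"Estimated volume: {volume:.1f} mL")
```

Because the specimen never leaves the collection container, this approach avoids the transfer losses associated with decanting.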

Sperm Concentration and Count
The lower reference limit for sperm concentration decreased from 20 million/mL to 15 million/mL (95% CI 12-16 million). Total count decreased less dramatically: the 4th edition considered 40 million total sperm per ejaculate normal, whereas the 5th edition considers 39 million (95% CI 33-46 million) total sperm to be the cut-off for normal. Obviously, variability in measuring semen volume could affect the calculation of total sperm count, since this parameter is the product of total volume and sperm concentration. Sperm concentration can be obtained using a variety of counting chambers or hemocytometers. It is not clear whether all of the 1953 subjects were included in the calculation of the 5th edition's parameters; however, the breakdown of the 1953 is as follows: a Neubauer chamber was used in 888/1953 (45%), Makler, Burker-Turk, or Thoma chambers in 165/1953 (8%), and Neubauer, Burker-Turk, Thoma, or Malassez chambers in 900/1953 (46%). While Neubauer improved hemocytometers are discussed at length and recommended for use by the 2010 manual, the specific recommendation is only that a 100 micrometer deep chamber is preferred. Alternative chambers, such as those that fill by capillary action, may be more prone to error because the sperm are not uniformly spaced across the counting area. These should be compared with Neubauer improved chambers prior to incorporation into an andrology laboratory to ensure accuracy. It is well established that there is variation between counting chambers, and a likely source of discrepancies is variability in dilution and in the volume instilled [13][14][15]. Other sources of error include delay in placement of the coverslip (allowing evaporation and overestimation of numbers), miscounting of cells that are not fully mature sperm, and counting an insufficient number of sperm, which increases sampling variability [16].
It is notable that the five laboratories used for the 5th edition stated that they used internal and external quality control measures for every test, but intra- and interlaboratory coefficients of variation were not provided. Of particular relevance to the infertile patient is that some chambers are known to be less reliable at lower concentrations. For example, Makler chambers have been shown to be less reliable than Neubauer improved chambers at concentrations of less than 40 million/mL, and this could be considered when a laboratory is choosing a standard chamber [17].
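Because total sperm count is the product of volume and concentration, any error in the volume measurement carries straight through to the count. The sketch below illustrates this with a hypothetical specimen; the 39 million limit and the 0.9 mL worst-case transfer loss come from the text, while the function name and specimen values are assumptions chosen for illustration.

```python
def total_sperm_count(volume_ml: float, concentration_million_per_ml: float) -> float:
    """Total sperm per ejaculate (millions) = volume (mL) x concentration (million/mL)."""
    return volume_ml * concentration_million_per_ml


WHO_2010_TOTAL_COUNT_LIMIT = 39.0  # million per ejaculate, as quoted in the text
MAX_DECANTING_LOSS_ML = 0.9        # worst-case transfer loss, as quoted in the text

# Hypothetical specimen: values chosen for illustration only.
true_volume_ml = 2.4
concentration = 20.0  # million/mL

count_weighed = total_sperm_count(true_volume_ml, concentration)
count_decanted = total_sperm_count(true_volume_ml - MAX_DECANTING_LOSS_ML, concentration)

print(f"Count with weighed volume:  {count_weighed:.0f} million")
print(f"Count with decanted volume: {count_decanted:.0f} million")
print("Crosses the reference limit:",
      count_decanted < WHO_2010_TOTAL_COUNT_LIMIT <= count_weighed)
```

In this illustrative case the same specimen falls on either side of the reference limit depending solely on how its volume was measured.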

Motility
Total motility (the percent of sperm that are motile) dropped from 50% being the cut-off for normal to 40% (95% CI 38-42%). Again, based on the CI provided in the 5th edition, it is unexpected that more than 95% of the semen samples included in the WHO 5th edition would have been considered abnormal if the WHO 4th edition criteria were applied. It is unclear why such a great discrepancy is seen from one edition to the next, but it does call the applicability of the findings into question, because it cannot be assumed that the most recent version had the most accurate determination of motility. Interestingly, progressive motility was the only parameter that showed an increase in the lower limit of normal. The WHO 1999 manual designated 25% progressive motility as normal, but WHO 2010 uses 32% (95% CI 31-34%) as the lowest limit of normal. Assessment of progressive motility can be subjective; progressively motile sperm are defined as those moving linearly or in a large circle. Non-progressive motility covers all other patterns of movement that do not result in progression, such as swimming in small circles or minimal linear movement. Immotile sperm are sperm without any discernible movement. The current WHO criteria, in acknowledgment of the subjective nature of motility assessment, eliminated speed from the determination of progressive motility. Much like sperm count and concentration, the assessment of motility is highly dependent on accurate dilution, counting, and replication; small errors in pipetting or mixing the specimen could significantly bias results. Of note, current recommendations do not define a specific temperature for assessment (the choices being either room temperature or 37°C), but instead only state that whichever temperature is used needs to be standardized for that particular laboratory. For the WHO 5th edition, 11% (206/1953) of the specimens were analyzed at room temperature, and the remaining specimens were analyzed at 37°C.
While no discrepancy in motility assessment was seen when the five labs were compared, the small number of specimens analyzed at room temperature may limit this comparison. It is likely that 37°C gives a more physiologically relevant assessment of motility; however, it is unclear how many andrology laboratories are equipped with heated microscope stages that would allow motility determination at 37°C [18]. The optimal temperature for standardization remains controversial, as does an improved means of standardizing the distinction between progressive and non-progressive motility.

Vitality
The percent of viable sperm considered normal decreased substantially; the WHO 1999 manual declared 75% to be the lowest limit of normal, while the 2010 manual considered 58% (95% CI 55-63%) to be the cut-off, again with values that would deem every specimen included in the 5th edition to be uniformly defined as abnormal if the 4th edition's cut-off were applied. Vitality is determined by the presence of an intact cell membrane. Both dye exclusion (living cells exclude membrane-impermeable dyes) and the hypo-osmotic swelling test (living cells swell when challenged with a hypotonic solution) are relatively reliable. The only limitation on the choice of vitality test is that dyes must be avoided (and thus the hypo-osmotic swelling test is preferred) in any specimen from which living sperm are to be chosen for intra-cytoplasmic sperm injection. Data from only 1106 men were used in the 5th edition to determine vitality, and only two laboratories were represented. Both laboratories used the eosin-nigrosin method of dye exclusion.

Morphology
The choice of criteria for determining sperm morphology is unquestionably controversial for any laboratory manual that attempts to standardize this parameter. The 4th edition declared that 14% normal forms was the lowest limit of normal, whereas the 5th edition designated 4% normal forms (95% CI 3-4%), as defined by the Tygerberg criteria, to be the cut-off. Notably, the 5th edition used data on sperm morphology that were read at three laboratories, and only used the results from 1788 men. A central laboratory read the specimens contributed by 1493/1788 (83%) men involved in multi-center studies, a second laboratory read the specimens from 206/1788 (12%), and 89/1788 (5%) were read at a third laboratory. It is clear that the vast majority of specimens were read at one central laboratory, and the results from this laboratory drive the morphology reference limit.
Morphology is a controversial parameter because of the subjective nature of its assessment. Of paramount importance is the appreciation that identifying abnormal sperm requires prior knowledge of the normal sperm morphology that facilitates natural fertilization. No studies have systematically analyzed the morphologic features of sperm delivered via intercourse at various points along the female reproductive tract. The Tygerberg criteria were developed by studying the appearance of sperm that are delivered artificially via insemination; this does not provide information that can be extrapolated to natural conception [19]. While these criteria have been validated as being predictive of successful In Vitro Fertilization (IVF), it cannot be assumed that success in IVF translates to success with natural conception [20]. The sperm morphology required to traverse the female reproductive tract after intercourse may well be quite different from the morphology required to survive the manipulation associated with insemination or intra-cytoplasmic sperm injection. Furthermore, the sheer overlap in percent normal morphology seen in fertile, sub-fertile, and infertile men, regardless of which morphologic criteria are chosen, precludes any cut-off from being useful to clinicians. For example, one study found that the mean±SD percentage of normal sperm was 6.2±3.7% in fertile men and 4.1±3.5% in the infertile group [21]. Another study demonstrated that the mean±SD percentage of normal forms using Tygerberg criteria was 6.5±3.9% for fertile couples and 3±2.6% for infertile couples [22]. Clearly, despite the exacting specifications applied by using the Tygerberg criteria, we cannot draw a line between fertile and infertile men when evaluating morphology.
It cannot be underscored enough that cut-offs have no role in any lab parameter that has such a large overlap between fertile and infertile couples.
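The extent of this overlap can be illustrated numerically from the mean±SD values quoted for the first study (6.2±3.7% in fertile men, 4.1±3.5% in the infertile group). The sketch below assumes, purely for illustration, that percent normal forms is normally distributed in each group; this is a crude approximation, since real percentages cannot be negative and are likely skewed, so the outputs should be read as rough indications rather than estimates from the cited data.

```python
from statistics import NormalDist

# Mean and SD of percent normal forms as quoted in the text.
# Normality is an assumption made here for illustration only.
fertile = NormalDist(mu=6.2, sigma=3.7)
infertile = NormalDist(mu=4.1, sigma=3.5)

cutoff = 4.0  # WHO 2010 lower reference limit for normal forms (%)

# Share of fertile men who would be labeled "abnormal" by the cut-off.
fertile_below = fertile.cdf(cutoff)

# Share of infertile men who would be labeled "normal" by the cut-off.
infertile_at_or_above = 1 - infertile.cdf(cutoff)

# Overlapping coefficient: shared area under the two density curves (0 to 1).
shared_area = fertile.overlap(infertile)

print(f"Fertile men below {cutoff}%: {fertile_below:.0%}")
print(f"Infertile men at or above {cutoff}%: {infertile_at_or_above:.0%}")
print(f"Shared area between the two distributions: {shared_area:.0%}")
```

Under these assumptions, a little over a quarter of fertile men would fall below the 4% cut-off and about half of infertile men would fall at or above it, which is consistent with the argument that a single line cannot separate the two groups.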

Clinical Implications
The definition of "normal" as any semen parameter that falls above the 5th centile is not supported by clinical observation. The 5th centile convention is borrowed from the level of statistical certainty typically applied to clinical tests, such as electrolytes and hormones, that are tightly regulated and do not rely on subjective measurement. Furthermore, as noted above, the 5th centile line as a reference limit is arbitrary and does not consider the prevalence of impaired fertility in the general population. In fact, it is well known that up to 40% of infertile men have semen parameters that overlap with those of fertile men [11,23,24]. This degree of overlap calls into question the wisdom of strictly defining "normal" as anything that falls above the 5th centile. The primary evidence against the concept of an arbitrary cut-off is the observation that 100% of the men who provided specimens for the 2010 manual and fell below the 5th centile were fertile and initiated a pregnancy. Clearly, drawing any cut-off from within this group of men is inappropriate because it does not define the infertile population, which is the population that needs an accurate SA. An alternative to the current practice of drawing arbitrary cut-points is warranted.
The goal of defining abnormal semen parameters should not be to precisely define "normal" at the expense of missing the up to 40% of men who are clinically infertile but whose SA results overlap with those of fertile men, because men who have no difficulty initiating a pregnancy do not merit clinical intervention. SA parameters should instead be used to more precisely define "abnormal," so that couples with impaired fertility potential can be identified for potential intervention to improve sperm quality. The risk of classifying a fertile man as infertile is far outweighed by the benefit of accurately identifying infertile men, as fertile men with "abnormal" semen parameters are unlikely to undergo a semen analysis and further evaluation.
The implications of lowering the threshold for normal semen parameters are significant for infertile couples. The SA is often the only measurement that reproductive endocrinologists examine when deciding whether to further evaluate the male partner for infertility. Because of the lowered threshold, many couples are being misdiagnosed and are therefore missing the opportunity to benefit from specialty referral that could expedite their diagnosis and treatment. Furthermore, the treatment options available once a man is seen by a specialist are limited by the definition of a "normal" SA. For example, men with clinical varicoceles may not be offered repair if their semen parameters do not fall below the 5th centile because, by definition, they have "normal" results.
Semen parameter reference values should inspire confidence that they are appropriate and applicable to the relevant clinical population. Many clinicians and patients are stymied by "infertility of unknown origin" in a large portion of infertile couples because there is no identifiable female factor and the male partner has SA results above the 5th centile. It is likely that a considerable number of these patients have a missed male factor that could be identified with more appropriate reference values.

Future Considerations
Several changes need to take place in future iterations of the WHO criteria in order to appropriately identify male factor infertility. Clearly, more diverse patients need to be incorporated into studies that determine reference values. Even more essential to accurately identifying infertile male patients is a shift in the definition of normal. The 5th centile is not an appropriate cut-point. It is arbitrary and does not take into account the significant overlap in values between fertile and infertile men. An alternative approach to the evaluation of semen parameters would be to report the complete set of centiles of semen parameters in relation to fertility potential, and to eliminate the idea of a cut-off from our vocabulary when discussing semen parameters. For example, providing the 5th, 15th, 25th, 50th, and 75th centiles for fertile men may yield more useful information for couples. This approach would show where the male partner falls in relation to "normal" reference values and facilitate incorporation of this knowledge into the clinical picture for a specific couple. Presenting semen parameters as a continuum instead of as a cut-off would allow clinicians to provide prognostic and diagnostic information without misclassifying male patients. Importantly, a separate set of semen analysis parameters derived from men who were not able to naturally father a child, after female factor had been ruled out, would be valuable. Including sub-fertile and infertile men in their own continuum of semen parameter results would demonstrate to patients and physicians alike the significant overlap in semen parameter values that exists among fertile and infertile men, and perhaps prompt referral to a specialist in male infertility. The obvious overlap in normal and abnormal SA results that would be apparent from this side-by-side comparison will hopefully prevent infertile men from continuing to be overlooked and undertreated.
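The centile-based presentation proposed above can be sketched in a few lines. Everything in this example is hypothetical: the function names are illustrative, and the reference values are evenly spaced placeholder numbers rather than data from any study. The sketch is meant only to show the shape of a report that replaces a single cut-off with a set of centiles and a patient's position among them.

```python
from bisect import bisect_right
from statistics import quantiles


def centile_table(values, centiles=(5, 15, 25, 50, 75)):
    """Return the requested centiles of a reference sample as a dict."""
    # quantiles(..., n=100) returns 99 cut points; index k-1 holds the kth centile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {c: cuts[c - 1] for c in centiles}


def centile_rank(values, x):
    """Percentage of the reference sample with a value at or below x."""
    ordered = sorted(values)
    return 100 * bisect_right(ordered, x) / len(ordered)


# Hypothetical sperm concentrations (million/mL) for a fertile reference cohort.
# Placeholder values for illustration only; not taken from any study.
fertile_cohort = list(range(10, 210, 10))

table = centile_table(fertile_cohort)
for centile, value in table.items():
    print(f"{centile:>2}th centile: {value:.1f} million/mL")

patient_value = 30  # hypothetical patient result
rank = centile_rank(fertile_cohort, patient_value)
print(f"Patient is at or above {rank:.0f}% of the fertile cohort")
```

The same two functions could be applied to a second cohort of infertile men to produce the side-by-side comparison described in the text, so that a patient's result is placed within both continua rather than on one side of a line.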

Conclusion
The most recent WHO reference ranges for normal semen parameters appear arbitrary and have limited clinical utility. Careful consideration of these cut-points can provide a useful lesson in the pitfalls of identifying normal laboratory values.