Does the Requirement of Readability Testing Improve Package Leaflets? Evaluation of the 100 Most Frequently Prescribed Drugs in Germany Marketed before 2005 and First Time in 2007 or After

Objectives: Based on the “Action Plan 2008/2009 for Improving Drug Therapy Safety” issued by the German Federal Ministry of Health, the Federal Institute for Drugs and Medical Devices (BfArM) launched a study on the effect of readability user tests on the quality of Package Leaflets (PLs). Methods: Based on recommendations from the EU Readability Guideline, a criteria catalogue for the analysis of PLs was set up, with the criteria serving as surrogate parameters for the readability of statements within the PL. 100 of the most frequently prescribed medicinal products in Germany were selected and the readability of their PLs was analysed. The study was blinded. Results: Merely 44% of the 100 most frequently prescribed medicinal products in Germany have PLs with “normal” or better readability. PLs on the market since 2007 show a trend towards improvement when compared with products marketed before 2005. This effect was even more pronounced in the 23 PLs that had been tested as required. Conclusions: The new European legislation in force by the end of 2005 induced a trend towards more usable PLs. On average, however, this effect is barely recognisable. Only new products on the market need to be tested with regard to readability. At the same time, the length of the text increased, which works considerably against the intended improvement. Practice implications: Apart from the requirement that text be as short as possible, with short sentences, and simply and clearly written, other legal requirements influence the length of PLs. These conflicts cannot be resolved as long as the entire SmPC needs to be mentioned in the PL owing to the Medicinal Product Act and liability provisions. Other (technical) solutions should now be legally confirmed so that the content of a PL can be presented with a good design in different ways according to the needs of the different user groups.
*Corresponding author: Klaus Menges, Federal Institute for Drugs and Medical Devices, Kurt-Georg Kiesinger Allee 3, D-53175 Bonn, Germany, Tel: +49 228 207 3458; Fax: +49 228 207 3567; Email: klaus.menges@bfarm.de Received December 27, 2011; Accepted June 08, 2012; Published June 12, 2012 Citation: Beime B, Menges K (2012) Does the Requirement of Readability Testing Improve Package Leaflets? Evaluation of the 100 Most Frequently Prescribed Drugs in Germany Marketed before 2005 and First Time in 2007 or After. Pharmaceut Reg Affairs 1:102. doi:10.4172/2167-7689.1000102 Copyright: © 2012 Beime B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Well-presented information and communication of the intended use of the medicinal product can help minimize Adverse Drug Reactions (ADR) resulting from medication errors such as wrong dosages or non-observance of warnings. The use of medicinal products always poses a relevant risk to patients. Avoidable ADRs take on a special position here. They occur in particular when medicinal products are inadvertently used in other ways than prescribed. The proper understanding of how a medicinal product is to be used safely and effectively is crucial. Important sources of information in this respect are the PLs of the medicinal product [1]. On the other hand, PLs are often criticised by patients [2] as well as by academia for lack of clarity and readability [3][4][5].
According to Directive 2001/83/EC on the Community code relating to medicinal products for human use, medicinal products must be accompanied by a package leaflet. Since 2005, article 59 of the Directive stipulates that "the package leaflet shall reflect the results of consultations with target patient groups to ensure that it is legible, clear and easy to use." The European Commission has published a "Guideline on the Readability of the Labelling and Package Leaflet of Medicinal Products for Human Use" (Readability Guideline) in order to assist applicants and marketing authorization holders in drawing up PLs and documenting the newly required report on consultation with target patient groups, so called readability user tests. The Guideline was first published in 1998 [6] and was last revised in January of 2009 [7] including respective recommendations. In 2002 and in 2006, BfArM published "Recommendations for Drafting of Package Leaflets" [8,9], giving additional guidance to applicants and marketing authorization holders in Germany.
It stands to reason that, over the years, readability studies have improved the quality of package leaflets [10]. Several aspects of the potential usefulness of readability user tests on PLs have been investigated [11,12]. But, as of yet, no systematic study to this end has been conducted. It is for this reason that the BfArM has launched a study to objectively evaluate changes in the average readability of package leaflets of the 100 most frequently prescribed products [13]. The study was part of an action plan for the improvement of medication safety. The results are presented in this article.

Sample
The Scientific Institute (WIdO) of the General Local Health Insurance (AOK) determined the most prescribed German medicinal products in 2008 from a sub-analysis of the pharmaceutical index [14]. The 200 entries on this list were used, but they still included several strengths or pharmaceutical forms under one product name. A stratification was planned based on the date of release at the pharmacy: medicinal products placed on the market before 1 January 2005 (i.e. prior to the legal introduction of readability user tests) and products first placed on the market after 1 January 2007, on the assumption that compliance with the requirement to provide well readable PLs could be confirmed for the latter.
Due to the structure of the pharmaceutical market, a few generic companies dominated the list. This would have biased the sample. Therefore, the following exclusion criteria were used to ensure a certain variation within the sample: Identical products with different strengths or dosage forms were excluded, as these are likely to have identical PLs. No more than 2 products/PLs of the same marketing authorization holder were included in the sample. This reduced the list down to 126 products. Products from centralised procedures were not included as different rules for readability testing have been applied in the past.
The BfArM then determined for each product in the resulting sample whether it had been placed on the market before 1 January 2005 or after 1 January 2007. 59 products were identified in the "2005" group and 67 in the "2007" group. The sample was restricted to the 50 PLs of the most frequently prescribed medicinal products in Germany in each stratum. The marketing authorization holders were asked to provide the corresponding print releases (files in PDF format), and the text versions (RTF format) already available at the BfArM were matched for the text analysis (Table 1).
The 100 PLs were then made available to Diapharm for analysis. The study was blinded as to the selection criteria. In particular, the date of marketing and any previous submission to readability user tests were not disclosed to the researchers.

Readability scales
The recommendations of the EU Readability Guideline [7] were operationalised into objectively measurable criteria or "readability scales". These criteria serve as surrogate parameters for the "readability" of PLs in the Guideline's sense of the word.
It should be noted that the recommendations themselves are not above criticism. Among other points, criticism has been voiced over the advocacy of certain verbal descriptors for communicating the probability of ADRs [15,16], which are part of the recommendations in the Guideline. As these recommendations formed the basis for the authority assessment, they have been considered a reasonable approach, although formal testing for construct, criterion and content validity has not been provided by the regulatory authorities.
The criteria derived from the EU Readability Guideline are:
6. Font (no stylised fonts; similar letters/numbers, such as "i", "l" and "1", can be easily distinguished)
7. Headings used as navigation elements
8. Text design (not "justified" text, but columns)
9. Product ranges (more than one strength or form mentioned in the PL)
10. Print colour (good contrast)
11. Pictograms used in the text (meaning is not misleading or confusing)
12. Symbols used in the text (meaning of the symbol is clear)
13. Paper formats (landscape format preferred)

The quantifiable number of 20 words as a definition of "long" sentences was taken from the 1998 revision of the EU Readability Guideline [6]. The absolute number of words is absent from the 2009 revision of the Guideline.

The following criteria were based on the BfArM's recommendations [8,9]:
14. Repetitive sentences (duplication of identical information)
15. Explanations in parentheses
16. Tables (supporting ADR section)
17. Unusual elements (indicated to avoid technical interference)

As a final addition to these criteria, a statistical readability analysis was conducted [17,18]:
18. Statistical readability analysis (Package Leaflet Readability Index, PLRI)

Every criterion was assessed based on a standardised procedure. The fulfilment of a recommendation, measured in the form of a criterion, resulted in a score. The target value was set equivalent to 0 points. Inadequate fulfilment led to a higher point score, depending on the level of deviation. Where the EU Readability Guideline suggested more than one solution, adherence to the minimum solution was assessed as meeting the target (0 points). All 18 criteria were incorporated into the overall score of the PL with equal shares, as no sound justification for a differentiation was available. The scoring of 2 out of the 18 criteria is explained in order to illustrate the procedure:

Type size
The EU Readability Guideline stipulates that "A type size of 9 points, as measured in font 'Times New Roman', not narrowed, with a space between lines of at least 3 mm, should be considered as a minimum. However, for marketing authorization applications until 1 February 2011, a type size of 8 points, as measured in font 'Times New Roman', not narrowed, with a space between lines of at least 3 mm, should be acceptable as absolute minimum." [7]. Accordingly, the target value (0 points) is set to a type size of 8 points. Lower or higher values translate to the following scores:
• Type size ≥ 9 points: -1 point

Package leaflet readability index
Readability indices express the readability of a text as a single key figure. A number of different readability indices have been developed for different text genres and languages. The most prominent example is probably the Flesch-Kincaid Grade Level readability test, which is used, for example, to estimate the number of years of education generally required to understand a certain text [17].
The Flesch-Kincaid test was developed for texts in the English language. Similar indices have been developed for German texts [19]. For this study, a combination of the Wheeler-Smith Index adapted for the German language (WSI_G) [18] and the QU-Index [19] was chosen:

$$\mathrm{WSI}_G = \left(\frac{\text{words}}{\text{sentences}} \times \frac{\text{3-syllable words}}{\text{words}}\right) \times 10$$

$$\mathrm{Qu} = \sqrt{\frac{\dfrac{\text{3-syllable words}}{\text{words}} \times 100}{\text{sentences}} \times 30} - 2$$

The results of both indices are so-called "school classes" (as are the results of the Flesch-Kincaid Grade Level readability test). This means that the resulting values reference the years of education needed to comprehend the text. A text with a WSI_G or a Qu value of "9" is hence suitable for pupils of the ninth grade. Both indices were developed for general texts and not specifically for the genre of PLs. Although both indices are calculated from similar items, they showed higher variability than expected when first applied to this type of text. Therefore, a combined index of both indices was defined and subsequently validated, called the Package Leaflet Readability Index (PLRI).
Compulsory education in Germany covers a minimum of 9 years. For this reason, the target value (0 points) for fulfilling the "Package Leaflet Readability Index" (PLRI) criterion was set to the PL being suitable for readers with 7.5 to 9 years of education. The following assessment criteria were defined:
• PLRI ≤ 7.50: -1 point
• PLRI 7.50 to 9.00: 0 points
• PLRI 9.01 to 9.50: 1 point
• PLRI 9.51 to 10.00: 2 points
• PLRI 10.01 to 10.50: 3 points
• PLRI 10.51 to 11.00: 4 points
• PLRI 11.01 to 12.00: 5 points
• PLRI 12.01 to 13.00: 6 points
A software-based quantitative content analysis with "Text Quest" was used to assess the text criteria [20]. The software was adapted and validated for use on package leaflets. Layout criteria were assessed by means of the "Pit Stop Pro" software [21].
Additionally, manual assessments of the usefulness of tables and the comprehensibility of pictograms and symbols were carried out in accordance with pre-defined Standard Operating Procedures.
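The two index formulas and the PLRI point scale can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the formulas are transcribed as printed above, and the PLRI value itself is taken as an input because the formula combining WSI_G and Qu is not reproduced in the text. The scale lists 7.50 in two bands and leaves values above 13.00 open; the sketch assumes that exactly 7.50 counts as meeting the target and rejects values above 13.00.

```python
import math


def wsi_g(words, sentences, three_syllable_words):
    """WSI_G as printed: (words / sentences * (3-syllable words / words)) * 10."""
    return (words / sentences) * (three_syllable_words / words) * 10


def qu(words, sentences, three_syllable_words):
    """Qu as printed: sqrt(((3-syllable words / words * 100) / sentences) * 30) - 2."""
    share_percent = three_syllable_words / words * 100
    return math.sqrt(share_percent / sentences * 30) - 2


# Upper bound of each band and the points assigned to it (from the scale above).
_PLRI_BANDS = [(9.00, 0), (9.50, 1), (10.00, 2), (10.50, 3),
               (11.00, 4), (12.00, 5), (13.00, 6)]


def plri_points(plri):
    """Map a PLRI value to points; 0 is the target, lower is better."""
    if plri < 7.50:  # assumption: exactly 7.50 belongs to the target band
        return -1
    for upper_bound, points in _PLRI_BANDS:
        if plri <= upper_bound:
            return points
    # Assumption: the published scale ends at 13.00, so larger values are rejected.
    raise ValueError("PLRI above 13.00 is not covered by the published scale")


print(plri_points(8.68))  # 0: a PLRI of 8.68 lies in the target band
```

Treating each band's upper limit as inclusive also assigns values that fall between two printed bands (for example 9.005) to the next higher band, which is a choice of this sketch rather than something the scale specifies.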

Validation of the criteria catalogue
To ensure that the criteria accurately measure "readability" in the Guideline's sense of the word, the method was tested against readability user tests based on the recommended interview technique [11]. For this validation, 20 package leaflets were assessed using the criteria catalogue before they underwent a readability user test. Modifications were then made to the PLs according to the results of the readability user test, as stipulated by article 59 of Directive 2001/83/EC. The modified PLs were then independently assessed again using the criteria catalogue. The results of this evaluation accurately reflected the modifications that had been made to the package leaflets.
For the additional "Package Leaflet Readability Index" (PLRI), the comparison of PLs before and after a readability user test shows that even small changes of 0.2 points (corresponding to approximately two
The evaluation in accordance with the criteria catalogue delivered an overall score for every PL. A PL that fully meets all criteria developed from the EU Readability Guideline would receive a total score of ≤ 0 points. Based on the validation, deviation from this target value was categorised as follows:
• ≤ 0 points (target value): very good PL
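As a minimal sketch of how such an overall score could be formed, the following Python snippet sums the points of 18 equally weighted criteria and checks the result against the target value. The criterion names and the example points are hypothetical and only illustrate the arithmetic; the thresholds of the remaining categories are not modelled here.

```python
def overall_score(criterion_points):
    """Sum the points of all criteria; every criterion has the same weight."""
    if len(criterion_points) != 18:
        raise ValueError("expected points for all 18 criteria")
    return sum(criterion_points.values())


def meets_target(total):
    """A total of 0 points or less corresponds to the target value."""
    return total <= 0


# Hypothetical PL: 16 placeholder criteria exactly on target (0 points each),
# a type size of at least 9 points (-1) and a PLRI between 9.01 and 9.50 (+1).
points = {f"criterion_{i}": 0 for i in range(1, 17)}
points["type_size"] = -1
points["plri"] = 1

total = overall_score(points)
print(total, meets_target(total))  # 0 True
```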

Results
The measured readability index (PLRI) ranged from 5.97 for an analgesic containing Tolperison to 11.06 for an L-Thyroxin preparation. More than half of all PLs tested (62%) did not show any noticeable negative irregularities: the median PLRI was 8.68. The type sizes used in PLs, however, are often significantly smaller than the "absolute minimum" of 8 points set forth in the EU Readability Guideline.

Results: Marketing "before 2005" / "since 2007"
After the date of initial marketing had been disclosed to the researchers, the PLs were divided into two groups, which were of equal size due to the selection criteria.
Both of these groups contain PLs with "very good" as well as "poor" readability (Figure 1). The group "since 2007" contains 24 PLs with "normal" to "very good" readability; the group "before 2005" contains 21 PLs with "normal" to "very good" readability. There seems to be a slight tendency towards improved readability, but it is not statistically significant (p-value = 0.58). However, PLs with "poor" readability in the group "since 2007" had an even worse average score than the corresponding PLs in the "before 2005" group.
Further analysis of the results reveals that the number of words in a PL correlates with the overall score of the PL. A PL with "very good" readability has an average of 1401 words; a PL with "poor" readability has an average of 2717 words. In other words, the length of the average "poor" PL roughly corresponds to the length of this scientific article. The limitations of this finding are mentioned below.
At the same time, the average length of 2256 words (95% confidence interval: 2003-2509 words) for PLs marketed "before 2005" has increased to 2601 words (95% confidence interval: 2208-2996 words) for newer PLs marketed "since 2007" (p-value = 0.14). This trend has been identified by other researchers as well [22]. The pattern can also be observed for PLs rated "good / normal" and "poor" (Figure 1). Word counts are reduced only in the group of PLs rated "very good".

Results: PLs "with" / "without" readability user test
A further stratification has taken place. The BfArM checked whether or not readability user tests had been done on the PLs in the sample.
Surprisingly, the group of 50 medicinal products placed on the market "since 2007", that is, after such tests became a legal requirement, included only 22 products that had actually undergone readability tests. The remaining 28 medicinal products of this group had not been considered new from a regulatory point of view. Thus, they had not been required to provide a readability evaluation of the PL in accordance with § 22 (7) of the German Medicines Act (Arzneimittelgesetz).
All in all, the readability of the 22 products marketed "since 2007" that had undergone readability testing was significantly better than that of the 28 products without testing, with an average score of 9.9 points compared to an average score of 12.4 points (p-value = 0.049).
In the group of 50 products that had been placed on the market "before 2005", one PL had been readability user tested (Figure 2).
These numbers may hint at a positive trend: among the PLs with a readability user test, those rated "normal" to "very good" make up almost two thirds (60.9%), while those rated "poor" make up a little more than one third (39.1%).

Figure 2: Share of PLs with "very good", "good/normal" and "poor" readability in the groups "with readability user test" (n=23) and "without readability user test" (n=77) (95% confidence interval given).

Despite readability user tests, long sentences were frequently used, and the texts were spiked with many technical terms and explanations in parentheses. None of this improves readability.

Conclusion
The analysis used indicative parameters extracted from the EU Readability Guideline, which can give a hint of overall improvement in line with the experience gained from readability user testing reports. The intent of the study was not to evaluate whether the content of a given PL is accurate, medically important or even useful for the patient. Nor was it the intent to repeat any readability user testing.
The method chosen has some limitations. The results cannot be extrapolated to identify the items most relevant for improving readability or the information that would have the highest impact on avoiding risks. A readability user test with test subjects is the appropriate method for that purpose. The length of the text, the number of long sentences and the Readability Index may overlap in part and therefore presumably give too much weight to text length and word count. This works against the hypothesis, in that readable PLs may have been rated worse. On the other hand, patient groups are always concerned about the length of the PL, which underpins the importance of the criterion. Even when complex products need to be explained, short sentences are preferred, and repeating the same information several times in the PL should be avoided.
The scoring of the criteria was not evaluated in depth, as the entire approach was planned to be explorative. To achieve statistically more reliable figures, the sample size would have needed to be increased. However, the study provides the basis for a sample size calculation, which was missing before.
The results of the study show, however, that the path taken by the European Commission in demanding readability user tests is generally suitable for improving readability of PLs. Almost two thirds (60.9 %) of the PLs that had passed through a readability user test demonstrated "normal" to "very good" readability. Almost two thirds (61.0%) of the PLs that had not been tested showed "poor" readability.
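The two percentages quoted here can be cross-checked against the group sizes given with Figure 2 (23 tested and 77 untested PLs). The short Python sketch below back-calculates the implied numbers of PLs; these counts are inferred from the rounded percentages rather than reported in the article, and the function name is hypothetical.

```python
def implied_count(share_percent, group_size):
    """Back-calculate the number of PLs behind a reported percentage."""
    return round(share_percent / 100 * group_size)


TESTED, UNTESTED = 23, 77  # group sizes reported with Figure 2

tested_normal_or_better = implied_count(60.9, TESTED)      # inferred, not reported
untested_poor = implied_count(61.0, UNTESTED)              # inferred, not reported
untested_normal_or_better = UNTESTED - untested_poor

overall_share = (tested_normal_or_better + untested_normal_or_better) / (TESTED + UNTESTED)
print(tested_normal_or_better, untested_poor, f"{overall_share:.0%}")  # 14 47 44%
```

Under these inferred counts, the overall share of PLs with "normal" or better readability agrees with the 44% stated in the abstract.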
In the subgroup of 50 products that have been on the market since 2007, 22 PLs underwent a readability user test. These PLs are significantly more readable than the 28 PLs that have not been readability tested (p-value = 0.049). However, it has also been shown that the number of PLs assessed as "poorly readable" is very high, even for these newer medicinal products. Additionally, the text of PLs of medicinal products is getting longer (p-value = 0.14). This is evidently a long-term trend, but with respect to readability it is counterproductive. For good readability, a text should be as short as possible, with short sentences, and simply and clearly written. With increasing complexity, texts become less comprehensible and risks concerning drug safety may increase.
The number of newly marketed products that cannot be considered genuinely new and whose PLs have not been tested was unexpectedly high. In conclusion, products that are new on the market may be new only in the distribution chain, while the marketing authorisation is considerably older and evidently not in line with current legal requirements.

Implications
The study has led to three main findings. First, the institutes that carried out readability user tests on PLs rated "poor" in this study have unquestionably done an inadequate job. Otherwise, the issues noted in this study would have been noticed and the PLs corrected accordingly. It has been observed that, in many cases, attention is paid to meeting certain percentage values needed for the formal acceptance of the readability user test by authorities rather than to real improvement of the PL [10]. Our study has revealed the consequences of this approach.
Second, many frequently administered products on the market will never be tested via a readability user test under the present legislation. These frequently used products have been on the market since before 2005 and are therefore not covered by the EU Readability Guideline. What will happen with all these older package leaflets?
Third, in many European marketing authorization procedures (MRP, DCP), the competent national authorities often refer to old originator text rather than using PLs that have been evaluated by means of readability user tests. This corroborates the perception of readability user tests as a formal necessity, and it prevents effective improvements to comprehensibility. As it happens, the EMA has abandoned its initially restrictive stance on the identity of PLs between the originator and the generic drug manufacturers for exactly this reason [23].
Legislation gives authors clear specifications pertaining to structure and content. But for pharmaceutical companies, user-friendliness is not the only criterion for PLs: legal specifications regarding liability also play a crucial role. The resulting conflicts cannot be resolved without another legal provision. Other (technical) solutions should now be legally confirmed so that the content of a PL can be presented with a good design in different ways according to the needs of the different user groups.