A Novel Web-Based Clinical Research Tool for a Nationwide Research Study

Objective: To describe a web-based clinical research tool designed for a nationwide genetic study aiming to enroll


Background and Significance
The World Wide Web has gradually evolved into a vital resource in medicine, business, and research [1]. The use of the Internet in administering research studies and clinical trials has rapidly gained popularity with the advent of web-based applications used to collect, organize, and analyze large volumes of information in a cost-efficient manner [2,3]. Eysenbach et al. [4] described the CHERRIES criteria, a checklist of recommendations intended to ensure complete reporting of Web-based surveys.
The Duke University Health System created DADOS-Prospective [5], an open-source and CHERRIES-compliant web-based application designed to collect and manage information for clinical and translational trials. DADOS-Survey, which was used for biomedical research, allowed participants to fill out questionnaires for anonymous or non-anonymous studies. Electronic signatures were collected from participants of non-anonymous studies [6].
In 2004, Vanderbilt University created Research Electronic Data Capture (REDCap), followed by the launch of the REDCap consortium in 2006, allowing other universities to collaborate in the future development of web-based research studies [7]. The REDCap consortium is composed of 473 active institutional partners and other institutions from 48 countries. REDCap is a secure, web-based application designed to support data capture for research studies, providing: 1) an intuitive interface for validated data entry; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for importing data from external sources [8]. The Duke University Health System has joined the REDCap consortium and has continued to use DADOS-Prospective only for ongoing studies that were initially created with that software. These web-based research systems offer a number of useful tools for designing and administering questionnaires as well as the ability to monitor participants' activities.
At the time we were developing our system, we searched for active web-based clinical trials nationwide using the website www.clinicaltrials.gov, which was established in 1997 by the DHHS as directed by the Food and Drug Modernization Act [9].
After limiting the search to those studies currently "seeking new volunteers," we found 245 web-based studies. Some of these studies gathered participants' medical history, symptoms, or vitals; ten observational and two interventional studies aimed to collect genetic information. However, none of these studies used a web-based platform that fully addressed our needs for a nationally distributed, purely Internet-based, prospective clinical research genetic study.
There were several key features that were not readily apparent in other systems at the time we developed our program. Firstly, none of these web-based systems were specifically designed to handle studies involving more than minimal risk, which do not qualify for a waiver of the signing of an informed consent document. To date, the DHHS has not allowed electronic signatures for informed consent for research studies, even though this is an accepted practice in business and finance. Genetic studies are "by definition" considered greater than minimal risk, and thus we needed an interactive yet confidential informed consent process, incorporating a signatory process that would reduce the possibility of fraudulent identity. Secondly, we needed a branched logic questionnaire design that would allow for easy and rapid completion by participants. Thirdly, we needed an integrated HIPAA-compliant portal to allow for ease of communication. Finally, we needed a process by which clinical information, particularly digital imaging studies, could be easily and securely transmitted to our research system and appropriately tracked and managed.
The GARM II web-based study (NCT01115387) was launched to enroll nationwide participants with a genetic background of Age Related Macular Degeneration (ARM). GARM II is intended to analyze medical, visual, environmental exposure (i.e. smoking and light exposure), dietary supplement, past and current physical activity, sleep pattern, dietary, and genetic factors in the adult children (ages 49 to 65 years) of patients with ARM. This population group has a 6- to 12-fold increased risk of developing ARM [10]. Participants are expected to complete online questionnaires and to provide saliva samples for DNA analysis. In this paper we describe the web-based platform designed for clinical and translational research.

Materials and Methods
The GARM II study was preceded by the paper-based GARM I study, which commenced 20 years ago at the University of Pittsburgh. Over that time, clinical information has been gathered on more than 700 families of individuals with ARM [11-22]. The GARM II study received UCLA IRB approval on July 7, 2009, and was launched on August 7, 2009. Given our commitment to study conditions that affect vision, the GARM II website was designed from the outset to allow individuals with visual disabilities to expand the text size without disrupting the format of the web pages and to use a proxy to aid in their participation if they are unable to use the website on their own.

The informed consent process
Potential participants can initially review any part of the informed consent documentation without any restrictions. They can also invite other individuals (such as family members) to learn about the study by copying a block of text containing a brief explanation of the study and its URL and pasting it into their own email message, avoiding any disclosure of their identity or that of their contact individual to the study coordinators. Potential participants are encouraged to create an anonymous user account (with a username and password) on the website's home page prior to initiating the online Informed Consent (IC) process. In this fashion, they can post questions to the research coordinators within any section of the informed consent documentation and receive responses when they return to the site in the future. When they log in with their user ID and password, they are presented with the informed consent process within a secure portal that indicates their progress through the different sections. The portal identifies the IC sections that have been completed, the sections with pending questions that the individual has posted, and the responses to their prior questions. As these registered potential participants navigate through the sections of the IC, they can indicate their agreement with or acknowledgement of the information in each section, which establishes that they have viewed and understood each part of the informed consent process. This is analogous to the paper-based method of having participants initial each page of the informed consent to ensure that the IC has been completely reviewed.
This study has two cohorts with different levels of participation in the research, based on whether they have a history of ARM (Group I) or have a parent with ARM or a partner who has a parent with ARM (Group II). The informed consent section allows individuals to specify the group for which they are eligible and provides specific information for each cohort during the IC.
Once the IC is nearly completed, the participant is then asked to provide demographic and contact information including another contact individual who does not live with them. This additional information is invaluable for multi-year prospective studies, since participants may move or change phone information. To date, electronic signatures have not been approved by the Department of Health and Human Services (DHHS) for IC documentation, especially in a study that involves genetic information. We take advantage of this restriction to confirm the participant's identity by mailing a copy of the Informed Consent Signatory Page (ICSP) and the HIPAA Consent Signatory Page (HCSP) to the participant's mailing address. Once the participant returns the forms signed and dated, the enrollment procedure is complete.
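The bookkeeping behind the consent portal described above can be illustrated with a minimal sketch. It is not the FileMaker/PHP implementation used in the study; the class and field names (ConsentSection, ConsentProgress, signed_forms_returned) are hypothetical and only mirror the rules stated in the text: each section is acknowledged individually, posted questions may be pending, and enrollment is complete only after the signed ICSP and HCSP are returned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConsentSection:
    """One section of the online informed consent (IC) document."""
    title: str
    acknowledged: bool = False
    # Question text -> coordinator reply; None means the question is still pending.
    questions: Dict[str, Optional[str]] = field(default_factory=dict)

    def has_pending_question(self) -> bool:
        return any(reply is None for reply in self.questions.values())


@dataclass
class ConsentProgress:
    """Progress of one registered potential participant through the IC."""
    sections: List[ConsentSection]
    signed_forms_returned: bool = False  # mailed ICSP and HCSP received, signed and dated

    def completed(self) -> List[str]:
        return [s.title for s in self.sections if s.acknowledged]

    def awaiting_reply(self) -> List[str]:
        return [s.title for s in self.sections if s.has_pending_question()]

    def enrollment_complete(self) -> bool:
        # Every section acknowledged AND the signed paper forms received.
        return self.signed_forms_returned and all(s.acknowledged for s in self.sections)
```

Keeping the mailed-signature flag separate from the per-section acknowledgements reflects the two-step design in the text: online review alone never completes enrollment.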

Communication with participants
The study has established a toll-free number and a general email address for interaction with research participants. Non-PHI inquiries and notifications of new surveys to be completed can be sent through regular email with instructions and links to the portal, where individual-specific information can be obtained. An inquiry process similar to that of the IC is available for all of the study questionnaires as well as for general questions or concerns (Figure 1). An email notification is sent to the study participant after the study center provides a written answer to their inquiry, which is accessed through the secure portal.

Enhancing participant understanding and experience with the study
Participants have varying levels of medical knowledge as well as computer expertise. Throughout the website, participants can view supplemental information and definitions of certain medical and nonmedical terms to aid in the comprehension (and completion) of the questionnaires. This supplemental information can be provided by active links on specific terms as well as mouse-over boxes that display additional relevant content.
After logging in to the study portal (Figure 2) with their personal username and password, participants can complete the questionnaires at their own pace. It is also possible to appoint a proxy (if needed) during the IC process and/or the completion of all questionnaires. The proxy can be a family member, friend, or one of the research coordinators, as long as they identify themselves within the study.
Participants have to expend considerable effort to complete the numerous questionnaires that are distributed over the period of the study, including some questionnaires that are updated on regular cycles. Thus we have provided the ability for participants to download and print PDF versions of their informed consent document, HIPAA consent, and summaries of their questionnaires that they can use for their own medical care. This export process is currently in beta testing and is also being adapted to produce XML-structured exports of the responses of individuals and groups of participants, in order to construct either material for individual-specific personal health records or de-identified clinical data for the research analysis team.
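As an illustration of what an XML-structured export of this kind might look like, the following sketch builds a small document with Python's standard xml.etree.ElementTree module. The element names, attribute names, and the list of identifying fields are all assumptions made for the example; the study's actual export schema is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical set of demographic fields treated as identifying.
IDENTIFYING_FIELDS = {"name", "email", "mailing_address", "phone"}


def build_export(study_code: str,
                 demographics: Dict[str, str],
                 responses: Dict[str, Dict[str, str]],
                 deidentify: bool = True) -> str:
    """Serialize one participant's data; drop identifying fields if requested."""
    root = ET.Element("participant", {"study_code": study_code})
    demo_el = ET.SubElement(root, "demographics")
    for key, value in demographics.items():
        if deidentify and key in IDENTIFYING_FIELDS:
            continue  # omitted from exports for the research analysis team
        ET.SubElement(demo_el, "field", {"name": key}).text = str(value)
    for title, answers in responses.items():
        q_el = ET.SubElement(root, "questionnaire", {"title": title})
        for question_id, answer in answers.items():
            ET.SubElement(q_el, "response", {"question": question_id}).text = str(answer)
    return ET.tostring(root, encoding="unicode")
```

Calling build_export with deidentify=False would correspond to the individual-specific personal health record, while the default corresponds to the de-identified research export.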

Managing clinical data and materials from other sources
Our study is not solely dependent on information provided by the participants through the web-based portal and questionnaires. We collect saliva samples for DNA extraction as well as eye care records and imaging studies from the subjects' providers. We have a separate system (also in FileMaker 11) used to track the shipping and receipt of saliva collection containers and to maintain information on DNA extraction yields and purity, sample location, and distribution for each participant. This database is linked with the main clinical research database that our coordinators use to monitor participant activities. Research coordinators are able to request and receive digital versions of a participant's eye care records and fundus photographs (as made available by the eye care physician) through a customized interface with the HIPAA-compliant FTP services provided by "YouSendIt.com". These files are then de-identified for masked review, abstraction and/or grading and stored in each participant's unique record on the firewalled database, linked within the data structures for the rest of the participant's personal and clinical information.

Questionnaire design
In order to identify the best questionnaire design, various types of known, validated paper-based questionnaires used in biomedical research were identified. The National Health and Nutrition Examination Survey (NHANES) [NHANES I-III, 2006-2007] was consulted to assist in constructing some of our questionnaire matrices.
The majority of questionnaires are designed with a forced choice format to prevent contradictory responses. Participants must provide a response for every query in the study. This is essential for the level of data completeness that is required for a clinical research study. However, if a question does not apply to them or if they wish to decline to provide a response, they may choose "Not Applicable" or another similar response with the acknowledgement that they have "completed" the question. The question templates allow for the construction of complex logic branching designs, which include matrices so that multiple elements of a specific question can be answered on a single screen. An example of a matrix-styled question is one in which a participant is asked to describe the dose, frequency and duration of the current use of medications in relation to a specific condition. The investigator has the ability to specify the amount of material presented on each screen so that multi-page scrolling by the participant can be minimized.
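The combination of forced choice and branching logic can be sketched as follows. This is a simplified illustration under our own assumptions, not the study's question templates: the Question class, the show_if rule, and the example smoking questions and their choices are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Answers = Dict[str, str]


class Question:
    def __init__(self, qid: str, choices: Sequence[str],
                 show_if: Optional[Callable[[Answers], bool]] = None):
        self.qid = qid
        self.choices = tuple(choices)
        self.show_if = show_if  # branching rule; None means always shown

    def is_visible(self, answers: Answers) -> bool:
        return self.show_if is None or self.show_if(answers)


def missing_responses(questions: List[Question], answers: Answers) -> List[str]:
    """Return ids of visible questions that lack a valid forced-choice response."""
    return [q.qid for q in questions
            if q.is_visible(answers) and answers.get(q.qid) not in q.choices]


# Hypothetical two-question branch: the follow-up appears only for smokers.
QUESTIONS = [
    Question("smoker", ["Yes", "No", "Not Applicable"]),
    Question("packs_per_day", ["<1", "1-2", ">2", "Not Applicable"],
             show_if=lambda a: a.get("smoker") == "Yes"),
]
```

A questionnaire would count as "completed" only when missing_responses returns an empty list, so branched-away questions never block completion while every displayed question still requires an explicit choice, including "Not Applicable".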
Because of the time and effort required for some of the questionnaires, we found it imperative to create a viable shortcut to lessen the time burden. Questionnaires with multiple choice answers are enabled with an "auto-fill" feature in which participants can partially complete portions of the questionnaire and then select to answer the remaining empty fields with a limited set of responses (e.g. "no", "unsure", or "not applicable") without affecting the prior entries. This feature can be applied to a row and/or a column of multiple-choice questions. All questionnaires are date stamped when they are modified or updated.
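The auto-fill behavior can be sketched as a function over a matrix of responses. The representation below (a dictionary keyed by row and column labels, with None marking an empty field) is an assumption made for illustration; when both a row and a column are given, this sketch fills only the cell at their intersection.

```python
from typing import Dict, Optional, Tuple

# (row label, column label) -> response, or None if the field is still empty
Matrix = Dict[Tuple[str, str], Optional[str]]


def auto_fill(matrix: Matrix, fill_value: str,
              row: Optional[str] = None, column: Optional[str] = None) -> Matrix:
    """Fill only the empty cells in the chosen row and/or column; keep prior entries."""
    filled = dict(matrix)  # work on a copy so the caller's matrix is unchanged
    for (r, c), response in matrix.items():
        in_scope = (row is None or r == row) and (column is None or c == column)
        if response is None and in_scope:
            filled[(r, c)] = fill_value
    return filled
```

Because only fields equal to None are touched, responses entered before the auto-fill is applied are preserved, as the text requires.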
One of the challenges of a genetics study is acquiring accurate family histories from each participant and appropriately linking and reconciling these pedigrees from multiple family members in the study. To overcome these challenges, we implemented two separate questionnaires. A one-time questionnaire (Family Medical History Questionnaire) asks for the ages of family members (in terms of 10-year blocks, instead of exact birthdates) and birth orders. The medical family history queries the participant for affection status for specific eye and general health conditions and also allows specific, non-listed conditions to be entered. The medical history information is requested only for family members up to third-degree relatives.
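Recording ages in 10-year blocks rather than exact birthdates can be illustrated with a short helper. The block boundaries used here (0-9, 10-19, and so on) are an assumption; the questionnaire's actual block definitions are not stated.

```python
def age_block(age_years: int, width: int = 10) -> str:
    """Generalize an exact age to a block label such as '60-69'."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    lower = (age_years // width) * width  # start of the block containing the age
    return f"{lower}-{lower + width - 1}"
```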
The second part of the family history process was developed primarily for genetic conditions that are rare and yet might be in a large pedigree in which persons from the same family might be participating in the study but not necessarily know each other or be aware that they are distant relatives. In order to identify such distant relatives within the same family, we created a separate questionnaire (Tell us About Your Family) for which participants are instructed to list as many of their family members whose names they can remember. The participants provide the first and last names, birth order and relationships of each family member, though allowances are made for incomplete responses. Participants are instructed to list the oldest, most distant relatives they can recall for whom they do not have the names of their parents. Resulting generations of family members are added to the list by linking the names of the parents from the previously listed names of individuals. The participants are instructed to keep working their way down their family tree until they get to their own brothers and sisters as well as cousins and finally to their own children. The data from this questionnaire is stored in a separate table with highly restricted access and is exclusively used to look for matches to merge pedigrees and link family members. We intentionally avoid linking identifiers in this table with any of the medical history provided which might disclose Personal Health Information (PHI) of some members of the family who may not have consented to participate in the study.
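One way to look for candidate matches between the family lists of different participants is approximate string comparison of names. The sketch below uses the SequenceMatcher class from Python's standard difflib module; it is not the pedigree reconciliation software used in the study, and the similarity threshold of 0.85 and the participant identifiers are arbitrary assumptions. Any candidate link would still need review by study staff.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_links(family_lists: Dict[str, List[str]],
                    threshold: float = 0.85) -> List[Tuple[str, str, str, str]]:
    """Return (participant A, participant B, name from A, name from B) for near matches."""
    links = []
    for (pid_a, names_a), (pid_b, names_b) in combinations(family_lists.items(), 2):
        for name_a in names_a:
            for name_b in names_b:
                if name_similarity(name_a, name_b) >= threshold:
                    links.append((pid_a, pid_b, name_a, name_b))
    return links
```

Only names are compared, with no medical history attached, which is consistent with keeping this table separate from any PHI.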

System design and software
The clinical management system is a customized database built with FileMaker (v11.0). The web interface has been scripted with PHP and AJAX. While many groups advocate the use of large-scale SQL relational database systems such as Oracle and open source databases such as MySQL, these systems require a considerable amount of resources and technical expertise for implementation and maintenance. FileMaker can be configured to access these databases when desired while still maintaining powerful data management tools. FileMaker allows the creation of multiple related tables (Figure 3 and Figure 4) to handle the data storage in a backend database that presents a user-friendly interface.

Pre-implementation and security testing
Before IRB approval and deployment, the website was extensively tested by our clinical research coordinators. Security testing was accomplished using the IBM Rational AppScan 7.8.0.2 system. The backend of the system was set up with a separate server behind UCLA's firewall. The system is configured with an available backup system to allow for the restoration of the database with minimal delays should any service disruptions occur. The data is stored in a RAID5 array with continual backup.

Implementation issues
Despite beta testing of the software, a number of problems were identified. We discovered that the average computer literacy of our target population (49-65 years old) was lower than expected. Some had trouble navigating through the web pages and identifying hyperlinks. A number of individuals also used computers with outdated operating systems and browser versions that created compatibility issues. There were also a few participants who did not own a personal computer.
In addition, we found that the entire system initially operated at a speed that caused considerable delays for participants because of repetitive logic loops. To rectify these issues and enhance performance, we made several revisions to the software, installed a dual-core server, and tested "tuned" questionnaires. This dramatically improved system performance, and the system now supports a larger set of participants due to the increased ease of navigating the website portal.
The GARM II study experienced incompatibility issues with some participants' web browsers, such as Internet Explorer 6 and 7 (IE6 and IE7). The incompatibility with IE6 remains unresolved due to the nonconformity of that version with current standards. Multiple modifications and additional enhancements to the software have resulted in substantial improvements in speed and usability for participants and research coordinators.

Current enrollment
As of December 2012, a total of 543 participants (359 females, 184 males) have been enrolled in the study, including 31 participants who subsequently withdrew. The reasons for withdrawal include a lack of time in their daily routine, the difficulty and length of the questionnaires, and personal health reasons. A total of 58 participants have been enrolled in Group I (individuals affected with ARM) and 485 participants have been enrolled in Group II as individuals (or the spouses of these individuals) with at least one parent affected with ARM.
As part of our study recruitment, we had an online campaign to increase web-traffic one year after launching the GARM II study.

Discussion
Our web-based research tool complies with HIPAA regulations and addresses several barriers to the use of the Internet for clinical research studies. It provides an invaluable tool for the prospective study of an at-risk ARM population and holds the promise of reaching large numbers of participants, enabling researchers to gather clinical information about ARM-associated conditions and early symptoms. The information that is currently being collected will be used to develop a genetic model that combines environmental and dietary risk factors to predict which at-risk individuals are most likely to develop early signs of ARM. The primary outcome measure will be the phenotype-genotype correlation of ARM in at-risk individuals.
The web-based interface should be viewed as a research tool and not as a substitute for the role of research coordinators and investigators in the study. Web-based applications that incorporate research study questionnaires within their design have the ability to sample large populations, screen potential participants beyond geographical limitations and reduce the need for paper. In addition, the main advantages of web-based studies are the automated integrity of data entry, the greater independence of participants to choose when they want to participate in the research (with some obvious guidelines and restrictions), the speed of data collection, and the reduction of potential human errors in transferring data from paper forms. We encountered two obvious challenges in conducting a research study over the web. The first is that current IRB regulations regarding IC often produce a document that is long and difficult for an individual to read and understand. In our web-based approach, breaking up the IC into multiple sections made it easier for the prospective participant. The second is that, because we do not provide a paper-based alternative to this web-based study, there is without doubt an underrepresentation of individuals who lack the resources for online access or the necessary computer literacy. Only a few of our participants have reported using proxies to assist them with questionnaire completion. We were surprised by how many of our initial participants had very limited computer skills and were using outdated operating systems and browsers.
When conducting a clinical research study, it is highly advantageous for the research staff to be able to build questionnaires that meet the specific needs of the study, create new routines with the database (secondary datasets) that can be integrated with the rest of the research program, and easily search and formulate queries from the database in order to track and manage participant activities. With a modest level of training and experience, many of these needs can be handled directly by the researchers themselves with the use of FileMaker.
While one may have concerns that the integrity of the system might be compromised by such interactions, the administrator functions of FileMaker allow the investigator to tightly control the level of access of the different users of the program.
Security is a crucial concern when sensitive protected health information is exchanged using the Internet. Secure Sockets Layer (SSL) for the GARM II study provides data encryption, transmission and firewall server authentication. We have found that, with the ongoing need to make changes and additions to the system, it has been essential to have a development server that allows the staff to make and test changes without jeopardizing the live system. Ongoing security testing is essential since even minor changes can introduce vulnerabilities to the system.
There are a few planned additions to our web-based tool (Figure 5). One addition is to provide participants with a synopsis of personal health information from the answers to their medical history questionnaires (currently in beta testing), diagnostic test results and abstracted clinical records for use in their own personal medical care. In the future, we plan to be HL7 and SNOMED Health compliant. We are developing a set of XML-based export/import processes to share information with electronic medical records or other clinical studies. We would ask participating research studies to consider including an authorization notice in their IC to share a study participant's information with other future de-identified studies. Thus, this study could further leverage the efforts of multiple studies without undertaking repetitive characterization of the study participants themselves. We have developed software to reconcile pedigree information and hope to employ AI algorithms for searching among the family history data to construct matches among families of distantly related participants even with missing data and misspelled names.
We would welcome the ongoing sharing of future questionnaire sets with other investigators. A major advantage of using existing, standardized questionnaires is that it would facilitate meta-analyses across multiple future studies. This has been a major objective of the NIH-funded PhenX project, which plans to merge genetics and epidemiologic research studies [23].