Subrata Acharya received her Ph.D. in Computer Science from the University of Pittsburgh in 2008 and her M.S. in Computer Engineering from Texas A&M University, College Station, in 2004. She has published over 50 peer-reviewed book chapters, international conference papers, and journal articles in the area of computer and information security. Acharya has obtained significant extramural funding to support her scholarship efforts, including $450K as PI and $230K as co-PI. Of particular note is Acharya’s US patent 7966655 B2, awarded in 2011 with Wang Ge and Greenberg, for "Method and apparatus for optimizing a firewall." Acharya has also developed new courses in the area of health care informatics. She has mentored various students who have appeared as co-authors on her papers, and has supervised numerous undergraduate research projects, master's projects, and doctoral dissertation studies.


Apache Hadoop has the potential to offer powerful and cost-effective solutions for big data analytics in health care systems; however, sensitive data stored within an HDFS infrastructure is an equally attractive target for exfiltration, corruption, unauthorized access, and modification. Pairing Apache Hadoop distributed file storage with hardware-based Trusted Computing mechanisms built on TCG standards has the potential to reduce the risk of data compromise and to maintain compliance with federal and/or state governmental standards. As Hadoop is increasingly used for big data analytics involving sensitive health care data, the exposure of HDFS clusters to such attacks grows. This work presents a novel and robust security posture that addresses the issue by implementing open, standards-based Trusted Computing technology at the infrastructure and application levels. It discusses the motivation for research on this topic, presents a threat model and an evaluation of a targeted Advanced Persistent Threat against HDFS, and addresses a set of common security concerns within HDFS through infrastructure and software involving integrity validation and data-at-rest encryption. To accomplish these goals, technology from the Trusted Computing Group, such as the pervasively available Trusted Platform Module, is used. In addition, the design considerations in building an encryption framework for Hadoop in a trustworthy manner are discussed, along with the performance and security results of experiments on an encryption scheme for Hadoop that uses hardware key protections and AES-NI for encryption acceleration, based on data obtained from a real-world, large-scale (> 400 beds) health care system. This work includes an evaluation of the recently implemented crypto framework for Hadoop and an independent test of the performance claims for AES-NI regarding the mitigation of encryption performance overhead.