A collaborative effort by eleven Virginia public research universities.
Virginia ACCORD (Assuring Compliance for Computing and Research Data) is a new cyberinstrument project being developed through a partnership between eleven public research universities across the Commonwealth of Virginia. The goal of ACCORD is to democratize access to advanced research computing, enabling researchers to participate in the data revolution regardless of their institutions' current computing infrastructure capabilities.
PI: Dr. Ron Hutchins, Vice President for Information Technology, University of Virginia [BIO]
Co-PI: Dr. Scott Midkiff, Vice President for Information Technology & Chief Information Officer, Virginia Polytechnic Institute [BIO]
Co-PI: Dr. Deborah Crawford, Vice President for Research, George Mason University [BIO]
Co-PI: Dr. Masha Sosonkina, Professor of Modeling, Simulation, & Visualization Engineering, Old Dominion University [BIO]
Co-PI: Dr. Thomas Cheatham, Director of the Center for High Performance Computing, University of Utah [BIO]
ACCORD Program Manager: Dr. Tho Nguyen, Office of the Vice-President for Information Technology, University of Virginia [BIO]
ACCORD Technical Advisor: Dr. Andrew Grimshaw, Professor of Computer Science, University of Virginia [BIO]
The Community Advisory Board (CAB) is tasked with reviewing and, when appropriate, providing feedback on instrument technical design decisions, project progress, and governance structure. The CAB also serves as a liaison to broaden the community of supporters and collaborators beyond the Virginia partners.
Von Welch, Director, Center for Applied Cybersecurity Research, Indiana University
Inder Monga, Executive Director, Energy Sciences Network (ESnet)
Ruth Marinshaw, CTO-Research Computing, Stanford University
Tom Lehman, Director of Research, Mid-Atlantic Crossroads (MAX)
Richard Starr, Research Scientist for Protected Health Data, Georgia Tech
Today's scientific enterprise is collecting, storing, and analyzing increasingly diverse data types from many different sources. The need to protect data in order to comply with applicable laws and contract agreements has become a burden for researchers. While many institutions have invested in specialized cyberinfrastructure to support priority research areas (e.g., HIPAA-compliant clusters supporting medical research), researchers working with other types of sensitive data are often left to ensure compliance on their own. The lack of access to high-performance and security-compliant cyberinfrastructure can severely limit or even prevent researchers from undertaking projects where data protection demands auditable security implementations. This challenge is exacerbated at smaller, minority-serving, or teaching-oriented institutions, excluding them from many major research opportunities. The problem will only grow as the data revolution continues.
To address these challenges, this proposal aims to develop ACCORD, a shared research computing cyberinstrument hosted at the University of Virginia, providing access to researchers at ten universities across the Commonwealth of Virginia. ACCORD integrates computing resources with security mechanisms to provide research computing services to scientists working with diverse types of sensitive data. Guided by the individual project's Data Usage Agreement, ACCORD delivers the appropriate compute infrastructure with necessary security components to protect data and meet compliance requirements while avoiding needless burden on the analytics and discovery. ACCORD automatically documents all user activities, workflows, technical implementations, and execution logs for auditing support.
INTELLECTUAL MERIT: The proposed ACCORD cyberinstrument will underpin a new research agenda by offering new capabilities to support research efforts involving sensitive data that are currently highly difficult or impractical for researchers to pursue. Through ACCORD, scientists will be able to break new ground in collaborating with government and industry partners on projects where the demand for data protection must be met with clearly documented and verifiable mechanisms. New multi-disciplinary research projects bringing together diverse sensitive datasets will be able to collect, store, and analyze data on a single instrument, enabling new levels of analytics and discoveries.
BROADER IMPACT: The Virginia ACCORD cyberinstrument will support both research and research training at 10 Virginia public research universities, including two HBCU institutions: Virginia State University and Norfolk State University. Regardless of the cyberinfrastructure status of their home institution, researchers will have access to computing resources with documented capabilities and services, democratizing research participation and research training opportunities across the Commonwealth. The ACCORD instrument design and community collaboration model can be transferred to other consortia and regional communities. The ACCORD team will distill the technical knowledge and best practices learned into new training modules in data science and cybersecurity, enriching curricula with a stronger perspective on data protection compliance.
As dictated by the scientific process, research across the sciences relies on data to formulate hypotheses and substantiate new discoveries. Protecting research data is a critical task for many reasons. At the most basic level, research reproducibility mandates that data must be carefully tracked and accounted for. At the other end of the spectrum, researchers must safeguard data to preserve persons' rights to privacy and security, defend public interests, and protect industry's trade secrets. Simply put, in order to maintain research integrity, all research data must have documented protection at the appropriate level. Therefore, ACCORD is being developed as an inherently “secured” system, providing tools and processes that can be part of an end-to-end security implementation.
ACCORD considers all research data to be sensitive to some degree and implements a security baseline to protect all data hosted on the system. Computation on ACCORD will be in the general categories of High-Throughput Computing (HTC) and High-Performance Computing (HPC). Since ACCORD's primary mission is to broaden the community of researchers, the system is not prioritized to support HPC jobs (there are many other research HPC systems available in the community for massive jobs requiring HPC resources). ACCORD is intended to support application research. Therefore, the system may integrate accelerators (to be determined by needs), allowing researchers to implement specialized algorithms. Overall: the ACCORD cyberinstrument is built as a protected high-throughput (HTC) system with high-performance (HPC) and specialized hardware allocations available.
As ACCORD's mission is to broaden research participation, identity and access management (IAM) across the consortium partner institutions is a critical component of the instrument. ACCORD IAM involves both mechanisms and policies that balance lowering the barrier of effort for users against maintaining an acceptable baseline level of system security.
We recognize that researchers are least inconvenienced when they are able to use their campus identity for authentication. Therefore, we strive to provide a federated IAM service to ACCORD users in the consortium. The key to leveraging campus identity within ACCORD is that we are a fixed community (Virginia ACCORD Consortium) and able to agree on common standards. ACCORD will use the InCommon/REFEDS authentication context assertion to indicate that a sufficient level of campus multi-factor authentication and other common ACCORD identity processing has been completed to sufficiently authenticate the researcher. A new ACCORD-specific assertion can be created to indicate a higher level of assurance for more sensitive data applications.
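As an illustration of the assurance check described above, the following minimal Python sketch compares the authentication context asserted by a campus identity provider against the level a project requires. It is not the ACCORD implementation. The `Assertion` class, the `meets_requirement` function, the ranking table, and the `ACCORD_HIGH` URI are all hypothetical names introduced for this example; only the REFEDS MFA profile identifier is a published value.

```python
from dataclasses import dataclass

# REFEDS MFA profile identifier (a published REFEDS value).
REFEDS_MFA = "https://refeds.org/profile/mfa"
# Hypothetical ACCORD-specific context for more sensitive data (placeholder URI).
ACCORD_HIGH = "https://accord.example.org/assurance/high"

# Assumed ranking: a higher rank satisfies any lower requirement.
ASSURANCE_RANK = {REFEDS_MFA: 1, ACCORD_HIGH: 2}


@dataclass(frozen=True)
class Assertion:
    """The subset of a campus identity-provider response used in this sketch."""
    user: str
    home_institution: str
    authn_context: str


def meets_requirement(assertion: Assertion, required_context: str) -> bool:
    """Return True if the asserted context is at least as strong as required."""
    # Contexts that are not in the table (e.g., password-only) rank 0.
    asserted = ASSURANCE_RANK.get(assertion.authn_context, 0)
    required = ASSURANCE_RANK[required_context]
    return asserted >= required
```

Under these assumptions, a researcher who signs in with campus multi-factor authentication satisfies the baseline requirement but not the hypothetical higher-assurance one, which would need the additional ACCORD-specific assertion.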
The Virginia ACCORD consortium leverages the Mid-Atlantic Research Infrastructure Alliance (MARIA) network to support access for users at member institutions. Participation in MARIA puts Virginia institutions among the best-connected institutions to Internet2, the largest research and education (R&E) network in the US. The MARIA 100G connections converge at the Mid-Atlantic Research and Education Exchange (MREX), operated by Virginia Tech, at a strategic location in Ashburn, VA. The MREX provides an information exchange and shared access to Internet2, the Energy Sciences Network (ESnet), federal research networks, commodity Internet services, and content services. The Internet2 Network also co-locates a major global interconnection point at the MREX location in Ashburn.
Cyberinstruments are vulnerable to well-meaning users just as much as malicious actors. Therefore, the ACCORD cyberinstrument implements an architecture that provides both mechanisms and policies to protect system integrity. The security architecture (see figure to the right) is designed to support secured processes for moving data into and out of the system.
The ACCORD security architecture comprises a secured environment and key mechanisms to facilitate moving data. The first component of the secure data movement process is a Data Transfer Web Application that enables users to move data into the secured environment. The web application maintains a pre-approved port list, modifiable only by an ACCORD administrator, that it consults whenever a data transfer is initiated. In response to a file transfer request, the web application opens a pre-approved port just before calling GLOBUS through the GLOBUS APIs to execute the transfer. When GLOBUS completes the transfer, the web application closes the opened port, minimizing the attack surface of the DMZ. The second component is a data transfer node (DTN) inside the system. The DTN does not connect to any resource inside the secured environment; it connects only through the DMZ to a data source. The third component is secured storage that is mounted on both the ACCORD cluster and the DTN.
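The open-transfer-close sequence for the DMZ port can be sketched as a Python context manager. This is a minimal illustration under stated assumptions, not the Data Transfer Web Application itself: the port numbers, the `open_ports` set standing in for firewall state, and the `submit` callable standing in for the GLOBUS API call are all hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Set

# Hypothetical pre-approved port list; per the text, only an administrator edits it.
APPROVED_PORTS: Set[int] = {50000, 50001}
# Stand-in for the DMZ firewall state (an assumption for this sketch).
open_ports: Set[int] = set()


@contextmanager
def transfer_port(port: int) -> Iterator[int]:
    """Open a pre-approved port for one transfer and always close it afterwards."""
    if port not in APPROVED_PORTS:
        raise PermissionError(f"port {port} is not on the pre-approved list")
    open_ports.add(port)  # placeholder for the real firewall change
    try:
        yield port
    finally:
        open_ports.discard(port)  # closed even if the transfer raises


def run_transfer(port: int, submit: Callable[[int], str]) -> str:
    """Run `submit` (a stand-in for the transfer call) while the port is open."""
    with transfer_port(port) as p:
        return submit(p)
```

Placing the close step in a `finally` clause means a failed or interrupted transfer cannot leave the port open, which is the property the attack-surface argument depends on.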
Data movement into and out of the system will be controlled by a distributed management model involving both mechanisms and policies that are clearly documented in detail. Depending on the project’s data usage agreement, users may or may not be able to move data directly (using GLOBUS) out of ACCORD to another system. ACCORD implements a third-party approval process for permitting a user to move data out of the system. The administrator can assign approval privilege to an account on ACCORD. The approver can be the project PI, the project data manager, or the data owner. In all cases, users wishing to move data out are required to demonstrate competency (through a training process determined by the institution) in the compliance requirements for their data.
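The export rules in this paragraph can be read as a small decision function. The Python sketch below is one possible reading, not ACCORD's actual policy engine: the `Project` fields and the `may_export` function are hypothetical, and the rule that approvers cannot approve their own request is an assumption drawn from the phrase "third-party".

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Project:
    name: str
    # Taken from the project's data usage agreement.
    direct_export_allowed: bool
    # Accounts the administrator has granted approval privilege.
    approvers: Set[str] = field(default_factory=set)
    # Users who completed their institution's compliance training.
    trained_users: Set[str] = field(default_factory=set)


def may_export(project: Project, user: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether `user` may move data out of the system."""
    # Training is required in all cases.
    if user not in project.trained_users:
        return False
    # Some agreements permit direct transfers without further approval.
    if project.direct_export_allowed:
        return True
    # Otherwise a different, privileged account must approve (assumed rule).
    return (
        approved_by is not None
        and approved_by in project.approvers
        and approved_by != user
    )
```

For example, a trained user on a restricted project is refused until an account holding approval privilege, such as the project PI, signs off on the request.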
A key component of a security compliance plan is capturing the appropriate documentation to support compliance auditing. ACCORD automatically preserves configuration of workflows, user activities, job execution records, and logs of data movement for each project to corroborate the project’s compliance case.
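To make the kinds of records concrete, here is a minimal Python sketch of a per-project audit log covering the four record types named above. The `AuditLog` class, the category names, and the JSON field names are hypothetical choices for this example; the source does not specify how ACCORD stores its records.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

# The four record types named in the text (identifiers are assumptions).
CATEGORIES = {"workflow_config", "user_activity", "job_execution", "data_movement"}


class AuditLog:
    """In-memory, append-only log that keeps one JSON line per event, per project."""

    def __init__(self) -> None:
        self._lines: Dict[str, List[str]] = {}

    def record(self, project: str, category: str, actor: str, detail: str) -> str:
        """Append one timestamped event and return the stored JSON line."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown audit category: {category}")
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "project": project,
            "category": category,
            "actor": actor,
            "detail": detail,
        }
        line = json.dumps(entry, sort_keys=True)
        self._lines.setdefault(project, []).append(line)
        return line

    def for_project(self, project: str) -> List[dict]:
        """Return every recorded event for one project, oldest first."""
        return [json.loads(line) for line in self._lines.get(project, [])]
```

Grouping records by project mirrors the text's point that the documentation is assembled for each project's compliance case rather than for the system as a whole.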
The ACCORD cyberinstrument is designed to support computation on sensitive data, a need that spans all disciplines and applies to any research institution. The ACCORD consortium of 10 public research universities in Virginia engages in a vibrant agenda of research and research training in all fields of science. It is impractical in the available space to describe in detail the large number of research projects that have expressed interest in using ACCORD, and we believe that once the instrument is fully deployed the use cases will continue to grow.
Below, we highlight a subset of ACCORD use cases to demonstrate coverage across research disciplines and among our consortium members.
Virginia Tech (VT):
Project 1: Professor Annie Pearce (School of Construction) and Professor Walid Saad (Electrical and Computer Engineering) are collaborating on a project titled Sustainable water infrastructure networks: cooperation, investment, and decentralization. In this project, Pearce and Saad work with municipalities to model and analyze water infrastructure networks for the purpose of recommending new models of investments, collaborations, and decentralized system design. This effort involves sharing of data between the researchers and municipalities. Municipal partners are concerned about revealing system details that could increase the vulnerability of critical infrastructure to human threats, including specific locations and attributes of in-place infrastructure components. Additionally, they also have privacy concerns about sharing resource consumption data associated with individual households and customers. By using the ACCORD cyberinstrument, Pearce and Saad can demonstrate to municipality partners security measures being taken to protect their data.
Project 2: Peter Sforza (Dept. of Geography and Dept. of Agroforestry) is working on a project titled Smart Farm: Development of Personal Spatial Data Infrastructure (pSDI) for Commercial Farm Data use in the Calibration and Validation of Agricultural Simulation and Forecast Models. Modeling and simulation of farm level data (e.g., the Virginia Grape Production Survey data) would greatly benefit the improvement, validation, and calibration of existing models and indices. Availability of this service would also aid the development of new models based on large scale, region/nation-wide data. Farm level data is considered as sensitive data. Research computing cyberinfrastructure with security compliance capabilities will enable large scale engagement with agricultural communities that is difficult for university and commercial organizations to achieve. ACCORD is positioned to handle the data sharing and curation process, as well as to secure the modeling and simulation performed with the data.
Old Dominion University (ODU):
Project 1: Professor Holly Gaff is a mathematician in the Department of Biological Sciences. She is working with the Centers for Disease Control and Prevention (CDC) to develop simulations and mathematical models to understand the pattern of disease spread in space and time. Her research focuses on ticks and tick-borne diseases. Dr. Gaff recently secured a contract with the CDC to modernize and extend the capabilities of LYMESIM, a computer program for simulating the spread of ticks and Lyme disease in a given environment. To carry out this effort, she has to work in a strictly secure environment as specified by the CDC. The current level of security within the ODU research infrastructure does not support her needs. Presently, the work has to be done on an isolated computer not connected to the network. The ACCORD cyberinstrument can provide a secure high-performance computing environment to support both the security and the computational needs of the project.
Project 2: Dr. Michael Robinson is the Director of the ODU Center for Innovative Transportation Solutions (CITS). He led VMASC advances to the RtePM evacuation simulation, a tool that helps emergency managers estimate the time required to evacuate a geographical region due to natural or man-made disasters. RtePM includes sensitive geospatial information from the Department of Homeland Security. Highly realistic simulations of such an evacuation, which include more parameters and various scenarios, require the use of a secure HPC infrastructure. This simulation can potentially generate a large volume of text data (on the order of many GBs to nearly 1 TB). The ACCORD cyberinstrument is well-suited for Dr. Robinson to continue to expand this impactful research effort.
George Mason University (GMU):
Project 1: Professor Huzefa Rangwala is developing a prototype software for providing transnational crime insights. Using open source indicators such as news feeds and social media, and advanced data analytics, this project seeks to provide analysts and law enforcement partners a tool to: (a) predict crime levels (intensity) in various cities; (b) identify precursors associated with these events; (c) correlate heterogeneous crime-events such as narcotics trafficking and gang migrations to the activities of gangs, and (d) provide a model for predicting the next opioid crisis in the United States. This project requires access and sharing of sensitive, non-public, governmental data as well as significant computational resources. Additionally, ACCORD will enable researchers at two consortium members (GMU and VT) currently collaborating on this project to work together easily and securely.
Project 2: Dr. David Weisburd, Professor in Criminology, Law and Society in the College of Humanities and Social Sciences is working with Dr. Alese Wooditch at Temple University to develop an agent-based model (ABM) that will assess the impact of the Department of Homeland Security (DHS) investigations on communities. The project will collect and use data from DHS that will allow the researchers to conduct case studies. Such data is very likely to contain sensitive information (e.g., personal records) and needs to be protected up front. ACCORD would permit investigators to have access to significant computing capacity, while providing sponsors with assurances that the data will be appropriately safeguarded.
College of William and Mary (W&M):
Project 1: Dr. Dan Runfola, Director of the Data Science program, was approached by the World Bank in early 2017 to build a geospatial data portal similar to the Geoquery site (http://geoquery.org) as part of the AidData program. However, Dr. Runfola did not pursue the project because of constraints the World Bank mandated on its datasets. With the availability of the ACCORD cyberinstrument, Dr. Runfola’s team could leverage its customizable security protocols to overcome the data protection complexity of this project, and would likely be able to take it on.
Project 2: Dr. John Delos, Professor of Physics at W&M, is collaborating with Drs. Randall Moorman and Karen Fairchild at UVA to develop a predictive tool for neonatal intensive care. This project uses anonymized HIPAA data of neonatal infants to show that some routinely observed vital signs (such as heart rate, oxygen saturation, etc.) can be analyzed to obtain predictive information – e.g., what is the likelihood that this patient will come down with sepsis in the next 24 hours, or need emergency intubation. For many reasons, while data is deidentified, it still needs to be protected. The ACCORD cyberinstrument would offer two major improvements to this collaboration. First, collaborators would be able to directly access the data since appropriate HIPAA protocols could be implemented at both UVA and W&M. Second, the analysis could be easily performed on ACCORD without having to move data outside of the system.
Virginia Commonwealth University (VCU):
Project 1: Dr. Jennifer Fettweis is investigating microbiome transmission between mothers and their offspring and risk for development of obesity. The study leverages an existing cohort established in an ongoing integrative Human Microbiome Project (iHMP) study. This pilot study will include the prospective collection and analysis of stool and rectal samples from children at age 3 years as well as analysis of existing maternal samples previously collected in the third trimester of pregnancy. The data for this study needs to be secured from the collection stage throughout its analysis and archiving/destruction. The ACCORD cyberinstrument can provide a single platform that supports the data handling through its entire lifecycle.
Project 2: Professor Gregory Buck of VCU is investigating microbial biofilms, which occur ubiquitously in the environment and within host organisms. An understanding of the physicochemical properties of biofilms would let us harness their benefits and mitigate their harm. Researchers are employing a high-throughput approach to quantify spatial and physicochemical properties in ex-vivo biofilm samples in response to controlled environmental parameters. They will compare biofilms from three niches: the buccal epithelial biofilm, subgingival plaque in periodontal disease (PD), and the vaginal epithelial biofilm, which occurs during bacterial vaginosis (BV). A subset of the data will need to be protected due to its source. ACCORD can serve as a single tool that meets both the computation and protection needs for all data associated with this project, removing a major burden from the researchers.
Radford University (RU):
Dr. Victoria Bierman in the School of Nursing is addressing the critical shortage of primary care providers in rural America by providing advanced training for Family Nurse Practitioners (FNP) and Psychiatric Mental Health Nurse Practitioners (PMHNP). These professionals provide critical services for underserved populations, including veterans, in Southwest Virginia. The advanced training requires sharing of course material from multiple sources (hospitals, clinics, military, health departments, and other training programs). This data needs to be protected even when it is only used for training. The ACCORD cyberinstrument is both capable and convenient for Dr. Bierman’s project because of its ease of access for students (i.e., they can use their own school credentials on the system).
University of Virginia at Wise (UVA-Wise):
Margaret Tomann is the Program Manager for the Healthy Appalachia Institute, which is an initiative to support economic development in the underserved population of Southwest Virginia. A popular program under this institute is the Undergraduate Research Fellowship program, which supports undergraduate research in Public Health (either as a major or a minor added on to their studies). In addition to helping to protect population and health data for undergraduate research, the ACCORD cyberinstrument will also enable students to train on security and compliance issues.
James Madison University (JMU):
Dr. Nick Swayne at JMU’s Center for Genome and Metagenome Studies (CGMES) is working on innovative, leading-edge research and research training in the methods and principles of genomics, proteomics, and bioinformatics for students at all levels. The data involved consists primarily of pre-publication genomic and proteomic datasets generated on the JMU campus, generated at collaborating research institutes, or purchased from private commercial vendors. These data are being used for ongoing research and instructional purposes for both undergraduate and graduate students. Protecting this data is critical to preserve data integrity, support research reproducibility, and protect researchers’ intellectual property. Should it become available, the ACCORD instrument will be used widely by at least 16 JMU researchers and dozens of students annually.
University of Virginia (UVA):
Project 1: Professor Peter Beling, Chair of Systems Engineering Department, leads an NSF I/UCRC site on Visual and Decision Informatics (CVDI). Prof. Beling is collaborating with Capital One on analyzing financial data to detect fraud. Even though the data has been “cleaned”, the industry partner is still wary of giving away client information, business strategies, or other sensitive “secret sauce”. With the availability of the ACCORD cyberinstrument, Prof. Beling can present a clear security plan with detailed documentation to Capital One and come up with a workflow that is acceptable to both sides. Similarly, ACCORD is positioned to advance the industry’s overall robustness and enhance business development in the Commonwealth of Virginia.
Project 2: Professor Matthew Gerber is developing “Sensus”, a cross-platform, general-purpose system for mobile crowdsensing in human-subject studies. This is a versatile platform that can be used in many applications. Currently, Prof. Gerber is funded by the DARPA Warfighter Analytics using Smartphones for Health (WASH) program to deploy Sensus. There is also strong potential that, if successful, Sensus can be scaled up to support the entire DARPA-WASH program. As a crowdsensing platform, Sensus captures crowd information and stores it for later analysis. Because Sensus can collect multiple modalities of population data, the data must be protected up front and throughout its existence on the platform (or until deemed not sensitive). The ACCORD cyberinstrument can be the enabling tool to support the backend of Sensus data handling.
Research training: The ACCORD consortium includes minority-serving and teaching-oriented institutions where research training is an ever more critical mission. As stated by the CIO of James Madison University: “Creating and supporting undergraduate research programs involves a philosophy that is fundamentally different from the basic research done by research-intensive doctoral institutions.” The training program must be accessible to both instructors and students. Popular research training topics include health sciences (e.g., Radford U., UVA at Wise, VCU), data science (e.g., GMU, NSU, ODU), cybersecurity (e.g., ODU, VSU, VCU), and natural and life sciences (e.g., JMU, ODU, W&M). The accessibility and secured design of the ACCORD cyberinstrument will transform research training in the State of Virginia.
Virginia Smart City Actuator: ACCORD supporting business startups "...The Smart City Actuator program is an innovative initiative by the State of Virginia to catalyze new solutions in smart cities that will drive the State's economy and benefit the public. Many smart cities startups are analyzing data to inform important development decisions in public safety, education, health, and the environment. Handling of "smart city" data, which often comprises both publicly available and protected data, imposes a demand on cyberinfrastructure and personnel expertise that is difficult to meet by startup companies. The ACCORD platform is a potential game-changing partner that supports the sensitive data protection tasks for startups, freeing up resources and allowing them to focus on innovating their product..." David Ihrie, CTO, Virginia Center for Innovative Technology
Center for Visual and Decision Informatics: ACCORD underpinning industry-university research collaborations "Members of the financial industry compete for business; however, they are also very interested in collaborating on topics of mutual interest such as combatting financial fraud. Sharing data and collaborating with researchers allow companies to better understand and detect fraud; but data sharing must be balanced with not giving away client information, business strategies, or other sensitive "secret sauce" information. The Center for Visual and Decision Informatics at UVA is a National Science Foundation Industry/University Cooperative Research Center. CVDI works with partners such as Capital One to research and develop novel, privacy-preserving fraud detection and deterrence techniques. The ACCORD instrument is a much-needed platform that enables these partners to share data with CVDI researchers in a secured and controlled manner. This new collaboration mechanism will advance the industry's overall robustness and enhance business development in the Commonwealth." Peter Beling, Director of UVA CVDI Site, University of Virginia
GMU: ACCORD assuring compliance and secure research "The ACCORD initiative has the exciting potential to foster cross-institutional research collaboration projects that require not only significant computing resources, but also secure data protection capability. By providing secure, high performance computing, this initiative lowers the entry barriers to researchers who might not otherwise be able easily to meet the increasingly stringent standards that are being required by both government and private-sector sponsors." Rebecca Hartley, Director of Export Compliance & Secure Research, George Mason University
Healthy Appalachia Institute: ACCORD supporting healthcare research in remote and underserved areas "The University of Virginia's College at Wise is dedicated to serving members of the Appalachian communities in SW Virginia. Programs such as the Healthy Appalachia Institute are beneficial to the College in its research efforts and education mission as well as in providing a valuable healthcare service to residents. In addition to storing and accessing patient data by care providers, researchers also analyze data to identify health concerns and problematic issues, enabling them to provide better services throughout the community. As part of that service, the College strives to provide the utmost protection for all patient health data. Through the ACCORD system, service providers and researchers are conveniently able to store, access, share, and analyze health data while data are protected end-to-end." Scott Bevins, Asc. Vice Chancellor for Information Services, UVA College at Wise
JMU: ACCORD supporting collaborations at a low cost "James Madison University is striving to provide high-impact learning experiences such as undergraduate research and service learning in a climate that fosters intellectual engagement in and outside the classroom. Collaborating on the ACCORD project provides JMU faculty, students, and staff access to advanced computing resources and data management capabilities. The ACCORD cyberinstrument will be an important platform enabling the JMU community to collaborate among ourselves as well as with other partners in the Commonwealth." Dale Hulvey, Asc. Vice President for Information Technology, James Madison University
We welcome community engagement in multiple capacities. In addition to technical design feedback, we especially welcome contributions from the policy/compliance community. We also welcome feedback from application researchers to help develop an effective workflow.
ACCORD Spring 2017 All-Hands Meeting (AHM) - Hosted at The University of Virginia on March 27, 2017.
The Spring 2017 AHM convened ACCORD partners to bring everyone up to date on project status and to plan several key activities going forward. The Architecture WG reported plans to test provisioning solutions. The architecture team will also take on establishing desired workflows based on specific use cases. The ACCORD team discussed several options for collaboration with a representative from the Science Gateways Community Institute (SGCI). Specifically, SGCI experience and expertise will be helpful to the architecture team as they work on identity management and workflow design.
Going forward, the ACCORD consortium members agree on kick-starting several major projects:
Strategy & planning meeting - Hosted at The University of Virginia on September 12, 2016. The planning meeting convened representatives of the technical and policy teams from partner institutions. The goal was to discuss the overall design of the Virginia ACCORD cyberinstrument. Attendees agreed that the proposed instrument capabilities are currently lacking in the community. The technical attendees agreed on the feasibility of realizing the instrument's design concept. The PI facilitated discussions on how best to involve each of the Virginia partners. The meeting concluded with a clear timeline and milestones toward completing the project proposal.
Project kick-off meeting - Hosted at The University of Virginia on August 25, 2016.
The kick-off meeting brought the initial interested partners together at the University of Virginia, where the PI, Ron Hutchins, presented the vision of the project. Participating Virginia universities and organizations expressed strong support for the project. The team reached consensus on moving forward with establishing project governance and strategies.