DaCCoTA Data Informatics Services
The Biostatistics, Epidemiology, and Research Design Core (BERDC) provides computational infrastructure and a variety of data informatics services to support clinical and translational cancer research.
Data Informatics Services: The BERDC uses a secure web application designed to support data capture for research studies. It offers features such as multi-site data entry, real-time validation of entered data, and audit trails. The application can also set up a calendar to schedule and track critical study events such as blood draws and participant visits. Clinical data will be managed in Research Electronic Data Capture (REDCap) or a SAS-based web portal. The BERDC will work with diverse data sets and will address data integration and data oversight requirements across the multiple affiliated but independent institutions within the DaCCoTA. Our goal is to present a flexible yet robust model for data management, integration, and distribution that offers investigators multiple technical approaches within an overall management framework that meets institutional oversight and data security requirements. The core will integrate institutional IRB and HIPAA requirements into the DaCCoTA data management processes so that research data oversight can be applied uniformly across disparate data systems and needs. We strongly believe that our focus on embedding optimal workflows will accelerate translational studies without compromising our core values of data sharing, security, and oversight. Other important aspects that we will consider in this core include the following:
- With Sanford, VA, and the North and South Dakota Cancer Registries, we will be able to link a broad range of data sources (e.g., social services, financial information) to better understand how health systems can be improved, specifically to understand the impact of social and economic determinants of cancer.
- We will ensure that research integrates patient-reported experiences with outcomes.
- We will focus on quality and safety, as well as on value for money, to compare systems across the Dakotas and improve the care that is delivered.
- We will work closely with the Epigenetics and Cancer Research Working Group at UND, which is funded by the NIH. The working group comprises approximately 20 laboratories from three departments in the SMHS, as well as talented scientists from the College of Nursing, the College of Arts and Sciences, and the USDA Grand Forks Human Nutrition Research Center. Its members work on diseases such as cancer, obesity, diabetes, and infertility, and on neurodegenerative disorders such as Alzheimer's disease and Parkinson's disease.
- We will focus on vulnerable populations such as American Indians and rural communities in North and South Dakota.
- We will provide data management and related programming.
- We will review database design and data collection instruments.
- We will develop custom data collection systems.
Computational Infrastructure: For data analysis, the core has access to high-power workstations for handling and analyzing large datasets, which complement the UND Computational Resource Center (a high-performance computing cluster). These resources include the following:
- Four Dell Precision T7610 Tower Workstations, each equipped with Intel Xeon E5-2687W v2 eight-core 3.4 GHz Turbo processors with 25 MB cache (capable of 16 independent thread processes), 64 GB of 1866 MHz DDR3 RAM, a 1 GB NVIDIA Quadro K600 video card, a 256 GB solid-state drive, two 1 TB SATA drives, a DVD-RW drive, a 10 Gb network adapter, an NVIDIA Tesla K20c computing processor (2496 cores capable of 3.52 Tflops of single-precision floating-point performance), two 24-inch monitors, and a mouse and keyboard.
- One Dell Precision T5610 Tower Workstation running Windows 7, equipped with Intel Xeon E5-2687W v2 eight-core 3.4 GHz Turbo processors with 25 MB cache, 64 GB of 1866 MHz DDR3 RAM, a 1 GB NVIDIA Quadro K600 video card, a 256 GB solid-state drive, two 1 TB SATA drives, a DVD-RW drive, a 10 Gb network adapter, two 24-inch monitors, and a mouse and keyboard.
- The High-Performance Computing (HPC) cluster at the UND Center for Computational Research (UND-CRC), which has 32 Dell PowerEdge 720 server compute nodes, each with 3.4 GHz four-core Sandy Bridge processors and 64 GB of RAM, along with 8 NVIDIA Tesla K20 GPUs and 8 Intel Xeon Phi cards.
The core and its users produce large amounts of data, which can be stored in two places: on the redundant High-Availability NSS Dell storage appliance (110 TB of usable space with weekly backups, located at the UND-CRC), and on a Dell PowerEdge R720xd server (6-core Intel Xeon processor) with a dedicated NIC and six 4 TB SATA hard drives configured for file versioning, with RAID 6 disk redundancy to protect against disk failure. The server's data is replicated to a second, off-site UND server via DFS Replication. The core is fully staffed, and its personnel also offer training sessions to help core users with data analysis.
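As a rough illustration of what RAID 6 means for the six 4 TB drives described above, the following Python sketch estimates nominal usable capacity. It relies on the standard RAID 6 property that the equivalent of two drives is reserved for parity; the function name is hypothetical, and the result ignores filesystem and versioning overhead, so it is an estimate rather than a measured figure for this server.

```python
def raid6_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Estimate nominal usable capacity (TB) of a RAID 6 array.

    Hypothetical helper for illustration; ignores filesystem overhead.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 requires at least four drives")
    # RAID 6 stores two independent parity blocks per stripe, so the
    # capacity of two drives is unavailable for data.
    return (drive_count - 2) * drive_size_tb


# Six 4 TB drives, as described in the text above.
usable = raid6_usable_tb(6, 4)
print(f"Nominal usable capacity: {usable} TB (survives two drive failures)")
```

Under these assumptions, roughly two thirds of the raw 24 TB is available for data, and the array can tolerate the simultaneous failure of any two drives.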
Data Storage: Data come in various forms, including paper-based records, digital files, and laboratory samples. Data storage and recording requirements also vary by discipline and project. Data collected by the investigator, as well as existing data such as epigenetic and de-identified cancer registry data, will be stored on a high-powered server with all of the security measures necessary to ensure confidentiality and privacy. The server's data will be backed up so that any folder or file can be restored to a previous state. Access permissions will be determined according to a protocol established by the DaCCoTA.
Data Management Policy: Existing data management policies, with slight modifications, will be used as guidelines for compliance with and adherence to best practices in data handling. These will include compliance with ethics protocols, and we will ensure that members are aware of the enforcement mechanisms and the possible consequences of non-compliance with the policy. All members will be expected to adhere to the principles of data possession, quality, retention, storage, preservation, reproduction, circulation, encryption, archival means, and proprietary jurisdictions. The policy will also outline data management roles and the responsibilities of internal and external stakeholders, funding institutions, and collaborative partners across teams.
Contact the BERDC
Be sure to contact the BERDC to schedule a consultation.