Infrastructure and Training to Bring Next-Generation Sequence (NGS) Analysis into Undergraduate Education


Our three-year NSF-funded project is developing a sustainable infrastructure and training program to assist undergraduate faculty in integrating Next-Generation Sequence (NGS) analysis into course-based and independent student research. The NSF Proposal Summary and Narrative provides details about the project rationale and structure. Participating faculty are developing more than 100 RNA sequence (RNA-Seq) datasets that bear on novel research problems in eukaryotic genomics. Following refinement of a biochemical and bioinformatics workflow by project staff, a Working Group retreat was conducted at Cold Spring Harbor Laboratory in Year 1 (June 2014) with 10 faculty. In Year 2 (June 2015), 33 faculty participated in two workshops, held at Bowie State University and at California State University at Long Beach. These participants come from diverse institutions and regions of the country and were selected on the basis of proposals for tractable projects investigating differential gene expression and transcriptome re-sequencing. In Year 3, we will hold a virtual workshop. Faculty teams learn all phases of a bioinformatics workflow to analyze their datasets, and leave the workshops with a curriculum plan to distribute data analysis among student teams.

Project Plan Summary

Data analysis uses large-scale data storage, bioinformatics workflows, and high performance computing provided by the iPlant Collaborative, an NSF-supported cyberinfrastructure for biological research. Primary training is transitioning from in-person workshops to online webinars and self-paced learning via this dedicated website, providing a sustainable method to introduce large numbers of faculty to NGS analysis. Participants also share instructional strategies and solve analysis problems during regular videoconferences. A multi-faceted evaluation program assesses: 1) impact of the training on faculty participants' knowledge, behavior, and teaching confidence, 2) faculty implementation of the project in a variety of classroom and student research settings, and 3) effects on student learning, interests, and attitudes. Use of the validated SURE survey instrument allows comparison with student research experiences in other fields and educational settings.

Intellectual Merit

NGS methods have dramatically decreased the cost of obtaining whole genome data on eukaryotic organisms, and data storage and analysis workflows are becoming freely available online. In particular, RNA-Seq can provide novel data on gene structure and function. This project aims to help move undergraduate education into an age when students work with whole genome sequences as routinely as they work with PCR amplicons today. This project operates on the continuum of biology research and education. It recognizes that many college faculty would like to bring NGS to bear on a problem of their own interest and to invite students as co-investigators in class-based and independent projects. The program is preparing faculty to operate in a new, sequence-driven paradigm and empowering them to guide students in novel genome explorations.

Broader Impact

Free online tools have made sophisticated genome analysis available to anyone with an Internet connection. This project is extending the egalitarian nature of genome research by providing an infrastructure for undergraduate faculty to generate and analyze their own genome-scale datasets. About 25% of participating faculty are from minority-serving institutions, in keeping with the objective of reaching African American, Hispanic, and Native American faculty and students. The project also provides faculty at predominantly teaching institutions access to high performance computing through the NSF's Extreme Science and Engineering Discovery Environment (XSEDE). The Green Line of DNA Subway is an educational workflow specifically designed to support student analysis of RNA-Seq datasets. Advanced applications, including command line customization, are supported in the research-grade Discovery Environment. This infrastructure makes it possible to broadly disseminate on-demand experiments using RNA-Seq in undergraduate settings.

Funding and Support