Training students in analyzing “big data”: A case of plant stress response

Irina Makarevitch, Ph.D.
Associate Professor, Hamline University

Although many fields of biology rely on the analysis of large data sets, and “data scientist” has been named “the sexiest job of the 21st century”, undergraduate students, especially at small, primarily undergraduate colleges, have few opportunities to work with large data sets.  I am interested in how maize responds to stress, and I developed an RNA-Seq data set documenting the expression levels of maize genes across three genotypes and four abiotic stress conditions.  Having already used my research projects in my courses, I started on a journey, full of anxiety and excitement, and armed with a bunch of worksheets, animations, short exercises, and long guidelines, as well as large data set files in various formats.

Principles of Genetics is an introductory genetics course (taught with a lab) with about 100 students from various biology-related majors.  Students spent three weeks (one three-hour lab each week) playing with RNA-Seq data and learning about gene expression analysis.  It was a lot to take in: complicated and not always intuitive software, complex graphs, and the chemistry and ideas behind next-generation sequencing.  Learning all of it simultaneously, while also rediscovering the fundamental rules of gene regulation and environmental effects, was, no doubt, difficult and sometimes frustrating for many students.  I spent all these labs walking from one group to another, explaining the ideas and approaches, finding mistakes in R code, providing hints for graph design and interpretation, and encouraging, encouraging, encouraging…  At times I felt downhearted.  However, I continued sharing my own experience of “big data analysis”, a story of a kid thrown into a deep lake, looking for resources and learning to swim, and most students stayed in there with me.  In the end, they asked a lot of interesting questions and even answered many of them using the data, made fabulous graphs, and provided meaningful interpretations of the data.  I am very proud of my students for working through challenges, for getting their hands and brains dirty, for seeking and finding help, and for supporting each other through the process.  One of my students expressed it all really well, storming into my office and proudly showing me the heat map she had built: “My graphs look just like figures in the actual paper! I never knew I could do it!”  We all learned a lot through this experience: they learned what a real research project in biology looks like and how complicated data analysis is today.  I learned not to underestimate my students and not to be afraid to try!

So I look forward to my next semester, when again, full of anxiety and excitement, and armed with a bunch of refined worksheets and redesigned guidelines and large data set files in even more formats, I will start the next part of my journey.