Open data for science education

Open data is the idea that scientific data should be freely available to all, without restrictions, in searchable online repositories. The open data movement is gaining momentum in the scientific community because of its promise to enable more frequent replication of studies and to accelerate the pace of research. But the advantages for science education are just as compelling.

Science students can benefit greatly from educational materials that expose them to real-world phenomena and data. Unlike learning from broad generalizations and pre-fabricated “cookbook” labs, examining and working with real data can increase interest and better prepare students for careers in science. As states begin to adopt the Next Generation Science Standards, which emphasize practices such as analyzing and interpreting data, and mathematical and computational thinking, developers of K–12 science curriculum materials are increasingly looking for ways to incorporate scientific data into their lessons and assessments.

However, barriers exist that prevent educators from effectively using much of the data that scientists produce. As a reader of PLOS blogs, you are likely familiar with the open access movement in scholarly publishing. But even access to journal articles, though valuable, is often not sufficient for educators’ purposes. Data in journal articles are usually in the form of a few graphs. These graphs are typically frozen in PDFs as part of a paper that conveys the authors’ interpretation of the results in the context of their particular study. And the data presentation choices were made with one audience in mind: experts in the field.

Open Data by Colleen Simon for opensource.com (CC-BY-SA)

Using data as it is presented in papers is almost never pedagogically sound at the middle or high school level; much must be changed about the presentation. Jargon and acronyms might have to be removed from axis titles, individual data sets might need to be separated if they are layered into a single figure, or a section of the graph that describes phenomena outside the scope of the lesson might have to be removed. Making these kinds of educationally necessary modifications, while maintaining scientific accuracy, often requires access to full original datasets.
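To make those modifications concrete, here is a minimal Python sketch of what an educator could do once the full dataset is in hand. Everything in it is hypothetical: the column names (`t_hr`, `OD600`, `strain`), the plain-language labels, the 24-hour cutoff, and the measurements are invented for illustration and do not come from any real study.

```python
# Hypothetical example: all names and values below are invented for illustration.
from collections import defaultdict

# Jargon-heavy column names mapped to classroom-friendly labels (assumed).
FRIENDLY_LABELS = {"t_hr": "Time (hours)", "OD600": "Cloudiness of culture"}


def prepare_for_classroom(rows, x_key, x_max, series_key):
    """Trim, split, and relabel rows (a list of dicts) from a full dataset."""
    by_series = defaultdict(list)
    for row in rows:
        # Drop the region of the graph that falls outside the lesson's scope.
        if row[x_key] > x_max:
            continue
        # Replace jargon column names with plain-language labels.
        relabeled = {
            FRIENDLY_LABELS.get(key, key): value
            for key, value in row.items()
            if key != series_key
        }
        # Separate layered data sets so each can be plotted on its own.
        by_series[row[series_key]].append(relabeled)
    return dict(by_series)


# Made-up measurements standing in for an archived dataset.
rows = [
    {"strain": "A", "t_hr": 0, "OD600": 0.05},
    {"strain": "A", "t_hr": 4, "OD600": 0.40},
    {"strain": "A", "t_hr": 30, "OD600": 0.90},
    {"strain": "B", "t_hr": 0, "OD600": 0.05},
    {"strain": "B", "t_hr": 4, "OD600": 0.20},
]

classroom = prepare_for_classroom(rows, x_key="t_hr", x_max=24, series_key="strain")
```

None of these steps is possible with a graph frozen in a PDF; each one depends on having the underlying numbers.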

Unfortunately, most scientific data is not archived and readily available online. Educators have to contact the study authors and see if they are willing and able to pass it along. Just as with journal articles, this “write to the author” stop-gap is wildly inefficient. Study authors often can’t or won’t respond to requests for original data for a variety of reasons. Sometimes they are simply out of town and not checking email. Sometimes they want to publish more papers and are afraid of getting scooped. And sometimes, especially with older studies, they actually can’t find their data.

In a 2002 survey of geneticists, of those who admitted to denying at least one request from a colleague for published data, the most commonly given reason was the “effort required to actually produce the information” (80 percent of respondents). As Todd Vision, a biologist at UNC and contributor to the Data Dryad open data repository, explained in BioScience:

Unarchived data files are often misplaced, corrupted, or the software in which they were produced becomes obsolete. Memories fade.

Science education materials developers need full access to the data in order to determine its pedagogical strengths and weaknesses. This process often involves investigating many different data sets before settling on the ones that will best address the learning goals for their particular project. Following up on hundreds of individual papers, with a dismal rate of return, isn’t feasible for a small education nonprofit or a lone teacher trying to innovate at a struggling school. This leaves vast amounts of educationally valuable data untapped.

I talked to Sandra Porter, whom I met at the last Science Online conference, about her experience with obtaining data for curriculum materials development. Sandra is the president of Digital World Biology, and one of her collaborative projects, Bio-ITEST, involved the development of bioinformatics curriculum materials for secondary students. In genetics and bioinformatics, which are inherently data-focused, data archiving requirements are more common, and Sandra and her colleagues were able to take advantage of open data resources such as the National Center for Biotechnology Information (NCBI) and the Barcode of Life Data (BOLD) Systems. Yet even in these fields, raw data, the kind that practicing scientists would encounter in their careers, can be tricky to obtain. Sandra commented:

The raw data was useful for us because we needed to know what raw data looks like so we could work out analysis problems in advance. These types of data files are not likely to be available from many places since these raw data are usually processed and analyzed through many pipeline steps before they get submitted to a database.

There are many worthy reasons to support the open science movement, but the argument for science education holds its own among them. It has never been easier to bring real scientific data into classrooms, and the benefits to young scientists-in-training are clear. It would be a shame for all of that educational potential to languish on old hard drives.

Author: Mike Klymkowsky

I am a Professor of Molecular, Cellular, and Developmental Biology at the University of Colorado Boulder. Growing up in Pennsylvania, I earned a bachelor's degree in biophysics from Penn State, then moved to California and earned a Ph.D. from CalTech (working for a time at UCSF and the Haight-Ashbury Free Clinic). I was a Muscular Dystrophy Association post-doctoral fellow at University College London and the Rockefeller University before moving to Boulder. My research has involved a number of topics, including neurotransmitter receptor structure, cytoskeletal organization and ciliary function, neural crest formation, and signaling systems in the context of the clawed frog Xenopus laevis, as well as biology education research, leading to the development of the Biological Concepts Instrument (BCI), a suite of virtual laboratory activities, and biofundamentals, a re-designed introductory molecular biology course. I have a close collaboration with Melanie Cooper (at Michigan State) that has resulted in transformed (and demonstrably effective and engaging) course materials in general and organic chemistry known as CLUE: Chemistry, Life, the Universe & Everything. I was in the first class of Pew Biomedical Scholars and am a Fellow of the American Association for the Advancement of Science.
