New database catalogs almost a million syllabi

Columbia's Open Syllabus Project creates open-source syllabi database

Researchers could see which texts are most popular and enduring, or whether there is a reliance on textbooks or new technologies. Credit: Max Riesgo


The painstaking work of creating a large-scale searchable online database of nearly a million course syllabi from colleges and universities around the country is underway. Its creators hope it will provide insight into what is being taught, read and discussed in academia. The project aims to facilitate more cooperation and collaborative work.

The Open Syllabus Project (OSP), funded by a grant from the Alfred P. Sloan Foundation, has archived 800,000 syllabi so far and is building the structure for a database in which the material can be easily accessed and analyzed.

Syllabi provide a ground-level roadmap for each course of study. Yet there’s only a haphazard sense of the larger picture of what’s happening across the country in different scholarly fields, despite a number of attempts at cataloging them, researchers say.

Measuring teaching trends

“We’re still mining stuff out, but the hope is to see long-term trends,” said Dennis Tenen, an assistant professor of digital humanities and new media studies at Columbia University and one of the principal researchers on the project. “Are they assigning different texts in Texas? Who is influencing one another? Who is innovating? Making the whole thing available will help others ask all kinds of questions.”

On the OSP website, the researchers say syllabi are “almost completely unexploited resources that are treated as the ephemera of the teaching enterprise rather than as its DNA.”

The database will help define best practices, regional trends and, hopefully, open discussion of intellectual property and other issues surrounding syllabi, Tenen said.

The project opens up myriad possibilities. Researchers could see which texts are most popular and enduring, or whether there is a reliance on textbooks or new technologies. They could chart how the teaching of a particular subject changes over time.

Fighting homogeneity

The database will also be helpful in illustrating the range of approaches that exists across various intellectual pursuits, which is particularly timely given recent trends toward packaged and commercialized course material. Tenen hopes the database will help “counteract some of those centralizing forces that make education homogenous.”

Because policies and norms around syllabus ownership vary, the OSP says it won’t publish syllabi without permission. The public side of the OSP will be a collection of tools for analyzing metadata extracted from the documents.

“We will also, in the course of this work, advocate for stronger open-access policies for syllabi,” according to the website.

The idea of collecting and studying syllabi is not new; initial efforts got underway in the early 1990s. Daniel Cohen, a professor at George Mason University, used early web-crawling tools to build a database of links to pages that the crawlers identified as syllabi through simple keyword matches. But the archive has become obsolete because of link decay over the last decade, the OSP said.

The OSP built upon Cohen’s work, using his links to obtain some syllabi. Cohen is now the founding executive director of the Digital Public Library of America.

Designing the database

The OSP researchers are in the middle of their first year of grant funding, in the sometimes-tedious phase of gathering the syllabi, cleaning them up, and formatting them so they are searchable.

The core database is built with MongoDB, a modern, document-centered, open-source database. Researchers are also building the structure that will hold 1 million syllabi. By summer, they hope to have an application programming interface (API) in place to allow query-based access to the data through the web, Tenen said.
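To give a rough sense of what "document-centered" storage and query-based access could look like, here is a minimal sketch in plain Python. It does not use MongoDB or the OSP's actual code; the records, the field names (`institution`, `state`, `field`, `year`, `texts`) and the helper functions `find` and `most_assigned` are all hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical syllabus records. Every field name and value here is made up
# for illustration and does not reflect the OSP's actual schema or data.
SYLLABI = [
    {"institution": "Example State University", "state": "TX",
     "field": "English", "year": 2012, "texts": ["Hamlet", "Beloved"]},
    {"institution": "Sample College", "state": "NY",
     "field": "English", "year": 2013, "texts": ["Hamlet", "Frankenstein"]},
    {"institution": "Demo University", "state": "TX",
     "field": "Philosophy", "year": 2013, "texts": ["Republic"]},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


def most_assigned(collection, n=3):
    """Count how many syllabi list each text and return the top n."""
    counts = Counter(text
                     for doc in collection
                     for text in set(doc.get("texts", [])))
    return counts.most_common(n)


# "Are they assigning different texts in Texas?"
texas = find(SYLLABI, state="TX")

# Which text appears on the most English syllabi?
top_english = most_assigned(find(SYLLABI, field="English"), n=1)
```

Each syllabus is stored as one self-contained record rather than being split across several tables, which is the general idea behind a document-centered database. A web API of the kind the researchers describe would presumably expose similar queries over HTTP, though its actual design is not described here.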

The grant runs through June; then the group will look for a partner — such as the Digital Public Library of America — to accept and maintain the database.

Patricia Alex
Patricia Alex has worked as a reporter and editor in New Jersey for many years and writes about higher education for one of the state's largest newspapers, The Record.