One of the biggest challenges facing scientists studying biodiversity can be summed up with a little dark humor — the task of cataloging all living species on Earth is both unfathomable and getting easier every day.
Trying to support that statement with hard figures, however, would be impossible. Scientists have yet to figure out how to compile a universal catalog of species, even a basic one with about the same level of detail as a phone book.
Is this the one task too great for Big Data, the massive sets of information that require exceptional technology to process?
Scientists disagree on the answer, but it might be.
The Unknown Unknowns
Being able to find, classify and assess the health of living species would be tremendously beneficial to society.
Such information would inform public policies and economic development decisions of entire nations.
Establishing a global catalog of life would also create opportunities for profit. The undiscovered biota no doubt holds cures, treasures and insights that can’t be dreamed of today.
Yet the depth of our ignorance on this topic is epic. To paraphrase the warrior-poet Donald Rumsfeld, there are known knowns, known unknowns and unknown unknowns. The last category is frightening in size for biologists who would like to see a universal taxonomy in their lifetime.
You would be forgiven for assuming that voraciously burning, paving and extracting large sections of the Earth would result in the eventual death of species that had lived there.
Yet there are scientists who say that life is more resilient than we assume, muting the harm of our actions.
Mark Costello, founder of the Society for the Management of Electronic Biodiversity Data, feels the rate of extinctions is likely overstated. But more importantly, not enough focus is being placed on cataloging life to prove anyone’s point.
“The average time it takes to describe a single new species is 21 years,” Costello said, describing the period from when a specimen is first spotted until it is accepted by taxonomists. That “ridiculous period of time” reflects a lack of resources, he said.
An estimated 1.7 million species have been identified so far. But arriving at an accurate total has been hampered by the unknown number of misidentifications, errors and duplications scattered across the world’s many catalogs.
Big Data to the Rescue?
Reconciling these errors sounds like an ideal problem for big data, said C. Titus Brown, who runs the Laboratory of Genomics, Evolution and Development at Michigan State University.
Brown said it’s possible to create an algorithm that would comb existing databases for inconsistencies and produce a data set that’s “maybe 90 to 95 percent” clean. Secondary software or human massaging would strip out most of the remaining outliers, he said.
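The kind of reconciliation pass Brown describes can be sketched in a few lines. This is a hypothetical illustration, not his actual software: it flags likely duplicate entries in a species list by fuzzy-matching names, using Python’s standard-library `difflib`. The records and the 0.9 similarity threshold are made up for the example.

```python
# Hypothetical sketch: flag likely duplicate species records by
# fuzzy-matching their names. Records and threshold are illustrative.
from difflib import SequenceMatcher

records = [
    "Caenorhabditis elegans",
    "Caenorhabditis elegans",   # exact duplicate
    "Caenorhabdites elegans",   # likely misspelling of the same species
    "Drosophila melanogaster",
]

def likely_duplicates(names, threshold=0.9):
    """Return pairs of names whose string similarity meets the threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                pairs.append((a, b))
    return pairs

for a, b in likely_duplicates(records):
    print(f"possible duplicate: {a!r} ~ {b!r}")
```

A real pass over millions of records would need a smarter blocking strategy than this all-pairs comparison, which is where the “secondary software or human massaging” Brown mentions would come in.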
“I guess I’m an optimist here,” Brown said.
Maybe too optimistic, according to Gerald Guala, director of the Integrated Taxonomic Information System at the U.S. Geological Survey.
Taxonomies, the ways things are classified, vary century to century, nation to nation, region to region and sometimes scientist to scientist, Guala said.
He also cited the well-respected taxonomy for Central Florida, which is good for that region, except that it doesn’t mesh well with the categories used in Northern Florida. And when the author of the Central Florida taxonomy died, important details of his work were also lost, Guala recalled.
“Thinking that we can solve this with an algorithm is like going to a county courthouse, feeding all the legal cases for the last 100 years into a database, and expecting to come up with the perfect way to settle every disagreement,” he said.
Millions of New Species
Assuming that clever programming could solidify what is already known, science would still have to find a way to quickly and accurately catalog newly discovered species.
Humbled scientists said in the 1990s that 30 million to 100 million species had yet to experience their 15 minutes of fame. Researchers today suggest that the number of unidentified nematodes could top 1 trillion.
Brown said those numbers are needlessly overwhelming, and he doesn’t see a need to cast a net so fine that everything is caught.
He suggested that targeted sampling in small environments across the world would be an effective way to quantify species. The details could then be fed into data-crunching programs that would connect the dots and draw useful conclusions.
Requiring field researchers to input gene sequences rather than names would avoid inaccurate labeling, he said.
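The idea behind sequence-based record keeping can be shown with a toy example. This is an illustrative sketch, not Brown’s actual pipeline: field observations are keyed by a gene sequence rather than by whatever name the researcher typed in, so spelling variants of the same organism collapse into a single entry. The labels and sequences below are invented.

```python
# Illustrative sketch: index field records by DNA sequence, not by name,
# so inconsistent labels for the same organism collapse to one entry.
# Labels and sequences are hypothetical.
from collections import defaultdict

observations = [
    ("C. elegans",             "ATGGCTTCA"),
    ("Caenorhabditis elegans", "ATGGCTTCA"),  # same organism, different label
    ("unknown nematode",       "ATGCGTTGA"),
]

by_sequence = defaultdict(set)
for label, seq in observations:
    by_sequence[seq].add(label)   # the sequence, not the name, is the key

# Three labeled records reduce to two distinct organisms.
print(len(by_sequence))
```

The design choice is that a sequence is an objective identifier, while a name depends on which taxonomy, and which spelling, the researcher happened to use.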
But researchers who are eager to try Brown’s theory should be prepared to embark on endless sampling expeditions.
A spade of dirt might be habitat enough for the prolific nematode.
Jim Nash is an award-winning business, tech and science journalist whose work has appeared in The New York Times, The Economist Group and Scientific American.