Data and curator: who swallows who?

Dec 07 2010

Barend Mons: Data and Curator... who swallows who?

Curation strategy: cross our fingers and hope? Doesn't work! His mantra: "it's criminal to keep generating data without policies to deal with it!" Need to get the collective brain involved.

If you measure anything in -omics, you end up with a Big Ball of Mud. "Ignorance-driven research": measure, get lots of data, get a result if you're lucky. These days, it becomes "BIGNORANCE-driven research," the goal of which is to find some kind of signal in all the noise.

Everybody wants structured data, but nobody wants to do structured-data entry! We need to figure out how to get from messy free text to structured data. Note: people WILL do structured data entry if they see how it helps them. (Metadata librarians take note!)

Lots of wikis are trying to solve this problem, but what ends up happening is that people repeat each other's assertions rather than checking and correcting errors.

Theme emerging: we only need a tiny tiny share of people's online attention to do a LOT of science! Question is how to earn that share.

Knowledge discovery by computers requires computer-operable data. (No big surprise, but it bears repetition.) Dirty data come from all kinds of datamining: the Web, articles, etc. The data then get cleaned up on the wiki, with URIs added for evidence, publications, etc. Put the result out as RDF, then use computer reasoning to infer insights and guess at their reliability. They also hope to store negative results, though not to reason over them. Soon they'll be able to track "nanopublications" and make sure people get credit.
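To make the pipeline a bit more concrete, here is a minimal sketch of what one cleaned-up assertion might look like once it carries URIs for its evidence and its curator and gets written out as RDF (N-Triples). This is my own illustration, not the system Mons described: the `Assertion` type, both function names, and every example.org URI are hypothetical. The only real vocabulary used is standard RDF reification plus the Dublin Core `source` and `creator` terms. Grouping repeated statements by their distinct evidence is just a crude stand-in for telling independent support apart from people repeating each other.

```python
from collections import defaultdict
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"


class Assertion(NamedTuple):
    """One curated statement plus where it came from (hypothetical shape)."""
    subject: str    # URI of the thing described
    predicate: str  # URI of the relation
    obj: str        # URI of the related thing
    evidence: str   # URI of the supporting publication
    curator: str    # URI identifying who asserted it


def group_by_statement(assertions):
    """Collapse repeated statements, keeping each distinct evidence URI."""
    grouped = defaultdict(set)
    for a in assertions:
        grouped[(a.subject, a.predicate, a.obj)].add(a.evidence)
    return dict(grouped)


def to_ntriples(a, statement_uri):
    """Serialise one assertion and its provenance as N-Triples lines."""
    def triple(s, p, o):
        return f"<{s}> <{p}> <{o}> ."

    return [
        triple(statement_uri, RDF + "type", RDF + "Statement"),
        triple(statement_uri, RDF + "subject", a.subject),
        triple(statement_uri, RDF + "predicate", a.predicate),
        triple(statement_uri, RDF + "object", a.obj),
        triple(statement_uri, DCT + "source", a.evidence),
        triple(statement_uri, DCT + "creator", a.curator),
    ]


# Made-up example data: the same claim asserted twice with different evidence.
EX = "http://example.org/"
claims = [
    Assertion(EX + "geneA", EX + "associatedWith", EX + "diseaseB",
              EX + "paper/1", EX + "person/alice"),
    Assertion(EX + "geneA", EX + "associatedWith", EX + "diseaseB",
              EX + "paper/2", EX + "person/bob"),
]

if __name__ == "__main__":
    for line in to_ntriples(claims[0], EX + "assertion/1"):
        print(line)
    for statement, evidence in group_by_statement(claims).items():
        print(statement, "supported by", len(evidence), "distinct sources")
```

Because every statement has its own URI and a `creator`, it can in principle be cited and credited on its own, which is the idea behind the "nanopublications" mentioned above.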

Partnerships are arising to do things that are too complicated for a single researcher or organization to do. They're using ORCID, VIVO, etc. to refer to people.

Summary: we need to remove ambiguity and redundancy; we need computer-reasonable data so that we can throw grid computing at it; we need to involve a million minds in curation; we need data publication (not just sharing) so that data become citable; we need data-citation metrics; we need standard setting from the bottom up.
