I'm still buried in translating a presentation into Spanish for Monday and finishing another in English for Wednesday, but here's a small thought to tide folks over, one that came to me shortly before my presentation at Access.
At the data-curation workshops I've been to, it has been axiomatic that "we can't afford to keep it all." Some fairly sophisticated judgment rubrics have been worked up, often based on the same kinds of judgment calls that special-collections librarians and archivists make when presented with collection opportunities. Is this dataset unique, or could it be recreated? Is it well-described? Is it in good shape? What is its importance to its field? Et cetera.
There's a problem with this mode of decision-making. It's a human problem. It's a problem that is endemic in the institutional-repository context, which is where I became acquainted with it.
The problem is perhaps best illustrated with a parable; I'll borrow Achaea University from Caveat Lector. Dr. Helen Troia comes to data archivist Ulysses Acqua with a pile of helter-skelter basketology data. Ulysses scrutinizes the dataset (with the help of basketology liaison Menelaus Fox), assesses its value honestly, and decides it just doesn't make the cut. He tells Dr. Troia so, stating his reasons in a professionally courteous fashion.
Will Dr. Troia come back to Ulysses five years later, when she's created the dataset that will revolutionize basketology forever? Not terribly likely, I'd say.
There are people behind every dataset, people who care deeply about their work. Rejecting their data is tantamount to rejecting their work, rejecting them as researchers. While such rejection may still be necessary, it should not be done lightly; it is an act with far-reaching political repercussions.
What, for example, will Dr. Troia tell her departmental colleague Dr. Andromache Memnon about Ulysses and the data service? What happens to the Basketology department's data should Dr. Troia become department chair?
Uncomfortable questions, but ones to take into account when designing and publicizing criteria for what data-curation services accept.