I pointed out Mike Lesk's slideshow in my last tidbits post, finding it a good critical précis of the data problem. It's pleasantly aware of human problems, human problems many treatments of cyberinfrastructure (including, unfortunately, this otherwise useful call to action from Educause) wholly ignore.
So wince and flinch at the design (black Arial on white? really? in 2009?), but read the slideshow anyway.
I do want to pick apart the slide from which I took the title of this post. I reproduce that slide's text in full:
Can we just give the problem to the libraries?
As a professor in a library school, I wish I could say that libraries were the obvious organization to take care of data. They understand keeping things for a long time and arranging to find them later. It would be a sensible new activity to balance a decrease in foot traffic into book collections. But...
- They have not been ambitious in this area; libraries feel under budget pressure and don't want new tasks.
- They lack the subject area knowledge to deal with complex data sets in scientific areas.
- They often lack the technical skills for advanced data handling.
I have no quarrel whatever with Lesk's first point. Libraries have absolutely been timid about this, and they still are, not without reason, either! This, to me, is the buck-stopper, the Berlin Wall, the concrete bollards. If library administrators shy away from this, or give it lip-service only, Lesk is right and there's nothing to be done. It won't matter how many librarians are ready and willing to do this work if they're not allowed to do it, or not given sufficient resources and authority for it.
How likely is this outcome? In my estimation, more likely than not. My estimate is admittedly colored by how early in the game it still is, but as I've remarked before, the longer any interested group dithers, the more likely it is that the action will be elsewhere. The more the action moves away from libraries, the more likely library administrators are to breathe a quiet sigh of relief and turn away from the problem altogether.
So what is a librarian who wants this work to do? Well, one answer is to keep an eye on discipline-specific projects, those that are larger than any single institution, the up-and-coming ICPSRs and Sloan Digital Sky Surveys. For those interested in data curation inside an institution, I think the answer may well be to learn enough to insinuate oneself onto research teams directly through their in-house IT arms. I may revisit this answer later; in-house IT is starting to become just cost-ineffective enough that some recentralization may happen. In that case, the would-be data curator has more options. Either way, though—the wise data curator does not attach himself limpetlike to the library. The action may well be elsewhere.
What is a researcher or funding agency or think-tank that wants libraries to take on this work to do? Researchers need to ask. Nothing gets library priority so fast as a well-articulated request from faculty; that goes double in disciplines where physical library spaces are waning in importance. Agencies and think-tanks: I'd recommend being an awful lot clearer about what the services provided look like and how they need to be staffed. Laundry-lists of skills are useless without an estimate of FTE and budget; such an estimate is noticeably lacking in every single discussion of this problem I've ever read.
I half-agree, half-disagree with Lesk's second point. There's a lot of disciplinary knowledge in academic librarianship. We don't select books blindly! We do it by taking heed of what our local researchers are doing. Many selectors and liaisons assigned to particular disciplines have degrees, sometimes advanced degrees, in that discipline. In the social sciences, by the way, data librarians with appropriate disciplinary knowledge already exist.
The problem isn't the non-existence of disciplinary knowledge; it's the uneven spread of it. For any given discipline at a research university, I'd guess it's a better-than-even bet that the library has a librarian somewhere with appropriate disciplinary expertise—but it's not a certainty.
Of course, there's also a question of how much disciplinary expertise is actually necessary for this work. Diane Hillmann remarked to me at ALA this summer that "[researchers] all think they're special snowflakes," but in her experience the basic sustainability questions don't differ all that much from dataset to dataset. That's what I think, too, with the added wrinkle that disciplinary specialists may actually be too close to their data to have a good read on how others will want to use and query it. An outsider perspective may well be useful!
(The real problem is one of first impressions and secret handshakes, as my SciBling Christina adroitly points out in the context of reference interviews.)
I could very nearly recycle my answers to Lesk's second point for his third. In aggregate, research libraries have quite a lot of technology expertise. How much any given library has isn't predictable, and may well not be sufficient.
If we cross the answer to the second question with the answer to the third, we approach the real conundrum: sufficient disciplinary expertise and sufficient technical expertise tend not to coexist within the same librarian. Take me, for example: if it's textual or linguistic data, I'm your librarian—that's my educational background! I can apply common sense and well-honed data-management expertise to numeric or instrument data, but I can't apply disciplinary knowledge because I don't have it. Selectors and liaisons, conversely, likely understand quite a lot about local research in the disciplines they serve, but they mostly don't sling Python and XSLT, nor do they tend to have the digital-preservation knowhow that I do.
John Saylor of Cornell gave what I believe to be the appropriate answer to this problem in his talk at ALA Annual: a technical team dedicated to data needs to work with librarians who have disciplinary expertise in order to solve problems. The disciplinary coverage achievable with this staffing model won't reach 100%, but it'll get as close as seems feasible. Nota bene: without broad participation by disciplinary specialists across the library, a data-curation service suffers and may well fail!
Lesk's objections are serious, pertinent, and pointed. They are not, I believe, unanswerable, but answering them will take considerable vision and will on the part of research-library administrators. Time will tell.