Where are libraries in data curation?

Published Aug 12, 2010, under How Libraries Work, Tactics

The Association of Research Libraries has a pretty good report out (I consider it JISC-quality, which is saying something for me) on where its member libraries stand on e-science, cyberinfrastructure, data curation, or whatever you want to call it.

I already knew most of what was in the report proper owing to having read the preliminary report (what, me, obsessed?), so the good stuff for me was in the case studies. I loved this blunt assessment from UCSD (paragraphing and emphasis mine):

At present, there are three primary pressure points related to e-science/e-research support at UCSD: turf, money, and interest. In reverse order, with the exception of a few very high-end data generators amongst the faculty, e-science/research lifecycle management is not high on the list of faculty concerns.

  • The NSF’s best efforts notwithstanding, most researchers, at least locally, have been slow to wake to the data challenge. They seem to think, to the extent they think about it at all, that they’ve already got it covered or that they lack the funds to cover it and, therefore, it should be somebody else’s problem.
  • As a consequence, this campus at least, has committed funds for providing the infrastructure and services necessary to curate data for the long-term, in the hope, frankly, that sufficient faculty (and students, but mostly faculty) will avail themselves of both to make the enterprise self-sustaining. That’s the good news; the bad news is that it has committed only those funds and only with the understanding that the enterprise will become self-sustaining. Whether that proves to be the case remains to be seen of course.
  • Finally, there is an awfully large number of parties interested in what remains a still-ill-defined problem space. The associated ‘jostling’ makes calculating the right mix of those parties in the solution space doubly challenging.

What I hear through the grapevine suggests that the above is true at many more places than UCSD.

I would add to this that "self-sustaining" is an instance of the "them that has the gold gets the services" anti-pattern. Not all data worthy of note comes out of grant-funded research. I'm all in favor of earmarks from grants, don't mistake me; I just worry quite a lot that data services will exclude all but the well-funded.

As I read through the case studies, what struck me was that these are stories of pioneering individuals given lots of freedom and enough support to use it. I applaud those individuals (indeed, I know several of them personally), and I love what they're doing. I just... well, I'm a worrywart. I worry about them too. They're heroes, and just as in programming, one can't sustain an enterprise indefinitely on heroes. (We tried that, as I am rather tired of saying, with institutional repositories (IRs) and their maverick managers. It didn't work, unless by "work" you mean "burn out good people uselessly.")

So when do we move past the hero model of data curation? When will it be mainstreamed? I genuinely wonder.

