Liveblogging Kevin Ashley's talk at #idcc10

Dec 07 2010 Published by under Uncategorized

(Wireless is terribly wonky at the International Digital Curation Conference, so I'm going to try liveblogging instead of Twitter.)

Kevin Ashley, Director, Digital Curation Centre, "Curation Centres, Curation Services: How many is enough?"

In the US, the answer is 3: roughly one center per 100 million people, or one per $120 billion in research funding. D2C2 at Purdue, UC3 at California, DRCC at Johns Hopkins. Does this mean the UK has too many?

How many services are there? Many per center! Who is being served?

Picture is actually considerably more complicated: many centers across the US, doing different things for different disciplines and institutions and people at different points in data lifecycle. E.g.: national libraries, national subject data centers, international subject data centers, university libraries, government data archives, etc.

Each actor has a different idea of where they sit in the DCC data lifecycle. Some focus on access/use/reuse of data, especially those that focus on highly-curated information and want to see a large audience before they take in a dataset. Institutional actors tend to engage earlier on, in the appraisal/selection stage; they won't take everything, but they'll take more and more diverse datasets. Others will pitch in at the very beginning, helping people with ideas make plans for durable data.

Motivations to help out include: "data behind the graph," the reuse value of data for research, "data as record" (in the records-management sense), data reuse in education, and increased data value via mashups.

Given the complex landscape, it's hard for researchers to figure out where to go to get help, even those who desperately want to do the right thing! They need some kind of decision tree. Kevin suggests accepting that data has different homes at different points in the process; make that easy, and help people point to data wherever it happens to live. Particular problem when publications refer to small slices of a bigger data source; the connection between the slice and the original dataset can get lost.
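To make the "decision tree" idea concrete, here's a toy sketch of a routing helper. This is entirely my own illustration, not anything Kevin presented; the lifecycle stages, rules, and destination names are all invented:

```python
# Toy decision helper for "where should my data live?" -- every rule and
# destination below is invented for illustration, not from the talk.

def suggest_home(stage, discipline_repo_exists, institution_accepts):
    """Return a suggested home for a dataset at a given lifecycle stage."""
    if stage == "planning":
        # Early on, the need is advice rather than a repository.
        return "institutional data-management advisors"
    if discipline_repo_exists:
        # Highly-curated subject centers get first refusal.
        return "national/international subject data center"
    if institution_accepts:
        # Institutions take more, and more diverse, datasets.
        return "institutional repository"
    return "general-purpose archive"

print(suggest_home("planning", False, False))
print(suggest_home("analysis", True, False))
```

The point of such a tool would be exactly what the talk argues: accept that data has different homes at different points, and make the routing explicit rather than leaving researchers to guess.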

Various sources of guidance for researchers and service providers; also potential peer reviewers of grants (how do THEY tell a good plan from a bad plan?).

DMP Online: walkthrough tool for researchers; can be adapted to almost any funder policy worldwide. Rule-driven, structured, generic questions. The same tools can aid peer review of grant applications, because everything is reduced to a common template, making plans easier to compare. Again, how many of these services do we need? Is DMP Online enough?
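As a rough illustration of the "rule-driven, structured, generic questions" idea: the same pool of generic questions can be mapped onto different funder policies, so every plan comes out in a comparable shape. This is my own sketch, not DMP Online's actual data model; the question text and funder names are invented:

```python
# Hypothetical sketch of a rule-driven planning questionnaire: generic
# questions selected per funder policy. Questions and funders are
# invented for illustration, not taken from DMP Online.

GENERIC_QUESTIONS = {
    "formats": "What file formats will you produce?",
    "storage": "Where will data be stored during the project?",
    "sharing": "How and when will data be shared?",
}

# Each funder "policy" is just a rule selecting generic questions.
FUNDER_RULES = {
    "Funder A": ["formats", "sharing"],
    "Funder B": ["formats", "storage", "sharing"],
}

def build_plan_template(funder):
    """Reduce any funder's policy to the same comparable structure."""
    return [(key, GENERIC_QUESTIONS[key]) for key in FUNDER_RULES[funder]]

for key, question in build_plan_template("Funder B"):
    print(f"{key}: {question}")
```

Because every plan reduces to the same keyed structure, peer reviewers can compare plans side by side, which is the point Kevin makes about aiding grant review.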

"In preparing for battle, I have always found that plans are useless, but planning is indispensable." Good thing to recognize! Plans always change, but the planning process is still useful.

IDCC presentation two years ago on what university libraries can do:

  • raising awareness of data issues (improving service to research, not just teaching)
  • leading policy on data management at institutional and national levels
  • advising researchers on data management
  • working with IT to develop data-management capacity
  • teaching data literacy to research students
  • developing staff skills locally, reskilling/retraining
  • working with LIS educators to identify and deliver appropriate skills in new graduates

Some of these initiatives have been more successful than others! The IT/Library interface is often troubled or nonexistent. We're not teaching graduate students or our own library staffs. Working with LIS educators is inconsistent. But we're doing pretty well on policy and consciousness raising. (Kevin is talking about his own UK context; I think the US would tick different checkboxen.)

Question: should disciplines or institutions take on the data-curation problem? Pros and cons in both directions (I won't copy the slide; it's complex). Disciplines tend to run on short-term funding and have a narrow view of usefulness. Institutions don't tend to have the depth of knowledge.

Institutions need to know what's theirs, know where it is, know what the rules are (who can see it, who assesses it, who discards it, when that changes). It's part of the institution's public portfolio! Marketing it is also an institutional responsibility.

Current decision points for UK Research Data Services: Should every institution do this? What are the rules for new subject repositories, as the data landscape changes? What should be done nationally, and what locally? Drive or follow the international agenda? Where do institutional research administrators fit into all this; can they put aside turf battles for the efficiency of collective action?

What is the impact of research-data management and public data access? IMPACT. Increased citation rates (see Piwowar et al. -- hi, Heather!); the 45% of publications in the sample with associated data scored 85% of the citations. Correlation is not causation, but the link is pretty suggestive. Shared social science data achieves greater impact and effectiveness: more primary publication, more secondary publication, findings robust to confounding factors. Formal sharing is better than informal sharing, which is better than no sharing at all. These numbers are persuasive to evidence-based researchers; we need to bring this to their attention! Also need more investigation across disciplines.

Definite demographic differences in who will share data. Women more likely than men; northern US likelier than southern; senior researchers more likely than juniors. (But the seniors are TELLING their juniors not to! Selfish and counterproductive, IMO.)

Another impact: reuse. "Letting other people discover the stories our data has to tell." Teaching journalists to mine data: "fourth paradigm for the fifth estate." Push to make government data more open allows savvy journalists to find stories in released data. They're being taught Python, analysis techniques, etc. Sometimes they'll get it wrong; we'll have to live with that.
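A minimal example of the kind of thing journalists are being taught: summing up a released government spending file to find the biggest line items. The dataset, department names, and figures below are all invented for illustration:

```python
# Toy "data journalism" exercise: aggregate a (hypothetical) released
# government spending file with only the standard library.
import csv
import io

RELEASED_DATA = """department,amount
Health,120.5
Transport,88.0
Health,40.2
Education,75.9
"""

# Total spending per department.
totals = {}
for row in csv.DictReader(io.StringIO(RELEASED_DATA)):
    totals[row["department"]] = totals.get(row["department"], 0.0) + float(row["amount"])

# Biggest line items first -- the journalist's headline candidates.
for dept, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(dept, round(total, 1))
```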


  • Data is often living; treat it that way! (This is a serious weakness in the OAIS model IMO.)
  • More data in the world than is dreamt of in scholarly research, Horatio!
  • Hidden data is wasted data.
  • International collaboration is essential.
  • We have a duty to examine and promote the benefits of good data management and data sharing.
  • Three centers in the US is not enough!

From Christine Borgman: May not be time yet for rigid policies or too much structure from e.g. NSF. This is an experiment in what the scientific communities themselves will come up with, and what the response will be. Let's hang back and study the results. (I agree wholeheartedly!) Response: No, we don't want rigid rules, but we can help them work toward best practices and structured thinking about what a data-management plan is. And some agencies can legitimately set constraints (e.g. use our data centers) and monitor compliance. Fundamentally, though, right now is about getting people used to the whole idea of data management.

Q from Department of Energy: government is afraid of cost of data curation, uncertain of benefits. The more evidence of impact, the better! A: Absolutely agree! We need to measure and market benefits.

2 responses so far

  • Joe HourclĂ© says:

    re: Kevin Ashley's talk ...

    I don't think it was that 3 is enough, but that 3 seemed to be the answer based on evidence (i.e., how many there were currently).

  • Chris Rusbridge says:

    "Data is often living; treat it that way! (This is a serious weakness in the OAIS model IMO.)"

    D, absolutely agree. OAIS is about what I call "in the box" preservation. It's not a lot of use for data that changes, has rapid deposit or re-use, is very large or very small. It's also generally overkill for ANYTHING. This was hard to say when the DCC had an OAIS co-author as a senior manager!