L'esprit d'escalier

Oct 29 2009. Published under Praxis.

If you're not reading comments here, you're missing out. For reasons I don't entirely understand, some of the best in the business are seeing fit to comment here. They have more to teach than I do!

Chris Rusbridge (of, among other things, this thought-provoking meditation on digital preservation) has been spotted here, and whenever he pops up he makes me think about things. This time, I was thinking about disciplinary expertise, and how I need to make a better case that less of it is necessary for data curation than is generally admitted.

I hope we can at least admit that data curators don't have to be researchers themselves. Do researchers have to be involved in the curation of their own data? Absolutely! Data curation starts at the beginning of the study-design process, and continues all the way through and past publication. But that doesn't mean that researchers have to do everything. The exact division of labor is still being sorted out; that's partly what this blog is about. That the labor must and will be divided appears to be beyond dispute.

The corollary to this is that a data curator will almost always know less about the data, viewed from certain axes, than the researcher does. She may well know more about it viewed from some other axes—file format details, metadata crosswalking, whatever. Some things, though, she won't know and presumably won't have to.

So what does she have to know about the research and the discipline in order to be a responsible data steward? And does she have to walk into the process with that knowledge pre-existing, or can she learn it as she works on the research project? How much of what she needs to know will transfer from other projects she's worked on?

Cards on the table: in the absence of much evidence either way, I think that someone with the intelligence, disciplinary background, and intellectual curiosity of a good subject-specialist librarian can learn enough "on the job" to hit the 80/20 point pretty easily, and 80/20 is more than good enough for a successful campus data-curation program in my book. For the other 20% of edge cases, a program can hire specially.

I'll use a True Story about myself as an anecdote. Feel free to quarrel with me (civilly, please) in the comments.

Some years ago I did a small contract job for the ACLS E-book project. They were working on rekeying and marking up an art-history book with extended segments of polytonic Greek text. Their keying vendor took one look and said "no way do we key polytonic Greek." So ACLS told them to key the rest of it and leave placeholders for the Greek. They came to me asking whether I could key the Greek in proper Unicode without snarling up the markup.

I have never studied Greek. I do not speak Greek. I do not write Greek. I do not read Greek, except in the sense that I recognize the letters and can laboriously sound them out. Don't ask me what in the world the accents and squiggly bits in polytonic Greek mean; I haven't the slightest clue.

Not snarling up markup? That I can manage. After an hour or so of research, I found fonts and tools that could enable me to do the keying job correctly and with reasonable efficiency. ACLS and I agreed on a price, and off I went. I didn't know what the squiggles meant, but I could reproduce them, and that was plenty good enough.

When it came time to proof my work, I didn't rely just on my own eyes; that would have been stupid. I called in my classics-major husband. He found typos and the odd homeoteleuton, which I duly fixed up. I sent the result back to ACLS, and they were happy enough to pay me, so there that is.

And there we have it: a partnership between a tech geek and a reasonably well-trained domain specialist (kindly note that my husband was an undergraduate classics major) took care of a data job. I think this can happen more often in more fields.

The chief barrier is the belief that it can't.

2 responses so far

  • David Fiander says:

    There's a wonderful story that I read somewhere about necessary skills and what is lost when processes are automated that is absolutely germane to your example, if not the topic at hand.
    Apparently, many years ago at a Large University Press, they were typesetting a Greek text of some sort. At the time, setting the Greek type was still a manual process. During the process, the (human) typesetter stopped working and informed his supervisor that there was a typo in the manuscript, and that the author needed to be contacted. After much back and forth, the author was contacted, he looked at the manuscript, and agreed that there was a typo, and he fixed it.
    The typesetter, who had no background in Greek of any sort whatsoever, was able to identify the error because in all his years of setting Greek type, he had NEVER placed those two particular characters together in a word.

  • Chris Rusbridge says:

    I think you are absolutely right. I speak as someone completely unqualified for the job, with no computer science qualification, no library science qualification and (ahem) no research qualification. Amazing where a crumby physics degree can get you!
    That said, I do get annoyed at some librarian assumptions that, because they know metadata, they can do data curation. I think the key is the intellectual curiosity - and humility - you imply.