And we're back! (With a four-note theme. Wait, that's Peter Schickele on Beethoven. Never mind.)
So yesterday before our enforced break, I asked what we could learn about e-research from a big chunk of space flotsam hitting Jupiter. What had caught my eye was this passage:
… the planetary astronomy community has been filled with excitement—emails are flying, with people exchanging information about the new discovery and its development. Major observatories are canceling their scheduled observations so that they can point their telescopes at Jupiter.
Why are they doing this? Because this is the only chance they get to record data about this particular event. Once it's over, it's over. And once it's over, any data that have been recorded are irreplaceable when lost, destroyed or otherwise rendered unusable.
Irreplaceable. Scary word. Puts data curation in a new light, doesn't it?
If you work in a field that is not reliant on transient observational data and in which experiments are easily replicable, you are one seriously lucky duck. For the rest of us, we get one shot at what we study, because we're stepping into Heraclitus's river every single day of our research lives.
Don't think this phenomenon is limited to the astronomers and the climatologists. Consider the plight of the linguist recording the last native speakers of a moribund language. Consider the historian, the sociologist, the ecologist, or… anyway, trust me, it's widespread.
Some corollaries fall out of the irreplaceability axiom. On a walk around the block during this summer's Arts and Humanities Data Curation Institute, I was (perhaps dubiously) inspired to create the following image, patterned on Maslow's famous hierarchy of needs:
Irreplaceability is the reason I put data-acquisition issues at the bottom of the pyramid. If you ain't got the data in your grimy little hands, none of the rest of the pyramid matters!
This is the chief reason I think institutional repositories as a whole have been (pace Cliff Lynch) a failure thus far. They absolutely reek at getting their grimy hands on data, irreplaceable or otherwise. One may sneer at how such outfits as the Center for History and New Media fare on some of the upper strata of the pyramid, and I have in fact done so (privately heretofore, but oh well; Dan Cohen knows I love him), but there is just no denying that CHNM knows how to get its hands on one-time data.
Another corollary: when we are prioritizing what data we curate, since we simply cannot keep it all, irreplaceable data have a leg up on the competition. I believe in some areas of chemistry (and perhaps elsewhere), some rather heated arguments are taking place about whether to keep or recreate data. Looking at the heinous volume of irreplaceable data, I think I have to fall on the "recreate whenever possible" sword, recognizing that it is a sword.
And one last corollary: researchers who gather irreplaceable data have a special obligation to take good care of it!
Salo's Pyramid, by-the-bye, is finding use elsewhere. No one is so surprised by this as I, since it was a spur-of-the-moment thing (I'd just put Maslow's dissertation in the repository, and… look, my brain is a strange and uncanny place, okay?), but for the record, that entire presentation is licensed CC-BY. Gank in good health.