Framing digital preservation

It's not at all difficult to make me roll my eyes. There's one particular narrative that does it every time, with the reliability of the sunrise: the "digital preservation is impossible" narrative. Annoys the living daylights out of me.

I don't think this particular exemplar was the Library of Congress's fault, necessarily. Everything Brylawski is actually quoted as saying is true, and Brylawski is careful to point out that analog formats disintegrate too. Every time, though, nuanced conversations about preservation turn into "digital is bad."

It's nonsense. Preservation is hard. Digital preservation is no harder than analog; we just have better scaffolding in place for (some) analog preservation. (I said "some." Silver nitrate film, anyone?)

I like the way Kyle Felker puts it:

With regards to print material, the systems for archiving and preservation are so old and well-established by librarians, archivists, and publishers that they are practically invisible to scholars. It wasn't always this way, of course, the standards and processes that make it possible for published books and articles to make their way onto library shelves has taken decades to work out. But worked out they now are, so well in most cases that they are mostly invisible to scholars, if not the librarians and other professionals that work to keep them running on a daily basis.

We've worked out quite a bit with regard to preserving digital materials. We know about best practices. We know why spinning disk is to be preferred to CDs. Those of us with some experience and some common sense can make pretty sharp guesses about the preservability of digital materials handed us, just as a paper-preservation expert knows to look at the paper stock and the quality of binding.

Indeed, just as with analog preservation, most of the barriers are not technical; they are social and organizational. Nothing "preserves itself," not analog and not digital. (I keep hearing "benign neglect" put forward as an analog preservation strategy. I respectfully submit that there's a serious problem of survivorship bias in that mode of thinking.) Recently I read about the truly epic NDIIPP effort to save some gameworlds. Technical barriers, yes, but if you read carefully, you find that just about all of those were surmounted (the Second Life efforts are clever as all get-out). What stopped them cold was usually copyright, and the lack of a Section 108 analogue.

(Oh, and while I'm thinking about Section 108, a word-for-word "Section 108 analogue" will actually not suffice. Section 108 doesn't usually kick in until the item in hand has essentially disappeared from view. This won't work for digital preservation: once it's gone, it's often really-truly gone. A Section 108 analogue that does for digital materials what Section 108 does for analog materials will have to allow dark archiving before materials disappear.)

What I resent in all this is the sense of helplessness promulgated by the "digital preservation is impossible" narrative. It writes people like me out of the narrative, or even worse, presents us as deluded maniacs (which I really truly resent; I'm no madder than the next librarian). Digital preservation is possible. We just have to do it, and build the legal, policy, and technical space that lets us do it. Just as we—we librarians—do analog preservation. Kyle again:

When I still was a librarian, I was always a bit perplexed, though, at how the [digital preservation] issue didn't seem to register to most scholars, even those who were actively engaged in digital scholarship. When I spoke to faculty members about what was going to happen to their new whiz-bang digital resource once it was done and they needed to store and access it long-term, I usually got blank stares.

Library invisibility tends to harm libraries and librarians. As the Invisible Wallets, we lost the gravitas we needed to create necessary change in scholarly communication. As the Invisible Preservers, have we done the same with regard to preservation, digital and analog? I don't know… but throwing up our hands and saying "digital preservation is impossible" is surely not going to help us.

Tidbits, 29 September 2010

Sorry for the radio silence, folks; my professional life was disrupted this week by a POTUS visit to campus. (Can I put that on my résumé? "Kicked out of my office and my classroom by the POTUS?" Or would that be bad?) That's over now, though, so time for some tidbits!

Friday foolery: Data management plans

I've linked to this before, but as the NSF data-management-plan go-live date looms overhead, it's worth linking to again: My Data Management Plan -- a satire.

Guaranteed to have your local data-management expert howling with laughter—or curling up under a desk in tears.

Talking open in closed journals

Open-access advocates of all stripes—researchers, librarians, publishers, consultants—are in my experience voracious readers on open access. I fit the profile, but I know people who read even more than I do, approaching "everything."

One practical result of all this reading is that every single one of us—me included—has wanted to read an article on open access that was published in a journal we don't have access to. And almost every time, we grumble about the irony.

Here's the thing, though. If we're going to maximize the reach of our message, we're going to have to put up with that particular irony. I don't like that academics have tunnel vision either, but they do—which means publishing in enemy outlets.

Do we need to call it out every time? Only if we care to annoy and antagonize by looking incurably smug. I suggest instead that we mourn our missing access. That, after all, is the actual problem. Irony isn't.

Not shockingly, there's quite a bit of confusion in the research enterprise about what exactly "open" means. Open access is bad enough, with its green and its gold and its gratis and its libre and its cha-cha-cha (okay, I made that last one up). Open data is worse, partly because it started happening on a noticeable scale before the Panton Principles could frame it properly.

It's pretty safe to say, though, that the Cacao Genome Project ain't open data. Glen Newton has all the details, but the basic upshot is that to get to this supposedly (and trumpeted-ly) "open" data, one has to register (pseudonyms need not apply) and agree to an extraordinarily restrictive license that precludes data mashups and publications, among other things.

Now, I don't know what happened here. It may not have been the researchers' fault. Maybe somebody's lawyer wasn't clear on the concept—funny how often this seems to happen in an "open" context, as open-source developers in academia and industry will tell you at great length. Maybe somebody's website developer was asleep at the switch. I won't poke fun at the researchers themselves, nor assume malice or cluelessness, until more is known.

Just for a moment, though, let's think about what this false claim tells us about the brand value of "open." With regard to data, particularly genome data, it seems to be higher than I would have guessed at this early date. The CGP didn't just quietly put their dataset out there; they made a big deal of its supposed openness. That's fascinating, and it's hopeful. Science is prestige-mad. If open data is a prestige brand, that's a good thing for those of us who want to see more of it.

Curiously, I can't come up with analogous cases of "fauxpen" in publishing. There are the lovely hybrid publishers who can't tweak their website designs enough to get rid of demands for money on articles whose authors have purchased the open-access option, I suppose. In my head that's not quite the same thing, though; it's not really trying to leverage the "open" brand falsely, more trying to ignore "open" in order to grab at more money. I might suggest that publishers have done enough of a smear job on open-access publishing that the "open access" brand is worth less than "open data." I hope that sad situation can be reversed.

The comments are open for LOLresearchers, LOLpublishers, or any other (PG-13, please) illustration of the post title. If I get some good ones, I'll post them here (with credit, naturally).

Nature: the response

I was able to get quite a bit of feedback from Nature regarding prices, access, and content. Last month I spoke with our North American sales reps and another staffer from the London office; my questions and their answers are below:

1. According to Nature’s recent letter, 50% of Nature journals do not currently have an OA option. Are there plans to provide access to the remaining 50%? How would this be accomplished?

It's not 50% of all NPG's journals, but 50% of our academic (not Nature-branded) journals. NPG's August letter to customers says "Open access options are now available on 50% of the 50 academic journals we publish including all 15 academic journals owned by NPG. Seven journals published by NPG on behalf of societies offer open access options, with more expected to follow later this year."

All of the academic journals NPG owns now have an open access option. Since the letter was published, six more of our society-owned journals have introduced open access options. They are American Journal of Hypertension, Laboratory Investigation, Modern Pathology, Mucosal Immunology, Neuropsychopharmacology, and Obesity. We continue to discuss open access with our publishing partners; ultimately, the decision to introduce an open access option on these journals remains with the society or organisation that owns the journal.

Nature Communications is unique amongst the Nature journals in offering an open access option. The gold open access model (funded by article processing charges) is still inappropriate for Nature and the Nature research journals. These journals decline more than 90% of submissions; these high rejection rates, and the developmental editing that goes into every published paper, would make APCs prohibitively high. We estimate the APCs on these journals would be between $10,000 and $30,000, and research funders are not currently willing to support this. The Nature Review journals do not publish original research papers.

2. Does Nature have plans to incorporate newer metrics into journal, article, and author information and assessment? Some examples of these include article downloads, author h-index information, Eigenfactor information, etc.

Earlier this year, we introduced article download information for 43 journals. This is available to authors within their account on eJournal Press, our manuscript tracking system.

We continue to monitor the alternative metrics such as the Eigenfactor, Article influence score and h-index. These metrics are not yet widely accepted or understood, but we remain very interested in alternative ways of judging impact and value.

For example, at NPG we think that cost per download and cost per local citation are potentially important measures of the value for money of a journal to an institution.

3. With regards to communicating and sharing consortial plan arrangements and information, are there plans to provide more transparency on pricing? Specifically, are there plans for different consortia to know prices and information provided to other individual customers and consortia?

NPG makes its academic list prices public in the interests of transparency. We have no plans to make terms of consortia agreements public. Each consortium is different in terms of their holdings, number of institutions and FTEs, and these are confidential agreements.

4. In my phone call with our NA reps we briefly discussed a library advisory board for Nature. Can you give more information on the group’s membership and activities?

The NPG Library Committee is an invited group of NPG institutional customers. The group represents a mix of customers from across the world, working in academic, corporate and government settings. It includes both individual customers and consortia managers. The Committee or a sub-group of it meet face-to-face approximately once a year. We discuss NPG's activities and the wider publishing and information communities with them regularly via a discussion board on Nature Network, email and phone. The Committee provide useful feedback and insight on the views of the information community.

5. One comment to my blog post mentioned Nature’s mission statement and its recent change. Can you provide more details on how the mission statement is reviewed, updated and shared with customers and readers?

The journal Nature's original 1869 mission statement still stands, and guides Nature Publishing Group's activities today:

THE object which it is proposed to attain by this periodical may be broadly stated as follows. It is intended
FIRST, to place before the general public the grand results of Scientific Work and Scientific Discovery ; and to urge the claims of Science to a more general recognition in Education and in Daily Life ;
And, SECONDLY, to aid Scientific men themselves, by giving early information of all advances made in any branch of Natural knowledge throughout the world, and by affording them an opportunity of discussing the various Scientific questions which arise from time to time.

Nature's mission statement was updated in 2000 as follows:

First, to serve scientists through prompt publication of significant advances in any branch of science, and to provide a forum for the reporting and discussion of news and issues concerning science. Second, to ensure that the results of science are rapidly disseminated to the public throughout the world, in a fashion that conveys their significance for knowledge, culture and daily life.

We have no current plans to update Nature's mission statement.

6. Another comment to my blog post mentioned the pricing model for Nature as being based on a floating currency model which is not determined by currency rates we see in bank and other financial updates. Is this how prices in international currencies are determined, and are there plans to review or revise this model?

In 2008 NPG introduced local pricing based on four local currencies (dollar, euro, pound sterling and yen). This means price increases are applied to local currencies, independent of currency exchange rate fluctuations.

I have a few comments on these answers:

1. Estimates of $10,000 - $30,000 in author charges for one OA article? You heard it here first. That's the entire journals budget for some small libraries. They'd be able to get one article for the year - for all the faculty. I wouldn't call that a viable OA option.

2. I'm glad Nature is implementing article download information, but I think it would be much more beneficial if everyone, not just the author, could see the data. As a point of comparison, citation data are available to all users of a citation database.

More generally, I think Nature is trying to have it both ways: be a boutique publisher, with high costs and a correspondingly high profit margin, and also remain a core publisher, in that most academic or scholarly institutions are expected to subscribe to at least some of the content. With these cost increases and licensing options, Nature is becoming (or already has become) unaffordable for many institutions. If scholars weren't demanding access, many of these titles would have been cancelled by many libraries by now.

I don't think you can have it both ways. If most of your customer base can't afford the content, then in my opinion your market is limited and you can't also be considered a core publisher. Can you be both Neiman-Marcus and Wal-Mart at the same time? I don't think you can.  I'm curious to see how many libraries have cancelled titles from Nature or have forgone adding titles because of the cost.

This discussion is also painful for me because I know faculty where I work want Nature journals - I have a list of over six titles that have been requested in the last few years. I feel I am doing a disservice to my colleagues in withholding access to something. But the money is simply not in our budget.

I also want to support publishers that are experimenting with new communication channels in scholarship like web features, a blogging platform, Second Life, podcasts and the like.  Nature has been very progressive in exploring these new areas of scholarship, and their support has helped legitimize them as communication channels. Does it have to come at such a high cost? I hope not.

On preservation versus replication of research data

Sep 20 2010, published under Praxis

I often see a cost argument against research-data preservation: if it's cheaper to replicate or regenerate the data than to preserve it, why preserve?

Here's my question: Cheaper for whom?

If we remain within the context of an individual lab, this question is a no-brainer: if it's cheaper to regenerate, regenerate. As we dip our toes into a more open data world, however, I should think the equation changes rather a lot.

Is it still cheaper for two labs to have to regenerate these same data? Five labs? Twenty labs? How many of those labs will have to buy specialized equipment to create those data, equipment they wouldn't need if the data were shared by the first lab? How much staff time—worst-case, specialized staff time—will be eaten up in regenerating data?

There are certainly offsetting costs to consider: the cost of data discovery, the cost of cleaning up and describing data for sharing, the cost of whatever munging it takes to move data from one lab's context to another's, the magnified cost of any error on the part of the data-generating lab.
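To make the "cheaper for whom?" question a little more concrete, here is a toy cost model, offered only as a sketch under loudly stated assumptions: the function names and every number are hypothetical illustrations, not data from any real lab or study. It compares N labs each regenerating the data (and buying their own equipment) against one lab generating, cleaning up, describing, and preserving the data, with every other lab paying a single per-lab reuse cost that lumps together discovery, munging, and the expected cost of inherited errors.

```python
"""Toy model: regenerate-everywhere vs. preserve-once-and-share.

Every function name and every number here is a hypothetical
illustration, not data from any real lab or study.
"""


def regenerate_total(n_labs, gen_cost, equipment_cost):
    """Each of n_labs generates the data itself and buys its own equipment."""
    return n_labs * (gen_cost + equipment_cost)


def share_total(n_labs, gen_cost, equipment_cost,
                prep_cost, preserve_cost, reuse_cost):
    """One lab generates, cleans up, describes, and preserves the data.

    The other n_labs - 1 labs each pay only reuse_cost, which here lumps
    together discovery, munging, and the expected cost of errors
    inherited from the generating lab.
    """
    first_lab = gen_cost + equipment_cost + prep_cost + preserve_cost
    return first_lab + (n_labs - 1) * reuse_cost


def break_even_labs(gen_cost, equipment_cost, prep_cost, preserve_cost,
                    reuse_cost, max_labs=1000):
    """Smallest number of labs for which sharing is strictly cheaper,
    or None if that never happens within max_labs."""
    for n in range(1, max_labs + 1):
        shared = share_total(n, gen_cost, equipment_cost,
                             prep_cost, preserve_cost, reuse_cost)
        if shared < regenerate_total(n, gen_cost, equipment_cost):
            return n
    return None


if __name__ == "__main__":
    # Illustrative costs in arbitrary units (assumptions, not measurements).
    params = dict(gen_cost=10_000, equipment_cost=5_000,
                  prep_cost=3_000, preserve_cost=2_000, reuse_cost=1_000)
    for n in (1, 2, 5, 20):
        regen = regenerate_total(n, params["gen_cost"], params["equipment_cost"])
        shared = share_total(n, **params)
        print(f"{n:>2} labs: regenerate={regen:>7,}  share={shared:>7,}")
    print("break-even at", break_even_labs(**params), "labs")
```

With these made-up numbers, a single lab is better off regenerating (15,000 versus 20,000 units), but sharing wins as soon as a second lab needs the same data. If the per-lab reuse cost matches or exceeds the cost of generating the data plus equipment, sharing never wins; that is the "offsetting costs" side of the ledger.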

Still, my sense is that the discussion around cost has been just a bit simplistic… and is likely to become more complicated as data-sharing norms emerge.

Not hanging separately

Sep 16 2010, published under Open Access

I've been thinking again about the question asked me at UCLA: why should academic libraries divert staff and budgetary resources to open access (green or gold, gratis or libre) if our mandate is to serve our local patrons? I gave an answer that I wasn't particularly happy with. I have a bit of an esprit d'escalier answer now, the borrowed words of Benjamin Franklin:

We must, indeed, all hang together, or most assuredly we shall all hang separately.

A lot of us academic librarians know that our faculty basically see us as wallets. We pay for the stuff they use. That's pretty much all they know about us, all they think we do (aside from checking books out at the desk, don't you know—and I am being sarcastic because this function is rarely performed by actual librarians). If we just hang back contentedly being wallets, what will happen to us when the wallet-function breaks, as we all know it's breaking? Particularly, what will happen if we have nothing to fall back on—no rhetoric, no advocacy, no best practice, nothing—from the profession, the collective? A library under siege from its institution with no support external to that institution isn't playing a strong hand.

We also know that toll-access publishers and aggregators have been playing divide-and-conquer for a long time. What are all these NDAs about, if not to divide libraries one from another and prevent us from gathering the collective intelligence that would let us all negotiate fair prices? At this late date, we seem unlikely to reverse this behavior as individual libraries and consortia. It's going to take a full-court press, from as many of us as possible.

Just as open access will. It's perhaps a measure of my own demoralization that I was quite chuffed by this preprint finding that 49% of current academic-librarian–authored materials can be found open-access. Forty-nine percent is dismal, but it's also extraordinary, and rather better than I would have expected. The lesson is clear, though: individual efforts on individual campuses and in individual libraries don't get us very far.

At the risk of sounding all commie and stuff: we work toward a collective openness, or we die off one by one as the business model sustaining us as well as publishers crumbles to bits.

One of our own

Today ACRL (Association of College and Research Libraries) released a major report, Value of Academic Libraries: A Comprehensive Research Review and Report. The entire report is at: It discusses ways libraries can better demonstrate their value to the academy (or to industry, the local community, a school system, etc.).

Disclaimer: I haven't read the whole thing. Unlike some of my colleagues I rarely get reports prior to the release date and it usually takes me time to read them. Since this is the start of the semester and it's over 150 pages this one will take me some time to digest.

I have read the introductory material, though, and will say this: it's a good report for anyone who wants to better understand the scope, mission and challenges of today's library. One interesting observation: since it's not in a toll-access journal, it's free to all.

Incidentally, this tradition of libraries publishing many of their most topical and case-based industry reports freely on the web has slowed our own literature's move into new publishing models (like Open Access) and kept us from understanding the plight of our academic colleagues. Much of our most important and timely work is not in journals (with a few exceptions), while other academic faculty face more journal titles in which to publish, more places to monitor research, and longer review times. Unreliable social networks and tools that shut down or change access models make monitoring research with social tools harder; Bloglines and Scribd are two very recent examples. Bloglines will be shutting down Oct 1, and Scribd will require a userid to download content. Coupled with this is the pressure to conform to established criteria for promotion and tenure: who has time to innovate?

Other recently reported news was the announcement of Frank Turner as University Librarian at Yale University. There's more information here, and while I can't comment on whether I think he has the skills to run a major academic research library, there has been discussion within the library community on whether an academic without an MLS should hold this or similar leadership positions in academic libraries. Should we hire and promote our own to run the library? I will say Yale felt justified in promoting one of their own: Prof. Turner has been at Yale for some time, and the news release hints that the search committee identified Turner early in the selection process. Is it so bad to pick one of your own? Personally, I think this can be both good and bad: good in that you know the candidate well and can be assured of their opinions, management style, etc.; bad in that, knowing the candidate well, you might take their direction, opinions, or other aspects of their performance for granted. Do you value tradition to the exclusion of innovation? Can these two qualities peacefully coexist? I don't think there are simple answers here. The other argument in the library community is that this trend cheapens the library or MLS degree. I do have some concerns about this, but leadership can be surprisingly democratic and not necessarily based on recent or traditionally relevant experience. In the case of Prof. Turner, I think his experience as Provost more than makes up for the lack of an MLS degree.

This second announcement brings up another question: how much can you truly innovate in an established, traditional environment like an Ivy League academic campus or a top-tier academic research library? Is it even possible? I think the challenge for Turner is not so much running the library (although that will be substantial, considering the size and scope of Yale's collections and library locations), or even meeting administrative expectations such as raising funds, raising the library's profile on campus, and garnering support for library services and programs.

I think Turner's real challenge will be addressing the questions raised in the ACRL report: reconciling the academic library's tradition of having both a collegial and a managerial culture, addressing the requirements of outside accrediting organizations and professions, and demonstrating the relevance of library resources to the overall campus mission. To me, this is the real challenge of our profession. Is it better to have "one of them" talking to the department faculty, or "one of our own" in the local academic community explaining our culture to others?

Since I have a stake in this, I'll share my opinion: my first priority is that the place I work gets the support it needs to function and thrive. If the best advocate for me as a librarian is a non-MLS administrator, or a faculty colleague, or a student turned prominent alumnus, or even a library vendor who can tell their peers, then I'm fine with that. Would I rather have it be "one of my own" doing this promoting? Sure; I think everyone is happier when their own peers influence decision-making. But you can't always communicate with everyone on your own terms, particularly with some higher-level audiences. I think we all need to know when it's better for someone else to toot our horns for us, whether they're one of our own or someone else.

Christine Borgman on data

Sep 14 2010, published under Praxis, Tidbits

Christine Borgman has a lengthy track record of saying smart and apposite things about scholarly communication and research data. (See my review of her 2007 book here.)

She has done it again, in a conference paper entitled "Research Data: Who will share what, with whom, when, and why?" If you liked my Ariadne article at all, you will love this, I guarantee it. Strongly recommended, so much so that I didn't want to wait for the next tidbits post.

