I hacked the academy

This post is intended for Dan Cohen and Tom Scheinfeldt's crowdsourced Hacking the Academy book.

Arguments about open access usually appeal to altruism, tradition, or economics. Even arguments supposedly aimed at researcher self-interest strike me as curiously abstract, devoid of useful example. I will therefore tell my story about open access, because I hacked the academy and lived to tell about it.

I graduated from library school in May 2005, and by good fortune managed to begin employment as an institutional-repository manager in July. Knowing no better, I wrote on a weblog about my experiences and how they were informing my thinking about open access.

About two years later, another repository manager contacted me. She was co-editing a themed volume of Library Trends about institutional repositories, and would I be interested in contributing an article?

This all by itself is an academy-hack. Wet-behind-the-ears librarians barely two years out of the starting gate are not usually asked to contribute articles to august periodicals. I can't explain this save by reputation gained from my open writings on my weblog. Can you?

I'm not sure, I said. Most of the things I have to say about repositories aren't happy. That's all right, she said; somebody has to say them. But the Powers that Be… I said. Just write me an article, she said.

So, knowing no better, I did, titling it "Innkeeper at the Roach Motel" after a nickname I had earned from my librarian colleagues. In December 2007 I placed a preprint of that article in the institutional repository I was running.

The article did not appear in print until March 2009, though it had been due out several months previously. Delays happen in the scholarly-publishing business; that's not unusual. Here's what's unusual: the day "Roach Motel" appeared in print, it had five citations already. Bizarrely, two of those citations are in the very same issue of Library Trends in which "Roach Motel" itself appears.

Today, Google Scholar claims over forty citations for various forms of "Roach Motel," though some of them are student papers rather than published articles. Because I still run the repository housing the pre- and post-print, I can tell you that "Roach Motel" has had over eleven thousand pageviews there. Even a full year after publication, it is rarely out of the top spot in monthly pageviews, and never out of the top three. (Incidentally, this is why the repository in question does not publish top-ten monthly download statistics, as others do. I could do it; it just feels too self-aggrandizing.)

This is academy-hacking with a vengeance! I can't explain the wide readership of "Roach Motel" via my own prominence in the field; I didn't have any when "Roach Motel" was published. I can't explain it by publication venue; Library Trends is a solid journal, but it's not the library analogue of Science or Nature or Cell. I can't explain it by selective self-archiving, as I've self-archived almost everything I've written for publication. Moreover, I self-archived "Roach Motel" as soon as I sent in its draft, before it was reviewed and long before I could have had any notion it would make a splash. I can't even make much of a claim about its quality; I hope it's good, but I know it's flawed.

Open access. Open access let me hack the academy. I can't explain the bizarre story of a bizarrely-titled little article from a relatively bizarre librarian any other way. Can you?

And if you can't, why aren't you self-archiving?

Conflagration coming

I'm on record predicting a toll-access journal bloodbath.

Anecdotes are not data, one dead swallow doesn't mean the end of summer, and so on… but I just heard yesterday about a second small independent toll-access journal whose sponsors may be discussing winding it down.

This isn't quite the scenario I was looking for; I expected a stable-fire or two among small journals at the big publishers. That isn't happening yet. Some big publishers are still posting record profits, so the squeeze isn't on. Others are going on buying sprees hoping to trade on exclusive access.

I do think those record profits are endangered, if the mad rush of salesfolk I keep hearing about from my librarian friends is any indication at all. I don't know how the exclusive-access trick will play out, but my bet would be on "unhappily."

It does make sense, though, that small independent journals will find themselves in even more trouble than they're already in. They've always been first on the chopping block in libraries. Not a few have poor or no electronic-publishing strategies (by which I mean more than merely placing journal content online; usage data is not optional these days if one cares to sell to most libraries). If their subsidies are now endangered (which seems to be where current problems are arising), so are they.

Right now, these journals are just kindling. I mean no offense; many of them are excellent. Their deaths, however, aren't going to raise much comment.

I'll hold to my prediction for now, though. There's a conflagration coming.

I need to lift the iron curtain between this blog and my workplace. I beg your indulgence for one post.

As those who read Bora's interview with me know, I discontinued my previous blog Caveat Lector because I was informed that it was causing significant distress to individuals in my workplace. In my best judgment, I could not continue to blog there in any capacity without it appearing that I had simply brushed off the problems I caused. I took those problems very seriously indeed, as the closure of CavLec bears witness.

When I came to ScienceBlogs, I intentionally structured Book of Trogool such that potential conflict would arise as seldom as possible—preferably never. That was simple enough as long as my data-curation responsibilities were limited to membership on exploratory committees, and my scholarly-communication responsibilities were limited to keeping an IR running.

The latter is no longer the case; my position is being revised to include broader responsibility for scholarly-communication issues. While I'm pleased about the change and hopeful that I will serve well in this new capacity, it does bring back the spectre of additional blog-related trouble.

I'm still deciding what to do (and to be clear, this decision is mine alone, just as the decision to shutter CavLec was). My experience last year led me to agree with Jenica Rogers's assertion that libraries, broadly speaking, are not comfortable with online professional identities, and I accept her conclusion that there's little choice but to adapt one's online identity management accordingly.

I invite your thoughts in the comments here, but be aware that comments critical of my workplace are liable to be ruthlessly edited or deleted. (Criticism of me personally, or my approach to blogging, is fair game as long as it doesn't veer into the usual delete-bait.) My gmail (dorothea.salo) is also at your service. As always, I am very grateful for the thoughtful and intelligent nature of my readership.

Tidbits, 13 May 2010

Bon mot?

Saying that large-scale storage is all that's necessary for data curation is like saying that empty bookshelves are all that's necessary for a library.

On NSF data plans

Word on the street is that the NSF is planning to ask all grant applicants to submit data-management plans, possibly (though not certainly) starting this fall.

Fellow SciBlings the Reveres believe this heralds a new era of open data. I'm not so sanguine, at least not yet. Open data may be the eventual goal; I certainly hope it is. At this juncture, though, the NSF would be stupid to issue a blanket demand for it, and I rather suspect the NSF is not stupid.

Part of the problem, of course, is that many disciplinary cultures are simply not ready for even the idea of open data. If the NSF were to mandate it, these disciplines would revolt openly, tossing lots of "government interference in science" rhetoric around. Moreover, disciplines that are hand-in-glove with industry would lead the charge, with industry's big bucks to back them up. I hear quite a lot about industry strongarming academic scientists into considering nearly everything, emphatically including data, a "trade secret."

(Lest anyone think this type of reaction is limited to the sciences, I ask you to recall the kerfuffle at Iowa over electronic theses, spearheaded by the creative-writing department.)

Another part of the problem is that many, perhaps most, scientists who are ready for the idea of open data are emphatically unprepared for its praxis. It's beyond doubt that data management will be extra work for most of these people, given how sloppy and ad-hoc many data practices are; as the NIH Public Access Policy demonstrates, adding to a researcher's workload must be done with extreme circumspection.

The NSF can't hand down guidelines from on high. Blanket "here is how you deal with your data" demands will not work, given the quantity, variability, and variable sensitivity of data across the scientific enterprise. Data standards? Data standards don't exist for the entirety of science (never mind metadata standards), and not even the NSF can wave a magic wand to call them into existence. Rather cleverly, then, the NSF is planning to say "We don't necessarily know how to deal with your data, but we expect you to think about it and do the right thing."

So if you think you might be affected by this rule if it comes to pass, what should you do? Here's what I think.

  • Do not try to revamp every single process and procedure you have. Do not try to "rescue" all your old data all at once. You will swamp yourself and get discouraged. Seriously, don't. Panic won't help you here.
  • Instead, look back at your last funded project, since it will be freshest in your mind. What data did it produce?
  • What happened to that dataset in the course of your research? Did you run programs against it? Be prepared to archive and document that code.
  • Who handled your data? Did they document it? Where? If there is any part of the process you're fuzzy on, be aware that this fuzziness will need to go away for your next project.
  • Ask yourself the famous ten questions (PDF) about your data. The answers will inform your data-management plan.
  • What can't you do for your data that you think should be done? Need partners? Go find them now. Depending on your needs, the right partners may be in your campus library or IT organization, or they may exist at your funder or in a research center near you.

That should keep you out of trouble for a while! It will also mean that you are prepared come the next funding cycle, where many would-be grantees won't be. In today's cutthroat funding environment, that can only help.

No more can-kicking

Having made it back at last from Scotland despite the ash cloud, and overcome jetlag and (some) to-do list explosion, I finally have leisure to reflect a bit on UKSG 2010.

My dominant takeaway is that nearly everyone in the scholarly-publishing ecosystem—publishers and librarians alike—is finally aware that we can't keep kicking the journal-cost can down the street any longer. Serials expenditures cannot and will not continue at their current level, much less increase.

When I think back to the last talk I gave to an audience of publishers, I see that a lot has changed just in my own demeanor. I was scared to rock the boat in 2006. Now, I can say "Being a roadblock is a poor business model" aloud in front of 700-odd people without being lynched, or even being afraid of being lynched. I think many toll-access journal publishers have secretly admitted to themselves that they're roadblocks, and they know way down deep that matters can't go on so.

I also got a sense that librarians are starting to face facts at last and arm themselves for a fight. I'm not the only one who noticed; I was stopped after my talk by a gentleman who said that he had never heard so much angry, determined backtalk coming from librarians as he had at UKSG 2010. See also Meredith Farkas, the Librarian in Black, and, as always, liberation bibliographer Barbara Fister. That last piece? I'm notorious (not necessarily in a good way) for being outspoken, but I would have been too timid to write it. I love it, though, and I love that I'm not the only outspoken soul in this space!

All of this is deeply fascinating. What will it lead to? I don't know, but here are some guesses:

  • Additional growth in libraries-as-open-access-publishers, predominantly at research institutions. This will not limit itself to the journal literature; monographs are squarely in the sights as well. University presses unable to compete with this new market entrant will merge with libraries or fold, though their death-throes will probably take years or even decades.
  • Toll-access journal publishing will become a zero-sum game, if it isn't already. Every dollar of additional profit for the Elseviers and Informas of this world will be ripped from the pockets of other journals and journal publishers, including scholarly societies that haven't already signed deals with one devil or another. This is what I mean when I say the can can't be kicked any further down the road.
  • No one seems to agree with me on this, but I grow more confident by the day: small, low-subscriber-base journals at Big Deal publishers are in deep trouble as well. They add overhead but no especial additional profit, so they are obvious cost-cutting targets. Perhaps a journal massacre won't happen right away; EBSCO particularly still seems to be on an acquisitions spree. I do believe it will happen, though—and when it does, some of those journals will re-form as gold-OA, while most of the rest will simply fold, publisher-hopping not being an option.
  • Green open access in the form of mandates will grow the supply of open-access content quite quickly in the next few years. No, not all the eligible material will wind up in repositories. Yes, some publishers and researchers will balk. Yes, there will be faculty backlash at some mandate institutions, perhaps even all of them. No, it won't be enough to repeal most mandates, or to stem the tide of new ones.
  • The second-order effects of so much gratis-OA material becoming available will bear watching. I expect them to be difficult for toll-access apologists to explain away or ignore.

Before someone asks, I don't know what will happen with the Federal Research Public Access Act this time ’round. Federal legislation is always something of a crapshoot. FRPAA's previous history shouldn't count against it, though; going round and round the mulberry bush is just the way federal lawmaking works.

Moreover, the zeitgeist around open government, open (research and government) data, and open science may be working in FRPAA's favor this time around. "Open" has a lot more brainspace than it did last time. The non-failure of the NIH Public Access Policy certainly can't hurt. (I say "non-failure" because it's hard to argue that somewhere around a 65% compliance rate, which is the percentage I've heard bruited about, is a total success. It's certainly a vast improvement over the 3% garnered by the initial voluntary policy, though—and I expect compliance to rise further as the NIH starts to tell grantees "no, really, we're serious: no manuscripts, no money.")

Still, I hesitate to predict legislation. I'll stick to the ground I know better: publishing and libraries.

