Looking toward 2011

Dec 30 2010

Before I get to crystal-ball-gazing, I have to point out my track record, because it's really quite bad. Not only am I on record with a major prediction that didn't come true ("IRs in the US will fold"), I quite failed to predict a number of things that did, from Harvard's OA policy to California telling Nature Publishing Group to go suck eggs.

My brain looks at systems. That means I consistently miss outliers and game-changers. I also don't always judge the durability of systems correctly.

So with that said, here are some things that wouldn't surprise me a bit in 2011.

  • SCOAP3 ekes its way through; COPE backpedals or folds. What the open-access movement is facing in 2011 is a world where most of the low-hanging fruit has been plucked. Progress isn't easy or obvious any more (if it ever was), and it can't be made by the pioneers, entrepreneurs, and other earliest-of-early adopters. IRs are no longer fashionable (in the States, I add for my international readers). Gold-OA funds have to contend with the ever-widening maw of Big Deal renewals. My sense of attitudes among research-library administrators, as well as rank-and-file selectors, does not favor COPE's success or even survival.
  • Academic samizdat sees a real copyright lawsuit. Those creeps over at Attributor may well be the instigators. If they're smart, they won't actually sue a university, much less a library; they'll go after Mendeley or something RapidShare-ish, to keep the slumbering faculty behemoth safely abed. It's not out of the question, however, that some tiny school somewhere with grossly inadequate or nonexistent "electronic reserves" protections (and I've seen such schools firsthand; the culprit, aside from faculty themselves, is generally a boundlessly clueless IT shop) will be the target.
  • The initial campus NSF flurry will sputter. I'm worried about this myself. Libraries and IT shops building data-management services on the strength of the NSF's data-management-plan requirement: I encourage you to diversify, and quickly. Find non-NSF people to help. Do a survey or focus-group study to demonstrate non-NSF-related data-management needs. Pay some attention to the digital humanities. Do not plan to rely on a flood of NSF applicants; that flood is highly unlikely to materialize. There's plenty of work to do, don't get me wrong; most of the work just doesn't happen to be NSF work.
  • FRPAA won't make it this time either. Sorry. Maybe next time. Or maybe the NSF won't wait for Congressional cover, though I emphasize the "maybe" on that one.
  • Some chemistry department somewhere will drop ACS accreditation because the institution can't afford ACS journals. I have to admit, I have a little inside info on this one. But it's only logical, really.
  • A bare handful of Big Deal renewals will blow up, à la California and NPG. This is likely to happen in the full glare of the public eye, despite publisher wishes and publisher NDAs, because Big Deals are just that big and that noticeable. Don't be gleeful about this, libraries, because…
  • Faculty will start a lot of "why don't those damn librarians…" grumbling. If you'd like to hear some, pre-2011, have a listen to Amanda French and Tom Scheinfeldt in this episode of the Digital Campus podcast. Those damn librarians. Why don't they just fix this? Where's their damn spine?
  • An IR's gonna fold. Yes, all right, I was wrong when I said this the first time, and I wouldn't be surprised to be wrong again. But I'll say it nonetheless. I see too many libraries that opened IRs on a wing and a prayer without adequate planning or even a sensible collection-development policy. Let's face it, folks: in the absence of mandates, the OA-via-IRs experiment failed. Let's also face that libraries can't run (much less re-run) expensive experiments these days. Result? Some IR somewhere will face a big budget ax. (Disclaimer: those who know me professionally know that the IR I run is getting merged out of existence. That doesn't count for purposes of this prediction; that would be cheating.)
  • We'll see a bare handful more campus or patchwork mandates. I don't think we've quite seen the end of the post-Harvard wave. I do think we're close to that end—and there won't be a second wave, not without a lot more work and evangelism than the open-access movement is currently mustering. There just haven't been enough mandates quickly enough to start up an academic fashion.
  • Another major university press will merge with its library or fold. I haven't a clue which one, but given the continued bumbling confusion among provosts about scholarly publishing being able to cover its nut (hint: it can't), and the continued denial among the humanities that the economics of monographs no longer hold water (hint: go all-digital, perhaps plus POD, or die), this is all but an inevitability. We'll see a few more small scholarly presses fold as well.
  • Crowdsourced data-analysis projects will increase, and pick up more good press. GalaxyZoo alone practically guarantees this one, but the humanities are charging forward with some great transcription projects as well.

It'll be a challenging year, no doubt about it. Let's meet it with fortitude.

Oh, Chicago? Your Freudian slip is showing.

Dec 28 2010

When I was a young and ambitious librarian, as opposed to the cynical crone I am now, I read a number of librarian career manuals. I honestly don't recall which one it was that mentioned open-access journals only with loathing, discouraging any academic librarian serious about her career from publishing in one.

You know, never mind that in digital librarianship then as now, D-Lib Magazine is one of the major prestige outlets. (So much for peer review, incidentally: D-Lib isn't peer-reviewed. That doesn't seem to stop them from publishing brilliant articles from the best people in the business, and no, I've never managed to land an article there, so I am not being self-serving.) But that book was weak on digital librarianship to begin with.

Be that as it may, as a new institutional-repository manager reading that book, I felt betrayed by my own profession. How was I supposed to cheerlead for open access (gold as well as green) within my library and my institution if my profession took this "do as I say, not as I do" attitude? The rest is history, really; that was only one of many times librarians and librarianship have despised and undermined open-access work, my own as well as that of colleagues at other libraries in other institutions.

So I'm saddened but not shocked to see that the Chicago Manual of Style is similarly undercutting the many scholars actively participating in the open-access movement, as Stuart Shieber ably recounts. Not shocked, but a little surprised, I must say; I always respected and appreciated the Manual for its "if you really believe it's fair use, don't ask permission, because among other things, by doing so you weaken your fair-use case" stance.

That is sound advice, advice that strengthens the cultural commons. What the latest Manual says about open access is nakedly selfish and a tremendous lurch backward. Whoever wrote that segment should be ashamed. Whoever greenlighted it should be ashamed. Whoever demanded that segment be written? Shame isn't enough, frankly. Maybe a severance package?

The conversation on Twitter has noted that Chicago has a bit of a left-hand-right-hand problem here. If their fair-use advice isn't enough evidence of liberality, the University of Chicago Press has one of the most enlightened green-OA policies out there (ignore the color designation and read the text; "yellow" is wholly unfair). So the Manual editors aren't just stabbing Shieber and other OA-friendly faculty in the back; they're gutting their own colleagues over in the journal division. Nice of them.

I don't entirely know what to do about these situations. Voice and protest, yes, certainly. I've exercised voice. Dr. Shieber obviously does. Dr. Kathleen Fitzpatrick has as well, quite skillfully. The fact remains, though, that the academy's print fetish gives that librarian career manual and certainly the Chicago Manual of Style an awful lot more weight than a few blogs can muster—and as we're seeing, the suppliers of the academy's print fetish tend to be quite a bit behinder-hand than even the academy itself.

I wasn't entirely kidding about the severance-package idea. I don't even mean it to be punitive. I'm just not sure how far forward we can move (pace wonderful people like Mike Rossner and the Rockefeller crew) given current scholarly-press notions of leadership.

Oh, look, that's there

Dec 27 2010

I don't know why it didn't occur to me until now to look for an article of mine that was to be published in November, but it didn't. (Perhaps because I've had articles delayed before?)

Anyway, "Who Owns Our Work?" has been out for a month in Serials. If you are paywall-stymied, I self-archived a postprint for you.

The presentation it's based on (which I think is a little easier to follow than the article, honestly, and it's also a bit saltier, which is more fun in my book):

Enjoy. If I find time later this week, I'll drag out my broken crystal ball, because that's always fun.

Tidbits, 2010 end-of-year cleanout

Dec 23 2010

Wow, have I ever let the tidbits folder get out of control. Bad me!

I've moved from to Pinboard for the nonce. With the diaspora in full swing, the best way to get me a link of importance is probably to comment here. Speaking of which: I'm getting reports of would-be commenters turned away with 403s. If that's you, would you please drop me a line at dorothea.salo at gmail? I'm trying to get a handle on how widespread the problem is and what might be causing it. Also, I'm really sorry it happened!

What if we threw a data-curation party and nobody came?

Dec 21 2010

So a lot of libraries and campus IT shops in the States are gearing up to deal with this whole NSF data-management plan thing. Websites are going up, would-be consultants are warming up their phones, plans are being planned (and sometimes even executed).

What if we build it and they don't come? Have we thought about this possibility?

I'm afraid my intrinsically Cassandraic nature only partly inspires these questions. We know pretty well from surveys and qualitative investigations (bug me for a bibliography if you like) that the average researcher hasn't a clue librarians can help her look after her research data. The said average researcher despises librarians, for that matter; she thinks that pukka information management can be taught to graduate students soup to nuts in a weeklong seminar, and she thinks that the real limiting skill for data management is deep disciplinary knowledge (which raises the question of why she typically leaves it to wet-behind-the-ears grad students, but…). The average researcher is dead wrong, of course (including about disciplinary knowledge being the sole limiter), but does she know that?

So let's imagine our old friend Dr. Helen Troia of the University of Achaea's Basketology department for a moment, faced with this new NSF requirement. Where will she go for help?

Well, she's probably going to call her NSF program officer first, an eminently reasonable thing to do. I hope the NSF has told its program officers to tell all the Dr. Troias of this world to look for help in their libraries—at least on their own campuses—but I'm not sanguine. What is clear, though, is that the NSF isn't going to manage Dr. Troia's data for her; at most, it'll give her a better idea of what she has to do to prove she's managing it wisely. So where does she go then?

She may also talk to her research-support office. Libraries: does your institution's research-support office know about your NSF-related activities? If it doesn't, better tell it. And she'll have a word with her local grant admin (she's lucky enough to have one) as well. Libraries: what do local grant administrators know about you?

If Dr. Troia's data are digital (not all data covered under the policy are, a point that bears re-emphasis), her next stop is likely to be her departmental IT talent. Libraries: if you are only partnering with campus IT, you may (depending on the way your campus is organized) be missing the boat. Find out where the people in small IT shops hang out, and reach out to them, too.

Now, departmental IT may well take on the job, but they are liable to do it ludicrously wrong. "Here, have some server storage space," they will say, ignoring questions of metadata, versioning, formats, organization, security, citability and other sharing issues, sustainability past grant expiration, and possibly even backup. I'm not sneering; with my own eyes I have seen a campuswide IT shop at a major research university, a shop that should assuredly know better, advertising unbacked-up storage as suitable for data-archiving needs. (No, I won't link. Yes, I am tempted to.) Again, it's a case of people not realizing what they don't know. NSF helper-elves need to be prepared to cope with that.

If departmental IT punts (as it likely should), then and only then will Dr. Troia approach campus IT. She will do so with fear and trepidation, as campus IT tends to be a Cthulhoid monstrosity, as fathomable as sunken R'lyeh and approximately as helpful. Libraries: how is front-line tech support finding out about your NSF-related services?

If none of the above people with whom Dr. Troia interacts points her toward the library, she won't come to the library. I wish that weren't so too. It's so. The inevitable corollary is that outreach efforts should not start with researchers. They should start with the layer of support and administrative staff with whom researchers regularly interact.

Even more cheerfully: none of this may work. We just don't know yet. We'll know much better in a year or so! Best have a plan for if it doesn't. Can you get a list of campus NSF awardees, to contact them individually? Do you have a few campus researchers who are willing to do projects with you? Can you get at the graduate students who are doing the real work?

Good luck. I think we'll all need it.

In which copyright is annoying

Dec 20 2010

With all the ferment over copyright law currently, I don't understand why someone hasn't pointed out that from a recordkeeping perspective, tying copyright law to author lifespan is an incredibly bad idea, amounting to an immense research tax on would-be preservers and reusers of culture.

I was recently asked about reuse of a published photograph by Paul Regnard, a French psychologist. Don't bother with Wikipedia; he doesn't have an article there, nor was he important enough to make the pages of the scientific and medical biographical dictionaries I could lay hands on. By triangulating via Google and some fossicking about, it is possible to establish that he died in 1927.

So French copyright terms, as best I can tell, currently mirror ours (life plus seventy years), with one wrinkle: if you died in active service, your copyright term lasts an extra thirty years. (What was French copyright law like in 1927? When the term was extended to its current length, did the extension apply retroactively? Darned if I know. If anyone would like to enlighten me, feel free.) So if Regnard died in active service, his photographs are still copyrighted. If not, not.
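To make the arithmetic above concrete, here is a deliberately simplified Python sketch. It assumes a flat life-plus-seventy term, a flat thirty-year extension for an author who died in active service, and terms that run to the end of the calendar year; it ignores wartime extensions and the retroactivity questions raised above. The function names are illustrative only, and none of this is legal advice.

```python
# Simplified sketch of the term arithmetic described in the post.
# Assumptions: flat life + 70 years, a flat +30 years for death in active
# service, and protection lasting through 31 December of the final year.
# Wartime extensions and retroactivity of later term changes are ignored.

BASE_TERM_YEARS = 70
ACTIVE_SERVICE_EXTENSION_YEARS = 30


def public_domain_year(death_year: int, died_in_active_service: bool = False) -> int:
    """Return the first calendar year in which the author's works are free."""
    term = BASE_TERM_YEARS
    if died_in_active_service:
        term += ACTIVE_SERVICE_EXTENSION_YEARS
    # Protected through 31 December of (death_year + term),
    # so free from 1 January of the following year.
    return death_year + term + 1


def still_protected(death_year: int, as_of_year: int,
                    died_in_active_service: bool = False) -> bool:
    """True if the works are still under copyright during as_of_year."""
    return as_of_year < public_domain_year(death_year, died_in_active_service)


if __name__ == "__main__":
    # Regnard died in 1927; the question was asked in 2010.
    for service in (False, True):
        print(service,
              public_domain_year(1927, service),
              still_protected(1927, 2010, service))
```

Under these assumptions the answer is 1998 if Regnard did not die in active service and 2028 if he did: a single unknown fact from his biography swings the result by thirty years, which is exactly the research tax described above.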

I'm not planning to investigate 1920s service records for France. I'm just not. So there the matter rests.

Frankly, as a pragmatic tradeoff I'd accept a longer copyright term (odious though that would be) in exchange for a more precise one, such that I wouldn't have to fuss about French service records. I mean, merde.

Friday foolery: the twelve months of Trogool

Dec 10 2010

Right, so, it's December and time for all those year-end recaps. Here's what we've been frothing at the mouth talking about this year, in the form of the first sentence of the first post from each month:

  • January: "Peter Keane has a lengthy and worthwhile piece about the need for a “killer app” in data management." (Wow. I still like this post. That's rare, with me.)
  • February: "Happy Groundhog’s Day Eve!" (Er, okay.)
  • March: "So the United Nations’ Intergovernmental Panel on Climate Change is mired in a rapidly heating controversy over a report that apparently let some dubious information slip through the cracks." (Funny, how this mess has colored everything else that's happened this year in data-management-land.)
  • April: "Not good at organizing your thoughts, much less your research notes?" (Did this actually fool anybody?)
  • May: "Having made it back at last from Scotland despite the ash cloud, and overcome jetlag and (some) to-do list explosion, I finally have leisure to reflect a bit on UKSG 2010." (Boy howdy, do I ever still believe this post. The cracks are showing. And widening.)
  • June: "*blows off the dust*" (Argh. Been a lot of that this year. Sorry.)
  • July: "I would be utterly remiss in my duties were I not to point out SciBling John Wilbanks’s vitally important new open-access initiative." (Heh.)
  • August: "Greetings again, gracious readers." (Oh, yeah. We kinda moved this year. It was cool.)
  • September: "Christine Borgman has a lengthy track record of saying smart and apposite things about scholarly communication and research data." (Yep. Meeting her at IDCC 2010 was a conference highlight for me.)
  • October: "In the first sentence I link to the article, making sure not to use the verboten “click here” as the link text." (I guess a lot of first-posts ended up on Friday this year?)
  • November: "A faculty friend of mine forwarded me the email following." (Where is the Wikileaks for ridiculous journal-publisher behavior?)
  • December: "This is by way of a public-service warning." (Bears repeating.)

Cheers. I do hope 2011 is a better year for blogging. This one has been rough on me—not always for bad reasons, to be sure, but even so.

Themes from IDCC 2010

Dec 08 2010

A few themes coalesced in my head while I was attending IDCC 2010. I don't pretend they're the conference themes; in fact, I know they're not. They're just my personal aha moments.

"Set and forget"

This community understands pretty well that preservation is not a "set and forget" process. The communities this community is embedded in tend not to get that. It's a problem.

I had a good conversation with John Mark Ockerbloom about LOCKSS, which is commonly understood as "set and forget" but which is by no means robust enough to do without auditing and active intervention.

Institutional repositories have been actively marketed as "set and forget," and we all know where that ended up. In this case, though, it's not so much the auditing that falls down (IRs are actually pretty good at hanging onto bits and bytes) as policy decisions, active collection work, and hardheaded assessment of progress. More on this in a bit.

In any case, "set and forget" is at best an empty promise, at worst an outright lie, and it's good to remember that.

Data curation community of practice

It's scary to be on the bleeding edge, as research data management clearly is. It's doubly scary for those of us who have been on the bleeding edge and suffered for it. What mitigates the fear is community, and I'm quite pleased that data management is even at this early stage building a more active and cohesive community of practice than institutional repositories have ever managed to do.

Reasons for this include the absence of normative software communities in the data-curation space; the potential IR community fragmented quickly and completely around software choices. The enormity of the job also helps. Everyone thought (wrongly) that IRs could be built and maintained by one person with one hand tied behind her back, so where's the need for community? Everyone now thinks (correctly) that research-data management is much larger than any one person, any one library, or even any one institution. We're all looking for partners, collaborators, agony aunts.

And even better, we're finding them.

Open access is losing libraries and librarians

Library involvement in the open access movement in the United States is in trouble. I don't think the movement has entirely come to grips with that yet, but it is. As the "Cassandra of open access," I'd be remiss if I didn't say something.

I see a fair few symptoms. SCOAP3 is going down to the wire. COPE is floundering. When asked to pony up money for open access, I hear librarians and library administrators saying "Look, I thought OA was supposed to fix this budget crisis; instead, it's making my budget picture worse. In fact, when I go ask for more money for serials, I get asked why OA hasn't fixed the problem yet. Go find some other sucker; I'm done propping up this sad little sham of yours."

If that's not bad enough, OA is quietly, steadily losing its footsoldiers in libraries whose institutions don't boast OA mandates. Consider my illustrious co-blogger Sarah Shreeves. Her sole responsibility used to be running Illinois's institutional repository. These days, I learned at IDCC, she is also running the new Scholarly Commons and co-chairing the campus data-curation initiative. These initiatives eat up so much of her time that the IR has of necessity taken a bit of a back seat. I don't talk about my own job here (I really can't), so I'll just say that she and I have been professional twins for a long time, and we continue to be so.

This is great for those footsoldiers, mind you. Being an OA and/or IR footsoldier in the average US academic library is abject misery. The open access movement has never helped, or even taken notice that there might be a problem; when it's not proclaiming loudly that it doesn't exist to solve library problems, it's openly insulting libraries and librarians over a variety of so-called derelictions. This demoralizes the footsoldiers, as well as damaging their credibility and effectiveness within their institutions and their libraries.

The fair few footsoldiers I know are bright, talented, energetic people. I'm frankly thrilled their libraries are recognizing that and finding better professional situations for them. The OA movement, however, shouldn't be as thrilled as I am.

A little while ago I helped coach a friend into a job running a brand-new IR. I encouraged my friend to grill the employer pretty hard on what they were planning to do with the IR—the two questions I've been advocating for years, "what do you want?" and "how are you going to get it?"—and what I learned is that OA is so far down the list (there is a list, at least) for that library that it might as well not be there at all.

In its way, the very success of Open Access Week is a symptom. Listening behind the scenes and reading between the lines this year, I heard a fair few isolated librarians struggling against their own libraries to put together anything at all for the occasion. Several needed the OA Week banner ("this is an international event! it's embarrassing not to participate!") to goose their libraries into action. In addition, I got a distinct sense that some libraries put on an OA Week event in order to tick off the "did something about OA" tickybox for the year, in essence giving themselves an excuse not to do anything else.

I don't have any bright ideas, I'm afraid. I do believe that ARL/SPARC needs to turn its attention to stiffening its membership's collective spine, and giving them a clear and actionable roadmap to follow.

It's quite possible, even likely, that the OA movement will react to these symptoms with a collective shrug; that's certainly how they've treated libraries heretofore. I'm too personally demoralized by the whole mess to argue. The proof of the pudding, and all that. But if US IRs start folding and COPE doesn't make it and institutional mandates stop happening or existing ones backpedal, don't say I didn't warn you.

The Four Sons of digital curation

Dec 07 2010

So I wanted to put in my two penn'orth on this question on DHAnswers about best-practice guidelines for data in the humanities, but what I have to say is a little askew of where that discussion seems to be going. I'll say my piece here, then, and link from there.

At CurateCamp yesterday, the discussion of a curation community of practice suddenly took an extraordinarily technogeeky turn. By way of bringing it back to earth a bit, I pulled out a well-worn analogy that I've used before in other contexts: the Four Sons parable from the Pesach service.

The First Son in the Pesach parable asks his father to describe to him in exhaustive detail all the observances of Pesach and all the stories behind those observances, so that he can do everything correctly and pass on the knowledge to his descendants. Everybody in the CurateCamp room, myself certainly included, was a First Son. We can't get along without our First Sons. The peril of First Sons, though, is that they tend to lack perspective and get caught up in pilpul.

This is exactly what happened at DHAnswers. A couple-three First Sons got to duking it out about the value (or lack thereof) of SGML/XML markup. Derailed the entire conversation into a tiny, tiny corner of a very big question. It's what was happening at that particular moment in CurateCamp, too. It happens a lot, and it's a problem.

The Second Son in the Pesach parable asks, "What is all this to you?" By saying "to you," and not "to us," the Second Son intentionally and hostilely places himself outside the community, treating it as a zoo full of weird and occasionally unsavory animals. He doesn't understand what's going on and will have to be talked into caring. In universities, a lot of Second Sons live at high echelons of library, IT, and university-wide administrations. Grant funders have a fair few of them too.

The Third Son asks only, "What is this?" He's not hostile, but he's utterly clueless, not even understanding what he doesn't know. I've met Third Sons in large numbers among faculty. As the Pesach fable explains, Third Sons need simple and straightforward explanations that they can follow even if they don't really understand the problem domain.

The Fourth Son does not even know how to ask, and he exists in large numbers among faculty as well. The Pesach parable insists upon outreach.

The Third and Fourth Sons are why so very many early digital projects are no longer extant. The Third and Fourth Sons are the ones who perpetrate all the wrongheaded antipatterns DHAnswers has so kindly and snarkily collected. The digital humanities cannot progress among the humanities generally until the Third and Fourth Sons receive more and better guidance—emphatically including warning them away from common antipatterns!

Here's the thing. Too many approaches to digital curation, even to explaining digital curation, are aimed at First Sons. This is self-limiting, counterproductive behavior. Whatever the ACH and the NEH do to address data management in humanities research, it needs to be aimed at all four sons.

Idiosyncrasy at scale: digital curation and the digital humanities

Dec 07 2010

John Unsworth, Illinois. "Idiosyncrasy at scale: digital curation and the digital humanities."

Can't remove ambiguity in the humanities (the way you can in chemistry)! We'd remove everything that matters. This can make it hard to talk about humanities "data" (is there a thermometer for the zeitgeist?). Humanities data are idiosyncratic because the people who make them are.

Research methods are changing as traditional objects of humanities study (e.g. diaries, correspondence) become born-digital. Still have to "tame the mess," recognize that mess has value, including as a mess. Is departure from the norm an "error" or a "data point"?

"Retrieval is the precondition for use; normalization is the precondition for retrieval." (Not sure I agree with this! Techniques exist to deal with messiness.)

Six laws to give us pause:

  • Scholars interested in particular texts.
  • Tools are only useful if they can be applied to texts of interest.
  • No one collection has all texts.
  • No two collections are format-identical.

Therefore: humanities data narratives include normalization (of "Frankendata": broadly aggregated but imperfectly normalized data). Lots of different kinds of normalization (spelling, punctuation, chunking, markup, metadata).
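To make three of those kinds of normalization concrete (character and punctuation folding, spelling regularization, and chunking), here is a minimal Python sketch. The variant table and function names are hypothetical illustrations, not MONK's or anyone else's actual pipeline; a real project would need a large, curated variant list, and the ASCII-only tokenizer is a simplification.

```python
import re
import unicodedata

# Hypothetical spelling-variant table; real projects need a large, curated
# list for the period and language in question.
SPELLING_VARIANTS = {"haue": "have", "vnto": "unto", "onely": "only"}

# Fold long s and curly quotes into plain equivalents.
CHARACTER_FOLDING = str.maketrans({
    "\u017f": "s",                  # long s
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
})


def normalize_tokens(text: str) -> list[str]:
    """Fold characters, lowercase, tokenize, and regularize spelling."""
    text = unicodedata.normalize("NFC", text).translate(CHARACTER_FOLDING)
    # Simplification: only ASCII letters and apostrophes count as word characters.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [SPELLING_VARIANTS.get(token, token) for token in tokens]


def chunk(tokens: list[str], size: int) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    tokens = normalize_tokens("I haue \u017fent it vnto thee.")
    print(tokens)
    print(chunk(tokens, 4))
```

Every step here throws something away (original spellings, the long s, case), which is the "error or data point?" tension noted above; a cautious pipeline keeps the unnormalized text alongside the normalized version.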

Example: MONK project, using EEBO and ECCO within the CIC. (Me, on soapbox: This. THIS. is the collateral damage from "sustainability" initiatives that impose firewalls around content. If you're not in the CIC, too bad so sad, you can't use these data.) Lots of data-munging which I won't recount.

Example: Hathi Trust, now available through an API. Will be a central player in developing research uses for digitized texts. Preprocessing/normalization blows up the necessary storage space by 100x. There will be a research center established for working with this corpus.

Can we crowdsource corrections, à la GalaxyZoo? People are interested and willing, it can't be automated, and we need the help.

How do I keep my solution from becoming your problem? The Association for Computers and the Humanities is trying to crowdsource some best-practices recommendations for humanities researchers on managing their digital/digitized collections. Immediate conflict on DHAnswers site: to use markup or not to use markup? Practical upshot: when do we have usefully shareable data? When should we stop messing with it so others can use it? What's data and what's data interpretation, and what do we do when they coexist in the same marked-up text?

Humanities data is bigger than books! Books are the tip of the iceberg. NARA strategy for digitizing archival materials: they have 5x the pages of what's in Hathi Trust, in much less tractable forms than the books Google/Hathi is working on. And that's just one archive! We'll have to learn how to manage this kind of scale.

