Escaping Datageddon: comments, please

Aug 31 2010

I'm due to give an introductory talk on data management to a group of graduate students later this fall. Since I like to steal from the best, I cribbed heavily from MIT's most excellent guide on the subject, particularly their slidedeck, but I thought I could perhaps improve a bit on that deck's organization, as well as cut down some on the information firehose without losing the main points.

I consider this still under construction, so feedback is most welcome.

Escaping Datageddon

The Tip of the Iceberg

Aug 30 2010

The end of the summer and beginning of fall (for academics) is the busiest time of the year - I'm swamped!  A second reason for my silence is that I've been thinking more deeply about some other issues within the academy. Recently I became part of a team appointed to look at our internal assessment activities in the libraries and also determine the scope, depth, and impact of our organization to the campus administration and beyond. In a word - ROI. We must show it and we must prove it to others.

Now I know a lot of academics think assessment and ROI are dirty words, rife with assumptions that curricula and teaching pedagogy will be micromanaged and misinterpreted by soulless bureaucrats, and perhaps even altered at whim to meet fictitious benchmarks, much like Winston Smith in 1984. I'm not denying this can and does happen. Assessment is a popular buzzword today in the academy, and as long as the recession lasts I suspect it will stay high on the radar screen.

In libraries, assessment has generally been a numbing and jumbled mixture of circulation transactions, budget and financial data on products, usability studies with patrons on technology interfaces, products and services, and the occasional collections analysis or technology inventory. In short, a lot of things are measured, but they are not necessarily considered holistically, nor does anyone necessarily determine whether each measure is worth the time and effort to maintain. Coupled with this phenomenon is the requirement that libraries provide statistics to the member organizations to which they belong, such as ARL, ACRL, and the like. In many cases the reporting requirements change yearly and libraries must anticipate how to best answer these questions. Inherent in all of this reporting, of course, is the desire to have the numbers provided show your library in a positive way: a high rank on the ARL list, high ILL lending to other institutions, or any other measure. So this assignment I've been given is going to be very interesting, because it will go to the heart of many of the things we do and how much of our time, staff, and money we spend on them.

This development dovetails nicely with the recent discussion on blogs and in the science 2.0 community on the future and purpose of, and access to, supplemental journal materials, and with the decision by the Journal of Neuroscience to stop accepting journal supplemental materials. Martin Fenner has a great blog post summarizing recent activity on the subject. I agree with him that this decision brings up larger questions, like what is the concept of a scientific paper in the 2.0 (and 3.0) web environment? What should a paper contain? What is most important? What do researchers need to duplicate (or ignore) in their competing and complementary work? How should it be made available? And how will it be preserved or changes tracked to ensure proper attribution and recognition? This concept of the scientific article, just like the current state of assessment workflow and methodology I describe for libraries, will need to account for what researchers need most and whether the material succeeds in meeting needs or generating funding or other support.

This is a big challenge indeed, given the current state of disagreement among science disciplines about the use and deposition of materials in preprint repositories, practices and expectations for open shared data sets, the lack of well-defined standards for describing data, and the need to archivally preserve materials for future generations. Plus, as we look more broadly, will this new(er) concept of the research paper, assuming there's consensus on the result, work outside the sciences in the social sciences and humanities? It's well-known there's already a difference of opinion today in how each of these areas creates, disseminates, and evaluates peer-reviewed content. Can a new model be created that works for all disciplines?

So in my opinion supplemental data is only the tip of the iceberg, just like the circulation or other operational data collected in a library. There's a lot more to consider, and no easy answers at this stage of the game. Other questions are likely to emerge. What will the basic unit of research become? Will funded research continue to assume greater status than unfunded work? Can collaborative work be assessed fairly and accurately to determine the contribution and effort from each member? These questions will need to be answered.

My library perspective tells me that ROI will not go away, and is likely to take on an even larger role in decision-making as technology becomes more data-driven and academia continues to require direct evidence of meeting specific measures to show the success of programs and curricula. We may all become our own Winston Smiths someday.

Friday foolery: Batman and libraries

Aug 27 2010

Many geeks already know that Barbara "Batgirl" Gordon is a librarian. But did you know that Batman solved the Dewey murders, back in the day? Well, now you know.

For more library-themed superheroes, check out Rex Libris. I'm not a huge fan, myself, because there's not enough library in the superheroics for my taste, but it's still a good comic. For library- and archive-themed online comics generally, try Shelf Check, Derangement and Description, or the venerable Unshelved.

Incidentally, nobody took me up on my challenge to guess Pleione-the-bicycle's color. Pleione is purple ("matte eggplant" per Electra's own description), and Wikipedia explains the science behind her name if you scroll to the bottom of the entry.

Open access, open data... open discourse

Aug 26 2010

At the end of "Who Owns Our Work?" I pointed out that content vendors no longer own the discourse about their products and about scholarly communication generally. (It didn't fit the article very well—I swear it made more sense in the presentation—but it struck me as important enough to shoehorn in anyway.)

JSTOR found this out in a big way yesterday, as it got egg on its interface (points to Inside Higher Ed for that wonderful headline) and got called on it in public, promptly and bluntly, on Meredith Farkas's blog and elsewhere.

Now, here's a dirty little library secret for you: we librarians complain about software and vendors all the time among ourselves. Constantly. Most of the time, all that complaining doesn't amount to a hill of beans, even when it's smart and cogent. Vendors rarely hear it, and when they do hear it, they don't feel any particular pressure to respond. It's not as though libraries can walk away from them, and it's not as though our grousing makes them lose face.

Except now it does. At least a few library bloggers and groups of librarians among the online public have enough weight to make a difference.

Consider also California vs. Nature Publishing Group. The probably inevitable conclusion has been reached: the parties are going back into a smoke-filled back room to hack out an agreement. Even so, words have been said that cannot be unsaid. California yanked the curtain away from the Wizard of Nature's corner, and quite a few people saw and heeded. This, too, is a shift in discourse.

Making the open-access case to researchers usually means trying to induce a specific lightbulb moment: the moment when they realize that the agreements they sign with publishers mean that they can't do what they thought they could, what they think they should be able to do—whether it's accessing an article they want or putting it on electronic reserve; republishing an article they wrote in a book or putting it on the Web. I get a sense that the lightbulb's gone off for quite a few more people lately. Look at the second comment here, if you will. Sure, it gets a couple of material facts about the JSTOR situation wrong. Still. I didn't think I'd live to see the day when an emeritus professor of European literature, for heaven's sake, would froth at the mouth about scholarly publishing and access!

On the data side of things, Climategate and the Marc Hauser case are leading to calls for open, or at least opener, data. And with regard to peer review, many eyes make bugs shallow, even in the humanities. The discourse shifts even a little more… it's not where we want it to be, but it's moving in good directions.

So what should we do, we librarians, we folk of all stripes working toward open access and open data? How about a third leg on the stool, open discourse? Let's throw open our smoke-filled rooms, the way California did. Let's duel in press releases as well as at negotiating tables. Let's take the ARL's excellent advice and stop signing NDAs. Let's throw our acquisitions numbers out there for all to see (and yes, this means consortia too, reluctant though they may well be to disclose). Let's slug all this out. Openly. On the web. Because we can already see it makes a difference. Really, what do we librarians, the original advocates of freely-available information in a civic cause, have to hide?

Recognizing my own significant bias in this matter, I do still want to suggest also that the library profession recognize good library bloggers and social-media adopters as a strategic asset. Today JSTOR, tomorrow the world? More lightbulbs, more places? This course of action is not without risk to balance reward, of course; good library bloggers turn their laser eyes on their own profession as well as on vendors, because good bloggers are generally honest even when honesty hurts. On balance, though, aren't good bloggers worth protecting? Certainly from vendors, but how about from their own workplaces too? ARL, ALA, this is for you. Library blogs are shutting down and going underground, and as my own blogging history demonstrates, library employers themselves are often the proximate cause. If you want to keep this strategic asset, you need to help protect it.

(All of this is also true of good academic bloggers; adjuncts and those with young careers could particularly do with some protection. I just don't happen to have any particular leverage to exert in that realm.)

For my part, I'm trying to walk my own talk. In Book of Trogool's short history, I've come within an inch of going dark two or three times. Book of Trogool is harder to keep up than Caveat Lector was, because its subject matter is much more circumscribed (which taxes my powers of invention some days) and because I'm eternally, unhappily aware that BoT is being watched with suspicion and distrust from within my own circles.

Even so. I saw a lightbulb flicker on here on this very blog in the comments the other day. I connect researchers with their librarians pretty regularly on FriendFeed and Twitter. I'm reaching the circle of scientists on Scientopia by being an active part of that circle—and no, I don't proselytize open access behind the scenes, of course I don't, but I build credibility for it as I build credibility for myself. None of this is part of my job, but it's part of my work, if you catch the distinction.

So scarred and bruised though I am, afraid though I am, I keep working, working toward open access, open data… and open discourse.

The delicate recruitment dance

Aug 24 2010

This is actually the second recruitment letter I've gotten from North Carolina State University for a repository-manager position. Mostly this is a statement of how long I've been doing this; turnover in repository management is (in my anecdata-fueled estimation) quite high. A five-year repository manager is a rara avis indeed. (There's an article there for a motivated Ph.D. student. My hypotheses would be that "maverick managers" are most likely to leave, and that turnover predicts turnover: a repository that loses one manager is more likely to lose the next than would be predicted by normal turnover numbers in the library profession.)

I'm impressed by how delicate the letter's wording is. See, employee-poaching is rude, so the letter can't actually up and say "we hope you'll consider applying." Instead, it asks to have qualified candidates referred to them (wink-wink-nudge-nudge) and assures me that they'll be discreet about their applicants: no sneaky reference-calling, inquiries to be kept confidential, and so on.

Anyway, no, I'm not a suitable candidate and I'm not applying, nor do I have a suitable candidate at my fingertips just at present. (My top students are all doing nicely, thank you; I'm very proud of them. Days I think they are my impact on the world, not anything I've said or written or done. There are many, many worse legacies.) I'm not a suitable candidate because five years of repository work is enough for me; I want to do other things now. (Not to mention that this position leans toward the technical end of repository management, and I'm not as techie as they want—certainly nowhere near as techie as the position's previous holder.)

There's a brief job squib on LISjobs, and going to their jobsite and searching "repository" will turn up the full position description and application instructions.

I have a lot of respect for NCSU Libraries. They get stuff done. I don't know anything much about the work environment, I admit, but if you're a firebreathing techie sort, you could do much worse than giving this a look.

There. I hope I have returned the courtesy of the letter sent me.

A long way from zero

Aug 23 2010

If you believe this paper from last year, roughly one in five current peer-reviewed articles is now to be found open-access. (I'm not sure why the paper is only getting press now, but the world is weird.)

Numbers are bunk, let's just say that up-front. I don't believe these numbers; I don't believe most numbers around open access. It's too hard to draw a circle around what we're trying to find out. This study (as with most in this area) omits an important fine point, as well: how much of this open access is, y'know, legal? The authors found papers "on the home pages of the authors." Let me tell you, researchers don't know from copyright, much less publishers' policies surrounding same. I'd be pleasantly shocked if half those postings were legal.

Still. Twenty percent. One in five. It's a long way from zero.

Some days I feel that what I do is stunningly pointless. This… is not one of those days.

Brief hiatus

Aug 18 2010

Things are a bit slow in my neck of the work woods; I have a syllabus to finish revising, but the worst of that is over.

Therefore I have taken the rest of the week off, and will likely be exploring bike trails. (I bought myself an Electra Townie. Her name is Pleione. Points for anyone who knows what color she is based on her name without using a search engine—and no fair if you follow me on FriendFeed and already know.)

Blogging from me will probably not happen, though my illustrious co-bloggers may fill in a blank or two.

A question of mission

Aug 17 2010

While I was at UCLA, I heard some fairly strong pushback against institutional repositories. Much of it didn't trouble me at all; it's no more than what I've said myself. One strain, however, did bother me rather.

It went something like this (and I apologize; I am probably traducing it): How do libraries justify spending on open access—making local materials available to the world—if our guiding mission is to buy appropriate materials specifically on behalf of our patron base?

To make that abstract question more concrete, consider the following tradeoffs (all of which are from me, not my interlocutor): Cancelling a toll-access journal in order to fund author fees or to underwrite memberships with open-access publishers. Using staff and technology resources to collect student research in an institutional repository because it's what can be collected in the short term, even though doing so has no appreciable short-term impact on the access crisis.

Here's my answer: For how long? Short-term, of course toll-access access will suffer at the hands of any resource shift toward open access. But thinking short-term is exactly, exactly, what got us into the serials crisis to begin with. It's exactly what saddled us with the Big Deal.

We can keep feeding the same broken system in hopes it will become less broken. We can do this. It makes no sense to me, but we can. (What happens when it breaks down completely, which is not outside possibility? Good question. I've never yet seen a short-term thinker with an answer.) Or we can place some longer-term bets, with the explicit understanding that some of them will turn up losers.

I'd rather place the longer-term bets, myself.

Tidbits, 16 August 2010

Aug 16 2010

First run of tidbits on Scientopia! As you can imagine, I've got a few…

As always, drop a comment here if you see something the Book of Trogool crowd needs to know about.

For lo, it is an article

Aug 13 2010

My latest article, "Retooling Libraries for the Data Challenge," is now online. Ariadne is fully open-access, so read away.

If it sounds familiar, that probably means you were at Access 2009 or a UKOLN meeting in Bath earlier this year, at both of which I gave the talk on which the article is based. (If video is your thing, there's video of both these talks about. Check Vimeo.)

My thanks to Ariadne editor Richard Waller for an exceptional job of turning around this article in a really short time. This sort of thing is time-sensitive—I will be shocked if things haven't changed beyond recognition in a couple of years—so I'm extra-glad to see quick publication; I seem much less of a dork that way.

Anyway. Comments and critiques welcome. Mine is never the last word.

