The One Schema

I grumbled on FriendFeed today that I wish folks (IT folks in particular) would understand that there is no single metadata schema that works for every kind of data in every form in every situation. If you're building a data repository intending to store many kinds of data from many disciplines, it had better have a metadata model that accommodates many different vocabularies.
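For the IT folks, here is a minimal sketch of what "accommodates many different vocabularies" could look like in practice. It is an illustration only, not how any particular repository works: the `Record` class, the vocabulary names, and every field name in it are hypothetical. The point is simply that each vocabulary keeps its own fields, side by side, instead of being squeezed into one master schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    """One repository item described by any number of vocabularies (hypothetical model)."""
    identifier: str
    # Maps a vocabulary name to that vocabulary's own field/value pairs.
    descriptions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def describe(self, vocabulary: str, **fields: str) -> None:
        """Add or update fields under one vocabulary without touching the others."""
        self.descriptions.setdefault(vocabulary, {}).update(fields)

    def get(self, vocabulary: str, name: str) -> Optional[str]:
        """Look up one field in one vocabulary; None if either is absent."""
        return self.descriptions.get(vocabulary, {}).get(name)


# Illustrative values only; the vocabulary and field names are made up.
record = Record("dataset-001")
record.describe("general", title="Night-sky survey images", creator="Example Observatory")
record.describe("astronomy", right_ascension="05h 35m", declination="-05d 23m")
```

A record like this can carry a general description for the catalog and a discipline-specific one for the people who actually use the data, and adding a third vocabulary later does not disturb the first two.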

Bill Hooker promptly stepped up to the plate with the following dictum (slightly edited by yours truly):

Three schemas for the astronomers under the sky;
   Seven for the urban planners in their halls of stone;
      Nine with which biologists comply;
and ONE for the Librarian on hir Dark Throne:
In the Land of Library, where the metadata lies.
   One schema to rule them all,
   One schema to find them;
   One schema to bring them all;
      And in the repository bind them.
In the Land of Library, where the metadata lies.

I just named my Aeron chair the Dark Throne, y'all.

Library contracts and journals 101

Libraries sign a lot of contracts to get access to content. A LOT. Think of your household and multiply it by a thousand or more. The bigger the library, the more contracts it signs.

Because we do this with so many publishers, organizations, societies, etc., there are other companies set up to manage all these subscriptions, standing orders of book series, and the like. We call them "subscription agents." These agents are so important that they usually give the biggest parties at the largest library conferences. And we all know that's the true barometer of clout in a profession.

The American Chemical Society, or ACS (there they are again), recently sent out information on next year's journal subscription costs to libraries. Now, you may have guessed from earlier posts of mine that the ACS can be a little conservative with regard to publishing. Well, they are conservative with respect to ownership of content, copyright, and open access, but with respect to licenses and pricing they seem to be quite different.

One good example of this is our library agreement with ACS journals. We pay annual costs for both the new content (called the ACS web editions) and the archive (the Legacy Archive). This two-tier pricing scheme has been in effect for some time. There are other societies that have access to journals set up differently (one price for all content, or no archive is available) but most commercial publishers have a similar two-tier system in place. One example of this is JSTOR in the humanities and social sciences.

Another option publishers have is to bundle journals together into packages. Since ACS journals are in high demand, most larger academic libraries have an all-titles (or All Publications) package. This is convenient because when a new title is released you don't have to start a new individual subscription.

All well and good - these systems have been in place for some time and usually eliminate unnecessary paperwork in renewing subscriptions. Long story short, our state-wide agreement with the ACS ended, and we had to negotiate with them to renew our subscriptions. In our case, the price increase was manageable (maybe 5%). Some of the possible reasons for the price increase, as explained to me by our ACS rep, are as follows:

1)     Many places received an early payment discount in the past, which was not factored into the base price for next year. So the base price was raised. While this is odd, it wouldn't surprise me if ACS was in fact doing this and/or it wasn't clear on the invoice. I would recommend libraries check their previous invoices to see if this is the reason for part of their increase.

2)     ACS added two new titles to the web editions package, so this subscription was raised accordingly. This seems fair to me and in line with previous increases.

3)     The ACS Legacy Archive costs increased for many schools, and in some cases the cost doubled. In our case this didn't happen, but if a school was in a consortial arrangement this may be the reason why the bill is so much higher.

So there you have it. I don't think this is an evil plan from the ACS but rather an opportunistic way to redefine the parameters of the contract without breaking them. When you sign a contract, as long as the terms are obeyed there's little recourse.

I have seen so many confusing pricing deals from the ACS that after this renewal was settled I moved on. I didn't realize until later that some libraries are seeing very large increases, 20% or more. There's more discussion of this on FriendFeed.

I predict we will see more of this pricing instability, especially as newer publishing models develop and mature. Unfortunately this is an area, as we've seen with the Nature situation earlier this year, where it's not possible to share information openly. So speak up, tell your stories, and make some noise when there's a big price increase. Tell faculty why journals are being cancelled. In most cases it's not because of content but other reasons. I predict this increase will cause some libraries to cancel some ACS subscriptions, because the increases will be too large for them to sustain and the increase is coming late in the year, when it's harder to absorb a larger hit.

Nature: the response

I was able to get quite a bit of feedback from Nature regarding prices, access, and content. I spoke with our North American sales reps and another staffer from the London office last month and their response is below, with my questions and their answers:

1.      According to Nature’s recent letter, 50% of Nature journals do not currently have an OA option. Are there plans to provide access to the remaining 50%? How would this be accomplished?

It's not 50% of all NPG's journals but 50% of our academic (not Nature-branded) journals. NPG's August letter to customers says "Open access options are now available on 50% of the 50 academic journals we publish including all 15 academic journals owned by NPG. Seven journals published by NPG on behalf of societies offer open access options, with more expected to follow later this year."

All of the academic journals NPG owns now have an open access option. Since the letter was published, six more of our society-owned journals have introduced open access options. They are American Journal of Hypertension, Laboratory Investigation, Modern Pathology, Mucosal Immunology, Neuropsychopharmacology, and Obesity. We continue to discuss open access with our publishing partners; ultimately the decision to introduce an open access option on these journals remains with the society or organisation who owns the journal.

Nature Communications is unique amongst the Nature journals in offering an open access option. The gold open access model (funded by article processing charges) is still inappropriate for Nature and the Nature research journals. These journals decline more than 90% of submissions; these high rejection rates and the developmental editing that goes into every published paper would make APCs prohibitively high. We estimate the APCs on these journals would be between $10,000 and $30,000, and research funders are not currently willing to support this. The Nature Review journals do not publish original research papers.

2.      Does Nature have plans to incorporate newer metrics into journal, article, and author information and assessment? Some examples of these include article downloads, author h-index information, Eigenfactor information, etc.

Earlier this year, we introduced article download information for 43 journals. This is available to authors within their account on eJournal Press, our manuscript tracking system.

We continue to monitor alternative metrics such as the Eigenfactor, Article Influence Score and h-index. These metrics are not yet widely accepted or understood, but we remain very interested in alternative ways of judging impact and value.

For example, at NPG we think that cost per download and cost per local citation are potentially important measures of the value for money of a journal to an institution.

3.      With regard to communicating and sharing consortial plan arrangements and information, are there plans to provide more transparency on pricing? Specifically, are there plans for different consortia to know prices and information provided to other individual customers and consortia?

NPG makes its academic list prices public in the interests of transparency. We have no plans to make terms of consortia agreements public. Each consortium is different in terms of their holdings, number of institutions and FTEs, and these are confidential agreements.

4.      In my phone call with our NA reps we briefly discussed a library advisory board for Nature. Can you give more information on the group’s membership and activities?

The NPG Library Committee is an invited group of NPG institutional customers. The group represents a mix of customers from across the world, working in academic, corporate and government settings. It includes both individual customers and consortia managers. The Committee or a sub-group of it meet face-to-face approximately once a year. We discuss NPG's activities and the wider publishing and information communities with them regularly via a discussion board on Nature Network, email and phone. The Committee provide useful feedback and insight on the views of the information community.

5.      One comment to my blog post mentioned Nature’s mission statement and its recent change. Can you provide more details on how the mission statement is reviewed, updated and shared with customers and readers?

The journal Nature's original 1869 mission statement still stands, and guides Nature Publishing Group's activities today:

THE object which it is proposed to attain by this periodical may be broadly stated as follows. It is intended
FIRST, to place before the general public the grand results of Scientific Work and Scientific Discovery ; and to urge the claims of Science to a more general recognition in Education and in Daily Life ;
And, SECONDLY, to aid Scientific men themselves, by giving early information of all advances made in any branch of Natural knowledge throughout the world, and by affording them an opportunity of discussing the various Scientific questions which arise from time to time.

Nature's mission statement was updated in 2000 as follows:

First, to serve scientists through prompt publication of significant advances in any branch of science, and to provide a forum for the reporting and discussion of news and issues concerning science. Second, to ensure that the results of science are rapidly disseminated to the public throughout the world, in a fashion that conveys their significance for knowledge, culture and daily life.

We have no current plans to update Nature's mission statement.

6.      Another comment to my blog post mentioned the pricing model for Nature as being based on a floating currency model which is not determined by currency rates we see in bank and other financial updates. Is this how international currencies are determined and are there plans to review or revise this model? 

In 2008 NPG introduced local pricing based on four local currencies (dollar, euro, pound sterling and yen). This means price increases are applied to local currencies, independent of currency exchange rate fluctuations.

I have a few comments on these answers:

1. Estimates of $10,000 - $30,000 in author charges for one OA article? You heard it here first. That's the entire journals budget for some small libraries. They'd be able to get one article for the year - for all the faculty. I wouldn't call that a viable OA option.

2. I'm glad Nature is implementing article download information, but I think it would be much more beneficial if everyone, not just the author, could see the data. As a point of comparison, citation data is available to all users in a database.

More generally, I think Nature is trying to have it both ways - be a boutique publisher, with high costs and a correspondingly high profit margin, and also remain a core publisher, in that most academic or scholarly institutions are expected to subscribe to at least some of the content. With these cost increases and licensing options, Nature is becoming (or already has become) unaffordable for many institutions. If scholars weren't demanding access, many of these titles would have been cancelled by many libraries by now.

I don't think you can have it both ways. If most of your customer base can't afford the content, then in my opinion your market is limited and you can't also be considered a core publisher. Can you be both Neiman Marcus and Wal-Mart at the same time? I don't think you can. I'm curious to see how many libraries have cancelled titles from Nature or have forgone adding titles because of the cost.

This discussion is also painful for me because I know faculty where I work want Nature journals - I have a list of over six titles that have been requested in the last few years. I feel I am doing a disservice to my colleagues in withholding access to something. But the money is simply not in our budget.

I also want to support publishers that are experimenting with new communication channels in scholarship like web features, a blogging platform, Second Life, podcasts and the like.  Nature has been very progressive in exploring these new areas of scholarship, and their support has helped legitimize them as communication channels. Does it have to come at such a high cost? I hope not.

The Tip of the Iceberg

The end of the summer and beginning of fall (for academics) is the busiest time of the year - I'm swamped!  A second reason for my silence is that I've been thinking more deeply about some other issues within the academy. Recently I became part of a team appointed to look at our internal assessment activities in the libraries and also determine the scope, depth, and impact of our organization to the campus administration and beyond. In a word - ROI. We must show it and we must prove it to others.

Now I know a lot of academics think assessment and ROI are dirty words, rife with assumptions that curricula and teaching pedagogy will be micromanaged and misinterpreted by soulless bureaucrats, and perhaps even altered at whim to meet fictitious benchmarks, much like Winston Smith in 1984. I'm not denying this can and does happen. Assessment is a popular buzzword today in the academy, and as long as the recession lasts I suspect it will stay high on the radar screen.

In libraries assessment has generally been a numbing and jumbled mixture of circulation transactions, budget and financial data on products, usability studies with patrons on technology interfaces, products and services, and the occasional collections analysis or technology inventory. In short, a lot of things are measured but not necessarily considered holistically, nor is it determined whether each measure is worth the time and effort to maintain. Coupled with this phenomenon is the requirement that libraries provide statistics to the member organizations to which they belong, such as ARL, ACRL, and the like. In many cases the reporting requirements change yearly and libraries must anticipate how to best answer these questions. Inherent in all of this reporting, of course, is the desire to have the numbers provided show your library in a positive way: rank highly on the ARL list, high ILL lending to other institutions, or any other measure. So this assignment I've been given is going to be very interesting, because it will go to the heart of many of the things we do and how much of our time, staff, and money we spend on them.

This development dovetails nicely with the recent discussion on blogs and in the science 2.0 community about the future and purpose of, and access to, supplemental journal materials, and with the decision by the Journal of Neuroscience to stop accepting supplemental materials. Martin Fenner has a great blog post summarizing recent activity on the subject. I agree with him that this decision brings up larger questions, like what is the concept of a scientific paper in the 2.0 (and 3.0) web environment? What should a paper contain? What is most important? What do researchers need to duplicate (or ignore) in their competing and complementary work? How should it be made available? And how will it be preserved or changes tracked to ensure proper attribution and recognition? Rethinking the scientific article, just like rethinking the assessment workflow and methodology I describe for libraries, will mean determining what researchers need most and whether the material is successful in meeting those needs or generating funding or other support.

This is a big challenge indeed, given the current state of disagreement among science disciplines about the use and deposition of materials in preprint repositories, practices and expectations for open shared data sets, the lack of well-defined standards for describing data, and the need to archivally preserve materials for future generations. Plus, as we look more broadly, will this new(er) concept of the research paper, assuming there's consensus on the result, work outside the sciences in the social sciences and humanities? It's well-known there's already a difference of opinion today in how each of these areas creates, disseminates and evaluates peer-reviewed content. Can a new model be created that works for all disciplines?

So in my opinion supplemental data is only the tip of the iceberg, just like the circulation or other operational data collected in a library. There's a lot more to consider, and no easy answers at this stage of the game. Other questions are likely to emerge. What will the basic unit of research become? Will funded research continue to assume greater status than unfunded work? Can collaborative work be assessed fairly and accurately to determine the contribution and effort from each member? These questions will need to be answered.

My library perspective tells me that ROI will not go away, and is likely to take on an even larger role in decision-making as technology becomes more data-driven and academia continues to require direct evidence of meeting specific measures to show success of programs and curricula. We may all become our own Winston Smiths someday.

The Culture of Swag

I am in the midst of obtaining more information from Nature regarding questions raised by my recent post and from commenters, which I hope to share soon.

While the economics of publishing are hard to ignore, the economics of marketing play just as large a role in scholarly communications. Where and to whom are publishers directing their marketing efforts? Are they marketing to me (the librarian), the scientist/researcher who submits articles, or their readers (who may be neither)?

One message is the marketing tone and expectations set at conferences and in professional practice. Publishers at library conferences give away (or used to give away, before the economy tanked) a LOT of swag, gifts, parties, and the like. I can go into more detail but trust me - it's like a Mardi Gras parade with the office supplies and tchotchkes. This "culture of swag" is a pervasive and omnipresent part of the conference experience. It's almost impossible to attend a conference and not receive any swag. Is this a good thing?

In the past swag was a way to entice and retain customers for loyalty to a product or vendor. Most of the library content we buy today, after many rounds of budget cuts, fits into several categories. The first category is new products that we'd like to buy but can't afford. Swag can't change my mind in this scenario. The second category contains products which are only available from one source or required for a program accreditation, like SciFinder Scholar, so swag is wasted effort there as well. The last major category, vendors who provide similar services or products who might need swag to show their competitive advantage, is getting smaller all the time. The last two years have seen many mergers by a select number of vendors (ProQuest, OCLC, and EBSCO in most cases). In some cases swag might be effective, but these major players have all created fiefdoms of overpriced content and have bought out many competitors which might innovate within the market. Some of my peers have written in great detail about these three vendors and some of their business practices in their blogs, so I'll skip the details.

Scientists and researchers buy a lot of content and equipment too, at least in the field of chemistry. What's interesting is that the marketing effort, or culture of swag, takes on a different tone. Prestige, visibility, and influence are the swag of choice. Publishing/refereeing/editorial services will make you more influential among your peers, provide greater visibility for your work, and make sure your articles get published in the best journals. This kind of swag is less visible but no less tangible than a mousepad or T-shirt. And deep down I think many scientists know they are being given the swag to provide free refereeing, editing, or journal content if they get an article accepted. Just like in libraryland, it's hard to attend a professional science conference meeting and not see this swag too. Scientific professional societies practice this form of swag as well, although the message is more easily disguised as service to the profession and enhancing the prestige of the profession rather than the individual. And in my opinion it's as pervasive and omnipresent there as in library culture.

The problem is that this newer culture of sharing, disseminating, and distributing information (open access, open data, open source, open notebook science) doesn't rely on the culture of swag, at least as a traditional purchasing or behavioral tool. How can a publisher persuade me with swag if there are no subscription costs or the journal costs are underwritten from sources other than the library? How can this same publisher use the prestige/visibility/influence argument on a researcher when you can write a blog post that more peers can read than a peer-reviewed journal article? What if I can see how many people have read my article or blog post, accessed the data I've collected for a series of experiments, shared their data with mine, and I can then collaborate with them pre- and post-publication to correct mistakes, share additional insights, and improve upon the original concept or methodology? What if I can have my work openly reviewed by my peers, eliminating the fear of being scooped on an idea or denied credit for a major breakthrough or discovery, and have this approach considered valid for promotion and tenure in the academy? This environment strips away the artifice and destroys the culture of swag.

So keep the swag in mind when you make decisions on supporting a product or publishing and sharing your research results with peers. You might not be able to say no every time, or influence the giver that it's a waste of time. But it's a start.

Where are libraries in data curation?

The Association of Research Libraries has a pretty good report out (I consider it JISC-quality, which is saying something for me) on where its member libraries are on e-science, cyberinfrastructure, data curation, whatever you want to call it.

I already knew most of what was in the report proper owing to having read the preliminary report (what, me, obsessed?), so the good stuff for me was in the case studies. I loved this blunt assessment from UCSD (paragraphing and emphasis mine):

At present, there are three primary pressure points related to e-science/e-research support at UCSD: turf, money, and interest. In reverse order, with the exception of a few very high-end data generators amongst the faculty, e-science/research lifecycle management is not high on the list of faculty concerns.

  • The NSF’s best efforts notwithstanding, most researchers, at least locally, have been slow to wake to the data challenge. They seem to think, to the extent they think about it at all, that they’ve already got it covered or that they lack the funds to cover it and, therefore, it should be somebody else’s problem.
  • As a consequence, this campus at least, has committed funds for providing the infrastructure and services necessary to curate data for the long-term, in the hope, frankly, that sufficient faculty (and students, but mostly faculty) will avail themselves of both to make the enterprise self-sustaining. That’s the good news; the bad news is that it has committed only those funds and only with the understanding that the enterprise will become self-sustaining. Whether that proves to be the case remains to be seen of course.
  • Finally, there is an awfully large number of parties interested in what remains a still-ill-defined problem space. The associated ‘jostling’ makes calculating the right mix of those parties in the solution space doubly challenging.

What I hear through the grapevine suggests that the above is true at many more places than UCSD.

I would add to this that "self-sustaining" is an instance of the "them that has the gold gets the services" anti-pattern. Grant-funded research does not produce all data worthy of note. I'm all in favor of earmarks from grants, don't mistake me; I just worry quite a lot that data services will exclude all but the well-funded.

As I read through the case studies, what struck me was that these are stories of pioneering individuals given lots of freedom and enough support to use it. I applaud those individuals (indeed, I know several of them personally), and I love what they're doing. I just—well, I'm a worrywart. I worry about them too. They're heroes, and just as in programming, one can't sustain an enterprise indefinitely on heroes. (We tried that, as I am rather tired of saying, with IRs and their maverick managers. It didn't work, unless by "work" you mean "burn out good people uselessly.")

So when do we move past the hero model of data curation? When will it be mainstreamed? I genuinely wonder.

Cautionary tale of change

So it's been a fascinating and productive day here in Los Angeles; I've thoroughly enjoyed myself, and had my brain rattled in useful ways.

The day's final presentation, from Elisabeth Leonard, was all about change and change management in libraries. As she said, practically everything else that was said today (certainly by me!) fed into that theme.

I found myself thinking about the end of Neil Gaiman's Sandman (and no, I won't apologize for spoilers on this one). The plot of The Kindly Ones offers any number of hints—well, sometimes not hints, sometimes more like two-by-fours over the head—that Morpheus is not only allowing but orchestrating his own demise. At Morpheus's funeral, the librarian Lucien is asked why—why would Morpheus do that? Why did he die?

"Charitably," Lucien answers, "I think… sometimes, perhaps, one must change or die. And in the end, there were, perhaps, limits to how much he could let himself change."

I wonder how much that is true of libraries, particularly as we confront the challenges that scholarly communication and research-data management present us. (I wonder how much it's true of scholarly publishers, too. We certainly aren't the only profession in this boat.)

And in my very low, very discouraged moments (which are thankfully rather fewer of late), I wonder whether we should reach for the life preserver or the wrecking ball.

Anyway, I promised my slidedeck with speaker's notes, so here it is:

So are we winning yet?

What do librarians need to know about how you communicate?

So this weekend I'm off to California to give a short talk aimed at helping the Electronic Resources and Libraries conference think about its programming. I've been asked to talk about scholarly communication, institutional repositories, and open access (surprise, surprise).

My slidedeck is done, though I'm still tweaking patter:

So are we winning yet?

I'll post a version with patter after the talk, but for now, I thought I'd throw this out for discussion: What do you, scientists, want librarians to know about how you communicate with other scientists? Where do you feel uncertain about the process? Where do you think it's coming up short? Do you think the process should change, and if so, how and how not?

I'm aware that librarians get stuck in our own thought-bubbles just like everybody else—I myself am certainly no exception. Here's a stab at bursting the bubble.

Small fry, blogging networks, and reputation

So, the PepsiCo blog thing. Right.

Advance disclaimer: this is me talking, not either of my illustrious co-bloggers. We have not yet made a decision about what to do; one co-blogger is across the pond at a conference and the other is vacationing, so that discussion will have to wait a bit. This is just my take.

Book of Trogool is very small fry at ScienceBlogs. Very small. SB was a bit dubious about it at the start, to tell the truth, and if their info-science stable had been better-established I doubt they'd have taken it on. I'm very grateful that they did, because I needed them.

One of the reasons SB's info-sci stable isn't larger is that librarianship is a very difficult profession to blog in. It doesn't like blogs or bloggers, or social media generally, much less trust them or those who engage with each other and the world using them. Because libraries and librarians feel beleaguered, they especially don't like discourse critical of libraries or librarianship in social media coming from one of their own. Library vendors aren't fond of critical discourse in librarian blogs either. For individual librarian bloggers or public social-media figures, this has absolutely meant trouble at work. I'm one example, but very far from the only one—and I earned my problems more than most folks I know in similar straits.

This leaves the beleaguered library blogger who wishes to continue to blog with a few options. One is to be part of a group blog to create strength in numbers; In the Library with the Lead Pipe is a sterling example (and a fabulous blog; if you're interested in libraries from the inside, this is not one to miss). Another is to adopt some of the trappings of the formal library professional literature, such as length, exclusivity, and beta-reading-oops-I-meant-peer-review. ItLwtLP does this as well. A third option is to find a blog home with enough accumulated strength of character and good reputation as to afford some protection—and now you know why I chose ScienceBlogs.

Insofar as letting PepsiCo cadge cachet from SB's stable of bloggers damages SB's reputation (never mind strength of character) it causes me pressing difficulty. I'm not happy about that, because my sense watching events unfold is that SB has seriously damaged its reputation, both by casting its processes into doubt and by losing quite a few talented, brilliant bloggers. Moreover, based on the trajectory of other sellout properties like LiveJournal, unless Adam Bly learns a lot from this experience—and signs point to "not so much with the learning" at this juncture—he will likely err seriously again. And again. Until SB is not only not a shield, but an actual stain on a blogger's escutcheon.

These are petty, selfish concerns, to be sure. They are the tiny concerns of a small-fry blogger. Given that SB is rapidly alienating its big-fish bloggers, however, SB would be advised to heed these concerns, if it wishes to rebuild any sort of a stable.

To be perfectly clear, there is nothing intrinsically wrong with an individual industry scientist or big-pig-publisher employee coming to ScienceBlogs to blog on his or her own initiative. (Me vs. big-pig-publisher employee could be amusing!) I would hope that SB would provide such individuals the exact protections (from their workplaces not least) they have afforded me and other SB bloggers. What's wrong is selling a corporation the chance to trade on the collective cachet accumulated by SB's blogging stable by emitting corporate newspeak under the SB label—and I don't credit for an instant that Dr. Khan or Dr. Mensah or anyone else from PepsiCo will be blogging freely and uninterfered-with. I don't believe all the "advertorial" drapery fixes that basic wrongness.

So I labor under a dilemma. SB has been unique; there are other science-blogging stables, but none of them quite fits Book of Trogool. (Catch me blogging at Nature Network! Not in this lifetime.) I sincerely doubt any of the group library blogs would take me on; I'm a bit Tabasco for this profession. I can't go back to solo blogging. If SB folds (a possibility, the way things are going), if my co-bloggers are too affronted to continue here, if I decide that I am too affronted to continue here, well, chances are I just hang it up, retreating to the slow, ponderous library literature to get my licks in.

That's not what I want. (Ask my writer's block why. I have named it George...) I hope, instead, that SB can get its managerial act together.

