
Themes from IDCC 2010

Dec 08 2010, filed under Uncategorized

A few themes coalesced in my head while I was attending IDCC 2010. I don't pretend they're the conference themes; in fact, I know they're not. They're just my personal aha moments.

"Set and forget"

This community understands pretty well that preservation is not a "set and forget" process. The communities this community is embedded in tend not to get that. It's a problem.

I had a good conversation with John Mark Ockerbloom about LOCKSS, which is commonly understood as "set and forget" but which is not by any means robust enough not to require auditing and active intervention.

Institutional repositories have been actively marketed as "set and forget," and we all know where that ended up. In this case, though, it's not so much the auditing that falls down (IRs are actually pretty good at hanging onto bits and bytes) as policy decisions, active collection work, and hardheaded assessment of progress. More on this in a bit.

In any case, "set and forget" is at best an empty promise, at worst an outright lie, and it's good to remember that.

Data curation community of practice

It's scary to be on the bleeding edge, as research data management clearly is. It's doubly scary for those of us who have been on the bleeding edge and suffered for it. What mitigates the fear is community, and I'm quite pleased that data management is even at this early stage building a more active and cohesive community of practice than institutional repositories have ever managed to do.

One reason is the absence of normative software communities in the data-curation space; the nascent IR community fragmented quickly and completely around software choices. The sheer scale of the job also helps. Everyone thought (wrongly) that IRs could be built and maintained by one person with one hand tied behind her back, so where's the need for community? Everyone now thinks (correctly) that research-data management is much larger than any one person, any one library, or even any one institution. We're all looking for partners, collaborators, agony aunts.

And even better, we're finding them.

Open access is losing libraries and librarians

Library involvement in the open access movement in the United States is in trouble. I don't think the movement has entirely come to grips with that yet, but it is. As the "Cassandra of open access," I'd be remiss if I didn't say something.

I see a fair few symptoms. SCOAP3 is going down to the wire. COPE is floundering. When asked to pony up money for open access, I hear librarians and library administrators saying "Look, I thought OA was supposed to fix this budget crisis; instead, it's making my budget picture worse. In fact, when I go ask for more money for serials, I get asked why OA hasn't fixed the problem yet. Go find some other sucker; I'm done propping up this sad little sham of yours."

If that's not bad enough, OA is quietly, steadily losing its footsoldiers in libraries whose institutions don't boast OA mandates. Consider my illustrious co-blogger Sarah Shreeves. Her sole responsibility used to be running Illinois's institutional repository. These days, I learned at IDCC, she is also running the new Scholarly Commons and co-chairing the campus data-curation initiative. These initiatives eat up so much of her time that the IR has of necessity taken a bit of a back seat. I don't talk about my own job here (I really can't), so I'll just say that she and I have been professional twins for a long time, and we continue to be so.

This is great for those footsoldiers, mind you. Being an OA and/or IR footsoldier in the average US academic library is abject misery. The open access movement has never helped, or even taken notice that there might be a problem; when it's not proclaiming loudly that it doesn't exist to solve library problems, it's openly insulting libraries and librarians over a variety of so-called derelictions. This demoralizes the footsoldiers, as well as damaging their credibility and effectiveness within their institutions and their libraries.

The fair few footsoldiers I know are bright, talented, energetic people. I'm frankly thrilled their libraries are recognizing that and finding better professional situations for them. The OA movement, however, shouldn't be as thrilled as I am.

A little while ago I helped coach a friend into a job running a brand-new IR. I encouraged my friend to grill the employer pretty hard on what they were planning to do with the IR—the two questions I've been advocating for years, "what do you want?" and "how are you going to get it?"—and what I learned is that OA is so far down the list (there is a list, at least) for that library that it might as well not be there at all.

In its way, the very success of Open Access Week is a symptom. Listening behind the scenes and reading between the lines this year, I heard a fair few isolated librarians struggling against their own libraries to put together anything at all for the occasion. Several needed the OA Week banner ("this is an international event! it's embarrassing not to participate!") to goose their libraries into action. In addition, I got a distinct sense that some libraries put on an OA Week event in order to tick off the "did something about OA" tickybox for the year, in essence giving themselves an excuse not to do anything else.

I don't have any bright ideas, I'm afraid. I do believe that ARL/SPARC needs to turn its attention to stiffening its membership's collective spine, and giving them a clear and actionable roadmap to follow.

It's quite possible, even likely, that the OA movement will react to these symptoms with a collective shrug; that's certainly how they've treated libraries heretofore. I'm too personally demoralized by the whole mess to argue. The proof of the pudding, and all that. But if US IRs start folding and COPE doesn't make it and institutional mandates stop happening or existing ones backpedal, don't say I didn't warn you.


The Four Sons of digital curation

Dec 07 2010, filed under Uncategorized

So I wanted to put in my two penn'orth on this question on DHAnswers about best-practice guidelines for data in the humanities, but what I have to say is a little askew of where that discussion seems to be going. I'll say my piece here, then, and link from there.

At CurateCamp yesterday, the discussion of a curation community of practice suddenly took an extraordinarily technogeeky turn. By way of bringing it back to earth a bit, I pulled out a well-worn analogy that I've used before in other contexts: the Four Sons parable from the Pesach service.

The First Son in the Pesach parable asks his father to describe to him in exhaustive detail all the observances of Pesach and all the stories behind those observances, so that he can do everything correctly and pass on the knowledge to his descendants. Everybody in the CurateCamp room, myself certainly included, was a First Son. We can't get along without our First Sons. The peril of First Sons, though, is that they tend to lack perspective and get caught up in pilpul.

This is exactly what happened at DHAnswers. A couple-three First Sons got to duking it out about the value (or lack thereof) of SGML/XML markup, derailing the entire conversation into a tiny, tiny corner of a very big question. It's what was happening at that particular moment in CurateCamp, too. It happens a lot, and it's a problem.

The Second Son in the Pesach parable asks, "What is all this to you?" By saying "to you," and not "to us," the Second Son intentionally and hostilely places himself outside the community, treating it as a zoo full of weird and occasionally unsavory animals. He doesn't understand what's going on and will have to be talked into caring. In universities, a lot of Second Sons live at high echelons of library, IT, and university-wide administrations. Grant funders have a fair few of them too.

The Third Son asks only, "What is this?" He's not hostile, but he's utterly clueless, not even understanding what he doesn't know. I've met Third Sons in large numbers among faculty. As the Pesach fable explains, Third Sons need simple and straightforward explanations that they can follow even if they don't really understand the problem domain.

The Fourth Son does not even know how to ask, and he exists in large numbers among faculty as well. The Pesach parable insists upon outreach.

The Third and Fourth Sons are why so very many early digital projects are no longer extant. The Third and Fourth Sons are the ones who perpetrate all the wrongheaded antipatterns DHAnswers has so kindly and snarkily collected. The digital humanities cannot progress among the humanities generally until the Third and Fourth Sons receive more and better guidance—emphatically including warning them away from common antipatterns!

Here's the thing. Too many approaches to digital curation, even to explaining digital curation, are aimed at First Sons. This is self-limiting, counterproductive behavior. Whatever the ACH and the NEH do to address data management among humanities researchers, it needs to be aimed at all four sons.


Idiosyncrasy at scale: digital curation and the digital humanities

Dec 07 2010, filed under Uncategorized

John Unsworth, Illinois. "Idiosyncrasy at scale: digital curation and the digital humanities."

Can't remove ambiguity in the humanities (the way you can in chemistry)! We'd remove everything that matters. This can make it hard to talk about humanities "data" (is there a thermometer for the zeitgeist?). Humanities data are idiosyncratic because the people who make them are.

Research methods are changing as traditional objects of humanities study (e.g. diaries, correspondence) become born-digital. Still have to "tame the mess," recognize that mess has value, including as a mess. Is departure from the norm an "error" or a "data point"?

"Retrieval is the precondition for use; normalization is the precondition for retrieval." (Not sure I agree with this! Techniques exist to deal with messiness.)

Six laws to give us pause:

  • Scholars interested in particular texts.
  • Tools are only useful if they can be applied to texts of interest.
  • No one collection has all texts.
  • No two collections are format-identical.

Therefore: humanities data narratives include normalization (of "Frankendata": broadly aggregated but imperfectly normalized data). Lots of different kinds of normalization (spelling, punctuation, chunking, markup, metadata).
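To give a flavor of what spelling normalization means here, a minimal sketch (my own toy rules, not MONK's actual pipeline) of two early-modern English conventions that have to be undone before texts from different collections can be searched together:

```python
import re

def normalize(word):
    """Illustrative early-modern spelling normalization: undo the
    long s and the positional u/v convention (v word-initially,
    u medially, regardless of sound) so variant spellings index alike."""
    w = word.lower().replace("\u017f", "s")          # long s -> s
    w = re.sub(r"^v(?=[^aeiou])", "u", w)            # "vnto" -> "unto", "vs" -> "us"
    w = re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", w)  # "loue" -> "love", "haue" -> "have"
    return w

# The same search key now matches spellings from either collection:
assert normalize("Vnto") == normalize("unto") == "unto"
assert normalize("haue") == normalize("have") == "have"
```

Real normalizers rely on large dictionaries of attested variants rather than a handful of regexes, which is part of why this work is so labor-intensive.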

Example: MONK project, using EEBO and ECCO within the CIC. (Me, on soapbox: This. THIS. is the collateral damage from "sustainability" initiatives that impose firewalls around content. If you're not in the CIC, too bad so sad, you can't use these data.) Lots of data-munging which I won't recount.

Example: Hathi Trust, now available through API. Will be central player in developing research uses for digitized texts. Doing preprocessing/normalization blows up storage space necessary by 100x. There will be a research center established for working with this corpus.

Can we crowdsource corrections, a la GalaxyZoo? People are interested and willing, it can't be automated, and we need the help.

How do I keep my solution from becoming your problem? Association for Computers in the Humanities trying to crowdsource some best-practices recommendations for humanities researchers on managing their digital/digitized collections. Immediate conflict on DHAnswers site: to use markup or not to use markup? Practical upshot: when do we have usefully shareable data? When should we stop messing with it so others can use it? What's data and what's data interpretation, and what do we do when they coexist in the same marked-up text?

Humanities data is bigger than books! Books are the tip of the iceberg. NARA strategy for digitizing archival materials: they have 5x the pages of what's in Hathi Trust, in much less tractable forms than the books Google/Hathi is working on. And that's just one archive! We'll have to learn how to manage this kind of scale.


Data and curator: who swallows who?

Dec 07 2010, filed under Uncategorized

Barend Mons: Data and Curator... who swallows who?

Curation strategy: cross our fingers and hope? Doesn't work! His mantra: "it's criminal to keep generating data without policies to deal with it!" Need to get the collective brain involved.

If you measure anything in -omics, you end up with a Big Ball of Mud. "Ignorance-driven research": measure, get lots of data, get a result if you're lucky. These days, it becomes "BIGNORANCE-driven research," the goal of which is to find some kind of signal in all the noise.

Everybody wants structured data, but nobody wants to do structured-data entry! We need to figure out how to get from messy free text to structured data. Note: people WILL do structured data entry if they see how it helps them. (Metadata librarians take note!)

Lots of wikis trying to solve this problem, but what ends up happening is people repeating each other's assertions rather than checking and correcting errors.

Theme emerging: we only need a tiny tiny share of people's online attention to do a LOT of science! Question is how to earn that share.

Knowledge discovery by computers requires computer-operable data. (No big surprise, but it bears repetition.) Dirty data comes from all kinds of datamining: Web, articles, etc. Then clean it up on the wiki and add URIs to evidence, publications, etc. Put the result out as RDF, then use computer reasoning to adduce insights and guess at their reliability. Hoping also to store but not reason over negative results. Soon they'll be able to track "nanopublications" and make sure people get credit.
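To make "computer-operable data" concrete, here is a minimal sketch in plain Python standing in for RDF (every identifier and URI below is hypothetical, and a real system would use a triple store): each assertion carries a URI pointing at its evidence, and a trivial reasoner can chain assertions while keeping that provenance attached.

```python
# Hypothetical mined assertions as (subject, predicate, object, evidence URI).
triples = [
    ("gene:BRCA1", "associated_with", "disease:breast_cancer",
     "http://example.org/evidence/pmid-12345"),
    ("disease:breast_cancer", "subclass_of", "disease:cancer",
     "http://example.org/evidence/wiki-rev-42"),
]

def infer_up_hierarchy(triples):
    """Trivial 'computer reasoning': lift an association up a
    subclass link, keeping the evidence URI attached."""
    subclass = {(s, o) for s, p, o, _ in triples if p == "subclass_of"}
    return [
        (s, p, parent, ev)
        for s, p, o, ev in triples
        if p == "associated_with"
        for child, parent in subclass
        if o == child
    ]

# Inferred: BRCA1 is associated with cancer, traceable to its evidence URI.
print(infer_up_hierarchy(triples))
```

A nanopublication, roughly, is one such assertion bundled with its provenance, small enough to be tracked and credited individually.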

Partnerships arising to do things that are too complicated for a single researcher or organization to do. Using ORCID, VIVO, etc. to refer to people.

Summary: we need to remove ambiguity and redundancy; we need computer-reasonable data so that we can throw grid computing at it; we need to involve a million minds in curation; we need data publication (not just sharing) so that data become citable; we need data-citation metrics; we need standard setting from the bottom up.


Liveblogging Kevin Ashley's talk at #idcc10

Dec 07 2010, filed under Uncategorized

(Wireless is terribly wonky at the International Digital Curation Conference, so I'm going to try liveblogging instead of Twitter.)

Kevin Ashley, Director, Digital Curation Centre, "Curation Centres, Curation Services: How many is enough?"

In the US, the answer is three: roughly one center per 100 million people, or one per $120 billion in research funding. D2C2 at Purdue, UC3 at California, DRCC at Johns Hopkins. Does this mean the UK has too many?

How many services are there? Many per center! Who is being served?

Picture is actually considerably more complicated: many centers across the US, doing different things for different disciplines and institutions and people at different points in data lifecycle. E.g.: national libraries, national subject data centers, international subject data centers, university libraries, government data archives, etc.

Each actor has a different idea of where they sit in the DCC data lifecycle. Some focus on access/use/reuse of data, especially those that focus on highly-curated information and want to see a large audience before they take in a dataset. Institutional actors tend to engage earlier on, in the appraisal/selection stage; they won't take everything, but they'll take more and more diverse datasets. Others will pitch in at the very beginning, helping people with ideas make plans for durable data.

Motivations to help out include: "data behind the graph," reuse value of data for research, "data as record" (in the records-management sense), data reuse in education, increase the value of data via data mashups.

Given the complex landscape, it's hard for researchers to figure out where to go to get help, even those who desperately want to do the right thing! They need some kind of decision tree. Kevin suggests accepting that data has different homes at different points in the process; make that easy, and help people point to data wherever it happens to live. Particular problem when publications refer to small slices of a bigger data source; the connection between the slice and the original dataset can get lost.

Various sources of guidance for researchers and service providers; also potential peer reviewers of grants (how do THEY tell a good plan from a bad plan?).

DMP Online: walkthrough tool for researchers; can be adapted to almost any funder policy worldwide. Rule-driven, structured, generic questions. The same tools can aid peer review of grant applications, because everything is reduced to a common template, making plans easier to compare. Again, how many of these services do we need? Is DMP Online enough?

"In preparing for battle, I have always found that plans are useless, but planning is indispensable." Good thing to recognize! Plans always change, but the planning process is still useful.

IDCC presentation two years ago on what university libraries can do:

  • raise awareness of data issues (improving service to research, not just teaching)
  • leading policy on data management at institutional and national level
  • advising researchers on data management
  • working with IT to develop data-management capacity
  • teaching data literacy to research students
  • developing staff skills locally, reskilling/retraining
  • working with LIS educators to identify and deliver appropriate skills in new graduates

Some of these initiatives have been more successful than others! IT/Library interface often troubled or nonexistent. We're not teaching graduate students, or our own library staffs. Working with LIS instructors is inconsistent. But we're doing pretty well on policy and consciousness raising. (Kevin is talking about his own UK context; I think the US would tick different checkboxen.)

Question: should disciplines or institutions take on the data-curation problem? Pros and cons in both directions (I won't copy the slide; it's complex). Disciplines tend to run on short-term funding and have a narrow view of usefulness. Institutions don't tend to have the depth of knowledge.

Institutions need to know what's theirs, know where it is, know what the rules are (who can see it, who assesses it, who discards it, when that changes). It's part of the institution's public portfolio! Marketing it is also an institutional responsibility.

Current decision points for UK Research Data Services: should every institution do this? what are the rules for new subject repositories, as the data landscape changes? what should be done nationally, what locally? drive or follow the international agenda? Where do institutional research administrators fit into all this; can they put aside turf battles for the efficiency of collective action?

What is the impact of research-data management and public data access? IMPACT. Increased citation rates (see Piwowar et al -- hi, Heather!); the 45% of publications in the sample with associated data scored 85% of the citations. Correlation is not causation, but the link is pretty suggestive. Shared social science data achieves greater impact and effectiveness: more primary publication, more secondary publication, findings robust to confounding factors. Formal sharing is better than informal sharing, which is better than no sharing at all. These numbers are persuasive to evidence-based researchers; we need to bring this to their attention! Also need more investigation across disciplines.

Definite demographic differences in who will share data. Women more likely than men; northern US likelier than southern; senior researchers more likely than juniors. (But the seniors are TELLING their juniors not to! Selfish and counterproductive, IMO.)

Another impact: reuse. "Letting other people discover the stories our data has to tell." Teaching journalists to mine data: "fourth paradigm for the fifth estate." Push to make government data more open allows savvy journalists to find stories in released data. They're being taught Python, analysis techniques, etc. Sometimes they'll get it wrong; we'll have to live with that.


  • Data is often living; treat it that way! (This is a serious weakness in the OAIS model IMO.)
  • More data in the world than is dreamt of in scholarly research, Horatio!
  • Hidden data is wasted data.
  • International collaboration is essential.
  • We have a duty to examine and promote the benefits of good data management and data sharing.
  • Three centers in the US is not enough!

From Christine Borgman: May not be time yet for rigid policies or too much structure from e.g. NSF. This is an experiment in what the scientific communities themselves will come up with, and what the response will be. Let's hang back and study the results. (I agree wholeheartedly!) Response: No, we don't want rigid rules, but we can help them work toward best practices and structured thinking about what a data-management plan is. And some agencies can legitimately set constraints (e.g. use our data centers) and monitor compliance. Fundamentally, though, right now is about getting people used to the whole idea of data management.

Q from Department of Energy: government is afraid of cost of data curation, uncertain of benefits. The more evidence of impact, the better! A: Absolutely agree! We need to measure and market benefits.


The Fourth Jeremiad

Dec 07 2010, filed under Uncategorized

The Christmas season seems to be bringing up a lot of talk about e-books, journal costs (namely, increases), and the role of the library in the digital age. Is it because the Kindle and Nook are popular gift wish-list items? Is it because some library vendors are pushing bills a little later to the end of the year? I don't know.

Robert Darnton is the Carl H. Pforzheimer University Professor and Director of the Harvard University Library. In his recent article in the New York Review of Books, "The Library: Three Jeremiads," Darnton explains many of the things we at BOT have been mentioning and discussing for some time. (Note: Darnton is an historian, not a scientist. Expect verbiage.) I confess I had to look up what jeremiad meant, as I've studied very little theology. Darnton's choice of words is interesting. The OED defines a jeremiad as "A lamentation; a writing or speech in a strain of grief or distress; a doleful complaint; a complaining tirade; a lugubrious effusion."

Once you wade past the Harvard promotional information, Darnton does a thorough job explaining the three main reasons why libraries are in such a bad place right now. Just like in the most recent BOT post (which began as a comment), one of the main themes is control. Who acquires, maintains, and relinquishes control of information matters, as libraries routinely give up control of many aspects of what they do for the greater good of society and culture.

What is Darnton's solution? Creating another library resource, the Digital Public Library of America (DPLA). Per his description, it is "a digital library composed of virtually all the books in our greatest research libraries available free of charge to the entire citizenry, in fact, to everyone in the world." I see this as a library-centric way to regain control of information and wrest the digital library of the future from the monopoly of Google et al.

What will the DPLA hold? Of particular note is Darnton's suggestion that "the DPLA would exclude books currently being marketed, but it would include millions of books that are out of print yet covered by copyright, especially those published between 1923 and 1964, a period when copyright coverage is most obscure, owing to the proliferation of “orphans”—books whose copyright holders have not been located." Recent attempts to create updated policies for use of orphan works in the US have been unsuccessful. Is this a way to secure mechanisms to use them without contacting copyright holders? This is very interesting, since GBS includes orphan works in its scanning program and has been taken to task for its own interpretation of copyright.

This is interesting, because one by-product of the Google Books Project, HathiTrust, is collecting GBS content as a library-side answer to Google's monopoly of the content. HathiTrust is also starting to build a governance structure, and more libraries are joining to secure a seat at the table. So GBS content will have another outlet controlled by libraries, at least for the foreseeable future. Presumably HathiTrust will have most of the DPLA content, since the largest academic libraries have already signed onto the GBS project and their works are being scanned as I write this sentence. This also raises another interesting question: what if these GBS books are not the most important books in our culture? What if big parts of these large academic collections are filler, or less relevant for research?

Likewise, Portico and LOCKSS have created a framework to preserve most of the commercially owned journal content. While Portico isn't owned and controlled by libraries (it's owned by Ithaka, a non-profit), LOCKSS is run by the Stanford University Libraries. The only problem, of course, is that apart from individual libraries' efforts to make information available, most of this content is still controlled to some extent through toll access and/or trigger events from the archives. Open access journal articles and subject repositories will have content available to the public for free, so some content is already available this way.

The last remaining segments of research are local and special collections individually housed by academic libraries, archives, historical societies, and museums. Many of them are digitizing projects and collections as they are able to, and most are putting this information in repositories. Would this information be part of the DPLA? Maybe, although it's not clear from Darnton's article if this is the intention. For many scholars, this primary material is the most essential for their research, not synthesized monographs and summaries. Increasingly, for scientists access to research data and supplemental material is what is desired, not necessarily the final published article or book on a subject.

I think Darnton has missed the point with the DPLA, and it looks to me like duplicative work being promoted to an audience unaware of the environment surrounding digital content and access to information. So I offer a jeremiad of my own: The Library community needs to think more broadly and create broader pathways to content, rather than trying to create more specialized channels to information. 

The concept of a national library seems outdated to me in light of today's digital environment. I frequently meet and communicate with researchers from all over the world using social networking tools and applications. Digital information doesn't have national boundaries - why create them in a library? It seems more time should be spent looking at how to create an international digital library or repository, or how to link existing data and research sources, rather than creating segmented units of information for specialized audiences. There is a rapidly growing collection of digital data, research material, and communications, all of which will be of tremendous importance to the next generation of researchers. Who will preserve this? How will it be preserved? This is what a DPLA should be thinking of, not items from 1923-1964 that will likely be saved through other scanning programs or as print copies.


A thing I'm thankful for

Nov 22 2010, filed under Uncategorized

So it's gratitude week on Scientopia, and terrible misanthrope that I am, I thought I'd take the edge off a bit and participate.

First up, a thing I'm thankful for. Believe it or not, there are lots of these, but if I have to pick one that's work-related, I'll call out my nice office iMac. It's an elegant machine with a big screen that makes it just as easy to ssh over to a server as it is to read my email.

Don't laugh. When I started in my current position, I had a Winbox, and getting to a proper Unix command-line prompt was one hell of a hassle. (Don't talk to me about Cygwin. Just don't.)

The machine is over three years old, yet doesn't feel sluggish (unlike, sadly, my similar-age MacBook, which is cruising for a replacement soon) and still runs the latest-greatest in everything I need.

To each user her computer; if Winboxen are your thing, good for you. For me, my iMac gets out of my way for the most part and lets me get things done. I appreciate that. A lot.


Little Nuggets

Nov 18 2010, filed under Miscellanea, Open Access, Praxis, Uncategorized

Little nuggets of information are swirling around in my head. I'm just back from two meetings, in two different cities, and each one had some interesting ideas about the future of library services, collections, and technology.

Meeting #1 was the 2010 SPARC Digital Repositories Meeting in Baltimore. The last time this meeting was held, in 2008, the conversation about institutional repositories (and digital repositories) focused on how libraries could create and/or host them and convince others of their value. I would say that with a few exceptions, not much has changed.

Just like everyone wants to get married in Jane Austen's Pride and Prejudice, everyone in libraryland seems convinced that if the right marketing approach/language is used, the perfect match will be made with respect to people contributing and using IR/DR content. Unfortunately, the current IR/DR infrastructure isn't conducive to this. You need to establish relationships before (or while) you build the network, and there are few easy tie-ins to the existing infrastructure. The keynote speaker, Michael Nielsen, made this point with respect to use and adoption of online science networks, and the same is true for libraries. The current reward system isn't set up so scientists can show the value of contributing to social networks outside of the peer-review process. I would agree this is true for IRs/DRs/SRs also, although of the three, subject repositories have been the most successful.

As you can tell from the program, there was emphasis on collecting and curating open data, which I think showed there is a desire for libraries to find a better match. While this may create a niche for libraries, it's going to take some work between the "data nerds" and the collectors, as this friendfeed discussion shows.  

While several presenters mentioned the need for preservation, there was surprisingly little talk about the importance of having policies, infrastructure, and technology in place to do this. In fact, these two communities are almost completely disconnected. There's also been very little attention to assessment issues, such as identifying whether the money and staff time devoted to projects is worthwhile given the continuing recession and shrinking library collections budgets. I see both of these ideas impacting work on IRs/DRs/SRs, although since neither topic is "sexy" it may take some time before we see much attention devoted to these issues.

The plan is to have this conference again in two years, and if this happens I predict we will see further shifts in focus or perhaps this program co-sponsored or linked with another organization.

Meeting #2 was a joint ARL/SSP workshop, Partnering to Publish: Innovative Roles for Societies, Institutions, Presses, and Libraries. This should have been a session or part of the schedule for Meeting #1, because it became clear as the meeting progressed that working in the publishing infrastructure is a natural way for libraries to tie their repositories and/or preservation efforts into the existing promotion and tenure environment. In most cases the speakers at the event were able to show this in easily quantifiable ways, like sales figures, enhanced content and features in books and journals, as well as stronger relationships with administrative units and campus faculty.

I also attended yet another conference in the last month: the 2010 Library Assessment Conference. Not much of this conference addressed issues covered in BOT, but I will say this - there were twice as many attendees at this meeting as at the SPARC meeting, with many more presentations and ideas generated. Assessment is currently a hot topic in librarianship, and I predict we will see more programming devoted to all areas of it in the future.

No responses yet

Nature: the response

I was able to get quite a bit of feedback from Nature regarding prices, access, and content. I spoke with our North American sales reps and another staffer from the London office last month; their responses are below, with my questions followed by their answers:

1.      According to Nature’s recent letter, 50% of Nature journals do not currently have an OA option. Are there plans to provide access to the remaining 50%? How would this be accomplished?

It's not 50% of all NPG's journals but 50% of our academic (not Nature-branded) journals. NPG's August letter to customers says "Open access options are now available on 50% of the 50 academic journals we publish including all 15 academic journals owned by NPG. Seven journals published by NPG on behalf of societies offer open access options, with more expected to follow later this year."

All of the academic journals NPG owns now have an open access option. Since the letter was published, six more of our society-owned journals have introduced open access options. They are American Journal of Hypertension, Laboratory Investigation, Modern Pathology, Mucosal Immunology, Neuropsychopharmacology, and Obesity. We continue to discuss open access with our publishing partners; ultimately, the decision to introduce an open access option on these journals remains with the society or organisation that owns the journal.

Nature Communications is unique amongst the Nature journals in offering an open access option. The gold open access model (funded by article processing charges) is still inappropriate for Nature and the Nature research journals. These journals decline more than 90% of submissions; the high rejection rates and the developmental editing that goes into every published paper would make APCs prohibitively high. We estimate the APCs on these journals would be between $10,000 and $30,000, and research funders are not currently willing to support this. The Nature Reviews journals do not publish original research papers.

2.      Does Nature have plans to incorporate newer metrics into journal, article, and author information and assessment? Some examples of these include article downloads, author h-index information, Eigenfactor information, etc.

Earlier this year, we introduced article download information for 43 journals. This is available to authors within their account on eJournal Press, our manuscript tracking system.

We continue to monitor alternative metrics such as the Eigenfactor, Article Influence Score, and h-index. These metrics are not yet widely accepted or understood, but we remain very interested in alternative ways of judging impact and value.

For example, at NPG we think that cost per download and cost per local citation are potentially important measures of the value for money of a journal to an institution.

3.      With regards to communicating and sharing consortial plan arrangements and information, are there plans to provide more transparency on pricing? Specifically are there plans for different consortia to know prices and information provided to other individual customers and consortia?

NPG makes its academic list prices public in the interests of transparency. We have no plans to make the terms of consortia agreements public. Each consortium is different in terms of its holdings, number of institutions, and FTEs, and these are confidential agreements.

4.      In my phone call with our NA reps we briefly discussed a library advisory board for Nature. Can you give more information on the group’s membership and activities?

The NPG Library Committee is an invited group of NPG institutional customers. The group represents a mix of customers from across the world, working in academic, corporate and government settings. It includes both individual customers and consortia managers. The Committee, or a sub-group of it, meets face-to-face approximately once a year. We discuss NPG's activities and the wider publishing and information communities with them regularly via a discussion board on Nature Network, email and phone. The Committee provides useful feedback and insight on the views of the information community.

5.      One comment to my blog post mentioned Nature’s mission statement and its recent change. Can you provide more details on how the mission statement is reviewed, updated and shared with customers and readers?

The journal Nature's original 1869 mission statement still stands, and guides Nature Publishing Group's activities today:

THE object which it is proposed to attain by this periodical may be broadly stated as follows. It is intended
FIRST, to place before the general public the grand results of Scientific Work and Scientific Discovery ; and to urge the claims of Science to a more general recognition in Education and in Daily Life ;
And, SECONDLY, to aid Scientific men themselves, by giving early information of all advances made in any branch of Natural knowledge throughout the world, and by affording them an opportunity of discussing the various Scientific questions which arise from time to time.

Nature's mission statement was updated in 2000 as follows:

First, to serve scientists through prompt publication of significant advances in any branch of science, and to provide a forum for the reporting and discussion of news and issues concerning science. Second, to ensure that the results of science are rapidly disseminated to the public throughout the world, in a fashion that conveys their significance for knowledge, culture and daily life.

We have no current plans to update Nature's mission statement.

6.      Another comment to my blog post mentioned the pricing model for Nature as being based on a floating currency model which is not determined by currency rates we see in bank and other financial updates. Is this how international currencies are determined and are there plans to review or revise this model? 

In 2008 NPG introduced local pricing based on four local currencies (dollar, euro, pound sterling and yen). This means price increases are applied to local currencies, independent of currency exchange rate fluctuations.

I have a few comments on these answers:

1. Estimates of $10,000 - $30,000 in author charges for one OA article? You heard it here first. That's the entire journals budget for some small libraries: they'd be able to get one article for the year - for all their faculty. I wouldn't call that a viable OA option.

2. I'm glad Nature is implementing article download information, but I think it would be much more beneficial if everyone, not just the author, could see the data. As a point of comparison, citation data is available to all users in a database.

More generally, I think Nature is trying to have it both ways: to be a boutique publisher, with high costs and a correspondingly high profit margin, and also to remain a core publisher, in that most academic or scholarly institutions are expected to subscribe to at least some of the content. With these cost increases and licensing options, Nature is becoming (or has already become) unaffordable for many institutions. If scholars weren't demanding access, many libraries would have cancelled many of these titles by now.

I don't think you can have it both ways. If most of your customer base can't afford the content, then in my opinion your market is limited and you can't also be considered a core publisher. Can you be both Neiman Marcus and Wal-Mart at the same time? I don't think you can. I'm curious to see how many libraries have cancelled Nature titles, or have forgone adding titles, because of the cost.

This discussion is also painful for me because I know faculty where I work want Nature journals - I have a list of more than six titles that have been requested in the last few years. I feel I am doing a disservice to my colleagues by withholding access. But the money is simply not in our budget.

I also want to support publishers that are experimenting with new channels of scholarly communication: web features, a blogging platform, Second Life, podcasts, and the like. Nature has been very progressive in exploring these new areas, and its support has helped legitimize them as communication channels. Does that have to come at such a high cost? I hope not.

10 responses so far

One of our own

Sep 14 2010 Published by under Uncategorized

Today ACRL (the Association of College and Research Libraries) released a major report, Value in Academic Libraries: A Comprehensive Research Review and Report. The entire report is freely available online. It discusses ways libraries can better show their value to the academy (or industry, local community, school system, etc.).

Disclaimer: I haven't read the whole thing. Unlike some of my colleagues, I rarely get reports prior to the release date, and it usually takes me time to read them. Since this is the start of the semester and the report is over 150 pages, it will take me some time to digest.

I have read the introductory material, though, and will say this: it's a good report for anyone who wants to better understand the scope, mission, and challenges of today's library. One interesting observation: since it's not in a toll-access journal, it's free to all. Incidentally, this tradition of libraries publishing many of their most topical and case-based industry reports freely on the web has kept our own literature from moving as quickly into new publishing models (like open access), and from understanding the plight of our academic colleagues. Much of our most important and timely work is not in journals (with a few exceptions), while other academic faculty face a growing number of journal titles in which to publish, more places to monitor research, and longer review times. Unreliable social networks - tools that shut down or change their access models - are a problem for anyone monitoring research this way; Bloglines and Scribd are two very recent examples. Bloglines will shut down October 1, and Scribd will require a user ID to download content. Couple this with the pressure to conform to established criteria for promotion and tenure - who has time to innovate?

Other recent news was the announcement of Frank Turner as University Librarian at Yale University. There's more information here, and while I can't comment on whether he has the skills to run a major academic research library, there has been discussion within the library community about whether an academic without an MLS should hold this or similar leadership positions in academic libraries. Should we hire and promote our own to run the library? I will say Yale felt justified in promoting one of its own - Prof. Turner has been at Yale for some time, and the news release hints that the search committee identified him early in the selection process. Is it so bad to pick one of your own? Personally, I think it can be both good and bad: good in that you know the candidate well and can be confident about their opinions, management style, and so on; bad in that, knowing the candidate well, you may make assumptions about the direction they will take or other aspects of their performance. Do you value tradition to the exclusion of innovation? Can those two qualities peacefully coexist? I don't think there are simple answers here. The other argument in the library community is that this trend cheapens the library profession or the MLS degree. I do have some concerns about this, but leadership can be surprisingly democratic and not necessarily based on recent or traditionally relevant experience. In the case of Prof. Turner, I think his experience as Provost more than makes up for the lack of an MLS.

This second announcement brings up another question: how much can you truly innovate in an established, traditional environment like an Ivy League campus or a top-tier academic research library? Is it even possible? I think the challenge for Turner is not so much running the library (although that will be substantial, considering the size and scope of Yale's collections and library locations) or even meeting administrative expectations such as raising funds, raising the library's profile on campus, and garnering support for library services and programs.

I think Turner's real challenge will be addressing the questions raised in the ACRL report: reconciling the academic library's tradition of having both a collegial and a managerial culture, addressing the requirements of outside accrediting organizations and professions, and demonstrating that library resources are relevant to the overall campus mission. To me, this is the real challenge of our profession. Is it better to have "one of them" talking to the department faculty, or "one of our own" in the local academic community explaining our culture to others?

Since I have a stake in this, I'll share my opinion: my first priority is that the place I work gets the support it needs to function and thrive. If the best advocate for me as a librarian is a non-MLS administrator, or a faculty colleague, or a student turned prominent alumnus, or even a library vendor who can tell their peers, then I'm fine with that. Would I rather have it be one of my own doing the promoting? Sure - I think everyone is happier when their own peers influence decision-making. But you can't always communicate with everyone on your own terms, or reach some higher-level audiences at all. I think we all need to know when it's better for someone else to toot our horn for us, whether they're one of our own or someone else.

2 responses so far

Older posts »