I often see a cost argument against research-data preservation: if it's cheaper to replicate or regenerate the data than to preserve it, why preserve?
Here's my question: Cheaper for whom?
If we remain within the context of an individual lab, this question is a no-brainer: if it's cheaper to regenerate, regenerate. As we dip our toes into a more open-data world, however, I should think the equation changes rather a lot.
Is it still cheaper for two labs to have to regenerate these same data? Five labs? Twenty labs? How many of those labs will have to buy specialized equipment to create those data, equipment they wouldn't need if the data were shared by the first lab? How much staff time—worst-case, specialized staff time—will be eaten up in regenerating data?
There are certainly offsetting costs to consider: the cost of data discovery, the cost of cleaning up and describing data for sharing, the cost of whatever munging it takes to move data from one lab's context to another's, the magnified cost of any error on the part of the data-generating lab.
Still, my sense is that the discussion around cost has been just a bit simplistic… and is likely to become more complicated as data-sharing norms emerge.