[LTER-im] what data should be archived?

Dan Bahauddin danbaha at umn.edu
Fri Jun 1 09:24:35 PDT 2018


Hi Tim,

I'm assuming here that we are talking about storage in publicly accessible
data catalogs.  There is certainly gray area between transparency and
usability, and it may be worth sitting down with any researchers
responsible for producing the data.  They may have a good feel for what is
expected and considered useful within a sub-discipline.

We would not distribute pre-QC'd data or Monte-Carlo datasets (except on
request), nor would we include "working fields" used to calculate final
results.  These have little value to the end user, IMO.  However, post-QC
data that still includes "bad", flagged values seems reasonable to share,
especially when some discretion is involved in deciding which data to
exclude from analyses.  I would also avoid duplicated information, such as
temperatures in both C and F (unless someone with more expertise felt it
was standard in the field to present both).
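
To make the distinction concrete, here is a minimal sketch (not Cedar Creek's actual workflow) of preparing a post-QC table for a public catalog: flagged rows stay in, while a working column and a duplicate-unit column are dropped. The column names `temp_c_times_1_8`, `temp_f` and `qc_flag`, the flag code "B", and the function `prepare_for_publication` are all hypothetical.

```python
# Hypothetical column names, for illustration only.
WORKING_FIELDS = frozenset({"temp_c_times_1_8"})  # intermediate Excel-style step
DUPLICATE_FIELDS = frozenset({"temp_f"})          # same quantity in other units


def prepare_for_publication(rows, drop=WORKING_FIELDS | DUPLICATE_FIELDS):
    """Return copies of post-QC rows without working or duplicate columns.

    Flagged rows are kept rather than filtered out; the qc_flag column
    lets each user decide what to exclude from an analysis.
    """
    return [{key: value for key, value in row.items() if key not in drop}
            for row in rows]


rows = [
    {"date": "2018-06-01", "temp_c": 21.5, "temp_c_times_1_8": 38.7,
     "temp_f": 70.7, "qc_flag": ""},
    {"date": "2018-06-02", "temp_c": 45.0, "temp_c_times_1_8": 81.0,
     "temp_f": 113.0, "qc_flag": "B"},  # "B" = hypothetical "bad" flag
]
published = prepare_for_publication(rows)
```

Both rows survive, each with only `date`, `temp_c` and `qc_flag`; excluding the flagged row is left to the end user.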

For local archiving, I would store at least the initial and final versions
of datasets, with some information regarding how the latter was generated
from the former.  For most of our data, we expect this to happen at the
researcher, not the site, level.  As the IM, I don't necessarily need to know
how C was converted to F, or even that the initial measurements were in C.
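
One lightweight way to keep "some information regarding how the latter was generated from the former" is a small provenance record stored alongside both versions. The sketch below assumes checksums plus a free-text method note are enough; the field names and `provenance_record` function are hypothetical, and the method text would come from the researcher.

```python
import hashlib
import json
from datetime import date


def sha256_of(data: bytes) -> str:
    """Checksum that identifies one exact version of a data file."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(initial: bytes, final: bytes, method: str) -> dict:
    """Link a final dataset to the initial version it was derived from."""
    return {
        "initial_sha256": sha256_of(initial),
        "final_sha256": sha256_of(final),
        "method": method,  # researcher-supplied description of processing
        "recorded": date.today().isoformat(),
    }


initial = b"date,temp_c\n2018-06-01,21.5\n"
final = b"date,temp_c,qc_flag\n2018-06-01,21.5,\n"
record = provenance_record(
    initial, final, "Hypothetical note: range check applied, values flagged.")
print(json.dumps(record, indent=2))
```

The checksums let anyone confirm which archived files the note refers to, without the IM needing to know the details of each conversion.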


Perhaps I am restating the question more than answering it.  There is a
certain amount of judgement that is applied to these issues, and I think it
is difficult to create a general rule that is applicable to all datasets.


--

Dan Bahauddin
Information Manager

Cedar Creek Ecosystem Science Reserve
2660 Fawn Lake Dr. NE
East Bethel, MN 55005

Office:  612-301-2603
Fax:  612-301-2626


On Fri, Jun 1, 2018 at 11:01 AM, Whiteaker, Timothy L <whiteaker at utexas.edu>
wrote:

> Hi all,
>
> Are there guidelines which describe **what** LTER data should be
> archived?  For example, should we be archiving these things?
> * Raw sensor data, or just QC'd result
> * 1000s of Monte-Carlo datasets, or just summary of the result
> * Columns originally used for intermediate calculations in Excel, like a
> column that multiplies Celsius by 9/5, and another column that adds 32 to it
> to get F
>
> I'm aiming for usefulness and reproducibility, but sometimes I have a hard
> time determining how far back to take the data (as in the first example
> above) to have it still count as reproducible.
>
> Thanks!
>
> Tim Whiteaker
> Research Scientist
> The University of Texas at Austin
> _______________________________________________
> Long Term Ecological Research Network
> im mailing list
> im at lternet.edu
>
>