[LTER-im] Representing datasets from other parties when they are integral to your work

Henshaw, Donald don.henshaw at oregonstate.edu
Wed Sep 20 12:52:20 PDT 2017


Hi,
The USGS collects our streamflow for Lookout Creek, which is the highest-order stream in the Andrews. We do repackage the USGS data and include it in the same data set, with the same structure, as the other nine small watersheds where we collect streamflow. We do this for several reasons:

·         As a convenience for our PIs, the AND LTER provides both the high temporal resolution and daily versions of the USGS streamflow data in the same format as all of our other watersheds.

·         Maintaining the USGS streamflow in AND LTER formats allows our applications for summarizing streamflow, and for preparing streamflow data for use with our stream chemistry samples, to work for these USGS data.

·         The high temporal resolution versions are only available for the past 15 years or so through USGS. The AND LTER has recreated an hourly data set after historic reconstruction from charts, punch tapes and printouts going back to 1949, so we offer the only complete high temporal resolution version of this data. The USGS never bothered to Q/C any of the high temporal resolution data until more recent years.

·         The USFS maintained this gage for eight years in the 1950s and 60s, and the USGS does not maintain these records. Our daily record is the same as the USGS daily record beginning in 1949, except we include these additional eight years, so we offer the most complete and “best source” for the daily record as well.
We do reference the USGS contribution in the data abstract and methods, but including specific references to the USGS source in a method step in EML sounds very appropriate.
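
To illustrate the last bullet above (a daily record that is the USGS record plus the extra USFS years), here is a rough Python sketch of merging two daily records while tagging every value with its source. This is not the actual AND LTER processing code: the function name, the dictionary layout, and the flow values are all made up for illustration.

```python
from datetime import date


def merge_daily_records(primary, supplement, primary_name, supplement_name):
    """Merge two {date: mean_daily_flow} records, tagging each value with its source.

    When both records cover the same day, the value from `primary` wins.
    """
    merged = {}
    # Start with the supplementary record...
    for day, flow in supplement.items():
        merged[day] = (flow, supplement_name)
    # ...then let the primary record overwrite any overlapping days.
    for day, flow in primary.items():
        merged[day] = (flow, primary_name)
    # Return the combined record in date order.
    return dict(sorted(merged.items()))


# Hypothetical values, for illustration only.
usgs = {date(1949, 10, 1): 35.0, date(1949, 10, 2): 34.0}
usfs = {date(1949, 10, 2): 99.0, date(1955, 10, 1): 40.0}

combined = merge_daily_records(usgs, usfs, "USGS", "USFS")
for day, (flow, source) in combined.items():
    print(day.isoformat(), flow, source)
```

Keeping the source next to each value is one way to make the provenance visible in the data table itself, in addition to naming the agencies in the abstract and methods.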

On the other hand, we have never done anything with our National Atmospheric Deposition Program (NADP) data, although we do collect similar precipitation chemistry data at AND LTER. This NADP data is well maintained on their website beginning in 1980. However, since Andrews personnel have spent time every week sending information and samples to NADP since 1980, we should probably make this data more obviously available.

Don

From: im [mailto:im-bounces at lists.lternet.edu] On Behalf Of Margaret O'Brien
Sent: Wednesday, September 20, 2017 11:15 AM
To: Jonathan Walsh
Cc: Emma Rosi; IM committee
Subject: Re: [LTER-im] Representing datasets from other parties when they are integral to your work

Hi Jonathan -

I see several ways to answer this question. Ideally, there is a way to handle different sources of data without complicating your IM system too much. And I hope this answer is not too complicated, either.

It's pretty simple to link to other sources. SBC LTER does it on the main data page (http://sbc.lternet.edu/data/). But some of these we use for our LTER research, so they have been turned into datasets with EML, too.  At SBC LTER, we call this "exogenous data", or "Type 0". And so our re-packaging turns it into "Type I". But we do try to make it clear where it came from.

Particularly in the watersheds, there are a number of these, including from the USGS. I should add though, that we don't get the USGS data that we use from their website, we get their highest resolution data thru back-channels at the end of the water year after it's been QC'd.  In total, SBC LTER has 30-40 datasets like this, including both precipitation and stream flow.

For stream flow and precip, we use these rules for creating datasets from exogenous data. Keep in mind that the most visible parts of a citation are the creators and title, so these are important.
1. the data are reformatted to match our own (so they can be used together - now they are Type I). Since stream flow is calc'd from stage height, and we do that part here, we do call it "ours"
2. creator is the primary PI conducting that part of the project
3. the dataset title names the source (if not us), and has both the original station and our station id in it.
4. the abstract also names the source

Here are some examples:
our data from a USGS stream gauge:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3018

A similar dataset from one of our own height-gauges:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3007

precipitation data collected by the county:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.5012

We have other exogenous data where we did nothing to it at all -- all we do is repost. These are mostly for reference, and for those, we make no claim to this data (unless we treated it somehow, like aggregating).
So those rules:
1. data are in original format
2. creator is original org (not us)
3. dataset title is as close as we can get to what it was when received or downloaded. It may take a phone call to get it right.
4. contact - add one for the org, too.

e.g., here is some KML data that describes the perimeter of a recent fire.
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.70
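
The two sets of rules above could be written down roughly as the Python sketch below. It is only an illustration of the rules, not code that SBC LTER runs: the class, the function names, the title pattern, the abstract wording, and the station ids and names in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackageHeader:
    """The most visible parts of a dataset description."""
    title: str
    creator: str
    abstract: str
    contacts: List[str] = field(default_factory=list)


def reformatted_header(subject, source_org, source_station, local_station, pi, site_contact):
    """Exogenous data reformatted to match the site's own datasets."""
    return PackageHeader(
        # Title names the source and carries both station ids.
        title=f"{subject}: {source_org} station {source_station} (site station {local_station})",
        # Creator is the primary PI for that part of the project.
        creator=pi,
        # Abstract also names the source.
        abstract=f"{subject} data originally collected by {source_org}, reformatted to match site datasets.",
        contacts=[site_contact],
    )


def reposted_header(original_title, source_org, site_contact):
    """Exogenous data reposted unchanged, in its original format."""
    return PackageHeader(
        # Title stays as close as possible to what was received.
        title=original_title,
        # Creator is the original organization, not the site.
        creator=source_org,
        abstract=f"Data from {source_org}, reposted in its original format.",
        # Add a contact for the original organization, too.
        contacts=[site_contact, source_org],
    )


# Hypothetical values, for illustration only.
flow = reformatted_header("Stream discharge", "USGS", "00000000", "XX00",
                          "A. Researcher", "Site IM")
fire = reposted_header("Fire perimeter (KML)", "Example Agency", "Site IM")
print(flow.title)
print(fire.creator, fire.contacts)
```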



Another project I work with is a biodiversity observation network (BON). Most of the data they work with are exogenous, and a lot of what the BON does is to curate that data so they can use it in integrated research, and create datasets along the way. They have adopted the LTER data management protocols, including the whole EML > EDI > DataONE pathway.

So we highlight the packaging guidelines for different types (de novo, curated exogenous, integration products).
For that, see "Data Packaging", here: http://sbc.marinebon.org/data/overview/
These are very similar to what we do with LTER, but the process is more formalized.


One more note:
EML has an area for describing "source data", under methods. Use it if you can. There is a node called dataSource that works for holding info about another PASTA dataset. For URLs that are non-PASTA, the only current option is to use the text fields.
Here is an example for one of the biodiversity datasets that integrates data from 4 projects (open up Metadata > Methods):
https://portal.edirepository.org/nis/metadataviewer?packageid=edi.5.2
Yes, the LTER dataset is in PASTA, but our code was not sophisticated enough to get the id inserted into edi.5's /eml//dataSource node. But since both the LTER and BON data are time-series, we'll get that in on the next update.
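
For anyone who wants to see the shape of this, here is a rough Python sketch that builds a methodStep fragment in the two ways described above: with a dataSource node when the source is another PASTA dataset, and with the source named only in the description text otherwise. It is not our actual code, and the fragment has not been validated against the EML schema (which may require more child elements, such as a contact). The function name, the dictionary keys, and the placeholder title and example.org URL are made up; the USGS URL is the one from Jonathan's message.

```python
import xml.etree.ElementTree as ET


def build_method_step(description, pasta_source=None):
    """Build an EML-style methodStep element.

    `pasta_source` is an optional dict with 'title', 'organization' and 'url'
    keys (names chosen for this sketch) describing a source dataset.
    """
    step = ET.Element("methodStep")
    para = ET.SubElement(ET.SubElement(step, "description"), "para")
    para.text = description
    if pasta_source is not None:
        # Structured description of the source dataset.
        source = ET.SubElement(step, "dataSource")
        ET.SubElement(source, "title").text = pasta_source["title"]
        creator = ET.SubElement(source, "creator")
        ET.SubElement(creator, "organizationName").text = pasta_source["organization"]
        online = ET.SubElement(ET.SubElement(source, "distribution"), "online")
        ET.SubElement(online, "url").text = pasta_source["url"]
    return step


# Non-PASTA source: the URL can only go in the text.
usgs_step = build_method_step(
    "Stream flow was obtained from the USGS: "
    "https://waterdata.usgs.gov/usa/nwis/uv?01589197"
)

# Source dataset in PASTA: placeholder values, for illustration only.
pasta_step = build_method_step(
    "Time series were integrated from an existing data package.",
    pasta_source={
        "title": "Hypothetical source dataset",
        "organization": "Hypothetical LTER site",
        "url": "https://example.org/hypothetical-package",
    },
)

print(ET.tostring(usgs_step, encoding="unicode"))
print(ET.tostring(pasta_step, encoding="unicode"))
```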


Margaret


Margaret O'Brien
ORCID: 0000-0002-1693-8322
Information Management
Marine Science Institute, UCSB
Santa Barbara, CA 93106
805-893-2071 (voice)
http://environmentaldatainitiative.org
http://sbc.marinebon.org
http://sbc.lternet.edu




On Wed, Sep 20, 2017 at 8:11 AM, Jonathan Walsh <walshjcaryinstitute at gmail.com> wrote:
Hi IMs!

I have a question on how best to represent data that your study uses but that are provided by others. If you use such data, we could use your insights on how you make that portion of your data available to the community and the LTER.

Baltimore Ecosystem Study gets its stream flow data from the USGS.  We in turn use this flow data to calculate our daily loads and other results that we track.  The USGS data are kept on their website and we incorporate them into our work.

For the purposes of making our data available to the larger community (LTER, PASTA, DataONE, etc.), we have historically just pointed to the USGS data on the USGS site (example: https://waterdata.usgs.gov/usa/nwis/uv?01589197) as opposed to collecting our own copy and providing it ourselves.

The above precludes us from providing a direct link to the data, such as one that would be PASTA "type 1".

If you have any suggestions as to how, if differently, we should represent these data, which are integral to our work, but not provided by us, I would very much appreciate hearing them.

Thank you!

Jonathan

-
Jonathan Walsh
http://orcid.org/0000-0002-0658-0814
Information Manager, Baltimore Ecosystem Study
Cary Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at caryinstitute.org

_______________________________________________
Long Term Ecological Research Network
im mailing list
im at lternet.edu

