[LTER-im] Representing datasets from other parties when they are integral to your work
Corinna Gries
cgries at wisc.edu
Thu Sep 21 09:05:49 PDT 2017
Hi Jonathan et al.,
Once you get to think about this, would you be interested in summarizing this as best practice documentation? It sounds as though everyone has slightly, but not majorly, different approaches that would be great to compile in an article. I am sure Don and John would help – right?
I think this discussion goes beyond the simple provenance implementation in EML where we say in our best practices:
The <dataSource> tag is for nesting an EML dataset which may be an input to a <methodStep> of the data being described, e.g., calibration information for an instrument or input parameters for a model. This element will also be used by the PASTA provenance tracking system for recording the source data when a derived product is created and described with EML. For more information, see Section III, Recommendations for Compatibility with External Applications.
And an example of the EML is on page 51 of the document https://im.lternet.edu/sites/im.lternet.edu/files/emlbestpractices-2.0-FINAL-20110801_0.pdf
and somebody mentioned already that if the source dataset is in PASTA it will generate such an EML snippet. Otherwise you’d have to write it yourself.
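For the "write it yourself" case, here is a minimal Python sketch of assembling a methodStep with a nested dataSource for an external source. The element layout (title, creator, distribution/online/url, contact) is an assumption modeled loosely on the best-practices description quoted above, so check it against the EML schema before using it. The function name, the description text, and the source title are illustrative placeholders; only the USGS URL comes from this thread.

```python
import xml.etree.ElementTree as ET


def build_method_step(description, source_title, organization, source_url):
    """Build a <methodStep> element that nests a <dataSource> for external data.

    Hypothetical helper: the child elements chosen here are an assumption,
    not an authoritative EML template.
    """
    method_step = ET.Element("methodStep")

    # Free-text description of how the external data were used.
    desc = ET.SubElement(method_step, "description")
    ET.SubElement(desc, "para").text = description

    # The nested dataSource describes the external dataset itself.
    data_source = ET.SubElement(method_step, "dataSource")
    ET.SubElement(data_source, "title").text = source_title

    creator = ET.SubElement(data_source, "creator")
    ET.SubElement(creator, "organizationName").text = organization

    distribution = ET.SubElement(data_source, "distribution")
    online = ET.SubElement(distribution, "online")
    ET.SubElement(online, "url").text = source_url

    contact = ET.SubElement(data_source, "contact")
    ET.SubElement(contact, "organizationName").text = organization

    return method_step


# Placeholder values, except the URL, which is the example cited in this thread.
step = build_method_step(
    description="Stream flow data were obtained from the USGS and used to calculate daily loads.",
    source_title="USGS stream flow data (illustrative title)",
    organization="U.S. Geological Survey",
    source_url="https://waterdata.usgs.gov/usa/nwis/uv?01589197",
)
print(ET.tostring(step, encoding="unicode"))
```

The printed XML could then be pasted into the methods section of the derived product's EML, after validating the full document.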
Corinna
From: im [mailto:im-bounces at lists.lternet.edu] On Behalf Of Jonathan Walsh
Sent: Thursday, September 21, 2017 10:49 AM
To: Porter, John Henderson (jhp7e)
Cc: Emma Rosi; IM committee; jhp7e
Subject: Re: [LTER-im] Representing datasets from other parties when they are integral to your work
Thanks for all your thoughtful replies. You've given me plenty to read, plenty of examples, and plenty of ideas to think about. I'm going through the ideas and suggestions as time permits in between proposal-generating work!
On Thu, Sep 21, 2017 at 11:30 AM, Jonathan Walsh <walshjcaryinstitute at gmail.com> wrote:
John,
I like the idea of using a methodStep, and no matter what else I do or don't do, I'm definitely doing that.
Jonathan
On Wed, Sep 20, 2017 at 3:37 PM, Porter, John Henderson (jhp7e) <jhp7e at eservices.virginia.edu> wrote:
Interesting distinction.....
We have our local catalog, the LTER Data Portal/EDI, to which we send our metadata (and PASTA then fetches the data). DataONE gets the metadata from PASTA, then links back to PASTA for the data itself.
It looks as if NSF will be increasingly emphasizing having data on EDI/PASTA, and deemphasizing the local catalog. And as we showed at ESIP, you can also implement a local view of the metadata/data in PASTA as a local catalog.
Back to the original subject, we generally try to avoid serving data that is also served by others because we want to avoid the headaches of keeping it updated, and the potential confusion of users over who to cite.
-JP
On Sep 20, 2017 1:46 PM, Jonathan Walsh <walshjcaryinstitute at gmail.com> wrote:
Thank you for replying. One clarification:
>>>>For LTER data, of course, we are the "best source" so we document that data and share it via PASTA, DataONE etc.
Does that mean you submit a copy of that data to PASTA, DataONE, etc, or just some documentation of what data it is and how to find it?
Thanks!
On Wed, Sep 20, 2017 at 1:12 PM, John Porter <jhp7e at eservices.virginia.edu> wrote:
Jonathan,
That sounds similar to our approach.
Generally speaking there is a "best source" for data - the place where updated data is reliably available - and in the case of USGS that is their web site, so we just point people there. For LTER data, of course, we are the "best source" so we document that data and share it via PASTA, DataONE etc.
There are a few cases where we do maintain "local" copies of external data, either because the availability at the source is unreliable (not the case for USGS), the data volume is best dealt with on a local network (e.g., large LiDAR datasets), the data is really hard to locate and extract, or where there is a reason that you want to maintain a particular static version of the data (e.g., for a series of analyses conducted over a period of time where you don't want the underlying data to change). However, we DON'T include them in the data catalog.
Don't forget that for your derived products you can include references to the USGS source in a "methodStep" in EML. For source data already documented in EML and stored in PASTA, you can use the "provenance" web service (e.g., https://pasta.lternet.edu/package/provenance/eml/knb-lter-bes/332/580), which will produce a methodStep "stub" that can easily be included in your metadata. For USGS you'll need to generate your own methodStep, but you can use the example produced by the provenance web service as a template.
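As a rough illustration of calling the provenance web service, the Python sketch below builds the request URL from a package's scope, identifier, and revision and downloads the stub. The URL pattern is inferred only from the single example URL above, and the function names are hypothetical; the host or path may have changed since this message was written, so treat this as a sketch rather than a description of the service's API.

```python
from urllib.request import urlopen

# Assumption: base path inferred from the example URL in this message.
PASTA_PROVENANCE_BASE = "https://pasta.lternet.edu/package/provenance/eml"


def provenance_url(scope, identifier, revision):
    """Return the provenance URL for a package, e.g. knb-lter-bes/332/580."""
    return f"{PASTA_PROVENANCE_BASE}/{scope}/{identifier}/{revision}"


def fetch_method_step_stub(scope, identifier, revision, timeout=30.0):
    """Download the methodStep stub as text (requires network access)."""
    url = provenance_url(scope, identifier, revision)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (commented out because it performs a network request):
# stub = fetch_method_step_stub("knb-lter-bes", 332, 580)
# print(stub)
```

The returned text can be pasted into the methods section of the derived dataset's EML, or used as a model when writing a methodStep by hand for a non-PASTA source such as USGS.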
Hope that helps!
-John Porter
On 9/20/2017 11:11 AM, Jonathan Walsh wrote:
Hi IMs!
I have a question on how to best represent data that your study uses, but is provided by others. If you use such data, we could use your insights on how you make that portion of your data available to the community and the LTER.
Baltimore Ecosystem Study gets its stream flow data from the USGS. We in turn use this flow data to calculate our daily loads and other results that we track. The USGS data are kept on their website and we incorporate them into our work.
For the purposes of making our data available to the larger community (LTER, PASTA, DataONE, etc.), we have historically just pointed to the USGS data on the USGS site (example: https://waterdata.usgs.gov/usa/nwis/uv?01589197 ) as opposed to collecting our own copy and providing it ourselves.
The above precludes us from providing a direct link to the data of the kind that would be PASTA "type 1".
If you have any suggestions as to how, if differently, we should represent these data, which are integral to our work, but not provided by us, I would very much appreciate hearing them.
Thank you!
Jonathan
-
Jonathan Walsh
orcid.org/0000-0002-0658-0814
Information Manager, Baltimore Ecosystem Study
Cary Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at caryinstitute.org
_______________________________________________
Long Term Ecological Research Network
im mailing list
im at lternet.edu
--
John H. Porter
Dept. of Environmental Sciences
University of Virginia
291 McCormick Road
PO Box 400123
Charlottesville, VA 22904-4123
ORCID: http://orcid.org/0000-0003-3118-5784