[LTER-im] Representing datasets from other parties when they are integral to your work

Porter, John Henderson (jhp7e) jhp7e at eservices.virginia.edu
Wed Sep 20 12:37:30 PDT 2017


Interesting distinction.....

We have our local catalog, the LTER Data Portal/EDI, to which we send our metadata (and PASTA then fetches the data). DataONE gets the metadata from PASTA, then links back to PASTA for the data itself.

It looks as if NSF will be increasingly emphasizing having data on EDI/PASTA, and deemphasizing the local catalog. And as we showed at ESIP, you can also implement a local view of the metadata/data in PASTA as a local catalog.

Back to the original subject, we generally try to avoid serving data that is also served by others because we want to avoid the headaches of keeping it updated, and the potential confusion of users over whom to cite.

-JP



On Sep 20, 2017 1:46 PM, Jonathan Walsh <walshjcaryinstitute at gmail.com> wrote:
Thank you for replying.  One clarification:
>>>>For LTER data, of course, we are the "best source" so we document that data and share it via PASTA, DataONE etc.
Does that mean you submit a copy of that data to PASTA, DataONE, etc., or just some documentation of what data it is and how to find it?

Thanks!


On Wed, Sep 20, 2017 at 1:12 PM, John Porter <jhp7e at eservices.virginia.edu> wrote:

Jonathan,

That sounds similar to our approach.

Generally speaking there is a "best source" for data - the place where updated data is reliably available - and in the case of USGS that is their web site, so we just point people there.   For LTER data, of course, we are the "best source" so we document that data and share it via PASTA, DataONE etc.

There are a few cases where we do maintain "local" copies of external data, either because the availability at the source is unreliable (not the case for USGS), the data volume is best dealt with on a local network (e.g., large LiDAR datasets), the data is really hard to locate and extract, or where there is a reason that you want to maintain a particular static version of the data (e.g., for a series of analyses conducted over a period of time where you don't want the underlying data to change).  However, we DON'T include them in the data catalog.

Don't forget that for your derived products you can include references to the USGS source in a "methodStep" in EML. For source data already documented in EML and stored in PASTA, you can use the "provenance" web service (e.g., https://pasta.lternet.edu/package/provenance/eml/knb-lter-bes/332/580), which will produce a methodStep "stub" that can easily be included in your metadata. For USGS you'll need to generate your own methodStep, but you can use the example produced by the provenance web service as a model for doing that...
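A minimal Python sketch of both routes, under stated assumptions: `provenance_url` rebuilds the request from the scope/identifier/revision pattern in the example URL, `fetch_methodstep_stub` just downloads whatever the provenance service returns, and `external_method_step` is a hypothetical helper that hand-builds a comparable methodStep for a USGS gage. The element layout used here (methodStep > description/para, and dataSource > title, creator, distribution/online/url, contact) is my reading of the general EML schema, not necessarily the exact stub PASTA emits, so compare it against a real stub and validate before using it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Base of the PASTA provenance service, taken from the example URL above.
PASTA_PROVENANCE = "https://pasta.lternet.edu/package/provenance/eml"


def provenance_url(scope: str, identifier: int, revision: int) -> str:
    """Build the provenance URL for a PASTA package (scope/identifier/revision)."""
    return f"{PASTA_PROVENANCE}/{scope}/{identifier}/{revision}"


def fetch_methodstep_stub(url: str, timeout: float = 30.0) -> str:
    """Download the methodStep stub returned by the provenance service (network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def external_method_step(title: str, organization: str, url: str, note: str) -> ET.Element:
    """Hypothetical helper: hand-build a methodStep for a source that is not in PASTA.

    Assumed layout (check against the EML schema and a real PASTA stub):
    description/para, then a dataSource with title, creator,
    distribution/online/url and contact, in that order.
    """
    step = ET.Element("methodStep")
    description = ET.SubElement(step, "description")
    ET.SubElement(description, "para").text = note

    source = ET.SubElement(step, "dataSource")
    ET.SubElement(source, "title").text = title
    creator = ET.SubElement(source, "creator")
    ET.SubElement(creator, "organizationName").text = organization
    # An "information" URL points people to the provider's page rather than a file.
    online = ET.SubElement(ET.SubElement(source, "distribution"), "online")
    ET.SubElement(online, "url", function="information").text = url
    contact = ET.SubElement(source, "contact")
    ET.SubElement(contact, "organizationName").text = organization
    return step


if __name__ == "__main__":
    # Offline usage: build the provenance URL and a USGS methodStep; no network call.
    print(provenance_url("knb-lter-bes", 332, 580))
    step = external_method_step(
        title="USGS stream flow, site 01589197 (illustrative title)",
        organization="U.S. Geological Survey",
        url="https://waterdata.usgs.gov/usa/nwis/uv?01589197",
        note="Stream flow data obtained from USGS and used to calculate daily loads.",
    )
    print(ET.tostring(step, encoding="unicode"))
```

Pointing the dataSource at the USGS site page, rather than at a local copy, keeps USGS as the "best source" while still recording the dependency in the derived product's metadata.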

Hope that helps!

  -John Porter

On 9/20/2017 11:11 AM, Jonathan Walsh wrote:
Hi IMs!

I have a question about how best to represent data that your study uses but that is provided by others. If you use such data, we could use your insights on how you make that portion of your data available to the community and the LTER.

Baltimore Ecosystem Study gets its stream flow data from the USGS.  We in turn use this flow data to calculate our daily loads and other results that we track.  The USGS data are kept on their website and we incorporate them into our work.

For the purposes of making our data available to the larger community (LTER, PASTA, DataONE, etc.), we have historically just pointed to the USGS data on the USGS site (example: https://waterdata.usgs.gov/usa/nwis/uv?01589197) as opposed to collecting our own copy and providing it ourselves.

This approach precludes us from providing a direct link to the data, such as a PASTA "type 1" package would require.

If you have any suggestions as to how, if differently, we should represent these data, which are integral to our work, but not provided by us, I would very much appreciate hearing them.

Thank you!

Jonathan

-
Jonathan Walsh
orcid.org/0000-0002-0658-0814
Information Manager, Baltimore Ecosystem Study
Cary Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at caryinstitute.org



_______________________________________________
Long Term Ecological Research Network
im mailing list
im at lternet.edu




--
John H. Porter
Dept. of Environmental Sciences
University of Virginia
291 McCormick Road
PO Box 400123
Charlottesville, VA 22904-4123
ORCID: http://orcid.org/0000-0003-3118-5784



