[LTER-im] Representing datasets from other parties when they are integral to your work

John Porter jhp7e at eservices.virginia.edu
Wed Sep 20 10:12:35 PDT 2017


Jonathan,

That sounds similar to our approach.

Generally speaking there is a "best source" for data - the place where
updated data is reliably available - and in the case of USGS that is
their web site, so we just point people there.   For LTER data, of
course, we are the "best source" so we document that data and share it
via PASTA, DataONE etc.

There are a few cases where we do maintain "local" copies of external
data, either because the availability at the source is unreliable (not
the case for USGS), the data volume is best dealt with on a local
network (e.g., large LiDAR datasets), the data is really hard to locate
and extract, or where there is a reason that you want to maintain a
particular static version of the data (e.g., for a series of analyses
conducted over a period of time where you don't want the underlying data
to change).  However, we DON'T include them in the data catalog.

Don't forget that for your derived products you can include references
to the USGS source in a "methodStep" in EML.  For source data that is
already documented in EML and stored in PASTA, you can use the
"provenance" web service (e.g.,
https://pasta.lternet.edu/package/provenance/eml/knb-lter-bes/332/580),
which will produce a methodStep "stub" that can easily be included in
your metadata.  For USGS you'll need to generate your own methodStep,
but you can use the example produced by the provenance web service as a
template.
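
As a rough illustration of the USGS case, here is a minimal sketch of
building a methodStep by hand in Python.  The element layout
(description/para, then a dataSource with title, creator, distribution
and contact) is my assumption about what an EML methodStep looks like,
modeled loosely on the kind of stub the provenance service returns; the
function name, title and description text are made up for the example,
and the result should be validated against the EML schema before use.

```python
import xml.etree.ElementTree as ET


def build_method_step(description, title, organization, url):
    """Build an EML-style methodStep element for an external data source.

    The element names and their order are an assumption about the EML
    schema; validate the output before adding it to real metadata.
    """
    step = ET.Element("methodStep")

    # Free-text description of how the external data were used.
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = description

    # The external source itself: title, creator, where to find it, contact.
    source = ET.SubElement(step, "dataSource")
    ET.SubElement(source, "title").text = title
    creator = ET.SubElement(source, "creator")
    ET.SubElement(creator, "organizationName").text = organization
    distribution = ET.SubElement(source, "distribution")
    online = ET.SubElement(distribution, "online")
    ET.SubElement(online, "url", {"function": "information"}).text = url
    contact = ET.SubElement(source, "contact")
    ET.SubElement(contact, "organizationName").text = organization

    return step


# Hypothetical values; only the URL comes from the thread.
step = build_method_step(
    description="Stream flow from the USGS gauge was used to calculate daily loads.",
    title="USGS stream flow data for site 01589197",
    organization="U.S. Geological Survey",
    url="https://waterdata.usgs.gov/usa/nwis/uv?01589197",
)
print(ET.tostring(step, encoding="unicode"))
```

The printed XML can then be pasted into the "methods" section of the
derived product's EML alongside any stubs from the provenance service.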

Hope that helps!

  -John Porter


On 9/20/2017 11:11 AM, Jonathan Walsh wrote:
> Hi IMs!
>
> I have a question on how to best represent data that your study uses,
> but is provided by others.  If you use such data, we could use your
> insights on how you make that portion of your data available to the
> community and the LTER.
>
> Baltimore Ecosystem Study gets its stream flow data from the USGS.  We
> in turn use this flow data to calculate our daily loads and other
> results that we track.  The USGS data are kept on their website and we
> incorporate them into our work.
>
> For the purposes of making our data available to the larger community
> (LTER, PASTA, DataONE, etc.), we have historically just pointed to the
> USGS data on the USGS site (example:
> https://waterdata.usgs.gov/usa/nwis/uv?01589197) as opposed to
> collecting our own copy and providing it ourselves.
>
> The above precludes us from providing a direct link to the data, such
> as one that would qualify as PASTA "type 1".
>
> If you have any suggestions as to how, if differently, we should
> represent these data, which are integral to our work, but not provided
> by us, I would very much appreciate hearing them.
>
> Thank you!
>
> Jonathan
>
> Jonathan Walsh
> orcid.org/0000-0002-0658-0814
> Information Manager, Baltimore Ecosystem Study
> Cary Institute of Ecosystem Studies
> Box AB; Route 44A
> Millbrook, NY 12545-0129
> P: 845/677/7600 Extension 103
> F: 845/677/5976
> E: WalshJ at caryinstitute.org

-- 
John H. Porter
Dept. of Environmental Sciences
University of Virginia
291 McCormick Road
PO Box 400123
Charlottesville, VA 22904-4123
ORCID: http://orcid.org/0000-0003-3118-5784
