[LTER-im] Representing datasets from other parties when they are integral to your work
Jonathan Walsh
walshjcaryinstitute at gmail.com
Thu Sep 21 08:30:05 PDT 2017
John,
I like the idea of using a methodStep, and no matter what else I do or
don't do, I'm definitely doing that.
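
For the USGS case, where there is no provenance stub to copy, a hand-built
methodStep could be generated with something like the sketch below. It is
only an illustration: the function name, the title and description strings,
and the choice of child elements (title, creator, distribution, contact) are
assumptions, and the output should be checked against the EML schema and a
real stub from the provenance service before use.

```python
import xml.etree.ElementTree as ET


def method_step_xml(title, organization, url, description):
    """Build an EML-style methodStep citing an external data source.

    Hypothetical helper: the element layout is an assumption modeled on
    EML's methodStep/dataSource structure, not a validated template.
    """
    step = ET.Element("methodStep")

    # Free-text description of how the external data were used.
    para = ET.SubElement(ET.SubElement(step, "description"), "para")
    para.text = description

    # dataSource describes the external dataset itself.
    source = ET.SubElement(step, "dataSource")
    ET.SubElement(source, "title").text = title
    creator = ET.SubElement(source, "creator")
    ET.SubElement(creator, "organizationName").text = organization
    online = ET.SubElement(ET.SubElement(source, "distribution"), "online")
    ET.SubElement(online, "url").text = url
    contact = ET.SubElement(source, "contact")
    ET.SubElement(contact, "organizationName").text = organization

    return ET.tostring(step, encoding="unicode")


xml_text = method_step_xml(
    title="USGS stream flow data, site 01589197",  # placeholder title
    organization="U.S. Geological Survey",
    url="https://waterdata.usgs.gov/usa/nwis/uv?01589197",
    description="Stream flow from USGS was used to calculate daily loads.",
)
print(xml_text)
```

The printed fragment would then be pasted into the methods section of the
derived product's EML, next to any stubs taken from the provenance service.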
Jonathan
On Wed, Sep 20, 2017 at 3:37 PM, Porter, John Henderson (jhp7e) <jhp7e at eservices.virginia.edu> wrote:
> Interesting distinction.....
>
> We have our local catalog, the LTER Data Portal/EDI, to which we send our
> metadata (and PASTA then fetches the data). DataONE gets the metadata from
> PASTA, then links back to PASTA for the data itself.
>
> It looks as if NSF will be increasingly emphasizing having data on
> EDI/PASTA, and deemphasizing the local catalog. And as we showed at ESIP,
> you can also implement a local view of the metadata/data in PASTA as a
> local catalog.
>
> Back to the original subject, we generally try to avoid serving data that
> is also served by others because we want to avoid the headaches of keeping
> it updated, and the potential confusion of users over who to cite.
>
> -JP
>
>
>
> On Sep 20, 2017 1:46 PM, Jonathan Walsh <walshjcaryinstitute at gmail.com>
> wrote:
>
> Thank you for replying. One clarification:
> >>>>For LTER data, of course, we are the "best source" so we document
> that data and share it via PASTA, DataONE etc.
> Does that mean you submit a copy of that data to PASTA, DataONE, etc, or
> just some documentation of what data it is and how to find it?
>
> Thanks!
>
>
> On Wed, Sep 20, 2017 at 1:12 PM, John Porter <jhp7e at eservices.virginia.edu> wrote:
>
> Jonathan,
>
> That sounds similar to our approach.
>
> Generally speaking there is a "best source" for data - the place where
> updated data is reliably available - and in the case of USGS that is their
> web site, so we just point people there. For LTER data, of course, we are
> the "best source" so we document that data and share it via PASTA, DataONE
> etc.
>
> There are a few cases where we do maintain "local" copies of external
> data, either because the availability at the source is unreliable (not the
> case for USGS), the data volume is best dealt with on a local network
> (e.g., large LiDAR datasets), the data is really hard to locate and
> extract, or where there is a reason that you want to maintain a particular
> static version of the data (e.g., for a series of analyses conducted over a
> period of time where you don't want the underlying data to change).
> However, we DON'T include them in the data catalog.
>
> Don't forget that for your derived products you can include references to
> the USGS source in a "methodStep" in EML. For source data already
> documented in EML and stored in PASTA, you can use the "provenance" web
> service (e.g.,
> https://pasta.lternet.edu/package/provenance/eml/knb-lter-bes/332/580),
> which will produce a methodStep "stub" that can easily be included in your
> metadata. For USGS you'll need to generate your own methodStep - but you
> can use the example produced by the provenance web service to do that.
>
> Hope that helps!
>
> -John Porter
>
> On 9/20/2017 11:11 AM, Jonathan Walsh wrote:
>
> Hi IMs!
>
> I have a question on how to best represent data that your study uses, but
> is provided by others. If you use such data, we could use your insights on
> how you make that portion of your data available to the community and the
> LTER.
>
> Baltimore Ecosystem Study gets its stream flow data from the USGS. We in
> turn use this flow data to calculate our daily loads and other results that
> we track. The USGS data are kept on their website and we incorporate them
> into our work.
>
> For the purposes of making our data available to the larger community
> (LTER, PASTA, DataONE, etc.), we have historically just pointed to the USGS
> data on the USGS site (example:
> https://waterdata.usgs.gov/usa/nwis/uv?01589197) as opposed to collecting
> our own copy and providing it ourselves.
>
> The above precludes us from providing a direct link to the data, such as
> one that would be PASTA "type 1".
>
> If you have any suggestions as to how, if differently, we should represent
> these data, which are integral to our work, but not provided by us, I would
> very much appreciate hearing them.
>
> Thank you!
>
> Jonathan
>
> -
> Jonathan Walsh
> orcid.org/0000-0002-0658-0814
> Information Manager, Baltimore Ecosystem Study
> Cary Institute of Ecosystem Studies
> Box AB; Route 44A
> Millbrook, NY 12545-0129
> P: 845/677/7600 Extension 103
> F: 845/677/5976
> E: WalshJ at caryinstitute.org
>
>
> _______________________________________________
> Long Term Ecological Research Network
> im mailing list
> im at lternet.edu
>
>
> --
> John H. Porter
> Dept. of Environmental Sciences
> University of Virginia
> 291 McCormick Road
> PO Box 400123
> Charlottesville, VA 22904-4123
> ORCID: http://orcid.org/0000-0003-3118-5784
>
>
>
>