[LTER-im] Representing datasets from other parties when they are integral to your work

Margaret O'Brien margaret.obrien at ucsb.edu
Wed Sep 20 11:14:32 PDT 2017


Hi Jonathan -

I see several ways to answer this question. Ideally, there is a way to
handle different sources of data without complicating your IM system too
much. And I hope this answer is not too complicated, either.

It's pretty simple to link to other sources. SBC LTER does it on the main
data page (http://sbc.lternet.edu/data/). But some of these we use for our
LTER research, so they have been turned into datasets with EML, too.  At
SBC LTER, we call this "exogenous data", or "Type 0". And so our
re-packaging turns it into "Type 1". But we do try to make it clear where
it came from.

Particularly in the watersheds, there are a number of these, including from
the USGS. I should add though, that we don't get the USGS data that we use
from their website, we get their highest resolution data thru back-channels
at the end of the water year after it's been QC'd.  In total, SBC LTER has
30-40 datasets like this, including both precipitation and stream flow.

For stream flow and precip, we use these rules for creating datasets from
exogenous data. Keep in mind that the most visible parts of a citation are
the creators and title, so these are important.
1. the data are reformatted to match our own (so they can be used together
- now they are Type 1). Since stream flow is calc'd from stage height, and
we do that part here, we do call it "ours"
2. creator is the primary PI conducting that part of the project
3. the dataset title names the source (if not us), and has both the
original station and our station id in it.
4. the abstract also names the source
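
As a rough sketch of rules 2-4, here is how the creator, title and abstract
could be assembled in Python. This is not SBC's actual code: the class, the
function, and all the values (site, agency, station ids, PI) are hypothetical
placeholders, and the title layout is only one way to satisfy rule 3.

```python
from dataclasses import dataclass


@dataclass
class ExogenousStation:
    """Hypothetical record for a station operated by another organization."""
    source_org: str          # who collected the data (not us)
    source_station_id: str   # station id as the source org names it
    local_station_id: str    # station id used in our own system
    variable: str            # e.g. "stream discharge"


def type1_fields(station, site_name, pi_name):
    """Return creator, title and abstract for a reformatted (Type 1) dataset."""
    # Rule 3: the title names the source and carries both station ids.
    title = (
        f"{site_name}: {station.variable}, station {station.local_station_id} "
        f"({station.source_org} station {station.source_station_id})"
    )
    # Rule 4: the abstract names the source as well.
    abstract = (
        f"Data collected by {station.source_org} at station "
        f"{station.source_station_id}, reformatted by {site_name} to match "
        f"its own {station.variable} datasets."
    )
    # Rule 2: the creator is the primary PI for that part of the project.
    return {"creator": pi_name, "title": title, "abstract": abstract}


# Placeholder values only; none of these are real stations or people.
example = type1_fields(
    ExogenousStation("Example Agency", "SRC-0000", "LOC00", "stream discharge"),
    site_name="Example Site",
    pi_name="Example PI",
)
print(example["title"])
```

Building the title from the same record that holds both station ids keeps the
source visible in the citation without anyone having to remember to type it.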

Here are some examples:
our data from a USGS stream gauge:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3018

A similar dataset from one of our own height-gauges:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3007

precipitation data collected by the county:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.5012

We have other exogenous data that we do nothing to at all -- all we do is
repost it. These are mostly for reference, and we make no claim to the data
(unless we treated it somehow, like aggregating).
For those, the rules are:
1. data are in original format
2. creator is original org (not us)
3. dataset title is as close as we can get to what it was when
received or downloaded. It may take a phone call to get it right.
4. contact - add one for the org, too.
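
A minimal sketch of the repost rules, again in Python with hypothetical names
and placeholder values (not our production code). The point is that nothing
about the original is rewritten; we only add ourselves as a second contact.

```python
def repost_fields(original_title, source_org, source_contact, local_contact):
    """Return citation fields for data reposted without any changes."""
    return {
        # Rule 3: keep the title as received or downloaded.
        "title": original_title,
        # Rule 2: the creator is the original organization, not us.
        "creator": source_org,
        # Rule 4: list a contact for the source org alongside our own.
        "contacts": [local_contact, source_contact],
        # Rule 1: the data stay in their original format.
        "reformatted": False,
    }


# Placeholder values only.
reposted = repost_fields(
    original_title="Title exactly as received",
    source_org="Example Agency",
    source_contact="Example Agency data desk",
    local_contact="Example Site information manager",
)
print(reposted["creator"])
```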

e.g., here is some KML data that describes the perimeter of a recent fire:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.70



Another project I work with is a biodiversity observation network (BON).
Most of the data they work with are exogenous, and a lot of what the BON
does is to curate that data so they can use it in integrated research, and
create datasets along the way. They have adopted the LTER data management
protocols, including the whole EML > EDI > DataONE pathway.

So we highlight the packaging guidelines for different types (de novo,
curated exogenous, integration products)
For that, see "Data Packaging", here: http://sbc.marinebon.org/data/overview/
These are very similar to what we do with LTER, but the process is more
formalized.


One more note:
EML has an area for describing "source data", under methods. Use it if you
can. There is a node called dataSource, that works for holding info about
another PASTA dataset. For URLs that are non-pasta, the only current option
is to use the text fields.
Here is an example for one of the biodiversity datasets that integrates
data from 4 projects (open up Metadata > Methods):
https://portal.edirepository.org/nis/metadataviewer?packageid=edi.5.2
Yes, the LTER dataset is in PASTA, but our code was not sophisticated
enough to get the id inserted into edi.5's /eml//dataSource node. But since
both the LTER and BON data are time-series, we'll get that in on the next
update.
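
To make the two cases concrete, here is a small Python sketch that builds a
methodStep either with a dataSource child or with the source named only in
the description text. It is an illustration, not the code we run: the
function name, the dict keys, and all titles and URLs are placeholders I made
up, the choice of child elements under dataSource is my assumption, and the
output has not been validated against the EML schema.

```python
import xml.etree.ElementTree as ET


def method_step(description, source=None):
    """Build a methodStep element; `source` optionally describes a source dataset."""
    step = ET.Element("methodStep")
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = description
    if source is not None:
        # Structured case: describe the source dataset in a dataSource node.
        ds = ET.SubElement(step, "dataSource")
        ET.SubElement(ds, "title").text = source["title"]
        creator = ET.SubElement(ds, "creator")
        ET.SubElement(creator, "organizationName").text = source["org"]
        dist = ET.SubElement(ds, "distribution")
        online = ET.SubElement(dist, "online")
        ET.SubElement(online, "url").text = source["url"]
        contact = ET.SubElement(ds, "contact")
        ET.SubElement(contact, "organizationName").text = source["org"]
    return step


# Source dataset held in a repository (placeholder values).
structured = method_step(
    "Integrated with the source dataset described in dataSource.",
    source={
        "title": "Placeholder source dataset title",
        "org": "Placeholder Organization",
        "url": "https://example.org/placeholder-package",
    },
)

# Source known only by URL: the text fields carry the reference.
text_only = method_step(
    "Source data downloaded from https://example.org/placeholder-page "
    "(Placeholder Organization)."
)

print(ET.tostring(structured, encoding="unicode"))
```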


Margaret


Margaret O'Brien
ORCID: 0000-0002-1693-8322
Information Management
Marine Science Institute, UCSB
Santa Barbara, CA 93106
805-893-2071 (voice)
http://environmentaldatainitiative.org
http://sbc.marinebon.org
http://sbc.lternet.edu



On Wed, Sep 20, 2017 at 8:11 AM, Jonathan Walsh <
walshjcaryinstitute at gmail.com> wrote:

> Hi IMs!
>
> I have a question on how to best represent data that your study uses, but
> is provided by others.  If you use such data, we could use your insights on
> how you make that portion of your data available to the community and the
> LTER.
>
> Baltimore Ecosystem Study gets its stream flow data from the USGS.  We in
> turn use this flow data to calculate our daily loads and other results that
> we track.  The USGS data are kept on their website and we incorporate them
> into our work.
>
> For the purposes of making our data available to the larger community,
> (LTER, PASTA, DataONE, etc.)  we have historically just pointed to the USGS
> data on the USGS site.  (example:  https://waterdata.usgs.gov/usa/nwis/uv?01589197 )
> as opposed to collecting our own copy and providing
> it ourselves.
>
> The above precludes us from providing a direct link to the data such as
> that that would be PASTA "type 1".
>
> If you have any suggestions as to how, if differently, we should represent
> these data, which are integral to our work, but not provided by us, I would
> very much appreciate hearing them.
>
> Thank you!
>
> Jonathan
>
> -
> Jonathan Walsh
> orcid.org/0000-0002-0658-0814
> Information Manager, Baltimore Ecosystem Study
> Cary Institute of Ecosystem Studies
> Box AB; Route 44A
> Millbrook, NY 12545-0129
> P: 845/677/7600 Extension 103
> F: 845/677/5976
> E: WalshJ at caryinstitute.org
>
> _______________________________________________
> Long Term Ecological Research Network
> im mailing list
> im at lternet.edu
>
>
>