[LTER-im] Representing datasets from other parties when they are integral to your work

Jonathan Walsh walshjcaryinstitute at gmail.com
Fri Sep 29 09:39:24 PDT 2017


Thank you for your thoughtful reply.  Sorry for the late response.  As you
know we have a lot going on right now.  I like your description of your own
streamflow and precipitation data method.  Perhaps we could do it similarly.

   1. Be sure to especially cite creator and title
   2. Create a BES dataset from the USGS data, derived into the form we
   used.
   3. Title it accordingly and include it.
   4. If we do not derive it at all, then we just create a dataset of the
   values we used, cite it as above, and include it.
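
As a rough illustration of the steps above, here is a minimal Python sketch
of an EML-style methodStep that credits the original source. It is only a
sketch under assumptions: the element names follow my understanding of EML
2.1 and have not been validated against the schema, build_method_step is a
hypothetical helper, and the source title is a placeholder. Following
Margaret's note about non-PASTA URLs, the USGS URL goes in the description
text.

```python
import xml.etree.ElementTree as ET


def build_method_step(description, source_title, source_org):
    """Return an EML-style <methodStep> crediting an external data source.

    Hypothetical helper; element names are assumed from EML 2.1.
    """
    step = ET.Element("methodStep")
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = description

    # dataSource carries the citation-like details: title, creator, contact.
    source = ET.SubElement(step, "dataSource")
    ET.SubElement(source, "title").text = source_title
    creator = ET.SubElement(source, "creator")
    ET.SubElement(creator, "organizationName").text = source_org
    contact = ET.SubElement(source, "contact")
    ET.SubElement(contact, "organizationName").text = source_org
    return step


step = build_method_step(
    description=(
        "Daily loads were calculated from USGS stream flow, obtained from "
        "https://waterdata.usgs.gov/usa/nwis/uv?01589197"
    ),
    source_title="USGS stream flow, station 01589197",  # placeholder title
    source_org="U.S. Geological Survey",
)
print(ET.tostring(step, encoding="unicode"))
```

The printed fragment would still need to be placed under the methods node of
a full EML document and checked against the schema before use.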

Also thanks for the MBON flow chart.  Very nice.



On Wed, Sep 20, 2017 at 2:14 PM, Margaret O'Brien <margaret.obrien at ucsb.edu>
wrote:

> Hi Jonathan -
>
> I see several ways to answer this question. Ideally, there is a way to
> handle different sources of data without complicating your IM system too
> much. And I hope this answer is not too complicated, either.
>
> It's pretty simple to link to other sources. SBC LTER does it on the main
> data page (http://sbc.lternet.edu/data/). But some of these we use for
> our LTER research, so they have been turned into datasets with EML, too.
> At SBC LTER, we call this "exogenous data", or "Type 0". And so our
> re-packaging turns it into "Type I". But we do try to make it clear where
> it came from.
>
> Particularly in the watersheds, there are a number of these, including
> from the USGS. I should add though, that we don't get the USGS data that we
> use from their website, we get their highest resolution data thru
> back-channels at the end of the water year after it's been QC'd.  In total,
> SBC LTER has 30-40 datasets like this, including both precipitation and
> stream flow.
>
> For stream flow and precip, we use these rules for creating datasets from
> exogenous data. Keep in mind that the most visible parts of a citation are
> the creators and title, so these are important.
> 1. the data are reformatted to match our own (so they can be used together
> - now they are Type 1). Since stream flow is calc'd from stage height, and
> we do that part here, we do call it "ours"
> 2. creator is the primary PI conducting that part of the project
> 3. the dataset title names the source (if not us), and has both the
> original station and our station id in it.
> 4. the abstract also names the source
>
> Here are some examples:
> our data from a USGS stream gauge:
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3018
>
> A similar dataset from one of our own height-gauges:
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3007
>
> precipitation data collected by the county:
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.5012
>
> We have other exogenous data where we did nothing to it at all -- all we
> do is repost. These are mostly for reference, and for those, we make no
> claim to this data (unless we treated it somehow, like aggregating).
> So those rules:
> 1. data are in original format
> 2. creator is original org (not us)
> 3. dataset title is as close as we can get to what it was when
> received or downloaded. It may take a phone call to get it right.
> 4. contact - add one for the org, too.
>
> e.g., here is some KML data, that describes the perimeter of a recent
> fire.
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.70
>
>
>
> Another project I work with is a biodiversity observation network (BON).
> Most of the data they work with are exogenous, and a lot of what the BON
> does is to curate that data so they can use it in integrated research, and
> create datasets along the way. They have adopted the LTER data management
> protocols, including the whole EML > EDI > DataONE pathway.
>
> So we highlight the packaging guidelines for different types (de novo,
> curated exogenous, integration products)
> For that, see "Data Packaging", here: http://sbc.marinebon.org/data/overview/
> These are very similar to what we do with LTER, but the process is more
> formalized.
>
>
> One more note:
> EML has an area for describing "source data", under methods. Use it if you
> can. There is a node called dataSource, that works for holding info about
> another PASTA dataset. For URLs that are non-pasta, the only current option
> is to use the text fields.
> Here is an example for one of the biodiversity datasets that integrates
> data from 4 projects (open up Metadata > Methods):
> https://portal.edirepository.org/nis/metadataviewer?packageid=edi.5.2
> Yes, the LTER dataset is in PASTA, but our code was not sophisticated
> enough to get the id inserted into edi.5's /eml//dataSource node. But since
> both the LTER and BON data are time-series, we'll get that in on the next
> update.
>
>
> Margaret
>
>
> Margaret O'Brien
> ORCID: 0000-0002-1693-8322
> Information Management
> Marine Science Institute, UCSB
> Santa Barbara, CA 93106
> 805-893-2071 (voice)
> http://environmentaldatainitiative.org
> http://sbc.marinebon.org
> http://sbc.lternet.edu
>
>
>
> On Wed, Sep 20, 2017 at 8:11 AM, Jonathan Walsh <
> walshjcaryinstitute at gmail.com> wrote:
>
>> Hi IMs!
>>
>> I have a question on how to best represent data that your study uses, but
>> is provided by others.  If you use such data, we could use your insights on
>> how you make that portion of your data available to the community and the
>> LTER.
>>
>> Baltimore Ecosystem Study gets its stream flow data from the USGS.  We in
>> turn use this flow data to calculate our daily loads and other results that
>> we track.  The USGS data are kept on their website and we incorporate them
>> into our work.
>>
>> For the purposes of making our data available to the larger community,
>> (LTER, PASTA, DataONE, etc.)  we have historically just pointed to the USGS
>> data on the USGS site (example:
>> https://waterdata.usgs.gov/usa/nwis/uv?01589197 ) as opposed to collecting
>> our own copy and providing it ourselves.
>>
>> The above precludes us from providing a direct link to the data, such as
>> one that would be PASTA "type 1".
>>
>> If you have any suggestions as to how, if differently, we should
>> represent these data, which are integral to our work, but not provided by
>> us, I would very much appreciate hearing them.
>>
>> Thank you!
>>
>> Jonathan
>>
>> -
>> Jonathan Walsh
>> orcid.org/0000-0002-0658-0814
>> Information Manager, Baltimore Ecosystem Study
>> Cary Institute of Ecosystem Studies
>> Box AB; Route 44A
>> Millbrook, NY 12545-0129
>> P: 845/677/7600 Extension 103
>> F: 845/677/5976
>> E: WalshJ at caryinstitute.org
>>
>> _______________________________________________
>> Long Term Ecological Research Network
>> im mailing list
>> im at lternet.edu
>>
>>
>>
>


-- 
Jonathan Walsh
orcid.org/0000-0002-0658-0814
Information Manager, Baltimore Ecosystem Study
Cary Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at caryinstitute.org