[LTER-im] Representing datasets from other parties when they are integral to your work

Jonathan Walsh walshjcaryinstitute at gmail.com
Wed Oct 11 13:01:12 PDT 2017


Don,

Your approach makes a lot of sense to me.  I appreciate the information.
In the case of BES this would be, as you say, a good use of the methodStep
node in EML.

Thanks!

On Wed, Sep 20, 2017 at 3:52 PM, Henshaw, Donald <
don.henshaw at oregonstate.edu> wrote:

> Hi,
>
> The USGS collects our streamflow for Lookout Creek which is the highest
> order stream in the Andrews. We do repackage the USGS data and include this
> in the same data set with the same structure as all of the other nine small
> watersheds where we collect streamflow. We do this for several reasons:
>
> · As a convenience for our PIs, the AND LTER provides both the high
> temporal resolution and daily versions of the USGS streamflow data in the
> same format as all of our other watersheds.
>
> · Maintaining the USGS streamflow in AND LTER formats allows our
> applications for summarizing streamflow, and for preparing streamflow data
> for use with our stream chemistry samples, to work for these USGS data.
>
> · The high temporal resolution versions are only available for the past 15
> years or so through USGS. The AND LTER has recreated an hourly data set
> after historic reconstruction from charts, punch tapes and printouts going
> back to 1949, so we offer the only complete high temporal resolution
> version of this data.  The USGS never bothered to Q/C any of the high
> temporal resolution data until more recent years.
>
> · The USFS maintained this gage for eight years in the 1950s and 60s, and
> the USGS does not maintain these records. Our daily record is the same as
> the USGS daily record beginning in 1949 except we include these additional
> 8 years, so we offer the most complete and “best source” for the daily
> record as well.
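
As an illustration of the last point, here is a minimal Python sketch of combining daily records from two agencies into one continuous record that keeps track of where each value came from. It is not the AND LTER procedure, which the message does not describe. The function name, the priority rule, and all dates and flow values are made-up placeholders.

```python
from datetime import date

def merge_daily_records(records_by_source, priority):
    """Merge {source: {day: flow}} records into one {day: (flow, source)} record.

    `priority` lists the sources from most to least preferred; when two
    sources cover the same day, the more preferred one is kept.
    """
    merged = {}
    # Walk from least to most preferred so preferred values overwrite others.
    for source in reversed(priority):
        for day, flow in records_by_source[source].items():
            merged[day] = (flow, source)
    return dict(sorted(merged.items()))

# Placeholder values only; these are not real streamflow measurements.
records = {
    "USGS": {date(2000, 1, 1): 1.0, date(2000, 1, 3): 3.0},
    "USFS": {date(2000, 1, 2): 2.0, date(2000, 1, 3): 9.9},
}
merged = merge_daily_records(records, priority=["USGS", "USFS"])
for day, (flow, source) in merged.items():
    print(day.isoformat(), flow, source)
```

Carrying the source label alongside each value is one way to keep the provenance visible after the records are combined.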
>
> We do reference the USGS contribution in the data abstract and methods,
> but including specific references to the USGS source in a method step in
> EML sounds very appropriate.
>
>
>
> On the other hand we have never done anything with our National
> Atmospheric Deposition Program (NADP) data, although we do collect similar
> precipitation chemistry data at AND LTER. This NADP data is well maintained
> on their website beginning in 1980. However, given that Andrews personnel
> have spent time every week since 1980 sending information and samples to
> NADP, we should probably make this data more obviously available.
>
>
>
> Don
>
>
>
> *From:* im [mailto:im-bounces at lists.lternet.edu] *On Behalf Of *Margaret
> O'Brien
> *Sent:* Wednesday, September 20, 2017 11:15 AM
> *To:* Jonathan Walsh
> *Cc:* Emma Rosi; IM committee
> *Subject:* Re: [LTER-im] Representing datasets from other parties when
> they are integral to your work
>
>
>
> Hi Jonathan -
>
>
>
> I see several ways to answer this question. Ideally, there is a way to
> handle different sources of data without complicating your IM system too
> much. And I hope this answer is not too complicated, either.
>
>
>
> It's pretty simple to link to other sources. SBC LTER does it on the main
> data page (http://sbc.lternet.edu/data/). But some of these we use for
> our LTER research, so they have been turned into datasets with EML, too.
> At SBC LTER, we call this "exogenous data", or "Type 0". And so our
> re-packaging turns it into "Type I". But we do try to make it clear where
> it came from.
>
>
>
> Particularly in the watersheds, there are a number of these, including
> from the USGS. I should add though, that we don't get the USGS data that we
> use from their website, we get their highest resolution data thru
> back-channels at the end of the water year after it's been QC'd.  In total,
> SBC LTER has 30-40 datasets like this, including both precipitation and
> stream flow.
>
>
>
> For stream flow and precip, we use these rules for creating datasets from
> exogenous data. Keep in mind that the most visible parts of a citation are
> the creators and title, so these are important.
>
> 1. the data are reformatted to match our own (so they can be used together
> - now they are Type 1). Since stream flow is calc'd from stage height, and
> we do that part here, we do call it "ours"
>
> 2. creator is the primary PI conducting that part of the project
>
> 3. the dataset title names the source (if not us), and has both the
> original station and our station id in it.
>
> 4. the abstract also names the source
>
>
>
> Here are some examples:
>
> our data from a USGS stream gauge:
>
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3018
>
>
>
> A similar dataset from one of our own height-gauges:
>
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.3007
>
>
>
> precipitation data collected by the county:
>
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.5012
>
>
>
> We have other exogenous data where we did nothing to it at all -- all we
> do is repost. These are mostly for reference, and for those, we make no
> claim to this data (unless we treated it somehow, like aggregating).
>
> So those rules:
>
> 1. data are in original format
>
> 2. creator is original org (not us)
>
> 3. dataset title is as close as we can get to what it was when
> received or downloaded. It may take a phone call to get it right.
>
> 4. contact - add one for the org, too.
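
The two rule sets above (repackaged versus reposted exogenous data) can be sketched as a small Python function. This is only an illustration: the function name, the dictionary keys, the title pattern, and every example value are hypothetical and are not taken from SBC LTER's code or metadata.

```python
def package_metadata(kind, source_org, source_title,
                     lead_pi=None, source_station=None, site_station=None):
    """Return creator/title choices for an exogenous dataset.

    kind is "repackaged" (reformatted to match the site's own data) or
    "reposted" (left in its original format).
    """
    if kind == "repackaged":
        # Rule set 1: site format, PI as creator, source and both
        # station ids named in the title, source named in the abstract.
        return {
            "data_format": "site",
            "creator": lead_pi,
            "title": (f"{source_title}, from {source_org} station "
                      f"{source_station} (site station {site_station})"),
            "abstract_names_source": True,
        }
    if kind == "reposted":
        # Rule set 2: original format, original organization as creator,
        # title as received, and a contact added for the organization.
        return {
            "data_format": "original",
            "creator": source_org,
            "title": source_title,
            "contacts": ["site", source_org],
        }
    raise ValueError(f"unknown kind: {kind!r}")

# Placeholder names and ids, for illustration only.
repackaged = package_metadata(
    "repackaged", "Example Agency", "Stream discharge",
    lead_pi="A. Researcher", source_station="SRC-001", site_station="SITE-01",
)
reposted = package_metadata("reposted", "Example Agency", "Fire perimeter")
print(repackaged["title"])
print(reposted["creator"])
```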
>
>
>
> E.g., here is some KML data that describes the perimeter of a recent
> fire.
>
> http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.70
>
> Another project I work with is a biodiversity observation network (BON).
> Most of the data they work with are exogenous, and a lot of what the BON
> does is to curate that data so they can use it in integrated research, and
> create datasets along the way. They have adopted the LTER data management
> protocols, including the whole EML > EDI > DataONE pathway.
>
>
>
> So we highlight the packaging guidelines for different types (de novo,
> curated exogenous, integration products)
>
> For that, see "Data Packaging", here: http://sbc.marinebon.org/data/overview/
>
> These are very similar to what we do with LTER, but the process is more
> formalized.
>
>
>
>
>
> One more note:
>
> EML has an area for describing "source data", under methods. Use it if you
> can. There is a node called dataSource that works for holding info about
> another PASTA dataset. For URLs that are non-PASTA, the only current option
> is to use the text fields.
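
A minimal Python sketch of what such a method step could look like, built with the standard library's xml.etree.ElementTree. It assumes the EML layout methods/methodStep with description and dataSource children, and has not been validated against the EML schema. The function name, the description text, and the source title are placeholders; the URL is the USGS example quoted later in this thread.

```python
import xml.etree.ElementTree as ET

def build_method_step(description, source_title, source_org, source_url):
    """Return a methodStep element that names an external data source."""
    step = ET.Element("methodStep")
    # Free-text description: the fallback suggested above for non-PASTA URLs.
    desc = ET.SubElement(step, "description")
    ET.SubElement(desc, "para").text = f"{description} Source: {source_url}"
    # Structured description of the source dataset.
    source = ET.SubElement(step, "dataSource")
    ET.SubElement(source, "title").text = source_title
    creator = ET.SubElement(source, "creator")
    ET.SubElement(creator, "organizationName").text = source_org
    online = ET.SubElement(ET.SubElement(source, "distribution"), "online")
    ET.SubElement(online, "url").text = source_url
    contact = ET.SubElement(source, "contact")
    ET.SubElement(contact, "organizationName").text = source_org
    return step

step = build_method_step(
    description="Stream discharge was obtained from a USGS gauge.",
    source_title="Placeholder title for the source dataset",
    source_org="U.S. Geological Survey",
    source_url="https://waterdata.usgs.gov/usa/nwis/uv?01589197",
)
print(ET.tostring(step, encoding="unicode"))
```

The resulting element would still need to be placed under the methods node of a complete EML document and checked with a schema validator.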
>
> Here is an example for one of the biodiversity datasets that integrates
> data from 4 projects (open up Metadata > Methods):
>
> https://portal.edirepository.org/nis/metadataviewer?packageid=edi.5.2
>
> Yes, the LTER dataset is in PASTA, but our code was not sophisticated
> enough to get the id inserted into edi.5's /eml//dataSource node. But since
> both the LTER and BON data are time-series, we'll get that in on the next
> update.
>
>
>
>
>
> Margaret
>
>
>
>
> Margaret O'Brien
> ORCID: 0000-0002-1693-8322
> Information Management
> Marine Science Institute, UCSB
> Santa Barbara, CA 93106
> 805-893-2071 (voice)
> http://environmentaldatainitiative.org
> http://sbc.marinebon.org
>
> http://sbc.lternet.edu
>
> On Wed, Sep 20, 2017 at 8:11 AM, Jonathan Walsh <
> walshjcaryinstitute at gmail.com> wrote:
>
> Hi IMs!
>
>
>
> I have a question on how to best represent data that your study uses but
> that are provided by others.  If you use such data, we could use your
> insights on how you make that portion of your data available to the
> community and the LTER.
>
>
>
> Baltimore Ecosystem Study gets its stream flow data from the USGS.  We in
> turn use this flow data to calculate our daily loads and other results that
> we track.  The USGS data are kept on their website and we incorporate them
> into our work.
>
>
>
> For the purposes of making our data available to the larger community
> (LTER, PASTA, DataONE, etc.), we have historically just pointed to the USGS
> data on the USGS site (example:
> https://waterdata.usgs.gov/usa/nwis/uv?01589197 ) as opposed to collecting
> our own copy and providing it ourselves.
>
>
>
> The above precludes us from providing a direct link to the data, such as
> that which would be PASTA "type 1".
>
>
>
> If you have any suggestions as to how, if differently, we should represent
> these data, which are integral to our work, but not provided by us, I would
> very much appreciate hearing them.
>
>
>
> Thank you!
>
>
>
> Jonathan
>
>
>
> -
>
> Jonathan Walsh
>
> orcid.org/0000-0002-0658-0814
> Information Manager, Baltimore Ecosystem Study
> Cary Institute of Ecosystem Studies
> Box AB; Route 44A
> Millbrook, NY 12545-0129
> P: 845/677/7600 Extension 103
> F: 845/677/5976
> E: WalshJ at caryinstitute.org
>
>



-- 
Jonathan Walsh
orcid.org/0000-0002-0658-0814
Information Manager, Baltimore Ecosystem Study
Cary Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at caryinstitute.org