[LTER-im] Removing from PASTA?

Jonathan Walsh walshjcaryinstitute at gmail.com
Mon Feb 29 13:29:21 MST 2016


Matt,

That is very good to hear, both about the features and the offer of help!

Thank you!

Jonathan


On Mon, Feb 29, 2016 at 2:52 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> Hi Jonathan,
>
> Just to chime in from the DataONE side of things: we already support
> several features that are relevant to what you are discussing.  We
> support "obsoletes/obsoletedBy" properties in the system
> metadata, and these provide a direct version chain indicating when one
> identifier represents a newer version replacing an older identifier.  As
> soon as an EML document has been replaced by a newer version, the older
> versions no longer show up in DataONE search results.  But they are still
> accessible if someone knows the identifier directly (e.g., via a
> citation).  If someone accesses an older version directly, the top of the
> page prominently indicates that a newer version is available (for example,
> see https://search.dataone.org/#view/knb-lter-bnz.69.13).
>
> In addition, for more complex rearrangements of data (for example, when
> several data packages get merged into one), we provide a mechanism for
> indicating that the new data set was derived from the multiple earlier data
> sets (using the prov:wasDerivedFrom property).  This goes into the data
> package description.  These complex derivation relationships now show up on
> the DataONE web site, showing the provenance relationships among objects
> directly.  When creating these newly derived products, if you don't want
> the old packages to also show up in searches, you can mark the old
> packages as 'archived'. When you do that, only the new packages show up in
> search results, and the old packages are listed as the sources of the new
> package.
>
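The visibility rules Matt describes can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the DataONE API: `Record`, `Catalog`, and the identifiers are hypothetical, and `derived_from` is attached to the record only for simplicity, whereas in DataONE the prov:wasDerivedFrom statements go into the data package description. It shows obsoleted or archived records dropping out of search while remaining resolvable by identifier, and how the obsoletes/obsoletedBy chain leads to the newest version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Toy stand-in for a DataONE system metadata entry (hypothetical model)."""
    identifier: str
    obsoletes: Optional[str] = None      # older version this one replaces
    obsoleted_by: Optional[str] = None   # newer version that replaces this one
    archived: bool = False
    # Simplification: DataONE records prov:wasDerivedFrom in the package
    # description, not on the record itself.
    derived_from: List[str] = field(default_factory=list)


class Catalog:
    """Hypothetical catalog illustrating search vs. direct-resolve behavior."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def add(self, rec: Record) -> None:
        self.records[rec.identifier] = rec

    def revise(self, old_id: str, new_id: str) -> Record:
        """Publish new_id as a newer version of old_id, linking both directions."""
        old = self.records[old_id]
        new = Record(new_id, obsoletes=old_id)
        old.obsoleted_by = new_id
        self.add(new)
        return new

    def search(self) -> List[str]:
        """Only current, unarchived records appear in search results."""
        return sorted(
            r.identifier
            for r in self.records.values()
            if r.obsoleted_by is None and not r.archived
        )

    def resolve(self, identifier: str) -> Record:
        """Direct lookup by identifier (e.g. from a citation) still works."""
        return self.records[identifier]

    def newest(self, identifier: str) -> str:
        """Follow the obsoletedBy chain to the latest version."""
        rec = self.records[identifier]
        while rec.obsoleted_by is not None:
            rec = self.records[rec.obsoleted_by]
        return rec.identifier


if __name__ == "__main__":
    cat = Catalog()
    for pid in ["bes.100.1", "bes.101.1"]:  # hypothetical annual packages
        cat.add(Record(pid))
    cat.revise("bes.100.1", "bes.100.2")    # an ordinary version update
    merged = Record("bes.200.1", derived_from=["bes.100.2", "bes.101.1"])
    cat.add(merged)
    for pid in merged.derived_from:
        cat.resolve(pid).archived = True    # hide the sources from search
    print(cat.search())                     # ['bes.200.1']
    print(cat.newest("bes.100.1"))          # bes.100.2
```

In this model archiving only hides the source packages from search; they stay resolvable by identifier, so existing citations keep working.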
> All of these features are available to the PASTA system when it submits
> metadata records to DataONE.  I think it covers a lot of what you are
> requesting in this thread. Happy to help with any followup discussion where
> needed.
>
> Matt
>
>
> On Sun, Feb 28, 2016 at 11:32 AM, Margaret O'Brien <
> margaret.obrien at ucsb.edu> wrote:
>
>> Hey folks -
>> These are issues of "Dataset design best practices", and we do have a
>> working group for this. I think that pretty quickly, a group of 4-5 of us
>> on VTC could iron out some recommendations for this particular question,
>> based on James's email to Jim.
>>
>> Maybe we can schedule the April water cooler for this? My calendar says
>> that the March subject is to continue with the IMC-NIMO relations. If
>> anyone needs to talk over ideas ahead of that, we can schedule something ad
>> hoc.
>>
>> Keep in mind that the removal of redundant datasets in D1 that Linda
>> refers to was about the older datasets that came in from the Metacat node.
>> Some sites did redesign their collections with the move to PASTA, but these
>> are really two different issues.
>>
>> Margaret
>>
>> -----------
>> Margaret O'Brien
>> Information Management
>> Santa Barbara Coastal LTER
>> Marine Science Institute, UCSB
>> Santa Barbara, CA 93106
>> 805-893-2071 (voice)
>> http://sbc.lternet.edu
>>
>> On 2/28/16 10:16 AM, Jonathan Walsh wrote:
>>
>>> James makes a good point that I forgot to consider.  Those stream
>>> temperature DOIs are already a part of the public domain and may well have
>>> been used and cited so it's silly to consider deleting them.
>>>
>>> It would be nice if we could control what PASTA lists for the search
>>> results so that deprecated data sets would still be available but not show
>>> up as a first choice for the simple search.  And then maybe in advanced
>>> search there could be a toggle to display deprecated datasets.  My goal is
>>> to keep our premiere datasets, which are more and more multi dataset
>>> packages, from being buried by our older, not-as-useful datasets to someone
>>> browsing our data.
>>>
>>> I like the idea of including provenance EML for each of the deprecated
>>> packages in my new multi dataset packages to provide a path forward and
>>> backward between the old and new. I would definitely like to do that.  I'm
>>> also working on a multi dataset package for our telephone survey GIS
>>> componentry and it's a similar situation.
>>>
>>> A strategy to help ensure no new copies of deprecated packages get
>>> released into the wild would indeed be a good topic for a call.
>>>
>>> Thanks for a
>>>
>>>
>>>
>>> On Sat, Feb 27, 2016 at 4:52 PM, James Laundre <jlaundre at mbl.edu> wrote:
>>>
>>>     Hi Jonathan,
>>>
>>>     I have emailed Mark and James Brunt about deleting files from the
>>>     LTER Network Data Portal, since we are combining some of our yearly
>>>     files into multiyear files. The email from James is below. I have
>>>     just started the process of deprecating the old data sets and have
>>>     not yet looked into including the provenance EML.
>>>
>>>     One suggestion I have is to put a note in the abstract of each
>>>     deprecated data set that explains and points to the new multiyear
>>>     data set, since the abstract is the part people are most likely to
>>>     read.
>>>
>>>     Cheers,
>>>
>>>     Jim
>>>
>>>
>>>     *From: *"James Brunt" <jbrunt at lternet.edu>
>>>     *To: *jlaundre at mbl.edu
>>>     *Cc: *"Mark Servilla" <servilla at LTERnet.edu>
>>>     *Sent: *Monday, April 13, 2015 4:14:01 PM
>>>     *Subject: *deleting data sets
>>>
>>>
>>>
>>>     Hi Jim -
>>>
>>>     Mark forwarded me your request regarding deleting data packages. If I
>>>     understand correctly, you are creating new data packages that each
>>>     cover a series of years, and you want them to replace the individual
>>>     annual data packages.
>>>
>>>     The individual packages that have already been published are
>>>     basically in the public domain: they have been registered with
>>>     DataCite, received a DOI, and been contributed to DataONE and from
>>>     there possibly beyond. These data packages have potentially been used
>>>     and cited in journals, and we have an obligation to make sure the
>>>     original is still available.
>>>
>>>     We can, however, make sure that no new copies of the deprecated data
>>>     packages get released into the wild. This might require a call to
>>>     discuss further, but basically the process would be to update all of
>>>     the impacted data packages with a revision that closes public read
>>>     access in the EML, making them private. I'm assuming that you would
>>>     issue new ID numbers. (If you were planning to update, say, the first
>>>     ID number in each series, this would still work to deprecate all the
>>>     other data package IDs.)
>>>
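As a rough illustration of closing public read access in a revision, the sketch below edits a standalone EML `<access>` block with Python's standard `xml.etree.ElementTree`. It assumes the EML access layout of `<allow>` rules that each hold `<principal>` and `<permission>` elements, with public access granted to the principal `public`; the sample block, its attribute values, and the `close_public_read` helper are hypothetical, and the edited EML would still need to be validated and submitted to PASTA as a new revision.

```python
import xml.etree.ElementTree as ET

# Hypothetical <access> block from a package being deprecated.
# The authSystem value and owner principal are placeholders.
ACCESS_XML = """\
<access authSystem="https://pasta.example.org/authentication" order="allowFirst">
  <allow>
    <principal>uid=EXAMPLE,o=LTER,dc=ecoinformatics,dc=org</principal>
    <permission>all</permission>
  </allow>
  <allow>
    <principal>public</principal>
    <permission>read</permission>
  </allow>
</access>"""


def close_public_read(access: ET.Element) -> int:
    """Drop the 'public' principal from every <allow> rule.

    A rule left with no principals is removed entirely. Rules for other
    principals (such as the site's own account) are kept, so the owner
    can still read and revise the package. Returns the number of
    'public' principals removed.
    """
    removed = 0
    for rule in list(access.findall("allow")):
        for principal in list(rule.findall("principal")):
            if (principal.text or "").strip() == "public":
                rule.remove(principal)
                removed += 1
        if not rule.findall("principal"):
            access.remove(rule)
    return removed


if __name__ == "__main__":
    access = ET.fromstring(ACCESS_XML)
    print(close_public_read(access))  # 1
    print(ET.tostring(access, encoding="unicode"))
```

Removing only the `public` principal, rather than whole rules, keeps any other principals that happen to share a rule with it.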
>>>     If you wanted to make a slightly cleaner and more elegant transition,
>>>     you could include the provenance EML for each of the deprecated
>>>     packages in your new package EML. That way there is a path forward
>>>     and backward between the old and the new.
>>>
>>>     I suspect this isn't what you want to hear, since you were probably
>>>     hoping to create a more compact list of ARC data packages. We had
>>>     hoped to implement a number of finer-grained controls over the
>>>     display like this, but they had to be abandoned when NSF cut our
>>>     funding.
>>>
>>>     All that said, it is still technically possible to delete a data
>>>     package from PASTA, but only through the API, and it wouldn't have
>>>     any effect on records already in DataONE and the wild. The scope and
>>>     identifier are marked as deleted and cannot subsequently be reused.
>>>     We would discourage this because of the public obligation stated
>>>     above, and we have only used it under extreme circumstances.
>>>
>>>     I'm happy to continue this discussion to fine tune your strategy as
>>>     you feel necessary.
>>>
>>>     Regards,
>>>
>>>     James
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>     *From: *"Linda A Powell" <powell at fiu.edu>
>>>     *To: *"Jonathan Walsh" <walshjcaryinstitute at gmail.com>, "IM committee"
>>>     <im at lternet.edu>
>>>     *Sent: *Saturday, February 27, 2016 1:02:36 PM
>>>     *Subject: *Re: [LTER-im] Removing from PASTA?
>>>
>>>
>>>     Hi Jonathan,
>>>
>>>
>>>     A short while ago, the entire IM community except Suzanne (via a
>>>     Doodle poll) wanted all the old Metacat (PASTA) records (which we
>>>     thought had been deleted) hidden from view in DataONE. Suzanne agreed
>>>     with this practice but can't have her files deleted yet, as her site
>>>     is still transferring files into PASTA. Mark was going to write a
>>>     script for DataONE that would hide/remove these old files from
>>>     DataONE users. I don't know where he is in the process, but hopefully
>>>     it will be finished soon.
>>>
>>>
>>>     I've not deleted a file in PASTA yet, so I don't know how well the
>>>     process works. I suspect that removed files might not show up in
>>>     PASTA but may still be pushed to DataONE. Hopefully Mark can speak
>>>     to this.
>>>
>>>
>>>     Best,
>>>
>>>
>>>     Linda
>>>
>>>
>>>     Linda Powell
>>>     Information Manager
>>>     Florida Coastal Everglades LTER Program
>>>     OE 148, Florida International University
>>>     University Park
>>>     Miami, Florida 33199
>>>     Phone (Tallahassee, FL): 850-745-0381
>>>     Phone (Miami, FL): 305-856-0039 or 305-348-6054
>>>     Website: http://fcelter.fiu.edu
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>     *From:* im <im-bounces at lists.lternet.edu> on behalf of Jonathan Walsh
>>>     <walshjcaryinstitute at gmail.com>
>>>     *Sent:* Saturday, February 27, 2016 12:05 PM
>>>     *To:* IM committee
>>>     *Subject:* [LTER-im] Removing from PASTA?
>>>
>>>     I think this topic has come up in the past, but I do not recall the
>>>     outcome, so I apologize if this is a repeat.
>>>
>>>     How can I remove records from PASTA?  I have a whole bunch of
>>>     stream temperature files that I would like to combine.  Then I
>>>     would like to remove the old ones.
>>>
>>>     The reason is that when one browses BES on PASTA, one sees mostly
>>>     stream temperature files, and it's confusing.
>>>
>>>     Thank you
>>>
>>>
>>>
>>>     --
>>>     Information Manager, Baltimore Ecosystem Study
>>>     Institute of Ecosystem Studies
>>>     Box AB; Route 44A
>>>     Millbrook, NY 12545-0129
>>>     P: 845/677/7600 Extension 103
>>>     F: 845/677/5976
>>>     E: WalshJ at EcoStudies.org
>>>
>>>     _______________________________________________
>>>     Long Term Ecological Research Network
>>>     im mailing list
>>>     im at lternet.edu
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>

