[LTER-im] Removing from PASTA?

Matt Jones jones at nceas.ucsb.edu
Mon Feb 29 12:52:58 MST 2016


Hi Jonathan,

Just to chime in here a bit from the DataONE side of things... we already
support several features which are of relevance to what you are
discussing.  We support "obsoletes/obsoletedBy" properties in the system
metadata, and these provide a direct version chain indicating when one
identifier represents a newer version replacing an older identifier.  As
soon as an EML document has been replaced by a newer version, the older
versions no longer show up in DataONE search results.  But they are still
accessible if someone knows the identifier directly (e.g., via a
citation).  If someone accesses an older version directly, the top of the
page prominently indicates that a newer version is available (for example,
see https://search.dataone.org/#view/knb-lter-bnz.69.13).
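
As a rough illustration of how such a version chain can be followed, here is a minimal Python sketch. It is not DataONE code: the in-memory `records` dictionary, the function names, and the `example.1.x` identifiers are all hypothetical, and only the "obsoletes"/"obsoletedBy" property names come from the description above.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for system metadata records; only the
# "obsoletes"/"obsoletedBy" property names come from the message above.
SysMeta = Dict[str, Optional[str]]


def latest_version(pid: str, records: Dict[str, SysMeta]) -> str:
    """Follow obsoletedBy links from pid until reaching the newest identifier."""
    seen = set()
    current = pid
    while True:
        if current in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"cycle in version chain at {current}")
        seen.add(current)
        newer = records.get(current, {}).get("obsoletedBy")
        if newer is None:
            return current
        current = newer


def is_searchable(pid: str, records: Dict[str, SysMeta]) -> bool:
    """Assumed rule: an identifier is listed only if nothing has replaced it."""
    return records[pid].get("obsoletedBy") is None


# Made-up identifiers, for illustration only.
records = {
    "example.1.1": {"obsoletes": None, "obsoletedBy": "example.1.2"},
    "example.1.2": {"obsoletes": "example.1.1", "obsoletedBy": "example.1.3"},
    "example.1.3": {"obsoletes": "example.1.2", "obsoletedBy": None},
}

print(latest_version("example.1.1", records))
print(is_searchable("example.1.1", records))
```

In this sketch an older identifier stays resolvable (it is still a key in `records`), which mirrors the point that cited versions remain accessible even though they drop out of search.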

In addition, for more complex rearrangements of data (for example, when
several data packages get merged into one), we provide a mechanism for
indicating that the new data set was derived from multiple earlier data
sets (using the prov:wasDerivedFrom property).  This goes into the data
package description.  These derivation relationships now show up on
the DataONE web site, which displays the provenance relationships among
objects directly.  When creating these newly derived products, if you don't
want the old packages to also show up in searches, then the old packages can
be marked as 'archived'.  When you do that, only the new packages show up in
search results, and the old packages are listed as the sources of the new
package.
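
To make the merge case concrete, here is a small Python sketch of the behavior described above. It is an assumption-laden model, not the DataONE implementation: the `Package` class, its field names, and the `example.*` identifiers are hypothetical, and only the idea of an 'archived' flag and a prov:wasDerivedFrom link comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Package:
    # Hypothetical record; field names are illustrative, not a DataONE schema.
    pid: str
    archived: bool = False
    # Identifiers this package points to via prov:wasDerivedFrom.
    was_derived_from: List[str] = field(default_factory=list)


def search_results(packages: Dict[str, Package]) -> List[str]:
    """Return the identifiers that would be listed: everything not archived."""
    return sorted(pid for pid, pkg in packages.items() if not pkg.archived)


def sources_of(pid: str, packages: Dict[str, Package]) -> List[str]:
    """Return the earlier packages that a package was derived from."""
    return list(packages[pid].was_derived_from)


# Two made-up annual packages merged into one made-up multiyear package.
packages = {
    "example.10.1": Package("example.10.1", archived=True),
    "example.11.1": Package("example.11.1", archived=True),
    "example.20.1": Package(
        "example.20.1", was_derived_from=["example.10.1", "example.11.1"]
    ),
}

print(search_results(packages))
print(sources_of("example.20.1", packages))
```

The archived packages are filtered out of the search listing but remain in `packages`, so they can still be reached by identifier and still appear as sources of the merged package.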

All of these features are available to the PASTA system when it submits
metadata records to DataONE.  I think they cover a lot of what you are
requesting in this thread. Happy to help with any follow-up discussion where
needed.

Matt


On Sun, Feb 28, 2016 at 11:32 AM, Margaret O'Brien <margaret.obrien at ucsb.edu> wrote:

> Hey folks -
> These are issues of "Dataset design best practices", and we do have a
> working group for this. I think that pretty quickly, a group of 4-5 of us
> on VTC could iron out some recommendations for this particular question,
> based on James's email to Jim.
>
> Maybe we can schedule the April water cooler for this? My calendar says
> that the March subject is to continue with the IMC-NIMO relations. If
> anyone needs to talk over ideas ahead of that, we can schedule something ad
> hoc.
>
> Keep in mind that the removal of redundant datasets in D1 that Linda
> refers to was about the older datasets that came in from the Metacat node.
> Some sites did redesign their collections with the move to pasta, but these
> are really two different issues.
>
> Margaret
>
> -----------
> Margaret O'Brien
> Information Management
> Santa Barbara Coastal LTER
> Marine Science Institute, UCSB
> Santa Barbara, CA 93106
> 805-893-2071 (voice)
> http://sbc.lternet.edu
>
> On 2/28/16 10:16 AM, Jonathan Walsh wrote:
>
>> James makes a good point that I forgot to consider.  Those stream
>> temperature DOIs are already a part of the public domain and may well have
>> been used and cited so it's silly to consider deleting them.
>>
>> It would be nice if we could control what PASTA lists for the search
>> results so that deprecated data sets would still be available but not show
>> up as a first choice for the simple search.  And then maybe in advanced
>> search there could be a toggle to display deprecated datasets.  My goal is
>> to keep our premier datasets, which are more and more multi dataset
>> packages, from being buried by our older, not-as-useful datasets to someone
>> browsing our data.
>>
>> I like the idea of including provenance EML for each of the deprecated
>> packages in my new multi dataset packages to provide a path forward and
>> backward between the old and new. I would definitely like to do that.  I'm
>> also working on a multi dataset package for our telephone survey GIS
>> componentry and it's a similar situation.
>>
>> A strategy to help ensure no new copies of deprecated packages get
>> released into the wild would indeed be a good topic for a call.
>>
>> Thanks for a
>>
>>
>>
>> On Sat, Feb 27, 2016 at 4:52 PM, James Laundre <jlaundre at mbl.edu> wrote:
>>
>>     Hi Jonathan,
>>
>>     I have emailed Mark and James Brunt about deleting files from the
>>     LTER Network Data Portal since we are combining some of our yearly
>>     files into multiyear files. The email from James is below. I have
>>     just started the process of deprecating the old data sets and have
>>     not yet looked into including the provenance EML.
>>
>>     One suggestion I have is to put a note in the abstract of the
>>     deprecated data set that explains the change and points to the new
>>     multiyear data set.  The abstract is the part people are most likely
>>     to read.
>>
>>     Cheers,
>>
>>     Jim
>>
>>
>>     *From: *"James Brunt" <jbrunt at lternet.edu>
>>     *To: *jlaundre at mbl.edu
>>     *Cc: *"Mark Servilla" <servilla at LTERnet.edu>
>>     *Sent: *Monday, April 13, 2015 4:14:01 PM
>>     *Subject: *deleting data sets
>>
>>
>>     -----BEGIN PGP SIGNED MESSAGE-----
>>     Hash: SHA1
>>
>>     Hi Jim -
>>
>>     Mark forwarded me your request regarding deleting data packages. If I
>>     understand correctly, you are creating new data packages that each
>>     cover a series of years, and you want them to replace the individual
>>     annual data packages.
>>
>>     The individual packages that have already been published are basically
>>     in the public domain having been registered with DataCite and received
>>     a DOI, and been contributed to DataONE and from there possibly beyond.
>>     These data packages have potentially been used and cited in journals
>>     and we have an obligation to make sure the original is still
>>     available.
>>
>>     We can however make sure that no new copies of the deprecated data
>>     packages get released into the wild. This might require a call to
>>     discuss further but basically the process would be to update all of
>>     the impacted data packages with a revision that closes public read
>>     access in the EML to make them private. I'm assuming that you would
>>     issue new ID numbers. (If you were planning to update say the first
>>     ID number in each series this would still work to deprecate all the
>>     other data package IDs.)
>>
>>     If you wanted to make a slightly cleaner and more elegant transition
>>     you could include the provenance EML for each of the deprecated
>>     packages in your new package EML. That way there is a path forward and
>>     backward between the old and the new.
>>
>>     I'm sure this probably isn't what you want to hear since you were
>>     probably hoping to create a more compact list of ARC data packages.
>>     There were a number of finer grained tweaks to control the display
>>     like this that we had hoped to implement that had to be abandoned when
>>     NSF cut our funding.
>>
>>     All that said, it is still technically possible to delete a data
>>     package from PASTA but it's only through the API and wouldn't have any
>>     effect on those records already in DataONE and the wild. The Scope and
>>     Identifier are marked as deleted and cannot subsequently be reused. We
>>     would discourage this for the reason of our public obligation stated
>>     above and have only used it under extreme circumstances.
>>
>>     I'm happy to continue this discussion to fine tune your strategy as
>>     you feel necessary.
>>
>>     Regards,
>>
>>     James
>>
>>
>>
>> ------------------------------------------------------------------------
>>     *From: *"Linda A Powell" <powell at fiu.edu>
>>     *To: *"Jonathan Walsh" <walshjcaryinstitute at gmail.com>, "IM committee"
>>     <im at lternet.edu>
>>     *Sent: *Saturday, February 27, 2016 1:02:36 PM
>>     *Subject: *Re: [LTER-im] Removing from PASTA?
>>
>>
>>     Hi Jonathan,
>>
>>
>>     A short while ago the entire IM community with the exception of
>>     Suzanne (via Doodle poll) wanted to have all the old Metacat
>>     (PASTA) records (that we thought were deleted) removed from being
>>     seen in DataONE.  Suzanne agreed with this practice but can't have
>>     her files deleted yet as they are still transferring files into
>>     PASTA. Mark was going to write a script for DataONE that would
>>     hide/remove these old files from the DataONE users. I don't
>>     know where he is in the process, but hopefully it will be finished
>>     soon.
>>
>>
>>     I've not deleted a file in PASTA yet so I don't know how well the
>>     process works.  I suspect that the old, removed files might not
>>     show up in PASTA but may be pushed to DataONE.  Hopefully Mark can
>>     speak to this.
>>
>>
>>     Best,
>>
>>
>>     Linda
>>
>>
>>     Linda Powell
>>     Information Manager
>>     Florida Coastal Everglades LTER Program
>>     OE 148, Florida International University
>>     University Park
>>     Miami, Florida 33199
>>     Phone (Tallahassee, FL): 850-745-0381
>>     Phone (Miami, FL): 305-856-0039 or 305-348-6054
>>     Website: http://fcelter.fiu.edu
>>
>>
>>
>> ------------------------------------------------------------------------
>>     *From:* im <im-bounces at lists.lternet.edu> on behalf of Jonathan Walsh
>>     <walshjcaryinstitute at gmail.com>
>>     *Sent:* Saturday, February 27, 2016 12:05 PM
>>     *To:* IM committee
>>     *Subject:* [LTER-im] Removing from PASTA?
>>     I think this topic has come up in the past but I do not recall the
>>     outcome and if so I apologize for that.
>>
>>     How can I remove records from PASTA?  I have a whole bunch of
>>     stream temperature files that I would like to combine.  Then I
>>     would like to remove the old ones.
>>
>>     The reason for this is when one browses BES on PASTA one sees
>>     mostly stream temperature files and it's confusing.
>>
>>     Thank you
>>
>>
>>
>>     --
>>     Information Manager, Baltimore Ecosystem Study
>>     Institute of Ecosystem Studies
>>     Box AB; Route 44A
>>     Millbrook, NY 12545-0129
>>     P: 845/677/7600 Extension 103
>>     F: 845/677/5976
>>     E: WalshJ at EcoStudies.org
>>
>>     _______________________________________________
>>     Long Term Ecological Research Network
>>     im mailing list
>>     im at lternet.edu
>>
>>
>>
>>
>>
>> --
>> Information Manager, Baltimore Ecosystem Study
>> Institute of Ecosystem Studies
>> Box AB; Route 44A
>> Millbrook, NY 12545-0129
>> P: 845/677/7600 Extension 103
>> F: 845/677/5976
>> E: WalshJ at EcoStudies.org
>>