[LTER-im] Removing from PASTA?
Jonathan Walsh
walshjcaryinstitute at gmail.com
Mon Feb 29 13:29:21 MST 2016
Matt,
That is very good to hear, both the features, and the offer for help!
Thank you!
Jonathan
On Mon, Feb 29, 2016 at 2:52 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:
> Hi Jonathan,
>
> Just to chime in here a bit from the DataONE side of things... we already
> support several features which are of relevance to what you are
> discussing. We support "obsoletes/obsoletedBy" properties in the system
> metadata, and these provide a direct version chain indicating when one
> identifier represents a newer version replacing an older identifier. As
> soon as an EML document has been replaced by a newer version, the older
> versions no longer show up in DataONE search results. But they are still
> accessible if someone knows the identifier directly (e.g., via a
> citation). If someone accesses an older version directly, the top of the
> page prominently indicates that a newer version is available (for example,
> see https://search.dataone.org/#view/knb-lter-bnz.69.13).
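
A minimal sketch of how the version chain described above might be followed, assuming each record is a plain dictionary carrying `obsoletes` and `obsoletedBy` keys. The identifiers and the helper names `latest_version` and `visible_in_search` are hypothetical illustrations, not the DataONE API.

```python
# Hypothetical system-metadata records keyed by identifier.
# Only the obsoletes/obsoletedBy property names come from the thread;
# the identifiers and dictionary layout are assumptions for illustration.
records = {
    "example.1.1": {"obsoletes": None, "obsoletedBy": "example.1.2"},
    "example.1.2": {"obsoletes": "example.1.1", "obsoletedBy": "example.1.3"},
    "example.1.3": {"obsoletes": "example.1.2", "obsoletedBy": None},
}


def latest_version(identifier, records):
    """Follow obsoletedBy links until reaching a record with no successor."""
    seen = set()
    current = identifier
    while records[current]["obsoletedBy"] is not None:
        if current in seen:
            raise ValueError("cycle in version chain at " + current)
        seen.add(current)
        current = records[current]["obsoletedBy"]
    return current


def visible_in_search(records):
    """Return identifiers that have not been replaced by a newer version."""
    return sorted(
        pid for pid, meta in records.items() if meta["obsoletedBy"] is None
    )
```

Under these assumptions, an old identifier taken from a citation still resolves, and `latest_version` gives the identifier a "newer version available" notice would point to.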
>
> In addition, for more complex rearrangements of data (for example, when
> several data packages get merged into one), we provide a mechanism for
> indicating that the new data set was derived from the multiple earlier data
> sets (using the prov:wasDerivedFrom property). This goes into the data
> package description. These complex derivation relationships now show up on
> the DataONE web site, showing the provenance relationships among objects
> directly. When creating these newly derived products, if you don't want
> the old packages to also show up in searches, then the old packages can be
> marked as 'archived'. When you do that, only the new packages show up in
> search results, and the old packages are listed as the source of the new
> package.
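
As a rough illustration of the merge case described above, the sketch below models packages as dictionaries with an `archived` flag and a `wasDerivedFrom` list. The identifiers, the dictionary layout, and the function names are all hypothetical; only the `prov:wasDerivedFrom` property and the 'archived' marking come from the message.

```python
# Hypothetical package descriptions: two yearly packages merged into one.
# Identifiers and structure are assumptions made for this illustration.
packages = {
    "example.10.1": {"archived": True, "wasDerivedFrom": []},
    "example.11.1": {"archived": True, "wasDerivedFrom": []},
    "example.20.1": {
        "archived": False,
        "wasDerivedFrom": ["example.10.1", "example.11.1"],
    },
}


def search_results(packages):
    """Archived packages stay accessible by identifier but are not listed."""
    return sorted(pid for pid, pkg in packages.items() if not pkg["archived"])


def sources_of(pid, packages):
    """Backward path: the older packages a merged package was derived from."""
    return list(packages[pid]["wasDerivedFrom"])


def derived_products(source_pid, packages):
    """Forward path: newer packages that name source_pid as a source."""
    return sorted(
        pid for pid, pkg in packages.items()
        if source_pid in pkg["wasDerivedFrom"]
    )
```

The two lookup functions correspond to the forward and backward paths between old and new packages that the provenance relationships are meant to provide.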
>
> All of these features are available to the PASTA system when it submits
> metadata records to DataONE. I think it covers a lot of what you are
> requesting in this thread. Happy to help with any follow-up discussion where
> needed.
>
> Matt
>
>
> On Sun, Feb 28, 2016 at 11:32 AM, Margaret O'Brien <
> margaret.obrien at ucsb.edu> wrote:
>
>> Hey folks -
>> These are issues of "Dataset design best practices", and we do have a
>> working group for this. I think that pretty quickly, a group of 4-5 of us
>> on VTC could iron out some recommendations for this particular question,
>> based on James's email to Jim.
>>
>> Maybe we can schedule the April water cooler for this? My calendar says
>> that the March subject is to continue with the IMC-NIMO relations. If
>> anyone needs to talk over ideas ahead of that, we can schedule something ad
>> hoc.
>>
>> Keep in mind that the removal of redundant datasets in D1 that Linda
>> refers to was about the older datasets that came in from the Metacat node.
>> Some sites did redesign their collections with the move to PASTA, but these
>> are really two different issues.
>>
>> Margaret
>>
>> -----------
>> Margaret O'Brien
>> Information Management
>> Santa Barbara Coastal LTER
>> Marine Science Institute, UCSB
>> Santa Barbara, CA 93106
>> 805-893-2071 (voice)
>> http://sbc.lternet.edu
>>
>> On 2/28/16 10:16 AM, Jonathan Walsh wrote:
>>
>>> James makes a good point that I forgot to consider. Those stream
>>> temperature DOIs are already a part of the public domain and may well have
>>> been used and cited so it's silly to consider deleting them.
>>>
>>> It would be nice if we could control what PASTA lists for the search
>>> results so that deprecated data sets would still be available but not show
>>> up as a first choice for the simple search. And then maybe in advanced
>>> search there could be a toggle to display deprecated datasets. My goal is
>>> to keep our premier datasets, which are more and more multi-dataset
>>> packages, from being buried by our older, not-as-useful datasets to someone
>>> browsing our data.
>>>
>>> I like the idea of including provenance EML for each of the deprecated
>>> packages in my new multi-dataset packages to provide a path forward and
>>> backward between the old and new. I would definitely like to do that. I'm
>>> also working on a multi-dataset package for our telephone survey GIS
>>> componentry and it's a similar situation.
>>>
>>> A strategy to help ensure no new copies of deprecated packages get
>>> released into the wild would indeed be a good topic for a call.
>>>
>>> Thanks for a
>>>
>>>
>>>
>>> On Sat, Feb 27, 2016 at 4:52 PM, James Laundre <jlaundre at mbl.edu> wrote:
>>>
>>> Hi Jonathan,
>>>
>>> I have emailed Mark and James Brunt about deleting files from the
>>> LTER Network Data Portal since we are combining some of our yearly
>>> files into multiyear files. The email from James is below. I have
>>> just started the process of deprecating the old data sets and have
>>> not yet looked into including the provenance EML.
>>>
>>> One suggestion I have is to put a note in the abstract of the
>>> deprecated data set that explains and points to the new multiyear
>>> data set. The abstract most likely will be read by people.
>>>
>>> Cheers,
>>>
>>> Jim
>>>
>>>
>>> *From: *"James Brunt" <jbrunt at lternet.edu>
>>> *To: *jlaundre at mbl.edu
>>> *Cc: *"Mark Servilla" <servilla at LTERnet.edu>
>>> *Sent: *Monday, April 13, 2015 4:14:01 PM
>>> *Subject: *deleting data sets
>>>
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi Jim -
>>>
>>> Mark forwarded me your request regarding deleting data packages. If I
>>> understand correctly what you are doing is creating new data packages
>>> that cover a series of years that you want to replace the individual
>>> annual data packages.
>>>
>>> The individual packages that have already been published are basically
>>> in the public domain having been registered with DataCite and received
>>> a DOI, and been contributed to DataONE and from there possibly beyond.
>>> These data packages have potentially been used and cited in journals
>>> and we have an obligation to make sure the original is still available.
>>> We can however make sure that no new copies of the deprecated data
>>> packages get released into the wild. This might require a call to
>>> discuss further but basically the process would be to update all of
>>> the impacted data packages with a revision that closes public read
>>> access in the EML to make them private. I'm assuming that you would
>>> issue new ID numbers. (If you were planning to update say the first
>>> ID number in each series this would still work to deprecate all the
>>> other data package IDs.)
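
A minimal sketch of what "closing public read access in the EML" could look like, assuming the access rules follow the common EML layout of `allow` elements containing `principal` and `permission`. The fragment is simplified, the owner principal is a placeholder, and `close_public_read` is a hypothetical helper; a real revision would also need the package's revision number incremented before upload.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical access block; not taken from any real package.
SAMPLE_ACCESS = """
<access authSystem="knb" order="allowFirst" scope="document">
  <allow>
    <principal>uid=EXAMPLE,o=LTER,dc=ecoinformatics,dc=org</principal>
    <permission>all</permission>
  </allow>
  <allow>
    <principal>public</principal>
    <permission>read</permission>
  </allow>
</access>
"""


def close_public_read(access_xml):
    """Return the access block with every allow rule for 'public' removed."""
    root = ET.fromstring(access_xml)
    # Copy the list first so removing children does not disturb iteration.
    for rule in list(root.findall("allow")):
        principals = [(p.text or "").strip() for p in rule.findall("principal")]
        if "public" in principals:
            root.remove(rule)
    return ET.tostring(root, encoding="unicode")


private_access = close_public_read(SAMPLE_ACCESS)
```

The owner's rule is left untouched, so the site can still read and revise the deprecated package after it is made private.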
>>>
>>> If you wanted to make a slightly cleaner and more elegant transition
>>> you could include the provenance EML for each of the deprecated
>>> packages in your new package EML. That way there is a path forward and
>>> backward between the old and the new.
>>>
>>> I'm sure this probably isn't what you want to hear since you were
>>> probably hoping to create a more compact list of ARC data packages.
>>> There were a number of finer grained tweaks to control the display
>>> like this that we had hoped to implement that had to be abandoned when
>>> NSF cut our funding.
>>>
>>> All that said, it is still technically possible to delete a data
>>> package from PASTA but it's only through the API and wouldn't have any
>>> effect on those records already in DataONE and the wild. The Scope and
>>> Identifier are marked as deleted and cannot subsequently be reused. We
>>> would discourage this for the reason of our public obligation stated
>>> above and have only used it under extreme circumstances.
>>>
>>> I'm happy to continue this discussion to fine tune your strategy as
>>> you feel necessary.
>>>
>>> Regards,
>>>
>>> James
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From: *"Linda A Powell" <powell at fiu.edu>
>>> *To: *"Jonathan Walsh" <walshjcaryinstitute at gmail.com>, "IM committee"
>>> <im at lternet.edu>
>>> *Sent: *Saturday, February 27, 2016 1:02:36 PM
>>> *Subject: *Re: [LTER-im] Removing from PASTA?
>>>
>>>
>>> Hi Jonathan,
>>>
>>>
>>> A short while ago the entire IM community with the exception of
>>> Suzanne (via doodle poll) wanted to have all the old metacat
>>> (PASTA) records (that we thought were deleted) removed from being
>>> seen in DataONE. Suzanne agreed with this practice but can't have
>>> her files deleted yet as they are still transferring files into
>>> PASTA. Mark was going to write a script for DataONE that would
>>> hide/remove these old files from the DataONE users and I don't
>>> know where he is in the process but hopefully it will be finished
>>> soon.
>>>
>>>
>>> I've not deleted a file in PASTA yet so I don't know how well the
>>> process works. I suspect that the old, removed, files might not
>>> show up in PASTA but may be pushed to DataONE. Hopefully Mark can
>>> speak to this.
>>>
>>>
>>> Best,
>>>
>>>
>>> Linda
>>>
>>>
>>> Linda Powell
>>> Information Manager
>>> Florida Coastal Everglades LTER Program
>>> OE 148, Florida International University
>>> University Park
>>> Miami, Florida 33199
>>> Phone (Tallahassee, FL): 850-745-0381
>>> Phone (Miami, FL): 305-856-0039 or 305-348-6054
>>> Website: http://fcelter.fiu.edu
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* im <im-bounces at lists.lternet.edu> on behalf of Jonathan Walsh
>>> <walshjcaryinstitute at gmail.com>
>>> *Sent:* Saturday, February 27, 2016 12:05 PM
>>> *To:* IM committee
>>> *Subject:* [LTER-im] Removing from PASTA?
>>> I think this topic has come up in the past but I do not recall the
>>> outcome and if so I apologize for that.
>>>
>>> How can I remove records from PASTA? I have a whole bunch of
>>> stream temperature files that I would like to combine. Then I
>>> would like to remove the old ones.
>>>
>>> The reason for this is when one browses BES on PASTA one sees
>>> mostly stream temperature files and it's confusing.
>>>
>>> Thank you
>>>
>>>
>>>
>>> -- Information Manager, Baltimore Ecosystem Study
>>> Institute of Ecosystem Studies
>>> Box AB; Route 44A
>>> Millbrook, NY 12545-0129
>>> P: 845/677/7600 Extension 103
>>> F: 845/677/5976
>>> E: WalshJ at EcoStudies.org
>>>
>>> _______________________________________________
>>> Long Term Ecological Research Network
>>> im mailing list
>>> im at lternet.edu
>>>
>
>
--
Information Manager, Baltimore Ecosystem Study
Institute of Ecosystem Studies
Box AB; Route 44A
Millbrook, NY 12545-0129
P: 845/677/7600 Extension 103
F: 845/677/5976
E: WalshJ at EcoStudies.org