[LTER-im-rep] Poll Question concerning LTER Metacat Content and whether it should be archived and hidden during DataONE search and discovery
Linda A Powell
powell at fiu.edu
Thu Nov 12 11:16:31 MST 2015
With respect, I’d like to mention that the FCE LTER Program implemented a ‘versioning’ protocol at its inception in 2000, long before the adoption of EML and LTER Best Practices for handling data. The FCE saw the importance of never overwriting data and of allowing users to always download the exact dataset they may have downloaded years earlier. Our NSF review teams always commended us for following this protocol.
One of the reasons we decided to ‘yield’ to the LTER Network practices in 2011 was the fact that a script couldn’t or wouldn’t be written so that only the most current version of the FCE data was displayed on the Metacat Portal (we had such a feature on the FCE Data Resource Page) and the fact that the files weren’t even ‘ordered’ when displayed on that portal. As you can imagine, users had to read a listing of versioned files to find the most recent and in some cases, it appeared in the middle of the list!
I sat down with Mark Servilla and Duane Costa at the Santa Barbara IM meeting in 2011 to discuss us possibly making changes and doing away with our ‘versioned’ data and at NO time was there a discussion about possible issues with DataONE displaying our ‘deleted’ or ‘archived’ data. In fact, I don’t think we had started migrating the LTER data into DataONE at that point in time. Those FCE data that were deleted from Metacat were NOT incorrect in their scientific value but had to be 'deleted' because of the package ID constraints within Metacat (can't reuse IDs).
We (FCE) realize that removing datasets is not a common occurrence, but in this instance what choice did we have in order to move forward? We should have the right to delete or hide datasets from search and discovery in Metacat and/or DataONE, as we (FCE) are the owners of those data.
Florida Coastal Everglades LTER Program
OE 148, Florida International University
Miami, Florida 33199
Phone (Tallahassee, FL): 850-745-0381
Phone (Miami, FL): 305-856-0039 or 305-348-6054
From: im-rep <im-rep-bounces at lists.lternet.edu> on behalf of Margaret O'Brien <margaret.obrien at ucsb.edu>
Sent: Thursday, November 12, 2015 12:44 PM
To: im-rep at lists.lternet.edu; Evelyn Gaiser
Subject: Re: [LTER-im-rep] Poll Question concerning LTER Metacat Content and whether it should be archived and hidden during DataONE search and discovery
Hi Inigo -
Some of your comments indicate a basic lack of understanding of past and
current network systems. I'd like to correct a few of your points here.
Please see comments inline.
On 11/12/15 8:50 AM, Inigo San Gil wrote:
> Deleted (metacat) metadata should not have been exposed to DataONE in
> the first place. I am surprised, even if they call those data
> 'archived'. There are old versions, and then there are deprecated,
> deleted data. Any serious IMS can make that distinction. It is good to
> see metacat is gone, for that, and many other reasons. (it is not
> really gone, though.)
Metacat was perfectly capable of distinguishing between revisions. FCE
had an ad hoc definition of "data set revision" which was inconsistent
with what the rest of the network was doing. Unfortunately, their
practice didn't get aligned with the rest of the network till after the
first DataONE submission. That's why FCE datasets appear the way they do.
> Also, data that is erased at origin (@ FCE, due to whatever reason),
> should be deleted at the public repositories -not sure how-.
I disagree. Data that are in repositories should have undergone
sufficient review to be appropriate for the public before they were
submitted. There exists a manual process for removal (for rare cases).
> Due to shortcomings with the repository systems we have been using,
> this deletion has been a headache. Always.
Removing datasets is not a common occurrence, so should not be a trivial
> At best, we were forced by default to retire our numerical
> identifiers, this has been quite irritating. At worst, unwanted, bad
> data remains public through these repos.
Identifiers, if they were carefully assigned originally and in a robust
system, were not retired. See any SBC dataset in PASTA, and you will see
that the same identifier was used for previous revisions in Metacat. I
know this is true for at least half the sites as well (GCE, MCR, HFR,
VCR, AND, NTL among others).
> There is a silver lining to all this, old versions are just that, old
> versions. Anybody that is able to get the hands on an old version of
> data ought to realize that this is not the most current version.
This is true, and has always been how our systems work (metacat and pasta).
> I would ask DataONE to erase that content if you are concerned.
It will be archived (not erased), so that if someone has a link, it will
> Cheers, Inigo
> On 11/12/2015 8:58 AM, Linda A Powell wrote:
>> Dear Information Managers,
>> As some of you may know, the FCE LTER program discontinued its
>> practice of file versioning where each updated data file would be
>> given a different file name (.v1, .v2, etc.) and a new EML package
>> ID. We initially had 525 data files that got combined into new
>> data files so our FCE data count decreased to 125 data files. I
>> started the EML packaging ID numbers for the newly combined data
>> files at knb-lter-fce.1050 (well beyond the last package ID
>> knb-lter-fce.525) and I personally deleted all the old versioned data
>> from the LTER Metacat. I then added the 125 new files back into the
>> Metacat and PASTA.
>> Unfortunately, those files were never really deleted from Metacat,
>> only archived, and when the Metacat files were harvested into
>> DataONE, *ALL* my files, including those I thought were ‘deleted’ and
>> the existing, were uploaded. Now the FCE has a big mess! The old
>> 'deleted' files are listed but none of the files exist any longer so
>> the links to the data don’t work. I’m sure the DataONE users are
>> frustrated! There may be Metacat files that other IMs have thought
>> were deleted that are also showing up in DataONE.
>> */My question to the LTER IMs is whether the content that existed in
>> the LTER Metacat should be archived and made hidden during search and
>> discovery from the DataONE infrastructure (i.e. ONEMercury, CN API,
>> etc.)? /* I've asked Mark Servilla to help with this issue and we
>> thought it would be simplest and scale economically if he could
>> perform this operation at one time and for all LTER site content as
>> opposed to performing this operation for each site independently. Of
>> course we want input from the IMC before we move forward.
>> *I’ve created a Doodle Poll (http://doodle.com/poll/uw87khqqyhhsiry5)
>> and would appreciate input from EACH of the LTER site IMs as to
>> whether the content that existed in the LTER Metacat should be
>> archived and made hidden during search and discovery from the DataONE
>> infrastructure? Please select ‘Yes’ or ‘No’. *
>> Thank you in advance for your participation!
>> Kindest Regards,
>> Linda Powell
>> Information Manager
>> Florida Coastal Everglades LTER Program
>> OE 148, Florida International University
>> University Park
>> Miami, Florida 33199
>> Phone (Tallahassee, FL): 850-745-0381
>> Phone (Miami, FL): 305-856-0039 or 305-348-6054
>> Website: http://fcelter.fiu.edu
>> Long Term Ecological Research Network
>> im-rep mailing list
>> im-rep at lternet.edu
Santa Barbara Coastal LTER
Marine Science Institute, UCSB
Santa Barbara, CA 93106