[LTER-im] LTER Data Access Policy Request for Comments

Mark Servilla mark.servilla at gmail.com
Fri Apr 14 15:30:04 PDT 2017


Hi Don,

With regard to your request for "EDI to devise a way to filter bot requests
from download reports", we have recently modified the capture of request
information to include robot identification in the audit report output. For
example,

robot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)


The phrase "robot:" will preface any identified request that we believe to
originate from a robot and will be recorded as such in the audit log under
the "user" column (in lieu of "public"). Duane has implemented this
identification using the robot suspect list provided by the COUNTER
project (https://www.projectcounter.org/), which is updated on a regular
basis (about every 2-6 months). This feature was just released this past
Wednesday evening (12 April) during our weekly updates. We are still
fine-tuning the list to avoid false positives, but it is now functioning in the
production PASTA environment. I realize that this process does not
technically filter out robot requests, but we hope it will suffice to
identify them in the audit logs.
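
As a rough illustration of the labeling described above, and of how a report
could then exclude the flagged entries, here is a minimal Python sketch. It is
not the actual PASTA code: the pattern list is a small made-up sample rather
than the COUNTER list, the names audit_user and count_non_robot are
hypothetical, and it assumes only requests that would otherwise be logged as
"public" get relabeled.

```python
import re

# Hypothetical sample patterns for illustration only; the real COUNTER
# robot list is longer and is revised periodically.
ROBOT_PATTERNS = [r"bot", r"crawl", r"spider", r"slurp"]
ROBOT_REGEX = re.compile("|".join(ROBOT_PATTERNS), re.IGNORECASE)


def audit_user(user_agent, user="public"):
    """Return the value to record in the audit log "user" column.

    Assumption: only requests that would be logged as "public" are
    relabeled; any other user value is passed through unchanged.
    """
    if user == "public" and user_agent and ROBOT_REGEX.search(user_agent):
        return "robot: " + user_agent
    return user


def count_non_robot(user_values):
    """Count audit records whose "user" value is not flagged as a robot."""
    return sum(1 for u in user_values if not u.startswith("robot:"))
```

With labels of this form, a download report could drop suspected robots by
skipping every record whose "user" value starts with "robot:".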

Sincerely,
Mark

---
Mark Servilla
mark.servilla at gmail.com

On Fri, Apr 14, 2017 at 2:43 PM, Henshaw, Donald <
don.henshaw at oregonstate.edu> wrote:

>
>
> We have been tracking data downloads for each data set from our webpage
> for over 15 years and have included this information in LTER and USFS PNW
> annual reports and NSF proposals. While we have heard little feedback from
> NSF regarding their perspective on this, we feel this is valuable
> information. To better account for downloads of Andrews data we hope to
> also track downloads from PASTA and DataONE. We are concerned that the
> download counts from PASTA include web robot counts along with the genuine
> downloads.
>
>
>
> ·  We encourage EDI to devise a way to filter bot requests
> from download reports