[LTER-im-rep] [LTER-imc-ddms] distributed model -- but what distribution?

Fri Jul 3 12:51:25 MDT 2015

Dear Inigo -
I think your outline of the "roles" is right on the mark, and we should 
include that in the summary doc. However, there are important functional 
features of the system the IMC is putting together that are different 
from the way a 'center' would operate, and we need to delineate those 
better. Our strength is that we already understand that to properly 
manage diverse data it takes diverse approaches, and our experience is 
that a "center" will almost certainly have a single approach. We want to 
avoid that being imposed from outside, and we don't want to recreate it 
ourselves, either.

1. In our system, those roles do not each represent an entire FTE. At a 
"center", these would probably be 1.0 FTE, which would be very 
expensive, and limit potential activities to that person's skills.

2. The tasks of any individual in a role will be specific to the project 
at hand. E.g., the database manager for PASTA has different expertise 
than the one for climDB, because the systems use different platforms.

3. Right away, you can see that some homogenization could be beneficial 
(e.g., DBs running on the same platform are easier for a single person to 
manage). Potential homogenization would be outlined in year 2, and 
enacted some time after. That enactment would move us closer to 
centralization, but in a way that accommodates the needs of the data and 
the current operations.

Also, John has some great text in his earlier response, and with (or 
without!) his permission, I'll put some into the summary doc.

Best,
Margaret

PS - I mentioned "homogenization", and earlier Inigo, you asked why some 
of the projects the IMC has developed have not taken off. I think you 
meant this rhetorically, and actually you were talking about 
site-approaches, not network approaches. But I'm going to add my $0.02 
anyway, because I agree with you that this is frustrating, and that 
there's huge value in reducing the ad hoc heterogeneity we all see. The 
road block we hit is that homogenization takes considerable time, 
expertise, and coordination at multiple levels, and maybe most 
important, broad recognition of its value and potential efficiency. Plus 
to our credit, each of us is dedicated to handling transitions without 
causing hiccups in our current operations; we don't shut everything down 
so we can tinker.

That's why many of the solutions we've come up with so far don't gain 
traction fast - one or more of those elements is missing (time, 
expertise, coordination, recognition of value), and we have to wait for 
them all to converge.

But they do; there are plenty of examples of homogenization taking hold. 
Sites needing one new component (e.g., an RDBMS) have switched to 
Metabase, or to GCE-toolbox. Sites needing an all-inclusive form-based 
system have installed DEIMS. About a third of sites use EML for local 
catalogs, some with identical templates; a few use researchProject EML, 
too. We all adhere to the same set of structural checks on our datasets 
when submitting them to PASTA.

This is going to be true for network services as well as for site-level 
work. But a thoughtfully designed de-centralized system is (IMO) the 
best way for us to move toward that goal.

Oops. The PS was as long as the body.

-----------
Margaret O'Brien
Information Management
Santa Barbara Coastal LTER
Marine Science Institute, UCSB
Santa Barbara, CA 93106
805-893-2071 (voice)
http://sbc.lternet.edu

On 7/2/15 9:18 PM, Inigo San Gil wrote:
>
> Hi John,
>
> Thanks for your kind effort.  In the three points that you touch upon, 
> I see something in 1) and 3).
>
> Perhaps I see things differently.  Is centralization something we need 
> to stay away from?  My impression is that we can be more effective by 
> better coordinating and harmonizing our efforts, energies and 
> abilities, i.e., by directing efforts towards common (central?) network 
> goals in a distributed environment. The interesting paradigm is that we 
> work towards homogenization while respecting the intrinsic distributed 
> character of the group. Just fostering an even loosely connected 
> federation seems silly, against the tighter network spirit.
>
> I am sorry you feel the process is somewhat tedious. It is definitely 
> not an easy process. One way you may find some excitement in the 
> boredom is keeping an eye on what could happen if the final outcome is 
> good.  The way I see it is that we need to flesh out the model more to 
> actually present something that sounds effective, distributed but 
> network oriented.  You may disagree, but as it stands, most of what I 
> see is more of the same.  And this may be OK, but over the comfort of 
> the situation we did spot some obvious, correctible deficiencies 
> (repetition, redundancies). If not us, somebody else will call them out 
> for us, and then a good chance to re-think the network data center 
> will be lost.  Bottom line, here is the opportunity to make a better 
> network providing a great data center - sorry if the process (of 
> actually finding the right proposal) feels tedious to you, but not all 
> of us see this as clearly as you may yet.
>
> The proposed model is rather similar to the stuff that is in place, so 
> we may find the same or similar impediments that the current model 
> encountered. At times it is a matter of labels, but the substance seems 
> the same, for good (and yes, there is a lot of good) and bad.  You may 
> say, what's the problem with that? We are good, right? Yes, but we can 
> be better, and there is much room for improvement. Here are two things: 
> A) We can use more common approaches to information management, thus 
> leveraging our common knowledge.  We do that, right? Yes, but this 
> time, how do we do this for real (some commitments, without so many 
> sticks prodding each other)? B) We need to examine what did not work 
> (example: the IM compensation for network time) and avoid the same 
> unexpected outcomes in the proposed plans.
>
> In your point 1) we discuss prioritization, what gets done, and 
> who directs the LTER Data Center.  You tell us that this new model 
> reflects the distributed characteristic because prioritization and 
> scheduling would be distributed in the sense that the folks serving on 
> the Gov. Committee are indeed distributed and rotating. I can eat 
> that, but do tell me how that is more efficient than the same 
> distributed aspects of the process we have in place (NISAC, EB, SC, 
> Bob, NSF)? The key aspects of prioritization at a high level may not be 
> as important as the involvement at the very detailed level. So, I see 
> this as a bit of a distraction -- perhaps I would like to see articulated 
> how the larger community is going to be vested in the project.  
> Perhaps we can agree that we can have a more enthusiastic and 
> supportive steering, and the real question regarding 1) is whether the 
> Gov. Committee is going to be "it", or whether it is the drivers of the 
> work, those who pay attention to how real innovation is done, who may 
> feel the excitement (if history is an indication).
>
> Here are some details that I draw from experience of what I have seen 
> in our own network.
>
> And here is my larger concern.  I hear figures about the budget, 
> between 700k and 1mil per year (opinions abound). That is not much 
> IMO.  The actual team that will make LTER shine in the IM aspects 
> cannot be "two programmers and the occasional IM with a Project 
> Manager that may or may not be a PI" (opinions vary about the 
> qualifications of such a PM).  The real issue is that the whole team 
> should include Project Manager(s), System Administrators, Content 
> Strategists, Database Administrators, Designers (Web and otherwise), 
> Programmers of several flavors, Tech writers, Data Curators, Data 
> Custodians, Outreach and liaison folks - a lobbyist such as Brian 
> Wee's role would be fine too. I am leaving out some crucial roles, 
> forgive me. Obviously 700k will not do it.  Some misinformed folks may 
> believe that "this can be done with a couple of graduate students", 
> but for those of us that can grasp the possibilities, we know that one 
> very good programmer and a good leader will not produce it all, and 
> will leave many things undone.  Being understaffed and under-equipped 
> may also affect morale, which is a compounding problem.
>
> But I should get real.  A team of 15? No money.  Well, perhaps not, 
> but first, we need to make clear that 2 programmers will do the level 
> of work that we have seen (at best).  There is a way we have been 
> operating: we formed teams of 15 before amongst ourselves (mixed with 
> LNO sometimes).  Often, the lion's share of that team's work was carried 
> by one to three persons, with occasional quality involvement from 
> others, and a few, well, spectators (everybody plays a role!) who 
> struggle to find a footing for whichever circumstance.  But cool 
> projects got off the ground, some even without sticks (do EML or else), 
> and with little support.  Other times, real involvement happened with 
> all members. I think 15 is not too much to ask, as we have done it, but 
> we can and should improve vastly to take the data-center to a really 
> brilliant position that we all feel ownership of.  I think the IMC can 
> work like that, but without us making the commitment, change, or 
> adaptation, it will not happen.   We are in front of a very pervasive 
> problem; we may not even detect it, as we may convince ourselves that 
> we got the right plan for it.  Inertia creeps in, and we will default to 
> what we have been doing for the last 10, 20 or 30 years, which is not 
> bad, but we can and should do much better. At stake is the rare 
> opportunity of taking stock of what we did wrong, and building upon 
> that knowledge to make LTER shine through what has always made it feel 
> like LTER may actually be a network: the IMs.
>
> Here is what I would like to see - and will contribute to that end - 
> for starters, flesh out how the IMC is really going to pad the 
> projected budget shortfall.  There is the first item that may convince 
> me we can do this.  Also, I would communicate and explain why we 
> really need all the roles in a data center (and not all in one 
> person). The last is fixing the problem with the financial model.  
> If we were discouraged from working for the network, even when $ was 
> around, identify the problems, and find the model that may work best.  
> And well, if this turns out to be "supplements", well, supplements it is.
>
> Cheers,
> Inigo
>
>
>
>
> On 7/2/2015 4:31 PM, John Porter wrote:
>> Inigo,
>>
>> A few quick comments. Many of these issues have been discussed at length
>> (some to the point of tedium) during the discussions of the groups
>> meeting (virtually) each Friday. I'll run through a few of them and try
>> to characterize why, even though there are centralized elements, the
>> model truly is distributed.
>>
>> 1) Who should make decisions about what projects money will be spent on?
>> Current model: Bob Waide is the ultimate arbiter regarding how LNO funds
>> are spent. He gets input from the EB, SC, NISAC and IMC, but ultimately
>> he's the one who needs to make sure the budgets balance.
>>
>> Proposed model: The current governance plans call for a Governance
>> Committee composed of LTER IM's and PIs. Although it is one committee,
>> it has distributed membership and will be taking input from the entire
>> IM and PI community. I'm hard pressed to think of a more distributed way
>> to select priorities for the network.  I suppose we could go with a "Town
>> Meeting" approach where all the LTER IMs are required to consider and
>> vote on each issue that arises, but it is not clear that every IM wants
>> to spend a very large block of time each month doing this. The
>> expectation is that governance committee members will need to devote
>> significant amounts of time to this effort, and therefore, will rotate
>> frequently.
>>
>> There is a Project Manager whose job it is to support and implement the
>> decisions of the Governance Committee. The thinking is that we need
>> someone who can devote full time and concentration to tracking the
>> progress of individual initiatives, research budget alternatives to be
>> presented to the governance board, manage fiscal details and prepare
>> materials for NSF reports. Although this role COULD be performed by the
>> Governance Committee if they dropped all other activities, it seems to
>> make sense to have someone who can wholly concentrate on making sure
>> that initiatives move forward. However, the Project Manager does not
>> have decision making authority with respect to the major decisions
>> needed to support development of LTER systems.
>>
>> 2) Where should the computational hardware required be housed?
>> Current model: The LNO provides its own servers, at its central location.
>>
>> Proposed model: There is still wide discussion regarding the use of
>> commercial cloud services vs. contracting with a particular university
>> or company for providing storage, computation and network resources
>> needed to support LTER Network databases.  However, it is not currently
>> contemplated that the hardware will be physically associated with the
>> governance committee or the Project Manager. Although it might be
>> possible to implement a widely distributed model (e.g., each LTER site
>> runs one or more servers supporting one or more network databases),
>> there doesn't seem to be much of a motivation for doing so. I very
>> seldom am in the same room or building as servers we use - so moving
>> them further away or to the cloud is no problem.
>>
>> 3) Who should do the actual development of LTER Network Systems?
>> Current Model: IM's at the LNO do most of the work. As you noted and
>> lauded, occasionally, LTER site IMs are contracted to work on specific
>> projects, but this proved difficult because it involved paying double
>> overhead (once to UNM, again to the IM's home institution).
>>
>> Proposed Model: As a group, we would like to see much broader
>> participation of LTER site IM's in implementing network-wide solutions
>> and have been wrestling with possible solutions (independent contracts,
>> interagency personnel agreements) that would allow funding to flow
>> without the duplicate overhead problem. We don't want to depend on  work
>> diverted from site efforts. A principle has been that people working on
>> network projects should be funded to do so, independent of funding from
>> their LTER site.  If we can get around the overhead issue, it would
>> allow personnel from small groups of sites to receive funding to develop
>> tools to meet network priorities. However, there are some tools and
>> systems (e.g., PASTA, network database server administration) that may
>> require dedicated staff that would not be associated with LTER sites.
>> Additionally, there might be some projects that might demand in-depth
>> expertise in a particular software stack (e.g., Palantir and DEIMS) that
>> does not currently exist at any LTER site. It is not anticipated that
>> there would be a large, permanent staff.
>>
>> I could go on about how the fiscal administration is likely to be
>> physically disassociated from all the other parts, and from any other
>> entity that has its own priorities independent of LTER, but I think you
>> get the point. Decisions: Made by a committee with members drawn from
>> the LTER Network; Computer Systems: Cloud or contracted for separate
>> from the institution administering the grant; Work: Distributed among
>> LTER IM's if appropriate, or use of dedicated or outside staff, if not.
>>
>> I think you'd have to agree that that sounds more like a distributed
>> system than a centralized one! I'll admit that we haven't really
>> seriously discussed the ultimate in distributed systems: giving each
>> individual LTER site identical amounts of money to work on running and
>> improving needed network-wide databases and systems. However, it doesn't
>> take too much thinking to realize that, given limited funding, we need
>> some way to jointly identify and prioritize activities - and that a
>> fully distributed model provides no clear way to do that.
>>
>> Hope this helps!
>>
>>   -John Porter
>>
>> On 7/2/2015 5:34 PM, Inigo San Gil wrote:
>>> Dear IMs,
>>>
>>> Is there a bit of a lull on the DDMS, or perhaps I'm in the dark - I
>>> didn't hear about VTCs, or anything.  Well, I keep mulling these ideas,
>>> met with Bob Waide (to get feedback), talked to my PI, and will talk some
>>> more. It is an interesting opportunity for the LTER to be mum, guys.
>>>
>>> We are working towards what is being coined "LTER Distributed IM model".
>>>
>>> As I read through the folders (some documents disappear..), and read the
>>> paperwork thus far, I wonder whether a better name would be "LTER Data
>>> Center", and not really an "LTER Distributed IM model".  The reason is I
>>> fail to identify the "distributed" part of the equation in the "model".
>>>
>>> Could you please identify for me what parts are "distributed"? I fail to
>>> see those clearly.  I see governance schemes, with elaborate diagrams. I
>>> see a financial model and I see service bucket tasks (I wrote most of
>>> these tasks). We had budgeted a few items for the eventuality of hosting
>>> solutions in the Cloud (I would stay away from becoming a guinea pig -
>>> re: NSF Dear letter).
>>>
>>> But in my mind, the LTER IM Distributed model always has played like IMs
>>> working in a coordinated fashion to leverage our collective strengths,
>>> hence mitigating our individual gaps.
>>>
>>> I do not see that vision explicitly in the work thus far.  For me, the
>>> "LTER Distributed IM model" may entail a profound change in LTER and
>>> specifically, a change in the way we operate.  See, thus far, we are
>>> site-centric, and network later (if at all).  Sure, we have EML medal,
>>> but it sounds more like the Euro for Europe in terms of integration/
>>> distributed (bad analogy, unless you are a Greek, then you may know what
>>> I mean).
>>>
>>> Since I do not know, I would like you to tell each other what you
>>> think working in an "LTER Distributed IM model" means.
>>>
>>> Yes, I am aware sites (PIs) "want to have their IM there 150% and on the
>>> cheap".  True, we identified that we need a person at each site to solve
>>> the day-to-day issues related to the handling of each site's digital
>>> assets (and some hardware, I must add).
>>>
>>> It sounds to me that, to have a concerted effort, you would have to have
>>> those IMs really involved with the persons who are truly network
>>> dedicated (the Data Center staff), and I am unsure how this is going to
>>> work, given the experience thus far.  One example - after years of
>>> asking for compensation for "network-dedicated" time, we got it, and
>>> while it lasted (not too long), the resource or instrument was seriously
>>> under-used.  (How much, I do not really know).  Point is, "money" did not
>>> keep us from being immersed in a network project.  Then what is it?
>>> Perhaps the weak link was motivation and purpose (those are the main
>>> drivers to get engaged in any activity, along with the mastery of the
>>> activity). The reason I ask is because the key to an "LTER Distributed IM
>>> model" seems to me the collaborative aspect of the human assets, that
>>> is, _you_ and _me_ working on a common project.
>>>
>>> Or perhaps, the "LTER Distributed IM model" means something else for
>>> you; I just would like to read what that means for you.  I may be wrong,
>>> but there is quite a bit of room for status quo in the ideas that bounce
>>> off the documents and watercoolers.
>>>
>>> Thanks for your comments,
>>> Inigo
>>>
>>>
>>>
>>>
>>> -- 
>>>
>>> Inigo San Gil
>>> +1 505 277 2625
>>> http://scholar.google.com/citations?user=foIppL4AAAAJ&hl=en
>>>
>>>
>>>
>>> _______________________________________________
>>> Long Term Ecological Research Network
>>> im-rep mailing list
>>> im-rep at lternet.edu
>>>
>
> _______________________________________________
> Long Term Ecological Research Network
> imc-ddms mailing list
> imc-ddms at lternet.edu