Re: Some thoughts on NDS Labs


Hi Jim,

Thanks for the ideas -- I think they identify a number of important things that NDS Labs could provide. In reading them over, I wonder if we could identify some technical things that could support this vision. I came up with a few that I think might support this, but I'm not sure I am on the right track.

* Service discovery. If there were a small number of pragmatically defined *verbs* (e.g., collect-with-OAI-ORE, receive-SWORD) or *service names* (SEAD, or GlobusOnline, or Authenticate) that described what something provides, it could advertise itself. Things like ZooKeeper and etcd/fleet might be able to provide this information. Then an app could, similar to how "Rendezvous" used to claim to work, discover what else is available.
* "Mixins" for developing containers or collections of containers, i.e., if you're launching a bundle and you want something that fulfills the need to provision a new MySQL db, use this drop-in addon. Or a container that connects to an event bus, or something. I'm speculating here, but I think that sharing the tech that performs specific functions would serve the purposes you laid out.
* Base images for "NDS Labs," for instance, something that includes all the necessary bits to get going with an app that uses all the other running services for data access, collecting, etc.
* A few services that were provided to all apps or orchestrations running on the NDS Labs -- for instance, an authentication system that managed talking to Globus Nexus or InCommon and provided internal tokens that were passed around. Or something that was able to *name* storage locations and manage mountpoints (e.g., iRODS, Ceph, etc.). These could be à la carte.
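
To make the service-discovery idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the Registry class, its method names, and the example endpoints are all made up, and the in-memory dictionary stands in for whatever store (ZooKeeper, etcd) would really hold the advertisements. The verbs and service names are the ones from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str      # e.g. "SEAD"
    endpoint: str  # hypothetical address, for illustration only


class Registry:
    """In-memory stand-in for a shared store such as etcd or ZooKeeper."""

    def __init__(self):
        # Maps each verb to the set of services that advertise it.
        self._by_verb = defaultdict(set)

    def advertise(self, service, verbs):
        """Record that `service` provides each of the given verbs."""
        for verb in verbs:
            self._by_verb[verb].add(service)

    def withdraw(self, service):
        """Remove a service from every verb it advertised."""
        for providers in self._by_verb.values():
            providers.discard(service)

    def discover(self, verb):
        """Return the services providing `verb`, sorted by name."""
        return sorted(self._by_verb.get(verb, ()), key=lambda s: s.name)


registry = Registry()
sead = Service("SEAD", "http://sead.example.org")
globus = Service("GlobusOnline", "http://globus.example.org")

registry.advertise(sead, ["collect-with-OAI-ORE", "receive-SWORD"])
registry.advertise(globus, ["receive-SWORD"])

# An app that needs a SWORD receiver asks what is available.
sword_receivers = [s.name for s in registry.discover("receive-SWORD")]
ore_collectors = [s.name for s in registry.discover("collect-with-OAI-ORE")]
```

The point of keying on verbs rather than on hosts is that an app only has to know what it needs done, not who happens to be running it at a given site.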

Now, in addition to this, of course we'd need social interactions between different developers and groups to actually make this happen. This is where I think the hardest aspects of your first two points are going to come into play. I think identifying low-friction methods of communicating will be key here -- and by providing a location where these things can be run (and I *hope* this will be federated, with deployments at multiple sites!), technology can be transferred directly. Do you have any suggestions for how we can facilitate this type of interaction?

Would you be interested in experimenting with deploying on something like Apache Mesos / Mesosphere? It seems to be a pretty low-friction way of deploying containers, and I'm rather interested in the service discovery aspects of it. But perhaps it's a poor fit, especially if service discovery is not necessary.

-Matt

On Tue Oct 28 2014 at 3:55:44 PM Jim Myers <myersjd@xxxxxxxxx> wrote:
I think my view would be consistent:

NDS Labs should be a shared space in which projects can:

* Develop/test interoperability
* Develop/test integration (of different types of services)
* Access and contribute to a growing national data space (NDS Share?) - i.e. being able to query/compute over 'real' data and to publish 'real' data collections
* Access and contribute to a growing set of test/demonstration data collections and shared/shareable services
* Develop/test scalability and scale with user demand (the latter linking to NDS Share?)
* Demonstrate/share best practice/new practice

In my mind these are things that are hard for any one of us to do on our own and they all build in a reason why collective action is better than individual efforts to use other clouds, etc. I can build test suites and reference services to interact with, but with NDS Labs, I hopefully only need to supply 1/Nth as much effort to do this. Similarly with data - I can look to my project's user base for test collections (different data types, size, metadata, provenance, ...), but NDS Labs will have more breadth and there's potentially 1/Nth as much work for us to contribute to shared test data.

Etcetera. The thing I like about this type of framing is that it explains why you'd come to NDS Labs and what the ground rules might be (you must be able to expose your data/metadata/events through one or more standard APIs, you should identify one or more cross-project interactions/interoperations you wish to test, you should register your 'real' data products with NDS Share, ...), as well as detailing what NDS should provide (standardized machines to run pilots, reference services, and test runs, data/service test suites, test versions of NDS Share store/retrieve services, ...). (The intent would be to be lightweight - no one should be turned away from using NDS Labs because they aren't 'tall enough', but if you really don't want to follow any of the ground rules, you might wonder if you're in the wrong place, or these become criteria for deciding on larger resource requests. Similarly, rather than NDS Labs selecting a lot of specific services it will propose and run, if groups of projects can make requests/do the work to create shared resources within NDS Labs, the labs will be agile. (If several groups want a type registry they can stand one up in NDS Labs. If a lot of projects have interest in a given service or data resource, perhaps NDS Labs has some labor to contribute to setup or maintenance...).)


Partial aside: This might be a way to rework the draft NDS vision too - NDS shares a vision with other organizations and projects with the consortium coming together to do just those things where coordination, particularly coordination around data and resources, will accelerate our collective ability to realize the vision. The NDS Labs framing above fits that and I think NDS Share could as well - we all face the issues of provisioning for large data and for the long-term and in guaranteeing researchers a state-of-the-art mix of base and advanced services for longer periods than our project lifetimes. If NDS Share became a place that data could go regardless of input system, where data would remain accessible after the input system went away, where new services and systems (production, pilot, or experimental) could interact with the data, where users could go to understand the range of capabilities available and make apples-to-apples comparisons, and where data could either stay forever or be moved at the lowest cost to future repositories, it would, with the consortium backing it, provide better answers on these issues than any of us could at a single project level...

 - Jim


James D. Myers, Ph.D.
Research Investigator
School of Information
University of Michigan
217-417-1786
myersjd@xxxxxxxxx





-----Original Message-----
From: owner-discuss@nationaldataservice.org [mailto:owner-discuss@nationaldataservice.org] On Behalf Of Matthew Turk
Sent: Tuesday, October 28, 2014 3:28 PM
To: discuss@nationaldataservice.org
Subject: Re: Some thoughts on NDS Labs

Hi Ray,

On Tue, Oct 28, 2014 at 2:23 PM, Ray Plante <rplante@xxxxxxxxxxxx> wrote:
> Hi,
>
> On Tue, 28 Oct 2014, Arthur Smith wrote:
>> If NDS is to be the Interop (or NSFNet) to RDA's IETF, then it seems
>> to me like the focus of NDS labs should be implementing and
>> highlighting interoperability of implementations of data standards.
>> For example we heard about the data type registry:
>
> I would definitely like to see NDS Labs be a platform for RDA ideas.
> I hope it can also be a lab for cultivating additional RDS WG
> activity, both building on the current deliverables and inspiring new
> ones.
>
>> But I think we should keep the focus on the data side, not just on
>> putting together another cloud computing environment.
>
> It's been proposed that NDS Labs respond to requests for projects, so
> I would hope that there would be people from the RDA community
> inspired to come forward.
>
> That said, the NDS Labs proposal is essentially one of storage and
> compute resources and support to operate it. Your suggestion (putting
> a registry in a docker container) helps us understand whether the
> proposed environment can serve that role.

I think we should go way beyond just storage and compute -- it needs to be tied into strong community interaction, otherwise we are simply Amazon EC2 (which, I note, has lots of public datasets already) or the Open Science Data Cloud. I think having strong community buy-in, emphasizing interoperability between applications, and so on needs to be a component.

Can you say more about the "NDS labs proposal"?

>
> cheers,
> Ray
>



