Re: Beyond another cloud: data service discovery for NDSLabs


Hi Arthur,

Sorry it's taken me a few days to reply, but I've been pondering your email for a while and trying to formulate a response to it.

I think what I've been trying to get at, in thinking about NDS Labs and trying to spur the conversation, is the best possible way to foster interoperability -- specifically, what we can do, now, to create an environment for exploration and experimentation. I originally thought we could do this by providing:

* Communication mechanisms
* Gradual growth and incubation of components that are connectable
* Simple start, complex end

Perhaps, though, approaching this from "service discovery" is the wrong way. Adding more indirection and reimplementing things that have been done before are both tricky, as you point out, and are probably best avoided for the time being.

Service discovery in general may simply be too *big* a sandbox for NDSLabs, where individual projects and instances will likely number in the handful, not the hundreds, and where the N^2 effort of developing pairwise interoperability is going to be relatively small. If we can standardize generic k-v pairs but can't manage service discovery without them, we're probably doing something wrong! :)

So let's try what you suggested: drill down on the specific, difficult technical things we want to do, and then figure out how to implement them.

Here are the things I know I would want for a "next-gen Epiphyte":

* What are the possible applications I can send input data to?
* Where can the resultant data be sent?
* Where can selections of data be obtained?

I'd also like to provide as much as possible in the simplest technology available -- which means that OAI will be a goal, but not the only conduit for data.

For Epiphyte, we're also taking the tack that (for now) the data is all in the form of files on disk. This won't work forever, but it will for the time being. For those three "services," I think I would want to know a host/port and some format for transmitting data -- perhaps even just REST posting, or a URI to a filesystem or filesystem-like thing. That information alone might be enough to provide some degree of interop at that level, giving us a shared basis we can build more complex ideas on later.
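As a rough illustration of how little information that needs, here is a minimal Python sketch of a service record carrying only a verb, a host/port, and a transfer mechanism, plus the lookup that answers the three questions above. The verb names (`process`, `store`, `fetch`), the field names, and the example hosts are all hypothetical, chosen just for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical verb vocabulary mirroring the three questions:
# where can I send input, where can results go, where can I get selections.
VERBS = ("process", "store", "fetch")


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    verb: str                  # one of VERBS (hypothetical)
    host: str
    port: int
    transport: str             # "rest" (HTTP POST) or "uri" (filesystem-like location)
    uri: Optional[str] = None  # required when transport == "uri"

    def __post_init__(self) -> None:
        # Reject records that fall outside the agreed vocabulary.
        if self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb!r}")
        if self.transport not in ("rest", "uri"):
            raise ValueError(f"unknown transport: {self.transport!r}")
        if self.transport == "uri" and not self.uri:
            raise ValueError("transport 'uri' needs a uri")

    def endpoint(self) -> str:
        """Where a client should send data to, or read data from."""
        if self.transport == "uri" and self.uri:
            return self.uri
        return f"http://{self.host}:{self.port}/"


def services_for(records: List[ServiceRecord], verb: str) -> List[ServiceRecord]:
    """All known services offering `verb`, sorted by name."""
    return sorted((r for r in records if r.verb == verb), key=lambda r: r.name)


# Hypothetical example registry.
records = [
    ServiceRecord("analysis-app", "process", "10.0.0.5", 8080, "rest"),
    ServiceRecord("scratch-store", "store", "10.0.0.7", 2049, "uri", "nfs://10.0.0.7/scratch"),
    ServiceRecord("archive", "fetch", "10.0.0.9", 80, "rest"),
]
```

With something like this, "what applications can I send input data to?" becomes `services_for(records, "process")`, and the files-on-disk assumption is captured by the `uri` transport.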

Does that help to focus the ideas more? Or is this still complexity in search of need?

-Matt

On Thu Oct 30 2014 at 2:17:34 PM Arthur Smith <apsmith@xxxxxxx> wrote:
That does sound interesting. However, it also reminds me of RFC 1925:

http://tools.ietf.org/html/rfc1925

in particular "6a - It is always possible to add another level of indirection." and perhaps #11 as well... Lots of wisdom in the old IETF...

I really liked your talk about what you'd done with Epiphyte - in particular making hard things easy. Very impressive work. Is there some way to organize this by starting from the "hard" use cases NDS labs is trying to address, and drill down to the technology components really needed to make that happen? Discovery does seem likely to be a good part of it, but if it's based on key-value pairs (for example), how does the user know what keys to query, and who sets the standards for those keys and the meanings of the corresponding values? That's aside from knowing where exactly the etcd server (or whatever is doing that work) is. There's got to be some base starting point, a system that knows enough to help the user do things. Can we work from there?


Arthur



On 10/30/14, 11:42 AM, Matthew Turk wrote:
Hi all,

In the other thread, Arthur brought up that we don't want "just another cloud infrastructure," which I think was really apt, and something that deserves thought for any NDS Labs project. So I wanted to start a couple topics about what can be provided on top of a standard cloud infrastructure that might be of use.

I'm wondering about discovering data services within a region, where that region is either some subnet on a cloud provider, or even more globally across locations. If we are thinking about interoperability of services, then there are probably a few verbs that could be identified as being necessary. If we can have services identify themselves as providing verb endpoints, that could provide an environment for testing interop.
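One way to make "services identify themselves as providing verb endpoints" concrete is to agree on a key layout in a shared key/value store, so clients know which keys to query. The sketch below assumes a hypothetical `/ndslabs/services/<verb>/<name>` layout and a small fixed verb list; none of these names are an existing NDS Labs convention.

```python
from typing import Tuple

# Hypothetical conventions for this sketch only.
PREFIX = "/ndslabs/services"
VERBS = frozenset({"process", "store", "fetch"})


def service_key(verb: str, name: str) -> str:
    """Key under which service `name` advertises a `verb` endpoint."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    if not name or "/" in name:
        raise ValueError(f"invalid service name: {name!r}")
    return f"{PREFIX}/{verb}/{name}"


def verb_directory(verb: str) -> str:
    """Directory a client lists to find every service offering `verb`."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return f"{PREFIX}/{verb}"


def parse_service_key(key: str) -> Tuple[str, str]:
    """Split a key back into (verb, name), rejecting keys outside the layout."""
    if not key.startswith(PREFIX + "/"):
        raise ValueError(f"key outside {PREFIX}: {key!r}")
    parts = key[len(PREFIX) + 1:].split("/")
    if len(parts) != 2 or parts[0] not in VERBS or not parts[1]:
        raise ValueError(f"malformed service key: {key!r}")
    return parts[0], parts[1]
```

Keeping the verb list in one small shared module is one possible answer to the question of who sets the standards for keys: the vocabulary lives in code that every participating service imports.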

Kacper and I have been experimenting with this ourselves, mostly looking at the various service discovery mechanisms that operate on docker containers being orchestrated across machines. Some of these do this via introspection, and some will even set up automatic (nginx) reverse proxies for docker containers running inside a system. Right now it looks like etcd is a pretty good solution for this:

https://coreos.com/docs/distributed-configuration/getting-started-with-etcd/
http://www.activestate.com/blog/2014/03/brandon-philips-explains-etcd
https://github.com/coreos/etcd

as it allows key/value pairs to be stored, and it is itself discoverable. For instance:
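For a feel of what registration and lookup would involve, here is a minimal Python sketch against etcd's v2 HTTP keys API (the one current at the time; newer etcd releases have deprecated it in favor of the v3 API). It only builds requests and parses a response, so it runs without a cluster. The endpoint address, the TTL, and the JSON record layout are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Assumed etcd client endpoint; adjust for the actual cluster.
ETCD = "http://127.0.0.1:2379"


def register_request(key: str, record: Dict[str, object], ttl: int = 30,
                     base: str = ETCD) -> urllib.request.Request:
    """Build a v2-API PUT storing `record` as JSON under `key` with a TTL.

    The TTL makes the entry expire unless the service keeps re-registering,
    so services that go away drop out of discovery on their own.
    """
    body = urllib.parse.urlencode({"value": json.dumps(record), "ttl": ttl}).encode()
    return urllib.request.Request(
        f"{base}/v2/keys{key}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def list_url(directory: str, base: str = ETCD) -> str:
    """URL for a recursive v2-API GET of a directory."""
    return f"{base}/v2/keys{directory}?recursive=true"


def parse_listing(payload: str) -> Dict[str, Dict[str, object]]:
    """Map each leaf key in a v2-API GET response to its decoded JSON value."""
    found: Dict[str, Dict[str, object]] = {}

    def walk(node: dict) -> None:
        if node.get("dir"):
            for child in node.get("nodes", []):
                walk(child)
        elif "value" in node:
            found[node["key"]] = json.loads(node["value"])

    walk(json.loads(payload)["node"])
    return found


# Against a running etcd (not executed here):
#   urllib.request.urlopen(register_request("/svc/store/scratch", {"port": 2049}))
#   with urllib.request.urlopen(list_url("/svc/store")) as resp:
#       services = parse_listing(resp.read().decode())
```

The TTL-plus-refresh pattern is what keeps such a registry from accumulating dead entries when containers come and go.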

http://jasonwilder.com/blog/2014/07/15/docker-service-discovery/

I think having a discussion about what we want services to be able to do is perhaps a much bigger topic, but I wonder whether this type of thing -- particularly etcd -- would be useful to any projects, and whether it would be a good avenue for service discovery and interop. Is there something else that would be better?

-Matt


