Pilot Projects


NDS sponsors pilot projects and engages in collaborative (funded) efforts to help build the NDS community and prototype the NDS infrastructure. Projects include:

TERRA-REF uses advanced crop analytics to accelerate breeding and the commercial release of high-yield bioenergy sorghum hybrids. The project utilizes NDS Labs Workbench to launch analysis environments from within the dataset viewer. It also uses NDS Labs Workbench for training: tutorial sessions provide participants with hands-on experience using specialized Jupyter Notebook and RStudio environments to analyze TERRA-REF data products.
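To illustrate the kind of analysis a TERRA-REF tutorial notebook walks through, here is a minimal, self-contained sketch. The records and field names (`hybrid`, `canopy_height_cm`) are invented for illustration and do not reflect the actual TERRA-REF data schema.

```python
from statistics import mean

# Hypothetical trait records of the kind a tutorial notebook might load
# from a TERRA-REF data product (field names are illustrative only).
records = [
    {"hybrid": "A", "canopy_height_cm": 212.0},
    {"hybrid": "A", "canopy_height_cm": 230.0},
    {"hybrid": "B", "canopy_height_cm": 198.0},
    {"hybrid": "B", "canopy_height_cm": 204.0},
]

def mean_height_by_hybrid(rows):
    """Group records by hybrid and average canopy height."""
    groups = {}
    for row in rows:
        groups.setdefault(row["hybrid"], []).append(row["canopy_height_cm"])
    return {hybrid: mean(vals) for hybrid, vals in groups.items()}

print(mean_height_by_hybrid(records))
# {'A': 221.0, 'B': 201.0}
```

In a Workbench-hosted environment, the same grouping step would typically run against data products mounted alongside the notebook rather than an in-memory list.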


The Data Curation Educational Workbench provides a platform for students to gain hands-on experience with data curation software and tools using the NDS Labs Workbench. The platform brings together a core set of tools to support data curation learning objectives and allows both on-campus and online students to gain experience and experiment with the tools without the overhead of setup and administration. The workbench will be piloted with students at the University of Washington Information School enrolled in the Master of Library and Information Science program and the Master of Information Management program, with regular offerings of online sections. After the initial pilot phase, including evaluation and iterative improvement, the workbench will be made available to the broader educational community, particularly Information Schools and other programs offering curricula in data curation, data management, and data science. A later phase of development would be required to extend the platform for use by practitioners as part of self-guided professional development in data curation.

NDS is partnering with University of Illinois faculty to build a user-friendly platform for plant scientists around the globe who are working on the food security challenge.

As the Earth's population climbs toward 9 billion by 2050—and the world climate continues to change, affecting temperatures, weather patterns, water supply, and even the seasons—future food security has become a grand world challenge. Accurate prediction of how food crops react to climate change will play a critical role in ensuring food security. An ability to computationally mimic the growth, development and response of plants to the environment will allow researchers to conduct many more experiments than can realistically be achieved in the field. Designing more sustainable crops to increase productivity depends on complex interactions between genetics, environment, and ecosystem. Therefore, creation of an in silico—computer simulation—platform that can link models across different biological scales, from cell to ecosystem level, has the potential to provide more accurate simulations of plant response to the environment than any single model could alone.
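The core technical idea above is model linking: the output of a model at one biological scale becomes the input to a model at the next scale up. The sketch below is purely illustrative (these are not the project's actual models, and the coefficients are invented); it chains a toy leaf-level assimilation model into a canopy-level one.

```python
# Illustrative sketch (not the project's actual models): a leaf-scale
# model's output is scaled up by a canopy-scale model, showing how
# models at different biological scales can be chained.

def leaf_assimilation(par, temperature_c):
    """Toy leaf-level CO2 assimilation rate from light and temperature."""
    # Simple light-response curve, damped by deviation from a 25 C optimum.
    light_term = par / (par + 400.0)
    temp_term = max(0.0, 1.0 - abs(temperature_c - 25.0) / 25.0)
    return 30.0 * light_term * temp_term

def canopy_assimilation(par, temperature_c, leaf_area_index):
    """Scale the leaf model to the canopy via leaf area index."""
    return leaf_assimilation(par, temperature_c) * leaf_area_index

print(round(canopy_assimilation(par=1200.0, temperature_c=30.0,
                                leaf_area_index=3.0), 2))
```

An in silico platform of the kind described would chain many such couplings, from cell-level biochemistry up to ecosystem-level fluxes, with each layer consuming the layer below.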


Terra is the flagship of NASA's Earth Observing System. Launched in 1999, Terra's five instruments continue to gather data that enable scientists to address fundamental questions that are central to the six NASA Earth Science Research Focus Areas. It is amongst the most popular NASA datasets, serving not only the scientific community, but also governmental, commercial, and educational communities.

The strength of the Terra mission has always been rooted in its five instruments and the ability to fuse the instrument data together for obtaining greater quality of information for Earth Science compared to individual instruments alone. As the data volume grows and the central Earth Science questions shift from process-oriented to climate-oriented questions, the need for data fusion and the ability for scientists to perform large-scale analytics with long records have never been greater. The challenge is particularly acute for Terra, given its growing volume of data (> 1 petabyte), the storage of different instrument data at different archive centers, the different file formats and projection systems employed for different instrument data, and the inadequate cyberinfrastructure for scientists to access and process whole-mission fusion data (including Level 1 data). Sharing newly derived Terra products with the rest of the world also poses challenges. The ACCESS to Terra Data Fusion Products effort is developing data sharing and access protocols in step with the NDS Share vision.
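One concrete obstacle the paragraph above names is that different instruments use different grids and projection systems, so observations must be resampled onto a common grid before they can be fused. The following is a hedged, stand-alone sketch of that step using nearest-neighbor regridding on 1-D latitude grids; the grids and values are invented, and real Terra fusion involves 2-D swaths and full map projections.

```python
# Toy regridding: resample instrument A's values onto instrument B's
# (coarser, offset) grid so the two records can be compared or fused.

def nearest_index(grid, value):
    """Index of the grid point closest to `value`."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))

def regrid(src_lats, src_vals, dst_lats):
    """Resample src_vals (defined on src_lats) onto dst_lats."""
    return [src_vals[nearest_index(src_lats, lat)] for lat in dst_lats]

instrument_a_lats = [10.0, 20.0, 30.0, 40.0]
instrument_a_vals = [0.1, 0.2, 0.3, 0.4]
instrument_b_lats = [12.0, 28.0, 39.0]  # a coarser, offset grid

print(regrid(instrument_a_lats, instrument_a_vals, instrument_b_lats))
# [0.1, 0.3, 0.4]
```

At whole-mission scale (more than a petabyte, across archive centers), this resampling has to happen close to the data, which is why the cyberinfrastructure gap matters.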

KnowEnG (pronounced "knowing") is a National Institutes of Health-funded initiative that brings together researchers from the University of Illinois and the Mayo Clinic to create a Center of Excellence in Big Data Computing. It is part of the Big Data to Knowledge (BD2K) Initiative that NIH launched in 2012 to tap the wealth of information contained in biomedical Big Data. KnowEnG is one of 11 Centers of Excellence in Big Data Computing funded by NIH in 2014.

This four-year project is creating a platform where biomedical scientists, clinical researchers, and bioinformaticians can bring their own data and perform common as well as advanced analysis tasks, guided by the "knowledge network," a large compendium of public-domain data. The knowledge network embodies community data on genes, proteins, functions, species, and phenotypes, and relationships among them. Instead of analyzing their data set in an isolated fashion, researchers will be able to go straight to asking global questions. The infrastructure, capacity and tools will grow with the datasets.
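The "knowledge network" idea above can be pictured as a graph of typed relationships among biological entities. Here is a minimal sketch of that structure; the entities, relation names, and lookup function are invented for illustration and are not KnowEnG's actual data model or API.

```python
# A tiny stand-in for a knowledge network: public-domain relationships
# stored as (entity, relation) -> related entities, so a researcher's
# gene list can be expanded with known associations rather than
# analyzed in isolation.
knowledge_network = {
    ("GENE:TP53", "has_function"): ["GO:apoptosis", "GO:cell_cycle_arrest"],
    ("GENE:BRCA1", "has_function"): ["GO:dna_repair"],
    ("GENE:TP53", "interacts_with"): ["GENE:MDM2"],
}

def related(entity, relation):
    """Entities linked to `entity` by `relation` (empty if none known)."""
    return knowledge_network.get((entity, relation), [])

print(related("GENE:TP53", "has_function"))
# ['GO:apoptosis', 'GO:cell_cycle_arrest']
```

Analysis tasks on a user's own data can then be "guided" by such a graph, for example by weighting genes according to how they connect to known functions.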


The Materials Data Facility (MDF) is a collaboration between Globus at the University of Chicago, the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, and the Center for Hierarchical Materials Design (CHiMaD)—a NIST-funded center of excellence. MDF is a scalable repository where materials scientists can publish, preserve, and share research data. The repository provides a focal point for the materials community, enabling publication and discovery of materials data of all sizes.

MDF is developing key data services for materials researchers with the goal of promoting open data sharing, simplifying data publication and curation workflows, encouraging data reuse, and providing powerful data discovery interfaces for data of all sizes and sources. Specifically, MDF services will allow individual researchers and institutions to 1) enable publication of large research datasets with flexible policies; 2) grant the ability to publish data directly from local storage, institutional data stores, or from cloud storage, without third-party publishers; 3) build extensible domain-specific metadata and automated metadata ingestion scripts for key data types; 4) develop publication workflows; 5) register a variety of resources for broader community discovery; and 6) access a discovery model that allows researchers to search, interrogate, and build upon existing published data.
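To make the discovery model in item 6 concrete, here is an illustrative sketch of matching a search query against extensible metadata records. The record fields and query parameters are invented for illustration; they are not MDF's actual schema or API.

```python
# Hypothetical dataset metadata records of the kind a discovery
# service might index (fields are illustrative only).
datasets = [
    {"title": "Ni-Al alloy micrographs", "elements": ["Ni", "Al"], "size_gb": 120},
    {"title": "Steel fatigue tests", "elements": ["Fe", "C"], "size_gb": 4},
    {"title": "Al oxide DFT runs", "elements": ["Al", "O"], "size_gb": 35},
]

def search(records, element=None, max_size_gb=None):
    """Filter dataset records on element and size constraints."""
    hits = records
    if element is not None:
        hits = [r for r in hits if element in r["elements"]]
    if max_size_gb is not None:
        hits = [r for r in hits if r["size_gb"] <= max_size_gb]
    return [r["title"] for r in hits]

print(search(datasets, element="Al", max_size_gb=50))
# ['Al oxide DFT runs']
```

Domain-specific metadata (item 3) is what makes queries like `element="Al"` possible: the richer the ingested metadata, the more precisely researchers can interrogate published data.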


Using the NDS Labs environment, NIST developers deployed the following pilots, enabling rapid prototyping and accurate requirements gathering for the production versions.

Using the powerful visualization and analysis package yt as an exemplar, this project is creating flexible and reusable recipes for presenting data in forms customized for a particular community. Going beyond the simple splash page, this project leverages cloud technologies to put advanced interfaces in front of data. In particular, it enables scientists to safely apply custom analysis to remote data in the form of, for example, Python scripts.
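The pattern described, where a user submits a small analysis rather than downloading the data, can be sketched as follows. This is a stand-in, not the project's actual code: the dataset, the recipe format, and `run_user_recipe` are all invented for illustration.

```python
# A remote service holds the data; the user ships a small "recipe"
# (a field name plus an analysis function) to run next to it.
remote_dataset = {"density": [0.8, 1.0, 1.2, 3.0]}

def run_user_recipe(dataset, recipe):
    """Apply a user-supplied analysis function to a named field."""
    field, fn = recipe
    return fn(dataset[field])

# A user's recipe: mean of density values above a threshold,
# averaged over all samples.
recipe = ("density", lambda values: sum(v for v in values if v > 1.0) / len(values))

result = run_user_recipe(remote_dataset, recipe)
print(result)
```

A real deployment would sandbox the submitted script (hence "safely apply" above), but the data-stays-put, code-travels structure is the same.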

This pilot effort is utilizing NDS Labs resources to host its archive of simulations. NDS staff run a specialized server where an NDS-inspired set of tools allows users to view Jupyter Notebooks, run analyses in Docker containers, and add their own findings in additional Jupyter Notebooks.

Would you like to collaborate with NDS? Do you have an idea for a project using the NDS infrastructure? Suggest a pilot!