Purdue receives $4 million NSF grant for StreamCI data streaming platform

Purdue University’s Rosen Center for Advanced Computing (RCAC) is leading a major National Science Foundation (NSF) grant awarded to create an artificial intelligence (AI)-ready streaming data platform for researchers across domains. This new platform, known as StreamCI, will significantly lower the technical barriers to harnessing massive data streams, empowering experts from a wide range of scientific fields to build intelligent and responsive applications that will be more efficient and effective than ever before.

In late June 2025, the NSF awarded Purdue researchers $4 million over five years to develop and refine StreamCI. Carol Song, the Chief Scientist of RCAC, is the Principal Investigator (PI) on the project. She and the Research Software Engineering (RSE) team at RCAC will work alongside co-PIs Ananth Grama, Jian Jin, Michael Heinz, and Ming Qu, and senior personnel Martin Jun, Mohammad Jahanshahi, Kristen Bellisario, and Jacob Hosen (all professors at Purdue) to bring StreamCI to life. The soul of the project lies in making continuous data streams (e.g., real-time sensor data) more manageable and AI-ready for scientists who may not be cyberinfrastructure experts, something the group felt was critically needed in the research community.

“This award,” says Song, “is the culmination of several years’ productive collaborations between RCAC and various Purdue research groups, including plant sensor innovations with Dr. Jian Jin’s group, smart manufacturing secure cyberinfrastructure led by Drs. Ananth Grama and Dongyan Xu of computer science, building energy studies led by Dr. Ming Qu, and ecology field sensors used in research and classrooms with Dr. Jacob Hosen, to name a few. Not only have we created a prototype system to meet the research needs, but through interactions with the researchers, we also identified critical needs for a futuristic platform to enable and assist researchers to create AI-enabled applications.”

StreamCI is an open-source streaming data platform that simplifies the collection, management, and analysis of sensor data streams. A prototype system was developed by the staff at RCAC in 2020 and has been successfully used in multiple projects, which in turn has helped to drive the design and development of the proposed updated version of the platform. This new NSF grant will expand StreamCI into an AI-ready streaming data platform that supports data preparation and sharing, ML/AI model development and inference, and other computational workflows.

Streaming sensor data, or real-time data, is tremendously useful in a variety of domains. The issue many face when trying to use it is the sheer volume of data that the sensors provide. For example, collecting and monitoring environmental sensor data in buildings, or sensor data in smart manufacturing settings, in real time in order to reduce energy consumption and improve efficiency and sustainability (actual use cases for the StreamCI prototype) can easily produce billions of data records. Many researchers across the nation lack the resources needed to store this amount of information. They may also lack the expertise to properly manage, process, and analyze such large volumes of data, especially when that data is collected from different sources and stored in multiple formats. StreamCI is a one-stop platform created to address all of these problems, and more.
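To make the scale of "billions of data records" concrete, here is a back-of-the-envelope calculation. It assumes a hypothetical deployment of 1,000 sensors, each reporting once per second; these numbers are illustrative only and are not figures from any StreamCI project.

```python
# Hypothetical deployment: these numbers are illustrative assumptions,
# not figures reported for any StreamCI use case.
SENSORS = 1_000              # sensors across a set of instrumented buildings
READINGS_PER_SECOND = 1      # each sensor reports once per second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def records_per_year(sensors: int, rate_hz: float) -> int:
    """Return how many records a sensor fleet produces in one (non-leap) year."""
    return int(sensors * rate_hz * SECONDS_PER_YEAR)


if __name__ == "__main__":
    total = records_per_year(SENSORS, READINGS_PER_SECOND)
    print(f"{total:,} records per year")
```

Even at this modest one-reading-per-second rate, a single year of data comes to roughly 31.5 billion records, before any audio or image streams are added.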

“We are excited to have the opportunity to expand on the StreamCI prototype system by introducing key new capabilities, including support for audio and image data, configurability and programmability for domain scientists to construct their streaming data pipelines, and integration with sensor-driven ML/AI workflows,” says Lan Zhao, Senior Research Scientist on the RSE team at RCAC. “This project will collaborate closely with seven domain scientists across a wide range of disciplines to define requirements and validate the enhanced system, ensuring StreamCI's broad applicability, usability, and impact.”

StreamCI is designed to streamline the entire workflow for sensor data streams—from capturing raw sensor data to processing it at various levels of fidelity, anonymization, and transformation. The data streaming platform can also apply ML methods across different modalities and enable Findable, Accessible, Interoperable, and Reusable (FAIR) data sharing for research reproducibility and cross-domain science. Deployed on the infrastructure of Purdue’s Geddes Kubernetes Cluster, as well as Purdue’s Anvil cluster, StreamCI has been developed to be a user-friendly, cloud-based system that is widely accessible to the broad research community. The newest innovations that will be supported by the NSF grant are as follows:

  • High-level data abstractions that allow researchers to combine data streams across sources.
  • AI-readiness through novel tools for data preparation and canonical and customized data pipelines.
  • Low-code application development and integration of powerful tools.
  • Scalable, seamless service deployment and delivery.
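The stages described above (combining data streams across sources, processing them at a lower level of fidelity, and anonymizing them before sharing) can be illustrated with a generic sketch in plain Python. This is not StreamCI code and does not reflect its actual API; the `Reading` record and the functions `merge_streams`, `downsample`, and `anonymize` are hypothetical names chosen for illustration.

```python
import hashlib
import heapq
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """One sensor record (hypothetical schema): time, source sensor, value."""
    timestamp: float  # seconds since an arbitrary epoch
    sensor_id: str
    value: float


def merge_streams(*streams: Iterable[Reading]) -> Iterator[Reading]:
    """Combine several time-ordered streams into one time-ordered stream."""
    # heapq.merge consumes its inputs lazily, so it also works on long streams.
    return heapq.merge(*streams, key=lambda r: r.timestamp)


def downsample(readings: Iterable[Reading], window: float) -> list[Reading]:
    """Average each sensor's values over fixed time windows (lower fidelity)."""
    totals: dict[tuple[int, str], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for r in readings:
        bucket = int(r.timestamp // window)
        acc = totals[(bucket, r.sensor_id)]
        acc[0] += r.value  # running sum
        acc[1] += 1        # running count
    # Each output record is stamped with the start of its window.
    return sorted(
        Reading(bucket * window, sensor_id, total / count)
        for (bucket, sensor_id), (total, count) in totals.items()
    )


def anonymize(readings: Iterable[Reading], salt: str) -> Iterator[Reading]:
    """Replace each sensor ID with a salted hash before the data is shared."""
    for r in readings:
        digest = hashlib.sha256((salt + r.sensor_id).encode()).hexdigest()[:12]
        yield r._replace(sensor_id=digest)


if __name__ == "__main__":
    building_a = [Reading(0.0, "temp-a", 20.0), Reading(30.0, "temp-a", 22.0)]
    building_b = [Reading(10.0, "temp-b", 18.0), Reading(70.0, "temp-b", 19.0)]
    merged = merge_streams(building_a, building_b)
    for row in anonymize(downsample(merged, window=60.0), salt="demo"):
        print(row)
```

Each stage takes and returns ordinary iterables of records, so stages can be reordered or swapped out; a real platform would also need to handle late or out-of-order data, persistent storage, and multiple data formats, none of which this sketch attempts.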

The prototype version has been used across a variety of domains, including energy, manufacturing, transportation, agriculture, ecology, audiology, and education, showcasing the applicability and versatility of StreamCI. The platform will also serve as an educational resource, training the next generation of data- and AI-savvy researchers. To learn more about StreamCI and its prototype, please visit: https://mygeohub.org/groups/gabbs/aboutstreamci

Purdue University will host an upcoming training session for CI tools, including for StreamCI. The “Cyberinfrastructure for FAIR Science Workshop” takes place on August 4-5 at the Purdue University West Lafayette campus. Participants will familiarize themselves with and learn how to use the tools developed under an NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) project, in order to make research data and workflows more FAIR. The workshop activities will be organized around three themes:

  • Wrangling data - Resource-intensive data processing pipelines using the GeoEDF workflow library
  • Managing continuous data streams - Sensor Data Management with StreamCI
  • Cyber training - Interactive online learning using the CyberFaces platform

Nationwide registration has closed, but registration is still open to Purdue researchers who do not need travel assistance. To learn more about the “Cyberinfrastructure for FAIR Science Workshop,” or to register, please visit: https://www.cyberfaces.org/learn/course/289-cyberinfrastructure-for-fair-science-workshop

StreamCI is funded under NSF award No. 2513947.

Written by: Jonathan Poole, poole43@purdue.edu

Originally posted: