The D. H. Hill Jr. Library will be closed this summer for electrical infrastructure repairs, starting May 5, 2025. About the Hill Library closure →
Updated Apr 30 11:06am
Video still of growing microcolonies of the single-celled fungus, Saccharomyces cerevisiae, which is used to produce bread and wine. The green masks on top of some cells represent how a computer algorithm learns to detect mating yeast cells. (Miranda Lab)
Sometimes small problems lead to big solutions. When Plant and Microbial Biology researcher Orlando Arguello-Miranda needed a central place to store large image files for a 50-person workshop, he asked the Libraries’ Research Facilitation Service (RFS) for help. Simple enough, right?
“I reached out to the RFS for help because, when I came to NC State, I talked to representatives of the Libraries and they were really friendly and very open-minded to new ideas,” Arguello-Miranda says. “I remembered that they said ‘If you have any idea that might require processing information, or questions about how to store information, then you can come and talk to us. If we don't know, we'll get you in contact with somebody that knows.’”
In that conversation, the RFS opened larger questions about what Arguello-Miranda’s microscopy lab was looking for in these images and how best to help them find it. Soon, the RFS brought together the IT team at the College of Agriculture and Life Sciences (CALS) and NC State’s High-Performance Computing Services (HPC) to collaborate in building an AI-integrated web application to process lab images for user-designated information—a powerful tool that would be useful across the sciences.
Now, labs all over the world are using Arguello-Miranda’s application for all sorts of reasons—and further training the AI to improve the application, too.
The project exemplifies how the RFS fosters campus collaborations to make a big impact for researchers at NC State. As they encounter increasingly complex technology requirements, researchers like Arguello-Miranda are finding the expertise they need by consulting with the RFS to address their computing and data questions.
Looking for needles in haystacks
Arguello-Miranda wants to understand how cells divide. Cell division plays a fundamental role in diseases like cancer and in processes like healing or antibiotic resistance. “If you can see into the machinery of cells dividing, you can understand how and why it happens and, possibly, how to control its happening. But seeing these events is very difficult,” he says.
At the Miranda Lab, the research team works on microscopy and image processing. They use high-powered microscopy systems to take images of fungal pathogens—because they are particularly difficult to control—and then they analyze those images to look for specific kinds of events in the pathogens’ life cycles.
“We wanted to develop a tool that would allow users to process images faster than what most laboratories and most computational tools can achieve these days. Why? Because the processing of the information in those images has become a bottleneck of scientific progress,” Arguello-Miranda says.
“You could have, for example, a microscope taking thousands of images per day. But at the end of that day, you need to make sense of those images. You know that there is information in the images which is very important, but how do you retrieve that information from the images into numbers, quantifications, statistical tests?”
Pathogens, for example, reproduce sexually in order to spread. If we could witness this process, we could potentially interfere with it and stop a pathogen in its tracks. But this is easier said than done.
“In some pathogens, nobody has seen life cycle stages such as sexual reproduction because these are events that happen rarely,” Arguello-Miranda says. “You need to have a system to detect those events among thousands of images, and you cannot have a person looking through thousands of images to identify those rare events. So we needed a system that could allow you to process a lot of images very fast with the capacity to identify rare events in the life cycle of fungal pathogens.”
Arguello-Miranda and the CALS IT team had developed a prototype of just such a system, one that could detect and segment fungi in images. The lab was about to use the application in a workshop, but they were concerned that large-scale images from 50 participants were more than their process could handle. Worried that an application running off their laptops would crash with so many simultaneous users, Arguello-Miranda reached out to the RFS.
Scaling up and speeding up
“At first intake, this was: ‘We need some data storage support for this workshop.’ And then quickly it became: ‘You actually need data storage support and computing for the entire project that your lab is doing,’” says Moira Downey, Research Facilitator at the RFS.
Downey and her team found that Arguello-Miranda had a need for sheer computing power—not only were the image files very high resolution, they were also very diverse, ranging from microscopic images of yeast cells to photographs of entire leaves with fungal infections. They also saw that he could use their help in developing terms of use for the images that participants brought to the workshop, because he wanted to use those images to further train the AI model.
“The idea was for the application to be available via a web server—and without their researchers needing to learn how to use HPC or get HPC credentials because that slows everything down,” says Derek DeVries, Research Integration Consultant at the RFS. “It became clear really quickly that they needed a lot of extra computing power and specifically graphic processor unit (GPU) computing power. But their development team had never created a project of this scale.”
DeVries notes that the Miranda Lab could have purchased a powerful GPU server, but there would be a steep learning curve to integrate it, it would be running only intermittently, and they would have to figure out how to keep it updated and secure. Not a great investment-to-reward ratio—especially when there are HPC resources on the same campus.
DeVries sketched out two possible process flows and brought them to Eric Sills, Assistant Vice Chancellor of Shared Services, at the HPC. Weighing both the lab’s needs for their imminent workshop and their aspirations for the application’s wider use in the future, they chose a solution that would speed up the image processing by leveraging the cluster, while simplifying the application’s usability and access.
“Having everybody install the application on their own laptops and trying to figure out all the different dependencies on different systems was going to be really, really hard—that was what they were gearing up for,” DeVries says. “When we were able to come back in about a week with a working prototype, the lab was very excited.”
Once the prototype was functional, the team took a divide-and-conquer approach to its development, with Jamie Dennis of CALS IT helping student developers set up the frontend web server that could take advantage of the pipeline between OIT's Research Storage and the HPC.
At the workshop, Arguello-Miranda and his eight students talked with participants, took notes, and recorded errors. That beta test information informed further development, and a more robust application has now been made available. To date, it has been used by researchers in Japan, Nigeria, Finland, and several universities here in the United States for tasks as varied as examining cardiac cells and characterizing fungal infections on cucumber leaves. “It was exciting, and a privilege, to work on such a project,” says Surya Sukumar, the students’ team lead.
“Talking to the RFS enabled the creation of new ideas and solutions,” Arguello-Miranda says. “Those solutions required contact with other groups outside the Libraries, but the Libraries was well-connected with that ecosystem of collaborators, so the project suddenly became a team effort that encompassed people from the Libraries, the HPC, CALS IT, and the microbiologists and microscopists in my lab.”
“The Libraries put together a dynamic and highly organized team for us. A lot of people took part, and they ran with their own ideas and priorities, but always towards the goal of making something great at NC State. Now we're trying to understand how to best introduce this to a wider society, but I think everybody was really motivated by the fact that we could do something to solve a need that’s common to all research laboratories.”
A cupboard of readymade solutions
As new technologies like AI and quantum computing come over the horizon, researchers will find both opportunities and challenges in them. No one can be an expert in all of it, but a consultation with the RFS can help to gather that expertise. Currently, the RFS serves CALS, the College of Sciences (COS), the College of Humanities and Social Sciences (CHASS), and the College of Natural Resources (CNR). The service will continue expanding its reach to other portions of campus in a phased rollout approach.
“I think that being able to help more researchers do the kind of thing that we helped Orlando to accomplish is a good way to grow,” Downey says. “I think we'd like to help researchers think about managing their data—I'm a little bit of a hammer looking for nails with that.”
“Especially directly related to this project, we’re looking for solutions that can be easily reused for a lot of different research use cases,” DeVries says. “So this is a perfect example of taking something powerful but impenetrable like the HPC resources and turning it into something much more accessible.”
“We see this particular structure of an automated workflow that picks things off research storage, processes them in a consistent way, and spits them back onto research storage as potentially useful in research-based courses and labs that have very similar repetitive workflows. Part of the reason why we're excited by this project is for this potential for reuse, which we're already seeing. We have, I think, three other groups at the moment that have potential applications for the same kind of workflow.”
“The reusability piece is really key,” Downey says. “It's keeping people from having to re-invent the wheel, right? We've been identifying areas like documentation and demonstrating opportunities for workflows or tasks that a lot of researchers have to learn as we're shifting away from Google Drive as the primary place for infinite storage. Finding or building a consistent set of documentation and workflows that people can utilize can make that as painless as possible.”