Santa J. Ono, Ph.D. President at University of Michigan - Ann Arbor | Official website
Researchers at the University of Michigan have created an artificial intelligence tool that can analyze the behavior of single molecules in large datasets far faster than was previously possible. The tool, named META-SiM, is designed as a foundation model, meaning it is trained on a wide variety of experimental data and can perform many types of analysis.
Tracking the behavior of single molecules is important for understanding cellular processes and disease progression. Researchers typically tag molecules with fluorophores, excite them with lasers, and use microscopes to observe their behavior over time. However, the volume of data generated by this method makes manual analysis slow and prone to missed findings.
META-SiM addresses this challenge by scanning entire datasets to identify patterns or behaviors that may require further investigation. Unlike models built for specific tasks, META-SiM’s broader training allows it to adapt to different kinds of biological data.
Nils Walter, co-director of the Center for RNA Biomedicine and senior author of the study published in Nature Methods, said: “The idea is to grow from single molecules to any larger scale. In principle, data have similarities to one another, and this AI algorithm is able to find out what those similarities are—as well as any deviations—no matter what scale you’re working at. We could also track, say, the movement of wildebeests across Kenya and Tanzania, or even potentially celestial bodies moving across the universe.”
The development team trained META-SiM on millions of simulated traces representing molecular behaviors observed in laboratory settings. According to Walter, one potential application is identifying errors in the splicing of genetic information, a process implicated in about 60% of human genetic diseases.
Alexander Johnson-Buck, a co-author and research scientist at U-M, likened the search for significant molecular events in large datasets to a children’s puzzle book: “Doing analysis on large data sets like our single molecule fluorescence microscopy data is like doing a Where’s Waldo? puzzle where you’re trying to find Waldo,” he said. “Except maybe instead of a single page, it’s hidden on dozens of pages or more, and maybe you don’t know what Waldo looks like, and there might be multiple Waldos.”
While META-SiM cannot directly pinpoint every instance scientists are looking for (“Waldo”), it highlights areas worth closer examination. Walter explained: “It accelerates analysis and finds the key things that you would normally have to sift through the data for half a year or so to find basically overnight.”
Johnson-Buck added: “You will still need an expert to interpret that discovery and to put it into context, but it makes the discovery aspect potentially a lot faster.”
The study was led by Jieming Li and Leyou Zhang with support from the National Institutes of Health.