New machine learning algorithm finds genetic signature characteristic of tumors
How do cancer cells differ from healthy cells? A new machine-learning algorithm called “ikarus” knows the answer, reports a team led by MDC bioinformatician Altuna Akalin in the journal Genome Biology. The AI program found a genetic signature characteristic of tumors.
When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find regularities in datasets – whether for stock market analysis, image and speech recognition or cell classification. To reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, Head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine of the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is common to different types of cancer, consisting of a characteristic combination of genes. According to the team’s paper in the journal Genome Biology, the algorithm also detected types of genes in the pattern that had never been clearly linked to cancer before.
Machine learning basically means that an algorithm uses training data to learn how to answer certain questions on its own. To do this, it looks for patterns in the data that help it solve problems. After the learning phase, the system can generalize from what it has learned in order to evaluate unknown data.
It was a major challenge to get proper training data where experts had already clearly distinguished between “healthy” and “cancerous” cells.
Jan Dohmen, first author of the article
Surprisingly high success rate
Additionally, single-cell sequencing datasets are often noisy. This means that the information they contain about the molecular characteristics of individual cells is not very precise – perhaps because a different number of genes are detected in each cell, or because the samples are not always treated the same way. As Dohmen and his colleague Dr. Vedran Franke, co-lead of the study, report, they sifted through countless publications and contacted a number of research groups in order to obtain adequate data sets. The team ultimately used data from lung and colorectal cancer cells to train the algorithm before applying it to datasets of other tumor types.
In the learning phase, ikarus had to find a list of characteristic genes which it then used to categorize the cells. “We tried and refined various approaches,” says Dohmen. It was a long-term job, as the three scientists tell us. “The key was for ikarus to ultimately use two lists: one for cancer genes and one for genes from other cells,” Franke explains. After the training phase, the algorithm was able to reliably distinguish healthy cells from tumor cells in other types of cancer as well, such as in tissue samples from patients with liver cancer or a neuroblastoma. Its success rate tended to be extraordinarily high, which surprised even the research group. “We didn’t expect there to be a common signature that so precisely defines tumor cells from different cancer types,” says Akalin. “But we still can’t say if the method works for all types of cancer,” adds Dohmen. To make ikarus a reliable cancer diagnostic tool, researchers now want to test it on other types of tumors.
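The two-list idea Franke describes can be illustrated with a deliberately simplified sketch. It is not the ikarus implementation: the gene names, the expression values and the rule "pick the list with the higher average expression" are all assumptions made here for illustration only. Each cell is scored against a hypothetical tumor gene list and a hypothetical normal gene list, using only the genes that were actually detected in that cell.

```python
from statistics import mean

# Hypothetical placeholder gene lists (assumption): NOT the signature found by ikarus.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2", "GENE_N3"]


def list_score(expression, genes):
    """Mean expression over the genes of a list that were detected in the cell.

    Genes missing from the cell are skipped, since the number of detected
    genes varies from cell to cell in single-cell data.
    """
    detected = [expression[g] for g in genes if g in expression]
    return mean(detected) if detected else 0.0


def classify_cell(expression):
    """Label a cell by comparing its two list scores (illustrative rule only)."""
    tumor = list_score(expression, TUMOR_GENES)
    normal = list_score(expression, NORMAL_GENES)
    return "tumor" if tumor > normal else "normal"


# Made-up expression values for two cells (assumption, for demonstration only).
cells = {
    "cell_1": {"GENE_T1": 5.0, "GENE_T2": 4.0, "GENE_N1": 1.0},
    "cell_2": {"GENE_T1": 0.5, "GENE_N1": 3.0, "GENE_N2": 2.5, "GENE_N3": 4.0},
}

labels = {name: classify_cell(expr) for name, expr in cells.items()}
print(labels)
```

In this toy setting the first cell is labeled as a tumor cell and the second as normal. A real classifier would learn both the gene lists and the decision rule from annotated training data rather than having them fixed by hand.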
AI as a fully automated diagnostic tool
The project aims to go well beyond the classification of “healthy” versus “cancerous” cells. In the first tests, ikarus has already demonstrated that the method can also distinguish other cell types (and some subtypes) from tumor cells. “We want to make the approach more comprehensive,” says Akalin, “by developing it further so that it can distinguish between all possible cell types in a biopsy.”
In hospitals, pathologists usually examine tissue samples from tumors only under a microscope to identify the different cell types. It is laborious and time-consuming work. With ikarus, this step could one day become a fully automated process. Akalin also notes that the data could be used to draw conclusions about the immediate environment of the tumor. This could help doctors choose the best therapy, because the composition of the cancerous tissue and its microenvironment often indicates whether a certain treatment or drug will be effective. Moreover, AI can also be useful for developing new drugs. “Ikarus allows us to identify genes that are potential drivers of cancer,” says Akalin. New therapeutic agents could then be used to target these molecular structures.
A remarkable aspect of the publication is that it was prepared entirely during the COVID pandemic. Not everyone involved was at their usual offices at the Berlin Institute for Medical Systems Biology (BIMSB), part of the MDC. Instead, they worked from home and communicated with each other only digitally. For Franke, therefore, “the project shows that a digital structure can be created to facilitate scientific work under these conditions.”
Max Delbrück Center for Molecular Medicine of the Helmholtz Association
Dohmen, J. et al. (2022) Identifying Tumor Cells at the Single-Cell Level Using Machine Learning. Genome Biology. doi.org/10.1186/s13059-022-02683-1.