Genetic Prediction of Cancer Recurrence: Scientists Verify Reliability of Computer Models

In biomedical research, machine learning algorithms are often used to analyse data, for instance to predict cancer recurrence. However, it is not always clear whether these algorithms are detecting meaningful patterns or merely fitting random noise in the data. Scientists from HSE University, IBCh RAS, and Moscow State University have developed a test that makes it possible to tell the two apart. It could become an important tool for verifying the reliability of algorithms in medicine and biology. The study has been published on arXiv.

Machine learning methods help analyse complex biological data, for example to predict the likelihood of cancer recurrence from gene expression, which reflects the activity levels of specific DNA regions within cells.

A team of scientists from HSE University, IBCh RAS, and Moscow State University has developed a test to assess how reliably the classifier distinguishes between different patient groups. In this case, the two groups were patients who experienced a recurrence of the disease and those who did not. A model performs correctly if it effectively captures biologically meaningful differences. If the algorithm simply separates the data at random, its accuracy may appear deceptively high. The researchers focused on linear classifiers, one of the most widely used ML tools in biomedicine.
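
To see why accuracy alone can mislead, consider a minimal Python sketch (an illustration, not the authors' test). When there are more features than samples, a linear classifier can fit labels that were assigned by coin flip. The perceptron, the sample sizes, and the names `train_perceptron` and `accuracy` are assumptions made for this example.

```python
import random

def train_perceptron(X, y, max_epochs=5000):
    """Fit a linear classifier sign(w . x + b) with the perceptron rule."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: nudge the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return w, b

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += (1 if score > 0 else -1) == yi
    return hits / len(y)

rng = random.Random(0)
n_samples, n_features = 20, 40  # hypothetical: fewer "patients" than "genes"
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [rng.choice([-1, 1]) for _ in range(n_samples)]  # labels carry no signal

w, b = train_perceptron(X, y)
train_acc = accuracy(w, b, X, y)
print(f"training accuracy on pure noise: {train_acc:.2f}")
```

In this setting the classifier should separate the training data perfectly even though the labels are random, because 20 points in 40 dimensions are almost always linearly separable. A high score on the training set therefore says little about whether a real pattern was found.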

'We aimed to test whether randomly generated (synthetic) data could be separated by a linear classifier as effectively as real biological samples. To do this, we calculated an upper bound on the p-value, which indicates the likelihood that the model is merely "guessing." The lower this p-value, the more reliable the classifier,' explains Anton Zhiyanov, Research Fellow at the HSE Laboratory of Molecular Physiology. 
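
The article does not give the formula for the analytical bound, so it is not reproduced here. As a rough stand-in, the underlying idea can be illustrated with a Monte Carlo permutation test: shuffle the labels many times and count how often a simple linear classifier fits the shuffled labels at least as well as the real ones. The nearest-centroid classifier, the synthetic data, and all function names below are assumptions for illustration only.

```python
import random

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid rule (a linear classifier)."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    hits = 0
    for x, label in zip(X, y):
        predicted = 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0
        hits += predicted == label
    return hits / len(y)

def permutation_p_value(X, y, n_perm=200, seed=0):
    """Share of label shuffles that fit at least as well as the real labels."""
    rng = random.Random(seed)
    observed = centroid_accuracy(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if centroid_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (1 + at_least_as_good) / (1 + n_perm)

rng = random.Random(1)
y = [0] * 10 + [1] * 10
# Hypothetical data: class 1 is shifted by 6 standard deviations on each axis.
X_signal = [[rng.gauss(6.0 * label, 1), rng.gauss(6.0 * label, 1)] for label in y]
# Hypothetical data: both classes are drawn from the same distribution.
X_noise = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in y]

p_signal = permutation_p_value(X_signal, y)
p_noise = permutation_p_value(X_noise, y)
print(f"p-value with real class difference: {p_signal:.3f}")
print(f"p-value with no class difference:   {p_noise:.3f}")
```

A small p-value means that shuffled labels rarely separate as well as the real ones. An analytical upper bound such as the one the researchers describe avoids the cost of repeated shuffling, which matters when hundreds of models have to be checked.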

The researchers conducted a series of experiments using synthetic data, allowing them to precisely control the degree of differences between classes. They then applied the new test to real-world medical models that predict the risk of breast cancer recurrence. 
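
A controlled experiment of this kind might look like the following sketch. It is built on our own assumptions rather than the study's setup: two Gaussian classes whose means differ by a tunable `delta`, with a nearest-centroid classifier scored on a fresh sample. The names `make_classes`, `fit_centroids`, and `held_out_accuracy` are hypothetical.

```python
import random

def make_classes(delta, n_per_class, rng):
    """Two 2-D Gaussian classes whose means differ by `delta` on each axis."""
    X = [[rng.gauss(delta * label, 1), rng.gauss(delta * label, 1)]
         for label in (0, 1) for _ in range(n_per_class)]
    y = [label for label in (0, 1) for _ in range(n_per_class)]
    return X, y

def fit_centroids(X, y):
    """Return the mean point of class 0 and of class 1."""
    centroids = []
    for label in (0, 1):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids.append([sum(col) / len(pts) for col in zip(*pts)])
    return centroids

def held_out_accuracy(delta, n_per_class=100, seed=0):
    """Fit on one synthetic sample, then score on an independent one."""
    rng = random.Random(seed)
    X_train, y_train = make_classes(delta, n_per_class, rng)
    X_test, y_test = make_classes(delta, n_per_class, rng)
    c0, c1 = fit_centroids(X_train, y_train)
    hits = 0
    for x, label in zip(X_test, y_test):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        hits += (1 if d1 < d0 else 0) == label
    return hits / len(y_test)

# delta = 0 means the classes are identical; larger delta means a clearer gap.
results = {delta: held_out_accuracy(delta) for delta in (0.0, 1.0, 3.0)}
for delta, acc in results.items():
    print(f"delta={delta:.1f}  held-out accuracy={acc:.2f}")
```

With `delta` set to zero the held-out accuracy should hover around chance, and it should rise as the classes are pulled apart. Knowing the true `delta` in advance is what makes synthetic data useful for checking whether a reliability test responds as expected.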

The results showed that most classifiers failed to capture any meaningful differences between patients with and without recurrence. Further analysis revealed that 559 out of 570 models produced results consistent with random chance. This suggests that many algorithms may appear accurate, while in reality their predictions are driven by coincidences rather than genuine patterns.

However, the researchers also identified reliable models that reveal biologically meaningful patterns. One such model was a classifier that focused on the activity levels of the ELOVL5 and IGFBP6 genes. This algorithm was further tested on an independent data sample, confirming that differences in the expression of these genes are indeed linked to the risk of cancer recurrence.

Each point on the graph represents a patient, with the expression levels of two genes measured: IGFBP6 on the X-axis and ELOVL5 on the Y-axis. The orange dots represent patients with a recurrence, while the blue dots represent those without. In the first graph, these points (patients) are clearly separated by a straight line, representing a linear classifier. In the second graph, the points are randomly distributed, and the classifier fails to identify any patterns between gene expression and actual recurrence.

'Our test could become an important tool for verifying the reliability of algorithms in biology and medicine. It helps prevent false conclusions and emphasises models that truly identify important patterns, which is crucial for making decisions about patient treatment,' comments Alexander Tonevitsky, Professor at the HSE Faculty of Biology and Biotechnology.

The study was conducted with support from HSE University's Basic Research Programme within the framework of the Centres of Excellence project.
