Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong. Rice University statistician Dr Genevera Allen warned at the American Association for the Advancement of Science (AAAS) meeting in Washington this week that scientists are leaning on machine-learning algorithms to find patterns in data even when the algorithms are merely fixating on noise that won't be reproduced by another experiment.
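A minimal sketch of the failure mode Allen describes, assuming a deliberately artificial setting: the features and labels below are independent random numbers, so there is no real pattern to find. A 1-nearest-neighbour classifier (a hypothetical choice of flexible model, not one named in the source) still scores perfectly on the data it was fitted to, yet performs at chance on an independent "replication" drawn the same way. The helper names and sample sizes are illustrative assumptions.

```python
import random


def make_noise_data(n, dims, rng):
    """Hypothetical helper: features and binary labels drawn independently,
    so there is no genuine relationship between them."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Return the label of the closest training point (squared Euclidean distance)."""
    best_i = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best_i]


def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == label for x, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
train_X, train_y = make_noise_data(200, 5, rng)  # the original "experiment"
test_X, test_y = make_noise_data(1000, 5, rng)   # an independent replication

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"training accuracy:    {train_acc:.2f}")  # 1.00: every point is its own nearest neighbour
print(f"replication accuracy: {test_acc:.2f}")   # expected to hover around chance (0.50)
```

The apparent "pattern" on the training data is pure memorisation of noise, which is exactly why it does not reproduce.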
“The reproducibility issue Dr Genevera Allen highlights is a typical overfitting problem, where the models end up describing the data rather than the underlying feature which the data are representing”, says Dr Geir Rune Flåten, Chief Solutions Officer at Camo Analytics.
“This is a key challenge in all modelling and high on the agenda in all Camo courses and projects. In both our analysis tools and our approach, we apply rigorous validation and leverage domain knowledge to provide reliable and verifiable conclusions. Modelling is not a magic wand solving all problems but rather a powerful tool which can generate fantastic insight if used properly”, says Flåten.
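As a generic illustration of what validation buys you (not a description of Camo's tools or procedures, which the article does not detail), the sketch below uses k-fold cross-validation: each model is scored only on data it was not fitted to. The toy dataset, the k-nearest-neighbour classifier, the 20% label-noise rate and the candidate values of k are all hypothetical choices. The most flexible model (k=1) again looks perfect on its own training data, while its cross-validated accuracy reveals how much of that fit was noise.

```python
import random


def make_signal_data(n, flip_rate, rng):
    """Hypothetical toy data: the label is 1 when x0 > 0, but a fraction
    `flip_rate` of labels is flipped at random. x1 is irrelevant."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        label = 1 if x[0] > 0 else 0
        if rng.random() < flip_rate:
            label = 1 - label
        X.append(x)
        y.append(label)
    return X, y


def predict_knn(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (use odd k to avoid ties)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(X, y, k, folds, seed):
    """Mean held-out accuracy over `folds` disjoint splits.
    A fixed seed gives every candidate k the same splits."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        held = set(idx[f::folds])
        tr_X = [X[i] for i in idx if i not in held]
        tr_y = [y[i] for i in idx if i not in held]
        hits = sum(predict_knn(tr_X, tr_y, X[i], k) == y[i] for i in held)
        scores.append(hits / len(held))
    return sum(scores) / folds


rng = random.Random(1)
X, y = make_signal_data(300, 0.2, rng)

# Scoring on the training data itself: 1-NN looks flawless.
train_acc_1nn = sum(predict_knn(X, y, x, 1) == t for x, t in zip(X, y)) / len(y)
cv_scores = {k: cross_val_accuracy(X, y, k, folds=5, seed=42) for k in (1, 5, 15, 31)}
best_k = max(cv_scores, key=cv_scores.get)

print(f"1-NN training accuracy: {train_acc_1nn:.2f}")  # 1.00
for k, score in cv_scores.items():
    print(f"k={k:2d}  5-fold CV accuracy: {score:.2f}")
print("k chosen by cross-validation:", best_k)
```

The design choice worth noting is that model selection is driven by held-out scores only; the training score is printed solely to show how misleading it is.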
From Nature’s survey of 1,576 researchers who took a brief online questionnaire on reproducibility in research.