Curious findings about medical image datasets
Prof. Veronika Cheplygina PhD, IT University of Copenhagen
Abstract
It may seem intuitive that we need high-quality datasets to ensure robust algorithms for medical image classification. With the introduction of larger, openly available datasets, it might seem that the problem has been solved. However, this is far from the case: even these datasets suffer from issues such as label noise and shortcuts or confounders. Furthermore, there are behaviours in our research community that threaten the validity of published findings. In this talk I will discuss both types of issues with examples from recent papers.
Relevant Papers:
- Copycats: the many lives of a publicly available medical imaging dataset
- Data usage and citation practices in medical imaging conferences
- Augmenting Chest X-ray Datasets with Non-Expert Annotations
- Machine learning for medical imaging: methodological failures and recommendations for the future
Speaker Bio
Prof. Veronika Cheplygina’s research focuses on meta-research in the fields of machine learning and medical image analysis. She received her Ph.D. from Delft University of Technology in 2015. After a postdoc at the Erasmus Medical Center, in 2017 she started as an assistant professor at Eindhoven University of Technology. In 2020, failing to achieve various metrics, she left the tenure track in search of the next step where she could contribute to open and inclusive science. In 2021 she started as an associate professor at IT University of Copenhagen, and was recently appointed as full professor at the same university. Next to research and teaching, Veronika blogs about academic life at https://www.veronikach.com. She also loves cats, which you will often encounter in her work.
Time & Place
Wednesday, July 23, 2025
13:45 – 14:30
Reisensburg Castle