Ground-truth in a medical machine learning project

  • By: Marek Pitura

Objective, well-prepared data is the foundation of any medical machine learning project. After all, a model trained on a very good data set is far more likely to produce very good results. This is obvious. But only recently, in one of our own machine learning projects, did we find out just how demanding professional ground-truth preparation is.

Ground-truth in a medical machine learning project – the challenge

In one of our recent medical machine learning projects, we faced a major organizational challenge. The client had several hundred brain MRI scans available for us to use. This was very good news: that many studies augured well for the results of the work. However, one obstacle prevented their immediate use. The studies were “raw”, that is, not annotated or labeled in a way that would allow them to be used to develop artificial intelligence models. Our task therefore also came to include preparing the data “from scratch” so that it could be used in the project. And the data had to be prepared in just a few months.

Ground-truth in a medical machine learning project – the preparation

Preparing several hundred studies to the standard we needed in order to use them successfully in a medical machine learning project required a completely new approach. We had to organize the whole process, including both the annotation of the data and the verification of the accuracy of those annotations. We realized that both are crucial to the project: if we train on poor-quality data, we should expect equally poor results from the model.


The project involved medical imaging, so to prepare the data we engaged radiologists with the expertise needed to analyze an MRI study properly. We assembled a team of seven physicians, including two specialists with more than 10 years of experience in brain MRI diagnosis. A major challenge was to keep any single physician’s personal experience from influencing the result, which is a common problem in the analysis of medical imaging data. It was important to make the analysis of each study as objective as possible.


To achieve this, the doctors were divided into two teams. The first, consisting of five people, was in charge of preparing the annotations, that is, marking the areas of interest in this project on the studies, divided into four subregions. The second team, consisting of the two most experienced radiologists, evaluated each annotation and could either accept the study or refer it back for improvement. Each study was thus annotated by one specialist and then reviewed by a second, more experienced one, so every annotation was checked twice: once during its preparation and once during its evaluation. Our task was to ensure a smooth flow of studies and annotations between the doctors and to monitor the whole process.
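The flow of a study between the two teams can be pictured as a small state machine. The sketch below is only an illustration of that idea, not the tooling used in the project: the status names, the `Study` class, the functions and the radiologist identifiers (`ann_1`, `rev_1` and so on) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    PENDING = "pending"                # no annotation yet
    ANNOTATED = "annotated"            # annotated, waiting for review
    NEEDS_REVISION = "needs_revision"  # referred back for improvement
    ACCEPTED = "accepted"              # approved by a reviewer


# Hypothetical identifiers: five annotating radiologists, two senior reviewers.
ANNOTATORS = {"ann_1", "ann_2", "ann_3", "ann_4", "ann_5"}
REVIEWERS = {"rev_1", "rev_2"}


@dataclass
class Study:
    study_id: str
    status: Status = Status.PENDING
    annotator: Optional[str] = None
    accepted_by: List[str] = field(default_factory=list)
    history: List[Tuple[str, str]] = field(default_factory=list)  # (who, action)


def submit_annotation(study: Study, annotator: str) -> None:
    """Record a new or corrected annotation and queue the study for review."""
    if annotator not in ANNOTATORS:
        raise ValueError(f"{annotator} is not on the annotation team")
    if study.status not in (Status.PENDING, Status.NEEDS_REVISION):
        raise ValueError(f"cannot annotate a study in state {study.status.name}")
    study.annotator = annotator
    study.status = Status.ANNOTATED
    study.history.append((annotator, "annotated"))


def review(study: Study, reviewer: str, accept: bool) -> None:
    """Accept the annotation or refer the study back for improvement."""
    if reviewer not in REVIEWERS:
        raise ValueError(f"{reviewer} is not on the evaluation team")
    if study.status is not Status.ANNOTATED:
        raise ValueError(f"cannot review a study in state {study.status.name}")
    if accept:
        study.status = Status.ACCEPTED
        study.accepted_by.append(reviewer)
        study.history.append((reviewer, "accepted"))
    else:
        study.status = Status.NEEDS_REVISION
        study.history.append((reviewer, "referred back"))
```

Keeping a `history` list per study is one simple way to monitor the process: it shows at a glance who touched a study and how many revision rounds it needed.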


In our medical machine learning project, we placed special emphasis on validating the model and evaluating its performance. For this reason, after the annotation process had been completed and all studies had been accepted by the experts, we selected a test set from the studies that had not been used for training. The annotations of the studies in the test set were then passed for evaluation to the other member of the evaluation team, the one who had not previously evaluated that particular study.

As a result, each study in the test set, together with its annotations, was checked three times: first when the annotations were prepared, second when they were evaluated, and third by the other expert. Only once a study had been accepted by both radiologists with more than 10 years of experience could it be used for model validation.

Objective analysis of the study

The process described above allowed us to ensure a high degree of objectivity in the analysis of each study and to reduce the risk of an incorrect assessment resulting from the personal experience of a single doctor.


To sum up, a lot depends on reliable execution of the ground-truth stage in a medical machine learning project. Objective, properly prepared data on which to train the model is what makes the expected results achievable. And very good results are, after all, what makes the entire project a success.

See the previous post by Marek Pitura: Features of an “ideal” grant partner


About The Author

Marek Pitura
Medical products manager in the Future Processing Healthcare Unit, with a Master's degree in sociology and a demonstrated history of working in information and communication technologies. In everyday work, I try to combine a human approach with business requirements by developing and applying procedures and workflows that allow for the evaluation and diagnosis of both human and market-oriented project needs.