In this workshop, we will discuss Vision Transformer classification applied to medical diagnosis. After introducing Transformers and the self-attention mechanism, we will compare the results of the ViT architecture with those of a classic CNN and of a multistage architecture composed of successive CNNs.
In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools, primarily based on Convolutional Neural Networks (CNNs), that could improve clinicians’ diagnosis of bone fractures. However, their accuracy in discriminating between fracture subtypes has remained far from optimal. The aim of this study is to evaluate a new CAD system based on Vision Transformers (ViT) and to assess whether it can improve clinicians’ diagnostic accuracy.
To demonstrate this, we will discuss an evaluation in which 11 clinicians were asked to classify 150 proximal femur fracture images with and without the help of the ViT.
Workshop led by Leonardo Tanzi, PhD student at the Polytechnic University of Turin.