In the workshop, we discussed Vision Transformer (ViT) classification applied to medical diagnosis. After introducing Transformers and the self-attention mechanism, we presented the results of the ViT architecture compared with a classic CNN and with a multistage architecture composed of successive CNNs. We then presented the underlying study.

In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools, primarily based on Convolutional Neural Networks (CNNs), that could improve clinicians' diagnosis of bone fractures. However, their accuracy in discerning fracture subtypes was far from optimal. The aim of the study was to evaluate a new CAD system based on ViTs and to assess whether it could improve clinicians' diagnostic accuracy. To this end, we discussed an evaluation in which 11 clinicians were asked to classify 150 proximal femur fracture images with and without the help of the ViT.
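As a reminder of the self-attention mechanism introduced in the workshop, here is a minimal single-head sketch of scaled dot-product self-attention in pure Python. In a ViT, each row of `x` would be the embedding of one image patch; every patch computes attention weights over all patches and returns a weighted mix of their value vectors. This is an illustrative toy under stated assumptions (tiny hand-written weight matrices, no multi-head split, no positional embeddings), and the function names are hypothetical, not taken from the workshop's repository.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (hypothetical helper).

    x: list of token (patch) embeddings, shape (n_tokens x d_model).
    w_q, w_k, w_v: projection matrices, shape (d_model x d_k).
    Returns (outputs, attention_weights).
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(k[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each row becomes a probability distribution over all tokens.
    weights = [softmax(row) for row in scaled]
    # Each output token is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy example: three "patch" embeddings of dimension 2, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
for row in attn:
    print([round(p, 3) for p in row])
```

Each row of the attention matrix sums to 1, so every patch's output is a convex combination of all patches' values. This global mixing is what lets a ViT relate distant regions of an image in a single layer, whereas a CNN's receptive field grows only gradually with depth.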
Link to the article on Medium
Link to the published paper
Link to the YouTube video of the event
Link to the GitHub repository