Minimally invasive surgeries and related applications demand surgical tool classification and segmentation at the instance level. Surgical tools are similar in appearance and are often long, thin, and handled at an angle. State-of-the-art (SOTA) instance segmentation models trained on natural images, even when fine-tuned for instrument segmentation, struggle to differentiate between instrument classes. Our research demonstrates that while the bounding box and segmentation mask are often accurate, the classification head frequently assigns the wrong class label to the surgical instrument. Based on this insight, we present a new neural network framework that adds a classification module as a new stage to existing instance segmentation models. This module specializes in improving the classification of instrument masks generated by the existing model. It comprises multi-scale mask attention, which attends to the instrument region and masks out distracting background features. We train the classification module using metric learning with arc loss to handle the low inter-class variance of surgical instruments. We conduct exhaustive experiments on the benchmark datasets EndoVis2017 and EndoVis2018. Our method outperforms all of the more than 20 SOTA methods we compare against, improves SOTA performance by at least 12 points (20 percent) on the EndoVis2017 benchmark challenge, and generalizes effectively across the datasets.
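As an illustration of the two ingredients the abstract names, the sketch below shows one minimal PyTorch take on (i) mask attention applied at multiple feature scales and (ii) an ArcFace-style angular-margin ("arc") loss for metric learning. All module names, feature dimensions, and the fusion scheme here are our assumptions for exposition, not the authors' released implementation.

```python
# Minimal sketch (assumed design, not the paper's code): mask-attended
# multi-scale pooling of backbone features, followed by an angular-margin
# classification loss in the ArcFace style.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleMaskAttention(nn.Module):
    """Pools backbone features inside a predicted instrument mask at
    several scales, so background context cannot dominate the embedding."""

    def __init__(self, in_channels, embed_dim=256):
        super().__init__()
        # One 1x1 projection per feature scale (channel counts are assumed).
        self.proj = nn.ModuleList(nn.Conv2d(c, embed_dim, 1) for c in in_channels)
        self.fc = nn.Linear(embed_dim * len(in_channels), embed_dim)

    def forward(self, features, mask):
        # features: list of (B, C_i, H_i, W_i) maps; mask: (B, 1, H, W) in [0, 1]
        pooled = []
        for f, proj in zip(features, self.proj):
            m = F.interpolate(mask, size=f.shape[-2:], mode="bilinear",
                              align_corners=False)
            f = proj(f) * m  # attend to the instrument region, suppress background
            pooled.append(f.sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1e-6))
        return self.fc(torch.cat(pooled, dim=1))  # (B, embed_dim)


class ArcMarginHead(nn.Module):
    """ArcFace-style head: adds an angular margin m to the target-class
    logit on the unit hypersphere, which tightens each class cluster,
    useful when inter-class appearance variance is low."""

    def __init__(self, embed_dim, num_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```

In this reading, the margin forces embeddings of visually similar instruments (e.g., two grasper types) apart by a fixed angle, which a plain softmax head has no incentive to do once its logits are already separable.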
@inproceedings{baby2023forks,
  title={From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments},
  author={Baby, Britty and Thapar, Daksh and Chasmai, Mustafa and Banerjee, Tamajit and Dargan, Kunal and Suri, Ashish and Banerjee, Subhashis and Arora, Chetan},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={6191--6201},
  year={2023}
}