Publications

January 2022

The dynamics of pathology dataset creation using urine cytology as an example.

McAlpine E, Michelow P, Celik T. Acta Cytologica. 2022. 66:46–54. doi: 10.1159/000519273.

Full text article available here.

Introduction: Dataset creation is one of the first tasks required for training AI algorithms but is underestimated in pathology. High-quality data are essential for training algorithms and data should be labelled accurately and include sufficient morphological diversity. The dynamics and challenges of labelling a urine cytology dataset using The Paris System (TPS) criteria are presented.

Methods: 2,454 images were labelled by pathologist consensus via video conferencing over a 14-day period. During the labelling sessions, the dynamics of the labelling process were recorded. Quality assurance images were randomly selected from images labelled in previous sessions within this study and randomly distributed throughout new labelling sessions. To assess the effect of time on the labelling process, the labelled set of images was split into 2 groups according to the median relative label time, and the time taken to label images and the intersession agreement were assessed.

Results: Labelling sessions ranged from 24 m 11 s to 41 m 06 s in length, with a median of 33 m 47 s. The majority of the 2,454 images were labelled as benign urothelial cells, with atypical and malignant urothelial cells more sparsely represented. The time taken to label individual images ranged from 1 s to 42 s with a median of 2.9 s. Labelling times differed significantly among categories, with the median label time for the atypical urothelial category being 7.2 s, followed by the malignant urothelial category at 3.8 s and the benign urothelial category at 2.9 s. The overall intersession agreement for quality assurance images was substantial. The level of agreement differed among classes of urothelial cells: the benign and malignant urothelial cell classes showed almost perfect agreement, and the atypical urothelial cell class showed moderate agreement. Image labelling appeared to speed up as sessions progressed, and there was no evidence that intersession agreement worsened with session time.

Discussion/Conclusion: Important aspects of pathology dataset creation are presented, illustrating the significant resources required for labelling a large dataset. We present evidence that the time taken to categorise urine cytology images varies by diagnosis/class. The known challenges relating to the reproducibility of the AUC (atypical) category in TPS when compared to the NHGUC (benign) or HGUC (malignant) categories are also confirmed.
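
As an illustration of how intersession agreement of the kind described above might be quantified, the sketch below computes Cohen's kappa between the original label and the repeat label of each quality assurance image, then maps the value to the Landis and Koch descriptive bands ("moderate", "substantial", "almost perfect"). The abstract does not name the statistic used, so the choice of Cohen's kappa is an assumption, and the function names and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(first) != len(second) or not first:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(first)
    # Proportion of images given the same label in both sessions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    first_counts, second_counts = Counter(first), Counter(second)
    expected = sum(first_counts[c] * second_counts[c] for c in first_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sessions used one identical label throughout
    return (observed - expected) / (1 - expected)


def per_class_kappa(first, second, label):
    """One-vs-rest kappa for a single class (assumed approach, not from the paper)."""
    return cohen_kappa([x == label for x in first], [x == label for x in second])


def landis_koch(kappa):
    """Descriptive band for a kappa value, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


# Hypothetical quality assurance labels: original session vs. repeat session.
original = ["benign"] * 6 + ["atypical"] * 2 + ["malignant"] * 2
repeat = ["benign"] * 6 + ["atypical", "benign"] + ["malignant"] * 2

overall = cohen_kappa(original, repeat)
atypical = per_class_kappa(original, repeat, "atypical")
print(landis_koch(overall), landis_koch(atypical))
```

In this toy example a single disagreement on an atypical image lowers the atypical-class kappa more than the overall kappa, because that class has few members.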


January 2022

The Utility of Unsupervised Machine Learning in Anatomic Pathology

McAlpine E, Michelow P, Celik T. American Journal of Clinical Pathology. 2022. 157:5–14. doi: 10.1093/AJCP/AQAB085

Article available here.

Introduction: Developing accurate supervised machine learning algorithms is hampered by the lack of representative annotated datasets. Most data in anatomic pathology are unlabeled, and creating large, annotated datasets is a time-consuming and laborious process. Unsupervised learning, which does not require annotated data, has the potential to assist with this challenge. This review aims to introduce the concept of unsupervised learning and illustrate how clustering, generative adversarial networks (GANs) and autoencoders have the potential to address the lack of annotated data in anatomic pathology.

Methods: A review of unsupervised learning with examples from the literature was carried out.

Results: Clustering can be used as part of semisupervised learning where labels are propagated from a subset of annotated data points to remaining unlabeled data points in a dataset. GANs may assist by generating large amounts of synthetic data and performing color normalization. Autoencoders allow training of a network on a large, unlabeled dataset and transferring learned representations to a classifier using a smaller, labeled subset (unsupervised pretraining).

Discussion/Conclusion: Unsupervised machine learning techniques such as clustering, GANs, and autoencoders, used individually or in combination, may help address the lack of annotated data in pathology and improve the process of developing supervised learning models.
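
The label propagation idea mentioned in the Results can be sketched as follows: cluster all data points, then give every unlabeled point the majority label of the annotated points in its cluster. This is a minimal illustration, not the method of any study cited in the review. The use of k-means, the two-dimensional feature vectors, and the function names are all assumptions; in practice the features would come from image embeddings.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Initial centroids are the first k points,
    which keeps this sketch deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment


def propagate_labels(assignment, known):
    """Spread labels within clusters by majority vote.

    `known` maps point index -> label for the annotated subset.
    Points in clusters with no annotated member stay None.
    """
    votes = {}
    for index, label in known.items():
        votes.setdefault(assignment[index], Counter())[label] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [known.get(i, majority.get(c)) for i, c in enumerate(assignment)]


# Hypothetical feature vectors; only the first two points are annotated.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (0.1, 0.3), (5.2, 4.9), (4.8, 5.0)]
known = {0: "benign", 1: "malignant"}

assignment = kmeans(points, k=2)
labels = propagate_labels(assignment, known)
print(labels)
```

Propagated labels are only as good as the clustering, so a real workflow would review a sample of them before training a classifier.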


July 2021

Challenges Developing Deep Learning Algorithms in Cytology

McAlpine E, Pantanowitz L, Michelow P. Acta Cytologica. 2021. 65:301–309. doi: 10.1159/000510991

Article available here.

Background: The incorporation of digital pathology into routine pathology practice is becoming more widespread. Definite advantages exist with respect to the implementation of artificial intelligence (AI) and deep learning in pathology, including cytopathology. However, there are also unique challenges in this regard.

Summary: This review discusses cytology-specific challenges, including the need to implement digital cytology prior to AI; the large file sizes and increased acquisition times for whole slide images in cytology; the routine use of multiple stains, such as Papanicolaou and Romanowsky stains; the lack of high-quality annotated datasets on which to train algorithms; and the considerable computer resources required, in terms of both computer infrastructure and skilled personnel, for computing and storage of data. Global concerns regarding AI that also apply to cytology include the need for model validation and continued quality assurance; ethical issues such as the use of patient data in developing algorithms; the need to develop regulatory frameworks governing what type of data can be utilized; and the need to ensure cybersecurity during data collection, storage, and algorithm development.

Key Messages: While AI will likely play a role in cytology practice in the future, applying this technology to cytology poses a unique set of challenges. A broad understanding of digital pathology and algorithm development is desirable to guide the development of algorithms, as is an awareness of the potential pitfalls to avoid when incorporating the technology in practice.


September 2020

The cytopathologist’s role in developing and evaluating artificial intelligence in cytopathology practice

McAlpine E, Michelow P. Cytopathology. 2020. 31:385–392. doi: 10.1111/cyt.12799

Article available here.

Abstract: Artificial intelligence (AI) technologies have the potential to transform cytopathology practice, and it is important for cytopathologists to embrace this and place themselves at the forefront of implementing these technologies in cytopathology. This review illustrates an archetypal AI workflow from project conception to implementation in a diagnostic setting and illustrates the cytopathologist's role and level of involvement at each stage of the process. Cytopathologists need to develop and maintain a basic understanding of AI, drive decisions regarding the development and implementation of AI in cytopathology, participate in the generation of datasets used to train and evaluate AI algorithms, understand how the performance of these algorithms is assessed, participate in the validation of these algorithms (either at a regulatory level or in the laboratory setting), and ensure continuous quality assurance of algorithms deployed in a diagnostic setting. In addition, cytopathologists should ensure that these algorithms are developed, trained, tested and deployed in an ethical manner. Cytopathologists need to become informed consumers of these AI algorithms by understanding their workings and limitations, how their performance is assessed and how to validate and verify their output in clinical practice.


July 2020

Implementing Deep Learning Algorithms in Anatomical Pathology using open-source Deep Learning Libraries

McAlpine E, Michelow P. Advances in Anatomic Pathology. 2020. 27:260–268. doi: 10.1097/PAP.0000000000000265

Article available here.

Abstract: The application of artificial intelligence technologies to anatomic pathology has the potential to transform the practice of pathology, yet many pathologists are unfamiliar with how these models are created, trained, and evaluated. In addition, many pathologists may feel that they do not possess the necessary skills to allow them to embark on research into this field. This article aims to act as an introductory tutorial to illustrate how to create, train, and evaluate simple artificial learning models (neural networks) on histopathology data sets in the programming language Python using the popular freely available, open-source libraries Keras, TensorFlow, PyTorch, and Detecto. Furthermore, it aims to introduce pathologists to commonly used terms and concepts in artificial intelligence.