2024-11-17-ahmed24a.md

title booktitle abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
PathAlign: A vision–language model for whole slide images in histopathology
Proceedings of the MICCAI Workshop on Computational Pathology
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision–language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image–text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch level. In this work, we develop a vision–language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image–text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
ahmed24a
0
PathAlign: A vision–language model for whole slide images in histopathology
72
108
72-108
72
false
Ahmed, Faruk and Sellergen, Andrew and Yang, Lin and Xu, Shawn and Babenko, Boris and Ward, Abbi and Olson, Niels and Mohtashamian, Arash and Matias, Yossi and Corrado, Greg S. and Duong, Quang and Webster, Dale R. and Shetty, Shravya and Golden, Daniel and Liu, Yun and Steiner, David F. and Wulczyn, Ellery
given family
Faruk
Ahmed
given family
Andrew
Sellergen
given family
Lin
Yang
given family
Shawn
Xu
given family
Boris
Babenko
given family
Abbi
Ward
given family
Niels
Olson
given family
Arash
Mohtashamian
given family
Yossi
Matias
given family
Greg S.
Corrado
given family
Quang
Duong
given family
Dale R.
Webster
given family
Shravya
Shetty
given family
Daniel
Golden
given family
Yun
Liu
given family
David F.
Steiner
given family
Ellery
Wulczyn
2024-11-17
Proceedings of the MICCAI Workshop on Computational Pathology
254
inproceedings
date-parts
2024
11
17