Immune Profiler
Tumor immune phenotype classification for personalized cancer treatment
Highlights
Boehringer Ingelheim needed to automate tumor immune phenotype classification from histopathology images to support personalized cancer treatment.
Led the ML efforts to build a scalable, accurate AI solution for phenotype detection.
Integrated HoVer-Net for nucleus segmentation, extracted image features, and applied AutoML for cell classification and immune archetype assignment.
Achieved a weighted F1-score of 89% for cell classification, reduced manual workload, and improved diagnostic precision.
Core Team
Overview
This project focuses on developing an end-to-end ML-driven workflow to classify the immunological phenotype of human solid tumors based on whole slide images (WSIs). The immunological phenotypes (immune inflamed, immune excluded, and immune desert) are essential for understanding tumor biology, predicting disease progression, and tailoring immunotherapy.
The workflow integrates advanced machine learning and deep learning methods for nucleus segmentation, feature extraction, and classification to identify stromal, lymphocyte, and cancer cells. The automated solution reduces manual effort, enhances diagnostic precision, and facilitates personalized treatment strategies.
Data
Tumor immune phenotypes (immune inflamed, excluded, and desert) are essential for advancing oncology and directly influence how well a patient may respond to immunotherapy. Inflamed tumors often have a better prognosis because immune cells are actively infiltrating the tumor, while excluded and desert phenotypes present greater challenges. By automating the analysis of these phenotypes, we aim to support oncologists in tailoring treatments to each patient.

Immune Inflamed

Immune Excluded

Immune Desert
The dataset comprised 76 whole slide images (WSIs) of human solid tumors, including rectal, colorectal, and lung adenocarcinomas:
- Patches: 5.7 million patches generated, 2.7 million tissue-rich patches selected for analysis
- Annotations: 527 tissue-rich patches manually annotated with 7,318 nuclei
- Cell Types: Tumor-infiltrating lymphocytes (TIL), tumor cells, and stromal cells
- Labeling: Weakly supervised methods were employed to ensure high-quality labeling of TIL, tumor, and stromal cell populations.
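The selection of tissue-rich patches described above can be illustrated with a minimal sketch. It assumes grayscale patches in which the slide background is near-white, and it keeps a patch when enough of its pixels are darker than that background. The function names, the background level of 220, and the 50% tissue cutoff are hypothetical values chosen for illustration; the project's custom slicing algorithm and its actual criteria are not described in this write-up.

```python
def tissue_fraction(patch, background_level=220):
    """Fraction of pixels darker than the (assumed) near-white background."""
    pixels = [value for row in patch for value in row]
    if not pixels:
        return 0.0
    tissue = sum(1 for value in pixels if value < background_level)
    return tissue / len(pixels)


def slice_into_patches(image, patch_size):
    """Yield (top, left, patch) for non-overlapping square patches."""
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            yield top, left, patch


def select_tissue_patches(image, patch_size, min_tissue=0.5):
    """Return the (top, left) origin of every patch that is tissue-rich.

    min_tissue is a hypothetical cutoff, not the project's setting.
    """
    return [
        (top, left)
        for top, left, patch in slice_into_patches(image, patch_size)
        if tissue_fraction(patch) >= min_tissue
    ]


# Toy 4x4 "slide": dark tissue in the top-left quadrant, white elsewhere.
image = [
    [100, 100, 255, 255],
    [100, 100, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
kept = select_tissue_patches(image, patch_size=2)
```

On real WSIs the same idea would typically be applied to a low-resolution thumbnail first, so that empty regions are discarded before any high-resolution patch is read.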
Methods
The proposed solution comprises five main stages (Figure 1):
- WSI Slicing: Whole slide images were divided into smaller high-resolution patches using a custom slicing algorithm (Figure 2).
- Nucleus Segmentation: HoVer-Net, a state-of-the-art deep learning model, was used to segment nuclei.
- Feature Extraction: Detailed features such as shape, texture, and intensity were extracted from segmented nuclei for downstream analysis.
- Nucleus Classification: An AutoML-based approach using the mljar framework classified nuclei into cancer cells, lymphocytes, stromal cells, or others.
- Immune Archetype Assignment: Cells were clustered based on type and spatial arrangement to estimate density distributions, assigning tumors as immune inflamed, excluded, or desert.
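As an illustration of the feature-extraction stage, the sketch below computes a few simple per-nucleus descriptors from a binary segmentation mask and a matching grayscale image. The feature set, the function name, and the use of intensity standard deviation as a crude stand-in for texture are assumptions made for this example; the features actually used in the project are not listed here.

```python
def nucleus_features(mask, intensity):
    """Compute simple shape and intensity descriptors for one nucleus.

    mask: 2D list of 0/1 values marking the nucleus pixels.
    intensity: 2D list of grayscale values with the same shape as mask.
    """
    coords = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not coords:
        raise ValueError("mask contains no nucleus pixels")

    area = len(coords)
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1

    values = [intensity[r][c] for r, c in coords]
    mean = sum(values) / area
    variance = sum((v - mean) ** 2 for v in values) / area

    return {
        "area": area,                          # shape: pixel count
        "bbox_aspect_ratio": width / height,   # shape: elongation
        "extent": area / (width * height),     # shape: fill of bounding box
        "mean_intensity": mean,                # intensity
        "intensity_std": variance ** 0.5,      # crude texture proxy
    }


mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
intensity = [
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
features = nucleus_features(mask, intensity)
```

Each nucleus then becomes one row of a feature table, which is the kind of tabular input an AutoML classifier expects.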

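The final stage, assigning an immune archetype from cell types and their spatial arrangement, can be sketched with a deliberately simple rule. This is not the project's clustering and density-estimation method, which is not detailed here. The sketch assumes each classified cell is given as (x, y, cell_type), calls a tumor "desert" when lymphocytes are rare, and otherwise separates "inflamed" from "excluded" by whether most lymphocytes sit in tumor-dominated neighborhoods. The function name, radius, and both thresholds are hypothetical.

```python
import math


def classify_archetype(cells, radius=50.0,
                       min_til_fraction=0.05, min_intratumoral=0.5):
    """Assign an immune archetype from (x, y, cell_type) tuples.

    cell_type is one of "tumor", "lymphocyte", "stroma".
    All thresholds are illustrative assumptions, not validated cutoffs.
    """
    if not cells:
        raise ValueError("no cells provided")

    lymphocytes = [c for c in cells if c[2] == "lymphocyte"]
    if len(lymphocytes) / len(cells) < min_til_fraction:
        return "immune desert"

    intratumoral = 0
    for lx, ly, _ in lymphocytes:
        tumor_near = stroma_near = 0
        for x, y, cell_type in cells:
            if cell_type == "lymphocyte":
                continue
            if math.hypot(x - lx, y - ly) <= radius:
                if cell_type == "tumor":
                    tumor_near += 1
                elif cell_type == "stroma":
                    stroma_near += 1
        # A lymphocyte counts as intratumoral when tumor cells
        # outnumber stromal cells in its neighborhood.
        if tumor_near > stroma_near:
            intratumoral += 1

    if intratumoral / len(lymphocytes) >= min_intratumoral:
        return "immune inflamed"
    return "immune excluded"


# Toy layout: a tumor nest on the left, stroma on the right.
tumor = [(x, y, "tumor") for x in range(0, 100, 10) for y in range(0, 100, 10)]
stroma = [(x, y, "stroma") for x in range(200, 300, 10) for y in range(0, 100, 10)]
tils_in_tumor = [(5 + 4 * i, 45, "lymphocyte") for i in range(20)]
tils_in_stroma = [(205 + 4 * i, 45, "lymphocyte") for i in range(20)]

inflamed = classify_archetype(tumor + stroma + tils_in_tumor)
excluded = classify_archetype(tumor + stroma + tils_in_stroma)
desert = classify_archetype(tumor + stroma + tils_in_tumor[:2])
```

The pairwise distance loop is quadratic in the number of cells, which is fine for a toy example; at slide scale a spatial index or a gridded density map would be the more practical choice.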

Results
The results highlight the performance and scalability of the developed pipeline across segmentation, classification, and immune archetype determination:
- WSI Slicing: Of the 5.7 million patches generated, only the 2.7 million tissue-rich patches were retained, ensuring efficient downstream processing.
- Nucleus Segmentation: HoVer-Net achieved a Dice Similarity Coefficient of 90% on the validation subset and 75% on the test subset.
- Nucleus Classification: The AutoML classifier achieved a weighted F1-score of 89%, with an F1-score of 0.74 for lymphocytes.
- Immune Archetype Determination: The pipeline assigned archetypes with high reproducibility, enabling consistent classification into immune inflamed, excluded, or desert categories (Figure 3).
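For readers unfamiliar with the two metrics reported above, the following sketch shows how they are defined. The Dice Similarity Coefficient is twice the overlap between the predicted and reference masks divided by the sum of their sizes, and the weighted F1-score is the per-class F1 averaged with weights proportional to each class's number of true samples. The function names and toy inputs are illustrative only; they are not the project's evaluation code or data.

```python
from collections import Counter


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / len(y_true)) * f1
    return score


dice = dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]])

y_true = ["tumor", "tumor", "lymphocyte", "stroma"]
y_pred = ["tumor", "lymphocyte", "lymphocyte", "stroma"]
f1 = weighted_f1(y_true, y_pred)
```

Because the weighted average follows class frequency, a lower score on a less frequent class (such as the 0.74 for lymphocytes) can coexist with a higher overall weighted score. The same weighted definition is what scikit-learn's `f1_score` computes with `average="weighted"`.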

Conclusion
The hybrid deep and machine learning workflow successfully classified tumor immune phenotypes (immune inflamed, excluded, and desert) using whole slide images. This classification aids in predicting cancer progression and tailoring immunotherapy strategies.
The methodology showed high accuracy and adaptability across the tumor types studied. Future enhancements could involve integrating multi-modal data such as gene expression profiles or leveraging foundation models to refine classification further.




