CARDIUM: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical Records

Daniela Vega¹, Hannah V. Ceballos¹, Javier S. Vera¹, Santiago Rodriguez¹, Alejandra Perez¹, Angela Castillo¹, Maria Escobar¹, Dario Londoño¹,², Luis A. Sarmiento¹,², Camila I. Castro¹, Nadiezhda Rodriguez¹,², Juan C. Briceño¹, Pablo Arbelaez¹

¹Universidad de los Andes, Colombia; ²Fundación Santa Fe de Bogotá, Colombia

Abstract

Prenatal diagnosis of Congenital Heart Diseases (CHDs) holds great potential for Artificial Intelligence (AI)-driven solutions. However, collecting high-quality diagnostic data remains difficult due to the rarity of these conditions, resulting in imbalanced and low-quality datasets that hinder model performance. Moreover, no public efforts have been made to integrate multiple sources of information, such as imaging and clinical data, further limiting the ability of AI models to support and enhance clinical decision-making.

To overcome these challenges, we introduce the Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records (CARDIUM) dataset, the first publicly available multimodal dataset consolidating fetal ultrasound and echocardiographic images along with maternal clinical records for prenatal CHD detection. Furthermore, we propose a robust multimodal transformer architecture that incorporates a cross-attention mechanism to fuse feature representations from image and tabular data. It improves the CHD F1-score by 11 and 50 percentage points over image-only and tabular-only approaches, respectively, and achieves an F1-score of 79.8 ± 4.8% on the CARDIUM dataset. We will publicly release our dataset and code to encourage further research in this unexplored field.

Results

Overall Results

We train and evaluate our model on the CARDIUM dataset. Combining both modalities yields better performance metrics than using either modality alone.

Images	Clinical Data	CHD F1 Score	CHD Precision	CHD Recall	AUC
✓	✓	0.798 ± 0.048	0.876 ± 0.173	0.757 ± 0.104	0.974 ± 0.012
✓		0.689 ± 0.066	0.659 ± 0.135	0.742 ± 0.119	0.955 ± 0.0154
	✓	0.294 ± 0.019	0.192 ± 0.019	0.634 ± 0.049	0.794 ± 0.028
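
The gains quoted in the abstract can be checked against the F1 column of the table above. The short sketch below is only an arithmetic check on the mean F1 values; the dictionary keys and the helper name `gains` are illustrative and not part of the released code. It suggests that the 11% and 50% figures correspond to absolute differences in F1 (percentage points) rather than relative improvements.

```python
# Mean CHD F1 scores copied from the table above.
f1 = {
    "multimodal": 0.798,
    "image_only": 0.689,
    "tabular_only": 0.294,
}

def gains(fused, baseline):
    """Return (absolute gain in percentage points, relative gain in %)."""
    absolute = round((fused - baseline) * 100, 1)
    relative = round((fused - baseline) / baseline * 100, 1)
    return absolute, relative

abs_img, rel_img = gains(f1["multimodal"], f1["image_only"])
abs_tab, rel_tab = gains(f1["multimodal"], f1["tabular_only"])

print(f"vs image only:   +{abs_img} points ({rel_img}% relative)")
print(f"vs tabular only: +{abs_tab} points ({rel_tab}% relative)")
```

Under this reading, the multimodal model gains about 10.9 F1 points over images alone and about 50.4 points over clinical data alone.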

Multimodal fusion strategies:
A) MLP-Fusion: concatenate modality features, then process them with an MLP.
B) Transformer Encoder Fusion: concatenate features, then process them with a transformer encoder.
C) Transformer Decoder Fusion: process image features with a decoder, then integrate tabular features through cross-attention.
D) Transformer Encoder with Cross-Attention Fusion: each modality is encoded separately, then fused via cross-attention.
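
To make the cross-attention idea in strategy D more concrete, here is a minimal NumPy sketch of single-head scaled dot-product cross-attention between two sets of already-encoded tokens. It is not the CARDIUM implementation: the token counts, the feature width `d_model`, the random projection weights, the choice of image tokens as queries, and the residual-plus-mean-pooling step are all assumptions made for illustration, and the per-modality encoders are replaced by random stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` attend over `context`."""
    q = queries @ w_q                          # (n_queries, d)
    k = context @ w_k                          # (n_context, d)
    v = context @ w_v                          # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot products
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16  # assumed shared feature width

# Random stand-ins for the outputs of the two modality encoders.
image_tokens = rng.normal(size=(10, d_model))    # e.g. image patch features
tabular_tokens = rng.normal(size=(6, d_model))   # e.g. clinical field features

# Hypothetical projection matrices (learned in a real model).
w_q, w_k, w_v = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3)
)

# Assumption: image tokens query the tabular tokens.
attended, weights = cross_attention(image_tokens, tabular_tokens, w_q, w_k, w_v)

fused = image_tokens + attended   # residual connection (assumed)
pooled = fused.mean(axis=0)       # one fused vector for a classifier head

print(attended.shape, weights.shape, pooled.shape)
```

Each image token ends up as a weighted mixture of projected tabular tokens, which is how one modality can condition on the other. By contrast, strategies A and B concatenate the features before any joint processing.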

Results per Trimester

In addition to overall performance, we evaluate our model separately on each trimester of pregnancy in the CARDIUM dataset.

Trimester	CHD F1 Score	CHD Precision	CHD Recall
First	0.222 ± 0.314	0.333 ± 0.471	0.167 ± 0.236
Second	0.603 ± 0.092	0.701 ± 0.212	0.556 ± 0.101
Third	0.732 ± 0.072	0.825 ± 0.127	0.669 ± 0.074
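
The first-trimester row has standard deviations larger than its means, which is typical when very few positive cases fall into each evaluation fold. The sketch below shows how such numbers can arise. The per-fold precision and recall values are hypothetical, chosen only because they reproduce the reported mean and spread; the source does not state the number of folds or the per-fold results. The use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-fold results: only one of three folds finds any CHD case.
fold_precision = [1.0, 0.0, 0.0]
fold_recall = [0.5, 0.0, 0.0]
fold_f1 = [f1_score(p, r) for p, r in zip(fold_precision, fold_recall)]

def summarize(values):
    """Mean and population standard deviation, rounded to 3 decimals."""
    return round(mean(values), 3), round(pstdev(values), 3)

f1_summary = summarize(fold_f1)
precision_summary = summarize(fold_precision)
recall_summary = summarize(fold_recall)

print("F1        %.3f ± %.3f" % f1_summary)
print("Precision %.3f ± %.3f" % precision_summary)
print("Recall    %.3f ± %.3f" % recall_summary)
```

With these assumed folds, the summary matches the first-trimester row (0.222 ± 0.314, 0.333 ± 0.471, 0.167 ± 0.236). A single fold with a detected case is enough to dominate both the mean and the variance, so the first-trimester figures should be read with caution.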

BibTeX

@inproceedings{vega2025cardium,
  title={{CARDIUM}: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records},
  author={Daniela Vega and Hannah Ceballos and Javier Santiago Vera Rincon and Santiago Rodriguez and Alejandra Perez and Angela Castillo and Maria Escobar and Dario Londo{\~n}o and Luis Andres Sarmiento and Camila Irene Castro and Nadiezhda Rodriguez and Juan Carlos Brice{\~n}o and Pablo Arbelaez},
  booktitle={Third Workshop on Computer Vision for Automated Medical Diagnosis},
  year={2025},
  url={https://openreview.net/forum?id=MDHl5LCcka}
}
