Offline Evaluation Dataset

Created a dataset of roughly 500 messages labeled as emergency or non-emergency, along with a doctor-reviewed benchmark for assessing response quality.

Daniel Okafor

© 2026 Daniel Okafor
