Ground truth is the reference data accepted as correct, used to train machine learning models and to measure how well they perform. It usually takes the form of labeled examples, expert annotations, or a curated benchmark set.
A medical imaging team, for instance, treats radiologist-confirmed diagnoses as ground truth, then scores a model by how often its predictions match those labels.
The reliability of any evaluation depends on the ground truth behind it. If the reference set changes without record, accuracy scores shift for reasons no one can trace. Treating ground truth as a versioned, reproducible data state keeps evaluations comparable across runs and over time.