Labeled data

Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor.
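As a minimal sketch of what "augmenting each piece of data with tags" can look like in practice, the Python snippet below pairs raw items with their labels. The `LabeledSample` class, the `label_samples` helper, and the file names are hypothetical and chosen only for illustration; real labeling pipelines store labels in many different formats.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledSample:
    """One raw item together with the tags attached to it (illustrative structure)."""
    data: str                                   # raw content, here a hypothetical file name
    labels: dict = field(default_factory=dict)  # tag name -> tag value


def label_samples(unlabeled, annotations):
    """Attach the known tags to each raw item; items without tags keep an empty dict."""
    return [LabeledSample(item, dict(annotations.get(item, {}))) for item in unlabeled]


# Hypothetical unlabeled data and the tags supplied for it.
unlabeled = ["photo_001.jpg", "photo_002.jpg", "tweet_17.txt"]
annotations = {
    "photo_001.jpg": {"animal": "horse"},
    "photo_002.jpg": {"animal": "cow"},
    "tweet_17.txt": {"sentiment": "positive"},
}

labeled = label_samples(unlabeled, annotations)
for sample in labeled:
    print(sample.data, sample.labels)
```

Keeping the raw item and its labels in one record makes it straightforward to attach more than one label to the same sample.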

Labels can be obtained by asking humans to make judgments about a given piece of unlabeled data.[1] Because this typically requires human effort, labeled data is significantly more expensive to obtain than raw unlabeled data.
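When several people judge the same item, their answers have to be combined into a single label. The sketch below uses simple majority voting, which is one common aggregation approach and is an assumption here rather than something the cited source prescribes. The function name `majority_label` and the sample judgments are hypothetical.

```python
from collections import Counter


def majority_label(judgments):
    """Return the most frequent label, or None if there is no judgment or a tie for first place."""
    ranked = Counter(judgments).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority; such items might be sent back for review
    return ranked[0][0]


# Hypothetical judgments collected from human annotators.
judgments = {
    "photo_001.jpg": ["horse", "horse", "cow"],
    "photo_002.jpg": ["cow", "cow", "cow"],
    "photo_003.jpg": ["horse", "cow"],
}

labels = {item: majority_label(votes) for item, votes in judgments.items()}
print(labels)
```

Collecting several judgments per item multiplies the human effort involved, which is one reason labeled data costs more than the raw data it is built from.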

The quality of labeled data directly influences how well supervised machine learning models perform in operation, because these models learn from the provided labels.[2]
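The effect of label quality can be illustrated with a toy experiment. The sketch below is not taken from the cited study: it assumes a one-dimensional dataset, a 1-nearest-neighbor classifier written from scratch, and two deliberately flipped training labels. All names and values are hypothetical.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of the training point closest to x (1-nearest-neighbor)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


def true_label(x):
    """Ground truth for the toy task: values of 5 or more are 'high', the rest 'low'."""
    return "high" if x >= 5 else "low"


# Training set with correct labels at x = 0, 1, ..., 9.
clean_train = [(float(x), true_label(x)) for x in range(10)]

# Same training set, but the labels at x = 2 and x = 7 are flipped to simulate labeling errors.
flipped = {2.0, 7.0}
noisy_train = []
for x, y in clean_train:
    if x in flipped:
        y = "low" if y == "high" else "high"
    noisy_train.append((x, y))

# Test points sit slightly to the right of each training point and carry correct labels.
test = [(x + 0.25, true_label(x + 0.25)) for x in range(10)]

clean_accuracy = accuracy(clean_train, test)
noisy_accuracy = accuracy(noisy_train, test)
print(clean_accuracy, noisy_accuracy)
```

In this toy setup the model trained on clean labels classifies every test point correctly, while the one trained on the noisy labels repeats both labeling errors on the nearby test points. Real models and datasets behave less neatly, but the direction of the effect is the same.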

  1. ^ "What is Data Labeling? - Data Labeling Explained - AWS". Amazon Web Services, Inc. Retrieved 2024-07-16.
  2. ^ Fredriksson, Teodor; Mattos, David Issa; Bosch, Jan; Olsson, Helena Holmström (2020), Morisio, Maurizio; Torchiano, Marco; Jedlitschka, Andreas (eds.), "Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies", Product-Focused Software Process Improvement, vol. 12562, Cham: Springer International Publishing, pp. 202–216, doi:10.1007/978-3-030-64148-1_13, ISBN 978-3-030-64147-4, retrieved 2024-07-13

From Wikipedia, the free encyclopedia
