In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques. Ground truth is used in statistical models to prove or disprove research hypotheses. The term "ground truthing" refers to the process of gathering the proper objective (provable) data for such a test. Compare with gold standard.
Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. This depends on the ground truth of the messages used to train the algorithm: inaccuracies in the ground truth carry over as inaccuracies in the resulting spam/non-spam verdicts.
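The effect of inaccurate ground truth on a Bayesian filter can be illustrated with a minimal sketch. The code below is a toy naive Bayes classifier with add-one smoothing, not the implementation of any particular spam filter; the function names, the four training messages, and the labels "spam" and "ham" are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label. messages is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the higher naive Bayes log score."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Add-one smoothing keeps unseen labels and words from giving log(0).
        score = math.log((label_counts[label] + 1) / (total + 2))
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training set with accurate ground truth.
clean = [
    ("win free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
# The same messages with the two spam messages mislabeled as ham.
noisy = [(text, "ham") for text, _ in clean]

print(classify("free prize", train(clean)))  # spam
print(classify("free prize", train(noisy)))  # ham
```

With accurate labels the test message is flagged as spam; with the mislabeled training set the same message passes as non-spam, because the filter can only reproduce the labels it was given.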
In remote sensing, "ground truth" refers to information collected on location. Ground truth allows image data to be related to real features and materials on the ground. The collection of ground-truth data enables calibration of remote-sensing data, and aids in the interpretation and analysis of what is being sensed. Examples include cartography, meteorology, analysis of aerial photographs, satellite imagery and other techniques in which data are gathered at a distance.
More specifically, ground truth may refer to a process in which a "pixel" on a satellite image is compared to what is there in reality (at the present time) in order to verify the contents of the "pixel" on the image (noting that the concept of a "pixel" is somewhat ill-defined). In the case of a classified image, ground truth makes it possible to determine the accuracy of the classification performed by the remote sensing software and therefore to minimize classification errors such as errors of commission and errors of omission.
Ground truthing is usually done on site, by performing surface observations and measurements of various properties of the features of the ground resolution cells that are being studied on the remotely sensed digital image. It also involves taking geographic coordinates of the ground resolution cell with GPS technology and comparing those with the coordinates of the "pixel" being studied provided by the remote sensing software, in order to understand and analyze the location errors and how they may affect a particular study.
Ground truth is important in the initial supervised classification of an image. When the identity and location of land cover types are known through a combination of field work, maps, and personal experience, these areas are known as training sites. The spectral characteristics of these areas are used to train the remote sensing software using decision rules for classifying the rest of the image. These decision rules, such as Maximum Likelihood Classification, Parallelepiped Classification, and Minimum Distance Classification, offer different techniques to classify an image. Additional ground truth sites allow the analyst to establish an error matrix, which is used to assess the accuracy of the classification method used. Different classification methods may have different percentages of error for a given classification project. It is important that the analyst chooses a classification method that works best with the number of classes used while providing the least amount of error.
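As an illustration of how training sites drive a decision rule, the following is a minimal sketch of minimum distance classification: each class is represented by the mean spectrum of its training sites, and a pixel is assigned to the class with the nearest mean. The class names, the two-band spectral values, and the function names are hypothetical, and real remote sensing software may implement the rule differently.

```python
import math

def class_means(training_sites):
    """Mean spectral vector per class.

    training_sites maps a class name to a list of spectral vectors
    (one value per band) taken from ground-truthed areas.
    """
    means = {}
    for name, samples in training_sites.items():
        n_bands = len(samples[0])
        means[name] = [
            sum(sample[band] for sample in samples) / len(samples)
            for band in range(n_bands)
        ]
    return means

def classify_pixel(pixel, means):
    """Assign the class whose mean spectrum is closest in Euclidean distance."""
    return min(means, key=lambda name: math.dist(pixel, means[name]))

# Hypothetical two-band training data (values are made up).
training_sites = {
    "water": [[10, 5], [12, 7]],
    "forest": [[40, 80], [44, 76]],
}
means = class_means(training_sites)

print(classify_pixel([15, 10], means))  # water
print(classify_pixel([38, 70], means))  # forest
```

The rule uses only the class means, so it is cheap to compute but ignores the spread of each class, which is one reason different decision rules can give different error rates on the same image.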
Ground truth also helps with atmospheric correction. Because the radiation recorded by a satellite sensor has to pass through the atmosphere, the resulting images can be distorted by atmospheric absorption. Ground truth helps to correct for this distortion and so to identify objects in satellite images more reliably.
An example of an error of commission is when a pixel reports the presence of a feature (such as trees) that, in reality, is absent (no trees are actually present). The commission error is the complement of the user's accuracy, i.e. Commission Error = 1 - user's accuracy. Ground truthing helps to ensure that the error matrices are more accurate than they would be if no pixels were ground truthed.
An example of an error of omission is when pixels of a certain feature, for example maple trees, are not classified as maple trees. The omission error is the complement of the producer's accuracy, i.e. Omission Error = 1 - producer's accuracy. The process of ground truthing helps to ensure that pixels are classified correctly and that the error matrices are more accurate.
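The two error measures can be computed from paired classified and ground-truth labels, as in the minimal sketch below. It assumes the usual definitions: user's accuracy is the fraction of pixels classified as a class that truly belong to it, and producer's accuracy is the fraction of ground-truth pixels of a class that were classified as such. The function name and the ten sample pixels are hypothetical.

```python
from collections import Counter

def error_rates(classified, reference, target):
    """Return (commission error, omission error) for one class.

    classified: labels assigned by the classifier, one per pixel.
    reference:  ground-truth labels for the same pixels.
    Assumes the target class occurs at least once in each list.
    """
    # Counts of (classified, reference) pairs are the cells of the error matrix.
    pairs = Counter(zip(classified, reference))
    correct = pairs[(target, target)]
    classified_total = sum(n for (c, _), n in pairs.items() if c == target)
    reference_total = sum(n for (_, r), n in pairs.items() if r == target)
    users_accuracy = correct / classified_total
    producers_accuracy = correct / reference_total
    return 1 - users_accuracy, 1 - producers_accuracy

# Hypothetical labels for ten ground-truthed pixels.
classified = ["trees"] * 4 + ["grass"] * 6
reference = ["trees", "trees", "trees", "grass",
             "trees", "trees", "grass", "grass", "grass", "grass"]

commission, omission = error_rates(classified, reference, "trees")
print(commission)  # 1 of 4 pixels classified as trees is really grass
print(omission)    # 2 of 5 true tree pixels were classified as grass
```

In this sample, the commission error for trees is 1 - 3/4 and the omission error is 1 - 3/5.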
Geographic information systems (GIS) and satellite positioning systems such as GPS and other GNSS have become so widespread that the term "ground truth" has taken on special meaning in that context. If the location coordinates returned by a location method such as GPS are an estimate of a location, then the "ground truth" is the actual location on Earth. A smartphone might return a set of estimated location coordinates such as 43.87870,-103.45901. The ground truth being estimated by those coordinates is the tip of George Washington's nose on Mt. Rushmore. The accuracy of the estimate is the maximum distance between the location coordinates and the ground truth. We could say in this case that the estimate accuracy is 10 meters, meaning that the point on Earth represented by the location coordinates is thought to be within 10 meters of George Washington's nose, the ground truth. Informally, the coordinates indicate where we think George Washington's nose is located, and the ground truth is where it really is. In practice a smartphone or hand-held GPS unit is routinely able to estimate the ground truth within 6-10 meters. Specialized instruments can reduce GPS measurement error to under a centimeter.
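The distance between an estimated position and its ground truth can be approximated with the haversine formula, as in the sketch below. The estimated coordinates are the ones quoted above; the ground-truth coordinates are invented for illustration and are not a surveyed position. The Earth is treated as a sphere with a mean radius of 6,371 km, which is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

estimate = (43.87870, -103.45901)      # coordinates reported by the device
ground_truth = (43.87875, -103.45905)  # hypothetical true position, made up

error_m = haversine_m(*estimate, *ground_truth)
print(f"location error: {error_m:.1f} m")
print("within 10 m of ground truth:", error_m <= 10)
```

For separations of a few meters the spherical approximation is more than adequate; survey-grade work would normally use an ellipsoidal model instead.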
US military slang uses "ground truth" to describe the reality of a tactical situation, as opposed to intelligence reports and mission plans. The term appears in the title of the Iraq War documentary film The Ground Truth (2006), and also in military publications, for example in Stars and Stripes: "Stripes decided to figure out what the ground truth was in Iraq."
The Oxford English Dictionary (s.v. "ground truth") records the use of the word "Groundtruth" in the sense of a "fundamental truth" from Henry Ellison's poem "The Siberian Exile's Tale", published in 1833.
As the Groundtruth of her own Existence it must be regarded, thro' Him in its highest, purest Aspect shown!