The observer-expectancy effect (also called the experimenter-expectancy effect, expectancy bias, observer effect, or experimenter effect) is a form of reactivity in which a researcher's cognitive bias causes them to subconsciously influence the participants of an experiment. Confirmation bias can lead to the experimenter interpreting results incorrectly because of the tendency to look for information that conforms to their hypothesis, and overlook information that argues against it. It is a significant threat to a study's internal validity, and is therefore typically controlled using a double-blind experimental design.
An example of the observer-expectancy effect is demonstrated in music backmasking, in which hidden verbal messages are said to be audible when a recording is played backwards. Some people expect to hear hidden messages when reversing songs, and therefore hear the messages, but to others it sounds like nothing more than random sounds. Often when a song is played backwards, a listener will fail to notice the "hidden" lyrics until they are explicitly pointed out, after which they are obvious. Other prominent examples include facilitated communication and dowsing.
In research, experimenter bias occurs when experimenter expectancies regarding study results bias the research outcome. Examples of experimenter bias include conscious or unconscious influences on subject behavior, such as the creation of demand characteristics that influence subjects, and altered or selective recording of the experimental results themselves.
The experimenter may introduce cognitive bias into a study in several ways. In what is called the observer-expectancy effect, the experimenter may subtly communicate their expectations for the outcome of the study to the participants, causing them to alter their behavior to conform to those expectations. Such observer bias effects are near-universal in human data interpretation under expectation and in the presence of imperfect cultural and methodological norms that promote or enforce objectivity.
The classic example of experimenter bias is that of "Clever Hans" (in German, der Kluge Hans), an Orlov Trotter horse claimed by his owner von Osten to be able to do arithmetic and other tasks. As a result of the large public interest in Clever Hans, philosopher and psychologist Carl Stumpf, along with his assistant Oskar Pfungst, investigated these claims. Ruling out simple fraud, Pfungst determined that the horse could answer correctly even when von Osten did not ask the questions. However, the horse was unable to answer correctly when it could not see the questioner, or when the questioner was unaware of the correct answer. When von Osten knew the answers to the questions, Hans answered correctly 89 percent of the time. When von Osten did not know the answers, Hans answered only six percent of questions correctly.
Pfungst then examined the behaviour of the questioner in detail, and showed that as the horse's taps approached the right answer, the questioner's posture and facial expression changed in ways that were consistent with an increase in tension, which was released when the horse made the final, correct tap. The horse had learned, through reinforcement, to use this release of tension as a cue to stop tapping.
Experimenter bias also influences human subjects. As an example, researchers compared the performance of two groups given the same task (rating portrait pictures and estimating how successful each individual was on a scale of -10 to 10), but with different experimenter expectations.
In one group ("Group A"), experimenters were told to expect positive ratings, while in another group ("Group B"), experimenters were told to expect negative ratings. The ratings collected from Group A were significantly and substantially more optimistic than those collected from Group B. The researchers suggested that experimenters gave subtle but clear cues with which the subjects complied.
The ultimate source of bias lies in a lack of objectivity. It may occur more often in sociological and medical studies, perhaps due to incentives. Experimenter bias can also be found in some physical sciences, for instance where an experimenter selectively rounds off measurements. Double-blind techniques may be employed to combat bias.
Modern electronic or computerized data acquisition techniques have greatly reduced the likelihood of such bias, but it can still be introduced by a poorly designed analysis technique. Experimenter bias was not well recognized until the 1950s and 1960s, and then primarily in medical experiments and studies. Sackett (1979) catalogued 56 biases that can arise in sampling and measurement in clinical research, among the above-stated first six stages of research. These are as follows:
Double-blind techniques may be employed to combat bias by keeping both the experimenter and the subject unaware of which condition each piece of data comes from.
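One way such blinding can be arranged in practice is sketched below. This is only an illustrative sketch, not a description of any particular study's procedure: the function name `blind_allocation`, the code format `S1234`, and the two condition names are all hypothetical. Participants are randomly assigned to conditions, but the experimenter and subjects see only opaque codes, while the key linking codes to conditions is held separately until data collection is complete.

```python
import random

def blind_allocation(participant_ids, conditions=("treatment", "control"), seed=None):
    """Randomly assign participants to conditions behind opaque codes.

    Returns (blinded_sheet, unblinding_key). Both names are hypothetical:
    - blinded_sheet maps participant id -> opaque code (safe to show the
      experimenter and the subject),
    - unblinding_key maps opaque code -> condition (held by a third party).
    """
    rng = random.Random(seed)
    n = len(participant_ids)

    # Balanced assignment: cycle through the conditions, then shuffle.
    labels = [conditions[i % len(conditions)] for i in range(n)]
    rng.shuffle(labels)

    # Unique four-digit codes that carry no information about the condition.
    codes = [f"S{c}" for c in rng.sample(range(1000, 10000), n)]

    blinded_sheet = dict(zip(participant_ids, codes))
    unblinding_key = dict(zip(codes, labels))
    return blinded_sheet, unblinding_key

sheet, key = blind_allocation(["p1", "p2", "p3", "p4", "p5", "p6"], seed=1)
print(sheet)  # what the experimenter sees: codes only, no condition names
```

The design choice here is that the condition never appears on the sheet used during data collection, so neither recording nor subtle cueing can depend on it; the key is consulted only at analysis time.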
It might be thought that, due to the central limit theorem of statistics, collecting more independent measurements will improve the precision of estimates, thus decreasing bias. However, this assumes that the measurements are statistically independent. In the case of experimenter bias, the measurements share a correlated bias: simply averaging such data will not lead to a better statistic, but may merely reflect the correlations among the individual measurements and their non-independent nature.
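The point can be illustrated with a small simulation. The sketch below makes a simplifying assumption that is not in the text above: the correlated bias is modelled as one constant offset (`shared_bias`) added to every measurement, on top of independent Gaussian noise. The function name and all parameter values are hypothetical. Under that assumption, the error of the mean shrinks toward zero as the sample grows only when the shared offset is absent; with the offset present, the mean converges to the wrong value.

```python
import random
import statistics

def mean_of_measurements(n, true_value=10.0, noise_sd=1.0, shared_bias=0.0, seed=0):
    """Average n noisy measurements of true_value.

    shared_bias is an assumed constant offset applied to every measurement,
    standing in for an experimenter's systematic influence.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        true_value + shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n)
    )

for n in (10, 1_000, 100_000):
    error_independent = mean_of_measurements(n) - 10.0
    error_biased = mean_of_measurements(n, shared_bias=0.5) - 10.0
    # The first error shrinks with n; the second stays near the shared bias.
    print(n, round(error_independent, 3), round(error_biased, 3))
```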
In medical sciences, the complexity of living systems and ethical constraints may limit the ability of researchers to perform controlled experiments. In such circumstances, scientific knowledge about the phenomenon under study and the systematic elimination of probable causes of bias, by detecting confounding factors, are the only way to isolate true cause-effect relationships. Experimenter bias in epidemiology has been better studied than in other sciences.
A number of studies into spiritual healing illustrate how the design of a study can introduce experimenter bias into the results. A comparison of two studies illustrates that subtle differences in the design of the tests can adversely affect the results of one. The difference lay in the intended result for the comparison group: a positive or negative outcome rather than a positive or neutral one. A 1995 paper by Hodges & Scofield on spiritual healing used the growth rate of cress seeds as the dependent variable in order to eliminate a placebo response or participant bias. The study reported positive results, as the test results for each sample were consistent with the healer's intention that healing should or should not occur. However, the healer involved in the experiment was a personal acquaintance of the study authors, raising the distinct possibility of experimenter bias. A randomized clinical trial, published in 2001, investigated the efficacy of spiritual healing (both at a distance and face-to-face) on the treatment of chronic pain in 120 patients. Healers were observed by "simulated healers" who then mimicked the healers' movements on a control group while silently counting backwards in fives, a neutral rather than a "should not heal" intention. The study found a decrease in pain in all patient groups but "no statistically significant differences between healing and control groups ... it was concluded that a specific effect of face-to-face or distant healing on chronic pain could not be demonstrated."
When a signal under study is smaller than the rounding error of measurement and the data are over-averaged, a positive result may be found where none exists (i.e. a more precise experimental apparatus would conclusively show no signal). For instance, in a study of variation with sidereal time, rounding of measurements by a human who is aware of the sidereal time of each measurement may lead to selectivity in rounding, effectively generating a false signal. In such cases a single-blind experimental protocol is required; if the human observer does not know the sidereal time of the measurements, then even though the round-off is non-random it cannot introduce a spurious sidereal variation.
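The mechanism can be sketched with a toy simulation. Everything in it is an assumption chosen for illustration: the quantity being measured is constant (no real signal), the instrument is read to the nearest whole unit, an unblinded observer who expects higher values in the first half of the sidereal day rounds up there and down elsewhere, and a blinded observer has a fixed habit of always rounding up. The function name and parameter values are hypothetical.

```python
import math
import random
import statistics

def apparent_sidereal_signal(blinded, n=20_000, true_value=0.5, noise_sd=0.2, seed=0):
    """Difference in mean recorded value between the two halves of the sidereal day.

    The true quantity does not vary with phase, so any non-zero difference
    is an artefact of how the observer rounds.
    """
    rng = random.Random(seed)
    first_half, second_half = [], []
    for _ in range(n):
        phase = rng.random()                             # sidereal phase in [0, 1)
        reading = true_value + rng.gauss(0.0, noise_sd)  # sub-unit fluctuation only
        if blinded:
            # Non-random rounding habit, but independent of the phase.
            recorded = math.ceil(reading)
        else:
            # Observer knows the phase and rounds toward the expected pattern.
            recorded = math.ceil(reading) if phase < 0.5 else math.floor(reading)
        (first_half if phase < 0.5 else second_half).append(recorded)
    return statistics.fmean(first_half) - statistics.fmean(second_half)

print("unblinded:", apparent_sidereal_signal(blinded=False))
print("blinded:  ", apparent_sidereal_signal(blinded=True))
```

In this sketch the blinded observer's rounding is still biased (every value is pushed upward), but because the bias is the same at every phase it cancels in the comparison between the two halves, which is the sense in which blinding prevents a spurious sidereal variation.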
Results of a scientific test may be distorted when the underlying data are ambiguous and the scientist is exposed to domain-irrelevant cues which engage emotion. For instance, forensic DNA results can be ambiguous, and resolving these ambiguities may introduce bias, particularly when interpreting difficult evidence samples such as those that contain mixtures of DNA from two or more individuals, degraded or inhibited DNA, or limited quantities of DNA template. The full potential of forensic DNA testing can only be realized if observer effects are minimized.
After the data are collected, bias may be introduced during data interpretation and analysis. For example, in deciding which variables to control in analysis, social scientists often face a trade-off between omitted-variable bias and post-treatment bias.