Operant conditioning (also called "instrumental conditioning") is a learning process through which the strength of a behavior is modified by reinforcement or punishment. It is also a procedure that is used to bring about such learning.
Although operant and classical conditioning both involve behaviors controlled by environmental stimuli, they differ in nature. In operant conditioning, stimuli present when a behavior is rewarded or punished come to control that behavior. For example, a child may learn to open a box to get the candy inside, or learn to avoid touching a hot stove; in operant terms, the box and the stove are "discriminative stimuli". However, in classical conditioning, stimuli that signal significant events produce reflexive behavior. For example, sight of candy may cause a child to salivate, or the sound of a door slam may signal an angry parent, causing a child to tremble. Salivation and trembling are not operants; they are not reinforced by their consequences.
The study of animal learning in the 20th century was dominated by the analysis of these two sorts of learning, and they are still at the core of behavior analysis.
Operant conditioning, sometimes called instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949), who observed the behavior of cats trying to escape from home-made puzzle boxes. A cat could escape from the box by a simple response such as pulling a cord or pushing a pole, but when first constrained, the cats took a long time to get out. With repeated trials ineffective responses occurred less frequently and successful responses occurred more frequently, so the cats escaped more and more quickly. Thorndike generalized this finding in his law of effect, which states that behaviors followed by satisfying consequences tend to be repeated and those that produce unpleasant consequences are less likely to be repeated. In short, some consequences strengthen behavior and some consequences weaken behavior. By plotting escape time against trial number Thorndike produced the first known animal learning curves through this procedure.
Humans appear to learn many simple behaviors through the sort of process studied by Thorndike, now called operant conditioning. That is, responses are retained when they lead to a successful outcome and discarded when they do not, or when they produce aversive effects. This usually happens without being planned by any "teacher", but operant conditioning has been used by parents in teaching their children for thousands of years.
B.F. Skinner (1904-1990) is often referred to as the father of operant conditioning, and his work is frequently cited in connection with this topic. His book "The Behavior of Organisms", published in 1938, initiated his lifelong study of operant conditioning and its application to human and animal behavior. Following the ideas of Ernst Mach, Skinner rejected Thorndike's reference to unobservable mental states such as satisfaction, building his analysis on observable behavior and its equally observable consequences.
Skinner believed that classical conditioning was too simplistic to be used to describe something as complex as human behavior. Operant conditioning, in his opinion, better described human behavior since it examined causes and effects of intentional behavior.
To implement his empirical approach, Skinner invented the operant conditioning chamber, or "Skinner Box", in which subjects such as pigeons and rats were isolated and could be exposed to carefully controlled stimuli. Unlike Thorndike's puzzle box, this arrangement allowed the subject to make one or two simple, repeatable responses, and the rate of such responses became Skinner's primary behavioral measure. Another invention, the cumulative recorder, produced a graphical record from which these response rates could be estimated. These records were the primary data that Skinner and his colleagues used to explore the effects on response rate of various reinforcement schedules. A reinforcement schedule may be defined as "any procedure that delivers reinforcement to an organism according to some well-defined rule". The effects of schedules became, in turn, the basic findings from which Skinner developed his account of operant conditioning. He also drew on many less formal observations of human and animal behavior.
Many of Skinner's writings are devoted to the application of operant conditioning to human behavior. In 1948 he published Walden Two, a fictional account of a peaceful, happy, productive community organized around his conditioning principles. In 1957, Skinner published Verbal Behavior, which extended the principles of operant conditioning to language, a form of human behavior that had previously been analyzed quite differently by linguists and others. Skinner defined new functional relationships such as "mands" and "tacts" to capture some essentials of language, but he introduced no new principles, treating verbal behavior like any other behavior controlled by its consequences, which included the reactions of the speaker's audience.
Operant behavior is said to be "emitted"; that is, initially it is not elicited by any particular stimulus. Thus one may ask why it happens in the first place. The answer to this question is like Darwin's answer to the question of the origin of a "new" bodily structure, namely, variation and selection. Similarly, the behavior of an individual varies from moment to moment, in such aspects as the specific motions involved, the amount of force applied, or the timing of the response. Variations that lead to reinforcement are strengthened, and if reinforcement is consistent, the behavior tends to remain stable. However, behavioral variability can itself be altered through the manipulation of certain variables.
Reinforcement and punishment are the core tools through which operant behavior is modified. These terms are defined by their effect on behavior. Either may be positive or negative, as described below.
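There is an additional procedure called "extinction", in which a previously reinforced behavior is no longer followed by reinforcement.
Thus there are a total of five basic consequences: positive reinforcement, negative reinforcement, positive punishment, negative punishment, and extinction.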
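The two-by-two arrangement (a stimulus is added or removed; the behavior becomes more or less frequent), together with extinction, can be sketched as a small lookup. The function name and the string labels below are illustrative assumptions, not standard notation.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant procedure from what happens after a response.

    stimulus_change: "added", "removed", or "withheld" (a reinforcer that
        used to follow the response is no longer delivered).
    behavior_change: "increases" or "decreases" (future rate of the response).
    The labels are hypothetical, chosen only for this sketch.
    """
    if stimulus_change == "withheld":
        return "extinction"
    # "Positive" and "negative" describe adding or removing a stimulus,
    # not whether the outcome is pleasant.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement and punishment are defined by their effect on behavior.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_change]
    return f"{sign} {kind}"


# Touching a hot stove adds pain and makes touching less likely.
print(classify_consequence("added", "decreases"))    # positive punishment
# Shielding the eyes removes glare and makes shielding more likely.
print(classify_consequence("removed", "increases"))  # negative reinforcement
```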
It is important to note that actors (e.g. rat) are not spoken of as being reinforced, punished, or extinguished; it is the actions (e.g. lever press) that are reinforced, punished, or extinguished. Also, reinforcement, punishment, and extinction are not terms whose use is restricted to the laboratory. Naturally occurring consequences can also reinforce, punish, or extinguish behavior and are not always planned or delivered by people.
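Schedules of reinforcement are rules that control the delivery of reinforcement. The rules specify either the time that reinforcement is to be made available, or the number of responses to be made, or both. Many rules are possible, but the most basic and commonly used are fixed-ratio, variable-ratio, fixed-interval, and variable-interval schedules, together with continuous reinforcement, in which every response is reinforced.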
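As a minimal sketch of how such rules might be written down, the following treats each schedule as a function that is told the time of a response and answers whether that response is reinforced. The function names, the use of abstract time units, and the uniform distribution used for the variable-ratio requirement are assumptions made for illustration only.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (the time argument is ignored)."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def fixed_interval(interval):
    """Reinforce the first response made at least `interval` time units
    after the previous reinforcer (or after the session start at t=0)."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses averaging mean_n.
    Assumption: the requirement is drawn uniformly from 1..(2*mean_n - 1)."""
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def respond(t):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


# One response per time unit, at t = 1, 2, ..., 10.
fr3 = fixed_ratio(3)
print([t for t in range(1, 11) if fr3(t)])  # [3, 6, 9]
fi4 = fixed_interval(4)
print([t for t in range(1, 11) if fi4(t)])  # [4, 8]
```

Ratio schedules count responses and interval schedules watch the clock; the variable versions differ only in drawing a fresh requirement after each reinforcer.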
The effectiveness of reinforcement and punishment can be changed in various ways.
Most of these factors serve biological functions. For example, the process of satiation helps the organism maintain a stable internal environment (homeostasis). When an organism has been deprived of sugar, for example, the taste of sugar is a highly effective reinforcer. However, when the organism's blood sugar reaches or exceeds an optimum level the taste of sugar becomes less effective, perhaps even aversive.
Shaping is a conditioning method much used in animal training and in teaching non-verbal humans. It depends on operant variability and reinforcement, as described above. The trainer starts by identifying the desired final (or "target") behavior. Next, the trainer chooses a behavior that the animal or person already emits with some probability. The form of this behavior is then gradually changed across successive trials by reinforcing behaviors that approximate the target behavior more and more closely. When the target behavior is finally emitted, it may be strengthened and maintained by the use of a schedule of reinforcement.
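The interplay of variability and selective reinforcement in shaping can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: behavior is reduced to a single number (say, jump height), responses vary normally around a "typical" value, only responses closer to the target than the current typical value are reinforced, and reinforcement moves the typical value part of the way toward the reinforced response. The target is assumed to lie above the starting value.

```python
import random


def shape(start, target, trials, step=0.5, spread=1.0, seed=0):
    """Toy model of shaping by successive approximations.

    Returns the typical response value after each trial (index 0 is the
    starting value). All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    typical = start
    history = [typical]
    for _ in range(trials):
        # Operant variability: the emitted response differs from trial to trial.
        response = rng.gauss(typical, spread)
        # The criterion for reinforcement rises with the animal's performance:
        # only approximations better than the current typical value pay off.
        if response > typical:
            # Reinforcement strengthens the closer approximation
            # (never overshooting the target behavior).
            typical += step * (min(response, target) - typical)
        history.append(typical)
    return history


history = shape(start=0.0, target=10.0, trials=300)
print(f"start {history[0]:.2f}, end {history[-1]:.2f}")
```

Because unreinforced responses leave the typical value unchanged, the trajectory never moves away from the target in this simplified model; real behavior under shaping is noisier.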
Noncontingent reinforcement is the delivery of reinforcing stimuli regardless of the organism's behavior. Noncontingent reinforcement may be used in an attempt to reduce an undesired target behavior by reinforcing multiple alternative responses while extinguishing the target response. As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".
Though initially operant behavior is emitted without an identified reference to a particular stimulus, during operant conditioning operants come under the control of stimuli that are present when behavior is reinforced. Such stimuli are called "discriminative stimuli." A so-called "three-term contingency" is the result. That is, discriminative stimuli set the occasion for responses that produce reward or punishment. Thus, a rat may be trained to press a lever only when a light comes on; a dog rushes to the kitchen when it hears the rattle of its food bag; a child reaches for candy when she sees it on a table.
Most behavior is under stimulus control. Several aspects of this may be distinguished:
Most behavior cannot easily be described in terms of individual responses reinforced one by one. The scope of operant analysis is expanded through the idea of behavioral chains, which are sequences of responses bound together by the three-term contingencies defined above. Chaining is based on the fact, experimentally demonstrated, that a discriminative stimulus not only sets the occasion for subsequent behavior, but it can also reinforce a behavior that precedes it. That is, a discriminative stimulus is also a "conditioned reinforcer". For example, the light that sets the occasion for lever pressing may be used to reinforce "turning around" in the presence of a noise. This results in the sequence "noise - turn-around - light - press lever - food". Much longer chains can be built by adding more stimuli and responses.
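The structure of a behavioral chain can be made explicit with a small data sketch: each link is a three-term contingency, and the consequence of one link must be the discriminative stimulus of the next. The class and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One three-term contingency in a chain."""
    discriminative_stimulus: str
    response: str
    consequence: str


def build_chain(links):
    """Check that adjacent links share a stimulus and return the sequence.

    The shared stimulus plays two roles: conditioned reinforcer for the
    earlier response and discriminative stimulus for the later one.
    """
    for earlier, later in zip(links, links[1:]):
        if earlier.consequence != later.discriminative_stimulus:
            raise ValueError(
                f"{earlier.consequence!r} does not set the occasion for {later.response!r}"
            )
    parts = [links[0].discriminative_stimulus]
    for link in links:
        parts += [link.response, link.consequence]
    return " - ".join(parts)


chain = [
    Link("noise", "turn-around", "light"),
    Link("light", "press lever", "food"),
]
print(build_chain(chain))  # noise - turn-around - light - press lever - food
```

Longer chains are built by adding further links whose consequence matches the first stimulus of the existing chain.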
In escape learning, a behavior terminates an (aversive) stimulus. For example, shielding one's eyes from sunlight terminates the (aversive) stimulation of bright light in one's eyes. (This is an example of negative reinforcement, defined above.) Behavior that is maintained by preventing a stimulus is called "avoidance," as, for example, putting on sun glasses before going outdoors. Avoidance behavior raises the so-called "avoidance paradox", for, it may be asked, how can the non-occurrence of a stimulus serve as a reinforcer? This question is addressed by several theories of avoidance (see below).
Two kinds of experimental settings are commonly used: discriminated and free-operant avoidance learning.
A discriminated avoidance experiment involves a series of trials in which a neutral stimulus such as a light is followed by an aversive stimulus such as a shock. After the neutral stimulus appears, an operant response such as a lever press prevents or terminates the aversive stimulus. In early trials the subject does not make the response until the aversive stimulus has come on, so these early trials are called "escape" trials. As learning progresses, the subject begins to respond during the neutral stimulus and thus prevents the aversive stimulus from occurring. Such trials are called "avoidance trials." This experiment is said to involve classical conditioning, because a neutral CS is paired with an aversive US; this idea underlies the two-factor theory of avoidance learning described below.
In free-operant avoidance a subject periodically receives an aversive stimulus (often an electric shock) unless an operant response is made; the response delays the onset of the shock. In this situation, unlike discriminated avoidance, no prior stimulus signals the shock. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S (shock-shock) interval. This is the time between successive shocks in the absence of a response. The second interval is the R-S (response-shock) interval. This specifies the time by which an operant response delays the onset of the next shock. Note that each time the subject performs the operant response, the R-S interval without shock begins anew.
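The way the two intervals interact can be sketched by computing when shocks would be delivered for a given list of response times. This is a simplified illustration under stated assumptions: time is measured from the start of the session, the first shock is scheduled one S-S interval after the start, and a response made at exactly the moment of a shock is treated as following it. The function and parameter names are hypothetical.

```python
def shock_times(ss, rs, responses, session_end):
    """Return the times at which shocks occur in a free-operant avoidance session.

    ss: S-S interval (time between shocks when no response occurs)
    rs: R-S interval (delay to the next shock after each response)
    responses: times at which the operant response is made
    session_end: shocks scheduled after this time are not delivered
    """
    shocks = []
    pending = sorted(responses)
    i = 0
    next_shock = ss  # assumption: the S-S clock starts with the session
    while True:
        # Every response made before the scheduled shock restarts the R-S clock.
        while i < len(pending) and pending[i] < next_shock:
            next_shock = pending[i] + rs
            i += 1
        if next_shock > session_end:
            break
        shocks.append(next_shock)
        next_shock += ss  # with no response, shocks recur every S-S interval
    return shocks


print(shock_times(ss=5, rs=20, responses=[], session_end=20))           # [5, 10, 15, 20]
print(shock_times(ss=5, rs=20, responses=[7], session_end=30))          # [5, 27]
print(shock_times(ss=5, rs=20, responses=[3, 20, 38], session_end=60))  # [58]
```

In the last example each response arrives before the current R-S interval runs out, so shock is postponed until responding stops.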
The two-factor theory of avoidance was originally proposed in order to explain discriminated avoidance learning, in which an organism learns to avoid an aversive stimulus by escaping from a signal for that stimulus. Two processes are involved: classical conditioning of the signal followed by operant conditioning of the escape response: a) Classical conditioning of fear. Initially the organism experiences the pairing of a CS (conditioned stimulus) with an aversive US (unconditioned stimulus). The theory assumes that this pairing creates an association between the CS and the US through classical conditioning and, because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER) - "fear." b) Reinforcement of the operant response by fear-reduction. As a result of the first process, the CS now signals fear; this unpleasant emotional reaction serves to motivate operant responses, and responses that terminate the CS are reinforced by fear termination. Note that the theory does not say that the organism "avoids" the US in the sense of anticipating it, but rather that the organism "escapes" an aversive internal state that is caused by the CS. Several experimental findings seem to run counter to two-factor theory. For example, avoidance behavior often extinguishes very slowly even when the initial CS-US pairing never occurs again, so the fear response might be expected to extinguish (see Classical conditioning). Further, animals that have learned to avoid often show little evidence of fear, suggesting that escape from fear is not necessary to maintain avoidance behavior.
Some theorists suggest that avoidance behavior may simply be a special case of operant behavior maintained by its consequences. In this view the idea of "consequences" is expanded to include sensitivity to a pattern of events. Thus, in avoidance, the consequence of a response is a reduction in the rate of aversive stimulation. Indeed, experimental evidence suggests that a "missed shock" is detected as a stimulus, and can act as a reinforcer. Cognitive theories of avoidance take this idea a step farther. For example, a rat comes to "expect" shock if it fails to press a lever and to "expect no shock" if it presses it, and avoidance behavior is strengthened if these expectancies are confirmed.
Operant hoarding refers to the observation that rats reinforced in a certain way may allow food pellets to accumulate in a food tray instead of retrieving those pellets. In this procedure, retrieval of the pellets always instituted a one-minute period of extinction during which no additional food pellets were available but those that had been accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations in which there is a choice between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.
The first scientific studies identifying neurons that responded in ways that suggested they encode conditioned stimuli came from work by Mahlon DeLong and by R.T. Richardson. They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus, or after a primary reward if no conditioned stimulus exists. These neurons are equally active for positive and negative reinforcers, and have been shown to be related to neuroplasticity in many cortical regions. Evidence also exists that dopamine is activated at similar times. There is considerable evidence that dopamine participates in both reinforcement and aversive learning. Dopamine pathways project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in the posterior cortical regions like the primary visual cortex. A study of patients with Parkinson's disease, a condition attributed to the insufficient action of dopamine, further illustrates the role of dopamine in positive reinforcement. It showed that while off their medication, patients learned more readily with aversive consequences than with positive reinforcement. Patients who were on their medication showed the opposite to be the case, positive reinforcement proving to be the more effective form of learning when dopamine activity is high.
A neurochemical process involving dopamine has been suggested to underlie reinforcement. When an organism experiences a reinforcing stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons." This allows recently activated synapses to increase their sensitivity to efferent (conducting outward) signals, thus increasing the probability of occurrence for the recent responses that preceded the reinforcement. These responses are, statistically, the most likely to have been the behavior responsible for successfully achieving reinforcement. But when the application of reinforcement is either less immediate or less contingent (less consistent), the ability of dopamine to act upon the appropriate synapses is reduced.
A number of observations seem to show that operant behavior can be established without reinforcement in the sense defined above. Most cited is the phenomenon of autoshaping (sometimes called "sign tracking"), in which a stimulus is repeatedly followed by reinforcement, and in consequence the animal begins to respond to the stimulus. For example, a response key is lighted and then food is presented. When this is repeated a few times a pigeon subject begins to peck the key even though food comes whether the bird pecks or not. Similarly, rats begin to handle small objects, such as a lever, when food is presented nearby. Strikingly, pigeons and rats persist in this behavior even when pecking the key or pressing the lever leads to less food (omission training).
These observations and others appear to contradict the law of effect, and they have prompted some researchers to propose new conceptualizations of operant reinforcement. A more general view is that autoshaping is an instance of classical conditioning; the autoshaping procedure has, in fact, become one of the most common ways to measure classical conditioning. In this view, many behaviors can be influenced by both classical contingencies (stimulus-response) and operant contingencies (response-reinforcement), and the experimenter's task is to work out how these interact.
A person's positive experience with a drug illustrates how the law of effect bears on drug dependence. Tolerance for a drug goes up as one continues to use it after a positive first experience with a certain amount, so it takes more and more to get the same feeling. This is when the controlled substance in an experiment would have to be modified and the experiment would really begin. The law of effect provided a framework for psychologist B. F. Skinner almost half a century later on the principles of operant conditioning, "a learning process by which the effect, or consequence, of a response influences the future rate of production of that response."
Reinforcement and punishment are ubiquitous in human social interactions, and a great many applications of operant principles have been suggested and implemented. Following are a few examples.
Applied behavior analysis is the discipline initiated by B. F. Skinner that applies the principles of conditioning to the modification of socially significant human behavior. It uses the basic concepts of conditioning theory, including conditioned stimulus (SC), discriminative stimulus (Sd), response (R), and reinforcing stimulus (Srein or Sr for reinforcers, sometimes Save for aversive stimuli). A conditioned stimulus controls behaviors developed through respondent (classical) conditioning, such as emotional reactions. The other three terms combine to form Skinner's "three-term contingency": a discriminative stimulus sets the occasion for responses that lead to reinforcement. Researchers have found the following protocol to be effective when they use the tools of operant conditioning to modify human behavior:
Practitioners of applied behavior analysis (ABA) bring these procedures, and many variations and developments of them, to bear on a variety of socially significant behaviors and issues. In many cases, practitioners use operant techniques to develop constructive, socially acceptable behaviors to replace aberrant behaviors. The techniques of ABA have been effectively applied to such things as early intensive behavioral interventions for children with an autism spectrum disorder (ASD), research on the principles influencing criminal behavior, HIV prevention, conservation of natural resources, education, gerontology, health and exercise, industrial safety, language acquisition, littering, medical procedures, parenting, psychotherapy, seatbelt use, severe mental disorders, sports, substance abuse, phobias, pediatric feeding disorders, and zoo management and care of animals. Some of these applications are among those described below.
Positive and negative reinforcement play central roles in the development and maintenance of addiction and drug dependence. An addictive drug is intrinsically rewarding; that is, it functions as a primary positive reinforcer of drug use. The brain's reward system assigns it incentive salience (i.e., it is "wanted" or "desired"), so as an addiction develops, deprivation of the drug leads to craving. In addition, stimuli associated with drug use - e.g., the sight of a syringe, and the location of use - become associated with the intense reinforcement induced by the drug. These previously neutral stimuli acquire several properties: their appearance can induce craving, and they can become conditioned positive reinforcers of continued use. Thus, if an addicted individual encounters one of these drug cues, a craving for the associated drug may reappear. For example, anti-drug agencies previously used posters with images of drug paraphernalia as an attempt to show the dangers of drug use. However, such posters are no longer used because of the effects of incentive salience in causing relapse upon sight of the stimuli illustrated in the posters.
In drug dependent individuals, negative reinforcement occurs when a drug is self-administered in order to alleviate or "escape" the symptoms of physical dependence (e.g., tremors and sweating) and/or psychological dependence (e.g., anhedonia, restlessness, irritability, and anxiety) that arise during the state of drug withdrawal.
Animal trainers and pet owners were applying the principles and practices of operant conditioning long before these ideas were named and studied, and animal training still provides one of the clearest and most convincing examples of operant control. Of the concepts and procedures described in this article, a few of the most salient are the following: (a) availability of primary reinforcement (e.g. a bag of dog treats); (b) the use of secondary reinforcement (e.g. sounding a clicker immediately after a desired response, then giving a treat); (c) contingency, assuring that reinforcement (e.g. the clicker) follows the desired behavior and not something else; (d) shaping, as in gradually getting a dog to jump higher and higher; (e) intermittent reinforcement, as in gradually reducing the frequency of reinforcement to induce persistent behavior without satiation; (f) chaining, where a complex behavior is gradually constructed from smaller units.
Example of animal training at SeaWorld related to operant conditioning.
Animal training makes use of both positive and negative reinforcement, and schedules of reinforcement may play a large role in how well the training works.
Providing positive reinforcement for appropriate child behaviors is a major focus of parent management training. Typically, parents learn to reward appropriate behavior through social rewards (such as praise, smiles, and hugs) as well as concrete rewards (such as stickers or points towards a larger reward as part of an incentive system created collaboratively with the child). In addition, parents learn to select simple behaviors as an initial focus and reward each of the small steps that their child achieves towards reaching a larger goal (this concept is called "successive approximations").
Both psychologists and economists have become interested in applying operant concepts and findings to the behavior of humans in the marketplace. An example is the analysis of consumer demand, as indexed by the amount of a commodity that is purchased. In economics, the degree to which price influences consumption is called "the price elasticity of demand." Certain commodities are more elastic than others; for example, a change in price of certain foods may have a large effect on the amount bought, while gasoline and other essentials may be less affected by price changes. In terms of operant analysis, such effects may be interpreted in terms of motivations of consumers and the relative value of the commodities as reinforcers.
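A short worked example may make the elasticity comparison concrete. The sketch below uses the midpoint (arc) formula, which is one common way of computing price elasticity and is an assumption here rather than something the operant literature prescribes; the prices and quantities are invented for illustration.

```python
def arc_elasticity(p1, q1, p2, q2):
    """Price elasticity of demand by the midpoint (arc) formula:
    percentage change in quantity divided by percentage change in price,
    each measured relative to the average of the old and new values."""
    pct_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_quantity / pct_price


# Hypothetical data: the price of both goods rises from 1.00 to 1.10.
snack = arc_elasticity(1.00, 100, 1.10, 80)     # purchases fall from 100 to 80
gasoline = arc_elasticity(1.00, 100, 1.10, 98)  # purchases fall from 100 to 98

print(round(snack, 2))     # -2.33 (elastic: magnitude greater than 1)
print(round(gasoline, 2))  # -0.21 (inelastic: magnitude less than 1)
```

In operant terms, the same price increase (a higher response cost per reinforcer) reduces consumption of the snack far more than consumption of gasoline, which is consistent with the essential commodity being the more valuable reinforcer.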
As stated earlier in this article, a variable ratio schedule yields reinforcement after the emission of an unpredictable number of responses. This schedule typically generates rapid, persistent responding. Slot machines pay off on a variable ratio schedule, and they produce just this sort of persistent lever-pulling behavior in gamblers. The variable ratio payoff from slot machines and other forms of gambling has often been cited as a factor underlying gambling addiction.
Nudge theory (or nudge) is a concept in behavioural science, political theory and economics which argues that positive reinforcement and indirect suggestions to try to achieve non-forced compliance can influence the motives, incentives and decision making of groups and individuals, at least as effectively - if not more effectively - than direct instruction, legislation, or enforcement.
The concept of praise as a means of behavioral reinforcement is rooted in B.F. Skinner's model of operant conditioning. Through this lens, praise has been viewed as a means of positive reinforcement, wherein an observed behavior is made more likely to occur by contingently praising said behavior. Hundreds of studies have demonstrated the effectiveness of praise in promoting positive behaviors, notably in studies of teachers' and parents' use of praise with children to promote improved behavior and academic performance, but also in the study of work performance. Praise has also been demonstrated to reinforce positive behaviors in non-praised adjacent individuals (such as a classmate of the praise recipient) through vicarious reinforcement. Praise may be more or less effective in changing behavior depending on its form, content and delivery. In order for praise to effect positive behavior change, it must be contingent on the positive behavior (i.e., only administered after the targeted behavior is enacted), must specify the particulars of the behavior that is to be reinforced, and must be delivered sincerely and credibly.
Acknowledging the effect of praise as a positive reinforcement strategy, numerous behavioral and cognitive behavioral interventions have incorporated the use of praise in their protocols. The strategic use of praise is recognized as an evidence-based practice in both classroom management and parenting training interventions, though praise is often subsumed in intervention research into a larger category of positive reinforcement, which includes strategies such as strategic attention and behavioral rewards.
Several studies have examined the effect that cognitive-behavioral therapy (CBT) and operant-behavioral therapy (OBT) have on different medical conditions. When patients developed cognitive and behavioral techniques that changed their behaviors, attitudes, and emotions, their pain severity decreased. The results of these studies showed an influence of cognitions on pain perception, and the impact observed was taken to explain the general efficacy of CBT and OBT.
Traumatic bonding occurs as the result of ongoing cycles of abuse in which the intermittent reinforcement of reward and punishment creates powerful emotional bonds that are resistant to change.
Another source states: 'The necessary conditions for traumatic bonding are that one person must dominate the other and that the level of abuse chronically spikes and then subsides. The relationship is characterized by periods of permissive, compassionate, and even affectionate behavior from the dominant person, punctuated by intermittent episodes of intense abuse. To maintain the upper hand, the victimizer manipulates the behavior of the victim and limits the victim's options so as to perpetuate the power imbalance. Any threat to the balance of dominance and submission may be met with an escalating cycle of punishment ranging from seething intimidation to intensely violent outbursts. The victimizer also isolates the victim from other sources of support, which reduces the likelihood of detection and intervention, impairs the victim's ability to receive countervailing self-referent feedback, and strengthens the sense of unilateral dependency...The traumatic effects of these abusive relationships may include the impairment of the victim's capacity for accurate self-appraisal, leading to a sense of personal inadequacy and a subordinate sense of dependence upon the dominating person. Victims also may encounter a variety of unpleasant social and legal consequences of their emotional and behavioral affiliation with someone who perpetrated aggressive acts, even if they themselves were the recipients of the aggression.'
Most video games are designed around some type of compulsion loop, adding a type of positive reinforcement through a variable-ratio schedule to keep the player playing the game, though this can also lead to video game addiction.
As part of a trend in the monetization of video games in the 2010s, some games offered "loot boxes", either as rewards or as items purchasable with real-world funds, that contained a random selection of in-game items distributed by rarity. The practice has been tied to the same methods by which slot machines and other gambling devices dole out rewards, as it follows a variable-ratio schedule. While loot boxes are generally perceived as a form of gambling, the practice is classified as such in only a few countries and is otherwise legal. However, methods to use those items as virtual currency for online gambling or to trade them for real-world money have created a skin gambling market that is under legal evaluation.
Ashforth discussed potentially destructive sides of leadership and identified what he referred to as petty tyrants: leaders who exercise a tyrannical style of management, resulting in a climate of fear in the workplace. Partial or intermittent negative reinforcement can create an effective climate of fear and doubt. When employees get the sense that bullies are tolerated, a climate of fear may be the result.
Individual differences in sensitivity to reward, punishment, and motivation have been studied under the premises of reinforcement sensitivity theory and have also been applied to workplace performance.
Rewards in operant conditioning are positive reinforcers. ... Operant behavior gives a good definition for rewards. Anything that makes an individual come back for more is a positive reinforcer and therefore a reward. Although it provides a good definition, positive reinforcement is only one of several reward functions. ... Rewards are attractive. They are motivating and make us exert an effort. ... Rewards induce approach behavior, also called appetitive or preparatory behavior, and consummatory behavior. ... Thus any stimulus, object, event, activity, or situation that has the potential to make us approach and consume it is by definition a reward.
Abused substances (ranging from alcohol to psychostimulants) are initially ingested at regular occasions according to their positive reinforcing properties. Importantly, repeated exposure to rewarding substances sets off a chain of secondary reinforcing events, whereby cues and contexts associated with drug use may themselves become reinforcing and thereby contribute to the continued use and possible abuse of the substance(s) of choice. ...
An important dimension of reinforcement highly relevant to the addiction process (and particularly relapse) is secondary reinforcement (Stewart, 1992). Secondary reinforcers (in many cases also considered conditioned reinforcers) likely drive the majority of reinforcement processes in humans. In the specific case of drug [addiction], cues and contexts that are intimately and repeatedly associated with drug use will often themselves become reinforcing ... A fundamental piece of Robinson and Berridge's incentive-sensitization theory of addiction posits that the incentive value or attractive nature of such secondary reinforcement processes, in addition to the primary reinforcers themselves, may persist and even become sensitized over time in league with the development of drug addiction (Robinson and Berridge, 1993). ...
Negative reinforcement is a special condition associated with a strengthening of behavioral responses that terminate some ongoing (presumably aversive) stimulus. In this case we can define a negative reinforcer as a motivational stimulus that strengthens such an "escape" response. Historically, in relation to drug addiction, this phenomenon has been consistently observed in humans whereby drugs of abuse are self-administered to quench a motivational need in the state of withdrawal (Wikler, 1952).
When a Pavlovian CS+ is attributed with incentive salience it not only triggers 'wanting' for its UCS, but often the cue itself becomes highly attractive - even to an irrational degree. This cue attraction is another signature feature of incentive salience. The CS becomes hard not to look at (Wiers & Stacy, 2006; Hickey et al., 2010a; Piech et al., 2010; Anderson et al., 2011). The CS even takes on some incentive properties similar to its UCS. An attractive CS often elicits behavioral motivated approach, and sometimes an individual may even attempt to 'consume' the CS somewhat as its UCS (e.g., eat, drink, smoke, have sex with, take as drug). 'Wanting' of a CS can also turn the formerly neutral stimulus into an instrumental conditioned reinforcer, so that an individual will work to obtain the cue (however, there exist alternative psychological mechanisms for conditioned reinforcement too).
An important goal in future for addiction neuroscience is to understand how intense motivation becomes narrowly focused on a particular target. Addiction has been suggested to be partly due to excessive incentive salience produced by sensitized or hyper-reactive dopamine systems that produce intense 'wanting' (Robinson and Berridge, 1993). But why one target becomes more 'wanted' than all others has not been fully explained. In addicts or agonist-stimulated patients, the repetition of dopamine-stimulation of incentive salience becomes attributed to particular individualized pursuits, such as taking the addictive drug or the particular compulsions. In Pavlovian reward situations, some cues for reward become more 'wanted' more than others as powerful motivational magnets, in ways that differ across individuals (Robinson et al., 2014b; Saunders and Robinson, 2013). ... However, hedonic effects might well change over time. As a drug was taken repeatedly, mesolimbic dopaminergic sensitization could consequently occur in susceptible individuals to amplify 'wanting' (Leyton and Vezina, 2013; Lodge and Grace, 2011; Wolf and Ferrario, 2010), even if opioid hedonic mechanisms underwent down-regulation due to continual drug stimulation, producing 'liking' tolerance. Incentive-sensitization would produce addiction, by selectively magnifying cue-triggered 'wanting' to take the drug again, and so powerfully cause motivation even if the drug became less pleasant (Robinson and Berridge, 1993).