Operant conditioning
An operant response is a voluntary activity that brings about a
reward. In operant conditioning, the buzzer used by Pavlov might
still be presented but the dog must make a particular response
before food is consumed. In other words, there is a special link
(what learning theorists call a contingency) between a particular
behavioural response and a food reward.
While Pavlov was concentrating on the physiological responses of
dogs in harnesses, Thorndike (1911) was studying the behavioural
responses of cats in puzzle boxes. Instead of delivering food
independently of behaviour whenever a signal had been presented,
Thorndike delivered it once his animals had responded. In a body of
work intended to discredit the notion that animals are capable of
reason, Thorndike described the behaviour of a naïve cat in a
specially designed box.

Of course, without any food or other home comforts, life in the
puzzle box was rather dull and unsustainable, but the cat could get
out, though only by pulling a trigger. Motivated to access food
outside the box, Thorndike's cats would eventually learn to escape by
operating the trigger that released the door latch. Once out of the
box, the cat would get his food. Thorndike called this “trial and
error learning”.
This label has largely been replaced by the terms instrumental
learning and operant conditioning. The animal sees a cue (the trigger),
performs a response (pulling) and gets a reward (liberty and food).
The effect of the reward is to strengthen the correct response. This
is known as reinforcement. The term reinforcement
refers to the process in which a reinforcer follows a
particular behaviour so that the frequency (or probability) of that
behaviour increases.
Reinforcers and punishments
Operant conditioning enables an animal to associate events over
which it has control. This increases the controllability of the
environment and represents the crucial difference between classical
(Pavlovian) and operant conditioning. In classical conditioning,
rewards become associated with stimuli while in operant conditioning
they become associated with responses. The majority of exercises in
animal training rely on operant conditioning.
Rewarding the desired behaviour relates to the Law of Effect, which
states that
whatever behaviour immediately precedes reinforcement will be
strengthened. Several studies suggest that lack of control over
aversive events can bring about major behavioural and physiological
changes. For example, after being exposed to uncontrollable electric
shocks, rats have increased gastric ulceration, increased
defaecation rates and are more susceptible to certain cancers when
compared to individuals that do have control over comparable shocks.
The distinction between classical and instrumental conditioning
seems clear when we are considering the temporal relationship
between the response and the reinforcement but the mechanisms
involved are likely to be the same. Indeed, they are both often
found in what appears to be a single response. The likelihood of an
association arising depends on the relationship between the first
event and the second via stimulus-response-reinforcement chains.
Consider a cat that has learned to run towards the kitchen when she
hears the sound of the tin opener. The sound of the tin opener comes
before food and therefore becomes classically associated with food,
making the appearance of her supper more predictable. Running in
response to the auditory cue is a product of operant conditioning
that is reinforced every time she is fed. Her life is more
controllable as a result of this learning because she can choose the
speed with which she runs or even whether to bother running at all.
This field of study was developed further by B.F. Skinner, who
created the Skinner box, a device that is basically a problem box in
which the subject learns by trial and error that pressing a bar
yields a small reward. The bar-pressing behaviour is then
reinforced. Skinner reported his findings in a seminal book, The
Behavior of Organisms. He argued that, with the selection of
appropriate rewards, this system could be used to teach anything.

Food is not the only reward that can be used. The other obvious one
is water that can be given to subjects that have been kept thirsty.
This is interesting because close observation of the heads of
experimental pigeons in Skinner boxes shows that they adopt
different approaches to the key (an operant device that must be
pecked) depending on whether they are expecting to receive a food or
water reward. If the reward is water then the bird will use the
device with closed eyes, an open mouth and a peck of longer duration
and less force than the peck for food.
Some argue that reinforcement is necessary for learning to take
place. However, rats that receive a shock to their hind-paws while
in transit, when being trained to run from A to B down an alleyway,
will reach B faster than those given only food as an incentive. A
reinforcer is anything that increases the frequency of the
particular behaviour that it follows. Operant conditioning allows us
to use reinforcers and punishers, which respectively increase and
decrease the likelihood of a behaviour being repeated. A
response will increase in strength when followed by a reward. In his
free operant experiments, Skinner measured the strength of a
response by recording the response rate, i.e. number of responses
per unit of time. Skinner used this outcome to develop the principle
of reinforcement.
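The response-rate measure can be sketched in a few lines of Python. This is only an illustration: the function name, the use of timestamps and the example figures are assumptions made here, not a description of Skinner's recording apparatus.

```python
def response_rate(response_times, session_length):
    """Return the number of responses per unit of time for one session.

    response_times: times at which each response occurred (hypothetical data).
    session_length: total session duration, in the same unit of time.
    """
    if session_length <= 0:
        raise ValueError("session_length must be positive")
    # Count only the responses that fall inside the session window.
    in_session = [t for t in response_times if 0 <= t <= session_length]
    return len(in_session) / session_length


# Hypothetical bar-press times (in minutes) from two 10-minute sessions.
before_training = [1.0, 6.5]
after_training = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

print(response_rate(before_training, 10))  # presses per minute, early session
print(response_rate(after_training, 10))   # presses per minute, later session
```

A rise in this figure from one session to the next is what would be read as an increase in the strength of the response.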
If a blue tit pecks for long enough at the foil on a milk bottle he
will encounter the cream below and be more likely to repeat the
pecking behaviour. Reinforcement has occurred. The blue tit has an
innate drive to continue foraging in this way if the costs can be
offset by the benefits. In other words, if evolution has equipped
him to be an efficient food gatherer, the bird can judge, at a
subconscious level, whether the time and energy spent pecking at the
foil and the risks of predation by the nearest cat are outweighed by
the (taste and) nutritional value of the cream. This exemplifies
what ethologists refer to as optimal foraging, the non-human
equivalent of a time and motion study. The pleasure and nourishment
brought by the activity justify its pursuit.
The merit of a reinforcer can only be measured in terms of the
degree to which it makes the behaviour more likely in future. If a
trainer's saying "good dog" in response to a dog’s heel-work has no
effect on the dog’s future behaviour then, according to this
definition, reinforcement has not occurred. The trainer's words have
had a neutral or even confusing effect. The definition does not
describe how or why some events act as reinforcers. Whether some
event is called a reinforcer is purely a matter of the effect it
had. This is why, instead of encouraging owners to give their dogs
praise, which can so often be understated and, as a result,
ineffective, many of the more enlightened dog schools tell their
humans to 'make those tails wag'.
Animals can be trained to do quite remarkable things if they are
reinforced at the right time. For example, in one study, Skinner
delivered food to eight pigeons every 15 seconds regardless of what
they were doing at the time. After a number of rewards, six of them
were performing behaviours (such as circling in a single direction)
repeatedly throughout the interval between reinforcers. Even though
there was no causal relationship between the behaviour and the
reinforcer, the birds happened to be doing something at the time of
reinforcement. By waiting for an incidental movement of the eyelids,
scientists were able to teach pigeons to blink to receive a food
reward. Cats that learn to rub their owner's legs just prior to the
delivery of food have learned in the same way. The activity
generally does little to get the food to them quicker, i.e. it is not
causal, but because of its contiguity to reinforcement it is
slavishly included in pre-prandial rituals. Those of us who
repeatedly press the on-button at a pedestrian crossing are probably
subject to the same phenomenon. Because we perceive a link between
serial button pressing and the appearance of the signal to cross, we
think it is the best way of getting the desired outcome quicker.
Strangely, this was called superstitious learning for some time.
This was surely a misnomer since the pigeon was behaving predictably
and rationally rather than misguidedly.
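The way a behaviour can be strengthened by contiguity alone can be illustrated with a toy simulation. The sketch below is not a model taken from Skinner's study: the behaviour names, the weighting rule, the session length and the size of the boost are all assumptions chosen for illustration. Food arrives on the clock every 15 seconds, and whatever the simulated bird happens to be doing at that moment becomes slightly more likely in future.

```python
import random


def simulate_fixed_time_schedule(behaviours, interval=15, duration=900,
                                 boost=1.0, seed=0):
    """Toy model: food arrives every `interval` seconds regardless of behaviour.

    The behaviour that coincides with each delivery has its weight raised by
    `boost`. All parameters and the update rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = {b: 1.0 for b in behaviours}  # every behaviour starts equally likely
    names = list(weights)
    for second in range(1, duration + 1):
        # The bird performs one behaviour per second, chosen in proportion to weight.
        current = rng.choices(names, weights=[weights[n] for n in names])[0]
        if second % interval == 0:
            # Food is delivered on the clock; the coincident behaviour is reinforced.
            weights[current] += boost
    return weights


# Hypothetical repertoire of incidental behaviours.
repertoire = ["circling", "head-bobbing", "wing-flapping", "floor-pecking"]
result = simulate_fixed_time_schedule(repertoire)
for name, weight in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.0f}")
```

In this sketch no behaviour causes the food, yet early coincidences are self-amplifying, so the behaviours typically finish with unequal weights.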
Reinforcers can be either primary or secondary.
Primary reinforcers are any resources that animals have evolved to
seek. If the animal's motivation is correctly predicted, food, water,
sex, play, liberty, sanctuary and companionship can all be used as
primary reinforcers. Secondary reinforcers are stimuli
that are not intrinsically rewarding but that have become associated
with the kind of primary resources listed above. These associations
make great sense in evolutionary terms since an auditory, olfactory
or visual cue that has become reliably linked with a primary
reinforcer will hold an animal’s interest much longer than a neutral
stimulus. For example, a fox can learn to make associations between
the smell of hens and the meal they represent. If the smell did
nothing to help the fox feed then it would hold no value and remain
neutral. Instead, it encourages the fox to persist in its foraging
activities.
The houselight in a Skinner box can be used to indicate a correct
response if the reward and the light have been delivered
simultaneously on a number of occasions. The light becomes
reinforcing. The rat appears to look forward to illumination of the
light. Consider for a moment the way in which horses are often
praised with tactile stimuli: they can be either scratched at the
withers or patted on the neck. Horses have evolved to find grooming
one another rewarding. Indeed, horses
indulging in the familiar 'I'll scratch your back if you scratch
mine' occupation have reduced heart rates that suggest they may be
getting pleasure or stress reduction from the stimulation. So, a
scratch in the correct part of the withers can represent a primary
reinforcer. By comparison, the far more common practice of patting
horses on the neck is reinforcing only if the owner has coupled the
pat with something pleasant. Because horses have not evolved to be
motivated to behave in a certain way for pats on the neck, the
stimulation has to be conditioned as a secondary reinforcer.
Perhaps the best example of a secondary reinforcer is the sound made
by a so-called 'clicker', the handy device used by thousands of
trainers world-wide. Pioneered by students of Skinner, this
association allows the trainer to bridge the gap between the time at
which an animal performs a response correctly and the arrival of a
primary reinforcer. The Brelands developed feeding devices that made
a characteristic sound as a prelude to food. Psychology labs that
use rats for learning studies do the same thing and call it hopper
or magazine training. Essentially the clicker comes to mean 'Yes,
that's good - expect a reward any second now'. When a clicker is
first used, the correct association is established by making the
sound just before giving a delicious reward and doing this many
times to convince the animal of the signal's reliability. Clicker
training proves particularly helpful when training behaviours in a
free operant situation. Any secondary reinforcer can be instituted
in this way. The only significant feature of a commercial clicker
device is the sound it makes which is crisp and distinctive. The
crispness facilitates precise reinforcement of sophisticated and
brief behaviours such as the blinking of an eye. Being pocket-sized
or attachable to key-rings, clickers are convenient but by no means
unique. Indeed, as long as they cannot be confused with words that
appear in common parlance, human vocalisations (so-called clicker
words) are even more readily available.
Secondary reinforcers are most effectively established when
presented before or up until the presentation of a primary
reinforcer. Simultaneous presentation of a reward and a novel
secondary stimulus is less likely to work because the primary
reinforcer will block or overshadow the new stimulus. Similarly,
presentation of the secondary stimulus after the primary reinforcer
is unproductive, because although an association will exist between
the two, it does not help the animal predict the arrival of a
reward. Perhaps this is why hunting species respond more to the
smell of blood, a reliable precursor of food, than to that of
intestinal contents, which appear only after a kill.
Dogs have evolved to appease the leader of their pack and this may
be why they respond so readily to social rewards such as petting and
praise from humans. Some dogs can be reinforced by the slightest
social contact and this is why pushing such a pet to the floor after
it has broken a house rule by jumping up at a visitor is highly
unlikely to eliminate the unwanted behaviour. Given that appeasement
and affirmation of a social bond are the reasons for contact being so
reinforcing, it is important to remind ourselves that the human
offering such a reward can only really make it worthwhile for the
dog if the dog has an understanding of that human’s perceived
greater social rank. This is probably why a vet saying “Good Boy” as
she deals with a strange dog in the clinic is far less effective
than when she uses the same words with her own pet. Equally this
explains why guide dog trainers emphasise the importance of
developing a bond between dogs and their trainers and subsequent
owners, before asking them to work.
For dogs with an innate play drive that makes their ancestors, the
wolves, look like proper party poopers, many toys have value and
therefore can have reinforcing properties. As well as being
rewarding in their own right, toys can be used as conditioned or
secondary reinforcers in behaviour therapy. If a dog behaves
fearfully when exposed to certain stimuli, it can be taught to look
forward to exposure to a special toy as a prelude to the arrival of
all pleasant experiences. Once established as a secondary reinforcer
the toy can be used to build pleasant associations with the aversive
stimuli.
The speed or strength of learning increases with the size and
attractiveness of the reinforcer. This is why rats will learn to run
faster in a maze if the food reward at the end is especially
valuable. Recently, John Rogerson has described the use of toys in a
thoughtfully constructed reward gradient. With a reward gradient,
reinforcers are graded in terms of their increasing value to the
animal. Food can be used in this way by being presented in
increasingly tasty, favoured and plentiful forms. Similarly, the
relative value of toys can be determined and phobic dogs can be
exposed to toys of increasing value as they are exposed to closer
approximations of the real source of the phobic response.
There is an important note to bear in mind when we consider the size
of rewards. They can be too great. The effect of increasing the size
or attractiveness of the reinforcer has certain limits in that too
high a level of arousal can have a disruptive effect on learning.
The optimum level of arousal decreases as the complexity of the
learned task increases.
Contiguity is the principle stating that events that occur
together will become associated. Giving a sugar lump to a horse two
minutes after a pat on the neck will not develop a useful
association. The lump and the pat have to arrive together if the pat
is to become reinforcing. The same principle applies in recall
training in dogs. The ignorant owner who sees his dog scavenging,
calls him back and then hits him is hitting him for coming when he
was called, not for chewing chicken bones.
For the best results in a training or learning context, it is not
sufficient for a reinforcer to be contiguous. There is excellent
scientific evidence that it also has to be surprising. The
importance of surprise seems to be that it represents an avenue
through which the animal can know that it has made a discovery.
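One way to make the role of surprise concrete is a prediction-error update, in which the amount learned on each trial is proportional to the gap between the reward received and the reward expected. The sketch below follows the general form of the Rescorla-Wagner rule, a model that the text above does not name, so it should be read as an assumption brought in for illustration. The function name, learning rate and reward size are likewise hypothetical.

```python
def prediction_error_learning(n_trials, reward=1.0, learning_rate=0.3):
    """Track associative strength across trials in which a cue precedes a reward.

    Returns a list of (surprise, strength) pairs, one per trial.
    """
    strength = 0.0  # how strongly the cue currently predicts the reward
    history = []
    for _ in range(n_trials):
        surprise = reward - strength          # the unexpected part of the reward
        strength += learning_rate * surprise  # learning scales with surprise
        history.append((surprise, strength))
    return history


for trial, (surprise, strength) in enumerate(prediction_error_learning(5), start=1):
    print(f"trial {trial}: surprise={surprise:.3f}, strength={strength:.3f}")
```

Under these assumptions the first reward is wholly unexpected and produces the largest change. As the reward becomes predictable, surprise shrinks and each further pairing adds less.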