|
Shaping
Although Skinner's findings with his lever-pressing rats may seem to
represent common sense, his school of thought has produced some
intriguing principles. Perhaps one of the simplest yet most powerful
of these is shaping. This is the concept of reinforcing successive
approximations to the final response. The technique allows a trainer
to move from a situation where it is impossible to reinforce a
desired response (because that response never occurs) to one where
the response is occurring, being reinforced, and increasing in
reliability. If trainers wish to reinforce particular responses, they
can either wait for the behaviour to occur spontaneously, an approach
that works readily if the behaviour occurs frequently, or shape
the behaviour pattern. In seeking to train complex behaviours or
those that occur uncommonly in an animal, the trainer will usually
opt to reinforce successive approximations of the final behaviour.
A good example of shaping comes from the send-away exercise
in dog training. Contrary to the dog's innate tendency to remain
with the pack, this exercise persuades it to leave the owner. It is achieved
by rewarding the dog for small movements away and then, on the next
occasion, making the all-important demand for more of the same response
before delivering the reward. Crucially, shaping relies on using
reinforcement sparingly and grading it so that the animal does not stagnate. A
common characteristic among good trainers is their ability to
recognise an opportunity to reinforce improved "approximations".
While poorer trainers complain that their animals fail to understand
what is being asked of them and feel that the animals have peaked in
their training, their superiors have the sense and patience to
capitalise on each tiny improvement as the only way of moving towards
the final response.
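The logic of raising the criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `shape`, the trial distances and the fixed step size are all hypothetical, and a real trainer judges each approximation by eye rather than by a fixed rule.

```python
def shape(observed, start_criterion, step, target):
    """Reward each response that meets the current criterion,
    then demand a little more on the next occasion.

    All parameters are hypothetical illustration values.
    """
    criterion = start_criterion
    log = []
    for response in observed:
        rewarded = response >= criterion
        log.append((response, criterion, rewarded))
        # Only raise the bar after a rewarded response, and never past the target.
        if rewarded and criterion < target:
            criterion = min(criterion + step, target)
    return log


# Made-up send-away distances in metres, one per trial.
distances = [1, 1, 2, 2, 3, 5, 4, 6]
for response, criterion, rewarded in shape(distances, start_criterion=1, step=1, target=5):
    print(f"moved {response} m, needed {criterion} m, rewarded: {rewarded}")
```

In this made-up run the second and fourth trials go unrewarded because the dog merely repeats a distance that has already earned a reward, which is the "demand for more" described above.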
In shaping, it is important to reward a behaviour as soon as it
happens. This avoids a phenomenon called the delay of
reinforcement effect. Any delay in rewarding the improvement
will lessen the effect of that reward. This may be because it allows
the subject to perform another response during the delay interval,
and it is this later response that is reinforced. An example might be rewarding a horse for
jumping a fence very cleanly by giving him a sugar cube. To
administer the sugar while riding, you would have to bend forward
and place it in front of the horse’s mouth. Since this could not be
achieved safely, you would probably slow down and even halt. Instead
of learning to jump ever more cleanly, the horse would predictably
learn to slow down and halt, these being the behaviours closest to
the reward.
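The horse example can be expressed as a toy model in Python. It rests on a deliberately crude assumption, namely that only the behaviour immediately preceding the reward is strengthened; the function name `reinforced_behaviour` and the timings are hypothetical.

```python
def reinforced_behaviour(events, reward_time):
    """Return the behaviour that most recently preceded the reward.

    events: list of (time_in_seconds, behaviour_name) pairs.
    Simplifying assumption: the behaviour closest in time before
    the reward is the one that gets reinforced.
    """
    before = [(t, name) for t, name in events if t <= reward_time]
    if not before:
        return None
    return max(before)[1]


# Made-up timeline for the sugar-cube example.
events = [(0.0, "jump cleanly"), (4.0, "slow down"), (7.0, "halt")]

print(reinforced_behaviour(events, reward_time=0.5))  # reward given at once
print(reinforced_behaviour(events, reward_time=8.0))  # reward given after halting
```

With an immediate reward the model credits the clean jump; with the delayed reward it credits the halt, mirroring the outcome described above.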
Generalisation and discrimination
Pavlov found that almost any stimulus could act as a conditioned
stimulus provided it did not produce too strong a response of its
own. In very hungry dogs, even painful stimuli like electric shocks
delivered to the paws, which initially caused flinching and distress,
quite soon evoked salivation if paired with food. Pavlov carried out
exhaustive tests using this apparatus and a variety of tactile,
visual or auditory stimuli (the board in front of Pavlov's dog could
be used to present visual images with an infinite variety of colours
and shapes). He found that if a dog was conditioned to salivate when
a pure tone of perhaps 800 Hz was sounded, it would also salivate
when other tones were given, but to a lesser extent. This is now
known as generalisation. The dog generalised its responses to
include stimuli similar to the conditioned one, and the more similar
they were, the more the dog salivated.
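The shape of such a generalisation gradient can be illustrated with a short Python sketch. The bell-shaped (Gaussian) curve, the width of 200 Hz and the function name `generalised_response` are assumptions chosen purely for illustration; they are not Pavlov's data.

```python
import math


def generalised_response(tone_hz, trained_hz=800.0, width_hz=200.0, peak=1.0):
    """Relative salivation to a test tone after conditioning to trained_hz.

    Assumes a Gaussian fall-off with distance from the trained tone;
    width_hz and peak are hypothetical illustration values.
    """
    distance = tone_hz - trained_hz
    return peak * math.exp(-(distance ** 2) / (2 * width_hz ** 2))


for tone in (400, 600, 700, 800, 900, 1000, 1200):
    print(f"{tone} Hz: {generalised_response(tone):.2f}")
```

The response is greatest at the trained tone and falls away as the test tone becomes less similar, which is the pattern the passage describes.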
The opposite process to generalisation is discrimination.
Dogs naturally discriminate to some extent; otherwise they would
salivate equally to all sounds and tones. Discrimination can be
accelerated if, as well as being rewarded for responding to the right tone, the dog is
mildly punished when it salivates to the others. This is called
conditioned discrimination and has been of enormous benefit in
working out the sensory capabilities of animals. For example, by
refining the stimuli to which dogs are required to respond in order
to get a reward, we can ask questions about what they can actually
see. By training a dog to respond consistently to
a colour in a certain wavelength band, we can ask the question “can dogs
see the colour blue?” Their ability to discern between panels of the
same reflectance but different colour tells us that the answer is
yes, and that they can see green as well (we know this because they are more sensitive
to light in these wavebands than to red).
Commands used to cue a behaviour can be the product of
discrimination. Police attack dogs exemplify the way in which
certain words can be kept in reserve for special purposes. When such a dog
is excited at the prospect of a bite, he has to discriminate between
words to discern the release command. Equally, after he has bitten,
when he hears 'leave' he has to discriminate between this command
from his handler and all the other shouting, screaming and blasphemy
that accompany a dog-assisted arrest.
By rewarding animals for responding appropriately to stimuli that
are less and less obvious, we can foster the power to discriminate
between the stimulus that is rewarded and all other background
information that would otherwise prevail. Discrimination is what
allows us to train dogs to detect drugs, pigs to locate truffles,
and chickens to identify images of familiar feathered friends. A
similar process is at play when we train animals to respond to
smaller and smaller cues in training. |