resources for applied ethology

 


Learning Theory


 

 

Operant conditioning

 

An operant response is a voluntary activity that brings about a reward. In operant conditioning, the buzzer used by Pavlov might still be presented but the dog must make a particular response before food is delivered. In other words, there is a special link (what learning theorists call a contingency) between a particular behavioural response and a food reward.

While Pavlov was concentrating on the physiological responses of dogs in harnesses, Thorndike (1911) was studying the behavioural responses of cats in puzzle boxes. Instead of delivering food independently of behaviour whenever a signal had been presented, Thorndike delivered it once his animals had responded. In a body of work intended to discredit the notion that animals are capable of reason, Thorndike described the behaviour of a naïve cat in a specially designed box.

Without food or other home comforts, life in the puzzle box was rather dull and unsustainable, but the cat could get out, though only by pulling a trigger. Motivated to access food outside the box, Thorndike's cats would eventually learn to escape by operating the trigger that released the door latch. Once out of the box, the cat would get his food. Thorndike called this “trial and error learning”. This label has largely been replaced by the terms instrumental learning and operant conditioning. The animal sees a cue (the trigger), performs a response (pulling) and gets a reward (liberty and food). The effect of the reward is to strengthen the correct response. This is known as reinforcement. The term reinforcement refers to the process in which a reinforcer follows a particular behaviour so that the frequency (or probability) of that behaviour increases.
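The idea that reinforcement raises the probability of the response it follows can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the three behaviours, their starting strengths, the fixed increment and the rule that probability is proportional to strength are all hypothetical and are not taken from Thorndike's work.

```python
def choice_probabilities(strengths):
    """Turn response strengths into probabilities by simple normalisation."""
    total = sum(strengths.values())
    return {response: s / total for response, s in strengths.items()}


def reinforce(strengths, response, increment=1.0):
    """Return a copy of strengths with the reinforced response strengthened."""
    updated = dict(strengths)
    updated[response] += increment
    return updated


# Hypothetical starting repertoire of a naive cat in the puzzle box.
strengths = {"scratch_walls": 1.0, "miaow": 1.0, "pull_trigger": 1.0}
before = choice_probabilities(strengths)["pull_trigger"]

# Assumption: each escape (liberty and food) follows a pull on the trigger.
for _ in range(5):
    strengths = reinforce(strengths, "pull_trigger")

after = choice_probabilities(strengths)["pull_trigger"]
print(f"P(pull_trigger) before: {before:.2f}, after: {after:.2f}")
```

Only the reinforced response gains strength, so its share of the cat's behaviour grows while the unrewarded responses become relatively less likely.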

Reinforcers and punishments

Operant conditioning enables an animal to associate events over which it has control. This increases the controllability of the environment and represents the crucial difference between classical (Pavlovian) and operant conditioning. In classical conditioning, rewards become associated with stimuli while in operant conditioning they become associated with responses. The majority of exercises in animal training rely on operant conditioning. Rewarding the desired behaviour relates to the Law of Effect which states that whatever behaviour immediately precedes reinforcement will be strengthened. Several studies suggest that lack of control over aversive events can bring about major behavioural and physiological changes. For example, after being exposed to uncontrollable electric shocks, rats have increased gastric ulceration, increased defaecation rates and are more susceptible to certain cancers when compared to individuals that do have control over comparable shocks.

The distinction between classical and instrumental conditioning seems clear when we are considering the temporal relationship between the response and the reinforcement but the mechanisms involved are likely to be the same. Indeed, they are both often found in what appears to be a single response. The likelihood of an association arising depends on the relationship between the first event and the second via stimulus-response-reinforcement chains. Consider a cat that has learned to run towards the kitchen when she hears the sound of the tin opener. The sound of the tin opener comes before food and therefore becomes classically associated with food, making the appearance of her supper more predictable. Running in response to the auditory cue is a product of operant conditioning that is reinforced every time she is fed. Her life is more controllable as a result of this learning because she can choose the speed with which she runs or even whether to bother running at all.

This field of study was developed further by B.F. Skinner, who created the Skinner box, a device that is basically a problem box in which the subject learns by trial and error that pressing a bar yields a small reward. The bar-pressing behaviour is then reinforced. Skinner reported his findings in his seminal book The Behavior of Organisms (1938). He argued that, with the selection of appropriate rewards, this system could be used to teach anything.

Food is not the only reward that can be used. The other obvious one is water that can be given to subjects that have been kept thirsty. This is interesting because close observation of the heads of experimental pigeons in Skinner boxes shows that they adopt different approaches to the key (an operant device that must be pecked) depending on whether they are expecting to receive a food or water reward. If the reward is water then the bird will use the device with closed eyes, an open mouth and a peck of longer duration and less force than the peck for food.

Some argue that reinforcement is necessary for learning to take place. However, rats being trained to run from A to B down an alleyway that receive a shock to their hind-paws while in transit will reach B faster than those given only food as an incentive. A reinforcer is anything that increases the frequency of the particular behaviour that it follows. Operant conditioning allows us to use reinforcers and punishers, which respectively increase and decrease the likelihood of a behaviour being repeated. A response will increase in strength when followed by a reward. In his free operant experiments, Skinner measured the strength of a response by recording the response rate, i.e. the number of responses per unit of time. Skinner used this outcome to develop the principle of reinforcement.
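Response rate as described here is simple arithmetic: a count of responses divided by the time over which they were recorded. The sketch below shows that calculation for a whole session and for consecutive time bins. The function names and the bar-press times are hypothetical illustrations, not data from Skinner's experiments.

```python
def response_rate(response_times, session_length):
    """Responses per unit of time over a session of the given length."""
    if session_length <= 0:
        raise ValueError("session_length must be positive")
    return len(response_times) / session_length


def rate_per_bin(response_times, session_length, bin_width):
    """Response rate in consecutive time bins, to show change across a session."""
    n_bins = int(session_length // bin_width)
    counts = [0] * n_bins
    for t in response_times:
        index = int(t // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return [count / bin_width for count in counts]


# Hypothetical bar-press times (seconds) during a 60-second session.
presses = [5, 21, 30, 38, 44, 47, 50, 52, 55, 58]

overall = response_rate(presses, 60)   # 10 presses in 60 s
by_bin = rate_per_bin(presses, 60, 20)  # three 20-second bins

print(f"overall rate: {overall:.3f} presses per second")
print("rate per 20 s bin:", by_bin)
```

In this made-up session the rate climbs from bin to bin, which is the kind of change a rising response strength would produce.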

If a blue tit pecks for long enough at the foil on a milk bottle he will encounter the cream below and be more likely to repeat the pecking behaviour. Reinforcement has occurred. The blue tit has an innate drive to continue foraging in this way if the costs can be offset by the benefits. In other words, if evolution has equipped him to be an efficient food gatherer, the bird can judge, at a subconscious level, whether the time and energy spent pecking at the foil and the risks of predation by the nearest cat are outweighed by the (taste and) nutritional value of the cream. This exemplifies what ethologists refer to as optimal foraging, the non-human equivalent of a time and motion study. The pleasure and nourishment brought by the activity justify its pursuit.

The merit of a reinforcer can only be measured in terms of the degree to which it makes the behaviour more likely in future. If a trainer's saying "good dog" in response to a dog’s heel-work has no effect on the dog’s future behaviour then, according to this definition, reinforcement has not occurred. The trainer's words have had a neutral or even confusing effect. The definition does not describe how or why some events act as reinforcers. Whether some event is called a reinforcer is purely a matter of the effect it had. This is why, instead of encouraging owners to give their dogs praise, which can so often be understated and, as a result, ineffective, many of the more enlightened dog schools tell their humans to 'make those tails wag'.

Animals can be trained to do quite remarkable things if they are reinforced at the right time. For example, in one study, Skinner delivered food to eight pigeons every 15 seconds regardless of what they were doing at the time. After a number of rewards, six of them were performing behaviours (such as circling in a single direction) repeatedly throughout the interval between reinforcers. Even though there was no causal relationship between the behaviour and the reinforcer, the birds happened to be doing something at the time of reinforcement. By waiting for an incidental movement of the eyelids, scientists were able to teach pigeons to blink to receive a food reward. Cats that learn to rub their owner's legs just prior to the delivery of food have learned in the same way. The activity generally does little to get the food to them quicker, i.e. is not causal, but because of its contiguity to reinforcement it is slavishly included in pre-prandial rituals. Those of us who repeatedly press the on-button at a pedestrian crossing are probably subject to the same phenomenon. Because we perceive a link between serial button pressing and the appearance of the signal to cross, we think it is the best way of getting the desired outcome quicker. Strangely, this was called superstitious learning for some time. This was surely a misnomer since the pigeon was behaving predictably and rationally rather than misguidedly.
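The way a behaviour can be strengthened by contiguity alone, with no causal link to the food, can be sketched as a small simulation. Everything here is an assumption made for illustration: the list of behaviours, the rule that the bird's current behaviour is drawn in proportion to its strength, and the fixed increment added whenever food happens to arrive. It is a toy model, not a reconstruction of Skinner's procedure.

```python
import random


def simulate_superstition(behaviours, n_deliveries, seed=0):
    """Deliver food on a fixed schedule, regardless of what the bird is doing.

    Returns the final strength of each behaviour.
    """
    rng = random.Random(seed)
    strengths = {behaviour: 1.0 for behaviour in behaviours}
    for _ in range(n_deliveries):
        names = list(strengths)
        weights = [strengths[name] for name in names]
        # Whatever the bird happens to be doing when the food arrives...
        current = rng.choices(names, weights=weights, k=1)[0]
        # ...is strengthened, even though it did not cause the food.
        strengths[current] += 1.0
    return strengths


# Hypothetical repertoire of a pigeon waiting in the box.
repertoire = ["circling", "head_bobbing", "wing_stretch", "pecking_floor"]
result = simulate_superstition(repertoire, n_deliveries=50, seed=1)

for behaviour, strength in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{behaviour:14s} {strength:.0f}")
```

Because a behaviour that is strengthened early is more likely to be under way at the next delivery, chance differences tend to be amplified, which is one way of picturing how a repeated ritual could emerge.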

Reinforcers can be either primary or secondary. Primary reinforcers are any resources that animals have evolved to seek. If the animal's motivation is correctly predicted, food, water, sex, play, liberty, sanctuary and companionship can all be used as primary reinforcers. Secondary reinforcers are stimuli that are not intrinsically rewarding but that have become associated with the kind of primary resources listed above. These associations make great sense in evolutionary terms since an auditory, olfactory or visual cue that has become reliably linked with a primary reinforcer will hold an animal’s interest much longer than a neutral stimulus. For example, a fox can learn to make associations between the smell of hens and the meal they represent. If the smell did nothing to help the fox feed then it would hold no value and remain neutral. Instead, it encourages the fox to persist in its foraging activities.

The houselight in a Skinner box can be used to indicate a correct response if the reward and the light have been delivered simultaneously on a number of occasions. The light becomes reinforcing. The rat appears to look forward to illumination of the light. Consider for a moment the way in which horses are often praised with tactile stimuli: they can be either scratched at the withers or patted on the neck. Horses have evolved to find grooming one another rewarding. Indeed, horses indulging in the familiar 'I'll scratch your back if you scratch mine' occupation have reduced heart rates that suggest they may be getting pleasure or stress reduction from the stimulation. So, a scratch in the correct part of the withers can represent a primary reinforcer. By comparison, the far more common practice of patting horses on the neck is reinforcing only if the owner has coupled the pat with something pleasant. Because horses have not evolved to be motivated to behave in a certain way for pats on the neck, the stimulation has to be conditioned as a secondary reinforcer.

Perhaps the best example of a secondary reinforcer is the sound made by a so-called 'clicker', the handy device used by thousands of trainers world-wide. Pioneered by students of Skinner, this association allows the trainer to bridge the gap between the time at which an animal performs a response correctly and the arrival of a primary reinforcer. The Brelands developed feeding devices that made a characteristic sound as a prelude to food. Psychology labs that use rats for learning studies do the same thing and call it hopper or magazine training. Essentially the clicker comes to mean 'Yes, That's good - expect a reward any second now'. When a clicker is first used the correct association is established by making the sound just before giving a delicious reward and doing this many times to convince the animal of the signal's reliability. Clicker training proves particularly helpful when training behaviours in a free operant situation. Any secondary reinforcer can be instituted in this way. The only significant feature of a commercial clicker device is the sound it makes which is crisp and distinctive. The crispness facilitates precise reinforcement of sophisticated and brief behaviours such as the blinking of an eye. Being pocket-sized or attachable to key-rings, clickers are convenient but by no means unique. Indeed, as long as they cannot be confused with words that appear in common parlance, human vocalisations (so-called clicker words) are even more readily available.

Secondary reinforcers are most effectively established when presented before or up until the presentation of a primary reinforcer. Simultaneous presentation of a reward and a novel secondary stimulus is less likely to work because the primary reinforcer will block or overshadow the new stimulus. Similarly, presentation of the secondary stimulus after the primary reinforcer is unproductive because, although an association will exist between the two, it does not help the animal predict the arrival of a reward. Perhaps this is why hunting species respond more to the smell of blood, a reliable precursor of food, than to intestinal contents, which appear only after a kill.
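A related form of blocking, in which a new signal learns little because the reward is already well predicted by an established one, can be sketched with the Rescorla-Wagner learning rule. That model is not named in this text and is brought in here purely as an illustration. It also shows blocking between two cues rather than blocking by the primary reinforcer itself. The cue names, learning rate and trial counts are hypothetical.

```python
def rescorla_wagner_trial(strengths, cues_present, reward, learning_rate=0.3):
    """Update the strengths of the cues present on one trial, in place.

    The prediction error (reward minus the summed strength of the cues
    present) is shared by all of them, so a well-predicted reward leaves
    little to learn about a newly added cue.
    """
    prediction = sum(strengths[cue] for cue in cues_present)
    error = reward - prediction
    for cue in cues_present:
        strengths[cue] += learning_rate * error
    return error


def train(phases, learning_rate=0.3):
    """Run a list of (cues, reward, n_trials) phases and return the strengths."""
    strengths = {}
    for cues, reward, n_trials in phases:
        for cue in cues:
            strengths.setdefault(cue, 0.0)
        for _ in range(n_trials):
            rescorla_wagner_trial(strengths, cues, reward, learning_rate)
    return strengths


# Hypothetical cues: an established click and a newly added spoken word.
blocked = train([(("click",), 1.0, 20), (("click", "word"), 1.0, 20)])
control = train([(("click", "word"), 1.0, 20)])

print(f"word strength after pre-training the click: {blocked['word']:.3f}")
print(f"word strength without pre-training:         {control['word']:.3f}")
```

With these assumed settings the word ends up with a strength close to zero when the click was trained first, and roughly half of the available strength when both cues start together.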

Dogs have evolved to appease the leader of their pack and this may be why they respond so readily to social rewards such as petting and praise from humans. Some dogs can be reinforced by the slightest social contact and this is why pushing such a pet to the floor after it has broken a house rule by jumping up at a visitor is highly unlikely to eliminate the unwanted behaviour. Given that appeasement and affirmation of a social bond are the reasons for contact being so reinforcing, it is important to remind ourselves that the human offering such a reward can only really make it worthwhile for the dog if the dog has an understanding of that human’s perceived greater social rank. This is probably why a vet saying “Good Boy” as she deals with a strange dog in the clinic is far less effective than when she uses the same words with her own pet. Equally, this explains why guide dog trainers emphasise the importance of developing a bond between dogs and their trainers and subsequent owners before asking them to work.

For dogs with an innate play drive that makes their ancestors, the wolves, look like proper party poopers, many toys have value and therefore can have reinforcing properties. As well as being rewarding in their own right, toys can be used as conditioned or secondary reinforcers in behaviour therapy. If a dog behaves fearfully when exposed to certain stimuli, it can be taught to look forward to exposure to a special toy as a prelude to the arrival of all pleasant experiences. Once established as a secondary reinforcer the toy can be used to build pleasant associations with the aversive stimuli.

The speed or strength of learning increases with the size and attractiveness of the reinforcer. This is why rats will learn to run faster in a maze if the food reward at the end is especially valuable. Recently, John Rogerson has described the use of toys in a thoughtfully constructed reward gradient. With a reward gradient, reinforcers are graded in terms of their increasing value to the animal. Food can be used in this way by being presented in increasingly tasty, favoured and plentiful forms. Similarly, the relative value of toys can be determined and phobic dogs can be exposed to toys of increasing value as they are exposed to closer approximations of the real source of the phobic response.

There is an important note to bear in mind when we consider the size of rewards. They can be too great. The effect of increasing the size or attractiveness of the reinforcer has certain limits in that too high a level of arousal can have a disruptive effect on learning. The optimum level of arousal decreases as the complexity of the learned task increases.

Contiguity is the principle stating that events that occur together will become associated. Giving a sugar lump to a horse two minutes after a pat on the neck will not develop a useful association. The lump and the pat have to arrive together if the pat is to become reinforcing. The same principle applies in recall training in dogs. The ignorant owner who sees his dog scavenging, calls him back and then hits him is hitting him for coming when he was called, not for chewing chicken bones.

For the best results in a training or learning context, it is not sufficient for a reinforcer to be contiguous. There is excellent scientific evidence that it also has to be surprising. The importance of surprise seems to be that it represents an avenue through which the animal can know that it has made a discovery.
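One common way to put a number on surprise is as a prediction error: the gap between the reward received and the reward expected. The worked example below uses a simple error-correcting update of the Rescorla-Wagner kind, which this text does not name. The symbols and the values chosen (a learning rate of 0.3 and a maximum expectation of 1) are assumptions for illustration only.

```latex
% V_n      : how strongly the reward is expected on trial n (0 = not at all)
% \lambda  : the maximum expectation the reward can support (taken as 1)
% \alpha   : an assumed learning rate (0.3)
\[
  \Delta V_n = \alpha\,(\lambda - V_n), \qquad V_{n+1} = V_n + \Delta V_n
\]
\[
  V_0 = 0, \quad \Delta V_0 = 0.3\,(1 - 0) = 0.3, \quad V_1 = 0.3
\]
\[
  \Delta V_1 = 0.3\,(1 - 0.3) = 0.21, \quad V_2 = 0.51
\]
\[
  \Delta V_2 = 0.3\,(1 - 0.51) = 0.147, \quad V_3 = 0.657
\]
```

Each step is smaller than the last: as the reward becomes expected it becomes less surprising, and on this account less is learned from it.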

 

 Dog being operantly conditioned to 'shake hands'


Boo is owned and trained by Lindy Coote

 

 

 

 

 

 

 

 

           
 

 

© animalbehaviour.net
