CHAPTER 4 – INSTRUMENTAL LEARNING AND ENGAGEMENT IN DOGS

The objective of instrumental learning (operant conditioning) is not to learn new behaviours. Learning new behaviours is a by-product of the process. The sole objective of instrumental learning, in our view, is to teach the dog that it has control over its circumstances and that certain behaviours on its part will bring about corresponding consequences. While the pathway (instrumental learning) leading to the dog’s preferred consequences can be laid open for the dog, a critical presumption remains – that the dog is interested in the betterment of its circumstances through its own effort.

In Chapters 2 and 3, we covered the mesolimbic pathway, where neural signals trigger the release of dopamine, causing arousal and influencing behaviour to seek out pleasurable activity (for example, hunting to eat). Dopamine binds to dopaminergic receptors present in the nucleus accumbens and prefrontal cortex. Increased activity in the projections to the nucleus accumbens plays a major role in reinforcement, which manifests as a sense of accomplishment / pride / general happiness, leading the dog to intensify its effort in the same manner as it had done earlier. We also covered how activity of the medial prefrontal cortex (PFC), which promotes the detection of control, leads to the automatic inhibition of the dorsal raphe nucleus (DRN). Learning control is largely a process of instrumental learning, and dogs learn to act in accordance with the outcomes produced by their actions. Through instrumental learning at an early age – learning how to switch off stressors, i.e. escapable stress – the ventromedial PFC circuitry is altered. This creates a specific and persistent change in the prelimbic-dorsal raphe nucleus circuit that leads to the inhibition of the dorsal raphe nucleus and prevents passivity in response to even inescapable stress.

The vast majority of dog trainers assume that the dog is inherently keen to improve its existing circumstances; that it is not passive by default. These trainers focus exclusively on creating the neural signals (marker-based training / clicker-based training), rushing headlong into training new behaviours without checking whether the dog’s inherited genotypic and phenotypic traits are congruent with the training methods applied. They champion the use of positive reinforcement methods loudly and proudly, both in person and in the digital space. They “spit” on those who differ from their philosophy and brandish the big spear of “animal welfare” to wage war on those who understand better and deeper. True instrumental learning consists pre-eminently of conditioning the ventromedial PFC circuitry so that the DRN is suppressed and helplessness is no longer the default. We want a dog that keeps trying its luck, learning from a young age to overcome stressors for a win. Creating neural signals and spurring dopaminergic binding in the PFC and nucleus accumbens is of secondary importance. This is what we mean by building engagement and drive in the dog. At Happy Paw Ark, rushing into behaviour training is spurned. We are more interested in getting the dog to detect control, seize control and act to improve its circumstances whenever an opportunity presents itself. We, the humans, control the opportunities through disciplined antecedent arrangement – setting events and conditioned motivating operations – and consistent delivery of consequences.

Let us illustrate this significant topic with a Compare & Contrast table:

Activity	At Happy Paw Ark	Typical Pet Owner	Why We Do What We Do?
Feeding	Food is given whenever the dog does something, e.g. looks the trainer in the eye and stays still; dog ignores the bowl and walks over to the trainer	Food appears at fixed times and in a bowl, regardless of the dog’s hunger level	Dog learns that it needs to behave in a certain way to make food appear. As the dog learns an expanding array of behaviours to make food appear, it realises the control it has over the appearance of food
Walking	Dog decides where to go; trainer is passive and acts only in the interest of the dog’s safety, e.g. to steer clear of glass, vehicles, cat poop etc. Trainer is interested in the emotional state of the dog and has food ready to associate typically scary objects with the appearance of food (counter-conditioning). Food can also be used to reinforce good behaviours such as walking away from food waste left on the ground, jumping up onto the park bench (overcoming an obstacle to get somewhere), wagging its tail at a passing cyclist etc. The walk can last 30 minutes but cover only 300 m. Trainer is not interested in the distance covered; the dog decides.	Human decides where to go; pulls the dog along or chicken-dances to entice the dog to follow. Wishes that the dog would follow nicely and not poke its nose into everything every time. Human is interested in arriving at a certain destination so the dog can mingle, e.g. a dog run. The walk has a fixed distance/destination/duration.	Dog is acting of its own accord, to benefit itself through its own sensory detections. Dog has control over its activity. Dog covers little distance but comes home exhausted because of the spontaneous learning activities involved.
Playing (predation)	Different objects are used to entice the dog into a chase, grab-bite and kill-bite. Objects are left for dead once the dog bites on. The next object comes alive as the trainer animates it. Objects are thrown into corners or confined spaces for the dog to retrieve; the dog is allowed to enter, retrieve, and feel relieved and accomplished. No competition from the trainer; dog learns a sense of accomplishment by intensifying its behaviours of chasing, biting the object and releasing the dead object.	Owners like their dogs to play fetch because it makes them look in control, with their dog working “for” them. Mindless and never-ending throwing of ball / kong / rope / frisbee. Lack of consideration of the purpose of play in the dog’s life and of the different stages of the predation sequence. Owners compete with their dogs over possession of the prey object – tug of war – as a game.	Dog is acting on its own instincts and refining those instincts. Trainer shapes the dog by giving different consequences to different expressions of instincts. If the dog bites the ball and chews, it gets lots of pets and rubs. If the dog play-bites the trainer’s hands, the play session is ended prematurely and the dog loses access to wanted consequences.

Consequence-based learning in animals gained traction in the 1960s after B. F. Skinner published his Theory of Learning – when A, if B, then C; where A is the antecedent, B is the behaviour of the learner / animal and C is the consequence that corresponds to the behaviour executed by the learner / animal. Skinner also introduced the concept of a schedule of reinforcement – a “timetable” specifying how often a behaviour is reinforced. In some cases, a behaviour might be reinforced every time it occurs. Sometimes, a behaviour might not be reinforced at all. At other times, a behaviour is reinforced every third time it is performed.
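
The three cases just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names (is_reinforced, reinforced_occurrences) and the schedule labels ("continuous", "extinction", "fixed_ratio") are hypothetical names chosen for this example, not Skinner's notation.

```python
from typing import List


def is_reinforced(schedule: str, occurrence: int, ratio: int = 3) -> bool:
    """Return True if the nth occurrence (counted from 1) of a behaviour is reinforced.

    Hypothetical schedule labels, chosen for illustration only:
      "continuous"  - the behaviour is reinforced every time it occurs
      "extinction"  - the behaviour is never reinforced
      "fixed_ratio" - the behaviour is reinforced on every `ratio`-th occurrence
    """
    if occurrence < 1:
        raise ValueError("occurrence is counted from 1")
    if schedule == "continuous":
        return True
    if schedule == "extinction":
        return False
    if schedule == "fixed_ratio":
        return occurrence % ratio == 0
    raise ValueError(f"unknown schedule: {schedule}")


def reinforced_occurrences(schedule: str, total: int, ratio: int = 3) -> List[int]:
    """List which of the first `total` occurrences earn reinforcement."""
    return [n for n in range(1, total + 1) if is_reinforced(schedule, n, ratio)]
```

For example, reinforced_occurrences("fixed_ratio", 9) returns [3, 6, 9]: the dog is paid on every third performance, whereas "continuous" pays on all nine and "extinction" on none.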

Translating the ABCs of the Theory of Learning into the ubiquitous learning quadrants, we have Positive Reinforcement, Negative Reinforcement, Negative Punishment and Positive Punishment:

When all criteria of the Antecedent have been met,

	Positive (Addition, +)	Negative (Subtraction, -)
Reinforcement (Intensifies)	A consequence wanted by the dog is given after the dog performs a wanted behaviour; dog learns that a particular variant of a behaviour brings about wanted consequences	A consequence unwanted by the dog is removed after the dog performs a wanted behaviour; dog learns that a certain behaviour switches off pressure/aversive
Punishment (Diminishes)	A consequence unwanted by the dog is given after the dog performs an unwanted behaviour; dog learns that a certain behaviour under certain circumstances brings about bad stuff	A consequence wanted by the dog is removed after the dog performs an unwanted behaviour; dog learns that access to the good stuff is lost when a certain behaviour is performed under certain circumstances
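
The table above reduces to two questions: was something added or removed, and does the behaviour intensify or diminish as a result? A minimal Python sketch of that lookup follows. The function name quadrant and the argument values ("added", "removed", "intensifies", "diminishes") are our own hypothetical labels for illustration.

```python
def quadrant(stimulus_change: str, behaviour_effect: str) -> str:
    """Map a consequence onto one of the four learning quadrants.

    stimulus_change:  "added" (positive, +) or "removed" (negative, -)
    behaviour_effect: "intensifies" (reinforcement) or "diminishes" (punishment)
    These argument values are illustrative labels, not standard terminology.
    """
    labels = {
        ("added", "intensifies"): "R+",    # wanted consequence is given
        ("removed", "intensifies"): "R-",  # unwanted consequence is removed
        ("added", "diminishes"): "P+",     # unwanted consequence is given
        ("removed", "diminishes"): "P-",   # wanted consequence is removed
    }
    try:
        return labels[(stimulus_change, behaviour_effect)]
    except KeyError:
        raise ValueError(
            f"unrecognised combination: {stimulus_change!r}, {behaviour_effect!r}"
        ) from None
```

Ending a play session because the dog play-bites the trainer’s hands, for instance, removes something the dog wants and diminishes the biting, so quadrant("removed", "diminishes") gives "P-".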

Instinctively, it appears unquestionable that dog training should reside in the Positive Reinforcement (R+) quadrant only. Those trainers who espouse such an ideological fantasy could not be farther from the truth. We need to examine the premises of R+ before we go further into R-, P+ and P-.

For R+ to work, there are several preconditions. For a worker to be motivated to put in his/her normal effort, the paycheck (wanted consequence) must be of a sufficient amount, and this amount is determined by the worker, not the employer. A new migrant might be willing to put in the same effort for less than a rich man’s child who has led a life of comfort from a young age. Now, if the new migrant struck the lottery and won a million dollars overnight, would you expect him/her to put in the same effort at work the next day? Therefore, the first criterion for R+ to be effective is that the dog craves its “paycheck”.

If a new migrant worker craves his/her paycheck at the end of the month, and the employer dishes out a difficult task in the middle of the month, would the worker experience stress? Yes, since he/she would think: “This task must be completed, otherwise I might lose my paycheck.” In contrast, the rich man’s child might be more lackadaisical, as the paycheck at the end of the month carries very little significance. Therefore, the second criterion for R+ to be effective is that either (i) the dog is in the position of a new migrant or (ii) the paycheck is significantly enticing even if the dog is a “rich man’s child”. It would be best if both are met. Nonetheless, it must be pointed out that effective R+ involves stress. This stress emanates from the dog’s desire for the “paycheck”. If there is no desire, there is no stress, but there is also no behaviour / learning because the paycheck is insignificant to the dog. The dog learns to turn off the stress by behaving correctly and quickly to reach the eventual paycheck.

How does the dog learn which behaviours turn off stress and get the paycheck? Some trainers teach by dangling the “paycheck” in front of the dog and leading the dog to perform the required behaviours, i.e. luring. Other trainers prefer to let the dog know that the “paycheck” is right there in the pocket but that a certain behaviour, or any variation of it, is required to unlock the “paycheck” from the pocket, i.e. differential reinforcement of successive approximations (DR+). Luring is less stressful for the dog because it can see the “paycheck” clearly, smell it, and just needs to follow it. The dog learns very little other than to follow the lure. DR+ is more stressful for the dog, but the dog volunteers the behaviours of its own volition in exchange for the paycheck. Those behaviours that unlocked the pocket will be well remembered and re-enacted with ever-increasing intensity and frequency.

The key point here is that even in R+, there is stress because the dog wants the paycheck. The dog behaves in a certain way to get the paycheck, and the stress is reduced / removed by its own actions. There is no truly fear-free or stress-free learning/training after all, contrary to what the proponents of fear-free dog training claim.

What about R-? Traditionally, R- involves performing an unknown behaviour to turn off pressure, or simply not doing anything in order to avoid pressure (avoidance). This typically results in subdued states of mind in dogs.

The dog is suppressed and, when R- is prolonged, typically slips into learned helplessness and depression because the learner (i.e. the dog) chances upon the correct behaviour only by luck. In traditional R-, there is little to no sense of control as the dog falters and tries again in a repetitive loop. However, in the modern application of R-, the dog has already learnt the correct behaviour through R+, and it comes to realise that performing this previously taught behaviour not only switches off pressure but also brings about wanted consequences. You will get eager dogs giving their heart and soul in performing behaviours. See https://youtu.be/tIxHCQspe30

Those waving the banner of zero use of punishment are literally barking up the wrong tree! What they are really fighting against are the compulsion-based trainers who tend to create subdued and helpless dogs through their training. When compulsion is combined with luring, we get a dog that depends on the handler’s cues to perform, because any form of initiative will likely earn the dog an aversive stimulus. There is a place for punitive measures in dog training.

Positive punishment or negative punishment serves to diminish unwanted behaviours such as biting, growling and outright aggression, whether or not predation is involved. When one is dealing with instincts, one can only influence them through redirection or punishment. When a human senses a fist flying towards his/her face, the human will blink and duck. Instincts! There is no way to get rid of instinctive responses, for these are innate and not learned through instrumental learning. So we can only hope to control these innate responses through thousands if not millions of repetitions in classical conditioning, or through positive punishment. As with the paycheck in positive reinforcement, the consequences delivered in P+ must be sufficiently punitive, else there is really no point in punishing the dog. So typically, when trainers need to deter the future occurrence of bites on friendlies, the punishment must be abrupt and disastrous for the dog. Usually we will “arrest” the dog and pin it down forcefully until we feel a softening of its body. This denotes a change of mental state in the dog; we then allow it to perform other simple behaviours and reward it if they are done correctly. Those who claim that punishments are not necessary in education/training/learning have obviously not come across a sufficient variety of dogs to be able to call themselves dog trainers. These people exist in a bubble consisting of dogs damaged by artificial selection. These creatures are not truly the dogs that joined human communities millennia ago.