The Essence of Marker-based Positive Reinforcement in Dog Training
Actions that lead to beneficial outcomes are more likely to be repeated than those that do not. This process, whereby the probability of a behavioural response increases as a consequence of the outcome of that response, is referred to as positive reinforcement. Intra-cranial self-stimulation (ICSS) is a simple behavioural model that distils positive reinforcement to its minimum neural elements. In ICSS paradigms, mammals make instrumental (operant) responses in order to deliver stimulation to a specific brain area. Sites containing dopamine neurons or their ascending projections are particularly effective in eliciting this behaviour, and systemic administration of dopamine antagonists causes dramatic reductions in ICSS, strongly implicating dopamine neurons as a neural substrate. A recent study used genetically targeted channelrhodopsin-2 (ChR2) to specifically activate ventral tegmental area (VTA) dopamine neurons and confirmed that dopamine neurons are indeed sufficient to drive vigorous ICSS, consistent with a rich literature demonstrating that VTA dopamine neurons play critical roles in learned appetitive behaviours.
Dopamine is an organic chemical of the catecholamine and phenethylamine families. Dopaminergic signalling is associated with reward-motivated behaviour and motor control. The mesolimbic pathway originates in the ventral tegmental area (VTA) and projects to the nucleus accumbens, amygdala, cingulate gyrus, hippocampus, and the pyriform complex of the olfactory bulb. Related dopaminergic projections from the VTA also reach the prefrontal cortex.
The dopaminergic projections in the amygdala and cingulate gyrus are responsible for emotion formation and processing. In the hippocampus, the presence of dopaminergic neurons is associated with learning, working memory, and long-term memory formation. Lastly, the pyriform complex of the olfactory bulb is responsible for providing dogs with the sense of smell.
In the mesolimbic pathway, dopamine is released during pleasurable situations and binds to dopaminergic receptors in the nucleus accumbens and prefrontal cortex. This causes arousal and motivates the animal to seek out the pleasurable activity or occupation. Increased activity in the projections to the nucleus accumbens plays a major role in reinforcement and, in more extreme cases, in addiction.
Having understood the intrinsic effects of dopamine, we shall now take a look at the extrinsic factors. We assume that the goal of reinforcement learning is to maximise future benefits. Analogous to utilities in economic theories, value functions in reinforcement learning theory refer to the estimates for the sum of future reinforcers. However, since the dog cannot predict future changes in its environment perfectly, value functions, unlike utilities, reflect the animal’s empirical estimates of its future reinforcers. Reinforcers in the distant future are often temporally discounted, so that more immediate reinforcers exert a stronger influence on the animal’s behaviour. Reinforcement learning theory uses two different types of value functions.
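To make temporal discounting concrete, here is a minimal Python sketch. It assumes the common exponential form of discounting, in which each step of delay multiplies a reinforcer's worth by a fixed factor; the function name `discounted_value`, the factor `gamma = 0.9` and the treat sequences are hypothetical illustrations, not measurements from any dog.

```python
def discounted_value(reinforcers, gamma=0.9):
    """Estimate the sum of future reinforcers, discounting later ones.

    reinforcers: reinforcer sizes ordered by time step (index 0 = now).
    gamma: assumed discount factor between 0 and 1 (hypothetical value).
    """
    return sum(r * gamma**t for t, r in enumerate(reinforcers))


# The same single treat, delivered now versus three steps later.
immediate = discounted_value([1.0, 0.0, 0.0, 0.0])
delayed = discounted_value([0.0, 0.0, 0.0, 1.0])

print(immediate, delayed)
```

Under these assumptions the delayed treat is worth roughly 0.73 of the immediate one, which is one way to see why a reinforcer that arrives straight after the behaviour exerts the stronger influence.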
First, the action value function refers to the sum of future reinforcers expected for taking a particular action in a particular state of the environment. The term action is used formally: it can refer not only to a physical action, such as counter surfing in a particular location with specific limbs to obtain food (the reinforcer), but also to an abstract choice, such as ignoring kibble in favour of treats during mealtime.
Second, the state value function refers to the sum of future reinforcers expected from a particular state of the animal’s environment. If the animal always chooses only one action in a given state, then its action value function would be equal to the state value function. Otherwise, the state value function would correspond to the average of the action value functions, weighted by the probability of taking each action in that state.
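The relationship between the two value functions can be sketched in a few lines of Python. The action names, action values and choice probabilities below are hypothetical numbers chosen only to illustrate the weighted average, loosely following the mealtime example above.

```python
def state_value(action_values, action_probs):
    """Average the action values, weighted by how likely each action is."""
    return sum(action_probs[a] * q for a, q in action_values.items())


# Hypothetical action values for a dog at mealtime.
action_values = {"eat_kibble": 0.4, "wait_for_treat": 1.0}

# The dog waits for the treat three times out of four.
mixed_choice = {"eat_kibble": 0.25, "wait_for_treat": 0.75}
v_mixed = state_value(action_values, mixed_choice)

# The dog always waits: the state value equals that action's value.
single_choice = {"eat_kibble": 0.0, "wait_for_treat": 1.0}
v_single = state_value(action_values, single_choice)

print(v_mixed, v_single)
```

With these assumed numbers the mixed case gives a state value between the two action values, and the single-action case collapses to the value of the one action the dog always takes.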
Neural signals related to action value functions would be useful in choosing a particular action, especially if such signals are observed before the execution of a motor response. Neural activity related to state value functions may play more evaluative roles. In particular, during decision making, the state value function changes from the weighted average of action values for alternative choices to the action value function for the chosen action. The latter is often referred to as chosen value.
In positive reinforcement training for dogs, the term “positive” denotes the addition of a reinforcer wanted by the dog, delivered immediately after the dog performs a wanted behaviour and contingent on that performance. Several assumptions must hold for true positive reinforcement training to be effective:
- The training dog’s mesolimbic pathway is normal and its empirical estimates of future reinforcers arising from its participation in human-directed activities are pleasurable;
- The dopaminergic projections in the amygdala and cingulate gyrus are normal and form pleasurable emotional associations between the training activities and future reinforcers;
- The dopaminergic neurons in the hippocampus are functioning normally and form pleasurable working memory from the chosen action value function;
- Through repetitions in a prescribed state value function, the hippocampus forms long-term pleasurable memory of the chosen action value function;
- The human trainer is consistent in his/her use of neural signals (marker) during training;
- The human trainer sets up the training environment such that the state value functions guide the dog towards the chosen value.
It is therefore a fallacy that positive reinforcement training can be applied to dogs which are defensive beyond their threshold, where the dog can be observed cowering in a corner of the room, tail tucked between its hind legs, ears folded down and eyes wide open with dilated pupils. Such a dog is incapable of feeling pleasure as cortisol has exerted its effects on the dog.
Similarly, a dog with an extreme level of predatory instinct is not susceptible to positive reinforcement training. Predation is a stressful activity, and cortisol levels during predation are significantly high, especially if parts of the predation process are incorporated in the training to bring about the wanted behaviour. An example of this would be bite training, where the dog is frustrated on purpose to produce the Bark and Hold behaviour. A skilled trainer, however, knows how to work the predatory instincts at a sub-threshold level to produce the wanted behaviour. Even so, prey-based training can rarely be called positive reinforcement training, as such training often involves negative punishment and negative reinforcement, topics we will cover in subsequent posts.
Trainers who simply click their clickers and feed the dogs all kinds of treats lack depth of understanding in positive reinforcement training. Good trainers will ask pertinent questions about the dog’s daily routine, dietary habits and predatory instincts, among others. They will also observe the dog for its social and defence drives in order to prepare a customised training programme that is suited to the dog’s unique temperament.
A dog that has been overfed on a routine basis will not perceive food treats as pleasurable future reinforcers. A dog that has been restrained on a daily basis will participate gamely in an active training session where it gets to sprint, sniff and roll on the lawn. There is no positive reinforcement when the future reinforcer is not deemed pleasurable or when pleasure is derived from non-human-directed activities. A skilled trainer will teach dog owners the formation of neural signals at the onset of positive reinforcement training. He/she will also coach the dog owner on the arrangement of state value functions (antecedent arrangement), which will evolve along with the dog’s level of learned appetitive behaviours.
 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986242/#pone.0094771-Fields1; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986242/#pone.0094771-Steinberg1