The unique dog-human relationship has led to the dog’s integral place within our society. Beyond companionship, dogs’ adaptation to our lifestyle has given them an exceptional ability to work in a wide variety of roles, including as military dogs, police dogs, assistance dogs, and emotional support dogs. Regardless of career, all service dog agencies face the challenge of maximising the effectiveness of their dogs within a continuously changing environment. In addition to optimising training, agencies must source or breed healthy dogs with a temperament suited to their careers. There is also an ethical responsibility and financial incentive to ensure waste is limited by decreasing the number of inappropriate dogs within the working population.
Many canine health conditions have been extensively studied and can be accurately diagnosed by a veterinary professional. Their relative ease of identification means many health conditions can be reduced and even eliminated from a service dog population through careful breeding. Moreover, commercial genetic tests are now available for an ever-increasing number of health conditions. This means breeders no longer need to rely on clinical diagnoses, and can even identify genetic carriers of disease who are not themselves affected. Accurately recording health conditions and the subsequent careful matching of breeding animals can help ensure an effective and efficient breeding program with regard to animal health.
In contrast, behavioural traits are much more difficult to measure accurately. Within the scientific behaviour literature, there are generally two main types of measurable behavioural data: temperament and behaviour. Service dog breeding programs should track data pertaining to both temperament and behaviour in order to improve outcomes in their training programs. As yet, the majority of agencies are unaware of the importance of distinguishing between these types of measurement. Ultimately, this misunderstanding of behavioural data across the service dog sector is resulting in the loss of important efficiency opportunities.
Defining temperament and behaviour
Temperament describes the innate mental traits of a dog, observed through its response to a range of stimuli in its environment
Temperament is determined by the dog’s biology and is primarily a function of the dog’s neurological makeup. As such, it cannot be eliminated or transformed from one type to another (Robins 2017). It is heritable (Wilson 1998; Jones and Gosling 2005; Herborn et al. 2010; Stamps and Groothuis 2010) and consistent over time and situations relative to the population (Bell et al. 2009; Biro 2012; Edwards et al. 2013). However, the environment, socialisation, and training can modify the observable expression of temperament, known as behaviour. This can make temperament difficult to measure, as it requires measuring displayed behaviours over a variety of situations and times (Carter et al. 2012).
The hypothetical graph below shows the reactivity of two dogs from birth until 12 months (Figure 1). Each time point score is the result of observations of the dogs’ responses over a range of situations. How reactivity is displayed—that is, what we observe and score—varies as the puppies get older. Note how Dog Two is always more reactive than Dog One. More importantly, the difference in reactivity between the two dogs remains consistent regardless of time or situation. Thus, in this hypothetical example, we can say reactivity is a temperament trait.

Figure 1. A hypothetical graph showing the reactivity of two dogs from birth until 12 months. Each time point score is the result of observations of the dogs’ behavioural responses over a range of situations.
Behaviour is the currently observable response of a dog to stimuli in its environment
Behaviour is the result of a combination of a dog’s temperament (neurological makeup and genetic potential) and environmental experience. It is time and situationally specific, and can be modified through the environment, socialisation, and learning. It is not directly heritable but is constrained by temperament. Behaviour is relatively easily measured by observing what the dog is currently doing. For example, a dog displaying behavioural discomfort will physically avoid a stimulus, turn its head aside or lower its head, lip lick, yawn, tuck its ears and tail, ground sniff, and display gaze avoidance.
Measuring temperament and behaviour
As stated above, behaviour is measured by observing what the dog is currently doing, whereas temperament cannot be directly measured and relies upon accurate measures of behaviour over time and situations. Most scientific studies use one or a combination of owner surveys, expert ratings, and observational tests to measure temperament. Handler surveys are one of the most commonly used methods. A survey is usually created using traits that are found in other species as well as behaviours specific to the species in question. The survey is filled out by animal caretakers, who rate individuals’ behaviours on a Likert scale, with 1 being the least likely to exhibit the trait and the top of the scale (5 or 7) being the most likely. Inter-rater reliability is then assessed, and those adjectives deemed unreliable are dismissed from further analysis. Temperament components are then determined using dimension reduction statistics.
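The reliability-screening step can be sketched in a few lines of Python. This is only an illustration: the adjectives, the ratings, and the 0.7 cut-off are hypothetical, and a plain Pearson correlation between two raters stands in for whichever inter-rater statistic a given study actually uses. The later dimension-reduction step is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical 1-5 Likert ratings of the same five dogs by two raters,
# stored per adjective as (rater_a_scores, rater_b_scores).
ratings = {
    "excitable": ([1, 2, 4, 5, 3], [1, 3, 4, 5, 2]),
    "nervous":   ([2, 2, 5, 4, 1], [2, 1, 5, 4, 2]),
    "stubborn":  ([3, 1, 4, 2, 5], [2, 5, 1, 4, 3]),
}

RELIABILITY_THRESHOLD = 0.7  # assumed cut-off, not taken from the article

# Agreement between the two raters for each adjective.
reliability = {adj: pearson(a, b) for adj, (a, b) in ratings.items()}

# Keep only adjectives the raters agree on; the rest are dismissed.
retained = [adj for adj, r in reliability.items() if r >= RELIABILITY_THRESHOLD]
print(retained)
```

In this made-up data set the two raters disagree on "stubborn", so it is dropped before any temperament components are derived.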
Behaviours can also be video recorded and an ethogram created to score for either frequency or duration. Standardised tests are also used, in which a number of experimental measures are conducted; for example, we can observe reaction to a familiar versus an unfamiliar person to measure sociability. Dimension reduction statistics are usually used in these cases as well. Professional observations over time can be used in a similar manner in service dog organisations, to create temperament trait scores. These scores can be used to inform breeding practices.
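As a rough illustration of ethogram scoring, the Python sketch below tallies frequency and total duration for each coded behaviour in a single video. The event list, the behaviour labels, and the function name score_ethogram are hypothetical, chosen only to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical coded events from one video: (behaviour, start_s, end_s).
events = [
    ("lip_lick", 2.0, 2.5),
    ("yawn", 5.0, 7.0),
    ("lip_lick", 9.0, 9.5),
    ("gaze_avoidance", 10.0, 14.0),
    ("lip_lick", 20.0, 21.0),
]

def score_ethogram(events):
    """Return {behaviour: (frequency, total_duration_seconds)}."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for behaviour, start, end in events:
        counts[behaviour] += 1
        durations[behaviour] += end - start
    return {b: (counts[b], durations[b]) for b in counts}

scores = score_ethogram(events)
for behaviour, (frequency, duration) in scores.items():
    print(f"{behaviour}: {frequency} occurrence(s), {duration:.1f} s in total")
```

Scores like these, collected across many sessions and situations, are the raw behavioural measures that would later be combined into temperament trait scores.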
However, Diederich and Giffroy (2006) found a lack of consensus in every parameter they reviewed: terminology, test quality requirements and implementation, breed, age, sex, characteristics of social and environmental stimuli used, characteristics of behavioural variables collected, and characteristics of physical and physiological data used. Given this, and the uniqueness of each researcher’s method, standardisation across the field is greatly needed. The variability in the literature has resulted in high levels of variability in the methods used by service dog agencies.
Murphy (1998) evaluated the behaviours observed by guide dog trainers in an Australian guide dog population. Categorical temperament traits identified included anxiety, suspicion, low concentration, dog distraction, low willingness, excitability, nervousness, and low body sensitivity. Murphy also described behaviours correlated with these predefined temperament traits. Building on the concepts from this work, it is suggested that having handlers measure and record observable behaviours over a variety of situations, which are then independently clustered into quantitative traits, could remove substantial variation and provide more meaningful information to both breeders and trainers.
A consistent, sector-wide methodology for temperament data collection would enable better information transfer between service dog agencies, allowing each organisation to incorporate and use temperament data from other agencies. This is particularly relevant for agencies with collaborative breeding programs, as it would be possible to continue monitoring the temperament of breeding stock’s offspring after they have been transferred to other service homes.
The problems with incorrect measures
Temperament, and thus ultimately behaviour, is likely controlled by numerous genes, making its expression highly complex. Consider the simplified hypothetical figure below. The reactivity temperaments of Dog One and Dog Two are represented by the blue and red boxes respectively. These boxes show the genetic constraints placed on each dog due to their temperament. The dogs can only display the range of observable reactivity-related behaviours that fall within the length of their boxes. Which reactivity-related behaviours are displayed depends upon each dog’s experience and learning, and will vary over time and situations. For example, following training, Dog Two displays observable behaviours that decreased its reactivity score, moving from behaviours represented by the green arrow to behaviours represented by the yellow arrow. Given the same training, Dog One also displays different observable behaviours that reduced its reactivity score, by an equivalent amount. However, no amount of training will get Dog Two to the same level of reactivity as Dog One, because Dog Two is restricted by its higher reactivity temperament (genetics).

Figure 2. Diagram showing the reactivity temperament of two dogs on a scale of low to high and the impact of training on the behaviours observed.
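The constraint described above can be expressed as a simple clamp. The Python sketch below is a toy model under stated assumptions: the 0-10 reactivity scale, the range boundaries, the training effect, and the function name observed_reactivity are all hypothetical and are not taken from any measured data.

```python
def observed_reactivity(untrained_score, training_effect, temperament_range):
    """Lower the score by the training effect, but never outside the dog's range."""
    low, high = temperament_range
    return max(low, min(high, untrained_score - training_effect))

# Hypothetical 0-10 reactivity scale; all numbers are illustrative.
dog_one_range = (1.0, 4.0)  # lower-reactivity temperament
dog_two_range = (6.0, 9.0)  # higher-reactivity temperament

# The same training lowers both dogs' scores by an equivalent amount.
dog_one_after = observed_reactivity(3.5, 2.0, dog_one_range)
dog_two_after = observed_reactivity(8.5, 2.0, dog_two_range)

# Even an unrealistically large training effect cannot take Dog Two
# below the lower bound of its own temperament range.
dog_two_limit = observed_reactivity(8.5, 100.0, dog_two_range)

print(dog_one_after, dog_two_after, dog_two_limit)
```

However large the training effect, Dog Two's score stops at the bottom of its own range, which in this example still sits above the top of Dog One's range.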
In the following example, we can see how incorrectly measuring behaviour can result in an inaccurate indication of temperament. Again, the reactivity temperaments of Dog One and Dog Two are represented by the blue and red boxes respectively. These boxes show the genetic constraints placed on each dog due to temperament. The dogs can only display the range of observable reactivity-related behaviours that fall within the length of their boxes. In contrast to the previous example, the two dogs’ temperament ranges now overlap. Dog One has the potential to demonstrate behaviours related to low and medium reactivity, whereas Dog Two has the potential to show behaviours related to medium and high reactivity. Therefore, from a temperament perspective, we would classify Dog One as having lower reactivity compared to Dog Two.
However, if measured incorrectly, there is the potential to mislabel these dogs. The green arrows represent the reactivity-related behaviours observed in each dog, measured only once in a single environment. Dog One previously had a negative experience in this environment and so displays behaviours at the highest level of its reactivity potential, while Dog Two previously had a positive experience in this environment and so displays behaviours at the lowest level of its reactivity. Under these circumstances, these behavioural measures would result in Dog Two being incorrectly labelled as having a lower reactivity temperament compared to Dog One. This shows the importance of observing behaviour over time and situations when measuring temperament.

Figure 3. Diagram showing the reactivity temperament of two dogs on a scale of low to high and the impact of measuring behaviours once in a single environment.
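The mislabelling risk can be reproduced with a small Python sketch. The scores, the situation names (including "vet_clinic" as the environment where the two dogs had opposite prior experiences), and the helper lower_reactivity_dog are hypothetical, and a simple mean is used only as a stand-in for however an agency combines repeated observations.

```python
from statistics import mean

# Hypothetical reactivity scores (0-10) observed in four situations.
# In "vet_clinic", Dog One previously had a negative experience and
# Dog Two a positive one.
observations = {
    "dog_one": {"home": 2.0, "park": 2.5, "street": 3.0, "vet_clinic": 5.5},
    "dog_two": {"home": 7.0, "park": 7.5, "street": 6.5, "vet_clinic": 5.0},
}

def lower_reactivity_dog(scores_by_dog):
    """Return the name of the dog with the lowest score."""
    return min(scores_by_dog, key=scores_by_dog.get)

# One measurement in one environment.
single = {dog: obs["vet_clinic"] for dog, obs in observations.items()}

# Measurements combined across all situations.
averaged = {dog: mean(obs.values()) for dog, obs in observations.items()}

print("Single measurement suggests:", lower_reactivity_dog(single))
print("Across situations suggests:", lower_reactivity_dog(averaged))
```

The single clinic measurement ranks Dog Two as the less reactive dog, while the scores combined across situations rank Dog One as less reactive, matching the temperament ranges described above.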
Impact on service dog agencies
Both temperamental and behavioural data are needed by all service dog agencies with their own breeding and training departments. However, data requirements and applications vary between departments. Breeding departments require information on dogs from a population level perspective. To increase or decrease particular temperament traits in the population, breeding departments need behavioural measures on dogs across a variety of situations and times, which are combined to produce temperament traits. The earlier this information can be collected the more efficiently breeding decisions can be made. In contrast, training departments require information on modifiable behaviours in individual dogs in order to develop these dogs as required. These different departmental data requirements are visualised in the diagram below.
Starting with training departments, staff work with dogs at an individual level. Dog handlers and trainers collect behavioural data on their dogs through direct observations, recording what the dog is currently doing. An example of this is positive reinforcement training plans, where a dog progressively moves through a series of behavioural steps during task acquisition. A trainer’s direct behavioural observation of the dog then informs practice through behavioural modification. In contrast, breeding departments examine behaviours at a population level, identifying temperament traits for selection. This requires accurate collection of temperament data. The dog’s position in the series of behavioural steps in task acquisition during training will not provide the information needed for a successful breeding program. Instead, the breeding department requires behaviours to be observed over time and situations throughout training. This then informs breeding practices such as removal or addition of breeding animals, selective matings, and estimated breeding value calculations.
Conclusions
All service dog agencies face the challenge of maximising the effectiveness of their dogs within a continuously changing environment. In addition to optimising training, agencies must source or breed healthy dogs with a temperament suited to their careers. There is also an ethical responsibility and financial incentive to ensure waste is limited by decreasing the number of inappropriate dogs within the working population. Breeding and training departments have different behavioural data needs but are highly interdependent. The training department needs to record modifiable behaviours, looking at what the dog is actually doing. The breeding department must combine numerous behavioural observations taken by the training department over time and situations to measure the dog’s innate temperament.
Currently, most service dog agencies do not differentiate between these two types of data. Many ask trainers to record information on the dog’s temperament rather than the behaviour currently observed, for example, evaluating the dog’s confidence rather than recording confidence-related behaviours observed over various situations. By treating these two types of data separately, service dog organisations could improve efficiency by reducing inappropriate behaviours in the population more effectively. Moreover, accurately recording individual behaviours rather than temperament during training could help trainers identify areas for future development, improving the service dog product and reducing behavioural withdrawals. Appropriate data collection by individual agencies, together with consistency in terminology and methodology across the service dog sector, would result in further efficiencies. To achieve this, high levels of collaboration are needed with the support of larger accreditation bodies.
Since 2012, Helen Vaterlaws-Whiteside, Oxon MA, PhD has worked at the largest guide dog organisation in the world providing expertise in dog behaviour and research. Helen also runs her own consultancy providing evidence-based solutions to support canine development and improve animal welfare in charities of all sizes across the globe. Helen can be reached via email at analytics@caninedata.com.