Evaluating Research

Written by Patience Fisher

Every behavior consultant knows not to rely on a client’s description of a pet’s behavior; you have to see that behavior yourself. Trained eyes see differently. This is just as true for analyzing research studies. It is important to understand the underlying principles of research to see what the research really shows. In this article, I explain that causation and correlation studies seek different answers. I also discuss the inherent limitations of even a well-done study, the importance of statistical significance and confounding factors, and the problem of false positives.

Correlation versus causation

Correlation studies help us to form hypotheses to be tested by experiment. A case study is a correlation study with a sample of one; it has no statistical significance. Multiple case studies that proceed alike are more compelling, but still do not show cause and effect. Cross-sectional studies see what is happening to a group at one point in time. They are the most compelling type of correlation study, but even they can only show whether or not two variables trend together. They are done to see if the expense of a causation study is justified. So be careful about jumping to conclusions based on cases that you, another consultant, or a veterinarian has seen.

Let’s consider an imaginary cross-sectional study that shows a correlation between coffee drinking and lung cancer. Ideally, you would want all variables besides coffee drinking (age, sex, diet, health, genetics, etc.) to be the same in both groups. Since this cannot be done outside of the laboratory (if even there), scientists strive to make the two groups as alike as possible and use a large sample size. Scientists also look for “confounding factors”: variables that trend together with the one being studied and could themselves explain the result. In the above example, the fact that smokers are more apt to be coffee drinkers is a confounding factor.

Causation studies, on the other hand, seek to see if something causes something else; in more scientific terms, they measure the effect of one variable on another. There is always a control group: a group that is not subjected to what is being tested. For example, an experiment to determine if declawed cats miss the litter box more often would require a control group of cats with claws. Cause-and-effect research on the effect of declawing on litter box use could randomly declaw one group of kittens and leave another group with their claws, and then follow them over time in identical settings, possibly in a laboratory. Not only is this expensive, it is ethically questionable. A cohort study would be more applicable to pet behavior research. A cohort study would, for example, follow two groups of owned kittens, clawed and declawed, over a period of time and keep records of litter box use. Since the selection of which kittens have their claws removed is not random, we must look for confounding factors: are there variables that people who choose to declaw their cat have in common? For instance, are they less apt to have litter boxes in accessible places, and more apt to have them in basements? The list of possible confounding factors can go on and on.
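To make the idea of a confounding factor concrete, the imaginary coffee and lung cancer study can be simulated. The sketch below is only an illustration: every probability in it is a made-up assumption, not real data, and the function names are hypothetical. In the simulated population, smoking raises the chance of cancer and smokers are more apt to drink coffee, but coffee itself has no effect at all.

```python
import random

def simulate(n=200_000, seed=1):
    """Simulate n people. All probabilities are invented for illustration."""
    rng = random.Random(seed)
    # counts[(smoker, coffee)] = [number of people, number of cancer cases]
    counts = {(s, c): [0, 0] for s in (False, True) for c in (False, True)}
    for _ in range(n):
        smoker = rng.random() < 0.30
        # Smokers are more apt to be coffee drinkers (the confounding link).
        coffee = rng.random() < (0.80 if smoker else 0.40)
        # Cancer risk depends on smoking only; coffee plays no role.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        cell = counts[(smoker, coffee)]
        cell[0] += 1
        cell[1] += cancer
    return counts

def rate(counts, keys):
    """Cancer rate across the listed (smoker, coffee) cells."""
    people = sum(counts[k][0] for k in keys)
    cases = sum(counts[k][1] for k in keys)
    return cases / people

counts = simulate()

# Comparing all coffee drinkers with all non-drinkers, ignoring smoking.
crude_gap = (rate(counts, [(False, True), (True, True)])
             - rate(counts, [(False, False), (True, False)]))
# The same comparison made separately within smokers and within non-smokers.
smoker_gap = rate(counts, [(True, True)]) - rate(counts, [(True, False)])
nonsmoker_gap = rate(counts, [(False, True)]) - rate(counts, [(False, False)])

print(f"coffee vs. no coffee, everyone:    {crude_gap:+.3f}")
print(f"coffee vs. no coffee, smokers:     {smoker_gap:+.3f}")
print(f"coffee vs. no coffee, non-smokers: {nonsmoker_gap:+.3f}")
```

When everyone is lumped together, coffee drinkers show a clearly higher cancer rate. When smokers and non-smokers are looked at separately, the gap all but disappears, because the whole correlation was produced by smoking.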

Sample size and p-value

One way researchers try to deal with the problem of confounding factors is to use a large sample size. Statistics can then be used to find the strength of the relationship between the variables: in my example, between missing the box and being declawed. Statistics is a complex subject, but one useful number for you to understand is the p-value. If a relationship is found between declawed cats and missing the litter box, the p-value indicates how likely it is that a relationship this strong would appear by chance alone. For example, consider two groups of 50 cats: a declawed group and a clawed group. Let’s say I see a correlation between being declawed and missing the box, with a p-value of 0.05. This means that if, instead of dividing that group of 100 cats into 50 clawed and 50 declawed, I randomly made two groups of 50 cats each, with no regard as to whether or not they had claws, there would be a 5% chance that I would see a difference at least as large as the one I observed. If my p-value is 0.10, then there is a 10% chance that the randomly constructed groups would show a correlation at least as strong as the one I obtained in my research. The higher the p-value, the more plausible it is that an observed correlation is due to chance. A correlation is generally considered statistically significant if the p-value is less than 0.05. Unfortunately, p-values are not definitive; sample size affects their reliability.
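The random regrouping described above can be carried out directly on a computer; statisticians call this a permutation test. The following is a minimal sketch, not the method of any particular study. The counts (12 of 50 declawed cats and 4 of 50 clawed cats missing the box) are hypothetical numbers chosen only for illustration, and the function name permutation_p_value is my own.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value by randomly regrouping the subjects.

    Each group is a list of 0/1 values (1 = the cat misses the box).
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        # Form two random groups with no regard to claws.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Count regroupings whose gap is at least as large as the real one
        # (the tiny tolerance guards against floating-point rounding).
        if diff >= observed - 1e-12:
            hits += 1
    return hits / n_shuffles

# Hypothetical data: 12 of 50 declawed cats and 4 of 50 clawed cats miss the box.
declawed = [1] * 12 + [0] * 38
clawed = [1] * 4 + [0] * 46
print(permutation_p_value(declawed, clawed))
```

The printed number is the fraction of random regroupings that produced a gap at least as large as the one in the real groups, which is what the p-value estimates. More shuffles give a steadier estimate.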

In addition to sample size, another important factor in a well-designed study is to define all of the terms. In my example, what exactly is missing the box? A more challenging behavior to define is friendliness. A variable must be measurable. Perhaps I could define friendly as a cat who, when taken in a carrier to a novel room by a novel person, exits the carrier and approaches the person within 30 seconds of the person opening the carrier. The terms used in research must be so scrupulously defined that a second researcher can use them to replicate the experiment and get the same result. Which brings me to my second point—experiments must be replicable. If other researchers cannot replicate my results, my findings lose their significance.

False positives

“False positives” are, unfortunately, common. In his 2015 University of Minnesota course, Behavioral Genetics, Professor McGue discusses why. Sample sizes are invariably too low due to cost and the difficulty of dealing with large numbers of subjects. McGue states that sample sizes smaller than 100,000 are prone to false positives. If the size of the effect under investigation is typically small, the problem of having a small sample size is compounded. Again using my example, if 4% of one group of 50 cats and 10% of the other group of 50 have litter box issues, then only a total of 7 cats (2 + 5) are showing the effect being studied. Considering that there are many potential reasons any of the study’s cats could be missing the box, this is a very small number of affected cats. Also consider that my study design and my definitions could influence my outcome: maybe the data in my “friendliness” experiment show many cats approaching the person after 40 seconds, so I change my definition. A researcher who shows a correlation is more likely to get additional funding than one who shows no correlation, which is a factor that promotes false positives. Of course, if there is a financial benefit to not showing a correlation, that may be an influence as well. Both of these influences may be inadvertent. Even scientists trying to be neutral may be subconsciously influenced by their desire for a certain outcome. A well-known example is discussed below.
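As an aside on the 50-cat example above, the arithmetic can be checked with a short calculation. The sketch below uses Fisher’s exact test, a standard test for small counts; it is my choice for illustration and is not taken from McGue’s course. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, n_a, n_b):
    """Two-sided Fisher's exact test.

    a of the n_a cats in group A, and b of the n_b cats in group B,
    show the behavior (for example, missing the box).
    """
    total_affected = a + b

    def ways(k):
        # Number of ways to have exactly k affected cats land in group A.
        return comb(n_a, k) * comb(n_b, total_affected - k)

    observed = ways(a)
    lowest = max(0, total_affected - n_b)
    highest = min(n_a, total_affected)
    # Add up every split that is no more likely than the observed one.
    extreme = sum(ways(k) for k in range(lowest, highest + 1)
                  if ways(k) <= observed)
    return extreme / comb(n_a + n_b, total_affected)

# 4% of 50 cats is 2 cats; 10% of 50 cats is 5 cats.
p = fisher_exact_two_sided(2, 5, 50, 50)
print(f"p-value for 2 of 50 versus 5 of 50: {p:.2f}")
```

For the 2-versus-5 split the result is roughly 0.44, far above the usual 0.05 threshold. In other words, a gap of 4% versus 10% looks large on paper, but with only 7 affected cats it is entirely consistent with chance.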

The importance of not fooling yourself is discussed at length by Nobel prize–winning physicist Richard Feynman in his commencement address at Caltech given in 1974. Feynman discusses Millikan’s Nobel prize–winning research that measured the charge on an electron by an experiment with falling oil drops. Feynman explains:

[He] got an answer which we now know not to be quite right. It’s a little bit off, because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of the electron, after Millikan.

Many researchers got the correct, higher number, but dismissed it: they assumed that they must have done something wrong, because Millikan was such a brilliant scientist. So they would look for and find a reason why something might have been wrong, and redo the experiment, trying to get a value closer to Millikan’s. However, the numbers they dismissed were actually correct. They eventually figured it out, but the respect they had for an authority such as Millikan slowed their discovery.

Feynman elaborates:

The first principle is that you must not fool yourself—and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.

The truth by authority trap

After reading this article, you should be able to analyze research articles and avoid being the victim of what Nigel Warburton called the “truth by authority trap,” which he discusses in his book, A Little History of Philosophy. For hundreds of years after Aristotle’s death, people would discount any new hypothesis that contradicted him. Because of “truth by authority,” it was not until about 1,900 years later that Aristotle’s hypothesis that a heavier object would fall faster than a lighter one was tested and disproved by Galileo Galilei. As Warburton states, “Authority doesn’t prove anything by itself.” So do not accept without question what respected names in our field state. Analyze what the limits of the research are, and do not give the findings more weight just because of who did the research.


Patience Fisher owns Patience for Cats LLC, a cat behavior business based in Pittsburgh, PA; she also owns a nonfiction editing business. She holds a Bachelor’s in Biology, a Master’s in Engineering, and a Diploma of Feline Behavior Science Technology, and is a certified veterinary assistant. Visit her on Facebook at Patience for Cats.