Animal Behavior Reliability

Mean difference

Categorical or continuous data; descriptive statistic
This descriptive analysis looks at the average of the differences between pairs of observations. It can provide an estimate of the degree of bias when using one set of scores as a "gold standard".

Mean difference, while easy to perform and interpret, can only diagnose disagreement between observers. That is, if there is a mean difference, agreement is poor; but if there is no mean difference, agreement may or may not be poor. In other words, it can 'rule out' agreement but can never 'rule in' agreement. For this reason, mean difference should not be used in isolation; instead, it may be used to complement a formal statistical test (e.g., concordance for categorical data, ICC for continuous data). A Bland-Altman plot may be a more useful descriptive analysis for continuous data.
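To make the calculation concrete, here is a minimal sketch in Python with made-up scores (not data from this site): the mean difference is simply the average of the pairwise differences between two observers' scores, and its sign indicates which observer tends to score higher.

```python
# Minimal sketch of the mean-difference calculation for paired observer scores.
# The scores below are hypothetical, for illustration only.
import numpy as np

observer_1 = np.array([12, 8, 15, 10, 9])   # hypothetical scores from Observer 1
observer_2 = np.array([14, 9, 18, 12, 11])  # hypothetical scores from Observer 2

differences = observer_2 - observer_1        # pairwise differences
mean_diff = differences.mean()               # estimate of bias between observers

print(f"Mean difference: {mean_diff:.2f}")
# A non-zero mean difference flags possible disagreement (bias), but a value
# near zero does not, on its own, demonstrate good agreement.
```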

Example #1: Continuous data

Two observers scored the same 10 videos for number of vocalizations. On the left, you can see a scatterplot of their scores (similar to an example on the regression page). The differences between observer scores were calculated and used to create the boxplot on the right. The "x" in the boxplot represents the mean difference between observers, which was 3.2 in this example. In these figures, we can see that Observer 2 consistently overestimates this outcome compared to Observer 1.
[Figure: scatterplot of Observer 1 vs. Observer 2 scores (left) and boxplot of score differences with the mean marked by an "x" (right)]
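For readers who want to recreate a figure like this one, the following Python sketch generates a comparable scatterplot and boxplot. The vocalization counts are hypothetical, chosen so the mean difference works out to 3.2 as in the example; they are not the actual scores behind the figure.

```python
# Illustrative sketch of Example #1 with hypothetical vocalization counts.
import numpy as np
import matplotlib.pyplot as plt

observer_1 = np.array([10, 7, 12, 5, 9, 14, 6, 11, 8, 13])         # hypothetical counts
observer_2 = observer_1 + np.array([3, 4, 2, 5, 3, 4, 2, 3, 3, 3])  # Observer 2 scores higher

differences = observer_2 - observer_1
print(f"Mean difference: {differences.mean():.1f}")  # 3.2

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(8, 4))
ax_scatter.scatter(observer_1, observer_2)
ax_scatter.plot([0, 20], [0, 20], linestyle="--")    # line of equality
ax_scatter.set_xlabel("Observer 1 count")
ax_scatter.set_ylabel("Observer 2 count")

ax_box.boxplot(differences, showmeans=True)          # showmeans marks the mean difference
ax_box.set_ylabel("Observer 2 - Observer 1")
plt.tight_layout()
plt.show()
```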

Example #2: Categorical data

Two observers scored the same 10 videos for presence or absence of nesting behavior. Presence was scored as 1 and absence as 0. On the left, you can see their original answers (also used in the percent agreement example), where Observer 1 is the Expert and Observer 2 is the Trainee. The differences between observer scores were calculated and used to create the boxplot in the middle. The "x" in the boxplot represents the mean difference between observers, which was 0.3 in this example, suggesting disagreement. We can then create a contingency table (right) to summarize where the disagreements lie; using it, we can see that Observer 2 consistently overestimates the performance of nesting behavior compared to the expert, scoring it as occurring 3 times when Observer 1 said it was absent.
[Figure: original scores from both observers (left), boxplot of score differences with the mean marked by an "x" (middle), and contingency table of observer calls (right)]
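A similar sketch for the categorical case follows. The per-video scores are hypothetical, chosen only to be consistent with the summary above (a mean difference of 0.3, with the trainee scoring nesting as present in 3 videos the expert scored as absent); pandas' crosstab is one convenient way to build the contingency table.

```python
# Illustrative sketch of Example #2 with hypothetical presence/absence scores.
import pandas as pd

expert  = pd.Series([1, 0, 1, 0, 0, 1, 0, 1, 0, 0], name="Expert (Obs 1)")
trainee = pd.Series([1, 1, 1, 0, 1, 1, 0, 1, 1, 0], name="Trainee (Obs 2)")

differences = trainee - expert
print(f"Mean difference: {differences.mean():.1f}")  # 0.3, suggesting disagreement

# Contingency table showing where the disagreements lie: here the trainee
# scores nesting as present in 3 videos the expert scored as absent.
print(pd.crosstab(expert, trainee))
```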