Animal Behavior Reliability

Other Statistical Metrics

There are other methods that can be useful for describing or accounting for reliability, though they are not as robust as the previously described metrics. They may still be useful for some types of data or experiments. Below we describe several that are commonly referenced, though others exist beyond what we have included here:
  • Including observer in the model
  • Mean difference
  • Coefficient of variation (CV)
  • Percent agreement
Including observer in the model

To account for potential variability in observer reliability, observer ID can be included in the statistical model as a random effect. This accounts for variation across observers without explicitly estimating the effect of each observer or assuming that observer has a systematic effect on your outcomes of interest.

This method warrants caution, however: a more robust strategy is to first train observers to be highly (and similarly) reliable to ensure quality data collection, rather than to accept high observer variation and potentially high variability in data quality.

This method can also be used to account for drift over time. For example, if your team achieved strong reliability (using a more robust metric) before data collection began, but no longer met those cutoffs when reliability was re-tested after the experiment, including observer ID in your models could help account for some of that drift. If the data are in a format that allows them to be re-scored (e.g., video or photos), however, re-analysis is likely preferable. A sketch of this modeling approach follows.
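The sketch below shows one way this could look in practice, using a mixed-effects model from statsmodels in Python. The file name and column names (outcome, treatment, observer_id) are hypothetical placeholders, not a prescribed analysis.

```python
# Minimal sketch: observer ID as a random intercept in a mixed model.
# File and column names here are hypothetical examples.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("behavior_scores.csv")  # hypothetical dataset

# A random intercept per observer absorbs between-observer variation
# without estimating a separate fixed effect for each observer.
model = smf.mixedlm("outcome ~ treatment", data=df, groups=df["observer_id"])
result = model.fit()
print(result.summary())
```
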
Mean difference

This is the average of the differences between pairs of observations, and it measures the degree of systematic difference between observers. It can provide an estimate of the degree of bias when one set of scores is treated as a "gold standard". While it is easy to calculate and interpret, it should not be used in isolation; rather, it can complement a Bland-Altman plot.
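As a small illustration, the mean difference can be computed directly from two observers' paired scores; the arrays below are hypothetical example data.

```python
# Minimal sketch: mean difference between two observers' paired scores.
import numpy as np

obs_a = np.array([12.1, 10.4, 15.0, 9.8, 11.2])  # e.g., the "gold standard"
obs_b = np.array([12.5, 10.1, 15.9, 10.2, 11.0])

differences = obs_b - obs_a
mean_diff = differences.mean()  # systematic bias of observer B relative to A
print(f"Mean difference: {mean_diff:.2f}")
```

These same pairwise differences are what a Bland-Altman plot displays against the pairwise means, which is why the two are natural complements.
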
Coefficient of variation (CV)

This expresses the standard deviation as a proportion of the mean for two sets of paired observations (often reported as a percentage). The resulting value describes the variability between observations, with values closer to 0 indicating less variability. The calculation can be tedious, as it requires computing a CV for each pair of observations and then averaging all CVs to produce one overall CV.
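A sketch of that calculation, using the same hypothetical paired scores as above:

```python
# Minimal sketch: one CV per pair of observations, then the overall mean.
import numpy as np

obs_a = np.array([12.1, 10.4, 15.0, 9.8, 11.2])
obs_b = np.array([12.5, 10.1, 15.9, 10.2, 11.0])

pairs = np.stack([obs_a, obs_b])      # shape: (2, n_pairs)
pair_sd = pairs.std(axis=0, ddof=1)   # standard deviation within each pair
pair_mean = pairs.mean(axis=0)        # mean of each pair
pair_cv = pair_sd / pair_mean         # CV for each pair of observations
overall_cv = pair_cv.mean()           # average across pairs -> one overall CV
print(f"Overall CV: {overall_cv:.3f}")
```
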
Percent agreement

This describes the proportion of times that observers agree across a set of paired observations. Percent agreement is calculated as the number of agreements divided by the total number of observations, multiplied by 100; values closer to 100 represent better agreement.

Percent agreement does not account for chance agreement between observers, and is thus likely to overestimate how much they agree. This can be exacerbated if the observations represent categorical data with >2 categories, or if the observations are imbalanced, with some categories overrepresented. 
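The sketch below computes percent agreement for hypothetical categorical scores and, for comparison, Cohen's kappa, a chance-corrected alternative available in scikit-learn.

```python
# Minimal sketch: percent agreement vs. a chance-corrected alternative.
import numpy as np
from sklearn.metrics import cohen_kappa_score

obs_a = np.array(["groom", "rest", "feed", "rest", "rest", "feed"])
obs_b = np.array(["groom", "rest", "feed", "feed", "rest", "feed"])

agreements = (obs_a == obs_b).sum()
percent_agreement = agreements / len(obs_a) * 100
print(f"Percent agreement: {percent_agreement:.1f}%")  # 5/6 agree = 83.3%

# Kappa discounts the agreement expected by chance, which matters most
# when categories are few or imbalanced.
print(f"Cohen's kappa: {cohen_kappa_score(obs_a, obs_b):.2f}")
```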