Visual observation.
Visually checking data during reliability assessment is an important step that should precede any formal analysis. Graphs let you judge how much confidence to place in the metric outputs and can reveal problems that the metrics mask. While it is ultimately up to each person to decide how much to emphasize data visualization, ignoring mismatches at this stage can lead to erroneous and untrustworthy results.
By "visual observation", we mean a few things:
- Plot a time budget, e.g. the events plot output from BORIS
- Create a scatterplot, as we do in regression analyses
- Create a Bland-Altman plot
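As a rough illustration of the last item, a Bland-Altman plot puts the mean of the two observers' scores on the x-axis and their difference on the y-axis, with horizontal lines at the mean difference (bias) and at the limits of agreement (commonly bias ± 1.96 standard deviations of the differences). The sketch below is one minimal way to do this in Python, not a prescribed workflow. The function names and the durations are hypothetical, and the plotting helper assumes matplotlib is installed.

```python
from statistics import mean, stdev


def bland_altman_stats(obs1, obs2):
    """Return per-item means and differences, the bias, and 95% limits of agreement."""
    if len(obs1) != len(obs2):
        raise ValueError("both observers must score the same items")
    means = [(a + b) / 2 for a, b in zip(obs1, obs2)]
    diffs = [b - a for a, b in zip(obs1, obs2)]  # Observer 2 minus Observer 1
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


def plot_bland_altman(stats):
    """Draw the plot. matplotlib is imported here so the statistics work without it."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(stats["means"], stats["diffs"], color="black")
    ax.axhline(stats["bias"], color="red", linestyle="-")
    ax.axhline(stats["lower"], color="red", linestyle="--")
    ax.axhline(stats["upper"], color="red", linestyle="--")
    ax.axhline(0, color="gray", linewidth=0.5)  # zero difference for reference
    ax.set_xlabel("Mean of Observer 1 and Observer 2 (s)")
    ax.set_ylabel("Observer 2 minus Observer 1 (s)")
    return fig


# Hypothetical durations in seconds for 9 videos; not data from this guide.
expert = [2, 5, 8, 10, 12, 15, 18, 20, 22]
trainee = [3, 7, 9, 13, 14, 18, 20, 24, 25]

result = bland_altman_stats(expert, trainee)
print(f"bias = {result['bias']:.2f} s, "
      f"limits of agreement = {result['lower']:.2f} to {result['upper']:.2f} s")
```

A bias well away from zero, or points that fan out as the mean increases, would be worth investigating before relying on a formal metric.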
Example #1: plotting a time budget
These plots, generated from continuous behavioral data scored in BORIS, compare an expert's and a trainee's scores for a 5-minute video. Both observers scored (from top to bottom): drinking water (dark blue), self-grooming (orange), manipulating bedding (purple), out of view (teal), and tongue flicks (brown). In the Observer 2 plot, some behaviors and bouts clearly align (drinking water, self-grooming), as indicated by a blue checkmark, but there are also clear errors (indicated by a gray x). The trainee can identify roughly when tongue flicks occur, but needs more training to avoid overestimating them. There is a bigger issue with manipulating bedding and out of view: the trainee appears to confuse these two behaviors with one another and has trouble determining when each starts and stops. Regardless of the formal statistical metric generated from a comparison that included this video, visual observation tells us that the trainee is not yet reliable and needs more orientation to some of these behaviors.
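The same kind of comparison can be roughed out numerically. The sketch below assumes each observer's state events are available as (behavior, start, stop) tuples in seconds. It then totals the time each observer assigned to each behavior and the time during which both observers scored the same behavior. The function names and all of the bouts are hypothetical, chosen only to mimic the kind of confusion described above. It also assumes that bouts of the same behavior do not overlap within one observer's record.

```python
from collections import defaultdict


def total_duration(bouts):
    """Sum the scored time per behavior for one observer."""
    totals = defaultdict(float)
    for behavior, start, stop in bouts:
        totals[behavior] += stop - start
    return dict(totals)


def overlap_by_behavior(bouts1, bouts2):
    """Sum the time during which both observers scored the same behavior."""
    overlap = defaultdict(float)
    for b1, s1, e1 in bouts1:
        for b2, s2, e2 in bouts2:
            if b1 == b2:
                # Shared time is the intersection of the two intervals, if any.
                overlap[b1] += max(0.0, min(e1, e2) - max(s1, s2))
    return dict(overlap)


# Hypothetical bouts in seconds; not exported from the BORIS project shown here.
expert = [
    ("drinking water", 10, 40),
    ("self-grooming", 60, 100),
    ("manipulating bedding", 120, 180),
    ("out of view", 200, 230),
]
trainee = [
    ("drinking water", 10, 41),
    ("self-grooming", 61, 100),
    ("out of view", 125, 170),           # scored where the expert saw bedding
    ("manipulating bedding", 205, 235),  # scored where the expert saw out of view
]

expert_totals = total_duration(expert)
trainee_totals = total_duration(trainee)
shared = overlap_by_behavior(expert, trainee)

for behavior in expert_totals:
    print(behavior, expert_totals[behavior],
          trainee_totals.get(behavior, 0.0), shared.get(behavior, 0.0))
```

In this made-up example the trainee's total for manipulating bedding is nonzero, yet none of it overlaps with the expert's bouts. A total-duration summary alone would not reveal that kind of disagreement.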
Example #2: scatterplots
This scatterplot compares the results of Observer 1 (Expert) against Observer 2 (Trainee). Both observers scored a set of 9 videos for the duration of time an animal spent feeding (e.g. 0-25 seconds). Each black dot indicates one video. The red line indicates perfect agreement (1:1 match). This plot shows us there is a problem: all of the points are above the red line, indicating that Observer 2 consistently overestimated durations compared to Observer 1. This graph would indicate a problem whether or not your formal statistical metrics indicated Observer 2 was reliable.
This scatterplot also indicates a problem, and a need for additional training. The trainee (Observer 2) does well with some videos compared to the expert, but not all of them. It seems like the trainee can mostly identify the behavior accurately when it's short, but overestimates the behavior when it is longer.
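A scatterplot like these takes only a few lines to produce. The sketch below is one possible approach, with hypothetical function names and made-up durations: it counts how many videos fall above, on, or below the 1:1 line, and includes a plotting helper that assumes matplotlib is installed.

```python
def compare_to_identity(obs1, obs2):
    """Count items where Observer 2 scored above, on, or below the 1:1 line."""
    counts = {"above": 0, "on": 0, "below": 0}
    for a, b in zip(obs1, obs2):
        if b > a:
            counts["above"] += 1
        elif b < a:
            counts["below"] += 1
        else:
            counts["on"] += 1
    return counts


def plot_agreement(obs1, obs2):
    """Scatter Observer 2 against Observer 1 with a red line of perfect agreement."""
    import matplotlib.pyplot as plt  # imported here so the counts work without it

    fig, ax = plt.subplots()
    ax.scatter(obs1, obs2, color="black")
    limit = max(max(obs1), max(obs2))
    ax.plot([0, limit], [0, limit], color="red")  # 1:1 line
    ax.set_xlabel("Observer 1 (Expert), seconds")
    ax.set_ylabel("Observer 2 (Trainee), seconds")
    ax.set_aspect("equal")
    return fig


# Hypothetical feeding durations in seconds for 9 videos; not data from this guide.
expert = [1, 3, 5, 8, 10, 14, 17, 20, 23]
trainee_a = [3, 6, 7, 11, 13, 17, 21, 23, 25]  # consistently higher
trainee_b = [1, 3, 5, 9, 12, 18, 22, 25, 25]   # matches short bouts, high on long ones

print(compare_to_identity(expert, trainee_a))
print(compare_to_identity(expert, trainee_b))
```

The counts are only a summary. The plot itself is what shows whether the disagreement grows with the length of the behavior.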
Example #3: line-by-line inspection
Visual observation can also be as simple as looking at the data line-by-line in an Excel or CSV file. In this example, Observer 1 (Expert) and Observer 2 (Trainee) scored the same 10 videos for the number of bouts of play behavior (e.g. 0-10 bouts). By comparing across rows, we can look for any disagreement. The expert and trainee generally agree on whether or not any bouts of play behavior occurred in each video, and on whether there were few or many bouts, but there are almost no instances of exact agreement. The trainee does not show a pattern of consistent over- or underestimation when scoring, so it is not clear at a glance what the exact problem may be. Additional training is needed, even if formal statistical metrics suggest the trainee is reliable.
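If the file is long, a short script can add a difference column and tally the rows, which makes the same line-by-line check quicker. This is a minimal sketch with hypothetical function names and made-up bout counts, not the data shown in this example.

```python
def row_by_row(obs1, obs2):
    """Pair the two observers' counts per video and add the signed difference."""
    rows = []
    for video, (a, b) in enumerate(zip(obs1, obs2), start=1):
        rows.append({"video": video, "expert": a, "trainee": b, "diff": b - a})
    return rows


def summarize(rows):
    """Tally exact agreement, overestimates, and underestimates by the trainee."""
    diffs = [row["diff"] for row in rows]
    return {
        "exact": sum(d == 0 for d in diffs),
        "over": sum(d > 0 for d in diffs),
        "under": sum(d < 0 for d in diffs),
    }


# Hypothetical bout counts for 10 videos; not data from this guide.
expert = [0, 2, 5, 1, 8, 0, 3, 10, 6, 4]
trainee = [0, 3, 4, 2, 6, 0, 5, 8, 7, 3]

rows = row_by_row(expert, trainee)
for row in rows:
    print(row)

summary = summarize(rows)
print(summary)
```

Roughly equal numbers of overestimates and underestimates, as in this made-up set, would match the situation described above: clear disagreement without an obvious direction.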
Including the time in line-by-line visual observation can provide important context. In this example, head shakes were scored in one video as a point behavior, so only a single time is reported for each occurrence (state behaviors, in comparison, would have both a start and a stop time). The expert (Observer 1) and trainee (Observer 2) both report 6 head shakes in this video, but the trainee misses the first occurrence (at 0:02) and scores 3 shakes between 1:13 and 1:16 instead of 2. The matching totals therefore hide two errors. This tells us that the trainee has some issues with this behavior, though a glance at the Excel file does not make clear what the problem is. Additional training is needed.
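Matching point events by time can also be sketched in code. The example below greedily pairs each expert event with the nearest unused trainee event within a tolerance, then reports what was missed and what was extra. The function names, the 1-second tolerance, and the greedy matching rule are all assumptions made for illustration. Of the timestamps, only 0:02 and the 1:13 to 1:16 window come from the example above; the rest are hypothetical.

```python
def to_seconds(stamp):
    """Convert an 'M:SS' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def match_events(expert_times, trainee_times, tolerance=1):
    """Greedily pair each expert event with the nearest unused trainee event.

    Returns (matched pairs, expert events with no match, leftover trainee events).
    """
    unused = sorted(trainee_times)
    matched, missed = [], []
    for t in sorted(expert_times):
        candidates = [u for u in unused if abs(u - t) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda u: abs(u - t))
            unused.remove(nearest)
            matched.append((t, nearest))
        else:
            missed.append(t)
    return matched, missed, unused


# Mostly hypothetical timestamps; only 0:02 and the 1:13-1:16 window follow the text.
expert = [to_seconds(s) for s in ["0:02", "0:30", "0:47", "1:13", "1:16", "2:05"]]
trainee = [to_seconds(s) for s in ["0:30", "0:48", "1:13", "1:15", "1:16", "2:05"]]

matched, missed, extra = match_events(expert, trainee)
print("matched:", matched)
print("missed by trainee (s):", missed)
print("extra trainee events (s):", extra)
```

Both observers have 6 events here, yet the matching exposes one missed event and one extra event, which a comparison of totals would not.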
A challenge with this line-by-line approach is that patterns can be difficult to identify. When the goal is to spot patterns, a visual representation, such as one of the plots described above and elsewhere on this site, is generally preferred.