Rare outcomes.
Sometimes, the variable of interest is rare. First consider whether including this outcome is relevant to the aims of your proposal. If it is, observer consistency still matters in these cases, though the process for establishing it may be different.
Before data collection
- Other researchers may have examples of the behavior that they can share. Networking may help you build a test subset, in which case you can continue training using the methods previously described
- If you cannot gather a large enough test subset, try collecting examples and developing a task-based training protocol
- At a minimum, we try to find 1-3 examples of the behavior or parameter to illustrate it. Explain to observers that this is something to look for and record if seen. In these cases, you likely will not be able to run formal reliability metrics before data collection begins
During and after data collection
- If the modality of data collection is one that can be revisited (e.g., video), the expert can check the instances identified by observers to confirm correct identification and extract examples for use in future test subsets
- Consider discussing and reporting findings in the manuscript, even if the behavior was rare during data collection. In this case, you might not be able to conduct statistical comparisons, but you can report descriptive outcomes, which can add value to the literature and be used by future researchers. For the same reason, you might consider uploading examples of the rare outcome (photos or videos) to an open-source platform. This provides context about what you scored and can also aid in the future development of a test subset.
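As an illustration of reporting descriptive outcomes instead of a statistical comparison, the sketch below summarizes a rare behavior per group: how many animals showed it, what percentage that represents, and the range of time spent. The records, group labels, and the function name `describe_rare_outcome` are hypothetical and chosen only for this example; they are not data or code from any study.

```python
from collections import defaultdict

# Hypothetical records: (animal_id, group, seconds of the rare behavior observed).
# These values are invented for illustration only.
records = [
    ("A1", "treatment", 0),
    ("A2", "treatment", 4),
    ("A3", "treatment", 0),
    ("A4", "control", 0),
    ("A5", "control", 7),
    ("A6", "control", 2),
]


def describe_rare_outcome(records):
    """Summarize a rare outcome per group without fitting a statistical model."""
    durations_by_group = defaultdict(list)
    for _animal_id, group, seconds in records:
        durations_by_group[group].append(seconds)

    summary = {}
    for group, durations in durations_by_group.items():
        # Keep only animals that performed the behavior at least once.
        observed = [d for d in durations if d > 0]
        summary[group] = {
            "n_animals": len(durations),
            "n_with_behavior": len(observed),
            "percent_with_behavior": 100 * len(observed) / len(durations),
            # Range of time spent, among animals that showed the behavior.
            "duration_range_s": (min(observed), max(observed)) if observed else None,
        }
    return summary


summary = describe_rare_outcome(records)
for group, stats in summary.items():
    print(group, stats)
```

A summary of this kind gives the counts, percentages, and ranges that can be reported in the results when the outcome is too rare to model.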
Example
In Downey and Tucker (2023), we measured abnormal oral behaviors in dairy heifers, including urine drinking. We identified a few instances of this behavior occurring prior to training, and used these clips to orient trainees to what the behavior looked like while we trained them to the definition. Urine drinking was rare over the duration of the experiment, and any time a trainee scored this behavior, the expert observer (lead author) confirmed that it was a correct classification. This approach of having the expert observer check each occurrence would catch false positives (i.e., when a trainee said "urine drinking" occurred but, upon review, it did not), but would miss false negatives (i.e., when urine drinking did occur but the trainee did not record it).
Video credit: Blair Downey; Downey and Tucker, 2023
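The asymmetry described above, where expert review of trainee-scored occurrences catches false positives but not false negatives, can be made concrete with a toy sketch. The clips, field names, and the function `expert_review_of_positives` are hypothetical and are not taken from Downey and Tucker (2023). The `truth` field is included only to show what goes unseen; in practice it is unknown for any clip the expert never reviews.

```python
# Hypothetical clips: whether the behavior truly occurred vs. what the trainee recorded.
clips = [
    {"clip": 1, "truth": True, "trainee": True},    # correct detection
    {"clip": 2, "truth": False, "trainee": True},   # false positive
    {"clip": 3, "truth": True, "trainee": False},   # false negative
    {"clip": 4, "truth": False, "trainee": False},  # correct rejection
]


def expert_review_of_positives(clips):
    """Simulate an expert who re-checks only the clips the trainee scored as positive."""
    reviewed = [c for c in clips if c["trainee"]]
    confirmed = [c["clip"] for c in reviewed if c["truth"]]
    rejected = [c["clip"] for c in reviewed if not c["truth"]]
    # True occurrences the trainee did not record never reach the expert.
    missed = [c["clip"] for c in clips if c["truth"] and not c["trainee"]]
    return confirmed, rejected, missed


confirmed, rejected, missed = expert_review_of_positives(clips)
print("confirmed by expert:", confirmed)
print("false positives caught:", rejected)
print("false negatives never reviewed:", missed)
```

In this toy case the expert corrects clip 2, but clip 3 stays undetected because it was never flagged for review.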
In our methods, we note that "Time spent drinking urine was rare and not evaluated with a model." We provide descriptive information in our results: "Five heifers drank urine from a pen mate (19% of Hay heifers, 25% of Control) for a total of 1 to 10 s/12 h." We discuss this result briefly by identifying that it was observed in this age class and population of cattle, though it was rare. We cite the few other studies available that have also scored urine sucking, and found it to be rare as well. In an online data repository associated with this paper, we also include a video of what this behavior looked like to accompany our written definition.