

Analysis of behavior across species in the cloud

PI: Pietro Perona (Division of Engineering and Applied Science)
SASE: Anna Ding, Scholar

Biologists and field scientists often rely on camera traps to capture footage of their subjects in the wild. Much of this footage contains no animals at all, or animals other than the target species, which makes manual filtering both costly and time-consuming. To address this, many field scientists are turning to machine learning for tasks such as detection, tracking, and downstream behavioral analysis. However, current machine learning methods for camera trap analysis typically require training a new model for each dataset: large quantities of human annotation, plus training and retraining for every novel environment, on top of the time spent acquiring the necessary technical expertise. The Perona Lab at Caltech conducts research at the intersection of computer vision, neuroscience, and ecological applications. Collaborating with Professor Cat Hobaiter, a primatologist at the University of St Andrews, the Scholar developed a system for object detection in in-the-wild video footage.


The rise of large language models (like ChatGPT) has transformed the field of machine learning. Recent advances in large vision-language model (LVLM) architectures and training have led to significant improvements in accuracy, robustness to noise, and ease of use. Progress is unfolding rapidly, with new models released every few months, and these models are already being applied in fields as diverse as medicine and robotics thanks to their power and flexibility.

Instead of relying on species-specific model training, the Scholar developed a system that takes natural language prompts and feeds them into an LVLM, making it adaptable to new environments and species without retraining. As part of this work, the Scholar developed evaluation metrics to measure performance across different sampling strategies and model settings, capturing the tradeoff between accuracy and processing time. The pipeline is available through a HuggingFace-hosted web interface that lets researchers upload and analyze video data without installing any software; it can also be run through Google Colab or GitHub for those who prefer to work on their own machines. The system is currently deployed in primatology labs at the University of St Andrews, and there are plans to deploy it more widely in the ecology community. By combining recent advances in machine learning with accessible deployment, this project lowers the technical barrier to using AI for behavioral analysis and wildlife monitoring, enabling more scalable, efficient analysis of ecological video data.
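The write-up does not name the specific model or code the Scholar used, so the sketch below is only illustrative of the prompt-driven approach described above: it stands in OWL-ViT, an open-vocabulary detector available through the HuggingFace transformers zero-shot object detection pipeline, and the prompt, frame stride, and score threshold are hypothetical settings, not the project's.

```python
# A minimal sketch of prompt-driven animal detection on camera-trap video.
# Assumption: OWL-ViT via the HuggingFace `transformers` zero-shot object
# detection pipeline stands in for whichever LVLM the project actually uses;
# the labels, stride, and threshold below are illustrative defaults.
import cv2
from PIL import Image
from transformers import pipeline

detector = pipeline(
    "zero-shot-object-detection", model="google/owlvit-base-patch32"
)

def detect_in_video(video_path, labels, stride=30, threshold=0.3):
    """Sample every `stride`-th frame and keep frames where any prompted
    label is detected above `threshold`. Returns (frame_index, detections)."""
    cap = cv2.VideoCapture(video_path)
    hits, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % stride == 0:
            # OpenCV decodes to BGR; the detector expects an RGB PIL image.
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            detections = [
                d for d in detector(image, candidate_labels=labels)
                if d["score"] >= threshold
            ]
            if detections:
                hits.append((frame_idx, detections))
        frame_idx += 1
    cap.release()
    return hits

# Switching species means changing a string, not retraining a model:
hits = detect_in_video("clip_0001.mp4", labels=["chimpanzee"], stride=30)
```

Because the prompt is just text, the same function can flag chimpanzees in one deployment and a different species in another by changing only the labels argument, which is the adaptability the paragraph above describes.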

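The evaluation metrics themselves are not spelled out in this summary. As one hypothetical instance of the accuracy-versus-processing-time tradeoff mentioned above, a harness like the following could sweep the frame stride of the detect_in_video sketch and report clip-level accuracy against average wall-clock cost per clip; the strides and the labeled_clips format are assumptions for illustration.

```python
# A hypothetical harness for the accuracy/processing-time tradeoff: sweep the
# frame stride and record clip-level accuracy against wall-clock cost.
# `labeled_clips` is an assumed list of (video_path, has_animal) pairs.
import time

def sweep_strides(labeled_clips, labels, strides=(10, 30, 90)):
    for stride in strides:
        correct, elapsed = 0, 0.0
        for path, has_animal in labeled_clips:
            start = time.perf_counter()
            # A clip counts as "flagged" if any sampled frame has a detection.
            flagged = bool(detect_in_video(path, labels, stride=stride))
            elapsed += time.perf_counter() - start
            correct += int(flagged == has_animal)
        print(
            f"stride={stride:>3}: accuracy={correct / len(labeled_clips):.2f}, "
            f"avg seconds/clip={elapsed / len(labeled_clips):.1f}"
        )
```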

The tool and documentation can be found here.