hummuscience 2 days ago

The moment I started reading this, I was reminded of this recent study: https://arxiv.org/html/2503.10212v1

The scope is a bit different. The study uses an LLM to interpret pose estimation data and describe the behavior in each frame. The output is text, which can then be used to create embeddings of behavior. As someone who works in ethology, I think that's a clever (but maybe expensive) idea.

I think the author could use something similar, with multi-person pose estimation models.