Lili Yu @liliyu_lili
My go-to test for visual proactivity in realtime systems is live finger counting. It sounds simple, but it requires the model to watch continuously, track visual changes, and respond at the right time. Other models we’ve tried could not do it. https://thinkingmachines.ai/blog/interaction-models/#benchmarks:~:text=Examples%20from%20our%20internal%20audio%20and%20video%20benchmark.