Robots are getting better at reading the room, literally. A team at Brown University, led by graduate student Ivy He, built a planning system that lets robots interpret both spoken instructions and human pointing gestures when searching for objects.
The method combines a vision-language model with a partially observable Markov decision process (POMDP) planning framework, treating a pointing gesture as a probability cone aligned with the direction indicated by the person's eye, elbow, and wrist. Presented at the International Conference on Human-Robot Interaction (HRI), the work brings robots a step closer to collaborating naturally with people.
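To give a sense of the cone idea, the sketch below scores candidate object locations by how close they fall to a pointing ray, with a Gaussian falloff in angular deviation. It is an illustrative simplification under assumed names and parameters (`pointing_cone_likelihood`, `cone_sigma_deg`), not the Brown team's actual formulation.

```python
import numpy as np

def pointing_cone_likelihood(origin, direction, candidates, cone_sigma_deg=15.0):
    """Score candidate object positions against a pointing gesture.

    The gesture is modeled as a cone of directions centered on the ray from
    `origin` (e.g. the wrist) along `direction` (e.g. an eye-to-wrist or
    elbow-to-wrist vector). Candidates nearer the cone's axis get higher
    weight via a Gaussian falloff in angular deviation. Hypothetical sketch,
    not the published method.
    """
    direction = direction / np.linalg.norm(direction)
    offsets = candidates - origin                       # vectors to each candidate
    dists = np.linalg.norm(offsets, axis=1)
    cos_angles = offsets @ direction / np.clip(dists, 1e-9, None)
    angles_deg = np.degrees(np.arccos(np.clip(cos_angles, -1.0, 1.0)))
    weights = np.exp(-0.5 * (angles_deg / cone_sigma_deg) ** 2)
    return weights / weights.sum()                      # normalize to a distribution


# Example: three candidate object locations, with the person pointing roughly at the second.
origin = np.array([0.0, 0.0, 1.4])                      # wrist position (meters)
direction = np.array([1.0, 0.1, -0.4])                  # pointing direction
candidates = np.array([
    [2.0, 1.5, 0.8],     # object off to the left
    [2.5, 0.3, 0.4],     # object roughly along the pointing ray
    [1.5, -1.2, 0.9],    # object off to the right
])
print(pointing_cone_likelihood(origin, direction, candidates))
```

In a POMDP search, a distribution like this would be folded into the robot's belief over where the target object is, alongside whatever the spoken instruction and the vision-language model contribute.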



