Tuesday, April 9, 2024

Engineers teach robots common sense

Robots are being trained to perform complex household tasks, from cleaning up spills to serving food. One way they learn is through imitation, by copying the movements that a human guides them through. Robots are great imitators, but they need to be programmed to adjust to unexpected situations like bumps and nudges. Otherwise, they might struggle to complete their tasks efficiently.

MIT engineers are now aiming to teach robots common-sense knowledge to help them deal with unexpected situations that may arise during their tasks. To achieve this, they have developed a new method that connects robot motion data with large language models (LLMs), which can provide valuable insights and context to the robot.

With this method, a robot can logically break down complex household tasks into smaller subtasks and adapt to disruptions within a subtask without restarting the entire task from scratch. This can save time and effort and reduce the need for engineers to explicitly program fixes for every possible failure.

“Imitation learning is a mainstream approach enabling household robots. But suppose a robot is blindly mimicking a human’s motion trajectories. In that case, tiny errors can accumulate and eventually derail the rest of the execution,” says Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS). “With our method, a robot can self-correct execution errors and improve overall task success.”

The researchers illustrate their new approach with simple tasks like scooping marbles from one bowl and pouring them into another. The usual approach involves multiple demonstrations by humans for the robot to mimic.

The researchers' approach instead breaks the task down into a sequence of subtasks, or trajectories. This lets the robot self-correct in the moment: if it makes a mistake during a subtask, it does not have to start again from the beginning. It can correct itself and continue with the remaining subtasks. This could make robotic task execution more efficient and less time-consuming.

The team found that LLMs can automate part of this work. These deep learning models process vast amounts of text, which lets them establish connections between words, sentences, and paragraphs. From those connections, they can generate new sentences based on what they have learned about the relationships between words.

The researchers also discovered that LLMs can be prompted to produce a list of subtasks that would be involved in a given task. For instance, if asked to list the actions involved in scooping marbles from one bowl into another, an LLM could come up with a sequence of verbs such as “reach,” “scoop,” “transport,” and “pour.”

“LLMs have a way to tell you how to do each step of a task in natural language. A human’s continuous demonstration is the embodiment of those steps in physical space,” Wang says. “And we wanted to connect the two so that a robot would automatically know what stage it is in a task and be able to replan and recover on its own.”
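The idea of recovering within a subtask rather than restarting can be sketched in a few lines of Python. This is only an illustration, not the researchers' implementation: the `run_with_recovery` function, the `execute` callback, and the simulated failure are all hypothetical, and only the four subtask names come from the example above.

```python
# Subtask names as suggested by the LLM in the marble-scooping example.
SUBTASKS = ["reach", "scoop", "transport", "pour"]


def run_with_recovery(execute, max_attempts=3):
    """Run subtasks in order, retrying only the subtask that fails.

    `execute` is a hypothetical callback taking (name, attempt) and
    returning True on success. Returns the list of attempted subtasks.
    """
    log = []
    for name in SUBTASKS:
        for attempt in range(1, max_attempts + 1):
            log.append(name)
            if execute(name, attempt):
                break  # subtask done, move on to the next one
        else:
            # Every attempt failed: give up instead of looping forever.
            raise RuntimeError(f"subtask {name!r} failed {max_attempts} times")
    return log


def flaky_execute(name, attempt):
    # Simulated disturbance (an assumption for this demo): a nudge makes
    # the first "scoop" attempt fail; everything else succeeds.
    return not (name == "scoop" and attempt == 1)


log = run_with_recovery(flaky_execute)
print(log)
```

In this toy run only "scoop" is repeated; "reach" is not performed again, which is the behavior the article describes as recovering without starting from scratch.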

For their new approach, the team developed an algorithm that automatically connects an LLM’s natural language label for a particular subtask with the robot’s position in physical space or an image that captures the robot’s state. Mapping a robot’s physical coordinates, or an image of its state, to a natural language label is commonly referred to as “grounding.”

The team’s algorithm is a grounding “classifier” that learns to identify which semantic subtask the robot is in (for example, “reach” versus “scoop”) from its physical coordinates or an image view.
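The article does not describe how the classifier works internally. As a rough illustration of what mapping a physical state to a subtask label can look like, the sketch below uses a simple nearest-centroid classifier over 2D positions. The technique, the function names, and all coordinates are assumptions made for this example, not the team's method or data.

```python
from collections import defaultdict
from math import dist

# Hypothetical demonstration data: (x, y) gripper positions, each tagged
# with one of the LLM-proposed subtask labels. Coordinates are invented.
DEMO = [
    ((0.0, 0.0), "reach"), ((0.1, 0.1), "reach"),
    ((1.0, 0.0), "scoop"), ((1.1, 0.1), "scoop"),
    ((2.0, 1.0), "transport"), ((2.1, 1.1), "transport"),
    ((3.0, 0.0), "pour"), ((3.1, 0.1), "pour"),
]


def fit_centroids(samples):
    """Average the positions recorded for each subtask label."""
    grouped = defaultdict(list)
    for position, label in samples:
        grouped[label].append(position)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def classify(position, centroids):
    """Return the subtask label whose centroid is closest to `position`."""
    return min(centroids, key=lambda label: dist(position, centroids[label]))


centroids = fit_centroids(DEMO)
current_subtask = classify((1.05, 0.0), centroids)
print(current_subtask)
```

A robot that has been nudged off course could call something like `classify` on its current state to decide which subtask to resume. A real system would likely need a learned model over richer inputs such as images.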

“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks despite external perturbations.”