Cornell researchers have developed two technologies that track eye movements and read the wearer’s facial expressions through sonar-like acoustic sensing. Both are small enough to fit on commercial smartglasses or virtual reality (VR) and augmented reality (AR) headsets, yet consume significantly less power than comparable camera-based tools.
Both technologies use speakers and microphones mounted on eyeglasses to bounce sound waves off the face and pick up the signals reflected by facial and eye movements.
GazeTrak is the first eye-tracking system that relies on acoustic signals, while EyeEcho is the first eyeglass-based system that can continuously and accurately detect facial expressions and recreate them through an avatar in real time. Both devices can run for several hours on a smartglasses battery and more than a day on a VR headset battery.
GazeTrak tracks the direction of the user’s gaze using one speaker and four microphones arranged around each lens frame of a pair of glasses. The speakers emit inaudible sound waves that echo off the eyeball and are picked up by the microphones.
The reflected signals are fed into a customized deep-learning pipeline that continuously analyzes millisecond-scale differences in the echoes to infer where the user is looking. Because the system listens for its own inaudible signals rather than ambient sound, loud background noise does not affect it, making it a reliable gaze tracker.
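To make the sensing-and-inference loop concrete, here is a minimal sketch of such a pipeline in Python. The chirp parameters, the microphone count, and the tiny "GazeNet" network are illustrative assumptions for this sketch, not the published GazeTrak design.

```python
# Illustrative sketch of an acoustic gaze-tracking loop: correlate microphone
# recordings against the transmitted inaudible chirp, then let a small network
# regress gaze direction from the resulting echo profiles.
# All constants and the network architecture are assumptions, not GazeTrak's.
import numpy as np
import torch
import torch.nn as nn

FS = 48_000        # assumed microphone sample rate (Hz)
CHIRP_LEN = 600    # assumed samples per inaudible chirp
N_MICS = 8         # four microphones around each of the two lens frames

def echo_profiles(chirp: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Cross-correlate each mic channel with the transmitted chirp; the peaks
    of the profile shift as the eyeball and surrounding skin move."""
    return np.stack([np.correlate(frame[m], chirp, mode="valid")
                     for m in range(N_MICS)])

class GazeNet(nn.Module):
    """Tiny 1-D CNN that maps stacked echo profiles to a 2-D gaze direction.
    A deployed system would be trained per user on calibration recordings."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(N_MICS, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, 2)  # (yaw, pitch) in degrees

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).squeeze(-1))

# Toy end-to-end pass with random data standing in for a 100 ms microphone frame.
chirp = np.sin(2 * np.pi * 20_000 * np.arange(CHIRP_LEN) / FS)  # ~20 kHz tone
frame = np.random.randn(N_MICS, FS // 10)
profiles = torch.tensor(echo_profiles(chirp, frame), dtype=torch.float32)
gaze = GazeNet()(profiles.unsqueeze(0))
print(gaze.detach().numpy())  # e.g. [[yaw, pitch]]
```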
Currently, GazeTrak isn’t as accurate as eye-tracking wearables that use conventional cameras, but it is far more energy-efficient, consuming only about 5% of the power of those camera-based devices. The researchers behind GazeTrak estimate that with the same battery capacity as the Tobii Pro Glasses 3, it could run for up to 38.5 hours, compared with the Tobii’s 1.75 hours. They also expect GazeTrak’s accuracy to improve significantly as the technology matures.
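As a quick sanity check, the quoted runtimes and the quoted power savings are two views of the same ratio; the short calculation below (rounding and variable names are just for illustration) shows they agree:

```python
# Running 38.5 h on the battery that powers the Tobii Pro Glasses 3 for 1.75 h
# implies GazeTrak draws roughly 1.75 / 38.5 of that tracker's power.
tobii_runtime_h = 1.75
gazetrak_runtime_h = 38.5
power_fraction = tobii_runtime_h / gazetrak_runtime_h
print(f"GazeTrak draws about {power_fraction:.1%} of the camera-based tracker's power")
# about 4.5%, consistent with the "about 5%" figure above
```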
EyeEcho uses one speaker and one microphone located next to the glasses’ hinges to capture facial expressions through skin movement. AI then interprets the reflected signals to recover the wearer’s expression, which can drive an avatar for hands-free video calls, even in noisy environments.
Unlike other smartglasses that only recognize faces or distinguish between a few specific expressions, EyeEcho tracks facial expressions continuously. The system reads expressions accurately after just four minutes of training on each of its 12 test subjects, and it remains accurate even while the subjects perform a variety of everyday activities in different environments.
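A minimal sketch of how such a system might map acoustic echoes to avatar controls is shown below. The blendshape names, the small network, and the brief per-user training loop are illustrative assumptions, not EyeEcho’s published design.

```python
# Illustrative sketch: map a one-channel echo profile from the hinge-mounted
# speaker/microphone pair to continuous avatar blendshape weights.
# Names, sizes, and the training loop are assumptions for illustration only.
import torch
import torch.nn as nn

BLENDSHAPES = ["jaw_open", "smile", "brow_raise", "eye_blink"]  # assumed avatar controls
PROFILE_LEN = 2048                                              # assumed echo-profile length

class ExpressionNet(nn.Module):
    """Maps an echo profile to blendshape weights in [0, 1] for driving an avatar."""
    def __init__(self, profile_len: int = PROFILE_LEN, n_shapes: int = len(BLENDSHAPES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(profile_len, 128), nn.ReLU(),
            nn.Linear(128, n_shapes), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Personalization loop: a few minutes of (echo profile, reference expression)
# pairs per user, e.g. recorded against a camera-based reference during setup.
model = ExpressionNet()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):                          # stands in for a short training session
    echoes = torch.randn(16, PROFILE_LEN)        # batch of echo profiles (toy data)
    target = torch.rand(16, len(BLENDSHAPES))    # reference blendshape weights
    optim.zero_grad()
    loss = loss_fn(model(echoes), target)
    loss.backward()
    optim.step()

# At run time, each new echo profile yields one set of avatar weights per frame.
weights = model(torch.randn(1, PROFILE_LEN))[0]
print(dict(zip(BLENDSHAPES, weights.tolist())))
```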
EyeEcho also outperforms EarIO, another expression-tracking system developed by Ke Li and colleagues: it requires less training data, and its accuracy remains stable over a longer period of time.
“There are many camera-based systems in this area of research or even on commercial products to track facial expressions or gaze movements, like Vision Pro or Oculus,” Li said. “But not everyone wants cameras on wearables to capture you and your surroundings all the time.”
“The privacy concerns associated with systems that use video will become more and more important as VR/AR headsets become much smaller and, ultimately, similar to today’s smartglasses,” said co-author François Guimbretière, professor of information science in Cornell Bowers CIS and the multicollege Department of Design Tech. “Because both technologies are so small and power-efficient, they will be a perfect match for lightweight, smart AR glasses.”