NVIDIA demonstrates an AI tool that tilts a whole head so that it appears you are looking at somebody instead of a camera off to the side
Video conferencing has become the new business travel, thanks to Covid-19, and we're going to do more of it even after the virus. The pandemic has also spurred more research into improving it, including a host of new AI techniques demonstrated as part of NVIDIA Maxine, a platform of AI video tools NVIDIA is licensing to partners.
AI's ability to help with video compression and upscaling is growing. Several tools to increase the resolution of old video are already on the market, and before long we'll routinely be watching old SD TV content in HD, and we have barely scratched the surface. While Maxine offers AI-based upscaling, video conferencing can make use of more than just video compression techniques.
When you have a talking head in front of a fixed background, not much changes over the course of the whole call. If you send high-resolution images of the face and background, and in particular the face in various orientations and positions, it is possible to extract keypoints tracking the facial features and muscles, and then animate the real face on the other end. You've seen this happen when an actor like Andy Serkis drives an animated character like Gollum or Snoke, but with the right data one can animate one's own face to get realistic video over a tiny bandwidth channel. This lets people on a very low-bandwidth mobile data connection still send reasonable video.
NVIDIA claims to send decent quality video in just 30 kilobits per second, the speed of an ancient dial-up modem.
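To see why sending keypoints saves so much bandwidth, here is a back-of-the-envelope sketch in Python. All the figures in it (130 keypoints, 4 bytes per keypoint, uncompressed 4:2:0 video) are my own illustrative assumptions, not NVIDIA's numbers:

```python
# Rough comparison of raw video vs. a facial-keypoint stream.
# Every parameter below is an illustrative assumption.

def raw_video_kbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed 4:2:0 video averages about 12 bits per pixel."""
    return width * height * fps * bits_per_pixel / 1000

def keypoint_stream_kbps(num_keypoints, bytes_per_keypoint, fps):
    """Each frame sends only keypoint coordinates instead of pixels."""
    return num_keypoints * bytes_per_keypoint * 8 * fps / 1000

raw = raw_video_kbps(1280, 720, 30)       # uncompressed 720p at 30 fps
kp = keypoint_stream_kbps(130, 4, 30)     # ~130 keypoints, 4 bytes each

print(f"raw 720p30: {raw:,.0f} kbit/s")   # raw 720p30: 331,776 kbit/s
print(f"keypoints : {kp:,.0f} kbit/s")    # keypoints : 125 kbit/s
```

Even this naive, uncompressed keypoint stream is over 2,500 times smaller than raw video; with further compression of the keypoint deltas, figures like 30 kbit/s become plausible.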
Animating avatars this way has been done for decades, since before it was good enough to use in movies. New AI techniques are making these avatars much more realistic. There is a trade-off: people like the privacy of sending an avatar, since they don't have to clean up, do make-up, put on clothes or tidy their room. At the same time, an avatar just doesn't convey the same human connection as real video, and using one in one direction and not the other would be creepy. Avatars do have one big advantage in VR or AR: the rendered face wears no headset, and viewers can move all around it.
A hot topic in video conferencing has been software that adjusts the position of your eyes so that you appear to look straight at the other person, rather than off to the side or top where your camera is. That greatly improves the human quality of a video call. In the past it has been too hard to do; humans are very good at noticing when somebody's eyes are not quite right. Recent efforts have shown more success. NVIDIA demonstrated shifting not just the eyes but the whole face, which matters when the camera is well off to the side (as with very large displays, or phones used in landscape mode). This will need more proving out.
Everybody has seen (and is getting tired of) virtual backgrounds, a common feature in Zoom. NVIDIA claims its version goes much further, with accuracy approaching a real greenscreen. (I have a greenscreen in my home studio, and today there is no substitute for one.) Virtual backgrounds are of course popular with people wanting to hide the mess in their room, but they are also very useful for mixing your face with slides when giving a presentation.
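The compositing step behind a virtual background is straightforward once a segmentation model has produced a per-pixel foreground mask; the AI's hard job is the mask itself. A minimal pure-Python sketch of the blend (the function names and the toy one-row frame are mine for illustration, not any real API):

```python
def composite_pixel(person_px, background_px, alpha):
    """Blend one RGB pixel: alpha=1.0 keeps the person, 0.0 shows the background.

    A person-segmentation model (the AI piece a tool like Maxine supplies)
    would produce the per-pixel alpha; this sketch shows only the blend.
    """
    return tuple(round(alpha * p + (1 - alpha) * b)
                 for p, b in zip(person_px, background_px))

def composite_frame(frame, mask, background):
    """Apply the blend to every pixel (frames are nested lists of RGB tuples)."""
    return [[composite_pixel(frame[y][x], background[y][x], mask[y][x])
             for x in range(len(frame[0]))]
            for y in range(len(frame))]

# 1x2 demo: left pixel is the person (mask 1.0), right is the messy room (0.0).
frame = [[(200, 150, 120), (90, 90, 90)]]
beach = [[(30, 140, 220), (30, 140, 220)]]
mask = [[1.0, 0.0]]
print(composite_frame(frame, mask, beach))
# → [[(200, 150, 120), (30, 140, 220)]]
```

A soft (fractional) mask at the edges of hair and shoulders is what separates a good result from the hard-edged halos familiar from Zoom backgrounds.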
A complement to the virtual greenscreen, not demonstrated by NVIDIA but recently promised for Microsoft Teams, is to cut the person out of their background and place that head and shoulders into another environment: in this case, a simulated audience in an auditorium, with heads packed in an interleaved way to show many people at once at lower bandwidth. Today's virtual event tools leave the audience isolated and invisible, and speakers need to see their audience to do a better job. As a bonus, because this technique shows people only when they present a normal head and shoulders, it makes "room bombing" with porn or other images more difficult.
NVIDIA also demonstrated background noise elimination. A similar feature was released for Google Meet's more expensive tiers. This tool removes many background sounds for people not using a headset. (People should still use a headset; it sounds much better and is far more two-directional and interactive.) That said, it seems impossible to convince people to wear headsets, so eliminating background sounds (chair creaks, keyboards, eating, children playing, planes flying over, leaf blowers and so on) is a plus. The Google demo also shows it removing the sound of applause, which is an error: applause is a deliberate sound the person is trying to send, and one that is sorely missed in online conferences.
NVIDIA has promoted that much of this processing can take place in the cloud, though obviously bandwidth-reducing technology has to run at the low-bandwidth endpoint. It should be noted that doing video and audio processing in the cloud makes it very hard to do end-to-end encryption, something Zoom got into a lot of trouble over earlier this year (in particular for claiming to do it when it did not). End-to-end encryption is always good, though it can be sacrificed for features if the security breaches you worry about are unlikely to break link-level encryption.
Kudos to NVIDIA for promoting new levels of video conferencing. Much more is coming. Even after the virus, work-from-home will continue. I predict companies will soon start buying "work from home" workstations for employees to put in their homes, featuring high quality displays, cameras and audio, plus AI processors, to give them the best and most realistic video (and, just as important, audio) experience. Such workstations will cost less than a single business trip, so they will be easy to justify if they get the job done.
I founded ClariNet, the world's first internet-based business, am Chairman Emeritus of the Electronic Frontier Foundation, and a director of the Foresight Institute. My current passion is self-driving vehicles and robots. I worked on Google's car team in its early years and am an advisor and/or investor for car OEMs and many of the top startups in robocars, sensors, delivery robots and even some flying cars. Plus AR/VR and software. I am founding faculty and computing chair for Singularity University, and I write, consult and speak on robocar technology around the globe.