venturebeat.com
192.0.66.2  Public Scan

Submitted URL: https://link.mail.beehiiv.com/ss/c/u001.1KHRTbC8HPuHqA0YnUG6trNGXZYnTYnGRVy7daaCosMkGUyzfuzWgaXxYHQ1i86upQ408fhcAGVB6xsHcGjVfA...
Effective URL: https://venturebeat.com/ai/google-researchers-unveil-vlogger-an-ai-that-can-bring-still-photos-to-life/
Submission: On March 19 via api from US — Scanned from DE

Form analysis 2 forms found in the DOM

GET https://venturebeat.com/

<form method="get" action="https://venturebeat.com/" class="search-form" id="nav-search-form">
  <input id="mobile-search-input" class="" type="text" placeholder="Search" name="s" aria-label="Search" required="">
  <button type="submit" class="">
    <svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
      <g>
        <path fill-rule="evenodd" clip-rule="evenodd"
          d="M14.965 14.255H15.755L20.745 19.255L19.255 20.745L14.255 15.755V14.965L13.985 14.685C12.845 15.665 11.365 16.255 9.755 16.255C6.16504 16.255 3.255 13.345 3.255 9.755C3.255 6.16501 6.16504 3.255 9.755 3.255C13.345 3.255 16.255 6.16501 16.255 9.755C16.255 11.365 15.665 12.845 14.6851 13.985L14.965 14.255ZM5.255 9.755C5.255 12.245 7.26501 14.255 9.755 14.255C12.245 14.255 14.255 12.245 14.255 9.755C14.255 7.26501 12.245 5.255 9.755 5.255C7.26501 5.255 5.255 7.26501 5.255 9.755Z">
        </path>
      </g>
    </svg>
  </button>
</form>

<form action="" data-action="nonce_mailchimp_boilerplate_subscribe" id="boilerplateNewsletterForm" class="Form js-vb-newsletter-cta">
  <input type="email" name="email" placeholder="Email" class="Form__input" id="boilerplateNewsletterEmail" required="">
  <input type="hidden" name="newsletter" value="vb_dailyroundup">
  <input type="hidden" name="b_f67554569818c29c4c844d121_89d8059242" value="">
  <input type="hidden" id="nonce_mailchimp_boilerplate_subscribe" name="nonce_mailchimp_boilerplate_subscribe" value="baf2d776f7"><input type="hidden" name="_wp_http_referer"
    value="/ai/google-researchers-unveil-vlogger-an-ai-that-can-bring-still-photos-to-life/"> <button type="submit" class="Form__button Newsletter__sub-btn">Subscribe</button>
</form>

Text Content

GOOGLE RESEARCHERS UNVEIL ‘VLOGGER’, AN AI THAT CAN BRING STILL PHOTOS TO LIFE

Michael Nuñez @MichaelFNunez
March 18, 2024 6:00 AM

Image Credit: enriccorona.github.io/vlogger

Google researchers have developed a new artificial intelligence system that can
generate lifelike videos of people speaking, gesturing and moving — from just a
single still photo. The technology, called VLOGGER, relies on advanced machine
learning models to synthesize startlingly realistic footage, opening up a range
of potential applications while also raising concerns around deepfakes and
misinformation.

Described in a research paper titled “VLOGGER: Multimodal Diffusion for Embodied
Avatar Synthesis,” the AI model can take a photo of a person and an audio clip
as input, and then output a video that matches the audio, showing the person
speaking the words and making corresponding facial expressions, head movements
and hand gestures. The videos are not perfect and contain some artifacts, but
they represent a significant leap in the ability to animate still images.

VLOGGER generates photorealistic videos of talking and gesturing avatars from a
single image. (Credit: enriccorona.github.io)


A BREAKTHROUGH IN SYNTHESIZING TALKING HEADS

The researchers, led by Enric Corona at Google Research, leveraged a type of
machine learning model called a diffusion model to achieve the novel result.
Diffusion models have recently shown remarkable performance at generating highly
realistic images from text descriptions. By extending them into the video domain
and training on a vast new dataset, the team was able to create an AI system
that can bring photos to life in a highly convincing way.

“In contrast to previous work, our method does not require training for each
person, does not rely on face detection and cropping, generates the complete
image (not just the face or the lips), and considers a broad spectrum of
scenarios (e.g. visible torso or diverse subject identities) that are critical
to correctly synthesize humans who communicate,” the authors wrote.

A key enabler was the curation of a huge new dataset called MENTOR containing
over 800,000 diverse identities and 2,200 hours of video — an order of magnitude
larger than what was previously available. This allowed VLOGGER to learn to
generate videos of people with varied ethnicities, ages, clothing, poses and
surroundings without bias.


POTENTIAL APPLICATIONS AND SOCIETAL IMPLICATIONS 

The technology opens up a range of compelling use cases. The paper demonstrates
VLOGGER’s ability to automatically dub videos into other languages by simply
swapping out the audio track, to seamlessly edit and fill in missing frames in a
video, and to create full videos of a person from a single photo.

One could imagine actors being able to license detailed 3D models of themselves
that could be used to generate new performances. The technology could also be
used to create photorealistic avatars for virtual reality and gaming. And it
might enable the creation of AI-powered virtual assistants and chatbots that are
more engaging and expressive.

Google sees VLOGGER as a step toward “embodied conversational agents” that can
engage with humans naturally through speech, gestures and eye contact. “VLOGGER
can be used as a stand-alone solution for presentations, education, narration,
low-bandwidth online communication, and as an interface for text-only
human-computer interaction,” the authors wrote.

However, the technology also has the potential for misuse, for example in
creating deepfakes — synthetic media in which a person in a video is replaced
with someone else’s likeness. As these AI-generated videos become more realistic
and easier to create, they could exacerbate the challenges around misinformation
and digital fakery.


A NEW FRONTIER IN AI RESEARCH

While impressive, VLOGGER still has limitations. The generated videos are
relatively short and have a static background. The individuals don’t move around
a 3D environment. And their mannerisms and speech patterns, while realistic, are
not yet indistinguishable from those of real humans.

Nonetheless, VLOGGER represents a significant step forward. “We evaluate VLOGGER
on three different benchmarks and show that the proposed model surpasses other
state-of-the-art methods in image quality, identity preservation and temporal
consistency,” the authors reported.

With further advances, this type of AI-generated media is likely to become
ubiquitous. We may soon live in a world where it is hard to tell whether the
person speaking to us in a video is real or generated by a computer program. 

VLOGGER provides an early glimpse of that future. It is a powerful demonstration
of the rapid progress being made in artificial intelligence and a sign of the
increasing challenges we will face in distinguishing between what is real and
what is fake.

© 2024 VentureBeat. All rights reserved.