
URL: https://venturebeat.com/ai/the-self-operating-computer-emerges/



THE ‘SELF-OPERATING’ COMPUTER EMERGES

Bryson Masse (@Bryson_M)
November 28, 2023 12:02 PM






Late nights with a newborn can lead to unexpected breakthroughs. Such was the
case for OthersideAI developer Josh Bickett, who had an idea for a
groundbreaking new “self-operating computer framework” while feeding his
daughter in the middle of the night.



As Bickett explained to VentureBeat, “I’ve been really enjoying time with my
daughter, who’s four weeks now old and I had a lot of new lessons in fatherhood
and all that stuff. But I also had a little bit of time, and this idea kind of
came to me because I saw different demos of GPT-4 vision. The thing we’re
working on now can actually happen with GPT-4 vision.”

With his daughter cradled in one arm, Bickett sketched out the basic framework
on his computer. “I just found an initial implementation…it’s not super good at
clicking the mouse in the right way. But what we’re doing is defining the
problem: we need to figure out how to operate a computer.”

When OthersideAI co-founder and CEO Matt Shumer saw the new framework, he
recognized its tremendous potential. As Shumer told VentureBeat, “This is a
milestone in the road to getting to the equivalent of a self-driving car but for
a computer. We have the sensors now. We have the LIDAR systems. Next, we build
the intelligence.”




AN AI THAT DECIDES WHERE AND WHAT TO CLICK ON YOUR PC

As Bickett described, the framework “lets the AI control both the mouse where it
clicks and all the keyboard triggers essentially. It’s like an agent like
autoGPT except it’s not text based. It’s vision based so it takes a screenshot
of the computer and then it decides mouse clicks and keyboards, exactly like a
person would.”

Shumer elaborated on how this framework represents a major advance over previous
approaches that relied solely on APIs.


“A lot of things that people do on computers, right, you can’t really do with
APIs, which is how a lot of other people are approaching this problem, [when]
they want to build an agent. They built it on top of the publicly available APIs
for this service, but that doesn’t extend to everything.” As Shumer asserted,
“If you truly want to solve something that is autonomous [and] can actually help
us or get more done. You have to allow it to work like a person because the
world is built for people.”

The framework takes screenshots as input and outputs mouse clicks and keyboard
commands, just as a human would. But as both Bickett and Shumer acknowledged,
the real potential lies not in the lightweight framework itself, but in the
advanced computer vision and reasoning models that can be plugged into it. “The
framework will just be like plug and play, you just plug in a better model and
it gets better,” said Bickett.
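
The screenshot-in, actions-out loop described above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the names `VisionModel`, `run_agent`, `Click`, `TypeText`, `Done` and `ScriptedModel` are hypothetical, and the screen capture and input functions are passed in as plain callables so that any model or input backend can be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Union


@dataclass
class Click:
    """Hypothetical action: click at screen coordinates (x, y)."""
    x: int
    y: int


@dataclass
class TypeText:
    """Hypothetical action: type a string on the keyboard."""
    text: str


@dataclass
class Done:
    """Hypothetical action: the model reports the objective is finished."""
    summary: str


Action = Union[Click, TypeText, Done]


class VisionModel(Protocol):
    """Anything that maps (objective, screenshot) to the next action."""

    def next_action(self, objective: str, screenshot: bytes) -> Action: ...


def run_agent(
    objective: str,
    model: VisionModel,
    capture: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: take a screenshot, ask the model, perform the action."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = model.next_action(objective, screenshot)
        history.append(action)
        if isinstance(action, Done):
            break  # the model decided the task is complete
        execute(action)
    return history


class ScriptedModel:
    """Stand-in model that replays a fixed script (an assumption for
    demonstration only; a real model would inspect the screenshot)."""

    def __init__(self, script: List[Action]) -> None:
        self._script = list(script)

    def next_action(self, objective: str, screenshot: bytes) -> Action:
        if self._script:
            return self._script.pop(0)
        return Done("script exhausted")


if __name__ == "__main__":
    performed: List[Action] = []
    model = ScriptedModel([Click(10, 20), TypeText("hello"), Done("ok")])
    run_agent("demo", model, capture=lambda: b"", execute=performed.append)
    print(performed)
```

Because `run_agent` depends only on the `next_action` method, replacing `ScriptedModel` with a stronger vision model would not require changing the loop, which is the "plug and play" property Bickett describes.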

HOW AI AGENTS WILL CHANGE COMPUTING AS WE KNOW IT

When asked by VentureBeat about the future implications, Shumer painted a bold
vision: “Once this thing is sufficiently reliable, it is going to be your
computer, it is going to be your interface to the digital world.” 

With the self-operating computer framework in place, advanced AI models could
learn to take over all computer interactions just through conversational
commands.


As Shumer predicted, different types of specialized computer agent models will
likely emerge to handle different tasks. Some may focus on speed for simpler
tasks, while others excel at complex
reasoning. Models may also vary for enterprise vs. consumer use cases. But the
overarching goal, according to Shumer, is to develop agents that enable a world
“where people can say, this is what I hate doing. Now, I don’t have to do it
anymore. And we want to make it so damn easy that somebody who can barely use a
computer from the beginning can do it.”


OPEN SOURCE TO FUEL DEVELOPMENT

Bickett believes the open source nature of the framework will further accelerate
progress, allowing developers worldwide to experiment with new applications.
Shumer agreed there is “room for a lot of players in this space…a range of model
providers. A range of applications. And there are going to be a lot of spaces in
this industry to build really really big businesses.”

While Bickett and Shumer see enormous potential, realizing the vision of truly
intelligent computer agents will require immense resources and continued
innovation. 


To that end, AI research company Imbue, formerly known as Generally Intelligent,
recently secured a $150 million partnership with Dell to build a powerful AI
training platform.

The massive cluster of around 10,000 Nvidia H100 GPUs will allow Imbue to
develop new foundation models optimized specifically for reasoning abilities, a
key focus of their work. As Imbue co-founder and CEO Kanjun Qiu noted,
“reasoning is the core blocker to agents that work really well.”

Imbue believes robust reasoning is paramount for developing truly effective AI
agents, as it allows machines to handle uncertainty, adapt approaches, gather
new information, make complex decisions, and grapple with real-world
complexities – abilities crucial for functioning autonomously beyond narrow
tasks. 


The company adopts a “full stack” methodology encompassing optimized foundation
model training, experimental agent and interface prototyping, robust
tool-building, and theoretical AI research – aiming to advance both the
practical and fundamental understanding of deep learning with the goal of
engineering AI capable of human-level reasoning and eventual artificial general
intelligence.

While the self-operating computer framework is just the first step, Bickett and
Shumer see it ushering in a new era where sophisticated AI agents replace human
computing interfaces entirely. Late nights may keep yielding paradigm-shifting
ideas, but it will take focused work to realize the full vision of computers
that just work – for anyone, anywhere – through ordinary language alone.




