
POCKET-SIZED HALLUCINATION ON DEMAND —


YOU CAN NOW RUN A GPT-3-LEVEL AI MODEL ON YOUR LAPTOP, PHONE, AND RASPBERRY PI


THANKS TO META LLAMA, AI TEXT MODELS MAY HAVE THEIR "STABLE DIFFUSION MOMENT."

Benj Edwards - 3/14/2023, 12:16 AM


Things are moving at lightning speed in AI Land. On Friday, a software developer
named Georgi Gerganov created a tool called "llama.cpp" that can run Meta's new
GPT-3-class AI large language model, LLaMA, locally on a Mac laptop. Soon
thereafter, people worked out how to run LLaMA on Windows as well. Then someone
showed it running on a Pixel 6 phone, and next came a Raspberry Pi (albeit
running very slowly).


FURTHER READING

Meta unveils a new large language model that can run on a single GPU [Updated]

If this keeps up, we may be looking at a pocket-sized ChatGPT competitor before
we know it.

But let's back up a minute, because we're not quite there yet. (At least not
today—as in literally today, March 13, 2023.) What will arrive next week, no
one knows.

Since ChatGPT launched, some people have been frustrated by the AI model's
built-in limits that prevent it from discussing topics that OpenAI has deemed
sensitive. Thus began the dream—in some quarters—of an open source large
language model (LLM) that anyone could run locally without censorship and
without paying API fees to OpenAI.

Open source solutions do exist (such as GPT-J), but they require a lot of GPU
RAM and storage space. Other open source alternatives could not boast
GPT-3-level performance on readily available consumer-level hardware.

Enter LLaMA, an LLM available in parameter sizes ranging from 7B to 65B (that's
"B" as in "billion parameters," which are floating point numbers stored in
matrices that represent what the model "knows"). LLaMA made a heady claim: that
its smaller-sized models could match OpenAI's GPT-3, the foundational model that
powers ChatGPT, in the quality and speed of its output. There was just one
problem: Meta released the LLaMA code as open source but held back the
"weights" (the trained "knowledge" stored in a neural network), offering them
to qualified researchers only.
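Since parameter count translates directly into memory, a quick back-of-the-envelope sketch shows why the smaller LLaMA variants matter on consumer hardware. The figures below are illustrative arithmetic, not benchmarks from the article:

```python
# Rough memory estimate for model weights alone (ignores activations,
# KV cache, and runtime overhead).
def weight_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """GiB needed to hold the weights at a given precision."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

for size in (7, 13, 33, 65):
    fp16 = weight_memory_gb(size, 2)    # 16-bit floats: 2 bytes/param
    q4 = weight_memory_gb(size, 0.5)    # 4-bit quantized: 0.5 bytes/param
    print(f"LLaMA {size}B: ~{fp16:.1f} GiB fp16, ~{q4:.1f} GiB 4-bit")
```

At 16 bits per parameter, even the 7B model needs roughly 13 GiB for its weights alone, which is why aggressive quantization is what brings it within reach of a laptop.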

FLYING AT THE SPEED OF LLAMA

Meta's restrictions on LLaMA didn't last long, because on March 2, someone
leaked the LLaMA weights on BitTorrent. Since then, there has been an explosion
of development surrounding LLaMA. Independent AI researcher Simon Willison has
compared this situation to the release of Stable Diffusion, an open source image
synthesis model that launched last August. Here's what he wrote in a post on his
blog:

> It feels to me like that Stable Diffusion moment back in August kick-started
> the entire new wave of interest in generative AI—which was then pushed into
> overdrive by the release of ChatGPT at the end of November.
> 
> That Stable Diffusion moment is happening again right now, for large language
> models—the technology behind ChatGPT itself. This morning I ran a GPT-3 class
> language model on my own personal laptop for the first time!
> 
> AI stuff was weird already. It’s about to get a whole lot weirder.

Typically, running GPT-3 requires several datacenter-class A100 GPUs (also, the
weights for GPT-3 are not public), but LLaMA made waves because it could run on
a single beefy consumer GPU. And now, with optimizations that reduce the model
size using a technique called quantization, LLaMA can run on an M1 Mac or a
lesser Nvidia consumer GPU (although "llama.cpp" only runs on CPU at the
moment—which is impressive and surprising in its own way).
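As a rough illustration of the idea behind that size reduction (and not llama.cpp's actual on-disk format), blockwise quantization scales a group of floating-point weights down to small integers that fit in 4 bits each, keeping one scale factor per block:

```python
# Toy sketch of blockwise 4-bit (absmax) quantization.
def quantize_4bit(block):
    """Map floats to integers in [-7, 7] using a per-block scale."""
    scale = max(abs(x) for x in block) / 7 or 1.0  # guard all-zero blocks
    return scale, [round(x / scale) for x in block]

def dequantize_4bit(scale, qblock):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in qblock]

weights = [0.12, -0.53, 0.91, -0.07]
scale, q = quantize_4bit(weights)
approx = dequantize_4bit(scale, q)
print(q)       # small integers, each storable in 4 bits
print(approx)  # close to, but not exactly, the original weights
```

The integers cost a quarter of the memory of 16-bit floats, at the price of a small per-weight rounding error bounded by the block's scale, which is the quality trade-off the article discusses.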

Things are moving so quickly that it's sometimes difficult to keep up with the
latest developments. (Regarding AI's rate of progress, a fellow AI reporter told
Ars, "It's like those videos of dogs where you upend a crate of tennis balls on
them. [They] don't know where to chase first and get lost in the confusion.")

For example, here's a list of notable LLaMA-related events based on a timeline
Willison laid out in a Hacker News comment:

 * February 24, 2023: Meta AI announces LLaMA.
 * March 2, 2023: Someone leaks the LLaMA models via BitTorrent.
 * March 10, 2023: Georgi Gerganov creates llama.cpp, which can run on an M1
   Mac.
 * March 11, 2023: Artem Andreenko runs LLaMA 7B (slowly) on a Raspberry Pi 4,
   4GB RAM, 10 sec/token.
 * March 12, 2023: LLaMA 7B running on NPX, a Node.js execution tool.
 * March 13, 2023: Someone gets llama.cpp running on a Pixel 6 phone, also very
   slowly.
 * March 13, 2023: Stanford releases Alpaca 7B, an instruction-tuned
   version of LLaMA 7B that "behaves similarly to OpenAI's
   text-davinci-003" but runs on much less powerful hardware.

After obtaining the LLaMA weights ourselves, we followed Willison's instructions
and got the 7B parameter version running on an M1 MacBook Air at a reasonable
speed. You call it as a script on the command line with a prompt, and LLaMA
does its best to complete it in a reasonable way.

A screenshot of LLaMA 7B in action on a MacBook Air running llama.cpp. (Benj
Edwards / Ars Technica)

There's still the question of how much the quantization affects the quality of
the output. In our tests, LLaMA 7B trimmed down to 4-bit quantization was very
impressive for running on a MacBook Air—but still not on par with what you might
expect from ChatGPT. It's entirely possible that better prompting techniques
might generate better results.

Also, optimizations and fine-tunings come quickly when everyone has their hands
on the code and the weights—even though LLaMA is still saddled with some fairly
restrictive terms of use. Stanford's release of Alpaca today shows that
fine-tuning (additional training with a specific goal in mind) can improve
performance, and it's still early days after LLaMA's release.


FURTHER READING

Get ready to meet the Chat GPT clones

As of this writing, running LLaMA on a Mac remains a fairly technical exercise.
You have to install Python and Xcode and be familiar with working on the command
line. Willison has good step-by-step instructions for anyone who would like to
attempt it. But that may soon change as developers continue to code away.

As for the implications of having this tech out in the wild—no one knows yet.
While some worry about AI's impact as a tool for spam and misinformation,
Willison says, "It’s not going to be un-invented, so I think our priority should
be figuring out the most constructive possible ways to use it."

Right now, our only guarantee is that things will change rapidly.


Benj Edwards Benj Edwards is an AI and Machine Learning Reporter for Ars
Technica. In his free time, he writes and records music, collects vintage
computers, and enjoys nature. He lives in Raleigh, NC.

PROMOTED COMMENTS

LetterRip

> They claimed similar performance to GPT-3 on a bunch of tasks in their paper,
> are you saying that's not reproducible?


Most tasks work great with a context size of 256 tokens. Simple Q&A type
questions.



> Or is it just that it's not as good in terms of context size?


Very simple problems like "what is the capital of X" or "what is the
difference between an X and a Y" can usually be answered in short contexts.
Any sort of useful programming problem or multiple interactions requires much
longer contexts, frequently 4K–8K tokens.



> Also what does complexity mean in this context?


The number of interacting "things" and their interactions: roughly, the number
of nouns/objects and verbs/actions, and how complexly they interact. If I ask
LLaMA to do a simple function of two variables, it has no problem. On more
complex ideas, it quickly fails.



> And what relation is there between context size and model total parameters,
> could you explain that further?


The smaller the model, the higher the loss per token, and the sooner it "caps
out" (no further reduction in token loss from a longer context). For bigger
models, the more context they have, the more accurately they can predict
future tokens.


https://github.com/BlinkDL/RWKV-LM/raw/main/RWKV-ctxlen.png

See the cap for RWKV of around 1.93 at 1.1K tokens for the 1-billion-parameter
model vs. 1.64 for the 16-billion-parameter model.
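The "loss per token" being compared here is the average negative log-probability a model assigns to each correct next token; lower means better prediction. A hypothetical sketch with made-up probabilities:

```python
import math

# Per-token loss (average cross-entropy) over the probabilities a model
# assigned to the true next tokens. The numbers below are illustrative.
def per_token_loss(probs):
    return sum(-math.log(p) for p in probs) / len(probs)

print(per_token_loss([0.1, 0.2, 0.15]))  # weaker model: higher loss
print(per_token_loss([0.3, 0.5, 0.4]))   # stronger model: lower loss
```

A model that "caps out" stops improving these probabilities as you hand it more preceding context, so its loss curve flattens earlier.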



> Sorry for the twenty questions, I'm just intensely curious about the subject
> and I've read a lot of your other comments about AI


Happy to answer questions on this.
March 14, 2023 at 12:53 am





