PETE WARDEN'S BLOG


EVER TRIED. EVER FAILED. NO MATTER. TRY AGAIN. FAIL AGAIN. FAIL BETTER.



UNDERSTANDING THE RASPBERRY PI PICO’S MEMORY LAYOUT

January 16, 2024 By Pete Warden in Uncategorized Tags: embedded, pico,
programming 7 Comments

A few months ago I started updating TensorFlow Lite Micro for the Raspberry Pi
Pico board, which uses the RP2040 microcontroller. I ran into some baffling bugs
that stopped me making progress, but eventually I tracked them down to my poor
understanding of the memory layout. Since I had to do a deep dive, I wanted to
share what I learned here.



This diagram shows the physical address layout of the RP2040. I believe the
flash location can be board-specific, but on the Pico boards it begins at
0x10000000 and is two megabytes long. Where things get a bit more complex is the
RAM. The RP2040 has built-in SRAM, made up of four 64KB banks, followed by two
4KB banks. There isn’t much documentation I can find about the characteristics
of these banks, but from what I can gather different banks can be accessed at
the same time by the two Cortex M0 cores on the chip. I believe if the same bank
is accessed by both cores one of the cores will stall for at least a cycle while
the other is given access.
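
For reference, here is the address map as I understand it from the RP2040 datasheet, written out as C constants. Treat these as my notes rather than authoritative values, and double-check the datasheet for your own board; the names are my own, not SDK definitions.

    // RP2040 physical memory map, as I read it from the datasheet.
    #define FLASH_XIP_BASE   0x10000000u  // external flash, 2MB on the Pico boards
    #define SRAM_BASE        0x20000000u  // four 64KB banks, 256KB total
    #define SRAM_SCRATCH_X   0x20040000u  // first 4KB scratch bank
    #define SRAM_SCRATCH_Y   0x20041000u  // second 4KB scratch bank
    #define SRAM_END         0x20042000u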



The physical layout is fixed and controlled by the hardware, but the compiler and linker decide how the software is going to use the available address space. The default RAM layout is defined in src/rp2_common/pico_standard_link/memmap_default.ld in the Pico SDK, and I've used those values for the diagram above. To explain some of the labels: the vector table is a 256-byte array of function pointers for system routines, and usually sits at the start of RAM; .data is where all the global and static variables that start with a value are stored; .bss is the same, but for variables that don't need to be initialized; the heap is where malloc-ed memory comes from; and the two stacks hold local variables for functions.
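
If you want to check where the linker actually put everything in your own build, one quick way is to print the addresses of the symbols the linker script defines. Here's a minimal sketch; the symbol names are the ones I found in memmap_default.ld, so verify them against your SDK version before relying on this:

    // Prints the RAM layout chosen by the linker. Symbol names are taken from the
    // Pico SDK's memmap_default.ld and may differ in other SDK versions.
    #include <stdio.h>
    #include "pico/stdlib.h"

    extern char __data_start__, __data_end__;     // initialized globals and statics
    extern char __bss_start__, __bss_end__;       // zero-initialized globals and statics
    extern char end;                              // bottom of the heap (used by sbrk)
    extern char __StackLimit;                     // end of main SRAM, the heap's ceiling
    extern char __StackOneBottom, __StackOneTop;  // core 1 stack (scratch X by default)
    extern char __StackBottom, __StackTop;        // core 0 stack (scratch Y by default)

    int main(void) {
      stdio_init_all();
      printf(".data       : %p - %p\n", (void *)&__data_start__, (void *)&__data_end__);
      printf(".bss        : %p - %p\n", (void *)&__bss_start__, (void *)&__bss_end__);
      printf("heap        : %p - %p\n", (void *)&end, (void *)&__StackLimit);
      printf("core1 stack : %p - %p\n", (void *)&__StackOneBottom, (void *)&__StackOneTop);
      printf("core0 stack : %p - %p\n", (void *)&__StackBottom, (void *)&__StackTop);
      for (;;) {
        tight_loop_contents();
      }
    }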

There are a few things to be aware of here. There are two stacks, one for each of the Cortex M0 cores the RP2040 has. Unless your program explicitly starts the second core, only core 0 will be used, so the core 1 stack is often unused. The stacks are defined as 2KB in size, and they grow downwards in this diagram, starting with the highest address as the top of the stack and moving to smaller addresses as more items are added. For performance reasons, each core's stack is placed in a different bank, one of the smaller scratch X or Y areas, presumably so that local variables can be accessed independently by each core with no risk of stalls. One oddity is that each stack is 2KB but the scratch banks are 4KB each, so each stack only uses half of its bank.

The heap size is defined to be the remaining memory once all the other
fixed-size sections have been allocated. This means it stretches from the top of
.bss to the bottom of the core 1 stack. In theory there’s no mandated way for
areas to be allocated from this region when you call malloc(), but in practice
every implementation I’ve seen will begin allocating at the bottom (lowest
address) of the heap, and move upwards as more space is needed for further
allocations.
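
You can see this bottom-up behavior for yourself by printing the pointers that successive allocations return. This is just how newlib's allocator happens to behave in my experience, not something the C standard promises:

    // Quick check that allocations start low and move upwards as the heap grows.
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
      for (int i = 0; i < 4; ++i) {
        void *p = malloc(1024);
        printf("allocation %d: %p\n", i, p);  // addresses typically increase each time
      }
      return 0;
    }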

To recap, the stacks grow downwards from the highest addresses in memory, and the allocated parts of the heap grow upwards. This means that the area immediately below the stacks is unlikely to be used unless you're heavily allocating memory from the heap. The subtle consequence of this is that you will probably not observe incorrect behavior in most programs if you end up using more than 2KB of stack space. The memory at the top of the heap is unlikely to be used, so the stack can start stomping all over it without any apparent bugs surfacing, up until the point that it reaches part of the heap that has been allocated.
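
One way to catch this early is to occasionally check how much headroom core 0's stack has left, by comparing the address of a local variable against the stack's nominal bottom. This is a rough sketch, again using the symbol names I found in memmap_default.ld; adjust them if your script differs:

    // Rough check of core 0's stack headroom on the RP2040.
    #include <stdio.h>

    extern char __StackTop;     // highest address of core 0's stack
    extern char __StackBottom;  // nominal bottom of its 2KB region

    void report_stack_headroom(void) {
      char marker;  // lives in the current frame, so its address approximates the stack pointer
      long used = (long)(&__StackTop - &marker);
      long left = (long)(&marker - &__StackBottom);
      printf("core 0 stack: ~%ld bytes used, ~%ld bytes left\n", used, left);
      if (left < 0) {
        printf("WARNING: past the nominal 2KB limit, overwriting whatever sits below!\n");
      }
    }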

So, the nominal limit for stack size on the RP2040 is 2KB, but we can definitely
use 4KB (because that’s the size of the scratch bank), and in all likelihood
many programs will appear to work correctly even if they use a lot more. This is
important because most programs designed for non-embedded platforms assume that
the stack size is on the order of megabytes at least. Even some libraries aimed
at embedded systems assume at least tens of kilobytes of memory is available. In
this case, it was my baby, TensorFlow Lite Micro, that had these buried
assumptions.

My quest started when I saw a particular convolution test fail when I enabled my dual-core optimizations. After a lot of debugging, I realized that the test function was allocating several multi-kilobyte arrays as local variables on the stack. This blew past the 2KB nominal limit and the 4KB practical limit for the stack size, but didn't cause any visible problems because the heap was not heavily used. However, if you look at the RAM layout diagram above, you'll see that the core 1 stack is immediately below the core 0 stack. This means that a core 0 function that overflows its stack will start using memory reserved for the core 1 stack! This caused me a lot of confusion until I figured out what was going on, and I want to flag it as something to watch out for if anyone else is working on dual-core RP2040 optimizations. The result was weird race conditions where apparently random data would end up in the data arrays, depending on which core wrote to those locations first.

Thanks to the great community on the RPi forums I was able to come up with a simple solution for my immediate problem: putting the core 0 stack below the core 1 stack in the memmap_default.ld file (placing core 0 in scratch X, and core 1 in scratch Y). Since I controlled all the code running on core 1 I could ensure it wouldn't overflow its stack, whereas core 0 ran application code that I couldn't control. This allowed core 0's stack to overflow into the heap, but left core 1's stack untouched. I also learned a few helpful techniques from the forum thread, such as compiling with -fstack-usage to get the stack usage of each function, and the 'USE_STACK_GUARDS' macro that can check for overflows. I haven't figured out how to specify a custom .ld file in CMake yet, but I hope to add that in the future.

I hope this brain dump of what I learned about the RP2040’s memory layout and
the potential for silent stack overflows helps somebody else out there. It was
one of the most elusive bugs I’ve chased in quite a while, but it was very
satisfying to finally understand what was going on. One of the reasons that I
enjoy working on embedded platforms is that they are small enough systems that
it should be possible to figure out any unexpected behavior, but this one tested
my faith in that idea!


DOOM, DARK COMPUTE, AND AI

January 5, 2024 By Pete Warden in Uncategorized Tags: ai,
artificial-intelligence, computer, programming, technology Leave a comment

Back in 2020 Foone Turing caused a sensation when she showed Doom running on a
pregnancy test. For anyone who remembered desktop computers from the 90’s, it
was amazing to see a disposable device run something that used to take thousands
of dollars worth of hardware. It’s not a fluke either – calculators, ATMs,
fridges, and even keychains can run the game. What this shows is how much
computing power low-cost, everyday objects now have. If you’d told teenage me
that I could buy a 50 cent chip as powerful as my PC, my imagination would have
raced with all of the amazing things that people could build.

So why does the world of embedded devices feel so boring? We have orders of
magnitude more compute available than even a couple of decades ago, but no real
killer apps have emerged, outside of mobile and wearables. The truth is that
most compute is sitting idle most of the time. It’s like Dark Fibre after the
Dot Com Bubble. In both cases it made engineering sense to add the extra
capacity since the marginal cost was so low, even though the applications
weren’t yet there to use it. Dark Fibre eventually gave us streaming, video
calls, and the internet we know today. I think all of this Dark Compute in
embedded devices will lead to a wave of innovation too, once product designers
understand the possibilities.

How much Dark Compute is out there?

From Arm’s own data, there are 100 billion (or 1e14) Arm Cortex M chips out in
the world. Even if we assume most of those are the cheapest M0 class running at
100MHz, this translates to 100 million (or 1e8) integer arithmetic ops per
second per CPU. This suggests that 1e22 integer ops per second could be executed
if they were all working at full capacity. Though this is not comparing apples
to apples, it is more than twice the number of FLOPs available through all the
world’s active GPUs and TPUs. I’ll explain why comparing float and integer
operations is interesting below, but the headline is that the embedded world
contains a massive amount of computing power.
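
Here's that back-of-the-envelope arithmetic written out, so the orders of magnitude are easy to check or adjust with your own assumptions:

    // Back-of-the-envelope estimate of embedded "Dark Compute".
    #include <stdio.h>

    int main(void) {
      double chips = 1e11;           // ~100 billion Cortex M chips shipped
      double ops_per_second = 1e8;   // ~100MHz M0-class core, roughly one integer op per cycle
      double total = chips * ops_per_second;  // ~1e19 integer ops/second at full capacity
      double idle_fraction = 0.9;             // my guess at how much of it sits unused
      printf("total: %.0e ops/s, dark: %.0e ops/s\n", total, total * idle_fraction);
      return 0;
    }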

Estimating how much is actually used is harder, but the vast majority of current
applications are for things like fans, appliances, or other devices that don’t
need much more than simple control logic. They’re using these over-powered chips
because once the price of a 32-bit MCU drops below fifty cents (or even ten
cents!) it’s cheaper overall to buy a system that is easy to program and well
supported, as the NRE costs start to dominate. My best guess is that ninety
percent of the time these processors are left idle. That still leaves us in the
1e19 range for the total amount of Dark Compute.

What can we use Dark Compute for?

AI!

You might have guessed where I’m going from the title, but we have an amazing
opportunity to turn all of this dead silicon into delightful experiences for
users. It’s now possible run speech recognition to offer voice interfaces on
everyday devices, local closed captions and translations for accessibility,
person sensing so your TV can pause when you get up to make a cup of tea, play
air drums, recognize gestures, brew coffee perfectly, or a hundred other
interface improvements, all using the same underlying machine learning
technology. In many cases, this doesn’t even need a hardware change, because the
systems already have Dark Compute lying idle. Even better, the quality of the AI
scales with the compute available, so as more modern chips are used the
capabilities of these interface features grow too. It also only needs 8-bit
operations to execute, so the comparisons between FLOPS and integer ops in
terms of computing capacity are valid.

There are plenty of challenges still to overcome, from battery usage limiting
compute, to including the right sensors and making the tools easy enough to use,
but I’m convinced we’re going to see a wave of incredible AI innovations once
the engineering community figures out how to effectively use all this idle
capacity. I’m working to make this happen with Useful Sensors, so please get in
touch if you’re interested too, and I’ll be at CES next week if anyone’s around.
Let’s move our compute away from the dark side!


WHY I LOVE MY CHEVY BOLT EV

December 28, 2023 By Pete Warden in Uncategorized 1 Comment

I got my driver's license at 17, on the third attempt, but I never owned a car in
the UK since I always biked or took public transport to work. When I was 25 I
moved to Los Angeles, so I had to become a car owner for the first time. I
wasn’t looking for anything fancy, and so I bought an extremely used 1989 Honda
Civic for $2,000 which I drove for years down the 405 on my 90-minute commute
from Simi Valley to Santa Monica, before eventually upgrading to the cheapest
new car I could find, a Ford Focus, once the Civic became impossible to fix. I
drove that across to Colorado and back multiple times before moving to San
Francisco and happily selling it. I would bike or Muni to my startup in SoMa,
and then once Jetpac was acquired by Google, I’d catch the company bus to
Mountain View most days. I was far from car-free; I used Joanne's vehicle to get
groceries or run other errands, but I was able to avoid driving for the bulk of
my travel.

All of this is to say that I am definitely not a Car Guy. My brother is, and I
admired the hard work he put into hand-restoring a Karmann Ghia, but cars have
always left me cold. Growing up I also paid far too much attention to terrifying
PSAs about the dangers of car crashes, so I’ve got a lasting fear of hurting
someone while I’m driving. Controlling a ton of metal speeding at 70MPH using
only our human senses still feels like a crazy idea, so I do my best to be very
alert, drive defensively, and generally err on the side of caution. I’ve been
the butt of “Driving Miss Daisy” jokes from friends, and I usually finish last
at go-kart outings, but (knock on wood) I’ve still got a clean record thirty
years after I first got my license.

That’s why I’m so surprised at how much I enjoy the Chevy Bolt I bought last
year. After I left Google it made sense to still have our startup offices in
Mountain View since many of the team were Xooglers too, but with no Google Bus
available I started to use Joanne’s car more, especially when I needed to go to
Stanford too. This became tricky because we needed transport during the day for the dogs, and while I tried Caltrain it just took too long and was awkward for me to get to and from the nearest stations. I resigned myself to the
inconvenience of owning a car again, and hoped I might at least find a plugin
hybrid, but when I started searching I was impressed at how many pure electric
vehicles were available. I didn’t want to support Musk (I’m genuinely worried
that he’s on a Tony Hsieh-esque tailspin) but even without Tesla there were a
lot of options. After researching online, it became clear that the Chevy Bolt EV
was a good fit for what I needed. It had over 200 miles of range, plenty for my
40 mile each-way commute, and had just gone through a major recall for battery
fires, which as an engineer ironically reassured me, since I expected there
would be a lot of scrutiny of all the safety systems after that! I wasn’t
expecting to pick a Chevrolet since I associate them more with trucks and cheap
rental cars, but the price, features, reviews, and reliability ratings ended up
convincing me.

I went in person to Stevens Creek Chevrolet to make the purchase. I'm not a fan
of the US’s weird government-protected dealership setup, and the process
included a bogus-but-obligatory $995 markup masquerading as a security device,
but the staff were pleasant enough and after a couple of hours I was able to
drive off the lot with a new Bolt EUV for around $35,000. One new experience was
having to set up all the required online accounts; even as a tech professional, I found this a bit daunting.

Since then I’ve driven almost 15,000 miles and for the first time in my life I
feel excited about owning a car. I actually like Chevrolet's software, which is not perfect but does function surprisingly well. I do rely heavily on Android
Auto though, so GM’s decision to ditch this for future models makes me nervous.
I’d never even owned a car with a reversing camera, so this alone was a big
upgrade. What really makes it shine are the safety features. Audio alerts for
oncoming cars when I’m reversing, even before they’re visible, driver assist for
emergency braking, visual alerts for blindspot lurkers, and a smart rearview
mirror that reduces glare all help me be a better driver.

I also love home charging. So far I've only used the Bolt for commuting and similar trips in the Bay Area, and we still have Joanne's old ICE VW Golf for longer journeys, so I've not had to use a public charger. Once we had the Level 2 charger installed (free of charge, as part of the purchase) I was able to get to 100% in just a few hours, so I leave it plugged in overnight and head out every morning with a full charge. I still have range anxiety about longer trips,
especially since the biggest drawback with the Bolt is that it doesn’t offer
very speedy charges on Level 3 stations, but my schedule has prevented me from
doing those anyway so it hasn’t been an issue so far.

I have to admit that the Bolt is a lot of fun to drive too. You just press the
pedal and it accelerates! This sounds simple, but even an automatic transmission
gas car now feels clunky to me, after the smoothness and responsiveness of an
electric motor. It steers nicely as well, though for some reason I have clipped
the curb with my back tire a couple of times, which is a bigger problem than you
might expect since there's no spare tire and any tire change requires a visit to a dealership. The dogs also love curling up on the heated seats.

Electric vehicles aren’t enough by themselves to solve our climate and
congestion issues, I would still love to have better public transport
infrastructure, but they are part of the solution. Since I’m on SF’s 100%
renewable energy electricity plan I do feel good about reducing my environmental
impact as well as minimizing our reliance on the horrific foreign regimes that
are propped up by oil exports. I'm also lucky that I have a garage that I can use to recharge my vehicle, which wouldn't have been possible in most of the places I've lived, and I'm glad that I could afford a new vehicle. Unfortunately you can't
buy the same model I purchased, since GM discontinued-then-recontinued the Bolt,
but what’s most impressed me is that many mainstream brands now offer excellent
electric-only cars. If you’re interested in an electric vehicle, but aren’t sure
you want to take the plunge, I hope this post will at least give you some
reassurance that the technology is now pretty mature. Take it from me, I don’t
easily get excited about a car, but my Bolt is one of the best purchases I’ve
ever made!


STANFORD’S HACKLAB COURSE

December 20, 2023 By Pete Warden in Uncategorized Leave a comment

As many of you know, I’m an old geezer working on a CS PhD at Stanford and part
of that involves me taking some classes. The requirements are involved, but this
quarter I ended up taking "Hack Lab: Introduction to Cybersecurity". I was
initially attracted to it because it focuses on the legal as well as the
technical side of security, knowledge which could have been useful earlier in my
career. I also noticed it was taught by Alex Stamos and Riana Pfefferkorn, two
academics with an amazing amount of experience between them, so I expected
they’d have a lot to share.

I’ve just finished the final work for the course, and while it was challenging
in surprising ways, I learned a lot, and had some fun too. I found the legal
questions the hardest because of how much the answers depended on what seem like
very subtle and arbitrary distinctions, like that between stored communications
and those being transmitted. As an engineer I know how much storage is involved
in any network and that even “at rest” data gets shuttled around behind the
scenes, but what impressed me was how hard lawyers and judges have worked to
match the practical rules with the intent of the lawmakers. Law isn't code; it's run by humans, not machines, which meant I had to put aside my pedantry about
technological definitions to understand the history of interpretations. I still
get confused between a warrant and a writ, but now I have a bit more empathy for
the lawyers in my life at least.

The other side of the course introduced the tools and techniques around security
and hacking through a series of practical workshops. I’ve never worked in this
area, so a lot of the material was new to me, but it was so well presented I
never felt out of my depth. The team had set up example servers and captured
sequences to demonstrate things like sniffing passwords from wifi, XSS attacks,
and much more. I know from my own experience how tough it can be to produce these kinds of guided tutorials: you have to anticipate all the ways students
can get confused and ensure there are guard rails in place, so I appreciate the
work Alex, Riana, and the TAs put into them all. I was also impressed by some of
the external teaching tools, like Security Shepherd, that were incorporated.

The course took a very broad view of cybersecurity, including cryptocurrency,
which finally got me to download a wallet for one exercise, breaking my years of
studiously ignoring the blockchain. I also now have Tor on my machine, and
understand a bit more about how that all works in case I ever need it. The
section on web fundamentals forced me to brush up on concepts like network
layers in the OSI model, and gave me experience using Wireshark and Burp to
understand network streams, which I may end up using next time I need to debug
an undocumented REST API. The lectures were top notch too, with a lot of real
world examples from Alex and Riana’s lives outside Stanford that brought depth
to the material. There was a lot of audience involvement too, and my proudest
moment was being able to answer what MtGOX originally stood for (Magic the
Gathering Online eXchange).

If you ever get the chance to take INTLPOL 268 (as it’s officially known) I’d
highly recommend it. A lot of the students were from the law school, and the
technical exercises are well designed to be do-able without previous experience
of the field, so it’s suitable for people from a wide range of backgrounds. It’s
covering an area that often falls between the gaps of existing academic
disciplines, but is crucial to understand whether you’re designing a computer
system or planning policy. Thanks to the whole team for a fantastic learning
experience, but especially my lab TA Danny Zhang for his patience as I attempted
to tackle legal questions with an engineering mindset.


LITTLE GOOGLES EVERYWHERE

November 30, 2023 By Pete Warden in Uncategorized 4 Comments

Imagine asking a box on a pillar at Home Depot “Where are the nails?” and
getting directions, your fridge responding with helpful advice when you say “Why
is the ice maker broken?”, or your car answering “How do I change the wiper
speed?”. I think of these kinds of voice assistants for everyday objects as
“Little Googles”, agents that are great at answering questions, but only in a
very specific domain. I want them in my life, but they don’t yet exist. If
they’re as useful as I think, why aren’t they already here, and why is now the
right time for them to succeed?


WHAT ARE “LITTLE GOOGLES?”

I’m a strong believer in Computers as Social Actors, the idea that people want
to interact with new technology as if it was another person. With that in mind,
I always aim to make user experiences as close to existing interactions as
possible to increase the likelihood of adoption. If you think about everyday
life, we often get information we need from a short conversation with someone
else, whether it’s a clerk at Home Depot, or your spouse who knows the car
controls better than you do. I believe that speech to text and LLMs are now
sufficiently advanced to allow a computer to answer 80% of these kinds of
informational queries, all through a voice interface.

The reason we ask people these kinds of questions rather than Googling on our
phones is that the other person has a lot of context and specialized knowledge
that isn’t present in a search engine. The clerk knows which store you’re in,
and how items are organized. Your spouse knows what car you’re driving, and has
learned the controls themselves. It’s just quicker and easier to ask somebody
right now! The idea of “Little Googles” is that we can build devices that offer
the same convenience as a human conversation, even when there’s nobody else
nearby.


WHY DON’T THEY EXIST ALREADY?

If this is such a good idea, why hasn’t anyone built these? There are a couple
of big reasons, one technical and the other financial. The first is that it used
to take hundreds of engineers years to build a reliable speech to text service.
Apple paid $200m to buy Siri in 2010, Alexa reportedly lost $10b in 2022, and I
know from my own experience that Google’s speech team was large, busy, and
deservedly well-paid. This meant that the technology to offer a voice interface
was only available to a few large companies, and they reserved it for their own
products, or other use cases that drove traffic directly to them. Speech to text
was only available if it served those companies’ purposes, which meant that
other potential customers like auto manufacturers or retail stores couldn’t use
it.

The big financial problem came from the requirement for servers. If you’re a
fridge manufacturer you only get paid once, when a consumer buys the appliance.
That fridge might have a useful lifetime of over a decade, so if you offered a
voice interface you’d need to pay for servers to process incoming audio for
years to come. Because most everyday objects aren’t supported by subscriptions
(despite BMW’s best efforts) the money to keep those servers running for an
indeterminate amount of time has to come from the initial purchase. The ongoing
costs associated with voice interfaces have been enough to deter almost anyone
who isn’t making immediate revenue from their use. 

Having to be connected also meant that the audio was sent to someone else’s data
center, with all the privacy issues involved, and required wifi availability,
which is an ongoing maintenance cost in any commercial environment and such a
pain for consumers to set up that less than half of “smart” appliances are ever
connected.


WHY IS NOW THE RIGHT TIME?

OpenAI’s release of Whisper changed everything for voice interfaces. Suddenly
anyone could download a speech to text model that performs well enough for most
use cases, and use it commercially with few strings attached. It shattered the
voice interface monopoly of the big tech companies, removing the technical
barrier.

The financial change was a bit more subtle. These models have become small
enough to fit in 40 megabytes and run on a $50 SoC. This means it’s starting to
be possible to run speech to text on the kinds of chips already found in many
cars and appliances, with no server or internet connection required. This
removes the ongoing costs from the equation: running a voice interface is now just something that needs to be part of the on-device compute budget, a one-time, non-recurring expense for the manufacturer.

Moving the voice interface code to the edge also removes the usability problems
and costs of requiring a network connection. You can imagine a Home Depot
product finder being a battery-powered box that is literally glued to a pillar
in the store. You’d just need somebody to periodically change the batteries and
plug in a new SD card as items are moved around. The fridge use case is even easier: you'd ship the equivalent of the user manual with the appliance and never update it (since the paper manual doesn't get any updates either).


NICE IDEA, BUT WHERE’S THE MONEY?

Voice interfaces have often seemed like a solution looking for a problem (see
Alexa’s $10b burn rate). What’s different now is that I’m talking to customers
with use cases that they believe will make them money immediately. Selling
appliance warranties is a big business, but call centers, truck rolls for
repairs, and returns can easily wipe out any profit. A technology that can be
shown to reduce all three would save a lot of money in a very direct way, so
there’s been strong interest in the kinds of “Talking user manuals” we’re
offering at Useful. Helping customers find what they need in a store is another
obvious moneymaker, since a good implementation will increase sales and consumer
satisfaction, so that’s been popular too.


WHAT’S NEXT?

It’s Steam Engine Time for this kind of technology. There are still a lot of
details to be sorted out, but it feels so obvious that it’s now possible and
that this would be a pleasant addition* to most people’s lives as well as
promising profit, that I can’t imagine something like this won’t happen. I’ll be
busy with the team at Useful trying to build some of the initial implementations
and prove that it isn’t such a crazy idea, so I’d love to hear from you if this
is something that resonates. I’d also like to see other implementations of
similar ideas, since I know I can’t be the only one seeing these trends.

(*) Terrifying AI-generated images of fridges with teeth notwithstanding.


STANFORD CS PHD COURSE CHOICES FOR WINTER 2024

November 28, 2023 By Pete Warden in Uncategorized 1 Comment

As you might know I’m working on my PhD at Stanford, and one of my favorite
parts is taking courses. For this second year I need to follow the new
foundation and breadth requirements which in practice means taking a course a
quarter, with each course chosen from one of four areas. For the fall quarter I
took Riana Pfefferkorn and Alex Stamos’ Hacklab: Introduction to Cybersecurity,
which I thoroughly enjoyed and learned a lot, especially about the legal side.
I’m especially thankful to Danny Zhang, my excellent lab RA who had a lot of
patience as I struggled with the difference between a search warrant and civil
sanctions!

That satisfied the “People and Society” section of the requirements, but means I
need to pick a course from one of the other three sections for the upcoming
winter quarter. You might think this would be simple, but as any Googler can
tell you, the more technically advanced the organization the worse the internal
search tools are. The requirements page just has a bare list of course numbers,
with no descriptions or links, and the enrollment tool is so basic that you have
to put a space between the letters and the numbers of the course ID (“CS 243”
instead of “CS243”) before it can find them, so it’s not even just a case of
copying and pasting. To add to the complexity, many of the courses aren’t
offered this coming quarter, so just figuring out what my viable options are was
hard. I thought about writing a script to scrape the results, given a set of
course numbers, but decided to do it manually in the end.

This will be a *very* niche post, but since there are around 100 other second
year Stanford CS PhD students facing the same search problem, I thought I’d post
my notes here in case they’ll be helpful. I make no guarantees about the
accuracy of these results, I may well have fat-fingered some search terms, but
let me know if you spot a mistake. I’ve indexed all 2xx and 3xx level courses in
the first three breadth sections (since I already had the fourth covered), and I
didn’t check 1xx because they tend to be more foundational. For what it’s worth,
I’m most excited about CS 224N – Natural Language Processing with Deep Learning,
and hope I can get signed up once enrollment opens.

2xx/3xx Breadth Courses Available Winter 2024

 * CS 205L – Continuous Mathematical Methods with an Emphasis on Machine
   Learning, Tuesday/Thursday
 * CS 212 – Operating Systems and Systems Programming, Monday/Wednesday
 * CS 223A – Introduction to Robotics, Monday/Wednesday
 * CS 224N – Natural Language Processing with Deep Learning, Tuesday/Thursday
 * CS 228 – Probabilistic Graphical Models: Principles and Techniques,
   Tuesday/Thursday
 * CS 229 – Machine Learning, Monday/Wednesday
 * CS 233 – Geometric and Topological Data Analysis, Monday/Wednesday
 * CS 237B – Principles of Robot Autonomy II, Monday/Wednesday
 * CS 243 – Program Analysis and Optimizations, Monday/Wednesday
 * CS 254 – Computational Complexity, Monday/Wednesday
 * CS 255 – Introduction to Cryptography, Monday/Wednesday
 * CS 256 – Algorithmic Fairness, Tuesday/Thursday
 * CS 246 – Mining Massive Data Sets, Tuesday/Thursday
 * CS 249I – The Modern Internet, Monday/Wednesday
 * CS 348I – Computer Graphics in the Era of AI, Tuesday/Thursday
 * EE 364A – Convex Optimization I, Tuesday/Thursday
 * CS 369O – Optimization Algorithms, Tuesday/Thursday

2xx/3xx Courses Not Offered for Winter 2024

 * CS 221
 * CS 227
 * CS 230
 * CS 231
 * CS 234
 * CS 236
 * CS 240
 * CS 242
 * CS 244
 * CS 245
 * CS 250
 * CS 251
 * CS 257
 * CS 258
 * CS 259
 * CS 261
 * CS 263
 * CS 265
 * CS 269
 * CS 271
 * CS 272
 * CS 273
 * CS 274
 * CS 279
 * CS 281
 * CS 316
 * CS 324
 * CS 326
 * CS 328
 * CS 329D
 * CS 329H
 * CS 329X
 * CS 330
 * CS 331
 * CS 332
 * CS 333
 * CS 334
 * CS 354
 * CS 355
 * CS 356
 * CS 358
 * CS 359
 * CS 371
 * CS 373
 * EE 282


WHY WE’RE BUILDING AN OPEN-SOURCE UNIVERSAL TRANSLATOR

October 16, 2023 By Pete Warden in Uncategorized 4 Comments

We all grew up with TV shows, books, and movies that assume everybody can
understand each other when they speak, even if they're aliens. There are various
in-universe explanations for this convenient feature, and most of them involve a
technological solution. Today, the Google Translate app is the closest thing we
have to this kind of universal translator, but the experience isn’t good enough
to be used everywhere it could be useful. I’ve often found myself bending over a
phone with someone, both of us staring at the screen to see the text, and
switching back and forth between email or another app to share information.

Science fiction translators are effortless. You can walk up to someone and talk
normally, and they understand what you’re saying as soon as you speak. There’s
no setup, no latency, it’s just like any other conversation. So how can we get
there from here?

One of the most common answers is a wearable earpiece. This is in line with
Hitchhikers’ Babel Fish, but there are still massive technical obstacles to
fitting the processing required into such a small device, even offloading
compute to a phone would require a lot of radio and battery usage. These
barriers mean we’ll have to wait for hardware innovations like Syntiant’s to go
through a few more generations before we can create this sort of dream device.

Instead, we’re building a small, unconnected box with a built-in display that
can automatically translate between dozens of different languages. You can see
it in the video above, and we’ve got working demos to share if you’re interested
in trying it out. The form factor means it can be left in-place on a hotel front
desk, brought to a meeting, placed in front of a TV, or anywhere you need
continuous translation. The people we’ve shown this to have already asked to
take them home for visiting relatives, colleagues, or themselves when traveling.

You can get audio out using a speaker or earpiece, but you’ll also see real-time
closed captions of the conversation in the language of your choice. The display
means it’s easy to talk naturally, with eye contact, and be aware of the whole
context. Because it’s a single-purpose device, it’s less complicated to use than
a phone app, and it doesn’t need a network connection, so there are no accounts
or setup involved, it starts working as soon as you plug it in.

We’re not the only ones heading in this direction, but what makes us different
is that we’ve removed the need for any network access, and partly because of
that we’re able to run with much lower latency, using the latest AI techniques
to still achieve high accuracy.

We’re also big believers in open source, so we’re building on top of work like
Meta’s NLLB and OpenAI’s Whisper and will be releasing the results under an open
license. I strongly believe that language translation should be commoditized,
making it into a common resource that a lot of stakeholders can contribute to,
so I hope this will be a step in that direction. This is especially essential
for low-resource languages, where giving communities the opportunity to be
involved in digital preservation is vital for their future survival. Tech
companies don’t have a big profit incentive to support translation, so advancing
the technology will have to rely on other groups for support.

I’m also hoping that making translation widely available will lead manufacturers
to include it in devices like TVs, kiosks, ticket machines, help desks, phone
support, and any other products that could benefit from wider language support.
It’s not likely to replace the work of human translators (as anyone who’s read
auto-translated documentation can vouch for) but I do think that AI can have a
big impact here, bringing people closer together.

If this sounds interesting, please consider supporting our crowdfunding
campaign. One of the challenges we’re facing is showing that there’s real demand
for something like this, so even subscribing to follow the updates helps
demonstrate that it’s something people want. If you have a commercial use case
I’d love to hear from you too, we have a limited number of demo units we are
loaning to the most compelling applications.


THE UNSTOPPABLE RISE OF DISPOSABLE ML FRAMEWORKS

October 15, 2023 By Pete Warden in Uncategorized 2 Comments
Photo by Steve Harwood

On Friday my long-time colleague Nat asked if we should try and expand our
Useful Transformers library into something that could be suitable for a lot more
use cases. We worked together on TensorFlow, as did the main author of UT,
Manjunath, so he was surprised when I didn’t want to head too far in a generic
direction. As I was discussing it with him I realized how much my perspective on ML library design has changed since we started TensorFlow, and since I think by writing, I wanted to get my thoughts down in this post.

The GGML framework is just over a year old, but it has already changed the whole
landscape of machine learning. Before GGML, an engineer wanting to run an
existing ML model would start with a general purpose framework like PyTorch,
find a data file containing the model architecture and weights, and then figure
out the right sequence of calls to load and execute it. Today it’s much more
likely that they will pick a model-specific code library like whisper.cpp or
llama.cpp, based on GGML.

This isn’t the whole story though, because there are also popular model-specific
libraries like llama2.c or llama.rs that don't use GGML, so this movement
clearly isn’t based on the qualities of just one framework. The best term I’ve
been able to come up with to describe these libraries is “disposable”. I know
that might sound derogatory, but I don’t mean it like that, I actually think
it’s the key to all their virtues! They’ve limited their scope to just a few
models, focus on inference or fine-tuning rather than training from scratch, and
overall try to do a few things very well. They’re not designed to last forever,
as models change they’re likely to be replaced by newer versions, but they’re
very good at what they do.

By contrast, traditional frameworks like PyTorch or TensorFlow try to do many
different things for a lot of different audiences. They are designed to be
toolkits that can be reused for almost any possible model, for full training as
well as deployment in production, scaling from laptops (or even in TF’s case
microcontrollers) to distributed clusters of hundreds of GPUs or TPUs. The idea
is that you learn the fundamentals of the API, and then you can reuse that
knowledge for years in many different circumstances.

What I’ve seen firsthand with TensorFlow is how coping with such a wide range of
requirements forces its code to become very complex and hard to understand. The
hope is always that the implementation details can be hidden behind an
interface, so that people can use the system without becoming aware of the
underlying complexity. In practice this is impossible to achieve, because
latency and throughput are so important. The only reason to use ML frameworks
instead of a NumPy Python script is to take advantage of hardware acceleration,
since training and inference time need to be minimized for many projects to be
achievable. If a model takes years to train, it’s effectively untrainable. If a
chatbot response takes days, why bother?

But details leak out from the abstraction layer as soon as an engineer needs to
care about speed. Do all of my layers fit on a TPU? Am I using more memory than
I have available on my GPU? Is there a layer in the middle of my network that’s
only implemented as a CPU operation, and so is causing massive latencies as data
is copied to and from the accelerator? This is where the underlying complexity
of the system comes back to bite us. There are so many levels of indirection
involved that building a mental model of what code is executing and where is not
practical. You can’t even easily step through code in a debugger or analyze it
using a profiler, because much of it executes asynchronously on an accelerator,
goes through multiple compilation steps before running on a regular processor,
or is dispatched to platform-specific libraries that may not even have source
code available. This opaqueness makes it extremely hard for anyone outside of
the core framework team to even identify performance problems, let alone propose
fixes. Because every code path is used by so many different models and use
cases, just verifying that any change doesn’t cause a regression is a massive
job.

By contrast, debugging and profiling issues with disposable frameworks is
delightfully simple. There’s a single big program that you can inspect to
understand the overall flow, and then debug and profile using very standard
tools. If you spot an issue, you can find and change the code easily yourself,
and either keep it on your local copy or create a pull request after checking
the limited number of use cases the framework supports.

Another big pain point for “big” frameworks is installation and dependency
management. I was responsible for creating and maintaining the Raspberry Pi port
of TensorFlow for a couple of years, and it was one of the hardest engineering
jobs I’ve had in my career. It was so painful I eventually gave up, and nobody
else was willing to take it on! Because TF supported so many different
operations, platforms, and libraries, porting and keeping it building on non-x86
platform was a nightmare. There were constantly new layers and operations being
added, many of which in turn relied on third party code that also had to be
ported. I groaned when I saw a new dependency appear in the build files, usually
for something like an Amazon AWS input authentication pip package that didn’t
add much value for the Pi users, but still required me to figure out how to
install it on a platform that was often unsupported by the authors.

The beauty of single-purpose frameworks is that they can include all of the
dependencies they need, right in the source code. This makes them a dream to
install, often only requiring a checkout and build, and makes porting them to
different platforms much simpler.
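
To make the contrast concrete, here's a toy sketch of that single-purpose style: one C file, weights baked in as constants, nothing to install beyond a compiler. It isn't any real model or library, just an illustration of why this shape of code is so easy to build, port, and reason about:

    // A toy "disposable" inference program: one file, made-up weights compiled in,
    // no dependencies beyond the C standard library.
    #include <stdio.h>

    static const float w1[2][3] = {{0.5f, -0.2f, 0.1f}, {0.3f, 0.8f, -0.5f}};
    static const float w2[3] = {1.0f, -1.0f, 0.5f};

    int main(void) {
      const float input[2] = {0.7f, -0.3f};
      float hidden[3];
      // First layer: tiny matrix multiply followed by a ReLU.
      for (int j = 0; j < 3; ++j) {
        float acc = 0.0f;
        for (int i = 0; i < 2; ++i) {
          acc += input[i] * w1[i][j];
        }
        hidden[j] = acc > 0.0f ? acc : 0.0f;
      }
      // Second layer: dot product down to a single output.
      float output = 0.0f;
      for (int j = 0; j < 3; ++j) {
        output += hidden[j] * w2[j];
      }
      printf("output: %f\n", output);
      return 0;
    }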

This is not a new problem, and during my career at Google I saw a lot of domain
or model-specific libraries emerge internally as alternatives to using
TensorFlow. These were often enthusiastically adopted by application engineers,
because they were so much easier to work with. There was often a lot of tension
about this with the infrastructure team, because while this approach helped ship
products, there were fears about the future maintenance cost of supporting many
different libraries. For example, adding support for new accelerators like TPUs
would be much harder if it had to be done for a multitude of internal libraries
rather than just one, and it increased the cost of switching to new models.

Despite these valid concerns, I think disposable frameworks will only grow in
importance. More people are starting to care about inference rather than
training, and a handful of foundation models are beginning to dominate
applications, so the value of using a framework that can handle anything but is
great at nothing is shrinking.

One reason I’m so sure is that we’ve seen this movie before. I spent the first
few years of my career working in games, writing rendering engines in the
Playstation 1 era. The industry standard was for every team to write their own
renderer for each game, maybe copying and pasting some code from other titles
but otherwise with little reuse. This made sense because the performance
constraints were so tight. With only two megabytes of memory on a PS1 and a slow
processor, every byte and cycle counted, so spending a lot of time jettisoning
anything unnecessary and hand-optimizing the functions that mattered was a good
use of programming time. Every large studio had the same worries about
maintaining such a large number of engines across all their games, and every few
years they’d task an internal group to build a more generic renderer that could
be reused by multiple titles. Inevitably these efforts failed. It was faster and
more effective for engineers to write something specialized from scratch than it
was to whittle down and modify a generic framework to do what they needed.

Eventually a couple of large frameworks like Unity and Unreal came to dominate
the industry, but it’s still not unheard of for developers to write their own,
and even getting this far took decades. ML frameworks face the same challenges
as game engines in the 90’s, with application developers given tight performance
and memory constraints that are hard to hit using generic tools. If the past is
any guide we’ll see repeated attempts to promote unified frameworks while
real-world developers rely on less-generic but simpler libraries.

Of course it’s not a totally binary choice. For example we’re still planning on
expanding Useful Transformers to support the LLM and translation models we’re
using for our AI in a Box, so we'll have some genericity, but the mid-2010s vision of "One framework to rule them all" is dead. It might be that PyTorch (which has clearly won the research market) becomes more like MATLAB, a place to
prototype and create algorithms, which are then hand-converted to customized
inference frameworks by experienced engineers rather than automated tools or
compilers.

What makes me happiest is that the movement to disposable frameworks is clearly
opening up the world of ML development to many more people. By removing the
layers of indirection and dependencies, the underlying simplicity of machine
learning becomes a lot clearer, and hopefully less intimidating. I can’t wait to
see all of the amazing products this democratization of the technology produces!


REQUEST FOR SENSORS

October 3, 2023 By Pete Warden in Uncategorized 2 Comments

At Useful Sensors we’re focused on building intelligent sensors, ones that use
machine learning to take raw data and turn it into actionable insights.
Sometimes I run across problems in my own life that don’t need advanced
algorithms or AI to solve, but are blocked by hardware limitations. A classic
one is “Did I leave my garage door open?”. A few months ago I even had to post
to our street’s mailing list to ask someone to check it while I was away, since
I was anxious I’d left it open. Thankfully several of my great neighbors jumped
in and confirmed it was closed, but relying on their patience isn’t a long term
or scalable solution.

Sensors to help with this do exist, and have for a long time, so why are they
still a niche product? For me, the holdbacks are difficult setup procedures and
short battery lives. The tradeoff generally seems to be that to get a battery
life measured in years, you need to use a specialized protocol like ZigBee,
Threads, or Matter, which requires a hub, which adds to the setup time and
likelihood of having to troubleshoot issues. Wifi-enabled sensors like the Swann
linked to above don’t specify a battery life (the support team refuses to give
an estimate further down on the page) but I’ve found similar devices last
months, not years. What I would love is a cell-data-connected sensor with zero
accounts, apps, or setup, beyond maybe scanning a QR code to claim it. One of
the reasons I’m a big fan of Blues is that their fixed-cost cell package could
make a device like this possible, but I’m guessing it would still need to be
comparatively large for the hardware and battery required, and comparatively
costly too.

What all of the current solutions have in common is that they demand more of my
time than I’m willing to give. I have plenty of frustrating issues to debug in
my technical work, the last thing I want to do when I get home is deal with
poorly documented setup workflows or change batteries more than once in a blue
moon. I’m guessing that I’m not alone, every product I’ve seen that truly “just
works” has had an order of magnitude more sales than a competitor that has even
a bit of friction.

I would happily pay a lot for a device that I could stick on a garage door, scan a code with my phone to reach a web URL where I could claim it (no more terrible phone apps, please), and that would then simply send me a text if the door was open for more than ten minutes. My sense is that the problems that need to be solved are
around power consumption, radio, and cost. These aren’t areas I have expertise
in, so I won’t be attempting this challenge, but I hope someone out there will,
and soon.

A similar application is medication detection. I’m old enough to have my own
pill organizer (don’t laugh too loud, it’s coming for you eventually) but an
accelerometer attached to a pill bottle could tell if I’ve picked it up, and so
presumably taken a dose, on time, and I’d never again have to measure out my
tablets into little plastic slots. Devices like these do exist, but the setup,
cost, and power consumption challenges are even higher, so they’re restricted to
specialized use cases like clinical trials.

It feels like we’ve been on the verge of being able to build products like this
for decades, but so many systems need to work smoothly to make the experience
seamless that nothing has taken off. I really hope that the stars will align
soon and I’ll be able to remove one or two little anxieties from my life!


A PERSONAL HISTORY OF ML QUANTIZATION

October 2, 2023 By Pete Warden in Uncategorized Leave a comment

Tomorrow I’ll be giving a remote talk at the LBQNN workshop at ICCV. The topic
is the history of quantization in machine learning, and while I don’t feel
qualified to give an authoritative account, I did think it might be interesting
to cover the developments I was aware of.

I don’t know if the talk will be recorded, but here are the slides in case they
are useful for reference. Apologies for any mistakes, please do let me know so I
can improve the presentation.

