PETE WARDEN'S BLOG
EVER TRIED. EVER FAILED. NO MATTER. TRY AGAIN. FAIL AGAIN. FAIL BETTER.

WHY WE'RE BUILDING AN OPEN-SOURCE UNIVERSAL TRANSLATOR
October 16, 2023 By Pete Warden in Uncategorized 4 Comments

We all grew up with TV shows, books, and movies that assume everybody can understand each other when they speak, even if they're aliens. There are various in-universe explanations for this convenient feature, and most of them involve a technological solution. Today, the Google Translate app is the closest thing we have to this kind of universal translator, but the experience isn't good enough to be used everywhere it could be useful. I've often found myself bending over a phone with someone, both of us staring at the screen to see the text, and switching back and forth between email or another app to share information.

Science fiction translators are effortless. You can walk up to someone and talk normally, and they understand what you're saying as soon as you speak. There's no setup and no latency; it's just like any other conversation. So how can we get there from here?

One of the most common answers is a wearable earpiece. This is in line with the Hitchhiker's Babel Fish, but there are still massive technical obstacles to fitting the processing required into such a small device, and even offloading compute to a phone would require a lot of radio and battery usage. These barriers mean we'll have to wait for hardware innovations like Syntiant's to go through a few more generations before we can create this sort of dream device.

Instead, we're building a small, unconnected box with a built-in display that can automatically translate between dozens of different languages. You can see it in the video above, and we've got working demos to share if you're interested in trying it out. The form factor means it can be left in place on a hotel front desk, brought to a meeting, placed in front of a TV, or anywhere else you need continuous translation. The people we've shown this to have already asked to take them home for visiting relatives, colleagues, or themselves when traveling.

You can get audio out using a speaker or earpiece, but you'll also see real-time closed captions of the conversation in the language of your choice. The display means it's easy to talk naturally, with eye contact, and be aware of the whole context. Because it's a single-purpose device, it's less complicated to use than a phone app, and it doesn't need a network connection, so there are no accounts or setup involved; it starts working as soon as you plug it in.

We're not the only ones heading in this direction, but what makes us different is that we've removed the need for any network access, and partly because of that we're able to run with much lower latency, while using the latest AI techniques to still achieve high accuracy. We're also big believers in open source, so we're building on top of work like Meta's NLLB and OpenAI's Whisper, and will be releasing the results under an open license. I strongly believe that language translation should be commoditized, making it into a common resource that a lot of stakeholders can contribute to, so I hope this will be a step in that direction. This is especially essential for low-resource languages, where giving communities the opportunity to be involved in digital preservation is vital for their future survival. Tech companies don't have a big profit incentive to support translation, so advancing the technology will have to rely on other groups for support.
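In outline, the loop the device runs is conceptually simple, even though the models doing the work are not. Here's a structural sketch; every function in it is a hypothetical stub standing in for a real component (a Whisper-style transcriber, an NLLB-style translator, and the audio and display plumbing), not our actual code:

```cpp
#include <string>
#include <vector>

// All of these are hypothetical stubs standing in for real components.
static std::vector<float> capture_audio_chunk() { return {}; }      // microphone input
static std::string transcribe(const std::vector<float>&) {          // Whisper-style STT
    return "";
}
static std::string translate(const std::string& text,
                             const std::string& /*target_lang*/) {  // NLLB-style MT
    return text;
}
static void show_caption(const std::string&) {}                     // built-in display

int main() {
    const std::string target_lang = "deu_Latn"; // NLLB-style language code
    for (;;) {
        // Everything happens on-device: no network, no accounts, no setup.
        std::string heard = transcribe(capture_audio_chunk());
        if (!heard.empty()) {
            show_caption(translate(heard, target_lang));
        }
    }
}
```

The engineering challenge is making each of those stages fast and accurate enough, on a small box, that the conversation still feels live.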
I'm also hoping that making translation widely available will lead manufacturers to include it in devices like TVs, kiosks, ticket machines, help desks, phone support, and any other products that could benefit from wider language support. It's not likely to replace the work of human translators (as anyone who's read auto-translated documentation can vouch for) but I do think that AI can have a big impact here, bringing people closer together.

If this sounds interesting, please consider supporting our crowdfunding campaign. One of the challenges we're facing is showing that there's real demand for something like this, so even subscribing to follow the updates helps demonstrate that it's something people want. If you have a commercial use case I'd love to hear from you too; we have a limited number of demo units we are loaning to the most compelling applications.

THE UNSTOPPABLE RISE OF DISPOSABLE ML FRAMEWORKS
October 15, 2023 By Pete Warden in Uncategorized 1 Comment

Photo by Steve Harwood

On Friday my long-time colleague Nat asked if we should try and expand our Useful Transformers library into something that could be suitable for a lot more use cases. We worked together on TensorFlow, as did the main author of UT, Manjunath, so he was surprised when I didn't want to head too far in a generic direction. As I was discussing it with him I realized how much my perspective on ML library design has changed since we started TensorFlow, and since I think by writing, I wanted to get my thoughts down in this post.

The GGML framework is just over a year old, but it has already changed the whole landscape of machine learning. Before GGML, an engineer wanting to run an existing ML model would start with a general-purpose framework like PyTorch, find a data file containing the model architecture and weights, and then figure out the right sequence of calls to load and execute it. Today it's much more likely that they will pick a model-specific code library like whisper.cpp or llama.cpp, based on GGML. This isn't the whole story though, because there are also popular model-specific libraries like llama2.cpp or llama.c that don't use GGML, so this movement clearly isn't based on the qualities of just one framework.

The best term I've been able to come up with to describe these libraries is "disposable". I know that might sound derogatory, but I don't mean it like that; I actually think it's the key to all their virtues! They've limited their scope to just a few models, focus on inference or fine-tuning rather than training from scratch, and overall try to do a few things very well. They're not designed to last forever; as models change they're likely to be replaced by newer versions. But they're very good at what they do.

By contrast, traditional frameworks like PyTorch or TensorFlow try to do many different things for a lot of different audiences. They are designed to be toolkits that can be reused for almost any possible model, for full training as well as deployment in production, scaling from laptops (or even, in TF's case, microcontrollers) to distributed clusters of hundreds of GPUs or TPUs. The idea is that you learn the fundamentals of the API, and then you can reuse that knowledge for years in many different circumstances. What I've seen firsthand with TensorFlow is how coping with such a wide range of requirements forces its code to become very complex and hard to understand.
The hope is always that the implementation details can be hidden behind an interface, so that people can use the system without becoming aware of the underlying complexity. In practice this is impossible to achieve, because latency and throughput are so important. The only reason to use ML frameworks instead of a NumPy Python script is to take advantage of hardware acceleration, since training and inference time need to be minimized for many projects to be achievable. If a model takes years to train, it's effectively untrainable. If a chatbot response takes days, why bother?

But details leak out from the abstraction layer as soon as an engineer needs to care about speed. Do all of my layers fit on a TPU? Am I using more memory than I have available on my GPU? Is there a layer in the middle of my network that's only implemented as a CPU operation, and so is causing massive latencies as data is copied to and from the accelerator? This is where the underlying complexity of the system comes back to bite us. There are so many levels of indirection involved that building a mental model of what code is executing, and where, is not practical. You can't even easily step through code in a debugger or analyze it using a profiler, because much of it executes asynchronously on an accelerator, goes through multiple compilation steps before running on a regular processor, or is dispatched to platform-specific libraries that may not even have source code available. This opaqueness makes it extremely hard for anyone outside of the core framework team to even identify performance problems, let alone propose fixes. And because every code path is used by so many different models and use cases, just verifying that any change doesn't cause a regression is a massive job.

By contrast, debugging and profiling issues with disposable frameworks is delightfully simple. There's a single big program that you can inspect to understand the overall flow, and then debug and profile using very standard tools. If you spot an issue, you can find and change the code easily yourself, and either keep it in your local copy or create a pull request after checking the limited number of use cases the framework supports.

Another big pain point for "big" frameworks is installation and dependency management. I was responsible for creating and maintaining the Raspberry Pi port of TensorFlow for a couple of years, and it was one of the hardest engineering jobs I've had in my career. It was so painful I eventually gave up, and nobody else was willing to take it on! Because TF supported so many different operations, platforms, and libraries, porting it and keeping it building on a non-x86 platform was a nightmare. There were constantly new layers and operations being added, many of which in turn relied on third-party code that also had to be ported. I groaned when I saw a new dependency appear in the build files, usually for something like an Amazon AWS input authentication pip package that didn't add much value for the Pi users, but still required me to figure out how to install it on a platform that was often unsupported by the authors.

The beauty of single-purpose frameworks is that they can include all of the dependencies they need, right in the source code. This makes them a dream to install, often only requiring a checkout and build, and makes porting them to different platforms much simpler.
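To make the contrast concrete, here's roughly what driving a disposable framework looks like. The sketch below is based on whisper.cpp's C API as it stood in late 2023 (whisper_init_from_file and friends from whisper.h); the API evolves, so treat the exact calls as illustrative, and the audio loading is omitted:

```cpp
// Sketch of transcribing a clip with whisper.cpp's C API (circa v1.4).
// Assumes pcmf32 holds mono 16kHz float samples; here it's just silence.
#include <cstdio>
#include <vector>
#include "whisper.h"

int main() {
    std::vector<float> pcmf32(16000 * 5, 0.0f); // 5 seconds of stand-in audio

    // One call to load the model: weights and architecture in a single GGML file.
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) return 1;

    // One call to run the whole pipeline: mel spectrogram, encoder, decoder.
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
        whisper_free(ctx);
        return 1;
    }

    // Results come back as plain text segments.
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```

One model file, one context, one call that runs the whole pipeline, and plain text out the other end. The entire mental model fits on a single screen, which is exactly what makes these libraries so easy to debug, profile, and port.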
This is not a new problem, and during my career at Google I saw a lot of domain- or model-specific libraries emerge internally as alternatives to using TensorFlow. These were often enthusiastically adopted by application engineers, because they were so much easier to work with. There was often a lot of tension about this with the infrastructure team, because while this approach helped ship products, there were fears about the future maintenance cost of supporting many different libraries. For example, adding support for new accelerators like TPUs would be much harder if it had to be done for a multitude of internal libraries rather than just one, and it increased the cost of switching to new models. Despite these valid concerns, I think disposable frameworks will only grow in importance. More people are starting to care about inference rather than training, and a handful of foundation models are beginning to dominate applications, so the value of using a framework that can handle anything but is great at nothing is shrinking.

One reason I'm so sure is that we've seen this movie before. I spent the first few years of my career working in games, writing rendering engines in the PlayStation 1 era. The industry standard was for every team to write their own renderer for each game, maybe copying and pasting some code from other titles but otherwise with little reuse. This made sense because the performance constraints were so tight. With only two megabytes of memory on a PS1 and a slow processor, every byte and cycle counted, so spending a lot of time jettisoning anything unnecessary and hand-optimizing the functions that mattered was a good use of programming time.

Every large studio had the same worries about maintaining such a large number of engines across all their games, and every few years they'd task an internal group with building a more generic renderer that could be reused by multiple titles. Inevitably these efforts failed. It was faster and more effective for engineers to write something specialized from scratch than it was to whittle down and modify a generic framework to do what they needed. Eventually a couple of large frameworks like Unity and Unreal came to dominate the industry, but it's still not unheard of for developers to write their own, and even getting this far took decades.

ML frameworks face the same challenges as game engines did in the '90s, with application developers given tight performance and memory constraints that are hard to hit using generic tools. If the past is any guide, we'll see repeated attempts to promote unified frameworks while real-world developers rely on less-generic but simpler libraries. Of course it's not a totally binary choice. For example, we're still planning on expanding Useful Transformers to support the LLM and translation models we're using for our AI in a Box, so we'll have some genericity, but the mid-2010s vision of "one framework to rule them all" is dead. It might be that PyTorch (which has clearly won the research market) becomes more like MATLAB, a place to prototype and create algorithms, which are then hand-converted to customized inference frameworks by experienced engineers rather than automated tools or compilers.

What makes me happiest is that the movement to disposable frameworks is clearly opening up the world of ML development to many more people. By removing the layers of indirection and dependencies, the underlying simplicity of machine learning becomes a lot clearer, and hopefully less intimidating.
I can't wait to see all of the amazing products this democratization of the technology produces!

REQUEST FOR SENSORS
October 3, 2023 By Pete Warden in Uncategorized 2 Comments

At Useful Sensors we're focused on building intelligent sensors, ones that use machine learning to take raw data and turn it into actionable insights. Sometimes I run across problems in my own life that don't need advanced algorithms or AI to solve, but are blocked by hardware limitations. A classic one is "Did I leave my garage door open?". A few months ago I even had to post to our street's mailing list to ask someone to check it while I was away, since I was anxious I'd left it open. Thankfully several of my great neighbors jumped in and confirmed it was closed, but relying on their patience isn't a long-term or scalable solution.

Sensors to help with this do exist, and have for a long time, so why are they still a niche product? For me, the holdbacks are difficult setup procedures and short battery lives. The tradeoff generally seems to be that to get a battery life measured in years, you need to use a specialized protocol like Zigbee, Thread, or Matter, which requires a hub, which adds to the setup time and the likelihood of having to troubleshoot issues. Wifi-enabled sensors like the Swann linked to above don't specify a battery life (the support team refuses to give an estimate further down on the page) but I've found similar devices last months, not years.

What I would love is a cell-data-connected sensor with zero accounts, apps, or setup, beyond maybe scanning a QR code to claim it. One of the reasons I'm a big fan of Blues is that their fixed-cost cell package could make a device like this possible, but I'm guessing it would still need to be comparatively large for the hardware and battery required, and comparatively costly too.

What all of the current solutions have in common is that they demand more of my time than I'm willing to give. I have plenty of frustrating issues to debug in my technical work; the last thing I want to do when I get home is deal with poorly documented setup workflows or change batteries more than once in a blue moon. I'm guessing that I'm not alone: every product I've seen that truly "just works" has had an order of magnitude more sales than a competitor that has even a bit of friction. I would happily pay a lot for a device that I could stick on a garage door and claim by scanning a code on my phone that took me to a web URL (no more terrible phone apps, please), and that then simply sent me a text if the door was open for more than ten minutes. My sense is that the problems that need to be solved are around power consumption, radio, and cost. These aren't areas I have expertise in, so I won't be attempting this challenge, but I hope someone out there will, and soon.

A similar application is medication detection. I'm old enough to have my own pill organizer (don't laugh too loud, it's coming for you eventually) but an accelerometer attached to a pill bottle could tell if I've picked it up, and so presumably taken a dose, on time, and I'd never again have to measure out my tablets into little plastic slots. Devices like these do exist, but the setup, cost, and power consumption challenges are even higher, so they're restricted to specialized use cases like clinical trials. It feels like we've been on the verge of being able to build products like this for decades, but so many systems need to work smoothly to make the experience seamless that nothing has taken off.
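To show how little of the difficulty is in the software, here's a sketch of essentially the entire garage-door application logic. The sensor read and sleep calls are hypothetical stubs, and the cellular send uses the request style of Blues' note-c library (treat the exact calls as illustrative; transport setup is omitted). Everything hard about this product lives in the power, radio, and cost columns, not here:

```cpp
#include "note.h"  // Blues note-c, for talking to a cellular Notecard

// Hypothetical hardware stubs; a real device would read a tilt switch or
// accelerometer over GPIO/I2C and drop into a low-power sleep between polls.
static bool door_is_open(void) { return false; }
static void sleep_seconds(int) {}

int main() {
    const int kPollSeconds  = 30;
    const int kAlertSeconds = 10 * 60; // text me after ten minutes open
    int open_seconds = 0;
    bool alerted = false;

    for (;;) {
        if (door_is_open()) {
            open_seconds += kPollSeconds;
            if (open_seconds >= kAlertSeconds && !alerted) {
                // Queue an outbound note; a Notehub route can turn it into an SMS.
                J *req = NoteNewRequest("note.add");
                JAddStringToObject(req, "file", "alerts.qo");
                J *body = JCreateObject();
                JAddStringToObject(body, "alert", "garage door open for 10+ minutes");
                JAddItemToObject(req, "body", body);
                NoteRequest(req);
                alerted = true;
            }
        } else {
            open_seconds = 0;
            alerted = false;
        }
        sleep_seconds(kPollSeconds);
    }
}
```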
I really hope that the stars will align soon and I'll be able to remove one or two little anxieties from my life!

A PERSONAL HISTORY OF ML QUANTIZATION
October 2, 2023 By Pete Warden in Uncategorized Leave a comment

Tomorrow I'll be giving a remote talk at the LBQNN workshop at ICCV. The topic is the history of quantization in machine learning, and while I don't feel qualified to give an authoritative account, I did think it might be interesting to cover the developments I was aware of. I don't know if the talk will be recorded, but here are the slides in case they are useful for reference. Apologies for any mistakes; please do let me know so I can improve the presentation.

WHY NVIDIA'S AI SUPREMACY IS ONLY TEMPORARY
September 10, 2023 By Pete Warden in Uncategorized 19 Comments

Nvidia is an amazing company that has executed a contrarian vision for decades, and has rightly become one of the most valuable corporations on the planet thanks to its central role in the AI revolution. I want to explain why I believe its top spot in machine learning is far from secure over the next few years. To do that, I'm going to talk about some of the drivers behind Nvidia's current dominance, and then how they will change in the future.

CURRENTLY

Here's why I think Nvidia is winning so hard right now.

#1 – Almost Nobody is Running Large ML Apps

Outside of a few large tech companies, very few corporations have advanced to actually running large-scale AI models in production. They're still figuring out how to get started with these new capabilities, so the main costs are around dataset collection, hardware for training, and salaries for model authors. This means that machine learning is focused on training, not inference.

#2 – All Nvidia Alternatives Suck

If you're a developer creating or using ML models, using an Nvidia GPU is a lot easier and less time-consuming than an AMD OpenCL card, a Google TPU, a Cerebras system, or any other hardware. The software stack is much more mature, there are many more examples, documentation, and other resources, finding engineers experienced with Nvidia is much easier, and integration with all of the major frameworks is better. There is no realistic way for a competitor to beat the platform effect Nvidia has built. It makes sense for the current market to be winner-takes-all, and they're the winner, full stop.

#3 – Researchers have the Purchasing Power

It's incredibly hard to hire ML researchers; anyone with experience has their pick of job offers right now. That means they need to be kept happy, and one of the things they demand is use of the Nvidia platform. It's what they know and they're productive with it; picking up an alternative would take time and not result in skills the job market values, whereas working on models with the tools they're comfortable with does. Because researchers are so expensive to hire and retain, their preferences are given a very high priority when purchasing hardware.

#4 – Training Latency Rules

As a rule of thumb, models need to be trainable from scratch in about a week. I've seen this hold true since the early days of AlexNet, because if the iteration cycle gets any longer it's very hard to do the empirical testing and prototyping that's still essential to reach your accuracy goals. As hardware gets faster, people build bigger models up until the point that the training once again takes roughly the same amount of time, and reap the benefits through higher-quality models rather than reduced total training time.
This makes buying the latest Nvidia GPUs very attractive, since your existing code will mostly just work, but faster. In theory there's an opportunity here for competitors to win with lower latency, but the inevitably poor state of their software stacks (CUDA has had decades of investment) means it's mostly an illusion.

WHAT'S GOING TO CHANGE?

So, hopefully I've made a convincing case that there are strong structural reasons behind Nvidia's success. Here's how I see those conditions changing over the next few years.

#1 – Inference will Dominate, not Training

Somebody years ago told me "Training costs scale with the number of researchers, inference costs scale with the number of users". What I took away from this is that there's some point in the future where the amount of compute any company is using for running models on user requests will exceed the cycles they're spending on training. Even if the cost of a single training run is massive and running inference is cheap, there are so many potential users in the world with so many different applications that the accumulated total of those inferences will exceed the training total. There are only ever going to be so many researchers.

What this means for hardware is that priorities will shift towards reducing inference costs. A lot of ML researchers see inference as a subset of training, but this is wrong in some fundamental ways. It's often very hard to assemble a sizable batch of inputs during inference, because batching trades off latency against throughput, and latency is almost always key in user-facing applications. Small or single-input batches change the workload dramatically, and call for very different optimization approaches. There are also a lot of things (like the weights) that remain constant during inference, and so can benefit from pre-processing techniques like weight compression or constant folding.

#2 – CPUs are Competitive for Inference

I didn't even list CPUs in the Nvidia alternatives above because they're still laughably slow for training. The main desktop CPUs (x86, Arm, and maybe RISC-V soon) have the benefit of many decades of toolchain investment. They have an even more mature set of development tools and a bigger community than Nvidia. They can also be much cheaper per arithmetic op than any GPU.

Old-timers will remember the early days of the internet, when most of the cost of setting up a dot-com was millions of dollars for a bunch of high-end web server hardware from someone like Sun. This was because they were the only realistic platform that could serve web pages reliably and with low latency. They had the fastest hardware money could buy, and that was important when entire sites needed to fit on a single machine. Sun's market share was rapidly eaten by the introduction of software that could distribute the work across a large number of individually much less capable machines: commodity x86 boxes that were far cheaper.

Training is currently very hard to distribute in a similar way. The workloads make it possible to split work across a few GPUs that are tightly interconnected, but the pattern of continuous updates makes reducing latency by sharding across low-end CPUs unrealistic. This is not true for inference though. The model weights are fixed and can easily be duplicated across a lot of machines at initialization time, so no communication is needed. This makes an army of commodity PCs very appealing for applications relying on ML inference.
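To give a flavor of the inference-only optimizations I mean, here's a toy sketch of weight pre-quantization: because the weights never change, they can be compressed to int8 once at load time, and every subsequent request is served with a cheap batch-of-one matrix-vector product on a plain CPU. (Illustrative only: real schemes use per-block scales, SIMD kernels, and smarter rounding.)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Row-major weight matrix quantized to int8 once, at load time.
// Weights are constant during inference, so this cost is paid exactly once.
struct QuantizedMatrix {
    int rows = 0, cols = 0;
    float scale = 1.0f;  // one scale for the whole matrix (a toy choice)
    std::vector<int8_t> data;
};

QuantizedMatrix quantize(const std::vector<float>& w, int rows, int cols) {
    QuantizedMatrix q{rows, cols};
    float max_abs = 1e-8f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    q.scale = max_abs / 127.0f;
    q.data.reserve(w.size());
    for (float v : w) q.data.push_back((int8_t) std::lround(v / q.scale));
    return q;
}

// Batch-1 matrix-vector product: the common case for user-facing inference,
// where requests arrive one at a time and latency matters more than throughput.
std::vector<float> matvec(const QuantizedMatrix& q, const std::vector<float>& x) {
    std::vector<float> y(q.rows, 0.0f);
    for (int r = 0; r < q.rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < q.cols; ++c) acc += q.data[r * q.cols + c] * x[c];
        y[r] = acc * q.scale;
    }
    return y;
}

int main() {
    std::vector<float> w = {0.5f, -1.0f, 0.25f, 2.0f}; // 2x2 toy "layer"
    QuantizedMatrix q = quantize(w, 2, 2);
    std::vector<float> y = matvec(q, {1.0f, 1.0f});
    printf("%f %f\n", y[0], y[1]); // approximately -0.5 and 2.25
}
```

Nothing like this is worth doing during training, when the weights change on every step; it's pure win once they're frozen.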
#3 – Deployment Engineers gain Power

As inference costs begin to dominate training, there will be a lot of pressure to reduce those costs. Researchers will no longer be the highest priority, so their preferences will carry less weight. They will be asked to do things that are less personally exciting in order to streamline production. There are also going to be a lot more people capable of training models coming into the workforce over the next few years, as the skills involved become more widely understood. This all means researchers' corporate power will shrink and the needs of the deployment team will be given higher priority.

#4 – Application Costs Rule

When inference dominates the overall AI budget, the hardware and workload requirements are very different. Researchers value the ability to quickly experiment, so they need flexibility to prototype new ideas. Applications usually change their models comparatively infrequently, and may use the same fundamental architecture for years once the researchers have come up with something that meets their needs. We may almost be heading towards a world where model authors use a specialized tool, like MATLAB is for mathematical algorithms, and then hand over the results to deployment engineers who will manually convert them into something more efficient for an application. This will make sense because any cost savings will be multiplied over a long period of time if the model architecture remains constant (even if the weights change).

WHAT DOES THIS MEAN FOR THE FUTURE?

If you believe my four predictions above, then it's hard to escape the conclusion that Nvidia's share of the overall AI market is going to drop. That market is going to grow massively, so I wouldn't be surprised if they continue to grow in absolute unit numbers, but I can't see how their current margins will be sustainable. I expect the winners of this shift will be traditional CPU platforms like x86 and Arm. Inference will need to be tightly integrated into traditional business logic to run end-user applications, so it's difficult to see how even hardware specialized for inference can live across a bus, with the latency involved. Instead I expect CPUs to gain much more tightly integrated machine learning support, first as co-processors and eventually as specialized instructions, like the evolution of floating-point support.

On a personal level, these beliefs drive my own research and startup focus. The impact of improving inference is going to be so high over the next few years, and it still feels neglected compared to training. There are signs that this is changing though. Communities like r/LocalLLaMA are mostly focused on improving inference, the success of GGML shows how much of an appetite there is for inference-focused frameworks, and the spread of a few general-purpose models increases the payoff of inference optimizations. One reason I'm so obsessed with the edge is that it's the closest environment to the army of commodity PCs that I think will run most cloud AI in the future. Even back in 2013 I originally wrote the Jetpac SDK to accelerate computer vision on a cluster of 100 m1.small AWS servers, since that was cheaper and faster than a GPU instance for running inference across millions of images. It was only afterwards that I realized what a good fit it was for mobile devices.

I'd love to hear your thoughts on whether inference is going to be as important as I'm predicting!
Let me know in the comments if you think I'm onto something, or if I should be stocking up on Nvidia stock.

ACCELERATING AI WITH THE RASPBERRY PI PICO'S DUAL CORES
July 29, 2023 By Pete Warden in Uncategorized Leave a comment

I've been a fan of the RP2040 chip powering the Pico since it was launched, and we're even using them in some upcoming products, but I'd never used one of its most intriguing features, the second core. It's not common to have two cores in a microcontroller, especially a seventy-cent Cortex-M0+, and most of the system software for that level of CPU doesn't have standardized support for threads and other typical ways to get parallelized performance from your algorithms. I still wanted to see if I could get a performance boost on compute-intensive tasks like machine learning though, so I dug into the pico_multicore library, which provides low-level access to the second core.

The summary is that I was able to get approximately a 1.9x speed boost by breaking a convolution function into two halves and running one on each processor. The longer story is that I actually implemented most of this several months ago, but got stuck due to a silly mistake where I was accidentally serializing the work by calling functions in the wrong order! I was in the process of preparing a bug report for the RPi team, who had kindly agreed to take a look, when I realized my mistake. Another win for rubber-ducking!

If you're interested in the details, the implementation is in my custom version of an Arm CMSIS-NN source file. I actually ended up putting together an updated version of the whole TFLite Micro library for the Pico to take advantage of this. There's another long story behind that too. I did the first TFLM port for the Pico in my own time, and since nobody at Google or Raspberry Pi is actively working on it, it's remained stuck at that original version. I can't make the commitment to be a proper maintainer of this new version, it will be on a best-effort basis, so bugs and PRs may not be addressed, but I've at least tried to make it easier to update with a sync/sync_with_upstream.sh script that currently works and is designed to be as robust to future changes as I can make it.

If you want more information on the potential speedup, I've included some benchmarking results. The lines to compare are the CONV2D results. For example, the first convolution layer takes 46ms without the optimizations, and 24ms when run on both cores. There are other layers in the benchmark that aren't optimized, like depthwise convolution, but the overall time for running the person detection model once drops from 782ms to 599ms. This is already a nice boost, but in the future we could do something similar for the depthwise convolution to increase the speed even more.
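For the curious, here's a minimal sketch of the two-core pattern, using the Pico SDK's pico_multicore API (multicore_launch_core1 plus the inter-core FIFO). The buffer sum is just a stand-in for the convolution inner loop in the real CMSIS-NN code:

```cpp
#include <stdint.h>
#include <stdio.h>
#include "pico/stdlib.h"
#include "pico/multicore.h"

// Toy stand-in for the convolution work: sum a buffer, with core 1 taking
// the second half while core 0 takes the first.
#define N 1024
static int32_t input[N];

static void core1_entry(void) {
    for (;;) {
        multicore_fifo_pop_blocking(); // wait for the "go" signal from core 0
        int32_t sum = 0;
        for (int i = N / 2; i < N; ++i) sum += input[i];
        multicore_fifo_push_blocking((uint32_t) sum); // hand the result back
    }
}

int main(void) {
    stdio_init_all();
    for (int i = 0; i < N; ++i) input[i] = i;

    multicore_launch_core1(core1_entry);

    // Ordering matters: signal core 1 *before* doing our own half. Getting
    // these calls in the wrong order quietly serializes the two halves,
    // which is the kind of mistake described above.
    multicore_fifo_push_blocking(0);
    int32_t sum0 = 0;
    for (int i = 0; i < N / 2; ++i) sum0 += input[i];
    int32_t sum1 = (int32_t) multicore_fifo_pop_blocking();

    printf("total = %d\n", (int) (sum0 + sum1));
    for (;;) tight_loop_contents();
}
```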
Thanks to the Raspberry Pi team for building a lovely little chip! Everything from the PIOs to software overclocking and dual cores makes it a fascinating system to work with, and I look forward to diving in even deeper.

EXPLORE THE DARK SIDE OF SILICON VALLEY WITH RED TEAM BLUES
July 21, 2023 By Pete Warden in Uncategorized 1 Comment

It's weird to live in a place that so many people have heard of, but that so few people know. Silicon Valley is so full of charismatic people spinning whatever stories serve their ends that it's hard for voices with fewer ulterior motives to get airtime. Even the opponents of big tech have an incentive to mythologize it; it's the only way to break through the noise.

It's very rare to find someone with deep experience of our strange world who can paint a picture I recognize. That's a big reason I've always loved Cory Doctorow's writing. He knows the technology industry and the people who inhabit it inside and out, but he's not interested in either hagiography or demonization. He's always been able to pinpoint the little details that make this world simultaneously relatable and deeply weird, like this observation about wealth from his latest book:

> I almost named the figure, but I did not. My extended network of OG Silicon Valley types included paupers and billionaires, and long ago, we all figured out that the best way to stay on friendly terms was to keep the figures out of it.

Red Team Blues is a fast-paced crime novel in the best traditions of Hammett, but taking inspiration from the streets of 2020s San Francisco instead of the 1920s. His eye for detail adds authenticity, with his forensic-accountant protagonist relying more on social media carelessness than implausible hacking attempts to gather the information he needs. There's a thread of anger running through the story too, at the machinery of tax evasion that lies behind so many industry facades and contributes to the world of homelessness that is the mirror image of all the partying billionaires. He's unsparing in his assessment of cryptocurrencies, seeing their success as driven by money laundering for some of the worst people in the world.

I love having an accountant at the center of a thriller, and Cory's hero Martin Hench is a lot of fun to spend time with. The plot itself is a rollercoaster ride through cryptography, drug gangs, wildfire ghost towns, and ex-Soviet grifters, and it will keep you turning the pages. I highly recommend picking up a copy; it's enjoyable and thought-provoking at the same time. To give you one last taste, here's his perfect pen portrait of someone I've met a few too many times:

> I've known a lot of hustlers, aggro types who cut corners and bull their way through the consequences. It's a type, out here. Move fast and break things. Don't ask permission; beg forgiveness. But most of those people, they know they're doing it. You can manage them, tack around them, factor them into your plans.
>
> The ones who get high on their own supply, though? There's no factoring them in. Far as they're concerned, they're the only player characters in the game and everyone else is an NPC, a literal nobody.

HOW CAN AI HELP EVERYDAY LIFE?
July 19, 2023 By Pete Warden in Uncategorized 1 Comment

Video of an AI-controlled lamp

There's a lot of hype around AI these days, and it's easy to believe that it's just another tech-world fad like the Metaverse or crypto. I think that AI is different though, because the real-world impact doesn't require a leap of faith to imagine. For example, I've had a long-time dream of being able to look at a lamp, say "On", and have the light come on. I want to be able to just ask everyday objects for help and have them do something intelligent.

To make it easier to understand what I'm talking about, we've built a small box that understands when you're looking at it and can make sense of spoken language, and we've set it up to control a lamp. We've designed it to work as simply as possible:

* There's no wake word like "Alexa" or "Siri". You trigger the interaction by looking at the lamp, using a Person Sensor to detect that gaze.
* We don't require a set order of commands; we're able to pick out what you want from a stream of natural speech using our AI models.
* Everything is running locally on the controller box. This means that not only is all your data private (it never leaves your home), there's also no setup needed. You don't have to download an app, connect to wifi, or even create an account. Plug in the controller and lamp, and it Just Works.

All of this is only possible because of the new wave of transformer models that are sweeping the world. We're going to see a massive number of new capabilities like this enter our everyday lives, not in years but in months. If you're interested in how this kind of local, private intelligence (with no server costs!) could work with your products, I'd love to chat.

WHAT HAPPENS WHEN THE REAL YOUNG LADY'S ILLUSTRATED PRIMER LANDS IN CHINA?
May 15, 2023 By Pete Warden in Uncategorized 1 Comment

I love Brad DeLong's writing, but I did a double take when he recently commented "'A Young Lady's Illustrated Primer' continues to recede into the future". The Primer he's referencing is an electronic book from Neal Stephenson's novel The Diamond Age, an AI tutor designed to educate and empower children, answering their questions and shaping their characters with stories and challenges. It's a powerful and appealing idea in a lot of ways, and offers a very compelling use case for conversational machine learning models.

I also think that a workable version of it now exists. The recent advances with large language models have amazed me, and I do think we're now a lot closer to an AI companion that could be useful for people of any age. If you try entering "Tell me a story about a unicorn and a fairy" into ChatGPT you'll almost certainly get something more entertaining and coherent than most adults could come up with on the fly. The model comes across as a creative and engaging partner, and I'm certain that we'll be seeing systems aimed at children soon enough, for better or worse. It feels like a lot of the functionality of the Primer is already here, even if the curriculum and veracity of the responses are lacking.

One of the reasons I like The Diamond Age so much is that it doesn't just describe the Primer as a technology, it looks hard at its likely effects. Frederik Pohl wrote that "a good science fiction story should be able to predict not the automobile but the traffic jam", and Stephenson shows how subversive a technology that delivers information in this new way can be. The owners of the Primer grow up indoctrinated by its values and teachings, and eventually become a literal army. This is portrayed in a positive light, since most of those values are ones that a lot of Western-educated people would agree with, but it's also clear that Stephenson believes that the effects of a technology like this would be incredibly disruptive to the status quo.

How does this all relate back to ChatGPT? Try asking it "Tell me about Tiananmen Square" and you'll get a clear description of the 1989 government crackdown that killed hundreds or even thousands of protestors. So what, you might ask? We've been able to type the same query into Google or Wikipedia for decades to get uncensored information. What's different about ChatGPT?

My friend Drew Breunig recently wrote an excellent post breaking down how LLMs work, and one of his side notes is that they can be seen as an extreme form of lossy compression for all the data that they've seen during training. The magic of LLMs is that they've effectively shrunk a lot of the internet's text content into a representation that's a tiny fraction of the size of the original. A model like LLaMa might have been exposed to over a trillion words during training, but it fits into a 3.5GB file, easily small enough to run locally on a smartphone or Raspberry Pi.
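Some rough arithmetic shows how extreme that compression is; here's a back-of-envelope sketch with round numbers:

```cpp
#include <cstdio>

int main() {
    // Round numbers for a LLaMa-scale model: ~1 trillion training words,
    // ~6 bytes of raw text per English word (including the space), and a
    // ~3.5GB quantized weights file.
    const double words          = 1e12;
    const double bytes_per_word = 6.0;
    const double corpus_bytes   = words * bytes_per_word; // ~6 TB of text
    const double model_bytes    = 3.5e9;                  // ~3.5 GB of weights

    printf("compression ratio: ~%.0fx\n", corpus_bytes / model_bytes); // ~1714x
    printf("bits per word    : ~%.3f\n", model_bytes * 8.0 / words);   // ~0.028
}
```

No lossless scheme gets anywhere close to that ratio on text, which is exactly why the result has to be lossy.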
That means the "Tiananmen Square" question can be answered without having to send a network request. No cloud, wifi, or cell connection is needed!

If you're trying to control the flow of information in an authoritarian state like China, this is a problem. The Great Firewall has been reasonably effective at preventing ordinary citizens from accessing cloud-based services that might contradict CCP propaganda, because those services are physically located outside of the country, but monitoring apps that run entirely locally on phones is going to be a much tougher challenge. One approach would be to produce alternative LLMs that only include approved texts, but as the "large" in the name implies, training these models requires a lot of data. Labeling all that data would be a daunting technical project, and the end results are likely to be less useful overall than an uncensored version. You could also try to prevent unauthorized models from being downloaded, but because they're such useful tools they're likely to show up preloaded in everything from phones to laptops and fridges.

This local aspect of the current AI revolution isn't often appreciated, because many of the demonstrations show up as familiar text boxes on web pages, just like the cloud services we're used to. It starts to become a little clearer when you see how models like LLaMa and Stable Diffusion can be run locally as desktop apps, or even on Raspberry Pis, but these are currently pretty slow and clunky. What's going to happen over the next year or two is that the models will be optimized and start to match or even outstrip the speed of the web applications. The elimination of cloud bills for server processing, together with improved latency, will drive commercial providers towards purely edge solutions, and the flood of edge hardware accelerators will narrow the capability gap between a typical phone or embedded system and a GPU in a data center.

Simply put, people all over the world are going to be learning from their AI companions, as rudimentary as they currently are, and censoring information is going to be a lot harder when the whole process happens on the edge. Local LLMs are going to change politics all over the world, but especially in authoritarian states that try to keep strict controls on information flows. The Young Lady's Illustrated Primer is already here; it's just not evenly distributed yet.

NOTES FROM A BANK RUN
March 12, 2023 By Pete Warden in Uncategorized 1 Comment

Photo by Gopal Vijayaraghavan

My startup, Useful Sensors, has all of its money in Silicon Valley Bank. There are a lot of things I worried about as a CEO, but assessing SVB's creditworthiness wasn't one of them. It clearly should have been. I don't have any grand theories about what's happened over the last few days, but I wanted to share some of my experiences as someone directly affected.

To start with, Useful is not at risk of shutting down. The worst-case scenario, as far as I can tell, is that we only have access to the insured amount of $250k in our SVB account on Monday.
This will be plenty for payroll on Wednesday, and from what I've seen there are enough liquid assets that sales of the government bonds that triggered the whole process should be enough to return a good portion of the remaining balance within a week or so. If I need to, I'll dip into my personal savings to keep the lights on. I know this isn't true for many other startups though, so if they don't get full access to their funds there will be job losses and closures.

Although we're not going to close, it is very disruptive to our business. Making sure that our customers are delighted, and finding more of them, should be taking all of our attention. Instead I spent most of Thursday and Friday dealing with a rapidly changing set of recommendations from our investors, attempting to move money, and opening new accounts, and now I'm discovering the joys of the FDIC claims process. I'm trying to do all this while flying to Germany for Embedded World to announce a new distribution deal with OKdo, and this blog post is actually written from an airport lounge in Paris. Longer term, depending on the ultimate outcome, it may affect when we want to raise our next round. To be clear, we're actually in a great position compared to many others, I'm an old geezer with savings, but long-term planning at a startup is hard enough without extra challenges like this thrown in.

It has been great having access to investors and founders who are able to help us in practical ways. We would never have been able to open a new account so quickly without introductions to helpful staff at another bank. I've been glued to the private founder chat rooms where people have shared their experiences with things like the FDIC claims process and pending wires. This kind of rapid communication and sharing of information is what makes Silicon Valley such a good place to build a startup, and I'm very grateful for everyone's help.

Having said that, the Valley's ability to spread information and recommendations quickly was one of the biggest causes of SVB's demise. I've always been a bit of a rubbernecker at financial disasters, and I'd read enough books on the 2008 financial crisis to understand how bank runs happen. It was strange being in one myself though, because the logic of "everyone else is pulling their money, so you'd better too before it's all gone" is so powerful, even though I knew this mentality is a self-fulfilling prophecy. I planned what I hoped was a moderate course of action, withdrawing some of our funds from SVB to move to another institution to gain some diversification, but by the time I was able to set up the transfer it was too late.

Technology companies aren't the most sympathetic victims in the current climate, for many good reasons. I thought this story covered the political dimensions of the bank failure well. The summary is that many taxpayers hate the idea of bailing out startups, especially ones with millions in their bank accounts. There are a lot of reasons why I think we'll all benefit from not letting small businesses pay the price for bank executives messing up their risk management, but they're all pretty wonky and will be a hard sell. However, the alternative is a world where only the top two or three banks in the US get most of the deposits, because they're perceived as too big to fail. If no financial regulator spotted the dangers with SVB, how can you expect small business owners to vet banks themselves?
We'll all just end up going to Citibank or JPMorgan, which increases the overall systemic risk, as we saw in 2008.

Anyway, I just want to dedicate this to all of the founders having a tough weekend. Startups are all about dealing with risks, but this is a particularly frustrating problem to face because it's so unnecessary. I hope at least we'll learn more over the next few weeks about how executives and regulators let a US bank with $200 billion in assets get into such a sorry state.