PETE WARDEN'S BLOG
EVER TRIED. EVER FAILED. NO MATTER. TRY AGAIN. FAIL AGAIN. FAIL BETTER.

WHY WE'RE BUILDING AN OPEN-SOURCE UNIVERSAL TRANSLATOR
October 16, 2023 By Pete Warden in Uncategorized 4 Comments

We all grew up with TV shows, books, and movies that assume everybody can understand each other when they speak, even if they're aliens. There are various in-universe explanations for this convenient feature, and most of them involve a technological solution. Today, the Google Translate app is the closest thing we have to this kind of universal translator, but the experience isn't good enough to be used everywhere it could be useful. I've often found myself bending over a phone with someone, both of us staring at the screen to see the text, and switching back and forth between email or another app to share information.

Science fiction translators are effortless. You can walk up to someone and talk normally, and they understand what you're saying as soon as you speak. There's no setup and no latency; it's just like any other conversation. So how can we get there from here?

One of the most common answers is a wearable earpiece. This is in line with the Hitchhiker's Babel Fish, but there are still massive technical obstacles to fitting the processing required into such a small device, and even offloading compute to a phone would require a lot of radio and battery usage. These barriers mean we'll have to wait for hardware innovations like Syntiant's to go through a few more generations before we can create this sort of dream device.

Instead, we're building a small, unconnected box with a built-in display that can automatically translate between dozens of different languages. You can see it in the video above, and we've got working demos to share if you're interested in trying it out. The form factor means it can be left in place on a hotel front desk, brought to a meeting, placed in front of a TV, or anywhere else you need continuous translation. The people we've shown this to have already asked to take them home for visiting relatives, colleagues, or themselves when traveling.

You can get audio out using a speaker or earpiece, but you'll also see real-time closed captions of the conversation in the language of your choice. The display means it's easy to talk naturally, with eye contact, and be aware of the whole context. Because it's a single-purpose device, it's less complicated to use than a phone app, and it doesn't need a network connection, so there are no accounts or setup involved; it starts working as soon as you plug it in.

We're not the only ones heading in this direction, but what makes us different is that we've removed the need for any network access, and partly because of that we're able to run with much lower latency, while using the latest AI techniques to still achieve high accuracy. We're also big believers in open source, so we're building on top of work like Meta's NLLB and OpenAI's Whisper, and will be releasing the results under an open license. I strongly believe that language translation should be commoditized, making it into a common resource that a lot of stakeholders can contribute to, so I hope this will be a step in that direction. This is especially essential for low-resource languages, where giving communities the opportunity to be involved in digital preservation is vital for their future survival. Tech companies don't have a big profit incentive to support translation, so advancing the technology will have to rely on other groups for support.
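In outline, the loop the device runs is conceptually simple, even though the models doing the work are not. Here's a structural sketch; every function in it is a hypothetical stub standing in for a real component (a Whisper-style transcriber, an NLLB-style translator, and the audio and display plumbing), not our actual code:

```cpp
#include <string>
#include <vector>

// All of these are hypothetical stubs standing in for real components.
static std::vector<float> capture_audio_chunk() { return {}; }      // microphone input
static std::string transcribe(const std::vector<float>&) {          // Whisper-style STT
    return "";
}
static std::string translate(const std::string& text,
                             const std::string& /*target_lang*/) {  // NLLB-style MT
    return text;
}
static void show_caption(const std::string&) {}                     // built-in display

int main() {
    const std::string target_lang = "deu_Latn"; // NLLB-style language code
    for (;;) {
        // Everything happens on-device: no network, no accounts, no setup.
        std::string heard = transcribe(capture_audio_chunk());
        if (!heard.empty()) {
            show_caption(translate(heard, target_lang));
        }
    }
}
```

The engineering challenge is making each of those stages fast and accurate enough, on a small box, that the conversation still feels live.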
I'm also hoping that making translation widely available will lead manufacturers to include it in devices like TVs, kiosks, ticket machines, help desks, phone support, and any other products that could benefit from wider language support. It's not likely to replace the work of human translators (as anyone who's read auto-translated documentation can vouch for) but I do think that AI can have a big impact here, bringing people closer together.

If this sounds interesting, please consider supporting our crowdfunding campaign. One of the challenges we're facing is showing that there's real demand for something like this, so even subscribing to follow the updates helps demonstrate that it's something people want. If you have a commercial use case I'd love to hear from you too; we have a limited number of demo units we are loaning to the most compelling applications.

THE UNSTOPPABLE RISE OF DISPOSABLE ML FRAMEWORKS
October 15, 2023 By Pete Warden in Uncategorized 1 Comment

Photo by Steve Harwood

On Friday my long-time colleague Nat asked if we should try and expand our Useful Transformers library into something that could be suitable for a lot more use cases. We worked together on TensorFlow, as did the main author of UT, Manjunath, so he was surprised when I didn't want to head too far in a generic direction. As I was discussing it with him I realized how much my perspective on ML library design has changed since we started TensorFlow, and since I think by writing, I wanted to get my thoughts down in this post.

The GGML framework is just over a year old, but it has already changed the whole landscape of machine learning. Before GGML, an engineer wanting to run an existing ML model would start with a general-purpose framework like PyTorch, find a data file containing the model architecture and weights, and then figure out the right sequence of calls to load and execute it. Today it's much more likely that they will pick a model-specific code library like whisper.cpp or llama.cpp, based on GGML. This isn't the whole story though, because there are also popular model-specific libraries like llama2.cpp or llama.c that don't use GGML, so this movement clearly isn't based on the qualities of just one framework.

The best term I've been able to come up with to describe these libraries is "disposable". I know that might sound derogatory, but I don't mean it like that; I actually think it's the key to all their virtues! They've limited their scope to just a few models, focus on inference or fine-tuning rather than training from scratch, and overall try to do a few things very well. They're not designed to last forever; as models change they're likely to be replaced by newer versions. But they're very good at what they do.

By contrast, traditional frameworks like PyTorch or TensorFlow try to do many different things for a lot of different audiences. They are designed to be toolkits that can be reused for almost any possible model, for full training as well as deployment in production, scaling from laptops (or even, in TF's case, microcontrollers) to distributed clusters of hundreds of GPUs or TPUs. The idea is that you learn the fundamentals of the API, and then you can reuse that knowledge for years in many different circumstances. What I've seen firsthand with TensorFlow is how coping with such a wide range of requirements forces its code to become very complex and hard to understand.
The hope is always that the implementation details can be hidden behind an interface, so that people can use the system without becoming aware of the underlying complexity. In practice this is impossible to achieve, because latency and throughput are so important. The only reason to use ML frameworks instead of a NumPy Python script is to take advantage of hardware acceleration, since training and inference time need to be minimized for many projects to be achievable. If a model takes years to train, it's effectively untrainable. If a chatbot response takes days, why bother?

But details leak out from the abstraction layer as soon as an engineer needs to care about speed. Do all of my layers fit on a TPU? Am I using more memory than I have available on my GPU? Is there a layer in the middle of my network that's only implemented as a CPU operation, and so is causing massive latencies as data is copied to and from the accelerator? This is where the underlying complexity of the system comes back to bite us. There are so many levels of indirection involved that building a mental model of what code is executing, and where, is not practical. You can't even easily step through code in a debugger or analyze it using a profiler, because much of it executes asynchronously on an accelerator, goes through multiple compilation steps before running on a regular processor, or is dispatched to platform-specific libraries that may not even have source code available. This opaqueness makes it extremely hard for anyone outside of the core framework team to even identify performance problems, let alone propose fixes. And because every code path is used by so many different models and use cases, just verifying that any change doesn't cause a regression is a massive job.

By contrast, debugging and profiling issues with disposable frameworks is delightfully simple. There's a single big program that you can inspect to understand the overall flow, and then debug and profile using very standard tools. If you spot an issue, you can find and change the code easily yourself, and either keep it in your local copy or create a pull request after checking the limited number of use cases the framework supports.

Another big pain point for "big" frameworks is installation and dependency management. I was responsible for creating and maintaining the Raspberry Pi port of TensorFlow for a couple of years, and it was one of the hardest engineering jobs I've had in my career. It was so painful I eventually gave up, and nobody else was willing to take it on! Because TF supported so many different operations, platforms, and libraries, porting it and keeping it building on a non-x86 platform was a nightmare. There were constantly new layers and operations being added, many of which in turn relied on third-party code that also had to be ported. I groaned when I saw a new dependency appear in the build files, usually for something like an Amazon AWS input authentication pip package that didn't add much value for the Pi users, but still required me to figure out how to install it on a platform that was often unsupported by the authors.

The beauty of single-purpose frameworks is that they can include all of the dependencies they need, right in the source code. This makes them a dream to install, often only requiring a checkout and build, and makes porting them to different platforms much simpler.
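To make the contrast concrete, here's roughly what driving a disposable framework looks like. The sketch below is based on whisper.cpp's C API as it stood in late 2023 (whisper_init_from_file and friends from whisper.h); the API evolves, so treat the exact calls as illustrative, and the audio loading is omitted:

```cpp
// Sketch of transcribing a clip with whisper.cpp's C API (circa v1.4).
// Assumes pcmf32 holds mono 16kHz float samples; here it's just silence.
#include <cstdio>
#include <vector>
#include "whisper.h"

int main() {
    std::vector<float> pcmf32(16000 * 5, 0.0f); // 5 seconds of stand-in audio

    // One call to load the model: weights and architecture in a single GGML file.
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) return 1;

    // One call to run the whole pipeline: mel spectrogram, encoder, decoder.
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
        whisper_free(ctx);
        return 1;
    }

    // Results come back as plain text segments.
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```

One model file, one context, one call that runs the whole pipeline, and plain text out the other end. The entire mental model fits on a single screen, which is exactly what makes these libraries so easy to debug, profile, and port.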
This is not a new problem, and during my career at Google I saw a lot of domain- or model-specific libraries emerge internally as alternatives to using TensorFlow. These were often enthusiastically adopted by application engineers, because they were so much easier to work with. There was often a lot of tension about this with the infrastructure team, because while this approach helped ship products, there were fears about the future maintenance cost of supporting many different libraries. For example, adding support for new accelerators like TPUs would be much harder if it had to be done for a multitude of internal libraries rather than just one, and it increased the cost of switching to new models. Despite these valid concerns, I think disposable frameworks will only grow in importance. More people are starting to care about inference rather than training, and a handful of foundation models are beginning to dominate applications, so the value of using a framework that can handle anything but is great at nothing is shrinking.

One reason I'm so sure is that we've seen this movie before. I spent the first few years of my career working in games, writing rendering engines in the PlayStation 1 era. The industry standard was for every team to write their own renderer for each game, maybe copying and pasting some code from other titles but otherwise with little reuse. This made sense because the performance constraints were so tight. With only two megabytes of memory on a PS1 and a slow processor, every byte and cycle counted, so spending a lot of time jettisoning anything unnecessary and hand-optimizing the functions that mattered was a good use of programming time.

Every large studio had the same worries about maintaining such a large number of engines across all their games, and every few years they'd task an internal group with building a more generic renderer that could be reused by multiple titles. Inevitably these efforts failed. It was faster and more effective for engineers to write something specialized from scratch than it was to whittle down and modify a generic framework to do what they needed. Eventually a couple of large frameworks like Unity and Unreal came to dominate the industry, but it's still not unheard of for developers to write their own, and even getting this far took decades.

ML frameworks face the same challenges as game engines did in the '90s, with application developers given tight performance and memory constraints that are hard to hit using generic tools. If the past is any guide, we'll see repeated attempts to promote unified frameworks while real-world developers rely on less-generic but simpler libraries. Of course it's not a totally binary choice. For example, we're still planning on expanding Useful Transformers to support the LLM and translation models we're using for our AI in a Box, so we'll have some genericity, but the mid-2010s vision of "one framework to rule them all" is dead. It might be that PyTorch (which has clearly won the research market) becomes more like MATLAB, a place to prototype and create algorithms, which are then hand-converted to customized inference frameworks by experienced engineers rather than automated tools or compilers.

What makes me happiest is that the movement to disposable frameworks is clearly opening up the world of ML development to many more people. By removing the layers of indirection and dependencies, the underlying simplicity of machine learning becomes a lot clearer, and hopefully less intimidating.
I can't wait to see all of the amazing products this democratization of the technology produces!

REQUEST FOR SENSORS
October 3, 2023 By Pete Warden in Uncategorized 2 Comments

At Useful Sensors we're focused on building intelligent sensors, ones that use machine learning to take raw data and turn it into actionable insights. Sometimes I run across problems in my own life that don't need advanced algorithms or AI to solve, but are blocked by hardware limitations. A classic one is "Did I leave my garage door open?". A few months ago I even had to post to our street's mailing list to ask someone to check it while I was away, since I was anxious I'd left it open. Thankfully several of my great neighbors jumped in and confirmed it was closed, but relying on their patience isn't a long-term or scalable solution.

Sensors to help with this do exist, and have for a long time, so why are they still a niche product? For me, the holdbacks are difficult setup procedures and short battery lives. The tradeoff generally seems to be that to get a battery life measured in years, you need to use a specialized protocol like Zigbee, Thread, or Matter, which requires a hub, which adds to the setup time and the likelihood of having to troubleshoot issues. Wifi-enabled sensors like the Swann linked to above don't specify a battery life (the support team refuses to give an estimate further down on the page) but I've found similar devices last months, not years.

What I would love is a cell-data-connected sensor with zero accounts, apps, or setup, beyond maybe scanning a QR code to claim it. One of the reasons I'm a big fan of Blues is that their fixed-cost cell package could make a device like this possible, but I'm guessing it would still need to be comparatively large for the hardware and battery required, and comparatively costly too.

What all of the current solutions have in common is that they demand more of my time than I'm willing to give. I have plenty of frustrating issues to debug in my technical work; the last thing I want to do when I get home is deal with poorly documented setup workflows or change batteries more than once in a blue moon. I'm guessing that I'm not alone: every product I've seen that truly "just works" has had an order of magnitude more sales than a competitor that has even a bit of friction. I would happily pay a lot for a device that I could stick on a garage door and claim by scanning a code on my phone that took me to a web URL (no more terrible phone apps, please), and that then simply sent me a text if the door was open for more than ten minutes. My sense is that the problems that need to be solved are around power consumption, radio, and cost. These aren't areas I have expertise in, so I won't be attempting this challenge, but I hope someone out there will, and soon.

A similar application is medication detection. I'm old enough to have my own pill organizer (don't laugh too loud, it's coming for you eventually) but an accelerometer attached to a pill bottle could tell if I've picked it up, and so presumably taken a dose, on time, and I'd never again have to measure out my tablets into little plastic slots. Devices like these do exist, but the setup, cost, and power consumption challenges are even higher, so they're restricted to specialized use cases like clinical trials. It feels like we've been on the verge of being able to build products like this for decades, but so many systems need to work smoothly to make the experience seamless that nothing has taken off.
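To show how little of the difficulty is in the software, here's a sketch of essentially the entire garage-door application logic. The sensor read and sleep calls are hypothetical stubs, and the cellular send uses the request style of Blues' note-c library (treat the exact calls as illustrative; transport setup is omitted). Everything hard about this product lives in the power, radio, and cost columns, not here:

```cpp
#include "note.h"  // Blues note-c, for talking to a cellular Notecard

// Hypothetical hardware stubs; a real device would read a tilt switch or
// accelerometer over GPIO/I2C and drop into a low-power sleep between polls.
static bool door_is_open(void) { return false; }
static void sleep_seconds(int) {}

int main() {
    const int kPollSeconds  = 30;
    const int kAlertSeconds = 10 * 60; // text me after ten minutes open
    int open_seconds = 0;
    bool alerted = false;

    for (;;) {
        if (door_is_open()) {
            open_seconds += kPollSeconds;
            if (open_seconds >= kAlertSeconds && !alerted) {
                // Queue an outbound note; a Notehub route can turn it into an SMS.
                J *req = NoteNewRequest("note.add");
                JAddStringToObject(req, "file", "alerts.qo");
                J *body = JCreateObject();
                JAddStringToObject(body, "alert", "garage door open for 10+ minutes");
                JAddItemToObject(req, "body", body);
                NoteRequest(req);
                alerted = true;
            }
        } else {
            open_seconds = 0;
            alerted = false;
        }
        sleep_seconds(kPollSeconds);
    }
}
```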
I really hope that the stars will align soon and I'll be able to remove one or two little anxieties from my life!

A PERSONAL HISTORY OF ML QUANTIZATION
October 2, 2023 By Pete Warden in Uncategorized Leave a comment

Tomorrow I'll be giving a remote talk at the LBQNN workshop at ICCV. The topic is the history of quantization in machine learning, and while I don't feel qualified to give an authoritative account, I did think it might be interesting to cover the developments I was aware of. I don't know if the talk will be recorded, but here are the slides in case they are useful for reference. Apologies for any mistakes; please do let me know so I can improve the presentation.

WHY NVIDIA'S AI SUPREMACY IS ONLY TEMPORARY
September 10, 2023 By Pete Warden in Uncategorized 19 Comments

Nvidia is an amazing company that has executed a contrarian vision for decades, and has rightly become one of the most valuable corporations on the planet thanks to its central role in the AI revolution. I want to explain why I believe its top spot in machine learning is far from secure over the next few years. To do that, I'm going to talk about some of the drivers behind Nvidia's current dominance, and then how they will change in the future.

CURRENTLY

Here's why I think Nvidia is winning so hard right now.

#1 – Almost Nobody is Running Large ML Apps

Outside of a few large tech companies, very few corporations have advanced to actually running large-scale AI models in production. They're still figuring out how to get started with these new capabilities, so the main costs are around dataset collection, hardware for training, and salaries for model authors. This means that machine learning is focused on training, not inference.

#2 – All Nvidia Alternatives Suck

If you're a developer creating or using ML models, using an Nvidia GPU is a lot easier and less time-consuming than an AMD OpenCL card, a Google TPU, a Cerebras system, or any other hardware. The software stack is much more mature, there are many more examples, documentation, and other resources, finding engineers experienced with Nvidia is much easier, and integration with all of the major frameworks is better. There is no realistic way for a competitor to beat the platform effect Nvidia has built. It makes sense for the current market to be winner-takes-all, and they're the winner, full stop.

#3 – Researchers have the Purchasing Power

It's incredibly hard to hire ML researchers; anyone with experience has their pick of job offers right now. That means they need to be kept happy, and one of the things they demand is use of the Nvidia platform. It's what they know and they're productive with it; picking up an alternative would take time and not result in skills the job market values, whereas working on models with the tools they're comfortable with does. Because researchers are so expensive to hire and retain, their preferences are given a very high priority when purchasing hardware.

#4 – Training Latency Rules

As a rule of thumb, models need to be trainable from scratch in about a week. I've seen this hold true since the early days of AlexNet, because if the iteration cycle gets any longer it's very hard to do the empirical testing and prototyping that's still essential to reach your accuracy goals. As hardware gets faster, people build bigger models up until the point that the training once again takes roughly the same amount of time, and reap the benefits through higher-quality models rather than reduced total training time.
This makes buying the latest Nvidia GPUs very attractive, since your existing code will mostly just work, but faster. In theory there's an opportunity here for competitors to win with lower latency, but the inevitably poor state of their software stacks (CUDA has had decades of investment) means it's mostly an illusion.

WHAT'S GOING TO CHANGE?

So, hopefully I've made a convincing case that there are strong structural reasons behind Nvidia's success. Here's how I see those conditions changing over the next few years.

#1 – Inference will Dominate, not Training

Somebody years ago told me "Training costs scale with the number of researchers, inference costs scale with the number of users". What I took away from this is that there's some point in the future where the amount of compute any company is using for running models on user requests will exceed the cycles they're spending on training. Even if the cost of a single training run is massive and running inference is cheap, there are so many potential users in the world with so many different applications that the accumulated total of those inferences will exceed the training total. There are only ever going to be so many researchers.

What this means for hardware is that priorities will shift towards reducing inference costs. A lot of ML researchers see inference as a subset of training, but this is wrong in some fundamental ways. It's often very hard to assemble a sizable batch of inputs during inference, because batching trades off latency against throughput, and latency is almost always key in user-facing applications. Small or single-input batches change the workload dramatically, and call for very different optimization approaches. There are also a lot of things (like the weights) that remain constant during inference, and so can benefit from pre-processing techniques like weight compression or constant folding.

#2 – CPUs are Competitive for Inference

I didn't even list CPUs in the Nvidia alternatives above because they're still laughably slow for training. The main desktop CPUs (x86, Arm, and maybe RISC-V soon) have the benefit of many decades of toolchain investment. They have an even more mature set of development tools and a bigger community than Nvidia. They can also be much cheaper per arithmetic op than any GPU.

Old-timers will remember the early days of the internet, when most of the cost of setting up a dot-com was millions of dollars for a bunch of high-end web server hardware from someone like Sun. This was because they were the only realistic platform that could serve web pages reliably and with low latency. They had the fastest hardware money could buy, and that was important when entire sites needed to fit on a single machine. Sun's market share was rapidly eaten by the introduction of software that could distribute the work across a large number of individually much less capable machines: commodity x86 boxes that were far cheaper.

Training is currently very hard to distribute in a similar way. The workloads make it possible to split work across a few GPUs that are tightly interconnected, but the pattern of continuous updates makes reducing latency by sharding across low-end CPUs unrealistic. This is not true for inference though. The model weights are fixed and can easily be duplicated across a lot of machines at initialization time, so no communication is needed. This makes an army of commodity PCs very appealing for applications relying on ML inference.
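To give a flavor of the inference-only optimizations I mean, here's a toy sketch of weight pre-quantization: because the weights never change, they can be compressed to int8 once at load time, and every subsequent request is served with a cheap batch-of-one matrix-vector product on a plain CPU. (Illustrative only: real schemes use per-block scales, SIMD kernels, and smarter rounding.)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Row-major weight matrix quantized to int8 once, at load time.
// Weights are constant during inference, so this cost is paid exactly once.
struct QuantizedMatrix {
    int rows = 0, cols = 0;
    float scale = 1.0f;  // one scale for the whole matrix (a toy choice)
    std::vector<int8_t> data;
};

QuantizedMatrix quantize(const std::vector<float>& w, int rows, int cols) {
    QuantizedMatrix q{rows, cols};
    float max_abs = 1e-8f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    q.scale = max_abs / 127.0f;
    q.data.reserve(w.size());
    for (float v : w) q.data.push_back((int8_t) std::lround(v / q.scale));
    return q;
}

// Batch-1 matrix-vector product: the common case for user-facing inference,
// where requests arrive one at a time and latency matters more than throughput.
std::vector<float> matvec(const QuantizedMatrix& q, const std::vector<float>& x) {
    std::vector<float> y(q.rows, 0.0f);
    for (int r = 0; r < q.rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < q.cols; ++c) acc += q.data[r * q.cols + c] * x[c];
        y[r] = acc * q.scale;
    }
    return y;
}

int main() {
    std::vector<float> w = {0.5f, -1.0f, 0.25f, 2.0f}; // 2x2 toy "layer"
    QuantizedMatrix q = quantize(w, 2, 2);
    std::vector<float> y = matvec(q, {1.0f, 1.0f});
    printf("%f %f\n", y[0], y[1]); // approximately -0.5 and 2.25
}
```

Nothing like this is worth doing during training, when the weights change on every step; it's pure win once they're frozen.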
#3 – Deployment Engineers gain Power

As inference costs begin to dominate training, there will be a lot of pressure to reduce those costs. Researchers will no longer be the highest priority, so their preferences will carry less weight. They will be asked to do things that are less personally exciting in order to streamline production. There are also going to be a lot more people capable of training models coming into the workforce over the next few years, as the skills involved become more widely understood. This all means researchers' corporate power will shrink and the needs of the deployment team will be given higher priority.

#4 – Application Costs Rule

When inference dominates the overall AI budget, the hardware and workload requirements are very different. Researchers value the ability to quickly experiment, so they need flexibility to prototype new ideas. Applications usually change their models comparatively infrequently, and may use the same fundamental architecture for years once the researchers have come up with something that meets their needs. We may almost be heading towards a world where model authors use a specialized tool, like MATLAB is for mathematical algorithms, and then hand over the results to deployment engineers who will manually convert them into something more efficient for an application. This will make sense because any cost savings will be multiplied over a long period of time if the model architecture remains constant (even if the weights change).

WHAT DOES THIS MEAN FOR THE FUTURE?

If you believe my four predictions above, then it's hard to escape the conclusion that Nvidia's share of the overall AI market is going to drop. That market is going to grow massively, so I wouldn't be surprised if they continue to grow in absolute unit numbers, but I can't see how their current margins will be sustainable. I expect the winners of this shift will be traditional CPU platforms like x86 and Arm. Inference will need to be tightly integrated into traditional business logic to run end-user applications, so it's difficult to see how even hardware specialized for inference can live across a bus, with the latency involved. Instead I expect CPUs to gain much more tightly integrated machine learning support, first as co-processors and eventually as specialized instructions, like the evolution of floating-point support.

On a personal level, these beliefs drive my own research and startup focus. The impact of improving inference is going to be so high over the next few years, and it still feels neglected compared to training. There are signs that this is changing though. Communities like r/LocalLLaMA are mostly focused on improving inference, the success of GGML shows how much of an appetite there is for inference-focused frameworks, and the spread of a few general-purpose models increases the payoff of inference optimizations. One reason I'm so obsessed with the edge is that it's the closest environment to the army of commodity PCs that I think will run most cloud AI in the future. Even back in 2013 I originally wrote the Jetpac SDK to accelerate computer vision on a cluster of 100 m1.small AWS servers, since that was cheaper and faster than a GPU instance for running inference across millions of images. It was only afterwards that I realized what a good fit it was for mobile devices.

I'd love to hear your thoughts on whether inference is going to be as important as I'm predicting!
Let me know in the comments if you think I'm onto something, or if I should be stocking up on Nvidia stock.

ACCELERATING AI WITH THE RASPBERRY PI PICO'S DUAL CORES
July 29, 2023 By Pete Warden in Uncategorized Leave a comment

I've been a fan of the RP2040 chip powering the Pico since it was launched, and we're even using them in some upcoming products, but I'd never used one of its most intriguing features, the second core. It's not common to have two cores in a microcontroller, especially a seventy-cent Cortex-M0+, and most of the system software for that level of CPU doesn't have standardized support for threads and other typical ways to get parallelized performance from your algorithms. I still wanted to see if I could get a performance boost on compute-intensive tasks like machine learning though, so I dug into the pico_multicore library, which provides low-level access to the second core.

The summary is that I was able to get approximately a 1.9x speed boost by breaking a convolution function into two halves and running one on each processor. The longer story is that I actually implemented most of this several months ago, but got stuck due to a silly mistake where I was accidentally serializing the work by calling functions in the wrong order! I was in the process of preparing a bug report for the RPi team, who had kindly agreed to take a look, when I realized my mistake. Another win for rubber-ducking!

If you're interested in the details, the implementation is in my custom version of an Arm CMSIS-NN source file. I actually ended up putting together an updated version of the whole TFLite Micro library for the Pico to take advantage of this. There's another long story behind that too. I did the first TFLM port for the Pico in my own time, and since nobody at Google or Raspberry Pi is actively working on it, it's remained stuck at that original version. I can't make the commitment to be a proper maintainer of this new version, it will be on a best-effort basis, so bugs and PRs may not be addressed, but I've at least tried to make it easier to update with a sync/sync_with_upstream.sh script that currently works and is designed to be as robust to future changes as I can make it.

If you want more information on the potential speedup, I've included some benchmarking results. The lines to compare are the CONV2D results. For example, the first convolution layer takes 46ms without the optimizations, and 24ms when run on both cores. There are other layers in the benchmark that aren't optimized, like depthwise convolution, but the overall time for running the person detection model once drops from 782ms to 599ms. This is already a nice boost, but in the future we could do something similar for the depthwise convolution to increase the speed even more.
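For the curious, here's a minimal sketch of the two-core pattern, using the Pico SDK's pico_multicore API (multicore_launch_core1 plus the inter-core FIFO). The buffer sum is just a stand-in for the convolution inner loop in the real CMSIS-NN code:

```cpp
#include <stdint.h>
#include <stdio.h>
#include "pico/stdlib.h"
#include "pico/multicore.h"

// Toy stand-in for the convolution work: sum a buffer, with core 1 taking
// the second half while core 0 takes the first.
#define N 1024
static int32_t input[N];

static void core1_entry(void) {
    for (;;) {
        multicore_fifo_pop_blocking(); // wait for the "go" signal from core 0
        int32_t sum = 0;
        for (int i = N / 2; i < N; ++i) sum += input[i];
        multicore_fifo_push_blocking((uint32_t) sum); // hand the result back
    }
}

int main(void) {
    stdio_init_all();
    for (int i = 0; i < N; ++i) input[i] = i;

    multicore_launch_core1(core1_entry);

    // Ordering matters: signal core 1 *before* doing our own half. Getting
    // these calls in the wrong order quietly serializes the two halves,
    // which is the kind of mistake described above.
    multicore_fifo_push_blocking(0);
    int32_t sum0 = 0;
    for (int i = 0; i < N / 2; ++i) sum0 += input[i];
    int32_t sum1 = (int32_t) multicore_fifo_pop_blocking();

    printf("total = %d\n", (int) (sum0 + sum1));
    for (;;) tight_loop_contents();
}
```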
Thanks to the Raspberry Pi team for building a lovely little chip! Everything from the PIOs to software overclocking and dual cores makes it a fascinating system to work with, and I look forward to diving in even deeper.

EXPLORE THE DARK SIDE OF SILICON VALLEY WITH RED TEAM BLUES
July 21, 2023 By Pete Warden in Uncategorized 1 Comment

It's weird to live in a place that so many people have heard of, but that so few people know. Silicon Valley is so full of charismatic people spinning whatever stories serve their ends that it's hard for voices with fewer ulterior motives to get airtime. Even the opponents of big tech have an incentive to mythologize it; it's the only way to break through the noise.

It's very rare to find someone with deep experience of our strange world who can paint a picture I recognize. That's a big reason I've always loved Cory Doctorow's writing. He knows the technology industry and the people who inhabit it inside and out, but he's not interested in either hagiography or demonization. He's always been able to pinpoint the little details that make this world simultaneously relatable and deeply weird, like this observation about wealth from his latest book:

> I almost named the figure, but I did not. My extended network of OG Silicon Valley types included paupers and billionaires, and long ago, we all figured out that the best way to stay on friendly terms was to keep the figures out of it.

Red Team Blues is a fast-paced crime novel in the best traditions of Hammett, but taking inspiration from the streets of 2020s San Francisco instead of the 1920s. His eye for detail adds authenticity, with his forensic-accountant protagonist relying more on social media carelessness than implausible hacking attempts to gather the information he needs. There's a thread of anger running through the story too, at the machinery of tax evasion that lies behind so many industry facades and contributes to the world of homelessness that is the mirror image of all the partying billionaires. He's unsparing in his assessment of cryptocurrencies, seeing their success as driven by money laundering for some of the worst people in the world.

I love having an accountant at the center of a thriller, and Cory's hero Martin Hench is a lot of fun to spend time with. The plot itself is a rollercoaster ride through cryptography, drug gangs, wildfire ghost towns, and ex-Soviet grifters, and it will keep you turning the pages. I highly recommend picking up a copy; it's enjoyable and thought-provoking at the same time. To give you one last taste, here's his perfect pen portrait of someone I've met a few too many times:

> I've known a lot of hustlers, aggro types who cut corners and bull their way through the consequences. It's a type, out here. Move fast and break things. Don't ask permission; beg forgiveness. But most of those people, they know they're doing it. You can manage them, tack around them, factor them into your plans.
>
> The ones who get high on their own supply, though? There's no factoring them in. Far as they're concerned, they're the only player characters in the game and everyone else is an NPC, a literal nobody.

HOW CAN AI HELP EVERYDAY LIFE?
July 19, 2023 By Pete Warden in Uncategorized 1 Comment

Video of an AI-controlled lamp

There's a lot of hype around AI these days, and it's easy to believe that it's just another tech-world fad like the Metaverse or crypto. I think that AI is different though, because the real-world impact doesn't require a leap of faith to imagine. For example, I've had a long-time dream of being able to look at a lamp, say "On", and have the light come on. I want to be able to just ask everyday objects for help and have them do something intelligent.

To make it easier to understand what I'm talking about, we've built a small box that understands when you're looking at it and can make sense of spoken language, and we've set it up to control a lamp. We've designed it to work as simply as possible:

* There's no wake word like "Alexa" or "Siri". You trigger the interaction by looking at the lamp, using a Person Sensor to detect that gaze.
* We don't require a set order of commands; we're able to pick out what you want from a stream of natural speech using our AI models.
* Everything is running locally on the controller box. This means that not only is all your data private (it never leaves your home), there's also no setup needed. You don't have to download an app, connect to wifi, or even create an account. Plug in the controller and lamp, and it Just Works.

All of this is only possible because of the new wave of transformer models that are sweeping the world. We're going to see a massive number of new capabilities like this enter our everyday lives, not in years but in months. If you're interested in how this kind of local, private intelligence (with no server costs!) could work with your products, I'd love to chat.

WHAT HAPPENS WHEN THE REAL YOUNG LADY'S ILLUSTRATED PRIMER LANDS IN CHINA?
May 15, 2023 By Pete Warden in Uncategorized 1 Comment

I love Brad DeLong's writing, but I did a double take when he recently commented "'A Young Lady's Illustrated Primer' continues to recede into the future". The Primer he's referencing is an electronic book from Neal Stephenson's novel The Diamond Age, an AI tutor designed to educate and empower children, answering their questions and shaping their characters with stories and challenges. It's a powerful and appealing idea in a lot of ways, and offers a very compelling use case for conversational machine learning models.

I also think that a workable version of it now exists. The recent advances with large language models have amazed me, and I do think we're now a lot closer to an AI companion that could be useful for people of any age. If you try entering "Tell me a story about a unicorn and a fairy" into ChatGPT you'll almost certainly get something more entertaining and coherent than most adults could come up with on the fly. The model comes across as a creative and engaging partner, and I'm certain that we'll be seeing systems aimed at children soon enough, for better or worse. It feels like a lot of the functionality of the Primer is already here, even if the curriculum and veracity of the responses are lacking.

One of the reasons I like The Diamond Age so much is that it doesn't just describe the Primer as a technology, it looks hard at its likely effects. Frederik Pohl wrote that "a good science fiction story should be able to predict not the automobile but the traffic jam", and Stephenson shows how subversive a technology that delivers information in this new way can be. The owners of the Primer grow up indoctrinated by its values and teachings, and eventually become a literal army. This is portrayed in a positive light, since most of those values are ones that a lot of Western-educated people would agree with, but it's also clear that Stephenson believes that the effects of a technology like this would be incredibly disruptive to the status quo.

How does this all relate back to ChatGPT? Try asking it "Tell me about Tiananmen Square" and you'll get a clear description of the 1989 government crackdown that killed hundreds or even thousands of protestors. So what, you might ask? We've been able to type the same query into Google or Wikipedia for decades to get uncensored information. What's different about ChatGPT?

My friend Drew Breunig recently wrote an excellent post breaking down how LLMs work, and one of his side notes is that they can be seen as an extreme form of lossy compression for all the data that they've seen during training. The magic of LLMs is that they've effectively shrunk a lot of the internet's text content into a representation that's a tiny fraction of the size of the original. A model like LLaMa might have been exposed to over a trillion words during training, but it fits into a 3.5GB file, easily small enough to run locally on a smartphone or Raspberry Pi.
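Some rough arithmetic shows how extreme that compression is; here's a back-of-envelope sketch with round numbers:

```cpp
#include <cstdio>

int main() {
    // Round numbers for a LLaMa-scale model: ~1 trillion training words,
    // ~6 bytes of raw text per English word (including the space), and a
    // ~3.5GB quantized weights file.
    const double words          = 1e12;
    const double bytes_per_word = 6.0;
    const double corpus_bytes   = words * bytes_per_word; // ~6 TB of text
    const double model_bytes    = 3.5e9;                  // ~3.5 GB of weights

    printf("compression ratio: ~%.0fx\n", corpus_bytes / model_bytes); // ~1714x
    printf("bits per word    : ~%.3f\n", model_bytes * 8.0 / words);   // ~0.028
}
```

No lossless scheme gets anywhere close to that ratio on text, which is exactly why the result has to be lossy.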
That means the "Tiananmen Square" question can be answered without having to send a network request. No cloud, wifi, or cell connection is needed!

If you're trying to control the flow of information in an authoritarian state like China, this is a problem. The Great Firewall has been reasonably effective at preventing ordinary citizens from accessing cloud-based services that might contradict CCP propaganda, because those services are physically located outside of the country, but monitoring apps that run entirely locally on phones is going to be a much tougher challenge. One approach would be to produce alternative LLMs that only include approved texts, but as the "large" in the name implies, training these models requires a lot of data. Labeling all that data would be a daunting technical project, and the end results are likely to be less useful overall than an uncensored version. You could also try to prevent unauthorized models from being downloaded, but because they're such useful tools they're likely to show up preloaded in everything from phones to laptops and fridges.

This local aspect of the current AI revolution isn't often appreciated, because many of the demonstrations show up as familiar text boxes on web pages, just like the cloud services we're used to. It starts to become a little clearer when you see how models like LLaMa and Stable Diffusion can be run locally as desktop apps, or even on Raspberry Pis, but these are currently pretty slow and clunky. What's going to happen over the next year or two is that the models will be optimized and start to match or even outstrip the speed of the web applications. The elimination of cloud bills for server processing, together with improved latency, will drive commercial providers towards purely edge solutions, and the flood of edge hardware accelerators will narrow the capability gap between a typical phone or embedded system and a GPU in a data center.

Simply put, people all over the world are going to be learning from their AI companions, as rudimentary as they currently are, and censoring information is going to be a lot harder when the whole process happens on the edge. Local LLMs are going to change politics all over the world, but especially in authoritarian states that try to keep strict controls on information flows. The Young Lady's Illustrated Primer is already here; it's just not evenly distributed yet.

NOTES FROM A BANK RUN
March 12, 2023 By Pete Warden in Uncategorized 1 Comment

Photo by Gopal Vijayaraghavan

My startup, Useful Sensors, has all of its money in Silicon Valley Bank. There are a lot of things I worried about as a CEO, but assessing SVB's creditworthiness wasn't one of them. It clearly should have been. I don't have any grand theories about what's happened over the last few days, but I wanted to share some of my experiences as someone directly affected.

To start with, Useful is not at risk of shutting down. The worst-case scenario, as far as I can tell, is that we only have access to the insured amount of $250k in our SVB account on Monday.
This will be plenty for payroll on Wednesday, and from what I've seen there are enough liquid assets that sales of the government bonds that triggered the whole process should be enough to return a good portion of the remaining balance within a week or so. If I need to, I'll dip into my personal savings to keep the lights on. I know this isn't true for many other startups though, so if they don't get full access to their funds there will be job losses and closures.

Although we're not going to close, it is very disruptive to our business. Making sure that our customers are delighted, and finding more of them, should be taking all of our attention. Instead I spent most of Thursday and Friday dealing with a rapidly changing set of recommendations from our investors, attempting to move money, and opening new accounts, and now I'm discovering the joys of the FDIC claims process. I'm trying to do all this while flying to Germany for Embedded World to announce a new distribution deal with OKdo, and this blog post is actually written from an airport lounge in Paris. Longer term, depending on the ultimate outcome, it may affect when we want to raise our next round. To be clear, we're actually in a great position compared to many others, I'm an old geezer with savings, but long-term planning at a startup is hard enough without extra challenges like this thrown in.

It has been great having access to investors and founders who are able to help us in practical ways. We would never have been able to open a new account so quickly without introductions to helpful staff at another bank. I've been glued to the private founder chat rooms where people have shared their experiences with things like the FDIC claims process and pending wires. This kind of rapid communication and sharing of information is what makes Silicon Valley such a good place to build a startup, and I'm very grateful for everyone's help.

Having said that, the Valley's ability to spread information and recommendations quickly was one of the biggest causes of SVB's demise. I've always been a bit of a rubbernecker at financial disasters, and I'd read enough books on the 2008 financial crisis to understand how bank runs happen. It was strange being in one myself though, because the logic of "everyone else is pulling their money, so you'd better too before it's all gone" is so powerful, even though I knew this mentality is a self-fulfilling prophecy. I planned what I hoped was a moderate course of action, withdrawing some of our funds from SVB to move to another institution to gain some diversification, but by the time I was able to set up the transfer it was too late.

Technology companies aren't the most sympathetic victims in the current climate, for many good reasons. I thought this story covered the political dimensions of the bank failure well. The summary is that many taxpayers hate the idea of bailing out startups, especially ones with millions in their bank accounts. There are a lot of reasons why I think we'll all benefit from not letting small businesses pay the price for bank executives messing up their risk management, but they're all pretty wonky and will be a hard sell. However, the alternative is a world where only the top two or three banks in the US get most of the deposits, because they're perceived as too big to fail. If no financial regulator spotted the dangers with SVB, how can you expect small business owners to vet banks themselves?
We'll all just end up going to Citibank or JPMorgan, which increases the overall systemic risk, as we saw in 2008.

Anyway, I just want to dedicate this to all of the founders having a tough weekend. Startups are all about dealing with risks, but this is a particularly frustrating problem to face because it's so unnecessary. I hope at least we'll learn more over the next few weeks about how executives and regulators let a US bank with $200 billion in assets get into such a sorry state.