https://www.emqx.com/en/blog/reaching-100m-mqtt-connections-with-emqx-5-0
Submission: On July 25 via manual from US — Scanned from DE
REACHING 100M MQTT CONNECTIONS WITH EMQX 5.0

EMQX Team
May 9, 2022

Table of Contents
* Introduction
* How we tested the scalability of MQTT broker
* Challenges along the way
* Results
* Final remarks
* References

The ever-increasing scale of IoT device connections and deployments requires IoT messaging platforms to be massively scalable and to stay robust at that scale. To stress test the scalability of the MQTT messaging broker EMQX, we established 100 million MQTT connections to a cluster of 23 EMQX nodes to see how EMQX performs.

In this test, each MQTT client subscribed to a unique wildcard topic, which requires more CPU resources than a direct topic. When publishing, we chose a 1-to-1 publisher-subscriber topology and reached 1 M messages processed per second. We also compared how the maximum subscription rate varies as the cluster size increases with two different database backends, RLOG DB and Mnesia. Here we detail our setup and some of the challenges we faced along the way.

INTRODUCTION

EMQX is an open-source, highly scalable, distributed MQTT messaging broker written in Erlang/OTP that can support millions of concurrent clients. As such, it needs to persist and replicate various data among the cluster nodes, for example: MQTT topics and their subscribers, routing information, ACL rules, various configurations, and many more.

Since its beginning, EMQX has used Mnesia as the database backend for such needs. Mnesia is an embedded ACID distributed database that comes with Erlang/OTP. It uses a full-mesh peer-to-peer Erlang distribution for transaction coordination and replication.
Because of this characteristic, it has trouble scaling horizontally: the more nodes and replicas of the data there are, the greater the overhead of write-transaction coordination and the greater the risk of split-brain scenarios.

In EMQX 5.0, we attempted to mitigate this issue with a new DB backend type called RLOG (as in replication log), which is implemented in Mria. Mria is an extension to the Mnesia database that helps it scale horizontally by defining two types of nodes:

* core nodes, which behave as usual Mnesia nodes and participate in write transactions;
* replicant nodes, which do not take part in transactions and delegate those to core nodes, while keeping a read-only replica of the data locally.

This reduces the risk of split-brain scenarios and lessens the coordination needed for transactions, since fewer nodes participate in them, while keeping read-only data access fast, since the data is available locally for reading on all nodes.

To be able to use this new DB backend by default, we needed to stress test it and verify that it does indeed scale well horizontally. To that end, we performed tests in which a 23-node EMQX cluster sustained 100 million concurrent connections, divided in half between publishers and subscribers, and published messages in a one-to-one fashion. We also compared the RLOG DB backend to the conventional Mnesia one, and confirmed that RLOG can indeed sustain higher arrival rates than Mnesia.

HOW WE TESTED THE SCALABILITY OF MQTT BROKER

For deploying and running our cluster tests, we used AWS CDK, which allowed us to experiment with different instance types and counts, and to try out different development branches of EMQX. You can check out our scripts in this GitHub repo.

On our load generator nodes ("loadgens" for short), we used our emqtt-bench tool to generate the connection / publishing / subscribing traffic with various options.

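Before going further into the setup, the core/replicant write path described in the introduction can be pictured with a toy model. The sketch below is only an illustration under simplifying assumptions: it is not Mria's API or its replication protocol, and every name in it (`ToyCluster`, `CoreNode`, `ReplicantNode`, `rlog`) is hypothetical. It shows two properties from the text: only core nodes take part in a write, and replicants serve reads from a local copy that is brought up to date by replaying a log of committed writes.

```python
from dataclasses import dataclass, field


@dataclass
class CoreNode:
    """Hypothetical core node: takes part in every write transaction."""
    name: str
    table: dict = field(default_factory=dict)


@dataclass
class ReplicantNode:
    """Hypothetical replicant: read-only local replica fed by the log."""
    name: str
    table: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries this replica has replayed


class ToyCluster:
    def __init__(self, n_cores: int, n_replicants: int) -> None:
        self.cores = [CoreNode(f"core{i}") for i in range(n_cores)]
        self.replicants = [ReplicantNode(f"repl{i}") for i in range(n_replicants)]
        self.rlog: list[tuple[str, str]] = []  # ordered log of committed writes

    def write(self, key: str, value: str) -> int:
        """Commit a write on the core nodes only.

        A replicant that wants to write would delegate to this method.
        Returns how many nodes took part in the transaction.
        """
        for core in self.cores:
            core.table[key] = value
        self.rlog.append((key, value))
        return len(self.cores)

    def replicate(self) -> None:
        """Replay log entries each replicant has not applied yet."""
        for repl in self.replicants:
            for key, value in self.rlog[repl.applied:]:
                repl.table[key] = value
            repl.applied = len(self.rlog)


cluster = ToyCluster(n_cores=3, n_replicants=20)
participants = cluster.write("bench/1/#", "subscriber-1")
stale_read = cluster.replicants[0].table.get("bench/1/#")
cluster.replicate()
fresh_read = cluster.replicants[0].table.get("bench/1/#")
print(participants, stale_read, fresh_read)
```

In this model the number of transaction participants depends only on the number of core nodes, so adding replicants adds read capacity without widening the write coordination, which is the scaling argument made above.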
EMQX's Dashboard and Prometheus were used to monitor the progress of the tests and the health of the instances.

We experimented gradually with various instance types and counts, and for the final runs we settled on c6g.metal instances for both EMQX nodes and loadgens, and on a "3+20" topology for our cluster: 3 nodes of type "core", which take part in write transactions, and 20 nodes of type "replicant", which are read-only replicas and delegate writes to the core nodes. As for our loadgens, we observed that publisher clients required considerably more resources than subscribers. To only connect and subscribe 100 million connections, 13 loadgen instances were enough; when publishing as well, we needed 17 instances.

We did not use any load balancers for these tests. To keep the core nodes dedicated solely to managing database transactions, we made no client connections to them; the loadgen clients connected directly to the replicant nodes and were spread evenly across them, so that all of those nodes had about the same number of connections and resource usage.

Each subscriber subscribed to a wildcard topic of the form bench/%i/# with QoS 1, where %i stands for a unique number per subscriber, and each publisher published with QoS 1 to a topic of the form bench/%i/test, with the same %i as the subscribers. That ensured that for each publisher there was exactly one subscriber. The size of the payload in the messages was always 256 bytes.

In our tests, we first connected all our subscriber clients, and only then started to connect our publishers. Only after all publishers were connected did they start to publish, each one every 90 s. The rate at which both subscribers and publishers connected to the brokers was 16,000 connections / s for the 100 M connection test reported here, although we believe that the cluster can sustain an even higher connection rate.

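The one-to-one pairing between bench/%i/# and bench/%i/test can be checked with a small sketch of MQTT topic-filter matching. This is a simplified matcher written for illustration, not EMQX's router: it handles the `+` and `#` wildcards but ignores special cases such as topics beginning with `$`. The helper names (`topic_matches`, `subscriber_filter`, `publisher_topic`, `matching_subscribers`) are assumptions made for this example.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter.

    Supports '+' (exactly one level) and a trailing '#' (the parent
    level and any number of child levels).
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' is only valid as the last level of a filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def subscriber_filter(i: int) -> str:
    """Wildcard filter used by subscriber number i."""
    return f"bench/{i}/#"


def publisher_topic(i: int) -> str:
    """Topic used by publisher number i."""
    return f"bench/{i}/test"


def matching_subscribers(pub_id: int, n_clients: int) -> list[int]:
    """Brute-force list of subscriber ids whose filter matches a publisher."""
    topic = publisher_topic(pub_id)
    return [s for s in range(n_clients) if topic_matches(subscriber_filter(s), topic)]


# Every publisher is matched by exactly one subscriber: the one with the same id.
n = 200
pairs = {p: matching_subscribers(p, n) for p in range(n)}
print(all(subs == [p] for p, subs in pairs.items()))
```

The brute-force scan is only for demonstration at small `n`. The point is that ids such as 1, 10 and 100 never cross-match, because matching is done per topic level rather than by string prefix.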
CHALLENGES ALONG THE WAY

As we experimented with such large volumes of connections and throughput, we encountered several challenges along the way, and investigated and improved performance bottlenecks. For tracking down memory and CPU usage in Erlang processes, system_monitor was quite a helpful tool. It is basically "htop for BEAM processes", allowing us to find processes with long message queues or high memory and/or CPU usage. It helped us perform a few performance tunings [1][2][3] in Mria based on what we observed during the cluster tests.

In our initial tests with Mria, without going into too many details, the replication mechanism basically involved logging all transactions to a "phantom" Mnesia table, which was subscribed to by replicant nodes. This generated some network overhead between the core nodes, because each transaction was essentially "duplicated". In our OTP fork, we added a new Mnesia module that allows us to capture all committed transaction logs more easily. This removed the need for the "duplicate" writes, reduced network usage significantly, and allowed the cluster to sustain higher connection / transaction rates. While stressing the cluster further after those optimizations, we found new bottlenecks that prompted further performance tunings [4][5][6].

Even our benchmarking tool needed a few adjustments to cope with such a large volume and rate of connections. Several quality-of-life improvements have been made [7][8][9][10], as well as a couple of performance optimizations [11][12]. In our pub-sub tests, we even needed to use a special fork of the tool, made solely for this test and not in the current master branch, so that memory usage could be lowered further.

RESULTS

> NOTE: In this test, all of the paired publishers and subscribers happened to
> reside in the same broker, which is not an ideal scenario close to real-life
> use cases. The EMQX team is conducting more tests, and will follow up with
> more updates.

The animation above illustrates our final results for the 1-to-1 publish-subscribe tests. We established 100 million connections, 50 M of which were subscribers and 50 M were publishers. By publishing every 90 seconds, we see that average inbound and outbound rates of over 1 M messages per second are achieved. At the publishing plateau, each of the 20 replicant nodes (which, as a reminder, are the ones taking in connections) consumed on average 90 % of its memory (about 113 GiB) and around 97 % of its CPU (64 arm64 cores) during the publishing waves. The 3 core nodes handling the transactions were nearly idle in CPU (less than 1 % usage) and used only 28 % of their memory (about 36 GiB). The network traffic required during the publishing waves of 256-byte payloads was between 240 and 290 MB / s. The loadgens required almost all their memory (about 120 GiB) and their entire CPU during the publishing plateau.

Grafana screenshot of CPU, memory and network usage of EMQX nodes during the test

To compare an RLOG cluster fairly to an equivalent Mnesia cluster, we used another topology with fewer total connections: 3 core nodes + 7 replicants for RLOG, and a 10-node Mnesia cluster where only 7 of the nodes took in connections. With this topology, we performed connections and subscriptions, without publishing, at different rates.

The plot below illustrates our results. For Mnesia, the faster we try to connect and subscribe to the nodes, the more we observe a "flattening" behavior, where the cluster is not able to reach the target maximum number of connections, which is 50 million in those tests. For RLOG, we see that we can reach higher connection rates without the cluster exhibiting such flattening behavior. From this we conclude that Mria using RLOG can perform better under higher connection rates than the older Mnesia backend.

FINAL REMARKS

After seeing those optimistic results, we believe that the RLOG DB backend offered by Mria is ready for production usage in EMQX 5.0.
It is already the default DB backend in the current master branch.

REFERENCES

[1] - fix(performance): Move message queues to off_heap by k32 · Pull Request #43 · emqx/mria
[2] - perf(replicant): Improve performance of the agent and the replicant by k32 · Pull Request #44 · emqx/mria
[3] - fix(mria_status): Remove mria_status process by k32 · Pull Request #48 · emqx/mria
[4] - Store transactions in replayq in normal mode by k32 · Pull Request #65 · emqx/mria
[5] - feat: Remove redundand data from the mnesia ops by k32 · Pull Request #67 · emqx/mria
[6] - feat: Batch transaction imports by k32 · Pull Request #70 · emqx/mria
[7] - feat: add new waiting options for publishing by thalesmg · Pull Request #160 · emqx/emqtt-bench
[8] - feat: add option to retry connections by thalesmg · Pull Request #161 · emqx/emqtt-bench
[9] - Add support for rate control for 1000+ conns/s by qzhuyan · Pull Request #167 · emqx/emqtt-bench
[10] - support multi target hosts by qzhuyan · Pull Request #168 · emqx/emqtt-bench
[11] - feat: bump max procs to 16M by qzhuyan · Pull Request #138 · emqx/emqtt-bench
[12] - feat: tune gc for publishing by thalesmg · Pull Request #164 · emqx/emqtt-bench