URL: https://www.emqx.com/en/blog/reaching-100m-mqtt-connections-with-emqx-5-0

REACHING 100M MQTT CONNECTIONS WITH EMQX 5.0

EMQX Team May 9, 2022

Table of Contents

 * Introduction
 * How we tested the scalability of MQTT broker
 * Challenges along the way
 * Results
 * Final remarks
 * References

The ever-increasing scale of IoT device connections and deployments requires IoT
messaging platforms to be massively scalable and robust at scale. To stress test
the scalability of the MQTT messaging broker EMQX, we established 100 million
MQTT connections to a cluster of 23 EMQX nodes to see how EMQX performs.

In this test, each MQTT client subscribed to a unique wildcard topic, which
requires more CPU resources than a direct topic. When publishing, we chose a
1-to-1 publisher-subscriber topology and reached 1 M messages processed per
second. We also compared how the maximum subscription rate varies with cluster
size under two different database backends, RLOG DB and Mnesia. Here we detail
our setup and some of the challenges we faced along the way.


INTRODUCTION

EMQX is an open-source, highly scalable, and distributed MQTT messaging broker
written in Erlang/OTP that can support millions of concurrent clients. As such,
there is a need to persist and replicate various data among the cluster nodes.
For example: MQTT topics and their subscribers, routing information, ACL rules,
various configurations, and many more. Since its beginning, EMQX has used Mnesia
as the database backend for such needs.

Mnesia is an embedded ACID distributed database that comes with Erlang/OTP. It
uses a full-mesh peer-to-peer Erlang distribution for transaction coordination
and replication. Because of this characteristic, it has trouble scaling
horizontally: the more nodes and replicas of the data there are, the greater
the overhead of write-transaction coordination and the greater the risk of
split-brain scenarios.
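
To get a feel for how quickly a full mesh grows, the short sketch below counts
the pairwise links between n peers. The function name is ours, and link count
is only a rough proxy for coordination overhead, not a measurement of it.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of pairwise links in a full mesh of `nodes` peers."""
    return nodes * (nodes - 1) // 2


# Cluster sizes that appear in this post: 3 core nodes, a 10-node
# Mnesia cluster, and the 23-node cluster.
for n in (3, 10, 23):
    print(f"{n:>2} nodes -> {full_mesh_links(n):>3} links")
```

The growth is quadratic: 3 nodes need 3 links, while 23 nodes in a full mesh
would need 253.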

In EMQX 5.0, we attempted to mitigate this issue in a new DB backend type called
RLOG (as in replication log), which is implemented in Mria. Mria is an extension
to the Mnesia database that helps it scale horizontally by defining two types of
nodes: i) core nodes, which behave as usual Mnesia nodes and participate in
write transactions; ii) replicant nodes, which do not take part in transactions
and delegate those to core nodes, while keeping a read-only replica of the data
locally. This helps to reduce the risk of split-brain scenarios and lessens the
coordination needed for transactions, since fewer nodes participate in them,
while keeping read-only data access fast, since the data is available locally
for reading on all nodes.
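
The division of labor between the two node types can be illustrated with a toy
model. This is not Mria's API: the class and method names are hypothetical,
there is a single core instead of a transactional group of cores, and
replication is a synchronous push purely to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class CoreNode:
    """Toy core node: owns the authoritative table and accepts writes."""
    table: dict = field(default_factory=dict)
    replicants: list = field(default_factory=list)

    def write(self, key, value):
        # Commit locally, then forward the committed operation
        # to every attached replicant.
        self.table[key] = value
        for replicant in self.replicants:
            replicant.apply(key, value)


@dataclass
class ReplicantNode:
    """Toy replicant: serves reads from a local copy, delegates writes."""
    core: CoreNode
    table: dict = field(default_factory=dict)

    def __post_init__(self):
        # Register with the core so committed writes are pushed to us.
        self.core.replicants.append(self)

    def apply(self, key, value):
        # Called by the core only; keeps the local replica up to date.
        self.table[key] = value

    def read(self, key):
        # Reads never leave the node.
        return self.table.get(key)

    def write(self, key, value):
        # Replicants do not commit writes themselves.
        self.core.write(key, value)


core = CoreNode()
r1, r2 = ReplicantNode(core), ReplicantNode(core)
r1.write("bench/1/#", "subscriber-1")   # delegated to the core
print(r2.read("bench/1/#"))             # served from r2's local copy
```

In this model, adding a replicant adds a read replica without adding a
participant to the write path, which is the property the paragraph above
describes.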

To be able to use this new DB backend by default, we needed to stress
test it and verify that it does indeed scale well horizontally. To that end, we
performed tests in which a 23-node EMQX cluster sustained 100 million concurrent
connections, split evenly between publishers and subscribers, and published
messages in a one-to-one fashion. We also compared the RLOG DB backend to the
conventional Mnesia one, and confirmed that RLOG can indeed sustain higher
arrival rates than Mnesia.


HOW WE TESTED THE SCALABILITY OF MQTT BROKER

For deploying and running our cluster tests, we used AWS CDK, which allowed us
to experiment with different instance types and numbers, and to try out
different development branches of EMQX. You can check out our scripts in this
GitHub repo. On our load generator nodes ("loadgens" for short), we used our
emqtt-bench tool to generate the connection / publishing / subscribing traffic
with various options. EMQX's Dashboard and Prometheus were used for monitoring
the progress of the test and the instances' health.

We experimented gradually with various instance types and numbers, and for the
final runs we settled on using c6g.metal instances for both EMQX nodes and
loadgens, and the "3+20" topology for our cluster: 3 nodes of type "core", which
take part in write transactions, and 20 nodes of type "replicant", which are
read-only replicas and delegate writes to the core nodes. As for our loadgens,
we observed that publisher clients required quite a bit more resources than
subscribers. Merely connecting and subscribing 100 million connections required
13 loadgen instances; publishing as well required 17 instances.



We did not use any load balancers for those tests; loadgens connected directly
to the EMQX nodes. To keep the core nodes dedicated solely to managing database
transactions, we did not make connections to them. Each loadgen client
connected directly to a replicant node, in an evenly distributed fashion, so
all replicant nodes had about the same number of connections and resource
usage. Each subscriber subscribed to a wildcard topic of the form bench/%i/#
with QoS 1, where %i stands for a unique number per subscriber, and each
publisher published with QoS 1 to a topic of the form bench/%i/test, with the
same %i as the subscribers. That ensured that for each publisher there was
exactly one subscriber. The size of the payload in the messages was always 256
bytes.
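
The one-to-one pairing follows from how MQTT topic filters match. The sketch
below is a simplified matcher written for illustration (it ignores
$-prefixed topics and does not validate filters); it is not EMQX's routing
code, and the helper names are ours.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match supporting '+' and a trailing '#'."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the remaining levels,
            # including none at all.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def sub_filter(i: int) -> str:
    return f"bench/{i}/#"      # what subscriber i subscribes to


def pub_topic(i: int) -> str:
    return f"bench/{i}/test"   # what publisher i publishes to


# With a small number of clients, check that every published topic
# is matched by exactly one subscriber's filter.
n = 50
for i in range(n):
    matching = [j for j in range(n) if topic_matches(sub_filter(j), pub_topic(i))]
    assert matching == [i]
print("each publisher reaches exactly one subscriber")
```

Note that bench/1/# does not match bench/10/test, because matching is done per
topic level rather than by string prefix.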

In our tests, we first connected all our subscriber clients, and only then
started to connect our publishers. Only after all publishers were connected did
they start to publish, each one once every 90 s. The rate at which both
subscribers and publishers connected to the brokers was 16,000 connections / s
for the 100 M connection test reported here, although we believe that the
cluster can sustain an even higher connection rate.
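
As a back-of-the-envelope check on what that rate implies, the snippet below
computes the ramp-up time. It assumes the 16,000 connections / s figure is the
aggregate rate across the cluster and stays constant for the whole ramp-up;
neither assumption is stated explicitly above.

```python
TOTAL_CONNECTIONS = 100_000_000   # subscribers + publishers
CONNECT_RATE = 16_000             # connections per second (assumed aggregate)

ramp_seconds = TOTAL_CONNECTIONS / CONNECT_RATE
ramp_minutes = ramp_seconds / 60

print(f"{ramp_seconds:.0f} s ≈ {ramp_minutes:.0f} min")  # 6250 s ≈ 104 min
```

Under those assumptions, just establishing the connections takes roughly an
hour and three quarters before any publishing begins.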


CHALLENGES ALONG THE WAY

As we experimented with such large volumes of connections and throughput, we
encountered several challenges, and investigated and removed a number of
performance bottlenecks. For tracking down memory and CPU usage in Erlang
processes, system_monitor was quite a helpful tool: it is basically "htop for
BEAM processes", allowing us to find processes with long message queues, high
memory and/or CPU usage. It helped us make a few performance tunings
[1][2][3] in Mria based on what we observed during the cluster tests.

In our initial tests with Mria, the replication mechanism, in brief, involved
logging all transactions to a "phantom" Mnesia table, which replicant nodes
subscribed to. This generated extra network overhead between the core nodes,
because each transaction was essentially "duplicated". In our OTP fork, we
added a new Mnesia module that lets us capture all committed transaction logs
more easily. This removed the need for the "duplicate" writes, reduced network
usage significantly, and allowed the cluster to sustain higher connection /
transaction rates. While stressing the cluster further after those
optimizations, we found new bottlenecks that prompted further performance
tunings [4][5][6].

Even our benchmarking tool needed a few adjustments to cope with such a large
volume and rate of connections. Several quality-of-life improvements were
made [7][8][9][10], as well as a couple of performance optimizations [11][12].
In our pub-sub tests, we even needed a special fork of the tool, made solely
for this test and not part of the current master branch, so that memory usage
could be lowered further.


RESULTS



> NOTE: In this test, all of the paired publishers and subscribers happened to
> reside on the same broker, which is not an ideal scenario and is not close to
> real-life use cases. The EMQX team is conducting more tests, and will follow
> up with more updates.

The animation above illustrates our final results for the 1-to-1
publish-subscribe tests. We established 100 million connections, 50 M of which
were subscribers and 50 M were publishers. By publishing every 90 seconds, we
see that average inbound and outbound rates of over 1 M messages per second are
achieved. At the publishing plateau, each of the 20 replicant nodes (which, as
a reminder, are the ones taking in connections) consumed on average 90 % of its
memory (about 113 GiB), and around 97 % of its CPU (64 arm64 cores) during the
publishing waves. The 3 core nodes handling the transactions were nearly idle
in CPU (less than 1 % usage) and used only 28 % of their memory (about 36 GiB).
The network traffic required during the publishing waves of 256-byte payloads
was between 240 and 290 MB / s. The loadgens required almost all their memory
(about 120 GiB) and their entire CPU during the publishing plateau.
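
A few derived figures help put these numbers in perspective. The sketch below
uses only the values quoted above. It assumes connections are spread perfectly
evenly over the 20 replicants, attributes all used memory to connections (so
the per-connection figure is an upper bound), and counts payload bytes in one
direction only, ignoring MQTT and TCP/IP overhead.

```python
GIB = 2**30

REPLICANTS = 20
CONNECTIONS = 100_000_000
MEM_USED_PER_REPLICANT = 113 * GIB   # bytes, as reported
MESSAGES_PER_SECOND = 1_000_000      # reported rate is "over 1 M"
PAYLOAD_BYTES = 256

conns_per_node = CONNECTIONS // REPLICANTS
bytes_per_conn = MEM_USED_PER_REPLICANT / conns_per_node
payload_rate = MESSAGES_PER_SECOND * PAYLOAD_BYTES   # bytes per second

print(f"connections per replicant: {conns_per_node:,}")
print(f"memory per connection (upper bound): {bytes_per_conn / 1024:.1f} KiB")
print(f"payload volume, one direction: {payload_rate / 1e6:.0f} MB/s")
```

That works out to 5 million connections per replicant and at most about 24 KiB
of memory per connection. The one-directional payload volume of 256 MB / s is
of the same order as the 240 to 290 MB / s of network traffic reported above,
although the two are not directly comparable.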



Grafana screenshot of CPU, memory and network usage of EMQX nodes during the
test

In order to fairly compare an RLOG cluster to an equivalent Mnesia cluster, we
used another topology with fewer total connections: 3 core nodes + 7 replicants
for RLOG, and a 10-node Mnesia cluster where only 7 of the nodes took in
connections. With this topology, we performed connections and subscriptions,
without publishing, at different rates. The plot below illustrates our results.
For Mnesia, the faster we try to connect and subscribe to the nodes, the more we
observe a "flattening" behavior, where the cluster is not able to reach the
target maximum number of connections, which is 50 million in those tests. For
RLOG, we see that we can reach higher connection rates without the cluster
exhibiting such flattening behavior. We conclude that Mria using RLOG performs
better under higher connection rates than the older Mnesia backend.




FINAL REMARKS

After seeing those promising results, we believe that the RLOG DB backend
offered by Mria is ready for production usage in EMQX 5.0. It is already the
default DB backend in the current master branch.



REFERENCES

[1] - fix(performance): Move message queues to off_heap by k32 · Pull Request
#43 · emqx/mria

[2] - perf(replicant): Improve performance of the agent and the replicant by k32
· Pull Request #44 · emqx/mria

[3] - fix(mria_status): Remove mria_status process by k32 · Pull Request #48 ·
emqx/mria

[4] - Store transactions in replayq in normal mode by k32 · Pull Request #65 ·
emqx/mria

[5] - feat: Remove redundand data from the mnesia ops by k32 · Pull Request #67
· emqx/mria

[6] - feat: Batch transaction imports by k32 · Pull Request #70 · emqx/mria

[7] - feat: add new waiting options for publishing by thalesmg · Pull Request
#160 · emqx/emqtt-bench

[8] - feat: add option to retry connections by thalesmg · Pull Request #161 ·
emqx/emqtt-bench

[9] - Add support for rate control for 1000+ conns/s by qzhuyan · Pull Request
#167 · emqx/emqtt-bench

[10] - support multi target hosts by qzhuyan · Pull Request #168 ·
emqx/emqtt-bench

[11] - feat: bump max procs to 16M by qzhuyan · Pull Request #138 ·
emqx/emqtt-bench

[12] - feat: tune gc for publishing by thalesmg · Pull Request #164 ·
emqx/emqtt-bench
