League of Legends stands as a global phenomenon in online gaming, captivating millions of players daily. Its engaging multiplayer online battle arena (MOBA) gameplay hinges not only on strategic depth and skillful execution, but also on seamless communication between teammates. Imagine coordinating a crucial play in a high-stakes match without the ability to instantly communicate with your allies. This highlights the critical role of chat services in the League of Legends experience.
To understand the sheer scale of this challenge, consider the statistics. League of Legends must support peaks of 7.5 million concurrent players, with a staggering 27 million daily active players. The chat service alone processes 11,000 messages per second and routes 1 billion events per server daily, all while maintaining near-constant uptime. This article delves into the architecture behind the League of Legends chat system, revealing how its developers tackled the immense demands of a global gaming community.
The Architectural Foundation of League of Legends Chat
Michal Ptaszek’s talk, “Scaling League of Legends Chat to 70 million Players,” at Strange Loop 2014, offers valuable insights into the system’s evolution. The development philosophy followed a pragmatic approach: “Make it work. Make it right. Make it fast.”
Initially, League of Legends adopted XMPP (Extensible Messaging and Presence Protocol) as the foundation for its chat service, mirroring the strategy employed by WhatsApp and other platforms requiring robust, scalable messaging. The Ejabberd server, an Erlang-based XMPP server, was chosen for its inherent scalability and concurrency capabilities. Erlang’s design principles, focused on distribution and fault tolerance, were crucial from the outset.
However, as player numbers surged, the standard XMPP implementation needed significant customization to meet the performance demands of League of Legends servers. This led to deep optimizations within the Erlang Virtual Machine (VM) itself, focusing on enhanced monitoring and performance tuning to eliminate bottlenecks that typically emerge at massive scale.
Leveraging Riak and CRDTs for Horizontal Scalability
Perhaps the most innovative aspect of the League of Legends chat architecture is its use of Riak together with CRDTs (conflict-free replicated data types, also called convergent replicated data types). This combination underpins the shared-nothing design and lets the system scale out close to linearly as servers are added. CRDTs remain an advanced topic in distributed systems: they let replicas accept updates independently and still converge to the same state without coordination.
The goal was to create a chat system that could scale almost infinitely by simply adding more servers – a crucial requirement for handling the unpredictable growth of a global online game like League of Legends.
Key Statistics of the League of Legends Chat System
- 67 Million: Unique monthly players (across all services using chat).
- 27 Million: Daily active players.
- 7.5 Million: Concurrent players at peak times.
- 1 Billion: Events routed per server, per day.
- 11,000: Messages processed per second.
- Hundreds: Chat servers deployed globally.
- 3: Number of engineers managing the entire chat infrastructure.
- 99%: Uptime.
Platform Components
The League of Legends chat platform is built upon a robust stack of technologies:
- Ejabberd: The core XMPP server, chosen for its Erlang foundation and scalability.
- Riak: A distributed NoSQL database, providing fault tolerance and linear scalability.
- Load Balancers: Distributing traffic across chat servers for optimal performance and availability.
- Graphite, Zabbix, Nagios: Comprehensive monitoring tools for real-time performance analysis and alerting.
- Jenkins: For continuous integration and automated deployments.
- Confluence: For documentation and knowledge sharing within the team.
Deep Dive into the Chat Service Architecture
The chat service in League of Legends is more than just text communication. It acts as a central social hub, providing:
- One-on-one and Group Chat: Supporting both private conversations and team-based communication.
- Presence Service: Tracking player online status, game activity, and champion selection.
- Friends List Management: Maintaining social connections between players.
- REST APIs: Exposing chat functionality as a backend service for other League of Legends features, such as verifying friendships for gifting in the store or grouping players in leagues based on social connections.
Low latency and stability are paramount for the chat service. Any disruption or lag directly impacts the in-game experience, making it a critical component of the overall League of Legends infrastructure.
Customizing XMPP and Ejabberd
While XMPP provided a solid starting point, League of Legends engineers needed to deviate from the standard protocol and heavily customize Ejabberd to achieve their performance and feature goals.
- Protocol Optimization: The standard XMPP protocol for friendship creation was deemed too chatty, involving 16 messages. This was reduced to just three messages, significantly reducing database load. Unnecessary features and code from the core XMPP protocol were removed to streamline the system.
- Performance Tuning: Extensive code profiling was conducted to identify and eliminate performance bottlenecks. A key area was optimizing the Multi-User Chat (MUC) routing. Initially, a single MUC router process handled all group chat messages, becoming a bottleneck under heavy load. The solution was to parallelize routing, enabling each user session to directly look up group chat rooms, leveraging all available CPU cores.
- Session Table Optimization: Each Ejabberd server maintains a session table mapping user IDs to active sessions. Message delivery requires looking up session locations within the cluster. By optimizing session existence checks and presence priority handling, distributed writes to the session table were reduced by an impressive 96%, leading to faster logins and more frequent presence updates.
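The session table optimization in the last bullet can be illustrated with a minimal Python sketch. It is not Ejabberd code: the class and method names (`SessionTable`, `update_presence`, `update_presence_naive`) are hypothetical, and the dictionary stands in for a table replicated across the cluster. The only idea shown is that a cheap local existence and priority check can avoid a costly distributed write when nothing has changed.

```python
from dataclasses import dataclass, field


@dataclass
class SessionTable:
    """Toy stand-in for a cluster-wide session table (hypothetical, not Ejabberd's API)."""
    sessions: dict = field(default_factory=dict)  # user_id -> (node, priority)
    distributed_writes: int = 0                   # writes that would be replicated

    def _replicate(self, user_id, node, priority):
        # In a real cluster this write would be propagated to other nodes.
        self.sessions[user_id] = (node, priority)
        self.distributed_writes += 1

    def update_presence_naive(self, user_id, node, priority):
        # Always writes, even when the stored entry is already identical.
        self._replicate(user_id, node, priority)

    def update_presence(self, user_id, node, priority):
        # Check the existing entry first; replicate only on a real change.
        if self.sessions.get(user_id) == (node, priority):
            return False
        self._replicate(user_id, node, priority)
        return True


table = SessionTable()
for _ in range(25):  # 25 identical presence updates from one player
    table.update_presence("player-1", "chat-node-a", 0)
print(table.distributed_writes)
```

In this toy run only the first update reaches the replicated table; the other 24 are answered by the local check.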
Embracing Shared-Nothing Architecture and Fault Tolerance
The architectural philosophy centered around a shared-nothing architecture to ensure linear horizontal scalability, improved fault isolation, and easier debugging. With hundreds of servers managed by a small team, fault tolerance was crucial. The system is designed to “let it crash,” favoring rapid restarts from a known state over slow, complex recovery processes. For example, database backlogs trigger server restarts, prioritizing real-time queries while rescheduling queued requests.
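The "let it crash" idea above can be sketched in a few lines of Python. In Erlang this role is played by supervisors; the `Worker` class, the `supervise` function, and the backlog limit of 3 below are hypothetical and chosen only for illustration. The worker makes no attempt to repair itself when its backlog grows too large: it fails, the supervisor sets the queued requests aside for later, and a fresh worker starts from a known empty state.

```python
class Worker:
    """Hypothetical worker holding a queue of pending database requests."""
    MAX_BACKLOG = 3  # illustrative limit, not a figure from the talk

    def __init__(self):
        self.backlog = []  # known-good initial state: empty queue

    def handle(self, request):
        self.backlog.append(request)
        if len(self.backlog) > self.MAX_BACKLOG:
            # Fail fast instead of attempting a complex in-place recovery.
            raise RuntimeError("database backlog too large")


def supervise(requests):
    """Feed requests to a worker, replacing it with a fresh one after a crash."""
    worker = Worker()
    restarts = 0
    rescheduled = []
    for request in requests:
        try:
            worker.handle(request)
        except RuntimeError:
            rescheduled.extend(worker.backlog)  # queued work is retried later
            worker = Worker()                   # restart from a known state
            restarts += 1
    return restarts, rescheduled, worker


restarts, rescheduled, worker = supervise(range(10))
print(restarts, rescheduled, worker.backlog)
```

The design choice is that recovery logic lives in one small, simple place (the supervisor) rather than being spread through every code path of the worker.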
Each physical server runs both Ejabberd and Riak, forming independent clusters. Riak’s multi-datacenter replication exports persistent data to a secondary cluster for backups and resource-intensive ETL queries, preventing performance impacts on the primary cluster.
Migrating from MySQL to Riak
The initial choice of MySQL as the database proved problematic as scalability demands grew. Performance bottlenecks, reliability issues, and slow schema updates hindered development. Riak, a distributed, fault-tolerant, key-value store, emerged as the solution. Riak’s masterless architecture eliminates single points of failure and ensures data availability even during server outages.
The transition to Riak required significant effort in implementing eventual consistency within the chat server. League of Legends engineers developed an Ejabberd CRDT library to manage write conflicts and ensure data convergence.
CRDTs: Managing Data Consistency in a Distributed Chat System
CRDTs are the cornerstone of how the chat service keeps data consistent on top of Riak. Instead of directly modifying data objects, the approach maintains a log of operations. For example, adding a friend doesn’t directly update the friends list; instead, an “Add Player X” operation is logged. When the friends list is read, the log is consulted, conflicts are resolved, and the operations are applied. Because the operations are designed so that the order of application does not matter, every replica converges to the same result. This gives eventual consistency across a distributed system without the overhead of strict transactions.
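To make the operation-log idea concrete, here is a minimal Python sketch of a friends list built as an observed-remove set, one well-known CRDT. It is an assumption about what such a structure could look like, not the Ejabberd CRDT library described in the talk; the names `Op`, `FriendsList`, `merge`, and `value` are hypothetical. Each add carries a unique tag, each remove records the tags it has seen, and reading the list replays the log.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    kind: str        # "add" or "remove"
    player: str
    tags: frozenset  # add: one fresh tag; remove: the add-tags it observed


@dataclass
class FriendsList:
    log: set = field(default_factory=set)  # the operation log

    def add(self, player):
        self.log.add(Op("add", player, frozenset({uuid.uuid4().hex})))

    def remove(self, player):
        # A remove only cancels the adds this replica has already seen.
        observed = frozenset(
            tag
            for op in self.log
            if op.kind == "add" and op.player == player
            for tag in op.tags
        )
        self.log.add(Op("remove", player, observed))

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so replicas can exchange logs in any order.
        self.log |= other.log

    def value(self):
        added = {(op.player, t) for op in self.log if op.kind == "add" for t in op.tags}
        removed = {(op.player, t) for op in self.log if op.kind == "remove" for t in op.tags}
        return {player for player, _ in added - removed}


a, b = FriendsList(), FriendsList()
a.add("Player X")
b.merge(a)            # both replicas now know Player X
a.remove("Player X")  # concurrent updates on different replicas
b.add("Player X")
a.merge(b)
b.merge(a)
print(a.value() == b.value())
```

In this sketch a concurrent add and remove of the same player resolves in favor of the add, because the remove never observed the new tag. Other conflict policies are possible; this one is just a common default.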
Riak’s adoption proved highly successful, enabling linear scalability and schema flexibility, allowing for on-the-fly object changes without database downtime.
Monitoring, Feature Toggles, and DevOps Practices
Maintaining the League of Legends chat service at scale requires sophisticated monitoring and DevOps practices:
- Extensive Monitoring: Over 500 real-time counters are collected every minute and fed into monitoring systems like Graphite, Zabbix, and Nagios. Alert thresholds trigger notifications, allowing proactive issue resolution before player impact. A real-world example cited was the rapid detection of a client update causing an infinite presence broadcast loop, immediately visible in Graphite dashboards.
- Feature Toggles: New features are deployed with on/off toggles, allowing instant disabling in case of issues without service restarts. Partial deployments enable testing new code with subsets of users before full rollout, mitigating risks associated with large-scale deployments.
- Hot Code Reloading: Erlang’s hot code reloading capability allows for live code updates without service interruptions. This was crucial in addressing issues caused by unexpected behavior from third-party XMPP clients, allowing fixes to be deployed to chat servers without downtime.
- Selective Logging: Debug mode can be enabled for specific user sessions, capturing detailed logs (XML traffic, events, metrics) without overwhelming log storage. This facilitates targeted debugging and testing of features in production with minimal noise from other users.
- Automated Load Testing: Nightly automated load tests simulate peak traffic conditions in a dedicated environment. Server health metrics are collected, analyzed, and summarized in reports, enabling performance regression detection and capacity planning.
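The feature toggle and partial deployment practice from the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism used in the chat servers: the `FeatureToggles` class, its method names, and the feature name `new-presence` are hypothetical. Each user is hashed into a stable bucket from 0 to 99, so a feature can be enabled for a percentage of users and switched off instantly by setting the percentage to 0.

```python
import hashlib


class FeatureToggles:
    """In-memory toggle registry (hypothetical; names are illustrative)."""

    def __init__(self):
        self._rollout = {}  # feature name -> percentage of users (0..100)

    def set_rollout(self, feature, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature, user_id):
        percent = self._rollout.get(feature, 0)  # unknown features are off
        # Hash feature and user together so each user gets a stable bucket.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


toggles = FeatureToggles()
toggles.set_rollout("new-presence", 10)  # partial deployment to about 10% of users
enabled = sum(toggles.is_enabled("new-presence", f"user-{i}") for i in range(1000))
print(enabled)
toggles.set_rollout("new-presence", 0)   # instant kill switch, no restart needed
```

Because the bucket depends only on the feature and user, raising the percentage keeps every previously enabled user enabled, which makes a gradual rollout predictable.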
Future Directions
Looking ahead, League of Legends developers are focused on:
- Migrating remaining data off MySQL.
- Expanding chat availability beyond the game client, allowing players to connect and socialize outside of game sessions.
- Leveraging the social graph to enhance player experience and introduce new social features.
- Migrating in-game chat to the out-of-game chat infrastructure, aiming for a unified and more versatile chat platform.
Key Lessons Learned in Scaling League of Legends Chat
The journey of scaling League of Legends chat offers valuable lessons applicable to any large-scale online service:
- Expect Failures: Systems will inevitably fail. Design for resilience and fault tolerance. Prepare for scenarios like ISP outages causing massive player disconnects.
- Scale Amplifies Bugs: Rare bugs become frequent at scale. With roughly 1 billion events routed per server per day, even an issue that occurs once in a billion events can be expected to surface about once a day on every server.
- Visibility is Crucial: Deep understanding of system behavior is essential. Implement comprehensive monitoring, logging, and alerting to maintain system health.
- Strategic Approach: Define a clear scaling strategy. League of Legends chose horizontal scaling and embraced technologies like Riak and CRDTs to support this strategy.
- Iterative Development: “Make it work, then make it right, then make it fast.” Start with a functional system (Ejabberd) and evolve it iteratively based on real-world requirements.
- DevOps Culture: Embrace DevOps practices – automation, monitoring, feature flags, hot updates – to manage complexity and accelerate development cycles.
- Optimize Protocols: Tailor protocols to specific needs. Avoid generic, chatty protocols if a more streamlined approach suffices (e.g., optimized friendship protocol).
- Minimize Shared State: Shared mutable state becomes a major scaling bottleneck. Design systems to minimize or eliminate shared state dependencies.
- Leverage Social Data: Chat services naturally generate valuable social graph data. Utilize this data to improve user experience and create new features.
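The "Scale Amplifies Bugs" lesson above can be checked with simple arithmetic. The sketch below uses the figure of 1 billion events per server per day from the statistics earlier in the article; the fleet size of 200 servers is an assumption standing in for "hundreds", and the function name is hypothetical.

```python
EVENTS_PER_SERVER_PER_DAY = 1_000_000_000  # from the statistics in this article


def expected_daily_occurrences(one_in_n_events, servers):
    """Expected number of times per day a bug fires, given it hits 1 in N events."""
    return EVENTS_PER_SERVER_PER_DAY * servers / one_in_n_events


# A one-in-a-billion bug on a single server, then on an assumed fleet of 200.
print(expected_daily_occurrences(1_000_000_000, servers=1))
print(expected_daily_occurrences(1_000_000_000, servers=200))
```

Under these assumptions the bug is expected about once a day per server, or about 200 times a day across the fleet.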
By focusing on these principles and combining robust technologies with agile DevOps practices, League of Legends has built a chat system capable of handling the immense demands of its global player base, ensuring seamless communication and a high-quality gaming experience for its players.