Netflix, the name synonymous with on-demand entertainment, makes streaming movies and TV shows seem effortlessly simple. You press play, and video appears on your screen almost instantly. But behind this user-friendly interface lies a complex and fascinating infrastructure. This article dives deep into the architecture that powers Netflix, focusing on the crucial role of Netflix servers in delivering billions of hours of content globally.
To truly appreciate the scale of Netflix’s operation, consider these impressive statistics from 2017:
- Over 110 million subscribers worldwide.
- Operations spanning more than 200 countries.
- Nearly $3 billion in revenue per quarter.
- Over 5 million new subscribers added each quarter.
- More than 1 billion hours of video played weekly.
- 250 million hours of video streamed on a single day in 2017.
- Over 37% of peak internet traffic in the United States.
- A planned $7 billion investment in new content in 2018.
These numbers highlight a critical point: Netflix is massive. Its global reach, vast subscriber base, and sheer volume of streamed content demand an incredibly robust and efficient system. For Netflix, ensuring a seamless viewing experience is paramount. As a subscription-based service, viewer satisfaction directly translates to subscriber retention. If “play” doesn’t work, or buffering interrupts movie nights, Netflix risks losing valuable customers.
Netflix’s Two-Cloud Strategy: AWS and Open Connect
Netflix’s infrastructure is a compelling case study in cloud computing, showcasing sophisticated strategies for scalability, reliability, and global content delivery. Interestingly, Netflix doesn’t rely solely on a single cloud provider. Instead, it operates across two distinct cloud environments: Amazon Web Services (AWS) and its own custom-built Content Delivery Network (CDN) called Open Connect.
These two clouds work in tandem to provide the Netflix experience you know and love. AWS handles the backend operations, everything that happens before you press play, while Open Connect takes over after you hit play, managing the delivery of video content directly to your device.
The Three Pillars of Netflix Streaming: Client, Backend, and CDN
To understand how Netflix servers function, it’s helpful to visualize the system in three key parts:
- Client: This is the user interface you interact with – the Netflix app on your smartphone, the website on your computer, or the app on your smart TV. Netflix meticulously controls the client experience across all supported devices.
- Backend (AWS): Powered by AWS, the backend is responsible for all pre-playback operations. This includes:
  - Processing and preparing new video content.
  - Managing user accounts and subscriptions.
  - Handling user requests from all client devices.
  - Personalizing recommendations and artwork.
- Content Delivery Network (Open Connect): This is Netflix’s global CDN, the network of Netflix servers specifically designed for video delivery. Open Connect stores Netflix content in strategically located servers around the world. When you press play, the video streams from the nearest Open Connect server to your device.
This vertically integrated approach, controlling the client, backend, and CDN, allows Netflix to maintain end-to-end control over the viewing experience. This control is key to ensuring the reliability and quality that Netflix users expect globally.
From Datacenters to AWS: The Evolution of Netflix’s Backend
Netflix’s journey to the cloud was born out of necessity. Launched in 1998 as a DVD rental service, Netflix recognized the future of on-demand streaming. In 2007, they introduced their streaming service, a pivotal moment that coincided with the rise of cloud computing.
Initially, Netflix relied on its own datacenters to power its streaming service. They built two adjacent datacenters, but quickly encountered the challenges of managing physical infrastructure. Scaling was slow and cumbersome, requiring lengthy procurement and installation processes. This led to a reliance on vertical scaling, building monolithic applications on large, powerful computers. However, as Netflix’s growth exploded, this monolithic architecture struggled to maintain reliability.
The 2008 Outage and the AWS Migration
A critical three-day outage in August 2008, caused by database corruption that halted DVD shipments, served as a wake-up call. Netflix realized that managing datacenters was not their core competency. Their strength lay in delivering exceptional video experiences, not in infrastructure management.
This realization led to a bold decision: migrating to AWS. In 2008, AWS was still in its early stages, making Netflix’s choice a forward-thinking bet on the future of cloud computing. Netflix sought the reliability, scalability, and global reach that AWS promised. They wanted to eliminate single points of failure, move away from monolithic architectures, and expand globally without the burden of building more datacenters.
AWS took on the “undifferentiated heavy lifting”: the infrastructure work that does not directly contribute to Netflix’s competitive advantage. This allowed Netflix engineers to focus on innovation and on enhancing the core video streaming experience. The migration to AWS was a massive undertaking that took about seven years, from 2008 to early 2016. During this period, Netflix’s streaming customer base grew eightfold, demonstrating the power of AWS to support massive scale. Today, Netflix runs on hundreds of thousands of EC2 instances, a testament to their complete embrace of AWS for their backend operations.
Enhanced Reliability and Cost Efficiency with AWS
Moving to AWS significantly improved Netflix’s service reliability. While outages are never entirely avoidable, the frequency of major disruptions drastically decreased. The cloud infrastructure provided redundancy and resilience that was difficult and costly to achieve with their own datacenters.
Netflix uses three AWS regions (Northern Virginia; Portland, Oregon; and Dublin, Ireland), each with three availability zones. This multi-region architecture is designed for fault tolerance. If one region experiences an issue, traffic can be rerouted to another region, a process Netflix calls “region evacuation.” Netflix even conducts monthly tests that intentionally simulate region failures to ensure its systems can handle such events, with evacuation times as low as six minutes. This “global services model” means that users can be served from any region, maximizing uptime and availability.
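One way to picture region evacuation is as a recomputation of traffic weights across the regions that are still healthy. The following is a minimal sketch under that assumption, not Netflix's actual traffic-steering system; the region labels, the starting weights, and the `evacuate` helper are all hypothetical.

```python
def evacuate(weights: dict[str, float], failed: str) -> dict[str, float]:
    """Move the failed region's traffic share onto the remaining regions,
    in proportion to the share each one already carries."""
    if failed not in weights:
        raise KeyError(f"unknown region: {failed}")
    remaining = {region: w for region, w in weights.items() if region != failed}
    total = sum(remaining.values())
    if total == 0:
        raise RuntimeError("no healthy region left to absorb traffic")
    # Renormalise so the surviving weights sum to 1 again.
    return {region: w / total for region, w in remaining.items()}


# Hypothetical steady-state split across three regions.
weights = {"us-east": 0.5, "us-west": 0.25, "eu-west": 0.25}
after = evacuate(weights, "us-east")
print(after)
```

In practice the receiving regions would also need to scale up before accepting the extra load, which this sketch leaves out.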
Surprisingly, AWS also proved to be more cost-effective than operating their own datacenters. The elasticity of the cloud – the ability to scale resources up or down on demand – allowed Netflix to optimize costs. They could add servers during peak demand and reduce resources during off-peak hours, paying only for what they used. This dynamic resource allocation eliminated the need for over-provisioning and significantly reduced infrastructure costs per streaming view.
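The cost argument for elasticity comes down to simple arithmetic: a fixed fleet must be sized for the peak hour all day, while an elastic fleet pays only for what each hour needs. The sketch below illustrates this with made-up demand figures; the numbers and the `instance_hours` helper are assumptions for illustration, not Netflix data.

```python
import math


def instance_hours(demand_per_hour, capacity_per_instance, elastic):
    """Total instance-hours needed to serve an hourly demand profile."""
    needed = [math.ceil(d / capacity_per_instance) for d in demand_per_hour]
    if elastic:
        # Scale up and down each hour, paying only for what is used.
        return sum(needed)
    # Fixed fleet: provision for the peak hour and keep it running all the time.
    return max(needed) * len(needed)


demand = [200, 100, 100, 400, 1000, 600]  # hypothetical concurrent streams per hour
fixed = instance_hours(demand, 100, elastic=False)
elastic = instance_hours(demand, 100, elastic=True)
print(fixed, elastic)  # 60 24
```

With this invented profile, the elastic fleet uses 24 instance-hours against 60 for the fixed one; the gap grows as demand becomes more peaky.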
AWS: Powering Pre-Play Operations and Personalization
Within AWS, Netflix servers handle a vast array of tasks beyond just storing video. Anything that happens before you press play relies on the scalable computing and storage capabilities of AWS. This includes:
- Scalable Computing (EC2): EC2 instances provide the processing power for various Netflix services. When you browse the Netflix catalog, your device communicates with EC2 instances to fetch video lists and details.
- Scalable Storage (S3): S3 stores massive amounts of data, including video files before transcoding and various metadata.
- Scalable Distributed Databases (DynamoDB and Cassandra): These databases store user profiles, billing information, viewing history, and more. The distributed nature ensures data redundancy and fault tolerance across regions.
- Big Data Processing and Analytics: Netflix collects massive datasets on user behavior, viewing patterns, and content performance. Big data processing and analytics in AWS enable Netflix to personalize recommendations, optimize content acquisition, and enhance the user experience.
- Personalized Artwork: A prime example of data-driven personalization is Netflix’s customized header images. Using viewing history and preferences, Netflix selects artwork designed to resonate with individual users, increasing the likelihood of engagement. For instance, comedy fans might see a Good Will Hunting image featuring Robin Williams, while romance enthusiasts might see a version highlighting Matt Damon and Minnie Driver. Similarly, Pulp Fiction artwork might feature Uma Thurman for viewers who enjoy her films, and John Travolta for his fans. This personalized approach extends to recommendations, where machine learning algorithms analyze viewing data to curate the 40-50 video options displayed to each user out of thousands available.
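The artwork example above can be illustrated with a toy scoring rule: tag each artwork variant, then pick the variant whose tags best match what a viewer tends to watch. This is a deliberately simple stand-in for whatever models Netflix actually uses; the tags, the affinity scores, and the `pick_artwork` function are hypothetical.

```python
def pick_artwork(variants: dict[str, set[str]], affinity: dict[str, float]) -> str:
    """Return the artwork variant whose tags best match the viewer's tastes."""
    def score(tags: set[str]) -> float:
        # Sum the viewer's affinity for every tag attached to the variant.
        return sum(affinity.get(tag, 0.0) for tag in tags)

    return max(variants, key=lambda name: score(variants[name]))


# Hypothetical variants for one title, tagged by what each image emphasises.
variants = {
    "robin_williams": {"comedy"},
    "damon_and_driver": {"romance"},
}
comedy_fan = {"comedy": 0.9, "romance": 0.2}
romance_fan = {"comedy": 0.1, "romance": 0.8}

print(pick_artwork(variants, comedy_fan))   # robin_williams
print(pick_artwork(variants, romance_fan))  # damon_and_driver
```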
Transcoding: Preparing Video for Diverse Devices
Before content reaches your screen, Netflix must transcode source media – the high-definition video received from production houses – into a multitude of formats optimized for different devices and network conditions. This computationally intensive process is performed in AWS, utilizing up to 300,000 CPUs concurrently, rivaling the power of supercomputers.
The source media undergoes rigorous validation to identify and correct any digital artifacts or errors. It then enters a “media pipeline,” a series of over 70 software processes that break down the video into smaller chunks for parallel encoding. This parallelism significantly accelerates the transcoding process, allowing Netflix to encode and prepare content for CDN distribution in as little as 30 minutes.
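The chunk-and-encode-in-parallel idea can be sketched in a few lines. The example below is an illustration only: `encode_chunk` is a placeholder rather than a real encoder, frames are represented as integers, and a thread pool stands in for the large fleet of machines a real pipeline would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames: list[int], chunk_size: int) -> list[list[int]]:
    """Cut the source into fixed-size chunks that can be encoded independently."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def encode_chunk(chunk: list[int]) -> list[str]:
    # Placeholder for a real encoder invocation.
    return [f"enc({frame})" for frame in chunk]


def encode_title(frames: list[int], chunk_size: int = 4, workers: int = 3) -> list[str]:
    chunks = split_into_chunks(frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the encoded
        # chunks can simply be concatenated back together.
        encoded_chunks = pool.map(encode_chunk, chunks)
        return [frame for chunk in encoded_chunks for frame in chunk]


print(encode_title(list(range(10))))
```

Because each chunk is independent, adding workers shortens the wall-clock time without changing the result.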
The transcoding process generates a vast array of files for each title. Supporting over 2,200 different devices, Netflix creates an “encoding profile” for each video, optimizing for device-specific formats, network speeds, audio quality, and subtitle languages. For example, The Crown requires approximately 1,200 files, while Stranger Things Season 2, shot in 8K, resulted in 9,570 files.
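To see why the file count per title grows so quickly, consider how a few independent encoding dimensions multiply. The values below are hypothetical and a real set of encoding profiles is not a full cross product of every option, but the arithmetic shows the effect.

```python
from itertools import product

# Hypothetical encoding dimensions for a single title.
resolutions = ["480p", "720p", "1080p", "2160p"]
codecs = ["h264", "hevc", "vp9"]
bitrates_kbps = [235, 750, 1750, 3000, 5800]

# One output file per combination of resolution, codec, and bitrate.
video_files = [
    f"title_{res}_{codec}_{rate}k"
    for res, codec, rate in product(resolutions, codecs, bitrates_kbps)
]
print(len(video_files))  # 4 * 3 * 5 = 60
```

Adding audio tracks and subtitle languages as further dimensions pushes the total toward the four-digit counts quoted above.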
Open Connect: Netflix’s Purpose-Built CDN
Once video content is transcoded and ready for streaming, Open Connect, Netflix’s custom CDN, takes center stage. A CDN is a network of content distribution servers strategically located globally to deliver content efficiently to users.
Netflix’s CDN strategy has evolved over time. Initially, they used a small, in-house CDN, followed by third-party CDNs. However, recognizing video delivery as a core competency and a potential competitive advantage, Netflix developed Open Connect in 2012.
Open Connect offers several key advantages:
- Cost Efficiency: Bypassing expensive third-party CDN fees, Open Connect significantly reduces delivery costs at Netflix’s massive scale.
- Enhanced Quality: End-to-end control over the video path, from encoding to CDN to client, allows Netflix to optimize video quality and viewing experience.
- Scalability and Optimization: Designed specifically for Netflix’s needs – delivering large video streams to a known subscriber base – Open Connect enables targeted optimizations that generic CDNs cannot achieve. Knowing their users and content allows Netflix to fine-tune its CDN for peak performance.
Open Connect Appliances (OCAs): The Workhorse Netflix Servers
At the heart of Open Connect are Netflix Open Connect Appliances (OCAs), the dedicated Netflix servers responsible for storing and streaming video content. These servers are strategically placed in over 1,000 locations worldwide.
OCAs are high-performance servers optimized for delivering large video files, equipped with substantial storage capacity using hard disks or flash drives. They are grouped into clusters for redundancy and scalability. Different types of OCAs exist, ranging from large servers storing the entire Netflix catalog to smaller caches holding a subset of content.
From a hardware perspective, OCAs are built using commodity PC components, emphasizing cost-effectiveness and scalability. Software-wise, they run on FreeBSD and utilize NGINX as the web server for streaming video.
The number of OCAs at a location depends on factors like desired reliability, anticipated traffic volume, and the percentage of traffic intended for local streaming. When you press play, you are likely streaming video from an OCA server located geographically close to you.
Strategic OCA Placement: ISPs and IXPs
Netflix employs a unique strategy for OCA placement, partnering with Internet Service Providers (ISPs) and leveraging Internet Exchange Points (IXPs). Instead of building and operating its own global network of datacenters, Netflix offers OCAs free to ISPs to host within their networks. They also place OCAs at or near IXPs.
Partnering with ISPs: ISPs provide internet access to end-users and are geographically distributed, close to customers. By placing OCAs within ISP networks, Netflix effectively positions its video servers closer to viewers, reducing latency and improving streaming performance.
Leveraging IXPs: IXPs are datacenters where ISPs and CDNs interconnect to exchange internet traffic. Placing OCAs at IXPs provides another strategic point for content distribution, facilitating efficient traffic exchange and further reducing network distance to users.
This approach avoids the complexity and expense of building a proprietary global network, while still achieving the benefits of a distributed CDN architecture.
Proactive Caching: Anticipating Viewing Demand
To ensure content is readily available, Netflix uses “proactive caching.” Each night, OCAs proactively retrieve video content based on predicted demand.
Netflix analyzes viewing data to predict content popularity in each region. Based on these predictions, a service in AWS instructs each OCA on which videos it should cache for the next day. OCAs then retrieve the designated content, often copying from nearby OCAs if available, or from larger regional caches. This “pre-positioning” of content ensures that popular videos are already present on local Netflix servers when users request them, minimizing latency and maximizing streaming speed.
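The nightly fill can be sketched as two decisions: which titles an OCA should hold, and where it should copy each one from. The code below is a simplified illustration that assumes a greedy most-popular-first policy; the function names, titles, sizes, and capacity are hypothetical and do not describe Netflix's real placement algorithm.

```python
def plan_fill(predicted_views: dict[str, int], sizes_gb: dict[str, int],
              capacity_gb: int) -> list[str]:
    """Greedily choose the most popular titles that still fit on the OCA."""
    plan, used = [], 0
    for title in sorted(predicted_views, key=predicted_views.get, reverse=True):
        if used + sizes_gb[title] <= capacity_gb:
            plan.append(title)
            used += sizes_gb[title]
    return plan


def pick_source(title: str, nearby_ocas: dict[str, set[str]],
                regional_cache: str = "regional-cache") -> str:
    """Prefer a nearby OCA that already holds the title, else the regional cache."""
    for oca, titles in nearby_ocas.items():
        if title in titles:
            return oca
    return regional_cache


predicted = {"A": 900, "B": 500, "C": 300, "D": 50}  # predicted views tomorrow
sizes = {"A": 40, "B": 70, "C": 30, "D": 20}         # title sizes in GB
plan = plan_fill(predicted, sizes, capacity_gb=100)
print(plan)  # ['A', 'C', 'D']
print(pick_source("A", {"oca-1": {"A"}}))  # oca-1
print(pick_source("C", {"oca-1": {"A"}}))  # regional-cache
```

Title B is skipped here because it would overflow the 100 GB budget once A is placed, which shows why a real planner has to weigh popularity against size.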
Netflix operates a tiered caching system, with smaller OCAs in ISPs and IXPs, larger regional caches, and central repositories. This tiered approach optimizes storage and bandwidth utilization. Due to proactive caching and intelligent content placement, Open Connect virtually eliminates “cache misses” – requests for content not present on the local server. Netflix knows precisely where each video resides within its CDN at all times.
Win-Win for Netflix and ISPs: Reduced Congestion and Improved Experience
Hosting OCAs within their networks offers significant benefits to ISPs. By serving Netflix traffic locally from OCAs, ISPs reduce the amount of traffic traversing their networks and the broader internet backbone.
Since Netflix accounts for a substantial portion of internet traffic, localizing this traffic within ISP networks alleviates congestion and reduces the need for costly network capacity upgrades. This translates to cost savings for ISPs and improved network performance for all users. ISPs benefit from reduced transit costs, while Netflix ensures a high-quality streaming experience for its subscribers, creating a mutually beneficial partnership.
Reliability and Resilience of Open Connect
Open Connect mirrors AWS’s reliability through its distributed architecture. OCAs operate independently, creating a resilient network. If an OCA fails, the Netflix client automatically switches to another available OCA, ensuring uninterrupted streaming. Similarly, if an OCA becomes overloaded or network conditions degrade, the client intelligently selects a better-performing server. This inherent redundancy and dynamic client adaptation make Open Connect a highly reliable and resilient CDN.
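Client-side failover can be illustrated as a selection over a list of candidate servers, using whatever the client last measured for each one. This is a minimal sketch assuming the client tracks a throughput figure per OCA and marks unreachable ones as `None`; the hostnames and the `choose_oca` function are hypothetical.

```python
from typing import Optional


def choose_oca(candidates: list[str],
               throughput_mbps: dict[str, Optional[float]]) -> str:
    """Pick the reachable OCA with the best measured throughput."""
    reachable = [url for url in candidates if throughput_mbps.get(url) is not None]
    if not reachable:
        raise RuntimeError("no reachable OCA; the client would request a new list")
    return max(reachable, key=lambda url: throughput_mbps[url])


candidates = ["oca-a.example", "oca-b.example", "oca-c.example"]
probes = {"oca-a.example": None, "oca-b.example": 42.0, "oca-c.example": 18.5}

first = choose_oca(candidates, probes)
print(first)  # oca-b.example

# If the chosen server fails mid-stream, the client re-runs the selection.
probes["oca-b.example"] = None
second = choose_oca(candidates, probes)
print(second)  # oca-c.example
```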
Client Control: Ensuring Seamless Playback
Netflix’s control over the client application is crucial for managing playback seamlessly. Whether it’s a mobile app, a web browser, or a smart TV app, Netflix develops and maintains the client experience. Even on platforms where Netflix doesn’t directly build the client app, they control the Software Development Kit (SDK) used by device manufacturers.
This client-side control empowers Netflix to adapt to varying network conditions, handle OCA failures, and optimize the streaming experience dynamically and transparently. The SDK allows consistent behavior across all devices, ensuring a uniform Netflix experience regardless of the platform.
Pressing Play: The Journey from Request to Streaming
So, what precisely happens when you press play on Netflix? The seemingly simple action triggers a complex sequence of events:
- Play Request: You select a video and press play on your Netflix client. The client sends a “play” request to Netflix’s Playback Apps service running in AWS.
- License Verification: The Playback Apps service verifies your license to view the requested content in your geographic location. Licensing complexities are a major reason Netflix invests in original content, aiming for global, simultaneous releases.
- OCA URL Retrieval: The Playback Apps service, considering your location and ISP information, returns URLs for up to ten optimal OCA servers.
- Intelligent Client Selection: The client intelligently probes the network connection quality to each listed OCA server, selecting the fastest and most reliable one. This probing continues throughout the streaming session.
- Content Probing and Streaming: The client establishes a connection with the chosen OCA, probes for optimal streaming parameters, and begins streaming video content to your device.
- Adaptive Streaming: Throughout playback, the client continuously monitors network conditions. If network quality fluctuates, the client adaptively adjusts video quality to minimize buffering and maintain a smooth viewing experience. This adaptive streaming is why you might notice picture quality changes during playback, dynamically adjusting to network capacity.
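The adaptive streaming step is often implemented as a choice from a "bitrate ladder": the client picks the highest rung that fits within its measured throughput, with some headroom. The sketch below shows that general technique; the ladder values, the 0.8 safety factor, and the `pick_bitrate` function are assumptions, not Netflix's actual client logic.

```python
LADDER_KBPS = [235, 750, 1750, 3000, 5800]  # hypothetical bitrate ladder


def pick_bitrate(throughput_kbps: float, ladder=LADDER_KBPS, safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a safety margin of throughput."""
    budget = throughput_kbps * safety
    affordable = [rate for rate in ladder if rate <= budget]
    # Fall back to the lowest rung rather than stalling playback entirely.
    return max(affordable) if affordable else min(ladder)


for measured in (10000, 4000, 1000, 100):
    print(measured, "->", pick_bitrate(measured))
```

Re-running this choice as throughput measurements change is what produces the visible quality shifts during playback.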
In conclusion, watching Netflix is far from simple behind the scenes. It’s a testament to sophisticated engineering and a brilliant orchestration of cloud services, Netflix servers, and intelligent client technology. Netflix’s architecture demonstrates how to build a globally scalable, reliable, and user-centric streaming platform, delivering entertainment seamlessly to millions worldwide.