October 8, 2019
Exploring NVMe/TCP: Leveraging Standard TCP/IP Networks for NVMe SAN
NVMe-over-TCP, often abbreviated as NVMe/TCP, is the most recent addition to the NVMe-over-Fabrics (NVMe-oF) family of protocols. Its defining characteristic is that it encapsulates NVMe commands in standard TCP/IP packets, allowing it to run over ubiquitous, cost-effective networks.
All NVMe-over-Fabrics protocols are designed to let servers and SAN arrays communicate using NVMe commands, which are significantly more efficient for flash storage devices than the legacy SCSI commands originally developed for hard disk drives. Like NVMe-over-RoCE, NVMe-over-TCP transmits these optimized commands over physical Ethernet networks. Its key advantage, however, is that it does not require administrators to deploy and configure specialized equipment: where RoCE depends on RDMA-capable network cards and lossless (data center bridging) switch configurations, NVMe/TCP works with ordinary NICs and switches, simplifying deployment and reducing costs.
Advantages and Disadvantages of NVMe/TCP
NVMe-over-TCP is one of the most economically viable NVMe-oF options, primarily because it leverages existing Ethernet infrastructure, which is generally less expensive than the Fibre Channel networks required by NVMe-over-FC. Its implementation is also considerably simpler, capitalizing on the widespread familiarity and manageability of TCP/IP networks.
A significant benefit stemming from NVMe/TCP’s inherent routability is the capability for servers and storage arrays to communicate seamlessly across existing corporate networks. This eliminates the requirement for dedicated switching infrastructure. Extending this further, communication is even possible over the internet, enabling scenarios such as remote data backup to geographically separated storage facilities or connecting branch office servers to headquarters storage.
[Figure: schematic representation of the NVMe-over-TCP protocol stack]
However, NVMe-over-TCP is not without its drawbacks. A primary concern is the computational overhead it introduces on the server. The processing required for TCP operations, particularly the calculation of packet checksums for error detection, consumes server CPU resources that would otherwise be available for application workloads. This can impact overall server performance, especially in CPU-intensive environments.
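To make that overhead concrete, the following is a minimal pure-Python illustration of the 16-bit one's-complement checksum algorithm (RFC 1071) that underlies TCP's error detection. In practice this runs in the kernel or is offloaded to the NIC, but every byte of every segment must pass through this kind of per-byte arithmetic, which is where the CPU cost comes from.

```python
# Illustrative sketch: the 16-bit one's-complement checksum (RFC 1071)
# that underlies TCP's per-segment error detection.

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data`."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):  # sum the data as 16-bit big-endian words
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# Every command capsule and every block of payload crossing the wire is
# subject to this kind of per-byte work.
print(hex(internet_checksum(b"example NVMe/TCP payload")))
```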
Another recognized disadvantage is higher latency than other NVMe-over-Fabrics protocols. The increase is partly due to the mechanisms TCP employs to guarantee reliable delivery, such as buffering copies of in-flight data so that any segments lost en route can be retransmitted. Performance evaluations have indicated that NVMe-over-TCP connections can exhibit latencies 10 to 80 microseconds higher than NVMe-over-RoCE connections under identical network conditions. The exact impact depends heavily on the specific protocol implementation and the nature of the data being transferred. Industry analysts expect ongoing optimization of vendors' protocol implementations to reduce this latency over time.
It’s important to note that the NVMe-over-TCP specification primarily describes a software-based implementation that integrates within the TCP/IP stack of server operating systems or array controllers. Despite this software focus, hardware acceleration for NVMe-over-TCP is technically feasible and could potentially mitigate some of the performance overhead.
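As an illustration of that software path, the sketch below shows one plausible way to attach a Linux host to an NVMe/TCP target using the in-kernel nvme-tcp module and the standard nvme-cli utility. The target address, port, and NQN are placeholder values, not details from the article.

```python
# Hypothetical sketch: connecting a Linux host to an NVMe/TCP target with
# the standard nvme-cli tool (requires root and an installed nvme-cli).
# Address, port, and NQN below are placeholders; substitute your array's.
import subprocess

TARGET_ADDR = "192.0.2.10"   # placeholder array controller IP
TARGET_PORT = "4420"         # IANA-assigned NVMe/TCP port
TARGET_NQN = "nqn.2019-10.example.com:array1"  # placeholder NVMe Qualified Name

# Load the kernel's NVMe/TCP initiator module.
subprocess.run(["modprobe", "nvme-tcp"], check=True)

# Ask the target's discovery controller which subsystems it exposes.
subprocess.run(
    ["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)

# Connect; the remote namespaces then appear as ordinary /dev/nvmeXnY
# block devices on the host.
subprocess.run(
    ["nvme", "connect", "-t", "tcp",
     "-n", TARGET_NQN, "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)
```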
NVMe-over-TCP: How It Functionally Operates
From a technical perspective, TCP defines how data is broken into packets for transmission across Ethernet links between a server and a storage array controller. The protocol delivers these packets reliably to the recipient, where the original information is reassembled in the correct order, ensuring the data arrives complete and intact.
TCP operates in conjunction with IP, the Internet Protocol, which is responsible for addressing and routing packets across networks. This fundamental interoperability is what makes NVMe-over-TCP compatible with virtually any TCP/IP network, including wide area networks like the internet.
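Because the transport is plain TCP/IP, reaching an NVMe/TCP target is, at the network level, no different from opening any other TCP connection. The minimal sketch below assumes a hypothetical target address and uses port 4420, the IANA-registered NVMe/TCP port.

```python
# Minimal sketch: an NVMe/TCP session begins as an ordinary TCP connection,
# so any routed IP network (including a WAN) can carry it. The address is a
# placeholder for illustration.
import socket

TARGET = ("192.0.2.10", 4420)  # placeholder target IP, IANA NVMe/TCP port

with socket.create_connection(TARGET, timeout=5) as sock:
    # Standard TCP tuning applies; disabling Nagle's algorithm is a common
    # choice for latency-sensitive storage traffic.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    print("TCP connection established; NVMe/TCP PDU exchange starts here.")
```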
The TCP/IP network model is conventionally structured into four layers: Application, Transport, Network, and Physical. In the context of NVMe-over-TCP:
- The Application Layer corresponds to the NVMe command set.
- The Transport Layer is handled by TCP.
- The Network Layer is managed by IP.
- The Physical Layer is typically Ethernet.
In a practical NVMe-over-TCP communication flow, the initiating server first opens a TCP connection to the storage array controller. Once the controller accepts, the server sends an initialization PDU (Protocol Data Unit) announcing the parameters it will use. A PDU defines a structured format for data transfer, including packet size and composition, and carries the mechanisms that manage the transfer itself, such as sequencing fields for ordered delivery and status information in control tags within each packet. The controller then replies with its own initialization PDU describing its transmission parameters for the return data path.
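To make this concrete, here is a sketch of how an initiator might lay out that first initialization PDU, following the ICReq (Initialize Connection Request) format defined in the NVMe/TCP transport specification. It is an illustration of the wire format, not a complete implementation.

```python
# Sketch: packing the 8-byte common header shared by every NVMe/TCP PDU,
# followed by the fields of an ICReq (Initialize Connection Request), the
# first PDU an initiator sends. Field layout follows the NVMe/TCP transport
# specification; treat this as illustrative, not a full implementation.
import struct

ICREQ_PDU_TYPE = 0x00  # ICReq; the controller answers with ICResp (0x01)
ICREQ_LEN = 128        # ICReq PDUs are a fixed 128 bytes

def build_icreq(maxr2t: int = 0) -> bytes:
    # Common header: PDU-type, FLAGS, HLEN, PDO, then PLEN (32-bit LE).
    common = struct.pack("<BBBBI", ICREQ_PDU_TYPE, 0, ICREQ_LEN, 0, ICREQ_LEN)
    # ICReq-specific fields: PDU format version, host PDU data alignment,
    # digest-enable flags, and the maximum outstanding R2T count.
    specific = struct.pack("<HBBI", 0, 0, 0, maxr2t)
    reserved = bytes(ICREQ_LEN - len(common) - len(specific))
    return common + specific + reserved

pdu = build_icreq()
assert len(pdu) == ICREQ_LEN
```

The controller answers with an ICResp PDU of the same fixed size, after which command capsules and data PDUs can begin to flow.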
NVMe-over-TCP communications utilize various types of PDUs, each carrying specific information related to the protocol, message format, and in-order data delivery. Crucially, each communication session involves two data flows: one for transmitting the actual data, and another for acknowledgements confirming successful data exchange.
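For reference, these are the PDU types the transport specification defines, summarized here as a simple enumeration; the type codes come from the specification, while the enum itself is just illustrative packaging.

```python
# The PDU vocabulary of an NVMe/TCP session, with type codes assigned by
# the NVMe/TCP transport specification. Command capsules and data PDUs
# carry the actual I/O; responses and R2T credits form the return flow.
from enum import IntEnum

class PduType(IntEnum):
    ICREQ = 0x00         # host -> controller: initialize connection
    ICRESP = 0x01        # controller -> host: initialization response
    H2C_TERM_REQ = 0x02  # host asks to terminate the connection
    C2H_TERM_REQ = 0x03  # controller asks to terminate the connection
    CAPSULE_CMD = 0x04   # host -> controller: NVMe command capsule
    CAPSULE_RESP = 0x05  # controller -> host: NVMe response capsule
    H2C_DATA = 0x06      # host -> controller data (e.g. write payload)
    C2H_DATA = 0x07      # controller -> host data (e.g. read payload)
    R2T = 0x09           # controller grants the host credit to send data
```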
Key Vendors in the NVMe-over-TCP Space: Lightbits Labs and Solarflare Communications
Lightbits Labs and Solarflare Communications were among the pioneering vendors to implement NVMe-over-TCP in their storage solutions.
Lightbits Labs adopted NVMe-over-TCP to underpin its elastic storage architecture, which lets users scale capacity by adding disk enclosures without a performance penalty at each increment. NVMe-over-TCP also allows these storage blocks to sit anywhere in the data center, optimizing resource utilization without being constrained by network topology.
Solarflare (now part of Xilinx) offered solutions with benefits similar to those of Lightbits Labs, but also developed its own accelerator cards to address potential latency concerns. The company partnered with Supermicro to deliver turnkey appliances combining hardware acceleration with NVMe-over-TCP for enhanced performance.
Article originally published on TechTarget France, LeMagIT.
Additional Resources for Deeper Understanding
NVMe over TCP – Learn more about NVMe/TCP technology.
Kubernetes Persistent Storage – Explore persistent storage solutions for Kubernetes.
Edge Cloud Storage – Discover the future of storage in edge cloud environments.
Ceph Storage – Understand Ceph storage performance and latency considerations.
Disaggregated Storage – Delve into the concept and advantages of disaggregated storage architectures.
NVMe Storage Explained: NVMe Shared Storage, NVMe-oF, and More – Gain comprehensive knowledge about NVMe storage technologies.