
A quick tour of Infiniband


When considering setting up a Cloud infrastructure, the network interconnect is one of the key elements. This is especially true for the storage part of it, such as Ceph.

1Gb Ethernet has been a commodity setup for a long time, while 10Gb Ethernet remains an option. It's now time to think outside the box. What if 10Gb isn't enough? Can I afford another setup to unleash the full potential of a server?

This article is not about using Infiniband with OpenStack or from the virtual machines. It is about setting up Infiniband on 3 nodes and estimating what we can achieve with it. We'll stay focused on the interconnect performance and keep as close as possible to the hardware.

What setup to start with?

In our case, we are running 3 HP DL360 Gen8 servers, each equipped with a dual-port Mellanox ConnectX-3 card (MCX354A-FCBT) connected to a Mellanox SX6018 switch. This card features dual FDR 56Gb/s or 40/56GbE ports. It's also important to note that these cards sit on a PCI Express 3.0 x8 bus.

All servers are running Ubuntu 12.04. Then it's up to you to choose the Infiniband drivers you want to use. Two options exist:

  • Using the built-in drivers from Ubuntu
  • Using the Mellanox drivers

Using built-in drivers

The first solution is pretty straightforward to set up, as only a small set of packages needs to be installed:

apt-get install infiniband-diags ibutils ibverbs-utils qlvnictools srptools sdpnetstat rds-tools rdmacm-utils perftest libmthca1 libmlx4-1 libipathverbs1

Note that, as per bug 1037107, it is recommended to upgrade libmlx4 from the Quantal repository.
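Once these packages are installed, a quick sanity check verifies that the adapter and the link are detected. The commands below come from the packages installed above; the exact output will depend on your hardware:

# Check that the Infiniband kernel modules are loaded
lsmod | grep -E 'mlx4_ib|ib_ipoib'

# Show adapter and port state (from infiniband-diags)
ibstat

# Show low-level device capabilities (from ibverbs-utils)
ibv_devinfo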

Using Mellanox drivers

Mellanox provides on their download page a series of tarballs featuring pre-built packages for the main Linux distributions like Red Hat, SUSE, Ubuntu and Fedora. These pre-built packages have a few advantages:

  • Up-to-date hardware support
  • Updated firmware (firmware can also be downloaded from this page)
  • Latest features
  • Kernel modules compiled via DKMS

In the Ubuntu 12.04 case, the main drawback is the need to downgrade the kernel from the 3.8 series to the 3.5 one. It sounds like this will be fixed in a couple of months.

The installation process is pretty easy:

./mlnxofedinstall --enable-sriov --force-fw-update

As you can notice, this script flashes the latest firmware provided in this set of packages (2.30.3110 in our case) with SR-IOV enabled. To get more information about SR-IOV, which will not be discussed here, you can watch this video or this presentation.

Note that under Ubuntu 12.04, you need the Mellanox drivers to get good SR-IOV support.

Configuring the adapter

Depending on your cables, your adapter can work in Ethernet or Infiniband mode. This setting is per-port and can be tuned via the port_type_array option of the mlx4_core kernel module. In our case, we use the Infiniband (1) port type, as shown in the sketch below.
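As a sketch, forcing both ports into Infiniband mode could look like the following; the file name under /etc/modprobe.d/ is arbitrary, and a reboot (or a reload of the mlx4 module stack) is needed for the change to take effect:

# /etc/modprobe.d/mlx4.conf (hypothetical file name)
# One value per port: 1 = Infiniband, 2 = Ethernet
options mlx4_core port_type_array=1,1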

This quick benchmark will be done on top of IP, so we need to activate the IPoIB feature by setting IPOIB_LOAD=yes in /etc/infiniband/openib.conf and restarting the openibd service.
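In practice, this boils down to something like the following (assuming the openib.conf path shipped by the packages above):

# Enable IPoIB at service start-up
sed -i 's/^IPOIB_LOAD=.*/IPOIB_LOAD=yes/' /etc/infiniband/openib.conf

# Restart the Infiniband stack so the ib_ipoib module gets loaded
service openibd restart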

With SR-IOV disabled (the default), two network devices appear: one per physical port. With a dual-port card, the ib0 and ib1 interfaces are available and can be used like any other network device.

Infiniband adapters can run in two different modes: connected or datagram.
To choose one or the other, you can do the following:

mode=datagram; ifconfig ib0 down; echo $mode > /sys/class/net/ib0/mode ; ifconfig ib0 up

To make this choice persistent, select the proper value of SET_IPOIB_CM in /etc/infiniband/openib.conf.

In datagram mode, the MTU is set to 4K, while connected mode uses a 64K MTU. Note that in Infiniband mode, the MTU cannot be tuned via ethtool.
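A quick way to check which mode a port is running in, and the MTU that goes with it:

# Current IPoIB mode of the port: "datagram" or "connected"
cat /sys/class/net/ib0/mode

# The MTU follows the mode (4K in datagram, 64K in connected mode)
ip link show ib0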

Running the benchmark

In this quick tour, we'd like to understand how much data a server can handle when using the IP over IB (IPoIB) feature. Infiniband provides a verbs API to do low-level I/O, but very few applications are able to use it. Through the IP interface, all IP-capable applications get the benefits of Infiniband, even if the performance is degraded compared to what the verbs API is capable of.

All benchmarks were run using iperf3 in TCP, bidirectional mode, with 30-second runs.

The benchmark procedure is described in a very simple meta-language to increase precision and reduce verbosity.
To estimate the bandwidth one server can achieve, the test is run as follows:

test_1_vs_1() {
    from server1:
        for stream_nb in 1 2 3 4 6 8 16 32 64; do
            benchmark server2 with $stream_nb streams
        done
}

Then we run the same test against two servers simultaneously:

test_1_vs_2() {
    from server1:
        for stream_nb in 1 2 3 4 8 16 32; do
            benchmark server2 with $stream_nb streams &
            benchmark server3 with $stream_nb streams
        done
}

Those two tests are run in both connected and datagram modes:

select_mode("datagram")
test_1_vs_1()
test_1_vs_2()

select_mode("connected")
test_1_vs_1()
test_1_vs_2()
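For reference, a minimal bash translation of the first test could look like this, assuming an iperf3 server is already listening on server2 (iperf3 -s) and that the iperf3 build supports bidirectional runs (--bidir, iperf3 >= 3.7); on older builds, two opposite-direction runs can be combined instead:

#!/bin/bash
# Hypothetical sketch of test_1_vs_1, run from server1
SERVER=server2   # IPoIB address or hostname of the target node
DURATION=30      # each run lasts 30 seconds

for stream_nb in 1 2 3 4 6 8 16 32 64; do
    echo "=== ${stream_nb} parallel stream(s) ==="
    # -P: number of parallel TCP streams, -t: duration in seconds
    iperf3 -c "$SERVER" -P "$stream_nb" -t "$DURATION" --bidir
done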

Results

Datagram Mode

[Figure: iperf3 in datagram mode (IB)]

Connected Mode

[Figure: iperf3 in connected mode (IB)]

Datagram vs Connected

[Figure: iperf3, connected vs datagram mode compared]

Discussion

In both configurations (datagram and connected):

  • the bandwidth handled by the host is divided into two equal parts, sending and receiving data; as a result, some plots overlap and only one of the two traces (Recv or Sent) is visible
  • the maximum bandwidth is reached when using two peers
  • using two peers provides a 2x performance increase

The connected mode reports 56Gbit/sec of traffic with up to 8 simultaneous streams. This amount of data is the maximum we can expect from this setup. In this configuration, the Infiniband setup provides 2.5x more bandwidth than a 10Gb setup can achieve.

Latency

Latency is also one of the key strengths of Infiniband. In these short tests, measured TCP latencies were between 13 and 17µs, compared to 40 to 50µs on Ethernet.
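The article does not detail the latency measurement tool; as an illustration, a TCP round-trip latency test between two nodes could be run with qperf, assuming it is installed on both sides (server2 is a placeholder hostname):

# On the remote node: start qperf in server mode
qperf

# On the local node: measure TCP latency (and bandwidth for comparison)
qperf server2 tcp_lat tcp_bw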

Conclusion

This quick tour of Infiniband demonstrated that:

  • Installing and configuring Infiniband is very easy
  • In bidirectional mode, bandwidth can be 2.5x better than a 10GbE setup
  • TCP latency is almost 4 times lower

Infiniband provides a high-bandwidth, low-latency interconnect that can be used as a backend for IP-based applications. Distributed storage solutions or other demanding applications could benefit from such a setup.

