===================
Setting up NFS/RDMA
===================

:Author:
  NetApp and Open Grid Computing (May 29, 2008)

.. warning::
  This document is probably obsolete.

Overview
========

This document describes how to install and set up the Linux NFS/RDMA client
and server software.

The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.

In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both InfiniBand and iWARP
RDMA adapters.

Getting Help
============

If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.

Installation
============

These instructions are a step-by-step guide to building a machine for
use with NFS/RDMA.

- Install an RDMA device

  Any device supported by the drivers in drivers/infiniband/hw is acceptable.

  Testing has been performed using several Mellanox-based IB cards, the
  Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

- Install a Linux distribution and tools

  The first kernel release to contain both the NFS/RDMA client and server was
  Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
  Linux kernel releases should be installed.

  The procedures described in this document have been tested with
  distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

- Install nfs-utils-1.1.2 or greater on the client

  An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  version with support for NFS/RDMA mounts, but for various reasons we
  recommend using nfs-utils-1.1.2 or greater). To see which version of
  mount.nfs you are using, type:

  .. code-block:: sh

    $ /sbin/mount.nfs -V

  If the version is less than 1.1.2 or the command does not exist,
  you should install the latest version of nfs-utils.
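
  The version comparison can also be scripted. The following is a minimal
  sketch, assuming GNU ``sort`` (for its ``-V`` version ordering) and assuming
  that the first dotted number printed by ``mount.nfs -V`` is the nfs-utils
  version. ``version_ge`` is a hypothetical helper name, not part of
  nfs-utils:

  .. code-block:: sh

    # version_ge A B: succeed when version A is greater than or equal to B.
    version_ge() {
        # sort -V orders version strings; B must sort first (or be equal).
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # Extract the first dotted version number from the tool's output.
    # The result is empty if the command is missing.
    installed=$(/sbin/mount.nfs -V 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1) || true

    if [ -n "$installed" ] && version_ge "$installed" 1.1.2; then
        echo "mount.nfs $installed supports NFS/RDMA mounts"
    else
        echo "mount.nfs is missing or older than 1.1.2"
    fi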

  Download the latest package from: https://www.kernel.org/pub/linux/utils/nfs

  Uncompress the package and follow the installation instructions.

  If you will not need the idmapper and gssd executables (you do not need
  these to create an NFS/RDMA-enabled mount command), the installation
  process can be simplified by disabling these features when running
  configure:

  .. code-block:: sh

    $ ./configure --disable-gss --disable-nfsv4

  To build nfs-utils you will need the tcp_wrappers package installed. For
  more information on this see the package's README and INSTALL files.

  After building the nfs-utils package, there will be a mount.nfs binary in
  the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  or v4 mounts. To initiate a v4 mount, the binary must be called
  mount.nfs4.  The standard technique is to create a symlink called
  mount.nfs4 to mount.nfs.

  This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

  .. code-block:: sh

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

  In this location, mount.nfs will be invoked automatically for NFS mounts
  by the system mount command.
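
  The symlink used for v4 mounts could be created as follows. This is a
  sketch only: ``link_mount_nfs4`` is a hypothetical helper, and it assumes
  that mount.nfs has already been copied into the target directory:

  .. code-block:: sh

    # link_mount_nfs4 DIR: create DIR/mount.nfs4 as a symlink to mount.nfs.
    link_mount_nfs4() {
        dir=${1:-/sbin}
        # A relative target keeps the link valid if DIR is relocated.
        ln -sf mount.nfs "$dir/mount.nfs4"
    }

    # Typical use, as root:  link_mount_nfs4 /sbin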

  .. note::
    mount.nfs, and therefore nfs-utils-1.1.2 or greater, is needed only
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

- Install a Linux kernel with NFS/RDMA

  The NFS/RDMA client and server are both included in the mainline Linux
  kernel version 2.6.25 and later. This and other versions of the Linux
  kernel can be found at: https://www.kernel.org/pub/linux/kernel/

  Download the sources and place them in an appropriate location.

- Configure the RDMA stack

  Make sure your kernel configuration has RDMA support enabled. Under
  Device Drivers -> InfiniBand support, update the kernel configuration
  to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

  Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  iWARP adapter support (amso, cxgb3, etc.).

  If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

- Configure the NFS client and server

  Your kernel configuration must also have NFS file system support and/or
  NFS server support enabled. These and other NFS related configuration
  options can be found under File Systems -> Network File Systems.

- Build, install, reboot

  The NFS/RDMA code will be enabled automatically if NFS and RDMA
  are turned on. The NFS/RDMA client and server are configured via the hidden
  SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  value of SUNRPC_XPRT_RDMA will be:

    #. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
       and server will not be built

    #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
       in this case the NFS/RDMA client and server will be built as modules

    #. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
       and server will be built into the kernel

  Therefore, if you have followed the steps above and turned on NFS and RDMA,
  the NFS/RDMA client and server will be built.
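
  The three cases above amount to taking the weaker of the two dependency
  settings. A minimal sketch of that rule, together with a check of an
  existing configuration, might look like this. ``xprt_rdma_value`` is a
  hypothetical helper, and the ``.config`` path assumes the commands are run
  from the top of a configured kernel source tree:

  .. code-block:: sh

    # xprt_rdma_value SUNRPC INFINIBAND: print y, m, or n.
    # Each argument is expected to be one of y, m, or n.
    xprt_rdma_value() {
        case "$1$2" in
            *n*) echo n ;;   # either dependency off: not built
            yy)  echo y ;;   # both built in: built into the kernel
            *)   echo m ;;   # both on, at least one modular: module
        esac
    }

    xprt_rdma_value y m    # prints "m"

    # Show the relevant symbols in an existing configuration, if present.
    if [ -r .config ]; then
        grep -E '^CONFIG_(SUNRPC|INFINIBAND|SUNRPC_XPRT_RDMA)=' .config ||
            echo "no matching symbols are set"
    fi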

  Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
========================

Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.

- Check RDMA Setup

  If you built the RDMA components as modules, load them at
  this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  card:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

  If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  running on the network. If your IB switch has an embedded SM, you can
  use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  of your end nodes.

  If an SM is running on your network, you should see the following:

  .. code-block:: sh

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

  where driverX is mthca0, ipath5, ehca3, etc.
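
  To check every device and port at once, a loop over sysfs can help. This is
  a sketch: ``show_port_states`` is a hypothetical helper, and its optional
  argument exists only so that a different sysfs root can be supplied:

  .. code-block:: sh

    # Print the state of every RDMA port found under a sysfs root.
    show_port_states() {
        root=${1:-/sys/class/infiniband}
        for f in "$root"/*/ports/*/state; do
            # Skip the unexpanded pattern when no RDMA device is present.
            [ -r "$f" ] || continue
            printf '%s: %s\n' "$f" "$(cat "$f")"
        done
    }

    show_port_states

  A port that has been brought up by an SM should be reported with the
  ``4: ACTIVE`` state shown above.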

  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):

  .. code-block:: sh

    host1$ ip link set dev ib0 up
    host1$ ip address add dev ib0 a.b.c.x
    host2$ ip link set dev ib0 up
    host2$ ip address add dev ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

  For other device types, follow the appropriate procedures.

- Check NFS Setup

  For the NFS components enabled above (client and/or server),
  test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
==============

We recommend that you use two machines, one to act as the client and
one to act as the server.

One-time configuration:
-----------------------

- On the server system, configure the /etc/exports file and start the NFS/RDMA server.

  Exports entries with the following formats have been tested::

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

  The IP addresses are the client's IPoIB addresses for an InfiniBand
  HCA or the client's iWARP addresses for an RNIC.

  .. note::
    The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.
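
  Because a missing "insecure" option is easy to overlook, a quick scan of
  the exports file may be useful. The following is a minimal sketch that
  checks each line as a whole and assumes option lists are written in the
  parenthesized form shown above. ``exports_without_insecure`` is a
  hypothetical helper:

  .. code-block:: sh

    # Print export lines whose option list lacks "insecure".
    exports_without_insecure() {
        # Drop blank lines and comments, then drop lines that carry the
        # option as a whole word between "(" or "," and "," or ")".
        grep -vE '^[[:space:]]*(#|$)' "${1:-/etc/exports}" |
            grep -vE '[(,]insecure[,)]' || true
    }

    exports_without_insecure /etc/exports

  No output means every export line on the server carries the option.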

Each time a machine boots:
--------------------------

- Load and configure the RDMA drivers

  For InfiniBand using a Mellanox adapter:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ip link set dev ib0 up
    $ ip address add dev ib0 a.b.c.d

  .. note::
    Please use unique addresses for the client and server!

- Start the NFS server

  If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA transport module:

  .. code-block:: sh

    $ modprobe svcrdma

  Regardless of how the server was built (module or built-in), start the
  server:

  .. code-block:: sh

    $ /etc/init.d/nfs start

  or

  .. code-block:: sh

    $ service nfs start

  Instruct the server to listen on the RDMA transport:

  .. code-block:: sh

    $ echo rdma 20049 > /proc/fs/nfsd/portlist
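
  To confirm that the listener was added, the same file can be read back.
  This is a sketch, assuming the kernel reports each listener as a transport
  name followed by a port number. ``has_rdma_listener`` is a hypothetical
  helper:

  .. code-block:: sh

    # Succeed when the port list contains an RDMA listener on port 20049.
    has_rdma_listener() {
        grep -qE '^rdma[[:space:]]+20049$' "${1:-/proc/fs/nfsd/portlist}"
    }

    if has_rdma_listener 2>/dev/null; then
        echo "nfsd is listening for RDMA on port 20049"
    else
        echo "no RDMA listener found"
    fi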

- On the client system

  If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA client module:

  .. code-block:: sh

    $ modprobe xprtrdma

  Regardless of how the client was built (module or built-in), use this
  command to mount the NFS/RDMA server:

  .. code-block:: sh

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

  To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  the "proto" field for the given mount.
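
  The check can also be scripted. The following sketch lists the mount points
  of NFS mounts whose options include ``proto=rdma``. ``rdma_mounts`` is a
  hypothetical helper, and the field positions assume the usual /proc/mounts
  layout (device, mount point, type, options):

  .. code-block:: sh

    # Print the mount point of every NFS mount that uses the RDMA transport.
    rdma_mounts() {
        awk '$3 ~ /^nfs/ && $4 ~ /(^|,)proto=rdma(,|$)/ { print $2 }' \
            "${1:-/proc/mounts}"
    }

    rdma_mounts

  If the mount above succeeded over RDMA, its mount point should appear in
  the output.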

  Congratulations! You're using NFS/RDMA!