1. 30 January 2015, 9 commits
  2. 26 November 2014, 6 commits
    • xprtrdma: Display async errors · 7ff11de1
      Chuck Lever authored
      An async error upcall is a hard error, and should be reported in
      the system log.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
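      A minimal sketch of the idea, assuming hypothetical names (this is
      not the actual xprtrdma upcall): an IB async event handler that
      reports the event in the system log with pr_err().

          #include <rdma/ib_verbs.h>

          /* Hedged sketch: log an async error upcall. The handler name
           * and the unused context pointer are illustrative. */
          static void example_async_handler(struct ib_event *event, void *context)
          {
                  pr_err("RPC/RDMA: async error %d on device %s\n",
                         event->event, event->device->name);
          }

      Such a handler would be installed via the event_handler field of
      struct ib_qp_init_attr, or the matching argument of ib_create_cq().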
    • xprtrdma: Re-write rpcrdma_flush_cqs() · 5c166bef
      Chuck Lever authored
      Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
      and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.
      
      1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
         Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
         different threads using the same wc buffers (ep->rep_recv_wcs
         and ep->rep_send_wcs), added by commit 1c00dd07 ("xprtrmda:
         Reduce calls to ib_poll_cq() in completion handlers").
      
         During transport disconnect processing, this sometimes resulted
         in the same reply getting added to the rpcrdma_tasklets_g list
         more than once, which corrupted the list.
      
      2. The upcall functions drain only a limited number of CQEs,
         thanks to the poll budget added by commit 8301a2c0
         ("xprtrdma: Limit work done by completion handler").
      
      Fixes: a7bc211a ("xprtrdma: On disconnect, don't ignore ... ")
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
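      A minimal sketch of the approach, with illustrative names (not the
      actual rpcrdma_flush_cqs()): drain the CQ to empty with a private
      on-stack ib_wc, so the flush neither shares wc buffers with the
      provider upcall nor stops at a poll budget.

          #include <rdma/ib_verbs.h>

          /* Hedged sketch: poll until the CQ is empty, discarding the
           * flushed completions. The on-stack wc cannot race with an
           * upcall running in another thread. */
          static void example_flush_cq(struct ib_cq *cq)
          {
                  struct ib_wc wc;

                  while (ib_poll_cq(cq, 1, &wc) > 0)
                          ;
          }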
    • xprtrdma: Refactor tasklet scheduling · f1a03b76
      Chuck Lever authored
      Restore the separate function that schedules the reply handling
      tasklet. I need to call it from two different paths.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
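      A minimal sketch of such a helper, with illustrative list, lock,
      and tasklet names: splice the replies onto a global list under a
      lock, then schedule the tasklet. Both the receive upcall and the
      flush path could call it.

          #include <linux/interrupt.h>
          #include <linux/list.h>
          #include <linux/spinlock.h>

          static LIST_HEAD(example_tasklets_list);
          static DEFINE_SPINLOCK(example_tasklets_lock);
          static void example_reply_func(unsigned long data);
          /* pre-5.9 three-argument DECLARE_TASKLET signature */
          static DECLARE_TASKLET(example_tasklet, example_reply_func, 0UL);

          /* Hedged sketch of a separate scheduling function. */
          static void example_schedule_tasklet(struct list_head *sched_list)
          {
                  unsigned long flags;

                  spin_lock_irqsave(&example_tasklets_lock, flags);
                  list_splice_tail_init(sched_list, &example_tasklets_list);
                  spin_unlock_irqrestore(&example_tasklets_lock, flags);
                  tasklet_schedule(&example_tasklet);
          }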
    • xprtrdma: unmap all FMRs during transport disconnect · 467c9674
      Chuck Lever authored
      When using RPCRDMA_MTHCAFMR memory registration, after a few
      transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
      return EINVAL because the provider has exhausted its map pool.
      
      Make sure that all FMRs are unmapped during transport disconnect,
      and that ->send_request remarshals them during an RPC retransmit.
      This resets the transport's MRs to ensure that none are leaked
      during a disconnect.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
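      A minimal sketch of the disconnect-time reset, with an illustrative
      MR bookkeeping structure: gather every FMR onto one list and hand
      the list to ib_unmap_fmr(), which unmaps them all in a single call.

          #include <rdma/ib_verbs.h>

          /* Hypothetical per-MR wrapper; xprtrdma's real structure differs. */
          struct example_mw {
                  struct ib_fmr           *fmr;
                  struct list_head        mw_list;
          };

          /* Hedged sketch: unmap all FMRs so the provider's map pool is
           * replenished when the transport disconnects. */
          static void example_reset_fmrs(struct list_head *all_mws)
          {
                  LIST_HEAD(fmr_list);
                  struct example_mw *mw;
                  int rc;

                  list_for_each_entry(mw, all_mws, mw_list)
                          list_add_tail(&mw->fmr->list, &fmr_list);

                  rc = ib_unmap_fmr(&fmr_list);
                  if (rc)
                          pr_warn("example: ib_unmap_fmr returned %d\n", rc);
          }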
    • xprtrdma: Cap req_cqinit · e7104a2a
      Chuck Lever authored
      Recent work made FRMR registration and invalidation completions
      unsignaled. This greatly reduces the adapter interrupt rate.
      
      Every so often, however, a posted send Work Request is allowed to
      signal. Otherwise, the provider's Work Queue will wrap and the
      workload will hang.
      
      The number of Work Requests that are allowed to remain unsignaled is
      determined by the value of rep_cqinit. Currently, this is set to the
      size of the send Work Queue divided by two, minus 1.
      
      For FRMR, the send Work Queue is the maximum number of concurrent
      RPCs (currently 32) times the maximum number of Work Requests an
      RPC might use (currently 7, though some adapters may need more).
      
      For mlx4, this is 224 entries. This leaves completion signaling
      disabled for 111 send Work Requests.
      
      Some providers hold back dispatching Work Requests until a CQE is
      generated.  If completions are disabled, then no CQEs are generated
      for quite some time, and that can stall the Work Queue.
      
      I've seen this occur running xfstests generic/113 over NFSv4, where
      eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
      because the Work Queue has overflowed. The connection is dropped
      and re-established.
      
      Cap the rep_cqinit setting so completions are not left turned off
      for too long.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
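      A minimal sketch of the capping logic; EXAMPLE_MAX_UNSIGNALED is an
      illustrative limit, not necessarily the value the patch chose. For
      the 224-entry mlx4 queue described above, this yields 32 unsignaled
      sends instead of 111.

          /* Hypothetical cap on consecutive unsignaled send WRs. */
          #define EXAMPLE_MAX_UNSIGNALED  32

          /* Hedged sketch: half the send queue minus one, bounded so a
           * CQE is generated often enough to keep the Work Queue moving. */
          static unsigned int example_cqinit(unsigned int max_send_wr)
          {
                  unsigned int cqinit = max_send_wr / 2 - 1;

                  if (cqinit > EXAMPLE_MAX_UNSIGNALED)
                          cqinit = EXAMPLE_MAX_UNSIGNALED;
                  return cqinit;
          }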
    • xprtrdma: Return an errno from rpcrdma_register_external() · 92b98361
      Chuck Lever authored
      The RPC/RDMA send_request method and the chunk registration code
      expect an errno from the registration function. This allows
      the upper layers to distinguish between a recoverable failure
      (for example, temporary memory exhaustion) and a hard failure
      (for example, a bug in the registration logic).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
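      A minimal sketch of what a real errno buys the caller, with
      illustrative function names: the send path can requeue an RPC on a
      recoverable failure and fail it outright on a hard one.

          #include <linux/errno.h>

          struct example_req;
          /* Hypothetical registration entry point; returns a negative
           * errno on failure. */
          int example_register_external(struct example_req *req);

          /* Hedged sketch: distinguish recoverable from hard failures. */
          static int example_marshal_chunks(struct example_req *req)
          {
                  int rc = example_register_external(req);

                  if (rc == -ENOMEM)
                          return -ENOMEM; /* recoverable: retry later */
                  if (rc < 0)
                          return -EIO;    /* hard: fail this RPC */
                  return 0;
          }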
  3. 25 November 2014, 1 commit
  4. 01 August 2014, 19 commits
  5. 23 July 2014, 1 commit
    • xprtrdma: Fix DMA-API-DEBUG warning by checking dma_map result · bf858ab0
      Yan Burman authored
      Fix the following warning, seen when DMA-API debugging is enabled, by checking the result of ib_dma_map_single():
      [ 1455.345548] ------------[ cut here ]------------
      [ 1455.346863] WARNING: CPU: 3 PID: 3929 at /home/yanb/kernel/net-next/lib/dma-debug.c:1140 check_unmap+0x4e5/0x990()
      [ 1455.349350] mlx4_core 0000:00:07.0: DMA-API: device driver failed to check map error[device address=0x000000007c9f2090] [size=2656 bytes] [mapped as single]
      [ 1455.349350] Modules linked in: xprtrdma netconsole configfs nfsv3 nfs_acl ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm autofs4 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc dm_mirror dm_region_hash dm_log microcode pcspkr mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en ipv6 ptp pps_core vxlan mlx4_core virtio_balloon cirrus ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea i2c_piix4 i2c_core button ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd ata_generic ata_piix libata
      [ 1455.349350] CPU: 3 PID: 3929 Comm: mount.nfs Not tainted 3.15.0-rc1-dbg+ #13
      [ 1455.349350] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      [ 1455.349350]  0000000000000474 ffff880069dcf628 ffffffff8151c341 ffffffff817b69d8
      [ 1455.349350]  ffff880069dcf678 ffff880069dcf668 ffffffff8105b5fc 0000000069dcf658
      [ 1455.349350]  ffff880069dcf778 ffff88007b0c9f00 ffffffff8255ec40 0000000000000a60
      [ 1455.349350] Call Trace:
      [ 1455.349350]  [<ffffffff8151c341>] dump_stack+0x52/0x81
      [ 1455.349350]  [<ffffffff8105b5fc>] warn_slowpath_common+0x8c/0xc0
      [ 1455.349350]  [<ffffffff8105b6e6>] warn_slowpath_fmt+0x46/0x50
      [ 1455.349350]  [<ffffffff812e6305>] check_unmap+0x4e5/0x990
      [ 1455.349350]  [<ffffffff81521fb0>] ? _raw_spin_unlock_irq+0x30/0x60
      [ 1455.349350]  [<ffffffff812e6a0a>] debug_dma_unmap_page+0x5a/0x60
      [ 1455.349350]  [<ffffffffa0389583>] rpcrdma_deregister_internal+0xb3/0xd0 [xprtrdma]
      [ 1455.349350]  [<ffffffffa038a639>] rpcrdma_buffer_destroy+0x69/0x170 [xprtrdma]
      [ 1455.349350]  [<ffffffffa03872ff>] xprt_rdma_destroy+0x3f/0xb0 [xprtrdma]
      [ 1455.349350]  [<ffffffffa04a95ff>] xprt_destroy+0x6f/0x80 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a9625>] xprt_put+0x15/0x20 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a899a>] rpc_free_client+0x8a/0xe0 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a8a58>] rpc_release_client+0x68/0xa0 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a9060>] rpc_shutdown_client+0xb0/0xc0 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a8f5d>] ? rpc_ping+0x5d/0x70 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a91ab>] rpc_create_xprt+0xbb/0xd0 [sunrpc]
      [ 1455.349350]  [<ffffffffa04a9273>] rpc_create+0xb3/0x160 [sunrpc]
      [ 1455.349350]  [<ffffffff81129749>] ? __probe_kernel_read+0x69/0xb0
      [ 1455.349350]  [<ffffffffa053851c>] nfs_create_rpc_client+0xdc/0x100 [nfs]
      [ 1455.349350]  [<ffffffffa0538cfa>] nfs_init_client+0x3a/0x90 [nfs]
      [ 1455.349350]  [<ffffffffa05391c8>] nfs_get_client+0x478/0x5b0 [nfs]
      [ 1455.349350]  [<ffffffffa0538e50>] ? nfs_get_client+0x100/0x5b0 [nfs]
      [ 1455.349350]  [<ffffffff81172c6d>] ? kmem_cache_alloc_trace+0x24d/0x260
      [ 1455.349350]  [<ffffffffa05393f3>] nfs_create_server+0xf3/0x4c0 [nfs]
      [ 1455.349350]  [<ffffffffa0545ff0>] ? nfs_request_mount+0xf0/0x1a0 [nfs]
      [ 1455.349350]  [<ffffffffa031c0c3>] nfs3_create_server+0x13/0x30 [nfsv3]
      [ 1455.349350]  [<ffffffffa0546293>] nfs_try_mount+0x1f3/0x230 [nfs]
      [ 1455.349350]  [<ffffffff8108ea21>] ? get_parent_ip+0x11/0x50
      [ 1455.349350]  [<ffffffff812d6343>] ? __this_cpu_preempt_check+0x13/0x20
      [ 1455.349350]  [<ffffffff810d632b>] ? try_module_get+0x6b/0x190
      [ 1455.349350]  [<ffffffffa05449f7>] nfs_fs_mount+0x187/0x9d0 [nfs]
      [ 1455.349350]  [<ffffffffa0545940>] ? nfs_clone_super+0x140/0x140 [nfs]
      [ 1455.349350]  [<ffffffffa0543b20>] ? nfs_auth_info_match+0x40/0x40 [nfs]
      [ 1455.349350]  [<ffffffff8117e360>] mount_fs+0x20/0xe0
      [ 1455.349350]  [<ffffffff811a1c16>] vfs_kern_mount+0x76/0x160
      [ 1455.349350]  [<ffffffff811a29a8>] do_mount+0x428/0xae0
      [ 1455.349350]  [<ffffffff811a30f0>] SyS_mount+0x90/0xe0
      [ 1455.349350]  [<ffffffff8152af52>] system_call_fastpath+0x16/0x1b
      [ 1455.349350] ---[ end trace f1f31572972e211d ]---
      Signed-off-by: Yan Burman <yanb@mellanox.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
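      A minimal sketch of the fix, with an illustrative wrapper name:
      consult ib_dma_mapping_error() after every ib_dma_map_single() so
      a failed mapping is caught and DMA-API debugging sees the result
      checked.

          #include <rdma/ib_verbs.h>

          /* Hedged sketch: map a buffer and verify the mapping before
           * use; returns -ENOMEM if the device could not map it. */
          static int example_map_buffer(struct ib_device *device, void *buf,
                                        size_t len, u64 *dma_addr)
          {
                  *dma_addr = ib_dma_map_single(device, buf, len,
                                                DMA_BIDIRECTIONAL);
                  if (ib_dma_mapping_error(device, *dma_addr))
                          return -ENOMEM;
                  return 0;
          }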
  6. 04 June 2014, 4 commits
    • xprtrdma: Remove BUG_ON() call sites · c977dea2
      Chuck Lever authored
      If an error occurs in the marshaling logic, fail the RPC request
      being processed, but leave the client running.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Remove Tavor MTU setting · 5bc4bc72
      Chuck Lever authored
      Clean up.  Remove HCA-specific clutter in xprtrdma, which is
      supposed to be device-independent.
      
      Hal Rosenstock <hal@dev.mellanox.co.il> observes:
      > Note that there is OpenSM option (enable_quirks) to return 1K MTU
      > in SA PathRecord responses for Tavor so that can be used for this.
      > The default setting for enable_quirks is FALSE so that would need
      > changing.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting · ec62f40d
      Chuck Lever authored
      Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
      disconnect, his HCA is failing to create a fresh QP, leaving
      ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
      wake up and post LOCAL_INV as they exit, causing an oops.
      
      rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
      creation error code (-EPERM in this case) to the RPC client's
      generic layer. xprt_connect_status() does not recognize -EPERM, so
      it kills pending RPC tasks immediately rather than retrying the
      connect.
      
      Re-arrange the QP creation logic so that when it fails on reconnect,
      it leaves ->qp with the old QP rather than NULL.  If pending RPC
      tasks wake and exit, LOCAL_INV work requests will flush rather than
      oops.
      
      On initial connect, leaving ->qp == NULL is OK, since there are no
      pending RPCs that might use ->qp. But be sure not to try to destroy
      a NULL QP when rpcrdma_ep_connect() is retried.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
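      A minimal sketch of the re-arranged ordering, using illustrative
      names and a hypothetical helper: build the replacement cm_id and QP
      first, and tear the old ones down only after the new ones exist, so
      ->qp is never left NULL while RPCs are pending.

          #include <rdma/rdma_cm.h>

          struct example_xprt {
                  struct rdma_cm_id *cm_id;
          };

          /* Hypothetical helper that allocates a fresh cm_id with a QP. */
          int example_fresh_id_and_qp(struct example_xprt *xprt,
                                      struct rdma_cm_id **new_id);

          /* Hedged sketch: on failure the old QP stays in place, so
           * waking RPCs see flushed completions instead of oopsing. */
          static int example_reconnect(struct example_xprt *xprt)
          {
                  struct rdma_cm_id *new_id;
                  int rc;

                  rc = example_fresh_id_and_qp(xprt, &new_id);
                  if (rc)
                          return rc;      /* old xprt->cm_id->qp intact */

                  rdma_destroy_qp(xprt->cm_id);
                  rdma_destroy_id(xprt->cm_id);
                  xprt->cm_id = new_id;
                  return 0;
          }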
    • xprtrdma: Reduce the number of hardway buffer allocations · 65866f82
      Chuck Lever authored
      While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
      settings determine whether an inline request is used, or whether
      read or write chunks lists are built. The current default value of
      these settings is 1024. Any RPC request smaller than 1024 bytes is
      sent to the NFS server completely inline.
      
      rpcrdma_buffer_create() allocates and pre-registers a set of RPC
      buffers for each transport instance, also based on the inline rsize
      and wsize settings.
      
      RPC/RDMA requests and replies are built in these buffers. However,
      if an RPC/RDMA request is expected to be larger than 1024, a buffer
      has to be allocated and registered for that RPC, and deregistered
      and released when the RPC is complete. This is known as a
      "hardway allocation."
      
      Since the introduction of NFSv4, the size of RPC requests has become
      larger, and hardway allocations are thus more frequent. Hardway
      allocations are significant overhead, and they waste the existing
      RPC buffers pre-allocated by rpcrdma_buffer_create().
      
      We'd like fewer hardway allocations.
      
      Increasing the size of the pre-registered buffers is the most direct
      way to do this. However, a blanket increase of the inline thresholds
      has interoperability consequences.
      
      On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
      bytes for each RPC request buffer, using kmalloc(). Due to internal
      fragmentation, this wastes nearly 1200 bytes because kmalloc()
      already returns an 8192-byte piece of memory for a 7000-byte
      allocation request, though the extra space remains unused.
      
      So let's round up the size of the pre-allocated buffers, and make
      use of the unused space in the kmalloc'd memory.
      
      This change reduces the amount of hardway allocated memory for an
      NFSv4 general connectathon run from 1322092 to 9472 bytes (a 99% reduction).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
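      A minimal sketch of the rounding idea using ksize() (a sketch of
      the concept, not the patch's actual sizing code): whatever slack
      kmalloc() hands back becomes usable buffer space instead of waste.

          #include <linux/slab.h>

          /* Hedged sketch: request the size we want, then let ksize()
           * report what kmalloc() actually returned, e.g. 8192 bytes for
           * a 7000-byte request, and use all of it. */
          static void *example_alloc_rpc_buffer(size_t wanted, size_t *usable)
          {
                  void *p = kmalloc(wanted, GFP_KERNEL);

                  if (p)
                          *usable = ksize(p);
                  return p;
          }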