1. 01 4月, 2017 2 次提交
    • T
      nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type · f17f8a14
      Tigran Mkrtchyan 提交于
      this fix aims to fix dereferencing of a mirror in an error state when MDS
      returns unsupported DS type (IOW, not v3), which causes the following oops:
      
      [  220.370709] BUG: unable to handle kernel NULL pointer dereference at 0000000000000065
      [  220.370842] IP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
      [  220.370920] PGD 0
      
      [  220.370972] Oops: 0000 [#1] SMP
      [  220.371013] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm arc4 snd_hda_codec_hdmi iwldvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate mac80211 xor uvcvideo
      [  220.371814]  videobuf2_vmalloc videobuf2_memops snd_hda_codec_idt mei_wdt videobuf2_v4l2 snd_hda_codec_generic iTCO_wdt ppdev videobuf2_core iTCO_vendor_support dell_rbtn dell_wmi iwlwifi sparse_keymap dell_laptop dell_smbios snd_hda_intel dcdbas videodev snd_hda_codec dell_smm_hwmon snd_hda_core media cfg80211 intel_uncore snd_hwdep raid6_pq snd_seq intel_rapl_perf snd_seq_device joydev i2c_i801 rfkill lpc_ich snd_pcm parport_pc mei_me parport snd_timer dell_smo8800 mei snd shpchp soundcore tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 nouveau mxm_wmi ttm i2c_algo_bit drm_kms_helper crc32c_intel e1000e drm sdhci_pci firewire_ohci sdhci serio_raw mmc_core firewire_core ptp crc_itu_t pps_core wmi fjes video
      [  220.372568] CPU: 7 PID: 4988 Comm: cat Not tainted 4.10.5-200.fc25.x86_64 #1
      [  220.372647] Hardware name: Dell Inc. Latitude E6520/0J4TFW, BIOS A06 07/11/2011
      [  220.372729] task: ffff94791f6ea580 task.stack: ffffb72b88c0c000
      [  220.372802] RIP: 0010:ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
      [  220.372883] RSP: 0018:ffffb72b88c0f970 EFLAGS: 00010246
      [  220.372945] RAX: 0000000000000000 RBX: ffff9479015ca600 RCX: ffffffffffffffed
      [  220.373025] RDX: ffffffffffffffed RSI: ffff9479753dc980 RDI: 0000000000000000
      [  220.373104] RBP: ffffb72b88c0f988 R08: 000000000001c980 R09: ffffffffc0ea6112
      [  220.373184] R10: ffffef17477d9640 R11: ffff9479753dd6c0 R12: ffff9479211c7440
      [  220.373264] R13: ffff9478f45b7790 R14: 0000000000000001 R15: ffff9479015ca600
      [  220.373345] FS:  00007f555fa3e700(0000) GS:ffff9479753c0000(0000) knlGS:0000000000000000
      [  220.373435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  220.373506] CR2: 0000000000000065 CR3: 0000000196044000 CR4: 00000000000406e0
      [  220.373586] Call Trace:
      [  220.373627]  nfs4_ff_layout_prepare_ds+0x5e/0x200 [nfs_layout_flexfiles]
      [  220.373708]  ff_layout_pg_init_read+0x81/0x160 [nfs_layout_flexfiles]
      [  220.373806]  __nfs_pageio_add_request+0x11f/0x4a0 [nfs]
      [  220.373886]  ? nfs_create_request.part.14+0x37/0x330 [nfs]
      [  220.373967]  nfs_pageio_add_request+0xb2/0x260 [nfs]
      [  220.374042]  readpage_async_filler+0xaf/0x280 [nfs]
      [  220.374103]  read_cache_pages+0xef/0x1b0
      [  220.374166]  ? nfs_read_completion+0x210/0x210 [nfs]
      [  220.374239]  nfs_readpages+0x129/0x200 [nfs]
      [  220.374293]  __do_page_cache_readahead+0x1d0/0x2f0
      [  220.374352]  ondemand_readahead+0x17d/0x2a0
      [  220.374403]  page_cache_sync_readahead+0x2e/0x50
      [  220.374460]  generic_file_read_iter+0x6c8/0x950
      [  220.374532]  ? nfs_mapping_need_revalidate_inode+0x17/0x40 [nfs]
      [  220.374617]  nfs_file_read+0x6e/0xc0 [nfs]
      [  220.374670]  __vfs_read+0xe2/0x150
      [  220.374715]  vfs_read+0x96/0x130
      [  220.374758]  SyS_read+0x55/0xc0
      [  220.374801]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [  220.374856] RIP: 0033:0x7f555f570bd0
      [  220.374900] RSP: 002b:00007ffeb73e1b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  220.374986] RAX: ffffffffffffffda RBX: 00007f555f839ae0 RCX: 00007f555f570bd0
      [  220.375066] RDX: 0000000000020000 RSI: 00007f555fa41000 RDI: 0000000000000003
      [  220.375145] RBP: 0000000000021010 R08: ffffffffffffffff R09: 0000000000000000
      [  220.375226] R10: 00007f555fa40010 R11: 0000000000000246 R12: 0000000000022000
      [  220.375305] R13: 0000000000021010 R14: 0000000000001000 R15: 0000000000002710
      [  220.375386] Code: 66 66 90 55 48 89 e5 41 54 53 49 89 fc 48 83 ec 08 48 85 f6 74 2e 48 8b 4e 30 48 89 f3 48 81 f9 00 f0 ff ff 77 1e 48 85 c9 74 15 <48> 83 79 78 00 b8 01 00 00 00 74 2c 48 83 c4 08 5b 41 5c 5d c3
      [  220.375653] RIP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles] RSP: ffffb72b88c0f970
      [  220.375748] CR2: 0000000000000065
      [  220.403538] ---[ end trace bcdca752211b7da9 ]---
      Signed-off-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      f17f8a14
    • O
      NFSv4.1 fix infinite loop on IO BAD_STATEID error · 0e3d3e5d
      Olga Kornievskaia 提交于
      Commit 63d63cbf "NFSv4.1: Don't recheck delegations that
      have already been checked" introduced a regression where when a
      client received BAD_STATEID error it would not send any TEST_STATEID
      and instead go into an infinite loop of resending the IO that caused
      the BAD_STATEID.
      
      Fixes: 63d63cbf ("NFSv4.1: Don't recheck delegations that have already been checked")
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      0e3d3e5d
  2. 31 3月, 2017 1 次提交
  3. 28 3月, 2017 4 次提交
  4. 18 3月, 2017 7 次提交
    • W
      pNFS/flexfiles: never nfs4_mark_deviceid_unavailable · da066f3f
      Weston Andros Adamson 提交于
      The flexfiles layout should never mark a device unavailable.
      
      Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
      directly from files layout where it's still needed.
      
      The flexfiles driver still handles marked devices in error paths, but will
      now print a rate limited warning.
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      da066f3f
    • W
      pNFS: return status from nfs4_pnfs_ds_connect · a33e4b03
      Weston Andros Adamson 提交于
      The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it
      can wait on another context to reach the same failure.
      
      This checks that the rpc_create succeeded and returns the error to the
      caller.
      
      When an error is returned, both the files and flexfiles layouts will return
      NULL from _prepare_ds(). The flexfiles layout will also return the layout
      with the error NFS4ERR_NXIO.
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      a33e4b03
    • O
      NFSv4.1 respect server's max size in CREATE_SESSION · 03385332
      Olga Kornievskaia 提交于
      Currently client doesn't respect max sizes server returns in CREATE_SESSION.
      nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0
      so they never get set to the sizes returned by the server.
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      03385332
    • O
      NFS prevent double free in async nfs4_exchange_id · 63513232
      Olga Kornievskaia 提交于
      Since rpc_task is async, the release function should be called which
      will free the impl_id, scope, and owner.
      
      Trond pointed at 2 more problems:
      -- use of client pointer after free in the nfs4_exchangeid_release() function
      -- cl_count mismatch if rpc_run_task() isn't run
      
      Fixes: 8d89bd70 ("NFS setup async exchange_id")
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # 4.9
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      63513232
    • J
      nfs: make nfs4_cb_sv_ops static · 05fae7bb
      Jason Yan 提交于
      Fixes the following sparse warning:
      
      fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not
      declared. Should it be static?
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      05fae7bb
    • C
      xprtrdma: Squelch kbuild sparse complaint · eed50879
      Chuck Lever 提交于
      New complaint from kbuild for 4.9.y:
      
      net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in
          comparison expression (different type sizes)
      
      verbs.c:
      489	max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);
      
      I can't reproduce this running sparse here. Likewise, "make W=1
      net/sunrpc/xprtrdma/verbs.o" never indicated any issue.
      
      A little poking suggests that because the range of its values is
      small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES
      smaller than the width of an unsigned integer.
      
      Fixes: 16f906d6 ("xprtrdma: Reduce required number of send SGEs")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      eed50879
    • K
      NFS: fix the fault nrequests decreasing for nfs_inode COPY · 38a33101
      Kinglong Mee 提交于
      The nfs_commit_file for NFSv4.2's COPY operation goes through
      the commit path for normal WRITE, but without increase nrequests,
      so, the nrequests decreased in nfs_commit_release_pages is fault.
      After that, the nrequests will be wrong.
      
      [ 5670.299881] ------------[ cut here ]------------
      [ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs]
      [ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache]
      [ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G        W   E   4.11.0-rc1+ #519
      [ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [ 5670.304094] Call Trace:
      [ 5670.304510]  dump_stack+0x63/0x86
      [ 5670.304917]  __warn+0xcb/0xf0
      [ 5670.305276]  warn_slowpath_null+0x1d/0x20
      [ 5670.305661]  nfs_clear_inode+0x66/0x90 [nfs]
      [ 5670.306093]  nfs4_evict_inode+0x61/0x70 [nfsv4]
      [ 5670.306480]  evict+0xbb/0x1c0
      [ 5670.306888]  dispose_list+0x4d/0x70
      [ 5670.307233]  evict_inodes+0x178/0x1a0
      [ 5670.307579]  generic_shutdown_super+0x44/0xf0
      [ 5670.307985]  nfs_kill_super+0x21/0x40 [nfs]
      [ 5670.308325]  deactivate_locked_super+0x43/0x70
      [ 5670.308698]  deactivate_super+0x5a/0x60
      [ 5670.309036]  cleanup_mnt+0x3f/0x90
      [ 5670.309407]  __cleanup_mnt+0x12/0x20
      [ 5670.309837]  task_work_run+0x80/0xa0
      [ 5670.310162]  exit_to_usermode_loop+0x89/0x90
      [ 5670.310497]  syscall_return_slowpath+0xaa/0xb0
      [ 5670.310875]  entry_SYSCALL_64_fastpath+0xa7/0xa9
      [ 5670.311197] RIP: 0033:0x7f1bb3617fe7
      [ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
      [ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7
      [ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0
      [ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000
      [ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a
      [ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000
      [ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]---
      
      Cc: linux-kernel@vger.kernel.org
      Fixes: 67911c8f ("NFS: Add nfs_commit_file()")
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      38a33101
  5. 13 3月, 2017 2 次提交
    • K
      NFSv4: fix a reference leak caused WARNING messages · 366a1569
      Kinglong Mee 提交于
      Because nfs4_opendata_access() has close the state when access is denied,
      so the state isn't leak.
      Rather than revert the commit a974deee, I'd like clean the strange state close.
      
      [ 1615.094218] ------------[ cut here ]------------
      [ 1615.094607] WARNING: CPU: 0 PID: 23702 at lib/list_debug.c:31 __list_add_valid+0x8e/0xa0
      [ 1615.094913] list_add double add: new=ffff9d7901d9f608, prev=ffff9d7901d9f608, next=ffff9d7901ee8dd0.
      [ 1615.095458] Modules linked in: nfsv4(E) nfs(E) nfsd(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock f2fs snd_seq_midi snd_seq_midi_event fscrypto coretemp ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon snd_ens1371 joydev gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs]
      [ 1615.097663] CPU: 0 PID: 23702 Comm: fstest Tainted: G        W   E   4.11.0-rc1+ #517
      [ 1615.098015] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [ 1615.098807] Call Trace:
      [ 1615.099183]  dump_stack+0x63/0x86
      [ 1615.099578]  __warn+0xcb/0xf0
      [ 1615.099967]  warn_slowpath_fmt+0x5f/0x80
      [ 1615.100370]  __list_add_valid+0x8e/0xa0
      [ 1615.100760]  nfs4_put_state_owner+0x75/0xc0 [nfsv4]
      [ 1615.101136]  __nfs4_close+0x109/0x140 [nfsv4]
      [ 1615.101524]  nfs4_close_state+0x15/0x20 [nfsv4]
      [ 1615.101949]  nfs4_close_context+0x21/0x30 [nfsv4]
      [ 1615.102691]  __put_nfs_open_context+0xb8/0x110 [nfs]
      [ 1615.103155]  put_nfs_open_context+0x10/0x20 [nfs]
      [ 1615.103586]  nfs4_file_open+0x13b/0x260 [nfsv4]
      [ 1615.103978]  do_dentry_open+0x20a/0x2f0
      [ 1615.104369]  ? nfs4_copy_file_range+0x30/0x30 [nfsv4]
      [ 1615.104739]  vfs_open+0x4c/0x70
      [ 1615.105106]  ? may_open+0x5a/0x100
      [ 1615.105469]  path_openat+0x623/0x1420
      [ 1615.105823]  do_filp_open+0x91/0x100
      [ 1615.106174]  ? __alloc_fd+0x3f/0x170
      [ 1615.106568]  do_sys_open+0x130/0x220
      [ 1615.106920]  ? __put_cred+0x3d/0x50
      [ 1615.107256]  SyS_open+0x1e/0x20
      [ 1615.107588]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [ 1615.107922] RIP: 0033:0x7fab599069b0
      [ 1615.108247] RSP: 002b:00007ffcf0600d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
      [ 1615.108575] RAX: ffffffffffffffda RBX: 00007fab59bcfae0 RCX: 00007fab599069b0
      [ 1615.108896] RDX: 0000000000000200 RSI: 0000000000000200 RDI: 00007ffcf060255e
      [ 1615.109211] RBP: 0000000000040010 R08: 0000000000000000 R09: 0000000000000016
      [ 1615.109515] R10: 00000000000006a1 R11: 0000000000000246 R12: 0000000000041000
      [ 1615.109806] R13: 0000000000040010 R14: 0000000000001000 R15: 0000000000002710
      [ 1615.110152] ---[ end trace 96ed63b1306bf2f3 ]---
      
      Fixes: a974deee ("NFSv4: Fix memory and state leak in...")
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      366a1569
    • K
      nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME · 6f1f6220
      Kinglong Mee 提交于
      This typo cause a memory leak, and a bad client's group id.
      unreferenced object 0xffff96d8073998d0 (size 8):
        comm "kworker/0:3", pid 34224, jiffies 4295361338 (age 761.752s)
        hex dump (first 8 bytes):
          30 00 39 07 d8 96 ff ff                          0.9.....
        backtrace:
          [<ffffffffb883212a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffb8237bc0>] __kmalloc+0x140/0x220
          [<ffffffffc05c921c>] xdr_stream_decode_string_dup+0x7c/0x110 [sunrpc]
          [<ffffffffc08edcf0>] decode_getfattr_attrs+0x940/0x1630 [nfsv4]
          [<ffffffffc08eea7b>] decode_getfattr_generic.constprop.108+0x9b/0x100 [nfsv4]
          [<ffffffffc08eebaf>] nfs4_xdr_dec_open+0xcf/0x100 [nfsv4]
          [<ffffffffc05bf9c7>] rpcauth_unwrap_resp+0xa7/0xe0 [sunrpc]
          [<ffffffffc05afc70>] call_decode+0x1e0/0x810 [sunrpc]
          [<ffffffffc05bc64d>] __rpc_execute+0x8d/0x420 [sunrpc]
          [<ffffffffc05bc9f2>] rpc_async_schedule+0x12/0x20 [sunrpc]
          [<ffffffffb80bb077>] process_one_work+0x197/0x430
          [<ffffffffb80bb35e>] worker_thread+0x4e/0x4a0
          [<ffffffffb80c1d41>] kthread+0x101/0x140
          [<ffffffffb8839a5c>] ret_from_fork+0x2c/0x40
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Fixes: 686a816a ("NFSv4: Clean up owner/group attribute decode")
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      6f1f6220
  6. 24 2月, 2017 2 次提交
  7. 23 2月, 2017 2 次提交
  8. 22 2月, 2017 9 次提交
  9. 21 2月, 2017 1 次提交
  10. 14 2月, 2017 1 次提交
  11. 11 2月, 2017 9 次提交
    • C
      sunrpc: Allow xprt->ops->timer method to sleep · b977b644
      Chuck Lever 提交于
      The transport lock is needed to protect the xprt_adjust_cwnd() call
      in xs_udp_timer, but it is not necessary for accessing the
      rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate
      the lock into UDP's xs_udp_timer method, where it is required.
      
      The ->timer method has to take the transport lock if needed, but it
      can now sleep safely, or even call back into the RPC scheduler.
      
      This is more a clean-up than a fix, but the "issue" was introduced
      by my transport switch patches back in 2005.
      
      Fixes: 46c0ee8b ("RPC: separate xprt_timer implementations")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      b977b644
    • C
      xprtrdma: Refactor management of mw_list field · 9a5c63e9
      Chuck Lever 提交于
      Clean up some duplicate code.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      9a5c63e9
    • C
      xprtrdma: Handle stale connection rejection · 0a90487b
      Chuck Lever 提交于
      A server rejects a connection attempt with STALE_CONNECTION when a
      client attempts to connect to a working remote service, but uses a
      QPN and GUID that corresponds to an old connection that was
      abandoned. This might occur after a client crashes and restarts.
      
      Fix rpcrdma_conn_upcall() to distinguish between a normal rejection
      and rejection of stale connection parameters.
      
      As an additional clean-up, remove the code that retries the
      connection attempt with different ORD/IRD values. Code audit of
      other ULP initiators shows no similar special case handling of
      initiator_depth or responder_resources.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      0a90487b
    • C
      xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs · 18c0fb31
      Chuck Lever 提交于
      Sriharsha (sriharsha.basavapatna@broadcom.com) reports an occasional
      double DMA unmap of an FRWR MR when a connection is lost. I see one
      way this can happen.
      
      When a request requires more than one segment or chunk,
      rpcrdma_marshal_req loops, invoking ->frwr_op_map for each segment
      (MR) in each chunk. Each call posts a FASTREG Work Request to
      register one MR.
      
      Now suppose that the transport connection is lost part-way through
      marshaling this request. As part of recovering and resetting that
      req, rpcrdma_marshal_req invokes ->frwr_op_unmap_safe, which hands
      all the req's registered FRWRs to the MR recovery thread.
      
      But note: FRWR registration is asynchronous. So it's possible that
      some of these "already registered" FRWRs are fully registered, and
      some are still waiting for their FASTREG WR to complete.
      
      When the connection is lost, the "already registered" frmrs are
      marked FRMR_IS_VALID, and the "still waiting" WRs flush. Then
      frwr_wc_fastreg marks these frmrs FRMR_FLUSHED_FR.
      
      But thanks to ->frwr_op_unmap_safe, the MR recovery thread is doing
      an unreg / alloc_mr, a DMA unmap, and marking each of these frwrs
      FRMR_IS_INVALID, at the same time frwr_wc_fastreg might be running.
      
      - If the recovery thread runs last, then the frmr is marked
      FRMR_IS_INVALID, and life continues.
      
      - If frwr_wc_fastreg runs last, the frmr is marked FRMR_FLUSHED_FR,
      but the recovery thread has already DMA unmapped that MR. When
      ->frwr_op_map later re-uses this frmr, it sees it is not marked
      FRMR_IS_INVALID, and tries to recover it before using it, resulting
      in a second DMA unmap of the same MR.
      
      The fix is to guarantee in-flight FASTREG WRs have flushed before MR
      recovery runs on those FRWRs. Thus we depend on ro_unmap_safe
      (called from xprt_rdma_send_request on retransmit, or from
      xprt_rdma_free) to clean up old registrations as needed.
      Reported-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      18c0fb31
    • C
      xprtrdma: Shrink send SGEs array · c6f5b47f
      Chuck Lever 提交于
      We no longer need to accommodate an xdr_buf whose pages start at an
      offset and cross extra page boundaries. If there are more partial or
      whole pages to send than there are available SGEs, the marshaling
      logic is now smart enough to use a Read chunk instead of failing.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      c6f5b47f
    • C
      xprtrdma: Reduce required number of send SGEs · 16f906d6
      Chuck Lever 提交于
      The MAX_SEND_SGES check introduced in commit 655fec69
      ("xprtrdma: Use gathered Send for large inline messages") fails
      for devices that have a small max_sge.
      
      Instead of checking for a large fixed maximum number of SGEs,
      check for a minimum small number. RPC-over-RDMA will switch to
      using a Read chunk if an xdr_buf has more pages than can fit in
      the device's max_sge limit. This is considerably better than
      failing all together to mount the server.
      
      This fix supports devices that have as few as three send SGEs
      available.
      Reported-by: NSelvin Xavier <selvin.xavier@broadcom.com>
      Reported-by: NDevesh Sharma <devesh.sharma@broadcom.com>
      Reported-by: NHonggang Li <honli@redhat.com>
      Reported-by: NRam Amrani <Ram.Amrani@cavium.com>
      Fixes: 655fec69 ("xprtrdma: Use gathered Send for large ...")
      Cc: stable@vger.kernel.org # v4.9+
      Tested-by: NHonggang Li <honli@redhat.com>
      Tested-by: NRam Amrani <Ram.Amrani@cavium.com>
      Tested-by: NSteve Wise <swise@opengridcomputing.com>
      Reviewed-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      16f906d6
    • C
      xprtrdma: Disable pad optimization by default · c95a3c6b
      Chuck Lever 提交于
      Commit d5440e27 ("xprtrdma: Enable pad optimization") made the
      Linux client omit XDR round-up padding in normal Read and Write
      chunks so that the client doesn't have to register and invalidate
      3-byte memory regions that contain no real data.
      
      Unfortunately, my cheery 2014 assessment that this optimization "is
      supported now by both Linux and Solaris servers" was premature.
      We've found bugs in Solaris in this area since commit d5440e27
      ("xprtrdma: Enable pad optimization") was merged (SYMLINK is the
      main offender).
      
      So for maximum interoperability, I'm disabling this optimization
      again. If a CM private message is exchanged when connecting, the
      client recognizes that the server is Linux, and enables the
      optimization for that connection.
      
      Until now the Solaris server bugs did not impact common operations,
      and were thus largely benign. Soon, less capable devices on Linux
      NFS/RDMA clients will make use of Read chunks more often, and these
      Solaris bugs will prevent interoperation in more cases.
      
      Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      c95a3c6b
    • C
      xprtrdma: Per-connection pad optimization · b5f0afbe
      Chuck Lever 提交于
      Pad optimization is changed by echoing into
      /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
      affecting all RPC-over-RDMA connections to all servers.
      
      The marshaling code picks up that value and uses it for decisions
      about how to construct each RPC-over-RDMA frame. Having it change
      suddenly in mid-operation can result in unexpected failures. And
      some servers a client mounts might need chunk round-up, while
      others don't.
      
      So instead, copy the pad_optimize setting into each connection's
      rpcrdma_ia when the transport is created, and use the copy, which
      can't change during the life of the connection, instead.
      
      This also removes a hack: rpcrdma_convert_iovs was using
      the remote-invalidation-expected flag to predict when it could leave
      out Write chunk padding. This is because the Linux server handles
      implicit XDR padding on Write chunks correctly, and only Linux
      servers can set the connection's remote-invalidation-expected flag.
      
      It's more sensible to use the pad optimization setting instead.
      
      Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      b5f0afbe
    • C
      xprtrdma: Fix Read chunk padding · 24abdf1b
      Chuck Lever 提交于
      When pad optimization is disabled, rpcrdma_convert_iovs still
      does not add explicit XDR round-up padding to a Read chunk.
      
      Commit 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      incorrectly short-circuited the test for whether round-up padding
      is needed that appears later in rpcrdma_convert_iovs.
      
      However, if this is indeed a regular Read chunk (and not a
      Position-Zero Read chunk), the tail iovec _always_ contains the
      chunk's padding, and never anything else.
      
      So, it's easy to just skip the tail when padding optimization is
      enabled, and add the tail in a subsequent Read chunk segment, if
      disabled.
      
      Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      24abdf1b