1. 13 1月, 2017 1 次提交
  2. 25 12月, 2016 1 次提交
  3. 16 12月, 2016 2 次提交
  4. 09 12月, 2016 1 次提交
  5. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  6. 15 11月, 2016 2 次提交
  7. 02 11月, 2016 5 次提交
    • J
      nfsd: catch errors in decode_fattr earlier · e864c189
      J. Bruce Fields 提交于
      3c8e0316 "NFSv4: do exact check about attribute specified" fixed
      some handling of unsupported-attribute errors, but it also delayed
      checking for unwriteable attributes till after we decode them.  This
      could lead to odd behavior in the case a client attemps to set an
      attribute we don't know about followed by one we try to parse.  In that
      case the parser for the known attribute will attempt to parse the
      unknown attribute.  It should fail in some safe way, but the error might
      at least be incorrect (probably bad_xdr instead of inval).  So, it's
      better to do that check at the start.
      
      As far as I know this doesn't cause any problems with current clients
      but it might be a minor issue e.g. if we encounter a future client that
      supports a new attribute that we currently don't.
      
      Cc: Yu Zhiguo <yuzg@cn.fujitsu.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      e864c189
    • J
      nfsd: clean up supported attribute handling · 916d2d84
      J. Bruce Fields 提交于
      Minor cleanup, no change in behavior.
      
      Provide helpers for some common attribute bitmap operations.  Drop some
      comments that just echo the code.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      916d2d84
    • J
      nfsd: fix error handling for clients that fail to return the layout · 851238a2
      Jeff Layton 提交于
      Currently, when the client continually returns NFS4ERR_DELAY on a
      CB_LAYOUTRECALL, we'll give up trying to retransmit after two lease
      periods, but leave the layout in place.
      
      What we really need to do here is fence the client in this case. Have it
      fall through to that code in that case instead of into the
      NFS4ERR_NOMATCHING_LAYOUT case.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      851238a2
    • J
      nfsd: more robust allocation failure handling in nfsd_reply_cache_init · 8f97514b
      Jeff Layton 提交于
      Currently, we try to allocate the cache as a single, large chunk, which
      can fail if no big chunks of memory are available. We _do_ try to size
      it according to the amount of memory in the box, but if the server is
      started well after boot time, then the allocation can fail due to memory
      fragmentation.
      
      Fall back to doing a vzalloc if the kcalloc fails, and switch the
      shutdown code to do a kvfree to handle freeing correctly.
      Reported-by: NOlaf Hering <olaf@aepfle.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      8f97514b
    • C
      nfsd: Fix general protection fault in release_lock_stateid() · f46c445b
      Chuck Lever 提交于
      When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
      I get this crash on the server:
      
      Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
      Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
      Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
      Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
      Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
      Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
      Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
      Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
      Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
      Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
      Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
      Oct 28 22:04:30 klimt kernel: Stack:
      Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
      Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
      Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
      Oct 28 22:04:30 klimt kernel: Call Trace:
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
      Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
      Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
      Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
      Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
      Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
      Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---
      
      Jeff Layton says:
      > Hm...now that I look though, this is a little suspicious:
      >
      >    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
      >
      > I wonder if it's possible for the openstateid to have already been
      > destroyed at this point.
      >
      > We might be better off doing something like this to get the client pointer:
      >
      >    stp->st_stid.sc_client;
      >
      > ...which should be more direct and less dependent on other stateids
      > staying valid.
      
      With the suggested change, I am no longer able to reproduce the above oops.
      
      v2: Fix unhash_lock_stateid() as well
      Fix-suggested-by: NJeff Layton <jlayton@redhat.com>
      Fixes: 42691398 ('nfsd: Fix race between FREE_STATEID and LOCK')
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f46c445b
  8. 25 10月, 2016 1 次提交
  9. 08 10月, 2016 4 次提交
  10. 28 9月, 2016 1 次提交
  11. 27 9月, 2016 7 次提交
    • J
      nfsd4: setclientid_confirm with unmatched verifier should fail · 7d22fc11
      J. Bruce Fields 提交于
      A setclientid_confirm with (clientid, verifier) both matching an
      existing confirmed record is assumed to be a replay, but if the verifier
      doesn't match, it shouldn't be.
      
      This would be a very rare case, except that clients following
      https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
      failure.
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7d22fc11
    • J
      nfsd: randomize SETCLIENTID reply to help distinguish servers · ebd7c72c
      J. Bruce Fields 提交于
      NFSv4.1 has built-in trunking support that allows a client to determine
      whether two connections to two different IP addresses are actually to
      the same server.  NFSv4.0 does not, but RFC 7931 attempts to provide
      clients a means to do this, basically by performing a SETCLIENTID to one
      address and confirming it with a SETCLIENTID_CONFIRM to the other.
      
      Linux clients since 05f4c350 "NFS: Discover NFSv4 server trunking
      when mounting" implement a variation on this suggestion.  It is possible
      that other clients do too.
      
      This depends on the clientid and verifier not being accepted by an
      unrelated server.  Since both are 64-bit values, that would be very
      unlikely if they were random numbers.  But they aren't:
      
      knfsd generates the 64-bit clientid by concatenating the 32-bit boot
      time (in seconds) and a counter.  This makes collisions between
      clientids generated by the same server extremely unlikely.  But
      collisions are very likely between clientids generated by servers that
      boot at the same time, and it's quite common for multiple servers to
      boot at the same time.  The verifier is a concatenation of the
      SETCLIENTID time (in seconds) and a counter, so again collisions between
      different servers are likely if multiple SETCLIENTIDs are done at the
      same time, which is a common case.
      
      Therefore recent NFSv4.0 clients may decide two different servers are
      really the same, and mount a filesystem from the wrong server.
      
      Fortunately the Linux client, since 55b9df93 "nfsv4/v4.1: Verify the
      client owner id during trunking detection", only does this when given
      the non-default "migration" mount option.
      
      The fault is really with RFC 7931, and needs a client fix, but in the
      meantime we can mitigate the chance of these collisions by randomizing
      the starting value of the counters used to generate clientids and
      verifiers.
      Reported-by: NFrank Sorenson <fsorenso@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ebd7c72c
    • J
      nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies · 19e4c347
      Jeff Layton 提交于
      If we are using v4.1+, then we can send notification when contended
      locks become free. Inform the client of that fact.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      19e4c347
    • J
      nfsd: add a LRU list for blocked locks · 7919d0a2
      Jeff Layton 提交于
      It's possible for a client to call in on a lock that is blocked for a
      long time, but discontinue polling for it. A malicious client could
      even set a lock on a file, and then spam the server with failing lock
      requests from different lockowners that pile up in a DoS attack.
      
      Add the blocked lock structures to a per-net namespace LRU when hashing
      them, and timestamp them. If the lock request is not revisited after a
      lease period, we'll drop it under the assumption that the client is no
      longer interested.
      
      This also gives us a mechanism to clean up these objects at server
      shutdown time as well.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7919d0a2
    • J
      nfsd: have nfsd4_lock use blocking locks for v4.1+ locks · 76d348fa
      Jeff Layton 提交于
      Create a new per-lockowner+per-inode structure that contains a
      file_lock. Have nfsd4_lock add this structure to the lockowner's list
      prior to setting the lock. Then call the vfs and request a blocking lock
      (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
      back, then we dequeue the block structure and free it. When the next
      lock request comes in, we'll look for an existing block for the same
      filehandle and dequeue and reuse it if there is one.
      
      When the lock comes free (a'la an lm_notify call), we dequeue it
      from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
      inform the client that it should retry the lock request.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      76d348fa
    • J
      nfsd: plumb in a CB_NOTIFY_LOCK operation · a188620e
      Jeff Layton 提交于
      Add the encoding/decoding for CB_NOTIFY_LOCK operations.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a188620e
    • V
      NFSD: fix corruption in notifier registration · 1eca45f8
      Vasily Averin 提交于
      By design notifier can be registered once only, however nfsd registers
      the same inetaddr notifiers per net-namespace.  When this happen it
      corrupts list of notifiers, as result some notifiers can be not called
      on proper event, traverse on list can be cycled forever, and second
      unregister can access already freed memory.
      
      Cc: stable@vger.kernel.org
      fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1eca45f8
  12. 23 9月, 2016 1 次提交
  13. 22 9月, 2016 1 次提交
  14. 17 9月, 2016 2 次提交
    • J
      nfsd: eliminate cb_minorversion field · 89dfdc96
      Jeff Layton 提交于
      We already have that info in the client pointer. No need to pass around
      a copy.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      89dfdc96
    • J
      nfsd: don't set a FL_LAYOUT lease for flexfiles layouts · 1983a66f
      Jeff Layton 提交于
      We currently can hit a deadlock (of sorts) when trying to use flexfiles
      layouts with XFS. XFS will call break_layout when something wants to
      write to the file. In the case of the (super-simple) flexfiles layout
      driver in knfsd, the MDS and DS are the same machine.
      
      The client can get a layout and then issue a v3 write to do its I/O. XFS
      will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
      be issued to the client. The client however can't return the layout
      until the v3 WRITE completes, but XFS won't allow the write to proceed
      until the layout is returned.
      
      Christoph says:
      
          XFS only cares about block-like layouts where the client has direct
          access to the file blocks.  I'd need to look how to propagate the
          flag into break_layout, but in principle we don't need to do any
          recalls on truncate ever for file and flexfile layouts.
      
      If we're never going to recall the layout, then we don't even need to
      set the lease at all. Just skip doing so on flexfiles layouts by
      adding a new flag to struct nfsd4_layout_ops and skipping the lease
      setting and removal when that flag is true.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1983a66f
  15. 13 8月, 2016 1 次提交
  16. 12 8月, 2016 1 次提交
    • C
      nfsd: Fix race between FREE_STATEID and LOCK · 42691398
      Chuck Lever 提交于
      When running LTP's nfslock01 test, the Linux client can send a LOCK
      and a FREE_STATEID request at the same time. The outcome is:
      
      Frame 324    R OPEN stateid [2,O]
      
      Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
      Frame 115008 R LOCK stateid [1,L]
      Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
      Frame 115016 R WRITE NFS4_OK
      Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
      Frame 115022 R LOCKU NFS4_OK
      Frame 115025 C FREE_STATEID stateid [2,L]
      Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
      Frame 115029 R FREE_STATEID NFS4_OK
      Frame 115030 R LOCK stateid [3,L]
      Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
      Frame 115038 R WRITE NFS4ERR_BAD_STATEID
      
      In other words, the server returns stateid L in a successful LOCK
      reply, but it has already released it. Subsequent uses of stateid L
      fail.
      
      To address this, protect the generation check in nfsd4_free_stateid
      with the st_mutex. This should guarantee that only one of two
      outcomes occurs: either LOCK returns a fresh valid stateid, or
      FREE_STATEID returns NFS4ERR_LOCKS_HELD.
      Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Fix-suggested-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      42691398
  17. 11 8月, 2016 1 次提交
  18. 05 8月, 2016 7 次提交