1. 24 Jun 2015, 10 commits
  2. 18 Jun 2015, 3 commits
  3. 16 Jun 2015, 8 commits
  4. 12 Jun 2015, 4 commits
  5. 11 Jun 2015, 4 commits
  6. 05 Jun 2015, 1 commit
  7. 02 Jun 2015, 4 commits
    • NFS: drop unneeded goto · 13985b1f
      Julia Lawall authored
      Delete jump to a label on the next line, when that label is not
      used elsewhere.
      
      A simplified version of the semantic patch that makes this change is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier l;
      @@
      
      -if (...) goto l;
      -l:
      // </smpl>
      
      Also drop the unnecessary ret variable. (A small before/after sketch
      of the pattern follows this entry.)
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
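
      For illustration, a small before/after sketch of the pattern this
      semantic patch removes; the function and parameter names below are
      made up and are not from the NFS code.

      /* Before: the conditional jumps to a label on the very next line,
       * and "ret" only relays a constant. */
      static int example_before(const void *p)
      {
              int ret = 0;

              if (p == NULL)
                      goto out;
      out:
              return ret;
      }

      /* After: the jump, the label and the relay variable are gone;
       * behaviour is unchanged. */
      static int example_after(const void *p)
      {
              (void)p;        /* parameter kept only to mirror the original */
              return 0;
      }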
    • NFS: Fix size of NFSACL SETACL operations · d683cc49
      Chuck Lever authored
      When encoding the NFSACL SETACL operation, reserve just the estimated
      size of the ACL rather than a fixed maximum. This eliminates needless
      zero padding on the wire that the server ignores. (A rough sizing
      sketch follows this entry.)
      
      Fixes: ee5dc773 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
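
      For illustration only, a rough sketch of the sizing idea; the
      constants and helper names below are hypothetical and do not reflect
      the real NFSACL XDR layout, just the "estimate instead of worst case"
      point.

      #include <stddef.h>

      #define ACL_MAX_ENTRIES  1024u   /* hypothetical protocol maximum */
      #define ACE_WIRE_BYTES   12u     /* hypothetical per-entry size */
      #define ACL_HDR_BYTES    4u      /* hypothetical entry-count word */

      /* What a fixed-maximum reservation pays for on every SETACL, even
       * when the ACL holds only a handful of entries. */
      static size_t setacl_reserve_max(void)
      {
              return ACL_HDR_BYTES + ACL_MAX_ENTRIES * ACE_WIRE_BYTES;
      }

      /* Reserving only the estimated size of this particular ACL avoids
       * sending a long tail of zero padding that the server ignores. */
      static size_t setacl_reserve_estimated(unsigned int nr_entries)
      {
              return ACL_HDR_BYTES + (size_t)nr_entries * ACE_WIRE_BYTES;
      }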
    • NFS: report more appropriate block size for directories. · 7ef5ca4f
      NeilBrown authored
      In glibc 2.21 (and several previous), a call to opendir() will
      result in a 32K (BUFSIZ*4) buffer being allocated and passed to
      getdents.
      
      However, a call to fdopendir() results in an 'fstat' request to
      determine block size and a matching buffer allocated for subsequent
      use with getdents.  This will typically be 1M.
      
      The first getdents call on an NFS directory will always use
      READDIR_PLUS (or NFSv4 equivalent) if available.  Subsequent getdents
      calls only use this more expensive version if some 'stat' requests are
      made between the getdents calls.
      
      For this reason it is good to keep at least that first getdents call
      relatively short.  When fdopendir() and readdir() are used on a large
      directory, it takes approximately 32 times as long to complete as
      using "opendir".  Current versions of 'find' use fdopendir() and
      demonstrate this slowness.
      
      'stat' on a directory currently reports the 'wsize' as the block
      size.  That number has no meaning for directories.
      Actual READDIR requests are limited to ->dtsize, which itself is
      capped at 4 pages, coincidentally the same as BUFSIZ*4.
      So this is a meaningful number to use as the blocksize on directories,
      and has the effect of making 'find' on large directories go a lot
      faster. (A small user-space illustration follows this entry.)
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
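
      To see the number this commit tunes, here is a small user-space
      illustration (not part of the patch) that opens a directory the way
      fdopendir() does, prints the st_blksize that glibc would size its
      getdents buffer from, and then walks the entries.

      #include <dirent.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/stat.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
              const char *path = argc > 1 ? argv[1] : ".";
              struct stat st;
              struct dirent *de;
              long entries = 0;

              /* fdopendir() starts from an already-open descriptor, so
               * glibc fstat()s it and sizes its getdents buffer from
               * st_blksize. */
              int fd = open(path, O_RDONLY | O_DIRECTORY);
              if (fd < 0 || fstat(fd, &st) < 0) {
                      perror(path);
                      return 1;
              }
              printf("st_blksize for %s: %ld bytes\n", path,
                     (long)st.st_blksize);

              DIR *dir = fdopendir(fd);
              if (!dir) {
                      perror("fdopendir");
                      return 1;
              }
              while ((de = readdir(dir)) != NULL)
                      entries++;
              printf("%ld directory entries read\n", entries);
              closedir(dir);
              return 0;
      }

      Run against a directory on an NFS mount, the reported st_blksize is
      what this patch changes: previously the mount's wsize, afterwards the
      smaller ->dtsize-derived value, which keeps that first
      READDIR_PLUS-backed getdents call short.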
    • NFSv4: Always drain the slot table before re-establishing the lease · 5cae02f4
      Trond Myklebust authored
      While the NFSv4.1 code has always drained the slot tables in order to stop
      non-recovery related RPC calls when doing lease recovery, the NFSv4 code
      did not.
      The reason for the difference in behaviour is that NFSv4 does not have
      session state, and so RPC calls can in theory proceed while recovery is
      happening. In practice, however, anything I/O or state related needs to
      wait until recovery is over.
      
      This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
      we can simplify the state recovery code by assuming that we do not have to
      deal with races between recovery and ordinary I/O.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  8. 01 Jun 2015, 1 commit
    • fixing infinite OPEN loop in 4.0 stateid recovery · e8d975e7
      Olga Kornievskaia authored
      Problem: when an operation like WRITE receives a BAD_STATEID, the
      recovery code clears the RECLAIM_NOGRACE recovery flag before
      recovering the open state, but clearing the delegation state for the
      associated inode causes nfs_inode_find_state_and_recover() to be
      called, which marks the same state with RECLAIM_NOGRACE again. As a
      result, when we restart looking over the open states, we end up in an
      infinite loop instead of breaking out in the next test of state flags.
      
      Solution: clear the RECLAIM_NOGRACE flag that was re-set by
      nfs_inode_find_state_and_recover() after recover_open() returns.
      (A simplified sketch of the resulting loop follows this entry.)
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
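
      A simplified model of the loop shape described above; the types and
      helpers are illustrative stand-ins, not the actual nfs4state.c code.

      #include <stddef.h>

      struct open_state {
              struct open_state *next;
              unsigned long flags;
      };

      #define RECLAIM_NOGRACE (1UL << 0)

      /* Stand-in for ops->recover_open(); in the real code, recovery can
       * indirectly re-set RECLAIM_NOGRACE via
       * nfs_inode_find_state_and_recover(). */
      static int recover_open(struct open_state *state)
      {
              (void)state;
              return 0;
      }

      static int reclaim_open_states(struct open_state *states)
      {
      restart:
              for (struct open_state *state = states; state; state = state->next) {
                      /* Only test the flag when picking states... */
                      if (!(state->flags & RECLAIM_NOGRACE))
                              continue;
                      if (recover_open(state) < 0)
                              return -1;
                      /* ...and clear it after recovery succeeds, so a
                       * re-set that happened during recovery cannot make
                       * the restart below pick the same state forever. */
                      state->flags &= ~RECLAIM_NOGRACE;
                      goto restart;
              }
              return 0;
      }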
  9. 14 May 2015, 2 commits
    • nfs: take extra reference to fl->fl_file when running a setlk · feaff8e5
      Jeff Layton authored
      We had a report of a crash while stress testing the NFS client:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
          IP: [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
          PGD 0
          Oops: 0000 [#1] SMP
          Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
          CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
          Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
          Workqueue: rpciod rpc_async_schedule [sunrpc]
          task: ffff880036aea7c0 ti: ffff8800791f4000 task.ti: ffff8800791f4000
          RIP: 0010:[<ffffffff8127b698>]  [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
          RSP: 0018:ffff8800791f7c00  EFLAGS: 00010293
          RAX: ffff8800791f7c40 RBX: ffff88001f2ad8c0 RCX: ffffe8ffffc80305
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
          RBP: ffff8800791f7c88 R08: ffff88007fc971d8 R09: 279656d600000000
          R10: 0000034a01000000 R11: 279656d600000000 R12: ffff88001f2ad918
          R13: ffff88001f2ad8c0 R14: 0000000000000000 R15: 0000000100e73040
          FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000150 CR3: 0000000001c0b000 CR4: 00000000000407e0
          Stack:
           ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
           ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
           ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
          Call Trace:
           [<ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
           [<ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
           [<ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
           [<ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
           [<ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
           [<ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
           [<ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
           [<ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
           [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
           [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
           [<ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
           [<ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
           [<ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
           [<ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
           [<ffffffff810baa1b>] process_one_work+0x1bb/0x410
           [<ffffffff810bacc3>] worker_thread+0x53/0x480
           [<ffffffff810bac70>] ? process_one_work+0x410/0x410
           [<ffffffff810bac70>] ? process_one_work+0x410/0x410
           [<ffffffff810c0b38>] kthread+0xd8/0xf0
           [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
           [<ffffffff817a1aa2>] ret_from_fork+0x42/0x70
           [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
      
      Jean says:
      
      "Running locktests with a large number of iterations resulted in a
       client crash.  The test run took a while and hasn't finished after close
       to 2 hours. The crash happened right after I gave up and killed the test
       (after 107m) with Ctrl+C."
      
      The crash happened because a NULL inode pointer got passed into
      locks_get_lock_context. The call chain indicates that file_inode(filp)
      returned NULL, which means that f_inode was NULL. Since that's zeroed
      out in __fput, that suggests that this filp pointer outlived the last
      reference.
      
      Looking at the code, that seems possible. We copy the struct file_lock
      that's passed in, but if the task is signalled at an inopportune time we
      can end up trying to use that file_lock in rpciod context after the process
      that requested it has already returned (and possibly put its filp
      reference).
      
      Fix this by taking an extra reference to the filp when we allocate
      the lock info, and put it in nfs4_lock_release. (A hedged sketch of
      the get/put pairing follows this entry.)
      Reported-by: Jean Spector <jean@primarydata.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
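
      A hedged sketch of the get/put pairing described above, in simplified
      kernel-style C; the lockdata container and helpers are hypothetical,
      not the actual diff.

      #include <linux/file.h>    /* fput() */
      #include <linux/fs.h>      /* struct file_lock, get_file() */
      #include <linux/slab.h>

      struct lockdata {                  /* hypothetical container */
              struct file_lock fl;       /* private copy of the request */
      };

      static struct lockdata *lockdata_alloc(struct file_lock *request)
      {
              struct lockdata *data = kzalloc(sizeof(*data), GFP_KERNEL);

              if (!data)
                      return NULL;
              locks_init_lock(&data->fl);
              locks_copy_lock(&data->fl, request);
              /* The copy may still be used from rpciod after the requesting
               * task has returned and dropped its own filp reference, so
               * pin the file here... */
              get_file(data->fl.fl_file);
              return data;
      }

      static void lockdata_release(struct lockdata *data)
      {
              /* ...and drop it only when the lock data itself goes away
               * (the commit does this in nfs4_lock_release). */
              fput(data->fl.fl_file);
              kfree(data);
      }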
    • nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0 · 6b196875
      Chuck Lever authored
      When running the Connectathon basic tests against a Solaris NFS
      server over NFSv4.0, test5 reports that stat(2) returns a file size
      of zero instead of 1MB.
      
      On success, nfs_commit_inode() can return a positive result; see
      other call sites such as nfs_file_fsync_commit() and
      nfs_commit_unstable_pages().
      
      The call site recently added in nfs_wb_all() does not prevent that
      positive return value from leaking to its callers. If it leaks
      through nfs_sync_inode() back to nfs_getattr(), that causes stat(2)
      to return a positive return value to user space while also not
      filling in the passed-in struct stat.
      
      Additional clean up: the new logic in nfs_wb_all() is rewritten in
      bfields-normal form. (A sketch of that shape follows this entry.)
      
      Fixes: 5bb89b47 ("NFSv4.1/pnfs: Separate out metadata . . .")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
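
      A sketch of the "bfields-normal" shape being described, with
      hypothetical stand-in helpers rather than the real nfs_wb_all()
      internals.

      /* Stand-ins: flush_pages() returns 0 or a negative errno;
       * commit_pages() returns a negative errno or a positive count of
       * committed items (mirroring how nfs_commit_inode() can return a
       * positive result on success). */
      static int flush_pages(void)  { return 0; }
      static int commit_pages(void) { return 3; }

      static int wb_all(void)
      {
              int ret;

              ret = flush_pages();
              if (ret < 0)
                      goto out;
              ret = commit_pages();
              if (ret < 0)
                      goto out;
              /* A positive "work done" count must not leak to callers that
               * only understand 0 or a negative errno; otherwise something
               * like stat(2) ends up returning a positive value to user
               * space. */
              ret = 0;
      out:
              return ret;
      }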
  10. 25 Apr 2015, 1 commit
    • direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe authored
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device with O_DIRECT, and it
      presents a scaling wall. In my testing, as much as 30% of system time
      is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications access the device, the worse the contention gets.
      
      Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller. (A minimal
      sketch of how such a flag gates the counting follows this entry.)
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
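
      A minimal model of what the flag gates, using stand-in types rather
      than the kernel's direct-io code; the flag's numeric value below is
      made up.

      #include <stdatomic.h>

      #define DIO_SKIP_DIO_COUNT  0x08   /* illustrative value only */

      struct inode_model {
              atomic_int i_dio_count;    /* stand-in for inode->i_dio_count */
      };

      static void dio_begin(struct inode_model *inode, unsigned int flags)
      {
              /* Filesystems need the count to hold off truncate; block
               * devices have no truncate to race with, so they pass
               * DIO_SKIP_DIO_COUNT and never touch the shared atomic on
               * the hot path. */
              if (!(flags & DIO_SKIP_DIO_COUNT))
                      atomic_fetch_add(&inode->i_dio_count, 1);
      }

      static void dio_end(struct inode_model *inode, unsigned int flags)
      {
              if (!(flags & DIO_SKIP_DIO_COUNT))
                      atomic_fetch_sub(&inode->i_dio_count, 1);
      }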
  11. 24 Apr 2015, 2 commits