1. 12 6月, 2015 3 次提交
  2. 05 6月, 2015 1 次提交
  3. 01 6月, 2015 1 次提交
    • O
      fixing infinite OPEN loop in 4.0 stateid recovery · e8d975e7
      Olga Kornievskaia 提交于
      Problem: When an operation like WRITE receives a BAD_STATEID, even though
      recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
      the open state, because of clearing delegation state for the associated
      inode, nfs_inode_find_state_and_recover() gets called and it makes the
      same state with RECLAIM_NOGRACE flag again. As a results, when we restart
      looking over the open states, we end up in the infinite loop instead of
      breaking out in the next test of state flags.
      
      Solution: unset the RECLAIM_NOGRACE set because of
      calling of nfs_inode_find_state_and_recover() after returning from calling
      recover_open() function.
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      e8d975e7
  4. 14 5月, 2015 2 次提交
    • J
      nfs: take extra reference to fl->fl_file when running a setlk · feaff8e5
      Jeff Layton 提交于
      We had a report of a crash while stress testing the NFS client:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
          IP: [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
          PGD 0
          Oops: 0000 [#1] SMP
          Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
          CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
          Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
          Workqueue: rpciod rpc_async_schedule [sunrpc]
          task: ffff880036aea7c0 ti: ffff8800791f4000 task.ti: ffff8800791f4000
          RIP: 0010:[<ffffffff8127b698>]  [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
          RSP: 0018:ffff8800791f7c00  EFLAGS: 00010293
          RAX: ffff8800791f7c40 RBX: ffff88001f2ad8c0 RCX: ffffe8ffffc80305
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
          RBP: ffff8800791f7c88 R08: ffff88007fc971d8 R09: 279656d600000000
          R10: 0000034a01000000 R11: 279656d600000000 R12: ffff88001f2ad918
          R13: ffff88001f2ad8c0 R14: 0000000000000000 R15: 0000000100e73040
          FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000150 CR3: 0000000001c0b000 CR4: 00000000000407e0
          Stack:
           ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
           ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
           ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
          Call Trace:
           [<ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
           [<ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
           [<ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
           [<ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
           [<ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
           [<ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
           [<ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
           [<ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
           [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
           [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
           [<ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
           [<ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
           [<ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
           [<ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
           [<ffffffff810baa1b>] process_one_work+0x1bb/0x410
           [<ffffffff810bacc3>] worker_thread+0x53/0x480
           [<ffffffff810bac70>] ? process_one_work+0x410/0x410
           [<ffffffff810bac70>] ? process_one_work+0x410/0x410
           [<ffffffff810c0b38>] kthread+0xd8/0xf0
           [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
           [<ffffffff817a1aa2>] ret_from_fork+0x42/0x70
           [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
      
      Jean says:
      
      "Running locktests with a large number of iterations resulted in a
       client crash.  The test run took a while and hasn't finished after close
       to 2 hours. The crash happened right after I gave up and killed the test
       (after 107m) with Ctrl+C."
      
      The crash happened because a NULL inode pointer got passed into
      locks_get_lock_context. The call chain indicates that file_inode(filp)
      returned NULL, which means that f_inode was NULL. Since that's zeroed
      out in __fput, that suggests that this filp pointer outlived the last
      reference.
      
      Looking at the code, that seems possible. We copy the struct file_lock
      that's passed in, but if the task is signalled at an inopportune time we
      can end up trying to use that file_lock in rpciod context after the process
      that requested it has already returned (and possibly put its filp
      reference).
      
      Fix this by taking an extra reference to the filp when we allocate the
      lock info, and put it in nfs4_lock_release.
      Reported-by: NJean Spector <jean@primarydata.com>
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      feaff8e5
    • C
      nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0 · 6b196875
      Chuck Lever 提交于
      When running the Connectathon basic tests against a Solaris NFS
      server over NFSv4.0, test5 reports that stat(2) returns a file size
      of zero instead of 1MB.
      
      On success, nfs_commit_inode() can return a positive result; see
      other call sites such as nfs_file_fsync_commit() and
      nfs_commit_unstable_pages().
      
      The call site recently added in nfs_wb_all() does not prevent that
      positive return value from leaking to its callers. If it leaks
      through nfs_sync_inode() back to nfs_getattr(), that causes stat(2)
      to return a positive return value to user space while also not
      filling in the passed-in struct stat.
      
      Additional clean up: the new logic in nfs_wb_all() is rewritten in
      bfields-normal form.
      
      Fixes: 5bb89b47 ("NFSv4.1/pnfs: Separate out metadata . . .")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      6b196875
  5. 25 4月, 2015 1 次提交
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
  6. 24 4月, 2015 14 次提交
  7. 16 4月, 2015 4 次提交
  8. 15 4月, 2015 1 次提交
    • K
      page_writeback: clean up mess around cancel_dirty_page() · b9ea2515
      Konstantin Khlebnikov 提交于
      This patch replaces cancel_dirty_page() with a helper function
      account_page_cleaned() which only updates counters.  It's called from
      truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
      Page is locked in both cases, page-lock protects against concurrent
      dirtiers: see commit 2d6d7f98 ("mm: protect set_page_dirty() from
      ongoing truncation").
      
      Delete_from_page_cache() shouldn't be called for dirty pages, they must
      be handled by caller (either written or truncated).  This patch treats
      final dirty accounting fixup at the end of __delete_from_page_cache() as
      a debug check and adds WARN_ON_ONCE() around it.  If something removes
      dirty pages without proper handling that might be a bug and unwritten
      data might be lost.
      
      Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
      here.
      
      cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is
      helper for nfs_invalidate_page() and it's called only in case complete
      invalidation.
      
      The mess was started in v2.6.20 after commits 46d2277c ("Clean up
      and make try_to_free_buffers() not race with dirty pages") and
      3e67c098 ("truncate: clear page dirtiness before running
      try_to_free_buffers()") first was reverted right in v2.6.20 in commit
      ecdfc978 ("Resurrect 'try_to_free_buffers()' VM hackery"), second in
      v2.6.25 commit a2b34564 ("Fix dirty page accounting leak with ext3
      data=journal").
      
      Custom fixes were introduced between these points.  NFS in v2.6.23, commit
      1b3b4a1a ("NFS: Fix a write request leak in nfs_invalidate_page()").
      Kludge in __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do
      dirty page accounting when removing a page from the page cache").  Since
      v2.6.25 all of them are redundant.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9ea2515
  9. 12 4月, 2015 6 次提交
  10. 28 3月, 2015 7 次提交