1. 29 12月, 2015 2 次提交
  2. 22 10月, 2015 1 次提交
    • K
      NFSv4.1/pnfs: Retry through MDS when getting bad length of data · f8417b48
      Kinglong Mee 提交于
      If non rpc-based layout driver return bad length of data, nfs retries
      by calling rpc_restart_call_prepare() that cause an NULL reference panic.
      
      This patch lets nfs retry through MDS for non rpc-based layout driver
      return bad length of data.
      
      [13034.883329] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [13034.884902] IP: [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
      [13034.886558] PGD 0
      [13034.888126] Oops: 0000 [#1] KASAN
      [13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
      [13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G           OE   4.3.0-rc5+ #279
      [13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver]
      [13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000
      [13034.906827] RIP: 0010:[<ffffffffa00db372>]  [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
      [13034.910522] RSP: 0018:ffff880035e97b58  EFLAGS: 00010282
      [13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003
      [13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282
      [13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001
      [13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000
      [13034.921845] FS:  0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000
      [13034.923645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0
      [13034.932808] Stack:
      [13034.934813]  ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2
      [13034.936675]  ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0
      [13034.938593]  ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f
      [13034.940454] Call Trace:
      [13034.942388]  [<ffffffffa08800d2>] nfs_readpage_result+0x112/0x200 [nfs]
      [13034.944317]  [<ffffffffa088029d>] ? nfs_readpage_done+0xdd/0x160 [nfs]
      [13034.946267]  [<ffffffffa087e08f>] nfs_pgio_result+0x9f/0x120 [nfs]
      [13034.948166]  [<ffffffffa09266cc>] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4]
      [13034.950247]  [<ffffffffa03b07ee>] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver]
      [13034.952156]  [<ffffffff810ebf62>] process_one_work+0x412/0x870
      [13034.954102]  [<ffffffff810ebe84>] ? process_one_work+0x334/0x870
      [13034.955949]  [<ffffffff810ebb50>] ? queue_delayed_work_on+0x40/0x40
      [13034.957985]  [<ffffffff810ec441>] worker_thread+0x81/0x6a0
      [13034.959817]  [<ffffffff810ec3c0>] ? process_one_work+0x870/0x870
      [13034.961785]  [<ffffffff810f43bd>] kthread+0x17d/0x1a0
      [13034.963544]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
      [13034.965479]  [<ffffffff81100428>] ? finish_task_switch+0x88/0x220
      [13034.967223]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
      [13034.968929]  [<ffffffff81b6ae5f>] ret_from_fork+0x3f/0x70
      [13034.970534]  [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330
      [13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 <49> 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b
      [13034.977148] RIP  [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
      [13034.978780]  RSP <ffff880035e97b58>
      [13034.980399] CR2: 0000000000000000
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      f8417b48
  3. 03 10月, 2015 2 次提交
    • T
      NFS: Fix a write performance regression · 8fa4592a
      Trond Myklebust 提交于
      If all other conditions in nfs_can_extend_write() are met, and there
      are no locks, then we should be able to assume close-to-open semantics
      and the ability to extend our write to cover the whole page.
      
      With this patch, the xfstests generic/074 test completes in 242s instead
      of >1400s on my test rig.
      
      Fixes: bd61e0a9 ("locks: convert posix locks to file_lock_context")
      Cc: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      8fa4592a
    • T
      NFS: Fix up page writeback accounting · 40f90271
      Trond Myklebust 提交于
      Currently, we are crediting all the calls to nfs_writepages_callback()
      (i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from
      being inconsistent with the behaviour of the equivalent readpage/readpages
      accounting, this also means that we cannot distinguish between bulk writes
      and single page writebacks (which confuses the 'nfsiostat -p' tool).
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      40f90271
  4. 21 9月, 2015 1 次提交
  5. 18 8月, 2015 1 次提交
  6. 11 8月, 2015 1 次提交
  7. 23 7月, 2015 1 次提交
  8. 01 7月, 2015 1 次提交
  9. 18 6月, 2015 1 次提交
  10. 11 6月, 2015 1 次提交
  11. 02 6月, 2015 1 次提交
    • T
      writeback: move backing_dev_info->bdi_stat[] into bdi_writeback · 93f78d88
      Tejun Heo 提交于
      Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
      and the role of the separation is unclear.  For cgroup support for
      writeback IOs, a bdi will be updated to host multiple wb's where each
      wb serves writeback IOs of a different cgroup on the bdi.  To achieve
      that, a wb should carry all states necessary for servicing writeback
      IOs for a cgroup independently.
      
      This patch moves bdi->bdi_stat[] into wb.
      
      * enum bdi_stat_item is renamed to wb_stat_item and the prefix of all
        enums is changed from BDI_ to WB_.
      
      * BDI_STAT_BATCH() -> WB_STAT_BATCH()
      
      * [__]{add|inc|dec|sum}_wb_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...)
      
      * bdi_stat[_error]() -> wb_stat[_error]()
      
      * bdi_writeout_inc() -> wb_writeout_inc()
      
      * stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and
        frees stat.
      
      * As there's still only one bdi_writeback per backing_dev_info, all
        uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
        introducing no behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      93f78d88
  12. 14 5月, 2015 1 次提交
    • C
      nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0 · 6b196875
      Chuck Lever 提交于
      When running the Connectathon basic tests against a Solaris NFS
      server over NFSv4.0, test5 reports that stat(2) returns a file size
      of zero instead of 1MB.
      
      On success, nfs_commit_inode() can return a positive result; see
      other call sites such as nfs_file_fsync_commit() and
      nfs_commit_unstable_pages().
      
      The call site recently added in nfs_wb_all() does not prevent that
      positive return value from leaking to its callers. If it leaks
      through nfs_sync_inode() back to nfs_getattr(), that causes stat(2)
      to return a positive return value to user space while also not
      filling in the passed-in struct stat.
      
      Additional clean up: the new logic in nfs_wb_all() is rewritten in
      bfields-normal form.
      
      Fixes: 5bb89b47 ("NFSv4.1/pnfs: Separate out metadata . . .")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      6b196875
  13. 24 4月, 2015 1 次提交
  14. 16 4月, 2015 1 次提交
  15. 15 4月, 2015 1 次提交
    • K
      page_writeback: clean up mess around cancel_dirty_page() · b9ea2515
      Konstantin Khlebnikov 提交于
      This patch replaces cancel_dirty_page() with a helper function
      account_page_cleaned() which only updates counters.  It's called from
      truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
      Page is locked in both cases, page-lock protects against concurrent
      dirtiers: see commit 2d6d7f98 ("mm: protect set_page_dirty() from
      ongoing truncation").
      
      Delete_from_page_cache() shouldn't be called for dirty pages, they must
      be handled by caller (either written or truncated).  This patch treats
      final dirty accounting fixup at the end of __delete_from_page_cache() as
      a debug check and adds WARN_ON_ONCE() around it.  If something removes
      dirty pages without proper handling that might be a bug and unwritten
      data might be lost.
      
      Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
      here.
      
      cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is
      helper for nfs_invalidate_page() and it's called only in case complete
      invalidation.
      
      The mess was started in v2.6.20 after commits 46d2277c ("Clean up
      and make try_to_free_buffers() not race with dirty pages") and
      3e67c098 ("truncate: clear page dirtiness before running
      try_to_free_buffers()") first was reverted right in v2.6.20 in commit
      ecdfc978 ("Resurrect 'try_to_free_buffers()' VM hackery"), second in
      v2.6.25 commit a2b34564 ("Fix dirty page accounting leak with ext3
      data=journal").
      
      Custom fixes were introduced between these points.  NFS in v2.6.23, commit
      1b3b4a1a ("NFS: Fix a write request leak in nfs_invalidate_page()").
      Kludge in __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do
      dirty page accounting when removing a page from the page cache").  Since
      v2.6.25 all of them are redundant.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9ea2515
  16. 28 3月, 2015 1 次提交
  17. 02 3月, 2015 1 次提交
  18. 14 2月, 2015 2 次提交
  19. 04 2月, 2015 5 次提交
  20. 21 1月, 2015 1 次提交
  21. 17 1月, 2015 3 次提交
  22. 25 11月, 2014 2 次提交
  23. 13 11月, 2014 1 次提交
  24. 25 9月, 2014 3 次提交
    • N
      NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      NeilBrown 提交于
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could be come excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finished.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      353db796
    • N
      NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      NeilBrown 提交于
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      95905446
    • N
      NFS: don't use STABLE writes during writeback. · e87b4c7a
      NeilBrown 提交于
      commit b31268ac
        FS: Use stable writes when not doing a bulk flush
      
      was a bit heavy handed.
      The particular problem that lead to this patch was that
      small writes to an O_SYNC file we being written as UNSTABLE writes
      followed by a commit.
      This is appropriate for large writes (which require multiple NFS
      requests) but for small writes (single NFS request), using
      NFS_FILE_SYNC is more efficient.
      
      So that patch causes the code to select between the two methods
      depending on how many nfs requests get generated.
      
      Unfortunately this ends up applying to non O_SYNC writes as well.
      In particular if you memory-map a file and update random pages, then
      when they are eventually written out by writeback they will go as
      NFS_FILE_SYNC.  This is inefficient and slows down the application.
      
      So: only set FLUSH_COND_STABLE when wbc->sync_mode is WB_SYNC_ALL.
      With this patch:
       O_SYNC writes are NFS_FILE_SYNC for single requests, and NFS_UNSTABLE
          followed by COMMIT for multiple requests
       Writing immediately before close of fsync follow the same pattern.
       Non-O_SYNC writes without an fsync of close eventually get flushed
       out as UNSTABLE and a commit follows eventually as appropriate.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      e87b4c7a
  25. 13 9月, 2014 1 次提交
  26. 11 9月, 2014 2 次提交
  27. 23 8月, 2014 1 次提交