1. 19 Oct, 2011 15 commits
    • SUNRPC: Fix rpc_sockaddr2uaddr · d77385f2
      Committed by Trond Myklebust
      rpc_sockaddr2uaddr is only used by net/sunrpc/rpcb_clnt.c, where
      it is used in a non-blockable context in at least one case.
      
      Add non-blocking capability by adding a gfp_t argument.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs/super.c: local functions should be static · 45402c38
      Committed by H Hartley Sweeten
      commit ae50c0b5 "pnfs: client stats" added additional information to
      the output of /proc/self/mountstats. The new functions introduced are
      only used in this file and should be marked static.
      
      If CONFIG_NFS_V4_1 is not defined, empty stub functions are used.  If
      CONFIG_NFS_V4 is not defined these stub functions are not used at all.
      Adding static for the functions results in compile warnings:
      
      fs/nfs/super.c:743: warning: 'show_sessions' defined but not used
      fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used
      
      Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two
      show_ functions.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
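The guard described in the commit above can be sketched in userspace as follows. This is only an illustration, not the kernel code: `CONFIG_NFS_V4` is defined by hand here (kernel builds get it from Kconfig), the helper signatures are simplified, and `show_stats` is a hypothetical caller.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: defined by hand for this sketch. Remove the line to see
 * the helpers and their call sites compiled out together. */
#define CONFIG_NFS_V4 1

#ifdef CONFIG_NFS_V4
/* File-local helpers. Because they live inside the guard, a build
 * without CONFIG_NFS_V4 never sees them, so the compiler has nothing
 * to report as "defined but not used". */
static void show_sessions(char *buf, size_t len)
{
	snprintf(buf + strlen(buf), len - strlen(buf), ",sessions");
}

static void show_pnfs(char *buf, size_t len)
{
	snprintf(buf + strlen(buf), len - strlen(buf), ",pnfs");
}
#endif

/* Hypothetical caller standing in for the mountstats output routine. */
void show_stats(char *buf, size_t len)
{
	snprintf(buf, len, "nfs");
#ifdef CONFIG_NFS_V4
	show_sessions(buf, len);
	show_pnfs(buf, len);
#endif
}
```

Keeping the definitions and the call sites under the same guard is what lets the functions stay `static` without triggering the warning.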
    • pnfsblock: fix writeback deadlock · 75422745
      Committed by Peng Tao
      We should check whether the sector is already initialized before
      trying to grab the page from the page cache. Otherwise, when two
      pages of the same block are written back by two threads, each
      calling from writepage_locked, a deadlock like the one below can occur.
      
       [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
       [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       [ 1080.972812] kswapd0         D ffff88000c4926c0     0    25      2 0x00000000
       [ 1080.972816]  ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
       [ 1080.972821]  ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
       [ 1080.972824]  ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
       [ 1080.972828] Call Trace:
       [ 1080.972860]  [<ffffffff81013ba7>] ? read_tsc+0x9/0x19
       [ 1080.972877]  [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b
       [ 1080.972899]  [<ffffffff81475a1d>] io_schedule+0x63/0x7e
       [ 1080.972902]  [<ffffffff810e0b31>] sleep_on_page+0xe/0x12
       [ 1080.972905]  [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f
       [ 1080.972916]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
       [ 1080.972919]  [<ffffffff810e0af6>] __lock_page+0x66/0x68
       [ 1080.972928]  [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d
       [ 1080.972932]  [<ffffffff810e0b1f>] lock_page+0x27/0x2b
       [ 1080.972934]  [<ffffffff810e0bcf>] find_lock_page+0x34/0x57
       [ 1080.972937]  [<ffffffff810e1738>] find_or_create_page+0x34/0x8a
       [ 1080.972947]  [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
       [ 1080.972951]  [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
       [ 1080.972995]  [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
       [ 1080.973033]  [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
       [ 1080.973089]  [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs]
       [ 1080.973098]  [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs]
       [ 1080.973108]  [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs]
       [ 1080.973119]  [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs]
       [ 1080.973122]  [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a
       [ 1080.973133]  [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8
       [ 1080.973139]  [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e
       [ 1080.973144]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
       [ 1080.973148]  [<ffffffff810f0c33>] shrink_zone+0x445/0x588
       [ 1080.973152]  [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b
       [ 1080.973170]  [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80
       [ 1080.973175]  [<ffffffff810f1f78>] kswapd+0x2be/0x2fa
       [ 1080.973179]  [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b
       [ 1080.973183]  [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b
       [ 1080.973187]  [<ffffffff81071f69>] kthread+0xa8/0xb0
       [ 1080.973200]  [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10
       [ 1080.973205]  [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a
       [ 1080.973210]  [<ffffffff814806b0>] ? gs_change+0x13/0x13
       [ 1080.973213] no locks held by kswapd0/25.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfsblock: fix NULL pointer dereference · e6d05a75
      Committed by Peng Tao
      bl_add_page_to_bio returns an error pointer on failure. bio should be reset to
      NULL in failure cases, as the out path always calls bl_submit_bio.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfs: recoalesce when ld read pagelist fails · 9b7eecdc
      Committed by Peng Tao
      When a pnfs pagelist read fails, we need to call pg_recoalesce and resend
      the I/O to the MDS.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfs: recoalesce when ld write pagelist fails · 8ce160c5
      Committed by Peng Tao
      When a pnfs pagelist write fails, we need to call pg_recoalesce and resend
      the I/O to the MDS.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfs: make _set_lo_fail generic · 1b0ae068
      Committed by Peng Tao
      The file layout and block layout drivers both use it to set the layout I/O
      failure bit, so make it generic.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfsblock: add missing rpc_put_mount and path_put · 760383f1
      Committed by Peng Tao
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC/NFS: make rpc pipe upcall generic · c1225158
      Committed by Peng Tao
      The same function is used by idmap, gss and blocklayout code. Make it
      generic.
      Signed-off-by: Peng Tao <peng_tao@emc.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfsblock: fix size of upcall message · fdc17abb
      Committed by Jim Rees
      Make the status field explicitly 32 bits.  "...it's unlikely that the kernel
      and userspace would differ on the size of an int here, but it might be a
      good idea to go ahead and make that explicitly 32 bits in case we end up
      dealing with more exotic arches at some point in the future."
      Suggested-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Jim Rees <rees@umich.edu>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
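The idea of pinning a field shared between kernel and userspace to an explicit width can be sketched as below. The struct name and its `major`/`minor` fields are hypothetical and chosen only for illustration; the point is that a fixed-width type plus a compile-time check removes any dependence on the size of `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout, not the kernel's actual structure. */
struct sketch_upcall_msg {
	int32_t  status;   /* explicitly 32 bits on every architecture */
	uint32_t major;
	uint32_t minor;
};

/* Compile-time checks (C11): the build fails if either side of the
 * pipe would see a different layout. */
_Static_assert(sizeof(((struct sketch_upcall_msg *)0)->status) == 4,
	       "status must be exactly 32 bits");
_Static_assert(sizeof(struct sketch_upcall_msg) == 12,
	       "message must have no padding");

/* Number of bytes a reader should expect for one message. */
size_t sketch_upcall_msg_size(void)
{
	return sizeof(struct sketch_upcall_msg);
}
```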
    • pnfsblock: fix return code confusion · 516f2e24
      Committed by Jim Rees
      Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
      nfs4_blk_decode_device.
      
      Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
      nfs4_blk_get_deviceinfo.
      Signed-off-by: Jim Rees <rees@umich.edu>
      Signed-off-by: Benny Halevy <bhalevy@tonian.com>
      Cc: stable@kernel.org [3.0]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
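The convention this commit settles on (always return an encoded errno, never NULL, and have the caller test with IS_ERR) can be sketched in userspace as follows. The `sketch_*` helpers are stand-ins written for this example that mimic the kernel's ERR_PTR, PTR_ERR and IS_ERR; the device structure and both functions are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers:
 * small negative errno values are encoded in the top of the
 * address range, which no valid allocation occupies. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_dev { int id; };

/* Hypothetical lookup: returns a valid device or an encoded errno.
 * It never returns NULL, so callers need only one kind of check. */
struct sketch_dev *sketch_get_deviceinfo(int id)
{
	struct sketch_dev *dev;

	if (id < 0)
		return sketch_err_ptr(-EINVAL);
	dev = malloc(sizeof(*dev));
	if (!dev)
		return sketch_err_ptr(-ENOMEM);
	dev->id = id;
	return dev;
}

/* Caller tests for an error pointer, not NULL, and decodes the errno. */
int sketch_set_layoutdriver(int id)
{
	struct sketch_dev *dev = sketch_get_deviceinfo(id);

	if (sketch_is_err(dev))
		return (int)sketch_ptr_err(dev);
	free(dev);
	return 0;
}
```

Mixing the two conventions is the hazard: a NULL check lets an error pointer through as if it were valid, and an error-pointer check lets NULL through.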
    • nfs: don't try to migrate pages with active requests · 2da95652
      Committed by Jeff Layton
      nfs_find_and_lock_request will take a reference to the nfs_page and
      will then put it if the req is already locked. It's possible though
      that the reference will be the last one. That put then can kick off
      a whole series of reference puts:
      
      nfs_page
         nfs_open_context
            dentry
                inode
      
      If the inode ends up being deleted, then the VFS will call
      truncate_inode_pages. That function will try to take the page lock, but
      it was already locked when migrate_page was called. The code
      deadlocks.
      
      Fix this by simply refusing the migration request if PagePrivate is
      already set, indicating that the page is already associated with an
      active read or write request.
      
      We've had a customer test a backported version of this patch and
      the preliminary results seem good.
      
      Cc: stable@kernel.org
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Harshula Jayasuriya <harshula@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
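The shape of the fix described above, refusing migration up front when the page is tied to an active request, can be modelled in a few lines. Everything here is a simplified userspace model: `struct sketch_page` is a hypothetical stand-in for the kernel's page structure, and returning `-EBUSY` is an assumption about how the refusal is signalled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct sketch_page {
	bool private_set;   /* models PagePrivate(page) */
	int  location;      /* models where the page currently lives */
};

/* Refuse to migrate a page that still has an active read or write
 * request attached, instead of taking and dropping references that
 * could end up needing the page lock we already hold. */
int sketch_migrate_page(struct sketch_page *page, int new_location)
{
	if (page->private_set)
		return -EBUSY;   /* assumption: caller may retry later */

	page->location = new_location;
	return 0;
}
```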
    • nfs: fix bug about IPv6 address scope checking · b9dd3abb
      Committed by Mi Jinlong
      The result of ipv6_addr_scope() is not always a single scope value,
      so we can't use an equality test to compare it with IPV6_ADDR_SCOPE_LINKLOCAL
      in nfs_sockaddr_match_ipaddr6.
      
      This patch fixes the problem and checks the address before the scope_id.
      Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
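The ordering described above (compare the addresses first, then compare scope ids only for link-local addresses) can be sketched in userspace. This is an analog, not the kernel function: it uses the POSIX macro `IN6_IS_ADDR_LINKLOCAL` in place of the kernel's ipv6_addr_scope(), and both function names are made up for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Two IPv6 socket addresses match when the addresses are equal and,
 * for link-local addresses only, the scope ids are equal as well. */
bool sketch_match_ipaddr6(const struct sockaddr_in6 *a,
			  const struct sockaddr_in6 *b)
{
	/* Step 1: the addresses themselves must be identical. */
	if (memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(a->sin6_addr)) != 0)
		return false;

	/* Step 2: the scope id is only meaningful for link-local
	 * addresses, so it is compared only in that case. */
	if (IN6_IS_ADDR_LINKLOCAL(&a->sin6_addr))
		return a->sin6_scope_id == b->sin6_scope_id;

	return true;
}

/* Helper for building addresses; returns false on a parse error. */
bool sketch_make_addr6(struct sockaddr_in6 *sa, const char *text,
		       unsigned int scope_id)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	sa->sin6_scope_id = scope_id;
	return inet_pton(AF_INET6, text, &sa->sin6_addr) == 1;
}
```

With this ordering, two copies of a global address such as `2001:db8::1` match regardless of scope id, while two copies of `fe80::1` match only when they refer to the same interface.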
    • nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pages · 3236c3e1
      Committed by Jeff Layton
      commit 420e3646 allowed the kernel to reduce the number of unnecessary
      commit calls by skipping the commit when there are a large number of
      outstanding pages.
      
      However, the current test in nfs_commit_unstable_pages does not handle
      the edge condition properly. When ncommit == 0, then that means that the
      kernel doesn't need to do anything more for the inode. The current test
      though in the WB_SYNC_NONE case will return true, and the inode will end
      up being marked dirty. Once that happens the inode will never be clean
      until there's a WB_SYNC_ALL flush.
      
      Fix this by immediately returning from nfs_commit_unstable_pages when
      ncommit == 0.
      
      Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
      has a backported version of 420e3646. The inode cache there was growing
      very large. The inode cache was unable to be shrunk since the inodes
      were all marked dirty. Calling sync() would essentially "fix" the
      problem -- the WB_SYNC_ALL flush would result in the inodes all being
      marked clean.
      
      What I'm not clear on is how big a problem this is in mainline kernels
      as the writeback code there is very different. Either way, it seems
      incorrect to re-mark the inode dirty in this case.
      Reported-by: Mike McLean <mikem@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Cc: stable@kernel.org [2.6.34+]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
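The edge case in this commit can be reproduced with a small decision function. This is a simplified model, not the kernel routine: the enum names are invented for the example, and the "defer while at most half of the pages await a commit" threshold is an assumption used only to make the edge case visible.

```c
/* Hypothetical model of the writeback sync modes. */
enum sketch_sync_mode { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

enum sketch_result {
	SKETCH_NOTHING_TO_DO,   /* inode can stay clean */
	SKETCH_MARK_DIRTY,      /* commit deferred, inode redirtied */
	SKETCH_COMMIT           /* commit issued now */
};

enum sketch_result sketch_commit_unstable_pages(unsigned long ncommit,
						unsigned long npages,
						enum sketch_sync_mode mode)
{
	/* The fix: no pages awaiting commit means there is nothing
	 * more to do, so return before the deferral test runs. */
	if (ncommit == 0)
		return SKETCH_NOTHING_TO_DO;

	/* Assumed threshold: in a non-blocking flush, defer the commit
	 * while many writes are still outstanding. Without the early
	 * return above, ncommit == 0 would always satisfy this test
	 * and the inode would be redirtied for no reason. */
	if (mode == SKETCH_SYNC_NONE && ncommit <= (npages >> 1))
		return SKETCH_MARK_DIRTY;

	return SKETCH_COMMIT;
}
```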
    • Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when syncing" · 59b7c05f
      Committed by Trond Myklebust
      This reverts commit b80c3cb6.
      
      The reverted commit was rendered obsolete by a VFS fix: commit
      5547e8aa (writeback: Update dirty flags in two steps). We no longer
      need to worry about writeback_single_inode() missing our marking
      the inode for COMMIT in the 'do_writepages()' call.
      
      Reverting this patch fixes a performance regression in which the inode
      would continuously get queued to the dirty list, causing the writeback
      code to unnecessarily try to send a COMMIT.
      
      Signed-off-by: Trond Myklebust <Trond.Myklebust>
      Tested-by: Simon Kirby <sim@hostway.ca>
      Cc: stable@kernel.org [2.6.35+]
  2. 18 Oct, 2011 1 commit
  3. 17 Oct, 2011 2 commits
  4. 15 Oct, 2011 4 commits
  5. 14 Oct, 2011 8 commits
  6. 13 Oct, 2011 4 commits
  7. 12 Oct, 2011 3 commits
  8. 11 Oct, 2011 3 commits
    • Btrfs: make sure not to defrag extents past i_size · f7f43cc8
      Committed by Chris Mason
      The btrfs file defrag code will loop through the extents and
      force COW on them.  But if there is a concurrent truncate in the middle of
      the defrag, it might end up defragging the same range over and over
      again.
      
      The problem is that writepage won't go through and do anything on pages
      past i_size, so the cow won't happen, so the file will appear to still
      be fragmented.  defrag will end up hitting the same extents again and
      again.
      
      In the worst case, the truncate can actually live lock with the defrag
      because the defrag keeps creating new ordered extents which the truncate
      code keeps waiting on.
      
      The fix here is to make defrag check for i_size inside the main loop,
      instead of just once before the looping starts.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
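The difference between checking the size once and checking it inside the loop can be shown with a toy model. Nothing below is btrfs code: the callback stands in for reading i_size, and `struct sketch_file` simulates a truncate that lands partway through the walk.

```c
#include <stddef.h>

/* Hypothetical callback standing in for reading the inode's i_size. */
typedef unsigned long (*sketch_size_fn)(void *ctx);

/* Walks [0, size) in 'step'-sized chunks and re-reads the size on
 * every iteration, so a concurrent truncate ends the walk early
 * instead of leaving it working on ranges past the end of the file.
 * Returns the number of chunks processed. */
unsigned long sketch_defrag(sketch_size_fn cur_size, void *ctx,
			    unsigned long step)
{
	unsigned long off = 0, done = 0;

	if (step == 0)
		return 0;

	while (off < cur_size(ctx)) {   /* size checked inside the loop */
		done++;
		off += step;
	}
	return done;
}

/* Simulated file: shrinks to 'after' once the size has been read
 * more than 'when' times. */
struct sketch_file {
	unsigned long size, after;
	int calls, when;
};

unsigned long sketch_file_size(void *ctx)
{
	struct sketch_file *f = ctx;

	if (++f->calls > f->when)
		f->size = f->after;
	return f->size;
}
```

In this model a 100-byte file truncated to 20 bytes after two size reads stops after two 10-byte chunks, whereas a loop that sampled the size only once would have continued through all ten.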
    • x86: Default to vsyscall=native for now · 2b666859
      Committed by Adrian Bunk
      This UML breakage:
      
        linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
        linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
      
      is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
      vsyscall= parameter"). The vsyscall emulation code is not fully cooked
      yet, as UML relies on some rather fragile SIGSEGV semantics.
      
      Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
      to vsyscall=native for now; this patch implements that.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Andrew Lutomirski <luto@mit.edu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Btrfs: fix recursive auto-defrag · 2a0f7f57
      Committed by Li Zefan
      Follow these steps:
      
        # mount -o autodefrag /dev/sda7 /mnt
        # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
        # sync
        # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
      
      and then it'll go into a loop: writeback -> defrag -> writeback ...
      
      It's because writeback writes [8K, 200K] and then writes [0, 8K].
      
      I tried to make writeback know if the pages are dirtied by defrag,
      but the patch was a bit intrusive. Here I simply set writeback_index
      when we defrag a file.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>