1. 14 7月, 2011 1 次提交
    • S
      GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65
      Steven Whitehouse 提交于
      This patch contains a few misc fixes which resolve a recently
      reported issue. This patch has been a real team effort and has
      received a lot of testing.
      
      The first issue is that the ail lock needs to be held over a few
      more operations. The lock thats added into gfs2_releasepage() may
      possibly be a candidate for replacing with RCU at some future
      point, but at this stage we've gone for the obvious fix.
      
      The second issue is that gfs2_write_inode() can end up calling
      a glock recursively when called from gfs2_evict_inode() via the
      syncing code, so it needs a guard added.
      
      The third issue is that we either need to not truncate the metadata
      pages of inodes which have zero link count, but which we cannot
      deallocate due to them still being in use by other nodes, or we need
      to ensure that those pages have all made it through the journal and
      ail lists first. This patch takes the former approach, but the
      latter has also been tested and there is nothing to choose between
      them performance-wise. So again, we could revise that decision
      in the future.
      
      Also, the inode eviction process is now better documented.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Tested-by: NAbhijith Das <adas@redhat.com>
      Reported-by: NBarry J. Marson <bmarson@redhat.com>
      Reported-by: NDavid Teigland <teigland@redhat.com>
      380f7c65
  2. 12 7月, 2011 3 次提交
    • S
      GFS2: Fix race during filesystem mount · 3942ae53
      Steven Whitehouse 提交于
      There is a potential race during filesystem mounting which has recently
      been reported. It occurs when the userland gfs_controld is able to
      process requests fast enough that it tries to use the sysfs interface
      before the lock module is properly initialised. This is a pretty
      unusual case as normally the lock module initialisation is very quick
      compared with gfs_controld.
      
      This patch adds an interruptible completion which is used to ensure that
      userland will wait for the initialisation of the lock module to
      complete.
      
      There are other potential solutions to this problem, but this is the
      quickest at this stage and has been tested both with and without
      mount.gfs2 present in the system.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reported-by: NDavid Booher <dbooher@adams.net>
      3942ae53
    • B
      GFS2: force a log flush when invalidating the rindex glock · 1ce53368
      Benjamin Marzinski 提交于
      Right now, there is nothing that forces the log to get flushed when a node
      drops its rindex glock so that another node can grow the filesystem. If the
      log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
      following way.
      
      A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
      dropped so the other node can grow the filesystem. When the node reacquires the
      rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
      removed from the list by gfs2_log_flush().
      
      This code simply forces a log flush when the rindex glock is invalidated,
      solving the problem.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1ce53368
    • J
      cifs: drop spinlock before calling cifs_put_tlink · f484b5d0
      Jeff Layton 提交于
      ...as that function can sleep.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      f484b5d0
  3. 10 7月, 2011 2 次提交
  4. 08 7月, 2011 2 次提交
    • J
    • D
      FS-Cache: Add a helper to bulk uncache pages on an inode · c902ce1b
      David Howells 提交于
      Add an FS-Cache helper to bulk uncache pages on an inode.  This will
      only work for the circumstance where the pages in the cache correspond
      1:1 with the pages attached to an inode's page cache.
      
      This is required for CIFS and NFS: When disabling inode cookie, we were
      returning the cookie and setting cifsi->fscache to NULL but failed to
      invalidate any previously mapped pages.  This resulted in "Bad page
      state" errors and manifested in other kind of errors when running
      fsstress.  Fix it by uncaching mapped pages when we disable the inode
      cookie.
      
      This patch should fix the following oops and "Bad page state" errors
      seen during fsstress testing.
      
        ------------[ cut here ]------------
        kernel BUG at fs/cachefiles/namei.c:201!
        invalid opcode: 0000 [#1] SMP
        Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
        RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
        RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
        RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
        RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
        R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
        R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
        FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
        Stack:
         0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
         ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
         ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
        Call Trace:
         cachefiles_lookup_object+0x78/0xd4 [cachefiles]
         fscache_lookup_object+0x131/0x16d [fscache]
         fscache_object_work_func+0x1bc/0x669 [fscache]
         process_one_work+0x186/0x298
         worker_thread+0xda/0x15d
         kthread+0x84/0x8c
         kernel_thread_helper+0x4/0x10
        RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
        ---[ end trace 1d481c9af1804caa ]---
      
      I tested the uncaching by the following means:
      
       (1) Create a big file on my NFS server (104857600 bytes).
      
       (2) Read the file into the cache with md5sum on the NFS client.  Look in
           /proc/fs/fscache/stats:
      
      	Pages  : mrk=25601 unc=0
      
       (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
           again:
      
      	Pages  : mrk=25601 unc=25601
      Reported-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-and-Tested-by: NSuresh Jayaraman <sjayaraman@suse.de>
      cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c902ce1b
  5. 07 7月, 2011 9 次提交
    • M
      btrfs: fix oops when doing space balance · 149e2d76
      Miao Xie 提交于
      We need to make sure the data relocation inode doesn't go through
      the delayed metadata updates, otherwise we get an oops during balance:
      
      kernel BUG at fs/btrfs/relocation.c:4303!
      [SNIP]
      Call Trace:
       [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
       [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
       [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
       [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
       [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
       [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
       [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
       [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
       [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
       [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
       [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
       [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
       [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
       [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
       [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
       [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
       [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
       [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
       [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
       [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
       [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
       [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
       [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
       [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
       [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
       [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
       [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
       [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
       [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
       [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
       [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
      [SNIP]
      RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      149e2d76
    • J
      Btrfs: don't panic if we get an error while balancing V2 · 508794eb
      Josef Bacik 提交于
      A user reported an error where if we try to balance an fs after a device has
      been removed it will blow up.  This is because we get an EIO back and this is
      where BUG_ON(ret) bites us in the ass.  To fix we just exit.  Thanks,
      Reported-by: NAnand Jain <Anand.Jain@oracle.com>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      508794eb
    • D
      btrfs: add missing options displayed in mount output · 0942caa3
      David Sterba 提交于
      There are three missed mount options settable by user which are not
      currently displayed in mount output.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0942caa3
    • D
      xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner 提交于
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with the writing the underlying buffer back to disk.
      The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing a AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1316d4da
    • J
      cifs: have cifs_cleanup_volume_info not take a double pointer · f9e59bcb
      Jeff Layton 提交于
      ...as that makes for a cumbersome interface. Make it take a regular
      smb_vol pointer and rely on the caller to zero it out if needed.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NPavel Shilovsky <piastryyy@gmail.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      f9e59bcb
    • J
      cifs: fix build_unc_path_to_root to account for a prefixpath · b2a0fa15
      Jeff Layton 提交于
      Regression introduced by commit f87d39d9.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NPavel Shilovsky <piastryyy@gmail.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      b2a0fa15
    • J
      cifs: remove bogus call to cifs_cleanup_volume_info · 677d8537
      Jeff Layton 提交于
      This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's
      called the following call to cifs_get_tcp_session will oops as the
      volume_info pointer will then be NULL.
      
      The caller of cifs_mount should clean up this data since it passed it
      in. There's no need for us to call this here.
      
      Regression introduced by commit 724d9f1c.
      Reported-by: NAdam Williamson <awilliam@redhat.com>
      Cc: Pavel Shilovsky <piastryyy@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      677d8537
    • D
      FDPIC: Fix memory leak · bcb65a79
      Davidlohr Bueso 提交于
      The shdr4extnum variable isn't being freed in the cleanup process of
      elf_fdpic_core_dump().
      Signed-off-by: NDavidlohr Bueso <dave@gnu.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcb65a79
    • M
      fs: fix lock initialization · a51cb91d
      Miklos Szeredi 提交于
      locks_alloc_lock() assumed that the allocated struct file_lock is
      already initialized to zero members.  This is only true for the first
      allocation of the structure, after reuse some of the members will have
      random values.
      
      This will for example result in passing random fl_start values to
      userspace in fuse for FL_FLOCK locks, which is an information leak at
      best.
      
      Fix by reinitializing those members which may be non-zero after freeing.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a51cb91d
  6. 02 7月, 2011 1 次提交
    • J
      cifs: set socket send and receive timeouts before attempting connect · ee1b3ea9
      Jeff Layton 提交于
      Benjamin S. reported that he was unable to suspend his machine while
      it had a cifs share mounted. The freezer caused this to spew when he
      tried it:
      
      -----------------------[snip]------------------
      PM: Syncing filesystems ... done.
      Freezing user space processes ... (elapsed 0.01 seconds) done.
      Freezing remaining freezable tasks ...
      Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
      cifsd         S ffff880127f7b1b0     0  1821      2 0x00800000
       ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
       ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
       ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
      Call Trace:
       [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
       [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
       [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
       [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
       [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
       [<ffffffff811e81c7>] ? release_sock+0x19/0xef
       [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
       [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
       [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
       [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
       [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
       [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
       [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
       [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
       [<ffffffff810444a1>] ? kthread+0x7a/0x82
       [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
       [<ffffffff81044427>] ? kthread+0x0/0x82
       [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10
      
      Restarting tasks ... done.
      -----------------------[snip]------------------
      
      We do attempt to perform a try_to_freeze in cifs_reconnect, but the
      connection attempt itself seems to be taking longer than 20s to time
      out. The connect timeout is governed by the socket send and receive
      timeouts, so we can shorten that period by setting those timeouts
      before attempting the connect instead of after.
      
      Adam Williamson tested the patch and said that it seems to have fixed
      suspending on his laptop when a cifs share is mounted.
      Reported-by: NBenjamin S <da_joind@gmx.net>
      Tested-by: NAdam Williamson <awilliam@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      ee1b3ea9
  7. 30 6月, 2011 2 次提交
  8. 29 6月, 2011 1 次提交
  9. 28 6月, 2011 2 次提交
  10. 27 6月, 2011 1 次提交
    • M
      btrfs: fix inconsonant inode information · 2f7e33d4
      Miao Xie 提交于
      When iputting the inode, We may leave the delayed nodes if they have some
      delayed items that have not been dealt with. So when the inode is read again,
      we must look up the relative delayed node, and use the information in it to
      initialize the inode. Or we will get inconsonant inode information, it may
      cause that the same directory index number is allocated again, and hit the
      following oops:
      
      [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
      insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
      [ 5447.569766] ------------[ cut here ]------------
      [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
      [SNIP]
      [ 5447.790721] Call Trace:
      [ 5447.793191]  [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
      [ 5447.800156]  [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
      [ 5447.806517]  [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
      [ 5447.812876]  [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
      [ 5447.818961]  [<ffffffff8111f840>] vfs_create+0x72/0x92
      [ 5447.824090]  [<ffffffff8111fa8c>] do_last+0x22c/0x40b
      [ 5447.829133]  [<ffffffff8112076a>] path_openat+0xc0/0x2ef
      [ 5447.834438]  [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
      [ 5447.841216]  [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
      [ 5447.847846]  [<ffffffff81121a79>] do_filp_open+0x3d/0x87
      [ 5447.853156]  [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
      [ 5447.859072]  [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
      [ 5447.864636]  [<ffffffff8111f179>] ? do_getname+0x14b/0x173
      [ 5447.870112]  [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26
      [ 5447.875682]  [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10
      [ 5447.880882]  [<ffffffff81112d39>] do_sys_open+0x69/0xae
      [ 5447.886153]  [<ffffffff81112db1>] sys_open+0x20/0x22
      [ 5447.891114]  [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b
      
      Fix it by reusing the old delayed node.
      Reported-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2f7e33d4
  11. 25 6月, 2011 16 次提交