1. 23 5月, 2012 1 次提交
  2. 21 5月, 2012 1 次提交
    • T
      ext4: enable the 64-bit jbd2 feature based on the 64-bit ext4 feature · f32aaf2d
      Theodore Ts'o 提交于
      Previously we were only enabling the 64-bit jbd2 feature if the number
      of blocks in the file system was greater 2**32-1.  The problem with
      this is that it makes it harder to test the 64-bit journal code paths
      with small file systems, since a small test file system would with the
      64-bit ext4 feature enable would use a 64-bit file system on-disk data
      structures, but use a 32-bit journal.
      
      This would also cause problems when trying to do an online resize to
      grow the filesystem above the 2**32-1 boundary.  Fortunately the patch
      to support online resize for 64-bit file systems hasn't been merged
      yet, so this problem hasn't arisen in practice.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f32aaf2d
  3. 30 4月, 2012 17 次提交
  4. 28 4月, 2012 8 次提交
    • L
      Revert "autofs: work around unhappy compat problem on x86-64" · fcbf94b9
      Linus Torvalds 提交于
      This reverts commit a32744d4.
      
      While that commit was technically the right thing to do, and made the
      x86-64 compat mode work identically to native 32-bit mode (and thus
      fixing the problem with a 32-bit systemd install on a 64-bit kernel), it
      turns out that the automount binaries had workarounds for this compat
      problem.
      
      Now, the workarounds are disgusting: doing an "uname()" to find out the
      architecture of the kernel, and then comparing it for the 64-bit cases
      and fixing up the size of the read() in automount for those.  And they
      were confused: it's not actually a generic 64-bit issue at all, it's
      very much tied to just x86-64, which has different alignment for an
      'u64' in 64-bit mode than in 32-bit mode.
      
      But the end result is that fixing the compat layer actually breaks the
      case of a 32-bit automount on a x86-64 kernel.
      
      There are various approaches to fix this (including just doing a
      "strcmp()" on current->comm and comparing it to "automount"), but I
      think that I will do the one that teaches pipes about a special "packet
      mode", which will allow user space to not have to care too deeply about
      the padding at the end of the autofs packet.
      
      That change will make the compat workaround unnecessary, so let's revert
      it first, and get automount working again in compat mode.  The
      packetized pipes will then fix autofs for systemd.
      Reported-and-requested-by: NMichael Tokarev <mjt@tls.msk.ru>
      Cc: Ian Kent <raven@themaw.net>
      Cc: stable@kernel.org # for 3.3
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcbf94b9
    • C
      Btrfs: reduce lock contention during extent insertion · dc7fdde3
      Chris Mason 提交于
      We're spending huge amounts of time on lock contention during
      end_io processing because we unconditionally assume we are overwriting
      an existing extent in the file for each IO.
      
      This checks to see if we are outside i_size, and if so, it uses a
      less expensive readonly search of the btree to look for existing
      extents.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      dc7fdde3
    • C
      Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir · fede766f
      Chris Mason 提交于
      Btrfs has an optimization where it will preallocate dentries during
      readdir to fill in enough information to open the inode without an extra
      lookup.
      
      But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and
      that leads to deadlocks because our readdir code has tree locks held.
      
      For now, disable this optimization.  We'll fix the gfp mask in the next
      merge window.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      fede766f
    • D
      Btrfs: Fix space checking during fs resize · 7654b724
      Daniel J Blueman 提交于
      Fix out-of-space checking, addressing a warning and potential resource
      leak when resizing the filesystem down while allocating blocks.
      Signed-off-by: NDaniel J Blueman <daniel@quora.org>
      Reviewed-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7654b724
    • S
      Btrfs: fix block_rsv and space_info lock ordering · 1f699d38
      Stefan Behrens 提交于
      may_commit_transaction() calls
              spin_lock(&space_info->lock);
              spin_lock(&delayed_rsv->lock);
      and update_global_block_rsv() calls
              spin_lock(&block_rsv->lock);
              spin_lock(&sinfo->lock);
      
      Lockdep complains about this at run time.
      Everywhere except in update_global_block_rsv(), the space_info lock is
      the outer lock, therefore the locking order in update_global_block_rsv()
      is changed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1f699d38
    • D
      Btrfs: Prevent root_list corruption · 1daf3540
      Daniel J Blueman 提交于
      I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
      correct locking to address this.
      Signed-off-by: NDaniel J Blueman <daniel@quora.org>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1daf3540
    • J
      Btrfs: fix repair code for RAID10 · 3e74317a
      Jan Schmidt 提交于
      btrfs_map_block sets mirror_num, so that the repair code knows eventually
      which device gave us the read error. For RAID10, mirror_num must be 1 or 2.
      Before this fix mirror_num was incorrectly related to our stripe index.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      3e74317a
    • J
      Btrfs: do not start delalloc inodes during sync · 996d282c
      Josef Bacik 提交于
      btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and
      start writing them out, but it doesn't splice the list or anything so as
      long as somebody is doing work on the box you could end up in this section
      _forever_.  So just remove it, it's not needed anyway since sync will start
      writeback on all inodes anyway, all we need to do is wait for ordered
      extents and then we can commit the transaction.  In my horrible torture test
      sync goes from taking 4 minutes to about 1.5 minutes.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      996d282c
  5. 26 4月, 2012 4 次提交
    • W
      revert "proc: clear_refs: do not clear reserved pages" · 63f61a6f
      Will Deacon 提交于
      Revert commit 85e72aa5 ("proc: clear_refs: do not clear reserved
      pages"), which was a quick fix suitable for -stable until ARM had been
      moved over to the gate_vma mechanism:
      
      https://lkml.org/lkml/2012/1/14/55
      
      With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user
      mapping"), ARM does now use the gate_vma, so the PageReserved check can be
      removed from the proc code.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63f61a6f
    • A
      hugetlbfs: lockdep annotate root inode properly · 65ed7601
      Aneesh Kumar K.V 提交于
      This fixes the below reported false lockdep warning.  e096d0c7
      ("lockdep: Add helper function for dir vs file i_mutex annotation") added
      a similar annotation for every other inode in hugetlbfs but missed the
      root inode because it was allocated by a separate function.
      
      For HugeTLB fs we allow taking i_mutex in mmap.  HugeTLB fs doesn't
      support file write and its file read callback is modified in a05b0855
      ("hugetlbfs: avoid taking i_mutex from hugetlbfs_read()") to not take
      i_mutex.  Hence for HugeTLB fs with regular files we really don't take
      i_mutex with mmap_sem held.
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.4.0-rc1+ #322 Not tainted
       -------------------------------------------------------
       bash/1572 is trying to acquire lock:
        (&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90
      
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}:
              [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
              [<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350
              [<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31
              [<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104
              [<ffffffff810f859a>] mmap_region+0x272/0x47d
              [<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee
              [<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e
              [<ffffffff8103d19e>] sys_mmap+0x1d/0x1f
              [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
      
       -> #0 (&mm->mmap_sem){++++++}:
              [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
              [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
              [<ffffffff810f1645>] might_fault+0x6d/0x90
              [<ffffffff81125d62>] filldir+0x6a/0xc2
              [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
              [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
              [<ffffffff811260b6>] sys_getdents+0x79/0xc9
              [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&sb->s_type->i_mutex_key#12);
                                      lock(&mm->mmap_sem);
                                      lock(&sb->s_type->i_mutex_key#12);
         lock(&mm->mmap_sem);
      
        *** DEADLOCK ***
      
       1 lock held by bash/1572:
        #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8
      
       stack backtrace:
       Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322
       Call Trace:
        [<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209
        [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
        [<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614
        [<ffffffff8109e622>] ? mark_lock+0x2d/0x258
        [<ffffffff810f1618>] ? might_fault+0x40/0x90
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff810f1618>] ? might_fault+0x40/0x90
        [<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350
        [<ffffffff810f1645>] might_fault+0x6d/0x90
        [<ffffffff810f1618>] ? might_fault+0x40/0x90
        [<ffffffff81125d62>] filldir+0x6a/0xc2
        [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
        [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
        [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
        [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
        [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
        [<ffffffff811260b6>] sys_getdents+0x79/0xc9
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65ed7601
    • G
      fs/buffer.c: remove BUG() in possible but rare condition · 61065a30
      Glauber Costa 提交于
      While stressing the kernel with with failing allocations today, I hit the
      following chain of events:
      
      alloc_page_buffers():
      
      	bh = alloc_buffer_head(GFP_NOFS);
      	if (!bh)
      		goto no_grow; <= path taken
      
      grow_dev_page():
              bh = alloc_page_buffers(page, size, 0);
              if (!bh)
                      goto failed;  <= taken, consequence of the above
      
      and then the failed path BUG()s the kernel.
      
      The failure is inserted a litte bit artificially, but even then, I see no
      reason why it should be deemed impossible in a real box.
      
      Even though this is not a condition that we expect to see around every
      time, failed allocations are expected to be handled, and BUG() sounds just
      too much.  As a matter of fact, grow_dev_page() can return NULL just fine
      in other circumstances, so I propose we just remove it, then.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61065a30
    • J
      epoll: clear the tfile_check_list on -ELOOP · 13d51807
      Jason Baron 提交于
      An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
      circular epoll dependencies from being created.  However, in that case we
      do not properly clear the 'tfile_check_list'.  Thus, add a call to
      clear_tfile_check_list() for the -ELOOP case.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Reported-by: NYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
      Cc: Nelson Elhage <nelhage@nelhage.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Tested-by: NAlexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13d51807
  6. 25 4月, 2012 2 次提交
  7. 24 4月, 2012 4 次提交
  8. 22 4月, 2012 2 次提交
  9. 21 4月, 2012 1 次提交