1. 25 Feb 2009, 1 commit
  2. 24 Feb 2009, 1 commit
  3. 23 Feb 2009, 1 commit
  4. 22 Feb 2009, 1 commit
    • ext4: Add fallback for find_group_flex · 05bf9e83
      Committed by Theodore Ts'o
      This is a workaround for find_group_flex() which badly needs to be
      replaced.  One of its problems (besides ignoring the Orlov algorithm)
      is that it is a bit hyperactive about returning failure under
      suspicious circumstances.  This can lead to spurious ENOSPC failures
      even when there are inodes still available.
      
      Work around this for now by retrying the search using
      find_group_other() if find_group_flex() returns -1.  If
      find_group_other() succeeds when find_group_flex() has failed, log a
      warning message.
      
      A better block/inode allocator that will fix this problem for real has
      been queued up for the next merge window.
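      A minimal sketch of the workaround described above (illustrative, not the
      verbatim ext4 patch):
      
      	ret2 = find_group_flex(sb, dir, &group);
      	if (ret2 == -1) {
      		/* fall back to the non-flex allocator and warn if it
      		 * succeeds where find_group_flex() just failed */
      		ret2 = find_group_other(sb, dir, &group);
      		if (ret2 == 0 && printk_ratelimit())
      			printk(KERN_NOTICE "ext4: find_group_flex failed, "
      			       "fallback succeeded dir %lu\n", dir->i_ino);
      	}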
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      05bf9e83
  5. 21 Feb 2009, 7 commits
  6. 20 Feb 2009, 3 commits
  7. 19 Feb 2009, 6 commits
    • inotify: fix GFP_KERNEL related deadlock · f04b30de
      Committed by Ingo Molnar
      Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
      assert:
      
      [ 1093.677775]
      [ 1093.677781] =================================
      [ 1093.680031] [ INFO: inconsistent lock state ]
      [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] ---------------------------------
      [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1093.680031]  (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
      [ 1093.680031]   [<c01696b9>] mark_held_locks+0x43/0x5b
      [ 1093.680031]   [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e
      [ 1093.680031]   [<c01cf8b0>] kmem_cache_alloc+0x20/0x150
      [ 1093.680031]   [<c040d0ec>] idr_pre_get+0x27/0x6c
      [ 1093.680031]   [<c02056e3>] inotify_handle_get_wd+0x25/0xad
      [ 1093.680031]   [<c0205f43>] inotify_add_watch+0x7a/0x129
      [ 1093.680031]   [<c020679e>] sys_inotify_add_watch+0x20f/0x250
      [ 1093.680031]   [<c010389e>] sysenter_do_call+0x12/0x35
      [ 1093.680031]   [<ffffffff>] 0xffffffff
      [ 1093.680031] irq event stamp: 60417
      [ 1093.680031] hardirqs last  enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59
      [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59
      [ 1093.680031] softirqs last  enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b
      [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d
      [ 1093.680031]
      [ 1093.680031] other info that might help us debug this:
      [ 1093.680031] 2 locks held by kswapd0/308:
      [ 1093.680031]  #0:  (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189
      [ 1093.680031]  #1:  (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb
      [ 1093.680031]
      [ 1093.680031] stack backtrace:
      [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] Call Trace:
      [ 1093.680031]  [<c016947a>] valid_state+0x12a/0x13d
      [ 1093.680031]  [<c016954e>] mark_lock+0xc1/0x1e9
      [ 1093.680031]  [<c016a5b4>] ? check_usage_forwards+0x0/0x3f
      [ 1093.680031]  [<c016ab74>] __lock_acquire+0x2c6/0xac8
      [ 1093.680031]  [<c01688d9>] ? register_lock_class+0x17/0x228
      [ 1093.680031]  [<c016b3d3>] lock_acquire+0x5d/0x7a
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08824c4>] __mutex_lock_common+0x3a/0x4cb
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08829ed>] mutex_lock_nested+0x2e/0x36
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c01e6672>] dentry_iput+0x90/0xc2
      [ 1093.680031]  [<c01e67a3>] d_kill+0x21/0x45
      [ 1093.680031]  [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355
      [ 1093.680031]  [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb
      [ 1093.680031]  [<c01b05ed>] shrink_slab+0x121/0x189
      [ 1093.680031]  [<c01b0d12>] kswapd+0x39f/0x561
      [ 1093.680031]  [<c01ae499>] ? isolate_pages_global+0x0/0x233
      [ 1093.680031]  [<c0157eae>] ? autoremove_wake_function+0x0/0x43
      [ 1093.680031]  [<c01b0973>] ? kswapd+0x0/0x561
      [ 1093.680031]  [<c0157daf>] kthread+0x41/0x82
      [ 1093.680031]  [<c0157d6e>] ? kthread+0x0/0x82
      [ 1093.680031]  [<c01043ab>] kernel_thread_helper+0x7/0x10
      
      inotify_handle_get_wd() does idr_pre_get(), which does a
      kmem_cache_alloc() with __GFP_FS set (plain GFP_KERNEL) - and is hence
      deadlockable under extreme MM pressure.
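      A hedged illustration of the kind of fix this implies (not the verbatim
      patch): allocations made while inotify_mutex is held must not allow FS
      reclaim, e.g.
      
      	/* GFP_NOFS instead of GFP_KERNEL: reclaim must not recurse into
      	 * the filesystem and re-take inotify_mutex */
      	if (unlikely(!idr_pre_get(&ih->idr, GFP_NOFS)))
      		return -ENOSPC;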
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f04b30de
    • vt: Declare PIO_CMAP/GIO_CMAP as compatible ioctls. · 2db69a93
      Committed by Bill Nottingham
      Otherwise, these don't work when called from 32-bit userspace on 64-bit
      kernels.
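      A hedged sketch of the declaration (assuming the usual fs/compat_ioctl.c
      table macro; not the verbatim patch):
      
      	/* identical 32-bit and 64-bit layouts, pass straight through */
      	COMPATIBLE_IOCTL(PIO_CMAP)
      	COMPATIBLE_IOCTL(GIO_CMAP)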
      
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2db69a93
    • fs/super.c: add lockdep annotation to s_umount · ada723dc
      Committed by Peter Zijlstra
      Li Zefan said:
      
      Thread 1:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            cat /mnt/cpus > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      Thread 2:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      (Note: It is irrelevant which cgroup subsys is used.)
      
      After a while a lockdep warning showed up:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28 #479
      ---------------------------------------------
      mount/13554 is trying to acquire lock:
       (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
      
      but task is already holding lock:
       (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      other info that might help us debug this:
      1 lock held by mount/13554:
       #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      stack backtrace:
      Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
      Call Trace:
       [<c044ad2e>] validate_chain+0x4c6/0xbbd
       [<c044ba9b>] __lock_acquire+0x676/0x700
       [<c044bb82>] lock_acquire+0x5d/0x7a
       [<c049d888>] ? sget+0x5e/0x321
       [<c061b9b8>] down_write+0x34/0x50
       [<c049d888>] ? sget+0x5e/0x321
       [<c049d888>] sget+0x5e/0x321
       [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
       [<c045959f>] ? cgroup_test_super+0x0/0x2f
       [<c045bcea>] cgroup_get_sb+0x98/0x2e7
       [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
       [<c049dfa4>] vfs_kern_mount+0x40/0x7b
       [<c049e02d>] do_kern_mount+0x37/0xbf
       [<c04af4a0>] do_mount+0x5c3/0x61a
       [<c04addd2>] ? copy_mount_options+0x2c/0x111
       [<c04af560>] sys_mount+0x69/0xa0
       [<c0403251>] sysenter_do_call+0x12/0x31
      
      The cause: after alloc_super() and the subsequent retry, an old entry is
      found in the fs_supers list, so grab_super(old) is called - but both
      functions take the s_umount lock:
      
      struct super_block *sget(...)
      {
      	...
      retry:
      	spin_lock(&sb_lock);
      	if (test) {
      		list_for_each_entry(old, &type->fs_supers, s_instances) {
      			if (!test(old, data))
      				continue;
      			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
      				goto retry;
      			if (s)
      				destroy_super(s);
      			return old;
      		}
      	}
      	if (!s) {
      		spin_unlock(&sb_lock);
      		s = alloc_super(type);   <--- 1st: down_write(&s->s_umount)
      		if (!s)
      			return ERR_PTR(-ENOMEM);
      		goto retry;
      	}
      	...
      }
      
      It seems to be a false positive, and it seems that the VFS, not cgroup,
      is what needs to be fixed.
      
      Peter said:
      
      We can simply put the new s_umount instance in a different subclass;
      lockdep doesn't particularly care about subclass order.
      
      If there's any issue with the callers of sget() assuming the s_umount lock
      being of subclass 0, then there is another annotation we can use to fix
      that, but let's not bother with that if this is sufficient.
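      A minimal sketch of such an annotation (assuming the new superblock's
      rwsem is taken inside alloc_super(); not the verbatim patch):
      
      	init_rwsem(&s->s_umount);
      	/* the new, not-yet-visible superblock gets a nested subclass so
      	 * lockdep does not flag it against the s_umount already held in
      	 * sget() */
      	down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING);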
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ada723dc
    • mm: task dirty accounting fix · 1cf6e7d8
      Committed by Nick Piggin
      YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for
      cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty).
      
      Additionally, there is some inconsistency about when task_dirty_inc is
      called.  It is used for dirty balancing, however it even gets called for
      __set_page_dirty_no_writeback.
      
      So rather than increment it in a set_page_dirty wrapper, move it down to
      exactly where the dirty page accounting stats are incremented.
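      A hedged sketch of where the increment ends up (helper names are
      illustrative, not the exact mm/page-writeback.c change):
      
      	if (mapping_cap_account_dirty(mapping)) {
      		__inc_zone_page_state(page, NR_FILE_DIRTY);
      		/* count the dirtying task here, in the common accounting
      		 * path, so mark_buffer_dirty() and friends are covered too */
      		task_dirty_inc(current);
      	}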
      
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1cf6e7d8
    • timerfd: add flags check · 610d18f4
      Committed by Davide Libenzi
      As requested by Michael, add a missing check for valid flags in
      timerfd_settime(), and make it return EINVAL in case some extra bits are
      set.
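      A hedged sketch of the check (the macro name may differ from the final
      kernel code):
      
      	/* reject any bits this syscall does not understand, so userspace
      	 * can probe for flag support */
      	if (flags & ~TFD_SETTIME_FLAGS)
      		return -EINVAL;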
      
      Michael said:
      If this is to be any use to userland apps that want to check flag
      support (perhaps it is too late already), then the sooner we get it
      into the kernel the better: 2.6.29 would be good; earlier stables as
      well would be even better.
      
      [akpm@linux-foundation.org: remove unused TFD_FLAGS_SET]
      Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: <stable@kernel.org>		[2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      610d18f4
    • seq_file: properly cope with pread · 8f19d472
      Committed by Eric Biederman
      Currently seq_read assumes that the offset passed to it is always the
      offset it passed to user space.  In the case of pread this assumption is
      broken and we do the wrong thing when presented with pread.
      
      To solve this I introduce an offset cache inside of struct seq_file so we
      know where our logical file position is.  Then in seq_read if we try to
      read from another offset we reset our data structures and attempt to go to
      the offset user space wanted.
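      A hedged sketch of the idea (field names illustrative, not the exact
      seq_file change):
      
      	if (*ppos != m->read_pos) {
      		/* caller seeked, e.g. via pread(): drop the cached state
      		 * and re-walk the sequence to the requested offset */
      		m->count = 0;
      		/* ... traverse until the iterator matches *ppos ... */
      		m->read_pos = *ppos;
      	}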
      
      [akpm@linux-foundation.org: restore FMODE_PWRITE]
      [pjt@google.com: seq_open needs its fmode opened up to take advantage of this]
      Signed-off-by: Eric Biederman <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Turner <pjt@google.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f19d472
  8. 18 Feb 2009, 3 commits
    • block: revert part of 18ce3751 · 78f707bf
      Committed by Jens Axboe
      The above commit added WRITE_SYNC and switched various places to using
      that for committing writes that will be waited upon immediately after
      submission. However, this causes a performance regression with AS and CFQ
      for ext3 at least, since sync_dirty_buffer() will submit some writes with
      WRITE_SYNC while ext3 has submitted other dependent writes without the sync
      flag set. This causes excessive anticipation/idling in the IO scheduler
      because sync and async writes get interleaved, causing a big performance
      regression for the below test case (which is meant to simulate sqlite
      like behaviour).
      
      ---- test case ----
      
      /* Headers and a ROWS definition added so the test compiles; the
         original message does not give a ROWS value, 10000 is an arbitrary
         choice. */
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/time.h>
      
      #ifndef ROWS
      #define ROWS 10000
      #endif
      
      int main(int argc, char **argv)
      {
      	int fdes, i;
      	FILE *fp;
      	struct timeval start;
      	struct timeval end;
      	struct timeval res;
      
      	gettimeofday(&start, NULL);
      	/* append one line and fsync it per iteration, sqlite-commit style */
      	for (i = 0; i < ROWS; i++) {
      		fp = fopen("test_file", "a");
      		fprintf(fp, "Some Text Data\n");
      		fdes = fileno(fp);
      		fsync(fdes);
      		fclose(fp);
      	}
      	gettimeofday(&end, NULL);
      
      	timersub(&end, &start, &res);
      	fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS,
      			(res.tv_sec*1000000 + res.tv_usec)/1000);
      
      	return 0;
      }
      
      -------------------
      
      Thanks to Sean.White@APCC.com for tracking down this performance
      regression and providing a test case.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      78f707bf
    • fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free · a60e78e5
      Committed by Subhash Peddamallu
      When freeing from the bio pool, use the right pointer to account for
      bs->front_pad, instead of the bio pointer.
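      A hedged sketch of the corrected call (not the verbatim patch): free the
      pointer the mempool actually handed out, which includes the front pad,
      rather than the bio pointer offset into it, e.g.
      
      	mempool_free(p, bs->bio_pool);	/* p == (void *)bio - bs->front_pad */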
      Signed-off-by: Subhash Peddamallu <subhash.peddamallu@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      a60e78e5
    • Fix incomplete __mntput locking · 1a88b536
      Committed by Al Viro
      Getting this wrong caused
      
      	WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2()
      
      due to optimistically checking cpu_writer->mnt outside the spinlock.
      
      Here's what we really want:
       * we know that nobody will set cpu_writer->mnt to mnt from now on
       * all changes to that sucker are done under cpu_writer->lock
       * we want the laziest equivalent of
      	spin_lock(&cpu_writer->lock);
      	if (likely(cpu_writer->mnt != mnt)) {
      		spin_unlock(&cpu_writer->lock);
      		continue;
      	}
      	/* do stuff */
        that would make sure we won't miss earlier setting of ->mnt done by
        another CPU.
      
      Anyway, for now we just move the spin_lock() earlier and move the test
      into the properly locked region.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a88b536
  9. 16 Feb 2009, 1 commit
  10. 14 Feb 2009, 2 commits
  11. 13 Feb 2009, 2 commits
    • Btrfs: hold trans_mutex when using btrfs_record_root_in_trans · 24562425
      Committed by Yan Zheng
      btrfs_record_root_in_trans needs the trans_mutex held to make sure two
      callers don't race to setup the root in a given transaction.  This adds
      it to all the places that were missing it.
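      A hedged illustration of the callers' pattern (the exact call signature
      may differ from the btrfs code of the time):
      
      	mutex_lock(&root->fs_info->trans_mutex);
      	btrfs_record_root_in_trans(root);
      	mutex_unlock(&root->fs_info->trans_mutex);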
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      24562425
    • Btrfs: make a lockdep class for the extent buffer locks · 4008c04a
      Committed by Chris Mason
      Btrfs is currently using spin_lock_nested with a nested value based
      on the tree depth of the block.  But, this doesn't quite work because
      the max tree depth is bigger than what spin_lock_nested can deal with,
      and because locks are sometimes taken before the level field is filled in.
      
      The solution here is to use lockdep_set_class_and_name instead, and to
      set the class before unlocking the pages when the block is read from the
      disk and just after init of a freshly allocated tree block.
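      A minimal sketch of the per-level class pattern (identifiers are
      illustrative, not copied from btrfs):
      
      	static struct lock_class_key eb_lock_class[BTRFS_MAX_LEVEL + 1];
      
      	/* called after a block is read from disk or freshly initialized,
      	 * once its level is known and before anyone else can lock it */
      	lockdep_set_class_and_name(&eb->lock, &eb_lock_class[level],
      				   "btrfs-eb-level");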
      
      btrfs_clear_path_blocking is also changed to take the locks in the proper
      order, and it also makes sure all the locks currently held are properly
      set to blocking before it tries to retake the spinlocks.  Otherwise, lockdep
      gets upset about bad lock ordering.
      
      The lockdep magic came from Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      4008c04a
  12. 12 Feb 2009, 1 commit
    • Btrfs: fs/btrfs/volumes.c: remove useless kzalloc · 3f3420df
      Committed by Julia Lawall
      The call to kzalloc is followed by a kmalloc whose result is stored in the
      same variable.
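      An illustrative example of the pattern being removed (not the actual
      btrfs code): the first allocation leaks and is immediately overwritten.
      
      	struct foo *x = kzalloc(sizeof(*x), GFP_NOFS);	/* useless: result */
      	x = kmalloc(sizeof(*x), GFP_NOFS);		/* is overwritten  */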
      
      The semantic match that finds the problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @r exists@
      local idexpression x;
      statement S;
      expression E;
      identifier f,l;
      position p1,p2;
      expression *ptr != NULL;
      @@
      
      (
      if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
      |
      x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
      ...
      if (x == NULL) S
      )
      <... when != x
           when != if (...) { <+...x...+> }
      x->f = E
      ...>
      (
       return \(0\|<+...x...+>\|ptr\);
      |
       return@p2 ...;
      )
      
      @script:python@
      p1 << r.p1;
      p2 << r.p2;
      @@
      
      print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3f3420df
  13. 13 Feb 2009, 2 commits
  14. 12 Feb 2009, 9 commits
    • Btrfs: balance_level checks !child after access · 7951f3ce
      Committed by Jeff Mahoney
      The BUG_ON() is in the wrong spot.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      7951f3ce
    • Btrfs: Avoid using __GFP_HIGHMEM with slab allocator · b335b003
      Committed by Yan Zheng
      btrfs_releasepage may call kmem_cache_alloc indirectly, passing along
      the same GFP flags it receives.  This makes it possible for
      __GFP_HIGHMEM to reach the slab allocator, which does not support
      highmem allocations.
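      A hedged sketch of the kind of fix this implies (not the verbatim btrfs
      change):
      
      	/* strip bits such as __GFP_HIGHMEM that must never reach the slab
      	 * allocator before passing the mask further down */
      	mask &= ~__GFP_HIGHMEM;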
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      b335b003
    • Btrfs: don't clean old snapshots on sync(1) · e1df36d2
      Committed by Chris Mason
      Cleaning old snapshots can make sync(1) somewhat slow, and some users
      and applications still use it in a global fsync kind of workload.
      
      This patch changes btrfs not to clean old snapshots during sync, which is
      safe from a FS consistency point of view.  The major downside is that it
      makes it difficult to tell when old snapshots have been reaped and
      the space they were using has been reclaimed.  A new ioctl will be added
      for this purpose instead.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      e1df36d2
    • Btrfs: use larger metadata clusters in ssd mode · 536ac8ae
      Committed by Chris Mason
      Larger metadata clusters can significantly improve writeback performance
      on ssd drives with large erasure blocks.  The larger clusters make it
      more likely a given IO will completely overwrite the ssd block, so it
      doesn't have to do an internal read-modify-write (RMW) cycle.
      
      On spinning media, larger metadata clusters end up spreading out the
      metadata more over time, which makes fsck slower, so we don't want this
      to be the default.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      536ac8ae
    • Btrfs: process mount options on mount -o remount, · b288052e
      Committed by Chris Mason
      Btrfs wasn't parsing any new mount options during remount, making it
      difficult to set mount options on a root drive.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      b288052e
    • Btrfs: make sure all pending extent operations are complete · eb099670
      Committed by Josef Bacik
      There's a slight problem with finish_current_insert: if we set all to 1 and then
      go through and don't actually skip any of the extents on the pending list, we
      could exit right after we've added new extents.
      
      This is a problem because by inserting the new extents we could have gotten new
      COW's to happen and such, so we may have some pending updates to do or even
      more inserts to do after that.
      
      So this patch will only exit if we have never skipped any of the extents in the
      pending list, and we have no extents to insert, this will make sure that all of
      the pending work is truly done before we return.  I've been running with this
      patch for a few days with all of my other testing and have not seen issues.
      Thanks,
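      A hedged sketch of the exit rule described above (the helper name is
      hypothetical, standing in for the pass over the pending list):
      
      	for (;;) {
      		/* run_pending_ops() reports whether any extent was skipped;
      		 * inserting extents may queue yet more pending work */
      		int skipped = run_pending_ops(trans, &pending);
      		if (!skipped && list_empty(&pending))
      			break;
      	}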
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      eb099670
    • ext2/xip: refuse to change xip flag during remount with busy inodes · 0e4a9b59
      Committed by Carsten Otte
      For a reason that I was unable to understand in three months of debugging,
      mount ext2 -o remount stopped working properly when remounting from
      regular operation to xip, or the other way around.  According to a git
      bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
      rework in the vm:
      
      commit 70688e4d
      Author: Nick Piggin <npiggin@suse.de>
      Date:   Mon Apr 28 02:13:02 2008 -0700
      
          xip: support non-struct page backed memory
      
      In the failing scenario, the filesystem is mounted read only via root=
      kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
      the bash binary and its libraries are busy and cannot be invalidated (the
      bash which is running rc.sysinit resides on subject filesystem).
      Afterwards, another bash process (running ifup-eth) recurses into a
      subshell, runs dup_mm (via fork).  Some of the mappings in this bash
      process were created from inodes that could not be invalidated during
      remount.
      
      Both parent and child process crash some time later due to inconsistencies
      in their address spaces.  The issue seems to be timing sensitive, various
      attempts to recreate it have failed.
      
      This patch refuses to change the xip flag during remount in case some
      inodes cannot be invalidated.  This patch keeps users from running into
      that issue.
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e4a9b59
    • ext3: revert "ext3: wait on all pending commits in ext3_sync_fs" · 02ac597c
      Committed by Jan Kara
      This reverts commit c87591b7.
      
      Since journal_start_commit() is now fixed to return 1 when it has started
      a transaction commit, when a transaction is waiting to be committed, or
      when a transaction is already committing, we don't need to call
      ext3_force_commit() in ext3_sync_fs().  Furthermore ext3_force_commit()
      can unnecessarily create a sync transaction, which is expensive, so it's
      worthwhile to remove it when we can.
      
      Cc: Eric Sandeen <sandeen@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02ac597c
    • jbd: fix return value of journal_start_commit() · 8fe4cd0d
      Committed by Jan Kara
      journal_start_commit() returns 1 if either a transaction is committing or
      the function has queued a transaction commit.  But it returns 0 if we
      raced with somebody queueing the transaction commit as well.  This
      resulted in ext3_sync_fs() not functioning correctly (description from
      Arthur Jones): In the case of a data=ordered umount with pending long
      symlinks which are delayed due to a long list of other I/O on the backing
      block device, this causes the buffer associated with the long symlinks to
      not be moved to the inode dirty list in the second phase of fsync_super.
      Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT
      flag and the dirty pages are never written to the backing block device,
      causing long symlink corruption and exposing new or previously freed block
      data to userspace.
      
      This can be reproduced with a script created by Eric Sandeen
      <sandeen@redhat.com>:
      
              #!/bin/bash
      
              umount /mnt/test2
              mount /dev/sdb4 /mnt/test2
              rm -f /mnt/test2/*
              dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
              touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
              ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link
              umount /mnt/test2
              mount /dev/sdb4 /mnt/test2
              ls /mnt/test2/
      
      This patch fixes journal_start_commit() to always return 1 when there's
      a transaction committing or queued for commit.
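      A hedged sketch of the corrected semantics (structure illustrative, not
      the verbatim jbd code):
      
      	int ret = 0;
      
      	spin_lock(&journal->j_state_lock);
      	if (journal->j_running_transaction) {
      		/* make sure a commit of the running transaction has been
      		 * requested, then report success */
      		__log_start_commit(journal, journal->j_running_transaction->t_tid);
      		ret = 1;
      	} else if (journal->j_committing_transaction)
      		ret = 1;	/* a commit is already in progress */
      	spin_unlock(&journal->j_state_lock);
      	return ret;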
      
      Cc: Eric Sandeen <sandeen@redhat.com>
      Cc: Mike Snitzer <snitzer@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8fe4cd0d