1. 04 6月, 2009 4 次提交
    • J
      ocfs2: Fix possible deadlock in quota recovery · 80d73f15
      Jan Kara 提交于
      In ocfs2_finish_quota_recovery() we acquired global quota file lock and started
      recovering local quota file. During this process we need to get quota
      structures, which calls ocfs2_dquot_acquire() which gets global quota file lock
      again. This second lock can block in case some other node has requested the
      quota file lock in the mean time. Fix the problem by moving quota file locking
      down into the function where it is really needed.  Then dqget() or dqput()
      won't be called with the lock held.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      80d73f15
    • J
      ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() · 65bac575
      Jan Kara 提交于
      We called vfs_dq_transfer() with global quota file lock held. This can lead
      to deadlocks as if vfs_dq_transfer() has to allocate new quota structure,
      it calls ocfs2_dquot_acquire() which tries to get quota file lock again and
      this can block if another node requested the lock in the mean time.
      
      Since we have to call vfs_dq_transfer() with transaction already started
      and quota file lock ranks above the transaction start, we cannot just rely
      on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock
      if they need it. We fix the problem by acquiring pointers to all quota
      structures needed by vfs_dq_transfer() already before calling the function.
      By this we are sure that all quota structures are properly allocated and
      they can be freed only after we drop references to them. Thus we don't need
      quota file lock anywhere inside vfs_dq_transfer().
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      65bac575
    • J
      ocfs2: Fix lock inversion in ocfs2_local_read_info() · b4c30de3
      Jan Kara 提交于
      This function is called with dqio_mutex held but it has to acquire lock
      from global quota file which ranks above this lock. This is not deadlockable
      lock inversion since this code path is take only during mount when noone
      else can race with us but let's clean this up to silence lockdep.
      
      We just drop the dqio_mutex in the beginning of the function and reacquire
      it in the end since we don't need it - noone can race with us at this moment.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      b4c30de3
    • J
      ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() · 4e8a3019
      Jan Kara 提交于
      It is not possible to get a read lock and then try to get the same write lock
      in one thread as that can block on downconvert being requested by other node
      leading to deadlock. So first drop the quota lock for reading and only after
      that get it for writing.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      4e8a3019
  2. 06 5月, 2009 2 次提交
    • C
      ocfs2: update comments in masklog.h · 2b53bc7b
      Coly Li 提交于
      In the mainline ocfs2 code, the interface for masklog is in files under
      /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h
      reference the old /proc interface.  They are out of date.
      
      This patch modifies the comments in cluster/masklog.h, which also provides
      a bash script example on how to change the log mask bits.
      Signed-off-by: NColy Li <coly.li@suse.de>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      2b53bc7b
    • T
      ocfs2: Don't printk the error when listing too many xattrs. · a46fa684
      Tao Ma 提交于
      Currently the kernel defines XATTR_LIST_MAX as 65536
      in include/linux/limits.h.  This is the largest buffer that is used for
      listing xattrs.
      
      But with ocfs2 xattr tree, we actually have no limit for the number.  If
      filesystem has more names than can fit in the buffer, the kernel
      logs will be pollluted with something like this when listing:
      
      (27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34
      (27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34
      
      So don't print "ERROR" message as this is not an ocfs2 error.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      a46fa684
  3. 03 5月, 2009 6 次提交
    • O
      ptrace: s/parent/real_parent/ in binfmt_elf_fdpic.c · 0ae05fb2
      Oleg Nesterov 提交于
      ->real_parent is the parent. ->parent may be the tracer.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ae05fb2
    • K
      mm: fix Committed_AS underflow on large NR_CPUS environment · 00a62ce9
      KOSAKI Motohiro 提交于
      The Committed_AS field can underflow in certain situations:
      
      >         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
      >               1 Committed_AS: 18446744073709323392 kB
      >              11 Committed_AS: 18446744073709455488 kB
      >               6 Committed_AS:    35136 kB
      >               5 Committed_AS: 18446744073709454400 kB
      >               7 Committed_AS:    35904 kB
      >               3 Committed_AS: 18446744073709453248 kB
      >               2 Committed_AS:    34752 kB
      >               9 Committed_AS: 18446744073709453248 kB
      >               8 Committed_AS:    34752 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               7 Committed_AS: 18446744073709454080 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               5 Committed_AS: 18446744073709454080 kB
      >               6 Committed_AS: 18446744073709320960 kB
      
      Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
      not check for underflow.
      
      But NR_CPUS proportional isn't good calculation.  In general,
      possibility of lock contention is proportional to the number of online
      cpus, not theorical maximum cpus (NR_CPUS).
      
      The current kernel has generic percpu-counter stuff.  using it is right
      way.  it makes code simplify and percpu_counter_read_positive() don't
      make underflow issue.
      Reported-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[All kernel versions]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00a62ce9
    • I
      alpha: binfmt_aout fix · 74641f58
      Ivan Kokshaysky 提交于
      This fixes the problem introduced by commit 3bfacef4 (get rid of
      special-casing the /sbin/loader on alpha): osf/1 ecoff binary segfaults
      when binfmt_aout built as module.  That happens because aout binary
      handler gets on the top of the binfmt list due to late registration, and
      kernel attempts to execute the binary without preparatory work that must
      be done by binfmt_loader.
      
      Fixed by changing the registration order of the default binfmt handlers
      using list_add_tail() and introducing insert_binfmt() function which
      places new handler on the top of the binfmt list.  This might be generally
      useful for installing arch-specific frontends for default handlers or just
      for overriding them.
      Signed-off-by: NIvan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74641f58
    • V
      pagemap: require aligned-length, non-null reads of /proc/pid/pagemap · 08161786
      Vitaly Mayatskikh 提交于
      The intention of commit aae8679b
      ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
      /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
      multiple of 8 bytes, but now it allows to read 0 bytes, which actually
      puts some data to user's buffer.  According to POSIX, if count is zero,
      read() should return zero and has no other results.
      Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com>
      Cc: Thomas Tuttle <ttuttle@google.com>
      Acked-by: NMatt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08161786
    • N
      mm: close page_mkwrite races · b827e496
      Nick Piggin 提交于
      Change page_mkwrite to allow implementations to return with the page
      locked, and also change it's callers (in page fault paths) to hold the
      lock until the page is marked dirty.  This allows the filesystem to have
      full control of page dirtying events coming from the VM.
      
      Rather than simply hold the page locked over the page_mkwrite call, we
      call page_mkwrite with the page unlocked and allow callers to return with
      it locked, so filesystems can avoid LOR conditions with page lock.
      
      The problem with the current scheme is this: a filesystem that wants to
      associate some metadata with a page as long as the page is dirty, will
      perform this manipulation in its ->page_mkwrite.  It currently then must
      return with the page unlocked and may not hold any other locks (according
      to existing page_mkwrite convention).
      
      In this window, the VM could write out the page, clearing page-dirty.  The
      filesystem has no good way to detect that a dirty pte is about to be
      attached, so it will happily write out the page, at which point, the
      filesystem may manipulate the metadata to reflect that the page is no
      longer dirty.
      
      It is not always possible to perform the required metadata manipulation in
      ->set_page_dirty, because that function cannot block or fail.  The
      filesystem may need to allocate some data structure, for example.
      
      And the VM cannot mark the pte dirty before page_mkwrite, because
      page_mkwrite is allowed to fail, so we must not allow any window where the
      page could be written to if page_mkwrite does fail.
      
      This solution of holding the page locked over the 3 critical operations
      (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
      closes out races nicely, preventing page cleaning for writeout being
      initiated in that window.  This provides the filesystem with a strong
      synchronisation against the VM here.
      
      - Sage needs this race closed for ceph filesystem.
      - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
      - I need it for fsblock.
      - I suspect other filesystems may need it too (eg. btrfs).
      - I have converted buffer.c to the new locking. Even simple block allocation
        under dirty pages might be susceptible to i_size changing under partial page
        at the end of file (we also have a buffer.c-side problem here, but it cannot
        be fixed properly without this patch).
      - Other filesystems (eg. NFS, maybe btrfs) will need to change their
        page_mkwrite functions themselves.
      
      [ This also moves page_mkwrite another step closer to fault, which should
        eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
        filesystem calldown and page lock/unlock cycle in __do_fault. ]
      
      [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
      Cc: Sage Weil <sage@newdream.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b827e496
    • I
      autofs4: fix incorrect return in autofs4_mount_busy() · a8985f3a
      Ian Kent 提交于
      Fix an obvious incorrect return status in autofs4_mount_busy().
      Signed-off-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8985f3a
  4. 01 5月, 2009 1 次提交
    • J
      ocfs2: Fix a missing credit when deleting from indexed directories. · dfa13f39
      Joel Becker 提交于
      The ocfs2 directory index updates two blocks when we remove an entry -
      the dx root and the dx leaf.  OCFS2_DELETE_INODE_CREDITS was only
      accounting for the dx leaf.  This shows up when ocfs2_delete_inode()
      runs out of credits in jbd2_journal_dirty_metadata() at
      "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".
      
      The test that caught this was running dirop_file_racer from the
      ocfs2-test suite with a 250-character filename PREFIX.  Run on a 512B
      blocksize, it forces the orphan dir index to grow large enough to
      trigger.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      dfa13f39
  5. 30 4月, 2009 5 次提交
  6. 29 4月, 2009 1 次提交
  7. 28 4月, 2009 4 次提交
  8. 27 4月, 2009 8 次提交
  9. 25 4月, 2009 9 次提交