1. 28 10月, 2010 19 次提交
  2. 05 10月, 2010 2 次提交
    • J
      ext3: Fix lost extented attributes for inode with ino == 11 · beb37b85
      Jan Kara 提交于
      If a filesystem has inode size > 128 and someone deletes lost+found and
      reuses inode 11 for some other file, extented attributes set for this
      inode before umount will get lost after remounting the filesystem. This
      is because extended attributes will get stored in an inode but ext3_iget
      will ignore them due to workaround of a bug in an old mkfs.
      
      Fix the problem by initializing i_extra_isize to 0 for freshly allocated
      inodes where mkfs workaround in ext3_iget applies. This way these inodes
      will always store extended attributes in a special block and no problems
      occur.
      
      The bug was spotted and a reproduction test provided by:
        Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
      Reviewed-by: NAndreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NJan Kara <jack@suse.cz>
      beb37b85
    • J
      quota: Make QUOTACTL config be selected by its users · 80f44b15
      Jan Kara 提交于
      Remove "depends on" line from QUOTACTL config option and rather select
      the option explicitely from config options which need it. It makes more
      sense this way and also fixes Kconfig warning due to GFS2 selecting
      QUOTACTL but QUOTACTL not depending on it.
      Signed-off-by: NJan Kara <jack@suse.cz>
      80f44b15
  3. 02 10月, 2010 4 次提交
    • F
      reiserfs: fix unwanted reiserfs lock recursion · 9d8117e7
      Frederic Weisbecker 提交于
      Prevent from recursively locking the reiserfs lock in reiserfs_unpack()
      because we may call journal_begin() that requires the lock to be taken
      only once, otherwise it won't be able to release the lock while taking
      other mutexes, ending up in inverted dependencies between the journal
      mutex and the reiserfs lock for example.
      
      This fixes:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.35.4.4a #3
        -------------------------------------------------------
        lilo/1620 is trying to acquire lock:
         (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
      
        but task is already holding lock:
         (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
               [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs]
               [<c10b6a20>] do_remount_sb+0x60/0x110
               [<c10cee25>] do_mount+0x625/0x790
               [<c10cf014>] sys_mount+0x84/0xb0
               [<c12fca3d>] syscall_call+0x7/0xb
      
        -> #0 (&journal->j_mutex){+.+...}:
               [<c10560f6>] __lock_acquire+0x1026/0x1180
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
               [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
               [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
               [<c10dbbe6>] block_prepare_write+0x26/0x40
               [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
               [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
               [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
               [<c10c3188>] vfs_ioctl+0x28/0xa0
               [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
               [<c10c3eb3>] sys_ioctl+0x63/0x70
               [<c12fca3d>] syscall_call+0x7/0xb
      
        other info that might help us debug this:
      
        2 locks held by lilo/1620:
         #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs]
         #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        stack backtrace:
        Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
        Call Trace:
         [<c10560f6>] __lock_acquire+0x1026/0x1180
         [<c10562b7>] lock_acquire+0x67/0x80
         [<c12facad>] __mutex_lock_common+0x4d/0x410
         [<c12fb0c8>] mutex_lock_nested+0x18/0x20
         [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
         [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
         [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
         [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
         [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
         [<c10dbbe6>] block_prepare_write+0x26/0x40
         [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
         [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
         [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3188>] vfs_ioctl+0x28/0xa0
         [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3eb3>] sys_ioctl+0x63/0x70
         [<c12fca3d>] syscall_call+0x7/0xb
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: All since 2.6.32 <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d8117e7
    • F
      reiserfs: fix dependency inversion between inode and reiserfs mutexes · 3f259d09
      Frederic Weisbecker 提交于
      The reiserfs mutex already depends on the inode mutex, so we can't lock
      the inode mutex in reiserfs_unpack() without using the safe locking API,
      because reiserfs_unpack() is always called with the reiserfs mutex locked.
      
      This fixes:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.35c #13
        -------------------------------------------------------
        lilo/1606 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
      
        but task is already holding lock:
         (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
               [<c1056347>] lock_acquire+0x67/0x80
               [<c12f083d>] __mutex_lock_common+0x4d/0x410
               [<c12f0c58>] mutex_lock_nested+0x18/0x20
               [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
               [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
               [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs]
               [<c10b7d17>] get_sb_bdev+0x117/0x170
               [<d0313e21>] get_super_block+0x21/0x30 [reiserfs]
               [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0
               [<c10b7659>] do_kern_mount+0x39/0xe0
               [<c10cebe0>] do_mount+0x340/0x790
               [<c10cf0b4>] sys_mount+0x84/0xb0
               [<c12f25cd>] syscall_call+0x7/0xb
      
        -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
               [<c1056186>] __lock_acquire+0x1026/0x1180
               [<c1056347>] lock_acquire+0x67/0x80
               [<c12f083d>] __mutex_lock_common+0x4d/0x410
               [<c12f0c58>] mutex_lock_nested+0x18/0x20
               [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
               [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
               [<c10c3228>] vfs_ioctl+0x28/0xa0
               [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
               [<c10c3f53>] sys_ioctl+0x63/0x70
               [<c12f25cd>] syscall_call+0x7/0xb
      
        other info that might help us debug this:
      
        1 lock held by lilo/1606:
         #0:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        stack backtrace:
        Pid: 1606, comm: lilo Not tainted 2.6.35c #13
        Call Trace:
         [<c1056186>] __lock_acquire+0x1026/0x1180
         [<c1056347>] lock_acquire+0x67/0x80
         [<c12f083d>] __mutex_lock_common+0x4d/0x410
         [<c12f0c58>] mutex_lock_nested+0x18/0x20
         [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
         [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3228>] vfs_ioctl+0x28/0xa0
         [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3f53>] sys_ioctl+0x63/0x70
         [<c12f25cd>] syscall_call+0x7/0xb
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: <stable@kernel.org>		[2.6.32 and later]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f259d09
    • J
      proc: make /proc/pid/limits world readable · 3036e7b4
      Jiri Olsa 提交于
      Having the limits file world readable will ease the task of system
      management on systems where root privileges might be restricted.
      
      Having admin restricted with root priviledges, he/she could not check
      other users process' limits.
      
      Also it'd align with most of the /proc stat files.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3036e7b4
    • J
      cifs: prevent infinite recursion in cifs_reconnect_tcon · f569599a
      Jeff Layton 提交于
      cifs_reconnect_tcon is called from smb_init. After a successful
      reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That
      function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo.
      Those functions also call smb_init.
      
      It's possible for the session and tcon reconnect to succeed, and then
      for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or
      CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call
      smb_init and cifs_reconnect_tcon again, ad infinitum...
      
      Break the infinite recursion by having those functions use a new
      smb_init variant that doesn't attempt to perform a reconnect.
      Reported-and-Tested-by: NMichal Suchanek <hramrach@centrum.cz>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      f569599a
  4. 30 9月, 2010 2 次提交
  5. 29 9月, 2010 1 次提交
    • D
      xfs: force background CIL push under sustained load · 80168676
      Dave Chinner 提交于
      I have been seeing occasional pauses in transaction throughput up to
      30s long under heavy parallel workloads. The only notable thing was
      that the xfsaild was trying to be active during the pauses, but
      making no progress. It was running exactly 20 times a second (on the
      50ms no-progress backoff), and the number of pushbuf events was
      constant across this time as well.  IOWs, the xfsaild appeared to be
      stuck on buffers that it could not push out.
      
      Further investigation indicated that it was trying to push out inode
      buffers that were pinned and/or locked. The xfsbufd was also getting
      woken at the same frequency (by the xfsaild, no doubt) to push out
      delayed write buffers. The xfsbufd was not making any progress
      because all the buffers in the delwri queue were pinned. This scan-
      and-make-no-progress dance went one in the trace for some seconds,
      before the xfssyncd came along an issued a log force, and then
      things started going again.
      
      However, I noticed something strange about the log force - there
      were way too many IO's issued. 516 log buffers were written, to be
      exact. That added up to 129MB of log IO, which got me very
      interested because it's almost exactly 25% of the size of the log.
      He delayed logging code is suppose to aggregate the minimum of 25%
      of the log or 8MB worth of changes before flushing. That's what
      really puzzled me - why did a log force write 129MB instead of only
      8MB?
      
      Essentially what has happened is that no CIL pushes had occurred
      since the previous tail push which cleared out 25% of the log space.
      That caused all the new transactions to block because there wasn't
      log space for them, but they kick the xfsaild to push the tail.
      However, the xfsaild was not making progress because there were
      buffers it could not lock and flush, and the xfsbufd could not flush
      them because they were pinned. As a result, both the xfsaild and the
      xfsbufd could not move the tail of the log forward without the CIL
      first committing.
      
      The cause of the problem was that the background CIL push, which
      should happen when 8MB of aggregated changes have been committed, is
      being held off by the concurrent transaction commit load. The
      background push does a down_write_trylock() which will fail if there
      is a concurrent transaction commit holding the push lock in read
      mode. With 8 CPUs all doing transactions as fast as they can, there
      was enough concurrent transaction commits to hold off the background
      push until tail-pushing could no longer free log space, and the halt
      would occur.
      
      It should be noted that there is no reason why it would halt at 25%
      of log space used by a single CIL checkpoint. This bug could
      definitely violate the "no transaction should be larger than half
      the log" requirement and hence result in corruption if the system
      crashed under heavy load. This sort of bug is exactly the reason why
      delayed logging was tagged as experimental....
      
      The fix is to start blocking background pushes once the threshold
      has been exceeded. Rework the threshold calculations to keep the
      amount of log space a CIL checkpoint can use to below that of the
      AIL push threshold to avoid the problem completely.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      80168676
  6. 24 9月, 2010 5 次提交
  7. 23 9月, 2010 4 次提交
    • K
      /proc/pid/smaps: fix dirty pages accounting · 1c2499ae
      KOSAKI Motohiro 提交于
      Currently, /proc/<pid>/smaps has wrong dirty pages accounting.
      Shared_Dirty and Private_Dirty output only pte dirty pages and ignore
      PG_dirty page flag.  It is difference against documentation, but also
      inconsistent against Referenced field.  (Referenced checks both pte and
      page flags)
      
      This patch fixes it.
      
      Test program:
      
       large-array.c
       ---------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <unistd.h>
      
       char array[1*1024*1024*1024L];
      
       int main(void)
       {
               memset(array, 1, sizeof(array));
               pause();
      
               return 0;
       }
       ---------------------------------------------------
      
      Test case:
       1. run ./large-array
       2. cat /proc/`pidof large-array`/smaps
       3. swapoff -a
       4. cat /proc/`pidof large-array`/smaps again
      
      Test result:
       <before patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:    218992 kB   <-- showed pages as clean incorrectly
      Private_Dirty:    829584 kB
      Referenced:       388364 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      
       <after patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         0 kB
      Private_Dirty:   1048576 kB  <-- fixed
      Referenced:       388480 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c2499ae
    • J
      aio: do not return ERESTARTSYS as a result of AIO · a0c42bac
      Jan Kara 提交于
      OCFS2 can return ERESTARTSYS from its write function when the process is
      signalled while waiting for a cluster lock (and the filesystem is mounted
      with intr mount option).  Generally, it seems reasonable to allow
      filesystems to return this error code from its IO functions.  As we must
      not leak ERESTARTSYS (and similar error codes) to userspace as a result of
      an AIO operation, we have to properly convert it to EINTR inside AIO code
      (restarting the syscall isn't really an option because other AIO could
      have been already submitted by the same io_submit syscall).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0c42bac
    • A
      /proc/vmcore: fix seeking · c227e690
      Arnd Bergmann 提交于
      Commit 73296bc6 ("procfs: Use generic_file_llseek in /proc/vmcore")
      broke seeking on /proc/vmcore.  This changes it back to use default_llseek
      in order to restore the original behaviour.
      
      The problem with generic_file_llseek is that it only allows seeks up to
      inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
      file systems.  We should merge generic_file_llseek and default_llseek some
      day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
      is the safer solution.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c227e690
    • D
      Prevent freeing uninitialized pointer in compat_do_readv_writev · 767b68e9
      Dan Rosenberg 提交于
      In 32-bit compatibility mode, the error handling for
      compat_do_readv_writev() may free an uninitialized pointer, potentially
      leading to all sorts of ugly memory corruption.  This is reliably
      triggerable by unprivileged users by invoking the readv()/writev()
      syscalls with an invalid iovec pointer.  The below patch fixes this to
      emulate the non-compat version.
      
      Introduced by commit b8373363 ("compat: factor out
      compat_rw_copy_check_uvector from compat_do_readv_writev")
      Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: stable@kernel.org (2.6.35)
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      767b68e9
  8. 22 9月, 2010 2 次提交
    • J
      bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends · 692ebd17
      Jan Kara 提交于
      Inodes of devices such as /dev/zero can get dirty for example via
      utime(2) syscall or due to atime update. Backing device of such inodes
      (zero_bdi, etc.) is however unable to handle dirty inodes and thus
      __mark_inode_dirty complains.  In fact, inode should be rather dirtied
      against backing device of the filesystem holding it. This is generally a
      good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes
      in these pseudofilesystems are referenced from ordinary filesystem
      inodes and carry mapping with real data of the device. Thus for these
      inodes we have to use inode->i_mapping->backing_dev_info as we did so
      far. We distinguish these filesystems by checking whether sb->s_bdi
      points to a non-trivial backing device or not.
      
      Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
      There's a device inode A described by a path "/dev/sdb" on this
      filesystem. This inode will be dirtied against backing device "8:0"
      after this patch. bdev filesystem contains block device inode B coupled
      with our inode A. When someone modifies a page of /dev/sdb, it's B that
      gets dirtied and the dirtying happens against the backing device "8:16".
      Thus both inodes get filed to a correct bdi list.
      
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      692ebd17
    • J
      char: Mark /dev/zero and /dev/kmem as not capable of writeback · 371d217e
      Jan Kara 提交于
      These devices don't do any writeback but their device inodes still can get
      dirty so mark bdi appropriately so that bdi code does the right thing and files
      inodes to lists of bdi carrying the device inodes.
      
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      371d217e
  9. 20 9月, 2010 1 次提交