1. 30 9月, 2010 17 次提交
  2. 24 9月, 2010 5 次提交
  3. 23 9月, 2010 4 次提交
    • K
      /proc/pid/smaps: fix dirty pages accounting · 1c2499ae
      KOSAKI Motohiro 提交于
      Currently, /proc/<pid>/smaps has wrong dirty pages accounting.
      Shared_Dirty and Private_Dirty output only pte dirty pages and ignore
      PG_dirty page flag.  It is difference against documentation, but also
      inconsistent against Referenced field.  (Referenced checks both pte and
      page flags)
      
      This patch fixes it.
      
      Test program:
      
       large-array.c
       ---------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <unistd.h>
      
       char array[1*1024*1024*1024L];
      
       int main(void)
       {
               memset(array, 1, sizeof(array));
               pause();
      
               return 0;
       }
       ---------------------------------------------------
      
      Test case:
       1. run ./large-array
       2. cat /proc/`pidof large-array`/smaps
       3. swapoff -a
       4. cat /proc/`pidof large-array`/smaps again
      
      Test result:
       <before patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:    218992 kB   <-- showed pages as clean incorrectly
      Private_Dirty:    829584 kB
      Referenced:       388364 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      
       <after patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         0 kB
      Private_Dirty:   1048576 kB  <-- fixed
      Referenced:       388480 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c2499ae
    • J
      aio: do not return ERESTARTSYS as a result of AIO · a0c42bac
      Jan Kara 提交于
      OCFS2 can return ERESTARTSYS from its write function when the process is
      signalled while waiting for a cluster lock (and the filesystem is mounted
      with intr mount option).  Generally, it seems reasonable to allow
      filesystems to return this error code from its IO functions.  As we must
      not leak ERESTARTSYS (and similar error codes) to userspace as a result of
      an AIO operation, we have to properly convert it to EINTR inside AIO code
      (restarting the syscall isn't really an option because other AIO could
      have been already submitted by the same io_submit syscall).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0c42bac
    • A
      /proc/vmcore: fix seeking · c227e690
      Arnd Bergmann 提交于
      Commit 73296bc6 ("procfs: Use generic_file_llseek in /proc/vmcore")
      broke seeking on /proc/vmcore.  This changes it back to use default_llseek
      in order to restore the original behaviour.
      
      The problem with generic_file_llseek is that it only allows seeks up to
      inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
      file systems.  We should merge generic_file_llseek and default_llseek some
      day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
      is the safer solution.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c227e690
    • D
      Prevent freeing uninitialized pointer in compat_do_readv_writev · 767b68e9
      Dan Rosenberg 提交于
      In 32-bit compatibility mode, the error handling for
      compat_do_readv_writev() may free an uninitialized pointer, potentially
      leading to all sorts of ugly memory corruption.  This is reliably
      triggerable by unprivileged users by invoking the readv()/writev()
      syscalls with an invalid iovec pointer.  The below patch fixes this to
      emulate the non-compat version.
      
      Introduced by commit b8373363 ("compat: factor out
      compat_rw_copy_check_uvector from compat_do_readv_writev")
      Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: stable@kernel.org (2.6.35)
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      767b68e9
  4. 22 9月, 2010 2 次提交
    • J
      bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends · 692ebd17
      Jan Kara 提交于
      Inodes of devices such as /dev/zero can get dirty for example via
      utime(2) syscall or due to atime update. Backing device of such inodes
      (zero_bdi, etc.) is however unable to handle dirty inodes and thus
      __mark_inode_dirty complains.  In fact, inode should be rather dirtied
      against backing device of the filesystem holding it. This is generally a
      good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes
      in these pseudofilesystems are referenced from ordinary filesystem
      inodes and carry mapping with real data of the device. Thus for these
      inodes we have to use inode->i_mapping->backing_dev_info as we did so
      far. We distinguish these filesystems by checking whether sb->s_bdi
      points to a non-trivial backing device or not.
      
      Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
      There's a device inode A described by a path "/dev/sdb" on this
      filesystem. This inode will be dirtied against backing device "8:0"
      after this patch. bdev filesystem contains block device inode B coupled
      with our inode A. When someone modifies a page of /dev/sdb, it's B that
      gets dirtied and the dirtying happens against the backing device "8:16".
      Thus both inodes get filed to a correct bdi list.
      
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      692ebd17
    • J
      char: Mark /dev/zero and /dev/kmem as not capable of writeback · 371d217e
      Jan Kara 提交于
      These devices don't do any writeback but their device inodes still can get
      dirty so mark bdi appropriately so that bdi code does the right thing and files
      inodes to lists of bdi carrying the device inodes.
      
      Cc: stable@kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      371d217e
  5. 20 9月, 2010 1 次提交
  6. 18 9月, 2010 3 次提交
  7. 17 9月, 2010 3 次提交
  8. 15 9月, 2010 4 次提交
    • J
      aio: check for multiplication overflow in do_io_submit · 75e1c70f
      Jeff Moyer 提交于
      Tavis Ormandy pointed out that do_io_submit does not do proper bounds
      checking on the passed-in iocb array:
      
             if (unlikely(nr < 0))
                     return -EINVAL;
      
             if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))
                     return -EFAULT;                      ^^^^^^^^^^^^^^^^^^
      
      The attached patch checks for overflow, and if it is detected, the
      number of iocbs submitted is scaled down to a number that will fit in
      the long.  This is an ok thing to do, as sys_io_submit is documented as
      returning the number of iocbs submitted, so callers should handle a
      return value of less than the 'nr' argument passed in.
      Reported-by: NTavis Ormandy <taviso@cmpxchg8b.com>
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75e1c70f
    • J
      cifs: fix potential double put of TCP session reference · 460cf341
      Jeff Layton 提交于
      cifs_get_smb_ses must be called on a server pointer on which it holds an
      active reference. It first does a search for an existing SMB session. If
      it finds one, it'll put the server reference and then try to ensure that
      the negprot is done, etc.
      
      If it encounters an error at that point then it'll return an error.
      There's a potential problem here though. When cifs_get_smb_ses returns
      an error, the caller will also put the TCP server reference leading to a
      double-put.
      
      Fix this by having cifs_get_smb_ses only put the server reference if
      it found an existing session that it could use and isn't returning an
      error.
      
      Cc: stable@kernel.org
      Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      460cf341
    • S
      ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap · cfc0bf66
      Sage Weil 提交于
      Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages
      or is still writing.  We'll send the newer capsnaps only after the older
      ones complete.
      Signed-off-by: NSage Weil <sage@newdream.net>
      cfc0bf66
    • S
      ceph: correctly set 'follows' in flushsnap messages · 8bef9239
      Sage Weil 提交于
      The 'follows' should match the seq for the snap context for the given snap
      cap, which is the context under which we have been dirtying and writing
      data and metadata.  The snapshot that _contains_ those updates thus
      _follows_ that context's seq #.
      Signed-off-by: NSage Weil <sage@newdream.net>
      8bef9239
  9. 14 9月, 2010 1 次提交