1. 28 Apr 2007 (1 commit)
  2. 27 Apr 2007 (1 commit)
    • [S390] split page_test_and_clear_dirty. · 6c210482
      Martin Schwidefsky authored
      The page_test_and_clear_dirty primitive really consists of two
      operations, page_test_dirty and page_clear_dirty. The combination
      of the two is not an atomic operation, so it makes more sense to
      have two separate operations instead of one.
      In addition to the improved readability of the s390 version of
      SetPageUptodate, it now avoids the page_test_dirty operation,
      which uses the insert-storage-key-extended (iske) instruction, an
      expensive operation.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  3. 24 Apr 2007 (3 commits)
  4. 13 Apr 2007 (1 commit)
  5. 04 Apr 2007 (2 commits)
    • [PATCH] SLAB: Mention slab name when listing corrupt objects · e94a40c5
      David Howells authored
      Mention the slab name when listing corrupt objects.  Although the function
      that released the memory is mentioned, that is frequently ambiguous as such
      functions often release several pieces of memory.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [S390] page_mkclean data corruption. · 6e1beb3c
      Martin Schwidefsky authored
      The git commit c2fda5fe, which
      added the page_test_and_clear_dirty call to page_mkclean, and the
      git commit 7658cc28, which fixes
      the "nasty and subtle race in shared mmap'ed page writeback"
      problem in clear_page_dirty_for_io, cause data corruption on s390.

      The effect of the two changes is that for every call to
      clear_page_dirty_for_io a page_test_and_clear_dirty is done. If
      the per-page dirty bit is set, set_page_dirty is called. Strangely,
      clear_page_dirty_for_io is also called for not-uptodate pages, e.g.
      over this call chain:
      
       [<000000000007c0f2>] clear_page_dirty_for_io+0x12a/0x130
       [<000000000007c494>] generic_writepages+0x258/0x3e0
       [<000000000007c692>] do_writepages+0x76/0x7c
       [<00000000000c7a26>] __writeback_single_inode+0xba/0x3e4
       [<00000000000c831a>] sync_sb_inodes+0x23e/0x398
       [<00000000000c8802>] writeback_inodes+0x12e/0x140
       [<000000000007b9ee>] wb_kupdate+0xd2/0x178
       [<000000000007cca2>] pdflush+0x162/0x23c
      
      The bad news now is that page_test_and_clear_dirty might claim
      that a not-uptodate page is dirty, since SetPageUptodate, which
      resets the per-page dirty bit, has not yet been called. The page
      writeback that follows clobbers the data on disk.

      The simplest solution to this problem is to move the call to
      page_test_and_clear_dirty under the "if (page_mapped(page))".
      If a file-backed page is mapped, it is uptodate.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  6. 29 Mar 2007 (5 commits)
    • [PATCH] mm: fix xip issue with /dev/zero · a76c0b97
      Carsten Otte authored
      Fix the bug that reading into an xip mapping from /dev/zero fills
      the user page table with ZERO_PAGE() entries.  Later on, xip cannot
      tell which pages have been ZERO_PAGE() filled by access to a sparse
      mapping, and which ones originate from /dev/zero.  It will unmap
      ZERO_PAGE from all mappings when filling the sparse hole with data.
      xip now uses its own zeroed page for its sparse mappings.  Please apply.
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] holepunch: fix mmap_sem i_mutex deadlock · 90ed52eb
      Hugh Dickins authored
      sys_madvise has down_write of mmap_sem, then madvise_remove calls
      vmtruncate_range, which takes i_mutex and i_alloc_sem: no, we can
      easily devise deadlocks from that ordering.

      madvise_remove now drops mmap_sem while calling vmtruncate_range:
      luckily, since madvise_remove doesn't split or merge vmas, it's easy
      to handle this case with a NULL prev, without restructuring
      sys_madvise.  (Though it is sad to retake mmap_sem when it's unlikely
      to be needed, and certainly down_read is sufficient for MADV_REMOVE,
      unlike the other madvices.)
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] holepunch: fix disconnected pages after second truncate · 16a10019
      Hugh Dickins authored
      shmem_truncate_range has its own truncate_inode_pages_range, to free any pages
      racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when
      this might have happened.  But holepunching gets no chance to clear that flag
      at the start of vmtruncate_range, so it's always set (unless a truncate came
      just before), so holepunch almost always does this second
      truncate_inode_pages_range.
      
      shmem holepunch has unlikely swap<->file races hereabouts whatever we do
      (without a fuller rework than is fit for this release): I was going to skip
      the second truncate in the punch_hole case, but Miklos points out that would
      make holepunch correctness more vulnerable to swapoff.  So keep the second
      truncate, but follow it by an unmap_mapping_range to eliminate the
      disconnected pages (freed from pagecache while still mapped in userspace) that
      it might have left behind.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] holepunch: fix shmem_truncate_range punch locking · 1ae70006
      Hugh Dickins authored
      Miklos Szeredi observes that during truncation of shmem page directories,
      info->lock is released to improve latency (after lowering i_size and
      next_index to exclude races); but this is quite wrong for holepunching, which
      receives no such protection from i_size or next_index, and is left vulnerable
      to races with shmem_unuse, shmem_getpage and shmem_writepage.
      
      Hold info->lock throughout when holepunching?  No, any user could prevent
      rescheduling for far too long.  Instead take info->lock just when needed: in
      shmem_free_swp when removing the swap entries, and whenever removing a
      directory page from the level above.  But so long as we remove before
      scanning, we can safely skip taking the lock at the lower levels, except at
      misaligned start and end of the hole.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] holepunch: fix shmem_truncate_range punching too far · a2646d1e
      Hugh Dickins authored
      Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare
      circumstances, because shmem_truncate_range() erroneously removes partially
      truncated directory pages at the end of the range: later reclaim on pages
      pointing to these removed directories triggers the BUG.  Indeed, and it can
      also cause data loss beyond the hole.
      
      Fix this as in the patch proposed by Miklos, but distinguish between "limit"
      (how far we need to search: ignore truncation's next_index optimization in the
      holepunch case - if there are races it's more consistent to act on the whole
      range specified) and "upper_limit" (how far we can free directory pages:
      generally we must be careful to keep partially punched pages, but can relax at
      end of file - i_size being held stable by i_mutex).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 27 Mar 2007 (1 commit)
    • block: blk_max_pfn is sometimes wrong · f772b3d9
      Vasily Tarasov authored
      There is a small problem in handling page bounce.

      At the moment blk_max_pfn equals max_pfn, which is in fact not the
      maximum possible _number_ of a page frame, but the _count_ of page
      frames.  For example, for a 32-bit x86 node with 4GB RAM,
      max_pfn = 0x100000, while the highest valid pfn is 0xFFFFF.

      The request_queue structure has a member q->bounce_pfn, and the
      queue needs bounce pages for the pages _above_ this limit.  This is
      handled by blk_queue_bounce(), where the following check is
      performed:

      	if (q->bounce_pfn >= blk_max_pfn)
      		return;

      Assume that a driver has set q->bounce_pfn to 0xFFFF, but
      blk_max_pfn equals 0x10000.  In this situation the check above
      fails, and for each bio we always fall through to iterating over
      the pages tied to the bio.

      Note that for quite a big range of device drivers (ide, md, ...)
      this problem doesn't happen, because they use BLK_BOUNCE_ANY for
      bounce_pfn.  BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT,
      and then the check above doesn't fail.  But for drivers that obtain
      the required value from the device it fails.  For example, sata_nv
      uses ATA_DMA_MASK or dev->dma_mask.

      I propose to use (max_pfn - 1) for blk_max_pfn, and the same for
      blk_max_low_pfn.  The patch also cleans up some checks related to
      bounce_pfn.
      Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  8. 23 Mar 2007 (2 commits)
    • [PATCH] NOMMU: make SYSV SHM nattch work correctly · 165b2392
      David Howells authored
      Make the SYSV SHM nattch counter work correctly by forcing multiple VMAs to
      be produced to represent MAP_SHARED segments, even if they overlap exactly.
      
      Using this test program:
      
      	http://people.redhat.com/~dhowells/doshm.c
      
      Run as:
      
      	doshm sysv
      
      I can see nattch going from one before the patch:
      
      	# /doshm sysv
      	Command: sysv
      	shmid: 65536
      	memory: 0xc3700000
      	c0b00000-c0b04000 rw-p 00000000 00:00 0
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
      	c3520000-c352278c rw-p 00000000 00:0b 13763417  /doshm
      	c3584000-c35865e8 r-xs 00000000 00:0b 13763417  /doshm
      	c3588000-c358aa00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3590000-c359b6c0 rw-p 00000000 00:00 0
      	c3620000-c3640000 rwxp 00000000 00:00 0
      	c3700000-c37fa000 rw-S 00000000 00:06 1411      /SYSV00000000 (deleted)
      	c3700000-c37fa000 rw-S 00000000 00:06 1411      /SYSV00000000 (deleted)
      	nattch 1
      
      To two after the patch:
      
      	# /doshm sysv
      	Command: sysv
      	shmid: 0
      	memory: 0xc3700000
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
      	c3320000-c3340000 rwxp 00000000 00:00 0
      	c3530000-c35325e8 r-xs 00000000 00:0b 13763417  /doshm
      	c3534000-c353678c rw-p 00000000 00:0b 13763417  /doshm
      	c3538000-c353aa00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3590000-c359b6c0 rw-p 00000000 00:00 0
      	c35a4000-c35a8000 rw-p 00000000 00:00 0
      	c3700000-c37fa000 rw-S 00000000 00:06 1369      /SYSV00000000 (deleted)
      	c3700000-c37fa000 rw-S 00000000 00:06 1369      /SYSV00000000 (deleted)
      	nattch 2
      
      That's +1 to nattch for each shmat() made.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] NOMMU: supply get_unmapped_area() to fix NOMMU SYSV SHM · d56e03cd
      David Howells authored
      Supply a get_unmapped_area() to fix NOMMU SYSV SHM support.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 17 Mar 2007 (4 commits)
  10. 05 Mar 2007 (2 commits)
  11. 02 Mar 2007 (7 commits)
  12. 21 Feb 2007 (3 commits)
  13. 17 Feb 2007 (1 commit)
    • [PATCH] knfsd: stop NFSD writes from being broken into lots of little writes to filesystem · 29dbb3fc
      NeilBrown authored
      When NFSD receives a write request, the data is typically in a
      number of 1448-byte segments, and writev is used to collect them
      together.

      Unfortunately, generic_file_buffered_write passes these to the
      filesystem one at a time, so e.g. a 32K over-write becomes a series
      of partial-page writes to each page, forcing the filesystem to
      pre-read those pages - wasted effort.

      generic_file_buffered_write handles one segment of the vector at a
      time because it has to pre-fault in each segment to avoid deadlocks.
      When writing from kernel space (as nfsd does) this is not an issue,
      so generic_file_buffered_write does not need to break an iovec from
      nfsd into little pieces.

      This patch avoids the splitting when get_fs is KERNEL_DS, as it is
      from NFSD.
      
      This issue was introduced by commit 6527c2bd.
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Norman Weathers <norman.r.weathers@conocophillips.com>
      Cc: Vladimir V. Saveliev <vs@namesys.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 16 Feb 2007 (3 commits)
  15. 13 Feb 2007 (4 commits)