1. 05 Aug, 2010 15 commits
  2. 04 Aug, 2010 3 commits
  3. 31 Jul, 2010 1 commit
    • H
      mm: fix ia64 crash when gcore reads gate area · de51257a
      Hugh Dickins committed
      Debian's ia64 autobuilders have been seeing kernel freezes or reboots
      when running the gdb testsuite (Debian bug 588574): dannf bisected to
      2.6.32's commit 62eede62 "mm: ZERO_PAGE without
      PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.
      
      I'd missed updating the gate_vma handling in __get_user_pages(): that
      happens to use vm_normal_page() (nowadays failing on the zero page),
      yet reported success even when it failed to get a page - boom when
      access_process_vm() tried to copy that to its intermediate buffer.
      
      Fix this, resisting cleanups: in particular, leave it for now reporting
      success when not asked to get any pages - very probably safe to change,
      but let's not risk it without testing exposure.
      
      Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
      Because setup_gate() pads each 64kB of its gate area with zero pages.
      Reported-by: Andreas Barth <aba@not.so.argh.org>
      Bisected-by: dann frazier <dannf@debian.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: dann frazier <dannf@dannf.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 21 Jul, 2010 2 commits
    • Y
      x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used · b8ab9f82
      Yinghai Lu committed
      Borislav Petkov reported that his 32-bit NUMA system has a problem:
      
      [    0.000000] Reserving total of 4c00 pages for numa KVA remap
      [    0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
      [    0.000000] max_pfn = 238000
      [    0.000000] 8202MB HIGHMEM available.
      [    0.000000] 885MB LOWMEM available.
      [    0.000000]   mapped low ram: 0 - 375fe000
      [    0.000000]   low ram: 0 - 375fe000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
      [    0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
      [    0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
      [    0.000000] BUG: unable to handle kernel paging request at 40000000
      [    0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
      [    0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
      ...
      [    0.000000] Call Trace:
      [    0.000000]  [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
      [    0.000000]  [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
      [    0.000000]  [<c2c9149e>] ? sparse_init+0x1dc/0x499
      [    0.000000]  [<c2c79118>] ? paging_init+0x168/0x1df
      [    0.000000]  [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb
      
      It looks like bootmem is being allocated at too high an address.
      
      Try to cut the limit with get_max_mapped().
      Reported-by: Borislav Petkov <borislav.petkov@amd.com>
      Tested-by: Conny Seidel <conny.seidel@amd.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@kernel.org>		[2.6.34.x]
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      mm/vmscan.c: fix mapping use after free · a6aa62a0
      Nick Piggin committed
      We need lock_page_nosync() here because we have no reference to the
      mapping when taking the page lock.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 19 Jul, 2010 3 commits
  6. 14 Jul, 2010 1 commit
  7. 06 Jul, 2010 2 commits
    • C
      writeback: simplify the write back thread queue · 83ba7b07
      Christoph Hellwig committed
      First, remove items from work_list as soon as we start working on them.  This
      means we don't have to track any pending or visited state and can get
      rid of all the RCU magic freeing the work items - we can simply free
      them once the operation has finished.  Second, use a real completion for
      tracking synchronous requests - if the caller sets the completion pointer
      we complete it, otherwise use it as a boolean indicator that we can free
      the work item directly.  Third, unify struct wb_writeback_args and struct
      bdi_work into a single data structure, wb_writeback_work.  Previously we
      set all parameters into a struct wb_writeback_args, copied it into
      struct bdi_work, and copied it again on the stack to use it there.  Instead,
      just allocate one structure dynamically or on the stack and use it
      all the way through the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
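The completion-pointer convention described in this commit message can be illustrated with a small user-space sketch. This is not the kernel code: `struct completion_sim`, `struct wb_work_sim`, `finish_work()`, `alloc_async_work()` and the `freed_count` counter are hypothetical names invented for the illustration, and the real `wb_writeback_work` carries more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel completion. */
struct completion_sim {
    bool done;
};

/* Hypothetical single work structure used all the way through. */
struct wb_work_sim {
    long nr_pages;
    struct completion_sim *done;  /* NULL means async: the worker frees it */
};

/* Counts frees so the behaviour can be observed in this sketch. */
static int freed_count;

/* Called by the worker once the operation has finished. */
static void finish_work(struct wb_work_sim *work)
{
    assert(work != NULL);
    if (work->done) {
        /* Synchronous request: signal the waiter, who owns the item. */
        work->done->done = true;
    } else {
        /* Asynchronous request: nobody waits, so free it directly. */
        free(work);
        freed_count++;
    }
}

/* Allocates an asynchronous work item (no completion attached). */
static struct wb_work_sim *alloc_async_work(long nr_pages)
{
    struct wb_work_sim *work = malloc(sizeof(*work));

    if (!work)
        return NULL;
    work->nr_pages = nr_pages;
    work->done = NULL;
    return work;
}
```

In this sketch a synchronous caller would place the work item and its completion on its own stack, while an asynchronous caller would use `alloc_async_work()` and never touch the item again after queueing it.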
    • C
      writeback: remove writeback_inodes_wbc · 9c3a8ee8
      Christoph Hellwig committed
      This was just an odd wrapper around writeback_inodes_wb.  Removing this
      also allows us to get rid of the bdi member of struct writeback_control
      which was rather out of place there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  8. 30 Jun, 2010 2 commits
  9. 18 Jun, 2010 1 commit
    • T
      percpu: fix first chunk match in per_cpu_ptr_to_phys() · 9983b6f0
      Tejun Heo committed
      per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
      to the first_chunk or not by just matching the address against the
      address range of the base unit (unit0, used by cpu0).  When an address
      from another cpu is passed in, it will always determine that the
      address doesn't belong to the first chunk even when it does.  This
      makes the function return a bogus physical address which may lead to
      a crash.
      
      This problem was discovered by Cliff Wickman while investigating a
      crash during kdump on a SGI UV system.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Cliff Wickman <cpw@sgi.com>
      Tested-by: Cliff Wickman <cpw@sgi.com>
      Cc: stable@kernel.org
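The mismatch described in this commit message can be reproduced in a simplified user-space model. The sketch below is an assumption-laden illustration rather than the actual patch: `struct first_chunk_sim`, its fields, and both helper functions are hypothetical, and the real per-cpu allocator tracks unit layout differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS_SIM 4

/* Hypothetical model of the first chunk: one unit per cpu. */
struct first_chunk_sim {
    uintptr_t base;                     /* start address of unit0 */
    size_t unit_size;                   /* size of each cpu's unit */
    size_t unit_offset[MAX_CPUS_SIM];   /* offset of each unit from base */
    unsigned nr_cpus;
};

/* The flawed check: only the range of unit0 (cpu0) is considered. */
static bool in_first_chunk_unit0_only(const struct first_chunk_sim *c,
                                      uintptr_t addr)
{
    return addr >= c->base && addr < c->base + c->unit_size;
}

/* A check that considers the unit of every cpu. */
static bool in_first_chunk(const struct first_chunk_sim *c, uintptr_t addr)
{
    assert(c->nr_cpus <= MAX_CPUS_SIM);
    for (unsigned cpu = 0; cpu < c->nr_cpus; cpu++) {
        uintptr_t start = c->base + c->unit_offset[cpu];

        if (addr >= start && addr < start + c->unit_size)
            return true;
    }
    return false;
}
```

With four units laid out back to back, an address inside cpu2's unit is rejected by the unit0-only check but accepted by the per-cpu loop, which is the kind of misclassification the message says leads to a bogus physical address.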
  10. 17 Jun, 2010 1 commit
  11. 11 Jun, 2010 1 commit
  12. 09 Jun, 2010 2 commits
    • D
      writeback: limit write_cache_pages integrity scanning to current EOF · d87815cb
      Dave Chinner committed
      sync can currently take a really long time if a concurrent writer is
      extending a file. The problem is that the dirty pages on the address
      space grow in the same direction as write_cache_pages scans, so if
      the writer keeps ahead of writeback, the writeback will not
      terminate until the writer stops adding dirty pages.
      
      For a data integrity sync, we only need to write the pages dirty at
      the time we start the writeback, so we can stop scanning once we get
      to the page that was at the end of the file at the time the scan
      started.
      
      This will prevent operations like copying a large file preventing
      sync from completing as it will not write back pages that were
      dirtied after the sync was started. This does not impact the
      existing integrity guarantees, as any dirty page (old or new)
      within the EOF range at the start of the scan will still be
      captured.
      
      This patch will not prevent sync from blocking on large writes into
      holes. That requires more complex intervention while this patch only
      addresses the common append-case of this sync holdoff.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
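The idea of bounding the scan by the EOF seen at the start can be shown with a toy model. This is a user-space sketch under stated assumptions, not write_cache_pages() itself: `struct file_sim`, `append_dirty_page()` and `integrity_sync()` are hypothetical names, and the concurrent writer is simulated by appending one page per loop iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES_SIM 64

/* Hypothetical file model: one dirty flag per page. */
struct file_sim {
    bool dirty[MAX_PAGES_SIM];
    size_t nr_pages;            /* current EOF, in pages */
};

/* Simulated concurrent writer: extends the file by one dirty page. */
static void append_dirty_page(struct file_sim *f)
{
    if (f->nr_pages < MAX_PAGES_SIM)
        f->dirty[f->nr_pages++] = true;
}

/* Writes back pages up to the EOF observed when the scan started. */
static size_t integrity_sync(struct file_sim *f, bool writer_active)
{
    size_t end = f->nr_pages;   /* snapshot of EOF at scan start */
    size_t written = 0;

    assert(end <= MAX_PAGES_SIM);
    for (size_t i = 0; i < end; i++) {
        if (f->dirty[i]) {
            f->dirty[i] = false;
            written++;
        }
        /* The writer keeps extending the file while we scan. */
        if (writer_active)
            append_dirty_page(f);
    }
    return written;
}
```

Because `end` is captured once, the loop finishes after the pages that existed at the start, even though the simulated writer keeps adding dirty pages beyond that point; those pages stay dirty for a later sync.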
    • D
      writeback: pay attention to wbc->nr_to_write in write_cache_pages · 0b564927
      Dave Chinner committed
      If a filesystem writes more than one page in ->writepage, write_cache_pages
      fails to notice this and continues to attempt writeback when wbc->nr_to_write
      has gone negative - this trace was captured from XFS:
      
          wbc_writeback_start: towrt=1024
          wbc_writepage: towrt=1024
          wbc_writepage: towrt=0
          wbc_writepage: towrt=-1
          wbc_writepage: towrt=-5
          wbc_writepage: towrt=-21
          wbc_writepage: towrt=-85
      
      This has adverse effects on filesystem writeback behaviour. write_cache_pages()
      needs to terminate after a certain number of pages are written, not after a
      certain number of calls to ->writepage are made.  This is a regression
      introduced by 17bc6c30 ("vfs: Add
      no_nrwrite_index_update writeback control flag"), but cannot be reverted
      directly due to subsequent bug fixes that have gone in on top of it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
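The termination rule argued for in this commit message (stop after a number of pages, not a number of ->writepage calls) can be sketched in user space. The code below is a simplified simulation, not the kernel function: `struct wbc_sim`, `writepage_fn`, `cluster_writepage()` and `scan_pages()` are hypothetical, and only the `nr_to_write` name mirrors the field mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct writeback_control. */
struct wbc_sim {
    long nr_to_write;           /* remaining page budget */
};

/* Hypothetical ->writepage: returns how many pages it wrote (>= 1). */
typedef long (*writepage_fn)(size_t index);

/* Models a filesystem that clusters four pages per ->writepage call. */
static long cluster_writepage(size_t index)
{
    (void)index;
    return 4;
}

/* Scans dirty pages and returns the number of ->writepage calls made. */
static long scan_pages(struct wbc_sim *wbc, size_t nr_dirty,
                       writepage_fn writepage)
{
    long calls = 0;
    size_t index = 0;

    assert(writepage != NULL);
    while (index < nr_dirty) {
        long written = writepage(index);

        calls++;
        index += (size_t)written;
        /* Charge the pages actually written, not one per call. */
        wbc->nr_to_write -= written;
        if (wbc->nr_to_write <= 0)
            break;              /* budget used up: stop scanning */
    }
    return calls;
}
```

With a budget of 16 pages and a filesystem writing four pages per call, the loop in this sketch stops after four calls instead of continuing with a negative budget as in the trace above.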
  13. 05 Jun, 2010 2 commits
  14. 01 Jun, 2010 1 commit
  15. 28 May, 2010 3 commits
    • N
      tmpfs: convert to use the new truncate convention · 3889e6e7
      npiggin@suse.de committed
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • N
      fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de committed
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
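The ordering this commit message argues for (validate first, so the call can still fail, and only then change i_size and trim) can be modelled in user space. The sketch below is a loose illustration, not kernel code: `struct inode_sim`, `newsize_ok_sim()` and `setsize_sim()` are hypothetical, with `newsize_ok_sim()` standing in for the role the message gives to inode_newsize_ok and the trimming step standing in for truncate_pagecache plus filesystem-specific block freeing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BLOCK_SIZE_SIM 4096LL

/* Hypothetical inode model for the illustration. */
struct inode_sim {
    long long i_size;           /* current file size in bytes */
    long long max_size;         /* hypothetical filesystem size limit */
    long long blocks;           /* allocated blocks */
    long long cached_bytes;     /* stand-in for the pagecache extent */
};

/* Stand-in for the size check done before anything is modified. */
static int newsize_ok_sim(const struct inode_sim *inode, long long newsize)
{
    if (newsize < 0)
        return -EINVAL;
    if (newsize > inode->max_size)
        return -EFBIG;
    return 0;
}

/* Size change as it might be driven from a filesystem's setattr. */
static int setsize_sim(struct inode_sim *inode, long long newsize)
{
    long long oldsize;
    int err;

    assert(inode != NULL);
    oldsize = inode->i_size;    /* the old size is still known here */

    err = newsize_ok_sim(inode, newsize);
    if (err)
        return err;             /* failure leaves i_size untouched */

    inode->i_size = newsize;
    if (newsize < oldsize) {
        /* Trim cached data past the new EOF. */
        if (inode->cached_bytes > newsize)
            inode->cached_bytes = newsize;
        /* Filesystem-specific step: free blocks past the new EOF. */
        inode->blocks = (newsize + BLOCK_SIZE_SIM - 1) / BLOCK_SIZE_SIM;
    }
    return 0;
}
```

In this model a rejected size leaves the inode exactly as it was, which is the property the old sequence could not offer because the filesystem was only called after i_size had already changed.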
    • C
      rename the generic fsync implementations · 1b061d92
      Christoph Hellwig committed
      We don't name our generic fsync implementations very well currently.
      The no-op implementation for in-memory filesystems is currently called
      simple_sync_file, which doesn't make too much sense to start with, and
      the generic one for simple filesystems is called simple_fsync,
      which can lead to some confusion.
      
      This patch renames the generic file fsync method to generic_file_fsync
      to match the other generic_file_* routines it is supposed to be used
      with, and the no-op implementation to noop_fsync to make it obvious
      what to expect.  In addition add some documentation for both methods.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>