1. 08 8月, 2014 1 次提交
  2. 31 7月, 2014 8 次提交
  3. 30 7月, 2014 1 次提交
    • R
      mm: fix page_alloc.c kernel-doc warnings · 1aab4d77
      Randy Dunlap 提交于
      Fix kernel-doc warnings and function name in mm/page_alloc.c:
      
        Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
        Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
        Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
        Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
        Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
        Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1aab4d77
  4. 27 7月, 2014 1 次提交
    • H
      mm: fix direct reclaim writeback regression · 8bdd6380
      Hugh Dickins 提交于
      Shortly before 3.16-rc1, Dave Jones reported:
      
        WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
                 xfs_vm_writepage+0x5ce/0x630 [xfs]()
        CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
        Call Trace:
          xfs_vm_writepage+0x5ce/0x630 [xfs]
          shrink_page_list+0x8f9/0xb90
          shrink_inactive_list+0x253/0x510
          shrink_lruvec+0x563/0x6c0
          shrink_zone+0x3b/0x100
          shrink_zones+0x1f1/0x3c0
          try_to_free_pages+0x164/0x380
          __alloc_pages_nodemask+0x822/0xc90
          alloc_pages_vma+0xaf/0x1c0
          handle_mm_fault+0xa31/0xc50
        etc.
      
       970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
       971                   PF_MEMALLOC))
      
      I did not respond at the time, because a glance at the PageDirty block
      in shrink_page_list() quickly shows that this is impossible: we don't do
      writeback on file pages (other than tmpfs) from direct reclaim nowadays.
      Dave was hallucinating, but it would have been disrespectful to say so.
      
      However, my own /var/log/messages now shows similar complaints
      
        WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
        WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
      
      from stressing some mmotm trees during July.
      
      Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
      so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
      to mapping->a_ops->writepage()?
      
      Yes, 3.16-rc1's commit 68711a74 ("mm, migration: add destination
      page freeing callback") has provided such a way to compaction: if
      migrating a SwapBacked page fails, its newpage may be put back on the
      list for later use with PageSwapBacked still set, and nothing will clear
      it.
      
      Whether that can do anything worse than issue WARN_ON_ONCEs, and get
      some statistics wrong, is unclear: easier to fix than to think through
      the consequences.
      
      Fixing it here, before the put_new_page(), addresses the bug directly,
      but is probably the worst place to fix it.  Page migration is doing too
      many parts of the job on too many levels: fixing it in
      move_to_new_page() to complement its SetPageSwapBacked would be
      preferable, except why is it (and newpage->mapping and newpage->index)
      done there, rather than down in migrate_page_move_mapping(), once we are
      sure of success? Not a cleanup to get into right now, especially not
      with memcg cleanups coming in 3.17.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bdd6380
  5. 24 7月, 2014 6 次提交
    • N
      mm: hugetlb: fix copy_hugetlb_page_range() · 0253d634
      Naoya Horiguchi 提交于
      Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry") changed the order of
      huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
      in some workloads like hugepage-backed heap allocation via libhugetlbfs.
      This patch fixes it.
      
      The test program for the problem is shown below:
      
        $ cat heap.c
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>
      
        #define HPS 0x200000
      
        int main() {
        	int i;
        	char *p = malloc(HPS);
        	memset(p, '1', HPS);
        	for (i = 0; i < 5; i++) {
        		if (!fork()) {
        			memset(p, '2', HPS);
        			p = malloc(HPS);
        			memset(p, '3', HPS);
        			free(p);
        			return 0;
        		}
        	}
        	sleep(1);
        	free(p);
        	return 0;
        }
      
        $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
      
      Fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry"), so is applicable to -stable kernels which
      include it.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: NGuillaume Morin <guillaume@morinfr.org>
      Suggested-by: NGuillaume Morin <guillaume@morinfr.org>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0253d634
    • H
      mm/fs: fix pessimization in hole-punching pagecache · 792ceaef
      Hugh Dickins 提交于
      I wanted to revert my v3.1 commit d0823576 ("mm: pincer in
      truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
      synch with shmem_undo_range(); but have stepped back - a change to
      hole-punching in truncate_inode_pages_range() is a change to
      hole-punching in every filesystem (except tmpfs) that supports it.
      
      If there's a logical proof why no filesystem can depend for its own
      correctness on the pincer guarantee in truncate_inode_pages_range() - an
      instant when the entire hole is removed from pagecache - then let's
      revisit later.  But the evidence is that only tmpfs suffered from the
      livelock, and we have no intention of extending hole-punch to ramfs.  So
      for now just add a few comments (to match or differ from those in
      shmem_undo_range()), and fix one silliness noticed in d0823576...
      
      Its "index == start" addition to the hole-punch termination test was
      incomplete: it opened a way for the end condition to be missed, and the
      loop go on looking through the radix_tree, all the way to end of file.
      Fix that pessimization by resetting index when detected in inner loop.
      
      Note that it's actually hard to hit this case, without the obsessive
      concurrent faulting that trinity does: normally all pages are removed in
      the initial trylock_page() pass, and this loop finds nothing to do.  I
      had to "#if 0" out the initial pass to reproduce bug and test fix.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      792ceaef
    • H
      shmem: fix splicing from a hole while it's punched · b1a36650
      Hugh Dickins 提交于
      shmem_fault() is the actual culprit in trinity's hole-punch starvation,
      and the most significant cause of such problems: since a page faulted is
      one that then appears page_mapped(), needing unmap_mapping_range() and
      i_mmap_mutex to be unmapped again.
      
      But it is not the only way in which a page can be brought into a hole in
      the radix_tree while that hole is being punched; and Vlastimil's testing
      implies that if enough other processors are busy filling in the hole,
      then shmem_undo_range() can be kept from completing indefinitely.
      
      shmem_file_splice_read() is the main other user of SGP_CACHE, which can
      instantiate shmem pagecache pages in the read-only case (without holding
      i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
      silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
      ought to be safe, but might bring surprises - not a change to be rushed.
      
      shmem_read_mapping_page_gfp() is an internal interface used by
      drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
      shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
      called internally by the kernel (perhaps for a stacking filesystem,
      which might rely on holes to be reserved): it's unclear whether it could
      be provoked to keep hole-punch busy or not.
      
      We could apply the same umbrella as now used in shmem_fault() to
      shmem_file_splice_read() and the others; but it looks ugly, and use over
      a range raises questions - should it actually be per page? can these get
      starved themselves?
      
      The origin of this part of the problem is my v3.1 commit d0823576
      ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
      into shmem.c.  It seemed like a nice idea at the time, to ensure
      (barring RCU lookup fuzziness) that there's an instant when the entire
      hole is empty; but the indefinitely repeated scans to ensure that make
      it vulnerable.
      
      Revert that "enhancement" to hole-punch from shmem_undo_range(), but
      retain the unproblematic rescanning when it's truncating; add a couple
      of comments there.
      
      Remove the "indices[0] >= end" test: that is now handled satisfactorily
      by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
      to be worth avoiding here.
      
      But if we do not always loop indefinitely, we do need to handle the case
      of swap swizzled back to page before shmem_free_swap() gets it: add a
      retry for that case, as suggested by Konstantin Khlebnikov; and for the
      case of page swizzled back to swap, as suggested by Johannes Weiner.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Suggested-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1a36650
    • H
      shmem: fix faulting into a hole, not taking i_mutex · 8e205f77
      Hugh Dickins 提交于
      Commit f00cdc6d ("shmem: fix faulting into a hole while it's
      punched") was buggy: Sasha sent a lockdep report to remind us that
      grabbing i_mutex in the fault path is a no-no (write syscall may already
      hold i_mutex while faulting user buffer).
      
      We tried a completely different approach (see following patch) but that
      proved inadequate: good enough for a rational workload, but not good
      enough against trinity - which forks off so many mappings of the object
      that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
      into serious starvation when concurrent faults force the puncher to fall
      back to single-page unmap_mapping_range() searches of the i_mmap tree.
      
      So return to the original umbrella approach, but keep away from i_mutex
      this time.  We really don't want to bloat every shmem inode with a new
      mutex or completion, just to protect this unlikely case from trinity.
      So extend the original with wait_queue_head on stack at the hole-punch
      end, and wait_queue item on the stack at the fault end.
      
      This involves further use of i_lock to guard against the races: lockdep
      has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
      i_lock around wake_up_bit(), which is comparable to what we do here.
      i_lock is more convenient, but we could switch to shmem's info->lock.
      
      This issue has been tagged with CVE-2014-4171, which will require commit
      f00cdc6d and this and the following patch to be backported: we
      suggest to 3.1+, though in fact the trinity forkbomb effect might go
      back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
      not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
      Anyone running trinity on 3.0 and earlier? I don't think we need care.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e205f77
    • K
      mm: do not call do_fault_around for non-linear fault · c118678b
      Konstantin Khlebnikov 提交于
      Ingo Korb reported that "repeated mapping of the same file on tmpfs
      using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when
      the process exits".
      
      He bisected the bug to d7c17551 ("mm: implement ->map_pages for
      shmem/tmpfs"), although the bug was actually added by commit
      8c6e50b0 ("mm: introduce vm_ops->map_pages()").
      
      The problem is caused by calling do_fault_around for a _non-linear_
      fault.  In this case pgoff is shifted and might become negative during
      calculation.
      
      Faulting around non-linear page-fault makes no sense and breaks the
      logic in do_fault_around because pgoff is shifted.
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: NIngo Korb <ingo.korb@tu-dortmund.de>
      Tested-by: NIngo Korb <ingo.korb@tu-dortmund.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Ning Qu <quning@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[3.15.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c118678b
    • N
      mm/rmap.c: fix pgoff calculation to handle hugepage correctly · a0f7a756
      Naoya Horiguchi 提交于
      I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
      anonymous hugepage with mbind() in the kernel v3.16-rc3.  This is
      because pgoff's calculation in rmap_walk_anon() fails to consider
      compound_order() only to have an incorrect value.
      
      This patch introduces page_to_pgoff(), which gets the page's offset in
      PAGE_CACHE_SIZE.
      
      Kirill pointed out that page cache tree should natively handle
      hugepages, and in order to make hugetlbfs fit it, page->index of
      hugetlbfs page should be in PAGE_CACHE_SIZE.  This is beyond this patch,
      but page_to_pgoff() contains the point to be fixed in a single function.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0f7a756
  6. 04 7月, 2014 5 次提交
    • H
      shmem: fix init_page_accessed use to stop !PageLRU bug · 66d2f4d2
      Hugh Dickins 提交于
      Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU)
      in isolate_lru_pages() at mm/vmscan.c:1281!
      
      Commit 2457aec6 ("mm: non-atomically mark page accessed during page
      cache allocation where possible") looks like interrupted work-in-progress.
      
      mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's
      - shmem_write_begin() is clearly wrong to use it after shmem_getpage(),
      when the page is always visible in radix_tree, and often already on LRU.
      
      Revert change to shmem_write_begin(), and use init_page_accessed() or
      mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp().
      
      SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed()
      before; but since many other filesystems use [__]page_symlink(), which did
      and does mark the page accessed, consider this as rectifying an oversight.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Prabhakar Lad <prabhakar.csengg@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66d2f4d2
    • C
      hwpoison: fix the handling path of the victimized page frame that belong to non-LRU · 0bc1f8b0
      Chen Yucong 提交于
      Until now, the kernel has the same policy to handle victimized page
      frames that belong to kernel-space(reserved/slab-subsystem) or
      non-LRU(unknown page state).  In other word, the result of handling
      either of these victimized page frames is (IGNORED | FAILED), and the
      return value of memory_failure() is -EBUSY.
      
      This patch is to avoid that memory_failure() returns very soon due to
      the "true" value of (!PageLRU(p)), and it also ensures that
      action_result() can report more precise information("reserved kernel",
      "kernel slab", and "unknown page state") instead of "non LRU",
      especially for memory errors which are detected by memory-scrubbing.
      
      Andi said:
      
      : While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON:
      :
      : soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000
      : page:ffffea000015b400 count:3 mapcount:2097169 mapping:          (null) index:0xffff8800056d7000
      : page flags: 0x3ffff800004081(locked|slab|head)
      : ------------[ cut here ]------------
      : kernel BUG at mm/rmap.c:1495!
      :
      : I think what happened is that a LRU page turned into a slab page in
      : parallel with offlining.  memory_failure initially tests for this case,
      : but doesn't retest later after the page has been locked.
      :
      : ...
      :
      : I ran this patch in a loop over night with some stress plus
      : the mcelog test suite running in a loop. I cannot guarantee it hit it,
      : but it should have given it a good beating.
      :
      : The kernel survived with no messages, although the mcelog test suite
      : got killed at some point because it couldn't fork anymore. Probably
      : some unrelated problem.
      :
      : So the patch is ok for me for .16.
      Signed-off-by: NChen Yucong <slaoub@gmail.com>
      Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: NAndi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0bc1f8b0
    • N
      msync: fix incorrect fstart calculation · 496a8e68
      Namjae Jeon 提交于
      Fix a regression caused by 7fc34a62 ("mm/msync.c: sync only the
      requested range in msync()").
      
      xfstests generic/075 fail occured on ext4 data=journal mode because the
      intended range was not syncing due to wrong fstart calculation.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reported-by: NEric Whitney <enwlinux@gmail.com>
      Tested-by: NEric Whitney <enwlinux@gmail.com>
      Acked-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      Tested-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      496a8e68
    • J
      slub: fix off by one in number of slab tests · 8a5b20ae
      Joonsoo Kim 提交于
      min_partial means minimum number of slab cached in node partial list.
      So, if nr_partial is less than it, we keep newly empty slab on node
      partial list rather than freeing it.  But if nr_partial is equal or
      greater than it, it means that we have enough partial slabs so should
      free newly empty slab.  Current implementation missed the equal case so
      if we set min_partial is 0, then, at least one slab could be cached.
      This is critical problem to kmemcg destroying logic because it doesn't
      works properly if some slabs is cached.  This patch fixes this problem.
      
      Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
      immediately").
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a5b20ae
    • M
      mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER · dc78327c
      Michal Nazarewicz 提交于
      With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
      the following is triggered at early boot:
      
        SMP: Total of 8 processors activated.
        devtmpfs: initialized
        Unable to handle kernel NULL pointer dereference at virtual address 00000008
        pgd = fffffe0000050000
        [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
        Internal error: Oops: 96000006 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
        task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
        PC is at __list_add+0x10/0xd4
        LR is at free_one_page+0x270/0x638
        ...
        Call trace:
          __list_add+0x10/0xd4
          free_one_page+0x26c/0x638
          __free_pages_ok.part.52+0x84/0xbc
          __free_pages+0x74/0xbc
          init_cma_reserved_pageblock+0xe8/0x104
          cma_init_reserved_areas+0x190/0x1e4
          do_one_initcall+0xc4/0x154
          kernel_init_freeable+0x204/0x2a8
          kernel_init+0xc/0xd4
      
      This happens because init_cma_reserved_pageblock() calls
      __free_one_page() with pageblock_order as page order but it is bigger
      than MAX_ORDER.  This in turn causes accesses past zone->free_list[].
      
      Fix the problem by changing init_cma_reserved_pageblock() such that it
      splits pageblock into individual MAX_ORDER pages if pageblock is bigger
      than a MAX_ORDER page.
      
      In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
      architectures expect for ia64, powerpc and tile at the moment, the
      “pageblock_order > MAX_ORDER” condition will be optimised out since both
      sides of the operator are constants.  In cases where pageblock size is
      variable, the performance degradation should not be significant anyway
      since init_cma_reserved_pageblock() is called only at boot time at most
      MAX_CMA_AREAS times which by default is eight.
      Signed-off-by: NMichal Nazarewicz <mina86@mina86.com>
      Reported-by: NMark Salter <msalter@redhat.com>
      Tested-by: NMark Salter <msalter@redhat.com>
      Tested-by: NChristopher Covington <cov@codeaurora.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc78327c
  7. 25 6月, 2014 1 次提交
    • G
      cpuset,mempolicy: fix sleeping function called from invalid context · 391acf97
      Gu Zheng 提交于
      When runing with the kernel(3.15-rc7+), the follow bug occurs:
      [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
      [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
      [ 9969.441175] INFO: lockdep is turned off.
      [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G       A      3.15.0-rc7+ #85
      [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
      [ 9969.706052]  ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
      [ 9969.795323]  ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
      [ 9969.884710]  ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
      [ 9969.974071] Call Trace:
      [ 9970.003403]  [<ffffffff8162f523>] dump_stack+0x4d/0x66
      [ 9970.065074]  [<ffffffff8109995a>] __might_sleep+0xfa/0x130
      [ 9970.130743]  [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
      [ 9970.200638]  [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
      [ 9970.272610]  [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
      [ 9970.344584]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
      [ 9970.409282]  [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
      [ 9970.471897]  [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
      [ 9970.536585]  [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
      [ 9970.613763]  [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
      [ 9970.683660]  [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
      [ 9970.759795]  [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
      [ 9970.834885]  [<ffffffff8106a598>] do_fork+0xd8/0x380
      [ 9970.894375]  [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
      [ 9970.969470]  [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
      [ 9971.030011]  [<ffffffff81642009>] stub_clone+0x69/0x90
      [ 9971.091573]  [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
      
      The cause is that cpuset_mems_allowed() try to take
      mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
      __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
      under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
      protection region to protect the access to cpuset only in
      current_cpuset_is_being_rebound(). So that we can avoid this bug.
      
      This patch is a temporary solution that just addresses the bug
      mentioned above, can not fix the long-standing issue about cpuset.mems
      rebinding on fork():
      
      "When the forker's task_struct is duplicated (which includes
       ->mems_allowed) and it races with an update to cpuset_being_rebound
       in update_tasks_nodemask() then the task's mems_allowed doesn't get
       updated. And the child task's mems_allowed can be wrong if the
       cpuset's nodemask changes before the child has been added to the
       cgroup's tasklist."
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable <stable@vger.kernel.org>
      391acf97
  8. 24 6月, 2014 9 次提交
    • H
      mm: fix crashes from mbind() merging vmas · d05f0cdc
      Hugh Dickins 提交于
      In v2.6.34 commit 9d8cebd4 ("mm: fix mbind vma merge problem")
      introduced vma merging to mbind(), but it should have also changed the
      convention of passing start vma from queue_pages_range() (formerly
      check_range()) to new_vma_page(): vma merging may have already freed
      that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
      worse crashes.
      
      Fixes: 9d8cebd4 ("mm: fix mbind vma merge problem")
      Reported-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: <stable@vger.kernel.org>	[2.6.34+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d05f0cdc
    • J
      slab: fix oops when reading /proc/slab_allocators · 03787301
      Joonsoo Kim 提交于
      Commit b1cb0982 ("change the management method of free objects of
      the slab") introduced a bug on slab leak detector
      ('/proc/slab_allocators').  This detector works like as following
      decription.
      
       1. traverse all objects on all the slabs.
       2. determine whether it is active or not.
       3. if active, print who allocate this object.
      
      but that commit changed the way how to manage free objects, so the logic
      determining whether it is active or not is also changed.  In before, we
      regard object in cpu caches as inactive one, but, with this commit, we
      mistakenly regard object in cpu caches as active one.
      
      This intoduces kernel oops if DEBUG_PAGEALLOC is enabled.  If
      DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
      corrupt free memory in the slab.  It unmaps page table mapping if object
      is free and map it if object is active.  When slab leak detector check
      object in cpu caches, it mistakenly think this object active so try to
      access object memory to retrieve caller of allocation.  At this point,
      page table mapping to this object doesn't exist, so oops occurs.
      
      Following is oops message reported from Dave.
      
      It blew up when something tried to read /proc/slab_allocators
      (Just cat it, and you should see the oops below)
      
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        [snip...]
        CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
        task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
        RIP: 0010:[<ffffffffaa1a8f4a>]  [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
        RSP: 0018:ffff880076925de0  EFLAGS: 00010002
        RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
        RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
        RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
        R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
        FS:  00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
        DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
        Call Trace:
          leaks_show+0xce/0x240
          seq_read+0x28e/0x490
          proc_reg_read+0x3d/0x80
          vfs_read+0x9b/0x160
          SyS_read+0x58/0xb0
          tracesys+0xd4/0xd9
        Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
        RIP   handle_slab+0x8a/0x180
      
      To fix the problem, I introduce an object status buffer on each slab.
      With this, we can track object status precisely, so slab leak detector
      would not access active object and no kernel oops would occur.  Memory
      overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
      which is mainly used for debugging, so memory overhead isn't big
      problem.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      03787301
    • H
      shmem: fix faulting into a hole while it's punched · f00cdc6d
      Hugh Dickins 提交于
      Trinity finds that mmap access to a hole while it's punched from shmem
      can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
      from completing, until the reader chooses to stop; with the puncher's
      hold on i_mutex locking out all other writers until it can complete.
      
      It appears that the tmpfs fault path is too light in comparison with its
      hole-punching path, lacking an i_data_sem to obstruct it; but we don't
      want to slow down the common case.
      
      Extend shmem_fallocate()'s existing range notification mechanism, so
      shmem_fault() can refrain from faulting pages into the hole while it's
      punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
      faulting when not).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f00cdc6d
    • H
      mm: let mm_find_pmd fix buggy race with THP fault · f72e7dcd
      Hugh Dickins 提交于
      Trinity has reported:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
          CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G        W
                                  3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
          lock_acquire (arch/x86/include/asm/current.h:14
                        kernel/locking/lockdep.c:3602)
          _raw_spin_lock (include/linux/spinlock_api_smp.h:143
                          kernel/locking/spinlock.c:151)
          remove_migration_pte (mm/migrate.c:137)
          rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
          remove_migration_ptes (mm/migrate.c:224)
          migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
          migrate_misplaced_page (mm/migrate.c:1733)
          __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
          handle_mm_fault (mm/memory.c:3948)
          __get_user_pages (mm/memory.c:1851)
          __mlock_vma_pages_range (mm/mlock.c:255)
          __mm_populate (mm/mlock.c:711)
          SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
      
      I believe this comes about because, whereas collapsing and splitting THP
      functions take anon_vma lock in write mode (which excludes concurrent
      rmap walks), faulting THP functions (write protection and misplaced
      NUMA) do not - and mostly they do not need to.
      
      But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
      an instant (indeed, for a long instant, given the inter-CPU TLB flush in
      there), leaves *pmd neither present not trans_huge.
      
      Which can confuse a concurrent rmap walk, as when removing migration
      ptes, seen in the dumped trace.  Although that rmap walk has a 4k page
      to insert, anon_vmas containing THPs are in no way segregated from
      4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
      that instant when a trans_huge pmd is temporarily absent.
      
      I don't think we need strengthen the locking at the THP end: it's easily
      handled with an ACCESS_ONCE() before testing both conditions.
      
      And since mm_find_pmd() had only one caller who wanted a THP rather than
      a pmd, let's slightly repurpose it to fail when it hits a THP or
      non-present pmd, and open code split_huge_page_address() again.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f72e7dcd
    • H
      mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep() · 5338a937
      Hugh Dickins 提交于
      Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops
      in copy_page_rep() called from copy_user_huge_page() called from
      do_huge_pmd_wp_page().
      
      I believe this is a DEBUG_PAGEALLOC false positive, due to the source
      page being split, and a tail page freed, while copy is in progress; and
      not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
      prevent a miscopy from being made visible.
      
      Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to
      the usual get_page() and put_page() on head page in the usual config;
      but get and put references to all of the tail pages when
      DEBUG_PAGEALLOC.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5338a937
    • D
      mm, pcp: allow restoring percpu_pagelist_fraction default · 7cd2b0a3
      David Rientjes 提交于
      Oleg reports a division by zero error on zero-length write() to the
      percpu_pagelist_fraction sysctl:
      
          divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
          CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
          Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
          task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
          RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
          RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
          RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
          RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
          R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
          R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
          FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
          Call Trace:
            proc_sys_call_handler+0xb3/0xc0
            proc_sys_write+0x14/0x20
            vfs_write+0xba/0x1e0
            SyS_write+0x46/0xb0
            tracesys+0xe1/0xe6
      
      However, if the percpu_pagelist_fraction sysctl is set by the user, it
      is also impossible to restore it to the kernel default since the user
      cannot write 0 to the sysctl.
      
      This patch allows the user to write 0 to restore the default behavior.
      It still requires a fraction equal to or larger than 8, however, as
      stated by the documentation for sanity.  If a value in the range [1, 7]
      is written, the sysctl will return EINVAL.
      
      This successfully solves the divide by zero issue at the same time.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NOleg Drokin <green@linuxhacker.ru>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7cd2b0a3
    • N
      hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry · 4a705fef
      Naoya Horiguchi 提交于
      There's a race between fork() and hugepage migration, as a result we try
      to "dereference" a swap entry as a normal pte, causing kernel panic.
      The cause of the problem is that copy_hugetlb_page_range() can't handle
      "swap entry" family (migration entry and hwpoisoned entry) so let's fix
      it.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a705fef
    • H
      tmpfs: ZERO_RANGE and COLLAPSE_RANGE not currently supported · 13ace4d0
      Hugh Dickins 提交于
      I was well aware of FALLOC_FL_ZERO_RANGE and FALLOC_FL_COLLAPSE_RANGE
      support being added to fallocate(); but didn't realize until now that I
      had been too stupid to future-proof shmem_fallocate() against new
      additions.  -EOPNOTSUPP instead of going on to ordinary fallocation.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.15]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13ace4d0
    • S
      mm: nommu: per-thread vma cache fix · e020d5bd
      Steven Miao 提交于
      mm could be removed from current task struct, using previous vma->vm_mm
      
      It will crash on blackfin after updated to Linux 3.15.  The commit "mm:
      per-thread vma caching" caused the crash.  mm could be removed from
      current task struct before
      
        mmput()->
          exit_mmap()->
            delete_vma_from_mm()
      
      the detailed fault information:
      
          NULL pointer access
          Kernel OOPS in progress
          Deferred Exception context
          CURRENT PROCESS:
          COMM=modprobe PID=278  CPU=0
          invalid mm
          return address: [0x000531de]; contents of:
          0x000531b0:  c727  acea  0c42  181d  0000  0000  0000  a0a8
          0x000531c0:  b090  acaa  0c42  1806  0000  0000  0000  a0e8
          0x000531d0:  b0d0  e801  0000  05b3  0010  e522  0046 [a090]
          0x000531e0:  6408  b090  0c00  17cc  3042  e3ff  f37b  2fc8
      
          CPU: 0 PID: 278 Comm: modprobe Not tainted 3.15.0-ADI-2014R1-pre-00345-gea9f446 #25
          task: 0572b720 ti: 0569e000 task.ti: 0569e000
          Compiled for cpu family 0x27fe (Rev 0), but running on:0x0000 (Rev 0)
          ADSP-BF609-0.0 500(MHz CCLK) 125(MHz SCLK) (mpu off)
          Linux version 3.15.0-ADI-2014R1-pre-00345-gea9f446 (steven@steven-OptiPlex-390) (gcc version 4.3.5 (ADI-trunk/svn-5962) ) #25 Tue Jun 10 17:47:46 CST 2014
      
          SEQUENCER STATUS:		Not tainted
           SEQSTAT: 00000027  IPEND: 8008  IMASK: ffff  SYSCFG: 2806
            EXCAUSE   : 0x27
            physical IVG3 asserted : <0xffa00744> { _trap + 0x0 }
            physical IVG15 asserted : <0xffa00d68> { _evt_system_call + 0x0 }
            logical irq   6 mapped  : <0xffa003bc> { _bfin_coretmr_interrupt + 0x0 }
            logical irq   7 mapped  : <0x00008828> { _bfin_fault_routine + 0x0 }
            logical irq  11 mapped  : <0x00007724> { _l2_ecc_err + 0x0 }
            logical irq  13 mapped  : <0x00008828> { _bfin_fault_routine + 0x0 }
            logical irq  39 mapped  : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 }
            logical irq  40 mapped  : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 }
           RETE: <0x00000000> /* Maybe null pointer? */
           RETN: <0x0569fe50> /* kernel dynamic memory (maybe user-space) */
           RETX: <0x00000480> /* Maybe fixed code section */
           RETS: <0x00053384> { _exit_mmap + 0x28 }
           PC  : <0x000531de> { _delete_vma_from_mm + 0x92 }
          DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */
          ICPLB_FAULT_ADDR: <0x000531de> { _delete_vma_from_mm + 0x92 }
          PROCESSOR STATE:
           R0 : 00000004    R1 : 0569e000    R2 : 00bf3db4    R3 : 00000000
           R4 : 057f9800    R5 : 00000001    R6 : 0569ddd0    R7 : 0572b720
           P0 : 0572b854    P1 : 00000004    P2 : 00000000    P3 : 0569dda0
           P4 : 0572b720    P5 : 0566c368    FP : 0569fe5c    SP : 0569fd74
           LB0: 057f523f    LT0: 057f523e    LC0: 00000000
           LB1: 0005317c    LT1: 00053172    LC1: 00000002
           B0 : 00000000    L0 : 00000000    M0 : 0566f5bc    I0 : 00000000
           B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : ffffffff
           B2 : 00000001    L2 : 00000000    M2 : 00000000    I2 : 00000000
           B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 057f8000
          A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000
          USP : 056ffcf8  ASTAT: 02003024
      
          Hardware Trace:
             0 Target : <0x00003fb8> { _trap_c + 0x0 }
               Source : <0xffa006d8> { _exception_to_level5 + 0xa0 } JUMP.L
             1 Target : <0xffa00638> { _exception_to_level5 + 0x0 }
               Source : <0xffa004f2> { _bfin_return_from_exception + 0x6 } RTX
             2 Target : <0xffa004ec> { _bfin_return_from_exception + 0x0 }
               Source : <0xffa00590> { _ex_trap_c + 0x70 } JUMP.S
             3 Target : <0xffa00520> { _ex_trap_c + 0x0 }
               Source : <0xffa0076e> { _trap + 0x2a } JUMP (P4)
             4 Target : <0xffa00744> { _trap + 0x0 }
                FAULT : <0x000531de> { _delete_vma_from_mm + 0x92 } P0 = W[P2 + 2]
               Source : <0x000531da> { _delete_vma_from_mm + 0x8e } P2 = [P4 + 0x18]
             5 Target : <0x000531da> { _delete_vma_from_mm + 0x8e }
               Source : <0x00053176> { _delete_vma_from_mm + 0x2a } IF CC JUMP pcrel
             6 Target : <0x0005314c> { _delete_vma_from_mm + 0x0 }
               Source : <0x00053380> { _exit_mmap + 0x24 } JUMP.L
             7 Target : <0x00053378> { _exit_mmap + 0x1c }
               Source : <0x00053394> { _exit_mmap + 0x38 } IF !CC JUMP pcrel (BP)
             8 Target : <0x00053390> { _exit_mmap + 0x34 }
               Source : <0xffa020e0> { __cond_resched + 0x20 } RTS
             9 Target : <0xffa020c0> { __cond_resched + 0x0 }
               Source : <0x0005338c> { _exit_mmap + 0x30 } JUMP.L
            10 Target : <0x0005338c> { _exit_mmap + 0x30 }
               Source : <0x0005333a> { _delete_vma + 0xb2 } RTS
            11 Target : <0x00053334> { _delete_vma + 0xac }
               Source : <0x0005507a> { _kmem_cache_free + 0xba } RTS
            12 Target : <0x00055068> { _kmem_cache_free + 0xa8 }
               Source : <0x0005505e> { _kmem_cache_free + 0x9e } IF !CC JUMP pcrel (BP)
            13 Target : <0x00055052> { _kmem_cache_free + 0x92 }
               Source : <0x0005501a> { _kmem_cache_free + 0x5a } IF CC JUMP pcrel
            14 Target : <0x00054ff4> { _kmem_cache_free + 0x34 }
               Source : <0x00054fce> { _kmem_cache_free + 0xe } IF CC JUMP pcrel (BP)
            15 Target : <0x00054fc0> { _kmem_cache_free + 0x0 }
               Source : <0x00053330> { _delete_vma + 0xa8 } JUMP.L
          Kernel Stack
          Stack info:
           SP: [0x0569ff24] <0x0569ff24> /* kernel dynamic memory (maybe user-space) */
           Memory from 0x0569ff20 to 056a0000
          0569ff20: 00000001 [04e8da5a] 00008000  00000000  00000000  056a0000  04e8da5a  04e8da5a
          0569ff40: 04eb9eea  ffa00dce  02003025  04ea09c5  057f523f  04ea09c4  057f523e  00000000
          0569ff60: 00000000  00000000  00000000  00000000  00000000  00000000  00000001  00000000
          0569ff80: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000
          0569ffa0: 0566f5bc  057f8000  057f8000  00000001  04ec0170  056ffcf8  056ffd04  057f9800
          0569ffc0: 04d1d498  057f9800  057f8fe4  057f8ef0  00000001  057f928c  00000001  00000001
          0569ffe0: 057f9800  00000000  00000008  00000007  00000001  00000001  00000001 <00002806>
          Return addresses in stack:
              address : <0x00002806> { _show_cpuinfo + 0x2d2 }
          Modules linked in:
          Kernel panic - not syncing: Kernel exception
          [ end Kernel panic - not syncing: Kernel exception
      Signed-off-by: NSteven Miao <realmz6@gmail.com>
      Acked-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[3.15.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e020d5bd
  9. 15 6月, 2014 1 次提交
    • A
      fix __swap_writepage() compile failure on old gcc versions · 05064084
      Al Viro 提交于
      Tetsuo Handa wrote:
       "Commit 62a8067a ("bio_vec-backed iov_iter") introduced an unnamed
        union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
         union as u in order to fix build failure"
      
      Let's do this instead: there is only one place in the entire tree that
      steps into this breakage.  Anon structs and unions work in older gcc
      versions; as the matter of fact, we have those in the tree - see e.g.
      struct ieee80211_tx_info in include/net/mac80211.h
      
      What doesn't work is handling their initializers:
      
      struct {
      	int a;
      	union {
      		int b;
      		char c;
      	};
      } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
      
      is the obvious syntax for initializer, perfectly fine for C11 and
      handled correctly by gcc-4.7 or later.
      
      Earlier versions, though, break on it - declaration is fine and so's
      access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
      members of the anon structs and unions are not inserted into the right
      namespace.  Tellingly, those older versions will not barf on struct {int
      a; struct {int a;};}; - looks like they just have it hacked up somewhere
      around the handling of .  and -> instead of doing the right thing.
      
      The easiest way to deal with that crap is to turn initialization of
      those fields (in the only place where we have such initializer of
      iov_iter) into plain assignment.
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05064084
  10. 12 6月, 2014 1 次提交
  11. 09 6月, 2014 1 次提交
    • L
      Don't trigger congestion wait on dirty-but-not-writeout pages · b738d764
      Linus Torvalds 提交于
      shrink_inactive_list() used to wait 0.1s to avoid congestion when all
      the pages that were isolated from the inactive list were dirty but not
      under active writeback.  That makes no real sense, and apparently causes
      major interactivity issues under some loads since 3.11.
      
      The ostensible reason for it was to wait for kswapd to start writing
      pages, but that seems questionable as well, since the congestion wait
      code seems to trigger for kswapd itself as well.  Also, the logic behind
      delaying anything when we haven't actually started writeback is not
      clear - it only delays actually starting that writeback.
      
      We'll still trigger the congestion waiting if
      
       (a) the process is kswapd, and we hit pages flagged for immediate
           reclaim
      
       (b) the process is not kswapd, and the zone backing dev writeback is
           actually congested.
      
      This probably needs to be revisited, but as it is this fixes a reported
      regression.
      Reported-by: NFelipe Contreras <felipe.contreras@gmail.com>
      Pinpointed-by: NHillf Danton <dhillf@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b738d764
  12. 07 6月, 2014 5 次提交