1. 04 March 2014, 9 commits
    • kallsyms: fix absolute addresses for kASLR · 0f55159d
      Committed by Andy Honig
      Currently, symbols with absolute addresses are displayed incorrectly
      in /proc/kallsyms if the kernel is loaded with kASLR.
      
      The problem was that scripts/kallsyms.c, which generates the array of
      symbol names and addresses, uses a relocatable value for all symbols,
      even absolute ones.  This patch fixes that.
      
      kallsyms output from several boot states, for comparison:
      
        $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.nokaslr
        0000000000000000 D __per_cpu_start
        0000000000014280 D __per_cpu_end
        ffffffff810001c8 T _stext
        $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr1
        000000001f200000 D __per_cpu_start
        000000001f214280 D __per_cpu_end
        ffffffffa02001c8 T _stext
        $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr2
        000000000d400000 D __per_cpu_start
        000000000d414280 D __per_cpu_end
        ffffffff8e4001c8 T _stext
        $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr-fixed
        0000000000000000 D __per_cpu_start
        0000000000014280 D __per_cpu_end
        ffffffffadc001c8 T _stext
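
      A minimal sketch of the idea behind the fix (not the actual patch;
      the struct layout and helper here are illustrative): when emitting
      the generated address table, absolute symbols must be printed as
      literal values, while all other symbols stay relative to _text so
      that boot-time relocation only applies where it should.

        /* sketch: emit absolute symbols as-is, everything else _text-relative */
        #include <stdio.h>

        struct sym_entry {
                unsigned long long addr;
                char type;                      /* symbol type from nm(1) */
        };

        static int symbol_absolute(const struct sym_entry *s)
        {
                return s->type == 'A';          /* 'A' = absolute in nm output */
        }

        static void emit_symbol(const struct sym_entry *s,
                                unsigned long long text_base)
        {
                if (symbol_absolute(s))
                        printf("\tPTR\t%#llx\n", s->addr);
                else
                        printf("\tPTR\t_text + %#llx\n", s->addr - text_base);
        }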
      Signed-off-by: Andy Honig <ahonig@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression · 5ec384d4
      Committed by Daniel M. Weeks
      LZ4 as implemented in the kernel differs from the default method now
      used by the reference implementation of LZ4.  Until the in-kernel method
      is updated to support the new default, passing the legacy flag (-l) to
      the compressor is necessary.  Without this flag the kernel-generated,
      LZ4-compressed initramfs is junk.
      
      Kyungsik said:
      
      : It seems that lz4 supports the legacy format with the same option as
      : lz4c does.  Just looking at the first few bytes of an lz4-compressed
      : image, we can see whether it is the new format or not.
      :
      : It shows the new-format magic number without this patch.  The
      : new-format magic number is 0x184d2204.
      :
      : $ hexdump -C ./initramfs_data.cpio.lz4 |more
      : 00000000  04 22 4d 18 64 70 b9 69 (Little Endian)
      : ...
      :
      : Currently the kernel supports the legacy format only.
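
      Kyungsik's magic-number check can be automated.  A small standalone
      sketch (the macro names are mine; 0x184c2102 is the documented legacy
      magic, 0x184d2204 the new-format one quoted above):

        /* classify an LZ4 image by its 4-byte little-endian magic number */
        #include <stdio.h>
        #include <stdint.h>

        #define LZ4_MAGIC_LEGACY 0x184c2102u    /* legacy frame: what the kernel reads */
        #define LZ4_MAGIC_FRAME  0x184d2204u    /* new frame format */

        int main(int argc, char **argv)
        {
                uint8_t b[4];
                uint32_t magic;
                FILE *f;

                if (argc < 2 || !(f = fopen(argv[1], "rb")))
                        return 1;
                if (fread(b, 1, 4, f) != 4) {
                        fclose(f);
                        return 1;
                }
                fclose(f);
                magic = b[0] | b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
                if (magic == LZ4_MAGIC_LEGACY)
                        puts("legacy format (kernel-decompressible)");
                else if (magic == LZ4_MAGIC_FRAME)
                        puts("new frame format (kernel cannot unpack this)");
                else
                        puts("not an LZ4 image");
                return 0;
        }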
      Signed-off-by: Daniel M. Weeks <dan@danweeks.net>
      Cc: Michal Marek <mmarek@suse.cz>
      Acked-by: Kyungsik Lee <kyungsik.lee@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb
      Committed by Vlastimil Babka
      Daniel Borkmann reported a failing VM_BUG_ON assertion:
      
        ------------[ cut here ]------------
        kernel BUG at mm/mlock.c:528!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ccm arc4 iwldvm [...] video
        CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
        Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
        task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
        RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
        Call Trace:
          do_munmap+0x18f/0x3b0
          vm_munmap+0x41/0x60
          SyS_munmap+0x22/0x30
          system_call_fastpath+0x1a/0x1f
        RIP   munlock_vma_pages_range+0x2e0/0x2f0
        ---[ end trace a0088dcf07ae10f2 ]---
      
      because munlock_vma_pages_range() thinks it is unexpectedly in the
      middle of a THP page.  This can be reproduced with the default config
      on 3.11 and later kernels.  A reproducer can be found in the kernel's
      networking selftests by running ./psock_tpacket [1].
      
      The problem is that an order=2 compound page (allocated by
      alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP vma
      (mapped by packet_mmap()) and is mistaken for a THP page, assumed to
      be order=9.
      
      The checks for THP in munlock came with commit ff6a6da6 ("mm:
      accelerate munlock() treatment of THP pages"), i.e. in 3.9, but did
      not trigger this bug: they merely make munlock_vma_pages_range() skip
      such compound pages until the next 512-page-aligned page when it
      encounters a head page.  That is harmless for vma's where mlocking
      has no effect anyway, but it can distort the accounting.
      
      Since commit 7225522b ("mm: munlock: batch non-THP page isolation
      and munlock+putback using pagevec"), however, this can trigger a
      VM_BUG_ON in the PageTransHuge() check.
      
      This patch fixes the issue by adding the VM_MIXEDMAP flag to
      VM_SPECIAL, the list of flags that make vma's non-mlockable and
      non-mergeable.  The reasoning is that VM_MIXEDMAP vma's are similar
      to VM_PFNMAP, which is already on the VM_SPECIAL list, and both are
      intended for non-LRU pages where mlocking makes no sense anyway.  The
      related LKML discussion can be found in [2].
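
      The gist of the change is a one-line edit in include/linux/mm.h
      (shown as a sketch, not the verbatim diff; previously VM_SPECIAL was
      VM_IO | VM_DONTEXPAND | VM_PFNMAP):

        /* VM_MIXEDMAP joins the flags that make a vma non-mergeable and
         * exempt from mlock */
        #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)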
      
       [1] tools/testing/selftests/net/psock_tpacket
       [2] https://lkml.org/lkml/2014/1/10/427
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Reported-by: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: John David Anglin <dave.anglin@bell.net>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org> [3.11.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: reparent charges of children before processing parent · 4fb1a86f
      Committed by Filipe Brandenburger
      Sometimes the cleanup after memcg hierarchy testing gets stuck in
      mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
      
      There may turn out to be several causes, but a major one is this: the
      work item that offlines the parent can run before the work item that
      offlines the child; the parent's mem_cgroup_reparent_charges() then
      circles around waiting for the child's pages to be reparented to its
      lrus, while holding cgroup_mutex, which prevents the child from
      reaching its own mem_cgroup_reparent_charges().
      
      Further testing showed that an ordered workqueue for cgroup_destroy_wq
      is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
      stage on the way can mess up the order before the work reaches the
      workqueue.
      
      Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
      all its children (and grandchildren, in the correct order) to have their
      charges reparented first.
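
      A sketch of the approach (simplified; locking and RCU details are
      omitted and the helper name is illustrative).
      css_for_each_descendant_post() walks the subtree post-order, visiting
      the css itself last, so every child is drained before its parent:

        static void mem_cgroup_reparent_all(struct cgroup_subsys_state *css)
        {
                struct cgroup_subsys_state *iter;

                /* post-order: children and grandchildren first, css last */
                css_for_each_descendant_post(iter, css)
                        mem_cgroup_reparent_charges(mem_cgroup_from_css(iter));
        }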
      
      Fixes: e5fca243 ("cgroup: use a dedicated workqueue for cgroup destruction")
      Signed-off-by: Filipe Brandenburger <filbranden@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix endless loop in __mem_cgroup_iter_next() · ce48225f
      Committed by Hugh Dickins
      Commit 0eef6156 ("memcg: fix css reference leak and endless loop in
      mem_cgroup_iter") got its interaction with an earlier commit,
      d8ad3055 ("mm/memcg: iteration skip memcgs not yet fully
      initialized"), slightly wrong, and we didn't notice at the time.
      
      It's elusive, and harder to hit than the original, but for a couple
      of days before rc1 I several times saw an endless loop similar to the
      one supposedly being fixed.
      
      This time it was a tighter loop, in __mem_cgroup_iter_next(): we can
      get there when our root has already been offlined, and the ordering
      of conditions was such that we then just cycled around forever.
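
      The shape of the fix, as an illustrative fragment (not the exact
      code): test for the root before the liveness check, so that an
      already-offlined root still terminates the walk instead of being
      skipped forever.

        /* if css_tryget() were checked alone, an offlined root would never
         * qualify and the iterator would cycle around its subtree forever */
        if (next_css == &root->css || css_tryget(next_css))
                return mem_cgroup_from_css(next_css);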
      
      Fixes: 0eef6156 ("memcg: fix css reference leak and endless loop in mem_cgroup_iter")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock · 5f30fc94
      Committed by Hugh Dickins
      Running fsx on tmpfs with concurrent memhog-swapoff-swapon produces
      lots of
      
        BUG: sleeping function called from invalid context at kernel/fork.c:606
        in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
        1 lock held by swapoff/1394:
         #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
      
      followed by
      
        ================================================
        [ BUG: lock held when returning to user space! ]
        3.14.0-rc1 #3 Not tainted
        ------------------------------------------------
        swapoff/1394 is leaving the kernel with locks still held!
        1 lock held by swapoff/1394:
         #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6
      
      after which the system recovered nicely.
      
      Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.
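
      The bug pattern, as a generic sketch (all names are illustrative):
      every return path out of a section entered under rcu_read_lock() must
      drop the lock, including the rare one.

        static unsigned long locate_item(void)
        {
                unsigned long ret;

                rcu_read_lock();
                ret = do_lookup();
                if (unlikely(ret == RETRY_LOOKUP)) {
                        rcu_read_unlock();      /* the forgotten unlock */
                        return 0;
                }
                rcu_read_unlock();
                return ret;
        }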
      
      Fixes: e504f3fd ("tmpfs radix_tree: locate_item to speed up swapoff")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma debug: account for cachelines and read-only mappings in overlap tracking · 3b7a6418
      Committed by Dan Williams
      While debug_dma_assert_idle() checks whether a given *page* is
      actively undergoing dma, the valid granularity of a dma mapping is a
      *cacheline*.  Sander's testing shows that the warning message
      "DMA-API: exceeded 7 overlapping mappings of pfn..." is triggering
      falsely: the test is simply mapping multiple cachelines in a given
      page.
      
      Ultimately we want overlap tracking to remain valid, as overlap is a
      real api violation, so we need to track active mappings by cacheline.
      Update the active dma tracking to use the page-frame-relative
      cacheline of the mapping as the key, and update
      debug_dma_assert_idle() to check all possible mapped cachelines for a
      given page.
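
      A sketch of a cacheline-granular key of the kind described above
      (names are illustrative): the page frame number and the cacheline
      offset within the page pack into one index, so overlap is tracked per
      cacheline rather than per page.

        #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)

        static phys_addr_t to_cacheline_number(struct dma_debug_entry *entry)
        {
                return (entry->pfn << CACHELINE_PER_PAGE_SHIFT) +
                        (entry->offset >> L1_CACHE_SHIFT);
        }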
      
      However, tracking active mappings is only relevant when the dma
      mapping is writable by the device.  In fact it is fairly standard for
      read-only mappings to have hundreds or thousands of overlapping
      mappings at once.  Limiting the overlap tracking to writable
      (!DMA_TO_DEVICE) mappings eliminates this class of false-positive
      reports.
      
      Note that the radix gang lookup is sub-optimal: it would be best if
      it stopped fetching entries once the search passed a page boundary.
      Nevertheless, this implementation does not perturb the original
      net_dma failing case; that is to say, the extra overhead does not
      make the failing case pass merely through a timing change.
      
      References:
        http://marc.info/?l=linux-netdev&m=139232263419315&w=2
        http://marc.info/?l=linux-netdev&m=139217088107122&w=2
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Reported-by: Dave Jones <davej@redhat.com>
      Tested-by: Dave Jones <davej@redhat.com>
      Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: close PageTail race · 668f9abb
      Committed by David Rientjes
      Commit bf6bddf1 ("mm: introduce compaction and migration for
      ballooned pages") introduced page_count(page) into memory compaction,
      which dereferences page->first_page if PageTail(page).
      
      This results in a very rare NULL pointer dereference in the
      aforementioned page_count(page).  Indeed, anything that does
      compound_head(), including page_count(), is susceptible to racing
      with prep_compound_page() and seeing a NULL or dangling
      page->first_page pointer.
      
      This patch takes Andrea's implementation of compound_trans_head(),
      which deals with such a race, and makes it the default
      compound_head() implementation.  It includes a read memory barrier
      ensuring that, if PageTail(page) is true, we return a head page that
      is neither NULL nor dangling.  The patch then adds a store memory
      barrier to prep_compound_page() to ensure page->first_page is set
      before the tail flag becomes visible.
      
      This is the safest way to ensure we see the head page we are
      expecting; PageTail(page) is already in the unlikely() path, and the
      memory barriers are unfortunately required.
      
      Hugetlbfs is the exception: we don't enforce a store memory barrier
      during init since no race is possible there.
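
      A sketch of the barrier pairing, following the description above
      (exact code may differ): on the read side compound_head() re-checks
      PageTail() after loading first_page, so a racing split or
      re-initialization cannot hand back a dangling pointer; on the write
      side prep_compound_page() publishes first_page before the tail flag
      becomes visible.

        /* read side */
        static inline struct page *compound_head(struct page *page)
        {
                if (unlikely(PageTail(page))) {
                        struct page *head = page->first_page;

                        smp_rmb();      /* pairs with the smp_wmb() below */
                        if (likely(PageTail(page)))
                                return head;
                }
                return page;
        }

        /* write side, in prep_compound_page() */
        p->first_page = page;
        smp_wmb();      /* make first_page visible before the tail bit */
        __SetPageTail(p);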
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: EDAC: add Mauro and Borislav as interim patch collectors · aa15aa0e
      Committed by Borislav Petkov
      We're more or less collecting EDAC patches already anyway, so let's
      write it down so that get_maintainer sees it too.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 03 March 2014, 8 commits
  3. 02 March 2014, 8 commits
  4. 01 March 2014, 10 commits
  5. 28 February 2014, 5 commits