1. 05 Nov, 2019 1 commit
  2. 31 May, 2019 1 commit
  3. 07 May, 2019 1 commit
    • powerpc/book3s/64: check for NULL pointer in pgd_alloc() · f3935626
      Authored by Rick Lindsley
      When the memset code was added to pgd_alloc(), it failed to consider
      that kmem_cache_alloc() can return NULL. It's uncommon, but not
      impossible under heavy memory contention. Example oops:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000a4000
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        CPU: 70 PID: 48471 Comm: entrypoint.sh Kdump: loaded Not tainted 4.14.0-115.6.1.el7a.ppc64le #1
        task: c000000334a00000 task.stack: c000000331c00000
        NIP:  c0000000000a4000 LR: c00000000012f43c CTR: 0000000000000020
        REGS: c000000331c039c0 TRAP: 0300   Not tainted  (4.14.0-115.6.1.el7a.ppc64le)
        MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 44022840  XER: 20040000
        CFAR: c000000000008874 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
        ...
        NIP [c0000000000a4000] memset+0x68/0x104
        LR [c00000000012f43c] mm_init+0x27c/0x2f0
        Call Trace:
          mm_init+0x260/0x2f0 (unreliable)
          copy_mm+0x11c/0x638
          copy_process.isra.28.part.29+0x6fc/0x1080
          _do_fork+0xdc/0x4c0
          ppc_clone+0x8/0xc
        Instruction dump:
        409e000c b0860000 38c60002 409d000c 90860000 38c60004 78a0d183 78a506a0
        7c0903a6 41820034 60000000 60420000 <f8860000> f8860008 f8860010 f8860018
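
      The shape of the fix is simply to bail out before the memset when the
      slab allocation fails. A minimal sketch, assuming the surrounding
      powerpc pgalloc identifiers (illustrative, not the verbatim patch):

        pgd_t *pgd_alloc(struct mm_struct *mm)
        {
                pgd_t *pgd;

                pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
                                       pgtable_gfp_flags(mm, GFP_KERNEL));
                if (unlikely(!pgd))     /* kmem_cache_alloc() can return NULL */
                        return pgd;
                memset(pgd, 0, PGD_TABLE_SIZE);
                return pgd;
        }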
      
      Fixes: fc5c2f4a ("powerpc/mm/hash64: Zero PGD pages on allocation")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Rick Lindsley <ricklind@vnet.linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 02 May, 2019 3 commits
  5. 21 Feb, 2019 1 commit
  6. 05 Jan, 2019 1 commit
    • mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Authored by Joel Fernandes (Google)
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since it's unused.  This patch therefore removes the argument
      tree-wide, resulting in a nice negative diff as well.  It also ensures
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      The following fix-ups were done manually:
      * Removal of the address argument from pte_fragment_alloc
      * Removal of the pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
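
      Applied to a typical architecture's pgalloc.h, the net effect on the
      prototypes looks like this (a hedged before/after sketch of the
      generic signatures):

        /* before */
        pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address);
        pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address);

        /* after: the unused 'address' argument is gone */
        pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
        pgtable_t pte_alloc_one(struct mm_struct *mm);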
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Dec, 2018 2 commits
  8. 13 Aug, 2018 1 commit
  9. 30 Jul, 2018 1 commit
    • powerpc/mm: Don't report PUDs as memory leaks when using kmemleak · a984506c
      Authored by Michael Ellerman
      Paul Menzel reported that kmemleak was producing reports such as:
      
        unreferenced object 0xc0000000f8b80000 (size 16384):
          comm "init", pid 1, jiffies 4294937416 (age 312.240s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000d997deb7>] __pud_alloc+0x80/0x190
            [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
            [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
            [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
            [<0000000060871529>] load_elf_binary+0x41c/0x1648
            [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
            [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
            [<000000005f953a6e>] sys_execve+0x58/0x70
            [<000000009700a858>] system_call+0x5c/0x70
      
      Indicating that a PUD was being leaked.
      
      However what's really happening is that kmemleak is not able to
      recognise the references from the PGD to the PUD, because they are not
      fully qualified pointers.
      
      We can confirm that in xmon, eg:
      
      Find the task struct for pid 1 "init":
        0:mon> P
             task_struct     ->thread.ksp    PID   PPID S  P CMD
        c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd
      
      Dump virtual address 0 to find the PGD:
        0:mon> dv 0 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
      
      Dump the memory of the PGD:
        0:mon> d c0000000f8b01000
        c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
        c0000000f8b01010 0000000000000000 0000000000000000  |................|
        c0000000f8b01020 0000000000000000 0000000000000000  |................|
        c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                          ^^^^^^^^^^^^^^^^
      
      There we can see the reference to our supposedly leaked PUD. But
      because it's missing the leading 0xc, kmemleak won't recognise it.
      
      We can confirm it's still in use by translating an address that is
      mapped via it:
        0:mon> dv 7fff94000000 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
        pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
        pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
        pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
        ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
        Maps physical address = 0x00000001d5ce0000
        Flags = Accessed Dirty Read Write
      
      The fix is fairly simple. We need to tell kmemleak to ignore PUD
      allocations and never report them as leaks. We can also tell it not to
      scan the PGD, because it will never find pointers in there. However it
      will still notice if we allocate a PGD and then leak it.
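
      A sketch of that approach using the standard kmemleak annotations
      (kmemleak_no_scan() and kmemleak_ignore(); the exact call sites and
      cache names are illustrative):

        pgd_t *pgd_alloc(struct mm_struct *mm)
        {
                pgd_t *pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
                                              GFP_KERNEL);

                /* PGD entries aren't fully qualified pointers: don't scan
                 * them, but keep tracking the PGD itself so a genuinely
                 * leaked PGD is still reported. */
                if (pgd)
                        kmemleak_no_scan(pgd);
                return pgd;
        }

        pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
        {
                pud_t *pud = kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
                                              GFP_KERNEL);

                /* The only reference is the unrecognisable PGD entry, so
                 * never report this object as a leak. */
                if (pud)
                        kmemleak_ignore(pud);
                return pud;
        }
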
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 15 May, 2018 4 commits
  11. 30 Mar, 2018 1 commit
  12. 13 Feb, 2018 2 commits
    • powerpc/mm/hash64: Zero PGD pages on allocation · fc5c2f4a
      Authored by Aneesh Kumar K.V
      On powerpc we allocate page table pages from slab caches of different
      sizes. Currently we have a constructor that zeroes out the objects when
      we allocate them for the first time.
      
      We expect the objects to be zeroed out when we free the object back
      to the slab cache. This happens in the unmap path. For hugetlb pages
      we call huge_pte_get_and_clear() to do that.
      
      With the current configuration of page table size, both PUD and PGD
      level tables are allocated from the same slab cache. At the PUD level,
      we use the second half of the table to store the slot information. But
      we never clear that when unmapping.
      
      When such a freed object is then allocated for a PGD page, the second
      half of the page table page will not be zeroed as expected. This
      results in a kernel crash.
      
      Fix it by always clearing PGD pages when they're allocated.
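
      A sketch of the resulting allocation path (simplified; note that the
      missing NULL check here is exactly what commit f3935626 above later
      adds):

        pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
                               pgtable_gfp_flags(mm, GFP_KERNEL));
        /* Don't trust a recycled slab object to be clean: the unmap path
         * never cleared the second half of the table. */
        memset(pgd, 0, PGD_TABLE_SIZE);
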
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log wording and formatting, add whitespace]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Fix crashes with 16G huge pages · fae22116
      Authored by Aneesh Kumar K.V
      To support memory keys, we moved the hash pte slot information to the
      second half of the page table. This was ok with PTE entries at level
      4 (PTE page) and level 3 (PMD). We already allocate larger page table
      pages at those levels to accommodate extra details. For level 4 we
      already have the extra space which was used to track 4k hash page
      table entry details and at level 3 the extra space was allocated to
      track the THP details.
      
      With hugetlbfs PTE, we used this extra space at the PMD level to store
      the slot details. But we also support hugetlbfs PTEs at the PUD level
      for 16GB pages, and PUD level pages didn't allocate extra space. This
      resulted in memory corruption.
      
      Fix this by allocating extra space at PUD level when HUGETLB is
      enabled.
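
      A minimal sketch of the idea: size the PUD slab cache one bit larger
      when hugetlb support needs the second-half slot area (the macro names
      here are illustrative, not the exact patch):

        #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_64K_PAGES)
        /* Double the table size so the second half can hold slot info */
        #define PUD_CACHE_INDEX (PUD_INDEX_SIZE + 1)
        #else
        #define PUD_CACHE_INDEX PUD_INDEX_SIZE
        #endif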
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 15 Aug, 2017 1 commit
  14. 13 Jul, 2017 1 commit
    • mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Authored by Michal Hocko
      __GFP_REPEAT was designed to allow a retry-but-eventually-fail
      semantic to the page allocator.  This has been true, but only for
      allocation requests larger than PAGE_ALLOC_COSTLY_ORDER; it has
      always been ignored for smaller sizes.  This is a bit unfortunate
      because there is no way to express the same semantic for those
      requests, and they are considered too important to fail, so they
      might end up looping in the page allocator forever, similarly to
      GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most lightweight mode, which
         doesn't even kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non-sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
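
      For example, a caller with its own fallback can now ask for hard
      retries without risking an endless loop or the OOM killer (an
      illustrative pattern):

        ptr = kmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
        if (!ptr)
                ptr = vmalloc(size);    /* user-defined fallback */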
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 05 Jun, 2017 1 commit
    • powerpc/mm/book(e)(3s)/64: Add page table accounting · de3b8761
      Authored by Balbir Singh
      Introduce a helper pgtable_gfp_flags() which
      just returns the current gfp flags and adds
      __GFP_ACCOUNT to account for page table allocation.
      The generic helper is added to include/asm/pgalloc.h
      and has two variants - WARNING ugly bits ahead
      
      1. If the header is included from a module, no check
      for mm == &init_mm is done, since init_mm is not
      exported
      2. For kernel includes, the check is done, as required; see
      commit 3e79ec7d ("arch: x86: charge page tables to kmemcg")
      
      The fundamental assumption is that no module should be
      doing pgd/pud/pmd and pte allocs on behalf of init_mm
      directly.
      
      NOTE: This adds an overhead to pmd/pud/pgd allocations
      similar to x86.  The other alternative was to implement
      pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
      with their offset variants.
      
      For 4k page size, pte_alloc_one no longer calls
      pte_alloc_one_kernel.
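
      A sketch of the helper for the kernel-internal case (the module
      variant would skip the init_mm check since init_mm is not exported):

        static inline gfp_t pgtable_gfp_flags(struct mm_struct *mm, gfp_t gfp)
        {
                if (unlikely(mm == &init_mm))
                        return gfp;             /* kernel tables: unaccounted */
                return gfp | __GFP_ACCOUNT;     /* user tables: charge kmemcg */
        }
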
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 25 Jun, 2016 2 commits
    • powerpc: get rid of superfluous __GFP_REPEAT · 2379a23e
      Authored by Michal Hocko
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.
      
      {pud,pmd}_alloc_one allocate from the {PGT,PUD}_CACHE slab caches
      initialized in pgtable_cache_init(), which are never larger than
      sizeof(void *) << 12, and that fits into a !costly allocation
      request size.
      
      PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
      or order-4 requests.  The first one doesn't need the flag while the
      second does.  Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
      order-4 one.
      
      This means that this flag has never been actually useful here because
      it has always been used only for !PAGE_ALLOC_COSTLY requests.
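
      The change boils down to something like this (flag spellings of that
      era; a sketch, not the verbatim diff):

        /* before: every PGALLOC_GFP user got __GFP_REPEAT, even order-0 */
        #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)

        /* after: the flag is dropped from the common definition ... */
        #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

        /* ... and asked for explicitly only by the order-4 radix PGD */
        page = alloc_pages(PGALLOC_GFP | __GFP_REPEAT, 4);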
      
      Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Authored by Michal Hocko
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just
      copy&paste from when code was adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
        * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
        * _might_ fail.  This depends upon the particular VM implementation.

      while !costly requests have basically nofail semantic.  So one could
      reasonably expect that an order-0 request with __GFP_REPEAT will not
      loop forever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to about a third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch-specific code here and I hope I got it
      right, but as a matter of fact I didn't even compile test some archs
      as I do not have cross compilers for them.  Patches should be quite
      trivial to review for stupid compile mistakes though.  The tricky
      parts are usually hidden by macro definitions and that's where I
      would appreciate help from arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow
      us to identify the places which really need the allocator to retry
      harder and to formulate a more specific semantic for what the flag is
      actually supposed to do.
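
      The typical conversion in the series is a one-line flag drop
      (illustrative):

        /* before: __GFP_REPEAT is a no-op on an order-0 request */
        page = alloc_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO, 0);

        /* after */
        page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);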
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 10 Jun, 2016 1 commit
    • powerpc/mm/radix: Flush page walk cache when freeing page table · a145abf1
      Authored by Aneesh Kumar K.V
      Even though tlb_flush() does a flush that also invalidates the page
      walk cache, we can end up doing an RCU page table free before
      tlb_flush() is called.  That means we can have page walk cache
      entries even after we free the page table pages, which can result in
      a wrong page table walk.
      
      Avoid this by doing a pwc (page walk cache) flush on every page table
      free.  We can't batch the pwc flush, because the RCU callback
      function where we free the page table pages doesn't have the
      mmu_gather information.  Thus we have to do a pwc flush for every
      page table page freed.
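
      Conceptually the free path becomes (a sketch; the function names
      follow the radix tlbflush code but are illustrative):

        static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
                                             unsigned long address)
        {
                /* Flush the page walk cache now, before the page is queued
                 * for RCU free: the RCU callback has no mmu_gather left to
                 * batch the flush with. */
                if (radix_enabled())
                        radix__flush_tlb_pwc(tlb, address);
        }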
      
      Note: I also removed the dummy tlb_flush_pgtable call functions for
      hash 32.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  18. 11 May, 2016 6 commits
  19. 03 Mar, 2016 1 commit
  20. 29 Feb, 2016 1 commit
  21. 14 Dec, 2015 2 commits
  22. 10 Dec, 2013 1 commit
    • powerpc: Fix PTE page address mismatch in pgtable ctor/dtor · cf77ee54
      Authored by Hong H. Pham
      In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
      not been converted by page_address() to the newly allocated PTE page.
      
      When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
      with an address to the PTE page that has been converted by page_address().
      The mismatch in the PTE's page address causes pgtable_page_dtor() to access
      invalid memory, so resources for that PTE (such as the page lock) are
      not properly cleaned up.
      
      On PPC32, only SMP kernels are affected.
      
      On PPC64, only SMP kernels with 4K page size are affected.
      
      This bug was introduced by commit d614bb04
      "powerpc: Move the pte free routines from common header".
      
      On a preempt-rt kernel, a spinlock is dynamically allocated for each
      PTE in pgtable_page_ctor().  When the PTE is freed, calling
      pgtable_page_dtor() with a mismatched page address causes a memory leak,
      as the pointer to the PTE's spinlock is bogus.
      
      On mainline, there aren't any immediately obvious symptoms, but the
      problem still exists here.
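
      The fix is to derive the struct page consistently on both sides, so
      the ctor and the dtor operate on the same page (a minimal sketch):

        pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
        struct page *page;

        if (!pte)
                return NULL;
        page = virt_to_page(pte);       /* same page the dtor will see */
        if (!pgtable_page_ctor(page)) {
                __free_page(page);
                return NULL;
        }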
      
      Fixes: d614bb04 "powerpc: Move the pte free routines from common header"
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linux-stable <stable@vger.kernel.org> # v3.10+
      Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  23. 25 Nov, 2013 1 commit
    • powerpc/kdump: Adding symbols in vmcoreinfo to facilitate dump filtering · 8ff81271
      Authored by Hari Bathini
      When the CONFIG_SPARSEMEM_VMEMMAP option is used in the kernel,
      makedumpfile fails to filter the vmcore dump as it cannot do vmemmap
      translations.  So far dump filtering on ppc64 never had to deal with
      vmemmap addresses separately, as vmemmap regions were mapped in zone
      normal.  But with the inclusion of the CONFIG_SPARSEMEM_VMEMMAP
      config option in the kernel, vmemmap address translation support
      becomes necessary for dump filtering.  For vmemmap address
      translation, a few kernel symbols are needed by the dump filtering
      tool.  This patch adds those symbols to vmcoreinfo, which a dump
      filtering tool can use for filtering the kernel dump.  Tested these
      changes successfully with a makedumpfile tool that supports vmemmap
      to physical address translation outside zone normal.
      
      [ Removed unneeded #ifdef as suggested by Michael Ellerman --BenH ]
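
      On the kernel side this amounts to exporting the vmemmap bookkeeping
      symbols from arch_crash_save_vmcoreinfo() (per the note above,
      without the #ifdef; the symbol names are a sketch of the approach):

        void arch_crash_save_vmcoreinfo(void)
        {
                VMCOREINFO_SYMBOL(vmemmap_list);
                VMCOREINFO_SYMBOL(mmu_vmemmap_psize);
                VMCOREINFO_SYMBOL(mmu_psize_defs);
        }
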
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  24. 15 Nov, 2013 1 commit
  25. 21 Jun, 2013 1 commit
  26. 14 May, 2013 1 commit