1. 10 Feb 2017, 1 commit
  2. 07 Jul 2016, 1 commit
  3. 16 Jun 2016, 1 commit
    • powerpc/mm: Ensure "special" zones are empty · 3079abe5
      Committed by Oliver O'Halloran
      The mm zone mechanism was traditionally used by arch specific code to
      partition memory into allocation zones. However there are several zones
      that are managed by the mm subsystem rather than the architecture. Most
      architectures set the max PFN of these special zones to zero, however on
      powerpc we set them to ~0ul. This, in conjunction with a bug in
      free_area_init_nodes(), results in all of system memory being placed in
      ZONE_DEVICE when it is enabled. Device memory cannot be used for regular kernel
      memory allocations so this will cause a kernel panic at boot. Given the
      planned addition of more mm managed zones (ZONE_CMA) we should aim to be
      consistent with every other architecture and set the max PFN for these
      zones to zero.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3079abe5
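      A minimal sketch of the fix described above, assuming the generic
      free_area_init_nodes() interface of that era; the function body is
      illustrative, not the verbatim powerpc diff:

          /* Zones the arch does not manage (ZONE_DEVICE, and later ZONE_CMA)
           * are left at a max PFN of 0, so free_area_init_nodes() sees them
           * as empty -- matching every other architecture. */
          void __init paging_init(void)
          {
              unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };

              max_zone_pfns[ZONE_NORMAL] = max_pfn;  /* was ~0ul for the rest */
              free_area_init_nodes(max_zone_pfns);
          }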
  4. 11 May 2016, 3 commits
  5. 01 Mar 2016, 1 commit
    • powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      Committed by David Gibson
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT, which is more appropriate than the
          previous EINVAL (the failure is unlikely to be related to the
          function's parameters).
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      
      For errors creating the vmemmap for the hot-added region:
         * This is now reported with EFAULT instead of causing a BUG() - this
           could happen for external reasons (e.g. a full hash table), so it's
           better to handle it non-fatally.
         * An error message is also printed, so the failure won't be silent.
         * As above, a failure could leave a partially mapped region; we now
           clean this up. [mpe: move htab_remove_mapping() out of #ifdef
           CONFIG_MEMORY_HOTPLUG to enable this]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1dace6c6
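      A hedged sketch of the new linear-mapping error path;
      create_section_mapping() is the real powerpc helper, but the
      function body and simplified signature here are illustrative
      rather than the exact diff:

          int arch_add_memory(int nid, u64 start, u64 size)  /* signature simplified */
          {
              int rc = create_section_mapping(start, start + size);

              if (rc) {
                  /* warn rather than fail silently */
                  pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
                          start, start + size, rc);
                  /* EFAULT, not EINVAL: the failure is not about the
                   * caller's parameters */
                  return -EFAULT;
              }
              /* ... hand the range to the generic hotplug code as before ... */
              return 0;
          }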
  6. 30 Jan 2016, 1 commit
    • arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM · 35d98e93
      Committed by Toshi Kani
      Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
      "System RAM", "Kernel code", "Kernel data", and "Kernel bss".
      
      Note that:
      
       - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
         IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
         as (IORESOURCE_MEM|IORESOURCE_SYSRAM).
      
       - Some archs do not set 'flags' for child nodes, such as
         "Kernel code".  This patch does not change 'flags' in that
         case.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35d98e93
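      The flag relationship, as a sketch; the two constants quoted in the
      comment are the real include/linux/ioport.h definitions, while the
      helper function itself is illustrative:

          #include <linux/ioport.h>

          /* From include/linux/ioport.h:
           *   #define IORESOURCE_SYSRAM     0x01000000  (modifier bit)
           *   #define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM | IORESOURCE_SYSRAM)
           */
          static void __init register_system_ram(struct resource *res, u64 start, u64 end)
          {
              res->name  = "System RAM";
              res->start = start;
              res->end   = end;
              /* was: res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; */
              res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
              request_resource(&iomem_resource, res);
          }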
  7. 25 Jan 2016, 1 commit
  8. 28 Aug 2015, 1 commit
    • mm: ZONE_DEVICE for "device memory" · 033fbae9
      Committed by Dan Williams
      While pmem is usable as a block device or via DAX mappings to userspace,
      there are several usage scenarios that cannot target pmem due to its
      lack of struct page coverage. In preparation for "hot plugging" pmem
      into the vmemmap, add ZONE_DEVICE as a new zone to tag these pages
      separately from the ones that are subject to standard page allocations.
      Importantly, "device memory" can be removed at will by userspace
      unbinding the device's driver.
      
      Having a separate zone keeps these pages out of the regular allocator
      and marks them as distinct from typical uniform memory.  Device memory
      has different lifetime and performance characteristics than RAM.
      However, since we have run out of ZONES_SHIFT bits, this functionality
      currently depends on sacrificing ZONE_DMA.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Jerome Glisse <j.glisse@gmail.com>
      [hch: various simplifications in the arch interface]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      033fbae9
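      Roughly how the zone list looks with the new zone appended (a sketch
      of include/linux/mmzone.h; the real header's config guards differ
      slightly, and ZONE_DMA is traded away when ZONE_DEVICE is enabled
      because ZONES_SHIFT has no spare bits):

          enum zone_type {
          #ifdef CONFIG_ZONE_DMA
              ZONE_DMA,
          #endif
          #ifdef CONFIG_ZONE_DMA32
              ZONE_DMA32,
          #endif
              ZONE_NORMAL,
          #ifdef CONFIG_HIGHMEM
              ZONE_HIGHMEM,
          #endif
              ZONE_MOVABLE,
          #ifdef CONFIG_ZONE_DEVICE
              ZONE_DEVICE,  /* pmem pages: struct page coverage, but never
                             * handed out by the page allocator */
          #endif
              __MAX_NR_ZONES
          };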
  9. 08 Aug 2015, 1 commit
  10. 03 Jun 2015, 1 commit
  11. 10 Apr 2015, 1 commit
    • powerpc: Replace mem_init_done with slab_is_available() · f691fa10
      Committed by Michael Ellerman
      We have a powerpc specific global called mem_init_done which is "set on
      boot once kmalloc can be called".
      
      But that's not *quite* true. We set it at the bottom of mem_init(), and
      rely on the fact that mm_init() calls kmem_cache_init() immediately
      after that, and nothing is running in parallel.
      
      So replace it with the generic and 100% correct slab_is_available().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f691fa10
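      The before/after pattern, as a sketch; the early-boot fallback shown
      here (memblock, which returned a physical address in kernels of this
      vintage) is illustrative, not the specific call sites the commit
      touched:

          static void *alloc_possibly_early(size_t size)
          {
              if (slab_is_available())   /* was: if (mem_init_done) */
                  return kzalloc(size, GFP_KERNEL);
              /* early boot: fall back to a boot-time allocator */
              return __va(memblock_alloc(size, SMP_CACHE_BYTES));
          }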
  12. 10 Nov 2014, 2 commits
  13. 05 Nov 2014, 1 commit
  14. 20 Sep 2014, 1 commit
    • powerpc/mm: Use common paging_init() for NUMA · 6db35ad2
      Committed by Scott Wood
      Commit 1c98025c "powerpc: Dynamic DMA
      zone limits" updated how zones are created in paging_init(), but missed
      the NUMA version of paging_init().  This was noticed via a linker
      error, since dma_pfn_limit_to_zone() was, like the non-NUMA
      paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.
      
      It turns out that the NUMA paging_init() was not actually doing
      anything different from the standard paging_init(), other than a couple
      of debug prints, a couple of 32-bit-only #ifdef sections, and a call to
      mark_nonram_nosave().  It's not clear whether mark_nonram_nosave() is
      inherently wrong to do for NUMA, or just not useful on targets that
      have NUMA, but for now I'm preserving the existing behavior.
      
      Fixes: 1c98025c "powerpc: Dynamic DMA zone limits"
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      6db35ad2
  15. 04 Sep 2014, 1 commit
    • powerpc: Dynamic DMA zone limits · 1c98025c
      Committed by Scott Wood
      Platform code can call limit_zone_pfn() to set appropriate limits
      for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will
      select a suitable zone based on a device's mask and the pfn limits that
      platform code has configured.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      1c98025c
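      A sketch of the two halves; limit_zone_pfn() and
      dma_pfn_limit_to_zone() are the names this commit introduces, but
      both function bodies below are illustrative:

          /* 1) Platform code caps a zone, e.g. 32-bit DMA below 4 GiB: */
          static void __init setup_dma_limits(void)
          {
              limit_zone_pfn(ZONE_DMA32, 1UL << (32 - PAGE_SHIFT));
          }

          /* 2) dma_direct_alloc_coherent() then maps a device mask to a zone: */
          static gfp_t dma_gfp_for(struct device *dev, gfp_t flag)
          {
              switch (dma_pfn_limit_to_zone(dev->coherent_dma_mask >> PAGE_SHIFT)) {
              case ZONE_DMA:
                  return flag | GFP_DMA;
              case ZONE_DMA32:
                  return flag | GFP_DMA32;
              default:
                  return flag;
              }
          }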
  16. 07 Aug 2014, 1 commit
  17. 07 Mar 2014, 1 commit
    • powerpc/pseries: Use remove_memory() to remove memory · 9ac8cde9
      Committed by Nathan Fontenot
      The memory remove code for powerpc/pseries should call remove_memory()
      so that we are holding the hotplug_memory lock during memory remove
      operations.
      
      This patch updates the memory node remove handler to call remove_memory()
      and adds a ppc_md.remove_memory() entry to handle pseries specific work
      that is called from arch_remove_memory().
      
      During memory remove in pseries_remove_memblock() we have to stick with
      removing memory one section at a time. This is needed because of how memory
      resources are handled. During memory add for pseries (via the probe file in
      sysfs) we add memory one section at a time, which gives us a memory resource
      for each section. Future patches will aim to address this so that we will
      not have to remove memory one section at a time.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      9ac8cde9
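      A sketch of the section-at-a-time loop described above;
      remove_memory() is the generic hotplug entry point that takes the
      hotplug lock, while this function body is illustrative:

          static int pseries_remove_memblock(unsigned long base,
                                             unsigned int memblock_size)
          {
              int i, nid = memory_add_physaddr_to_nid(base);

              /* One section at a time: probe-time add created one memory
               * resource per section, so remove must mirror that. */
              for (i = 0; i < memblock_size / MIN_MEMORY_BLOCK_SIZE; i++) {
                  remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
                  base += MIN_MEMORY_BLOCK_SIZE;
              }
              return 0;
          }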
  18. 22 Jan 2014, 1 commit
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Committed by Tang Chen
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
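      The widened signature, with a typical call site as a sketch (the
      wrapper name is hypothetical; the call shown assumes the common
      pattern of tagging regions in memblock.memory):

          /* int memblock_set_node(phys_addr_t base, phys_addr_t size,
           *                       struct memblock_type *type, int nid); */
          static void __init tag_region_nid(phys_addr_t start, phys_addr_t end, int nid)
          {
              memblock_set_node(start, end - start, &memblock.memory, nid);
          }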
  19. 15 Jan 2014, 1 commit
  20. 10 Jan 2014, 1 commit
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      Committed by Scott Wood
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      28efc35f
  21. 03 Oct 2013, 1 commit
    • powerpc: Fix memory hotplug with sparse vmemmap · f7e3334a
      Committed by Nathan Fontenot
      Previous commit 46723bfa... introduced a new config option
      HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
      when sparse vmemmap is not defined.
      
      This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
      register_page_bootmem_info_node. Without this we get a BUG_ON for memory
      hot-remove in put_page_bootmem().
      
      This also adds a stub for register_page_bootmem_memmap to allow ppc to build
      with sparse vmemmap defined. Leaving this as a stub is fine since the same
      vmemmap addresses are also handled in vmemmap_populate and as such are
      properly mapped.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.9+]
      f7e3334a
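      The stub mentioned above is roughly this (signature per the generic
      declaration of that era); it is safe to leave empty because
      vmemmap_populate() already maps the same addresses:

          void register_page_bootmem_memmap(unsigned long section_nr,
                                            struct page *start_page,
                                            unsigned long size)
          {
              /* intentionally empty on ppc with sparse vmemmap */
          }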
  22. 27 Aug 2013, 1 commit
  23. 04 Jul 2013, 5 commits
    • mm/PPC: prepare for killing free_all_bootmem_node() · 602ddc70
      Committed by Jiang Liu
      Prepare for killing free_all_bootmem_node() by using free_all_bootmem().
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      602ddc70
    • mm/ppc: prepare for removing num_physpages and simplify mem_init() · 369a9d85
      Committed by Jiang Liu
      Prepare for removing num_physpages and simplify mem_init().
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      369a9d85
    • mm: concentrate modification of totalram_pages into the mm core · 0c988534
      Committed by Jiang Liu
      Concentrate the code that modifies totalram_pages into the mm core, so
      the arch memory initialization code doesn't need to take care of it.
      With these changes applied, only the following functions from the mm
      core modify the global variable totalram_pages: free_bootmem_late(),
      free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count().
      
      With this patch applied, it will be much easier for us to keep
      totalram_pages and zone->managed_pages consistent.
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c988534
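      The mm-core helper that arch code now calls instead of touching
      totalram_pages directly looks roughly like this (a sketch of
      adjust_managed_page_count() from this series; treat the details as
      illustrative):

          void adjust_managed_page_count(struct page *page, long count)
          {
              spin_lock(&managed_page_count_lock);
              page_zone(page)->managed_pages += count;
              totalram_pages += count;
          #ifdef CONFIG_HIGHMEM
              if (PageHighMem(page))
                  totalhigh_pages += count;
          #endif
              spin_unlock(&managed_page_count_lock);
          }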
    • mm: enhance free_reserved_area() to support poisoning memory with zero · dbe67df4
      Committed by Jiang Liu
      Address more review comments from the last round of code review.
      1) Enhance free_reserved_area() to support poisoning freed memory with
         pattern '0'. This could be used to get rid of poison_init_mem()
         on ARM64.
      2) A previous patch disabled memory poisoning for initmem on s390
         by mistake, so restore the original behavior.
      3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbe67df4
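      A sketch of a call site using the new poison value; the wrapper name
      and the range/label passed are hypothetical:

          static void free_initmem_zeroed(void)
          {
              /* poison == 0 fills the pages with zeroes before freeing,
               * which is what ARM64's poison_init_mem() used to do;
               * poison == -1 keeps the old "don't touch" behaviour */
              free_reserved_area(__init_begin, __init_end, 0, "unused kernel");
          }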
    • mm: change signature of free_reserved_area() to fix building warnings · 11199692
      Committed by Jiang Liu
      Change the signature of free_reserved_area() according to Russell King's
      suggestion, to fix the following build warnings:
      
        arch/arm/mm/init.c: In function 'mem_init':
        arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
          free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
          ^
        In file included from include/linux/mman.h:4:0,
                         from arch/arm/mm/init.c:15:
        include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
         extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
      
         mm/page_alloc.c: In function 'free_reserved_area':
      >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
         In file included from arch/mips/include/asm/page.h:49:0,
                          from include/linux/mmzone.h:20,
                          from include/linux/gfp.h:4,
                          from include/linux/mm.h:8,
                          from mm/page_alloc.c:18:
         arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
         mm/page_alloc.c: In function 'free_area_init_nodes':
         mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
      
      Also address some minor code review comments.
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11199692
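      The resulting signature takes pointers instead of unsigned long, so
      the ARM call from the first warning above now type-checks cleanly
      (a sketch; the wrapper name is hypothetical):

          /* New signature: pointers instead of unsigned long. */
          unsigned long free_reserved_area(void *start, void *end, int poison, char *s);

          /* The call site from arch/arm/mm/init.c no longer needs casts: */
          static void arm_free_lowmem_hole(void)
          {
              free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
          }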
  24. 21 Jun 2013, 1 commit
    • powerpc: Make linux pagetable walk safe with THP enabled · 0ac52dd7
      Committed by Aneesh Kumar K.V
      We need to have irqs disabled to handle all the possible parallel updates
      to the Linux page tables without holding locks.
      
      The events we are interested in while walking page tables are:
      1) Page fault
      2) unmap
      3) THP split
      4) THP collapse
      
      A) local_irq_disabled:
      ------------------------
      1) page fault:
      A none to valid transition via page fault is not an issue because we
      would see either a none or a valid entry. If it is none, we would error
      out of the page table walk. We may need to use on-stack values when
      checking for the type of page table elements, because if we do
      
      if (!is_hugepd()) {
          if (!pmd_none()) {
              if (pmd_bad()) {
      
      we could take that bad-pmd branch because the pmd got converted to a
      hugepd after the !is_hugepd check via a hugetlb fault.
      
      The right way would be to check for pmd_none higher up, or use an
      on-stack value.
      
      2) A valid to none conversion via unmap:
      We can safely walk the upper level table, because we don't remove the
      page table entries until an RCU grace period has passed. So even if we
      followed a wrong pointer, the pointer stays valid until the grace period.
      
      A returned PTE pointer needs to be atomically checked for _PAGE_PRESENT
      and _PAGE_BUSY. A valid pointer returned could become none later. To
      prevent pte_clear we take _PAGE_BUSY.
      
      3) THP split:
      A valid transparent hugepage is converted to normal pages. Before we
      split, we do pmd_splitting_flush, which sets the hugepage PTE to
      _PAGE_SPLITTING, so when walking page tables we need to check for
      pmd_trans_splitting and handle that. The returned pte also needs to be
      checked for _PAGE_SPLITTING before setting _PAGE_BUSY, similar to
      _PAGE_PRESENT. We save the value of the PTE on the stack and check for
      the flag in that local value. If we don't have the flag set we can
      safely operate on the local pte value, and we atomically set _PAGE_BUSY.
      
      4) THP collapse:
      A normal page gets converted to a hugepage. In the collapse path, we
      mark the pmd none early (pmdp_clear_flush). With irqs disabled, if we
      are already walking page tables we would see the pmd_none and won't
      continue. If we see a valid PMD, we should still check for _PAGE_PRESENT
      before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a
      huge PTE.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      0ac52dd7
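      A sketch of the irq-disabled walk pattern the message describes;
      find_linux_pte_or_hugepte() is the real powerpc lookup helper of this
      era, while the body and the exact checks are illustrative (real
      callers also set _PAGE_BUSY atomically before using the pte):

          static pte_t *safe_pte_lookup(struct mm_struct *mm, unsigned long addr)
          {
              unsigned long flags;
              pte_t *ptep, pte;

              local_irq_save(flags);  /* fences off parallel THP split/collapse */
              ptep = find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
              if (ptep) {
                  pte = *ptep;  /* work on an on-stack copy, as described */
                  if (!(pte_val(pte) & _PAGE_PRESENT) ||
                      (pte_val(pte) & _PAGE_SPLITTING))
                      ptep = NULL;  /* caller must retry or fail */
              }
              local_irq_restore(flags);
              return ptep;
          }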
  25. 30 Apr 2013, 2 commits
  26. 18 Apr 2013, 1 commit
  27. 24 Feb 2013, 1 commit
  28. 29 Jan 2013, 1 commit
  29. 13 Sep 2012, 1 commit
  30. 07 Sep 2012, 1 commit
  31. 16 Aug 2012, 1 commit
  32. 20 Mar 2012, 1 commit