1. 16 Aug 2017, 1 commit
    • powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line · 79cc38de
      Authored by Aneesh Kumar K.V
      With commit aa888a74 ("hugetlb: support larger than MAX_ORDER") we added
      support for allocating gigantic hugepages via the kernel command line.
      Switch the ppc64 arch-specific code to use that.
      
      With respect to FSL support, we now limit the allocation range using BOOTMEM_ALLOC_ACCESSIBLE.
      
      We use the kernel command line to reserve hugetlb pages on powernv
      platforms. In pseries hash MMU mode the supported gigantic huge page size is
      16GB, and it can only be allocated with hypervisor assistance. On pseries the
      command line option therefore doesn't do the allocation; instead, pseries
      allocates gigantic hugepages based on a hypervisor hint specified via the
      "ibm,expected#pages" property of the memory node.
      
      Cc: Scott Wood <oss@buserror.net>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      79cc38de
  2. 15 Aug 2017, 1 commit
    • powerpc/hugetlb: fix page rights verification in gup_hugepte() · ca8afd40
      Authored by Christophe Leroy
      gup_hugepte() checks if pages are present and readable, and
      when 'write' is set, also checks if the pages are writable.
      
      Initially this was done by checking if _PAGE_PRESENT and
      _PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
      accesses.
      
      The problem is that we have to handle the three following cases:
      1/ The target defines _PAGE_READ and _PAGE_WRITE
      2/ The target defines _PAGE_RW
      3/ The target defines _PAGE_RO
      
      In case 1/, the check is straightforward.
      In case 2/, _PAGE_READ is defined as 0 and _PAGE_WRITE as _PAGE_RW,
      so it works as well.
      But in case 3/, _PAGE_RW is defined as 0, which means _PAGE_WRITE is 0
      and then the test returns true (page writable) in all cases.
      
      A first correction was attempted in commit 6b8cb66a ("powerpc: Fix
      usage of _PAGE_RO in hugepage"), but that fix is wrong:
      instead of checking that the page is writable when write is requested,
      it checks that the page is NOT writable when write is NOT requested.
      
      This patch adds a new pte_read() helper to check whether a page is
      readable or not. This avoids handling all possible cases in
      gup_hugepte().
      
      Then gup_hugepte() is modified to use pte_present(), pte_read()
      and pte_write() instead of the raw flags.
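      
      A minimal sketch of the resulting check, assuming the standard pte
      accessors (the real gup_hugepte() loop does more, e.g. page
      refcounting):
      
      	pte_t pte = READ_ONCE(*ptep);
      
      	/* Must be present and readable; for write faults, writable too. */
      	if (!pte_present(pte) || !pte_read(pte))
      		return 0;
      	if (write && !pte_write(pte))
      		return 0;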
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ca8afd40
  3. 07 Jul 2017, 4 commits
  4. 02 Jul 2017, 2 commits
  5. 05 Jun 2017, 1 commit
  6. 31 Mar 2017, 1 commit
    • powerpc/mm/hugetlb: Filter out hugepage size not supported by page table layout · a525108c
      Authored by Aneesh Kumar K.V
      Without this, if firmware reports 1MB page size support we will crash
      when trying to use 1MB as the hugetlb page size:
      
      echo 300 > /sys/kernel/mm/hugepages/hugepages-1024kB/nr_hugepages
      
      kernel BUG at ./arch/powerpc/include/asm/hugetlb.h:19!
      .....
      ....
      [c0000000e2c27b30] c00000000029dae8 .hugetlb_fault+0x638/0xda0
      [c0000000e2c27c30] c00000000026fb64 .handle_mm_fault+0x844/0x1d70
      [c0000000e2c27d70] c00000000004805c .do_page_fault+0x3dc/0x7c0
      [c0000000e2c27e30] c00000000000ac98 handle_page_fault+0x10/0x30
      
      With the fix, 1MB is not enabled as a hugepage size:
      
      bash-4.2# cd /sys/kernel/mm/hugepages/
      bash-4.2# ls
      hugepages-16384kB  hugepages-16777216kB
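      
      The shape of the fix, as a hedged sketch (hugepage_shift_supported()
      is a hypothetical predicate standing in for the arch-specific check
      in the hugepage size registration path):
      
      	/* Hypothetical helper: reject sizes the page table layout cannot map. */
      	if (!hugepage_shift_supported(shift))
      		return -EINVAL;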
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a525108c
  7. 18 Jan 2017, 3 commits
  8. 10 Dec 2016, 2 commits
    • powerpc/8xx: Implement support of hugepages · 4b914286
      Authored by Christophe Leroy
      The 8xx uses a two-level page table and supports two Linux page sizes
      (4k and 16k). It also supports two hugepage sizes, 512k and 8M. In
      order to support them on Linux we define two different page table
      layouts.
      
      The page size is encoded in the PGD entry, using the PS field (bits 28-29):
      00 : Small pages (4k or 16k)
      01 : 512k pages
      10 : reserved
      11 : 8M pages
      
      For the 512k hugepage size, a PGD entry has the format
      [<hugepte address>0101]. The hugepte table allocated will contain 8
      entries pointing to 512k huge ptes in 4k pages mode, and 64 entries in
      16k pages mode.
      
      For 8M in 16k mode, a PGD entry has the format
      [<hugepte address>1101]. The hugepte table allocated will contain 8
      entries pointing to 8M huge ptes.
      
      For 8M in 4k mode, multiple PGD entries point to the same hugepte
      address, and each PGD entry has the format
      [<hugepte address>1101]. The hugepte table allocated will only have one
      entry.
      
      For the time being, we do not support the CPU15 errata workaround when
      HUGETLB is selected.
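      
      A hedged sketch of decoding the PS field described above (mask and
      values derived from the bit positions in the text; the kernel's
      actual macro names differ):
      
      	#define PGD_PS_MASK	0x0000000cUL	/* PS field, bits 28-29 (MSB-0) */
      	#define PGD_PS_512K	0x00000004UL	/* PS = 01: 512k pages */
      	#define PGD_PS_8M	0x0000000cUL	/* PS = 11: 8M pages   */
      
      	static inline int pgd_is_8m_huge(unsigned long pgd_val)
      	{
      		return (pgd_val & PGD_PS_MASK) == PGD_PS_8M;
      	}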
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3, for the generic bits)
      Signed-off-by: Scott Wood <oss@buserror.net>
      4b914286
    • powerpc: get hugetlbpage handling more generic · 03bb2d65
      Authored by Christophe Leroy
      Today there are two implementations of hugetlbpages, managed by
      exclusive #ifdefs:
      * FSL_BOOKE: several directory entries point to the same single hugepage
      * BOOK3S: one upper level directory entry points to a table of hugepages
      
      In preparation for implementing hugepage support on the 8xx, we need a
      mix of the two above solutions, because the 8xx needs both cases
      depending on the page size (see the arithmetic sketch below):
      * In 4k page size mode, each PGD entry covers a 4MB area. This means
      that 2 PGD entries are necessary to cover an 8M hugepage, while a
      single PGD entry covers 8x 512k hugepages.
      * In 16k page size mode, each PGD entry covers a 64MB area. This means
      that 8x 8M hugepages are covered by one PGD entry, and 64x 512k
      hugepages are covered by one PGD entry.
      
      This patch:
      * removes #ifdefs in favor of if/else based on the range sizes
      * merges the two huge_pte_alloc() functions as they are pretty similar
      * merges the two hugetlbpage_init() functions as they are pretty similar
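      
      The coverage arithmetic above, restated as a hedged C sketch (the
      constants come from the commit text; the helper name is invented for
      illustration):
      
      	#define SZ_512K			(512UL << 10)
      	#define SZ_8M			(8UL << 20)
      	#define PGD_AREA_4K_MODE	(4UL << 20)	/* 4MB per PGD entry  */
      	#define PGD_AREA_16K_MODE	(64UL << 20)	/* 64MB per PGD entry */
      
      	/* Hugepages covered by one PGD entry, e.g. 64MB / 8M = 8. An 8M
      	 * hugepage in 4k mode is the inverse case: it spans
      	 * 8M / 4MB = 2 PGD entries. */
      	static inline unsigned long huge_per_pgd(unsigned long pgd_area,
      						 unsigned long huge_size)
      	{
      		return pgd_area / huge_size;
      	}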
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3)
      Signed-off-by: Scott Wood <oss@buserror.net>
      03bb2d65
  9. 23 Sep 2016, 1 commit
  10. 21 Jul 2016, 1 commit
  11. 25 Jun 2016, 1 commit
  12. 20 May 2016, 1 commit
  13. 11 May 2016, 2 commits
  14. 01 May 2016, 2 commits
  15. 29 Mar 2016, 1 commit
    • powerpc/mm: Fixup preempt underflow with huge pages · 08a5bb29
      Authored by Sebastian Siewior
      hugepd_free() used to use __get_cpu_var(). Nothing ensured that the code
      accessing the variable did not migrate from one CPU to another, and soon
      this was noticed by Tiejun Chen in 94b09d75 ("powerpc/hugetlb:
      Replace __get_cpu_var with get_cpu_var"). So it was fixed.
      
      Christoph Lameter was doing his __get_cpu_var() replacements and forgot
      PowerPC. Then he noticed this and sent his fixed-up batch again, which
      got applied as 69111bac ("powerpc: Replace __get_cpu_var uses").
      
      The careful reader will notice one little detail: get_cpu_var() got
      replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
      a preempt_enable() and nothing that does preempt_disable(), so we
      underflow the preempt counter.
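      
      A hedged sketch of the mismatch (the per-cpu variable name is assumed
      from arch/powerpc/mm/hugetlbpage.c; heavily simplified):
      
      	struct hugepd_freelist **batchp;
      
      	/* Balanced: get_cpu_var() disables preemption, put_cpu_var() enables. */
      	batchp = &get_cpu_var(hugepd_freelist_cur);
      	/* ... queue the hugepd for freeing ... */
      	put_cpu_var(hugepd_freelist_cur);
      
      	/* Broken: this_cpu_ptr() does not touch the preempt count, so the
      	 * matching put_cpu_var() underflows it. */
      	batchp = this_cpu_ptr(&hugepd_freelist_cur);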
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      08a5bb29
  16. 29 Feb 2016, 1 commit
  17. 16 Jan 2016, 2 commits
  18. 14 Dec 2015, 2 commits
  19. 12 Oct 2015, 2 commits
  20. 18 Aug 2015, 1 commit
    • powerpc/cell: Drop support for 64K local store on 4K kernels · f444f1f8
      Authored by Michael Ellerman
      Back in the olden days we added support for using 64K pages to map the
      SPU (Synergistic Processing Unit) local store on Cell, when the main
      kernel was using 4K pages.
      
      This was useful at the time because distros were using 4K pages, but
      using 64K pages on the SPUs could reduce TLB pressure there.
      
      However these days the number of Cell users is approaching zero, and
      supporting this option adds unpleasant complexity to the memory
      management code.
      
      So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Jeremy Kerr <jk@ozlabs.org>
      f444f1f8
  21. 25 Jun 2015, 1 commit
    • mm/hugetlb: reduce arch dependent code about huge_pmd_unshare · e81f2d22
      Authored by Zhang Zhen
      Currently we have many duplicate definitions of huge_pmd_unshare. In
      all architectures this function just returns 0 when
      CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N.
      
      This patch puts the default implementation in mm/hugetlb.c and lets these
      architectures use the common code.
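      
      The common fallback is tiny; a sketch of the default placed in
      mm/hugetlb.c, assuming the function signature of that era:
      
      	#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
      	int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
      	{
      		return 0;	/* no PMD sharing, so nothing to unshare */
      	}
      	#endif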
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: James Yang <James.Yang@freescale.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e81f2d22
  22. 17 Jun 2015, 1 commit
    • powerpc: don't use module_init for non-modular core hugetlb code · 6f114281
      Authored by Paul Gortmaker
      hugetlbpage.o is obj-y (always built in). It will never
      be modular, so using module_init as an alias for __initcall is
      somewhat misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of arch_initcall (which
      makes sense for arch code) will thus change this registration
      from level 6-device to level 3-arch (i.e. slightly earlier).
      However no observable impact of that small difference has
      been observed during testing, or is expected.
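      
      The mechanical change itself, sketched from the description above
      (initcall registration in arch/powerpc/mm/hugetlbpage.c):
      
      	-module_init(hugetlbpage_init);
      	+arch_initcall(hugetlbpage_init);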
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      6f114281
  23. 20 May 2015, 1 commit
    • module: add extra argument for parse_params() callback · ecc86170
      Authored by Luis R. Rodriguez
      This adds an extra argument to parse_args() as a way to make the
      'unknown' callback more useful and generic, by allowing the caller to
      pass in a data structure of its choice. An example use case is to
      allow us to easily make module parameters for every module, which we
      will do next.
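      
      For illustration, a callback matching the new signature might look
      like this (note_unknown and struct my_ctx are hypothetical names, not
      from the patch):
      
      	struct my_ctx {
      		int unknown_count;
      	};
      
      	static int note_unknown(char *param, char *val, const char *doing,
      				void *arg)
      	{
      		struct my_ctx *ctx = arg;	/* caller-provided context */
      
      		pr_info("%s: unknown parameter '%s'\n", doing, param);
      		ctx->unknown_count++;
      		return 0;
      	}
      
      	/* The call site threads the context through parse_args(): */
      	parse_args("mymod", cmdline, params, num, 0, 0, &ctx, note_unknown);
      
      The Coccinelle SmPL script below performs the conversion across the
      tree: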
      
      @ parse @
      identifier name, args, params, num, level_min, level_max;
      identifier unknown, param, val, doing;
      type s16;
      @@
       extern char *parse_args(const char *name,
       			 char *args,
       			 const struct kernel_param *params,
       			 unsigned num,
       			 s16 level_min,
       			 s16 level_max,
      +			 void *arg,
       			 int (*unknown)(char *param, char *val,
      					const char *doing
      +					, void *arg
      					));
      
      @ parse_mod @
      identifier name, args, params, num, level_min, level_max;
      identifier unknown, param, val, doing;
      type s16;
      @@
       char *parse_args(const char *name,
       			 char *args,
       			 const struct kernel_param *params,
       			 unsigned num,
       			 s16 level_min,
       			 s16 level_max,
      +			 void *arg,
       			 int (*unknown)(char *param, char *val,
      					const char *doing
      +					, void *arg
      					))
      {
      	...
      }
      
      @ parse_args_found @
      expression R, E1, E2, E3, E4, E5, E6;
      identifier func;
      @@
      
      (
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   func);
      |
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   &func);
      |
      	R =
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   NULL);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   func);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   &func);
      |
      	parse_args(E1, E2, E3, E4, E5, E6,
      +		   NULL,
      		   NULL);
      )
      
      @ parse_args_unused depends on parse_args_found @
      identifier parse_args_found.func;
      @@
      
      int func(char *param, char *val, const char *unused
      +		 , void *arg
      		 )
      {
      	...
      }
      
      @ mod_unused depends on parse_args_found @
      identifier parse_args_found.func;
      expression A1, A2, A3;
      @@
      
      -	func(A1, A2, A3);
      +	func(A1, A2, A3, NULL);
      
      Generated-by: Coccinelle SmPL
      Cc: cocci@systeme.lip6.fr
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Felipe Contreras <felipe.contreras@gmail.com>
      Cc: Ewan Milne <emilne@redhat.com>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecc86170
  24. 12 May 2015, 1 commit
  25. 17 Apr 2015, 2 commits
    • powerpc/mm/thp: Return pte address if we find trans_splitting. · 7d6e7f7f
      Authored by Aneesh Kumar K.V
      For a THP that is marked trans splitting, we return the pte. This
      requires the callers to handle the pmd_trans_splitting scenario, if
      they care. All the current callers are only looking at the pfn or at
      write_ok, hence we don't need to update them.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7d6e7f7f
    • powerpc/mm/thp: Make page table walk safe against thp split/collapse · 691e95fd
      Authored by Aneesh Kumar K.V
      We can disable a THP split or a hugepage collapse by disabling irqs.
      We send an IPI to all the CPUs in the early part of a split/collapse,
      so disabling local irqs ensures we don't make progress with the
      split/collapse. If the THP is getting split we return NULL from
      find_linux_pte_or_hugepte(). For all the current callers this should be
      OK. We need to be careful if we want to use the returned pte_t pointer
      outside the irq-disabled region. With respect to a THP split the pfn
      remains the same, but a hugepage collapse will result in a pfn change.
      There are a few steps we can take to avoid a hugepage collapse. One way
      is to take a page reference inside the irq-disabled region. Another
      option is to take mmap_sem so that a parallel collapse will not happen.
      We can also disable collapse by taking the pmd_lock. Another method,
      used by the kvm subsystem, is to check whether there was an
      mmu_notifier update in between, using mmu_notifier_retry().
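      
      A hedged sketch of the expected calling pattern (helper signature as
      in that era's powerpc code; error handling elided):
      
      	unsigned long flags;
      	unsigned int shift;
      	pte_t *ptep;
      
      	local_irq_save(flags);	/* holds off THP split/collapse progress */
      	ptep = find_linux_pte_or_hugepte(pgdir, addr, &shift);
      	if (ptep) {
      		/* *ptep may only be relied upon while irqs stay disabled */
      	}
      	local_irq_restore(flags);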
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      691e95fd
  26. 16 Apr 2015, 1 commit
  27. 10 Apr 2015, 1 commit