1. 29 Jan 2014 (3 commits)
  2. 22 Jan 2014 (1 commit)
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen authored
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
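      A hedged sketch of the interface change the title describes: memblock_set_node()
      gains a memblock_type parameter so the node id can be recorded on a caller-chosen
      region array, not just memblock.memory. The prototype and call sites below are
      illustrative, not copied from the tree:

      ```c
      /* Sketch: the caller picks which region array gets the node id. */
      int memblock_set_node(phys_addr_t base, phys_addr_t size,
                            struct memblock_type *type, int nid);

      /* Illustrative caller, tagging both arrays for one range: */
      static void tag_range_sketch(phys_addr_t base, phys_addr_t size, int nid)
      {
              memblock_set_node(base, size, &memblock.memory, nid);
              memblock_set_node(base, size, &memblock.reserved, nid);
      }
      ```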
  3. 18 Jan 2014 (1 commit)
  4. 15 Jan 2014 (4 commits)
    • powerpc: Make add_system_ram_resources() __init · 4f770924
      Geert Uytterhoeven authored
      add_system_ram_resources() is a subsys_initcall.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
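      Since a subsys_initcall runs exactly once during boot, the function's text can
      live in the init section and be discarded afterwards. A minimal sketch of the
      resulting shape, with the body elided:

      ```c
      /* Runs once at boot via the initcall machinery, so __init lets the
       * kernel free this code after init completes. Body elided. */
      static int __init add_system_ram_resources(void)
      {
              /* ... register a "System RAM" resource per memory region ... */
              return 0;
      }
      subsys_initcall(add_system_ram_resources);
      ```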
    • powerpc: Add debug checks to catch invalid cpu-to-node mappings · 68fb18aa
      Srivatsa S. Bhat authored
      There have been some weird bugs in the past where the kernel tried to associate
      threads of the same core with different NUMA nodes, and things went haywire after
      that point (as expected).
      
      But unfortunately, root-causing such issues has been quite challenging, due to
      the lack of appropriate debug checks in the kernel. These bugs usually lead to
      some odd soft-lockups in the scheduler's build-sched-domain code in the CPU
      hotplug path, which makes it very hard to trace them back to the incorrect
      cpu-to-node mappings.
      
      So add appropriate debug checks to catch such invalid cpu-to-node mappings
      as early as possible.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
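      A hedged sketch of the kind of debug check the message describes: warn when a
      CPU is about to be mapped to a node different from its core's other threads.
      The helper name and placement are illustrative:

      ```c
      /* Illustrative check: all threads of a core must share one node. */
      static void verify_cpu_node_mapping(int cpu, int node)
      {
              int first = cpu_first_thread_sibling(cpu);
              int i;

              for (i = first; i < first + threads_per_core; i++) {
                      if (i == cpu || !cpu_online(i))
                              continue;
                      if (cpu_to_node(i) != node) {
                              WARN(1, "CPU thread siblings %d and %d mapped to different nodes\n",
                                   cpu, i);
                              break;
                      }
              }
      }
      ```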
    • powerpc: Fix the setup of CPU-to-Node mappings during CPU online · d4edc5b6
      Srivatsa S. Bhat authored
      On POWER platforms, the hypervisor can notify the guest kernel about dynamic
      changes in the cpu-numa associativity (VPHN topology update). Hence the
      cpu-to-node mappings that we got from the firmware during boot may no longer
      be valid after such updates. This is handled using the arch_update_cpu_topology()
      hook in the scheduler, and the sched-domains are rebuilt according to the new
      mappings.
      
      But unfortunately, at the moment, CPU hotplug ignores these updated mappings
      and instead queries the firmware for the cpu-to-numa relationships and uses
      them during CPU online. So the kernel can end up assigning wrong NUMA nodes
      to CPUs during subsequent CPU hotplug online operations (after booting).
      
      Further, a particularly problematic scenario can result from this bug:
      On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
      threads per core. The switch to Single-Threaded (ST) mode is performed by
      offlining all except the first CPU thread in each core. Switching back to
      SMT mode involves onlining those other threads back, in each core.
      
      Now consider this scenario:
      
      1. During boot, the kernel gets the cpu-to-node mappings from the firmware
         and assigns the CPUs to NUMA nodes appropriately, during CPU online.
      
      2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
         communicates this update to the kernel. The kernel in turn updates its
         cpu-to-node associations and rebuilds its sched domains. Everything is
         fine so far.
      
      3. Now, the user switches the machine from SMT to ST mode (say, by running
         ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
         core.
      
      4. The user then tries to switch back from ST to SMT mode (say, by running
         ppc64_cpu --smt=4), and this involves onlining those threads back. Since
         CPU hotplug ignores the new mappings, it queries the firmware and tries to
         associate the newly onlined sibling threads to the old NUMA nodes. This
         results in sibling threads within the same core getting associated with
         different NUMA nodes, which is incorrect.
      
         The scheduler's build-sched-domains code gets thoroughly confused by this,
         enters an infinite loop, and causes soft-lockups, as explained in detail
         in commit 3be7db6a (powerpc: VPHN topology change updates all siblings).
      
      So to fix this, use the numa_cpu_lookup_table to remember the updated
      cpu-to-node mappings, and use them during CPU hotplug online operations.
      Further, we also need to ensure that all threads in a core are assigned to a
      common NUMA node, irrespective of whether all those threads were online during
      the topology update. To achieve this, we take care not to use cpu_sibling_mask(),
      since it is not hotplug-invariant. Instead, we use cpu_first_sibling_thread()
      and set up the mappings manually using the 'threads_per_core' value for that
      particular platform. This helps us ensure that we don't hit this bug with any
      combination of CPU hotplug and SMT mode switching.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
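      A hedged sketch of the two ingredients described above: remember the
      VPHN-updated mapping in numa_cpu_lookup_table, and derive every thread's node
      from the first thread of its core with a threads_per_core walk. The message
      spells the helper cpu_first_sibling_thread(); the sketch uses
      cpu_first_thread_sibling() from asm/cputhreads.h, and the function name is
      illustrative:

      ```c
      /* Illustrative: record one node for every thread of cpu's core, so a
       * later hotplug online of any sibling sees the updated mapping instead
       * of re-querying firmware. */
      static void update_mappings_for_core(int cpu, int nid)
      {
              int first = cpu_first_thread_sibling(cpu);
              int t;

              for (t = first; t < first + threads_per_core; t++)
                      numa_cpu_lookup_table[t] = nid;
      }
      ```

      Walking threads_per_core rather than cpu_sibling_mask() covers siblings that
      were offline when the update arrived, which is the hotplug-invariance point
      the message makes.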
    • powerpc: Delete non-required instances of include <linux/init.h> · c141611f
      Paul Gortmaker authored
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.  Most are just
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      The one instance where we add an include for init.h covers off
      a case where that file was implicitly getting it from another
      header which itself didn't need it.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 11 Jan 2014 (1 commit)
  6. 10 Jan 2014 (9 commits)
    • powerpc/fsl-book3e-64: Use paca for hugetlb TLB1 entry selection · bbead78c
      Scott Wood authored
      This keeps usage coordinated for hugetlb and indirect entries, which
      should make entry selection more predictable and probably improve overall
      performance when mixing the two.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      Scott Wood authored
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
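      A hedged C rendering of the software round-robin the third bullet calls for
      (the real selection lives in the assembly TLB miss path; the names and
      bookkeeping here are illustrative):

      ```c
      /* Illustrative: with no hardware next-victim hint for TLB1, rotate a
       * per-core counter through the usable entry range. */
      static unsigned int tlb1_next;  /* hypothetical per-core victim cursor */

      static unsigned int tlb1_pick_victim(unsigned int first, unsigned int num)
      {
              unsigned int esel = first + (tlb1_next++ % num);

              return esel;  /* becomes the entry-select (ESEL) field in MAS0 */
      }
      ```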
    • powerpc: add barrier after writing kernel PTE · 47ce8af4
      Scott Wood authored
      There is no barrier between something like ioremap() writing to
      a PTE, and returning the value to a caller that may then store the
      pointer in a place that is visible to other CPUs.  Such callers
      generally don't perform barriers of their own.
      
      Even if callers of ioremap() and similar things did use barriers,
      the most logical choice would be smp_wmb(), which is not
      architecturally sufficient when BookE hardware tablewalk is used.  A
      full sync is specified by the architecture.
      
      For userspace mappings, OTOH, we generally already have an lwsync due
      to locking, and if we occasionally take a spurious fault due to not
      having a full sync with hardware tablewalk, it will not be fatal
      because we will retry rather than oops.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
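      A hedged sketch of where the barrier goes: after the PTE write and before the
      mapped address can be handed to the caller. mb() expands to a full sync on
      powerpc; the helper name and flags below are illustrative, not the upstream hunk:

      ```c
      /* Illustrative ioremap-style path: publish the PTE with a full sync
       * before returning the pointer. smp_wmb() (lwsync) would not be
       * enough with BookE hardware tablewalk, per the architecture. */
      static void __iomem *ioremap_sketch(unsigned long va, phys_addr_t pa)
      {
              map_kernel_page(va, pa, _PAGE_NO_CACHE | _PAGE_GUARDED);
              mb();   /* full sync: PTE visible before the pointer escapes */
              return (void __iomem *)va;
      }
      ```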
    • powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M · 0be7d969
      Kevin Hao authored
      When booting above 64M for a secondary cpu, we also face the same
      issue as the boot cpu: PAGE_OFFSET maps to two different physical
      addresses for the init tlb and the final map. So we have to use
      switch_to_as1/restore_to_as0 while converting between these two
      maps. When restoring to AS0 for a secondary cpu, we only need to
      return to the caller, so add a new parameter to restore_to_as0
      for this purpose.
      
      Use LOAD_REG_ADDR_PIC to get the address of variables which may
      be used before we set the final map in cams for the secondary cpu.
      Move the setting of cams a bit earlier in order to avoid
      unnecessary use of LOAD_REG_ADDR_PIC.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
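      An illustrative C rendering of the secondary-cpu flow described above (the
      real code is assembly in head_fsl_booke.S; the flag's exact encoding is not
      shown in this log, so treat the parameters as a sketch):

      ```c
      /* Illustrative: run from AS1 while the PAGE_OFFSET map is rebuilt,
       * then restore AS0. For a secondary cpu, the new flag just asks
       * restore_to_as0() to return to the caller instead of continuing
       * the boot-cpu path. */
      static void secondary_remap_sketch(void)
      {
              switch_to_as1();        /* keep executing while remapping */
              /* ... drop the init tlb entry, set the final map in cams ... */
              restore_to_as0(0 /* offset */, 1 /* secondary: return to caller */);
      }
      ```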
    • powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for relocatable kernel · 7d2471f9
      Kevin Hao authored
      This is always true for a non-relocatable kernel. Otherwise the kernel
      would get stuck. But for a relocatable kernel it is a little more
      complicated. When booting a relocatable kernel, we just align the
      kernel start addr to 64M and map PAGE_OFFSET from there. The
      relocation is based on this virtual address. But if this address
      is not the same as memstart_addr, we have to change the map of
      PAGE_OFFSET to the real memstart_addr and perform the relocation
      again.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      [scottwood@freescale.com: make offset long and non-negative in simple case]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
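      A hedged sketch of the arithmetic implied above: compare the 64M-aligned
      kernel start with memstart_addr; a non-zero difference means PAGE_OFFSET must
      be remapped and the kernel relocated once more (names are illustrative):

      ```c
      /* Illustrative: non-zero result => remap PAGE_OFFSET and relocate again. */
      #define SZ_64M 0x04000000UL

      static long relocation_offset(unsigned long kernstart_addr,
                                    unsigned long memstart_addr)
      {
              unsigned long aligned_start = kernstart_addr & ~(SZ_64M - 1);

              return (long)(aligned_start - memstart_addr);
      }
      ```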
    • powerpc/fsl_booke: introduce map_mem_in_cams_addr · 813125d8
      Kevin Hao authored
      Introduce this function so we can set both the physical and virtual
      address for the map in cams. This will be used by the relocation code.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
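      A hedged sketch of the new helper's shape next to the existing one; parameter
      names are illustrative, and the return value (as with map_mem_in_cams()) would
      be the amount of memory actually mapped:

      ```c
      /* Existing helper: the virtual base is implicitly PAGE_OFFSET. */
      unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx);

      /* New helper: relocation code supplies both bases explicitly. */
      unsigned long map_mem_in_cams_addr(phys_addr_t phys, unsigned long virt,
                                         unsigned long ram, int max_cam_idx);
      ```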
    • powerpc/fsl_booke: set the tlb entry for the kernel address in AS1 · 78a235ef
      Kevin Hao authored
      We use the tlb1 entries to map low memory into the kernel space. The
      current code assumes that the first tlb entry covers the kernel
      image, but this is not true in some special cases, such as when we
      run a relocatable kernel above 64M or set CONFIG_KERNEL_START
      above 64M. So we choose to switch to address space 1 before
      setting these tlb entries.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc: enable the relocatable support for the fsl booke 32bit kernel · dd189692
      Kevin Hao authored
      This is based on the code in head_44x.S. The difference is that the
      init tlb size we use is 64M. With this patch we can only load the
      kernel at an address between memstart_addr and memstart_addr + 64M;
      this restriction is lifted in the following patches.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
    • powerpc/fsl_booke: protect the access to MAS7 · 7c732cba
      Kevin Hao authored
      The e500v1 doesn't implement MAS7, so we should avoid accessing
      this register on that implementation. In the current kernel, the
      accesses to MAS7 are protected by either CONFIG_PHYS_64BIT or
      MMU_FTR_BIG_PHYS. Since some code is executed before code
      patching, we have to use CONFIG_PHYS_64BIT in those cases.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
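      A hedged sketch of the compile-time guard used on the early paths (the
      runtime alternative, an MMU_FTR_BIG_PHYS feature section, only works after
      code patching has run; the helper name is illustrative):

      ```c
      /* Illustrative: MAS7 holds the upper physical-address bits and does
       * not exist on e500v1, so guard the access at compile time where
       * the code can run before feature patching. */
      static inline void set_mas7_sketch(phys_addr_t phys)
      {
      #ifdef CONFIG_PHYS_64BIT
              mtspr(SPRN_MAS7, upper_32_bits(phys));
      #endif
      }
      ```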
  7. 09 Dec 2013 (2 commits)
  8. 02 Dec 2013 (1 commit)
  9. 23 Nov 2013 (1 commit)
  10. 21 Nov 2013 (2 commits)
  11. 15 Nov 2013 (1 commit)
  12. 13 Nov 2013 (3 commits)
  13. 30 Oct 2013 (1 commit)
  14. 29 Oct 2013 (2 commits)
  15. 11 Oct 2013 (2 commits)
  16. 03 Oct 2013 (1 commit)
    • powerpc: Fix memory hotplug with sparse vmemmap · f7e3334a
      Nathan Fontenot authored
      Previous commit 46723bfa... introduced a new config option
      HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
      when sparse vmemmap is not defined.
      
      This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
      register_page_bootmem_info_node. Without this we get a BUG_ON for memory
      hot remove in put_page_bootmem().
      
      This also adds a stub for register_page_bootmem_memmap to allow ppc to build
      with sparse vmemmap defined. Leaving this as a stub is fine since the same
      vmemmap addresses are also handled in vmemmap_populate and as such are
      properly mapped.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.9+]
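      The stub described above, sketched; parameter names may differ from the tree,
      and the body is intentionally empty because vmemmap_populate already maps
      these addresses:

      ```c
      /* Empty by design: the same vmemmap addresses are handled (and thus
       * properly mapped) in vmemmap_populate. */
      void register_page_bootmem_memmap(unsigned long section_nr,
                                        struct page *start_page,
                                        unsigned long size)
      {
      }
      ```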
  17. 13 Sep 2013 (1 commit)
  18. 12 Sep 2013 (1 commit)
    • mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Naoya Horiguchi authored
      Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing), so we had better not enable migration of
      other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do a
      page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (softoffline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture-dependent check of whether hugepages are implemented on a
      pmd basis or not.  And on some architectures multiple sizes of hugepages
      are available, so hugepage_migration_support() also checks the hugepage size.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
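      A hedged sketch of the check's pmd-based shape; per-architecture versions
      differ, and the size comparison below is the illustrative core, not any one
      tree's exact code:

      ```c
      /* Report a hugepage as migratable only when it is pmd-sized. */
      static inline int hugepage_migration_support(struct hstate *h)
      {
              return huge_page_shift(h) == PMD_SHIFT;
      }
      ```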
  19. 11 Sep 2013 (1 commit)
    • powerpc: Fix possible deadlock on page fault · 69e044dd
      Aneesh Kumar K.V authored
       stack_grow_into/14082 is trying to acquire lock:
        (&mm->mmap_sem){++++++}, at: [<c000000000206d28>] .might_fault+0x78/0xe0
      
       but task is already holding lock:
        (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&mm->mmap_sem);
         lock(&mm->mmap_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       1 lock held by stack_grow_into/14082:
        #0:  (&mm->mmap_sem){++++++}, at: [<c0000000007ffd8c>] .do_page_fault+0x24c/0x910
      
       stack backtrace:
       CPU: 21 PID: 14082 Comm: stack_grow_into Not tainted 3.10.0-10.el7.ppc64.debug #1
       Call Trace:
       [c0000003d396b850] [c000000000016e7c] .show_stack+0x7c/0x1f0 (unreliable)
       [c0000003d396b920] [c000000000813fc8] .dump_stack+0x28/0x3c
       [c0000003d396b990] [c000000000124b90] .__lock_acquire+0x1640/0x1800
       [c0000003d396bab0] [c00000000012570c] .lock_acquire+0xac/0x250
       [c0000003d396bb80] [c000000000206d54] .might_fault+0xa4/0xe0
       [c0000003d396bbf0] [c0000000007ffe2c] .do_page_fault+0x2ec/0x910
       [c0000003d396be30] [c0000000000092e8] handle_page_fault+0x10/0x30
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
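      An illustrative rendering of the recursion the lockdep report shows; the
      commit message carries only the splat, so this sketch reflects the trace,
      not the eventual fix:

      ```c
      /* Shape of the reported scenario: do_page_fault() holds mmap_sem for
       * reading, and a path reached from it calls might_fault(), which can
       * try to take mmap_sem again -- same-task recursive acquisition. */
      static void deadlock_shape_sketch(struct mm_struct *mm)
      {
              down_read(&mm->mmap_sem);       /* .do_page_fault */
              might_fault();                  /* may lock mmap_sem again */
              up_read(&mm->mmap_sem);
      }
      ```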
  20. 27 Aug 2013 (1 commit)
  21. 20 Aug 2013 (1 commit)