1. 11 Jul 2018 (2 commits)
  2. 10 Jul 2018 (1 commit)
    • arm64: numa: rework ACPI NUMA initialization · e1896249
      Committed by Lorenzo Pieralisi
      Current ACPI ARM64 NUMA initialization code in
      
      acpi_numa_gicc_affinity_init()
      
      carries out NUMA nodes creation and cpu<->node mappings at the same time
      in the arch backend so that a single SRAT walk is needed to parse both
      pieces of information.  This implies that the cpu<->node mappings must
      be stashed in an array (sized NR_CPUS) so that SMP code can later use
      the stashed values to avoid another SRAT table walk to set-up the early
      cpu<->node mappings.
      
      If the kernel is configured with a NR_CPUS value smaller than the number
      of processor entries in the SRAT (and MADT), the logic in
      acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping
      is only carried out (and stashed for future use) for SRAT entries up to
      NR_CPUS, which do not necessarily correspond to the possible cpus
      detected at SMP initialization in acpi_map_gic_cpu_interface() (ie the
      ordering of MADT and SRAT processor entries is not enforced), which
      leaves the kernel with broken cpu<->node mappings.
      
      Furthermore, given the current parsing logic in
      acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed
      because they exceed NR_CPUS are not mapped to NUMA nodes (ie the node
      corresponding to the PXM is not created in the kernel), leaving the
      system with a broken NUMA topology.
      
      Rework the ACPI ARM64 NUMA initialization process so that NUMA node
      creation and cpu<->node mappings are decoupled. cpu<->node mappings
      are moved to SMP initialization code (where they are needed), at the
      cost of an extra SRAT walk, so that ACPI NUMA mappings can be batched
      before being applied, fixing the current parsing pitfalls.
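      
      A minimal sketch of the decoupled flow described above (illustrative
      only; the callback name gicc_pxm_to_node(), the wrapper
      map_possible_cpus_to_nodes() and the helper node_for_cpu_acpi_id()
      are assumptions, not the literal patch):
      
        /* SRAT walk: only create NUMA nodes from PXM domains. */
        static int __init gicc_pxm_to_node(struct acpi_subtable_header *header,
                                           const unsigned long end)
        {
                struct acpi_srat_gicc_affinity *pa = (void *)header;
        
                if (pa->flags & ACPI_SRAT_GICC_ENABLED)
                        acpi_map_pxm_to_node(pa->proximity_domain); /* node creation only */
                return 0;
        }
        
        /* Called from SMP initialization, once possible cpus are enumerated. */
        static void __init map_possible_cpus_to_nodes(void)
        {
                int cpu;
        
                for_each_possible_cpu(cpu)
                        early_map_cpu_to_node(cpu, node_for_cpu_acpi_id(cpu)); /* assumed helper */
        }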
      Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
      Tested-by: John Garry <john.garry@huawei.com>
      Fixes: d8b47fca ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT")
      Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.com
      Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Punit Agrawal <punit.agrawal@arm.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      e1896249
  3. 09 Jul 2018 (1 commit)
  4. 06 Jul 2018 (20 commits)
  5. 05 Jul 2018 (10 commits)
    • arm64: Handle mismatched cache type · 314d53d2
      Committed by Suzuki K Poulose
      Track mismatches in the cache type register (CTR_EL0), other
      than the D/I min line sizes, and trap user accesses if there are any.
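      
      The idea, roughly (an illustrative sketch; the mask macros below are
      assumptions standing in for the real CTR_EL0 field definitions):
      
        #define CTR_IMINLINE_MASK   (0xfUL << 0)    /* IminLine, bits [3:0]   (assumed) */
        #define CTR_DMINLINE_MASK   (0xfUL << 16)   /* DminLine, bits [19:16] (assumed) */
        
        /*
         * Compare this CPU's CTR_EL0 against the system-wide safe value while
         * ignoring only the D/I min line size fields; any remaining difference
         * means EL0 reads of CTR_EL0 have to be trapped and emulated with the
         * safe value.
         */
        static bool ctr_mismatch_beyond_minline(u64 local_ctr, u64 safe_ctr)
        {
                u64 ignore = CTR_IMINLINE_MASK | CTR_DMINLINE_MASK;
        
                return (local_ctr & ~ignore) != (safe_ctr & ~ignore);
        }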
      
      Fixes: be68a8aa ("arm64: cpufeature: Fix CTR_EL0 field definitions")
      Cc: <stable@vger.kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      314d53d2
    • arm64: Fix mismatched cache line size detection · 4c4a39dd
      Committed by Suzuki K Poulose
      If there is a mismatch in the I/D min line size, we must
      always use the system wide safe value both in applications
      and in the kernel, while performing cache operations. However,
      we have been checking more bits than just the min line sizes,
      which triggers false negatives. We may need to trap the user
      accesses in such cases, but not necessarily patch the kernel.
      
      This patch fixes the check to do the right thing as advertised.
      A new capability will be added to check mismatches in other
      fields and ensure we trap the CTR accesses.
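      
      Put differently, only the IminLine/DminLine fields should feed the
      safe-value decision. A sketch of how a system-wide safe D-cache line
      size would be derived (illustrative only, not the literal patch):
      
        /*
         * CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest D-cache line
         * in 4-byte words, so the line size in bytes is 4 << DminLine. With
         * mismatched CPUs, cache maintenance must use the smallest value seen.
         */
        static unsigned int safe_dcache_line_size(u64 ctr_a, u64 ctr_b)
        {
                unsigned int a = (ctr_a >> 16) & 0xf;
                unsigned int b = (ctr_b >> 16) & 0xf;
        
                return 4U << (a < b ? a : b);
        }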
      
      Fixes: be68a8aa ("arm64: cpufeature: Fix CTR_EL0 field definitions")
      Cc: <stable@vger.kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      4c4a39dd
    • arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT · 5d168964
      Committed by Will Deacon
      When running with CONFIG_PREEMPT=n, the spinlock fastpaths fit inside
      64 bytes, which typically coincides with the L1 I-cache line size.
      
      Inline the spinlock fastpaths, like we do already for rwlocks.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      5d168964
    • arm64: locking: Replace ticket lock implementation with qspinlock · c1109047
      Committed by Will Deacon
      It's fair to say that our ticket lock has served us well over time, but
      it's time to bite the bullet and start using the generic qspinlock code
      so we can make use of explicit MCS queuing and potentially better PV
      performance in future.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      c1109047
    • arm64: barrier: Implement smp_cond_load_relaxed · 598865c5
      Committed by Will Deacon
      We can provide an implementation of smp_cond_load_relaxed using READ_ONCE
      and __cmpwait_relaxed.
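      
      The shape such an implementation takes, sketched from the description
      above (close to the generic pattern; __cmpwait_relaxed() lets the CPU
      wait for the cacheline to change instead of spinning, and cond_expr is
      written in terms of VAL, as with the generic helpers):
      
        #define smp_cond_load_relaxed(ptr, cond_expr)                        \
        ({                                                                   \
                typeof(ptr) __ptr = (ptr);                                   \
                typeof(*ptr) VAL;                                            \
                for (;;) {                                                   \
                        VAL = READ_ONCE(*__ptr);       /* relaxed load */    \
                        if (cond_expr)                 /* condition met? */  \
                                break;                                       \
                        __cmpwait_relaxed(__ptr, VAL); /* wait for change */ \
                }                                                            \
                VAL;                                                         \
        })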
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      598865c5
    • x86/mm: Add TLB purge to free pmd/pte page interfaces · 5e0fb5df
      Committed by Toshi Kani
      ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
      a pud / pmd map.  The following preconditions are met at their entry.
       - All pte entries for a target pud/pmd address range have been cleared.
       - System-wide TLB purges have been performed for a target pud/pmd address
         range.
      
      The preconditions assure that there is no stale TLB entry for the range.
      Speculation may not cache TLB entries since it requires all levels of page
      entries, including ptes, to have P & A-bits set for an associated address.
      However, speculation may cache pud/pmd entries (paging-structure caches)
      when they have P-bit set.
      
      Add a system-wide TLB purge (INVLPG) to a single page after clearing
      pud/pmd entry's P-bit.
      
      SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
      states that:
        INVLPG invalidates all paging-structure caches associated with the
        current PCID regardless of the linear addresses to which they correspond.
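      
      A sketch of what pmd_free_pte_page() looks like with the purge added
      (illustrative; the structure and the flush_tlb_kernel_range() call
      follow the description above, not a quote of the patch):
      
        int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
        {
                pte_t *pte;
        
                if (pmd_none(*pmd))
                        return 1;
        
                pte = (pte_t *)pmd_page_vaddr(*pmd);
                pmd_clear(pmd);                 /* clear the entry: P-bit dropped */
        
                /* INVLPG-based purge so no stale paging-structure entry survives */
                flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
        
                free_page((unsigned long)pte);
                return 1;
        }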
      
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: cpandya@codeaurora.org
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
      5e0fb5df
    • ioremap: Update pgtable free interfaces with addr · 785a19f9
      Committed by Chintan Pandya
      The following kernel panic was observed on ARM64 platform due to a stale
      TLB entry.
      
       1. ioremap with 4K size, a valid pte page table is set.
       2. iounmap it, its pte entry is set to 0.
       3. ioremap the same address with 2M size, update its pmd entry with
          a new value.
       4. CPU may hit an exception because the old pmd entry is still in TLB,
          which leads to a kernel panic.
      
      Commit b6bdb751 ("mm/vmalloc: add interfaces to free unmapped page
      table") has addressed this panic by falling to pte mappings in the above
      case on ARM64.
      
      To support pmd mappings in all cases, TLB purge needs to be performed
      in this case on ARM64.
      
      Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
      so that TLB purge can be added later in separate patches.
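      
      The interface change itself is small; sketched from the description
      above, the helpers gain the virtual address being unmapped so that a
      follow-up patch can flush it:
      
        /*
         * Return 1 if the entry was cleared (or was already empty) and a huge
         * mapping may be installed at 'addr'; return 0 to make ioremap() fall
         * back to pte mappings. 'addr' is the virtual address covered by the
         * entry, passed down purely so the architecture can purge the TLB.
         */
        int pud_free_pmd_page(pud_t *pud, unsigned long addr);
        int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);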
      
      [toshi.kani@hpe.com: merge changes, rewrite patch description]
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
      785a19f9
    • x86/mm: Disable ioremap free page handling on x86-PAE · f967db0b
      Committed by Toshi Kani
      ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
      tables are not shared among processes on x86-PAE.  Therefore, any
      update to sync'd pmd entries needs re-syncing.  Freeing a pte page
      also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
      
      Disable free page handling on x86-PAE.  pud_free_pmd_page() and
      pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
      This assures that ioremap() does not update sync'd pmd entries at the
      cost of falling back to pte mappings.
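      
      On x86-PAE the two helpers therefore collapse to presence checks,
      roughly (a sketch of the behaviour described above; prototypes shown
      without the 'addr' argument added elsewhere in this series):
      
        #ifdef CONFIG_X86_PAE
        /*
         * Never free a page-table page under a present entry on PAE: returning
         * 0 when the entry is present makes ioremap() fall back to pte
         * mappings instead of modifying sync'd pmd entries.
         */
        int pud_free_pmd_page(pud_t *pud)
        {
                return pud_none(*pud);  /* 1 = nothing mapped; 0 = present, refuse */
        }
        
        int pmd_free_pte_page(pmd_t *pmd)
        {
                return pmd_none(*pmd);
        }
        #endif /* CONFIG_X86_PAE */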
      
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Reported-by: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: cpandya@codeaurora.org
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.kani@hpe.com
      f967db0b
    • arm64: kexec: always reset to EL2 if present · 76f4e2da
      Committed by Mark Rutland
      Currently machine_kexec() doesn't reset to EL2 in the case of a
      crashdump kernel. This leaves potentially dodgy state active at EL2, and
      means that if the crashdump kernel attempts to online secondary CPUs,
      these will be booted at mismatched ELs.
      
      Let's reset to EL2, as we do in all other cases, and simplify things. If
      EL2 state is corrupt, things are already sufficiently bad that kdump is
      unlikely to work, and it's best-effort regardless.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      76f4e2da
    • arm64: fix infinite stacktrace · 7e7df71f
      Committed by Mikulas Patocka
      I've got this infinite stacktrace when debugging another problem:
      [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
      [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
      [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
      [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
      [  908.800284] Task dump for CPU 1:
      [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
      [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
      [  908.820285] Call trace:
      [  908.824785]  __switch_to+0x68/0x90
      [  908.837661]  0xfffffe00603afd90
      [  908.844119]  0xfffffe00603afd90
      [  908.850091]  0xfffffe00603afd90
      [  908.854285]  0xfffffe00603afd90
      [  908.863538]  0xfffffe00603afd90
      [  908.865523]  0xfffffe00603afd90
      
      The machine just locked up and kept on printing the same line over and
      over again. This patch fixes it.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      7e7df71f
  6. 02 Jul 2018 (1 commit)
  7. 29 Jun 2018 (5 commits)
    • parisc: Build kernel without -ffunction-sections · 24b6c225
      Committed by Helge Deller
      As suggested by Nick Piggin, it seems we can drop the -ffunction-sections
      compile flag now that the kernel uses thin archives. Testing with 32-
      and 64-bit kernels showed no difference in kernel size.
      Suggested-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      24b6c225
    • parisc: Reduce debug output in unwind code · 63ba82c0
      Committed by Helge Deller
      Signed-off-by: Helge Deller <deller@gmx.de>
      63ba82c0
    • x86/e820: put !E820_TYPE_RAM regions into memblock.reserved · 124049de
      Committed by Naoya Horiguchi
      There is a kernel panic that is triggered when reading /proc/kpageflags
      on a kernel booted with the kernel parameter 'memmap=nn[KMG]!ss[KMG]':
      
        BUG: unable to handle kernel paging request at fffffffffffffffe
        PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
        Oops: 0000 [#1] SMP PTI
        CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
        RIP: 0010:stable_page_flags+0x27/0x3c0
        Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
        RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
        RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
        R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
        R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
        FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
        Call Trace:
         kpageflags_read+0xc7/0x120
         proc_reg_read+0x3c/0x60
         __vfs_read+0x36/0x170
         vfs_read+0x89/0x130
         ksys_pread64+0x71/0x90
         do_syscall_64+0x5b/0x160
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7efc42e75e23
        Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
      
      According to kernel bisection, this problem became visible due to commit
      f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      which changes how struct pages are initialized.
      
      Memblock layout affects the pfn ranges covered by node/zone.  Consider
      that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
      the default (no memmap= given) memblock layout is like below:
      
        MEMBLOCK configuration:
         memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x4
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
         memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      If you give memmap=1G!4G (so it just covers memory[0x2]),
      the range [0x100000000-0x13fffffff] is gone:
      
        MEMBLOCK configuration:
         memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
         memory.cnt  = 0x3
         memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
         memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
         memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
         ...
      
      This shrinks node 0's pfn range because it is calculated from the
      address range of memblock.memory, so some of the struct pages in the
      gap range are left uninitialized.
      
      We have a function zero_resv_unavail() which zeroes the struct pages
      within the reserved unavailable range (i.e.  memblock.memory &&
      !memblock.reserved).  This patch utilizes it to cover all unavailable
      ranges by putting them into memblock.reserved.
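      
      The gist of the fix, sketched from the description above (illustrative;
      the function name carries a _sketch suffix to make clear this is not
      the literal e820__memblock_setup() diff):
      
        static void __init e820__memblock_setup_sketch(void)
        {
                int i;
        
                for (i = 0; i < e820_table->nr_entries; i++) {
                        struct e820_entry *entry = &e820_table->entries[i];
        
                        if (entry->type == E820_TYPE_RAM ||
                            entry->type == E820_TYPE_RESERVED_KERN) {
                                memblock_add(entry->addr, entry->size);
                                continue;
                        }
        
                        /*
                         * New: reserve the non-RAM hole so zero_resv_unavail()
                         * later initializes the struct pages covering it.
                         */
                        memblock_reserve(entry->addr, entry->size);
                }
        }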
      
      Link: http://lkml.kernel.org/r/20180615072947.GB23273@hori1.linux.bs1.fc.nec.co.jp
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: "Herton R. Krzesinski" <herton@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      124049de
    • arm64: dts: hikey960: Define wl1837 power capabilities · a30449eb
      Committed by oscardagrach
      These properties are required for compatibility with runtime PM.
      Without them, the MMC host controller will not be aware of the
      device's power capabilities, and when the wlcore driver attempts
      to power on the device it erroneously fails with -EACCES. This
      fixes a regression reported here: https://lkml.org/lkml/2018/6/12/930
      
      Fixes: 60f36637 ("wlcore: sdio: allow pm to handle sdio power")
      Signed-off-by: Ryan Grachek <ryan@edited.us>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
      a30449eb
    • arm64: dts: hikey: Define wl1835 power capabilities · f904390a
      Committed by oscardagrach
      These properties are required for compatibility with runtime PM.
      Without them, the MMC host controller will not be aware of the
      device's power capabilities, and when the wlcore driver attempts
      to power on the device it erroneously fails with -EACCES.
      
      Fixes: 60f36637 ("wlcore: sdio: allow pm to handle sdio power")
      Signed-off-by: Ryan Grachek <ryan@edited.us>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
      f904390a