1. 01 7月, 2016 2 次提交
  2. 28 6月, 2016 2 次提交
  3. 22 6月, 2016 2 次提交
    • M
      arm64: kill ESR_LNX_EXEC · 541ec870
      Mark Rutland 提交于
      Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing
      instruction aborts from data aborts, but this bit is architecturally
      RES0 for instruction aborts, and could be allocated for an arbitrary
      purpose in future. Additionally, we hard-code the value in entry.S
      without the mnemonic, making the code difficult to understand.
      
      Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr,
      which we already pass to the sole use of ESR_LNX_EXEC. A new helper,
      is_el0_instruction_abort() is added to make the logic clear. Any
      instruction aborts taken from EL1 will already have been handled by
      bad_mode, so we need not handle that case in the helper.
      
      For consistency, the existing permission_fault helper is renamed to
      is_permission_fault, and the return type is changed to bool. There
      should be no functional changes as the return value was a boolean
      expression, and the result is only used in another boolean expression.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Dave P Martin <dave.martin@arm.com>
      Cc: Huang Shijie <shijie.huang@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      541ec870
    • M
      arm64: add macro to extract ESR_ELx.EC · 275f344b
      Mark Rutland 提交于
      Several places open-code extraction of the EC field from an ESR_ELx
      value, in subtly different ways. This is unfortunate duplication and
      variation, and the precise logic used to extract the field is a
      distraction.
      
      This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from
      an ESR_ELx value in a consistent fashion.
      
      Existing open-coded extractions in core arm64 code are moved over to the
      new helper. KVM code is left as-is for the moment.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NHuang Shijie <shijie.huang@arm.com>
      Cc: Dave P Martin <dave.martin@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      275f344b
  4. 21 6月, 2016 2 次提交
  5. 14 6月, 2016 1 次提交
    • M
      arm64: mm: mark fault_info table const · bbb1681e
      Mark Rutland 提交于
      Unlike the debug_fault_info table, we never intentionally alter the
      fault_info table at runtime, and all derived pointers are treated as
      const currently.
      
      Make the table const so that it can be placed in .rodata and protected
      from unintentional writes, as we do for the syscall tables.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      bbb1681e
  6. 08 6月, 2016 1 次提交
    • W
      arm64: mm: always take dirty state from new pte in ptep_set_access_flags · 0106d456
      Will Deacon 提交于
      Commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
      hardware AF/DBM") ensured that pte flags are updated atomically in the
      face of potential concurrent, hardware-assisted updates. However, Alex
      reports that:
      
       | This patch breaks swapping for me.
       | In the broken case, you'll see either systemd cpu time spike (because
       | it's stuck in a page fault loop) or the system hang (because the
       | application owning the screen is stuck in a page fault loop).
      
      It turns out that this is because the 'dirty' argument to
      ptep_set_access_flags is always 0 for read faults, and so we can't use
      it to set PTE_RDONLY. The failing sequence is:
      
        1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
        2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
        3. A read faults due to the missing access flag
        4. ptep_set_access_flags is called with dirty = 0, due to the read fault
        5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
        6. A write faults, but pte_write is true so we get stuck
      
      The solution is to check the new page table entry (as would be done by
      the generic, non-atomic definition of ptep_set_access_flags that just
      calls set_pte_at) to establish the dirty state.
      
      Cc: <stable@vger.kernel.org> # 4.3+
      Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NAlexander Graf <agraf@suse.de>
      Tested-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      0106d456
  7. 03 6月, 2016 1 次提交
    • M
      arm64: mm: dump: log span level · 48dd73c5
      Mark Rutland 提交于
      The page table dump code logs spans of entries at the same level
      (pgd/pud/pmd/pte) which have the same attributes. While we log the
      (decoded) attributes, we don't log the level, which leaves the output
      ambiguous and/or confusing in some cases.
      
      For example:
      
      0xffff800800000000-0xffff800980000000           6G       RW NX SHD AF        BLK UXN MEM/NORMAL
      
      If using 4K pages, this may describe a span of 6 1G block entries at the
      PGD/PUD level, or 3072 2M block entries at the PMD level.
      
      This patch adds the page table level to each output line, removing this
      ambiguity. For the example above, this will produce:
      
      0xffffffc800000000-0xffffffc980000000           6G PUD       RW NX SHD AF        BLK UXN MEM/NORMAL
      
      When 3 level tables are in use, and we use the asm-generic/nopud.h
      definitions, the dump code treats each entry in the PGD as a 1 element
      table at the PUD level, and logs spans as being PUDs, which can be
      confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
      when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
      CONFIG_PGTABLE_LEVELS <= 2.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Huang Shijie <shijie.huang@arm.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      48dd73c5
  8. 31 5月, 2016 1 次提交
  9. 20 5月, 2016 1 次提交
  10. 09 5月, 2016 2 次提交
  11. 05 5月, 2016 1 次提交
  12. 28 4月, 2016 2 次提交
  13. 25 4月, 2016 2 次提交
  14. 22 4月, 2016 2 次提交
  15. 20 4月, 2016 2 次提交
  16. 19 4月, 2016 1 次提交
  17. 16 4月, 2016 4 次提交
    • C
      arm64: Implement ptep_set_access_flags() for hardware AF/DBM · 66dbd6e6
      Catalin Marinas 提交于
      When hardware updates of the access and dirty states are enabled, the
      default ptep_set_access_flags() implementation based on calling
      set_pte_at() directly is potentially racy. This triggers the "racy dirty
      state clearing" warning in set_pte_at() because an existing writable PTE
      is overridden with a clean entry.
      
      There are two main scenarios for this situation:
      
      1. The CPU getting an access fault does not support hardware updates of
         the access/dirty flags. However, a different agent in the system
         (e.g. SMMU) can do this, therefore overriding a writable entry with a
         clean one could potentially lose the automatically updated dirty
         status
      
      2. A more complex situation is possible when all CPUs support hardware
         AF/DBM:
      
         a) Initial state: shareable + writable vma and pte_none(pte)
         b) Read fault taken by two threads of the same process on different
            CPUs
         c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
            eventually reaches do_set_pte() which sets a writable + clean pte.
            CPU0 releases the mmap_sem
         d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
            pte entry it reads is present, writable and clean and it continues
            to pte_mkyoung()
         e) CPU1 calls ptep_set_access_flags()
      
         If between (d) and (e) the hardware (another CPU) updates the dirty
         state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
         marking the entry clean again.
      
      This patch implements an arm64-specific ptep_set_access_flags() function
      to perform an atomic update of the PTE flags.
      
      Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NMing Lei <tom.leiming@gmail.com>
      Tested-by: NJulien Grall <julien.grall@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org> # 4.3+
      [will: reworded comment]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      66dbd6e6
    • G
      arm64, numa: Add NUMA support for arm64 platforms. · 1a2db300
      Ganapatrao Kulkarni 提交于
      Attempt to get the memory and CPU NUMA node via of_numa.  If that
      fails, default the dummy NUMA node and map all memory and CPUs to node
      0.
      Tested-by: NShannon Zhao <shannon.zhao@linaro.org>
      Reviewed-by: NRobert Richter <rrichter@cavium.com>
      Signed-off-by: NGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      1a2db300
    • D
      arm64: Move unflatten_device_tree() call earlier. · 3194ac6e
      David Daney 提交于
      In order to extract NUMA information from the device tree, we need to
      have the tree in its unflattened form.
      
      Move the call to bootmem_init() in the tail of paging_init() into
      setup_arch, and adjust header files so that its declaration is
      visible.
      
      Move the unflatten_device_tree() call between the calls to
      paging_init() and bootmem_init().  Follow on patches add NUMA handling
      to bootmem_init().
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3194ac6e
    • S
      arm64: Add cpu_panic_kernel helper · 17eebd1a
      Suzuki K Poulose 提交于
      During the activation of a secondary CPU, we could report serious
      configuration issues and hence request to crash the kernel. We do
      this for CPU ASID bit check now. We will need it also for handling
      mismatched exception levels for the CPUs with VHE. Hence, add a
      helper to do the same for reusability.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      17eebd1a
  18. 15 4月, 2016 3 次提交
    • J
      arm64: mm: Add trace_irqflags annotations to do_debug_exception() · 6afedcd2
      James Morse 提交于
      With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS
      enabled, lockdep will compare current->hardirqs_enabled with the flags from
      local_irq_save().
      
      When a debug exception occurs, interrupts are disabled in entry.S, but
      lockdep isn't told, resulting in:
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      ------------[ cut here ]------------
      WARNING: at ../kernel/locking/lockdep.c:3523
      Modules linked in:
      CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204
      Hardware name: ARM Juno development board (r1) (DT)
      task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000
      PC is at check_flags.part.35+0x17c/0x184
      LR is at check_flags.part.35+0x17c/0x184
      pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5
      [...]
      ---[ end trace 74631f9305ef5020 ]---
      Call trace:
      [<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184
      [<ffffff80080ffe30>] lock_acquire+0xa8/0xc4
      [<ffffff8008093038>] breakpoint_handler+0x118/0x288
      [<ffffff8008082434>] do_debug_exception+0x3c/0xa8
      [<ffffff80080854b4>] el1_dbg+0x18/0x6c
      [<ffffff80081e82f4>] do_filp_open+0x64/0xdc
      [<ffffff80081d6e60>] do_sys_open+0x140/0x204
      [<ffffff80081d6f58>] SyS_openat+0x10/0x18
      [<ffffff8008085d30>] el0_svc_naked+0x24/0x28
      possible reason: unannotated irqs-off.
      irq event stamp: 65857
      hardirqs last  enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4
      hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4
      softirqs last  enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290
      softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0
      
      This patch adds the annotations to do_debug_exception(), while trying not
      to call trace_hardirqs_off() if el1_dbg() interrupted a task that already
      had irqs disabled.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6afedcd2
    • A
      arm64: cover the .head.text section in the .text segment mapping · 7eb90f2f
      Ard Biesheuvel 提交于
      Keeping .head.text out of the .text mapping buys us very little: its actual
      payload is only 4 KB, most of which is padding, but the page alignment may
      add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional
      padding to the uncompressed kernel Image.
      
      Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to
      map the adjacent 56 KB of code without the PTE_CONT attribute, and since
      this region contains things like the vector table and the GIC interrupt
      handling entry point, this region is likely to benefit from the reduced TLB
      pressure that results from PTE_CONT mappings.
      
      So remove the alignment between the .head.text and .text sections, and use
      the [_text, _etext) rather than the [_stext, _etext) interval for mapping
      the .text segment.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7eb90f2f
    • A
      arm64: use 'segment' rather than 'chunk' to describe mapped kernel regions · 2c09ec06
      Ard Biesheuvel 提交于
      Replace the poorly defined term chunk with segment, which is a term that is
      already used by the ELF spec to describe contiguous mappings with the same
      permission attributes of statically allocated ranges of an executable.
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      2c09ec06
  19. 14 4月, 2016 4 次提交
    • A
      arm64: mm: move vmemmap region right below the linear region · 3e1907d5
      Ard Biesheuvel 提交于
      This moves the vmemmap region right below PAGE_OFFSET, aka the start
      of the linear region, and redefines its size to be a power of two.
      Due to the placement of PAGE_OFFSET in the middle of the address space,
      whose size is a power of two as well, this guarantees that virt to
      page conversions and vice versa can be implemented efficiently, by
      masking and shifting rather than ordinary arithmetic.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3e1907d5
    • A
      arm64: mm: free __init memory via the linear mapping · d386825c
      Ard Biesheuvel 提交于
      The implementation of free_initmem_default() expects __init_begin
      and __init_end to be covered by the linear mapping, which is no
      longer the case. So open code it instead, using addresses that are
      explicitly translated from kernel virtual to linear virtual.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d386825c
    • A
      arm64: add the initrd region to the linear mapping explicitly · 177e15f0
      Ard Biesheuvel 提交于
      Instead of going out of our way to relocate the initrd if it turns out
      to occupy memory that is not covered by the linear mapping, just add the
      initrd to the linear mapping. This puts the burden on the bootloader to
      pass initrd= and mem= options that are mutually consistent.
      
      Note that, since the placement of the linear region in the PA space is
      also dependent on the placement of the kernel Image, which may reside
      anywhere in memory, we may still end up with a situation where the initrd
      and the kernel Image are simply too far apart to be covered by the linear
      region.
      
      Since we now leave it up to the bootloader to pass the initrd in memory
      that is guaranteed to be accessible by the kernel, add a mention of this to
      the arm64 boot protocol specification as well.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      177e15f0
    • A
      arm64/mm: ensure memstart_addr remains sufficiently aligned · 2958987f
      Ard Biesheuvel 提交于
      After choosing memstart_addr to be the highest multiple of
      ARM64_MEMSTART_ALIGN less than or equal to the first usable physical memory
      address, we clip the memblocks to the maximum size of the linear region.
      Since the kernel may be high up in memory, we take care not to clip the
      kernel itself, which means we have to clip some memory from the bottom if
      this occurs, to ensure that the distance between the first and the last
      usable physical memory address can be covered by the linear region.
      
      However, we fail to update memstart_addr if this clipping from the bottom
      occurs, which means that we may still end up with virtual addresses that
      wrap into the userland range. So increment memstart_addr as appropriate to
      prevent this from happening.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      2958987f
  20. 25 3月, 2016 2 次提交
    • M
      arm64: mm: allow preemption in copy_to_user_page · 691b1e2e
      Mark Rutland 提交于
      Currently we disable preemption in copy_to_user_page; a behaviour that
      we inherited from the 32-bit arm code. This was necessary for older
      cores without broadcast data cache maintenance, and ensured that cache
      lines were dirtied and cleaned by the same CPU. On these systems dirty
      cache line migration was not possible, so this was sufficient to
      guarantee coherency.
      
      On contemporary systems, cache coherence protocols permit (dirty) cache
      lines to migrate between CPUs as a result of speculation, prefetching,
      and other behaviours. To account for this, in ARMv8 data cache
      maintenance operations are broadcast and affect all data caches in the
      domain associated with the VA (i.e. ISH for kernel and user mappings).
      
      In __switch_to we ensure that tasks can be safely migrated in the middle
      of a maintenance sequence, using a dsb(ish) to ensure prior explicit
      memory accesses are observed and cache maintenance operations are
      completed before a task can be run on another CPU.
      
      Given the above, it is not necessary to disable preemption in
      copy_to_user_page. This patch removes the preempt_{disable,enable}
      calls, permitting preemption.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      691b1e2e
    • M
      arm64: consistently use p?d_set_huge · c661cb1c
      Mark Rutland 提交于
      Commit 324420bf ("arm64: add support for ioremap() block
      mappings") added new p?d_set_huge functions which do the hard work to
      generate and set a correct block entry.
      
      These differ from open-coded huge page creation in the early page table
      code by explicitly setting the P?D_TYPE_SECT bits (which are implicitly
      retained by mk_sect_prot() for any valid prot), but are otherwise
      identical (and cannot fail on arm64).
      
      For simplicity and consistency, make use of these in the initial page
      table creation code.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c661cb1c
  21. 21 3月, 2016 1 次提交
  22. 18 3月, 2016 1 次提交