1. 20 Dec 2021, 2 commits
  2. 17 Dec 2021, 1 commit
  3. 08 Dec 2021, 1 commit
  4. 06 Dec 2021, 1 commit
  5. 01 Dec 2021, 1 commit
  6. 25 Nov 2021, 1 commit
  7. 24 Nov 2021, 1 commit
    • arm64: uaccess: avoid blocking within critical sections · 94902d84
      Mark Rutland authored
      As Vincent reports in:
      
        https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      
      The put_user() in schedule_tail() can get stuck in a livelock, similar
      to a problem recently fixed on riscv in commit:
      
        285a76bb ("riscv: evaluate put_user() arg before enabling user access")
      
      In __raw_put_user() we have a critical section between
      uaccess_ttbr0_enable() and uaccess_ttbr0_disable() where we cannot
      safely call into the scheduler without having taken an exception, as
      schedule() and other scheduling functions will not save/restore the
      TTBR0 state. If either of the `x` or `ptr` arguments to __raw_put_user()
      contains a blocking call, we may call into the scheduler within the
      critical section. This can result in two problems:
      
      1) The access within the critical section will occur without the
         required TTBR0 tables installed. This will fault, and where the
         required tables permit access, the access will be retried without the
         required tables, resulting in a livelock.
      
      2) When TTBR0 SW PAN is in use, check_and_switch_context() does not
         modify TTBR0, leaving a stale value installed. The mappings of the
         blocked task will erroneously be accessible to regular accesses in
         the context of the new task. Additionally, if the tables are
         subsequently freed, local TLB maintenance required to reuse the ASID
         may be lost, potentially resulting in TLB corruption (e.g. in the
         presence of CnP).
      
      The same issue exists for __raw_get_user() in the critical section
      between uaccess_ttbr0_enable() and uaccess_ttbr0_disable().
      
      A similar issue exists for __get_kernel_nofault() and
      __put_kernel_nofault() for the critical section between
      __uaccess_enable_tco_async() and __uaccess_disable_tco_async(), as the
      TCO state is not context-switched by direct calls into the scheduler.
      Here the TCO state may be lost from the context of the current task,
      resulting in unexpected asynchronous tag check faults. It may also be
      leaked to another task, suppressing expected tag check faults.
      
      To fix all of these cases, we must ensure that we do not directly call
      into the scheduler in their respective critical sections. This patch
      reworks __raw_put_user(), __raw_get_user(), __get_kernel_nofault(), and
      __put_kernel_nofault(), ensuring that parameters are evaluated outside
      of the critical sections. To make this requirement clear, comments are
      added describing the problem, and line spaces added to separate the
      critical sections from other portions of the macros.
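
      As a rough sketch of the reworked shape (simplified here; helper names
      such as __raw_put_mem() follow the existing arm64 uaccess macros, and
      the real patch treats __raw_get_user() and the *_kernel_nofault()
      helpers in the same way):

          #define __raw_put_user(x, ptr, err)                             \
          do {                                                            \
                  /* Evaluate 'ptr' and 'x', which may block, up front */ \
                  __typeof__(*(ptr)) __user *__rpu_ptr = (ptr);           \
                  __typeof__(*(ptr)) __rpu_val = (x);                     \
                  __chk_user_ptr(__rpu_ptr);                              \
                                                                          \
                  /* ...so nothing in the uaccess window can schedule */  \
                  uaccess_ttbr0_enable();                                 \
                  __raw_put_mem("sttr", __rpu_val, __rpu_ptr, err);       \
                  uaccess_ttbr0_disable();                                \
          } while (0)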
      
      For __raw_get_user() and __raw_put_user() the `err` parameter is
      conditionally assigned to, and we must currently evaluate this in the
      critical section. This behaviour is relied upon by the signal code,
      which uses chains of put_user_error() and get_user_error(), checking the
      return value at the end. In all cases, the `err` parameter is a plain
      int rather than a more complex expression with a blocking call, so this
      is safe.
      
      In future we should try to clean up the `err` usage to remove the
      potential for this to be a problem.
      
      Aside from the changes to time of evaluation, there should be no
      functional change as a result of this patch.
      Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Link: https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      Fixes: f253d827 ("arm64: uaccess: refactor __{get,put}_user")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20211122125820.55286-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  8. 18 Nov 2021, 1 commit
  9. 16 Nov 2021, 2 commits
    • arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd · d3eb70ea
      Pingfan Liu authored
      trans_pgd_create_copy() can hit "VM_BUG_ON(mm != &init_mm)" in the
      function pmd_populate_kernel().
      
      This is the combined consequence of commit 5de59884 ("arm64:
      trans_pgd: pass NULL instead of init_mm to *_populate functions"), which
      replaced &init_mm with NULL, and commit 59511cfd ("arm64: mm: use XN
      table mapping attributes for user/kernel mappings"), which introduced
      the VM_BUG_ON.
      
      Since the former sounds reasonable, it is better to work on the latter.
      From the perspective of trans_pgd, two groups of functions are
      considered in the latter commit:
      
        pmd_populate_kernel()
          mm == NULL should be fixed, else it hits VM_BUG_ON()
        p?d_populate()
          mm == NULL means PXN, that is OK, since trans_pgd only copies a
          linear map, no execution will happen on the map.
      
      So it is good enough to just relax VM_BUG_ON() to disregard mm == NULL.
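
      As a rough sketch of that relaxation (the exact guard in the patch may
      differ slightly), the arm64 pmd_populate_kernel() helper would tolerate
      a NULL mm while still catching any other non-init_mm caller:

          static inline void
          pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
          {
                  /* NULL mm (e.g. trans_pgd) is allowed; user mms are not. */
                  VM_BUG_ON(mm && mm != &init_mm);
                  __pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
          }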
      
      Fixes: 59511cfd ("arm64: mm: use XN table mapping attributes for user/kernel mappings")
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Cc: <stable@vger.kernel.org> # 5.13.x
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Matthias Brugger <mbrugger@suse.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Link: https://lore.kernel.org/r/20211112052214.9086-1-kernelfans@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR · c6d3cd32
      Mark Rutland authored
      When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
      tracer is in use, unwind_frame() may erroneously associate a traced
      function with an incorrect return address. This can happen when starting
      an unwind from a pt_regs, or when unwinding across an exception
      boundary.
      
      This can be seen when recording with perf while the function graph
      tracer is in use. For example:
      
      | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
      | # perf record -g -e raw_syscalls:sys_enter:k /bin/true
      | # perf report
      
      ... reports the callchain erroneously as:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc_common.constprop.0
      | perf_callchain
      | get_perf_callchain
      | syscall_trace_enter
      | syscall_trace_enter
      
      ... whereas when the function graph tracer is not in use, it reports:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc
      | do_el0_svc
      | el0_svc_common.constprop.0
      | syscall_trace_enter
      | syscall_trace_enter
      
      The underlying problem is that ftrace_graph_get_ret_stack() takes an
      index offset from the most recent entry added to the fgraph return
      stack. We start an unwind at offset 0, and increment the offset each
      time we encounter a rewritten return address (i.e. when we see
      `return_to_handler`). This is broken in two cases:
      
      1) Between creating a pt_regs and starting the unwind, function calls
         may place entries on the stack, leaving an arbitrary offset which we
         can only determine by performing a full unwind from the caller of the
         unwind code (and relying on none of the unwind code being
         instrumented).
      
         This can result in erroneous entries being reported in a backtrace
         recorded by perf or kfence when the function graph tracer is in use.
         Currently show_regs() is unaffected as dump_backtrace() performs an
         initial unwind.
      
      2) When unwinding across an exception boundary (whether continuing an
         unwind or starting a new unwind from regs), we currently always skip
         the LR of the interrupted context. Where this was live and contained
         a rewritten address, we won't consume the corresponding fgraph ret
         stack entry, leaving subsequent entries off-by-one.
      
         This can result in erroneous entries being reported in a backtrace
         performed by any in-kernel unwinder when that backtrace crosses an
         exception boundary, with entries after the boundary being reported
         incorrectly. This includes perf, kfence, show_regs(), panic(), etc.
      
      To fix this, we need to be able to uniquely identify each rewritten
      return address such that we can map this back to the original return
      address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
      each rewritten return address with a unique location on the stack. As
      the return address is passed in the LR (and so is not guaranteed a
      unique location in memory), we use the FP upon entry to the function
      (i.e. the address of the caller's frame record) as the return address
      pointer. Any nested call will have a different FP value as the caller
      must create its own frame record and update FP to point to this.
      
      Since ftrace_graph_ret_addr() requires the return address with the PAC
      stripped, the stripping of the PAC is moved before the fixup of the
      rewritten address. As we would unconditionally strip the PAC, moving
      this earlier is not harmful, and we can avoid a redundant strip in the
      return address fixup code.
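
      For illustration, a minimal sketch of the resulting unwind-side fixup
      (field and symbol names follow the arm64 unwinder; the exact code in
      the patch may differ):

          frame->pc = ptrauth_strip_insn_pac(frame->pc);

          #ifdef CONFIG_FUNCTION_GRAPH_TRACER
                  if (tsk->ret_stack &&
                      (frame->pc == (unsigned long)return_to_handler)) {
                          unsigned long orig_pc;

                          /* The caller's FP (the address of its frame record)
                           * uniquely identifies this fgraph ret stack entry. */
                          orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
                                                          (void *)frame->fp);
                          if (WARN_ON_ONCE(frame->pc == orig_pc))
                                  return -EINVAL;
                          frame->pc = orig_pc;
                  }
          #endif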
      
      I've tested this with the perf case above, the ftrace selftests, and
      a number of ad-hoc unwinder tests. The tests all pass, and I have seen
      no unexpected behaviour as a result of this change. I've tested with
      pointer authentication under QEMU TCG where magic-sysrq+l correctly
      recovers the original return addresses.
      
      Note that this doesn't fix the issue of skipping a live LR at an
      exception boundary, which is a more general problem and requires more
      substantial rework. Were we to consume the LR in all cases this would
      result in warnings where the interrupted context's LR contains
      `return_to_handler`, but the FP has been altered, e.g.
      
      | func:
      |	<--- ftrace entry ---> 	// logs FP & LR, rewrites LR
      | 	STP	FP, LR, [SP, #-16]!
      | 	MOV	FP, SP
      | 	<--- INTERRUPT --->
      
      ... as ftrace_graph_get_ret_stack() will not find a matching entry,
      triggering the WARN_ON_ONCE() in unwind_frame().
      
      Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
      Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  10. 12 Nov 2021, 1 commit
  11. 09 Nov 2021, 1 commit
    • KVM: arm64: Fix host stage-2 finalization · 50a8d331
      Quentin Perret authored
      We currently walk the hypervisor stage-1 page-table towards the end of
      hyp init in nVHE protected mode and adjust the host page ownership
      attributes in its stage-2 in order to get a consistent state from both
      points of view. The walk is done on the entire hyp VA space, and expects
      to only ever find page-level mappings. While this expectation is
      reasonable in the half of hyp VA space that maps memory with a fixed
      offset (see the loop in pkvm_create_mappings_locked()), it can be
      incorrect in the other half where nothing prevents the usage of block
      mappings. For instance, on systems where memory is physically aligned at
      an address that happens to map to a PMD-aligned VA in the hyp_vmemmap,
      kvm_pgtable_hyp_map() will install block mappings when backing the
      hyp_vmemmap, which will later cause finalize_host_mappings() to fail.
      Furthermore, it should be noted that all pages backing the hyp_vmemmap
      are also mapped in the 'fixed offset range' of the hypervisor, which
      implies that finalize_host_mappings() will walk both aliases and update
      the host stage-2 attributes twice. The order in which this happens is
      unpredictable, though, since the hyp VA layout is highly dependent on
      the position of the idmap page, hence resulting in a fragile mess at
      best.
      
      In order to fix all of this, let's restrict the finalization walk to
      only cover memory regions in the 'fixed-offset range' of the hyp VA
      space and nothing else. This not only fixes a correctness issue, but
      will also result in a slightly faster hyp initialization overall.
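
      As a sketch of the reworked walk (assuming the usual pKVM symbols such
      as hyp_memory[], hyp_memblock_nr, hyp_phys_to_virt() and pkvm_pgtable;
      details in the actual patch may differ), finalization iterates over the
      fixed-offset aliases of the memory regions instead of the whole hyp VA
      space:

          static int finalize_host_mappings(void)
          {
                  struct kvm_pgtable_walker walker = {
                          .cb     = finalize_host_mappings_walker,
                          .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
                  };
                  int i, ret;

                  /* Only walk the fixed-offset alias of each memory region. */
                  for (i = 0; i < hyp_memblock_nr; i++) {
                          struct memblock_region *reg = &hyp_memory[i];
                          u64 start = (u64)hyp_phys_to_virt(reg->base);

                          ret = kvm_pgtable_walk(&pkvm_pgtable, start, reg->size, &walker);
                          if (ret)
                                  return ret;
                  }

                  return 0;
          }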
      
      Fixes: 2c50166c ("KVM: arm64: Mark host bss and rodata section as shared")
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211108154636.393384-1-qperret@google.com
  12. 08 Nov 2021, 7 commits
  13. 07 Nov 2021, 3 commits
  14. 05 Nov 2021, 1 commit
  15. 02 Nov 2021, 1 commit
  16. 28 Oct 2021, 3 commits
  17. 27 Oct 2021, 5 commits
  18. 26 Oct 2021, 7 commits