1. 12 May 2014, 1 commit
    • arm64: debug: avoid accessing mdscr_el1 on fault paths where possible · 2a283070
      Authored by Will Deacon
      Since mdscr_el1 is part of the debug register group, it is highly likely
      to be trapped by a hypervisor to prevent virtual machines from debugging
      (buggering?) each other. Unfortunately, this absolutely destroys our
      performance, since we access the register on many of our low-level
      fault handling paths to keep track of the various debug state machines.
      
      This patch removes our dependency on mdscr_el1 in the case that debugging
      is not being used. More specifically we:
      
        - Use TIF_SINGLESTEP to indicate that a task is stepping at EL0 and
          avoid disabling step in the MDSCR when we don't need to.
          MDSCR_EL1.SS handling is moved to kernel_entry, when trapping from
          userspace.
      
        - Ensure debug exceptions are re-enabled on *all* exception entry
          paths, even the debug exception handling path (where we re-enable
          exceptions after invoking the handler). Since we can now rely on
          MDSCR_EL1.SS being cleared by the entry code, exception handlers can
          usually enable debug immediately before enabling interrupts.
      
        - Remove all debug exception unmasking from ret_to_user and
          el1_preempt, since we will never get here with debug exceptions
          masked.
      
      This results in a slight change to kernel debug behaviour, where we now
      step into interrupt handlers and data aborts from EL1 when debugging the
      kernel, which is actually a useful thing to do. A side-effect of this is
      that it *does* potentially prevent stepping off {break,watch}points when
      there is a high-frequency interrupt source (e.g. a timer), so a debugger
      would need to use either breakpoints or manually disable interrupts to
      get around this issue.
      
      With this patch applied, guest performance is restored under KVM when
      debug register accesses are trapped (and we get a measurable performance
      increase on the host on Cortex-A57 too).
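      As a rough sketch of the TIF_SINGLESTEP gating described above (illustrative
      only, not the patch itself; mdscr_read()/mdscr_write() are assumed to be the
      usual thin wrappers around the MDSCR_EL1 register accesses):

      	#include <linux/sched.h>
      	#include <linux/thread_info.h>

      	/* Assumed helpers wrapping mrs/msr on MDSCR_EL1. */
      	extern unsigned long mdscr_read(void);
      	extern void mdscr_write(unsigned long mdscr);

      	#define DBG_MDSCR_SS	(1 << 0)	/* software step enable */

      	/*
      	 * Only touch MDSCR_EL1 (and thus only risk a hypervisor trap)
      	 * when the task is actually single-stepping.
      	 */
      	static void clear_step_if_stepping(struct task_struct *task)
      	{
      		if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
      			mdscr_write(mdscr_read() & ~DBG_MDSCR_SS);
      	}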
      
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  2. 10 May 2014, 2 commits
  3. 09 May 2014, 6 commits
  4. 04 May 2014, 2 commits
    • arm64: Use bus notifiers to set per-device coherent DMA ops · 6ecba8eb
      Authored by Catalin Marinas
      Recently, the default DMA ops have been changed to non-coherent for
      alignment with 32-bit ARM platforms (and DT files). This patch adds bus
      notifiers to be able to set the coherent DMA ops (with no cache
      maintenance) for devices explicitly marked as coherent via the
      "dma-coherent" DT property.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
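      For illustration, the notifier described above boils down to roughly the
      following (set_dma_ops() and the coherent_swiotlb_dma_ops table name are
      assumptions for the sketch, not quoted from the patch):

      	#include <linux/device.h>
      	#include <linux/dma-mapping.h>
      	#include <linux/init.h>
      	#include <linux/notifier.h>
      	#include <linux/of.h>
      	#include <linux/platform_device.h>

      	extern struct dma_map_ops coherent_swiotlb_dma_ops;	/* assumed name */

      	static int dma_bus_notifier(struct notifier_block *nb,
      				    unsigned long event, void *_dev)
      	{
      		struct device *dev = _dev;

      		if (event != BUS_NOTIFY_ADD_DEVICE)
      			return NOTIFY_DONE;

      		/* Devices marked "dma-coherent" in the DT skip cache maintenance. */
      		if (of_property_read_bool(dev->of_node, "dma-coherent"))
      			set_dma_ops(dev, &coherent_swiotlb_dma_ops);

      		return NOTIFY_OK;
      	}

      	static struct notifier_block platform_bus_nb = {
      		.notifier_call = dma_bus_notifier,
      	};

      	static int __init coherent_dma_notifier_init(void)
      	{
      		return bus_register_notifier(&platform_bus_type, &platform_bus_nb);
      	}
      	postcore_initcall(coherent_dma_notifier_init);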
    • arm64: fixmap: fix missing sub-page offset for earlyprintk · f774b7d1
      Authored by Marc Zyngier
      Commit d57c33c5 (add generic fixmap.h) added (among other
      similar things) set_fixmap_io to deal with early ioremap of devices.
      
      More recently, commit bf4b558e (arm64: add early_ioremap support)
      converted the arm64 earlyprintk to use set_fixmap_io. A side effect of
      this conversion is that my virtual machines have stopped booting when
      I pass "earlyprintk=uart8250-8bit,0x3f8" to the guest kernel.
      
      Turns out that the new earlyprintk code doesn't care at all about
      sub-page offsets, and just assumes that the earlyprintk device will
      be page-aligned. Obviously, that doesn't play well with the above example.
      
      Further investigation shows that set_fixmap_io uses __set_fixmap instead
      of __set_fixmap_offset. A fix is to introduce a set_fixmap_offset_io that
      uses the latter, and to remove the superfluous call to fix_to_virt
      (which only returns the value that set_fixmap_io has already given us).
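      The shape of that fix is roughly the following, mirroring how the asm-generic
      fixmap helpers referenced above compose (FIX_EARLYCON_MEM_BASE is the assumed
      earlyprintk fixmap slot):

      	#include <asm/fixmap.h>		/* __set_fixmap_offset, FIXMAP_PAGE_IO */
      	#include <linux/types.h>

      	/* Offset-preserving variant of set_fixmap_io(). */
      	#define set_fixmap_offset_io(idx, phys) \
      		__set_fixmap_offset(idx, phys, FIXMAP_PAGE_IO)

      	/* Callers then keep the sub-page bits of the physical address. */
      	static void __iomem *early_map_uart(phys_addr_t paddr)
      	{
      		return (void __iomem *)set_fixmap_offset_io(FIX_EARLYCON_MEM_BASE,
      							    paddr);
      	}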
      
      With this applied, my VMs are back in business. Tested on a Cortex-A57
      platform with kvmtool as platform emulation.
      
      Cc: Will Deacon <will.deacon@arm.com>
      Acked-by: Mark Salter <msalter@redhat.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  5. 26 Apr 2014, 1 commit
  6. 25 Apr 2014, 1 commit
  7. 08 Apr 2014, 2 commits
  8. 07 Apr 2014, 1 commit
    • arm64: fix !CONFIG_COMPAT build failures · ff268ff7
      Authored by Mark Salter
      Recent arm64 builds using CONFIG_ARM64_64K_PAGES are failing with:
      
        arch/arm64/kernel/perf_regs.c: In function ‘perf_reg_abi’:
        arch/arm64/kernel/perf_regs.c:41:2: error: implicit declaration of function ‘is_compat_thread’
      
        arch/arm64/kernel/perf_event.c:1398:2: error: unknown type name ‘compat_uptr_t’
      
      This is due to some recent arm64 perf commits with compat support:
      
        commit 23c7d70d:
          ARM64: perf: add support for frame pointer unwinding in compat mode
      
        commit 2ee0d7fd:
          ARM64: perf: add support for perf registers API
      
      Those patches make the arm64 kernel unbuildable if CONFIG_COMPAT is not
      defined and CONFIG_ARM64_64K_PAGES depends on !CONFIG_COMPAT. This patch
      allows the arm64 kernel to build with and without CONFIG_COMPAT.
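      One way to keep such code building either way (illustrative only; the actual
      patch may structure the guards differently) is to confine the compat-specific
      path to an #ifdef block:

      	#include <linux/compat.h>
      	#include <linux/perf_event.h>
      	#include <linux/sched.h>

      	u64 perf_reg_abi(struct task_struct *task)
      	{
      	#ifdef CONFIG_COMPAT
      		/* is_compat_thread() only exists when COMPAT is built in. */
      		if (is_compat_thread(task_thread_info(task)))
      			return PERF_SAMPLE_REGS_ABI_32;
      	#endif
      		return PERF_SAMPLE_REGS_ABI_64;
      	}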
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  9. 05 Apr 2014, 1 commit
  10. 20 Mar 2014, 2 commits
    • arm64, debug-monitors: Fix CPU hotplug callback registration · 4b0b68af
      Authored by Srivatsa S. Bhat
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the debug-monitors code in arm64 by using this latter form of callback
      registration.
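      Applied to debug-monitors, the converted initialisation ends up looking
      roughly like this (the clear_os_lock()/os_lock_nb names are assumptions for
      the sketch):

      	#include <linux/cpu.h>
      	#include <linux/init.h>
      	#include <linux/smp.h>

      	extern void clear_os_lock(void *unused);	/* per-CPU init, assumed */
      	extern struct notifier_block os_lock_nb;	/* hotplug notifier, assumed */

      	static int __init debug_monitors_init(void)
      	{
      		cpu_notifier_register_begin();

      		/* Initialise the CPUs that are already online... */
      		on_each_cpu(clear_os_lock, NULL, 1);

      		/* ...then register for hotplug without a deadlock window. */
      		__register_cpu_notifier(&os_lock_nb);

      		cpu_notifier_register_done();
      		return 0;
      	}
      	postcore_initcall(debug_monitors_init);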
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • arm64, hw_breakpoint.c: Fix CPU hotplug callback registration · 3d0dc643
      Authored by Srivatsa S. Bhat
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the hw-breakpoint code in arm64 by using this latter form of callback
      registration.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. 13 Mar 2014, 4 commits
  12. 04 Mar 2014, 3 commits
  13. 28 Feb 2014, 2 commits
  14. 26 Feb 2014, 7 commits
  15. 23 Feb 2014, 1 commit
  16. 17 Feb 2014, 1 commit
    • ARM64: unwind: Fix PC calculation · e306dfd0
      Authored by Olof Johansson
      The frame PC value in the unwind code used to just take the saved LR
      value and use that.  That's incorrect as a stack trace, since it shows
      the return path stack, not the call path stack.
      
      In particular, it shows faulty information in case the bl is done as
      the very last instruction of one label, since the return point will be
      in the next label. That can easily be seen with tail calls to panic(),
      which is marked __noreturn and thus doesn't have anything useful after it.
      
      Easiest here is to just correct the unwind code and do a -4, to get the
      actual call site for the backtrace instead of the return site.
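      Concretely, the adjustment in the arm64 frame walker amounts to something
      like this (validity checks elided; the {fp, lr} layout is the standard
      AArch64 frame record):

      	#include <asm/stacktrace.h>

      	int unwind_frame(struct stackframe *frame)
      	{
      		unsigned long fp = frame->fp;

      		frame->sp = fp + 0x10;
      		frame->fp = *(unsigned long *)(fp);
      		/*
      		 * -4 so the reported PC is the bl itself rather than its
      		 * return address, which may already fall in the next function.
      		 */
      		frame->pc = *(unsigned long *)(fp + 8) - 4;

      		return 0;
      	}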
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  17. 08 Feb 2014, 1 commit
    • arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4
      Authored by Will Deacon
      Linux requires a number of atomic operations to provide full barrier
      semantics, that is, no memory accesses after the operation can be
      observed before any accesses up to and including the operation in
      program order.
      
      On arm64, these operations have been incorrectly implemented as follows:
      
      	// A, B, C are independent memory locations
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldaxr	x0, [B]		// Exclusive load with acquire
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      
      	<Access [C]>
      
      The assumption here is that two half barriers are equivalent to a
      full barrier, so the only permitted ordering would be A -> B -> C
      (where B is the atomic operation involving both a load and a store).
      
      Unfortunately, this is not the case by the letter of the architecture
      and, in fact, the accesses to A and C are permitted to pass their
      nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
      or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
      store-release on B). This is a clear violation of the full barrier
      requirement.
      
      The simple way to fix this is to implement the same algorithm as ARMv7
      using explicit barriers:
      
      	<Access [A]>
      
      	// atomic_op (B)
      	dmb	ish		// Full barrier
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stxr	w1, x0, [B]	// Exclusive store
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      but this has the undesirable effect of introducing *two* full barrier
      instructions. A better approach is actually the following, non-intuitive
      sequence:
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      The simple observations here are:
      
        - The dmb ensures that no subsequent accesses (e.g. the access to C)
          can enter or pass the atomic sequence.
      
        - The dmb also ensures that no prior accesses (e.g. the access to A)
          can pass the atomic sequence.
      
        - Therefore, no prior access can pass a subsequent access, or
          vice-versa (i.e. A is strictly ordered before C).
      
        - The stlxr ensures that no prior access can pass the store component
          of the atomic operation.
      
      The only tricky part remaining is the ordering between the ldxr and the
      access to A, since the absence of the first dmb means that we're now
      permitting re-ordering between the ldxr and any prior accesses.
      
      From an (arbitrary) observer's point of view, there are two scenarios:
      
        1. We have observed the ldxr. This means that if we perform a store to
           [B], the ldxr will still return older data. If we can observe the
           ldxr, then we can potentially observe the permitted re-ordering
           with the access to A, which is clearly an issue when compared to
           the dmb variant of the code. Thankfully, the exclusive monitor will
           save us here since it will be cleared as a result of the store and
           the ldxr will retry. Notice that any use of a later memory
           observation to imply observation of the ldxr will also imply
           observation of the access to A, since the stlxr/dmb ensure strict
           ordering.
      
        2. We have not observed the ldxr. This means we can perform a store
           and influence the later ldxr. However, that doesn't actually tell
           us anything about the access to [A], so we've not lost anything
           here either when compared to the dmb variant.
      
      This patch implements this solution for our barriered atomic operations,
      ensuring that we satisfy the full barrier requirements where they are
      needed.
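      For concreteness, a freestanding sketch of the sequence applied to one
      barriered atomic, modelled on the arm64 atomic_add_return() of that era
      (exact operand constraints in the real header may differ):

      	typedef struct { int counter; } atomic_t;

      	static inline int atomic_add_return(int i, atomic_t *v)
      	{
      		unsigned long tmp;
      		int result;

      		asm volatile("// atomic_add_return\n"
      	"1:	ldxr	%w0, %2\n"		/* exclusive load, no acquire */
      	"	add	%w0, %w0, %w3\n"
      	"	stlxr	%w1, %w0, %2\n"		/* exclusive store with release */
      	"	cbnz	%w1, 1b\n"
      	"	dmb	ish"			/* trailing full barrier */
      		: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
      		: "Ir" (i)
      		: "memory");

      		return result;
      	}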
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  18. 05 Feb 2014, 2 commits