1. 07 1月, 2016 1 次提交
    • M
      arm64: head.S: use memset to clear BSS · 2a803c4d
      Mark Rutland 提交于
      Currently we use an open-coded memzero to clear the BSS. As it is a
      trivial implementation, it is sub-optimal.
      
      Our optimised memset doesn't use the stack, is position-independent, and
      for the memzero case can use of DC ZVA to clear large blocks
      efficiently. In __mmap_switched the MMU is on and there are no live
      caller-saved registers, so we can safely call an uninstrumented memset.
      
      This patch changes __mmap_switched to use memset when clearing the BSS.
      We use the __pi_memset alias so as to avoid any instrumentation in all
      kernel configurations.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      2a803c4d
  2. 06 1月, 2016 2 次提交
  3. 05 1月, 2016 3 次提交
    • W
      arm64: mm: move pgd_cache initialisation to pgtable_cache_init · 39b5be9b
      Will Deacon 提交于
      Initialising the suppport for EFI runtime services requires us to
      allocate a pgd off the back of an early_initcall. On systems where the
      PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
      pgd_cache isn't initialised at this stage, and we panic with a NULL
      dereference during boot:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
      
        __create_mapping.isra.5+0x84/0x350
        create_pgd_mapping+0x20/0x28
        efi_create_mapping+0x5c/0x6c
        arm_enable_runtime_services+0x154/0x1e4
        do_one_initcall+0x8c/0x190
        kernel_init_freeable+0x84/0x1ec
        kernel_init+0x10/0xe0
        ret_from_fork+0x10/0x50
      
      This patch fixes the problem by initialising the pgd_cache earlier, in
      the pgtable_cache_init callback, which sounds suspiciously like what it
      was intended for.
      Reported-by: NDennis Chen <dennis.chen@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      39b5be9b
    • A
      arm64: module: avoid undefined shift behavior in reloc_data() · f9308969
      Ard Biesheuvel 提交于
      Compilers may engage the improbability drive when encountering shifts
      by a distance that is a multiple of the size of the operand type. Since
      the required bounds check is very simple here, we can get rid of all the
      fuzzy masking, shifting and comparing, and use the documented bounds
      directly.
      Reported-by: NDavid Binderman <dcb314@hotmail.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      f9308969
    • A
      arm64: module: fix relocation of movz instruction with negative immediate · b24a5575
      Ard Biesheuvel 提交于
      The test whether a movz instruction with a signed immediate should be
      turned into a movn instruction (i.e., when the immediate is negative)
      is flawed, since the value of imm is always positive. Also, the
      subsequent bounds check is incorrect since the limit update never
      executes, due to the fact that the imm_type comparison will always be
      false for negative signed immediates.
      
      Let's fix this by performing the sign test on sval directly, and
      replacing the bounds check with a simple comparison against U16_MAX.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [will: tidied up use of sval, renamed MOVK enum value to MOVKZ]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      b24a5575
  4. 22 12月, 2015 6 次提交
    • W
      arm64: traps: address fallout from printk -> pr_* conversion · c9cd0ed9
      Will Deacon 提交于
      Commit ac7b406c ("arm64: Use pr_* instead of printk") was a fairly
      mindless s/printk/pr_*/ change driven by a complaint from checkpatch.
      
      As is usual with such changes, this has led to some odd behaviour on
      arm64:
      
        * syslog now picks up the "pr_emerg" line from dump_backtrace, but not
          the actual trace, which leads to a bunch of "kernel:Call trace:"
          lines in the log
      
        * __{pte,pmd,pgd}_error print at KERN_CRIT, as opposed to KERN_ERR
          which is used by other architectures.
      
      This patch restores the original printk behaviour for dump_backtrace
      and downgrade the pgtable error macros to KERN_ERR.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c9cd0ed9
    • A
      arm64: ftrace: fix a stack tracer's output under function graph tracer · 20380bb3
      AKASHI Takahiro 提交于
      Function graph tracer modifies a return address (LR) in a stack frame
      to hook a function return. This will result in many useless entries
      (return_to_handler) showing up in
       a) a stack tracer's output
       b) perf call graph (with perf record -g)
       c) dump_backtrace (at panic et al.)
      
      For example, in case of a),
        $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
        $ echo 1 > /proc/sys/kernel/stack_trace_enabled
        $ cat /sys/kernel/debug/tracing/stack_trace
              Depth    Size   Location    (54 entries)
              -----    ----   --------
        0)     4504      16   gic_raise_softirq+0x28/0x150
        1)     4488      80   smp_cross_call+0x38/0xb8
        2)     4408      48   return_to_handler+0x0/0x40
        3)     4360      32   return_to_handler+0x0/0x40
        ...
      
      In case of b),
        $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
        $ perf record -e mem:XXX:x -ag -- sleep 10
        $ perf report
                        ...
                        |          |          |--0.22%-- 0x550f8
                        |          |          |          0x10888
                        |          |          |          el0_svc_naked
                        |          |          |          sys_openat
                        |          |          |          return_to_handler
                        |          |          |          return_to_handler
                        ...
      
      In case of c),
        $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
        $ echo c > /proc/sysrq-trigger
        ...
        Call trace:
        [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30
        [<ffffffc000092250>] return_to_handler+0x0/0x40
        [<ffffffc000092250>] return_to_handler+0x0/0x40
        ...
      
      This patch replaces such entries with real addresses preserved in
      current->ret_stack[] at unwind_frame(). This way, we can cover all
      the cases.
      Reviewed-by: NJungseok Lee <jungseoklee85@gmail.com>
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      [will: fixed minor context changes conflicting with irq stack bits]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      20380bb3
    • A
      arm64: pass a task parameter to unwind_frame() · fe13f95b
      AKASHI Takahiro 提交于
      Function graph tracer modifies a return address (LR) in a stack frame
      to hook a function's return. This will result in many useless entries
      (return_to_handler) showing up in a call stack list.
      We will fix this problem in a later patch ("arm64: ftrace: fix a stack
      tracer's output under function graph tracer"). But since real return
      addresses are saved in ret_stack[] array in struct task_struct,
      unwind functions need to be notified of, in addition to a stack pointer
      address, which task is being traced in order to find out real return
      addresses.
      
      This patch extends unwind functions' interfaces by adding an extra
      argument of a pointer to task_struct.
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      fe13f95b
    • A
      arm64: ftrace: modify a stack frame in a safe way · 79fdee9b
      AKASHI Takahiro 提交于
      Function graph tracer modifies a return address (LR) in a stack frame by
      calling ftrace_prepare_return() in a traced function's function prologue.
      The current code does this modification before preserving an original
      address at ftrace_push_return_trace() and there is always a small window
      of inconsistency when an interrupt occurs.
      
      This doesn't matter, as far as an interrupt stack is introduced, because
      stack tracer won't be invoked in an interrupt context. But it would be
      better to proactively minimize such a window by moving the LR modification
      after ftrace_push_return_trace().
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      79fdee9b
    • J
      arm64: remove irq_count and do_softirq_own_stack() · d224a69e
      James Morse 提交于
      sysrq_handle_reboot() re-enables interrupts while on the irq stack. The
      irq_stack implementation wrongly assumed this would only ever happen
      via the softirq path, allowing it to update irq_count late, in
      do_softirq_own_stack().
      
      This means if an irq occurs in sysrq_handle_reboot(), during
      emergency_restart() the stack will be corrupted, as irq_count wasn't
      updated.
      
      Lose the optimisation, and instead of moving the adding/subtracting of
      irq_count into irq_stack_entry/irq_stack_exit, remove it, and compare
      sp_el0 (struct thread_info) with sp & ~(THREAD_SIZE - 1). This tells us
      if we are on a task stack, if so, we can safely switch to the irq stack.
      Finally, remove do_softirq_own_stack(), we don't need it anymore.
      Reported-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      [will: use get_thread_info macro]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d224a69e
    • D
      arm64: hugetlb: add support for PTE contiguous bit · 66b3923a
      David Woods 提交于
      The arm64 MMU supports a Contiguous bit which is a hint that the TTE
      is one of a set of contiguous entries which can be cached in a single
      TLB entry.  Supporting this bit adds new intermediate huge page sizes.
      
      The set of huge page sizes available depends on the base page size.
      Without using contiguous pages the huge page sizes are as follows.
      
       4KB:   2MB  1GB
      64KB: 512MB
      
      With a 4KB granule, the contiguous bit groups together sets of 16 pages
      and with a 64KB granule it groups sets of 32 pages.  This enables two new
      huge page sizes in each case, so that the full set of available sizes
      is as follows.
      
       4KB:  64KB   2MB  32MB  1GB
      64KB:   2MB 512MB  16GB
      
      If a 16KB granule is used then the contiguous bit groups 128 pages
      at the PTE level and 32 pages at the PMD level.
      
      If the base page size is set to 64KB then 2MB pages are enabled by
      default.  It is possible in the future to make 2MB the default huge
      page size for both 4KB and 64KB granules.
      Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com>
      Reviewed-by: NSteve Capper <steve.capper@linaro.org>
      Signed-off-by: NDavid Woods <dwoods@ezchip.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      66b3923a
  5. 17 12月, 2015 2 次提交
  6. 16 12月, 2015 1 次提交
    • J
      arm64: reduce stack use in irq_handler · 971c67ce
      James Morse 提交于
      The code for switching to irq_stack stores three pieces of information on
      the stack, fp+lr, as a fake stack frame (that lets us walk back onto the
      interrupted tasks stack frame), and the address of the struct pt_regs that
      contains the register values from kernel entry. (which dump_backtrace()
      will print in any stack trace).
      
      To reduce this, we store fp, and the pointer to the struct pt_regs.
      unwind_frame() can recognise this as the irq_stack dummy frame, (as it only
      appears at the top of the irq_stack), and use the struct pt_regs values
      to find the missing interrupted link-register.
      Suggested-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      971c67ce
  7. 15 12月, 2015 1 次提交
  8. 12 12月, 2015 3 次提交
  9. 11 12月, 2015 4 次提交
    • M
      arm64: cmpxchg: Don't incldue linux/mmdebug.h · 4a6ccf30
      Mark Brown 提交于
      The arm64 asm/cmpxchg.h includes linux/mmdebug.h but doesn't so far as I
      can tell actually use anything from it.  Removing the inclusion reduces
      spurious header dependency rebuilds and also avoids issues with
      recursive inclusions of headers causing build breaks due to attempts to
      use things before they are defined if linux/mmdebug.h starts pulling in
      more low level headers.
      
      Such errors have happened in -next recently, for example:
      
      In file included from include/linux/completion.h:11:0,
                       from include/linux/rcupdate.h:43,
                       from include/linux/tracepoint.h:19,
                       from include/linux/mmdebug.h:6,
                       from ./arch/arm64/include/asm/cmpxchg.h:22,
                       from ./arch/arm64/include/asm/atomic.h:41,
                       from include/linux/atomic.h:4,
                       from include/linux/spinlock.h:406,
                       from include/linux/seqlock.h:35,
                       from include/linux/time.h:5,
                       from include/uapi/linux/timex.h:56,
                       from include/linux/timex.h:56,
                       from include/linux/sched.h:19,
                       from arch/arm64/kernel/asm-offsets.c:21:
      include/linux/wait.h: In function 'wait_on_atomic_t':
      include/linux/wait.h:1218:2: error: implicit declaration of function 'atomic_read' [-Werror=implicit-function-declaration]
       if (atomic_read(val) == 0)
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      4a6ccf30
    • M
      arm64: mm: fold alternatives into .init · 9aa4ec15
      Mark Rutland 提交于
      Currently we treat the alternatives separately from other data that's
      only used during initialisation, using separate .altinstructions and
      .altinstr_replacement linker sections. These are freed for general
      allocation separately from .init*. This is problematic as:
      
      * We do not remove execute permissions, as we do for .init, leaving the
        memory executable.
      
      * We pad between them, making the kernel Image bianry up to PAGE_SIZE
        bytes larger than necessary.
      
      This patch moves the two sections into the contiguous region used for
      .init*. This saves some memory, ensures that we remove execute
      permissions, and allows us to remove some code made redundant by this
      reorganisation.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Andre Przywara <andre.przywara@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      9aa4ec15
    • M
      arm64: Remove redundant padding from linker script · 5b28cd9d
      Mark Rutland 提交于
      Currently we place an ALIGN_DEBUG_RO between text and data for the .text
      and .init sections, and depending on configuration each of these may
      result in up to SECTION_SIZE bytes worth of padding (for
      DEBUG_RODATA_ALIGN).
      
      We make no distinction between the text and data in each of these
      sections at any point when creating the initial page tables in head.S.
      We also make no distinction when modifying the tables; __map_memblock,
      fixup_executable, mark_rodata_ro, and fixup_init only work at section
      granularity. Thus this padding is unnecessary.
      
      For the spit between init text and data we impose a minimum alignment of
      16 bytes, but this is also unnecessary. The init data is output
      immediately after the padding before any symbols are defined, so this is
      not required to keep a symbol for linker a section array correctly
      associated with the data. Any objects within the section will be given
      at least their usual alignment regardless.
      
      This patch removes the redundant padding.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      5b28cd9d
    • M
      arm64: mm: remove pointless PAGE_MASKing · e2c30ee3
      Mark Rutland 提交于
      As pgd_offset{,_k} shift the input address by PGDIR_SHIFT, the sub-page
      bits will always be shifted out. There is no need to apply PAGE_MASK
      before this.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      e2c30ee3
  10. 10 12月, 2015 8 次提交
  11. 09 12月, 2015 1 次提交
    • W
      arm64: irq: fix walking from irq stack to task stack · 7596abf2
      Will Deacon 提交于
      Running with CONFIG_DEBUG_SPINLOCK=y can trigger a BUG with the new IRQ
      stack code:
      
        BUG: spinlock lockup suspected on CPU#1
      
      This is due to the IRQ_STACK_TO_TASK_STACK macro incorrectly retrieving
      the task stack pointer stashed at the top of the IRQ stack.
      
      Sayeth James:
      
      | Yup, this is what is happening. Its an off-by-one due to broken
      | thinking about how the stack works. My broken thinking was:
      |
      | >   top ------------
      | >       | dummy_lr | <- irq_stack_ptr
      | >       ------------
      | >       |   x29    |
      | >       ------------
      | >       |   x19    | <- irq_stack_ptr - 0x10
      | >       ------------
      | >       |   xzr    |
      | >       ------------
      |
      | But the stack-pointer is decreased before use. So it actually looks
      | like this:
      |
      | >       ------------
      | >       |          |  <- irq_stack_ptr
      | >   top ------------
      | >       | dummy_lr |
      | >       ------------
      | >       |   x29    | <- irq_stack_ptr - 0x10
      | >       ------------
      | >       |   x19    |
      | >       ------------
      | >       |   xzr    | <- irq_stack_ptr - 0x20
      | >       ------------
      |
      | The value being used as the original stack is x29, which in all the
      | tests is sp but without the current frames data, hence there are no
      | missing frames in the output.
      |
      | Jungseok Lee picked it up with a 32bit user space because aarch32
      | can't use x29, so it remains 0 forever. The fix he posted is correct.
      
      This patch fixes the macro and adds some of this wisdom to a comment,
      so that the layout of the IRQ stack is well understood.
      
      Cc: James Morse <james.morse@arm.com>
      Reported-by: NJungseok Lee <jungseoklee85@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7596abf2
  12. 08 12月, 2015 4 次提交
  13. 05 12月, 2015 1 次提交
    • C
      arm64: Add trace_hardirqs_off annotation in ret_to_user · db3899a6
      Catalin Marinas 提交于
      When a kernel is built with CONFIG_TRACE_IRQFLAGS the following warning
      is produced when entering userspace for the first time:
      
        WARNING: at /work/Linux/linux-2.6-aarch64/kernel/locking/lockdep.c:3519
        Modules linked in:
        CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc3+ #639
        Hardware name: Juno (DT)
        task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
        PC is at check_flags.part.22+0x19c/0x1a8
        LR is at check_flags.part.22+0x19c/0x1a8
        pc : [<ffffffc0000fba6c>] lr : [<ffffffc0000fba6c>] pstate: 600001c5
        sp : ffffffc9768abe10
        x29: ffffffc9768abe10 x28: ffffffc9768a8000
        x27: 0000000000000000 x26: 0000000000000001
        x25: 00000000000000a6 x24: ffffffc00064be6c
        x23: ffffffc0009f249e x22: ffffffc9768a0000
        x21: ffffffc97fea5480 x20: 00000000000001c0
        x19: ffffffc00169a000 x18: 0000005558cc7b58
        x17: 0000007fb78e3180 x16: 0000005558d2e238
        x15: ffffffffffffffff x14: 0ffffffffffffffd
        x13: 0000000000000008 x12: 0101010101010101
        x11: 7f7f7f7f7f7f7f7f x10: fefefefefefeff63
        x9 : 7f7f7f7f7f7f7f7f x8 : 6e655f7371726964
        x7 : 0000000000000001 x6 : ffffffc0001079c4
        x5 : 0000000000000000 x4 : 0000000000000001
        x3 : ffffffc001698438 x2 : 0000000000000000
        x1 : ffffffc9768a0000 x0 : 000000000000002e
        Call trace:
        [<ffffffc0000fba6c>] check_flags.part.22+0x19c/0x1a8
        [<ffffffc0000fc440>] lock_is_held+0x80/0x98
        [<ffffffc00064bafc>] __schedule+0x404/0x730
        [<ffffffc00064be6c>] schedule+0x44/0xb8
        [<ffffffc000085bb0>] ret_to_user+0x0/0x24
        possible reason: unannotated irqs-off.
        irq event stamp: 502169
        hardirqs last  enabled at (502169): [<ffffffc000085a98>] el0_irq_naked+0x1c/0x24
        hardirqs last disabled at (502167): [<ffffffc0000bb3bc>] __do_softirq+0x17c/0x298
        softirqs last  enabled at (502168): [<ffffffc0000bb43c>] __do_softirq+0x1fc/0x298
        softirqs last disabled at (502143): [<ffffffc0000bb830>] irq_exit+0xa0/0xf0
      
      This happens because we disable interrupts in ret_to_user before calling
      schedule() in work_resched. This patch adds the necessary
      trace_hardirqs_off annotation.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      db3899a6
  14. 04 12月, 2015 3 次提交
    • L
      arm64: ftrace: fix the comments for ftrace_modify_code · 004ab584
      Li Bin 提交于
      There is no need to worry about module and __init text disappearing
      case, because that ftrace has a module notifier that is called when
      a module is being unloaded and before the text goes away and this
      code grabs the ftrace_lock mutex and removes the module functions
      from the ftrace list, such that it will no longer do any
      modifications to that module's text, the update to make functions
      be traced or not is done under the ftrace_lock mutex as well.
      And by now, __init section codes should not been modified
      by ftrace, because it is black listed in recordmcount.c and
      ignored by ftrace.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      004ab584
    • L
      arm64: ftrace: stop using kstop_machine to enable/disable tracing · 81a6a146
      Li Bin 提交于
      For ftrace on arm64, kstop_machine which is hugely disruptive
      to a running system is not needed to convert nops to ftrace calls
      or back, because that to be modified instrucions, that NOP, B or BL,
      are all safe instructions which called "concurrent modification
      and execution of instructions", that can be executed by one
      thread of execution as they are being modified by another thread
      of execution without requiring explicit synchronization.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      81a6a146
    • W
      arm64: spinlock: serialise spin_unlock_wait against concurrent lockers · d86b8da0
      Will Deacon 提交于
      Boqun Feng reported a rather nasty ordering issue with spin_unlock_wait
      on architectures implementing spin_lock with LL/SC sequences and acquire
      semantics:
      
       | CPU 1                   CPU 2                     CPU 3
       | ==================      ====================      ==============
       |                                                   spin_unlock(&lock);
       |                         spin_lock(&lock):
       |                           r1 = *lock; // r1 == 0;
       |                         o = READ_ONCE(object); // reordered here
       | object = NULL;
       | smp_mb();
       | spin_unlock_wait(&lock);
       |                           *lock = 1;
       | smp_mb();
       | o->dead = true;
       |                         if (o) // true
       |                           BUG_ON(o->dead); // true!!
      
      The crux of the problem is that spin_unlock_wait(&lock) can return on
      CPU 1 whilst CPU 2 is in the process of taking the lock. This can be
      resolved by upgrading spin_unlock_wait to a LOCK operation, forcing it
      to serialise against a concurrent locker and giving it acquire semantics
      in the process (although it is not at all clear whether this is needed -
      different callers seem to assume different things about the barrier
      semantics and architectures are similarly disjoint in their
      implementations of the macro).
      
      This patch implements spin_unlock_wait using an LL/SC sequence with
      acquire semantics on arm64. For v8.1 systems with the LSE atomics, the
      exclusive writeback is omitted, since the spin_lock operation is
      indivisible and no intermediate state can be observed.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d86b8da0