1. 15 6月, 2017 4 次提交
  2. 07 6月, 2017 1 次提交
  3. 04 6月, 2017 3 次提交
  4. 23 5月, 2017 1 次提交
  5. 10 5月, 2017 7 次提交
    • M
      arm64: uaccess: suppress spurious clang warning · d135b8b5
      Mark Rutland 提交于
      Clang tries to warn when there's a mismatch between an operand's size,
      and the size of the register it is held in, as this may indicate a bug.
      Specifically, clang warns when the operand's type is less than 64 bits
      wide, and the register is used unqualified (i.e. %N rather than %xN or
      %wN).
      
      Unfortunately clang can generate these warnings for unreachable code.
      For example, for code like:
      
      do {                                            \
              typeof(*(ptr)) __v = (v);               \
              switch(sizeof(*(ptr))) {                \
              case 1:                                 \
                      // assume __v is 1 byte wide    \
                      asm ("{op}b %w0" : : "r" (v));  \
                      break;                          \
              case 8:                                 \
                      // assume __v is 8 bytes wide   \
                      asm ("{op} %0" : : "r" (v));    \
                      break;                          \
              }
      while (0)
      
      ... if op() were passed a char value and pointer to char, clang may
      produce a warning for the unreachable case where sizeof(*(ptr)) is 8.
      
      For the same reasons, clang produces warnings when __put_user_err() is
      used for types that are less than 64 bits wide.
      
      We could avoid this with a cast to a fixed-width type in each of the
      cases. However, GCC will then warn that pointer types are being cast to
      mismatched integer sizes (in unreachable paths).
      
      Another option would be to use the same union trickery as we do for
      __smp_store_release() and __smp_load_acquire(), but this is fairly
      invasive.
      
      Instead, this patch suppresses the clang warning by using an x modifier
      in the assembly for the 8 byte case of __put_user_err(). No additional
      work is necessary as the value has been cast to typeof(*(ptr)), so the
      compiler will have performed any necessary extension for the reachable
      case.
      
      For consistency, __get_user_err() is also updated to use the x modifier
      for its 8 byte case.
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reported-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      d135b8b5
    • M
      arm64: atomic_lse: match asm register sizes · 8997c934
      Mark Rutland 提交于
      The LSE atomic code uses asm register variables to ensure that
      parameters are allocated in specific registers. In the majority of cases
      we specifically ask for an x register when using 64-bit values, but in a
      couple of cases we use a w regsiter for a 64-bit value.
      
      For asm register variables, the compiler only cares about the register
      index, with wN and xN having the same meaning. The compiler determines
      the register size to use based on the type of the variable. Thus, this
      inconsistency is merely confusing, and not harmful to code generation.
      
      For consistency, this patch updates those cases to use the x register
      alias. There should be no functional change as a result of this patch.
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      8997c934
    • M
      arm64: uaccess: ensure extension of access_ok() addr · a06040d7
      Mark Rutland 提交于
      Our access_ok() simply hands its arguments over to __range_ok(), which
      implicitly assummes that the addr parameter is 64 bits wide. This isn't
      necessarily true for compat code, which might pass down a 32-bit address
      parameter.
      
      In these cases, we don't have a guarantee that the address has been zero
      extended to 64 bits, and the upper bits of the register may contain
      unknown values, potentially resulting in a suprious failure.
      
      Avoid this by explicitly casting the addr parameter to an unsigned long
      (as is done on other architectures), ensuring that the parameter is
      widened appropriately.
      
      Fixes: 0aea86a2 ("arm64: User access library functions")
      Cc: <stable@vger.kernel.org> # 3.7.x-
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      a06040d7
    • M
      arm64: ensure extension of smp_store_release value · 994870be
      Mark Rutland 提交于
      When an inline assembly operand's type is narrower than the register it
      is allocated to, the least significant bits of the register (up to the
      operand type's width) are valid, and any other bits are permitted to
      contain any arbitrary value. This aligns with the AAPCS64 parameter
      passing rules.
      
      Our __smp_store_release() implementation does not account for this, and
      implicitly assumes that operands have been zero-extended to the width of
      the type being stored to. Thus, we may store unknown values to memory
      when the value type is narrower than the pointer type (e.g. when storing
      a char to a long).
      
      This patch fixes the issue by casting the value operand to the same
      width as the pointer operand in all cases, which ensures that the value
      is zero-extended as we expect. We use the same union trickery as
      __smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
      pointers are potentially cast to narrower width integers in unreachable
      paths.
      
      A whitespace issue at the top of __smp_store_release() is also
      corrected.
      
      No changes are necessary for __smp_load_acquire(). Load instructions
      implicitly clear any upper bits of the register, and the compiler will
      only consider the least significant bits of the register as valid
      regardless.
      
      Fixes: 47933ad4 ("arch: Introduce smp_load_acquire(), smp_store_release()")
      Fixes: 878a84d5 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
      Cc: <stable@vger.kernel.org> # 3.14.x-
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      994870be
    • M
      arm64: xchg: hazard against entire exchange variable · fee960be
      Mark Rutland 提交于
      The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
      against other accesses to the memory location being exchanged. However,
      the pointer passed to the constraint is a u8 pointer, and thus the
      hazard only applies to the first byte of the location.
      
      GCC can take advantage of this, assuming that other portions of the
      location are unchanged, as demonstrated with the following test case:
      
      union u {
      	unsigned long l;
      	unsigned int i[2];
      };
      
      unsigned long update_char_hazard(union u *u)
      {
      	unsigned int a, b;
      
      	a = u->i[1];
      	asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
      	b = u->i[1];
      
      	return a ^ b;
      }
      
      unsigned long update_long_hazard(union u *u)
      {
      	unsigned int a, b;
      
      	a = u->i[1];
      	asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
      	b = u->i[1];
      
      	return a ^ b;
      }
      
      The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
      using -O2 or above:
      
      0000000000000000 <update_char_hazard>:
         0:	d2800001 	mov	x1, #0x0                   	// #0
         4:	f9000001 	str	x1, [x0]
         8:	d2800000 	mov	x0, #0x0                   	// #0
         c:	d65f03c0 	ret
      
      0000000000000010 <update_long_hazard>:
        10:	b9400401 	ldr	w1, [x0,#4]
        14:	d2800002 	mov	x2, #0x0                   	// #0
        18:	f9000002 	str	x2, [x0]
        1c:	b9400400 	ldr	w0, [x0,#4]
        20:	4a000020 	eor	w0, w1, w0
        24:	d65f03c0 	ret
      
      This patch fixes the issue by passing an unsigned long pointer into the
      +Q constraint, as we do for our cmpxchg code. This may hazard against
      more than is necessary, but this is better than missing a necessary
      hazard.
      
      Fixes: 305d454a ("arm64: atomics: implement native {relaxed, acquire, release} atomics")
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      fee960be
    • K
      arm64: entry: improve data abort handling of tagged pointers · 276e9327
      Kristina Martsenko 提交于
      When handling a data abort from EL0, we currently zero the top byte of
      the faulting address, as we assume the address is a TTBR0 address, which
      may contain a non-zero address tag. However, the address may be a TTBR1
      address, in which case we should not zero the top byte. This patch fixes
      that. The effect is that the full TTBR1 address is passed to the task's
      signal handler (or printed out in the kernel log).
      
      When handling a data abort from EL1, we leave the faulting address
      intact, as we assume it's either a TTBR1 address or a TTBR0 address with
      tag 0x00. This is true as far as I'm aware, we don't seem to access a
      tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
      forget about address tags, and code added in the future may not always
      remember to remove tags from addresses before accessing them. So add tag
      handling to the EL1 data abort handler as well. This also makes it
      consistent with the EL0 data abort handler.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Cc: <stable@vger.kernel.org> # 3.12.x-
      Reviewed-by: NDave Martin <Dave.Martin@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      276e9327
    • K
      arm64: hw_breakpoint: fix watchpoint matching for tagged pointers · 7dcd9dd8
      Kristina Martsenko 提交于
      When we take a watchpoint exception, the address that triggered the
      watchpoint is found in FAR_EL1. We compare it to the address of each
      configured watchpoint to see which one was hit.
      
      The configured watchpoint addresses are untagged, while the address in
      FAR_EL1 will have an address tag if the data access was done using a
      tagged address. The tag needs to be removed to compare the address to
      the watchpoints.
      
      Currently we don't remove it, and as a result can report the wrong
      watchpoint as being hit (specifically, always either the highest TTBR0
      watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Cc: <stable@vger.kernel.org> # 3.12.x-
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      7dcd9dd8
  6. 09 5月, 2017 2 次提交
  7. 03 5月, 2017 1 次提交
    • D
      bpf, arm64: implement jiting of BPF_XADD · 85f68fe8
      Daniel Borkmann 提交于
      This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore
      completes JITing of all BPF instructions, meaning we can thus also remove
      the 'notyet' label and do not need to fall back to the interpreter when
      BPF_XADD is used in a program!
      
      This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64,
      where all current eBPF features are supported.
      
      BPF_W example from test_bpf:
      
        .u.insns_int = {
          BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
          BPF_ST_MEM(BPF_W, R10, -40, 0x10),
          BPF_STX_XADD(BPF_W, R10, R0, -40),
          BPF_LDX_MEM(BPF_W, R0, R10, -40),
          BPF_EXIT_INSN(),
        },
      
        [...]
        00000020:  52800247  mov w7, #0x12 // #18
        00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
        00000028:  d280020a  mov x10, #0x10 // #16
        0000002c:  b82b6b2a  str w10, [x25,x11]
        // start of xadd mapping:
        00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
        00000034:  8b19014a  add x10, x10, x25
        00000038:  f9800151  prfm pstl1strm, [x10]
        0000003c:  885f7d4b  ldxr w11, [x10]
        00000040:  0b07016b  add w11, w11, w7
        00000044:  880b7d4b  stxr w11, w11, [x10]
        00000048:  35ffffab  cbnz w11, 0x0000003c
        // end of xadd mapping:
        [...]
      
      BPF_DW example from test_bpf:
      
        .u.insns_int = {
          BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
          BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
          BPF_STX_XADD(BPF_DW, R10, R0, -40),
          BPF_LDX_MEM(BPF_DW, R0, R10, -40),
          BPF_EXIT_INSN(),
        },
      
        [...]
        00000020:  52800247  mov w7,  #0x12 // #18
        00000024:  928004eb  mov x11, #0xffffffffffffffd8 // #-40
        00000028:  d280020a  mov x10, #0x10 // #16
        0000002c:  f82b6b2a  str x10, [x25,x11]
        // start of xadd mapping:
        00000030:  928004ea  mov x10, #0xffffffffffffffd8 // #-40
        00000034:  8b19014a  add x10, x10, x25
        00000038:  f9800151  prfm pstl1strm, [x10]
        0000003c:  c85f7d4b  ldxr x11, [x10]
        00000040:  8b07016b  add x11, x11, x7
        00000044:  c80b7d4b  stxr w11, x11, [x10]
        00000048:  35ffffab  cbnz w11, 0x0000003c
        // end of xadd mapping:
        [...]
      
      Tested on Cavium ThunderX ARMv8, test suite results after the patch:
      
        No JIT:   [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed]
        With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed]
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85f68fe8
  8. 02 5月, 2017 2 次提交
  9. 27 4月, 2017 2 次提交
    • P
      KVM: mark requests that need synchronization · 7a97cec2
      Paolo Bonzini 提交于
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a97cec2
    • R
      KVM: mark requests that do not need a wakeup · 930f7fd6
      Radim Krčmář 提交于
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      First 8 bits were arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      930f7fd6
  10. 26 4月, 2017 1 次提交
    • A
      arm64: module: split core and init PLT sections · 24af6c4e
      Ard Biesheuvel 提交于
      The arm64 module PLT code allocates all PLT entries in a single core
      section, since the overhead of having a separate init PLT section is
      not justified by the small number of PLT entries usually required for
      init code.
      
      However, the core and init module regions are allocated independently,
      and there is a corner case where the core region may be allocated from
      the VMALLOC region if the dedicated module region is exhausted, but the
      init region, being much smaller, can still be allocated from the module
      region. This leads to relocation failures if the distance between those
      regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
      occur on arm64, but the issue has been observed on ARM, whose module
      region is much smaller).
      
      So split the core and init PLT regions, and name the latter ".init.plt"
      so it gets allocated along with (and sufficiently close to) the .init
      sections that it serves. Also, given that init PLT entries may need to
      be emitted for branches that target the core module, modify the logic
      that disregards defined symbols to only disregard symbols that are
      defined in the same section as the relocated branch instruction.
      
      Since there may now be two PLT entries associated with each entry in
      the symbol table, we can no longer hijack the symbol::st_size fields
      to record the addresses of PLT entries as we emit them for zero-addend
      relocations. So instead, perform an explicit comparison to check for
      duplicate entries.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      24af6c4e
  11. 25 4月, 2017 1 次提交
    • L
      ARM64: Implement pci_remap_cfgspace() interface · f1e209b7
      Lorenzo Pieralisi 提交于
      The PCI bus specification (rev 3.0, 3.2.5 "Transaction Ordering and
      Posting") defines rules for PCI configuration space transactions ordering
      and posting, that state that configuration writes are non-posted
      transactions.
      
      This rule is reinforced by the ARM v8 architecture reference manual (issue
      A.k, Early Write Acknowledgment) that explicitly recommends that No Early
      Write Acknowledgment attribute should be used to map PCI configuration
      (write) transactions.
      
      Current ioremap interface on ARM64 implements mapping functions where the
      Early Write Acknowledgment hint is enabled, so they cannot be used to map
      PCI configuration space in a PCI specs compliant way.
      
      Implement an ARM64 specific pci_remap_cfgspace() interface that allows to
      map PCI config region with nGnRnE attributes, providing a remap function
      that complies with PCI specifications and the ARMv8 architecture reference
      manual recommendations.
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      f1e209b7
  12. 24 4月, 2017 1 次提交
  13. 20 4月, 2017 1 次提交
  14. 18 4月, 2017 1 次提交
    • A
      Remove compat_sys_getdents64() · 2611dc19
      Al Viro 提交于
      Unlike normal compat syscall variants, it is needed only for
      biarch architectures that have different alignement requirements for
      u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle
      a store of 64bit value at 32bit-aligned address.  We used to have one
      such (ia64), but its biarch support has been gone since 2010 (after
      being broken in 2008, which went unnoticed since nobody had been using
      it).
      
      It had escaped removal at the same time only because back in 2004
      a patch that switched several syscalls on amd64 from private wrappers to
      generic compat ones had switched to use of compat_sys_getdents64(), which
      hadn't needed (or used) a compat wrapper on amd64.
      
      Let's bury it - it's at least 7 years overdue.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2611dc19
  15. 11 4月, 2017 1 次提交
    • M
      arm64: add function to get a cpu's MADT GICC table · e0013aed
      Mark Rutland 提交于
      Currently the ACPI parking protocol code needs to parse each CPU's MADT
      GICC table to extract the mailbox address and so on. Each time we parse
      a GICC table, we call back to the parking protocol code to parse it.
      
      This has been fine so far, but we're about to have more code that needs
      to extract data from the GICC tables, and adding a callback for each
      user is going to get unwieldy.
      
      Instead, this patch ensures that we stash a copy of each CPU's GICC
      table at boot time, such that anything needing to parse it can later
      request it. This will allow for other parsers of GICC, and for
      simplification to the ACPI parking protocol code. Note that we must
      store a copy, rather than a pointer, since the core ACPI code
      temporarily maps/unmaps tables while iterating over them.
      
      Since we parse the MADT before we know how many CPUs we have (and hence
      before we setup the percpu areas), we must use an NR_CPUS sized array.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: NJeremy Linton <jeremy.linton@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      e0013aed
  16. 09 4月, 2017 8 次提交
  17. 07 4月, 2017 3 次提交