1. 06 May 2016, 1 commit
    • mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Authored by Andrea Arcangeli
      After the THP refcounting change, obtaining a compound page from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once
      for each compound page, in order to know if it can map the whole
      compound page.  So a secondary MMU needs to know from a single
      get_user_pages() invocation when it can immediately map the entire
      compound page, to avoid a flood of unnecessary secondary MMU faults
      and spurious atomic_inc()/atomic_dec() calls (pages don't have to be
      pinned by MMU notifier users).
      
      Ideally, instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However, it's a non-trivial change to pass the
      "pmd" status belonging to the "mm" walked by get_user_pages() up the
      stack (up to the caller of get_user_pages()).  So the fix just checks
      whether there is no single pte mapping on the page returned by
      get_user_pages(), and in turn whether the caller can assume that the
      whole compound page is mapped in the current "mm" (in a
      pmd_trans_huge() pmd).  In that case the entire compound page is safe
      to map into the secondary MMU without additional get_user_pages()
      calls on the surrounding tail/head pages.  Besides being faster, not
      having to run other get_user_pages() calls also reduces the memory
      footprint of the secondary MMU fault in case the pmd split happened
      as a result of memory pressure.
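
      Conceptually, the check amounts to something like the following sketch
      (the helper name is hypothetical; PageTransCompound() and
      page->_mapcount are the 4.6-era primitives):

        /* a hedged sketch of the check, not the literal 4.6 diff */
        static inline bool whole_compound_page_mapped(struct page *page)
        {
                /* no individual pte maps the page, so the pmd mapping
                 * (pmd_trans_huge) covers the entire compound page */
                return PageTransCompound(page) &&
                       atomic_read(&page->_mapcount) < 1;
        }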
      
      Without this fix, after a MADV_DONTNEED (as invoked by QEMU during
      postcopy live migration or ballooning) or after generic swapping (with
      a failure in split_huge_page() that would only result in pmd splitting
      and not a physical page split), KVM would map the whole compound page
      into the shadow pagetables, even though regular faults or userfaults
      (like UFFDIO_COPY) may map regular pages into the primary MMU as a
      result of the pte faults, leading to guest mode and userland mode
      going out of sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of many MMU
      notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast() calls and it supports multiple
      granularities in its secondary MMU mappings, so I think it is
      justified to expose this not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust() to
      mm/huge_memory.c, but that function currently has all kinds of KVM
      data structures in it, so it's definitely not a cut-and-paste job,
      and I couldn't do a cleaner fix than this one for 4.6.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 April 2016, 5 commits
    • perf/x86/intel: Fix incorrect lbr_sel_mask value · cf3beb7c
      Authored by Kan Liang
      This patch fixes a bug which was introduced by:
      
       b16a5b52 ("perf/x86: Add option to disable reading branch flags/cycles")
      
      In that patch, lbr_sel_mask is used to mask lbr_select, but
      LBR_SEL_MASK doesn't include the bit for LBR_CALL_STACK, so the LBR
      call stack will never be set in lbr_select.
      
      This patch corrects LBR_SEL_MASK by including all valid bits in
      LBR_SELECT.  Also, the LBR_CALL_STACK bit is different from the other
      bits in LBR_SELECT: it does not operate in suppress mode, so it needs
      to be specially handled in intel_pmu_setup_hw_lbr_filter().
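
      A hedged sketch of both halves of the fix (the EN_CALLSTACK bit
      position follows the SDM's MSR_LBR_SELECT layout; the filter line is
      simplified from intel_pmu_setup_hw_lbr_filter(), not the literal diff):

        #define LBR_CALL_STACK_BIT  9                  /* EN_CALLSTACK */
        #define LBR_CALL_STACK      (1ULL << LBR_CALL_STACK_BIT)

        /* the other LBR_SELECT bits suppress capture when set, so the
         * filter mask is applied XOR-inverted; EN_CALLSTACK enables
         * rather than suppresses, so it must stay out of the inversion */
        reg->config = mask ^ (x86_pmu.lbr_sel_mask & ~LBR_CALL_STACK);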
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/pt: Don't die on VMXON · 1c5ac21a
      Authored by Alexander Shishkin
      Some versions of Intel PT do not support tracing across VMXON: more
      specifically, VMXON will clear the TraceEn control bit, and any
      attempt to set it again before VMXOFF will throw a #GP, which in the
      current state of things will crash the kernel.  Namely:
      
        $ perf record -e intel_pt// kvm -nographic
      
      on such a machine will kill it.
      
      To avoid this, notify the intel_pt driver before VMXON and after
      VMXOFF so that it knows when not to enable itself.
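
      A hedged sketch of the call placement (intel_pt_handle_vmx() is the
      notification hook; the call sites are simplified from 4.6-era
      arch/x86/kvm/vmx.c, not the literal diff):

        static void kvm_cpu_vmxon(u64 addr)
        {
                intel_pt_handle_vmx(1);   /* PT must stay off from here on */
                asm volatile ("vmxon %0" : : "m"(addr));
        }

        static void kvm_cpu_vmxoff(void)
        {
                asm volatile ("vmxoff");
                intel_pt_handle_vmx(0);   /* PT may enable itself again */
        }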
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX · 0a25556f
      Authored by Adam Borowski
      The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
      referenced by filter_events(), which expects undefined events to have
      a value of 0.
      
      Found via KASAN:
      
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
        index 9 is out of range for type 'u64 [9]'
        UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
        load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'
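
      A hedged sketch of the fix (the array name matches the file the UBSAN
      report points at; the event codes shown are illustrative):

        /* sizing the map with the enum bound zero-fills the slots AMD
         * leaves undefined, so filter_events() reads 0 for
         * PERF_COUNT_HW_REF_CPU_CYCLES instead of running off the end */
        static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] = {
                [PERF_COUNT_HW_CPU_CYCLES]   = 0x0076,
                [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
                /* ... the other defined events ... */
        };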
      Signed-off-by: Adam Borowski <kilobyte@angband.pl>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/apic: Handle zero vector gracefully in clear_vector_irq() · 1bdb8970
      Authored by Keith Busch
      If x86_vector_alloc_irq() fails, x86_vector_free_irqs() is invoked to
      clean up the already allocated vectors.  This subsequently calls
      clear_vector_irq().
      
      The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
      clear_vector_irq().
      
      We cannot suppress the call to x86_vector_free_irqs() for the failed
      interrupt, because the other data related to this irq must be cleaned up as
      well. So calling clear_vector_irq() with vector == 0 is legitimate.
      
      Remove the BUG_ON() and return if the vector is zero.
      
      [ tglx: Massaged changelog ]
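
      A hedged sketch of the change (the signature and field names are
      simplified from the description above, not copied from the diff):

        static void clear_vector_irq(int irq, struct apic_chip_data *data)
        {
                int vector = data->cfg.vector;

                /* a failed allocation legitimately has no vector */
                if (!vector)
                        return;
                /* ... release the vector and per-cpu bookkeeping ... */
        }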
      
      Fixes: b5dc8e6c "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sparc64: Fix bootup regressions on some Kconfig combinations. · 49fa5230
      Authored by David S. Miller
      The system call tracing bug fix mentioned in the Fixes tag below
      increased the amount of assembler code in the sequence of assembler
      files included by head_64.S.
      
      This caused the total set of code to exceed 0x4000 bytes in size,
      which overflows the expression in head_64.S that places swapper_tsb
      at address 0x408000.
      
      When this is violated, the TSB is not properly aligned, and the trap
      table is not properly aligned either.  All of this together results
      in failed boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 27 April 2016, 5 commits
    • ARC: add support for reserved memory defined by device tree · 1b10cb21
      Authored by Alexey Brodkin
      Enable reserved memory initialization from device tree.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: support generic per-device coherent dma mem · 32ed9a0e
      Authored by Alexey Brodkin
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • nios2: memset: use the right constraint modifier for the %4 output operand · a8950e49
      Authored by Romain Perier
      Depending on the size of the area to be memset'ed, the nios2 memset
      implementation either uses a naive loop (for buffers smaller than or
      equal to 8 bytes) or a more optimized implementation (for buffers
      larger than 8 bytes).  The optimized path does 4-byte stores rather
      than 1-byte stores to speed up memset.
      
      However, we discovered that on our nios2 platform, memset() was not
      properly setting the buffer to the expected value.  A memset of 0xff
      would not set the entire buffer to 0xff, but to:
      
      0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...
      
      which is obviously incorrect.  Our investigation revealed that the
      problem lies in incorrect constraints used in the inline assembly.
      
      The following piece of assembly, from the nios2 memset implementation,
      is supposed to create a 4-byte value that repeats the 1-byte pattern
      passed as the memset argument four times:
      
      /* fill8 %3, %5 (c & 0xff) */
      "       slli    %4, %5, 8\n"
      "       or      %4, %4, %5\n"
      "       slli    %3, %4, 16\n"
      "       or      %3, %3, %4\n"
      
      However, depending on the compiler and optimization level, this code might be compiled as:
      
      34:	280a923a 	slli	r5,r5,8
      38:	294ab03a 	or	r5,r5,r5
      3c:	2808943a 	slli	r4,r5,16
      40:	2148b03a 	or	r4,r4,r5
      
      This is wrong because r5 gets used both for %5 and %4, which leads to
      the final pattern stored in r4 being 0xff00ff00 rather than the
      expected 0xffffffff.
      
      %4 is defined with the "=r" constraint, i.e. as an output operand.
      However, as explained in
      http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not
      prevent gcc from using the same register for an output operand (%4)
      and an input operand (%5).  By using the constraint modifier '&', we
      indicate that the register should be used for output only.  With this
      change, we get the following assembly output:
      
      34:	2810923a 	slli	r8,r5,8
      38:	4150b03a 	or	r8,r8,r5
      3c:	400e943a 	slli	r7,r8,16
      40:	3a0eb03a 	or	r7,r7,r8
      
      This correctly produces the 0xffffffff pattern when 0xff is passed as
      the memset() pattern.
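
      The same sequence, reduced to a standalone helper with the
      earlyclobber modifier applied (a sketch for illustration, not the
      kernel's memset; operand numbers differ from the fragment above):

        /* "=&r" marks the output as earlyclobber: it is written before
         * the inputs are dead, so gcc cannot assign it the register
         * already holding the input operand */
        static unsigned int fill8(unsigned int c)   /* c = 0x000000ff */
        {
                unsigned int tmp, pattern;

                asm("slli  %0, %2, 8\n\t"
                    "or    %0, %0, %2\n\t"
                    "slli  %1, %0, 16\n\t"
                    "or    %1, %1, %0"
                    : "=&r"(tmp), "=&r"(pattern)
                    : "r"(c));
                return pattern;                     /* 0xffffffff */
        }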
      
      It is worth mentioning the observed consequence of this bug: we were
      hitting the kernel BUG() in mm/bootmem.c:__free() that verifies, when
      marking a page as free, that it was previously marked as occupied
      (i.e. that the bit was set to 1).  The entire bootmem bitmap is set
      to 0xff via a memset() during bootmem initialization.  The
      bootmem_free() call right after the initialization was finding some
      bits set to 0, which didn't make sense since the bitmap had just been
      memset'ed to 0xff.  Except that, due to the bug explained above, the
      bitmap was in fact initialized to 0xff00ff00.
      
      Thanks to Marek Vasut for his help and feedback.
      Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Acked-by: Ley Foon Tan <lftan@altera.com>
    • powerpc: wire up preadv2 and pwritev2 syscalls · d701cca6
      Authored by Rui Salvaterra
      Wire up preadv2/pwritev2 in the same way as preadv/pwritev.  This
      fixes two build warnings on ppc64.
      
      mpe: Lightly tested with fio (slightly hacked to add the syscall
      wrappers):
      
        fio-4217  [009] ....  1304.635300: sys_preadv2(fd: 3, vec:
        10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
        fio-4217  [009] ....  1304.635474: sys_preadv2 -> 0x1000
      Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging" · e16d8a6c
      Authored by Andy Lutomirski
      This reverts commit 320d25b6.
      
      This change was problematic for a couple of reasons:
      
      1. It missed some entry points (Xen things and 64-bit native).
      
      2. The entry it changed can be executed more than once.  This isn't
         really a problem, but it conflated per-cpu state setup and global
         state setup.
      
      3. It broke 64-bit non-NX.  64-bit non-NX worked the other way around
         from 32-bit -- __supported_pte_mask had NX set initially and was
         *cleared* in x86_configure_nx().  With the patch applied, it never
         got cleared (see the sketch below).
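
      A hedged sketch of the 64-bit direction described in point 3 (this
      mirrors 4.6-era x86_configure_nx(), simplified):

        void x86_configure_nx(void)
        {
                /* NX starts set in __supported_pte_mask and is cleared on
                 * CPUs without NX -- the reverted patch left it set */
                if (boot_cpu_has(X86_FEATURE_NX))
                        __supported_pte_mask |= _PAGE_NX;
                else
                        __supported_pte_mask &= ~_PAGE_NX;
        }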
      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 24 April 2016, 1 commit
  5. 23 April 2016, 3 commits
    • perf/x86/intel/rapl: Add missing Haswell model · e1089602
      Authored by Srinivas Pandruvada
      Added one missing Haswell model.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1460907809-11897-1-git-send-email-srinivas.pandruvada@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add model number for Skylake Server to perf · b89c1737
      Authored by Andi Kleen
      Everything the same as base Skylake, just a new model number.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1460751933-2264-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • xen/qspinlock: Don't kick CPU if IRQ is not initialized · 707e59ba
      Authored by Ross Lagerwall
      The following commit:
      
        1fb3a8b2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.")
      
      ... moved the initialization of the kicker interrupt to after
      native_cpu_up() is called.
      
      However, when using qspinlocks, a CPU may try to kick another CPU that is
      spinning (because it has not yet initialized its kicker interrupt), resulting
      in the following crash during boot:
      
        kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210!
        invalid opcode: 0000 [#1] SMP
        ...
        RIP: 0010:[<ffffffff814c97c9>]  [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60
        ...
        Call Trace:
         [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10
         [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0
         [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
         [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150
         [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160
         [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0
         [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80
         [<ffffffff8108198c>] _cpu_up+0x13c/0x180
         [<ffffffff81081a4a>] cpu_up+0x7a/0xa0
         [<ffffffff81f80dfc>] smp_init+0x7f/0x81
         [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212
         [<ffffffff81817f30>] ? rest_init+0x80/0x80
         [<ffffffff81817f3e>] kernel_init+0xe/0xe0
         [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
         [<ffffffff81817f30>] ? rest_init+0x80/0x80
      
      To fix this, only send the kick if the target CPU's interrupt has been
      initialized. This check isn't racy, because the target is waiting for
      the spinlock, so it won't have initialized the interrupt in the
      meantime.
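
      A hedged sketch of the guard (the per-cpu variable name is assumed
      from 4.6-era Xen spinlock code, not quoted from the diff):

        static void xen_qlock_kick(int cpu)
        {
                int irq = per_cpu(lock_kicker_irq, cpu);

                /* don't kick a cpu whose kicker irq isn't set up yet */
                if (irq == -1)
                        return;

                xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
        }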
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 22 April 2016, 8 commits
  7. 21 April 2016, 2 commits
    • s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Authored by Gerald Schaefer
      There is a race with multi-threaded applications between context switch and
      pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
      mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
      pagetable upgrade on another thread in crst_table_upgrade() could already
      have set new asce_bits, but not yet the new mm->pgd. This would result in a
      corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
      translation exception.
      
      Fix this by storing the complete asce instead of just the asce_bits, which
      can then be read atomically from switch_mm(), so that it either sees the
      old value or the new value, but no mixture. Both cases are OK. Having the
      old value would result in a page fault on access to the higher level memory,
      but the fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed.  So, as a worst-case scenario,
      we would have a page fault loop for the racing thread until the next time
      slice.
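
      A hedged sketch of the idea (names follow 4.6-era s390 code; the
      exact asce composition here is illustrative):

        /* upgrade side: publish the complete asce as one word */
        mm->pgd = (pgd_t *) new_table;
        mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
                           _ASCE_USER_BITS | _ASCE_TYPE_REGION2;

        /* switch_mm() side: a single word read, so it sees either the
         * old or the new asce, never new asce_bits with the old pgd */
        S390_lowcore.user_asce = next->context.asce;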
      
      Also remove dead code and simplify the upgrade/downgrade path, there are no
      upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
      There are also no concurrent upgrades, because the mmap_sem is held with
      down_write() in do_mmap, so the flush and table checks during upgrade can
      be removed.
      Reported-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/pci: fix use after free in dma_init · dba59909
      Authored by Sebastian Ott
      After a failure during registration of the dma_table (because of the
      function being in error state) we free its memory but don't reset the
      associated pointer to zero.
      
      When we then receive a notification from firmware (about the function
      being in error state) we'll try to walk and free the dma_table again.
      
      Fix this by resetting the dma_table pointer.  In addition to that,
      make sure that we free the iommu_bitmap when appropriate.
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  8. 20 April 2016, 6 commits
  9. 19 April 2016, 1 commit
  10. 18 April 2016, 4 commits
    • arm64: fix invalidation of wrong __early_cpu_boot_status cacheline · adb49070
      Authored by Ard Biesheuvel
      In head.S, the str_l macro, which takes a source register, a symbol name
      and a temp register, is used to store a status value to the variable
      __early_cpu_boot_status. Subsequently, the value of the temp register is
      reused to invalidate any cachelines covering this variable.
      
      However, since str_l resolves to
      
            adrp    \tmp, \sym
            str     \src, [\tmp, :lo12:\sym]
      
      the temp register never actually holds the address of the variable but
      only of the 4 KB window that covers it, and reusing it leads to the
      wrong cacheline being invalidated. So instead, take the address
      explicitly before doing the store, and reuse that value to perform
      the cache invalidation.
      
      Fixes: bb905274 ("arm64: Handle early CPU boot failures")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • powerpc: Update TM user feature bits in scan_features() · 4705e024
      Authored by Anton Blanchard
      We need to update the user TM feature bits (PPC_FEATURE2_HTM and
      PPC_FEATURE2_HTM_NOSC) to mirror what we do with the kernel TM
      feature bit.
      
      At the moment, if firmware reports TM is not available we turn off
      the kernel TM feature bit but leave the userspace ones on. Userspace
      thinks it can execute TM instructions and it dies trying.
      
      This (together with a QEMU patch) fixes PR KVM, which doesn't currently
      support TM.
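
      A hedged sketch of the intent (the cur_cpu_spec fields are the
      4.6-era ones; the firmware-test condition is illustrative):

        if (!tm_reported_by_firmware) {
                /* keep the user-visible TM bits in sync with CPU_FTR_TM */
                cur_cpu_spec->cpu_features &= ~CPU_FTR_TM;
                cur_cpu_spec->cpu_user_features2 &=
                        ~(PPC_FEATURE2_HTM | PPC_FEATURE2_HTM_NOSC);
        }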
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Update cpu_user_features2 in scan_features() · beff8237
      Authored by Anton Blanchard
      scan_features() updates cpu_user_features but not cpu_user_features2.
      
      Amongst other things, cpu_user_features2 contains the user TM feature
      bits which we must keep in sync with the kernel TM feature bit.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: scan_features() updates incorrect bits for REAL_LE · 6997e57d
      Authored by Anton Blanchard
      The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
      feature value, meaning all the remaining elements initialise the wrong
      values.
      
      This means instead of checking for byte 5, bit 0, we check for byte 0,
      bit 0, and then we incorrectly set the CPU feature bit as well as MMU
      feature bit 1 and CPU user feature bits 0 and 2 (5).
      
      Checking byte 0 bit 0 (IBM numbering) means we're looking at the
      "Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
      In practice that bit is set on all platforms which have the property.
      
      This means we set CPU_FTR_REAL_LE always. In practice that seems not to
      matter because all the modern cpus which have this property also
      implement REAL_LE, and we've never needed to disable it.
      
      We're also incorrectly setting MMU feature bit 1, which is:
      
        #define MMU_FTR_TYPE_8xx		0x00000002
      
      Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
      code, which can't run on the same cpus as scan_features(). So this also
      doesn't matter in practice.
      
      Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
      is not currently used, and bit 0 is:
      
        #define PPC_FEATURE_PPC_LE		0x00000001
      
      Which says the CPU supports the old style "PPC Little Endian" mode.
      Again this should be harmless in practice as no 64-bit CPUs implement
      that mode.
      
      Fix the code by adding the missing initialisation of the MMU feature.
      
      Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
      would be unsafe to start using it as old kernels incorrectly set it.
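
      A hedged reconstruction of the fixed table row (field order per the
      description: cpu features, mmu features, user features, pa-feature
      byte, bit, invert; neighbouring rows elided):

        /* the added 0 is the previously missing MMU-features column */
        { CPU_FTR_REAL_LE, 0, PPC_FEATURE_TRUE_LE, 5, 0, 0 },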
      
      Fixes: 44ae3ab3 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      [mpe: Flesh out changelog, add comment reserving 0x4]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 16 April 2016, 3 commits
    • x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances · 1e2ae9ec
      Authored by Vitaly Kuznetsov
      Generation2 instances don't support reporting the NMI status on port
      0x61; a read from there returns 0xff and we end up reporting a
      nonsensical PCI error (as there is no PCI bus in these instances) on
      all NMIs:
      
          NMI: PCI system error (SERR) for reason ff on CPU 0.
          Dazed and confused, but trying to continue
      
      Fix the issue by overriding x86_platform.get_nmi_reason.  Use the
      'booted via EFI' flag to detect Gen2 instances.
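
      A hedged sketch of the override (the helper name is assumed; the body
      is the trivial part, since there is no port 0x61 to consult):

        static unsigned char hv_get_nmi_reason(void)
        {
                return 0;   /* no bogus 0xff, hence no PCI SERR report */
        }

        /* in the Hyper-V platform setup path: Gen2 == booted via EFI */
        if (efi_enabled(EFI_BOOT))
                x86_platform.get_nmi_reason = hv_get_nmi_reason;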
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/1460728232-31433-1-git-send-email-vkuznets@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • s390: add CPU_BIG_ENDIAN config option · 2fd92273
      Authored by Heiko Carstens
      Make sure that s390 appears to be a big endian machine by defining
      this config option.
      
      Without this, s390 appears to be little endian as seen by e.g. the
      recordmcount script: perl ./scripts/recordmcount.pl "s390" "little"
      "64".
      This has no practical impact within the script, since the endian
      variable is only evaluated for mips.  However, there are already a
      couple of common code places which evaluate this config option; none
      of them is relevant for s390 currently, though.
      
      To avoid any issues in the future (and fix the recordmcount oddity)
      add the new config option.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/spinlock: avoid yield to non existent cpu · 84976952
      Authored by Heiko Carstens
      arch_spin_lock_wait_flags() checks if a spinlock is not held before
      trying a compare and swap instruction.  If the lock is unlocked, it
      tries the compare and swap instruction; however, if a different cpu
      grabbed the lock in the meantime, the instruction will fail as
      expected.
      
      Subsequently, arch_spin_lock_wait_flags() incorrectly tries to figure
      out whether the cpu that holds the lock is running.  However, it uses
      the wrong cpu number for this (-1) and then also yields the current
      cpu to the wrong cpu.
      
      Fix this by adding the missing continue statement.
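
      A hedged, simplified shape of the wait loop (the s390 primitives
      shown are 4.6-era; surrounding details are elided):

        while (1) {
                owner = ACCESS_ONCE(lp->lock);
                if (owner == 0) {
                        if (_raw_compare_and_swap(&lp->lock, 0, cpu))
                                return;        /* got the lock */
                        continue;              /* lost the race: re-read */
                }
                /* without the continue above, we would reach this point
                 * with owner == 0 and try to yield to cpu -1 */
                /* ... check whether 'owner' runs, maybe smp_yield_cpu() ... */
        }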
      
      Fixes: 470ada6b ("s390/spinlock: refactor arch_spin_lock_wait[_flags]")
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  12. 15 April 2016, 1 commit