1. 17 Jun 2015 (2 commits)
    • x86: don't use module_init in non-modular devicetree.c code · d54b675a
      Committed by Paul Gortmaker
      The devicetree.o is built for "OF" -- which is bool, and hence
      this code is either present or absent.  It will never be modular,
      so using module_init as an alias for __initcall can be somewhat
      misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged in favor of
      one of the priority-categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of device_initcall
      directly in this change means that the runtime impact is
      zero -- it will remain at level 6 in initcall ordering.
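      
      A minimal sketch of the pattern being changed (the function here is
      illustrative, not the actual devicetree.c initcall):
      
        /* device_initcall() comes from <linux/init.h>; no module.h needed */
        static int __init x86_dtb_setup(void)
        {
                /* one-time boot setup for the built-in OF code */
                return 0;
        }
        
        /* before: module_init(x86_dtb_setup);  -- implies modular semantics */
        device_initcall(x86_dtb_setup);         /* same level-6 initcall */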
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • x86: don't use module_init in non-modular intel_mid_vrtc.c · 4711e2f9
      Committed by Paul Gortmaker
      The X86_INTEL_MID option is bool, and hence this code is either
      present or absent.  It will never be modular, so using
      module_init as an alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Note that direct use of __initcall is discouraged in favor of
      one of the priority-categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of device_initcall
      directly in this change means that the runtime impact is
      zero -- it will remain at level 6 in initcall ordering.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  2. 11 Jun 2015 (1 commit)
    • arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug · 5ec45a19
      Committed by Andrew Morton
      Fix this compile issue with gcc-4.4.4:
      
         arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
         arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer
         arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer
         arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer
         ...
      
      gcc-4.4.4 (at least) has issues when using anonymous unions in
      initializers.
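      
      A reduced illustration of the compiler limitation (simplified; the
      field names merely echo the error output above, this is not the kvm
      code itself):
      
        /* a union with an anonymous struct member */
        union split_role {
                unsigned word;
                struct {
                        unsigned cr0_wp:1;
                        unsigned cr4_pae:1;
                };
        };
        
        /* gcc-4.4.4: "unknown field 'cr0_wp' specified in initializer"  */
        /* static union split_role mask = { .cr0_wp = 1, .cr4_pae = 1 }; */
        
        /* the workaround: zero the object, then assign members directly */
        static union split_role mask;
        static void mask_init(void)
        {
                mask.word = 0;
                mask.cr0_wp = 1;
                mask.cr4_pae = 1;
        }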
      
      Fixes: edc90b7d ("KVM: MMU: fix SMAP virtualization")
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 09 Jun 2015 (1 commit)
    • Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization" · 15c12479
      Committed by Ingo Molnar
      This reverts commit c05199e5.
      
      Vince Weaver reported the following crash while perf fuzzing:
      
      [   79.473121] kernel BUG at mm/vmalloc.c:1335!
      [   79.694391] Call Trace:
      [   79.696997]  <IRQ>
      [   79.699090]  [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
      [   79.705505]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
      [   79.712414]  [<ffffffff810635e5>] __ioremap_caller+0x195/0x350
      [   79.718610]  [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
      [   79.725462]  [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
      [   79.732346]  [<ffffffff810637b7>] ioremap_nocache+0x17/0x20
      [   79.738283]  [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90
      [   79.744945]  [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110
      [   79.752020]  [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60
      [   79.758832]  [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330
      [   79.765519]  [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0
      [   79.771481]  [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90
      [   79.777858]  [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0
      [   79.784418]  [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30
      [   79.790820]  [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40
      [   79.797546]  [<ffffffff8115ef80>] remote_function+0x50/0x60
      [   79.803535]  [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180
      [   79.810840]  [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60
      [   79.819328]  [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0
      [   79.827614]  [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80
      [   79.835465]  <EOI>
      [   79.837543]  [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160
      [   79.844377]  [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160
      [   79.851015]  [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20
      [   79.856791]  [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440
      [   79.863165]  [<ffffffff816c9ddb>] rest_init+0xbb/0xd0
      
      The offending commit is clearly confused as it moves heavy initialization
      work into IPI context.
      
      Revert it.
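      
      The constraint the revert restores, sketched in simplified form
      (names and constants here are placeholders, not the uncore driver's
      own code):
      
        static void __iomem *imc_base;
        
        /* OK: driver/box init runs in process context and may sleep */
        static int imc_box_init(void)
        {
                imc_base = ioremap_nocache(IMC_BAR_PHYS, IMC_BAR_LEN);
                return imc_base ? 0 : -ENOMEM;
        }
        
        /* NOT OK: event start can run in IPI context; calling ioremap()
         * there walks into get_vm_area() and hits the BUG in the trace
         * above. The hot path may only touch mappings set up beforehand. */
        static void imc_event_start(void)
        {
                writel(1, imc_base + IMC_CTL_ENABLE);   /* cheap MMIO only */
        }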
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 07 Jun 2015 (1 commit)
  5. 04 Jun 2015 (1 commit)
  6. 02 Jun 2015 (1 commit)
    • x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers · 425be567
      Committed by Andy Lutomirski
      The early_idt_handlers asm code generates an array of entry
      points spaced nine bytes apart.  It's not really clear from that
      code or from the places that reference it what's going on, and
      the code only works in the first place because GAS never
      generates two-byte JMP instructions when jumping to global
      labels.
      
      Clean up the code to generate the correct array stride (member size)
      explicitly. This should be considerably more robust against
      screw-ups, as GAS will warn if a .fill directive has a negative
      count.  Using '. =' to advance would have been even more robust
      (it would generate an actual error if it tried to move
      backwards), but it would pad with nulls, confusing anyone who
      tries to disassemble the code.  The new scheme should be much
      clearer to future readers.
      
      While we're at it, improve the comments and rename the array and
      common code.
      
      Binutils may start relaxing jumps to non-weak labels.  If so,
      this change will fix our build, and we may need to backport this
      change.
      
      Before, on x86_64:
      
        0000000000000000 <early_idt_handlers>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                                5: R_X86_64_PC32        early_idt_handler-0x4
        ...
          48:   66 90                   xchg   %ax,%ax
          4a:   6a 08                   pushq  $0x8
          4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                                4d: R_X86_64_PC32       early_idt_handler-0x4
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                                11c: R_X86_64_PC32      early_idt_handler-0x4
      
      After:
      
        0000000000000000 <early_idt_handler_array>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
        ...
          48:   6a 08                   pushq  $0x8
          4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
          4f:   cc                      int3
          50:   cc                      int3
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   eb 03                   jmp    120 <early_idt_handler_common>
         11d:   cc                      int3
         11e:   cc                      int3
         11f:   cc                      int3
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Binutils <binutils@sourceware.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 29 May 2015 (2 commits)
    • x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h · 927392d7
      Committed by Ingo Molnar
      Linus reported the following new warning on x86 allmodconfig with GCC 5.1:
      
        > ./arch/x86/include/asm/spinlock.h: In function ‘arch_spin_lock’:
        > ./arch/x86/include/asm/spinlock.h:119:3: warning: implicit declaration
        > of function ‘__ticket_lock_spinning’ [-Wimplicit-function-declaration]
        >    __ticket_lock_spinning(lock, inc.tail);
        >    ^
      
      This warning triggers because of these hacks in misc.h:
      
        /*
         * we have to be careful, because no indirections are allowed here, and
         * paravirt_ops is a kind of one. As it will only run in baremetal anyway,
         * we just keep it from happening
         */
        #undef CONFIG_PARAVIRT
        #undef CONFIG_KASAN
      
      But these hacks were not updated when CONFIG_PARAVIRT_SPINLOCKS was added,
      and eventually (with the introduction of queued paravirt spinlocks in
      recent kernels) this created an invalid Kconfig combination and broke
      the build.
      
      So add a CONFIG_PARAVIRT_SPINLOCKS #undef line as well.
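      
      The hack block in misc.h then plausibly ends up as (a sketch of the
      result, per the description above):
      
        /*
         * we have to be careful, because no indirections are allowed here,
         * and paravirt_ops is a kind of one. As it will only run in
         * baremetal anyway, we just keep it from happening
         */
        #undef CONFIG_PARAVIRT
        #undef CONFIG_PARAVIRT_SPINLOCKS
        #undef CONFIG_KASAN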
      
      Also remove the _ASM_X86_DESC_H quirk: that undocumented quirk
      was originally added ages ago, in:
      
        099e1377 ("x86: use ELF format in compressed images.")
      
      and I went back to that kernel (after fixing up the main
      Makefile, which no longer built) and checked what failure the
      quirk avoided: an include-file dependency build failure
      involving our old x86-platforms code.
      
      That old code is long gone, the header dependencies got cleaned
      up, and the build no longer fails with the totality of
      asm/desc.h included - so remove the quirk.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode · 7ba554b5
      Committed by Jan Beulich
      While commit efa70451 ("x86/asm/entry: Make user_mode() work
      correctly if regs came from VM86 mode") claims that "user_mode()
      is now identical to user_mode_vm()", this wasn't actually the
      case - no prior commit made it so.
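      
      The point is that on 32-bit a VM86 frame must count as user mode even
      though CS's RPL says otherwise; a sketch of the corrected predicate
      (close to the user_mode_vm() logic the title refers to):
      
        static inline int user_mode(struct pt_regs *regs)
        {
        #ifdef CONFIG_X86_32
                /* EFLAGS.VM is set in VM86 mode; folding it in forces the
                 * comparison to treat any VM86 frame as >= USER_RPL. */
                return ((regs->cs & SEGMENT_RPL_MASK) |
                        (regs->flags & X86_VM_MASK)) >= USER_RPL;
        #else
                return !!(regs->cs & 3);  /* no VM86 on 64-bit */
        #endif
        }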
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5566EB0D020000780007E655@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 28 May 2015 (2 commits)
  9. 27 May 2015 (4 commits)
    • perf/x86: Tweak broken BIOS rules during check_hw_exists() · 68ab7476
      Committed by Don Zickus
      I stumbled upon an AMD box that had the BIOS using a hardware performance
      counter. Instead of printing out a warning and continuing, it failed and
      blocked further perf counter usage.
      
      Looking through the history, I found this commit:
      
        a5ebe0ba ("perf/x86: Check all MSRs before passing hw check")
      
      which tweaked the rules for a Xen guest on an almost identical box and now
      changed the behaviour.
      
      Unfortunately the rules were tweaked incorrectly and will always lead to
      MSR failures even though the MSRs are completely fine.
      
      What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists():
      
      <snip>
              for (i = 0; i < x86_pmu.num_counters; i++) {
                      reg = x86_pmu_config_addr(i);
                      ret = rdmsrl_safe(reg, &val);
                      if (ret)
                              goto msr_fail;
                      if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
                              bios_fail = 1;
                              val_fail = val;
                              reg_fail = reg;
                      }
              }
      
      <snip>
              /*
               * Read the current value, change it and read it back to see if it
               * matches, this is needed to detect certain hardware emulators
               * (qemu/kvm) that don't trap on the MSR access and always return 0s.
               */
              reg = x86_pmu_event_addr(0);
      				^^^^
      
      if the first perf counter is enabled, then this routine will always fail
      because the counter is running. :-(
      
              if (rdmsrl_safe(reg, &val))
                      goto msr_fail;
              val ^= 0xffffUL;
              ret = wrmsrl_safe(reg, val);
              ret |= rdmsrl_safe(reg, &val_new);
              if (ret || val != val_new)
                      goto msr_fail;
      
      The above bios_fail used to be a 'goto' which is why it worked in the past.
      
      Further, most vendors have migrated to using fixed counters to hide their
      evilness, hence this problem rarely shows up nowadays except on a few old boxes.
      
      I fixed my problem and kept the spirit of the original Xen fix by recording a
      safe, non-enabled register to be used for the read/write check.
      Because it is not enabled, this passes on bare metal boxes (like mine), but
      should continue to throw an msr_fail on Xen guests because the register isn't
      emulated yet.
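      
      A sketch of that approach (condensed from the snippets above; details
      may differ from the final patch):
      
              int reg_safe = -1;
      
              for (i = 0; i < x86_pmu.num_counters; i++) {
                      reg = x86_pmu_config_addr(i);
                      ret = rdmsrl_safe(reg, &val);
                      if (ret)
                              goto msr_fail;
                      if (val & ARCH_PERFMON_EVENTSEL_ENABLE) {
                              bios_fail = 1;          /* warn, don't bail */
                              val_fail = val;
                              reg_fail = reg;
                      } else {
                              reg_safe = i;           /* BIOS isn't using it */
                      }
              }
      
              /* no counter free of BIOS use: the rd/wr check can't be safe */
              if (reg_safe == -1)
                      goto msr_fail;
      
              /* do the read/modify/read-back test on the unused counter */
              reg = x86_pmu_event_addr(reg_safe);
              if (rdmsrl_safe(reg, &val))
                      goto msr_fail;
              val ^= 0xffffUL;
              ret = wrmsrl_safe(reg, val);
              ret |= rdmsrl_safe(reg, &val_new);
              if (ret || val != val_new)
                      goto msr_fail;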
      
      Now I get a proper bios_fail error message and Xen should still see their
      msr_fail message (untested).
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: george.dunlap@eu.citrix.com
      Cc: konrad.wilk@oracle.com
      Link: http://lkml.kernel.org/r/1431976608-56970-1-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/pt: Untangle pt_buffer_reset_markers() · f73ec48c
      Committed by Alexander Shishkin
      Currently, pt_buffer_reset_markers() is a difficult-to-read knot of
      arithmetic with a redundant check for multiple-entry TOPA capability,
      a commented-out wakeup marker placement and a logical error with
      respect to stop marker placement. The latter happens when the write
      head is not page aligned, and results in the stop marker being placed
      one page earlier than it actually should be.
      
      All these problems only affect PT implementations that support
      multiple-entry TOPA tables (read: proper scatter-gather).
      
      For single-entry TOPA implementations, there is no functional impact.
      
      This patch deals with all of the above. Tested on both single-entry
      and multiple-entry TOPA PT implementations.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1432308626-18845-4-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Improve HT workaround GP counter constraint · cc1790cf
      Committed by Peter Zijlstra
      The (SNB/IVB/HSW) HT bug only affects events that can be programmed
      onto GP counters, therefore we should only limit the number of GP
      counters that can be used per CPU -- in other words, we should not
      constrain the fixed (FP) counters.
      
      Furthermore, we should only enforce such a limit when there are in
      fact exclusive events being scheduled on either sibling.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix event/group validation · b371b594
      Committed by Peter Zijlstra
      Commit 43b45780 ("perf/x86: Reduce stack usage of
      x86_schedule_events()") violated the rule that 'fake' scheduling, as
      used for event/group validation, should not change the event state.
      
      This went mostly unnoticed because repeated calls to
      x86_pmu::get_event_constraints() would give the same result, and
      x86_pmu::put_event_constraints() would mostly not do anything.
      
      Commit e979121b ("perf/x86/intel: Implement cross-HT corruption
      bug workaround") made the situation much worse by actually setting the
      event->hw.constraint value to NULL, so when validation and actual
      scheduling interact we get NULL ptr derefs.
      
      Fix it by removing the constraint pointer from the event and moving it
      back into an array, this time in cpuc instead of on the stack.
      
      validate_group()
        x86_schedule_events()
          event->hw.constraint = c; # store
      
            <context switch>
              perf_task_event_sched_in()
                ...
                  x86_schedule_events();
                    event->hw.constraint = c2; # store
      
                    ...
      
                    put_event_constraints(event); # assume failure to schedule
                      intel_put_event_constraints()
                        event->hw.constraint = NULL;
      
            <context switch end>
      
          c = event->hw.constraint; # read -> NULL
      
          if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
      
      This in particular is possible when the event in question is a
      cpu-wide event and group-leader, where the validate_group() tries to
      add an event to the group.
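      
      A sketch of the resulting shape (abridged; the real patch touches all
      the scheduling call sites): the constraint lives in per-CPU state,
      indexed by event slot, instead of inside the shared perf_event:
      
        struct cpu_hw_events {
                /* ... */
                struct event_constraint *event_constraint[X86_PMC_IDX_MAX];
        };
        
        /* scheduling writes per-CPU state, never the event itself, so a
         * concurrent validate_group() run cannot see or clobber it */
        cpuc->event_constraint[i] =
                x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);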
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()")
      Fixes: e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 25 May 2015 (1 commit)
  11. 20 May 2015 (4 commits)
    • kvm/fpu: Enable eager restore kvm FPU for MPX · c447e76b
      Committed by Liang Li
      The MPX feature requires eager KVM FPU restore support. We have verified
      that MPX cannot work correctly with the current lazy KVM FPU restore
      mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
      exposed to the VM.
      Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: Liang Li <liang.z.li@intel.com>
      [Also activate the FPU on AMD processors. - Paolo]
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Revert "KVM: x86: drop fpu_activate hook" · 0fdd74f7
      Committed by Paolo Bonzini
      This reverts commit 4473b570.  We'll
      use the hook again.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: fix crash in kvm_vcpu_reload_apic_access_page · e8fd5e9e
      Committed by Andrea Arcangeli
      memslot->userfault_addr is set by the kernel with an mmap executed
      from the kernel, but userland can still munmap it, leading to the
      oops below once memslot->userfault_addr points to a host virtual
      address that has no vma or mapping.
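      
      Judging from the call trace, the fix amounts to checking the page
      lookup result before handing it on (a sketch, not the verbatim patch):
      
        void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
        {
                struct page *page;
        
                /* ... */
                page = gfn_to_page(vcpu->kvm,
                                   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
                if (is_error_page(page))
                        return;  /* hva unmapped: never touch the error page */
                kvm_x86_ops->set_apic_access_page_addr(vcpu,
                                                       page_to_phys(page));
                put_page(page);  /* this is where the oops above triggered */
        }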
      
      [  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
      [  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
      [  327.538529] Oops: 0000 [#1] SMP
      [  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
      [  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
      [  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
      [  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
      [  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
      [  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
      [  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
      [  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
      [  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
      [  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
      [  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
      [  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
      [  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
      [  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  327.540974] Stack:
      [  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
      [  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
      [  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
      [  327.541261] Call Trace:
      [  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
      [  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
      [  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
      [  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
      [  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
      [  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
      [  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
      [  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
      [  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
      [  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
      [  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
      [  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/fpu: Disable XSAVES* support for now · e88221c5
      Committed by Ingo Molnar
      The kernel's handling of 'compacted' xsave state layout is buggy:
      
          http://marc.info/?l=linux-kernel&m=142967852317199
      
      I don't have such a system, and the description there is vague, but
      from extrapolation I guess that there were two kinds of bugs
      observed:
      
        - boot crashes, due to size calculations being wrong and the dynamic
          allocation allocating a too small xstate area. (This is now fixed
          in the new FPU code - but still present in stable kernels.)
      
        - FPU state corruption and ABI breakage: if signal handlers try to
          change the FPU state in standard format, which then the kernel
          tries to restore in the compacted format.
      
      These breakages are scary, but they only occur on a small number of
      systems that have XSAVES* CPU support. Yet we have had XSAVES support
      in the upstream kernel for a large number of stable kernel releases,
      and the fixes are involved and unproven.
      
      So do the safe resolution first: disable XSAVES* support and only
      use the standard xstate format. This makes the code work and is
      easy to backport.
      
      On top of this we can work on enabling (and testing!) proper
      compacted format support, without backporting pressure, on top of the
      new, cleaned up FPU code.
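      
      Presumably the minimal form of such a disable is clearing the feature
      bit early, before any xstate setup consults it (a sketch; the exact
      location in the boot code may differ):
      
        /* in the boot-time xstate setup path */
        setup_clear_cpu_cap(X86_FEATURE_XSAVES);  /* also masks XRSTORS use */
        /* from here on, only the standard (non-compacted) format is used */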
      
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 18 May 2015 (1 commit)
    • x86/mce: Fix MCE severity messages · 17fea54b
      Committed by Borislav Petkov
      Derek noticed that a critical MCE gets reported with the wrong
      error type description:
      
        [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
        [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
        [Hardware Error]: TSC 49587b8e321cb
        [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
        [Hardware Error]: Some CPUs didn't answer in synchronization
        [Hardware Error]: Machine check: Invalid
      				   ^^^^^^^
      
      The last line with 'Invalid' should have printed the high level
      MCE error type description we get from mce_severity, i.e.
      something like:
      
        [Hardware Error]: Machine check: Action required: data load error in a user process
      
      This happens due to the fact that mce_no_way_out() iterates over
      all MCA banks and possibly overwrites the @msg argument which is
      used in the panic printing later.
      
      Change the behavior to keep the message only of the (last)
      critical MCE it detects.
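      
      A sketch of the fixed loop (condensed, with the severity-call
      signature abridged): only a bank that reaches panic severity may
      update @msg:
      
        for (i = 0; i < mca_cfg.banks; i++) {
                m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
                /* ...decode this bank... */
                if (mce_severity(&m, mca_cfg.tolerant, &tmp, true) >=
                    MCE_PANIC_SEVERITY) {
                        ret = 1;
                        *msg = tmp;     /* keep only the critical message */
                }
        }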
      Reported-by: Derek <denc716@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 13 May 2015 (1 commit)
  14. 11 May 2015 (5 commits)
  15. 08 May 2015 (1 commit)
  16. 06 May 2015 (4 commits)
  17. 05 May 2015 (1 commit)
  18. 01 May 2015 (1 commit)
    • x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus · 2c62e849
      Committed by Jiang Liu
      An IO port or MMIO resource assigned to a PCI host bridge may be
      consumed by the host bridge itself or available to its child
      bus/devices. The ACPI specification defines a bit (Producer/Consumer)
      to tell whether the resource is consumed by the host bridge itself,
      but firmware hasn't used that bit consistently, so we can't rely on it.
      
      Before commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource
      interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored
      all IO port resources defined by acpi_resource_io and
      acpi_resource_fixed_io to filter out IO ports consumed by the host
      bridge itself.
      
      Commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource interfaces
      to simplify implementation") started accepting all IO port and MMIO
      resources, which caused a regression that IO port resources consumed
      by the host bridge itself became available to its child devices.
      
      Then commit 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by
      host bridge itself") ignored resources consumed by the host bridge
      itself by checking the IORESOURCE_WINDOW flag, which accidentally removed
      MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32
      and acpi_resource_fixed_memory32.
      
      On x86 and IA64 platforms, all IO port and MMIO resources are assumed
      to be available to child bus/devices except one special case:
          IO port [0xCF8-0xCFF] is consumed by the host bridge itself
          to access PCI configuration space.
      
      So explicitly filter out the PCI CFG IO ports [0xCF8-0xCFF]. This
      solution will also ease the way to consolidate ACPI PCI host bridge
      common code from x86, ia64 and ARM64.
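      
      The filter itself is small; a sketch of the check (helper name
      illustrative):
      
        /* I/O ports 0xCF8-0xCFF: PCI config-space access mechanism #1,
         * consumed by the host bridge itself, never by child devices */
        static bool resource_is_pcicfg_ioport(struct resource *res)
        {
                return (res->flags & IORESOURCE_IO) &&
                       res->start == 0xCF8 && res->end == 0xCFF;
        }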
      
      Related ACPI tables are archived at:
      https://bugzilla.kernel.org/show_bug.cgi?id=94221
      
      Related discussions at:
      http://patchwork.ozlabs.org/patch/461633/
      https://lkml.org/lkml/2015/3/29/304
      
      Fixes: 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by host bridge itself")
      Reported-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  19. 30 Apr 2015 (1 commit)
    • xen: Suspend ticks on all CPUs during suspend · 2b953a5e
      Committed by Boris Ostrovsky
      Commit 77e32c89 ("clockevents: Manage device's state separately for
      the core") decouples clockevent device's modes from states. With this
      change when a Xen guest tries to resume, it won't be calling its
      set_mode op which needs to be done on each VCPU in order to make the
      hypervisor aware that we are in oneshot mode.
      
      This happens because clockevents_tick_resume() (which is an intermediate
      step of resuming ticks on a processor) doesn't call clockevents_set_state()
      anymore and because during suspend clockevent devices on all VCPUs (except
      for the one doing the suspend) are left in ONESHOT state. As a result, during
      resume the clockevents state machine will assume that device is already
      where it should be and doesn't need to be updated.
      
      To avoid this problem we should suspend ticks on all VCPUs during
      suspend.
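      
      A sketch of that, mirroring the existing resume-side notifier (names
      follow that pattern and may not match the patch exactly):
      
        static void xen_vcpu_notify_suspend(void *data)
        {
                tick_suspend_local();   /* park this VCPU's clockevent */
        }
        
        /* called from the suspend path, before VCPUs are paused */
        on_each_cpu(xen_vcpu_notify_suspend, NULL, 1);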
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  20. 27 Apr 2015 (3 commits)
    • x86: pvclock: Really remove the sched notifier for cross-cpu migrations · 73459e2a
      Committed by Paolo Bonzini
      This reverts commits 0a4e6be9
      and 80f7fdb1.
      
      The task migration notifier was originally introduced in order to support
      the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
      with synchronized TSC.  Hence, on KVM the race condition is only needed
      due to a bad implementation on the host side, and even then it's so rare
      that it's mostly theoretical.
      
      As far as KVM is concerned it's possible to fix the host, avoiding the
      additional complexity in the vDSO and the (re)introduction of the task
      migration notifier.
      
      Xen, on the other hand, hasn't yet implemented vsyscall support at
      all, so we do not care about its plans for non-synchronized TSC.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: fix kvmclock update protocol · 5dca0d91
      Committed by Radim Krčmář
      The kvmclock spec says that the host will increment a version field to
      an odd number, then update stuff, then increment it to an even number.
      The host is buggy and doesn't do this, and the result is observable
      when one vcpu reads another vcpu's kvmclock data.
      
      There's no good way for a guest kernel to keep its vdso from reading
      a different vcpu's kvmclock data, but we don't need to care about
      changing VCPUs as long as we read consistent data from kvmclock.
      (The VCPU can change outside of this loop too, so it doesn't matter
      if we return a value not fit for this VCPU.)
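      
      The guest-side counterpart is a seqcount-style retry loop (a condensed
      sketch of the pvclock read; the real code also folds in a scaled TSC
      delta):
      
        static u64 pvclock_read(struct pvclock_vcpu_time_info *src)
        {
                u32 version;
                u64 time;
        
                do {
                        version = src->version;
                        rmb();                  /* version before payload */
                        time = src->system_time;
                        rmb();                  /* payload before re-check */
                /* retry while the host is mid-update (odd version) or the
                 * version changed under us */
                } while ((src->version & 1) || version != src->version);
        
                return time;
        }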
      
      Based on a patch by Radim Krčmář.
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue · 61f01dd9
      Committed by Andy Lutomirski
      AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
      SS == 0 results in an invalid usermode state in which SS is apparently
      equal to __USER_DS but causes #SS if used.
      
      Work around the issue by setting SS to __KERNEL_DS in __switch_to(),
      thus ensuring that SYSRET never happens with SS set to NULL.
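      
      A sketch of that in the context-switch path (condensed; the real code
      gates this on an AMD-specific bug flag):
      
        unsigned short ss_sel;
        
        /* AMD SYSRET leaves SS's cached descriptor unusable when SS is
         * NULL; reload a known-good selector so later SYSRETs are safe */
        savesegment(ss, ss_sel);
        if (ss_sel != __KERNEL_DS)
                loadsegment(ss, __KERNEL_DS);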
      
      This was exposed by a recent vDSO cleanup.
      
      Fixes: e7d6eefa ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 24 Apr 2015 (2 commits)
    • x86: fix special __probe_kernel_write() tail zeroing case · d869844b
      Committed by Linus Torvalds
      Commit cae2a173 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
      fixed the failure case tail zeroing of one special case of the x86-64
      generic user-copy routine, namely when used for the user-to-user case
      ("copy_in_user()").
      
      But in the process it broke an even more unusual case: using the user
      copy routine for kernel-to-kernel copying.
      
      Now, normally kernel-kernel copies are obviously done using memcpy(),
      but we have a couple of special cases when we use the user-copy
      functions.  One is when we pass a kernel buffer to a regular user-buffer
      routine, using set_fs(KERNEL_DS).  That's a "normal" case, and continued
      to work fine, because it never takes any faults (with the possible
      exception of a silent and successful vmalloc fault).
      
      But Jan Beulich pointed out another, very unusual, special case: when we
      use the user-copy routines not because it's a path that expects a user
      pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
      copy, but do so using "unsafe" buffers, and use the user-copy routine to
      gracefully handle faults.  IOW, for probe_kernel_write().
      
      And that broke for the case of a faulting kernel destination, because we
      saw the kernel destination and wanted to try to clear the tail of the
      buffer.  Which doesn't work, since that's what faults.
      
      This only triggers for things like kgdb and ftrace users (e.g. trying
      to set a breakpoint on read-only memory), but it's definitely a bug.
      The fix is to not compare against the kernel address start (TASK_SIZE),
      but instead use the same limits "access_ok()" uses.
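      
      In sketch form, the idea in the tail-handling helper (using the
      __addr_ok() limit as one plausible stand-in for the access_ok() test):
      
        /*
         * Only tail-clear destinations a user copy may legitimately
         * touch; a genuinely-kernel destination can itself be the
         * faulting address, so writing to it would just fault again.
         */
        if (__addr_ok(to))
                memset(to, 0, len);     /* user-reachable: clear the tail */
        /* else: report the short copy and leave the buffer alone */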
      Reported-and-tested-by: Jan Beulich <jbeulich@suse.com>
      Cc: stable@vger.kernel.org # 4.0
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • crypto: x86/sha512_ssse3 - fixup for asm function prototype change · 00425bb1
      Committed by Ard Biesheuvel
      Patch e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512
      SSSE3 implementation to base layer") changed the prototypes of the
      core asm SHA-512 implementations so that they are compatible with
      the prototype used by the base layer.
      
      However, in one instance, the register that was used for passing the
      input buffer was reused as a scratch register later on in the code,
      and since the input buffer param changed places with the digest param
      (which needs to be written back before the function returns), this
      resulted in the scratch register being dereferenced in a memory write
      operation, causing a GPF.
      
      Fix this by changing the scratch register to use the same register as
      the input buffer param again.
      
      Fixes: e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
      Reported-by: Bobby Powers <bobbypowers@gmail.com>
      Tested-by: Bobby Powers <bobbypowers@gmail.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>