1. 23 Jul 2014, 1 commit
    • x86, cpu: Fix cache topology for early P4-SMT · 2a226155
      Authored by Peter Zijlstra
      P4 systems with cpuid level < 4 can have SMT, but the cache topology
      description available (cpuid2) does not include SMP information.
      
      Now we know that SMT shares all cache levels, and therefore we can
      mark all available cache levels as shared.
      
      We do this by setting cpu_llc_id to ->phys_proc_id, since that's
      the same for each SMT thread. We can do this unconditionally
      since, even if there's no SMT, it still holds: the one CPU shares
      cache with only itself.
      
      This fixes a problem where such CPUs report an incorrect LLC CPU mask.
      
      This in turn fixes a crash in the scheduler, where the topology
      was built wrong: the scheduler assumes the LLC mask includes at
      least the SMT CPUs.
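      
      A minimal sketch of the idea in kernel C (an illustration of the
      commit message, not the verbatim diff; only cpu_llc_id and
      ->phys_proc_id are named by the text above):
      
          /*
           * SMT siblings share the physical package ID, so keying the
           * last-level-cache ID to it marks every cache level as
           * shared between siblings.  Without SMT it is still correct:
           * the one CPU then shares caches only with itself.
           */
          per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;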
      
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Tested-by: Bruno Wolff III <bruno@wolff.to>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140722133514.GM12054@laptop.lan
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  2. 22 Jul 2014, 2 commits
    • x86_32, entry: Store badsys error code in %eax · 8142b215
      Authored by Sven Wegener
      Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
      code, resulting in syscall() not returning proper errors for undefined
      syscalls on CPUs supporting the sysenter feature.
      
      The following code:
      
      > int result = syscall(666);
      > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
      
      results in:
      
      > result=666 errno=0 error=Success
      
      Obviously, the syscall return value is the called syscall number, but it
      should have been an ENOSYS error. When run under ptrace it behaves
      correctly, which makes it hard to debug in the wild:
      
      > result=-1 errno=38 error=Function not implemented
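      
      For reference, here is the quoted reproducer expanded into a
      complete program (syscall(2) needs _GNU_SOURCE on glibc; on a
      fixed kernel it prints result=-1 errno=38):
      
          #define _GNU_SOURCE
          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>
          
          int main(void)
          {
                  /* 666 is not a valid syscall number; a correct
                   * kernel returns -1 and sets errno to ENOSYS (38). */
                  long result = syscall(666);
                  printf("result=%ld errno=%d error=%s\n",
                         result, errno, strerror(errno));
                  return 0;
          }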
      
      The %eax register is the return value register. For debugging via ptrace
      the syscall entry code stores the complete register context on the
      stack. The badsys handlers only store the ENOSYS error code in the
      ptrace register set and do not set %eax like a regular syscall handler
      would. The old resume_userspace call chain contains code that clobbers
      %eax and it restores %eax from the ptrace registers afterwards. The same
      goes for the ptrace-enabled call chain. When ptrace is not used, the
      syscall return value is the passed-in syscall number from the untouched
      %eax register.
      
      Use %eax as the return value register in syscall_badsys and
      sysenter_badsys, like a real syscall handler does, and have the caller
      push the value onto the stack for ptrace access.
      Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
      Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
      Cc: <stable@vger.kernel.org> # If 554086d8 is backported
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, MCE: Robustify mcheck_init_device · 51cbe7e7
      Authored by Borislav Petkov
      BorisO reports that misc_register() fails often on Xen. The current code
      unregisters the CPU hotplug notifier in that case. If then a CPU is
      offlined and onlined back again, we end up with a second timer running
      on that CPU, leading to soft lockups and system hangs.
      
      So let's leave the hotcpu notifier always registered, even if
      mce_device_create() failed for some cores, and never unregister
      it, so that we can deal with the timer handling accordingly.
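      
      A sketch of the resulting error path (simplified; the device name
      follows the mce code, but the surrounding cleanup logic is
      paraphrased from the description above):
      
          err = misc_register(&mce_chrdev_device);
          if (err) {
                  /*
                   * Do NOT unregister the CPU hotplug notifier here:
                   * doing so meant a later offline/online cycle started
                   * a second per-CPU timer, causing soft lockups and
                   * hangs.  Leave it registered so it can keep the
                   * timers consistent.
                   */
                  pr_err("Unable to register MCE device (rc: %d)\n", err);
          }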
      Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  3. 16 Jul 2014, 2 commits
  4. 15 Jul 2014, 1 commit
  5. 11 Jul 2014, 2 commits
  6. 10 Jul 2014, 1 commit
    • x86/efi: Include a .bss section within the PE/COFF headers · c7fb93ec
      Authored by Michael Brown
      The PE/COFF headers currently describe only the initialised-data
      portions of the image, and result in no space being allocated for the
      uninitialised-data portions.  Consequently, the EFI boot stub will end
      up overwriting unexpected areas of memory, with unpredictable results.
      
      Fix by including a .bss section in the PE/COFF headers (functionally
      equivalent to the init_size field in the bzImage header).
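      
      For orientation, here is the 40-byte PE/COFF section header such
      a .bss entry fills in, written as a self-contained C struct
      (field names paraphrase the PE specification; the kernel emits
      the real thing from assembly in arch/x86/boot/header.S):
      
          #include <stdint.h>
          
          struct pe_section_header {
                  char     name[8];              /* ".bss" */
                  uint32_t virtual_size;         /* memory to reserve;
                                                    the init_size analogue */
                  uint32_t virtual_address;
                  uint32_t size_of_raw_data;     /* 0: nothing in the file */
                  uint32_t pointer_to_raw_data;  /* 0 */
                  uint32_t pointer_to_relocations;
                  uint32_t pointer_to_linenumbers;
                  uint16_t number_of_relocations;
                  uint16_t number_of_linenumbers;
                  uint32_t characteristics;      /* uninitialised data,
                                                    readable, writable */
          };
          
          _Static_assert(sizeof(struct pe_section_header) == 40,
                         "PE/COFF section headers are 40 bytes");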
      Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
      Cc: Thomas Bächler <thomas@archlinux.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
  7. 08 Jul 2014, 1 commit
    • KVM: x86: Check for nested events if there is an injectable interrupt · 9242b5b6
      Authored by Bandan Das
      Commit b6b8a145, which introduced vmx_check_nested_events, made
      checks for injectable interrupts happen at different points in
      time for L1 and L2, which could potentially cause a race. The
      regression occurs because KVM_REQ_EVENT is always set when
      nested_run_pending is set, even if there's no pending interrupt.
      Consequently, there is a small window in which
      check_nested_events returns without exiting to L1, but an
      interrupt comes through soon after and incorrectly gets injected
      into L2 by inject_pending_event.
      
      Fix this by also checking for nested events when a check for an
      injectable interrupt returns true.
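      
      A hedged sketch of the shape of the fix in inject_pending_event()
      (simplified; kvm_x86_ops->check_nested_events was the callback
      KVM used at the time, and the bail-out handling is elided):
      
          if (kvm_cpu_has_injectable_intr(vcpu)) {
                  /*
                   * Re-check nested events before injecting: if L1
                   * wants this interrupt, check_nested_events()
                   * arranges an exit to L1 instead of letting the
                   * interrupt be injected into L2.
                   */
                  if (kvm_x86_ops->check_nested_events)
                          kvm_x86_ops->check_nested_events(vcpu, req_int_win);
                  /* ...then proceed with injection if still appropriate. */
          }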
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 04 Jul 2014, 1 commit
    • ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de
      Authored by Tejun Heo
      The 'sysret' fastpath does not correctly restore even all regular
      registers, much less any segment registers or eflags values. That
      is very much part of why it's faster than 'iret'.
      
      Normally that isn't a problem, because the normal ptrace() interface
      catches the process using the signal handler infrastructure, which
      always returns with an iret.
      
      However, some paths can get caught using ptrace_event() instead of the
      signal path, and for those we need to make sure that we aren't going to
      return to user space using 'sysret'.  Otherwise the modifications that
      may have been done to the register set by the tracer wouldn't
      necessarily take effect.
      
      Fix it by forcing the IRET path: set TIF_NOTIFY_RESUME from
      arch_ptrace_stop_needed(), which is invoked from ptrace_stop().
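      
      Plausibly the x86 override reduces to a macro along these lines
      (a sketch reconstructed from the description above, not the
      verbatim patch):
      
          /*
           * Whenever a tracee stops in ptrace_stop(), flag the task so
           * the exit-to-user path takes the slow, full-restore IRET
           * route rather than the sysret fastpath; any registers the
           * tracer modified are then guaranteed to take effect.
           */
          #define arch_ptrace_stop_needed(code, info)     \
          ({                                              \
                  set_thread_flag(TIF_NOTIFY_RESUME);     \
                  false;                                  \
          })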
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 02 Jul 2014, 2 commits
    • perf/x86/intel: ignore CondChgd bit to avoid false NMI handling · b292d7a1
      Authored by HATAYAMA Daisuke
      Currently, any NMI is falsely handled by the NMI watchdog's NMI
      handler if the CondChgd bit in the MSR_CORE_PERF_GLOBAL_STATUS
      MSR is set.
      
      For example, we use an external NMI to make the system panic and
      get a crash dump, but in this case the external NMI is falsely
      handled due to this issue.
      
      This commit deals with the issue simply by ignoring the CondChgd bit.
      
      Here is the explanation in detail.
      
      On x86, the NMI watchdog uses the performance monitoring feature
      to periodically signal an NMI each time a performance counter
      overflows.
      
      intel_pmu_handle_irq() is called as an NMI_LOCAL handler from the
      NMI watchdog's NMI handler, perf_event_nmi_handler(). It
      identifies the owner of a given NMI by looking at the overflow
      status bits in the MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of
      the bits are set, it handles the given NMI as its own.
      
      The problem is that intel_pmu_handle_irq() doesn't distinguish
      the CondChgd bit from the other bits. Unlike the other status
      bits, the CondChgd bit doesn't represent overflow status for
      performance counters. Thus, the CondChgd bit cannot be taken as a
      mark that a given NMI belongs to the NMI watchdog.
      
      As a result, if the CondChgd bit is set, any NMI is falsely
      handled by the NMI handler of the NMI watchdog. Also, if the type
      of the falsely handled NMI is NMI_UNKNOWN, NMI_SERR or
      NMI_IO_CHECK, the corresponding action is never performed until
      the CondChgd bit is cleared.
      
      I noticed this behavior on systems with Ivy Bridge processors:
      Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both
      systems, the CondChgd bit in the MSR_CORE_PERF_GLOBAL_STATUS MSR
      is already set at boot. It is then immediately cleared by the
      next wrmsr to the MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to
      remain 0.
      
      On the other hand, on older processors such as Nehalem and Xeon
      E7540, the CondChgd bit is not set at boot.
      
      I'm not sure about the exact behavior of the CondChgd bit, in
      particular when it is set. Although I read the Intel System
      Programmer's Manual to try to figure that out, the only
      descriptions I found are:
      
        In 18.9.1:
      
        "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
         indicate changes to the state of performancmonitoring hardware"
      
        In Table 35-2 IA-32 Architectural MSRs
      
        63 CondChg: status bits of this register has changed.
      
      This differs from the behaviour I see on the actual systems, as
      explained above.
      
      At least, I think ignoring the CondChgd bit should be enough from
      the NMI watchdog's perspective.
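      
      The proposed masking is tiny; modelled as a standalone program
      (the mask is the only load-bearing detail, bit 63 being CondChgd
      in MSR_CORE_PERF_GLOBAL_STATUS):
      
          #include <stdint.h>
          #include <stdio.h>
          
          int main(void)
          {
                  /* As if we had just read MSR_CORE_PERF_GLOBAL_STATUS
                   * with CondChgd (bit 63) set and counter 0
                   * overflowed (bit 0). */
                  uint64_t status = (1ULL << 63) | (1ULL << 0);
          
                  status &= ~(1ULL << 63);        /* ignore CondChgd */
          
                  /* Only real overflow bits remain, so the handler
                   * claims the NMI only when a counter actually
                   * overflowed. */
                  printf("overflow bits: %#llx\n",
                         (unsigned long long)status);
                  return 0;
          }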
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86, tsc: Fix cpufreq lockup · 3896c329
      Authored by Peter Zijlstra
      Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver
      locked up when doing cpu hotplug.
      
      Because we called set_cyc2ns_scale() from time_cpufreq_notifier()
      unconditionally, it got called multiple times for each frequency
      change, instead of only once, when the tsc_khz value actually
      changes.
      
      Because it gets called more than once, we run out of cyc2ns data
      slots and stall, waiting for a free one; but because we're
      halfway offline, there are no consumers to free slots.
      
      By placing the call inside the condition that actually changes tsc_khz
      we avoid superfluous calls and avoid the problem.
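      
      The shape of the fix, sketched (set_cyc2ns_scale() and tsc_khz
      are named by the text above; the surrounding variable names are
      illustrative):
      
          /* time_cpufreq_notifier(), after the fix: */
          if (tsc_khz_new != tsc_khz) {   /* frequency really changed */
                  tsc_khz = tsc_khz_new;
                  /*
                   * Moved inside the condition: one cyc2ns update per
                   * actual tsc_khz change, so repeated notifier calls
                   * no longer exhaust the cyc2ns data slots during
                   * CPU hotplug.
                   */
                  set_cyc2ns_scale(tsc_khz, freq->cpu);
          }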
      Reported-by: Mauro <registosites@hotmail.com>
      Tested-by: Mauro <registosites@hotmail.com>
      Fixes: 20d1c86a ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs")
      Cc: <stable@vger.kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 30 Jun 2014, 1 commit
  11. 25 Jun 2014, 3 commits
  12. 24 Jun 2014, 3 commits
  13. 21 Jun 2014, 1 commit
  14. 20 Jun 2014, 4 commits
  15. 19 Jun 2014, 3 commits
  16. 18 Jun 2014, 1 commit
  17. 17 Jun 2014, 1 commit
  18. 14 Jun 2014, 2 commits
  19. 13 Jun 2014, 2 commits
  20. 11 Jun 2014, 3 commits
  21. 09 Jun 2014, 1 commit
  22. 08 Jun 2014, 1 commit
    • x86/boot: EFI_MIXED should not prohibit loading above 4G · 745c5167
      Authored by Matt Fleming
      Commit 7d453eee ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced
      a regression in the functionality to load kernels above 4G. The
      relevant (incorrect) reasoning behind this change can be seen in
      the commit message:
      
        "The xloadflags field in the bzImage header is also updated to reflect
        that the kernel supports both entry points by setting both of
        XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
        XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
        guaranteed to be addressable with 32-bits."
      
      This is obviously bogus since 32-bit EFI loaders will never place the
      kernel above the 4G mark. So this restriction is entirely unnecessary.
      
      But things are worse than that: since we want to encourage people
      to always compile with CONFIG_EFI_MIXED=y so that their kernels
      work out of the box for both 32-bit and 64-bit firmware, commit
      7d453eee effectively disables XLF_CAN_BE_LOADED_ABOVE_4G
      completely.
      
      Remove the overzealous and superfluous restriction and restore the
      XLF_CAN_BE_LOADED_ABOVE_4G functionality.
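      
      For context, the xloadflags bits involved (values as in the
      upstream arch/x86/include/uapi/asm/bootparam.h); with
      CONFIG_EFI_MIXED=y the header advertises both handover entry
      points, and this fix lets XLF_CAN_BE_LOADED_ABOVE_4G be set
      alongside them again:
      
          /* xloadflags bits in the bzImage setup header */
          #define XLF_KERNEL_64                  (1 << 0)
          #define XLF_CAN_BE_LOADED_ABOVE_4G     (1 << 1)  /* restored */
          #define XLF_EFI_HANDOVER_32            (1 << 2)
          #define XLF_EFI_HANDOVER_64            (1 << 3)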
      
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  23. 07 Jun 2014, 1 commit