1. 12 5月, 2015 1 次提交
  2. 08 5月, 2015 1 次提交
  3. 06 5月, 2015 1 次提交
    • J
      x86/smpboot: Skip delays during SMP initialization similar to Xen · f5d6a52f
      Jan H. Schönherr 提交于
      Remove the per-CPU delays during SMP initialization, which seems
      to be possible on newer architectures with an x2APIC.
      
      Xen does this since 2011. In fact, this commit is basically a
      combination of the following Xen commits. The first removes the
      delays, the second fixes an issue with the removal:
      
        commit 68fce206f6dba9981e8322269db49692c95ce250
        Author: Tim Deegan <Tim.Deegan@citrix.com>
        Date:   Tue Jul 19 14:13:01 2011 +0100
      
          x86: Remove timeouts from INIT-SIPI-SIPI sequence when using x2apic.
      
          Some of the timeouts are pointless since they're waiting for the ICR
          to ack the IPI delivery and that doesn't happen on x2apic.
          The others should be benign (and are suggested in the SDM) but
          removing them makes AP bringup much more reliable on some test boxes.
      Signed-off-by: NTim Deegan <Tim.Deegan@citrix.com>
      
        commit f12ee533150761df5a7099c83f2a5fa6c07d1187
        Author: Gang Wei <gang.wei@intel.com>
        Date:   Thu Dec 29 10:07:54 2011 +0000
      
          X86: Add a delay between INIT & SIPIs for tboot AP bring-up in X2APIC case
      
          Without this delay, Xen could not bring APs up while working with
          TXT/tboot, because tboot needs some time in APs to handle INIT before
          becoming ready for receiving SIPIs (this delay was removed as part of
          c/s 23724 by Tim Deegan).
      Signed-off-by: NGang Wei <gang.wei@intel.com>
      Acked-by: NKeir Fraser <keir@xen.org>
      Acked-by: NTim Deegan <tim@xen.org>
      Committed-by: NTim Deegan <tim@xen.org>
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Cc: Anthony Liguori <aliguori@amazon.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gang Wei <gang.wei@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Deegan <tim@xen.org>
      Link: http://lkml.kernel.org/r/1430732554-7294-1-git-send-email-jschoenh@amazon.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5d6a52f
  4. 02 4月, 2015 1 次提交
  5. 01 4月, 2015 1 次提交
  6. 25 3月, 2015 1 次提交
    • D
      x86/asm/entry: Get rid of KERNEL_STACK_OFFSET · ef593260
      Denys Vlasenko 提交于
      PER_CPU_VAR(kernel_stack) was set up in a way where it points
      five stack slots below the top of stack.
      
      Presumably, it was done to avoid one "sub $5*8,%rsp"
      in syscall/sysenter code paths, where iret frame needs to be
      created by hand.
      
      Ironically, none of them benefits from this optimization,
      since all of them need to allocate additional data on stack
      (struct pt_regs), so they still have to perform subtraction.
      
      This patch eliminates KERNEL_STACK_OFFSET.
      
      PER_CPU_VAR(kernel_stack) now points directly to top of stack.
      pt_regs allocations are adjusted to allocate iret frame as well.
      Hopefully we can merge it later with 32-bit specific
      PER_CPU_VAR(cpu_current_top_of_stack) variable...
      
      Net result in generated code is that constants in several insns
      are changed.
      
      This change is necessary for changing struct pt_regs creation
      in SYSCALL64 code path from MOV to PUSH instructions.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef593260
  7. 12 3月, 2015 1 次提交
    • P
      x86: Use common outgoing-CPU-notification code · 2a442c9c
      Paul E. McKenney 提交于
      This commit removes the open-coded CPU-offline notification with new
      common code.  Among other things, this change avoids calling scheduler
      code using RCU from an offline CPU that RCU is ignoring.  It also allows
      Xen to notice at online time that the CPU did not go offline correctly.
      Note that Xen has the surviving CPU carry out some cleanup operations,
      so if the surviving CPU times out, these cleanup operations might have
      been carried out while the outgoing CPU was still running.  It might
      therefore be unwise to bring this CPU back online, and this commit
      avoids doing so.
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <x86@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: <xen-devel@lists.xenproject.org>
      2a442c9c
  8. 07 3月, 2015 1 次提交
  9. 22 1月, 2015 6 次提交
  10. 16 12月, 2014 1 次提交
  11. 10 11月, 2014 1 次提交
  12. 05 11月, 2014 1 次提交
  13. 19 10月, 2014 1 次提交
  14. 03 10月, 2014 1 次提交
  15. 24 9月, 2014 3 次提交
    • W
      sched: Fix unreleased llc_shared_mask bit during CPU hotplug · 03bd4e1f
      Wanpeng Li 提交于
      The following bug can be triggered by hot adding and removing a large number of
      xen domain0's vcpus repeatedly:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      Last level cache shared mask is built during CPU up and the
      build_sched_domain() routine takes advantage of it to setup
      the sched domain CPU topology.
      
      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domainCPU topology.
      
      This patch fix it by releasing the llc_shared_mask correctly
      during CPU disable.
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
      	enabled. In this case, each core of sockes is numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
      	When hot-removing socket#2 and #3, each core of sockets is
      	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
      	After that, when hot-adding socket#2 and #3, each core of
      	sockets is numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: NLinn Crosetto <linn@hp.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      03bd4e1f
    • L
      x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3 · 2ed53c0d
      Lan Tianyu 提交于
      With certain kernel configurations, CPU offline consumes more than
      100ms during S3.
      
      It's a timing related issue: native_cpu_die() would occasionally fall
      into a 100ms sleep when the CPU idle loop thread marked the CPU state
      to DEAD too slowly.
      
      What native_cpu_die() does is that it polls the CPU state and waits
      for 100ms if CPU state hasn't been marked to DEAD. The 100ms sleep
      doesn't make sense and is purely historic.
      
      To avoid such long sleeping, this patch adds a 'struct completion'
      to each CPU, waits for the completion in native_cpu_die() and wakes
      up the completion when the CPU state is marked to DEAD.
      
      Tested on an Intel Xeon server with 48 cores, Ivybridge and on
      Haswell laptops. The CPU offlining cost on these machines is
      reduced from more than 100ms to less than 5ms. The system
      suspend time is reduced by 2.3s on the servers.
      
      Borislav and Prarit also helped to test the patch on an AMD
      machine and a few systems of various sizes and configurations
      (multi-socket, single-socket, no hyper threading, etc.). No
      issues were seen.
      Tested-by: NPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: NLan Tianyu <tianyu.lan@intel.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: srostedt@redhat.com
      Cc: toshi.kani@hp.com
      Cc: imammedo@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
      [ Improved a few minor details in the code, cleaned up the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2ed53c0d
    • D
      x86, sched: Add new topology for multi-NUMA-node CPUs · cebf15eb
      Dave Hansen 提交于
      I'm getting the spew below when booting with Haswell (Xeon
      E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
      in the BIOS.  It seems similar to the issue that some folks from
      AMD ran in to on their systems and addressed in this commit:
      
        161270fc ("x86/smp: Fix topology checks on AMD MCM CPUs")
      
      Both these Intel and AMD systems break an assumption which is
      being enforced by topology_sane(): a socket may not contain more
      than one NUMA node.
      
      AMD special-cased their system by looking for a cpuid flag.  The
      Intel mode is dependent on BIOS options and I do not know of a
      way which it is enumerated other than the tables being parsed
      during the CPU bringup process.  In other words, we have to trust
      the ACPI tables <shudder>.
      
      This detects the situation where a NUMA node occurs at a place in
      the middle of the "CPU" sched domains.  It replaces the default
      topology with one that relies on the NUMA information from the
      firmware (SRAT table) for all levels of sched domains above the
      hyperthreads.
      
      This also fixes a sysfs bug.  We used to freak out when we saw
      the "mc" group cross a node boundary, so we stopped building the
      MC group.  MC gets exported as the 'core_siblings_list' in
      /sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
      the same 'physical_package_id' to not be listed together in
      'core_siblings_list'.  This violates a statement from
      Documentation/ABI/testing/sysfs-devices-system-cpu:
      
      	core_siblings: internal kernel map of cpu#'s hardware threads
      	within the same physical_package_id.
      
      	core_siblings_list: human-readable list of the logical CPU
      	numbers within the same physical_package_id as cpu#.
      
      The sysfs effects here cause an issue with the hwloc tool where
      it gets confused and thinks there are more sockets than are
      physically present.
      
      Before this patch, there are two packages:
      
      # cd /sys/devices/system/cpu/
      # cat cpu*/topology/physical_package_id | sort | uniq -c
           18 0
           18 1
      
      But 4 _sets_ of core siblings:
      
      # cat cpu*/topology/core_siblings_list | sort | uniq -c
            9 0-8
            9 18-26
            9 27-35
            9 9-17
      
      After this set, there are only 2 sets of core siblings, which
      is what we expect for a 2-socket system.
      
      # cat cpu*/topology/physical_package_id | sort | uniq -c
           18 0
           18 1
      # cat cpu*/topology/core_siblings_list | sort | uniq -c
           18 0-17
           18 18-35
      
      Example spew:
      ...
      	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
      	 #2  #3  #4  #5  #6  #7  #8
      	.... node  #1, CPUs:    #9
      	------------[ cut here ]------------
      	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
      	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
      	Modules linked in:
      	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
      	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
      	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
      	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
      	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
      	Call Trace:
      	[<ffffffff8172e485>] dump_stack+0x45/0x56
      	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
      	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
      	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
      	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
      	---[ end trace 3fe5f587a9fcde61 ]---
      	#10 #11 #12 #13 #14 #15 #16 #17
      	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
      	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      [ Added LLC domain and s/match_mc/match_die/ ]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brice.goglin@gmail.com
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cebf15eb
  16. 16 9月, 2014 1 次提交
  17. 31 7月, 2014 2 次提交
  18. 09 6月, 2014 1 次提交
  19. 05 6月, 2014 3 次提交
    • I
      x86/smpboot: Initialize secondary CPU only if master CPU will wait for it · 3e1a878b
      Igor Mammedov 提交于
      Hang is observed on virtual machines during CPU hotplug,
      especially in big guests with many CPUs. (It reproducible
      more often if host is over-committed).
      
      It happens because master CPU gives up waiting on
      secondary CPU and allows it to run wild. As result
      AP causes locking or crashing system. For example
      as described here:
      
         https://lkml.org/lkml/2014/3/6/257
      
      If master CPU have sent STARTUP IPI successfully,
      and AP signalled to master CPU that it's ready
      to start initialization, make master CPU wait
      indefinitely till AP is onlined.
      To ensure that AP won't ever run wild, make it
      wait at early startup till master CPU confirms its
      intention to wait for AP. If AP doesn't respond in 10
      seconds, the master CPU will timeout and cancel
      AP onlining.
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3e1a878b
    • I
      x86/smpboot: Log error on secondary CPU wakeup failure at ERR level · feef1e8e
      Igor Mammedov 提交于
      If system is running without debug level logging,
      it will not log error if do_boot_cpu() failed to
      wakeup AP. It may lead to silent AP bringup
      failures at boot time.
      Change message level to KERN_ERR to make error
      visible to user as it's done on other architectures.
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      feef1e8e
    • I
      x86: Fix list/memory corruption on CPU hotplug · 89f898c1
      Igor Mammedov 提交于
      currently if AP wake up is failed, master CPU marks AP as not
      present in do_boot_cpu() by calling set_cpu_present(cpu, false).
      That leads to following list corruption on the next physical CPU
      hotplug:
      
      [  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
      [  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
      [  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
      [  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
      [  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      [  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
      [  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
      [  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
      [  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
      [  418.178445] Call Trace:
      [  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
      [  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
      [  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
      [  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
      [  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
      [  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
      [  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
      [  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
      [  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
      [  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
      [  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
      [  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
      [  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
      [  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
      ...
      Which is caused by the fact that generic_processor_info() allocates
      logical CPU id by calling:
      
       cpu = cpumask_next_zero(-1, cpu_present_mask);
      
      which returns id of previously failed to wake up CPU, since its
      bit is cleared by do_boot_cpu() and as result register_cpu()
      tries to register another CPU with the same id as already
      present but failed to be onlined CPU.
      
      Taking in account that AP will not do anything if master CPU
      failed to wake it up, there is no reason to mark that AP as not
      present and break next cpu hotplug attempts. As a side effect of
      not marking AP as not present, user would be allowed to online
      it again later.
      
      Also fix memory corruption in acpi_unmap_lsapic()
      
      if during CPU hotplug master CPU failed to wake up AP
      it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.
      
      However following attempt to unplug that CPU will lead to
      out of bound write access to __apicid_to_node[] which is
      32768 items long on x86_64 kernel.
      
      So with above fix of cpu_present_mask make sure that a present
      CPU has a valid APIC ID by not setting x86_cpu_to_apicid
      to BAD_APICID in do_boot_cpu() on failure and allow
      acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      89f898c1
  20. 05 5月, 2014 1 次提交
  21. 01 5月, 2014 1 次提交
    • H
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a
      H. Peter Anvin 提交于
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
      3891a04a
  22. 11 3月, 2014 1 次提交
  23. 07 3月, 2014 1 次提交
    • S
      x86: Keep thread_info on thread stack in x86_32 · 198d208d
      Steven Rostedt 提交于
      x86_64 uses a per_cpu variable kernel_stack to always point to
      the thread stack of current. This is where the thread_info is stored
      and is accessed from this location even when the irq or exception stack
      is in use. This removes the complexity of having to maintain the
      thread info on the stack when interrupts are running and having to
      copy the preempt_count and other fields to the interrupt stack.
      
      x86_32 uses the old method of copying the thread_info from the thread
      stack to the exception stack just before executing the exception.
      
      Having the two different requires #ifdefs and also the x86_32 way
      is a bit of a pain to maintain. By converting x86_32 to the same
      method of x86_64, we can remove #ifdefs, clean up the x86_32 code
      a little, and remove the overhead of the copy.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
      Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      198d208d
  24. 28 2月, 2014 1 次提交
  25. 09 2月, 2014 1 次提交
  26. 16 1月, 2014 1 次提交
    • P
      x86: Add check for number of available vectors before CPU down · da6139e4
      Prarit Bhargava 提交于
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a 1-64 logical processor system:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), to compare the number of vectors
      on the CPU going down and compares it to the number of vectors available on
      the system.  If there aren't enough vectors for the CPU to go down, an
      error is returned and propogated back to userspace.
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.comReviewed-by: NGong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      da6139e4
  27. 20 12月, 2013 1 次提交
  28. 01 10月, 2013 1 次提交
    • B
      x86/boot: Further compress CPUs bootup message · a17bce4d
      Borislav Petkov 提交于
      Turn it into (for example):
      
      [    0.073380] x86: Booting SMP configuration:
      [    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
      [    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
      [    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
      [    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
      [    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
      [    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
      [    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
      [    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
      [    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
      [    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
      [    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
      [    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
      [    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
      [    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
      [    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
      [    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
      [    9.679901] x86: Booted up 16 nodes, 128 CPUs
      
      and drop useless elements.
      
      Change num_digits() to hpa's division-avoiding, cell-phone-typed
      version which he went at great lengths and pains to submit on a
      Saturday evening.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: huawei.libin@huawei.com
      Cc: wangyijing@huawei.com
      Cc: fenghua.yu@intel.com
      Cc: guohanjun@huawei.com
      Cc: paul.gortmaker@windriver.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a17bce4d
  29. 28 9月, 2013 1 次提交
    • B
      x86: Improve the printout of the SMP bootup CPU table · 646e29a1
      Borislav Petkov 提交于
      As the new x86 CPU bootup printout format code maintainer, I am
      taking immediate action to improve and clean (and thus indulge
      my OCD) the reporting of the cores when coming up online.
      
      Fix padding to a right-hand alignment, cleanup code and bind
      reporting width to the max number of supported CPUs on the
      system, like this:
      
       [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
       [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
       [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
       [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
       [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
       [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
       [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
       [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
       [    4.961413] Brought up 64 CPUs
      
      and this:
      
       [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
       [    0.686329] Brought up 8 CPUs
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Libin <huawei.libin@huawei.com>
      Cc: wangyijing@huawei.com
      Cc: fenghua.yu@intel.com
      Cc: guohanjun@huawei.com
      Cc: paul.gortmaker@windriver.com
      Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      646e29a1
  30. 25 9月, 2013 1 次提交