1. 30 Dec 2010, 1 commit
  2. 27 Oct 2010, 2 commits
  3. 21 Oct 2010, 1 commit
  4. 12 Oct 2010, 1 commit
  5. 02 Oct 2010, 1 commit
  6. 21 Sep 2010, 1 commit
  7. 18 Sep 2010, 2 commits
    • x86, hotplug: Move WBINVD back outside the play_dead loop · a68e5c94
      Committed by H. Peter Anvin
      On processors with hyperthreading, when only one thread is offlined
      the other thread can cause a spurious wakeup on the idled thread.  We
      do not want to re-WBINVD when that happens.
      
      Ideally, we should simply skip WBINVD unless we're the last thread on
      a particular core to shut down, but there might be similar issues
      elsewhere in the system.
      
      Thus, revert to the previous behavior of doing WBINVD only outside the loop.
      Partly as a result, remove the mb()'s around it: they are not
      necessary since wbinvd() is a serializing instruction, but they were
      intended to make sure the compiler didn't do any funny loop
      optimizations.
      Reported-by: Asit Mallick <asit.k.mallick@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.kernel.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <tip-ea530692@git.kernel.org>
      a68e5c94
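      A minimal sketch of the control flow this commit describes, with the
      wbinvd/hlt wrappers written out as inline asm (the kernel uses its own
      helpers; hlt_play_dead is an illustrative name, not the upstream code):

        /* Flush caches once, then loop on HLT: a spurious wakeup of an
         * idled HT sibling merely re-enters HLT, without another WBINVD. */
        static inline void wbinvd(void) { asm volatile("wbinvd" ::: "memory"); }
        static inline void halt(void)   { asm volatile("hlt"    ::: "memory"); }

        static void hlt_play_dead(void)
        {
                wbinvd();          /* once, outside the loop */
                for (;;)
                        halt();    /* wakeups fall back into HLT */
        }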
    • x86, hotplug: Use mwait to offline a processor, fix the legacy case · ea530692
      Committed by H. Peter Anvin
      The code in native_play_dead() has a number of problems:
      
      1. We should use MWAIT when available, to put ourselves into a deeper
         sleep state.
      2. We use the existence of CLFLUSH to determine if WBINVD is safe, but
         that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much
         later addition.
      3. We should do WBINVD inside the loop, just in case of something like
         setting an A bit on page tables.  Pointed out by Arjan van de Ven.
      
      This code is based in part on a previous patch by Venki Pallipadi, but
      unlike that patch this one keeps all the detection code local instead
      of pre-caching a bunch of information.  We're shutting down the CPU;
      there is absolutely no hurry.
      
      This patch moves all the code to C and deletes the global
      wbinvd_halt() which is broken anyway.
      Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090522232230.162239000@intel.com>
      ea530692
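      A sketch of the detect-locally-at-offline-time idea from this commit:
      test CPUID.1:ECX bit 3 (MONITOR/MWAIT) right in the offline path and
      fall back to WBINVD+HLT on legacy parts. Names are illustrative; the
      real native_play_dead() also picks the deepest MWAIT C-state via
      CPUID leaf 5:

        static char monitor_line[64];   /* address armed by MONITOR */

        static void play_dead_sketch(void)
        {
                unsigned int eax = 1, ebx, ecx = 0, edx;

                asm volatile("cpuid"
                             : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));

                if (!(ecx & (1 << 3))) {        /* no MWAIT: legacy CPU */
                        hlt_play_dead();        /* WBINVD is 486+, always safe */
                        return;
                }

                for (;;) {      /* deeper sleep state via MONITOR/MWAIT */
                        asm volatile("monitor"
                                     :: "a"(monitor_line), "c"(0), "d"(0));
                        asm volatile("mwait" :: "a"(0), "c"(0));
                }
        }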
  8. 16 Sep 2010, 1 commit
  9. 24 Aug 2010, 1 commit
    • x86, vmware: Remove deprecated VMI kernel support · 9863c90f
      Committed by Alok Kataria
      With the recent innovations in CPU hardware acceleration technologies
      from Intel and AMD, VMware ran a few experiments comparing these
      techniques to the guest paravirtualization technique on VMware's
      platform. The hardware-assisted virtualization techniques outperformed
      the benefits provided by VMI in most of the workloads. VMware expects
      these hardware features to be ubiquitous in a couple of years; as a
      result, VMware has started a phased retirement of this feature from
      the hypervisor.
      
      Please note that VMI has always been an optimization and non-VMI kernels
      still work fine on VMware's platform.
      The latest VMware products that support VMI are Workstation 7.0 and
      vSphere 4.0 (on the ESX side); future maintenance releases of these
      products will continue to support VMI.
      
      For more details about the VMI retirement, see:
      http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
      
      This feature removal was scheduled for 2.6.37 back in September 2009.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      9863c90f
  10. 20 Aug 2010, 1 commit
    • x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues · d7c53c9e
      Committed by Borislav Petkov
      When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
      Stuck ??" message due to multiple cores concurrently accessing the
      cpu_callin_mask, among others.
      
      These codepaths are not protected from concurrent access: there is no
      sane reason to make already complex code unnecessarily more complex,
      and we hit the issue only when insanely switching cores off- and
      online. So serialize hotplugging of cores at the sysfs level and be
      done with it.
      
      [ v2.1: fix !HOTPLUG_CPU build ]
      
      Cc: <stable@kernel.org>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100819181029.GC17171@aftab>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      d7c53c9e
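      The serialization can be pictured as one mutex bracketing the sysfs
      online/offline store path; a minimal sketch (the upstream patch adds
      cpu_hotplug_driver_lock()/unlock() wrappers; error handling elided):

        #include <linux/mutex.h>

        static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);

        void cpu_hotplug_driver_lock(void)
        {
                mutex_lock(&x86_cpu_hotplug_driver_mutex);
        }

        void cpu_hotplug_driver_unlock(void)
        {
                mutex_unlock(&x86_cpu_hotplug_driver_mutex);
        }

        /* The sysfs "online" handler then brackets the bringup, so two
         * cores can no longer race on cpu_callin_mask and friends:
         *
         *      cpu_hotplug_driver_lock();
         *      ret = cpu_up(cpu);              // or cpu_down(cpu)
         *      cpu_hotplug_driver_unlock();
         */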
  11. 19 Aug 2010, 1 commit
    • x86-32: Separate 1:1 pagetables from swapper_pg_dir · fd89a137
      Committed by Joerg Roedel
      This patch fixes machine crashes which occur when heavily exercising the
      CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
      AMD Erratum 383 and result in a fatal machine check exception. Here's
      the scenario:
      
      1. On 32-bit, the swapper_pg_dir page table is used as the initial page
      table for booting a secondary CPU.
      
      2. To make this work, swapper_pg_dir needs a direct mapping of physical
      memory in it (the low mappings). By adding those low, large page (2M)
      mappings (PAE kernel), we create the necessary conditions for Erratum
      383 to occur.
      
      3. Other CPUs which do not participate in the off- and onlining game may
      use swapper_pg_dir while the low mappings are present (when leave_mm is
      called). For all steps below, the CPU referred to is a CPU that is using
      swapper_pg_dir, and not the CPU which is being onlined.
      
      4. The presence of the low mappings in swapper_pg_dir can result
      in TLB entries for addresses below __PAGE_OFFSET to be established
      speculatively. These TLB entries are marked global and large.
      
      5. When the CPU with such TLB entry switches to another page table, this
      TLB entry remains because it is global.
      
      6. The process then generates an access to an address covered by the
      above TLB entry but there is a permission mismatch - the TLB entry
      covers a large global page not accessible to userspace.
      
      7. Due to this permission mismatch a new 4kb, user TLB entry gets
      established. Further, Erratum 383 provides for a small window of time
      where both TLB entries are present. This results in an uncorrectable
      machine check exception signalling a TLB multimatch which panics the
      machine.
      
      There are two ways to fix this issue:
      
              1. Always do a global TLB flush when a new cr3 is loaded and the
              old page table was swapper_pg_dir. I consider this a hack hard
              to understand and with performance implications
      
              2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
              does.
      
      This patch implements solution 2. It introduces a trampoline_pg_dir
      which has the same layout as swapper_pg_dir with low_mappings. This page
      table is used as the initial page table of the booting CPU. Later in the
      bringup process, it switches to swapper_pg_dir and does a global TLB
      flush. This fixes the crashes in our test cases.
      
      -v2: switch to swapper_pg_dir right after entering start_secondary() so
      that we are able to access percpu data which might not be mapped in the
      trampoline page table.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <20100816123833.GB28147@aftab>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      fd89a137
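      The core of solution 2 can be sketched in a few lines of the 32-bit
      secondary-CPU entry path (a simplified rendering, not the full patch):

        static void start_secondary_sketch(void)
        {
        #ifdef CONFIG_X86_32
                /* Booted on trampoline_pg_dir (which carries the low 1:1
                 * mappings); switch to swapper_pg_dir right away so percpu
                 * data becomes accessible, then flush global TLB entries
                 * so no stale low, large, global mappings survive. */
                load_cr3(swapper_pg_dir);
                __flush_tlb_all();
        #endif
                /* ... rest of the CPU bringup ... */
        }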
  12. 10 Aug 2010, 1 commit
  13. 31 Jul 2010, 1 commit
    • x86, mtrr: Use stop machine context to rendezvous all the cpu's · 68f202e4
      Committed by Suresh Siddha
      Use the stop machine context rather than IPI's to rendezvous all the cpus for
      MTRR initialization that happens during cpu bringup or for MTRR modifications
      during runtime.
      
      This avoids a deadlock scenario (reported by Prarit) like:

      cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
      cpu B waits for the same lock with irqs disabled using write_lock_irq
      cpu C is doing set_mtrr() (during AP bringup for example), which tries
      to rendezvous all the cpus using IPI's

      This results in C and A coming to the rendezvous point and waiting
      for B.  B is stuck forever waiting for the lock and thus never
      reaches the rendezvous point.

      Using stop cpu (run in the process context of the per-cpu keventd) to
      do this rendezvous avoids this deadlock scenario.

      Also make sure all the cpu's are in the rendezvous handler before we
      proceed with the local_irq_save() on each cpu.  This lock-step
      disabling of irqs on all the cpus avoids other deadlock scenarios
      (for example those involving the blocking smp_call_function's etc).
      
         [ This problem is very old. Marking -stable only for 2.6.35 as the
           stop_one_cpu_nowait() API is present only in 2.6.35. Any older
           kernel interested in this fix need to do some more work in backporting
           this patch. ]
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
      Acked-by: Prarit Bhargava <prarit@redhat.com>
      Cc: stable@kernel.org	[2.6.35]
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      68f202e4
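      The lock-step irq disabling described above amounts to a two-phase
      rendezvous; a minimal sketch of the handler each cpu runs (names and
      counters are illustrative, not the upstream code):

        #include <linux/atomic.h>
        #include <linux/smp.h>

        static atomic_t arrived = ATOMIC_INIT(0);

        static int mtrr_rendezvous_sketch(void *unused)
        {
                unsigned long flags;

                atomic_inc(&arrived);
                /* Phase 1: wait for every cpu with irqs still enabled, so
                 * no cpu spins with irqs off while another is blocked on
                 * a lock it can only release with irqs on. */
                while (atomic_read(&arrived) < num_online_cpus())
                        cpu_relax();

                /* Phase 2: all cpus are captured; now irqs may go off. */
                local_irq_save(flags);
                /* ... program the MTRRs ... */
                local_irq_restore(flags);
                return 0;
        }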
  14. 29 Jun 2010, 1 commit
    • workqueue: increase max_active of keventd and kill current_is_keventd() · b71ab8c2
      Committed by Tejun Heo
      Define WQ_MAX_ACTIVE and create keventd with max_active set to half of
      it, which means that keventd can now process up to WQ_MAX_ACTIVE / 2 - 1
      works concurrently.  Unless some combination can result in a dependency
      loop longer than max_active, deadlock won't happen, and thus it's
      unnecessary to check with current_is_keventd() before trying to
      schedule a work.  Kill current_is_keventd().
      
      (Lockdep annotations are broken.  We need lock_map_acquire_read_norecurse())
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      b71ab8c2
  15. 02 Jun 2010, 1 commit
    • x86, smpboot: Fix cores per node printing on boot · 4adc8b71
      Committed by Borislav Petkov
      Percpu initialization now happens after booting the cores on the
      machine, and this causes them all to be displayed as belonging to
      node 0:
      
      Jun  8 05:57:21 kepek kernel: [    0.106999] Booting Node   0,
      Processors  #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 Ok.
      
      Use early_cpu_to_node() to get the correct node of each core
      instead.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Mike Travis <travis@sgi.com>
      LKML-Reference: <20100601190455.GA14237@aftab>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4adc8b71
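      The gist of the fix, sketched: while system_state == SYSTEM_BOOTING
      the percpu areas are not initialized yet, so the node must come from
      the early static map (helper name illustrative; the upstream fix
      simply switched the announce path to early_cpu_to_node()):

        static int boot_node_of(int cpu)
        {
                if (system_state == SYSTEM_BOOTING)
                        return early_cpu_to_node(cpu);  /* static early map */
                return cpu_to_node(cpu);                /* percpu-backed */
        }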
  16. 25 May 2010, 1 commit
  17. 03 Apr 2010, 1 commit
  18. 30 Mar 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script was used as the basis of the conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to place the new include so that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of the
      specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
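      The rule the script enforces, illustrated on a hypothetical file that
      previously compiled only via the implicit percpu.h -> slab.h chain:

        #include <linux/percpu.h>
        #include <linux/slab.h>         /* now explicit: kmalloc, kfree */

        static void *buf_alloc(size_t n)
        {
                /* A file using only gfp flags (no slab calls) would
                 * include <linux/gfp.h> instead of slab.h. */
                return kmalloc(n, GFP_KERNEL);
        }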
  19. 16 Mar 2010, 1 commit
    • x86: Handle legacy PIC interrupts on all the cpu's · 36e9e1ea
      Committed by Suresh Siddha
      Ingo Molnar reported that the recent change to stop statically
      blocking IRQ0_VECTOR..IRQ15_VECTOR on all the cpu's broke boot on an
      AMD platform (with an Nvidia chipset) when the "noapic" boot option
      is used.

      On this platform, legacy PIC interrupts are delivered to all the
      cpu's instead of just the boot cpu. Not initializing the vector-to-irq
      mapping for the legacy irq's therefore left certain interrupts
      unhandled, causing a boot hang.

      Fix this by initializing the vector-to-irq mapping on all the
      logical cpu's, if the legacy IRQ is handled by the legacy PIC.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      [ -v2: io-apic-enabled improvement ]
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1268692386.3296.43.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      36e9e1ea
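      The fix boils down to seeding the per-cpu vector table for PIC-handled
      irqs on every cpu; a sketch in the style of the io-apic code of that
      era (loop bounds and the helper name are simplified):

        static void seed_legacy_vectors(int cpu)
        {
                int irq;

                /* Legacy PIC irqs 0..15 live at fixed vectors, so every
                 * logical cpu gets the same vector -> irq mapping. */
                for (irq = 0; irq < 16; irq++)
                        per_cpu(vector_irq, cpu)[IRQ0_VECTOR + irq] = irq;
        }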
  20. 27 Feb 2010, 1 commit
    • x86: Enable NMI on all cpus on UV · 78c06176
      Committed by Russ Anderson
      Enable NMI on all cpus in a UV system and add an NMI handler
      to dump_stack on each cpu.

      By default on x86 all the cpus except the boot cpu have NMI
      masked off.  This patch enables NMI on all cpus in a UV system
      and adds an NMI handler to dump_stack on each cpu.  This
      way, if a system hangs, we can NMI the machine and get a
      backtrace from all the cpus.
      
      Version 2: Use x86_platform driver mechanism for nmi init, per
                 Ingo's suggestion.
      
      Version 3: Clean up Ingo's nits.
      Signed-off-by: Russ Anderson <rja@sgi.com>
      LKML-Reference: <20100226164912.GA24439@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      78c06176
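      A sketch of such an NMI backtrace hook as a die-chain notifier (the UV
      patch wires its init up through the x86_platform driver mechanism;
      names here are simplified):

        #include <linux/kdebug.h>
        #include <linux/notifier.h>

        static int uv_nmi_dump(struct notifier_block *nb,
                               unsigned long val, void *data)
        {
                if (val != DIE_NMI)
                        return NOTIFY_OK;       /* not ours */
                dump_stack();                   /* backtrace for this cpu */
                return NOTIFY_STOP;
        }

        static struct notifier_block uv_nmi_nb = {
                .notifier_call = uv_nmi_dump,
        };

        /* from the platform nmi init hook: */
        /*      register_die_notifier(&uv_nmi_nb);     */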
  21. 20 Feb 2010, 2 commits
  22. 18 Feb 2010, 1 commit
  23. 10 Feb 2010, 1 commit
    • x86, apic: Don't use logical-flat mode when CPU hotplug may exceed 8 CPUs · 681ee44d
      Committed by Suresh Siddha
      We need to fall back from logical-flat APIC mode to physical-flat mode
      when we have more than 8 CPUs.  However, in the presence of CPU
      hotplug (with the BIOS listing not-yet-enabled but possible cpus as
      disabled cpus in MADT), we have to consider the number of possible
      CPUs rather than the number of current CPUs; otherwise we may cross
      the 8-CPU boundary when CPUs are added later.

      The 32-bit apic code can use more cleanups (like the removal of vendor
      checks in the 32-bit default_setup_apic_routing()) and more
      unifications with the 64-bit code. Yinghai has some patches in the
      works already. This patch addresses the boot issue that is reported
      in the virtualization guest context.
      
      [ hpa: incorporated function annotation feedback from Yinghai Lu ]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1265767304.2833.19.camel@sbs-t61.sc.intel.com>
      Acked-by: Shaohui Zheng <shaohui.zheng@intel.com>
      Reviewed-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      681ee44d
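      The decision, reduced to its essence (the use_* helpers are
      hypothetical; the real logic lives in default_setup_apic_routing()):

        static void setup_apic_routing_sketch(void)
        {
                /* Count possible cpus -- including hotpluggable ones the
                 * BIOS lists as disabled in MADT -- not online cpus. */
                if (num_possible_cpus() > 8)
                        use_physical_flat_mode();   /* >8 cpus addressable */
                else
                        use_logical_flat_mode();
        }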
  24. 30 Jan 2010, 1 commit
    • x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path · 9d133e5d
      Committed by Suresh Siddha
      Lowest priority delivery of logical flat mode is broken on some
      systems, such that even when the IO-APIC RTE says to deliver the
      interrupt to a particular CPU, the interrupt subsystem delivers the
      interrupt to a totally different CPU.

      For example, this behavior was observed on a P4 based system with a
      SiS chipset, as reported by Li Zefan. We have been handling this kind
      of behavior by making sure that in logical flat mode we assign the
      same vector-to-irq mappings on all the 8 possible logical cpu's.

      But we have been doing this initial assignment (__setup_vector_irq())
      a little late (before which interrupts were already enabled for a
      short duration).

      Move __setup_vector_irq() before the first irq enable point in the
      cpu online path to avoid the issue of not handling some interrupts
      that wrongly hit the cpu which is still coming online.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100129194330.283696385@sbs-t61.sc.intel.com>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      9d133e5d
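      The resulting ordering in the cpu online path, sketched (simplified
      from the smpboot code of that era):

        static void secondary_bringup_sketch(int cpu)
        {
                lock_vector_lock();
                __setup_vector_irq(cpu);     /* vector -> irq table first */
                set_cpu_online(cpu, true);
                unlock_vector_lock();

                local_irq_enable();          /* only now can irqs arrive */
        }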
  25. 12 Dec 2009, 1 commit
    • x86: Limit the number of processor bootup messages · 2eaad1fd
      Committed by Mike Travis
      When there are a large number of processors in a system, there
      is an excessive number of messages sent to the system console.
      It's estimated that with 4096 processors in a system, and the
      console baudrate set to 56K, the startup messages will take
      about 84 minutes to clear the serial port.

      This set of patches limits the number of repetitious messages
      which contain no additional information.  Much of this information
      is obtainable from /proc and /sysfs.  Some of the messages
      are also sent to the kernel log buffer as KERN_DEBUG messages so
      dmesg can be used to examine more closely any details specific to
      a problem.
      
      The new cpu bootup sequence for system_state == SYSTEM_BOOTING:
      
      Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
      Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
      ...
      Booting Node   3, Processors  #56 #57 #58 #59 #60 #61 #62 #63 Ok.
      Brought up 64 CPUs
      
      After the system is running, a single line boot message is displayed
      when CPU's are hotplugged on:
      
          Booting Node %d Processor %d APIC 0x%x
      
      Status of the following lines:
      
          CPU: Physical Processor ID:		printed once (for boot cpu)
          CPU: Processor Core ID:		printed once (for boot cpu)
          CPU: Hyper-Threading is disabled	printed once (for boot cpu)
          CPU: Thermal monitoring enabled	printed once (for boot cpu)
          CPU %d/0x%x -> Node %d:		removed
          CPU %d is now offline:		only if system_state == RUNNING
          Initializing CPU#%d:		KERN_DEBUG
      Signed-off-by: Mike Travis <travis@sgi.com>
      LKML-Reference: <4B219E28.8080601@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      2eaad1fd
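      The compact boot output comes from an announce helper along these
      lines (a sketch; the real announce_cpu() differs in detail):

        static void announce_cpu_sketch(int cpu, int apicid, int node)
        {
                static int last_node = -1;

                if (system_state == SYSTEM_BOOTING) {
                        if (node != last_node) {        /* new node line */
                                last_node = node;
                                printk(KERN_INFO
                                       "Booting Node %3d, Processors ", node);
                        }
                        printk(KERN_CONT " #%d", cpu);  /* bare cpu number */
                } else {
                        /* runtime hotplug: one informative line per cpu */
                        printk(KERN_INFO
                               "Booting Node %d Processor %d APIC 0x%x\n",
                               node, cpu, apicid);
                }
        }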
  26. 02 Dec 2009, 1 commit
  27. 16 Nov 2009, 1 commit
  28. 08 Nov 2009, 1 commit
    • hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Committed by Frederic Weisbecker
      This patch rebases the implementation of the breakpoints API on top
      of perf event instances.

      Each breakpoint is now a perf event that handles the
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons for this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:

      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle the off case: when the breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
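      From the kernel's point of view, a breakpoint is now just a perf event
      whose attr carries the address/len/type; a sketch using the helpers
      from linux/hw_breakpoint.h of that era (watching writes to a variable;
      the wrapper name is illustrative):

        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>

        static struct perf_event *watch_writes(void *addr,
                                               perf_overflow_handler_t handler,
                                               struct task_struct *tsk)
        {
                struct perf_event_attr attr;

                hw_breakpoint_init(&attr);          /* sane defaults */
                attr.bp_addr = (unsigned long)addr;
                attr.bp_len  = HW_BREAKPOINT_LEN_4; /* watch 4 bytes */
                attr.bp_type = HW_BREAKPOINT_W;     /* trigger on writes */

                /* a debug register is scheduled via the perf layer */
                return register_user_hw_breakpoint(&attr, handler, tsk);
        }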
  29. 24 Sep 2009, 1 commit
  30. 04 Sep 2009, 1 commit
    • x86, sched: Workaround broken sched domain creation for AMD Magny-Cours · 5a925b42
      Committed by Andreas Herrmann
      Current sched domain creation code can't handle multi-node processors.
      When switching to power_savings scheduling, errors show up and the
      system might hang later on (due to the broken sched domain hierarchy):
      
        # echo 0  >> /sys/devices/system/cpu/sched_mc_power_savings
        CPU0 attaching sched-domain:
         domain 0: span 0-5 level MC
          groups: 0 1 2 3 4 5
          domain 1: span 0-23 level NODE
           groups: 0-5 6-11 18-23 12-17
        ...
        # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
        CPU0 attaching sched-domain:
         domain 0: span 0-11 level MC
          groups: 0 1 2 3 4 5 6 7 8 9 10 11
        ERROR: parent span is not a superset of domain->span
          domain 1: span 0-5 level CPU
        ERROR: domain->groups does not contain CPU0
           groups: 6-11 (__cpu_power = 12288)
        ERROR: groups don't span domain->span
           domain 2: span 0-23 level NODE
            groups:
        ERROR: domain->cpu_power not set
      
        ERROR: groups don't span domain->span
        ...
      
      Fixing all aspects of power-savings scheduling for Magny-Cours needs
      some larger changes in the sched domain creation code.

      As a short-term and temporary workaround, avoid the problems by
      extending "the worst possible hack" ;-(
      and always using llc_shared_map on AMD Magny-Cours when the MC domain
      span is calculated.
      
      With this I get:
      
        # echo 1  >> /sys/devices/system/cpu/sched_mc_power_savings
        CPU0 attaching sched-domain:
         domain 0: span 0-5 level MC
          groups: 0 1 2 3 4 5
          domain 1: span 0-5 level CPU
           groups: 0-5 (__cpu_power = 6144)
           domain 2: span 0-23 level NODE
            groups: 0-5 (__cpu_power = 6144) 6-11 (__cpu_power = 6144) 18-23 (__cpu_power = 6144) 12-17 (__cpu_power = 6144)
        ...
      
      I.e. no errors during sched domain creation, no system hangs, and
      mc_power_savings scheduling also works to a certain extent.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      5a925b42
  31. 02 Sep 2009, 1 commit
  32. 31 Aug 2009, 1 commit
  33. 22 Aug 2009, 1 commit
    • x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init · d0af9eed
      Committed by Suresh Siddha
      The SDM Vol 3a section titled "MTRR considerations in MP systems"
      specifies the need to synchronize the logical cpu's while
      initializing/updating MTRRs.

      Currently the Linux kernel does this synchronization of all cpu's only
      when a single MTRR register is programmed/updated. During an AP online
      (during boot/cpu-online/resume), where we initialize all the MTRR/PAT
      registers, we don't follow this synchronization algorithm.

      This can lead to scenarios where, during a dynamic cpu online, that
      logical cpu is initializing MTRR/PAT with cache disabled (cr0.cd=1)
      etc. while the other logical HT sibling continues to run (also with
      cache disabled because of cr0.cd=1 on its sibling).
      
      Starting from Westmere, VMX transitions with cr0.cd=1 don't work
      properly (because of some VMX performance optimizations) and the
      above scenario (with one logical cpu doing VMX activity and another
      logical cpu coming online) can result in a system crash.
      
      Fix the MTRR initialization by doing a rendezvous of all the cpus.
      During boot and resume, we delay the MTRR/PAT init for the APs till
      all the logical cpu's come online; the rendezvous process at the end
      of the APs' bringup then initializes MTRR/PAT for all the APs.
      
      For dynamic single cpu online, we synchronize all the logical cpus and
      do the MTRR/PAT init on the AP that is coming online.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      d0af9eed
  34. 22 Jul 2009, 1 commit
    • x86, intel_txt: Intel TXT Sx shutdown support · 86886e55
      Committed by Joseph Cihula
      Support for graceful handling of sleep states (S3/S4/S5) after an Intel(R) TXT launch.
      
      Without this patch, attempting to place the system in one of the ACPI sleep
      states (S3/S4/S5) will cause the TXT hardware to treat this as an attack and
      will cause a system reset, with memory locked.  Not only may the subsequent
      memory scrub take some time, but the platform will be unable to enter the
      requested power state.
      
      This patch calls back into tboot so that it may properly and securely
      clean up system state and clear the secrets-in-memory flag, after
      which it places the system into the requested sleep state using ACPI
      information passed by the kernel.
      
       arch/x86/kernel/smpboot.c     |    2 ++
       drivers/acpi/acpica/hwsleep.c |    3 +++
       kernel/cpu.c                  |    7 ++++++-
       3 files changed, 11 insertions(+), 1 deletion(-)
      Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
      Signed-off-by: Shane Wang <shane.wang@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      86886e55
  35. 12 Jun 2009, 1 commit
    • x86: make zap_low_mappings usable early · 55cd6367
      Committed by Yinghai Lu
      Only one cpu is present at that point, so just call __flush_tlb for
      it. This fixes the following boot warning on x86:
      
        [    0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem)
        [    0.000000] virtual kernel memory layout:
        [    0.000000]     fixmap  : 0xffe17000 - 0xfffff000   (1952 kB)
        [    0.000000]     vmalloc : 0xf8615000 - 0xffe15000   ( 120 MB)
        [    0.000000]     lowmem  : 0xc0000000 - 0xf7e15000   ( 894 MB)
        [    0.000000]       .init : 0xc19a5000 - 0xc1a10000   ( 428 kB)
        [    0.000000]       .data : 0xc15da4bb - 0xc199af6c   (3842 kB)
        [    0.000000]       .text : 0xc1000000 - 0xc15da4bb   (5993 kB)
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
        [    0.000000] ------------[ cut here ]------------
        [    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0()
        [    0.000000] Hardware name: System Product Name
        [    0.000000] Modules linked in:
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504
        [    0.000000] Call Trace:
        [    0.000000]  [<c104aa16>] warn_slowpath_common+0x65/0x95
        [    0.000000]  [<c104aa58>] warn_slowpath_null+0x12/0x15
        [    0.000000]  [<c1073bbe>] smp_call_function_many+0x50/0x1b0
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c1073d4f>] smp_call_function+0x31/0x58
        [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
        [    0.000000]  [<c104f635>] on_each_cpu+0x26/0x65
        [    0.000000]  [<c10374b5>] flush_tlb_all+0x19/0x1b
        [    0.000000]  [<c1032ab3>] zap_low_mappings+0x4d/0x56
        [    0.000000]  [<c15d64b5>] ? printk+0x14/0x17
        [    0.000000]  [<c19b42a8>] mem_init+0x23d/0x245
        [    0.000000]  [<c19a56a1>] start_kernel+0x17a/0x2d5
        [    0.000000]  [<c19a5347>] ? unknown_bootoption+0x0/0x19a
        [    0.000000]  [<c19a5039>] __init_begin+0x39/0x41
        [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      55cd6367
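      The shape of the fix, sketched: an "early" flavor that flushes only
      the local TLB, since exactly one cpu is running at that point and the
      cross-cpu flush machinery is what triggers the warning:

        static void zap_low_mappings_sketch(bool early)
        {
                /* ... clear the low identity-mapped pagetable entries ... */

                if (early)
                        __flush_tlb();   /* local flush, no cross-cpu calls */
                else
                        flush_tlb_all(); /* IPI-based, needs SMP to be up */
        }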
  36. 07 Jun 2009, 1 commit
    • x86, apic: Fix dummy apic read operation together with broken MP handling · 103428e5
      Committed by Cyrill Gorcunov
      Ingo Molnar reported that read_apic is buggy nowadays:
      
      [    0.000000] Using APIC driver default
      [    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
      [    0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic"
      [    0.000000] APIC: disable apic facility
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b()
      [    0.000000] Hardware name: HP OmniBook PC
      
      Indeed, we still rely on the apic->read operation in an SMP-compiled
      kernel. So instead of disfiguring the SMP code with #ifdefs we
      allow apic->read to be called. To capture any unexpected results
      we check via WARN_ON_ONCE that apic->read is called for a sane
      reason, but(!) instead of OR we should use an AND logical
      operation (thanks Yinghai for spotting the root of the problem).

      Along with that, we could have a bad MP table, and we fix it in
      such a way that no SMP is started and there are no complaints about
      a BIOS bug if the apic was just disabled via the command line.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090607124840.GD4547@lenovo>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      103428e5
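      The one-character essence of the fix: the dummy-read warning must fire
      only when the APIC exists AND was not explicitly disabled, so the
      conditions are combined with a logical AND (a sketch of the check):

        static u32 native_apic_read_dummy_sketch(u32 reg)
        {
                /* with OR this warned even when the apic was disabled
                 * on purpose via the command line */
                WARN_ON_ONCE(cpu_has_apic && !disable_apic);
                return 0;
        }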
  37. 03 Jun 2009, 1 commit