1. 07 7月, 2009 1 次提交
  2. 03 7月, 2009 5 次提交
  3. 02 7月, 2009 3 次提交
  4. 01 7月, 2009 1 次提交
    • J
      x86: Mark device_nb as static and fix NULL noise · b25ae679
      Jaswinder Singh Rajput 提交于
      This sparse warning:
      
        arch/x86/kernel/amd_iommu.c:1195:23: warning: symbol 'device_nb' was not declared. Should it be static?
      
      triggers because device_nb is global but is only used in a
      single .c file. change device_nb to static to fix that - this
      also addresses the sparse warning.
      
      This sparse warning:
      
        arch/x86/kernel/amd_iommu.c:1766:10: warning: Using plain integer as NULL pointer
      
      triggers because plain integer 0 is used in place of a NULL
      pointer. change 0 to NULL to fix that - this also address the
      sparse warning.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <1246458194.6940.20.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b25ae679
  5. 29 6月, 2009 1 次提交
  6. 28 6月, 2009 1 次提交
    • H
      Revert "x86: cap iomem_resource to addressable physical memory" · ff8a4bae
      H. Peter Anvin 提交于
      This reverts commit 95ee14e4.
      Mikael Petterson <mikepe@it.uu.se> reported that at least one of his
      systems will not boot as a result.  We have ruled out the detection
      algorithm malfunctioning, so it is not a matter of producing the
      incorrect bitmasks; rather, something in the application of them
      fails.
      
      Revert the commit until we can root cause and correct this problem.
      
      -stable team: this means the underlying commit should be rejected.
      Reported-and-isolated-by: NMikael Petterson <mikpe@it.uu.se>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <200906261559.n5QFxJH8027336@pilspetsen.it.uu.se>
      Cc: stable@kernel.org
      Cc: Grant Grundler <grundler@parisc-linux.org>
      ff8a4bae
  7. 26 6月, 2009 3 次提交
  8. 24 6月, 2009 3 次提交
    • C
      x86: Fix uv bau sending buffer initialization · 9c26f52b
      Cliff Wickman 提交于
      The initialization of the UV Broadcast Assist Unit's sending
      buffers was making an invalid assumption about the
      initialization of an MMR that defines its address.
      
      The BIOS will not be providing that MMR.  So
      uv_activation_descriptor_init() should unconditionally set it.
      
      Tested on UV simulator.
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org> # for v2.6.30.x
      LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9c26f52b
    • Y
      perf_counter, x86: Set global control MSR correctly · c14dab5c
      Yong Wang 提交于
      Previous code made an assumption that the power on value of global
      control MSR has enabled all fixed and general purpose counters properly.
      
      However, this is not the case for certain Intel processors, such as
      Atom - and it might also be firmware dependent.
      
      Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
      enable bits for all privilege levels in the respective IA32_PERFEVTSELx
      or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
      respective counters. Counting is enabled if the AND'ed results is true;
      counting is disabled when the result is false.
      
      The end result is that all fixed counters are always disabled on Atom
      processors because the assumption is just invalid.
      
      Fix this by not initializing the ctrl-mask out of the global MSR,
      but setting it to perf_counter_mask.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NYong Wang <yong.y.wang@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c14dab5c
    • W
      Intel-IOMMU, intr-remap: source-id checking · f007e99c
      Weidong Han 提交于
      To support domain-isolation usages, the platform hardware must be
      capable of uniquely identifying the requestor (source-id) for each
      interrupt message. Without source-id checking for interrupt remapping
      , a rouge guest/VM with assigned devices can launch interrupt attacks
      to bring down anothe guest/VM or the VMM itself.
      
      This patch adds source-id checking for interrupt remapping, and then
      really isolates interrupts for guests/VMs with assigned devices.
      
      Because PCI subsystem is not initialized yet when set up IOAPIC
      entries, use read_pci_config_byte to access PCI config space directly.
      Signed-off-by: NWeidong Han <weidong.han@intel.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      f007e99c
  9. 23 6月, 2009 1 次提交
    • P
      x86: Move init_gbpages() to setup_arch() · 854c879f
      Pekka J Enberg 提交于
      The init_gbpages() function is conditionally called from
      init_memory_mapping() function. There are two call-sites where
      this 'after_bootmem' condition can be true: setup_arch() and
      mem_init() via pci_iommu_alloc().
      
      Therefore, it's safe to move the call to init_gbpages() to
      setup_arch() as it's always called before mem_init().
      
      This removes an after_bootmem use - paving the way to remove
      all uses of that state variable.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      854c879f
  10. 22 6月, 2009 6 次提交
    • T
      x86: ensure percpu lpage doesn't consume too much vmalloc space · 0017c869
      Tejun Heo 提交于
      On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
      percpu first chunk allocator can consume too much of vmalloc space.
      Make it fall back to 4k allocator if the consumption goes over 20%.
      
      [ Impact: add sanity check for lpage percpu first chunk allocator ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0017c869
    • T
      x86: implement percpu_alloc kernel parameter · fa8a7094
      Tejun Heo 提交于
      According to Andi, it isn't clear whether lpage allocator is worth the
      trouble as there are many processors where PMD TLB is far scarcer than
      PTE TLB.  The advantage or disadvantage probably depends on the actual
      size of percpu area and specific processor.  As performance
      degradation due to TLB pressure tends to be highly workload specific
      and subtle, it is difficult to decide which way to go without more
      data.
      
      This patch implements percpu_alloc kernel parameter to allow selecting
      which first chunk allocator to use to ease debugging and testing.
      
      While at it, make sure all the failure paths report why something
      failed to help determining why certain allocator isn't working.  Also,
      kill the "Great future plan" comment which had already been realized
      quite some time ago.
      
      [ Impact: allow explicit percpu first chunk allocator selection ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      fa8a7094
    • T
      x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Tejun Heo 提交于
      lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages are changed, this makes the two aliases point to the overlapping
      regions with different attributes which isn't allowed and known to
      cause subtle data corruption in certain cases.
      
      This can be handled in simliar manner to the x86_64 highmap alias.
      pageattr code should detect if the target pages have PMD alias and
      split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      e59a1bb2
    • T
      x86: prepare setup_pcpu_lpage() for pageattr fix · 0ff2587f
      Tejun Heo 提交于
      Make the following changes in preparation of coming pageattr updates.
      
      * Define and use array of struct pcpul_ent instead of array of
        pointers.  The only difference is ->cpu field which is set but
        unused yet.
      
      * Rename variables according to the above change.
      
      * Rename local variable vm to pcpul_vm and move it out of the
        function.
      
      [ Impact: no functional difference ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0ff2587f
    • T
      x86: rename remap percpu first chunk allocator to lpage · 97c9bf06
      Tejun Heo 提交于
      The "remap" allocator remaps large pages to build the first chunk;
      however, the name isn't very good because 4k allocator remaps too and
      the whole point of the remap allocator is using large page mapping.
      The allocator will be generalized and exported outside of x86, rename
      it to lpage before that happens.
      
      percpu_alloc kernel parameter is updated to accept both "remap" and
      "lpage" for lpage allocator.
      
      [ Impact: code cleanup, kernel parameter argument updated ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      97c9bf06
    • T
      x86: fix duplicate free in setup_pcpu_remap() failure path · c5806df9
      Tejun Heo 提交于
      In the failure path, setup_pcpu_remap() tries to free the area which
      has already been freed to make holes in the large page.  Fix it.
      
      [ Impact: fix duplicate free in failure path ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      c5806df9
  11. 21 6月, 2009 2 次提交
    • J
      perf_counter, x8: Fix L1-data-Cache-Store-Referencees for AMD · d9f2a5ec
      Jaswinder Singh Rajput 提交于
      Fix AMD's Data Cache Refills from System event.
      
      After this patch :
      
       ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null
      
       Performance counter stats for 'ls /dev/':
      
              2499484  L1-data-Cache-Load-Referencees             (scaled from 3.97%)
                70347  L1-data-Cache-Load-Misses                  (scaled from 7.30%)
                 9360  L1-data-Cache-Store-Referencees            (scaled from 8.64%)
                32804  L1-data-Cache-Prefetch-Referencees         (scaled from 17.72%)
                 7693  L1-data-Cache-Prefetch-Misses              (scaled from 22.97%)
              2180945  L1-instruction-Cache-Load-Referencees      (scaled from 28.48%)
                14518  L1-instruction-Cache-Load-Misses           (scaled from 35.00%)
                 2405  L1-instruction-Cache-Prefetch-Referencees  (scaled from 34.89%)
                71387  L2-Cache-Load-Referencees                  (scaled from 34.94%)
                18732  L2-Cache-Load-Misses                       (scaled from 34.92%)
                79918  L2-Cache-Store-Referencees                 (scaled from 36.02%)
              1295294  Data-TLB-Cache-Load-Referencees            (scaled from 35.99%)
                30896  Data-TLB-Cache-Load-Misses                 (scaled from 33.36%)
              1222030  Instruction-TLB-Cache-Load-Referencees     (scaled from 29.46%)
                  357  Instruction-TLB-Cache-Load-Misses          (scaled from 20.46%)
               530888  Branch-Cache-Load-Referencees              (scaled from 11.48%)
                 8638  Branch-Cache-Load-Misses                   (scaled from 5.09%)
      
          0.011295149  seconds time elapsed.
      
      Earlier it always shows value 0.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9f2a5ec
    • A
      x86: Set cpu_llc_id on AMD CPUs · 99bd0c0f
      Andreas Herrmann 提交于
      This counts when building sched domains in case NUMA information
      is not available.
      
      ( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
        created based on cpu_llc_id. )
      
      Currently Linux builds domains as follows:
      (example from a dual socket quad-core system)
      
       CPU0 attaching sched-domain:
        domain 0: span 0-7 level CPU
         groups: 0 1 2 3 4 5 6 7
      
        ...
      
       CPU7 attaching sched-domain:
        domain 0: span 0-7 level CPU
         groups: 7 0 1 2 3 4 5 6
      
      Ever since that is borked for multi-core AMD CPU systems.
      This patch fixes that and now we get a proper:
      
       CPU0 attaching sched-domain:
        domain 0: span 0-3 level MC
         groups: 0 1 2 3
         domain 1: span 0-7 level CPU
          groups: 0-3 4-7
      
        ...
      
       CPU7 attaching sched-domain:
        domain 0: span 4-7 level MC
         groups: 7 4 5 6
         domain 1: span 0-7 level CPU
          groups: 4-7 0-3
      
      This allows scheduler to assign tasks to cores on different sockets
      (i.e. that don't share last level cache) for performance reasons.
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      99bd0c0f
  12. 20 6月, 2009 1 次提交
  13. 19 6月, 2009 3 次提交
    • P
      perf_counter: Make callchain samples extensible · f9188e02
      Peter Zijlstra 提交于
      Before exposing upstream tools to a callchain-samples ABI, tidy it
      up to make it more extensible in the future:
      
      Use markers in the IP chain to denote context, use (u64)-1..-4095 range
      for these context markers because we use them for ERR_PTR(), so these
      addresses are unlikely to be mapped.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f9188e02
    • S
      function-graph: add stack frame test · 71e308a2
      Steven Rostedt 提交于
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86. Where it passes in the stack
      frame of the parent function, and will test that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the funtion, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was suppose to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parsic arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      71e308a2
    • P
      gcov: enable GCOV_PROFILE_ALL for x86_64 · 7bf99fb6
      Peter Oberparleiter 提交于
      Enable gcov profiling of the entire kernel on x86_64. Required changes
      include disabling profiling for:
      
      * arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
        not linked to main kernel
      * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
        profiling causes segfaults during boot (incompatible context)
      Signed-off-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bf99fb6
  14. 18 6月, 2009 9 次提交
    • H
      x86, mce: fix error path in mce_create_device() · b1f49f95
      Hidetoshi Seto 提交于
      Don't skip removing mce_attrs in route from error2.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      b1f49f95
    • Y
      x86: use zalloc_cpumask_var for mce_dev_initialized · e92fae06
      Yinghai Lu 提交于
      We need a cleared cpu_mask to record if mce is initialized, especially
      when MAXSMP is used.
      
      used zalloc_... instead
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      e92fae06
    • Y
      x86: fix duplicated sysfs attribute · 74b602c7
      Yinghai Lu 提交于
      The sysfs attribute cmci_disabled was accidentall turned into a
      duplicate of ignore_ce, breaking all other attributes.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      74b602c7
    • A
      x86: de-assembler-ize asm/desc.h · bc3f5d3d
      Alexander van Heukelum 提交于
      asm/desc.h is included in three assembly files, but the only macro
      it defines, GET_DESC_BASE, is never used. This patch removes the
      includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
      from asm/desc.h.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      bc3f5d3d
    • A
      i386: fix/simplify espfix stack switching, move it into assembly · dc4c2a0a
      Alexander van Heukelum 提交于
      The espfix code triggers if we have a protected mode userspace
      application with a 16-bit stack. On returning to userspace, with iret,
      the CPU doesn't restore the high word of the stack pointer. This is an
      "official" bug, and the work-around used in the kernel is to temporarily
      switch to a 32-bit stack segment/pointer pair where the high word of the
      pointer is equal to the high word of the userspace stackpointer.
      
      The current implementation uses THREAD_SIZE to determine the cut-off,
      but there is no good reason not to use the more natural 64kb... However,
      implementing this by simply substituting THREAD_SIZE with 65536 in
      patch_espfix_desc crashed the test application. patch_espfix_desc tries
      to do what is described above, but gets it subtly wrong if the userspace
      stack pointer is just below a multiple of THREAD_SIZE: an overflow
      occurs to bit 13... With a bit of luck, when the kernelspace
      stackpointer is just below a 64kb-boundary, the overflow then ripples
      trough to bit 16 and userspace will see its stack pointer changed by
      65536.
      
      This patch moves all espfix code into entry_32.S. Selecting a 16-bit
      cut-off simplifies the code. The game with changing the limit dynamically
      is removed too. It complicates matters and I see no value in it. Changing
      only the top 16-bit word of ESP is one instruction and it also implies
      that only two bytes of the ESPFIX GDT entry need to be changed and this
      can be implemented in just a handful simple to understand instructions.
      As a side effect, the operation to compute the original ESP from the
      ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
      three instructions have been expanded inline in entry_32.S.
      
      impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
      stack segment
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: NStas Sergeev <stsp@aknet.ru>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      dc4c2a0a
    • A
      i386: fix return to 16-bit stack from NMI handler · 2e04bc76
      Alexander van Heukelum 提交于
      Returning to a task with a 16-bit stack requires special care: the iret
      instruction does not restore the high word of esp in that case. The
      espfix code fixes this, but currently is not invoked on NMIs. This means
      that a running task gets the upper word of esp clobbered due intervening
      NMIs. To reproduce, compile and run the following program with the nmi
      watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
      see that the high bits of esp contain garbage, while the low bits are
      still correct.
      
      This patch puts the espfix code back into the NMI code path.
      
      The patch is slightly complicated due to the irqtrace infrastructure not
      being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
      Otherwise, the tail of the normal iret-code is correct for the nmi code
      path too. To be able to share this code-path, the TRACE_IRQS_IRET was
      move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
      this code explicitly disables interrupts. This short interrupts-off
      section is now not traced anymore. The return-to-kernel path now always
      includes the preliminary test to decide if the espfix code should be
      called. This is never the case, but doing it this way keeps the patch as
      simple as possible and the few extra instructions should not affect
      timing in any significant way.
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <sys/types.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <sys/syscall.h>
       #include <asm/ldt.h>
      
      int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }
      
      /* this is assumed to be usable */
       #define SEGBASEADDR 0x10000
       #define SEGLIMIT 0x20000
      
      /* 16-bit segment */
      struct user_desc desc = {
              .entry_number = 0,
              .base_addr = SEGBASEADDR,
              .limit = SEGLIMIT,
              .seg_32bit = 0,
              .contents = 0, /* ??? */
              .read_exec_only = 0,
              .limit_in_pages = 0,
              .seg_not_present = 0,
              .useable = 1
      };
      
      int main(void)
      {
              setvbuf(stdout, NULL, _IONBF, 0);
      
              /* map a 64 kb segment */
              char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                              PROT_EXEC|PROT_READ|PROT_WRITE,
                              MAP_SHARED|MAP_ANONYMOUS, -1, 0);
              if (pointer == NULL) {
                      printf("could not map space\n");
                      return 0;
              }
      
              /* write ldt, new mode */
              int err = modify_ldt(0x11, &desc, sizeof(desc));
              if (err) {
                      printf("error modifying ldt: %i\n", err);
                      return 0;
              }
      
              for (int i=0; i<1000; i++) {
              asm volatile (
                      "pusha\n\t"
                      "mov %ss, %eax\n\t" /* preserve ss:esp */
                      "mov %esp, %ebp\n\t"
                      "push $7\n\t" /* index 0, ldt, user mode */
                      "push $65536-4096\n\t" /* esp */
                      "lss (%esp), %esp\n\t" /* switch to new stack */
                      "push %eax\n\t" /* save old ss:esp on new stack */
                      "push %ebp\n\t"
                      "add $17*65536, %esp\n\t" /* set high bits */
                      "mov %esp, %edx\n\t"
      
                      "mov $10000000, %ecx\n\t" /* wait... */
                      "1: loop 1b\n\t" /* ... a bit */
      
                      "cmp %esp, %edx\n\t"
                      "je 1f\n\t"
                      "ud2\n\t" /* esp changed inexplicably! */
                      "1:\n\t"
                      "sub $17*65536, %esp\n\t" /* restore high bits */
                      "lss (%esp), %esp\n\t" /* restore old ss:esp */
                      "popa\n\t");
      
                      printf("\rx%ix", i);
              }
      
              return 0;
      }
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: NStas Sergeev <stsp@aknet.ru>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      2e04bc76
    • C
      x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present · 3f4c3955
      Cyrill Gorcunov 提交于
      Vegard Nossum reported:
      
      [  503.576724] ACPI: Preparing to enter system sleep state S5
      [  503.710857] Disabling non-boot CPUs ...
      [  503.716853] Power down.
      [  503.717770] ------------[ cut here ]------------
      [  503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du)
      [  503.717770] Hardware name: OptiPlex GX100
      [  503.717770] Modules linked in:
      [  503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443
      [  503.717770] Call Trace:
      [  503.717770]  [<c154d327>] ? printk+0x18/0x1a
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c10360fc>] warn_slowpath_common+0x6c/0xc0
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1036165>] warn_slowpath_null+0x15/0x20
      [  503.717770]  [<c1017358>] native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1017173>] disconnect_bsp_APIC+0x63/0x100
      [  503.717770]  [<c1019e48>] disable_IO_APIC+0xb8/0xc0
      [  503.717770]  [<c1214231>] ? acpi_power_off+0x0/0x29
      [  503.717770]  [<c1015e55>] native_machine_shutdown+0x65/0x80
      [  503.717770]  [<c1015c36>] native_machine_power_off+0x26/0x30
      [  503.717770]  [<c1015c49>] machine_power_off+0x9/0x10
      [  503.717770]  [<c1046596>] kernel_power_off+0x36/0x40
      [  503.717770]  [<c104680d>] sys_reboot+0xfd/0x1f0
      [  503.717770]  [<c109daa0>] ? perf_swcounter_event+0xb0/0x130
      [  503.717770]  [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120
      [  503.717770]  [<c102dfc6>] ? finish_task_switch+0x56/0xd0
      [  503.717770]  [<c154da1e>] ? schedule+0x49e/0xb40
      [  503.717770]  [<c10444b0>] ? sys_kill+0x70/0x160
      [  503.717770]  [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50
      [  503.717770]  [<c10dd443>] ? sys_ioctl+0x63/0x70
      [  503.717770]  [<c1003024>] sysenter_do_call+0x12/0x22
      [  503.717770] ---[ end trace 8157b5d0ed378f15 ]---
      
      |
      | That's including this commit:
      |
      | commit 103428e5
      |Author: Cyrill Gorcunov <gorcunov@openvz.org>
      |Date:   Sun Jun 7 16:48:40 2009 +0400
      |
      |    x86, apic: Fix dummy apic read operation together with broken MP handling
      |
      
      If we have apic disabled we don't even switch to APIC mode and do not
      calling for connect_bsp_APIC. Though on SMP compiled kernel the
      native_machine_shutdown does try to write the apic register anyway.
      
      Fix it with explicit check if we really should touch apic registers.
      Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090617181322.GG10822@lenovo>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f4c3955
    • P
      perf_counter: x86: Set the period in the intel overflow handler · 60f916de
      Peter Zijlstra 提交于
      Commit 9e350de3 ("perf_counter: Accurate period data")
      missed a spot, which caused all Intel-PMU samples to have a
      period of 0.
      
      This broke auto-freq sampling.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60f916de
    • H
      x86: Remove duplicated #include's · 8653f88f
      Huang Weiyi 提交于
      Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com>
      LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8653f88f