1. 04 Dec, 2009: 1 commit
  2. 06 Nov, 2009: 1 commit
    • x86: Make sure get_user_desc() doesn't sign extend. · 2c75910f
      Committed by Chris Lalancette
      The current implementation of get_user_desc() sign extends the return
      value because of integer promotion rules.  For the most part, this
      doesn't matter, because the top bit of base2 is usually 0.  If, however,
      that bit is 1, then the entire value will be 0xffff...  which is
      probably not what the caller intended.
      
      This patch casts the entire thing to unsigned before returning, which
      generates almost the same assembly as the current code but replaces the
      final "cltq" (sign extend) with a "mov %eax,%eax" (zero-extend).  This
      fixes booting certain guests under KVM.
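A minimal user-space sketch of the effect described above, assuming a simplified descriptor layout (`struct seg_base_fields` and the helper names are hypothetical, not the kernel's). In the kernel the signed `int` arises implicitly from integer promotion of the narrow base fields; here the conversion is made explicit to avoid the undefined behaviour of shifting a promoted value into the sign bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for the split base-address fields of an x86
 * segment descriptor: base0 = bits 0-15, base1 = bits 16-23,
 * base2 = bits 24-31. */
struct seg_base_fields {
    uint16_t base0;
    uint8_t  base1;
    uint8_t  base2;
};

/* Assemble the 32-bit base using unsigned arithmetic throughout. */
static uint32_t seg_base32(const struct seg_base_fields *d)
{
    return (uint32_t)d->base0 | ((uint32_t)d->base1 << 16) |
           ((uint32_t)d->base2 << 24);
}

/* Old behaviour: the base ends up in a signed 32-bit int, so widening it
 * to 64 bits copies bit 31 into the upper half (what "cltq" does). */
static uint64_t seg_base_sign_extended(const struct seg_base_fields *d)
{
    /* Wraps on GCC/Clang; implementation-defined in ISO C. */
    int32_t as_int = (int32_t)seg_base32(d);
    return (uint64_t)(int64_t)as_int;
}

/* Fixed behaviour: keep the value unsigned, so widening zero-extends. */
static uint64_t seg_base_zero_extended(const struct seg_base_fields *d)
{
    return (uint64_t)seg_base32(d);
}
```

With `base2 = 0x80`, `base1 = 0x12` and `base0 = 0x3456`, the sign-extending variant yields `0xffffffff80123456` while the zero-extending one yields `0x80123456`; for `base2 < 0x80` both agree, which is why the problem usually goes unnoticed.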
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 04 Nov, 2009: 1 commit
    • x86, fs: Fix x86 procfs stack information for threads on 64-bit · 89240ba0
      Committed by Stefani Seibold
      This patch fixes two issues in the procfs stack information on
      x86-64 linux.
      
      The 32-bit loader compat_do_execve did not store the stack
      start (this was figured out by Alexey Dobriyan).

      The stack information on an x86_64 kernel always showed 0 kbyte
      stack usage, because the KSTK_ESP macro was missing an
      implementation and always returned -1.
      
      The new implementation now returns the right value.
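A hedged sketch of why a stack pointer of -1 ends up as "0 kbyte": usage for a downward-growing stack is the distance from the top of the stack to the current stack pointer, and an all-ones address lies outside any real stack range. The function, its parameters and the choice to report out-of-range values as 0 are assumptions for illustration, not the procfs code.

```c
#include <stdint.h>

/* Hypothetical sentinel: a macro that "always returned -1", viewed as an
 * unsigned address. */
#define SKETCH_SP_UNKNOWN ((uint64_t)-1)

/* Stack usage in KiB for a stack that grows down from stack_top toward
 * stack_limit. A stack pointer outside that range is treated as unknown
 * and reported as 0 (an assumption chosen to mirror the symptom above). */
static uint64_t stack_usage_kb(uint64_t stack_top, uint64_t stack_limit,
                               uint64_t sp)
{
    if (sp > stack_top || sp < stack_limit)
        return 0;
    return (stack_top - sp) >> 10; /* bytes -> KiB */
}
```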
      Signed-off-by: Stefani Seibold <stefani@seibold.net>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1257240160.4889.24.camel@wall-e>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 03 Nov, 2009: 1 commit
  5. 21 Oct, 2009: 1 commit
  6. 16 Oct, 2009: 1 commit
  7. 14 Oct, 2009: 1 commit
  8. 13 Oct, 2009: 1 commit
    • x86/paravirt: Use normal calling sequences for irq enable/disable · 71999d98
      Committed by Jeremy Fitzhardinge
      Bastian Blank reported a boot crash with stackprotector enabled,
      and debugged it back to edx register corruption.
      
      For historical reasons, irq enable/disable/save/restore had special
      calling sequences to make them more efficient.  With the more
      recent introduction of higher-level and more general optimisations,
      this is no longer necessary, so we can just use the normal PVOP_
      macros.

      This fixes some residual bugs in the old implementations which left
      edx liable to inadvertent clobbering. It also fixes some bugs in
      __PVOP_VCALLEESAVE which were revealed by actual use.
      Reported-by: Bastian Blank <bastian@waldi.eu.org>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4AD3BC9B.7040501@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 10 Oct, 2009: 1 commit
    • x86/amd-iommu: Workaround for erratum 63 · c5cca146
      Committed by Joerg Roedel
      There is an erratum for IOMMU hardware which documents
      undefined behavior when forwarding SMI requests from
      peripherals whose DTE has a sysmgt value of 01b. This
      problem caused weird IO_PAGE_FAULTS in my case.
      This patch implements the suggested workaround for that
      erratum in the AMD IOMMU driver.  The erratum is
      documented with number 63.
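A sketch of detecting the condition the erratum describes, assuming a hypothetical bit position for the 2-bit sysmgt field (the real layout comes from the AMD IOMMU specification, and the workaround the driver actually applies is not described here). The macro and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the 2-bit SysMgt field within one DTE word. */
#define DTE_SYSMGT_SHIFT 8u
#define DTE_SYSMGT_MASK  0x3u

/* Extract the 2-bit sysmgt value from a DTE word. */
static unsigned int dte_sysmgt(uint32_t dte_word)
{
    return (dte_word >> DTE_SYSMGT_SHIFT) & DTE_SYSMGT_MASK;
}

/* True when the entry matches the erratum 63 condition (sysmgt == 01b). */
static bool dte_hits_erratum_63(uint32_t dte_word)
{
    return dte_sysmgt(dte_word) == 0x1u;
}
```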
      
      Cc: stable@kernel.org
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  10. 04 Oct, 2009: 1 commit
  11. 02 Oct, 2009: 2 commits
    • x86: EDAC: MCE: Fix MCE decoding callback logic · f436f8bb
      Committed by Ingo Molnar
      Make decoding of MCEs happen only on AMD hardware by registering a
      non-default callback only on CPU families which support it.
      
      While looking at the interaction of decode_mce() with the other MCE
      code I also noticed a few other things and made the following
      cleanups/fixes:
      
       - Fixed the mce_decode() weak alias - a weak alias is really not
         good here; it should be a proper callback. A weak alias will be
         overridden if a piece of code is built into the kernel - not
         good, obviously.
      
       - The patch initializes the callback on AMD family 10h and 11h.
      
       - Added the more correct fallback printk of:
      
      	No support for human readable MCE decoding on this CPU type.
      	Transcribe the message and run it through 'mcelog --ascii' to decode.
      
         on CPUs that don't have a decoder.
      
       - Made the surrounding code more readable.
      
      Note that the callback allows us to have a default fallback -
      without having to check the CPU versions during the printout
      itself. When an EDAC module registers itself, it can install the
      decode-print function.
      
      (there's no unregister needed as this is core code.)
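The callback arrangement described above can be sketched in plain C: a function pointer that starts out at the fallback decoder and is replaced once by a decoder module, so the printout path never checks the CPU family. All names are hypothetical; only the fallback message text comes from the commit.

```c
#include <stdio.h>

/* Hypothetical stand-in for a machine-check record. */
struct mce_record_sketch {
    unsigned long long status;
};

/* Fallback used on CPUs without a human-readable decoder. */
static void default_decode_mce(const struct mce_record_sketch *m)
{
    (void)m;
    printf("No support for human readable MCE decoding on this CPU type.\n"
           "Transcribe the message and run it through 'mcelog --ascii' to decode.\n");
}

/* Hypothetical family-specific decoder an EDAC module might install. */
static void amd_decode_mce_sketch(const struct mce_record_sketch *m)
{
    printf("decoded status bits: 0x%llx\n", m->status);
}

/* The hook starts out pointing at the fallback, so it is never NULL. */
static void (*decode_mce_cb)(const struct mce_record_sketch *m) = default_decode_mce;

/* Called once by a decoder module at init time; no unregister path. */
static void register_mce_decoder(void (*fn)(const struct mce_record_sketch *m))
{
    decode_mce_cb = fn;
}

/* Printout path: no CPU-family checks here, just call the hook. */
static void print_mce_sketch(const struct mce_record_sketch *m)
{
    printf("MCE status: 0x%llx\n", m->status);
    decode_mce_cb(m);
}
```

Compared with a weak alias, the default is chosen at run time by whichever module registers, not at link time by whatever happens to be built in.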
      
      version -v2 by Borislav Petkov:
      
       - add K8 to the set of supported CPUs
      
       - always build in edac_mce_amd since we use an early_initcall now
      
       - fix checkpatch warnings
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <20091001141432.GA11410@aftab>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix csum_ipv6_magic asm memory clobber · 392d814d
      Committed by Samuel Thibault
      Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
      memory clobber, as it is only passed the address of the buffer, not a
      memory reference to the buffer itself.
      
      This caused failures in Hurd's pfinetv4 when we tried to compile it with
      gcc-4.3 (bogus checksums).
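The failure mode above can be shown with a minimal GCC/Clang sketch (x86 only; other targets take a plain-C fallback). `load_word_via_asm` is a hypothetical helper, not the kernel's csum_ipv6_magic: it passes only the address in a register, so without the `"memory"` clobber the compiler would be free to assume the asm does not read `*p` and could delay or drop an earlier store to it.

```c
#include <stdint.h>

static inline uint32_t load_word_via_asm(const uint32_t *p)
{
    uint32_t v;
#if defined(__x86_64__) || defined(__i386__)
    /* Only the address p is an operand; the load itself is invisible to
     * the compiler, hence the "memory" clobber. */
    __asm__ volatile("movl (%1), %0" : "=r"(v) : "r"(p) : "memory");
#else
    /* Non-x86 fallback: an empty asm with the same clobber acts as a
     * compiler barrier, followed by a plain C load. */
    __asm__ volatile("" : : "r"(p) : "memory");
    v = *p;
#endif
    return v;
}
```

A narrower alternative, per the GCC extended-asm documentation, is to also pass the pointed-to data as an input memory operand (for example `"m"(*p)`), which tells the compiler exactly which bytes the asm reads.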
      Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 01 Oct, 2009: 2 commits
  13. 24 Sep, 2009: 4 commits
  14. 23 Sep, 2009: 1 commit
  15. 22 Sep, 2009: 1 commit
  16. 21 Sep, 2009: 5 commits
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Committed by Ingo Molnar
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out of its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader, generic event enumeration, reporting,
      logging, monitoring and analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks go to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility are not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Use macros for .data.page_aligned section. · abe1ee3a
      Committed by Tim Abbott
      This patch changes the remaining direct references to
      .data.page_aligned in C and assembly code to use the macros in
      include/linux/linkage.h.
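A user-space sketch of the idea behind such helper macros, assuming GCC/Clang attributes: the section name and page alignment are spelled once in a macro instead of at every definition. `sketch_page_aligned_data` and `SKETCH_PAGE_SIZE` are hypothetical stand-ins, not the actual names in include/linux/linkage.h, and the section attribute is only applied on ELF targets.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

#if defined(__ELF__)
#define SKETCH_SECTION __attribute__((section(".data.page_aligned")))
#else
#define SKETCH_SECTION /* non-ELF targets: alignment only */
#endif

/* One macro carries both the section placement and the alignment. */
#define sketch_page_aligned_data \
    SKETCH_SECTION __attribute__((aligned(SKETCH_PAGE_SIZE)))

/* Initialized (non-zero) so it belongs in a data section, not bss. */
static char scratch_page[SKETCH_PAGE_SIZE] sketch_page_aligned_data = { 1 };
```

Changing the section name later then means editing one macro rather than every C and assembly file that used the literal string.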
      Signed-off-by: Tim Abbott <tabbott@ksplice.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
    • x86: Fix uaccess_32.h typo · 4fe48782
      Committed by Sergey Senozhatsky
      Trivial: correct "that the we don't" typo.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      LKML-Reference: <20090917125401.GU3717@localdomain.by>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Trivial whitespace cleanups · 878f4f53
      Committed by Felipe Contreras
      Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
      Cc: Vegard Nossum <vegardno@ifi.uio.no>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alok N Kataria <akataria@vmware.com>
      Cc: "Tan Wei Chong" <wei.chong.tan@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Bob Moore <robert.moore@intel.com>
      LKML-Reference: <1253137123-18047-2-git-send-email-felipe.contreras@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, apic: Fix missed handling of discrete apics · 8312136f
      Committed by Cyrill Gorcunov
      With discrete (pretty old) APICs the cpu_has_apic bit may not be
      set, so we also have to check whether smp_found_config (MP spec)
      is set and the APIC was not disabled.

      Also make sure the APIC/IO-APIC is printed in that case as well.
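The decision described above, reduced to a boolean sketch. The parameter names mirror the identifiers in the text, but the function itself is hypothetical and ignores the other cases the real APIC setup code handles.

```c
#include <stdbool.h>

/* cpu_has_apic:     CPUID reports an on-chip APIC
 * smp_found_config: an MP-spec configuration table was found
 * apic_disabled:    the APIC was disabled (e.g. on the command line) */
static bool should_use_apic(bool cpu_has_apic, bool smp_found_config,
                            bool apic_disabled)
{
    if (apic_disabled)
        return false;
    /* An integrated APIC sets the CPUID bit; a discrete (external) APIC
     * does not, so fall back to the presence of an MP table. */
    return cpu_has_apic || smp_found_config;
}
```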
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090915071230.GA10604@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 18 Sep, 2009: 3 commits
  18. 16 Sep, 2009: 3 commits
  19. 15 Sep, 2009: 6 commits
    • x86: Add generic aperf/mperf code · 5cbc19a9
      Committed by Peter Zijlstra
      Move some of the aperf/mperf code out from the cpufreq driver
      thingy so that other people can enjoy it too.
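For background (the commit itself does not spell this out), the usual consumer of these two counters takes the ratio of their deltas over an interval: APERF advances at the actually delivered frequency and MPERF at a fixed reference frequency, so the ratio gives the average frequency relative to the reference. The sketch uses a hypothetical struct, function name and 10-bit fixed-point result; it is not the kernel's code.

```c
#include <stdint.h>

#define SKETCH_APERFMPERF_SHIFT 10 /* assumption: 10-bit fixed point */

struct aperfmperf_sample {
    uint64_t aperf; /* counts at the actually delivered frequency */
    uint64_t mperf; /* counts at a fixed reference frequency */
};

/* Ratio of delivered to reference frequency between two samples, in
 * fixed point (1 << SHIFT means 1.0). Returns 0 if MPERF did not move.
 * Very large APERF deltas could overflow the shift; not handled here. */
static uint64_t aperfmperf_ratio(const struct aperfmperf_sample *old,
                                 const struct aperfmperf_sample *cur)
{
    uint64_t da = cur->aperf - old->aperf;
    uint64_t dm = cur->mperf - old->mperf;

    if (dm == 0)
        return 0;
    return (da << SKETCH_APERFMPERF_SHIFT) / dm;
}
```

A delta of 1500 APERF counts against 1000 MPERF counts gives 1536, i.e. 1.5 in this fixed-point format.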
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Move APERF/MPERF into a X86_FEATURE · a8303aaf
      Committed by Peter Zijlstra
      Move the APERFMPERF capability into an X86_FEATURE flag so that it
      can be used outside of the acpi cpufreq driver.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Reduce forkexec_idx · b8a543ea
      Committed by Peter Zijlstra
      If we're looking to place a new task, we might as well find the
      idlest position _now_, not 1 tick ago.
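The idea of choosing a load index can be sketched with a hypothetical per-CPU load history in which index 0 is the instantaneous load and higher indices are slower-moving averages that still remember earlier ticks. The structure and function are illustrative only and do not reproduce the scheduler's own bookkeeping.

```c
/* Hypothetical per-CPU load history: load[0] is the load right now,
 * higher indices are progressively slower-decaying averages. */
#define SKETCH_NR_LOAD_IDX 3

struct cpu_load_hist {
    unsigned long load[SKETCH_NR_LOAD_IDX];
};

/* Return the CPU whose load at the given index is lowest. */
static int find_idlest_cpu(const struct cpu_load_hist *cpus, int nr_cpus,
                           int idx)
{
    int best = 0;

    for (int i = 1; i < nr_cpus; i++)
        if (cpus[i].load[idx] < cpus[best].load[idx])
            best = i;
    return best;
}
```

With one CPU that is idle now but was busy earlier, and another that is busy now but was idle earlier, index 0 picks the first while a higher index still picks the second.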
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Improve latencies and throughput · 0ec9fab3
      Committed by Mike Galbraith
      Make the idle balancer more aggressive, to improve an
      x264 encoding workload provided by Jason Garrett-Glaser:
      
       NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 252.82 fps, 22096.60 kb/s
       encoded 600 frames, 250.69 fps, 22096.60 kb/s
       encoded 600 frames, 245.76 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY LB_BIAS
       encoded 600 frames, 344.44 fps, 22096.60 kb/s
       encoded 600 frames, 346.66 fps, 22096.60 kb/s
       encoded 600 frames, 352.59 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 425.75 fps, 22096.60 kb/s
       encoded 600 frames, 425.45 fps, 22096.60 kb/s
       encoded 600 frames, 422.49 fps, 22096.60 kb/s
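Averaging the three runs of each configuration above (plain arithmetic on the reported numbers, nothing more) makes the size of the change easier to see:

```latex
\begin{aligned}
\bar{f}_{\texttt{NEXT\_BUDDY},\ \texttt{NO\_LB\_BIAS}} &= \tfrac{252.82 + 250.69 + 245.76}{3} \approx 249.76\ \text{fps} \\
\bar{f}_{\texttt{NO\_NEXT\_BUDDY},\ \texttt{LB\_BIAS}} &= \tfrac{344.44 + 346.66 + 352.59}{3} \approx 347.90\ \text{fps} \\
\bar{f}_{\texttt{NO\_NEXT\_BUDDY},\ \texttt{NO\_LB\_BIAS}} &= \tfrac{425.75 + 425.45 + 422.49}{3} \approx 424.56\ \text{fps} \\
\frac{424.56}{249.76} &\approx 1.70
\end{aligned}
```

That is roughly a 70% throughput gain for NO_NEXT_BUDDY NO_LB_BIAS over NEXT_BUDDY NO_LB_BIAS on this workload; the bitrate is identical in every run, so only encoding speed differs.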
      
      Peter pointed out that this is better done via newidle_idx,
      not via LB_BIAS: newidle balancing should look for where
      there is load _now_, not where there was load 2 ticks ago.
      
      Worst-case latencies are improved as well, since no buddies
      means less vruntime spread (as per prior lkml discussions).
      
      This change improves kbuild-peak parallelism as well.
      Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Tweak wake_idx · 78e7ed53
      Committed by Peter Zijlstra
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx; restore that and set them to 0 to make wake
      balancing more aggressive.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Committed by Peter Zijlstra
      The problem with wake_idle() is that it doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT nor the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the disappearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
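The merged wake-up path described above can be sketched as a walk up the domain hierarchy that stops at the first (smallest) domain spanning both CPUs, plus the flag dependency just stated. The types, flag values and function names are hypothetical; CPU numbers are assumed to be below 64 so that a `uint64_t` mask can stand in for a cpumask.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SD_BALANCE_WAKE 0x1u /* hypothetical flag values */
#define SKETCH_SD_WAKE_AFFINE  0x2u

struct sketch_domain {
    uint64_t span;      /* bitmask of CPUs covered by this domain */
    unsigned int flags;
};

/* domains[0] is the smallest level; each next level is a superset of the
 * previous one. Returns the smallest domain containing both the waking
 * CPU and the wakee's previous CPU, or NULL if none does. */
static const struct sketch_domain *
smallest_common_domain(const struct sketch_domain *domains, size_t nr_levels,
                       int this_cpu, int prev_cpu)
{
    uint64_t want = (1ULL << this_cpu) | (1ULL << prev_cpu);

    for (size_t i = 0; i < nr_levels; i++)
        if ((domains[i].span & want) == want)
            return &domains[i];
    return NULL;
}

/* Affine wakeups only take effect when SD_BALANCE_WAKE is also set. */
static int want_wake_affine(const struct sketch_domain *sd)
{
    const unsigned int need = SKETCH_SD_BALANCE_WAKE | SKETCH_SD_WAKE_AFFINE;

    return sd != NULL && (sd->flags & need) == need;
}
```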
      
      Touch all topology bits to replace the old SD flags with the new
      ones. Platforms might need re-tuning: enabling SD_BALANCE_WAKE
      conditionally on NUMA distance seems like a good additional
      feature, since Magny-Cours and small Nehalem systems would want it
      enabled while systems with slow interconnects would not.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 11 Sep, 2009: 3 commits
    • x86: Increase MIN_GAP to include randomized stack · 80938332
      Committed by Michal Hocko
      Currently we do not include the randomized stack size when calculating
      the mmap_base address in arch_pick_mmap_layout for the topdown case. This
      can cause mmap_base to start in the stack-reserved area, because the stack
      is randomized by up to 1GB for 64b (8MB for 32b) and the minimum gap is 128MB.
      
      If the stack really grows down to mmap_base, the stack can silently
      overwrite the mmap region.
      
      Let's include maximum stack randomization size into MIN_GAP which is
      used as the low bound for the gap in mmap.
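The arithmetic behind the fix, as a hedged sketch using only the sizes quoted above. Treating the stack rlimit as the requested gap is an assumption made for illustration, and the helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MB (1024ULL * 1024ULL)

/* Maximum stack randomization, per the text: 1GB on 64-bit, 8MB on 32-bit. */
static uint64_t stack_maxrandom_size_sketch(int is_64bit)
{
    return is_64bit ? 1024 * SKETCH_MB : 8 * SKETCH_MB;
}

/* New lower bound: the old 128MB gap plus the maximum randomization. */
static uint64_t min_gap_sketch(int is_64bit)
{
    return 128 * SKETCH_MB + stack_maxrandom_size_sketch(is_64bit);
}

/* Gap left below the stack: the requested size, but never below MIN_GAP. */
static uint64_t mmap_gap_sketch(uint64_t requested_gap, int is_64bit)
{
    uint64_t min_gap = min_gap_sketch(is_64bit);

    return requested_gap < min_gap ? min_gap : requested_gap;
}
```

In this model the 64-bit lower bound grows from 128MB to 1152MB (128MB + 1GB), so even a small requested gap keeps mmap_base clear of the range the randomized stack may occupy.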
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Stable Team <stable@kernel.org>
    • x86: Fix code patching for paravirt-alternatives on 486 · 5367b688
      Committed by Ben Hutchings
      As reported in <http://bugs.debian.org/511703> and
      <http://bugs.debian.org/515982>, kernels with paravirt-alternatives
      enabled crash in text_poke_early() on at least some 486-class
      processors.
      
      The problem is that text_poke_early() itself uses inline functions
      affected by paravirt-alternatives and so will modify instructions that
      have already been prefetched.  Pentium and later processors will
      invalidate the prefetched instructions in this case, but 486-class
      processors do not.
      
      Change sync_core() to limit prefetching on 486-class (and 386-class)
      processors, and move the call to sync_core() above the call to the
      modifiable local_irq_restore().
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      LKML-Reference: <1252547631.3423.134.camel@localhost>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86/tracing: comment need for atomic nop · fc06b852
      Committed by Steven Rostedt
      The dynamic function tracer relies on the macro P6_NOP5 always being
      an atomic NOP. If for some reason it is changed to be two operations
      (like a nop2 nop3), it can fault within the kernel when the function
      tracer modifies the code.
      
      This patch adds a comment to note that the P6_NOPs are expected to
      be atomic. This will hopefully prevent anyone from changing that.
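Why a single instruction matters can be illustrated with the bytes involved. When dynamic ftrace enables a call site it replaces the 5-byte NOP with a 5-byte `call rel32`, so the NOP must be one instruction: no CPU can then be stopped between two instructions inside the region being rewritten. The helper below is hypothetical user-space code, not ftrace's own; it only encodes bytes and patches nothing.

```c
#include <stdint.h>

/* The 5-byte P6 NOP as one instruction: nopl 0x0(%eax,%eax,1). */
static const unsigned char p6_nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "call rel32" for a patch site at address ip. The displacement is
 * relative to the end of the 5-byte instruction and stored little-endian. */
static void make_call_insn(unsigned char out[5], uint64_t ip, uint64_t target)
{
    uint32_t rel = (uint32_t)(target - (ip + 5)); /* wraps mod 2^32 */

    out[0] = 0xe8;
    out[1] = rel & 0xff;
    out[2] = (rel >> 8) & 0xff;
    out[3] = (rel >> 16) & 0xff;
    out[4] = (rel >> 24) & 0xff;
}
```

Both encodings are exactly 5 bytes, so swapping one for the other replaces a single instruction with a single instruction.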
      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>