1. 06 2月, 2010 1 次提交
  2. 23 1月, 2010 1 次提交
  3. 14 1月, 2010 1 次提交
    • L
      x86-64: support native xadd rwsem implementation · bafaecd1
      Linus Torvalds 提交于
      This one is much faster than the spinlock based fallback rwsem code,
      with certain artifical benchmarks having shown 300%+ improvement on
      threaded page faults etc.
      
      Again, note the 32767-thread limit here. So this really does need that
      whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
      extension on top of it, but that is conceptually a totally independent
      issue.
      
      NOT TESTED! The original patch that this all was based on were tested by
      KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
      cleaned-up series, so caveat emptor..
      
      Also note that it _may_ be a good idea to mark some more registers
      clobbered on x86-64 in the inline asms instead of saving/restoring them.
      They are inline functions, but they are only used in places where there
      are not a lot of live registers _anyway_, so doing for example the
      clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
      worse, and would make the slow-path code smaller.
      
      (Not that the slow-path really matters to that degree. Saving a few
      unnecessary registers is the _least_ of our problems when we hit the slow
      path. The instruction/cycle counting really only matters in the fast
      path).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      bafaecd1
  4. 17 12月, 2009 1 次提交
    • B
      x86, msr: msrs_alloc/free for CONFIG_SMP=n · 6ede31e0
      Borislav Petkov 提交于
      Randy Dunlap reported the following build error:
      
      "When CONFIG_SMP=n, CONFIG_X86_MSR=m:
      
      ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined!
      ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!"
      
      This is due to the fact that <arch/x86/lib/msr.c> is conditioned on
      CONFIG_SMP and in the UP case we have only the stubs in the header.
      Fork off SMP functionality into a new file (msr-smp.c) and build
      msrs_{alloc,free} unconditionally.
      Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NBorislav Petkov <petkovbb@gmail.com>
      LKML-Reference: <20091216231625.GD27228@liondog.tnic>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      6ede31e0
  5. 08 12月, 2009 1 次提交
  6. 07 12月, 2009 1 次提交
  7. 01 10月, 2009 2 次提交
  8. 05 9月, 2009 1 次提交
    • H
      x86, msr: change msr-reg.o to obj-y, and export its symbols · b19ae399
      H. Peter Anvin 提交于
      Change msr-reg.o to obj-y (it will be included in virtually every
      kernel since it is used by the initialization code for AMD processors)
      and add a separate C file to export its symbols to modules, so that
      msr.ko can use them; on uniprocessors we bypass the helper functions
      in msr.o and use the accessor functions directly via inlines.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <20090904140834.GA15789@elte.hu>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      b19ae399
  9. 01 9月, 2009 1 次提交
  10. 27 8月, 2009 1 次提交
    • M
      x86: Instruction decoder API · eb13296c
      Masami Hiramatsu 提交于
      Add x86 instruction decoder to arch-specific libraries. This decoder
      can decode x86 instructions used in kernel into prefix, opcode, modrm,
      sib, displacement and immediates. This can also show the length of
      instructions.
      
      This version introduces instruction attributes for decoding
      instructions.
      The instruction attribute tables are generated from the opcode map file
      (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).
      
      Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
      IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
      and consist of below two types of opcode tables.
      
      1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
      written as below;
      
       Table: table-name
       Referrer: escaped-name
       opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
        (or)
       opcode: escape # escaped-name
       EndTable
      
      Group opcodes, which has 8 elements, are written as below;
      
       GrpTable: GrpXXX
       reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
       EndTable
      
      These opcode maps include a few SSE and FP opcodes (for setup), because
      those opcodes are used in the kernel.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NJim Keniston <jkenisto@us.ibm.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      LKML-Reference: <20090813203413.31965.49709.stgit@localhost.localdomain>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      eb13296c
  11. 04 7月, 2009 1 次提交
    • I
      x86: atomic64: Export APIs to modules · 1fde902d
      Ingo Molnar 提交于
      atomic64_t primitives are used by a handful of drivers,
      so export the APIs consistently. These were inlined
      before.
      
      Also mark atomic64_32.o a core object, so that the symbols
      are available even if not linked to core kernel pieces.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <tip-05118ab8859492ac9ddda0154cf90e37b0a4a0b0@git.kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1fde902d
  12. 03 7月, 2009 1 次提交
    • I
      x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file · b7882b7c
      Ingo Molnar 提交于
      Linus noted that the atomic64_t primitives are all inlines
      currently which is crazy because these functions have a large
      register footprint anyway.
      
      Move them to a separate file: arch/x86/lib/atomic64_32.c
      
      Also, while at it, rename all uses of 'unsigned long long' to
      the much shorter u64.
      
      This makes the appearance of the prototypes a lot nicer - and
      it also uncovered a few bugs where (yet unused) API variants
      had 'long' as their return type instead of u64.
      
      [ More intrusive changes are not yet done in this patch. ]
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b7882b7c
  13. 10 6月, 2009 1 次提交
  14. 04 9月, 2008 1 次提交
  15. 09 7月, 2008 3 次提交
  16. 24 5月, 2008 1 次提交
    • S
      ftrace: trace irq disabled critical timings · 81d68a96
      Steven Rostedt 提交于
      This patch adds latency tracing for critical timings
      (how long interrupts are disabled for).
      
       "irqsoff" is added to /debugfs/tracing/available_tracers
      
      Note:
        tracing_max_latency
          also holds the max latency for irqsoff (in usecs).
         (default to large number so one must start latency tracing)
      
        tracing_thresh
          threshold (in usecs) to always print out if irqs off
          is detected to be longer than stated here.
          If irq_thresh is non-zero, then max_irq_latency
          is ignored.
      
      Here's an example of a trace with ftrace_enabled = 0
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      --------------------------------------------------------------------
       latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      
      And this is a trace with ftrace_enabled == 1
      
      =======
      preemption latency trace v1.1.5 on 2.6.24-rc7
      --------------------------------------------------------------------
       latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
          -----------------
          | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
          -----------------
       => started at: _spin_lock_irqsave+0x2a/0xb7
       => ended at:   _spin_unlock_irqrestore+0x32/0x5f
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
       swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
       swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
       swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
       swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
       swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
       swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
       swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
       swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
       swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
       swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)
      
      vim:ft=help
      =======
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      81d68a96
  17. 27 4月, 2008 2 次提交
    • A
      x86, UML: remove x86-specific implementations of find_first_bit · 5245698f
      Alexander van Heukelum 提交于
      x86 has been switched to the generic versions of find_first_bit
      and find_first_zero_bit, but the original versions were retained.
      This patch just removes the now unused x86-specific versions.
      
      also update UML.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5245698f
    • A
      x86: change x86 to use generic find_next_bit · 6fd92b63
      Alexander van Heukelum 提交于
      The versions with inline assembly are in fact slower on the machines I
      tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
      Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
      crashing the benchmark.
      
      Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
      1...512, for each possible bitmap with one bit set, for each possible
      offset: find the position of the first bit starting at offset. If you
      follow ;). Times include setup of the bitmap and checking of the
      results.
      
      		Athlon		Xeon		Opteron 32/64bit
      x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
      generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
      
      If the bitmap size is not a multiple of BITS_PER_LONG, and no set
      (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
      value outside of the range [0, size]. The generic version always returns
      exactly size. The generic version also uses unsigned long everywhere,
      while the x86 versions use a mishmash of int, unsigned (int), long and
      unsigned long.
      
      Using the generic version does give a slightly bigger kernel, though.
      
      defconfig:	   text    data     bss     dec     hex filename
      x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
      generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
      x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
      generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6fd92b63
  18. 06 2月, 2008 1 次提交
  19. 30 1月, 2008 1 次提交
  20. 11 10月, 2007 14 次提交
  21. 22 7月, 2007 1 次提交
    • A
      i386: Move all simple string operations out of line · b520b85a
      Andi Kleen 提交于
      The compiler generally generates reasonable inline code for the simple
      cases and for the rest it's better for code size for them to be out of line.
      Also there they can be potentially optimized more in the future.
      
      In fact they probably should be in a .S file because they're all pure
      assembly, but that's for another day.
      
      Also some code style cleanup on them while I was on it (this seems
      to be the last untouched really early Linux code)
      
      This saves ~12k text for a defconfig kernel with gcc 4.1.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b520b85a
  22. 21 2月, 2007 2 次提交