1. 18 December 2010, 1 commit
    • x86: this_cpu_cmpxchg and this_cpu_xchg operations · 7296e08a
      Committed by Christoph Lameter
      Provide support as far as the hardware capabilities of the x86 cpus
      allow.
      
      Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
      fast cpuops implementations.
      
      V1->V2:
      	- Take out the definition for this_cpu_cmpxchg_8 and move it into
      	  a separate patch.
      
      tj: - Reordered ops to better follow this_cpu_* organization.
          - Renamed macro temp variables similar to their existing
            neighbours.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
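      A minimal sketch (not part of the patch) of how core code can test CONFIG_CMPXCHG_LOCAL to
      choose between the segment-prefixed fast path and a generic fallback; the per-cpu counter
      and its update loop are illustrative assumptions:

        /* Hypothetical per-cpu event counter; names are illustrative, not from the patch. */
        #include <linux/percpu.h>
        #include <linux/preempt.h>

        static DEFINE_PER_CPU(unsigned long, sample_events);

        static void sample_event_inc(void)
        {
        #ifdef CONFIG_CMPXCHG_LOCAL
                unsigned long old;

                /* Fast path: cmpxchg against this CPU's copy, no LOCK prefix needed. */
                do {
                        old = this_cpu_read(sample_events);
                } while (this_cpu_cmpxchg(sample_events, old, old + 1) != old);
        #else
                /* Fallback: keep the update preemption-safe the conventional way. */
                preempt_disable();
                __this_cpu_inc(sample_events);
                preempt_enable();
        #endif
        }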
  2. 04 May 2010, 1 commit
    • x86-32: Rework cache flush denied handler · 40d2e763
      Committed by Brian Gerst
      The cache flush denied error is an erratum on some AMD 486 clones.  If an invd
      instruction is executed in userspace, the processor raises exception 19 (0x13)
      instead of #GP (vector 13 decimal).  On cpus where XMM is not supported, redirect
      exception 19 to do_general_protection().  Also, remove die_if_kernel(), since
      this was the last user.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
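      A rough conceptual sketch of the redirect described above (hedged: the actual patch reworks
      the 32-bit entry code rather than installing a gate like this, and the function name is made
      up for illustration):

        /* Conceptual only: on CPUs without XMM, vector 19 can only come from the
         * AMD 486-clone erratum, so treat it as a general protection fault. */
        static void __init example_redirect_trap19(void)
        {
                if (!cpu_has_xmm)
                        set_intr_gate(19, &general_protection);
        }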
  3. 26 March 2010, 1 commit
    • x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Committed by Peter Zijlstra
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its only users are ptrace-block-step and ptrace-bts; ptrace-bts was never
      used, and ptrace-block-step can be implemented using a much simpler
      approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 14 January 2010, 1 commit
    • x86-64: support native xadd rwsem implementation · bafaecd1
      Committed by Linus Torvalds
      This one is much faster than the spinlock-based fallback rwsem code,
      with certain artificial benchmarks having shown 300%+ improvement on
      threaded page faults etc.
      
      Again, note the 32767-thread limit here. So this really does need that
      whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
      extension on top of it, but that is conceptually a totally independent
      issue.
      
      NOT TESTED! The original patch that this all was based on was tested by
      KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
      cleaned-up series, so caveat emptor..
      
      Also note that it _may_ be a good idea to mark some more registers
      clobbered on x86-64 in the inline asms instead of saving/restoring them.
      They are inline functions, but they are only used in places where there
      are not a lot of live registers _anyway_, so doing for example the
      clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
      worse, and would make the slow-path code smaller.
      
      (Not that the slow-path really matters to that degree. Saving a few
      unnecessary registers is the _least_ of our problems when we hit the slow
      path. The instruction/cycle counting really only matters in the fast
      path).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
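      A hedged plain-C illustration of the xchg-add rwsem algorithm this enables (not the kernel's
      inline asm; the constants mirror the 32-bit count layout that produces the 32767-thread
      limit mentioned above, and all names here are illustrative):

        #include <stdatomic.h>

        /* Illustrative constants mirroring the rwsem count layout: the low bits
         * count active lockers, the high bits go negative while a writer holds
         * or waits for the lock.  The 16-bit active count is what gives the
         * 32767-thread limit discussed above. */
        #define RWSEM_ACTIVE_BIAS        0x00000001
        #define RWSEM_WAITING_BIAS      (-0x00010000)
        #define RWSEM_ACTIVE_READ_BIAS   RWSEM_ACTIVE_BIAS
        #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

        struct example_rwsem {
                _Atomic int count;   /* the kernel uses a plain int plus xadd asm */
        };

        /* Reader fast path: one atomic xadd; a negative result means a writer is
         * active or waiting, so the caller must fall back to the slow path. */
        static void example_down_read(struct example_rwsem *sem)
        {
                int old = atomic_fetch_add(&sem->count, RWSEM_ACTIVE_READ_BIAS);

                if (old + RWSEM_ACTIVE_READ_BIAS < 0) {
                        /* slow path: sleep until woken by the writer's release */
                }
        }

        /* Writer fast path: xadd the write bias; a nonzero old count means someone
         * else held the lock, so take the slow path. */
        static void example_down_write(struct example_rwsem *sem)
        {
                if (atomic_fetch_add(&sem->count, RWSEM_ACTIVE_WRITE_BIAS) != 0) {
                        /* slow path: wait for the active readers/writer to drain */
                }
        }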
  5. 06 January 2010, 1 commit
  6. 05 January 2010, 1 commit
  7. 19 November 2009, 1 commit
    • x86: Eliminate redundant/contradicting cache line size config options · 350f8f56
      Committed by Jan Beulich
      Rather than having both X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
      (with inconsistent defaults), keep only the latter, as the former can
      easily be calculated from it.
      
      To be consistent, also change X86_INTERNODE_CACHE_BYTES to
      X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
      to account for last level cache line size (which here matters
      more than L1 cache line size).
      
      Finally, make sure the default value for X86_L1_CACHE_SHIFT, when
      X86_GENERIC is selected, is seen before those for the individual CPU
      model options (unlike on x86-64, where GENERIC_CPU is part of the
      choice construct, X86_GENERIC is a separate option on ix86).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Acked-by: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
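      A short illustration of the point above: with only the shift as a Kconfig symbol, the byte
      values are derived from it at compile time, so the two can never disagree (this mirrors what
      arch/x86/include/asm/cache.h does, slightly simplified):

        /* Only the shifts come from Kconfig; the byte sizes are computed from them. */
        #define L1_CACHE_SHIFT          (CONFIG_X86_L1_CACHE_SHIFT)
        #define L1_CACHE_BYTES          (1 << L1_CACHE_SHIFT)

        #define INTERNODE_CACHE_SHIFT   (CONFIG_X86_INTERNODE_CACHE_SHIFT)
        #define INTERNODE_CACHE_BYTES   (1 << INTERNODE_CACHE_SHIFT)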
  8. 26 October 2009, 1 commit
  9. 03 October 2009, 1 commit
  10. 01 October 2009, 1 commit
    • x86: Optimize cmpxchg64() at build-time some more · 982d007a
      Committed by Linus Torvalds
      Try to avoid the 'alternatives()' code when we can statically
      determine that cmpxchg8b is fine. We already have that
      CONFIG_X86_CMPXCHG64 (enabled by PAE support), and we could easily
      also enable it for some of the CPU cases.
      
      Note, this patch only adds CMPXCHG8B for the obvious Intel CPUs,
      not for others. (There was something really messy about cmpxchg8b
      and clone CPUs, so if you enable it on other CPUs later, do it
      carefully.)
      
      If we avoid that asm-alternative thing when we can assume the
      instruction exists, we'll generate less support crud, and we'll
      avoid the whole issue with that extra 'nop' for padding instruction
      sizes etc.
      
      LKML-Reference: <alpine.LFD.2.01.0909301743150.6996@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
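      An illustrative sketch (not the actual patch) of the build-time selection being described;
      the fallback helper name is hypothetical, and the real macros live in
      arch/x86/include/asm/cmpxchg_32.h:

        #include <linux/types.h>

        static inline u64 example_cmpxchg64(volatile u64 *ptr, u64 old, u64 new)
        {
        #ifdef CONFIG_X86_CMPXCHG64
                /* cmpxchg8b is known to exist at build time: emit it directly,
                 * with no runtime "alternatives" patching and no padding NOPs. */
                u64 prev;

                asm volatile("lock; cmpxchg8b %1"
                             : "=A" (prev), "+m" (*ptr)
                             : "b" ((u32)new), "c" ((u32)(new >> 32)), "0" (old)
                             : "memory");
                return prev;
        #else
                /* Otherwise fall back to the runtime-selected/emulated variant
                 * (hypothetical helper, for illustration only). */
                return cmpxchg64_fallback(ptr, old, new);
        #endif
        }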
  11. 23 August 2009, 1 commit
  12. 11 June 2009, 1 commit
  13. 24 April 2009, 1 commit
  14. 16 April 2009, 1 commit
    • x86: disable X86_PTRACE_BTS for now · d45b41ae
      Committed by Ingo Molnar
      Oleg Nesterov found a couple of races in the ptrace-bts code,
      and fixes are queued up for them, but they were not ready in time
      for the merge window. We'll merge them in v2.6.31 - until then,
      mark the feature as CONFIG_BROKEN. There's no user-space making
      use of this yet, so it's not a big issue.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 14 March 2009, 1 commit
  16. 06 February 2009, 1 commit
  17. 05 February 2009, 1 commit
  18. 21 January 2009, 1 commit
    • x86: make x86_32 use tlb_64.c, build fix, clean up X86_L1_CACHE_BYTES · ace6c6c8
      Committed by Ingo Molnar
      Fix:
      
        arch/x86/mm/tlb.c:47: error: ‘CONFIG_X86_INTERNODE_CACHE_BYTES’ undeclared here (not in a function)
      
      The CONFIG_X86_INTERNODE_CACHE_BYTES symbol is only defined on 64-bit,
      because vsmp support is 64-bit only. Define it on 32-bit too - where it
      will always be equal to X86_L1_CACHE_BYTES.
      
      Also move the default of X86_L1_CACHE_BYTES (which is separate from the
      more commonly used L1_CACHE_SHIFT kconfig symbol) from 128 bytes to
      64 bytes.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 14 January 2009, 1 commit
    • x86: change the default cache size to 64 bytes · 0a2a18b7
      Committed by Ingo Molnar
      Right now the generic cacheline size is 128 bytes - that is wasteful
      when structures are aligned, as all modern x86 CPUs have an (effective)
      cacheline size of 64 bytes.
      
      It was set to 128 bytes due to some cacheline aliasing problems on
      older P4 systems, but those are many years old and we don't optimize
      for them anymore. (They'll still get the 128-byte cacheline size if
      the kernel is specifically built for Pentium 4.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
  20. 06 January 2009, 1 commit
  21. 25 December 2008, 1 commit
  22. 24 December 2008, 1 commit
    • x86: disable X86_PTRACE_BTS · 40f15ad8
      Committed by Ingo Molnar
      There's a new ptrace arch-level feature in .28:
      
        config X86_PTRACE_BTS
        bool "Branch Trace Store"
      
      It has broken fork() handling: the old DS area gets copied over into
      a new task without clearing it.
      
      Fixes exist but they came too late:
      
        c5dee617: x86, bts: memory accounting
        bf53de90: x86, bts: add fork and exit handling
      
      and are queued up for v2.6.29. This shows that the facility is still not
      tested well enough to release into a stable kernel - disable it for now
      and reactivate it in .29. In .29 the hardware-branch-tracer will use the
      DS/BTS facilities too - hopefully resulting in better code.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 26 November 2008, 1 commit
  24. 28 October 2008, 1 commit
  25. 13 October 2008, 2 commits
  26. 12 October 2008, 2 commits
  27. 10 September 2008, 1 commit
  28. 09 September 2008, 1 commit
    • x86: disable static NOPLs on 32 bits · 14469a8d
      Committed by Linus Torvalds
      On 32-bit, at least the generic nops are fairly reasonable, but the
      default nops for 64-bit really look pretty sad, and the P6 nops really do
      look better.
      
      So I would suggest perhaps moving the static P6 nop selection into the
      CONFIG_X86_64 thing.
      
      The alternative is to just get rid of that static nop selection, and just
      have two cases: 32-bit and 64-bit, and just pick obviously safe cases for
      them.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
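      A hedged sketch of the "two cases" idea above; the encodings are standard (0x66 0x90 is an
      operand-size-prefixed NOP, 0x89 0xf6 is movl %esi,%esi), but the macro name is illustrative
      and the kernel's real tables live in arch/x86/include/asm/nops.h:

        /* Pick a NOP encoding per bitness, not per CPU model (illustrative only). */
        #ifdef CONFIG_X86_64
        # define EXAMPLE_ASM_NOP2 ".byte 0x66, 0x90\n"   /* safe long NOP on every 64-bit CPU */
        #else
        # define EXAMPLE_ASM_NOP2 ".byte 0x89, 0xf6\n"   /* movl %esi,%esi - safe on every 32-bit CPU */
        #endif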
  29. 08 September 2008, 1 commit
  30. 18 August 2008, 1 commit
    • x86: configuration options to compile out x86 CPU support code · 8d02c211
      Committed by Thomas Petazzoni
      This patch adds some configuration options that allow compiling out
      CPU vendor-specific code in x86 kernels (in arch/x86/kernel/cpu). The
      new configuration options are only visible when CONFIG_EMBEDDED is
      selected, as they are mostly interesting for space-saving reasons.
      
      An example of size saving, on x86 with only Intel CPU support:
      
         text	   data	    bss	    dec	    hex	filename
      1125479	 118760	 212992	1457231	 163c4f	vmlinux.old
      1121355	 116536	 212992	1450883	 162383	vmlinux
        -4124   -2224       0   -6348   -18CC +/-
      
      However, I'm not exactly sure that the Kconfig wording is correct with
      regard to !64BIT / 64BIT.
      
      [ mingo@elte.hu: convert macro to inline ]
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
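      A hedged illustration of what "compiling out" vendor support amounts to (simplified: the
      real mechanism is conditional compilation of the per-vendor files via the CONFIG_CPU_SUP_*
      options, and the table and symbol names here are only indicative):

        /* Only vendors whose support is configured in end up being probed. */
        static const struct cpu_dev *const example_cpu_devs[] = {
        #ifdef CONFIG_CPU_SUP_INTEL
                &intel_cpu_dev,
        #endif
        #ifdef CONFIG_CPU_SUP_AMD
                &amd_cpu_dev,
        #endif
                NULL,
        };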
  31. 22 July 2008, 1 commit
  32. 18 July 2008, 1 commit
    • x86: APIC: remove apic_write_around(); use alternatives · 593f4a78
      Committed by Maciej W. Rozycki
      Use alternatives to select the workaround for the 11AP Pentium erratum
      for the affected steppings on the fly rather than build time.  Remove the
      X86_GOOD_APIC configuration option and replace all the calls to
      apic_write_around() with plain apic_write(), protecting accesses to the
      ESR as appropriate due to the 3AP Pentium erratum.  Remove
      apic_read_around() and all its invocations altogether as not needed.
      Remove apic_write_atomic() and all its implementing backends.  The use of
      ASM_OUTPUT2() is not strictly needed for input constraints, but I have
      used it for readability's sake.
      
      I had the feeling no one else was brave enough to do it, so I went ahead
      and here it is.  Verified by checking the generated assembly and tested
      with both a 32-bit and a 64-bit configuration, also with the 11AP
      "feature" forced on, and verified with gdb on /proc/kcore to work as
      expected (as 11AP machines are quite hard to get hold of these days).
      Some script complained about the use of "volatile", but apic_write() needs
      it for the same reason and is effectively a replacement for writel(), so I
      have disregarded it.
      
      I am not sure what the policy wrt defconfig files is; they are generated
      and there is a risk of a conflict resulting from an unrelated change, so I
      have left changes to them out.  The option will get removed from them at
      the next run.
      
      Some testing with machines other than mine will be needed to avoid some
      stupid mistake, but despite its volume the change is not really that
      intrusive, so I am fairly confident that because it works for me, it will
      work everywhere.
      Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
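      A hedged reconstruction of the runtime selection described above (close in spirit to the
      patch, but not quoted from it): on steppings flagged with the 11AP erratum the plain mov is
      patched into a serializing xchg, so no build-time X86_GOOD_APIC switch is needed.

        static inline void example_apic_mem_write(u32 reg, u32 v)
        {
                volatile u32 *addr = (volatile u32 *)(APIC_BASE + reg);

                /* Patched at boot: "movl" normally, "xchgl" on 11AP-affected CPUs. */
                alternative_io("movl %0, %1", "xchgl %0, %1", X86_FEATURE_11AP,
                               ASM_OUTPUT2("=r" (v), "=m" (*addr)),
                               ASM_OUTPUT2("0" (v), "m" (*addr)));
        }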
  33. 09 July 2008, 1 commit
  34. 13 May 2008, 2 commits
  35. 01 May 2008, 1 commit
  36. 27 April 2008, 2 commits
    • x86, bitops: select the generic bitmap search functions · 19870def
      Committed by Alexander van Heukelum
      Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
      lib/Kconfig, defaulting to off. An arch that wants to use the
      generic implementation now only has to use a select statement
      to include them.
      
      I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
      and used that to select the generic search functions. This
      way ARCH=um SUBARCH=i386 automatically picks up the change
      too, and arch/um/Kconfig.i386 can therefore be simplified a
      bit. ARCH=um SUBARCH=x86_64 does things differently, but
      still compiles fine. It seems that a "def_bool y" always
      wins over a "def_bool n"?
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: merge the simple bitops and move them to bitops.h · 12d9c842
      Committed by Alexander van Heukelum
      Some of those can be written in such a way that the same
      inline assembly can be used to generate both 32-bit and
      64-bit code.
      
      For ffs and fls, x86_64 unconditionally used the cmov
      instruction and i386 unconditionally used a conditional
      branch over a mov instruction. In the current patch I
      chose to select the version based on the availability
      of the cmov instruction instead. A small detail here is
      that x86_64 did not previously set CONFIG_X86_CMOV=y.
      
      Improved comments for ffs, ffz, fls and variations.
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
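      A hedged, simplified sketch of the cmov-vs-branch selection described above (close to what
      arch/x86/include/asm/bitops.h does for ffs(), though details may differ; the function name
      is illustrative):

        static inline int example_ffs(int x)
        {
                int r;

        #ifdef CONFIG_X86_CMOV
                /* bsf leaves the destination undefined for x == 0; use cmov to
                 * substitute -1 without a branch. */
                asm("bsfl %1,%0\n\t"
                    "cmovzl %2,%0"
                    : "=&r" (r) : "rm" (x), "r" (-1));
        #else
                /* No cmov available: branch over a mov instead. */
                asm("bsfl %1,%0\n\t"
                    "jnz 1f\n\t"
                    "movl $-1,%0\n"
                    "1:" : "=r" (r) : "rm" (x));
        #endif
                return r + 1;   /* ffs() is 1-based; returns 0 for x == 0 */
        }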