  1. 16 Dec, 2011 1 commit
    • H
      x86, bitops: Move fls64.h inside __KERNEL__ · 83d99df7
      H. Peter Anvin authored
      We would include <asm-generic/bitops/fls64.h> even without __KERNEL__,
      but that doesn't make sense, as:
      
      1. That file provides fls64(), but the corresponding function fls() is
         not exported to user space.
      2. The implementation of fls64.h uses kernel-only symbols.
      3. fls64.h is not exported to user space.
      
      This appears to have been a bug introduced in checkin:
      
      d57594c2 bitops: use __fls for fls64 on 64-bit archs
      
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Alexander van Heukelum <heukelum@mailshack.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/4EEA77E1.6050009@zytor.com
      83d99df7
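To illustrate why fls64() depends on fls(), a 64-bit "find last set" can be built from a 32-bit one. This is a minimal userspace sketch, not the kernel's implementation; the names sketch_fls32 and sketch_fls64 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 1-based index of the most significant set bit, 0 if x == 0. */
static int sketch_fls32(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Hypothetical helper: same contract for a 64-bit value, built on sketch_fls32(). */
static int sketch_fls64(uint64_t x)
{
	uint32_t hi = (uint32_t)(x >> 32);

	if (hi)
		return sketch_fls32(hi) + 32;
	return sketch_fls32((uint32_t)x);
}
```

With these definitions, sketch_fls64(0) yields 0 and sketch_fls64(1ULL << 32) yields 33.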
  2. 27 Jul, 2011 1 commit
  3. 24 Mar, 2011 3 commits
    • A
      bitops: remove minix bitops from asm/bitops.h · 61f2e7b0
      Akinobu Mita authored
      The minix bit operations are used only by the minix filesystem and are
      of no use to other modules, because the byte order of the inode and block
      bitmaps differs between architectures as follows:
      
      m68k:
      	big-endian 16bit indexed bitmaps
      
      h8300, microblaze, s390, sparc, m68knommu:
      	big-endian 32 or 64bit indexed bitmaps
      
      m32r, mips, sh, xtensa:
      	big-endian 32 or 64bit indexed bitmaps for big-endian mode
      	little-endian bitmaps for little-endian mode
      
      Others:
      	little-endian bitmaps
      
      In order to move minix bit operations from asm/bitops.h to architecture
      independent code in minix filesystem, this provides two config options.
      
      CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
      CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
      native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
      m32r, mips, sh, xtensa).  The architectures which always use little-endian
      bitmaps do not select these options.
      
      Finally, we can remove minix bit operations from asm/bitops.h for all
      architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Greg Ungerer <gerg@uclinux.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61f2e7b0
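The difference between the bitmap layouts listed in this commit can be sketched at byte level. The following is an illustrative reading of "little-endian" and "big-endian 16bit indexed" bitmaps, not the code in the minix filesystem; the function names are hypothetical.

```c
/* Little-endian bitmap: bit nr lives in byte nr / 8. */
static int sketch_test_bit_le(unsigned int nr, const unsigned char *map)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

/*
 * Big-endian 16-bit indexed bitmap: bits are numbered within 16-bit
 * words stored most significant byte first, so the two bytes of each
 * word swap places relative to the little-endian layout.
 */
static int sketch_test_bit_be16(unsigned int nr, const unsigned char *map)
{
	return (map[(nr >> 3) ^ 1] >> (nr & 7)) & 1;
}
```

Because both helpers address the bitmap one byte at a time, they behave the same regardless of the byte order of the machine running them.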
    • A
      bitops: remove ext2 non-atomic bitops from asm/bitops.h · f312eff8
      Akinobu Mita authored
      As the result of conversions, there are no users of ext2 non-atomic bit
      operations except for ext2 filesystem itself.  Now we can put them into
      architecture independent code in ext2 filesystem, and remove from
      asm/bitops.h for all architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f312eff8
    • A
      bitops: introduce little-endian bitops for most architectures · 861b5ae7
      Akinobu Mita authored
      Introduce little-endian bit operations to the big-endian architectures
      which do not have native little-endian bit operations and the
      little-endian architectures.  (alpha, avr32, blackfin, cris, frv, h8300,
      ia64, m32r, mips, mn10300, parisc, sh, sparc, tile, x86, xtensa)
      
      These architectures can just include generic implementation
      (asm-generic/bitops/le.h).
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      861b5ae7
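One way to picture little-endian bit operations on a big-endian machine is as native word accesses with the byte-selecting bits of the index flipped. The sketch below assumes that approach and checks it against plain byte addressing; all names are hypothetical and this is not a copy of asm-generic/bitops/le.h.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Little-endian bit nr, addressed byte by byte; independent of host byte order. */
static int sketch_test_bit_le_bytes(unsigned int nr, const void *addr)
{
	const unsigned char *p = addr;

	return (p[nr >> 3] >> (nr & 7)) & 1;
}

/* Returns nonzero when the host stores the least significant byte first. */
static int sketch_host_is_le(void)
{
	const unsigned long one = 1;

	return *(const unsigned char *)&one == 1;
}

/* The same bit, reached through native unsigned long accesses. */
static int sketch_test_bit_le_native(unsigned int nr, const unsigned long *addr)
{
	/* On big-endian hosts, flip the byte-selecting bits of the index. */
	unsigned int swizzle = sketch_host_is_le() ?
		0 : (unsigned int)((SKETCH_BITS_PER_LONG - 1) & ~0x7UL);

	nr ^= swizzle;
	return (addr[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1;
}
```

On a little-endian host the swizzle is zero, which matches the commit's point that such architectures can use the generic implementation directly.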
  4. 10 Oct, 2010 1 commit
    • A
      bitops: make asm-generic/bitops/find.h more generic · 708ff2a0
      Akinobu Mita authored
      asm-generic/bitops/find.h has the extern declarations of find_next_bit()
      and find_next_zero_bit() and the macro definitions of find_first_bit()
      and find_first_zero_bit(). It is only usable by the architectures that
      enable CONFIG_GENERIC_FIND_NEXT_BIT and disable
      CONFIG_GENERIC_FIND_FIRST_BIT.
      
      x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and
      CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include
      asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern
      declarations of find_first_bit() and find_first_zero_bit() are put in
      linux/bitops.h.
      
      This makes asm-generic/bitops/find.h usable by these architectures,
      which now include it. This change is also needed for the forthcoming
      cleanup of duplicated extern declarations.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      708ff2a0
  5. 27 Sep, 2010 1 commit
  6. 07 Apr, 2010 1 commit
    • B
      x86: Add optimized popcnt variants · d61931d8
      Borislav Petkov authored
      Add support for the hardware version of the Hamming weight function,
      popcnt, present in CPUs which advertise it under CPUID, Function
      0x0000_0001_ECX[23]. On CPUs which don't support it, we fall back to the
      default lib/hweight.c software versions.
      
      A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
      a 3x speedup on a F10h machine.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100318112015.GC11152@aftab>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      d61931d8
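For reference, a software Hamming weight can be written with the classic parallel bit-summing steps. The sketch below pairs such a fallback with the GCC/Clang builtin, which may compile to popcnt when the target allows it. It does not model the kernel's runtime selection between the two, and the function names are hypothetical.

```c
#include <stdint.h>

/* Portable population count using the classic parallel bit-summing steps. */
static unsigned int sketch_sw_hweight64(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;
	w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Prefers the compiler builtin and falls back to the software version. */
static unsigned int sketch_hweight64(uint64_t w)
{
#if defined(__GNUC__)
	return (unsigned int)__builtin_popcountll(w);
#else
	return sketch_sw_hweight64(w);
#endif
}
```

Both functions return the same count for every input; only the generated instructions differ.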
  7. 14 Jan, 2009 1 commit
    • A
      x86, generic: mark complex bitops.h inlines as __always_inline · c8399943
      Andi Kleen authored
      Impact: reduce kernel image size
      
      Hugh Dickins noticed that older gcc versions did not inline some of
      the bitops when the kernel is built for code size.
      
      Mark all complex x86 bitops that have more than a single
      asm statement or two as always inline to avoid this problem.
      
      Probably should be done for other architectures too.
      
      Ingo then found a better fix that only requires
      a single line change, but it unfortunately only
      works on gcc 4.3.
      
      On older gccs the original patch still makes a ~0.3% defconfig
      difference with CONFIG_OPTIMIZE_INLINING=y.
      
      With gcc 4.1 and a defconfig like build:
      
          6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
          6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining
      
      ~20k / 0.3% difference.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c8399943
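As a rough illustration of the technique, a function can be marked so that GCC-compatible compilers inline it regardless of their size heuristics. The macro name SKETCH_ALWAYS_INLINE and the helper below are hypothetical; kernel and libc headers define their own variants.

```c
/* Hypothetical macro name; falls back to plain inline on other compilers. */
#if defined(__GNUC__)
#define SKETCH_ALWAYS_INLINE	inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE	inline
#endif

/* A small bit test that the compiler is told to inline at every call site. */
static SKETCH_ALWAYS_INLINE int sketch_test_bit(unsigned int nr,
						const unsigned long *addr)
{
	const unsigned int bits = sizeof(unsigned long) * 8;

	return (addr[nr / bits] >> (nr % bits)) & 1;
}
```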
  8. 10 Jan, 2009 1 commit
  9. 06 Nov, 2008 1 commit
  10. 23 Oct, 2008 2 commits
  11. 23 Sep, 2008 1 commit
  12. 23 Jul, 2008 1 commit
    • V
      x86: consolidate header guards · 77ef50a5
      Vegard Nossum authored
      This patch is the result of an automatic script that consolidates the
      format of all the headers in include/asm-x86/.
      
      The format:
      
      1. No leading underscore. Names with leading underscores are reserved.
      2. Pathname components are separated by two underscores. So we can
         distinguish between mm_types.h and mm/types.h.
      3. Everything except letters and numbers is turned into a single
         underscore.
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      77ef50a5
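The three rules can be sketched as a small path-to-guard conversion. This is not the script used for the commit; the function name is hypothetical, and upper-casing the result is an assumption of the sketch.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: derive a guard name from a header path.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int sketch_guard_name(const char *path, char *out, size_t outlen)
{
	size_t n = 0;

	if (outlen == 0)
		return -1;

	/* Rule 1: skip leading characters that would become underscores. */
	while (*path && !isalnum((unsigned char)*path))
		path++;

	for (; *path; path++) {
		unsigned char c = (unsigned char)*path;
		int count = (c == '/') ? 2 : 1;	/* Rule 2: '/' becomes "__". */

		/* Keep room for the characters written below plus the NUL. */
		if (n + (size_t)count >= outlen)
			return -1;
		if (isalnum(c)) {
			out[n++] = (char)toupper(c);	/* Assumption: upper case. */
		} else {
			/* Rule 3: any other character becomes a single '_'. */
			while (count--)
				out[n++] = '_';
		}
	}
	out[n] = '\0';
	return 0;
}
```

Under these assumptions, "mm/types.h" becomes MM__TYPES_H while "mm_types.h" becomes MM_TYPES_H, which is the distinction rule 2 is meant to preserve.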
  13. 18 Jul, 2008 1 commit
  14. 21 Jun, 2008 1 commit
    • I
      x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaround · 437a0a54
      Ingo Molnar authored
      Jeremy Fitzhardinge reported this compiler bug:
      
      Suggestion from Linus: add "r" to the input constraint of the
      set_bit()/clear_bit()'s constant 'nr' branch:
      
      Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
      
       CC      init/main.o
      include2/asm/bitops.h: In function `start_kernel':
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      437a0a54
  15. 20 Jun, 2008 1 commit
  16. 19 Jun, 2008 1 commit
    • L
      x86, bitops: make constant-bit set/clear_bit ops faster · 1a750e0c
      Linus Torvalds authored
      On Wed, 18 Jun 2008, Linus Torvalds wrote:
      >
      > And yes, the "lock andl" should be noticeably faster than the xchgl.
      
      I dunno. Here's an untested (!!) patch that turns constant-bit
      set/clear_bit ops into byte mask ops (lock orb/andb).
      
      It's not exactly pretty. The reason for using the byte versions is that a
      locked op is serialized in the memory pipeline anyway, so there are no
      forwarding issues (that could slow down things when we access things with
      different sizes), and the byte ops are a lot smaller than 32-bit and
      particularly 64-bit ops (big constants, and the 64-bit ops need the REX
      prefix byte too).
      
      [ Side note: I wonder if we should turn the "test_bit()" C version into a
        "char *" version too.. It could actually help with alias analysis, since
        char pointers can alias anything. So it might be the RightThing(tm) to
        do for multiple reasons. I dunno. It's a separate issue. ]
      
      It does actually shrink the kernel image a bit (a couple of hundred bytes
      on the text segment for my everything-compiled-in image), and while it's
      totally untested the (admittedly few) code generation points I looked at
      seemed sane. And "lock orb" should be noticeably faster than "lock bts".
      
      If somebody wants to play with it, go wild. I didn't do "change_bit()",
      because nobody sane uses that thing anyway. I guarantee nothing. And if it
      breaks, nobody saw me do anything.  You can't prove this email wasn't sent
      by somebody who is good at forging smtp.
      
      This does require a gcc that is recent enough for "__builtin_constant_p()"
      to work in an inline function, but I suspect our kernel requirements are
      already higher than that. And if you do have an old gcc that is supported,
      the worst that would happen is that the optimization doesn't trigger.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1a750e0c
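The byte-mask idea can be sketched in userspace with the GCC/Clang __atomic builtins standing in for the locked orb/andb instructions. The macro and function names are hypothetical, and the byte addressing matches word-based bit numbering only on little-endian machines such as x86.

```c
/* Byte that holds bit nr, and the mask of that bit within the byte. */
#define SKETCH_CONST_MASK_ADDR(nr, addr) \
	((volatile unsigned char *)(addr) + ((nr) >> 3))
#define SKETCH_CONST_MASK(nr)	((unsigned char)(1U << ((nr) & 7)))

/* Atomically set bit nr with a byte-wide OR. */
static inline void sketch_set_bit(unsigned int nr, volatile void *addr)
{
	__atomic_fetch_or(SKETCH_CONST_MASK_ADDR(nr, addr),
			  SKETCH_CONST_MASK(nr), __ATOMIC_SEQ_CST);
}

/* Atomically clear bit nr with a byte-wide AND. */
static inline void sketch_clear_bit(unsigned int nr, volatile void *addr)
{
	__atomic_fetch_and(SKETCH_CONST_MASK_ADDR(nr, addr),
			   (unsigned char)~SKETCH_CONST_MASK(nr),
			   __ATOMIC_SEQ_CST);
}
```

When nr is a compile-time constant, both the byte offset and the mask fold into immediates, which is what keeps the encoding small.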
  17. 25 May, 2008 1 commit
  18. 11 May, 2008 1 commit
  19. 27 Apr, 2008 5 commits
    • J
    • A
      x86: finalize bitops unification · d66462f5
      Alexander van Heukelum authored
      include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now
      almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER
      and has an extra inline function set_bit_string. The define currently
      has no influence on the generated code, but it can be argued that
      setting it on i386 is the right thing to do anyhow. The addition
      of the extra inline function on i386 does not hurt either.
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d66462f5
    • A
      x86: merge the simple bitops and move them to bitops.h · 12d9c842
      Alexander van Heukelum authored
      Some of those can be written in such a way that the same
      inline assembly can be used to generate both 32 bit and
      64 bit code.
      
      For ffs and fls, x86_64 unconditionally used the cmov
      instruction and i386 unconditionally used a conditional
      branch over a mov instruction. In the current patch I
      chose to select the version based on the availability
      of the cmov instruction instead. A small detail here is
      that x86_64 did not previously set CONFIG_X86_CMOV=y.
      
      Improved comments for ffs, ffz, fls and variations.
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      12d9c842
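The special case that the cmov or the conditional branch has to handle is a zero input. The portable sketch below shows the conventional return values of ffs and fls; it uses loops rather than inline assembly, and the names are hypothetical.

```c
/* ffs: 1-based index of the least significant set bit, 0 when x == 0. */
static int sketch_ffs(unsigned int x)
{
	int r = 1;

	if (!x)
		return 0;
	while (!(x & 1)) {
		x >>= 1;
		r++;
	}
	return r;
}

/* fls: 1-based index of the most significant set bit, 0 when x == 0. */
static int sketch_fls(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}
```

For example, with x = 12 (binary 1100), sketch_ffs returns 3 and sketch_fls returns 4.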
    • A
      x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps · 64970b68
      Alexander van Heukelum authored
      This moves an optimization for searching constant-sized small
      bitmaps from x86_64-specific to generic code.
      
      On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
      changes with this applied. I have observed only four places where this
      optimization avoids a call into find_next_bit:
      
      In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
      and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
      In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
      
      On x86_64, 52 locations are optimized with a minimal increase in
      code size:
      
      Current #testing defconfig:
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392637  846592  724424 6963653  6a41c5 vmlinux
      
      After removing the x86_64 specific optimization for find_next_*bit:
      	94 x bsf, 79 x find_next_*bit
         text    data     bss     dec     hex filename
         5392358  846592  724424 6963374  6a40ae vmlinux
      
      After this patch (making the optimization generic):
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392396  846592  724424 6963412  6a40d4 vmlinux
      
      [ tglx@linutronix.de: build fixes ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      64970b68
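When the whole bitmap fits in a single word, the search reduces to masking that word and scanning it, with no function call needed. The sketch below shows only that fast path and leaves out the compile-time check that selects it; the names are hypothetical.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG	((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Single-word fast path. The caller guarantees
 * 0 < size <= SKETCH_BITS_PER_LONG. Returns size if no bit is found.
 */
static unsigned long sketch_find_next_bit_small(const unsigned long *addr,
						unsigned long size,
						unsigned long offset)
{
	unsigned long word, bit;

	if (offset >= size)
		return size;

	/* Drop the bits below offset and the bits at or above size. */
	word = *addr & (~0UL << offset);
	if (size < SKETCH_BITS_PER_LONG)
		word &= ~0UL >> (SKETCH_BITS_PER_LONG - size);
	if (!word)
		return size;

	/* At least one bit in [offset, size) is set, so this terminates. */
	for (bit = offset; !((word >> bit) & 1); bit++)
		;
	return bit;
}
```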
    • A
      x86: change x86 to use generic find_next_bit · 6fd92b63
      Alexander van Heukelum authored
      The versions with inline assembly are in fact slower on the machines I
      tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
      Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
      crashing the benchmark.
      
      Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
      1...512, for each possible bitmap with one bit set, for each possible
      offset: find the position of the first bit starting at offset. If you
      follow ;). Times include setup of the bitmap and checking of the
      results.
      
      		Athlon		Xeon		Opteron 32/64bit
      x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
      generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
      
      If the bitmap size is not a multiple of BITS_PER_LONG, and no set
      (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
      value outside of the range [0, size]. The generic version always returns
      exactly size. The generic version also uses unsigned long everywhere,
      while the x86 versions use a mishmash of int, unsigned (int), long and
      unsigned long.
      
      Using the generic version does give a slightly bigger kernel, though.
      
      defconfig:	   text    data     bss     dec     hex filename
      x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
      generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
      x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
      generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6fd92b63
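The return-value contract described above, where a failed search returns exactly size, can be sketched in plain C with unsigned long throughout. This is an illustrative implementation, not the kernel's generic find_next_bit; the names are hypothetical.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG	((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Returns the index of the first set bit at or after offset, or size if none. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long size,
					  unsigned long offset)
{
	while (offset < size) {
		unsigned long base = offset - offset % SKETCH_BITS_PER_LONG;
		unsigned long word = addr[offset / SKETCH_BITS_PER_LONG];

		/* Discard the bits below offset in this word. */
		word &= ~0UL << (offset % SKETCH_BITS_PER_LONG);
		if (word) {
			unsigned long bit = base;

			while (!(word & 1)) {
				word >>= 1;
				bit++;
			}
			/* A set bit beyond the end of the bitmap does not count. */
			return bit < size ? bit : size;
		}
		/* Advance to the start of the next word. */
		offset = base + SKETCH_BITS_PER_LONG;
	}
	return size;
}
```

The final clamp is what guarantees the result never exceeds size, even when size is not a multiple of the word width and stray bits are set past the end.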
  20. 17 Apr, 2008 2 commits
    • J
      286275c9
    • J
      x86: bitops asm constraint fixes · 709f744f
      Jan Beulich authored
      This (simplified) piece of code didn't behave as expected due to
      incorrect constraints in some of the bitops functions, when
      X86_FEATURE_xxx is referring to other than the first long:
      
      int test(struct cpuinfo_x86 *c) {
      	if (cpu_has(c, X86_FEATURE_xxx))
      		clear_cpu_cap(c, X86_FEATURE_xxx);
      	return cpu_has(c, X86_FEATURE_xxx);
      }
      
      I'd really like to understand, though, what the policy of (not) having a
      "memory" clobber in these operations is - currently, this appears to
      be totally inconsistent. Also, many comments of the non-atomic
      functions say those may also be re-ordered - this contradicts the use
      of "asm volatile" in there, which again I'd like to understand.
      
      As much as all of these, using 'int' for the 'nr' parameter and
      'void *' for the 'addr' one is in conflict with
      Documentation/atomic_ops.txt, especially because bt{,c,r,s} indeed
      take the bit index as signed (which hence would really need special
      precaution) and access the full 32 bits (if 'unsigned long' was used
      properly here, 64 bits for x86-64) pointed at, so invalid uses like
      referencing a 'char' array cannot currently be caught.
      
      Finally, the code with and without this patch relies heavily on the
      -fno-strict-aliasing compiler switch and I'm not certain this really
      is a good idea.
      
      In the light of all of this I'm sending this as RFC, as fixing the
      above might warrant a much bigger patch...
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      709f744f
  21. 30 Jan, 2008 2 commits
  22. 11 Oct, 2007 1 commit