Commit 39d997b5, authored by Akinobu Mita, committed by Ingo Molnar

x86, core: Optimize hweight32()

Optimize hweight32 by using the same technique as hweight64.
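
For reference, the 64-bit form of this multiply-based technique can be sketched in userspace as below. This is an illustrative approximation, not code copied from lib/hweight.c; the name `popcount64_mul` is hypothetical and the sketch assumes `uint64_t` from <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical userspace sketch of a multiply-based 64-bit popcount. */
unsigned int popcount64_mul(uint64_t w)
{
	/* Each 2-bit field now holds the number of set bits in it (0..2). */
	w -= (w >> 1) & 0x5555555555555555ULL;
	/* Each 4-bit field now holds its bit count (0..4). */
	w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
	/* Each byte now holds its bit count (0..8). */
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	/* Multiplying by 0x0101...01 sums all eight byte counts into the top byte. */
	return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}
```

The 32-bit version in this patch is the same sequence with 32-bit masks, the multiplier 0x01010101, and a final shift of 24.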

The proof of this technique can be found in the commit log for
f9b41929 ("bitops: hweight() speedup").
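
A brief sketch of why the final multiply works, writing $b_0$ through $b_3$ (notation assumed here) for the per-byte bit counts left in `w` after the third step of the fast path:

```latex
After the third step, $w = \sum_{i=0}^{3} b_i \, 2^{8i}$ with $0 \le b_i \le 8$. Then
\[
  w \cdot \mathtt{0x01010101} = \sum_{i=0}^{3} \sum_{j=0}^{3} b_i \, 2^{8(i+j)} .
\]
For $k \le 3$, byte $k$ of the product collects $\sum_{i+j=k} b_i = \sum_{i=0}^{k} b_i \le 32 < 256$,
so no byte carries into the next one, and byte $3$ (bits 24..31) holds $b_0 + b_1 + b_2 + b_3$.
Terms with $i + j \ge 4$ lie above bit 31 and are discarded by 32-bit arithmetic, hence
\[
  \bigl(w \cdot \mathtt{0x01010101}\bigr) \gg 24 = b_0 + b_1 + b_2 + b_3 .
\]
```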

A userspace benchmark on x86_32 showed a 20% speedup in
bitmap_weight(), which uses hweight32 to count the bits in each
unsigned long on 32-bit architectures.

 int main(void)
 {
	#define SZ (1024 * 1024 * 512)

	static DECLARE_BITMAP(bitmap, SZ) = {
	        [0 ... 100] = 1,
	};

	return bitmap_weight(bitmap, SZ);
 }
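
The benchmark relies on kernel-internal helpers (DECLARE_BITMAP and bitmap_weight()) and the GNU `[first ... last]` range designator, so those helpers must be provided before it will compile in userspace. A minimal self-contained approximation is sketched below, assuming a bitmap stored as `uint32_t` words to mimic the 32-bit case; `hweight32_mul` and `my_bitmap_weight` are hypothetical names, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32

/* Multiply-based popcount of one 32-bit word, following the patch. */
static unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (w * 0x01010101u) >> 24;
}

/* Hypothetical helper: count set bits in the first nbits bits of a bitmap. */
size_t my_bitmap_weight(const uint32_t *bitmap, size_t nbits)
{
	size_t full = nbits / BITS_PER_WORD;
	size_t rem = nbits % BITS_PER_WORD;
	size_t total = 0;

	for (size_t i = 0; i < full; i++)
		total += hweight32_mul(bitmap[i]);
	/* Ignore bits past nbits in a trailing partial word. */
	if (rem)
		total += hweight32_mul(bitmap[full] & ((1u << rem) - 1));
	return total;
}
```

In the original benchmark, `[0 ... 100] = 1` sets each of the first 101 words to 1, so the program returns 101.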

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1258603932-4590-1-git-send-email-akinobu.mita@gmail.com>
[ only x86 sets ARCH_HAS_FAST_MULTIPLIER, so we do this via the x86 tree ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Parent 6b7b2849
@@ -11,11 +11,18 @@
 unsigned int hweight32(unsigned int w)
 {
+#ifdef ARCH_HAS_FAST_MULTIPLIER
+	w -= (w >> 1) & 0x55555555;
+	w = (w & 0x33333333) + ((w >> 2) & 0x33333333);
+	w = (w + (w >> 4)) & 0x0f0f0f0f;
+	return (w * 0x01010101) >> 24;
+#else
 	unsigned int res = w - ((w >> 1) & 0x55555555);
 	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
 	res = (res + (res >> 4)) & 0x0F0F0F0F;
 	res = res + (res >> 8);
 	return (res + (res >> 16)) & 0x000000FF;
+#endif
 }
 EXPORT_SYMBOL(hweight32);
......
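
To check that the two branches of the #ifdef compute the same result, both can be compiled side by side in userspace and compared against a naive bit count. In this sketch `hweight32_fast`, `hweight32_generic`, `hweight32_naive` and `hweight32_agree` are hypothetical names; the bodies follow the two branches of the patch, with `uint32_t` in place of `unsigned int` so the 32-bit wraparound the multiply relies on is explicit.

```c
#include <stdint.h>

/* Fast-multiplier branch of the patch (hypothetical standalone name). */
unsigned int hweight32_fast(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	/* One multiply adds the four byte counts into bits 24..31. */
	return (w * 0x01010101u) >> 24;
}

/* Generic #else branch of the patch (hypothetical standalone name). */
unsigned int hweight32_generic(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);
	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u);
	res = (res + (res >> 4)) & 0x0F0F0F0Fu;
	/* Two shift-and-add folds gather the byte counts into the low byte. */
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0x000000FFu;
}

/* Naive reference: clear the lowest set bit until none remain. */
unsigned int hweight32_naive(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
}

/* Returns 1 if all three agree on every input in [start, start + count). */
int hweight32_agree(uint32_t start, uint32_t count)
{
	for (uint32_t i = 0; i < count; i++) {
		uint32_t w = start + i; /* wraps modulo 2^32 */
		unsigned int r = hweight32_naive(w);

		if (hweight32_fast(w) != r || hweight32_generic(w) != r)
			return 0;
	}
	return 1;
}
```

The fast branch replaces the two shift-and-add folds and the final mask with a single multiply and shift, which only pays off when multiplication is cheap; that is why it is guarded by ARCH_HAS_FAST_MULTIPLIER.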