1. 29 4月, 2008 7 次提交
  2. 28 4月, 2008 2 次提交
    • P
      mempolicy: add bitmap_onto() and bitmap_fold() operations · 7ea931c9
      Paul Jackson 提交于
      The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
      with the usual cpumask and nodemask wrappers.
      
      The bitmap_onto() operator computes one bitmap relative to another.  If the
      n-th bit in the origin mask is set, then the m-th bit of the destination mask
      will be set, where m is the position of the n-th set bit in the relative mask.
      
      The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
      the input bitmap has some bit n set, where m == n mod sz, for the specified sz
      value.
      
      There are two substantive changes between this patch and its
      predecessor bitmap_relative:
       1) Renamed bitmap_relative() to be bitmap_onto().
       2) Added bitmap_fold().
      
      The essential motivation for bitmap_onto() is to provide a mechanism for
      converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
      relative masks are written as if the current task were in a cpuset whose CPUs
      or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
      bitmap_onto() operator is provided in anticipation of adding support for the
      first such cpuset relative mask, by the mbind() and set_mempolicy() system
      calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
      (and their nodemask wrappers, in particular) will be used in code that
      converts the user specified cpuset relative memory policy to a specific system
      node numbered policy, given the current mems_allowed of the tasks cpuset.
      
      Such cpuset relative mempolicies will address two deficiencies
      of the existing interface between cpusets and mempolicies:
       1) A task cannot at present reliably establish a cpuset
          relative mempolicy because there is an essential race
          condition, in that the tasks cpuset may be changed in
          between the time the task can query its cpuset placement,
          and the time the task can issue the applicable mbind or
          set_memplicy system call.
       2) A task cannot at present establish what cpuset relative
          mempolicy it would like to have, if it is in a smaller
          cpuset than it might have mempolicy preferences for,
          because the existing interface only allows specifying
          mempolicies for nodes currently allowed by the cpuset.
      
      Cpuset relative mempolicies are useful for tasks that don't distinguish
      particularly between one CPU or Node and another, but only between how many of
      each are allowed, and the proper placement of threads and memory pages on the
      various CPUs and Nodes available.
      
      The motivation for the added bitmap_fold() can be seen in the following
      example.
      
      Let's say an application has specified some mempolicies that presume 16 memory
      nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
      relative) nodes 12-15.  Then lets say that application is crammed into a
      cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
      this mempolicy, mapped to that cpuset, would ignore the requested relative
      nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
      higher nodes down, so that some nodes are included in the resulting mapped
      mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
      weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
      specifying nodes 4-7.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: <kosaki.motohiro@jp.fujitsu.com>
      Cc: <ray-lk@madrabbit.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ea931c9
    • C
      Remove set_migrateflags() · 488514d1
      Christoph Lameter 提交于
      Migrate flags must be set on slab creation as agreed upon when the antifrag
      logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
      unmovable and others in the reclaimable section depending on which flag was
      active when a new slab page was allocated.
      
      This likely slid in somehow when antifrag was merged. Remove it.
      
      The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
      SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
      effect there.
      
      Radix tree allocations are not directly reclaimable but they are allocated
      with __GFP_RECLAIMABLE set on each allocation.  We now set
      SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
      tree slabs are consistently placed in the reclaimable section.  Radix tree
      slabs will also be accounted as such.
      
      There is then no user left of set_migratepages. So remove it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      488514d1
  3. 27 4月, 2008 4 次提交
    • A
      x86, bitops: select the generic bitmap search functions · 19870def
      Alexander van Heukelum 提交于
      Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
      lib/Kconfig, defaulting to off. An arch that wants to use the
      generic implementation now only has to use a select statement
      to include them.
      
      I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
      and used that to select the generic search functions. This
      way ARCH=um SUBARCH=i386 automatically picks up the change
      too, and arch/um/Kconfig.i386 can therefore be simplified a
      bit. ARCH=um SUBARCH=x86_64 does things differently, but
      still compiles fine. It seems that a "def_bool y" always
      wins over a "def_bool n"?
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      19870def
    • A
      x86: generic versions of find_first_(zero_)bit, convert i386 · 77b9bd9c
      Alexander van Heukelum 提交于
      Generic versions of __find_first_bit and __find_first_zero_bit
      are introduced as simplified versions of __find_next_bit and
      __find_next_zero_bit. Their compilation and use are guarded by
      a new config variable GENERIC_FIND_FIRST_BIT.
      
      The generic versions of find_first_bit and find_first_zero_bit
      are implemented in terms of the newly introduced __find_first_bit
      and __find_first_zero_bit.
      
      This patch does not remove the i386-specific implementation,
      but it does switch i386 to use the generic functions by setting
      GENERIC_FIND_FIRST_BIT=y for X86_32.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      77b9bd9c
    • A
      x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps · 64970b68
      Alexander van Heukelum 提交于
      This moves an optimization for searching constant-sized small
      bitmaps form x86_64-specific to generic code.
      
      On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
      changes with this applied. I have observed only four places where this
      optimization avoids a call into find_next_bit:
      
      In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
      and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
      In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
      
      On x86_64, 52 locations are optimized with a minimal increase in
      code size:
      
      Current #testing defconfig:
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392637  846592  724424 6963653  6a41c5 vmlinux
      
      After removing the x86_64 specific optimization for find_next_*bit:
      	94 x bsf, 79 x find_next_*bit
         text    data     bss     dec     hex filename
         5392358  846592  724424 6963374  6a40ae vmlinux
      
      After this patch (making the optimization generic):
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392396  846592  724424 6963412  6a40d4 vmlinux
      
      [ tglx@linutronix.de: build fixes ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      64970b68
    • A
      x86: change x86 to use generic find_next_bit · 6fd92b63
      Alexander van Heukelum 提交于
      The versions with inline assembly are in fact slower on the machines I
      tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
      Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
      crashing the benchmark.
      
      Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
      1...512, for each possible bitmap with one bit set, for each possible
      offset: find the position of the first bit starting at offset. If you
      follow ;). Times include setup of the bitmap and checking of the
      results.
      
      		Athlon		Xeon		Opteron 32/64bit
      x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
      generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
      
      If the bitmap size is not a multiple of BITS_PER_LONG, and no set
      (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
      value outside of the range [0, size]. The generic version always returns
      exactly size. The generic version also uses unsigned long everywhere,
      while the x86 versions use a mishmash of int, unsigned (int), long and
      unsigned long.
      
      Using the generic version does give a slightly bigger kernel, though.
      
      defconfig:	   text    data     bss     dec     hex filename
      x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
      generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
      x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
      generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6fd92b63
  4. 26 4月, 2008 1 次提交
    • A
      Add option to enable -Wframe-larger-than= on gcc 4.4 · 35bb5b1e
      Andi Kleen 提交于
      Add option to enable -Wframe-larger-than= on gcc 4.4
      
      gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
      option to warn at build time about too large stack frames. Add a config
      option to enable this warning, since this very useful for the kernel.
      
      I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
      and 1024 as default for 32bit architectures.  With some research and
      fixing all the code for smaller values these defaults should be probably
      lowered.
      
      With the default allyesconfigs have some new warnings, but I think
      that is all code that should be just fixed.
      
      At some point (when gcc 4.4 is released and widely used) this should
      obsolete make checkstack
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      35bb5b1e
  5. 24 4月, 2008 1 次提交
  6. 20 4月, 2008 3 次提交
  7. 19 4月, 2008 3 次提交
  8. 18 4月, 2008 4 次提交
    • S
      firewire: fw-ohci: add option for remote debugging · 080de8c2
      Stefan Richter 提交于
      This way firewire-ohci can be used for remote debugging like ohci1394.
      Version with amendment from Fri, 11 Apr 2008 00:08:08 +0200.
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: NBernhard Kaindl <bk@suse.de>
      080de8c2
    • J
      kgdb: allow static kgdbts boot configuration · 974460c5
      Jason Wessel 提交于
      This patch adds in the ability to compile the kgdb internal test
      string into the kernel so as to run the tests at boot without changing
      the kernel boot arguments.  This patch also changes all the error
      paths to invoke WARN_ON(1) which will emit the line number of the file
      and dump the kernel stack when an error occurs.
      
      You can disable the tests in a kernel that is built this way
      using "kgdbts="
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      974460c5
    • J
      kgdb: add kgdb internal test suite · e8d31c20
      Jason Wessel 提交于
      This patch adds regression tests for testing the kgdb core and arch
      specific implementation.
      
      The kgdb test suite is designed to be built into the kernel and not as
      a module because it uses a number of low level kernel and kgdb
      primitives which should not be exported externally.
      
      The kgdb test suite is designed as a KGDB I/O module which
      simulates the communications that a debugger would have with kgdb.
      The tests are broken up in to a line by line and referenced here as
      a "get" which is kgdb requesting input and "put" which is kgdb
      sending a response.
      
      The kgdb suite can be invoked from the kernel command line
      arguments system or executed dynamically at run time.  The test
      suite uses the variable "kgdbts" to obtain the information about
      which tests to run and to configure the verbosity level.  The
      following are the various characters you can use with the kgdbts=
      line:
      
      When using the "kgdbts=" you only choose one of the following core
      test types:
      A = Run all the core tests silently
      V1 = Run all the core tests with minimal output
      V2 = Run all the core tests in debug mode
      
      You can also specify optional tests:
      N## = Go to sleep with interrupts of for ## seconds
            to test the HW NMI watchdog
      F## = Break at do_fork for ## iterations
      S## = Break at sys_open for ## iterations
      
      NOTE: that the do_fork and sys_open tests are mutually exclusive.
      
      To invoke the kgdb test suite from boot you use a kernel start
      argument as follows:
      	kgdbts=V1 kgdbwait
      Or if you wanted to perform the NMI test for 6 seconds and do_fork
      test for 100 forks, you could use:
      	kgdbts=V1N6F100 kgdbwait
      
      The test suite can also be invoked at run time with:
      echo kgdbts=V1N6F100 > /sys/module/kgdbts/parameters/kgdbts
      Or as another example:
      echo kgdbts=V2 > /sys/module/kgdbts/parameters/kgdbts
      
      When developing a new kgdb arch specific implementation or
      using these tests for the purpose of regression testing,
      several invocations are required.
      
      1) Boot with the test suite enabled by using the kernel arguments
            "kgdbts=V1F100 kgdbwait"
         ## If kgdb arch specific implementation has NMI use
            "kgdbts=V1N6F100
      
      2) After the system boot run the basic test.
      echo kgdbts=V1 > /sys/module/kgdbts/parameters/kgdbts
      
      3) Run the concurrency tests.  It is best to use n+1
         while loops where n is the number of cpus you have
         in your system.  The example below uses only two
         loops.
      
      ## This tests break points on sys_open
      while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
      while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
      echo kgdbts=V1S10000 > /sys/module/kgdbts/parameters/kgdbts
      fg # and hit control-c
      fg # and hit control-c
      ## This tests break points on do_fork
      while [ 1 ] ; do date > /dev/null ; done &
      while [ 1 ] ; do date > /dev/null ; done &
      echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts
      fg # and hit control-c
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e8d31c20
    • J
      kgdb: core · dc7d5527
      Jason Wessel 提交于
      kgdb core code. Handles the protocol and the arch details.
      
      [ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
      [ xemul@openvz.org: use find_task_by_pid_ns ]
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJan Kiszka <jan.kiszka@web.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      dc7d5527
  9. 17 4月, 2008 3 次提交
  10. 15 4月, 2008 3 次提交
    • P
      [LMB] Restructure allocation loops to avoid unsigned underflow · d9024df0
      Paul Mackerras 提交于
      There is a potential bug in __lmb_alloc_base where we subtract `size'
      from the base address of a reserved region without checking whether
      the subtraction could wrap around and produce a very large unsigned
      value.  In fact it probably isn't possible to hit the bug in practice
      since it would only occur in the situation where we can't satisfy the
      allocation request and there is a reserved region starting at 0.
      
      This fixes the potential bug by breaking out of the loop when we get
      to the point where the base of the reserved region is less than the
      size requested.  This also restructures the loop to be a bit easier to
      follow.
      
      The same logic got copied into lmb_alloc_nid_unreserved, so this makes
      a similar change there.  Here the bug is more likely to be hit because
      the outer loop  (in lmb_alloc_nid) goes through the memory regions in
      increasing order rather than decreasing order as __lmb_alloc_base
      does, and we are therefore more likely to hit the case where we are
      testing against a reserved region with a base address of 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d9024df0
    • P
      [LMB] Fix some whitespace and other formatting issues, use pr_debug · 300613e5
      Paul Mackerras 提交于
      This makes no semantic changes.  It fixes the whitespace and formatting
      a bit, gets rid of a local DBG macro and uses the equivalent pr_debug
      instead, and restructures one while loop that had a function call and
      assignment in the condition to be a bit more readable.  Some comments
      about functions being called with relocation disabled were also removed
      as they would just be confusing to most readers now that the code is
      in lib/.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      300613e5
    • D
      [LMB] Add lmb_alloc_nid() · c50f68c8
      David S. Miller 提交于
      A variant of lmb_alloc() that tries to allocate memory on a specified
      NUMA node 'nid' but falls back to normal lmb_alloc() if that fails.
      
      The caller provides a 'nid_range' function pointer which assists the
      allocator.  It is given args 'start', 'end', and pointer to integer
      'this_nid'.
      
      It places at 'this_nid' the NUMA node id that corresponds to 'start',
      and returns the end address within 'start' to 'end' at which memory
      assosciated with 'nid' ends.
      
      This callback allows a platform to use lmb_alloc_nid() in just
      about any context, even ones in which early_pfn_to_nid() might
      not be working yet.
      
      This function will be used by the NUMA setup code on sparc64, and also
      it can be used by powerpc, replacing it's hand crafted
      "careful_allocation()" function in arch/powerpc/mm/numa.c
      
      If x86 ever converts it's NUMA support over to using the LMB helpers,
      it can use this too as it has something entirely similar.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c50f68c8
  11. 14 4月, 2008 1 次提交
    • C
      slub: Deal with config variable dependencies · 5b06c853
      Christoph Lameter 提交于
      count_partial() is used by both slabinfo and the sysfs proc support. Move
      the function directly before the beginning of the sysfs code so that it can
      be easily found. Rework the preprocessor conditional to take into account
      that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.
      
      Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
      is no point of keeping statistics if no one can restrive them.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      5b06c853
  12. 11 4月, 2008 1 次提交
  13. 08 4月, 2008 1 次提交
  14. 04 4月, 2008 2 次提交
  15. 31 3月, 2008 1 次提交
    • M
      fix uevent action-string regression · a9edadbf
      Mark Lord 提交于
      Mark Lord wrote:
      >
      > On boot, syslog is flooded with "uevent: unsupported action-string;" messages.
      ..
      > Mar 28 14:43:29 shrimp kernel: tty ptyqd: uevent: unsupported
      > action-string; this will be ignored in a future kernel version
      > Mar 28 14:43:29 shrimp kernel: tty ptyqe: uevent: unsupported
      > action-string; this will be ignored in a future kernel version
      > Mar 28 14:43:29 shrimp kernel: tty ptyqf: uevent: unsupported
      > action-string; this will be ignored in a future kernel version
      > Mar 28 14:43:29 shrimp kernel: tty ptyr0: uevent: unsupported
      > action-string; this will be ignored in a future kernel version
      ..
      
      These messages are a regression compared with 2.6.24, which did not
      flood the syslog with them.
      
      The actual underlying problem was introduced in 2.6.23, when somebody
      made the string parsing no longer accept nul-terminated strings as a
      valid input to store_uevent().
      
      Eg.  "add\0" was valid prior to 2.6.23, where the code regressed to
      require "add" without the '\0'.
      
      This patch fixes the 2.6.23 / 2.6.24 regressions, by having the code
      once again tolerate the trailing '\0', if present.
      
      According to GregKH, this mainly affects older Ubuntu systems, such as
      the one I have here that requires this fix.
      Signed-off-by: NMark Lord <mlord@pobox.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9edadbf
  16. 29 3月, 2008 1 次提交
  17. 28 3月, 2008 1 次提交
  18. 25 3月, 2008 1 次提交