1. 27 Jul, 2011 1 commit
  2. 25 May, 2011 1 commit
    • bitmap, irq: add smp_affinity_list interface to /proc/irq · 4b060420
      Committed by Mike Travis
      Manually adjusting the smp_affinity for IRQs becomes unwieldy when the
      cpu count is large.
      
      Setting smp affinity to cpus 256 to 263 would be:
      
      	echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity
      
      instead of:
      
      	echo 256-263 > smp_affinity_list
      
      Think about what it looks like for cpus around say, 4088 to 4095.
      
      We already have many alternate "list" interfaces:
      
      /sys/devices/system/cpu/cpuX/indexY/shared_cpu_list
      /sys/devices/system/cpu/cpuX/topology/thread_siblings_list
      /sys/devices/system/cpu/cpuX/topology/core_siblings_list
      /sys/devices/system/node/nodeX/cpulist
      /sys/devices/pci***/***/local_cpulist
      
      Add a companion interface, smp_affinity_list, to use cpu lists instead of
      cpu maps.  This conforms to other companion interfaces where both a map
      and a list interface exist.
      
      This required adding a bitmap_parselist_user() function in a manner
      similar to the bitmap_parse_user() function.
      
      [akpm@linux-foundation.org: make __bitmap_parselist() static]
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
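To make the difference between the map and list formats of the smp_affinity_list commit concrete, here is a minimal user-space sketch, not the kernel's bitmap_parselist_user(). It parses a cpu list into 32-bit words and prints them as the comma-separated hex map, most significant word first. The names `parse_cpu_list`, `format_cpu_mask`, `MAX_CPUS` and `MASK_WORDS` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS   4096              /* assumption: arbitrary bound for this sketch */
#define MASK_WORDS (MAX_CPUS / 32)   /* 32 cpus per printed hex word */

/* Parse a list such as "0-3,8,32-39" into words[]; returns 0 on success, -1 on error. */
static int parse_cpu_list(const char *list, unsigned int *words)
{
    memset(words, 0, MASK_WORDS * sizeof(*words));
    while (*list) {
        char *end;
        unsigned long lo = strtoul(list, &end, 10);
        unsigned long hi = lo;

        if (end == list)
            return -1;               /* no digits where a number was expected */
        if (*end == '-') {
            const char *start = end + 1;

            hi = strtoul(start, &end, 10);
            if (end == start)
                return -1;           /* range without an upper bound */
        }
        if (lo > hi || hi >= MAX_CPUS)
            return -1;
        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            words[cpu / 32] |= 1u << (cpu % 32);
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;               /* unexpected character */
        list = end;
    }
    return 0;
}

/* Print the low nbits of the mask as hex words, most significant word first. */
static void format_cpu_mask(const unsigned int *words, unsigned int nbits,
                            char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int i = (int)((nbits + 31) / 32) - 1; i >= 0 && used < len; i--) {
        int n = snprintf(buf + used, len - used, "%08x%s", words[i], i ? "," : "");

        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

With a 64-cpu mask, the list "32-39" corresponds to the map "000000ff,00000000"; each additional 32 cpus adds one more hex word to the map, while the list form stays the same length.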
  3. 07 Mar, 2010 1 commit
  4. 25 Feb, 2010 1 commit
    • rcu: Accelerate grace period if last non-dynticked CPU · 8bd93a2c
      Committed by Paul E. McKenney
      Currently, rcu_needs_cpu() simply checks whether the current CPU
      has an outstanding RCU callback, which means that the last CPU
      to go into dyntick-idle mode might wait a few ticks for the
      relevant grace periods to complete.  However, if all the other
      CPUs are in dyntick-idle mode, and if this CPU is in a quiescent
      state (which it is for RCU-bh and RCU-sched any time that we are
      considering going into dyntick-idle mode), then the grace period
      is instantly complete.
      
      This patch therefore repeatedly invokes the RCU grace-period
      machinery in order to force any needed grace periods to complete
      quickly.  It does so a limited number of times in order to
      prevent starvation by an RCU callback function that might pass
      itself to call_rcu().
      
      However, if any CPU other than the current one is not in
      dyntick-idle mode, fall back to simply checking for an
      outstanding callback (with a fix for the bug noted by Lai
      Jiangshan).  Also take advantage of the last grace-period
      forcing, an opportunity noted by Steve Rostedt, and apply the
      simplified #ifdef condition suggested by Frederic Weisbecker.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 07 Dec, 2009 1 commit
    • sched: Fix balance vs hotplug race · 6ad4c188
      Committed by Peter Zijlstra
      Since (e761b772: cpu hotplug, sched: Introduce cpu_active_map and redo
      sched domain managment) we have cpu_active_mask, which is supposed to rule
      scheduler migration and load-balancing, except it never (fully) did.
      
      The particular problem being solved here is a crash in try_to_wake_up()
      where select_task_rq() ends up selecting an offline cpu because
      select_task_rq_fair() trusts the sched_domain tree to reflect the
      current state of affairs; similarly, select_task_rq_rt() trusts the
      root_domain.
      
      However, the sched_domains are updated from CPU_DEAD, which is after the
      cpu is taken offline and after stop_machine is done. Therefore it can
      race perfectly well with code assuming the domains are right.
      
      Cure this by building the domains from cpu_active_mask on
      CPU_DOWN_PREPARE.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 24 Sep, 2009 4 commits
  7. 23 Sep, 2009 1 commit
    • generic-ipi: make struct call_function_data lockless · 54fdade1
      Committed by Xiao Guangrong
      This patch removes the spinlock from struct call_function_data, for
      the following reasons:
      
      1: Add a new cpumask interface, cpumask_test_and_clear_cpu(), which
         atomically tests and clears a specific cpu.  It replaces the
         cpumask_test_cpu() and cpumask_clear_cpu() pair, so data->lock is
         no longer needed to protect them in
         generic_smp_call_function_interrupt().
      
      2: In smp_call_function_many(), once csd_lock() returns, the current
         cpu's cfd_data has been removed from the call_function list, so it
         cannot race with other cpus.  cfd_data is then only used in
         smp_call_function_many(), which must be called with preemption
         disabled and not from a hardware interrupt handler or a bottom
         half handler.  Only the corresponding cpu can use it, so there is
         no race on the current cpu either, and cfd_data->lock is not
         needed to protect it.
      
      3: After 1 and 2, cfd_data->lock is only used to protect
         cfd_data->refs in generic_smp_call_function_interrupt(), so
         cfd_data->refs can become an atomic_t and cfd_data->lock is no
         longer needed at all.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      [akpm@linux-foundation.org: use atomic_dec_return()]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
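The lockless scheme in the generic-ipi commit rests on two atomic operations: a test-and-clear on the cpu's bit and a decrement of the reference count. The following user-space sketch models that idea with C11 atomics. It is not the kernel code; `struct call_data`, `test_and_clear_bit_atomic` and `handle_ipi` are hypothetical names, and the mask is limited to one word for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Atomically clear bit nr in map; return 1 if it was set before, else 0. */
static int test_and_clear_bit_atomic(unsigned int nr, atomic_ulong *map)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&map[nr / BITS_PER_WORD], ~bit);

    return (old & bit) != 0;
}

/* Hypothetical stand-in for struct call_function_data, without a spinlock. */
struct call_data {
    atomic_ulong cpumask[1];   /* cpus that still have to run the function */
    atomic_int refs;           /* number of cpus that have not finished yet */
};

/*
 * Model of the interrupt handler on one cpu.
 * Returns -1 if this cpu had nothing to do, 1 if it dropped the last
 * reference, and 0 if other cpus are still pending.
 */
static int handle_ipi(struct call_data *d, unsigned int cpu)
{
    assert(cpu < BITS_PER_WORD);
    if (!test_and_clear_bit_atomic(cpu, d->cpumask))
        return -1;
    /* ... the requested function would run here ... */
    return atomic_fetch_sub(&d->refs, 1) == 1;   /* old value 1: we were last */
}
```

Because the test and the clear happen in one atomic step, two handlers can never both observe the same bit as set, which is what made the lock around the separate test and clear calls unnecessary.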
  8. 22 Aug, 2009 1 commit
    • Make bitmask 'and' operators return a result code · f4b0373b
      Committed by Linus Torvalds
      When 'and'ing two bitmasks (where 'andnot' is a variation on it), some
      cases want to know whether the result is the empty set or not.  In
      particular, the TLB IPI sending code wants to do cpumask operations and
      determine if there are any CPUs left in the final set.
      
      So this just makes the bitmask (and cpumask) functions return a boolean
      for whether the result has any bits set.
      
      Cc: stable@kernel.org (2.6.30, needed by TLB shootdown fix)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
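The idea of an 'and' that also reports whether the result is non-empty can be sketched in user space as follows. This is an illustration under simplifying assumptions (whole words only, no handling of a partial last word), and the names `bitmap_and_words` and `bitmap_andnot_words` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* dst = a & b over nwords words; returns 1 if dst has any bit set, else 0. */
static int bitmap_and_words(unsigned long *dst, const unsigned long *a,
                            const unsigned long *b, size_t nwords)
{
    unsigned long any = 0;

    assert(dst && a && b);
    for (size_t i = 0; i < nwords; i++)
        any |= (dst[i] = a[i] & b[i]);
    return any != 0;
}

/* dst = a & ~b over nwords words; returns 1 if dst has any bit set, else 0. */
static int bitmap_andnot_words(unsigned long *dst, const unsigned long *a,
                               const unsigned long *b, size_t nwords)
{
    unsigned long any = 0;

    assert(dst && a && b);
    for (size_t i = 0; i < nwords; i++)
        any |= (dst[i] = a[i] & ~b[i]);
    return any != 0;
}
```

A caller such as IPI-sending code can then write `if (!bitmap_andnot_words(...)) return;` instead of computing the mask and scanning it a second time to see whether it is empty.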
  9. 09 Jun, 2009 1 commit
  10. 01 Jan, 2009 1 commit
  11. 30 Dec, 2008 3 commits
  12. 19 Dec, 2008 1 commit
    • cpumask: Add alloc_cpumask_var_node() · 7b4967c5
      Committed by Mike Travis
      Impact: New API
      
      This will be needed in x86 code to allocate the domain and old_domain
      cpumasks on the same node as where the containing irq_cfg struct is
      allocated.
      
      (Also fixes double-dump_stack on rare CONFIG_DEBUG_PER_CPU_MAPS case)
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (re-impl alloc_cpumask_var)
  13. 13 Dec, 2008 2 commits
    • cpumask: Use all NR_CPUS bits unless CONFIG_CPUMASK_OFFSTACK · 7be75853
      Committed by Rusty Russell
      Impact: futureproof as we convert more code to new APIs
      
      The old cpumask operators treat all NR_CPUS bits as relevant; the new
      ones use nr_cpumask_bits.  For large NR_CPUS and small nr_cpu_ids, this
      makes a difference.
      
      However, mixing the two can cause problems with undefined bits.  An
      arch which sets CONFIG_CPUMASK_OFFSTACK should have converted across
      to the new operators, so it's safe in that case.
      
      (Thanks to Stephen Rothwell for bisecting the initial unused-bits bug,
      and Mike Travis for this solution).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Mike Travis <travis@sgi.com>
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. · 29c0177e
      Committed by Rusty Russell
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  14. 10 Nov, 2008 1 commit
  15. 07 Nov, 2008 1 commit
  16. 06 Nov, 2008 1 commit
    • cpumask: introduce new API, without changing anything · 2d3854a3
      Committed by Rusty Russell
      Impact: introduce new APIs
      
      We want to deprecate cpumasks on the stack, as we are headed for
      ginormous numbers of CPUs.  Eventually, we want to head towards an
      undefined 'struct cpumask' so they can never be declared on the stack.
      
      1) New cpumask functions which take pointers instead of copies.
         (cpus_* -> cpumask_*)
      
      2) Several new helpers to reduce requirements for temporary cpumasks
         (cpumask_first_and, cpumask_next_and, cpumask_any_and)
      
      3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
         (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
      
      4) 'struct cpumask' for explicitness and to mark new-style code.
      
      5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
         not NR_CPUS for time efficiency and for smaller dynamic allocations
         in future.
      
      6) cpumask_copy() so we can allocate less than a full cpumask eventually
         (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
         definition eventually.
      
      7) work_on_cpu() helper for doing task on a CPU, rather than saving old
         cpumask for current thread and manipulating it.
      
      8) smp_call_function_many() which is smp_call_function_mask() except
         taking a cpumask pointer.
      
      Note that this patch simply introduces the new functions and leaves
      the obsolescent ones in place.  This is to simplify the transition
      patches.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 31 Jul, 2008 1 commit
  18. 29 Jul, 2008 1 commit
    • cpu masks: optimize and clean up cpumask_of_cpu() · e56b3bc7
      Committed by Linus Torvalds
      Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
      
      Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
      creating a huge array of constant bitmasks, realize that the zero words
      can be shared.
      
      In other words, on a 64-bit architecture, we only ever need 64 of these
      arrays - with a different bit set in one single word (with enough zero
      words around it so that we can create any bitmask by just offsetting in
      that big array). And then we just put enough zeroes around it that we
      can point every single cpumask to be one of those things.
      
      So when we have 4k CPUs, instead of having 4k arrays (of 4k bits each,
      with one bit set in each array - 2MB memory total), we have exactly 64
      arrays instead, each 8k bits in size (64kB total).
      
      And then we just point cpumask(n) to the right position (which we can
      calculate dynamically). Once we have the right arrays, getting
      "cpumask(n)" ends up being:
      
        static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
        {
                const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
                p -= cpu / BITS_PER_LONG;
                return (const cpumask_t *)p;
        }
      
      This brings other advantages and simplifications as well:
      
       - we are not wasting memory that is just filled with a single bit in
         various different places
      
       - we don't need all those games to re-create the arrays in some dense
         format, because they're already going to be dense enough.
      
      If we compile a kernel for up to 4k CPUs, "wasting" that 64kB of memory
      is a non-issue (especially since by doing this "overlapping" trick we
      probably get better cache behaviour anyway).
      
      [ mingo@elte.hu:
      
        Converted Linus's mails into a commit. See:
      
           http://lkml.org/lkml/2008/7/27/156
           http://lkml.org/lkml/2008/7/28/320
      
        Also applied a family filter - which also has the side-effect of leaving
        out the bits where Linus calls me an idio... Oh, never mind ;-)
      ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
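The shared-zero-words trick in the cpumask_of_cpu() commit can be tried out in user space with the sketch below. It is a model under stated assumptions, not the kernel source: `NR_CPUS` is set to a small value, the table is a flat one-dimensional array (so that the backwards pointer offset stays inside a single object in standard C), and the names `cpu_bit_table`, `cpu_bit_table_init` and `get_cpu_mask_words` are invented for the example.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS       256   /* assumption: small value for this sketch */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define MASK_LONGS    ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * BITS_PER_LONG + 1 rows of MASK_LONGS words each. Row 0 is all zero
 * padding; row 1 + b has only bit b of its first word set.
 */
static unsigned long cpu_bit_table[(BITS_PER_LONG + 1) * MASK_LONGS];

static void cpu_bit_table_init(void)
{
    for (unsigned int bit = 0; bit < BITS_PER_LONG; bit++)
        cpu_bit_table[(1 + bit) * MASK_LONGS] = 1UL << bit;
}

/*
 * Return MASK_LONGS words in which only bit `cpu` is set. The row selects
 * the bit within a word; stepping back cpu / BITS_PER_LONG words moves the
 * single non-zero word to the right index, and the words before it are the
 * trailing zero words of the previous row.
 */
static const unsigned long *get_cpu_mask_words(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return &cpu_bit_table[(1 + cpu % BITS_PER_LONG) * MASK_LONGS]
           - cpu / BITS_PER_LONG;
}
```

After `cpu_bit_table_init()`, every cpu gets a full-width mask with exactly one bit set, while the table itself holds only BITS_PER_LONG + 1 rows instead of one row per cpu.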
  19. 26 Jul, 2008 1 commit
    • cpumask: make cpumask_of_cpu_map generic · b8d317d1
      Committed by Mike Travis
      If an arch doesn't define cpumask_of_cpu_map, create a generic
      statically-initialized one for them.  This allows removal of the buggy
      cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of
      out-of-scope var).
      
      An arch with NR_CPUS of 4096 probably wants to allocate this itself
      based on the actual number of CPUs, since otherwise they're using 2MB
      of rodata (1024 cpus means 128k).  That's what
      CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the
      moment).
      
      In future as we support more CPUs, we'll need to resort to a
      get_cpu_map()/put_cpu_map() allocation scheme.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 20 Jul, 2008 1 commit
  21. 19 Jul, 2008 2 commits
    • cpumask: Provide a generic set of CPUMASK_ALLOC macros · 77586c2b
      Committed by Mike Travis
        * Provide a generic set of CPUMASK_ALLOC macros patterned after the
          SCHED_CPUMASK_ALLOC macros.  This is used where multiple cpumask_t
          variables are declared on the stack to reduce the amount of stack
          space required.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr · 65c01184
      Committed by Mike Travis
        * This patch replaces the dangerous lvalue version of cpumask_of_cpu
          with new cpumask_of_cpu_ptr macros.  These are patterned after the
          node_to_cpumask_ptr macros.
      
          In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
          the cpumask_of_cpu_map[cpu] entry is used.  The cpumask_of_cpu_map
          is provided when there is a large NR_CPUS count, reducing
          greatly the amount of code generated and stack space used for
          cpumask_of_cpu().  The pointer to the cpumask_t value is needed for
          calling set_cpus_allowed_ptr() to reduce the amount of stack space
          needed to pass the cpumask_t value.
      
          If there isn't a cpumask_of_cpu_map[], then a temporary variable is
          declared and filled in with value from cpumask_of_cpu(cpu) as well as
          a pointer variable pointing to this temporary variable.  Afterwards,
          the pointer is used to reference the cpumask value.  The compiler
          will optimize out the extra dereference through the pointer as well
          as the stack space used for the pointer, resulting in identical code.
      
          A good example of the orthogonal usages is in net/sunrpc/svc.c:
      
      	case SVC_POOL_PERCPU:
      	{
      		unsigned int cpu = m->pool_to[pidx];
      		cpumask_of_cpu_ptr(cpumask, cpu);
      
      		*oldmask = current->cpus_allowed;
      		set_cpus_allowed_ptr(current, cpumask);
      		return 1;
      	}
      	case SVC_POOL_PERNODE:
      	{
      		unsigned int node = m->pool_to[pidx];
      		node_to_cpumask_ptr(nodecpumask, node);
      
      		*oldmask = current->cpus_allowed;
      		set_cpus_allowed_ptr(current, nodecpumask);
      		return 1;
      	}
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 18 Jul, 2008 1 commit
    • cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) · e761b772
      Committed by Max Krasnyansky
      This is based on Linus' idea of creating cpu_active_map that prevents
      scheduler load balancer from migrating tasks to the cpu that is going
      down.
      
      It allows us to simplify domain management code and avoid unnecessary
      domain rebuilds during cpu hotplug event handling.
      
      Please ignore the cpusets part for now. It needs some more work in order
      to avoid crazy lock nesting. I did, however, simplify and unify the domain
      reinitialization logic. We now simply call partition_sched_domains() in
      all cases. This means that we're using the exact same code paths as in
      the cpusets case and hence the tests below cover cpusets too.
      Cpuset changes to make rebuild_sched_domains() callable from various
      contexts are in a separate patch (right after this one).
      
      This not only boots but also easily handles
      	while true; do make clean; make -j 8; done
      and
      	while true; do on-off-cpu 1; done
      at the same time.
      (on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing).
      
      Surprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
      this on it right now in gnome-terminal and things are moving just fine.
      
      Also this is running with most of the debug features enabled (lockdep,
      mutex, etc.), with no BUG_ONs or lockdep complaints so far.
      
      I believe I addressed all of Dmitry's comments on Linus' original
      version. I changed both the fair and rt balancers to mask out non-active
      cpus, and replaced cpu_is_offline() with !cpu_active() in the main
      scheduler code where it made sense (to me).
      Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Gregory Haskins <ghaskins@novell.com>
      Cc: dmitry.adamushko@gmail.com
      Cc: pj@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 06 Jul, 2008 1 commit
  24. 05 Jul, 2008 1 commit
  25. 24 May, 2008 2 commits
    • cpumask: make for_each_cpu_mask a bit smaller · 7baac8b9
      Committed by Alexander van Heukelum
      The for_each_cpu_mask loop is used quite often in the kernel. It
      makes use of two functions: first_cpu and next_cpu. This patch
      changes for_each_cpu_mask to use only the latter. Because next_cpu
      finds the next eligible cpu _after_ the given one, the iteration
      variable has to be initialized to -1 and next_cpu has to be
      called with this value before the first iteration. An x86_64
      defconfig kernel (from sched/latest) is about 2500 bytes smaller
      with this patch applied:
      
         text	   data	    bss	    dec	    hex	filename
      6222517	 917952	 749932	7890401	 7865e1	vmlinux.orig
      6219922	 917952	 749932	7887806	 785bbe	vmlinux
      
      The same size reduction is seen for defconfig+MAXSMP
      
         text	   data	    bss	    dec	    hex	filename
      6241772	2563968	1492716	10298456	 9d2458	vmlinux.orig
      6239211	2563968	1492716	10295895	 9d1a57	vmlinux
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
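The loop shape described in the for_each_cpu_mask commit, starting at -1 and calling only a "next" function, can be sketched in user space like this. The bit scan is a plain loop rather than the kernel's find_next_bit(), `NR_CPUS` is an arbitrary small value, and the macro name `for_each_cpu_in` is an assumption chosen to avoid suggesting that this is the kernel macro itself.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS       128   /* assumption: small value for this sketch */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MASK_LONGS    ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Return the first set bit strictly after n, or NR_CPUS if there is none. */
static int next_cpu(int n, const unsigned long *mask)
{
    assert(n >= -1);
    for (int cpu = n + 1; cpu < NR_CPUS; cpu++)
        if (mask[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG)))
            return cpu;
    return NR_CPUS;
}

/*
 * Iterate over all set bits using next_cpu() only: the iteration variable
 * starts at -1 and next_cpu() is called before every iteration, including
 * the first one, so no separate first_cpu() call is needed.
 */
#define for_each_cpu_in(cpu, mask)                \
    for ((cpu) = -1;                              \
         (cpu) = next_cpu((cpu), (mask)),         \
         (cpu) < NR_CPUS; )
```

Each use of the macro now expands to a single call site instead of two, which is where the text-size saving reported in the commit comes from.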
    • x86: Add performance variants of cpumask operators · 41df0d61
      Committed by Mike Travis
        * Increase performance for systems with large count NR_CPUS by limiting
          the range of the cpumask operators that loop over the bits in a cpumask_t
          variable.  This removes a large amount of wasted cpu cycles.
      
        * Add performance variants of the cpumask operators:
      
          int cpus_weight_nr(mask)	     Same using nr_cpu_ids instead of NR_CPUS
          int first_cpu_nr(mask)	     Number lowest set bit, or nr_cpu_ids
          int next_cpu_nr(cpu, mask)	     Next cpu past 'cpu', or nr_cpu_ids
          for_each_cpu_mask_nr(cpu, mask)  for-loop cpu over mask using nr_cpu_ids
      
        * Modify following to use performance variants:
      
          #define num_online_cpus()	cpus_weight_nr(cpu_online_map)
          #define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
          #define num_present_cpus()	cpus_weight_nr(cpu_present_map)
      
          #define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
          #define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), ...)
          #define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), ...)
      
        * Comment added to include/linux/cpumask.h:
      
          Note: The alternate operations with the suffix "_nr" are used
      	  to limit the range of the loop to nr_cpu_ids instead of
      	  NR_CPUS when NR_CPUS > 64 for performance reasons.
      	  If NR_CPUS is <= 64 then most assembler bitmask
      	  operators execute faster with a constant range, so
      	  the operator will continue to use NR_CPUS.
      
      	  Another consideration is that nr_cpu_ids is initialized
      	  to NR_CPUS and isn't lowered until the possible cpus are
      	  discovered (including any disabled cpus).  So early uses
      	  will span the entire range of NR_CPUS.
      
          (The net effect is that for systems with 64 or fewer CPUs there are no
           functional changes.)
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Reviewed-by: Paul Jackson <pj@sgi.com>
      Reviewed-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 13 May, 2008 1 commit
  27. 28 Apr, 2008 1 commit
    • mempolicy: add bitmap_onto() and bitmap_fold() operations · 7ea931c9
      Committed by Paul Jackson
      The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
      with the usual cpumask and nodemask wrappers.
      
      The bitmap_onto() operator computes one bitmap relative to another.  If the
      n-th bit in the origin mask is set, then the m-th bit of the destination mask
      will be set, where m is the position of the n-th set bit in the relative mask.
      
      The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
      the input bitmap has some bit n set, where m == n mod sz, for the specified sz
      value.
      
      There are two substantive changes between this patch and its
      predecessor bitmap_relative:
       1) Renamed bitmap_relative() to be bitmap_onto().
       2) Added bitmap_fold().
      
      The essential motivation for bitmap_onto() is to provide a mechanism for
      converting a cpuset-relative CPU or Node mask to an absolute mask.  Cpuset
      relative masks are written as if the current task were in a cpuset whose CPUs
      or Nodes were just the consecutive ones numbered 0..N-1, for some N.  The
      bitmap_onto() operator is provided in anticipation of adding support for the
      first such cpuset relative mask, by the mbind() and set_mempolicy() system
      calls, using a planned flag of MPOL_F_RELATIVE_NODES.  These bitmap operators
      (and their nodemask wrappers, in particular) will be used in code that
      converts the user specified cpuset relative memory policy to a specific system
      node numbered policy, given the current mems_allowed of the task's cpuset.
      
      Such cpuset relative mempolicies will address two deficiencies
      of the existing interface between cpusets and mempolicies:
       1) A task cannot at present reliably establish a cpuset
          relative mempolicy because there is an essential race
          condition, in that the task's cpuset may be changed in
          between the time the task can query its cpuset placement,
          and the time the task can issue the applicable mbind or
          set_mempolicy system call.
       2) A task cannot at present establish what cpuset relative
          mempolicy it would like to have, if it is in a smaller
          cpuset than it might have mempolicy preferences for,
          because the existing interface only allows specifying
          mempolicies for nodes currently allowed by the cpuset.
      
      Cpuset relative mempolicies are useful for tasks that don't distinguish
      particularly between one CPU or Node and another, but only between how many of
      each are allowed, and the proper placement of threads and memory pages on the
      various CPUs and Nodes available.
      
      The motivation for the added bitmap_fold() can be seen in the following
      example.
      
      Let's say an application has specified some mempolicies that presume 16 memory
      nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
      relative) nodes 12-15.  Then let's say that application is crammed into a
      cpuset that only has 8 memory nodes, 0-7.  If one just uses bitmap_onto(),
      this mempolicy, mapped to that cpuset, would ignore the requested relative
      nodes above 7, leaving it empty of nodes.  That's not good; better to fold the
      higher nodes down, so that some nodes are included in the resulting mapped
      mempolicy.  In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
      weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
      specifying nodes 4-7.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: <kosaki.motohiro@jp.fujitsu.com>
      Cc: <ray-lk@madrabbit.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  28. 20 Apr, 2008 3 commits
    • x86: convert cpumask_of_cpu macro to allocated array · 9f0e8d04
      Committed by Mike Travis
        * Here is a simple patch to use an allocated array of cpumasks to
          represent cpumask_of_cpu() instead of constructing one on the stack.
          It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
          currently only set for x86_64 SMP.  Otherwise the existing
          cpumask_of_cpu() is used but has been changed to produce an lvalue
          so a pointer to it can be used.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpumask: add CPU_MASK_ALL_PTR macro · 321a8e9d
      Committed by Mike Travis
        * Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as
          a pointer reference to CPU_MASK_ALL.  This reduces where possible
          the instances where CPU_MASK_ALL allocates and fills a large
          array on the stack.  Used only if NR_CPUS > BITS_PER_LONG.
      
        * Change init/main.c to use new set_cpus_allowed_ptr().
      
      Depends on:
      	[sched-devel]: sched: add new set_cpus_allowed_ptr function
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpumask: add cpumask_scnprintf_len function · 30ca60c1
      Committed by Mike Travis
      Add a new function cpumask_scnprintf_len() to return the number of
      characters needed to display "len" cpumask bits.  The current method
      of allocating NR_CPUS bytes is incorrect, as what's really needed is
      9 characters per 32-bit word of cpumask bits (8 hex digits plus the
      separator [','] or the terminating NUL).  This function provides the
      caller the means to allocate the correct string length.
      
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
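The sizing rule stated in the cpumask_scnprintf_len commit, 9 characters per 32-bit word, can be written down as a small helper. This is a sketch of the arithmetic only and does not claim to reproduce the kernel function; `mask_scnprintf_len` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed to print nbits mask bits as 32-bit hex words: each word
 * takes 8 hex digits plus one more character, which is the ',' separator
 * for all words except the last and the terminating NUL for the last one.
 */
static size_t mask_scnprintf_len(unsigned int nbits)
{
    size_t words;

    assert(nbits > 0);                 /* an empty mask has nothing to print */
    words = ((size_t)nbits + 31) / 32; /* round up to whole 32-bit words */
    return words * 9;
}
```

For 4096 bits this gives 128 words and 1152 bytes, compared with the 4096 bytes that an NR_CPUS-sized buffer would allocate for the same mask.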
  29. 30 Jan, 2008 1 commit
  30. 07 Jan, 2008 1 commit
    • CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPU · a263898f
      Committed by Ingo Molnar
      make randconfig bootup testing found that the cpufreq code
      crashes on bootup, if the powernow-k8 driver is enabled and
      if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU
      kernel.
      
      First lockdep found out that there's an inconsistent unlock
      sequence:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at:
       [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
       but there are no more locks to release!
      
      Call Trace:
       [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
       [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c
       [<ffffffff80252f3a>] mark_held_locks+0x56/0x94
       [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42
       [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4
       ...
      
      Then shortly afterwards the cpufreq code crashed on an assert:
      
       ------------[ cut here ]------------
       kernel BUG at drivers/cpufreq/cpufreq.c:1068!
       invalid opcode: 0000 [1] SMP
       [...]
       Call Trace:
        [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91
        [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2
        [<ffffffff80cc0596>] powernowk8_init+0x86/0x94
       [...]
       ---[ end trace 1e9219be2b4431de ]---
      
      The bug was caused by the maxcpus=1 bootup, which left the secondary
      core !cpu_online() but not cpu_is_offline() either, because on
      !CONFIG_HOTPLUG_CPU cpu_is_offline() is always 0 (include/linux/cpu.h):
      
        /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */
        static inline int cpu_is_offline(int cpu) { return 0; }
      
      but the cpufreq code uses cpu_online() and cpu_is_offline() in
      a mixed way - the low-level drivers use cpu_online(), while
      the cpufreq core uses cpu_is_offline(). This opened up the
      possibility to add the non-initialized sysdev device of the
      secondary core:
      
       cpufreq-core: trying to register driver powernow-k8
       cpufreq-core: adding CPU 0
       powernow-k8: BIOS error - no PSB or ACPI _PSS objects
       cpufreq-core: initialization failed
       cpufreq-core: adding CPU 1
       cpufreq-core: initialization failed
      
      which then blew up. The fix is to make cpu_is_offline() always
      the negation of cpu_online(). With that fix applied the kernel
      boots up fine without crashing:
      
       Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94()
       powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
       powernow-k8: BIOS error - no PSB or ACPI _PSS objects
       initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19.
       initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94()
       Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39()
      
      We could fix this by making CPU enumeration aware of max_cpus, but that
      would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu)
      possibility was quite confusing and a continuous source of bugs too.
      
      Most distributions have kernels with CPU hotplug enabled, so this bug
      remained hidden for a long time.
      
      Bug forensics:
      
      The broken cpu_is_offline() API variant was introduced via:
      
       commit a59d2e4e6977e7b94e003c96a41f07e96cddc340
       Author: Rusty Russell <rusty@rustcorp.com.au>
       Date:   Mon Mar 8 06:06:03 2004 -0800
      
           [PATCH] minor cleanups for hotplug CPUs
      
      ( this predates linux-2.6.git, this commit is available from Thomas's
        historic git tree. )
      
      Then 1.5 years later the cpufreq code made use of it:
      
       commit c32b6b8e
       Author: Ashok Raj <ashok.raj@intel.com>
       Date:   Sun Oct 30 14:59:54 2005 -0800
      
           [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers
      
       +       if (cpu_is_offline(cpu))
       +               return 0;
      
      which is a correct use of the subtly broken new API. v2.6.15 then
      shipped with this bug included.
      
      Then it took two more years for random-kernel qa to hit it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
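The inconsistency fixed by the cpu_is_offline() commit can be reproduced with a toy model. The sketch below assumes a single-word online mask with only CPU 0 set, as after a maxcpus=1 boot; `online_mask` and `cpu_is_offline_old` are names invented for the illustration, and the fixed helper simply follows the rule stated in the commit message (always the negation of cpu_online()).

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: only CPU 0 is online, as with maxcpus=1. */
static unsigned long online_mask = 1UL;

static int cpu_online(unsigned int cpu)
{
    assert(cpu < sizeof(unsigned long) * CHAR_BIT);
    return (int)((online_mask >> cpu) & 1UL);
}

/* Old !CONFIG_HOTPLUG_CPU stub: claims that no cpu is ever offline. */
static int cpu_is_offline_old(unsigned int cpu)
{
    (void)cpu;
    return 0;
}

/* Fixed variant: always the negation of cpu_online(). */
static int cpu_is_offline(unsigned int cpu)
{
    return !cpu_online(cpu);
}
```

With the old stub, CPU 1 is neither online nor offline, which is the state that let the cpufreq core register the uninitialized secondary core; with the fixed variant the two predicates always disagree, as callers expect.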