1. 09 6月, 2010 4 次提交
    • T
      sched: add hooks for workqueue · 21aa9af0
      Tejun Heo 提交于
      Concurrency managed workqueue needs to know when workers are going to
      sleep and waking up.  Using these two hooks, cmwq keeps track of the
      current concurrency level and throttles execution of new works if it's
      too high and wakes up another worker from the sleep hook if it becomes
      too low.
      
      This patch introduces PF_WQ_WORKER to identify workqueue workers and
      adds the following two hooks.
      
      * wq_worker_waking_up(): called when a worker is woken up.
      
      * wq_worker_sleeping(): called when a worker is going to sleep and may
        return a pointer to a local task which should be woken up.  The
        returned task is woken up using try_to_wake_up_local() which is
        simplified ttwu which is called under rq lock and can only wake up
        local tasks.
      
      Both hooks are currently defined as noop in kernel/workqueue_sched.h.
      Later cmwq implementation will replace them with proper
      implementation.
      
      These hooks are hard coded as they'll always be enabled.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      21aa9af0
    • T
      sched: refactor try_to_wake_up() · 9ed3811a
      Tejun Heo 提交于
      Factor ttwu_activate() and ttwu_woken_up() out of try_to_wake_up().
      The factoring out doesn't affect try_to_wake_up() much
      code-generation-wise.  Depending on configuration options, it ends up
      generating the same object code as before or slightly different one
      due to different register assignment.
      
      This is to help future implementation of try_to_wake_up_local().
      
      Mike Galbraith suggested rename to ttwu_post_activation() from
      ttwu_woken_up() and comment update in try_to_wake_up().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      9ed3811a
    • T
      sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining · 3a101d05
      Tejun Heo 提交于
      Currently, when a cpu goes down, cpu_active is cleared before
      CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
      default priority cpu notifier.  When a cpu is coming up, it's set
      before CPU_ONLINE but cpuset configuration again is updated from the
      same cpu notifier.
      
      For cpu notifiers, this presents an inconsistent state.  Threads which
      a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
      migrated to other cpus because the cpu is no more inactive.
      
      Fix it by updating cpu_active in the highest priority cpu notifier and
      cpuset configuration in the second highest when a cpu is coming up.
      Down path is updated similarly.  This guarantees that all other cpu
      notifiers see consistent cpu_active and cpuset configuration.
      
      cpuset_track_online_cpus() notifier is converted to
      cpuset_update_active_cpus() which just updates the configuration and
      now called from cpuset_cpu_[in]active() notifiers registered from
      sched_init_smp().  If cpuset is disabled, cpuset_update_active_cpus()
      degenerates into partition_sched_domains() making separate notifier
      for !CONFIG_CPUSETS unnecessary.
      
      This problem is triggered by cmwq.  During CPU_DOWN_PREPARE, hotplug
      callback creates a kthread and kthread_bind()s it to the target cpu,
      and the thread is expected to run on that cpu.
      
      * Ingo's test discovered __cpuinit/exit markups were incorrect.
        Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Menage <menage@google.com>
      3a101d05
    • T
      sched: define and use CPU_PRI_* enums for cpu notifier priorities · 50a323b7
      Tejun Heo 提交于
      Instead of hardcoding priority 10 and 20 in sched and perf, collect
      them into CPU_PRI_* enums.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      50a323b7
  2. 05 6月, 2010 11 次提交
    • R
      module: fix bne2 "gave up waiting for init of module libcrc32c" · 9bea7f23
      Rusty Russell 提交于
      Problem: it's hard to avoid an init routine stumbling over a
      request_module these days.  And it's not clear it's always a bad idea:
      for example, a module like kvm with dynamic dependencies on kvm-intel
      or kvm-amd would be neater if it could simply request_module the right
      one.
      
      In this particular case, it's libcrc32c:
      
      	libcrc32c_mod_init
      	 crypto_alloc_shash
      	  crypto_alloc_tfm
      	   crypto_find_alg
      	    crypto_alg_mod_lookup
      	     crypto_larval_lookup
      	      request_module
      
      If another module is waiting inside resolve_symbol() for libcrc32c to
      finish initializing (ie. bne2 depends on libcrc32c) then it does so
      holding the module lock, and our request_module() can't make progress
      until that is released.
      
      Waiting inside resolve_symbol() without the lock isn't all that hard:
      we just need to pass the -EBUSY up the call chain so we can sleep
      where we don't hold the lock.  Error reporting is a bit trickier: we
      need to copy the name of the unfinished module before releasing the
      lock.
      
      Other notes:
      1) This also fixes a theoretical issue where a weak dependency would allow
         symbol version mismatches to be ignored.
      2) We rename use_module to ref_module to make life easier for the only
         external user (the out-of-tree ksplice patches).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tim Abbot <tabbott@ksplice.com>
      Tested-by: NBrandon Philips <bphilips@suse.de>
      9bea7f23
    • R
      module: verify_export_symbols under the lock · be593f4c
      Rusty Russell 提交于
      It disabled preempt so it was "safe", but nothing stops another module
      slipping in before this module is added to the global list now we don't
      hold the lock the whole time.
      
      So we check this just after we check for duplicate modules, and just
      before we put the module in the global list.
      
      (find_symbol finds symbols in coming and going modules, too).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      be593f4c
    • L
      module: move find_module check to end · 3bafeb62
      Linus Torvalds 提交于
      I think Rusty may have made the lock a bit _too_ finegrained there, and
      didn't add it to some places that needed it. It looks, for example, like
      PATCH 1/2 actually drops the lock in places where it's needed
      ("find_module()" is documented to need it, but now load_module() didn't
      hold it at all when it did the find_module()).
      
      Rather than adding a new "module_loading" list, I think we should be able
      to just use the existing "modules" list, and just fix up the locking a
      bit.
      
      In fact, maybe we could just move the "look up existing module" a bit
      later - optimistically assuming that the module doesn't exist, and then
      just undoing the work if it turns out that we were wrong, just before
      adding ourselves to the list.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      3bafeb62
    • R
      module: make locking more fine-grained. · 75676500
      Rusty Russell 提交于
      Kay Sievers <kay.sievers@vrfy.org> reports that we still have some
      contention over module loading which is slowing boot.
      
      Linus also disliked a previous "drop lock and regrab" patch to fix the
      bne2 "gave up waiting for init of module libcrc32c" message.
      
      This is more ambitious: we only grab the lock where we need it.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Brandon Philips <brandon@ifup.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      75676500
    • R
      module: Make module sysfs functions private. · 6407ebb2
      Rusty Russell 提交于
      These were placed in the header in ef665c1a to get the various
      SYSFS/MODULE config combintations to compile.
      
      That may have been necessary then, but it's not now.  These functions
      are all local to module.c.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      6407ebb2
    • R
      module: move sysfs exposure to end of load_module · 80a3d1bb
      Rusty Russell 提交于
      This means a little extra work, but is more logical: we don't put
      anything in sysfs until we're about to put the module into the
      global list an parse its parameters.
      
      This also gives us a logical place to put duplicate module detection
      in the next patch.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      80a3d1bb
    • R
      module: fix kdb's illicit use of struct module_use. · c8e21ced
      Rusty Russell 提交于
      Linus changed the structure, and luckily this didn't compile any more.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Martin Hicks <mort@sgi.com>
      c8e21ced
    • L
      module: Make the 'usage' lists be two-way · 2c02dfe7
      Linus Torvalds 提交于
      When adding a module that depends on another one, we used to create a
      one-way list of "modules_which_use_me", so that module unloading could
      see who needs a module.
      
      It's actually quite simple to make that list go both ways: so that we
      not only can see "who uses me", but also see a list of modules that are
      "used by me".
      
      In fact, we always wanted that list in "module_unload_free()": when we
      unload a module, we want to also release all the other modules that are
      used by that module.  But because we didn't have that list, we used to
      first iterate over all modules, and then iterate over each "used by me"
      list of that module.
      
      By making the list two-way, we simplify module_unload_free(), and it
      allows for some trivial fixes later too.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned & rebased)
      2c02dfe7
    • A
      kernel/: fix BUG_ON checks for cpu notifier callbacks direct call · 9e506f7a
      Akinobu Mita 提交于
      The commit 80b5184c ("kernel/: convert cpu
      notifier to return encapsulate errno value") changed the return value of
      cpu notifier callbacks.
      
      Those callbacks don't return NOTIFY_BAD on failures anymore.  But there
      are a few callbacks which are called directly at init time and checking
      the return value.
      
      I forgot to change BUG_ON checking by the direct callers in the commit.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e506f7a
    • G
      cgroups: alloc_css_id() increments hierarchy depth · 94b3dd0f
      Greg Thelen 提交于
      Child groups should have a greater depth than their parents.  Prior to
      this change, the parent would incorrectly report zero memory usage for
      child cgroups when use_hierarchy is enabled.
      
      test script:
        mount -t cgroup none /cgroups -o memory
        cd /cgroups
        mkdir cg1
      
        echo 1 > cg1/memory.use_hierarchy
        mkdir cg1/cg11
      
        echo $$ > cg1/cg11/tasks
        dd if=/dev/zero of=/tmp/foo bs=1M count=1
      
        echo
        echo CHILD
        grep cache cg1/cg11/memory.stat
      
        echo
        echo PARENT
        grep cache cg1/memory.stat
      
        echo $$ > tasks
        rmdir cg1/cg11 cg1
        cd /
        umount /cgroups
      
      Using fae9c791, a recent patch that changed alloc_css_id() depth computation,
      the parent incorrectly reports zero usage:
        root@ubuntu:~# ./test
        1+0 records in
        1+0 records out
        1048576 bytes (1.0 MB) copied, 0.0151844 s, 69.1 MB/s
      
        CHILD
        cache 1048576
        total_cache 1048576
      
        PARENT
        cache 0
        total_cache 0
      
      With this patch, the parent correctly includes child usage:
        root@ubuntu:~# ./test
        1+0 records in
        1+0 records out
        1048576 bytes (1.0 MB) copied, 0.0136827 s, 76.6 MB/s
      
        CHILD
        cache 1052672
        total_cache 1052672
      
        PARENT
        cache 0
        total_cache 1052672
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NPaul Menage <menage@google.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.34.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      94b3dd0f
    • O
      sys_personality: change sys_personality() to accept "unsigned int" instead of u_long · 485d5276
      Oleg Nesterov 提交于
      task_struct->pesonality is "unsigned int", but sys_personality() paths use
      "unsigned long pesonality".  This means that every assignment or
      comparison is not right.  In particular, if this argument does not fit
      into "unsigned int" __set_personality() changes the caller's personality
      and then sys_personality() returns -EINVAL.
      
      Turn this argument into "unsigned int" and avoid overflows.  Obviously,
      this is the user-visible change, we just ignore the upper bits.  But this
      can't break the sane application.
      
      There is another thing which can confuse the poorly written applications.
      User-space thinks that this syscall returns int, not long.  This means
      that the returned value can be negative and look like the error code.  But
      note that libc won't be confused and thus errno won't be set, and with
      this patch the user-space can never get -1 unless sys_personality() really
      fails.  And, most importantly, the negative RET != -1 is only possible if
      that app previously called personality(RET).
      Pointed-out-by: NWenming Zhang <wezhang@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      485d5276
  3. 03 6月, 2010 2 次提交
  4. 02 6月, 2010 1 次提交
  5. 01 6月, 2010 2 次提交
  6. 31 5月, 2010 10 次提交
  7. 30 5月, 2010 1 次提交
  8. 28 5月, 2010 9 次提交