1. 16 12月, 2013 1 次提交
    • M
      ftrace: Initialize the ftrace profiler for each possible cpu · c4602c1c
      Miao Xie 提交于
      Ftrace currently initializes only the online CPUs. This implementation has
      two problems:
      - If we online a CPU after we enable the function profile, and then run the
        test, we will lose the trace information on that CPU.
        Steps to reproduce:
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # cd <debugfs>/tracing/
        # echo <some function name> >> set_ftrace_filter
        # echo 1 > function_profile_enabled
        # echo 1 > /sys/devices/system/cpu/cpu1/online
        # run test
      - If we offline a CPU before we enable the function profile, we will not clear
        the trace information when we enable the function profile. It will trouble
        the users.
        Steps to reproduce:
        # cd <debugfs>/tracing/
        # echo <some function name> >> set_ftrace_filter
        # echo 1 > function_profile_enabled
        # run test
        # cat trace_stat/function*
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # echo 0 > function_profile_enabled
        # echo 1 > function_profile_enabled
        # cat trace_stat/function*
        # run test
        # cat trace_stat/function*
      
      So it is better that we initialize the ftrace profiler for each possible cpu
      every time we enable the function profile instead of just the online ones.
      
      Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com
      
      Cc: stable@vger.kernel.org # 2.6.31+
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c4602c1c
  2. 06 12月, 2013 1 次提交
  3. 29 11月, 2013 1 次提交
  4. 28 11月, 2013 2 次提交
    • T
      cgroup: fix cgroup_subsys_state leak for seq_files · e605b365
      Tejun Heo 提交于
      If a cgroup file implements either read_map() or read_seq_string(),
      such file is served using seq_file by overriding file->f_op to
      cgroup_seqfile_operations, which also overrides the release method to
      single_release() from cgroup_file_release().
      
      Because cgroup_file_open() didn't use to acquire any resources, this
      used to be fine, but since f7d58818 ("cgroup: pin
      cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open()
      pins the css (cgroup_subsys_state) which is put by
      cgroup_file_release().  The patch forgot to update the release path
      for seq_files and each open/release cycle leaks a css reference.
      
      Fix it by updating cgroup_file_release() to also handle seq_files and
      using it for seq_file release path too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v3.12
      e605b365
    • P
      cpuset: Fix memory allocator deadlock · 0fc0287c
      Peter Zijlstra 提交于
      Juri hit the below lockdep report:
      
      [    4.303391] ======================================================
      [    4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
      [    4.303394] 3.12.0-dl-peterz+ #144 Not tainted
      [    4.303395] ------------------------------------------------------
      [    4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [    4.303399]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
      [    4.303417]
      [    4.303417] and this task is already holding:
      [    4.303418]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
      [    4.303431] which would create a new lock dependency:
      [    4.303432]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
      [    4.303436]
      
      [    4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
      [    4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
      [    4.303922]    HARDIRQ-ON-W at:
      [    4.303923]                     [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
      [    4.303926]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303929]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303931]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303933]    SOFTIRQ-ON-W at:
      [    4.303933]                     [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
      [    4.303935]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303940]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303955]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303959]    INITIAL USE at:
      [    4.303960]                    [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
      [    4.303963]                    [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303966]                    [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303969]                    [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303972]  }
      
      Which reports that we take mems_allowed_seq with interrupts enabled. A
      little digging found that this can only be from
      cpuset_change_task_nodemask().
      
      This is an actual deadlock because an interrupt doing an allocation will
      hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
      forever waiting for the write side to complete.
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Reported-by: NJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Tested-by: NJuri Lelli <juri.lelli@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      0fc0287c
  5. 26 11月, 2013 1 次提交
    • S
      ftrace: Fix function graph with loading of modules · 8a56d776
      Steven Rostedt (Red Hat) 提交于
      Commit 8c4f3c3f "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above mentioned commit didn't go far enough. Well, it covered the
      function tracer by adding checks in __register_ftrace_function(). The
      problem is that the function graph tracer circumvents that (for a slight
      efficiency gain when function graph trace is running with a function
      tracer. The gain was not worth this).
      
      The problem came with ftrace_startup() which should always be called after
      __register_ftrace_function(), if you want this bug to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
      Reported-by: NDave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace")
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8a56d776
  6. 23 11月, 2013 5 次提交
    • L
      workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues · 4e8b22bd
      Li Bin 提交于
      When one work starts execution, the high bits of work's data contain
      pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID
      is assigned WORK_OFFQ_POOL_NONE when the work being initialized
      indicating that no pool is associated and get_work_pool() uses it to
      check the associated pool. So if worker_pool_assign_id() assigns a
      ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers
      leakage, and it may break the non-reentrance guarantee.
      
      This patch fix this issue by modifying the worker_pool_assign_id()
      function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE.
      
      Furthermore, in the current implementation, the BUILD_BUG_ON() in
      init_workqueues makes no sense. The number of worker pools needed
      cannot be determined at compile time, because the number of backing
      pools for UNBOUND workqueues is dynamic based on the assigned custom
      attributes. So remove it.
      
      tj: Minor comment and indentation updates.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4e8b22bd
    • L
      workqueue: fix comment typo for __queue_work() · 9ef28a73
      Li Bin 提交于
      It seems the "dying" should be "draining" here.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9ef28a73
    • T
      workqueue: fix ordered workqueues in NUMA setups · 8a2b7538
      Tejun Heo 提交于
      An ordered workqueue implements execution ordering by using single
      pool_workqueue with max_active == 1.  On a given pool_workqueue, work
      items are processed in FIFO order and limiting max_active to 1
      enforces the queued work items to be processed one by one.
      
      Unfortunately, 4c16bd32 ("workqueue: implement NUMA affinity for
      unbound workqueues") accidentally broke this guarantee by applying
      NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
      workqueue would end up with separate pool_workqueues for different
      nodes.  Each pool_workqueue still limits max_active to 1 but multiple
      work items may be executed concurrently and out of order depending on
      which node they are queued to.
      
      Fix it by using dedicated ordered_wq_attrs[] when creating ordered
      workqueues.  The new attrs match the unbound ones except that no_numa
      is always set thus forcing all NUMA nodes to share the default
      pool_workqueue.
      
      While at it, add sanity check in workqueue creation path which
      verifies that an ordered workqueues has only the default
      pool_workqueue.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLibin <huawei.libin@huawei.com>
      Cc: stable@vger.kernel.org
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      8a2b7538
    • O
      workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY · 91151228
      Oleg Nesterov 提交于
      Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
      in create_worker(). Otherwise userland can change ->cpus_allowed
      in between.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      91151228
    • T
      cgroup: use a dedicated workqueue for cgroup destruction · e5fca243
      Tejun Heo 提交于
      Since be445626 ("cgroup: remove synchronize_rcu() from
      cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
      freeing is performed from a work item from that point on and a later
      commit, ea15f8cc ("cgroup: split cgroup destruction into two
      steps"), moves css offlining to workqueue too.
      
      As cgroup destruction isn't depended upon for memory reclaim, the
      destruction work items were put on the system_wq; unfortunately, some
      controller may block in the destruction path for considerable duration
      while holding cgroup_mutex.  As large part of destruction path is
      synchronized through cgroup_mutex, when combined with high rate of
      cgroup removals, this has potential to fill up system_wq's max_active
      of 256.
      
      Also, it turns out that memcg's css destruction path ends up queueing
      and waiting for work items on system_wq through work_on_cpu().  If
      such operation happens while system_wq is fully occupied by cgroup
      destruction work items, work_on_cpu() can't make forward progress
      because system_wq is full and other destruction work items on
      system_wq can't make forward progress because the work item waiting
      for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
      
      This can be fixed by queueing destruction work items on a separate
      workqueue.  This patch creates a dedicated workqueue -
      cgroup_destroy_wq - for this purpose.  As these work items shouldn't
      have inter-dependencies and mostly serialized by cgroup_mutex anyway,
      giving high concurrency level doesn't buy anything and the workqueue's
      @max_active is set to 1 so that destruction work items are executed
      one by one on each CPU.
      
      Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
      cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
      separate core_initcall().  In the future, we probably want to reorder
      so that workqueue init happens before cgroup_init().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NHugh Dickins <hughd@google.com>
      Reported-by: NShawn Bohrer <shawn.bohrer@gmail.com>
      Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
      Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
      Cc: stable@vger.kernel.org # v3.9+
      e5fca243
  7. 20 11月, 2013 2 次提交
    • K
      kernel/bounds: avoid circular dependencies in generated headers · 24b9fdc5
      Kirill A. Shutemov 提交于
      <linux/spinlock.h> has heavy dependencies on other header files.
      It triggers circular dependencies in generated headers on IA64, at
      least:
      
        CC      kernel/bounds.s
      In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0,
                       from include/linux/thread_info.h:54,
                       from include/asm-generic/preempt.h:4,
                       from arch/ia64/include/generated/asm/preempt.h:1,
                       from include/linux/preempt.h:18,
                       from include/linux/spinlock.h:50,
                       from kernel/bounds.c:14:
      /home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory
      compilation terminated.
      
      Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's
      enough to find out size of spinlock_t.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-and-Tested-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      24b9fdc5
    • J
      genetlink: only pass array to genl_register_family_with_ops() · c53ed742
      Johannes Berg 提交于
      As suggested by David Miller, make genl_register_family_with_ops()
      a macro and pass only the array, evaluating ARRAY_SIZE() in the
      macro, this is a little safer.
      
      The openvswitch has some indirection, assing ops/n_ops directly in
      that code. This might ultimately just assign the pointers in the
      family initializations, saving the struct genl_family_and_ops and
      code (once mcast groups are handled differently.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c53ed742
  8. 16 11月, 2013 1 次提交
  9. 15 11月, 2013 10 次提交
  10. 13 11月, 2013 16 次提交