1. 11 Mar, 2014 3 commits
  2. 04 Mar, 2014 1 commit
  3. 27 Feb, 2014 9 commits
    • cpuset: fix a race condition in __cpuset_node_allowed_softwall() · 99afb0fd
      Li Zefan committed
      It's not safe to access task's cpuset after releasing task_lock().
      Holding callback_mutex won't help.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      99afb0fd
    • cpuset: fix a locking issue in cpuset_migrate_mm() · 47295830
      Li Zefan committed
      I can trigger a lockdep warning:
      
        # mount -t cgroup -o cpuset xxx /cgroup
        # mkdir /cgroup/cpuset
        # mkdir /cgroup/tmp
        # echo 0 > /cgroup/tmp/cpuset.cpus
        # echo 0 > /cgroup/tmp/cpuset.mems
        # echo 1 > /cgroup/tmp/cpuset.memory_migrate
        # echo $$ > /cgroup/tmp/tasks
        # echo 1 > /cgroup/tmp/cpuset.mems
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        3.14.0-rc1-0.1-default+ #32 Not tainted
        -------------------------------
        include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
        ...
          [<ffffffff81582174>] dump_stack+0x72/0x86
          [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
          [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
        ...
      
      We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
      we hold cpuset_mutex, which causes task_css() to complain.
      
      This is not a false-positive but a real issue.
      
      Holding cpuset_mutex won't prevent a task from migrating to another
      cpuset, and it won't prevent the original task->cgroup from being
      destroyed during this change.
      
      Fixes: 5d21cc2d (cpuset: replace cgroup_mutex locking with cpuset internal locking)
      Cc: <stable@vger.kernel.org> # 3.9+
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      47295830
    • genirq: Include missing header file in irqdomain.c · 64be38ab
      Rashika Kheria committed
      Include the appropriate header file include/linux/of_irq.h in
      kernel/irq/irqdomain.c because it contains the prototype of a
      function defined in kernel/irq/irqdomain.c.
      
      This eliminates the following warning in kernel/irq/irqdomain.c:
      kernel/irq/irqdomain.c:468:14: warning: no previous prototype for ‘irq_create_of_mapping’ [-Wmissing-prototypes]
      Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Link: http://lkml.kernel.org/r/eb89aebea7ff1a46122918ac389ebecf8248be9a.1393493276.git.rashika.kheria@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      64be38ab
    • perf: Fix hotplug splat · e3703f8c
      Peter Zijlstra committed
      Drew Richardson reported that he could make the kernel go *boom* when hotplugging
      while having perf events active.
      
      It turned out that when you have a group event, the code in
      __perf_event_exit_context() fails to remove the group siblings from
      the context.
      
      We then proceed with destroying and freeing the event, and when you
      re-plug the CPU and try and add another event to that CPU, things go
      *boom* because you've still got dead entries there.
      Reported-by: Drew Richardson <drew.richardson@arm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e3703f8c
    • sched/deadline: Prevent rt_time growth to infinity · faa59937
      Juri Lelli committed
      Kirill Tkhai noted:
      
        Since deadline tasks share rt bandwidth, we must care about
        bandwidth timer set. Otherwise rt_time may grow up to infinity
        in update_curr_dl(), if there are no other available RT tasks
        on top level bandwidth.
      
      RT tasks were in fact throttled right after they got enqueued,
      and never executed again (rt_time never again went below rt_runtime).
      
      Peter then proposed to accrue DL execution on rt_time only when
      rt timer is active, and proposed a patch (this patch is a slight
      modification of that) to implement that behavior. While this
      solves Kirill's problem, it has a drawback.
      
      Indeed, Kirill noted again:
      
        It looks we may get into a situation, when all CPU time is shared
        between RT and DL tasks:
      
        rt_runtime = n
        rt_period  = 2n
      
        | RT working, DL sleeping  | DL working, RT sleeping      |
        -----------------------------------------------------------
        | (1)     duration = n     | (2)     duration = n         | (repeat)
        |--------------------------|------------------------------|
        | (rt_bw timer is running) | (rt_bw timer is not running) |
      
        No time for fair tasks at all.
      
      While this can happen during the first period, if the rq is always
      backlogged, RT tasks won't have the opportunity to execute anymore:
      rt_time reached rt_runtime during (1); suppose that after (2) RT is
      enqueued back. It gets throttled since the rt timer didn't fire, and
      replenishment is from now on eaten up by DL tasks that accrue their
      execution on rt_time (while the rt timer is active - we have an RT task
      waiting for replenishment). FAIR tasks are not touched after this first
      period. OK, this is not ideal, and the situation is even worse!
      
      The above (the nice case) practically never happens in reality, where
      your rt timer is not aligned to task periods, tasks are in general not
      periodic, etc. Long story short, you always risk overloading your system.
      
      This patch is based on Peter's idea, but exploits an additional fact:
      if you don't have RT tasks enqueued, it makes little sense to continue
      incrementing rt_time once you reached the upper limit (DL tasks have their
      own mechanism for throttling).
      
      This cures both problems:
      
       - no matter how many DL instances ran in the past, you'll have an
         rt_time slightly above rt_runtime when an RT task is enqueued, and from
         that point on (after the first replenishment), the task will execute
         normally;
      
       - you can still eat up all bandwidth during the first period, but not
         anymore after that; remember that DL execution will increment rt_time
         only until the upper limit is reached.
      
      The situation is still not perfect! But, we have a simple solution for now,
      that limits how much you can jeopardize your system, as we keep working
      towards the right answer: RT groups scheduled using deadline servers.
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      faa59937
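The accounting rule described in this changelog can be sketched roughly as follows. This is a standalone illustration, not the kernel patch: `struct rt_bw`, `timer_active`, and `account_dl_exec()` are hypothetical names, and the condition is only a reading of the description (accrue DL execution on rt_time while the rt timer is active and the upper limit has not yet been reached).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the top-level RT bandwidth state. */
struct rt_bw {
    uint64_t rt_time;     /* time accrued in the current period (ns) */
    uint64_t rt_runtime;  /* budget per period (ns) */
    bool timer_active;    /* is the rt period timer armed? */
};

/*
 * Charge delta_exec of DL execution to rt_time, but only while the rt
 * timer is active and rt_time is still below the upper limit.
 */
static void account_dl_exec(struct rt_bw *bw, uint64_t delta_exec)
{
    assert(bw != NULL);
    if (bw->timer_active && bw->rt_time < bw->rt_runtime)
        bw->rt_time += delta_exec;
}
```

With this shape, rt_time stops growing once it has crossed rt_runtime, so it ends up at most one `delta_exec` above the limit regardless of how long DL tasks keep running.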
    • sched/deadline: Switch CPU's presence test order · eec751ed
      Juri Lelli committed
      Commit 82b95800 ("sched/deadline: Test for CPU's presence explicitly")
      changed how we check if a CPU returned by cpudeadline machinery is
      valid. But, we don't want to call cpu_present() if best_cpu is
      equal to -1. So, switch the order of tests inside WARN_ON().
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: boris.ostrovsky@oracle.com
      Cc: konrad.wilk@oracle.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1393238832-9100-1-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eec751ed
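Why the order of the two tests matters can be shown with a small standalone sketch. `cpu_present_stub()`, `should_warn()`, and `NR_CPUS_SKETCH` are hypothetical names introduced here; the exact expression inside the kernel's WARN_ON() is assumed, not quoted from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for this sketch */

static int present_calls;  /* counts how often the stub below is queried */

/* Hypothetical stand-in for cpu_present(); only valid for a real CPU id. */
static bool cpu_present_stub(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    present_calls++;
    return true;
}

/* True when best_cpu is a real CPU id that is not present. */
static bool should_warn(int best_cpu)
{
    /* && short-circuits, so the stub is never called with -1. */
    return best_cpu != -1 && !cpu_present_stub(best_cpu);
}
```

Testing `best_cpu != -1` first means the presence check only ever sees a valid index; with the operands swapped, the stub's own assertion would fire for -1.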
    • sched/deadline: Cleanup RT leftovers from {inc/dec}_dl_migration · 3908ac13
      Kirill Tkhai committed
      In the deadline class we do not have group scheduling.
      
      So, let's remove unnecessary
      
      	X = X;
      
      equations.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3908ac13
    • sched: Fix double normalization of vruntime · 791c9e02
      George McCollister committed
      dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
      which appears to guarantee that the !se->on_rq condition is met.
      If the task has done set_current_state(TASK_INTERRUPTIBLE) without
      schedule() the second condition will be met and vruntime will be
      incorrectly adjusted twice.
      
      In certain cases this can result in the task's vruntime never increasing
      past the vruntime of other tasks on the CFS' run queue, starving them of
      CPU time.
      
      This patch changes switched_from_fair() to use !p->on_rq instead of
      !se->on_rq.
      
      I'm able to cause a task with a priority of 120 to starve all other
      tasks with the same priority on an ARM platform running 3.2.51-rt72
      PREEMPT RT by writing one character at a time to a serial tty (16550 UART)
      in a tight loop. I'm also able to verify that making this change corrects the
      problem on that platform and kernel version.
      Signed-off-by: George McCollister <george.mccollister@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      791c9e02
    • genirq: Remove racy waitqueue_active check · c685689f
      Chuansheng Liu committed
      We hit one rare case below:
      
      T1 calls disable_irq() but hangs forever in synchronize_irq();
      the corresponding irq thread is in the sleeping state;
      and all CPUs are in the idle state.
      
      After analysis, we found one possible scenario which
      causes T1 to wait there forever:
      CPU0                                       CPU1
       synchronize_irq()
        wait_event()
          spin_lock()
                                                 atomic_dec_and_test(&threads_active)
            insert the __wait into queue
          spin_unlock()
                                                 if(waitqueue_active)
          atomic_read(&threads_active)
                                                   wake_up()
      
      Here, after inserting the __wait into the queue on CPU0 and before
      testing whether the queue is empty on CPU1, there is no barrier, so
      the update may not be immediately visible to CPU1, although CPU0 has
      updated the queue list.
      The same applies to the atomic_read() of threads_active on CPU0.
      
      So we would need an smp_mb() before waitqueue_active(), but removing
      the waitqueue_active() check solves it as well and it makes
      things simple and clear.
      Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
      Cc: Xiaoming Wang <xiaoming.wang@intel.com>
      Link: http://lkml.kernel.org/r/1393212590-32543-1-git-send-email-chuansheng.liu@intel.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c685689f
  4. 22 Feb, 2014 9 commits
  5. 21 Feb, 2014 1 commit
  6. 20 Feb, 2014 1 commit
    • sched_clock: Prevent callers from seeing half-updated data · 5ae8aabe
      Stephen Boyd committed
      The generic sched_clock registration function was previously
      done lockless, due to the fact that it was expected to be called
      only once. However, now there are systems that may register
      multiple sched_clock sources, for which the lack of locking has
      caused problems:
      
      If two sched_clock sources are registered we may end up in a
      situation where a call to sched_clock() may be accessing the
      epoch cycle count for the old counter and the cycle count for the
      new counter. This can lead to confusing results where
      sched_clock() values jump and then are reset to 0 (due to the way
      the registration function forces the epoch_ns to be 0).
      
      Fix this by reorganizing the registration function to hold the
      seqlock for as short a time as possible while we update the
      clock_data structure for a new counter. We also put any
      accumulated time into epoch_ns instead of resetting the time to
      0 so that the clock doesn't reset after each successful
      registration.
      
      [jstultz: Added extra context to the commit message]
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Link: http://lkml.kernel.org/r/1392662736-7803-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5ae8aabe
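The idea of a short write section guarded by a sequence counter, with accumulated time carried into epoch_ns, might look like the sketch below. It is a userspace approximation with C11 atomics and a single writer assumed; `struct clock_data` here is a simplified hypothetical layout (the `ns_per_cyc` field in particular is invented for the example), not the kernel's structure or seqlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified clock state guarded by a sequence counter. */
struct clock_data {
    atomic_uint seq;              /* odd while an update is in progress */
    _Atomic uint64_t epoch_ns;    /* ns accumulated up to epoch_cyc */
    _Atomic uint64_t epoch_cyc;   /* counter value at the epoch */
    _Atomic uint64_t ns_per_cyc;  /* conversion factor of the current counter */
};

/* Reader: retry until a snapshot is taken with no writer active. */
static uint64_t clock_read_ns(struct clock_data *cd, uint64_t cyc_now)
{
    unsigned int start;
    uint64_t ns, cyc, mult;

    do {
        start = atomic_load(&cd->seq);
        ns = atomic_load(&cd->epoch_ns);
        cyc = atomic_load(&cd->epoch_cyc);
        mult = atomic_load(&cd->ns_per_cyc);
    } while ((start & 1) || start != atomic_load(&cd->seq));

    return ns + (cyc_now - cyc) * mult;
}

/* Writer: switch to a new counter, carrying elapsed time into epoch_ns. */
static void clock_register(struct clock_data *cd, uint64_t old_cyc_now,
                           uint64_t new_cyc_now, uint64_t new_ns_per_cyc)
{
    /* Compute the new epoch outside the write section to keep it short. */
    uint64_t ns = clock_read_ns(cd, old_cyc_now);

    assert((atomic_load(&cd->seq) & 1) == 0);  /* single writer assumed */
    atomic_fetch_add(&cd->seq, 1);             /* odd: readers retry */
    atomic_store(&cd->epoch_ns, ns);
    atomic_store(&cd->epoch_cyc, new_cyc_now);
    atomic_store(&cd->ns_per_cyc, new_ns_per_cyc);
    atomic_fetch_add(&cd->seq, 1);             /* even: update visible */
}
```

A reader that overlaps the update sees either an odd or a changed sequence number and retries, so it never mixes the old epoch with the new counter. Because `epoch_ns` receives the time read just before the switch, the clock continues from that value instead of restarting at 0.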
  7. 19 Feb, 2014 2 commits
    • cgroup: update cgroup_enable_task_cg_lists() to grab siglock · 532de3fc
      Tejun Heo committed
      Currently, there's nothing preventing cgroup_enable_task_cg_lists()
      from missing a set PF_EXITING and racing against cgroup_exit().  Depending
      on the timing, cgroup_exit() may finish with the task still linked on
      css_set leading to list corruption.  Fix it by grabbing siglock in
      cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be
      visible.
      
      This whole on-demand cg_list optimization is extremely fragile and has
      ample possibility to lead to bugs which can cause things like
      once-a-year oops during boot.  I'm wondering whether the better
      approach would be just adding "cgroup_disable=all" handling which
      disables the whole cgroup rather than tempting fate with this
      on-demand craziness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
      532de3fc
    • workqueue: ensure @task is valid across kthread_stop() · 5bdfff96
      Lai Jiangshan committed
      When a kworker should die, the kworker is notified through WORKER_DIE
      flag instead of kthread_should_stop().  This, IIRC, is primarily to
      keep the test synchronized inside worker_pool lock.  WORKER_DIE is
      first set while holding pool->lock, the lock is dropped and
      kthread_stop() is called.
      
      Unfortunately, this means that there's a slight chance that the target
      kworker may see WORKER_DIE before kthread_stop() finishes and exits
      and frees the target task before or during kthread_stop().
      
      Fix it by pinning the target task before setting WORKER_DIE and
      putting it after kthread_stop() is done.
      
      tj: Improved patch description and comment.  Moved pinning above
          WORKER_DIE to better signify what it's protecting.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5bdfff96
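The pinning order described in this changelog can be illustrated with a single-threaded simulation. Everything here is hypothetical: `struct task`, `get_task()`, `put_task()`, and `destroy_worker()` are stand-ins invented for the sketch, the reference count is not atomic, and the worker's exit is simulated inline at the worst possible moment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task with a reference count. */
struct task {
    int usage;   /* number of references held */
    bool freed;  /* set once the last reference is dropped */
};

static void get_task(struct task *t)
{
    t->usage++;
}

static void put_task(struct task *t)
{
    assert(t->usage > 0);
    if (--t->usage == 0)
        t->freed = true;  /* real code would free the memory here */
}

/* The exiting worker drops its own reference. */
static void worker_exits(struct task *t)
{
    put_task(t);
}

/*
 * Stopper side: pin first, then signal and stop, and only then unpin.
 * Returns whether the task was still valid while it was being stopped.
 */
static bool destroy_worker(struct task *t)
{
    bool valid_during_stop;

    get_task(t);                    /* pin before setting the die flag */
    worker_exits(t);                /* worst case: worker exits at once */
    valid_during_stop = !t->freed;  /* the stop call would touch t here */
    put_task(t);                    /* unpin after the stop completed */
    return valid_during_stop;
}
```

Without the initial `get_task()`, the simulated exit would drop the last reference before the stop step runs, which is the window the changelog describes.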
  8. 18 Feb, 2014 2 commits
    • inotify: Fix reporting of cookies for inotify events · 45a22f4c
      Jan Kara committed
      My rework of handling of notification events (namely commit 7053aee2
      "fsnotify: do not share events between notification groups") broke
      sending of cookies with inotify events. We didn't propagate the value
      passed to fsnotify() properly and passed 4 uninitialized bytes to
      userspace instead (so it is also an information leak). Sadly I didn't
      notice this during my testing because inotify cookies aren't used very
      much and LTP inotify tests ignore them.
      
      Fix the problem by passing the cookie value properly.
      
      Fixes: 7053aee2
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      45a22f4c
    • printk: fix syslog() overflowing user buffer · e4178d80
      Linus Torvalds committed
      This is not a buffer overflow in the traditional sense: we don't
      overflow any *kernel* buffers, but we do mis-count the amount of data we
      copy back to user space for the SYSLOG_ACTION_READ_ALL case.
      
      In particular, if the user buffer is too small to hold everything, and
      *if* there is a continuation line at just the right place, we can end up
      giving the user more data than he asked for.
      
      The reason is that we first count up the number of bytes all the log
      records contain, then we walk the records again until we've skipped the
      records at the beginning that won't fit, and then we walk the rest of
      the records and copy them to the user space buffer.
      
      And in between that "skip the initial records that won't fit" and the
      "copy the records that *will* fit to user space", we reset the 'prev'
      variable that contained the record information for the last record not
      copied.  That meant that when we started copying to user space, we now
      had a different character count than what we had originally calculated
      in the first record walk-through.
      
      The fix is to simply not clear the 'prev' flags value (in both cases
      where we had the same logic: syslog_print_all and kmsg_dump_get_buffer:
      the latter is used for pstore-like dumping)
      Reported-and-tested-by: Debabrata Banerjee <dbanerje@akamai.com>
      Acked-by: Kay Sievers <kay@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4178d80
  9. 14 Feb, 2014 1 commit
    • tick: Clear broadcast pending bit when switching to oneshot · dd5fd9b9
      Thomas Gleixner committed
      AMD systems which use the C1E workaround in the amd_e400_idle routine
      trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.
      
      The reason is that the idle routine of those AMD systems switches the
      cpu into forced broadcast mode early on before the newly brought up
      CPU can switch over to high resolution / NOHZ mode. The timer related
      CPU1 bringup looks like this:
      
        clockevent_register_device(local_apic);
        tick_setup(local_apic);
        ...
        idle()
      	tick_broadcast_on_off(FORCE);
      	tick_broadcast_oneshot_control(ENTER)
      	  cpumask_set(cpu, broadcast_oneshot_mask);
      	halt();
      
      Now the broadcast interrupt on CPU0 sets CPU1 in the
      broadcast_pending_mask and wakes CPU1. So CPU1 continues:
      
      	local_apic_timer_interrupt()
      	   tick_handle_periodic();
      	   softirq()
      	     tick_init_highres();
      	       cpumask_clr(cpu, broadcast_oneshot_mask);
      	
      	tick_broadcast_oneshot_control(ENTER)
      	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));
      
      So while we remove CPU1 from the broadcast_oneshot_mask when we switch
      over to highres mode, we do not clear the pending bit, which then
      triggers the warning when we go back to idle.
      
      The reason why this is only visible on C1E affected AMD systems is
      that the other machines enter the deep sleep states via
      acpi_idle/intel_idle and exit the broadcast mode before executing the
      remote triggered local_apic_timer_interrupt. So the pending bit is
      already cleared when the switch over to highres mode is clearing the
      oneshot mask.
      
      The solution is simple: Clear the pending bit together with the mask
      bit when we switch over to highres mode.
      
      Stanislaw came up independently with the same patch by enforcing the
      C1E workaround and debugging the fallout. I picked mine, because mine
      has a changelog :)
      Reported-by: poma <pomidorabelisima@gmail.com>
      Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Justin M. Forbes <jforbes@redhat.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dd5fd9b9
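The fix stated in this changelog, clearing the pending bit together with the oneshot mask bit, can be sketched with plain bitmasks. The two 64-bit variables are hypothetical stand-ins for the cpumasks named in the text, and `switch_to_highres()` and `pending_on_idle_entry()` are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmask stand-ins for the two cpumasks in the changelog. */
static uint64_t broadcast_oneshot_mask;
static uint64_t broadcast_pending_mask;

/* Switching a CPU to highres mode: clear both bits together. */
static void switch_to_highres(unsigned int cpu)
{
    assert(cpu < 64);
    broadcast_oneshot_mask &= ~(UINT64_C(1) << cpu);
    broadcast_pending_mask &= ~(UINT64_C(1) << cpu);
}

/* The condition that the warning on idle entry checks for. */
static bool pending_on_idle_entry(unsigned int cpu)
{
    return (broadcast_pending_mask & (UINT64_C(1) << cpu)) != 0;
}
```

If only the oneshot bit were cleared, a pending bit set earlier by the broadcast interrupt would survive the switch and trip the check on the next idle entry.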
  10. 13 Feb, 2014 1 commit
    • Revert "cgroup: use an ordered workqueue for cgroup destruction" · 1a11533f
      Tejun Heo committed
      This reverts commit ab3f5faa.
      Explanation from Hugh:
      
        It's because more thorough testing, by others here, found that it
        wasn't always solving the problem: so I asked Tejun privately to
        hold off from sending it in, until we'd worked out why not.
      
        Most of our testing being on a v3.11-based kernel, it was perfectly
        possible that the problem was merely our own e.g. missing Tejun's
        8a2b7538 ("workqueue: fix ordered workqueues in NUMA setups").
      
        But that turned out not to be enough to fix it either. Then Filipe
        pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched()
        before we ever get to put the offline on to the workqueue: by the
        time we get to the workqueue, the ordering has already been lost.
      
        So, thanks for the Acks, but I'm afraid that this ordered workqueue
        solution is just not good enough: we should simply forget that patch
        and provide a different answer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      1a11533f
  11. 12 Feb, 2014 1 commit
    • ring-buffer: Fix first commit on sub-buffer having non-zero delta · d651aa1d
      Steven Rostedt (Red Hat) committed
      Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
      that page use a 27 bit delta against that timestamp in order to save on
      bits written to the ring buffer. If the time between events is larger than
      what the 27 bits can hold, a "time extend" event is added to hold the
      entire 64 bit timestamp again and the events after that hold a delta from
      that timestamp.
      
      As a "time extend" is always paired with an event, it is logical to just
      allocate the event with the time extend, to make things a bit more efficient.
      
      Unfortunately, when the pairing code was written, it removed the "delta = 0"
      from the first commit on a page, causing the events on the page to be
      slightly skewed.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Cc: stable@vger.kernel.org # 2.6.37+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d651aa1d
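The 27-bit delta scheme described in this changelog can be illustrated with two small helpers. This is a sketch under assumptions: `needs_time_extend()` and `event_delta()` are hypothetical names, and the real ring buffer code is organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_SHIFT 27  /* width of the per-event delta field, per the text */
#define TS_DELTA_MAX ((UINT64_C(1) << TS_SHIFT) - 1)

/* Does this delta overflow the 27-bit field, so a "time extend" is needed? */
static bool needs_time_extend(uint64_t delta)
{
    return delta > TS_DELTA_MAX;
}

/*
 * Delta stored with an event. The first event on a page is covered by the
 * page's own full timestamp, so its delta must be 0.
 */
static uint64_t event_delta(bool first_on_page, uint64_t ts, uint64_t prev_ts)
{
    assert(ts >= prev_ts);
    return first_on_page ? 0 : ts - prev_ts;
}
```

Dropping the `first_on_page` case is the kind of regression the changelog describes: the first event would carry a non-zero delta on top of the page timestamp and skew every later event on that page.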
  12. 11 Feb, 2014 2 commits
    • cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8
      Li Zefan committed
      Set up cgroupfs like this:
        # mount -t cgroup -o cpuacct xxx /cgroup
        # mkdir /cgroup/sub1
        # mkdir /cgroup/sub2
      
      Then run these two commands:
        # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
        # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &
      
      After a few seconds you may see this warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
      idr_remove called for id=6 which is not allocated.
      ...
      Call Trace:
       [<ffffffff8156063c>] dump_stack+0x7a/0x96
       [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
       [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
       [<ffffffff81300bf5>] idr_remove+0x25/0xd0
       [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
       [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
       [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
       [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
       [<ffffffff81085f96>] kthread+0xe6/0xf0
       [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
      ---[ end trace 2d1577ec10cf80d0 ]---
      
      It's because allocating/removing cgroup ID is not properly synchronized.
      
      The bug was introduced when we converted cgroup_ida to cgroup_idr.
      While synchronization is already done inside ida_simple_{get,remove}(),
      users are responsible for concurrent calls to idr_{alloc,remove}().
      
      tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
      cgroup_create()").
      
      Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
      Cc: <stable@vger.kernel.org> #3.12+
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0ab02ca8
    • genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n · 2c45aada
      Paul Gortmaker committed
      In allmodconfig builds for sparc and any other arch which does
      not set CONFIG_SPARSE_IRQ, the following will be seen at modpost:
      
        CC [M]  lib/cpu-notifier-error-inject.o
        CC [M]  lib/pm-notifier-error-inject.o
      ERROR: "irq_to_desc" [drivers/gpio/gpio-mcp23s08.ko] undefined!
      make[2]: *** [__modpost] Error 1
      
      This happens because commit 3911ff30 ("genirq: export
      handle_edge_irq() and irq_to_desc()") added one export for it, but
      there were actually two instances of it, in an if/else clause for
      CONFIG_SPARSE_IRQ.  Add the second one.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: stable@vger.kernel.org # 3.4+
      Link: http://lkml.kernel.org/r/1392057610-11514-1-git-send-email-paul.gortmaker@windriver.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2c45aada
  13. 09 Feb, 2014 1 commit
  14. 08 Feb, 2014 3 commits
    • cgroup: fix locking in cgroup_cfts_commit() · 48573a89
      Tejun Heo committed
      cgroup_cfts_commit() walks the cgroup hierarchy that the target
      subsystem is attached to and tries to apply the file changes.  Due to
      the convolution with inode locking, it can't keep cgroup_mutex locked
      while iterating.  It currently holds only RCU read lock around the
      actual iteration and then pins the found cgroup using dget().
      
      Unfortunately, this is incorrect.  Although the iteration does check
      cgroup_is_dead() before invoking dget(), there's nothing which
      prevents the dentry from going away in between.  Note that this is
      different from the usual css iterations where css_tryget() is used to
      pin the css - css_tryget() tests whether the css can be pinned and
      fails if not.
      
      The problem can be solved by simply holding cgroup_mutex instead of
      RCU read lock around the iteration, which actually reduces LOC.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
      48573a89
    • cgroup: fix error return from cgroup_create() · b58c8998
      Tejun Heo committed
      cgroup_create() was returning 0 after allocation failures.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
      b58c8998
    • cgroup: fix error return value in cgroup_mount() · eb46bf89
      Tejun Heo committed
      When cgroup_mount() fails to allocate an id for the root, it doesn't
      set ret before jumping to unlock_drop, and so ends up returning 0 after
      a failure.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
      eb46bf89
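The class of bug fixed here (and in cgroup_create() in the previous entry) is a goto-based error path that forgets to set the return value. A minimal sketch follows, with hypothetical names throughout: `alloc_root_id()`, `mount_buggy()`, and `mount_fixed()` are invented for illustration, and only the `unlock_drop` label name comes from the changelog.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical id allocator: returns an id >= 0, or a negative errno. */
static int alloc_root_id(int fail)
{
    return fail ? -ENOSPC : 1;
}

/* Buggy shape: ret keeps its initial 0 when the allocation fails. */
static int mount_buggy(int fail)
{
    int ret = 0;
    int id = alloc_root_id(fail);

    if (id < 0)
        goto unlock_drop;  /* ret was never set */
    /* ... rest of the setup ... */
unlock_drop:
    return ret;
}

/* Fixed shape: record the error before jumping to the common exit. */
static int mount_fixed(int fail)
{
    int ret = 0;
    int id = alloc_root_id(fail);

    if (id < 0) {
        ret = id;
        goto unlock_drop;
    }
    /* ... rest of the setup ... */
unlock_drop:
    return ret;
}
```

In the buggy shape the caller sees success (0) even though nothing was allocated, which is what makes this kind of failure hard to notice.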
  15. 07 Feb, 2014 1 commit
    • cgroup: use an ordered workqueue for cgroup destruction · ab3f5faa
      Hugh Dickins committed
      Sometimes the cleanup after memcg hierarchy testing gets stuck in
      mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
      
      There may turn out to be several causes, but a major cause is this: the
      workitem to offline parent can get run before workitem to offline child;
      parent's mem_cgroup_reparent_charges() circles around waiting for the
      child's pages to be reparented to its lrus, but it's holding cgroup_mutex
      which prevents the child from reaching its mem_cgroup_reparent_charges().
      
      Just use an ordered workqueue for cgroup_destroy_wq.
      
      tj: Committing as the temporary fix until the reverse dependency can
          be removed from memcg.  Comment updated accordingly.
      
      Fixes: e5fca243 ("cgroup: use a dedicated workqueue for cgroup destruction")
      Suggested-by: Filipe Brandenburger <filbranden@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ab3f5faa
  16. 06 Feb, 2014 2 commits
    • time: Fix overflow when HZ is smaller than 60 · 80d767d7
      Mikulas Patocka committed
      When compiling for the IA-64 ski emulator, HZ is set to 32 because the
      emulation is slow and we don't want to waste too many cycles processing
      timers. Alpha also has an option to set HZ to 32.
      
      This causes integer overflow in
      kernel/time/jiffies.c:
      kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
        .mult  = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
        ^
      
      This patch reduces the JIFFIES_SHIFT value to avoid the overflow.
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      80d767d7
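The arithmetic behind the "smaller than 60" threshold can be checked with a short sketch. Two things are assumptions not stated in the changelog: that the shift before the patch was 8, and that NSEC_PER_JIFFY is the rounded value of one second divided by HZ. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Nanoseconds per jiffy, rounded to nearest (assumed definition). */
static uint64_t nsec_per_jiffy(unsigned int hz)
{
    assert(hz > 0);
    return (NSEC_PER_SEC + hz / 2) / hz;
}

/* Would NSEC_PER_JIFFY << shift still fit in an unsigned 32-bit field? */
static bool mult_fits_u32(unsigned int hz, unsigned int shift)
{
    return (nsec_per_jiffy(hz) << shift) <= UINT32_MAX;
}
```

Under those assumptions, HZ = 32 gives 31250000 << 8 = 8000000000, which exceeds 2^32 - 1 = 4294967295, while HZ = 60 gives 16666667 << 8 = 4266666752, which still fits. A smaller shift brings the HZ = 32 case back into range.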
    • execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds committed
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98