1. 04 Apr 2017, 1 commit
    • rtmutex: Deboost before waking up the top waiter · 2a1c6029
      Committed by Xunlei Pang
      We should deboost before waking the high-priority task, so that we
      don't run two tasks with the same "state" (priority, deadline,
      sched_class, etc.).

      In order to make sure the boosting task doesn't start running between
      the unlock and the deboost (due to a 'spurious' wakeup), we move the
      deboost under the wait_lock; that way it is serialized against the
      wait loop in __rt_mutex_slowlock().

      Doing the deboost early can, however, lead to priority inversion if
      current gets preempted after the deboost but before waking our
      high-prio task, hence we disable preemption before the deboost and
      re-enable it only after the wakeup is done.

      This gets us the right semantic order, but most importantly this
      change ensures pointer stability for the next patch, where we have
      rt_mutex_setprio() cache a pointer to the top-most waiter task. If we,
      as before this change, did the wakeup first and then the deboost, that
      pointer might point into thin air.
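
      For illustration, a condensed sketch of the resulting unlock ordering
      (not the literal patch; it reuses the existing
      mark_wakeup_next_waiter()/wake_up_q() helpers and elides all error
      handling):

        static void rt_mutex_unlock_sketch(struct rt_mutex *lock, unsigned long flags)
        {
                DEFINE_WAKE_Q(wake_q);

                /* wait_lock held: queue the top waiter and deboost current */
                mark_wakeup_next_waiter(&wake_q, lock);

                preempt_disable();      /* no preemption between deboost and wakeup */
                raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

                wake_up_q(&wake_q);     /* wake the top waiter */
                preempt_enable();       /* from here on we may be preempted by it */
        }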
      
      [peterz: Changelog + patch munging]
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170323150216.110065320@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 24 Mar 2017, 12 commits
    • futex: Drop hb->lock before enqueueing on the rtmutex · 56222b21
      Committed by Peter Zijlstra
      When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
      chain code will (falsely) report a deadlock and BUG.
      
      The problem is that futex_lock_pi() holds hb->lock (now an rt_mutex)
      while doing task_blocks_on_rt_mutex() on the futex's
      pi_state::rtmutex. When interleaved just right with futex_unlock_pi(),
      this leads the chain walk to believe it sees an AB-BA deadlock.
      
        Task1 (holds rt_mutex,	Task2 (does FUTEX_LOCK_PI)
               does FUTEX_UNLOCK_PI)
      
      				lock hb->lock
      				lock rt_mutex (as per start_proxy)
        lock hb->lock
      
      Which is a trivial AB-BA.
      
      It is not an actual deadlock, because it won't be holding hb->lock by the
      time it actually blocks on the rt_mutex, but the chainwalk code doesn't
      know that and it would be a nightmare to handle this gracefully.
      
      To avoid this problem, do the same as in futex_unlock_pi() and drop
      hb->lock after acquiring wait_lock. This still fully serializes against
      futex_unlock_pi(), since adding to the wait_list does the very same lock
      dance, and removing it holds both locks.
      
      Aside from solving the RT problem, this makes the lock and unlock
      mechanism symmetric and reduces the hb->lock hold time.
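
      A rough sketch of the resulting lock dance on the futex_lock_pi() side
      (assumed shape, not the literal diff; __rt_mutex_start_proxy_lock()
      expects wait_lock to be held by the caller):

        /* hb->lock held; q.pi_state and rt_waiter already set up */
        raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
        spin_unlock(q.lock_ptr);        /* drop hb->lock before blocking */
        ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
        raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
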
      Reported-and-tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Futex_unlock_pi() determinism · bebe5b51
      Committed by Peter Zijlstra
      The problem with returning -EAGAIN when the waiter state mismatches is
      that it becomes very hard to prove a bounded execution time for the
      operation. And seeing that this is an RT operation, that is somewhat
      important.

      While in practice, given the previous patch, it is very unlikely to
      ever really take more than one or two rounds, proving so is rather
      hard.

      However, now that modifying the wait_list is done while holding both
      hb->lock and wait_lock, the scenario can be avoided entirely by
      acquiring wait_lock while still holding hb->lock, doing a hand-over
      without leaving a hole.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() · cfafcd11
      Committed by Peter Zijlstra
      By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
      modifications are done under both hb->lock and wait_lock.
      
      This closes the obvious interleave pattern between futex_lock_pi() and
      futex_unlock_pi(), but not entirely so. See below:
      
      Before:
      
      futex_lock_pi()			futex_unlock_pi()
        unlock hb->lock
      
      				  lock hb->lock
      				  unlock hb->lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
      
        schedule()
      
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  <idem>
      				    -EAGAIN
      
        lock hb->lock
      
      
      After:
      
      futex_lock_pi()			futex_unlock_pi()
      
        lock hb->lock
        lock rt_mutex->wait_lock
        list_add
        unlock rt_mutex->wait_lock
        unlock hb->lock
      
        schedule()
      				  lock hb->lock
      				  unlock hb->lock
        lock hb->lock
        lock rt_mutex->wait_lock
        list_del
        unlock rt_mutex->wait_lock
      
      				  lock rt_mutex->wait_lock
      				  unlock rt_mutex_wait_lock
      				    -EAGAIN
      
        unlock hb->lock
      
      
      It does, however, solve the earlier starvation/live-lock scenario
      introduced with the -EAGAIN: unlike the before scenario, where the
      -EAGAIN happens while futex_unlock_pi() holds no locks, in the after
      scenario it happens while futex_unlock_pi() actually holds a lock, and
      it is then serialized on that lock.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() · 38d589f2
      Committed by Peter Zijlstra
      With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
      consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
      parts, such that only the actual blocking can be done without hb->lock
      held.
      
      Split rt_mutex_finish_proxy_lock() into two parts, one that does the
      blocking and one that does remove_waiter() when the lock acquire
      failed.

      When the rtmutex was acquired successfully, the waiter can be removed
      safely in the acquisition path, since there is no concurrency on the
      lock owner.
      
      This means that, except for futex_lock_pi(), all wait_list modifications
      are done with both hb->lock and wait_lock held.
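
      The split roughly looks like this (names as in the mainline patch,
      bodies elided; treat it as a sketch):

        /* Wait for the lock; can fail with -EINTR or -ETIMEDOUT. */
        int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
                                     struct hrtimer_sleeper *to,
                                     struct rt_mutex_waiter *waiter);

        /*
         * Called only on failure, with hb->lock held again: removes the
         * waiter unless we raced and acquired the lock after all.
         */
        bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
                                         struct rt_mutex_waiter *waiter);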
      
      [bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex,rt_mutex: Introduce rt_mutex_init_waiter() · 50809358
      Committed by Peter Zijlstra
      Since there are already two copies of this code, introduce a helper
      before adding a third one.
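
      The helper is essentially the shared initialization sequence (sketch
      of the common code the two copies contained):

        void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
        {
                debug_rt_mutex_init_waiter(waiter);
                RB_CLEAR_NODE(&waiter->pi_tree_entry);
                RB_CLEAR_NODE(&waiter->tree_entry);
                waiter->task = NULL;
        }
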
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Pull rt_mutex_futex_unlock() out from under hb->lock · 16ffa12d
      Committed by Peter Zijlstra
      There are a number of 'interesting' problems, all caused by holding
      hb->lock while doing the rt_mutex_unlock() equivalent.
      
      Notably:
      
       - a PI inversion on hb->lock; and,
      
       - a SCHED_DEADLINE crash because of pointer instability.
      
      The previous changes:
      
       - changed the locking rules to cover {uval,pi_state} with wait_lock.
      
       - allow rt_mutex_futex_unlock() to be done without dropping wait_lock,
         which in turn allows relying on wait_lock atomicity completely.
      
       - simplified the waiter conundrum.
      
      It's now sufficient to hold rtmutex::wait_lock and a reference on the
      pi_state to protect the state consistency, so hb->lock can be dropped
      before calling rt_mutex_futex_unlock().
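
      The unlock path then ends up with roughly this shape (condensed sketch
      of how it looks after the whole series, error handling elided):

        get_pi_state(pi_state);         /* keep pi_state alive across the unlock */
        raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
        spin_unlock(&hb->lock);         /* hb->lock is no longer required */

        ret = wake_futex_pi(uaddr, uval, pi_state);     /* drops wait_lock */
        put_pi_state(pi_state);
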
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Rework inconsistent rt_mutex/futex_q state · 73d786bd
      Committed by Peter Zijlstra
      There is a weird state in the futex_unlock_pi() path when it interleaves
      with a concurrent futex_lock_pi() at the point where it drops hb->lock.
      
      In this case, it can happen that the rt_mutex wait_list and the
      futex_q disagree on pending waiters, in particular rt_mutex will find
      no pending waiters where futex_q thinks there are. When that happens,
      the rt_mutex unlock code cannot assign an owner.

      The futex side fixup code has to clean up the inconsistencies, with
      quite a bunch of interesting corner cases.
      
      Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
      situation occurs. This then gives the futex_lock_pi() code the opportunity
      to continue and the retried futex_unlock_pi() will now observe a coherent
      state.
      
      The only problem is that this breaks RT timeliness guarantees. That
      is, consider the following scenario:
      
        T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
      
          CPU0
      
          T1
            lock_pi()
            queue_me()  <- Waiter is visible
      
          preemption
      
          T2
            unlock_pi()
      	loops with -EAGAIN forever
      
      Which is undesirable for PI primitives. Future patches will rectify
      this.
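
      Conceptually, wake_futex_pi() now does something like this (sketch,
      not the literal code):

        new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
        if (!new_owner) {
                /*
                 * futex_q says there is a waiter, but the rt_mutex wait_list
                 * is (still) empty: back out and let the retried
                 * futex_unlock_pi() observe a coherent state.
                 */
                ret = -EAGAIN;
                goto out_unlock;
        }
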
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Cleanup refcounting · bf92cf3a
      Committed by Peter Zijlstra
      Add put_pi_state() as a counterpart for get_pi_state() so the
      refcounting becomes consistent.
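
      With the pair in place the refcounting reads symmetrically (sketch of
      the two helpers; the release path is elided):

        static void get_pi_state(struct futex_pi_state *pi_state)
        {
                WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
        }

        static void put_pi_state(struct futex_pi_state *pi_state)
        {
                if (!atomic_dec_and_test(&pi_state->refcount))
                        return;
                /* ... drop pi_mutex ownership and free or cache pi_state ... */
        }
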
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Change locking rules · 734009e9
      Committed by Peter Zijlstra
      Currently futex-pi relies on hb->lock to serialize everything. But
      hb->lock creates another set of problems, especially priority
      inversions on RT where hb->lock becomes an rt_mutex itself.

      The rt_mutex::wait_lock is the most obvious protection for keeping the
      futex user space value and the kernel internal pi_state in sync.

      Rework and document the locking so rt_mutex::wait_lock is held across
      all operations which modify the user space value and the pi state.

      This allows, as a next step, invoking rt_mutex_unlock() (including the
      deboost) without holding hb->lock.
      
      Nothing yet relies on the new locking rules.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex,rt_mutex: Provide futex specific rt_mutex API · 5293c2ef
      Committed by Peter Zijlstra
      Part of what makes futex_unlock_pi() intricate is that
      rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
      rt_mutex::wait_lock.
      
      This means it cannot rely on the atomicity of wait_lock, which would
      be preferred in order to not rely on hb->lock so much.

      The reason rt_mutex_slowunlock() needs to drop wait_lock is that it
      can race with the rt_mutex fastpath; futexes, however, have their own
      fast path.

      Since futexes already have a bunch of separate rt_mutex accessors,
      complete that set and implement an rt_mutex variant without a fastpath
      for them.
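
      The futex-side unlock variant then skips the owner-cmpxchg fastpath
      entirely and never has to drop wait_lock mid-operation; roughly (as it
      ends up looking in mainline after this series, sketch only):

        void rt_mutex_futex_unlock(struct rt_mutex *lock)
        {
                DEFINE_WAKE_Q(wake_q);
                unsigned long flags;
                bool postunlock;

                raw_spin_lock_irqsave(&lock->wait_lock, flags);
                postunlock = __rt_mutex_futex_unlock(lock, &wake_q);
                raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

                if (postunlock)
                        rt_mutex_postunlock(&wake_q);
        }
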
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Use smp_store_release() in mark_wake_futex() · 1b367ece
      Committed by Peter Zijlstra
      Since the futex_q can disappear the instruction after NULL is
      assigned, this really should be a RELEASE barrier. That also stops
      loads from hitting dead memory.
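
      The change boils down to the following (sketch; q->lock_ptr is what
      the woken waiter spins on before freeing the futex_q):

        /* before: plain store, earlier accesses to *q may be reordered past it */
        q->lock_ptr = NULL;

        /* after: RELEASE store, all preceding accesses to *q complete first,
         * so none of them can hit the then possibly freed futex_q */
        smp_store_release(&q->lock_ptr, NULL);
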
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • futex: Cleanup variable names for futex_top_waiter() · 499f5aca
      Committed by Peter Zijlstra
      futex_top_waiter() returns the top waiter on the pi_mutex. Assigning
      this to a variable named 'match' totally obscures the code.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: xlpang@redhat.com
      Cc: rostedt@goodmis.org
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: dvhart@infradead.org
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 15 Mar 2017, 2 commits
  4. 02 Mar 2017, 2 commits
  5. 28 Feb 2017, 1 commit
  6. 13 Feb 2017, 1 commit
    • futex: Move futex_init() to core_initcall · 25f71d1c
      Committed by Yang Yang
      The UEVENT user mode helper is enabled before the initcalls are executed
      and is available when the root filesystem has been mounted.
      
      The user mode helper is triggered by device init calls and the executable
      might use the futex syscall.
      
      futex_init() is marked __initcall which maps to device_initcall, but there
      is no guarantee that futex_init() is invoked _before_ the first device init
      call which triggers the UEVENT user mode helper.
      
      If the user mode helper uses the futex syscall before futex_init() then the
      syscall crashes with a NULL pointer dereference because the futex subsystem
      has not been initialized yet.
      
      Move futex_init() to core_initcall so futexes are initialized before the
      root filesystem is mounted and the usermode helper becomes available.
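
      The change itself is essentially a one-liner in kernel/futex.c:

        -__initcall(futex_init);
        +core_initcall(futex_init);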
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
      Cc: jiang.biao2@zte.com.cn
      Cc: jiang.zhengxiong@zte.com.cn
      Cc: zhong.weidong@zte.com.cn
      Cc: deng.huali@zte.com.cn
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 26 Dec 2016, 1 commit
    • ktime: Get rid of the union · 2456e855
      Committed by Thomas Gleixner
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machines and in an endianness-optimized
      timespec variant for 32 bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.

      Get rid of the union and just keep ktime_t as a simple typedef of type
      s64.
      
      The conversion was done with coccinelle and some manual mopping up.
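
      The type goes from a single-member union to a plain typedef (sketch of
      the before/after in include/linux/ktime.h):

        /* before */
        union ktime {
                s64     tv64;
        };
        typedef union ktime ktime_t;

        /* after */
        typedef s64 ktime_t;
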
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  8. 21 Nov 2016, 1 commit
  9. 05 Sep 2016, 1 commit
  10. 30 Jul 2016, 1 commit
  11. 09 Jun 2016, 1 commit
    • futex: Calculate the futex key based on a tail page for file-based futexes · 077fa7ae
      Committed by Mel Gorman
      Mike Galbraith reported that the LTP test case futex_wake04 was broken
      by commit 65d8fc77 ("futex: Remove requirement for lock_page()
      in get_futex_key()").
      
      This test case uses futexes backed by hugetlbfs pages and so there is an
      associated inode with a futex stored on such pages. The problem is that
      the key is being calculated based on the head page index of the hugetlbfs
      page and not the tail page.
      
      Prior to the optimisation, the page lock was used to stabilise
      mappings and to pin the inode if the futex is file-backed, which is
      overkill. If the page was a compound page, the head page was
      automatically looked up as part of the page lock operation, but the
      tail page index was used to calculate the futex key.
      
      After the optimisation, the compound head is looked up early and the page
      lock is only relied upon to identify truncated pages, special pages or a
      shmem page moving to swapcache. The head page is looked up because without
      the page lock, special care has to be taken to pin the inode correctly.
      However, the tail page is still required to calculate the futex key so
      this patch records the tail page.
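
      The fix keeps a pointer to the originally looked-up (possibly tail)
      page and uses that for the key, roughly (sketch; the compound head is
      still what gets locked and pinned):

        struct page *page, *tail;

        /* page = result of the user-address lookup; may be a tail page */
        tail = page;                    /* remember it for the key calculation */
        page = compound_head(page);     /* lock/pin via the head page */

        key->shared.pgoff = basepage_index(tail);       /* not the head's index */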
      
      On vanilla 4.6, the output of the test case is:
      
      futex_wake04    0  TINFO  :  Hugepagesize 2097152
      futex_wake04    1  TFAIL  :  futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.
      
      With the patch applied
      
      futex_wake04    0  TINFO  :  Hugepagesize 2097152
      futex_wake04    1  TPASS  :  Hi hydra, thread2 awake!
      
      Fixes: 65d8fc77 "futex: Remove requirement for lock_page() in get_futex_key()"
      Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 23 May 2016, 1 commit
    • x86: remove more uaccess_32.h complexity · bd28b145
      Committed by Linus Torvalds
      I'm looking at trying to possibly merge the 32-bit and 64-bit versions
      of the x86 uaccess.h implementation, but first this needs to be cleaned
      up.
      
      For example, the 32-bit version of __copy_from_user_inatomic() is
      mostly special cases for constant sizes, and those are actually almost
      never relevant. Most users aren't using a constant size anyway, and
      the few cases that do small constant copies are better off just using
      __get_user() instead.
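
      For those small constant-size cases the replacement pattern is simply
      (illustrative sketch, 'uptr' standing in for any user pointer):

        u32 val;

        /* instead of: __copy_from_user_inatomic(&val, uptr, sizeof(val)) */
        if (__get_user(val, uptr))
                return -EFAULT;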
      
      So get rid of the unnecessary complexity.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 21 Apr 2016, 1 commit
  14. 20 Apr 2016, 1 commit
  15. 09 Mar 2016, 1 commit
  16. 17 Feb 2016, 2 commits
    • futex: Remove requirement for lock_page() in get_futex_key() · 65d8fc77
      Committed by Mel Gorman
      When dealing with key handling for shared futexes, we can drastically
      reduce the usage/need of the page lock. 1) For anonymous pages, the
      associated futex object is the mm_struct, which does not require the
      page lock. 2) For inode-based keys, we can check under the RCU read
      lock whether the page mapping is still valid and take a reference on
      the inode. This just leaves one rare race that requires the page lock
      in the slow path when examining the swapcache.
      
      Additionally realtime users currently have a problem with the page lock being
      contended for unbounded periods of time during futex operations.
      
      Task A
           get_futex_key()
           lock_page()
          ---> preempted
      
      Now any other task trying to lock that page will have to wait until
      task A gets scheduled back in, which is an unbounded amount of time.

      With this patch, we pretty much have a lockless get_futex_key().
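
      The inode-based path then looks roughly like this (condensed sketch of
      the RCU dance; the retry label stands for redoing the lookup, and the
      remaining swapcache race still falls back to lock_page()):

        rcu_read_lock();
        if (READ_ONCE(page->mapping) != mapping) {
                rcu_read_unlock();
                goto retry;             /* page was truncated or moved */
        }
        inode = READ_ONCE(mapping->host);
        if (!inode || !atomic_inc_not_zero(&inode->i_count)) {
                rcu_read_unlock();
                goto retry;             /* inode is being evicted */
        }
        rcu_read_unlock();

        key->both.offset |= FUT_OFF_INODE;
        key->shared.inode = inode;
        key->shared.pgoff = basepage_index(page);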
      
      Experiments show that this patch can boost/speed up the hashing of
      shared futexes with the perf futex benchmarks (which is good for
      measuring such a change) by up to 45% when there are high (> 100)
      thread counts on a 60 core Westmere. Lower counts are pretty much in
      the noise range or less than 10%, but the mid range can be seen at
      over 30% overall throughput (hash ops/sec). This makes anon-mem shared
      futexes much closer to their private counterparts.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      [ Ported on top of thp refcount rework, changelog, comments, fixes. ]
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • futex: Rename barrier references in ordering guarantees · 8ad7b378
      Committed by Davidlohr Bueso
      Ingo suggested we rename how we reference barriers A and B
      regarding futex ordering guarantees. This patch replaces,
      for both barriers, MB (A) with smp_mb(); (A), such that:
      
       - We explicitly state that the barriers are SMP, and
      
       - We standardize how we reference these across futex.c,
         helping readers follow which barrier does what and where.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1455045314-8305-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 26 Jan 2016, 1 commit
    • rtmutex: Make wait_lock irq safe · b4abf910
      Committed by Thomas Gleixner
      Sasha reported a lockdep splat about a potential deadlock between the
      rtmutex used for RCU priority boosting and the posix timer it_lock.
      
      CPU0					CPU1
      
      rtmutex_lock(&rcu->rt_mutex)
        spin_lock(&rcu->rt_mutex.wait_lock)
      					local_irq_disable()
      					spin_lock(&timer->it_lock)
      					spin_lock(&rcu->mutex.wait_lock)
      --> Interrupt
          spin_lock(&timer->it_lock)
      
      This is caused by the following code sequence on CPU1
      
           rcu_read_lock();
           x = lookup();
           if (x)
                   spin_lock_irqsave(&x->it_lock, flags);
           rcu_read_unlock();
           return x;
      
      We could fix that in the posix timer code by keeping rcu read locked across
      the spinlocked and irq disabled section, but the above sequence is common and
      there is no reason not to support it.
      
      Making rt_mutex.wait_lock irq safe prevents the deadlock.
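
      The fix converts the wait_lock accessors to the irq-safe variants,
      e.g. (sketch of the before/after pattern applied throughout
      rtmutex.c):

        /* before */
        raw_spin_lock(&lock->wait_lock);
        raw_spin_unlock(&lock->wait_lock);

        /* after */
        raw_spin_lock_irqsave(&lock->wait_lock, flags);
        raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
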
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
  18. 21 Jan 2016, 1 commit
    • ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Committed by Jann Horn
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
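
      Callers now have to state explicitly which credentials a check acts on
      behalf of, e.g. (sketch; PTRACE_MODE_READ_FSCREDS is what the procfs
      checks above end up using):

        /* fs-access style check: fsuid/fsgid and effective capabilities */
        if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
                return -EACCES;

        /* ptrace-syscall style check: real uid/gid, the historical behaviour */
        if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS))
                return -EPERM;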
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 16 Jan 2016, 2 commits
  20. 20 Dec 2015, 6 commits