1. 22 Mar 2020, 1 commit
  2. 30 Jan 2020, 2 commits
  3. 05 Dec 2019, 2 commits
  4. 21 Aug 2019, 1 commit
  5. 19 Jul 2019, 1 commit
  6. 17 Jul 2019, 1 commit
  7. 29 Jun 2019, 1 commit
    • signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889
      Committed by Oleg Nesterov
      This is the minimal fix for stable, I'll send cleanups later.
      
      Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced a
      user-visible change that breaks user space: a signal temporarily
      unblocked by set_user_sigmask() can be delivered even if the caller
      returns success or timeout.
      
      Change restore_user_sigmask() to accept an additional "interrupted"
      argument, to be used instead of the signal_pending() check, and
      update the callers.
      
      Eric said:
      
      : For clarity.  I don't think this is required by POSIX, or fundamentally to
      : remove the races in select.  It is what Linux has always done, and we have
      : applications that care, so I agree this fix is needed.
      :
      : Further, in any case where the semantic change that this patch rolls back
      : (aka allowing a signal to be delivered while the select-like call
      : completes) would be an advantage, we can do as well if not better by using
      : signalfd.
      :
      : Michael, is there any chance we can get this guarantee of the Linux
      : implementation of pselect and friends clearly documented: the guarantee
      : that if the system call completes successfully, no signal that is
      : unblocked by using sigmask will be delivered?
      
      Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
      Fixes: 854a6ed5 ("signal: Add restore_user_sigmask()")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Eric Wong <e@80x24.org>
      Tested-by: Eric Wong <e@80x24.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 31 May 2019, 1 commit
  9. 08 Mar 2019, 3 commits
    • epoll: use rwlock in order to reduce ep_poll_callback() contention · a218cc49
      Committed by Roman Penyaev
      The goal of this patch is to reduce contention in ep_poll_callback(),
      which can be called concurrently from different CPUs in the case of
      high event rates and many fds per epoll.  The problem is easily
      reproduced by generating events (writes to a pipe or eventfd) from
      many threads while a consumer thread polls.  In other words, this
      patch increases the bandwidth of events that can be delivered from
      sources to the poller by adding poll items to the list in a lockless
      way.
      
      The main change is the replacement of the spinlock with an rwlock,
      which is taken for read in ep_poll_callback(), with poll items then
      added to the tail of the list using the xchg atomic instruction.  The
      write lock is taken everywhere else in order to stop list
      modifications and guarantee that list updates are fully completed (I
      assume the write side of an rwlock does not starve; the qrwlock
      implementation seems to provide this guarantee).
      
      The following are some microbenchmark results based on the test [1],
      which starts threads that each generate N events.  The test ends when
      all events have been fetched by the poller thread:
      
       spinlock
       ========
      
       threads  events/ms  run-time ms
             8       6402        12495
            16       7045        22709
            32       7395        43268
      
       rwlock + xchg
       =============
      
       threads  events/ms  run-time ms
             8      10038         7969
            16      12178        13138
            32      13223        24199
      
      According to the results, the bandwidth of delivered events is
      significantly increased, and execution time is therefore reduced.
      
      This patch was tested with various microbenchmarks and with
      artificial delays (e.g. "udelay(get_random_int() & 0xff)") introduced
      in the kernel on paths where items are added to lists.
      
      [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
      
      Link: http://lkml.kernel.org/r/20190103150104.17128-5-rpenyaev@suse.de
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: unify awaking of wakeup source on ep_poll_callback() path · c3e320b6
      Committed by Roman Penyaev
      The original comment, "Activate ep->ws since epi->ws may get
      deactivated at any time", indeed sounds alarming, but it is
      incorrect: the path where we check epi->ws is a path where insertion
      into ovflist happens, i.e. ep_scan_ready_list() has taken ep->mtx and
      waits for this callback to finish, so ep_modify() (which unregisters
      the wakeup source) waits for ep_scan_ready_list().
      
      In this patch I simply call ep_pm_stay_awake_rcu(), which is slightly
      more than this path needs (it is indirectly protected by the main
      ep->mtx, so even RCU is not required), but I do not want to create
      another naked __ep_pm_stay_awake() variant only for this particular
      case, so the RCU variant is simply better for all cases.
      
      Link: http://lkml.kernel.org/r/20190103150104.17128-4-rpenyaev@suse.de
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: make sure all elements in ready list are in FIFO order · c141175d
      Committed by Roman Penyaev
      Patch series "use rwlock in order to reduce ep_poll_callback()
      contention", v3.
      
      The last patch targets the contention problem in ep_poll_callback(),
      which is easily reproduced by generating events (writes to a pipe or
      eventfd) from many threads while a consumer thread polls.
      
      The following are some microbenchmark results based on the test [1],
      which starts threads that each generate N events.  The test ends when
      all events have been fetched by the poller thread:
      
       spinlock
       ========
      
       threads  events/ms  run-time ms
             8       6402        12495
            16       7045        22709
            32       7395        43268
      
       rwlock + xchg
       =============
      
       threads  events/ms  run-time ms
             8      10038         7969
            16      12178        13138
            32      13223        24199
      
      According to the results, the bandwidth of delivered events is
      significantly increased, and execution time is therefore reduced.
      
      This patch (of 4):
      
      All incoming events are stored in FIFO order, and this should also
      apply to ->ovflist, which is originally a stack, i.e. LIFO.
      
      Thus, to keep correct FIFO order, ->ovflist should be reversed by
      adding elements to the head of the ready list rather than to the
      tail.
      
      Link: http://lkml.kernel.org/r/20190103150104.17128-2-rpenyaev@suse.de
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 05 Jan 2019, 8 commits
  11. 04 Jan 2019, 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Committed by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going
      to move the old VERIFY_xyz interface to that model.  And it's best
      done at the end of the merge window, when I've done most of my
      merges, so let's just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
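
      The commit's actual sed script isn't shown; as a hedged illustration,
      a one-line expression like the following handles the trivial
      access_ok(VERIFY_xyz, ...) form it describes:

```shell
# Hypothetical sed expression (not the commit's real script) that
# drops the VERIFY_xyz first argument from an access_ok() call:
printf 'if (!access_ok(VERIFY_WRITE, buf, len))\n' |
    sed 's/access_ok(VERIFY_[A-Z]*, */access_ok(/'
# prints: if (!access_ok(buf, len))
```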
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 07 Dec 2018, 2 commits
    • signal: Add restore_user_sigmask() · 854a6ed5
      Committed by Deepa Dinamani
      Refactor the logic that restores the sigmask before the syscall
      returns into an API.  This is useful for versions of syscalls that
      pass in a sigmask and expect current->sigmask to be changed during
      the execution of the syscall and restored afterwards.
      
      With the advent of the new y2038 syscalls in the subsequent patches,
      we add two more new versions of the syscalls (for pselect, ppoll
      and io_pgetevents) in addition to the existing native and compat
      versions.  Adding such an API reduces the logic that would otherwise
      need to be replicated.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • signal: Add set_user_sigmask() · ded653cc
      Committed by Deepa Dinamani
      Refactor reading a sigset from userspace and updating the sigmask
      into an API.
      
      This is useful for versions of syscalls that pass in the
      sigmask and expect the current->sigmask to be changed during,
      and restored after, the execution of the syscall.
      
      With the advent of the new y2038 syscalls in the subsequent patches,
      we add two more new versions of the syscalls (for pselect, ppoll,
      and io_pgetevents) in addition to the existing native and compat
      versions.  Adding such an API reduces the logic that would otherwise
      need to be replicated.
      
      Note that the calls to sigprocmask() ignored the return value of the
      API, as the function only returns an error for an invalid first
      argument, which is hardcoded at these call sites.  The updated logic
      uses set_current_blocked() instead.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  13. 23 Aug 2018, 7 commits
  14. 29 Jun 2018, 1 commit
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Committed by Linus Torvalds
      The poll() changes were not well thought out and were completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called
      down to the underlying file operations, but instead made at least two
      indirect calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity, which is completely
      unwarranted for the regular case.  The (undocumented) reason for the
      poll() changes was some alleged AIO poll race fixing, but we don't
      make the common case slower and more complex for an uncommon special
      case, so this all really needs far more explanation and most likely a
      fundamental redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 15 Jun 2018, 1 commit
  16. 26 May 2018, 1 commit
  17. 03 Apr 2018, 1 commit
  18. 12 Feb 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants.  But the keyword here is "almost".
      For various bad reasons they aren't the same everywhere, and because
      of this epoll() doesn't actually work quite correctly in some cases
      on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 02 Feb 2018, 2 commits
  20. 29 Nov 2017, 2 commits