1. 10 8月, 2009 1 次提交
  2. 28 4月, 2009 1 次提交
    • E
      net: Avoid extra wakeups of threads blocked in wait_for_packet() · bf368e4e
      Eric Dumazet 提交于
      In 2.6.25 we added UDP mem accounting.
      
      This unfortunatly added a penalty when a frame is transmitted, since
      we have at TX completion time to call sock_wfree() to perform necessary
      memory accounting. This calls sock_def_write_space() and utimately
      scheduler if any thread is waiting on the socket.
      Thread(s) waiting for an incoming frame was scheduled, then had to sleep
      again as event was meaningless.
      
      (All threads waiting on a socket are using same sk_sleep anchor)
      
      This adds lot of extra wakeups and increases latencies, as noted
      by Christoph Lameter, and slows down softirq handler.
      
      Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 
      
      Fortunatly, Davide Libenzi recently added concept of keyed wakeups
      into kernel, and particularly for sockets (see commit
      37e5540b 
      epoll keyed wakeups: make sockets use keyed wakeups)
      
      Davide goal was to optimize epoll, but this new wakeup infrastructure
      can help non epoll users as well, if they care to setup an appropriate
      handler.
      
      This patch introduces new DEFINE_WAIT_FUNC() helper and uses it
      in wait_for_packet(), so that only relevant event can wakeup a thread
      blocked in this function.
      
      Trace of function calls from bnx2 TX completion bnx2_poll_work() is :
      __kfree_skb()
       skb_release_head_state()
        sock_wfree()
         sock_def_write_space()
          __wake_up_sync_key()
           __wake_up_common()
            receiver_wake_function() : Stops here since thread is waiting for an INPUT
      Reported-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf368e4e
  3. 14 4月, 2009 1 次提交
    • J
      wait: don't use __wake_up_common() · 78ddb08f
      Johannes Weiner 提交于
      '777c6c5f wait: prevent exclusive waiter starvation' made
      __wake_up_common() global to be used from abort_exclusive_wait().
      
      It was needed to do a wake-up with the waitqueue lock held while
      passing down a key to the wake-up function.
      
      Since '4ede816a epoll keyed wakeups: add __wake_up_locked_key() and
      __wake_up_sync_key()' there is an appropriate wrapper for this case:
      __wake_up_locked_key().
      
      Use it here and make __wake_up_common() private to the scheduler
      again.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1239720785-19661-1-git-send-email-hannes@cmpxchg.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      78ddb08f
  4. 01 4月, 2009 2 次提交
    • D
      epoll keyed wakeups: introduce new *_poll() wakeup macros · c0da3775
      Davide Libenzi 提交于
      Introduce new wakeup macros that allow passing an event mask to the wakeup
      targets.  They exactly mimic their non-_poll() counterpart, with the added
      event mask passing capability.  I did add only the ones currently
      requested, avoiding the _nr() and _all() for the moment.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0da3775
    • D
      epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a
      Davide Libenzi 提交于
      This patchset introduces wakeup hints for some of the most popular (from
      epoll POV) devices, so that epoll code can avoid spurious wakeups on its
      waiters.
      
      The problem with epoll is that the callback-based wakeups do not, ATM,
      carry any information about the events the wakeup is related to.  So the
      only choice epoll has (not being able to call f_op->poll() from inside the
      callback), is to add the file* to a ready-list and resolve the real events
      later on, at epoll_wait() (or its own f_op->poll()) time.  This can cause
      spurious wakeups, since the wake_up() itself might be for an event the
      caller is not interested into.
      
      The rate of these spurious wakeup can be pretty high in case of many
      network sockets being monitored.
      
      By allowing devices to report the events the wakeups refer to (at least
      the two major classes - POLLIN/POLLOUT), we are able to spare useless
      wakeups by proper handling inside the epoll's poll callback.
      
      Epoll will have in any case to call f_op->poll() on the file* later on,
      since the change to be done in order to have the full event set sent via
      wakeup, is too invasive for the way our f_op->poll() system works (the
      full event set is calculated inside the poll function - there are too many
      of them to even start thinking the change - also poll/select would need
      change too).
      
      Epoll is changed in a way that both devices which send event hints, and
      the ones that don't, are correctly handled.  The former will gain some
      efficiency though.
      
      As a general rule for devices, would be to add an event mask by using
      key-aware wakeup macros, when making up poll wait queues.  I tested it
      (together with the epoll's poll fix patch Andrew has in -mm) and wakeups
      for the supported devices are correctly filtered.
      
      Test program available here:
      
      http://www.xmailserver.org/epoll_test.c
      
      This patch:
      
      Nothing revolutionary here.  Just using the available "key" that our
      wakeup core already support.  The __wake_up_locked_key() was no brainer,
      since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers
      around __wake_up_common().
      
      The __wake_up_sync() function had a body, so the choice was between
      borrowing the body for __wake_up_sync_key() and calling it from
      __wake_up_sync(), or make an inline and calling it from both.  I chose the
      former since in most archs it all resolves to "mov $0, REG; jmp ADDR".
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ede816a
  5. 06 2月, 2009 1 次提交
    • J
      wait: prevent exclusive waiter starvation · 777c6c5f
      Johannes Weiner 提交于
      With exclusive waiters, every process woken up through the wait queue must
      ensure that the next waiter down the line is woken when it has finished.
      
      Interruptible waiters don't do that when aborting due to a signal.  And if
      an aborting waiter is concurrently woken up through the waitqueue, noone
      will ever wake up the next waiter.
      
      This has been observed with __wait_on_bit_lock() used by
      lock_page_killable(): the first contender on the queue was aborting when
      the actual lock holder woke it up concurrently.  The aborted contender
      didn't acquire the lock and therefor never did an unlock followed by
      waking up the next waiter.
      
      Add abort_exclusive_wait() which removes the process' wait descriptor from
      the waitqueue, iff still queued, or wakes up the next waiter otherwise.
      It does so under the waitqueue lock.  Racing with a wake up means the
      aborting process is either already woken (removed from the queue) and will
      wake up the next waiter, or it will remove itself from the queue and the
      concurrent wake up will apply to the next waiter after it.
      
      Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
      __wait_on_bit_lock() when they were interrupted by other means than a wake
      up through the queue.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Reported-by: NChris Mason <chris.mason@oracle.com>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Mentored-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		["after some testing"]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      777c6c5f
  6. 17 10月, 2008 1 次提交
    • T
      wait: kill is_sync_wait() · a25d644f
      Tejun Heo 提交于
      is_sync_wait() is used to distinguish between sync and async waits.
      Basically sync waits are the ones initialized with init_waitqueue_entry()
      and async ones with init_waitqueue_func_entry().  The sync/async
      distinction is used only in prepare_to_wait[_exclusive]() and its only
      function is to skip setting the current task state if the wait is async.
      This has a few problems.
      
      * No one uses it.  None of func_entry users use prepare_to_wait()
        functions, so the code path never gets executed.
      
      * The distinction is bogus.  Maybe back when func_entry is used only
        by aio but it's now also used by epoll and in future possibly by 9p
        and poll/select.
      
      * Taking @state as argument and ignoring it silenly depending on how
        @wait is initialized is just a bad error-prone API.
      
      * It prevents func_entry waits from using wait->private for no good
        reason.
      
      This patch kills is_sync_wait() and the associated code paths from
      prepare_to_wait[_exclusive]().  As there was no user of these code paths,
      this patch doesn't cause any behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a25d644f
  7. 14 2月, 2008 1 次提交
  8. 06 2月, 2008 1 次提交
    • P
      lockdep: annotate epoll · 0ccf831c
      Peter Zijlstra 提交于
      On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote:
      
      > I remember I talked with Arjan about this time ago. Basically, since 1)
      > you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
      > are used, you can see a wake_up() from inside another wake_up(), but they
      > will never refer to the same lock instance.
      > Think about:
      >
      > 	dfd = socket(...);
      > 	efd1 = epoll_create();
      > 	efd2 = epoll_create();
      > 	epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
      > 	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
      >
      > When a packet arrives to the device underneath "dfd", the net code will
      > issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
      > callback wakeup entry on that queue, and the wake_up() performed by the
      > "dfd" net code will end up in ep_poll_callback(). At this point epoll
      > (efd1) notices that it may have some event ready, so it needs to wake up
      > the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
      > that ends up in another wake_up(), after having checked about the
      > recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
      > avoid stack blasting. Never hit the same queue, to avoid loops like:
      >
      > 	epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
      > 	epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
      > 	epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
      > 	epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
      >
      > The code "if (tncur->wq == wq || ..." prevents re-entering the same
      > queue/lock.
      
      Since the epoll code is very careful to not nest same instance locks
      allow the recursion.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ccf831c
  9. 07 12月, 2007 2 次提交
  10. 10 7月, 2007 1 次提交
  11. 31 10月, 2006 1 次提交
    • P
      [PATCH] lockdep: annotate DECLARE_WAIT_QUEUE_HEAD · 7259f0d0
      Peter Zijlstra 提交于
      kernel: INFO: trying to register non-static key.
      kernel: the code is fine but needs lockdep annotation.
      kernel: turning off the locking correctness validator.
      kernel:  [<c04051ed>] show_trace_log_lvl+0x58/0x16a
      kernel:  [<c04057fa>] show_trace+0xd/0x10
      kernel:  [<c0405913>] dump_stack+0x19/0x1b
      kernel:  [<c043b1e2>] __lock_acquire+0xf0/0x90d
      kernel:  [<c043bf70>] lock_acquire+0x4b/0x6b
      kernel:  [<c061472f>] _spin_lock_irqsave+0x22/0x32
      kernel:  [<c04363d3>] prepare_to_wait+0x17/0x4b
      kernel:  [<f89a24b6>] lpfc_do_work+0xdd/0xcc2 [lpfc]
      kernel:  [<c04361b9>] kthread+0xc3/0xf2
      kernel:  [<c0402005>] kernel_thread_helper+0x5/0xb
      
      Another case of non-static lockdep keys; duplicate the paradigm set by
      DECLARE_COMPLETION_ONSTACK and introduce DECLARE_WAIT_QUEUE_HEAD_ONSTACK.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Markus Lidel <markus.lidel@shadowconnect.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7259f0d0
  12. 11 7月, 2006 1 次提交
  13. 04 7月, 2006 2 次提交
  14. 26 4月, 2006 1 次提交
  15. 07 11月, 2005 1 次提交
  16. 24 6月, 2005 1 次提交
  17. 25 5月, 2005 1 次提交
  18. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4