1. 11 Jun 2013, 1 commit
  2. 30 Apr 2013, 2 commits
  3. 03 Apr 2013, 1 commit
  4. 01 Apr 2013, 1 commit
    • net: add option to enable error queue packets waking select · 7d4c04fc
      Keller, Jacob E authored
      Currently, when a socket receives something on the error queue, it only wakes up
      the socket on select if the socket is in the "read" list, that is, the socket has
      something to read. It is also useful to wake the socket if it is in the error
      list, which would enable software to wait on error queue packets without waking
      up for regular data on the socket. The main use case is receiving
      timestamped transmit packets, which return the timestamp to the socket via the
      error queue. This enables an application to select on the socket for the error
      queue only, instead of for the regular traffic (see the usage sketch below).
      
      -v2-
      * Added the SO_SELECT_ERR_QUEUE socket option to every architecture-specific file
      * Modified every socket poll function that checks error queue
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d4c04fc
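      A minimal userspace sketch of the intended usage, under stated assumptions: the SO_SELECT_ERR_QUEUE value is taken from asm-generic/socket.h (45 on most architectures; arch-specific files may differ), and the socket is assumed to be already configured for transmit timestamping. With the option set, an error-queue packet also marks the descriptor exceptional (POLLPRI), so select()'s exceptfds set wakes without any regular traffic.

        #include <sys/select.h>
        #include <sys/socket.h>

        #ifndef SO_SELECT_ERR_QUEUE
        #define SO_SELECT_ERR_QUEUE 45  /* asm-generic value; assumption, check your arch */
        #endif

        int wait_for_errqueue(int fd)
        {
                int on = 1;
                fd_set efds;

                /* Ask the kernel to flag error-queue packets as POLLPRI too */
                if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE, &on, sizeof(on)) < 0)
                        return -1;

                FD_ZERO(&efds);
                FD_SET(fd, &efds);
                /* Blocks until something lands on the error queue,
                 * e.g. a transmit timestamp; regular data does not wake us. */
                if (select(fd + 1, NULL, NULL, &efds, NULL) < 0)
                        return -1;
                /* The timestamp itself would then be read with
                 * recvmsg(fd, ..., MSG_ERRQUEUE). */
                return 0;
        }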
  5. 13 Feb 2013, 1 commit
    • net: fix infinite loop in __skb_recv_datagram() · 77c1090f
      Eric Dumazet authored
      Tommi was fuzzing with trinity and reported the following problem:
      
      commit 3f518bf7 (datagram: Add offset argument to __skb_recv_datagram)
      missed that a raw socket receive queue can contain skbs with no payload.
      
      We can loop in __skb_recv_datagram() with MSG_PEEK mode, because
      wait_for_packet() is not prepared to skip these skbs (a simplified
      sketch of the looping walk follows this entry).
      
      [   83.541011] INFO: rcu_sched detected stalls on CPUs/tasks: {}
      (detected by 0, t=26002 jiffies, g=27673, c=27672, q=75)
      [   83.541011] INFO: Stall ended before state dump start
      [  108.067010] BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child31:2847]
      ...
      [  108.067010] Call Trace:
      [  108.067010]  [<ffffffff818cc103>] __skb_recv_datagram+0x1a3/0x3b0
      [  108.067010]  [<ffffffff818cc33d>] skb_recv_datagram+0x2d/0x30
      [  108.067010]  [<ffffffff819ed43d>] rawv6_recvmsg+0xad/0x240
      [  108.067010]  [<ffffffff818c4b04>] sock_common_recvmsg+0x34/0x50
      [  108.067010]  [<ffffffff818bc8ec>] sock_recvmsg+0xbc/0xf0
      [  108.067010]  [<ffffffff818bf31e>] sys_recvfrom+0xde/0x150
      [  108.067010]  [<ffffffff81ca4329>] system_call_fastpath+0x16/0x1b
      Reported-by: Tommi Rantala <tt.rantala@gmail.com>
      Tested-by: Tommi Rantala <tt.rantala@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77c1090f
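      Why the loop never terminates, in a simplified sketch (illustrative only, not the exact kernel source): the MSG_PEEK walk skips already-peeked skbs by decrementing an offset, but a zero-payload skb never advances it.

        /* Simplified pre-fix shape of the MSG_PEEK queue walk */
        skb_queue_walk(&sk->sk_receive_queue, skb) {
                if (flags & MSG_PEEK) {
                        if (*off >= skb->len) {    /* 0 >= 0 for an empty skb */
                                *off -= skb->len;  /* offset stays put...      */
                                continue;          /* ...so the skb is skipped */
                        }
                }
                return skb;    /* found a candidate */
        }
        /* No candidate found, yet the queue is not empty:
         * wait_for_packet() sees queued skbs, returns immediately,
         * and the walk repeats forever. */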
  6. 28 Jun 2012, 1 commit
  7. 16 Apr 2012, 1 commit
  8. 29 Mar 2012, 1 commit
  9. 22 Feb 2012, 2 commits
  10. 19 Oct 2011, 1 commit
  11. 25 Aug 2011, 1 commit
  12. 07 Dec 2010, 1 commit
  13. 07 Sep 2010, 2 commits
  14. 13 Jul 2010, 1 commit
  15. 27 May 2010, 1 commit
    • net: fix lock_sock_bh/unlock_sock_bh · 8a74ad60
      Eric Dumazet authored
      This new sock lock primitive was introduced to speed up some user-context
      socket manipulation. But it is unsafe for protecting two threads, one using
      regular lock_sock()/release_sock(), the other using lock_sock_bh()/unlock_sock_bh().
      
      This patch changes lock_sock_bh to be careful about the 'owned' state.
      If owned is found to be set, we must take the slow path.
      lock_sock_bh() now returns a boolean to say whether the slow path was taken,
      and this boolean is used at unlock_sock_bh time to call the appropriate
      unlock function.
      
      After this change, BHs are either disabled or enabled during the
      lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
      so we rename these functions to lock_sock_fast()/unlock_sock_fast()
      (the resulting calling convention is sketched below).
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a74ad60
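      A sketch of the calling pattern this commit establishes (using the renamed primitives; the caller context is hypothetical):

        /* Fast path: socket spinlock held with BHs disabled.
         * Slow path: the socket was owned by a user-context thread,
         * so we fell back to the full lock_sock(). */
        bool slow = lock_sock_fast(sk);

        /* ...short, non-sleeping socket manipulation... */

        unlock_sock_fast(sk, slow);  /* spin_unlock_bh() or release_sock() */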
  16. 04 May 2010, 1 commit
  17. 29 Apr 2010, 1 commit
    • net: speedup udp receive path · 4b0b72f7
      Eric Dumazet authored
      Since commit 95766fff ([UDP]: Add memory accounting.),
      each received packet needs one extra lock_sock()/release_sock() pair.
      
      This added latency because of possible backlog handling. Then later,
      ticket spinlocks added yet another latency source in case of DDoS.
      
      This patch introduces the lock_sock_bh() and unlock_sock_bh()
      synchronization primitives, avoiding one atomic operation and backlog
      processing.
      
      skb_free_datagram_locked() uses them instead of the full-blown
      lock_sock()/release_sock(). The skb is orphaned inside the locked section for
      proper socket memory reclaim, and finally freed outside of it (a sketch
      of the resulting function follows this entry).
      
      The UDP receive path now takes the socket spinlock only once.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b0b72f7
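      The shape of skb_free_datagram_locked() after this change, sketched from the description above (simplified; the exact body in net/core/datagram.c may differ slightly, and these primitives are later renamed to lock_sock_fast()/unlock_sock_fast() by the fix above):

        void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
        {
                lock_sock_bh(sk);    /* spinlock + BH disable, no backlog */
                skb_orphan(skb);     /* uncharge receive memory under the lock */
                sk_mem_reclaim_partial(sk);
                unlock_sock_bh(sk);

                /* skb is now orphaned: the actual free can happen
                 * outside of the locked section */
                __kfree_skb(skb);
        }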
  18. 21 Apr 2010, 1 commit
  19. 30 Mar 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h, making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of the conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to place the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, and for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them, as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on the arch to make things
         build (like ipr on powerpc/64, which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that they could be applied as
         a separate patch and serve as a bisection point.
      
      Given the fact that I had only a couple of failures from the tests in step
      7, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
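      The end state for a typical slab user, sketched (hypothetical file; the point is that the needed headers are included explicitly rather than inherited via percpu.h):

        #include <linux/gfp.h>   /* GFP_KERNEL and friends */
        #include <linux/slab.h>  /* kmalloc()/kfree() */

        static void *buf_alloc(size_t len)
        {
                /* Compiles regardless of whether percpu.h still drags in slab.h */
                return kmalloc(len, GFP_KERNEL);
        }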
  20. 31 Oct 2009, 1 commit
  21. 19 Oct 2009, 1 commit
  22. 14 Aug 2009, 1 commit
    • net: skb ftracer - add tracepoint to skb_copy_datagram_iovec (v3) · e9b3cc1b
      Neil Horman authored
      skb allocation / consumption tracer - Add consumption tracepoint
      
      This patch adds a tracepoint to skb_copy_datagram_iovec, which is called each
      time a userspace process copies a frame from a socket receive queue to a user
      space buffer.  It allows us to hook in and examine each sk_buff that the system
      receives on a per-socket basis, and can be used to compile a list of which skbs
      were received by which processes (a sketch of the tracepoint follows this entry).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/trace/events/skb.h |   20 ++++++++++++++++++++
       net/core/datagram.c        |    3 +++
       2 files changed, 23 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9b3cc1b
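      How such a tracepoint is typically wired up, sketched (simplified; the actual TRACE_EVENT in include/trace/events/skb.h may record different fields):

        /* include/trace/events/skb.h (simplified sketch) */
        TRACE_EVENT(skb_copy_datagram_iovec,

                TP_PROTO(const struct sk_buff *skb, int len),

                TP_ARGS(skb, len),

                TP_STRUCT__entry(
                        __field(const void *, skbaddr)
                        __field(int, len)
                ),

                TP_fast_assign(
                        __entry->skbaddr = skb;
                        __entry->len = len;
                ),

                TP_printk("skbaddr=%p len=%d", __entry->skbaddr, __entry->len)
        );

        /* net/core/datagram.c: fire it as the copy to user space begins */
        trace_skb_copy_datagram_iovec(skb, len);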
  23. 10 Jul 2009, 1 commit
    • net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa authored
      Adding a memory barrier after the poll_wait function, paired with the
      receive callbacks. Adding the functions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier (both are sketched after this entry).
      
      Without the memory barrier, the following race can happen.
      The race fires when the following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there were no caches the code would work ok, since the wait_queue and
      rcv_nxt accesses are opposite to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2's side.  CPU1 will then
      end up calling schedule and sleeping forever if there is no more data on the
      socket.
      
      Calls to poll_wait in the following modules were omitted:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a57de0b4
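      The two wrappers, sketched from the description (close in shape to the commit's helpers in include/net/sock.h; the comments are editorial):

        /* Receive side: pairs with the barrier in sock_poll_wait() */
        static inline int sk_has_sleeper(struct sock *sk)
        {
                /* Order the write that queued new data before reading
                 * the wait queue state, so a concurrent poller is seen. */
                smp_mb();
                return sk->sk_sleep && waitqueue_active(sk->sk_sleep);
        }

        /* Poll side: pairs with the barrier in sk_has_sleeper() */
        static inline void sock_poll_wait(struct file *filp,
                                          wait_queue_head_t *wait_address,
                                          poll_table *p)
        {
                if (p && wait_address) {
                        poll_wait(filp, wait_address, p);
                        /* Order __add_wait_queue() before the subsequent
                         * read of socket state (e.g. tp->rcv_nxt). */
                        smp_mb();
                }
        }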
  24. 09 Jun 2009, 1 commit
  25. 08 Jun 2009, 1 commit
  26. 09 May 2009, 1 commit
  27. 28 Apr 2009, 1 commit
    • net: Avoid extra wakeups of threads blocked in wait_for_packet() · bf368e4e
      Eric Dumazet authored
      In 2.6.25 we added UDP memory accounting.
      
      This unfortunately added a penalty when a frame is transmitted, since
      at TX completion time we have to call sock_wfree() to perform the necessary
      memory accounting. This calls sock_def_write_space() and ultimately the
      scheduler if any thread is waiting on the socket.
      Threads waiting for an incoming frame were scheduled, then had to sleep
      again as the event was meaningless.
      
      (All threads waiting on a socket use the same sk_sleep anchor.)
      
      This adds a lot of extra wakeups and increases latencies, as noted
      by Christoph Lameter, and slows down the softirq handler.
      
      Reference: http://marc.info/?l=linux-netdev&m=124060437012283&w=2
      
      Fortunately, Davide Libenzi recently added the concept of keyed wakeups
      to the kernel, and particularly for sockets (see commit 37e5540b,
      "epoll keyed wakeups: make sockets use keyed wakeups").
      
      Davide's goal was to optimize epoll, but this new wakeup infrastructure
      can help non-epoll users as well, if they care to set up an appropriate
      handler.
      
      This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it
      in wait_for_packet(), so that only a relevant event can wake up a thread
      blocked in this function (see the sketch after this entry).
      
      The trace of function calls from the bnx2 TX completion bnx2_poll_work() is:
      __kfree_skb()
       skb_release_head_state()
        sock_wfree()
         sock_def_write_space()
          __wake_up_sync_key()
           __wake_up_common()
            receiver_wake_function() : Stops here since thread is waiting for an INPUT
      Reported-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf368e4e
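      The shape of the change, sketched (close to the commit's receiver_wake_function() in net/core/datagram.c; the wakeup key carries the poll event bits):

        static int receiver_wake_function(wait_queue_t *wait, unsigned mode,
                                          int sync, void *key)
        {
                unsigned long bits = (unsigned long)key;

                /* Avoid a wakeup if the event is not interesting to a reader */
                if (bits && !(bits & (POLLIN | POLLERR)))
                        return 0;
                return autoremove_wake_function(wait, mode, sync, key);
        }

        /* wait_for_packet() then waits with the filtering callback
         * instead of the default one: */
        DEFINE_WAIT_FUNC(wait, receiver_wake_function);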
  28. 21 Apr 2009, 2 commits
  29. 14 Mar 2009, 1 commit
  30. 05 Nov 2008, 1 commit
    • net: sk_free_datagram() should use sk_mem_reclaim_partial() · 270acefa
      Eric Dumazet authored
      I noticed contention on udp_memory_allocated with regular UDP applications.
      
      While tcp_memory_allocated is seldom used, it appears each incoming UDP frame
      currently touches udp_memory_allocated when queued, and again when received by
      the application.
      
      One possible solution is to use sk_mem_reclaim_partial() instead of
      sk_mem_reclaim(), so that we keep a small reserve (less than one page)
      of memory for each UDP socket (the helper is sketched after this entry).
      
      We did something very similar on the TCP side in commit 9993e7d3
      ([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer()).
      
      A more complex solution would need to convert prot->memory_allocated to
      use a percpu_counter with batches of 64 or 128 pages.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      270acefa
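      For reference, a sketch of the partial-reclaim helper the commit switches to (shape as in include/net/sock.h of that era; SK_MEM_QUANTUM is one page):

        static inline void sk_mem_reclaim_partial(struct sock *sk)
        {
                if (!sk_has_account(sk))
                        return;
                /* Reclaim only whole quanta, keeping up to one page of
                 * forward-allocated memory so the next datagram does not
                 * touch the global counter again. */
                if (sk->sk_forward_alloc > SK_MEM_QUANTUM)
                        __sk_mem_reclaim(sk);
        }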
  31. 14 Oct 2008, 1 commit
  32. 16 Aug 2008, 1 commit
  33. 26 Jul 2008, 1 commit
  34. 29 Jan 2008, 3 commits
    • [NET] CORE: Introducing new memory accounting interface. · 3ab224be
      Hideo Aoki authored
      This patch introduces new memory accounting functions for each network
      protocol. Most of them are renamed from the memory accounting functions
      for stream protocols. At the same time, some stream memory accounting
      functions are removed since other functions do the same thing.
      
      Renaming:
      	sk_stream_free_skb()		->	sk_wmem_free_skb()
      	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
      	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
      	sk_stream_mem_schedule()	->	__sk_mem_schedule()
      	sk_stream_pages()		->	sk_mem_pages()
      	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
      	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
      	sk_charge_skb()			->	sk_mem_charge()
      
      Removing:
      	sk_stream_rfree():	consolidates into sock_rfree()
      	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
      	sk_stream_mem_schedule()
      
      The following functions are added:
      	sk_has_account(): checks whether the protocol supports accounting
      	sk_mem_uncharge(): does the opposite of sk_mem_charge()
      
      In addition, to achieve consolidation, updating sk_wmem_queued is
      removed from sk_mem_charge().
      
      Next, to consolidate the memory accounting functions, this patch adds
      memory accounting calls to the network core functions. Moreover, the
      existing memory accounting calls are renamed to the new accounting calls
      (the core pair is sketched after this entry).
      
      Finally, we replace the present memory accounting calls with the new
      interface in TCP and SCTP.
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ab224be
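      A sketch of the new charge/uncharge pair (shape as described above; note that updating sk_wmem_queued is no longer done in sk_mem_charge()):

        static inline void sk_mem_charge(struct sock *sk, int size)
        {
                if (!sk_has_account(sk))  /* protocol without accounting: no-op */
                        return;
                sk->sk_forward_alloc -= size;
        }

        static inline void sk_mem_uncharge(struct sock *sk, int size)
        {
                if (!sk_has_account(sk))
                        return;
                sk->sk_forward_alloc += size;
        }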
    • [UDP]: Only increment counter on first peek/recv · a59322be
      Herbert Xu authored
      The previous move of the UDP inDatagrams counter caused each
      peek of the same packet to be counted separately.  This may be
      undesirable.
      
      This patch fixes this by adding a bit to sk_buff to record whether
      this packet has already been seen through skb_recv_datagram.  We
      then only increment the counter when the packet is seen for the
      first time.
      
      The only dodgy part is the fact that skb_recv_datagram doesn't have
      a good way of returning this new bit of information.  So I've added
      a new function __skb_recv_datagram that does return this, and made
      skb_recv_datagram a wrapper around it (see the sketch after this entry).
      
      The plan is to eventually replace all uses of skb_recv_datagram with
      this new function, at which time it can be renamed to its proper name.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a59322be
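      The wrapper arrangement, sketched (close in shape to the commit's net/core/datagram.c of that era; the UDP caller fragment is illustrative):

        /* Legacy wrapper: keeps the old signature, discards 'peeked' */
        struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags,
                                          int noblock, int *err)
        {
                int peeked;

                return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
                                           &peeked, err);
        }

        /* UDP receive path: bump InDatagrams only on first sight */
        skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
                                  &peeked, &err);
        if (skb && !peeked)
                UDP_INC_STATS_USER(UDP_MIB_INDATAGRAMS, is_udplite);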
    • [UDP]: Avoid repeated counting of checksum errors due to peeking · 27ab2568
      Herbert Xu authored
      Currently it is possible for two processes to peek on the same socket
      and end up incrementing the error counter twice for the same packet.
      
      This patch fixes it by making skb_kill_datagram return whether it
      succeeded in unlinking the packet, and only incrementing the counter
      if it did (see the sketch after this entry).
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27ab2568
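      A sketch of the changed helper and its motivating caller (simplified; udp_recvmsg()'s checksum-error path is the user described above):

        /* Returns 0 only if this caller unlinked (and freed) the skb;
         * -ENOENT means another peeking process got there first. */
        int skb_kill_datagram(struct sock *sk, struct sk_buff *skb,
                              unsigned int flags)
        {
                int err = 0;

                if (flags & MSG_PEEK) {
                        err = -ENOENT;
                        spin_lock_bh(&sk->sk_receive_queue.lock);
                        if (skb == skb_peek(&sk->sk_receive_queue)) {
                                __skb_unlink(skb, &sk->sk_receive_queue);
                                atomic_dec(&skb->users);
                                err = 0;
                        }
                        spin_unlock_bh(&sk->sk_receive_queue.lock);
                }

                kfree_skb(skb);
                return err;
        }

        /* Checksum-error path (sketch): count the error exactly once */
        if (!skb_kill_datagram(sk, skb, flags))
                UDP_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite);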