1. 06 6月, 2012 1 次提交
  2. 01 6月, 2012 1 次提交
  3. 30 5月, 2012 1 次提交
  4. 20 5月, 2012 1 次提交
    • E
      net: introduce skb_try_coalesce() · bad43ca8
      Eric Dumazet 提交于
      Move tcp_try_coalesce() protocol independent part to
      skb_try_coalesce().
      
      skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly,
      to build optimized skbs (less sk_buff, and possibly less 'headers')
      
      skb_try_coalesce() is zero copy, unless the copy can fit in destination
      header (its a rare case)
      
      kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
      because IPv6 will need it in patch (ipv6: use skb coalescing in
      reassembly).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bad43ca8
  5. 19 5月, 2012 3 次提交
  6. 18 5月, 2012 2 次提交
    • N
      drop_monitor: convert to modular building · cad456d5
      Neil Horman 提交于
      When I first wrote drop monitor I wrote it to just build monolithically.  There
      is no reason it can't be built modularly as well, so lets give it that
      flexibiity.
      
      I've tested this by building it as both a module and monolithically, and it
      seems to work quite well
      
      Change notes:
      
      v2)
      * fixed for_each_present_cpu loops to be more correct as per Eric D.
      * Converted exit path failures to BUG_ON as per Ben H.
      
      v3)
      * Converted del_timer to del_timer_sync to close race noted by Ben H.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Reviewed-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cad456d5
    • E
      net: netdev_alloc_skb() use build_skb() · a1c7fff7
      Eric Dumazet 提交于
      netdev_alloc_skb() is used by networks driver in their RX path to
      allocate an skb to receive an incoming frame.
      
      With recent skb->head_frag infrastructure, it makes sense to change
      netdev_alloc_skb() to use build_skb() and a frag allocator.
      
      This permits a zero copy splice(socket->pipe), and better GRO or TCP
      coalescing.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1c7fff7
  7. 17 5月, 2012 3 次提交
  8. 16 5月, 2012 2 次提交
  9. 11 5月, 2012 2 次提交
    • E
      pktgen: fix crash at module unload · c57b5468
      Eric Dumazet 提交于
      commit 7d3d43da (net: In unregister_netdevice_notifier unregister
      the netdevices.) makes pktgen crashing at module unload.
      
      [  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
      [  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
      [  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
      [  296.821079] Call Trace:
      [  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
      [  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
      [  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
      [  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
      [  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
      [  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
      [  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
      [  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
      [  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
      [  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b
      
      Hold the pktgen_thread_lock while splicing pktgen_threads, and test
      pktgen_exiting in pktgen_device_event() to make unload faster.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c57b5468
    • D
      Revert "net: maintain namespace isolation between vlan and real device" · 59b9997b
      David S. Miller 提交于
      This reverts commit 8a83a00b.
      
      It causes regressions for S390 devices, because it does an
      unconditional DST drop on SKBs for vlans and the QETH device
      needs the neighbour entry hung off the DST for certain things
      on transmit.
      
      Arnd can't remember exactly why he even needed this change.
      
      Conflicts:
      
      	drivers/net/macvlan.c
      	net/8021q/vlan_dev.c
      	net/core/dev.c
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59b9997b
  10. 10 5月, 2012 2 次提交
  11. 09 5月, 2012 1 次提交
  12. 07 5月, 2012 3 次提交
  13. 04 5月, 2012 2 次提交
  14. 03 5月, 2012 4 次提交
  15. 01 5月, 2012 4 次提交
    • E
      net: add a prefetch in socket backlog processing · e4cbb02a
      Eric Dumazet 提交于
      TCP or UDP stacks have big enough latencies that prefetching next
      pointer is worth it.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4cbb02a
    • E
      net: makes skb_splice_bits() aware of skb->head_frag · 1d0c0b32
      Eric Dumazet 提交于
      __skb_splice_bits() can check if skb to be spliced has its skb->head
      mapped to a page fragment, instead of a kmalloc() area.
      
      If so we can avoid a copy of the skb head and get a reference on
      underlying page.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d0c0b32
    • E
      net: make GRO aware of skb->head_frag · d7e8883c
      Eric Dumazet 提交于
      GRO can check if skb to be merged has its skb->head mapped to a page
      fragment, instead of a kmalloc() area.
      
      We 'upgrade' skb->head as a fragment in itself
      
      This avoids the frag_list fallback, and permits to build true GRO skb
      (one sk_buff and up to 16 fragments), using less memory.
      
      This reduces number of cache misses when user makes its copy, since a
      single sk_buff is fetched.
      
      This is a followup of patch "net: allow skb->head to be a page fragment"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e8883c
    • E
      net: allow skb->head to be a page fragment · d3836f21
      Eric Dumazet 提交于
      skb->head is currently allocated from kmalloc(). This is convenient but
      has the drawback the data cannot be converted to a page fragment if
      needed.
      
      We have three spots were it hurts :
      
      1) GRO aggregation
      
       When a linear skb must be appended to another skb, GRO uses the
      frag_list fallback, very inefficient since we keep all struct sk_buff
      around. So drivers enabling GRO but delivering linear skbs to network
      stack aren't enabling full GRO power.
      
      2) splice(socket -> pipe).
      
       We must copy the linear part to a page fragment.
       This kind of defeats splice() purpose (zero copy claim)
      
      3) TCP coalescing.
      
       Recently introduced, this permits to group several contiguous segments
      into a single skb. This shortens queue lengths and save kernel memory,
      and greatly reduce probabilities of TCP collapses. This coalescing
      doesnt work on linear skbs (or we would need to copy data, this would be
      too slow)
      
      Given all these issues, the following patch introduces the possibility
      of having skb->head be a fragment in itself. We use a new skb flag,
      skb->head_frag to carry this information.
      
      build_skb() is changed to accept a frag_size argument. Drivers willing
      to provide a page fragment instead of kmalloc() data will set a non zero
      value, set to the fragment size.
      
      Then, on situations we need to convert the skb head to a frag in itself,
      we can check if skb->head_frag is set and avoid the copies or various
      fallbacks we have.
      
      This means drivers currently using frags could be updated to avoid the
      current skb->head allocation and reduce their memory footprint (aka skb
      truesize). (thats 512 or 1024 bytes saved per skb). This also makes
      bpf/netfilter faster since the 'first frag' will be part of skb linear
      part, no need to copy data.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3836f21
  16. 29 4月, 2012 1 次提交
  17. 28 4月, 2012 2 次提交
    • N
      drop_monitor: Make updating data->skb smp safe · 3885ca78
      Neil Horman 提交于
      Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
      its smp protections.  Specifically, its possible to replace data->skb while its
      being written.  This patch corrects that by making data->skb an rcu protected
      variable.  That will prevent it from being overwritten while a tracepoint is
      modifying it.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3885ca78
    • N
      drop_monitor: fix sleeping in invalid context warning · cde2e9a6
      Neil Horman 提交于
      Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
      
      [   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
      [   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
      [   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
      [   38.352582] Call Trace:
      [   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
      [   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
      [   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
      [   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
      [   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
      [   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
      [   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
      [   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
      [   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
      [   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
      [   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
      [   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
      [   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
      [   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
      [   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
      [   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
      [   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
      [   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
      [   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
      [   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
      [   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
      [   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
      [   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
      
      It stems from holding a spinlock (trace_state_lock) while attempting to register
      or unregister tracepoint hooks, making in_atomic() true in this context, leading
      to the warning when the tracepoint calls might_sleep() while its taking a mutex.
      Since we only use the trace_state_lock to prevent trace protocol state races, as
      well as hardware stat list updates on an rcu write side, we can just convert the
      spinlock to a mutex to avoid this problem.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cde2e9a6
  18. 27 4月, 2012 1 次提交
  19. 26 4月, 2012 1 次提交
  20. 24 4月, 2012 3 次提交