1. 25 Aug, 2011 1 commit
  2. 21 Aug, 2011 1 commit
  3. 18 Aug, 2011 1 commit
  4. 02 Aug, 2011 1 commit
  5. 22 Jul, 2011 1 commit
  6. 09 Jul, 2011 1 commit
  7. 07 Jul, 2011 1 commit
    • skbuff: skb supports zero-copy buffers · a6686f2f
      Committed by Shirley Ma
      This patch adds userspace buffer support to the skb shared info. A new
      struct skb_ubuf_info maintains the userspace buffer argument and index;
      a callback notifies userspace to release the buffers once the lower
      device has finished DMA (that is, the last reference to that skb is
      gone).
      
      If any userspace application references these userspace buffers, the
      buffers are copied into the kernel. This prevents userspace
      applications from holding these buffers for too long.
      
      Use destructor_arg to point to the userspace buffer info; a new tx flag,
      SKBTX_DEV_ZEROCOPY, is added for the zero-copy buffer check.
      Signed-off-by: Shirley Ma <xma@...ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
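The release notification described in this commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions: `ubuf_info_model`, `skb_model`, `MODEL_TX_DEV_ZEROCOPY` and the helper functions are hypothetical names invented for illustration, not the kernel's definitions. It shows a descriptor hung off `destructor_arg` whose callback fires once the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TX_DEV_ZEROCOPY 0x1u   /* stand-in for SKBTX_DEV_ZEROCOPY */

/* Hypothetical model of the userspace buffer info described above. */
struct ubuf_info_model {
    void (*callback)(void *arg);  /* tells the owner the buffer can be reused */
    void *arg;                    /* opaque argument handed back to the owner */
    unsigned long desc;           /* index of the buffer in the owner's ring */
};

/* Hypothetical, heavily reduced stand-in for an skb. */
struct skb_model {
    int users;                    /* outstanding references */
    unsigned int tx_flags;
    void *destructor_arg;         /* points at a ubuf_info_model if zero-copy */
};

/* Mark the buffer as zero-copy and remember who owns the user pages. */
static void skb_model_attach_ubuf(struct skb_model *skb,
                                  struct ubuf_info_model *uarg)
{
    skb->destructor_arg = uarg;
    skb->tx_flags |= MODEL_TX_DEV_ZEROCOPY;
}

/* Drop one reference; notify the buffer owner when the last one goes away. */
static void skb_model_put(struct skb_model *skb)
{
    assert(skb->users > 0);
    if (--skb->users > 0)
        return;
    if (skb->tx_flags & MODEL_TX_DEV_ZEROCOPY) {
        struct ubuf_info_model *uarg = skb->destructor_arg;

        if (uarg != NULL && uarg->callback != NULL)
            uarg->callback(uarg->arg);
    }
}

/* Example callback: counts how many buffers were released. */
static void count_release(void *arg)
{
    ++*(int *)arg;
}
```

In this model the owner passes `count_release` and a counter as the callback and argument; the counter only changes after the final `skb_model_put()`, which mirrors the "last reference is gone" condition in the message.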
  8. 21 May, 2011 1 commit
    • sanitize <linux/prefetch.h> usage · 268bb0ce
      Committed by Linus Torvalds
      Commit e66eed65 ("list: remove prefetching from regular list
      iterators") removed the include of prefetch.h from list.h, which
      uncovered several cases that had apparently relied on that rather
      obscure header file dependency.
      
      So this fixes things up a bit, using
      
         grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
         grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')
      
      to guide us in finding files that either need <linux/prefetch.h>
      inclusion, or have it despite not needing it.
      
      There are more of them around (mostly network drivers), but this gets
      many core ones.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 18 May, 2011 1 commit
    • net: add skb_dst_force() in sock_queue_err_skb() · abb57ea4
      Committed by Eric Dumazet
      Commit 7fee226a (add a noref bit on skb dst) forgot to use
      skb_dst_force() on packets queued in sk_error_queue.
      
      This triggers the following warning for applications that use
      IP_CMSG_PKTINFO and receive an error status:
      
      
      ------------[ cut here ]------------
      WARNING: at include/linux/skbuff.h:457 ip_cmsg_recv_pktinfo+0xa6/0xb0()
      Hardware name: 2669UYD
      Modules linked in: isofs vboxnetadp vboxnetflt nfsd ebtable_nat ebtables
      lib80211_crypt_ccmp uinput xcbc hdaps tp_smapi thinkpad_ec radeonfb fb_ddc
      radeon ttm drm_kms_helper drm ipw2200 intel_agp intel_gtt libipw i2c_algo_bit
      i2c_i801 agpgart rng_core cfbfillrect cfbcopyarea cfbimgblt video raid10 raid1
      raid0 linear md_mod vboxdrv
      Pid: 4697, comm: miredo Not tainted 2.6.39-rc6-00569-g5895198c-dirty #22
      Call Trace:
       [<c17746b6>] ? printk+0x1d/0x1f
       [<c1058302>] warn_slowpath_common+0x72/0xa0
       [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
       [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
       [<c1058350>] warn_slowpath_null+0x20/0x30
       [<c15bbca6>] ip_cmsg_recv_pktinfo+0xa6/0xb0
       [<c15bbdd7>] ip_cmsg_recv+0x127/0x260
       [<c154f82d>] ? skb_dequeue+0x4d/0x70
       [<c1555523>] ? skb_copy_datagram_iovec+0x53/0x300
       [<c178e834>] ? sub_preempt_count+0x24/0x50
       [<c15bdd2d>] ip_recv_error+0x23d/0x270
       [<c15de554>] udp_recvmsg+0x264/0x2b0
       [<c15ea659>] inet_recvmsg+0xd9/0x130
       [<c1547752>] sock_recvmsg+0xf2/0x120
       [<c11179cb>] ? might_fault+0x4b/0xa0
       [<c15546bc>] ? verify_iovec+0x4c/0xc0
       [<c1547660>] ? sock_recvmsg_nosec+0x100/0x100
       [<c1548294>] __sys_recvmsg+0x114/0x1e0
       [<c1093895>] ? __lock_acquire+0x365/0x780
       [<c1148b66>] ? fget_light+0xa6/0x3e0
       [<c1148b7f>] ? fget_light+0xbf/0x3e0
       [<c1148aee>] ? fget_light+0x2e/0x3e0
       [<c1549f29>] sys_recvmsg+0x39/0x60
      
      Close bug https://bugzilla.kernel.org/show_bug.cgi?id=34622
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 31 Mar, 2011 1 commit
  11. 17 Mar, 2011 1 commit
  12. 02 Mar, 2011 1 commit
  13. 28 Jan, 2011 1 commit
  14. 25 Jan, 2011 2 commits
    • net: change netdev->features to u32 · 04ed3e74
      Committed by Michał Mirosław
      Quoting Ben Hutchings: we presumably won't be defining features that
      can only be enabled on 64-bit architectures.
      
      Occurrences found by `grep -r` on net/, drivers/net, include/
      
      [ Move features and vlan_features next to each other in
        struct netdev, as per Eric Dumazet's suggestion -DaveM ]
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • GRO: fix merging a paged skb after non-paged skbs · d1dc7abf
      Committed by Michal Schmidt
      Suppose that several linear skbs of the same flow were received by GRO. They
      were thus merged into one skb with a frag_list. Then a new skb of the same flow
      arrives, but it is a paged skb with data starting in its frags[].
      
      Before adding the skb to the frag_list skb_gro_receive() will of course adjust
      the skb to throw away the headers. It correctly modifies the page_offset and
      size of the frag, but it leaves incorrect information in the skb:
       ->data_len is not decreased at all.
       ->len is decreased only by headlen, as if no change were done to the frag.
      Later in a receiving process this causes skb_copy_datagram_iovec() to return
      -EFAULT and this is seen in userspace as the result of the recv() syscall.
      
      In practice the bug can be reproduced with the sfc driver. By default the
      driver uses an adaptive scheme when it switches between using
      napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
      reproduced when under rx load with enough successful GRO merging the driver
      decides to switch from the former to the latter.
      
      Manual control is also possible, so reproducing this is easy with netcat:
       - on machine1 (with sfc): nc -l 12345 > /dev/null
       - on machine2: nc machine1 12345 < /dev/zero
       - on machine1:
         echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
         echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
       - See that nc has quit suddenly.
      
      [v2: Modified by Eric Dumazet to avoid advancing skb->data past the end
           and to use a temporary variable.]
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
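The length accounting at issue in the GRO fix can be illustrated with a reduced model. This is a sketch, not the kernel code: `pskb_model`, `frag_model`, `model_headlen()` and `model_strip_headers()` are hypothetical names, and only one fragment is modelled. The point it shows is that header bytes taken out of the fragment must be subtracted from both `len` and `data_len`, so that `len - data_len` keeps describing the linear part.

```c
#include <assert.h>

/* Hypothetical single-fragment model of a paged buffer. */
struct frag_model {
    unsigned int page_offset;  /* where the fragment's bytes start in the page */
    unsigned int size;         /* bytes held in the fragment */
};

struct pskb_model {
    unsigned int len;          /* total bytes: linear part + fragment */
    unsigned int data_len;     /* bytes held in the fragment only */
    unsigned int data_off;     /* how far the linear data pointer has advanced */
    struct frag_model frag0;
};

/* Bytes available in the linear part. */
static unsigned int model_headlen(const struct pskb_model *s)
{
    return s->len - s->data_len;
}

/* Throw away `offset` header bytes, taking the excess from the fragment. */
static void model_strip_headers(struct pskb_model *s, unsigned int offset)
{
    unsigned int headlen = model_headlen(s);

    if (offset > headlen) {
        unsigned int eat = offset - headlen;   /* temporary, as in the v2 note */

        assert(eat <= s->frag0.size);
        s->frag0.page_offset += eat;
        s->frag0.size -= eat;
        s->data_len -= eat;    /* the step the buggy code skipped */
        s->len -= eat;
        offset = headlen;      /* never advance past the end of the linear part */
    }
    s->data_off += offset;
    s->len -= offset;
}
```

With a 14-byte linear part and a 1000-byte fragment, stripping 54 header bytes leaves `len == data_len == 960` in this model, so a later copy sees a consistent buffer.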
  15. 13 Jan, 2011 1 commit
  16. 17 Dec, 2010 1 commit
  17. 16 Dec, 2010 1 commit
  18. 04 Dec, 2010 1 commit
  19. 17 Oct, 2010 1 commit
    • net: allocate skbs on local node · 564824b0
      Committed by Eric Dumazet
      Commit b30973f8 (node-aware skb allocation) spread the bad habit of
      allocating net driver skbs on a given memory node: the one closest to
      the NIC hardware. This is wrong because as soon as we try to scale the
      network stack, we need many cpus to handle traffic, and we hit
      slub/slab management on cross-node allocations/frees when these cpus
      have to alloc/free skbs bound to a central node.
      
      skbs allocated in the RX path are ephemeral; they have a very short
      lifetime, so the extra cost of maintaining NUMA affinity is too
      expensive. What appeared to be a nice idea four years ago is in fact a
      bad one.
      
      In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
      and two 10Gb NICs might deliver more than 28 million packets per second,
      needing all the available cpus.
      
      The cost of cross-node handling in the network and vm stacks outweighs
      the small benefit the hardware had when doing its DMA transfer into its
      'local' memory node at RX time. Even trying to differentiate the two
      allocations done for one skb (the sk_buff on the local node, the data
      part on the NIC hardware node) is not enough to bring good performance.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 09 Sep, 2010 1 commit
  21. 07 Sep, 2010 2 commits
    • skb: Add tracepoints to freeing skb · 07dc22e7
      Committed by Koki Sanagi
      This patch adds a tracepoint to consume_skb and adds trace_kfree_skb
      before __kfree_skb in skb_free_datagram_locked and net_tx_action.
      Combined with the tracepoint on dev_hard_start_xmit, we can check
      how long it takes to free transmitted packets. Using that, we can
      calculate how many packets the driver held at that time. This is useful
      when dropped transmitted packets are a problem.
      
                  sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
      Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4C724364.50903@jp.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • net: pskb_expand_head() optimization · 1fd63041
      Committed by Eric Dumazet
      pskb_expand_head() blindly takes references on fragments before calling
      skb_release_data(), potentially releasing these references.
      
      We can add a fast path, avoiding these atomic operations, if we own the
      last reference on skb->head.
      
      Based on a previous patch from David
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
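The fast path described in this message can be sketched with a toy reference-counting model. Everything here is an assumption made for illustration: `page_model`, `shinfo_model`, `model_expand_head()` and the `model_atomic_ops` counter are hypothetical and plain integers stand in for atomics. The idea shown is that a sole owner can move the fragment references to the new head instead of taking extra references that would be dropped again straight away.

```c
#include <assert.h>

#define MODEL_MAX_FRAGS 4

/* Hypothetical page with a plain integer standing in for an atomic count. */
struct page_model {
    int refcnt;
};

/* Hypothetical shared-info area attached to a buffer head. */
struct shinfo_model {
    int dataref;                                /* users of this data area */
    int nr_frags;
    struct page_model *frags[MODEL_MAX_FRAGS];
};

static int model_atomic_ops;   /* counts simulated atomic refcount updates */

/* Move the fragment list from `old` to a freshly allocated `fresh` area. */
static void model_expand_head(struct shinfo_model *old,
                              struct shinfo_model *fresh)
{
    int i;

    assert(old->nr_frags <= MODEL_MAX_FRAGS);
    fresh->dataref = 1;
    fresh->nr_frags = old->nr_frags;
    for (i = 0; i < old->nr_frags; i++)
        fresh->frags[i] = old->frags[i];

    if (old->dataref == 1) {
        /* Fast path: sole owner, so the page references simply move. */
        old->dataref = 0;
        old->nr_frags = 0;
        return;
    }

    /* Slow path: the old area stays alive for its other users, so the
     * new area needs its own reference on every page. */
    for (i = 0; i < old->nr_frags; i++) {
        fresh->frags[i]->refcnt++;
        model_atomic_ops++;
    }
    old->dataref--;
    model_atomic_ops++;
}
```

In the sole-owner case `model_atomic_ops` stays at zero, which is the saving the commit is after; the shared case still pays one update per page plus one for the data reference.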
  22. 04 Sep, 2010 1 commit
  23. 02 Sep, 2010 2 commits
    • gro: fix different skb headrooms · 3d3be433
      Committed by Eric Dumazet
      Packets entering GRO might have different headrooms, even for a given
      flow (because of implementation details in drivers, like copybreak).
      We can't force drivers to deliver packets with a fixed headroom.
      
      1) fix skb_segment()
      
      skb_segment() wrongly assumes that the headrooms of fragments are the
      same as that of the head. When CHECKSUM_PARTIAL is used, this can give
      csum_start errors, and crash later in skb_copy_and_csum_dev().
      
      2) allocate a minimal skb for head of frag_list
      
      skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
      allocate a fresh skb. This adds NET_SKB_PAD to a padding already
      provided by netdevice, depending on various things, like copybreak.
      
      Use alloc_skb() to allocate an exact padding, to reduce cache line
      needs:
      NET_SKB_PAD + NET_IP_ALIGN
      
      bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
      
      Many thanks to Plamen Petrov, testing many debugging patches !
      With help of Jarek Poplawski.
      Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: skbuff.c cleanup · 6602cebb
      Committed by Eric Dumazet
      (skb->data - skb->head) can be replaced by skb_headroom(skb).
      
      Remove some uses of NET_SKBUFF_DATA_USES_OFFSET, using
      (skb_end_pointer(skb) - skb->head) or
      (skb_tail_pointer(skb) - skb->head) instead: the compiler does the
      right thing, and this is more readable for us ;)
      
      Add (struct skb_shared_info *) casts in pskb_expand_head() to help
      memcpy() use aligned moves.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
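The cleanup relies on accessor helpers hiding whether the tail and end positions are stored as pointers or as offsets from the head. The following is a small sketch of that idea under stated assumptions: `buf_model`, `MODEL_DATA_USES_OFFSET` and the `buf_*` helpers are hypothetical stand-ins, not the kernel's sk_buff definitions. Callers write the same pointer arithmetic for either representation.

```c
#include <assert.h>
#include <stddef.h>

/* Define MODEL_DATA_USES_OFFSET to store tail/end as offsets from head. */
#ifdef MODEL_DATA_USES_OFFSET
typedef unsigned int model_data_t;
#else
typedef unsigned char *model_data_t;
#endif

struct buf_model {
    unsigned char *head;   /* start of the allocation */
    unsigned char *data;   /* start of the payload */
    model_data_t tail;     /* end of the payload */
    model_data_t end;      /* end of the allocation */
};

static unsigned char *buf_tail_pointer(const struct buf_model *b)
{
#ifdef MODEL_DATA_USES_OFFSET
    return b->head + b->tail;
#else
    return b->tail;
#endif
}

static unsigned char *buf_end_pointer(const struct buf_model *b)
{
#ifdef MODEL_DATA_USES_OFFSET
    return b->head + b->end;
#else
    return b->end;
#endif
}

/* Bytes available in front of the payload. */
static size_t buf_headroom(const struct buf_model *b)
{
    return (size_t)(b->data - b->head);
}

/* Lay out `len` payload bytes after `headroom` bytes inside `mem`. */
static void buf_init(struct buf_model *b, unsigned char *mem, size_t size,
                     size_t headroom, size_t len)
{
    assert(headroom + len <= size);
    b->head = mem;
    b->data = mem + headroom;
#ifdef MODEL_DATA_USES_OFFSET
    b->tail = (model_data_t)(headroom + len);
    b->end = (model_data_t)size;
#else
    b->tail = b->data + len;
    b->end = mem + size;
#endif
}
```

Code that needs the used or total size can then write `buf_tail_pointer(b) - b->head` or `buf_end_pointer(b) - b->head` without its own `#ifdef`, which is the readability gain the message mentions.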
  24. 23 Aug, 2010 1 commit
  25. 19 Aug, 2010 1 commit
  26. 25 Jul, 2010 1 commit
  27. 23 Jul, 2010 2 commits
  28. 13 Jul, 2010 1 commit
  29. 14 Jun, 2010 2 commits
  30. 01 Jun, 2010 1 commit
  31. 29 May, 2010 2 commits
  32. 22 May, 2010 1 commit
  33. 21 May, 2010 1 commit
  34. 18 May, 2010 1 commit
    • net: add a noref bit on skb dst · 7fee226a
      Committed by Eric Dumazet
      Use the low order bit of skb->_skb_dst to indicate that the dst is not
      refcounted.
      
      Rename _skb_dst to _skb_refdst to make sure all uses are caught.
      
      skb_dst() returns the dst, regardless of whether the noref bit is set,
      but with a lockdep check to make sure a noref dst is not given if the
      current user is not RCU protected.
      
      New skb_dst_set_noref() helper to set a non-refcounted dst on a skb
      (with lockdep check).
      
      skb_dst_drop() drops a reference only if the skb dst was refcounted.
      
      skb_dst_force() helper is used to force a refcount on the dst when the
      skb is queued and no longer RCU protected.
      
      Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
      !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
      sock_queue_rcv_skb(), in __nf_queue().
      
      Use skb_dst_force() in dev_requeue_skb().
      
      Note: dst_use_noref() still dirties the dst; we might transform it
      later to do one dirtying per jiffy.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
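The noref scheme in this message is a form of pointer tagging, and a userspace sketch can make the helper semantics concrete. The names below (`dst_model`, `skb_dst_model`, `m_dst*`) are hypothetical, the lockdep and RCU checks are left out, and a plain integer replaces the atomic refcount. The sketch only shows how the low bit of the stored pointer decides whether drop and force touch the reference count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DST_NOREF   ((uintptr_t)1)        /* low bit: not refcounted */
#define MODEL_DST_PTRMASK (~MODEL_DST_NOREF)

struct dst_model {
    int refcnt;
};

struct skb_dst_model {
    uintptr_t refdst;      /* pointer to a dst_model, low bit used as a flag */
};

/* Return the dst regardless of whether the noref bit is set. */
static struct dst_model *m_dst(const struct skb_dst_model *s)
{
    return (struct dst_model *)(s->refdst & MODEL_DST_PTRMASK);
}

static int m_dst_is_noref(const struct skb_dst_model *s)
{
    return (s->refdst & MODEL_DST_NOREF) != 0;
}

/* Store a dst whose reference the caller hands over to the skb. */
static void m_dst_set(struct skb_dst_model *s, struct dst_model *d)
{
    assert(((uintptr_t)d & MODEL_DST_NOREF) == 0);
    s->refdst = (uintptr_t)d;
}

/* Store a dst without taking a reference (caller must keep it alive). */
static void m_dst_set_noref(struct skb_dst_model *s, struct dst_model *d)
{
    assert(((uintptr_t)d & MODEL_DST_NOREF) == 0);
    s->refdst = (uintptr_t)d | MODEL_DST_NOREF;
}

/* Take a real reference before the skb outlives the protected section. */
static void m_dst_force(struct skb_dst_model *s)
{
    if (m_dst_is_noref(s)) {
        m_dst(s)->refcnt++;
        s->refdst &= MODEL_DST_PTRMASK;
    }
}

/* Drop the dst; only a refcounted one gives its reference back. */
static void m_dst_drop(struct skb_dst_model *s)
{
    if (s->refdst != 0) {
        if (!m_dst_is_noref(s))
            m_dst(s)->refcnt--;
        s->refdst = 0;
    }
}
```

A buffer that is about to be queued would call `m_dst_force()` first in this model, so that the later `m_dst_drop()` releases a reference the buffer really holds.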