1. 10 12月, 2014 5 次提交
    • A
      skb_copy_datagram_iovec() can die · d3a9632f
      Al Viro 提交于
      no callers other than itself.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d3a9632f
    • A
      switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives · e5a4b0bb
      Al Viro 提交于
      ... making both non-draining.  That means that tcp_recvmsg() becomes
      non-draining.  And _that_ would break iscsit_do_rx_data() unless we
      	a) make sure tcp_recvmsg() is uniformly non-draining (it is)
      	b) make sure it copes with arbitrary (including shifted)
      iov_iter (it does, all it uses is iov_iter primitives)
      	c) make iscsit_do_rx_data() initialize ->msg_iter only once.
      
      Fortunately, (c) is doable with minimal work and we are rid of one
      the two places where kernel send/recvmsg users would be unhappy with
      non-draining behaviour.
      
      Actually, that makes all but one of ->recvmsg() instances iov_iter-clean.
      The exception is skcipher_recvmsg() and it also isn't hard to convert
      to primitives (iov_iter_get_pages() is needed there).  That'll wait
      a bit - there's some interplay with ->sendmsg() path for that one.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e5a4b0bb
    • A
      put iov_iter into msghdr · c0371da6
      Al Viro 提交于
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c0371da6
    • H
      dst: no need to take reference on DST_NOCACHE dsts · dbfc4fb7
      Hannes Frederic Sowa 提交于
      Since commit f8864972 ("ipv4: fix dst race in sk_dst_get()")
      DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
      reference on them when we are in rcu protected sections.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbfc4fb7
    • E
      net: avoid two atomic operations in fast clones · 6ffe75eb
      Eric Dumazet 提交于
      Commit ce1a4ea3 ("net: avoid one atomic operation in skb_clone()")
      took the wrong way to save one atomic operation.
      
      It is actually possible to avoid two atomic operations, if we
      do not change skb->fclone values, and only rely on clone_ref
      content to signal if the clone is available or not.
      
      skb_clone() can simply use the fast clone if clone_ref is 1.
      
      kfree_skbmem() can avoid the atomic_dec_and_test() if clone_ref is 1.
      
      Note that because we usually free the clone before the original skb,
      this particular attempt is only done for the original skb to have better
      branch prediction.
      
      SKB_FCLONE_FREE is removed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ffe75eb
  2. 09 12月, 2014 1 次提交
    • A
      net: Add functions for handling padding frame and adding to length · 9c0c1124
      Alexander Duyck 提交于
      This patch adds two new helper functions skb_put_padto and eth_skb_pad.
      These functions deviate from the standard skb_pad or skb_padto in that they
      will also update the length and tail pointers so that they reflect the
      padding added to the frame.
      
      The eth_skb_pad helper is meant to be used with Ethernet devices to update
      either Rx or Tx frames so that they report the correct size.  The
      skb_put_padto helper is meant to be used primarily in the transmit path for
      network devices that need frames to be padded up to some minimum size and
      don't wish to simply update the length somewhere external to the frame.
      
      The motivation behind this is that there are a number of implementations
      throughout the network device drivers that are all doing the same thing,
      but each a little bit differently and as a result several implementations
      contain bugs such as updating the length without updating the tail offset
      and other similar issues.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c0c1124
  3. 24 11月, 2014 6 次提交
  4. 22 11月, 2014 2 次提交
  5. 12 11月, 2014 2 次提交
    • A
      net: Remove __skb_alloc_page and __skb_alloc_pages · 160d2aba
      Alexander Duyck 提交于
      Remove the two functions which are now dead code.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      160d2aba
    • A
      net: Add device Rx page allocation function · 71dfda58
      Alexander Duyck 提交于
      This patch implements __dev_alloc_pages and __dev_alloc_page.  These are
      meant to replace the __skb_alloc_pages and __skb_alloc_page functions.  The
      reason for doing this is that it occurred to me that __skb_alloc_page is
      supposed to be passed an sk_buff pointer, but it is NULL in all cases where
      it is used.  Worse is that in the case of ixgbe it is passed NULL via the
      sk_buff pointer in the rx_buffer info structure which means the compiler is
      not correctly stripping it out.
      
      The naming for these functions is based on dev_alloc_skb and __dev_alloc_skb.
      There was originally a netdev_alloc_page, however that was passed a
      net_device pointer and this function is not so I thought it best to follow
      that naming scheme since that is the same difference between dev_alloc_skb
      and netdev_alloc_skb.
      
      In the case of anything greater than order 0 it is assumed that we want a
      compound page so __GFP_COMP is set for all allocations as we expect a
      compound page when assigning a page frag.
      
      The other change in this patch is to exploit the behaviors of the page
      allocator in how it handles flags.  So for example we can always set
      __GFP_COMP and __GFP_MEMALLOC since they are ignored if they are not
      applicable or are overridden by another flag.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71dfda58
  6. 08 11月, 2014 2 次提交
  7. 06 11月, 2014 3 次提交
  8. 04 11月, 2014 1 次提交
  9. 31 10月, 2014 1 次提交
  10. 29 10月, 2014 1 次提交
  11. 15 10月, 2014 1 次提交
  12. 09 10月, 2014 1 次提交
  13. 05 10月, 2014 1 次提交
    • V
      net: Cleanup skb cloning by adding SKB_FCLONE_FREE · c8753d55
      Vijay Subramanian 提交于
      SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
      1: If skb is allocated from head_cache, it indicates fclone is not available.
      2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
      it is available to be used.
      
      To avoid confusion for case 2 above, this patch  replaces
      SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
      companion skbs, this indicates it is free for use.
      
      SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
      cannot / will not have a companion fclone.
      Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8753d55
  14. 02 10月, 2014 2 次提交
    • T
      udp: Generalize skb_udp_segment · 8bce6d7d
      Tom Herbert 提交于
      skb_udp_segment is the function called from udp4_ufo_fragment to
      segment a UDP tunnel packet. This function currently assumes
      segmentation is transparent Ethernet bridging (i.e. VXLAN
      encapsulation). This patch generalizes the function to
      operate on either Ethertype or IP protocol.
      
      The inner_protocol field must be set to the protocol of the inner
      header. This can now be either an Ethertype or an IP protocol
      (in a union). A new flag in the skbuff indicates which type is
      effective. skb_set_inner_protocol and skb_set_inner_ipproto
      helper functions were added to set the inner_protocol. These
      functions are called from the point where the tunnel encapsulation
      is occuring.
      
      When skb_udp_tunnel_segment is called, the function to segment the
      inner packet is selected based on the inner IP or Ethertype. In the
      case of an IP protocol encapsulation, the function is derived from
      inet[6]_offloads. In the case of Ethertype, skb->protocol is
      set to the inner_protocol and skb_mac_gso_segment is called. (GRE
      currently does this, but it might be possible to lookup the protocol
      in offload_base and call the appropriate segmenation function
      directly).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bce6d7d
    • E
      net: cleanup and document skb fclone layout · d0bf4a9e
      Eric Dumazet 提交于
      Lets use a proper structure to clearly document and implement
      skb fast clones.
      
      Then, we might experiment more easily alternative layouts.
      
      This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
      to stop leaking of implementation details.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0bf4a9e
  15. 30 9月, 2014 1 次提交
    • E
      net: reorganize sk_buff for faster __copy_skb_header() · b1937227
      Eric Dumazet 提交于
      With proliferation of bit fields in sk_buff, __copy_skb_header() became
      quite expensive, showing as the most expensive function in a GSO
      workload.
      
      __copy_skb_header() performance is also critical for non GSO TCP
      operations, as it is used from skb_clone()
      
      This patch carefully moves all the fields that were not copied in a
      separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
      
      Then I moved all other fields and all other copied fields in a section
      delimited by headers_start[0]/headers_end[0] section so that we
      can use a single memcpy() call, inlined by compiler using long
      word load/stores.
      
      I also tried to make all copies in the natural orders of sk_buff,
      to help hardware prefetching.
      
      I made sure sk_buff size did not change.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1937227
  16. 28 9月, 2014 1 次提交
  17. 27 9月, 2014 2 次提交
    • E
      net: introduce __skb_header_release() · f4a775d1
      Eric Dumazet 提交于
      While profiling TCP stack, I noticed one useless atomic operation
      in tcp_sendmsg(), caused by skb_header_release().
      
      It turns out all current skb_header_release() users have a fresh skb,
      that no other user can see, so we can avoid one atomic operation.
      
      Introduce __skb_header_release() to clearly document this.
      
      This gave me a 1.5 % improvement on TCP_RR workload.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4a775d1
    • P
      netfilter: bridge: move br_netfilter out of the core · 34666d46
      Pablo Neira Ayuso 提交于
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separated module since the beginning, Patrick agreed on that.
      
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also spots a message to provide a clue to people that
      didn't notice that this has been deprecated.
      
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is was bridge netfilter users
      seem to require.
      
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required by when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      34666d46
  18. 20 9月, 2014 1 次提交
  19. 14 9月, 2014 1 次提交
  20. 06 9月, 2014 2 次提交
    • A
      net: Add function for parsing the header length out of linear ethernet frames · 56193d1b
      Alexander Duyck 提交于
      This patch updates some of the flow_dissector api so that it can be used to
      parse the length of ethernet buffers stored in fragments.  Most of the
      changes needed were to __skb_get_poff as it needed to be updated to support
      sending a linear buffer instead of a skb.
      
      I have split __skb_get_poff into two functions, the first is skb_get_poff
      and it retains the functionality of the original __skb_get_poff.  The other
      function is __skb_get_poff which now works much like __skb_flow_dissect in
      relation to skb_flow_dissect in that it provides the same functionality but
      works with just a data buffer and hlen instead of needing an skb.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56193d1b
    • A
      net-timestamp: Make the clone operation stand-alone from phy timestamping · 62bccb8c
      Alexander Duyck 提交于
      The phy timestamping takes a different path than the regular timestamping
      does in that it will create a clone first so that the packets needing to be
      timestamped can be placed in a queue, or the context block could be used.
      
      In order to support these use cases I am pulling the core of the code out
      so it can be used in other drivers beyond just phy devices.
      
      In addition I have added a destructor named sock_efree which is meant to
      provide a simple way for dropping the reference to skb exceptions that
      aren't part of either the receive or send windows for the socket, and I
      have removed some duplication in spots where this destructor could be used
      in place of sock_edemux.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62bccb8c
  21. 02 9月, 2014 2 次提交
    • T
      net: Infrastructure for checksum unnecessary conversions · d96535a1
      Tom Herbert 提交于
      For normal path, added skb_checksum_try_convert which is called
      to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The
      primary condition to allow this is that ip_summed is CHECKSUM_NONE
      and csum_valid is true, which will be the state after consuming
      a CHECKSUM_UNNECESSARY.
      
      For GRO path, added skb_gro_checksum_try_convert which is the GRO
      analogue of skb_checksum_try_convert. The primary condition to allow
      this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and
      NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed
      all available CHECKSUM_UNNECESSARY checksums in the GRO path.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d96535a1
    • T
      net: Support for csum_bad in skbuff · 5a212329
      Tom Herbert 提交于
      This flag indicates that an invalid checksum was detected in the
      packet. __skb_mark_checksum_bad helper function was added to set this.
      
      Checksums can be marked bad from a driver or the GRO path (the latter
      is implemented in this patch). csum_bad is checked in
      __skb_checksum_validate_complete (i.e. calling that when ip_summed ==
      CHECKSUM_NONE).
      
      csum_bad works in conjunction with ip_summed value. In the case that
      ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the
      first (or next) checksum encountered in the packet is bad. When
      ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last
      one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY,
      csum_level == 1, and csum_bad is set-- then the third checksum in the
      packet is bad. In the normal path, the packet will be dropped when
      processing the protocol layer of the bad checksum:
      __skb_decr_checksum_unnecessary called twice for the good checksums
      changing ip_summed to CHECKSUM_NONE so that
      __skb_checksum_validate_complete is called to validate the third
      checksum and that will fail since csum_bad is set.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a212329
  22. 30 8月, 2014 1 次提交
    • T
      net: Clarification of CHECKSUM_UNNECESSARY · 77cffe23
      Tom Herbert 提交于
      This patch:
       - Clarifies the specific requirements of devices returning
         CHECKSUM_UNNECESSARY (comments in skbuff.h).
       - Adds csum_level field to skbuff. This is used to express how
         many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
         This replaces the overloading of skb->encapsulation, that field is
         is now only used to indicate inner headers are valid.
       - Change __skb_checksum_validate_needed to "consume" each checksum
         as indicated by csum_level as layers of the the packet are parsed.
       - Remove skb_pop_rcv_encapsulation, no longer needed in the new
         csum_level model.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77cffe23