1. 01 7月, 2017 2 次提交
  2. 21 6月, 2017 1 次提交
    • Y
      net: introduce __skb_put_[zero, data, u8] · de77b966
      yuan linyu 提交于
      follow Johannes Berg, semantic patch file as below,
      @@
      identifier p, p2;
      expression len;
      expression skb;
      type t, t2;
      @@
      (
      -p = __skb_put(skb, len);
      +p = __skb_put_zero(skb, len);
      |
      -p = (t)__skb_put(skb, len);
      +p = __skb_put_zero(skb, len);
      )
      ... when != p
      (
      p2 = (t2)p;
      -memset(p2, 0, len);
      |
      -memset(p, 0, len);
      )
      
      @@
      identifier p;
      expression len;
      expression skb;
      type t;
      @@
      (
      -t p = __skb_put(skb, len);
      +t p = __skb_put_zero(skb, len);
      )
      ... when != p
      (
      -memset(p, 0, len);
      )
      
      @@
      type t, t2;
      identifier p, p2;
      expression skb;
      @@
      t *p;
      ...
      (
      -p = __skb_put(skb, sizeof(t));
      +p = __skb_put_zero(skb, sizeof(t));
      |
      -p = (t *)__skb_put(skb, sizeof(t));
      +p = __skb_put_zero(skb, sizeof(t));
      )
      ... when != p
      (
      p2 = (t2)p;
      -memset(p2, 0, sizeof(*p));
      |
      -memset(p, 0, sizeof(*p));
      )
      
      @@
      expression skb, len;
      @@
      -memset(__skb_put(skb, len), 0, len);
      +__skb_put_zero(skb, len);
      
      @@
      expression skb, len, data;
      @@
      -memcpy(__skb_put(skb, len), data, len);
      +__skb_put_data(skb, data, len);
      
      @@
      expression SKB, C, S;
      typedef u8;
      identifier fn = {__skb_put};
      fresh identifier fn2 = fn ## "_u8";
      @@
      - *(u8 *)fn(SKB, S) = C;
      + fn2(SKB, C);
      Signed-off-by: Nyuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de77b966
  3. 16 6月, 2017 6 次提交
    • J
      networking: add and use skb_put_u8() · 634fef61
      Johannes Berg 提交于
      Joe and Bjørn suggested that it'd be nicer to not have the
      cast in the fairly common case of doing
      	*(u8 *)skb_put(skb, 1) = c;
      
      Add skb_put_u8() for this case, and use it across the code,
      using the following spatch:
      
          @@
          expression SKB, C, S;
          typedef u8;
          identifier fn = {skb_put};
          fresh identifier fn2 = fn ## "_u8";
          @@
          - *(u8 *)fn(SKB, S) = C;
          + fn2(SKB, C);
      
      Note that due to the "S", the spatch isn't perfect, it should
      have checked that S is 1, but there's also places that use a
      sizeof expression like sizeof(var) or sizeof(u8) etc. Turns
      out that nobody ever did something like
      	*(u8 *)skb_put(skb, 2) = c;
      
      which would be wrong anyway since the second byte wouldn't be
      initialized.
      Suggested-by: NJoe Perches <joe@perches.com>
      Suggested-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634fef61
    • J
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58ff351
    • J
      networking: make skb_pull & friends return void pointers · af72868b
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af72868b
    • J
      networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4df864c1
    • J
      networking: introduce and use skb_put_data() · 59ae1d12
      Johannes Berg 提交于
      A common pattern with skb_put() is to just want to memcpy()
      some data into the new space, introduce skb_put_data() for
      this.
      
      An spatch similar to the one for skb_put_zero() converts many
      of the places using it:
      
          @@
          identifier p, p2;
          expression len, skb, data;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, len);
          |
          -memcpy(p, data, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb, data;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, sizeof(*p));
          |
          -memcpy(p, data, sizeof(*p));
          )
      
          @@
          expression skb, len, data;
          @@
          -memcpy(skb_put(skb, len), data, len);
          +skb_put_data(skb, data, len);
      
      (again, manually post-processed to retain some comments)
      Reviewed-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59ae1d12
    • J
      skbuff: make skb_put_zero() return void · 83ad357d
      Johannes Berg 提交于
      It's nicer to return void, since then there's no need to
      cast to any structures. Currently none of the users have
      a cast, but a number of future conversions do.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83ad357d
  4. 12 6月, 2017 2 次提交
  5. 05 6月, 2017 1 次提交
    • J
      skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow · 48a1df65
      Jason A. Donenfeld 提交于
      This is a defense-in-depth measure in response to bugs like
      4d6fa57b ("macsec: avoid heap overflow in skb_to_sgvec"). There's
      not only a potential overflow of sglist items, but also a stack overflow
      potential, so we fix this by limiting the amount of recursion this function
      is allowed to do. Not actually providing a bounded base case is a future
      disaster that we can easily avoid here.
      
      As a small matter of house keeping, we take this opportunity to move the
      documentation comment over the actual function the documentation is for.
      
      While this could be implemented by using an explicit stack of skbuffs,
      when implementing this, the function complexity increased considerably,
      and I don't think such complexity and bloat is actually worth it. So,
      instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
      and measured the stack usage there. I also reverted the recent MIPS
      changes that give it a separate IRQ stack, so that I could experience
      some worst-case situations. I found that limiting it to 24 layers deep
      yielded a good stack usage with room for safety, as well as being much
      deeper than any driver actually ever creates.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48a1df65
  6. 30 5月, 2017 1 次提交
  7. 22 5月, 2017 2 次提交
  8. 20 5月, 2017 6 次提交
  9. 18 5月, 2017 1 次提交
  10. 17 5月, 2017 1 次提交
  11. 22 4月, 2017 1 次提交
  12. 14 4月, 2017 1 次提交
  13. 09 4月, 2017 1 次提交
    • S
      skbuff: Extend gso_type to unsigned int. · 7f564528
      Steffen Klassert 提交于
      All available gso_type flags are currently in use, so
      extend gso_type from 'unsigned short' to 'unsigned int'
      to be able to add further flags.
      
      We reorder the struct skb_shared_info to use
      two bytes of the four byte hole before dataref.
      All fields before dataref are cleared, i.e.
      four bytes more than before the change.
      
      The remaining two byte hole is moved to the
      beginning of the structure, this protects us
      from immediate overwites on out of bound writes
      to the sk_buff head.
      
      Structure layout on x86-64 before the change:
      
      struct skb_shared_info {
      	unsigned char              nr_frags;             /*     0     1 */
      	__u8                       tx_flags;             /*     1     1 */
      	short unsigned int         gso_size;             /*     2     2 */
      	short unsigned int         gso_segs;             /*     4     2 */
      	short unsigned int         gso_type;             /*     6     2 */
      	struct sk_buff *           frag_list;            /*     8     8 */
      	struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
      	u32                        tskey;                /*    24     4 */
      	__be32                     ip6_frag_id;          /*    28     4 */
      	atomic_t                   dataref;              /*    32     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	void *                     destructor_arg;       /*    40     8 */
      	skb_frag_t                 frags[17];            /*    48   272 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      
      	/* size: 320, cachelines: 5, members: 12 */
      	/* sum members: 316, holes: 1, sum holes: 4 */
      };
      
      Structure layout on x86-64 after the change:
      
      struct skb_shared_info {
      	short unsigned int         _unused;              /*     0     2 */
      	unsigned char              nr_frags;             /*     2     1 */
      	__u8                       tx_flags;             /*     3     1 */
      	short unsigned int         gso_size;             /*     4     2 */
      	short unsigned int         gso_segs;             /*     6     2 */
      	struct sk_buff *           frag_list;            /*     8     8 */
      	struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
      	unsigned int               gso_type;             /*    24     4 */
      	u32                        tskey;                /*    28     4 */
      	__be32                     ip6_frag_id;          /*    32     4 */
      	atomic_t                   dataref;              /*    36     4 */
      	void *                     destructor_arg;       /*    40     8 */
      	skb_frag_t                 frags[17];            /*    48   272 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      
      	/* size: 320, cachelines: 5, members: 13 */
      };
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f564528
  14. 02 3月, 2017 1 次提交
  15. 11 2月, 2017 1 次提交
  16. 08 2月, 2017 1 次提交
  17. 02 2月, 2017 2 次提交
  18. 11 1月, 2017 1 次提交
  19. 09 1月, 2017 4 次提交
    • W
      net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Willem de Bruijn 提交于
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc31c905
    • W
      net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Willem de Bruijn 提交于
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dc07fdb
    • W
      net-tc: convert tc_verd to integer bitfields · a5135bcf
      Willem de Bruijn 提交于
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5135bcf
    • W
      net-tc: extract skip classify bit from tc_verd · e7246e12
      Willem de Bruijn 提交于
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7246e12
  20. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  21. 10 12月, 2016 1 次提交
  22. 09 12月, 2016 1 次提交
    • E
      udp: under rx pressure, try to condense skbs · c8c8b127
      Eric Dumazet 提交于
      Under UDP flood, many softirq producers try to add packets to
      UDP receive queue, and one user thread is burning one cpu trying
      to dequeue packets as fast as possible.
      
      Two parts of the per packet cost are :
      - copying payload from kernel space to user space,
      - freeing memory pieces associated with skb.
      
      If socket is under pressure, softirq handler(s) can try to pull in
      skb->head the payload of the packet if it fits.
      
      Meaning the softirq handler(s) can free/reuse the page fragment
      immediately, instead of letting udp_recvmsg() do this hundreds of usec
      later, possibly from another node.
      
      Additional gains :
      - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
      - We avoid cache line misses at copyout() time and consume_skb() time,
      and avoid one put_page() with potential alien freeing on NUMA hosts.
      
      This comes at the cost of a copy, bounded to available tail room, which
      is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
      than necessary)
      
      This patch gave me about 5 % increase in throughput in my tests.
      
      skb_condense() helper could probably used in other contexts.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8c8b127
  23. 06 12月, 2016 1 次提交