1. 10 Jan 2020 (2 commits)
  2. 25 Dec 2019 (2 commits)
  3. 05 Dec 2019 (1 commit)
    • net: Fixed updating of ethertype in skb_mpls_push() · d04ac224
      Martin Varghese committed
      skb_mpls_push() was not updating the ethertype of an Ethernet packet if
      the packet was originally received from a non-ARPHRD_ETHER device.
      
      In the OVS datapath flow below, since the device corresponding to
      port 7 is an L3 device (ARPHRD_NONE), skb_mpls_push() does not
      update the ethertype of the packet even though the preceding
      push_eth action had added an Ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
      
      Fixes: 8822e270 ("net: core: move push MPLS functionality from OvS to core helper")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d04ac224
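      A minimal user-space sketch of the push logic follows (names such as
      struct pkt and pkt_mpls_push are illustrative; this is not the kernel
      API). The point of the fix is that the ethertype rewrite keys off whether
      the packet currently carries an Ethernet header, not off the ARPHRD type
      of the ingress device:

       #include <stdint.h>
       #include <string.h>
       #include <arpa/inet.h>

       #define ETH_HLEN      14
       #define ETH_P_MPLS_UC 0x8847

       struct pkt {
           uint8_t data[2048];
           size_t  len;
           int     has_eth_header;   /* set by a prior push_eth action */
       };

       /* Build a label stack entry: label(20) | tc(3) | bos(1) | ttl(8). */
       static uint32_t mpls_lse(uint32_t label, uint8_t tc, uint8_t bos, uint8_t ttl)
       {
           return htonl((label << 12) | ((uint32_t)tc << 9) |
                        ((uint32_t)bos << 8) | ttl);
       }

       static int pkt_mpls_push(struct pkt *p, uint32_t lse)
       {
           size_t hdr = p->has_eth_header ? ETH_HLEN : 0;

           if (p->len + 4 > sizeof(p->data) || p->len < hdr)
               return -1;

           /* Make room for the 4-byte LSE right after the L2 header, if any. */
           memmove(p->data + hdr + 4, p->data + hdr, p->len - hdr);
           memcpy(p->data + hdr, &lse, 4);
           p->len += 4;

           /* The fix: key the rewrite off the presence of an Ethernet header,
            * not off the ingress device type (ARPHRD_ETHER vs ARPHRD_NONE). */
           if (p->has_eth_header) {
               uint16_t proto = htons(ETH_P_MPLS_UC);
               memcpy(p->data + 12, &proto, 2);   /* ethertype field */
           }
           return 0;
       }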
  4. 03 Dec 2019 (1 commit)
    • Fixed updating of ethertype in function skb_mpls_pop · 040b5cfb
      Martin Varghese committed
      skb_mpls_pop() was not updating the ethertype of an Ethernet packet if the
      packet was originally received from a non-ARPHRD_ETHER device.
      
      In the OVS datapath flow below, since the device corresponding to port 7
      is an L3 device (ARPHRD_NONE), skb_mpls_pop() does not update
      the ethertype of the packet even though the preceding push_eth action had
      added an Ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x8847),
      mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      pop_mpls(eth_type=0x800),4
      
      Fixes: ed246cee ("net: core: move pop MPLS functionality from OvS to core helper")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      040b5cfb
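      The companion pop sketch, under the same assumptions as the push sketch in
      the entry above (illustrative names, not the kernel API): strip one label
      stack entry and restore the ethertype supplied by the caller, as in the
      pop_mpls(eth_type=0x800) action, again gated on whether an Ethernet header
      is present rather than on the ingress device type.

       #include <stdint.h>
       #include <string.h>
       #include <arpa/inet.h>

       #define ETH_HLEN 14

       struct pkt {
           uint8_t data[2048];
           size_t  len;
           int     has_eth_header;
       };

       static int pkt_mpls_pop(struct pkt *p, uint16_t next_proto /* e.g. 0x0800 */)
       {
           size_t hdr = p->has_eth_header ? ETH_HLEN : 0;

           if (p->len < hdr + 4)
               return -1;

           /* Remove the 4-byte label stack entry that follows the L2 header. */
           memmove(p->data + hdr, p->data + hdr + 4, p->len - hdr - 4);
           p->len -= 4;

           /* Update the ethertype whenever an Ethernet header is present, even
            * if the packet entered on an L3 (ARPHRD_NONE) device. */
           if (p->has_eth_header) {
               uint16_t proto = htons(next_proto);
               memcpy(p->data + 12, &proto, 2);
           }
           return 0;
       }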
  5. 16 Oct 2019 (2 commits)
    • net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions · fa4e0f88
      Davide Caratti committed
      The following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress protocol ip matchall \
       > action mpls push protocol mpls_uc label 0x355aa bos 1
      
      causes corruption of all IP packets transmitted by eth0. On TC egress, we
      can't rely on the value of skb->mac_len, because it's 0 and an MPLS 'push'
      operation will result in an overwrite of the first 4 octets of the packet's
      L2 header (e.g. the Destination Address if eth0 is an Ethernet device); the
      same error pattern is also present in the MPLS 'pop' operation. Fix this in
      the act_mpls data plane by computing 'mac_len' as the difference between the
      network header and the mac header (when not at TC ingress), and using it in
      the MPLS 'push'/'pop' core functions.
      
      v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
          in skb_mpls_pop(), reported by kbuild test robot
      
      CC: Lorenzo Bianconi <lorenzo@kernel.org>
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa4e0f88
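      A small sketch of the mac_len computation described above, using plain
      offsets instead of the kernel's skb accessors (all names here are
      illustrative):

       #include <stddef.h>
       #include <stdbool.h>

       struct pkt_offsets {
           size_t mac_offset;      /* start of the L2 header */
           size_t network_offset;  /* start of the L3 header */
           size_t mac_len;         /* trusted only at TC ingress */
       };

       static size_t effective_mac_len(const struct pkt_offsets *p, bool at_ingress)
       {
           if (at_ingress)
               return p->mac_len;
           /* Egress: derive the L2 header length from the gap between the
            * two headers, so an MPLS 'push'/'pop' no longer overwrites the
            * first 4 octets of the L2 header. */
           return p->network_offset - p->mac_offset;
       }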
    • net: avoid errors when trying to pop MPLS header on non-MPLS packets · dedc5a08
      Davide Caratti committed
      The following script:
      
       # tc qdisc add dev eth0 clsact
       # tc filter add dev eth0 egress matchall action mpls pop
      
      implicitly makes the kernel drop all packets transmitted by eth0 if they
      don't have an MPLS header. This behavior is uncommon: other encapsulations
      (like VLAN) just let the packet pass unmodified. Since the result of an MPLS
      'pop' operation would be the same regardless of the presence or absence of
      MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
      when dealing with non-MPLS packets.
      
      For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
      already ensures that the MPLS 'pop' operation only occurs on packets having
      an MPLS Ethernet type (and there are no other callers in the current code,
      so the semantic change should be OK).
      
      v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
          Horman
      
      Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dedc5a08
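      A sketch of the "pass non-MPLS packets through" behaviour described above
      (illustrative names, not the kernel signatures): a pop on a packet whose
      ethertype is not MPLS now succeeds without touching the packet, mirroring
      how a VLAN pop treats untagged traffic.

       #include <stdint.h>

       #define ETH_P_MPLS_UC 0x8847
       #define ETH_P_MPLS_MC 0x8848

       static int eth_proto_is_mpls(uint16_t proto)
       {
           return proto == ETH_P_MPLS_UC || proto == ETH_P_MPLS_MC;
       }

       /* 'skb_proto' stands in for the packet's current ethertype. */
       static int mpls_pop_lenient(uint16_t skb_proto)
       {
           if (!eth_proto_is_mpls(skb_proto))
               return 0;   /* non-MPLS packet: leave it alone, report success */

           /* ... otherwise strip one label stack entry and update the
            * ethertype, as sketched under the 03 Dec 2019 entry above ... */
           return 0;
       }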
  6. 14 Oct 2019 (1 commit)
  7. 02 Oct 2019 (1 commit)
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal committed
      Commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made NAPI
      recycle always drop skb extensions. The additional skb_ext_del() that is
      performed via nf_reset() on NAPI skb recycle is no longer needed.
      
      Most nf_reset() calls in the stack are there so that a queued skb won't
      block 'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del() call from nf_reset() and renames the helper
      to the more fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
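      A conceptual model of the split, assuming nothing beyond what the message
      above states (illustrative types; not the kernel implementation): bridge
      netfilter state now lives in the extension area and is dropped by the
      generic extension helpers, so the renamed helper only releases conntrack
      state.

       struct pkt_nf_state {
           void        *conntrack;  /* refcounted conntrack entry            */
           unsigned int ext_bits;   /* extension bitmask (bridge nf is here) */
       };

       static void nf_reset_ct_model(struct pkt_nf_state *s)
       {
           /* Only the conntrack state is dropped (a real implementation would
            * also put the reference count). */
           s->conntrack = NULL;
       }

       static void skb_ext_reset_model(struct pkt_nf_state *s)
       {
           /* Used in the few selected places mentioned above to make sure
            * no active extensions remain. */
           s->ext_bits = 0;
       }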
  8. 28 Sep 2019 (1 commit)
    • sk_buff: drop all skb extensions on free and skb scrubbing · 174e2381
      Florian Westphal committed
      Now that we have a 3rd extension, add a new helper that drops the
      extension space and use it when we need to scrub an sk_buff.
      
      At this time, scrubbing clears secpath and bridge netfilter data but
      retains the tc skb extension; after this patch all three get cleared.
      
      NAPI reuse/free assumes we can only have a secpath attached to skb, but
      it seems better to clear all extensions there as well.
      
      v2: add unlikely hint (Eric Dumazet)
      
      Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      174e2381
  9. 07 Sep 2019 (1 commit)
    • net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list · 3dcbdb13
      Shmulik Ladkani committed
      Historically, support for frag_list packets entering skb_segment() was
      limited to frag_list members terminating on exact same gso_size
      boundaries. This is verified with a BUG_ON since commit 89319d38
      ("net: Add frag_list support to skb_segment"), quote:
      
          As such we require all frag_list members terminate on exact MSS
          boundaries.  This is checked using BUG_ON.
          As there should only be one producer in the kernel of such packets,
          namely GRO, this requirement should not be difficult to maintain.
      
      However, since commit 6578171a ("bpf: add bpf_skb_change_proto helper"),
      the "exact MSS boundaries" assumption no longer holds:
      An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
      leaves the frag_list members as originally merged by GRO with the
      original 'gso_size'. Examples of such programs are bpf-based NAT46 or
      NAT64.
      
      This led to a kernel BUG_ON for flows involving:
       - GRO generating a frag_list skb
       - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
       - skb_segment() of the skb
      
      See example BUG_ON reports in [0].
      
      In commit 13acc94e ("net: permit skb_segment on head_frag frag_list skb"),
      skb_segment() was modified to support the "gso_size mangling" case of
      a frag_list GRO'ed skb, but *only* for frag_list members having
      head_frag==true (having a page-fragment head).
      
      Alas, GRO packets having frag_list members with a linear kmalloced head
      (head_frag==false) still hit the BUG_ON.
      
      This commit adds support to skb_segment() for a 'head_skb' packet having
      a frag_list whose members are *non*-head_frag, with gso_size mangled, by
      disabling SG and thus falling back to copying the data from the given
      'head_skb' into the generated segmented skbs - as suggested by Willem de
      Bruijn [1].
      
      Since this approach involves the penalty of skb_copy_and_csum_bits()
      when building the segments, care was taken in order to enable this
      solution only when required:
       - untrusted gso_size, by testing SKB_GSO_DODGY is set
         (SKB_GSO_DODGY is set by any gso_size mangling functions in
          net/core/filter.c)
       - the frag_list is non-empty, its item is a non-head_frag, *and* the
         headlen of the given 'head_skb' does not match the gso_size.
      
      [0]
      https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
      https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
      
      [1]
      https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
      
      Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
      Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dcbdb13
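      A boolean sketch of the fallback condition listed above (illustrative field
      names, not struct skb_shared_info): segments are built by copying only when
      the gso_size is untrusted and the frag_list head is a linear, kmalloc'ed
      buffer whose headlen does not match gso_size.

       #include <stdbool.h>
       #include <stddef.h>

       struct gso_state {
           bool   gso_dodgy;           /* gso_size was mangled, e.g. by eBPF     */
           bool   has_frag_list;
           bool   frag_list_head_frag; /* first frag_list member is page-backed  */
           size_t headlen;             /* linear headlen of the given 'head_skb' */
           size_t gso_size;
       };

       static bool must_copy_segments(const struct gso_state *s)
       {
           return s->gso_dodgy &&
                  s->has_frag_list &&
                  !s->frag_list_head_frag &&
                  s->headlen != s->gso_size;
       }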
  10. 06 Sep 2019 (1 commit)
    • net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey committed
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel through tc, and if they aren't stolen
      by it, as in the above rule, they will continue to the OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      processing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used to translate the tc chain to the OvS recirc_id and
      handle these miss cases. The last tc chain index is set
      by the tc goto chain action and read by the OvS datapath.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95a7233c
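      A minimal model of the extension described above (illustrative names; the
      upstream extension and its accessors differ in detail): tc's goto chain
      action records the destination chain in a per-packet extension, and the
      OvS datapath reads it on a tc miss to resume at the matching recirc_id.

       #include <stdint.h>
       #include <stddef.h>

       struct tc_chain_ext {
           uint32_t chain;   /* last tc chain index, written by 'goto chain' */
       };

       struct pkt_ctx {
           struct tc_chain_ext *tc_ext;   /* NULL if tc never set it */
       };

       static uint32_t ovs_recirc_id_for_miss(const struct pkt_ctx *p)
       {
           /* A tc miss with a recorded chain continues in OvS at that
            * recirc_id; otherwise processing starts from recirc_id 0. */
           return p->tc_ext ? p->tc_ext->chain : 0;
       }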
  11. 31 Jul 2019 (1 commit)
  12. 23 Jul 2019 (2 commits)
  13. 17 Jul 2019 (1 commit)
  14. 09 Jul 2019 (5 commits)
  15. 10 Jun 2019 (2 commits)
    • net: Don't disable interrupts in __netdev_alloc_skb() · 92dcabd7
      Sebastian Andrzej Siewior committed
      __netdev_alloc_skb() can be used from any context and is used by NAPI
      and non-NAPI drivers. Non-NAPI drivers use it in interrupt context and
      NAPI drivers use it during initial allocation (->ndo_open() or
      ->ndo_change_mtu()). Some NAPI drivers share the same function for the
      initial allocation and the allocation in their NAPI callback.
      
      Interrupts are disabled in order to ensure locked access to
      `netdev_alloc_cache' from every context.
      
      Let __netdev_alloc_skb() check if interrupts are disabled. If they are, use
      `netdev_alloc_cache'. Otherwise disable BH and use `napi_alloc_cache.page'.
      The IRQ check is cheaper than disabling and enabling interrupts, and
      memory allocation with interrupts disabled does not work on -RT.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92dcabd7
    • net: Don't disable interrupts in napi_alloc_frag() · 7ba7aeab
      Sebastian Andrzej Siewior committed
      netdev_alloc_frag() can be used from any context and is used by NAPI
      and non-NAPI drivers. Non-NAPI drivers use it in interrupt context
      and NAPI drivers use it during initial allocation (->ndo_open() or
      ->ndo_change_mtu()). Some NAPI drivers share the same function for the
      initial allocation and the allocation in their NAPI callback.
      
      Interrupts are disabled in order to ensure locked access to
      `netdev_alloc_cache' from every context.
      
      Let netdev_alloc_frag() check if interrupts are disabled. If they are,
      use `netdev_alloc_cache'; otherwise disable BH and invoke
      __napi_alloc_frag() for the allocation. The IRQ check is cheaper
      than disabling and enabling interrupts, and memory allocation with
      interrupts disabled does not work on -RT.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ba7aeab
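      The cache-selection pattern shared by this commit and the previous one,
      sketched with the predicate passed in explicitly instead of calling
      irqs_disabled()/local_bh_disable(), so it stays a standalone illustration:

       #include <stdbool.h>

       enum frag_cache { NETDEV_CACHE, NAPI_CACHE };

       static enum frag_cache pick_frag_cache(bool irqs_are_disabled)
       {
           /* In hard-irq (or irqs-off) context, keep using the netdev cache.
            * Otherwise disable BH and use the per-CPU NAPI cache; checking the
            * IRQ state is cheaper than unconditionally disabling interrupts,
            * and it also works on -RT. */
           return irqs_are_disabled ? NETDEV_CACHE : NAPI_CACHE;
       }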
  16. 05 Jun 2019 (1 commit)
  17. 31 May 2019 (3 commits)
  18. 25 May 2019 (1 commit)
    • bpf: sockmap, fix use after free from sleep in psock backlog workqueue · bd95e678
      John Fastabend committed
      Backlog work for psock (sk_psock_backlog) might sleep while waiting
      for memory to free up when sending packets. However, while sleeping
      the socket may be closed and removed from the map by the user space
      side.
      
      This breaks an assumption in sk_stream_wait_memory, which expects the
      wait queue to still be there when it wakes up, resulting in the
      use-after-free shown below. To fix this, mark sendmsg as MSG_DONTWAIT
      to avoid the sleep altogether. We already set the flag for the
      sendpage case, but we missed the case where sendmsg is used.
      Sockmap is currently the only user of skb_send_sock_locked(), so only
      the sockmap paths should be impacted.
      
      ==================================================================
      BUG: KASAN: use-after-free in remove_wait_queue+0x31/0x70
      Write of size 8 at addr ffff888069a0c4e8 by task kworker/0:2/110
      
      CPU: 0 PID: 110 Comm: kworker/0:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3-dirty #14
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      Workqueue: events sk_psock_backlog
      Call Trace:
       print_address_description+0x6e/0x2b0
       ? remove_wait_queue+0x31/0x70
       kasan_report+0xfd/0x177
       ? remove_wait_queue+0x31/0x70
       ? remove_wait_queue+0x31/0x70
       remove_wait_queue+0x31/0x70
       sk_stream_wait_memory+0x4dd/0x5f0
       ? sk_stream_wait_close+0x1b0/0x1b0
       ? wait_woken+0xc0/0xc0
       ? tcp_current_mss+0xc5/0x110
       tcp_sendmsg_locked+0x634/0x15d0
       ? tcp_set_state+0x2e0/0x2e0
       ? __kasan_slab_free+0x1d1/0x230
       ? kmem_cache_free+0x70/0x140
       ? sk_psock_backlog+0x40c/0x4b0
       ? process_one_work+0x40b/0x660
       ? worker_thread+0x82/0x680
       ? kthread+0x1b9/0x1e0
       ? ret_from_fork+0x1f/0x30
       ? check_preempt_curr+0xaf/0x130
       ? iov_iter_kvec+0x5f/0x70
       ? kernel_sendmsg_locked+0xa0/0xe0
       skb_send_sock_locked+0x273/0x3c0
       ? skb_splice_bits+0x180/0x180
       ? start_thread+0xe0/0xe0
       ? update_min_vruntime.constprop.27+0x88/0xc0
       sk_psock_backlog+0xb3/0x4b0
       ? strscpy+0xbf/0x1e0
       process_one_work+0x40b/0x660
       worker_thread+0x82/0x680
       ? process_one_work+0x660/0x660
       kthread+0x1b9/0x1e0
       ? __kthread_create_on_node+0x250/0x250
       ret_from_fork+0x1f/0x30
      
      Fixes: 20bf50de ("skbuff: Function to send an skbuf on a socket")
      Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
      Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bd95e678
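      A plain user-space illustration of the MSG_DONTWAIT semantics the fix
      relies on (standard sockets API; the helper name is made up): with the
      flag set, sendmsg() never sleeps waiting for socket memory and instead
      fails with EAGAIN/EWOULDBLOCK, which the caller can retry later - exactly
      what the psock backlog worker needs instead of sleeping on a wait queue
      that may disappear underneath it.

       #include <errno.h>
       #include <sys/types.h>
       #include <sys/socket.h>
       #include <sys/uio.h>

       static ssize_t send_nonblocking(int fd, const void *buf, size_t len)
       {
           struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
           struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

           ssize_t n = sendmsg(fd, &msg, MSG_DONTWAIT);
           if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
               return 0;   /* no socket memory right now: let the caller retry */
           return n;
       }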
  19. 18 Apr 2019 (1 commit)
  20. 17 Apr 2019 (1 commit)
  21. 04 Apr 2019 (1 commit)
    • net-gro: Fix GRO flush when receiving a GSO packet. · 0ab03f35
      Steffen Klassert committed
      Currently we may incorrectly merge a received GSO packet
      or a packet with a frag_list into a packet sitting in the
      gro_hash list. skb_segment() may then crash because
      the assumptions on the skb layout are not met.
      The correct behaviour would be to flush the packet in the
      gro_hash list and send the received GSO packet directly
      afterwards. Commit d61d072e ("net-gro: avoid reorders")
      sets NAPI_GRO_CB(skb)->flush in this case, but this is not
      checked before merging. This patch makes sure to check this
      flag and to not merge in that case.
      
      Fixes: d61d072e ("net-gro: avoid reorders")
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ab03f35
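      The merge decision reduces to honouring the flush flag, sketched below
      with an illustrative structure (not NAPI_GRO_CB):

       #include <stdbool.h>

       struct gro_pkt {
           bool flush;   /* set by d61d072e when the packet must not be merged */
       };

       static bool gro_may_merge(const struct gro_pkt *incoming)
       {
           /* If flush is set, the matching gro_hash entry is flushed and the
            * incoming GSO packet is delivered as-is instead of being merged. */
           return !incoming->flush;
       }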
  22. 28 Mar 2019 (1 commit)
  23. 18 Feb 2019 (1 commit)
  24. 05 Jan 2019 (1 commit)
    • net, skbuff: do not prefer skb allocation fails early · f8c468e8
      David Rientjes committed
      Commit dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by
      __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
      alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
      directly reclaim.
      
      The previous behavior would require reclaiming up to 1 << order pages for an
      skb-aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing;
      otherwise the allocations in alloc_skb() would loop in the page allocator
      looking for memory.  __GFP_RETRY_MAYFAIL makes both allocations failable
      under memory pressure, including for the HEAD allocation.
      
      This can cause, among many other things, write() to fail with ENOTCONN
      during RPC when under memory pressure.
      
      These allocations should succeed as they did prior to dcda9b04,
      even if that requires calling the oom killer and additional looping in the
      page allocator to find memory.  There is no way to specify the previous
      behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
      previous behavior only guaranteed that 1 << order pages would be reclaimed
      before failing for order > PAGE_ALLOC_COSTLY_ORDER.  That reclaim is not
      guaranteed to be contiguous memory, so repeating for such large orders is
      usually not beneficial.
      
      Remove the setting of __GFP_RETRY_MAYFAIL to restore the previous
      behavior: specifically, do not allow alloc_skb() to fail for small orders,
      and oom kill if necessary rather than allowing RPCs to fail.
      
      Fixes: dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8c468e8
  25. 22 Dec 2018 (2 commits)
  26. 20 Dec 2018 (3 commits)
    • net: switch secpath to use skb extension infrastructure · 4165079b
      Florian Westphal committed
      Remove skb->sp and allocate secpath storage via extension
      infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.
      
      Total size of allyesconfig kernel is reduced slightly, as there is
      less inlined code (one conditional atomic op instead of two on
      skb_clone).
      
      No differences in throughput in following ipsec performance tests:
      - transport mode with aes on 10GB link
      - tunnel mode between two network namespaces with aes and null cipher
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4165079b
    • net: convert bridge_nf to use skb extension infrastructure · de8bda1d
      Florian Westphal committed
      This converts the bridge netfilter (calling iptables hooks from bridge)
      facility to use the extension infrastructure.
      
      The bridge_nf-specific hooks in the skb clone and free paths are removed;
      they have been replaced by the skb_ext hooks, which do the same as the
      bridge nf allocation hooks did.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de8bda1d
    • sk_buff: add skb extension infrastructure · df5042f4
      Florian Westphal committed
      This adds an optional extension infrastructure, with ipsec (xfrm) and
      bridge netfilter as the first users.
      objdiff shows no changes if the kernel is built without xfrm and
      br_netfilter support.
      
      The third (planned future) user is Multipath TCP which is still
      out-of-tree.
      MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
      numbers used by individual subflows.
      
      This DSS mapping is read from tcp option space on receive and
      written to tcp option space on transmitted tcp packets that are part of
      an MPTCP connection.
      
      Extending skb_shared_info or adding a private data field to skb fclones
      doesn't work for incoming skb, so a different DSS propagation method would
      be required for the receive side.
      
      mptcp has same requirements as secpath/bridge netfilter:
      
      1. extension memory is released when the sk_buff is free'd.
      2. data is shared after cloning an skb (clone inherits extension)
      3. adding extension to an skb will COW the extension buffer if needed.
      
      The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
      mapping for tx and rx processing.
      
      Two new members are added to sk_buff:
      1. 'active_extensions' byte (filling a hole), telling which extensions
         are available for this skb.
         This has two purposes.
         a) avoids the need to initialize the pointer.
         b) allows "deleting" an extension by clearing its bit
         value in ->active_extensions.
      
         While it would be possible to store the active_extensions byte
         in the extension struct instead of sk_buff, there is one problem
         with this:
          When an extension has to be disabled, we can always clear the
          bit in skb->active_extensions.  But in case it would be stored in the
          extension buffer itself, we might have to COW it first, if
          we are dealing with a cloned skb.  On kmalloc failure we would
          be unable to turn an extension off.
      
      2. extension pointer, located at the end of the sk_buff.
         If the active_extensions byte is 0, the pointer is undefined,
         it is not initialized on skb allocation.
      
      This adds extra code to skb clone and free paths (to deal with
      refcount/free of extension area) but this replaces similar code that
      manages skb->nf_bridge and skb->sp structs in the followup patches of
      the series.
      
      It is possible to add support for extensions that are not preserved on
      clones/copies.
      
      To do this, it would be needed to define a bitmask of all extensions that
      need copy/cow semantics, and change __skb_ext_copy() to check
      ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
      ->active_extensions to 0 on the new clone.
      
      This isn't done here because all extensions that get added here
      need the copy/cow semantics.
      
      v2:
      Allocate entire extension space using kmem_cache.
      The upside is that this allows better tracking of used memory; the
      downside is that we will allocate more space than strictly needed in
      most cases (it's unlikely that all extensions are active/needed at the
      same time for the same skb).
      The allocated memory (except the small extension header) is not cleared,
      so there is no additional overhead aside from memory usage.
      
      Avoid the atomic_dec_and_test operation in skb_ext_put()
      by using a similar trick to what kfree_skbmem() does with fclone_ref:
      if the refcount is 1, there is no concurrent user and we can free right away.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df5042f4
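      A compact user-space model of the scheme described above (illustrative
      types and helpers; the kernel's skb_ext_add()/skb_ext_del()/skb_ext_put()
      differ in detail, e.g. they use atomics and a kmem_cache): the owner keeps
      an active_extensions bitmask plus one pointer to a refcounted extension
      area. Deleting an extension only clears its bit, so it never needs to COW
      and cannot fail, while adding one to a shared area copies the area first.

       #include <stdint.h>
       #include <stdlib.h>
       #include <string.h>

       enum ext_id { EXT_SECPATH, EXT_BRIDGE_NF, EXT_TC, EXT_NUM };

       struct ext_area {
           unsigned int refcnt;
           void *slot[EXT_NUM];
       };

       struct pkt {
           uint8_t active_extensions;   /* bit n set => slot[n] is valid       */
           struct ext_area *ext;        /* undefined if active_extensions == 0 */
       };

       static void *pkt_ext_add(struct pkt *p, enum ext_id id, void *data)
       {
           if (!p->active_extensions) {
               p->ext = calloc(1, sizeof(*p->ext));
               if (!p->ext)
                   return NULL;
               p->ext->refcnt = 1;
           } else if (p->ext->refcnt > 1) {
               /* Shared with a clone: copy-on-write before modifying. */
               struct ext_area *copy = malloc(sizeof(*copy));
               if (!copy)
                   return NULL;
               memcpy(copy, p->ext, sizeof(*copy));
               copy->refcnt = 1;
               p->ext->refcnt--;
               p->ext = copy;
           }
           p->ext->slot[id] = data;
           p->active_extensions |= 1u << id;
           return data;
       }

       static void pkt_ext_del(struct pkt *p, enum ext_id id)
       {
           /* Clearing the bit is enough; no allocation, so it cannot fail. */
           p->active_extensions &= ~(1u << id);
       }

       static void pkt_ext_put(struct pkt *p)
       {
           /* On free/scrub: release the whole area once the last user is gone
            * (a refcount of 1 means no concurrent user, free right away). */
           if (p->active_extensions && --p->ext->refcnt == 0)
               free(p->ext);
           p->active_extensions = 0;
       }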