1. 11 3月, 2021 2 次提交
    • D
      net, bpf: Fix ip6ip6 crash with collect_md populated skbs · a188bb56
      Daniel Borkmann 提交于
      I ran into a crash where setting up a ip6ip6 tunnel device which was /not/
      set to collect_md mode was receiving collect_md populated skbs for xmit.
      
      The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is
      assigning special metadata dst entry and then redirecting the skb to the
      device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit()
      and in the latter it performs a neigh lookup based on skb_dst(skb) where
      we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since
      the md_dst_ops do not populate neigh_lookup callback with a fake handler.
      
      Transform the md_dst_ops into generic dst_blackhole_ops that can also be
      reused elsewhere when needed, and use them for the metadata dst entries as
      callback ops.
      
      Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}()
      from dst_init() which free the skb the same way modulo the splat. Given we
      will be able to recover just fine from there, avoid any potential splats
      iff this gets ever triggered in future (or worse, panic on warns when set).
      
      Fixes: f38a9eb1 ("dst: Metadata destinations")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a188bb56
    • D
      net: Consolidate common blackhole dst ops · c4c877b2
      Daniel Borkmann 提交于
      Move generic blackhole dst ops to the core and use them from both
      ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
      functional change otherwise. We need these also in other locations
      and having to define them over and over again is not great.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4c877b2
  2. 11 9月, 2020 1 次提交
  3. 09 5月, 2020 1 次提交
  4. 26 9月, 2019 1 次提交
  5. 02 7月, 2019 1 次提交
  6. 21 5月, 2019 1 次提交
  7. 22 3月, 2019 1 次提交
  8. 18 1月, 2019 1 次提交
  9. 04 8月, 2018 1 次提交
  10. 18 4月, 2018 1 次提交
    • D
      net/ipv6: move metrics from dst to rt6_info · d4ead6b3
      David Ahern 提交于
      Similar to IPv4, add fib metrics to the fib struct, which at the moment
      is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics
      into dst by reference using refcount.
      
      To make the transition:
      - add dst_metrics to rt6_info. Default to dst_default_metrics if no
        metrics are passed during route add. No need for a separate pmtu
        entry; it can reference the MTU slot in fib6_metrics
      
      - ip6_convert_metrics allocates memory in the FIB entry and uses
        ip_metrics_convert to copy from netlink attribute to metrics entry
      
      - the convert metrics call is done in ip6_route_info_create simplifying
        the route add path
        + fib6_commit_metrics and fib6_copy_metrics and the temporary
          mx6_config are no longer needed
      
      - add fib6_metric_set helper to change the value of a metric in the
        fib entry since dst_metric_set can no longer be used
      
      - cow_metrics for IPv6 can drop to dst_cow_metrics_generic
      
      - rt6_dst_from_metrics_check is no longer needed
      
      - rt6_fill_node needs the FIB entry and dst as separate arguments to
        keep compatibility with existing output. Current dst address is
        renamed to dest.
        (to be consistent with IPv4 rt6_fill_node really should be split
        into 2 functions similar to fib_dump_info and rt_fill_info)
      
      - rt6_fill_node no longer needs the temporary metrics variable
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4ead6b3
  11. 30 11月, 2017 5 次提交
    • D
      net: Remove dst->next · 7149f813
      David Miller 提交于
      There are no more users.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      7149f813
    • D
      xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      David Miller 提交于
      The first member of an IPSEC route bundle chain sets it's dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When we don't have the top of an IPSEC bundle 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      0f6c480f
    • D
      ipv6: Move dst->from into struct rt6_info. · 3a2232e9
      David Miller 提交于
      The dst->from value is only used by ipv6 routes to track where
      a route "came from".
      
      Any time we clone or copy a core ipv6 route in the ipv6 routing
      tables, we have the copy/clone's ->from point to the base route.
      
      This is used to handle route expiration properly.
      
      Only ipv6 uses this mechanism, and only ipv6 code references
      it.  So it is safe to move it into rt6_info.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      3a2232e9
    • D
      xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5
      David Miller 提交于
      XFRM bundle child chains look like this:
      
      	xdst1 --> xdst2 --> xdst3 --> path_dst
      
      All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
      The final child pointer in the chain, here called 'path_dst', is some
      other kind of route such as an ipv4 or ipv6 one.
      
      The xfrm output path pops routes, one at a time, via the child
      pointer, until we hit one which has a dst->xfrm pointer which
      is NULL.
      
      We can easily preserve the above mechanisms with child sitting
      only in the xfrm_dst structure.  All children in the chain
      before we break out of the xfrm_output() loop have dst->xfrm
      non-NULL and are therefore xfrm_dst objects.
      
      Since we break out of the loop when we find dst->xfrm NULL, we
      will not try to dereference 'dst' as if it were an xfrm_dst.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6ca8bd5
    • D
      net: Create and use new helper xfrm_dst_child(). · b92cf4aa
      David Miller 提交于
      Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
      routes are identified by a non-NULL dst->xfrm pointer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b92cf4aa
  12. 11 10月, 2017 2 次提交
  13. 22 8月, 2017 1 次提交
    • D
      net: check type when freeing metadata dst · e65a4955
      David Lamparter 提交于
      Commit 3fcece12 ("net: store port/representator id in metadata_dst")
      added a new type field to metadata_dst, but metadata_dst_free() wasn't
      updated to check it before freeing the METADATA_IP_TUNNEL specific dst
      cache entry.
      
      This is not currently causing problems since it's far enough back in the
      struct to be zeroed for the only other type currently in existance
      (METADATA_HW_PORT_MUX), but nevertheless it's not correct.
      
      Fixes: 3fcece12 ("net: store port/representator id in metadata_dst")
      Signed-off-by: NDavid Lamparter <equinox@diac24.net>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e65a4955
  14. 19 8月, 2017 1 次提交
  15. 25 6月, 2017 1 次提交
    • J
      net: store port/representator id in metadata_dst · 3fcece12
      Jakub Kicinski 提交于
      Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
      representators and control messages over single set of hardware queues.
      Control messages and muxed traffic may need ordered delivery.
      
      Those requirements make it hard to comfortably use TC infrastructure today
      unless we have a way of attaching metadata to skbs at the upper device.
      Because single set of queues is used for many netdevs stopping TC/sched
      queues of all of them reliably is impossible and lower device has to
      retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
      the fastpath.
      
      This patch attempts to enable port/representative devs to attach metadata
      to skbs which carry port id.  This way representatives can be queueless and
      all queuing can be performed at the lower netdev in the usual way.
      
      Traffic arriving on the port/representative interfaces will be have
      metadata attached and will subsequently be queued to the lower device for
      transmission.  The lower device should recognize the metadata and translate
      it to HW specific format which is most likely either a special header
      inserted before the network headers or descriptor/metadata fields.
      
      Metadata is associated with the lower device by storing the netdev pointer
      along with port id so that if TC decides to redirect or mirror the new
      netdev will not try to interpret it.
      
      This is mostly for SR-IOV devices since switches don't have lower netdevs
      today.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fcece12
  16. 18 6月, 2017 6 次提交
    • W
      net: remove DST_NOCACHE flag · a4c2fd7f
      Wei Wang 提交于
      DST_NOCACHE flag check has been removed from dst_release() and
      dst_hold_safe() in a previous patch because all the dst are now ref
      counted properly and can be released based on refcnt only.
      Looking at the rest of the DST_NOCACHE use, all of them can now be
      removed or replaced with other checks.
      So this patch gets rid of all the DST_NOCACHE usage and remove this flag
      completely.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4c2fd7f
    • W
      net: remove DST_NOGC flag · b2a9c0ed
      Wei Wang 提交于
      Now that all the components have been changed to release dst based on
      refcnt only and not depend on dst gc anymore, we can remove the
      temporary flag DST_NOGC.
      
      Note that we also need to remove the DST_NOCACHE check in dst_release()
      and dst_hold_safe() because now all the dst are released based on refcnt
      and behaves as DST_NOCACHE.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2a9c0ed
    • W
      net: remove dst gc related code · 5b7c9a8f
      Wei Wang 提交于
      This patch removes all dst gc related code and all the dst free
      functions
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b7c9a8f
    • W
      xfrm: take refcnt of dst when creating struct xfrm_dst bundle · 52df157f
      Wei Wang 提交于
      During the creation of xfrm_dst bundle, always take ref count when
      allocating the dst. This way, xfrm_bundle_create() will form a linked
      list of dst with dst->child pointing to a ref counted dst child. And
      the returned dst pointer is also ref counted. This makes the link from
      the flow cache to this dst now ref counted properly.
      As the dst is always ref counted properly, we can safely mark
      DST_NOGC flag so dst_release() will release dst based on refcnt only.
      And dst gc is no longer needed and all dst_free() and its related
      function calls should be replaced with dst_release() or
      dst_release_immediate().
      
      The special handling logic for dst->child in dst_destroy() can be
      replaced with a simple dst_release_immediate() call on the child to
      release the whole list linked by dst->child pointer.
      Previously used DST_NOHASH flag is not needed anymore as well. The
      reason that DST_NOHASH is used in the existing code is mainly to prevent
      the dst inserted in the fib tree to be wrongly destroyed during the
      deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
      flag is marked in all the dst children except the one which is in the
      fib tree.
      However, with this patch series to remove dst gc logic and release dst
      only based on ref count, it is safe to release all the children from a
      xfrm_dst bundle as long as the dst children are all ref counted
      properly which is already the case in the existing code.
      So, this patch removes the use of DST_NOHASH flag.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52df157f
    • W
      net: introduce a new function dst_dev_put() · 4a6ce2b6
      Wei Wang 提交于
      This function should be called when removing routes from fib tree after
      the dst gc is no longer in use.
      We first mark DST_OBSOLETE_DEAD on this dst to make sure next
      dst_ops->check() fails and returns NULL.
      Secondly, as we no longer keep the gc_list, we need to properly
      release dst->dev right at the moment when the dst is removed from
      the fib/fib6 tree.
      It does the following:
      1. change dst->input and output pointers to dst_discard/dst_dscard_out to
         discard all packets
      2. replace dst->dev with loopback interface
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a6ce2b6
    • W
      net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt · 5f56f409
      Wei Wang 提交于
      The current mechanism of freeing dst is a bit complicated. dst has its
      ref count and when user grabs the reference to the dst, the ref count is
      properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing
      code due to some historic reasons.
      
      If the reference to dst is always taken properly, we should be able to
      simplify the logic in dst_release() to destroy dst when dst->__refcnt
      drops from 1 to 0. And this should be the only condition to determine
      if we can call dst_destroy().
      And as dst is always ref counted, there is no need for a dst garbage
      list to hold the dst entries that already get removed by the routing
      code but are still held by other users. And the task to periodically
      check the list to free dst if ref count become 0 is also not needed
      anymore.
      
      This patch introduces a temporary flag DST_NOGC(no garbage collector).
      If it is set in the dst, dst_release() will call dst_destroy() when
      dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag
      and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free
      issue.
      This temporary flag is mainly used so that we can make the transition
      component by component without breaking other parts.
      This flag will be removed after all components are properly transitioned.
      
      This patch also introduces a new function dst_release_immediate() which
      destroys dst without waiting on the rcu when refcnt drops to 0. It will
      be used in later patches.
      
      Follow-up patches will correct all the places to properly take ref count
      on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will
      be used to release the dst instead of dst_free() and its related
      functions.
      And final clean-up patch will remove the DST_NOGC flag.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f56f409
  17. 10 6月, 2017 1 次提交
    • K
      Fix an intermittent pr_emerg warning about lo becoming free. · f186ce61
      Krister Johansen 提交于
      It looks like this:
      
      Message from syslogd@flamingo at Apr 26 00:45:00 ...
       kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
      
      They seem to coincide with net namespace teardown.
      
      The message is emitted by netdev_wait_allrefs().
      
      Forced a kdump in netdev_run_todo, but found that the refcount on the lo
      device was already 0 at the time we got to the panic.
      
      Used bcc to check the blocking in netdev_run_todo.  The only places
      where we're off cpu there are in the rcu_barrier() and msleep() calls.
      That behavior is expected.  The msleep time coincides with the amount of
      time we spend waiting for the refcount to reach zero; the rcu_barrier()
      wait times are not excessive.
      
      After looking through the list of callbacks that the netdevice notifiers
      invoke in this path, it appears that the dst_dev_event is the most
      interesting.  The dst_ifdown path places a hold on the loopback_dev as
      part of releasing the dev associated with the original dst cache entry.
      Most of our notifier callbacks are straight-forward, but this one a)
      looks complex, and b) places a hold on the network interface in
      question.
      
      I constructed a new bcc script that watches various events in the
      liftime of a dst cache entry.  Note that dst_ifdown will take a hold on
      the loopback device until the invalidated dst entry gets freed.
      
      [      __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
          __dst_free
          rcu_nocb_kthread
          kthread
          ret_from_fork
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f186ce61
  18. 27 5月, 2017 1 次提交
    • E
      ipv4: add reference counting to metrics · 3fb07daf
      Eric Dumazet 提交于
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fb07daf
  19. 08 2月, 2017 1 次提交
  20. 17 2月, 2016 1 次提交
  21. 07 1月, 2016 1 次提交
  22. 10 11月, 2015 1 次提交
  23. 08 10月, 2015 1 次提交
  24. 01 9月, 2015 1 次提交
  25. 26 8月, 2015 1 次提交
    • W
      route: fix a use-after-free · e252b3d1
      WANG Cong 提交于
      This patch fixes the following crash:
      
       general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000
       RIP: 0010:[<ffffffff8182f91b>]  [<ffffffff8182f91b>] dst_destroy+0xa6/0xef
       RSP: 0018:ffff880107603e38  EFLAGS: 00010202
       RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0
       RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b
       RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001
       R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000
       R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f
       FS:  0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0
       Stack:
        ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000
        ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e
        ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280
       Call Trace:
        <IRQ>
        [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d
        [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb
        [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb
        [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb
        [<ffffffff8107e11b>] __do_softirq+0x178/0x39f
        [<ffffffff8107e52e>] irq_exit+0x41/0x95
        [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40
        [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80
        <EOI>
        [<ffffffff8100b968>] ? default_idle+0x21/0x32
        [<ffffffff8100b966>] ? default_idle+0x1f/0x32
        [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11
        [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21
        [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273
        [<ffffffff8102fe67>] start_secondary+0x135/0x156
      
      dst is freed right before lwtstate_put(), this is not correct...
      
      Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
      Acked-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e252b3d1
  26. 21 8月, 2015 1 次提交
  27. 01 8月, 2015 1 次提交
    • A
      bpf: add helpers to access tunnel metadata · d3aa45ce
      Alexei Starovoitov 提交于
      Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:
      bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
      skb: pointer to skb
      key: pointer to 'struct bpf_tunnel_key'
      size: size of 'struct bpf_tunnel_key'
      flags: room for future extensions
      
      First eBPF program that uses these helpers will allocate per_cpu
      metadata_dst structures that will be used on TX.
      On RX metadata_dst is allocated by tunnel driver.
      
      Typical usage for TX:
      struct bpf_tunnel_key tkey;
      ... populate tkey ...
      bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
      bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
      
      RX:
      struct bpf_tunnel_key tkey = {};
      bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
      ... lookup or redirect based on tkey ...
      
      'struct bpf_tunnel_key' will be extended in the future by adding
      elements to the end and the 'size' argument will indicate which fields
      are populated, thereby keeping backwards compatibility.
      The 'flags' argument may be used as well when the 'size' is not enough or
      to indicate completely different layout of bpf_tunnel_key.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3aa45ce
  28. 22 7月, 2015 1 次提交
    • T
      dst: Metadata destinations · f38a9eb1
      Thomas Graf 提交于
      Introduces a new dst_metadata which enables to carry per packet metadata
      between forwarding and processing elements via the skb->dst pointer.
      
      The structure is set up to be a union. Thus, each separate type of
      metadata requires its own dst instance. If demand arises to carry
      multiple types of metadata concurrently, metadata dst entries can be
      made stackable.
      
      The metadata dst entry is refcnt'ed as expected for now but a non
      reference counted use is possible if the reference is forced before
      queueing the skb.
      
      In order to allow allocating dsts with variable length, the existing
      dst_alloc() is split into a dst_alloc() and dst_init() function. The
      existing dst_init() function to initialize the subsystem is being
      renamed to dst_subsys_init() to make it clear what is what.
      
      The check before ip_route_input() is changed to ignore metadata dsts
      and drop the dst inside the routing function thus allowing to interpret
      metadata in a later commit.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f38a9eb1
  29. 21 7月, 2015 1 次提交