1. 04 7月, 2019 2 次提交
  2. 02 7月, 2019 3 次提交
  3. 28 6月, 2019 1 次提交
    • D
      ipv6: Convert gateway validation to use fib6_info · b2c709cc
      David Ahern 提交于
      Gateway validation does not need a dst_entry, it only needs the fib
      entry to validate the gateway resolution and egress device. So,
      convert ip6_nh_lookup_table from ip6_pol_route to fib6_table_lookup
      and ip6_route_check_nh to use fib6_lookup over rt6_lookup.
      
      ip6_pol_route is a call to fib6_table_lookup and if successful a call
      to fib6_select_path. From there the exception cache is searched for an
      entry or a dst_entry is created to return to the caller. The exception
      entry is not relevant for gateway validation, so what matters are the
      calls to fib6_table_lookup and then fib6_select_path.
      
      Similarly, rt6_lookup can be replaced with a call to fib6_lookup with
      RT6_LOOKUP_F_IFACE set in flags. Again, the exception cache search is
      not relevant, only the lookup with path selection. The primary difference
      in the lookup paths is the use of rt6_select with fib6_lookup versus
      rt6_device_match with rt6_lookup. When you remove complexities in the
      rt6_select path, e.g.,
      1. saddr is not set for gateway validation, so RT6_LOOKUP_F_HAS_SADDR
         is not relevant
      2. rt6_check_neigh is not called so that removes the RT6_NUD_FAIL_DO_RR
         return and round-robin logic.
      
      the code paths are believed to be equivalent for the given use case -
      validate the gateway and optionally given the device. Furthermore, it
      aligns the validation with onlink code path and the lookup path actually
      used for rx and tx.
      
      Adjust the users, ip6_route_check_nh_onlink and ip6_route_check_nh to
      handle a fib6_info vs a rt6_info when performing validation checks.
      
      Existing selftests fib-onlink-tests.sh and fib_tests.sh are used to
      verify the changes.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2c709cc
  4. 27 6月, 2019 3 次提交
    • N
      ipv6: fix neighbour resolution with raw socket · 2c6b55f4
      Nicolas Dichtel 提交于
      The scenario is the following: the user uses a raw socket to send an ipv6
      packet, destinated to a not-connected network, and specify a connected nh.
      Here is the corresponding python script to reproduce this scenario:
      
       import socket
       IPPROTO_RAW = 255
       send_s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, IPPROTO_RAW)
       # scapy
       # p = IPv6(src='fd00:100::1', dst='fd00:200::fa')/ICMPv6EchoRequest()
       # str(p)
       req = b'`\x00\x00\x00\x00\x08:@\xfd\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xfd\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xfa\x80\x00\x81\xc0\x00\x00\x00\x00'
       send_s.sendto(req, ('fd00:175::2', 0, 0, 0))
      
      fd00:175::/64 is a connected route and fd00:200::fa is not a connected
      host.
      
      With this scenario, the kernel starts by sending a NS to resolve
      fd00:175::2. When it receives the NA, it flushes its queue and try to send
      the initial packet. But instead of sending it, it sends another NS to
      resolve fd00:200::fa, which obvioulsy fails, thus the packet is dropped. If
      the user sends again the packet, it now uses the right nh (fd00:175::2).
      
      The problem is that ip6_dst_lookup_neigh() uses the rt6i_gateway, which is
      :: because the associated route is a connected route, thus it uses the dst
      addr of the packet. Let's use rt6_nexthop() to choose the right nh.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c6b55f4
    • N
      ipv6: constify rt6_nexthop() · 9b1c1ef1
      Nicolas Dichtel 提交于
      There is no functional change in this patch, it only prepares the next one.
      
      rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const
      variables.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Acked-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b1c1ef1
    • E
      ipv6: fix suspicious RCU usage in rt6_dump_route() · 3b525691
      Eric Dumazet 提交于
      syzbot reminded us that rt6_nh_dump_exceptions() needs to be called
      with rcu_read_lock()
      
      net/ipv6/route.c:1593 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syz-executor609/8966:
       #0: 00000000b7dbe288 (rtnl_mutex){+.+.}, at: netlink_dump+0xe7/0xfb0 net/netlink/af_netlink.c:2199
       #1: 00000000f2d87c21 (&(&tb->tb6_lock)->rlock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline]
       #1: 00000000f2d87c21 (&(&tb->tb6_lock)->rlock){+...}, at: fib6_dump_table.isra.0+0x37e/0x570 net/ipv6/ip6_fib.c:533
      
      stack backtrace:
      CPU: 0 PID: 8966 Comm: syz-executor609 Not tainted 5.2.0-rc5+ #43
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5250
       fib6_nh_get_excptn_bucket+0x18e/0x1b0 net/ipv6/route.c:1593
       rt6_nh_dump_exceptions+0x45/0x4d0 net/ipv6/route.c:5541
       rt6_dump_route+0x904/0xc50 net/ipv6/route.c:5640
       fib6_dump_node+0x168/0x280 net/ipv6/ip6_fib.c:467
       fib6_walk_continue+0x4a9/0x8e0 net/ipv6/ip6_fib.c:1986
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:2034
       fib6_dump_table.isra.0+0x38a/0x570 net/ipv6/ip6_fib.c:534
       inet6_dump_fib+0x93c/0xb00 net/ipv6/ip6_fib.c:624
       rtnl_dump_all+0x295/0x490 net/core/rtnetlink.c:3445
       netlink_dump+0x558/0xfb0 net/netlink/af_netlink.c:2244
       __netlink_dump_start+0x5b1/0x7d0 net/netlink/af_netlink.c:2352
       netlink_dump_start include/linux/netlink.h:226 [inline]
       rtnetlink_rcv_msg+0x73d/0xb00 net/core/rtnetlink.c:5182
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5237
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:646 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:665
       sock_write_iter+0x27c/0x3e0 net/socket.c:994
       call_write_iter include/linux/fs.h:1872 [inline]
       new_sync_write+0x4d3/0x770 fs/read_write.c:483
       __vfs_write+0xe1/0x110 fs/read_write.c:496
       vfs_write+0x20c/0x580 fs/read_write.c:558
       ksys_write+0x14f/0x290 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:620
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4401b9
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc8e134978 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401b9
      RDX: 000000000000001c RSI: 0000000020000000 RDI: 00
      
      Fixes: 1e47b483 ("ipv6: Dump route exceptions if requested")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b525691
  5. 26 6月, 2019 1 次提交
  6. 25 6月, 2019 6 次提交
    • S
      ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() · 40cb35d5
      Stefano Brivio 提交于
      When we perform an inexact match on FIB nodes via fib6_locate_1(), longer
      prefixes will be preferred to shorter ones. However, it might happen that
      a node, with higher fn_bit value than some other, has no valid routing
      information.
      
      In this case, we'll pick that node, but it will be discarded by the check
      on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing
      information but with lower fn_bit value.
      
      This is apparent when a routing exception is created for a default route:
       # ip -6 route list
       fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium
       fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium
       fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium
       fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium
       fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium
       default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium
       # ip -6 route list cache
       fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium
       fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium
       # ip -6 route flush cache    # node for default route is discarded
       Failed to send flush request: No such process
       # ip -6 route list cache
       fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium
      
      Check right away if the node has a RTN_RTINFO flag, before replacing the
      'prev' pointer, that indicates the longest matching prefix found so far.
      
      Fixes: 38fbeeee ("ipv6: prepare fib6_locate() for exception table")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40cb35d5
    • S
      ipv6: Dump route exceptions if requested · 1e47b483
      Stefano Brivio 提交于
      Since commit 2b760fcf ("ipv6: hook up exception table to store dst
      cache"), route exceptions reside in a separate hash table, and won't be
      found by walking the FIB, so they won't be dumped to userspace on a
      RTM_GETROUTE message.
      
      This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
      have no function anymore:
      
       # ip -6 route get fc00:3::1
       fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
       # ip -6 route get fc00:4::1
       fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
       # ip -6 route list cache
       # ip -6 route flush cache
       # ip -6 route get fc00:3::1
       fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
       # ip -6 route get fc00:4::1
       fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium
      
      because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
      by listing all the routes, and deleting them with RTM_DELROUTE one by one.
      
      If cached routes are requested using the RTM_F_CLONED flag together with
      strict checking, or if no strict checking is requested (and hence we can't
      consistently apply filters), look up exceptions in the hash table
      associated with the current fib6_info in rt6_dump_route(), and, if present
      and not expired, add them to the dump.
      
      We might be unable to dump all the entries for a given node in a single
      message, so keep track of how many entries were handled for the current
      node in fib6_walker, and skip that amount in case we start from the same
      partially dumped node.
      
      When a partial dump restarts, as the starting node might change when
      'sernum' changes, we have no guarantee that we need to skip the same
      amount of in-node entries. Therefore, we need two counters, and we need to
      zero the in-node counter if the node from which the dump is resumed
      differs.
      
      Note that, with the current version of iproute2, this only fixes the
      'ip -6 route list cache': on a flush command, iproute2 doesn't pass
      RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is
      still unable to fetch the routes to be flushed. This will be addressed in
      a patch for iproute2.
      
      To flush cached routes, a procfs entry could be introduced instead: that's
      how it works for IPv4. We already have a rt6_flush_exception() function
      ready to be wired to it. However, this would not solve the issue for
      listing.
      
      Versions of iproute2 and kernel tested:
      
                          iproute2
      kernel             4.14.0   4.15.0   4.19.0   5.0.0   5.1.0    5.1.0, patched
       3.18    list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.4     list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.9     list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.14    list        +        +        +        +       +            +
               flush       +        +        +        +       +            +
       4.15    list
               flush
       4.19    list
               flush
       5.0     list
               flush
       5.1     list
               flush
       with    list        +        +        +        +       +            +
       fix     flush       +        +        +                             +
      
      v7:
        - Explain usage of "skip" counters in commit message (suggested by
          David Ahern)
      
      v6:
        - Rebase onto net-next, use recently introduced nexthop walker
        - Make rt6_nh_dump_exceptions() a separate function (suggested by David
          Ahern)
      
      v5:
        - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH,
          update test results (flushing works with iproute2 < 5.0.0 now)
      
      v4:
        - Split NLM_F_MATCH and strict check handling in separate patches
        - Filter routes using RTM_F_CLONED: if it's not set, only return
          non-cached routes, and if it's set, only return cached routes:
          change requested by David Ahern and Martin Lau. This implies that
          iproute2 needs a separate patch to be able to flush IPv6 cached
          routes. This is not ideal because we can't fix the breakage caused
          by 2b760fcf entirely in kernel. However, two years have passed
          since then, and this makes it more tolerable
      
      v3:
        - More descriptive comment about expired exceptions in rt6_dump_route()
        - Swap return values of rt6_dump_route() (suggested by Martin Lau)
        - Don't zero skip_in_node in case we don't dump anything in a given pass
          (also suggested by Martin Lau)
        - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
          it's just a flag to indicate the route was cloned, not to filter on
          routes
      
      v2: Add tracking of number of entries to be skipped in current node after
          a partial dump. As we restart from the same node, if not all the
          exceptions for a given node fit in a single message, the dump will
          not terminate, as suggested by Martin Lau. This is a concrete
          possibility, setting up a big number of exceptions for the same route
          actually causes the issue, suggested by David Ahern.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e47b483
    • S
      ipv6/route: Change return code of rt6_dump_route() for partial node dumps · bf9a8a06
      Stefano Brivio 提交于
      In the next patch, we are going to add optional dump of exceptions to
      rt6_dump_route().
      
      Change the return code of rt6_dump_route() to accomodate partial node
      dumps: we might dump multiple routes per node, and might be able to dump
      only a given number of them, so fib6_dump_node() will need to know how
      many routes have been dumped on partial dump, to restart the dump from the
      point where it was interrupted.
      
      Note that fib6_dump_node() is the only caller and already handles all
      non-negative return codes as success: those become -1 to signal that we're
      done with the node. If we fail, return 0, as we were unable to dump the
      single route in the node, but we're not done with it.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf9a8a06
    • S
      ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del() · 3401bfb1
      Stefano Brivio 提交于
      If fc_nh_id isn't set, we shouldn't try to match against it. This
      actually matters just for the RTF_CACHE below (where this case is
      already handled): if iproute2 gets a route exception and tries to
      delete it, it won't reference it by fc_nh_id, even if a nexthop
      object might be associated to the originating route.
      
      Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3401bfb1
    • S
      Revert "net/ipv6: Bail early if user only wants cloned entries" · ef11209d
      Stefano Brivio 提交于
      This reverts commit 08e814c9: as we
      are preparing to fix listing and dumping of IPv6 cached routes, we
      need to allow RTM_F_CLONED as a flag to match routes against while
      dumping them.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef11209d
    • S
      fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED · 564c91f7
      Stefano Brivio 提交于
      The following patches add back the ability to dump IPv4 and IPv6 exception
      routes, and we need to allow selection of regular routes or exceptions.
      
      Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
      iproute2 passes it in dump requests (except for IPv6 cache flush requests,
      this will be fixed in iproute2) and this used to work as long as
      exceptions were stored directly in the FIB, for both IPv4 and IPv6.
      
      Caveat: if strict checking is not requested (that is, if the dump request
      doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
      tables or route types.
      
      In this case, filtering on RTM_F_CLONED would be inconsistent: we would
      fix 'ip route list cache' by returning exception routes and at the same
      time introduce another bug in case another selector is present, e.g. on
      'ip route list cache table main' we would return all exception routes,
      without filtering on tables.
      
      Keep this consistent by applying no filters at all, and dumping both
      routes and exceptions, if strict checking is not requested. iproute2
      currently filters results anyway, and no unwanted results will be
      presented to the user. The kernel will just dump more data than needed.
      
      v7: No changes
      
      v6: Rebase onto net-next, no changes
      
      v5: New patch: add dump_routes and dump_exceptions flags in filter and
          simply clear the unwanted one if strict checking is enabled, don't
          ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
          Skip filtering altogether if no strict checking is requested:
          selecting routes or exceptions only would be inconsistent with the
          fact we can't filter on tables.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      564c91f7
  7. 24 6月, 2019 5 次提交
    • W
      ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF · 7d9e5f42
      Wei Wang 提交于
      For tx path, in most cases, we still have to take refcnt on the dst
      cause the caller is caching the dst somewhere. But it still is
      beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the
      route lookup. It is cause this flag prevents manipulating refcnt on
      net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
      routing table. The null_entry is a shared object and constant updates on
      it cause false sharing.
      
      We converted the current major lookup function ip6_route_output_flags()
      to make use of RT6_LOOKUP_F_DST_NOREF.
      
      Together with the change in the rx path, we see noticable performance
      boost:
      I ran synflood tests between 2 hosts under the same switch. Both hosts
      have 20G mlx NIC, and 8 tx/rx queues.
      Sender sends pure SYN flood with random src IPs and ports using trafgen.
      Receiver has a simple TCP listener on the target port.
      Both hosts have multiple custom rules:
      - For incoming packets, only local table is traversed.
      - For outgoing packets, 3 tables are traversed to find the route.
      The packet processing rate on the receiver is as follows:
      - Before the fix: 3.78Mpps
      - After the fix:  5.50Mpps
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d9e5f42
    • W
      ipv6: convert rx data path to not take refcnt on dst · 67f415dd
      Wei Wang 提交于
      ip6_route_input() is the key function to do the route lookup in the
      rx data path. All the callers to this function are already holding rcu
      lock. So it is fairly easy to convert it to not take refcnt on the dst:
      We pass in flag RT6_LOOKUP_F_DST_NOREF and do skb_dst_set_noref().
      This saves a few atomic inc or dec operations and should boost
      performance overall.
      This also makes the logic more aligned with v4.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67f415dd
    • W
      ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic · d64a1f57
      Wei Wang 提交于
      This patch specifically converts the rule lookup logic to honor this
      flag and not release refcnt when traversing each rule and calling
      lookup() on each routing table.
      Similar to previous patch, we also need some special handling of dst
      entries in uncached list because there is always 1 refcnt taken for them
      even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d64a1f57
    • W
      ipv6: initialize rt6->rt6i_uncached in all pre-allocated dst entries · 74109218
      Wei Wang 提交于
      Initialize rt6->rt6i_uncached on the following pre-allocated dsts:
      net->ipv6.ip6_null_entry
      net->ipv6.ip6_prohibit_entry
      net->ipv6.ip6_blk_hole_entry
      
      This is a preparation patch for later commits to be able to distinguish
      dst entries in uncached list by doing:
      !list_empty(rt6->rt6i_uncached)
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74109218
    • W
      ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route() · 0e09edcc
      Wei Wang 提交于
      This new flag is to instruct the route lookup function to not take
      refcnt on the dst entry. The user which does route lookup with this flag
      must properly use rcu protection.
      ip6_pol_route() is the major route lookup function for both tx and rx
      path.
      In this function:
      Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and
      directly return the route entry. The caller should be holding rcu lock
      when using this flag, and decide whether to take refcnt or not.
      
      One note on the dst cache in the uncached_list:
      As uncached_list does not consume refcnt, one refcnt is always returned
      back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Uncached dst is only possible in the output path. So in such call path,
      caller MUST check if the dst is in the uncached_list before assuming
      that there is no refcnt taken on the returned dst.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e09edcc
  8. 23 6月, 2019 1 次提交
    • I
      ipv6: Error when route does not have any valid nexthops · 9eee3b49
      Ido Schimmel 提交于
      When user space sends invalid information in RTA_MULTIPATH, the nexthop
      list in ip6_route_multipath_add() is empty and 'rt_notif' is set to
      NULL.
      
      The code that emits the in-kernel notifications does not check for this
      condition, which results in a NULL pointer dereference [1].
      
      Fix this by bailing earlier in the function if the parsed nexthop list
      is empty. This is consistent with the corresponding IPv4 code.
      
      v2:
      * Check if parsed nexthop list is empty and bail with extack set
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9190 Comm: syz-executor149 Not tainted 5.2.0-rc5+ #38
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:call_fib6_multipath_entry_notifiers+0xd1/0x1a0
      net/ipv6/ip6_fib.c:396
      Code: 8b b5 30 ff ff ff 48 c7 85 68 ff ff ff 00 00 00 00 48 c7 85 70 ff ff
      ff 00 00 00 00 89 45 88 4c 89 e0 48 c1 e8 03 4c 89 65 80 <42> 80 3c 28 00
      0f 85 9a 00 00 00 48 b8 00 00 00 00 00 fc ff df 4d
      RSP: 0018:ffff88809788f2c0 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 1ffff11012f11e59 RCX: 00000000ffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88809788f390 R08: ffff88809788f8c0 R09: 000000000000000c
      R10: ffff88809788f5d8 R11: ffff88809788f527 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffff88809788f8c0 R15: ffffffff89541d80
      FS:  000055555632c880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 000000009ba7c000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6_route_multipath_add+0xc55/0x1490 net/ipv6/route.c:5094
        inet6_rtm_newroute+0xed/0x180 net/ipv6/route.c:5208
        rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5219
        netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5237
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:646 [inline]
        sock_sendmsg+0xd7/0x130 net/socket.c:665
        ___sys_sendmsg+0x803/0x920 net/socket.c:2286
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2324
        __do_sys_sendmsg net/socket.c:2333 [inline]
        __se_sys_sendmsg net/socket.c:2331 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2331
        do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4401f9
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc09fd0028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401f9
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a80
      R13: 0000000000401b10 R14: 0000000000000000 R15: 0000000000000000
      
      Reported-by: syzbot+382566d339d52cd1a204@syzkaller.appspotmail.com
      Fixes: ebee3cad ("ipv6: Add IPv6 multipath notifications for add / replace")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9eee3b49
  9. 20 6月, 2019 2 次提交
    • A
      netfilter: synproxy: fix building syncookie calls · 8527fa6c
      Arnd Bergmann 提交于
      When either CONFIG_IPV6 or CONFIG_SYN_COOKIES are disabled, the kernel
      fails to build:
      
      include/linux/netfilter_ipv6.h:180:9: error: implicit declaration of function '__cookie_v6_init_sequence'
            [-Werror,-Wimplicit-function-declaration]
              return __cookie_v6_init_sequence(iph, th, mssp);
      include/linux/netfilter_ipv6.h:194:9: error: implicit declaration of function '__cookie_v6_check'
            [-Werror,-Wimplicit-function-declaration]
              return __cookie_v6_check(iph, th, cookie);
      net/ipv6/netfilter.c:237:26: error: use of undeclared identifier '__cookie_v6_init_sequence'; did you mean 'cookie_init_sequence'?
      net/ipv6/netfilter.c:238:21: error: use of undeclared identifier '__cookie_v6_check'; did you mean '__cookie_v4_check'?
      
      Fix the IS_ENABLED() checks to match the function declaration
      and definitions for these.
      
      Fixes: 3006a522 ("netfilter: synproxy: remove module dependency on IPv6 SYNPROXY")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8527fa6c
    • D
      ipv6: Default fib6_type to RTN_UNICAST when not set · c7036d97
      David Ahern 提交于
      A user reported that routes are getting installed with type 0 (RTN_UNSPEC)
      where before the routes were RTN_UNICAST. One example is from accel-ppp
      which apparently still uses the ioctl interface and does not set
      rtmsg_type. Another is the netlink interface where ipv6 does not require
      rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
      ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
      
      Fixes: e8478e80 ("net/ipv6: Save route type in rt6_info")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7036d97
  10. 19 6月, 2019 6 次提交
    • E
      inet: fix various use-after-free in defrags units · d5dd8879
      Eric Dumazet 提交于
      syzbot reported another issue caused by my recent patches. [1]
      
      The issue here is that fqdir_exit() is initiating a work queue
      and immediately returns. A bit later cleanup_net() was able
      to free the MIB (percpu data) and the whole struct net was freed,
      but we had active frag timers that fired and triggered use-after-free.
      
      We need to make sure that timers can catch fqdir->dead being set,
      to bailout.
      
      Since RCU is used for the reader side, this means
      we want to respect an RCU grace period between these operations :
      
      1) qfdir->dead = 1;
      
      2) netns dismantle (freeing of various data structure)
      
      This patch uses new new (struct pernet_operations)->pre_exit
      infrastructure to ensures a full RCU grace period
      happens between fqdir_pre_exit() and fqdir_exit()
      
      This also means we can use a regular work queue, we no
      longer need rcu_work.
      
      Tested:
      
      $ time for i in {1..1000}; do unshare -n /bin/false;done
      
      real	0m2.585s
      user	0m0.160s
      sys	0m2.214s
      
      [1]
      
      BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
      Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860
      
      CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
       ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
       call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
       expire_timers kernel/time/timer.c:1366 [inline]
       __run_timers kernel/time/timer.c:1685 [inline]
       __run_timers kernel/time/timer.c:1653 [inline]
       run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
       __do_softirq+0x25c/0x94c kernel/softirq.c:293
       invoke_softirq kernel/softirq.c:374 [inline]
       irq_exit+0x180/0x1d0 kernel/softirq.c:414
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
       </IRQ>
      RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
      Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
      RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
      RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
      RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
      RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
      R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
      R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
       tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
       tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
       tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
       tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
       security_file_ioctl+0x77/0xc0 security/security.c:1370
       ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4592c9
      Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
      RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
      RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
      R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff
      
      Allocated by task 9047:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_kmalloc mm/kasan/common.c:489 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3326 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
       kmem_cache_zalloc include/linux/slab.h:732 [inline]
       net_alloc net/core/net_namespace.c:386 [inline]
       copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
       create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
       ksys_unshare+0x440/0x980 kernel/fork.c:2692
       __do_sys_unshare kernel/fork.c:2760 [inline]
       __se_sys_unshare kernel/fork.c:2758 [inline]
       __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 2541:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
       __cache_free mm/slab.c:3432 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3698
       net_free net/core/net_namespace.c:402 [inline]
       net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
       net_drop_ns net/core/net_namespace.c:408 [inline]
       cleanup_net+0x538/0x960 net/core/net_namespace.c:571
       process_one_work+0x989/0x1790 kernel/workqueue.c:2269
       worker_thread+0x98/0xe40 kernel/workqueue.c:2415
       kthread+0x354/0x420 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      
      The buggy address belongs to the object at ffff88808b9fe100
       which belongs to the cache net_namespace of size 6784
      The buggy address is located 560 bytes inside of
       6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
      The buggy address belongs to the page:
      page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
      flags: 0x1fffc0000010200(slab|head)
      raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
      raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 3c8fc878 ("inet: frags: rework rhashtable dismantle")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5dd8879
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 · d2912cb1
      Thomas Gleixner 提交于
      Based on 2 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license version 2 as
        published by the free software foundation
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license version 2 as
        published by the free software foundation #
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-only
      
      has been chosen to replace the boilerplate/reference in 4122 file(s).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NEnrico Weigelt <info@metux.net>
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NAllison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2912cb1
    • I
      ipv6: Stop sending in-kernel notifications for each nexthop · d5382fef
      Ido Schimmel 提交于
      Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now
      ready to handle IPv6 multipath notifications.
      
      Therefore, stop ignoring such notifications in both drivers and stop
      sending notification for each added / deleted nexthop.
      
      v2:
      * Remove 'multipath_rt' from 'struct fib6_entry_notifier_info'
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5382fef
    • I
      ipv6: Add IPv6 multipath notification for route delete · 2881fd61
      Ido Schimmel 提交于
      If all the nexthops of a multipath route are being deleted, send one
      notification for the entire route, instead of one per-nexthop.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2881fd61
    • I
      ipv6: Add IPv6 multipath notifications for add / replace · ebee3cad
      Ido Schimmel 提交于
      Emit a notification when a multipath routes is added or replace.
      
      Note that unlike the replace notifications sent from fib6_add_rt2node(),
      it is possible we are sending a 'FIB_EVENT_ENTRY_REPLACE' when a route
      was merely added and not replaced.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebee3cad
    • I
      ipv6: Extend notifier info for multipath routes · d4b96c7b
      Ido Schimmel 提交于
      Extend the IPv6 FIB notifier info with number of sibling routes being
      notified.
      
      This will later allow listeners to process one notification for a
      multipath routes instead of N, where N is the number of nexthops.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4b96c7b
  11. 17 6月, 2019 2 次提交
  12. 15 6月, 2019 3 次提交
  13. 13 6月, 2019 2 次提交
    • E
      tcp: add optional per socket transmit delay · a842fe14
      Eric Dumazet 提交于
      Adding delays to TCP flows is crucial for studying behavior
      of TCP stacks, including congestion control modules.
      
      Linux offers netem module, but it has unpractical constraints :
      - Need root access to change qdisc
      - Hard to setup on egress if combined with non trivial qdisc like FQ
      - Single delay for all flows.
      
      EDT (Earliest Departure Time) adoption in TCP stack allows us
      to enable a per socket delay at a very small cost.
      
      Networking tools can now establish thousands of flows, each of them
      with a different delay, simulating real world conditions.
      
      This requires FQ packet scheduler or a EDT-enabled NIC.
      
      This patchs adds TCP_TX_DELAY socket option, to set a delay in
      usec units.
      
        unsigned int tx_delay = 10000; /* 10 msec */
      
        setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));
      
      Note that FQ packet scheduler limits might need some tweaking :
      
      man tc-fq
      
      PARAMETERS
         limit
             Hard  limit  on  the  real  queue  size. When this limit is
             reached, new packets are dropped. If the value is  lowered,
             packets  are  dropped so that the new limit is met. Default
             is 10000 packets.
      
         flow_limit
             Hard limit on the maximum  number  of  packets  queued  per
             flow.  Default value is 100.
      
      Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc,
      so packets would be dropped if any of the previous limit is hit.
      
      Use of a jump label makes this support runtime-free, for hosts
      never using the option.
      
      Also note that TSQ (TCP Small Queues) limits are slightly changed
      with this patch : we need to account that skbs artificially delayed
      wont stop us providind more skbs to feed the pipe (netem uses
      skb_orphan_partial() for this purpose, but FQ can not use this trick)
      
      Because of that, using big delays might very well trigger
      old bugs in TSO auto defer logic and/or sndbuf limited detection.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a842fe14
    • S
      vrf: Increment Icmp6InMsgs on the original netdev · e1ae5c2e
      Stephen Suryaputra 提交于
      Get the ingress interface and increment ICMP counters based on that
      instead of skb->dev when the the dev is a VRF device.
      
      This is a follow up on the following message:
      https://www.spinics.net/lists/netdev/msg560268.html
      
      v2: Avoid changing skb->dev since it has unintended effect for local
          delivery (David Ahern).
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1ae5c2e
  14. 12 6月, 2019 1 次提交
  15. 11 6月, 2019 2 次提交
    • D
      ipv6: Allow routes to use nexthop objects · 5b98324e
      David Ahern 提交于
      Add support for RTA_NH_ID attribute to allow a user to specify a
      nexthop id to use with a route. fc_nh_id is added to fib6_config to
      hold the value passed in the RTA_NH_ID attribute. If a nexthop id
      is given, the gateway, device, encap and multipath attributes can
      not be set.
      
      Update ip6_route_del to check metric and protocol before nexthop
      specs. If fc_nh_id is set, then it must match the id in the route
      entry. Since IPv6 allows delete of a cached entry (an exception),
      add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
      a fib entry if it is using a nexthop.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b98324e
    • D
      ipv6: Handle all fib6_nh in a nexthop in mtu updates · 2d44234b
      David Ahern 提交于
      Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each
      fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu,
      we need to find the nexthop that correlates to the device and gateway
      in the rt6_info.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d44234b