1. 09 4月, 2019 3 次提交
  2. 04 4月, 2019 2 次提交
    • D
      ipv6: Flip to fib_nexthop_info · c0a72077
      David Ahern 提交于
      Export fib_nexthop_info and fib_add_nexthop for use by IPv6 code.
      Remove rt6_nexthop_info and rt6_add_nexthop in favor of the IPv4
      versions. Update fib_nexthop_info for IPv6 linkdown check and
      RTA_GATEWAY for AF_INET6.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0a72077
    • D
      ipv4: Add fib_nh_common to fib_result · eba618ab
      David Ahern 提交于
      Most of the ipv4 code only needs data from fib_nh_common. Add
      fib_nh_common selection to fib_result and update users to use it.
      
      Right now, fib_nh_common in fib_result will point to a fib_nh struct
      that is embedded within a fib_info:
      
              fib_info  --> fib_nh
                            fib_nh
                            ...
                            fib_nh
                              ^
          fib_result->nhc ----+
      
      Later, nhc can point to a fib_nh within a nexthop struct:
      
              fib_info --> nexthop --> fib_nh
                                         ^
          fib_result->nhc ---------------+
      
      or for a nexthop group:
      
              fib_info --> nexthop --> nexthop --> fib_nh
                                       nexthop --> fib_nh
                                       ...
                                       nexthop --> fib_nh
                                                     ^
          fib_result->nhc ---------------------------+
      
      In all cases nhsel within fib_result will point to which leg in the
      multipath route is used.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eba618ab
  3. 30 3月, 2019 5 次提交
  4. 16 1月, 2019 1 次提交
    • I
      net: ipv4: Fix memory leak in network namespace dismantle · f97f4dd8
      Ido Schimmel 提交于
      IPv4 routing tables are flushed in two cases:
      
      1. In response to events in the netdev and inetaddr notification chains
      2. When a network namespace is being dismantled
      
      In both cases only routes associated with a dead nexthop group are
      flushed. However, a nexthop group will only be marked as dead in case it
      is populated with actual nexthops using a nexthop device. This is not
      the case when the route in question is an error route (e.g.,
      'blackhole', 'unreachable').
      
      Therefore, when a network namespace is being dismantled such routes are
      not flushed and leaked [1].
      
      To reproduce:
      # ip netns add blue
      # ip -n blue route add unreachable 192.0.2.0/24
      # ip netns del blue
      
      Fix this by not skipping error routes that are not marked with
      RTNH_F_DEAD when flushing the routing tables.
      
      To prevent the flushing of such routes in case #1, add a parameter to
      fib_table_flush() that indicates if the table is flushed as part of
      namespace dismantle or not.
      
      Note that this problem does not exist in IPv6 since error routes are
      associated with the loopback device.
      
      [1]
      unreferenced object 0xffff888066650338 (size 56):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
          e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
        backtrace:
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      unreferenced object 0xffff888061621c88 (size 48):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
        backtrace:
          [<00000000733609e3>] fib_table_insert+0x978/0x1500
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      
      Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f97f4dd8
  5. 25 10月, 2018 1 次提交
  6. 16 10月, 2018 3 次提交
    • D
      net: Enable kernel side filtering of route dumps · effe6792
      David Ahern 提交于
      Update parsing of route dump request to enable kernel side filtering.
      Allow filtering results by protocol (e.g., which routing daemon installed
      the route), route type (e.g., unicast), table id and nexthop device. These
      amount to the low hanging fruit, yet a huge improvement, for dumping
      routes.
      
      ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
      be used to look up the device index without taking a reference. From
      there filter->dev is only used during dump loops with the lock still held.
      
      Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
      have been filtered should no entries be returned.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      effe6792
    • D
      net/ipv4: Plumb support for filtering route dumps · 18a8021a
      David Ahern 提交于
      Implement kernel side filtering of routes by table id, egress device index,
      protocol and route type. If the table id is given in the filter, lookup the
      table and call fib_table_dump directly for it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18a8021a
    • D
      net: Add struct for fib dump filter · 4724676d
      David Ahern 提交于
      Add struct fib_dump_filter for options on limiting which routes are
      returned in a dump request. The current list is table id, protocol,
      route type, rtm_flags and nexthop device index. struct net is needed
      to lookup the net_device from the index.
      
      Declare the filter for each route dump handler and plumb the new
      arguments from dump handlers to ip_valid_fib_dump_req.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4724676d
  7. 11 10月, 2018 1 次提交
    • S
      net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Sabrina Dubroca 提交于
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af7d6cce
  8. 09 10月, 2018 1 次提交
    • D
      rtnetlink: Update fib dumps for strict data checking · e8ba330a
      David Ahern 提交于
      Add helper to check netlink message for route dumps. If the strict flag
      is set the dump request is expected to have an rtmsg struct as the header.
      All elements of the struct are expected to be 0 with the exception of
      rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
      can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
      set.
      
      Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
      and ip6mr_rtm_dumproute to call this helper if strict data checking is
      enabled.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8ba330a
  9. 21 9月, 2018 1 次提交
  10. 22 5月, 2018 1 次提交
  11. 15 3月, 2018 1 次提交
    • S
      ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu · d52e5a7e
      Sabrina Dubroca 提交于
      Prior to the rework of PMTU information storage in commit
      2c8cec5c ("ipv4: Cache learned PMTU information in inetpeer."),
      when a PMTU event advertising a PMTU smaller than
      net.ipv4.route.min_pmtu was received, we would disable setting the DF
      flag on packets by locking the MTU metric, and set the PMTU to
      net.ipv4.route.min_pmtu.
      
      Since then, we don't disable DF, and set PMTU to
      net.ipv4.route.min_pmtu, so the intermediate router that has this link
      with a small MTU will have to drop the packets.
      
      This patch reestablishes pre-2.6.39 behavior by splitting
      rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
      rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
      and is checked in ip_dont_fragment().
      
      One possible workaround is to set net.ipv4.route.min_pmtu to a value low
      enough to accommodate the lowest MTU encountered.
      
      Fixes: 2c8cec5c ("ipv4: Cache learned PMTU information in inetpeer.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d52e5a7e
  12. 05 3月, 2018 1 次提交
  13. 01 3月, 2018 2 次提交
  14. 29 9月, 2017 1 次提交
  15. 04 8月, 2017 2 次提交
    • I
      net: fib_rules: Implement notification logic in core · 1b2a4440
      Ido Schimmel 提交于
      Unlike the routing tables, the FIB rules share a common core, so instead
      of replicating the same logic for each address family we can simply dump
      the rules and send notifications from the core itself.
      
      To protect the integrity of the dump, a rules-specific sequence counter
      is added for each address family and incremented whenever a rule is
      added or deleted (under RTNL).
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b2a4440
    • I
      net: core: Make the FIB notification chain generic · 04b1d4e5
      Ido Schimmel 提交于
      The FIB notification chain is currently soley used by IPv4 code.
      However, we're going to introduce IPv6 FIB offload support, which
      requires these notification as well.
      
      As explained in commit c3852ef7 ("ipv4: fib: Replay events when
      registering FIB notifier"), upon registration to the chain, the callee
      receives a full dump of the FIB tables and rules by traversing all the
      net namespaces. The integrity of the dump is ensured by a per-namespace
      sequence counter that is incremented whenever a change to the tables or
      rules occurs.
      
      In order to allow more address families to use the chain, each family is
      expected to register its fib_notifier_ops in its pernet init. These
      operations allow the common code to read the family's sequence counter
      as well as dump its tables and rules in the given net namespace.
      
      Additionally, a 'family' parameter is added to sent notifications, so
      that listeners could distinguish between the different families.
      
      Implement the common code that allows listeners to register to the chain
      and for address families to register their fib_notifier_ops. Subsequent
      patches will implement these operations in IPv6.
      
      In the future, ipmr and ip6mr will be extended to provide these
      notifications as well.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b1d4e5
  16. 03 8月, 2017 1 次提交
  17. 04 7月, 2017 1 次提交
  18. 30 5月, 2017 1 次提交
  19. 27 5月, 2017 2 次提交
  20. 23 5月, 2017 1 次提交
  21. 22 3月, 2017 1 次提交
    • N
      net: ipv4: add support for ECMP hash policy choice · bf4e0a3d
      Nikolay Aleksandrov 提交于
      This patch adds support for ECMP hash policy choice via a new sysctl
      called fib_multipath_hash_policy and also adds support for L4 hashes.
      The current values for fib_multipath_hash_policy are:
       0 - layer 3 (default)
       1 - layer 4
      If there's an skb hash already set and it matches the chosen policy then it
      will be used instead of being calculated (currently only for L4).
      In L3 mode we always calculate the hash due to the ICMP error special
      case, the flow dissector's field consistentification should handle the
      address order thus we can remove the address reversals.
      If the skb is provided we always use it for the hash calculation,
      otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf4e0a3d
  22. 17 3月, 2017 2 次提交
    • I
      ipv4: fib_rules: Add notifier info to FIB rules notifications · 6a003a5f
      Ido Schimmel 提交于
      Whenever a FIB rule is added or removed, a notification is sent in the
      FIB notification chain. However, listeners don't have a way to tell
      which rule was added or removed.
      
      This is problematic as we would like to give listeners the ability to
      decide which action to execute based on the notified rule. Specifically,
      offloading drivers should be able to determine if they support the
      reflection of the notified FIB rule and flush their LPM tables in case
      they don't.
      
      Do that by adding a notifier info to these notifications and embed the
      common FIB rule struct in it.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a003a5f
    • I
      ipv4: fib_rules: Check if rule is a default rule · 3c71006d
      Ido Schimmel 提交于
      Currently, when non-default (custom) FIB rules are used, devices capable
      of layer 3 offloading flush their tables and let the kernel do the
      forwarding instead.
      
      When these devices' drivers are loaded they register to the FIB
      notification chain, which lets them know about the existence of any
      custom FIB rules. This is done by sending a RULE_ADD notification based
      on the value of 'net->ipv4.fib_has_custom_rules'.
      
      This approach is problematic when VRF offload is taken into account, as
      upon the creation of the first VRF netdev, a l3mdev rule is programmed
      to direct skbs to the VRF's table.
      
      Instead of merely reading the above value and sending a single RULE_ADD
      notification, we should iterate over all the FIB rules and send a
      detailed notification for each, thereby allowing offloading drivers to
      sanitize the rules they don't support and potentially flush their
      tables.
      
      While l3mdev rules are uniquely marked, the default rules are not.
      Therefore, when they are being notified they might invoke offloading
      drivers to unnecessarily flush their tables.
      
      Solve this by adding an helper to check if a FIB rule is a default rule.
      Namely, its selector should match all packets and its action should
      point to the local, main or default tables.
      
      As noted by David Ahern, uniquely marking the default rules is
      insufficient. When using VRFs, it's common to avoid false hits by moving
      the rule for the local table to just before the main table:
      
      Default configuration:
      $ ip rule show
      0:      from all lookup local
      32766:  from all lookup main
      32767:  from all lookup default
      
      Common configuration with VRFs:
      $ ip rule show
      1000:   from all lookup [l3mdev-table]
      32765:  from all lookup local
      32766:  from all lookup main
      32767:  from all lookup default
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c71006d
  23. 11 3月, 2017 2 次提交
  24. 11 2月, 2017 1 次提交
  25. 09 2月, 2017 1 次提交
  26. 07 1月, 2017 1 次提交