1. 16 10月, 2018 3 次提交
    • D
      net: Enable kernel side filtering of route dumps · effe6792
      David Ahern 提交于
      Update parsing of route dump request to enable kernel side filtering.
      Allow filtering results by protocol (e.g., which routing daemon installed
      the route), route type (e.g., unicast), table id and nexthop device. These
      amount to the low hanging fruit, yet a huge improvement, for dumping
      routes.
      
      ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
      be used to look up the device index without taking a reference. From
      there filter->dev is only used during dump loops with the lock still held.
      
      Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
      have been filtered should no entries be returned.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      effe6792
    • D
      net/ipv4: Plumb support for filtering route dumps · 18a8021a
      David Ahern 提交于
      Implement kernel side filtering of routes by table id, egress device index,
      protocol and route type. If the table id is given in the filter, lookup the
      table and call fib_table_dump directly for it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18a8021a
    • D
      net: Add struct for fib dump filter · 4724676d
      David Ahern 提交于
      Add struct fib_dump_filter for options on limiting which routes are
      returned in a dump request. The current list is table id, protocol,
      route type, rtm_flags and nexthop device index. struct net is needed
      to lookup the net_device from the index.
      
      Declare the filter for each route dump handler and plumb the new
      arguments from dump handlers to ip_valid_fib_dump_req.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4724676d
  2. 11 10月, 2018 1 次提交
    • S
      net: ipv4: update fnhe_pmtu when first hop's MTU changes · af7d6cce
      Sabrina Dubroca 提交于
      Since commit 5aad1de5 ("ipv4: use separate genid for next hop
      exceptions"), exceptions get deprecated separately from cached
      routes. In particular, administrative changes don't clear PMTU anymore.
      
      As Stefano described in commit e9fa1495 ("ipv6: Reflect MTU changes
      on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
      the local MTU change can become stale:
       - if the local MTU is now lower than the PMTU, that PMTU is now
         incorrect
       - if the local MTU was the lowest value in the path, and is increased,
         we might discover a higher PMTU
      
      Similarly to what commit e9fa1495 did for IPv6, update PMTU in those
      cases.
      
      If the exception was locked, the discovered PMTU was smaller than the
      minimal accepted PMTU. In that case, if the new local MTU is smaller
      than the current PMTU, let PMTU discovery figure out if locking of the
      exception is still needed.
      
      To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
      notifier. By the time the notifier is called, dev->mtu has been
      changed. This patch adds the old MTU as additional information in the
      notifier structure, and a new call_netdevice_notifiers_u32() function.
      
      Fixes: 5aad1de5 ("ipv4: use separate genid for next hop exceptions")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af7d6cce
  3. 09 10月, 2018 1 次提交
    • D
      rtnetlink: Update fib dumps for strict data checking · e8ba330a
      David Ahern 提交于
      Add helper to check netlink message for route dumps. If the strict flag
      is set the dump request is expected to have an rtmsg struct as the header.
      All elements of the struct are expected to be 0 with the exception of
      rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
      can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
      set.
      
      Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
      and ip6mr_rtm_dumproute to call this helper if strict data checking is
      enabled.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8ba330a
  4. 22 9月, 2018 1 次提交
  5. 21 9月, 2018 1 次提交
  6. 29 7月, 2018 1 次提交
  7. 08 7月, 2018 1 次提交
  8. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  9. 29 5月, 2018 1 次提交
  10. 25 5月, 2018 1 次提交
  11. 24 5月, 2018 2 次提交
  12. 18 5月, 2018 1 次提交
  13. 28 3月, 2018 1 次提交
  14. 13 2月, 2018 1 次提交
    • K
      net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai 提交于
      arp_net_ops just addr/removes /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreigh pernet_operations
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocated/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and socket,
      noone else interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happen from inet_twsk_deschedule_put(), but this
      function is safe without any extern locks, as it's synchronized
      inside itself. There are many examples, it's used in different
      context. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seem to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f84c6821
  15. 25 1月, 2018 1 次提交
  16. 21 12月, 2017 1 次提交
  17. 01 11月, 2017 1 次提交
  18. 22 9月, 2017 1 次提交
    • P
      net: avoid a full fib lookup when rp_filter is disabled. · 6e617de8
      Paolo Abeni 提交于
      Since commit 1dced6a8 ("ipv4: Restore accept_local behaviour
      in fib_validate_source()") a full fib lookup is needed even if
      the rp_filter is disabled, if accept_local is false - which is
      the default.
      
      What we really need in the above scenario is just checking
      that the source IP address is not local, and in most case we
      can do that is a cheaper way looking up the ifaddr hash table.
      
      This commit adds a helper for such lookup, and uses it to
      validate the src address when rp_filter is disabled and no
      'local' routes are created by the user space in the relevant
      namespace.
      
      A new ipv4 netns flag is added to account for such routes.
      We need that to preserve the same behavior we had before this
      patch.
      
      It also drops the checks to bail early from __fib_validate_source,
      added by the commit 1dced6a8 ("ipv4: Restore accept_local
      behaviour in fib_validate_source()") they do not give any
      measurable performance improvement: if we do the lookup with are
      on a slower path.
      
      This improves UDP performances for unconnected sockets
      when rp_filter is disabled by 5% and also gives small but
      measurable performance improvement for TCP flood scenarios.
      
      v1 -> v2:
       - use the ifaddr lookup helper in __ip_dev_find(), as suggested
         by Eric
       - fall-back to full lookup if custom local routes are present
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e617de8
  19. 10 8月, 2017 1 次提交
  20. 04 8月, 2017 1 次提交
    • I
      net: core: Make the FIB notification chain generic · 04b1d4e5
      Ido Schimmel 提交于
      The FIB notification chain is currently soley used by IPv4 code.
      However, we're going to introduce IPv6 FIB offload support, which
      requires these notification as well.
      
      As explained in commit c3852ef7 ("ipv4: fib: Replay events when
      registering FIB notifier"), upon registration to the chain, the callee
      receives a full dump of the FIB tables and rules by traversing all the
      net namespaces. The integrity of the dump is ensured by a per-namespace
      sequence counter that is incremented whenever a change to the tables or
      rules occurs.
      
      In order to allow more address families to use the chain, each family is
      expected to register its fib_notifier_ops in its pernet init. These
      operations allow the common code to read the family's sequence counter
      as well as dump its tables and rules in the given net namespace.
      
      Additionally, a 'family' parameter is added to sent notifications, so
      that listeners could distinguish between the different families.
      
      Implement the common code that allows listeners to register to the chain
      and for address families to register their fib_notifier_ops. Subsequent
      patches will implement these operations in IPv6.
      
      In the future, ipmr and ip6mr will be extended to provide these
      notifications as well.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b1d4e5
  21. 21 7月, 2017 1 次提交
    • M
      ipv4: initialize fib_trie prior to register_netdev_notifier call. · 8799a221
      Mahesh Bandewar 提交于
      Net stack initialization currently initializes fib-trie after the
      first call to netdevice_notifier() call. In fact fib_trie initialization
      needs to happen before first rtnl_register(). It does not cause any problem
      since there are no devices UP at this moment, but trying to bring 'lo'
      UP at initialization would make this assumption wrong and exposes the issue.
      
      Fixes following crash
      
       Call Trace:
        ? alternate_node_alloc+0x76/0xa0
        fib_table_insert+0x1b7/0x4b0
        fib_magic.isra.17+0xea/0x120
        fib_add_ifaddr+0x7b/0x190
        fib_netdev_event+0xc0/0x130
        register_netdevice_notifier+0x1c1/0x1d0
        ip_fib_init+0x72/0x85
        ip_rt_init+0x187/0x1e9
        ip_init+0xe/0x1a
        inet_init+0x171/0x26c
        ? ipv4_offload_init+0x66/0x66
        do_one_initcall+0x43/0x160
        kernel_init_freeable+0x191/0x219
        ? rest_init+0x80/0x80
        kernel_init+0xe/0x150
        ret_from_fork+0x22/0x30
       Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
       RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: ffff9b1500017c28
       CR2: 0000000000000014
      
      Fixes: 7b1a74fd ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
      Fixes: 7f9b8052 ("[IPV4]: fib hash|trie initialization")
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8799a221
  22. 05 7月, 2017 1 次提交
  23. 04 7月, 2017 1 次提交
  24. 30 5月, 2017 2 次提交
  25. 23 5月, 2017 2 次提交
  26. 17 5月, 2017 1 次提交
    • D
      net: Improve handling of failures on link and route dumps · f6c5775f
      David Ahern 提交于
      In general, rtnetlink dumps do not anticipate failure to dump a single
      object (e.g., link or route) on a single pass. As both route and link
      objects have grown via more attributes, that is no longer a given.
      
      netlink dumps can handle a failure if the dump function returns an
      error; specifically, netlink_dump adds the return code to the response
      if it is <= 0 so userspace is notified of the failure. The missing
      piece is the rtnetlink dump functions returning the error.
      
      Fix route and link dump functions to return the errors if no object is
      added to an skb (detected by skb->len != 0). IPv6 route dumps
      (rt6_dump_route) already return the error; this patch updates IPv4 and
      link dumps. Other dump functions may need to be ajusted as well.
      Reported-by: NJan Moskyto Matejka <mq@ucw.cz>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6c5775f
  27. 28 4月, 2017 1 次提交
  28. 18 4月, 2017 1 次提交
  29. 14 4月, 2017 1 次提交
  30. 23 3月, 2017 1 次提交
  31. 02 3月, 2017 1 次提交
  32. 27 2月, 2017 1 次提交
  33. 19 1月, 2017 1 次提交
    • D
      lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern 提交于
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state it
      can not drop the rtnl lock to the load module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ed59592
  34. 03 1月, 2017 1 次提交
  35. 25 12月, 2016 1 次提交