1. 13 3月, 2020 1 次提交
  2. 25 2月, 2020 1 次提交
    • A
      ip6mr: Fix RCU list debugging warning · 28b380e2
      Amol Grover 提交于
      ip6mr_for_each_table() macro uses list_for_each_entry_rcu()
      for traversing outside an RCU read side critical section
      but under the protection of rtnl_mutex. Hence add the
      corresponding lockdep expression to silence the following
      false-positive warnings:
      
      [    4.319479] =============================
      [    4.319480] WARNING: suspicious RCU usage
      [    4.319482] 5.5.4-stable #17 Tainted: G            E
      [    4.319483] -----------------------------
      [    4.319485] net/ipv6/ip6mr.c:1243 RCU-list traversed in non-reader section!!
      
      [    4.456831] =============================
      [    4.456832] WARNING: suspicious RCU usage
      [    4.456834] 5.5.4-stable #17 Tainted: G            E
      [    4.456835] -----------------------------
      [    4.456837] net/ipv6/ip6mr.c:1582 RCU-list traversed in non-reader section!!
      Signed-off-by: NAmol Grover <frextrite@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28b380e2
  3. 05 10月, 2019 1 次提交
  4. 07 9月, 2019 1 次提交
    • H
      ipmr: remove hard code cache_resolve_queue_len limit · 0079ad8e
      Hangbin Liu 提交于
      This is a re-post of previous patch wrote by David Miller[1].
      
      Phil Karn reported[2] that on busy networks with lots of unresolved
      multicast routing entries, the creation of new multicast group routes
      can be extremely slow and unreliable.
      
      The reason is we hard-coded multicast route entries with unresolved source
      addresses(cache_resolve_queue_len) to 10. If some multicast route never
      resolves and the unresolved source addresses increased, there will
      be no ability to create new multicast route cache.
      
      To resolve this issue, we need either add a sysctl entry to make the
      cache_resolve_queue_len configurable, or just remove cache_resolve_queue_len
      limit directly, as we already have the socket receive queue limits of mrouted
      socket, pointed by David.
      
      >From my side, I'd perfer to remove the cache_resolve_queue_len limit instead
      of creating two more(IPv4 and IPv6 version) sysctl entry.
      
      [1] https://lkml.org/lkml/2018/7/22/11
      [2] https://lkml.org/lkml/2018/7/21/343
      
      v3: instead of remove cache_resolve_queue_len totally, let's only remove
      the hard code limit when allocate the unresolved cache, as Eric Dumazet
      suggested, so we don't need to re-count it in other places.
      
      v2: hold the mfc_unres_lock while walking the unresolved list in
      queue_count(), as Nikolay Aleksandrov remind.
      Reported-by: NPhil Karn <karn@ka9q.net>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0079ad8e
  5. 31 5月, 2019 1 次提交
  6. 08 4月, 2019 1 次提交
    • N
      rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      NeilBrown 提交于
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when lockings costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair -
      if multiple CPUs repeatedly contend of the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct is introduced to represent the
        pointer plus lock-bit
      that is stored in the bucket-table.  This is "struct rhash_lock_head"
      and is empty.  A pointer to this needs to be cast to either an
      unsigned lock, or a "struct rhash_head *" to be useful.
      Variables of this type are most often called "bkt".
      
      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'blk' is used, together with correct locking protocol.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f0db018
  7. 05 3月, 2019 1 次提交
    • I
      ip6mr: Do not call __IP6_INC_STATS() from preemptible context · 87c11f1d
      Ido Schimmel 提交于
      Similar to commit 44f49dd8 ("ipmr: fix possible race resulting from
      improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
      assume preemption is disabled when incrementing the counter and
      accessing a per-CPU variable.
      
      Preemption can be enabled when we add a route in process context that
      corresponds to packets stored in the unresolved queue, which are then
      forwarded using this route [1].
      
      Fix this by using IP6_INC_STATS() which takes care of disabling
      preemption on architectures where it is needed.
      
      [1]
      [  157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
      [  157.460409] caller is ip6mr_forward2+0x73e/0x10e0
      [  157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
      [  157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  157.460461] Call Trace:
      [  157.460486]  dump_stack+0xf9/0x1be
      [  157.460553]  check_preemption_disabled+0x1d6/0x200
      [  157.460576]  ip6mr_forward2+0x73e/0x10e0
      [  157.460705]  ip6_mr_forward+0x9a0/0x1510
      [  157.460771]  ip6mr_mfc_add+0x16b3/0x1e00
      [  157.461155]  ip6_mroute_setsockopt+0x3cb/0x13c0
      [  157.461384]  do_ipv6_setsockopt.isra.8+0x348/0x4060
      [  157.462013]  ipv6_setsockopt+0x90/0x110
      [  157.462036]  rawv6_setsockopt+0x4a/0x120
      [  157.462058]  __sys_setsockopt+0x16b/0x340
      [  157.462198]  __x64_sys_setsockopt+0xbf/0x160
      [  157.462220]  do_syscall_64+0x14d/0x610
      [  157.462349]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 0912ea38 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NAmit Cohen <amitc@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87c11f1d
  8. 22 2月, 2019 1 次提交
  9. 28 1月, 2019 1 次提交
    • N
      ip6mr: Fix notifiers call on mroute_clean_tables() · 146820cc
      Nir Dotan 提交于
      When the MC route socket is closed, mroute_clean_tables() is called to
      cleanup existing routes. Mistakenly notifiers call was put on the cleanup
      of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment a non initialized refcount_t object [1] and then when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada ("selftests: forwarding: Add multicast routing test").
      
      Fix that by putting notifiers call on the resolved entries traversal,
      instead of on the unresolved entries traversal.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: NNir Dotan <nird@mellanox.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      146820cc
  10. 02 1月, 2019 1 次提交
    • W
      ip: validate header length on virtual device xmit · cb9f1b78
      Willem de Bruijn 提交于
      KMSAN detected read beyond end of buffer in vti and sit devices when
      passing truncated packets with PF_PACKET. The issue affects additional
      ip tunnel devices.
      
      Extend commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the
      inner header") and commit ccfec9e5 ("ip_tunnel: be careful when
      accessing the inner header").
      
      Move the check to a separate helper and call at the start of each
      ndo_start_xmit function in net/ipv4 and net/ipv6.
      
      Minor changes:
      - convert dev_kfree_skb to kfree_skb on error path,
        as dev_kfree_skb calls consume_skb which is not for error paths.
      - use pskb_network_may_pull even though that is pedantic here,
        as the same as pskb_may_pull for devices without llheaders.
      - do not cache ipv6 hdrs if used only once
        (unsafe across pskb_may_pull, was more relevant to earlier patch)
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb9f1b78
  11. 18 12月, 2018 1 次提交
  12. 15 12月, 2018 1 次提交
  13. 07 12月, 2018 1 次提交
  14. 25 10月, 2018 1 次提交
  15. 16 10月, 2018 3 次提交
  16. 09 10月, 2018 1 次提交
    • D
      rtnetlink: Update fib dumps for strict data checking · e8ba330a
      David Ahern 提交于
      Add helper to check netlink message for route dumps. If the strict flag
      is set the dump request is expected to have an rtmsg struct as the header.
      All elements of the struct are expected to be 0 with the exception of
      rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
      can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
      set.
      
      Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
      and ip6mr_rtm_dumproute to call this helper if strict data checking is
      enabled.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8ba330a
  17. 03 10月, 2018 1 次提交
  18. 22 6月, 2018 1 次提交
    • N
      rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown 提交于
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small changes can required a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eb71a9d
  19. 06 6月, 2018 2 次提交
  20. 16 5月, 2018 1 次提交
  21. 23 4月, 2018 1 次提交
  22. 28 3月, 2018 1 次提交
  23. 27 3月, 2018 3 次提交
  24. 08 3月, 2018 1 次提交
  25. 02 3月, 2018 11 次提交