1. 26 4月, 2021 19 次提交
  2. 14 4月, 2021 4 次提交
    • C
      SUNRPC: Handle major timeout in xprt_adjust_timeout() · 09252177
      Chris Dion 提交于
      Currently if a major timeout value is reached, but the minor value has
      not been reached, an ETIMEOUT will not be sent back to the caller.
      This can occur if the v4 server is not responding to requests and
      retrans is configured larger than the default of two.
      
      For example, A TCP mount with a configured timeout value of 50 and a
      retransmission count of 3 to a v4 server which is not responding:
      
      1. Initial value and increment set to 5s, maxval set to 20s, retries at 3
      2. Major timeout is set to 20s, minor timeout set to 5s initially
      3. xport_adjust_timeout() is called after 5s, retry with 10s timeout,
         minor timeout is bumped to 10s
      4. And again after another 10s, 15s total time with minor timeout set
         to 15s
      5. After 20s total time xport_adjust_timeout is called as major timeout is
         reached, but skipped because the minor timeout is not reached
             - After this time the cpu spins continually calling
             	 xport_adjust_timeout() and returning 0 for 10 seconds.
      	 As seen on perf sched:
         	 39243.913182 [0005]  mount.nfs[3794] 4607.938      0.017   9746.863
      6. This continues until the 15s minor timeout condition is reached (in
         this case for 10 seconds). After which the ETIMEOUT is processed
         back to the caller, the cpu spinning stops, and normal operations
         continue
      
      Fixes: 7de62bc0 ("SUNRPC dont update timeout value on connection reset")
      Signed-off-by: NChris Dion <Christopher.Dion@dell.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      09252177
    • C
      SUNRPC: Remove trace_xprt_transmit_queued · 6cf23783
      Chuck Lever 提交于
      This tracepoint can crash when dereferencing snd_task because
      when some transports connect, they put a cookie in that field
      instead of a pointer to an rpc_task.
      
      BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
      Read of size 2 at addr ffff8881a83bd3a0 by task git/331872
      
      CPU: 11 PID: 331872 Comm: git Tainted: G S                5.12.0-rc2-00007-g3ab6e585a7f9 #1453
      Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Call Trace:
       dump_stack+0x9c/0xcf
       print_address_description.constprop.0+0x18/0x239
       kasan_report+0x174/0x1b0
       trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
       xprt_prepare_transmit+0x8e/0xc1 [sunrpc]
       call_transmit+0x4d/0xc6 [sunrpc]
      
      Fixes: 9ce07ae5 ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      6cf23783
    • C
      SUNRPC: Add tracepoint that fires when an RPC is retransmitted · e936a597
      Chuck Lever 提交于
      A separate tracepoint can be left enabled all the time to capture
      rare but important retransmission events. So for example:
      
      kworker/u26:3-568   [009]   156.967933: xprt_retransmit:      task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2
      
      Or, for example, enable all nfs and nfs4 tracepoints, and set up a
      trigger to disable tracing when xprt_retransmit fires to capture
      everything that leads up to it.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      e936a597
    • C
      SUNRPC: Move fault injection call sites · 7638e0bf
      Chuck Lever 提交于
      I've hit some crashes that occur in the xprt_rdma_inject_disconnect
      path. It appears that, for some provides, rdma_disconnect() can
      take so long that the transport can disconnect and release its
      hardware resources while rdma_disconnect() is still running,
      resulting in a UAF in the provider.
      
      The transport's fault injection method may depend on the stability
      of transport data structures. That means it needs to be invoked
      only from contexts that hold the transport write lock.
      
      Fixes: 4a068258 ("SUNRPC: Transport fault injection")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      7638e0bf
  3. 05 4月, 2021 3 次提交
  4. 24 3月, 2021 2 次提交
    • V
      net: bridge: don't notify switchdev for local FDB addresses · 6ab4c311
      Vladimir Oltean 提交于
      As explained in this discussion:
      https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
      
      the switchdev notifiers for FDB entries managed to have a zero-day bug.
      The bridge would not say that this entry is local:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      and the switchdev driver would be more than happy to offload it as a
      normal static FDB entry. This is despite the fact that 'local' and
      non-'local' entries have completely opposite directions: a local entry
      is locally terminated and not forwarded, whereas a static entry is
      forwarded and not locally terminated. So, for example, DSA would install
      this entry on swp0 instead of installing it on the CPU port as it should.
      
      There is an even sadder part, which is that the 'local' flag is implicit
      if 'static' is not specified, meaning that this command produces the
      same result of adding a 'local' entry:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master
      
      I've updated the man pages for 'bridge', and after reading it now, it
      should be pretty clear to any user that the commands above were broken
      and should have never resulted in the 00:01:02:03:04:05 address being
      forwarded (this behavior is coherent with non-switchdev interfaces):
      https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
      If you're a user reading this and this is what you want, just use:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master static
      
      Because switchdev should have given drivers the means from day one to
      classify FDB entries as local/non-local, but didn't, it means that all
      drivers are currently broken. So we can just as well omit the switchdev
      notifications for local FDB entries, which is exactly what this patch
      does to close the bug in stable trees. For further development work
      where drivers might want to trap the local FDB entries to the host, we
      can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
      selectively make drivers act upon that bit, while all the others ignore
      those entries if the 'is_local' bit is set.
      
      Fixes: 6b26b51b ("net: bridge: Add support for notifying devices about FDB add/del")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ab4c311
    • M
      net/sched: act_ct: clear post_ct if doing ct_clear · 8ca1b090
      Marcelo Ricardo Leitner 提交于
      Invalid detection works with two distinct moments: act_ct tries to find
      a conntrack entry and set post_ct true, indicating that that was
      attempted. Then, when flow dissector tries to dissect CT info and no
      entry is there, it knows that it was tried and no entry was found, and
      synthesizes/sets
                        key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                                        TCA_FLOWER_KEY_CT_FLAGS_INVALID;
      mimicing what OVS does.
      
      OVS has this a bit more streamlined, as it recomputes the key after
      trying to find a conntrack entry for it.
      
      Issue here is, when we have 'tc action ct clear', it didn't clear
      post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to
      the above. The fix, thus, is to clear it.
      
      Reproducer rules:
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \
      	protocol ip flower ip_proto tcp ct_state -trk \
      	action ct zone 1 pipe \
      	action goto chain 2
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \
      	protocol ip flower \
      	action ct clear pipe \
      	action goto chain 4
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \
      	protocol ip flower ct_state -trk \
      	action mirred egress redirect dev enp130s0f1np1_0
      
      With the fix, the 3rd rule matches, like it does with OVS kernel
      datapath.
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Nwenxu <wenxu@ucloud.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ca1b090
  5. 23 3月, 2021 2 次提交
  6. 21 3月, 2021 1 次提交
  7. 20 3月, 2021 2 次提交
  8. 19 3月, 2021 2 次提交
    • J
      net: check all name nodes in __dev_alloc_name · 6c015a22
      Jiri Bohac 提交于
      __dev_alloc_name(), when supplied with a name containing '%d',
      will search for the first available device number to generate a
      unique device name.
      
      Since commit ff927412 ("net:
      introduce name_node struct to be used in hashlist") network
      devices may have alternate names.  __dev_alloc_name() does take
      these alternate names into account, possibly generating a name
      that is already taken and failing with -ENFILE as a result.
      
      This demonstrates the bug:
      
          # rmmod dummy 2>/dev/null
          # ip link property add dev lo altname dummy0
          # modprobe dummy numdummies=1
          modprobe: ERROR: could not insert 'dummy': Too many open files in system
      
      Instead of creating a device named dummy1, modprobe fails.
      
      Fix this by checking all the names in the d->name_node list, not just d->name.
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c015a22
    • J
      ipv6: weaken the v4mapped source check · dcc32f4f
      Jakub Kicinski 提交于
      This reverts commit 6af1799a.
      
      Commit 6af1799a ("ipv6: drop incoming packets having a v4mapped
      source address") introduced an input check against v4mapped addresses.
      Use of such addresses on the wire is indeed questionable and not
      allowed on public Internet. As the commit pointed out
      
        https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02
      
      lists potential issues.
      
      Unfortunately there are applications which use v4mapped addresses,
      and breaking them is a clear regression. For example v4mapped
      addresses (or any semi-valid addresses, really) may be used
      for uni-direction event streams or packet export.
      
      Since the issue which sparked the addition of the check was with
      TCP and request_socks in particular push the check down to TCPv6
      and DCCP. This restores the ability to receive UDPv6 packets with
      v4mapped address as the source.
      
      Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the
      user-visible changes.
      
      Fixes: 6af1799a ("ipv6: drop incoming packets having a v4mapped source address")
      Reported-by: NSunyi Shao <sunyishao@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Acked-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcc32f4f
  9. 18 3月, 2021 5 次提交