1. 08 11月, 2019 5 次提交
    • A
      mac80211: fix station inactive_time shortly after boot · 285531f9
      Ahmed Zaki 提交于
      In the first 5 minutes after boot (time of INITIAL_JIFFIES),
      ieee80211_sta_last_active() returns zero if last_ack is zero. This
      leads to "inactive time" showing jiffies_to_msecs(jiffies).
      
       # iw wlan0 station get fc:ec:da:64:a6:dd
       Station fc:ec:da:64:a6:dd (on wlan0)
      	inactive time:	4294894049 ms
      	.
      	.
      	connected time:	70 seconds
      
      Fix by returning last_rx if last_ack == 0.
      Signed-off-by: NAhmed Zaki <anzaki@gmail.com>
      Link: https://lore.kernel.org/r/20191031121243.27694-1-anzaki@gmail.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      285531f9
    • J
      mac80211: fix ieee80211_txq_setup_flows() failure path · 6dd47d97
      Johannes Berg 提交于
      If ieee80211_txq_setup_flows() fails, we don't clean up LED
      state properly, leading to crashes later on, fix that.
      
      Fixes: dc8b274f ("mac80211: Move up init of TXQs")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Acked-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20191105154110.1ccf7112ba5d.I0ba865792446d051867b33153be65ce6b063d98c@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
      6dd47d97
    • D
      ipv4: Fix table id reference in fib_sync_down_addr · e0a31262
      David Ahern 提交于
      Hendrik reported routes in the main table using source address are not
      removed when the address is removed. The problem is that fib_sync_down_addr
      does not account for devices in the default VRF which are associated
      with the main table. Fix by updating the table id reference.
      
      Fixes: 5a56a0b3 ("net: Don't delete routes in different VRFs")
      Reported-by: NHendrik Donner <hd@os-cillation.de>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0a31262
    • E
      ipv6: fixes rt6_probe() and fib6_nh->last_probe init · 1bef4c22
      Eric Dumazet 提交于
      While looking at a syzbot KCSAN report [1], I found multiple
      issues in this code :
      
      1) fib6_nh->last_probe has an initial value of 0.
      
         While probably okay on 64bit kernels, this causes an issue
         on 32bit kernels since the time_after(jiffies, 0 + interval)
         might be false ~24 days after boot (for HZ=1000)
      
      2) The data-race found by KCSAN
         I could use READ_ONCE() and WRITE_ONCE(), but we also can
         take the opportunity of not piling-up too many rt6_probe_deferred()
         works by using instead cmpxchg() so that only one cpu wins the race.
      
      [1]
      BUG: KCSAN: data-race in find_match / find_match
      
      write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1:
       rt6_probe net/ipv6/route.c:663 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x5bd/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735
      
      read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0:
       rt6_probe net/ipv6/route.c:657 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x521/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh")
      Fixes: f547fac6 ("ipv6: rate-limit probes for neighbourless routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bef4c22
    • P
      nfc: netlink: fix double device reference drop · 025ec40b
      Pan Bian 提交于
      The function nfc_put_device(dev) is called twice to drop the reference
      to dev when there is no associated local llcp. Remove one of them to fix
      the bug.
      
      Fixes: 52feb444 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
      Fixes: d9b8d8e1 ("NFC: llcp: Service Name Lookup netlink interface")
      Signed-off-by: NPan Bian <bianpan2016@163.com>
      Reviewed-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      025ec40b
  2. 07 11月, 2019 3 次提交
    • U
      net/smc: fix ethernet interface refcounting · 98f33755
      Ursula Braun 提交于
      If a pnet table entry is to be added mentioning a valid ethernet
      interface, but an invalid infiniband or ISM device, the dev_put()
      operation for the ethernet interface is called twice, resulting
      in a negative refcount for the ethernet interface, which disables
      removal of such a network interface.
      
      This patch removes one of the dev_put() calls.
      
      Fixes: 890a2cb4 ("net/smc: rework pnet table")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98f33755
    • J
      net/tls: add a TX lock · 79ffe608
      Jakub Kicinski 提交于
      TLS TX needs to release and re-acquire the socket lock if send buffer
      fills up.
      
      TLS SW TX path currently depends on only allowing one thread to enter
      the function by the abuse of sk_write_pending. If another writer is
      already waiting for memory no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel;
         meaning that this scheme works for single extra thread (second
         application thread or delayed work) because memory becoming
         available will send a wake up request, but as Mallesham and
         Pooja report with larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present leading to crashes as writers don't
         expect state to change under their feet (same records get pushed
         and freed multiple times); it's hard to reliably bail from the
         work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen will make the code basically open
      code a mutex. Just use a mutex.
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: NMallesham  Jatharakonda <mallesh537@gmail.com>
      Reported-by: NPooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79ffe608
    • J
      net/tls: don't pay attention to sk_write_pending when pushing partial records · 02b1fa07
      Jakub Kicinski 提交于
      sk_write_pending being not zero does not guarantee that partial
      record will be pushed. If the thread waiting for memory times out
      the pending record may get stuck.
      
      In case of tls_device there is no path where parial record is
      set and writer present in the first place. Partial record is
      set only in tls_push_sg() and tls_push_sg() will return an
      error immediately. All tls_device callers of tls_push_sg()
      will return (and not wait for memory) if it failed.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02b1fa07
  3. 06 11月, 2019 3 次提交
    • J
      net/tls: fix sk_msg trim on fallback to copy mode · 683916f6
      Jakub Kicinski 提交于
      sk_msg_trim() tries to only update curr pointer if it falls into
      the trimmed region. The logic, however, does not take into the
      account pointer wrapping that sk_msg_iter_var_prev() does nor
      (as John points out) the fact that msg->sg is a ring buffer.
      
      This means that when the message was trimmed completely, the new
      curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
      neither smaller than any other value, nor would it actually be
      correct.
      
      Special case the trimming to 0 length a little bit and rework
      the comparison between curr and end to take into account wrapping.
      
      This bug caused the TLS code to not copy all of the message, if
      zero copy filled in fewer sg entries than memcopy would need.
      
      Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
      
      v2:
       - take into account that msg->sg is a ring buffer (John).
      
      Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1)
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com
      Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com
      Co-developed-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      683916f6
    • J
      net: sched: prevent duplicate flower rules from tcf_proto destroy race · 59eb87cb
      John Hurley 提交于
      When a new filter is added to cls_api, the function
      tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
      determine if the tcf_proto is duplicated in the chain's hashtable. It then
      creates a new entry or continues with an existing one. In cls_flower, this
      allows the function fl_ht_insert_unque to determine if a filter is a
      duplicate and reject appropriately, meaning that the duplicate will not be
      passed to drivers via the offload hooks. However, when a tcf_proto is
      destroyed it is removed from its chain before a hardware remove hook is
      hit. This can lead to a race whereby the driver has not received the
      remove message but duplicate flows can be accepted. This, in turn, can
      lead to the offload driver receiving incorrect duplicate flows and out of
      order add/delete messages.
      
      Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
      hash table per block stores each unique chain/protocol/prio being
      destroyed. This entry is only removed when the full destroy (and hardware
      offload) has completed. If a new flow is being added with the same
      identiers as a tc_proto being detroyed, then the add request is replayed
      until the destroy is complete.
      
      Fixes: 8b64678e ("net: sched: refactor tp insert/delete for concurrent execution")
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Reported-by: NLouis Peens <louis.peens@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59eb87cb
    • I
      taprio: fix panic while hw offload sched list swap · 0763b3e8
      Ivan Khoronzhuk 提交于
      Don't swap oper and admin schedules too early, it's not correct and
      causes crash.
      
      Steps to reproduce:
      
      1)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          num_tc 3 \
          map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
          queues 1@0 1@1 1@2 \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 80000 \
          sched-entry S 02 15000 \
          sched-entry S 04 40000 \
          flags 2
      
      2)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 90000 \
          sched-entry S 02 20000 \
          sched-entry S 04 40000 \
          flags 2
      
      3)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 150000 \
          sched-entry S 02 200000 \
          sched-entry S 04 40000 \
          flags 2
      
      Do 2 3 2 .. steps  more times if not happens and observe:
      
      [  305.832319] Unable to handle kernel write to read-only memory at
      virtual address ffff0000087ce7f0
      [  305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
      [  305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
      
      [...]
      
      [  306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
      [  306.022422] Call trace:
      [  306.024866]  taprio_free_sched_cb+0x4c/0x98
      [  306.029040]  rcu_process_callbacks+0x25c/0x410
      [  306.033476]  __do_softirq+0x10c/0x208
      [  306.037132]  irq_exit+0xb8/0xc8
      [  306.040267]  __handle_domain_irq+0x64/0xb8
      [  306.044352]  gic_handle_irq+0x7c/0x178
      [  306.048092]  el1_irq+0xb0/0x128
      [  306.051227]  arch_cpu_idle+0x10/0x18
      [  306.054795]  do_idle+0x120/0x138
      [  306.058015]  cpu_startup_entry+0x20/0x28
      [  306.061931]  rest_init+0xcc/0xd8
      [  306.065154]  start_kernel+0x3bc/0x3e4
      [  306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
      [  306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
      [  306.079507] Kernel panic - not syncing: Fatal exception in interrupt
      [  306.085847] SMP: stopping secondary CPUs
      [  306.089765] Kernel Offset: disabled
      
      Try to explain one of the possible crash cases:
      
      The "real" admin list is assigned when admin_sched is set to
      new_admin, it happens after "swap", that assigns to oper_sched NULL.
      Thus if call qdisc show it can crash.
      
      Farther, next second time, when sched list is updated, the admin_sched
      is not NULL and becomes the oper_sched, previous oper_sched was NULL so
      just skipped. But then admin_sched is assigned new_admin, but schedules
      to free previous assigned admin_sched (that already became oper_sched).
      
      Farther, next third time, when sched list is updated,
      while one more swap, oper_sched is not null, but it was happy to be
      freed already (while prev. admin update), so while try to free
      oper_sched the kernel panic happens at taprio_free_sched_cb().
      
      So, move the "swap emulation" where it should be according to function
      comment from code.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Acked-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0763b3e8
  4. 05 11月, 2019 13 次提交
  5. 02 11月, 2019 3 次提交
    • J
      net: fix installing orphaned programs · aefc3e72
      Jakub Kicinski 提交于
      When netdevice with offloaded BPF programs is destroyed
      the programs are orphaned and removed from the program
      IDA - their IDs get released (the programs may remain
      accessible via existing open file descriptors and pinned
      files). After IDs are released they are set to 0.
      
      This confuses dev_change_xdp_fd() because it compares
      the __dev_xdp_query() result where 0 means no program
      with prog->aux->id where 0 means orphaned.
      
      dev_change_xdp_fd() would have incorrectly returned success
      even though it had not installed the program.
      
      Since drivers already catch this case via bpf_offload_dev_match()
      let them handle this case. The error message drivers produce in
      this case ("program loaded for a different device") is in fact
      correct as the orphaned program must had to be loaded for a
      different device.
      
      Fixes: c14a9f63 ("net: Don't call XDP_SETUP_PROG when nothing is changed")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aefc3e72
    • J
      net: cls_bpf: fix NULL deref on offload filter removal · 41aa29a5
      Jakub Kicinski 提交于
      Commit 40119211 ("net: sched: refactor block offloads counter
      usage") missed the fact that either new prog or old prog may be
      NULL.
      
      Fixes: 40119211 ("net: sched: refactor block offloads counter usage")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41aa29a5
    • E
      inet: stop leaking jiffies on the wire · a904a069
      Eric Dumazet 提交于
      Historically linux tried to stick to RFC 791, 1122, 2003
      for IPv4 ID field generation.
      
      RFC 6864 made clear that no matter how hard we try,
      we can not ensure unicity of IP ID within maximum
      lifetime for all datagrams with a given source
      address/destination address/protocol tuple.
      
      Linux uses a per socket inet generator (inet_id), initialized
      at connection startup with a XOR of 'jiffies' and other
      fields that appear clear on the wire.
      
      Thiemo Nagel pointed that this strategy is a privacy
      concern as this provides 16 bits of entropy to fingerprint
      devices.
      
      Let's switch to a random starting point, this is just as
      good as far as RFC 6864 is concerned and does not leak
      anything critical.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a904a069
  6. 01 11月, 2019 2 次提交
    • E
      tcp: increase tcp_max_syn_backlog max value · 623d0c2d
      Eric Dumazet 提交于
      tcp_max_syn_backlog default value depends on memory size
      and TCP ehash size. Before this patch, the max value
      was 2048 [1], which is considered too small nowadays.
      
      Increase it to 4096 to match the recent SOMAXCONN change.
      
      [1] This is with TCP ehash size being capped to 524288 buckets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Yue Cao <ycao009@ucr.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      623d0c2d
    • D
      rxrpc: Fix handling of last subpacket of jumbo packet · f9c32435
      David Howells 提交于
      When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
      all the data for the last packet, it checks the last-packet flag on the
      whole packet - but this is wrong, since the last-packet flag is only set on
      the final subpacket of the last jumbo packet.  This means that a call that
      receives its last packet in a jumbo packet won't complete properly.
      
      Fix this by having rxrpc_locate_data() determine the last-packet state of
      the subpacket it's looking at and passing that back to the caller rather
      than having the caller look in the packet header.  The caller then needs to
      cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
      called again for this packet.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Fixes: e2de6c40 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9c32435
  7. 31 10月, 2019 4 次提交
    • E
      net: annotate accesses to sk->sk_incoming_cpu · 7170a977
      Eric Dumazet 提交于
      This socket field can be read and written by concurrent cpus.
      
      Use READ_ONCE() and WRITE_ONCE() annotations to document this,
      and avoid some compiler 'optimizations'.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
      
      write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
       sk_incoming_cpu_update include/net/sock.h:953 [inline]
       tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
       do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
       do_softirq kernel/softirq.c:329 [inline]
       __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
      
      read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
       sk_incoming_cpu_update include/net/sock.h:952 [inline]
       tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7170a977
    • T
      SUNRPC: Destroy the back channel when we destroy the host transport · 669996ad
      Trond Myklebust 提交于
      When we're destroying the host transport mechanism, we should ensure
      that we do not leak memory by failing to release any back channel
      slots that might still exist.
      Reported-by: NNeil Brown <neilb@suse.de>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      669996ad
    • T
      SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding · 9edb455e
      Trond Myklebust 提交于
      If there are RDMA back channel requests being processed by the
      server threads, then we should hold a reference to the transport
      to ensure it doesn't get freed from underneath us.
      Reported-by: NNeil Brown <neilb@suse.de>
      Fixes: 63cae470 ("xprtrdma: Handle incoming backward direction RPC calls")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      9edb455e
    • T
      SUNRPC: The TCP back channel mustn't disappear while requests are outstanding · 875f0706
      Trond Myklebust 提交于
      If there are TCP back channel requests being processed by the
      server threads, then we should hold a reference to the transport
      to ensure it doesn't get freed from underneath us.
      Reported-by: NNeil Brown <neilb@suse.de>
      Fixes: 2ea24497 ("SUNRPC: RPC callbacks may be split across several..")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      875f0706
  8. 30 10月, 2019 5 次提交
  9. 29 10月, 2019 2 次提交
    • E
      udp: fix data-race in udp_set_dev_scratch() · a793183c
      Eric Dumazet 提交于
      KCSAN reported a data-race in udp_set_dev_scratch() [1]
      
      The issue here is that we must not write over skb fields
      if skb is shared. A similar issue has been fixed in commit
      89c22d8c ("net: Fix skb csum races when peeking")
      
      While we are at it, use a helper only dealing with
      udp_skb_scratch(skb)->csum_unnecessary, as this allows
      udp_set_dev_scratch() to be called once and thus inlined.
      
      [1]
      BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
      
      write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
       udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
       __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
       first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
       udp_poll+0xea/0x110 net/ipv4/udp.c:2720
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       do_select+0x7d0/0x1020 fs/select.c:534
       core_sys_select+0x381/0x550 fs/select.c:677
       do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
       __do_sys_pselect6 fs/select.c:784 [inline]
       __se_sys_pselect6 fs/select.c:769 [inline]
       __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
       udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
       udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a793183c
    • E
      net: add READ_ONCE() annotation in __skb_wait_for_more_packets() · 7c422d0c
      Eric Dumazet 提交于
      __skb_wait_for_more_packets() can be called while other cpus
      can feed packets to the socket receive queue.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb
      
      write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
       __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
       __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c422d0c