1. 07 Sep, 2019 · 2 commits
  2. 06 Sep, 2019 · 1 commit
    • net: sched: fix reordering issues · b88dd52c
      Committed by Eric Dumazet
      Whenever MQ is not used on a multiqueue device, we experience
      serious reordering problems. Bisection found the cited
      commit.
      
      The issue can be described this way :
      
      - A single qdisc hierarchy is shared by all transmit queues.
        (eg : tc qdisc replace dev eth0 root fq_codel)
      
      - When/if try_bulk_dequeue_skb_slow() dequeues a packet targeting
        a different transmit queue than the one used to build a packet train,
        we stop building the current list and save the 'bad' skb (P1) in a
        special queue. (bad_txq)
      
      - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
        skb (P1), it checks if the associated transmit queue is still in frozen
        state. If the queue is still blocked (by BQL or NIC tx ring full),
        we leave the skb in bad_txq and return NULL.
      
      - dequeue_skb() calls q->dequeue() to get another packet (P2)
      
        The other packet can target the problematic queue (that we found
        in frozen state for the bad_txq packet), but another cpu just ran
        TX completion and made room in the txq that is now ready to accept
        new packets.
      
      - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
        at next round. In practice P2 is the lead of a big packet train
        (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
      
      To solve this problem, we have to block the dequeue process as long
      as the first packet in bad_txq can not be sent. Reordering issues
      disappear and no side effects have been seen.
      
      Fixes: a53851e2 ("net: sched: explicit locking in gso_cpu fallback")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
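The ordering rule in this commit message can be modelled in user space. The following is a minimal sketch, assuming a one-slot bad_txq and two transmit queues; `struct sched`, `dequeue()` and the `block_on_bad` switch are hypothetical names for illustration and are not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTXQ 2                      /* hypothetical: two transmit queues */

struct pkt { int id; int txq; };

struct sched {
    const struct pkt *bad;          /* models the skb parked in bad_txq (P1) */
    const struct pkt *fifo;         /* packets still held by the qdisc */
    size_t head, len;
    bool frozen[NTXQ];              /* txq stopped by BQL or a full TX ring */
};

/*
 * Return the next packet to hand to the driver, or NULL if nothing may be
 * sent now. With block_on_bad set, a parked packet whose queue is frozen
 * stops the whole dequeue, so no later packet can overtake it.
 */
const struct pkt *dequeue(struct sched *s, bool block_on_bad)
{
    if (s->bad) {
        assert(s->bad->txq >= 0 && s->bad->txq < NTXQ);
        if (!s->frozen[s->bad->txq]) {
            const struct pkt *p = s->bad;
            s->bad = NULL;
            return p;
        }
        if (block_on_bad)
            return NULL;            /* behaviour described by the fix */
        /* old behaviour: fall through and let a later packet go first */
    }
    return s->head < s->len ? &s->fifo[s->head++] : NULL;
}
```

With `block_on_bad` false, P2 is returned while P1 is still parked, which is the reordering; with it true, the call returns NULL until P1's queue is no longer frozen.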
  3. 05 Sep, 2019 · 8 commits
  4. 04 Sep, 2019 · 3 commits
  5. 03 Sep, 2019 · 2 commits
  6. 01 Sep, 2019 · 3 commits
    • net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate · 1c6c09a0
      Committed by Vladimir Oltean
      The reasoning is exactly the same as for the previous patch
      ("taprio: Set default link speed to 10 Mbps in
      taprio_set_picos_per_byte"). Nothing is lost when setting a default.
      
      Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
      Fixes: e0a7683d ("net/sched: cbs: fix port_rate miscalculation")
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte · f04b514c
      Committed by Vladimir Oltean
      The taprio budget needs to be adapted at runtime according to interface
      link speed. But that handling is problematic.
      
      For one thing, installing a qdisc on an interface that doesn't have
      carrier is not illegal. But taprio prints the following stack trace:
      
      [   31.851373] ------------[ cut here ]------------
      [   31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4
      [   31.864566] taprio: dequeue() called with unknown picos per byte.
      [   31.864570] Modules linked in:
      [   31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689
      [   31.881398] Hardware name: Freescale LS1021A
      [   31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
      [   31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8)
      [   31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
      [   31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c)
      [   31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4)
      [   31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c)
      [   31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc)
      [   31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8)
      [   31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8)
      [   31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4)
      [   31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c)
      [   31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
      [   31.977076] Exception stack(0xe8167b20 to 0xe8167b68)
      [   31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960
      [   31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94
      [   31.998363] 7b60: 60070013 ffffffff
      [   32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8)
      [   32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414)
      [   32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28)
      [   32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88)
      [   32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44)
      [   32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470)
      [   32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724)
      [   32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec)
      [   32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110)
      [   32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c)
      [   32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380)
      [   32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24)
      [   32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228)
      [   32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c)
      [   32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
      [   32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0)
      [   32.128960] 7fa0:                   b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000
      [   32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c
      [   32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64
      [   32.150659] ---[ end trace 2139c9827c3e5177 ]---
      
      This happens because the qdisc ->dequeue callback gets called, which
      again is not illegal: the qdisc will dequeue even when the interface is
      up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames
      will be dropped further down the stack in dev_direct_xmit().
      
      And, at the end of the day, for what? For calculating the initial budget
      of an interface which is non-operational at the moment and where frames
      will get dropped anyway.
      
      So if we can't figure out the link speed, default to SPEED_10 and move
      along. We can also remove the runtime check now.
      
      Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
      Fixes: 7b9eba7b ("net/sched: taprio: fix picos_per_byte miscalculation")
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
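The default described in this commit message amounts to a fallback applied before the division. The following is a minimal sketch, assuming the link speed is reported in Mbit/s; `picos_per_byte()` is a hypothetical helper, and the two constants are redefined locally with the values ethtool uses.

```c
#include <assert.h>
#include <stdint.h>

#define SPEED_UNKNOWN (-1)          /* local copies of the ethtool values */
#define SPEED_10      10

/* Picoseconds needed to transmit one byte at the given link speed. */
int64_t picos_per_byte(int speed_mbps)
{
    if (speed_mbps == SPEED_UNKNOWN)
        speed_mbps = SPEED_10;      /* no carrier: assume 10 Mbit/s */
    assert(speed_mbps > 0);

    /* 8 bits per byte, 10^12 ps per second, speed in 10^6 bit/s */
    return (8 * INT64_C(1000000000000)) / ((int64_t)speed_mbps * 1000000);
}
```

At 10 Mbit/s one byte takes 800000 ps, so a link without carrier gets a finite (if pessimistic) budget instead of tripping a runtime check.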
    • taprio: Fix kernel panic in taprio_destroy · efb55222
      Committed by Vladimir Oltean
      taprio_init may fail earlier than this line:
      
      	list_add(&q->taprio_list, &taprio_list);
      
      e.g. due to the net device not being multi queue.
      
      Attempting to remove q from the global taprio_list when it is not part
      of it will result in a kernel panic.
      
      Fix it by matching list_add and list_del better to one another in the
      order of operations. This way we can keep the deletion unconditional
      and with lower complexity - O(1).
      
      Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
      Fixes: 7b9eba7b ("net/sched: taprio: fix picos_per_byte miscalculation")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
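The pairing of list_add and list_del can be illustrated outside the kernel. The following is a minimal sketch, assuming init links the object into the registry before any step that can fail, so that destroy can unlink unconditionally; the list helpers and `sched_init()`/`sched_destroy()` are hypothetical stand-ins, not the taprio code.

```c
#include <assert.h>
#include <stdbool.h>

/* Tiny circular doubly linked list, in the spirit of the kernel's list_head. */
struct node { struct node *prev, *next; };

void demo_list_init(struct node *h) { h->prev = h->next = h; }

void demo_list_add(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* O(1), but only valid if n is currently linked into a list. */
void demo_list_del(struct node *n)
{
    assert(n->prev && n->next);
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

bool demo_list_empty(const struct node *h) { return h->next == h; }

struct sched { struct node link; };

/* Link first, then run the checks that may fail. */
int sched_init(struct sched *q, struct node *registry, bool multiqueue)
{
    demo_list_add(&q->link, registry);
    if (!multiqueue)
        return -1;                  /* caller still runs sched_destroy() */
    return 0;
}

/* Safe after a failed init because the node is always linked by then. */
void sched_destroy(struct sched *q)
{
    demo_list_del(&q->link);
}
```

Had `sched_init()` returned before linking, `sched_destroy()` would dereference the node's uninitialised pointers, which is the kind of crash the commit message describes.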
  7. 31 Aug, 2019 · 1 commit
    • rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] · d12040b6
      Committed by David Howells
      When a local endpoint ceases to be in use, such as when the kafs module
      is unloaded, the kernel will emit an assertion failure if there are any
      outstanding client connections:
      
      	rxrpc: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at net/rxrpc/local_object.c:433!
      
      and even beyond that, will evince other oopses if there are service
      connections still present.
      
      Fix this by:
      
       (1) Removing the triggering of connection reaping when an rxrpc socket is
           released.  These don't actually clean up the connections anyway - and
           further, the local endpoint may still be in use through another
           socket.
      
       (2) Mark the local endpoint as dead when we start the process of tearing
           it down.
      
       (3) When destroying a local endpoint, strip all of its client connections
           from the idle list and discard the ref on each that the list was
           holding.
      
       (4) When destroying a local endpoint, call the service connection reaper
           directly (rather than through a workqueue) to immediately kill off all
           outstanding service connections.
      
       (5) Make the service connection reaper reap connections for which the
           local endpoint is marked dead.
      
      Only after destroying the connections can we close the socket lest we get
      an oops in a workqueue that's looking at a connection or a peer.
      
      Fixes: 3d18cbb7 ("rxrpc: Fix conn expiry timers")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 30 Aug, 2019 · 1 commit
  9. 29 Aug, 2019 · 10 commits
    • mac80211: Correctly set noencrypt for PAE frames · f8b43c5c
      Committed by Denis Kenzior
      The noencrypt flag was intended to be set if the "frame was received
      unencrypted" according to include/uapi/linux/nl80211.h.  However, the
      current behavior is the opposite of this.
      
      Cc: stable@vger.kernel.org
      Fixes: 018f6fbf ("mac80211: Send control port frames over nl80211")
      Signed-off-by: Denis Kenzior <denkenz@gmail.com>
      Link: https://lore.kernel.org/r/20190827224120.14545-3-denkenz@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: Don't memset RXCB prior to PAE intercept · c8a41c6a
      Committed by Denis Kenzior
      ieee80211_deliver_skb_to_local_stack intercepts EAPoL frames if
      mac80211 is configured to do so and forwards the contents over nl80211.
      During this process some additional data is also forwarded, including
      whether the frame was received encrypted or not.  Unfortunately just
      prior to the call to ieee80211_deliver_skb_to_local_stack, skb->cb is
      cleared, resulting in incorrect data being exposed over nl80211.
      
      Fixes: 018f6fbf ("mac80211: Send control port frames over nl80211")
      Cc: stable@vger.kernel.org
      Signed-off-by: Denis Kenzior <denkenz@gmail.com>
      Link: https://lore.kernel.org/r/20190827224120.14545-2-denkenz@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • netfilter: nf_flow_table: clear skb tstamp before xmit · de20900f
      Committed by Florian Westphal
      If 'fq' qdisc is used and a program has requested timestamps,
      skb->tstamp needs to be cleared, else fq will treat these as
      'transmit time'.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue · 092e22e5
      Committed by Davide Caratti
      Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
      'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
      per-cpu counters are there in the error path of skb_array_produce().
      Otherwise, the following splat can be seen:
      
       Unable to handle kernel paging request at virtual address 0000600dea430008
       Mem abort info:
         ESR = 0x96000005
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000005
         CM = 0, WnR = 0
       user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e
       [0000600dea430008] pgd=0000000000000000, pud=0000000000000000
       Internal error: Oops: 96000005 [#1] SMP
      [...]
       pstate: 10000005 (nzcV daif -PAN -UAO)
       pc : pfifo_fast_enqueue+0x524/0x6e8
       lr : pfifo_fast_enqueue+0x46c/0x6e8
       sp : ffff800d39376fe0
       x29: ffff800d39376fe0 x28: 1ffff001a07d1e40
       x27: ffff800d03e8f188 x26: ffff800d03e8f200
       x25: 0000000000000062 x24: ffff800d393772f0
       x23: 0000000000000000 x22: 0000000000000403
       x21: ffff800cca569a00 x20: ffff800d03e8ee00
       x19: ffff800cca569a10 x18: 00000000000000bf
       x17: 0000000000000000 x16: 0000000000000000
       x15: 0000000000000000 x14: ffff1001a726edd0
       x13: 1fffe4000276a9a4 x12: 0000000000000000
       x11: dfff200000000000 x10: ffff800d03e8f1a0
       x9 : 0000000000000003 x8 : 0000000000000000
       x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea
       x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003
       x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb
       x1 : 0000600dea430000 x0 : 0000600dea430008
       Process ping (pid: 6067, stack limit = 0x00000000dc0aa557)
       Call trace:
        pfifo_fast_enqueue+0x524/0x6e8
        htb_enqueue+0x660/0x10e0 [sch_htb]
        __dev_queue_xmit+0x123c/0x2de0
        dev_queue_xmit+0x24/0x30
        ip_finish_output2+0xc48/0x1720
        ip_finish_output+0x548/0x9d8
        ip_output+0x334/0x788
        ip_local_out+0x90/0x138
        ip_send_skb+0x44/0x1d0
        ip_push_pending_frames+0x5c/0x78
        raw_sendmsg+0xed8/0x28d0
        inet_sendmsg+0xc4/0x5c0
        sock_sendmsg+0xac/0x108
        __sys_sendto+0x1ac/0x2a0
        __arm64_sys_sendto+0xc4/0x138
        el0_svc_handler+0x13c/0x298
        el0_svc+0x8/0xc
       Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)
      
      Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
      before dereferencing 'qdisc->cpu_qstats'.
      
      Fixes: 8a53e616 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Stefano Brivio <sbrivio@redhat.com>
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
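The guard described in this commit message (test the flag before touching the per-CPU counters) can be sketched in plain C. This is a minimal model, assuming per-CPU storage is represented by an ordinary pointer; `struct qdisc_model` and `count_drop()` are hypothetical names, and `TCQ_F_CPUSTATS` is redefined locally for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TCQ_F_CPUSTATS 0x20         /* local definition for this sketch */

struct qstats { unsigned int drops; };

struct qdisc_model {
    unsigned int flags;
    struct qstats *cpu_qstats;      /* only valid when TCQ_F_CPUSTATS is set */
    struct qstats qstats;           /* shared counters used otherwise */
};

/* Account one dropped packet without assuming per-CPU counters exist. */
void count_drop(struct qdisc_model *q)
{
    if (q->flags & TCQ_F_CPUSTATS) {
        assert(q->cpu_qstats != NULL);
        q->cpu_qstats->drops++;
    } else {
        q->qstats.drops++;          /* flag cleared: pointer may be NULL */
    }
}
```

The same test applies on any path that updates statistics, including reset, since the flag can be cleared after the qdisc was created.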
    • tcp: inherit timestamp on mtu probe · 888a5c53
      Committed by Willem de Bruijn
      TCP associates tx timestamp requests with a byte in the bytestream.
      If merging skbs in tcp_mtu_probe, migrate the tstamp request.
      
      Similar to MSG_EOR, do not allow moving a timestamp from any segment
      in the probe but the last. This is to avoid merging multiple timestamps.
      
      Tested with the packetdrill script at
      https://github.com/wdebruij/packetdrill/commits/mtu_probe-1
      
      Link: http://patchwork.ozlabs.org/patch/1143278/#2232897
      Fixes: 4ed2d765 ("net-timestamp: TCP timestamping")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: act_sample: fix psample group handling on overwrite · dbf47a2a
      Committed by Vlad Buslov
      Action sample doesn't properly handle the psample_group pointer in the
      overwrite case. The following issues need to be fixed:
      
      - In tcf_sample_init(), RCU_INIT_POINTER() is used to set
        s->psample_group, even though we are neither setting the pointer to
        NULL nor preventing concurrent readers from accessing the pointer in
        some way. Use rcu_swap_protected() instead to safely reset the pointer.
      
      - The old value of s->psample_group is not released or deallocated in any
        way, which results in a resource leak. Use psample_group_put() on the
        non-NULL value obtained with rcu_swap_protected().
      
      - The function psample_group_put(), which releases the reference to the
        struct psample_group pointed to by the RCU pointer s->psample_group,
        doesn't respect the RCU grace period when deallocating it. Extend struct
        psample_group with an rcu head and use kfree_rcu when freeing it.
      
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Clear the L4 portion of the key for "later" fragments. · 0754b4e8
      Committed by Justin Pettit
      Only the first fragment in a datagram contains the L4 headers.  When the
      Open vSwitch module parses a packet, it always sets the IP protocol
      field in the key, but can only set the L4 fields on the first fragment.
      The original behavior would not clear the L4 portion of the key, so
      garbage values would be sent in the key for "later" fragments.  This
      patch clears the L4 fields in that circumstance to prevent sending those
      garbage values as part of the upcall.
      Signed-off-by: Justin Pettit <jpettit@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
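The clearing step can be shown with a reduced flow key. The following is a minimal sketch, assuming a key that carries the IP protocol plus a small transport-layer section; `struct flow_key`, `enum frag_type` and `extract_l4()` are hypothetical and much simpler than the Open vSwitch structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum frag_type { FRAG_NONE, FRAG_FIRST, FRAG_LATER };   /* hypothetical */

struct flow_key {
    uint8_t ip_proto;
    uint8_t frag;
    struct { uint16_t src, dst, flags; } tp;             /* L4 portion */
};

/*
 * Fill the L3/L4 part of a key that may contain stale data from earlier use.
 * The protocol is always set; ports exist only outside "later" fragments,
 * so the L4 portion is zeroed there instead of being left as-is.
 */
void extract_l4(struct flow_key *key, enum frag_type frag, uint8_t proto,
                uint16_t sport, uint16_t dport)
{
    assert(key != NULL);
    key->ip_proto = proto;
    key->frag = (uint8_t)frag;

    if (frag == FRAG_LATER) {
        memset(&key->tp, 0, sizeof(key->tp));
        return;
    }
    key->tp.src = sport;
    key->tp.dst = dport;
    key->tp.flags = 0;
}
```

Zeroing rather than skipping matters because the key is reused: anything not overwritten would be reported upward as if it had been parsed from the packet.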
    • openvswitch: Properly set L4 keys on "later" IP fragments · ad06a566
      Committed by Greg Rose
      When IP fragments are reassembled before being sent to conntrack, the
      key from the last fragment is used.  Unless there are reordering
      issues, the last fragment received will not contain the L4 ports, so the
      key for the reassembled datagram won't contain them.  This patch updates
      the key once we have a reassembled datagram.
      
      The handle_fragments() function works on L3 headers so we pull the L3/L4
      flow key update code from key_extract into a new function
      'key_extract_l3l4'.  Then we add another new function
      ovs_flow_key_update_l3l4() and export it so that it is accessible by
      handle_fragments() for conntrack packet reassembly.
      Co-authored-by: Justin Pettit <jpettit@ovn.org>
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mld: fix memory leak in mld_del_delrec() · a84d0164
      Committed by Eric Dumazet
      Similar to the fix done for IPv4 in commit e5b1c6c6
      ("igmp: fix memory leak in igmpv3_del_delrec()"), we need to
      make sure mca_tomb and mca_sources are not blindly overwritten.
      
      Using swap() then a call to ip6_mc_clear_src() will take care
      of the missing free.
      
      BUG: memory leak
      unreferenced object 0xffff888117d9db00 (size 64):
        comm "syz-executor247", pid 6918, jiffies 4294943989 (age 25.350s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 fe 88 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000005b463030>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<000000005b463030>] slab_post_alloc_hook mm/slab.h:522 [inline]
          [<000000005b463030>] slab_alloc mm/slab.c:3319 [inline]
          [<000000005b463030>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
          [<00000000939cbf94>] kmalloc include/linux/slab.h:552 [inline]
          [<00000000939cbf94>] kzalloc include/linux/slab.h:748 [inline]
          [<00000000939cbf94>] ip6_mc_add1_src net/ipv6/mcast.c:2236 [inline]
          [<00000000939cbf94>] ip6_mc_add_src+0x31f/0x420 net/ipv6/mcast.c:2356
          [<00000000d8972221>] ip6_mc_source+0x4a8/0x600 net/ipv6/mcast.c:449
          [<000000002b203d0d>] do_ipv6_setsockopt.isra.0+0x1b92/0x1dd0 net/ipv6/ipv6_sockglue.c:748
          [<000000001f1e2d54>] ipv6_setsockopt+0x89/0xd0 net/ipv6/ipv6_sockglue.c:944
          [<00000000c8f7bdf9>] udpv6_setsockopt+0x4e/0x90 net/ipv6/udp.c:1558
          [<000000005a9a0c5e>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3139
          [<00000000910b37b2>] __sys_setsockopt+0x10f/0x220 net/socket.c:2084
          [<00000000e9108023>] __do_sys_setsockopt net/socket.c:2100 [inline]
          [<00000000e9108023>] __se_sys_setsockopt net/socket.c:2097 [inline]
          [<00000000e9108023>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2097
          [<00000000f4818160>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
          [<000000008d367e8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Fixes: 9c8bb163 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
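The "swap, then clear" idea can be demonstrated with ordinary heap allocations. The following is a minimal sketch, assuming each record owns two singly linked source lists; `struct mc_rec`, `restore_from_deleted()` and the `live_srcs` counter are hypothetical and only model the ownership transfer, not the MLD code.

```c
#include <assert.h>
#include <stdlib.h>

struct src { struct src *next; };
struct mc_rec { struct src *tomb; struct src *sources; };

int live_srcs;                      /* outstanding allocations, for checking */

struct src *src_new(struct src *next)
{
    struct src *s = malloc(sizeof(*s));
    assert(s != NULL);
    s->next = next;
    live_srcs++;
    return s;
}

void free_list(struct src *s)
{
    while (s) {
        struct src *next = s->next;
        free(s);
        live_srcs--;
        s = next;
    }
}

/* Free everything a record owns. */
void rec_clear_src(struct mc_rec *r)
{
    free_list(r->tomb);
    free_list(r->sources);
    r->tomb = r->sources = NULL;
}

#define SWAP_PTR(a, b) do { struct src *t_ = (a); (a) = (b); (b) = t_; } while (0)

/*
 * Move the saved lists from pmc into im. Plain assignment would overwrite
 * (and leak) whatever im already held; swapping hands those lists to pmc,
 * where the following clear frees them.
 */
void restore_from_deleted(struct mc_rec *im, struct mc_rec *pmc)
{
    SWAP_PTR(im->tomb, pmc->tomb);
    SWAP_PTR(im->sources, pmc->sources);
    rec_clear_src(pmc);
}
```

After the call, `im` owns the lists that were saved in `pmc`, and the lists `im` held before have been released rather than dropped on the floor.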
    • net/sched: pfifo_fast: fix wrong dereference when qdisc is reset · 04d37cf4
      Committed by Davide Caratti
      Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
      'TCQ_F_NOLOCK' bit in the parent qdisc, we need to be sure that per-cpu
      counters are present when 'reset()' is called for pfifo_fast qdiscs.
      Otherwise, the following script:
      
       # tc q a dev lo handle 1: root htb default 100
       # tc c a dev lo parent 1: classid 1:100 htb \
       > rate 95Mbit ceil 100Mbit burst 64k
       [...]
       # tc f a dev lo parent 1: protocol arp basic classid 1:100
       [...]
       # tc q a dev lo parent 1:100 handle 100: pfifo_fast
       [...]
       # tc q d dev lo root
      
      can generate the following splat:
      
       Unable to handle kernel paging request at virtual address dfff2c01bd148000
       Mem abort info:
         ESR = 0x96000004
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000004
         CM = 0, WnR = 0
       [dfff2c01bd148000] address between user and kernel address ranges
       Internal error: Oops: 96000004 [#1] SMP
       [...]
       pstate: 80000005 (Nzcv daif -PAN -UAO)
       pc : pfifo_fast_reset+0x280/0x4d8
       lr : pfifo_fast_reset+0x21c/0x4d8
       sp : ffff800d09676fa0
       x29: ffff800d09676fa0 x28: ffff200012ee22e4
       x27: dfff200000000000 x26: 0000000000000000
       x25: ffff800ca0799958 x24: ffff1001940f332b
       x23: 0000000000000007 x22: ffff200012ee1ab8
       x21: 0000600de8a40000 x20: 0000000000000000
       x19: ffff800ca0799900 x18: 0000000000000000
       x17: 0000000000000002 x16: 0000000000000000
       x15: 0000000000000000 x14: 0000000000000000
       x13: 0000000000000000 x12: ffff1001b922e6e2
       x11: 1ffff001b922e6e1 x10: 0000000000000000
       x9 : 1ffff001b922e6e1 x8 : dfff200000000000
       x7 : 0000000000000000 x6 : 0000000000000000
       x5 : 1fffe400025dc45c x4 : 1fffe400025dc357
       x3 : 00000c01bd148000 x2 : 0000600de8a40000
       x1 : 0000000000000007 x0 : 0000600de8a40004
       Call trace:
        pfifo_fast_reset+0x280/0x4d8
        qdisc_reset+0x6c/0x370
        htb_reset+0x150/0x3b8 [sch_htb]
        qdisc_reset+0x6c/0x370
        dev_deactivate_queue.constprop.5+0xe0/0x1a8
        dev_deactivate_many+0xd8/0x908
        dev_deactivate+0xe4/0x190
        qdisc_graft+0x88c/0xbd0
        tc_get_qdisc+0x418/0x8a8
        rtnetlink_rcv_msg+0x3a8/0xa78
        netlink_rcv_skb+0x18c/0x328
        rtnetlink_rcv+0x28/0x38
        netlink_unicast+0x3c4/0x538
        netlink_sendmsg+0x538/0x9a0
        sock_sendmsg+0xac/0xf8
        ___sys_sendmsg+0x53c/0x658
        __sys_sendmsg+0xc8/0x140
        __arm64_sys_sendmsg+0x74/0xa8
        el0_svc_handler+0x164/0x468
        el0_svc+0x10/0x14
       Code: 910012a0 92400801 d343fc03 11000c21 (38fb6863)
      
      Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
      before dereferencing 'qdisc->cpu_qstats'.
      
      Changes since v1:
       - coding style improvements, thanks to Stefano Brivio
      
      Fixes: 8a53e616 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
      CC: Paolo Abeni <pabeni@redhat.com>
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 28 Aug, 2019 · 6 commits
    • libceph: don't call crypto_free_sync_skcipher() on a NULL tfm · e8c99200
      Committed by Jia-Ju Bai
      In set_secret(), key->tfm is assigned to NULL on line 55, and then
      ceph_crypto_key_destroy(key) is executed.
      
      ceph_crypto_key_destroy(key)
        crypto_free_sync_skcipher(key->tfm)
          crypto_free_skcipher(&tfm->base);
      
      This happens to work because crypto_sync_skcipher is a trivial wrapper
      around crypto_skcipher: &tfm->base is still 0 and crypto_free_skcipher()
      handles that.  Let's not rely on the layout of crypto_sync_skcipher.
      
      This bug was found by STCheck, a static analysis tool written by us.
      
      Fixes: 69d6302b ("libceph: Remove VLA usage of skcipher").
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
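The point about not relying on the wrapper's layout comes down to an explicit NULL check in the destroy path. The following is a minimal sketch, assuming a key that owns a byte buffer and an optional cipher handle; `struct cipher`, `cipher_free()` and `key_destroy()` are hypothetical stand-ins for the crypto API and the libceph helper.

```c
#include <assert.h>
#include <stdlib.h>

struct cipher { int dummy; };       /* stand-in for the crypto transform */

struct crypto_key {
    void *key;
    struct cipher *tfm;             /* may legitimately be NULL */
};

int cipher_frees;                   /* counts real frees, for checking */

/* Unlike free(), this helper is not assumed to tolerate NULL. */
void cipher_free(struct cipher *tfm)
{
    assert(tfm != NULL);
    cipher_frees++;
    free(tfm);
}

/* Release everything the key owns; safe to call repeatedly. */
void key_destroy(struct crypto_key *k)
{
    if (!k)
        return;
    free(k->key);
    k->key = NULL;
    if (k->tfm) {
        cipher_free(k->tfm);
        k->tfm = NULL;
    }
}
```

Resetting both pointers after freeing also makes a second `key_destroy()` call harmless, which is convenient on error paths.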
    • tcp: remove empty skb from write queue in error cases · fdfc5c85
      Committed by Eric Dumazet
      Vladimir Rutsky reported stuck TCP sessions after memory pressure
      events. Edge-triggered epoll() users would never receive an EPOLLOUT
      notification allowing them to retry a sendmsg().
      
      Jason tested the case of sk_stream_alloc_skb() returning NULL,
      but there are other paths that could lead both sendmsg() and sendpage()
      to return -1 (EAGAIN), with an empty skb queued on the write queue.
      
      This patch makes sure we remove this empty skb so that
      Jason's code can detect that the queue is empty, and
      call sk->sk_write_space(sk) accordingly.
      
      Fixes: ce5ec440 ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Reported-by: Vladimir Rutsky <rutsky@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/rds: Fix info leak in rds6_inc_info_copy() · 7d0a0658
      Committed by Ka-Cheong Poon
      The rds6_inc_info_copy() function has a couple of struct members which
      are leaking stack information.  The ->tos field should hold actual
      information and the ->flags field needs to be zeroed out.
      
      Fixes: 3eb45036 ("rds: add type of service(tos) infrastructure")
      Fixes: b7ff8b10 ("rds: Extend RDS API for IPv6 support")
      Reported-by: 黄ID蝴蝶 <butterflyhuangxx@gmail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix skb use after free in netpoll · 2c1644cf
      Committed by Feng Sun
      After commit baeababb
      ("tun: return NET_XMIT_DROP for dropped packets"),
      when tun_net_xmit drops a packet, it frees the skb and returns NET_XMIT_DROP.
      netpoll_send_skb_on_dev then runs into the following use-after-free cases:
      1. retrying netpoll_start_xmit with the freed skb;
      2. queueing the freed skb in npinfo->txq.
      queue_process also runs into a use-after-free case.
      
      The first netpoll_send_skb_on_dev case was hit with the following kernel log:
      
      [  117.864773] kernel BUG at mm/slub.c:306!
      [  117.864773] invalid opcode: 0000 [#1] SMP PTI
      [  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
      [  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
      [  117.864781] Call Trace:
      [  117.864781]  ? tun_net_xmit+0x21c/0x460
      [  117.864781]  kfree_skbmem+0x4e/0x60
      [  117.864782]  kfree_skb+0x3a/0xa0
      [  117.864782]  tun_net_xmit+0x21c/0x460
      [  117.864782]  netpoll_start_xmit+0x11d/0x1b0
      [  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
      [  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
      [  117.864789]  ? skb_clone+0x53/0xd0
      [  117.864790]  ? __skb_clone+0x2e/0x120
      [  117.864790]  deliver_clone+0x37/0x50 [bridge]
      [  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
      [  117.864791]  br_flood+0x6c/0x130 [bridge]
      [  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
      [  117.864792]  netpoll_start_xmit+0x11d/0x1b0
      [  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
      [  117.864792]  netpoll_send_udp+0x2c6/0x3e8
      [  117.864793]  write_msg+0xd9/0xf0 [netconsole]
      [  117.864793]  console_unlock+0x386/0x4e0
      [  117.864793]  vprintk_emit+0x17e/0x280
      [  117.864794]  vprintk_default+0x29/0x50
      [  117.864794]  vprintk_func+0x4c/0xbc
      [  117.864794]  printk+0x58/0x6f
      [  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
      [  117.864795]  kthread+0x104/0x140
      [  117.864795]  ? 0xffffffffc05b1000
      [  117.864796]  ? kthread_park+0x80/0x80
      [  117.864796]  ret_from_fork+0x35/0x40
      Signed-off-by: Feng Sun <loyou85@gmail.com>
      Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
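The two use-after-free cases share one cause: a return code meaning "dropped and freed" was treated like "busy, try again". The following is a minimal sketch of a retry loop that distinguishes the two, assuming the return-code values below (copied locally for illustration); `xmit_complete()`, `send_with_retry()` and the two driver stubs are hypothetical and do not reproduce the netpoll code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the transmit return codes used in this sketch. */
#define NETDEV_TX_OK   0x00
#define NET_XMIT_DROP  0x01
#define NET_XMIT_MASK  0x0f
#define NETDEV_TX_BUSY 0x10

/* True when the driver consumed the skb: sent, or dropped and freed. */
bool xmit_complete(int rc) { return rc < NET_XMIT_MASK; }

typedef int (*xmit_fn)(void *skb);

/*
 * Returns true if the skb is no longer owned by the caller. Only when this
 * returns false may the caller retry later or put the skb on a queue.
 */
bool send_with_retry(xmit_fn xmit, void *skb, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (xmit_complete(xmit(skb)))
            return true;            /* must not touch skb again */
    }
    return false;
}

/* Hypothetical driver stubs used to exercise both outcomes. */
int drop_calls, busy_calls;
int drop_driver(void *skb) { (void)skb; drop_calls++; return NET_XMIT_DROP; }
int busy_driver(void *skb) { (void)skb; busy_calls++; return NETDEV_TX_BUSY; }
```

A driver that drops is called exactly once, while a busy driver is retried up to `max_tries` times and leaves ownership with the caller.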
    • net: dsa: tag_8021q: Future-proof the reserved fields in the custom VID · bcccb0a5
      Committed by Vladimir Oltean
      After witnessing the discussion in https://lkml.org/lkml/2019/8/14/151
      w.r.t. ioctl extensibility, it became clear that the 3 RSV bits inside
      the DSA 802.1Q tag might suffer the same fate and become useless for
      further extension.
      
      So clearly specify that the reserved bits should currently be
      transmitted as zero and ignored on receive. The DSA tagger already does
      this (and always has), and is the only known user so far (no
      Wireshark dissection plugin, etc). So there should be no incompatibility
      to speak of.
      
      Fixes: 0471dd42 ("net: dsa: tag_8021q: Create a stable binary format")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: fix a NULL pointer deref in ipt action · 981471bd
      Committed by Cong Wang
      The net pointer in struct xt_tgdtor_param is not explicitly
      initialized and is therefore still NULL when it is dereferenced.
      So we have to find a way to pass the correct net pointer to
      ipt_destroy_target().
      
      The best way I found is to save the net pointer inside the per
      netns struct tcf_idrinfo, which also keeps this patch small.
      
      Fixes: 0c66dc1e ("netfilter: conntrack: register hooks in netns when needed by ruleset")
      Reported-and-tested-by: itugrok@yahoo.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 27 Aug, 2019 · 3 commits