1. 13 Mar, 2020 24 commits
  2. 12 Mar, 2020 5 commits
    • net: mptcp: don't hang before sending 'MP capable with data' · 767d3ded
      Committed by Davide Caratti
      The following packetdrill script
      
        socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
        fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
        fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
        > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 flags[flag_h] nokey>
        < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 8,mpcapable v1 flags[flag_h] key[skey=2]>
        > . 1:1(0) ack 1 win 256 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h] key[ckey,skey]>
        getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
        fcntl(3, F_SETFL, O_RDWR) = 0
        write(3, ..., 1000) = 1000
      
      doesn't transmit the 1 KB data packet after a successful three-way handshake,
      using mp_capable with data as required by protocol v1, and write() hangs
      forever:
      
       PID: 973    TASK: ffff97dd399cae80  CPU: 1   COMMAND: "packetdrill"
        #0 [ffffa9b94062fb78] __schedule at ffffffff9c90a000
        #1 [ffffa9b94062fc08] schedule at ffffffff9c90a4a0
        #2 [ffffa9b94062fc18] schedule_timeout at ffffffff9c90e00d
        #3 [ffffa9b94062fc90] wait_woken at ffffffff9c120184
        #4 [ffffa9b94062fcb0] sk_stream_wait_connect at ffffffff9c75b064
        #5 [ffffa9b94062fd20] mptcp_sendmsg at ffffffff9c8e801c
        #6 [ffffa9b94062fdc0] sock_sendmsg at ffffffff9c747324
        #7 [ffffa9b94062fdd8] sock_write_iter at ffffffff9c7473c7
        #8 [ffffa9b94062fe48] new_sync_write at ffffffff9c302976
        #9 [ffffa9b94062fed0] vfs_write at ffffffff9c305685
       #10 [ffffa9b94062ff00] ksys_write at ffffffff9c305985
       #11 [ffffa9b94062ff38] do_syscall_64 at ffffffff9c004475
       #12 [ffffa9b94062ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ca0008c
           RIP: 00007f959407eaf7  RSP: 00007ffe9e95a910  RFLAGS: 00000293
           RAX: ffffffffffffffda  RBX: 0000000000000008  RCX: 00007f959407eaf7
           RDX: 00000000000003e8  RSI: 0000000001785fe0  RDI: 0000000000000008
           RBP: 0000000001785fe0   R8: 0000000000000000   R9: 0000000000000003
           R10: 0000000000000007  R11: 0000000000000293  R12: 00000000000003e8
           R13: 00007ffe9e95ae30  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      Fix it by ensuring that the socket state is TCP_ESTABLISHED on reception
      of the third ACK.
      
      Fixes: 1954b860 ("mptcp: Check connection state before attempting send")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
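The hang in the backtrace above can be illustrated with a toy model. This is a sketch, not the kernel code: `toy_sock`, `toy_send_must_wait()` and `toy_handshake_done()` are hypothetical names, and the state test is modeled on the check the TCP send path performs before calling sk_stream_wait_connect(). It assumes the MPTCP socket was still in TCP_SYN_SENT when write() was called.

```c
#include <stdbool.h>

/* Numeric values follow the kernel's TCP state enum. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_CLOSE_WAIT = 8 };

#define TCPF_ESTABLISHED (1 << TCP_ESTABLISHED)
#define TCPF_CLOSE_WAIT  (1 << TCP_CLOSE_WAIT)

/* Hypothetical stand-in for the MPTCP-level socket. */
struct toy_sock {
    int sk_state;
};

/* True when a blocking send would first sleep waiting for the connection. */
bool toy_send_must_wait(const struct toy_sock *sk)
{
    return ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) != 0;
}

/* Model of the fix: once the handshake completes, record ESTABLISHED. */
void toy_handshake_done(struct toy_sock *sk)
{
    if (sk->sk_state == TCP_SYN_SENT)
        sk->sk_state = TCP_ESTABLISHED;
}
```

In this model, a socket left in TCP_SYN_SENT makes `toy_send_must_wait()` return true indefinitely, which corresponds to the sleep in sk_stream_wait_connect(); after `toy_handshake_done()` the send can proceed.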
    • net: Add missing annotation for *netlink_seq_start() · 64fbca01
      Committed by Jules Irenge
      Sparse reports a warning at netlink_seq_start():

      warning: context imbalance in netlink_seq_start() - wrong count at exit

      The root cause is the missing annotation at netlink_seq_start().
      Add the missing __acquires(RCU) annotation.
      Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Add missing annotation for tcp_child_process() · 734c8f75
      Committed by Jules Irenge
      Sparse reports a warning at tcp_child_process():

      warning: context imbalance in tcp_child_process() - unexpected unlock

      The root cause is the missing annotation at tcp_child_process().

      Add the missing __releases(&((child)->sk_lock.slock)) annotation.
      Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • raw: Add missing annotations to raw_seq_start() and raw_seq_stop() · 0d8a42c9
      Committed by Jules Irenge
      Sparse reports warnings at raw_seq_start() and raw_seq_stop():

      warning: context imbalance in raw_seq_start() - wrong count at exit
      warning: context imbalance in raw_seq_stop() - unexpected unlock

      The root cause is the missing annotations at raw_seq_start()
      and raw_seq_stop().
      Add the missing __acquires(&h->lock) annotation to raw_seq_start().
      Add the missing __releases(&h->lock) annotation to raw_seq_stop().
      Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
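The three Sparse commits above share one pattern: a function that returns with a lock held, or releases a lock taken elsewhere, must say so in its signature. The following is a minimal userspace sketch of that pattern. The macro definitions follow the shape of the kernel's own (real attributes only when Sparse defines `__CHECKER__`, empty otherwise); `toy_table`, `toy_lock()`, `toy_unlock()`, `toy_seq_start()` and `toy_seq_stop()` are hypothetical names, and the integer flag is only a stand-in for a real lock.

```c
#include <stddef.h>

/* Context annotations: meaningful to sparse, invisible to the compiler. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
# define __acquire(x)  __context__(x, 1)
# define __release(x)  __context__(x, -1)
#else
# define __acquires(x)
# define __releases(x)
# define __acquire(x)  (void)0
# define __release(x)  (void)0
#endif

struct toy_table {
    int locked;     /* stand-in for a real lock */
    int items[3];
};

void toy_lock(struct toy_table *t)
    __acquires(&t->locked)
{
    t->locked = 1;
    __acquire(&t->locked);
}

void toy_unlock(struct toy_table *t)
    __releases(&t->locked)
{
    t->locked = 0;
    __release(&t->locked);
}

/* Takes the lock and returns with it still held, hence __acquires(). */
int *toy_seq_start(struct toy_table *t, size_t pos)
    __acquires(&t->locked)
{
    toy_lock(t);
    return pos < 3 ? &t->items[pos] : NULL;
}

/* Drops a lock it did not take itself, hence __releases(). */
void toy_seq_stop(struct toy_table *t)
    __releases(&t->locked)
{
    toy_unlock(t);
}
```

Without the annotations, a checker that tracks lock context per function sees `toy_seq_start()` exit with one more lock held than on entry ("wrong count at exit") and `toy_seq_stop()` unlock something it never locked ("unexpected unlock"), which matches the two warnings quoted above.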
    • net: sched: make newly activated qdiscs visible · 4cda7527
      Committed by Julian Wiedmann
      In their .attach callback, mq[prio] only add the qdiscs of the currently
      active TX queues to the device's qdisc hash list.
      If a user later increases the number of active TX queues, their qdiscs
      are not visible via e.g. 'tc qdisc show'.
      
      Add a hook to netif_set_real_num_tx_queues() that walks all active
      TX queues and adds those which are missing to the hash list.
      
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 Mar, 2020 1 commit
  4. 10 Mar, 2020 3 commits
  5. 09 Mar, 2020 3 commits
    • net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_get · 138470a9
      Committed by Eric Dumazet
      Convert zones_lock spinlock to zones_mutex mutex,
      and struct (tcf_ct_flow_table)->ref to a refcount,
      so that control path can use regular GFP_KERNEL allocations
      from standard process context. This is more robust
      in case of memory pressure.
      
      The refcount is needed because tcf_ct_flow_table_put() can
      be called from RCU callback, thus in BH context.
      
      The issue was spotted by syzbot, as rhashtable_init()
      was called with a spinlock held, which is bad since GFP_KERNEL
      allocations can sleep.
      
      Note to developers: please make sure your patches are tested
      with CONFIG_DEBUG_ATOMIC_SLEEP=y
      
      BUG: sleeping function called from invalid context at mm/slab.h:565
      in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9582, name: syz-executor610
      2 locks held by syz-executor610/9582:
       #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:72 [inline]
       #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x3f9/0xad0 net/core/rtnetlink.c:5437
       #1: ffffffff8a3961b8 (zones_lock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline]
       #1: ffffffff8a3961b8 (zones_lock){+...}, at: tcf_ct_flow_table_get+0xa3/0x1700 net/sched/act_ct.c:67
      Preemption disabled at:
      [<0000000000000000>] 0x0
      CPU: 0 PID: 9582 Comm: syz-executor610 Not tainted 5.6.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       ___might_sleep.cold+0x1f4/0x23d kernel/sched/core.c:6798
       slab_pre_alloc_hook mm/slab.h:565 [inline]
       slab_alloc_node mm/slab.c:3227 [inline]
       kmem_cache_alloc_node_trace+0x272/0x790 mm/slab.c:3593
       __do_kmalloc_node mm/slab.c:3615 [inline]
       __kmalloc_node+0x38/0x60 mm/slab.c:3623
       kmalloc_node include/linux/slab.h:578 [inline]
       kvmalloc_node+0x61/0xf0 mm/util.c:574
       kvmalloc include/linux/mm.h:645 [inline]
       kvzalloc include/linux/mm.h:653 [inline]
       bucket_table_alloc+0x8b/0x480 lib/rhashtable.c:175
       rhashtable_init+0x3d2/0x750 lib/rhashtable.c:1054
       nf_flow_table_init+0x16d/0x310 net/netfilter/nf_flow_table_core.c:498
       tcf_ct_flow_table_get+0xe33/0x1700 net/sched/act_ct.c:82
       tcf_ct_init+0xba4/0x18a6 net/sched/act_ct.c:1050
       tcf_action_init_1+0x697/0xa20 net/sched/act_api.c:945
       tcf_action_init+0x1e9/0x2f0 net/sched/act_api.c:1001
       tcf_action_add+0xdb/0x370 net/sched/act_api.c:1411
       tc_ctl_action+0x366/0x456 net/sched/act_api.c:1466
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5440
       netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343
       ___sys_sendmsg+0x100/0x170 net/socket.c:2397
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2430
       do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4403d9
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd719af218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9
      RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000005 R09: 00000000004002c8
      R10: 0000000000000008 R11: 00000000000
      
      Fixes: c34b961a ("net/sched: act_ct: Create nf flow table per zone")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paul Blakey <paulb@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
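The locking change described in this commit (a sleepable lock around lookup-or-create, plus a reference count so that release does not depend on that lock to decide when the object dies) can be sketched in userspace. This is an analogy only, not the act_ct code: `toy_table`, `toy_table_get()` and `toy_table_put()` are hypothetical, a pthread mutex stands in for the kernel mutex, `calloc()` stands in for a GFP_KERNEL allocation, and the put side here takes the mutex directly, which a BH-context caller in the kernel could not do.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_table {
    int zone;
    atomic_int ref;
    struct toy_table *next;
};

static pthread_mutex_t zones_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct toy_table *zones;

/* Take a reference only if the object is not already on its way out. */
static int toy_ref_inc_not_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0)
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return 1;
    return 0;
}

/* Look up the table for a zone, creating it if needed. */
struct toy_table *toy_table_get(int zone)
{
    struct toy_table *t;

    pthread_mutex_lock(&zones_mutex);
    for (t = zones; t; t = t->next)
        if (t->zone == zone && toy_ref_inc_not_zero(&t->ref))
            goto out;

    /* Allocation may block; that is acceptable while holding a mutex. */
    t = calloc(1, sizeof(*t));
    if (t) {
        t->zone = zone;
        atomic_init(&t->ref, 1);
        t->next = zones;
        zones = t;
    }
out:
    pthread_mutex_unlock(&zones_mutex);
    return t;
}

/* Drop a reference; the last put unlinks and frees the table. */
void toy_table_put(struct toy_table *t)
{
    struct toy_table **pp;

    if (atomic_fetch_sub(&t->ref, 1) != 1)
        return;

    pthread_mutex_lock(&zones_mutex);
    for (pp = &zones; *pp; pp = &(*pp)->next)
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    pthread_mutex_unlock(&zones_mutex);
    free(t);
}
```

Two `toy_table_get(7)` calls return the same table with a count of 2; the table is freed only after both references are put, and a later get for the same zone creates a fresh one.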
    • sched: act: allow user to specify type of HW stats for a filter · 44f86580
      Committed by Jiri Pirko
      Currently, a user who adds an action expects HW to report stats,
      but has no exact expectations about the stats type.
      That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.

      Allow the user to specify the type of HW stats for an action and require it.

      Pass the information down to the flow_offload layer.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_offload: check for basic action hw stats type · 319a1d19
      Committed by Jiri Pirko
      Introduce flow_action_basic_hw_stats_types_check() helper and use it
      in drivers. That sanitizes the drivers which do not have support
      for action HW stats types.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 06 Mar, 2020 3 commits
  7. 05 Mar, 2020 1 commit
    • net: mscc: ocelot: eliminate confusion between CPU and NPI port · 69df578c
      Committed by Vladimir Oltean
      Ocelot has the concept of a CPU port. The CPU port is represented in the
      forwarding and the queueing system, but it is not a physical device. The
      CPU port can either be accessed via register-based injection/extraction
      (which is the case of Ocelot), via Frame-DMA (similar to the first one),
      or "connected" to a physical Ethernet port (called NPI in the datasheet)
      which is the case of the Felix DSA switch.
      
      In Ocelot the CPU port is at index 11.
      In Felix the CPU port is at index 6.
      
      The CPU bit is treated specially in the forwarding, as it is never cleared
      from the forwarding port mask (once added to it). Other than that, it is
      treated the same as a normal front port.
      
      Both Felix and Ocelot should use the CPU port in the same way. This
      means that Felix should not use the NPI port directly when forwarding to
      the CPU, but instead use the CPU port.
      
      This patch fixes this so that Felix uses port 6 as its CPU
      port, and just uses the NPI port to carry the traffic.
      
      Therefore, eliminate the "ocelot->cpu" variable which was holding the
      index of the NPI port for Felix, and the index of the CPU port module
      for Ocelot, so the variable was actually configuring different things
      for different drivers and causing at least part of the confusion.
      
      Also remove the "ocelot->num_cpu_ports" variable, which is the result of
      another confusion. The 2 CPU ports mentioned in the datasheet are
      because there are two frame extraction channels (register based or DMA
      based). This is of no relevance to the driver at the moment, and
      invisible to the analyzer module.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>