1. 03 Oct, 2019 7 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 4fbb97ba
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Remove the skb_ext_del from nf_reset, and rename it to the more
         fitting nf_reset_ct(). Patch from Florian Westphal.
      
      2) Fix deadlock in nft_connlimit between packet path updates and
         the garbage collector.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp_qoriq: Initialize the registers' spinlock before calling ptp_qoriq_settime · db34a471
      Vladimir Oltean committed
      Because ptp_qoriq_settime is being called prior to spin_lock_init, the
      following stack trace can be seen at driver probe time:
      
      [    2.269117] the code is fine but needs lockdep annotation.
      [    2.274569] turning off the locking correctness validator.
      [    2.280027] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc7-01478-g01eaa67a4797 #263
      [    2.288073] Hardware name: Freescale LS1021A
      [    2.292337] [<c0313cb4>] (unwind_backtrace) from [<c030e11c>] (show_stack+0x10/0x14)
      [    2.300045] [<c030e11c>] (show_stack) from [<c1219440>] (dump_stack+0xcc/0xf8)
      [    2.307235] [<c1219440>] (dump_stack) from [<c03b9b44>] (register_lock_class+0x730/0x73c)
      [    2.315372] [<c03b9b44>] (register_lock_class) from [<c03b6190>] (__lock_acquire+0x78/0x270c)
      [    2.323856] [<c03b6190>] (__lock_acquire) from [<c03b90cc>] (lock_acquire+0xe0/0x22c)
      [    2.331649] [<c03b90cc>] (lock_acquire) from [<c123c310>] (_raw_spin_lock_irqsave+0x54/0x68)
      [    2.340048] [<c123c310>] (_raw_spin_lock_irqsave) from [<c0e73fe4>] (ptp_qoriq_settime+0x38/0x80)
      [    2.348878] [<c0e73fe4>] (ptp_qoriq_settime) from [<c0e746d4>] (ptp_qoriq_init+0x1f8/0x484)
      [    2.357189] [<c0e746d4>] (ptp_qoriq_init) from [<c0e74aac>] (ptp_qoriq_probe+0xd0/0x184)
      [    2.365243] [<c0e74aac>] (ptp_qoriq_probe) from [<c0b0a07c>] (platform_drv_probe+0x48/0x9c)
      [    2.373555] [<c0b0a07c>] (platform_drv_probe) from [<c0b07a14>] (really_probe+0x1c4/0x400)
      [    2.381779] [<c0b07a14>] (really_probe) from [<c0b07e28>] (driver_probe_device+0x78/0x1b8)
      [    2.390003] [<c0b07e28>] (driver_probe_device) from [<c0b081d0>] (device_driver_attach+0x58/0x60)
      [    2.398832] [<c0b081d0>] (device_driver_attach) from [<c0b082d4>] (__driver_attach+0xfc/0x160)
      [    2.407402] [<c0b082d4>] (__driver_attach) from [<c0b05a84>] (bus_for_each_dev+0x68/0xb4)
      [    2.415539] [<c0b05a84>] (bus_for_each_dev) from [<c0b06b68>] (bus_add_driver+0x104/0x20c)
      [    2.423763] [<c0b06b68>] (bus_add_driver) from [<c0b0909c>] (driver_register+0x78/0x10c)
      [    2.431815] [<c0b0909c>] (driver_register) from [<c030313c>] (do_one_initcall+0x8c/0x3ac)
      [    2.439954] [<c030313c>] (do_one_initcall) from [<c1f013f4>] (kernel_init_freeable+0x468/0x548)
      [    2.448610] [<c1f013f4>] (kernel_init_freeable) from [<c12344d8>] (kernel_init+0x8/0x10c)
      [    2.456745] [<c12344d8>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
      [    2.464273] Exception stack(0xea89ffb0 to 0xea89fff8)
      [    2.469297] ffa0:                                     00000000 00000000 00000000 00000000
      [    2.477432] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.485566] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
      
      Fixes: ff54571a ("ptp_qoriq: convert to use ptp_qoriq_init/free")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'SJA1105-DSA-locking-fixes-for-PTP' · 76d67494
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      SJA1105 DSA locking fixes for PTP
      
      This series fixes the locking API usage problems spotted when compiling
      the kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_DEBUG_SPINLOCK=y.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Fix sleeping while atomic in .port_hwtstamp_set · 3e8db7e5
      Vladimir Oltean committed
      Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y:
      
      [   41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
      [   41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l
      [   41.583212] INFO: lockdep is turned off.
      [   41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827
      [   41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14)
      [   41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100)
      [   41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4)
      [   41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8)
      [   41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24)
      [   41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c)
      [   41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc)
      [   41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330)
      [   41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8)
      [   41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8)
      [   41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10)
      [   41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58)
      [   41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
      [   41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0)
      [   41.707796] 5fa0:                   beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80
      [   41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000
      [   41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c
      [   41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002
      [   41.735682] INFO: lockdep is turned off.
      
      Enabling RX timestamping will logically disturb the fastpath (processing
      of meta frames). Replace bool hwts_rx_en with a bit that is checked
      atomically from the fastpath and temporarily unset from the sleepable
      context during a change of the RX timestamping process (a destructive
      operation anyway, as it requires a switch reset).
      If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any
      received meta frame and not take the meta_lock at all.
      
      Fixes: a602afd2 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
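The idea of a state bit that the fastpath reads atomically, and that the sleepable path clears while it reconfigures, can be sketched in user space as below. This is only an illustration using C11 atomics: `struct tagger_state`, `handle_meta_frame()` and `reconfigure_rx_tstamp()` are hypothetical names, not the driver's actual structures or functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tagger state; not the driver's real layout. */
struct tagger_state {
    atomic_bool rx_tstamp_enabled;  /* plays the role of the state bit */
    int meta_frames_handled;
};

/* Fastpath: must not sleep. Returns true if the meta frame was processed. */
static bool handle_meta_frame(struct tagger_state *s)
{
    assert(s != NULL);
    if (!atomic_load(&s->rx_tstamp_enabled))
        return false;               /* drop the frame without touching any lock */
    s->meta_frames_handled++;       /* real code would take a spinlock here */
    return true;
}

/* Sleepable path: clear the bit, reconfigure, then set it again if requested. */
static void reconfigure_rx_tstamp(struct tagger_state *s, bool enable)
{
    assert(s != NULL);
    atomic_store(&s->rx_tstamp_enabled, false);
    /* ... a slow, possibly sleeping reconfiguration would run here ... */
    atomic_store(&s->rx_tstamp_enabled, enable);
}
```

With this shape, the slow path never has to hold a spinlock across a sleeping operation; the cost is that meta frames arriving during reconfiguration are dropped.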
    • net: dsa: sja1105: Initialize the meta_lock · d6530e5a
      Vladimir Oltean committed
      Otherwise, with CONFIG_DEBUG_SPINLOCK=y, this stack trace gets printed
      when enabling RX timestamping and receiving a PTP frame:
      
      [  318.537078] INFO: trying to register non-static key.
      [  318.542040] the code is fine but needs lockdep annotation.
      [  318.547500] turning off the locking correctness validator.
      [  318.552972] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-13257-g0825b0669811-dirty #1962
      [  318.561283] Hardware name: Freescale LS1021A
      [  318.565566] [<c03144bc>] (unwind_backtrace) from [<c030e164>] (show_stack+0x10/0x14)
      [  318.573289] [<c030e164>] (show_stack) from [<c11b9f50>] (dump_stack+0xd4/0x100)
      [  318.580579] [<c11b9f50>] (dump_stack) from [<c03b9b40>] (register_lock_class+0x728/0x734)
      [  318.588731] [<c03b9b40>] (register_lock_class) from [<c03b60c4>] (__lock_acquire+0x78/0x25cc)
      [  318.597227] [<c03b60c4>] (__lock_acquire) from [<c03b8ef8>] (lock_acquire+0xd8/0x234)
      [  318.605033] [<c03b8ef8>] (lock_acquire) from [<c11db934>] (_raw_spin_lock+0x44/0x54)
      [  318.612755] [<c11db934>] (_raw_spin_lock) from [<c1164370>] (sja1105_rcv+0x1f8/0x4e8)
      [  318.620561] [<c1164370>] (sja1105_rcv) from [<c115d7cc>] (dsa_switch_rcv+0x80/0x204)
      [  318.628283] [<c115d7cc>] (dsa_switch_rcv) from [<c0f58c80>] (__netif_receive_skb_one_core+0x50/0x6c)
      [  318.637386] [<c0f58c80>] (__netif_receive_skb_one_core) from [<c0f58f04>] (netif_receive_skb_internal+0xac/0x264)
      [  318.647611] [<c0f58f04>] (netif_receive_skb_internal) from [<c0f59e98>] (napi_gro_receive+0x1d8/0x338)
      [  318.656887] [<c0f59e98>] (napi_gro_receive) from [<c0c298a4>] (gfar_clean_rx_ring+0x328/0x724)
      [  318.665472] [<c0c298a4>] (gfar_clean_rx_ring) from [<c0c29e60>] (gfar_poll_rx_sq+0x34/0x94)
      [  318.673795] [<c0c29e60>] (gfar_poll_rx_sq) from [<c0f5b40c>] (net_rx_action+0x128/0x4f8)
      [  318.681860] [<c0f5b40c>] (net_rx_action) from [<c03022f0>] (__do_softirq+0x148/0x5ac)
      [  318.689666] [<c03022f0>] (__do_softirq) from [<c0355af4>] (irq_exit+0x160/0x170)
      [  318.697040] [<c0355af4>] (irq_exit) from [<c03c6818>] (__handle_domain_irq+0x60/0xb4)
      [  318.704847] [<c03c6818>] (__handle_domain_irq) from [<c07e9440>] (gic_handle_irq+0x58/0x9c)
      [  318.713172] [<c07e9440>] (gic_handle_irq) from [<c0301a70>] (__irq_svc+0x70/0x98)
      [  318.720622] Exception stack(0xc2001f18 to 0xc2001f60)
      [  318.725656] 1f00:                                                       00000001 00000006
      [  318.733805] 1f20: 00000000 c20165c0 ffffe000 c2010cac c2010cf4 00000001 00000000 c2010c88
      [  318.741955] 1f40: c1f7a5a8 00000000 00000000 c2001f68 c03ba140 c030a288 200e0013 ffffffff
      [  318.750110] [<c0301a70>] (__irq_svc) from [<c030a288>] (arch_cpu_idle+0x24/0x3c)
      [  318.757486] [<c030a288>] (arch_cpu_idle) from [<c038a480>] (do_idle+0x1b8/0x2a4)
      [  318.764859] [<c038a480>] (do_idle) from [<c038a94c>] (cpu_startup_entry+0x18/0x1c)
      [  318.772407] [<c038a94c>] (cpu_startup_entry) from [<c1e00f10>] (start_kernel+0x4cc/0x4fc)
      
      Fixes: 844d7edc ("net: dsa: sja1105: Add a global sja1105_tagger_data structure")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/rds: Fix error handling in rds_ib_add_one() · d64bf89a
      Dotan Barak committed
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocating some resources, such as the protection domain.
      If the allocation of such a resource fails, these uninitialized
      variables are accessed in rds_ib_dev_free() on the failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
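The ordering problem described above can be illustrated with a small user-space sketch. It assumes a hypothetical `struct demo_dev` with one list head and one allocated resource; none of these names come from the RDS code. The point is only that the list head is initialized before the first operation that can fail, so the shared cleanup routine never sees garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical device; the names only loosely mirror the commit message. */
struct demo_dev {
    struct list_head conn_list;
    void *pd;    /* stands in for a resource such as a protection domain */
};

/* Shared cleanup, reached from both the failure path and normal teardown. */
static void demo_dev_free(struct demo_dev *dev)
{
    assert(dev != NULL);
    /* Only meaningful if conn_list was initialized before any failure. */
    assert(list_is_empty(&dev->conn_list));
    free(dev->pd);
    free(dev);
}

/* Returns NULL when the (simulated) resource allocation fails. */
static struct demo_dev *demo_dev_add(bool fail_alloc)
{
    struct demo_dev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    list_init(&dev->conn_list);      /* first, before anything can fail */
    dev->pd = fail_alloc ? NULL : malloc(16);
    if (!dev->pd) {
        demo_dev_free(dev);          /* inspects the list, so it must be valid */
        return NULL;
    }
    return dev;
}
```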
    • net: dsa: rtl8366: Check VLAN ID and not ports · e8521e53
      Linus Walleij committed
      There has been some confusion between the port number and
      the VLAN ID in this driver. What we need to check for
      validity is the VLAN ID, nothing else.
      
      The current confusion came from assigning a few default
      VLANs for default routing and we need to rewrite that
      properly.
      
      Instead of checking if the port number is a valid VLAN
      ID, check the actual VLAN IDs passed in to the callback
      one by one as expected.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
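A minimal sketch of "check the VLAN IDs one by one" follows. It assumes the generic 802.1Q usable range of 1 to 4094; the real driver's validity rule depends on the chip, and `demo_vid_is_valid()` and `demo_vlan_prepare()` are hypothetical names rather than the driver's callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: usable VLAN IDs are 1..4094 (802.1Q). */
static bool demo_vid_is_valid(uint16_t vid)
{
    return vid >= 1 && vid <= 4094;
}

/* Validate every VLAN ID in [vid_begin, vid_end]; the port is not a VLAN ID. */
static int demo_vlan_prepare(int port, uint16_t vid_begin, uint16_t vid_end)
{
    assert(vid_begin <= vid_end);
    (void)port;   /* deliberately not used for the validity check */

    /* A wider loop variable avoids wrap-around when vid_end is 65535. */
    for (uint32_t vid = vid_begin; vid <= vid_end; vid++)
        if (!demo_vid_is_valid((uint16_t)vid))
            return -EINVAL;
    return 0;
}
```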
  2. 02 Oct, 2019 29 commits
    • mlx5: avoid 64-bit division in dr_icm_pool_mr_create() · 8b6b82ad
      Michal Kubecek committed
      Recently added code introduces a 64-bit division in dr_icm_pool_mr_create(),
      so the build on 32-bit architectures fails with

        ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!

      As the divisor is always a power of 2, we can use a bitwise operation
      instead.
      
      Fixes: 29cf8feb ("net/mlx5: DR, ICM pool memory allocator")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
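The power-of-two trick mentioned above can be shown in isolation. The helper names below (`mod_pow2()`, `pad_to_align()`) are hypothetical and are not taken from the mlx5 code; the sketch only demonstrates that `value % align` equals `value & (align - 1)` when `align` is a power of two, which lets 32-bit builds avoid a 64-bit modulo helper such as `__umoddi3`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Remainder of value / align without a 64-bit division.
 * Only valid when align is a power of two. */
static uint64_t mod_pow2(uint64_t value, uint64_t align)
{
    assert(is_power_of_two(align));
    return value & (align - 1);
}

/* Number of bytes needed to round addr up to the next multiple of align. */
static uint64_t pad_to_align(uint64_t addr, uint64_t align)
{
    uint64_t rem = mod_pow2(addr, align);

    return rem ? align - rem : 0;
}
```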
    • tipc: fix unlimited bundling of small messages · e95584a8
      Tuong Lien committed
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in the congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Eventually, it can even lead to buffer exhaustion and link reset.

      We fix this by keeping a 'target_bskb' buffer pointer at each level;
      then, when bundling, we only bundle messages at the same importance
      level. This way, we know exactly how many slots a certain level
      has occupied in the queue, so we can manage level congestion accurately.

      Bundling messages at the same level brings a further benefit. Let us
      consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;

      When a 64-byte message comes and is bundled the first time, we add the
      overhead of a message bundle to it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled with the previous one. This means the last bundle carries only
      one payload message, which is totally inefficient, for the receiver
      as well! Later on, another 64-byte message comes, now we make a new bundle
      and the same story repeats...

      With the new bundling algorithm, this will not happen: the 64-byte
      messages will be bundled together even when 4096-byte message(s)
      come in between. However, if the 4096-byte messages are sent at the
      same level, i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.

      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit rate, so that, when one
      message is bundled, it is transmitted shortly afterwards. Then, another
      message comes, a new bundle is created and so on...

      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
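The per-level bundling idea can be modeled with a toy backlog, shown below. This is a simplified sketch, not TIPC code: `struct demo_backlog`, `demo_backlog_add()` and the bundle capacity `DEMO_BUNDLE_ROOM` are all assumptions made for illustration. Each importance level keeps its own open bundle (the role played by the per-level 'target_bskb' pointer), so a message only ever joins a bundle of its own level and every queue slot is charged to the level that really uses it.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_LEVELS = 4 };            /* e.g. LOW, MEDIUM, HIGH, CRITICAL */
enum { DEMO_BUNDLE_ROOM = 1400 };    /* assumed payload room in one bundle */

struct demo_backlog {
    int slots[DEMO_LEVELS];      /* queue slots occupied per level */
    int open_room[DEMO_LEVELS];  /* bytes left in the level's open bundle */
};

/* Queue one message. Returns true if it joined an existing bundle of the
 * same level, false if it needed a new queue slot. */
static bool demo_backlog_add(struct demo_backlog *b, int level, int size)
{
    assert(level >= 0 && level < DEMO_LEVELS && size > 0);

    if (size > DEMO_BUNDLE_ROOM) {   /* too large to bundle: own slot */
        b->slots[level]++;
        return false;
    }
    if (b->open_room[level] >= size) {
        b->open_room[level] -= size; /* same-level bundle has room */
        return true;
    }
    b->slots[level]++;               /* start a new bundle for this level */
    b->open_room[level] = DEMO_BUNDLE_ROOM - size;
    return false;
}
```

In this model, a large message at another level arriving between two small messages does not close the small messages' bundle, which mirrors the 64-byte and 4096-byte scenario from the commit message.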
    • xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a761129e
      Dongli Zhang committed
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      In the situation when the rsp_cons is approaching 0xffffffff, the return
      value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
      caller) would regard as an error. As a result, queue->rx.rsp_cons is set
      incorrectly because it is updated only when there is an error. If there is no
      error, xennet_poll() is responsible for updating queue->rx.rsp_cons.
      Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
      queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
      This leads to NULL pointer access in the next iteration to process rx ring
      buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b36850 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
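Why an in-band sentinel is unsafe for a free-running 32-bit index can be shown with a short sketch. The type `ring_idx_t` and both functions are hypothetical; they only model the two return conventions described above and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;   /* free-running index that wraps at 2^32 */

/* Problematic shape: ~0U doubles as the error marker, although it is also
 * a perfectly valid index value. */
static ring_idx_t advance_sentinel(ring_idx_t cons, uint32_t frags, bool overflow)
{
    if (overflow)
        return ~0U;
    return cons + frags;
}

/* Safer shape: the status is returned separately and the index is always
 * updated through the pointer, on success and on failure alike. */
static int advance_status(ring_idx_t *cons, uint32_t frags, bool overflow)
{
    assert(cons != NULL);
    *cons += frags;
    return overflow ? -1 : 0;
}
```

With the first convention, a successful call starting at 0xfffffffd and consuming two entries returns 0xffffffff, which the caller cannot tell apart from a failure.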
    • ipv6: Handle race in addrconf_dad_work · a3ce2a21
      David Ahern committed
      Rajendra reported a kernel panic when a link was taken down:
      
      [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
      <snip>
      
      [ 6870.570501] Call Trace:
      [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
      [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
      [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
      [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
      [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
      [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
      [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
      [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
      [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
      [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
      [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrconf_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      This scenario does not occur when the ipv6 address is not kept
      (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
      state of the ifp to DEAD. Handle when the addresses are kept by checking
      IF_READY which is reset by addrconf_ifdown.
      
      The 'dead' flag for an inet6_addr is set only under rtnl, in
      addrconf_ifdown and it means the device is getting removed (or IPv6 is
      disabled). The interesting cases for changing the idev flag are
      addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
      (reset the flag). The former does not have the idev lock - only rtnl;
      the latter has both. Based on that the existing dead + IF_READY check
      can be moved to right after the rtnl_lock in addrconf_dad_work.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: adjust rto_base in retransmits_timed_out() · 3256a2d6
      Eric Dumazet committed
      The cited commit exposed an old retransmits_timed_out() bug
      which assumed it could call tcp_model_timeout() with
      TCP_RTO_MIN as rto_base for all states.
      
      But flows in the SYN_SENT or SYN_RECV state use a different
      RTO base (1 sec instead of 200 ms, unless BPF chooses
      another value).
      
      This caused a reduction of SYN retransmits from 6 to 4 with
      the default /proc/sys/net/ipv4/tcp_syn_retries value.
      
      Fixes: a41e8a88 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
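The effect of the RTO base on the retransmit budget can be worked through with a small user-space model. This is a sketch of a typical exponential-backoff budget, written under assumptions and not copied from the kernel's tcp_model_timeout(): `DEMO_RTO_MAX_MS` (a 120 s cap on a single RTO) and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RTO_MAX_MS 120000u   /* assumed cap on a single RTO, in ms */

/* Integer base-2 logarithm, rounded down. */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;

    assert(v != 0);
    while (v >>= 1)
        r++;
    return r;
}

/* Total time budget for `boundary` retransmits when the RTO starts at
 * rto_base_ms, doubles each time, and is capped at DEMO_RTO_MAX_MS. */
static uint32_t model_timeout_ms(unsigned int boundary, uint32_t rto_base_ms)
{
    unsigned int thresh;

    assert(rto_base_ms > 0 && rto_base_ms <= DEMO_RTO_MAX_MS);
    thresh = ilog2_u32(DEMO_RTO_MAX_MS / rto_base_ms);
    if (boundary <= thresh)
        return ((2u << boundary) - 1) * rto_base_ms;
    return ((2u << thresh) - 1) * rto_base_ms +
           (boundary - thresh) * DEMO_RTO_MAX_MS;
}
```

In this model, a boundary of 6 gives a budget of 25.4 s with a 200 ms base but 127 s with a 1 s base. Applying the smaller base to SYN flows, whose timers really start at 1 s, would end the retransmit sequence early, which is consistent with the drop from 6 to 4 SYN retransmits described in the commit message.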
    • vsock: Fix a lockdep warning in __vsock_release() · 0d9138ff
      Dexuan Cui committed
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hso: fix NULL-deref on tty open · 8353da9f
      Johan Hovold committed
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ag71xx: fix mdio subnode support · 569aad4f
      Oleksij Rempel committed
      This patch syncs the driver with the actual devicetree documentation:
      Documentation/devicetree/bindings/net/qca,ar71xx.txt
      |Optional subnodes:
      |- mdio : specifies the mdio bus, used as a container for phy nodes
      |  according to phy.txt in the same directory
      
      The driver was working with a fixed phy without any noticeable issues. This bug
      was uncovered by introducing the dsa ar9331-switch driver.
      Since no one reported this bug until now, I assume nobody is using it
      and this patch should not break existing systems.
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-fixes' · b33210e3
      David S. Miller committed
      Jose Abreu says:
      
      ====================
      net: stmmac: Fixes for -net
      
      Misc fixes for -net tree. More info in commit logs.
      
      v2 is just a rebase of v1 against -net and we added a new patch (09/09) to
      fix RSS feature.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: xgmac: Fix RSS writing wrong keys · 56627336
      Jose Abreu committed
      Commit b6b6cc9a changed the call to dwxgmac2_rss_write_reg(),
      passing it the variable cfg->key[i].

      As key is a u8 but we write 32 bits at a time, we need to cast it into
      a u32 so that the correct key values are written. Notice that the for
      loop already takes this into account, so we don't try to write past the
      key's size.
      
      Fixes: b6b6cc9a ("net: stmmac: selftest: avoid large stack usage")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
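Writing a byte array into 32-bit registers can be sketched as follows. Everything here is hypothetical: `DEMO_KEY_BYTES`, `demo_regs`, `demo_write_reg()` and `demo_write_key()` are stand-ins, and the sketch uses `memcpy()` to form each word where the commit message speaks of a cast. The two points it illustrates are that each register must receive four key bytes, not a single `u8` widened to 32 bits, and that the loop runs over words, so it never reads past the key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_KEY_BYTES 40   /* assumed key length for this sketch */

/* Stand-in for the hardware key registers. */
static uint32_t demo_regs[DEMO_KEY_BYTES / sizeof(uint32_t)];

static void demo_write_reg(unsigned int idx, uint32_t val)
{
    assert(idx < sizeof(demo_regs) / sizeof(demo_regs[0]));
    demo_regs[idx] = val;
}

/* Write the key 32 bits at a time; the loop bound is in words, not bytes. */
static void demo_write_key(const uint8_t key[DEMO_KEY_BYTES])
{
    for (unsigned int i = 0; i < DEMO_KEY_BYTES / sizeof(uint32_t); i++) {
        uint32_t word;

        /* Passing key[i] here instead would write one byte per register. */
        memcpy(&word, key + i * sizeof(word), sizeof(word));
        demo_write_reg(i, word);
    }
}
```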
    • net: stmmac: xgmac: Fix RSS not writing all Keys to HW · 3c72d4d3
      Jose Abreu committed
      sizeof(cfg->key) is not equal to ARRAY_SIZE(cfg->key). Fix it. This warning is
      triggered when building with the compiler flag -Wsizeof-array-div.
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Nick Desaulniers <ndesaulniers@google.com>
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Fixes: 76067459 ("net: stmmac: Implement RSS and enable it in XGMAC core")
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: xgmac: Disable the Timestamp interrupt by default · 30300d9f
      Jose Abreu committed
      We don't use it anyway as XGMAC only supports polling for timestamp (in
      current SW implementation). This greatly reduces the system load by
      reducing the number of interrupts.
      
      Fixes: 2142754f ("net: stmmac: Add MAC related callbacks for XGMAC2")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Do not stop PHY if WoL is enabled · 3e2bf04f
      Jose Abreu committed
      If WoL is enabled we can't really stop the PHY, otherwise we will not
      receive the WoL packet. Fix this by telling phylink that only the MAC is
      down and only stop the PHY if WoL is not enabled.
      
      Fixes: 74371272 ("net: stmmac: Convert to phylink and remove phylib logic")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Correctly take timestamp for PTPv2 · 14f34733
      Jose Abreu committed
      The case for PTPV2_EVENT requires event packets to be captured so add
      this setting to the list of enabled captures.
      
      Fixes: 891434b1 ("stmmac: add IEEE PTPv1 and PTPv2 support.")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac4: Always update the MAC Hash Filter · f79bfda3
      Jose Abreu committed
      We need to always update the MAC Hash Filter so that previous entries
      are invalidated.
      
      Found out while running stmmac selftests.
      
      Fixes: b8ef7020 ("net: stmmac: add support for hash table size 128/256 in dwmac4")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: selftests: Always use max DMA size in Jumbo Test · 432439fe
      Jose Abreu committed
      Although some XGMAC setups support frames larger than the DMA size, some of
      them may not. As we can't know beforehand which ones do, let's use
      the maximum DMA buffer size in the Jumbo Tests.
      
      User can always reconfigure the MTU to achieve larger frames.
      
      Fixes: 427849e8 ("net: stmmac: selftests: Add Jumbo Frame tests")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: xgmac: Detect Hash Table size dinamically · c11986b9
      Jose Abreu committed
      Since commit b8ef7020 ("net: stmmac: add support for hash table size
      128/256 in dwmac4"), we can detect the Hash Table size dynamically.
      
      Let's implement this feature in XGMAC cores and fix possible setups that
      don't support the maximum size for Hash Table.
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: xgmac: Not all Unicast addresses may be available · 9a2ae7b3
      Jose Abreu committed
      Some setups may not have all Unicast address filters available. Let's
      check this before trying to set up filters.
      
      Fixes: 0efedbf1 ("net: stmmac: xgmac: Fix XGMAC selftests")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Fix error handling in param and info_get dumpit cb · 93c2fcb0
      Vasundhara Volam committed
      If any of the param or info_get ops returns an error, the dumpit cb
      skips dumping the remaining params or info_get ops for all the
      drivers.

      Fix this by not returning when a param/info_get op reports that it is
      not supported, and continue to dump the remaining information.

      v2: Modify the patch to return the error, except for params/info_get
      ops that return -EOPNOTSUPP, as suggested by Andrew Lunn. Also, modify
      the commit message to reflect the same.
      
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93c2fcb0
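The error-handling rule described in the devlink commit above can be sketched in plain userspace C. This is only an illustration of the policy (skip -EOPNOTSUPP, stop on any other error); `dump_op_fn`, `dump_all` and the three `op_*` helpers are hypothetical names and are not part of the devlink API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-driver param/info_get callback. */
typedef int (*dump_op_fn)(void *ctx);

/*
 * Walk every op, much as a dumpit callback walks every driver.
 * -EOPNOTSUPP means "this driver has nothing to report": skip it.
 * Any other non-zero value is a real error: stop and return it.
 * *dumped counts the ops that were dumped successfully.
 */
static int dump_all(const dump_op_fn *ops, size_t n_ops, void *ctx,
		    size_t *dumped)
{
	size_t i;

	*dumped = 0;
	for (i = 0; i < n_ops; i++) {
		int err = ops[i](ctx);

		if (err == -EOPNOTSUPP)
			continue;	/* keep dumping the remaining ops */
		if (err)
			return err;	/* genuine failure: report it */
		(*dumped)++;
	}
	return 0;
}

/* Hypothetical ops used to exercise the three cases. */
static int op_ok(void *ctx) { (void)ctx; return 0; }
static int op_unsupported(void *ctx) { (void)ctx; return -EOPNOTSUPP; }
static int op_broken(void *ctx) { (void)ctx; return -EIO; }
```

With the ops `{ op_ok, op_unsupported, op_ok }` the walk completes and two entries are dumped; replacing the middle op with `op_broken` stops the walk and returns -EIO.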
    • W
      net: dsa: rtl8366rb: add missing of_node_put after calling of_get_child_by_name · f32eb9d8
      Wen Yang committed
      of_node_put() needs to be called once the device node obtained
      from of_get_child_by_name() is no longer in use.
      irq_domain_add_linear() also calls of_node_get() to increase the refcount,
      so the irq_domain will not be affected when the node is released.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f32eb9d8
    • W
      net: mscc: ocelot: add missing of_node_put after calling of_get_child_by_name · d2c50b1c
      Wen Yang committed
      of_node_put() needs to be called once the device node obtained
      from of_get_child_by_name() is no longer in use.
      In both the success and failure cases, we need to release 'ports',
      so clean up the code using goto.
      
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2c50b1c
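The "release on every path via goto" pattern from the two of_node_put commits above can be sketched with a mock reference count. This is a userspace illustration only: `struct node`, `node_get_child`, `node_put` and `probe_ports` are hypothetical stand-ins for `struct device_node`, `of_get_child_by_name()`, `of_node_put()` and the driver probe code, and the failure condition is invented.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted node, standing in for struct device_node. */
struct node {
	int refcount;
	int n_ports;	/* a negative value makes the probe fail */
};

/* Like of_get_child_by_name(): returns the child with a reference held. */
static struct node *node_get_child(struct node *child)
{
	if (child)
		child->refcount++;
	return child;
}

/* Like of_node_put(): drops one reference. */
static void node_put(struct node *np)
{
	if (np)
		np->refcount--;
}

static int probe_ports(struct node *child)
{
	struct node *ports;
	int err = 0;

	ports = node_get_child(child);
	if (!ports)
		return -ENODEV;	/* nothing acquired, nothing to release */

	if (ports->n_ports < 0) {
		err = -EINVAL;
		goto out_put_ports;
	}

	/* ... per-port setup would go here ... */

out_put_ports:
	/* Single exit: the reference is dropped on success and on failure. */
	node_put(ports);
	return err;
}
```

Whatever `probe_ports()` returns, the node's reference count is back at its initial value afterwards, which is the property the commits restore.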
    • V
      net: sched: cbs: Avoid division by zero when calculating the port rate · 83c8c3cf
      Vladimir Oltean committed
      As explained in the "net: sched: taprio: Avoid division by zero on
      invalid link speed" commit, it is legal for the ethtool API to return
      zero as a link speed. So guard against it to ensure we don't perform a
      division by zero in kernel.
      
      Fixes: e0a7683d ("net/sched: cbs: fix port_rate miscalculation")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83c8c3cf
    • V
      net: sched: taprio: Avoid division by zero on invalid link speed · 9a9251a3
      Vladimir Oltean committed
      The check in taprio_set_picos_per_byte is currently not robust enough
      and will trigger this division by zero, due to e.g. PHYLINK not setting
      kset->base.speed when there is no PHY connected:
      
      [   27.109992] Division by zero in kernel.
      [   27.113842] CPU: 1 PID: 198 Comm: tc Not tainted 5.3.0-rc5-01246-gc4006b8c2637-dirty #212
      [   27.121974] Hardware name: Freescale LS1021A
      [   27.126234] [<c03132e0>] (unwind_backtrace) from [<c030d8b8>] (show_stack+0x10/0x14)
      [   27.133938] [<c030d8b8>] (show_stack) from [<c10b21b0>] (dump_stack+0xb0/0xc4)
      [   27.141124] [<c10b21b0>] (dump_stack) from [<c10af97c>] (Ldiv0_64+0x8/0x18)
      [   27.148052] [<c10af97c>] (Ldiv0_64) from [<c0700260>] (div64_u64+0xcc/0xf0)
      [   27.154978] [<c0700260>] (div64_u64) from [<c07002d0>] (div64_s64+0x4c/0x68)
      [   27.161993] [<c07002d0>] (div64_s64) from [<c0f3d890>] (taprio_set_picos_per_byte+0xe8/0xf4)
      [   27.170388] [<c0f3d890>] (taprio_set_picos_per_byte) from [<c0f3f614>] (taprio_change+0x668/0xcec)
      [   27.179302] [<c0f3f614>] (taprio_change) from [<c0f2bc24>] (qdisc_create+0x1fc/0x4f4)
      [   27.187091] [<c0f2bc24>] (qdisc_create) from [<c0f2c0c8>] (tc_modify_qdisc+0x1ac/0x6f8)
      [   27.195055] [<c0f2c0c8>] (tc_modify_qdisc) from [<c0ee9604>] (rtnetlink_rcv_msg+0x268/0x2dc)
      [   27.203449] [<c0ee9604>] (rtnetlink_rcv_msg) from [<c0f4fef0>] (netlink_rcv_skb+0xe0/0x114)
      [   27.211756] [<c0f4fef0>] (netlink_rcv_skb) from [<c0f4f6cc>] (netlink_unicast+0x1b4/0x22c)
      [   27.219977] [<c0f4f6cc>] (netlink_unicast) from [<c0f4fa84>] (netlink_sendmsg+0x284/0x340)
      [   27.228198] [<c0f4fa84>] (netlink_sendmsg) from [<c0eae5fc>] (sock_sendmsg+0x14/0x24)
      [   27.235988] [<c0eae5fc>] (sock_sendmsg) from [<c0eaedf8>] (___sys_sendmsg+0x214/0x228)
      [   27.243863] [<c0eaedf8>] (___sys_sendmsg) from [<c0eb015c>] (__sys_sendmsg+0x50/0x8c)
      [   27.251652] [<c0eb015c>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
      [   27.259524] Exception stack(0xe8045fa8 to 0xe8045ff0)
      [   27.264546] 5fa0:                   b6f608c8 000000f8 00000003 bed7e2f0 00000000 00000000
      [   27.272681] 5fc0: b6f608c8 000000f8 004ce54c 00000128 5d3ce8c7 00000000 00000026 00505c9c
      [   27.280812] 5fe0: 00000070 bed7e298 004ddd64 b6dd1e64
      
      Russell King points out that the ethtool API says zero is a valid return
      value of __ethtool_get_link_ksettings:
      
         * If it is enabled then they are read-only; if the link
         * is up they represent the negotiated link mode; if the link is down,
         * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
         * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
      
        So, it seems that taprio is not following the API... I'd suggest either
        fixing taprio, or getting agreement to change the ethtool API.
      
      The chosen path was to fix taprio.
      
      Fixes: 7b9eba7b ("net/sched: taprio: fix picos_per_byte miscalculation")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a9251a3
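A guard of the kind the taprio commit above describes might look like the following userspace sketch. It is not the patch itself: `picos_per_byte()` and its parameters are hypothetical, the 10 Mb/s fallback is an assumption made for illustration, and `SPEED_10` and `SPEED_UNKNOWN` are redefined locally with the values used by the ethtool UAPI header.

```c
#include <stdint.h>

#define SPEED_10	10
#define SPEED_UNKNOWN	(-1)

/*
 * Picoseconds needed to transmit one byte at `speed` Mb/s:
 *   8 bits * 10^12 ps/s / (speed * 10^6 bit/s) = 8 * 10^6 / speed
 *
 * get_err models a failing __ethtool_get_link_ksettings() call and
 * reported_speed models the speed it reports, in Mb/s.
 */
static int64_t picos_per_byte(int get_err, int reported_speed)
{
	int speed = SPEED_10;	/* assumed fallback when no speed is known */

	/* Zero and SPEED_UNKNOWN are both legal answers: never divide by them. */
	if (!get_err && reported_speed != 0 && reported_speed != SPEED_UNKNOWN)
		speed = reported_speed;

	return (int64_t)8 * 1000 * 1000 / speed;
}
```

At 1000 Mb/s this gives 8000 ps per byte; a reported speed of 0, SPEED_UNKNOWN, or a failed query all fall back to the 10 Mb/s figure of 800000 ps per byte instead of dividing by zero.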
    • P
      netfilter: nft_connlimit: disable bh on garbage collection · 34a4c95a
      Pablo Neira Ayuso committed
      BH must be disabled when invoking nf_conncount_gc_list() to perform
      garbage collection, otherwise a deadlock might happen.
      
        nf_conncount_add+0x1f/0x50 [nf_conncount]
        nft_connlimit_eval+0x4c/0xe0 [nft_connlimit]
        nft_dynset_eval+0xb5/0x100 [nf_tables]
        nft_do_chain+0xea/0x420 [nf_tables]
        ? sch_direct_xmit+0x111/0x360
        ? noqueue_init+0x10/0x10
        ? __qdisc_run+0x84/0x510
        ? tcp_packet+0x655/0x1610 [nf_conntrack]
        ? ip_finish_output2+0x1a7/0x430
        ? tcp_error+0x130/0x150 [nf_conntrack]
        ? nf_conntrack_in+0x1fc/0x4c0 [nf_conntrack]
        nft_do_chain_ipv4+0x66/0x80 [nf_tables]
        nf_hook_slow+0x44/0xc0
        ip_rcv+0xb5/0xd0
        ? ip_rcv_finish_core.isra.19+0x360/0x360
        __netif_receive_skb_one_core+0x52/0x70
        netif_receive_skb_internal+0x34/0xe0
        napi_gro_receive+0xba/0xe0
        e1000_clean_rx_irq+0x1e9/0x420 [e1000e]
        e1000e_poll+0xbe/0x290 [e1000e]
        net_rx_action+0x149/0x3b0
        __do_softirq+0xde/0x2d8
        irq_exit+0xba/0xc0
        do_IRQ+0x85/0xd0
        common_interrupt+0xf/0xf
        </IRQ>
        RIP: 0010:nf_conncount_gc_list+0x3b/0x130 [nf_conncount]
      
      Fixes: 2f971a8f ("netfilter: nf_conncount: move all list iterations under spinlock")
      Reported-by: Laura Garcia Liebana <nevola@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      34a4c95a
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal committed
      Commit 174e2381 ("sk_buff: drop all skb extensions on free and skb
      scrubbing") made napi recycle always drop skb extensions. The
      additional skb_ext_del() that is performed via nf_reset on napi skb
      recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so that queued skbs won't
      block 'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
    • D
      Merge tag 'mac80211-for-davem-2019-10-01' of... · 9cfc3702
      David S. Miller committed
      Merge tag 'mac80211-for-davem-2019-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A small list of fixes this time:
       * two null pointer dereference fixes
       * a fix for preempt-enabled/BHs-enabled (lockdep) splats
         (that correctly pointed out a bug)
       * a fix for multi-BSSID ordering assumptions
       * a fix for the EDMG support, on-stack chandefs need to
         be initialized properly (now that they're bigger)
       * beacon (head) data from userspace should be validated
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cfc3702
    • A
      ionic: select CONFIG_NET_DEVLINK · 6de6d185
      Arnd Bergmann committed
      When no other driver selects the devlink library code, ionic
      produces a link failure:
      
      drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_alloc':
      ionic_devlink.c:(.text+0xd): undefined reference to `devlink_alloc'
      drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_register':
      ionic_devlink.c:(.text+0x71): undefined reference to `devlink_register'
      
      Add the same 'select' statement that the other drivers use here.
      
      Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6de6d185
    • A
      docs: networking: Add title caret and missing doc · c5f75a14
      Adam Zerella committed
      Resolve a couple of Sphinx documentation warnings
      that are generated in the networking section.
      
      - WARNING: document isn't included in any toctree
      - WARNING: Title underline too short.
      Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5f75a14
    • L
      net: socionext: netsec: always grab descriptor lock · 55131dec
      Lorenzo Bianconi committed
      Always acquire the tx descriptor spinlock, even if an XDP program is not
      loaded on the netsec device, since ndo_xdp_xmit can run concurrently with
      netsec_netdev_start_xmit and netsec_clean_tx_dring. This can happen when
      an XDP program is loaded on a different device (e.g. virtio-net):
      xdp_do_redirect_map/xdp_do_redirect_slow can redirect to netsec even if
      we do not have an XDP program on it.
      
      Fixes: ba2b2321 ("net: netsec: add XDP support")
      Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55131dec
  3. 01 Oct, 2019 4 commits