1. 11 Sep 2019, 10 commits
  2. 10 Sep 2019, 2 commits
  3. 08 Sep 2019, 1 commit
    • nfp: flower: cmsg rtnl locks can timeout reify messages · 28abe579
      Committed by Fred Lotter
      Flower control message replies are handled in different locations. The truly
      high priority replies are handled in the BH (tasklet) context, while the
      remaining replies are handled in a predefined Linux work queue. The work
      queue handler orders replies into high and low priority groups, and always
      starts servicing the high priority replies within the received batch first.
      
      Reply Type:			Rtnl Lock:	Handler:
      
      CMSG_TYPE_PORT_MOD		no		BH tasklet (mtu)
      CMSG_TYPE_TUN_NEIGH		no		BH tasklet
      CMSG_TYPE_FLOW_STATS		no		BH tasklet
      CMSG_TYPE_PORT_REIFY		no		WQ high
      CMSG_TYPE_PORT_MOD		yes		WQ high (link/mtu)
      CMSG_TYPE_MERGE_HINT		yes		WQ low
      CMSG_TYPE_NO_NEIGH		no		WQ low
      CMSG_TYPE_ACTIVE_TUNS		no		WQ low
      CMSG_TYPE_QOS_STATS		no		WQ low
      CMSG_TYPE_LAG_CONFIG		no		WQ low
      
      A subset of control messages can block waiting for an rtnl lock (from both
      work queue priority groups). The rtnl lock is heavily contended for by
      external processes such as systemd-udevd, systemd-network and libvirtd,
      especially during netdev creation, such as when flower VFs and representors
      are instantiated.
      
      Kernel netlink instrumentation shows that external processes (such as
      systemd-udevd) often use successive rtnl_trylock() sequences, which can
      cause an rtnl_lock()-blocked control message to starve for long periods
      during rtnl lock contention, i.e. during netdev creation.
      
      In the current design a single blocked control message will block the entire
      work queue (both priorities), and introduce a latency which is
      nondeterministic and dependent on system wide rtnl lock usage.
      
      In some extreme cases, one blocked control message at exactly the wrong time,
      just before the maximum number of VFs are instantiated, can block the work
      queue for long enough to prevent VF representor REIFY replies from getting
      handled in time for the 40ms timeout.
      
      The firmware will deliver the total maximum number of REIFY message replies in
      around 300us.
      
      Only REIFY and MTU update messages require replies within a timeout period (of
      40ms). The MTU-only updates are already done directly in the BH (tasklet)
      handler.
      
      Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
      caused by a blocked work queue waiting on rtnl locks.
      Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
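The dispatch policy described above, with REIFY moved into the BH tasklet, can be sketched as a plain classification function. This is a minimal illustrative model with hypothetical names, not the actual nfp driver code:

```c
#include <assert.h>

/* Hypothetical reply-type and context identifiers; a sketch of the
 * dispatch table in the commit message, not the real nfp driver. */
enum cmsg_type {
    CMSG_TYPE_PORT_MOD,
    CMSG_TYPE_TUN_NEIGH,
    CMSG_TYPE_FLOW_STATS,
    CMSG_TYPE_PORT_REIFY,
    CMSG_TYPE_MERGE_HINT,
    CMSG_TYPE_NO_NEIGH,
    CMSG_TYPE_ACTIVE_TUNS,
    CMSG_TYPE_QOS_STATS,
    CMSG_TYPE_LAG_CONFIG,
};

enum handler_ctx { CTX_BH_TASKLET, CTX_WQ_HIGH, CTX_WQ_LOW };

/* After the fix: REIFY joins the time-critical replies in the BH
 * tasklet, so a workqueue blocked on rtnl_lock() can no longer delay
 * it past the 40ms reply timeout. */
static enum handler_ctx dispatch_reply(enum cmsg_type t, int needs_rtnl)
{
    switch (t) {
    case CMSG_TYPE_TUN_NEIGH:
    case CMSG_TYPE_FLOW_STATS:
    case CMSG_TYPE_PORT_REIFY:      /* moved from WQ-high to the BH */
        return CTX_BH_TASKLET;
    case CMSG_TYPE_PORT_MOD:
        /* MTU-only updates run in the BH; link changes need rtnl. */
        return needs_rtnl ? CTX_WQ_HIGH : CTX_BH_TASKLET;
    default:
        return CTX_WQ_LOW;
    }
}
```

With this split, a workqueue stalled on rtnl_lock() can delay only the rtnl-dependent replies; REIFY replies are serviced from the tasklet well inside the 40ms window.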
  4. 07 Sep 2019, 7 commits
  5. 06 Sep 2019, 6 commits
    • Merge tag 'wireless-drivers-for-davem-2019-09-05' of... · 74346c43
      Committed by David S. Miller
      Merge tag 'wireless-drivers-for-davem-2019-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 5.3
      
      Fourth set of fixes for 5.3, and hopefully really the last one. Quite
      a few CVE fixes this time but at least to my knowledge none of them
      have a known exploit.
      
      mt76
      
      * workaround firmware hang by disabling hardware encryption on MT7630E
      
      * disable 5GHz band for MT7630E as it's not working properly
      
      mwifiex
      
      * fix IE parsing to avoid a heap buffer overflow
      
      iwlwifi
      
      * fix for QuZ device initialisation
      
      rt2x00
      
      * another fix for rekeying
      
      * revert a commit causing degradation in rx signal levels
      
      rsi
      
      * fix a double free
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add myself as maintainer for xilinx axiethernet driver · b0a3caea
      Committed by Radhey Shyam Pandey
      I am maintaining the xilinx axiethernet driver in the xilinx tree and would
      like to maintain it in the mainline kernel as well, hence I am adding
      myself as a maintainer. Also, Anirudha and John have moved to new roles,
      so as requested I am removing them from the maintainer list.
      Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
      Acked-by: John Linn <john.linn@xilinx.com>
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix reordering issues · b88dd52c
      Committed by Eric Dumazet
      Whenever MQ is not used on a multiqueue device, we experience
      serious reordering problems. Bisection found the cited
      commit.
      
      The issue can be described this way:
      
      - A single qdisc hierarchy is shared by all transmit queues.
        (eg : tc qdisc replace dev eth0 root fq_codel)
      
      - When/if try_bulk_dequeue_skb_slow() dequeues a packet targeting
        a different transmit queue than the one used to build a packet train,
        we stop building the current list and save the 'bad' skb (P1) in a
        special queue. (bad_txq)
      
      - When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
        skb (P1), it checks whether the associated transmit queue is still
        frozen. If the queue is still blocked (by BQL or the NIC tx ring being full),
        we leave the skb in bad_txq and return NULL.
      
      - dequeue_skb() calls q->dequeue() to get another packet (P2)
      
        The other packet can target the problematic queue (that we found
        in frozen state for the bad_txq packet), but another cpu just ran
        TX completion and made room in the txq that is now ready to accept
        new packets.
      
      - Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
        at next round. In practice P2 is the lead of a big packet train
        (P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
      
      To solve this problem, we have to block the dequeue process as long
      as the first packet in bad_txq cannot be sent. Reordering issues
      disappear and no side effects have been seen.
      
      Fixes: a53851e2 ("net: sched: explicit locking in gso_cpu fallback")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
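The fixed dequeue rule can be modelled in a few lines: while the packet parked in bad_txq targets a frozen queue, nothing else may be dequeued. A toy userspace sketch with hypothetical names, not the real net/sched code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the fixed dequeue path. A packet parked in bad_txq must
 * be sent (or still be blocked) before any later packet may be
 * dequeued, which preserves ordering. */
struct pkt { int id; int txq; struct pkt *next; };

struct qdisc_model {
    struct pkt *bad_txq;     /* parked packet whose txq was frozen */
    struct pkt *queue;       /* normal FIFO of queued packets */
    const int *txq_frozen;   /* txq_frozen[i] != 0 => queue i blocked */
};

/* Returns the next packet to transmit, or NULL if we must wait. */
static struct pkt *dequeue_skb_model(struct qdisc_model *q)
{
    if (q->bad_txq) {
        struct pkt *p = q->bad_txq;
        if (q->txq_frozen[p->txq])
            return NULL;     /* block the whole dequeue: no reordering */
        q->bad_txq = p->next;
        return p;
    }
    if (q->queue) {
        struct pkt *p = q->queue;
        q->queue = p->next;
        return p;
    }
    return NULL;
}
```

Before the fix, the model would have fallen through to the normal queue while P1 was parked, letting P2 (and the train behind it) overtake P1.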
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 2e9550ed
      Committed by David S. Miller
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2019-09-05
      
      1) Several xfrm interface fixes from Nicolas Dichtel:
         - Avoid an interface ID corruption on changelink.
         - Fix wrong interface names in the logs.
         - Fix a list corruption when changing network namespaces.
         - Fix unregistration of the underlying phydev.
      
      2) Fix a potential warning when merging xfrm_policy nodes.
         From Florian Westphal.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • forcedeth: use per cpu to collect xmit/recv statistics · f4b633b9
      Committed by Zhu Yanjun
      When testing with a background iperf pushing 1Gbit/sec traffic and running
      both ifconfig and netstat to collect statistics, some deadlocks occurred.
      
      Ifconfig and netstat call nv_get_stats64 to get software xmit/recv
      statistics. Since commit f5d827ae ("forcedeth: implement
      ndo_get_stats64() API"), plain tx/rx variables have been used to collect
      tx/rx statistics. The fix is to replace them with per-cpu 64-bit
      variables, which avoid the deadlocks and provide fast, efficient
      statistics updates.
      
      In nv_probe, the per cpu variable is initialized. In nv_remove, this
      per cpu variable is freed.
      
      In xmit/recv process, this per cpu variable will be updated.
      
      In nv_get_stats64, this per cpu variable on each cpu is added up. Then
      the driver can get xmit/recv packets statistics.
      
      A test ran for several days with this commit; the deadlocks disappeared
      and performance improved.
      
      Tested:
         - iperf SMP x86_64 ->
         Client connecting to 1.1.1.108, TCP port 5001
         TCP window size: 85.0 KByte (default)
         ------------------------------------------------------------
         [  3] local 1.1.1.105 port 38888 connected with 1.1.1.108 port 5001
         [ ID] Interval       Transfer     Bandwidth
         [  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec
      
         ifconfig results:
      
         enp0s9 Link encap:Ethernet  HWaddr 00:21:28:6f:de:0f
                inet addr:1.1.1.105  Bcast:0.0.0.0  Mask:255.255.255.0
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:5774764531 errors:0 dropped:0 overruns:0 frame:0
                TX packets:633534193 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:7646159340904 (7.6 TB) TX bytes:11425340407722 (11.4 TB)
      
         netstat results:
      
         Kernel Interface table
         Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
         ...
         enp0s9 1500 0  5774764531 0    0 0      633534193      0      0  0 BMRU
         ...
      
      Fixes: f5d827ae ("forcedeth: implement ndo_get_stats64() API")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Reported-and-tested-by: Nan san <nan.1986san@gmail.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
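The per-cpu counter scheme can be sketched in miniature: writers touch only their own CPU's slot with no locking, and the stats reader folds all slots together. Illustrative names only, not the forcedeth driver's actual structures:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of per-cpu statistics: one slot per CPU, lock-free
 * updates on the hot path, summed only when the stats are read. */
#define NR_CPUS_MODEL 4

struct cpu_stats { uint64_t tx_packets, rx_packets; };

static struct cpu_stats per_cpu_stats[NR_CPUS_MODEL];

/* Called from the xmit/recv path on a given CPU: touches only that
 * CPU's slot, so no lock (and no deadlock) is possible. */
static void stats_inc_tx(int cpu) { per_cpu_stats[cpu].tx_packets++; }
static void stats_inc_rx(int cpu) { per_cpu_stats[cpu].rx_packets++; }

/* Called from the nv_get_stats64-style read path: fold all slots. */
static struct cpu_stats stats_fold(void)
{
    struct cpu_stats sum = { 0, 0 };
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
        sum.tx_packets += per_cpu_stats[cpu].tx_packets;
        sum.rx_packets += per_cpu_stats[cpu].rx_packets;
    }
    return sum;
}
```

In the real driver the slots would be allocated in probe and freed in remove, matching the nv_probe/nv_remove lifecycle described above.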
    • net: sonic: return NETDEV_TX_OK if failed to map buffer · 6e1cdedc
      Committed by Mao Wenan
      NETDEV_TX_BUSY really should only be used by drivers that call
      netif_tx_stop_queue() at the wrong moment. If dma_map_single() fails to
      map the tx DMA buffer, returning NETDEV_TX_BUSY might trigger an infinite
      loop. This patch uses NETDEV_TX_OK instead of NETDEV_TX_BUSY, and changes
      printk to pr_err_ratelimited.
      
      Fixes: d9fb9f38 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
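The corrected error path can be illustrated with a small model: on a mapping failure the packet is dropped, counted, and NETDEV_TX_OK is returned, so the core never requeues the same unmappable buffer. Hypothetical names, not the real sonic driver code:

```c
#include <assert.h>

/* Sketch of the fixed ndo_start_xmit-style error path. Returning
 * NETDEV_TX_BUSY here would make the core retry the identical buffer,
 * failing the same way each time: an infinite loop. */
#define NETDEV_TX_OK   0
#define NETDEV_TX_BUSY 1

struct tx_stats { unsigned long tx_dropped; };

/* map_ok stands in for whether dma_map_single() succeeded. */
static int model_start_xmit(int map_ok, struct tx_stats *st)
{
    if (!map_ok) {
        st->tx_dropped++;        /* free the skb, count the drop ... */
        return NETDEV_TX_OK;     /* ... and report the packet consumed */
    }
    /* the normal path would queue the tx descriptor here */
    return NETDEV_TX_OK;
}
```

NETDEV_TX_BUSY remains reserved for the case the commit names: the queue should have been stopped but a packet slipped through.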
  6. 05 Sep 2019, 14 commits