1. 04 11月, 2018 10 次提交
    • E
      net/mlx4_en: use __netdev_tx_sent_queue() · c2973444
      Eric Dumazet 提交于
      doorbell only depends on xmit_more and netif_tx_queue_stopped()
      
      Using __netdev_tx_sent_queue() avoids messing with BQL stop flag,
      and is more generic.
      
      This patch increases performance on GSO workload by keeping
      doorbells to the minimum required.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2973444
    • E
      net: do not abort bulk send on BQL status · fe60faa5
      Eric Dumazet 提交于
      Before calling dev_hard_start_xmit(), upper layers tried
      to cook optimal skb list based on BQL budget.
      
      Problem is that GSO packets can end up comsuming more than
      the BQL budget.
      
      Breaking the loop is not useful, since requeued packets
      are ahead of any packets still in the qdisc.
      
      It is also more expensive, since next TX completion will
      push these packets later, while skbs are not in cpu caches.
      
      It is also a behavior difference with TSO packets, that can
      break the BQL limit by a large amount.
      
      Note that drivers should use __netdev_tx_sent_queue()
      in order to have optimal xmit_more support, and avoid
      useless atomic operations as shown in the following patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe60faa5
    • E
      net: bql: add __netdev_tx_sent_queue() · 3e59020a
      Eric Dumazet 提交于
      When qdisc_run() tries to use BQL budget to bulk-dequeue a batch
      of packets, GSO can later transform this list in another list
      of skbs, and each skb is sent to device ndo_start_xmit(),
      one at a time, with skb->xmit_more being set to one but
      for last skb.
      
      Problem is that very often, BQL limit is hit in the middle of
      the packet train, forcing dev_hard_start_xmit() to stop the
      bulk send and requeue the end of the list.
      
      BQL role is to avoid head of line blocking, making sure
      a qdisc can deliver high priority packets before low priority ones.
      
      But there is no way requeued packets can be bypassed by fresh
      packets in the qdisc.
      
      Aborting the bulk send increases TX softirqs, and hot cache
      lines (after skb_segment()) are wasted.
      
      Note that for TSO packets, we never split a packet in the middle
      because of BQL limit being hit.
      
      Drivers should be able to update BQL counters without
      flipping/caring about BQL status, if the current skb
      has xmit_more set.
      
      Upper layers are ultimately responsible to stop sending another
      packet train when BQL limit is hit.
      
      Code template in a driver might look like the following :
      
      	send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
      
      Note that __netdev_tx_sent_queue() use is not mandatory,
      since following patch will change dev_hard_start_xmit()
      to not care about BQL status.
      
      But it is highly recommended so that xmit_more full benefits
      can be reached (less doorbells sent, and less atomic operations as well)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e59020a
    • D
      Merge branch 's390-qeth-fixes' · b8a5d06a
      David S. Miller 提交于
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-11-02
      
      please apply one round of qeth fixes for -net.
      
      Patch 1 is rather large and removes a use-after-free hazard from many of our
      debug trace entries.
      Patch 2 is yet another fix-up for the L3 subdriver's new IP address management
      code.
      Patch 3 and 4 resolve some fallout from the recent changes wrt how/when qeth
      allocates its net_device.
      Patch 5 makes sure we don't set reserved bits when building HW commands from
      user-provided data.
      And finally, patch 6 allows ethtool to play nice with new HW.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8a5d06a
    • J
      s390/qeth: report 25Gbit link speed · 54e049c2
      Julian Wiedmann 提交于
      This adds the various identifiers for 25Gbit cards, and wires them up
      into sysfs and ethtool.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54e049c2
    • J
      s390/qeth: sanitize ARP requests · 125d7d30
      Julian Wiedmann 提交于
      The ARP_{ADD,REMOVE}_ENTRY cmd structs contain reserved fields.
      Introduce a common helper that doesn't raw-copy the user-provided data
      into the cmd, but only sets those fields that are strictly needed for
      the command.
      
      This also sets the correct command length for ARP_REMOVE_ENTRY.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      125d7d30
    • J
      s390/qeth: fix initial operstate · 9fae5c3b
      Julian Wiedmann 提交于
      Setting the carrier 'on' for an unregistered netdevice doesn't update
      its operstate. Fix this by delaying the update until the netdevice has
      been registered.
      
      Fixes: 91cc98f5 ("s390/qeth: remove duplicated carrier state tracking")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fae5c3b
    • J
      s390/qeth: unregister netdevice only when registered · 30356d08
      Julian Wiedmann 提交于
      qeth only registers its netdevice when the qeth device is first set
      online. Thus a device that has never been set online will trigger
      a WARN ("network todo 'hsi%d' but state 0") in unregister_netdev() when
      removed.
      
      Fix this by protecting the unregister step, just like we already protect
      against repeated registering of the netdevice.
      
      Fixes: d3d1b205 ("s390/qeth: allocate netdevice early")
      Reported-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30356d08
    • J
      s390/qeth: fix HiperSockets sniffer · bd74a7f9
      Julian Wiedmann 提交于
      Sniffing mode for L3 HiperSockets requires that no IP addresses are
      registered with the HW. The preferred way to achieve this is for
      userspace to delete all the IPs on the interface. But qeth is expected
      to also tolerate a configuration where that is not the case, by skipping
      the IP registration when in sniffer mode.
      Since commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked the IP registration logic in the L3 subdriver, this no longer
      works. When the qeth device is set online, qeth_l3_recover_ip() now
      unconditionally registers all unicast addresses from our internal
      IP table.
      
      While we could fix this particular problem by skipping
      qeth_l3_recover_ip() on a sniffer device, the more future-proof change
      is to skip the IP address registration at the lowest level. This way we
      a) catch any future code path that attempts to register an IP address
         without considering the sniffer scenario, and
      b) continue to build up our internal IP table, so that if sniffer mode
         is switched off later we can operate just like normal.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd74a7f9
    • J
      s390/qeth: sanitize strings in debug messages · e19e5be8
      Julian Wiedmann 提交于
      As Documentation/s390/s390dbf.txt states quite clearly, using any
      pointer in sprinf-formatted s390dbf debug entries is dangerous.
      The pointers are dereferenced whenever the trace file is read from.
      So if the referenced data has a shorter life-time than the trace file,
      any read operation can result in a use-after-free.
      
      So rip out all hazardous use of indirect data, and replace any usage of
      dev_name() and such by the Bus ID number.
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e19e5be8
  2. 03 11月, 2018 8 次提交
    • D
      Merge branch 'net-timeout-fixes-for-GENET-and-SYSTEMPORT' · 265ad063
      David S. Miller 提交于
      Florian Fainelli says:
      
      ====================
      net: timeout fixes for GENET and SYSTEMPORT
      
      This patch series fixes occasional transmit timeout around the time
      the system goes into suspend. GENET and SYSTEMPORT have nearly the same
      logic in that regard and were both affected in the same way.
      
      Please queue up for stable, thanks!
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      265ad063
    • F
      net: systemport: Protect stop from timeout · 7cb6a2a2
      Florian Fainelli 提交于
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcm_sysport_netif_stop stops
      the transmit queues which can easily precipitate a watchdog time-
      out because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before enterring bcm_sysport_netif_stop. There
      are currently only two users of the bcm_sysport_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcm_sysport_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcm_sysport_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcm_sysport_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcm_sysport_netif_start so it is moved into the driver open service.
      
      Fixes: 40755a0f ("net: systemport: add suspend and resume support")
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cb6a2a2
    • D
      net: bcmgenet: protect stop from timeout · 09e805d2
      Doug Berger 提交于
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcmgenet_netif_stop stops
      the transmit queues which can easily precipitate a watchdog time-
      out because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before enterring bcmgenet_netif_stop. There
      are currently only two users of the bcmgenet_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcmgenet_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcmgenet_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcmgenet_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcmgenet_netif_start so it is moved into the driver open service.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09e805d2
    • D
      rxrpc: Fix lockup due to no error backoff after ack transmit error · c7e86acf
      David Howells 提交于
      If the network becomes (partially) unavailable, say by disabling IPv6, the
      background ACK transmission routine can get itself into a tizzy by
      proposing immediate ACK retransmission.  Since we're in the call event
      processor, that happens immediately without returning to the workqueue
      manager.
      
      The condition should clear after a while when either the network comes back
      or the call times out.
      
      Fix this by:
      
       (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
           This will allow a certain amount of time to elapse before we try
           again.
      
       (2) Enforce a return to the workqueue manager after a certain number of
           iterations of the call processing loop.
      
       (3) Add a backoff delay that increases the delay on deferred ACKs by a
           jiffy per failed transmission to a limit of HZ.  The backoff delay is
           cleared on a successful return from kernel_sendmsg().
      
       (4) Cancel calls immediately if the opening sendmsg fails.  The layer
           above can arrange retransmission or rotate to another server.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7e86acf
    • T
      net: dsa: microchip: initialize mutex before use · 284fb78e
      Tristram Ha 提交于
      Initialize mutex before use.  Avoid kernel complaint when
      CONFIG_DEBUG_LOCK_ALLOC is enabled.
      
      Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477")
      Signed-off-by: NTristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: NPavel Machek <pavel@ucw.cz>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      284fb78e
    • J
      net/ipv6: Add anycast addresses to a global hashtable · 2384d025
      Jeff Barnhill 提交于
      icmp6_send() function is expensive on systems with a large number of
      interfaces. Every time it’s called, it has to verify that the source
      address does not correspond to an existing anycast address by looping
      through every device and every anycast address on the device.  This can
      result in significant delays for a CPU when there are a large number of
      neighbors and ND timers are frequently timing out and calling
      neigh_invalidate().
      
      Add anycast addresses to a global hashtable to allow quick searching for
      matching anycast addresses.  This is based on inet6_addr_lst in addrconf.c.
      Signed-off-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2384d025
    • F
      usbnet: smsc95xx: disable carrier check while suspending · 7b900ead
      Frieder Schrempf 提交于
      We need to make sure, that the carrier check polling is disabled
      while suspending. Otherwise we can end up with usbnet_read_cmd()
      being issued when only usbnet_read_cmd_nopm() is allowed. If this
      happens, read operations lock up.
      
      Fixes: d69d1694 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation")
      Signed-off-by: NFrieder Schrempf <frieder.schrempf@kontron.de>
      Reviewed-by: NRaghuram Chary J <RaghuramChary.Jallipalli@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b900ead
    • M
      net: document skb parameter in function 'skb_gso_size_check' · 49682bfa
      Mathieu Malaterre 提交于
      Remove kernel-doc warning:
      
        net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check'
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49682bfa
  3. 02 11月, 2018 6 次提交
    • C
      net: drop skb on failure in ip_check_defrag() · 7de414a9
      Cong Wang 提交于
      Most callers of pskb_trim_rcsum() simply drop the skb when
      it fails, however, ip_check_defrag() still continues to pass
      the skb up to stack. This is suspicious.
      
      In ip_check_defrag(), after we learn the skb is an IP fragment,
      passing the skb to callers makes no sense, because callers expect
      fragments are defrag'ed on success. So, dropping the skb when we
      can't defrag it is reasonable.
      
      Note, prior to commit 88078d98, this is not a big problem as
      checksum will be fixed up anyway. After it, the checksum is not
      correct on failure.
      
      Found this during code review.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7de414a9
    • L
      Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 7c6c54b5
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "I2C has a core bugfix & cleanup as well as an ID addition and
        MAINTAINERS update for you"
      
      * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        MAINTAINERS: add maintainer for IMX LPI2C driver
        dt-bindings: i2c: i2c-imx-lpi2c: add imx8qxp compatible string
        i2c: Clear client->irq in i2c_device_remove
        i2c: Remove unnecessary call to irq_find_mapping
      7c6c54b5
    • L
      Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · 6444ccfd
      Linus Torvalds 提交于
      Pull percpu fixes from Dennis Zhou:
       "Two small things for v4.20.
      
        The first fixes a clang uninitialized variable warning for arm64 in
        the default path calls BUILD_BUG(). The second removes an unnecessary
        unlikely() in a WARN_ON() use"
      
      * 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
        arm64: percpu: Initialize ret in the default case
        mm: percpu: remove unnecessary unlikely()
      6444ccfd
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 82aa4671
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) BPF verifier fixes from Daniel Borkmann.
      
       2) HNS driver fixes from Huazhong Tan.
      
       3) FDB only works for ethernet devices, reject attempts to install FDB
          rules for others. From Ido Schimmel.
      
       4) Fix spectre V1 in vhost, from Jason Wang.
      
       5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2
          driver, from Marc Zyngier.
      
       6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric
          Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
        openvswitch: Fix push/pop ethernet validation
        net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
        bpf: test make sure to run unpriv test cases in test_verifier
        bpf: add various test cases to test_verifier
        bpf: don't set id on after map lookup with ptr_to_map_val return
        bpf: fix partial copy of map_ptr when dst is scalar
        libbpf: Fix compile error in libbpf_attach_type_by_name
        kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
        selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
        selftests: mlxsw: qos_mc_aware: Tweak for min shaper
        mlxsw: spectrum: Set minimum shaper on MC TCs
        mlxsw: reg: QEEC: Add minimum shaper fields
        net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
        net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
        net: hns3: bugfix for handling mailbox while the command queue reinitialized
        net: hns3: fix incorrect return value/type of some functions
        net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
        net: hns3: bugfix for is_valid_csq_clean_head()
        net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
        net: hns3: bugfix for the initialization of command queue's spin lock
        ...
      82aa4671
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · ffb845db
      Linus Torvalds 提交于
      Pull sparc fixes from David Miller:
       "Two small fixes"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Wire up compat getpeername and getsockname.
        sparc64: Remvoe set_fs() from perf_callchain_user().
      ffb845db
    • L
      Merge tag 'csky-for-linus-4.20-fixup-dtb' of https://github.com/c-sky/csky-linux · 5c99a8d1
      Linus Torvalds 提交于
      Pull csky dtb fixups from Guo Ren:
       "These fix the csky dtb Kbuild to follow the new Devicetree dtb build
        rules"
      
      * tag 'csky-for-linus-4.20-fixup-dtb' of https://github.com/c-sky/csky-linux:
        csky: use common dtb build rules
        csky: remove builtin-dtb Kbuild
      5c99a8d1
  4. 01 11月, 2018 16 次提交