1. 27 Sep, 2021 7 commits
    • dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports · b9c587fe
      Andrew Lunn committed
      Some members of the Marvell Ethernet switch family impose MTU restrictions
      on ports used for connecting to the CPU or another switch for DSA. If
      the MTU is set too low, tagged frames will be discarded. Ensure the
      worst case tagger overhead is included in setting the MTU for DSA and
      CPU ports.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
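
A minimal sketch of the idea in this commit, not the driver's actual code: pad the requested MTU on CPU and DSA ports by the largest tag the switch may add. The names `port_kind` and `mtu_with_tag_overhead`, and the 8-byte `WORST_CASE_TAG_LEN` (the size of an EtherType DSA tag), are assumptions made for illustration.

```c
/* Assumed worst-case tag size: 8 bytes, the length of an EtherType DSA tag. */
#define WORST_CASE_TAG_LEN 8

/* Hypothetical port classification used only by this sketch. */
enum port_kind { PORT_USER, PORT_CPU, PORT_DSA };

/* Return the MTU to program for a port. CPU and DSA ports carry tagged
 * frames, so they get extra room for the tag; user ports do not. */
static int mtu_with_tag_overhead(enum port_kind kind, int mtu)
{
	if (kind == PORT_CPU || kind == PORT_DSA)
		return mtu + WORST_CASE_TAG_LEN;
	return mtu;
}
```

Under these assumptions, a requested MTU of 1500 stays at 1500 on a user port and becomes 1508 on a CPU or DSA port.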
    • dsa: mv88e6xxx: Fix MTU definition · b92ce2f5
      Andrew Lunn committed
      The MTU passed to the DSA driver is the payload size, typically 1500.
      However, the switch uses the frame size when applying restrictions.
      Adjust the MTU with the size of the Ethernet header and the frame
      checksum. The VLAN header also needs to be included when the frame
      size is per port, but not when it is global.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
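
The adjustment described in this commit can be sketched as plain arithmetic. The function name `mtu_to_frame_size` and its `per_port_limit` flag are hypothetical; the header sizes are the standard Ethernet values.

```c
#define ETH_HLEN    14 /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN  4 /* frame check sequence */
#define VLAN_HLEN    4 /* one 802.1Q tag */

/* Convert a payload MTU into the frame size the switch compares against.
 * The VLAN header is counted only when the limit is applied per port. */
static int mtu_to_frame_size(int mtu, int per_port_limit)
{
	int size = mtu + ETH_HLEN + ETH_FCS_LEN;

	if (per_port_limit)
		size += VLAN_HLEN;
	return size;
}
```

For a payload MTU of 1500 this gives 1518 bytes with a chip-wide limit and 1522 bytes with a per-port limit.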
    • dsa: mv88e6xxx: 6161: Use chip wide MAX MTU · fe230361
      Andrew Lunn committed
      The datasheet suggests the 6161 uses a per port setting for jumbo
      frames. Testing has however shown this is not correct; it uses the old
      style chip wide MTU control. Change the ops in the 6161 structure to
      reflect this.
      
      Fixes: 1baf0fac ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
      Reported by: 曹煜 <cao88yu@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdiobus: Fix memory leak in __mdiobus_register · ab609f25
      Yanfei Xu committed
      Once device_register() has failed, we should call put_device() to
      decrement the reference count for cleanup. Otherwise it will cause a
      memory leak.
      
      BUG: memory leak
      unreferenced object 0xffff888114032e00 (size 256):
        comm "kworker/1:3", pid 2960, jiffies 4294943572 (age 15.920s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 08 2e 03 14 81 88 ff ff  ................
          08 2e 03 14 81 88 ff ff 90 76 65 82 ff ff ff ff  .........ve.....
        backtrace:
          [<ffffffff8265cfab>] kmalloc include/linux/slab.h:591 [inline]
          [<ffffffff8265cfab>] kzalloc include/linux/slab.h:721 [inline]
          [<ffffffff8265cfab>] device_private_init drivers/base/core.c:3203 [inline]
          [<ffffffff8265cfab>] device_add+0x89b/0xdf0 drivers/base/core.c:3253
          [<ffffffff828dd643>] __mdiobus_register+0xc3/0x450 drivers/net/phy/mdio_bus.c:537
          [<ffffffff828cb835>] __devm_mdiobus_register+0x75/0xf0 drivers/net/phy/mdio_devres.c:87
          [<ffffffff82b92a00>] ax88772_init_mdio drivers/net/usb/asix_devices.c:676 [inline]
          [<ffffffff82b92a00>] ax88772_bind+0x330/0x480 drivers/net/usb/asix_devices.c:786
          [<ffffffff82baa33f>] usbnet_probe+0x3ff/0xdf0 drivers/net/usb/usbnet.c:1745
          [<ffffffff82c36e17>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
          [<ffffffff82661d17>] call_driver_probe drivers/base/dd.c:517 [inline]
          [<ffffffff82661d17>] really_probe.part.0+0xe7/0x380 drivers/base/dd.c:596
          [<ffffffff826620bc>] really_probe drivers/base/dd.c:558 [inline]
          [<ffffffff826620bc>] __driver_probe_device+0x10c/0x1e0 drivers/base/dd.c:751
          [<ffffffff826621ba>] driver_probe_device+0x2a/0x120 drivers/base/dd.c:781
          [<ffffffff82662a26>] __device_attach_driver+0xf6/0x140 drivers/base/dd.c:898
          [<ffffffff8265eca7>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:427
          [<ffffffff826625a2>] __device_attach+0x122/0x260 drivers/base/dd.c:969
          [<ffffffff82660916>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:487
          [<ffffffff8265cd0b>] device_add+0x5fb/0xdf0 drivers/base/core.c:3359
          [<ffffffff82c343b9>] usb_set_configuration+0x9d9/0xb90 drivers/usb/core/message.c:2170
          [<ffffffff82c4473c>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238
      
      BUG: memory leak
      unreferenced object 0xffff888116f06900 (size 32):
        comm "kworker/0:2", pid 2670, jiffies 4294944448 (age 7.160s)
        hex dump (first 32 bytes):
          75 73 62 2d 30 30 31 3a 30 30 33 00 00 00 00 00  usb-001:003.....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81484516>] kstrdup+0x36/0x70 mm/util.c:60
          [<ffffffff814845a3>] kstrdup_const+0x53/0x80 mm/util.c:83
          [<ffffffff82296ba2>] kvasprintf_const+0xc2/0x110 lib/kasprintf.c:48
          [<ffffffff82358d4b>] kobject_set_name_vargs+0x3b/0xe0 lib/kobject.c:289
          [<ffffffff826575f3>] dev_set_name+0x63/0x90 drivers/base/core.c:3147
          [<ffffffff828dd63b>] __mdiobus_register+0xbb/0x450 drivers/net/phy/mdio_bus.c:535
          [<ffffffff828cb835>] __devm_mdiobus_register+0x75/0xf0 drivers/net/phy/mdio_devres.c:87
          [<ffffffff82b92a00>] ax88772_init_mdio drivers/net/usb/asix_devices.c:676 [inline]
          [<ffffffff82b92a00>] ax88772_bind+0x330/0x480 drivers/net/usb/asix_devices.c:786
          [<ffffffff82baa33f>] usbnet_probe+0x3ff/0xdf0 drivers/net/usb/usbnet.c:1745
          [<ffffffff82c36e17>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
          [<ffffffff82661d17>] call_driver_probe drivers/base/dd.c:517 [inline]
          [<ffffffff82661d17>] really_probe.part.0+0xe7/0x380 drivers/base/dd.c:596
          [<ffffffff826620bc>] really_probe drivers/base/dd.c:558 [inline]
          [<ffffffff826620bc>] __driver_probe_device+0x10c/0x1e0 drivers/base/dd.c:751
          [<ffffffff826621ba>] driver_probe_device+0x2a/0x120 drivers/base/dd.c:781
          [<ffffffff82662a26>] __device_attach_driver+0xf6/0x140 drivers/base/dd.c:898
          [<ffffffff8265eca7>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:427
          [<ffffffff826625a2>] __device_attach+0x122/0x260 drivers/base/dd.c:969
      
      Reported-by: syzbot+398e7dc692ddbbb4cfec@syzkaller.appspotmail.com
      Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
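
The cleanup rule in this commit (a failed registration still holds a reference that must be dropped) can be modelled in user space. This is a sketch under stated assumptions: `fake_device`, `fake_device_register`, `fake_put_device` and `fake_bus_register` are hypothetical stand-ins, not the kernel driver-core API.

```c
/* Hypothetical refcounted object standing in for a struct device. */
struct fake_device {
	int refcount;
	int released; /* set once the last reference is dropped */
};

/* Drop one reference; the object is released when the count hits zero. */
static void fake_put_device(struct fake_device *d)
{
	if (--d->refcount == 0)
		d->released = 1;
}

/* Registration takes the initial reference first, then may still fail. */
static int fake_device_register(struct fake_device *d, int fail)
{
	d->refcount = 1;
	d->released = 0;
	return fail ? -1 : 0;
}

/* Caller pattern: on failure, drop the reference instead of just returning. */
static int fake_bus_register(struct fake_device *d, int fail)
{
	int err = fake_device_register(d, fail);

	if (err) {
		fake_put_device(d); /* without this the object is never released */
		return err;
	}
	return 0;
}
```

In this model, skipping the `fake_put_device()` call on the error path leaves `released` at 0, which corresponds to the leaked allocations in the reports above.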
    • Revert "ibmvnic: check failover_pending in login response" · 2974b8a6
      Desnes A. Nunes do Rosario committed
      This reverts commit d437f5aa.
      
      Code has been duplicated through commit <273c29e9> "ibmvnic: check
      failover_pending in login response"
      
      Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac-platform: handle mac-address deferral · 763716a5
      Matthew Hagan committed
      This patch is a replication of Christian Lamparter's "net: bgmac-bcma:
      handle deferred probe error due to mac-address" patch for the
      bgmac-platform driver [1].
      
      As is the case with the bgmac-bcma driver, this change is to cover the
      scenario where the MAC address cannot yet be discovered due to reliance
      on an nvmem provider which is yet to be instantiated, resulting in a
      random address being assigned that has to be manually overridden.
      
      [1] https://lore.kernel.org/netdev/20210919115725.29064-1-chunkeey@gmail.com
      
      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
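
A rough sketch of the deferral decision described in this commit, assuming the MAC address lookup reports "provider not ready" with the kernel's `-EPROBE_DEFER` code. `probe_mac_step` and its parameters are hypothetical names, and this is not the bgmac-platform code.

```c
/* Kernel-internal error code meaning "retry this probe later". */
#define EPROBE_DEFER 517

/* Decide what a probe routine does with the result of a MAC address lookup.
 * lookup_err: 0 on success, negative error code otherwise.
 * use_random: set to 1 when falling back to a random address. */
static int probe_mac_step(int lookup_err, int *use_random)
{
	*use_random = 0;

	if (lookup_err == -EPROBE_DEFER)
		return -EPROBE_DEFER; /* provider not ready yet: defer the probe */

	if (lookup_err)
		*use_random = 1; /* no address available at all: fall back */

	return 0;
}
```

The point of the sketch is the ordering: the deferral case is checked before the random-address fallback, so a late nvmem provider no longer results in a random address.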
    • net: hns: Fix spelling mistake "maped" -> "mapped" · 44b6aa2e
      Colin Ian King committed
      There is a spelling mistake in a dev_err error message. Fix it.
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Sep, 2021 1 commit
    • net: prevent user from passing illegal stab size · b193e15a
      王贇 committed
      We observed the report below when playing with a netlink socket:
      
        UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
        shift exponent 249 is too large for 32-bit type
        CPU: 0 PID: 685 Comm: a.out Not tainted
        Call Trace:
         dump_stack_lvl+0x8d/0xcf
         ubsan_epilogue+0xa/0x4e
         __ubsan_handle_shift_out_of_bounds+0x161/0x182
         __qdisc_calculate_pkt_len+0xf0/0x190
         __dev_queue_xmit+0x2ed/0x15b0
      
      It seems the kernel does not check the stab log value passed from
      user space, and later uses the insane value to calculate pkt_len.
      
      This patch just adds a check on the size/cell_log to avoid the insane
      calculation.
      
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
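
A minimal sketch of the kind of check this commit describes: reject user-supplied log values that would later be used as an out-of-range shift count. The cap `SIZE_LOG_MAX` of 30 and the name `check_stab_logs` are assumptions for illustration, not quoted from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed cap: keeps "1 << log" and ">> log" within a 32-bit type. */
#define SIZE_LOG_MAX 30

/* Validate the two log2 values before they are used as shift counts. */
static int check_stab_logs(uint8_t cell_log, uint8_t size_log)
{
	if (cell_log > SIZE_LOG_MAX || size_log > SIZE_LOG_MAX)
		return -EINVAL;
	return 0;
}
```

With this check, the exponent 249 from the UBSAN report is refused up front instead of reaching the packet length calculation.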
  3. 25 Sep, 2021 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 7fe7f318
      Jakub Kicinski committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      1) ipset limits the max allocatable memory via kvmalloc() to INT_MAX,
         from Jozsef Kadlecsik.
      
      2) Check ip_vs_conn_tab_bits value to be in the range specified
         in Kconfig, from Andrea Claudi.
      
      3) Initialize fragment offset in ip6tables, from Jeremy Sowden.
      
      4) Make conntrack hash chain length random, from Florian Westphal.
      
      5) Add zone ID to conntrack and NAT hashtuple again, also from Florian.
      
      6) Add selftests for bidirectional zone support and colliding tuples,
         from Florian Westphal.
      
      7) Unlink table before synchronize_rcu when cleaning tables with
         owner, from Florian.
      
      8) nf_tables limits the max allocatable memory via kvmalloc() to INT_MAX.
      
      9) Release conntrack entries via workqueue in masquerade, from Florian.
      
      10) Fix bogus net_init in iptables raw table definition, also from Florian.
      
      11) Work around missing softdep in log extensions, from Florian Westphal.
      
      12) Serialize hash resizes and cleanups with mutex, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: conntrack: serialize hash resizes and cleanups
        netfilter: log: work around missing softdep backend module
        netfilter: iptable_raw: drop bogus net_init annotation
        netfilter: nf_nat_masquerade: defer conntrack walk to work queue
        netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic
        netfilter: nf_tables: Fix oversized kvmalloc() calls
        netfilter: nf_tables: unlink table before deleting it
        selftests: netfilter: add zone stress test with colliding tuples
        selftests: netfilter: add selftest for directional zone support
        netfilter: nat: include zone id in nat table hash again
        netfilter: conntrack: include zone id in tuple hash again
        netfilter: conntrack: make max chain length random
        netfilter: ip6_tables: zero-initialize fragment offset
        ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
        netfilter: ipset: Fix oversized kvmalloc() calls
      ====================
      
      Link: https://lore.kernel.org/r/20210924221113.348767-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 24 Sep, 2021 10 commits
    • drivers: net: mhi: fix error path in mhi_net_newlink · 4526fe74
      Daniele Palmas committed
      Fix double free_netdev when mhi_prepare_for_transfer fails.
      
      Fixes: 3ffec6a1 ("net: Add mhi-net driver")
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
      Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc95xx: fix stalled rx after link change · 5ab8a447
      Aaro Koskinen committed
      After commit 05b35e7e ("smsc95xx: add phylib support"), link changes
      are no longer propagated to usbnet. As a result, rx URB allocation won't
      happen until there is a packet sent out first (this might never happen,
      e.g. running just ssh server with a static IP). Fix by triggering usbnet
      EVENT_LINK_CHANGE.
      
      Fixes: 05b35e7e ("smsc95xx: add phylib support")
      Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Fix rtnexthop len when RTA_FLOW is present · 597aa16c
      Xiao Liang committed
      Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop()
      to get the length of rtnexthop correct.
      
      Fixes: b0f60193 ("ipv4: Refactor nexthop attributes in fib_dump_info")
      Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: fix the incorrect clearing of IF_MODE bits · 325fd36a
      Vladimir Oltean committed
      The enetc phylink .mac_config handler intends to clear the IFMODE field
      (bits 1:0) of the PM0_IF_MODE register, but incorrectly clears all the
      other fields instead.
      
      For normal operation, the bug was inconsequential, due to the fact that
      we write the PM0_IF_MODE register in two stages, first in
      phylink .mac_config (which incorrectly cleared out a bunch of stuff),
      then we update the speed and duplex to the correct values in
      phylink .mac_link_up.
      
      Judging by the code (not tested), it looks like maybe loopback mode was
      broken, since this is one of the settings in PM0_IF_MODE which is
      incorrectly cleared.
      
      Fixes: c76a9721 ("net: enetc: force the RGMII speed and duplex instead of operating in inband mode")
      Reported-by: Pavel Machek (CIP) <pavel@denx.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
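
To illustrate the class of bug this commit describes, here is a small sketch of clearing a two-bit field in a register value. It assumes the slip has the shape `val & MASK` where `val & ~MASK` was intended; that shape and the names `IFMODE_MASK`, `clear_ifmode_buggy` and `clear_ifmode_fixed` are assumptions, not taken from the enetc source.

```c
#include <stdint.h>

/* Bits 1:0, the IFMODE field as described in the commit message. */
#define IFMODE_MASK 0x3u

/* Buggy shape: keeps only the IFMODE bits and clears every other field. */
static uint32_t clear_ifmode_buggy(uint32_t val)
{
	return val & IFMODE_MASK;
}

/* Intended shape: clears only the IFMODE bits and keeps the other fields. */
static uint32_t clear_ifmode_fixed(uint32_t val)
{
	return val & ~IFMODE_MASK;
}
```

For a register value of 0xF2, the buggy form yields 0x02 (other fields lost) while the intended form yields 0xF0 (only IFMODE cleared).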
    • Merge branch 'mptcp-fixes' · 42007019
      David S. Miller committed
      Mat Martineau says:
      
      ====================
      mptcp: Bug fixes
      
      This patch set includes two separate fixes for the net tree:
      
      Patch 1 makes sure that MPTCP token searches are always limited to the
      appropriate net namespace.
      
      Patch 2 allows userspace to always change the backup settings for
      configured endpoints even if those endpoints are not currently in use.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: allow changing the 'backup' bit when no sockets are open · 3f4a0890
      Davide Caratti committed
      Current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e.
      using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that
      matches the endpoint address. There is no reason for that, so we can just
      ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints
      can reconfigure their 'backup' flag even if no MPTCP sockets are open (or
      more generally, in case the MP_PRIO message is not sent out).
      
      Fixes: 0f9f696a ("mptcp: add set_flags command in PM netlink")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: don't return sockets in foreign netns · ea1300b9
      Florian Westphal committed
      mptcp_token_get_sock() may return a mptcp socket that is in
      a different net namespace than the socket that received the token value.
      
      The mptcp syncookie code path had an explicit check for this;
      this patch moves the test into the mptcp_token_get_sock() function.
      
      Eventually token.c should be converted to pernet storage, but
      such a change is not suitable for the net tree.
      
      Fixes: 2c5ebd00 ("mptcp: refactor token container")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
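
The namespace check described in this commit can be sketched with a toy lookup table. Everything here (`net_ns`, `msk`, `token_lookup`, the linear table) is a hypothetical model; the real token container is organised differently.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a net namespace and an MPTCP socket. */
struct net_ns { int id; };
struct msk {
	const struct net_ns *ns;
	unsigned int token;
};

/* Return the socket matching the token, but only if it lives in the
 * caller's namespace. A token match in a foreign namespace is ignored. */
static struct msk *token_lookup(struct msk *tbl, size_t n,
				const struct net_ns *ns, unsigned int token)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].token == token && tbl[i].ns == ns)
			return &tbl[i];
	}
	return NULL;
}
```

Placing the namespace comparison inside the lookup means every caller gets the check, rather than relying on one code path to perform it separately.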
    • sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb · f7e745f8
      Xin Long committed
      We should always check if skb_header_pointer's return is NULL before
      using it, otherwise it may cause null-ptr-deref, as syzbot reported:
      
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline]
        RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196
        Call Trace:
        <IRQ>
         sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109
         ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422
         ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463
         NF_HOOK include/linux/netfilter.h:307 [inline]
         NF_HOOK include/linux/netfilter.h:301 [inline]
         ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472
         dst_input include/net/dst.h:460 [inline]
         ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
         NF_HOOK include/linux/netfilter.h:307 [inline]
         NF_HOOK include/linux/netfilter.h:301 [inline]
         ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297
      
      Fixes: 3acb50c1 ("sctp: delay as much as possible skb_linearize")
      Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
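
A user-space sketch of the pattern this commit enforces: a bounded header accessor may return NULL, and the chunk walk must stop when it does. `header_pointer` and `count_chunks` are hypothetical helpers that only imitate the idea of skb_header_pointer(); the 4-byte chunk header layout (type, flags, 16-bit big-endian length, 4-byte padding) is the standard SCTP one.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes at offset off into copy, or return NULL when the
 * requested range does not fit inside the buffer. */
static const void *header_pointer(const unsigned char *buf, size_t buf_len,
				  size_t off, size_t len, void *copy)
{
	if (off > buf_len || len > buf_len - off)
		return NULL;
	memcpy(copy, buf + off, len);
	return copy;
}

/* Count complete chunk headers, breaking out as soon as one is missing. */
static int count_chunks(const unsigned char *buf, size_t buf_len)
{
	size_t off = 0;
	int n = 0;

	for (;;) {
		unsigned char hdr[4];
		const unsigned char *ch =
			header_pointer(buf, buf_len, off, sizeof(hdr), hdr);
		size_t clen;

		if (!ch)
			break; /* truncated packet: never dereference NULL */

		clen = ((size_t)ch[2] << 8) | ch[3];
		if (clen < sizeof(hdr))
			break; /* malformed length: stop walking */

		n++;
		off += (clen + 3) & ~(size_t)3; /* chunks are 4-byte padded */
	}
	return n;
}
```

A packet holding one complete chunk followed by two stray bytes is counted as one chunk, and the walk ends at the NULL return instead of reading past the buffer.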
    • Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9bc62afe
      Linus Torvalds committed
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
      
        Previous releases - regressions:
      
         - introduce a shutdown method to mdio device drivers, and make DSA
           switch drivers compatible with masters disappearing on shutdown;
           preventing infinite reference wait
      
         - fix issues in mdiobus users related to ->shutdown vs ->remove
      
         - virtio-net: fix pages leaking when building skb in big mode
      
         - xen-netback: correct success/error reporting for the
           SKB-with-fraglist
      
         - dsa: tear down devlink port regions when tearing down the devlink
           port on error
      
         - nexthop: fix division by zero while replacing a resilient group
      
         - hns3: check queue, vf, vlan ids range before using
      
        Previous releases - always broken:
      
         - napi: fix race against netpoll causing NAPI getting stuck
      
         - mlx4_en: ensure link operstate is updated even if link comes up
           before netdev registration
      
         - bnxt_en: fix TX timeout when TX ring size is set to the smallest
      
         - enetc: fix illegal access when reading affinity_hint; prevent oops
           on sysfs access
      
         - mtk_eth_soc: avoid creating duplicate offload entries
      
        Misc:
      
         - core: correct the sock::sk_lock.owned lockdep annotations"
      
      * tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
        atlantic: Fix issue in the pm resume flow.
        net/mlx4_en: Don't allow aRFS for encapsulated packets
        net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
        net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
        nfc: st-nci: Add SPI ID matching DT compatible
        MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
        nexthop: Fix memory leaks in nexthop notification chain listeners
        mptcp: ensure tx skbs always have the MPTCP ext
        qed: rdma - don't wait for resources under hw error recovery flow
        s390/qeth: fix deadlock during failing recovery
        s390/qeth: Fix deadlock in remove_discipline
        s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
        net: dsa: realtek: register the MDIO bus under devres
        net: dsa: don't allocate the slave_mii_bus using devres
        Doc: networking: Fox a typo in ice.rst
        net: dsa: fix dsa_tree_setup error path
        net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
        net/smc: add missing error check in smc_clc_prfx_set()
        net: hns3: fix a return value error in hclge_get_reset_status()
        net: hns3: check vlan id before using it
        ...
    • memcg: flush lruvec stats in the refault · 1f828223
      Shakeel Butt committed
      Prior to the commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
      and the commit aa48e47e ("memcg: infrastructure to flush memcg
      stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
      32) at worst and for unbounded amount of time.  The commit aa48e47e
      moved the lruvec stats to rstat infrastructure and the commit
      7e1c0d6f bounded the error for all the lruvec stats to (nr_cpus *
      32) at worst for at most 2 seconds.  More specifically it decoupled the
      number of stats and the number of cgroups from the error rate.
      
      However this reduction in error comes with the cost of triggering the
      slowpath of stats update more frequently.  Previously in the slowpath
      the kernel adds the stats up the memcg tree.  After aa48e47e, the
      kernel triggers the async lruvec stats flush through queue_work().  This
      causes regression reports from 0day kernel bot [1] as well as from
      phoronix test suite [2].
      
      We tried two options to fix the regression:
      
       1) Increase the threshold to trigger the slowpath in lruvec stats
          update codepath from 32 to 512.
      
       2) Remove the slowpath from lruvec stats update codepath and instead
          flush the stats in the page refault codepath. The assumption is that
          the kernel flushes the stats in a timely manner, so the update tree
          would be small in the refault codepath and not cause a performance
          impact.
      
      Following are the results of will-it-scale/page_fault[1|2|3] benchmark
      on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
      aa48e47e and 7e1c0d6f reverted (3) 5.15-rc1 with option-1
      (4) 5.15-rc1 with option-2.
      
        test       (1)      (2)               (3)               (4)
        pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
        pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
        pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)
      
      From the above result, it seems like the option-2 not only solves the
      regression but also improves the performance for at least these
      benchmarks.
      
      Feng Tang (intel) ran the aim7 benchmark with these two options and
      confirms that option-1 reduces the regression but option-2 removes the
      regression.
      
      Michael Larabel (phoronix) ran multiple benchmarks with these options
      and reported the results at [3] and it shows for most benchmarks
      option-2 removes the regression introduced by the commit aa48e47e
      ("memcg: infrastructure to flush memcg stats").
      
      Based on the experiment results, this patch proposes option-2 as the
      solution to resolve the regression.
      
      Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
      Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
      Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
      Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Tested-by: Michael Larabel <Michael@phoronix.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 23 Sep, 2021 14 commits
  6. 22 Sep, 2021 7 commits
    • MAINTAINERS: ARM/VT8500, remove defunct e-mail · 8f1b7ba5
      Jiri Slaby committed
      linux@prisktech.co.nz is defunct:
      
        4.1.2 <linux@prisktech.co.nz>: Recipient address rejected: Domain not found
      
      Remove it from MAINTAINERS and mark the ARM/VT8500 entry orphan.
      
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mptcp: ensure tx skbs always have the MPTCP ext · 977d293e
      Paolo Abeni committed
      Due to signed/unsigned comparison, the expression:
      
      	info->size_goal - skb->len > 0
      
      evaluates to true when the size goal is smaller than the
      skb size. That results in lack of tx cache refill, so that
      the skb allocated by the core TCP code lacks the required
      MPTCP skb extensions.
      
      Due to the above, syzbot is able to trigger the following WARN_ON():
      
      WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Modules linked in:
      CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
      RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
      RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
      RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
      RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
      R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
      FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
       mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
       release_sock+0xb4/0x1b0 net/core/sock.c:3206
       sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
       mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
       inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2163 [inline]
       new_sync_write+0x40b/0x640 fs/read_write.c:507
       vfs_write+0x7cf/0xae0 fs/read_write.c:594
       ksys_write+0x1ee/0x250 fs/read_write.c:647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
      RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
      R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
      
      Fix the issue by rewriting the relevant expression to avoid
      sign-related problems - note: size_goal is always >= 0.
      
      Additionally, ensure that the skb in the tx cache always carries
      the relevant extension.
      
      Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
      Fixes: 1094c6fe ("mptcp: fix possible divide by zero")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
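
The signed/unsigned pitfall in this commit can be reproduced in a few lines. This is a sketch, assuming `size_goal` is a signed int and the skb length is an unsigned int as the commit message implies; `buggy_has_room` and `fixed_has_room` are hypothetical names and the fixed form is one possible rewrite, not necessarily the one in the patch.

```c
/* Buggy form: size_goal is converted to unsigned before the subtraction,
 * so a "negative" difference wraps to a huge positive value. */
static int buggy_has_room(int size_goal, unsigned int skb_len)
{
	return size_goal - skb_len > 0;
}

/* One safe rewrite: compare directly, relying on size_goal >= 0. */
static int fixed_has_room(int size_goal, unsigned int skb_len)
{
	return size_goal > 0 && (unsigned int)size_goal > skb_len;
}
```

With a size goal of 100 and an skb length of 200, the buggy form reports room while the rewritten form does not; both agree when the goal really is larger than the length.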
    • qed: rdma - don't wait for resources under hw error recovery flow · 1ea78123
      Shai Malin committed
      If the HW device is in recovery, the HW resources will never return,
      hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
      This fix speeds up the error recovery flow.
      
      Fixes: 64515dc8 ("qed: Add infrastructure for error detection and recovery")
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-qeth-fixes-2021-09-21' · b52d3161
      Jakub Kicinski committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2021-09-21
      
      This brings two fixes for deadlocks when a device is removed while it
      has certain types of async work pending. And one additional fix for a
      missing NULL check in an error case.
      ====================
      
      Link: https://lore.kernel.org/r/20210921145217.1584654-1-jwi@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • s390/qeth: fix deadlock during failing recovery · d2b59bd4
      Alexandra Winter committed
      Commit 0b9902c1 ("s390/qeth: fix deadlock during recovery") removed
      taking discipline_mutex inside qeth_do_reset(), fixing potential
      deadlocks. An error path was missed though, that still takes
      discipline_mutex and thus has the original deadlock potential.
      
      Intermittent deadlocks were seen when a qeth channel path is configured
      offline, causing a race between qeth_do_reset and ccwgroup_remove.
      Call qeth_set_offline() directly in the qeth_do_reset() error case and
      then a new variant of ccwgroup_set_offline(), without taking
      discipline_mutex.
      
      Fixes: b41b554c ("s390/qeth: fix locking for discipline setup / removal")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • s390/qeth: Fix deadlock in remove_discipline · ee909d0b
      Alexandra Winter committed
      Problem: qeth_close_dev_handler is a worker that tries to acquire
      card->discipline_mutex via drv->set_offline() in ccwgroup_set_offline().
      Since commit b41b554c
      ("s390/qeth: fix locking for discipline setup / removal")
      qeth_remove_discipline() is called under card->discipline_mutex and
      cancels the work and waits for it to finish.
      
      STOPLAN reception with reason code IPA_RC_VEPA_TO_VEB_TRANSITION is the
      only situation that schedules close_dev_work. In that situation scheduling
      qeth recovery will also result in an offline interface, when resetting the
      isolation mode fails, if the external switch is still set to VEB.
      And since commit 0b9902c1 ("s390/qeth: fix deadlock during recovery")
      qeth recovery does not acquire card->discipline_mutex anymore.
      
      So we accept the longer pathlength of qeth_schedule_recovery in this
      error situation and re-use the existing function.
      
      As a side-benefit this changes the hwtrap to behave like during recovery
      instead of like during a user-triggered set_offline.
      
      Fixes: b41b554c ("s390/qeth: fix locking for discipline setup / removal")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Acked-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • s390/qeth: fix NULL deref in qeth_clear_working_pool_list() · 248f064a
      Julian Wiedmann committed
      When qeth_set_online() calls qeth_clear_working_pool_list() to roll
      back after an error exit from qeth_hardsetup_card(), we are at risk of
      accessing card->qdio.in_q before it was allocated by
      qeth_alloc_qdio_queues() via qeth_mpc_initialize().
      
      qeth_clear_working_pool_list() then dereferences NULL, and by writing to
      queue->bufs[i].pool_entry scribbles all over the CPU's lowcore,
      resulting in a crash when those lowcore areas are used next (e.g. on
      the next machine-check interrupt).
      
      Such a scenario would typically happen when the device is first set
      online and its queues aren't allocated yet. An early IO error or certain
      misconfigs (eg. mismatched transport mode, bad portno) then cause us to
      error out from qeth_hardsetup_card() with card->qdio.in_q still being
      NULL.
      
      Fix it by checking the pointer for NULL before accessing it.
      
      Note that we also have (rare) paths inside qeth_mpc_initialize() where
      a configuration change can cause us to free the existing queues,
      expecting that subsequent code will allocate them again. If we then
      error out before that re-allocation happens, the same bug occurs.
      
      Fixes: eff73e16 ("s390/qeth: tolerate pre-filled RX buffer")
      Reported-by: Stefan Raspl <raspl@linux.ibm.com>
      Root-caused-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
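
The fix described in this commit (check the queue pointer before touching its buffers) can be sketched with simplified types. `card`, `queue`, `buf`, `NBUFS` and `clear_working_pool` are hypothetical reductions for illustration and do not match the real qeth structures.

```c
#include <stddef.h>

#define NBUFS 4 /* arbitrary buffer count for this sketch */

struct buf { void *pool_entry; };
struct queue { struct buf bufs[NBUFS]; };
struct card { struct queue *in_q; }; /* in_q stays NULL until allocated */

/* Detach every buffer from its pool entry. Returns the number of buffers
 * cleared, which is 0 when the queue was never allocated. */
static int clear_working_pool(struct card *card)
{
	struct queue *q = card->in_q;
	int cleared = 0;

	if (!q)
		return 0; /* early error path: nothing to roll back */

	for (int i = 0; i < NBUFS; i++) {
		q->bufs[i].pool_entry = NULL;
		cleared++;
	}
	return cleared;
}
```

Without the NULL test, the loop would write through a NULL queue pointer, which is the access the commit message describes as scribbling over lowcore.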