1. 10 11月, 2019 40 次提交
    • J
      platform/x86: pmc_atom: Add Siemens SIMATIC IPC227E to critclk_systems DMI table · 06e8438e
      Jan Kiszka 提交于
      commit ad0d315b4d4e7138f43acf03308192ec00e9614d upstream.
      
      The SIMATIC IPC227E uses the PMC clock for on-board components and gets
      stuck during boot if the clock is disabled. Therefore, add this device
      to the critical systems list.
      
      Fixes: 648e9218 ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06e8438e
    • M
      wireless: Skip directory when generating certificates · 2d830cf2
      Maxim Mikityanskiy 提交于
      [ Upstream commit 32b5a2c9950b9284000059d752f7afa164deb15e ]
      
      Commit 715a1233 ("wireless: don't write C files on failures") drops
      the `test -f $$f` check. The list of targets contains the
      CONFIG_CFG80211_EXTRA_REGDB_KEYDIR directory itself, and this check used
      to filter it out. After the check was removed, the extra keydir option
      no longer works, failing with the following message:
      
      od: 'standard input': read error: Is a directory
      
      This commit restores the check to make extra keydir work again.
      
      Fixes: 715a1233 ("wireless: don't write C files on failures")
      Signed-off-by: NMaxim Mikityanskiy <maxtram95@gmail.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2d830cf2
    • E
      net/flow_dissector: switch to siphash · 558d2bda
      Eric Dumazet 提交于
      [ Upstream commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 ]
      
      UDP IPv6 packets auto flowlabels are using a 32bit secret
      (static u32 hashrnd in net/core/flow_dissector.c) and
      apply jhash() over fields known by the receivers.
      
      Attackers can easily infer the 32bit secret and use this information
      to identify a device and/or user, since this 32bit secret is only
      set at boot time.
      
      Really, using jhash() to generate cookies sent on the wire
      is a serious security concern.
      
      Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
      a dead end. Trying to periodically change the secret (like in sch_sfq.c)
      could change paths taken in the network for long lived flows.
      
      Let's switch to siphash, as we did in commit df453700e8d8
      ("inet: switch IP ID generator to siphash")
      
      Using a cryptographically strong pseudo random function will solve this
      privacy issue and more generally remove other weak points in the stack.
      
      Packet schedulers using skb_get_hash_perturb() benefit from this change.
      
      Fixes: b5677416 ("ipv6: Enable auto flow labels by default")
      Fixes: 42240901 ("ipv6: Implement different admin modes for automatic flow labels")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJonathan Berger <jonathann1@walla.com>
      Reported-by: NAmit Klein <aksecurity@gmail.com>
      Reported-by: NBenny Pinkas <benny@pinkas.net>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      558d2bda
    • K
      r8152: add device id for Lenovo ThinkPad USB-C Dock Gen 2 · f6ef3599
      Kazutoshi Noguchi 提交于
      [ Upstream commit b3060531979422d5bb18d80226f978910284dc70 ]
      
      This device is sold as 'ThinkPad USB-C Dock Gen 2 (40AS)'.
      Chipset is RTL8153 and works with r8152.
      Without this, the generic cdc_ether grabs the device, and the device jam
      connected networks up when the machine suspends.
      Signed-off-by: NKazutoshi Noguchi <noguchi.kazutosi@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6ef3599
    • V
      net: dsa: fix switch tree list · c33f7efe
      Vivien Didelot 提交于
      [ Upstream commit 50c7d2ba9de20f60a2d527ad6928209ef67e4cdd ]
      
      If there are multiple switch trees on the device, only the last one
      will be listed, because the arguments of list_add_tail are swapped.
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c33f7efe
    • A
      net: usb: lan78xx: Connect PHY before registering MAC · 6b5bf3f3
      Andrew Lunn 提交于
      [ Upstream commit 38b4fe320119859c11b1dc06f6b4987a16344fa1 ]
      
      As soon as the netdev is registers, the kernel can start using the
      interface. If the driver connects the MAC to the PHY after the netdev
      is registered, there is a race condition where the interface can be
      opened without having the PHY connected.
      
      Change the order to close this race condition.
      
      Fixes: 92571a1a ("lan78xx: Connect phy early")
      Reported-by: NDaniel Wagner <dwagner@suse.de>
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Tested-by: NDaniel Wagner <dwagner@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b5bf3f3
    • D
      net: bcmgenet: reset 40nm EPHY on energy detect · 07c62fc7
      Doug Berger 提交于
      [ Upstream commit 25382b991d252aed961cd434176240f9de6bb15f ]
      
      The EPHY integrated into the 40nm Set-Top Box devices can falsely
      detect energy when connected to a disabled peer interface. When the
      peer interface is enabled the EPHY will detect and report the link
      as active, but on occasion may get into a state where it is not
      able to exchange data with the connected GENET MAC. This issue has
      not been observed when the link parameters are auto-negotiated;
      however, it has been observed with a manually configured link.
      
      It has been empirically determined that issuing a soft reset to the
      EPHY when energy is detected prevents it from getting into this bad
      state.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07c62fc7
    • D
      net: phy: bcm7xxx: define soft_reset for 40nm EPHY · 6d3ccc2a
      Doug Berger 提交于
      [ Upstream commit fe586b823372a9f43f90e2c6aa0573992ce7ccb7 ]
      
      The internal 40nm EPHYs use a "Workaround for putting the PHY in
      IDDQ mode." These PHYs require a soft reset to restore functionality
      after they are powered back up.
      
      This commit defines the soft_reset function to use genphy_soft_reset
      during phy_init_hw to accommodate this.
      
      Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d3ccc2a
    • D
      net: bcmgenet: don't set phydev->link from MAC · 97cc6827
      Doug Berger 提交于
      [ Upstream commit 7de48402faa32298c3551ea32c76ccb4f9d3025d ]
      
      When commit 28b2e0d2 ("net: phy: remove parameter new_link from
      phy_mac_interrupt()") removed the new_link parameter it set the
      phydev->link state from the MAC before invoking phy_mac_interrupt().
      
      However, once commit 88d6272acaaa ("net: phy: avoid unneeded MDIO
      reads in genphy_read_status") was added this initialization prevents
      the proper determination of the connection parameters by the function
      genphy_read_status().
      
      This commit removes that initialization to restore the proper
      functionality.
      
      Fixes: 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97cc6827
    • F
      net: dsa: b53: Do not clear existing mirrored port mask · 57e286f6
      Florian Fainelli 提交于
      [ Upstream commit c763ac436b668d7417f0979430ec0312ede4093d ]
      
      Clearing the existing bitmask of mirrored ports essentially prevents us
      from capturing more than one port at any given time. This is clearly
      wrong, do not clear the bitmask prior to setting up the new port.
      Reported-by: NHubert Feurstein <h.feurstein@gmail.com>
      Fixes: ed3af5fd ("net: dsa: b53: Add support for port mirroring")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57e286f6
    • A
      net/mlx5e: Fix ethtool self test: link speed · db91be8e
      Aya Levin 提交于
      [ Upstream commit 534e7366f41b0c689b01af4375aefcd1462adedf ]
      
      Ethtool self test contains a test for link speed. This test reads the
      PTYS register and determines whether the current speed is valid or not.
      Change current implementation to use the function mlx5e_port_linkspeed()
      that does the same check and fails when speed is invalid. This code
      redundancy lead to a bug when mlx5e_port_linkspeed() was updated with
      expended speeds and the self test was not.
      
      Fixes: 2c81bfd5 ("net/mlx5e: Move port speed code from en_ethtool.c to en/port.c")
      Signed-off-by: NAya Levin <ayal@mellanox.com>
      Reviewed-by: NMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db91be8e
    • H
      r8169: fix wrong PHY ID issue with RTL8168dp · 5eb1967b
      Heiner Kallweit 提交于
      [ Upstream commit 62bdc8fd1c21d4263ebd18bec57f82532d09249f ]
      
      As reported in [0] at least one RTL8168dp version has problems
      establishing a link. This chip version has an integrated RTL8211b PHY,
      however the chip seems to report a wrong PHY ID, resulting in a wrong
      PHY driver (for Generic Realtek PHY) being loaded.
      Work around this issue by adding a hook to r8168dp_2_mdio_read()
      for returning the correct PHY ID.
      
      [0] https://bbs.archlinux.org/viewtopic.php?id=246508
      
      Fixes: 242cd9b5 ("r8169: use phy_resume/phy_suspend")
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5eb1967b
    • M
      net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budget · 9e7c4fa2
      Maxim Mikityanskiy 提交于
      [ Upstream commit 9df86bdb6746d7fcfc2fda715f7a7c3d0ddb2654 ]
      
      When CQE compression is enabled, compressed CQEs use the following
      structure: a title is followed by one or many blocks, each containing 8
      mini CQEs (except the last, which may contain fewer mini CQEs).
      
      Due to NAPI budget restriction, a complete structure is not always
      parsed in one NAPI run, and some blocks with mini CQEs may be deferred
      to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call
      in the beginning of mlx5e_poll_rx_cq. However, if the budget is
      extremely low, some blocks may be left even after that, but the code
      that follows the mlx5e_decompress_cqes_cont call doesn't check it and
      assumes that a new CQE begins, which may not be the case. In such cases,
      random memory corruptions occur.
      
      An extremely low NAPI budget of 8 is used when busy_poll or busy_read is
      active.
      
      This commit adds a check to make sure that the previous compressed CQE
      has been completely parsed after mlx5e_decompress_cqes_cont, otherwise
      it prevents a new CQE from being fetched in the middle of a compressed
      CQE.
      
      This commit fixes random crashes in __build_skb, __page_pool_put_page
      and other not-related-directly places, that used to happen when both CQE
      compression and busy_poll/busy_read were enabled.
      
      Fixes: 7219ab34 ("net/mlx5e: CQE compression")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e7c4fa2
    • P
      selftests: fib_tests: add more tests for metric update · 0c3355cc
      Paolo Abeni 提交于
      [ Upstream commit 37de3b354150450ba12275397155e68113e99901 ]
      
      This patch adds two more tests to ipv4_addr_metric_test() to
      explicitly cover the scenarios fixed by the previous patch.
      Suggested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c3355cc
    • P
      ipv4: fix route update on metric change. · b166e883
      Paolo Abeni 提交于
      [ Upstream commit 0b834ba00ab5337e938c727e216e1f5249794717 ]
      
      Since commit af4d768a ("net/ipv4: Add support for specifying metric
      of connected routes"), when updating an IP address with a different metric,
      the associated connected route is updated, too.
      
      Still, the mentioned commit doesn't handle properly some corner cases:
      
      $ ip addr add dev eth0 192.168.1.0/24
      $ ip addr add dev eth0 192.168.2.1/32 peer 192.168.2.2
      $ ip addr add dev eth0 192.168.3.1/24
      $ ip addr change dev eth0 192.168.1.0/24 metric 10
      $ ip addr change dev eth0 192.168.2.1/32 peer 192.168.2.2 metric 10
      $ ip addr change dev eth0 192.168.3.1/24 metric 10
      $ ip -4 route
      192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.0
      192.168.2.2 dev eth0 proto kernel scope link src 192.168.2.1
      192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.2.1 metric 10
      
      Only the last route is correctly updated.
      
      The problem is the current test in fib_modify_prefix_metric():
      
      	if (!(dev->flags & IFF_UP) ||
      	    ifa->ifa_flags & (IFA_F_SECONDARY | IFA_F_NOPREFIXROUTE) ||
      	    ipv4_is_zeronet(prefix) ||
      	    prefix == ifa->ifa_local || ifa->ifa_prefixlen == 32)
      
      Which should be the logical 'not' of the pre-existing test in
      fib_add_ifaddr():
      
      	if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) &&
      	    (prefix != addr || ifa->ifa_prefixlen < 32))
      
      To properly negate the original expression, we need to change the last
      logical 'or' to a logical 'and'.
      
      Fixes: af4d768a ("net/ipv4: Add support for specifying metric of connected routes")
      Reported-and-suggested-by: NBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b166e883
    • E
      net: add READ_ONCE() annotation in __skb_wait_for_more_packets() · cd3bcb44
      Eric Dumazet 提交于
      [ Upstream commit 7c422d0ce97552dde4a97e6290de70ec6efb0fc6 ]
      
      __skb_wait_for_more_packets() can be called while other cpus
      can feed packets to the socket receive queue.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb
      
      write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
       __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
       __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd3bcb44
    • E
      net: use skb_queue_empty_lockless() in busy poll contexts · 4f3df7f1
      Eric Dumazet 提交于
      [ Upstream commit 3f926af3f4d688e2e11e7f8ed04e277a14d4d4a4 ]
      
      Busy polling usually runs without locks.
      Let's use skb_queue_empty_lockless() instead of skb_queue_empty()
      
      Also uses READ_ONCE() in __skb_try_recv_datagram() to address
      a similar potential problem.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f3df7f1
    • E
      net: use skb_queue_empty_lockless() in poll() handlers · eaf548fe
      Eric Dumazet 提交于
      [ Upstream commit 3ef7cf57c72f32f61e97f8fa401bc39ea1f1a5d4 ]
      
      Many poll() handlers are lockless. Using skb_queue_empty_lockless()
      instead of skb_queue_empty() is more appropriate.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eaf548fe
    • E
      udp: use skb_queue_empty_lockless() · afa1f5e9
      Eric Dumazet 提交于
      [ Upstream commit 137a0dbe3426fd7bcfe3f8117b36a87b3590e4eb ]
      
      syzbot reported a data-race [1].
      
      We should use skb_queue_empty_lockless() to document that we are
      not ensuring a mutual exclusion and silence KCSAN.
      
      [1]
      BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb
      
      write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
       skb_queue_empty include/linux/skbuff.h:1494 [inline]
       __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afa1f5e9
    • E
      net: add skb_queue_empty_lockless() · d5ac4232
      Eric Dumazet 提交于
      [ Upstream commit d7d16a89350ab263484c0aa2b523dd3a234e4a80 ]
      
      Some paths call skb_queue_empty() without holding
      the queue lock. We must use a barrier in order
      to not let the compiler do strange things, and avoid
      KCSAN splats.
      
      Adding a barrier in skb_queue_empty() might be overkill,
      I prefer adding a new helper to clearly identify
      points where the callers might be lockless. This might
      help us finding real bugs.
      
      The corresponding WRITE_ONCE() should add zero cost
      for current compilers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5ac4232
    • X
      vxlan: check tun_info options_len properly · 83532eb4
      Xin Long 提交于
      [ Upstream commit eadf52cf1852196a1363044dcda22fa5d7f296f7 ]
      
      This patch is to improve the tun_info options_len by dropping
      the skb when TUNNEL_VXLAN_OPT is set but options_len is less
      than vxlan_metadata. This can void a potential out-of-bounds
      access on ip_tun_info.
      
      Fixes: ee122c79 ("vxlan: Flow based tunneling")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83532eb4
    • E
      udp: fix data-race in udp_set_dev_scratch() · a8a5adbb
      Eric Dumazet 提交于
      [ Upstream commit a793183caa9afae907a0d7ddd2ffd57329369bf5 ]
      
      KCSAN reported a data-race in udp_set_dev_scratch() [1]
      
      The issue here is that we must not write over skb fields
      if skb is shared. A similar issue has been fixed in commit
      89c22d8c ("net: Fix skb csum races when peeking")
      
      While we are at it, use a helper only dealing with
      udp_skb_scratch(skb)->csum_unnecessary, as this allows
      udp_set_dev_scratch() to be called once and thus inlined.
      
      [1]
      BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
      
      write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
       udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
       __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
       first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
       udp_poll+0xea/0x110 net/ipv4/udp.c:2720
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       do_select+0x7d0/0x1020 fs/select.c:534
       core_sys_select+0x381/0x550 fs/select.c:677
       do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
       __do_sys_pselect6 fs/select.c:784 [inline]
       __se_sys_pselect6 fs/select.c:769 [inline]
       __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
       udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
       udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8a5adbb
    • W
      selftests: net: reuseport_dualstack: fix uninitalized parameter · 12fab163
      Wei Wang 提交于
      [ Upstream commit d64479a3e3f9924074ca7b50bd72fa5211dca9c1 ]
      
      This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
      occasionally due to the uninitialized length parameter.
      Initialize it to fix this, and also use int for "test_family" to comply
      with the API standard.
      
      Fixes: d6a61f80 ("soreuseport: test mixed v4/v6 sockets")
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Cc: Craig Gallek <cgallek@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12fab163
    • Z
      net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol() · 321c9915
      zhanglin 提交于
      [ Upstream commit 5ff223e86f5addbfae26419cbb5d61d98f6fbf7d ]
      
      memset() the structure ethtool_wolinfo that has padded bytes
      but the padded bytes have not been zeroed out.
      Signed-off-by: Nzhanglin <zhang.lin16@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      321c9915
    • D
      net: usb: lan78xx: Disable interrupts before calling generic_handle_irq() · 9da271c1
      Daniel Wagner 提交于
      [ Upstream commit 0a29ac5bd3a988dc151c8d26910dec2557421f64 ]
      
      lan78xx_status() will run with interrupts enabled due to the change in
      ed194d136769 ("usb: core: remove local_irq_save() around ->complete()
      handler"). generic_handle_irq() expects to be run with IRQs disabled.
      
      [    4.886203] 000: irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
      [    4.886243] 000: WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x154/0x168
      [    4.896294] 000: Modules linked in:
      [    4.896301] 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.6 #39
      [    4.896310] 000: Hardware name: Raspberry Pi 3 Model B+ (DT)
      [    4.896315] 000: pstate: 60000005 (nZCv daif -PAN -UAO)
      [    4.896321] 000: pc : __handle_irq_event_percpu+0x154/0x168
      [    4.896331] 000: lr : __handle_irq_event_percpu+0x154/0x168
      [    4.896339] 000: sp : ffff000010003cc0
      [    4.896346] 000: x29: ffff000010003cc0 x28: 0000000000000060
      [    4.896355] 000: x27: ffff000011021980 x26: ffff00001189c72b
      [    4.896364] 000: x25: ffff000011702bc0 x24: ffff800036d6e400
      [    4.896373] 000: x23: 000000000000004f x22: ffff000010003d64
      [    4.896381] 000: x21: 0000000000000000 x20: 0000000000000002
      [    4.896390] 000: x19: ffff8000371c8480 x18: 0000000000000060
      [    4.896398] 000: x17: 0000000000000000 x16: 00000000000000eb
      [    4.896406] 000: x15: ffff000011712d18 x14: 7265746e69206465
      [    4.896414] 000: x13: ffff000010003ba0 x12: ffff000011712df0
      [    4.896422] 000: x11: 0000000000000001 x10: ffff000011712e08
      [    4.896430] 000: x9 : 0000000000000001 x8 : 000000000003c920
      [    4.896437] 000: x7 : ffff0000118cc410 x6 : ffff0000118c7f00
      [    4.896445] 000: x5 : 000000000003c920 x4 : 0000000000004510
      [    4.896453] 000: x3 : ffff000011712dc8 x2 : 0000000000000000
      [    4.896461] 000: x1 : 73a3f67df94c1500 x0 : 0000000000000000
      [    4.896466] 000: Call trace:
      [    4.896471] 000:  __handle_irq_event_percpu+0x154/0x168
      [    4.896481] 000:  handle_irq_event_percpu+0x50/0xb0
      [    4.896489] 000:  handle_irq_event+0x40/0x98
      [    4.896497] 000:  handle_simple_irq+0xa4/0xf0
      [    4.896505] 000:  generic_handle_irq+0x24/0x38
      [    4.896513] 000:  intr_complete+0xb0/0xe0
      [    4.896525] 000:  __usb_hcd_giveback_urb+0x58/0xd8
      [    4.896533] 000:  usb_giveback_urb_bh+0xd0/0x170
      [    4.896539] 000:  tasklet_action_common.isra.0+0x9c/0x128
      [    4.896549] 000:  tasklet_hi_action+0x24/0x30
      [    4.896556] 000:  __do_softirq+0x120/0x23c
      [    4.896564] 000:  irq_exit+0xb8/0xd8
      [    4.896571] 000:  __handle_domain_irq+0x64/0xb8
      [    4.896579] 000:  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
      [    4.896586] 000:  el1_irq+0xb8/0x140
      [    4.896592] 000:  arch_cpu_idle+0x10/0x18
      [    4.896601] 000:  do_idle+0x200/0x280
      [    4.896608] 000:  cpu_startup_entry+0x20/0x28
      [    4.896615] 000:  rest_init+0xb4/0xc0
      [    4.896623] 000:  arch_call_rest_init+0xc/0x14
      [    4.896632] 000:  start_kernel+0x454/0x480
      
      Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stefan Wahren <wahrenst@gmx.net>
      Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NDaniel Wagner <dwagner@suse.de>
      Tested-by: NStefan Wahren <wahrenst@gmx.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9da271c1
    • G
      netns: fix GFP flags in rtnl_net_notifyid() · 40400fdd
      Guillaume Nault 提交于
      [ Upstream commit d4e4fdf9e4a27c87edb79b1478955075be141f67 ]
      
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The later also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40400fdd
    • E
      net/mlx4_core: Dynamically set guaranteed amount of counters per VF · 1d72dbb4
      Eran Ben Elisha 提交于
      [ Upstream commit e19868efea0c103f23b4b7e986fd0a703822111f ]
      
      Prior to this patch, the amount of counters guaranteed per VF in the
      resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was
      set regardless if the VF was single or dual port.
      This caused several VFs to have no guaranteed counters although the
      system could satisfy their request.
      
      The fix is to dynamically guarantee counters, based on each VF
      specification.
      
      Fixes: 9de92c60 ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
      Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d72dbb4
    • J
      net: hisilicon: Fix ping latency when deal with high throughput · f05975d9
      Jiangfeng Xiao 提交于
      [ Upstream commit e56bd641ca61beb92b135298d5046905f920b734 ]
      
      This is due to error in over budget processing.
      When dealing with high throughput, the used buffers
      that exceeds the budget is not cleaned up. In addition,
      it takes a lot of cycles to clean up the used buffer,
      and then the buffer where the valid data is located can take effect.
      Signed-off-by: NJiangfeng Xiao <xiaojiangfeng@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f05975d9
    • T
      net: fix sk_page_frag() recursion from memory reclaim · 1d5cb12a
      Tejun Heo 提交于
      [ Upstream commit 20eb4f29b60286e0d6dc01d9c260b4bd383c58fb ]
      
      sk_page_frag() optimizes skb_frag allocations by using per-task
      skb_frag cache when it knows it's the only user.  The condition is
      determined by seeing whether the socket allocation mask allows
      blocking - if the allocation may block, it obviously owns the task's
      context and ergo exclusively owns current->task_frag.
      
      Unfortunately, this misses recursion through memory reclaim path.
      Please take a look at the following backtrace.
      
       [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
           ...
           tcp_sendmsg+0x27/0x40
           sock_sendmsg+0x30/0x40
           sock_xmit.isra.24+0xa1/0x170 [nbd]
           nbd_send_cmd+0x1d2/0x690 [nbd]
           nbd_queue_rq+0x1b5/0x3b0 [nbd]
           __blk_mq_try_issue_directly+0x108/0x1b0
           blk_mq_request_issue_directly+0xbd/0xe0
           blk_mq_try_issue_list_directly+0x41/0xb0
           blk_mq_sched_insert_requests+0xa2/0xe0
           blk_mq_flush_plug_list+0x205/0x2a0
           blk_flush_plug_list+0xc3/0xf0
       [1] blk_finish_plug+0x21/0x2e
           _xfs_buf_ioapply+0x313/0x460
           __xfs_buf_submit+0x67/0x220
           xfs_buf_read_map+0x113/0x1a0
           xfs_trans_read_buf_map+0xbf/0x330
           xfs_btree_read_buf_block.constprop.42+0x95/0xd0
           xfs_btree_lookup_get_block+0x95/0x170
           xfs_btree_lookup+0xcc/0x470
           xfs_bmap_del_extent_real+0x254/0x9a0
           __xfs_bunmapi+0x45c/0xab0
           xfs_bunmapi+0x15/0x30
           xfs_itruncate_extents_flags+0xca/0x250
           xfs_free_eofblocks+0x181/0x1e0
           xfs_fs_destroy_inode+0xa8/0x1b0
           destroy_inode+0x38/0x70
           dispose_list+0x35/0x50
           prune_icache_sb+0x52/0x70
           super_cache_scan+0x120/0x1a0
           do_shrink_slab+0x120/0x290
           shrink_slab+0x216/0x2b0
           shrink_node+0x1b6/0x4a0
           do_try_to_free_pages+0xc6/0x370
           try_to_free_mem_cgroup_pages+0xe3/0x1e0
           try_charge+0x29e/0x790
           mem_cgroup_charge_skmem+0x6a/0x100
           __sk_mem_raise_allocated+0x18e/0x390
           __sk_mem_schedule+0x2a/0x40
       [0] tcp_sendmsg_locked+0x8eb/0xe10
           tcp_sendmsg+0x27/0x40
           sock_sendmsg+0x30/0x40
           ___sys_sendmsg+0x26d/0x2b0
           __sys_sendmsg+0x57/0xa0
           do_syscall_64+0x42/0x100
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      In [0], tcp_send_msg_locked() was using current->page_frag when it
      called sk_wmem_schedule().  It already calculated how many bytes can
      be fit into current->page_frag.  Due to memory pressure,
      sk_wmem_schedule() called into memory reclaim path which called into
      xfs and then IO issue path.  Because the filesystem in question is
      backed by nbd, the control goes back into the tcp layer - back into
      tcp_sendmsg_locked().
      
      nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
      sense - it's in the process of freeing memory and wants to be able to,
      e.g., drop clean pages to make forward progress.  However, this
      confused sk_page_frag() called from [2].  Because it only tests
      whether the allocation allows blocking which it does, it now thinks
      current->page_frag can be used again although it already was being
      used in [0].
      
      After [2] used current->page_frag, the offset would be increased by
      the used amount.  When the control returns to [0],
      current->page_frag's offset is increased and the previously calculated
      number of bytes now may overrun the end of allocated memory leading to
      silent memory corruptions.
      
      Fix it by adding gfpflags_normal_context() which tests sleepable &&
      !reclaim and use it to determine whether to use current->task_frag.
      
      v2: Eric didn't like gfp flags being tested twice.  Introduce a new
          helper gfpflags_normal_context() and combine the two tests.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d5cb12a
    • B
      net: ethernet: ftgmac100: Fix DMA coherency issue with SW checksum · 189982d1
      Benjamin Herrenschmidt 提交于
      [ Upstream commit 88824e3bf29a2fcacfd9ebbfe03063649f0f3254 ]
      
      We are calling the checksum helper after the dma_map_single()
      call to map the packet. This is incorrect as the checksumming
      code will touch the packet from the CPU. This means the cache
      won't be properly flushes (or the bounce buffering will leave
      us with the unmodified packet to DMA).
      
      This moves the calculation of the checksum & vlan tags to
      before the DMA mapping.
      
      This also has the side effect of fixing another bug: If the
      checksum helper fails, we goto "drop" to drop the packet, which
      will not unmap the DMA mapping.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Fixes: 05690d63 ("ftgmac100: Upgrade to NETIF_F_HW_CSUM")
      Reviewed-by: NVijay Khemka <vijaykhemka@fb.com>
      Tested-by: NVijay Khemka <vijaykhemka@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      189982d1
    • F
      net: dsa: bcm_sf2: Fix IMP setup for port different than 8 · 5536fc89
      Florian Fainelli 提交于
      [ Upstream commit 5fc0f21246e50afdf318b5a3a941f7f4f57b8947 ]
      
      Since it became possible for the DSA core to use a CPU port different
      than 8, our bcm_sf2_imp_setup() function was broken because it assumes
      that registers are applicable to port 8. In particular, the port's MAC
      is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
      bits if we are not configured for port 8.
      
      Fixes: 9f91484f ("net: dsa: make "label" property optional for dsa2")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5536fc89
    • E
      net: annotate lockless accesses to sk->sk_napi_id · 2c50a36d
      Eric Dumazet 提交于
      [ Upstream commit ee8d153d46a3b98c064ee15c0c0a3bbf1450e5a1 ]
      
      We already annotated most accesses to sk->sk_napi_id
      
      We missed sk_mark_napi_id() and sk_mark_napi_id_once()
      which might be called without socket lock held in UDP stack.
      
      KCSAN reported :
      BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb
      
      write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
       sk_mark_napi_id include/net/busy_poll.h:125 [inline]
       __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
       udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
       udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
       udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
       __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
       udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
       ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
       ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
      
      write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
       sk_mark_napi_id include/net/busy_poll.h:125 [inline]
       __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
       udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
       udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
       udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
       __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
       udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
       ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
       ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: e68b6e50 ("udp: enable busy polling for all sockets")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c50a36d
    • E
      net: annotate accesses to sk->sk_incoming_cpu · 0cfaf03c
      Eric Dumazet 提交于
      [ Upstream commit 7170a977743b72cf3eb46ef6ef89885dc7ad3621 ]
      
      This socket field can be read and written by concurrent cpus.
      
      Use READ_ONCE() and WRITE_ONCE() annotations to document this,
      and avoid some compiler 'optimizations'.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
      
      write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
       sk_incoming_cpu_update include/net/sock.h:953 [inline]
       tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
       do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
       do_softirq kernel/softirq.c:329 [inline]
       __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
      
      read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
       sk_incoming_cpu_update include/net/sock.h:952 [inline]
       tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cfaf03c
    • E
      inet: stop leaking jiffies on the wire · 07de7389
      Eric Dumazet 提交于
      [ Upstream commit a904a0693c189691eeee64f6c6b188bd7dc244e9 ]
      
      Historically linux tried to stick to RFC 791, 1122, 2003
      for IPv4 ID field generation.
      
      RFC 6864 made clear that no matter how hard we try,
      we can not ensure unicity of IP ID within maximum
      lifetime for all datagrams with a given source
      address/destination address/protocol tuple.
      
      Linux uses a per socket inet generator (inet_id), initialized
      at connection startup with a XOR of 'jiffies' and other
      fields that appear clear on the wire.
      
      Thiemo Nagel pointed that this strategy is a privacy
      concern as this provides 16 bits of entropy to fingerprint
      devices.
      
      Let's switch to a random starting point, this is just as
      good as far as RFC 6864 is concerned and does not leak
      anything critical.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07de7389
    • X
      erspan: fix the tun_info options_len check for erspan · 163901dc
      Xin Long 提交于
      [ Upstream commit 2eb8d6d2910cfe3dc67dc056f26f3dd9c63d47cd ]
      
      The check for !md doens't really work for ip_tunnel_info_opts(info) which
      only does info + 1. Also to avoid out-of-bounds access on info, it should
      ensure options_len is not less than erspan_metadata in both erspan_xmit()
      and ip6erspan_tunnel_xmit().
      
      Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      163901dc
    • E
      dccp: do not leak jiffies on the wire · 96df1ec2
      Eric Dumazet 提交于
      [ Upstream commit 3d1e5039f5f87a8731202ceca08764ee7cb010d3 ]
      
      For some reason I missed the case of DCCP passive
      flows in my previous patch.
      
      Fixes: a904a0693c18 ("inet: stop leaking jiffies on the wire")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96df1ec2
    • V
      cxgb4: fix panic when attaching to ULD fail · f291613f
      Vishal Kulkarni 提交于
      [ Upstream commit fc89cc358fb64e2429aeae0f37906126636507ec ]
      
      Release resources when attaching to ULD fail. Otherwise, data
      mismatch is seen between LLD and ULD later on, which lead to
      kernel panic when accessing resources that should not even
      exist in the first place.
      
      Fixes: 94cdb8bb ("cxgb4: Add support for dynamic allocation of resources for ULD")
      Signed-off-by: NShahjada Abul Husain <shahjada@chelsio.com>
      Signed-off-by: NVishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f291613f
    • J
      nbd: handle racing with error'ed out commands · 1f032ca2
      Josef Bacik 提交于
      [ Upstream commit 7ce23e8e0a9cd38338fc8316ac5772666b565ca9 ]
      
      We hit the following warning in production
      
      print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
      ------------[ cut here ]------------
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
      Workqueue: knbd-recv recv_work [nbd]
      RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
      Call Trace:
       blk_mq_free_request+0xb7/0xf0
       blk_mq_complete_request+0x62/0xf0
       recv_work+0x29/0xa1 [nbd]
       process_one_work+0x1f5/0x3f0
       worker_thread+0x2d/0x3d0
       ? rescuer_thread+0x340/0x340
       kthread+0x111/0x130
       ? kthread_create_on_node+0x60/0x60
       ret_from_fork+0x1f/0x30
      ---[ end trace b079c3c67f98bb7c ]---
      
      This was preceded by us timing out everything and shutting down the
      sockets for the device.  The problem is we had a request in the queue at
      the same time, so we completed the request twice.  This can actually
      happen in a lot of cases, we fail to get a ref on our config, we only
      have one connection and just error out the command, etc.
      
      Fix this by checking cmd->status in nbd_read_stat.  We only change this
      under the cmd->lock, so we are safe to check this here and see if we've
      already error'ed this command out, which would indicate that we've
      completed it as well.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1f032ca2
    • J
      nbd: protect cmd->status with cmd->lock · 82b7c99e
      Josef Bacik 提交于
      [ Upstream commit de6346ecbc8f5591ebd6c44ac164e8b8671d71d7 ]
      
      We already do this for the most part, except in timeout and clear_req.
      For the timeout case we take the lock after we grab a ref on the config,
      but that isn't really necessary because we're safe to touch the cmd at
      this point, so just move the order around.
      
      For the clear_req cause this is initiated by the user, so again is safe.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      82b7c99e
    • D
      cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs · 80b42f43
      Dave Wysochanski 提交于
      [ Upstream commit d46b0da7a33dd8c99d969834f682267a45444ab3 ]
      
      There's a deadlock that is possible and can easily be seen with
      a test where multiple readers open/read/close of the same file
      and a disruption occurs causing reconnect.  The deadlock is due
      a reader thread inside cifs_strict_readv calling down_read and
      obtaining lock_sem, and then after reconnect inside
      cifs_reopen_file calling down_read a second time.  If in
      between the two down_read calls, a down_write comes from
      another process, deadlock occurs.
      
              CPU0                    CPU1
              ----                    ----
      cifs_strict_readv()
       down_read(&cifsi->lock_sem);
                                     _cifsFileInfo_put
                                        OR
                                     cifs_new_fileinfo
                                      down_write(&cifsi->lock_sem);
      cifs_reopen_file()
       down_read(&cifsi->lock_sem);
      
      Fix the above by changing all down_write(lock_sem) calls to
      down_write_trylock(lock_sem)/msleep() loop, which in turn
      makes the second down_read call benign since it will never
      block behind the writer while holding lock_sem.
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Suggested-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed--by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80b42f43