1. 02 8月, 2021 12 次提交
    • S
      net: phy: micrel: Fix detection of ksz87xx switch · a5e63c7d
      Steve Bennett 提交于
      The logic for discerning between KSZ8051 and KSZ87XX PHYs is incorrect
      such that the that KSZ87XX switch is not identified correctly.
      
      ksz8051_ksz8795_match_phy_device() uses the parameter ksz_phy_id
      to discriminate whether it was called from ksz8051_match_phy_device()
      or from ksz8795_match_phy_device() but since PHY_ID_KSZ87XX is the
      same value as PHY_ID_KSZ8051, this doesn't work.
      
      Instead use a bool to discriminate the caller.
      
      Without this patch, the KSZ8795 switch port identifies as:
      
      ksz8795-switch spi3.1 ade1 (uninitialized): PHY [dsa-0.1:03] driver [Generic PHY]
      
      With the patch, it identifies correctly:
      
      ksz8795-switch spi3.1 ade1 (uninitialized): PHY [dsa-0.1:03] driver [Micrel KSZ87XX Switch]
      
      Fixes: 8b95599c ("net: phy: micrel: Discern KSZ8051 and KSZ8795 PHYs")
      Signed-off-by: NSteve Bennett <steveb@workware.net.au>
      Reviewed-by: NMarek Vasut <marex@denx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5e63c7d
    • D
      Merge branch 'sja1105-fdb-fixes' · cebb5103
      David S. Miller 提交于
      Vladimir Oltean says:
      
      ====================
      FDB fixes for NXP SJA1105
      
      I have some upcoming patches that make heavy use of statically installed
      FDB entries, and when testing them on SJA1105P/Q/R/S and SJA1110, it
      became clear that these switches do not behave reliably at all.
      
      - On SJA1110, a static FDB entry cannot be installed at all
      - On SJA1105P/Q/R/S, it is very picky about the inner/outer VLAN type
      - Dynamically learned entries will make us not install static ones, or
        even if we do, they might not take effect
      
      Patch 5/6 has a conflict with net-next (sorry), the commit message of
      that patch describes how to deal with it. Thanks.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cebb5103
    • V
      net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag · 47c2c0c2
      Vladimir Oltean 提交于
      On SJA1105P/Q/R/S and SJA1110, the L2 Lookup Table entries contain a
      maskable "inner/outer tag" bit which means:
      - when set to 1: match single-outer and double tagged frames
      - when set to 0: match untagged and single-inner tagged frames
      - when masked off: match all frames regardless of the type of tag
      
      This driver does not make any meaningful distinction between inner tags
      (matches on TPID) and outer tags (matches on TPID2). In fact, all VLAN
      table entries are installed as SJA1110_VLAN_D_TAG, which means that they
      match on both inner and outer tags.
      
      So it does not make sense that we install FDB entries with the IOTAG bit
      set to 1.
      
      In VLAN-unaware mode, we set both TPID and TPID2 to 0xdadb, so the
      switch will see frames as outer-tagged or double-tagged (never inner).
      So the FDB entries will match if IOTAG is set to 1.
      
      In VLAN-aware mode, we set TPID to 0x8100 and TPID2 to 0x88a8. So the
      switch will see untagged and 802.1Q-tagged packets as inner-tagged, and
      802.1ad-tagged packets as outer-tagged. So untagged and 802.1Q-tagged
      packets will not match FDB entries if IOTAG is set to 1, but 802.1ad
      tagged packets will. Strange.
      
      To fix this, simply mask off the IOTAG bit from FDB entries, and make
      them match regardless of whether the VLAN tag is inner or outer.
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47c2c0c2
    • V
      net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too · 589918df
      Vladimir Oltean 提交于
      Similar but not quite the same with what was done in commit b11f0a4c
      ("net: dsa: sja1105: be stateless when installing FDB entries") for
      SJA1105E/T, it is desirable to drop the priv->vlan_aware check and
      simply go ahead and install FDB entries in the VLAN that was given by
      the bridge.
      
      As opposed to SJA1105E/T, in SJA1105P/Q/R/S and SJA1110, the FDB is a
      maskable TCAM, and we are installing VLAN-unaware FDB entries with the
      VLAN ID masked off. However, such FDB entries might completely obscure
      VLAN-aware entries where the VLAN ID is included in the search mask,
      because the switch looks up the FDB from left to right and picks the
      first entry which results in a masked match. So it depends on whether
      the bridge installs first the VLAN-unaware or the VLAN-aware FDB entries.
      
      Anyway, if we had a VLAN-unaware FDB entry towards one set of DESTPORTS
      and a VLAN-aware one towards other set of DESTPORTS, the result is that
      the packets in VLAN-aware mode will be forwarded towards the DESTPORTS
      specified by the VLAN-unaware entry.
      
      To solve this, simply do not use the masked matching ability of the FDB
      for VLAN ID, and always match precisely on it. In VLAN-unaware mode, we
      configure the switch for shared VLAN learning, so the VLAN ID will be
      ignored anyway during lookup, so it is redundant to mask it off in the
      TCAM.
      
      This patch conflicts with net-next commit 0fac6aa0 ("net: dsa: sja1105:
      delete the best_effort_vlan_filtering mode") which changed this line:
      	if (priv->vlan_state != SJA1105_VLAN_UNAWARE) {
      into:
      	if (priv->vlan_aware) {
      
      When merging with net-next, the lines added by this patch should take
      precedence in the conflict resolution (i.e. the "if" condition should be
      deleted in both cases).
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      589918df
    • V
      net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address · 728db843
      Vladimir Oltean 提交于
      Currently, when sja1105pqrs_fdb_add() is called for a host-joined IPv6
      MDB entry such as 33:33:00:00:00:6a, the search for that address will
      return the FDB entry for SJA1105_UNKNOWN_MULTICAST, which has a
      destination MAC of 01:00:00:00:00:00 and a mask of 01:00:00:00:00:00.
      It returns that entry because, well, it matches, in the sense that
      unknown multicast is supposed by design to match it...
      
      But the issue is that we then proceed to overwrite this entry with the
      one for our precise host-joined multicast address, and the unknown
      multicast entry is no longer there - unknown multicast is now flooded to
      the same group of ports as broadcast, which does not look up the FDB.
      
      To solve this problem, we should ignore searches that return the unknown
      multicast address as the match, and treat them as "no match" which will
      result in the entry being installed to hardware.
      
      For this to work properly, we need to put the result of the FDB search
      in a temporary variable in order to avoid overwriting the l2_lookup
      entry we want to program. The l2_lookup entry returned by the search
      might not have the same set of DESTPORTS and not even the same MACADDR
      as the entry we're trying to add.
      
      Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      728db843
    • V
      net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones · 6c5fc159
      Vladimir Oltean 提交于
      The procedure to add a static FDB entry in sja1105 is concurrent with
      dynamic learning performed on all bridge ports and the CPU port.
      
      The switch looks up the FDB from left to right, and also learns
      dynamically from left to right, so it is possible that between the
      moment when we pick up a free slot to install an FDB entry, another slot
      to the left of that one becomes free due to an address ageing out, and
      that other slot is then immediately used by the switch to learn
      dynamically the same address as we're trying to add statically.
      
      The result is that we succeeded to add our static FDB entry, but it is
      being shadowed by a dynamic FDB entry to its left, and the switch will
      behave as if our static FDB entry did not exist.
      
      We cannot really prevent this from happening unless we make the entire
      process to add a static FDB entry a huge critical section where address
      learning is temporarily disabled on _all_ ports, and then re-enabled
      according to the configuration done by sja1105_port_set_learning.
      However, that is kind of disruptive for the operation of the network.
      
      What we can do alternatively is to simply read back the FDB for dynamic
      entries located before our newly added static one, and delete them.
      This will guarantee that our static FDB entry is now operational. It
      will still not guarantee that there aren't dynamic FDB entries to the
      _right_ of that static FDB entry, but at least those entries will age
      out by themselves since they aren't hit, and won't bother anyone.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c5fc159
    • V
      net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add · e11e865b
      Vladimir Oltean 提交于
      The SJA1105 switch family leaves it up to software to decide where
      within the FDB to install a static entry, and to concatenate destination
      ports for already existing entries (the FDB is also used for multicast
      entries), it is not as simple as just saying "please add this entry".
      
      This means we first need to search for an existing FDB entry before
      adding a new one. The driver currently manages to fool itself into
      thinking that if an FDB entry already exists, there is nothing to be
      done. But that FDB entry might be dynamically learned, case in which it
      should be replaced with a static entry, but instead it is left alone.
      
      This patch checks the LOCKEDS ("locked/static") bit from found FDB
      entries, and lets the code "goto skip_finding_an_index;" if the FDB
      entry was not static. So we also need to move the place where we set
      LOCKEDS = true, to cover the new case where a dynamic FDB entry existed
      but was dynamic.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e11e865b
    • V
      net: dsa: sja1105: fix static FDB writes for SJA1110 · cb81698f
      Vladimir Oltean 提交于
      The blamed commit made FDB access on SJA1110 functional only as far as
      dumping the existing entries goes, but anything having to do with an
      entry's index (adding, deleting) is still broken.
      
      There are in fact 2 problems, all caused by improperly inheriting the
      code from SJA1105P/Q/R/S:
      - An entry size is SJA1110_SIZE_L2_LOOKUP_ENTRY (24) bytes and not
        SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY (20) bytes
      - The "index" field within an FDB entry is at bits 10:1 for SJA1110 and
        not 15:6 as in SJA1105P/Q/R/S
      
      This patch moves the packing function for the cmd->index outside of
      sja1105pqrs_common_l2_lookup_cmd_packing() and into the device specific
      functions sja1105pqrs_l2_lookup_cmd_packing and
      sja1110_l2_lookup_cmd_packing.
      
      Fixes: 74e7feff ("net: dsa: sja1105: fix dynamic access to L2 Address Lookup table for SJA1110")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb81698f
    • D
      mhi: Fix networking tree build. · 40e15940
      David S. Miller 提交于
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40e15940
    • Y
      net/sched: taprio: Fix init procedure · ebca25ea
      Yannick Vignon 提交于
      Commit 13511704 ("net: taprio offload: enforce qdisc to netdev queue mapping")
      resulted in duplicate entries in the qdisc hash.
      While this did not impact the overall operation of the qdisc and taprio
      code paths, it did result in an infinite loop when dumping the qdisc
      properties, at least on one target (NXP LS1028 ARDB).
      Removing the duplicate call to qdisc_hash_add() solves the problem.
      
      Fixes: 13511704 ("net: taprio offload: enforce qdisc to netdev queue mapping")
      Signed-off-by: NYannick Vignon <yannick.vignon@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebca25ea
    • J
      net, gro: Set inner transport header offset in tcp/udp GRO hook · d51c5907
      Jakub Sitnicki 提交于
      GSO expects inner transport header offset to be valid when
      skb->encapsulation flag is set. GSO uses this value to calculate the length
      of an individual segment of a GSO packet in skb_gso_transport_seglen().
      
      However, tcp/udp gro_complete callbacks don't update the
      skb->inner_transport_header when processing an encapsulated TCP/UDP
      segment. As a result a GRO skb has ->inner_transport_header set to a value
      carried over from earlier skb processing.
      
      This can have mild to tragic consequences. From miscalculating the GSO
      segment length to triggering a page fault [1], when trying to read TCP/UDP
      header at an address past the skb->data page.
      
      The latter scenario leads to an oops report like so:
      
        BUG: unable to handle page fault for address: ffff9fa7ec00d008
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 123f201067 P4D 123f201067 PUD 123f209067 PMD 0
        Oops: 0000 [#1] SMP NOPTI
        CPU: 44 PID: 0 Comm: swapper/44 Not tainted 5.4.53-cloudflare-2020.7.21 #1
        Hardware name: HYVE EDGE-METAL-GEN10/HS-1811DLite1, BIOS V2.15 02/21/2020
        RIP: 0010:skb_gso_transport_seglen+0x44/0xa0
        Code: c0 41 83 e0 11 f6 87 81 00 00 00 20 74 30 0f b7 87 aa 00 00 00 0f [...]
        RSP: 0018:ffffad8640bacbb8 EFLAGS: 00010202
        RAX: 000000000000feda RBX: ffff9fcc8d31bc00 RCX: ffff9fa7ec00cffc
        RDX: ffff9fa7ebffdec0 RSI: 000000000000feda RDI: 0000000000000122
        RBP: 00000000000005c4 R08: 0000000000000001 R09: 0000000000000000
        R10: ffff9fe588ae3800 R11: ffff9fe011fc92f0 R12: ffff9fcc8d31bc00
        R13: ffff9fe0119d4300 R14: 00000000000005c4 R15: ffff9fba57d70900
        FS:  0000000000000000(0000) GS:ffff9fe68df00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff9fa7ec00d008 CR3: 0000003e99b1c000 CR4: 0000000000340ee0
        Call Trace:
         <IRQ>
         skb_gso_validate_network_len+0x11/0x70
         __ip_finish_output+0x109/0x1c0
         ip_sublist_rcv_finish+0x57/0x70
         ip_sublist_rcv+0x2aa/0x2d0
         ? ip_rcv_finish_core.constprop.0+0x390/0x390
         ip_list_rcv+0x12b/0x14f
         __netif_receive_skb_list_core+0x2a9/0x2d0
         netif_receive_skb_list_internal+0x1b5/0x2e0
         napi_complete_done+0x93/0x140
         veth_poll+0xc0/0x19f [veth]
         ? mlx5e_napi_poll+0x221/0x610 [mlx5_core]
         net_rx_action+0x1f8/0x790
         __do_softirq+0xe1/0x2bf
         irq_exit+0x8e/0xc0
         do_IRQ+0x58/0xe0
         common_interrupt+0xf/0xf
         </IRQ>
      
      The bug can be observed in a simple setup where we send IP/GRE/IP/TCP
      packets into a netns over a veth pair. Inside the netns, packets are
      forwarded to dummy device:
      
        trafgen -> [veth A]--[veth B] -forward-> [dummy]
      
      For veth B to GRO aggregate packets on receive, it needs to have an XDP
      program attached (for example, a trivial XDP_PASS). Additionally, for UDP,
      we need to enable GSO_UDP_L4 feature on the device:
      
        ip netns exec A ethtool -K AB rx-udp-gro-forwarding on
      
      The last component is an artificial delay to increase the chances of GRO
      batching happening:
      
        ip netns exec A tc qdisc add dev AB root \
           netem delay 200us slot 5ms 10ms packets 2 bytes 64k
      
      With such a setup in place, the bug can be observed by tracing the skb
      outer and inner offsets when GSO skb is transmitted from the dummy device:
      
      tcp:
      
      FUNC              DEV   SKB_LEN  NH  TH ENC INH ITH GSO_SIZE GSO_TYPE
      ip_finish_output  dumB     2830 270 290   1 294 254     1383 (tcpv4,gre,)
                                                      ^^^
      udp:
      
      FUNC              DEV   SKB_LEN  NH  TH ENC INH ITH GSO_SIZE GSO_TYPE
      ip_finish_output  dumB     2818 270 290   1 294 254     1383 (gre,udp_l4,)
                                                      ^^^
      
      Fix it by updating the inner transport header offset in tcp/udp
      gro_complete callbacks, similar to how {inet,ipv6}_gro_complete callbacks
      update the inner network header offset, when skb->encapsulation flag is
      set.
      
      [1] https://lore.kernel.org/netdev/CAKxSbF01cLpZem2GFaUaifh0S-5WYViZemTicAg7FCHOnh6kug@mail.gmail.com/
      
      Fixes: bf296b12 ("tcp: Add GRO support")
      Fixes: f993bc25 ("net: core: handle encapsulation offloads when computing segment lengths")
      Fixes: e20cf8d3 ("udp: implement GRO for plain UDP sockets.")
      Reported-by: NAlex Forster <aforster@cloudflare.com>
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d51c5907
    • P
      qede: fix crash in rmmod qede while automatic debug collection · 1159e25c
      Prabhakar Kushwaha 提交于
      A crash has been observed if rmmod is done while automatic debug
      collection in progress. It is due to a race  condition between
      both of them.
      
      To fix stop the sp_task during unload to avoid running qede_sp_task
      even if they are schedule during removal process.
      Signed-off-by: NAlok Prasad <palok@marvell.com>
      Signed-off-by: NShai Malin <smalin@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NPrabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1159e25c
  2. 31 7月, 2021 19 次提交
    • L
      Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c7d10223
      Linus Torvalds 提交于
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
        (mac80211) and netfilter trees.
      
        Current release - regressions:
      
         - mac80211: fix starting aggregation sessions on mesh interfaces
      
        Current release - new code bugs:
      
         - sctp: send pmtu probe only if packet loss in Search Complete state
      
         - bnxt_en: add missing periodic PHC overflow check
      
         - devlink: fix phys_port_name of virtual port and merge error
      
         - hns3: change the method of obtaining default ptp cycle
      
         - can: mcba_usb_start(): add missing urb->transfer_dma initialization
      
        Previous releases - regressions:
      
         - set true network header for ECN decapsulation
      
         - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
           LRO
      
         - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
           PHY
      
         - sctp: fix return value check in __sctp_rcv_asconf_lookup
      
        Previous releases - always broken:
      
         - bpf:
             - more spectre corner case fixes, introduce a BPF nospec
               instruction for mitigating Spectre v4
             - fix OOB read when printing XDP link fdinfo
             - sockmap: fix cleanup related races
      
         - mac80211: fix enabling 4-address mode on a sta vif after assoc
      
         - can:
             - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
             - j1939: j1939_session_deactivate(): clarify lifetime of session
               object, avoid UAF
             - fix number of identical memory leaks in USB drivers
      
         - tipc:
             - do not blindly write skb_shinfo frags when doing decryption
             - fix sleeping in tipc accept routine"
      
      * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        gve: Update MAINTAINERS list
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
        bpf: Fix leakage due to insufficient speculative store bypass mitigation
        bpf: Introduce BPF nospec instruction for mitigating Spectre v4
        sis900: Fix missing pci_disable_device() in probe and remove
        net: let flow have same hash in two directions
        nfc: nfcsim: fix use after free during module unload
        tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
        sctp: fix return value check in __sctp_rcv_asconf_lookup
        nfc: s3fwrn5: fix undefined parameter values in dev_err()
        net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
        net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
        net/mlx5: Unload device upon firmware fatal error
        net/mlx5e: Fix page allocation failure for ptp-RQ over SF
        net/mlx5e: Fix page allocation failure for trap-RQ over SF
        ...
      c7d10223
    • L
      Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e1dab4c0
      Linus Torvalds 提交于
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a recent IRQ resources handling modification that turned
        out to be problematic, fix suspend-to-idle handling on AMD platforms
        to take upcoming systems into account properly and fix the retrieval
        of the DPTF attributes of the PCH FIVR.
      
        Specifics:
      
         - Revert recent change of the ACPI IRQ resources handling that
           attempted to improve the ACPI IRQ override selection logic, but
           introduced serious regressions on some systems (Hui Wang).
      
         - Fix up quirks for AMD platforms in the suspend-to-idle support code
           so as to take upcoming systems using uPEP HID AMDI007 into account
           as appropriate (Mario Limonciello).
      
         - Fix the code retrieving DPTF attributes of the PCH FIVR so that it
           agrees on the return data type with the ACPI control method
           evaluated for this purpose (Srinivas Pandruvada)"
      
      * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Fix reading of attributes
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
        ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
      e1dab4c0
    • L
      pipe: make pipe writes always wake up readers · 3a34b13a
      Linus Torvalds 提交于
      Since commit 1b6b26ae ("pipe: fix and clarify pipe write wakeup
      logic") we have sanitized the pipe write logic, and would only try to
      wake up readers if they needed it.
      
      In particular, if the pipe already had data in it before the write,
      there was no point in trying to wake up a reader, since any existing
      readers must have been aware of the pre-existing data already.  Doing
      extraneous wakeups will only cause potential thundering herd problems.
      
      However, it turns out that some Android libraries have misused the EPOLL
      interface, and expected "edge triggered" be to "any new write will
      trigger it".  Even if there was no edge in sight.
      
      Quoting Sandeep Patil:
       "The commit 1b6b26ae ('pipe: fix and clarify pipe write wakeup
        logic') changed pipe write logic to wakeup readers only if the pipe
        was empty at the time of write. However, there are libraries that
        relied upon the older behavior for notification scheme similar to
        what's described in [1]
      
        One such library 'realm-core'[2] is used by numerous Android
        applications. The library uses a similar notification mechanism as GNU
        Make but it never drains the pipe until it is full. When Android moved
        to v5.10 kernel, all applications using this library stopped working.
      
        The library has since been fixed[3] but it will be a while before all
        applications incorporate the updated library"
      
      Our regression rule for the kernel is that if applications break from
      new behavior, it's a regression, even if it was because the application
      did something patently wrong.  Also note the original report [4] by
      Michal Kerrisk about a test for this epoll behavior - but at that point
      we didn't know of any actual broken use case.
      
      So add the extraneous wakeup, to approximate the old behavior.
      
      [ I say "approximate", because the exact old behavior was to do a wakeup
        not for each write(), but for each pipe buffer chunk that was filled
        in. The behavior introduced by this change is not that - this is just
        "every write will cause a wakeup, whether necessary or not", which
        seems to be sufficient for the broken library use. ]
      
      It's worth noting that this adds the extraneous wakeup only for the
      write side, while the read side still considers the "edge" to be purely
      about reading enough from the pipe to allow further writes.
      
      See commit f467a6a6 ("pipe: fix and clarify pipe read wakeup logic")
      for the pipe read case, which remains that "only wake up if the pipe was
      full, and we read something from it".
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
      Link: https://github.com/realm/realm-core [2]
      Link: https://github.com/realm/realm-core/issues/4666 [3]
      Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
      Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/Reported-by: NSandeep Patil <sspatil@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3a34b13a
    • R
      Merge branches 'acpi-resources' and 'acpi-dptf' · e83f54ea
      Rafael J. Wysocki 提交于
      * acpi-resources:
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
      
      * acpi-dptf:
        ACPI: DPTF: Fix reading of attributes
      e83f54ea
    • L
      Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 4669e13c
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
      
       - gendisk freeing fix (Christoph)
      
       - blk-iocost wake ordering fix (Tejun)
      
       - tag allocation error handling fix (John)
      
       - loop locking fix. While this isn't the prettiest fix in the world,
         nobody has any good alternatives for 5.14. Something to likely
         revisit for 5.15. (Tetsuo)
      
      * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        block: delay freeing the gendisk
        blk-iocost: fix operation ordering in iocg_wake_fn()
        blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
        loop: reintroduce global lock for safe loop_validate_file() traversal
      4669e13c
    • L
      Merge tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 27eb687b
      Linus Torvalds 提交于
      Pull io_uring fixes from Jens Axboe:
      
       - A fix for block backed reissue (me)
      
       - Reissue context hardening (me)
      
       - Async link locking fix (Pavel)
      
      * tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        io_uring: fix poll requests leaking second poll entries
        io_uring: don't block level reissue off completion path
        io_uring: always reissue from task_work context
        io_uring: fix race in unified task_work running
        io_uring: fix io_prep_async_link locking
      27eb687b
    • L
      Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block · f6c5971b
      Linus Torvalds 提交于
      Pull libata fixlets from Jens Axboe:
      
       - A fix for PIO highmem (Christoph)
      
       - Kill HAVE_IDE as it's now unused (Lukas)
      
      * tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        arch: Kconfig: clean up obsolete use of HAVE_IDE
        libata: fix ata_pio_sector for CONFIG_HIGHMEM
      f6c5971b
    • L
      Merge tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 051df241
      Linus Torvalds 提交于
      Pull btrfs fixes from David Sterba:
      
       - fix -Warray-bounds warning, to help external patchset to make it
         default treewide
      
       - fix writeable device accounting (syzbot report)
      
       - fix fsync and log replay after a rename and inode eviction
      
       - fix potentially lost error code when submitting multiple bios for
         compressed range
      
      * tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: calculate number of eb pages properly in csum_tree_block
        btrfs: fix rw device counting in __btrfs_free_extra_devids
        btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
        btrfs: mark compressed range uptodate only if all bio succeed
      051df241
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 8723bc8f
      Linus Torvalds 提交于
      Pull HID fixes from Jiri Kosina:
      
       - resume timing fix for intel-ish driver (Ye Xiang)
      
       - fix for using incorrect MMIO register in amd_sfh driver (Dylan
         MacKenzie)
      
       - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for
         Wacom driver (Jason Gerecke)
      
       - device removal bugfix for ft260 driver (Michael Zaidman)
      
       - other small assorted fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: ft260: fix device removal due to USB disconnect
        HID: wacom: Skip processing of touches with negative slot values
        HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
        HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"
        HID: apple: Add support for Keychron K1 wireless keyboard
        HID: fix typo in Kconfig
        HID: ft260: fix format type warning in ft260_word_show()
        HID: amd_sfh: Use correct MMIO register for DMA address
        HID: asus: Remove check for same LED brightness on set
        HID: intel-ish-hid: use async resume function
      8723bc8f
    • L
      Merge branch 'akpm' (patches from Andrew) · ad6ec09d
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "7 patches.
      
        Subsystems affected by this patch series: lib, ocfs2, and mm (slub,
        migration, and memcg)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
        slub: fix unreclaimable slab stat for bulk free
        mm/migrate: fix NR_ISOLATED corruption on 64-bit
        mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
        ocfs2: issue zeroout to EOF blocks
        ocfs2: fix zero out valid data
        lib/test_string.c: move string selftest in the Runtime Testing menu
      ad6ec09d
    • J
      Merge tag 'linux-can-fixes-for-5.14-20210730' of... · 8d670412
      Jakub Kicinski 提交于
      Merge tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-07-30
      
      The first patch is by me and adds Yasushi SHOJI as a reviewer for the
      Microchip CAN BUS Analyzer Tool driver.
      
      Dan Carpenter's patch fixes a signedness bug in the hi311x driver.
      
      Pavel Skripkin provides 4 patches, the first targets the mcba_usb
      driver by adding the missing urb->transfer_dma initialization, which
      was broken in a previous commit. The last 3 patches fix a memory leak
      in the usb_8dev, ems_usb and esd_usb2 driver.
      
      * tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
      ====================
      
      Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      8d670412
    • W
      mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() · 121dffe2
      Wang Hai 提交于
      When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
      the following dump occurs.
      
        BUG: kernel NULL pointer dereference, address: 0000000000000020
        [...]
        Oops: 0000 [#1] SMP
        [...]
        Workqueue: events kfree_rcu_work
        RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
        RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
        RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
        [...]
        Call Trace:
          kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
          kfree_bulk include/linux/slab.h:413 [inline]
          kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
          process_one_work+0x207/0x530 kernel/workqueue.c:2276
          worker_thread+0x320/0x610 kernel/workqueue.c:2422
          kthread+0x13d/0x160 kernel/kthread.c:313
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      When kmalloc_node() a large memory, page is allocated, not slab, so when
      freeing memory via kfree_rcu(), this large memory should not be used by
      memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
      slab.
      
      Using page_objcgs_check() instead of page_objcgs() in
      memcg_slab_free_hook() to fix this bug.
      
      Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
      Fixes: 270c6a71 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
      Signed-off-by: NWang Hai <wanghai38@huawei.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      121dffe2
    • S
      slub: fix unreclaimable slab stat for bulk free · f227f0fa
      Shakeel Butt 提交于
      SLUB uses page allocator for higher order allocations and update
      unreclaimable slab stat for such allocations.  At the moment, the bulk
      free for SLUB does not share code with normal free code path for these
      type of allocations and have missed the stat update.  So, fix the stat
      update by common code.  The user visible impact of the bug is the
      potential of inconsistent unreclaimable slab stat visible through
      meminfo and vmstat.
      
      Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com
      Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting")
      Signed-off-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f227f0fa
    • A
      mm/migrate: fix NR_ISOLATED corruption on 64-bit · b5916c02
      Aneesh Kumar K.V 提交于
      Similar to commit 2da9f630 ("mm/vmscan: fix NR_ISOLATED_FILE
      corruption on 64-bit") avoid using unsigned int for nr_pages.  With
      unsigned int type the large unsigned int converts to a large positive
      signed long.
      
      Symptoms include CMA allocations hanging forever due to
      alloc_contig_range->...->isolate_migratepages_block waiting forever in
      "while (unlikely(too_many_isolated(pgdat)))".
      
      Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com
      Fixes: c5fc5c3a ("mm: migrate: account THP NUMA migration counters correctly")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NYang Shi <shy828301@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5916c02
    • J
      mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code · 30def935
      Johannes Weiner 提交于
      Dan Carpenter reports:
      
          The patch 2d146aa3: "mm: memcontrol: switch to rstat" from Apr
          29, 2021, leads to the following static checker warning:
      
      	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
      	    warn: sleeping in atomic context
      
          mm/memcontrol.c
            3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
            3573  {
            3574          unsigned long val;
            3575
            3576          if (mem_cgroup_is_root(memcg)) {
            3577                  cgroup_rstat_flush(memcg->css.cgroup);
      			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
          This is from static analysis and potentially a false positive.  The
          problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
          which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
          can sleep.
      
            3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
            3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
            3580                  if (swap)
            3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
            3582          } else {
            3583                  if (!swap)
            3584                          val = page_counter_read(&memcg->memory);
            3585                  else
            3586                          val = page_counter_read(&memcg->memsw);
            3587          }
            3588          return val;
            3589  }
      
      __mem_cgroup_threshold() indeed holds the rcu lock.  In addition, the
      thresholding code is invoked during stat changes, and those contexts
      have irqs disabled as well.  If the lock breaking occurs inside the
      flush function, it will result in a sleep from an atomic context.
      
      Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
      
      Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
      Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NChris Down <chris@chrisdown.name>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30def935
    • J
      ocfs2: issue zeroout to EOF blocks · 9449ad33
      Junxiao Bi 提交于
      For punch holes in EOF blocks, fallocate used buffer write to zero the
      EOF blocks in last cluster.  But since ->writepage will ignore EOF
      pages, those zeros will not be flushed.
      
      This "looks" ok as commit 6bba4471 ("ocfs2: fix data corruption by
      fallocate") will zero the EOF blocks when extend the file size, but it
      isn't.  The problem happened on those EOF pages, before writeback, those
      pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
      set, when writeback run by write_cache_pages(), DIRTY flag on the page
      was cleared, but DIRTY flag on the buffer_head not.
      
      When next write happened to those EOF pages, since buffer_head already
      had DIRTY flag set, it would not mark page DIRTY again.  That made
      writeback ignore them forever.  That will cause data corruption.  Even
      directio write can't work because it will fail when trying to drop pages
      caches before direct io, as it found the buffer_head for those pages
      still had DIRTY flag set, then it will fall back to buffer io mode.
      
      To make a summary of the issue, as writeback ingores EOF pages, once any
      EOF page is generated, any write to it will only go to the page cache,
      it will never be flushed to disk even file size extends and that page is
      not EOF page any more.  The fix is to avoid zero EOF blocks with buffer
      write.
      
      The following code snippet from qemu-img could trigger the corruption.
      
        656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
        ...
        660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
        660   fallocate(11, 0, 2275868672, 327680) = 0
        658   pwrite64(11, "
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9449ad33
    • J
      ocfs2: fix zero out valid data · f267aeb6
      Junxiao Bi 提交于
      If append-dio feature is enabled, direct-io write and fallocate could
      run in parallel to extend file size, fallocate used "orig_isize" to
      record i_size before taking "ip_alloc_sem", when
      ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already
      extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed
      out.
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
      Fixes: 6bba4471 ("ocfs2: fix data corruption by fallocate")
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f267aeb6
    • M
      lib/test_string.c: move string selftest in the Runtime Testing menu · b2ff70a0
      Matteo Croce 提交于
      STRING_SELFTEST is presented in the "Library routines" menu.  Move it in
      Kernel hacking > Kernel Testing and Coverage > Runtime Testing together
      with other similar tests found in lib/
      
      	--- Runtime Testing
      	<*>   Test functions located in the hexdump module at runtime
      	<*>   Test string functions (NEW)
      	<*>   Test functions located in the string_helpers module at runtime
      	<*>   Test strscpy*() family of functions at runtime
      	<*>   Test kstrto*() family of functions at runtime
      	<*>   Test printf() family of functions at runtime
      	<*>   Test scanf() family of functions at runtime
      
      Link: https://lkml.kernel.org/r/20210719185158.190371-1-mcroce@linux.microsoft.comSigned-off-by: NMatteo Croce <mcroce@microsoft.com>
      Cc: Peter Rosin <peda@axentia.se>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2ff70a0
    • C
      gve: Update MAINTAINERS list · 028a7177
      Catherine Sullivan 提交于
      The team maintaining the gve driver has undergone some changes,
      this updates the MAINTAINERS file accordingly.
      Signed-off-by: NCatherine Sullivan <csully@google.com>
      Signed-off-by: NJon Olson <jonolson@google.com>
      Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
      Signed-off-by: NJeroen de Borst <jeroendb@google.com>
      Link: https://lore.kernel.org/r/20210729155258.442650-1-csully@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      028a7177
  3. 30 7月, 2021 9 次提交