1. 03 Aug, 2021 11 commits
  2. 02 Aug, 2021 12 commits
    • G
      net/ipv4: Replace one-element array with flexible-array member · 2d3e5caf
      Gustavo A. R. Silva committed
      There is a regular need in the kernel to provide a way to declare having
      a dynamically sized set of trailing elements in a structure. Kernel code
      should always use “flexible array members”[1] for these cases. The older
      style of one-element or zero-length arrays should no longer be used[2].
      
      Use an anonymous union with a couple of anonymous structs in order to
      keep userspace unchanged:
      
      $ pahole -C ip_msfilter net/ipv4/ip_sockglue.o
      struct ip_msfilter {
      	union {
      		struct {
      			__be32     imsf_multiaddr_aux;   /*     0     4 */
      			__be32     imsf_interface_aux;   /*     4     4 */
      			__u32      imsf_fmode_aux;       /*     8     4 */
      			__u32      imsf_numsrc_aux;      /*    12     4 */
      			__be32     imsf_slist[1];        /*    16     4 */
      		};                                       /*     0    20 */
      		struct {
      			__be32     imsf_multiaddr;       /*     0     4 */
      			__be32     imsf_interface;       /*     4     4 */
      			__u32      imsf_fmode;           /*     8     4 */
      			__u32      imsf_numsrc;          /*    12     4 */
      			__be32     imsf_slist_flex[0];   /*    16     0 */
      		};                                       /*     0    16 */
      	};                                               /*     0    20 */
      
      	/* size: 20, cachelines: 1, members: 1 */
      	/* last cacheline: 20 bytes */
      };
      
      Also, refactor the code accordingly and make use of the struct_size()
      and flex_array_size() helpers.
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
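
As a rough illustration of the sizing pattern described above, here is a minimal user-space sketch. The names `struct msfilter`, `msfilter_size()`, `msfilter_slist_bytes()`, `msfilter_alloc()` and `msfilter_set_src()` are hypothetical stand-ins and not the kernel's `struct_size()` / `flex_array_size()` macros; the sketch only assumes that such helpers compute "header plus n trailing elements" and refuse to overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of a filter with a trailing source list. */
struct msfilter {
    uint32_t multiaddr;
    uint32_t interface;
    uint32_t fmode;
    uint32_t numsrc;
    uint32_t slist[];   /* flexible array member: adds nothing to sizeof */
};

/* Stand-in for a flex_array_size()-style helper: bytes used by n elements. */
static size_t msfilter_slist_bytes(size_t n)
{
    return n * sizeof(uint32_t);
}

/* Stand-in for a struct_size()-style helper; saturates instead of wrapping. */
static size_t msfilter_size(size_t n)
{
    size_t max_n = (SIZE_MAX - sizeof(struct msfilter)) / sizeof(uint32_t);

    if (n > max_n)
        return SIZE_MAX;
    return sizeof(struct msfilter) + msfilter_slist_bytes(n);
}

/* Allocate a zeroed filter with room for numsrc sources. */
static struct msfilter *msfilter_alloc(uint32_t numsrc)
{
    size_t bytes = msfilter_size(numsrc);
    struct msfilter *f;

    if (bytes == SIZE_MAX)
        return NULL;
    f = calloc(1, bytes);
    if (f)
        f->numsrc = numsrc;
    return f;
}

/* Store one source address; the index must lie inside the allocated list. */
static void msfilter_set_src(struct msfilter *f, uint32_t i, uint32_t addr)
{
    assert(i < f->numsrc);
    f->slist[i] = addr;
}
```

With a flexible array member the header size no longer includes a phantom first element, so there is no "numsrc - 1" arithmetic to get wrong when sizing the allocation.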
      
      [1] https://en.wikipedia.org/wiki/Flexible_array_member
      [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
      
      Link: https://github.com/KSPP/linux/issues/79
      Link: https://github.com/KSPP/linux/issues/109
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d3e5caf
    • V
      net: dsa: remove the struct packet_type argument from dsa_device_ops::rcv() · 29a097b7
      Vladimir Oltean committed
      No tagging driver uses this.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29a097b7
    • K
      nfc: hci: pass callback data param as pointer in nci_request() · 35d7a6f1
      Krzysztof Kozlowski committed
      nci_request() receives a callback function and an unsigned long data
      argument "opt", which is passed to the callback.  Almost all
      nci_request() callers pass a pointer to a stack variable as the data
      argument.  Only a few pass a scalar value (e.g. u8).
      
      None of these callbacks modify the passed data argument, and in the
      previous commit they were made const.  However, passing pointers via
      unsigned long removes the const annotation: the callback could simply
      cast the unsigned long to a pointer to writeable memory.
      
      Use "const void *" as the type of this "opt" argument to solve this and
      prevent modification of the pointed-to contents.  This is also
      consistent with the generic pattern of passing data arguments via
      "void *".  In the few places which pass scalar values, use casts via
      "unsigned long" to suppress any warnings.
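
The calling convention described above can be sketched in plain C as follows. This is a simplified illustration, not the NCI code: `req_fn`, `do_request()`, `struct rf_params`, `set_rf_params()`, `set_mode()` and `request_mode()` are hypothetical names, and the sketch assumes (as the kernel does) that `unsigned long` is as wide as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: the data argument is read-only. */
typedef int (*req_fn)(const void *opt);

/* Hypothetical request helper that forwards "opt" to the callback. */
static int do_request(req_fn fn, const void *opt)
{
    assert(fn != NULL);
    return fn(opt);
}

struct rf_params {
    uint8_t mode;
    uint8_t bitrate;
};

/* Common case: "opt" points at caller-owned data and stays const. */
static int set_rf_params(const void *opt)
{
    const struct rf_params *p = opt;

    return p->mode * 10 + p->bitrate;
}

/* Rare case: a scalar travels inside the pointer value itself. */
static int set_mode(const void *opt)
{
    uint8_t mode = (uint8_t)(unsigned long)opt;

    return mode;
}

/* The cast via unsigned long avoids an int-to-pointer size warning. */
static int request_mode(uint8_t mode)
{
    return do_request(set_mode, (const void *)(unsigned long)mode);
}
```

Because the callback sees `const void *`, writing through the pointer now requires an explicit cast that drops the qualifier, which is easy to spot in review.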
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35d7a6f1
    • C
      cavium: switch from 'pci_' to 'dma_' API · 1e0dd56e
      Christophe JAILLET committed
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below. It has been
      hand modified to use 'dma_set_mask_and_coherent()' instead of
      'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
      
      It has been compile tested.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e0dd56e
    • V
      net: dsa: mt7530: drop paranoid checks in .get_tag_protocol() · 244f8a80
      Vladimir Oltean committed
      It is desirable to reduce the surface of DSA_TAG_PROTO_NONE as much as
      we can, because we now have options for switches without hardware
      support for DSA tagging, and the occurrence in the mt7530 driver is in
      fact quite gratuitous and easy to remove. Since ds->ops->get_tag_protocol()
      is only called for CPU ports, the checks for a CPU port in
      mtk_get_tag_protocol() are redundant and can be removed.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      244f8a80
    • D
      Merge branch 'octeon-drr-config' · a3280efd
      David S. Miller committed
      Sunil Goutham says:
      
      ====================
      cn10k: DWRR MTU and weights configuration
      
      On OcteonTx2, the DWRR quantum is configured directly in each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.
      
      On CN10K the HW is modified: the quantum configuration at the scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum.
      
      This patch series addresses this HW change on CN10K silicon;
      both the admin function and the PF/VF drivers are modified.
      
      It also adds support for programming the DWRR MTU via devlink params.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3280efd
    • S
      octeontx2-pf: cn10k: Config DWRR weight based on MTU · c39830a4
      Sunil Goutham committed
      Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
      weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
      The DWRR MTU from admin function is retrieved via mbox.
      
      On OcteonTx2 silicon, the admin function driver responds with a DWRR
      MTU of '1'. This helps to avoid silicon-specific transmit
      scheduler DWRR quantum/weight configuration logic.
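
To make the arithmetic concrete, here is a small sketch of the 'DWRR MTU * weight' relation and of why a reported MTU of 1 lets one code path serve both silicons. The helper names `dwrr_quantum()` and `dwrr_weight()` are hypothetical, and rounding down is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Quantum the hardware derives from the base DWRR MTU and a queue weight. */
static uint64_t dwrr_quantum(uint32_t dwrr_mtu, uint32_t weight)
{
    return (uint64_t)dwrr_mtu * weight;
}

/*
 * Weight to program for a desired quantum, given the DWRR MTU reported by
 * the admin function. Rounds down (an assumption of this sketch).
 */
static uint32_t dwrr_weight(uint32_t quantum, uint32_t dwrr_mtu)
{
    assert(dwrr_mtu != 0);
    return quantum / dwrr_mtu;
}
```

With a DWRR MTU of 8192, a desired quantum of 16384 maps to weight 2. With a reported MTU of 1, the computed "weight" equals the quantum itself, so the value written is the plain quantum without any silicon check in the caller.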
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c39830a4
    • S
      octeontx2-af: cn10k: DWRR MTU configuration · 76660df2
      Sunil Goutham committed
      On OcteonTx2, the DWRR quantum is configured directly in each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.
      
      On CN10K the HW is modified: the quantum configuration at the scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum. For LBK traffic, the value
      programmed into the NIX_AF_DWRR_RPM_MTU register is used as the
      DWRR MTU.
      
      This patch programs a default DWRR MTU of 8192 into HW and also
      provides a way to change this via devlink params.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76660df2
    • D
      selftests/net: remove min gso test in packet_snd · cfba3fb6
      Dust Li committed
      This patch removes the 'raw gso min size - 1' test, which
      now always fails:
      ./in_netns.sh ./psock_snd -v -c -g -l "${mss}"
        raw gso min size - 1 (expected to fail)
        tx: 1524
        rx: 1472
        OK
      
      After commit 7c6d2ecb ("net: be more gentle about silly
      gso requests coming from user"), we relaxed the min gso_size
      check in virtio_net_hdr_to_skb().
      So when a packet is smaller than the gso_size,
      GSO will not be set for this packet, and the packet will be
      sent and received successfully.
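
The behavioural difference can be modelled with a short sketch. It is a simplified model, not the code in virtio_net_hdr_to_skb(): the enum, the two function names and the exact `>` comparison are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum gso_verdict {
    GSO_REJECT,   /* request fails; the old test expected this */
    GSO_SKIP,     /* packet is accepted but sent without GSO */
    GSO_SEGMENT   /* packet is accepted and marked for GSO */
};

/* Hypothetical strict check: a payload not larger than gso_size is refused. */
static enum gso_verdict gso_check_strict(size_t payload_len, size_t gso_size)
{
    assert(gso_size > 0);
    return payload_len > gso_size ? GSO_SEGMENT : GSO_REJECT;
}

/* Hypothetical relaxed check: the same payload is simply sent without GSO. */
static enum gso_verdict gso_check_relaxed(size_t payload_len, size_t gso_size)
{
    assert(gso_size > 0);
    return payload_len > gso_size ? GSO_SEGMENT : GSO_SKIP;
}
```

Under the relaxed model a "min size - 1" packet no longer produces an error, which is why a test that expects a failure can no longer pass.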
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfba3fb6
    • Y
      bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() · 220ade77
      Yufeng Mo committed
      Some time ago, I reported a calltrace issue
      "did not find a suitable aggregator"; see [1].
      After a period of analysis and reproduction, I found
      that this problem is caused by concurrency.
      
      Before the problem occurs, the bond structure is as follows:
      
      bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                            \
                              port0
            \
              slaver1(eth1) - agg1.lag_ports -> NULL
                            \
                              port1
      
      If we run 'ifenslave bond0 -d eth1', the process is as follows:
      
      executing __bond_release_one()
      |
      bond_upper_dev_unlink()[step1]
      |                       |                       |
      |                       |                       bond_3ad_lacpdu_recv()
      |                       |                       ->bond_3ad_rx_indication()
      |                       |                       spin_lock_bh()
      |                       |                       ->ad_rx_machine()
      |                       |                       ->__record_pdu()[step2]
      |                       |                       spin_unlock_bh()
      |                       |                       |
      |                       bond_3ad_state_machine_handler()
      |                       spin_lock_bh()
      |                       ->ad_port_selection_logic()
      |                       ->try to find free aggregator[step3]
      |                       ->try to find suitable aggregator[step4]
      |                       ->did not find a suitable aggregator[step5]
      |                       spin_unlock_bh()
      |                       |
      |                       |
      bond_3ad_unbind_slave() |
      spin_lock_bh()
      spin_unlock_bh()
      
      step1: slaver1(eth1) has already been removed from the list, but port1 remains
      step2: a lacpdu is received and port0 is updated
      step3: port0 is removed from agg0.lag_ports. The structure is now
             "agg0.lag_ports -> port1", and agg0 is not free. At the
             same time, slaver1/agg1 has been removed from the list by step1,
             so we can't find a free aggregator.
      step4: no suitable aggregator can be found because of step2
      step5: a calltrace is triggered since port->aggregator is NULL
      
      To solve this concurrency problem, call bond_upper_dev_unlink()
      after bond_3ad_unbind_slave(). This way the port is invalidated
      first and is skipped in bond_3ad_state_machine_handler(). This
      eliminates the situation where the slaver has been removed from the
      list but the port is still valid.
      
      [1] https://lore.kernel.org/netdev/10374.1611947473@famine/
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      220ade77
    • C
      net_sched: refactor TC action init API · 695176bf
      Cong Wang committed
      The TC action ->init() API has 10 parameters, which makes it hard
      to read. Some of them are just booleans and can be replaced
      by flags. The same applies to the internal APIs tcf_action_init()
      and tcf_exts_validate().
      
      This patch converts them to flags and folds them into
      the upper 16 bits of "flags", whose lower 16 bits are still
      reserved for user-space. More specifically, the following
      kernel flags are introduced:
      
      TCA_ACT_FLAGS_POLICE replaces 'name' in a few contexts, to
      distinguish whether it is compatible with policer.
      
      TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
      this action is bound to a filter.
      
      TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, and
      means we are replacing an existing action.
      
      TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
      opposite meaning, because we still hold RTNL in most
      cases.
      
      The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
      untouched and still stored as before.
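
The split between user-space and kernel-internal bits might look roughly like the sketch below. The `ACT_FLAGS_*` names deliberately drop the `TCA_` prefix to mark them as illustrative; the exact bit positions and the helper `act_flags_build()` are assumptions, not taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACT_FLAGS_USER_BITS 16
#define ACT_FLAGS_USER_MASK 0xffffu

/* User-space flag, kept in the lower 16 bits. */
#define ACT_FLAGS_NO_PERCPU_STATS (1u << 0)

/* Kernel-internal flags, folded into the upper 16 bits (positions assumed). */
#define ACT_FLAGS_POLICE  (1u << ACT_FLAGS_USER_BITS)
#define ACT_FLAGS_BIND    (1u << (ACT_FLAGS_USER_BITS + 1))
#define ACT_FLAGS_REPLACE (1u << (ACT_FLAGS_USER_BITS + 2))
#define ACT_FLAGS_NO_RTNL (1u << (ACT_FLAGS_USER_BITS + 3))

/* Convert the former boolean parameters into one flags word. */
static uint32_t act_flags_build(uint32_t user_flags, bool bind, bool ovr,
                                bool rtnl_held)
{
    uint32_t flags = user_flags;

    /* User-supplied flags must not spill into the kernel half. */
    assert((user_flags & ~ACT_FLAGS_USER_MASK) == 0);

    if (bind)
        flags |= ACT_FLAGS_BIND;
    if (ovr)
        flags |= ACT_FLAGS_REPLACE;
    if (!rtnl_held)   /* inverted: the flag marks the uncommon case */
        flags |= ACT_FLAGS_NO_RTNL;
    return flags;
}

/* Recover only the bits that came from user-space. */
static uint32_t act_flags_user(uint32_t flags)
{
    return flags & ACT_FLAGS_USER_MASK;
}
```

Encoding the uncommon case (RTNL not held) as the set bit means the default flags word of most callers stays zero in the kernel half.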
      
      I have tested this patch with tdc and I do not see any
      failure related to this patch.
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      695176bf
    • M
      niu: read property length only if we use it · 451395f7
      Martin Kaiser committed
      In three places, the driver calls of_get_property and reads the property
      length although the length is not used. Update the calls to not request
      the length.
      Signed-off-by: Martin Kaiser <martin@kaiser.cx>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      451395f7
  3. 01 Aug, 2021 2 commits
    • J
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d39e8b92
      Jakub Kicinski committed
      Andrii Nakryiko says:
      
      ====================
      bpf-next 2021-07-30
      
      We've added 64 non-merge commits during the last 15 day(s) which contain
      a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
      
      The main changes are:
      
      1) BTF-guided binary data dumping libbpf API, from Alan.
      
      2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
      
      3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
      
      4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
      
      5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
      
      6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
      
      7) BPF map pinning improvements in libbpf, from Martynas.
      
      8) Improved module BTF support in libbpf and bpftool, from Quentin.
      
      9) Bpftool cleanups and documentation improvements, from Quentin.
      
      10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
      
      11) Increased maximum cgroup storage size, from Stanislav.
      
      12) Small fixes and improvements to BPF tests and samples, from various folks.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
        tools: bpftool: Complete metrics list in "bpftool prog profile" doc
        tools: bpftool: Document and add bash completion for -L, -B options
        selftests/bpf: Update bpftool's consistency script for checking options
        tools: bpftool: Update and synchronise option list in doc and help msg
        tools: bpftool: Complete and synchronise attach or map types
        selftests/bpf: Check consistency between bpftool source, doc, completion
        tools: bpftool: Slightly ease bash completion updates
        unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
        libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
        tools: bpftool: Support dumping split BTF by id
        libbpf: Add split BTF support for btf__load_from_kernel_by_id()
        tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
        tools: Free BTF objects at various locations
        libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
        libbpf: Rename btf__load() as btf__load_into_kernel()
        libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
        bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
        tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
        bpf: Increase supported cgroup storage value size
        libbpf: Fix race when pinning maps in parallel
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d39e8b92
    • J
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d2e11fd2
      Jakub Kicinski committed
      Conflicting commits, all resolutions pretty trivial:
      
      drivers/bus/mhi/pci_generic.c
        5c2c8531 ("bus: mhi: pci-generic: configurable network interface MRU")
        56f6f4c4 ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")
      
      drivers/nfc/s3fwrn5/firmware.c
        a0302ff5 ("nfc: s3fwrn5: remove unnecessary label")
        46573e3a ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
        801e541c ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
      
      MAINTAINERS
        7d901a1e ("net: phy: add Maxlinear GPY115/21x/24x driver")
        8a7b46fa ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d2e11fd2
  4. 31 Jul, 2021 15 commits
    • L
      Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c7d10223
      Linus Torvalds committed
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
        (mac80211) and netfilter trees.
      
        Current release - regressions:
      
         - mac80211: fix starting aggregation sessions on mesh interfaces
      
        Current release - new code bugs:
      
         - sctp: send pmtu probe only if packet loss in Search Complete state
      
         - bnxt_en: add missing periodic PHC overflow check
      
         - devlink: fix phys_port_name of virtual port and merge error
      
         - hns3: change the method of obtaining default ptp cycle
      
         - can: mcba_usb_start(): add missing urb->transfer_dma initialization
      
        Previous releases - regressions:
      
         - set true network header for ECN decapsulation
      
         - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
           LRO
      
         - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
           PHY
      
         - sctp: fix return value check in __sctp_rcv_asconf_lookup
      
        Previous releases - always broken:
      
         - bpf:
             - more spectre corner case fixes, introduce a BPF nospec
               instruction for mitigating Spectre v4
             - fix OOB read when printing XDP link fdinfo
             - sockmap: fix cleanup related races
      
         - mac80211: fix enabling 4-address mode on a sta vif after assoc
      
         - can:
             - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
             - j1939: j1939_session_deactivate(): clarify lifetime of session
               object, avoid UAF
             - fix number of identical memory leaks in USB drivers
      
         - tipc:
             - do not blindly write skb_shinfo frags when doing decryption
             - fix sleeping in tipc accept routine"
      
      * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        gve: Update MAINTAINERS list
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
        bpf: Fix leakage due to insufficient speculative store bypass mitigation
        bpf: Introduce BPF nospec instruction for mitigating Spectre v4
        sis900: Fix missing pci_disable_device() in probe and remove
        net: let flow have same hash in two directions
        nfc: nfcsim: fix use after free during module unload
        tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
        sctp: fix return value check in __sctp_rcv_asconf_lookup
        nfc: s3fwrn5: fix undefined parameter values in dev_err()
        net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
        net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
        net/mlx5: Unload device upon firmware fatal error
        net/mlx5e: Fix page allocation failure for ptp-RQ over SF
        net/mlx5e: Fix page allocation failure for trap-RQ over SF
        ...
      c7d10223
    • L
      Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e1dab4c0
      Linus Torvalds committed
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a recent IRQ resources handling modification that turned
        out to be problematic, fix suspend-to-idle handling on AMD platforms
        to take upcoming systems into account properly and fix the retrieval
        of the DPTF attributes of the PCH FIVR.
      
        Specifics:
      
         - Revert recent change of the ACPI IRQ resources handling that
           attempted to improve the ACPI IRQ override selection logic, but
           introduced serious regressions on some systems (Hui Wang).
      
         - Fix up quirks for AMD platforms in the suspend-to-idle support code
           so as to take upcoming systems using uPEP HID AMDI007 into account
           as appropriate (Mario Limonciello).
      
         - Fix the code retrieving DPTF attributes of the PCH FIVR so that it
           agrees on the return data type with the ACPI control method
           evaluated for this purpose (Srinivas Pandruvada)"
      
      * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Fix reading of attributes
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
        ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
      e1dab4c0
    • L
      pipe: make pipe writes always wake up readers · 3a34b13a
      Linus Torvalds committed
      Since commit 1b6b26ae ("pipe: fix and clarify pipe write wakeup
      logic") we have sanitized the pipe write logic, and would only try to
      wake up readers if they needed it.
      
      In particular, if the pipe already had data in it before the write,
      there was no point in trying to wake up a reader, since any existing
      readers must have been aware of the pre-existing data already.  Doing
      extraneous wakeups will only cause potential thundering herd problems.
      
      However, it turns out that some Android libraries have misused the EPOLL
      interface, and expected "edge triggered" to be "any new write will
      trigger it".  Even if there was no edge in sight.
      
      Quoting Sandeep Patil:
       "The commit 1b6b26ae ('pipe: fix and clarify pipe write wakeup
        logic') changed pipe write logic to wakeup readers only if the pipe
        was empty at the time of write. However, there are libraries that
        relied upon the older behavior for notification scheme similar to
        what's described in [1]
      
        One such library 'realm-core'[2] is used by numerous Android
        applications. The library uses a similar notification mechanism as GNU
        Make but it never drains the pipe until it is full. When Android moved
        to v5.10 kernel, all applications using this library stopped working.
      
        The library has since been fixed[3] but it will be a while before all
        applications incorporate the updated library"
      
      Our regression rule for the kernel is that if applications break from
      new behavior, it's a regression, even if it was because the application
      did something patently wrong.  Also note the original report [4] by
      Michael Kerrisk about a test for this epoll behavior - but at that point
      we didn't know of any actual broken use case.
      
      So add the extraneous wakeup, to approximate the old behavior.
      
      [ I say "approximate", because the exact old behavior was to do a wakeup
        not for each write(), but for each pipe buffer chunk that was filled
        in. The behavior introduced by this change is not that - this is just
        "every write will cause a wakeup, whether necessary or not", which
        seems to be sufficient for the broken library use. ]
      
      It's worth noting that this adds the extraneous wakeup only for the
      write side, while the read side still considers the "edge" to be purely
      about reading enough from the pipe to allow further writes.
      
      See commit f467a6a6 ("pipe: fix and clarify pipe read wakeup logic")
      for the pipe read case, which remains that "only wake up if the pipe was
      full, and we read something from it".
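
The difference between the two write-side policies can be shown with a toy model that counts reader wakeups. It is not the pipe code: `enum wake_policy` and `count_write_wakeups()` are hypothetical, and the model assumes a pipe that no reader drains and that never fills up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum wake_policy {
    WAKE_IF_WAS_EMPTY,  /* wake readers only on the empty -> non-empty edge */
    WAKE_ALWAYS         /* every write wakes readers, needed or not */
};

/* Count reader wakeups for n_writes into a pipe that is never drained. */
static size_t count_write_wakeups(enum wake_policy policy, size_t n_writes)
{
    size_t wakeups = 0;
    bool empty = true;

    for (size_t i = 0; i < n_writes; i++) {
        bool was_empty = empty;

        empty = false;  /* the write leaves data in the pipe */
        if (policy == WAKE_ALWAYS || was_empty)
            wakeups++;
    }

    assert(wakeups <= n_writes);
    return wakeups;
}
```

A library that writes repeatedly without draining the pipe and expects a notification per write sees only the first one under the edge-only policy, which matches the breakage described in the quoted report.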
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
      Link: https://github.com/realm/realm-core [2]
      Link: https://github.com/realm/realm-core/issues/4666 [3]
      Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
      Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
      Reported-by: Sandeep Patil <sspatil@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3a34b13a
    • A
      Merge branch 'tools: bpftool: update, synchronise and validate types and options' · ab0720ce
      Andrii Nakryiko committed
      Quentin Monnet says:
      
      ====================
      
      To work with the different program types, map types, attach types etc.
      supported by eBPF, bpftool needs occasional updates to learn about the new
      features supported by the kernel. When such types translate into new
      keywords for the command line, updates are expected in several locations:
      typically, the help message displayed from bpftool itself, the manual page,
      and the bash completion file should be updated. The options used by the
      different commands for bpftool should also remain synchronised at those
      locations.
      
      Several omissions have occurred in the past, and a number of types are
      still missing today. This set is an attempt to improve the situation. It
      brings the lists of types or options in bpftool up to date, and also adds a
      Python script to the BPF selftests to automatically check that most of
      these lists remain synchronised.
      
      v2:
      - Reformat some lines in the bash completion file.
      - Do not reformat attach types, to preserve git-blame history.
      - Do not call Python script from tools/testing/selftests/bpf/Makefile.
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      ab0720ce
    • Q
      tools: bpftool: Complete metrics list in "bpftool prog profile" doc · 475a23c2
      Quentin Monnet committed
      Profiling programs with bpftool was extended some time ago to support
      two new metrics, namely itlb_misses and dtlb_misses (misses for the
      instruction/data translation lookaside buffer). Update the manual page
      and bash completion accordingly.
      
      Fixes: 450d060e ("bpftool: Add {i,d}tlb_misses support for bpftool profile")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-8-quentin@isovalent.com
      475a23c2
    • Q
      tools: bpftool: Document and add bash completion for -L, -B options · 8cc8c635
      Quentin Monnet committed
      The -L|--use-loader option for using loader programs when loading, or
      when generating a skeleton, did not have any documentation or bash
      completion. Same thing goes for -B|--base-btf, used to pass a path to a
      base BTF object for split BTF such as BTF for kernel modules.
      
      This patch documents and adds bash completion for those options.
      
      Fixes: 75fa1777 ("tools/bpftool: Add bpftool support for split BTF")
      Fixes: d510296d ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-7-quentin@isovalent.com
      8cc8c635
    • Q
      selftests/bpf: Update bpftool's consistency script for checking options · da87772f
      Quentin Monnet committed
      Update the script responsible for checking that the different types used
      at various places in bpftool are synchronised, and extend it to check
      the consistency of options between the help messages in the source code
      and the manual pages.
      
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-6-quentin@isovalent.com
      da87772f
    • Q
      tools: bpftool: Update and synchronise option list in doc and help msg · c07ba629
      Quentin Monnet committed
      All bpftool commands support the options for JSON output and debug from
      libbpf. In addition, some commands support additional options
      corresponding to specific use cases.
      
      The lists of options described in the man pages for the different
      commands are not always accurate. The messages for interactive help are
      mostly limited to HELP_SPEC_OPTIONS, and are even less representative of
      the actual set of options supported for the commands.
      
      Let's update the lists:
      
      - HELP_SPEC_OPTIONS is modified to contain the "default" options (JSON
        and debug), and to be extensible (no ending curly bracket).
      - All commands use HELP_SPEC_OPTIONS in their help message, and then
        complete the list with their specific options.
      - The lists of options in the man pages are updated.
      - The formatting of the list for bpftool.rst is adjusted to match
        formatting for the other man pages. This is for consistency, and also
        because it will be helpful in a future patch to automatically check
        that the files are synchronised.
      
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-5-quentin@isovalent.com
      c07ba629
    • Q
      tools: bpftool: Complete and synchronise attach or map types · b544342e
      Quentin Monnet committed
      Update bpftool's list of attach type names to tell it about the latest
      attach types, or the "ringbuf" map. Also update the documentation, help
      messages, and bash completion when relevant.
      
      These missing items were reported by the newly added Python script used
      to help maintain consistency in bpftool.
      
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-4-quentin@isovalent.com
      b544342e
    • Q
      selftests/bpf: Check consistency between bpftool source, doc, completion · a2b5944f
      Quentin Monnet committed
      Whenever the eBPF subsystem gains new elements, such as new program or
      map types, bpftool must be updated if we want it to be able to
      handle the new items.
      
      In addition to the main arrays containing the names of these elements in
      the source code, there are also multiple locations to update:
      
      - The help message in the do_help() functions in bpftool's source code.
      - The RST documentation files.
      - The bash completion file.
      
      This has led to omissions multiple times in the past. This patch
      attempts to address this issue by adding consistency checks for all
      these different locations. It also verifies that the bpf_prog_type,
      bpf_map_type and bpf_attach_type enums from the UAPI BPF header have all
      their members present in bpftool.
      
      The script requires no argument to run. It reads and parses the
      different files to check, and prints the mismatches, if any. It
      currently reports a number of missing elements, which will be fixed in a
      later patch:
      
        $ ./test_bpftool_synctypes.py
        Comparing [...]/linux/tools/bpf/bpftool/map.c (map_type_name) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_MAP_CREATE_TYPES): {'ringbuf'}
        Comparing BPF header (enum bpf_attach_type) and [...]/linux/tools/bpf/bpftool/common.c (attach_type_name): {'BPF_TRACE_ITER', 'BPF_XDP_DEVMAP', 'BPF_XDP', 'BPF_SK_REUSEPORT_SELECT', 'BPF_XDP_CPUMAP', 'BPF_SK_REUSEPORT_SELECT_OR_MIGRATE'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/prog.c (do_help() ATTACH_TYPE): {'skb_verdict'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/Documentation/bpftool-prog.rst (ATTACH_TYPE): {'skb_verdict'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_PROG_ATTACH_TYPES): {'skb_verdict'}
      
      Note that the script does NOT check for consistency between the list of
      program types that bpftool claims it accepts and the actual list of
      keywords that can be used. This is because bpftool does not "see" them:
      they are ELF section names parsed by libbpf. It is not hard to parse the
      section_defs[] array in libbpf, but some section names are associated
      with program types that bpftool cannot load at the moment. For example,
      some programs require a BTF target and an attach target that bpftool
      cannot handle. The script may be extended to parse the array and check
      only relevant values in the future.
      
      The script is not added to the selftests' Makefile, because doing so
      would require all patches with BPF UAPI change to also update bpftool.
      Instead it is to be added to the CI.
      
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-3-quentin@isovalent.com
      a2b5944f
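
The kind of check described in the commit message above can be pictured with a minimal Python sketch. This is not the code of test_bpftool_synctypes.py: the regular expression, the function names, and the sample inputs are all assumptions made for illustration. It extracts a set of names from two text sources and reports the names found in the first but missing from the second.

```python
import re


def names_in_c_array(source: str) -> set:
    """Collect string values from designated initializers such as
    `[BPF_MAP_TYPE_HASH] = "hash",` (hypothetical, simplified format)."""
    return set(re.findall(r'\[\w+\]\s*=\s*"([^"]+)"', source))


def names_in_word_list(text: str) -> set:
    """Collect whitespace-separated names, e.g. from a bash variable value."""
    return set(text.split())


def missing_names(reference: set, other: set) -> set:
    """Names the reference knows about that the other location lacks."""
    return reference - other


# Made-up sample inputs standing in for a C source file and a completion file.
C_SOURCE = '''
    [BPF_MAP_TYPE_HASH]    = "hash",
    [BPF_MAP_TYPE_ARRAY]   = "array",
    [BPF_MAP_TYPE_RINGBUF] = "ringbuf",
'''
COMPLETION_TYPES = "hash array"

diff = missing_names(names_in_c_array(C_SOURCE),
                     names_in_word_list(COMPLETION_TYPES))
print(f"Comparing source and completion: {diff}")
```

Using plain set difference keeps each comparison one-directional, which matches the "Comparing A and B: {...}" style of report quoted in the commit message.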
    • Q
      tools: bpftool: Slightly ease bash completion updates · 510b4d4c
      Quentin Monnet committed
      Bash completion for bpftool gets two minor improvements in this patch.
      
      Move the detection of attach types for "bpftool cgroup attach" outside
      of the "case/esac" block, where we cannot reuse our variable holding the
      list of supported attach types as a pattern list. After the change, we
      have only one list of cgroup attach types to update when new types are
      added, instead of the former two lists.
      
      Also rename the variables holding lists of names for program types, map
      types, and attach types, to make them more distinctive. This can make it
      slightly easier to point people to the relevant variables to update, but
      the main objective here is to help run a script to check that bash
      completion is up-to-date with bpftool's source code.
      
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-2-quentin@isovalent.com
      510b4d4c
    • J
      Merge branch 'clean-devlink-net-namespace-operations' · aae950b1
      Jakub Kicinski committed
      Leon Romanovsky says:
      
      ====================
      Clean devlink net namespace operations
      
      This short series continues my work on devlink core code to make devlink
      reload less prone to errors and harden it from API abuse.
      
      Although the first patch is a clear fix, I would ask you to apply it to
      net-next anyway, because the patch it fixes is old and applying it there
      will help us avoid the merge conflicts that would otherwise arise for the
      following patches, or even for the second one.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1627578998.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aae950b1
    • L
      devlink: Allocate devlink directly in requested net namespace · 26713455
      Leon Romanovsky committed
      There is no need for the extra call indirection, or for a check against
      the impossible flow where someone tries to set the namespace without a
      prior call to devlink_alloc().
      
      Instead of this extra logic and an additional EXPORT_SYMBOL, use a
      specialized devlink allocation function that receives the net namespace
      as an argument.
      
      Such a specialized API makes it clear when devlink is initialized in the
      wrong net namespace, and keeps kernel users from changing the devlink
      namespace under the hood.
      
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      26713455
    • L
      devlink: Break parameter notification sequence to be before/after unload/load driver · 05a7f4a8
      Leon Romanovsky committed
      Changing namespaces during devlink reload unloads the driver before
      devlink parameters are accessed. The commands below trigger a
      use-after-free bug when devlink tries to get the flow steering mode.
      
       * ip netns add n1
       * devlink dev reload pci/0000:00:09.0 netns n1
      
       ==================================================================
       BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
       Read of size 4 at addr ffff888009d04308 by task devlink/275
      
       CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x93/0xc2
        print_address_description.constprop.0+0x18/0x140
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        kasan_report.cold+0x7c/0xd8
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        devlink_nl_param_fill+0x1c8/0xe80
        ? __free_pages_ok+0x37a/0x8a0
        ? devlink_flash_update_timeout_notify+0xd0/0xd0
        ? lock_acquire+0x1a9/0x6d0
        ? fs_reclaim_acquire+0xb7/0x160
        ? lock_is_held_type+0x98/0x110
        ? 0xffffffff81000000
        ? lock_release+0x1f9/0x6c0
        ? fs_reclaim_release+0xa1/0xf0
        ? lock_downgrade+0x6d0/0x6d0
        ? lock_is_held_type+0x98/0x110
        ? lock_is_held_type+0x98/0x110
        ? memset+0x20/0x40
        ? __build_skb_around+0x1f8/0x2b0
        devlink_param_notify+0x6d/0x180
        devlink_reload+0x1c3/0x520
        ? devlink_remote_reload_actions_performed+0x30/0x30
        ? mutex_trylock+0x24b/0x2d0
        ? devlink_nl_cmd_reload+0x62b/0x1070
        devlink_nl_cmd_reload+0x66d/0x1070
        ? devlink_reload+0x520/0x520
        ? devlink_get_from_attrs+0x1bc/0x260
        ? devlink_nl_pre_doit+0x64/0x4d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        ? mutex_lock_io_nested+0x1130/0x1130
        ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
        ? security_capable+0x51/0x90
        genl_rcv_msg+0x27f/0x4a0
        ? genl_get_cmd+0x3c0/0x3c0
        ? lock_acquire+0x1a9/0x6d0
        ? devlink_reload+0x520/0x520
        ? lock_release+0x6c0/0x6c0
        netlink_rcv_skb+0x11d/0x340
        ? genl_get_cmd+0x3c0/0x3c0
        ? netlink_ack+0x9f0/0x9f0
        ? lock_release+0x1f9/0x6c0
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        ? netlink_attachskb+0x730/0x730
        ? _copy_from_iter_full+0x178/0x650
        ? __alloc_skb+0x113/0x2b0
        netlink_sendmsg+0x6f1/0xbd0
        ? netlink_unicast+0x700/0x700
        ? lock_is_held_type+0x98/0x110
        ? netlink_unicast+0x700/0x700
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        ? __x64_sys_getpeername+0xb0/0xb0
        ? do_sys_openat2+0x10b/0x370
        ? __up_read+0x1a1/0x7b0
        ? do_user_addr_fault+0x219/0xdc0
        ? __x64_sys_openat+0x120/0x1d0
        ? __x64_sys_open+0x1a0/0x1a0
        __x64_sys_sendto+0xdd/0x1b0
        ? syscall_enter_from_user_mode+0x1d/0x50
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fc69d0af14a
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
       RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
       RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
       RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
       Allocated by task 146:
        kasan_save_stack+0x1b/0x40
        __kasan_kmalloc+0x99/0xc0
        mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
        mlx5_load+0xd2/0x180 [mlx5_core]
        mlx5_init_one+0x2f6/0x450 [mlx5_core]
        probe_one+0x47d/0x6e0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x20a/0xc90
        driver_probe_device+0xd8/0x380
        device_driver_attach+0x1df/0x250
        __driver_attach+0xff/0x240
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x1ee/0x380
        0xffffffffa06b8062
        do_one_initcall+0xd5/0x410
        do_init_module+0x1c8/0x760
        load_module+0x6d8b/0x9650
        __do_sys_finit_module+0x118/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       Freed by task 275:
        kasan_save_stack+0x1b/0x40
        kasan_set_track+0x1c/0x30
        kasan_set_free_info+0x20/0x30
        __kasan_slab_free+0x102/0x140
        slab_free_freelist_hook+0x74/0x1b0
        kfree+0xd7/0x2a0
        mlx5_unload+0x16/0xb0 [mlx5_core]
        mlx5_unload_one+0xae/0x120 [mlx5_core]
        mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
        devlink_reload+0x141/0x520
        devlink_nl_cmd_reload+0x66d/0x1070
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11d/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        __x64_sys_sendto+0xdd/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       The buggy address belongs to the object at ffff888009d04300
        which belongs to the cache kmalloc-128 of size 128
       The buggy address is located 8 bytes inside of
        128-byte region [ffff888009d04300, ffff888009d04380)
       The buggy address belongs to the page:
       page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
       head:0000000086a64ecc order:1 compound_mapcount:0
       flags: 0x4000000000010200(slab|head)
       raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
       raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
        ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The right solution to devlink reload is to notify about deletion of
      parameters, unload driver, change net namespaces, load driver and notify
      about addition of parameters.
      
      Fixes: 070c63f2 ("net: devlink: allow to change namespaces during reload")
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      05a7f4a8
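
The ordering described in the commit message above (notify deletion, unload, change namespace, load, notify addition) can be modelled with a small Python sketch. It is a toy model, not kernel code: the `Driver` class, the `notify_params` and `reload` helpers, the event names, and the parameter value are all hypothetical. The point it illustrates is only that a notification has to read driver state, so it must happen while the driver is loaded.

```python
class Driver:
    """Toy stand-in for a driver whose parameter state exists only while loaded."""

    def __init__(self):
        self.state = None

    def load(self):
        # Placeholder value; the real parameter values are not modelled here.
        self.state = {"flow_steering_mode": "example-mode"}

    def unload(self):
        self.state = None  # models the memory being freed on unload

    def get_param(self, name):
        if self.state is None:
            raise RuntimeError("parameter read after unload")
        return self.state[name]


def notify_params(driver, event, log):
    # A notification reads the current parameter value from the driver.
    log.append((event, driver.get_param("flow_steering_mode")))


def reload(driver, new_netns, log):
    notify_params(driver, "param_del", log)  # 1. notify deletion while loaded
    driver.unload()                          # 2. unload driver
    netns = new_netns                        # 3. change net namespace
    driver.load()                            # 4. load driver
    notify_params(driver, "param_new", log)  # 5. notify addition
    return netns


drv = Driver()
drv.load()
events = []
current_netns = reload(drv, "n1", events)
print(current_netns, [event for event, _ in events])
```

In this model, calling `notify_params()` between steps 2 and 4 raises an error, which is the toy equivalent of the use-after-free in the KASAN report.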
    • C
      unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg() · 0b846445
      Cong Wang committed
      As Eric noticed, __unix_dgram_recvmsg() may acquire u->iolock
      too, so we have to release it before calling this function.
      
      Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      0b846445
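
The locking problem described in the commit message above can be illustrated with a short Python sketch. It is an analogy in user space, not the kernel code: `iolock`, `inner_recvmsg`, and the two caller functions are hypothetical names, and a timeout is used so that the buggy path reports the problem instead of hanging. A non-reentrant lock held by the caller cannot be taken again by the callee, so the caller has to release it first.

```python
import threading

# Non-reentrant lock, standing in for the lock named in the commit message.
iolock = threading.Lock()


def inner_recvmsg():
    # The callee takes the lock itself. The timeout only exists so this
    # sketch can report a would-be deadlock rather than block forever.
    if not iolock.acquire(timeout=0.1):
        return "would deadlock"
    try:
        return "received"
    finally:
        iolock.release()


def bpf_recvmsg_buggy():
    with iolock:
        # Lock still held here: the callee cannot acquire it.
        return inner_recvmsg()


def bpf_recvmsg_fixed():
    iolock.acquire()
    # ... work that needs the lock would go here ...
    iolock.release()  # release before calling the function that locks again
    return inner_recvmsg()


print(bpf_recvmsg_buggy())
print(bpf_recvmsg_fixed())
```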