1. 30 Nov, 2022 1 commit
    • KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      David Woodhouse committed
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
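The flag gating described in this commit can be modelled in a few lines. This is an illustrative sketch rather than kernel code: the function names and the `flag_enabled` parameter are hypothetical, and the only detail taken from the message is that the marker is the top bit of `state_entry_time`.

```python
# Illustrative model only; not the KVM implementation.
XEN_RUNSTATE_UPDATE = 1 << 63  # top bit of state_entry_time


def begin_update(state_entry_time: int, flag_enabled: bool) -> int:
    """Value the guest would see while its runstate area is being rewritten.

    flag_enabled is a hypothetical stand-in for "the guest opted in".
    """
    if flag_enabled:
        return state_entry_time | XEN_RUNSTATE_UPDATE
    # A guest that never asked for the flag must not see the top bit set.
    return state_entry_time


def end_update(state_entry_time: int) -> int:
    """Clear the marker once the update is complete."""
    return state_entry_time & ~XEN_RUNSTATE_UPDATE


entry_time = 123_456_789
print(hex(begin_update(entry_time, flag_enabled=True)))
print(hex(begin_update(entry_time, flag_enabled=False)))
```

A guest that did not opt in always reads an unmodified timestamp, which is why its accumulated runtimes keep adding up.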
  2. 23 Nov, 2022 2 commits
    • KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE · 8c516b25
      Claudio Imbrenda committed
      Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
      KVM_PV_ASYNC_CLEANUP_PREPARE and KVM_PV_ASYNC_CLEANUP_PERFORM commands
      for the KVM_S390_PV_COMMAND ioctl are available.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    • KVM: s390: pv: asynchronous destroy for reboot · fb491d55
      Claudio Imbrenda committed
      Until now, destroying a protected guest was an entirely synchronous
      operation that could potentially take a very long time, depending on
      the size of the guest, due to the time needed to clean up the address
      space from protected pages.
      
      This patch implements an asynchronous destroy mechanism that allows a
      protected guest to reboot significantly faster than before.

      This is achieved by clearing the pages of the old guest in the
      background. In case of reboot, the new guest will be able to run in the
      same address space almost immediately.

      The old protected guest is then only destroyed when all of its memory
      has been destroyed or otherwise made non-protected.
      
      Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
      
      KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for
      later asynchronous teardown. The current KVM VM will then continue
      immediately as non-protected. If a protected VM had already been
      set aside for asynchronous teardown, but without starting the teardown
      process, this call will fail. There can be at most one VM set aside at
      any time. Once it is set aside, the protected VM only exists in the
      context of the Ultravisor, it is not associated with the KVM VM
      anymore. Its protected CPUs have already been destroyed, but not its
      memory. This command can be issued again immediately after starting
      KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion.
      
      KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
      set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the
      KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace
      from a separate thread. If a fatal signal is received (or if the
      process terminates naturally), the command will terminate immediately
      without completing. All protected VMs whose teardown was interrupted
      will be put in the need_cleanup list. The rest of the normal KVM
      teardown process will take care of properly cleaning up all remaining
      protected VMs, including the ones on the need_cleanup list.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
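The PREPARE/PERFORM rules in this commit message read like a small state machine. The sketch below models only those rules (at most one VM set aside, PREPARE failing while one is waiting, PREPARE allowed again once PERFORM has started). The class, method names, and integer handles are hypothetical and do not reflect the kernel data structures.

```python
class AsyncTeardownModel:
    """Toy model of the set-aside rules; not the s390 KVM implementation."""

    def __init__(self) -> None:
        self.current = 1       # handle of the live protected VM, or None
        self.set_aside = None  # at most one VM waiting for teardown
        self.torn_down = []    # handles whose teardown has been started
        self._next_handle = 2

    def prepare(self) -> bool:
        """Model of KVM_PV_ASYNC_CLEANUP_PREPARE."""
        if self.current is None or self.set_aside is not None:
            return False  # nothing to set aside, or one is already waiting
        self.set_aside, self.current = self.current, None
        return True       # the KVM VM continues as non-protected

    def perform(self) -> bool:
        """Model of KVM_PV_ASYNC_CLEANUP_PERFORM."""
        if self.set_aside is None:
            return False
        handle, self.set_aside = self.set_aside, None
        self.torn_down.append(handle)
        return True

    def become_protected(self) -> None:
        """The rebooted guest enters protected mode again."""
        self.current = self._next_handle
        self._next_handle += 1
```

In this model a second `prepare()` fails until `perform()` has taken over the VM that was set aside, mirroring the "at most one VM set aside" constraint.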
  3. 10 Nov, 2022 1 commit
  4. 27 Oct, 2022 1 commit
  5. 25 Oct, 2022 1 commit
  6. 08 Oct, 2022 1 commit
    • vDPA: allow userspace to query features of a vDPA device · 22856510
      Zhu Lingshan committed
      This commit adds a new vDPA netlink attribute,
      VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
      the features of vDPA devices through this new attribute.

      This commit invokes vdpa_config_ops.get_config()
      rather than vdpa_get_config_unlocked() to read
      the device config space, so there are no races in
      vdpa_set_features_unlocked().
      
      Userspace tool iproute2 example:
      $ vdpa dev config show vdpa0
      vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500
        negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM
        dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
      Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
      Message-Id: <20220929014555.112323-2-lingshan.zhu@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
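The `negotiated_features` and `dev_features` lists in the iproute2 output are decoded 64-bit feature masks. A minimal sketch of that decoding follows. The bit positions are the ones I understand the virtio specification to assign to these names and should be checked against it; the helper names are hypothetical.

```python
# Bit positions assumed from the virtio specification (verify before relying on them).
FEATURE_BITS = {
    3: "MTU", 5: "MAC", 15: "MRG_RXBUF", 17: "CTRL_VQ", 22: "MQ",
    27: "ANY_LAYOUT", 32: "VERSION_1", 33: "ACCESS_PLATFORM",
}
NAME_TO_BIT = {name: bit for bit, name in FEATURE_BITS.items()}


def encode_features(names):
    """Build a feature mask from feature names."""
    mask = 0
    for name in names:
        mask |= 1 << NAME_TO_BIT[name]
    return mask


def decode_features(mask):
    """List the known feature names set in a mask, lowest bit first."""
    return [name for bit, name in sorted(FEATURE_BITS.items()) if (mask >> bit) & 1]


dev = encode_features(["MTU", "MAC", "MRG_RXBUF", "CTRL_VQ", "MQ",
                       "ANY_LAYOUT", "VERSION_1", "ACCESS_PLATFORM"])
negotiated = encode_features(["MRG_RXBUF", "CTRL_VQ", "MQ",
                              "VERSION_1", "ACCESS_PLATFORM"])
print(decode_features(negotiated))
print("subset of device features:", negotiated & ~dev == 0)
```

The negotiated set in the example output is a subset of the device set, which is what the final check expresses.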
  7. 07 Oct, 2022 2 commits
    • virtio_blk: add SECURE ERASE command support · e60d6407
      Alvaro Karsz committed
      Support for the VIRTIO_BLK_F_SECURE_ERASE VirtIO feature.
      
      A device that offers this feature can receive VIRTIO_BLK_T_SECURE_ERASE
      commands.
      
      A device which supports this feature has the following fields in the
      virtio config:
      
      - max_secure_erase_sectors
      - max_secure_erase_seg
      - secure_erase_sector_alignment
      
      max_secure_erase_sectors and secure_erase_sector_alignment are expressed
      in 512-byte units.
      
      Every secure erase command has the following fields:
      
      - sectors: The starting offset in 512-byte units.
      - num_sectors: The number of sectors.
      Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
      Message-Id: <20220921082729.2516779-1-alvaro.karsz@solid-run.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
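Because every quantity above is expressed in 512-byte units, a byte range has to be converted and split before it fits the device limits. The sketch below shows one way to do that. The function name and the splitting policy are assumptions for illustration, not the driver's logic, and the `max_secure_erase_seg` limit on segments per command is not modelled.

```python
SECTOR_SIZE = 512  # all fields are expressed in 512-byte units


def secure_erase_segments(start_byte, length_bytes, max_sectors, alignment):
    """Split a byte range into (sector, num_sectors) pairs within the limits."""
    if start_byte % SECTOR_SIZE or length_bytes % SECTOR_SIZE:
        raise ValueError("range is not a whole number of sectors")
    sector = start_byte // SECTOR_SIZE
    remaining = length_bytes // SECTOR_SIZE
    if sector % alignment or remaining % alignment:
        raise ValueError("range violates secure_erase_sector_alignment")
    # Largest aligned chunk that still respects max_secure_erase_sectors.
    chunk = (max_sectors // alignment) * alignment
    if chunk == 0:
        raise ValueError("max_sectors is smaller than the alignment")
    segments = []
    while remaining:
        count = min(chunk, remaining)
        segments.append((sector, count))
        sector += count
        remaining -= count
    return segments


# Erase 3 MiB starting at offset 1 MiB with hypothetical device limits.
print(secure_erase_segments(1 << 20, 3 << 20, max_sectors=4096, alignment=8))
```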
    • vdpa: device feature provisioning · 90fea5a8
      Jason Wang committed
      This patch allows the device features to be provisioned through
      netlink. A new attribute is introduced to allow userspace to pass
      64-bit device features when adding a device.

      This provides several advantages:

      - It allows provisioning a subset of the features to ease cross-vendor
        live migration.
      - It improves debuggability of the vDPA framework and parent.
      Reviewed-by: Eli Cohen <elic@nvidia.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20220927074810.28627-2-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
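The cross-vendor migration case amounts to provisioning the intersection of what each parent device supports. The following sketch illustrates that idea with plain bit masks. Both function names are hypothetical, and rejecting unsupported bits is an assumption about sensible behaviour rather than something the commit message states.

```python
from functools import reduce


def common_features(*vendor_features: int) -> int:
    """Feature bits that every listed parent device supports."""
    return reduce(lambda a, b: a & b, vendor_features)


def provision(requested: int, parent_supported: int) -> int:
    """Accept a requested 64-bit feature mask only if the parent supports all of it."""
    unsupported = requested & ~parent_supported
    if unsupported:
        raise ValueError(f"unsupported feature bits: {unsupported:#x}")
    return requested


vendor_a = 0b1011  # made-up masks for two parent devices
vendor_b = 0b0111
subset = common_features(vendor_a, vendor_b)
print(bin(provision(subset, vendor_a)), bin(provision(subset, vendor_b)))
```

Provisioning the same subset on both sides gives the guest an identical feature set before and after migration.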
  8. 04 Oct, 2022 1 commit
    • ethtool: add interface to interact with Ethernet Power Equipment · 18ff0bcd
      Oleksij Rempel committed
      Add an interface to support Power Sourcing Equipment. At the current
      step it provides a generic way to address all variants of PSE devices as
      defined in IEEE 802.3-2018, but supports only the objects specified for
      IEEE 802.3-2018 104.4 PoDL Power Sourcing Equipment (PSE).
      
      Currently supported and mandatory objects are:
      IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
      IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
      IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl
      
      This is the minimal interface needed to control PSE on each separate
      Ethernet port, but it does not provide all mandatory objects specified
      in IEEE 802.3-2018.

      Since "PoDL PSE" and "PSE" have similar names but some different values,
      I decided not to merge them and to keep separate naming schemes. This
      should allow us to stay as close to the IEEE 802.3 spec as possible and
      avoid name conflicts in the future.

      This implementation is connected to PHYs instead of MACs because PSE
      auto classification can potentially interfere with PHY auto negotiation,
      so some extra PHY-related initialization may be needed.

      With a WIP version of ethtool, interaction with a PSE-capable link looks
      as follows:
      
      $ ip l
      ...
      5: t1l1@eth0: <BROADCAST,MULTICAST> ..
      ...
      
      $ ethtool --show-pse t1l1
      PSE attributs for t1l1:
      PoDL PSE Admin State: disabled
      PoDL PSE Power Detection Status: disabled
      
      $ ethtool --set-pse t1l1 podl-pse-admin-control enable
      $ ethtool --show-pse t1l1
      PSE attributs for t1l1:
      PoDL PSE Admin State: enabled
      PoDL PSE Power Detection Status: delivering power
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 30 Sep, 2022 6 commits
  10. 29 Sep, 2022 7 commits
  11. 28 Sep, 2022 1 commit
    • net: tls: Add ARIA-GCM algorithm · 62e56ef5
      Taehee Yoo committed
      RFC 6209 describes ARIA for TLS 1.2.
      ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209.
      
      This patch would offer a performance improvement and an opportunity for
      hardware offload.

      Benchmark results:
      iperf-ssl was used.
      CPU: Intel i3-12100.
      
        TLS(openssl-3.0-dev)
      [  3]  0.0- 1.0 sec   185 MBytes  1.55 Gbits/sec
      [  3]  1.0- 2.0 sec   186 MBytes  1.56 Gbits/sec
      [  3]  2.0- 3.0 sec   186 MBytes  1.56 Gbits/sec
      [  3]  3.0- 4.0 sec   186 MBytes  1.56 Gbits/sec
      [  3]  4.0- 5.0 sec   186 MBytes  1.56 Gbits/sec
      [  3]  0.0- 5.0 sec   927 MBytes  1.56 Gbits/sec
        kTLS(aria-generic)
      [  3]  0.0- 1.0 sec   198 MBytes  1.66 Gbits/sec
      [  3]  1.0- 2.0 sec   194 MBytes  1.62 Gbits/sec
      [  3]  2.0- 3.0 sec   194 MBytes  1.63 Gbits/sec
      [  3]  3.0- 4.0 sec   194 MBytes  1.63 Gbits/sec
      [  3]  4.0- 5.0 sec   194 MBytes  1.62 Gbits/sec
      [  3]  0.0- 5.0 sec   974 MBytes  1.63 Gbits/sec
        kTLS(aria-avx with GFNI)
      [  3]  0.0- 1.0 sec   632 MBytes  5.30 Gbits/sec
      [  3]  1.0- 2.0 sec   657 MBytes  5.51 Gbits/sec
      [  3]  2.0- 3.0 sec   657 MBytes  5.51 Gbits/sec
      [  3]  3.0- 4.0 sec   656 MBytes  5.50 Gbits/sec
      [  3]  4.0- 5.0 sec   656 MBytes  5.50 Gbits/sec
      [  3]  0.0- 5.0 sec  3.18 GBytes  5.47 Gbits/sec
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Link: https://lore.kernel.org/r/20220925150033.24615-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
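To put the benchmark in relative terms, the 5-second averages reported above can be divided by the user-space TLS figure. The snippet only restates the numbers from the commit message; the dictionary labels are shortened descriptions of the three runs.

```python
# 0.0-5.0 sec averages copied from the benchmark output above, in Gbits/sec.
throughput_gbits = {
    "TLS (openssl-3.0-dev)": 1.56,
    "kTLS (aria-generic)": 1.63,
    "kTLS (aria-avx with GFNI)": 5.47,
}

baseline = throughput_gbits["TLS (openssl-3.0-dev)"]
speedup = {name: rate / baseline for name, rate in throughput_gbits.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the generic kTLS path is only a few percent ahead of user-space TLS, while the AVX/GFNI path reaches roughly 3.5 times the baseline.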
  12. 27 Sep, 2022 3 commits
  13. 26 Sep, 2022 2 commits
    • btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion · e71564c0
      Qu Wenruo committed
      Currently we only have 3 qgroup flags:
      
      - BTRFS_QGROUP_STATUS_FLAG_ON
      - BTRFS_QGROUP_STATUS_FLAG_RESCAN
      - BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT
      
      These flags match the on-disk flags used in btrfs_qgroup_status.
      
      But we're going to introduce extra runtime flags which will not reach
      disk.

      So here we introduce a new mask, BTRFS_QGROUP_STATUS_FLAGS_MASK, to
      make sure only the on-disk flags can reach disk.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
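The effect of the mask can be shown with plain bit operations. In this sketch the three on-disk flag values are assumed to occupy bits 0 to 2, the runtime-only flag is purely hypothetical, and the shortened names are not the kernel identifiers.

```python
# Assumed bit positions for the three on-disk flags named above.
FLAG_ON = 1 << 0
FLAG_RESCAN = 1 << 1
FLAG_INCONSISTENT = 1 << 2

# Mask covering exactly the flags that are allowed to reach disk.
STATUS_FLAGS_MASK = FLAG_ON | FLAG_RESCAN | FLAG_INCONSISTENT

# Hypothetical runtime-only flag, placed in a high bit for illustration.
RUNTIME_ONLY_FLAG = 1 << 63


def flags_for_disk(runtime_flags: int) -> int:
    """Strip runtime-only bits before the status item is written out."""
    return runtime_flags & STATUS_FLAGS_MASK


in_memory = FLAG_ON | FLAG_RESCAN | RUNTIME_ONLY_FLAG
print(bin(flags_for_disk(in_memory)))
```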
    • btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2 · 1c56ab99
      Qu Wenruo committed
      The problem of long mount times caused by the block group item search
      has been known for some time, and the block group tree has been proposed
      as a solution.

      There is really no need to bind this feature to extent tree v2; just
      introduce a compat RO flag, BLOCK_GROUP_TREE, to solve the problem.
      
      All the code handling block group root is already in the upstream
      kernel, thus this patch really only needs to introduce the new compat RO
      flag.
      
      This patch introduces one extra artificial limitation on the block group
      tree feature: free space cache v2 and the no-holes feature must be
      enabled to use this new compat RO feature.

      This artificial requirement mostly serves to reduce the number of test
      combinations, and can be a guideline for future features to mostly rely
      on the latest default features.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
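The dependency rule in this commit message is a simple subset check. The sketch below expresses it with string labels. The labels and the function name are hypothetical and are not the kernel's flag identifiers.

```python
# Hypothetical labels for the prerequisites named in the commit message.
REQUIRED_FOR_BLOCK_GROUP_TREE = {"free-space-cache-v2", "no-holes"}


def can_enable_block_group_tree(enabled_features: set) -> bool:
    """True when both prerequisite features are already enabled."""
    return REQUIRED_FOR_BLOCK_GROUP_TREE <= enabled_features


print(can_enable_block_group_tree({"free-space-cache-v2", "no-holes"}))
print(can_enable_block_group_tree({"no-holes"}))
```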
  14. 24 Sep, 2022 7 commits
  15. 23 Sep, 2022 2 commits
    • tun: support not enabling carrier in TUNSETIFF · 195624d9
      Patrick Rohr committed
      This change adds support for not enabling carrier during TUNSETIFF
      interface creation by specifying the IFF_NO_CARRIER flag.
      
      Our tests make heavy use of tun interfaces. In some scenarios, the test
      process creates the interface but another process brings it up after the
      interface is discovered via netlink notification. In that case, it is
      not possible to create a tun/tap interface with carrier off without it
      racing against the bring up. Immediately setting carrier off via
      TUNSETCARRIER is still too late.
      Signed-off-by: Patrick Rohr <prohr@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
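From userspace, the new flag is simply OR-ed into the flags passed with TUNSETIFF. The sketch below only builds the request buffer and does not open `/dev/net/tun`, since that needs privileges. The numeric constants are what I believe `linux/if_tun.h` defines and should be verified against the header, the 40-byte `struct ifreq` size assumes a 64-bit Linux ABI, and the helper name is hypothetical.

```python
import struct

# Assumed values from linux/if_tun.h; verify against your kernel headers.
IFF_TUN = 0x0001
IFF_NO_CARRIER = 0x0040
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_tunsetiff_request(name: str, flags: int) -> bytes:
    """Pack a struct ifreq: 16-byte name, 16-bit flags, padding to 40 bytes."""
    return struct.pack(f"{IFNAMSIZ}sH22x", name.encode(), flags)


request = build_tunsetiff_request("tun0", IFF_TUN | IFF_NO_PI | IFF_NO_CARRIER)
print(len(request), hex(struct.unpack_from("H", request, IFNAMSIZ)[0]))

# With sufficient privileges the buffer would be handed to the driver roughly as
#   fcntl.ioctl(tun_fd, TUNSETIFF, request)
# so the interface is created with carrier off and cannot race with another
# process bringing it up.
```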
    • net: phy: Add support for rate matching · 0c3e10cb
      Sean Anderson committed
      This adds support for rate matching (also known as rate adaptation) to
      the phy subsystem. The general idea is that the phy interface runs at
      one speed, and the MAC throttles the rate at which it sends packets to
      the link speed. There's a good overview of several techniques for
      achieving this at [1]. This patch adds support for three: pause-frame
      based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and
      2BASE-TL), and open-loop-based (such as in 10GBASE-W).
      
      This patch makes a few assumptions and a few non-assumptions about the
      types of rate matching available. First, it assumes that different phys
      may use different forms of rate matching. Second, it assumes that phys
      can use rate matching for any of their supported link speeds (e.g. if a
      phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
      Third, it does not assume that all interface modes will use the same
      form of rate matching. Fourth, it does not assume that all phy devices
      will support rate matching (even if some do). Relaxing or strengthening
      these (non-)assumptions could result in a different API. For example, if
      all interface modes were assumed to use the same form of rate matching,
      then a bitmask of interface modes supporting rate matching would
      suffice.
      
      For some better visibility into the process, the current rate matching
      mode is exposed as part of the ethtool ksettings. For the moment, only
      read access is supported. I'm not sure what userspace might want to
      configure yet (disable it altogether, disable just one mode, specify the
      mode to use, etc.). For the moment, since only pause-based rate
      adaptation support is added in the next few commits, rate matching can
      be disabled altogether by adjusting the advertisement.
      
      802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
      "rate matching" in clause 61 (10PASS-TS and 2BASE-TL). Aquantia also calls
      this feature "rate adaptation". I chose "rate matching" because it is
      shorter, and because Russell doesn't think "adaptation" is correct in this
      context.
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
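The general idea, an interface running faster than the link with the MAC throttled to the link speed, can be made concrete with a little arithmetic. The sketch below is a toy model: the enum members mirror the three techniques named in the message, but the names, the function, and the idea of a single "fraction of time" figure are illustrative assumptions, not the phy subsystem API.

```python
from enum import Enum


class RateMatch(Enum):
    """Hypothetical labels for the techniques listed in the commit message."""
    NONE = "none"
    PAUSE = "pause-frame based"
    CRS = "CRS based"
    OPEN_LOOP = "open-loop based"


def mac_transmit_fraction(interface_mbps: int, link_mbps: int, mode: RateMatch) -> float:
    """Fraction of time the MAC may transmit so it does not outrun the link."""
    if mode is RateMatch.NONE:
        if interface_mbps != link_mbps:
            raise ValueError("without rate matching the speeds must be equal")
        return 1.0
    return min(1.0, link_mbps / interface_mbps)


# A 10 Gbit/s host interface feeding a 10 Mbit/s link with pause-based matching.
print(mac_transmit_fraction(10_000, 10, RateMatch.PAUSE))
```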
  16. 22 Sep, 2022 2 commits
    • bpf: Add bpf_user_ringbuf_drain() helper · 20571567
      David Vernet committed
      In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
      will allow user-space applications to publish messages to a ring buffer
      that is consumed by a BPF program in kernel-space. In order for this
      map-type to be useful, it will require a BPF helper function that BPF
      programs can invoke to drain samples from the ring buffer, and invoke
      callbacks on those samples. This change adds that capability via a new BPF
      helper function:
      
      bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx,
                             u64 flags)
      
      BPF programs may invoke this function to run callback_fn() on a series of
      samples in the ring buffer. callback_fn() has the following signature:
      
      long callback_fn(struct bpf_dynptr *dynptr, void *context);
      
      Samples are provided to the callback in the form of struct bpf_dynptr *'s,
      which the program can read using BPF helper functions for querying
      struct bpf_dynptr's.
      
      In order to support bpf_user_ringbuf_drain(), a new PTR_TO_DYNPTR register
      type is added to the verifier to reflect a dynptr that was allocated by
      a helper function and passed to a BPF program. Unlike PTR_TO_STACK
      dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
      dynptrs need not use reference tracking, as the BPF helper is trusted to
      properly free the dynptr before returning. The verifier currently only
      supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.
      
      Note that while the corresponding user-space libbpf logic will be added
      in a subsequent patch, this patch does contain an implementation of the
      .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
      .map_poll() callback guarantees that an epoll-waiting user-space
      producer will receive at least one event notification whenever at least
      one sample is drained in an invocation of bpf_user_ringbuf_drain(),
      provided that the function is not invoked with the BPF_RB_NO_WAKEUP
      flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
      notification is sent even if no sample was drained.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
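The drain-and-notify behaviour described in this commit can be simulated in user space. The following is a toy model, not the helper itself: it uses a `deque` in place of the ring buffer, assumes the flag bit values shown, assumes that a nonzero callback return stops draining early, and returns the wakeup decision alongside the count purely so the rule can be inspected.

```python
from collections import deque

# Assumed flag values; check the BPF UAPI header before relying on them.
BPF_RB_NO_WAKEUP = 1 << 0
BPF_RB_FORCE_WAKEUP = 1 << 1


def user_ringbuf_drain(ring: deque, callback_fn, ctx, flags: int = 0):
    """Run callback_fn on queued samples; return (samples drained, producer woken)."""
    drained = 0
    while ring:
        sample = ring.popleft()
        drained += 1
        if callback_fn(sample, ctx) != 0:  # assumption: nonzero stops the drain
            break
    if flags & BPF_RB_FORCE_WAKEUP:
        woken = True             # notify even if nothing was drained
    elif flags & BPF_RB_NO_WAKEUP:
        woken = False
    else:
        woken = drained > 0      # default: notify when at least one sample was drained
    return drained, woken


def collect(sample, ctx):
    """Example callback: record the sample and keep draining."""
    ctx.append(sample)
    return 0


seen = []
print(user_ringbuf_drain(deque([b"a", b"b", b"c"]), collect, seen), seen)
```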
    • bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type · 583c1f42
      David Vernet committed
      We want to support a ringbuf map type where samples are published from
      user-space, to be consumed by BPF programs. BPF currently supports a
      kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
      map type.  We'll need to define a new map type for user-space -> kernel,
      as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
      to a user-space producer ring buffer, and we'll want to add one or
      more helper functions that would not apply for a kernel-producer
      ring buffer.
      
      This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
      definition. The map type is useless in its current form, as there is no
      way to access or use it for anything until we add one or more BPF helpers. A
      follow-on patch will therefore add a new helper function that allows BPF
      programs to run callbacks on samples that are published to the ring
      buffer.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com