1. 01 Jun, 2020 (2 commits)
    • x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit · f7d31e65
      Jon Doron committed
      The problem this patch addresses is that 'struct kvm_hyperv_exit' has
      a different layout when compiled in 32-bit and 64-bit modes.
      
      In 64-bit mode, 64-bit members are aligned on a 64-bit boundary, which
      forces extra gaps after 'type' and 'msr'; in 32-bit mode they are only
      aligned on a 32-bit boundary, so there are no extra gaps.
      
      This is a problem because, even when the kernel is 64-bit, the userspace
      using the interface can be either 32-bit or 64-bit, yet the same 32-bit
      userspace also has to work with a 32-bit kernel.
      
      The issue is fixed by forcing the 64-bit layout. This changes the ABI
      for 32-bit builds: while we are obviously breaking the '32-bit userspace
      with 32-bit kernel' case, we're fixing the '32-bit userspace with 64-bit
      kernel' one.
      
      As the interface has no (known) users and 32-bit KVM is rather baroque
      nowadays, this seems like a reasonable decision.
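The layout difference can be reproduced with a small C sketch. The structs below are hypothetical, trimmed stand-ins (only 'type', 'msr' and a few 64-bit members are kept), not the real 'struct kvm_hyperv_exit'; the explicit variant writes the 64-bit gaps out as named padding fields, which is one way to make 32-bit and 64-bit builds agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed struct with implicit gaps. On x86-64, uint64_t
 * members are 8-byte aligned, so the compiler adds 4 bytes after 'type'
 * and after 'msr'. On i386 they are only 4-byte aligned inside structs,
 * so no gaps appear and the offsets shift. */
struct exit_implicit {
	uint32_t type;
	union {
		struct {
			uint32_t msr;
			uint64_t control;
		} synic;
		struct {
			uint64_t input;
			uint64_t result;
		} hcall;
	} u;
};

/* Same members with the 64-bit gaps written out as named padding, so
 * 32-bit and 64-bit builds agree on every offset. */
struct exit_explicit {
	uint32_t type;
	uint32_t pad1;
	union {
		struct {
			uint32_t msr;
			uint32_t pad2;
			uint64_t control;
		} synic;
		struct {
			uint64_t input;
			uint64_t result;
		} hcall;
	} u;
};

/* Compile-time checks that hold on both i386 and x86-64. */
static_assert(offsetof(struct exit_explicit, u) == 8,
	      "union starts at byte 8");
static_assert(offsetof(struct exit_explicit, u.synic.control) == 16,
	      "control starts at byte 16");
static_assert(sizeof(struct exit_explicit) == 24, "24 bytes in total");
```

Built for i386, the same checks applied to 'struct exit_implicit' would fail (its union starts at byte 4 there), which is the mismatch the commit removes.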
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
      Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT · 72de5fa4
      Vitaly Kuznetsov committed
      Introduce a new capability to indicate that KVM supports interrupt-based
      delivery of 'page ready' APF events. This includes support for both
      MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.
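A guest would typically check this capability in the KVM CPUID feature leaf before enabling interrupt-based 'page ready' delivery. The sketch below only decodes an EAX value that has already been read; the leaf number (0x40000001) and bit position (14) are assumptions based on arch/x86/include/uapi/asm/kvm_para.h and should be verified against the headers you build with. The macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values mirroring <asm/kvm_para.h>; verify before relying on them.
 * The leaf is listed for reference: its EAX output is what gets decoded. */
#define SKETCH_KVM_CPUID_FEATURES        0x40000001u
#define SKETCH_KVM_FEATURE_ASYNC_PF_INT  14

/* Returns true if 'bit' is set in the EAX value returned by the KVM
 * features leaf (hypothetical helper). Out-of-range bits read as unset. */
static bool kvm_feature_present(uint32_t features_eax, unsigned int bit)
{
	return bit < 32 && ((features_eax >> bit) & 1u) != 0;
}
```

On x86 the EAX value could be obtained with GCC's __cpuid() macro from <cpuid.h>, after first confirming the KVM signature at leaf 0x40000000.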
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 05 May, 2020 (1 commit)
  3. 27 Apr, 2020 (1 commit)
  4. 25 Apr, 2020 (2 commits)
  5. 23 Apr, 2020 (1 commit)
  6. 19 Apr, 2020 (2 commits)
    • uapi: linux: fiemap.h: Replace zero-length array with flexible-array member · 6e88abb8
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, note that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
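With a flexible array member, the usual allocation pattern sizes the buffer as the fixed header plus N trailing elements. The sketch below uses hypothetical 'extent_list'/'extent' types rather than the real 'struct fiemap'; it only illustrates the C99 idiom the commit switches to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for a per-extent record. */
struct extent {
	uint64_t logical;
	uint64_t length;
};

/* Hypothetical header with a C99 flexible array member, which must be
 * the last member of the struct. */
struct extent_list {
	uint32_t count;
	struct extent extents[];
};

/* Allocate the header plus 'n' zeroed trailing elements. sizeof on the
 * struct covers only the fixed part, so the array size is added
 * explicitly. Returns NULL on size overflow or allocation failure. */
static struct extent_list *extent_list_alloc(uint32_t n)
{
	struct extent_list *l;
	size_t bytes;

	if (n > (SIZE_MAX - sizeof(struct extent_list)) / sizeof(struct extent))
		return NULL;
	bytes = sizeof(struct extent_list) + (size_t)n * sizeof(struct extent);

	l = calloc(1, bytes);
	if (l)
		l->count = n;
	return l;
}
```

The caller releases the whole object with a single free(), since header and array share one allocation.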
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    • uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member · d6cdad87
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, note that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
  7. 17 Apr, 2020 (1 commit)
  8. 11 Apr, 2020 (1 commit)
  9. 08 Apr, 2020 (5 commits)
  10. 06 Apr, 2020 (1 commit)
    • netfilter: xt_IDLETIMER: target v1 - match Android layout · bc9fe614
      Maciej Żenczykowski committed
      Android has long had an extension to IDLETIMER to send netlink
      messages to userspace, see:
        https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42
      Note: this is idletimer target rev 1; there is no rev 0 in
      the Android common kernel sources, see the registration at:
        https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483
      
      When we compare that to upstream's new idletimer target rev 1:
        https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46
      
      We immediately notice that these two rev 1 structs are the
      same size and layout, and that while timer_type and send_nl_msg
      are differently named and serve a different purpose, they're
      at the same offset.
      
      This makes them impossible to tell apart - and thus one cannot
      know in a mixed Android/vanilla environment whether one means
      timer_type or send_nl_msg.
      
      Since this is iptables/netfilter uapi it introduces a problem
      between iptables (vanilla vs Android) userspace and kernel
      (vanilla vs Android) if the two don't match each other.
      
      Additionally, when Android picks up 5.7+ at some point in the future,
      it's not at all clear how to resolve the resulting merge
      conflict.
      
      Furthermore, since upgrading the kernel on old Android phones
      is pretty much impossible, there does not seem to be an easy way
      out of this predicament.
      
      The only thing I've been able to come up with is some super
      disgusting kernel version >= 5.7 check in the iptables binary
      to flip between different struct layouts.
      
      By adding a dummy field to the vanilla Linux kernel header file
      we can force the two structs to be compatible with each other.
      
      Long term I think I would like to deprecate send_nl_msg out of
      Android entirely, but I haven't quite been able to figure out
      exactly how we depend on it.  It seems to be very similar to
      sysfs notifications but with some extra info.
      
      Currently it's actually always enabled whenever Android uses
      the IDLETIMER target, so we could also probably entirely
      remove it from the uapi in favour of just always enabling it,
      but again we can't upgrade old kernels already in the field.
      
      (Also note that this doesn't change the structure's size,
      as the new field simply fits into the pre-existing padding, and
      that since 5.7 hasn't been released yet, there's still time
      to make this uapi-visible change.)
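The offset argument can be checked mechanically. The sketch below models the three struct variants under stated assumptions: a 28-byte label (the assumed value of MAX_IDLETIMER_LABEL_SIZE), a kernel-internal 'timer' pointer aligned to 8 bytes, and hypothetical struct names. It is an illustration, not a copy of either header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_SIZE 28 /* assumed value of MAX_IDLETIMER_LABEL_SIZE */

/* Android's rev 1 layout (modelled): one flag byte after the label. */
struct android_v1 {
	uint32_t timeout;
	char label[LABEL_SIZE];
	uint8_t send_nl_msg;
	_Alignas(8) void *timer; /* kernel-internal pointer */
};

/* Upstream rev 1 before the fix: a different flag at the same offset. */
struct vanilla_v1_old {
	uint32_t timeout;
	char label[LABEL_SIZE];
	uint8_t timer_type;
	_Alignas(8) void *timer;
};

/* Upstream rev 1 after the fix: a dummy byte occupies Android's offset,
 * pushing timer_type into what used to be padding. */
struct vanilla_v1_new {
	uint32_t timeout;
	char label[LABEL_SIZE];
	char send_nl_msg; /* dummy, kept for Android compatibility */
	uint8_t timer_type;
	_Alignas(8) void *timer;
};

/* Before the fix the two flags share byte 32 ... */
static_assert(offsetof(struct android_v1, send_nl_msg) ==
	      offsetof(struct vanilla_v1_old, timer_type), "collision");
/* ... after it they are distinct, and the size does not change. */
static_assert(offsetof(struct vanilla_v1_new, timer_type) == 33, "moved");
static_assert(sizeof(struct vanilla_v1_new) == sizeof(struct vanilla_v1_old),
	      "same size");
```

Because the pointer is 8-byte aligned, bytes 33 to 39 were already padding in every variant, which is why the extra field costs nothing.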
      
      Cc: Manoj Basapathi <manojbm@codeaurora.org>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  11. 03 Apr, 2020 (3 commits)
  12. 02 Apr, 2020 (2 commits)
    • vhost: introduce vDPA-based backend · 4c8cf318
      Tiwei Bie committed
      This patch introduces a vDPA-based vhost backend. This backend is
      built on top of the same interface defined in virtio-vDPA and provides
      a generic vhost interface for userspace to accelerate the virtio
      devices in the guest.
      
      This backend is implemented as a vDPA device driver on top of the same
      ops used in virtio-vDPA. It creates a char device entry named
      vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
      on this char device to set up the backend.
      
      The vhost ioctls are extended to make the interface type-agnostic and
      behave like a virtio device; this helps eliminate type-specific APIs
      like the ones vhost_net/scsi/vsock introduced:
      
      - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID, which the virtio
        specification defines to distinguish different types of devices
      - VHOST_VDPA_GET_VRING_NUM: get the maximum virtqueue size
        supported by the vDPA device
      - VHOST_VDPA_SET/GET_STATUS: set and get the virtio status of the vDPA device
      - VHOST_VDPA_SET/GET_CONFIG: access the virtio config space
      - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
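As a rough sketch of how userspace might call the first of these ioctls, the helper below opens a vhost-vdpa char device and asks for its virtio device ID. It assumes a <linux/vhost.h> that defines VHOST_VDPA_GET_DEVICE_ID (kernel headers 5.7 or later); the helper name and the device path in the comment are hypothetical, and the rest of the backend setup is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

/* Open a vhost-vdpa char device (for example "/dev/vhost-vdpa-0", a
 * hypothetical path) and read its virtio device ID into *device_id.
 * Returns 0 on success, -1 if open() or ioctl() fails. */
static int vdpa_read_device_id(const char *path, uint32_t *device_id)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	int ret = ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, device_id);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

Per the virtio specification, a network device reports ID 1.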
      
      For memory mapping, the IOTLB API is mandatory for vhost-vDPA, which
      means userspace drivers are required to use
      VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove the mapping
      for a specific userspace memory region.
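A mapping request is sent by writing an IOTLB message to the same char device. The sketch below fills a 'struct vhost_msg_v2' for VHOST_IOTLB_UPDATE; it assumes <linux/vhost.h> headers that provide the V2 message and that the V2 message format has already been negotiated with the backend (that step is omitted). The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <linux/vhost.h>

/* Build a V2 IOTLB message that maps [iova, iova + size) to the
 * userspace buffer at 'uaddr' with read/write access. */
static struct vhost_msg_v2 iotlb_update_msg(uint64_t iova, uint64_t size,
					    const void *uaddr)
{
	struct vhost_msg_v2 msg;

	memset(&msg, 0, sizeof(msg)); /* also zeroes the field after 'type' */
	msg.type = VHOST_IOTLB_MSG_V2;
	msg.iotlb.type = VHOST_IOTLB_UPDATE;
	msg.iotlb.iova = iova;
	msg.iotlb.size = size;
	msg.iotlb.uaddr = (uint64_t)(uintptr_t)uaddr;
	msg.iotlb.perm = VHOST_ACCESS_RW;
	return msg;
}

/* Write one message to the vhost fd; returns 0 only on a full write. */
static int iotlb_send(int fd, const struct vhost_msg_v2 *msg)
{
	ssize_t n = write(fd, msg, sizeof(*msg));
	return n == (ssize_t)sizeof(*msg) ? 0 : -1;
}
```

Removing the mapping would use the same message with iotlb.type set to VHOST_IOTLB_INVALIDATE and the same iova and size.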
      
      The vhost-vDPA API is designed to be type-agnostic, but only net
      devices are allowed at the current stage. Due to the lack of control
      virtqueue support, some features are filtered out by vhost-vdpa.
      
      We will enable more features and devices in the near future.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • Input: update SPDX tag for input-event-codes.h · 3a857962
      Rajat Jain committed
      Replace the
      /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
      with
      /* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
      
      to help the coreboot community consume this file without relaxing their
      licensing checks.
      Signed-off-by: Rajat Jain <rajatja@google.com>
      Link: https://lore.kernel.org/r/20200329172513.133548-1-rajatja@google.com
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
  13. 31 Mar, 2020 (6 commits)
    • devlink: Add packet trap policers support · 1e8c6619
      Ido Schimmel committed
      Devices capable of offloading the kernel's datapath and performing
      functions such as bridging and routing must also be able to send (trap)
      specific packets to the kernel (i.e., the CPU) for processing.
      
      For example, a device acting as a multicast-aware bridge must be able to
      trap IGMP membership reports to the kernel for processing by the bridge
      module.
      
      In most cases, the underlying device is capable of handling packet rates
      that are several orders of magnitude higher compared to those that can
      be handled by the CPU.
      
      Therefore, in order to prevent the underlying device from overwhelming
      the CPU, devices usually include packet trap policers that are able to
      police the trapped packets to rates that can be handled by the CPU.
      
      This patch allows capable device drivers to register their supported
      packet trap policers with devlink. User space can then tune the
      parameters of these policers (currently, rate and burst size) and read
      from the device the number of packets that were dropped by the policer,
      if supported.
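Policers of this kind are commonly modelled as token buckets: the rate refills the bucket and the burst size caps it. The sketch below is a generic, software-only token bucket in packets per second, meant only to illustrate what the two parameters control; it is not devlink's API or any device's implementation, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical software policer. 'credit' is kept in packet-nanoseconds:
 * one packet costs NSEC_PER_SEC units, and each elapsed nanosecond adds
 * rate_pps units, so no fractional refill is lost to rounding. */
struct tb_policer {
	uint64_t rate_pps; /* sustained packets per second */
	uint64_t burst;    /* bucket depth in packets */
	uint64_t credit;   /* current fill, in packet-nanoseconds */
	uint64_t last_ns;  /* timestamp of the previous refill */
	uint64_t drops;    /* packets rejected by the policer */
};

static void tb_init(struct tb_policer *p, uint64_t rate_pps, uint64_t burst,
		    uint64_t now_ns)
{
	p->rate_pps = rate_pps;
	p->burst = burst;
	p->credit = burst * NSEC_PER_SEC; /* start with a full bucket */
	p->last_ns = now_ns;
	p->drops = 0;
}

/* Returns true if a packet arriving at 'now_ns' may go to the CPU. */
static bool tb_admit(struct tb_policer *p, uint64_t now_ns)
{
	const uint64_t cap = p->burst * NSEC_PER_SEC;
	const uint64_t dt = now_ns - p->last_ns;

	p->last_ns = now_ns;
	/* Refill, saturating at the bucket depth without overflowing. */
	if (p->rate_pps && dt > cap / p->rate_pps)
		p->credit = cap;
	else
		p->credit += dt * p->rate_pps;
	if (p->credit > cap)
		p->credit = cap;

	if (p->credit >= NSEC_PER_SEC) {
		p->credit -= NSEC_PER_SEC;
		return true;
	}
	p->drops++;
	return false;
}
```

With a rate of 10 and a burst of 2, for example, two back-to-back packets pass, a third is dropped, and one more packet is admitted for every further 100 ms that elapses.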
      
      Subsequent patches in the series will allow device drivers to create
      default bindings between these policers and packet trap groups and allow
      user space to change the bindings.
      
      v2:
      * Add 'strict_start_type' in devlink policy
      * Have device drivers provide max/min rate/burst size for each policer.
        Use them to check validity of user provided parameters
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Implement bpf_prog replacement for an active bpf_cgroup_link · 0c991ebc
      Andrii Nakryiko committed
      Add a new operation (LINK_UPDATE), which allows replacing the active
      bpf_prog under a given bpf_link. Currently this is only supported for
      bpf_cgroup_link, but will be extended to other kinds of bpf_links in
      follow-up patches.
      
      For bpf_cgroup_link, the implemented functionality matches the existing
      semantics for direct bpf_prog attachment (including the BPF_F_REPLACE
      flag). The user can either unconditionally set a new bpf_prog, regardless
      of which bpf_prog is currently active under the given bpf_link, or,
      optionally, specify the expected active bpf_prog. If the active bpf_prog
      doesn't match the expected one, no changes are performed, the old
      bpf_link stays intact and attached, and the operation returns a failure.
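From user space, this operation is reached through the bpf(2) syscall with the BPF_LINK_UPDATE command. The sketch below uses the raw syscall and assumes the 'link_update' fields of union bpf_attr in <linux/bpf.h> (link_fd, new_prog_fd, flags, old_prog_fd) from kernel headers 5.7 or later; the wrapper name is hypothetical. Passing an expected old program together with BPF_F_REPLACE gives the conditional replacement described above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Replace the program behind 'link_fd' with 'new_prog_fd'. If
 * 'old_prog_fd' is >= 0, BPF_F_REPLACE is set so the kernel only swaps
 * programs when the currently active one matches it. Returns 0 on
 * success or -1 on failure, with errno set by the syscall. */
static int sketch_link_update(int link_fd, int new_prog_fd, int old_prog_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr)); /* unused attribute bytes must be zero */
	attr.link_update.link_fd = link_fd;
	attr.link_update.new_prog_fd = new_prog_fd;
	if (old_prog_fd >= 0) {
		attr.link_update.flags = BPF_F_REPLACE;
		attr.link_update.old_prog_fd = old_prog_fd;
	}
	return (int)syscall(__NR_bpf, BPF_LINK_UPDATE, &attr, sizeof(attr));
}
```

libbpf's bpf_link_update() wraps the same command for programs that already link against libbpf.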
      
      The cgroup_bpf_replace() operation resolves the race between
      auto-detachment and bpf_prog update in the same fashion as for bpf_link
      detachment, except that here the update has no way of succeeding because
      the target cgroup is marked as dying, so an error is returned.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
    • bpf: Implement bpf_link-based cgroup BPF program attachment · af6eea57
      Andrii Nakryiko committed
      Implement a new sub-command to attach cgroup BPF programs and return an
      FD-based bpf_link on success. A bpf_link, once attached to a cgroup,
      cannot be replaced, except by the owner holding its FD. Cgroup bpf_link
      supports only BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based
      BPF_F_ALLOW_MULTI attachments can be freely intermixed.
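A minimal user-space sketch of the new sub-command, again through the raw bpf(2) syscall: it attaches an already-loaded program to a cgroup directory FD and returns the resulting bpf_link FD. It assumes the 'link_create' fields of union bpf_attr in <linux/bpf.h> (prog_fd, target_fd, attach_type) from kernel headers 5.7 or later; the wrapper name is hypothetical, and loading the program and opening the cgroup directory are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Attach 'prog_fd' to the cgroup referred to by 'cgroup_fd' (an open
 * cgroup v2 directory) with the given attach type. Returns a new
 * bpf_link FD, or -1 on failure with errno set by the syscall. */
static int cgroup_link_create(int prog_fd, int cgroup_fd,
			      enum bpf_attach_type type)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr)); /* unused attribute bytes must be zero */
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = cgroup_fd;
	attr.link_create.attach_type = type;
	return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

The returned FD is the handle the description refers to: only a holder of this FD can replace the program behind the link.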
      
      To prevent bpf_cgroup_link from keeping the cgroup alive past the point
      when no BPF program can be executed, implement auto-detachment of the
      link. When cgroup_bpf_release() is called, all attached bpf_links are
      forced to release their cgroup refcounts, but they leave the bpf_link
      otherwise active and allocated, as well as still owning the underlying
      bpf_prog. This is because user-space might still have FDs open and
      active, so bpf_link as a user-referenced object can't be freed yet. Once
      the last active FD is closed, bpf_link will be freed and the underlying
      bpf_prog refcount will be dropped. But the cgroup refcount won't be
      touched, because the cgroup has already been released.
      
      The inherent race between bpf_cgroup_link release (from closing the last
      FD) and cgroup_bpf_release() is resolved by both operations taking
      cgroup_mutex. So the only additional check required is when
      bpf_cgroup_link attempts to detach itself from the cgroup. At that time
      we need to check whether there is still a cgroup associated with that
      link. If not, exit with success, because bpf_cgroup_link was already
      successfully detached.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
    • bpf: Add socket assign support · cf7fbe66
      Joe Stringer committed
      Add support for TPROXY via a new bpf helper, bpf_sk_assign().
      
      This helper requires the BPF program to discover the socket via a call
      to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
      helper takes its own reference to the socket in addition to any existing
      reference that may or may not currently be obtained for the duration of
      BPF processing. For the destination socket to receive the traffic, the
      traffic must be routed towards that socket via local route. The
      simplest example route is below, but in practice you may want to route
      traffic more narrowly (e.g., by CIDR):
      
        $ ip route add local default dev lo
      
      This patch avoids trying to introduce an extra bit into the skb->sk, as
      that would require more invasive changes to all code interacting with
      the socket to ensure that the bit is handled correctly, such as all
      error-handling cases along the path from the helper in BPF through to
      the orphan path in the input. Instead, we opt to use the destructor
      variable to switch on the prefetch of the socket.
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
    • devlink: Add auto dump flag to health reporter · 48bb52c8
      Eran Ben Elisha committed
      On low-memory systems, run-time dumps can consume too much memory. Add
      the ability for an administrator to disable auto dumps per reporter as
      part of the error-flow handling routine.
      
      This attribute is not relevant while executing
      DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.
      
      By default, auto dump is activated for any reporter that has a dump method,
      as part of the reporter registration to devlink.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: expose HW stats types per action used by drivers · 93a129eb
      Jiri Pirko committed
      It may be up to the driver (in case ANY HW stats is passed) to select
      which type of HW stats it is going to use. Add infrastructure to
      expose this information to the user.
      
      $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
      $ tc -s filter show dev enp3s0np1 ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 192.168.1.1
        in_hw in_hw_count 2
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 10 sec used 10 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 30 Mar, 2020 (12 commits)