1. 28 Sep, 2020 2 commits
    • A
      KVM: x86: Introduce MSR filtering · 1a155254
      Committed by Alexander Graf
      It's not desirable to have all MSRs always handled by KVM kernel space. Some
      MSRs would be useful to handle in user space to either emulate behavior (like
      uCode updates) or differentiate whether they are valid based on the CPU model.
      
      To allow user space to specify which MSRs it wants to see handled by KVM,
      this patch introduces a new ioctl to push filter rules with bitmaps into
      KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
      With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
      denied MSR events to user space to operate on.
      
      If no filter is populated, MSR handling stays identical to before.
      Signed-off-by: Alexander Graf <graf@amazon.com>
      
      Message-Id: <20200925143422.21718-8-graf@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
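The bitmap-based decision described above can be sketched roughly as follows. This is a simplified user-space model, not the kernel implementation: `struct msr_filter_range`, `msr_allowed()` and the `default_allow` parameter are hypothetical names, and the sketch assumes that a set bit means "KVM handles the access" and a clear bit means "deny".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified filter range: one bit per MSR starting at `base`. */
struct msr_filter_range {
    uint32_t base;          /* first MSR index covered by this range */
    uint32_t nmsrs;         /* number of MSRs covered */
    const uint8_t *bitmap;  /* assumed: bit set = allow, bit clear = deny */
};

/*
 * Return true if the access may be handled in the kernel. With no ranges,
 * or for an MSR outside every range, fall back to `default_allow`.
 */
static bool msr_allowed(const struct msr_filter_range *ranges, size_t count,
                        uint32_t index, bool default_allow)
{
    for (size_t i = 0; i < count; i++) {
        const struct msr_filter_range *r = &ranges[i];

        /* Skip ranges that do not cover this MSR index. */
        if (index < r->base || index - r->base >= r->nmsrs)
            continue;

        uint32_t bit = index - r->base;
        return (r->bitmap[bit / 8] >> (bit % 8)) & 1;
    }
    return default_allow;
}
```

With an empty rule set the function always returns `default_allow`, which mirrors the statement that handling is unchanged when no filter is populated.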
    • A
      KVM: x86: Allow deflecting unknown MSR accesses to user space · 1ae09954
      Committed by Alexander Graf
      MSRs are weird. Some of them are normal control registers, such as EFER.
      Some however are registers that really are model specific, not very
      interesting to virtualization workloads, and not performance critical.
      Others again are really just windows into package configuration.
      
      Out of these MSRs, only the first category is necessary to implement in
      kernel space. Rarely accessed MSRs, MSRs that should be fine-tuned against
      certain CPU models and MSRs that contain information on the package level
      are much better suited for user space to process. However, over time we have
      accumulated a lot of MSRs that are not in the first category, but are still
      handled by in-kernel KVM code.
      
      This patch adds a generic interface to handle WRMSR and RDMSR from user
      space. With this, any future MSR that is part of the latter categories can
      be handled in user space.
      
      Furthermore, it allows us to replace the existing "ignore_msrs" logic with
      something that applies per-VM rather than on the full system. That way you
      can run production VMs in parallel to experimental ones where you don't care
      about proper MSR handling.
      Signed-off-by: Alexander Graf <graf@amazon.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      
      Message-Id: <20200925143422.21718-3-graf@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
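A user-space handler for deflected RDMSR/WRMSR accesses might look roughly like the sketch below. Everything here is a stand-in for illustration: `struct msr_exit`, the `EXIT_RDMSR`/`EXIT_WRMSR` values, `struct vmm_state` and `MSR_SCRATCH` are hypothetical and do not reproduce the definitions in `<linux/kvm.h>`. The sketch assumes that user space reports failure through an error field so that the guest access faults.

```c
#include <stdint.h>

/* Stand-in exit reasons; real values would come from <linux/kvm.h>. */
enum { EXIT_RDMSR = 1, EXIT_WRMSR = 2 };

/* Hypothetical mirror of the MSR exit payload shared with user space. */
struct msr_exit {
    uint8_t  error;  /* non-zero: report the access as failed */
    uint32_t index;  /* MSR number the guest accessed */
    uint64_t data;   /* value written (WRMSR) or value to return (RDMSR) */
};

/* Hypothetical VMM state holding one MSR emulated in user space. */
struct vmm_state {
    uint64_t scratch;
};

#define MSR_SCRATCH 0x4b564dffu /* illustrative index, not a real MSR */

static void handle_msr_exit(struct vmm_state *vmm, int reason,
                            struct msr_exit *e)
{
    e->error = 0;

    if (e->index != MSR_SCRATCH) {
        e->error = 1; /* unknown MSR: let the access fail */
        return;
    }

    if (reason == EXIT_RDMSR)
        e->data = vmm->scratch;  /* value handed back to the guest */
    else
        vmm->scratch = e->data;  /* remember what the guest wrote */
}
```

Because the state lives in the VMM process, the policy applies to one VM only, which is the per-VM property the commit message contrasts with the system-wide "ignore_msrs" switch.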
  2. 12 Sep, 2020 1 commit
    • H
      KVM: MIPS: Change the definition of kvm type · 15e9e35c
      Committed by Huacai Chen
      MIPS defines two kvm types:
      
       #define KVM_VM_MIPS_TE          0
       #define KVM_VM_MIPS_VZ          1
      
      Documentation/virt/kvm/api.rst says "You probably want to use 0 as
      machine type", which implies that type 0 is the "automatic" or
      "default" type. In user space, libvirt uses the null machine (with
      type 0) to detect the KVM capability, which returns "KVM not supported"
      on a VZ platform.
      
      I tried to fix this in QEMU, but the result is ugly:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
      
      Thomas Huth suggested changing the definition of the kvm type instead:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
      
      So I define the types like this:
      
       #define KVM_VM_MIPS_AUTO        0
       #define KVM_VM_MIPS_VZ          1
       #define KVM_VM_MIPS_TE          2
      
      Since VZ and TE cannot co-exist, using type 0 on a TE platform will
      still return success (so old user-space tools have no problems on new
      kernels); the advantage is that using type 0 on a VZ platform will not
      return failure. So, the only problem is "new user-space tools use type
      2 on old kernels", but if we treat this as a kernel bug, we can backport
      this patch to old stable kernels.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
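The compatibility argument above can be illustrated with a small sketch of how a requested type might be resolved. The three constants come from the commit message; `resolve_kvm_type()` and its `has_vz` parameter are hypothetical and only model the rule that VZ and TE never co-exist on one platform.

```c
#include <errno.h>
#include <stdbool.h>

#define KVM_VM_MIPS_AUTO 0
#define KVM_VM_MIPS_VZ   1
#define KVM_VM_MIPS_TE   2

/*
 * Hypothetical helper: return the concrete type used for a request, or
 * -EINVAL if the platform cannot provide it. `has_vz` says whether this
 * is a VZ platform; otherwise it is assumed to be a TE platform.
 */
static int resolve_kvm_type(unsigned long requested, bool has_vz)
{
    int native = has_vz ? KVM_VM_MIPS_VZ : KVM_VM_MIPS_TE;

    /* Type 0 now means "whatever this platform supports". */
    if (requested == KVM_VM_MIPS_AUTO)
        return native;

    return requested == (unsigned long)native ? native : -EINVAL;
}
```

Under this model, an old tool passing 0 still succeeds on a TE platform and now also succeeds on a VZ platform, while an explicit request for the wrong type fails.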
  3. 27 Aug, 2020 1 commit
  4. 22 Aug, 2020 2 commits
  5. 21 Aug, 2020 1 commit
  6. 13 Aug, 2020 2 commits
  7. 07 Aug, 2020 1 commit
    • Y
      bpf: Change uapi for bpf iterator map elements · 5e7b3020
      Committed by Yonghong Song
      Commit a5cbe05a ("bpf: Implement bpf iterator for
      map elements") added bpf iterator support for
      map elements. The map element bpf iterator requires
      info to identify a particular map. In the above
      commit, attr->link_create.target_fd is used
      to carry map_fd, and an enum bpf_iter_link_info
      is added to uapi to specify that target_fd actually
      represents a map_fd:
          enum bpf_iter_link_info {
              BPF_ITER_LINK_UNSPEC = 0,
              BPF_ITER_LINK_MAP_FD = 1,
      
              MAX_BPF_ITER_LINK_INFO,
          };
      
      This is an extensible approach, as we can grow the
      enumerator for pid, cgroup_id, etc. and turn
      target_fd into a union for pid, cgroup_id, etc.
      But in the future, more complex customization
      may be needed, e.g., tasks could be filtered
      based on both cgroup_id and user_id.
      
      This patch changed the uapi to have fields
      	__aligned_u64	iter_info;
      	__u32		iter_info_len;
      for additional iter_info for link_create.
      The iter_info is defined as
      	union bpf_iter_link_info {
      		struct {
      			__u32   map_fd;
      		} map;
      	};
      
      So future extension for additional customization
      will be easier. The bpf_iter_link_info will be
      passed to the target callback for validation, and the
      generic bpf_iter framework does not need to deal with
      it any more.
      
      Note that map_fd = 0 will be considered invalid
      and -EBADF will be returned to user space.
      
      Fixes: a5cbe05a ("bpf: Implement bpf iterator for map elements")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
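The validation step that the target callback performs could be sketched as below. The union layout follows the commit message, but it is renamed `iter_link_info` here and uses `uint32_t` instead of `__u32` so the example builds without kernel headers. `check_map_iter_info()` is a hypothetical helper, and the `-EINVAL` case for missing or too-short info is an assumption; only the `-EBADF` result for `map_fd = 0` is stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout shown in the commit message (renamed for the sketch). */
union iter_link_info {
    struct {
        uint32_t map_fd;
    } map;
};

/*
 * Hypothetical target-side check. `len` plays the role of iter_info_len.
 * Returns 0 if the info identifies a map, or a negative errno otherwise.
 */
static int check_map_iter_info(const union iter_link_info *info, uint32_t len)
{
    /* Assumed behavior: absent or truncated info is rejected. */
    if (info == NULL || len < sizeof(info->map.map_fd))
        return -EINVAL;

    /* Stated in the commit message: map_fd = 0 is considered invalid. */
    if (info->map.map_fd == 0)
        return -EBADF;

    return 0;
}
```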
  8. 06 Aug, 2020 1 commit
    • J
      vhost-vdpa: support IOTLB batching hints · 25abc060
      Committed by Jason Wang
      This patch extends the vhost IOTLB API to accept batch updating hints
      from userspace. When userspace wants to update the device IOTLB in a
      batch, it may do:
      
      1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag
      2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE
      3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag
      
      Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when
      the vDPA device supports the set_map() op. This is useful for vDPA
      devices that want to know all the mappings to tweak their own DMA
      translation logic.
      
      For vDPA devices that don't require set_map(), there is no behavior change.
      
      This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20200804162048.22587-5-eli@mellanox.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
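The three-step sequence above can be modeled with a small state machine. This is a toy model rather than vhost-vdpa code: `enum msg_type`, `struct backend`, `flush()` and `handle_msg()` are hypothetical, and the counter `set_map_calls` merely stands in for invoking the device's set_map() op.

```c
#include <stddef.h>

/* Hypothetical message kinds standing in for the vhost IOTLB message types. */
enum msg_type { MSG_UPDATE, MSG_INVALIDATE, MSG_BATCH_BEGIN, MSG_BATCH_END };

struct backend {
    int in_batch;          /* non-zero between BATCH_BEGIN and BATCH_END */
    size_t pending;        /* mapping changes not yet pushed to the device */
    size_t set_map_calls;  /* how often the mock set_map() op ran */
};

/* Push all pending mapping changes to the device in one go. */
static void flush(struct backend *b)
{
    b->set_map_calls++;
    b->pending = 0;
}

static void handle_msg(struct backend *b, enum msg_type type)
{
    switch (type) {
    case MSG_BATCH_BEGIN:
        b->in_batch = 1;
        break;
    case MSG_UPDATE:
    case MSG_INVALIDATE:
        b->pending++;
        if (!b->in_batch)
            flush(b);       /* no hint given: update immediately */
        break;
    case MSG_BATCH_END:
        b->in_batch = 0;
        if (b->pending)
            flush(b);       /* step 3: one update for the whole batch */
        break;
    }
}
```

In this model three updates wrapped in a batch cost one set_map() call, while the same three updates sent without the hints cost three.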
  9. 05 Aug, 2020 15 commits
  10. 04 Aug, 2020 4 commits
  11. 02 Aug, 2020 1 commit
  12. 01 Aug, 2020 2 commits
    • R
      rtnetlink: add support for protodown reason · 829eb208
      Committed by Roopa Prabhu
      netdev protodown is a mechanism that allows protocols to
      hold an interface down. It was initially introduced in
      the kernel to hold links down by a multihoming protocol.
      There was also an attempt to introduce a protodown
      reason at the time, but it was rejected. protodown and protodown reason
      are supported by almost every switching and routing platform.
      It was OK for a while to live without a protodown reason,
      but it has become more critical now, given that more than
      one protocol may need to keep a link down on a system
      at the same time, e.g. vrrp peer node, port security,
      multihoming protocol. It is common for network operators and
      protocol developers to look for such a reason on a networking
      box (it is also known as errDisable by most network operators).
      
      This patch adds support for link protodown reason
      attribute. There are two ways to maintain protodown
      reasons.
      (a) enumerate every possible reason code in kernel
          - A protocol developer has to make a request and
            have that appear in a certain kernel version
      (b) provide the bits in the kernel, and allow user-space
          (sysadmin or NOS distributions) to manage the bit-to-reasonname
          map.
          - This makes extending reason codes easier (kind of like
            the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)
      
      This patch takes approach (b).
      
      A few things about the patch:
      - It treats the protodown reason bits as a counter to indicate
        active protodown users
      - Since the protodown attribute is already an exposed UAPI,
        the reason is not enforced on a protodown set. It is a no-op
        if not used.
      
      The patch follows the algorithm below:
        - presence of reason bits set indicates protodown
          is in use
        - user can set protodown and protodown reason in a
          single or multiple setlink operations
        - a setlink operation to clear protodown will return -EBUSY
          if there are active protodown reason bits
        - reason is not included in link dumps if not used
      
      Example with patched iproute2:
      $ cat /etc/iproute2/protodown_reasons.d/r.conf
      0 mlag
      1 evpn
      2 vrrp
      3 psecurity
      
      $ ip link set dev vxlan0 protodown on protodown_reason vrrp on
      $ ip link set dev vxlan0 protodown_reason mlag on
      $ ip link show
      14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
      DEFAULT group default qlen 1000
          link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>
      
      $ ip link set dev vxlan0 protodown_reason mlag off
      $ ip link set dev vxlan0 protodown off protodown_reason vrrp off
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
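The protodown algorithm listed in the commit above can be sketched as a pair of helpers operating on a reason bitmask. This is a minimal model, not the rtnetlink code: `struct link_state`, `set_reason()` and `set_protodown()` are hypothetical names, and the bit numbers in the usage note are taken from the example `r.conf` mapping (0 = mlag, 2 = vrrp).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link state: the protodown flag plus active reason bits. */
struct link_state {
    bool protodown;
    uint32_t reasons;
};

/* Set or clear one reason bit; `bit` must be below 32 in this sketch. */
static void set_reason(struct link_state *l, unsigned int bit, bool on)
{
    if (on)
        l->reasons |= UINT32_C(1) << bit;
    else
        l->reasons &= ~(UINT32_C(1) << bit);
}

/* Clearing protodown fails with -EBUSY while any reason bit is active. */
static int set_protodown(struct link_state *l, bool on)
{
    if (!on && l->reasons != 0)
        return -EBUSY;

    l->protodown = on;
    return 0;
}
```

Replaying the iproute2 example against this model: after setting protodown with the vrrp and mlag reasons, `set_protodown(&l, false)` returns `-EBUSY` until both reason bits have been cleared. A link that never uses reasons keeps `reasons == 0`, so clearing protodown behaves as it did before.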
    • Y
      tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS · 48040793
      Committed by Yousuk Seung
      This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS, which reports
      the earliest departure time (EDT) of the timestamped skb. By tracking EDT
      values of the skb from different timestamps, we can observe when and how
      much the value changed. This allows measuring the precise delay
      injected on the sender host, e.g. by a bpf-based throttler.
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 31 Jul, 2020 7 commits