提交 · 5acc5c063196b4a531a761a954023c1848ec832b · openanolis / cloud-kernel

05 12月, 2017 1 次提交

KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06

由 Brijesh Singh 提交于 12月 04, 2017

If the hardware supports memory encryption then the
KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform
specific memory encryption commands.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>

5acc5c06

21 11月, 2017 2 次提交

bpf: revert report offload info to user space · 1ee64009

由 Jakub Kicinski 提交于 11月 20, 2017

This reverts commit bd601b6a ("bpf: report offload info to user
space").  The ifindex by itself is not sufficient, we should provide
information on which network namespace this ifindex belongs to.
After considering some options we concluded that it's best to just
remove this API for now, and rework it in -next.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1ee64009

bpf: offload: rename the ifindex field · 1f6f4cb7

由 Jakub Kicinski 提交于 11月 20, 2017

bpf_target_prog seems long and clunky, rename it to prog_ifindex.
We don't want to call this field just ifindex, because maps
may need a similar field in the future and bpf_attr members for
programs and maps are unnamed.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1f6f4cb7

18 11月, 2017 1 次提交

kcov: support comparison operands collection · ded97d2c

由 Victor Chibotaru 提交于 11月 17, 2017

Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.comSigned-off-by: NVictor Chibotaru <tchibo@google.com>
Signed-off-by: NAlexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ded97d2c

16 11月, 2017 1 次提交

ipv6: sr: update the struct ipv6_sr_hdr · 6ab6a0dd

由 Ahmed Abdelsalam 提交于 11月 15, 2017

The IPv6 Segment Routing Header (SRH) format has been updated (revision 6
of the SRH ietf draft). The update includes the following SRH fields:

(1) The "First Segment" field changed to be "Last Entry" which contains
the index, in the Segment List, of the last element of the Segment List.

(2) The 16 bit "reserved" field now is used as a "tag" which tags a packet
as part of a class or group of packets, e.g.,packets sharing the same
set of properties.

This patch updates the struct ipv6_sr_hdr, so it complies with the updated
SRH draft. The 16 bit "reserved" field is changed to be "tag", In addition
a comment is added to the "first_segment" field, showing that it represents
the "Last Entry" field of the SRH.
Signed-off-by: NAhmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ab6a0dd

15 11月, 2017 2 次提交

uapi: fix linux/tls.h userspace compilation error · b9f3eb49

由 Dmitry V. Levin 提交于 11月 14, 2017

Move inclusion of a private kernel header <net/tcp.h>
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:

/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory

As to this point uapi/linux/tls.h was totaly unusuable for userspace,
cleanup this header file further by moving other redundant includes
to net/tls.h.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9f3eb49

uapi: fix linux/rxrpc.h userspace compilation errors · 0eef304b

由 Dmitry V. Levin 提交于 11月 13, 2017

Consistently use types provided by <linux/types.h> to fix the following
linux/rxrpc.h userspace compilation errors:

/usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16'
  u16  srx_service; /* service desired */
/usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16'
  u16  transport_type; /* type of transport socket (SOCK_DGRAM) */
/usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16'
  u16  transport_len; /* length of transport address */

Use __kernel_sa_family_t instead of sa_family_t the same way
as uapi/linux/in.h does, to fix the following
linux/rxrpc.h userspace compilation errors:

/usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t'
  sa_family_t srx_family; /* address family */
/usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t'
  sa_family_t family;  /* transport address family */

Fixes: 727f8914 ("rxrpc: Expose UAPI definitions to userspace")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eef304b

14 11月, 2017 2 次提交

PCI/ASPM: Add L1 Substates definitions · a48f3d5b

由 Bjorn Helgaas 提交于 11月 13, 2017

Add and use #defines for L1 Substate register fields instead of hard-coding
the masks.  Also update comments to use names from the spec.  No functional
change intended.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NVidya Sagar <vidyas@nvidia.com>

a48f3d5b

PCI/ASPM: Reformat ASPM register definitions · 7f88ba4a

由 Bjorn Helgaas 提交于 11月 10, 2017

Reformat register field definitions in the style used elsewhere and align
comments with names used in the spec.  No functional change intended.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NVidya Sagar <vidyas@nvidia.com>

7f88ba4a

13 11月, 2017 5 次提交

afs: Lay the groundwork for supporting network namespaces · f044c884

由 David Howells 提交于 11月 02, 2017

Lay the groundwork for supporting network namespaces (netns) to the AFS
filesystem by moving various global features to a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

 (1) Store the netns in the superblock info.  This will be obtained from
     the mounter's nsproxy on a manual mount and inherited from the parent
     superblock on an automount.

 (2) The cell list is made per-netns.  It can be viewed through
     /proc/net/afs/cells and also be modified by writing commands to that
     file.

 (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
     This is unset by default.

 (4) The 'rootcell' module parameter, which sets a cell and VL server list
     modifies the init net namespace, thereby allowing an AFS root fs to be
     theoretically used.

 (5) The volume location lists and the file lock manager are made
     per-netns.

 (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.

Changes still to be made:

 (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
     from the old name.

 (2) A per-netns subsys needs to be registered for AFS into which it can
     store its per-netns data.

 (3) Rather than the AF_RXRPC socket being opened on module init, it needs
     to be opened on the creation of a superblock in that netns.

 (4) The socket needs to be closed when the last superblock using it is
     destroyed and all outstanding client calls on it have been completed.
     This prevents a reference loop on the namespace.

 (5) It is possible that several namespaces will want to use AFS, in which
     case each one will need its own UDP port.  These can either be set
     through /proc/net/afs/cm_port or the kernel can pick one at random.
     The init_ns gets 7001 by default.

Other issues that need resolving:

 (1) The DNS keyring needs net-namespacing.

 (2) Where do upcalls go (eg. DNS request-key upcall)?

 (3) Need something like open_socket_in_file_ns() syscall so that AFS
     command line tools attempting to operate on an AFS file/volume have
     their RPC calls go to the right place.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

f044c884

openvswitch: Add meter action support · cd8a6c33

由 Andy Zhou 提交于 11月 10, 2017

Implements OVS kernel meter action support.
Signed-off-by: NAndy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd8a6c33

openvswitch: Add meter netlink definitions · 57940406

由 Andy Zhou 提交于 11月 10, 2017

Meter has its own netlink family. Define netlink messages and attributes
for communicating with the user space programs.
Signed-off-by: NAndy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57940406

netem: support delivering packets in delayed time slots · 836af83b

由 Dave Taht 提交于 11月 08, 2017

Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.

It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.

The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.

The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.

Examples of use:

tc qdisc add dev eth0 root netem delay 200us \
         slot 800us 10ms bytes 64k packets 42

A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:

tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
         limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
         slot 800us 10ms bytes 64k packets 42 limit 512
Signed-off-by: NDave Taht <dave.taht@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

836af83b

netem: add uapi to express delay and jitter in nanoseconds · 99803171

由 Dave Taht 提交于 11月 08, 2017

netem userspace has long relied on a horrible /proc/net/psched hack
to translate the current notion of "ticks" to nanoseconds.

Expressing latency and jitter instead, in well defined nanoseconds,
increases the dynamic range of emulated delays and jitter in netem.

It will also ease a transition where reducing a tick to nsec
equivalence would constrain the max delay in prior versions of
netem to only 4.3 seconds.
Signed-off-by: NDave Taht <dave.taht@gmail.com>
Suggested-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99803171

11 11月, 2017 5 次提交

tcp: retire FACK loss detection · 713bafea

由 Yuchung Cheng 提交于 11月 08, 2017

FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NSoheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

713bafea

D
bpf: Revert bpf_overrid_function() helper changes. · f3edacbd
由 David S. Miller 提交于 11月 11, 2017
```
NACK'd by x86 maintainer.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
f3edacbd

net: ipv6: sysctl to specify IPv6 ND traffic class · 2210d6b2

由 Maciej Żenczykowski 提交于 11月 07, 2017

Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.

Currently this includes:

  - Router Solicitation (ICMPv6 type 133)
    ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Solicitation (ICMPv6 type 135)
    ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Advertisement (ICMPv6 type 136)
    ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Redirect (ICMPv6 type 137)
    ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()

and if the kernel ever gets around to generating RA's,
it would presumably also include:

  - Router Advertisement (ICMPv6 type 134)
    (radvd daemon could pick up on the kernel setting and use it)

Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme.  An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:

    https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11

The expected primary use case is to properly prioritize ND over wifi.

Testing:
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  0
  jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  255
  jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  34

  jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
  tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
  jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
  length 24, tgt is jzem22.pgc, Flags [solicited]

(based on original change written by Erik Kline, with minor changes)

v2: fix 'suspicious rcu_dereference_check() usage'
    by explicitly grabbing the rcu_read_lock.

Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: NErik Kline <ek@google.com>
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2210d6b2

bpf: add a bpf_override_function helper · dd0bb688

由 Josef Bacik 提交于 11月 07, 2017

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd0bb688

audit: filter PATH records keyed on filesystem magic · 42d5e376

由 Richard Guy Briggs 提交于 8月 23, 2017

Tracefs or debugfs were causing hundreds to thousands of PATH records to
be associated with the init_module and finit_module SYSCALL records on a
few modules when the following rule was in place for startup:
	-a always,exit -F arch=x86_64 -S init_module -F key=mod-load

Provide a method to ignore these large number of PATH records from
overwhelming the logs if they are not of interest.  Introduce a new
filter list "AUDIT_FILTER_FS", with a new field type AUDIT_FSTYPE,
which keys off the filesystem 4-octet hexadecimal magic identifier to
filter specific filesystem PATH records.

An example rule would look like:
	-a never,filesystem -F fstype=0x74726163 -F key=ignore_tracefs
	-a never,filesystem -F fstype=0x64626720 -F key=ignore_debugfs

Arguably the better way to address this issue is to disable tracefs and
debugfs on boot from production systems.

See: https://github.com/linux-audit/audit-kernel/issues/16
See: https://github.com/linux-audit/audit-userspace/issues/8
Test case: https://github.com/linux-audit/audit-testsuite/issues/42Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: fixed the whitespace damage in kernel/auditsc.c]
Signed-off-by: NPaul Moore <paul@paul-moore.com>

42d5e376

10 11月, 2017 1 次提交

NFC: Add NFC_CMD_DEACTIVATE_TARGET support · 4d63adfe

由 Mark Greer 提交于 6月 15, 2017

Once an NFC target (i.e., a tag) is found, it remains active until
there is a failure reading or writing it (often caused by the target
moving out of range).  While the target is active, the NFC adapter
and antenna must remain powered.  This wastes power when the target
remains in range but the client application no longer cares whether
it is there or not.

To mitigate this, add a new netlink command that allows userspace
to deactivate an active target.  When issued, this command will cause
the NFC subsystem to act as though the target was moved out of range.
Once the command has been executed, the client application can power
off the NFC adapter to reduce power consumption.
Signed-off-by: NMark Greer <mgreer@animalcreek.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

4d63adfe

09 11月, 2017 3 次提交

KVM: s390: provide a capability for AIS state migration · da9a1446

由 Christian Borntraeger 提交于 11月 09, 2017

The AIS capability was introduced in 4.12, while the interface to
migrate the state was added in 4.13. Unfortunately it is not possible
for userspace to detect the migration capability without creating a flic
kvm device. As in QEMU the cpu model detection runs on the "none"
machine this will result in cpu model issues regarding the "ais"
capability.

To get the "ais" capability properly let's add a new KVM capability that
tells userspace that AIS states can be migrated.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NHalil Pasic <pasic@linux.vnet.ibm.com>

da9a1446

HID: wacom: generic: Send BTN_STYLUS3 when both barrel switches are set · 9e429d56

由 Jason Gerecke 提交于 11月 07, 2017

The Wacom Pro Pen 3D includes a third barrel switch which is intended to
be particularly useful in applications where one frequency uses pan, zoom,
and rotate to navigate around a scene or model. The pen is compatible with
the MobileStudio Pro, 2nd-gen Intuos Pro, and Cintiq Pro. When the third
button is pressed, these devices set both the HID_DG_BARRELSWITCH and
HID_DG_BARRELSWITCH2 usages since their HID descriptors do not include a
usage specific to the button.

Rather than send both BTN_STYLUS and BTN_STYLUS2 when the third button is
pressed, userspace (libinput) has requested that we detect this condition
and report a newly-defined BTN_STYLUS3 event instead. We could define a
quirk specific to devices compatible with the Pro Pen 3D, but the liklihood
of seeing both barrel switch bits set with other pens/devices is low enough
to not worry about (pens mechanically prevent accidental activation of
multiple switches).
Signed-off-by: NJason Gerecke <jason.gerecke@wacom.com>
Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: NPeter Hutterer <peter.hutterer@who-t.net>
Acked-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

9e429d56

EVM: Include security.apparmor in EVM measurements · 096b8546

由 Matthew Garrett 提交于 10月 13, 2017

Apparmor will be gaining support for security.apparmor labels, and it
would be helpful to include these in EVM validation now so appropriate
signatures can be generated even before full support is merged.
Signed-off-by: NMatthew Garrett <mjg59@google.com>
Acked-by: NJohn Johansen <John.johansen@canonical.com>
Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>

096b8546

08 11月, 2017 5 次提交

openvswitch: enable NSH support · b2d0f5d5

由 Yi Yang 提交于 11月 07, 2017

v16->17
 - Fixed disputed check code: keep them in nsh_push and nsh_pop
   but also add them in __ovs_nla_copy_actions

v15->v16
 - Add csum recalculation for nsh_push, nsh_pop and set_nsh
   pointed out by Pravin
 - Move nsh key into the union with ipv4 and ipv6 and add
   check for nsh key in match_validate pointed out by Pravin
 - Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
 - Check size in nsh_hdr_from_nlattr
 - Fixed four small issues pointed out By Jiri and Eric

v13->v14
 - Rename skb_push_nsh to nsh_push per Dave's comment
 - Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
 - Fix NSH header length check in set_nsh

v11->v12
 - Fix missing changes old comments pointed out
 - Fix new comments for v11

v10->v11
 - Fix the left three disputable comments for v9
   but not fixed in v10.

v9->v10
 - Change struct ovs_key_nsh to
       struct ovs_nsh_key_base base;
       __be32 context[NSH_MD1_CONTEXT_SIZE];
 - Fix new comments for v9

v8->v9
 - Fix build error reported by daily intel build
   because nsh module isn't selected by openvswitch

v7->v8
 - Rework nested value and mask for OVS_KEY_ATTR_NSH
 - Change pop_nsh to adapt to nsh kernel module
 - Fix many issues per comments from Jiri Benc

v6->v7
 - Remove NSH GSO patches in v6 because Jiri Benc
   reworked it as another patch series and they have
   been merged.
 - Change it to adapt to nsh kernel module added by NSH
   GSO patch series

v5->v6
 - Fix the rest comments for v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
 - Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
 - Fix comments for v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attriubte more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.
Signed-off-by: NYi Yang <yi.y.yang@intel.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NEric Garver <e@erig.me>
Acked-by: NPravin Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2d0f5d5

net_sch: red: Add offload ability to RED qdisc · 602f3baf

由 Nogah Frankel 提交于 11月 06, 2017

Add the ability to offload RED qdisc by using ndo_setup_tc.
There are four commands for RED offloading:
* TC_RED_SET: handles set and change.
* TC_RED_DESTROY: handle qdisc destroy.
* TC_RED_STATS: update the qdiscs counters (given as reference)
* TC_RED_XSTAT: returns red xstats.

Whether RED is being offloaded is being determined every time dump action
is being called because parent change of this qdisc could change its
offload state but doesn't require any RED function to be called.
Signed-off-by: NNogah Frankel <nogahf@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

602f3baf

ila: Add a hook type for LWT routes · fddb231e

由 Tom Herbert 提交于 11月 05, 2017

In LWT tunnels both an input and output route method is defined.
If both of these are executed in the same path then double translation
happens and the effect is not correct.

This patch adds a new attribute that indicates the hook type. Two
values are defined for route output and route output. ILA
translation is only done for the one that is set. The default is
to enable ILA on route output.
Signed-off-by: NTom Herbert <tom@quantonium.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fddb231e

ila: allow configuration of identifier type · 70d5aef4

由 Tom Herbert 提交于 11月 05, 2017

Allow identifier to be explicitly configured for a mapping.
This can either be one of the identifier types specified in the
ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the
identifier type is inferred from the identifier type field.
If a value other than ILA_ATYPE_USE_FORMAT is set for a
mapping then it is assumed that the identifier type field is
not present in an identifier.
Signed-off-by: NTom Herbert <tom@quantonium.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70d5aef4

ila: add checksum neutral map auto · 84287bb3

由 Tom Herbert 提交于 11月 05, 2017

Add checksum neutral auto that performs checksum neutral mapping
without using the C-bit. This is enabled by configuration of
a mapping.

The checksum neutral function has been split into
ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former
handles the C-bit and includes it in the adjustment value. The latter
just sets the adjustment value on the locator diff only.

Added configuration for checksum neutral map aut in ila_lwt
and ila_xlat.
Signed-off-by: NTom Herbert <tom@quantonium.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84287bb3

07 11月, 2017 1 次提交

usb: core: add Status Type definitions · 6f27f4f9

由 Felipe Balbi 提交于 11月 02, 2017

USB 3.1 added a PTM_STATUS type. Let's add a define for it and
following patches will let usb_get_status() accept the new argument.
Signed-off-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6f27f4f9

05 11月, 2017 6 次提交

bpf, cgroup: implement eBPF-based device controller for cgroup v2 · ebc614f6

由 Roman Gushchin 提交于 11月 05, 2017

Cgroup v2 lacks the device controller, provided by cgroup v1.
This patch adds a new eBPF program type, which in combination
of previously added ability to attach multiple eBPF programs
to a cgroup, will provide a similar functionality, but with some
additional flexibility.

This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: NRoman Gushchin <guro@fb.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebc614f6

bpf: report offload info to user space · bd601b6a

由 Jakub Kicinski 提交于 11月 03, 2017

Extend struct bpf_prog_info to contain information about program
being bound to a device.  Since the netdev may get destroyed while
program still exists we need a flag to indicate the program is
loaded for a device, even if the device is gone.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd601b6a

bpf: offload: add infrastructure for loading programs for a specific netdev · ab3f0063

由 Jakub Kicinski 提交于 11月 03, 2017

The fact that we don't know which device the program is going
to be used on is quite limiting in current eBPF infrastructure.
We have to reverse or limit the changes which kernel makes to
the loaded bytecode if we want it to be offloaded to a networking
device.  We also have to invent new APIs for debugging and
troubleshooting support.

Make it possible to load programs for a specific netdev.  This
helps us to bring the debug information closer to the core
eBPF infrastructure (e.g. we will be able to reuse the verifer
log in device JIT).  It allows device JITs to perform translation
on the original bytecode.

__bpf_prog_get() when called to get a reference for an attachment
point will now refuse to give it if program has a device assigned.
Following patches will add a version of that function which passes
the expected netdev in. @type argument in __bpf_prog_get() is
renamed to attach_type to make it clearer that it's only set on
attachment.

All calls to ndo_bpf are protected by rtnl, only verifier callbacks
are not.  We need a wait queue to make sure netdev doesn't get
destroyed while verifier is still running and calling its driver.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab3f0063

rtnetlink: use netnsid to query interface · 79e1ad14

由 Jiri Benc 提交于 11月 02, 2017

Currently, when an application gets netnsid from the kernel (for example as
the result of RTM_GETLINK call on one end of the veth pair), it's not much
useful. There's no reliable way to get to the netns fd from the netnsid, nor
does any kernel API accept netnsid.

Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
netns with the given netnsid in such case. Of course, the calling process
needs to have enough capabilities in the target name space; for now, require
CAP_NET_ADMIN. This can be relaxed in the future.

To signal to the calling process that the kernel understood the new
IFLA_IF_NETNSID attribute in the query, it will include it in the response.
This is needed to detect older kernels, as they will just ignore
IFLA_IF_NETNSID and query in the current name space.

This patch implemetns IFLA_IF_NETNSID only for get and dump. For set
operations, this can be extended later.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79e1ad14

openvswitch: reliable interface indentification in port dumps · 9354d452

由 Jiri Benc 提交于 11月 02, 2017

This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.

Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed and generally
used in other netlink APIs, is the pair ifindex+netnsid.

The solution is addition of the ifindex+netnsid pair (or only ifindex if in
the same name space) to vport get/dump operation.

On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9354d452

net/dcb: Add dscp to priority selector type · ee205981

由 Huy Nguyen 提交于 7月 18, 2017

IEEE specification P802.1Qcd/D2.1 defines priority selector 5.
This APP TLV selector defines DSCP to priority map.
This patch defines such DSCP selector.
Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

ee205981

04 11月, 2017 2 次提交

platform/x86: dell-smbios-wmi: introduce userspace interface · f2645fa3

由 Mario Limonciello 提交于 11月 01, 2017

It's important for the driver to provide a R/W ioctl to ensure that
two competing userspace processes don't race to provide or read each
others data.

This userspace character device will be used to perform SMBIOS calls
from any applications.

It provides an ioctl that will allow passing the WMI calling
interface buffer between userspace and kernel space.

This character device is intended to deprecate the dcdbas kernel module
and the interface that it provides to userspace.

To perform an SMBIOS IOCTL call using the character device userspace will
perform a read() on the the character device.  The WMI bus will provide
a u64 variable containing the necessary size of the IOCTL buffer.

The API for interacting with this interface is defined in documentation
as well as the WMI uapi header provides the format of the structures.

Not all userspace requests will be accepted.  The dell-smbios filtering
functionality will be used to prevent access to certain tokens and calls.

All whitelisted commands and tokens are now shared out to userspace so
applications don't need to define them in their own headers.
Signed-off-by: NMario Limonciello <mario.limonciello@dell.com>
Reviewed-by: NEdward O'Callaghan <quasisec@google.com>
Signed-off-by: NDarren Hart (VMware) <dvhart@infradead.org>

f2645fa3

platform/x86: wmi: create userspace interface for drivers · 44b6b766

由 Mario Limonciello 提交于 11月 01, 2017

For WMI operations that are only Set or Query readable and writable sysfs
attributes created by WMI vendor drivers or the bus driver makes sense.

For other WMI operations that are run on Method, there needs to be a
way to guarantee to userspace that the results from the method call
belong to the data request to the method call.  Sysfs attributes don't
work well in this scenario because two userspace processes may be
competing at reading/writing an attribute and step on each other's
data.

When a WMI vendor driver declares a callback method in the wmi_driver
the WMI bus driver will create a character device that maps to that
function.  This callback method will be responsible for filtering
invalid requests and performing the actual call.

That character device will correspond to this path:
/dev/wmi/$driver

Performing read() on this character device will provide the size
of the buffer that the character device needs to perform calls.
This buffer size can be set by vendor drivers through a new symbol
or when MOF parsing is available by the MOF.

Performing ioctl() on this character device will be interpretd
by the WMI bus driver. It will perform sanity tests for size of
data, test them for a valid instance, copy the data from userspace
and pass iton to the vendor driver to further process and run.

This creates an implicit policy that each driver will only be allowed
a single character device.  If a module matches multiple GUID's,
the wmi_devices will need to be all handled by the same wmi_driver.

The WMI vendor drivers will be responsible for managing inappropriate
access to this character device and proper locking on data used by
it.

When a WMI vendor driver is unloaded the WMI bus driver will clean
up the character device and any memory allocated for the call.
Signed-off-by: NMario Limonciello <mario.limonciello@dell.com>
Reviewed-by: NEdward O'Callaghan <quasisec@google.com>
Signed-off-by: NDarren Hart (VMware) <dvhart@infradead.org>

44b6b766

03 11月, 2017 3 次提交

arm64/sve: Add prctl controls for userspace vector length management · 2d2123bc

由 Dave Martin 提交于 10月 31, 2017

This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:

 * PR_SVE_SET_VL: set the thread's SVE vector length and vector
   length inheritance mode.

 * PR_SVE_GET_VL: get the same information.

Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly.  Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: NWill Deacon <will.deacon@arm.com>

2d2123bc

arm64/sve: ptrace and ELF coredump support · 43d4da2c

由 Dave Martin 提交于 10月 31, 2017

This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state.  This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.

Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.

Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state.  This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.

For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the reset of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.
Signed-off-by: NAlan Hayward <alan.hayward@arm.com>
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

43d4da2c

arm64/sve: Backend logic for setting the vector length · 7582e220

由 Dave Martin 提交于 10月 31, 2017

This patch implements the core logic for changing a task's vector
length on request from userspace.  This will be used by the ptrace
and prctl frontends that are implemented in later patches.

The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two.  To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

7582e220

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功