提交 · 271b955e52a965f729c9e67f281685c2e7d8726a · gsplhtlxg / clone-Linux

29 6月, 2018 1 次提交

bpf: Change bpf_fib_lookup to return lookup status · 4c79579b

由 David Ahern 提交于 6月 26, 2018

For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.

In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.

The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

4c79579b

21 6月, 2018 1 次提交

nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag. · 08ba91ee

由 Doron Roberts-Kedes 提交于 6月 15, 2018

If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will
issue a disconnect from nbd_release if the device has no remaining
bdev->bd_openers.

Fix ret val so reconfigure with only setting the flag succeeds.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

08ba91ee

16 6月, 2018 1 次提交

docs: Fix some broken references · 5fb94e9c

由 Mauro Carvalho Chehab 提交于 5月 08, 2018

As we move stuff around, some doc references are broken. Fix some of
them via this script:
	./scripts/documentation-file-ref-check --fix

Manually checked if the produced result is valid, removing a few
false-positives.
Acked-by: NTakashi Iwai <tiwai@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NStephen Boyd <sboyd@kernel.org>
Acked-by: NCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

5fb94e9c

15 6月, 2018 3 次提交

nl80211: fix some kernel doc tag mistakes · badbc27d

由 Luca Coelho 提交于 6月 08, 2018

There is a bunch of tags marking constants with &, which means struct
or enum name.  Replace them with %, which is the correct tag for
constants.
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>

badbc27d

aio: mark __aio_sigset::sigmask const · 94aefd32

由 Avi Kivity 提交于 6月 08, 2018

io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAvi Kivity <avi@scylladb.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

94aefd32

ALSA: usb-audio: Add bi-directional terminal types · 2dd5aa15

由 Jorge Sanjuan 提交于 6月 14, 2018

Define the bi-directional USB terminal types for audio devices.
Signed-off-by: NJorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

2dd5aa15

13 6月, 2018 1 次提交

netfilter: nft_dynset: do not reject set updates with NFT_SET_EVAL · 215a31f1

由 Pablo Neira Ayuso 提交于 6月 11, 2018

NFT_SET_EVAL is signalling the kernel that this sets can be updated from
the evaluation path, even if there are no expressions attached to the
element. Otherwise, set updates with no expressions fail. Update
description to describe the right semantics.

Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

215a31f1

12 6月, 2018 3 次提交

kvm: fix typo in flag name · 766d3571

由 Michael S. Tsirkin 提交于 6月 08, 2018

KVM_X86_DISABLE_EXITS_HTL really refers to exit on halt.
Obviously a typo: should be named KVM_X86_DISABLE_EXITS_HLT.

Fixes: caa057a2 ("KVM: X86: Provide a capability to disable HLT intercepts")
Cc: stable@vger.kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

766d3571

virtio: update the comments for transport features · 2eb98105

由 Tiwei Bie 提交于 6月 01, 2018

The existing comments for transport features are outdated.
So update them to address the latest changes in the spec.
Suggested-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NTiwei Bie <tiwei.bie@intel.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

2eb98105

virtio_pci: support enabling VFs · cfecc291

由 Tiwei Bie 提交于 6月 01, 2018

There is a new feature bit allocated in virtio spec to
support SR-IOV (Single Root I/O Virtualization):

https://github.com/oasis-tcs/virtio-spec/issues/11

This patch enables the support for this feature bit in
virtio driver.
Signed-off-by: NTiwei Bie <tiwei.bie@intel.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

cfecc291

08 6月, 2018 4 次提交

autofs4: merge auto_fs.h and auto_fs4.h · ef8b42f7

由 Ian Kent 提交于 6月 07, 2018

The autofs module has long since been removed so there's no need to have
two separate include files for autofs.

Link: http://lkml.kernel.org/r/152626703024.28589.9571964661718767929.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef8b42f7

mm: mark pages in use for page tables · 1d40a5ea

由 Matthew Wilcox 提交于 6月 07, 2018

Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables.  This can be helpful when debugging crashdumps or
analysing memory fragmentation.  Add a KPF flag to report these pages to
userspace and update page-types.c to interpret that flag.

Note that only pages currently accounted as NR_PAGETABLES are tracked as
PageTable; this does not include pgd/p4d/pud/pmd pages.  Those will be the
subject of a later patch.

Link: http://lkml.kernel.org/r/20180518194519.3820-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d40a5ea

xsk: Fix umem fill/completion queue mmap on 32-bit · a5a16e43

由 Geert Uytterhoeven 提交于 6月 07, 2018

With gcc-4.1.2 on 32-bit:

net/xdp/xsk.c:663: warning: integer constant is too large for ‘long’ type
net/xdp/xsk.c:665: warning: integer constant is too large for ‘long’ type

Add the missing "ULL" suffixes to the large XDP_UMEM_PGOFF_*_RING values
to fix this.

net/xdp/xsk.c:663: warning: comparison is always false due to limited range of data type
net/xdp/xsk.c:665: warning: comparison is always false due to limited range of data type

"unsigned long" is 32-bit on 32-bit systems, hence the offset is
truncated, and can never be equal to any of the XDP_UMEM_PGOFF_*_RING
values. Use loff_t (and the required cast) to fix this.

Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
Fixes: fe230832 ("xsk: add umem completion queue support and mmap")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

a5a16e43

netfilter: nf_tables: add NFT_LOGLEVEL_* enumeration and use it · 7eced5ab

由 Pablo Neira Ayuso 提交于 6月 03, 2018

This is internal, not exposed through uapi, and although it maps with
userspace LOG_*, with the introduction of LOGLEVEL_AUDIT we are
incurring in namespace pollution.

This patch adds the NFT_LOGLEVEL_ enumeration and use it from nft_log.

Fixes: 1a893b44 ("netfilter: nf_tables: Add audit support to log statement")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7eced5ab

07 6月, 2018 1 次提交

netfilter: nf_conntrack: Increase __IPS_MAX_BIT with new bit IPS_OFFLOAD_BIT · 64e6dd1f

由 Gao Feng 提交于 6月 07, 2018

The __IPS_MAX_BIT is used in __ctnetlink_change_status as the max bit
value. When add new bit IPS_OFFLOAD_BIT whose value is 14, we should
increase the __IPS_MAX_BIT too, from 14 to 15.

There is no any bug in current codes, although it lost one loop in
__ctnetlink_change_status. Because the new bit IPS_OFFLOAD_BIT belongs
the IPS_UNCHANGEABLE_MASK.
Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

64e6dd1f

06 6月, 2018 3 次提交

rseq: Introduce restartable sequences system call · d7822b1e

由 Mathieu Desnoyers 提交于 6月 02, 2018

Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0                 2250.0         1.1
x86-64:               117.4                   98.0         1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                 128.5          5.8
x86-64:                53.4                  28.6          1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload response-time has 1-2% gain avg. latency, and
the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.

Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdfSigned-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com

d7822b1e

uapi/headers: Provide types_32_64.h · b575e837

由 Mathieu Desnoyers 提交于 6月 02, 2018

Provide helper macros for fields which represent pointers in
kernel-userspace ABI. This facilitates handling of 32-bit
user-space by 64-bit kernels by defining those fields as
32-bit 0-padding and 32-bit integer on 32-bit architectures,
which allows the kernel to treat those as 64-bit integers.
The order of padding and 32-bit integer depends on the
endianness.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180602124408.8430-2-mathieu.desnoyers@efficios.com

b575e837

ncpfs: remove uapi .h files · 05e98465

由 Greg Kroah-Hartman 提交于 6月 01, 2018

Now that ncpfs is removed from the tree, there is no need to keep the
uapi header files around as no one uses them, and it is not a feature
that the kernel supports anymore.
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

05e98465

05 6月, 2018 1 次提交

xsk: add zero-copy support for Rx · 173d3adb

由 Björn Töpel 提交于 6月 04, 2018

Extend the xsk_rcv to support the new MEM_TYPE_ZERO_COPY memory, and
wireup ndo_bpf call in bind.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

173d3adb

04 6月, 2018 4 次提交

xsk: new descriptor addressing scheme · bbff2f32

由 Björn Töpel 提交于 6月 04, 2018

Currently, AF_XDP only supports a fixed frame-size memory scheme where
each frame is referenced via an index (idx). A user passes the frame
index to the kernel, and the kernel acts upon the data.  Some NICs,
however, do not have a fixed frame-size model, instead they have a
model where a memory window is passed to the hardware and multiple
frames are filled into that window (referred to as the "type-writer"
model).

By changing the descriptor format from the current frame index
addressing scheme, AF_XDP can in the future be extended to support
these kinds of NICs.

In the index-based model, an idx refers to a frame of size
frame_size. Addressing a frame in the UMEM is done by offseting the
UMEM starting address by a global offset, idx * frame_size + offset.
Communicating via the fill- and completion-rings are done by means of
idx.

In this commit, the idx is removed in favor of an address (addr),
which is a relative address ranging over the UMEM. To convert an
idx-based address to the new addr is simply: addr = idx * frame_size +
offset.

We also stop referring to the UMEM "frame" as a frame. Instead it is
simply called a chunk.

To transfer ownership of a chunk to the kernel, the addr of the chunk
is passed in the fill-ring. Note, that the kernel will mask addr to
make it chunk aligned, so there is no need for userspace to do
that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
3000 to the fill-ring will refer to the same chunk.

On the completion-ring, the addr will match that of the Tx descriptor,
passed to the kernel.

Changing the descriptor format to use chunks/addr will allow for
future changes to move to a type-writer based model, where multiple
frames can reside in one chunk. In this model passing one single chunk
into the fill-ring, would potentially result in multiple Rx
descriptors.

This commit changes the uapi of AF_XDP sockets, and updates the
documentation.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

bbff2f32

bpf: flowlabel in bpf_fib_lookup should be flowinfo · bd3a08aa

由 David Ahern 提交于 6月 03, 2018

As Michal noted the flow struct takes both the flow label and priority.
Update the bpf_fib_lookup API to note that it is flowinfo and not just
the flow label.

Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

bd3a08aa

bpf: implement bpf_get_current_cgroup_id() helper · bf6fa2c8

由 Yonghong Song 提交于 6月 03, 2018

bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

The later patch will provide an example to show that
userspace can get the same cgroup id so it could
configure a filter or policy in the bpf program based on
task cgroup id.

The helper is currently implemented for tracing. It can
be added to other program types as well when needed.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

bf6fa2c8

rpmsg: char: Switch to SPDX license identifier · 136200f4

由 Suman Anna 提交于 5月 31, 2018

Use the appropriate SPDX license identifier in the rpmsg char driver
source file and drop the previous boilerplate license text. The uapi
header file already had the SPDX license identifier added as part of
a mass update but the license text removal was deferred for later,
and this patch drops the same.
Signed-off-by: NSuman Anna <s-anna@ti.com>
Signed-off-by: NBjorn Andersson <bjorn.andersson@linaro.org>

136200f4

03 6月, 2018 4 次提交

bpf: make sure to clear unused fields in tunnel/xfrm state fetch · 1fbc2e0c

由 Daniel Borkmann 提交于 6月 02, 2018

Since the remaining bits are not filled in struct bpf_tunnel_key
resp. struct bpf_xfrm_state and originate from uninitialized stack
space, we should make sure to clear them before handing control
back to the program.

Also add a padding element to struct bpf_xfrm_state for future use
similar as we have in struct bpf_tunnel_key and clear it as well.

  struct bpf_xfrm_state {
      __u32                      reqid;            /*     0     4 */
      __u32                      spi;              /*     4     4 */
      __u16                      family;           /*     8     2 */

      /* XXX 2 bytes hole, try to pack */

      union {
          __u32              remote_ipv4;          /*           4 */
          __u32              remote_ipv6[4];       /*          16 */
      };                                           /*    12    16 */

      /* size: 28, cachelines: 1, members: 4 */
      /* sum members: 26, holes: 1, sum holes: 2 */
      /* last cacheline: 28 bytes */
  };
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

1fbc2e0c

bpf: add bpf_skb_cgroup_id helper · cb20b08e

由 Daniel Borkmann 提交于 6月 02, 2018

Add a new bpf_skb_cgroup_id() helper that allows to retrieve the
cgroup id from the skb's socket. This is useful in particular to
enable bpf_get_cgroup_classid()-like behavior for cgroup v1 in
cgroup v2 by allowing ID based matching on egress. This can in
particular be used in combination with applying policy e.g. from
map lookups, and also complements the older bpf_skb_under_cgroup()
interface. In user space the cgroup id for a given path can be
retrieved through the f_handle as demonstrated in [0] recently.

  [0] https://lkml.org/lkml/2018/5/22/1190Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

cb20b08e

PCI/DPC: Disable ERR_NONFATAL handling by DPC · 6927868e

由 Oza Pawandeep 提交于 5月 17, 2018

PCIe ERR_NONFATAL errors mean a particular transaction is unreliable but
the Link is otherwise fully functional (PCIe r4.0, sec 6.2.2).

The AER driver handles these by logging the error details and calling
driver-supplied pci_error_handlers callbacks.  It does not reset downstream
devices, does not remove them from the PCI subsystem, does not re-enumerate
them, and does not call their driver .remove() or .probe() methods.

But DPC driver previously enabled DPC on ERR_NONFATAL, so if the hardware
supports DPC, these errors caused a Link reset (performed automatically by
the hardware), followed by the DPC driver removing affected devices (which
calls their .remove() methods), bringing the Link back up, and
re-enumerating (which calls driver .probe() methods).

Disable ERR_NONFATAL DPC triggering so these errors will only be handled by
AER.  This means drivers won't have to deal with different usage of their
pci_error_handlers callbacks and .probe() and .remove() methods based on
whether the platform has DPC support.
Signed-off-by: NOza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

6927868e

netfilter: nf_tables: add connlimit support · 290180e2

由 Pablo Neira Ayuso 提交于 6月 02, 2018

This features which allows you to limit the maximum number of
connections per arbitrary key. The connlimit expression is stateful,
therefore it can be used from meters to dynamically populate a set, this
provides a mapping to the iptables' connlimit match. This patch also
comes that allows you define static connlimit policies.

This extension depends on the nf_conncount infrastructure.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

290180e2

02 6月, 2018 1 次提交

bpf: fix uapi hole for 32 bit compat applications · 36f9814a

由 Daniel Borkmann 提交于 6月 02, 2018

In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
case of struct bpf_map_info but also struct bpf_prog_info. In net-next
commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
added a bitfield into it to expose some flags related to programs. Thus,
add an unnamed __u32 bitfield for both so that alignment keeps the same
in both 32 and 64 bit cases, and can be naturally extended from there
as in b85fab0e.

Before:

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
	__u32                      type;                 /*     0     4 */
	__u32                      id;                   /*     4     4 */
	__u32                      key_size;             /*     8     4 */
	__u32                      value_size;           /*    12     4 */
	__u32                      max_entries;          /*    16     4 */
	__u32                      map_flags;            /*    20     4 */
	char                       name[16];             /*    24    16 */
	__u32                      ifindex;              /*    40     4 */
	__u64                      netns_dev;            /*    44     8 */
	__u64                      netns_ino;            /*    52     8 */

	/* size: 64, cachelines: 1, members: 10 */
	/* padding: 4 */
  };

After (same as on 64 bit):

  # file test.o
  test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
  # pahole test.o
  struct bpf_map_info {
	__u32                      type;                 /*     0     4 */
	__u32                      id;                   /*     4     4 */
	__u32                      key_size;             /*     8     4 */
	__u32                      value_size;           /*    12     4 */
	__u32                      max_entries;          /*    16     4 */
	__u32                      map_flags;            /*    20     4 */
	char                       name[16];             /*    24    16 */
	__u32                      ifindex;              /*    40     4 */

	/* XXX 4 bytes hole, try to pack */

	__u64                      netns_dev;            /*    48     8 */
	__u64                      netns_ino;            /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */

	/* size: 64, cachelines: 1, members: 10 */
	/* sum members: 60, holes: 1, sum holes: 4 */
  };
Reported-by: NDmitry V. Levin <ldv@altlinux.org>
Reported-by: NEugene Syromiatnikov <esyr@redhat.com>
Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

36f9814a

01 6月, 2018 4 次提交

netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer · d32de98e

由 Pablo Neira Ayuso 提交于 5月 31, 2018

This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d32de98e

netfilter: nf_tables: Add audit support to log statement · 1a893b44

由 Phil Sutter 提交于 5月 30, 2018

This extends log statement to support the behaviour achieved with
AUDIT target in iptables.

Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1a893b44

netfilter: nf_tables: add support for native socket matching · 554ced0a

由 Máté Eckl 提交于 5月 28, 2018

Now it can only match the transparent flag of an ip/ipv6 socket.
Signed-off-by: NMáté Eckl <ecklm94@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

554ced0a

rtnetlink: Add more well known protocol values · 35aada99

由 Donald Sharp 提交于 5月 30, 2018

FRRouting installs routes into the kernel associated with
the originating protocol.  Add these values to the well
known values in rtnetlink.h.
Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35aada99

31 5月, 2018 5 次提交

fs: Add aio iopriority support · d9a08a9e

由 Adam Manzanares 提交于 5月 22, 2018

This is the per-I/O equivalent of the ioprio_set system call.

When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.

This patch depends on block: add ioprio_check_cap function.
Signed-off-by: NAdam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d9a08a9e

btrfs: Add unprivileged version of ino_lookup ioctl · 23d0b79d

由 Tomohiro Misono 提交于 5月 21, 2018

Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvolume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.

This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is  because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.

The main differences from original ino_lookup ioctl are:

  1. Read + Exec permission will be checked using inode_permission()
     during path construction. -EACCES will be returned in case
     of failure.
  2. Path construction will be stopped at the inode number which
     corresponds to the fd with which this ioctl is called. If
     constructed path does not exist under fd's inode, -EACCES
     will be returned.
  3. The name of bottom subvolume is also searched and filled.

Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.
Reviewed-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Tested-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: NTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

23d0b79d

btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF · 42e4b520

由 Tomohiro Misono 提交于 5月 21, 2018

Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns
ROOT_REF information of the subvolume containing this inode except the
subvolume name (this is because to prevent potential name leak). The
subvolume name will be gained by user version of ino_lookup ioctl
(BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.

The min id of root ref's subvolume to be searched is specified by
@min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM,
-EOVERFLOW is returned. Therefore the caller can just call this ioctl
again without changing the argument to continue search.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: NTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes and struct item renames ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

42e4b520

btrfs: Add unprivileged ioctl which returns subvolume information · b64ec075

由 Tomohiro Misono 提交于 5月 21, 2018

Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
Reviewed-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: NGu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: NTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ minor style fixes, update struct comments ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b64ec075

crypto: ccp - Add GET_ID SEV command · 0b3a830b

由 Janakarajan Natarajan 提交于 5月 25, 2018

The GET_ID command, added as of SEV API v0.16, allows the SEV firmware
to be queried about a unique CPU ID. This unique ID can then be used
to obtain the public certificate containing the Chip Endorsement Key
(CEK) public key signed by the AMD SEV Signing Key (ASK).

For more information please refer to "Section 5.12 GET_ID" of
https://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdfSigned-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0b3a830b

30 5月, 2018 3 次提交

media: rc: introduce BPF_PROG_LIRC_MODE2 · f4364dcf

由 Sean Young 提交于 5月 27, 2018

Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
that the last key should be repeated.

The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
the target_fd must be the /dev/lircN device.
Acked-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NSean Young <sean@mess.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

f4364dcf

bpf: Drop mpls from bpf_fib_lookup · fa898d76

由 David Ahern 提交于 5月 29, 2018

MPLS support will not be submitted this dev cycle, but in working on it
I do see a few changes are needed to the API. For now, drop mpls from the
API. Since the fields in question are unions, the mpls fields can be added
back later without affecting the uapi.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

fa898d76

bpf: clean up eBPF helpers documentation · 7a279e93

由 Quentin Monnet 提交于 5月 29, 2018

These are minor edits for the eBPF helpers documentation in
include/uapi/linux/bpf.h.

The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends
with a non-escaped underscore that gets interpreted by rst2man and
produces the following message in the resulting manual page:

    DOCUTILS SYSTEM MESSAGES
           System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514)
                  Unknown target name: "bpf_fib_lookup".

Other edits consist in:

- Improving formatting for flag values for "bpf_fib_lookup()" helper.
- Emphasising a parameter name in description of the return value for
  "bpf_get_stack()" helper.
- Removing unnecessary blank lines between "Description" and "Return"
  sections for the few helpers that would use it, for consistency.
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

7a279e93