Commits · be26727772cd86979255dfaf1eea967338dc0c9b · openanolis / cloud-kernel

28 Dec, 2016 · 2 commits

net: xdp: remove unused bfp_warn_invalid_xdp_buffer() · be267277

Committed by Jason Wang on Dec 27, 2016

After commit 73b62bd0 ("virtio-net:
remove the warning before XDP linearizing"), there are no users of
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert of
commit f23bc46c.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

be267277

ipv4: Namespaceify tcp_tw_reuse knob · 56ab6b93

Committed by Haishuang Yan on Dec 25, 2016

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56ab6b93

23 Dec, 2016 · 2 commits

move aio compat to fs/aio.c · c00d2c7e

Committed by Al Viro on Dec 20, 2016

... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails.  Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c00d2c7e

IB/cma: Fix a race condition in iboe_addr_get_sgid() · fba332b0

Committed by Bart Van Assche on Dec 19, 2016

Code that dereferences the struct net_device ip_ptr member must be
protected with an in_dev_get() / in_dev_put() pair. Hence insert
calls to these functions.

Fixes: commit 7b85627b ("IB/cma: IBoE (RoCE) IP-based GID addressing")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fba332b0

21 Dec, 2016 · 5 commits

ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() · 8d3523fb

Committed by Lv Zheng on Dec 14, 2016

Since all users have been cleaned up, remove the 2 deprecated APIs.
As a Linux variable rather than an ACPICA variable, acpi_gbl_permanent_mmap
is renamed to acpi_permanent_mmap to have a consistent coding style across
the entire Linux ACPI subsystem.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

8d3523fb

ACPICA: Tables: Back port acpi_get_table_with_size() and... · 174cc718

Committed by Lv Zheng on Dec 14, 2016

ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel

ACPICA commit cac6790954d4d752a083e6122220b8a22febcd07

This patch back ports Linux acpi_get_table_with_size() and
early_acpi_os_unmap_memory() into ACPICA upstream to reduce divergences.

The 2 APIs have been used by Linux as table management APIs for a long
time. They contain a hidden rule: tables mapped during the early stage
must be unmapped before the early stage ends.

During the early stage, tables are handled by the following sequence:
 acpi_get_table_with_size();
 parse the table
 early_acpi_os_unmap_memory();
During the late stage, tables are handled by the following sequence:
 acpi_get_table();
 parse the table
Linux uses acpi_gbl_permanent_mmap to distinguish the early stage and the
late stage.

The reasoning of introducing acpi_get_table_with_size() is: ACPICA will
remember the early mapped pointer in acpi_get_table() and Linux isn't able to
prevent ACPICA from using the wrong early mapped pointer during the late
stage as there is no API provided from ACPICA to be an inverse of
acpi_get_table() to forget the early mapped pointer.

But how can ACPICA work with the early/late stage requirement? Inside
ACPICA, tables are kept in the "INSTALLED" state during the early stage,
and they are carefully not transitioned to the "VALIDATED" state
until the late stage. So the same logic is in fact implemented inside
ACPICA in a different way. The only gap is that the feature is not provided
to the OSPMs as an accessible external API.

It then is possible to fix the gap by providing an inverse of
acpi_get_table() from ACPICA, so that the two Linux sequences can be
combined:
 acpi_get_table();
 parse the table
 acpi_put_table();
To fit the current Linux code more easily, acpi_get_table() and
acpi_put_table() are implemented in a usage-counting style:
 1. When the usage count of the table is increased from 0 to 1, table is
    mapped and .Pointer is set with the mapping address (VALIDATED);
 2. When the usage count of the table is decreased from 1 to 0, .Pointer
    is unset and the mapping address is unmapped (INVALIDATED).
So that we can deploy the new APIs to Linux with minimal effort by just
invoking acpi_get_table() in acpi_get_table_with_size() and invoking
acpi_put_table() in early_acpi_os_unmap_memory(). Lv Zheng.
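
The usage-counting scheme described above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the ACPICA implementation: `struct table_desc`, `table_get()`, `table_put()`, and the `map_table()`/`unmap_table()` stand-ins are hypothetical names, and the "mapping" is just a pointer to a local buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table descriptor; the real ACPICA descriptor differs. */
struct table_desc {
    void *pointer;            /* mapping address, NULL while unmapped */
    unsigned int use_count;   /* number of outstanding table_get() calls */
    unsigned int map_calls;   /* bookkeeping for this sketch only */
    unsigned int unmap_calls; /* bookkeeping for this sketch only */
    char backing[16];         /* stands in for the physical table */
};

/* Stand-ins for the OS mapping primitives (assumptions, not real APIs). */
static void *map_table(struct table_desc *d)
{
    d->map_calls++;
    return d->backing;
}

static void unmap_table(struct table_desc *d)
{
    d->unmap_calls++;
}

/* 0 -> 1: map the table and set the pointer (VALIDATED). */
static void *table_get(struct table_desc *d)
{
    if (d->use_count == 0)
        d->pointer = map_table(d);
    d->use_count++;
    return d->pointer;
}

/* 1 -> 0: unset the pointer and unmap (INVALIDATED). */
static void table_put(struct table_desc *d)
{
    assert(d->use_count > 0); /* every put must pair with an earlier get */
    d->use_count--;
    if (d->use_count == 0) {
        d->pointer = NULL;
        unmap_table(d);
    }
}
```

With this shape, nested get/put pairs share one mapping, and only the last put tears it down, which is what lets the early-stage and late-stage sequences collapse into a single get/parse/put pattern.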

Link: https://github.com/acpica/acpica/commit/cac67909
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

174cc718

dt: bindings: net: use boolean dt properties for eee broken modes · 308d3165

Committed by jbrunet on Dec 19, 2016

The patches regarding eee-broken-modes were merged before all people
involved could find an agreement on the best way to move forward.

While we agreed on having a DT property to mark particular modes as broken,
the value used for eee-broken-modes mapped the phy register in a very direct
way. Because of this, the concern is that it could be used to implement
configuration policies instead of describing broken HW.

In the end, having a boolean property for each mode seems to be preferred
over one bit field value mapping the register (too) directly.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

308d3165

ratelimit: fix WARN_ON_RATELIMIT return value · 1b011e2f

Committed by Jiri Slaby on Dec 19, 2016

The macro is to be used similarly as WARN_ON as:

  if (WARN_ON_RATELIMIT(condition, state))
	do_something();

One would expect only 'condition' to affect the 'if', but
WARN_ON_RATELIMIT does internally only:

  WARN_ON((condition) && __ratelimit(state))

So the 'if' is affected by the ratelimiting state too.  Fix this by
returning 'condition' in any case.

Note that nobody uses WARN_ON_RATELIMIT yet, so there is nothing to
worry about.  But I was about to use it and was a bit surprised.
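
The difference between the old and the fixed return value can be sketched in user-space C. This is a simplified model, not the kernel macro: `struct rl_state`, `rl_allow()`, and the two `warn_on_ratelimit_*()` functions are hypothetical stand-ins for `struct ratelimit_state`, `__ratelimit()`, and `WARN_ON_RATELIMIT()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ratelimit_state: allow 'burst' hits. */
struct rl_state {
    int burst;
    int used;
};

/* Returns true while the caller is still allowed to emit a warning. */
static bool rl_allow(struct rl_state *rs)
{
    assert(rs != NULL);
    if (rs->used >= rs->burst)
        return false;
    rs->used++;
    return true;
}

static int warnings_printed;

static void warn(void)
{
    warnings_printed++;
    fprintf(stderr, "WARNING\n");
}

/* Old behaviour: the result depends on the rate limit as well. */
static bool warn_on_ratelimit_old(bool condition, struct rl_state *rs)
{
    bool fire = condition && rl_allow(rs);

    if (fire)
        warn();
    return fire;
}

/* Fixed behaviour: warn only when allowed, but always return 'condition'. */
static bool warn_on_ratelimit_fixed(bool condition, struct rl_state *rs)
{
    if (condition && rl_allow(rs))
        warn();
    return condition;
}
```

With a burst of 1, the second call to the old variant returns false even though the condition is still true, so an `if` around it would silently skip its body; the fixed variant keeps returning true and only the warning output is rate limited.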

Link: http://lkml.kernel.org/r/20161215093224.23126-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1b011e2f

ima: on soft reboot, save the measurement list · 7b8589cc

Committed by Mimi Zohar on Dec 19, 2016

The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (e.g. kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.

Link: http://lkml.kernel.org/r/1480554346-29071-7-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b8589cc

20 Dec, 2016 · 2 commits

NFS: Clean up nfs_attribute_timeout() · 187e593d

Committed by Trond Myklebust on Dec 16, 2016

It can be made static.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

187e593d

NFS: Remove unused function nfs_revalidate_inode_rcu() · 3f642a13

Committed by Trond Myklebust on Dec 16, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f642a13

19 Dec, 2016 · 1 commit

platform/x86: Add Whiskey Cove PMIC TMU support · 957ae509

Committed by Nilesh Bacchewar on Nov 07, 2016

This adds TMU (Time Management Unit) support for Intel BXT platform.
It enables the alarm wake-up functionality in the TMU unit of Whiskey Cove
PMIC.
Signed-off-by: Nilesh Bacchewar <nilesh.bacchewar@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[andy: resolve merge conflict in Kconfig]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

957ae509

18 Dec, 2016 · 7 commits

bpf: fix overflow in prog accounting · 5ccb071e

Committed by Daniel Borkmann on Dec 18, 2016

Commit aaac3ba9 ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog->pages.
Unlike map->pages, prog->pages are still subject to change when we
need to expand the program through bpf_prog_realloc().

This can for example happen during verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog->pages is not the same anymore as
its original value that we used to bpf_prog_charge_memlock() on. Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog
is freed eventually. I noticed this when, despite having unlimited
memlock, programs suddenly refused to load with an EPERM error due to
insufficient memlock.

There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog->pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) We want accounting
rather to be more accurate instead of further fooling limits, ii) adding
yet another page counter on struct bpf_prog would also be a waste just
for this purpose. We also do want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since only here we expand the program, so we try to account for the
needed delta and should we fail, call-sites check for outcome anyway.
On cBPF to eBPF migrations, we don't grab a reference to the user as
they are charged differently. With that in place, my test case worked
fine.
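
The wrap-around and the chosen fix can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: `struct user_acct`, `struct prog`, `charge()`, `uncharge()`, and the two `realloc_*()` helpers are hypothetical names, and the memlock limit check that the real charging path performs is left out.

```c
#include <assert.h>

/* Hypothetical per-user counter of locked pages (unsigned, so it can wrap). */
struct user_acct {
    unsigned long locked_pages;
};

/* Hypothetical program object; only the page count matters here. */
struct prog {
    unsigned long pages;
};

static void charge(struct user_acct *u, unsigned long pages)
{
    u->locked_pages += pages;
}

static void uncharge(struct user_acct *u, unsigned long pages)
{
    u->locked_pages -= pages; /* wraps if more is removed than was added */
}

/* Buggy resize: the program grows but the user is not charged for it. */
static void realloc_unaccounted(struct prog *p, unsigned long new_pages)
{
    p->pages = new_pages;
}

/* Fixed resize: charge the delta so a later uncharge stays balanced. */
static void realloc_accounted(struct user_acct *u, struct prog *p,
                              unsigned long new_pages)
{
    assert(new_pages >= p->pages); /* this sketch only models growth */
    charge(u, new_pages - p->pages);
    p->pages = new_pages;
}
```

Charging 1 page, growing to 2 pages without accounting, and then uncharging `p->pages` leaves the counter at the maximum `unsigned long` value, which is why later loads appear to exceed the limit. Accounting for the delta at resize time brings the counter back to 0.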

Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ccb071e

bpf: dynamically allocate digest scratch buffer · aafe6ae9

Committed by Daniel Borkmann on Dec 18, 2016

Geert rightfully complained that 7bd509e3 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during the program creation path, which can be
considered slow-path anyway, let's allocate that dynamically and not be
implicitly dependent on the verifier mutex. Move bpf_prog_calc_digest() to
the beginning of replace_map_fd_with_map_ptr(), and error handling also
stays straightforward.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

aafe6ae9

block: Remove unused member (busy) from struct blk_queue_tag · e8465447

Committed by Ritesh Harjani on Dec 16, 2016

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

e8465447

net: xdp: add invalid buffer warning · f23bc46c

Committed by John Fastabend on Dec 15, 2016

This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen, but a standard
warning is useful to catch cases in virtual/qemu setups that I may not
have expected from the emulation layer.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f23bc46c

irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h · 24c946cc

Committed by LABBE Corentin on Dec 15, 2016

This patch moves the define for IRNET_MINOR to include/linux/miscdevice.h.
It is better that all minor number definitions are in the same place.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

24c946cc

bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu · dcdc43d6

Committed by Daniel Mack on Dec 15, 2016

The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
Annotate it accordingly to squelch a sparse warning.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcdc43d6

inet: Fix get port to handle zero port number with soreuseport set · 0643ee4f

Committed by Tom Herbert on Dec 14, 2016

A user may call listen without binding an explicit port, with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent of creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unrelated port
number that was already created with soreuseport.

This patch adds a boolean parameter to inet_bind_conflict that indicates
whether soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if a port scan is done.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0643ee4f

17 Dec, 2016 · 2 commits

net: macb: Added PCI wrapper for Platform Driver. · 83a77e9e

Committed by Bartosz Folta on Dec 14, 2016

There are hardware PCI implementations of the Cadence GEM network
controller. This patch allows such hardware to be used by reusing the
existing Platform Driver.
Signed-off-by: Bartosz Folta <bfolta@cadence.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83a77e9e

x86/mpx: Move bd_addr to mm_context_t · cb02de96

Committed by Mark Rutland on Dec 16, 2016

Currently bd_addr lives in mm_struct, which is otherwise architecture
independent. Architecture-specific data is supposed to live within
mm_context_t (itself contained in mm_struct).

Other x86-specific context like the pkey accounting data lives in
mm_context_t, and there's no reason the MPX data can't also live there.
So as to keep the arch-specific data together, and to set a good example
for others, this patch moves bd_addr into x86's mm_context_t.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

cb02de96

16 Dec, 2016 · 6 commits

vfs: call vfs_clone_file_range() under freeze protection · 031a072a

Committed by Amir Goldstein on Sep 23, 2016

Move sb_start_write()/sb_end_write() out of the vfs helper and up into the
ioctl handler.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

031a072a

nfsd: add support for the umask attribute · 47057abd

Committed by Andreas Gruenbacher on Jan 12, 2016

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory.  That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more
details.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

47057abd

linux: drop __bitwise__ everywhere · 9efeccac

Committed by Michael S. Tsirkin on Dec 11, 2016

__bitwise__ used to mean "yes, please enable sparse checks
unconditionally", but now that we dropped __CHECK_ENDIAN__
__bitwise is exactly the same.
There aren't many users, replace it by __bitwise everywhere.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Lee Duncan <lduncan@suse.com>

9efeccac

linux/types.h: enable endian checks for all sparse builds · 05de9700

Committed by Michael S. Tsirkin on Dec 08, 2016

By now, linux is mostly endian-clean. Enabling endian-ness
checks for everyone produces about 200 new sparse warnings for me -
less than 10% over the 2000 sparse warnings already there.

Not a big deal, OTOH enabling this helps people notice
they are introducing new bugs.

So let's just drop __CHECK_ENDIAN__. Follow-up patches
can drop distinction between __bitwise and __bitwise__.
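
As a rough illustration of what the endian checks catch, the sketch below defines a little-endian type the way sparse-annotated code typically does. All `demo_*` names are hypothetical and are not the kernel's definitions. The attributes are only active when sparse defines `__CHECKER__`; a regular compiler sees plain integer types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under sparse, mark the type as "bitwise" so mixing it with plain
 * integers is flagged; a normal compiler sees empty macros. */
#ifdef __CHECKER__
#define demo_bitwise __attribute__((bitwise))
#define demo_force   __attribute__((force))
#else
#define demo_bitwise
#define demo_force
#endif

typedef uint32_t demo_bitwise demo_le32;

/* Convert a CPU-order value into little-endian storage order. */
static demo_le32 demo_cpu_to_le32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v & 0xff),
        (unsigned char)((v >> 8) & 0xff),
        (unsigned char)((v >> 16) & 0xff),
        (unsigned char)((v >> 24) & 0xff),
    };
    uint32_t raw;

    assert(sizeof(raw) == sizeof(b));
    memcpy(&raw, b, sizeof(raw));
    return (demo_force demo_le32)raw; /* explicit, checked conversion point */
}

/* Convert little-endian storage order back into a CPU-order value. */
static uint32_t demo_le32_to_cpu(demo_le32 v)
{
    uint32_t raw = (demo_force uint32_t)v;
    unsigned char b[4];

    memcpy(b, &raw, sizeof(raw));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With the annotation active, assigning a `demo_le32` directly to a `uint32_t` without going through a conversion helper is the kind of mistake sparse is expected to report.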

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

05de9700

vhost: remove unused feature bit · 8d390464

Committed by Jason Wang on Nov 18, 2016

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

8d390464

crypto: add virtio-crypto driver · dbaf0624

Committed by Gonglei on Dec 15, 2016

This patch introduces virtio-crypto driver for Linux Kernel.

The virtio crypto device is a virtual cryptography device
as well as a kind of virtual hardware accelerator for
virtual machines. The encryption and decryption requests
are placed in the data queue and are ultimately handled by
the backend crypto accelerators. The second queue is the
control queue used to create or destroy sessions for
symmetric algorithms and will control some advanced features
in the future. The virtio crypto device provides the following
crypto services: CIPHER, MAC, HASH, and AEAD.

For more information about virtio-crypto device, please see:
  http://qemu-project.org/Features/VirtioCrypto

CC: Michael S. Tsirkin <mst@redhat.com>
CC: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Halil Pasic <pasic@linux.vnet.ibm.com>
CC: David S. Miller <davem@davemloft.net>
CC: Zeng Xin <xin.zeng@intel.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

dbaf0624

15 Dec, 2016 · 13 commits

clocksource/dummy_timer: Move hotplug callback after the real timers · 9bf11ecc

Committed by Thomas Gleixner on Dec 15, 2016

When the dummy timer callback is invoked before the real timer callbacks,
it tries to install that timer for the starting CPU. If the platform
does not have a broadcast timer installed, the installation fails with a
kernel crash. The crash happens due to an unconditional dereference of the
non-available broadcast device. This needs to be fixed in the timer core code.

But even when this is fixed in the core code then installing the dummy
timer before the real timers is a pointless exercise.

Move it to the end of the callback list.

Fixes: 00c1d17a ("clocksource/dummy_timer: Convert to hotplug state machine")
Reported-and-tested-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr

9bf11ecc

EDAC: Document HW_EVENT_ERR_DEFERRED type · 4838a0de

Committed by Yazen Ghannam on Dec 01, 2016

Add a description of the HW_EVENT_ERR_DEFERRED type that wasn't included
with commit d12a969e ("EDAC, amd64: Add Deferred Error type").
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

4838a0de

edac.rst: move concepts dictionary from edac.h · 6b1fb6f7

Committed by Mauro Carvalho Chehab on Oct 29, 2016

Instead of storing the concepts dictionary inside a header file,
move it to the subsystem documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

6b1fb6f7

edac: fix kenel-doc markups at edac.h · e0020758

Committed by Mauro Carvalho Chehab on Oct 28, 2016

As this file was never added to the driver-api, the kernel-doc
markups there were never tested. Some of them have issues.
Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

e0020758

edac: move EDAC PCI definitions to drivers/edac/edac_pci.h · 0b892c71

Committed by Mauro Carvalho Chehab on Oct 29, 2016

The edac_core.h header contains data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.

Let's move the PCI ones to a separate header file, as part
of a header reorganization.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

0b892c71

x86/smpboot: Prevent false positive out of bounds cpumask access warning · 427d77a3

Committed by Thomas Gleixner on Dec 13, 2016

prefill_possible_map() reinitializes the cpu_possible_map by setting the
possible cpu bits and clearing all other bits up to NR_CPUS.

This is technically always correct because cpu_possible_map is statically
allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS
enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is
initialized to NR_CPUS and only limited after the set/clear bit loops have
been executed.

But if the system was booted with "nr_cpus=N" on the command line, where N
is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function
before prefill_possible_map() is invoked. As a consequence the cpumask
bounds check triggers when clearing the bits past nr_cpu_ids.

Add a helper which resets cpu_possible_map without the bounds check,
and then set only the possible bits, which are well inside bounds.
Reported-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: 0x7f454c46@gmail.com
Cc: Jan Beulich <JBeulich@novell.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

427d77a3

idr: reduce the number of bits per level from 8 to 6 · 424251a4

Committed by Matthew Wilcox on Dec 14, 2016

In preparation for merging the IDR and radix tree, reduce the fanout at
each level from 256 to 64. If this causes a performance problem then a
bisect will point to this commit, and we'll have a better idea about
what we might do to fix it.
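
The trade-off behind this change is simple arithmetic: fewer bits per level means a smaller fanout per node but more levels to cover the same ID range. The sketch below uses hypothetical helper names, and any ID width passed to it is an assumption for illustration rather than something taken from the IDR code.

```c
#include <assert.h>

/* Number of child slots per node for a given number of index bits. */
static unsigned int fanout(unsigned int bits_per_level)
{
    assert(bits_per_level > 0 && bits_per_level < 32);
    return 1u << bits_per_level;
}

/* Levels needed so that levels * bits_per_level covers id_bits (rounded up). */
static unsigned int levels_needed(unsigned int id_bits,
                                  unsigned int bits_per_level)
{
    assert(bits_per_level > 0);
    return (id_bits + bits_per_level - 1) / bits_per_level;
}
```

For example, `fanout(8)` is 256 and `fanout(6)` is 64, matching the numbers in the message above. Assuming 31 usable ID bits, `levels_needed(31, 8)` is 4 while `levels_needed(31, 6)` is 6, so lookups of large IDs may walk more levels, which is the kind of performance effect a bisect would point back to.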

Link: http://lkml.kernel.org/r/1480369871-5271-66-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

424251a4

rxrpc: abstract away knowledge of IDR internals · 44430612

Committed by Matthew Wilcox on Dec 14, 2016

Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

44430612

idr: add ida_is_empty · 99c49407

Committed by Matthew Wilcox on Dec 14, 2016

Two of the USB Gadgets were poking around in the internals of struct ida
in order to determine if it is empty. Add the appropriate abstraction.

Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

99c49407

radix-tree: add radix_tree_split_preload() · 2791653a

Committed by Matthew Wilcox on Dec 14, 2016

Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp->nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2791653a

radix-tree: add radix_tree_split · e157b555

Committed by Matthew Wilcox on Dec 14, 2016

This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e157b555

radix-tree: add radix_tree_join · 175542f5

Committed by Matthew Wilcox on Dec 14, 2016

This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry. From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

175542f5

radix-tree: delete radix_tree_range_tag_if_tagged() · 268f42de

Committed by Matthew Wilcox on Dec 14, 2016

This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

268f42de

openanolis / cloud-kernel · synced successfully more than 1 year ago