Commits · bb7ffbf29e76b89a86ca4c3ee0d4690641f2f772 · openanolis / cloud-kernel

01 Apr, 2015 1 commit

sunrpc: make debugfs file creation failure non-fatal · f9c72d10

Authored by Jeff Layton on Mar 31, 2015

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
     please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functions, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Symptoms were failing krb5 mounts on systems using gss-proxy and
selinux.

Fixes: 388f0c77 "sunrpc: add a debugfs rpc_xprt directory..."
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
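
The void-return pattern described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_clnt`, `demo_debugfs_create()` and the other `demo_*` names are hypothetical stand-ins, not the sunrpc or debugfs API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RPC client; not the kernel's struct rpc_clnt. */
struct demo_clnt {
    bool created;
    bool has_debugfs;   /* true only if the optional debug entry exists */
};

/* Hypothetical debug-file creation that may fail, e.g. because of policy. */
static bool demo_debugfs_create(bool allow)
{
    return allow;
}

/*
 * Void return: a failure is recorded but never propagated,
 * so callers have nothing to check.
 */
static void demo_clnt_debugfs_register(struct demo_clnt *clnt, bool allow)
{
    clnt->has_debugfs = demo_debugfs_create(allow);
}

/* Client creation succeeds whether or not the debug entry was created. */
static int demo_clnt_create(struct demo_clnt *clnt, bool allow_debugfs)
{
    clnt->created = true;
    demo_clnt_debugfs_register(clnt, allow_debugfs);
    return 0;
}
```

In this sketch, `demo_clnt_create()` returns 0 even when the debug entry could not be created, which mirrors the intent that debugging aids must not break the main code path.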

13 Mar, 2015 3 commits

of/platform: Fix sparc:allmodconfig build · a697c2ef

Authored by Guenter Roeck on Mar 10, 2015

sparc:allmodconfig fails to build with:

drivers/built-in.o: In function `platform_bus_init':
(.init.text+0x3684): undefined reference to `of_platform_register_reconfig_notifier'

of_platform_register_reconfig_notifier is only declared if both OF_ADDRESS
and OF_DYNAMIC are configured. Yet, the include file only declares a dummy
function if OF_DYNAMIC is not configured. The sparc architecture does not
configure OF_ADDRESS, but does configure OF_DYNAMIC, causing the above error.

Fixes: 801d728c ("of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type")
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
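
The mismatch described above (real function built under one condition, stub provided under a different one) can be illustrated with a small preprocessor sketch. The `DEMO_*` switches and `demo_*` functions are hypothetical, and the exact shape of the actual fix is an assumption; the point is only that the stub condition is the exact negation of the build condition.

```c
/* Hypothetical switches standing in for CONFIG_OF_ADDRESS / CONFIG_OF_DYNAMIC. */
#define DEMO_OF_DYNAMIC 1
/* DEMO_OF_ADDRESS is intentionally left undefined, as on sparc. */

#if defined(DEMO_OF_ADDRESS) && defined(DEMO_OF_DYNAMIC)
/* The real implementation would live in a separate object file. */
void demo_register_reconfig_notifier(void);
#define DEMO_HAVE_REAL_NOTIFIER 1
#else
/* The stub is provided whenever the real function is not built. */
static inline void demo_register_reconfig_notifier(void) { }
#define DEMO_HAVE_REAL_NOTIFIER 0
#endif

/* The call site links in every configuration. */
static int demo_bus_init(void)
{
    demo_register_reconfig_notifier();
    return DEMO_HAVE_REAL_NOTIFIER;
}
```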

kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> · d3733e5c

Authored by Andrey Ryabinin on Mar 12, 2015

include/linux/moduleloader.h is a more suitable place for this macro.
Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
alignment is already assumed in several places.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8

Authored by Andrey Ryabinin on Mar 12, 2015

The current approach to handling shadow memory for modules is broken.

Shadow memory can be freed only after the memory it corresponds to is no
longer used.  vfree() called from interrupt context can use the memory it is
freeing to store a 'struct llist_node' in it:

    void vfree(const void *addr)
    {
    ...
        if (unlikely(in_interrupt())) {
            struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
            if (llist_add((struct llist_node *)addr, &p->list))
                    schedule_work(&p->wq);

Later this list node is used in free_work(), which actually frees the memory.
Currently module_memfree() called in interrupt context will free the shadow
before freeing the module's memory, which could provoke a kernel crash.

So shadow memory should be freed after the module's memory.  However, such a
deallocation order could race with kasan_module_alloc() in module_alloc().

Free the shadow right before releasing the vm area.  At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
The new VM_KASAN flag is used to indicate that a vm area has dynamically
allocated shadow memory, so kasan frees the shadow only if it was previously
allocated.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Mar, 2015 1 commit

clk: introduce clk_is_match · 3d3801ef

Authored by Michael Turquette on Feb 25, 2015

Some drivers compare struct clk pointers as a means of knowing
if the two pointers reference the same clock hardware. This behavior is
dubious (drivers must not dereference struct clk), but did not cause any
regressions until the per-user struct clk patch was merged. Now the test
for matching clk's will always fail with per-user struct clk's.

clk_is_match is introduced to fix the regression and prevent drivers
from comparing the pointers manually.

Fixes: 035a61c3 ("clk: Make clk API return per-user struct clk instances")
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[arnd@arndb.de: Fix COMMON_CLK=N && HAS_CLK=Y config]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[sboyd@codeaurora.org: const arguments to clk_is_match() and
remove unnecessary ternary operation]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
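
Why pointer comparison stops working with per-user handles, and what a match helper has to compare instead, can be sketched as follows. The `demo_clk` and `demo_clk_core` types and `demo_clk_is_match()` are hypothetical models, not the clk framework's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one core object per hardware clock ... */
struct demo_clk_core {
    const char *name;
};

/* ... and one handle per user, each pointing at the shared core. */
struct demo_clk {
    struct demo_clk_core *core;
};

/* Two handles match when they refer to the same underlying core. */
static bool demo_clk_is_match(const struct demo_clk *p, const struct demo_clk *q)
{
    if (p == q)                     /* identical handles, or both NULL */
        return true;
    if (p != NULL && q != NULL)
        return p->core == q->core;  /* distinct handles, same hardware clock */
    return false;
}
```

Two users of the same clock hold different `demo_clk` handles, so `p == q` is false for them, while the helper still reports a match because both handles share one core.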

08 Mar, 2015 2 commits

irqchip: gicv3-its: Define macros for GITS_CTLR fields · 7cb99116

Authored by Yun Wu on Mar 06, 2015

Define macros for GITS_CTLR fields to avoid using magic numbers.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Yun Wu <wuyun.wu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1425659870-11832-11-git-send-email-marc.zyngier@arm.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
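
As a rough illustration of replacing magic numbers with named field macros, the sketch below uses hypothetical `DEMO_*` names. The bit positions (Enabled in bit 0, Quiescent in bit 31) are an assumption based on the GICv3 register layout, not something this log states.

```c
#include <stdint.h>

/* Assumed layout: Enabled is bit 0 and Quiescent is bit 31 of GITS_CTLR. */
#define DEMO_GITS_CTLR_ENABLE    (UINT32_C(1) << 0)
#define DEMO_GITS_CTLR_QUIESCENT (UINT32_C(1) << 31)

/* Return the register value with the enable bit set. */
static inline uint32_t demo_its_enable(uint32_t ctlr)
{
    return ctlr | DEMO_GITS_CTLR_ENABLE;
}

/* Report whether the quiescent bit is set in a register value. */
static inline int demo_its_is_quiescent(uint32_t ctlr)
{
    return (ctlr & DEMO_GITS_CTLR_QUIESCENT) != 0;
}
```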

irqchip: gicv3-its: Allocate enough memory for the full range of DeviceID · f54b97ed

Authored by Marc Zyngier on Mar 06, 2015

The ITS table allocator is only allocating a single page per table.
This works fine for most things, but leads to silent lack of
interrupt delivery if we end up with a device that has an ID that is
out of the range defined by a single page of memory. Even worse, depending
on the page size, behaviour changes, which is not a very good experience.

A solution is actually to allocate memory for the full range of ID that
the ITS supports. A massive waste memory-wise, but at least a safe bet.

Tested on a Phytium SoC.

Tested-by: Chen Baozi <chenbaozi@kylinos.com.cn>
Acked-by: Chen Baozi <chenbaozi@kylinos.com.cn>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1425659870-11832-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>

07 Mar, 2015 2 commits

serial: uapi: Declare all userspace-visible io types · 647f162b

Authored by Peter Hurley on Mar 01, 2015

ioctl(TIOCGSERIAL|TIOCSSERIAL) report and can change the port->iotype.
UART drivers use the UPIO_* definitions, but the uapi header defines
parallel values and userspace uses these parallel values for ioctls;
thus the userspace values are definitive.

Define UPIO_* iotypes in terms of the uapi defines, SERIAL_IO_*;
extend the uapi defines to include all values in use by the serial
core.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
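
Defining the driver-side constants in terms of the userspace-visible ones keeps a single source of truth. A minimal sketch, with hypothetical `DEMO_*` names and illustrative numeric values (the real header values are not shown in this log):

```c
/* Hypothetical userspace-visible values; the numbers are illustrative only. */
#define DEMO_SERIAL_IO_PORT   0
#define DEMO_SERIAL_IO_MEM    2
#define DEMO_SERIAL_IO_MEM32  3

/* Driver-side names are aliases, so they cannot drift from the ABI values. */
#define DEMO_UPIO_PORT   (DEMO_SERIAL_IO_PORT)
#define DEMO_UPIO_MEM    (DEMO_SERIAL_IO_MEM)
#define DEMO_UPIO_MEM32  (DEMO_SERIAL_IO_MEM32)

/* Compile-time check that an alias and its ABI value agree. */
_Static_assert(DEMO_UPIO_MEM32 == DEMO_SERIAL_IO_MEM32,
               "driver iotype must equal the userspace value");
```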

serial: core: Fix iotype userspace breakage · 2bb78516

Authored by Peter Hurley on Mar 01, 2015

commit 3ffb1a81 ("serial: core: Add big-endian iotype")
re-numbered userspace-dependent values; ioctl(TIOCSSERIAL) can
assign the port iotype (which is expected to match the selected
i/o accessors), so iotype values must not be changed.

Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: <stable@vger.kernel.org> # 3.19+
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

06 Mar, 2015 1 commit

cpuidle / sleep: Use broadcast timer for states that stop local timer · ef2b22ac

Authored by Rafael J. Wysocki on Mar 02, 2015

Commit 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
overlooked the fact that entering some sufficiently deep idle states
by CPUs may cause their local timers to stop and in those cases it
is necessary to switch over to a broadcast timer prior to entering
the idle state. If the cpuidle driver in use does not provide
the new ->enter_freeze callback for any of the idle states, that
problem affects suspend-to-idle too, but it is not taken into account
after the changes made by commit 38106313.

Fix that by changing the definition of cpuidle_enter_freeze() and
rearranging the code in cpuidle_idle_call(), so the former does
not call cpuidle_enter() any more and the fallback case is handled
by cpuidle_idle_call() directly.

Fixes: 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
Reported-and-tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

05 Mar, 2015 2 commits

workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 8603e1b3

Authored by Tejun Heo on Mar 05, 2015

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping.

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>

genirq / PM: Add flag for shared NO_SUSPEND interrupt lines · 17f48034

Authored by Rafael J. Wysocki on Feb 27, 2015

It currently is required that all users of NO_SUSPEND interrupt
lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
WARN_ON_ONCE() in irq_pm_install_action() will trigger.  That is
done to warn about situations in which unprepared interrupt handlers
may be run unnecessarily for suspended devices and may attempt to
access those devices by mistake.  However, it may cause drivers
that have no technical reasons for using IRQF_NO_SUSPEND to set
that flag just because they happen to share the interrupt line
with something like a timer.

Moreover, the generic handling of wakeup interrupts introduced by
commit 9ce7a258 (genirq: Simplify wakeup mechanism) only works
for IRQs without any NO_SUSPEND users, so the drivers of wakeup
devices needing to use shared NO_SUSPEND interrupt lines for
signaling system wakeup generally have to detect wakeup in their
interrupt handlers.  Thus if they happen to share an interrupt line
with a NO_SUSPEND user, they also need to request that their
interrupt handlers be run after suspend_device_irqs().

In both cases the reason for using IRQF_NO_SUSPEND is not because
the driver in question has a genuine need to run its interrupt
handler after suspend_device_irqs(), but because it happens to
share the line with some other NO_SUSPEND user.  Otherwise, the
driver would do without IRQF_NO_SUSPEND just fine.

To make it possible to specify that condition explicitly, introduce
a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
that, when set, will indicate to the IRQ core that the interrupt
user is generally fine with suspending the IRQ, but it also can
tolerate handler invocations after suspend_device_irqs() and, in
particular, it is capable of detecting system wakeup and triggering
it as appropriate from its interrupt handler.

That will allow us to work around a problem with a shared timer
interrupt line on at91 platforms.

Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
Link: http://marc.info/?t=142252775300011&r=1&w=2
Link: https://lkml.org/lkml/2014/12/15/552
Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>

04 Mar, 2015 1 commit

NFS: Fix a regression in the read() syscall · 874f9463

Authored by Trond Myklebust on Mar 02, 2015

When invalidating the page cache for a regular file, we want to first
sync all dirty data to disk and then call invalidate_inode_pages2().
The latter relies on nfs_launder_page() and nfs_release_page() to deal
respectively with dirty pages, and unstable written pages.

When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
NFS filesystems.") changed the behaviour of nfs_release_page(), then it
made it possible for invalidate_inode_pages2() to fail with an EBUSY.
Unfortunately, that error is then propagated back to read().

Let's therefore work around the problem for now by protecting the call
to sync the data and invalidate_inode_pages2() so that they are atomic
w.r.t. the addition of new writes.
Later on, we can revisit whether or not we still need nfs_launder_page()
and nfs_release_page().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

03 Mar, 2015 2 commits

spi: fix a typo in comment. · c6331ba3

Authored by Marcin Bis on Mar 01, 2015

alway -> always

Signed-off-by: Marcin Bis <marcin@bis.org.pl>
Signed-off-by: Mark Brown <broonie@kernel.org>

net/mlx4_core: Fix wrong mask and error flow for the update-qp command · f5956faf

Authored by Or Gerlitz on Mar 02, 2015

The bit mask for currently supported driver features (MLX4_UPDATE_QP_SUPPORTED_ATTRS)
of the update-qp command was defined twice (using enum value and pre-processor
define directive) and wrong.

The return value of the call to mlx4_update_qp() from within the SRIOV
resource-tracker was wrongly voided down.

Fix both issues.

issue: none
Fixes: 09e05c3f ('net/mlx4: Set vlan stripping policy by the right command')
Fixes: ce8d9e0d ('net/mlx4_core: Add UPDATE_QP SRIOV wrapper support')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2015 3 commits

NFS: Add attribute update barriers to NFS writebacks · a08a8cd3

Authored by Trond Myklebust on Feb 26, 2015

Ensure that other operations that race with our write RPC calls
cannot revert the file size updates that were made on the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

NFS: Add attribute update barriers to nfs_setattr_update_inode() · f044636d

Authored by Trond Myklebust on Feb 26, 2015

Ensure that other operations which raced with our setattr RPC call
cannot revert the file attribute changes that were made on the server.
To do so, we artificially bump the attribute generation counter on
the inode so that all calls to nfs_fattr_init() that precede ours
will be dropped.

The motivation for the patch came from Chuck Lever's reports of readaheads
racing with truncate operations and causing the file size to be reverted.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

NFS: Add a helper to set attribute barriers · 140e049c

Authored by Trond Myklebust on Feb 26, 2015

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

28 Feb, 2015 1 commit

rhashtable: remove indirection for grow/shrink decision functions · 4c4b52d9

Authored by Daniel Borkmann on Feb 25, 2015

Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
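
The two default decisions named above (grow above 75% load, shrink below 30%) can be sketched as plain inline helpers, which is roughly what dropping the function-pointer indirection allows. The `demo_*` names and the exact integer arithmetic are assumptions for illustration, not the rhashtable source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Grow when the element count exceeds 75% of the bucket count. */
static inline bool demo_grow_above_75(size_t nelems, size_t size)
{
    return nelems > size / 4 * 3;
}

/* Shrink when the element count drops below 30% of the bucket count. */
static inline bool demo_shrink_below_30(size_t nelems, size_t size)
{
    return nelems < size * 3 / 10;
}
```

Because the helpers are direct calls rather than pointers stored in a parameter structure, the compiler is free to inline them on the insert and remove paths.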

27 Feb, 2015 1 commit

Revert "USB: serial: make bulk_out_size a lower limit" · bc4b1f48

Authored by Johan Hovold on Feb 15, 2015

This reverts commit 5083fd7b.

A bulk-out size smaller than the end-point size is indeed valid. The
offending commit broke the usb-debug driver for EHCI debug devices,
which use 8-byte buffers.

Fixes: 5083fd7b ("USB: serial: make bulk_out_size a lower limit")
Reported-by: "Li, Elvin" <elvin.li@intel.com>
Cc: stable <stable@vger.kernel.org>	# v3.15
Signed-off-by: Johan Hovold <johan@kernel.org>

26 Feb, 2015 1 commit

genirq / PM: better describe IRQF_NO_SUSPEND semantics · 737eb030

Authored by Mark Rutland on Feb 20, 2015

The IRQF_NO_SUSPEND flag is intended to be used for interrupts required
to be enabled during the suspend-resume cycle. This mostly consists of
IPIs and timer interrupts, potentially including chained irqchip
interrupts if these are necessary to handle timers or IPIs. If an
interrupt does not fall into one of the aforementioned categories,
requesting it with IRQF_NO_SUSPEND is likely incorrect.

Using IRQF_NO_SUSPEND does not guarantee that the interrupt can wake the
system from a suspended state. For an interrupt to be able to trigger a
wakeup, it may be necessary to program various components of the system.
In these cases it is necessary to use {enable,disable}_irq_wake.

Unfortunately, several drivers assume that IRQF_NO_SUSPEND ensures that
an IRQ can wake up the system, and the documentation can be read
ambiguously w.r.t. this property.

This patch updates the documentation regarding IRQF_NO_SUSPEND to make
this caveat explicit, hopefully making future misuse rarer. Cleanup of
existing misuse will occur as part of later patch series.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

25 Feb, 2015 1 commit

thermal: Introduce dummy functions when thermal is not defined · 12ca7188

Authored by Nishanth Menon on Feb 13, 2015

When CONFIG_THERMAL is not enabled, it is better to introduce
equivalent dummy functions in the exported header than to
introduce #ifdeffery in drivers using the function.

This will prevent issues such as that reported in:
http://www.spinics.net/lists/linux-next/msg31573.html

While at it switch over to IS_ENABLED for thermal macros
to allow for thermal framework to be built as framework
and relevant APIs be usable by relevant drivers as a result.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>

23 Feb, 2015 4 commits

VFS: Split DCACHE_FILE_TYPE into regular and special types · 44bdb5e5

Authored by David Howells on Jan 29, 2015

Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
socket files).

d_is_reg() and d_is_special() are added to detect these subtypes and
d_is_file() is left as the union of the two.

This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
use d_is_reg(dentry) instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
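
The relationship between the three accessors (regular, special, and "file" as the union of the two) can be modelled in a few lines. The enum values and `demo_*` names below are hypothetical; the real DCACHE_* encodings are not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical type tags standing in for the DCACHE_*_TYPE values. */
enum demo_dentry_type {
    DEMO_MISS_TYPE,
    DEMO_DIRECTORY_TYPE,
    DEMO_REGULAR_TYPE,   /* regular files */
    DEMO_SPECIAL_TYPE    /* blockdev, chardev, FIFO and socket files */
};

struct demo_dentry {
    enum demo_dentry_type type;
};

static inline bool demo_d_is_reg(const struct demo_dentry *d)
{
    return d->type == DEMO_REGULAR_TYPE;
}

static inline bool demo_d_is_special(const struct demo_dentry *d)
{
    return d->type == DEMO_SPECIAL_TYPE;
}

/* "File" stays the union of the two subtypes. */
static inline bool demo_d_is_file(const struct demo_dentry *d)
{
    return demo_d_is_reg(d) || demo_d_is_special(d);
}
```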

VFS: Add a fallthrough flag for marking virtual dentries · df1a085a

Authored by David Howells on Jan 29, 2015

Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead.  This may be recorded on medium if directory integration is stored
there.

The flag can be set with d_set_fallthru() and tested with d_is_fallthru().

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Add a whiteout dentry type · e7f7d225

Authored by David Howells on Jan 29, 2015

Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function.  A
d_is_miss() accessor is also added for ordinary cache misses and
d_is_negative() is modified to indicate either an ordinary miss or an enforced
miss (whiteout).

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Introduce inode-getting helpers for layered/unioned fs environments · 155e35d4

Authored by David Howells on Jan 29, 2015

Introduce some functions for getting the inode (and also the dentry) in an
environment where layered/unioned filesystems are in operation.

The problem is that we have places where we need *both* the union dentry and
the lower source or workspace inode or dentry available, but we can only have
a handle on one of them.  Therefore we need to derive the handle to the other
from that.

The idea is to introduce an extra field in struct dentry that allows the union
dentry to refer to and pin the lower dentry.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Feb, 2015 2 commits

rhashtable: ensure cache line alignment on bucket_table · b9ebafbe

Authored by Eric Dumazet on Feb 20, 2015

struct bucket_table contains mostly-read fields:

size, locks_mask, locks.

Make sure these are not sharing a cache line with buckets[].

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
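
One way to keep the read-mostly header fields off the cache line used by the bucket array is to align the array itself. The sketch below uses C11 `_Alignas`; the struct, its field types and the 64-byte line size are assumptions for illustration, not the kernel definition.

```c
#include <stddef.h>

#define DEMO_CACHELINE 64   /* assumed cache line size in bytes */

/* Hypothetical model of a bucket table. */
struct demo_bucket_table {
    /* Read-mostly header fields. */
    size_t size;
    unsigned int locks_mask;
    void *locks;

    /* The frequently written bucket array starts on its own cache line. */
    _Alignas(DEMO_CACHELINE) void *buckets[];
};
```

With this layout the offset of `buckets` is a multiple of the assumed line size, so writes to bucket heads do not invalidate the line holding `size`, `locks_mask` and `locks`.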

kernel: make READ_ONCE() valid on const arguments · dd369297

Authored by Linus Torvalds on Feb 20, 2015

The use of READ_ONCE() causes lots of warnings with the pending paravirt
spinlock fixes, because those end up passing a member of a
'const' structure to READ_ONCE().

There should certainly be nothing wrong with using READ_ONCE() with a
const source, but the helper function __read_once_size() would cause
warnings because it would drop the 'const' qualifier, but also because
the destination would be marked 'const' too due to the use of 'typeof'.

Use a union of types in READ_ONCE() to avoid this issue.

Also make sure to use parentheses around the macro arguments to avoid
possible operator precedence issues.

Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
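
The union idea can be sketched in userspace with GNU C extensions (statement expressions and `__typeof__`). This is an assumption-laden model: `DEMO_READ_ONCE` and `demo_read_once_copy()` are hypothetical names, and a plain `memcpy()` is swapped in for the kernel's size-specific volatile loads, so the sketch shows only how the union sidesteps the 'const' problem, not the single-access guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Accepts a const (and volatile) source; copies n bytes into dst. */
static inline void demo_read_once_copy(const volatile void *src, void *dst, size_t n)
{
    memcpy(dst, (const void *)src, n);
}

/*
 * The destination is the char member of a union, so it stays writable
 * even when __typeof__(x) is const-qualified; the value is then read
 * back through the typed member.
 */
#define DEMO_READ_ONCE(x)                                       \
    ({                                                          \
        union { __typeof__(x) val; char c[sizeof(x)]; } u_;     \
        demo_read_once_copy(&(x), u_.c, sizeof(x));             \
        u_.val;                                                 \
    })
```

The macro argument is parenthesized wherever it is used as an expression, in line with the precedence note in the commit message.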

21 Feb, 2015 1 commit

net: Initialize all members in skb_gro_remcsum_init() · 846cd667

Authored by Geert Uytterhoeven on Feb 18, 2015

skb_gro_remcsum_init() initializes the gro_remcsum.delta member only,
leading to compiler warnings about a possibly uninitialized
gro_remcsum.offset member:

drivers/net/vxlan.c: In function ‘vxlan_gro_receive’:
drivers/net/vxlan.c:602: warning: ‘grc.offset’ may be used uninitialized in this function
net/ipv4/fou.c: In function ‘gue_gro_receive’:
net/ipv4/fou.c:262: warning: ‘grc.offset’ may be used uninitialized in this function

While these are harmless for now:
  - skb_gro_remcsum_process() sets offset before changing delta,
  - skb_gro_remcsum_cleanup() checks if delta is non-zero before
    accessing offset,
it's safer to let the initialization function initialize all members.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
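
A minimal model of the change: the init helper sets every member instead of only `delta`. The struct layout and the `demo_*` names are assumptions for illustration, not the kernel's definitions.

```c
/* Hypothetical model of the checksum state; field types are assumed. */
struct demo_gro_remcsum {
    int offset;
    unsigned int delta;
};

/* Initialize all members, so no reader can ever see an indeterminate field. */
static inline void demo_gro_remcsum_init(struct demo_gro_remcsum *grc)
{
    grc->offset = 0;
    grc->delta = 0;
}
```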

20 Feb, 2015 6 commits

NVMe: Fix potential corruption during shutdown · 07836e65

Authored by Keith Busch on Feb 19, 2015

The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.

This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is not inlined with pci probe anymore.

Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Asynchronous controller probe · 2e1d8448

Authored by Keith Busch on Feb 12, 2015

This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.

Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Register management handle under nvme class · b3fffdef

Authored by Keith Busch on Feb 03, 2015

This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices as some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.

Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.

Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Update SCSI Inquiry VPD 83h translation · 4f1982b4

Authored by Keith Busch on Feb 19, 2015

The original translation created collisions on Inquiry VPD 83 for many
existing devices. Newer specifications provide other ways to translate,
based on the device's version, that can be used to create unique identifiers.

Version 1.1 provides an EUI64 field that uniquely identifies each
namespace, and 1.2 added the longer NGUID field for the same reason.
Both follow the IEEE EUI format and readily translate to the SCSI device
identification EUI designator type 2h. For devices implementing either,
the translation will use this type, defaulting to the EUI64 8-byte type if
implemented then NGUID's 16 byte version if not. If neither are provided,
the 1.0 translation is used, and is updated to use the SCSI String format
to guarantee a unique identifier.

Knowing when to use the new fields depends on the nvme controller's
revision. The NVME_VS macro was not decoding this correctly, so that is
fixed in this patch and moved to a more appropriate place.

Since the Identify Namespace structure required an update for the NGUID
field, this patch adds the remaining new 1.2 fields to the structure.

Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Metadata format support · e1e5e564

Authored by Keith Busch on Feb 19, 2015

Adds support for NVMe metadata formats and exposes block devices for
all namespaces regardless of their format. Namespace formats that are
unusable will have disk capacity set to 0, but a handle to the block
device is created to simplify device management. A namespace is not
usable when the format requires host interleave block and metadata in
single buffer, has no provisioned storage, or has better data but failed
to register with blk integrity.

The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarily set to 0 until it can be revalidated and registered with
the integrity extensions after add_disk completes.

The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this be a single physically contiguous region,
so only one integrity segment is allowed per command. If the metadata
is used for T10 PI, the driver provides mappings to save and restore
the reftag physical block translation. The driver provides no-op
functions for generate and verify if metadata is not used for protection
information. This way the setup is always provided by the block layer.

If a request does not supply a required metadata buffer, the command
is failed with bad address. This could only happen if a user manually
disables verify/generate on such a disk. The only exception to where
this is okay is if the controller is capable of stripping/generating
the metadata, which is possible on some types of formats.

The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IOD's, but we don't do that
anymore, so the field was unused.

Signed-off-by: Keith Busch <keith.busch@intel.com>

kdb: Avoid printing KERN_ levels to consoles · f7d4ca8b

Authored by Daniel Thompson on Nov 07, 2014

Currently when kdb traps printk messages then the raw log level prefix
(consisting of '\001' followed by a numeral) does not get stripped off
before the message is issued to the various I/O handlers supported by
kdb. This causes annoying visual noise as well as causing problems
grepping for ^. It is also a change of behaviour compared to normal usage
of printk(). For example <SysRq>-h ends up with different output to
that of kdb's "sr h".

This patch addresses the problem by stripping log levels from messages
before they are issued to the I/O handlers. printk() which can also
act as an i/o handler in some cases is special cased; if the caller
provided a log level then the prefix will be preserved when sent to
printk().

The addition of non-printable characters to the output of kdb commands is a
regression, albeit an extremely elderly one, introduced by commit
04d2c8c8 ("printk: convert the format for KERN_<LEVEL> to a 2 byte
pattern"). Note also that this patch does *not* restore the original
behaviour from v3.5. Instead it makes printk() from within a kdb command
display the message without any prefix (i.e. like printk() normally does).

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
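
Stripping the two-byte prefix described above ('\001' followed by a numeral) before handing a message to an I/O handler might look roughly like this. `demo_strip_loglevel()` is a hypothetical helper, and restricting the level to the digits '0' to '7' is an assumption; it is not the kdb code.

```c
#define DEMO_KERN_SOH '\001'   /* start-of-header byte that opens the prefix */

/*
 * Return a pointer past a leading "\001<digit>" log level prefix,
 * or the message itself when no such prefix is present.
 */
static const char *demo_strip_loglevel(const char *msg)
{
    if (msg[0] == DEMO_KERN_SOH && msg[1] >= '0' && msg[1] <= '7')
        return msg + 2;
    return msg;
}
```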

19 Feb, 2015 5 commits

libceph: tcp_nodelay support · ba988f87

Authored by Chaitanya Huilgol on Jan 23, 2015

Setting the TCP_NODELAY socket option on connection sockets
disables Nagle's algorithm and improves latency characteristics.
The tcp_nodelay (default) / notcp_nodelay option flags are provided to
enable/disable setting the socket option.

Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
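
For readers unfamiliar with the option, the userspace equivalent is a single `setsockopt()` call. This sketch uses the POSIX sockets API rather than the in-kernel socket interface that libceph uses, and the `demo_*` helper names are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (1) or disable (0) TCP_NODELAY on a TCP socket; returns 0 on success. */
static int demo_set_tcp_nodelay(int fd, int enable)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enable, sizeof(enable));
}

/* Return 1 if TCP_NODELAY is set, 0 if it is clear, -1 on error. */
static int demo_get_tcp_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

With the option set, small writes are sent immediately instead of being coalesced, which trades some bandwidth efficiency for lower latency.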

ceph: handle SESSION_FORCE_RO message · 03f4fcb0

Authored by Yan, Zheng on Jan 05, 2015

Mark session as readonly and wake up all cap waiters.

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: nuke pool op infrastructure · 7a6fdeb2

Authored by Ilya Dryomov on Dec 22, 2014

On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 22 Dec 2014, Ilya Dryomov wrote:
>> Actually, pool op stuff has been unused for over two years - looks like
>> it was added for rbd create_snap and that got ripped out in 2012.  It's
>> unlikely we'd ever need to manage pools or snaps from the kernel client
>> so I think it makes sense to nuke it.  Sage?
>
> Yep!

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

NFSv4.1: Clean up bind_conn_to_session · 71a097c6

Authored by Trond Myklebust on Feb 18, 2015

We don't need to fake up an entire session in order to retrieve the arguments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4.1: Clean up create_session · 79969dd1

Authored by Trond Myklebust on Feb 18, 2015

Don't decode directly into the shared struct session.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

openanolis / cloud-kernel synced successfully almost 2 years ago
