Commits · 167225b775d47954d702db4743f9d918aabab0a8 · openanolis / cloud-kernel

28 Jul, 2014 1 commit

Revert "selinux: fix the default socket labeling in sock_graft()" · 2873ead7

Authored by Paul Moore on Jul 28, 2014

This reverts commit 4da6daf4.

Unfortunately, the commit in question caused problems with Bluetooth
devices, specifically it caused them to get caught in the newly
created BUG_ON() check.  The AF_ALG problem still exists, but will be
addressed in a future patch.

Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

2873ead7

26 Jul, 2014 2 commits

ima: add support for measuring and appraising firmware · 5a9196d7

Authored by Mimi Zohar on Jul 22, 2014

The "security: introduce kernel_fw_from_file hook" patch defined a
new security hook to evaluate any loaded firmware that wasn't built
into the kernel.

This patch defines ima_fw_from_file(), which is called from the new
security hook, to measure and/or appraise the loaded firmware's
integrity.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>

5a9196d7

security: introduce kernel_fw_from_file hook · 13752fe2

Authored by Kees Cook on Feb 25, 2014

In order to validate the contents of firmware being loaded, there must be
a hook to evaluate any loaded firmware that wasn't built into the kernel
itself. Without this, there is a risk that a root user could load malicious
firmware designed to mount an attack against kernel memory (e.g. via DMA).
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Takashi Iwai <tiwai@suse.de>

13752fe2

24 Jul, 2014 1 commit

CAPABILITIES: remove undefined caps from all processes · 7d8b6c63

Authored by Eric Paris on Jul 23, 2014

This is effectively a revert of 7b9a7ec5
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume, since the application is going to set
eff/perm/inh from an array that it will clear not only the defined caps
less than CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently.  Instead it is cleared one bit at a
time.  The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.

So the 'parent' will look something like:
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffc000000000

All of this 'should' be fine.  Given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So let's now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in the above situation if you try to clear all of
your capabilities from all 4 sets.  If that root task calls execve()
the child task will pick up all caps not blocked by the bset.  The bset
however does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So during commit_creds() we now mark the
child as not dumpable, given that it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child and other stupidity.

The solution here:
1) stop hiding capability bits in status
	This makes debugging easier!

2) stop giving any task undefined capability bits.  It's simple: if you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
	This fixes the cap_issubset() tests and resulting fallout (which
	made the init task in a docker container untraceable among other
	things)

3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
	This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

4) mask out undefined bits when we read a file capability off of disk, as
again it is likely that all bits are set in the xattr for forward/backward
compatibility.
	This lets 'setcap all+pe /bin/bash; /bin/bash' run
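
The subset failure described above, and the effect of keeping undefined bits out of the full set, can be modelled with plain bit arithmetic. This is an illustrative Python sketch, not kernel code. It assumes CAP_LAST_CAP = 37, which is consistent with the CapBnd value quoted above; the helper names are hypothetical and the exec model is deliberately simplified.

```python
CAP_LAST_CAP = 37                       # assumed; matches CapBnd: ffffffc000000000
ALL_BITS = (1 << 64) - 1                # two 32-bit words of capability bits
VALID_CAPS = (1 << (CAP_LAST_CAP + 1)) - 1
UNDEFINED_CAPS = ALL_BITS & ~VALID_CAPS


def is_subset(a: int, b: int) -> bool:
    """Model of a cap_issubset()-style test: every bit set in a is set in b."""
    return (a & ~b) == 0


def exec_as_root(bset: int, full_set: int) -> int:
    """Simplified model: a root task's child gains every cap the bset allows."""
    return full_set & bset


# The parent dropped everything it could see; the bset keeps the undefined bits.
parent_eff = 0
parent_bset = UNDEFINED_CAPS

# Before the fix the full set is ~0, so the child picks up undefined bits.
child_before = exec_as_root(parent_bset, ALL_BITS)
# After the fix the full set contains only defined capabilities.
child_after = exec_as_root(parent_bset, VALID_CAPS)

print(f"CapBnd: {parent_bset:016x}")
print("child subset of parent (before):", is_subset(child_before, parent_eff))
print("child subset of parent (after): ", is_subset(child_after, parent_eff))
```

With the undefined bits removed from the full set, the child's effective set is empty again and the subset test passes, which is what restores dumpability and ptrace in this model.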
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Dan Walsh <dwalsh@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>

7d8b6c63

23 Jul, 2014 4 commits

KEYS: big_key: Use key preparsing · 002edaf7

Authored by David Howells on Jul 18, 2014

Make use of key preparsing in the big key type so that quota size determination
can take place prior to keyring locking when a key is being added.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>

002edaf7

KEYS: user: Use key preparsing · f9167789

Authored by David Howells on Jul 18, 2014

Make use of key preparsing in user-defined and logon keys so that quota size
determination can take place prior to keyring locking when a key is being
added.

Also the idmapper key types need to change to match as they use the
user-defined key type routines.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>

f9167789

KEYS: Allow expiry time to be set when preparsing a key · 7dfa0ca6

Authored by David Howells on Jul 18, 2014

Allow a key type's preparsing routine to set the expiry time for a key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>

7dfa0ca6

KEYS: struct key_preparsed_payload should have two payload pointers · fc7c70e0

Authored by David Howells on Jul 18, 2014

struct key_preparsed_payload should have two payload pointers to correspond
with those in struct key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>

fc7c70e0

19 Jul, 2014 5 commits

seccomp: implement SECCOMP_FILTER_FLAG_TSYNC · c2e1f2e3

Authored by Kees Cook on Jun 05, 2014

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.
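
The ancestor rule and the all-or-nothing return value described above can be sketched as follows. This is a hypothetical Python model and not the kernel implementation: the `Filter` class, the `tsync()` helper and the thread ids are invented for illustration, and locking is ignored entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(eq=False)
class Filter:
    """One node in a chain of seccomp filters; prev points at the older filter."""
    name: str
    prev: Optional["Filter"] = None


def is_ancestor(parent: Optional[Filter], child: Optional[Filter]) -> bool:
    """True if parent is None (no filter yet) or appears in child's chain."""
    if parent is None:
        return True
    node = child
    while node is not None:
        if node is parent:
            return True
        node = node.prev
    return False


def tsync(new_filter: Filter, threads: Dict[int, Optional[Filter]]) -> int:
    """Return 0 and update every thread, or return a failing tid and change nothing."""
    for tid, current in threads.items():
        if not is_ancestor(current, new_filter):
            return tid
    for tid in threads:
        threads[tid] = new_filter
    return 0


base = Filter("base")
strict = Filter("strict", prev=base)     # built on top of base
other = Filter("other")                  # unrelated chain

ok_threads = {101: None, 102: base}
bad_threads = {201: base, 202: other}

print(tsync(strict, ok_threads))         # every thread can be moved to strict
print(tsync(strict, bad_threads))        # thread 202 blocks the synchronization
```

Checking all threads before touching any of them is what gives the "no filters will have been applied" guarantee on failure.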

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.

Changes across threads to the filter pointer include a barrier.

Based on patches by Will Drewry.
Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

c2e1f2e3

seccomp: introduce writer locking · dbd95212

Authored by Kees Cook on Jun 27, 2014

Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic. However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain to have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

dbd95212

sched: move no_new_privs into new atomic flags · 1d4457f9

Authored by Kees Cook on May 21, 2014

Since seccomp transitions between threads require updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

1d4457f9

seccomp: add "seccomp" syscall · 48dc92b9

Authored by Kees Cook on Jun 25, 2014

This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

48dc92b9

KEYS: Provide a generic instantiation function · 6a09d17b

Authored by David Howells on Jul 18, 2014

Provide a generic instantiation function for key types that use the preparse
hook.  This makes it easier to prereserve key quota before keyrings get locked
to retain the new key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>

6a09d17b

18 Jul, 2014 1 commit

KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN · 0c7774ab

Authored by David Howells on Jul 17, 2014

Special kernel keys, such as those used to hold DNS results for AFS, CIFS and
NFS and those used to hold idmapper results for NFS, used to be
'invalidateable' with key_revoke().  However, since the default permissions for
keys were reduced:

	Commit: 96b5c8fe
	KEYS: Reduce initial permissions on keys

it has become impossible to do this.

Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be
invalidated by root.  This should not be used for system keyrings as the
garbage collector will try and remove any invalidated key.  For system keyrings,
KEY_FLAG_ROOT_CAN_CLEAR can be used instead.

After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be
used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and
idmapper keys.  Invalidated keys are immediately garbage collected and will be
immediately rerequested if needed again.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve Dickson <steved@redhat.com>

0c7774ab

17 Jul, 2014 2 commits

KEYS: validate certificate trust only with builtin keys · 32c4741c

Authored by Dmitry Kasatkin on Jun 17, 2014

Instead of allowing public keys, with certificates signed by any
key on the system trusted keyring, to be added to a trusted keyring,
this patch further restricts the certificates to those signed only by
builtin keys on the system keyring.

This patch defines a new option 'builtin' for the kernel parameter
'keys_ownerid' to allow trust validation using builtin keys.

Simplified Mimi's "KEYS: define an owner trusted keyring" patch

Changelog v7:
- rename builtin_keys to use_builtin_keys
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>

32c4741c

KEYS: verify a certificate is signed by a 'trusted' key · 3be4beaf

Authored by Mimi Zohar on Aug 20, 2013

Only public keys, with certificates signed by an existing
'trusted' key on the system trusted keyring, should be added
to a trusted keyring.  This patch adds support for verifying
a certificate's signature.

This is derived from David Howells' pkcs7_request_asymmetric_key() patch.

Changelog v6:
- on error free key - Dmitry
- validate trust only for not already trusted keys - Dmitry
- formatting cleanup

Changelog:
- define get_system_trusted_keyring() to fix kbuild issues
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>

3be4beaf

11 Jul, 2014 1 commit

clk: exynos5420: Add IDs for clocks used in PD mfc · c0fb262b

Authored by Arun Kumar K on Jul 11, 2014

Add IDs for the MUX clocks used by the MFC power domain, so that they
can be re-parented when the power domain is switched on or off.
Signed-off-by: Arun Kumar K <arun.kk@samsung.com>
Signed-off-by: Shaik Ameer Basha <shaik.ameer@samsung.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

c0fb262b

10 Jul, 2014 1 commit

selinux: fix the default socket labeling in sock_graft() · 4da6daf4

Authored by Paul Moore on Jul 10, 2014

The sock_graft() hook has special handling for AF_INET, AF_INET6, and
AF_UNIX sockets as those address families have special hooks which
label the sock before it is attached to its associated socket.
Unfortunately, the sock_graft() hook was missing a default approach
to labeling sockets which meant that any other address family which
made use of connections or the accept() syscall would find the
returned socket to be in an "unlabeled" state.  This was recently
demonstrated by the kcrypto/AF_ALG subsystem and the newly released
cryptsetup package (cryptsetup v1.6.5 and later).

This patch preserves the special handling in selinux_sock_graft(),
but adds a default behavior - setting the sock's label equal to the
associated socket - which resolves the problem with AF_ALG and
presumably any other address family which makes use of accept().

Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>

4da6daf4

09 Jul, 2014 3 commits

pefile: Parse the "Microsoft individual code signing" data blob · 4c0b4b1d

Authored by David Howells on Jul 01, 2014

The PKCS#7 certificate should contain a "Microsoft individual code signing"
data blob as its signed content.  This blob contains a digest of the signed
content of the PE binary and the OID of the digest algorithm used (typically
SHA256).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

4c0b4b1d

pefile: Parse a PE binary to find a key and a signature contained therein · 26d1164b

Authored by David Howells on Jul 01, 2014

Parse a PE binary to find a key and a signature contained therein.  Later
patches will check the signature and add the key if the signature checks out.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

26d1164b

Provide PE binary definitions · 9c87e0f1

Authored by David Howells on Jul 01, 2014

Provide some PE binary structural and constant definitions as taken from the
pesign package sources.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

9c87e0f1

08 Jul, 2014 4 commits

PKCS#7: Find intersection between PKCS#7 message and known, trusted keys · 08815b62

Authored by David Howells on Jul 01, 2014

Find the intersection between the X.509 certificate chain contained in a PKCS#7
message and a set of keys that we already know and trust.
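
Finding where a certificate chain meets the trusted key set can be sketched as a walk from the signer towards the root. This is a hypothetical Python illustration only: the function name, the tuple layout and the string identifiers are assumptions, and the signature verification that the real code has to perform at each step is omitted.

```python
def find_trusted_link(chain, trusted_ids):
    """chain: list of (subject_id, issuer_id) pairs, signer first, root last.

    Returns the id of the first key met while walking up the chain that is
    already trusted, or None if the chain never touches the trusted set.
    """
    for subject_id, issuer_id in chain:
        if subject_id in trusted_ids:
            return subject_id
        if issuer_id in trusted_ids:
            return issuer_id
    return None


chain = [("leaf", "intermediate"), ("intermediate", "root")]

print(find_trusted_link(chain, {"root"}))
print(find_trusted_link(chain, {"intermediate"}))
print(find_trusted_link(chain, {"unrelated"}))
```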
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

08815b62

PKCS#7: Find the right key in the PKCS#7 key list and verify the signature · a4730357

Authored by David Howells on Jul 01, 2014

Find the appropriate key in the PKCS#7 key list and verify the signature with
it.  There may be several keys in there forming a chain.  Any link in that
chain or the root of that chain may be in our keyrings.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

a4730357

PKCS#7: Implement a parser [RFC 2315] · 2e3fadbf

Authored by David Howells on Jul 01, 2014

Implement a parser for a PKCS#7 signed-data message as described in part of
RFC 2315.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

2e3fadbf

ACPI / i915: ignore firmware requests for backlight change · 0b9f7d93

Authored by Aaron Lu on Jul 07, 2014

Some Thinkpad laptops' firmware will initiate a backlight level change
request through operation region on the events of AC plug/unplug, but
since we are not using firmware's interface to do the backlight setting
on these affected laptops, we do not want the firmware to use some
arbitrary value from its ASL variable to set the backlight level on
AC plug/unplug either.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=76491
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77091
Reported-and-tested-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reported-and-tested-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

0b9f7d93

04 Jul, 2014 3 commits

drm/i915: provide interface for audio driver to query cdclk · c149dcb5

Authored by Jani Nikula on Jul 04, 2014

For Haswell and Broadwell, if the display power well has been disabled,
the display audio controller divider values EM4 M VALUE and EM5 N VALUE
will have been lost. The CDCLK frequency is required for reprogramming them
to generate 24MHz HD-A link BCLK. So provide a private interface for the
audio driver to query CDCLK.

This is a stopgap solution until a more generic interface between audio
and display drivers has been implemented.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

c149dcb5

ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de

Authored by Tejun Heo on Jul 03, 2014

The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or rflags values.  That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'.  Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9cd18de

[SCSI] use the scsi data buffer length to extract transfer size · 5616b0a4

Authored by Martin K. Petersen on Jun 24, 2014

Commit 8846bab1 introduced a helper that can be used to query the
wire transfer size for a SCSI command taking protection information into
account.

However, some commands do not have a 1:1 mapping between the block range
they work on and the payload size (discard, write same). After the
scatterlist has been set up these requests use __data_len to store the
number of bytes to report completion on. This means that callers of
scsi_transfer_length() would get the wrong byte count for these types of
requests.

To overcome this we make scsi_transfer_length() use the scatterlist
length in the scsi_data_buffer as basis for the wire transfer
calculation instead of __data_len.
Reported-by: Christoph Hellwig <hch@infradead.org>
Debugged-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Fixes: d77e6535
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

5616b0a4

03 Jul, 2014 1 commit

kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce

Authored by Tejun Heo on Jul 01, 2014

d911d987 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggered by md invoking
sysfs_notify() from the IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.
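
The general pattern of punting delivery to a work item can be sketched as a producer that only enqueues and a worker that is allowed to block. This is a loose Python analogy and not the kernfs code: the queue, the worker thread and the node names are hypothetical stand-ins for the kernel's work item machinery.

```python
import queue
import threading

pending = queue.Queue()     # notifications waiting for delivery
delivered = []              # what the worker has delivered so far


def notify(node: str) -> None:
    """Safe to call from a context that must not block: it only enqueues."""
    pending.put_nowait(node)


def notify_worker() -> None:
    """Runs where sleeping is allowed and performs the actual delivery."""
    while True:
        node = pending.get()
        if node is None:            # sentinel: stop the worker
            break
        delivered.append(node)      # stands in for the work that may sleep


worker = threading.Thread(target=notify_worker)
worker.start()
notify("md0/sync_action")           # hypothetical attribute names
notify("md0/degraded")
pending.put(None)
worker.join()
print(delivered)
```

The caller never takes a sleeping lock; the cost is the extra bookkeeping needed to queue the node, which corresponds to the extra pointer mentioned above.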
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ecca47ce

02 Jul, 2014 1 commit

core: fix typo in percpu read_mostly section · 330d2822

Authored by Zhengyu He on Jul 01, 2014

This fixes a typo that named the read_mostly section of percpu as
readmostly. It works fine with SMP because the linker script specifies
.data..percpu..readmostly. However, UP kernel builds don't have percpu
sections defined and the non-percpu version of the section is called
data..read_mostly, so .data..readmostly will float around and may break
things unexpectedly.

Looking at the original change that introduced data..percpu..readmostly
(commit c957ef2c), it looks like this
was the original intention.

Tested: Built UP kernel and confirmed the sections got merged.

- Before the patch:
$ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004418  0000000000000000  0000000000000000  00431ac0  2**6
50 .data..readmostly 00000014  0000000000000000  0000000000000000  00444000  2**3

- After the patch:
$ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004438  0000000000000000  0000000000000000  00431ac0  2**6
Signed-off-by: Zhengyu He <hzy@google.com>
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

330d2822

01 Jul, 2014 1 commit

usb-storage/SCSI: Add broken_fua blacklist flag · b14bf2d0

Authored by Alan Stern on Jun 30, 2014

Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs.  This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b14bf2d0

30 Jun, 2014 2 commits

kernfs: introduce kernfs_pin_sb() · 4e26445f

Authored by Li Zefan on Jun 30, 2014

kernfs_pin_sb() tries to get a refcnt of the superblock.

This will be used by cgroupfs.

v2:
- make kernfs_pin_sb() return the superblock.
- drop kernfs_drop_sb().

tj: Updated the comment a bit.

[ This is a prerequisite for a bugfix. ]
Cc: <stable@vger.kernel.org> # 3.15
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

4e26445f

clk: exynos5420: Remove aclk66_peric from the clock tree description · 44ff0254

Authored by Doug Anderson on Jun 05, 2014

The "aclk66_peric" clock is a gate clock with a whole bunch of gates
underneath it.  This big gate isn't very useful to include in our
clock tree.  If any of the children need to be turned on then the big
gate will need to be on anyway.  ...and there are plenty of other "big
gates" that aren't described in our clock tree, some of which shut off
collections of clocks that have no relationship in the hierarchy so
are hard to model.

"aclk66_peric" is causing earlyprintk problems since it gets disabled
as part of the boot process, so let's just remove it.

Strangely (and for no good reason) this clock is exported as part of
the common clock bindings.  Remove it since there are no in-kernel
device trees using it and no reason anyone out of tree should refer to
it either.
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Tomasz Figa <t.figa@samsung.com>

44ff0254

29 Jun, 2014 1 commit

btrfs: create sprout should rename fsid on the sysfs as well · b2373f25

Authored by Anand Jain on Jun 03, 2014

Creating a sprout changes the fsid of the mounted root.
Do the same on the sysfs as well.

reproducer:
 mount /dev/sdb /btrfs (seed disk)
 btrfs dev add /dev/sdc /btrfs
 mount -o rw,remount /btrfs
 btrfs dev del /dev/sdb /btrfs
 mount /dev/sdb /btrfs

Error:
kobject_add_internal failed for fe350492-dc28-4051-a601-e017b17e6145 with -EEXIST, don't try to register things with the same name in the same directory.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

b2373f25

28 Jun, 2014 1 commit

iovec: move memcpy_from/toiovecend to lib/iovec.c · ac5ccdba

Authored by Michael S. Tsirkin on Jun 19, 2014

ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!

commit 9f977ef7
    vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.

socket.h already includes uio.h, so no callers need updating.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

ac5ccdba

27 Jun, 2014 3 commits

usb: gadget: f_fs: resurect usb_functionfs_descs_head structure · 09122141

Authored by Michal Nazarewicz on Jun 13, 2014

Even though the usb_functionfs_descs_head structure is now deprecated,
it has been used by some user space tools.  Its removal in commit
[ac8dde11: “Add flags to descriptors block”] was an oversight
leading to build breakage for such tools.

Bring it back so that old user space tools can still be built
without problems on newer kernel versions.

Cc: <stable@vger.kernel.org>  # 3.14
Reported-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Reported-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>

09122141

Revert "tools: ffs-test: convert to new descriptor format fixing compilation error" · 9ad78604

Authored by Felipe Balbi on Jun 27, 2014

This reverts commit f2af7412.

There is a better fix for this build error coming in a following
patch.
Signed-off-by: Felipe Balbi <balbi@ti.com>

9ad78604

Fix 32-bit regression in block device read(2) · 0b86dbf6

Authored by Al Viro on Jun 23, 2014

blkdev_read_iter() wants to cap the iov_iter by the amount of data
remaining to the end of device.  That's what iov_iter_truncate() is for
(trim iter->count if it's above the given limit).  So far, so good, but
the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
case of a large device) we end up with that upper limit truncated down
to 32 bits *before* comparing it with iter->count.

Easily fixed by making iov_iter_truncate() take 64bit argument - it does
the right thing after such change (we only reach the assignment in there
when the current value of iter->count is greater than the limit, i.e.
for anything that would get truncated we don't reach the assignment at
all) and that argument is not the new value of iter->count - it's an
upper limit for such.

The overhead of passing u64 is not an issue - the thing is inlined, so
callers passing size_t won't pay any penalty.
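
The truncation can be reproduced with a small model of the helper. This is an illustrative Python sketch rather than the kernel function: `truncate_count()` and its `limit_bits` parameter are hypothetical, with 32 standing in for a size_t argument on a 32-bit machine and 64 for the u64 argument after the fix.

```python
def truncate_count(count: int, limit: int, limit_bits: int) -> int:
    """Model of an iov_iter_truncate()-style helper: cap count at limit.

    The limit is first narrowed to limit_bits, mimicking the implicit
    conversion to the parameter type at the call site.
    """
    limit &= (1 << limit_bits) - 1
    return limit if count > limit else count


remaining = (1 << 32) + 512     # bytes left before the end of a large device
requested = 4096                # size of the read(2) request

print(truncate_count(requested, remaining, 32))   # limit wraps to 512
print(truncate_count(requested, remaining, 64))   # limit stays above the request
```

With a 32-bit limit the read is wrongly capped at 512 bytes even though more than 4 GiB remain; with a 64-bit limit the request passes through unchanged.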
Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0b86dbf6

26 Jun, 2014 1 commit

ipv4: fix dst race in sk_dst_get() · f8864972

Authored by Eric Dumazet on Jun 24, 2014

When the IP route cache was removed in linux-3.6, we broke the assumption
that dst entries were all freed after an rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.
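
The "do not take a reference if the count is already 0" rule can be sketched as an increment-unless-zero operation. This is a toy Python model under stated assumptions: the `Dst` class, the dictionary used as the socket slot and the lock (standing in for an atomic operation) are all hypothetical, and the RCU grace period is not modelled.

```python
import threading


class Dst:
    """Toy stand-in for a dst entry with a reference count."""

    def __init__(self):
        self.refcnt = 1
        self._lock = threading.Lock()    # stands in for an atomic operation

    def inc_not_zero(self) -> bool:
        """Take a reference only if the object is still alive."""
        with self._lock:
            if self.refcnt == 0:
                return False
            self.refcnt += 1
            return True

    def release(self) -> None:
        """Drop one reference."""
        with self._lock:
            self.refcnt -= 1


def sk_dst_get(slot):
    """Return a referenced dst from the slot, or None if it is already dying."""
    dst = slot.get("dst")
    if dst is not None and not dst.inc_not_zero():
        return None
    return dst


slot = {"dst": Dst()}
held = sk_dst_get(slot)          # refcnt 1 -> 2
held.release()                   # refcnt 2 -> 1
slot["dst"].release()            # last reference dropped, refcnt is now 0
print(sk_dst_get(slot))          # the dying entry is not handed out again
```

Refusing to resurrect an entry whose count has reached 0 is what prevents a second release from freeing it twice.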
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dormando <dormando@rydia.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8864972

25 Jun, 2014 1 commit

block: add support for limiting gaps in SG lists · 66cb45aa

Authored by Jens Axboe on Jun 24, 2014

Another restriction inherited for NVMe - those devices don't support
SG lists that have "gaps" in them. A gap refers to a case where the
previous SG entry doesn't end on a page boundary. For NVMe, all SG
entries must start at offset 0 (except the first) and end on a page
boundary (except the last).
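
The rule stated above can be checked with a few lines of code. This is a hypothetical Python sketch, not the block layer implementation: the 4096-byte page size and the `(offset_in_page, length)` tuple layout are assumptions made for the illustration.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def has_gaps(segments):
    """segments: ordered list of (offset_in_page, length) pairs.

    Every entry except the first must start at offset 0, and every entry
    except the last must end on a page boundary.
    """
    last = len(segments) - 1
    for i, (offset, length) in enumerate(segments):
        if i != 0 and offset != 0:
            return True
        if i != last and (offset + length) % PAGE_SIZE != 0:
            return True
    return False


print(has_gaps([(512, 3584), (0, 4096), (0, 100)]))   # partial first and last only
print(has_gaps([(0, 4096), (256, 1024)]))             # second entry starts mid-page
print(has_gaps([(0, 1000), (0, 4096)]))               # first entry ends mid-page
```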
Signed-off-by: Jens Axboe <axboe@fb.com>

66cb45aa

openanolis / cloud-kernel: synced successfully almost 2 years ago