提交 · fd75815f727f157a05f4c96b5294a4617c0557da · openanolis / cloud-kernel

11 5月, 2012 3 次提交

KEYS: Add invalidation support · fd75815f

由 David Howells 提交于 5月 11, 2012

Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.

It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.

To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict. It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.

The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

fd75815f

KEYS: Do LRU discard in full keyrings · 31d5a79d

由 David Howells 提交于 5月 11, 2012

Do an LRU discard in keyrings that are full rather than returning ENFILE.  To
perform this, a time_t is added to the key struct and updated by the creation
of a link to a key and by a key being found as the result of a search.  At the
completion of a successful search, the keyrings in the path between the root of
the search and the first found link to it also have their last-used times
updated.

Note that discarding a link to a key from a keyring does not necessarily
destroy the key as there may be references held by other places.

An alternate discard method that might suffice is to perform FIFO discard from
the keyring, using the spare 2-byte hole in the keylist header as the index of
the next link to be discarded.

This is useful when using a keyring as a cache for DNS results or foreign
filesystem IDs.


This can be tested by the following.  As root do:

	echo 1000 >/proc/sys/kernel/keys/root_maxkeys

	kr=`keyctl newring foo @s`
	for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done

Without this patch ENFILE should be reported when the keyring fills up.  With
this patch, the keyring discards keys in an LRU fashion.  Note that the stored
LRU time has a granularity of 1s.

After doing this, /proc/key-users can be observed and should show that most of
the 2000 keys have been discarded:

	[root@andromeda ~]# cat /proc/key-users
	    0:   517 516/516 513/1000 5249/20000

The "513/1000" here is the number of quota-accounted keys present for this user
out of the maximum permitted.

In /proc/keys, the keyring shows the number of keys it has and the number of
slots it has allocated:

	[root@andromeda ~]# grep foo /proc/keys
	200c64c4 I--Q--     1 perm 3b3f0000     0     0 keyring   foo: 509/509

The maximum is (PAGE_SIZE - header) / key pointer size.  That's typically 509
on a 64-bit system and 1020 on a 32-bit system.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

31d5a79d

KEYS: Perform RCU synchronisation on keys prior to key destruction · 65d87fe6

由 David Howells 提交于 5月 11, 2012

Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count. This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.

This is useful in keyring search vs key link. Consider a keyring containing a
link to a key. That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.

This permits __key_link() to replace a key without having to reallocate the key
payload. A key gets replaced if a new key being linked into a keyring has the
same type and description.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJeff Layton <jlayton@redhat.com>

65d87fe6

30 4月, 2012 1 次提交

pipes: add a "packetized pipe" mode for writing · 9883035a

由 Linus Torvalds 提交于 4月 29, 2012

The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own.  The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops.  Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes).  But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: NMichael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9883035a

28 4月, 2012 1 次提交

spi: fix spi.h kernel-doc warning · dbabe0d6

由 Randy Dunlap 提交于 4月 17, 2012

Fix kernel-doc warning in spi.h (copy/paste):

Warning(include/linux/spi/spi.h:365): No description found for parameter 'unprepare_transfer_hardware'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

dbabe0d6

27 4月, 2012 1 次提交

ARM: pxa: fix gpio wakeup setting · b95ace54

由 Robert Jarzmik 提交于 4月 22, 2012

In 3.3, gpio wakeup setting was broken. The call
enable_irq_wake() didn't set up the PXA gpio registers
(PWER, ...) anymore.

Fix it at least for pxa27x. The driver doesn't seem to be
used in pxa25x (weird ...), and the fix doesn't extend to
pxa3xx and pxa95x (which don't have a gpio_set_wake()
available).
Signed-off-by: NRobert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: NHaojian Zhuang <haojian.zhuang@gmail.com>

b95ace54

26 4月, 2012 1 次提交

mm: fix up the vmscan stat in vmstat · 904249aa

由 Ying Han 提交于 4月 25, 2012

The "pgsteal" stat is confusing because it counts both direct reclaim as
well as background reclaim.  However, we have "kswapd_steal" which also
counts background reclaim value.

This patch fixes it and also makes it match the existng "pgscan_" stats.

Test:
pgsteal_kswapd_dma32 447623
pgsteal_kswapd_normal 42272677
pgsteal_kswapd_movable 0
pgsteal_direct_dma32 2801
pgsteal_direct_normal 44353270
pgsteal_direct_movable 0
Signed-off-by: NYing Han <yinghan@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

904249aa

25 4月, 2012 1 次提交

USB: EHCI: fix crash during suspend on ASUS computers · 151b6128

由 Alan Stern 提交于 4月 24, 2012

This patch (as1545) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers.  Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.

After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state.  Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.

The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep.  Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.

This fixes Bugzilla #42728.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NSteven Rostedt <rostedt@goodmis.org>
Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
Tested-by: NOleksij Rempel (fishor) <bug-track@fisher-privat.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

151b6128

23 4月, 2012 3 次提交

HSI: hsi: Rework hsi_event interface · ec1c56ff

由 Carlos Chinea 提交于 4月 11, 2012

Remove custom hack and make use of the notifier chain interfaces for
delivering events from the ports to their associated clients.
Clients that want to receive port events need to register their callbacks
using hsi_register_port_event(). The callbacks can be called in interrupt
context. Use hsi_unregestier_port_event() to undo the registration.
Signed-off-by: NCarlos Chinea <carlos.chinea@nokia.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>

ec1c56ff

HSI: hsi: Rework hsi_controller release · 5a218ceb

由 Carlos Chinea 提交于 4月 04, 2012

Use the proper release mechanism for hsi_controller and
hsi_ports structures. Free the structures through their
associated device release callbacks.
Signed-off-by: NCarlos Chinea <carlos.chinea@nokia.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>

5a218ceb

irq: Add IRQ_TYPE_DEFAULT for use by PIC drivers · 3fca40c7

由 Benjamin Herrenschmidt 提交于 4月 19, 2012

This is meant typically to allow a PIC driver's irq domain map() callback
to establish sane defaults for the interrupt (and make sure that the HW
and the irq_desc are in sync as far as the trigger is concerned).

The irq core may not call the set_trigger callback if it thinks the
trigger is already set to the right setting, so we need to ensure new
descriptors are properly synchronized with the hardware.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3fca40c7

21 4月, 2012 6 次提交

A
kill mm argument of vm_munmap() · bfce281c
由 Al Viro 提交于 4月 20, 2012
```
it's always current->mm
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
bfce281c

NFSv4: Ensure we do not reuse open owner names · 95b72eb0

由 Trond Myklebust 提交于 4月 20, 2012

The NFSv4 spec is ambiguous about whether or not it is permissible
to reuse open owner names, so play it safe. This patch adds a timestamp
to the state_owner structure, and combines that with the IDA based
uniquifier.
Fixes a regression whereby the Linux server returns NFS4ERR_BAD_SEQID.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

95b72eb0

mmc: remove MMC bus legacy suspend/resume method · 32d317c6

由 Chuanxiao Dong 提交于 4月 11, 2012

MMC bus is using legacy suspend/resume method, which is not compatible if
runtime pm callbacks are used. In this scenario, MMC bus suspend/resume
callbacks cannot be called when system entering S3. So change to use the
new defined dev_pm_ops for system sleeping mode.

Tested on AM335x Platform. Solves major issue/crash reported at
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg65425.htmlSigned-off-by: NChuanxiao Dong <chuanxiao.dong@intel.com>
Tested-by: NHebbar, Gururaja <gururaja.hebbar@ti.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NUlf Hansson <ulf.hansson@stericsson.com>
Signed-off-by: NChris Ball <cjb@laptop.org>

32d317c6

VM: add "vm_mmap()" helper function · 6be5ceb0

由 Linus Torvalds 提交于 4月 20, 2012

This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.

This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c.  But that way we don't have
to export our internal do_mmap_pgoff() function.

Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead.  We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6be5ceb0

VM: add "vm_munmap()" helper function · a46ef99d

由 Linus Torvalds 提交于 4月 20, 2012

Like the vm_brk() function, this is the same as "do_munmap()", except it
does the VM locking for the caller.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a46ef99d

VM: add "vm_brk()" helper function · e4eb1ff6

由 Linus Torvalds 提交于 4月 20, 2012

It does the same thing as "do_brk()", except it handles the VM locking
too.

It turns out that all external callers want that anyway, so we can make
do_brk() static to just mm/mmap.c while at it.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e4eb1ff6

18 4月, 2012 1 次提交

seccomp: ignore secure_computing return values · e4da89d0

由 Will Drewry 提交于 4月 17, 2012

This change is inspired by
  https://lkml.org/lkml/2012/4/16/14
which fixes the build warnings for arches that don't support
CONFIG_HAVE_ARCH_SECCOMP_FILTER.

In particular, there is no requirement for the return value of
secure_computing() to be checked unless the architecture supports
seccomp filter.  Instead of silencing the warnings with (void)
a new static inline is added to encode the expected behavior
in a compiler and human friendly way.

v2: - cleans things up with a static inline
    - removes sfr's signed-off-by since it is a different approach
v1: - matches sfr's original change
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

e4da89d0

17 4月, 2012 3 次提交

seccomp: use a static inline for a function stub · b1fa650c

由 Stephen Rothwell 提交于 4月 17, 2012

Fixes this error message when CONFIG_SECCOMP is not set:

arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter':
arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value]
Signed-off-by: NStephen Rothwell <sfr@ozlabs.au.ibm.com>
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

b1fa650c

mfd: Fix modular builds of rc5t583 regulator support · 82ea267f

由 Paul Gortmaker 提交于 4月 16, 2012

The combination of commit 1b1247dd

    "mfd: Add support for RICOH PMIC RC5T583"

and commit 6ffc3270

    "regulator: Add support for RICOH PMIC RC5T583 regulator"

are causing the i386 allmodconfig builds to fail with this:

  ERROR: "rc5t583_update" [drivers/regulator/rc5t583-regulator.ko] undefined!
  ERROR: "rc5t583_set_bits" [drivers/regulator/rc5t583-regulator.ko] undefined!
  ERROR: "rc5t583_clear_bits" [drivers/regulator/rc5t583-regulator.ko] undefined!
  ERROR: "rc5t583_read" [drivers/regulator/rc5t583-regulator.ko] undefined!

and this:

  ERROR: "rc5t583_ext_power_req_config" [drivers/regulator/rc5t583-regulator.ko] undefined!

For the 1st four, make the simple ops static inline, instead of
polluting the namespace with trivial exports.  For the last one,
add an EXPORT_SYMBOL.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

82ea267f

nfsd: include cld.h in the headers_install target · d22053cd

由 Jeff Layton 提交于 4月 16, 2012

The cld.h file contains the definition of the upcall format to talk
with nfsdcld. When I added the file though, I neglected to add it
to the headers-y target, so make headers_install wasn't installing it.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

d22053cd

16 4月, 2012 2 次提交

mfd: Convert twl6040 to i2c driver, and separate it from twl core · 8eaeb939

由 Peter Ujfalusi 提交于 4月 03, 2012

Complete the separation of the twl6040 from the twl core since
it is a separate chip, not part of the twl6030 PMIC.

Make the needed Kconfig changes for the depending drivers at the
same time to avoid breaking the kernel build (vibra, ASoC components).
Signed-off-by: NPeter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: NMark Brown <broonie@opensource.wolfsonicro.com>
Acked-by: NTony Lindgren <tony@atomide.com>
Acked-by: NDmitry Torokhov <dtor@mail.ru>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

8eaeb939

mfd : Fix dbx500 compilation error · 4accdff7

由 Daniel Lezcano 提交于 4月 02, 2012

The ux500 default config enables the db5500 and the db8500.
The incoming cpuidle driver uses the 'prcmu_enable_wakeups'
and the 'prcmu_set_power_state' functions but these ones
are defined but not implemented for the db5500, leading to
an unresolved symbol error at link time. In order to compile,
we have to disable the db5500 support which is not acceptable
for the default config.

I noticed there are also some other functions which are
defined but not implemented.

This patch fix this by removing the functions definitions
and move out of the config section the empty functions which
are normally used when the DB550 config is disabled.
Only the functions which are not implemented are concerned
by this modification.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

4accdff7

14 4月, 2012 11 次提交

do not export kernel's NULL #define to userspace · 2084c24a

由 Lubos Lunak 提交于 3月 21, 2012

GCC's NULL is actually __null, which allows detecting some questionable
NULL usage and warn about it.  Moreover each platform/compiler should
have its own stddef.h anyway (which is different from linux/stddef.h).

So there's no good reason to leak kernel's NULL to userspace and
override what the compiler provides.
Signed-off-by: NLuboš Luňák <l.lunak@suse.cz>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2084c24a

ptrace,seccomp: Add PTRACE_SECCOMP support · fb0fadf9

由 Will Drewry 提交于 4月 12, 2012

This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
SECCOMP_RET_DATA mask of the BPF program return value will be passed as
the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

If the subordinate process is not using seccomp filter, then no
system call notifications will occur even if the option is specified.

If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
is returned, the system call will not be executed and an -ENOSYS errno
will be returned to userspace.

This change adds a dependency on the system call slow path.  Any future
efforts to use the system call fast path for seccomp filter will need to
address this restriction.
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NEric Paris <eparis@redhat.com>

v18: - rebase
     - comment fatal_signal check
     - acked-by
     - drop secure_computing_int comment
v17: - ...
v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
       [note PT_TRACE_MASK disappears in linux-next]
v15: - add audit support for non-zero return codes
     - clean up style (indan@nul.nu)
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6
       (Brings back a change to ptrace.c and the masks.)
v12: - rebase to linux-next
     - use ptrace_event and update arch/Kconfig to mention slow-path dependency
     - drop all tracehook changes and inclusion (oleg@redhat.com)
v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
       (indan@nul.nu)
v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
v9:  - n/a
v8:  - guarded PTRACE_SECCOMP use with an ifdef
v7:  - introduced
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

fb0fadf9

seccomp: Add SECCOMP_RET_TRAP · bb6ea430

由 Will Drewry 提交于 4月 12, 2012

Adds a new return value to seccomp filters that triggers a SIGSYS to be
delivered with the new SYS_SECCOMP si_code.

This allows in-process system call emulation, including just specifying
an errno or cleanly dumping core, rather than just dying.
Suggested-by: NMarkus Gutschke <markus@chromium.org>
Suggested-by: NJulien Tinnes <jln@chromium.org>
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NEric Paris <eparis@redhat.com>

v18: - acked-by, rebase
     - don't mention secure_computing_int() anymore
v15: - use audit_seccomp/skip
     - pad out error spacing; clean up switch (indan@nul.nu)
v14: - n/a
v13: - rebase on to 88ebdda6
v12: - rebase on to linux-next
v11: - clarify the comment (indan@nul.nu)
     - s/sigtrap/sigsys
v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
       note suggested-by (though original suggestion had other behaviors)
v9:  - changes to SIGILL
v8:  - clean up based on changes to dependent patches
v7:  - introduction
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

bb6ea430

seccomp: add SECCOMP_RET_ERRNO · acf3b2c7

由 Will Drewry 提交于 4月 12, 2012

This change adds the SECCOMP_RET_ERRNO as a valid return value from a
seccomp filter.  Additionally, it makes the first use of the lower
16-bits for storing a filter-supplied errno.  16-bits is more than
enough for the errno-base.h calls.

Returning errors instead of immediately terminating processes that
violate seccomp policy allow for broader use of this functionality
for kernel attack surface reduction.  For example, a linux container
could maintain a whitelist of pre-existing system calls but drop
all new ones with errnos.  This would keep a logically static attack
surface while providing errnos that may allow for graceful failure
without the downside of do_exit() on a bad call.

This change also changes the signature of __secure_computing.  It
appears the only direct caller is the arm entry code and it clobbers
any possible return value (register) immediately.
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NEric Paris <eparis@redhat.com>

v18: - fix up comments and rebase
     - fix bad var name which was fixed in later revs
     - remove _int() and just change the __secure_computing signature
v16-v17: ...
v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
     - clean up and pad out return codes (indan@nul.nu)
v14: - no change/rebase
v13: - rebase on to 88ebdda6
v12: - move to WARN_ON if filter is NULL
       (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
     - return immediately for filter==NULL (keescook@chromium.org)
     - change evaluation to only compare the ACTION so that layered
       errnos don't result in the lowest one being returned.
       (keeschook@chromium.org)
v11: - check for NULL filter (keescook@chromium.org)
v10: - change loaders to fn
 v9: - n/a
 v8: - update Kconfig to note new need for syscall_set_return_value.
     - reordered such that TRAP behavior follows on later.
     - made the for loop a little less indent-y
 v7: - introduced
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

acf3b2c7

seccomp: remove duplicated failure logging · 3dc1c1b2

由 Kees Cook 提交于 4月 12, 2012

This consolidates the seccomp filter error logging path and adds more
details to the audit log.
Signed-off-by: NWill Drewry <wad@chromium.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NEric Paris <eparis@redhat.com>

v18: make compat= permanent in the record
v15: added a return code to the audit_seccomp path by wad@chromium.org
     (suggested by eparis@redhat.com)
v*: original by keescook@chromium.org
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

3dc1c1b2

seccomp: add system call filtering using BPF · e2cfabdf

由 Will Drewry 提交于 4月 12, 2012

[This patch depends on luto@mit.edu's no_new_privs patch:
   https://lkml.org/lkml/2012/1/30/264
 The whole series including Andrew's patches can be found here:
   https://github.com/redpig/linux/tree/seccomp
 Complete diff here:
   https://github.com/redpig/linux/compare/1dc65fed...seccomp
]

This patch adds support for seccomp mode 2.  Mode 2 introduces the
ability for unprivileged processes to install system call filtering
policy expressed in terms of a Berkeley Packet Filter (BPF) program.
This program will be evaluated in the kernel for each system call
the task makes and computes a result based on data in the format
of struct seccomp_data.

A filter program may be installed by calling:
  struct sock_fprog fprog = { ... };
  ...
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);

The return value of the filter program determines if the system call is
allowed to proceed or denied.  If the first filter program installed
allows prctl(2) calls, then the above call may be made repeatedly
by a task to further reduce its access to the kernel.  All attached
programs must be evaluated before a system call will be allowed to
proceed.

Filter programs will be inherited across fork/clone and execve.
However, if the task attaching the filter is unprivileged
(!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
ensures that unprivileged tasks cannot attach filters that affect
privileged tasks (e.g., setuid binary).

There are a number of benefits to this approach. A few of which are
as follows:
- BPF has been exposed to userland for a long time
- BPF optimization (and JIT'ing) are well understood
- Userland already knows its ABI: system call numbers and desired
  arguments
- No time-of-check-time-of-use vulnerable data accesses are possible.
- system call arguments are loaded on access only to minimize copying
  required for system call policy decisions.

Mode 2 support is restricted to architectures that enable
HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
syscall_get_arguments().  The full desired scope of this feature will
add a few minor additional requirements expressed later in this series.
Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
the desired additional functionality.

No architectures are enabled in this patch.
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: NIndan Zupancic <indan@nul.nu>
Acked-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>

v18: - rebase to v3.4-rc2
     - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
     - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
     - add a comment for get_u32 regarding endianness (akpm@)
     - fix other typos, style mistakes (akpm@)
     - added acked-by
v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
     - tighten return mask to 0x7fff0000
v16: - no change
v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
       size (indan@nul.nu)
     - drop the max insns to 256KB (indan@nul.nu)
     - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
     - move IP checks after args (indan@nul.nu)
     - drop !user_filter check (indan@nul.nu)
     - only allow explicit bpf codes (indan@nul.nu)
     - exit_code -> exit_sig
v14: - put/get_seccomp_filter takes struct task_struct
       (indan@nul.nu,keescook@chromium.org)
     - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
     - add seccomp_bpf_load for use by net/core/filter.c
     - lower max per-process/per-hierarchy: 1MB
     - moved nnp/capability check prior to allocation
       (all of the above: indan@nul.nu)
v13: - rebase on to 88ebdda6
v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
     - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
     - reworded the prctl_set_seccomp comment (indan@nul.nu)
v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
     - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
     - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
     - pare down Kconfig doc reference.
     - extra comment clean up
v10: - seccomp_data has changed again to be more aesthetically pleasing
       (hpa@zytor.com)
     - calling convention is noted in a new u32 field using syscall_get_arch.
       This allows for cross-calling convention tasks to use seccomp filters.
       (hpa@zytor.com)
     - lots of clean up (thanks, Indan!)
 v9: - n/a
 v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
     - Lots of fixes courtesy of indan@nul.nu:
     -- fix up load behavior, compat fixups, and merge alloc code,
     -- renamed pc and dropped __packed, use bool compat.
     -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
        dependencies
 v7:  (massive overhaul thanks to Indan, others)
     - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
     - merged into seccomp.c
     - minimal seccomp_filter.h
     - no config option (part of seccomp)
     - no new prctl
     - doesn't break seccomp on systems without asm/syscall.h
       (works but arg access always fails)
     - dropped seccomp_init_task, extra free functions, ...
     - dropped the no-asm/syscall.h code paths
     - merges with network sk_run_filter and sk_chk_filter
 v6: - fix memory leak on attach compat check failure
     - require no_new_privs || CAP_SYS_ADMIN prior to filter
       installation. (luto@mit.edu)
     - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
     - cleaned up Kconfig (amwang@redhat.com)
     - on block, note if the call was compat (so the # means something)
 v5: - uses syscall_get_arguments
       (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
      - uses union-based arg storage with hi/lo struct to
        handle endianness.  Compromises between the two alternate
        proposals to minimize extra arg shuffling and account for
        endianness assuming userspace uses offsetof().
        (mcgrathr@chromium.org, indan@nul.nu)
      - update Kconfig description
      - add include/seccomp_filter.h and add its installation
      - (naive) on-demand syscall argument loading
      - drop seccomp_t (eparis@redhat.com)
 v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
      - now uses current->no_new_privs
        (luto@mit.edu,torvalds@linux-foundation.com)
      - assign names to seccomp modes (rdunlap@xenotime.net)
      - fix style issues (rdunlap@xenotime.net)
      - reworded Kconfig entry (rdunlap@xenotime.net)
 v3:  - macros to inline (oleg@redhat.com)
      - init_task behavior fixed (oleg@redhat.com)
      - drop creator entry and extra NULL check (oleg@redhat.com)
      - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
      - adds tentative use of "always_unprivileged" as per
        torvalds@linux-foundation.org and luto@mit.edu
 v2:  - (patch 2 only)
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

e2cfabdf

seccomp: kill the seccomp_t typedef · 932ecebb

由 Will Drewry 提交于 4月 12, 2012

Replaces the seccomp_t typedef with struct seccomp to match modern
kernel style.
Signed-off-by: NWill Drewry <wad@chromium.org>
Reviewed-by: NJames Morris <jmorris@namei.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Acked-by: NEric Paris <eparis@redhat.com>

v18: rebase
...
v14: rebase/nochanges
v13: rebase on to 88ebdda6
v12: rebase on to linux-next
v8-v11: no changes
v7: struct seccomp_struct -> struct seccomp
v6: original inclusion in this series.
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

932ecebb

net/compat.c,linux/filter.h: share compat_sock_fprog · 0c5fe1b4

由 Will Drewry 提交于 4月 12, 2012

Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls.  This change allows the existing
compat_sock_fprog be shared.
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NEric Paris <eparis@redhat.com>

v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda6
v12: rebase on to linux-next
v11: introduction
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

0c5fe1b4

sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W · 46b325c7

由 Will Drewry 提交于 4月 12, 2012

Introduces a new BPF ancillary instruction that all LD calls will be
mapped through when skb_run_filter() is being used for seccomp BPF.  The
rewriting will be done using a secondary chk_filter function that is run
after skb_chk_filter.

The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
along with the seccomp_bpf_load() function later in this series.

This is based on http://lkml.org/lkml/2012/3/2/141Suggested-by: NIndan Zupancic <indan@nul.nu>
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NEric Paris <eparis@redhat.com>

v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
... v13: made bpf functions generic.
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

46b325c7

Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs · 259e5e6c

由 Andy Lutomirski 提交于 4月 12, 2012

With this change, calling
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time.  For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set.  The same is true for file capabilities.

Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.

To determine if the NO_NEW_PRIVS bit is set, a task may call
  prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)

This functionality is desired for the proposed seccomp filter patch
series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.

Another potential use is making certain privileged operations
unprivileged.  For example, chroot may be considered "safe" if it cannot
affect privileged tasks.

Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.  It is fixed in a subsequent patch.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NWill Drewry <wad@chromium.org>
Acked-by: NEric Paris <eparis@redhat.com>
Acked-by: NKees Cook <keescook@chromium.org>

v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

259e5e6c

skbuff: struct ubuf_info callback type safety · ca8f4fb2

由 Michael S. Tsirkin 提交于 4月 09, 2012

The skb struct ubuf_info callback gets passed struct ubuf_info
itself, not the arg value as the field name and the function signature
seem to imply. Rename the arg field to ctx to match usage,
add documentation and change the callback argument type
to make usage clear and to have compiler check correctness.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca8f4fb2

13 4月, 2012 2 次提交

ARM: 7366/3: amba: Remove AMBA level regulator support · 1e45860f

由 Mark Brown 提交于 4月 13, 2012

The AMBA bus regulator support is being used to model on/off switches
for power domains which isn't terribly idiomatic for modern kernels with
the generic power domain code and creates integration problems on platforms
which don't use regulators for their power domains as it's hard to tell
the difference between a regulator that is needed but failed to be provided
and one that isn't supposed to be there (though DT does make that easier).

Platforms that wish to use the regulator API to manage their power domains
can indirect via the power domain interface.

This feature is only used with the vape supply of the db8500 PRCMU
driver which supplies the UARTs and MMC controllers, none of which have
support for managing vcore at runtime in mainline (only pl022 SPI
controller does). Update that supply to have an always_on constraint
until the power domain support for the system is updated so that it is
enabled for these users, this is likely to have no impact on practical
systems as probably at least one of these devices will be active and
cause AMBA to hold the supply on anyway.
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Tested-by: NShawn Guo <shawn.guo@linaro.org>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

1e45860f

kconfig: fix IS_ENABLED to not require all options to be defined · 69349c2d

由 Paul Gortmaker 提交于 4月 12, 2012

Using IS_ENABLED() within C (vs.  within CPP #if statements) in its
current form requires us to actually define every possible bool/tristate
Kconfig option twice (__enabled_* and __enabled_*_MODULE variants).

This results in a huge autoconf.h file, on the order of 16k lines for a
x86_64 defconfig.

Fixing IS_ENABLED to be able to work on the smaller subset of just
things that we really have defined is step one to fixing this.  Which
means it has to not choke when fed non-enabled options, such as:

  include/linux/netdevice.h:964:1: warning: "__enabled_CONFIG_FCOE_MODULE" is not defined [-Wundef]

The original prototype of how to implement a C and preprocessor
compatible way of doing this came from the Google+ user "comex ." in
response to Linus' crowdsourcing challenge for a possible improvement on
his earlier C specific solution:

	#define config_enabled(x)       (__stringify(x)[0] == '1')

In this implementation, I've chosen variable names that hopefully make
how it works more understandable.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69349c2d

12 4月, 2012 3 次提交

fuse: use flexible array in fuse.h · c628ee67

由 Miklos Szeredi 提交于 4月 12, 2012

Use the ISO C standard compliant form instead of the gcc extension in the
interface definition.
Reported-by: NShachar Sharon <ssnail@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

c628ee67

irq_domain: Move irq_virq_count into NOMAP revmap · 6fa6c8e2

由 Grant Likely 提交于 2月 15, 2012

This patch replaces the old global setting of irq_virq_count that is only
used by the NOMAP mapping and instead uses a revmap_data property so that
the maximum NOMAP allocation can be set per NOMAP irq_domain.

There is exactly one user of irq_virq_count in-tree right now: PS3.
Also, irq_virq_count is only useful for the NOMAP mapping.  So,
instead of having a single global irq_virq_count values, this change
drops it entirely and added a max_irq argument to irq_domain_add_nomap().
That makes it a property of an individual nomap irq domain instead of
a global system settting.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>

6fa6c8e2

KVM: unmap pages from the iommu when slots are removed · 32f6daad

由 Alex Williamson 提交于 4月 11, 2012

We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown.  A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.

Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa.  This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

32f6daad

11 4月, 2012 1 次提交

tcp: avoid order-1 allocations on wifi and tx path · a21d4572

由 Eric Dumazet 提交于 4月 10, 2012

Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.
Reported-by: NMarc MERLIN <marc@merlins.org>
Tested-by: NMarc MERLIN <marc@merlins.org>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a21d4572

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功