Commits · 2633d7a028239a738b793be5ca8fa6ac312f5793 · openeuler / raspberrypi-kernel

19 Dec, 2012 8 commits

slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0

Committed by Glauber Costa on Dec 18, 2012

Allow a memcg parameter to be passed during cache creation.  When the slub
allocator is being used, it will only merge caches that belong to the same
memcg.  We'll do this by scanning the global list, and then translating
the cache to a memcg-specific cache.

Default function is created as a wrapper, passing NULL to the memcg
version.  We only merge caches that belong to the same memcg.

A helper is provided, memcg_css_id: because slub needs a unique cache name
for sysfs.  Since this is visible, but not the canonical location for slab
data, the cache name is not used, the css_id should suffice.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slab/slub: struct memcg_params · ba6c496e

Committed by Glauber Costa on Dec 18, 2012

For the kmem slab controller, we need to record some extra information in
the kmem_cache structure.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fork: protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs · 2ad306b1

Committed by Glauber Costa on Dec 18, 2012

Because those architectures will draw their stacks directly from the page
allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG
flag, and issue the corresponding free_pages.

This code path is taken when the architecture doesn't define
CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
THREAD_SIZE >= PAGE_SIZE.  Luckily, most - if not all - of the remaining
architectures fall in this category.

This will guarantee that every stack page is accounted to the memcg the
process currently lives on, and will cause the allocations to fail if they
go over the limit.

For the time being, I am defining a new variant of THREADINFO_GFP, not to
mess with the other path.  Once the slab is also tracked by memcg, we can
get rid of that flag.

Tested to successfully protect against :(){ :|:& };:
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: use static branches when code not in use · a8964b9b

Committed by Glauber Costa on Dec 18, 2012

We can use static branches to patch the code in or out when not used.

Because the _ACTIVE bit on kmem_accounted is only set after the increment
is done, we guarantee that the root memcg will always be selected for kmem
charges until all call sites are patched (see memcg_kmem_enabled).  This
guarantees that no mischarges are applied.

Static branch decrement happens when the last reference count from the
kmem accounting in memcg dies.  This will only happen when the charges
drop down to 0.

When that happens, we need to disable the static branch only on those
memcgs that enabled it.  To achieve this, we would be forced to complicate
the code by keeping track of which memcgs were the ones that actually
enabled limits, and which ones got it from its parents.

It is a lot simpler just to do static_key_slow_inc() on every child
that is accounted.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

res_counter: return amount of charges after res_counter_uncharge() · 50bdd430

Committed by Glauber Costa on Dec 18, 2012

It is useful to know how many charges are still left after a call to
res_counter_uncharge.  While it is possible to issue a res_counter_read
after uncharge, this can be racy.

If we need, for instance, to take some action when the counters drop down
to 0, only one of the callers should see it.  This is the same semantics
as the atomic variables in the kernel.

Since the current return value is void, we don't need to worry about
anything breaking due to this change: nobody relied on that, and only
users appearing from now on will be checking this value.
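
The race described above can be illustrated with a small userspace model. This is only a sketch: `struct counter` and the `counter_*` names are hypothetical, and C11 atomics stand in for whatever locking the kernel's res_counter uses internally. The point is that the uncharge and the read of the remaining value happen as one step, so only one caller can see the drop to zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a resource counter; not the kernel's struct. */
struct counter {
    atomic_ulong usage;
};

/* Uncharge val and return the usage remaining after this call.
 * The subtraction and the read of the new value are a single atomic step,
 * unlike an uncharge followed by a separate read. */
static unsigned long counter_uncharge(struct counter *c, unsigned long val)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    return atomic_fetch_sub(&c->usage, val) - val;
}

/* True only for the caller whose uncharge brought the usage to zero. */
static bool counter_uncharge_and_test(struct counter *c, unsigned long val)
{
    return counter_uncharge(c, val) == 0;
}
```

A caller that needs to run cleanup when the last charge goes away would act only when `counter_uncharge_and_test()` returns true.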
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: allocate kernel pages to the right memcg · 6a1a0d3b

Committed by Glauber Costa on Dec 18, 2012

When a process tries to allocate a page with the __GFP_KMEMCG flag, the
page allocator will call the corresponding memcg functions to validate
the allocation.  Tasks in the root memcg can always proceed.

To avoid adding markers to the page - and a kmem flag that would
necessarily follow, as much as doing page_cgroup lookups for no reason,
whoever is marking its allocations with __GFP_KMEMCG flag is responsible
for telling the page allocator that this is such an allocation at
free_pages() time.  This is done by the invocation of
__free_accounted_pages() and free_accounted_pages().
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: kmem controller infrastructure · 7ae1e1d0

Committed by Glauber Costa on Dec 18, 2012

Introduce infrastructure for tracking kernel memory pages to a given
memcg.  This will happen whenever the caller includes the __GFP_KMEMCG
flag, and the task belongs to a memcg other than the root.

In memcontrol.h those functions are wrapped in inline accessors.  The idea
is to later patch those with static branches, so we don't incur any
overhead when no mem cgroups with limited kmem are being used.

Users of this functionality shall interact with the memcg core code
through the following functions:

memcg_kmem_newpage_charge: will return true if the group can handle the
                           allocation. At this point, struct page is not
                           yet allocated.

memcg_kmem_commit_charge: will either revert the charge, if struct page
                          allocation failed, or embed memcg information
                          into page_cgroup.

memcg_kmem_uncharge_page: called at free time, will revert the charge.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: add a __GFP_KMEMCG flag · 7a64bf05

Committed by Glauber Costa on Dec 18, 2012

This flag is used to indicate to the callees that this allocation is a
kernel allocation in process context, and should be accounted to current's
memcg.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Dec, 2012 21 commits

fs, exportfs: add exportfs_encode_inode_fh() helper · 711c7bf9

Committed by Cyrill Gorcunov on Dec 17, 2012

We will need this helper in the next patch to provide a file handle for
inotify marks in /proc/pid/fdinfo output.

The patch is rather providing the way to use inodes directly when dentry
is not available (like in case of inotify system).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs, epoll: add procfs fdinfo helper · 138d22b5

Committed by Cyrill Gorcunov on Dec 17, 2012

This allows us to print out eventpoll target file descriptor, events and
data, the /proc/pid/fdinfo/fd consists of

 | pos:	0
 | flags:	02
 | tfd:        5 events:       1d data: ffffffffffffffff enabled: 1

[avagin@: fix for uninitialized ret variable]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: add ability to plug in auxiliary fdinfo providers · 55985dd7

Committed by Cyrill Gorcunov on Dec 17, 2012

This patch brings ability to print out auxiliary data associated with
file in procfs interface /proc/pid/fdinfo/fd.

In particular further patches make eventfd, eventpoll, signalfd and
fsnotify print additional information complete enough to restore
these objects after checkpoint.

To simplify the code we add show_fdinfo callback inside struct
file_operations (as Al and Pavel are proposing).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

prandom: introduce prandom_bytes() and prandom_bytes_state() · 6582c665

Committed by Akinobu Mita on Dec 17, 2012

Add functions to get the requested number of pseudo-random bytes.

The difference from get_random_bytes() is that it generates pseudo-random
numbers by prandom_u32().  It doesn't consume the entropy pool, and the
sequence is reproducible if the same rnd_state is used.  So it is suitable
for generating random bytes for testing.
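
The reproducibility property can be sketched in userspace as follows. All names here (`demo_rnd_state`, `demo_u32_state`, `demo_bytes_state`) are hypothetical, and the generator is a plain xorshift32 used as a stand-in, not the algorithm behind prandom_u32(). The sketch only shows the idea of filling a buffer from an explicit state so that equal states yield equal bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator state; the kernel's struct rnd_state differs. */
struct demo_rnd_state {
    uint32_t s; /* must be seeded with a non-zero value for xorshift32 */
};

/* One xorshift32 step, a stand-in for a per-state 32-bit generator. */
static uint32_t demo_u32_state(struct demo_rnd_state *st)
{
    uint32_t x = st->s;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

/* Fill buf with nbytes pseudo-random bytes drawn from the given state. */
static void demo_bytes_state(struct demo_rnd_state *st, void *buf, size_t nbytes)
{
    unsigned char *p = buf;

    while (nbytes > 0) {
        uint32_t r = demo_u32_state(st);
        size_t chunk = nbytes < sizeof(r) ? nbytes : sizeof(r);

        /* Emit the low bytes of r first; a short tail uses only part of r. */
        for (size_t i = 0; i < chunk; i++)
            p[i] = (unsigned char)(r >> (8 * i));
        p += chunk;
        nbytes -= chunk;
    }
}
```

Two states initialised with the same seed produce identical buffers, which is what makes this style of generator convenient for repeatable tests.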
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

random32: rename random32 to prandom · 496f2f93

Committed by Akinobu Mita on Dec 17, 2012

This renames all random32 functions to have 'prandom_' prefix as follows:

  void prandom_seed(u32 seed);	/* rename from srandom32() */
  u32 prandom_u32(void);		/* rename from random32() */
  void prandom_seed_state(struct rnd_state *state, u64 seed);
  				/* rename from prandom32_seed() */
  u32 prandom_u32_state(struct rnd_state *state);
  				/* rename from prandom32() */

The purpose of this renaming is to prevent some kernel developers from
assuming that prandom32() and random32() might imply that only
prandom32() was the one using a pseudo-random number generator by
prandom32's "p", and the result may be a very embarrassing security
exposure.  This concern was expressed by Theodore Ts'o.

And furthermore, I'm going to introduce new functions for getting the
requested number of pseudo-random bytes.  If I continue to use both
prandom32 and random32 prefixes for these functions, the confusion
will get worse.

As a result of this renaming, "prandom_" is the common prefix for
pseudo-random number library.

Currently, srandom32() and random32() are preserved because it is
difficult to rename too many users at once.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: David Laight <david.laight@aculab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

linux/compiler.h: add __must_hold macro for functions called with a lock held · 8529091e

Committed by Josh Triplett on Dec 17, 2012

linux/compiler.h has macros to denote functions that acquire or release
locks, but not to denote functions called with a lock held that return
with the lock still held.  Add a __must_hold macro to cover that case.
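
A minimal sketch of how such annotations are typically used follows. The macro bodies assume the sparse "context" attribute convention (in, out counts) and compile away to nothing for an ordinary compiler; the `demo_*` types and functions are hypothetical and exist only to show where each annotation goes.

```c
/* With sparse (__CHECKER__ defined) the annotations describe the lock
 * context on entry and exit; for a normal compiler they expand to nothing. */
#ifdef __CHECKER__
# define __acquires(x)  __attribute__((context(x, 0, 1)))
# define __releases(x)  __attribute__((context(x, 1, 0)))
# define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
# define __acquires(x)
# define __releases(x)
# define __must_hold(x)
#endif

/* Hypothetical toy lock and queue, only for illustration. */
struct demo_lock  { int held; };
struct demo_queue { struct demo_lock lock; int len; };

/* Enters without the lock, returns with it held. */
static void demo_lock_acquire(struct demo_lock *l) __acquires(l)
{
    l->held = 1;
}

/* Enters with the lock held, returns without it. */
static void demo_lock_release(struct demo_lock *l) __releases(l)
{
    l->held = 0;
}

/* Must be called with q->lock held, and returns with it still held. */
static void demo_enqueue_locked(struct demo_queue *q) __must_hold(&q->lock)
{
    q->len++;
}
```

The third function is the case the new macro covers: it neither takes nor drops the lock, but it is only correct when the caller already holds it.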
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Ed Cashin <ecashin@coraid.com>
Tested-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pidns: remove unused is_container_init() · a5ba911e

Committed by Gao feng on Dec 17, 2012

Since commit 1cdcbec1 ("CRED: Neuter sys_capset()")
is_container_init() has no callers.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: use -ELOOP for max recursion depth · d7402698

Committed by Kees Cook on Dec 17, 2012

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

        if (cmd != path_bshell && errno == ENOEXEC) {
                *argv-- = cmd;
                *argv = cmd = path_bshell;
                goto repeat;
        }

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: introduce PTRACE_O_EXITKILL · 992fb6e1

Committed by Oleg Nesterov on Dec 17, 2012

Ptrace jailers want to be sure that the tracee can never escape
from the control. However if the tracer dies unexpectedly the
tracee continues to run in potentially unsafe mode.

Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.

Note that the new option is not equal to the last-option << 1.  Because
currently all options have an event, and the new one starts the eventless
group.  It uses the random 20 bit, so we have the room for 12 more events,
but we can also add the new eventless options below this one.
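
The bit arithmetic behind "room for 12 more events" can be checked with a short sketch. The `DEMO_*` names are hypothetical mirrors of the ptrace option constants, and the layout assumed here is that event-backed options occupied bits 0 through 7 at the time (the highest being the seccomp event at bit 7), with the new eventless option at bit 20.

```c
#include <stdint.h>

/* Assumed layout: event-backed options use bits 0..7, EXITKILL uses bit 20. */
#define DEMO_EVENT_SECCOMP   7
#define DEMO_O_TRACESECCOMP  (UINT32_C(1) << DEMO_EVENT_SECCOMP)
#define DEMO_EXITKILL_BIT    20
#define DEMO_O_EXITKILL      (UINT32_C(1) << DEMO_EXITKILL_BIT)

/* Number of unused bits between the highest event option and EXITKILL. */
static unsigned int demo_free_event_bits(void)
{
    return DEMO_EXITKILL_BIT - (DEMO_EVENT_SECCOMP + 1);
}

/* Mask of all option bits valid under the assumed layout. */
static uint32_t demo_option_mask(void)
{
    return UINT32_C(0xff) | DEMO_O_EXITKILL;
}

/* Non-zero if the tracee's option word asks to be killed on tracer exit. */
static int demo_has_exitkill(uint32_t opts)
{
    return (opts & DEMO_O_EXITKILL) != 0;
}
```

In a real tracer the option would be passed with the other PTRACE_O_* flags, for example through a PTRACE_SETOPTIONS request on an attached tracee.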

Suggested by Amnon Shiloh.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Chris Evans <scarybeasts@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kstrto*: add documentation · 4c925d60

Committed by Eldad Zack on Dec 17, 2012

As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc
comments.  This patch adds kerneldoc comments to common variants of
kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Joe Perches <joe@perches.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

compat: generic compat_sys_sched_rr_get_interval() implementation · 0ad50c38

Committed by Catalin Marinas on Dec 17, 2012

This function is used by sparc, powerpc, tile and arm64 for compat support.
The patch adds a generic implementation with a wrapper for PowerPC to do
the u32->int sign extension.

The reason for a single patch covering powerpc, tile, sparc and arm64 is
to keep it bisectable, otherwise kernel building may fail with mismatched
function declarations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [for tile]
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

percpu_rw_semaphore: add lockdep annotations · 8ebe3473

Committed by Oleg Nesterov on Dec 17, 2012

Add lockdep annotations.  Not only can this help to find potential
problems, we also do not want false warnings if, say, the task takes two
different percpu_rw_semaphore's for reading.  IOW, at least ->rw_sem
should not use a single class.

This patch exposes this internal lock to lockdep so that it represents the
whole percpu_rw_semaphore.  This way we do not need to add another "fake"
->lockdep_map and lock_class_key.  More importantly, this also makes the
output from lockdep much more understandable if it finds the problem.

In short, with this patch from lockdep pov percpu_down_read() and
percpu_up_read() acquire/release ->rw_sem for reading, this matches the
actual semantics.  This abuses __up_read() but I hope this is fine and in
fact I'd like to have down_read_no_lockdep() as well,
percpu_down_read_recursive_readers() will need it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

percpu_rw_semaphore: kill ->writer_mutex, add ->write_ctr · 9390ef0c

Committed by Oleg Nesterov on Dec 17, 2012

percpu_rw_semaphore->writer_mutex was only added to simplify the initial
rewrite, the only thing it protects is clear_fast_ctr() which otherwise
could be called by multiple writers.  ->rw_sem is enough to serialize the
writers.

Kill this mutex and add "atomic_t write_ctr" instead.  The writers
increment/decrement this counter, the readers check it is zero instead of
mutex_is_locked().

Move atomic_add(clear_fast_ctr(), slow_read_ctr) under down_write() to
avoid the race with other writers.  This is a bit sub-optimal, only the
first writer needs this and we do not need to exclude the readers at this
stage.  But this is simple, we do not want another internal lock until we
add more features.

And this speeds up the write-contended case.  Before this patch the racing
writers sleep in synchronize_sched_expedited() sequentially, with this
patch multiple synchronize_sched_expedited's can "overlap" with each
other.  Note: we can do more optimizations, this is only the first step.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

percpu_rw_semaphore: reimplement to not block the readers unnecessarily · a1fd3e24

Committed by Oleg Nesterov on Dec 17, 2012

Currently the writer does msleep() plus synchronize_sched() 3 times to
acquire/release the semaphore, and during this time the readers are
blocked completely.  Even if the "write" section was not actually started
or if it was already finished.

With this patch down_write/up_write does synchronize_sched() twice and
down_read/up_read are still possible during this time, just they use the
slow path.

percpu_down_write() first forces the readers to use rw_semaphore and
increment the "slow" counter to take the lock for reading, then it
takes that rw_semaphore for writing and blocks the readers.

Also, with this patch the code relies on the documented behaviour of
synchronize_sched(); it doesn't try to pair synchronize_sched() with
barrier.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

string: introduce helper to get base file name from given path · b18888ab

Committed by Andy Shevchenko on Dec 17, 2012

There are several places in the kernel that use functionality like
basename(3) with one exception: in the case of '/foo/bar/' we expect to get
an empty string.  Let's add a common helper for them.
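
The behaviour described here, including the trailing-slash case, can be sketched in a few lines of userspace C. The function name `demo_base_name` is hypothetical and this is not necessarily how the helper added by the patch is written.

```c
#include <string.h>

/* Return the part of path after the last '/', or path itself if it
 * contains no '/'.  A trailing '/' therefore yields an empty string. */
static const char *demo_base_name(const char *path)
{
    const char *tail = strrchr(path, '/');

    return tail ? tail + 1 : path;
}
```

Unlike basename(3), this never modifies its argument and never strips a trailing slash, so "/foo/bar/" maps to "" rather than "bar".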
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: YAMANE Toshiaki <yamanetoshi@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

backlight: add of_find_backlight_by_node() · 762a936f

Committed by Thierry Reding on Dec 17, 2012

This function finds the struct backlight_device for a given device tree
node.  A dummy function is provided so that it safely compiles out if OF
support is disabled.

[akpm@linux-foundation.org: Don't use IS_ENABLED(CONFIG_OF)]
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/video/backlight/lp855x_bl.c: use generic PWM functions · 8cc9764c

Committed by Kim, Milo on Dec 17, 2012

The LP855x family devices support the PWM input for the backlight control.
The period of the PWM is configurable on the platform side.  Platform
specific functions are no longer necessary because generic PWM functions
are used inside the driver.

(PWM input mode)
To set the brightness, new lp855x_pwm_ctrl() is used.
If a PWM device is not allocated, devm_pwm_get() is called.
The PWM consumer name is from the chip name such as 'lp8550' and 'lp8556'.
To get the brightness value, no additional handling is required.
Just the value of 'props.brightness' is returned.

If the PWM driver is not ready while initializing the LP855x driver, it's
OK.  The PWM device can be retrieved later, when the brightness value is
changed.

Documentation is updated with an example.

[akpm@linux-foundation.org: coding-style simplification, per Thierry]
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

asm-generic: io: don't perform swab during {in,out} string functions · 41739ee3

Committed by Will Deacon on Dec 17, 2012

The {in,out}s{b,w,l} functions are designed to operate on a stream of
bytes and therefore should not perform any byte-swapping, regardless of
the CPU byte order.

This patch fixes the generic IO header so that {in,out}s{b,w,l} call the
__raw_{read,write} functions directly rather than going via the
endian-correcting accessors.
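
Why a string accessor should not byte-swap can be shown with a userspace model. Everything below is hypothetical (`demo_fifo`, `demo_raw_readw`, `demo_insw`); it models a device FIFO as a byte array and only illustrates that storing raw words preserves the device's byte order in memory on any CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device FIFO that hands out its byte stream 16 bits at a time. */
struct demo_fifo {
    const unsigned char *data;
    size_t pos;
};

/* Raw 16-bit read: the next two stream bytes in memory order, no swapping. */
static uint16_t demo_raw_readw(struct demo_fifo *f)
{
    uint16_t v;

    memcpy(&v, f->data + f->pos, sizeof(v));
    f->pos += sizeof(v);
    return v;
}

/* String read in the style of insw(): store each raw word unchanged, so
 * the buffer ends up byte-for-byte identical to the device stream. */
static void demo_insw(struct demo_fifo *f, void *buf, size_t count)
{
    unsigned char *p = buf;

    while (count--) {
        uint16_t v = demo_raw_readw(f);

        memcpy(p, &v, sizeof(v));
        p += sizeof(v);
    }
}
```

If the loop instead converted each word from little-endian to CPU order, a big-endian CPU would swap every pair of bytes and corrupt the stream.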
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lseek: the "whence" argument is called "whence" · 965c8e59

Committed by Andrew Morton on Dec 17, 2012

But the kernel decided to call it "origin" instead.  Fix most of the
sites.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/init.h: use the stringify operator for the __define_initcall macro · 7929d407

Committed by Matthew Leach on Dec 17, 2012

Currently the __define_initcall() macro takes three arguments, fn, id and
level. The level argument is exactly the same as the id argument but
wrapped in quotes. To overcome this need to specify three arguments to
the __define_initcall macro, where one argument is the stringification of
another, we can just use the stringification macro instead.
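
The idea can be sketched with the preprocessor's `#` operator. The `DEMO_*` macro and `demo_*` names are hypothetical and the struct stands in for the real section attribute; the sketch only shows how a two-argument macro can derive the quoted level from the id.

```c
typedef int (*demo_initcall_t)(void);

/* Hypothetical record pairing a section-like name with a function. */
struct demo_initcall {
    const char *section;
    demo_initcall_t fn;
};

/* Two arguments are enough: #id turns the id token into a string literal,
 * which is concatenated with the adjacent literals at compile time. */
#define DEMO_DEFINE_INITCALL(fn, id) \
    static const struct demo_initcall demo_initcall_##fn##id = { \
        ".initcall" #id ".init", fn \
    }

static int demo_setup(void)
{
    return 0;
}

/* Expands to an object named demo_initcall_demo_setup6s whose section
 * string is ".initcall6s.init". */
DEMO_DEFINE_INITCALL(demo_setup, 6s);
```

With the older three-argument form the caller would have had to pass both `6s` and `"6s"`, which is exactly the redundancy the patch removes.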
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert "bdi: add a user-tunable cpu_list for the bdi flusher threads" · 9360b536

Committed by Linus Torvalds on Dec 17, 2012

This reverts commit 8fa72d23.

People disagree about how this should be done, so let's revert this for
now so that nobody starts using the new tuning interface.  Tejun is
thinking about a more generic interface for thread pool affinity.
Requested-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9360b536

17 Dec, 2012 1 commit

Btrfs: parse parent 0 into correct value in tracepoint · fb57dc81

Committed by Liu Bo on Nov 30, 2012

Value 0 is not a tree id, so besides an upper limit, a lower limit is
necessary as well when parsing the root types in the tracepoint.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

fb57dc81

16 Dec, 2012 2 commits

Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)" · 11520e5e

Committed by Linus Torvalds on Dec 15, 2012

This reverts commit bd52276f ("x86-64/efi: Use EFI to deal with
platform wall clock (again)"), and the two supporting commits:

  da5a108d: "x86/kernel: remove tboot 1:1 page table creation code"

  185034e7: "x86, efi: 1:1 pagetable mapping for virtual EFI calls"

as they all depend semantically on commit 53b87cf0 ("x86, mm:
Include the entire kernel memory map in trampoline_pgd") that got
reverted earlier due to the problems it caused.

This was pointed out by Yinghai Lu, and verified by me on my Macbook Air
that uses EFI.
Pointed-out-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11520e5e

NFSv4.1: Move the RPC timestamp out of the slot. · 8e63b6a8

Committed by Trond Myklebust on Dec 15, 2012

Shave a few bytes off the slot table size by moving the RPC timestamp
into the sequence results.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8e63b6a8

15 Dec, 2012 3 commits

block: discard granularity might not be power of 2 · 8dd2cb7e

Committed by Shaohua Li on Dec 14, 2012

In the MD RAID case, the discard granularity might not be a power of 2. For
example, a 4-disk raid5 has a discard granularity of 3*chunk_size. Correct
the calculation for such cases.
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8dd2cb7e
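
Why a non-power-of-2 granularity needs a different calculation can be seen in a short sketch. This is not the code from the discard granularity patch above; round_down_mask and round_down_any are hypothetical helpers, and the 128-sector chunk size used in the note below is an assumption made for illustration.

```c
#include <stdint.h>

/* Round a sector number down with a bit mask. This is only correct when
 * granularity is a power of 2. */
uint64_t round_down_mask(uint64_t sector, uint64_t granularity)
{
    return sector & ~(granularity - 1);
}

/* Round a sector number down with a remainder. This is correct for any
 * non-zero granularity. */
uint64_t round_down_any(uint64_t sector, uint64_t granularity)
{
    return sector - (sector % granularity);
}
```

With an assumed chunk of 128 sectors, a 4-disk raid5 has a granularity of 3 * 128 = 384 sectors. For sector 1000, round_down_any returns 768 (2 * 384), while round_down_mask returns 640, which is not a multiple of 384. For a power-of-2 granularity such as 256 the two helpers agree.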

drm/exynos: add fimc ipp driver · 16102edb

Committed by Eunchul Kim on Dec 14, 2012

FIMC stands for Fully Interactive Mobile Camera. It supports image
scaler/rotator/crop/flip/csc and input/output DMA operations, as well as
writeback and display output operations.

This driver is registered with the IPP subsystem framework for use from user
space; the user can control the FIMC hardware through the interfaces of the
IPP subsystem framework.

Changelog v6:
- fix build warning.

Changelog v1 ~ v5:
- add comments, code fixups and cleanups.
Signed-off-by: Eunchul Kim <chulspro.kim@samsung.com>
Signed-off-by: Jinyoung Jeon <jy0.jeon@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

16102edb

drm/exynos: add ipp subsystem · cb471f14

Committed by Eunchul Kim on Dec 14, 2012

This patch adds Image Post Processing (IPP) support to the exynos drm driver.

IPP supports image scaling/rotation and input/output DMA operations,
using the IPP subsystem framework to control the FIMC, Rotator and GSC
hardware, and provides some interfaces for user space.

Each IPP-based driver supports memory-to-memory operations with various
conversions. The FIMC hardware also supports writeback and display output
operations through the local path.

Features:
- Memory to Memory operation support.
- Various pixel formats support.
- Image scaling support.
- Color Space Conversion support.
- Image crop operation support.
- Rotate operation support for 90, 180 or 270 degrees.
- Flip operation support for vertical, horizontal or both.
- Writeback operation support to display blended image of FIMD fifo on screen.

A summary of IPP subsystem operations:
First, the user should get the property capabilities from the IPP subsystem
and set these properties to hardware registers for the desired operations.
The properties could be pixel format, position, rotation degree and
flip operation.

Next, the user should set the source and destination buffer data using the
DRM_EXYNOS_IPP_QUEUE_BUF ioctl command with gem handles to the source and
destination buffers.

Next, the user can control the chosen hardware with the desired operations,
such as play, stop, pause and resume controls.

Finally, the user is notified of DMA operation completion and can get the
destination buffer containing the desired result through the dequeue
command.

IOCTL commands:
- DRM_EXYNOS_IPP_GET_PROPERTY
  . get ipp driver capabilities and id.
- DRM_EXYNOS_IPP_SET_PROPERTY
  . set format, position, rotation, flip to source and destination buffers
- DRM_EXYNOS_IPP_QUEUE_BUF
  . enqueue/dequeue buffer and make event list.
- DRM_EXYNOS_IPP_CMD_CTRL
  . play/stop/pause/resume control.

Event:
- DRM_EXYNOS_IPP_EVENT
  . an event to notify dma operation completion to user side.

Basic control flow:
Open -> Get properties -> User chooses desired IPP sub driver (FIMC, Rotator
or GSCALER) -> Set Property -> Create gem handle -> Enqueue to source and
destination buffers -> Command control (Play) -> Event is notified to User
-> User gets completed destination buffer -> (Enqueue to source and
destination buffers -> Event is notified to User) * N -> Queue/Dequeue to
source and destination buffers -> Command control (Stop) -> Free gem handle
-> Close

Changelog v1 ~ v5:
- added comments, code fixups and cleanups.
Signed-off-by: Eunchul Kim <chulspro.kim@samsung.com>
Signed-off-by: Jinyoung Jeon <jy0.jeon@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

cb471f14

14 Dec, 2012 4 commits

drm/radeon: enable the async DMA rings in the CS ioctl · 278a334c

Committed by Alex Deucher on Dec 13, 2012

This enables the functionality added in the previous
patches.  Userspace acceleration drivers can use the
CS ioctl to submit command buffers to the async DMA
rings.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

278a334c

Revert "libata: check SATA_SETTINGS log with HW Feature Ctrl" · 8349e5ae

Committed by Jeff Garzik on Dec 14, 2012

This reverts commit de90cd71.

Shane Huang writes:

  Please suspend this patch because I just received two new
  DevSlp drives but found word 78 bit 5 is _not_ set.

  I'm checking with the drive vendor whether he gave me
  the wrong information. If bit 5 is not the necessary and
  sufficient condition, I will implement another patch to
  replace ata_device->sata_settings into ->devslp_timing.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

8349e5ae

target/iscsi_target: Add NodeACL tags for initiator group support · 79e62fc3

Committed by Andy Grover on Dec 11, 2012

Thanks for reviews, looking a lot better.

---- 8< ----

Initiator access config could be easier. The way other storage vendors
have addressed this is to support initiator groups: the admin adds
initiator WWNs to the group, and then LUN permissions can be granted for
the entire group at once.

Instead of changing ktarget's configfs interface, this patch keeps
the configfs interface per-initiator-wwn and just adds a 'tag' field
for each. This should be enough for user tools like targetcli to group
initiator ACLs and sync their configurations.

acl_tag is not used internally, but needs to be kept in configfs so that
all user tools can avoid dependencies on each other.

Code tested to work, although userspace pieces still to be implemented.
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

79e62fc3

net: ethool: Document struct ethtool_flow_ext · dc2e5734

Committed by Yan Burman on Dec 13, 2012

Add documentation for struct ethtool_flow_ext, especially with regard
to which flags are needed for which fields.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc2e5734

13 Dec, 2012 1 commit

lib/raid6: Add AVX2 optimized gen_syndrome functions · 2c935842

Committed by Yuanhan Liu on Nov 30, 2012

Add AVX2 optimized gen_syndrome functions, which are simply based on
sse2.c written by hpa.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

2c935842
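
As background for the raid6 commit above, here is a scalar sketch of what a gen_syndrome routine computes. It is not the AVX2 code from the patch and not the kernel's function signature; gf_mul2 and gen_syndrome_scalar are hypothetical names. It assumes the usual RAID-6 definitions: P is the XOR of all data blocks, and Q is the sum over GF(2^8) of 2^d * D_d, with the field built on the polynomial 0x11d.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8): shift left, then reduce with 0x1d when the
 * top bit was set (polynomial x^8 + x^4 + x^3 + x^2 + 1). */
uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/* Compute P and Q for ndisks data blocks of len bytes each.
 * data[d] points to the block of data disk d; ndisks must be >= 1. */
void gen_syndrome_scalar(size_t ndisks, size_t len,
                         const uint8_t *const *data,
                         uint8_t *p, uint8_t *q)
{
    for (size_t i = 0; i < len; i++) {
        /* Start from the highest-numbered disk and fold the others in,
         * Horner style, so Q needs only multiply-by-2 steps. */
        uint8_t wp = data[ndisks - 1][i];
        uint8_t wq = wp;

        for (size_t d = ndisks - 1; d-- > 0; ) {
            wp ^= data[d][i];
            wq = (uint8_t)(gf_mul2(wq) ^ data[d][i]);
        }
        p[i] = wp;
        q[i] = wq;
    }
}
```

A SIMD version performs the same per-byte steps on many bytes at once; the inner loop structure is what a vectorized implementation would keep.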