- 19 12月, 2012 8 次提交
-
-
由 Glauber Costa 提交于
Allow a memcg parameter to be passed during cache creation. When the slub allocator is being used, it will only merge caches that belong to the same memcg. We'll do this by scanning the global list, and then translating the cache to a memcg-specific cache Default function is created as a wrapper, passing NULL to the memcg version. We only merge caches that belong to the same memcg. A helper is provided, memcg_css_id: because slub needs a unique cache name for sysfs. Since this is visible, but not the canonical location for slab data, the cache name is not used, the css_id should suffice. Signed-off-by: NGlauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
For the kmem slab controller, we need to record some extra information in the kmem_cache structure. Signed-off-by: NGlauber Costa <glommer@parallels.com> Signed-off-by: NSuleiman Souhlal <suleiman@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
Because those architectures will draw their stacks directly from the page allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG flag, and issue the corresponding free_pages. This code path is taken when the architecture doesn't define CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining architectures fall in this category. This will guarantee that every stack page is accounted to the memcg the process currently lives on, and will have the allocations to fail if they go over limit. For the time being, I am defining a new variant of THREADINFO_GFP, not to mess with the other path. Once the slab is also tracked by memcg, we can get rid of that flag. Tested to successfully protect against :(){ :|:& };: Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NFrederic Weisbecker <fweisbec@redhat.com> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
We can use static branches to patch the code in or out when not used. Because the _ACTIVE bit on kmem_accounted is only set after the increment is done, we guarantee that the root memcg will always be selected for kmem charges until all call sites are patched (see memcg_kmem_enabled). This guarantees that no mischarges are applied. Static branch decrement happens when the last reference count from the kmem accounting in memcg dies. This will only happen when the charges drop down to 0. When that happens, we need to disable the static branch only on those memcgs that enabled it. To achieve this, we would be forced to complicate the code by keeping track of which memcgs were the ones that actually enabled limits, and which ones got it from its parents. It is a lot simpler just to do static_key_slow_inc() on every child that is accounted. Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
It is useful to know how many charges are still left after a call to res_counter_uncharge. While it is possible to issue a res_counter_read after uncharge, this can be racy. If we need, for instance, to take some action when the counters drop down to 0, only one of the callers should see it. This is the same semantics as the atomic variables in the kernel. Since the current return value is void, we don't need to worry about anything breaking due to this change: nobody relied on that, and only users appearing from now on will be checking this value. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to the page - and a kmem flag that would necessarily follow, as much as doing page_cgroup lookups for no reason, whoever is marking its allocations with __GFP_KMEMCG flag is responsible for telling the page allocator that this is such an allocation at free_pages() time. This is done by the invocation of __free_accounted_pages() and free_accounted_pages(). Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NMel Gorman <mgorman@suse.de> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
Introduce infrastructure for tracking kernel memory pages to a given memcg. This will happen whenever the caller includes the flag __GFP_KMEMCG flag, and the task belong to a memcg other than the root. In memcontrol.h those functions are wrapped in inline acessors. The idea is to later on, patch those with static branches, so we don't incur any overhead when no mem cgroups with limited kmem are being used. Users of this functionality shall interact with the memcg core code through the following functions: memcg_kmem_newpage_charge: will return true if the group can handle the allocation. At this point, struct page is not yet allocated. memcg_kmem_commit_charge: will either revert the charge, if struct page allocation failed, or embed memcg information into page_cgroup. memcg_kmem_uncharge_page: called at free time, will revert the charge. Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Glauber Costa 提交于
This flag is used to indicate to the callees that this allocation is a kernel allocation in process context, and should be accounted to current's memcg. Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 12月, 2012 21 次提交
-
-
由 Cyrill Gorcunov 提交于
We will need this helper in the next patch to provide a file handle for inotify marks in /proc/pid/fdinfo output. The patch is rather providing the way to use inodes directly when dentry is not available (like in case of inotify system). Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
This allows us to print out eventpoll target file descriptor, events and data, the /proc/pid/fdinfo/fd consists of | pos: 0 | flags: 02 | tfd: 5 events: 1d data: ffffffffffffffff enabled: 1 [avagin@: fix for unitialized ret variable] Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
This patch brings ability to print out auxiliary data associated with file in procfs interface /proc/pid/fdinfo/fd. In particular further patches make eventfd, evenpoll, signalfd and fsnotify to print additional information complete enough to restore these objects after checkpoint. To simplify the code we add show_fdinfo callback inside struct file_operations (as Al and Pavel are proposing). Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
Add functions to get the requested number of pseudo-random bytes. The difference from get_random_bytes() is that it generates pseudo-random numbers by prandom_u32(). It doesn't consume the entropy pool, and the sequence is reproducible if the same rnd_state is used. So it is suitable for generating random bytes for testing. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: David Laight <david.laight@aculab.com> Cc: Michel Lespinasse <walken@google.com> Cc: Robert Love <robert.w.love@intel.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This renames all random32 functions to have 'prandom_' prefix as follows: void prandom_seed(u32 seed); /* rename from srandom32() */ u32 prandom_u32(void); /* rename from random32() */ void prandom_seed_state(struct rnd_state *state, u64 seed); /* rename from prandom32_seed() */ u32 prandom_u32_state(struct rnd_state *state); /* rename from prandom32() */ The purpose of this renaming is to prevent some kernel developers from assuming that prandom32() and random32() might imply that only prandom32() was the one using a pseudo-random number generator by prandom32's "p", and the result may be a very embarassing security exposure. This concern was expressed by Theodore Ts'o. And furthermore, I'm going to introduce new functions for getting the requested number of pseudo-random bytes. If I continue to use both prandom32 and random32 prefixes for these functions, the confusion is getting worse. As a result of this renaming, "prandom_" is the common prefix for pseudo-random number library. Currently, srandom32() and random32() are preserved because it is difficult to rename too many users at once. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Robert Love <robert.w.love@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: David Laight <david.laight@aculab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Josh Triplett 提交于
linux/compiler.h has macros to denote functions that acquire or release locks, but not to denote functions called with a lock held that return with the lock still held. Add a __must_hold macro to cover that case. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Reported-by: NEd Cashin <ecashin@coraid.com> Tested-by: NEd Cashin <ecashin@coraid.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gao feng 提交于
Since commit 1cdcbec1 ("CRED: Neuter sys_capset()") is_container_init() has no callers. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Ptrace jailers want to be sure that the tracee can never escape from the control. However if the tracer dies unexpectedly the tracee continues to run in potentially unsafe mode. Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits it sends SIGKILL to every tracee which has this bit set. Note that the new option is not equal to the last-option << 1. Because currently all options have an event, and the new one starts the eventless group. It uses the random 20 bit, so we have the room for 12 more events, but we can also add the new eventless options below this one. Suggested by Amnon Shiloh. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NAmnon Shiloh <u3557@miso.sublimeip.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eldad Zack 提交于
As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc comments. This patch adds kerneldoc comments to common variants of kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int. Signed-off-by: NEldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Catalin Marinas 提交于
This function is used by sparc, powerpc tile and arm64 for compat support. The patch adds a generic implementation with a wrapper for PowerPC to do the u32->int sign extension. The reason for a single patch covering powerpc, tile, sparc and arm64 is to keep it bisectable, otherwise kernel building may fail with mismatched function declarations. Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile] Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Add lockdep annotations. Not only this can help to find the potential problems, we do not want the false warnings if, say, the task takes two different percpu_rw_semaphore's for reading. IOW, at least ->rw_sem should not use a single class. This patch exposes this internal lock to lockdep so that it represents the whole percpu_rw_semaphore. This way we do not need to add another "fake" ->lockdep_map and lock_class_key. More importantly, this also makes the output from lockdep much more understandable if it finds the problem. In short, with this patch from lockdep pov percpu_down_read() and percpu_up_read() acquire/release ->rw_sem for reading, this matches the actual semantics. This abuses __up_read() but I hope this is fine and in fact I'd like to have down_read_no_lockdep() as well, percpu_down_read_recursive_readers() will need it. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
percpu_rw_semaphore->writer_mutex was only added to simplify the initial rewrite, the only thing it protects is clear_fast_ctr() which otherwise could be called by multiple writers. ->rw_sem is enough to serialize the writers. Kill this mutex and add "atomic_t write_ctr" instead. The writers increment/decrement this counter, the readers check it is zero instead of mutex_is_locked(). Move atomic_add(clear_fast_ctr(), slow_read_ctr) under down_write() to avoid the race with other writers. This is a bit sub-optimal, only the first writer needs this and we do not need to exclude the readers at this stage. But this is simple, we do not want another internal lock until we add more features. And this speeds up the write-contended case. Before this patch the racing writers sleep in synchronize_sched_expedited() sequentially, with this patch multiple synchronize_sched_expedited's can "overlap" with each other. Note: we can do more optimizations, this is only the first step. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Currently the writer does msleep() plus synchronize_sched() 3 times to acquire/release the semaphore, and during this time the readers are blocked completely. Even if the "write" section was not actually started or if it was already finished. With this patch down_write/up_write does synchronize_sched() twice and down_read/up_read are still possible during this time, just they use the slow path. percpu_down_write() first forces the readers to use rw_semaphore and increment the "slow" counter to take the lock for reading, then it takes that rw_semaphore for writing and blocks the readers. Also. With this patch the code relies on the documented behaviour of synchronize_sched(), it doesn't try to pair synchronize_sched() with barrier. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
There are several places in the kernel that use functionality like basename(3) with the exception: in case of '/foo/bar/' we expect to get an empty string. Let's do it common helper for them. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Cc: YAMANE Toshiaki <yamanetoshi@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thierry Reding 提交于
This function finds the struct backlight_device for a given device tree node. A dummy function is provided so that it safely compiles out if OF support is disabled. [akpm@linux-foundation.org: Don't use IS_ENABLED(CONFIG_OF)] Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de> Acked-by: NJingoo Han <jg1.han@samsung.com> Reviewed-by: NGrant Likely <grant.likely@secretlab.ca> Cc: Thierry Reding <thierry.reding@avionic-design.de> Reviewed-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NJingoo Han <jg1.han@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kim, Milo 提交于
The LP855x family devices support the PWM input for the backlight control. Period of the PWM is configurable in the platform side. Platform specific functions are unnecessary anymore because generic PWM functions are used inside the driver. (PWM input mode) To set the brightness, new lp855x_pwm_ctrl() is used. If a PWM device is not allocated, devm_pwm_get() is called. The PWM consumer name is from the chip name such as 'lp8550' and 'lp8556'. To get the brightness value, no additional handling is required. Just the value of 'props.brightness' is returned. If the PWM driver is not ready while initializing the LP855x driver, it's OK. The PWM device can be retrieved later, when the brightness value is changed. Documentation is updated with an example. [akpm@linux-foundation.org: coding-style simplification, per Thierry] Signed-off-by: NMilo(Woogyom) Kim <milo.kim@ti.com> Cc: Thierry Reding <thierry.reding@avionic-design.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Bryan Wu <bryan.wu@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
The {in,out}s{b,w,l} functions are designed to operate on a stream of bytes and therefore should not perform any byte-swapping, regardless of the CPU byte order. This patch fixes the generic IO header so that {in,out}s{b,w,l} call the __raw_{read,write} functions directly rather than going via the endian-correcting accessors. Signed-off-by: NWill Deacon <will.deacon@arm.com> Cc: Mike Frysinger <vapier@gentoo.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NBen Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Leach 提交于
Currently the __define_initcall() macro takes three arguments, fn, id and level. The level argument is exactly the same as the id argument but wrapped in quotes. To overcome this need to specify three arguments to the __define_initcall macro, where one argument is the stringification of another, we can just use the stringification macro instead. Signed-off-by: NMatthew Leach <matthew@mattleach.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This reverts commit 8fa72d23. People disagree about how this should be done, so let's revert this for now so that nobody starts using the new tuning interface. Tejun is thinking about a more generic interface for thread pool affinity. Requested-by: NTejun Heo <tj@kernel.org> Acked-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 12月, 2012 1 次提交
-
-
由 Liu Bo 提交于
Value 0 is not a tree id, so besides an upper limit, a lower limit is necessary as well while parsing root types of tracepoint. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 16 12月, 2012 2 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit bd52276f ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: da5a108d: "x86/kernel: remove tboot 1:1 page table creation code" 185034e7: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit 53b87cf0 ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Trond Myklebust 提交于
Shave a few bytes off the slot table size by moving the RPC timestamp into the sequence results. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 15 12月, 2012 3 次提交
-
-
由 Shaohua Li 提交于
In MD raid case, discard granularity might not be power of 2, for example, a 4-disk raid5 has 3*chunk_size discard granularity. Correct the calculation for such cases. Reported-by: NNeil Brown <neilb@suse.de> Signed-off-by: NShaohua Li <shli@fusionio.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Eunchul Kim 提交于
FIMC is stand for Fully Interfactive Mobile Camera and supports image scaler/rotator/crop/flip/csc and input/output DMA operations and also supports writeback and display output operations. This driver is registered to IPP subsystem framework to be used by user side and user can control the FIMC hardware through some interfaces of IPP subsystem framework. Changelog v6: - fix build warning. Changelog v1 ~ v5: - add comments, code fixups and cleanups. Signed-off-by: NEunchul Kim <chulspro.kim@samsung.com> Signed-off-by: NJinyoung Jeon <jy0.jeon@samsung.com> Signed-off-by: NInki Dae <inki.dae@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
-
由 Eunchul Kim 提交于
This patch adds Image Post Processing(IPP) support for exynos drm driver. IPP supports image scaler/rotator and input/output DMA operations using IPP subsystem framework to control FIMC, Rotator and GSC hardware and supports some user interfaces for user side. And each IPP-based drivers support Memory to Memory operations with various converting. And in case of FIMC hardware, it also supports Writeback and Display output operations through local path. Features: - Memory to Memory operation support. - Various pixel formats support. - Image scaling support. - Color Space Conversion support. - Image crop operation support. - Rotate operation support to 90, 180 or 270 degree. - Flip operation support to vertical, horizontal or both. - Writeback operation support to display blended image of FIMD fifo on screen A summary to IPP Subsystem operations: First of all, user should get property capabilities from IPP subsystem and set these properties to hardware registers for desired operations. The properties could be pixel format, position, rotation degree and flip operation. And next, user should set source and destination buffer data using DRM_EXYNOS_IPP_QUEUE_BUF ioctl command with gem handles to source and destinition buffers. And next, user can control user-desired hardware with desired operations such as play, stop, pause and resume controls. And finally, user can aware of dma operation completion and also get destination buffer that it contains user-desried result through dequeue command. IOCTL commands: - DRM_EXYNOS_IPP_GET_PROPERTY . get ipp driver capabilitis and id. - DRM_EXYNOS_IPP_SET_PROPERTY . set format, position, rotation, flip to source and destination buffers - DRM_EXYNOS_IPP_QUEUE_BUF . enqueue/dequeue buffer and make event list. - DRM_EXYNOS_IPP_CMD_CTRL . play/stop/pause/resume control. Event: - DRM_EXYNOS_IPP_EVENT . a event to notify dma operation completion to user side. Basic control flow: Open -> Get properties -> User choose desired IPP sub driver(FIMC, Rotator or GSCALER) -> Set Property -> Create gem handle -> Enqueue to source and destination buffers -> Command control(Play) -> Event is notified to User -> User gets destinition buffer complated -> (Enqueue to source and destination buffers -> Event is notified to User) * N -> Queue/Dequeue to source and destination buffers -> Command control(Stop) -> Free gem handle -> Close Changelog v1 ~ v5: - added comments, code fixups and cleanups. Signed-off-by: NEunchul Kim <chulspro.kim@samsung.com> Signed-off-by: NJinyoung Jeon <jy0.jeon@samsung.com> Signed-off-by: NInki Dae <inki.dae@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
-
- 14 12月, 2012 4 次提交
-
-
由 Alex Deucher 提交于
This enables the functionality added in the previous patches. Userspace acceleration drivers can use the CS ioctl to submit command buffers to the async DMA rings. Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jeff Garzik 提交于
This reverts commit de90cd71. Shane Huang writes: Please suspend this patch because I just received two new DevSlp drives but found word 78 bit 5 is _not_ set. I'm checking with the drive vendor whether he gave me the wrong information. If bit 5 is not the necessary and sufficient condition, I will implement another patch to replace ata_device->sata_settings into ->devslp_timing. Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 Andy Grover 提交于
Thanks for reviews, looking a lot better. ---- 8< ---- Initiator access config could be easier. The way other storage vendors have addressed this is to support initiator groups: the admin adds initiator WWNs to the group, and then LUN permissions can be granted for the entire group at once. Instead of changing ktarget's configfs interface, this patch keeps the configfs interface per-initiator-wwn and just adds a 'tag' field for each. This should be enough for user tools like targetcli to group initiator ACLs and sync their configurations. acl_tag is not used internally, but needs to be kept in configfs so that all user tools can avoid dependencies on each other. Code tested to work, although userspace pieces still to be implemented. Signed-off-by: NAndy Grover <agrover@redhat.com> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Yan Burman 提交于
Add documentation for struct ethtool_flow_ext especially in regard to what flags are needed for which fields. Signed-off-by: NYan Burman <yanb@mellanox.com> Reviewed-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 12月, 2012 1 次提交
-
-
由 Yuanhan Liu 提交于
Add AVX2 optimized gen_syndrom functions, which is simply based on sse2.c written by hpa. Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-