- 23 7月, 2012 6 次提交
-
-
由 Al Viro 提交于
It doesn't matter on normal return to userland path (we'll recheck the NOTIFY_RESUME flag anyway), but in case of exit_task_work() we'll need that as soon as we get callbacks capable of triggering more task_work_add(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
... and get rid of PF_EXITING check in task_work_add(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
task_work and rcu_head are identical now; merge them (calling the result struct callback_head, rcu_head #define'd to it), kill separate allocation in security/keys since we can just use cred->rcu now. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
layout based on Oleg's suggestion; single-linked list, task->task_works points to the last element, forward pointer from said last element points to head. I'd still prefer much more regular scheme with two pointers in task_work, but... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
get rid of the only user of ->data; this is _not_ the final variant - in the end we'll have task_work and rcu_head identical and just use cred->rcu, at which point the separate allocation will be gone completely. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 7月, 2012 4 次提交
-
-
由 Anton Vorontsov 提交于
The locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held), so let's switch to nolock variants. Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Vorontsov 提交于
If used from KDB, the locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held). So, we have to implement a few routines that grab no logbuf lock. Yet we don't need these functions in modules, so we don't export them. Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Vorontsov 提交于
The function is no longer needed, so remove it. Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Vorontsov 提交于
The kgdb dmesg command is broken after the printk rework. The old logic in kdb code makes no sense in terms of current printk/logging storage format, and KDB simply hangs forever. This patch revives the command by switching to kmsg_dumper iterator. The code is now much more simpler and shorter. Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 7月, 2012 2 次提交
-
-
由 Linus Torvalds 提交于
Commit a7a20d10 ("sd: limit the scope of the async probe domain") make the SCSI device probing run device discovery in it's own async domain. However, as a result, the partition detection was no longer synchronized by async_synchronize_full() (which, despite the name, only synchronizes the global async space, not all of them). Which in turn meant that "wait_for_device_probe()" would not wait for the SCSI partitions to be parsed. And "wait_for_device_probe()" was what the boot time init code relied on for mounting the root filesystem. Now, most people never noticed this, because not only is it timing-dependent, but modern distributions all use initrd. So the root filesystem isn't actually on a disk at all. And then before they actually mount the final disk filesystem, they will have loaded the scsi-wait-scan module, which not only does the expected wait_for_device_probe(), but also does scsi_complete_async_scans(). [ Side note: scsi_complete_async_scans() had also been partially broken, but that was fixed in commit 43a8d39d ("fix async probe regression"), so that same commit a7a20d10 had actually broken setups even if you used scsi-wait-scan explicitly ] Solve this problem by just moving the scsi_complete_async_scans() call into wait_for_device_probe(). Everybody who wants to wait for device probing to finish really wants the SCSI probing to complete, so there's no reason not to do this. So now "wait_for_device_probe()" really does what the name implies, and properly waits for device probing to finish. This also removes the now unnecessary extra calls to scsi_complete_async_scans(). Reported-and-tested-by: NArtem S. Tashkinov <t.artem@mailcity.com> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: James Bottomley <jbottomley@parallels.com> Cc: Borislav Petkov <bp@amd64.org> Cc: linux-scsi <linux-scsi@vger.kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rafael J. Wysocki 提交于
Require processes wanting to use the wake_lock/wake_unlock sysfs files to have the CAP_BLOCK_SUSPEND capability, which also is required for the eventpoll EPOLLWAKEUP flag to be effective, so that all interfaces related to blocking autosleep depend on the same capability. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: stable@vger.kernel.org Acked-by: NMichael Kerrisk <mtk.man-pages@gmail.com>
-
- 17 7月, 2012 1 次提交
-
-
由 Thomas Gleixner 提交于
The leap second rework unearthed another issue of inconsistent data. On timekeeping_resume() the timekeeper data is updated, but nothing calls timekeeping_update(), so now the update code in the timer interrupt sees stale values. This has been the case before those changes, but then the timer interrupt was using stale data as well so this went unnoticed for quite some time. Add the missing update call, so all the data is consistent everywhere. Reported-by: NAndreas Schwab <schwab@linux-m68k.org> Reported-and-tested-by: N"Rafael J. Wysocki" <rjw@sisk.pl> Reported-and-tested-by: NMartin Steigerwald <Martin@lichtvoll.de> Cc: LKML <linux-kernel@vger.kernel.org> Cc: Linux PM list <linux-pm@vger.kernel.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2012 8 次提交
-
-
由 John Stultz 提交于
As part of cleaning up the timekeeping code, this patch converts a number of internal functions to takei a timekeeper ptr as an argument, so that the internal functions don't access the global timekeeper structure directly. This allows for further optimizations to reduce lock hold time later. This patch has been updated to include more consistent usage of the timekeeper value, by making sure it is always passed as a argument to non top-level functions. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-9-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
When we make adjustments speeding up the clock, its possible for xtime_nsec to underflow. We already handle this properly, but we do so from update_wall_time() instead of the more logical timekeeping_adjust(), where the possible underflow actually occurs. Thus, move the correction logic to the timekeeping_adjust, which is the function that causes the issue. Making update_wall_time() more readable. Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-8-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
Since we call arch_gettimeoffset() in all the accessor functions, move arch_gettimeoffset() calls into timekeeping_get_ns() and timekeeping_get_ns_raw() to simplify the code. This also makes the code easier to maintain as we don't have to worry about forgetting the arch_gettimeoffset() as has happened in the past. Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-7-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
We do the exact same logic moving nsecs to secs in the timekeeper in multiple places, so condense this into a single function. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
The timekeeper struct has a xtime_nsec, which keeps the sub-nanosecond remainder. This ends up being somewhat duplicative of the timekeeper.xtime.tv_nsec value, and we have to do extra work to keep them apart, copying the full nsec portion out and back in over and over. This patch simplifies some of the logic by taking the timekeeper xtime value and splitting it into timekeeper.xtime_sec and reuses the timekeeper.xtime_nsec for the sub-second portion (stored in higher res shifted nanoseconds). This simplifies some of the accumulation logic. And will allow for more accurate timekeeping once the vsyscall code is updated to use the shifted nanosecond remainder. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
Ingo noted that using a u32 instead of int for shift values would be better to make sure the compiler doesn't unnecessarily use complex signed arithmetic. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
Ingo noted a number of places where there is inconsistent use of whitespace. This patch tries to address the main culprits. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
In commit 6b43ae8a, I introduced a bug that kept the STA_INS or STA_DEL bit from being cleared from time_status via adjtimex() without forcing STA_PLL first. Usually once the STA_INS is set, it isn't cleared until the leap second is applied, so its unlikely this affected anyone. However during testing I noticed it took some effort to cancel a leap second once STA_INS was set. Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org # 3.4 Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 14 7月, 2012 4 次提交
-
-
由 David Howells 提交于
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
all callers want the same thing, actually - a kinda-sorta analog of kern_path_create(). I.e. they want parent vfsmount/dentry (with ->i_mutex held, to make sure the child dentry is still their child) + the child dentry. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 7月, 2012 8 次提交
-
-
由 Dan Carpenter 提交于
Clean up and return -ENOMEM on if the kzalloc() fails. This also prevents a potential crash, as the pointer that failed to allocate would be later used. Link: http://lkml.kernel.org/r/20120711063507.GF11812@elgon.mountain Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Konstantin Khlebnikov 提交于
"no other files mapped" requirement from my previous patch (c/r: prctl: update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal) is too paranoid, it forbids operation even if there mapped one shared-anon vma. Let's check that current mm->exe_file already unmapped, in this case exe_file symlink already outdated and its changing is reasonable. Plus, this patch fixes exit code in case operation success. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Reported-by: NCyrill Gorcunov <gorcunov@openvz.org> Tested-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Stultz 提交于
The update of the hrtimer base offsets on all cpus cannot be made atomically from the timekeeper.lock held and interrupt disabled region as smp function calls are not allowed there. clock_was_set(), which enforces the update on all cpus, is called either from preemptible process context in case of do_settimeofday() or from the softirq context when the offset modification happened in the timer interrupt itself due to a leap second. In both cases there is a race window for an hrtimer interrupt between dropping timekeeper lock, enabling interrupts and clock_was_set() issuing the updates. Any interrupt which arrives in that window will see the new time but operate on stale offsets. So we need to make sure that an hrtimer interrupt always sees a consistent state of time and offsets. ktime_get_update_offsets() allows us to get the current monotonic time and update the per cpu hrtimer base offsets from hrtimer_interrupt() to capture a consistent state of monotonic time and the offsets. The function replaces the existing ktime_get() calls in hrtimer_interrupt(). The overhead of the new function vs. ktime_get() is minimal as it just adds two store operations. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to any hrtimer expiration and guarantees that timers are not expired early. Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
We need to update the base offsets from this code and we need to do that under base->lock. Move the lock held region around the ktime_get() calls. The ktime_get() calls are going to be replaced with a function which gets the time and the offsets atomically. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
We need to update the hrtimer clock offsets from the hrtimer interrupt context. To avoid conversions from timespec to ktime_t maintain a ktime_t based representation of those offsets in the timekeeper. This puts the conversion overhead into the code which updates the underlying offsets and provides fast accessible values in the hrtimer interrupt. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
The timekeeping code misses an update of the hrtimer subsystem after a leap second happened. Due to that timers based on CLOCK_REALTIME are either expiring a second early or late depending on whether a leap second has been inserted or deleted until an operation is initiated which causes that update. Unless the update happens by some other means this discrepancy between the timekeeping and the hrtimer data stays forever and timers are expired either early or late. The reported immediate workaround - $ data -s "`date`" - is causing a call to clock_was_set() which updates the hrtimer data structures. See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix Add the missing clock_was_set() call to update_wall_time() in case of a leap second event. The actual update is deferred to softirq context as the necessary smp function call cannot be invoked from hard interrupt context. Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reported-by: NJan Engelhardt <jengelh@inai.de> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 John Stultz 提交于
clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Reported-by: NJan Engelhardt <jengelh@inai.de> Reviewed-by: NIngo Molnar <mingo@kernel.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 10 7月, 2012 2 次提交
-
-
由 Kay Sievers 提交于
In (the unlikely) case our continuation merge buffer is busy, we unfortunately can not merge further continuation printk()s into a single record and have to store them separately, which leads to split-up output of these lines when they are printed. Add some flags about newlines and prefix existence to these records and try to reconstruct the full line again, when the separated records are printed. Reported-By: NMichael Neuling <mikey@neuling.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Tested-By: NMichael Neuling <mikey@neuling.org> Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Kay Sievers 提交于
Restore support for partial reads of any size on /proc/kmsg, in case the supplied read buffer is smaller than the record size. Some people seem to think is is ia good idea to run: $ dd if=/proc/kmsg bs=1 of=... as a klog bridge. Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44211Reported-by: NJukka Ollila <jiiksteri@gmail.com> Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 08 7月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal optional" allowed a css to linger after the associated cgroup is removed. As a css holds a reference on the cgroup's dentry, it means that cgroup dentries may linger for a while. Destroying a superblock which has dentries with positive refcnts is a critical bug and triggers BUG() in vfs code. As each cgroup dentry holds an s_active reference, any lingering cgroup has both its dentry and the superblock pinned and thus preventing premature release of superblock. Unfortunately, after 48ddbe19, there's a small window while releasing a cgroup which is directly under the root of the hierarchy. When a cgroup directory is released, vfs layer first deletes the corresponding dentry and then invokes dput() on the parent, which may recurse further, so when a cgroup directly below root cgroup is released, the cgroup is first destroyed - which releases the s_active it was holding - and then the dentry for the root cgroup is dput(). This creates a window where the root dentry's refcnt isn't zero but superblock's s_active is. If umount happens before or during this window, vfs will see the root dentry with non-zero refcnt and trigger BUG(). Before 48ddbe19, this problem didn't exist because the last dentry reference was guaranteed to be put synchronously from rmdir(2) invocation which holds s_active around the whole process. Fix it by holding an extra superblock->s_active reference across dput() from css release, which is the dput() path added by 48ddbe19 and the only one which doesn't hold an extra s_active ref across the final cgroup dput(). Signed-off-by: NTejun Heo <tj@kernel.org> LKML-Reference: <4FEEA5CB.8070809@huawei.com> Reported-by: Nshyju pv <shyju.pv@huawei.com> Tested-by: Nshyju pv <shyju.pv@huawei.com> Cc: Sasha Levin <levinsasha928@gmail.com> Acked-by: NLi Zefan <lizefan@huawei.com>
-
由 Tejun Heo 提交于
This reverts commit fa980ca8. The commit was an attempt to fix a race condition where a cgroup hierarchy may be unmounted with positive dentry reference on root cgroup. While the commit made the race condition slightly more difficult to trigger, the race was still there and could be reliably triggered using a different test case. Revert the incorrect fix. The next commit will describe the race and fix it correctly. Signed-off-by: NTejun Heo <tj@kernel.org> LKML-Reference: <4FEEA5CB.8070809@huawei.com> Reported-by: Nshyju pv <shyju.pv@huawei.com> Cc: Sasha Levin <levinsasha928@gmail.com> Acked-by: NLi Zefan <lizefan@huawei.com>
-
- 07 7月, 2012 3 次提交
-
-
由 Kay Sievers 提交于
We suppress printing kmsg records to the console, which are already printed immediately while we have received their fragments. Newly registered boot consoles print the entire kmsg buffer during registration. Clear the console-suppress flag after we skipped the record during its first storage, so any later print will see these records as usual. Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Kay Sievers 提交于
The /proc/kmsg read() interface is internally simply wired up to a sequence of syslog() syscalls, which might are racy between their checks and actions, regarding concurrency. In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on usual O_NONBLOCK behavior, the recently introduced mutex might block an O_NONBLOCK reader in read(), when poll() returns for it, but another process has already read the data in the meantime. We've seen that while running artificial test setups and tools that "fight" about /proc/kmsg data. This restores the original /proc/kmsg behavior, where in case of concurrent read()s, poll() might wake up but the read() syscall will just return 0 to the caller, while another process has "stolen" the data. This is in the general case not the expected behavior, but it is the exact same one, that can easily be triggered with a 3.4 kernel, and some tools might just rely on it. The mutex is not needed, the original integrity issue which introduced it, is in the meantime covered by: "fill buffer with more than a single message for SYSLOG_ACTION_READ" 116e90b2 Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Kay Sievers 提交于
After the recent split of facility and level into separate variables, we miss the facility value (always 0 for kernel-originated messages) in the syslog prefix. On Tue, Jul 3, 2012 at 12:45 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote: > Static checkers complain about the impossible condition here. > > In 084681d1 ('printk: flush continuation lines immediately to > console'), we changed msg->level from being a u16 to being an unsigned > 3 bit bitfield. Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-