1. 27 Jul 2008, 1 commit
  2. 26 Jul 2008, 1 commit
    • kill PF_BORROWED_MM in favour of PF_KTHREAD · 246bb0b1
      Committed by Oleg Nesterov
      Kill PF_BORROWED_MM.  Change use_mm/unuse_mm to not play with ->flags, and
      do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users.
      
      No functional changes yet.  But this allows us to do further
      fixes/cleanups.
      
      oom_kill/ptrace/etc often check "p->mm != NULL" to filter out kthreads,
      but this is wrong because of use_mm().  The problem with
      PF_BORROWED_MM is that we need task_lock() to avoid races.  With this
      patch we can check PF_KTHREAD directly, or use a simple lockless helper:
      
      	/* The result must not be dereferenced !!! */
      	struct mm_struct *__get_task_mm(struct task_struct *tsk)
      	{
      		if (tsk->flags & PF_KTHREAD)
      			return NULL;
      		return tsk->mm;
      	}
      
      Note also ecard_task().  It runs with ->mm != NULL, but it is a kernel
      thread without PF_BORROWED_MM.
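      
      As an editorial illustration only (not part of the original patch), a
      caller that previously tested "p->mm != NULL" could instead do something
      like the following; the helper name is made up for the example:
      
      	/* Illustrative sketch: skip kernel threads even while they
      	 * temporarily borrow a user mm via use_mm().  PF_KTHREAD is
      	 * stable for the task's lifetime, so no task_lock() is needed. */
      	static bool task_has_user_mm(struct task_struct *p)
      	{
      		if (p->flags & PF_KTHREAD)	/* kthread, mm may be borrowed */
      			return false;
      		return p->mm != NULL;		/* do not dereference the mm */
      	}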
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      246bb0b1
  3. 07 Jun 2008, 1 commit
  4. 30 Apr 2008, 1 commit
  5. 29 Apr 2008, 3 commits
  6. 28 Apr 2008, 1 commit
  7. 11 Apr 2008, 2 commits
  8. 20 Mar 2008, 1 commit
    • aio: bad AIO race in aio_complete() leads to process hang · 6cb2a210
      Committed by Quentin Barnes
      My group ran into an AIO process hang on a 2.6.24 kernel with the process
      sleeping indefinitely in io_getevents(2), waiting for a final wakeup that
      never came.
      
      We ran the tests on x86_64 SMP.  The hang only occurred on a Xeon box
      ("Clovertown") but not a Core2Duo ("Conroe").  On the Xeon, the L2 cache
      isn't shared between all eight processors, but the L2 is shared between
      the two processors on the Core2Duo we use.
      
      My analysis of the hang is if you go down to the second while-loop
      in read_events(), what happens on processor #1:
      	1) add_wait_queue_exclusive() adds thread to ctx->wait
      	2) aio_read_evt() to check tail
      	3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
      
      In aio_complete() with processor #2:
      	A) info->tail = tail;
      	B) waitqueue_active(&ctx->wait)
      	C) if waitqueue_active() returned non-0, call wake_up()
      
      The way the code is written, step 1 must be seen by all other processors
      before processor 1 checks for pending events in step 2 (events that were
      recorded by step A), and step A on processor 2 must be seen by all other
      processors (checked in step 2) before step B is done.
      
      The race I believed I was seeing is that steps 1 and 2 were effectively
      swapped because the __list_add() was delayed by the L2 cache not being
      shared with some of the other processors.  Imagine:
      proc 2: just before step A
      proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
      proc 1, step 2: checks tail and sees no pending events
      proc 2, step A: updates tail
      proc 1, step 3: calls [io_]schedule() and sleeps
      proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
                      so proc 1 sleeps indefinitely
      
      My patch adds a memory barrier between steps A and B.  It ensures that the
      update in step 1 gets seen on processor 2 before continuing.  If processor 1
      was just before step 1, the memory barrier makes sure that step A (update
      tail) gets seen by the time processor 1 makes it to step 2 (check tail).
      
      Before the patch our AIO process would hang virtually 100% of the time.  After
      the patch, we have yet to see the process ever hang.
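      
      A minimal sketch of the resulting ordering in aio_complete() (the names
      follow the 2.6.24-era fs/aio.c, but this is a simplified illustration,
      not the verbatim patch):
      
      	info->tail = tail;			/* step A: publish the event */
      	smp_mb();				/* make A visible before B */
      	if (waitqueue_active(&ctx->wait))	/* step B: any sleepers? */
      		wake_up(&ctx->wait);		/* step C: wake the reader */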
      Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
      Reviewed-by: Zach Brown <zach.brown@oracle.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: <stable@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ We should probably disallow that "if (waitqueue_active()) wake_up()"
        coding pattern, because it's so often buggy wrt memory ordering ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6cb2a210
  9. 09 Feb 2008, 3 commits
  10. 30 Jan 2008, 1 commit
  11. 06 Dec 2007, 1 commit
  12. 19 Oct 2007, 1 commit
  13. 17 Oct 2007, 1 commit
  14. 09 Oct 2007, 1 commit
  15. 11 May 2007, 1 commit
    • signal/timer/event: KAIO eventfd support example · 9c3060be
      Committed by Davide Libenzi
      This is an example of how to add eventfd support to the current KAIO code,
      in order to enable KAIO to post readiness events to a pollable fd (hence
      compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
      fd when events are ready, and this triggers a POLLIN on the fd.  This patch
      uses a reserved-for-future-use member of the struct iocb to pass an eventfd
      file descriptor that KAIO will use to post events every time a request
      completes.  At that point, an aio_getevents() will return the completed result
      to a struct io_event.  I made a quick test program to verify the patch, and it
      runs fine here:
      
      http://www.xmailserver.org/eventfd-aio-test.c
      
      The test program uses poll(2), but it'd, of course, work with select and epoll
      too.
      
      This allows an application to schedule both block I/O and other pollable
      device requests, and to wait for the results using select/poll/epoll.  In a
      typical scenario, an application would submit KAIO requests using
      aio_submit(), would also use epoll_ctl() for the whole other class of devices
      (a class that, with the addition of signals, timers and user events, is now
      pretty much complete), and then would:
      
      	epoll_wait(...);
      	for_each_event {
      		if (curr_event_is_kaiofd) {
      			aio_getevents();
      			dispatch_aio_events();
      		} else {
      			dispatch_epoll_event();
      		}
      	}
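      
      For illustration only (not from the patch itself), a userspace sketch of
      filling in such an iocb, assuming the aio_resfd/IOCB_FLAG_RESFD names
      that the interface eventually settled on in <linux/aio_abi.h>:
      
      	#include <string.h>
      	#include <linux/aio_abi.h>
      
      	static void prep_eventfd_iocb(struct iocb *cb, int fd, void *buf,
      				      size_t len, int efd)
      	{
      		memset(cb, 0, sizeof(*cb));
      		cb->aio_fildes     = fd;
      		cb->aio_lio_opcode = IOCB_CMD_PREAD;
      		cb->aio_buf        = (__u64)(unsigned long)buf;
      		cb->aio_nbytes     = len;
      		cb->aio_flags      = IOCB_FLAG_RESFD;	/* post completions... */
      		cb->aio_resfd      = efd;		/* ...to this eventfd */
      	}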
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c3060be
  16. 10 May 2007, 2 commits
  17. 08 May 2007, 1 commit
  18. 28 Mar 2007, 1 commit
  19. 12 Feb 2007, 2 commits
  20. 04 Feb 2007, 1 commit
    • [PATCH] aio: fix buggy put_ioctx call in aio_complete - v2 · dee11c23
      Committed by Ken Chen
      An AIO bug was reported where a sleeping function is being called in softirq
      context:
      
      BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
      Call Trace:
           [<a000000100577b00>] __mutex_lock_slowpath+0x640/0x6c0
           [<a000000100577ba0>] mutex_lock+0x20/0x40
           [<a0000001000a25b0>] flush_workqueue+0xb0/0x1a0
           [<a00000010018c0c0>] __put_ioctx+0xc0/0x240
           [<a00000010018d470>] aio_complete+0x2f0/0x420
           [<a00000010019cc80>] finished_one_bio+0x200/0x2a0
           [<a00000010019d1c0>] dio_bio_complete+0x1c0/0x200
           [<a00000010019d260>] dio_bio_end_aio+0x60/0x80
           [<a00000010014acd0>] bio_endio+0x110/0x1c0
           [<a0000001002770e0>] __end_that_request_first+0x180/0xba0
           [<a000000100277b90>] end_that_request_chunk+0x30/0x60
           [<a0000002073c0c70>] scsi_end_request+0x50/0x300 [scsi_mod]
           [<a0000002073c1240>] scsi_io_completion+0x200/0x8a0 [scsi_mod]
           [<a0000002074729b0>] sd_rw_intr+0x330/0x860 [sd_mod]
           [<a0000002073b3ac0>] scsi_finish_command+0x100/0x1c0 [scsi_mod]
           [<a0000002073c2910>] scsi_softirq_done+0x230/0x300 [scsi_mod]
           [<a000000100277d20>] blk_done_softirq+0x160/0x1c0
           [<a000000100083e00>] __do_softirq+0x200/0x240
           [<a000000100083eb0>] do_softirq+0x70/0xc0
      
      See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2
      
      flush_workqueue() is not allowed to be called in softirq context.
      However, aio_complete(), called from an I/O interrupt, can potentially call
      put_ioctx with the last reference count on the ioctx, which triggers the bug.
      It is simply incorrect to perform ioctx freeing from aio_complete.
      
      The bug can be triggered by a race between io_destroy() and aio_complete().
      A possible scenario:
      
      cpu0                               cpu1
      io_destroy                         aio_complete
        wait_for_all_aios {                __aio_put_req
           ...                                 ctx->reqs_active--;
           if (!ctx->reqs_active)
              return;
        }
        ...
        put_ioctx(ioctx)
      
                                           put_ioctx(ctx);
                                              __put_ioctx
                                                bam! Bug trigger!
      
      The real problem is that the check of ctx->reqs_active in
      wait_for_all_aios() is incorrect: access to reqs_active is not
      properly protected by the spin lock.
      
      This patch adds that protective spin lock, and at the same time removes
      all the duplicate ref counting for each kiocb, as reqs_active is already
      used as a ref count for each active ioctx.  This also ensures that the
      buggy call to flush_workqueue() in softirq context is eliminated.
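      
      A simplified sketch of the resulting wait_for_all_aios(), following the
      2.6.20-era fs/aio.c rather than quoting the patch verbatim:
      
      	static void wait_for_all_aios(struct kioctx *ctx)
      	{
      		struct task_struct *tsk = current;
      		DECLARE_WAITQUEUE(wait, tsk);
      
      		spin_lock_irq(&ctx->ctx_lock);	/* protects reqs_active */
      		if (!ctx->reqs_active)
      			goto out;
      
      		add_wait_queue(&ctx->wait, &wait);
      		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
      		while (ctx->reqs_active) {
      			spin_unlock_irq(&ctx->ctx_lock);
      			schedule();
      			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
      			spin_lock_irq(&ctx->ctx_lock);
      		}
      		__set_task_state(tsk, TASK_RUNNING);
      		remove_wait_queue(&ctx->wait, &wait);
      	out:
      		spin_unlock_irq(&ctx->ctx_lock);
      	}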
      Signed-off-by: "Ken Chen" <kenchen@google.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: <stable@kernel.org>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dee11c23
  21. 31 Dec 2006, 1 commit
    • [PATCH] Fix lock inversion aio_kick_handler() · 1ebb1101
      Committed by Zach Brown
      lockdep found an AB-BC-CA lock inversion in retry-based AIO:
      
      1) The task struct's alloc_lock (A) is acquired in process context with
         interrupts enabled.  An interrupt might arrive and call wake_up() which
         grabs the wait queue's q->lock (B).
      
      2) When performing retry-based AIO the AIO core registers
         aio_wake_function() as the wake function for iocb->ki_wait.  It is called
         with the wait queue's q->lock (B) held and then tries to add the iocb to
         the run list after acquiring the ctx_lock (C).
      
      3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
         alloc_lock (A) via task_lock() and unuse_mm().  Lockdep emits a warning
         saying that we're trying to connect the irq-safe q->lock to the
         irq-unsafe alloc_lock via ctx_lock.
      
      This fixes the inversion by calling unuse_mm() in the AIO kick handler path
      after we've released the ctx_lock.  As Ben LaHaise pointed out, __put_ioctx
      could set ctx->mm to NULL, so we must only access ctx->mm while we hold the
      lock.
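      
      A sketch of the reordered kick-handler path (simplified from the
      2.6.20-era retry-based AIO code, not the verbatim patch):
      
      	static void aio_kick_handler(struct work_struct *work)
      	{
      		struct kioctx *ctx = container_of(work, struct kioctx, wq.work);
      		mm_segment_t oldfs = get_fs();
      		struct mm_struct *mm;
      		int requeue;
      
      		set_fs(USER_DS);
      		use_mm(ctx->mm);
      		spin_lock_irq(&ctx->ctx_lock);
      		requeue = __aio_run_iocbs(ctx);
      		mm = ctx->mm;			/* ctx->mm only read under the lock */
      		spin_unlock_irq(&ctx->ctx_lock);
      		unuse_mm(mm);			/* after dropping ctx_lock */
      		set_fs(oldfs);
      		if (requeue)
      			queue_delayed_work(aio_wq, &ctx->wq, 0);
      	}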
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1ebb1101
  22. 14 Dec 2006, 1 commit
    • [PATCH] Use activate_mm() in fs/aio.c:use_mm() · 90aef12e
      Committed by Jeremy Fitzhardinge
      activate_mm() is not the right thing to be using in use_mm().  It should be
      switch_mm().
      
      On normal x86, they're synonymous, but for the Xen patches I'm adding a
      hook which assumes that activate_mm is only used the first time a new mm
      is used after creation (I have another hook for dealing with dup_mm).  I
      think this use of activate_mm() is the only place where it could be used
      a second time on an mm.
      
      From a quick look at the other architectures I think this is OK (most
      simply implement one in terms of the other), but some are doing some
      subtly different stuff between the two.
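      
      For reference, a sketch of use_mm() with the change applied (following
      the 2.6.19-era fs/aio.c, not the verbatim patch):
      
      	static void use_mm(struct mm_struct *mm)
      	{
      		struct mm_struct *active_mm;
      		struct task_struct *tsk = current;
      
      		task_lock(tsk);
      		tsk->flags |= PF_BORROWED_MM;
      		active_mm = tsk->active_mm;
      		atomic_inc(&mm->mm_count);
      		tsk->mm = mm;
      		tsk->active_mm = mm;
      		switch_mm(active_mm, mm, tsk);	/* was activate_mm(active_mm, mm) */
      		task_unlock(tsk);
      
      		mmdrop(active_mm);
      	}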
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      90aef12e
  23. 08 Dec 2006, 3 commits
  24. 30 Nov 2006, 1 commit
  25. 22 Nov 2006, 2 commits
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      Committed by David Howells
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
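      
      As an editorial illustration of the new convention (the names are made up
      for the example), the work function receives the work_struct pointer and
      recovers its container with container_of():
      
      	struct my_device {
      		int			state;
      		struct work_struct	refresh_work;
      	};
      
      	static void refresh_fn(struct work_struct *work)
      	{
      		struct my_device *dev =
      			container_of(work, struct my_device, refresh_work);
      		dev->state++;		/* operate on the containing object */
      	}
      
      	/* at init time: INIT_WORK(&dev->refresh_work, refresh_fn); */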
      Signed-Off-By: David Howells <dhowells@redhat.com>
      65f27f38
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      Committed by David Howells
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
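      
      An editorial illustration of the split (the names are made up for the
      example): plain work items keep using work_struct, while timer-deferred
      items use the new delayed_work, which embeds a work_struct plus the timer
      that used to live in every work_struct:
      
      	struct my_driver {
      		struct work_struct	irq_work;	/* INIT_WORK() + schedule_work() */
      		struct delayed_work	poll_work;	/* INIT_DELAYED_WORK() +
      							 * schedule_delayed_work(&poll_work, HZ) */
      	};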
      Signed-Off-By: David Howells <dhowells@redhat.com>
      52bad64d
  26. 03 Oct 2006, 1 commit
  27. 01 Oct 2006, 2 commits
  28. 27 Jun 2006, 1 commit
  29. 23 Jun 2006, 1 commit