1. 09 Feb, 2008 3 commits
  2. 30 Jan, 2008 1 commit
  3. 06 Dec, 2007 1 commit
  4. 19 Oct, 2007 1 commit
  5. 17 Oct, 2007 1 commit
  6. 09 Oct, 2007 1 commit
  7. 11 May, 2007 1 commit
    • signal/timer/event: KAIO eventfd support example · 9c3060be
      Davide Libenzi committed
      This is an example of how to add eventfd support to the current KAIO code,
      in order to enable KAIO to post readiness events to a pollable fd (hence
      compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
      fd when events are ready, and this triggers a POLLIN on the fd.  This patch
      uses a reserved-for-future-use member of struct iocb to pass an eventfd
      file descriptor, which KAIO will use to post events every time a request
      completes.  At that point, an aio_getevents() will return the completed result
      in a struct io_event.  I made a quick test program to verify the patch, and it
      runs fine here:
      
      http://www.xmailserver.org/eventfd-aio-test.c
      
      The test program uses poll(2), but it'd, of course, work with select and epoll
      too.
      
      This makes it possible to schedule both block I/O and requests to other
      pollable devices, and to wait for the results using select/poll/epoll.  In a
      typical scenario, an application would submit KAIO requests using
      aio_submit(), use epoll_ctl() on the whole other class of devices (which,
      with the addition of signals, timers and user events, is now pretty much
      complete), and then would:
      
      	epoll_wait(...);
      	for_each_event {
      		if (curr_event_is_kaiofd) {
      			aio_getevents();
      			dispatch_aio_events();
      		} else {
      			dispatch_epoll_event();
      		}
      	}
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c3060be
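
The pollable-fd behaviour this commit relies on can be tried from user space. The following is a minimal sketch, not the patch itself: it uses only eventfd(2), poll(2), read(2) and write(2), and a plain write() stands in for the kernel-side signal that KAIO would post on completion. The helper names (fd_readable_now, post_completion, drain_completions) are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if fd is readable right now, 0 if not, -1 on error. */
int fd_readable_now(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, 0);	/* timeout 0: report state, never block */

	if (n < 0)
		return -1;
	return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Stand-in for the completion path: add 1 to the eventfd counter. */
int post_completion(int efd)
{
	uint64_t one = 1;

	assert(efd >= 0);
	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Return the number of posts since the last drain, or -1 on error. */
int64_t drain_completions(int efd)
{
	uint64_t total = 0;
	int ready = fd_readable_now(efd);

	if (ready <= 0)
		return ready;	/* nothing pending, or poll() failed */
	if (read(efd, &total, sizeof(total)) != (ssize_t)sizeof(total))
		return -1;
	return (int64_t)total;	/* the read resets the counter to zero */
}
```

An fd created with eventfd(0, 0) reports not readable until the first post_completion(), and one drain_completions() call returns the accumulated count. That count is what would tell the event loop above that completed requests are waiting to be collected.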
  8. 10 May, 2007 2 commits
  9. 08 May, 2007 1 commit
  10. 28 Mar, 2007 1 commit
  11. 12 Feb, 2007 2 commits
  12. 04 Feb, 2007 1 commit
    • [PATCH] aio: fix buggy put_ioctx call in aio_complete - v2 · dee11c23
      Ken Chen committed
      An AIO bug was reported in which a sleeping function is called in softirq
      context:
      
      BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
      Call Trace:
           [<a000000100577b00>] __mutex_lock_slowpath+0x640/0x6c0
           [<a000000100577ba0>] mutex_lock+0x20/0x40
           [<a0000001000a25b0>] flush_workqueue+0xb0/0x1a0
           [<a00000010018c0c0>] __put_ioctx+0xc0/0x240
           [<a00000010018d470>] aio_complete+0x2f0/0x420
           [<a00000010019cc80>] finished_one_bio+0x200/0x2a0
           [<a00000010019d1c0>] dio_bio_complete+0x1c0/0x200
           [<a00000010019d260>] dio_bio_end_aio+0x60/0x80
           [<a00000010014acd0>] bio_endio+0x110/0x1c0
           [<a0000001002770e0>] __end_that_request_first+0x180/0xba0
           [<a000000100277b90>] end_that_request_chunk+0x30/0x60
           [<a0000002073c0c70>] scsi_end_request+0x50/0x300 [scsi_mod]
           [<a0000002073c1240>] scsi_io_completion+0x200/0x8a0 [scsi_mod]
           [<a0000002074729b0>] sd_rw_intr+0x330/0x860 [sd_mod]
           [<a0000002073b3ac0>] scsi_finish_command+0x100/0x1c0 [scsi_mod]
           [<a0000002073c2910>] scsi_softirq_done+0x230/0x300 [scsi_mod]
           [<a000000100277d20>] blk_done_softirq+0x160/0x1c0
           [<a000000100083e00>] __do_softirq+0x200/0x240
           [<a000000100083eb0>] do_softirq+0x70/0xc0
      
      See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2
      
      flush_workqueue() must not be called in softirq context.  However,
      aio_complete(), when called from an I/O interrupt, can potentially call
      put_ioctx() with the last ref count on the ioctx and trigger the bug.  It is
      simply incorrect to perform ioctx freeing from aio_complete().
      
      The bug is triggerable from a race between io_destroy() and aio_complete().
      A possible scenario:
      
      cpu0                               cpu1
      io_destroy                         aio_complete
        wait_for_all_aios {                __aio_put_req
           ...                                 ctx->reqs_active--;
           if (!ctx->reqs_active)
              return;
        }
        ...
        put_ioctx(ioctx)
      
                                           put_ioctx(ctx);
                                              __put_ioctx
                                                bam! Bug trigger!
      
      The real problem is that the check of ctx->reqs_active in
      wait_for_all_aios() is incorrect: access to reqs_active is not
      properly protected by the spin lock.
      
      This patch adds that protective spin lock, and at the same time removes
      all duplicate ref counting for each kiocb, as reqs_active is already used
      as a ref count for each active ioctx.  This also ensures that the buggy call
      to flush_workqueue() in softirq context is eliminated.
      Signed-off-by: "Ken Chen" <kenchen@google.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: <stable@kernel.org>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dee11c23
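
The core idea of this fix, checking the active-request count only while holding the lock that also protects its updates, can be sketched in user space. This is an analogue under stated assumptions, not the kernel code: a pthread mutex and condition variable stand in for the kernel spin lock and wait queue, and struct ctx, CTX_INIT, ctx_put_req and ctx_wait_for_all are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an I/O context with in-flight requests. */
struct ctx {
	pthread_mutex_t lock;
	pthread_cond_t idle;		/* signalled when reqs_active hits 0 */
	unsigned int reqs_active;	/* only touched with lock held */
};

#define CTX_INIT(n) \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (n) }

/* Completion side: drop one request and wake waiters on the last one. */
void ctx_put_req(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->reqs_active > 0);
	if (--c->reqs_active == 0)
		pthread_cond_broadcast(&c->idle);
	pthread_mutex_unlock(&c->lock);
}

/* Teardown side: the counter is read under the same lock as the update. */
void ctx_wait_for_all(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->reqs_active != 0)
		pthread_cond_wait(&c->idle, &c->lock);
	pthread_mutex_unlock(&c->lock);
}
```

Because the decrement and the wake-up happen inside one critical section, the waiter cannot observe zero until the completion side has finished touching the context. That ordering is what the unlocked check in the scenario above failed to guarantee.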
  13. 31 Dec, 2006 1 commit
    • [PATCH] Fix lock inversion aio_kick_handler() · 1ebb1101
      Zach Brown committed
      lockdep found an AB BC CA lock inversion in retry-based AIO:
      
      1) The task struct's alloc_lock (A) is acquired in process context with
         interrupts enabled.  An interrupt might arrive and call wake_up() which
         grabs the wait queue's q->lock (B).
      
      2) When performing retry-based AIO the AIO core registers
         aio_wake_function() as the wake function for iocb->ki_wait.  It is called
         with the wait queue's q->lock (B) held and then tries to add the iocb to
         the run list after acquiring the ctx_lock (C).
      
      3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
         alloc_lock (A) via lock_task() and unuse_mm().  Lockdep emits a warning
         saying that we're trying to connect the irq-safe q->lock to the
         irq-unsafe alloc_lock via ctx_lock.
      
      This fixes the inversion by calling unuse_mm() in the AIO kick handling path
      after we've released the ctx_lock.  As Ben LaHaise pointed out, __put_ioctx
      could set ctx->mm to NULL, so we must only access ctx->mm while we have the
      lock.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
      Acked-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1ebb1101
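
The shape of this fix (read the shared pointer while holding lock C, drop C, and only then call the function that takes lock A) can be illustrated with a small user-space sketch. It assumes pthread mutexes in place of the kernel locks, and struct mm, struct task, struct ctx, task_unuse_mm and kick_handler are hypothetical simplifications rather than the kernel's definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mm { int unused; };

struct task {
	pthread_mutex_t alloc_lock;	/* lock "A" in the description */
	struct mm *active_mm;
};

struct ctx {
	pthread_mutex_t ctx_lock;	/* lock "C" in the description */
	struct mm *mm;			/* may be set to NULL under ctx_lock */
};

/* Takes A. Must not be called with C held. */
void task_unuse_mm(struct task *t, struct mm *mm)
{
	pthread_mutex_lock(&t->alloc_lock);
	if (t->active_mm == mm)
		t->active_mm = NULL;
	pthread_mutex_unlock(&t->alloc_lock);
}

/* Return 1 if an mm was detached from the task, 0 if the ctx had none. */
int kick_handler(struct ctx *c, struct task *t)
{
	struct mm *mm;

	assert(c != NULL && t != NULL);
	pthread_mutex_lock(&c->ctx_lock);
	mm = c->mm;			/* only read ctx->mm under ctx_lock */
	pthread_mutex_unlock(&c->ctx_lock);

	if (mm == NULL)
		return 0;
	task_unuse_mm(t, mm);		/* A is taken after C is released */
	return 1;
}
```

Since A is never acquired while C is held, the C to A edge that completed the cycle no longer exists in this sketch.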
  14. 14 Dec, 2006 1 commit
    • [PATCH] Use activate_mm() in fs/aio.c:use_mm() · 90aef12e
      Jeremy Fitzhardinge committed
      activate_mm() is not the right thing to be using in use_mm().  It should be
      switch_mm().
      
      On normal x86, they're synonymous, but for the Xen patches I'm adding a
      hook which assumes that activate_mm is only used the first time a new mm
      is used after creation (I have another hook for dealing with dup_mm).  I
      think this use of activate_mm() is the only place where it could be used
      a second time on an mm.
      
      From a quick look at the other architectures I think this is OK (most
      simply implement one in terms of the other), but some are doing some
      subtly different stuff between the two.
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      90aef12e
  15. 08 Dec, 2006 3 commits
  16. 30 Nov, 2006 1 commit
  17. 22 Nov, 2006 2 commits
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells committed
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: David Howells <dhowells@redhat.com>
      65f27f38
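
The container_of() pattern mentioned in this commit can be shown with a self-contained user-space sketch. The macro below is a simplified version without the kernel's type checking, the work_struct here carries only a function pointer, and struct my_device, my_device_work and run_work are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

/* Reduced stand-in: the real structure has more management fields. */
struct work_struct {
	work_func_t func;
};

struct my_device {
	int id;
	int handled;
	struct work_struct work;	/* embedded, not pointed to */
};

/* The work function receives the work_struct, not a context pointer. */
void my_device_work(struct work_struct *work)
{
	struct my_device *dev = container_of(work, struct my_device, work);

	dev->handled++;
}

/* Stand-in for the work queue executor. */
void run_work(struct work_struct *work)
{
	assert(work != NULL && work->func != NULL);
	work->func(work);
}
```

With the work item embedded in the owning structure, no separate data pointer needs to be stored per item. The offset of the member is enough to get back to the owner.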
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      David Howells committed
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      52bad64d
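
The layout change can be pictured with a rough sketch. The field lists below are illustrative assumptions, not the kernel's actual definitions, and the names timer_stub, work_item, delayed_work_item and to_delayed are hypothetical. The point is only that the timer moves out of the basic item into a wrapper that embeds it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative timer payload; the real timer_list differs. */
struct timer_stub {
	void *entry[2];
	unsigned long expires;
	void (*function)(unsigned long);
	unsigned long data;
};

/* Non-delayable item: no timer, so it stays small. */
struct work_item {
	unsigned long pending;
	void *entry[2];
	void (*func)(struct work_item *work);
};

/* Delayable item: the basic item plus the timer that was split out. */
struct delayed_work_item {
	struct work_item work;
	struct timer_stub timer;
};

/* Recover the delayable wrapper from its embedded basic item. */
struct delayed_work_item *to_delayed(struct work_item *work)
{
	assert(work != NULL);
	return (struct delayed_work_item *)
		((char *)work - offsetof(struct delayed_work_item, work));
}
```

Code that never needs a delay pays only for struct work_item, while code that does can still hand the embedded item to anything that operates on the basic type.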
  18. 03 Oct, 2006 1 commit
  19. 01 Oct, 2006 2 commits
  20. 27 Jun, 2006 1 commit
  21. 23 Jun, 2006 1 commit
  22. 26 Mar, 2006 1 commit
  23. 09 Jan, 2006 1 commit
  24. 14 Nov, 2005 2 commits
  25. 07 Nov, 2005 1 commit
    • [PATCH] aio: remove aio_max_nr accounting race · d55b5fda
      Zach Brown committed
      AIO was adding a new context's max requests to the global total before
      testing if that resulting total was over the global limit.  This let
      innocent tasks get their new limit tested along with a racing guilty task
      that was crossing the limit.  This serializes the _nr accounting with a
      spinlock.  It also switches to using unsigned long for the global totals.
      Individual contexts are still limited to an unsigned int's worth of
      requests by the syscall interface.
      
      The problem and fix were verified with a simple program that spun creating
      and destroying a context while holding on to another long lived context.
      Before the patch a task creating a tiny context could get a spurious EAGAIN
      if it raced with a task creating a very large context that overran the
      limit.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d55b5fda
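
The difference between "add, then test" and "test and add under one lock" can be sketched in user space. This is an analogue assuming a pthread mutex in place of the kernel spinlock. The variable names echo the commit message, but the limit value and the function names reserve_requests and release_requests are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t nr_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long aio_nr;			/* global total, under nr_lock */
static unsigned long aio_max_nr = 0x10000;	/* assumed global limit */

/*
 * Reserve nr_events against the global limit. The test and the update
 * happen in one critical section, so a caller never sees a total that
 * already includes another caller's rejected request.
 */
int reserve_requests(unsigned int nr_events)
{
	int ret = 0;

	pthread_mutex_lock(&nr_lock);
	if (aio_nr + nr_events > aio_max_nr ||
	    aio_nr + nr_events < aio_nr)	/* second test catches wraparound */
		ret = -EAGAIN;
	else
		aio_nr += nr_events;
	pthread_mutex_unlock(&nr_lock);
	return ret;
}

/* Give back a reservation made by reserve_requests(). */
void release_requests(unsigned int nr_events)
{
	pthread_mutex_lock(&nr_lock);
	assert(aio_nr >= nr_events);
	aio_nr -= nr_events;
	pthread_mutex_unlock(&nr_lock);
}
```

A rejected reservation leaves aio_nr untouched, so a small request racing with an oversized one is judged only against reservations that actually succeeded.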
  26. 24 Oct, 2005 1 commit
  27. 18 Oct, 2005 1 commit
  28. 01 Oct, 2005 3 commits
  29. 18 Sep, 2005 1 commit