1. 11 12月, 2006 6 次提交
    • Z
      [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a
      Zach Brown 提交于
      The only time it is safe to call aio_complete() is when the ->ki_retry
      function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
      historically done this by relying on its caller to translate positive return
      codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
      conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
      going to call aio_complete().  It would reverse the test and wait and free the
      dio in the cases it thought that finished_one_bio() wasn't going to.
      
      Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
      errno from the submission path but it failed to communicate this to
      finished_one_bio().  direct_io_worker() would return < 0, it's callers
      wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
      future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
      would be called for a second time which can manifest as an oops.
      
      The previous cleanups have whittled the sync and async completion paths down
      to the point where we can collapse them and clearly reassert the invariant
      that we must only call aio_complete() after returning -EIOCBQUEUED.
      direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
      drop the dio refcount and the aio bio completion path will only call
      aio_complete() when it is the last to drop the dio refcount.
      direct_io_worker() can ensure that it is the last to drop the reference count
      by waiting for bios to drain.  It does this for sync ops, of course, and for
      partial dio writes that must fall back to buffered and for aio ops that saw
      errors during submission.
      
      This means that operations that end up waiting, even if they were issued as
      aio ops, will not call aio_complete() from dio.  Instead we return the return
      code of the operation and let the aio core call aio_complete().  This is
      purposely done to fix a bug where AIO DIO file extensions would call
      aio_complete() before their callers have a chance to update i_size.
      
      Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
      no longer have to translate for it.  XFS needs to be careful not to free
      resources that will be used during AIO completion if -EIOCBQUEUED is returned.
       We maintain the previous behaviour of trying to write fs metadata for O_SYNC
      aio+dio writes.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: <xfs-masters@oss.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8459d86a
    • A
      [PATCH] io-accounting-read-accounting nfs fix · 8bde37f0
      Andrew Morton 提交于
      nfs's ->readpages uses read_cache_pages().  Wire it up there.
      
      [wfg@mail.ustc.edu.cn: account only successful nfs/fuse reads]
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8bde37f0
    • A
      [PATCH] io-accounting: write-cancel accounting · e08748ce
      Andrew Morton 提交于
      Account for the number of byte writes which this process caused to not happen
      after all.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e08748ce
    • A
      [PATCH] io-accounting: write accounting · 55e829af
      Andrew Morton 提交于
      Accounting writes is fairly simple: whenever a process flips a page from clean
      to dirty, we accuse it of having caused a write to underlying storage of
      PAGE_CACHE_SIZE bytes.
      
      This may overestimate the amount of writing: the page-dirtying may cause only
      one buffer_head's worth of writeout.  Fixing that is possible, but probably a
      bit messy and isn't obviously important.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      55e829af
    • A
      [PATCH] clean up __set_page_dirty_nobuffers() · 8c08540f
      Andrew Morton 提交于
      Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
      and a few other places.  No functional changes.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8c08540f
    • H
      [PATCH] read_zero_pagealigned() locking fix · 5fcf7bb7
      Hugh Dickins 提交于
      Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
      bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
      but another thread's racing read of /dev/zero, or a normal fault, can
      easily set that pte again, in between zap_page_range and zeromap_page_range
      getting there.  It's been wrong ever since 2.4.3.
      
      The simple fix is to use down_write instead, but that would serialize reads
      of /dev/zero more than at present: perhaps some app would be badly
      affected.  So instead let zeromap_page_range return the error instead of
      BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
      that case - there's no need to optimize for it.
      
      Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
      zeromap_page_range), though it really isn't interesting there.  And since
      mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
      than -ENOMEM.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Ramiro Voicu: <Ramiro.Voicu@cern.ch>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5fcf7bb7
  2. 09 12月, 2006 7 次提交
  3. 08 12月, 2006 27 次提交