1. 27 Jul, 2010 35 commits
    • D
      xfs: fix xfs_trans_add_item() lockdep warnings · 43869706
      Dave Chinner authored
      xfs_trans_add_item() is called with ip->i_ilock held, which means it
      is unsafe for memory reclaim to recurse back into the filesystem
      (ilock is required in writeback). Hence the allocation needs to be
      KM_NOFS to avoid recursion.
      
      Lockdep report indicating memory allocation being called with the
      ip->i_ilock held is as follows:
      
      [ 1749.866796] =================================
      [ 1749.867788] [ INFO: inconsistent lock state ]
      [ 1749.868327] 2.6.35-rc3-dgc+ #25
      [ 1749.868741] ---------------------------------
      [ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      [ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1749.868741]  (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741] {IN-RECLAIM_FS-W} state was registered at:
      [ 1749.868741]   [<ffffffff810b3a97>] __lock_acquire+0x437/0x1450
      [ 1749.868741]   [<ffffffff810b4b56>] lock_acquire+0xa6/0x160
      [ 1749.868741]   [<ffffffff810a20b5>] down_write_nested+0x65/0xb0
      [ 1749.868741]   [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741]   [<ffffffff8134e819>] xfs_reclaim_inode+0x99/0x310
      [ 1749.868741]   [<ffffffff8134f56b>] xfs_inode_ag_walk+0x8b/0x150
      [ 1749.868741]   [<ffffffff8134f6bb>] xfs_inode_ag_iterator+0x8b/0xf0
      [ 1749.868741]   [<ffffffff8134f7a8>] xfs_reclaim_inode_shrink+0x88/0x90
      [ 1749.868741]   [<ffffffff81119d07>] shrink_slab+0x137/0x1a0
      [ 1749.868741]   [<ffffffff8111bbe1>] balance_pgdat+0x421/0x6a0
      [ 1749.868741]   [<ffffffff8111bf7d>] kswapd+0x11d/0x320
      [ 1749.868741]   [<ffffffff8109ce56>] kthread+0x96/0xa0
      [ 1749.868741]   [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
      [ 1749.868741] irq event stamp: 4234335
      [ 1749.868741] hardirqs last  enabled at (4234335): [<ffffffff81147d25>] kmem_cache_free+0x115/0x220
      [ 1749.868741] hardirqs last disabled at (4234334): [<ffffffff81147c4d>] kmem_cache_free+0x3d/0x220
      [ 1749.868741] softirqs last  enabled at (4233112): [<ffffffff81084dd2>] __do_softirq+0x142/0x260
      [ 1749.868741] softirqs last disabled at (4233095): [<ffffffff81035edc>] call_softirq+0x1c/0x50
      [ 1749.868741] 
      [ 1749.868741] other info that might help us debug this:
      [ 1749.868741] 2 locks held by dd/2835:
      [ 1749.868741]  #0:  (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [<ffffffff81316edd>] xfs_ilock_nowait+0xed/0x200
      [ 1749.868741]  #1:  (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741] 
      [ 1749.868741] stack backtrace:
      [ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25
      [ 1749.868741] Call Trace:
      [ 1749.868741]  [<ffffffff810b1faa>] print_usage_bug+0x18a/0x190
      [ 1749.868741]  [<ffffffff8104264f>] ? save_stack_trace+0x2f/0x50
      [ 1749.868741]  [<ffffffff810b2400>] ? check_usage_backwards+0x0/0xf0
      [ 1749.868741]  [<ffffffff810b2f11>] mark_lock+0x331/0x400
      [ 1749.868741]  [<ffffffff810b3047>] mark_held_locks+0x67/0x90
      [ 1749.868741]  [<ffffffff810b3111>] lockdep_trace_alloc+0xa1/0xe0
      [ 1749.868741]  [<ffffffff81147419>] kmem_cache_alloc+0x39/0x1e0
      [ 1749.868741]  [<ffffffff8133f954>] kmem_zone_alloc+0x94/0xe0
      [ 1749.868741]  [<ffffffff8133f9be>] kmem_zone_zalloc+0x1e/0x50
      [ 1749.868741]  [<ffffffff81335f02>] xfs_trans_add_item+0x72/0xb0
      [ 1749.868741]  [<ffffffff81339e41>] xfs_trans_ijoin+0xa1/0xd0
      [ 1749.868741]  [<ffffffff81319f82>] xfs_itruncate_finish+0x312/0x5d0
      [ 1749.868741]  [<ffffffff8133cb87>] xfs_free_eofblocks+0x227/0x280
      [ 1749.868741]  [<ffffffff8133cd18>] xfs_release+0x138/0x190
      [ 1749.868741]  [<ffffffff813464c5>] xfs_file_release+0x15/0x20
      [ 1749.868741]  [<ffffffff81150ebf>] fput+0x13f/0x260
      [ 1749.868741]  [<ffffffff8114d8c2>] filp_close+0x52/0x80
      [ 1749.868741]  [<ffffffff8114d9a9>] sys_close+0xb9/0x120
      [ 1749.868741]  [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      43869706
    • D
      xfs: simplify and remove xfs_ireclaim · 2f11feab
      Dave Chinner authored
      xfs_ireclaim has to get and put the pag structure because it is only
      called with the inode to reclaim. The one caller of this function
      already has a reference on the pag and a pointer to it, so move the
      radix tree delete to the caller and remove xfs_ireclaim completely.
      This avoids an xfs_perag_get/put on every inode being reclaimed.
      
      The overhead was noticed in a bug report at:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=16348
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2f11feab
    • D
      xfs: don't block on buffer read errors · ec53d1db
      Dave Chinner authored
      xfs_buf_read() fails to detect dispatch errors before attempting to
      wait on synchronous I/O. If there was an error, it will get stuck
      forever, waiting for an I/O that was never started. Make sure the
      error is detected correctly.
      
      Further, such a failure can leave locked pages in the page cache
      which will cause a later operation to hang on the page. Ensure that
      we correctly process pages in the buffers when we get a dispatch
      error.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ec53d1db
    • D
      xfs: move inode shrinker unregister even earlier · a4190f90
      Dave Chinner authored
      I missed Dave Chinner's second revision of this change, and pushed
      his first version out to the repository instead.
      
      	commit a476c59ebb279d738718edc0e3fb76aab3687114
      	Author: Dave Chinner <dchinner@redhat.com>
      
      This commit compensates for that by moving a block of code up a bit
      further, with a result that matches the effect of Dave's second
      version.
      
      Dave's first version was:
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Dave's second version was:
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      a4190f90
    • C
      xfs: remove a dmapi leftover · fa17b25e
      Christoph Hellwig authored
      The open_exec file operation is only added by the external dmapi
      patch.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      fa17b25e
    • C
      xfs: writepage always has buffers · 78558fe8
      Christoph Hellwig authored
      These days we always have buffers thanks to ->page_mkwrite.  And we
      already have an assert a few lines above tripping in case that was
      not true due to a bug.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      78558fe8
    • C
      xfs: allow writeback from kswapd · d4f7a5cb
      Christoph Hellwig authored
      We only need to disable I/O from direct or memcg reclaim.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      d4f7a5cb
    • C
      xfs: remove incorrect log write optimization · 651701d7
      Christoph Hellwig authored
      We do need a barrier for the first buffer of a split log write.
      Otherwise we might incorrectly stamp the tail LSN into transactions
      in the first part of the split write, or not flush data I/O before
      updating the inode size.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      651701d7
    • D
      xfs: unregister inode shrinker before freeing filesystem structures · 2727ccc9
      Dave Chinner authored
      Currently we don't remove the XFS mount from the shrinker list until
      late in the unmount path. By this time, we have already torn down
      the internals of the filesystem (e.g. the per-ag structures), and
      hence if the shrinker is executed between the teardown and the
      unregistering, the shrinker will get NULL per-ag structure pointers
      and panic trying to dereference them.
      
      Fix this by removing the xfs mount from the shrinker list before
      tearing down its internal structures.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      2727ccc9
    • C
      xfs: split xfs_itrace_entry · cca28fb8
      Christoph Hellwig authored
      Replace the xfs_itrace_entry catchall with specific trace points.  For
      most simple callers we now use the simple inode class, which used to
      be the iget class, but add more detailed tracing for namespace events,
      which now includes the name of the directory entries manipulated.
      
      Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
      one, and the xfs_change_file_space trace point, which is immediately
      followed by the more specific alloc/free space trace points.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      cca28fb8
    • C
      xfs: remove xfs_iput · f2d67614
      Christoph Hellwig authored
      xfs_iput is just a small wrapper for xfs_iunlock + IRELE.  Having this
      out of line wrapper means the trace events in those two can't track
      their caller properly.  So just remove the wrapper and opencode the
      unlock + rele in the few callers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      f2d67614
    • C
      xfs: remove xfs_iput_new · ef35e925
      Christoph Hellwig authored
      We never get an i_mode of 0 or a locked VFS inode until we pass in the
      XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to
      xfs_iput for the only caller.  In addition to that, xfs_nfs_get_inode
      does not even need to lock the inode given that the generation never changes
      for a live inode, so just pass a 0 lock_flags to xfs_iget and release
      the inode using IRELE in the error path.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      ef35e925
    • C
      xfs: some iget tracing cleanups / fixes · d2e078c3
      Christoph Hellwig authored
      The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced.
      Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beginning
      of the xfs_iget_cache_hit/miss functions.  Add a new xfs_iget_reclaim_fail
      tracepoint for the case where we fail to re-initialize a VFS inode,
      and add a second instance of the xfs_iget_skip tracepoint for the case
      of a failed igrab() call.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      d2e078c3
    • C
      xfs: do not use emums for flags used in tracing · 807cbbdb
      Christoph Hellwig authored
      The tracing code can't print flags defined as enums.  Most flags that
      we want to print are defined as macros already, but move the few remaining
      ones over to make the trace output more useful.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      807cbbdb
    • C
      xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount · 64c86149
      Christoph Hellwig authored
      On the final put of a superblock the VFS already calls sync_filesystem
      for us to write out all data and wait for it.  No need to start another
      asynchronous writeback inside ->put_super.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      64c86149
    • C
      xfs: small cleanups for xfs_iomap / __xfs_get_blocks · f2bde9b8
      Christoph Hellwig authored
      Remove the flags argument to __xfs_get_blocks as we can easily derive
      it from the direct argument, and remove the unused BMAPI_MMAP flag.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      f2bde9b8
    • C
      xfs: reduce stack usage in xfs_iomap · 3070451e
      Christoph Hellwig authored
      xfs_iomap passes an xfs_bmbt_irec pointer to xfs_iomap_write_direct and
      xfs_iomap_write_allocate to give them the results of our read-only
      xfs_bmapi query.  Instead of allocating a new xfs_bmbt_irec on the stack
      for the next call to xfs_bmapi, reuse the one we got passed as it's not
      used after this point.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      3070451e
    • C
      xfs: avoid synchronous transaction in xfs_fs_write_inode · 7a36c8a9
      Christoph Hellwig authored
      We already rely on the fact that the sync code will cause a synchronous
      log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data ->
      xfs_sync_data), so no need to do this here.  This allows us to avoid
      a lot of synchronous log forces during sync, which pays off especially
      with delayed logging enabled.  Some compilebench numbers that show
      this:
      
      xfs (delayed logging, 256k logbufs)
      ===================================
      
      intial create		  25.94 MB/s	  25.75 MB/s	  25.64 MB/s
      create			   8.54 MB/s	   9.12 MB/s	   9.15 MB/s
      patch			   2.47 MB/s	   2.47 MB/s	   3.17 MB/s
      compile			  29.65 MB/s	  30.51 MB/s	  27.33 MB/s
      clean			  90.92 MB/s	  98.83 MB/s	 128.87 MB/s
      read tree		  11.90 MB/s	  11.84 MB/s	   8.56 MB/s
      read compiled		  28.75 MB/s	  29.96 MB/s	  24.25 MB/s
      delete tree		8.39 seconds	8.12 seconds	8.46 seconds
      delete compiled		8.35 seconds	8.44 seconds	5.11 seconds
      stat tree		6.03 seconds	5.59 seconds	5.19 seconds
      stat compiled tree	9.00 seconds	9.52 seconds	8.49 seconds
      
      xfs + write_inode log_force removal
      ===================================
      intial create		  25.87 MB/s	  25.76 MB/s	  25.87 MB/s
      create			  15.18 MB/s	  14.80 MB/s	  14.94 MB/s
      patch			   3.13 MB/s	   3.14 MB/s	   3.11 MB/s
      compile			  36.74 MB/s	  37.17 MB/s	  36.84 MB/s
      clean			 226.02 MB/s	 222.58 MB/s	 217.94 MB/s
      read tree		  15.14 MB/s	  15.02 MB/s	  15.14 MB/s
      read compiled tree	  29.30 MB/s	  29.31 MB/s	  29.32 MB/s
      delete tree		6.22 seconds	6.14 seconds	6.15 seconds
      delete compiled tree	5.75 seconds	5.92 seconds	5.81 seconds
      stat tree		4.60 seconds	4.51 seconds	4.56 seconds
      stat compiled tree	4.07 seconds	3.87 seconds	3.96 seconds
      
      In addition to that, also remove the delwri inode flush that is unnecessary
      now that bulkstat is always coherent.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7a36c8a9
    • C
      xfs: simplify xfs_vm_writepage · 20cb52eb
      Christoph Hellwig authored
      The writepage implementation in XFS still tries to deal with dirty but
      unmapped buffers which used to be caused by writes through shared mmaps.  Since
      the introduction of ->page_mkwrite these can't happen anymore, so remove the
      code dealing with them.
      
      Note that the all_bh variable which causes us to start I/O on all buffers on
      the pages was controlled by the count of unmapped buffers, which also
      included those not actually dirty.  It's now unconditionally initialized to
      0 but set to 1 for the case of small file size extensions.  It probably can
      be removed entirely, but that's left for another patch.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      20cb52eb
    • C
      xfs: simplify xfs_vm_releasepage · 89f3b363
      Christoph Hellwig authored
      Currently the xfs releasepage implementation has code to deal with converting
      delayed allocated and unwritten space.  But we never get called for those as
      we always convert delayed and unwritten space when cleaning a page, or drop
      the state from the buffers in block_invalidatepage.  We still keep a WARN_ON
      on those cases for now, but remove all the code dealing with them, which allows
      us to fold xfs_page_state_convert into xfs_vm_writepage and remove the !startio
      case from the whole writeback path.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      89f3b363
    • E
      xfs: fix corruption case for block size < page size · 3d9b02e3
      Eric Sandeen authored
      xfstests 194 first truncates a file back and then extends it again by
      truncating it to a larger size.  This causes discard_buffer to drop
      the mapped, but not the uptodate bit and thus creates something that
      xfs_page_state_convert takes for unmapped space created by mmap because
      it doesn't check for the dirty bit, which also gets cleared by
      discard_buffer and checked by other ->writepage implementations like
      block_write_full_page.  Handle this kind of buffer early, and unlike
      Eric's first version of the patch simply ASSERT that the buffer is
      dirty, given that the mmap write case can't happen anymore since the
      introduction of ->page_mkwrite.  The now dead code dealing with that
      will be deleted in a follow-on patch.
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      3d9b02e3
    • C
      xfs: remove unused delta tracking code in xfs_bmapi · b4e9181e
      Christoph Hellwig authored
      This code was introduced four years ago in commit 3e57ecf6 without
      any review and has been unused since.  Remove it, along with the rest
      of the code introduced in that commit, to reduce the stack usage and
      complexity in this central piece of code.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      b4e9181e
    • C
      xfs: remove unused XFS_BMAPI_ flags · cd8b0bb3
      Christoph Hellwig authored
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      cd8b0bb3
    • C
    • C
    • C
      xfs: kill the unused xlog_debug variable · dbb2f652
      Christoph Hellwig authored
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      dbb2f652
    • C
      xfs: fix the xfs_log_iovec i_addr type · 4e0d5f92
      Christoph Hellwig authored
      By making this member a void pointer we can get rid of a lot of pointless
      casts.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      4e0d5f92
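A minimal sketch of why a void pointer member removes casts. The struct and function names here are hypothetical simplifications, and the `char *` "before" type is an assumption made for illustration rather than something stated in the commit message.

```c
/* Hypothetical payload that gets logged. */
struct inode_core {
    int mode;
    long size;
};

/* Assumed "before" layout: a char pointer forces a cast at each store. */
struct log_iovec_old {
    char *i_addr;
    int i_len;
};

/* "After" layout: any object pointer converts to void * implicitly. */
struct log_iovec {
    void *i_addr;
    int i_len;
};

void fill_old(struct log_iovec_old *vec, struct inode_core *core)
{
    vec->i_addr = (char *)core;       /* cast required */
    vec->i_len = (int)sizeof(*core);
}

void fill_new(struct log_iovec *vec, struct inode_core *core)
{
    vec->i_addr = core;               /* no cast needed */
    vec->i_len = (int)sizeof(*core);
}

long read_size(const struct log_iovec *vec)
{
    /* Reading back through void * needs no cast in C either. */
    const struct inode_core *core = vec->i_addr;
    return core->size;
}
```

Both fill functions store the same address; only the amount of casting at each call site differs.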
    • C
      xfs: simplify inode to transaction joining · 898621d5
      Christoph Hellwig authored
      Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
      joining it to a transaction via xfs_trans_ijoin.
      
      This patch instead makes xfs_trans_ijoin usable on its own by doing
      an implicit xfs_trans_ihold, which also allows us to drop the third
      argument.  For the case where we want to hold a reference on the inode
      a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
      the inode for needing an xfs_iput.  In addition to the cleaner interface
      to the caller this also simplifies the implementation.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      898621d5
    • C
      xfs: simplify buffer pinning · 4d16e924
      Christoph Hellwig authored
      Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
      them in their only callers, just like we did for the inode pinning a while
      ago.  Also remove duplicate trace points - the bufitem tracepoints cover
      all the information that is present in a buffer tracepoint.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      4d16e924
    • C
      xfs: give li_cb callbacks the correct prototype · ca30b2a7
      Christoph Hellwig authored
      Stop the function pointer casting madness and give all the li_cb instances
      the correct prototype.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      ca30b2a7
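The idea of giving a callback the prototype its function pointer declares can be sketched as follows. This is a simplified illustration: the struct layout, the single-argument callback signature, and every name below are hypothetical and do not reproduce the real XFS definitions.

```c
#include <stddef.h>

/* Generic log item; the callback takes the generic type. */
struct log_item {
    long lsn;
    void (*li_cb)(struct log_item *lip);
};

/* A specific item embeds the generic one as its first member. */
struct inode_log_item {
    struct log_item base;   /* must stay first for the conversion below */
    int flushed;
};

/*
 * The callback matches the li_cb prototype exactly, so no function
 * pointer cast is needed when it is installed. The conversion to the
 * specific type happens inside the callback instead.
 */
void inode_item_done(struct log_item *lip)
{
    struct inode_log_item *iip = (struct inode_log_item *)lip;

    iip->flushed = 1;
}

/* Invoke the completion callback, if one is installed. */
void run_callback(struct log_item *lip)
{
    if (lip->li_cb != NULL)
        lip->li_cb(lip);
}
```

Calling a function through a pointer of an incompatible type is undefined behaviour in C, which is one reason to prefer matching prototypes over casting the function pointer at assignment time.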
    • C
      xfs: give xfs_item_ops methods the correct prototypes · 7bfa31d8
      Christoph Hellwig authored
      Stop the function pointer casting madness and give all the xfs_item_ops the
      correct prototypes.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7bfa31d8
    • C
      xfs: merge iop_unpin_remove into iop_unpin · 9412e318
      Christoph Hellwig authored
      The unpin_remove item operation instances always share most of the
      implementation with the respective unpin implementation.  So instead
      of keeping two different entry points add a remove flag to the unpin
      operation and share the code more easily.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      9412e318
    • C
      xfs: simplify log item descriptor tracking · e98c414f
      Christoph Hellwig authored
      Currently we track the log item descriptors belonging to a transaction using a
      complex opencoded chunk allocator.  This code has been there since day one
      and seems to work around the lack of an efficient slab allocator.
      
      This patch replaces it with dynamically allocated log item descriptors
      from a dedicated slab pool, linked to the transaction by a linked list.
      
      This allows us to greatly simplify the log item descriptor tracking to the
      point where it's just a couple hundred lines in xfs_trans.c instead of
      a separate file.  The external API has also been simplified while we're
      at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
      delete items from a transaction have been simplified to the bare minimum,
      and the xfs_trans_find_item function is replaced with a direct dereference
      of the li_desc field.  All debug code walking the list of log items in
      a transaction is down to a simple list_for_each_entry.
      
      Note that we could easily use a singly linked list here instead of the
      doubly linked list from list.h as the fastpath only does deletion from
      sequential traversal.  But given that we don't have one available as
      a library function yet I use the list.h functions for simplicity.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      e98c414f
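The descriptor tracking described above can be approximated in user space as a rough sketch. It uses `malloc` in place of a slab pool and a hand-rolled circular doubly linked list in place of list.h; all type and function names are hypothetical stand-ins, not the kernel's.

```c
#include <stdlib.h>

struct log_item;

/* One descriptor per item joined to a transaction. */
struct item_desc {
    struct item_desc *prev, *next;
    struct log_item *item;
};

struct log_item {
    int id;
    struct item_desc *desc;   /* back pointer: replaces a find-item search */
};

struct trans {
    struct item_desc head;    /* sentinel node of a circular list */
};

void trans_init(struct trans *tp)
{
    tp->head.prev = tp->head.next = &tp->head;
    tp->head.item = NULL;
}

/* Allocate a descriptor and append it to the transaction's list. */
int trans_add_item(struct trans *tp, struct log_item *lip)
{
    struct item_desc *d = malloc(sizeof(*d));

    if (d == NULL)
        return -1;
    d->item = lip;
    d->prev = tp->head.prev;
    d->next = &tp->head;
    tp->head.prev->next = d;
    tp->head.prev = d;
    lip->desc = d;
    return 0;
}

/* Unlink and free the descriptor found through the item's back pointer. */
void trans_del_item(struct log_item *lip)
{
    struct item_desc *d = lip->desc;

    if (d == NULL)
        return;
    d->prev->next = d->next;
    d->next->prev = d->prev;
    lip->desc = NULL;
    free(d);
}

/* Walk the list, as debug code would with list_for_each_entry. */
size_t trans_count_items(const struct trans *tp)
{
    size_t n = 0;

    for (const struct item_desc *d = tp->head.next; d != &tp->head; d = d->next)
        n++;
    return n;
}
```

Because each item points straight at its descriptor, deletion needs no search, which mirrors the direct dereference of the li_desc field mentioned in the commit message.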
    • C
      xfs: remove unneeded #include statements · 3400777f
      Christoph Hellwig authored
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <david@fromorbit.com>
      3400777f
    • C
      xfs: drop dmapi hooks · 288699fe
      Christoph Hellwig authored
      Dmapi support was never merged upstream, but we still have a lot of hooks
      bloating XFS for it, all over the fast paths of the filesystem.
      
      This patch drops over 700 lines of dmapi overhead.  If we ever get HSM
      support in mainline, at least the namespace events can be done much saner
      in the VFS instead of the individual filesystem, so it's not like this
      is much help for future work.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      288699fe
  2. 23 Jul, 2010 1 commit
    • D
      CIFS: Fix a malicious redirect problem in the DNS lookup code · 4c0c03ca
      David Howells authored
      Fix the security problem in the CIFS filesystem DNS lookup code in which a
      malicious redirect could be installed by a random user by simply adding a
      result record into one of their keyrings with add_key() and then invoking a
      CIFS DFS lookup [CVE-2010-2524].
      
      This is done by creating an internal keyring specifically for the caching of
      DNS lookups.  To enforce the use of this keyring, the module init routine
      creates a set of override credentials with the keyring installed as the thread
      keyring and instructs request_key() to only install lookup result keys in that
      keyring.
      
      The override is then applied around the call to request_key().
      
      This has some additional benefits when a kernel service uses this module to
      request a key:
      
       (1) The result keys are owned by root, not the user that caused the lookup.
      
       (2) The result keys don't pop up in the user's keyrings.
      
       (3) The result keys don't come out of the quota of the user that caused the
           lookup.
      
      The keyring can be viewed as root by doing cat /proc/keys:
      
      2a0ca6c3 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: 1/4
      
      It can then be listed with 'keyctl list' by root.
      
      	# keyctl list 0x2a0ca6c3
      	1 key in keyring:
      	726766307: --alswrv     0     0 dns_resolver: foo.bar.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c0c03ca
  3. 22 Jul, 2010 1 commit
  4. 20 Jul, 2010 3 commits
    • D
      xfs: track AGs with reclaimable inodes in per-ag radix tree · 16fd5367
      Dave Chinner authored
      https://bugzilla.kernel.org/show_bug.cgi?id=16348
      
      When the filesystem grows to a large number of allocation groups,
      the summing of reclaimable inodes gets expensive. In many cases,
      most AGs won't have any reclaimable inodes and so we are wasting CPU
      time aggregating over these AGs. This is particularly important for
      the inode shrinker that gets called frequently under memory
      pressure.
      
      To avoid the overhead, track AGs with reclaimable inodes in the
      per-ag radix tree so that we can find all the AGs with reclaimable
      inodes via a simple gang tag lookup. This involves setting the tag
      when the first reclaimable inode is tracked in the AG, and removing
      the tag when the last reclaimable inode is removed from the tree.
      Then the summation process becomes a loop walking the radix tree
      summing AGs with the reclaim tag set.
      
      This significantly reduces the overhead of scanning - a 6400 AG
      filesystem now only uses about 25% of a cpu in kswapd while slab
      reclaim progresses instead of being permanently stuck at 100% CPU
      and making little progress. Clean filesystems will see
      no overhead and the overhead only increases linearly with the number
      of dirty AGs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      16fd5367
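The tagging scheme in this commit message can be illustrated with a small user-space sketch. A flat array with a boolean per AG stands in for the tagged per-ag radix tree, so unlike a real gang tag lookup the summation loop still visits every slot; only the set-on-first and clear-on-last logic is modelled. All names and the AG count are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_AGS 8   /* hypothetical; kept small for illustration */

struct perag {
    unsigned int reclaimable;   /* reclaimable inodes tracked in this AG */
    bool reclaim_tag;           /* stands in for the radix tree tag */
};

struct mount_sim {
    struct perag ags[NUM_AGS];
};

/* Track one more reclaimable inode; tag the AG on the first one. */
void ag_add_reclaimable(struct mount_sim *mp, size_t agno)
{
    struct perag *pag = &mp->ags[agno];

    if (pag->reclaimable++ == 0)
        pag->reclaim_tag = true;
}

/* Drop one reclaimable inode; clear the tag when the last one goes. */
void ag_del_reclaimable(struct mount_sim *mp, size_t agno)
{
    struct perag *pag = &mp->ags[agno];

    if (pag->reclaimable == 0)
        return;
    if (--pag->reclaimable == 0)
        pag->reclaim_tag = false;
}

/* Sum reclaimable inodes, looking only at AGs that carry the tag. */
unsigned int count_reclaimable(const struct mount_sim *mp)
{
    unsigned int total = 0;

    for (size_t i = 0; i < NUM_AGS; i++) {
        if (!mp->ags[i].reclaim_tag)
            continue;   /* a tagged tree lookup would never visit this AG */
        total += mp->ags[i].reclaimable;
    }
    return total;
}
```

With a real tagged radix tree the cost of the summation scales with the number of tagged AGs rather than the total number of AGs, which is the effect the commit message describes.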
    • D
      xfs: convert inode shrinker to per-filesystem contexts · 70e60ce7
      Dave Chinner authored
      Now that the shrinker passes us a context, wire up a shrinker context per
      filesystem. This allows us to remove the global mount list and the
      locking problems that it introduced. It also means that a shrinker call
      does not need to traverse clean filesystems before finding a
      filesystem with reclaimable inodes.  This significantly reduces
      scanning overhead when lots of filesystems are present.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      70e60ce7
    • D
      Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE · 2ebc3464
      Dan Rosenberg authored
      1.  The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check
      whether the donor file is append-only before writing to it.
      
      2.  The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer
      overflow that allows a user to specify an out-of-bounds range to copy
      from the source file (if off + len wraps around).  I haven't been able
      to successfully exploit this, but I'd imagine that a clever attacker
      could use this to read things he shouldn't.  Even if it's not
      exploitable, it couldn't hurt to be safe.
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      cc: stable@kernel.org
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      2ebc3464
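The wraparound mentioned in point 2 can be shown with a short sketch of an overflow-safe range check. This is not the actual Btrfs fix; the function names and the 64-bit unsigned types are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [off, off + len) lies entirely inside a file of
 * file_size bytes. The comparison is arranged so that off + len is
 * never computed, which is where the wraparound would occur.
 */
bool clone_range_is_valid(uint64_t off, uint64_t len, uint64_t file_size)
{
    if (off > file_size)
        return false;
    return len <= file_size - off;
}

/*
 * Naive form, shown only for contrast: off + len can wrap around to a
 * small value and slip past the bound.
 */
bool clone_range_is_valid_naive(uint64_t off, uint64_t len, uint64_t file_size)
{
    return off + len <= file_size;
}
```

For example, with `off = 8` and `len = UINT64_MAX - 4` on a 4096-byte file, the naive sum wraps to 3 and the naive check passes, while the rearranged check rejects the range.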