1. 24 May 2011 (9 commits)
    • Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Committed by Josef Bacik
      Originally this was going to be used as a way to give hints to the
      allocator, but frankly we can get much better hints elsewhere, and it's
      not even used at all for anything useful.  In addition to being
      completely useless, when we initialize an inode we try to find a freeish
      block group to set as the inode's block group, and with a completely
      full 40GB fs this takes _forever_, so I imagine with, say, a 1TB fs it
      is just unbearable.  So just axe the thing altogether; we don't need it,
      and it saves us 8 bytes in the inode and 500 microseconds per inode
      lookup in my test case.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
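
      In outline, the change drops one u64 from struct btrfs_inode along with
      the block-group search at inode init. A minimal sketch, with the
      surrounding layout illustrative rather than verbatim:

          struct btrfs_inode {
                  /* ... existing fields ... */
                  /* u64 block_group;  -- removed: only ever written, never
                   * consulted as a real allocator hint, and finding a
                   * "freeish" group at init is what made creation crawl */
          };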
    • Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba
      Committed by Josef Bacik
      We have a bit of debugging in btrfs_search_slot to make sure the level
      of the cow block is the same as the original block we were cow'ing.  I
      don't think I've ever seen it tripped, so kill it.  This saves us 2
      kmaps per level in our search.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
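
      The removed check looked roughly like this (illustrative, not the
      verbatim kernel code); each btrfs_header_level() call on an unmapped
      extent buffer costs a kmap/kunmap pair:

          if (cow && btrfs_header_level(cow) != btrfs_header_level(b))
                  WARN_ON(1);     /* never observed to trip; deleted */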
    • Btrfs: map the node block when looking for readahead targets · cb25c2ea
      Committed by Josef Bacik
      If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
      times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
      extent buffer while we look for readahead targets.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
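
      Sketch of the idea; the helpers below are hypothetical and stand in for
      the era's extent-buffer mapping calls:

          map_extent_buffer_page(eb);                /* one kmap */
          for (i = 0; i < nritems; i++)
                  consider_readahead(read_blockptr_mapped(eb, i));
          unmap_extent_buffer_page(eb);              /* one kunmap */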
    • Btrfs: set range_start to the right start in count_range_bits · af60bed2
      Committed by Josef Bacik
      In count_range_bits we are adjusting total_bytes based on the range we are
      searching for, but we don't adjust the range start according to the range we are
      searching for, which makes for weird results.  For example, if the range
      
      [0-8192]
      
      is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
      of bytes found, but the range_start will be 0, which makes it look like the
      range is [0-4096].  So instead set range_start = max(cur_start, state->start).
      This makes everything come out right.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
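
      The fix is one line in the found-state bookkeeping (sketch; the
      surrounding loop is elided):

          if (!found)
                  *start = max(cur_start, state->start);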
    • Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Committed by Josef Bacik
      The ceph guys keep running into problems where we have space reserved in
      our orphan block rsv when freeing it up.  This is because they tend to do
      snapshots a lot, so their truncates tend to use a bunch of space, so when
      we go to do things like update the inode we have to steal reservation
      space in order to make the reservation happen.  This happens because
      truncate can use as much space as it freaking feels like, but we still
      have to hold space for removing the orphan item and updating the inode,
      which will definitely always happen.  So in order to fix this we need to
      split all of the reservation stuff up.  So with this patch we have
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
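
      A rough sketch of the per-truncate reserve from item 2; the call shape
      follows the btrfs block_rsv API of this era, but treat the details as
      assumptions:

          rsv = btrfs_alloc_block_rsv(root);
          if (!rsv)
                  return -ENOMEM;
          /* fill rsv with only what this truncate step needs; the orphan
           * reserve is left alone until the orphan item is deleted */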
    • Btrfs: kill trans_mutex · a4abeea4
      Committed by Josef Bacik
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really, serializing trans_handles joining is not too hard; it mostly
      comes down to safely acquiring a reference to the transaction.  So
      replace the trans_mutex with a trans_lock spinlock and use it to do the
      following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much;
      basically it just holds the current transactions, which will usually be
      at most the currently committing transaction and the currently running
      transaction.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
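
      A condensed sketch of the new join path under the spinlock; field names
      follow the list above and the retry helper is hypothetical:

          spin_lock(&fs_info->trans_lock);
          if (fs_info->no_trans_join) {
                  spin_unlock(&fs_info->trans_lock);
                  return wait_on_commit_then_retry();   /* hypothetical */
          }
          cur_trans = fs_info->running_transaction;
          if (cur_trans) {
                  atomic_inc(&cur_trans->use_count);    /* item 6 */
                  spin_unlock(&fs_info->trans_lock);
                  return cur_trans;
          }
          spin_unlock(&fs_info->trans_lock);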
    • Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Committed by Josef Bacik
      We currently track trans handles in current->journal_info, but we don't
      actually use it.  This patch fixes that.  It covers the case where we
      have multiple people starting transactions down the call chain: instead
      of allocating a new handle and all that, we just increase the use count
      of the current handle, save the old block_rsv, and return.  I tested
      this with xfstests and it worked out fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
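
      Sketch of the reuse path (illustrative):

          if (current->journal_info) {
                  struct btrfs_trans_handle *h = current->journal_info;

                  h->use_count++;
                  h->orig_rsv = h->block_rsv;   /* save the old block_rsv */
                  h->block_rsv = NULL;
                  return h;
          }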
    • Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Committed by Josef Bacik
      I keep forgetting that btrfs_join_transaction() just ignores the
      num_items argument, which leads me to sending pointless patches and
      looking stupid :).  So just kill the num_items argument from
      btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither
      of them uses it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
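
      The resulting prototypes, per the description (sketch):

          struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root);
          struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root);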
    • Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Committed by Josef Bacik
      In the prealloc filling code and the compressed code we don't set
      trans->block_rsv to the delalloc block reserve properly, which is going
      to make us use metadata from the wrong pool; this patch fixes that.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
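
      The gist of the fix, assuming the usual fs_info reserve naming of this
      era:

          trans->block_rsv = &root->fs_info->delalloc_block_rsv;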
  2. 18 May 2011 (4 commits)
    • configfs: Fix race between configfs_readdir() and configfs_d_iput() · 24307aa1
      Committed by Joel Becker
      configfs_readdir() will use the existing inode numbers of inodes in the
      dcache, but it makes them up for attribute files that aren't currently
      instantiated.  There is a race where a closing attribute file can be
      tearing down at the same time as configfs_readdir() is trying to get its
      inode number.
      
      We want to get the inode number of open attribute files, because they
      should match while instantiated.  We can't lock down the transition
      where dentry->d_inode is set to NULL, so we just check for NULL there.
      We can, however, ensure that an inode we find isn't iput() in
      configfs_d_iput() until after we've accessed it.
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
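
      Sketch of the NULL check described above; the fallback helper is
      hypothetical:

          struct inode *inode = NULL;

          if (dentry)
                  inode = dentry->d_inode;   /* may be NULL mid-teardown */
          ino = inode ? inode->i_ino : configfs_fake_ino(sd);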
    • configfs: Don't try to d_delete() negative dentries. · df7f9967
      Committed by Joel Becker
      When configfs is faking mkdir() on its subsystem or default group
      objects, it starts by adding a negative dentry.  It then tries to
      instantiate the group.  If that should fail, it must clean up after
      itself.
      
      I was using d_delete() here, but configfs_attach_group() promises to
      return an empty dentry on error.  d_delete() explodes with the empty
      dentry.  Let's try d_drop() instead.  The unhashing is what we want for
      our dentry.
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
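
      The error-path change in essence (sketch):

          /* was: d_delete(dentry) -- oopses on the empty dentry */
          d_drop(dentry);    /* just unhash it */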
    • cifs: fix cifsConvertToUCS() for the mapchars case · 11379b5e
      Committed by Jeff Layton
      As Metze pointed out, commit 84cdf74e broke mapchars option:
      
          Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
          (84cdf74e) does multiple steps
          in just one commit (moving the function and changing it without
          testing).
      
          put_unaligned_le16(temp, &target[j]); is never called for any
          codepoint that goes via the 'default' switch statement. As a result
          we put just zero (or maybe uninitialized) bytes into the target
          buffer.
      
      His proposed patch looks correct, but doesn't apply to the current head
      of the tree. This patch should also fix it.
      
      Cc: <stable@kernel.org> # .38.x: 581ade4d: cifs: clean up various nits in unicode routines (try #2)
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
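
      Shape of the fix in the 'default' arm (sketch; the conversion of the
      source character into 'temp' is elided):

          default:
                  /* ... convert one source char into temp ... */
                  put_unaligned_le16(temp, &target[j]);  /* was missing */
                  break;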
    • cifs: add fallback in is_path_accessible for old servers · 221d1d79
      Committed by Jeff Layton
      The is_path_accessible check uses a QPathInfo call, which isn't
      supported by ancient win9x era servers. Fall back to an older
      SMBQueryInfo call if it fails with the magic error codes.
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
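
      Condensed shape of the fallback; the helper names are hypothetical, and
      the error codes shown are an assumption about which ones count as
      "magic":

          rc = query_path_info(xid, tcon, full_path);        /* QPathInfo */
          if (rc == -EOPNOTSUPP || rc == -EINVAL)
                  rc = query_path_info_legacy(xid, tcon, full_path);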
  3. 15 May 2011 (5 commits)
  4. 14 May 2011 (8 commits)
  5. 12 May 2011 (6 commits)
  6. 10 May 2011 (8 commits)
    • fuse: fix oops in revalidate when called with NULL nameidata · d2433905
      Committed by Miklos Szeredi
      Some cases (e.g. ecryptfs) can call ->d_revalidate with a NULL
      nameidata.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=34732
      
      Tyler Hicks pointed out that this bug was introduced by commit
      e7c0a167 "fuse: make fuse_dentry_revalidate() RCU aware"
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
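
      The defensive shape of the fix (sketch):

          if (nd && (nd->flags & LOOKUP_RCU))
                  return -ECHILD;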
    • nilfs2: fix infinite loop in nilfs_palloc_freev function · 349dbc36
      Committed by Ryusuke Konishi
      After commit 9954e7af ("nilfs2: add free
      entries count only if clear bit operation succeeded") was applied, a
      free routine of nilfs could fall into an infinite loop, outputting the
      same message endlessly:
      
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed ...
      
      That patch broke the routine so that a loop counter is never updated
      in an abnormal state.  This fixes the regression.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
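
      A minimal illustration of the bug class (not the nilfs code itself): if
      the index only advances on a successful clear, an already-freed entry
      spins forever.

          while (n < nitems) {
                  if (!test_and_clear_bit(n, bitmap))
                          pr_warn("entry %lu already freed\n", n);
                  n++;    /* the fix: advance unconditionally */
          }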
    • xfs: fix race condition in AIL push trigger · 7ac95657
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One is caused by a
      race condition in determining whether there is a push in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checks
      whether the target changed and clears the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety's sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit e4d3c4a4)
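
      Sketch of the corrected completion path; names follow the description
      above, and the barrier choice is an assumption:

          clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
          smp_rmb();
          if (XFS_LSN_CMP(ailp->xa_target, target) != 0 &&
              !test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
                  queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);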
    • xfs: make AIL target updates and compares 32bit safe. · fe0da767
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have a
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit fd5670f2)
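
      The helper named above, used as described (sketch):

          /* racy 64-bit store replaced by the safe copy */
          xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);

          spin_lock(&ailp->xa_lock);
          target = ailp->xa_target;    /* read under the AIL lock */
          spin_unlock(&ailp->xa_lock);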
    • xfs: always push the AIL to the target · 50e86686
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have an LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with an LSN that matches the target
      exactly are pushed during the loop.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit cb64026b)
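
      The comparison change in essence (sketch): items whose LSN equals the
      target are pushed too.

          while (lip && XFS_LSN_CMP(lip->li_lsn, target) <= 0) {
                  /* push lip and walk to the next AIL item */
          }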
    • xfs: exit AIL push work correctly when AIL is empty · 9e7004e7
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit ea35a200)
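
      Both exits can share one epilogue so neither forgets the PUSHING/target
      handling; a sketch of the shape, with the label illustrative:

          out_done:
                  clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
                  /* recheck ailp->xa_target and requeue if it moved,
                   * exactly like the normal completion path above */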
    • xfs: ensure reclaim cursor is reset correctly at end of AG · 228d62dd
      Committed by Dave Chinner
      On a 32 bit highmem PowerPC machine, the XFS inode cache was growing
      without bound and exhausting low memory causing the OOM killer to be
      triggered. After some effort, the problem was reproduced on a 32 bit
      x86 highmem machine.
      
      The problem is that the per-ag inode reclaim index cursor was not
      getting reset to the start of the AG if the radix tree tag lookup
      found no more reclaimable inodes. Hence every further reclaim
      attempt started at the same index beyond where any reclaimable
      inodes lay, and no further background reclaim ever occurred from the
      AG.
      
      Without background inode reclaim the VM-driven cache shrinker
      simply cannot keep up with cache growth, and OOM is the result.
      
      While the change that exposed the problem was the conversion of the
      inode reclaim to use work queues for background reclaim, it was not
      the cause of the bug. The bug was introduced when the cursor code
      was added, just waiting for some weird configuration to strike....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit b2232219)
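
      Sketch of the fix; the cursor field name follows this era's code, but
      treat it as an assumption:

          if (!nr_found) {
                  pag->pag_ici_reclaim_cursor = 0;   /* wrap to AG start */
                  break;
          }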
    • Don't lock guardpage if the stack is growing up · a09a79f6
      Committed by Mikulas Patocka
      The Linux kernel excludes the guard page when performing mlock on a VMA
      with a down-growing stack. However, some architectures have an
      up-growing stack, and locking of the guard page should be excluded in
      that case too.
      
      This patch fixes lvm2 on PA-RISC (and possibly other architectures with
      an up-growing stack). lvm2 calculates the number of used pages when
      locking and when unlocking, and reports an internal error if the
      numbers mismatch.
      
      [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the
        grows-up case, and to move things around a bit to clean it all up and
        share the infrastructure with the /proc bits.
      
        Tested on ia64 that has both grow-up and grow-down segments  - Linus ]
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Tested-by: Tony Luck <tony.luck@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
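
      The symmetric treatment in essence (sketch; the real patch also adds
      shared helpers for the /proc side):

          if (vma->vm_flags & VM_GROWSDOWN)
                  start += PAGE_SIZE;    /* skip guard page at the bottom */
          else if (vma->vm_flags & VM_GROWSUP)
                  end -= PAGE_SIZE;      /* skip guard page at the top */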