1. 22 May 2011 (1 commit)
  2. 21 May 2011 (1 commit)
    • btrfs: implement delayed inode items operation · 16cdcec7
      Authored by Miao Xie
      Changelog V5 -> V6:
      - Fix OOM under high memory load by storing the delayed nodes in the
        root's radix tree and letting the btrfs inodes go.
      
      Changelog V4 -> V5:
      - Fix the race on adding the delayed node to the inode, spotted by
        Chris Mason.
      - Merge Chris Mason's incremental patch into this patch.
      - Fix a deadlock between readdir() and memory fault, reported by
        Itaru Kitayama.
      
      Changelog V3 -> V4:
      - Fix nested locking, reported by Itaru Kitayama, by updating the
        space cache inode in time.
      
      Changelog V2 -> V3:
      - Fix the race between the delayed worker and the task that does
        delayed item balancing, reported by Tsutomu Itoh.
      - Modify the patch to address David Sterba's comments.
      - Fix the recursive CPU spinlock bug, reported by Chris Mason.
      
      Changelog V1 -> V2:
      - Break up the global rb-tree; use a list to manage the delayed
        nodes, one of which is created for every directory and file and
        used to manage the delayed directory name index items and the
        delayed inode item.
      - Introduce a worker to deal with the delayed nodes.
      
      Compared with ext3/4, the performance of file creation and deletion
      on btrfs is very poor: btrfs must do a lot of b+ tree insertions,
      such as the inode item, the directory name item, the directory name
      index item, and so on.
      
      If we can delay some of the b+ tree insertions and deletions, we can
      improve performance, so this patch implements delayed directory name
      index insertion/deletion and delayed inode updates.
      
      Implementation:
      - Introduce a delayed root object into the filesystem, which uses
        two lists to manage the delayed nodes created for every
        file/directory (a rough sketch of the layout follows this list).
        One list manages all the delayed nodes that have pending delayed
        items; the other manages the delayed nodes that are waiting to be
        handled by the worker thread.
      - Every delayed node has two rb-trees: one manages the directory
        name index items that are going to be inserted into the b+ tree,
        and the other manages the directory name index items that are
        going to be deleted from the b+ tree.
      - Introduce a worker to deal with the delayed operations, that is,
        the delayed directory name index item insertions and deletions and
        the delayed inode updates.
        When the number of delayed items exceeds the lower limit, we
        create work items for some delayed nodes, insert them into the
        worker's queue, and return.
        When the number of delayed items exceeds the upper bound, we
        create work items for all the delayed nodes that haven't been
        dealt with yet, insert them into the worker's queue, and then wait
        until the number of untreated items drops below some threshold.
      - When we want to insert a directory name index into the b+ tree, we
        just add the information to the delayed insertion rb-tree. Then we
        check the number of delayed items and do delayed item balancing.
        (The balancing policy is described above.)
      - When we want to delete a directory name index from the b+ tree, we
        first search for it in the insertion rb-tree. If we find it, we
        just drop it; if not, we add its key to the delayed deletion
        rb-tree. As with insertion, we then check the number of delayed
        items and do delayed item balancing.
      - When we want to update the metadata of some inode, we cache the
        data in the delayed node; the worker flushes it into the b+ tree
        after dealing with the delayed insertions and deletions.
      - We move a delayed node to the tail of the list after we access it.
        This way we can cache more delayed items and merge more inode
        updates.
      - When we commit a transaction, we deal with all the delayed nodes.
      - A delayed node is freed when we free the btrfs inode.
      - Before we log the inode items, we commit all the directory name
        index items and the delayed inode update.
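
      A rough C sketch of the layout described above; every name here is
      illustrative, not the patch's actual identifiers:

          #include <linux/list.h>
          #include <linux/rbtree.h>
          #include <linux/spinlock.h>
          #include <linux/atomic.h>

          /* One per filesystem: tracks every node with pending items. */
          struct delayed_root {
                  spinlock_t lock;
                  struct list_head node_list;    /* nodes with delayed items */
                  struct list_head prepare_list; /* nodes queued for the worker */
                  atomic_t items;                /* total pending delayed items */
          };

          /* One per file/directory that has pending delayed items. */
          struct delayed_node {
                  struct list_head list;   /* link into a delayed_root list */
                  struct rb_root ins_tree; /* dir name index items to insert */
                  struct rb_root del_tree; /* dir name index keys to delete */
                  int inode_dirty;         /* cached inode update pending? */
          };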
      
      I did a quick test with the benchmark tool [1] and found that we can
      improve the performance of file creation by ~15% and of file
      deletion by ~20%.
      
      Before applying this patch:
      Create files:
              Total files: 50000
              Total time: 1.096108
              Average time: 0.000022
      Delete files:
              Total files: 50000
              Total time: 1.510403
              Average time: 0.000030
      
      After applying this patch:
      Create files:
              Total files: 50000
              Total time: 0.932899
              Average time: 0.000019
      Delete files:
              Total files: 50000
              Total time: 1.215732
              Average time: 0.000024
      
      [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
      
      Many thanks to Kitayama-san for his help!
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: David Sterba <dave@jikos.cz>
      Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  3. 18 May 2011 (4 commits)
    • configfs: Fix race between configfs_readdir() and configfs_d_iput() · 24307aa1
      Authored by Joel Becker
      configfs_readdir() will use the existing inode numbers of inodes in the
      dcache, but it makes them up for attribute files that aren't currently
      instantiated.  There is a race where a closing attribute file can be
      tearing down at the same time as configfs_readdir() is trying to get its
      inode number.
      
      We want to get the inode number of open attribute files, because they
      should match while instantiated.  We can't lock down the transition
      where dentry->d_inode is set to NULL, so we just check for NULL there.
      We can, however, ensure that an inode we find isn't iput() in
      configfs_d_iput() until after we've accessed it.
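
      A hedged sketch of that shape (a fragment; configfs_dirent_lock is
      assumed to be the lock configfs_d_iput() takes before dropping the
      inode):

          struct configfs_dirent *next; /* entry being emitted by readdir */
          struct dentry *dentry;
          ino_t ino;

          spin_lock(&configfs_dirent_lock);
          dentry = next->s_dentry;
          if (dentry && dentry->d_inode)
                  ino = dentry->d_inode->i_ino; /* instantiated attribute */
          else
                  ino = iunique(sb, 2);         /* make one up, as before */
          spin_unlock(&configfs_dirent_lock);
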
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • configfs: Don't try to d_delete() negative dentries. · df7f9967
      Authored by Joel Becker
      When configfs is faking mkdir() on its subsystem or default group
      objects, it starts by adding a negative dentry.  It then tries to
      instantiate the group.  If that should fail, it must clean up after
      itself.
      
      I was using d_delete() here, but configfs_attach_group() promises
      to return an empty dentry on error.  d_delete() explodes on that
      empty dentry.  Let's try d_drop() instead.  The unhashing is what
      we want for our dentry.
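
      The shape of the change, sketched (the calling context is
      paraphrased, not the literal diff):

          ret = configfs_attach_group(parent_item, item, dentry);
          if (ret) {
                  /*
                   * d_delete() expects an instantiated dentry; here we
                   * hold the empty dentry attach left behind on error,
                   * so just unhash it.
                   */
                  d_drop(dentry);
          }
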
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • cifs: fix cifsConvertToUCS() for the mapchars case · 11379b5e
      Authored by Jeff Layton
      As Metze pointed out, commit 84cdf74e broke the mapchars option:
      
          Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
          (84cdf74e) does multiple steps
          in just one commit (moving the function and changing it without
          testing).
      
          put_unaligned_le16(temp, &target[j]); is never called for any
          codepoint that goes via the 'default' switch statement. As a result
          we put just zero (or maybe uninitialized) bytes into the target
          buffer.
      
      His proposed patch looks correct, but doesn't apply to the current head
      of the tree. This patch should also fix it.
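
      A hedged sketch of the fixed loop body (paraphrased; temp is a
      wchar_t, target a possibly unaligned __le16 buffer, and cp->char2uni
      the nls conversion hook):

          switch (src_char) {
          /* ... explicit mappings for the reserved characters ... */
          default:
                  charlen = cp->char2uni(source + i, srclen - i, &temp);
                  /* if no match, use a question mark */
                  if (charlen < 1) {
                          temp = 0x003f;
                          charlen = 1;
                  }
          }
          /*
           * The fix: every path through the switch, including 'default',
           * must end by actually storing the converted codepoint to the
           * target buffer, unaligned-safely.
           */
          put_unaligned_le16(temp, &target[j]);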
      
      Cc: <stable@kernel.org> # .38.x: 581ade4d: cifs: clean up various nits in unicode routines (try #2)
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • cifs: add fallback in is_path_accessible for old servers · 221d1d79
      Authored by Jeff Layton
      The is_path_accessible check uses a QPathInfo call, which isn't
      supported by ancient Win9x-era servers. Fall back to the older
      SMBQueryInfo call if it fails with the magic error codes.
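
      A hedged sketch of the fallback; the helper names follow the cifs
      code of that era, but treat the details as approximations:

          rc = CIFSSMBQPathInfo(xid, tcon, full_path, pfile_info,
                                0 /* not legacy */, cifs_sb->local_nls,
                                cifs_sb->mnt_cifs_flags &
                                        CIFS_MOUNT_MAP_SPECIAL_CHR);
          if (rc == -EOPNOTSUPP || rc == -EINVAL) {
                  /* ancient server: retry with the pre-NT info level */
                  rc = SMBQueryInformation(xid, tcon, full_path, pfile_info,
                                           cifs_sb->local_nls,
                                           cifs_sb->mnt_cifs_flags &
                                               CIFS_MOUNT_MAP_SPECIAL_CHR);
          }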
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  4. 15 May 2011 (5 commits)
  5. 14 May 2011 (8 commits)
  6. 13 May 2011 (5 commits)
    • btrfs: quasi-round-robin for chunk allocation · 73c5de00
      Authored by Arne Jansen
      In a multi device setup, the chunk allocator currently always allocates
      chunks on the devices in the same order. This leads to a very uneven
      distribution, especially with RAID1 or RAID10 and an uneven number of
      devices.
      This patch always sorts the devices before allocating, and allocates the
      stripes on the devices with the most available space, as long as there
      is enough space available. In a low space situation, it first tries to
      maximize striping.
      The patch also simplifies the allocator and reduces the checks for
      corner cases.
      The simplification is done by several means. First, it defines the
      properties of each RAID type upfront. These properties are used afterwards
      instead of differentiating cases in several places.
      Second, the old allocator defined a minimum stripe size for each
      block group type, tried to find a large enough chunk, and if this
      failed just allocated a smaller one. This is now done in one step.
      The largest possible chunk (up to max_chunk_size) is searched for
      and allocated.
      Because we now have only one pass, the allocation of the map (struct
      map_lookup) is moved down to the point where the number of stripes is
      already known. This way we avoid reallocation of the map.
      We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.
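
      A hedged sketch of the sort step; struct and function names are
      illustrative:

          #include <linux/sort.h>

          struct device_info {
                  struct btrfs_device *dev;
                  u64 max_avail; /* contiguous free bytes on this device */
          };

          /* order candidate devices by available space, largest first */
          static int cmp_avail(const void *a, const void *b)
          {
                  const struct device_info *da = a, *db = b;

                  if (da->max_avail > db->max_avail)
                          return -1;
                  return da->max_avail < db->max_avail;
          }

          /* fill devices[0..ndevs) with candidates, then: */
          sort(devices, ndevs, sizeof(*devices), cmp_avail, NULL);
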
    • btrfs: heed alloc_start · a9c9bf68
      Authored by Arne Jansen
      Currently alloc_start is disregarded if the requested chunk size
      is bigger than (device size - alloc_start) but smaller than the
      device size.
      The only situation where I can see this having made sense is when
      a chunk equal to the size of the device has been requested. This
      was possible because the allocator failed to take alloc_start into
      account when calculating the requested chunk size. As that is
      fixed by this patch, the workaround is no longer necessary.
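
      The missing consideration is one line of arithmetic; a sketch with
      illustrative variable names:

          /* only the space above alloc_start may be handed out */
          if (device->total_bytes > alloc_start)
                  max_avail = device->total_bytes - alloc_start;
          else
                  max_avail = 0;
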
    • btrfs: move btrfs_cmp_device_free_bytes to super.c · bcd53741
      Authored by Arne Jansen
      This function won't be used here anymore, so move it to super.c,
      where it is used for the df calculation.
    • btrfs: use unsigned type for single bit bitfield · 4ea02885
      Authored by David Sterba
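
      For reference, the issue being fixed: with gcc a plain int bitfield
      is signed, so a single bit can only hold 0 and -1. A minimal
      illustration (not the patched btrfs code):

          struct flags {
                  int      s_flag:1; /* holds 0 or -1 */
                  unsigned u_flag:1; /* holds 0 or 1 */
          };

          /* after f.s_flag = 1, the test (f.s_flag == 1) is false,
             while (f.u_flag == 1) behaves as expected */
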
      Signed-off-by: David Sterba <dsterba@suse.cz>
    • btrfs: use printk_ratelimited instead of printk_ratelimit · 7a36ddec
      Authored by David Sterba
      As per the printk_ratelimit() comment, it should not be used.
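
      The conversion is mechanical; a sketch:

          /* before: ratelimit state shared globally by all callers */
          if (printk_ratelimit())
                  printk(KERN_NOTICE "btrfs: some noisy message\n");

          /* after: ratelimit state private to this callsite */
          printk_ratelimited(KERN_NOTICE "btrfs: some noisy message\n");
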
      Signed-off-by: David Sterba <dsterba@suse.cz>
  7. 12 May 2011 (10 commits)
  8. 10 May 2011 (6 commits)
    • fuse: fix oops in revalidate when called with NULL nameidata · d2433905
      Authored by Miklos Szeredi
      Some cases (e.g. ecryptfs) can call ->dentry_revalidate with NULL
      nameidata.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=34732
      
      Tyler Hicks pointed out that this bug was introduced by commit
      e7c0a167 "fuse: make fuse_dentry_revalidate() RCU aware"
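
      A hedged sketch of the guard, using the ->d_revalidate signature of
      that era:

          static int fuse_dentry_revalidate(struct dentry *entry,
                                            struct nameidata *nd)
          {
                  /* nd may be NULL, e.g. when called via ecryptfs */
                  if (nd && (nd->flags & LOOKUP_RCU))
                          return -ECHILD;
                  /* ... normal revalidation continues here ... */
                  return 1;
          }
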
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • nilfs2: fix infinite loop in nilfs_palloc_freev function · 349dbc36
      Authored by Ryusuke Konishi
      After commit 9954e7af ("nilfs2: add free entries count only if
      clear bit operation succeeded") was applied, a free routine of
      nilfs could fall into an infinite loop, emitting the same message
      endlessly:
      
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed ...
      
      That patch broke the routine: the loop counter is never updated in
      the abnormal state.  This fixes the regression.
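
      A hedged sketch of the loop shape after the fix (identifiers are
      approximations): the counter must advance even when the clear-bit
      operation fails.

          for (j = i, n = 0; j < nitems && same_group(group, entry_nrs[j]);
               j++) { /* j advances unconditionally: no infinite loop */
                  if (!nilfs_clear_bit_atomic(lock, group_offset, bitmap)) {
                          printk(KERN_WARNING
                                 "%s: entry number %llu already freed\n",
                                 __func__,
                                 (unsigned long long)entry_nrs[j]);
                          continue;
                  }
                  n++; /* count only entries actually freed */
          }
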
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • xfs: fix race condition in AIL push trigger · 7ac95657
      Authored by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One cause is a race
      condition in determining whether there is a push in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checks
      whether the target has changed and clears the PUSHING bit to allow
      a new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety's sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will queue new work due to the use of
      test_and_set_bit() checks.
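
      A hedged sketch of the corrected exit ordering in the push work;
      names follow the description above, so treat the details as
      approximations:

          /*
           * Clear PUSHING before re-reading the target. A racing push
           * request now either sees the bit clear and queues new work
           * itself, or we see its target update below and requeue.
           */
          clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
          smp_mb__after_clear_bit();
          if (XFS_LSN_CMP(ailp->xa_target, target) != 0 &&
              !test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
                  queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);
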
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit e4d3c4a4)
    • xfs: make AIL target updates and compares 32bit safe. · fe0da767
      Authored by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
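
      A hedged sketch of the two sides; xfs_trans_ail_copy_lsn() and
      XFS_LSN_CMP() are named in the message, the surrounding code is
      paraphrased:

          /* updater: 64-bit-safe store (takes the AIL lock on 32-bit) */
          xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);

          /* push work: read the target under the AIL lock ... */
          spin_lock(&ailp->xa_lock);
          target = ailp->xa_target;
          spin_unlock(&ailp->xa_lock);

          /* ... and compare LSNs only via XFS_LSN_CMP() */
          if (XFS_LSN_CMP(target, ailp->xa_target) == 0)
                  ; /* target unchanged, the work may terminate */
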
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit fd5670f2)
    • xfs: always push the AIL to the target · 50e86686
      Authored by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have an LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with an LSN that matches the target
      exactly are pushed during the loop.
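
      The fix is in the loop guard; a hedged sketch (a fragment):

          lip = xfs_trans_ail_cursor_first(ailp, &cur, start_lsn);
          while (lip != NULL && XFS_LSN_CMP(lip->li_lsn, target) <= 0) {
                  /* was '< 0', which stranded items whose LSN equals
                     the target at the tail of the AIL */

                  /* ... push the item ... */
                  lip = xfs_trans_ail_cursor_next(ailp, &cur);
          }
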
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit cb64026b)
    • xfs: exit AIL push work correctly when AIL is empty · 9e7004e7
      Authored by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit ea35a200)