1. 02 Oct, 2011: 5 commits
    • btrfs: hooks for readahead · 4bb31e92
      Committed by Arne Jansen
      This adds the hooks needed for readahead. In the readpage_end_io_hook,
      the extent state is checked for the EXTENT_READAHEAD flag. Only in that
      case is the readahead hook called, to keep the impact on non-readahead
      reads as low as possible.
      Additionally, a hook for a failed IO is added, otherwise readahead would
      wait indefinitely for the extent to finish.
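A minimal user-space sketch of this gating, assuming a hypothetical RA_FLAG bit and struct ra_buf (not the btrfs definitions): ordinary reads pay for a single flag test, and the hook also runs when the read fails, so whoever waits on the readahead is always notified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the readahead flag; not the real btrfs value. */
#define RA_FLAG (1u << 0)

struct ra_buf {
    unsigned flags;
    int hook_calls;   /* how many times the readahead hook ran */
    int last_err;     /* error seen by the most recent hook call */
};

/* Hypothetical readahead hook: records the result so a waiter can proceed. */
static void readahead_hook(struct ra_buf *b, int err)
{
    b->hook_calls++;
    b->last_err = err;
}

/* Called when a read completes; err is 0 on success, negative on failure. */
static void read_end_io(struct ra_buf *b, int err)
{
    assert(b != NULL);
    /* A single flag test: reads without the flag never enter the hook. */
    if (b->flags & RA_FLAG)
        readahead_hook(b, err);
}
```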
      
      Changes for v2:
       - eliminate race condition
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: initial readahead code and prototypes · 7414a03f
      Committed by Arne Jansen
      This is the implementation for the generic read ahead framework.
      
      To trigger a readahead, btrfs_reada_add must be called. It will start
      a readahead for the given range [start, end) on the tree given by root. The returned
      handle can either be used to wait on the readahead to finish
      (btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).
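The add/wait/detach contract can be pictured as a reference-counted handle, in the spirit of the kref-style release functions listed in the v5 changes. This is a single-threaded sketch with hypothetical names (ra_add, ra_wait, ra_detach, ra_complete_one), not the btrfs API; completions are driven by hand instead of by real I/O.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical readahead handle; not the layout of struct reada_control. */
struct ra_handle {
    int refs;      /* caller's reference plus one held by in-flight work */
    int pending;   /* reads not yet completed */
    int *released; /* set to 1 when the handle is freed (for observation) */
};

static void ra_put(struct ra_handle *h)
{
    if (--h->refs == 0) {
        if (h->released)
            *h->released = 1;
        free(h);
    }
}

static struct ra_handle *ra_add(int nreads, int *released)
{
    struct ra_handle *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->refs = nreads > 0 ? 2 : 1;   /* no work reference if nothing to read */
    h->pending = nreads;
    h->released = released;
    return h;
}

/* Completion of one read; the last one drops the work's reference. */
static void ra_complete_one(struct ra_handle *h)
{
    assert(h->pending > 0);
    if (--h->pending == 0)
        ra_put(h);
}

/* Wait: pending reads are completed in-line here, standing in for
   sleeping until the I/O engine finishes them. */
static void ra_wait(struct ra_handle *h)
{
    while (h->pending > 0)
        ra_complete_one(h);
    ra_put(h);
}

/* Detach: drop the caller's reference; the work frees the handle later. */
static void ra_detach(struct ra_handle *h)
{
    ra_put(h);
}
```

After ra_detach() the caller must not touch the handle again; in this model only the completion path still holds it.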
      
      The read ahead works as follows:
      On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
      reada_start_machine will then search for extents to prefetch and trigger
      some reads. When a read finishes for a node, all contained node/leaf
      pointers that lie in the given range will also be enqueued. The reads will
      be triggered in sequential order, thus giving a big win over a naive
      enumeration. It will also make use of multi-device layouts. Each disk
      will have its own read pointer and all disks will be utilized in parallel.
      Also, no two disks will read both sides of a mirror simultaneously, as this
      would waste seeking capacity. Instead, both disks will read different parts
      of the filesystem.
      Any number of readaheads can be started in parallel. The read order will be
      determined globally, i.e. two parallel readaheads will normally finish faster
      than the same two started one after another.
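The two scheduling properties described above (each device reads in ascending address order, and the two mirrors of a block are never both read) can be illustrated with a deliberately simple policy, assuming exactly two devices that each hold a full mirror. The split-in-half rule and all names are assumptions for illustration, not the btrfs algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A block mirrored on two devices; dev is the device chosen to read it. */
struct ra_req {
    uint64_t logical;
    int dev;
};

static int cmp_logical(const void *a, const void *b)
{
    uint64_t x = ((const struct ra_req *)a)->logical;
    uint64_t y = ((const struct ra_req *)b)->logical;
    return (x > y) - (x < y);
}

/* Sort by address, then give the lower half to device 0 and the upper half
   to device 1: each device sweeps its own region in ascending order, and no
   block is fetched from both mirrors. */
static void ra_schedule(struct ra_req *r, size_t n)
{
    qsort(r, n, sizeof(*r), cmp_logical);
    for (size_t i = 0; i < n; i++)
        r[i].dev = (i < (n + 1) / 2) ? 0 : 1;
}
```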
      
      Changes v2:
       - protect root->node by transaction instead of node_lock
       - fix missed branches:
          The readahead used an overly simple check to determine whether a
          branch from a node should be checked. It now also records the upper
          bound of each node to see if the requested RA range lies within.
       - use KERN_CONT for debug output, to avoid line breaks
       - defer reada_start_machine to worker to avoid deadlock
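The missed-branches fix comes down to an interval test. A sketch, assuming plain 64-bit keys instead of btrfs's (objectid, type, offset) keys and a hypothetical child_in_range() helper: without the recorded upper bound of the node, the range covered by its last child is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Child i of an interior node covers [keys[i], keys[i + 1]); the last
   child covers [keys[n - 1], upper), where upper is the bound recorded
   for the node itself. */
static bool child_in_range(const uint64_t *keys, size_t n, size_t i,
                           uint64_t upper, uint64_t start, uint64_t end)
{
    assert(i < n);
    uint64_t lo = keys[i];
    uint64_t hi = (i + 1 < n) ? keys[i + 1] : upper;
    /* Half-open intervals [lo, hi) and [start, end) overlap iff: */
    return lo < end && start < hi;
}
```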
      
      Changes v3:
       - protect root->node by rcu
      
      Changes v5:
       - changed EIO-semantics of reada_tree_block_flagged
       - remove spin_lock from reada_control and make elems an atomic_t
       - remove unused read_total from reada_control
       - kill reada_key_cmp, use btrfs_comp_cpu_keys instead
       - use kref-style release functions where possible
       - return struct reada_control * instead of void * from btrfs_reada_add
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: state information for readahead · 90519d66
      Committed by Arne Jansen
      Add state information for readahead to btrfs_fs_info and btrfs_device
      
      Changes v2:
       - don't wait in radix_trees
       - add own set of workers for readahead
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: add READAHEAD extent buffer flag · ab0fff03
      Committed by Arne Jansen
      Add a READAHEAD extent buffer flag.
      Add a function to trigger a read with this flag set.
      
      Changes v2:
       - use extent buffer flags instead of extent state flags
      
      Changes v5:
       - adapt to changed read_extent_buffer_pages interface
       - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: add an extra wait mode to read_extent_buffer_pages · bb82ab88
      Committed by Arne Jansen
      read_extent_buffer_pages currently has two modes: either trigger a read
      without waiting for anything, or wait for the I/O to finish. The former
      also bails out when it's unable to lock the page. This patch adds an
      additional parameter to allow it to block on the page lock, but not wait
      for completion.
      
      Changes v5:
       - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
         WAIT_PAGE_LOCK
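A single-threaded model of the three modes, using the mode names from the v5 notes; struct sim_page, read_pages() and the -EAGAIN return are hypothetical. Blocking on a lock is simulated by counting it, and waiting for completion by marking the pages up to date.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum wait_mode { WAIT_NONE, WAIT_COMPLETE, WAIT_PAGE_LOCK };

struct sim_page {
    bool locked_elsewhere; /* someone else holds the page lock */
    bool uptodate;         /* read has completed */
};

/* Returns -EAGAIN when WAIT_NONE meets a locked page, 0 otherwise. */
static int read_pages(struct sim_page *p, int n, enum wait_mode mode,
                      int *blocked)
{
    for (int i = 0; i < n; i++) {
        if (p[i].locked_elsewhere) {
            if (mode == WAIT_NONE)
                return -EAGAIN;        /* trylock failed: give up */
            (*blocked)++;              /* WAIT_PAGE_LOCK / WAIT_COMPLETE block */
            p[i].locked_elsewhere = false;
        }
    }
    if (mode == WAIT_COMPLETE)
        for (int i = 0; i < n; i++)
            p[i].uptodate = true;      /* only this mode waits for the I/O */
    return 0;
}
```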
      
      Changes v6:
       - fix bug introduced in v5
      Signed-off-by: Arne Jansen <sensille@gmx.net>
  2. 01 Oct, 2011: 1 commit
    • Btrfs: force a page fault if we have a shorty copy on a page boundary · b6316429
      Committed by Josef Bacik
      A user reported a problem where ceph was hitting 100% CPU usage while doing
      some writing.  It turns out it's because we were doing a short write on a not
      uptodate page, which means we'd fall back to one page at a time and fault the
      page in.  The problem is our position is on the page boundary, so our fault-in
      logic wasn't actually reading the page, so we'd just spin forever or until the
      page got read in by somebody else.  This will force a readpage if we end up
      doing a short copy.  Alexandre could reproduce this easily with ceph and reports
      it fixes his problem.  I also wrote a reproducer that no longer hangs my box
      with this patch.  Thanks,
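A sketch of the decision that goes wrong, assuming 4 KiB pages and a hypothetical must_read_first_page() helper: a page that is not up to date is read before copying only when the write starts mid-page, so a retry that starts exactly on a page boundary never reads it. The force_uptodate argument models the readpage that is forced after a short copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u /* assumed page size */

/* Should the first page of a write be read in before user data is copied
   into it? Only the start of the write is considered here. */
static bool must_read_first_page(uint64_t pos, bool page_uptodate,
                                 bool force_uptodate)
{
    if (page_uptodate)
        return false;
    /* Mid-page start: the bytes before pos must come from disk. A start on
       a page boundary skips the read unless it is forced. */
    return (pos & (SIM_PAGE_SIZE - 1)) != 0 || force_uptodate;
}
```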
      Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  3. 21 Sep, 2011: 1 commit
  4. 18 Sep, 2011: 6 commits
    • Btrfs: only clear the need lookup flag after the dentry is setup · a66e7cc6
      Committed by Josef Bacik
      We can race with readdir and the RCU path walking stuff.  This is because we
      clear the need lookup flag before actually instantiating the inode.  This will
      lead the RCU path walk stuff to find a dentry it thinks is valid without a
      d_inode attached.  So instead, unhash the dentry when we first start the lookup,
      and then clear the flag after we've instantiated the dentry, so we're guaranteed
      to either try the slow lookup or have d_inode set properly.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • BTRFS: Fix lseek return value for error · 48802c8a
      Committed by Jeff Liu
      The recent reworking of btrfs' lseek led to incorrect
      values being returned.  This adds checks for seeking
      beyond EOF in SEEK_HOLE and makes sure the error
      values come back correctly.
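For reference, the Linux lseek(2) manual page documents that SEEK_DATA and SEEK_HOLE fail with ENXIO when the offset is beyond the end of the file. A sketch of those return values for a file without holes (only the implicit hole at EOF), with hypothetical SIM_SEEK_* constants standing in for the real whence values; treating an offset equal to the file size as beyond EOF is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SEEK_DATA / SEEK_HOLE whence values. */
enum sim_whence { SIM_SEEK_DATA, SIM_SEEK_HOLE };

/* File of `size` bytes with no holes: data covers [0, size) and the only
   hole is the implicit one at EOF. */
static int64_t sim_seek(int64_t size, int64_t off, enum sim_whence whence)
{
    assert(size >= 0);
    if (off < 0)
        return -EINVAL;
    if (off >= size)
        return -ENXIO;              /* at or past EOF: no data, no hole */
    return whence == SIM_SEEK_DATA ? off : size;
}
```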
      
      Andi Kleen also sent in similar patches.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't change inode flag of the dest clone file · dde820fb
      Committed by Li Zefan
      After a file clone, the dst file ends up with the same inode flags as
      the src file, and I think that's unexpected.
      
      For example, if the src file is immutable, the dst file will suddenly
      become immutable after getting a share of its data.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't make a file partly checksummed through file clone · 0e7b824c
      Committed by Li Zefan
      To reproduce the bug:
      
        # mount /dev/sda7 /mnt
        # dd if=/dev/zero of=/mnt/src bs=4K count=1
        # umount /mnt
      
        # mount -o nodatasum /dev/sda7 /mnt
        # dd if=/dev/zero of=/mnt/dst bs=4K count=1
        # clone_range -s 4K -l 4K /mnt/src /mnt/dst
      
        # echo 3 > /proc/sys/vm/drop_caches
        # cat /mnt/dst
        # dmesg
        ...
        btrfs no csum found for inode 258 start 0
        btrfs csum failed ino 258 off 0 csum 2566472073 private 0
      
      This happens because part of the file is checksummed and the other part is
      not, so btrfs complains that no checksum is found when we read the file.
      
      Disallow file clone if the src and dst files have different checksum flags,
      so that a file is always either completely checksummed or completely
      unchecksummed.
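The check itself is a flag comparison. A sketch with a hypothetical SIM_NODATASUM bit and struct sim_inode; returning -EINVAL is an assumption about the error reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-inode "no data checksums" flag. */
#define SIM_NODATASUM (1u << 0)

struct sim_inode {
    unsigned flags;
};

/* Refuse a clone that would mix checksummed and unchecksummed extents
   in one file. */
static int check_clone_csum(const struct sim_inode *src,
                            const struct sim_inode *dst)
{
    assert(src != NULL && dst != NULL);
    if ((src->flags & SIM_NODATASUM) != (dst->flags & SIM_NODATASUM))
        return -EINVAL;
    return 0;
}
```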
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix pages truncation in btrfs_ioctl_clone() · 71ef0786
      Committed by Li Zefan
      This fixes a bug in commit f81c9cdc
      (Btrfs: truncate pages from clone ioctl target range).
      
      We should pass the dest range to the truncate function, not the
      src range.
      
      Also move the truncation before the extent state is locked.
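A sketch of the range computation, assuming 4 KiB pages, an inclusive end offset, and a hypothetical dest_truncate_range() helper: only the destination offset and the clone length go in, and the source offset plays no part.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u /* assumed page size */

/* Inclusive byte range [first, last]. */
struct byte_range {
    uint64_t first;
    uint64_t last;
};

/* Page-cache range of the destination file to drop before cloning `len`
   bytes into it at `destoff`: from destoff up to the end of the page that
   holds the last cloned byte. */
static struct byte_range dest_truncate_range(uint64_t destoff, uint64_t len)
{
    assert(len > 0);
    uint64_t end = destoff + len;
    uint64_t aligned_end = (end + SIM_PAGE_SIZE - 1) & ~(uint64_t)(SIM_PAGE_SIZE - 1);
    struct byte_range r = { destoff, aligned_end - 1 };
    return r;
}
```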
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix d_off in the first dirent · 3765fefa
      Committed by Hidetoshi Seto
      Because the d_off of the first dirent, for "." (which originates from
      the 4th argument, "offset", of the filldir() call for the 2nd dirent, ".."),
      is wrongly assigned in btrfs_real_readdir(), telldir returns the same
      offset for different locations.
      
       | # mkfs.btrfs /dev/sdb1
       | # mount /dev/sdb1 fs0
       | # cd fs0
       | # touch file0 file1
       | # ../test
       | telldir: 0
       | readdir: d_off = 2, d_name = "."
       | telldir: 2
       | readdir: d_off = 2, d_name = ".."
       | telldir: 2
       | readdir: d_off = 3, d_name = "file0"
       | telldir: 3
       | readdir: d_off = 2147483647, d_name = "file1"
       | telldir: 2147483647
      
      To fix this problem, pass filp->f_pos (which is loff_t) instead.
      
       | # ../test
       | telldir: 0
       | readdir: d_off = 1, d_name = "."
       | telldir: 1
       | readdir: d_off = 2, d_name = ".."
       | telldir: 2
       | readdir: d_off = 3, d_name = "file0"
       :
      
      At the moment the "offset" for "." is unused because there is no
      preceding dirent; however, it is better to pass filp->f_pos there as well,
      to follow the conventional usage.
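The mechanism can be modelled with a hypothetical fill_entry() that follows the convention described above: the offset passed when an entry is emitted is stored as d_off of the entry emitted before it. Passing 2 for ".." therefore gives "." a d_off of 2, as in the first listing. All names here are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

struct sim_dirent {
    const char *name;
    int64_t d_off;   /* position telldir() reports after this entry */
};

struct sim_dirbuf {
    struct sim_dirent ent[MAX_ENTRIES];
    size_t n;
};

/* Hypothetical filldir-like callback: `offset` becomes d_off of the
   previously emitted entry. */
static void fill_entry(struct sim_dirbuf *b, const char *name, int64_t offset)
{
    assert(b->n < MAX_ENTRIES);
    if (b->n > 0)
        b->ent[b->n - 1].d_off = offset;
    b->ent[b->n].name = name;
    b->ent[b->n].d_off = -1;   /* set by the next fill_entry() or fill_done() */
    b->n++;
}

/* At the end of the listing, the last entry gets the final position. */
static void fill_done(struct sim_dirbuf *b, int64_t end_pos)
{
    if (b->n > 0)
        b->ent[b->n - 1].d_off = end_pos;
}
```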
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 11 Sep, 2011: 11 commits
  6. 21 Aug, 2011: 1 commit
  7. 18 Aug, 2011: 3 commits
  8. 17 Aug, 2011: 10 commits
  9. 06 Aug, 2011: 1 commit
  10. 02 Aug, 2011: 1 commit