1. 15 Mar, 2010 3 commits
    • M
      btrfs: fix btrfs_mkdir goto for no free objectids · 0be2e981
      Committed by Miao Xie
      btrfs_mkdir() must jump to the code that ends the transaction when
      btrfs_find_free_objectid() fails.  Otherwise the transaction can never end.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
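
The control-flow rule this fix restores can be sketched in a few lines of C. This is only an illustration, not the btrfs code: the `toy_*` functions and the `open_transactions` counter are hypothetical stand-ins, and only the shape (every exit after the transaction starts passes through one label that ends it) reflects the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for a transaction API; these are not btrfs functions. */
static int open_transactions;

static void toy_start_transaction(void)
{
	open_transactions++;
}

static void toy_end_transaction(void)
{
	assert(open_transactions > 0);
	open_transactions--;
}

/* Pretend allocator that fails when no object ids are left. */
static int toy_find_free_objectid(int ids_left, unsigned long *objectid)
{
	if (ids_left <= 0)
		return -ENOSPC;
	*objectid = 256;
	return 0;
}

/* Every exit after toy_start_transaction() goes through out_end_trans. */
static int toy_mkdir(int ids_left)
{
	unsigned long objectid = 0;
	int err;

	toy_start_transaction();

	err = toy_find_free_objectid(ids_left, &objectid);
	if (err)
		goto out_end_trans;	/* failure path must still end the transaction */

	/* ... a real implementation would create the directory inode here ... */
	(void)objectid;

out_end_trans:
	toy_end_transaction();
	return err;
}
```

Calling `toy_mkdir(0)` returns `-ENOSPC` and leaves `open_transactions` at zero, which is the property the fix is about.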
    • C
      Btrfs: add new defrag-range ioctl. · 1e701a32
      Committed by Chris Mason
      The btrfs defrag ioctl was limited to doing the entire file.  This
      commit adds a new interface that can defrag a specific range inside
      the file.
      
      It can also force compression on the file, allowing you to selectively
      compress individual files after they were created, even when mount -o
      compress isn't turned on.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
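
A minimal sketch of how user space might call such a range defrag, assuming the interface exposed by present-day `<linux/btrfs.h>` (`BTRFS_IOC_DEFRAG_RANGE`, `struct btrfs_ioctl_defrag_range_args`, and the `BTRFS_DEFRAG_RANGE_*` flags). The commit message itself does not spell out these names, and the helper `defrag_range()` is hypothetical.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/*
 * Ask btrfs to defragment the byte range [start, start + len) of the
 * open file fd.  If compress is non-zero, also request compression of
 * that range.  Returns 0 on success, -1 with errno set on failure.
 */
static int defrag_range(int fd, unsigned long long start,
			unsigned long long len, int compress)
{
	struct btrfs_ioctl_defrag_range_args args;

	memset(&args, 0, sizeof(args));
	args.start = start;
	args.len = len;
	/* Start writeback of the defragmented range right away. */
	args.flags = BTRFS_DEFRAG_RANGE_START_IO;
	if (compress)
		args.flags |= BTRFS_DEFRAG_RANGE_COMPRESS;

	return ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &args);
}
```

On a file descriptor that does not refer to a btrfs file the call simply fails with `errno` set, so callers should check the return value.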
    • J
      Btrfs: change how we mount subvolumes · 73f73415
      Committed by Josef Bacik
      This work is in preparation for being able to set a different root as the
      default mounting root.
      
      There is currently a problem with how we mount subvolumes.  We cannot currently
      mount a subvolume of a subvolume; we can only mount subvolumes/snapshots of the
      default subvolume.  So say you take a snapshot of the default subvolume and call
      it snap1, and then take a snapshot of snap1 and call it snap2, so now you have
      
      /
      /snap1
      /snap1/snap2
      
      as your available volumes.  Currently you can only mount / and /snap1,
      you cannot mount /snap1/snap2.  To fix this problem instead of passing
      subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
      the tree id that gets spit out via the subvolume listing you get from
      the subvolume listing patches (btrfs filesystem list).  This allows us
      to mount /, /snap1 and /snap1/snap2 as the root volume.
      
      In addition to the above, we also now read the default dir item in the
      tree root to get the root key that it points to.  For now this just
      points at what has always been the default subvolume, but later on I plan
      to change it to point at whatever root you want to be the new default
      root, so you can just set the default mount and not have to mount with
      -o subvolid=<treeid>.  I tested this out with the above scenario and it
      worked perfectly.  Thanks,
      
      mount -o subvol operates inside the selected subvolid.  For example:
      
      mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt
      
      /mnt will have the snap1 directory for the subvolume with id
      256.
      
      mount -o subvol=snap /dev/xxx /mnt
      
      /mnt will be the snap directory of whatever the default subvolume
      is.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  2. 05 Feb, 2010 2 commits
  3. 29 Jan, 2010 3 commits
    • J
      Btrfs: run orphan cleanup on default fs root · e3acc2a6
      Committed by Josef Bacik
      This patch reverts commit
      
      6c090a11
      
      because it introduces a problem where we can run orphan cleanup on a
      volume that can have orphan entries re-added.  Instead of my original
      fix, Yan Zheng pointed out that we can just revert my original fix and
      then run the orphan cleanup in open_ctree after we look up the fs_root.
      I have tested this with all the tests that gave me problems and this
      patch fixes both problems.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • A
      Btrfs: Use correct values when updating inode i_size on fallocate · d1ea6a61
      Committed by Aneesh Kumar K.V
      commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825
      Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Date:   Wed Jan 20 12:57:53 2010 +0530
      
          Btrfs: Use correct values when updating inode i_size on fallocate
      
          Even though we allocate more, we should be updating inode i_size
          as per the arguments passed
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
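
The distinction the message draws can be illustrated with a small sketch. The 4096-byte block size and the `toy_*` helpers are assumptions made for the example, not taken from the patch: the allocated region may be rounded up to a block boundary, but the file size should only grow to the end the caller asked for.

```c
#include <stdint.h>

#define TOY_BLOCK_SIZE 4096ULL	/* assumed block size for the illustration */

/* End of the allocated region: the requested end rounded up to a block. */
static uint64_t toy_alloc_end(uint64_t offset, uint64_t len)
{
	return (offset + len + TOY_BLOCK_SIZE - 1) & ~(TOY_BLOCK_SIZE - 1);
}

/* New i_size: grows to the requested end only, never to the allocated end. */
static uint64_t toy_new_isize(uint64_t old_isize, uint64_t offset, uint64_t len)
{
	uint64_t requested_end = offset + len;

	return requested_end > old_isize ? requested_end : old_isize;
}
```

For a request of 5000 bytes at offset 0 on an empty file, the allocation ends at 8192 while the size becomes 5000.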
    • C
      Btrfs: Add mount -o compress-force · a555f810
      Committed by Chris Mason
      The default btrfs mount -o compress mode will quickly back off
      compressing a file if it notices that compression does not reduce the
      size of the data being written.  This can save considerable CPU because
      all future writes to the file go through uncompressed.
      
      But some files are both very large and have mixed data stored in
      them.  In that case, we want to add the ability to always try
      compressing data before writing it.
      
      This commit adds mount -o compress-force.  A later commit will add
      a new inode flag that does the same thing.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
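
The difference between the two modes can be modelled roughly as follows. This is a toy model under stated assumptions: `struct toy_file` and its `nocompress` field are hypothetical, and the real heuristic may use different criteria than a plain size comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; the real inode flags are not shown in the log. */
struct toy_file {
	bool nocompress;	/* set once compression stopped paying off */
};

/*
 * Decide whether to attempt compression for the next write.
 * With force set, the back-off flag is ignored.
 */
static bool toy_should_compress(const struct toy_file *f, bool force)
{
	return force || !f->nocompress;
}

/* Record the outcome of one compression attempt. */
static void toy_note_result(struct toy_file *f, size_t raw, size_t compressed,
			    bool force)
{
	if (!force && compressed >= raw)
		f->nocompress = true;	/* back off for all future writes */
}
```

After one attempt that fails to shrink the data, the default mode stops trying for that file, while the forced mode keeps attempting compression on every write.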
  4. 18 Jan, 2010 2 commits
    • J
      Btrfs: fix regression in orphan cleanup · 6c090a11
      Committed by Josef Bacik
      Currently orphan cleanup only ever gets triggered if we cross subvolumes during
      a lookup, which means that if we just mount a plain jane fs that has orphans in
      it, they will never get cleaned up.  This results in panics like this one:
      
      http://www.kerneloops.org/oops.php?number=1109085
      
      where adding an orphan entry results in -EEXIST being returned and we panic.  In
      order to fix this, we check to see on lookup if our root has had the orphan
      cleanup done, and if not go ahead and do it.  This is easily reproducible by
      running this testcase
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>
      #include <stdio.h>
      
      int main(int argc, char **argv)
      {
      	char data[4096];
      	char newdata[4096];
      	int fd1, fd2;
      
      	memset(data, 'a', 4096);
      	memset(newdata, 'b', 4096);
      
      	while (1) {
      		int i;
      
      		fd1 = creat("file1", 0666);
      		if (fd1 < 0)
      			break;
      
      		for (i = 0; i < 512; i++)
      			write(fd1, data, 4096);
      
      		fsync(fd1);
      		close(fd1);
      
      		fd2 = creat("file2", 0666);
      		if (fd2 < 0)
      			break;
      
      		ftruncate(fd2, 4096 * 512);
      
      		for (i = 0; i < 512; i++)
      			write(fd2, newdata, 4096);
      		close(fd2);
      
      		i = rename("file2", "file1");
      		unlink("file1");
      	}
      
      	return 0;
      }
      
      and then pulling the power on the box, and then trying to run that test again
      when the box comes back up.  I've tested this locally and it fixes the problem.
      Thanks to Tomas Carnecky for helping me track this down initially.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      btrfs: fix missing last-entry in readdir(3) · 406266ab
      Committed by Jan Engelhardt
      parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
      commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
      Author: Jan Engelhardt <jengelh@medozas.de>
      Date:   Wed Dec 9 22:57:36 2009 +0100
      
      Btrfs: fix missing last-entry in readdir(3)
      
      When one does a 32-bit readdir(3), the last entry of a directory is
      missing. This is however not due to passing a large value to filldir,
      but it seems to have to do with glibc doing telldir or something
      quirky. In any case, this patch fixes it in practice.
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 18 Dec, 2009 8 commits
  6. 16 Dec, 2009 1 commit
  7. 12 Nov, 2009 4 commits
  8. 14 Oct, 2009 2 commits
    • J
      Btrfs: fix possible ENOSPC problems with truncate · 5d5e103a
      Committed by Josef Bacik
      There's a problem where we don't do any space reservation for truncates.
      This can cause an oops, because the delalloc bytes that are created as a
      result of the truncate are not accounted for, so we can end up using more
      space than we are allowed to.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • C
      Btrfs: avoid tree log commit when there are no changes · 257c62e1
      Committed by Chris Mason
      rpm has a habit of running fdatasync when the file hasn't
      changed.  We already detect if a file hasn't been changed
      in the current transaction but it might have been sent to
      the tree-log in this transaction and not changed since
      the last call to fsync.
      
      In this case, we want to avoid a tree log sync, which includes
      a number of synchronous writes and barriers.  This commit
      extends the existing tracking of the last transaction to change
      a file to also track the last sub-transaction.
      
      The end result is that rpm -ivh and -Uvh are roughly twice as fast,
      and on par with ext3.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
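
The extra tracking described above can be pictured with a toy model. All names here (`toy_inode`, `toy_log`, `last_log_commit`, and so on) are hypothetical and chosen for the illustration; the sketch only shows why remembering the last sub-transaction lets a repeated fsync skip the log sync.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_inode {
	uint64_t last_trans;		/* last transaction that changed the file */
	uint64_t last_sub_trans;	/* last sub-transaction that changed it */
};

struct toy_log {
	uint64_t cur_trans;		/* running transaction id */
	uint64_t cur_sub_trans;		/* current sub-transaction of the log */
	uint64_t last_log_commit;	/* newest sub-transaction already synced */
};

/* Record that the file changed in the current (sub-)transaction. */
static void toy_modify(struct toy_inode *ino, const struct toy_log *log)
{
	ino->last_trans = log->cur_trans;
	ino->last_sub_trans = log->cur_sub_trans;
}

/* Returns true if a log sync was needed and performed. */
static bool toy_fsync(const struct toy_inode *ino, struct toy_log *log)
{
	if (ino->last_trans != log->cur_trans)
		return false;	/* unchanged in this transaction */
	if (ino->last_sub_trans <= log->last_log_commit)
		return false;	/* already logged and unchanged since */

	/* The expensive synchronous writes and barriers would happen here. */
	log->last_log_commit = log->cur_sub_trans;
	log->cur_sub_trans++;
	return true;
}
```

A second `toy_fsync()` with no intervening `toy_modify()` returns false, which corresponds to the fdatasync-on-unchanged-file pattern attributed to rpm.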
  9. 09 Oct, 2009 6 commits
  10. 02 Oct, 2009 2 commits
  11. 29 Sep, 2009 1 commit
    • J
      Btrfs: proper -ENOSPC handling · 9ed74f2d
      Committed by Josef Bacik
      At the start of a transaction we do a btrfs_reserve_metadata_space() and
      specify how many items we plan on modifying.  Then once we've done our
      modifications and such, just call btrfs_unreserve_metadata_space() for
      the same number of items we reserved.
      
      For keeping track of metadata needed for data I've had to add an extent_io op
      for when we merge extents.  This lets us track space properly when we are doing
      sequential writes, so we don't end up reserving way more metadata space than
      what we need.
      
      The only place where the metadata space accounting is not done is in the
      relocation code.  This is because Yan is going to be reworking that code in the
      near future, so running btrfs-vol -b could still possibly result in an ENOSPC
      related panic.  This patch also turns off the metadata_ratio stuff in order to
      allow users to more efficiently use their disk space.
      
      This patch makes it so we track how much metadata we need for an inode's
      delayed allocation extents by tracking how many extents are currently
      waiting for allocation.  It introduces two new callbacks for the
      extent_io trees, merge_extent_hook and split_extent_hook.  These help
      us keep track of when we merge delalloc extents together and split them
      up.  Reservations are made before any actual dirtying occurs,
      and then we unreserve after we dirty.
      
      btrfs_unreserve_metadata_for_delalloc() will make the appropriate
      unreservations as needed based on the number of reservations we
      currently have and the number of extents we currently have.  Doing the
      reservation outside of doing any of the actual dirtying lets us do
      things like filemap_flush() the inode to try and force delalloc to
      happen, or as a last resort actually start allocation on all delalloc
      inodes in the fs.  This has survived dbench, fs_mark and an fsx torture
      test.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
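
The reserve-then-unreserve pattern from the first paragraph of this message can be sketched as a small accounting model. The `toy_*` names and the fixed per-item cost are assumptions for illustration; they are not the signatures of btrfs_reserve_metadata_space() or btrfs_unreserve_metadata_space().

```c
#include <errno.h>
#include <stdint.h>

#define TOY_BYTES_PER_ITEM 4096ULL	/* assumed worst-case cost of one item */

struct toy_space {
	uint64_t total;		/* metadata bytes available */
	uint64_t reserved;	/* bytes promised to running operations */
};

/* Reserve room for num_items modifications before doing any of them. */
static int toy_reserve_metadata(struct toy_space *s, unsigned int num_items)
{
	uint64_t need = (uint64_t)num_items * TOY_BYTES_PER_ITEM;

	if (need > s->total - s->reserved)
		return -ENOSPC;	/* fail early instead of running out mid-operation */
	s->reserved += need;
	return 0;
}

/* Release a reservation for the same number of items once the work is done. */
static void toy_unreserve_metadata(struct toy_space *s, unsigned int num_items)
{
	s->reserved -= (uint64_t)num_items * TOY_BYTES_PER_ITEM;
}
```

The point of the design is that -ENOSPC is reported at reservation time, before anything has been dirtied, rather than surfacing later as a panic.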
  12. 24 Sep, 2009 2 commits
    • Y
      Btrfs: don't rename file into dummy directory · f679a840
      Committed by Yan, Zheng
      A recent change enforces only one access point to each subvolume. The first
      directory entry (the one added when the subvolume/snapshot was created) is
      treated as the valid access point; all other subvolume links are linked to dummy
      empty directories. The dummy directories are temporary inodes that exist only in
      memory, so we cannot rename files into them.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Y
      Btrfs: check size of inode backref before adding hardlink · a5719521
      Committed by Yan, Zheng
      For every hardlink in btrfs, there is a corresponding inode back
      reference. All inode back references for hardlinks in a given
      directory are stored in a single b-tree item. The size of a b-tree item
      is limited by the size of a b-tree leaf, so we can only create a limited
      number of hardlinks to a given file in a directory.
      
      The original code lacks this check and oopses if the number of
      hardlinks goes over the limit. This patch fixes the issue by adding
      the check to btrfs_link and btrfs_rename.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
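
A rough sketch of the kind of check described above, using made-up sizes. `TOY_MAX_ITEM_SIZE`, `TOY_REF_HEADER`, `struct toy_backref_item` and the choice of `-EMLINK` as the error are all assumptions for the example; the real limit depends on the leaf size of the filesystem.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ITEM_SIZE 64	/* assumed, tiny on purpose */
#define TOY_REF_HEADER    10	/* assumed fixed overhead per back reference */

struct toy_backref_item {
	size_t used;	/* bytes taken by the references stored so far */
};

/* Add one back reference for name, refusing if the item would overflow. */
static int toy_add_link(struct toy_backref_item *item, const char *name)
{
	size_t need = TOY_REF_HEADER + strlen(name);

	if (item->used + need > TOY_MAX_ITEM_SIZE)
		return -EMLINK;	/* report an error instead of overflowing */
	item->used += need;
	return 0;
}
```

With these numbers, three links with ten-character names fit and the fourth is rejected, leaving the item unchanged.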
  13. 22 Sep, 2009 4 commits