1. 17 1月, 2011 2 次提交
    • M
      btrfs: fix wrong free space information of btrfs · 6d07bcec
      Miao Xie 提交于
      When we store data by raid profile in btrfs with two or more different size
      disks, df command shows there is some free space in the filesystem, but the
      user can not write any data in fact, df command shows the wrong free space
      information of btrfs.
      
       # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
       # btrfs-show
       Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
       	Total devices 2 FS bytes used 28.00KB
       	devid    1 size 5.01GB used 2.03GB path /dev/sda9
       	devid    2 size 10.00GB used 2.01GB path /dev/sda10
       # btrfs device scan /dev/sda9 /dev/sda10
       # mount /dev/sda9 /mnt
       # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
         (fill the filesystem)
       # sync
       # df -TH
       Filesystem	Type	Size	Used	Avail	Use%	Mounted on
       /dev/sda9	btrfs	17G	8.6G	5.4G	62%	/mnt
       # btrfs-show
       Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
       	Total devices 2 FS bytes used 3.99GB
       	devid    1 size 5.01GB used 5.01GB path /dev/sda9
       	devid    2 size 10.00GB used 4.99GB path /dev/sda10
      
      It is because btrfs cannot allocate chunks when one of the pairing disks has
      no space, the free space on the other disks can not be used for ever, and should
      be subtracted from the total space, but btrfs doesn't subtract this space from
      the total. It is strange to the user.
      
      This patch fixes it by calcing the free space that can be used to allocate
      chunks.
      
      Implementation:
      1. get all the devices free space, and align them by stripe length.
      2. sort the devices by the free space.
      3. check the free space of the devices,
         3.1. if it is not zero, and then check the number of the devices that has
              more free space than this device,
              if the number of the devices is beyond the min stripe number, the free
              space can be used, and add into total free space.
              if the number of the devices is below the min stripe number, we can not
              use the free space, the check ends.
         3.2. if the free space is zero, check the next devices, goto 3.1
      
      This implementation is just likely fake chunk allocation.
      
      After appling this patch, df can show correct space information:
       # df -TH
       Filesystem	Type	Size	Used	Avail	Use%	Mounted on
       /dev/sda9	btrfs	17G	8.6G	0	100%	/mnt
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6d07bcec
    • S
      fs/btrfs: Fix build of ctree · f580eb09
      Stefan Schmidt 提交于
      CC [M]  fs/btrfs/ctree.o
      In file included from fs/btrfs/ctree.c:21:0:
      fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type
      fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type
      make[2]: *** [fs/btrfs/ctree.o] Error 1
      make[1]: *** [fs/btrfs] Error 2
      make: *** [fs] Error 2
      
      We need to include kobject.h here.
      Reported-by: NJeff Garzik <jeff@garzik.org>
      Fix-suggested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      f580eb09
  2. 23 12月, 2010 1 次提交
    • L
      Btrfs: Add readonly snapshots support · b83cc969
      Li Zefan 提交于
      Usage:
      
      Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call
      ioctl(BTRFS_I0CTL_SNAP_CREATE_V2).
      
      Implementation:
      
      - Set readonly bit of btrfs_root_item->flags.
      - Add readonly checks in btrfs_permission (inode_permission),
      btrfs_setattr, btrfs_set/remove_xattr and some ioctls.
      
      Changelog for v3:
      
      - Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags.
      - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      b83cc969
  3. 22 12月, 2010 2 次提交
    • L
      btrfs: Add lzo compression support · a6fa6fae
      Li Zefan 提交于
      Lzo is a much faster compression algorithm than gzib, so would allow
      more users to enable transparent compression, and some users can
      choose from compression ratio and speed for different applications
      
      Usage:
      
       # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt
      or
       # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt
      
      "-o compress" without argument is still allowed for compatability.
      
      Compatibility:
      
      If we mount a filesystem with lzo compression, it will not be able be
      mounted in old kernels. One reason is, otherwise btrfs will directly
      dump compressed data, which sits in inline extent, to user.
      
      Performance:
      
      The test copied a linux source tarball (~400M) from an ext4 partition
      to the btrfs partition, and then extracted it.
      
      (time in second)
                 lzo        zlib        nocompress
      copy:      10.6       21.7        14.9
      extract:   70.1       94.4        66.6
      
      (data size in MB)
                 lzo        zlib        nocompress
      copy:      185.87     108.69      394.49
      extract:   193.80     132.36      381.21
      
      Changelog:
      
      v1 -> v2:
      - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
      - Add incompability flag.
      - Fix error handling in compress code.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      a6fa6fae
    • L
      btrfs: Allow to add new compression algorithm · 261507a0
      Li Zefan 提交于
      Make the code aware of compression type, instead of always assuming
      zlib compression.
      
      Also make the zlib workspace function as common code for all
      compression types.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      261507a0
  4. 22 11月, 2010 1 次提交
  5. 30 10月, 2010 2 次提交
    • S
      Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed · 4260f7c7
      Sage Weil 提交于
      Add a mount option user_subvol_rm_allowed that allows users to delete a
      (potentially non-empty!) subvol when they would otherwise we allowed to do
      an rmdir(2).  We duplicate the may_delete() checks from the core VFS code
      to implement identical security checks (minus the directory size check).
      We additionally require that the user has write+exec permission on the
      subvol root inode.
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      4260f7c7
    • S
      Btrfs: async transaction commit · bb9c12c9
      Sage Weil 提交于
      Add support for an async transaction commit that is ordered such that any
      subsequent operations will join the following transaction, but does not
      wait until the current commit is fully on disk.  This avoids much of the
      latency associated with the btrfs_commit_transaction for callers concerned
      with serialization and not safety.
      
      The wait_for_unblock flag controls whether we wait for the 'middle' portion
      of commit_transaction to complete, which is necessary if the caller expects
      some of the modifications contained in the commit to be available (this is
      the case for subvol/snapshot creation).
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      bb9c12c9
  6. 29 10月, 2010 4 次提交
    • J
      Btrfs: Add a clear_cache mount option · 88c2ba3b
      Josef Bacik 提交于
      If something goes wrong with the free space cache we need a way to make sure
      it's not loaded on mount and that it's cleared for everybody.  When you pass the
      clear_cache option it will make it so all block groups are setup to be cleared,
      which keeps them from being loaded and then they will be truncated when the
      transaction is committed.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      88c2ba3b
    • J
      Btrfs: add support for mixed data+metadata block groups · 67377734
      Josef Bacik 提交于
      There are just a few things that need to be fixed in the kernel to support mixed
      data+metadata block groups.  Mostly we just need to make sure that if we are
      using mixed block groups that we continue to allocate mixed block groups as we
      need them.  Also we need to make sure __find_space_info will find our space info
      if we search for DATA or METADATA only.  Tested this with xfstests and it works
      nicely.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      67377734
    • J
      Btrfs: write out free space cache · 0cb59c99
      Josef Bacik 提交于
      This is a simple bit, just dump the free space cache out to our preallocated
      inode when we're writing out dirty block groups.  There are a bunch of changes
      in inode.c in order to account for special cases.  Mostly when we're doing the
      writeout we're holding trans_mutex, so we need to use the nolock transacation
      functions.  Also we can't do asynchronous completions since the async thread
      could be blocked on already completed IO waiting for the transaction lock.  This
      has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC
      tests.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0cb59c99
    • J
      Btrfs: create special free space cache inode · 0af3d00b
      Josef Bacik 提交于
      In order to save free space cache, we need an inode to hold the data, and we
      need a special item to point at the right inode for the right block group.  So
      first, create a special item that will point to the right inode, and the number
      of extent entries we will have and the number of bitmaps we will have.  We
      truncate and pre-allocate space everytime to make sure it's uptodate.
      
      This feature will be turned on as soon as you mount with -o space_cache, however
      it is safe to boot into old kernels, they will just generate the cache the old
      fashion way.  When you boot back into a newer kernel we will notice that we
      modified and not the cache and automatically discard the cache.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0af3d00b
  7. 23 10月, 2010 3 次提交
    • J
      Btrfs: rework how we reserve metadata bytes · 8bb8ab2e
      Josef Bacik 提交于
      With multi-threaded writes we were getting ENOSPC early because somebody would
      come in, start flushing delalloc because they couldn't make their reservation,
      and in the meantime other threads would come in and use the space that was
      getting freed up, so when the original thread went to check to see if they had
      space they didn't and they'd return ENOSPC.  So instead if we have some free
      space but not enough for our reservation, take the reservation and then start
      doing the flushing.  The only time we don't take reservations is when we've
      already overcommitted our space, that way we don't have people who come late to
      the party way overcommitting ourselves.  This also moves all of the retrying and
      flushing code into reserve_metdata_bytes so it's all uniform.  This keeps my
      fs_mark test from returning -ENOSPC as soon as it starts and actually lets me
      fill up the disk.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      8bb8ab2e
    • J
      Btrfs: re-work delalloc flushing · 0019f10d
      Josef Bacik 提交于
      Currently we try and flush delalloc, but we only do that in a sort of weak way,
      which works fine in most cases but if we're under heavy pressure we need to be
      able to wait for flushing to happen.  Also instead of checking the bytes
      reserved in the block_rsv, check the space info since it is more accurate.  The
      sync option will be used in a future patch.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0019f10d
    • J
      Btrfs: fix df regression · 89a55897
      Josef Bacik 提交于
      The new ENOSPC stuff breaks out the raid types which breaks the way we were
      reporting df to the system.  This fixes it back so that Available is the total
      space available to data and used is the actual bytes used by the filesystem.
      This means that Available is Total - data used - all of the metadata space.
      Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      89a55897
  8. 10 8月, 2010 2 次提交
  9. 28 5月, 2010 1 次提交
  10. 25 5月, 2010 11 次提交
  11. 31 3月, 2010 1 次提交
  12. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  13. 15 3月, 2010 6 次提交
    • A
      btrfs: use memparse · 91748467
      Akinobu Mita 提交于
      Use memparse() instead of its own private implementation.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      91748467
    • J
      Btrfs: cache the extent state everywhere we possibly can V2 · 2ac55d41
      Josef Bacik 提交于
      This patch just goes through and fixes everybody that does
      
      lock_extent()
      blah
      unlock_extent()
      
      to use
      
      lock_extent_bits()
      blah
      unlock_extent_cached()
      
      and pass around a extent_state so we only have to do the searches once per
      function.  This gives me about a 3 mb/s boots on my random write test.  I have
      not converted some things, like the relocation and ioctl's, since they aren't
      heavily used and the relocation stuff is in the middle of being re-written.  I
      also changed the clear_extent_bit() to only unset the cached state if we are
      clearing EXTENT_LOCKED and related stuff, so we can do things like this
      
      lock_extent_bits()
      clear delalloc bits
      unlock_extent_cached()
      
      without losing our cached state.  I tested this thoroughly and turned on
      LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out
      fine.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2ac55d41
    • C
      Btrfs: add new defrag-range ioctl. · 1e701a32
      Chris Mason 提交于
      The btrfs defrag ioctl was limited to doing the entire file.  This
      commit adds a new interface that can defrag a specific range inside
      the file.
      
      It can also force compression on the file, allowing you to selectively
      compress individual files after they were created, even when mount -o
      compress isn't turned on.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1e701a32
    • J
      Btrfs: add ioctl and incompat flag to set the default mount subvol · 6ef5ed0d
      Josef Bacik 提交于
      This patch needs to go along with my previous patch.  This lets us set the
      default dir item's location to whatever root we want to use as our default
      mounting subvol.  With this we don't have to use mount -o subvol=<tree id>
      anymore to mount a different subvol, we can just set the new one and it will
      just magically work.  I've done some moderate testing with this, mostly just
      switching the default mount around, mounting subvols and the default mount at
      the same time and such, everything seems to work.  Thanks,
      
      Older kernels would generally be able to still mount the filesystem with the
      default subvolume set, but it would result in a different volume being mounted,
      which could be an even more unpleasant suprise for users.  So if you set your
      default subvolume, you can't go back to older kernels.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      6ef5ed0d
    • J
      Btrfs: change how we mount subvolumes · 73f73415
      Josef Bacik 提交于
      This work is in preperation for being able to set a different root as the
      default mounting root.
      
      There is currently a problem with how we mount subvolumes.  We cannot currently
      mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the
      default subvolume.  So say you take a snapshot of the default subvolume and call
      it snap1, and then take a snapshot of snap1 and call it snap2, so now you have
      
      /
      /snap1
      /snap1/snap2
      
      as your available volumes.  Currently you can only mount / and /snap1,
      you cannot mount /snap1/snap2.  To fix this problem instead of passing
      subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
      the tree id that gets spit out via the subvolume listing you get from
      the subvolume listing patches (btrfs filesystem list).  This allows us
      to mount /, /snap1 and /snap1/snap2 as the root volume.
      
      In addition to the above, we also now read the default dir item in the
      tree root to get the root key that it points to.  For now this just
      points at what has always been the default subvolme, but later on I plan
      to change it to point at whatever root you want to be the new default
      root, so you can just set the default mount and not have to mount with
      -o subvolid=<treeid>.  I tested this out with the above scenario and it
      worked perfectly.  Thanks,
      
      mount -o subvol operates inside the selected subvolid.  For example:
      
      mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt
      
      /mnt will have the snap1 directory for the subvolume with id
      256.
      
      mount -o subvol=snap /dev/xxx /mnt
      
      /mnt will be the snap directory of whatever the default subvolume
      is.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      73f73415
    • J
      Btrfs: make set/get functions for the super compat_ro flags use compat_ro · 12534832
      Josef Bacik 提交于
      Our set/get functions for compat_ro_flags actually look at compat_flags.  This
      will mess any attempt to use compat flags up.  The fix is obvious.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      12534832
  14. 06 3月, 2010 1 次提交
  15. 29 1月, 2010 1 次提交
    • C
      Btrfs: Add mount -o compress-force · a555f810
      Chris Mason 提交于
      The default btrfs mount -o compress mode will quickly back off
      compressing a file if it notices that compression does not reduce the
      size of the data being written.  This can save considerable CPU because
      all future writes to the file go through uncompressed.
      
      But some files are both very large and have mixed data stored in
      them.  In that case, we want to add the ability to always try
      compressing data before writing it.
      
      This commit adds mount -o compress-force.  A later commit will add
      a new inode flag that does the same thing.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      a555f810
  16. 18 12月, 2009 1 次提交