1. 06 Jan, 2009 2 commits
  2. 19 Dec, 2008 1 commit
  3. 12 Dec, 2008 2 commits
  4. 09 Dec, 2008 1 commit
    • C
      Btrfs: move data checksumming into a dedicated tree · d20f7043
      Chris Mason committed
      Btrfs stores checksums for each data block.  Until now, they have
      been stored in the subvolume trees, indexed by the inode that is
      referencing the data block.  This means that when we read the inode,
      we've probably read in at least some checksums as well.
      
      But, this has a few problems:
      
      * The checksums are indexed by logical offset in the file.  When
      compression is on, this means we have to do the expensive checksumming
      on the uncompressed data.  It would be faster if we could checksum
      the compressed data instead.
      
      * If we implement encryption, we'll be checksumming the plain text and
      storing that on disk.  This is significantly less secure.
      
      * For either compression or encryption, we have to get the plain text
      back before we can verify the checksum as correct.  This makes the raid
      layer balancing and extent moving much more expensive.
      
      * It makes the front end caching code more complex, as we have to touch
      the subvolume and inodes as we cache extents.
      
      * There is potentially one copy of the checksum in each subvolume
      referencing an extent.
      
      The solution used here is to store the extent checksums in a dedicated
      tree.  This allows us to index the checksums by physical extent
      start and length.  It means:
      
      * The checksum is against the data stored on disk, after any compression
      or encryption is done.
      
      * The checksum is stored in a central location, and can be verified without
      following back references, or reading inodes.
      
      This makes compression significantly faster by reducing the amount of
      data that needs to be checksummed.  It will also allow much faster
      raid management code in general.
      
      The checksums are indexed by a key with a fixed objectid (a magic value
      in ctree.h) and offset set to the starting byte of the extent.  This
      allows us to copy the checksum items into the fsync log tree directly (or
      any other tree), without having to invent a second format for them.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      d20f7043
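The key layout described in this commit message can be sketched as follows. This is a minimal illustration only: `sketch_key`, `CSUM_OBJECTID_SKETCH`, and `CSUM_KEY_TYPE_SKETCH` are hypothetical names and values, since the commit says only that the magic objectid lives in ctree.h. The comparison follows the usual (objectid, type, offset) key order, so checksum items sort by the starting byte of their extent.

```c
#include <stdint.h>

/* Hypothetical stand-ins: the real magic objectid and key type are in ctree.h. */
#define CSUM_OBJECTID_SKETCH UINT64_MAX
#define CSUM_KEY_TYPE_SKETCH 128

struct sketch_key {
    uint64_t objectid; /* fixed magic value shared by every checksum item */
    uint8_t  type;     /* item type */
    uint64_t offset;   /* starting byte of the extent on disk */
};

/* Build the key for the checksum item covering the extent at disk_bytenr. */
static struct sketch_key csum_key_for_extent(uint64_t disk_bytenr)
{
    struct sketch_key key = { CSUM_OBJECTID_SKETCH, CSUM_KEY_TYPE_SKETCH, disk_bytenr };
    return key;
}

/* Compare keys by objectid, then type, then offset. */
static int sketch_key_cmp(const struct sketch_key *a, const struct sketch_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

Because the objectid is the same for every checksum item, nothing in the key ties it to a particular subvolume or inode, which is what lets the same item be copied into another tree unchanged.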
  5. 02 Dec, 2008 5 commits
  6. 20 Nov, 2008 1 commit
  7. 18 Nov, 2008 5 commits
    • C
      Btrfs: prevent loops in the directory tree when creating snapshots · ea9e8b11
      Chris Mason committed
      For a directory tree:
      
      /mnt/subvolA/subvolB
      
      btrfsctl -s /mnt/subvolA/subvolB /mnt
      
      Will create a directory loop with subvolA under subvolB.  This
      commit uses the forward refs for each subvol and snapshot to error out
      before creating the loop.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      ea9e8b11
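One way to picture the check in this commit is a reachability test over forward refs. The sketch below is an assumption about the idea, not the kernel's code: `ref_graph`, `reaches`, and `snapshot_would_loop` are hypothetical names, and roots are small integers. A snapshot inherits the forward refs of the root it copies, so placing it inside a root that is reachable from the source would close a cycle.

```c
#include <stdbool.h>

#define MAX_ROOTS 8

/* fwd[a][b] is true when root a holds a forward ref to root b. */
struct ref_graph {
    bool fwd[MAX_ROOTS][MAX_ROOTS];
};

/* Depth-first search: can 'target' be reached from 'from' via forward refs? */
static bool reaches(const struct ref_graph *g, int from, int target, bool *seen)
{
    for (int next = 0; next < MAX_ROOTS; next++) {
        if (!g->fwd[from][next] || seen[next])
            continue;
        if (next == target)
            return true;
        seen[next] = true;
        if (reaches(g, next, target, seen))
            return true;
    }
    return false;
}

/* A snapshot of 'src' placed inside 'parent' would loop if 'parent'
 * is reachable from 'src' through one or more forward refs. */
static bool snapshot_would_loop(const struct ref_graph *g, int src, int parent)
{
    bool seen[MAX_ROOTS] = { false };
    return reaches(g, src, parent, seen);
}
```

With root 0 for /mnt, 1 for subvolA, and 2 for subvolB, the refs 0 to 1 and 1 to 2 make `snapshot_would_loop(&g, 0, 2)` true, which matches the example in the commit message.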
    • C
      Btrfs: Add backrefs and forward refs for subvols and snapshots · 0660b5af
      Chris Mason committed
      Subvols and snapshots can now be referenced from any point in the directory
      tree.  We need to maintain back refs for them so we can find lost
      subvols.
      
      Forward refs are added so that we know all of the subvols and
      snapshots referenced anywhere in the directory tree of a single subvol.  This
      can be used to do recursive snapshotting (though it isn't yet) and it is
      also used to detect and prevent directory loops when creating new snapshots.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      0660b5af
    • C
      Btrfs: Give each subvol and snapshot their own anonymous devid · 3394e160
      Chris Mason committed
      Each subvolume has its own private inode number space, and so we need
      to fill in different device numbers for each subvolume to avoid confusing
      applications.
      
      This commit puts a struct super_block into struct btrfs_root so it can
      call set_anon_super() and get a different device number generated for
      each root.
      
      btrfs_rename is changed to prevent renames across subvols.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3394e160
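From user space, the effect of this commit is visible through `st_dev`. Applications typically identify a file by the pair (`st_dev`, `st_ino`), so two subvolumes that reuse inode numbers need different device numbers. The helper below is a small sketch for checking that; `same_device` is a hypothetical name, and the paths passed to it are up to the caller.

```c
#include <sys/stat.h>

/* Returns 1 if both paths report the same st_dev, 0 if they differ,
 * and -1 if either stat() call fails. */
static int same_device(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev ? 1 : 0;
}
```

On a filesystem with this change, calling it on a directory in one subvolume and a directory in another would be expected to return 0.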
    • C
      Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c
      Chris Mason committed
      Before, all snapshots and subvolumes lived in a single flat directory.  This
      was awkward and confusing because the single flat directory was only writable
      with the ioctls.
      
      This commit changes the ioctls to create subvols and snapshots at any
      point in the directory tree.  This requires making separate ioctls for
      snapshot and subvol creation instead of combining them into one.
      
      The subvol ioctl does:
      
      btrfsctl -S subvol_name parent_dir
      
      After the ioctl is done subvol_name lives inside parent_dir.
      
      The snapshot ioctl does:
      
      btrfsctl -s path_for_snapshot root_to_snapshot
      
      path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
      into directory and basename components.
      
      root_to_snapshot can be any file or directory in the FS.  The snapshot
      is taken of the entire root where that file lives.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3de4586c
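The split of path_for_snapshot into directory and basename components might look like the sketch below. It uses the POSIX `dirname()` and `basename()` functions; whether btrfsctl uses these exact calls is an assumption, and `split_snapshot_path` is a hypothetical helper name.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Split 'path' into its directory and final component.
 * Returns 0 on success, -1 if the path or a result does not fit. */
static int split_snapshot_path(const char *path, char *dir, char *name, size_t len)
{
    char tmp_dir[4096], tmp_base[4096];

    if (strlen(path) >= sizeof(tmp_dir))
        return -1;
    /* dirname() and basename() may modify their argument, so work on copies. */
    strcpy(tmp_dir, path);
    strcpy(tmp_base, path);
    if (snprintf(dir, len, "%s", dirname(tmp_dir)) >= (int)len)
        return -1;
    if (snprintf(name, len, "%s", basename(tmp_base)) >= (int)len)
        return -1;
    return 0;
}
```

A relative path with no slash, such as `snap1`, yields the directory `.`, so the snapshot would be created in the current working directory.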
    • Y
      Btrfs: Seed device support · 2b82032c
      Yan Zheng committed
      A seed device is a special btrfs filesystem with the SEEDING super
      flag set; it can only be mounted in read-only mode. Seed
      devices allow people to create a new btrfs on top of them.
      
      The new FS contains the same contents as the seed device,
      but it can be mounted in read-write mode.
      
      This patch does the following:
      
      1) split code in btrfs_alloc_chunk into two parts. The first part makes
      the newly allocated chunk usable, but does not do any operation that modifies
      the chunk tree. The second part does the chunk tree modifications. This
      division is for the bootstrap step of adding storage to the seed device.
      
      2) Update device management code to handle seed device.
      The basic idea is: For an FS grown from seed devices, its
      seed devices are put into a list. Seed devices are
      opened on demand at mounting time. If any seed device is
      missing or has been changed, btrfs kernel module will
      refuse to mount the FS.
      
      3) make btrfs_find_block_group not return NULL when all
      block groups are read-only.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      2b82032c
  8. 13 Nov, 2008 2 commits
    • Y
      Btrfs: mount ro and remount support · c146afad
      Yan Zheng committed
      This patch adds mount ro and remount support. The main
      changes in this patch are: adding btrfs_remount and related
      helper functions; splitting the transaction related code
      out of close_ctree into btrfs_commit_super; updating the
      allocator to properly handle read-only block groups.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      c146afad
    • S
      Btrfs: allow clone of an arbitrary file range · c5c9cd4d
      Sage Weil committed
      This patch adds an additional CLONE_RANGE ioctl to clone an arbitrary 
      (block-aligned) file range to another file.  The original CLONE ioctl 
      becomes a special case of cloning the entire file range.  The logic is a 
      bit more complex now since ranges may be cloned to different offsets, and 
      because we may only be cloning the beginning or end of a particular extent 
      or checksum item.
      
      An additional sanity check ensures the source and destination files aren't 
      the same (which would previously deadlock), although eventually this could 
      be extended to allow the duplication of file data at a different offset 
      within the same file.
      
      Any extents within the destination range in the target file are dropped.
      
      We currently do not cope with the case where a compressed inline extent 
      needs to be split.  This will probably require decompressing the extent 
      into a temporary address_space, and inserting just the cloned portion as a 
      new compressed inline extent.  For now, just return -EINVAL in this case.  
      Note that this never comes up in the more common case of cloning an entire 
      file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c5c9cd4d
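The up-front checks described in this commit message (block alignment, and source and destination being different files) could be sketched as below. Everything here is an assumption for illustration: `clone_range_sketch` is a hypothetical argument layout rather than the real ioctl structure, the sketch requires the length to be block-aligned as well as both offsets, and it returns -EINVAL for every rejection, which the commit message confirms only for the compressed inline extent case.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical argument layout for this sketch; not the kernel's ioctl struct. */
struct clone_range_sketch {
    uint64_t src_inode;  /* inode number of the source file */
    uint64_t dst_inode;  /* inode number of the destination file */
    uint64_t src_offset; /* start of the range in the source */
    uint64_t length;     /* number of bytes to clone */
    uint64_t dst_offset; /* where the range lands in the destination */
};

/* Returns 0 if the request passes the sanity checks, -EINVAL otherwise. */
static int check_clone_range(const struct clone_range_sketch *r, uint64_t blocksize)
{
    if (blocksize == 0)
        return -EINVAL;
    /* Cloning a file onto itself is rejected. */
    if (r->src_inode == r->dst_inode)
        return -EINVAL;
    /* Offsets and length must all fall on block boundaries. */
    if (r->src_offset % blocksize || r->length % blocksize ||
        r->dst_offset % blocksize)
        return -EINVAL;
    return 0;
}
```

Cloning a whole file corresponds to a source offset and destination offset of zero, which always passes the offset checks.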
  9. 31 Oct, 2008 2 commits
    • Y
      Btrfs: Add fallocate support v2 · d899e052
      Yan Zheng committed
      This patch updates btrfs-progs for fallocate support.
      
      fallocate is a little different in Btrfs because we need to tell the
      COW system that a given preallocated extent doesn't need to be
      cow'd as long as there are no snapshots of it.  This leverages the
      -o nodatacow checks.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      d899e052
    • Y
      Btrfs: update nodatacow code v2 · 80ff3856
      Yan Zheng committed
      This patch simplifies the nodatacow checker. If all references
      were created after the latest snapshot, then we can avoid COW
      safely. This patch also updates run_delalloc_nocow to do more
      fine-grained checking.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      80ff3856
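The rule stated in this commit message, that COW can be skipped when every reference was created after the latest snapshot, can be sketched as a simple predicate. Representing "created after" by comparing transaction generation numbers is an assumption of this sketch, and `can_skip_cow` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when every reference's generation is newer than the last snapshot,
 * meaning no snapshot can share the extent. */
static bool can_skip_cow(const uint64_t *ref_generations, size_t nrefs,
                         uint64_t last_snapshot)
{
    for (size_t i = 0; i < nrefs; i++) {
        if (ref_generations[i] <= last_snapshot)
            return false;
    }
    return true;
}
```

A single reference at or before the snapshot generation is enough to force COW, since the snapshot may still point at the old data.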
  10. 30 Oct, 2008 1 commit
  11. 10 Oct, 2008 2 commits
    • C
      Btrfs: Don't call security_inode_mkdir during subvol creation · a3dddf3f
      Chris Mason committed
      Subvol creation already requires privs, and security_inode_mkdir isn't
      exported.  For now we don't need it.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      a3dddf3f
    • C
      Btrfs: Fix subvolume creation locking rules · cb8e7090
      Christoph Hellwig committed
      Creating a subvolume is in many ways like a normal VFS ->mkdir, and we
      really need to play with the VFS topology locking rules.  So instead of
      just creating the snapshot on disk and then later getting rid of
      conflicting aliases do it correctly from the start.  This will become
      especially important once we allow for subvolumes anywhere in the tree,
      and not just below a hidden root.
      
      Note that snapshots will need the same treatment, but due to the delay
      in creating them we can't do it currently.  Chris promised to fix that
      issue, so I'll wait on that.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      
      cb8e7090
  12. 09 Oct, 2008 2 commits
    • Y
      Btrfs: Remove offset field from struct btrfs_extent_ref · 3bb1a1bc
      Yan Zheng committed
      The offset field in struct btrfs_extent_ref records the position
      inside file that file extent is referenced by. In the new back
      reference system, tree leaves holding references to file extent
      are recorded explicitly. We can scan these tree leaves very quickly, so the
      offset field is not required.
      
      This patch also makes the back reference system check the objectid
      when extents are being deleted.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      3bb1a1bc
    • Y
      Btrfs: Count space allocated to file in bytes · a76a3cd4
      Yan Zheng committed
      This patch makes btrfs count space allocated to file in bytes instead
      of 512 byte sectors.
      
      Everything else in btrfs uses a byte count instead of sector sizes or
      block sizes, so this fits better.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      a76a3cd4
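Counting in bytes internally still leaves a conversion at the boundary where 512-byte units are expected, such as the `st_blocks` field that `stat()` reports on Linux. A minimal sketch of that conversion follows; `bytes_to_sectors` is a hypothetical helper, and rounding up a partial sector is an assumption of the sketch rather than something the commit message states.

```c
#include <stdint.h>

/* Convert a byte count to 512-byte sectors, rounding a partial sector up. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    return (bytes + 511) >> 9;
}
```

For example, a file with 4096 bytes allocated corresponds to 8 sectors.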
  13. 26 Sep, 2008 1 commit
    • Z
      Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed
      Zheng Yan committed
      * Add an EXTENT_BOUNDARY state bit to keep the writepage code
      from merging data extents that are in the process of being
      relocated.  This allows us to do accounting for them properly.
      
      * The balancing code relocates data extents independent of the underlying
      inode.  The extent_map code was modified to properly account for
      things moving around (invalidating extent_map caches in the inode).
      
      * Don't take the drop_mutex in the create_subvol ioctl.  It isn't
      required.
      
      * Fix walking of the ordered extent list to avoid races with sys_unlink
      
      * Change the lock ordering rules.  Transaction start goes outside
      the drop_mutex.  This allows btrfs_commit_transaction to directly
      drop the relocation trees.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      5b21f2ed
  14. 25 Sep, 2008 13 commits