Commits · 0b4dcea579a1b6f4d249d61f5bc8adeaa7c895d8 · xiphi1978 / linux

11 Jun, 2009 1 commit

Btrfs: fix oops when btrfs_inherit_iflags called with a NULL dir · 0b4dcea5

Authored by Chris Mason on Jun 11, 2009

This happens during subvol creation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0b4dcea5

10 Jun, 2009 2 commits

Btrfs: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION · 6cbff00f

Authored by Christoph Hellwig on Apr 17, 2009

Add support for the standard attributes set via chattr and read via
lsattr.  Currently we store the attributes in the flags value in
the btrfs inode, but I wonder whether we should split it into two so
that we don't have to keep converting between the two formats.

Remove the btrfs_clear_flag/btrfs_set_flag/btrfs_test_flag macros
as they were confusing the existing code and got in the way of the
new additions.

Also add the FS_IOC_GETVERSION ioctl for getting i_generation as it's
trivial.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

6cbff00f

Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) · 5d4f98a2

Authored by Yan Zheng on Jun 10, 2009

This commit introduces a new kind of back reference for btrfs metadata.
Once a filesystem has been mounted with this commit, IT WILL NO LONGER
BE MOUNTABLE BY OLDER KERNELS.

When a tree block in a subvolume tree is cow'd, the reference counts of all
extents it points to are increased by one. At transaction commit time,
the old root of the subvolume is recorded in a "dead root" data structure,
and the btree it points to is later walked, dropping reference counts
and freeing any blocks where the reference count goes to 0.

The increments done during cow and decrements done after commit cancel out,
and the walk is a very expensive way to go about freeing the blocks that
are no longer referenced by the new btree root. This commit reduces the
transaction overhead by avoiding the need for dead root records.

When a non-shared tree block is cow'd, we free the old block at once, and the
new block inherits the old block's references. When a tree block with reference
count > 1 is cow'd, we increase the reference counts of all extents
the new block points to by one, and decrease the old block's reference count by
one.

This dead tree avoidance code removes the need to modify the reference
counts of lower level extents when a non-shared tree block is cow'd.
But we still need to update the back ref for all pointers in the block.
This is because the location of the block is recorded in the back ref
item.

We can solve this by introducing a new type of back ref. The new
back ref provides information about the pointer's key, level and in which
tree the pointer lives. This information allows us to find the pointer
by searching the tree. The shortcoming of the new back ref is that it
only works for pointers in tree blocks referenced by their owner trees.

This is mostly a problem for snapshots, where resolving one of these
fuzzy back references would be O(number_of_snapshots) and quite slow.
The solution used here is to use the fuzzy back references in the common
case where a given tree block is only referenced by one root,
and use the full back references when multiple roots have a reference
on a given block.

This commit adds a per-subvolume red-black tree to keep track of cached
inodes. The red-black tree helps the balancing code to find cached
inodes whose inode numbers fall within a given range.

This commit improves the balancing code by introducing several data
structures to keep the state of balancing. The most important one
is the back ref cache. It caches how the upper level tree blocks are
referenced. This greatly reduces the overhead of checking back refs.

The improved balancing code scales significantly better with a large
number of snapshots.

This is a very large commit and was written in a number of
pieces. But, they depend heavily on the disk format change and were
squashed together to make sure git bisect didn't end up in a
bad state wrt space balancing or the format change.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5d4f98a2
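
The cow rule in the 5d4f98a2 message (free a non-shared block at once, bump the children only when the block is shared) can be pictured with a toy model. This is only a sketch in Python, not the kernel code: `Block`, `cow`, and the `freed` flag are hypothetical names invented here, and real tree blocks carry far more state.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy tree block: a reference count plus the extents it points to."""
    name: str
    refs: int = 1
    children: list = field(default_factory=list)
    freed: bool = False


def cow(old: Block) -> Block:
    """Copy-on-write `old`, following the rule quoted in the commit message."""
    new = Block(name=old.name + "'", refs=1, children=list(old.children))
    if old.refs == 1:
        # Non-shared block: free the old copy at once. The new block inherits
        # its references, so the children's counts are left untouched.
        old.refs = 0
        old.freed = True
    else:
        # Shared block: the new copy is one more referrer of every child,
        # and the old block loses one reference.
        for child in new.children:
            child.refs += 1
        old.refs -= 1
    return new


if __name__ == "__main__":
    leaf = Block("leaf")
    node = Block("node", children=[leaf])
    cow(node)
    print(leaf.refs, node.freed)
```

In the non-shared case no child count changes, which is the saving the commit describes: nothing is left for a later dead-root walk to undo.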

15 May, 2009 1 commit

Btrfs: remove outdated comment in btrfs_ioctl_resize() · 5d847a8e

Authored by Li Hong on May 14, 2009

In Li Zefan's commit dae7b665,
a combined call to kmalloc() and copy_from_user() was replaced by
memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5d847a8e

27 Apr, 2009 1 commit

Btrfs: Fix a bunch of printk() warnings. · 21380931

Authored by Joel Becker on Apr 21, 2009

Just happened to notice a bunch of %llu vs u64 warnings.  Here's a patch
to cast them all.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

21380931

25 Apr, 2009 1 commit

Btrfs: fix fallocate deadlock on inode extent lock · e980b50c

Authored by Chris Mason on Apr 24, 2009

The btrfs fallocate call takes an extent lock on the entire range
being fallocated, and then runs through insert_reserved_extent on each
extent as they are allocated.

The problem with this is that btrfs_drop_extents may decide to try
and take the same extent lock fallocate was already holding.  The solution
used here is to push down knowledge of the range that is already locked
going into btrfs_drop_extents.

It turns out that at least one other caller had the same bug.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e980b50c

21 Apr, 2009 1 commit

btrfs: use memdup_user() · dae7b665

Authored by Li Zefan on Apr 08, 2009

Remove open-coded memdup_user().

Note that this changes some GFP_NOFS to GFP_KERNEL: since copy_from_user() may
cause a page fault, it's pointless to pass GFP_NOFS to kmalloc().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dae7b665

01 Apr, 2009 1 commit

New helper - current_umask() · ce3b0f8d

Authored by Al Viro on Mar 29, 2009

current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce3b0f8d

21 Feb, 2009 1 commit

Btrfs: add better -ENOSPC handling · 6a63209f

Authored by Josef Bacik on Feb 20, 2009

This is a step in the direction of better -ENOSPC handling. Instead of
checking the global bytes counter we check the space_info bytes counters to
make sure we have enough space.

If we don't, we go ahead and try to allocate a new chunk, and then if that fails
we return -ENOSPC. This patch adds two counters to btrfs_space_info,
bytes_delalloc and bytes_may_use.

bytes_delalloc accounts for extents we've actually set up for delalloc and will
be allocated at some point down the line.

bytes_may_use is to keep track of how many bytes we may use for delalloc at
some point. When we actually set the extent_bit for the delalloc bytes we
subtract the reserved bytes from the bytes_may_use counter. This keeps us from
ending up unable to allocate space for any delalloc bytes.
Signed-off-by: Josef Bacik <jbacik@redhat.com>

6a63209f
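
The accounting in the 6a63209f message can be sketched as a small Python model. Only `bytes_delalloc` and `bytes_may_use` come from the commit text. `total_bytes`, `bytes_used`, the `alloc_chunk` callback, and the idea that `bytes_delalloc` grows at the same moment `bytes_may_use` shrinks are assumptions made for illustration.

```python
import errno
from dataclasses import dataclass


@dataclass
class SpaceInfo:
    total_bytes: int         # assumed: capacity of the chunks backing this space_info
    bytes_used: int = 0      # assumed: bytes already allocated to extents
    bytes_delalloc: int = 0  # from the commit: set up for delalloc, allocated later
    bytes_may_use: int = 0   # from the commit: may be used for delalloc later

    def committed(self) -> int:
        """Everything that is used or already promised to someone."""
        return self.bytes_used + self.bytes_delalloc + self.bytes_may_use


def reserve(info: SpaceInfo, num_bytes: int, alloc_chunk) -> int:
    """Reserve num_bytes for future delalloc; return 0 or -ENOSPC."""
    if info.committed() + num_bytes > info.total_bytes:
        # Not enough room: try to allocate a new chunk before giving up.
        # alloc_chunk() is a hypothetical callback returning the bytes added.
        info.total_bytes += alloc_chunk()
        if info.committed() + num_bytes > info.total_bytes:
            return -errno.ENOSPC
    info.bytes_may_use += num_bytes
    return 0


def set_delalloc(info: SpaceInfo, num_bytes: int) -> None:
    """Turn a reservation into delalloc bytes once the extent bit is set."""
    info.bytes_may_use -= num_bytes
    info.bytes_delalloc += num_bytes
```

Because the check runs against the per-space_info counters rather than one global counter, a reservation only succeeds if the space it will later need is still unpromised.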

21 Jan, 2009 1 commit

Btrfs: removed unused #include <version.h>'s · 7eaebe7d

Authored by Huang Weiyi on Jan 21, 2009

Removed unused #include <version.h>'s in btrfs
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7eaebe7d

06 Jan, 2009 3 commits

Btrfs: Fix checkpatch.pl warnings · d397712b

Authored by Chris Mason on Jan 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d397712b

Btrfs: update directory's size when creating subvol/snapshot · 52c26179

Authored by Yan Zheng on Jan 05, 2009

Make sure the directory's size is properly updated when creating
a subvol/snapshot.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

52c26179

Btrfs: add permission checks to the ioctls · e441d54d

Authored by Chris Mason on Jan 05, 2009

Only root can add/remove devices
Only root can defrag subtrees
Only files open for writing can be defragged
Only files open for writing can be the destination for a clone
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e441d54d

19 Dec, 2008 1 commit

Btrfs: Add missing mnt_drop_write in ioctl.c · ab67b7c1

Authored by Yan Zheng on Dec 19, 2008

This patch adds the missing mnt_drop_write to match
mnt_want_write in btrfs_ioctl_defrag and btrfs_ioctl_clone.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

ab67b7c1

12 Dec, 2008 2 commits

Btrfs: fix leaking block group on balance · d2fb3437

Authored by Yan Zheng on Dec 11, 2008

The block group structs are referenced in many different
places, and it's not safe to free them while balancing.  So, those block
group structs were simply leaked instead.

This patch replaces the block group pointer in the inode with the starting byte
offset of the block group and adds reference counting to the block group
struct.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

d2fb3437

Btrfs: mnt_drop_write in ioctl_trans_end · cfc8ea87

Authored by Sage Weil on Dec 11, 2008

Add missing mnt_drop_write to match the mnt_want_write in
btrfs_ioctl_trans_start.
Signed-off-by: Sage Weil <sage@newdream.net>

cfc8ea87

09 Dec, 2008 1 commit

Btrfs: move data checksumming into a dedicated tree · d20f7043

Authored by Chris Mason on Dec 08, 2008

Btrfs stores checksums for each data block.  Until now, they have
been stored in the subvolume trees, indexed by the inode that is
referencing the data block.  This means that when we read the inode,
we've probably read in at least some checksums as well.

But, this has a few problems:

* The checksums are indexed by logical offset in the file.  When
compression is on, this means we have to do the expensive checksumming
on the uncompressed data.  It would be faster if we could checksum
the compressed data instead.

* If we implement encryption, we'll be checksumming the plain text and
storing that on disk.  This is significantly less secure.

* For either compression or encryption, we have to get the plain text
back before we can verify the checksum as correct.  This makes the raid
layer balancing and extent moving much more expensive.

* It makes the front end caching code more complex, as we have to touch
the subvolume and inodes as we cache extents.

* There is potentially one copy of the checksum in each subvolume
referencing an extent.

The solution used here is to store the extent checksums in a dedicated
tree.  This allows us to index the checksums by physical extent
start and length.  It means:

* The checksum is against the data stored on disk, after any compression
or encryption is done.

* The checksum is stored in a central location, and can be verified without
following back references, or reading inodes.

This makes compression significantly faster by reducing the amount of
data that needs to be checksummed.  It will also allow much faster
raid management code in general.

The checksums are indexed by a key with a fixed objectid (a magic value
in ctree.h) and offset set to the starting byte of the extent.  This
allows us to copy the checksum items into the fsync log tree directly (or
any other tree), without having to invent a second format for them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d20f7043
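
The indexing scheme in the d20f7043 message can be approximated in a few lines of Python. This is a sketch under loud assumptions: `CSUM_OBJECTID` is a placeholder for the magic value the commit says lives in ctree.h, the dictionary stands in for the dedicated tree, `zlib.crc32` stands in for the crc32c that btrfs uses, and one checksum per extent replaces the real per-block items.

```python
import zlib

# Placeholder for the fixed objectid ("a magic value in ctree.h").
CSUM_OBJECTID = -10

# Stand-in for the dedicated checksum tree:
# (objectid, physical start byte of the extent) -> checksum
csum_tree = {}


def write_extent(disk_start: int, file_data: bytes) -> bytes:
    """Compress first, then checksum the bytes that actually land on disk."""
    on_disk = zlib.compress(file_data)
    csum_tree[(CSUM_OBJECTID, disk_start)] = zlib.crc32(on_disk)
    return on_disk


def verify_extent(disk_start: int, on_disk: bytes) -> bool:
    """Verify an extent without decompressing it or consulting any inode."""
    return csum_tree.get((CSUM_OBJECTID, disk_start)) == zlib.crc32(on_disk)


if __name__ == "__main__":
    blob = write_extent(4096, b"btrfs" * 1000)
    print(verify_extent(4096, blob))
```

The key contains no inode or subvolume, so code that moves extents around can check them knowing only where they sit on disk, and the checksum covers the smaller compressed payload.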

02 Dec, 2008 5 commits

Btrfs: add support for multiple csum algorithms · 607d432d

Authored by Josef Bacik on Dec 02, 2008

This patch gives us the space we will need in order to have different csum
algorithms at some point in the future. We save the csum algorithm type
in the superblock, and use that instead of #defines.
Signed-off-by: Josef Bacik <jbacik@redhat.com>

607d432d

Btrfs: btrfs: pass void __user * to btrfs_ioctl_clone_range · 7a865e8a

Authored by Christoph Hellwig on Dec 02, 2008

Cleans the code up a little and also avoids a sparse warning due to the
incorrect cast in the current version of the code.
Signed-off-by: Christoph Hellwig <hch@lst.de>

7a865e8a

Btrfs: clean up btrfs_ioctl a little bit · 4bcabaa3

Authored by Christoph Hellwig on Dec 02, 2008

Provide a void __user *argp pointer so that we can avoid duplicating
the cast for various sub-command calls.
Signed-off-by: Christoph Hellwig <hch@lst.de>

4bcabaa3

Btrfs: make things static and include the right headers · b2950863

Authored by Christoph Hellwig on Dec 02, 2008

Shut up various sparse warnings about symbols that should be either
static or have their declarations in scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b2950863

Btrfs: remove unneeded btrfs_start_delalloc_inodes call · 1ffa4f42

Authored by Sage Weil on Dec 02, 2008

It is called by btrfs_sync_fs.
Signed-off-by: Sage Weil <sage@newdream.net>

1ffa4f42

20 Nov, 2008 1 commit

Btrfs: compat code fixes · 4b4e25f2

Authored by Chris Mason on Nov 20, 2008

The btrfs git kernel tree is used to build a standalone tree for
compiling against older kernels.  This commit makes the standalone tree
work with 2.6.27.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b4e25f2

18 Nov, 2008 5 commits

Btrfs: prevent loops in the directory tree when creating snapshots · ea9e8b11

Authored by Chris Mason on Nov 17, 2008

For a directory tree:

/mnt/subvolA/subvolB

btrfsctl -s /mnt/subvolA/subvolB /mnt

Will create a directory loop with subvolA under subvolB.  This
commit uses the forward refs for each subvol and snapshot to error out
before creating the loop.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ea9e8b11

Btrfs: Add backrefs and forward refs for subvols and snapshots · 0660b5af

Authored by Chris Mason on Nov 17, 2008

Subvols and snapshots can now be referenced from any point in the directory
tree.  We need to maintain back refs for them so we can find lost
subvols.

Forward refs are added so that we know all of the subvols and
snapshots referenced anywhere in the directory tree of a single subvol.  This
can be used to do recursive snapshotting (but it isn't yet) and it is
also used to detect and prevent directory loops when creating new snapshots.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0660b5af

Btrfs: Give each subvol and snapshot their own anonymous devid · 3394e160

Authored by Chris Mason on Nov 17, 2008

Each subvolume has its own private inode number space, and so we need
to fill in different device numbers for each subvolume to avoid confusing
applications.

This commit puts a struct super_block into struct btrfs_root so it can
call set_anon_super() and get a different device number generated for
each root.

btrfs_rename is changed to prevent renames across subvols.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3394e160

Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c

Authored by Chris Mason on Nov 17, 2008

Before, all snapshots and subvolumes lived in a single flat directory.  This
was awkward and confusing because the single flat directory was only writable
with the ioctls.

This commit changes the ioctls to create subvols and snapshots at any
point in the directory tree.  This requires making separate ioctls for
snapshot and subvol creation instead of combining them into one.

The subvol ioctl does:

btrfsctl -S subvol_name parent_dir

After the ioctl is done subvol_name lives inside parent_dir.

The snapshot ioctl does:

btrfsctl -s path_for_snapshot root_to_snapshot

path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
into directory and basename components.

root_to_snapshot can be any file or directory in the FS.  The snapshot
is taken of the entire root where that file lives.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3de4586c

Btrfs: Seed device support · 2b82032c

Authored by Yan Zheng on Nov 17, 2008

A seed device is a special btrfs with the SEEDING super flag
set, and it can only be mounted in read-only mode. Seed
devices allow people to create a new btrfs on top of them.

The new FS contains the same contents as the seed device,
but it can be mounted in read-write mode.

This patch does the following:

1) split code in btrfs_alloc_chunk into two parts. The first part makes
the newly allocated chunk usable, but does not do any operation that modifies
the chunk tree. The second part does the chunk tree modifications. This
division is for the bootstrap step of adding storage to the seed device.

2) Update device management code to handle seed devices.
The basic idea is: For an FS grown from seed devices, its
seed devices are put into a list. Seed devices are
opened on demand at mounting time. If any seed device is
missing or has been changed, the btrfs kernel module will
refuse to mount the FS.

3) make btrfs_find_block_group not return NULL when all
block groups are read-only.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

2b82032c

13 Nov, 2008 2 commits

Btrfs: mount ro and remount support · c146afad

Authored by Yan Zheng on Nov 12, 2008

This patch adds mount ro and remount support. The main
changes in this patch are: adding btrfs_remount and related
helper functions; splitting the transaction related code
out of close_ctree into btrfs_commit_super; updating the
allocator to properly handle read-only block groups.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

c146afad

Btrfs: allow clone of an arbitrary file range · c5c9cd4d

Authored by Sage Weil on Nov 12, 2008

This patch adds an additional CLONE_RANGE ioctl to clone an arbitrary 
(block-aligned) file range to another file.  The original CLONE ioctl 
becomes a special case of cloning the entire file range.  The logic is a 
bit more complex now since ranges may be cloned to different offsets, and 
because we may only be cloning the beginning or end of a particular extent 
or checksum item.

An additional sanity check ensures the source and destination files aren't 
the same (which would previously deadlock), although eventually this could 
be extended to allow the duplication of file data at a different offset 
within the same file.

Any extents within the destination range in the target file are dropped.

We currently do not cope with the case where a compressed inline extent 
needs to be split.  This will probably require decompressing the extent 
into a temporary address_space, and inserting just the cloned portion as a 
new compressed inline extent.  For now, just return -EINVAL in this case.  
Note that this never comes up in the more common case of cloning an entire 
file.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c5c9cd4d

31 Oct, 2008 2 commits

Btrfs: Add fallocate support v2 · d899e052

Authored by Yan Zheng on Oct 30, 2008

This patch updates btrfs-progs for fallocate support.

fallocate is a little different in Btrfs because we need to tell the
COW system that a given preallocated extent doesn't need to be
cow'd as long as there are no snapshots of it.  This leverages the
-o nodatacow checks.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

d899e052

Btrfs: update nodatacow code v2 · 80ff3856

Authored by Yan Zheng on Oct 30, 2008

This patch simplifies the nodatacow checker. If all references
were created after the latest snapshot, then we can avoid COW
safely. This patch also updates run_delalloc_nocow to do more
fine-grained checking.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

80ff3856
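
The rule stated in the 80ff3856 message reduces to a single predicate. The Python sketch below uses hypothetical names (`can_skip_cow`, `ref_generations`, `last_snapshot_gen`) and assumes that each reference carries the transaction generation in which it was created. A reference created in the same generation as the snapshot is treated as not being after it, which is also an assumption.

```python
def can_skip_cow(ref_generations, last_snapshot_gen):
    """Return True when every reference to the extent is newer than the snapshot.

    ref_generations: generations in which each reference was created (assumed).
    last_snapshot_gen: generation of the most recent snapshot (assumed).
    """
    return all(gen > last_snapshot_gen for gen in ref_generations)


if __name__ == "__main__":
    # Every reference is newer than the snapshot, so overwriting in place is safe.
    print(can_skip_cow([12, 15], last_snapshot_gen=10))
```

If even one reference predates the snapshot, the snapshot may still see the extent, so the write has to go through COW.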

30 Oct, 2008 1 commit

Btrfs: Add root tree pointer transaction ids · 84234f3a

Authored by Yan Zheng on Oct 29, 2008

This patch adds transaction IDs to root tree pointers.
Transaction IDs in tree pointers are compared with the
generation numbers in block headers when reading root
blocks of trees. This can detect some types of IO errors.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

84234f3a
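
The check described in the 84234f3a message amounts to comparing two numbers at read time. A minimal Python sketch follows; `GenerationMismatch`, `read_root_block`, and the `read_header_generation` callback are hypothetical names, and real block reads involve much more than a generation field.

```python
class GenerationMismatch(IOError):
    """The block on disk is not the one the tree pointer was written for."""


def read_root_block(pointer_generation, read_header_generation):
    """Read a root block and validate it against the pointer's transaction ID.

    read_header_generation is a hypothetical callback that performs the read
    and returns the generation number found in the block header.
    """
    header_generation = read_header_generation()
    if header_generation != pointer_generation:
        # For example, a write that never reached the disk leaves an older
        # block in place, which shows up as a stale generation.
        raise GenerationMismatch(
            f"expected generation {pointer_generation}, found {header_generation}"
        )
    return header_generation
```

A mismatch cannot say what went wrong, only that the block is not the version the pointer refers to, which fits the commit's wording that this detects some types of IO errors.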

10 Oct, 2008 2 commits

Btrfs: Don't call security_inode_mkdir during subvol creation · a3dddf3f

Authored by Chris Mason on Oct 10, 2008

Subvol creation already requires privs, and security_inode_mkdir isn't
exported.  For now we don't need it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a3dddf3f

Btrfs: Fix subvolume creation locking rules · cb8e7090

Authored by Christoph Hellwig on Oct 09, 2008

Creating a subvolume is in many ways like a normal VFS ->mkdir, and we
really need to play with the VFS topology locking rules.  So instead of
just creating the snapshot on disk and then later getting rid of
conflicting aliases do it correctly from the start.  This will become
especially important once we allow for subvolumes anywhere in the tree,
and not just below a hidden root.

Note that snapshots will need the same treatment, but due to the delay
in creating them we can't do it currently.  Chris promised to fix that
issue, so I'll wait on that.
Signed-off-by: Christoph Hellwig <hch@lst.de>

cb8e7090

09 Oct, 2008 2 commits

Btrfs: Remove offset field from struct btrfs_extent_ref · 3bb1a1bc

Authored by Yan Zheng on Oct 09, 2008

The offset field in struct btrfs_extent_ref records the position
inside the file at which the file extent is referenced. In the new back
reference system, tree leaves holding references to file extents
are recorded explicitly. We can scan these tree leaves very quickly, so the
offset field is not required.

This patch also makes the back reference system check the objectid
when extents are being deleted.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

3bb1a1bc

Btrfs: Count space allocated to file in bytes · a76a3cd4

Authored by Yan Zheng on Oct 09, 2008

This patch makes btrfs count space allocated to a file in bytes instead
of 512 byte sectors.

Everything else in btrfs uses a byte count instead of sector sizes or
block sizes, so this fits better.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

a76a3cd4

26 Sep, 2008 1 commit

Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed

Authored by Zheng Yan on Sep 26, 2008

* Add an EXTENT_BOUNDARY state bit to keep the writepage code
from merging data extents that are in the process of being
relocated.  This allows us to do accounting for them properly.

* The balancing code relocates data extents independent of the underlying
inode.  The extent_map code was modified to properly account for
things moving around (invalidating extent_map caches in the inode).

* Don't take the drop_mutex in the create_subvol ioctl.  It isn't
required.

* Fix walking of the ordered extent list to avoid races with sys_unlink

* Change the lock ordering rules.  Transaction start goes outside
the drop_mutex.  This allows btrfs_commit_transaction to directly
drop the relocation trees.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5b21f2ed

25 Sep, 2008 2 commits

Btrfs: Full back reference support · 31840ae1

Authored by Zheng Yan on Sep 23, 2008

This patch makes the back reference system explicitly record the
location of the parent node for all types of extents. The location of
the parent node is placed into the offset field of the backref key. Every
time a tree block is balanced, the back references for the affected
lower level extents are updated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

31840ae1

Btrfs: trivial sparse fixes · b214107e

Authored by Christoph Hellwig on Sep 05, 2008

Fix a bunch of trivial sparse complaints.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b214107e