Commits · c2ec175c39f62949438354f603f4aa170846aabb · openanolis / cloud-kernel

01 Apr, 2009 1 commit

mm: page_mkwrite change prototype to match fault · c2ec175c

Committed by Nick Piggin on Mar 31, 2009

Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and can also provide more information, e.g. virtual_address, to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  It will also make it easier to
merge page_mkwrite() with fault() in the future.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c2ec175c

21 Feb, 2009 1 commit

Btrfs: add better -ENOSPC handling · 6a63209f

Committed by Josef Bacik on Feb 20, 2009

This is a step in the direction of better -ENOSPC handling. Instead of
checking the global bytes counter we check the space_info bytes counters to
make sure we have enough space.

If we don't, we go ahead and try to allocate a new chunk, and if that fails
we return -ENOSPC. This patch adds two counters to btrfs_space_info,
bytes_delalloc and bytes_may_use.

bytes_delalloc accounts for extents we've actually set up for delalloc and
that will be allocated at some point down the line.

bytes_may_use keeps track of how many bytes we may use for delalloc at
some point. When we actually set the extent_bit for the delalloc bytes we
subtract the reserved bytes from the bytes_may_use counter. This keeps us from
being unable to allocate space for delalloc bytes later on.
Signed-off-by: Josef Bacik <jbacik@redhat.com>

6a63209f
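
The counter bookkeeping described in 6a63209f can be sketched as a small userland model. This is only an illustration under stated assumptions: the `SpaceInfo` class, its `total_bytes` and `bytes_used` fields, and the `alloc_chunk` callback are hypothetical names, and only `bytes_delalloc` and `bytes_may_use` come from the commit message. The real kernel code is C and is not shown on this page.

```python
from dataclasses import dataclass

ENOSPC = 28  # errno value for "No space left on device" on Linux


@dataclass
class SpaceInfo:
    """Hypothetical model of the per-space_info byte counters."""
    total_bytes: int
    bytes_used: int = 0       # assumption: bytes already allocated
    bytes_may_use: int = 0    # reserved for delalloc that may happen later
    bytes_delalloc: int = 0   # delalloc extents that have been set up

    def committed(self):
        return self.bytes_used + self.bytes_may_use + self.bytes_delalloc

    def reserve(self, nbytes, alloc_chunk=lambda: 0):
        """Reserve nbytes; return 0 on success or -ENOSPC on failure."""
        if self.committed() + nbytes > self.total_bytes:
            # Not enough room: try to allocate a new chunk first.
            self.total_bytes += alloc_chunk()
            if self.committed() + nbytes > self.total_bytes:
                return -ENOSPC
        self.bytes_may_use += nbytes
        return 0

    def set_delalloc(self, nbytes):
        """Turn a reservation into set-up delalloc bytes."""
        self.bytes_may_use -= nbytes
        self.bytes_delalloc += nbytes
```

The point of the two counters in this sketch is that a reservation is counted exactly once: it moves from `bytes_may_use` to `bytes_delalloc` instead of being added twice.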

13 Feb, 2009 1 commit

Btrfs: remove btrfs_init_path · e00f7308

Committed by Jeff Mahoney on Feb 12, 2009

btrfs_init_path was initially used when the path objects were on the
stack.  Now all the work is done by btrfs_alloc_path and btrfs_init_path
isn't required.

This patch removes it, and just uses kmem_cache_zalloc to zero out the object.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e00f7308

12 Feb, 2009 1 commit

Btrfs: Avoid using __GFP_HIGHMEM with slab allocator · b335b003

Committed by Yan Zheng on Feb 12, 2009

btrfs_releasepage may call kmem_cache_alloc indirectly,
and it passes along the same GFP flags it receives.
So it was possible for __GFP_HIGHMEM to reach the slab
allocator.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

b335b003

07 Feb, 2009 1 commit

Btrfs: Make sure dir is non-null before doing S_ISGID checks · 42f15d77

Committed by Chris Mason on Feb 06, 2009

The S_ISGID check in btrfs_new_inode caused an oops during subvol creation
because sometimes the dir is null.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

42f15d77

04 Feb, 2009 6 commits

Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode · 06d9a8d7

Committed by Chris Mason on Feb 04, 2009

btrfs_truncate_inode_items is set up to stop doing btree searches when
it has finished removing the items for the inode.  It used to detect the
end of the inode by looking for an objectid that didn't match the
one we were searching for.

But, this would result in an extra search through the btree, which
adds extra balancing and cow costs to the operation.

This commit adds a check to see if we found the inode item, which means
we can stop searching early.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

06d9a8d7

Btrfs: Don't try to compress pages past i_size · f03d9301

Committed by Chris Mason on Feb 04, 2009

The compression code had some checks to make sure we were only
compressing bytes inside of i_size, but it wasn't catching every
case.  To make things worse, some incorrect math about the number
of bytes remaining would make it try to compress more pages than the
file really had.

The fix used here is to fall back to the non-compression code in this
case, which does all the proper cleanup of delalloc and other accounting.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f03d9301

Btrfs: Handle SGID bit when creating inodes · 8c087b51

Committed by Chris Ball on Feb 04, 2009

Before this patch, new files/dirs would ignore the SGID bit on their
parent directory and always be owned by the creating user's uid/gid.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8c087b51
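
The ownership rule from 8c087b51, together with the null-dir guard added later in 42f15d77, can be sketched as follows. This assumes the conventional Unix setgid-directory semantics (new entries inherit the directory's gid, and new subdirectories also inherit the SGID bit). `Dir` and `new_inode_owner` are hypothetical names for illustration, not the btrfs_new_inode code.

```python
import stat
from typing import NamedTuple


class Dir(NamedTuple):
    """Hypothetical stand-in for the parent directory inode."""
    mode: int
    gid: int


def new_inode_owner(parent, mode, fsuid, fsgid):
    """Return (uid, gid, mode) for a new inode created under parent.

    parent may be None (the commit log says this happens during subvol
    creation), so it is checked before the S_ISGID test.
    """
    if parent is not None and parent.mode & stat.S_ISGID:
        gid = parent.gid              # inherit the group from the directory
        if stat.S_ISDIR(mode):
            mode |= stat.S_ISGID      # new subdirectories stay setgid
    else:
        gid = fsgid                   # default: the creating user's gid
    return fsuid, gid, mode
```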

Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks · bd56b302

Committed by Chris Mason on Feb 04, 2009

Every transaction in btrfs creates a new snapshot, and then schedules the
snapshot from the last transaction for deletion.  Snapshot deletion
works by walking down the btree and dropping the reference counts
on each btree block during the walk.

If a given leaf or node has a reference count greater than one,
the reference count is decremented and the subtree pointed to by that
node is ignored.

If the reference count is one, walking continues down into that node
or leaf, and the references of everything it points to are decremented.

The old code would try to work in small pieces, walking down the tree
until it found the lowest leaf or node to free and then returning.  This
was very friendly to the rest of the FS because it didn't have a huge
impact on other operations.

But it wouldn't always keep up with the rate that new commits added new
snapshots for deletion, and it wasn't optimal for the extent
allocation tree because it wasn't finding leaves that were close together
on disk and processing them at the same time.

This changes things to walk down to a level 1 node and then process it
in bulk.  All the leaf pointers are sorted and the leaves are dropped
in order based on their extent number.

The extent allocation tree and commit code are now fast enough for
this kind of bulk processing to work without slowing the rest of the FS
down.  Overall it does less IO and is better able to keep up with
snapshot deletions under high load.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

bd56b302
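
The reference-count walk described in bd56b302 can be modeled in a few lines. This is a toy sketch: `Block`, `bytenr`, and `drop_snapshot` here are illustrative names, the tree is an in-memory object graph, and sorting children by `bytenr` stands in for "dropped in order based on their extent number". The real code batches work at level 1 nodes, which this model does not attempt.

```python
class Block:
    """Hypothetical btree block with a reference count."""

    def __init__(self, bytenr, children=()):
        self.bytenr = bytenr            # assumed on-disk position of the block
        self.refs = 1
        self.children = list(children)


def drop_snapshot(block, freed):
    """Drop one reference on block, recursing only if it was the last one."""
    if block.refs > 1:
        block.refs -= 1                 # still shared: ignore the subtree
        return
    # Last reference: drop everything this block points to, in disk order.
    for child in sorted(block.children, key=lambda b: b.bytenr):
        drop_snapshot(child, freed)
    block.refs = 0
    freed.append(block.bytenr)
```

Shared subtrees cost a single decrement, so the work is proportional to the blocks that are unique to the snapshot being deleted.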

Btrfs: Change btree locking to use explicit blocking points · b4ce94de

Committed by Chris Mason on Feb 04, 2009

Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop.
Most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.

We'll be able to get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock.  The buffer is
still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time.  So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks;
it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b4ce94de
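
The protocol outlined in b4ce94de can be modeled in userland as a rough sketch. Assumptions: a `threading.Lock` stands in for the spinlock, a `threading.Condition` stands in for the wait queue, a boolean stands in for the EXTENT_BUFFER_BLOCKING bit, and the adaptive spin mentioned in the commit message is left out. The `TreeLock` class and its method names are illustrative, not the kernel implementation.

```python
import threading


class TreeLock:
    """Toy model of a spinning lock with an explicit blocking state."""

    def __init__(self):
        self._spin = threading.Lock()       # stands in for the spinlock
        self._wq = threading.Condition()    # stands in for the wait queue
        self._blocking = False              # models the blocking bit

    def tree_lock(self):
        """Return with the 'spinlock' held and the blocking bit clear."""
        while True:
            self._spin.acquire()
            if not self._blocking:
                return
            # A holder switched to blocking mode: drop the spinlock and sleep.
            self._spin.release()
            with self._wq:
                self._wq.wait_for(lambda: not self._blocking)

    def set_lock_blocking(self):
        """Holder only: set the bit, then drop the spinlock."""
        self._blocking = True
        self._spin.release()                # still logically locked

    def clear_lock_blocking(self):
        """Holder only: retake the spinlock and clear the bit."""
        self._spin.acquire()
        self._blocking = False
        with self._wq:
            self._wq.notify_all()

    def tree_unlock(self):
        """Works for both states, based on the blocking bit."""
        if self._blocking:
            self._blocking = False
            with self._wq:
                self._wq.notify_all()
        else:
            self._spin.release()
```

In this model a holder that never sleeps pays only for the cheap lock, while a holder that needs to schedule calls `set_lock_blocking()` first so that waiters sleep instead of spinning.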

Btrfs: selinux support · 0279b4cd

Committed by Jim Owens on Feb 04, 2009

Add call to LSM security initialization and save
resulting security xattr for new inodes.

Add xattr support to symlink inode ops.

Set inode->i_op for existing special files.
Signed-off-by: jim owens <jowens@hp.com>

0279b4cd

29 Jan, 2009 1 commit

Btrfs: fix readdir on 32 bit machines · 89f135d8

Committed by Chris Mason on Jan 28, 2009

After btrfs_readdir has gone through all the directory items, it
sets the directory f_pos to the largest possible int.  This way
applications that mix readdir with creating new files don't
end up in an endless loop finding the new directory items as they go.

It was a workaround for a bug in git, but the assumption was that if git
could make this looping mistake then it would be a common problem.

The largest possible int chosen was INT_LIMIT(typeof(file->f_pos)),
and it is possible for that to be a larger number than 32 bit glibc
expects to come out of readdir.

This patch switches that to INT_LIMIT(off_t), which should keep
applications happy on 32 and 64 bit machines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

89f135d8
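
The size mismatch behind 89f135d8 is easy to check with arithmetic. The sketch below assumes that INT_LIMIT(type) yields the largest value of a signed two's-complement integer of that type's width, that f_pos is 64 bits wide, and that off_t is 32 bits wide on a 32 bit kernel. The function names are illustrative.

```python
def int_limit(nbytes):
    """Largest value of a signed two's-complement integer nbytes wide."""
    return (1 << (nbytes * 8 - 1)) - 1


def fits_in_32bit_off_t(pos):
    """True if pos can be represented in a signed 32 bit off_t."""
    return pos <= int_limit(4)


END_POS_OLD = int_limit(8)   # assumed width of f_pos: 64 bits
END_POS_NEW = int_limit(4)   # assumed width of off_t on 32 bit: 32 bits
```

Under these assumptions the old end-of-directory position does not fit in what a 32 bit userland expects, while the new one does.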

22 Jan, 2009 2 commits

Btrfs: fiemap support · 1506fcc8

Committed by Yehuda Sadeh on Jan 21, 2009

Now that bmap support is gone, this is the only way to get extent
mappings for userland. These are still not valid for IO, but they
can tell us if a file has holes or how much fragmentation there is.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

1506fcc8

Btrfs: stop providing a bmap operation to avoid swapfile corruptions · 35054394

Committed by Chris Mason on Jan 21, 2009

Swapfiles use bmap to build a list of extents belonging to the file,
and they assume these extents won't change over the life of the file.
They also use the resulting list to do IO directly to the block device.

This causes problems for btrfs in a few ways:

btrfs returns logical block numbers through bmap, and these are not suitable
for IO.  They might translate to different devices, raid etc.

COW means that file block mappings are going to change frequently.

Using swapfiles on btrfs will lead to corruption, so we're avoiding the
problem for now by dropping bmap support entirely.  A later commit
will add fiemap support for people that really want to know how
a file is laid out.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

35054394

21 Jan, 2009 2 commits

Btrfs: simplify iteration codes · c6e30871

Committed by Qinghuang Feng on Jan 21, 2009

Merge list_for_each* and list_entry to list_for_each_entry*
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c6e30871

Btrfs: removed unused #include <version.h>'s · 7eaebe7d

Committed by Huang Weiyi on Jan 21, 2009

Removed unused #include <version.h>'s in btrfs.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7eaebe7d

07 Jan, 2009 3 commits

Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook · 9ab86c8e

Committed by Chris Mason on Jan 07, 2009

None of the checksum verification code schedules, so we can use the faster
kmap_atomic.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9ab86c8e

Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies · cc7172de

Committed by Chris Mason on Jan 06, 2009

Checksum verification happens in a helper thread, and there is no
need to mess with interrupts.  This switches to kmap() instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cc7172de

Btrfs: tree logging checksum fixes · 07d400a6

Committed by Yan Zheng on Jan 06, 2009

This patch contains the following changes.

1) Limit the max size of btrfs_ordered_sum structure to PAGE_SIZE.  This
struct is kmalloced so we want to keep it reasonable.

2) Replace copy_extent_csums by btrfs_lookup_csums_range.  This was
duplicated code in tree-log.c

3) Remove replay_one_csum. csum items are replayed at the same time as
   replaying file extents. This guarantees we only replay useful csums.

4) nbytes accounting fix.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

07d400a6

06 Jan, 2009 2 commits

Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation · 180591bc

Committed by Yan Zheng on Jan 06, 2009

Snapshot creation happens at a specific time during transaction commit.  We
need to make sure the code called by snapshot creation doesn't wait
for the running transaction to commit.

This changes btrfs_delete_inode and finish_pending_snaps to use
btrfs_join_transaction instead of btrfs_start_transaction to avoid deadlocks.

It would be better if btrfs_delete_inode didn't use the join, but the
call path that triggers it is:

btrfs_commit_transaction->create_pending_snapshots->
create_pending_snapshot->btrfs_lookup_dentry->
fixup_tree_root_location->btrfs_read_fs_root->
btrfs_read_fs_root_no_name->btrfs_orphan_cleanup->iput

This will be fixed in a later patch by moving the orphan cleanup to the
cleaner thread.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

180591bc

Btrfs: Fix checkpatch.pl warnings · d397712b

Committed by Chris Mason on Jan 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d397712b

18 Dec, 2008 1 commit

Btrfs: shift all end_io work to thread pools · cad321ad

Committed by Chris Mason on Dec 17, 2008

bio_end_io for reads without checksumming on and btree writes were
happening without using async thread pools.  This means the extent_io.c
code had to use spin_lock_irq and friends on the rb tree locks for
extent state.

There were some irq safe vs unsafe lock inversions between the delalloc
lock and the extent state locks.  This patch gets rid of them by moving
all end_io code into the thread pools.

To avoid contention and deadlocks between the data end_io processing and the
metadata end_io processing yet another thread pool is added to finish
off metadata writes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cad321ad

16 Dec, 2008 2 commits

Btrfs: Don't use spin*lock_irq for the delalloc lock · 75eff68e

Committed by Chris Mason on Dec 15, 2008

The delalloc lock doesn't need to have irqs disabled; nobody that
changes the number of delalloc bytes in the FS is running with irqs off.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

75eff68e

Btrfs: Fix compressed writes on truncated pages · 42dc7bab

Committed by Chris Mason on Dec 15, 2008

The compression code was using i_size to limit the amount of data it
sent through zlib.  But, it wasn't properly limiting the looping to
just the pages inside i_size.  The end result was trying to compress
too many pages, including those that had not been set up and properly locked
down.  This made the compression code oops while trying find_get_page on a
page that didn't exist.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

42dc7bab
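
The kind of clamping that 42dc7bab describes (looping only over pages inside i_size) can be illustrated with a small calculation. This is a hypothetical helper, not the actual fix: `pages_to_compress`, the inclusive `end` convention, and the 4096-byte page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size


def pages_to_compress(start, end, i_size):
    """Count pages in the byte range [start, end] that lie inside i_size.

    end is inclusive. Pages that start at or beyond i_size are never
    counted, so a caller looping over the result cannot touch pages
    that were never set up.
    """
    actual_end = min(end + 1, i_size)     # exclusive end, clamped to i_size
    if actual_end <= start:
        return 0                          # range lies entirely past i_size
    return (actual_end - start + PAGE_SIZE - 1) // PAGE_SIZE
```

A result of zero corresponds to the case where the later commit f03d9301 falls back to the non-compression path.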

12 Dec, 2008 2 commits

Btrfs: fix nodatasum handling in balancing code · 17d217fe

Committed by Yan Zheng on Dec 12, 2008

Checksums on data can be disabled by a mount option, so it's
possible that some data extents don't have checksums or have
invalid checksums. This causes trouble for data relocation.
This patch makes the following changes so that data relocation
works:

1) Make the nodatasum/nodatacow mount options affect only new
files. Checksums and COW on data are controlled only by the
inode flags.

2) Check for the existence of checksums in the nodatacow checker.
If checksums exist, force COW of the data extent. This ensures that
the checksum for a given block is either valid or does not exist.

3) Update the data relocation code to properly handle
missing checksums.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

17d217fe
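
Item 2 of 17d217fe amounts to a small decision rule, sketched here with hypothetical names. `csum_offsets` models the set of byte offsets that currently have a checksum; how the kernel looks these up is not shown on this page.

```python
def has_csums(csum_offsets, start, length):
    """True if any checksum exists for a byte in [start, start + length)."""
    return any(start <= off < start + length for off in csum_offsets)


def must_cow(inode_nodatacow, csum_offsets, start, length):
    """Decide whether a write to the range has to go through COW."""
    if not inode_nodatacow:
        return True                   # normal files always COW
    # nodatacow inode: an in-place overwrite would leave any existing
    # checksum stale, so COW is forced when checksums are present.
    return has_csums(csum_offsets, start, length)
```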

Btrfs: fix leaking block group on balance · d2fb3437

Committed by Yan Zheng on Dec 11, 2008

The block group structs are referenced in many different
places, and it's not safe to free while balancing.  So, those block
group structs were simply leaked instead.

This patch replaces the block group pointer in the inode with the starting byte
offset of the block group and adds reference counting to the block group
struct.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

d2fb3437

09 Dec, 2008 2 commits

Btrfs: Add inode sequence number for NFS and reserved space in a few structs · c3027eb5

Committed by Chris Mason on Dec 08, 2008

This adds a sequence number to the btrfs inode that is increased on
every update.  NFS will be able to use that to detect when an inode has
changed, without relying on inaccurate time fields.

While we're here, this also:

Puts reserved space into the super block and inode

Adds a log root transid to the super so we can pick the newest super
based on the fsync log as well as the main transaction ID.  For now
the log root transid is always zero, but that'll get fixed.

Adds a starting offset to the dev_item.  This will let us do better
alignment calculations if we know the start of a partition on the disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c3027eb5

Btrfs: move data checksumming into a dedicated tree · d20f7043

Committed by Chris Mason on Dec 08, 2008

Btrfs stores checksums for each data block.  Until now, they have
been stored in the subvolume trees, indexed by the inode that is
referencing the data block.  This means that when we read the inode,
we've probably read in at least some checksums as well.

But, this has a few problems:

* The checksums are indexed by logical offset in the file.  When
compression is on, this means we have to do the expensive checksumming
on the uncompressed data.  It would be faster if we could checksum
the compressed data instead.

* If we implement encryption, we'll be checksumming the plain text and
storing that on disk.  This is significantly less secure.

* For either compression or encryption, we have to get the plain text
back before we can verify the checksum as correct.  This makes the raid
layer balancing and extent moving much more expensive.

* It makes the front end caching code more complex, as we have to touch
the subvolume and inodes as we cache extents.

* There is potentially one copy of the checksum in each subvolume
referencing an extent.

The solution used here is to store the extent checksums in a dedicated
tree.  This allows us to index the checksums by physical extent
start and length.  It means:

* The checksum is against the data stored on disk, after any compression
or encryption is done.

* The checksum is stored in a central location, and can be verified without
following back references, or reading inodes.

This makes compression significantly faster by reducing the amount of
data that needs to be checksummed.  It will also allow much faster
raid management code in general.

The checksums are indexed by a key with a fixed objectid (a magic value
in ctree.h) and offset set to the starting byte of the extent.  This
allows us to copy the checksum items into the fsync log tree directly (or
any other tree), without having to invent a second format for them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d20f7043
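
The indexing scheme from d20f7043 can be sketched with a dictionary standing in for the checksum tree. Assumptions: `CSUM_OBJECTID` is a placeholder for the fixed magic objectid (its real value is not given here), `zlib.crc32` stands in for the checksum function, `zlib.compress` stands in for the compression step, and 4096-byte blocks are assumed.

```python
import zlib

CSUM_OBJECTID = "CSUM"   # placeholder for the fixed magic objectid
BLOCK = 4096             # assumed checksum block size


def csum_blocks(buf):
    """One checksum per block of the bytes as they are stored on disk."""
    return [zlib.crc32(buf[i:i + BLOCK]) for i in range(0, len(buf), BLOCK)]


def store_extent(tree, disk_start, on_disk):
    """Index the checksums by (fixed objectid, starting byte of the extent)."""
    tree[(CSUM_OBJECTID, disk_start)] = csum_blocks(on_disk)


def verify_extent(tree, disk_start, on_disk):
    """Verify without touching inodes or decompressing the data."""
    return tree.get((CSUM_OBJECTID, disk_start)) == csum_blocks(on_disk)
```

Because the key depends only on where the extent lives on disk, the same item could be copied into another tree unchanged, and compressed extents need fewer blocks checksummed than their plain-text form.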

02 Dec, 2008 3 commits

Btrfs: delete unused function: btrfs_invalidate_dcache_root · 4022abf4

Committed by Chris Mason on Dec 02, 2008

Snapshot and subvolume creation no longer need this helper.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4022abf4

Btrfs: make things static and include the right headers · b2950863

Committed by Christoph Hellwig on Dec 02, 2008

Shut up various sparse warnings about symbols that should be either
static or have their declarations in scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b2950863

Btrfs: Fix cow semantic in run_delalloc_nocow() · ce397c06

Committed by Liu Hui on Dec 01, 2008

The file preallocation code reversed the logic to force nodatacow.
This fixes it.

ce397c06

20 Nov, 2008 3 commits

Btrfs: compat code fixes · 4b4e25f2

Committed by Chris Mason on Nov 20, 2008

The btrfs git kernel tree is used to build a standalone tree for
compiling against older kernels.  This commit makes the standalone tree
work with 2.6.27.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b4e25f2

Btrfs: Use current_fsuid/gid · 79683f2d

Committed by Chris Mason on Nov 19, 2008

This fixes compile problems with linux-next.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

79683f2d

Btrfs: Avoid writeback stalls · d2c3f4f6

Committed by Chris Mason on Nov 19, 2008

While building large bios in writepages, btrfs may end up waiting
for other page writeback to finish if WB_SYNC_ALL is used.

While it is waiting, the bio it is building has a number of pages with the
writeback bit set and they aren't getting to the disk any time soon. This
commit lowers writeback latencies in general by sending down the bio being
built before waiting for other pages.

The bio submission code tries to limit the total number of async bios in
flight by waiting when we're over a certain number of async bios. But,
the waits are happening while writepages is building bios, and this can easily
lead to stalls and other problems for people calling wait_on_page_writeback.

The current fix is to let the congestion tests take care of waiting.

sync() and others make sure to drain the current async requests to make
sure that everything that was pending when the sync was started really gets
to disk. The code would drain pending requests both before and after
submitting a new request.

But, if one of the requests is waiting for page writeback to finish,
the draining waits might block that page writeback. This changes the
draining code to only wait after submitting the bio being processed.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d2c3f4f6

18 Nov, 2008 3 commits

Btrfs: Add backrefs and forward refs for subvols and snapshots · 0660b5af

Committed by Chris Mason on Nov 17, 2008

Subvols and snapshots can now be referenced from any point in the directory
tree.  We need to maintain back refs for them so we can find lost
subvols.

Forward refs are added so that we know all of the subvols and
snapshots referenced anywhere in the directory tree of a single subvol.  This
can be used to do recursive snapshotting (but they aren't yet) and it is
also used to detect and prevent directory loops when creating new snapshots.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0660b5af

Btrfs: Give each subvol and snapshot their own anonymous devid · 3394e160

Committed by Chris Mason on Nov 17, 2008

Each subvolume has its own private inode number space, and so we need
to fill in different device numbers for each subvolume to avoid confusing
applications.

This commit puts a struct super_block into struct btrfs_root so it can
call set_anon_super() and get a different device number generated for
each root.

btrfs_rename is changed to prevent renames across subvols.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3394e160

Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c

Committed by Chris Mason on Nov 17, 2008

Before, all snapshots and subvolumes lived in a single flat directory.  This
was awkward and confusing because the single flat directory was only writable
with the ioctls.

This commit changes the ioctls to create subvols and snapshots at any
point in the directory tree.  This requires making separate ioctls for
snapshot and subvol creation instead of combining them into one.

The subvol ioctl does:

btrfsctl -S subvol_name parent_dir

After the ioctl is done subvol_name lives inside parent_dir.

The snapshot ioctl does:

btrfsctl -s path_for_snapshot root_to_snapshot

path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
into directory and basename components.

root_to_snapshot can be any file or directory in the FS.  The snapshot
is taken of the entire root where that file lives.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3de4586c
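
The path handling mentioned in 3de4586c (splitting path_for_snapshot into directory and basename components) can be illustrated as below. This is only a sketch of the idea on POSIX paths. btrfsctl itself is a C program whose code is not shown here, and `split_snapshot_path` is a hypothetical name.

```python
import posixpath


def split_snapshot_path(path):
    """Split an absolute or relative path into (directory, basename)."""
    directory, name = posixpath.split(posixpath.normpath(path))
    # A bare name such as "daily" lives in the current directory.
    return (directory or "."), name
```

For example, `split_snapshot_path("/mnt/btrfs/snaps/daily")` gives the directory in which the snapshot entry is created and the name it receives.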

13 Nov, 2008 1 commit

Btrfs: mount ro and remount support · c146afad

Committed by Yan Zheng on Nov 12, 2008

This patch adds mount ro and remount support. The main
changes in this patch are: adding btrfs_remount and a related
helper function; splitting the transaction-related code
out of close_ctree into btrfs_commit_super; and updating the
allocator to properly handle read-only block groups.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>

c146afad

11 Nov, 2008 2 commits

Btrfs: Fix compile warnings on 32 bit machines · 5b050f04

Committed by Chris Mason on Nov 11, 2008

Simple casting here and there to fix things up.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5b050f04

Btrfs: Fix usage of struct extent_map->orig_start · 445a6944

Committed by Chris Mason on Nov 10, 2008

This makes sure the orig_start field in struct extent_map gets set
everywhere the extent_map structs are created or modified.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

445a6944

openanolis / cloud-kernel: synced successfully about 1 year ago
