- 13 January 2012 (1 commit)

Committed by Mel Gorman
Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check:

    if (PageDirty(page) && !sync &&
        mapping->a_ops->migratepage != migrate_page)
            rc = -EBUSY;

This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
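For illustration, a minimal sketch of the contract the new parameter creates for a filesystem's ->migratepage implementation. The function name example_migratepage is made up for the sketch and the hint is shown as a plain bool; the in-tree callbacks and their exact signatures may differ.

    #include <linux/fs.h>
    #include <linux/mm.h>

    /*
     * Sketch: a ->migratepage callback that honours the "sync" hint.
     * Asynchronous compaction passes sync == false and must not block,
     * so a page whose dirty data would force us to wait is refused with
     * -EBUSY instead of sleeping.
     */
    static int example_migratepage(struct address_space *mapping,
                                   struct page *newpage, struct page *page,
                                   bool sync)
    {
            if (PageDirty(page) && !sync)
                    return -EBUSY;  /* tell async compaction to skip this page */

            /* ... migrate page contents and metadata here ... */
            return 0;               /* success */
    }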
- 16 December 2011 (1 commit)

Committed by Josef Bacik
Al pointed out we have some random problems with the way we account for num_workers_starting in the async thread stuff. First of all we need to make sure to decrement num_workers_starting if we fail to start the worker, so make __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we failed to create a worker.

Also check_pending_worker_creates needs to call __btrfs_start_work in its work function since it already increments num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument everywhere, and make btrfs_queue_worker a void since it will always succeed. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
- 22 November 2011 (1 commit)

Committed by Tejun Heo
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or remove any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
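Roughly, the consolidated helper looks like the sketch below. It is paraphrased from the description above rather than copied from the tree, and the freezing()/__refrigerator() prototypes are simplified for the sketch; the real ones live in freezer.h and have changed over time.

    #include <linux/kernel.h>   /* might_sleep() */
    #include <linux/sched.h>    /* current, struct task_struct */

    /* Simplified prototypes, declared here only to keep the sketch self-contained. */
    extern bool freezing(struct task_struct *p);
    extern bool __refrigerator(void);

    /* Sketch of the single entry point the changelog describes. */
    static inline bool example_try_to_freeze(void)
    {
            might_sleep();                  /* entering the refrigerator may schedule */
            if (!freezing(current))
                    return false;           /* common fast path: not being frozen */
            return __refrigerator();        /* true if we actually scheduled out frozen */
    }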
- 20 November 2011 (2 commits)

Committed by Jan Schmidt
My previous patch introduced some u64 failed_mirror variables; this one makes the type consistent again.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
When btrfs is writing the super blocks, it sends barrier flushes to make sure writeback caching drives get all the metadata on disk in the right order. But we have two bugs in the way these are sent down.

When doing full commits (not via the tree log), we are sending the barrier down before the last super, when it should be going down before the first. In multi-device setups, we should be waiting for the barriers to complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures. We fix it with some new code to send down empty barriers to all devices before writing the first super.

Alexandre Oliva found the multi-device bug. Arne Jansen did the async barrier loop.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
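A rough sketch of the corrected flow. The helpers example_send_empty_flush() and example_wait_for_flush() are made up for illustration, and the sketch assumes btrfs's internal headers for struct btrfs_fs_info and struct btrfs_device; the real patch does this with empty flush bios inside the superblock write path.

    #include <linux/list.h>
    #include <linux/errno.h>

    /* Hypothetical helpers: issue an empty flush bio / wait for it to finish. */
    int example_send_empty_flush(struct btrfs_device *dev);
    int example_wait_for_flush(struct btrfs_device *dev);

    /*
     * Sketch: flush every device and wait for all of them *before* the first
     * super block is written, instead of flushing only before the last one.
     */
    static int example_barrier_all_devices(struct btrfs_fs_info *info)
    {
            struct btrfs_device *dev;
            int errors = 0;

            /* Fire off the empty barriers to every device... */
            list_for_each_entry(dev, &info->fs_devices->devices, dev_list)
                    if (example_send_empty_flush(dev))
                            errors++;

            /* ...then wait everywhere before any super goes out. */
            list_for_each_entry(dev, &info->fs_devices->devices, dev_list)
                    if (example_wait_for_flush(dev))
                            errors++;

            return errors ? -EIO : 0;
    }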
- 10 November 2011 (2 commits)

Committed by Ilya Dryomov
Fix a bug introduced by 7e662854 where we would leave devices busy on certain error paths in open_ctree(). fs_info is guaranteed to be non-NULL now so it's safe to dereference it on all error paths.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Committed by Ilya Dryomov
Fix bugs introduced by 6c41761f. Firstly, after failing to allocate any of the tree roots (first 'goto fail' in open_ctree()) we would dereference a NULL fs_info pointer in free_fs_info(). Secondly, after failures from init_srcu_struct(), setup_bdi() and new_inode() we would leak all earlier allocated roots: fs_info fields haven't been initialized yet so free_fs_info() is rendered useless.

Fix this by initializing the fs_info pointer and fs_info fields before any allocations happen.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- 07 November 2011 (1 commit)

Committed by Chris Mason
During log replay, we can commit the transaction before the fs_root pointers are set up, so we have to make sure they are not NULL before trying to use them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
- 06 November 2011 (6 commits)

Committed by Chris Mason
If we don't stop them, they linger around corrupting memory by using pointers to freed things.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
The scrub readahead branch brought in a new error handling hook, but it was leaking extent_buffer references.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Josef Bacik
I've been hitting warnings in use_block_rsv when running the delayed insertion stuff. It's because we will readjust the global block rsv based on what is in use, which means we could end up discarding reservations that are for the delayed insertion stuff.

So instead create a separate block rsv for the delayed insertion stuff. This will also make it easier to debug problems with the delayed insertion reservations since we will know that only the delayed insertion code touches this block_rsv. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
This takes some of the free space in the btrfs super block to record information about most of the roots in the last four commits.

It also adds a -o recovery to use the root history log when we're not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by David Sterba
fs_info is now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. The top space consumers are the super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically; fs_info is then ~3.5kb (measured on x86_64).

Add a wrapper for freeing fs_info and all of its dynamically allocated members.

Signed-off-by: David Sterba <dsterba@suse.cz>
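A condensed sketch of the allocation/teardown pattern this describes. The function name example_alloc_fs_info() is made up, error handling is simplified, and the sketch assumes btrfs's internal headers for struct btrfs_fs_info and the new free_fs_info() wrapper.

    #include <linux/slab.h>

    /*
     * Sketch: the two ~2.8kb super block copies move out of struct
     * btrfs_fs_info into their own allocations, and free_fs_info()
     * releases the members together with fs_info itself.
     */
    static struct btrfs_fs_info *example_alloc_fs_info(void)
    {
            struct btrfs_fs_info *fs_info = kzalloc(sizeof(*fs_info), GFP_NOFS);

            if (!fs_info)
                    return NULL;

            fs_info->super_copy = kzalloc(sizeof(*fs_info->super_copy), GFP_NOFS);
            fs_info->super_for_commit = kzalloc(sizeof(*fs_info->super_for_commit),
                                                GFP_NOFS);
            if (!fs_info->super_copy || !fs_info->super_for_commit) {
                    free_fs_info(fs_info);  /* frees allocated members, then fs_info */
                    return NULL;
            }
            return fs_info;
    }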
Committed by Chris Mason
write_cache_pages tries to build up a large bio to stuff down the pipe. But if it needs to wait for a page lock, it needs to make sure and send down any pending writes so we don't deadlock with anyone who has the page lock and is waiting for writeback of things inside the bio.

Dave Sterba triggered this as a deadlock between the autodefrag code and the extent write_cache_pages.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
- 02 November 2011 (1 commit)

Committed by Miklos Szeredi
Replace remaining direct i_nlink updates with a new set_nlink() updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
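The conversion at each call site is mechanical; a before/after fragment as a sketch (found_nlink is a placeholder for whatever link count the filesystem read from disk):

    /* Before: poke the field directly. */
    inode->i_nlink = found_nlink;

    /* After: go through the helper so the VFS owns all i_nlink updates. */
    set_nlink(inode, found_nlink);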
- 20 October 2011 (3 commits)

Committed by Josef Bacik
One of the things that kills us is the fact that our ENOSPC reservations are horribly over the top in most normal cases. There isn't too much that can be done about this because when we are completely full we really need them to work like this so we don't under-reserve.

However if there is plenty of unallocated chunks on the disk we can use that to gauge how much we can overcommit. So this patch adds chunk free space accounting so we always know how much unallocated space we have. Then if we fail to make a reservation within our allocated space, check to see if we can overcommit.

In the normal flushing case (like with delalloc metadata reservations) we'll take the free space and divide it by 2 if our metadata profile is setup for DUP or any of those, and then divide it by 8 to make sure we don't overcommit too much. Then if we're in a non-flushing case (we really need this reservation now!) we only limit ourselves to half of the free space.

This makes this fio test

    [torrent]
    filename=torrent-test
    rw=randwrite
    size=4g
    ioengine=sync
    directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB file system. This doesn't seem to break my other enospc tests, but could really use some more testing as this is a super scary change. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
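The arithmetic described above, pulled out into a standalone sketch. This is one reading of the description (the DUP halving is applied in both cases here); the real code works on btrfs space_info/reservation structures and the function and parameter names below are made up.

    #include <linux/types.h>

    /*
     * Sketch of the overcommit limit: halve the unallocated space for
     * DUP-style metadata profiles (everything is written twice), then be
     * cautious (divide by 8) when we can still flush, or allow up to half
     * when the caller cannot wait.
     */
    static u64 example_overcommit_limit(u64 unallocated, bool profile_is_dup,
                                        bool can_flush)
    {
            u64 avail = unallocated;

            if (profile_is_dup)
                    avail /= 2;

            return can_flush ? avail / 8 : avail / 2;
    }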
Committed by Josef Bacik
In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes back. So instead don't evict the cache until after we do our last commit; this will make things a little less crappy and makes a future enospc change work properly. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Committed by Josef Bacik
This is confusing code and isn't used by anything anymore, so delete it.

Signed-off-by: Josef Bacik <josef@redhat.com>
- 02 October 2011 (4 commits)

Committed by Arne Jansen
This adds the hooks needed for readahead. In the readpage_end_io_hook, the extent state is checked for the EXTENT_READAHEAD flag. Only in this case is the readahead hook called, to keep the impact on non-ra as low as possible. Additionally, a hook for a failed IO is added; otherwise readahead would wait indefinitely for the extent to finish.

Changes for v2:
- eliminate race condition

Signed-off-by: Arne Jansen <sensille@gmx.net>
Committed by Arne Jansen
Add state information for readahead to btrfs_fs_info and btrfs_device.

Changes v2:
- don't wait in radix_trees
- add own set of workers for readahead

Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Arne Jansen <sensille@gmx.net>
Committed by Arne Jansen
Add a READAHEAD extent buffer flag. Add a function to trigger a read with this flag set.

Changes v2:
- use extent buffer flags instead of extent state flags

Changes v5:
- adapt to changed read_extent_buffer_pages interface
- don't return eb from reada_tree_block_flagged if it has CORRUPT flag set

Signed-off-by: Arne Jansen <sensille@gmx.net>
Committed by Arne Jansen
read_extent_buffer_pages currently has two modes: either trigger a read without waiting for anything, or wait for the I/O to finish. The former also bails when it's unable to lock the page. This patch now adds an additional parameter to allow it to block on the page lock but not wait for completion.

Changes v5:
- merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and WAIT_PAGE_LOCK

Change v6:
- fix bug introduced in v5

Signed-off-by: Arne Jansen <sensille@gmx.net>
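For reference, a sketch of the three modes the merged parameter selects (the numeric values are illustrative; the point is that the caller picks exactly one blocking behaviour):

    /* Sketch of the merged "wait" parameter described above. */
    #define WAIT_NONE       0   /* kick off the read, don't block at all     */
    #define WAIT_COMPLETE   1   /* block until the read I/O has finished     */
    #define WAIT_PAGE_LOCK  2   /* block on the page lock, not on completion */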
- 29 September 2011 (1 commit)

Committed by Jan Schmidt
Currently, extent_read_full_page always assumes we are trying to read mirror 0, which generally is the best we can do. To add flexibility, pass it as a parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
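In prototype form the change is roughly the following, sketched from the description above (argument names may differ from the tree); normal reads keep passing mirror 0.

    /* Before (sketch): mirror 0 is implied. */
    int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
                              get_extent_t *get_extent);

    /* After (sketch): the caller chooses which mirror to read. */
    int extent_read_full_page(struct extent_io_tree *tree, struct page *page,
                              get_extent_t *get_extent, int mirror_num);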
- 28 July 2011 (3 commits)

Committed by Chris Mason
This patch was originally from Tejun Heo. lockdep complains about the btrfs locking because we sometimes take btree locks from two different trees at the same time. The current classes are based only on level in the btree, which isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that actually have files and directories into the same class.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
The extent_buffers have a very complex interface where we use HIGHMEM for metadata and try to cache a kmap mapping to access the memory. The next commit adds reader/writer locks, and concurrent use of this kmap cache would make it even more complex. This commit drops the ability to use HIGHMEM with extent buffers, and rips out all of the related code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Josef Bacik
A user reported a deadlock when copying a bunch of files. This is because they were low on memory and kthreadd got hung up trying to migrate pages for an allocation when starting the caching kthread. The page was locked by the person starting the caching kthread.

To fix this we just need to use the async thread stuff so that the threads are already created and we don't have to worry about deadlocks. Thanks,

Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Josef Bacik <josef@redhat.com>
- 20 July 2011 (1 commit)

Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- 18 June 2011 (3 commits)

Committed by David Sterba
When allocation fails in btrfs_read_fs_root_no_name, ret is not set although it is returned, holding a garbage value.

Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Maarten Lankhorst
Removes code no longer used. The sysfs file itself is kept, because the btrfs developers expressed interest in putting new entries to sysfs.

Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
The recent commit to get rid of our trans_mutex introduced some races with block group relocation. The problem is that relocation needs to do some record keeping about each root, and it was relying on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure it doesn't have a big impact on normal operations. The race is really fixed in btrfs_record_root_in_trans, which is where we step back and wait for the relocation code to finish accounting setup.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
- 13 June 2011 (1 commit)

Committed by Chris Mason
Al Viro noticed we weren't checking for set_anon_super failures. This adds the required checks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
- 11 June 2011 (1 commit)

Committed by David Sterba
list_splice_init will make delalloc_inodes empty, but without a spinlock around it, this may produce a corrupted list head that is accessed in many places. The race window is very tight and nobody seems to have hit it so far.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
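The shape of the fix is simply to take the lock that already guards the list around the splice as well. A sketch, assuming the delalloc list hangs off fs_info and is protected by fs_info->delalloc_lock (an assumption of this sketch; it also relies on btrfs's internal headers for struct btrfs_root):

    #include <linux/list.h>
    #include <linux/spinlock.h>

    static void example_grab_delalloc_inodes(struct btrfs_root *root,
                                             struct list_head *splice)
    {
            /*
             * Splice under the spinlock so concurrent walkers of the
             * delalloc list never see a half-updated list head.
             */
            spin_lock(&root->fs_info->delalloc_lock);
            list_splice_init(&root->fs_info->delalloc_inodes, splice);
            spin_unlock(&root->fs_info->delalloc_lock);
    }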
- 10 June 2011 (1 commit)

Committed by Arne Jansen
Scrub starts the workers each time a scrub starts and stops them after it finishes. This patch adds an initialization for the workers before each start; otherwise the workers behave strangely.

Signed-off-by: Arne Jansen <sensille@gmx.net>
- 27 May 2011 (2 commits)

Committed by Chris Mason
write_dev_supers was changed to use RCU to protect the list of devices, but it was then sleeping while it actually wrote the supers. This fixes it to just use the mutex, since we really don't need any concurrency in write_dev_supers anyway.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Chris Mason
This will detect small random writes into files and queue them up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
- 24 May 2011 (5 commits)

Committed by Xiao Guangrong
fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
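The resulting reader pattern, sketched: walk the list under rcu_read_lock() while the add/remove paths keep updating it under the existing mutex. example_device_congested() is a hypothetical check, and the sketch assumes btrfs's internal headers for struct btrfs_fs_devices and struct btrfs_device.

    #include <linux/rcupdate.h>
    #include <linux/list.h>

    /* Hypothetical per-device check, standing in for whatever the reader does. */
    bool example_device_congested(struct btrfs_device *device);

    /* Reader side (sketch): e.g. congestion checks that only peek at devices. */
    static bool example_any_device_congested(struct btrfs_fs_devices *fs_devices)
    {
            struct btrfs_device *device;
            bool congested = false;

            rcu_read_lock();
            list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
                    /* read-only peek at *device; no sleeping allowed here */
                    if (example_device_congested(device))
                            congested = true;
            }
            rcu_read_unlock();

            return congested;
    }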
Committed by Xiao Guangrong
On the btrfs_congested_fn and __unplug_io_fn paths, we should hold device_list_mutex to keep a concurrent remove/add device path from updating fs_devices->devices.

On the __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in fs_devices->devices or fs_devices->devices is updated, so we should hold the mutex to keep the reader side from reaching them.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Andi Kleen
240f62c8 replaced the node_lock with rcu_read_lock, but forgot to remove the actual lock in the data structure. Remove it here.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Committed by Josef Bacik
We use trans_mutex for lots of things; here's a basic list:

1) To serialize trans_handles joining the currently running transaction
2) To make sure that no new trans handles are started while we are committing
3) To protect the dead_roots list and the transaction lists

Really the serializing of trans_handles joining is not too hard, and can really get bogged down in acquiring a reference to the transaction. So replace the trans_mutex with a trans_lock spinlock and use it to do the following:

1) Protect fs_info->running_transaction. All trans handles have to do is check this, and then take a reference of the transaction and keep on going.
2) Protect the fs_info->trans_list. This doesn't get used too much; basically it just holds the current transactions, which will usually just be the currently committing transaction and the currently running transaction at most.
3) Protect the dead roots list. This is only ever processed by splicing the list, so this is relatively simple.
4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using the trans_mutex before, so this is a pretty straightforward change.
5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over the entirety of the commit we need to have a way to block new people from creating a new transaction while we're doing our work. So we set no_trans_join, and in join_transaction we test to see if that is set, and if it is we do a wait_on_commit.
6) Make the transaction use count atomic so we don't need to take locks to modify it when we're dropping references.
7) Add a commit_lock to the transaction to make sure multiple people trying to commit the same transaction don't race and commit at the same time.
8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl trans.

I have tested this with xfstests, but obviously it is a pretty hairy change, so lots of testing is greatly appreciated. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
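Item 1 above, sketched: with the spinlock, joining just means taking a reference to the running transaction and moving on. Field names follow the description; the no_trans_join check and error handling are omitted, and the sketch assumes btrfs's internal headers for struct btrfs_fs_info and struct btrfs_transaction.

    #include <linux/spinlock.h>
    #include <linux/atomic.h>

    /* Sketch: grab the running transaction under the new trans_lock spinlock. */
    static struct btrfs_transaction *
    example_grab_transaction(struct btrfs_fs_info *fs_info)
    {
            struct btrfs_transaction *cur_trans;

            spin_lock(&fs_info->trans_lock);
            cur_trans = fs_info->running_transaction;
            if (cur_trans)
                    atomic_inc(&cur_trans->use_count);  /* use count is now atomic */
            spin_unlock(&fs_info->trans_lock);

            return cur_trans;   /* NULL if no transaction is running */
    }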
Committed by Josef Bacik
I keep forgetting that btrfs_join_transaction() just ignores the num_items argument, which leads me to sending pointless patches and looking stupid :). So just kill the num_items argument from btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither of them use it. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
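At call sites the change is just dropping the ignored argument; a before/after fragment as a sketch:

    /* Before (sketch): the count was silently ignored. */
    trans = btrfs_join_transaction(root, 1);

    /* After: nothing left to get wrong. */
    trans = btrfs_join_transaction(root);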