Commit 24542bf7, authored by Zach Brown, committed by Chris Mason

btrfs: limit fallocate extent reservation to 256MB

Very large fallocate requests are CPU-bound and result in extents with a
repeating pattern of ever-decreasing size:

$ time fallocate -l 1T file
real	0m13.039s

(an excerpt of the extents from btrfs-debug-tree:)
  prealloc data disk byte 1536292564992 nr 397312
  prealloc data disk byte 1536292962304 nr 196608
  prealloc data disk byte 1536293158912 nr 98304
  prealloc data disk byte 1536293257216 nr 49152
  prealloc data disk byte 1536293306368 nr 24576
  prealloc data disk byte 1536293330944 nr 12288
  prealloc data disk byte 1536293343232 nr 8192
  prealloc data disk byte 1536293351424 nr 4096
  prealloc data disk byte 1536293355520 nr 4096
  prealloc data disk byte 1536293359616 nr 4096

The excessive CPU use comes from __btrfs_prealloc_file_range() trying to
allocate the entire remaining size after each extent is allocated.
btrfs_reserve_extent() repeatedly cuts this requested size in half until
it gets down to a size that the allocators can return.  We limit the
problem for now by capping each reservation at 256MB (256 * 1024 * 1024
bytes).
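
A rough userspace model of the halving described above, not the kernel
code itself.  The names halvings_needed(), capped_request() and
RESERVE_CAP are hypothetical, and the 1 GiB "largest free range" used in
the check is an assumed figure chosen only for illustration; in the
kernel each failed attempt also costs a search of the free space, which
this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Cap taken from the patch: 256 * 1024 * 1024 bytes. */
#define RESERVE_CAP (256ULL * 1024 * 1024)

/*
 * Hypothetical model: count how many times a request has to be halved
 * before it is no larger than the biggest range the allocator can return.
 */
unsigned int halvings_needed(uint64_t request, uint64_t largest_free)
{
	unsigned int n = 0;

	while (request > largest_free) {
		request >>= 1;
		n++;
	}
	return n;
}

/* Size passed to the allocator once the cap is applied. */
uint64_t capped_request(uint64_t remaining)
{
	return remaining < RESERVE_CAP ? remaining : RESERVE_CAP;
}

/* Usage: compare an uncapped and a capped 1 TiB request. */
void halving_selfcheck(void)
{
	uint64_t one_tib = 1ULL << 40;
	uint64_t assumed_largest_free = 1ULL << 30;	/* assumption: 1 GiB */

	assert(halvings_needed(one_tib, assumed_largest_free) == 10);
	assert(halvings_needed(capped_request(one_tib),
			       assumed_largest_free) == 0);
}
```

Under this model the uncapped request pays for the failed attempts again
after every extent, because the remaining size is still far larger than
anything the allocator can return.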

The small extents come from a masking bug when decreasing the requested
reservation size.  The mask is computed at 32-bit width, so applying it
to the 64-bit size clears the high 32 bits, and the remaining low bits
might happen to reserve a small size.  Fix this by using round_down(),
which casts the mask to the width of the value being rounded.
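
The masking problem can be reproduced in userspace.  This is a minimal
sketch, assuming a 32-bit sector size and a 64-bit byte count as the
description implies; mask_32bit() and mask_64bit() are hypothetical
helpers, and mask_64bit() only stands in for what round_down() achieves,
it is not the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy form: (sectorsize - 1) is a 32-bit value, so its complement is
 * also 32 bits wide.  It is zero-extended when ANDed with the 64-bit
 * size, which clears bits 32..63 of num_bytes.
 */
uint64_t mask_32bit(uint64_t num_bytes, uint32_t sectorsize)
{
	return num_bytes & ~(sectorsize - 1);
}

/*
 * Fixed form (stand-in for round_down()): widen to 64 bits before
 * complementing, so only the low alignment bits are cleared.
 */
uint64_t mask_64bit(uint64_t num_bytes, uint32_t sectorsize)
{
	return num_bytes & ~((uint64_t)sectorsize - 1);
}

/* Usage: a 512 GiB request plus 12288 bytes, with 4096-byte sectors. */
void mask_selfcheck(void)
{
	uint64_t request = (1ULL << 39) + 12288;

	/* The buggy mask keeps only the low bits: a tiny reservation. */
	assert(mask_32bit(request, 4096) == 12288);
	/* The widened mask leaves the aligned request untouched. */
	assert(mask_64bit(request, 4096) == request);
}
```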

After these fixes, huge fallocate requests are fast and result in nice
large extents:

$ time fallocate -l 1T file
real	0m0.082s

  prealloc data disk byte 1112425889792 nr 268435456
  prealloc data disk byte 1112694325248 nr 268435456
  prealloc data disk byte 1112962760704 nr 268435456

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Parent: 1cba0cdf
@@ -6143,7 +6143,7 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 	if (ret == -ENOSPC) {
 		if (!final_tried) {
 			num_bytes = num_bytes >> 1;
-			num_bytes = num_bytes & ~(root->sectorsize - 1);
+			num_bytes = round_down(num_bytes, root->sectorsize);
 			num_bytes = max(num_bytes, min_alloc_size);
 			if (num_bytes == min_alloc_size)
 				final_tried = true;
...
@@ -7894,8 +7894,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			}
 		}
 
-		ret = btrfs_reserve_extent(trans, root, num_bytes, min_size,
-					   0, *alloc_hint, &ins, 1);
+		ret = btrfs_reserve_extent(trans, root,
+					   min(num_bytes, 256ULL * 1024 * 1024),
+					   min_size, 0, *alloc_hint, &ins, 1);
 		if (ret) {
 			if (own_trans)
 				btrfs_end_transaction(trans, root);
...