- 30 6月, 2017 1 次提交
-
-
由 Qu Wenruo 提交于
Introduce a new parameter, struct extent_changeset for btrfs_qgroup_reserved_data() and its callers. Such extent_changeset was used in btrfs_qgroup_reserve_data() to record which range it reserved in current reserve, so it can free it in error paths. The reason we need to export it to callers is, at buffered write error path, without knowing what exactly which range we reserved in current allocation, we can free space which is not reserved by us. This will lead to qgroup reserved space underflow. Reviewed-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 21 6月, 2017 1 次提交
-
-
由 Filipe Manana 提交于
While punching a hole in a range that is not aligned with the sector size (currently the same as the page size) we can end up leaving an extent map in memory with a length that is smaller then the sector size or with a start offset that is not aligned to the sector size. Both cases are not expected and can lead to problems. This issue is easily detected after the patch from commit a7e3b975 ("Btrfs: fix reported number of inode blocks"), introduced in kernel 4.12-rc1, in a scenario like the following for example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -c "pwrite -S 0xaa -b 100K 0 100K" /mnt/foo $ xfs_io -c "fpunch 60K 90K" /mnt/foo $ xfs_io -c "pwrite -S 0xbb -b 100K 50K 100K" /mnt/foo $ xfs_io -c "pwrite -S 0xcc -b 50K 100K 50K" /mnt/foo $ umount /mnt After the unmount operation we can see several warnings emmitted due to underflows related to space reservation counters: [ 2837.443299] ------------[ cut here ]------------ [ 2837.447395] WARNING: CPU: 8 PID: 2474 at fs/btrfs/inode.c:9444 btrfs_destroy_inode+0xe8/0x27e [btrfs] [ 2837.452108] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button se rio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_gene ric raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.458389] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.459754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.462379] Call Trace: [ 2837.462379] dump_stack+0x68/0x92 [ 2837.462379] __warn+0xc2/0xdd [ 2837.462379] warn_slowpath_null+0x1d/0x1f [ 2837.462379] btrfs_destroy_inode+0xe8/0x27e [btrfs] [ 2837.462379] destroy_inode+0x3d/0x55 [ 2837.462379] evict+0x177/0x17e [ 2837.462379] dispose_list+0x50/0x71 [ 2837.462379] evict_inodes+0x132/0x141 [ 2837.462379] generic_shutdown_super+0x3f/0xeb [ 2837.462379] kill_anon_super+0x12/0x1c [ 2837.462379] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.462379] deactivate_locked_super+0x30/0x68 [ 2837.462379] deactivate_super+0x36/0x39 [ 2837.462379] cleanup_mnt+0x58/0x76 [ 2837.462379] __cleanup_mnt+0x12/0x14 [ 2837.462379] task_work_run+0x77/0x9b [ 2837.462379] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.462379] syscall_return_slowpath+0x196/0x1b9 [ 2837.462379] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.462379] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.462379] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.462379] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.462379] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.462379] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.462379] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.462379] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.519355] ---[ end trace e79345fe24b30b8d ]--- [ 2837.596256] ------------[ cut here ]------------ [ 2837.597625] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5699 btrfs_free_block_groups+0x246/0x3eb [btrfs] [ 2837.603547] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.659372] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.663359] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.663359] Call Trace: [ 2837.663359] dump_stack+0x68/0x92 [ 2837.663359] __warn+0xc2/0xdd [ 2837.663359] warn_slowpath_null+0x1d/0x1f [ 2837.663359] btrfs_free_block_groups+0x246/0x3eb [btrfs] [ 2837.663359] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.663359] ? evict_inodes+0x132/0x141 [ 2837.663359] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.663359] generic_shutdown_super+0x6a/0xeb [ 2837.663359] kill_anon_super+0x12/0x1c [ 2837.663359] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.663359] deactivate_locked_super+0x30/0x68 [ 2837.663359] deactivate_super+0x36/0x39 [ 2837.663359] cleanup_mnt+0x58/0x76 [ 2837.663359] __cleanup_mnt+0x12/0x14 [ 2837.663359] task_work_run+0x77/0x9b [ 2837.663359] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.663359] syscall_return_slowpath+0x196/0x1b9 [ 2837.663359] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.663359] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.663359] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.663359] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.663359] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.663359] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.663359] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.663359] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.739445] ---[ end trace e79345fe24b30b8e ]--- [ 2837.745595] ------------[ cut here ]------------ [ 2837.746412] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5700 btrfs_free_block_groups+0x261/0x3eb [btrfs] [ 2837.747955] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.755395] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.756769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.758526] Call Trace: [ 2837.758925] dump_stack+0x68/0x92 [ 2837.759383] __warn+0xc2/0xdd [ 2837.759383] warn_slowpath_null+0x1d/0x1f [ 2837.759383] btrfs_free_block_groups+0x261/0x3eb [btrfs] [ 2837.759383] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.759383] ? evict_inodes+0x132/0x141 [ 2837.759383] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.759383] generic_shutdown_super+0x6a/0xeb [ 2837.759383] kill_anon_super+0x12/0x1c [ 2837.759383] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.759383] deactivate_locked_super+0x30/0x68 [ 2837.759383] deactivate_super+0x36/0x39 [ 2837.759383] cleanup_mnt+0x58/0x76 [ 2837.759383] __cleanup_mnt+0x12/0x14 [ 2837.759383] task_work_run+0x77/0x9b [ 2837.759383] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.759383] syscall_return_slowpath+0x196/0x1b9 [ 2837.759383] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.759383] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.759383] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.759383] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.759383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.759383] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.759383] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.759383] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.777063] ---[ end trace e79345fe24b30b8f ]--- [ 2837.778235] ------------[ cut here ]------------ [ 2837.778856] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:9825 btrfs_free_block_groups+0x348/0x3eb [btrfs] [ 2837.791385] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [ 2837.797711] CPU: 8 PID: 2474 Comm: umount Tainted: G W 4.10.0-rc8-btrfs-next-43+ #1 [ 2837.798594] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 2837.800118] Call Trace: [ 2837.800515] dump_stack+0x68/0x92 [ 2837.801015] __warn+0xc2/0xdd [ 2837.801471] warn_slowpath_null+0x1d/0x1f [ 2837.801698] btrfs_free_block_groups+0x348/0x3eb [btrfs] [ 2837.801698] close_ctree+0x1dd/0x2e1 [btrfs] [ 2837.801698] ? evict_inodes+0x132/0x141 [ 2837.801698] btrfs_put_super+0x15/0x17 [btrfs] [ 2837.801698] generic_shutdown_super+0x6a/0xeb [ 2837.801698] kill_anon_super+0x12/0x1c [ 2837.801698] btrfs_kill_super+0x16/0x21 [btrfs] [ 2837.801698] deactivate_locked_super+0x30/0x68 [ 2837.801698] deactivate_super+0x36/0x39 [ 2837.801698] cleanup_mnt+0x58/0x76 [ 2837.801698] __cleanup_mnt+0x12/0x14 [ 2837.801698] task_work_run+0x77/0x9b [ 2837.801698] prepare_exit_to_usermode+0x9d/0xc5 [ 2837.801698] syscall_return_slowpath+0x196/0x1b9 [ 2837.801698] entry_SYSCALL_64_fastpath+0xab/0xad [ 2837.801698] RIP: 0033:0x7f3ef3e6b9a7 [ 2837.801698] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2837.801698] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 00007f3ef3e6b9a7 [ 2837.801698] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000556f76a3f910 [ 2837.801698] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 0000000000000015 [ 2837.801698] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f3ef436ce64 [ 2837.801698] R13: 0000000000000000 R14: 0000556f76a39240 R15: 00007ffdd0d8e0e0 [ 2837.818441] ---[ end trace e79345fe24b30b90 ]--- [ 2837.818991] BTRFS info (device sdc): space_info 1 has 7974912 free, is not full [ 2837.819830] BTRFS info (device sdc): space_info total=8388608, used=417792, pinned=0, reserved=0, may_use=18446744073709547520, readonly=0 What happens in the above example is the following: 1) When punching the hole, at btrfs_punch_hole(), the variable tail_len is set to 2048 (as tail_start is 148Kb + 1 and offset + len is 150Kb). This results in the creation of an extent map with a length of 2Kb starting at file offset 148Kb, through find_first_non_hole() -> btrfs_get_extent(). 2) The second write (first write after the hole punch operation), sets the range [50Kb, 152Kb[ to delalloc. 3) The third write, at btrfs_find_new_delalloc_bytes(), sees the extent map covering the range [148Kb, 150Kb[ and ends up calling set_extent_bit() for the same range, which results in splitting an existing extent state record, covering the range [148Kb, 152Kb[ into two 2Kb extent state records, covering the ranges [148Kb, 150Kb[ and [150Kb, 152Kb[. 4) Finally at lock_and_cleanup_extent_if_need(), immediately after calling btrfs_find_new_delalloc_bytes() we clear the delalloc bit from the range [100Kb, 152Kb[ which results in the btrfs_clear_bit_hook() callback being invoked against the two 2Kb extent state records that cover the ranges [148Kb, 150Kb[ and [150Kb, 152Kb[. When called against the first 2Kb extent state, it calls btrfs_delalloc_release_metadata() with a length argument of 2048 bytes. That function rounds up the length to a sector size aligned length, so it ends up considering a length of 4096 bytes, and then calls calc_csum_metadata_size() which results in decrementing the inode's csum_bytes counter by 4096 bytes, so after it stays a value of 0 bytes. Then the same happens when btrfs_clear_bit_hook() is called against the second extent state that has a length of 2Kb, covering the range [150Kb, 152Kb[, the length is rounded up to 4096 and calc_csum_metadata_size() ends up being called to decrement 4096 bytes from the inode's csum_bytes counter, which at that time has a value of 0, leading to an underflow, which is exactly what triggers the first warning, at btrfs_destroy_inode(). All the other warnings relate to several space accounting counters that underflow as well due to similar reasons. A similar case but where the hole punching operation creates an extent map with a start offset not aligned to the sector size is the following: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "fpunch 695K 820K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xaa 1008K 307K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xbb -b 630K 1073K 630K" $SCRATCH_MNT/bar $ xfs_io -c "pwrite -S 0xcc -b 459K 1068K 459K" $SCRATCH_MNT/bar $ umount /mnt During the unmount operation we get similar traces for the same reasons as in the first example. So fix the hole punching operation to make sure it never creates extent maps with a length that is not aligned to the sector size nor with a start offset that is not aligned to the sector size, as this breaks all assumptions and it's a land mine. Fixes: d7781546 ("btrfs: Avoid trucating page or punching hole in a already existed hole.") Cc: <stable@vger.kernel.org> Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 26 4月, 2017 2 次提交
-
-
由 Filipe Manana 提交于
Currently when there are buffered writes that were not yet flushed and they fall within allocated ranges of the file (that is, not in holes or beyond eof assuming there are no prealloc extents beyond eof), btrfs simply reports an incorrect number of used blocks through the stat(2) system call (or any of its variants), regardless of mount options or inode flags (compress, compress-force, nodatacow). This is because the number of blocks used that is reported is based on the current number of bytes in the vfs inode plus the number of dealloc bytes in the btrfs inode. The later covers bytes that both fall within allocated regions of the file and holes. Example scenarios where the number of reported blocks is wrong while the buffered writes are not flushed: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo1 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (259.336 MiB/sec and 66390.0415 ops/sec) $ sync $ xfs_io -c "pwrite -S 0xbb 0 64K" /mnt/sdc/foo1 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (192.308 MiB/sec and 49230.7692 ops/sec) # The following should have reported 64K... $ du -h /mnt/sdc/foo1 128K /mnt/sdc/foo1 $ sync # After flushing the buffered write, it now reports the correct value. $ du -h /mnt/sdc/foo1 64K /mnt/sdc/foo1 $ xfs_io -f -c "falloc -k 0 128K" -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo2 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (520.833 MiB/sec and 133333.3333 ops/sec) $ sync $ xfs_io -c "pwrite -S 0xbb 64K 64K" /mnt/sdc/foo2 wrote 65536/65536 bytes at offset 65536 64 KiB, 16 ops; 0.0000 sec (260.417 MiB/sec and 66666.6667 ops/sec) # The following should have reported 128K... $ du -h /mnt/sdc/foo2 192K /mnt/sdc/foo2 $ sync # After flushing the buffered write, it now reports the correct value. $ du -h /mnt/sdc/foo2 128K /mnt/sdc/foo2 So the number of used file blocks is simply incorrect, unlike in other filesystems such as ext4 and xfs for example, but only while the buffered writes are not flushed. Fix this by tracking the number of delalloc bytes that fall within holes and beyond eof of a file, and use instead this new counter when reporting the number of used blocks for an inode. Another different problem that exists is that the delalloc bytes counter is reset when writeback starts (by clearing the EXTENT_DEALLOC flag from the respective range in the inode's iotree) and the vfs inode's bytes counter is only incremented when writeback finishes (through insert_reserved_file_extent()). Therefore while writeback is ongoing we simply report a wrong number of blocks used by an inode if the write operation covers a range previously unallocated. While this change does not fix this problem, it does minimizes it a lot by shortening that time window, as the new dealloc bytes counter (new_delalloc_bytes) is only decremented when writeback finishes right before updating the vfs inode's bytes counter. Fully fixing this second problem is not trivial and will be addressed later by a different patch. Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
由 Filipe Manana 提交于
If the call to btrfs_qgroup_reserve_data() failed, we were leaking an extent map structure. The failure can happen either due to an -ENOMEM condition or, when quotas are enabled, due to -EDQUOT for example. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com>
-
- 18 4月, 2017 1 次提交
-
-
由 Dan Carpenter 提交于
btrfs_get_extent() never returns NULL pointers, so this code introduces a static checker warning. The btrfs_get_extent() is a bit complex, but trust me that it doesn't return NULLs and also if it did we would trigger the BUG_ON(!em) before the last return statement. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> [ updated subject ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 28 2月, 2017 14 次提交
-
-
由 Nikolay Borisov 提交于
In addition to changing the signature, this patch also switches all the functions which are used as an argument to also take btrfs_inode. Namely those are: btrfs_get_extent and btrfs_get_extent_filemap. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Fabian Frederick 提交于
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.beSigned-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 2月, 2017 3 次提交
-
-
由 David Sterba 提交于
This goes as a separate patch because fixing that inside the patches caused too many many conflicts. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently btrfs_ino takes a struct inode and this causes a lot of internal btrfs functions which consume this ino to take a VFS inode, rather than btrfs' own struct btrfs_inode. In order to fix this "leak" of VFS structs into the internals of btrfs first it's necessary to eliminate all uses of struct inode for the purpose of inode. This patch does that by using BTRFS_I to convert an inode to btrfs_inode. With this problem eliminated subsequent patches will start eliminating the passing of struct inode altogether, eventually resulting in a lot cleaner code. Signed-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com> [ fix btrfs_get_extent tracepoint prototype ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 10 12月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
A clone is a perfectly fine implementation of a file copy, so most file systems just implement the copy that way. Instead of duplicating this logic move it to the VFS. Currently btrfs and XFS implement copies the same way as clones and there is no behavior change for them, cifs only implements clones and grow support for copy_file_range with this patch. NFS implements both, so this will allow copy_file_range to work on servers that only implement CLONE and be lot more efficient on servers that implements CLONE and COPY. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 06 12月, 2016 5 次提交
-
-
由 Jeff Mahoney 提交于
Now we only use the root parameter to print the root objectid in a tracepoint. We can use the root parameter from the transaction handle for that. It's also used to join the transaction with async commits, so we remove the comment that it's just for checking. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 30 11月, 2016 4 次提交
-
-
由 Robbie Ko 提交于
The hole punching can result in adding new leafs (and as a consequence new nodes) to the tree because when we find file extent items that span beyond the hole range we may end up not deleting them (just adjusting them, reducing their range by reducing their length or increasing their offset field) and add new file extent items representing holes. So after splitting a leaf (therefore creating a new one) to insert a new file extent item representing a hole, a new node might be added to each level of the tree in the worst case scenario (since there's a new key and every parent node was full). For example if a file has an extent item representing the range 0 to 64Mb and we punch a hole in the range 1Mb to 20Mb, the existing extent item is duplicated and one of the copies is adjusted to represent the range 0 to 1Mb, the other copy adjusted to represent the range 20Mb to 64Mb, and a new file extent item representing a hole in the range 1Mb to 20Mb is inserted. Fix this by using btrfs_calc_trans_metadata_size() instead of btrfs_calc_trunc_metadata_size(), so that enough metadata space is reserved for the worst possible case. Signed-off-by: NRobbie Ko <robbieko@synology.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NFilipe Manana <fdmanana@suse.com> [Modified changelog for clarity and correctness]
-
由 Josef Bacik 提交于
At this point we will have dropped extent entries from the file, so if we fail to insert the new hole entries then we are leaving the fs in a corrupt state (albeit an easily fixed one). Abort the transaciton if this happens so we can avoid corrupting the fs. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
In order to do hole punching we have a block reserve to hold the reservation we need to drop the extents in our range. Since we could end up dropping a lot of extents we set rsv->failfast so we can just loop around again and drop the remaining of the range. Unfortunately we unconditionally fill the hole extents in and start from the last extent we encountered, which we may or may not have dropped. So this can result in overlapping file extent entries, which can be tripped over in a variety of ways, either by hitting BUG_ON(!ret) in fill_holes() after the search, or in btrfs_set_item_key_safe() in btrfs_drop_extent() at a later time by an unrelated task. Fix this by only setting drop_end to the last extent we did actually drop. This way our holes are filled in properly for the range that we did drop, and the rest of the range that remains to be dropped is actually dropped. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 28 9月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 9月, 2016 2 次提交
-
-
由 Josef Bacik 提交于
No reason to bug on in here, fs corruption could easily cause these things to happen. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band dedupe and subpage size patchset Extend btrfs_set_extent_delalloc() and extent_clear_unlock_delalloc() parameters for both in-band dedupe and subpage sector size patchset. This should reduce conflict of both patchset and the effort to rebase them. Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 16 9月, 2016 1 次提交
-
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Cc: Chris Mason <clm@fb.com>
-
- 25 8月, 2016 2 次提交
-
-
由 Filipe Manana 提交于
Commit 44f714da ("Btrfs: improve performance on fsync against new inode after rename/unlink"), which landed in 4.8-rc2, introduced a possibility for a deadlock due to double locking of an inode's log mutex by the same task, which lockdep reports with: [23045.433975] ============================================= [23045.434748] [ INFO: possible recursive locking detected ] [23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted [23045.436044] --------------------------------------------- [23045.436044] xfs_io/3688 is trying to acquire lock: [23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] but task is already holding lock: [23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] other info that might help us debug this: [23045.436044] Possible unsafe locking scenario: [23045.436044] CPU0 [23045.436044] ---- [23045.436044] lock(&ei->log_mutex); [23045.436044] lock(&ei->log_mutex); [23045.436044] *** DEADLOCK *** [23045.436044] May be due to missing lock nesting notation [23045.436044] 3 locks held by xfs_io/3688: [23045.436044] #0: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs] [23045.436044] #1: (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0 [23045.436044] #2: (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] stack backtrace: [23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1 [23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [23045.436044] 0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70 [23045.436044] ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68 [23045.436044] 0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68 [23045.436044] Call Trace: [23045.436044] [<ffffffff8127074d>] dump_stack+0x67/0x90 [23045.436044] [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e [23045.436044] [<ffffffff8109155f>] ? mark_lock+0x24/0x201 [23045.436044] [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74 [23045.436044] [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3 [23045.436044] [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3 [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7 [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs] [23045.436044] [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465 [23045.436044] [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs] [23045.436044] [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs] [23045.436044] [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs] [23045.436044] [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs] [23045.436044] [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs] [23045.436044] [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e [23045.436044] [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e [23045.436044] [<ffffffff811acc79>] do_fsync+0x31/0x4a [23045.436044] [<ffffffff811ace99>] SyS_fsync+0x10/0x14 [23045.436044] [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [23045.436044] [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa An example reproducer for this is: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/foo $ sync $ mv /mnt/dir/foo /mnt/dir/bar $ touch /mnt/dir/foo $ xfs_io -c "fsync" /mnt/dir/bar This is because while logging the inode of file bar we end up logging its parent directory (since its inode has an unlink_trans field matching the current transaction id due to the rename operation), which in turn logs the inodes for all its new dentries, so that the new inode for the new file named foo gets logged which in turn triggered another logging attempt for the inode we are fsync'ing, since that inode had an old name that corresponds to the name of the new inode. So fix this by ensuring that when logging the inode for a new dentry that has a name matching an old name of some other inode, we don't log again the original inode that we are fsync'ing. Fixes: 44f714da ("Btrfs: improve performance on fsync against new inode after rename/unlink") Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Xiaoguang 提交于
This patch can fix some false ENOSPC errors, below test script can reproduce one false ENOSPC error: #!/bin/bash dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128 dev=$(losetup --show -f fs.img) mkfs.btrfs -f -M $dev mkdir /tmp/mntpoint mount $dev /tmp/mntpoint cd /tmp/mntpoint xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile Above script will fail for ENOSPC reason, but indeed fs still has free space to satisfy this request. Please see call graph: btrfs_fallocate() |-> btrfs_alloc_data_chunk_ondemand() | bytes_may_use += 64M |-> btrfs_prealloc_file_range() |-> btrfs_reserve_extent() |-> btrfs_add_reserved_bytes() | alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not | change bytes_may_use, and bytes_reserved += 64M. Now | bytes_may_use + bytes_reserved == 128M, which is greater | than btrfs_space_info's total_bytes, false enospc occurs. | Note, the bytes_may_use decrease operation will be done in | end of btrfs_fallocate(), which is too late. Here is another simple case for buffered write: CPU 1 | CPU 2 | |-> cow_file_range() |-> __btrfs_buffered_write() |-> btrfs_reserve_extent() | | | | | | | | | ..... | |-> btrfs_check_data_free_space() | | | | |-> extent_clear_unlock_delalloc() | In CPU 1, btrfs_reserve_extent()->find_free_extent()-> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease operation will be delayed to be done in extent_clear_unlock_delalloc(). Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's btrfs_check_data_free_space() tries to reserve 100MB data space. If 100MB > data_sinfo->total_bytes - data_sinfo->bytes_used - data_sinfo->bytes_reserved - data_sinfo->bytes_pinned - data_sinfo->bytes_readonly - data_sinfo->bytes_may_use btrfs_check_data_free_space() will try to allcate new data chunk or call btrfs_start_delalloc_roots(), or commit current transaction in order to reserve some free space, obviously a lot of work. But indeed it's not necessary as long as decreasing bytes_may_use timely, we still have free space, decreasing 128M from bytes_may_use. To fix this issue, this patch chooses to update bytes_may_use for both data and metadata in btrfs_add_reserved_bytes(). For compress path, real extent length may not be equal to file content length, so introduce a ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by file content length. Then compress path can update bytes_may_use correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC and RESERVE_FREE. As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as PREALLOC, we also need to update bytes_may_use, but can not pass EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook() to update btrfs_space_info's bytes_may_use. Meanwhile __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space() internally for both sucessful and failed path, btrfs_prealloc_file_range()'s callers does not need to call btrfs_free_reserved_data_space() any more. Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 01 8月, 2016 1 次提交
-
-
由 Filipe Manana 提交于
When we start an fsync we start ordered extents for all delalloc ranges. However before attempting to log the inode, we only wait for those ordered extents if we are not doing a full sync (bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's flags). This means that if an ordered extent completes with an IO error before we check if we can skip logging the inode, we will not catch and report the IO error to user space. This is because on an IO error, when the ordered extent completes we do not update the inode, so if the inode was not previously updated by the current transaction we end up not logging it through calls to fsync and therefore not check its mapping flags for the presence of IO errors. Fix this by checking for errors in the flags of the inode's mapping when we notice we can skip logging the inode. This caused sporadic failures in the test generic/331 (which explicitly tests for IO errors during an fsync call). Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
-
- 26 7月, 2016 1 次提交
-
-
由 Jeff Mahoney 提交于
__btrfs_abort_transaction doesn't use its root parameter except to obtain an fs_info pointer. We can obtain that from trans->root->fs_info for now and from trans->fs_info in a later patch. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-