- 10 5月, 2016 1 次提交
-
-
由 David Sterba 提交于
Currently we lack the identification of the filesystem in most if not all mount messages, done via printk/pr_* functions. We can use the btrfs_* helpers in open_ctree, as the fs_info <-> sb link is established at the beginning of the function. The messages have been updated at the same time to be more consistent: * dropped sb->s_id, as it's not available via btrfs_* * added %d for return code where appropriate * wording changed * %Lx replaced by %llx Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 06 5月, 2016 2 次提交
-
-
由 David Sterba 提交于
Move the op exclusivity check before the other code (same as in ADD_DEV). Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
It seems to be long time unused, since 2008 and 6885f308 ("Btrfs: Misc 2.6.25 updates"). Propagating the removal touches some code but has no functional effect. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 02 5月, 2016 1 次提交
-
-
由 Anand Jain 提交于
Also drop the newline from the message. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 28 4月, 2016 7 次提交
-
-
由 Luis de Bethencourt 提交于
Correct a typo in the chunk_mutex name to make it grepable. Since it is better to fix several typos at once, fixing the 2 more in the same file. Signed-off-by: NLuis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Geert Uytterhoeven 提交于
fs/btrfs/extent-tree.c: In function ‘btrfs_lock_cluster’: fs/btrfs/extent-tree.c:6399: warning: ‘used_bg’ may be used uninitialized in this function - Replace "again: ... goto again;" by standard C "while (1) { ... }", - Move block not processed during the first iteration of the loop to the end of the loop, which allows to kill the "locked" variable, Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Reviewed-and-Tested-by: NMiao Xie <miaox@cn.fujitsu.com> [ the compilation warning has been fixed by other patch, now we want to clean up the function ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Actually save_error_info() sets the FS state to error and nothing else. Further the word save doesn't induce caffeine when compared to the word set in what actually it does. So to make it better understandable move save_error_info() code to its only consumer itself. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Satoru Takeuchi 提交于
Signed-off-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Looks like we added the incompatible defines in between the error handling defines in the file ctree.h. Now group them back. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Apparently looks like ASSERT does the same intended job, as intended btrfs_assert(). Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
btrfs_std_error() handles errors, puts FS into readonly mode (as of now). So its good idea to rename it to btrfs_handle_fs_error(). Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ edit changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 07 4月, 2016 1 次提交
-
-
由 Filipe Manana 提交于
If we rename an inode A (be it a file or a directory), create a new inode B with the old name of inode A and under the same parent directory, fsync inode B and then power fail, at log tree replay time we end up removing inode A completely. If inode A is a directory then all its files are gone too. Example scenarios where this happens: This is reproducible with the following steps, taken from a couple of test cases written for fstests which are going to be submitted upstream soon: # Scenario 1 mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir -p /mnt/a/x echo "hello" > /mnt/a/x/foo echo "world" > /mnt/a/x/bar sync mv /mnt/a/x /mnt/a/y mkdir /mnt/a/x xfs_io -c fsync /mnt/a/x <power failure happens> The next time the fs is mounted, log tree replay happens and the directory "y" does not exist nor do the files "foo" and "bar" exist anywhere (neither in "y" nor in "x", nor the root nor anywhere). # Scenario 2 mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir /mnt/a echo "hello" > /mnt/a/foo sync mv /mnt/a/foo /mnt/a/bar echo "world" > /mnt/a/foo xfs_io -c fsync /mnt/a/foo <power failure happens> The next time the fs is mounted, log tree replay happens and the file "bar" does not exists anymore. A file with the name "foo" exists and it matches the second file we created. Another related problem that does not involve file/data loss is when a new inode is created with the name of a deleted snapshot and we fsync it: mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt mkdir /mnt/testdir btrfs subvolume snapshot /mnt /mnt/testdir/snap btrfs subvolume delete /mnt/testdir/snap rmdir /mnt/testdir mkdir /mnt/testdir xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir <power failure> The next time the fs is mounted the log replay procedure fails because it attempts to delete the snapshot entry (which has dir item key type of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry, resulting in the following error that causes mount to fail: [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257 [52174.512570] ------------[ cut here ]------------ [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]() [52174.514681] BTRFS: Transaction aborted (error -2) [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1 [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758 [52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd [52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88 [52174.524053] Call Trace: [52174.524053] [<ffffffff81264e93>] dump_stack+0x67/0x90 [52174.524053] [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2 [52174.524053] [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50 [52174.524053] [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [<ffffffff8118f5e9>] ? iput+0xb0/0x284 [52174.524053] [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs] [52174.524053] [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs] [52174.524053] [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs] [52174.524053] [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs] [52174.524053] [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs] [52174.524053] [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs] [52174.524053] [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs] [52174.524053] [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs] [52174.524053] [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs] [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde [52174.524053] [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs] [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf [52174.524053] [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3 [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde [52174.524053] [<ffffffff8119590f>] do_mount+0x8a6/0x9e8 [52174.524053] [<ffffffff811358dd>] ? strndup_user+0x3f/0x59 [52174.524053] [<ffffffff81195c65>] SyS_mount+0x77/0x9f [52174.524053] [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]--- Fix this by forcing a transaction commit when such cases happen. This means we check in the commit root of the subvolume tree if there was any other inode with the same reference when the inode we are fsync'ing is a new inode (created in the current transaction). Test cases for fstests, covering all the scenarios given above, were submitted upstream for fstests: * fstests: generic test for fsync after renaming directory https://patchwork.kernel.org/patch/8694281/ * fstests: generic test for fsync after renaming file https://patchwork.kernel.org/patch/8694301/ * fstests: add btrfs test for fsync after snapshot deletion https://patchwork.kernel.org/patch/8670671/ Cc: stable@vger.kernel.org Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 05 4月, 2016 2 次提交
-
-
由 Kirill A. Shutemov 提交于
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2016 8 次提交
-
-
由 Yauhen Kharuzhy 提交于
If device replace entry was found on disk at mounting and its num_write_errors stats counter has non-NULL value, then replace operation will never be finished and -EIO error will be reported by btrfs_scrub_dev() because this counter is never reset. # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/ ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error Reset num_write_errors and num_uncorrectable_read_errors counters in the dev_replace structure before start of replacing. Signed-off-by: NYauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Mark Fasheh 提交于
This patch adds tracepoints to the qgroup code on both the reporting side (insert_dirty_extents) and the accounting side. Taken together it allows us to see what qgroup operations have happened, and what their result was. Signed-off-by: NMark Fasheh <mfasheh@suse.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
The fd we pass in may not be on a btrfs file system, so don't try to do BTRFS_I() on it. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The allocation of node could fail if the memory is too fragmented for a given node size, practically observed with 64k. http://article.gmane.org/gmane.comp.file-systems.btrfs/54689Reported-and-tested-by: NJean-Denis Girard <jd.girard@sysnux.pf> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Mark Fasheh 提交于
create_pending_snapshot() will go readonly on _any_ error return from btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by just making a snapshot and asking it to inherit from an invalid qgroup. For example: $ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo Will cause a transaction abort. Fix this by only throwing errors in btrfs_qgroup_inherit() when we know going readonly is acceptable. The following xfstests test case reproduces this bug: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" here=`pwd` tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # remove previous $seqres.full before test rm -f $seqres.full # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch rm -f $seqres.full _scratch_mkfs _scratch_mount _run_btrfs_util_prog quota enable $SCRATCH_MNT # The qgroup '1/10' does not exist and should be silently ignored _run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1 _scratch_unmount echo "Silence is golden" status=0 exit Signed-off-by: NMark Fasheh <mfasheh@suse.de> Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
As one user in mail list report reproducible balance ENOSPC error, it's better to add more debug info for enospc_debug mount option. Reported-by: NMarc Haber <mh+linux-btrfs@zugschlus.de> Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Dan Carpenter's static checker has found this error, it's introduced by commit 64c043de ("Btrfs: fix up read_tree_block to return proper error") It's really supposed to 'break' the loop on error like others. Cc: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Davide Italiano 提交于
- We call inode_size_ok() only if FL_KEEP_SIZE isn't specified. - As an optimisation we can skip the call if (off + len) isn't greater than the current size of the file. This operation is called under the lock so the less work we do, the better. - If we call inode_size_ok() pass to it the correct value rather than a more conservative estimation. Signed-off-by: NDavide Italiano <dccitaliano@gmail.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 31 3月, 2016 1 次提交
-
-
由 Filipe Manana 提交于
If the lower or upper directory of an overlayfs mount belong to a btrfs file system and we fsync the file through the overlayfs' merged directory we ended up accessing an inode that didn't belong to btrfs as if it were a btrfs inode at btrfs_sync_file() resulting in a crash like the following: [ 7782.588845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000544 [ 7782.590624] IP: [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs] [ 7782.591931] PGD 4d954067 PUD 1e878067 PMD 0 [ 7782.592016] Oops: 0002 [#6] PREEMPT SMP DEBUG_PAGEALLOC [ 7782.592016] Modules linked in: btrfs overlay ppdev crc32c_generic evdev xor raid6_pq psmouse pcspkr sg serio_raw acpi_cpufreq parport_pc parport tpm_tis i2c_piix4 tpm i2c_core processor button loop autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs] [ 7782.592016] CPU: 10 PID: 16437 Comm: xfs_io Tainted: G D 4.5.0-rc6-btrfs-next-26+ #1 [ 7782.592016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [ 7782.592016] task: ffff88001b8d40c0 ti: ffff880137488000 task.ti: ffff880137488000 [ 7782.592016] RIP: 0010:[<ffffffffa030b7ab>] [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs] [ 7782.592016] RSP: 0018:ffff88013748be40 EFLAGS: 00010286 [ 7782.592016] RAX: 0000000080000000 RBX: ffff880133b30c88 RCX: 0000000000000001 [ 7782.592016] RDX: 0000000000000001 RSI: ffffffff8148fec0 RDI: 00000000ffffffff [ 7782.592016] RBP: ffff88013748bec0 R08: 0000000000000001 R09: 0000000000000000 [ 7782.624248] R10: ffff88013748be40 R11: 0000000000000246 R12: 0000000000000000 [ 7782.624248] R13: 0000000000000000 R14: 00000000009305a0 R15: ffff880015e3be40 [ 7782.624248] FS: 00007fa83b9cb700(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000 [ 7782.624248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7782.624248] CR2: 0000000000000544 CR3: 00000001fa652000 CR4: 00000000000006e0 [ 7782.624248] Stack: [ 7782.624248] ffffffff8108b5cc ffff88013748bec0 0000000000000246 ffff8800b005ded0 [ 7782.624248] ffff880133b30d60 8000000000000000 7fffffffffffffff 0000000000000246 [ 7782.624248] 0000000000000246 ffffffff81074f9b ffffffff8104357c ffff880015e3be40 [ 7782.624248] Call Trace: [ 7782.624248] [<ffffffff8108b5cc>] ? arch_local_irq_save+0x9/0xc [ 7782.624248] [<ffffffff81074f9b>] ? ___might_sleep+0xce/0x217 [ 7782.624248] [<ffffffff8104357c>] ? __do_page_fault+0x3c0/0x43a [ 7782.624248] [<ffffffff811a2351>] vfs_fsync_range+0x8c/0x9e [ 7782.624248] [<ffffffff811a237f>] vfs_fsync+0x1c/0x1e [ 7782.624248] [<ffffffff811a24d6>] do_fsync+0x31/0x4a [ 7782.624248] [<ffffffff811a2700>] SyS_fsync+0x10/0x14 [ 7782.624248] [<ffffffff81493617>] entry_SYSCALL_64_fastpath+0x12/0x6b [ 7782.624248] Code: 85 c0 0f 85 e2 02 00 00 48 8b 45 b0 31 f6 4c 29 e8 48 ff c0 48 89 45 a8 48 8d 83 d8 00 00 00 48 89 c7 48 89 45 a0 e8 fc 43 18 e1 <f0> 41 ff 84 24 44 05 00 00 48 8b 83 58 ff ff ff 48 c1 e8 07 83 [ 7782.624248] RIP [<ffffffffa030b7ab>] btrfs_sync_file+0x11b/0x3e9 [btrfs] [ 7782.624248] RSP <ffff88013748be40> [ 7782.624248] CR2: 0000000000000544 [ 7782.661994] ---[ end trace 721e14960eb939bc ]--- This started happening since commit 4bacc9c9 (overlayfs: Make f_path always point to the overlay and f_inode to the underlay) and even though after this change we could still access the btrfs inode through struct file->f_mapping->host or struct file->f_inode, we would end up resulting in more similar issues later on at check_parent_dirs_for_sync() because the dentry we got (from struct file->f_path.dentry) was from overlayfs and not from btrfs, that is, we had no way of getting the dentry that belonged to btrfs (we always got the dentry that belonged to overlayfs). The new patch from Miklos Szeredi, titled "vfs: add file_dentry()" and recently submitted to linux-fsdevel, adds a file_dentry() API that allows us to get the btrfs dentry from the input file and therefore being able to fsync when the upper and lower directories belong to btrfs filesystems. This issue has been reported several times by users in the mailing list and bugzilla. A test case for xfstests is being submitted as well. Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101951 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109791Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com> Cc: stable@vger.kernel.org
-
- 22 3月, 2016 4 次提交
-
-
由 Jiri Kosina 提交于
transaction_kthread() is calling try_to_freeze(), but that's just an expeinsive no-op given the fact that the thread is not marked freezable. After removing this, disk-io.c is now independent on freezer API. Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jiri Kosina 提交于
cleaner_kthread() is not marked freezable, and therefore calling try_to_freeze() in its context is a pointless no-op. In addition to that, as has been clearly demonstrated by 80ad623e ("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"), it's perfectly valid / legal for cleaner_kthread() to stay scheduled out in an arbitrary place during suspend (in that particular example that was waiting for reading of extent pages), so there is no need to leave any traces of freezer in this kthread. Fixes: 80ad623e ("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()") Fixes: 69624913 ("btrfs: clear PF_NOFREEZE in cleaner_kthread()") Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alex Lyakas 提交于
csum_dirty_buffer was issuing a warning in case the extent buffer did not look alright, but was still returning success. Let's return error in this case, and also add an additional sanity check on the extent buffer header. The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will, but it is better than to have a silent metadata corruption on disk. Signed-off-by: NAlex Lyakas <alex@zadarastorage.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alex Lyakas 提交于
Signed-off-by: NAlex Lyakas <alex@zadarastorage.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 21 3月, 2016 1 次提交
-
-
由 Chris Mason 提交于
Commit c40a3d38 (Btrfs: Compute and look up csums based on sectorsized blocks) changes around how we walk the bios while looking up crcs. There's an inner loop that is jumping to the next bvec based on sectors and before it derefs the next bvec, it needs to make sure we're still in the bio. In this case, the outer loop would have decided to stop moving forward too, and the bvec deref is never actually used for anything. But CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio. Signed-off-by: NChris Mason <clm@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com>
-
- 18 3月, 2016 1 次提交
-
-
由 Matthew Wilcox 提交于
Even though this is a 'can't happen' situation, use the new radix_tree_iter_retry() pattern to eliminate a goto. [akpm@linux-foundation.org: fix btrfs build] Signed-off-by: NMatthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2016 2 次提交
-
-
由 Adam Buchbinder 提交于
Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Ashish Samant 提交于
Dont print warning for ENOSPC error unless ENOSPC_DEBUG is enabled. Use btrfs_debug if it is enabled. Signed-off-by: NAshish Samant <ashish.samant@oracle.com> [ preserve the WARN_ON ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 12 3月, 2016 4 次提交
-
-
由 Dan Carpenter 提交于
It's basically harmless if "ref_level" isn't initialized since it's only used for an error message, but it causes a static checker warning. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
So that its better organized. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
So that it indicates what it does. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Satoru Takeuchi 提交于
It's better to show a warning message for the exceptional case that one of objectid (in most case, inode number) reaches its highest value. For example, if inode cache is off and this event happens, we can't create any file even if there are not so many files. This message ease detecting such problem. Signed-off-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 11 3月, 2016 1 次提交
-
-
由 Rasmus Villemoes 提交于
This is more readable. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 04 3月, 2016 1 次提交
-
-
由 Filipe Manana 提交于
When looking for orphan roots during mount we can end up hitting a BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is replayed and qgroups are enabled. This is because after a log tree is replayed, a transaction commit is made, which triggers qgroup extent accounting which in turn does backref walking which ends up reading and inserting all roots in the radix tree fs_info->fs_root_radix, including orphan roots (deleted snapshots). So after the log tree is replayed, when finding orphan roots we hit the BUG_ON with the following trace: [118209.182438] ------------[ cut here ]------------ [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314! [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs] [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G W 4.5.0-rc5-btrfs-next-24+ #1 [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000 [118209.186318] RIP: 0010:[<ffffffffa04237d7>] [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs] [118209.186318] RSP: 0018:ffff8800af34faa8 EFLAGS: 00010246 [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001 [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000 [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000 [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000 [118209.186318] FS: 00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000 [118209.186318] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0 [118209.186318] Stack: [118209.186318] fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000 [118209.186318] fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000 [118209.186318] 0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000 [118209.186318] Call Trace: [118209.186318] [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs] [118209.186318] [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs] [118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf [118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131 [118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde [118209.186318] [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs] [118209.186318] [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf [118209.186318] [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3 [118209.186318] [<ffffffff8117b87e>] mount_fs+0x67/0x131 [118209.186318] [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde [118209.186318] [<ffffffff81195637>] do_mount+0x8a6/0x9e8 [118209.186318] [<ffffffff8119598d>] SyS_mount+0x77/0x9f [118209.186318] [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b 4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00 [118209.186318] RIP [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs] [118209.186318] RSP <ffff8800af34faa8> [118209.230735] ---[ end trace 83938f987d85d477 ]--- So fix this by not treating the error -EEXIST, returned when attempting to insert a root already inserted by the backref walking code, as an error. The following test case for xfstests reproduces the bug: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _require_dm_target flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey _run_btrfs_util_prog quota enable $SCRATCH_MNT # Create 2 directories with one file in one of them. # We use these just to trigger a transaction commit later, moving the file from # directory a to directory b and doing an fsync against directory a. mkdir $SCRATCH_MNT/a mkdir $SCRATCH_MNT/b touch $SCRATCH_MNT/a/f sync # Create our test file with 2 4K extents. $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io # Create a snapshot and delete it. This doesn't really delete the snapshot # immediately, just makes it inaccessible and invisible to user space, the # snapshot is deleted later by a dedicated kernel thread (cleaner kthread) # which is woke up at the next transaction commit. # A root orphan item is inserted into the tree of tree roots, so that if a # power failure happens before the dedicated kernel thread does the snapshot # deletion, the next time the filesystem is mounted it resumes the snapshot # deletion. _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap # Now overwrite half of the extents we wrote before. Because we made a snapshpot # before, which isn't really deleted yet (since no transaction commit happened # after we did the snapshot delete request), the non overwritten extents get # referenced twice, once by the default subvolume and once by the snapshot. $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io # Now move file f from directory a to directory b and fsync directory a. # The fsync on the directory a triggers a transaction commit (because a file # was moved from it to another directory) and the file fsync leaves a log tree # with file extent items to replay. mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar echo "File digest before power failure:" md5sum $SCRATCH_MNT/foobar | _filter_scratch # Now simulate a power failure and mount the filesystem to replay the log tree. # After the log tree was replayed, we used to hit a BUG_ON() when processing # the root orphan item for the deleted snapshot. This is because when processing # an orphan root the code expected to be the first code inserting the root into # the fs_info->fs_root_radix radix tree, while in reallity it was the second # caller attempting to do it - the first caller was the transaction commit that # took place after replaying the log tree, when updating the qgroup counters. _flakey_drop_and_remount echo "File digest before after failure:" # Must match what he got before the power failure. md5sum $SCRATCH_MNT/foobar | _filter_scratch _unmount_flakey status=0 exit Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref") Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 02 3月, 2016 3 次提交
-
-
由 Filipe Manana 提交于
When logging that an inode exists, for example as part of a directory fsync operation, we were collecting any ordered extents for the inode but we ended up doing nothing with them except tagging them as processed, by setting the flag BTRFS_ORDERED_LOGGED on them, which prevented a subsequent fsync of that inode (using the LOG_INODE_ALL mode) from collecting and processing them. This created a time window where a second fsync against the inode, using the fast path, ended up not logging the checksums for the new extents but it logged the extents since they were part of the list of modified extents. This happened because the ordered extents were not collected and checksums were not yet added to the csum tree - the ordered extents have not gone through btrfs_finish_ordered_io() yet (which is where we add them to the csum tree by calling inode.c:add_pending_csums()). So fix this by not collecting an inode's ordered extents if we are logging it with the LOG_INODE_EXISTS mode. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
If we're about to do a fast fsync for an inode and btrfs_inode_in_log() returns false, it's possible that we had an ordered extent in progress (btrfs_finish_ordered_io() not run yet) when we noticed that the inode's last_trans field was not greater than the id of the last committed transaction, but shortly after, before we checked if there were any ongoing ordered extents, the ordered extent had just completed and removed itself from the inode's ordered tree, in which case we end up not logging the inode, losing some data if a power failure or crash happens after the fsync handler returns and before the transaction is committed. Fix this by checking first if there are any ongoing ordered extents before comparing the inode's last_trans with the id of the last committed transaction - when it completes, an ordered extent always updates the inode's last_trans before it removes itself from the inode's ordered tree (at btrfs_finish_ordered_io()). Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
In the listxattrs handler, we were not listing all the xattrs that are packed in the same btree item, which happens when multiple xattrs have a name that when crc32c hashed produce the same checksum value. Fix this by processing them all. The following test case for xfstests reproduces the issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/attr # real QA test starts here _supported_fs generic _supported_os Linux _require_scratch _require_attrs rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount # Create our test file with a few xattrs. The first 3 xattrs have a name # that when given as input to a crc32c function result in the same checksum. # This made btrfs list only one of the xattrs through listxattrs system call # (because it packs xattrs with the same name checksum into the same btree # item). touch $SCRATCH_MNT/testfile $SETFATTR_PROG -n user.foobar -v 123 $SCRATCH_MNT/testfile $SETFATTR_PROG -n user.WvG1c1Td -v qwerty $SCRATCH_MNT/testfile $SETFATTR_PROG -n user.J3__T_Km3dVsW_ -v hello $SCRATCH_MNT/testfile $SETFATTR_PROG -n user.something -v pizza $SCRATCH_MNT/testfile $SETFATTR_PROG -n user.ping -v pong $SCRATCH_MNT/testfile # Now call getfattr with --dump, which calls the listxattrs system call. # It should list all the xattrs we have set before. $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/testfile | _filter_scratch status=0 exit Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-