- 13 10月, 2013 1 次提交
-
-
由 Dave Jones 提交于
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: NDave Jones <davej@fedoraproject.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
-
- 11 10月, 2013 4 次提交
-
-
由 Miao Xie 提交于
When doing space balance and subvolume destroy at the same time, we met the following oops: kernel BUG at fs/btrfs/relocation.c:2247! RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs] Call Trace: [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs] [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs] [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs] [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs] [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs] [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs] [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs] [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs] [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs] [<ffffffff81122f53>] ? path_openat+0x234/0x4db [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b It is because we returned the error number if the reference of the root was 0 when doing space relocation. It was not right here, because though the root was dead(refs == 0), but the space it held still need be relocated, or we could not remove the block group. So in this case, we should return the root no matter it is dead or not. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
Now we don't drop all the deleted snapshots/subvolumes before the space balance. It means we have to relocate the space which is held by the dead snapshots/subvolumes. So we must into them into fs radix tree, or we would forget to commit the change of them when doing transaction commit, and it would corrupt the metadata. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Liu fixed part of this problem and unfortunately I steered him in slightly the wrong direction and so didn't completely fix the problem. The problem is we limit the size of the delalloc range we are looking for to max bytes and then we try to lock that range. If we fail to lock the pages in that range we will shrink the max bytes to a single page and re loop. However if our first page is inside of the delalloc range then we will end up limiting the end of the range to a period before our first page. This is illustrated below [0 -------- delalloc range --------- 256mb] [page] So find_delalloc_range will return with delalloc_start as 0 and end as 128mb, and then we will notice that delalloc_start < *start and adjust it up, but not adjust delalloc_end up, so things go sideways. To fix this we need to not limit the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that way we don't end up with this confusion. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
btrfs_rename was using the root of the old dir instead of the root of the new dir when checking for a hash collision, so if you tried to move a file into a subvol it would freak out because it would see the file you are trying to move in its current root. This fixes the bug where this would fail btrfs subvol create test1 btrfs subvol create test2 mv test1 test2. Thanks to Chris Murphy for catching this, Cc: stable@vger.kernel.org Reported-by: NChris Murphy <lists@colorremedies.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 05 10月, 2013 9 次提交
-
-
由 Darrick J. Wong 提交于
When btrfs creates a bioset, we must also allocate the integrity data pool. Otherwise btrfs will crash when it tries to submit a bio to a checksumming disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 PGD 2305e4067 PUD 23063d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug] CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000 RIP: 0010:[<ffffffff8111e28a>] [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP: 0018:ffff880230699688 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720 FS: 00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0 Stack: ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250 ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000 0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900 Call Trace: [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0 [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360 [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150 [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs] [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460 [<ffffffff8123e58a>] generic_make_request+0xca/0x100 [<ffffffff8123e639>] submit_bio+0x79/0x160 [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs] [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs] [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs] [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs] [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0 [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260 [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs] [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs] [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs] [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs] [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0 [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0 [<ffffffff81176ba0>] mount_fs+0x20/0xd0 [<ffffffff81191096>] vfs_kern_mount+0x76/0x120 [<ffffffff81193320>] do_mount+0x200/0xa40 [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80 [<ffffffff81193bf0>] SyS_mount+0x90/0xe0 [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48 89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7 44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74 RIP [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP <ffff880230699688> CR2: 0000000000000018 ---[ end trace 7a96042017ed21e2 ]--- Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Ilya Dryomov 提交于
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev, can be processed before btrfs_scratch_superblock is called, which would result in a use-after-free on btrfs_device contents. Fix this by zeroing the superblock before the rcu callback is registered. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Ilya Dryomov 提交于
The current implementation of worker threads in Btrfs has races in worker stopping code, which cause all kinds of panics and lockups when running btrfs/011 xfstest in a loop. The problem is that btrfs_stop_workers is unsynchronized with respect to check_idle_worker, check_busy_worker and __btrfs_start_workers. E.g., check_idle_worker race flow: btrfs_stop_workers(): check_idle_worker(aworker): - grabs the lock - splices the idle list into the working list - removes the first worker from the working list - releases the lock to wait for its kthread's completion - grabs the lock - if aworker is on the working list, moves aworker from the working list to the idle list - releases the lock - grabs the lock - puts the worker - removes the second worker from the working list ...... btrfs_stop_workers returns, aworker is on the idle list FS is umounted, memory is freed ...... aworker is waken up, fireworks ensue With this applied, I wasn't able to trigger the problem in 48 hours, whereas previously I could reliably reproduce at least one of these races within an hour. Reported-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
The crash[1] is found by xfstests/generic/208 with "-o compress", it's not reproduced everytime, but it does panic. The bug is quite interesting, it's actually introduced by a recent commit (573aecaf, Btrfs: actually limit the size of delalloc range). Btrfs implements delay allocation, so during writeback, we (1) get a page A and lock it (2) search the state tree for delalloc bytes and lock all pages within the range (3) process the delalloc range, including find disk space and create ordered extent and so on. (4) submit the page A. It runs well in normal cases, but if we're in a racy case, eg. buffered compressed writes and aio-dio writes, sometimes we may fail to lock all pages in the 'delalloc' range, in which case, we need to fall back to search the state tree again with a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset). The mentioned commit has a side effect, that is, in the fallback case, we can find delalloc bytes before the index of the page we already have locked, so we're in the case of (delalloc_end <= *start) and return with (found > 0). This ends with not locking delalloc pages but making ->writepage still process them, and the crash happens. This fixes it by just thinking that we find nothing and returning to caller as the caller knows how to deal with it properly. [1]: ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2170! [...] CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8 [...] RIP: 0010:[<ffffffff810f5093>] [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [...] [ 4934.248731] Stack: [ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a [ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620 [ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0 [ 4934.248731] Call Trace: [ 4934.248731] [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs] [ 4934.248731] [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs] [ 4934.248731] [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b [ 4934.248731] [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs] [ 4934.248731] [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs] [ 4934.248731] [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [ 4934.248731] [<ffffffff810608f5>] kthread+0x8d/0x95 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0 [ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43 [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44 [ 4934.248731] RIP [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83 [ 4934.248731] RSP <ffff8801869f9c48> [ 4934.280307] ---[ end trace 36f06d3f8750236a ]--- Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
If we crash with a log, remount and recover that log, and then crash before we can commit another transaction we will get transid verify errors on the next mount. This is because we were not zero'ing out the log when we committed the transaction after recovery. This is ok as long as we commit another transaction at some point in the future, but if you abort or something else goes wrong you can end up in this weird state because the recovery stuff says that the tree log should have a generation+1 of the super generation, which won't be the case of the transaction that was started for recovery. Fix this by removing the check and _always_ zero out the log portion of the super when we commit a transaction. This fixes the transid verify issues I was seeing with my force errors tests. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Thierry Reding 提交于
This fixes a build failure caused by calling the free() function which does not exist in the Linux kernel. Signed-off-by: NThierry Reding <treding@nvidia.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit aaaae980)
-
由 tinguely@sgi.com 提交于
Free the memory in error path of xlog_recover_add_to_trans(). Normally this memory is freed in recovery pass2, but is leaked in the error path. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 519ccb81)
-
由 Dave Chinner 提交于
The determination of whether a directory entry contains a dtype field originally was dependent on the filesystem having CRCs enabled. This meant that the format for dtype beign enabled could be determined by checking the directory block magic number rather than doing a feature bit check. This was useful in that it meant that we didn't need to pass a struct xfs_mount around to functions that were already supplied with a directory block header. Unfortunately, the introduction of dtype fields into the v4 structure via a feature bit meant this "use the directory block magic number" method of discriminating the dirent entry sizes is broken. Hence we need to convert the places that use magic number checks to use feature bit checks so that they work correctly and not by chance. The current code works on v4 filesystems only because the dirent size roundup covers the extra byte needed by the dtype field in the places where this problem occurs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 367993e7)
-
由 Dave Chinner 提交于
Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: NMichael L. Semon <mlsemon35@gmail.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit f112a049)
-
- 02 10月, 2013 1 次提交
-
-
由 Al Viro 提交于
Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong; the right place is destroy_super(). As it is, we leak them if sget() decides that new superblock it has allocated (and never shown to anybody) isn't needed and should be freed. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 10月, 2013 5 次提交
-
-
由 Miklos Szeredi 提交于
fuse_access() is never called in RCU walk, only on the final component of access(2) and chdir(2)... Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Doing dput(parent) is not valid in RCU walk mode. In RCU mode it would probably be okay to update the parent flags, but it's actually not necessary most of the time... So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was recently initialized by READDIRPLUS. This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by READDIRPLUS and only dropping out of RCU mode if this flag is set. FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in the parent. Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
-
由 Miklos Szeredi 提交于
If revalidate finds an invalid dentry in RCU walk mode, let the VFS deal with it instead of calling check_submounts_and_drop() which is not prepared for being called from RCU walk. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
-
由 Vyacheslav Dubeyko 提交于
Many NILFS2 users were reported about strange file system corruption (for example): NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768 NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540) But such error messages are consequence of file system's issue that takes place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com> and Anton Eliasson <devel@antoneliasson.se> were reported about another issue not so recently. These reports describe the issue with segctor thread's crash: BUG: unable to handle kernel paging request at 0000000000004c83 IP: nilfs_end_page_io+0x12/0xd0 [nilfs2] Call Trace: nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2] nilfs_segctor_construct+0x17b/0x290 [nilfs2] nilfs_segctor_thread+0x122/0x3b0 [nilfs2] kthread+0xc0/0xd0 ret_from_fork+0x7c/0xb0 These two issues have one reason. This reason can raise third issue too. Third issue results in hanging of segctor thread with eating of 100% CPU. REPRODUCING PATH: One of the possible way or the issue reproducing was described by Jermoe me Poulin <jeromepoulin@gmail.com>: 1. init S to get to single user mode. 2. sysrq+E to make sure only my shell is running 3. start network-manager to get my wifi connection up 4. login as root and launch "screen" 5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies. 6. lscp | xz -9e > lscp.txt.xz 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs 8. start a screen to dump /proc/kmsg to text file since rsyslog is killed 9. start a screen and launch strace -f -o find-cat.log -t find /mnt/nilfs -type f -exec cat {} > /dev/null \; 10. start a screen and launch strace -f -o apt-get.log -t apt-get update 11. launch the last command again as it did not crash the first time 12. apt-get crashes 13. ps aux > ps-aux-crashed.log 13. sysrq+W 14. sysrq+E wait for everything to terminate 15. sysrq+SUSB Simplified way of the issue reproducing is starting kernel compilation task and "apt-get update" in parallel. REPRODUCIBILITY: The issue is reproduced not stable [60% - 80%]. It is very important to have proper environment for the issue reproducing. The critical conditions for successful reproducing: (1) It should have big modified file by mmap() way. (2) This file should have the count of dirty blocks are greater that several segments in size (for example, two or three) from time to time during processing. (3) It should be intensive background activity of files modification in another thread. INVESTIGATION: First of all, it is possible to see that the reason of crash is not valid page address: NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783 Moreover, value of b_page (0x1a82) is 6786. This value looks like segment number. And b_blocknr with b_size values look like block numbers. So, buffer_head's pointer points on not proper address value. Detailed investigation of the issue is discovered such picture: [-----------------------------SEGMENT 6783-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783 [-----------------------------SEGMENT 6784-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8 NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15 [-----------------------------SEGMENT 6785-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88 NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12 NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785 NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 BUG: unable to handle kernel paging request at 0000000000001a82 IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2] Usually, for every segment we collect dirty files in list. Then, dirty blocks are gathered for every dirty file, prepared for write and submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes place complete write phase after calling nilfs_end_bio_write() on the block layer. Buffers/pages are marked as not dirty on final phase and processed files removed from the list of dirty files. It is possible to see that we had three prepare_write and submit_bio phases before segbuf_wait and complete_write phase. Moreover, segments compete between each other for dirty blocks because on every iteration of segments processing dirty buffer_heads are added in several lists of payload_buffers: [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 The next pointer is the same but prev pointer has changed. It means that buffer_head has next pointer from one list but prev pointer from another. Such modification can be made several times. And, finally, it can be resulted in various issues: (1) segctor hanging, (2) segctor crashing, (3) file system metadata corruption. FIX: This patch adds: (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write() for every proccessed dirty block; (2) checking of BH_Async_Write flag in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_dirty_node_buffers(); (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(), nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page(). Reported-by: NJerome Poulin <jeromepoulin@gmail.com> Reported-by: NAnton Eliasson <devel@antoneliasson.se> Cc: Paul Fertser <fercerpav@gmail.com> Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Cc: Piotr Szymaniak <szarpaj@grubelek.pl> Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net> Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com> Cc: Elmer Zhang <freeboy6716@gmail.com> Cc: Kenneth Langga <klangga@gmail.com> Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Aloni 提交于
A high setting of max_map_count, and a process core-dumping with a large enough vm_map_count could result in an NT_FILE note not being written, and the kernel crashing immediately later because it has assumed otherwise. Reproduction of the oops-causing bug described here: https://lkml.org/lkml/2013/8/30/50 Rge ussue originated in commit 2aa362c4 ("coredump: extend core dump note section to contain file names of mapped file") from Oct 4, 2012. This patch make that section optional in that case. fill_files_note() should signify the error, and also let the info struct in elf_core_dump() be zero-initialized so that we can check for the optionally written note. [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables] [akpm@linux-foundation.org: fix sparse warning] Signed-off-by: NDan Aloni <alonid@stratoscale.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <vda.linux@googlemail.com> Reported-by: NMartin MOKREJS <mmokrejs@gmail.com> Tested-by: NMartin MOKREJS <mmokrejs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 9月, 2013 7 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Lubomir Rintel 提交于
Superblock lock was replaced with (un)lock_super() removal, but left uninitialized for Seventh Edition UNIX filesystem in the following commit (3.7): c07cb01c sysv: drop lock/unlock super Signed-off-by: NLubomir Rintel <lkundrak@v3.sk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Anna Schumaker 提交于
The previous patch introduces a compile warning by not assigning an initial value to the "flavor" variable. This could only be a problem if the server returns a supported secflavor list of length zero, but it's better to fix this before it's ever hit. Signed-off-by: NAnna Schumaker <bjschuma@netapp.com> Acked-by: NWeston Andros Adamson <dros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Weston Andros Adamson 提交于
Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until one works. One example of a situation this fixes: - server configured for krb5 - server principal somehow gets deleted from KDC - server still thinking krb is good, sends krb5 as first entry in SECINFO_NO_NAME response - client tries krb5, but this fails without even sending an RPC because gssd's requests to the KDC can't find the server's principal Signed-off-by: NWeston Andros Adamson <dros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
We need to ensure that the initialisation of the data server nfs_client structure in nfs4_ds_connect is correctly ordered w.r.t. the read of ds->ds_clp in nfs4_fl_prepare_ds. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
- Fix an Oops when nfs4_ds_connect() returns an error. - Always check the device status after waiting for a connect to complete. Reported-by: NAndy Adamson <andros@netapp.com> Reported-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: <stable@vger.kernel.org> # v3.10+
-
- 27 9月, 2013 1 次提交
-
-
由 Benjamin LaHaise 提交于
Dmitry Vyukov managed to trigger a case where aio_migratepage can cause a use-after-free during teardown of the aio ring buffer's mapping. This turns out to be caused by access to the ioctx's ring_pages via the migratepage operation which was not being protected by any locks during ioctx freeing. Use the address_space's private_lock to protect use and updates of the mapping's private_data, and make ioctx teardown unlink the ioctx from the address space. Reported-by: NDmitry Vyukov <dvyukov@google.com> Tested-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
-
- 26 9月, 2013 4 次提交
-
-
由 Mark Tinguely 提交于
Commit f5ea1100 cleans up the disk to host conversions for node directory entries, but because a variable is reused in xfs_node_toosmall() the next node is not correctly found. If the original node is small enough (<= 3/8 of the node size), this change may incorrectly cause a node collapse when it should not. That will cause an assert in xfstest generic/319: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569 Keep the original node header to get the correct forward node. (When a node is considered for a merge with a sibling, it overwrites the sibling pointers of the original incore nodehdr with the sibling's pointers. This leads to loop considering the original node as a merge candidate with itself in the second pass, and so it incorrectly determines a merge should occur.) Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> [v3: added Dave Chinner's (slightly modified) suggestion to the commit header, cleaned up whitespace. -bpm]
-
由 Trond Myklebust 提交于
Determine if we've created a new file by examining the directory change attribute and/or the O_EXCL flag. This fixes a regression when doing a non-exclusive create of a new file. If the FILE_CREATED flag is not set, the atomic_open() command will perform full file access permissions checks instead of just checking for MAY_OPEN. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Steve French 提交于
To 2.02 Signed-off-by: NSteve French <smfrench@gmail.com>
-
由 Steve French 提交于
These flags were unused by cifs and since the EXT flags have been moved to common code in uapi/linux/fs.h we won't need to have a cifs specific copy. Signed-off-by: NSteve French <smfrench@gmail.com>
-
- 25 9月, 2013 6 次提交
-
-
由 Goldwyn Rodrigues 提交于
While printing 32-bit node numbers, an 8-byte string is not enough. Increase the size of the string to 12 chars. This got left out in commit 49fa8140 ("fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers"). Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kent Overstreet 提交于
The memcpy() in bio_copy_data() was using the wrong offset vars, leading to data corruption in weird unusual setups. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-stable <stable@vger.kernel.org> # >= v3.9 Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Chinner 提交于
After a fair number of xfstests runs, xfs/182 started to fail regularly with a corrupted directory - a directory read verifier was failing after recovery because it found a block with a XARM magic number (remote attribute block) rather than a directory data block. The first time I saw this repeated failure I did /something/ and the problem went away, so I was never able to find the underlying problem. Test xfs/182 failed again today, and I found the root cause before I did /something else/ that made it go away. Tracing indicated that the block in question was being correctly logged, the log was being flushed by sync, but the buffer was not being written back before the shutdown occurred. Tracing also indicated that log recovery was also reading the block, but then never writing it before log recovery invalidated the cache, indicating that it was not modified by log recovery. More detailed analysis of the corpse indicated that the filesystem had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which indicated it was a block from an older filesystem. The reason that log recovery didn't replay it was that the LSN in the XARM block was larger than the LSN of the transaction being replayed, and so the block was not overwritten by log recovery. Hence, log recovery cant blindly trust the magic number and LSN in the block - it must verify that it belongs to the filesystem being recovered before using the LSN. i.e. if the UUIDs don't match, we need to unconditionally recovery the change held in the log. This patch was first tested on a block device that was repeatedly causing xfs/182 to fail with the same failure on the same block with the same directory read corruption signature (i.e. XARM block). It did not fail, and hasn't failed since. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
It uses a kernel internal structure in it's definition rather than the user visible structure that is passed to the ioctl. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
When we free an inode, we do so via RCU. As an RCU lookup can occur at any time before we free an inode, and that lookup takes the inode flags lock, we cannot safely assert that the flags lock is not held just before marking it dead and running call_rcu() to free the inode. We check on allocation of a new inode structre that the lock is not held, so we still have protection against locks being leaked and hence not correctly initialised when allocated out of the slab. Hence just remove the assert... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Regression introduced by commit 46f9d2eb ("xfs: aborted buf items can be in the AIL") which fails to lock the AIL before removing the item. Spinlock debugging throws a warning about this. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 24 9月, 2013 2 次提交
-
-
由 Jeff Mahoney 提交于
There are two locks involved in managing the journal lists. The general reiserfs_write_lock and the journal->j_flush_mutex. While flush_journal_list is sleeping to acquire the j_flush_mutex or to submit a block for write, it will drop the write lock. This allows another thread to acquire the write lock and ultimately call flush_used_journal_lists to traverse the list of journal lists and select one for flushing. It can select the journal_list that has just had flush_journal_list called on it in the original thread and call it again with the same journal_list. The second thread then drops the write lock to acquire j_flush_mutex and the first thread reacquires it and continues execution and eventually clears and frees the journal list before dropping j_flush_mutex and returning. The second thread acquires j_flush_mutex and ends up operating on a journal_list that has already been released. If the memory hasn't been reused, we'll soon after hit a BUG_ON because the transaction id has already been cleared. If it's been reused, we'll crash in other fun ways. Since flush_journal_list will synchronize on j_flush_mutex, we can fix the race by taking a proper reference in flush_used_journal_lists and checking to see if it's still valid after the mutex is taken. It's safe to iterate the list of journal lists and pick a list with just the write lock as long as a reference is taken on the journal list before we drop the lock. We already have code to handle whether a transaction has been flushed already so we can use that to handle the race and get rid of the trans_id BUG_ON. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jeff Mahoney 提交于
Commit a3172027 introduced test_transaction as a requirement for flushing old lists -- but it can never return 1 unless the transaction has already been flushed. As a result, we have a routine that iterates the j_realblocks list but doesn't actually do anything. Since it's been this way since 2006 and the latency numbers were what Chris expected, let's just rip it out. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NJan Kara <jack@suse.cz>
-