- 07 3月, 2012 6 次提交
-
-
由 Benny Halevy 提交于
Handle the case where the nfsv4.1 client asked to uprade or downgrade its delegations and server returns no delegation. In this case, op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT and op_why_no_deleg is set respectively to WND4_NOT_SUPP_{UP,DOWN}GRADE Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Benny Halevy 提交于
When a 4.1 client asks for a delegation and the server returns none op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT and op_why_no_deleg is set to either WND4_CONTENTION or WND4_RESOURCE. Or, if the client sent a NFS4_SHARE_WANT_CANCEL (which it is not supposed to ever do until our server supports delegations signaling), op_why_no_deleg is set to WND4_CANCELLED. Note that for WND4_CONTENTION and WND4_RESOURCE, the xdr layer is hard coded at this time to encode boolean FALSE for ond_server_will_push_deleg / ond_server_will_signal_avail. Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Another leak on error Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The current code never calls nfsd4_shutdown_recdir if nfs4_state_start returns an error. Also, it's better to go ahead and consolidate these functions since one is just a trivial wrapper around the other. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
To escape having your stable storage record purged at the end of the grace period, it's not sufficient to simply have performed a setclientid_confirm; you also need to meet the same requirements as someone creating a new record: either you should have done an open or open reclaim (in the 4.0 case) or a reclaim_complete (in the 4.1 case). Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
We set cl_firststate when we first decide that a client will be permitted to reclaim state on next boot. This happens: - for new 4.0 clients, when they confirm their first open - for returning 4.0 clients, when they reclaim their first open - for 4.1+ clients, when they perform reclaim_complete We also use cl_firststate to decide whether a reclaim_complete has already been performed, in the 4.1+ case. We were setting it on 4.1 open reclaims, which caused spurious COMPLETE_ALREADY errors on RECLAIM_COMPLETE from an nfs4.1 client with anything to reclaim. Reported-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 18 2月, 2012 5 次提交
-
-
由 Benny Halevy 提交于
Respect client request for not getting a delegation in NFSv4.1 Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT and WND4_NOT_WANTED reason. [nfsd41: add missing break when encoding op_why_no_deleg] Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Bryan Schumaker 提交于
When I initially wrote it, I didn't understand how lists worked so I wrote something that didn't use them. I think making a list of stateids to test is a more straightforward implementation, especially compared to especially compared to decoding stateids while simultaneously encoding a reply to the client. Signed-off-by: NBryan Schumaker <bjschuma@netapp.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 NeilBrown 提交于
If you try to set grace_period or timeout via a module parameter to lockd, and do this on a big-endian machine where sizeof(int) != sizeof(unsigned long) it won't work. This number given will be effectively shifted right by the difference in those two sizes. So cast kp->arg properly to get correct result. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Benny Halevy 提交于
Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Benny Halevy 提交于
Currently, it will not correctly ignore any nfsv4.1 signal flags if the client sends them. Signed-off-by: NBenny Halevy <bhalevy@tonian.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 16 2月, 2012 10 次提交
-
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tigran Mkrtchyan 提交于
Signed-off-by: NTigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 15 2月, 2012 2 次提交
-
-
由 J. Bruce Fields 提交于
This fixes an oops when a buggy client tries to use an initial seqid of 0 on a new slot, which we may misinterpret as a replay. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Combine two booleans into a single flag field, move the smaller fields to the end. (In practice this doesn't make the struct any smaller. But we'll be adding another flag here soon.) Remove some debugging code that doesn't look useful, while we're in the neighborhood. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 14 2月, 2012 1 次提交
-
-
由 J. Bruce Fields 提交于
From RFC 5661 2.10.6.1: "If the previous sequence ID was 0xFFFFFFFF, then the next request for the slot MUST have the sequence ID set to zero." While we're there, delete some redundant comments. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 04 2月, 2012 3 次提交
-
-
由 J. Bruce Fields 提交于
The rpc buffers will be allocated out of low memory, so we should really only be taking that into account. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Move calculation of the default into a helper function. Get rid of an unused variable "err" while we're there. Thanks to Mi Jinlong for catching an arithmetic error in a previous version. Cc: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Dan Carpenter 提交于
We check for zero length strings in the caller now, so these aren't needed. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 28 1月, 2012 9 次提交
-
-
由 Joern Engel 提交于
Not all mtd drivers define block_isbad(). Let's assume no bad blocks instead of refusing to mount. Signed-off-by: NJoern Engel <joern@logfs.org>
-
由 Joern Engel 提交于
Can be necessary if an inode gets deleted (through -ENOSPC) before being written. Might be better to move this into logfs_write_rec(), but for now go with the stupid&safe patch. Signed-off-by: NJoern Engel <joern@logfs.org>
-
由 Joern Engel 提交于
Or hit an assertion in map_invalidatepage() instead. Signed-off-by: NJoern Engel <joern@logfs.org>
-
由 Joern Engel 提交于
It prevents write sizes >4k. Signed-off-by: NJoern Engel <joern@logfs.org>
-
由 Prasad Joshi 提交于
During GC LogFS has to rewrite each valid block to a separate segment. Rewrite operation reads data from an old segment and writes it to a newly allocated segment. Since every write operation changes data block pointers maintained in inode, inode should also be rewritten. In GC path to avoid AB-BA deadlock LogFS marks a page with PG_pre_locked in addition to locking the page (PG_locked). The page lock is ignored iff the page is pre-locked. LogFS uses a special file called segment file. The segment file maintains an 8 bytes entry for every segment. It keeps track of erase count, level etc. for every segment. Bad things happen with a segment belonging to the segment file is GCed ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/readwrite.c:297! invalid opcode: 0000 [#1] SMP Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4 serio_raw [last unloaded: logfs] Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH VirtualBox EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0 EIP is at logfs_lock_write_page+0x6a/0x70 [logfs] EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094 ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000) Stack: f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4 00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5 00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900 Call Trace: [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs] [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs] [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs] [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs] [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs] [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs] [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60 [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs] [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs] [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs] [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs] [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs] [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0 [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs] [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs] [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs] [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs] [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs] [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs] [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs] [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs] [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs] [<c1126b21>] mount_fs+0x21/0xd0 [<c10f6f6f>] ? __alloc_percpu+0xf/0x20 [<c113da41>] ? alloc_vfsmnt+0xb1/0x130 [<c113db4b>] vfs_kern_mount+0x4b/0xa0 [<c113e06e>] do_kern_mount+0x3e/0xe0 [<c113f60d>] do_mount+0x34d/0x670 [<c10f2749>] ? strndup_user+0x49/0x70 [<c113fcab>] sys_mount+0x6b/0xa0 [<c142d87c>] syscall_call+0x7/0xb Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09 EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18 ---[ end trace 96e67d5b3aa3d6ca ]--- The patch passes locked page to __logfs_write_inode. It calls function logfs_get_wblocks() to pre-lock the page. This ensures any further attempts to lock the page are ignored (esp from get_erase_count). Acked-by: NJoern Engel <joern@logfs.org> Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
-
由 Prasad Joshi 提交于
While unmounting the file system LogFS calls generic_shutdown_super. The function does file system independent superblock shutdown. However, it might result in call file system specific inode eviction. LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in super->s_flags. Since, inode eviction might call truncate on inode, following BUG is observed when file system is unmounted: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/segment.c:362! invalid opcode: 0000 [#1] PREEMPT SMP CPU 3 Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp parport psmouse floppy virtio_pci serio_raw virtio_ring virtio Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa008c841>] [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202 RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000 RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000 RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000 R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40 R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000 FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount (pid: 1933, threadinfo ffff880062d7a000, task ffff880070b44500) Stack: ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000 8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460 ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000 Call Trace: [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs] [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs] [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs] [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs] [<ffffffff8115eef5>] evict+0x85/0x170 [<ffffffff8115f126>] iput+0xe6/0x1b0 [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280 [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90 [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100 [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs] [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70 [<ffffffff811487ea>] deactivate_super+0x4a/0x70 [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0 [<ffffffff8116469f>] sys_umount+0x6f/0x380 [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55 b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f> 0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66 RIP [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP <ffff880062d7b9e8> ---[ end trace fe6b040cea952290 ]--- Therefore, move super->s_flags setting after the fs-indenpendent work has been finished. Reviewed-by: NJoern Engel <joern@logfs.org> Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
-
由 Prasad Joshi 提交于
LogFS uses super->s_write_mutex while writing data to disk. Taking the same mutex lock in sync and fsync code path solves the following BUG: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/dev_bdev.c:134! Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa007deed>] [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] Call Trace: [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs] [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100 [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs] [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20 [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130 [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs] [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs] [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs] [<ffffffff810f5ba7>] __writepage+0x17/0x40 [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0 [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70 [<ffffffff810f653a>] generic_writepages+0x4a/0x70 [<ffffffff810f75d1>] do_writepages+0x21/0x40 [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250 [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0 [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0 [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120 [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0 [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290 [<ffffffff8106d2e6>] kthread+0x96/0xa0 [<ffffffff814de514>] kernel_thread_helper+0x4/0x10 [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814de510>] ? gs_change+0xb/0xb RIP [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] ---[ end trace 0211ad60a57657c4 ]--- Reviewed-by: NJoern Engel <joern@logfs.org> Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
-
由 Joern Engel 提交于
This is a bad one. I wonder whether we were so far protected by no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS. Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch. Signed-off-by: NJoern Engel <joern@logfs.org> Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
-
由 Prasad Joshi 提交于
LogFS sets PG_private flag to indicate a pined page. We assumed that marking a page as private is enough to ensure its existence. But instead it is necessary to hold a reference count to the page. The change resolves the following BUG BUG: Bad page state in process flush-253:16 pfn:6a6d0 page flags: 0x100000000000808(uptodate|private) Suggested-and-Acked-by: NJoern Engel <joern@logfs.org> Signed-off-by: NPrasad Joshi <prasadjoshi.linux@gmail.com>
-
- 27 1月, 2012 4 次提交
-
-
由 Chris Mason 提交于
Josef fixed btrfs_page_mkwrite to properly release reserved extents if there was an error. But if we fail to get a reservation and we fail to dirty the inode (for ENOSPC reasons), we'll end up trying to release a reservation we never had. This makes sure we only release if we were able to reserve. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
If we span a long area in a bitmap we could end up taking a lot of time searching to the next free area if we're searching from the original window_start, so advance window_start in order to make sure we don't do any superficial searching. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Sterba 提交于
btree_releasepage is a callback and can be passed unknown gfp flags and then they may end up in kmem_cache_alloc called from alloc_extent_state, slab allocator will BUG_ON when there is HIGHMEM or DMA32 flag set. This may happen when btrfs is mounted from a loop device, which masks out __GFP_IO flag. The check in try_release_extent_state 3399 if ((mask & GFP_NOFS) == GFP_NOFS) 3400 mask = GFP_NOFS; will not work and passes unfiltered flags further resulting in crash at mm/slab.c:2963 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8 [<000000000024c810>] kmem_cache_alloc+0x204/0x294 [<00000000001fd3c2>] mempool_alloc+0x52/0x170 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs] [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs] [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs] [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs] [<0000000000210d84>] shrink_page_list+0x6a0/0x724 [<0000000000211394>] shrink_inactive_list+0x230/0x578 [<0000000000211bb8>] shrink_list+0x6c/0x120 [<0000000000211e4e>] shrink_zone+0x1e2/0x228 [<0000000000211f24>] shrink_zones+0x90/0x254 [<0000000000213410>] do_try_to_free_pages+0xac/0x420 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8 Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
When we did sysbench test for inline files, enospc error happened easily though there was lots of free disk space which could be allocated for new chunks. Reproduce steps: # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition> # mount <test partition> /mnt # ulimit -n 102400 # cd /mnt # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr prepare # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr run <soon later, BUG_ON() was triggered by enospc error> The reason of this bug is: Now, we can reserve space which is larger than the free space in the chunks if we have enough free disk space which can be used for new chunks. By this way, the space allocator should allocate a new chunk by force if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is if (ret == -ENOSPC && num_bytes > min_alloc_size) in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk even we fail to allocate free space by minimum allocable size. The other is if (space_info->force_alloc) force = space_info->force_alloc; in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. Especially the second one, we fix it by changing the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value which is passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-