- 17 5月, 2010 6 次提交
-
-
由 Jan Kara 提交于
We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it should be testing "best-extent.fe_len >= orig-extent.fe_len" , not "orig-extent.fe_len >= goal-extent.fe_len" . Signed-off-by: NCurt Wohlgemuth <curtw@google.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
This adds a new field in ext4_group_info to cache the largest available block range in a block group; and don't load the buddy pages until *after* we've done a sanity check on the block group. With large allocation requests (e.g., fallocate(), 8MiB) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Nikanth Karthikesan 提交于
Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NAmit Arora <aarora@in.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
Addresses-Google-Bug: #2562325 Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 5月, 2010 13 次提交
-
-
由 Eric Sandeen 提交于
Because we can badly over-reserve metadata when we calculate worst-case, it complicates things for quota, since we must reserve and then claim later, retry on EDQUOT, etc. Quota is also a generally smaller pool than fs free blocks, so this over-reservation hurts more, and more often. I'm of the opinion that it's not the worst thing to allow metadata to push a user slightly over quota. This simplifies the code and avoids the false quota rejections that result from worst-case speculation. This patch stops the speculative quota-charging for worst-case metadata requirements, and just charges quota when the blocks are allocated at writeout. It also is able to remove the try-again loop on EDQUOT. This patch has been tested indirectly by running the xfstests suite with a hack to mount & enable quota prior to the test. I also did a more specific test of fragmenting freespace and then doing a large delalloc write under quota; quota stopped me at the right amount of file IO, and then the writeout generated enough metadata (due to the fragmentation) that it put me slightly over quota, as expected. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
To simplify metadata tracking for delalloc writes, ext4 will simply claim metadata blocks at allocation time, without first speculatively reserving the worst case and then freeing what was not used. To do this, we need a mechanism to track allocations in the quota subsystem, but potentially allow that allocation to actually go over quota. This patch adds a DQUOT_SPACE_NOFAIL flag and function variants for this purpose. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Switch __dquot_alloc_space and __dquot_free_space to take flags to indicate whether to warn and/or to reserve (or free reserve). This is slightly more readable at the callpoints, and makes it cleaner to add a "nofail" option in the next patch. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
- Reorganize locking scheme to batch two atomic operation in to one. This also allow us to state what healthy group must obey following rule ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amt; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it call ext4_get_blocks() multiple times. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext4_setattr. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549Reported-by: NAlessandro Polverini <alex@nibbles.it> Test-case-by: NChristoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
allocated_meta_data is already included in 'used' variable. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 15 5月, 2010 1 次提交
-
-
由 Christian Borntraeger 提交于
I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat save, only types with fixed widths are used: { __u32 reserved; /* should be zero */ __u32 donor_fd; /* donor file descriptor */ __u64 orig_start; /* logical start offset in block for orig */ __u64 donor_start; /* logical start offset in block for donor */ __u64 len; /* block length to be moved */ __u64 moved_len; /* moved block length */ }; Lets just wire up EXT4_IOC_MOVE_EXT for the compat case. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com>
-
- 14 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
When EIO occurs after bio is submitted, there is no memory free operation for bio, which results in memory leakage. And there is also no check against bio_alloc() for bio. Acked-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 5月, 2010 1 次提交
-
-
由 liuqi_123 提交于
Making sure ee_block is initialized to zero to prevent gcc from kvetching. It's harmless (although it's not obvious that it's harmless) from code inspection: fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used uninitialized in this function Thanks to Stefan Richter for first bringing this to the attention of linux-ext4@vger.kernel.org. Signed-off-by: LiuQi <lingjiujianke@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
-
- 10 5月, 2010 1 次提交
-
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 04 5月, 2010 12 次提交
-
-
由 Joel Becker 提交于
gcc warns that a variable is uninitialized. It's actually handled, but an early return fools gcc. Let's just initialize the variable to a garbage value that will crash if the usage is ever broken. Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sage Weil 提交于
It's useless, since our allocations are already a power of 2. And it was allocated per-instance (not globally), which caused a name collision when we tried to mount a second file system with auth_x enabled. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The __ variant requires caller to hold i_lock. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If a rename operation is resent to the MDS following an MDS restart, the client does not get a full reply (containing the resulting metadata) back. In that case, a ceph_rename() needs to compensate by doing anything useful that fill_inode() would have, like d_move(). It also needs to invalidate the dentry (to workaround the vfs_rename_dir() bug) and clear the dir complete flag, just like fill_trace(). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
truncate_inode_pages_range wants the end offset to align with the last byte in a page. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We can get old message seq #'s after a tcp reconnect for stateful sessions (i.e., the MDS). If we get a higher seq #, that is an error, and we shouldn't see any bad seq #'s for stateless (mon, osd) connections. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Increment in_seq even when the message is skipped for some reason. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The snap realm split was checking i_snap_realm, not the list_head, to determine if an inode belonged in the new realm. The check always failed, which meant we always moved the inode, corrupting the old realm's list and causing various crashes. Also wait to release old realm reference to avoid possibility of use after free. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 03 5月, 2010 1 次提交
-
-
由 Ryusuke Konishi 提交于
As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit 5129a469 ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 5月, 2010 2 次提交
-
-
由 David Howells 提交于
Fix a number of RCU issues in the NFSv4 delegation code. (1) delegation->cred doesn't need to be RCU protected as it's essentially an invariant refcounted structure. By the time we get to nfs_free_delegation(), the delegation is being released, so no one else should be attempting to use the saved credentials, and they can be cleared. However, since the list of delegations could still be under traversal at this point by such as nfs_client_return_marked_delegations(), the cred should be released in nfs_do_free_delegation() rather than in nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is insufficient as that doesn't stop the cred from being destroyed, and nor does calling put_rpccred() after call_rcu(), given that the latter is asynchronous. (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use rcu_derefence_protected() because they can only be called if nfs_client::cl_lock is held, and that guards against anyone changing nfsi->delegation under it. Furthermore, the barrier imposed by rcu_dereference() is superfluous, given that the spin_lock() is also a barrier. (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client struct so that it can issue lockdep advice based on clp->cl_lock for (2). (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation() should use rcu_access_pointer() outside the spinlocked region as they merely examine the pointer and don't follow it, thus rendering unnecessary the need to impose a partial ordering over the one item of interest. These result in an RCU warning like the following: [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by mount.nfs4/2281: #0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80 #1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a stack backtrace: Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110 Call Trace: [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs] [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs] [<ffffffff810c2d92>] clear_inode+0x9e/0xf8 [<ffffffff810c3028>] dispose_list+0x67/0x10e [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810b25bc>] deactivate_super+0x68/0x80 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8 [<ffffffff810c681b>] release_mounts+0x9a/0xb0 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs] [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs] [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs] [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8 [<ffffffff810c810b>] do_mount+0x782/0x7f9 [<ffffffff810c8205>] sys_mount+0x83/0xbe [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b Also on: fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs] [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs] [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs] ... And: fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs] [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs] [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs] [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs] ... Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Ensure that we correctly rcu-dereference the delegation itself, and that we protect against removal while we're changing the contents. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-