- 17 5月, 2010 11 次提交
-
-
由 Dmitry Monakhov 提交于
At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
EXT4_ERROR_INODE() tends to provide better error information and in a more consistent format. Some errors were not even identifying the inode or directory which was corrupted, which made them not very useful. Addresses-Google-Bug: #2507977 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This saves a huge amount of stack space by avoiding unnecesary struct buffer_head's from being allocated on the stack. In addition, to make the code easier to understand, collapse and refactor ext4_get_block(), ext4_get_block_write(), noalloc_get_block_write(), into a single function. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Jack up ext4_get_blocks() and add a new function, ext4_map_blocks() which uses a much smaller structure, struct ext4_map_blocks which is 20 bytes, as opposed to a struct buffer_head, which nearly 5 times bigger on an x86_64 machine. By switching things to use ext4_map_blocks(), we can save stack space by using ext4_map_blocks() since we can avoid allocating a struct buffer_head on the stack. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Make a copy of write_cache_pages() for the benefit of ext4_da_writepages(). This allows us to simplify the code some, and will allow us to further customize the code in future patches. There are some nasty hacks in write_cache_pages(), which Linus has (correctly) characterized as vile. I've just copied it into write_cache_pages_da(), without trying to clean those bits up lest I break something in the ext4's delalloc implementation, which is a bit fragile right now. This will allow Dave Chinner to clean up write_cache_pages() in mm/page-writeback.c, without worrying about breaking ext4. Eventually write_cache_pages_da() will go away when I rewrite ext4's delayed allocation and create a general ext4_writepages() which is used for all of ext4's writeback. Until now this is the lowest risk way to clean up the core write_cache_pages() function. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com>
-
由 Jan Kara 提交于
We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it should be testing "best-extent.fe_len >= orig-extent.fe_len" , not "orig-extent.fe_len >= goal-extent.fe_len" . Signed-off-by: NCurt Wohlgemuth <curtw@google.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
This adds a new field in ext4_group_info to cache the largest available block range in a block group; and don't load the buddy pages until *after* we've done a sanity check on the block group. With large allocation requests (e.g., fallocate(), 8MiB) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Nikanth Karthikesan 提交于
Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NAmit Arora <aarora@in.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
Addresses-Google-Bug: #2562325 Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 5月, 2010 13 次提交
-
-
由 Eric Sandeen 提交于
Because we can badly over-reserve metadata when we calculate worst-case, it complicates things for quota, since we must reserve and then claim later, retry on EDQUOT, etc. Quota is also a generally smaller pool than fs free blocks, so this over-reservation hurts more, and more often. I'm of the opinion that it's not the worst thing to allow metadata to push a user slightly over quota. This simplifies the code and avoids the false quota rejections that result from worst-case speculation. This patch stops the speculative quota-charging for worst-case metadata requirements, and just charges quota when the blocks are allocated at writeout. It also is able to remove the try-again loop on EDQUOT. This patch has been tested indirectly by running the xfstests suite with a hack to mount & enable quota prior to the test. I also did a more specific test of fragmenting freespace and then doing a large delalloc write under quota; quota stopped me at the right amount of file IO, and then the writeout generated enough metadata (due to the fragmentation) that it put me slightly over quota, as expected. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
To simplify metadata tracking for delalloc writes, ext4 will simply claim metadata blocks at allocation time, without first speculatively reserving the worst case and then freeing what was not used. To do this, we need a mechanism to track allocations in the quota subsystem, but potentially allow that allocation to actually go over quota. This patch adds a DQUOT_SPACE_NOFAIL flag and function variants for this purpose. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Switch __dquot_alloc_space and __dquot_free_space to take flags to indicate whether to warn and/or to reserve (or free reserve). This is slightly more readable at the callpoints, and makes it cleaner to add a "nofail" option in the next patch. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
- Reorganize locking scheme to batch two atomic operation in to one. This also allow us to state what healthy group must obey following rule ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amt; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it call ext4_get_blocks() multiple times. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext4_setattr. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549Reported-by: NAlessandro Polverini <alex@nibbles.it> Test-case-by: NChristoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
allocated_meta_data is already included in 'used' variable. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 15 5月, 2010 1 次提交
-
-
由 Christian Borntraeger 提交于
I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat save, only types with fixed widths are used: { __u32 reserved; /* should be zero */ __u32 donor_fd; /* donor file descriptor */ __u64 orig_start; /* logical start offset in block for orig */ __u64 donor_start; /* logical start offset in block for donor */ __u64 len; /* block length to be moved */ __u64 moved_len; /* moved block length */ }; Lets just wire up EXT4_IOC_MOVE_EXT for the compat case. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com>
-
- 14 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 5月, 2010 1 次提交
-
-
由 Jing Zhang 提交于
When EIO occurs after bio is submitted, there is no memory free operation for bio, which results in memory leakage. And there is also no check against bio_alloc() for bio. Acked-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: NJing Zhang <zj.barak@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 5月, 2010 1 次提交
-
-
由 liuqi_123 提交于
Making sure ee_block is initialized to zero to prevent gcc from kvetching. It's harmless (although it's not obvious that it's harmless) from code inspection: fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used uninitialized in this function Thanks to Stefan Richter for first bringing this to the attention of linux-ext4@vger.kernel.org. Signed-off-by: LiuQi <lingjiujianke@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
-
- 10 5月, 2010 4 次提交
-
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Linus Torvalds 提交于
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6由 Linus Torvalds 提交于
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error [SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error [SCSI] scsi_debug: virtual_gb ignores sector_size [SCSI] libiscsi: regression: fix header digest errors [SCSI] fix locking around blk_abort_request() [SCSI] advansys: fix narrow board error path
-
由 Arjan van de Ven 提交于
commit 672917dc ("cpuidle: menu governor: reduce latency on exit") added an optimization, where the analysis on the past idle period moved from the end of idle, to the beginning of the new idle. Unfortunately, this optimization had a bug where it zeroed one key variable for new use, that is needed for the analysis. The fix is simple, zero the variable after doing the work from the previous idle. During the audit of the code that found this issue, another issue was also found; the ->measured_us data structure member is never set, a local variable is always used instead. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 5月, 2010 7 次提交
-
-
git://neil.brown.name/md由 Linus Torvalds 提交于
* 'for-linus' of git://neil.brown.name/md: md: restore ability of spare drives to spin down. md/raid6: Fix raid-6 read-error correction in degraded state
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6由 Linus Torvalds 提交于
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: pcmcia: fix compilation after 16bit state locking changes pcmcia: order userspace suspend and resume requests pcmcia: avoid pccard_validate_cis failure in resume callpath
-
git://git.kernel.dk/linux-2.6-block由 Linus Torvalds 提交于
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: blk-cgroup: Fix an RCU warning in blkiocg_create() blk-cgroup: Fix RCU correctness warning in cfq_init_queue() drbd: don't expose failed local READ to upper layers
-
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6由 Linus Torvalds 提交于
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/ttm: Remove the ttm_bo_block_reservation() function. drm/ttm: Remove some leftover debug messages. drm/radeon: async event synchronization for drmWaitVblank
-
由 Stijn Tintel 提交于
Move initialization of the virtio framework before the initialization of mtd, so that block2mtd can be used on virtio-based block devices. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15644Signed-off-by: NStijn Tintel <stijn@linux-ipv6.be> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.linux-nfs.org/projects/trondmy/nfs-2.6由 Linus Torvalds 提交于
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix RCU issues in the NFSv4 delegation code NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
-
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6由 Linus Torvalds 提交于
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI: sleep: init_set_sci_en_on_resume for Dell Studio 155x ACPI: fix acpi_hest_firmware_first_pci() caused oops sbshc: acpi_device_class "smbus_host_controller" too long power_meter: acpi_device_class "power_meter_resource" too long acpi_pad: "processor_aggregator" name too long PNP: don't check for conflicts with bridge windows ACPI: DMI init_set_sci_en_on_resume for multiple Lenovo ThinkPads PNPACPI: compute Address Space length rather than using _LEN ACPI: silence kmemcheck false positive
-