- 22 5月, 2010 10 次提交
-
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Roland Dreier 提交于
> ============================================= > [ INFO: possible recursive locking detected ] > 2.6.31-2-generic #14~rbd3 > --------------------------------------------- > firefox-3.5/4162 is trying to acquire lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > but task is already holding lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > other info that might help us debug this: > 3 locks held by firefox-3.5/4162: > #0: (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > #1: (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0 > #2: (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0 > > stack backtrace: > Pid: 4162, comm: firefox-3.5 Tainted: G C 2.6.31-2-generic #14~rbd3 > Call Trace: > [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100 > [<ffffffff8108ce26>] validate_chain+0x4c6/0x750 > [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 > [<ffffffff8108d585>] lock_acquire+0xa5/0x150 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170 > [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60 > [<ffffffff81139d31>] lock_rename+0x41/0xf0 > [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170 > [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160 > [<ffffffff8113ac7e>] vfs_rename+0xee/0x290 > [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160 > [<ffffffff8113d512>] sys_renameat+0x252/0x280 > [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100 > [<ffffffff8101316a>] ? sysret_check+0x2e/0x69 > [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190 > [<ffffffff8113d55b>] sys_rename+0x1b/0x20 > [<ffffffff81013132>] system_call_fastpath+0x16/0x1b The trace above is totally reproducible by doing a cross-directory rename on an ecryptfs directory. The issue seems to be that sys_renameat() does lock_rename() then calls into the filesystem; if the filesystem is ecryptfs, then ecryptfs_rename() again does lock_rename() on the lower filesystem, and lockdep can't tell that the two s_vfs_rename_mutexes are different. It seems an annotation like the following is sufficient to fix this (it does get rid of the lockdep trace in my simple tests); however I would like to make sure I'm not misunderstanding the locking, hence the CC list... Signed-off-by: NRoland Dreier <rdreier@cisco.com> Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Stephen Hemminger 提交于
The entries in xattr handler table should be immutable (ie const) like other operation tables. Later patches convert common filesystems. Uncoverted filesystems will still work, but will generate a compiler warning. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Josef Bacik 提交于
Currently the way we do freezing is by passing sb>s_bdev to freeze_bdev and then letting it do all the work. But freezing is more of an fs thing, and doesn't really have much to do with the bdev at all, all the work gets done with the super. In btrfs we do not populate s_bdev, since we can have multiple bdev's for one fs and setting s_bdev makes removing devices from a pool kind of tricky. This means that freezing a btrfs filesystem fails, which causes us to corrupt with things like tux-on-ice which use the fsfreeze mechanism. So instead of populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs freezing stuff into freeze_super and thaw_super. These just take the super_block that we're freezing and does the appropriate work. It's basically just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to use the new super helpers. I've tested this with ext4 and btrfs and verified everything continues to work the same as before. The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if the fs is already frozen. I thought this was a better solution than adding a freeze counter to the super_block, but if everybody hates this idea I'm open to suggestions. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and switch the simple "loop over superblocks and do something" loops to it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
At the same time we can kill s_need_restart and local mutex in there. __put_super() made public for a while; will be gone later. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
use atomic_inc_not_zero(&sb->s_active) instead of playing games with checking ->s_count > S_BIAS Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 5月, 2010 1 次提交
-
-
由 Jiri Slaby 提交于
It will be used in suspend code and serves as an easy wrap around copy_from_user. Similar to simple_read_from_buffer, it takes care of transfers with proper lengths depending on available and count parameters and advances ppos appropriately. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 27 4月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
Currently, device claiming for exclusive open is done after low level open - disk->fops->open() - has completed successfully. This means that exclusive open attempts while a device is already exclusively open will fail only after disk->fops->open() is called. cdrom driver issues commands during open() which means that O_EXCL open attempt can unintentionally inject commands to in-progress command stream for burning thus disturbing burning process. In most cases, this doesn't cause problems because the first command to be issued is TUR which most devices can process in the middle of burning. However, depending on how a device replies to TUR during burning, cdrom driver may end up issuing further commands. This can't be resolved trivially by moving bd_claim() before doing actual open() because that means an open attempt which will end up failing could interfere other legit O_EXCL open attempts. ie. unconfirmed open attempts can fail others. This patch resolves the problem by introducing claiming block which is started by bd_start_claiming() and terminated either by bd_claim() or bd_abort_claiming(). bd_claim() from inside a claiming block is guaranteed to succeed and once a claiming block is started, other bd_start_claiming() or bd_claim() attempts block till the current claiming block is terminated. bd_claim() can still be used standalone although now it always synchronizes against claiming blocks, so the existing users will keep working without any change. blkdev_open() and open_bdev_exclusive() are converted to use claiming blocks so that exclusive open attempts from these functions don't interfere with the existing exclusive open. This problem was discovered while investigating bko#15403. https://bugzilla.kernel.org/show_bug.cgi?id=15403 The burning problem itself can be resolved by updating userspace probing tools to always open w/ O_EXCL. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NMatthias-Christian Ott <ott@mirix.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 24 4月, 2010 1 次提交
-
-
由 Josef Bacik 提交于
This cleans up a few of the complaints of __generic_block_fiemap. I've fixed all the typing stuff, used inline functions instead of macros, gotten rid of a couple of variables, and made sure the size and block requests are all block aligned. It also fixes a problem where sometimes FIEMAP_EXTENT_LAST wasn't being set properly. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 4月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 4月, 2010 2 次提交
-
-
由 Andrew Morton 提交于
Requested by hch, for consistency now it is exported. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Blanchard 提交于
Commit 148f948b (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop || !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: NAnton Blanchard <anton@samba.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Tested-by: NJeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 3月, 2010 2 次提交
-
-
由 Andrew Morton 提交于
It was tolerable until Eric went and added 8388608. Cc: Eric Paris <eparis@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM. POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance: a 16K read will be carried out in 4 _sync_ 1-page reads. In other places, ra_pages==0 means - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs - some IO error happened where multi-page read IO won't help or should be avoided. POSIX_FADV_RANDOM actually want a different semantics: to disable the *heuristic* readahead algorithm, and to use a dumb one which faithfully submit read IO for whatever application requests. So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM. Note that the random hint is not likely to help random reads performance noticeably. And it may be too permissive on huge request size (its IO size is not limited by read_ahead_kb). In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall (NFS read) performance of the application increased by 313%! Tested-by: NQuentin Barnes <qbarnes+nfs@yahoo-inc.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> [2.6.33.x] Cc: <qbarnes+nfs@yahoo-inc.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 3月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 3月, 2010 6 次提交
-
-
由 Miklos Szeredi 提交于
Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent symlink attacks in unprivileged unmounts (fuse, samba, ncpfs). Additionally, return -EINVAL if an unknown flag is used (and specify an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for the caller to determine if a flag is supported or not. CC: Eugene Teo <eugene@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
apply function to vfsmounts in set returned by collect_mounts(), stop if it returns non-zero. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Analog of is_subdir for vfsmount,dentry pairs, moved from audit_tree.c Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
No one is calling this anymore as everyone has switched to invalidate_mapping_pages long time ago. Also update a few references to it in comments. nfs has two more, but I can't easily figure what they are actually referring to, so I left them as-is. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Richard Kennedy 提交于
re-order structure super_block to remove 16 bytes of alignment padding on 64bit builds. This shrinks the size of super_block from 712 to 696 bytes so requiring one fewer 64 byte cache lines. Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk> ----- patch against 2.6.33-rc5 compiled & tested on x86_64 AMDX2 desktop machine. I've been running with this patch applied for several weeks with no problems. regards Richard Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Boaz Harrosh 提交于
Remove the EXPORT_UNUSED_SYMBOL of simple_prepare_write Collapse simple_prepare_write into it's only caller, though making it simpler and clearer to understand. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 19 2月, 2010 1 次提交
-
-
由 Richard Kennedy 提交于
This removes 8 bytes of padding from struct inode on 64bit builds, and so allows 1 more object/slab in the inode_cache when using slub. Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk> ---- patch against 2.6.33-rc8 compiled & tested on x86_64 AMDX2 I've been running this patch for over a week with no obvious problems regards Richard Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 1月, 2010 1 次提交
-
-
由 Al Viro 提交于
commit 5300990c had stepped on a rather nasty mess: definitions of ACC_MODE used to be different. Fixed the resulting breakage, converting them to variant that takes O_... value; all callers have that and it actually simplifies life (see tomoyo part of changes). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 23 12月, 2009 3 次提交
-
-
由 Dmitry Monakhov 提交于
Quota code requires unlocked version of this function. Off course we can just copy-paste the code, but copy-pasting is always an evil. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Andreas Gruenbacher 提交于
This question was determined to be a bug which was fixed in commit 4a3b0a49. Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Cc: Jan Blunck <jblunck@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* pull ACC_MODE to fs.h; we have several copies all over the place * nightmarish expression calculating f_mode by f_flags deserves a helper too (OPEN_FMODE(flags)) Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 12月, 2009 2 次提交
-
-
由 Christoph Hellwig 提交于
After I_SYNC was split from I_LOCK the leftover is always used together with I_NEW and thus superflous. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
We recently go rid of all callers of do_sync_file_range as they're better served with vfs_fsync or the filemap_write_and_wait. Now that do_sync_file_range is down to a single caller fold it into it so that people don't start using it again accidentally. While at it also switch it from using __filemap_fdatawrite_range(..., WB_SYNC_ALL) to the more clear filemap_fdatawrite_range(). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 12月, 2009 3 次提交
-
-
由 Dave Chinner 提交于
Change all async metadata buffers to use [READ|WRITE]_META I/O types so that the I/O doesn't get issued immediately. This allows merging of adjacent metadata requests but still prioritises them over bulk data. This shows a 10-15% improvement in sequential create speed of small files. Don't include the log buffers in this classification - leave them as sync types so they are issued immediately. Signed-off-by: NDave Chinner <dgc@sgi.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behavour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Paris 提交于
All users outside of fs/ of get_empty_filp() have been removed. This patch moves the definition from the include/ directory to internal.h so no new users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents stop using it too, but that's a problem for another day and a smarter developer! Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 12月, 2009 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behavour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Elder <aelder@sgi.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 12月, 2009 1 次提交
-
-
由 Christoph Hellwig 提交于
All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 03 12月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
The discard ioctl is used by mkfs utilities to clear a block device prior to putting metadata down. However, not all devices return zeroed blocks after a discard. Some drives return stale data, potentially containing old superblocks. It is therefore important to know whether discarded blocks are properly zeroed. Both ATA and SCSI drives have configuration bits that indicate whether zeroes are returned after a discard operation. Implement a block level interface that allows this information to be bubbled up the stack and queried via a new block device ioctl. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 26 11月, 2009 1 次提交
-
-
由 Vivek Goyal 提交于
There seems to be a regression in direct write path due to following commit in for-2.6.33 branch of block tree. commit 1af60fbd Author: Jeff Moyer <jmoyer@redhat.com> Date: Fri Oct 2 18:56:53 2009 -0400 block: get rid of the WRITE_ODIRECT flag Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets the NOIDLE flag in bio and hence in request. This tells CFQ to not expect more request from the queue and not idle on it (despite the fact that queue's think time is less and it is not seeky). So direct writers lose big time when competing with sequential readers. Using fio, I have run one direct writer and two sequential readers and following are the results with 2.6.32-rc7 kernel and with for-2.6.33 branch. Test ==== 1 direct writer and 2 sequential reader running simultaneously. [global] directory=/mnt/sdc/fio/ runtime=10 [seqwrite] rw=write size=4G direct=1 [seqread] rw=read size=2G numjobs=2 2.6.32-rc7 ========== direct writes: aggrb=2,968KB/s readers : aggrb=101MB/s for-2.6.33 branch ================= direct write: aggrb=19KB/s readers aggrb=137MB/s This patch brings back the WRITE_ODIRECT flag, with the difference that we don't set the BIO_RW_UNPLUG flag so that device is not unplugged after submission of request and an explicit unplug from submitter is required. That way we fix the jeff's issue of not enough merging taking place in aio path as well as make sure direct writes get their fair share. After the fix ============= for-2.6.33 + fix ---------------- direct writes: aggrb=2,728KB/s reads: aggrb=103MB/s Thanks Vivek Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 28 10月, 2009 1 次提交
-
-
由 Jeff Moyer 提交于
Hi, The WRITE_ODIRECT flag is only used in one place, and that code path happens to also call blk_run_address_space. The introduction of this flag, then, could result in the device being unplugged twice for every I/O. Further, with the batching changes in the next patch, we don't want an O_DIRECT write to imply a queue unplug. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-