- 16 1月, 2016 2 次提交
-
-
由 Dan Williams 提交于
The DAX implementation needs to protect new calls to ->direct_access() and usage of its return value against the driver for the underlying block device being disabled. Use blk_queue_enter()/blk_queue_exit() to hold off blk_cleanup_queue() from proceeding, or otherwise fail new mapping requests if the request_queue is being torn down. This also introduces blk_dax_ctl to simplify the interface from fs/dax.c through dax_map_atomic() to bdev_direct_access(). [willy@linux.intel.com: fix read() of a hole] Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
If a ->direct_access() implementation ever returns a map count less than PAGE_SIZE, catch the error in bdev_direct_access(). This simplifies error checking in upper layers. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 1月, 2016 2 次提交
-
-
由 Andrew Morton 提交于
bdev_write_page() is used by swapout and by writepage where we cannot use __GFP_FS or __GFP_IO. So it is misleading to mention GFP_KERNEL here. blk_queue_enter() only actually looks at __GFP_DIRECT_RECLAIM, so no bugs were harmed in the making of this patch. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 1月, 2016 2 次提交
-
-
由 Dan Williams 提交于
If an application wants exclusive access to all of the persistent memory provided by an NVDIMM namespace it can use this raw-block-dax facility to forgo establishing a filesystem. This capability is targeted primarily to hypervisors wanting to provision persistent memory for guests. It can be disabled / enabled dynamically via the new BLKDAXSET ioctl. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Reviewed-by: NJan Kara <jack@suse.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Similar to the file_inode() helper, provide a helper to lookup the inode for a raw block device itself. Cc: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: NJan Kara <jack@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 07 1月, 2016 1 次提交
-
-
由 Dmitry Monakhov 提交于
gendisk with part==0 is obviously gendisk->disk_name. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 12月, 2015 1 次提交
-
-
由 Ilya Dryomov 提交于
Since 52ebea74 ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") inode, at some point in its lifetime, gets attached to a wb (struct bdi_writeback). Detaching happens on evict, in inode_detach_wb() called from __destroy_inode(), and involves updating wb. However, detaching an internal bdev inode from its wb in __destroy_inode() is too late. Its bdi and by extension root wb are embedded into struct request_queue, which has different lifetime rules and can be freed long before the final bdput() is called (can be from __fput() of a corresponding /dev inode, through dput() - evict() - bd_forget(). bdevs hold onto the underlying disk/queue pair only while opened; as soon as bdev is closed all bets are off. In fact, disk/queue can be gone before __blkdev_put() even returns: 1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part) 1500 { ... 1518 if (bdev->bd_contains == bdev) { 1519 if (disk->fops->release) 1520 disk->fops->release(disk, mode); [ Driver puts its references to disk/queue ] 1521 } 1522 if (!bdev->bd_openers) { 1523 struct module *owner = disk->fops->owner; 1524 1525 disk_put_part(bdev->bd_part); 1526 bdev->bd_part = NULL; 1527 bdev->bd_disk = NULL; 1528 if (bdev != bdev->bd_contains) 1529 victim = bdev->bd_contains; 1530 bdev->bd_contains = NULL; 1531 1532 put_disk(disk); [ We put ours, the queue is gone The last bdput() would result in a write to invalid memory ] 1533 module_put(owner); ... 1539 } Since bdev inodes are special anyway, detach them in __blkdev_put() after clearing inode's dirty bits, turning the problematic inode_detach_wb() in __destroy_inode() into a noop. add_disk() grabs its disk->queue since 523e1d39 ("block: make gendisk hold a reference to its queue"), so the old ->release comment is removed in favor of the new inode_detach_wb() comment. Cc: stable@vger.kernel.org # 4.2+, needs backporting Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Acked-by: NTejun Heo <tj@kernel.org> Tested-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 11月, 2015 1 次提交
-
-
由 Dan Williams 提交于
Fix use after free crashes like the following: general protection fault: 0000 [#1] SMP Call Trace: [<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem] [<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem] [<ffffffff8128fd90>] bdev_read_page+0x50/0x60 [<ffffffff812972f0>] do_mpage_readpage+0x510/0x770 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50 [<ffffffff81297657>] mpage_readpages+0x107/0x170 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20 [<ffffffff8129058d>] blkdev_readpages+0x1d/0x20 [<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310 [<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310 [<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0 [<ffffffff811c76f6>] filemap_fault+0x396/0x530 [<ffffffff811f816e>] __do_fault+0x4e/0xf0 [<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50 Cc: <stable@vger.kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reported-by: Nkbuild test robot <lkp@intel.com> Acked-by: NMatthew Wilcox <willy@linux.intel.com> [willy: symmetry fixups] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 12 11月, 2015 1 次提交
-
-
由 Vivek Goyal 提交于
If a block device is hot removed and later last reference to device is put, we try to writeback the dirty inode. But device is gone and that writeback fails. Currently we do a WARN_ON() which does not seem to be the right thing. Convert it to a ratelimited kernel warning. Reported-by: NAndi Kleen <andi@firstfloor.org> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> [jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols] Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 10月, 2015 1 次提交
-
-
由 Martin K. Petersen 提交于
Up until now the_integrity profile has been dynamically allocated and attached to struct gendisk after the disk has been made active. This causes problems because NVMe devices need to register the profile prior to the partition table being read due to a mandatory metadata buffer requirement. In addition, DM goes through hoops to deal with preallocating, but not initializing integrity profiles. Since the integrity profile is small (4 bytes + a pointer), Christoph suggested moving it to struct gendisk proper. This requires several changes: - Moving the blk_integrity definition to genhd.h. - Inlining blk_integrity in struct gendisk. - Removing the dynamic allocation code. - Adding helper functions which allow gendisk to set up and tear down the integrity sysfs dir when a disk is added/deleted. - Adding a blk_integrity_revalidate() callback for updating the stable pages bdi setting. - The calls that depend on whether a device has an integrity profile or not now key off of the bi->profile pointer. - Simplifying the integrity support routines in DM (Mike Snitzer). Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reported-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 16 9月, 2015 1 次提交
-
-
由 Jeff Moyer 提交于
The dax code doesn't currently support misaligned partitions, so disable O_DIRECT via dax until such time as that support materializes. Cc: <stable@vger.kernel.org> Suggested-by: NBoaz Harrosh <boaz@plexistor.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 09 9月, 2015 1 次提交
-
-
由 Matthew Wilcox 提交于
In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is defined in linux/mm.h. Given that we don't want to include <linux/mm.h> in <linux/fs.h>, the easiest solution is to move the DAX-related functions to a new header, <linux/dax.h>. We could also have moved VM_FAULT_* definitions to a new header, or a different header that isn't quite such a boil-the-ocean header as <linux/mm.h>, but this felt like the best option. Signed-off-by: NMatthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 8月, 2015 1 次提交
-
-
由 Dan Williams 提交于
None of the implementations currently use it. The common bdev_direct_access() entry point handles all the size checks before calling ->direct_access(). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 21 8月, 2015 1 次提交
-
-
由 Ross Zwisler 提交于
Update the annotation for the kaddr pointer returned by direct_access() so that it is a __pmem pointer. This is consistent with the PMEM driver and with how this direct_access() pointer is used in the DAX code. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 18 8月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
The process of reducing contention on per-superblock inode lists starts with moving the locking to match the per-superblock inode list. This takes the global lock out of the picture and reduces the contention problems to within a single filesystem. This doesn't get rid of contention as the locks still have global CPU scope, but it does isolate operations on different superblocks form each other. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NDave Chinner <dchinner@redhat.com>
-
- 05 7月, 2015 2 次提交
-
-
由 Matthew Wilcox 提交于
The brd driver is the only in-tree driver that may sleep currently. After some discussion on linux-fsdevel, we decided that any driver may choose to sleep in its ->direct_access method. To ensure that all callers of bdev_direct_access() are prepared for this, add a call to might_sleep(). Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Matthew Wilcox 提交于
If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO. Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO(). Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 6月, 2015 1 次提交
-
-
由 Geert Uytterhoeven 提交于
With gcc 3.4.6/4.1.2/4.2.4 (not with 4.4.7/4.6.4/4.8.4): CC fs/block_dev.o include/linux/fs.h:804: warning: ‘I_BDEV’ declared inline after being called include/linux/fs.h:804: warning: previous declaration of ‘I_BDEV’ was here Commit a212b105 ("bdi: make inode_to_bdi() inline") added a caller of I_BDEV() in a header file, exposing the bogus "inline" on the exported implementation. Drop the "inline" keyword to fix this. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 6月, 2015 1 次提交
-
-
由 Vishal Verma 提交于
If a block device has bio integrity enabled, rw_page will bypass the integrity payload, which is undesirable. Skip rw_page if this is the case. Currently brd and zram provide rw_page, and the proposed 'nd' drivers will too. Cc: Jens Axboe <axboe@fb.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Suggested-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 02 6月, 2015 2 次提交
-
-
由 Tejun Heo 提交于
Now that bdi definitions are moved to backing-dev-defs.h, backing-dev.h can include blkdev.h and inline inode_to_bdi() without worrying about introducing circular include dependency. The function gets called from hot paths and fairly trivial. This patch makes inode_to_bdi() and sb_is_blkdev_sb() that the function calls inline. blockdev_superblock and noop_backing_dev_info are EXPORT_GPL'd to allow the inline functions to be used from modules. While at it, make sb_is_blkdev_sb() return bool instead of int. v2: Fixed typo in description as suggested by Jan. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
With the planned cgroup writeback support, backing-dev related declarations will be more widely used across block and cgroup; unfortunately, including backing-dev.h from include/linux/blkdev.h makes cyclic include dependency quite likely. This patch separates out backing-dev-defs.h which only has the essential definitions and updates blkdev.h to include it. c files which need access to more backing-dev details now include backing-dev.h directly. This takes backing-dev.h off the common include dependency chain making it a lot easier to use it across block and cgroup. v2: fs/fat build failure fixed. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 25 4月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
do_blockdev_direct_IO() increments and decrements the inode ->i_dio_count for each IO operation. It does this to protect against truncate of a file. Block devices don't need this sort of protection. For a capable multiqueue setup, this atomic int is the only shared state between applications accessing the device for O_DIRECT, and it presents a scaling wall for that. In my testing, as much as 30% of system time is spent incrementing and decrementing this value. A mixed read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with better latencies too. Before: clat percentiles (usec): | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34], | 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35], | 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80], | 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155], | 99.99th=[ 165] After: clat percentiles (usec): | 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149], | 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171], | 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270], | 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422], | 99.99th=[ 438] In other setups, Robert Elliott reported seeing good performance improvements: https://lkml.org/lkml/2015/4/3/557 The more applications accessing the device, the worse it gets. Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells do_blockdev_direct_IO() that it need not worry about incrementing or decrementing the inode i_dio_count for this caller. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 4月, 2015 1 次提交
-
-
由 David Howells 提交于
Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 4月, 2015 5 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
Now that no one is using rw, remove it completely. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
Most filesystems call through to these at some point, so we'll start here. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 3月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 1月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Since 018a17bd ("bdi: reimplement bdev_inode_switch_bdi()") the block device code writes out all dirty data whenever switching the backing_dev_info for a block device inode. But a block device inode can only be dirtied when it is in use, which means we only have to write it out on the final blkdev_put, but not when doing a blkdev_get. Factoring out the write out from the bdi list switch prepares from removing the list switch later in the series. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 1月, 2015 1 次提交
-
-
由 Matthew Wilcox 提交于
In order to support accesses to larger chunks of memory, pass in a 'size' parameter (counted in bytes), and return the amount available at that address. Add a new helper function, bdev_direct_access(), to handle common functionality including partition handling, checking the length requested is positive, checking for the sector being page-aligned, and checking the length of the request does not pass the end of the partition. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NBoaz Harrosh <boaz@plexistor.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 17 11月, 2014 1 次提交
-
-
由 Benjamin Marzinski 提交于
Currently, freezing a filesystem involves calling freeze_super, which locks sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it hard for gfs2 (and potentially other cluster filesystems) to use the vfs freezing code to do freezes on all the cluster nodes. In order to communicate that a freeze has been requested, and to make sure that only one node is trying to freeze at a time, gfs2 uses a glock (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire this lock before calling freeze_super. This means that two nodes can attempt to freeze the filesystem by both calling freeze_super, acquiring the sb->s_umount lock, and then attempting to grab the cluster glock sd_freeze_gl. Only one will succeed, and the other will be stuck in freeze_super, making it impossible to finish freezing the node. To solve this problem, this patch adds the freeze_super and thaw_super hooks. If a filesystem implements these hooks, they are called instead of the vfs freeze_super and thaw_super functions. This means that every filesystem that implements these hooks must call the vfs freeze_super and thaw_super functions itself within the hook function to make use of the vfs freezing code. Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 31 10月, 2014 1 次提交
-
-
由 David Jeffery 提交于
Author: David Jeffery <djeffery@redhat.com> Changes to the basic direct I/O code have broken the raw driver when reading to the end of a raw device. Instead of returning a short read for a read that extends partially beyond the device's end or 0 when at the end of the device, these reads now return EIO. The raw driver needs the same end of device handling as was added for normal block devices. Using blkdev_read_iter, which has the needed size checks, prevents the EIO conditions at the end of the device. Signed-off-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 10月, 2014 1 次提交
-
-
由 Akinobu Mita 提交于
Sequential read from a block device is expected to be equal or faster than from the file on a filesystem. But it is not correct due to the lack of effective readpages() in the address space operations for block device. This implements readpages() operation for block device by using mpage_readpages() which can create multipage BIOs instead of BIOs for each page and reduce system CPU time consumption. Install 1GB of RAM disk storage: # modprobe scsi_debug dev_size_mb=1024 delay=0 Sequential read from file on a filesystem: # mkfs.ext4 /dev/$DEV # mount /dev/$DEV /mnt # fio --name=t --size=512m --rw=read --filename=/mnt/file ... read : io=524288KB, bw=2133.4MB/s, iops=546133, runt= 240msec Sequential read from a block device: # fio --name=t --size=512m --rw=read --filename=/dev/$DEV ... (Without this commit) read : io=524288KB, bw=1700.2MB/s, iops=435455, runt= 301msec (With this commit) read : io=524288KB, bw=2160.4MB/s, iops=553046, runt= 237msec Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2014 2 次提交
-
-
由 Tejun Heo 提交于
A block_device may be attached to different gendisks and thus different bdis over time. bdev_inode_switch_bdi() is used to switch the associated bdi. The function assumes that the inode could be dirty and transfers it between bdis if so. This is a bit nasty in that it reaches into bdi internals. This patch reimplements the function so that it writes out the inode if dirty. This is a lot simpler and can be implemented without exposing bdi internals. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
bdev_get_queue() returns the request_queue associated with the specified block_device. blk_get_backing_dev_info() makes use of bdev_get_queue() to determine the associated bdi given a block_device. All the callers of bdev_get_queue() including blk_get_backing_dev_info() assume that bdev_get_queue() may return NULL and implement NULL handling; however, bdev_get_queue() requires the passed in block_device is opened and attached to its gendisk. Because an active gendisk always has a valid request_queue associated with it, bdev_get_queue() can never return NULL and neither can blk_get_backing_dev_info(). Make it clear that neither of the two functions can return NULL and remove NULL handling from all the callers. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 6月, 2014 1 次提交
-
-
由 Al Viro 提交于
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 6月, 2014 1 次提交
-
-
由 Matthew Wilcox 提交于
A block device driver may choose to provide a rw_page operation. These will be called when the filesystem is attempting to do page sized I/O to page cache pages (ie not for direct I/O). This does preclude I/Os that are larger than page size, so this may only be a performance gain for some devices. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Tested-by: NDheeraj Reddy <dheeraj.reddy@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-