- 14 12月, 2016 2 次提交
-
-
由 Shaohua Li 提交于
For sync direct IO, generic_file_direct_write/generic_file_read_iter will update file access position. Don't duplicate the update in .direct_IO. This cause my raid array can't assemble. Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 NeilBrown 提交于
bdev->bd_contains is not stable before calling __blkdev_get(). When __blkdev_get() is called on a parition with ->bd_openers == 0 it sets bdev->bd_contains = bdev; which is not correct for a partition. After a call to __blkdev_get() succeeds, ->bd_openers will be > 0 and then ->bd_contains is stable. When FMODE_EXCL is used, blkdev_get() calls bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim() This call happens before __blkdev_get() is called, so ->bd_contains is not stable. So bd_may_claim() cannot safely use ->bd_contains. It currently tries to use it, and this can lead to a BUG_ON(). This happens when a whole device is already open with a bd_holder (in use by dm in my particular example) and two threads race to open a partition of that device for the first time, one opening with O_EXCL and one without. The thread that doesn't use O_EXCL gets through blkdev_get() to __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev; Immediately thereafter the other thread, using FMODE_EXCL, calls bd_start_claiming() from blkdev_get(). This should fail because the whole device has a holder, but because bdev->bd_contains == bdev bd_may_claim() incorrectly reports success. This thread continues and blocks on bd_mutex. The first thread then sets bdev->bd_contains correctly and drops the mutex. The thread using FMODE_EXCL then continues and when it calls bd_may_claim() again in: BUG_ON(!bd_may_claim(bdev, whole, holder)); The BUG_ON fires. Fix this by removing the dependency on ->bd_contains in bd_may_claim(). As bd_may_claim() has direct access to the whole device, it can simply test if the target bdev is the whole device. Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block") Cc: stable@vger.kernel.org (v2.6.35+) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 12月, 2016 1 次提交
-
-
由 Rabin Vincent 提交于
If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Cc: stable@vger.kernel.org Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices") Reported-and-tested-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: NRabin Vincent <rabinv@axis.com> Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 11月, 2016 3 次提交
-
-
由 Ming Lei 提交于
Some drivers often use external bvec table, so introduce this helper for this case. It is always safe to access the bio->bi_io_vec in this way for this case. After converting to this usage, it will becomes a bit easier to evaluate the remaining direct access to bio->bi_io_vec, so it can help to prepare for the following multipage bvec support. Signed-off-by: NMing Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Fixed up the new O_DIRECT cases. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
We store the bits in the bdev sector size locally, but we don't use the calculation anymore. All we do with it is shift it back up to the bdev sector size. So let's just use that directly and kill the variable and bits calculation. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Damien Le Moal 提交于
A direct I/O alignment must be always checked against the device blocks size, but the I/O offset (bio->bi_iter.bi_sector must always use 512B sector unit, and not the actual logical block size. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 11月, 2016 4 次提交
-
-
由 Christoph Hellwig 提交于
Similar to the simple fast path, but we now need a dio structure to track multiple-bio completions. It's basically a cut-down version of the new iomap-based direct I/O code for filesystems, but without all the logic to call into the filesystem for extent lookup or allocation, and without the complex I/O completion workqueue handler for AIO - instead we just use the FUA bit on the bios to ensure data is flushed to stable storage. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Split the op setting code into a helper, use it in both places. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Just alloc the bio_vec array if we exceed the inline limit. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
This patch adds a small and simple fast patch for small direct I/O requests on block devices that don't use AIO. Between the neat bio_iov_iter_get_pages helper that avoids allocating a page array for get_user_pages and the on-stack bio and biovec this avoid memory allocations and atomic operations entirely in the direct I/O code (lower levels might still do memory allocations and will usually have at least some atomic operations, though). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com> Tested-By: NStephen Bates <sbates@raithlin.com> Reviewed-By: NStephen Bates <sbates@raithlin.com>
-
- 12 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.orgSigned-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 10月, 2016 1 次提交
-
-
由 Pierre Morel 提交于
When triggering thaw-filesystems via magic sysrq, the system enters a loop in do_thaw_one(), as thaw_bdev() still returns success if bd_fsfreeze_count == 0. To fix this, let thaw_bdev() always return error (and simplify the code a bit at the same time). Reviewed-by: NEric Farman <farman@linux.vnet.ibm.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NPierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 9月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
DAX support for block devices was removed in commits 03cdad ("block: disable block device DAX by default") and 99a01cdf ("block: remove BLK_DEV_DAX config option"), but we still kept a call to dax_do_io and some uneeded i_flags manipulations introduced in commit bbab37 ("block: Add support for DAX reads/writes to block devices"). Remove those leftovers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 25 8月, 2016 1 次提交
-
-
由 Andrey Ryabinin 提交于
Calling freeze_bdev() twice on the same block device without mounted filesystem get_super() will return NULL, which will lead to NULL-ptr dereference later in drop_super(). Check get_super() result to fix that. Note, that this is a purely theoretical issue. We have only 3 freeze_bdev() callers. 2 of them are in filesystem code and used on a device with mounted fs. The third one in lock_fs() has protection in upper-layer code against freezing block device the second time without thawing it first. Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 8月, 2016 1 次提交
-
-
由 Vegard Nossum 提交于
I got this: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) CPU: 0 PID: 5505 Comm: syz-executor Not tainted 4.8.0-rc2+ #161 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 task: ffff880113415940 task.stack: ffff880118350000 RIP: 0010:[<ffffffff8172cb32>] [<ffffffff8172cb32>] bd_mount+0x52/0xa0 RSP: 0018:ffff880118357ca0 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffffffffffffffff RCX: ffffc90000bb6000 RDX: 0000000000000018 RSI: ffffffff846d6b20 RDI: 00000000000000c7 RBP: ffff880118357cb0 R08: ffff880115967c68 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801188211e8 R13: ffffffff847baa20 R14: ffff8801139cb000 R15: 0000000000000080 FS: 00007fa3ff6c0700(0000) GS:ffff88011aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc1d8cc7e78 CR3: 0000000109f20000 CR4: 00000000000006f0 DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: ffff880112cfd6c0 ffff8801188211e8 ffff880118357cf0 ffffffff8167f207 ffffffff816d7a1e ffff880112a413c0 ffffffff847baa20 ffff8801188211e8 0000000000000080 ffff880112cfd6c0 ffff880118357d38 ffffffff816dce0a Call Trace: [<ffffffff8167f207>] mount_fs+0x97/0x2e0 [<ffffffff816d7a1e>] ? alloc_vfsmnt+0x55e/0x760 [<ffffffff816dce0a>] vfs_kern_mount+0x7a/0x300 [<ffffffff83c3247c>] ? _raw_read_unlock+0x2c/0x50 [<ffffffff816dfc87>] do_mount+0x3d7/0x2730 [<ffffffff81235fd4>] ? trace_do_page_fault+0x1f4/0x3a0 [<ffffffff816df8b0>] ? copy_mount_string+0x40/0x40 [<ffffffff8161ea81>] ? memset+0x31/0x40 [<ffffffff816df73e>] ? copy_mount_options+0x1ee/0x320 [<ffffffff816e2a02>] SyS_mount+0xb2/0x120 [<ffffffff816e2950>] ? copy_mnt_ns+0x970/0x970 [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0 [<ffffffff83c3282a>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 83 e8 63 1b fc ff 48 85 c0 48 89 c3 74 4c e8 56 35 d1 ff 48 8d bb c8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 36 4c 8b a3 c8 00 00 00 48 b8 00 00 00 00 00 fc RIP [<ffffffff8172cb32>] bd_mount+0x52/0xa0 RSP <ffff880118357ca0> ---[ end trace 13690ad962168b98 ]--- mount_pseudo() returns ERR_PTR(), not NULL, on error. Fixes: 3684aa70 ("block-dev: enable writeback cgroup support") Cc: Shaohua Li <shli@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: stable@vger.kernel.org Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 08 8月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Commit abf54548 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: NMike Christie <mchristi@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 8月, 2016 1 次提交
-
-
由 Mike Christie 提交于
The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: NMike Christie <mchristi@redhat.com> Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 8月, 2016 1 次提交
-
-
由 Ross Zwisler 提交于
The functionality for block device DAX was already removed with commit acc93d30 ("Revert "block: enable dax for raw block devices"") However, we still had a config option hanging around that was always disabled because it depended on CONFIG_BROKEN. This config option was introduced in commit 03cdadb0 ("block: disable block device DAX by default") This change reverts that commit, removing the dead config option. Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 7月, 2016 1 次提交
-
-
由 Toshi Kani 提交于
Currently, presence of direct_access() in block_device_operations indicates support of DAX on its block device. Because block_device_operations is instantiated with 'const', this DAX capablity may not be enabled conditinally. In preparation for supporting DAX to device-mapper devices, add QUEUE_FLAG_DAX to request_queue flags to advertise their DAX support. This will allow to set the DAX capability based on how mapped device is composed. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <linux-s390@vger.kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 7月, 2016 1 次提交
-
-
由 Al Viro 提交于
Since 2006 we have ->i_bdev pinning bdev in question, so there's no way to get to bdev ->evict_inode() while there's an aliasing inode anywhere. In other words, the only place walking the list of aliases is guaranteed to do it only when the list is empty... Remove the detritus; it should've been done in "[PATCH] Fix a race condition between ->i_mapping and iput()", but nobody had noticed it back then. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 6月, 2016 1 次提交
-
-
由 Eric W. Biederman 提交于
Introduce a function may_open_dev that tests MNT_NODEV and a new superblock flab SB_I_NODEV. Use this new function in all of the places where MNT_NODEV was previously tested. Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those filesystems should never support device nodes, and a simple superblock flags makes that very hard to get wrong. With SB_I_NODEV set if any device nodes somehow manage to show up on on a filesystem those device nodes will be unopenable. Acked-by: NSeth Forshee <seth.forshee@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 21 5月, 2016 1 次提交
-
-
由 Dan Williams 提交于
This reverts commit 5a023cdb. The functionality is superseded by the new "Device DAX" facility. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jan Kara <jack@suse.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 19 5月, 2016 1 次提交
-
-
由 Dan Williams 提交于
1/ If a mapping overlaps a bad sector fail the request. 2/ Do not opportunistically report more dax-capable capacity than is requested when errors present. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com> [vishal: fix a conflict with system RAM collision patches] [vishal: add a 'size' parameter to ->direct_access] [vishal: fix a conflict with DAX alignment check patches] Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
-
- 17 5月, 2016 4 次提交
-
-
由 Toshi Kani 提交于
blkdev_dax_capable() is similar to bdev_dax_supported(), but needs to remain as a separate interface for checking dax capability of a raw block device. Rename and relocate blkdev_dax_capable() to keep them maintained consistently, and call bdev_direct_access() for the dax capability check. There is no change in the behavior. Link: https://lkml.org/lkml/2016/5/9/950Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@fb.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
-
由 Toshi Kani 提交于
DAX imposes additional requirements to a device. Add bdev_dax_supported() which performs all the precondition checks necessary for filesystem to mount the device with dax option. Also add a new check to verify if a partition is aligned by 4KB. When a partition is unaligned, any dax read/write access fails, except for metadata update. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@fb.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
-
由 Toshi Kani 提交于
In preparation of moving DAX capability checks to the block layer from filesystem code, add a VFS message interface that aligns with filesystem's message format. For instance, a vfs_msg() message followed by XFS messages in case of a dax mount error may look like: VFS (pmem0p1): error: unaligned partition for dax XFS (pmem0p1): DAX unsupported by block device. Turning off DAX. XFS (pmem0p1): Mounting V5 Filesystem : vfs_msg() is largely based on ext4_msg(). Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@fb.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
-
由 Jan Kara 提交于
Fault handlers currently take complete_unwritten argument to convert unwritten extents after PTEs are updated. However no filesystem uses this anymore as the code is racy. Remove the unused argument. Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
-
- 02 5月, 2016 3 次提交
-
-
由 Christoph Hellwig 提交于
The kiocb already has the new position, so use that. The only interesting case is AIO, where we currently don't bother updating ki_pos. We're about to free the kiocb after we're done, so we might as well update it to make everyone's life simpler. While we're at it also return the bytes written argument passed in if we were successful so that the boilerplate error switch code in the callers can go away. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
This will allow us to do per-I/O sync file writes, as required by a lot of fileservers or storage targets. XXX: Will need a few additional audits for O_DSYNC Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superflous argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 4月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 3月, 2016 1 次提交
-
-
由 Shaohua Li 提交于
block_dev's .writepages/.writepage already handles wbc_init_bio/wbc_account_io. We only set the SB_I_CGROUPWB bit to suppport writeback cgroup support. Signed-off-by: NShaohua Li <shli@fb.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 28 2月, 2016 2 次提交
-
-
由 Ross Zwisler 提交于
Previously calls to dax_writeback_mapping_range() for all DAX filesystems (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range(). dax_writeback_mapping_range() needs a struct block_device, and it used to get that from inode->i_sb->s_bdev. This is correct for normal inodes mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time files. Instead, call dax_writeback_mapping_range() directly from the filesystem ->writepages function so that it can supply us with a valid block device. This also fixes DAX code to properly flush caches in response to sync(2). Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
The recent *sync enabling discovered that we are inserting into the block_device pagecache counter to the expectations of the dirty data tracking for dax mappings. This can lead to data corruption. We want to support DAX for block devices eventually, but it requires wider changes to properly manage the pagecache. dump_stack+0x85/0xc2 dax_writeback_mapping_range+0x60/0xe0 blkdev_writepages+0x3f/0x50 do_writepages+0x21/0x30 __filemap_fdatawrite_range+0xc6/0x100 filemap_write_and_wait+0x4a/0xa0 set_blocksize+0x70/0xd0 sb_set_blocksize+0x1d/0x50 ext4_fill_super+0x75b/0x3360 mount_bdev+0x180/0x1b0 ext4_mount+0x15/0x20 mount_fs+0x38/0x170 Mark the support broken so its disabled by default, but otherwise still available for testing. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Suggested-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2016 1 次提交
-
-
由 Ross Zwisler 提交于
Previously the pfn_mkwrite() fault handler for raw block devices called bldev_dax_fault() -> __dax_fault() to do a full DAX page fault. Really what the pfn_mkwrite() fault handler needs to do is call dax_pfn_mkwrite() to make sure that the radix tree entry for the given PTE is marked as dirty so that a follow-up fsync or msync call will flush it durably to media. Fixes: 5a023cdb ("block: enable dax for raw block devices") Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 1月, 2016 1 次提交
-
-
由 Dan Williams 提交于
Dynamically enabling DAX requires that the page cache first be flushed and invalidated. This must occur atomically with the change of DAX mode otherwise we confuse the fsync/msync tracking and violate data durability guarantees. Eliminate the possibilty of DAX-disabled to DAX-enabled transitions for now and revisit this for the next cycle. Cc: Jan Kara <jack@suse.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 23 1月, 2016 2 次提交
-
-
由 Ross Zwisler 提交于
Add support for tracking dirty DAX entries in the struct address_space radix tree. This tree is already used for dirty page writeback, and it already supports the use of exceptional (non struct page*) entries. In order to properly track dirty DAX pages we will insert new exceptional entries into the radix tree that represent dirty DAX PTE or PMD pages. These exceptional entries will also contain the writeback addresses for the PTE or PMD faults that we can use at fsync/msync time. There are currently two types of exceptional entries (shmem and shadow) that can be placed into the radix tree, and this adds a third. We rely on the fact that only one type of exceptional entry can be found in a given radix tree based on its usage. This happens for free with DAX vs shmem but we explicitly prevent shadow entries from being added to radix trees for DAX mappings. The only shadow entries that would be generated for DAX radix trees would be to track zero page mappings that were created for holes. These pages would receive minimal benefit from having shadow entries, and the choice to have only one type of exceptional entry in a given radix tree makes the logic simpler both in clear_exceptional_entry() and in the rest of DAX. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 1月, 2016 2 次提交
-
-
由 Dan Williams 提交于
The DAX implementation needs to protect new calls to ->direct_access() and usage of its return value against the driver for the underlying block device being disabled. Use blk_queue_enter()/blk_queue_exit() to hold off blk_cleanup_queue() from proceeding, or otherwise fail new mapping requests if the request_queue is being torn down. This also introduces blk_dax_ctl to simplify the interface from fs/dax.c through dax_map_atomic() to bdev_direct_access(). [willy@linux.intel.com: fix read() of a hole] Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
If a ->direct_access() implementation ever returns a map count less than PAGE_SIZE, catch the error in bdev_direct_access(). This simplifies error checking in upper layers. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-