- 10 8月, 2017 2 次提交
-
-
由 Jens Axboe 提交于
Instead of returning the count that matches the partition, pass in an array of two ints. Index 0 will be filled with the inflight count for the partition in question, and index 1 will filled with the root inflight count, if the partition passed in is not the root. This is in preparation for being able to calculate both in one go. Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 6月, 2017 1 次提交
-
-
由 Bart Van Assche 提交于
The variable 'disk_type' is never modified so constify it. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 4月, 2017 1 次提交
-
-
由 Dan Williams 提交于
Commit 99e6608c "block: Add badblock management for gendisks" allowed for drivers like pmem and software-raid to advertise a list of bad media areas. However, it inadvertently added a 'badblocks' to all block devices. Lets clean this up by having the 'badblocks' attribute not be visible when the driver has not populated a 'struct badblocks' instance in the gendisk. Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Tested-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 03 4月, 2017 1 次提交
-
-
由 mchehab@s-opensource.com 提交于
./lib/string.c:134: WARNING: Inline emphasis start-string without end-string. ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/filemap.c:1283: ERROR: Unexpected indentation. ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string. ./mm/page_alloc.c:4245: ERROR: Unexpected indentation. ./ipc/util.c:676: ERROR: Unexpected indentation. ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent. ./security/security.c:109: ERROR: Unexpected indentation. ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent. ./block/genhd.c:275: WARNING: Inline strong start-string without end-string. ./block/genhd.c:283: WARNING: Inline strong start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./ipc/util.c:477: ERROR: Unknown target name: "s". Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NJonathan Corbet <corbet@lwn.net>
-
- 23 3月, 2017 1 次提交
-
-
由 Jan Kara 提交于
When device open races with device shutdown, we can get the following oops in scsi_disk_get(): [11863.044351] general protection fault: 0000 [#1] SMP [11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop] [11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W 4.10.0-rc2-xen+ #35 [11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000 [11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70 [11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202 [11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000 [11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880 [11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001 [11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa [11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d [11863.059217] FS: 00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000 [11863.059217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0 [11863.059217] Call Trace: [11863.059217] ? disk_get_part+0x22/0x1f0 [11863.059217] sd_open+0x39/0x130 [11863.059217] __blkdev_get+0x69/0x430 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? bd_acquire+0x96/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_get+0x126/0x350 [11863.059217] ? _raw_spin_unlock+0x2b/0x40 [11863.059217] ? bd_acquire+0x7f/0xc0 [11863.059217] ? blkdev_get+0x350/0x350 [11863.059217] blkdev_open+0x65/0x80 ... As you can see RAX value is already poisoned showing that gendisk we got is already freed. The problem is that get_gendisk() looks up device number in ext_devt_idr and then does get_disk() which does kobject_get() on the disks kobject. However the disk gets removed from ext_devt_idr only in disk_release() (through blk_free_devt()) at which moment it has already 0 refcount and is already on its way to be freed. Indeed we've got a warning from kobject_get() about 0 refcount shortly before the oops. We fix the problem by using kobject_get_unless_zero() in get_disk() so that get_disk() cannot get reference on a disk that is already being freed. Tested-by: NLekshmi Pillai <lekshmicpillai@in.ibm.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 3月, 2017 2 次提交
-
-
由 Jan Kara 提交于
This reverts commit 0dba1314. It causes leaking of device numbers for SCSI when SCSI registers multiple gendisks for one request_queue in succession. It can be easily reproduced using Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE. Furthermore the protection provided by this commit is not needed anymore as the problem it was fixing got also fixed by commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()". [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2Signed-off-by: NJan Kara <jack@suse.cz> Acked-by: NDan Williams <dan.j.williams@intel.com> Tested-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jan Kara 提交于
Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()" added disk->queue dereference to del_gendisk(). Although del_gendisk() is not supposed to be called without disk->queue valid and blk_unregister_queue() warns in that case, this change will make it oops instead. Return to the old more robust behavior of just warning when del_gendisk() gets called for gendisk with disk->queue being NULL. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Tested-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 03 3月, 2017 1 次提交
-
-
由 Jan Kara 提交于
Commit 6cd18e71 "block: destroy bdi before blockdev is unregistered." moved bdi unregistration (at that time through bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because it needs to happen before blk_unregister_region() call in del_gendisk() for MD. SCSI though will free up the device number from sd_remove() called through a maze of callbacks from device_del() in __scsi_remove_device() before blk_cleanup_queue() and thus similar races as described in 6cd18e71 can happen for SCSI as well as reported by Omar [1]. Moving bdi_unregister() to del_gendisk() works for MD and fixes the problem for SCSI since del_gendisk() gets called from sd_remove() before freeing the device number. This also makes device_add_disk() (calling bdi_register_owner()) more symmetric with del_gendisk(). [1] http://marc.info/?l=linux-block&m=148554717109098&w=2Tested-by: NLekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJan Kara <jack@suse.cz> Tested-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 2月, 2017 2 次提交
-
-
由 Jan Kara 提交于
Iteration over partitions in del_gendisk() omits part0. Add bdev_unhash_inode() call for the whole device. Otherwise if the device number gets reused, bdev inode will be still associated with the old (stale) bdi. Tested-by: NLekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jan Kara 提交于
Move bdev_unhash_inode() after invalidate_partition() as invalidate_partition() looks up bdev and it cannot find the right bdev inode after bdev_unhash_inode() is called. Thus invalidate_partition() would not invalidate page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number. Tested-by: NLekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 2月, 2017 3 次提交
-
-
由 Dan Williams 提交于
Warnings of the following form occur because scsi reuses a devt number while the block layer still has it referenced as the name of the bdi [1]: WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192' [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_create_dir_ns+0x77/0x90 kobject_add_internal+0xb2/0x350 kobject_add+0x75/0xd0 device_add+0x15a/0x650 device_create_groups_vargs+0xe0/0xf0 device_create_vargs+0x1c/0x20 bdi_register+0x90/0x240 ? lockdep_init_map+0x57/0x200 bdi_register_owner+0x36/0x60 device_add_disk+0x1bb/0x4e0 ? __pm_runtime_use_autosuspend+0x5c/0x70 sd_probe_async+0x10d/0x1c0 async_run_entry_fn+0x39/0x170 This is a brute-force fix to pass the devt release information from sd_probe() to the locations where we register the bdi, device_add_disk(), and unregister the bdi, blk_cleanup_queue(). Thanks to Omar for the quick reproducer script [2]. This patch survives where an unmodified kernel fails in a few seconds. [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4 [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Jan Kara <jack@suse.cz> Reported-by: NOmar Sandoval <osandov@osandov.com> Tested-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jan Kara 提交于
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jan Kara 提交于
Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 8月, 2016 2 次提交
-
-
由 Dan Williams 提交于
The name for a bdi of a gendisk is derived from the gendisk's devt. However, since the gendisk is destroyed before the bdi it leaves a window where a new gendisk could dynamically reuse the same devt while a bdi with the same name is still live. Arrange for the bdi to hold a reference against its "owner" disk device while it is registered. Otherwise we can hit sysfs duplicate name collisions like the following: WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1' Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015 0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351 0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8 Call Trace: [<ffffffff8134caec>] dump_stack+0x63/0x87 [<ffffffff8108c351>] __warn+0xd1/0xf0 [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80 [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90 [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320 [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0 [<ffffffff8134ff55>] kobject_add+0x75/0xd0 [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f [<ffffffff8148b0a5>] device_add+0x125/0x610 [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100 [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20 [<ffffffff811b775c>] bdi_register+0x8c/0x180 [<ffffffff811b7877>] bdi_register_dev+0x27/0x30 [<ffffffff813317f5>] add_disk+0x175/0x4a0 Cc: <stable@vger.kernel.org> Reported-by: NYi Zhang <yizhan@redhat.com> Tested-by: NYi Zhang <yizhan@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Fixed up missing 0 return in bdi_register_owner(). Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Vegard Nossum 提交于
I got a KASAN report of use-after-free: ================================================================== BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508 Read of size 8 by task trinity-c1/315 ============================================================================= BUG kmalloc-32 (Not tainted): kasan: bad access detected ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315 ___slab_alloc+0x4f1/0x520 __slab_alloc.isra.58+0x56/0x80 kmem_cache_alloc_trace+0x260/0x2a0 disk_seqf_start+0x66/0x110 traverse+0x176/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315 __slab_free+0x17a/0x2c0 kfree+0x20a/0x220 disk_seqf_stop+0x42/0x50 traverse+0x3b5/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480 ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480 ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970 Call Trace: [<ffffffff81d6ce81>] dump_stack+0x65/0x84 [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0 [<ffffffff814704ff>] object_err+0x2f/0x40 [<ffffffff814754d1>] kasan_report_error+0x221/0x520 [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40 [<ffffffff83888161>] klist_iter_exit+0x61/0x70 [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10 [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50 [<ffffffff8151f812>] seq_read+0x4b2/0x11a0 [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180 [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210 [<ffffffff814b4c45>] do_readv_writev+0x565/0x660 [<ffffffff814b8a17>] vfs_readv+0x67/0xa0 [<ffffffff814b8de6>] do_preadv+0x126/0x170 [<ffffffff814b92ec>] SyS_preadv+0xc/0x10 This problem can occur in the following situation: open() - pread() - .seq_start() - iter = kmalloc() // succeeds - seqf->private = iter - .seq_stop() - kfree(seqf->private) - pread() - .seq_start() - iter = kmalloc() // fails - .seq_stop() - class_dev_iter_exit(seqf->private) // boom! old pointer As the comment in disk_seqf_stop() says, stop is called even if start failed, so we need to reinitialise the private pointer to NULL when seq iteration stops. An alternative would be to set the private pointer to NULL when the kmalloc() in disk_seqf_start() fails. Cc: stable@vger.kernel.org Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 07 7月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
We now have implicit batching in the timer wheel. The slack API is no longer used, so remove it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrew F. Davis <afd@ti.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jaehoon Chung <jh80.chung@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Pali Rohár <pali.rohar@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 6月, 2016 1 次提交
-
-
由 Dan Williams 提交于
Now that all drivers that specify a ->driverfs_dev have been converted to device_add_disk(), the pointer can be removed from struct gendisk. Cc: Jens Axboe <axboe@fb.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 6月, 2016 1 次提交
-
-
由 Dan Williams 提交于
In preparation for removing the ->driverfs_dev member of a gendisk, add an api that takes the parent device as a parameter to add_disk(). For now this maintains the status quo of WARN()ing on failure, but not return a error code. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 10 1月, 2016 4 次提交
-
-
由 Dan Williams 提交于
These actions are completely managed by a block driver or can use the badblocks api directly. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
The badblocks list attached to a gendisk is allocated by the driver which equates to the driver owning the lifetime of the object. Do not automatically free it in del_gendisk(). This is in preparation for expanding the use of badblocks in libnvdimm drivers and introducing devm_init_badblocks(). Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
For symmetry with badblocks_init() make it clear that this path only destroys incremental allocations of a badblocks instance, and does not free the badblocks instance itself. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
NVDIMM devices, which can behave more like DRAM rather than block devices, may develop bad cache lines, or 'poison'. A block device exposed by the pmem driver can then consume poison via a read (or write), and cause a machine check. On platforms without machine check recovery features, this would mean a crash. The block device maintaining a runtime list of all known sectors that have poison can directly avoid this, and also provide a path forward to enable proper handling/recovery for DAX faults on such a device. Use the new badblock management interfaces to add a badblocks list to gendisks. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 09 1月, 2016 1 次提交
-
-
由 Dan Williams 提交于
When tearing down a block device early in its lifetime, userspace may still be performing discovery actions like blkdev_ioctl() to re-read partitions. The nvdimm_revalidate_disk() implementation depends on disk->driverfs_dev to be valid at entry. However, it is set to NULL in del_gendisk() and fatally this is happening *before* the disk device is deleted from userspace view. There's no reason for del_gendisk() to clear ->driverfs_dev. That device is the parent of the disk. It is guaranteed to not be freed until the disk, as a child, drops its ->parent reference. We could also fix this issue locally in nvdimm_revalidate_disk() by using disk_to_dev(disk)->parent, but lets fix it globally since ->driverfs_dev follows the lifetime of the parent. Longer term we should probably just add a @parent parameter to add_disk(), and stop carrying this pointer in the gendisk. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm] CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257 [..] Call Trace: [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0 [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70 [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0 [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40 [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0 [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0 [<ffffffff812916dd>] block_ioctl+0x3d/0x50 [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560 [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100 [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff81264f09>] SyS_ioctl+0x79/0x90 [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76 Reported-by: NRobert Hu <robert.hu@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 25 11月, 2015 1 次提交
-
-
由 Wei Tang 提交于
This patch fixes the checkpatch.pl error to genhd.c: ERROR: do not initialise statics to 0 or NULL Signed-off-by: NWei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 10月, 2015 1 次提交
-
-
由 Martin K. Petersen 提交于
Up until now the_integrity profile has been dynamically allocated and attached to struct gendisk after the disk has been made active. This causes problems because NVMe devices need to register the profile prior to the partition table being read due to a mandatory metadata buffer requirement. In addition, DM goes through hoops to deal with preallocating, but not initializing integrity profiles. Since the integrity profile is small (4 bytes + a pointer), Christoph suggested moving it to struct gendisk proper. This requires several changes: - Moving the blk_integrity definition to genhd.h. - Inlining blk_integrity in struct gendisk. - Removing the dynamic allocation code. - Adding helper functions which allow gendisk to set up and tear down the integrity sysfs dir when a disk is added/deleted. - Adding a blk_integrity_revalidate() callback for updating the stable pages bdi setting. - The calls that depend on whether a device has an integrity profile or not now key off of the bi->profile pointer. - Simplifying the integrity support routines in DM (Mike Snitzer). Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reported-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 17 7月, 2015 2 次提交
-
-
由 Ming Lei 提交于
Percpu refcount is the perfect match for partition's case, and the conversion is quite straight. With the convertion, one pair of atomic inc/dec can be saved for accounting block I/O, which is run in hot path of block I/O. Signed-off-by: NMing Lei <tom.leiming@gmail.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
So the helper can be used in both generic partition case and part0 case. Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 11 6月, 2015 1 次提交
-
-
由 Dan Williams 提交于
================================= [ INFO: inconsistent lock state ] 4.1.0-rc7+ #217 Tainted: G O --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (ext_devt_lock){+.?...}, at: [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810bf6b1>] __lock_acquire+0x461/0x1e70 [<ffffffff810c1947>] lock_acquire+0xb7/0x290 [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50 [<ffffffff8143a07d>] blk_alloc_devt+0x6d/0xd0 <-- take the lock in process context [..] [<ffffffff810bf64e>] __lock_acquire+0x3fe/0x1e70 [<ffffffff810c00ad>] ? __lock_acquire+0xe5d/0x1e70 [<ffffffff810c1947>] lock_acquire+0xb7/0x290 [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70 [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50 [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70 [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70 <-- take the lock in softirq [<ffffffff8143bfec>] part_release+0x1c/0x50 [<ffffffff8158edf6>] device_release+0x36/0xb0 [<ffffffff8145ac2b>] kobject_cleanup+0x7b/0x1a0 [<ffffffff8145aad0>] kobject_put+0x30/0x70 [<ffffffff8158f147>] put_device+0x17/0x20 [<ffffffff8143c29c>] delete_partition_rcu_cb+0x16c/0x180 [<ffffffff8143c130>] ? read_dev_sector+0xa0/0xa0 [<ffffffff810e0e0f>] rcu_process_callbacks+0x2ff/0xa90 [<ffffffff810e0dcf>] ? rcu_process_callbacks+0x2bf/0xa90 [<ffffffff81067e2e>] __do_softirq+0xde/0x600 Neil sees this in his tests and it also triggers on pmem driver unbind for the libnvdimm tests. This fix is on top of an initial fix by Keith for incorrect usage of mutex_lock() in this path: 2da78092 "block: Fix dev_t minor allocation lifetime". Both this and 2da78092 are candidates for -stable. Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime") Cc: <stable@vger.kernel.org> Cc: Keith Busch <keith.busch@intel.com> Reported-by: NNeilBrown <neilb@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 6月, 2015 1 次提交
-
-
由 Tejun Heo 提交于
With the planned cgroup writeback support, backing-dev related declarations will be more widely used across block and cgroup; unfortunately, including backing-dev.h from include/linux/blkdev.h makes cyclic include dependency quite likely. This patch separates out backing-dev-defs.h which only has the essential definitions and updates blkdev.h to include it. c files which need access to more backing-dev details now include backing-dev.h directly. This takes backing-dev.h off the common include dependency chain making it a lot easier to use it across block and cgroup. v2: fs/fat build failure fixed. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 5月, 2015 1 次提交
-
-
由 NeilBrown 提交于
bdi_unregister() now contains very little functionality. It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no real consequence as bdi->dev isn't needed by anything else in the function, and it triggers if blk_cleanup_queue() -> bdi_destroy() is called before bdi_unregister, which happens since Commit: 6cd18e71 ("block: destroy bdi before blockdev is unregistered.") So this isn't wanted. It also calls bdi_set_min_ratio(). This needs to be called after writes through the bdi have all been flushed, and before the bdi is destroyed. Calling it early is better than calling it late as it frees up a global resource. Calling it immediately after bdi_wb_shutdown() in bdi_destroy() perfectly fits these requirements. So bdi_unregister() can be discarded with the important content moved to bdi_destroy(), as can the writeback_bdi_unregister event which is already not used. Reported-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org (v4.0) Fixes: c4db59d3 ("fs: don't reassign dirty inodes to default_backing_dev_info") Fixes: 6cd18e71 ("block: destroy bdi before blockdev is unregistered.") Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDan Williams <dan.j.williams@intel.com> Tested-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 11月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition() with a user passed in partno value. If we pass in 0x7fffffff, the new target in disk_expand_part_tbl() overflows the 'int' and we access beyond the end of ptbl->part[] and even write to it when we do the rcu_assign_pointer() to assign the new partition. Reported-by: NDavid Ramos <daramos@stanford.edu> Cc: stable@kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 23 9月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
Commit 2da78092 changed the locking from a mutex to a spinlock, so we now longer sleep in this context. But there was a leftover might_sleep() in there, which now triggers since we do the final free from an RCU callback. Get rid of it. Reported-by: NPontus Fuchs <pontus.fuchs@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 9月, 2014 1 次提交
-
-
由 Masanari Iida 提交于
This patch fix spelling typo found in DocBook/kernel-api.xml. It is because the file is generated from the source comments, I have to fix the comments in source codes. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 04 9月, 2014 1 次提交
-
-
由 Keith Busch 提交于
Releases the dev_t minor when all references are closed to prevent another device from acquiring the same major/minor. Since the partition's release may be invoked from call_rcu's soft-irq context, the ext_dev_idr's mutex had to be replaced with a spinlock so as not so sleep. Signed-off-by: NKeith Busch <keith.busch@intel.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 9月, 2013 1 次提交
-
-
由 Joe Perches 提交于
Use the helper function instead of __GFP_ZERO. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 7月, 2013 1 次提交
-
-
由 Kees Cook 提交于
Disk names may contain arbitrary strings, so they must not be interpreted as format strings. It seems that only md allows arbitrary strings to be used for disk names, but this could allow for a local memory corruption from uid 0 into ring 0. CVE-2013-2851 Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 5月, 2013 1 次提交
-
-
由 Viresh Kumar 提交于
Block layer uses workqueues for multiple purposes. There is no real dependency of scheduling these on the cpu which scheduled them. On a idle system, it is observed that and idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. This patch replaces normal workqueues with power efficient versions. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 12 4月, 2013 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Now that devtmpfs is caring about uid/gid, we need to use the correct internal types so users who have USER_NS enabled will have things work properly for them. Thanks to Eric for pointing this out, and the patch review. Reported-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 08 4月, 2013 1 次提交
-
-
由 Kay Sievers 提交于
Some drivers want to tell userspace what uid and gid should be used for their device nodes, so allow that information to percolate through the driver core to userspace in order to make this happen. This means that some systems (i.e. Android and friends) will not need to even run a udev-like daemon for their device node manager and can just rely in devtmpfs fully, reducing their footprint even more. Signed-off-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-