1. 23 2月, 2013 1 次提交
  2. 22 2月, 2013 3 次提交
    • G
      block: remove redundant check to bd_openers() · de33127d
      Guo Chao 提交于
      bd_openers is stable under bd_mutex, no need to check it twice.
      Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      de33127d
    • G
      block: use i_size_write() in bd_set_size() · d646a02a
      Guo Chao 提交于
      blkdev_ioctl(GETBLKSIZE) uses i_size_read() to read size of block device.
      If we update block size directly, reader may see intermediate result in
      some machines and configurations.  Use i_size_write() instead.
      Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d646a02a
    • M
      fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk() · 7630b661
      MITSUNARI Shigeo 提交于
      We found that bdev->bd_invalidated was left set once revalidate_disk()
      is called, which results in page cache flush every time that device is
      open.
      
      Specifically, we found this problem in MD block device.  Once we resize
      a MD device, mdadm --monitor periodically flush all page cache for that
      device every 60 or 1000 seconds when it opens the device.
      
      This bug lies since at least 3.2.0 till the latest kernel(3.6.2).  Patch
      is attached.
      
      The following steps will reproduce the problem.
      
      1. prepair a block device (eg /dev/sdb).
      
      2. create two partitions:
      
         sudo parted /dev/sdb
         mklabel gpt
         mkpart primary 0% 50%
         mkpart primary 50% 100%
      
      3. create a md device.
      
         sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2
      
      4. create file system and mount it
      
         sudo mkfs.ext3 /dev/md/hoge
         sudo mkdir /mnt/test
         sudo mount /dev/md/hoge /mnt/test
      
      5. try to resize the device
      
         sudo mdadm -G /dev/md/hoge --size=max
      
      6. create a file to fill file cache.
      
        sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10
      
      and verify the current status of file by free command.
      
      7. mdadm monitor will open the md device every 1000 seconds and you
         will find all file cache on the device are cleared.
      
      The timing can be reduced by the following steps.
      
      a) kill mdadm and restart it with --delay option
      
         /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
      
      or open the md device directly.
      
         sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1
      Signed-off-by: NMITSUNARI Shigeo <herumi@nifty.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7630b661
  3. 18 12月, 2012 1 次提交
  4. 09 12月, 2012 1 次提交
    • L
      vfs: fix O_DIRECT read past end of block device · 684c9aae
      Linus Torvalds 提交于
      The direct-IO write path already had the i_size checks in mm/filemap.c,
      but it turns out the read path did not, and removing the block size
      checks in fs/block_dev.c (commit bbec0270: "blkdev_max_block: make
      private to fs/buffer.c") removed the magic "shrink IO to past the end of
      the device" code there.
      
      Fix it by truncating the IO to the size of the block device, like the
      write path already does.
      
      NOTE! I suspect the write path would be *much* better off doing it this
      way in fs/block_dev.c, rather than hidden deep in mm/filemap.c.  The
      mm/filemap.c code is extremely hard to follow, and has various
      conditionals on the target being a block device (ie the flag passed in
      to 'generic_write_checks()', along with a conditional update of the
      inode timestamp etc).
      
      It is also quite possible that we should treat this whole block device
      size as a "s_maxbytes" issue, and try to make the logic even more
      generic.  However, in the meantime this is the fairly minimal targeted
      fix.
      
      Noted by Milan Broz thanks to a regression test for the cryptsetup
      reencrypt tool.
      Reported-and-tested-by: NMilan Broz <mbroz@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      684c9aae
  5. 30 11月, 2012 2 次提交
  6. 29 10月, 2012 1 次提交
  7. 26 9月, 2012 3 次提交
    • F
      fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared · 3eab7315
      Fengguang Wu 提交于
      blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as
      static.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3eab7315
    • M
      blockdev: turn a rw semaphore into a percpu rw semaphore · 62ac665f
      Mikulas Patocka 提交于
      This avoids cache line bouncing when many processes lock the semaphore
      for read.
      
      New percpu lock implementation
      
      The lock consists of an array of percpu unsigned integers, a boolean
      variable and a mutex.
      
      When we take the lock for read, we enter rcu read section, check for a
      "locked" variable. If it is false, we increase a percpu counter on the
      current cpu and exit the rcu section. If "locked" is true, we exit the
      rcu section, take the mutex and drop it (this waits until a writer
      finished) and retry.
      
      Unlocking for read just decreases percpu variable. Note that we can
      unlock on a difference cpu than where we locked, in this case the
      counter underflows. The sum of all percpu counters represents the number
      of processes that hold the lock for read.
      
      When we need to lock for write, we take the mutex, set "locked" variable
      to true and synchronize rcu. Since RCU has been synchronized, no
      processes can create new read locks. We wait until the sum of percpu
      counters is zero - when it is, there are no readers in the critical
      section.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      62ac665f
    • M
      Fix a crash when block device is read and block size is changed at the same time · b87570f5
      Mikulas Patocka 提交于
      The kernel may crash when block size is changed and I/O is issued
      simultaneously.
      
      Because some subsystems (udev or lvm) may read any block device anytime,
      the bug actually puts any code that changes a block device size in
      jeopardy.
      
      The crash can be reproduced if you place "msleep(1000)" to
      blkdev_get_blocks just before "bh->b_size = max_blocks <<
      inode->i_blkbits;".
      Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct"
      While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0"
      You get a BUG.
      
      The direct and non-direct I/O is written with the assumption that block
      size does not change. It doesn't seem practical to fix these crashes
      one-by-one there may be many crash possibilities when block size changes
      at a certain place and it is impossible to find them all and verify the
      code.
      
      This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
      taken for read during I/O. It is taken for write when changing block
      size. Consequently, block size can't be changed while I/O is being
      submitted.
      
      For asynchronous I/O, the patch only prevents block size change while
      the I/O is being submitted. The block size can change when the I/O is in
      progress or when the I/O is being finished. This is acceptable because
      there are no accesses to block size when asynchronous I/O is being
      finished.
      
      The patch prevents block size changing while the device is mapped with
      mmap.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b87570f5
  8. 02 8月, 2012 1 次提交
  9. 23 7月, 2012 1 次提交
  10. 11 5月, 2012 1 次提交
    • J
      block: don't mark buffers beyond end of disk as mapped · 080399aa
      Jeff Moyer 提交于
      Hi,
      
      We have a bug report open where a squashfs image mounted on ppc64 would
      exhibit errors due to trying to read beyond the end of the disk.  It can
      easily be reproduced by doing the following:
      
      [root@ibm-p750e-02-lp3 ~]# ls -l install.img
      -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
      [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
      [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
      dd: reading `/dev/loop0': Input/output error
      277376+0 records in
      277376+0 records out
      142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s
      
      In dmesg, you'll find the following:
      
      squashfs: version 4.0 (2009/01/31) Phillip Lougher
      [   43.106012] attempt to access beyond end of device
      [   43.106029] loop0: rw=0, want=277410, limit=277408
      [   43.106039] Buffer I/O error on device loop0, logical block 138704
      [   43.106053] attempt to access beyond end of device
      [   43.106057] loop0: rw=0, want=277412, limit=277408
      [   43.106061] Buffer I/O error on device loop0, logical block 138705
      [   43.106066] attempt to access beyond end of device
      [   43.106070] loop0: rw=0, want=277414, limit=277408
      [   43.106073] Buffer I/O error on device loop0, logical block 138706
      [   43.106078] attempt to access beyond end of device
      [   43.106081] loop0: rw=0, want=277416, limit=277408
      [   43.106085] Buffer I/O error on device loop0, logical block 138707
      [   43.106089] attempt to access beyond end of device
      [   43.106093] loop0: rw=0, want=277418, limit=277408
      [   43.106096] Buffer I/O error on device loop0, logical block 138708
      [   43.106101] attempt to access beyond end of device
      [   43.106104] loop0: rw=0, want=277420, limit=277408
      [   43.106108] Buffer I/O error on device loop0, logical block 138709
      [   43.106112] attempt to access beyond end of device
      [   43.106116] loop0: rw=0, want=277422, limit=277408
      [   43.106120] Buffer I/O error on device loop0, logical block 138710
      [   43.106124] attempt to access beyond end of device
      [   43.106128] loop0: rw=0, want=277424, limit=277408
      [   43.106131] Buffer I/O error on device loop0, logical block 138711
      [   43.106135] attempt to access beyond end of device
      [   43.106139] loop0: rw=0, want=277426, limit=277408
      [   43.106143] Buffer I/O error on device loop0, logical block 138712
      [   43.106147] attempt to access beyond end of device
      [   43.106151] loop0: rw=0, want=277428, limit=277408
      [   43.106154] Buffer I/O error on device loop0, logical block 138713
      [   43.106158] attempt to access beyond end of device
      [   43.106162] loop0: rw=0, want=277430, limit=277408
      [   43.106166] attempt to access beyond end of device
      [   43.106169] loop0: rw=0, want=277432, limit=277408
      ...
      [   43.106307] attempt to access beyond end of device
      [   43.106311] loop0: rw=0, want=277470, limit=2774
      
      Squashfs manages to read in the end block(s) of the disk during the
      mount operation.  Then, when dd reads the block device, it leads to
      block_read_full_page being called with buffers that are beyond end of
      disk, but are marked as mapped.  Thus, it would end up submitting read
      I/O against them, resulting in the errors mentioned above.  I fixed the
      problem by modifying init_page_buffers to only set the buffer mapped if
      it fell inside of i_size.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NNick Piggin <npiggin@kernel.dk>
      
      --
      
      Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      080399aa
  11. 06 5月, 2012 1 次提交
  12. 24 3月, 2012 1 次提交
  13. 02 3月, 2012 1 次提交
  14. 24 1月, 2012 1 次提交
    • D
      mm: cleancache: s/flush/invalidate/ · 3167760f
      Dan Magenheimer 提交于
      Per akpm suggestions alter the use of the term flush to be
      invalidate. The next patch will do this across all MM.
      
      This change is completely cosmetic.
      
      [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Reviewed-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Rik Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [v10: Fixed  fs: move code out of buffer.c conflict change]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3167760f
  15. 13 1月, 2012 1 次提交
  16. 11 1月, 2012 1 次提交
    • S
      block_dev: Suppress bdev_cache_init() kmemleak warninig · ace8577a
      Sergey Senozhatsky 提交于
      Kmemleak reports the following warning in bdev_cache_init()
      [    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
      [    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
      [    0.003909] kmemleak:   min_count = 1
      [    0.003988] kmemleak:   count = 0
      [    0.004066] kmemleak:   flags = 0x1
      [    0.004144] kmemleak:   checksum = 0
      [    0.004224] kmemleak:   backtrace:
      [    0.004303]      [<ffffffff814755ac>] kmemleak_alloc+0x21/0x3e
      [    0.004446]      [<ffffffff811100ba>] kmem_cache_alloc+0xca/0x1dc
      [    0.004592]      [<ffffffff811371b1>] alloc_vfsmnt+0x1f/0x198
      [    0.004736]      [<ffffffff811375c5>] vfs_kern_mount+0x36/0xd2
      [    0.004879]      [<ffffffff8113929a>] kern_mount_data+0x18/0x32
      [    0.005025]      [<ffffffff81ab9075>] bdev_cache_init+0x51/0x81
      [    0.005169]      [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
      [    0.005313]      [<ffffffff81a9bae3>] start_kernel+0x344/0x383
      [    0.005456]      [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
      [    0.005602]      [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111
      [    0.005747]      [<ffffffffffffffff>] 0xffffffffffffffff
      [    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
      [    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
      [    0.008856] Call Trace:
      [    0.008934]  [<ffffffff81118704>] ? find_and_get_object+0x44/0x118
      [    0.009023]  [<ffffffff81118fe6>] paint_ptr+0x57/0x8f
      [    0.009109]  [<ffffffff81475935>] kmemleak_not_leak+0x23/0x42
      [    0.009195]  [<ffffffff81ab9096>] bdev_cache_init+0x72/0x81
      [    0.009282]  [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
      [    0.009368]  [<ffffffff81a9bae3>] start_kernel+0x344/0x383
      [    0.009466]  [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
      [    0.009555]  [<ffffffff81a9b140>] ? early_idt_handlers+0x140/0x140
      [    0.009643]  [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111
      
      due to attempt to mark pointer to `struct vfsmount' as a gray object, which
      is embedded into `struct mount' returned from alloc_vfsmnt().
      
      Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as
      suggested by Al Viro.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ace8577a
  17. 04 1月, 2012 3 次提交
  18. 24 10月, 2011 1 次提交
    • T
      block: make gendisk hold a reference to its queue · f992ae80
      Tejun Heo 提交于
      The following command sequence triggers an oops.
      
      # mount /dev/sdb1 /mnt
      # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
      # umount /mnt
      
       general protection fault: 0000 [#1] PREEMPT SMP
       CPU 2
       Modules linked in:
      
       Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
       RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
      ...
       Call Trace:
        [<ffffffff810d2845>] lock_acquire+0x95/0x140
        [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
        [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
        [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
        [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
        [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
        [<ffffffff811c40df>] blkdev_put+0x5f/0x190
        [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
        [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
        [<ffffffff8119003a>] deactivate_super+0x4a/0x70
        [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
        [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
        [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
      
      This is because bdev holds on to disk but disk doesn't pin the
      associated queue.  If a SCSI device is removed while the device is
      still open, the sdev puts the base reference to the queue on release.
      When the bdev is finally released, the associated queue is already
      gone along with the bdi and bdev_inode_switch_bdi() ends up
      dereferencing already freed bdi.
      
      Even if it were not for this bug, disk not holding onto the associated
      queue is very unusual and error-prone.
      
      Fix it by making add_disk() take an extra reference to its queue and
      put it on disk_release() and ensuring that disk and its fops owner are
      put in that order after all accesses to the disk and queue are
      complete.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f992ae80
  19. 19 10月, 2011 1 次提交
    • T
      block: make gendisk hold a reference to its queue · 523e1d39
      Tejun Heo 提交于
      The following command sequence triggers an oops.
      
      # mount /dev/sdb1 /mnt
      # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
      # umount /mnt
      
       general protection fault: 0000 [#1] PREEMPT SMP
       CPU 2
       Modules linked in:
      
       Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
       RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
      ...
       Call Trace:
        [<ffffffff810d2845>] lock_acquire+0x95/0x140
        [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
        [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
        [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
        [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
        [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
        [<ffffffff811c40df>] blkdev_put+0x5f/0x190
        [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
        [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
        [<ffffffff8119003a>] deactivate_super+0x4a/0x70
        [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
        [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
        [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
      
      This is because bdev holds on to disk but disk doesn't pin the
      associated queue.  If a SCSI device is removed while the device is
      still open, the sdev puts the base reference to the queue on release.
      When the bdev is finally released, the associated queue is already
      gone along with the bdi and bdev_inode_switch_bdi() ends up
      dereferencing already freed bdi.
      
      Even if it were not for this bug, disk not holding onto the associated
      queue is very unusual and error-prone.
      
      Fix it by making add_disk() take an extra reference to its queue and
      put it on disk_release() and ensuring that disk and its fops owner are
      put in that order after all accesses to the disk and queue are
      complete.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      523e1d39
  20. 10 9月, 2011 1 次提交
    • N
      Avoid dereferencing a 'request_queue' after last close. · 94007751
      NeilBrown 提交于
      On the last close of an 'md' device which as been stopped, the device
      is destroyed and in particular the request_queue is freed.  The free
      is done in a separate thread so it might happen a short time later.
      
      __blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
      called.
      
      Since commit f758eeab
      bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
      inside a request_queue, to get a spin lock.  This causes the last
      close on an md device to sometime take a spin_lock which lives in
      freed memory - which results in an oops.
      
      So move the called to bdev_inode_switch_bdi before the call to
      ->release.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: NNeilBrown <neilb@suse.de>
      94007751
  21. 24 8月, 2011 1 次提交
    • T
      block: add GENHD_FL_NO_PART_SCAN · d27769ec
      Tejun Heo 提交于
      There are cases where suppressing partition scan is useful - e.g. for
      lo devices and pseudo SATA devices which advertise to be a disk but
      get upset on partition scan (some port multiplier control devices show
      such behavior).
      
      This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
      regardless of the number of possible partitions.  disk_partitionable()
      is renamed to disk_part_scan_enabled() as suppressing partition scan
      doesn't imply the device can't be partitioned using
      BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
      directly tests disk_max_parts() to maintain backward-compatibility.
      
      -v2: Updated to make it clear that only partition scan is suppressed
           not partitioning itself as suggested by Kay Sievers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      d27769ec
  22. 02 8月, 2011 1 次提交
  23. 01 8月, 2011 1 次提交
  24. 21 7月, 2011 2 次提交
  25. 01 7月, 2011 1 次提交
    • T
      block: flush MEDIA_CHANGE from drivers on close(2) · 85ef06d1
      Tejun Heo 提交于
      Currently, only open(2) is defined as the 'clearing' point.  It has
      two roles - first, it's an acknowledgement from userland indicating
      that the event has been received and kernel can clear pending states
      and proceed to generate more events.  Secondly, it's passed on to
      device drivers as a hint indicating that a synchronization point has
      been reached and it might want to take a deeper look at the device.
      
      The latter currently is only used by sr which uses two different
      mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
      to discover events, where the former is lighter weight and safe to be
      used repeatedly but may not provide full coverage.  Among other
      things, GET_EVENT can't detect media removal while TUR can.
      
      This patch makes close(2) - blkdev_put() - indicate clearing hint for
      MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
      disk_flush_events() and updated to take @mask for events to flush
      which is or'd to ev->clearing and will be passed to the driver on the
      next ->check_events() invocation.
      
      This change makes sr generate MEDIA_CHANGE when media is ejected from
      userland - e.g. with eject(1).
      
      Note: Given the current usage, it seems @clearing hint is needlessly
      complex.  disk_clear_events() can simply clear all events and the hint
      can be boolean @flush.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      85ef06d1
  26. 13 6月, 2011 1 次提交
    • T
      block: use the passed in @bdev when claiming if partno is zero · d4c208b8
      Tejun Heo 提交于
      6b4517a7 (block: implement bd_claiming and claiming block)
      introduced claiming block to support O_EXCL blkdev opens properly.
      
      bd_start_claiming() looks up the part 0 bdev and starts claiming
      block.  The function assumed that there is only one part 0 bdev and
      always used bdget_disk(disk, 0) to look it up; unfortunately, this
      isn't true for some drivers (floppy) which use multiple block devices
      to denote different operating parameters for the same physical device.
      There can be multiple part 0 bdev's for the same device number.
      
      This incorrect assumption caused the wrong bdev to be used during
      claiming leading to unbalanced bd_holders as reported in the following
      bug.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=28522
      
      This patch updates bd_start_claiming() such that it uses the bdev
      specified as argument if its partno is zero.
      
      Note that this means that different bdev's can be used for the same
      device and O_EXCL check can be effectively bypassed.  It has always
      been broken that way and floppy is fortunately on its way out.  Leave
      that breakage alone.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAlex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
      Tested-by: NAlex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
      Cc: stable@kernel.org	# >= v2.6.36
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      d4c208b8
  27. 08 6月, 2011 1 次提交
    • C
      writeback: split inode_wb_list_lock into bdi_writeback.list_lock · f758eeab
      Christoph Hellwig 提交于
      Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
      as it's currently the most contended lock in the system for metadata
      heavy workloads.  It won't help for single-filesystem workloads for
      which we'll need the I/O-less balance_dirty_pages, but at least we
      can dedicate a cpu to spinning on each bdi now for larger systems.
      
      Based on earlier patches from Nick Piggin and Dave Chinner.
      
      It reduces lock contentions to 1/4 in this test case:
      10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram
      
      lock_stat version 0.3
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                    class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      vanilla 2.6.39-rc3:
                            inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                            ------------------
                            inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                            inode_wb_list_lock             34          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                            inode_wb_list_lock          12893          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                            inode_wb_list_lock          10702          [<ffffffff8115afef>] writeback_single_inode+0x16d/0x20a
                            ------------------
                            inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                            inode_wb_list_lock             19          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                            inode_wb_list_lock           5550          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                            inode_wb_list_lock           8511          [<ffffffff8115b4ad>] writeback_sb_inodes+0x10f/0x157
      
      2.6.39-rc3 + patch:
                      &(&wb->list_lock)->rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                      ------------------------
                      &(&wb->list_lock)->rlock             10          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                      &(&wb->list_lock)->rlock           1493          [<ffffffff8115b1ed>] writeback_inodes_wb+0x3d/0x150
                      &(&wb->list_lock)->rlock           3652          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f
                      &(&wb->list_lock)->rlock           1412          [<ffffffff8115a38e>] writeback_single_inode+0x17f/0x223
                      ------------------------
                      &(&wb->list_lock)->rlock              3          [<ffffffff8110b5af>] bdi_lock_two+0x46/0x4b
                      &(&wb->list_lock)->rlock              6          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                      &(&wb->list_lock)->rlock           2061          [<ffffffff8115af97>] __mark_inode_dirty+0x173/0x1cf
                      &(&wb->list_lock)->rlock           2629          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f
      
      hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
      akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      f758eeab
  28. 01 6月, 2011 1 次提交
  29. 23 5月, 2011 2 次提交
  30. 29 4月, 2011 1 次提交
  31. 22 4月, 2011 1 次提交