1. 29 Oct, 2012 1 commit
  2. 26 Sep, 2012 3 commits
    • fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared · 3eab7315
      Fengguang Wu committed
      blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as
      static.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blockdev: turn a rw semaphore into a percpu rw semaphore · 62ac665f
      Mikulas Patocka committed
      This avoids cache line bouncing when many processes lock the semaphore
      for read.
      
      New percpu lock implementation
      
      The lock consists of an array of percpu unsigned integers, a boolean
      variable and a mutex.
      
      When we take the lock for read, we enter rcu read section, check for a
      "locked" variable. If it is false, we increase a percpu counter on the
      current cpu and exit the rcu section. If "locked" is true, we exit the
      rcu section, take the mutex and drop it (this waits until a writer
      finished) and retry.
      
      Unlocking for read just decreases percpu variable. Note that we can
      unlock on a different cpu than where we locked, in this case the
      counter underflows. The sum of all percpu counters represents the number
      of processes that hold the lock for read.
      
      When we need to lock for write, we take the mutex, set "locked" variable
      to true and synchronize rcu. Since RCU has been synchronized, no
      processes can create new read locks. We wait until the sum of percpu
      counters is zero - when it is, there are no readers in the critical
      section.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
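As a rough illustration of the counter arithmetic described in the 62ac665f message above, the following single-threaded C model shows why one per-CPU counter may underflow while the sum over all CPUs still gives the number of readers. It is a sketch only: it has no RCU, no mutex and no real concurrency, and `MODEL_NR_CPUS` and every `model_*` name are hypothetical, not taken from the kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Lock state as the commit message lists it, minus the mutex. */
struct percpu_rw_model {
    unsigned int counters[MODEL_NR_CPUS]; /* one reader counter per CPU */
    bool locked;                          /* set by a writer */
};

/* Take the lock for read on @cpu; fails while a writer has set "locked". */
static bool model_read_lock(struct percpu_rw_model *m, size_t cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    if (m->locked)
        return false; /* the real code waits on the mutex and retries */
    m->counters[cpu]++;
    return true;
}

/* Drop a read lock on @cpu, which may differ from the locking CPU. */
static void model_read_unlock(struct percpu_rw_model *m, size_t cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    m->counters[cpu]--; /* unsigned wraparound is well defined in C */
}

/* Number of readers: the wrapping sum of all per-CPU counters. */
static unsigned int model_readers(const struct percpu_rw_model *m)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < MODEL_NR_CPUS; i++)
        sum += m->counters[i];
    return sum;
}

/* Writer side, step 1: stop new readers from entering. */
static void model_write_begin(struct percpu_rw_model *m)
{
    m->locked = true;
}

/* Writer side, step 2: the writer may proceed once the sum is zero. */
static bool model_readers_drained(const struct percpu_rw_model *m)
{
    return model_readers(m) == 0;
}
```

Locking on CPU 0 and unlocking on CPU 3 leaves `counters[3]` at the maximum unsigned value, yet `model_readers()` still reports the correct count because the sum wraps modulo the same power of two.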
    • Fix a crash when block device is read and block size is changed at the same time · b87570f5
      Mikulas Patocka committed
      The kernel may crash when block size is changed and I/O is issued
      simultaneously.
      
      Because some subsystems (udev or lvm) may read any block device anytime,
      the bug actually puts any code that changes a block device size in
      jeopardy.
      
      The crash can be reproduced if you place "msleep(1000)" to
      blkdev_get_blocks just before "bh->b_size = max_blocks <<
      inode->i_blkbits;".
      Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct"
      While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0"
      You get a BUG.
      
      The direct and non-direct I/O is written with the assumption that block
      size does not change. It doesn't seem practical to fix these crashes
      one-by-one; there may be many crash possibilities when block size changes
      at a certain place and it is impossible to find them all and verify the
      code.
      
      This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
      taken for read during I/O. It is taken for write when changing block
      size. Consequently, block size can't be changed while I/O is being
      submitted.
      
      For asynchronous I/O, the patch only prevents block size change while
      the I/O is being submitted. The block size can change when the I/O is in
      progress or when the I/O is being finished. This is acceptable because
      there are no accesses to block size when asynchronous I/O is being
      finished.
      
      The patch prevents block size changing while the device is mapped with
      mmap.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 02 Aug, 2012 1 commit
  4. 23 Jul, 2012 1 commit
  5. 11 May, 2012 1 commit
    • block: don't mark buffers beyond end of disk as mapped · 080399aa
      Jeff Moyer committed
      Hi,
      
      We have a bug report open where a squashfs image mounted on ppc64 would
      exhibit errors due to trying to read beyond the end of the disk.  It can
      easily be reproduced by doing the following:
      
      [root@ibm-p750e-02-lp3 ~]# ls -l install.img
      -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
      [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
      [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
      dd: reading `/dev/loop0': Input/output error
      277376+0 records in
      277376+0 records out
      142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s
      
      In dmesg, you'll find the following:
      
      squashfs: version 4.0 (2009/01/31) Phillip Lougher
      [   43.106012] attempt to access beyond end of device
      [   43.106029] loop0: rw=0, want=277410, limit=277408
      [   43.106039] Buffer I/O error on device loop0, logical block 138704
      [   43.106053] attempt to access beyond end of device
      [   43.106057] loop0: rw=0, want=277412, limit=277408
      [   43.106061] Buffer I/O error on device loop0, logical block 138705
      [   43.106066] attempt to access beyond end of device
      [   43.106070] loop0: rw=0, want=277414, limit=277408
      [   43.106073] Buffer I/O error on device loop0, logical block 138706
      [   43.106078] attempt to access beyond end of device
      [   43.106081] loop0: rw=0, want=277416, limit=277408
      [   43.106085] Buffer I/O error on device loop0, logical block 138707
      [   43.106089] attempt to access beyond end of device
      [   43.106093] loop0: rw=0, want=277418, limit=277408
      [   43.106096] Buffer I/O error on device loop0, logical block 138708
      [   43.106101] attempt to access beyond end of device
      [   43.106104] loop0: rw=0, want=277420, limit=277408
      [   43.106108] Buffer I/O error on device loop0, logical block 138709
      [   43.106112] attempt to access beyond end of device
      [   43.106116] loop0: rw=0, want=277422, limit=277408
      [   43.106120] Buffer I/O error on device loop0, logical block 138710
      [   43.106124] attempt to access beyond end of device
      [   43.106128] loop0: rw=0, want=277424, limit=277408
      [   43.106131] Buffer I/O error on device loop0, logical block 138711
      [   43.106135] attempt to access beyond end of device
      [   43.106139] loop0: rw=0, want=277426, limit=277408
      [   43.106143] Buffer I/O error on device loop0, logical block 138712
      [   43.106147] attempt to access beyond end of device
      [   43.106151] loop0: rw=0, want=277428, limit=277408
      [   43.106154] Buffer I/O error on device loop0, logical block 138713
      [   43.106158] attempt to access beyond end of device
      [   43.106162] loop0: rw=0, want=277430, limit=277408
      [   43.106166] attempt to access beyond end of device
      [   43.106169] loop0: rw=0, want=277432, limit=277408
      ...
      [   43.106307] attempt to access beyond end of device
      [   43.106311] loop0: rw=0, want=277470, limit=2774
      
      Squashfs manages to read in the end block(s) of the disk during the
      mount operation.  Then, when dd reads the block device, it leads to
      block_read_full_page being called with buffers that are beyond end of
      disk, but are marked as mapped.  Thus, it would end up submitting read
      I/O against them, resulting in the errors mentioned above.  I fixed the
      problem by modifying init_page_buffers to only set the buffer mapped if
      it fell inside of i_size.
      
      Cheers,
      Jeff
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Nick Piggin <npiggin@kernel.dk>
      
      --
      
      Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
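The end-of-device check described in the 080399aa message above can be sketched in C as follows. This is a hedged model, not the kernel's `init_page_buffers`: the `model_*` names are hypothetical, and the 1024-byte block size is an assumption inferred from the log, where each logical block spans two 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * First block number past the end of a device of @size_bytes bytes.
 * @blkbits is log2 of the block size (10 for 1024-byte blocks).
 * Rounding down is an assumption of this sketch: a trailing partial
 * block is treated as out of range.
 */
static uint64_t model_max_block(uint64_t size_bytes, unsigned int blkbits)
{
    assert(blkbits < 64);
    return size_bytes >> blkbits;
}

/* A buffer is marked mapped only if its block lies inside the device. */
static bool model_buffer_should_map(uint64_t block, uint64_t max_block)
{
    return block < max_block;
}
```

With the 142032896-byte image from the report, `model_max_block(142032896, 10)` is 138704, which matches the first logical block that dmesg reports as failing.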
  6. 06 May, 2012 1 commit
  7. 24 Mar, 2012 1 commit
  8. 02 Mar, 2012 1 commit
  9. 24 Jan, 2012 1 commit
    • mm: cleancache: s/flush/invalidate/ · 3167760f
      Dan Magenheimer committed
      Per akpm suggestions alter the use of the term flush to be
      invalidate. The next patch will do this across all MM.
      
      This change is completely cosmetic.
      
      [v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Reviewed-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Rik Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [v10: Fixed  fs: move code out of buffer.c conflict change]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  10. 13 Jan, 2012 1 commit
  11. 11 Jan, 2012 1 commit
    • block_dev: Suppress bdev_cache_init() kmemleak warning · ace8577a
      Sergey Senozhatsky committed
      Kmemleak reports the following warning in bdev_cache_init()
      [    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
      [    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
      [    0.003909] kmemleak:   min_count = 1
      [    0.003988] kmemleak:   count = 0
      [    0.004066] kmemleak:   flags = 0x1
      [    0.004144] kmemleak:   checksum = 0
      [    0.004224] kmemleak:   backtrace:
      [    0.004303]      [<ffffffff814755ac>] kmemleak_alloc+0x21/0x3e
      [    0.004446]      [<ffffffff811100ba>] kmem_cache_alloc+0xca/0x1dc
      [    0.004592]      [<ffffffff811371b1>] alloc_vfsmnt+0x1f/0x198
      [    0.004736]      [<ffffffff811375c5>] vfs_kern_mount+0x36/0xd2
      [    0.004879]      [<ffffffff8113929a>] kern_mount_data+0x18/0x32
      [    0.005025]      [<ffffffff81ab9075>] bdev_cache_init+0x51/0x81
      [    0.005169]      [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
      [    0.005313]      [<ffffffff81a9bae3>] start_kernel+0x344/0x383
      [    0.005456]      [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
      [    0.005602]      [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111
      [    0.005747]      [<ffffffffffffffff>] 0xffffffffffffffff
      [    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
      [    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
      [    0.008856] Call Trace:
      [    0.008934]  [<ffffffff81118704>] ? find_and_get_object+0x44/0x118
      [    0.009023]  [<ffffffff81118fe6>] paint_ptr+0x57/0x8f
      [    0.009109]  [<ffffffff81475935>] kmemleak_not_leak+0x23/0x42
      [    0.009195]  [<ffffffff81ab9096>] bdev_cache_init+0x72/0x81
      [    0.009282]  [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
      [    0.009368]  [<ffffffff81a9bae3>] start_kernel+0x344/0x383
      [    0.009466]  [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
      [    0.009555]  [<ffffffff81a9b140>] ? early_idt_handlers+0x140/0x140
      [    0.009643]  [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111
      
      due to attempt to mark pointer to `struct vfsmount' as a gray object, which
      is embedded into `struct mount' returned from alloc_vfsmnt().
      
      Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as
      suggested by Al Viro.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. 04 Jan, 2012 3 commits
  13. 24 Oct, 2011 1 commit
    • block: make gendisk hold a reference to its queue · f992ae80
      Tejun Heo committed
      The following command sequence triggers an oops.
      
      # mount /dev/sdb1 /mnt
      # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
      # umount /mnt
      
       general protection fault: 0000 [#1] PREEMPT SMP
       CPU 2
       Modules linked in:
      
       Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
       RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
      ...
       Call Trace:
        [<ffffffff810d2845>] lock_acquire+0x95/0x140
        [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
        [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
        [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
        [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
        [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
        [<ffffffff811c40df>] blkdev_put+0x5f/0x190
        [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
        [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
        [<ffffffff8119003a>] deactivate_super+0x4a/0x70
        [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
        [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
        [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
      
      This is because bdev holds on to disk but disk doesn't pin the
      associated queue.  If a SCSI device is removed while the device is
      still open, the sdev puts the base reference to the queue on release.
      When the bdev is finally released, the associated queue is already
      gone along with the bdi and bdev_inode_switch_bdi() ends up
      dereferencing already freed bdi.
      
      Even if it were not for this bug, disk not holding onto the associated
      queue is very unusual and error-prone.
      
      Fix it by making add_disk() take an extra reference to its queue and
      put it on disk_release() and ensuring that disk and its fops owner are
      put in that order after all accesses to the disk and queue are
      complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
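The reference-counting fix described in the f992ae80 message above can be modelled with a plain counter. The sketch below is an assumption-laden illustration, not kernel code: `model_queue`, `model_disk` and the `model_*` helpers are hypothetical stand-ins for the request queue, the gendisk, `add_disk()` and `disk_release()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_queue {
    int refcount;
    bool freed; /* stands in for the queue and its bdi being freed */
};

struct model_disk {
    struct model_queue *queue;
};

static void model_queue_get(struct model_queue *q)
{
    assert(!q->freed);
    q->refcount++;
}

static void model_queue_put(struct model_queue *q)
{
    assert(!q->freed && q->refcount > 0);
    if (--q->refcount == 0)
        q->freed = true;
}

/* add_disk() analogue: the disk pins its queue with an extra reference. */
static void model_add_disk(struct model_disk *d, struct model_queue *q)
{
    model_queue_get(q);
    d->queue = q;
}

/* disk_release() analogue: the extra reference is dropped last. */
static void model_disk_release(struct model_disk *d)
{
    model_queue_put(d->queue);
    d->queue = NULL;
}
```

In this model, dropping the driver's base reference while the disk still exists leaves the queue alive; it is freed only when the disk itself is released.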
  14. 19 Oct, 2011 1 commit
    • block: make gendisk hold a reference to its queue · 523e1d39
      Tejun Heo committed
      The following command sequence triggers an oops.
      
      # mount /dev/sdb1 /mnt
      # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
      # umount /mnt
      
       general protection fault: 0000 [#1] PREEMPT SMP
       CPU 2
       Modules linked in:
      
       Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
       RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
      ...
       Call Trace:
        [<ffffffff810d2845>] lock_acquire+0x95/0x140
        [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
        [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
        [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
        [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
        [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
        [<ffffffff811c40df>] blkdev_put+0x5f/0x190
        [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
        [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
        [<ffffffff8119003a>] deactivate_super+0x4a/0x70
        [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
        [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
        [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
      
      This is because bdev holds on to disk but disk doesn't pin the
      associated queue.  If a SCSI device is removed while the device is
      still open, the sdev puts the base reference to the queue on release.
      When the bdev is finally released, the associated queue is already
      gone along with the bdi and bdev_inode_switch_bdi() ends up
      dereferencing already freed bdi.
      
      Even if it were not for this bug, disk not holding onto the associated
      queue is very unusual and error-prone.
      
      Fix it by making add_disk() take an extra reference to its queue and
      put it on disk_release() and ensuring that disk and its fops owner are
      put in that order after all accesses to the disk and queue are
      complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 10 Sep, 2011 1 commit
    • Avoid dereferencing a 'request_queue' after last close. · 94007751
      NeilBrown committed
      On the last close of an 'md' device which has been stopped, the device
      is destroyed and in particular the request_queue is freed.  The free
      is done in a separate thread so it might happen a short time later.
      
      __blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
      called.
      
      Since commit f758eeab
      bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
      inside a request_queue, to get a spin lock.  This causes the last
      close on an md device to sometimes take a spin_lock which lives in
      freed memory - which results in an oops.
      
      So move the call to bdev_inode_switch_bdi before the call to
      ->release.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  16. 24 Aug, 2011 1 commit
    • block: add GENHD_FL_NO_PART_SCAN · d27769ec
      Tejun Heo committed
      There are cases where suppressing partition scan is useful - e.g. for
      lo devices and pseudo SATA devices which advertise to be a disk but
      get upset on partition scan (some port multiplier control devices show
      such behavior).
      
      This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
      regardless of the number of possible partitions.  disk_partitionable()
      is renamed to disk_part_scan_enabled() as suppressing partition scan
      doesn't imply the device can't be partitioned using
      BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
      directly tests disk_max_parts() to maintain backward-compatibility.
      
      -v2: Updated to make it clear that only partition scan is suppressed
           not partitioning itself as suggested by Kay Sievers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
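The distinction drawn in the d27769ec message above, between suppressing the partition scan and being unable to hold partitions, can be sketched as two separate predicates. This is a minimal model under stated assumptions: `MODEL_FL_NO_PART_SCAN` is a placeholder bit rather than the kernel's flag value, the `max_parts > 1` threshold is assumed, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FL_NO_PART_SCAN 0x1u /* placeholder bit, not the kernel value */

struct model_disk_info {
    int max_parts;      /* how many partitions the disk could hold */
    unsigned int flags;
};

/* disk_max_parts() analogue: unaffected by the scan flag. */
static int model_disk_max_parts(const struct model_disk_info *d)
{
    assert(d->max_parts >= 0);
    return d->max_parts;
}

/* disk_part_scan_enabled() analogue: the flag vetoes only the scan. */
static bool model_part_scan_enabled(const struct model_disk_info *d)
{
    return model_disk_max_parts(d) > 1 &&
           !(d->flags & MODEL_FL_NO_PART_SCAN);
}
```

A disk with the flag set still reports its full partition capacity, so partitions could still be added from userland even though no scan is performed.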
  17. 02 Aug, 2011 1 commit
  18. 01 Aug, 2011 1 commit
  19. 21 Jul, 2011 2 commits
  20. 01 Jul, 2011 1 commit
    • block: flush MEDIA_CHANGE from drivers on close(2) · 85ef06d1
      Tejun Heo committed
      Currently, only open(2) is defined as the 'clearing' point.  It has
      two roles - first, it's an acknowledgement from userland indicating
      that the event has been received and kernel can clear pending states
      and proceed to generate more events.  Secondly, it's passed on to
      device drivers as a hint indicating that a synchronization point has
      been reached and it might want to take a deeper look at the device.
      
      The latter currently is only used by sr which uses two different
      mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
      to discover events, where the former is lighter weight and safe to be
      used repeatedly but may not provide full coverage.  Among other
      things, GET_EVENT can't detect media removal while TUR can.
      
      This patch makes close(2) - blkdev_put() - indicate clearing hint for
      MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
      disk_flush_events() and updated to take @mask for events to flush
      which is or'd to ev->clearing and will be passed to the driver on the
      next ->check_events() invocation.
      
      This change makes sr generate MEDIA_CHANGE when media is ejected from
      userland - e.g. with eject(1).
      
      Note: Given the current usage, it seems @clearing hint is needlessly
      complex.  disk_clear_events() can simply clear all events and the hint
      can be boolean @flush.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  21. 13 Jun, 2011 1 commit
    • block: use the passed in @bdev when claiming if partno is zero · d4c208b8
      Tejun Heo committed
      6b4517a7 (block: implement bd_claiming and claiming block)
      introduced claiming block to support O_EXCL blkdev opens properly.
      
      bd_start_claiming() looks up the part 0 bdev and starts claiming
      block.  The function assumed that there is only one part 0 bdev and
      always used bdget_disk(disk, 0) to look it up; unfortunately, this
      isn't true for some drivers (floppy) which use multiple block devices
      to denote different operating parameters for the same physical device.
      There can be multiple part 0 bdev's for the same device number.
      
      This incorrect assumption caused the wrong bdev to be used during
      claiming leading to unbalanced bd_holders as reported in the following
      bug.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=28522
      
      This patch updates bd_start_claiming() such that it uses the bdev
      specified as argument if its partno is zero.
      
      Note that this means that different bdev's can be used for the same
      device and O_EXCL check can be effectively bypassed.  It has always
      been broken that way and floppy is fortunately on its way out.  Leave
      that breakage alone.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
      Tested-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
      Cc: stable@kernel.org	# >= v2.6.36
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  22. 08 Jun, 2011 1 commit
    • writeback: split inode_wb_list_lock into bdi_writeback.list_lock · f758eeab
      Christoph Hellwig committed
      Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
      as it's currently the most contended lock in the system for metadata
      heavy workloads.  It won't help for single-filesystem workloads for
      which we'll need the I/O-less balance_dirty_pages, but at least we
      can dedicate a cpu to spinning on each bdi now for larger systems.
      
      Based on earlier patches from Nick Piggin and Dave Chinner.
      
      It reduces lock contentions to 1/4 in this test case:
      10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram
      
      lock_stat version 0.3
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                    class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      vanilla 2.6.39-rc3:
                            inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                            ------------------
                            inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                            inode_wb_list_lock             34          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                            inode_wb_list_lock          12893          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                            inode_wb_list_lock          10702          [<ffffffff8115afef>] writeback_single_inode+0x16d/0x20a
                            ------------------
                            inode_wb_list_lock              2          [<ffffffff81165da5>] bdev_inode_switch_bdi+0x29/0x85
                            inode_wb_list_lock             19          [<ffffffff8115bd0b>] inode_wb_list_del+0x22/0x49
                            inode_wb_list_lock           5550          [<ffffffff8115bb53>] __mark_inode_dirty+0x170/0x1d0
                            inode_wb_list_lock           8511          [<ffffffff8115b4ad>] writeback_sb_inodes+0x10f/0x157
      
      2.6.39-rc3 + patch:
                      &(&wb->list_lock)->rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                      ------------------------
                      &(&wb->list_lock)->rlock             10          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                      &(&wb->list_lock)->rlock           1493          [<ffffffff8115b1ed>] writeback_inodes_wb+0x3d/0x150
                      &(&wb->list_lock)->rlock           3652          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f
                      &(&wb->list_lock)->rlock           1412          [<ffffffff8115a38e>] writeback_single_inode+0x17f/0x223
                      ------------------------
                      &(&wb->list_lock)->rlock              3          [<ffffffff8110b5af>] bdi_lock_two+0x46/0x4b
                      &(&wb->list_lock)->rlock              6          [<ffffffff8115b189>] inode_wb_list_del+0x5f/0x86
                      &(&wb->list_lock)->rlock           2061          [<ffffffff8115af97>] __mark_inode_dirty+0x173/0x1cf
                      &(&wb->list_lock)->rlock           2629          [<ffffffff8115a8e9>] writeback_sb_inodes+0x123/0x16f
      
      hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
      akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
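Once the lock is per-bdi, moving an inode between two bdis needs both list locks, which is where the recursive-lock note in the f758eeab message above comes from. The sketch below is a single-threaded model under stated assumptions: the lock is a boolean that asserts on double acquisition, the address-ordered acquisition is assumed for illustration, and all `model_*` names are hypothetical rather than the kernel's `bdi_lock_two()` or `bdev_inode_switch_bdi()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_wb {
    bool list_lock_held; /* stands in for wb->list_lock */
    int nr_inodes;
};

static void model_lock(struct model_wb *wb)
{
    assert(!wb->list_lock_held); /* a second acquisition would self-deadlock */
    wb->list_lock_held = true;
}

static void model_unlock(struct model_wb *wb)
{
    assert(wb->list_lock_held);
    wb->list_lock_held = false;
}

/*
 * Take both locks in address order, so two callers passing the same
 * pair in opposite argument order acquire them in the same sequence.
 * The two objects must be distinct.
 */
static void model_lock_two(struct model_wb *from, struct model_wb *to)
{
    assert(from != to);
    if ((uintptr_t)from < (uintptr_t)to) {
        model_lock(from);
        model_lock(to);
    } else {
        model_lock(to);
        model_lock(from);
    }
}

/* Move one inode from @old to @dst; returns false when there is nothing to do. */
static bool model_switch(struct model_wb *old, struct model_wb *dst)
{
    if (old == dst)
        return false; /* same object: never take its lock twice */
    model_lock_two(old, dst);
    old->nr_inodes--;
    dst->nr_inodes++;
    model_unlock(old);
    model_unlock(dst);
    return true;
}
```

The early return in `model_switch()` corresponds to the case the message calls "new the same as old": without it, the model's assertion in `model_lock()` would fire.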
  23. 01 Jun, 2011 1 commit
  24. 23 May, 2011 2 commits
  25. 29 Apr, 2011 1 commit
  26. 22 Apr, 2011 2 commits
  27. 31 Mar, 2011 1 commit
  28. 25 Mar, 2011 2 commits
    • fs: move i_wb_list out from under inode_lock · a66979ab
      Dave Chinner committed
      Protect the inode writeback list with a new global lock
      inode_wb_list_lock and use it to protect the list manipulations and
      traversals. This lock replaces the inode_lock as the inodes on the
      list can be validity checked while holding the inode->i_lock and
      hence the inode_lock is no longer needed to protect the list.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: protect inode->i_state with inode->i_lock · 250df6ed
      Dave Chinner committed
      Protect inode state transitions and validity checks with the
      inode->i_lock. This enables us to make inode state transitions
      independently of the inode_lock and is the first step to peeling
      away the inode_lock from the code.
      
      This requires that __iget() is done atomically with i_state checks
      during list traversals so that we don't race with another thread
      marking the inode I_FREEING between the state check and grabbing the
      reference.
      
      Also remove the unlock_new_inode() memory barrier optimisation
      required to avoid taking the inode_lock when clearing I_NEW.
      Simplify the code by simply taking the inode->i_lock around the
      state change and wakeup. Because the wakeup is no longer tricky,
      remove the wake_up_inode() function and open code the wakeup where
      necessary.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  29. 19 Mar, 2011 1 commit
  30. 10 Mar, 2011 3 commits
    • block: remove per-queue plugging · 7eaceacc
      Jens Axboe committed
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: Don't check events while open is in progress · 69e02c59
      Tejun Heo committed
      Not all block drivers clear events immediately after reporting.  Some
      do so in ->revalidate_disk() or other steps during ->open().  There is
      a slim chance event poll may happen between the clearing event check
      from check_disk_change() and the actual clearing of the events which
      would result in spurious events.
      
      Block event checks while block device open is in progress.  There is
      no need to kick explicit event check afterwards as events are always
      checked during open.
      
      -v2: The original patch could have called disk_unblock_events() with
           an already released or %NULL @disk causing oops.  Fixed by making
           sure references are put after disk_unblock_events() is called.
           It also makes the error path of __blkdev_get() a bit simpler.
           This problem was reported by Jens.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
    • block: Don't check events on close unless it was blocked · 6936217c
      Tejun Heo committed
      The block event mechanism currently always checks events when the
      device is being closed regardless of the open mode.  The intention was
      to allow detection of EJECT_REQUEST when a device is closed whether
      disk event polling is enabled or not.
      
      This is unnecessary as, for devices of interest, events are checked
      from either userland or kernel and in the former case ->check_events()
      is performed on open of each poll attempt anyway.  Furthermore, this
      unconditional event check on close makes the code susceptible to event
      loop if the block driver doesn't clear reported events correctly - an
      event triggers userland to open and close the device which in turn
      causes another event, rinse and repeat.
      
      Check events on close only if it was blocked by excl write open.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kay Sievers <kay.sievers@vrfy.org>