1. 13 11月, 2010 2 次提交
    • T
      block: clean up blkdev_get() wrappers and their users · d4d77629
      Tejun Heo 提交于
      After recent blkdev_get() modifications, open_by_devnum() and
      open_bdev_exclusive() are simple wrappers around blkdev_get().
      Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
      
      blkdev_get_by_dev() is identical to open_by_devnum().
      blkdev_get_by_path() is slightly different in that it doesn't
      automatically add %FMODE_EXCL to @mode.
      
      All users are converted.  Most conversions are mechanical and don't
      introduce any behavior difference.  There are several exceptions.
      
      * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
        reason to OR it explicitly on blkdev_put().
      
      * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
        sb->s_mode.
      
      * With the above changes, sb->s_mode now always should contain
        FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
        errors.
      
      The new blkdev_get_*() functions are with proper docbook comments.
      While at it, add function description to blkdev_get() too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Joern Engel <joern@lazybastard.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: xfs-masters@oss.sgi.com
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      d4d77629
    • T
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo 提交于
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
  2. 09 11月, 2010 5 次提交
    • T
      ext4: Add new ext4 inode tracepoints · 7ff9c073
      Theodore Ts'o 提交于
      Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and
      ext4_begin_ordered_truncate()
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7ff9c073
    • T
      ext4: Don't call sb_issue_discard() in ext4_free_blocks() · b56ff9d3
      Theodore Ts'o 提交于
      Commit 5c521830 (ext4: Support discard requests when running in
      no-journal mode) attempts to add sb_issue_discard() for data blocks
      (in data=writeback mode) and in no-journal mode.  Unfortunately, this
      no longer works, because in commit dd3932ed (block: remove
      BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous
      interface, and there are times when we call ext4_free_blocks() when we
      are are holding a spinlock, or are otherwise in an atomic context.
      
      For now, I've removed the call to sb_issue_discard() to prevent a
      deadlock or (if spinlock debugging is enabled) failures like this:
      
      BUG: scheduling while atomic: rc.sysinit/1376/0x00000002
      Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1
      Call Trace:
      [<ffffffff810397ce>] __schedule_bug+0x5e/0x70
      [<ffffffff81403110>] schedule+0x950/0xa70
      [<ffffffff81060bad>] ? insert_work+0x7d/0x90
      [<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30
      [<ffffffff81061127>] ? queue_work+0x37/0x60
      [<ffffffff8140377d>] schedule_timeout+0x21d/0x360
      [<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540
      [<ffffffff81402680>] wait_for_common+0xc0/0x150
      [<ffffffff81041490>] ? default_wake_function+0x0/0x10
      [<ffffffff812034bc>] ? submit_bio+0x7c/0x100
      [<ffffffff810680a0>] ? wake_bit_function+0x0/0x40
      [<ffffffff814027b8>] wait_for_completion+0x18/0x20
      [<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210
      [<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60
      [<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120
      [<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70
      [<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30
      [<ffffffff81191618>] ext4_truncate+0x178/0x5d0
      [<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280
      [<ffffffff810d8976>] vmtruncate+0x56/0x70
      [<ffffffff811925cb>] ext4_setattr+0x14b/0x460
      [<ffffffff811319e4>] notify_change+0x194/0x380
      [<ffffffff81117f80>] do_truncate+0x60/0x90
      [<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20
      [<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20
      [<ffffffff81127539>] do_last+0x5d9/0x770
      [<ffffffff811278bd>] do_filp_open+0x1ed/0x680
      [<ffffffff8140644f>] ? page_fault+0x1f/0x30
      [<ffffffff81132bfc>] ? alloc_fd+0xec/0x140
      [<ffffffff81118db1>] do_sys_open+0x61/0x120
      [<ffffffff81118e8b>] sys_open+0x1b/0x20
      [<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b
      
      https://bugzilla.kernel.org/show_bug.cgi?id=22302Reported-by: NMathias Burén <mathias.buren@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: jiayingz@google.com
      b56ff9d3
    • D
      ext4: do not try to grab the s_umount semaphore in ext4_quota_off · 87009d86
      Dmitry Monakhov 提交于
      It's not needed to sync the filesystem, and it fixes a lock_dep complaint.
      Signed-off-by: NDmitry Monakhov <dmonakhov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      87009d86
    • T
      ext4: fix potential race when freeing ext4_io_page structures · 83668e71
      Theodore Ts'o 提交于
      Use an atomic_t and make sure we don't free the structure while we
      might still be submitting I/O for that page.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      83668e71
    • T
      ext4: handle writeback of inodes which are being freed · f7ad6d2e
      Theodore Ts'o 提交于
      The following BUG can occur when an inode which is getting freed when
      it still has dirty pages outstanding, and it gets deleted (in this
      because it was the target of a rename).  In ordered mode, we need to
      make sure the data pages are written just in case we crash before the
      rename (or unlink) is committed.  If the inode is being freed then
      when we try to igrab the inode, we end up tripping the BUG_ON at
      fs/ext4/page-io.c:146.
      
      To solve this problem, we need to keep track of the number of io
      callbacks which are pending, and avoid destroying the inode until they
      have all been completed.  That way we don't have to bump the inode
      count to keep the inode from being destroyed; an approach which
      doesn't work because the count could have already been dropped down to
      zero before the inode writeback has started (at which point we're not
      allowed to bump the count back up to 1, since it's already started
      getting freed).
      
      Thanks to Dave Chinner for suggesting this approach, which is also
      used by XFS.
      
        kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
        Call Trace:
         [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
         [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
         [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
         [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
         [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
         [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
         [<ffffffff81087910>] do_writepages+0x1c/0x25
         [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
         [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
         [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
         [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
         [<ffffffff810c14a3>] evict+0x22/0x92
         [<ffffffff810c1a3d>] iput+0x212/0x249
         [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
         [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
         [<ffffffff810be613>] dput+0x13a/0x147
         [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
         [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
         [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
         [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
         [<ffffffff810b99c6>] sys_rename+0x16/0x18
         [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
      Reported-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Tested-by: NNick Bowler <nbowler@elliptictech.com>
      f7ad6d2e
  3. 04 11月, 2010 1 次提交
  4. 03 11月, 2010 2 次提交
  5. 02 11月, 2010 1 次提交
  6. 29 10月, 2010 3 次提交
  7. 28 10月, 2010 26 次提交