1. 26 Mar, 2016 2 commits
    • R
      ocfs2: use c_new to indicate newly allocated extents · b46637d5
      Ryan Ding authored
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      There is a problem in ocfs2's direct io implementation: if the system
      crashes after extents are allocated but before the data is written, we
      are left with an extent with dirty data on disk.  This violates the
      journal=order semantics, which require that metadata changes take effect
      only after the data has been written to disk.  To resolve this issue, a
      direct write can use the UNWRITTEN flag to describe an extent during
      direct data writeback.  The direct write procedure should act in the
      following order:
      
      phase 1: alloc extent with UNWRITTEN flag
      phase 2: submit direct data to disk, add zero page to page cache
      phase 3: clear UNWRITTEN flag when data has been written to disk
      
      This patch changes the 'c_unwritten' member of
      ocfs2_write_cluster_desc to 'c_clear_unwritten', which indicates whether
      to clear the unwritten flag, regardless of whether an extent is newly
      allocated or not.  'c_new' is used to specify a newly allocated extent.
      The direct io procedure can then use c_clear_unwritten to control the
      UNWRITTEN bit on the extent.
      Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
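The three phases listed in the commit message above can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: `Extent`, `alloc_extent`, `submit_data`, `end_io` and `readable` are hypothetical names, and only `c_new` and `c_clear_unwritten` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Toy stand-in for an extent record (not the real ocfs2 structure)."""
    allocated: bool = False
    unwritten: bool = False
    data: bytes = b""


@dataclass
class WriteClusterDesc:
    """Carries the two fields named in the commit message."""
    c_new: bool = False              # extent was newly allocated by this write
    c_clear_unwritten: bool = False  # caller wants the UNWRITTEN flag cleared


def alloc_extent(ext: Extent, desc: WriteClusterDesc) -> None:
    # Phase 1: allocate with UNWRITTEN set, so a crash cannot expose stale data.
    if not ext.allocated:
        ext.allocated = True
        ext.unwritten = True
        desc.c_new = True


def submit_data(ext: Extent, payload: bytes) -> None:
    # Phase 2: data reaches "disk" while the extent is still marked UNWRITTEN.
    ext.data = payload


def end_io(ext: Extent, desc: WriteClusterDesc) -> None:
    # Phase 3: clear the flag only after the data is written, and only on request.
    if desc.c_clear_unwritten:
        ext.unwritten = False


def readable(ext: Extent) -> bytes:
    # Readers see zeros for an unwritten extent, whatever is stored in it.
    return b"\0" * len(ext.data) if ext.unwritten else ext.data
```

In this model a crash between phase 2 and phase 3 leaves the extent flagged as unwritten, so `readable` returns zeros rather than partially written data.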
    • R
      ocfs2: add ocfs2_write_type_t type to identify the caller of write · c1ad1e3c
      Ryan Ding authored
      Patchset: fix ocfs2 direct io code path to support sparse files and data
      ordering semantics
      
      The idea is to use buffer io (more precisely, the interface
      ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zeroing work
      beyond the block size, and to clear the UNWRITTEN flag only after the
      direct io data has been written to disk, which prevents data corruption
      if the system crashes during a direct write.
      
      We also achieve better performance, e.g. dd direct write of a new
      file with block size 4KB:
      before this patchset:
        2.5 MB/s
      after this patchset:
        66.4 MB/s
      
      This patch (of 8):
      
      To support direct io in ocfs2_write_begin_nolock &
      ocfs2_write_end_nolock.
      
      Remove the unused args filp & flags.  Add a new arg, type, which is one
      of buffer/direct/mmap and indicates the 3 ways to perform a write.  The
      buffer/mmap types are implemented; the direct type will be implemented
      later.
      Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
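The idea of passing a write type instead of the unused filp and flags arguments can be sketched as follows. This is a Python illustration, not the kernel's C definition: `WriteType` and this `write_begin_nolock` are hypothetical stand-ins, and only the three type names (buffer, direct, mmap) come from the commit message.

```python
from enum import Enum


class WriteType(Enum):
    """The three ways to perform a write named in the commit message."""
    BUFFER = "buffer"
    DIRECT = "direct"
    MMAP = "mmap"


def write_begin_nolock(write_type: WriteType) -> str:
    # At this point in the series only buffer and mmap are handled;
    # the direct type is left for a later patch.
    if write_type is WriteType.DIRECT:
        raise NotImplementedError("direct type is implemented by a later patch")
    return f"prepared {write_type.value} write"
```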
  2. 28 Feb, 2016 1 commit
  3. 08 Feb, 2016 1 commit
  4. 23 Jan, 2016 1 commit
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
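The wrapper pattern described above, where inode_foo(inode) stands for mutex_foo(&inode->i_mutex), can be mimicked in a few lines. This is a Python analogy using `threading.Lock`, not the kernel helpers themselves; the `Inode` class is a hypothetical stand-in.

```python
import threading


class Inode:
    """Toy inode that owns a mutex, standing in for struct inode."""

    def __init__(self) -> None:
        self.i_mutex = threading.Lock()


def inode_lock(inode: Inode) -> None:
    inode.i_mutex.acquire()


def inode_unlock(inode: Inode) -> None:
    inode.i_mutex.release()


def inode_trylock(inode: Inode) -> bool:
    # Returns True if the lock was taken, False if it was already held.
    return inode.i_mutex.acquire(blocking=False)


def inode_is_locked(inode: Inode) -> bool:
    return inode.i_mutex.locked()
```

Because callers only go through the wrappers, the lock type behind them could later be swapped (for example, to a reader/writer lock) without touching every call site, which is the motivation the commit message gives.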
  5. 06 Nov, 2015 2 commits
  6. 05 Sep, 2015 5 commits
    • J
      ocfs2: neaten do_error, ocfs2_error and ocfs2_abort · 7ecef14a
      Joe Perches authored
      These uses sometimes do and sometimes don't have '\n' terminations.  Make
      the uses consistently use '\n' terminations and remove the newline from
      the functions.
      
      Miscellanea:
      
      o Coalesce formats
      o Realign arguments
      Signed-off-by: Joe Perches <joe@perches.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Y
      ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock() · 7f27ec97
      yangwenfang authored
      1: After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
         jbd2_journal_restart() may also be called.  In this function,
         transaction A's t_updates is decremented and the handle obtains a new
         transaction B.  If jbd2_journal_commit_transaction() happens to be
         committing transaction A, then once t_updates==0 it will go on to
         complete the commit and unfile the buffer.
      
         So when jbd2_journal_dirty_metadata() is called, the handle points to
         the new transaction B and the buffer head's journal head has already
         been freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL),
         so it returns EINVAL, which triggers the BUG_ON(status).
      
      thread 1                                          jbd2
      ocfs2_write_begin                     jbd2_journal_commit_transaction
      ocfs2_write_begin_nolock
        ocfs2_start_trans
          jbd2__journal_start(t_updates+1,
                             transaction A)
          ocfs2_journal_access_di
          ocfs2_write_cluster_by_desc
            ocfs2_mark_extent_written
              ocfs2_change_extent_flag
                ocfs2_split_extent
                  ocfs2_extend_rotate_transaction
                    jbd2_journal_restart
                    (t_updates-1,transaction B) t_updates==0
                                              __jbd2_journal_refile_buffer
                                              (jh->b_transaction = NULL)
      ocfs2_write_end
      ocfs2_write_end_nolock
          ocfs2_journal_dirty
              jbd2_journal_dirty_metadata(bug)
         ocfs2_commit_trans
      
      2.  In ext4, I found that jbd2_journal_get_write_access() is called by
         ext4_write_end.
      
      ext4_write_begin
          ext4_journal_start
              __ext4_journal_start_sb
                  ext4_journal_check_start
                  jbd2__journal_start
      
      ext4_write_end
          ext4_mark_inode_dirty
              ext4_reserve_inode_write
                  ext4_journal_get_write_access
                      jbd2_journal_get_write_access
              ext4_mark_iloc_dirty
                  ext4_do_update_inode
                      ext4_handle_dirty_metadata
                          jbd2_journal_dirty_metadata
      
      3. So I think we should call ocfs2_journal_access_di before
         ocfs2_journal_dirty in ocfs2_write_end.  It works well after my
         modification.
      Signed-off-by: vicky <vicky.yangwenfang@huawei.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Zhangguanghui <zhang.guanghui@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
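The failure sequence in the commit message above can be reproduced with a deliberately simplified model of handles and transactions. This is a Python sketch under strong assumptions: the classes and functions here (`Transaction`, `Handle`, `Buffer`, `journal_access`, `journal_restart`, `commit`, `journal_dirty`) are hypothetical and only imitate the bookkeeping the message describes, not jbd2 itself.

```python
import errno


class Transaction:
    def __init__(self, name: str) -> None:
        self.name = name
        self.t_updates = 0    # number of handles currently attached
        self.buffers = set()  # buffers filed on this transaction


class Buffer:
    def __init__(self) -> None:
        self.b_transaction = None


class Handle:
    def __init__(self, txn: Transaction) -> None:
        self.txn = txn
        txn.t_updates += 1


def journal_access(handle: Handle, buf: Buffer) -> None:
    # File the buffer on the handle's current transaction.
    buf.b_transaction = handle.txn
    handle.txn.buffers.add(buf)


def journal_restart(handle: Handle, new_txn: Transaction) -> None:
    # Detach from the old transaction and attach to a new one.
    handle.txn.t_updates -= 1
    handle.txn = new_txn
    new_txn.t_updates += 1


def commit(txn: Transaction) -> bool:
    # A commit can only complete once no handle is attached; it unfiles buffers.
    if txn.t_updates != 0:
        return False
    for buf in txn.buffers:
        buf.b_transaction = None
    txn.buffers.clear()
    return True


def journal_dirty(handle: Handle, buf: Buffer) -> int:
    # Dirtying fails if the buffer is not filed on the handle's transaction.
    if buf.b_transaction is not handle.txn:
        return -errno.EINVAL
    return 0
```

In this model, calling `journal_access` again right before `journal_dirty` re-files the buffer on the handle's current transaction, which mirrors the ordering the commit proposes for ocfs2_write_end_nolock().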
    • W
      ocfs2: add ip_alloc_sem in direct IO to protect allocation changes · 6ab855a9
      WeiWei Wang authored
      In ocfs2, ip_alloc_sem is used to protect allocation changes on the
      node.  In direct IO, we take ip_alloc_sem to protect data consistency
      in the race between direct-io and ocfs2_truncate_file (buffer io uses
      ip_alloc_sem already).  Although the inode->i_mutex lock is used to avoid
      concurrency in the above situation, I think ip_alloc_sem is still needed
      because protecting allocation changes is significant.
      
      Other filesystems like ext4 also use an rw_semaphore to protect data
      consistency in the get_block-vs-truncate race by other means, so
      ip_alloc_sem in ocfs2 direct io is needed.
      Signed-off-by: Weiwei Wang <wangww631@huawei.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      ocfs2: fix several issues of append dio · faaebf18
      Joseph Qi authored
      1) Take rw EX lock in case of append dio.
      2) Explicitly treat the error code -EIOCBQUEUED as normal.
      3) Set di_bh to NULL after brelse if it may be used again later.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      ocfs2: fix race between dio and recover orphan · 512f62ac
      Joseph Qi authored
      During direct io the inode is first added to the orphan dir and then
      deleted from it.  There is a race window in which the orphan entry may
      be deleted twice, triggering the BUG when validating
      OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.
      
      ocfs2_direct_IO_write
          ...
          ocfs2_add_inode_to_orphan
          >>>>>>>> race window.
                   1) another node may rm the file and then go down; this node
                   takes care of orphan recovery and clears the flag
                   OCFS2_DIO_ORPHANED_FL.
                   2) since the rw lock is unlocked, it may race with another
                   orphan recovery and append dio.
          ocfs2_del_inode_from_orphan
      
      So take inode mutex lock when recovering orphans and make rw unlock at the
      end of aio write in case of append dio.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 07 Aug, 2015 1 commit
  8. 25 Jun, 2015 3 commits
  9. 15 Apr, 2015 4 commits
  10. 12 Apr, 2015 3 commits
  11. 26 Mar, 2015 1 commit
  12. 17 Feb, 2015 2 commits
  13. 19 Dec, 2014 1 commit
    • J
      ocfs2: fix journal commit deadlock · 136f49b9
      Junxiao Bi authored
      For a buffer write, the page lock is taken in write_begin and released
      in write_end.  In ocfs2_write_end_nolock(), before it unlocks the page in
      ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), which asks
      for the read lock of journal->j_trans_barrier.  Holding the page lock
      while asking for journal->j_trans_barrier breaks the locking order.
      
      This causes a deadlock with the journal commit threads: ocfs2cmt takes
      the write lock of journal->j_trans_barrier first, then wakes up
      kjournald2 to do the commit work, and finally waits until it is done.  To
      commit the journal, kjournald2 needs to flush data first, which requires
      the cache page lock.
      
      Since some ocfs2 cluster locks are held by the write process, this
      deadlock may hang the whole cluster.
      
      Unlocking pages before ocfs2_run_deallocs() fixes the locking order.
      Also put the unlock before ocfs2_commit_trans() so that the page lock is
      released before j_trans_barrier, preserving the unlocking order.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
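The locking rule in the commit message above (take j_trans_barrier before the page lock, never the other way round) can be expressed as a small order checker. This is a Python sketch of the general lock-ordering technique, not anything from ocfs2: `LOCK_ORDER` and `OrderChecker` are hypothetical, and only the two lock names come from the commit message.

```python
# Outermost lock first; taking an earlier entry while holding a later one is a bug.
LOCK_ORDER = ["j_trans_barrier", "page_lock"]


class OrderChecker:
    """Tracks the locks a single task holds and rejects out-of-order acquisition."""

    def __init__(self, order):
        self.rank = {name: i for i, name in enumerate(order)}
        self.held = []

    def acquire(self, name: str) -> None:
        for other in self.held:
            if self.rank[other] > self.rank[name]:
                raise RuntimeError(
                    f"lock order violation: {name} requested while holding {other}"
                )
        self.held.append(name)

    def release(self, name: str) -> None:
        self.held.remove(name)
```

With this checker, requesting `j_trans_barrier` while `page_lock` is held raises an error, which is the situation the patch removes by unlocking pages before ocfs2_run_deallocs().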
  14. 11 Dec, 2014 1 commit
  15. 10 Oct, 2014 1 commit
    • J
      ocfs2: fix deadlock due to wrong locking order · f775da2f
      Junxiao Bi authored
      To commit the ocfs2 journal, the ocfs2 journal thread acquires the mutex
      osb->journal->j_trans_barrier and wakes up the jbd2 commit thread, then
      waits until the jbd2 commit thread is done.  In ordered journal mode, jbd2
      needs to flush dirty data pages first, and this requires the page lock.
      So osb->journal->j_trans_barrier should be taken before the page lock.
      
      But ocfs2_write_zero_page() and ocfs2_write_begin_inline() do not obey
      this locking order, which causes a deadlock and hangs the whole cluster.
      
      One deadlock caught is the following:
      
      PID: 13449  TASK: ffff8802e2f08180  CPU: 31  COMMAND: "oracle"
       #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524
       #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf
       #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85
       #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55
       #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4
       #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2]
       #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2]
       #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2]
       #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2]
       #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2]
       #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2]
       #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2]
       #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29
       #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1
       #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd
       #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142
          RIP: 00007f8de750c6f7  RSP: 00007fffe786e478  RFLAGS: 00000206
          RAX: 000000000000004d  RBX: ffffffff81515142  RCX: 0000000000000000
          RDX: 0000000000000200  RSI: 0000000000028400  RDI: 000000000000000d
          RBP: 00007fffe786e040   R8: 0000000000000000   R9: 000000000000000d
          R10: 0000000000000000  R11: 0000000000000206  R12: 000000000000000d
          R13: 00007fffe786e710  R14: 00007f8de70f8340  R15: 0000000000028400
          ORIG_RAX: 000000000000004d  CS: 0033  SS: 002b
      
      crash64> bt
      PID: 7610   TASK: ffff88100fd56140  CPU: 1   COMMAND: "ocfs2cmt"
       #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524
       #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf
       #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2]
       #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2]
       #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2]
       #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2]
       #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6
       #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284
      
      crash64> bt
      PID: 7609   TASK: ffff88100f2d4480  CPU: 0   COMMAND: "jbd2/dm-20-86"
       #0 [ffff88100def3920] __schedule at ffffffff8150a524
       #1 [ffff88100def39c8] schedule at ffffffff8150acbf
       #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c
       #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e
       #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a
       #5 [ffff88100def3a58] __lock_page at ffffffff81110687
       #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752
       #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901
       #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2]
       #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372 [jbd2]
       #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2]
       #11 [ffff88100def3ee8] kthread at ffffffff81090db6
       #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Alex Chen <alex.chen@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 07 May, 2014 2 commits
  17. 04 Apr, 2014 2 commits
  18. 13 Nov, 2013 4 commits
  19. 12 Sep, 2013 1 commit
  20. 04 Sep, 2013 1 commit
    • C
      direct-io: Implement generic deferred AIO completions · 7b7a8665
      Christoph Hellwig authored
      Add support to the core direct-io code to defer AIO completions to user
      context using a workqueue.  This replaces opencoded and less efficient
      code in XFS and ext4 (we save a memory allocation for each direct IO)
      and will be needed to properly support O_(D)SYNC for AIO.
      
      The communication between the filesystem and the direct I/O code requires
      a new buffer head flag, which is a bit ugly but not avoidable until the
      direct I/O code stops abusing the buffer_head structure for communicating
      with the filesystems.
      
      Currently this creates a per-superblock unbound workqueue for these
      completions, which is taken from an earlier patch by Jan Kara.  I'm
      not really convinced about this use and would prefer a "normal" global
      workqueue with a high concurrency limit, but this needs further discussion.
      
      JK: Fixed ext4 part, dynamic allocation of the workqueue.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
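The general idea of deferring completion work to a worker context through a queue, as the commit message above describes for AIO completions, can be sketched in user space. This is a Python analogy built on `queue.Queue` and a thread; `CompletionWorkqueue` and its methods are hypothetical names and do not correspond to the kernel workqueue API.

```python
import queue
import threading


class CompletionWorkqueue:
    """Runs deferred completion callbacks on a dedicated worker thread."""

    def __init__(self) -> None:
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            fn = self._q.get()
            try:
                if fn is None:  # sentinel: stop the worker
                    return
                fn()
            finally:
                self._q.task_done()

    def defer(self, fn) -> None:
        # Called from the "interrupt-like" context: just queue, do no real work.
        self._q.put(fn)

    def flush(self) -> None:
        # Wait until every queued completion has run.
        self._q.join()

    def stop(self) -> None:
        self._q.put(None)
        self._worker.join()
```

A caller would hand the completion callback to `defer` from the I/O end path and let the worker run it later, for example `wq.defer(lambda: finish(request))`, where `finish` and `request` are placeholders.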
  21. 14 Aug, 2013 1 commit