1. 01 2月, 2018 2 次提交
  2. 16 11月, 2017 1 次提交
  3. 28 2月, 2017 1 次提交
  4. 13 12月, 2016 1 次提交
  5. 11 12月, 2016 4 次提交
  6. 05 11月, 2016 1 次提交
  7. 01 10月, 2016 1 次提交
    • E
      ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() · c33f0785
      Eric Ren 提交于
      The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
      
      In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
      there are 2 process repeatedly performing the following operations
      respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
      1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
      ftruncate(fd, CLUSTER_SIZE) again and again.
      
      This is the backtrace when the deadlock happens:
      
         __wait_on_bit_lock+0x50/0xa0
         __lock_page+0xb7/0xc0
         ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
         ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
         do_page_mkwrite+0x66/0xc0
         handle_mm_fault+0x685/0x1350
         __do_page_fault+0x1d8/0x4d0
         trace_do_page_fault+0x37/0xf0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
      
      In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
      disk space for this write; ocfs2_try_to_free_truncate_log() will be
      called if -ENOSPC is returned; if we're lucky to get enough clusters,
      which is usually the case, we start over again.
      
      But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
      will deadlock when trying to grab the target page again.
      
      Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
      Another deadlock will happen in __do_page_mkwrite() if
      ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
      locked target page.
      
      These two errors fail on the same path, so fix them by unlocking the
      target page manually before ocfs2_free_write_ctxt().
      
      Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
      cause.
      
      Changes since v1:
      1. Also put ENOMEM error case into consideration.
      
      Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.comSigned-off-by: NEric Ren <zren@suse.com>
      Reviewed-by: NHe Gang <ghe@suse.com>
      Acked-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c33f0785
  8. 28 9月, 2016 1 次提交
  9. 03 8月, 2016 1 次提交
  10. 08 6月, 2016 1 次提交
  11. 30 5月, 2016 1 次提交
  12. 03 5月, 2016 1 次提交
  13. 02 5月, 2016 1 次提交
  14. 05 4月, 2016 2 次提交
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  15. 26 3月, 2016 11 次提交
    • R
      ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write() · 28888681
      Ryan Ding 提交于
      The code should call ocfs2_free_alloc_context() to free meta_ac &
      data_ac before calling ocfs2_run_deallocs().  Because
      ocfs2_run_deallocs() will acquire the system inode's i_mutex hold by
      meta_ac.  So try to release the lock before ocfs2_run_deallocs().
      
      Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.")
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Acked-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28888681
    • R
      ocfs2: fix disk file size and memory file size mismatch · ce170828
      Ryan Ding 提交于
      When doing append direct write in an already allocated cluster, and fast
      path in ocfs2_dio_get_block() is triggered, function
      ocfs2_dio_end_io_write() will be skipped as there is no context
      allocated.
      
      As a result, the disk file size will not be changed as it should be.
      The solution is to skip fast path when we are about to change file size.
      
      Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.")
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Acked-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce170828
    • R
      ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write · a86a72a4
      Ryan Ding 提交于
      Take ip_alloc_sem to prevent concurrent access to extent tree, which may
      cause the extent tree in an unstable state.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a86a72a4
    • R
      ocfs2: fix ip_unaligned_aio deadlock with dio work queue · e63890f3
      Ryan Ding 提交于
      In the current implementation of unaligned aio+dio, lock order behave as
      follow:
      
      in user process context:
        -> call io_submit()
          -> get i_mutex
      		<== window1
            -> get ip_unaligned_aio
              -> submit direct io to block device
          -> release i_mutex
        -> io_submit() return
      
      in dio work queue context(the work queue is created in __blockdev_direct_IO):
        -> release ip_unaligned_aio
      		<== window2
          -> get i_mutex
            -> clear unwritten flag & change i_size
          -> release i_mutex
      
      There is a limitation to the thread number of dio work queue.  256 at
      default.  If all 256 thread are in the above 'window2' stage, and there
      is a user process in the 'window1' stage, the system will became
      deadlock.  Since the user process hold i_mutex to wait ip_unaligned_aio
      lock, while there is a direct bio hold ip_unaligned_aio mutex who is
      waiting for a dio work queue thread to be schedule.  But all the dio
      work queue thread is waiting for i_mutex lock in 'window2'.
      
      This case only happened in a test which send a large number(more than
      256) of aio at one io_submit() call.
      
      My design is to remove ip_unaligned_aio lock.  Change it to a sync io
      instead.  Just like ip_unaligned_aio lock, serialize the unaligned aio
      dio.
      
      [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi]
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e63890f3
    • R
      ocfs2: fix sparse file & data ordering issue in direct io · c15471f7
      Ryan Ding 提交于
      There are mainly three issues in the direct io code path after commit
      24c40b32 ("ocfs2: implement ocfs2_direct_IO_write"):
      
        * Does not support sparse file.
        * Does not support data ordering.  eg: when write to a file hole, it
          will alloc extent first.  If system crashed before io finished, data
          will corrupt.
        * Potential risk when doing aio+dio.  The -EIOCBQUEUED return value is
          likely to be ignored by ocfs2_direct_IO_write().
      
      To resolve above problems, re-design direct io code with following ideas:
        * Use buffer io to fill in holes.  And this will make better
          performance also.
        * Clear unwritten after direct write finished.  So we can make sure
          meta data changes after data write to disk.  (Unwritten extent is
          invisible to user, from user's view, meta data is not changed when
          allocate an unwritten extent.)
        * Clear ocfs2_direct_IO_write().  Do all ending work in end_io.
      
      This patch has passed fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4
      test cases of ltp.
      
      For performance improvement, see following test result:
      ocfs2 cluster size 1MB, ocfs2 volume is mounted on /mnt/.
      The original way:
        + rm /mnt/test.img -f
        + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
        1048576+0 records in
        1048576+0 records out
        4294967296 bytes (4.3 GB) copied, 1707.83 s, 2.5 MB/s
        + rm /mnt/test.img -f
        + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
        16384+0 records in
        16384+0 records out
        4294967296 bytes (4.3 GB) copied, 582.705 s, 7.4 MB/s
      
      After this patch:
        + rm /mnt/test.img -f
        + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
        1048576+0 records in
        1048576+0 records out
        4294967296 bytes (4.3 GB) copied, 64.6412 s, 66.4 MB/s
        + rm /mnt/test.img -f
        + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
        16384+0 records in
        16384+0 records out
        4294967296 bytes (4.3 GB) copied, 34.7611 s, 124 MB/s
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c15471f7
    • R
      ocfs2: record UNWRITTEN extents when populate write desc · 4506cfb6
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      There is still one issue in the direct write procedure.
      
      phase 1: alloc extent with UNWRITTEN flag
      phase 2: submit direct data to disk, add zero page to page cache
      phase 3: clear UNWRITTEN flag when data has been written to disk
      
      When there are 2 direct write A(0~3KB),B(4~7KB) writing to the same
      cluster 0~7KB (cluster size 8KB).  Write request A arrive phase 2 first,
      it will zero the region (4~7KB).  Before request A enter to phase 3,
      request B arrive phase 2, it will zero region (0~3KB).  This is just like
      request B steps request A.
      
      To resolve this issue, we should let request B knows this cluster is already
      under zero, to prevent it from steps the previous write request.
      
      This patch will add function ocfs2_unwritten_check() to do this job.  It
      will record all clusters that are under direct write(it will be recorded
      in the 'ip_unwritten_list' member of inode info), and prevent the later
      direct write writing to the same cluster to do the zero work again.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4506cfb6
    • R
      ocfs2: return the physical address in ocfs2_write_cluster · 2de6a3c7
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      Direct io needs to get the physical address from write_begin, to map the
      user page.  This patch is to change the arg 'phys' of
      ocfs2_write_cluster to a pointer, so it can be retrieved to write_begin.
      And we can retrieve it to the direct io procedure.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2de6a3c7
    • R
      ocfs2: do not change i_size in write_end for direct io · 46e62556
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      Append direct io do not change i_size in get block phase.  It only move
      to orphan when starting write.  After data is written to disk, it will
      delete itself from orphan and update i_size.  So skip i_size change
      section in write_begin for direct io.
      
      And when there is no extents alloc, no meta data changes needed for
      direct io (since write_begin start trans for 2 reason: alloc extents &
      change i_size.  Now none of them needed).  So we can skip start trans
      procedure.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46e62556
    • R
      ocfs2: test target page before change it · 65c4db8c
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      Direct io data will not appear in buffer.  The w_target_page member will
      not be filled by direct io.  So avoid to use it when it's NULL.  Unlinke
      buffer io and mmap, direct io will call write_begin with more than 1
      page a time.  So the target_index is not sufficient to describe the
      actual data.  change it to a range start at target_index, end in
      end_index.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65c4db8c
    • R
      ocfs2: use c_new to indicate newly allocated extents · b46637d5
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      There is a problem in ocfs2's direct io implement: if system crashed
      after extents allocated, and before data return, we will get a extent
      with dirty data on disk.  This problem violate the journal=order
      semantics, which means meta changes take effect after data written to
      disk.  To resolve this issue, direct write can use the UNWRITTEN flag to
      describe a extent during direct data writeback.  The direct write
      procedure should act in the following order:
      
      phase 1: alloc extent with UNWRITTEN flag
      phase 2: submit direct data to disk, add zero page to page cache
      phase 3: clear UNWRITTEN flag when data has been written to disk
      
      This patch is to change the 'c_unwritten' member of
      ocfs2_write_cluster_desc to 'c_clear_unwritten'.  Means whether to clear
      the unwritten flag.  It do not care if a extent is allocated or not.
      And use 'c_new' to specify a newly allocated extent.  So the direct io
      procedure can use c_clear_unwritten to control the UNWRITTEN bit on
      extent.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b46637d5
    • R
      ocfs2: add ocfs2_write_type_t type to identify the caller of write · c1ad1e3c
      Ryan Ding 提交于
      Patchset: fix ocfs2 direct io code patch to support sparse file and data
      ordering semantics
      
      The idea is to use buffer io(more precisely use the interface
      ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zero work
      beyond block size.  And clear UNWRITTEN flag until direct io data has
      been written to disk, which can prevent data corruption when system
      crashed during direct write.
      
      And we will also archive a better performance: eg.  dd direct write new
      file with block size 4KB: before this patchset:
        2.5 MB/s
      after this patchset:
        66.4 MB/s
      
      This patch (of 8):
      
      To support direct io in ocfs2_write_begin_nolock &
      ocfs2_write_end_nolock.
      
      Remove unused args filp & flags.  Add new arg type.  The type is one of
      buffer/direct/mmap.  Indicate 3 way to perform write.  buffer/mmap type
      has implemented.  direct type will be implemented later.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c1ad1e3c
  16. 28 2月, 2016 1 次提交
  17. 08 2月, 2016 1 次提交
  18. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  19. 06 11月, 2015 2 次提交
  20. 05 9月, 2015 5 次提交
    • J
      ocfs2: neaten do_error, ocfs2_error and ocfs2_abort · 7ecef14a
      Joe Perches 提交于
      These uses sometimes do and sometimes don't have '\n' terminations.  Make
      the uses consistently use '\n' terminations and remove the newline from
      the functions.
      
      Miscellanea:
      
      o Coalesce formats
      o Realign arguments
      Signed-off-by: NJoe Perches <joe@perches.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ecef14a
    • Y
      ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock() · 7f27ec97
      yangwenfang 提交于
      1: After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
         jbd2_journal_restart() may also be called, in this function transaction
         A's t_updates-- and obtains a new transaction B.  If
         jbd2_journal_commit_transaction() is happened to commit transaction A,
         when t_updates==0, it will continue to complete commit and unfile
         buffer.
      
         So when jbd2_journal_dirty_metadata(), the handle is pointed a new
         transaction B, and the buffer head's journal head is already freed,
         jh->b_transaction == NULL, jh->b_next_transaction == NULL, it returns
         EINVAL, So it triggers the BUG_ON(status).
      
      thread 1                                          jbd2
      ocfs2_write_begin                     jbd2_journal_commit_transaction
      ocfs2_write_begin_nolock
        ocfs2_start_trans
          jbd2__journal_start(t_updates+1,
                             transaction A)
          ocfs2_journal_access_di
          ocfs2_write_cluster_by_desc
            ocfs2_mark_extent_written
              ocfs2_change_extent_flag
                ocfs2_split_extent
                  ocfs2_extend_rotate_transaction
                    jbd2_journal_restart
                    (t_updates-1,transaction B) t_updates==0
                                              __jbd2_journal_refile_buffer
                                              (jh->b_transaction = NULL)
      ocfs2_write_end
      ocfs2_write_end_nolock
          ocfs2_journal_dirty
              jbd2_journal_dirty_metadata(bug)
         ocfs2_commit_trans
      
      2.  In ext4, I found that: jbd2_journal_get_write_access() called by
         ext4_write_end.
      
      ext4_write_begin
          ext4_journal_start
              __ext4_journal_start_sb
                  ext4_journal_check_start
                  jbd2__journal_start
      
      ext4_write_end
          ext4_mark_inode_dirty
              ext4_reserve_inode_write
                  ext4_journal_get_write_access
                      jbd2_journal_get_write_access
              ext4_mark_iloc_dirty
                  ext4_do_update_inode
                      ext4_handle_dirty_metadata
                          jbd2_journal_dirty_metadata
      
      3. So I think we should put ocfs2_journal_access_di before
         ocfs2_journal_dirty in the ocfs2_write_end.  and it works well after my
         modification.
      Signed-off-by: Nvicky <vicky.yangwenfang@huawei.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Zhangguanghui <zhang.guanghui@h3c.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f27ec97
    • W
      ocfs2: add ip_alloc_sem in direct IO to protect allocation changes · 6ab855a9
      WeiWei Wang 提交于
      In ocfs2, ip_alloc_sem is used to protect allocation changes on the
      node.  In direct IO, we add ip_alloc_sem to protect date consistent
      between direct-io and ocfs2_truncate_file race (buffer io use
      ip_alloc_sem already).  Although inode->i_mutex lock is used to avoid
      concurrency of above situation, i think ip_alloc_sem is still needed
      because protect allocation changes is significant.
      
      Other filesystem like ext4 also uses rw_semaphore to protect data
      consistent between get_block-vs-truncate race by other means, So
      ip_alloc_sem in ocfs2 direct io is needed.
      Signed-off-by: NWeiwei Wang <wangww631@huawei.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ab855a9
    • J
      ocfs2: fix several issues of append dio · faaebf18
      Joseph Qi 提交于
      1) Take rw EX lock in case of append dio.
      2) Explicitly treat the error code -EIOCBQUEUED as normal.
      3) Set di_bh to NULL after brelse if it may be used again later.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      faaebf18
    • J
      ocfs2: fix race between dio and recover orphan · 512f62ac
      Joseph Qi 提交于
      During direct io the inode will be added to orphan first and then
      deleted from orphan.  There is a race window that the orphan entry will
      be deleted twice and thus trigger the BUG when validating
      OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.
      
      ocfs2_direct_IO_write
          ...
          ocfs2_add_inode_to_orphan
          >>>>>>>> race window.
                   1) another node may rm the file and then down, this node
                   take care of orphan recovery and clear flag
                   OCFS2_DIO_ORPHANED_FL.
                   2) since rw lock is unlocked, it may race with another
                   orphan recovery and append dio.
          ocfs2_del_inode_from_orphan
      
      So take inode mutex lock when recovering orphans and make rw unlock at the
      end of aio write in case of append dio.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Reported-by: NYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      512f62ac