1. 05 6月, 2013 3 次提交
    • J
      jbd2: refine waiting for shadow buffers · b34090e5
      Jan Kara 提交于
      Currently when we add a buffer to a transaction, we wait until the
      buffer is removed from BJ_Shadow list (so that we prevent any changes
      to the buffer that is just written to the journal).  This can take
      unnecessarily long as a lot happens between the time the buffer is
      submitted to the journal and the time when we remove the buffer from
      BJ_Shadow list.  (e.g.  We wait for all data buffers in the
      transaction, we issue a cache flush, etc.)  Also this creates a
      dependency of do_get_write_access() on transaction commit (namely
      waiting for data IO to complete) which we want to avoid when
      implementing transaction reservation.
      
      So we modify commit code to set new BH_Shadow flag when temporary
      shadowing buffer is created and we clear that flag once IO on that
      buffer is complete.  This allows do_get_write_access() to wait only
      for BH_Shadow bit and thus removes the dependency on data IO
      completion.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b34090e5
    • J
      jbd2: remove journal_head from descriptor buffers · e5a120ae
      Jan Kara 提交于
      Similarly as for metadata buffers, also log descriptor buffers don't
      really need the journal head. So strip it and remove BJ_LogCtl list.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e5a120ae
    • J
      jbd2: don't create journal_head for temporary journal buffers · f5113eff
      Jan Kara 提交于
      When writing metadata to the journal, we create temporary buffer heads
      for that task.  We also attach journal heads to these buffer heads but
      the only purpose of the journal heads is to keep buffers linked in
      transaction's BJ_IO list.  We remove the need for journal heads by
      reusing buffer_head's b_assoc_buffers list for that purpose.  Also
      since BJ_IO list is just a temporary list for transaction commit, we
      use a private list in jbd2_journal_commit_transaction() for that thus
      removing BJ_IO list from transaction completely.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f5113eff
  2. 22 5月, 2013 1 次提交
  3. 22 4月, 2013 1 次提交
    • T
      jbd2: trace when lock_buffer in do_get_write_access takes a long time · f783f091
      Theodore Ts'o 提交于
      While investigating interactivity problems it was clear that processes
      sometimes stall for long periods of times if an attempt is made to
      lock a buffer which is undergoing writeback.  It would stall in
      a trace looking something like
      
      [<ffffffff811a39de>] __lock_buffer+0x2e/0x30
      [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
      [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
      [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
      [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
      [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
      [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
      [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
      [<ffffffff8118b9b9>] update_time+0x79/0xc0
      [<ffffffff8118ba98>] file_update_time+0x98/0x100
      [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
      [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
      [<ffffffff811ea853>] ext4_file_write+0x83/0xd0
      [<ffffffff81172b23>] do_sync_write+0xa3/0xe0
      [<ffffffff811731ae>] vfs_write+0xae/0x180
      [<ffffffff8117361d>] sys_write+0x4d/0x90
      [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
      [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f783f091
  4. 20 4月, 2013 1 次提交
  5. 12 3月, 2013 1 次提交
    • J
      jbd2: fix use after free in jbd2_journal_dirty_metadata() · ad56edad
      Jan Kara 提交于
      jbd2_journal_dirty_metadata() didn't get a reference to journal_head it
      was working with. This is OK in most of the cases since the journal head
      should be attached to a transaction but in rare occasions when we are
      journalling data, __ext4_journalled_writepage() can race with
      jbd2_journal_invalidatepage() stripping buffers from a page and thus
      journal head can be freed under hands of jbd2_journal_dirty_metadata().
      
      Fix the problem by getting own journal head reference in
      jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers()
      which can possibly have the same issue).
      Reported-by: NZheng Liu <gnehzuil.liu@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      ad56edad
  6. 03 3月, 2013 1 次提交
    • D
      jbd2: fix ERR_PTR dereference in jbd2__journal_start · df05c1b8
      Dmitry Monakhov 提交于
      If start_this_handle() failed handle will be initialized
      to ERR_PTR() and can not be dereferenced.
      
      paging request at fffffffffffffff6
      IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
      PGD 200e067 PUD 200f067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      CPU 0 journal commit I/O error
      
      Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79                  /DQ67SW
      RIP: 0010:[<ffffffff813c073f>]  [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
      RSP: 0018:ffff880233b8ba58  EFLAGS: 00010292
      RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
      RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      df05c1b8
  7. 09 2月, 2013 1 次提交
  8. 07 2月, 2013 1 次提交
    • T
      jbd2: track request delay statistics · 9fff24aa
      Theodore Ts'o 提交于
      Track the delay between when we first request that the commit begin
      and when it actually begins, so we can see how much of a gap exists.
      In theory, this should just be the remaining scheduling quantuum of
      the thread which requested the commit (assuming it was not a
      synchronous operation which triggered the commit request) plus
      scheduling overhead; however, it's possible that real time processes
      might get in the way of letting the kjournald thread from executing.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9fff24aa
  9. 26 12月, 2012 1 次提交
    • J
      ext4: fix deadlock in journal_unmap_buffer() · 53e87268
      Jan Kara 提交于
      We cannot wait for transaction commit in journal_unmap_buffer()
      because we hold page lock which ranks below transaction start.  We
      solve the issue by bailing out of journal_unmap_buffer() and
      jbd2_journal_invalidatepage() with -EBUSY.  Caller is then responsible
      for waiting for transaction commit to finish and try invalidation
      again. Since the issue can happen only for page stradding i_size, it
      is simple enough to manually call jbd2_journal_invalidatepage() for
      such page from ext4_setattr(), check the return value and wait if
      necessary.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      53e87268
  10. 21 12月, 2012 1 次提交
    • J
      jbd2: fix assertion failure in jbd2_journal_flush() · d7961c7f
      Jan Kara 提交于
      The following race is possible between start_this_handle() and someone
      calling jbd2_journal_flush().
      
      Process A                              Process B
      start_this_handle().
        if (journal->j_barrier_count) # false
        if (!journal->j_running_transaction) { #true
          read_unlock(&journal->j_state_lock);
                                             jbd2_journal_lock_updates()
                                             jbd2_journal_flush()
                                               write_lock(&journal->j_state_lock);
                                               if (journal->j_running_transaction) {
                                                 # false
                                               ... wait for committing trans ...
                                               write_unlock(&journal->j_state_lock);
          ...
          write_lock(&journal->j_state_lock);
          if (!journal->j_running_transaction) { # true
            jbd2_get_transaction(journal, new_transaction);
          write_unlock(&journal->j_state_lock);
          goto repeat; # eventually blocks on j_barrier_count > 0
                                               ...
                                               J_ASSERT(!journal->j_running_transaction);
                                                 # fails
      
      We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
      in exclusive mode.
      
      Reported-by: yjwsignal@empal.com
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d7961c7f
  11. 19 11月, 2012 1 次提交
  12. 09 11月, 2012 1 次提交
  13. 27 9月, 2012 1 次提交
    • J
      jbd2: fix assertion failure in commit code due to lacking transaction credits · b794e7a6
      Jan Kara 提交于
      ext4 users of data=journal mode with blocksize < pagesize were
      occasionally hitting assertion failure in
      jbd2_journal_commit_transaction() checking whether the transaction has
      at least as many credits reserved as buffers attached.  The core of the
      problem is that when a file gets truncated, buffers that still need
      checkpointing or that are attached to the committing transaction are
      left with buffer_mapped set. When this happens to buffers beyond i_size
      attached to a page stradding i_size, subsequent write extending the file
      will see these buffers and as they are mapped (but underlying blocks
      were freed) things go awry from here.
      
      The assertion failure just coincidentally (and in this case luckily as
      we would start corrupting filesystem) triggers due to journal_head not
      being properly cleaned up as well.
      
      We fix the problem by unmapping buffers if possible (in lots of cases we
      just need a buffer attached to a transaction as a place holder but it
      must not be written out anyway).  And in one case, we just have to bite
      the bullet and wait for transaction commit to finish.
      
      CC: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b794e7a6
  14. 01 6月, 2012 1 次提交
  15. 20 3月, 2012 1 次提交
  16. 14 3月, 2012 2 次提交
  17. 21 2月, 2012 2 次提交
    • Y
      jbd2: allocate transaction from separate slab cache · 0c2022ec
      Yongqiang Yang 提交于
      There is normally only a handful of these active at any one time, but
      putting them in a separate slab cache makes debugging memory
      corruption problems easier.  Manish Katiyar also wanted this make it
      easier to test memory failure scenarios in the jbd2 layer.
      
      Cc: Manish Katiyar <mkatiyar@gmail.com>
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0c2022ec
    • E
      jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer · 15291164
      Eric Sandeen 提交于
      journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
      state ala discard_buffer(), but does not touch _Delay or _Unwritten as
      discard_buffer() does.
      
      This can be problematic in some areas of the ext4 code which assume
      that if they have found a buffer marked unwritten or delay, then it's
      a live one.  Perhaps those spots should check whether it is mapped
      as well, but if jbd2 is going to tear down a buffer, let's really
      tear it down completely.
      
      Without this I get some fsx failures on sub-page-block filesystems
      up until v3.2, at which point 4e96b2db
      and 189e868f make the failures go
      away, because buried within that large change is some more flag
      clearing.  I still think it's worth doing in jbd2, since
      ->invalidatepage leads here directly, and it's the right place
      to clear away these flags.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      15291164
  18. 05 1月, 2012 1 次提交
    • J
      jbd2: fix hung processes in jbd2_journal_lock_updates() · 9837d8e9
      Jan Kara 提交于
      Toshiyuki Okajima found out that when running
      
      for ((i=0; i < 100000; i++)); do
              if ((i%2 == 0)); then
                      chattr +j /mnt/file
              else
                      chattr -j /mnt/file
              fi
              echo "0" >> /mnt/file
      done
      
      process sometimes hangs indefinitely in jbd2_journal_lock_updates().
      
      Toshiyuki identified that the following race happens:
      
      jbd2_journal_lock_updates()            |jbd2_journal_stop()
      ---------------------------------------+---------------------------------------
       write_lock(&journal->j_state_lock)    |    .
       ++journal->j_barrier_count            |    .
       spin_lock(&tran->t_handle_lock)       |    .
       atomic_read(&tran->t_updates) //not 0 |
                                             | atomic_dec_and_test(&tran->t_updates)
                                             |    // t_updates = 0
                                             | wake_up(&journal->j_wait_updates)
       prepare_to_wait()                     |    // no process is woken up.
       spin_unlock(&tran->t_handle_lock)     |
       write_unlock(&journal->j_state_lock)  |
       schedule() // never return            |
      
      We fix the problem by first calling prepare_to_wait() and only after that
      checking t_updates in jbd2_journal_lock_updates().
      Reported-and-analyzed-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9837d8e9
  19. 02 11月, 2011 1 次提交
  20. 27 10月, 2011 1 次提交
  21. 04 9月, 2011 2 次提交
    • D
      jbd2: use gfp_t instead of int · d2159fb7
      Dan Carpenter 提交于
      This silences some Sparse warnings:
      fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
      fs/jbd2/transaction.c:135:69:    expected restricted gfp_t [usertype] flags
      fs/jbd2/transaction.c:135:69:    got int [signed] gfp_mask
      Signed-off-by: NDan Carpenter <error27@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d2159fb7
    • T
      jbd2: add debugging information to jbd2_journal_dirty_metadata() · 9ea7a0df
      Theodore Ts'o 提交于
      Add debugging information in case jbd2_journal_dirty_metadata() is
      called with a buffer_head which didn't have
      jbd2_journal_get_write_access() called on it, or if the journal_head
      has the wrong transaction in it.  In addition, return an error code.
      This won't change anything for ocfs2, which will BUG_ON() the non-zero
      exit code.
      
      For ext4, the caller of this function is ext4_handle_dirty_metadata(),
      and on seeing a non-zero return code, will call __ext4_journal_stop(),
      which will print the function and line number of the (buggy) calling
      function and abort the journal.  This will allow us to recover instead
      of bug halting, which is better from a robustness and reliability
      point of view.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9ea7a0df
  22. 14 6月, 2011 1 次提交
    • J
      jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Jan Kara 提交于
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head() which adds even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitely via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jdb2_journal_remove_checkpoint() can free journal_head which needs
      modification of a few callers. Also we have to be careful because once
      journal_head is removed, buffer_head might be freed as well. So we
      have to get our own buffer_head reference where it matters.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      de1b7941
  23. 13 6月, 2011 1 次提交
  24. 26 5月, 2011 1 次提交
  25. 25 5月, 2011 1 次提交
  26. 24 5月, 2011 1 次提交
    • J
      jbd2: fix sending of data flush on journal commit · 81be12c8
      Jan Kara 提交于
      
      In data=ordered mode, it's theoretically possible (however rare) that
      an inode is filed to transaction's t_inode_list and a flusher thread
      writes all the data and inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to file system device when it is different from the journal
      device (because data can still be in disk cache only).
      
      Fix the problem by setting a flag in a transaction when some inode is added
      to it and then send disk flush in the commit code when the flag is set.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      81be12c8
  27. 23 5月, 2011 1 次提交
    • T
      jbd2: Fix the wrong calculation of t_max_wait in update_t_max_wait · 28e35e42
      Tao Ma 提交于
      t_max_wait is added in commit 8e85fb3f to indicate how long we
      were waiting for new transaction to start. In commit 6d0bf005,
      it is moved to another function named update_t_max_wait to
      avoid a build warning. But the wrong thing is that the original
      'ts' is initialized in the start of function start_this_handle
      and we can calculate t_max_wait in the right way. while with
      this change, ts is initialized within the function and t_max_wait
      can never be calculated right.
      
      This patch moves the initialization of ts to the original beginning
      of start_this_handle and pass it to function update_t_max_wait so
      that it can be calculated right and the build warning is avoided also.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      28e35e42
  28. 31 3月, 2011 1 次提交
  29. 12 2月, 2011 1 次提交
    • T
      jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831
      Theodore Ts'o 提交于
      On an SMP ARM system running ext4, I've received a report that the
      first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      While investigating possible causes for this problem, I noticed that
      __jbd2_log_start_commit() is getting called with j_state_lock only
      read-locked, in spite of the fact that it's possible for it might
      j_commit_request.  Fix this by grabbing the necessary information so
      we can test to see if we need to start a new transaction before
      dropping the read lock, and then calling jbd2_log_start_commit() which
      will grab the write lock.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e4471831
  30. 19 12月, 2010 3 次提交
  31. 10 12月, 2010 1 次提交
  32. 28 10月, 2010 1 次提交
  33. 10 8月, 2010 1 次提交