1. 23 2月, 2016 2 次提交
  2. 18 10月, 2015 1 次提交
  3. 29 7月, 2015 1 次提交
  4. 29 8月, 2014 1 次提交
    • D
      jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Darrick J. Wong 提交于
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: NTR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
  5. 18 4月, 2014 1 次提交
  6. 09 3月, 2014 3 次提交
  7. 29 8月, 2013 1 次提交
  8. 13 6月, 2013 2 次提交
  9. 05 6月, 2013 4 次提交
    • J
      jbd2: transaction reservation support · 8f7d89f3
      Jan Kara 提交于
      In some cases we cannot start a transaction because of locking
      constraints and passing started transaction into those places is not
      handy either because we could block transaction commit for too long.
      Transaction reservation is designed to solve these issues.  It
      reserves a handle with given number of credits in the journal and the
      handle can be later attached to the running transaction without
      blocking on commit or checkpointing.  Reserved handles do not block
      transaction commit in any way, they only reduce maximum size of the
      running transaction (because we have to always be prepared to
      accomodate request for attaching reserved handle).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8f7d89f3
    • J
      jbd2: refine waiting for shadow buffers · b34090e5
      Jan Kara 提交于
      Currently when we add a buffer to a transaction, we wait until the
      buffer is removed from BJ_Shadow list (so that we prevent any changes
      to the buffer that is just written to the journal).  This can take
      unnecessarily long as a lot happens between the time the buffer is
      submitted to the journal and the time when we remove the buffer from
      BJ_Shadow list.  (e.g.  We wait for all data buffers in the
      transaction, we issue a cache flush, etc.)  Also this creates a
      dependency of do_get_write_access() on transaction commit (namely
      waiting for data IO to complete) which we want to avoid when
      implementing transaction reservation.
      
      So we modify commit code to set new BH_Shadow flag when temporary
      shadowing buffer is created and we clear that flag once IO on that
      buffer is complete.  This allows do_get_write_access() to wait only
      for BH_Shadow bit and thus removes the dependency on data IO
      completion.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b34090e5
    • J
      jbd2: remove journal_head from descriptor buffers · e5a120ae
      Jan Kara 提交于
      Similarly as for metadata buffers, also log descriptor buffers don't
      really need the journal head. So strip it and remove BJ_LogCtl list.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e5a120ae
    • J
      jbd2: don't create journal_head for temporary journal buffers · f5113eff
      Jan Kara 提交于
      When writing metadata to the journal, we create temporary buffer heads
      for that task.  We also attach journal heads to these buffer heads but
      the only purpose of the journal heads is to keep buffers linked in
      transaction's BJ_IO list.  We remove the need for journal heads by
      reusing buffer_head's b_assoc_buffers list for that purpose.  Also
      since BJ_IO list is just a temporary list for transaction commit, we
      use a private list in jbd2_journal_commit_transaction() for that thus
      removing BJ_IO list from transaction completely.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f5113eff
  10. 28 5月, 2013 1 次提交
  11. 04 4月, 2013 1 次提交
    • D
      jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback · 794446c6
      Dmitry Monakhov 提交于
      The following race is possible:
      
      [kjournald2]                              other_task
      jbd2_journal_commit_transaction()
        j_state = T_FINISHED;
        spin_unlock(&journal->j_list_lock);
                                               ->jbd2_journal_remove_checkpoint()
      					   ->jbd2_journal_free_transaction();
      					     ->kmem_cache_free(transaction)
        ->j_commit_callback(journal, transaction);
          -> USE_AFTER_FREE
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      In order to demonstrace this issue one should mount ext4 with mount -o
      discard option on SSD disk.  This makes callback longer and race
      window becomes wider.
      
      In order to fix this we should mark transaction as finished only after
      callbacks have completed
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      794446c6
  12. 07 2月, 2013 1 次提交
    • T
      jbd2: track request delay statistics · 9fff24aa
      Theodore Ts'o 提交于
      Track the delay between when we first request that the commit begin
      and when it actually begins, so we can see how much of a gap exists.
      In theory, this should just be the remaining scheduling quantuum of
      the thread which requested the commit (assuming it was not a
      synchronous operation which triggered the commit request) plus
      scheduling overhead; however, it's possible that real time processes
      might get in the way of letting the kjournald thread from executing.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9fff24aa
  13. 27 9月, 2012 1 次提交
    • J
      jbd2: fix assertion failure in commit code due to lacking transaction credits · b794e7a6
      Jan Kara 提交于
      ext4 users of data=journal mode with blocksize < pagesize were
      occasionally hitting assertion failure in
      jbd2_journal_commit_transaction() checking whether the transaction has
      at least as many credits reserved as buffers attached.  The core of the
      problem is that when a file gets truncated, buffers that still need
      checkpointing or that are attached to the committing transaction are
      left with buffer_mapped set. When this happens to buffers beyond i_size
      attached to a page stradding i_size, subsequent write extending the file
      will see these buffers and as they are mapped (but underlying blocks
      were freed) things go awry from here.
      
      The assertion failure just coincidentally (and in this case luckily as
      we would start corrupting filesystem) triggers due to journal_head not
      being properly cleaned up as well.
      
      We fix the problem by unmapping buffers if possible (in lots of cases we
      just need a buffer attached to a transaction as a place holder but it
      must not be written out anyway).  And in one case, we just have to bite
      the bullet and wait for transaction commit to finish.
      
      CC: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b794e7a6
  14. 23 7月, 2012 1 次提交
  15. 27 5月, 2012 3 次提交
  16. 23 5月, 2012 1 次提交
  17. 24 4月, 2012 1 次提交
  18. 29 3月, 2012 1 次提交
  19. 20 3月, 2012 1 次提交
  20. 14 3月, 2012 4 次提交
    • J
      jbd2: cleanup journal tail after transaction commit · 3339578f
      Jan Kara 提交于
      Normally, we have to issue a cache flush before we can update journal tail in
      journal superblock, effectively wiping out old transactions from the journal.
      So use the fact that during transaction commit we issue cache flush anyway and
      opportunistically push journal tail as far as we can. Since update of journal
      superblock is still costly (we have to use WRITE_FUA), we update log tail only
      if we can free significant amount of space.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3339578f
    • J
      jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Jan Kara 提交于
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on a stable storage - especially if buffers were
      written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
      caches. Thus when we update journal superblock effectively removing old
      transaction from journal, this write of superblock can get to stable storage
      before those checkpointed buffers which can result in filesystem corruption
      after a crash. Thus we must unconditionally issue a cache flush before we
      update journal superblock in these cases.
      
      A similar problem can also occur if journal superblock is written only in
      disk's caches, other transaction starts reusing space of the transaction
      cleaned from the log and power failure happens. Subsequent journal replay would
      still try to replay the old transaction but some of it's blocks may be already
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating log tail and we must first write new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      79feb521
    • J
      jbd2: protect all log tail updates with j_checkpoint_mutex · a78bb11d
      Jan Kara 提交于
      There are some log tail updates that are not protected by j_checkpoint_mutex.
      Some of these are harmless because they happen during startup or shutdown but
      updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
      really race with other log tail updates (e.g. someone doing
      jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
      protect all log tail updates with j_checkpoint_mutex.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a78bb11d
    • J
      jbd2: split updating of journal superblock and marking journal empty · 24bcc89c
      Jan Kara 提交于
      There are three case of updating journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      24bcc89c
  21. 21 2月, 2012 1 次提交
  22. 29 12月, 2011 1 次提交
    • Y
      jbd2: clear revoked flag on buffers before a new transaction started · 1ba37268
      Yongqiang Yang 提交于
      Currently, we clear revoked flag only when a block is reused.  However,
      this can tigger a false journal error.  Consider a situation when a block
      is used as a meta block and is deleted(revoked) in ordered mode, then the
      block is allocated as a data block to a file.  At this moment, user changes
      the file's journal mode from ordered to journaled and truncates the file.
      The block will be considered re-revoked by journal because it has revoked
      flag still pending from the last transaction and an assertion triggers.
      
      We fix the problem by keeping the revoked status more uptodate - we clear
      revoked flag when switching revoke tables to reflect there is no revoked
      buffers in current transaction any more.
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      1ba37268
  23. 02 11月, 2011 1 次提交
  24. 14 6月, 2011 1 次提交
    • J
      jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Jan Kara 提交于
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head() which adds even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitely via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jdb2_journal_remove_checkpoint() can free journal_head which needs
      modification of a few callers. Also we have to be careful because once
      journal_head is removed, buffer_head might be freed as well. So we
      have to get our own buffer_head reference where it matters.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      de1b7941
  25. 24 5月, 2011 2 次提交
    • J
      jbd2: Add function jbd2_trans_will_send_data_barrier() · bbd2be36
      Jan Kara 提交于
      Provide a function which returns whether a transaction with given tid
      will send a flush to the filesystem device.  The function will be used
      by ext4 to detect whether fsync needs to send a separate flush or not.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bbd2be36
    • J
      jbd2: fix sending of data flush on journal commit · 81be12c8
      Jan Kara 提交于
      
      In data=ordered mode, it's theoretically possible (however rare) that
      an inode is filed to transaction's t_inode_list and a flusher thread
      writes all the data and inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to file system device when it is different from the journal
      device (because data can still be in disk cache only).
      
      Fix the problem by setting a flag in a transaction when some inode is added
      to it and then send disk flush in the commit code when the flag is set.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      81be12c8
  26. 17 5月, 2011 1 次提交
  27. 09 5月, 2011 1 次提交