1. 09 Jun 2015 (1 commit)
    • jbd2: speedup jbd2_journal_get_[write|undo]_access() · de92c8ca
      Authored by Jan Kara
      jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
      frequently called for buffers that are already part of the running
      transaction - most frequently this is the case for bitmaps, inode table
      blocks, and the superblock. Since there is nothing to do in such cases, it
      is unfortunate that we still grab a reference to the journal head, lock
      the bh, and lock bh_state only to find out there's nothing to do.
      
      Improving this is a bit subtle, though, since until we find out that the
      journal head is attached to the running transaction, it can disappear from
      under us because checkpointing / commit decided it is no longer needed. We
      deal with this by protecting the journal_head slab with RCU. We still have
      to be careful about the journal head being freed and reallocated within
      the slab, and about exposing the journal head in a consistent state (in
      particular, b_modified and b_frozen_data must be in the correct state
      before we allow the user to touch the buffer). A minimal sketch of the RCU
      lookup pattern follows this entry.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      de92c8ca
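      A minimal sketch of the RCU-protected slab pattern described above, as an
      editorial illustration only (not the actual jbd2 change): the cache name,
      the helper jh_attached_to_running_transaction(), and the use of the modern
      SLAB_TYPESAFE_BY_RCU flag (called SLAB_DESTROY_BY_RCU when this commit was
      written) are assumptions made for this example.

      #include <linux/slab.h>
      #include <linux/rcupdate.h>
      #include <linux/buffer_head.h>
      #include <linux/jbd2.h>

      static struct kmem_cache *example_jh_cache;

      static int example_jh_cache_init(void)
      {
              /* Type-stable slab: a reader under rcu_read_lock() may observe a
               * freed-and-reused journal_head, but never unmapped memory. */
              example_jh_cache = kmem_cache_create("example_jh_cache",
                                                   sizeof(struct journal_head), 0,
                                                   SLAB_TYPESAFE_BY_RCU, NULL);
              return example_jh_cache ? 0 : -ENOMEM;
      }

      /* Peek at bh's journal head without taking any locks.  Because the slab
       * object may have been recycled, re-check that it still points back at
       * our buffer_head (and at the expected transaction) before trusting it. */
      static bool jh_attached_to_running_transaction(struct buffer_head *bh,
                                                     transaction_t *running)
      {
              struct journal_head *jh;
              bool attached = false;

              rcu_read_lock();
              jh = READ_ONCE(bh->b_private);
              if (jh && READ_ONCE(jh->b_bh) == bh &&
                  READ_ONCE(jh->b_transaction) == running)
                      attached = true;
              rcu_read_unlock();

              return attached;
      }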
  2. 08 Jun 2015 (1 commit)
    • jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL · 6ccaf3e2
      Authored by Michal Hocko
      This basically reverts 47def826 (jbd2: Remove __GFP_NOFAIL from jbd2
      layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
      to open-coding the endless loop around the allocator rather than
      removing the dependency on the non-failing allocation. So the
      deprecation was a clear failure, and the reality is that __GFP_NOFAIL
      is not even close to going away.
      
      It is still true that __GFP_NOFAIL allocations are generally discouraged
      and that new uses should be evaluated, with alternatives (pre-allocations
      or reservations) considered, but it doesn't make any sense to lie to the
      allocator about the requirements. The allocator can take steps to help
      make progress if it knows the requirements. A sketch contrasting the two
      forms follows this entry.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Acked-by: David Rientjes <rientjes@google.com>
      6ccaf3e2
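      The contrast the commit describes can be sketched as follows; this is an
      illustration, not the reverted jbd2 hunks, and the msleep() backoff is an
      arbitrary stand-in for whatever the open-coded loops actually did.

      #include <linux/slab.h>
      #include <linux/delay.h>

      /* Open-coded "never fail" loop: the allocator has no idea the caller
       * cannot tolerate failure, so it cannot adjust its behaviour. */
      static void *alloc_must_succeed_opencoded(size_t size)
      {
              void *p;

              while (!(p = kmalloc(size, GFP_NOFS)))
                      msleep(100);    /* illustrative backoff */
              return p;
      }

      /* Form restored by this revert: the requirement is visible to the MM. */
      static void *alloc_must_succeed(size_t size)
      {
              return kmalloc(size, GFP_NOFS | __GFP_NOFAIL);
      }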
  3. 02 Dec 2014 (1 commit)
    • jbd2: fix regression where we fail to initialize checksum seed when loading · 32f38691
      Authored by Darrick J. Wong
      When we're enabling journal features, we cannot use the predicate
      jbd2_journal_has_csum_v2or3() because we haven't yet set the sb
      feature flag fields!  Moreover, we just finished loading the shash
      driver, so the test is unnecessary; calculate the seed always.
      
      Without this patch, we fail to initialize the checksum seed the first
      time we turn on journal_checksum, which means that all journal blocks
      written during that first mount are corrupt.  Transactions written
      after the second mount will be fine, since the feature flag will be
      set in the journal superblock.  xfstests generic/{034,321,322} are the
      regression tests.
      
      (This is important for 3.18.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reported-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      32f38691
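      A rough sketch of the "always compute the seed" idea in the commit above.
      This is an assumption-laden illustration: the real code derives the seed
      with jbd2_chksum() through the just-loaded crypto driver, while this
      example uses the kernel's crc32c() helper directly, and the function name
      is made up.

      #include <linux/crc32c.h>
      #include <linux/jbd2.h>

      /* Hypothetical helper: when the checksum feature is being enabled, derive
       * the seed from the journal superblock UUID unconditionally, instead of
       * first consulting feature flags that have not been written yet. */
      static void example_seed_journal_checksum(journal_t *journal)
      {
              journal_superblock_t *sb = journal->j_superblock;

              journal->j_csum_seed = crc32c(~0, sb->s_uuid, sizeof(sb->s_uuid));
      }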
  4. 26 Nov 2014 (1 commit)
  5. 11 Sep 2014 (1 commit)
  6. 05 Sep 2014 (1 commit)
  7. 29 Aug 2014 (1 commit)
    • jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Authored by Darrick J. Wong
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1% when converting a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled. A sketch of the
      feature-flag-driven tag sizing follows this entry.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reported-by: TR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
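      A sketch of the approach, not the exact jbd2_journal_tag_bytes(): the
      feature constants and tag typedefs are the jbd2 ones, but the helper name
      and the omission of the checksum-v2 special case are simplifications made
      for this example.

      #include <linux/jbd2.h>

      /* Size the on-disk descriptor tag from feature flags, never from raw
       * structure sizes. */
      static size_t example_descr_tag_bytes(journal_t *journal)
      {
              if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V3))
                      return sizeof(journal_block_tag3_t); /* full 32-bit csum */

              if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT))
                      return sizeof(journal_block_tag_t);

              /* 32-bit journal: the high block-number word is not on disk. */
              return sizeof(journal_block_tag_t) - sizeof(__u32);
      }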
  8. 09 Mar 2014 (1 commit)
  9. 18 Feb 2014 (1 commit)
  10. 09 Dec 2013 (2 commits)
  11. 29 Aug 2013 (1 commit)
  12. 01 Jul 2013 (1 commit)
  13. 13 Jun 2013 (2 commits)
    • jbd2: use a single printk for jbd_debug() · 169f1a2a
      Authored by Paul Gortmaker
      Since jbd_debug() is implemented with two separate printk()
      calls, it can lead to corrupted and misleading debug output like
      the following (see the lines marked with "*"):
      
      [  290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
      [  290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
      [  290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
      [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
      [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
      [* 290.339382] JBD2: starting commit of transaction 42104
      [  290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
      [  290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
      
      i.e. the debug output from jbd2_log_wait_commit() and
      jbd2_journal_commit_transaction() has become interleaved.  The output
      should have been:
      
      (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
      (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
      
      This is not expected to be easy to replicate -- I was only able
      to cause it on preempt-rt kernels, and even then only under heavy
      I/O load. A sketch of the single-printk helper follows this entry.
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      169f1a2a
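      The shape of the fix can be sketched as follows. This is close to, but not
      verbatim, the __jbd2_debug() helper the commit introduces; the level check
      against the debug knob is omitted and the function name is illustrative.

      #include <linux/kernel.h>
      #include <linux/printk.h>

      /* Emit the whole debug line with a single printk() so output from
       * concurrent callers can no longer interleave mid-line; %pV expands the
       * caller's format string and arguments in place. */
      static void example_debug(const char *file, unsigned int line,
                                const char *func, const char *fmt, ...)
      {
              struct va_format vaf;
              va_list args;

              va_start(args, fmt);
              vaf.fmt = fmt;
              vaf.va = &args;
              printk(KERN_DEBUG "(%s, %u): %s: %pV\n", file, line, func, &vaf);
              va_end(args);
      }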
    • jbd2: optimize jbd2_journal_force_commit · 9ff86446
      Authored by Dmitry Monakhov
      The current implementation of jbd2_journal_force_commit() is suboptimal
      because it results in empty and useless commits, while callers just want
      to force and wait for any unfinished commits. We already have
      jbd2_journal_force_commit_nested(), which does exactly what we want,
      except that it may only be used when we are guaranteed not to hold an open
      journal transaction. A caller-side sketch follows this entry.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9ff86446
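      The helper this commit factors out is internal to jbd2; the sketch below
      only shows the long-standing caller-visible pattern for forcing and
      waiting on a commit, with an illustrative wrapper name.

      #include <linux/jbd2.h>

      /* Ask for a commit of the current transaction (if any) and wait for it. */
      static int example_force_and_wait(journal_t *journal)
      {
              tid_t tid;

              if (jbd2_journal_start_commit(journal, &tid))
                      return jbd2_log_wait_commit(journal, tid);
              return 0;       /* nothing to commit */
      }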
  14. 05 Jun 2013 (6 commits)
    • jbd2: transaction reservation support · 8f7d89f3
      Authored by Jan Kara
      In some cases we cannot start a transaction because of locking
      constraints, and passing a started transaction into those places is not
      handy either because we could block transaction commit for too long.
      Transaction reservation is designed to solve these issues.  It
      reserves a handle with a given number of credits in the journal, and the
      handle can later be attached to the running transaction without
      blocking on commit or checkpointing.  Reserved handles do not block
      transaction commit in any way; they only reduce the maximum size of the
      running transaction (because we always have to be prepared to
      accommodate a request for attaching a reserved handle). A conceptual
      sketch of that accounting follows this entry.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      8f7d89f3
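      A conceptual sketch of the accounting described above. The field names
      mirror jbd2's, but the check itself is a simplification invented for this
      illustration, not the real credit-accounting code.

      #include <linux/jbd2.h>
      #include <linux/atomic.h>

      /* Credits held by reserved (not yet attached) handles must always fit,
       * so they shrink the room the running transaction may use. */
      static bool example_transaction_has_room(journal_t *journal, int requested)
      {
              transaction_t *t = journal->j_running_transaction;
              int in_use = atomic_read(&t->t_outstanding_credits) +
                           atomic_read(&journal->j_reserved_credits);

              return in_use + requested <= journal->j_max_transaction_buffers;
      }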
    • jbd2: remove unused waitqueues · f29fad72
      Authored by Jan Kara
      j_wait_logspace and j_wait_checkpoint are unused.  Remove them.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f29fad72
    • jbd2: cleanup needed free block estimates when starting a transaction · 76c39904
      Authored by Jan Kara
      __jbd2_log_space_left() and jbd_space_needed() were kind of odd.
      jbd_space_needed() also accounted for the credits needed by the currently
      committing transaction, while it didn't account for the credits needed for
      control blocks.  __jbd2_log_space_left() then accounted for control
      blocks as a fraction of free space.  Since the results of these two
      functions are only ever compared against each other, this works
      correctly but is somewhat strange.  Move the estimates so that
      jbd_space_needed() returns the number of blocks needed for a transaction
      including control blocks and __jbd2_log_space_left() returns the free
      space in the journal (with the committing transaction already
      subtracted).  Rename the functions to jbd2_log_space_left() and
      jbd2_space_needed() while we are changing them. An illustrative comparison
      site follows this entry.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      76c39904
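      After the cleanup the two estimates compare directly. The wrapper name
      below is made up for this example; the two helpers it calls are the
      renamed jbd2 ones.

      #include <linux/jbd2.h>

      /* A handle must wait (or trigger checkpointing) when the blocks the
       * transaction needs exceed what the log still has free. */
      static bool example_must_make_space(journal_t *journal)
      {
              return jbd2_log_space_left(journal) < jbd2_space_needed(journal);
      }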
    • jbd2: refine waiting for shadow buffers · b34090e5
      Authored by Jan Kara
      Currently, when we add a buffer to a transaction, we wait until the
      buffer is removed from the BJ_Shadow list (so that we prevent any changes
      to the buffer that is just being written to the journal).  This can take
      unnecessarily long, as a lot happens between the time the buffer is
      submitted to the journal and the time when we remove the buffer from
      the BJ_Shadow list.  (E.g. we wait for all data buffers in the
      transaction, we issue a cache flush, etc.)  Also, this creates a
      dependency of do_get_write_access() on transaction commit (namely
      waiting for data IO to complete) which we want to avoid when
      implementing transaction reservation.
      
      So we modify the commit code to set a new BH_Shadow flag when the
      temporary shadowing buffer is created, and we clear that flag once IO on
      that buffer is complete.  This allows do_get_write_access() to wait only
      for the BH_Shadow bit and thus removes the dependency on data IO
      completion. A sketch of the flag handshake follows this entry.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b34090e5
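      A sketch of the handshake, not the exact jbd2 hunks: it uses today's
      wait_on_bit_io() helper rather than the wait_on_bit() action callback the
      2013 code used, and the function names are illustrative.

      #include <linux/jbd2.h>
      #include <linux/buffer_head.h>
      #include <linux/sched.h>
      #include <linux/wait.h>

      /* Commit side: flag the original buffer while its shadow is in flight. */
      static void example_mark_shadowed(struct buffer_head *orig_bh)
      {
              set_bit(BH_Shadow, &orig_bh->b_state);
      }

      /* Shadow buffer end_io: clear the flag and wake anyone waiting on it. */
      static void example_shadow_write_done(struct buffer_head *orig_bh)
      {
              clear_bit_unlock(BH_Shadow, &orig_bh->b_state);
              smp_mb__after_atomic();
              wake_up_bit(&orig_bh->b_state, BH_Shadow);
      }

      /* do_get_write_access() side: wait only for the bit, not for the whole
       * commit to reach the point where buffers leave the BJ_Shadow list. */
      static void example_wait_for_shadow(struct buffer_head *orig_bh)
      {
              wait_on_bit_io(&orig_bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
      }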
    • jbd2: remove journal_head from descriptor buffers · e5a120ae
      Authored by Jan Kara
      As with metadata buffers, log descriptor buffers don't really need the
      journal head either. So strip it and remove the BJ_LogCtl list.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e5a120ae
    • jbd2: don't create journal_head for temporary journal buffers · f5113eff
      Authored by Jan Kara
      When writing metadata to the journal, we create temporary buffer heads
      for that task.  We also attach journal heads to these buffer heads, but
      the only purpose of the journal heads is to keep the buffers linked in
      the transaction's BJ_IO list.  We remove the need for journal heads by
      reusing the buffer_head's b_assoc_buffers list for that purpose.  Also,
      since the BJ_IO list is just a temporary list for transaction commit, we
      use a private list in jbd2_journal_commit_transaction() instead, thus
      removing the BJ_IO list from the transaction completely. A sketch of the
      private-list pattern follows this entry.
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f5113eff
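      A simplified sketch of the private-list pattern, assuming a hypothetical
      wait helper; submission of the buffers themselves is omitted.

      #include <linux/buffer_head.h>
      #include <linux/list.h>

      /* Track in-flight journal buffers on a local list through the existing
       * b_assoc_buffers hook instead of allocating a journal_head purely for
       * list membership. */
      static void example_wait_for_io(struct list_head *io_bufs)
      {
              struct buffer_head *bh;

              while (!list_empty(io_bufs)) {
                      bh = list_entry(io_bufs->prev, struct buffer_head,
                                      b_assoc_buffers);
                      wait_on_buffer(bh);
                      list_del_init(&bh->b_assoc_buffers);
                      put_bh(bh);
              }
      }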
  15. 28 May 2013 (1 commit)
  16. 30 Apr 2013 (1 commit)
  17. 10 Apr 2013 (1 commit)
    • procfs: new helper - PDE_DATA(inode) · d9dda78b
      Authored by Al Viro
      The only part of proc_dir_entry that code outside of fs/proc
      really cares about is PDE(inode)->data.  Provide a helper
      for that; static inline for now, eventually it will be moved
      to fs/proc, along with the knowledge of the struct proc_dir_entry
      layout. A usage sketch follows this entry.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d9dda78b
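      An illustrative consumer, not part of the patch: the proc file, its show
      routine, and the assumption that the entry's data is a string are all made
      up for the example. (PDE_DATA() was later renamed pde_data().)

      #include <linux/fs.h>
      #include <linux/proc_fs.h>
      #include <linux/seq_file.h>

      static int example_proc_show(struct seq_file *m, void *v)
      {
              seq_printf(m, "%s\n", (const char *)m->private);
              return 0;
      }

      /* The open routine no longer needs to know the proc_dir_entry layout,
       * it just asks for the private data pointer. */
      static int example_proc_open(struct inode *inode, struct file *file)
      {
              return single_open(file, example_proc_show, PDE_DATA(inode));
      }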
  18. 04 Apr 2013 (1 commit)
    • ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · d76a3a77
      Authored by Theodore Ts'o
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tids
      (a caller-side sketch follows this entry).

      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should give a small (but probably not measurable) improvement in
      ext4's scalability.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Reported-by: George Barnett <gbarnett@atlassian.com>
      Cc: stable@vger.kernel.org
      
      d76a3a77
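      A caller-side sketch of the replacement; the wrapper name is illustrative
      and the ext4 call sites are simplified away.

      #include <linux/jbd2.h>

      /* Before the fix, a stale tid could make the caller wait almost forever:
       *
       *     if (jbd2_log_start_commit(journal, tid))
       *             jbd2_log_wait_commit(journal, tid);
       *
       * After the fix, callers use the new helper, which copes with tids that
       * have long since been committed. */
      static int example_sync_to_tid(journal_t *journal, tid_t tid)
      {
              return jbd2_complete_transaction(journal, tid);
      }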
  19. 10 Feb 2013 (1 commit)
    • jbd2: use module parameters instead of debugfs for jbd_debug · b6e96d00
      Authored by Theodore Ts'o
      There are multiple reasons to move away from debugfs.  First of all,
      we are only using it for a single parameter, it is much more
      complicated to set up (some 30 lines of code compared to 3), and it is
      one more thing that might fail while loading the jbd2 module.
      
      Secondly, as a module parameter it can be specified as a boot option if
      jbd2 is built into the kernel, or as a parameter when the module is
      loaded, and it can also be manipulated dynamically under
      /sys/module/jbd2/parameters/jbd2_debug.  So it is more flexible.

      Ultimately we want to move away from using jbd_debug() towards
      tracepoints, but for now this is still a useful simplification of the
      code base. A sketch of the module-parameter form follows this entry.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b6e96d00
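      Roughly the form the commit moves to; treat the variable name and the
      0644 permissions here as an approximation of what fs/jbd2/journal.c ends
      up with rather than a verbatim copy.

      #include <linux/module.h>
      #include <linux/moduleparam.h>

      /* Debugging knob exposed as a module parameter: settable at boot when
       * built in (jbd2.jbd2_debug=N), at modprobe time, and at runtime via
       * /sys/module/jbd2/parameters/jbd2_debug. */
      static ushort example_jbd2_debug;
      module_param_named(jbd2_debug, example_jbd2_debug, ushort, 0644);
      MODULE_PARM_DESC(jbd2_debug, "Debugging level for jbd2");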
  20. 07 Feb 2013 (1 commit)
    • jbd2: track request delay statistics · 9fff24aa
      Authored by Theodore Ts'o
      Track the delay between when we first request that the commit begin
      and when it actually begins, so we can see how much of a gap exists.
      In theory, this should just be the remaining scheduling quantum of
      the thread which requested the commit (assuming it was not a
      synchronous operation which triggered the commit request) plus
      scheduling overhead; however, it's possible that real-time processes
      might keep the kjournald thread from executing promptly. A small
      illustrative sketch follows this entry.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9fff24aa
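      A conceptual sketch only: the struct and field names are made up, while
      the real patch stores the request time in the transaction and folds the
      delta into the existing per-transaction commit statistics.

      #include <linux/jiffies.h>

      /* Hypothetical container for the two timestamps of interest. */
      struct example_commit_stats {
              unsigned long requested;  /* jiffies when the commit was requested */
              unsigned long started;    /* jiffies when kjournald started it     */
      };

      static inline unsigned int
      example_request_delay_ms(const struct example_commit_stats *s)
      {
              return jiffies_to_msecs(s->started - s->requested);
      }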
  21. 30 Jan 2013 (1 commit)
    • jbd2: don't wake kjournald unnecessarily · e7b04ac0
      Authored by Eric Sandeen
      Don't send an extra wakeup to kjournald in the case where we
      already have the proper target in j_commit_request, i.e. that
      transaction has already been requested for commit.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
      the logic leading to a wakeup, but it caused some extra wakeups
      which were found to lead to a measurable performance regression. A sketch
      of the early-return check follows this entry.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      [tytso@mit.edu: reworked check to make it clearer]
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e7b04ac0
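      A sketch of the idea only: the real condition in
      __jbd2_log_start_commit() is more involved (locking and tid sanity checks
      are elided here), and the function name is made up.

      #include <linux/jbd2.h>

      /* If this tid has already been requested, kjournald is already on the
       * hook, so don't wake it again. */
      static int example_log_start_commit(journal_t *journal, tid_t target)
      {
              if (journal->j_commit_request == target)
                      return 0;               /* already requested: no wakeup */

              journal->j_commit_request = target;
              wake_up(&journal->j_wait_commit);
              return 1;
      }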
  22. 09 Nov 2012 (1 commit)
  23. 19 Aug 2012 (1 commit)
    • jbd2: don't write superblock when it's empty · eeecef0a
      Authored by Eric Sandeen
      This sequence:
      
      # truncate --size=1g fsfile
      # mkfs.ext4 -F fsfile
      # mount -o loop,ro fsfile /mnt
      # umount /mnt
      # dmesg | tail
      
      results in an IO error when unmounting the RO filesystem:
      
      [  318.020828] Buffer I/O error on device loop1, logical block 196608
      [  318.027024] lost page write due to I/O error on loop1
      [  318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8.
      
      This was a regression introduced by commit 24bcc89c ("jbd2: split
      updating of journal superblock and marking journal empty"). A sketch of
      the guard follows this entry.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      eeecef0a
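      A paraphrased sketch of the guard, with an illustrative function name; the
      rest of the superblock update is elided.

      #include <linux/jbd2.h>

      /* If the on-disk journal superblock already records an empty log
       * (s_start == 0), there is nothing to update, so skip the write instead
       * of issuing I/O that a read-only loop device will reject. */
      static void example_mark_journal_empty(journal_t *journal)
      {
              journal_superblock_t *sb = journal->j_superblock;

              read_lock(&journal->j_state_lock);
              if (sb->s_start == 0) {                 /* already empty on disk */
                      read_unlock(&journal->j_state_lock);
                      return;
              }
              read_unlock(&journal->j_state_lock);

              /* ... otherwise clear s_start / s_sequence and write the sb ... */
      }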
  24. 06 Aug 2012 (1 commit)
    • ext4: make sure the journal sb is written in ext4_clear_journal_err() · d796c52e
      Authored by Theodore Ts'o
      After we transfer the error indicator and set the EXT4_ERROR_FS bit in
      the file system superblock, it's not enough to call
      jbd2_journal_clear_err() to clear the error indication from the journal
      superblock --- we need to call jbd2_journal_update_sb_errno() as well.
      Otherwise, when the root file system is mounted read-only, the journal is
      replayed, and the error indicator is transferred to the superblock ---
      but the s_errno field in the jbd2 superblock is left set (since although
      we cleared it in memory, we never flushed it out to disk).

      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either. A sketch of the resulting call pattern follows
      this entry.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      
      d796c52e
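      A sketch of the call pattern: both jbd2 functions are real exports, but
      the wrapper name is made up and the surrounding ext4 context is omitted.

      #include <linux/jbd2.h>

      /* After transferring the error into the ext4 superblock, clear it in the
       * journal both in memory and on disk; clearing only the in-memory copy
       * is what left e2fsck confused. */
      static void example_clear_journal_err(journal_t *journal)
      {
              jbd2_journal_clear_err(journal);
              jbd2_journal_update_sb_errno(journal);
      }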
  25. 04 Aug 2012 (1 commit)
  26. 27 May 2012 (4 commits)
  27. 29 Mar 2012 (1 commit)
  28. 20 Mar 2012 (1 commit)
  29. 14 Mar 2012 (2 commits)
    • jbd2: cleanup journal tail after transaction commit · 3339578f
      Authored by Jan Kara
      Normally, we have to issue a cache flush before we can update the journal
      tail in the journal superblock, effectively wiping out old transactions
      from the journal. So use the fact that we issue a cache flush during
      transaction commit anyway, and opportunistically push the journal tail as
      far as we can. Since the update of the journal superblock is still costly
      (we have to use WRITE_FUA), we update the log tail only if we can free a
      significant amount of space. A conceptual sketch follows this entry.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      3339578f
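      A purely conceptual sketch: the one-quarter threshold is illustrative,
      not the exact jbd2 policy, and j_maxlen is the journal-length field name
      of that era (later renamed j_total_len).

      #include <linux/jbd2.h>

      /* Pushing the tail costs a FUA superblock write, so only bother when the
       * commit would free a meaningful slice of the log. */
      static bool example_worth_pushing_tail(journal_t *journal,
                                             unsigned long blocks_freed)
      {
              return blocks_freed >= journal->j_maxlen / 4;
      }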
    • jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Authored by Jan Kara
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on stable storage - especially if the buffers
      were written out by jbd2_log_do_checkpoint(), they are likely to be only
      in the disk's caches. Thus, when we update the journal superblock,
      effectively removing the old transaction from the journal, this write of
      the superblock can reach stable storage before those checkpointed buffers,
      which can result in filesystem corruption after a crash. Thus we must
      unconditionally issue a cache flush before we update the journal
      superblock in these cases.

      A similar problem can also occur if the journal superblock is written only
      to the disk's caches, another transaction starts reusing the space of the
      transaction cleaned from the log, and a power failure happens. A
      subsequent journal replay would still try to replay the old transaction,
      but some of its blocks may already be overwritten by the new transaction.
      For this reason we must use WRITE_FUA when updating the log tail, and we
      must first write the new log tail to disk and update the in-memory
      information only after that. A sketch of this ordering follows this entry.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      79feb521
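      A sketch of the ordering only: the flush call and the jbd2 tail-update
      helper are shown with their 3.x-era signatures, which have since changed,
      and the wrapper name is made up. Locking is reduced to the minimum needed
      to show where the in-memory update belongs.

      #include <linux/jbd2.h>
      #include <linux/blkdev.h>

      /* 1. flush the device so checkpointed data is durable,
       * 2. write the new tail to the journal superblock with FUA,
       * 3. only then update the in-memory tail, so freed log space is not
       *    reused while the on-disk tail still points at the old transaction. */
      static void example_update_log_tail(journal_t *journal, tid_t tid,
                                          unsigned long block)
      {
              blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS, NULL);        /* 1 */

              jbd2_journal_update_sb_log_tail(journal, tid, block, WRITE_FUA); /* 2 */

              write_lock(&journal->j_state_lock);                           /* 3 */
              journal->j_tail_sequence = tid;
              journal->j_tail = block;
              write_unlock(&journal->j_state_lock);
      }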