1. 12 2月, 2011 1 次提交
    • T
      jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831
      Theodore Ts'o 提交于
      On an SMP ARM system running ext4, I've received a report that the
      first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      While investigating possible causes for this problem, I noticed that
      __jbd2_log_start_commit() is getting called with j_state_lock only
      read-locked, in spite of the fact that it's possible for it might
      j_commit_request.  Fix this by grabbing the necessary information so
      we can test to see if we need to start a new transaction before
      dropping the read lock, and then calling jbd2_log_start_commit() which
      will grab the write lock.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e4471831
  2. 19 12月, 2010 3 次提交
  3. 10 12月, 2010 1 次提交
  4. 28 10月, 2010 1 次提交
  5. 10 8月, 2010 1 次提交
  6. 04 8月, 2010 2 次提交
  7. 02 8月, 2010 1 次提交
  8. 27 7月, 2010 1 次提交
    • T
      jbd2: Remove __GFP_NOFAIL from jbd2 layer · 47def826
      Theodore Ts'o 提交于
      __GFP_NOFAIL is going away, so add our own retry loop.  Also add
      jbd2__journal_start() and jbd2__journal_restart() which take a gfp
      mask, so that file systems can optionally (re)start transaction
      handles using GFP_KERNEL.  If they do this, then they need to be
      prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready
      to reflect that error up to userspace.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      47def826
  9. 16 7月, 2010 1 次提交
    • J
      jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09
      Jan Kara 提交于
      OCFS2 uses t_commit trigger to compute and store checksum of the just
      committed blocks. When a buffer has b_frozen_data, checksum is computed
      for it instead of b_data but this can result in an old checksum being
      written to the filesystem in the following scenario:
      
      1) transaction1 is opened
      2) handle1 is opened
      3) journal_access(handle1, bh)
          - This sets jh->b_transaction to transaction1
      4) modify(bh)
      5) journal_dirty(handle1, bh)
      6) handle1 is closed
      7) start committing transaction1, opening transaction2
      8) handle2 is opened
      9) journal_access(handle2, bh)
          - This copies off b_frozen_data to make it safe for transaction1 to commit.
            jh->b_next_transaction is set to transaction2.
      10) jbd2_journal_write_metadata() checksums b_frozen_data
      11) the journal correctly writes b_frozen_data to the disk journal
      12) handle2 is closed
          - There was no dirty call for the bh on handle2, so it is never queued for
            any more journal operation
      13) Checkpointing finally happens, and it just spools the bh via normal buffer
      writeback.  This will write b_data, which was never triggered on and thus
      contains a wrong (old) checksum.
      
      This patch fixes the problem by calling the trigger at the moment data is
      frozen for journal commit - i.e., either when b_frozen_data is created by
      do_get_write_access or just before we write a buffer to the log if
      b_frozen_data does not exist. We also rename the trigger to t_frozen as
      that better describes when it is called.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      13ceef09
  10. 16 5月, 2010 1 次提交
  11. 16 2月, 2010 1 次提交
  12. 18 8月, 2009 1 次提交
  13. 11 8月, 2009 1 次提交
  14. 18 6月, 2009 1 次提交
  15. 14 7月, 2009 1 次提交
    • J
      jbd2: Fix a race between checkpointing code and journal_get_write_access() · f91d1d04
      Jan Kara 提交于
      The following race can happen:
      
       CPU1                          CPU2
                                     checkpointing code checks the buffer, adds
                                       it to an array for writeback
       do_get_write_access()
       ...
       lock_buffer()
       unlock_buffer()
                                     flush_batch() submits the buffer for IO
       __jbd2_journal_file_buffer()
      
      So a buffer under writeout is returned from
      do_get_write_access(). Since the filesystem code relies on the fact
      that journaled buffers cannot be written out, it does not take the
      buffer lock and so it can modify buffer while it is under
      writeout. That can lead to a filesystem corruption if we crash at the
      right moment.
      
      We fix the problem by clearing the buffer dirty bit under buffer_lock
      even if the buffer is on BJ_None list. Actually, we clear the dirty
      bit regardless the list the buffer is in and warn about the fact if
      the buffer is already journalled.
      
      Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>.
      Reported-by: Ndingdinghua <dingdinghua85@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f91d1d04
  16. 26 3月, 2009 1 次提交
  17. 11 2月, 2009 1 次提交
    • J
      jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara 提交于
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
      suspitious code.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
      7f5aa215
  18. 06 1月, 2009 1 次提交
    • J
      jbd2: Add buffer triggers · e06c8227
      Joel Becker 提交于
      Filesystems often to do compute intensive operation on some
      metadata.  If this operation is repeated many times, it can be very
      expensive.  It would be much nicer if the operation could be performed
      once before a buffer goes to disk.
      
      This adds triggers to jbd2 buffer heads.  Just before writing a metadata
      buffer to the journal, jbd2 will optionally call a commit trigger associated
      with the buffer.  If the journal is aborted, an abort trigger will be
      called on any dirty buffers as they are dropped from pending
      transactions.
      
      ocfs2 will use this feature.
      
      Initially I tried to come up with a more generic trigger that could be
      used for non-buffer-related events like transaction completion.  It
      doesn't tie nicely, because the information a buffer trigger needs
      (specific to a journal_head) isn't the same as what a transaction
      trigger needs (specific to a tranaction_t or perhaps journal_t).  So I
      implemented a buffer set, with the understanding that
      journal/transaction wide triggers should be implemented separately.
      
      There is only one trigger set allowed per buffer.  I can't think of any
      reason to attach more than one set.  Contrast this with a journal or
      transaction in which multiple places may want to watch the entire
      transaction separately.
      
      The trigger sets are considered static allocation from the jbd2
      perspective.  ocfs2 will just have one trigger set per block type,
      setting the same set on every bh of the same type.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e06c8227
  19. 04 1月, 2009 1 次提交
  20. 26 11月, 2008 1 次提交
    • J
      jbd2: improve jbd2 fsync batching · e07f7183
      Josef Bacik 提交于
      This patch removes the static sleep time in favor of a more self
      optimizing approach where we measure the average amount of time it
      takes to commit a transaction to disk and the ammount of time a
      transaction has been running.  If somebody does a sync write or an
      fsync() traditionally we would sleep for 1 jiffies, which depending on
      the value of HZ could be a significant amount of time compared to how
      long it takes to commit a transaction to the underlying storage.  With
      this patch instead of sleeping for a jiffie, we check to see if the
      amount of time this transaction has been running is less than the
      average commit time, and if it is we sleep for the delta using
      schedule_hrtimeout to give us a higher precision sleep time.  This
      greatly benefits high end storage where you could end up sleeping for
      longer than it takes to commit the transaction and therefore sitting
      idle instead of allowing the transaction to be committed by keeping
      the sleep time to a minimum so you are sure to always be doing
      something.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e07f7183
  21. 17 10月, 2008 1 次提交
    • T
      ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback · 3e624fc7
      Theodore Ts'o 提交于
      The multiblock allocator needs to be able to release blocks (and issue
      a blkdev discard request) when the transaction which freed those
      blocks is committed.  Previously this was done via a polling mechanism
      when blocks are allocated or freed.  A much better way of doing things
      is to create a jbd2 callback function and attaching the list of blocks
      to be freed directly to the transaction structure.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3e624fc7
  22. 11 8月, 2008 2 次提交
  23. 12 7月, 2008 2 次提交
  24. 14 7月, 2008 1 次提交
    • M
      jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction · 530576bb
      Mingming Cao 提交于
      journal_try_to_free_buffers() could race with jbd commit transaction
      when the later is holding the buffer reference while waiting for the
      data buffer to flush to disk. If the caller of
      journal_try_to_free_buffers() request tries hard to release the buffers,
      it will treat the failure as error and return back to the caller. We
      have seen the directo IO failed due to this race.  Some of the caller of
      releasepage() also expecting the buffer to be dropped when passed with
      GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers().
      
      With this patch, if the caller is passing the GFP_KERNEL to indicating
      this call could wait, in case of try_to_free_buffers() failed, let's
      waiting for journal_commit_transaction() to finish commit the current
      committing transaction , then try to free those buffers again with
      journal locked.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> 
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      530576bb
  25. 17 4月, 2008 4 次提交
    • H
      jdb2: replace remaining __FUNCTION__ occurrences · 329d291f
      Harvey Harrison 提交于
      __FUNCTION__ is gcc-specific, use __func__
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      329d291f
    • R
      jbd2: fix kernel-doc notation · 5648ba5b
      Randy Dunlap 提交于
      Fix kernel-doc notation in jbd2.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5648ba5b
    • J
      jbd2: fix possible journal overflow issues · 1dfc3220
      Josef Bacik 提交于
      There are several cases where the running transaction can get buffers
      added to its BJ_Metadata list which it never dirtied, which makes its
      t_nr_buffers counter end up larger than its t_outstanding_credits
      counter.
      
      This will cause issues when starting new transactions as while we are
      logging buffers we decrement t_outstanding_buffers, so when
      t_outstanding_buffers goes negative, we will report that we need less
      space in the journal than we actually need, so transactions will be
      started even though there may not be enough room for them.  In the worst
      case scenario (which admittedly is almost impossible to reproduce) this
      will result in the journal running out of space.
      
      The fix is to only refile buffers from the committing transaction to the
      running transactions BJ_Modified list when b_modified is set on that
      journal, which is the only way to be sure if the running transaction has
      modified that buffer.
      
      This patch also fixes an accounting error in journal_forget, it is
      possible that we can call journal_forget on a buffer without having
      modified it, only gotten write access to it, so instead of freeing a
      credit, we only do so if the buffer was modified.  The assert will help
      catch if this problem occurs.  Without these two patches I could hit
      this assert within minutes of running postmark, with them this issue no
      longer arises.
      
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      1dfc3220
    • J
      jbd2: fix the way the b_modified flag is cleared · 9fc7c63a
      Josef Bacik 提交于
      Currently at the start of a journal commit we loop through all of the buffers
      on the committing transaction and clear the b_modified flag (the flag that is
      set when a transaction modifies the buffer) under the j_list_lock.
      
      The problem is that everywhere else this flag is modified only under the jbd2
      lock buffer flag, so it will race with a running transaction who could
      potentially set it, and have it unset by the committing transaction.
      
      This is also a big waste, you can have several thousands of buffers that you
      are clearing the modified flag on when you may not need to.  This patch
      removes this code and instead clears the b_modified flag upon entering
      do_get_write_access/journal_get_create_access, so if that transaction does
      indeed use the buffer then it will be accounted for properly, and if it does
      not then we know we didn't use it.
      
      That will be important for the next patch in this series.  Tested thoroughly
      by myself using postmark/iozone/bonnie++.
      
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9fc7c63a
  26. 29 1月, 2008 4 次提交
    • M
      jbd2: sparse pointer use of zero as null · 4019191b
      Mingming Cao 提交于
      Get rid of sparse related warnings from places that use integer as NULL
      pointer.  (Ported from upstream ext3/jbd changes.)
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4019191b
    • M
      jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup · db857da3
      Mingming Cao 提交于
      While "every 5 seconds" doesn't sound as a problem, there can be many
      of these (and these timers do add up over all the kernel).  The "5
      second" wakeup isn't really timing sensitive; in addition even with
      rounding it'll still happen every 5 seconds (with the exception of the
      very first time, which is likely to be rounded up to somewhere closer
      to 6 seconds)
      
      (Ported from similar JBD patch made by Arjan van de Ven to
      fs/jbd/transaction.c)
      
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      db857da3
    • M
      jbd2: add lockdep support · 7b751066
      Mingming Cao 提交于
      Ported from similar patch for the jbd layer.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7b751066
    • J
      jbd2: jbd2 stats through procfs · 8e85fb3f
      Johann Lombardi 提交于
      The patch below updates the jbd stats patch to 2.6.20/jbd2.
      The initial patch was posted by Alex Tomas in December 2005
      (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
      It provides statistics via procfs such as transaction lifetime and size.
      
      Sometimes, investigating performance problems, i find useful to have
      stats from jbd about transaction's lifetime, size, etc. here is a
      patch for review and inclusion probably.
      
      for example, stats after creation of 3M files in htree directory:
      
      [root@bob ~]# cat /proc/fs/jbd/sda/history
      R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
      R    261   8260  2720  0     0     750   9892   8170  8187
      C    259                                                    750   0     4885  1
      R    262   20    2200  10    0     770   9836   8170  8187
      R    263   30    2200  10    0     3070  9812   8170  8187
      R    264   0     5000  10    0     1340  0      0     0
      C    261                                                    8240  3212  4957  0
      R    265   8260  1470  0     0     4640  9854   8170  8187
      R    266   0     5000  10    0     1460  0      0     0
      C    262                                                    8210  2989  4868  0
      R    267   8230  1490  10    0     4440  9875   8171  8188
      R    268   0     5000  10    0     1260  0      0     0
      C    263                                                    7710  2937  4908  0
      R    269   7730  1470  10    0     3330  9841   8170  8187
      R    270   0     5000  10    0     830   0      0     0
      C    265                                                    8140  3234  4898  0
      C    267                                                    720   0     4849  1
      R    271   8630  2740  20    0     740   9819   8170  8187
      C    269                                                    800   0     4214  1
      R    272   40    2170  10    0     830   9716   8170  8187
      R    273   40    2280  0     0     3530  9799   8170  8187
      R    274   0     5000  10    0     990   0      0     0
      
      
      where,
      
      R     - line for transaction's life from T_RUNNING to T_FINISHED
      C     - line for transaction's checkpointing
      tid   - transaction's id
      wait  - for how long we were waiting for new transaction to start
               (the longest period journal_start() took in this transaction)
      run   - real transaction's lifetime (from T_RUNNING to T_LOCKED
      lock  - how long we were waiting for all handles to close
               (time the transaction was in T_LOCKED)
      flush - how long it took to flush all data (data=ordered)
      log   - how long it took to write the transaction to the log
      hndls - how many handles got to the transaction
      block - how many blocks got to the transaction
      inlog - how many blocks are written to the log (block + descriptors)
      ctime - how long it took to checkpoint the transaction
      write - how many blocks have been written during checkpointing
      drop  - how many blocks have been dropped during checkpointing
      close - how many running transactions have been closed to checkpoint this one
      
      all times are in msec.
      
      
      [root@bob ~]# cat /proc/fs/jbd/sda/info
      280 transaction, each upto 8192 blocks
      average:
        1633ms waiting for transaction
        3616ms running transaction
        5ms transaction was being locked
        1ms flushing data (in ordered mode)
        1799ms logging transaction
        11781 handles per transaction
        5629 blocks per transaction
        5641 logged blocks per transaction
      Signed-off-by: NJohann Lombardi <johann.lombardi@bull.net>
      Signed-off-by: NMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      8e85fb3f
  27. 18 10月, 2007 3 次提交