1. 16 Sep, 2009 2 commits
  2. 21 Jul, 2009 1 commit
  3. 16 Jul, 2009 2 commits
    • jbd: Fix a race between checkpointing code and journal_get_write_access() · 1e9fd53b
      Jan Kara committed
      The following race can happen:
      
        CPU1                          CPU2
                                      checkpointing code checks the buffer, adds
                                        it to an array for writeback
        do_get_write_access()
          ...
          lock_buffer()
          unlock_buffer()
                                      flush_batch() submits the buffer for IO
          __jbd_journal_file_buffer()
      
        So a buffer under writeout is returned from do_get_write_access(). Since
      the filesystem code relies on the fact that journaled buffers cannot be
      written out, it does not take the buffer lock and so it can modify the
      buffer while it is under writeout. That can lead to filesystem corruption
      if we crash at the right moment. A similar problem can happen with
      the journal_get_create_access() path.
        We fix the problem by clearing the buffer dirty bit under buffer_lock
      even if the buffer is on the BJ_None list. Actually, we clear the dirty bit
      regardless of the list the buffer is on and warn if the buffer is already
      journalled.
      
      Thanks for spotting the problem go to dingdinghua <dingdinghua85@gmail.com>.
      Reported-by: dingdinghua <dingdinghua85@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Fail to load a journal if it is too short · 7447a668
      Jan Kara committed
      Due to on-disk corruption, the journal can turn out to be too short. Fail
      to load it in that case so that we don't oops somewhere later.
      Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
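The kind of length check that 7447a668 describes could look roughly like the sketch below. It is a userspace model, not the patch itself: `journal_model`, `journal_check_length()` and `MIN_JOURNAL_BLOCKS` (including its value of 1024) are names and numbers assumed for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Assumed minimum number of usable log blocks; the value is illustrative. */
#define MIN_JOURNAL_BLOCKS 1024UL

/* Hypothetical model of the log bounds read from the journal superblock. */
struct journal_model {
    unsigned long first;  /* first usable log block */
    unsigned long last;   /* last usable log block (inclusive) */
};

/* Return 0 if the journal is long enough to load, -EINVAL otherwise. */
static int journal_check_length(const struct journal_model *j)
{
    assert(j);
    /* A corrupted superblock can describe a log with too few blocks. */
    if (j->first + MIN_JOURNAL_BLOCKS > j->last + 1)
        return -EINVAL;
    return 0;
}
```

Failing early with an error code lets the mount fail cleanly instead of tripping over the short log later.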
  4. 19 Jun, 2009 1 commit
  5. 10 Jun, 2009 1 commit
    • jbd: fix race in buffer processing in commit code · a61d90d7
      Jan Kara committed
      In commit code, we scan buffers attached to a transaction.  During this
      scan, we sometimes have to drop j_list_lock and then we recheck whether
      the journal buffer head didn't get freed by journal_try_to_free_buffers().
      But checking for buffer_jbd(bh) isn't enough because a new journal head
      could get attached to our buffer head.  So add a check whether the journal
      head remained the same and whether it's still at the same transaction and
      list.
      
      This is a nasty bug and can cause problems like memory corruption (use after
      free) or trigger various assertions in JBD code (observed).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <stable@kernel.org>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
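The recheck described in a61d90d7 can be modelled in userspace as follows. This is only a sketch: the `*_model` structs and `jh_still_valid()` are hypothetical stand-ins for the kernel's buffer head, journal head and transaction, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct transaction_model { int tid; };

struct journal_head_model {
    struct transaction_model *b_transaction;  /* owning transaction */
    int b_jlist;                               /* list the head is filed on */
};

struct buffer_head_model {
    bool jbd;                       /* models buffer_jbd(bh) */
    struct journal_head_model *jh;  /* journal head currently attached */
};

/*
 * After the list lock was dropped and re-taken: is `jh` still the journal
 * head attached to `bh`, on the same transaction and the same list?
 */
static bool jh_still_valid(const struct buffer_head_model *bh,
                           const struct journal_head_model *jh,
                           const struct transaction_model *trans,
                           int jlist)
{
    assert(bh && jh);
    if (!bh->jbd)
        return false;   /* journal head was freed */
    if (bh->jh != jh)
        return false;   /* a new journal head got attached; do not touch jh */
    if (jh->b_transaction != trans)
        return false;   /* moved to another transaction */
    return jh->b_jlist == jlist;
}
```

The pointer comparison comes before any dereference of `jh`, because in the racy case the old journal head may already have been freed.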
  6. 14 Apr, 2009 2 commits
  7. 06 Apr, 2009 1 commit
  8. 03 Apr, 2009 1 commit
  9. 28 Mar, 2009 1 commit
  10. 12 Feb, 2009 1 commit
    • jbd: fix return value of journal_start_commit() · 8fe4cd0d
      Jan Kara committed
      journal_start_commit() returns 1 if either a transaction is committing or
      the function has queued a transaction commit.  But it returns 0 if we
      raced with somebody queueing the transaction commit as well.  This
      resulted in ext3_sync_fs() not functioning correctly (description from
      Arthur Jones): In the case of a data=ordered umount with pending long
      symlinks which are delayed due to a long list of other I/O on the backing
      block device, this causes the buffer associated with the long symlinks to
      not be moved to the inode dirty list in the second phase of fsync_super.
      Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT
      flag and the dirty pages are never written to the backing block device,
      causing long symlink corruption and exposing new or previously freed block
      data to userspace.
      
      This can be reproduced with a script created by Eric Sandeen
      <sandeen@redhat.com>:
      
              #!/bin/bash
      
              umount /mnt/test2
              mount /dev/sdb4 /mnt/test2
              rm -f /mnt/test2/*
              dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
              touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
              ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename \
                      /mnt/test2/link
              umount /mnt/test2
              mount /dev/sdb4 /mnt/test2
              ls /mnt/test2/
      
      This patch fixes journal_start_commit() to always return 1 when there's
      a transaction committing or queued for commit.
      
      Cc: Eric Sandeen <sandeen@redhat.com>
      Cc: Mike Snitzer <snitzer@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
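The return-value rule from 8fe4cd0d can be illustrated with a small userspace model. Everything here is assumed for illustration: `journal_model`, `request_commit()` and `start_commit_fixed()` are hypothetical names, and the real function also takes the journal state lock, which is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int tid_model_t;

struct journal_model {
    bool has_running;            /* is there a running transaction? */
    tid_model_t running_tid;
    bool has_committing;         /* is a transaction being committed? */
    tid_model_t committing_tid;
    bool commit_requested;       /* has somebody already queued a commit? */
    tid_model_t commit_request;
};

/* Queue a commit; returns 1 only if this call did the queueing. */
static int request_commit(struct journal_model *j, tid_model_t tid)
{
    if (j->commit_requested && j->commit_request == tid)
        return 0;                /* raced: somebody queued it before us */
    j->commit_requested = true;
    j->commit_request = tid;
    return 1;
}

/* Return 1 whenever a transaction is committing or queued for commit. */
static int start_commit_fixed(struct journal_model *j, tid_model_t *ptid)
{
    assert(j);
    if (j->has_running) {
        /* Ignore who queued the commit; what matters is that it is queued. */
        (void)request_commit(j, j->running_tid);
        if (ptid)
            *ptid = j->running_tid;
        return 1;
    }
    if (j->has_committing) {
        if (ptid)
            *ptid = j->committing_tid;
        return 1;
    }
    return 0;
}
```

In this model the old behaviour corresponds to returning the result of `request_commit()` directly, which is 0 in the raced case even though a commit is pending.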
  11. 09 Jan, 2009 2 commits
    • jbd: remove excess kernel-doc notation · 1579c3a1
      Randy Dunlap committed
      Remove excess kernel-doc from fs/jbd/transaction.c:
      
      Warning(linux-2.6.28-git5//fs/jbd/transaction.c:764): Excess function parameter 'credits' description in 'journal_get_write_access'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • jbd: improve fsync batching · f420d4dc
      Josef Bacik committed
      There is a flaw with the way jbd handles fsync batching.  If we fsync() a
      file and we were not the last person to run fsync() on this fs then we
      automatically sleep for 1 jiffie in order to wait for new writers to join
      the transaction before forcing the commit.  The problem with this is
      that with really fast storage (e.g. a Clariion) the time it takes to commit
      a transaction to disk is way faster than 1 jiffie in most cases, so
      sleeping means waiting longer with nothing to do than if we just committed
      the transaction and kept going.  Ric Wheeler noticed this when using
      fs_mark with more than 1 thread: the throughput would plummet as he added
      more threads.
      
      This patch attempts to fix this problem by recording the average time in
      nanoseconds that it takes to commit a transaction to disk, and what time
      we started the transaction.  If we run an fsync() and we have been running
      for less time than it takes to commit the transaction to disk, we sleep
      for the delta amount of time and then commit to disk.  We achieve
      sub-jiffie sleeping using schedule_hrtimeout.  This means that the wait
      time is auto-tuned to the speed of the underlying disk, instead of having
      this static timeout.  I weighted the average according to somebody's
      comments (Andreas Dilger I think) in order to help normalize random
      outliers where we take way longer or way less time to commit than the
      average.  I also have a min() check in there to make sure we don't sleep
      longer than a jiffie in case our storage is super slow, this was requested
      by Andrew.
      
      I unfortunately do not have access to a Clariion, so I had to use a
      ramdisk to represent a super fast array.  I tested with a SATA drive with
      barrier=1 to make sure there was no regression with local disks, I tested
      with a 4 way multipathed Apple Xserve RAID array and of course the
      ramdisk.  I ran the following command
      
      fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t $i
      
      where $i was 2, 4, 8, 16 and 32.  I mkfs'ed the fs each time.  Here are my
      results
      
      type      threads   with patch   without patch
      sata      2         24.6         26.3
      sata      4         49.2         48.1
      sata      8         70.1         67.0
      sata      16        104.0        94.1
      sata      32        153.6        142.7

      xserve    2         246.4        222.0
      xserve    4         480.0        440.8
      xserve    8         829.5        730.8
      xserve    16        1172.7       1026.9
      xserve    32        1816.3       1650.5

      ramdisk   2         2538.3       1745.6
      ramdisk   4         2942.3       661.9
      ramdisk   8         2882.5       999.8
      ramdisk   16        2738.7       1801.9
      ramdisk   32        2541.9       2394.0
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ric Wheeler <rwheeler@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
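The auto-tuned wait described in f420d4dc can be sketched in userspace as below. The message does not state the weighting, so the 3:1 weighting toward the old average is an assumption, as are the names `commit_stats`, `record_commit_time()` and `fsync_batch_wait_ns()`; the actual sleep via schedule_hrtimeout is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-journal statistics. */
struct commit_stats {
    uint64_t avg_commit_ns;  /* weighted average commit time, 0 = no sample */
};

/* Fold a new commit duration into the weighted average (assumed 3:1). */
static void record_commit_time(struct commit_stats *s, uint64_t commit_ns)
{
    assert(s);
    if (s->avg_commit_ns == 0)
        s->avg_commit_ns = commit_ns;  /* first sample seeds the average */
    else
        s->avg_commit_ns = (commit_ns + 3 * s->avg_commit_ns) / 4;
}

/*
 * How long an fsync() should wait for other writers before committing:
 * the part of an average commit that the transaction has not yet been
 * running for, capped at one jiffie for very slow storage.
 */
static uint64_t fsync_batch_wait_ns(const struct commit_stats *s,
                                    uint64_t trans_age_ns,
                                    uint64_t jiffie_ns)
{
    assert(s);
    if (trans_age_ns >= s->avg_commit_ns)
        return 0;  /* already ran longer than a typical commit takes */
    const uint64_t delta = s->avg_commit_ns - trans_age_ns;
    return delta < jiffie_ns ? delta : jiffie_ns;
}
```

Weighting toward the old average damps outliers, so a single unusually slow or fast commit shifts the wait time only partially.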
  12. 07 Nov, 2008 1 commit
    • jbd: don't give up looking for space so easily in __log_wait_for_space · e219cca0
      Theodore Ts'o committed
      Commit be07c4ed introduced a regression because it assumed that if
      there were no transactions ready to be checkpointed, that no progress
      could be made on making space available in the journal, and so the
      journal should be aborted.  This assumption is false; it could be the
      case that simply calling cleanup_journal_tail() will recover the
      necessary space, or, for small journals, the currently committing
      transaction could be responsible for chewing up the required space in
      the log, so we need to wait for the currently committing transaction
      to finish before trying to force a checkpoint operation.
      
      This patch fixes the bug reported by Meelis Roos at:
      http://bugzilla.kernel.org/show_bug.cgi?id=11937

      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Duane Griffin <duaneg@dghda.com>
      Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
  13. 31 Oct, 2008 1 commit
  14. 23 Oct, 2008 3 commits
  15. 21 Oct, 2008 1 commit
  16. 20 Oct, 2008 4 commits
  17. 11 Aug, 2008 2 commits
  18. 05 Aug, 2008 2 commits
  19. 26 Jul, 2008 7 commits
    • jbd: don't abort if flushing file data failed · cbe5f466
      Hidehiro Kawai committed
      In ordered mode, the current jbd aborts the journal if a file data buffer
      has an error.  But this behavior is unintended, and we found that it has
      been adopted accidentally.
      
      This patch undoes it and just calls printk() instead of aborting the
      journal.  Additionally, it sets AS_EIO in the address_space object of the
      failed buffer submitted by journal_do_submit_data() so that fsync() can
      return -EIO.

      Missing error checks are also added so that errors on file data buffers
      are reported to the user.  The following buffers are covered:

        (a) the buffer which has already been written out by pdflush
        (b) the buffer which has been unlocked before being scanned in the
            t_locked_list loop
      
      [akpm@linux-foundation.org: improve grammar in a printk]
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • jbd: positively dispose the unmapped data buffers in journal_commit_transaction() · fc80c442
      Toshiyuki Okajima committed
      After files on ext3 in ordered mode are truncated, pages that cannot be
      accounted for may remain in memory.  The remaining pages can be released
      when the system runs very low on memory, so this is not a memory leak.
      But resource management software and the like may not work correctly.
      
      journal_unmap_buffer() may be unable to release buffers, and the pages
      to which they belong, because they are attached to a committing
      transaction.  To have such buffers and pages released later,
      journal_unmap_buffer() leaves the work to journal_commit_transaction().
      (journal_unmap_buffer() marks the buffers 'BH_Freed' so that
      journal_commit_transaction() can identify whether they can be released
      or not.)

      In journalled mode and writeback mode, jbd deals only with metadata
      buffers.  But in ordered mode, jbd deals with both metadata buffers and
      data buffers.
      
      Currently, journal_commit_transaction() releases only the metadata
      buffers whose release was requested by journal_unmap_buffer(), and also
      releases the pages to which they belong if possible.

      As a result, the data buffers whose release was requested by
      journal_unmap_buffer() remain after a transaction commits, and so do the
      pages to which they belong.

      These remaining pages no longer have a mapping.  That is why pages that
      cannot be accounted for may remain.
      
      The metadata buffers marked 'BH_Freed' and the pages to which
      they belong can be released at 'JBD: commit phase 7'.
      
      Therefore, by applying the same code to 'JBD: commit phase 2' (where the
      data buffers are handled), journal_commit_transaction() can also release
      the data buffers marked 'BH_Freed' and the pages to which they belong.

      As a result, all the buffers marked 'BH_Freed', and all the pages to
      which these buffers belong, can be released in
      journal_commit_transaction().  So pages that cannot be accounted for no
      longer remain.
      
      <<Excerpt of code at 'JBD: commit phase 7'>>
       >         spin_lock(&journal->j_list_lock);
       >         while (commit_transaction->t_forget) {
       >                 transaction_t *cp_transaction;
       >                 struct buffer_head *bh;
       >
       >                 jh = commit_transaction->t_forget;
       >...
       >                 if (buffer_freed(bh)) {
       >                 ^^^^^^^^^^^^^^^^^^^^^^^^
       >                         clear_buffer_freed(bh);
       >                        ^^^^^^^^^^^^^^^^^^^^^^^^
       >                         clear_buffer_jbddirty(bh);
       >                 }
       >
       >                 if (buffer_jbddirty(bh)) {
       >                         JBUFFER_TRACE(jh, "add to new checkpointing trans");
       >                         __journal_insert_checkpoint(jh, commit_transaction);
       >                         JBUFFER_TRACE(jh, "refile for checkpoint writeback");
       >                         __journal_refile_buffer(jh);
       >                         jbd_unlock_bh_state(bh);
       >                 } else {
       >                         J_ASSERT_BH(bh, !buffer_dirty(bh));
       > ...
       >                         JBUFFER_TRACE(jh, "refile or unfile freed buffer");
       >                         __journal_refile_buffer(jh);
       >                         if (!jh->b_transaction) {
       >                                 jbd_unlock_bh_state(bh);
       >                                  /* needs a brelse */
       >                                 journal_remove_journal_head(bh);
       >                                 release_buffer_page(bh);
       >                                 ^^^^^^^^^^^^^^^^^^^^^^^^
       >                         } else
       >                 }
      ****************************************************************
      * Apply the code of "^^^^^^" lines into 'JBD: commit phase 2' *
      ****************************************************************
      
      In the journal_commit_transaction() code, there is one extra message in
      the series of jbd debug messages ("JBD: commit phase 2").  This patch
      fixes that, too.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • jbd: unexport journal_update_superblock · a10320e8
      Adrian Bunk committed
      Remove the unused EXPORT_SYMBOL(journal_update_superblock).
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • jbd: fix race between free buffer and commit transaction · 3f31fddf
      Mingming Cao committed
      journal_try_to_free_buffers() could race with the jbd commit code when
      the latter holds a buffer reference while waiting for the data buffer
      to be flushed to disk.  If the caller of journal_try_to_free_buffers()
      needs the buffers to be released, it treats the failure as an error and
      returns it to its own caller.  We have seen direct IO fail due to this
      race.  Some callers of releasepage() also expect the buffers to be
      dropped when they pass the GFP_KERNEL mask to
      releasepage()->journal_try_to_free_buffers().

      With this patch, if the caller passes __GFP_WAIT and __GFP_FS to
      indicate that the call may wait, and try_to_free_buffers() fails, we
      wait for journal_commit_transaction() to finish committing the current
      committing transaction and then try to free those buffers again.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
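The retry logic from 3f31fddf can be sketched with a userspace model. The flag values, `page_model` and all function names below are hypothetical; the real code works on kernel pages and waits on the committing transaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for __GFP_WAIT and __GFP_FS. */
#define MODEL_GFP_WAIT 0x1u
#define MODEL_GFP_FS   0x2u

struct page_model {
    int commit_refs;   /* buffer references held by the committing transaction */
    int commit_waits;  /* how often we had to wait for the commit */
};

/* Buffers can be freed only when the commit code holds no references. */
static bool try_free(const struct page_model *p)
{
    return p->commit_refs == 0;
}

/* Model of waiting for the commit: afterwards its references are gone. */
static void wait_for_commit(struct page_model *p)
{
    p->commit_refs = 0;
    p->commit_waits++;
}

/* Try to free; if that fails and the caller may block, wait and retry once. */
static bool release_buffers(struct page_model *p, unsigned int gfp_mask)
{
    assert(p);
    bool freed = try_free(p);
    if (!freed && (gfp_mask & MODEL_GFP_WAIT) && (gfp_mask & MODEL_GFP_FS)) {
        wait_for_commit(p);
        freed = try_free(p);
    }
    return freed;
}
```

Callers that cannot sleep keep the old behaviour and simply see the failure.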
    • jbd: tidy up revoke cache initialisation and destruction · 1984bb76
      Duane Griffin committed
      Make revocation cache destruction safe to call if initialisation fails
      partially or entirely.  This allows it to be used to clean up in the case
      of initialisation failure, simplifying that code slightly.
      Signed-off-by: Duane Griffin <duaneg@dghda.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
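The pattern described in 1984bb76, a destroy function that tolerates partial initialisation and therefore doubles as the error path, can be sketched in userspace like this. `malloc()`/`free()` stand in for the kernel's slab cache calls, and the names and the `fail_at` failure injection are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two revoke-related caches. */
static void *record_cache;
static void *table_cache;

/* Safe to call whether zero, one or both caches were created. */
static void destroy_caches(void)
{
    if (record_cache) {
        free(record_cache);
        record_cache = NULL;
    }
    if (table_cache) {
        free(table_cache);
        table_cache = NULL;
    }
}

/* fail_at: 0 = succeed, 1 = first allocation fails, 2 = second one fails. */
static int init_caches(int fail_at)
{
    assert(!record_cache && !table_cache);

    record_cache = (fail_at == 1) ? NULL : malloc(64);
    if (!record_cache)
        goto fail;

    table_cache = (fail_at == 2) ? NULL : malloc(64);
    if (!table_cache)
        goto fail;

    return 0;

fail:
    destroy_caches();  /* one cleanup path for every failure point */
    return -1;
}
```

Because the destroy function resets each pointer to NULL, calling it twice, or after a failed init, is harmless.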
    • jbd: eliminate duplicated code in revocation table init/destroy functions · f4d79ca2
      Duane Griffin committed
      The revocation table initialisation/destruction code is repeated for each
      of the two revocation tables stored in the journal.  Refactoring the
      duplicated code into functions is tidier, simplifies the logic in
      initialisation in particular, and slightly reduces the code size.
      
      There should not be any functional change.
      Signed-off-by: Duane Griffin <duaneg@dghda.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • jbd: replace potentially false assertion with if block · 3850f7a5
      Duane Griffin committed
      If an error occurs during jbd cache initialisation it is possible for the
      journal_head_cache to be NULL when journal_destroy_journal_head_cache is
      called.  Replace the J_ASSERT with an if block to handle the situation
      correctly.
      
      Note that even with this fix things will break badly if jbd is statically
      compiled in and cache initialisation fails.
      
      Signed-off-by: Duane Griffin <duaneg@dghda.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 15 May, 2008 1 commit
  21. 28 Apr, 2008 3 commits
    • jbd: replace remaining __FUNCTION__ occurrences · 08fc99bf
      Harvey Harrison committed
      __FUNCTION__ is gcc-specific, use __func__
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
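As a small illustration of 08fc99bf: `__func__` is the predefined identifier from standard C99 and can be used wherever `__FUNCTION__` was. The function `report_abort()` and its message text are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a message prefixed with the enclosing function's name. */
static int report_abort(char *buf, size_t len)
{
    assert(buf && len > 0);
    /* __func__ expands to the enclosing function's name as a string. */
    return snprintf(buf, len, "%s: journal aborted", __func__);
}
```

Calling `report_abort(buf, sizeof(buf))` on a sufficiently large buffer leaves the string `report_abort: journal aborted` in `buf`.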
    • jbd: fix possible journal overflow issues · 5b9a499d
      Josef Bacik committed
      There are several cases where the running transaction can get buffers added to
      its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
      counter end up larger than its t_outstanding_credits counter.
      
      This will cause issues when starting new transactions: while we are logging
      buffers we decrement t_outstanding_credits, so when t_outstanding_credits goes
      negative, we will report that we need less space in the journal than we
      actually need, so transactions will be started even though there may not be
      enough room for them.  In the worst case scenario (which admittedly is almost
      impossible to reproduce) this will result in the journal running out of space.

      The fix is to only refile buffers from the committing transaction to the
      running transaction's BJ_Metadata list when b_modified is set on that
      journal head, which is the only way to be sure that the running transaction
      has modified that buffer.

      This patch also fixes an accounting error in journal_forget: it is possible
      to call journal_forget on a buffer without having modified it, having only
      gotten write access to it, so instead of always freeing a credit, we do so
      only if the buffer was modified.  The assert will help catch this problem
      if it occurs.  Without these two patches I could hit this assert within
      minutes of running postmark; with them this issue no longer arises.  Thank you.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Acked-by: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
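The journal_forget accounting fix in 5b9a499d can be modelled roughly as follows. `jh_model`, `handle_model` and `forget_buffer()` are simplified, hypothetical stand-ins for the journal head, the handle and journal_forget; locking and list handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

struct jh_model {
    bool b_modified;      /* set when the transaction modified the buffer */
};

struct handle_model {
    int h_buffer_credits; /* credits still available to this handle */
};

/* Forget a buffer; give a credit back only if it had really been modified. */
static void forget_buffer(struct handle_model *h, struct jh_model *jh)
{
    assert(h && jh);
    const bool was_modified = jh->b_modified;

    jh->b_modified = false;
    if (was_modified)
        h->h_buffer_credits++;  /* this buffer had consumed a credit */
}
```

A buffer that only had write access requested never consumed a credit, so returning one for it would inflate the handle's budget.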
    • jbd: fix the way the b_modified flag is cleared · 5bc833fe
      Josef Bacik committed
      Currently at the start of a journal commit we loop through all of the buffers
      on the committing transaction and clear the b_modified flag (the flag that is
      set when a transaction modifies the buffer) under the j_list_lock.
      
      The problem is that everywhere else this flag is modified only under the jbd
      lock buffer flag, so clearing it here can race with a running transaction
      that sets it, only to have it unset by the committing transaction.

      This is also a big waste: you can have several thousand buffers that you
      are clearing the modified flag on when you may not need to.  This patch
      removes this code and instead clears the b_modified flag upon entering
      do_get_write_access/journal_get_create_access, so if that transaction does
      indeed use the buffer then it will be accounted for properly, and if it does
      not then we know we didn't use it.
      
      That will be important for the next patch in this series.  Tested thoroughly
      by myself using postmark/iozone/bonnie++.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Acked-by: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>