1. 29 Mar, 2012: 1 commit
  2. 20 Mar, 2012: 1 commit
  3. 14 Mar, 2012: 9 commits
    • jbd2: cleanup journal tail after transaction commit · 3339578f
      Committed by Jan Kara
      Normally, we have to issue a cache flush before we can update the journal
      tail in the journal superblock, effectively wiping out old transactions from
      the journal. So use the fact that during transaction commit we issue a cache
      flush anyway, and opportunistically push the journal tail as far as we can.
      Since updating the journal superblock is still costly (we have to use
      WRITE_FUA), we update the log tail only if we can free a significant amount
      of space.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
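      The tail-update decision above can be sketched as a small C model. This is not jbd2 code: `struct log_area`, `blocks_freed()` and `worth_updating_tail()` are hypothetical names, and the one-quarter-of-the-log threshold is an assumed stand-in for "a significant amount of space". The freed-space calculation accounts for the log being circular, so a tail that wraps past the end of the log area still counts the blocks on both sides.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the circular log area: usable blocks
 * are [first, last), and tail is the oldest block still needed for replay. */
struct log_area {
	unsigned long first;
	unsigned long last;
	unsigned long tail;
};

/* Number of log blocks released by moving the tail to new_tail,
 * accounting for wraparound of the circular log. */
static unsigned long blocks_freed(const struct log_area *log,
				  unsigned long new_tail)
{
	assert(new_tail >= log->first && new_tail < log->last);
	if (new_tail >= log->tail)
		return new_tail - log->tail;
	/* Wrapped: blocks from the old tail to the end, plus from the start. */
	return (log->last - log->tail) + (new_tail - log->first);
}

/* Only pay for the FUA superblock write when enough space comes back.
 * The one-quarter threshold is an illustrative assumption. */
static bool worth_updating_tail(const struct log_area *log,
				unsigned long new_tail)
{
	unsigned long size = log->last - log->first;

	return blocks_freed(log, new_tail) >= size / 4;
}
```

      With a 1024-block log, moving the tail by 200 blocks would be skipped, while moving it by 300 blocks would justify the superblock write.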
    • jbd2: remove bh_state lock from checkpointing code · 932bb305
      Committed by Jan Kara
      All accesses to checkpointing entries in journal_head are protected
      by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
      need the bh_state lock.
      
      Also, the only part of the journal head that the rest of the checkpointing
      code needs to check is jh->b_transaction, which is safe to read under
      j_list_lock.
      
      So we can safely remove the bh_state lock from all of the checkpointing
      code, which makes it considerably prettier.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: remove always true condition in __journal_try_to_free_buffer() · c254c9ec
      Committed by Jan Kara
      The check b_jlist == BJ_None in __journal_try_to_free_buffer() is
      always true (__jbd2_journal_temp_unlink_buffer() also checks this in
      an assertion), so just remove it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: declare __jbd2_journal_temp_unlink_buffer() static · 5bebccf9
      Committed by Jan Kara
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix BH_JWrite setting in checkpointing code · 96c86678
      Committed by Jan Kara
      The BH_JWrite bit should be set when a buffer is written to the journal, so
      checkpointing shouldn't set this bit when writing out a buffer. This didn't
      cause any observable bug, since the BH_JWrite bit is used only for debugging
      purposes, but it's good to keep this consistent.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Committed by Jan Kara
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on stable storage. Especially if the buffers were
      written out by jbd2_log_do_checkpoint(), they are likely to be only in the
      disk's caches. Thus when we update the journal superblock, effectively
      removing old transactions from the journal, this superblock write can reach
      stable storage before those checkpointed buffers, which can result in
      filesystem corruption after a crash. Thus we must unconditionally issue a
      cache flush before we update the journal superblock in these cases.
      
      A similar problem can also occur if the journal superblock is written only
      to the disk's caches, another transaction starts reusing the space of the
      transaction cleaned from the log, and a power failure happens. Subsequent
      journal replay would still try to replay the old transaction, but some of
      its blocks may already be overwritten by the new transaction. For this
      reason we must use WRITE_FUA when updating the log tail, and we must first
      write the new log tail to disk and update the in-memory information only
      after that.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
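      The ordering this commit requires can be sketched as a small C model that records I/O in submission order. `struct model_journal`, `issue()` and `update_log_tail()` are hypothetical; the real code submits block-layer requests rather than appending to an array. The point is the sequence: cache flush, then a FUA superblock write, then the in-memory tail update that lets new transactions reuse the freed space.

```c
#include <assert.h>

/* Hypothetical recording "device" so the order of I/O can be inspected. */
enum io_op { OP_CACHE_FLUSH, OP_SB_WRITE_FUA };

struct model_journal {
	enum io_op ops[8];
	int nops;
	unsigned long ondisk_tail;   /* tail stored in the journal superblock */
	unsigned long inmem_tail;    /* tail used when reusing log space */
};

static void issue(struct model_journal *j, enum io_op op)
{
	assert(j->nops < 8);
	j->ops[j->nops++] = op;
}

/* Ordering described in the commit message:
 * 1. flush the cache so checkpointed buffers are on stable storage,
 * 2. write the new tail to the superblock with FUA (durable on completion),
 * 3. only then advance the in-memory tail, which allows space reuse. */
static void update_log_tail(struct model_journal *j, unsigned long new_tail)
{
	issue(j, OP_CACHE_FLUSH);
	issue(j, OP_SB_WRITE_FUA);
	j->ondisk_tail = new_tail;
	j->inmem_tail = new_tail;
}
```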
    • PM / Sleep: JBD and JBD2 missing set_freezable() · 35c80422
      Committed by Nigel Cunningham
      With the latest and greatest changes to the freezer, I started seeing
      panics that were caused by jbd2 running post-process freezing and
      hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
      this back to a lack of set_freezable calls in both jbd and jbd2. Since
      they're clearly meant to be frozen (there are tests for freezing()), I
      submit the following patch to add the missing calls.
      Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • jbd2: protect all log tail updates with j_checkpoint_mutex · a78bb11d
      Committed by Jan Kara
      There are some log tail updates that are not protected by j_checkpoint_mutex.
      Some of these are harmless because they happen during startup or shutdown but
      updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
      really race with other log tail updates (e.g. someone doing
      jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
      protect all log tail updates with j_checkpoint_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: split updating of journal superblock and marking journal empty · 24bcc89c
      Committed by Jan Kara
      There are three cases of updating the journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 21 Feb, 2012: 6 commits
  5. 05 Jan, 2012: 1 commit
    • jbd2: fix hung processes in jbd2_journal_lock_updates() · 9837d8e9
      Committed by Jan Kara
      Toshiyuki Okajima found out that when running
      
      for ((i=0; i < 100000; i++)); do
              if ((i%2 == 0)); then
                      chattr +j /mnt/file
              else
                      chattr -j /mnt/file
              fi
              echo "0" >> /mnt/file
      done
      
      the process sometimes hangs indefinitely in jbd2_journal_lock_updates().
      
      Toshiyuki identified that the following race happens:
      
      jbd2_journal_lock_updates()            |jbd2_journal_stop()
      ---------------------------------------+---------------------------------------
       write_lock(&journal->j_state_lock)    |    .
       ++journal->j_barrier_count            |    .
       spin_lock(&tran->t_handle_lock)       |    .
       atomic_read(&tran->t_updates) //not 0 |
                                             | atomic_dec_and_test(&tran->t_updates)
                                             |    // t_updates = 0
                                             | wake_up(&journal->j_wait_updates)
       prepare_to_wait()                     |    // no process is woken up.
       spin_unlock(&tran->t_handle_lock)     |
       write_unlock(&journal->j_state_lock)  |
       schedule() // never return            |
      
      We fix the problem by first calling prepare_to_wait() and only after that
      checking t_updates in jbd2_journal_lock_updates().
      Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
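      The fix follows the usual rule for sleeping on a condition: get on the wait queue (prepare_to_wait()) before testing the condition, so a wake_up() that arrives in between is not lost. A userspace analogue with POSIX threads is sketched below; it is not jbd2 code, and `struct updates`, `handle_start()`, `handle_stop()`, `lock_updates()` and `handle_worker()` are hypothetical. Here the mutex plays the role that prepare_to_wait() plays in the kernel: because pthread_cond_wait() atomically releases the mutex and sleeps, a decrement to zero cannot slip in between the check and the sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace analogue (hypothetical): count mirrors t_updates and
 * zero mirrors the j_wait_updates wait queue. */
struct updates {
	pthread_mutex_t lock;
	pthread_cond_t  zero;
	int             count;
};

static void handle_start(struct updates *u)
{
	pthread_mutex_lock(&u->lock);
	u->count++;
	pthread_mutex_unlock(&u->lock);
}

/* Decrement under the same lock the waiter holds while checking, and
 * wake waiters when the last running handle finishes. */
static void handle_stop(struct updates *u)
{
	pthread_mutex_lock(&u->lock);
	if (--u->count == 0)
		pthread_cond_broadcast(&u->zero);
	pthread_mutex_unlock(&u->lock);
}

/* Wait until no handles are running. The predicate is re-checked in a
 * loop under the lock, so neither a lost nor a spurious wakeup can
 * leave us sleeping forever or returning early. */
static void lock_updates(struct updates *u)
{
	pthread_mutex_lock(&u->lock);
	while (u->count != 0)
		pthread_cond_wait(&u->zero, &u->lock);
	pthread_mutex_unlock(&u->lock);
}

/* Thread body that finishes one previously started handle. */
static void *handle_worker(void *arg)
{
	handle_stop(arg);
	return NULL;
}
```

      Whichever thread runs first, lock_updates() either sees count == 0 immediately or is already asleep on the condition variable when handle_stop() broadcasts.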
  6. 29 Dec, 2011: 1 commit
    • jbd2: clear revoked flag on buffers before a new transaction started · 1ba37268
      Committed by Yongqiang Yang
      Currently, we clear the revoked flag only when a block is reused. However,
      this can trigger a false journal error. Consider a situation where a block
      is used as a metadata block and is deleted (revoked) in ordered mode, and
      then the block is allocated as a data block to a file. At this moment, the
      user changes the file's journal mode from ordered to journaled and truncates
      the file. The block will be considered re-revoked by the journal because it
      still has the revoked flag pending from the last transaction, and an
      assertion triggers.
      
      We fix the problem by keeping the revoked status more up to date: we clear
      the revoked flag when switching revoke tables to reflect that there are no
      revoked buffers in the current transaction any more.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 06 Dec, 2011: 1 commit
  8. 22 Nov, 2011: 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
  9. 02 Nov, 2011: 2 commits
    • jbd2: Unify log messages in jbd2 code · f2a44523
      Committed by Eryu Guan
      Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the
      same time other jbd2 code prints with "JBD: " prefix. Unify the prefix
      to "JBD2: ".
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd/jbd2: validate sb->s_first in journal_get_superblock() · 8762202d
      Committed by Eryu Guan
      I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
      mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
      image has s_first = 0 in the journal superblock, and the 0 is passed to
      journal->j_head in journal_reset(), then to blocknr in
      cleanup_journal_tail(), where the J_ASSERT finally fails.
      
      So validate s_first after reading journal superblock from disk in
      journal_get_superblock() to ensure s_first is valid.
      
      The following script could reproduce it:
      
      fstype=ext3
      blocksize=1024
      img=$fstype.img
      offset=0
      found=0
      magic="c0 3b 39 98"
      
      dd if=/dev/zero of=$img bs=1M count=8
      mkfs -t $fstype -b $blocksize -F $img
      filesize=`stat -c %s $img`
      while [ $offset -lt $filesize ]
      do
              if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                      echo "Found journal: $offset"
                      found=1
                      break
              fi
              offset=`echo "$offset+$blocksize" | bc`
      done
      
      if [ $found -ne 1 ];then
              echo "Magic \"$magic\" not found"
              exit 1
      fi
      
      dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
      
      mkdir -p ./mnt
      mount -o loop $img ./mnt
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
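      A minimal sketch of such a check, assuming the superblock fields have already been converted to host byte order. `struct jsb_fields` and `journal_first_block_valid()` are hypothetical; the upper bound against s_maxlen is an assumed sanity condition alongside the s_first == 0 case the commit describes, and the real journal_get_superblock() may word its check and message differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the journal superblock, already converted from
 * the big-endian on-disk format to host byte order. */
struct jsb_fields {
	uint32_t s_first;    /* first block of log information */
	uint32_t s_maxlen;   /* total number of blocks in the journal */
};

/* Block 0 of the journal holds the superblock itself, so the log must
 * start after it and strictly before the end of the journal. */
static bool journal_first_block_valid(const struct jsb_fields *sb)
{
	assert(sb != NULL);
	if (sb->s_first == 0 || sb->s_first >= sb->s_maxlen) {
		fprintf(stderr, "JBD2: Invalid start block of journal: %u\n",
			(unsigned int)sb->s_first);
		return false;
	}
	return true;
}
```

      Rejecting the superblock at read time turns the fuzzed image into a clean mount failure instead of a J_ASSERT deep inside checkpointing.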
  10. 27 Oct, 2011: 1 commit
  11. 04 Sep, 2011: 2 commits
    • jbd2: use gfp_t instead of int · d2159fb7
      Committed by Dan Carpenter
      This silences some Sparse warnings:
      fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
      fs/jbd2/transaction.c:135:69:    expected restricted gfp_t [usertype] flags
      fs/jbd2/transaction.c:135:69:    got int [signed] gfp_mask
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: add debugging information to jbd2_journal_dirty_metadata() · 9ea7a0df
      Committed by Theodore Ts'o
      Add debugging information in case jbd2_journal_dirty_metadata() is
      called with a buffer_head which didn't have
      jbd2_journal_get_write_access() called on it, or if the journal_head
      has the wrong transaction in it.  In addition, return an error code.
      This won't change anything for ocfs2, which will BUG_ON() the non-zero
      exit code.
      
      For ext4, the caller of this function is ext4_handle_dirty_metadata(),
      and on seeing a non-zero return code, will call __ext4_journal_stop(),
      which will print the function and line number of the (buggy) calling
      function and abort the journal.  This will allow us to recover instead
      of halting on a BUG, which is better from a robustness and reliability
      point of view.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  12. 11 Jul, 2011: 1 commit
  13. 28 Jun, 2011: 1 commit
    • jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Committed by Tao Ma
      In journal checkpoint, we write the buffer and wait for its finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the write request will be delayed for quite a long time and
      all the tasks which are waiting for journal space will end with errors like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch uses WRITE_SYNC in __flush_batch() so that the requests are
      moved into the sync queue and handled by cfq in a timely manner. We also use
      the new plugging mechanism, so that all the WRITE_SYNC requests can be
      submitted as a whole when we unplug.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: Robin Dong <sanbai@taobao.com>
  14. 14 Jun, 2011: 1 commit
    • jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Committed by Jan Kara
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head(), which adds an even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitly via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jbd2_journal_remove_checkpoint() can free journal_head, which requires
      modifying a few callers. Also, we have to be careful because once
      journal_head is removed, buffer_head might be freed as well, so we
      have to get our own buffer_head reference where it matters.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
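      The reference-counting scheme can be modeled in a few lines of C. `struct jh`, `jh_grab()`, `jh_put()`, `jh_file()` and `jh_unfile()` are hypothetical simplifications of journal_head and its helpers, with no locking. The key property is that filing a buffer on a transaction takes a reference of its own, so the put performed by a racing task (TASK2 in the trace above) cannot free the structure while TASK1's transaction still has it filed; whoever drops the last reference frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct journal_head. */
struct jh {
	int   b_jcount;        /* references, including the transaction's own */
	void *b_transaction;   /* non-NULL while filed on a transaction */
};

static struct jh *jh_grab(struct jh *jh)
{
	jh->b_jcount++;
	return jh;
}

/* Drop a reference; the last put frees the structure, whoever makes it.
 * Returns true if the journal_head was freed. */
static bool jh_put(struct jh *jh)
{
	assert(jh->b_jcount > 0);
	if (--jh->b_jcount == 0) {
		assert(jh->b_transaction == NULL);
		free(jh);
		return true;
	}
	return false;
}

/* Filing onto a transaction takes the transaction's own reference... */
static void jh_file(struct jh *jh, void *transaction)
{
	jh->b_transaction = transaction;
	jh_grab(jh);
}

/* ...and unfiling drops it, possibly freeing jh, so callers must not
 * touch jh afterwards when this returns true. */
static bool jh_unfile(struct jh *jh)
{
	jh->b_transaction = NULL;
	return jh_put(jh);
}
```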
  15. 13 Jun, 2011: 1 commit
  16. 26 May, 2011: 1 commit
  17. 25 May, 2011: 1 commit
  18. 24 May, 2011: 2 commits
    • jbd2: Add function jbd2_trans_will_send_data_barrier() · bbd2be36
      Committed by Jan Kara
      Provide a function which returns whether a transaction with given tid
      will send a flush to the filesystem device.  The function will be used
      by ext4 to detect whether fsync needs to send a separate flush or not.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix sending of data flush on journal commit · 81be12c8
      Committed by Jan Kara
      In data=ordered mode, it's theoretically possible (however rare) that
      an inode is filed to transaction's t_inode_list and a flusher thread
      writes all the data and inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to file system device when it is different from the journal
      device (because data can still be in disk cache only).
      
      Fix the problem by setting a flag in a transaction when some inode is added
      to it, and then sending a disk flush in the commit code when the flag is set.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
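      The flag-based fix can be sketched as a tiny C model. `struct model_txn`, `struct model_journal_devs`, `file_inode()` and `commit_flushes_fs_dev()` are hypothetical; the sketch also assumes, as the commit message implies, that a separate flush is only needed when the filesystem device differs from the journal device and cache flushes are enabled. Because the flag is set when the inode is filed, it survives even if the flusher writes the data and the inode is reclaimed before commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: only the transaction state relevant to the data flush. */
struct model_txn {
	bool need_data_flush;   /* set once any inode is filed to the transaction */
};

struct model_journal_devs {
	bool fs_dev_differs;    /* filesystem and journal on different devices */
	bool barrier;           /* cache flushes enabled */
};

/* Filing an inode marks the transaction; nothing later clears the mark,
 * even if the inode itself goes away before commit. */
static void file_inode(struct model_txn *t)
{
	t->need_data_flush = true;
}

/* At commit, flush the filesystem device only if it is separate from the
 * journal device (the journal device is assumed to be flushed by the
 * commit itself) and some inode was filed. */
static bool commit_flushes_fs_dev(const struct model_journal_devs *j,
				  const struct model_txn *t)
{
	return j->barrier && j->fs_dev_differs && t->need_data_flush;
}
```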
  19. 23 May, 2011: 1 commit
    • jbd2: Fix the wrong calculation of t_max_wait in update_t_max_wait · 28e35e42
      Committed by Tao Ma
      t_max_wait was added in commit 8e85fb3f to indicate how long we
      were waiting for a new transaction to start. In commit 6d0bf005,
      it was moved to another function named update_t_max_wait to
      avoid a build warning. The problem is that the original 'ts' was
      initialized at the start of start_this_handle(), so t_max_wait was
      calculated correctly, whereas after this change ts is initialized
      within update_t_max_wait() itself, so t_max_wait can never be
      calculated correctly.
      
      This patch moves the initialization of ts back to the beginning
      of start_this_handle() and passes it to update_t_max_wait(), so
      that t_max_wait is calculated correctly and the build warning is
      still avoided.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
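      A minimal sketch of the corrected pattern, in which the caller samples the start time once and passes it in. `struct model_txn_stats` and `model_update_t_max_wait()` are hypothetical, time is a plain tick counter, and jiffies wraparound handling (time_after() and friends in the kernel) is deliberately left out.

```c
#include <assert.h>

/* Hypothetical per-transaction statistic. */
struct model_txn_stats {
	unsigned long t_max_wait;   /* longest wait seen so far, in ticks */
};

/* ts must be captured by the caller at the very start of
 * start_this_handle(), before any waiting; now is when the handle finally
 * joined the transaction. Sampling ts inside this helper instead would make
 * now - ts close to zero, which is the bug described above. */
static void model_update_t_max_wait(struct model_txn_stats *t,
				    unsigned long ts, unsigned long now)
{
	unsigned long waited = now - ts;

	if (waited > t->t_max_wait)
		t->t_max_wait = waited;
}
```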
  20. 17 May, 2011: 1 commit
  21. 09 May, 2011: 2 commits
  22. 02 May, 2011: 1 commit
    • jbd2: fix fsync() tid wraparound bug · deeeaf13
      Committed by Theodore Ts'o
      If an application program does not make any changes to the indirect
      blocks or extent tree, i_datasync_tid will not get updated.  If there
      are enough commits (i.e., 2**31) such that tid_geq()'s calculations
      wrap, and there isn't a currently active transaction at the time of
      the fdatasync() call, this can end up triggering a BUG_ON in
      fs/jbd2/commit.c:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      It's pretty rare that this can happen, since it requires the use of
      fdatasync() plus *very* frequent and excessive use of fsync().  But
      with the right workload, it can.
      
      We fix this by replacing the use of tid_geq() with an equality test,
      since there's only one transaction id that it is valid for us to wait on
      until it is committed: namely, the currently running transaction
      (if it exists).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
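      The wraparound can be reproduced in a few lines of C. The `tid_geq()` below is assumed to mirror jbd2's signed-difference comparison (treat the 32-bit difference as a signed int), and `wait_target_is_running()` is a hypothetical helper expressing the equality test the fix switches to. Once a stale tid lags the current sequence by more than 2^31, the signed difference flips sign and the stale tid compares as being in the future.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int tid_t;   /* 32-bit transaction id (assumed) */

/* "x is at or after y", wraparound-aware only while x and y are less than
 * 2^31 apart. The unsigned-to-int conversion is implementation-defined in
 * C; GCC and Clang reduce it modulo 2^32, which this relies on. */
static bool tid_geq(tid_t x, tid_t y)
{
	int difference = (int)(x - y);

	return difference >= 0;
}

/* Hypothetical helper for the fix's idea: the only tid worth waiting on
 * is the running transaction's, so compare for equality, not order. */
static bool wait_target_is_running(tid_t target, bool have_running,
				   tid_t running_tid)
{
	return have_running && running_tid == target;
}
```

      For example, a stale i_datasync_tid of 100 checked against a sequence number of 100 + 0x80000005 makes tid_geq() report true, whereas the equality test correctly says there is nothing to wait for.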
  23. 06 Apr, 2011: 1 commit
    • jbd2: fix potential memory leak on transaction commit · 6cba611e
      Committed by Zhang Huan
      There is a potential memory leak of a journal head in function
      jbd2_journal_commit_transaction. The problem is that JBD2 will not
      reclaim the journal head of the commit record if an error occurs or the
      journal is aborted.
      
      I use the following script to reproduce this issue, on a RHEL6
      system. I found it very easy to reproduce with async commit enabled.
      
      mount /dev/sdb /mnt -o journal_checksum,journal_async_commit
      touch /mnt/xxx
      echo offline > /sys/block/sdb/device/state
      sync
      umount /mnt
      rmmod ext4
      rmmod jbd2
      
      Removal of the jbd2 module makes slab complain that
      "cache `jbd2_journal_head': can't free all objects".
      Signed-off-by: Zhang Huan <zhhuan@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>