1. 21 Feb 2012 · 2 commits
  2. 05 Jan 2012 · 1 commit
    • jbd2: fix hung processes in jbd2_journal_lock_updates() · 9837d8e9
      Committed by Jan Kara
      Toshiyuki Okajima found out that when running
      
      for ((i=0; i < 100000; i++)); do
              if ((i%2 == 0)); then
                      chattr +j /mnt/file
              else
                      chattr -j /mnt/file
              fi
              echo "0" >> /mnt/file
      done
      
      a process sometimes hangs indefinitely in jbd2_journal_lock_updates().
      
      Toshiyuki identified that the following race happens:
      
      jbd2_journal_lock_updates()            |jbd2_journal_stop()
      ---------------------------------------+---------------------------------------
       write_lock(&journal->j_state_lock)    |    .
       ++journal->j_barrier_count            |    .
       spin_lock(&tran->t_handle_lock)       |    .
       atomic_read(&tran->t_updates) //not 0 |
                                             | atomic_dec_and_test(&tran->t_updates)
                                             |    // t_updates = 0
                                             | wake_up(&journal->j_wait_updates)
       prepare_to_wait()                     |    // no process is woken up.
       spin_unlock(&tran->t_handle_lock)     |
       write_unlock(&journal->j_state_lock)  |
       schedule() // never return            |
      
      We fix the problem by first calling prepare_to_wait() and only after that
      checking t_updates in jbd2_journal_lock_updates(); see the sketch below.
      Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9837d8e9
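
      A minimal sketch of the corrected wait loop, based on the description
      above (illustrative, not the verbatim patch): registering on
      j_wait_updates before re-checking t_updates closes the window in which
      the wake_up() from jbd2_journal_stop() could be missed.

          DEFINE_WAIT(wait);

          write_lock(&journal->j_state_lock);
          ++journal->j_barrier_count;

          /* Wait until there are no running updates */
          while (1) {
                  transaction_t *transaction = journal->j_running_transaction;

                  if (!transaction)
                          break;

                  spin_lock(&transaction->t_handle_lock);
                  /* Register on the waitqueue first ... */
                  prepare_to_wait(&journal->j_wait_updates, &wait,
                                  TASK_UNINTERRUPTIBLE);
                  /* ... and only then check t_updates, so a concurrent
                   * wake_up() in jbd2_journal_stop() cannot be missed. */
                  if (!atomic_read(&transaction->t_updates)) {
                          spin_unlock(&transaction->t_handle_lock);
                          finish_wait(&journal->j_wait_updates, &wait);
                          break;
                  }
                  spin_unlock(&transaction->t_handle_lock);
                  write_unlock(&journal->j_state_lock);
                  schedule();
                  finish_wait(&journal->j_wait_updates, &wait);
                  write_lock(&journal->j_state_lock);
          }
          write_unlock(&journal->j_state_lock);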
  3. 29 Dec 2011 · 1 commit
    • jbd2: clear revoked flag on buffers before a new transaction started · 1ba37268
      Committed by Yongqiang Yang
      Currently, we clear the revoked flag only when a block is reused.  However,
      this can trigger a false journal error.  Consider a situation where a block
      is used as a metadata block and is deleted (revoked) in ordered mode, and
      the block is then allocated as a data block to a file.  At this moment the
      user changes the file's journal mode from ordered to journaled and
      truncates the file.  The block will be considered re-revoked by the journal
      because it still has the revoked flag pending from the last transaction,
      and an assertion triggers.
      
      We fix the problem by keeping the revoked status more up to date: we clear
      the revoked flag when switching revoke tables, to reflect that there are no
      revoked buffers in the current transaction any more (see the sketch below).
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      1ba37268
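
      A sketch of what "clear the revoked flag when switching revoke tables"
      can look like.  The walk below assumes the private revoke-table
      structures in fs/jbd2/revoke.c (hash_size, hash_table, per-record
      blocknr); treat the helper name and placement as assumptions rather
      than the exact patch.

          static void clear_buffer_revoked_flags(journal_t *journal)
          {
                  struct jbd2_revoke_table_s *table = journal->j_revoke;
                  int i;

                  for (i = 0; i < table->hash_size; i++) {
                          struct list_head *hash_list = &table->hash_table[i];
                          struct jbd2_revoke_record_s *record;
                          struct buffer_head *bh;

                          list_for_each_entry(record, hash_list, hash) {
                                  /* If the revoked block is still around as a
                                   * buffer, drop its stale "revoked" bit before
                                   * the next transaction starts. */
                                  bh = __find_get_block(journal->j_fs_dev,
                                                        record->blocknr,
                                                        journal->j_blocksize);
                                  if (bh) {
                                          clear_buffer_revoked(bh);
                                          __brelse(bh);
                                  }
                          }
                  }
          }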
  4. 06 Dec 2011 · 1 commit
  5. 22 Nov 2011 · 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing() (see the sketch below).
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      a0acae0e
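
      Putting the bullet points together, the updated helper plausibly looks
      like this (a sketch based on the description above, not necessarily the
      exact tree state):

          static inline bool try_to_freeze(void)
          {
                  might_sleep();          /* new: freezing may schedule */
                  if (likely(!freezing(current)))
                          return false;
                  /* Relay whether __refrigerator() actually scheduled us out;
                   * the bool argument follows the kernel's signature. */
                  return __refrigerator(false);
          }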
  6. 02 Nov 2011 · 2 commits
    • jbd2: Unify log messages in jbd2 code · f2a44523
      Committed by Eryu Guan
      Some jbd2 code prints kernel messages with a "JBD2: " prefix, while other
      jbd2 code prints with a "JBD: " prefix.  Unify the prefix to "JBD2: ".
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f2a44523
    • jbd/jbd2: validate sb->s_first in journal_get_superblock() · 8762202d
      Committed by Eryu Guan
      I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
      mounting a fuzzed (fsfuzz) ext3 image.  It turns out that the corrupted
      ext3 image has s_first = 0 in the journal superblock, and the 0 is passed
      to journal->j_head in journal_reset(), then to blocknr in
      cleanup_journal_tail(), where the J_ASSERT finally fails.
      
      So validate s_first after reading the journal superblock from disk in
      journal_get_superblock(), to ensure s_first is sane (see the sketch below,
      after the reproducer).
      
      The following script could reproduce it:
      
      fstype=ext3
      blocksize=1024
      img=$fstype.img
      offset=0
      found=0
      magic="c0 3b 39 98"
      
      dd if=/dev/zero of=$img bs=1M count=8
      mkfs -t $fstype -b $blocksize -F $img
      filesize=`stat -c %s $img`
      while [ $offset -lt $filesize ]
      do
              if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                      echo "Found journal: $offset"
                      found=1
                      break
              fi
              offset=`echo "$offset+$blocksize" | bc`
      done
      
      if [ $found -ne 1 ];then
              echo "Magic \"$magic\" not found"
              exit 1
      fi
      
      dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
      
      mkdir -p ./mnt
      mount -o loop $img ./mnt
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      8762202d
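
      A sketch of the kind of check described above, placed in
      journal_get_superblock() (the bounds and message text are illustrative,
      drawn from the description rather than the exact patch):

          /* sb is the on-disk journal superblock just read in */
          if (be32_to_cpu(sb->s_first) == 0 ||
              be32_to_cpu(sb->s_first) >= journal->j_maxlen) {
                  printk(KERN_WARNING
                         "JBD: Invalid start block of journal: %u\n",
                         be32_to_cpu(sb->s_first));
                  err = -EINVAL;
                  goto out;
          }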
  7. 27 Oct 2011 · 1 commit
  8. 04 Sep 2011 · 2 commits
    • jbd2: use gfp_t instead of int · d2159fb7
      Committed by Dan Carpenter
      This silences some Sparse warnings:
      fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
      fs/jbd2/transaction.c:135:69:    expected restricted gfp_t [usertype] flags
      fs/jbd2/transaction.c:135:69:    got int [signed] gfp_mask
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      d2159fb7
    • jbd2: add debugging information to jbd2_journal_dirty_metadata() · 9ea7a0df
      Committed by Theodore Ts'o
      Add debugging information in case jbd2_journal_dirty_metadata() is
      called with a buffer_head which didn't have
      jbd2_journal_get_write_access() called on it, or if the journal_head
      has the wrong transaction in it.  In addition, return an error code.
      This won't change anything for ocfs2, which will BUG_ON() the non-zero
      exit code.
      
      For ext4, the caller of this function is ext4_handle_dirty_metadata(),
      which, on seeing a non-zero return code, calls __ext4_journal_stop(); that
      prints the function and line number of the (buggy) calling function and
      aborts the journal (see the sketch below).  This allows us to recover
      instead of BUG-halting, which is better from a robustness and reliability
      point of view.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9ea7a0df
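
      A sketch of the caller-side pattern this enables on the ext4 side; the
      helper name ext4_journal_abort_handle() and the where/line arguments
      follow ext4's existing conventions, but the exact body is an assumption,
      not the real ext4 code:

          err = jbd2_journal_dirty_metadata(handle, bh);
          if (err) {
                  /* Report the buggy call site and abort the journal instead
                   * of halting the machine with a BUG_ON(). */
                  ext4_journal_abort_handle(where, line, __func__,
                                            bh, handle, err);
          }
          return err;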
  9. 11 Jul 2011 · 1 commit
  10. 28 Jun 2011 · 1 commit
    • jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Committed by Tao Ma
      In journal checkpoint we write out buffers and wait for them to finish.
      But in cfq the async queue has a very low priority; in our test, if there
      are too many sync queues and every queue is filled up with requests, the
      write requests are delayed for quite a long time and all the tasks waiting
      for journal space end up with hung-task warnings like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch uses WRITE_SYNC in __flush_batch() so that the requests are
      moved into the sync queue and handled by cfq in a timely fashion.  We also
      use the new plugging API, so that all the WRITE_SYNC requests can be
      submitted as one batch when we unplug (see the sketch below).
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: Robin Dong <sanbai@taobao.com>
      d3ad8434
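
      A sketch of __flush_batch() with the two changes described above,
      explicit plugging plus WRITE_SYNC (close in spirit to the patch, but
      treat the details as illustrative):

          static void __flush_batch(journal_t *journal, int *batch_count)
          {
                  int i;
                  struct blk_plug plug;

                  blk_start_plug(&plug);
                  for (i = 0; i < *batch_count; i++)
                          /* WRITE_SYNC keeps these out of cfq's starved
                           * async queue */
                          write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
                  blk_finish_plug(&plug);  /* submit the whole batch at once */

                  for (i = 0; i < *batch_count; i++) {
                          struct buffer_head *bh = journal->j_chkpt_bhs[i];
                          clear_buffer_jwrite(bh);
                          __brelse(bh);
                  }
                  *batch_count = 0;
          }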
  11. 14 Jun 2011 · 1 commit
    • jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Committed by Jan Kara
      jbd2_journal_remove_journal_head() can oops when trying to access the
      journal_head returned by bh2jh().  This is caused, for example, by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and that
      the buffer is not part of any transaction, and thus frees the journal_head
      before TASK1 gets to doing so.  Note that even the buffer_head can be
      released by try_to_free_buffers() after jbd2_journal_put_journal_head(),
      which adds an even larger opportunity for an oops (but I didn't see this
      happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount).  That way we don't have to remove the journal_head
      explicitly via jbd2_journal_remove_journal_head() and instead just remove
      the journal_head when b_jcount drops to zero (see the sketch below).  The
      result of this is that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jbd2_journal_remove_checkpoint() can free the journal_head, which needs
      modification of a few callers.  Also we have to be careful because once
      the journal_head is removed, the buffer_head might be freed as well.  So
      we have to get our own buffer_head reference where it matters.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      de1b7941
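
      A sketch of the refcounting rule this establishes, roughly the shape of
      jbd2_journal_put_journal_head() after the change (simplified and partly
      assumed, e.g. the name of the static teardown helper):

          void jbd2_journal_put_journal_head(struct journal_head *jh)
          {
                  struct buffer_head *bh = jh2bh(jh);

                  jbd_lock_bh_journal_head(bh);
                  J_ASSERT_JH(jh, jh->b_jcount > 0);
                  --jh->b_jcount;
                  if (!jh->b_jcount) {
                          /* Last reference gone: detach and free the
                           * journal_head, then drop the buffer_head
                           * reference it was pinning. */
                          __journal_remove_journal_head(bh);
                          jbd_unlock_bh_journal_head(bh);
                          __brelse(bh);
                  } else {
                          jbd_unlock_bh_journal_head(bh);
                  }
          }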
  12. 13 Jun 2011 · 1 commit
  13. 26 May 2011 · 1 commit
  14. 25 May 2011 · 1 commit
  15. 24 May 2011 · 2 commits
    • jbd2: Add function jbd2_trans_will_send_data_barrier() · bbd2be36
      Committed by Jan Kara
      Provide a function which returns whether a transaction with a given tid
      will send a flush to the filesystem device.  The function will be used
      by ext4 to detect whether fsync needs to send a separate flush or not
      (see the sketch below).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      bbd2be36
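
      A sketch of how an fsync path might use the new helper; the surrounding
      calls (jbd2_log_start_commit(), jbd2_log_wait_commit(),
      blkdev_issue_flush()) are real APIs, but the overall flow is an
      illustration of the description above rather than ext4's actual code:

          bool needs_barrier = false;

          if ((journal->j_flags & JBD2_BARRIER) &&
              !jbd2_trans_will_send_data_barrier(journal, commit_tid))
                  needs_barrier = true;   /* the commit won't flush for us */

          jbd2_log_start_commit(journal, commit_tid);
          ret = jbd2_log_wait_commit(journal, commit_tid);
          if (needs_barrier)
                  blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);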
    • jbd2: fix sending of data flush on journal commit · 81be12c8
      Committed by Jan Kara
      
      In data=ordered mode, it's theoretically possible (however rare) that
      an inode is filed to a transaction's t_inode_list, a flusher thread
      writes out all the data, and the inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to the file system device when it is different from the journal
      device (because the data may still be only in the disk cache).
      
      Fix the problem by setting a flag on a transaction when some inode is
      added to it, and then sending a disk flush from the commit code when the
      flag is set (see the sketch below).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      81be12c8
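
      A sketch of the two halves of that mechanism; the flag name
      t_need_data_flush and the exact placement are assumptions drawn from the
      description above:

          /* When an inode is filed for data=ordered writeout: */
          spin_lock(&journal->j_list_lock);
          if (list_empty(&jinode->i_list))
                  list_add(&jinode->i_list, &transaction->t_inode_list);
          transaction->t_need_data_flush = 1;  /* data may hit a non-journal device */
          spin_unlock(&journal->j_list_lock);

          /* In jbd2_journal_commit_transaction(), once the ordered data has
           * been written: */
          if (commit_transaction->t_need_data_flush &&
              (journal->j_fs_dev != journal->j_dev) &&
              (journal->j_flags & JBD2_BARRIER))
                  blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS, NULL);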
  16. 23 May 2011 · 1 commit
    • jbd2: Fix the wrong calculation of t_max_wait in update_t_max_wait · 28e35e42
      Committed by Tao Ma
      t_max_wait was added in commit 8e85fb3f to record how long we were waiting
      for a new transaction to start.  In commit 6d0bf005 the calculation was
      moved into a separate function, update_t_max_wait(), to avoid a build
      warning.  The problem is that originally 'ts' was initialized at the start
      of start_this_handle(), so t_max_wait could be calculated correctly; after
      that change, ts is initialized inside update_t_max_wait() itself, so
      t_max_wait can never be calculated correctly.
      
      This patch moves the initialization of ts back to the beginning of
      start_this_handle() and passes it to update_t_max_wait(), so that the
      value is computed correctly while the build warning is still avoided
      (see the sketch below).
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      28e35e42
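
      A sketch of the resulting shape, with ts sampled once at the top of
      start_this_handle() and passed down (illustrative; the debug-config
      guard and elided portions are assumptions):

          static inline void update_t_max_wait(transaction_t *transaction,
                                               unsigned long ts)
          {
          #ifdef CONFIG_JBD2_DEBUG
                  if (jbd2_journal_enable_debug &&
                      time_after(transaction->t_start, ts)) {
                          ts = jbd2_time_diff(ts, transaction->t_start);
                          spin_lock(&transaction->t_handle_lock);
                          if (ts > transaction->t_max_wait)
                                  transaction->t_max_wait = ts;
                          spin_unlock(&transaction->t_handle_lock);
                  }
          #endif
          }

          static int start_this_handle(journal_t *journal, handle_t *handle,
                                       gfp_t gfp_mask)
          {
                  transaction_t *transaction;
                  unsigned long ts = jiffies;  /* sampled once, before any waiting */

                  /* ... wait for / set up the running transaction (elided) ... */
                  transaction = journal->j_running_transaction;
                  update_t_max_wait(transaction, ts);
                  /* ... */
                  return 0;
          }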
  17. 17 May 2011 · 1 commit
  18. 09 May 2011 · 2 commits
  19. 02 May 2011 · 1 commit
    • jbd2: fix fsync() tid wraparound bug · deeeaf13
      Committed by Theodore Ts'o
      If an application program does not make any changes to the indirect
      blocks or extent tree, i_datasync_tid will not get updated.  If there
      are enough commits (i.e., 2**31) such that tid_geq()'s calculations
      wrap, and there isn't a currently active transaction at the time of
      the fdatasync() call, this can end up triggering a BUG_ON in
      fs/jbd2/commit.c:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      It's pretty rare that this can happen, since it requires the use of
      fdatasync() plus *very* frequent and excessive use of fsync().  But
      with the right workload, it can.
      
      We fix this by replacing the use of tid_geq() with an equality test,
      since there is only one transaction id that it is valid for us to wait on
      until it is committed: namely, that of the currently running transaction
      (if it exists).  See the sketch below.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      deeeaf13
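
      A sketch of the equality test as it might appear in
      __jbd2_log_start_commit() (simplified; the handling of an
      already-committing target tid in the real patch is omitted and the body
      is partly assumed):

          static int __jbd2_log_start_commit(journal_t *journal, tid_t target)
          {
                  /* A commit for this tid has already been requested. */
                  if (journal->j_commit_request == target)
                          return 0;

                  /*
                   * The only transaction we can legitimately wait on is the
                   * one currently running; comparing with tid_geq() could be
                   * fooled once the 32-bit tid space wraps.
                   */
                  if (journal->j_running_transaction &&
                      journal->j_running_transaction->t_tid == target) {
                          journal->j_commit_request = target;
                          wake_up(&journal->j_wait_commit);
                          return 1;
                  }
                  return 0;
          }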
  20. 06 Apr 2011 · 1 commit
    • jbd2: fix potential memory leak on transaction commit · 6cba611e
      Committed by Zhang Huan
      There is a potential memory leak of a journal head in
      jbd2_journal_commit_transaction().  The problem is that JBD2 will not
      reclaim the journal head of the commit record if an error occurs or the
      journal is aborted.
      
      I used the following script to reproduce this issue on a RHEL6
      system.  I found it very easy to reproduce with async commit enabled.
      
      mount /dev/sdb /mnt -o journal_checksum,journal_async_commit
      touch /mnt/xxx
      echo offline > /sys/block/sdb/device/state
      sync
      umount /mnt
      rmmod ext4
      rmmod jbd2
      
      Removing the jbd2 module then makes the slab allocator complain that
      "cache `jbd2_journal_head': can't free all objects".
      Signed-off-by: Zhang Huan <zhhuan@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      6cba611e
  21. 05 Apr 2011 · 1 commit
  22. 31 Mar 2011 · 1 commit
  23. 17 Mar 2011 · 1 commit
  24. 10 Mar 2011 · 1 commit
    • block: kill off REQ_UNPLUG · 721a9602
      Committed by Jens Axboe
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      721a9602
  25. 01 Mar 2011 · 1 commit
  26. 12 Feb 2011 · 1 commit
    • jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831
      Committed by Theodore Ts'o
      On an SMP ARM system running ext4, I've received a report that the
      first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      While investigating possible causes for this problem, I noticed that
      __jbd2_log_start_commit() is getting called with j_state_lock only
      read-locked, in spite of the fact that it can modify j_commit_request.
      Fix this by grabbing the necessary information so we can test whether we
      need to start a new transaction before dropping the read lock, and then
      calling jbd2_log_start_commit(), which will grab the write lock (see the
      sketch below).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e4471831
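
      A sketch of the resulting calling pattern in a caller such as
      start_this_handle() (the need_to_start variable and elisions are
      illustrative):

          int need_to_start = 0;
          tid_t tid;

          read_lock(&journal->j_state_lock);
          /* ... locate the running transaction (elided) ... */
          tid = transaction->t_tid;
          /* Decide under the read lock, but do not touch j_commit_request here. */
          need_to_start = !tid_geq(journal->j_commit_request, tid);
          read_unlock(&journal->j_state_lock);

          if (need_to_start)
                  jbd2_log_start_commit(journal, tid);  /* takes j_state_lock for writing */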
  27. 11 Jan 2011 · 1 commit
  28. 19 Dec 2010 · 5 commits
  29. 17 Dec 2010 · 1 commit
  30. 10 Dec 2010 · 1 commit
  31. 18 Nov 2010 · 1 commit