1. 14 3月, 2012 3 次提交
    • J
      jbd2: fix BH_JWrite setting in checkpointing code · 96c86678
      Jan Kara 提交于
      BH_JWrite bit should be set when buffer is written to the journal. So
      checkpointing shouldn't set this bit when writing out buffer. This didn't
      cause any observable bug since BH_JWrite bit is used only for debugging
      purposes but it's good to have this consistent.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      96c86678
    • J
      jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Jan Kara 提交于
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on a stable storage - especially if buffers were
      written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
      caches. Thus when we update journal superblock effectively removing old
      transaction from journal, this write of superblock can get to stable storage
      before those checkpointed buffers which can result in filesystem corruption
      after a crash. Thus we must unconditionally issue a cache flush before we
      update journal superblock in these cases.
      
      A similar problem can also occur if journal superblock is written only in
      disk's caches, other transaction starts reusing space of the transaction
      cleaned from the log and power failure happens. Subsequent journal replay would
      still try to replay the old transaction but some of it's blocks may be already
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating log tail and we must first write new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      79feb521
    • J
      jbd2: split updating of journal superblock and marking journal empty · 24bcc89c
      Jan Kara 提交于
      There are three case of updating journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      24bcc89c
  2. 21 2月, 2012 2 次提交
  3. 06 12月, 2011 1 次提交
  4. 28 6月, 2011 1 次提交
    • T
      jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Tao Ma 提交于
      In journal checkpoint, we write the buffer and wait for its finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the write request will be delayed for quite a long time and
      all the tasks which are waiting for journal space will end with errors like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
      be moved into sync queue and handled by cfq timely. We also use the new plug,
      sot that all the WRITE_SYNC requests can be given as a whole when we unplug it.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: NRobin Dong <sanbai@taobao.com>
      d3ad8434
  5. 14 6月, 2011 1 次提交
    • J
      jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Jan Kara 提交于
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head() which adds even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitely via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jdb2_journal_remove_checkpoint() can free journal_head which needs
      modification of a few callers. Also we have to be careful because once
      journal_head is removed, buffer_head might be freed as well. So we
      have to get our own buffer_head reference where it matters.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      de1b7941
  6. 28 10月, 2010 1 次提交
  7. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  8. 18 8月, 2010 1 次提交
    • C
      remove SWRITE* I/O types · 9cb569d6
      Christoph Hellwig 提交于
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements this semantic
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9cb569d6
  9. 04 8月, 2010 1 次提交
  10. 02 8月, 2010 1 次提交
  11. 29 4月, 2010 1 次提交
  12. 23 12月, 2009 2 次提交
    • T
      ext4, jbd2: Add barriers for file systems with exernal journals · cc3e1bea
      Theodore Ts'o 提交于
      This is a bit complicated because we are trying to optimize when we
      send barriers to the fs data disk.  We could just throw in an extra
      barrier to the data disk whenever we send a barrier to the journal
      disk, but that's not always strictly necessary.
      
      We only need to send a barrier during a commit when there are data
      blocks which are must be written out due to an inode written in
      ordered mode, or if fsync() depends on the commit to force data blocks
      to disk.  Finally, before we drop transactions from the beginning of
      the journal during a checkpoint operation, we need to guarantee that
      any blocks that were flushed out to the data disk are firmly on the
      rust platter before we drop the transaction from the journal.
      
      Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cc3e1bea
    • T
      71f2be21
  13. 30 9月, 2009 1 次提交
  14. 17 6月, 2009 1 次提交
  15. 07 11月, 2008 1 次提交
  16. 05 11月, 2008 1 次提交
  17. 07 11月, 2008 1 次提交
  18. 11 10月, 2008 1 次提交
    • H
      jbd2: fix error handling for checkpoint io · 44519faf
      Hidehiro Kawai 提交于
      When a checkpointing IO fails, current JBD2 code doesn't check the
      error and continue journaling.  This means latest metadata can be
      lost from both the journal and filesystem.
      
      This patch leaves the failed metadata blocks in the journal space
      and aborts journaling in the case of jbd2_log_do_checkpoint().
      To achieve this, we need to do:
      
      1. don't remove the failed buffer from the checkpoint list where in
         the case of __try_to_free_cp_buf() because it may be released or
         overwritten by a later transaction
      2. jbd2_log_do_checkpoint() is the last chance, remove the failed
         buffer from the checkpoint list and abort the journal
      3. when checkpointing fails, don't update the journal super block to
         prevent the journaled contents from being cleaned.  For safety,
         don't update j_tail and j_tail_sequence either
      4. when checkpointing fails, notify this error to the ext4 layer so
         that ext4 don't clear the needs_recovery flag, otherwise the
         journaled contents are ignored and cleaned in the recovery phase
      5. if the recovery fails, keep the needs_recovery flag
      6. prevent jbd2_cleanup_journal_tail() from being called between
         __jbd2_journal_drop_transaction() and jbd2_journal_abort()
         (a possible race issue between jbd2_log_do_checkpoint()s called by
         jbd2_journal_flush() and __jbd2_log_wait_for_space())
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      44519faf
  19. 06 10月, 2008 1 次提交
  20. 09 10月, 2008 1 次提交
  21. 12 7月, 2008 1 次提交
  22. 30 1月, 2008 1 次提交
    • N
      spinlock: lockbreak cleanup · 95c354fe
      Nick Piggin 提交于
      The break_lock data structure and code for spinlocks is quite nasty.
      Not only does it double the size of a spinlock but it changes locking to
      a potentially less optimal trylock.
      
      Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
      __raw_spin_is_contended that uses the lock data itself to determine whether
      there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
      not set.
      
      Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
      decouple it from the spinlock implementation, and make it typesafe (rwlocks
      do not have any need_lockbreak sites -- why do they even get bloated up
      with that break_lock then?).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      95c354fe
  23. 29 1月, 2008 2 次提交
    • J
      jbd2: jbd2 stats through procfs · 8e85fb3f
      Johann Lombardi 提交于
      The patch below updates the jbd stats patch to 2.6.20/jbd2.
      The initial patch was posted by Alex Tomas in December 2005
      (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
      It provides statistics via procfs such as transaction lifetime and size.
      
      Sometimes, investigating performance problems, i find useful to have
      stats from jbd about transaction's lifetime, size, etc. here is a
      patch for review and inclusion probably.
      
      for example, stats after creation of 3M files in htree directory:
      
      [root@bob ~]# cat /proc/fs/jbd/sda/history
      R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
      R    261   8260  2720  0     0     750   9892   8170  8187
      C    259                                                    750   0     4885  1
      R    262   20    2200  10    0     770   9836   8170  8187
      R    263   30    2200  10    0     3070  9812   8170  8187
      R    264   0     5000  10    0     1340  0      0     0
      C    261                                                    8240  3212  4957  0
      R    265   8260  1470  0     0     4640  9854   8170  8187
      R    266   0     5000  10    0     1460  0      0     0
      C    262                                                    8210  2989  4868  0
      R    267   8230  1490  10    0     4440  9875   8171  8188
      R    268   0     5000  10    0     1260  0      0     0
      C    263                                                    7710  2937  4908  0
      R    269   7730  1470  10    0     3330  9841   8170  8187
      R    270   0     5000  10    0     830   0      0     0
      C    265                                                    8140  3234  4898  0
      C    267                                                    720   0     4849  1
      R    271   8630  2740  20    0     740   9819   8170  8187
      C    269                                                    800   0     4214  1
      R    272   40    2170  10    0     830   9716   8170  8187
      R    273   40    2280  0     0     3530  9799   8170  8187
      R    274   0     5000  10    0     990   0      0     0
      
      
      where,
      
      R     - line for transaction's life from T_RUNNING to T_FINISHED
      C     - line for transaction's checkpointing
      tid   - transaction's id
      wait  - for how long we were waiting for new transaction to start
               (the longest period journal_start() took in this transaction)
      run   - real transaction's lifetime (from T_RUNNING to T_LOCKED
      lock  - how long we were waiting for all handles to close
               (time the transaction was in T_LOCKED)
      flush - how long it took to flush all data (data=ordered)
      log   - how long it took to write the transaction to the log
      hndls - how many handles got to the transaction
      block - how many blocks got to the transaction
      inlog - how many blocks are written to the log (block + descriptors)
      ctime - how long it took to checkpoint the transaction
      write - how many blocks have been written during checkpointing
      drop  - how many blocks have been dropped during checkpointing
      close - how many running transactions have been closed to checkpoint this one
      
      all times are in msec.
      
      
      [root@bob ~]# cat /proc/fs/jbd/sda/info
      280 transaction, each upto 8192 blocks
      average:
        1633ms waiting for transaction
        3616ms running transaction
        5ms transaction was being locked
        1ms flushing data (in ordered mode)
        1799ms logging transaction
        11781 handles per transaction
        5629 blocks per transaction
        5641 logged blocks per transaction
      Signed-off-by: NJohann Lombardi <johann.lombardi@bull.net>
      Signed-off-by: NMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      8e85fb3f
    • J
      jbd2: Fix assertion failure in fs/jbd2/checkpoint.c · f5a7a6b0
      Jan Kara 提交于
      Before we start committing a transaction, we call
      __journal_clean_checkpoint_list() to cleanup transaction's written-back
      buffers.
      
      If this call happens to remove all of them (and there were already some
      buffers), __journal_remove_checkpoint() will decide to free the transaction
      because it isn't (yet) a committing transaction and soon we fail some
      assertion - the transaction really isn't ready to be freed :).
      
      We change the check in __journal_remove_checkpoint() to free only a
      transaction in T_FINISHED state.  The locking there is subtle though (as
      everywhere in JBD ;().  We use j_list_lock to protect the check and a
      subsequent call to __journal_drop_transaction() and do the same in the end
      of journal_commit_transaction() which is the only place where a transaction
      can get to T_FINISHED state.
      
      Probably I'm too paranoid here and such locking is not really necessary -
      checkpoint lists are processed only from log_do_checkpoint() where a
      transaction must be already committed to be processed or from
      __journal_clean_checkpoint_list() where kjournald itself calls it and thus
      transaction cannot change state either.  Better be safe if something
      changes in future...
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      f5a7a6b0
  24. 09 5月, 2007 1 次提交
  25. 12 10月, 2006 2 次提交
  26. 27 9月, 2006 3 次提交
  27. 23 6月, 2006 1 次提交
    • J
      [PATCH] JBD: split checkpoint lists · 78ce89c9
      Jan Kara 提交于
      Split the checkpoint list of the transaction into two lists.  In the first
      list we keep the buffers that need to be submitted for IO.  In the second
      list are kept buffers that were already submitted and we just have to wait
      for the IO to complete.  This should simplify a handling of checkpoint
      lists a bit and can eventually be also a performance gain.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      78ce89c9
  28. 23 3月, 2006 1 次提交
  29. 15 2月, 2006 1 次提交
  30. 19 1月, 2006 1 次提交
  31. 07 1月, 2006 1 次提交
  32. 08 9月, 2005 1 次提交