1. 29 7月, 2015 1 次提交
  2. 16 6月, 2015 1 次提交
  3. 15 6月, 2015 1 次提交
    • D
      jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() · b4f1afcd
      Dmitry Monakhov 提交于
      jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start()
      So allocations should be done with GFP_NOFS
      
      [Full stack trace snipped from 3.10-rh7]
      [<ffffffff815c4bd4>] dump_stack+0x19/0x1b
      [<ffffffff8105dba1>] warn_slowpath_common+0x61/0x80
      [<ffffffff8105dcca>] warn_slowpath_null+0x1a/0x20
      [<ffffffff815c2142>] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17
      [<ffffffff8119c045>] kmem_cache_alloc+0x55/0x210
      [<ffffffff811477f5>] ? mempool_alloc_slab+0x15/0x20
      [<ffffffff811477f5>] mempool_alloc_slab+0x15/0x20
      [<ffffffff81147939>] mempool_alloc+0x69/0x170
      [<ffffffff815cb69e>] ? _raw_spin_unlock_irq+0xe/0x20
      [<ffffffff8109160d>] ? finish_task_switch+0x5d/0x150
      [<ffffffff811f1a8e>] bio_alloc_bioset+0x1be/0x2e0
      [<ffffffff8127ee49>] blkdev_issue_flush+0x99/0x120
      [<ffffffffa019a733>] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL
      [<ffffffffa019aca1>] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2]
      [<ffffffffa019afc7>] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2]
      [<ffffffffa01952d8>] start_this_handle+0x2d8/0x550 [jbd2]
      [<ffffffff811b02a9>] ? __memcg_kmem_put_cache+0x29/0x30
      [<ffffffff8119c120>] ? kmem_cache_alloc+0x130/0x210
      [<ffffffffa019573a>] jbd2__journal_start+0xba/0x190 [jbd2]
      [<ffffffff811532ce>] ? lru_cache_add+0xe/0x10
      [<ffffffffa01c9549>] ? ext4_da_write_begin+0xf9/0x330 [ext4]
      [<ffffffffa01f2c77>] __ext4_journal_start_sb+0x77/0x160 [ext4]
      [<ffffffffa01c9549>] ext4_da_write_begin+0xf9/0x330 [ext4]
      [<ffffffff811446ec>] generic_file_buffered_write_iter+0x10c/0x270
      [<ffffffff81146918>] __generic_file_write_iter+0x178/0x390
      [<ffffffff81146c6b>] __generic_file_aio_write+0x8b/0xb0
      [<ffffffff81146ced>] generic_file_aio_write+0x5d/0xc0
      [<ffffffffa01bf289>] ext4_file_write+0xa9/0x450 [ext4]
      [<ffffffff811c31d9>] ? pipe_read+0x379/0x4f0
      [<ffffffff811b93f0>] do_sync_write+0x90/0xe0
      [<ffffffff811b9b6d>] vfs_write+0xbd/0x1e0
      [<ffffffff811ba5b8>] SyS_write+0x58/0xb0
      [<ffffffff815d4799>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      b4f1afcd
  4. 18 9月, 2014 2 次提交
  5. 17 9月, 2014 1 次提交
    • D
      jbd2: jbd2_log_wait_for_space improve error detetcion · 1245799f
      Dmitry Monakhov 提交于
      If EIO happens after we have dropped j_state_lock, we won't notice
      that the journal has been aborted.  So it is reasonable to move this
      check after we have grabbed the j_checkpoint_mutex and re-grabbed the
      j_state_lock.  This patch helps to prevent false positive complain
      after EIO.
      
      #DMESG:
      __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available
      __jbd2_log_wait_for_space: no way to get more journal space in ram1-8
      ------------[ cut here ]------------
      WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 15 PID: 6739 Comm: fsstress Tainted: G        W      3.17.0-rc2-00429-g684de574 #139
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8
       0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898
       ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0
      Call Trace:
       [<ffffffff815c1a8c>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200
       [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0
       [<ffffffff810990e5>] ? local_clock+0x25/0x30
       [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180
       [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0
       [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330
       [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110
       [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330
       [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
       [<ffffffff812025bb>] ext4_create+0x8b/0x150
       [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0
       [<ffffffff8118097b>] do_last+0x7db/0xcf0
       [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50
       [<ffffffff811845d2>] path_openat+0x242/0x590
       [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140
       [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0
       [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140
       [<ffffffff81172f20>] do_sys_open+0x170/0x220
       [<ffffffff8117300e>] SyS_open+0x1e/0x20
       [<ffffffff811715d6>] SyS_creat+0x16/0x20
       [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b
      ---[ end trace cd71c831f82059db ]---
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1245799f
  6. 05 9月, 2014 2 次提交
  7. 02 9月, 2014 2 次提交
  8. 13 6月, 2013 1 次提交
    • P
      jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space() · 0ef54180
      Paul Gortmaker 提交于
      While trying to debug an an issue under extreme I/O loading
      on preempt-rt kernels, the following backtrace was observed
      via SysRQ output:
      
      rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
       ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
       ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
       ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
      Call Trace:
       [<ffffffff8172dc34>] schedule+0x24/0x70
       [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
       [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
       [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
       [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
       [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
       [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
       [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
       [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
       [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
       [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
       [<ffffffff8114370f>] do_rmdir+0xdf/0x120
       [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
       [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
       [<ffffffff817361ae>] ? int_signal+0x12/0x17
       [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
       [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b
      
      What is interesting here, is that we call log_wait_commit, from
      within wait_for_space, but we are still holding the checkpoint_mutex
      as it surrounds mostly the whole of wait_for_space.  And then, as we
      are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
      bit is set, then we will also try to take the same checkpoint_mutex.
      
      It seems that we need to drop the checkpoint_mutex while sitting in
      jbd2_log_wait_commit, if we want to guarantee that progress can be made
      by jbd2_journal_commit_transaction().  There does not seem to be
      anything preempt-rt specific about this, other then perhaps increasing
      the odds of it happening.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0ef54180
  9. 05 6月, 2013 4 次提交
  10. 14 3月, 2012 4 次提交
    • J
      jbd2: remove bh_state lock from checkpointing code · 932bb305
      Jan Kara 提交于
      All accesses to checkpointing entries in journal_head are protected
      by j_list_lock. Thus __jbd2_journal_remove_checkpoint() doesn't really
      need bh_state lock.
      
      Also the only part of journal head that the rest of checkpointing code
      needs to check is jh->b_transaction which is safe to read under
      j_list_lock.
      
      So we can safely remove bh_state lock from all of checkpointing code which
      makes it considerably prettier.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      932bb305
    • J
      jbd2: fix BH_JWrite setting in checkpointing code · 96c86678
      Jan Kara 提交于
      BH_JWrite bit should be set when buffer is written to the journal. So
      checkpointing shouldn't set this bit when writing out buffer. This didn't
      cause any observable bug since BH_JWrite bit is used only for debugging
      purposes but it's good to have this consistent.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      96c86678
    • J
      jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Jan Kara 提交于
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on a stable storage - especially if buffers were
      written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
      caches. Thus when we update journal superblock effectively removing old
      transaction from journal, this write of superblock can get to stable storage
      before those checkpointed buffers which can result in filesystem corruption
      after a crash. Thus we must unconditionally issue a cache flush before we
      update journal superblock in these cases.
      
      A similar problem can also occur if journal superblock is written only in
      disk's caches, other transaction starts reusing space of the transaction
      cleaned from the log and power failure happens. Subsequent journal replay would
      still try to replay the old transaction but some of it's blocks may be already
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating log tail and we must first write new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      79feb521
    • J
      jbd2: split updating of journal superblock and marking journal empty · 24bcc89c
      Jan Kara 提交于
      There are three case of updating journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      24bcc89c
  11. 21 2月, 2012 2 次提交
  12. 06 12月, 2011 1 次提交
  13. 28 6月, 2011 1 次提交
    • T
      jbd2: use WRITE_SYNC in journal checkpoint · d3ad8434
      Tao Ma 提交于
      In journal checkpoint, we write the buffer and wait for its finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the write request will be delayed for quite a long time and
      all the tasks which are waiting for journal space will end with errors like:
      
      INFO: task attr_set:3816 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      attr_set      D ffff880028393480     0  3816      1 0x00000000
       ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
       ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
       ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
      Call Trace:
       [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
       [<ffffffff8103caad>] ? need_resched+0x23/0x2d
       [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
       [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
       [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
       [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
       [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
       [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
       [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
       [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
       [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
       [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
       [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
       [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
       [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
       [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
       [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
       [<ffffffff81146c88>] setxattr+0xb5/0xe8
       [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
       [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
       [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b
      
      So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
      be moved into sync queue and handled by cfq timely. We also use the new plug,
      sot that all the WRITE_SYNC requests can be given as a whole when we unplug it.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: NRobin Dong <sanbai@taobao.com>
      d3ad8434
  14. 14 6月, 2011 1 次提交
    • J
      jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Jan Kara 提交于
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head() which adds even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitely via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jdb2_journal_remove_checkpoint() can free journal_head which needs
      modification of a few callers. Also we have to be careful because once
      journal_head is removed, buffer_head might be freed as well. So we
      have to get our own buffer_head reference where it matters.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      de1b7941
  15. 28 10月, 2010 1 次提交
  16. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  17. 18 8月, 2010 1 次提交
    • C
      remove SWRITE* I/O types · 9cb569d6
      Christoph Hellwig 提交于
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements this semantic
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9cb569d6
  18. 04 8月, 2010 1 次提交
  19. 02 8月, 2010 1 次提交
  20. 29 4月, 2010 1 次提交
  21. 23 12月, 2009 2 次提交
    • T
      ext4, jbd2: Add barriers for file systems with exernal journals · cc3e1bea
      Theodore Ts'o 提交于
      This is a bit complicated because we are trying to optimize when we
      send barriers to the fs data disk.  We could just throw in an extra
      barrier to the data disk whenever we send a barrier to the journal
      disk, but that's not always strictly necessary.
      
      We only need to send a barrier during a commit when there are data
      blocks which are must be written out due to an inode written in
      ordered mode, or if fsync() depends on the commit to force data blocks
      to disk.  Finally, before we drop transactions from the beginning of
      the journal during a checkpoint operation, we need to guarantee that
      any blocks that were flushed out to the data disk are firmly on the
      rust platter before we drop the transaction from the journal.
      
      Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cc3e1bea
    • T
      71f2be21
  22. 30 9月, 2009 1 次提交
  23. 17 6月, 2009 1 次提交
  24. 07 11月, 2008 1 次提交
  25. 05 11月, 2008 1 次提交
  26. 07 11月, 2008 1 次提交
  27. 11 10月, 2008 1 次提交
    • H
      jbd2: fix error handling for checkpoint io · 44519faf
      Hidehiro Kawai 提交于
      When a checkpointing IO fails, current JBD2 code doesn't check the
      error and continue journaling.  This means latest metadata can be
      lost from both the journal and filesystem.
      
      This patch leaves the failed metadata blocks in the journal space
      and aborts journaling in the case of jbd2_log_do_checkpoint().
      To achieve this, we need to do:
      
      1. don't remove the failed buffer from the checkpoint list where in
         the case of __try_to_free_cp_buf() because it may be released or
         overwritten by a later transaction
      2. jbd2_log_do_checkpoint() is the last chance, remove the failed
         buffer from the checkpoint list and abort the journal
      3. when checkpointing fails, don't update the journal super block to
         prevent the journaled contents from being cleaned.  For safety,
         don't update j_tail and j_tail_sequence either
      4. when checkpointing fails, notify this error to the ext4 layer so
         that ext4 don't clear the needs_recovery flag, otherwise the
         journaled contents are ignored and cleaned in the recovery phase
      5. if the recovery fails, keep the needs_recovery flag
      6. prevent jbd2_cleanup_journal_tail() from being called between
         __jbd2_journal_drop_transaction() and jbd2_journal_abort()
         (a possible race issue between jbd2_log_do_checkpoint()s called by
         jbd2_journal_flush() and __jbd2_log_wait_for_space())
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      44519faf
  28. 06 10月, 2008 1 次提交
  29. 09 10月, 2008 1 次提交