1. 10 May, 2022 2 commits
  2. 22 April, 2022 1 commit
    • jbd2: fix a potential race while discarding reserved buffers after an abort · 23e3d7f7
      Ye Bin committed
      We hit the following issue:
      [   72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
      [   72.826847] EXT4-fs (sda): Remounting filesystem read-only
      fallocate: fallocate failed: Read-only file system
      [   74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
      [   74.793597] ------------[ cut here ]------------
      [   74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
      [   74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [   74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
      [   74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
      [   74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
      [   74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
      [   74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
      [   74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
      [   74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
      [   74.807301] FS:  0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
      [   74.808338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
      [   74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   74.811897] Call Trace:
      [   74.812241]  <TASK>
      [   74.812566]  __jbd2_journal_refile_buffer+0x12f/0x180
      [   74.813246]  jbd2_journal_refile_buffer+0x4c/0xa0
      [   74.813869]  jbd2_journal_commit_transaction.cold+0xa1/0x148
      [   74.817550]  kjournald2+0xf8/0x3e0
      [   74.819056]  kthread+0x153/0x1c0
      [   74.819963]  ret_from_fork+0x22/0x30
      
      The above issue may happen as follows:
              write                   truncate                   kjournald2
      generic_perform_write
       ext4_write_begin
        ext4_walk_page_buffers
         do_journal_get_write_access ->add BJ_Reserved list
       ext4_journalled_write_end
        ext4_walk_page_buffers
         write_end_fn
          ext4_handle_dirty_metadata
                      ***************JBD2 ABORT**************
           jbd2_journal_dirty_metadata
       -> return -EROFS, jh in reserved_list
                                                         jbd2_journal_commit_transaction
                                                          while (commit_transaction->t_reserved_list)
                                                            jh = commit_transaction->t_reserved_list;
                              truncate_pagecache_range
                               do_invalidatepage
                                ext4_journalled_invalidatepage
                                 jbd2_journal_invalidatepage
                                  journal_unmap_buffer
                                   __dispose_buffer
                                    __jbd2_journal_unfile_buffer
                                     jbd2_journal_put_journal_head ->put last ref_count
                                      __journal_remove_journal_head
                                       bh->b_private = NULL;
                                       jh->b_bh = NULL;
                                                            jbd2_journal_refile_buffer(journal, jh);
                                                              bh = jh2bh(jh);
                                                              ->bh is NULL, later will trigger null-ptr-deref
                                       journal_free_journal_head(jh);
      
      After commit 96f1e097, we no longer hold the j_state_lock while
      iterating over the list of reserved handles in
      jbd2_journal_commit_transaction().  This potentially allows the
      journal_head to be freed by journal_unmap_buffer while the commit
      codepath is also trying to free the BJ_Reserved buffers.  Keeping
      j_state_lock held while doing so extends the hold time of the lock
      only minimally, and solves this issue.
      
      Fixes: 96f1e097 ("jbd2: avoid long hold times of j_state_lock while committing a transaction")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      23e3d7f7
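The locking idea in the commit above can be illustrated with a small userspace sketch. It is not the kernel patch: `toy_journal`, `toy_jh`, and every function here are hypothetical stand-ins, and a C11 `atomic_flag` spinlock plays the role of `j_state_lock`. The point it shows is only that the commit side drains the reserved list under a single lock hold, while the invalidate side takes the same lock before unlinking an entry, so the two cannot interleave.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_jh {
    struct toy_jh *next;
    int refiled;                  /* set once the commit side processed it */
};

struct toy_journal {
    atomic_flag state_lock;       /* plays the role of j_state_lock */
    struct toy_jh *reserved_list; /* plays the role of t_reserved_list */
};

void toy_journal_init(struct toy_journal *j)
{
    atomic_flag_clear(&j->state_lock);
    j->reserved_list = NULL;
}

static void toy_lock(struct toy_journal *j)
{
    while (atomic_flag_test_and_set(&j->state_lock))
        ; /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_journal *j)
{
    atomic_flag_clear(&j->state_lock);
}

/* Commit side: drain the whole reserved list under one lock hold. */
int toy_discard_reserved(struct toy_journal *j)
{
    int n = 0;

    toy_lock(j);
    while (j->reserved_list) {
        struct toy_jh *jh = j->reserved_list;

        j->reserved_list = jh->next;
        jh->next = NULL;
        jh->refiled = 1;
        n++;
    }
    toy_unlock(j);
    return n;
}

/* Invalidate side: unlink one entry, serialized by the same lock. */
int toy_unmap(struct toy_journal *j, struct toy_jh *victim)
{
    int found = 0;

    toy_lock(j);
    for (struct toy_jh **pp = &j->reserved_list; *pp; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            victim->next = NULL;
            found = 1;
            break;
        }
    }
    toy_unlock(j);
    return found;
}
```

Because both paths take the lock for their whole critical section, an entry is either unlinked by `toy_unmap()` before the drain starts or processed by the drain, never both.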
  3. 03 February, 2022 2 commits
  4. 28 January, 2021 1 commit
  5. 07 November, 2020 2 commits
  6. 22 October, 2020 1 commit
  7. 18 October, 2020 2 commits
  8. 22 May, 2020 1 commit
  9. 06 March, 2020 1 commit
  10. 14 February, 2020 2 commits
  11. 25 January, 2020 1 commit
  12. 06 November, 2019 3 commits
  13. 21 October, 2019 2 commits
    • jbd2: Make state lock a spinlock · 46417064
      Thomas Gleixner committed
      Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
      on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
      region because bit spinlocks disable preemption even on RT.
      
      A first attempt was to replace state lock with a spinlock placed in struct
      buffer_head and make the locking conditional on PREEMPT_RT and
      DEBUG_BIT_SPINLOCKS.
      
      Jan pointed out that there is a 4 byte hole in struct journal_head where a
      regular spinlock fits in and that he would not object to converting the
      state lock to a spinlock unconditionally.

      Aside from solving the RT problem, this also gains lockdep coverage for the
      journal head state lock (bit-spinlocks are not covered by lockdep as it's
      hard to fit a lockdep map into a single bit).
      
      The trivial change would have been to convert the jbd_*lock_bh_state()
      inlines, but that comes with the downside that these functions take a
      buffer head pointer which needs to be converted to a journal head pointer
      which adds another level of indirection.
      
      As almost all functions which use this lock have a journal head pointer
      readily available, it makes more sense to remove the lock helper inlines
      and write out spin_*lock() at all call sites.
      
      Fix up all locking comments as well.
      Suggested-by: Jan Kara <jack@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: linux-ext4@vger.kernel.org
      Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      46417064
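The "4 byte hole" argument can be made concrete with a toy layout. The structs below are hypothetical and do not mirror the real `struct journal_head`; they assume an ABI where `uint64_t` is 8-byte aligned (typical for 64-bit Linux targets). Under that assumption, a 4-byte field placed into the padding after another 4-byte field does not grow the structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy layout with 4 bytes of padding after `jcount` (assumes 8-byte
 * alignment for uint64_t; this is not the real struct journal_head). */
struct toy_head_before {
    uint64_t bh;         /* 8 bytes */
    uint32_t jcount;     /* 4 bytes, then a 4-byte hole */
    uint64_t next;       /* must start on an 8-byte boundary */
};

/* Same layout with a 4-byte lock word placed into the former hole. */
struct toy_head_after {
    uint64_t bh;
    uint32_t jcount;
    uint32_t state_lock; /* occupies what used to be padding */
    uint64_t next;
};

size_t toy_size_before(void)
{
    return sizeof(struct toy_head_before);
}

size_t toy_size_after(void)
{
    return sizeof(struct toy_head_after);
}

size_t toy_lock_offset(void)
{
    return offsetof(struct toy_head_after, state_lock);
}
```

On such an ABI both sizes come out equal, which is why adding the lock unconditionally costs no memory in this toy case.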
    • jbd2: Move dropping of jh reference out of un/re-filing functions · 93108ebb
      Jan Kara committed
      __jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop
      transaction's jh reference when they remove jh from a transaction. This
      will, however, be inconvenient once we move the state lock into journal_head
      itself, as we still need to unlock it and we'd need to grab a jh reference
      just for that. Move dropping of jh reference out of these functions into
      the few callers.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20190809124233.13277-4-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      93108ebb
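A rough single-threaded sketch of the ordering problem described above: once the lock lives inside the object, the object must still exist when the lock is released, so the reference is dropped by the caller after unlocking. All names (`toy_jh`, `toy_unfile`, `toy_put`) are hypothetical, and a plain `bool` stands in for the lock.

```c
#include <stdbool.h>

struct toy_jh {
    bool locked;             /* toy stand-in for the embedded state lock */
    bool on_list;
    int refcount;
    bool freed;              /* marker instead of a real free() */
    bool freed_while_locked; /* would indicate the bug this ordering avoids */
};

/* Drop one reference; the last put "frees" the object. */
void toy_put(struct toy_jh *jh)
{
    if (--jh->refcount == 0) {
        jh->freed = true;
        jh->freed_while_locked = jh->locked;
    }
}

/* Unfiling only unlinks; it does not drop the list's reference. */
void toy_unfile(struct toy_jh *jh)
{
    jh->on_list = false;
}

/* Caller pattern: unfile under the lock, unlock, then drop the reference. */
void toy_unfile_and_put(struct toy_jh *jh)
{
    jh->locked = true;
    toy_unfile(jh);
    jh->locked = false;      /* jh is still alive here, so unlocking is safe */
    toy_put(jh);
}
```

If `toy_unfile()` dropped the last reference itself, the unlock would touch an object that is already gone; moving the put into the caller avoids taking an extra reference just to unlock.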
  14. 21 June, 2019 1 commit
    • jbd2: introduce jbd2_inode dirty range scoping · 6ba0e7dc
      Ross Zwisler committed
      Currently both journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() operate on the entire address space
      of each of the inodes associated with a given journal entry.  The
      consequence of this is that if we have an inode where we are constantly
      appending dirty pages we can end up waiting for an indefinite amount of
      time in journal_finish_inode_data_buffers() while we wait for all the
      pages under writeback to be written out.
      
      The easiest way to cause this type of workload is to just dd from
      /dev/zero to a file until it fills the entire filesystem.  This can
      cause journal_finish_inode_data_buffers() to wait for the duration of
      the entire dd operation.
      
      We can improve this situation by scoping each of the inode dirty ranges
      associated with a given transaction.  We do this via the jbd2_inode
      structure so that the scoping is contained within jbd2 and so that it
      follows the lifetime and locking rules for that structure.
      
      This allows us to limit the writeback & wait in
      journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() respectively to the dirty range for
      a given struct jbd2_inode, keeping us from waiting forever if the inode
      in question is still being appended to.
      Signed-off-by: Ross Zwisler <zwisler@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      6ba0e7dc
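The range scoping described above can be sketched as a per-inode record that accumulates the smallest interval covering all dirtied bytes, which the commit side then takes as a snapshot. This is a simplified illustration under assumed names (`toy_jinode`, `toy_mark_dirty`, `toy_take_range`); the real jbd2 bookkeeping differs in detail.

```c
#include <stdint.h>

/* Hypothetical per-inode record; byte offsets, end is inclusive. */
struct toy_jinode {
    int has_range;
    int64_t dirty_start;
    int64_t dirty_end;
};

/* Widen the tracked range so that it covers [start, end]. */
void toy_mark_dirty(struct toy_jinode *ji, int64_t start, int64_t end)
{
    if (!ji->has_range) {
        ji->dirty_start = start;
        ji->dirty_end = end;
        ji->has_range = 1;
        return;
    }
    if (start < ji->dirty_start)
        ji->dirty_start = start;
    if (end > ji->dirty_end)
        ji->dirty_end = end;
}

/* Commit side: take the accumulated range and reset the record, so that
 * later appends are not waited on by this commit. Returns 1 if a range
 * was present, 0 otherwise. */
int toy_take_range(struct toy_jinode *ji, int64_t *start, int64_t *end)
{
    if (!ji->has_range)
        return 0;
    *start = ji->dirty_start;
    *end = ji->dirty_end;
    ji->has_range = 0;
    return 1;
}
```

Writeback and waiting would then be limited to the returned interval instead of the whole address space of the inode.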
  15. 31 May, 2019 1 commit
  16. 01 March, 2019 1 commit
    • jbd2: fix invalid descriptor block checksum · 6e876c3d
      luojiajun committed
      In jbd2_journal_commit_transaction(), if we are in abort mode,
      we may flush the buffer without setting the descriptor block checksum
      because of the goto start_journal_io. When the fs is then mounted,
      jbd2_descriptor_block_csum_verify() fails.
      
      [  271.379811] EXT4-fs (vdd): shut down requested (2)
      [  271.381827] Aborting journal on device vdd-8.
      [  271.597136] JBD2: Invalid checksum recovering block 22199 in log
      [  271.598023] JBD2: recovery failed
      [  271.598484] EXT4-fs (vdd): error loading journal
      
      Fix this problem by always setting the descriptor block checksum if the
      descriptor buffer is not NULL.
      
      This checksum problem can be reproduced by xfstests generic/388.
      Signed-off-by: luojiajun <luojiajun3@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      6e876c3d
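A minimal sketch of the rule the fix above establishes: whenever a descriptor buffer exists at the point where I/O is started, its checksum is set, whether or not the journal is aborted. The names and the checksum function are hypothetical; the toy multiplicative sum below merely stands in for the real checksum and is not what jbd2 computes.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DESC_PAYLOAD 16

/* Hypothetical descriptor block: payload plus a trailing checksum. */
struct toy_desc {
    uint8_t payload[TOY_DESC_PAYLOAD];
    uint32_t csum;
};

/* Toy checksum, for illustration only. */
uint32_t toy_csum(const struct toy_desc *d)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < TOY_DESC_PAYLOAD; i++)
        sum = sum * 31u + d->payload[i];
    return sum;
}

/* Recovery-side check: does the stored checksum match the payload? */
int toy_csum_verify(const struct toy_desc *d)
{
    return d->csum == toy_csum(d);
}

/* Submit path: set the checksum whenever a descriptor exists, even when
 * the journal is aborted. Returns 1 if a descriptor was prepared. */
int toy_start_journal_io(struct toy_desc *d, int aborted)
{
    (void)aborted; /* deliberately ignored: abort must not skip the checksum */
    if (!d)
        return 0;
    d->csum = toy_csum(d);
    return 1;
}
```

With this ordering, a descriptor flushed on the abort path still verifies at the next mount in the toy model.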
  17. 04 December, 2018 1 commit
    • jbd2: avoid long hold times of j_state_lock while committing a transaction · 96f1e097
      Jan Kara committed
      We can hold j_state_lock for writing at the beginning of
      jbd2_journal_commit_transaction() for a rather long time (reportedly for
      30 ms) due to cleaning revoke bits of all revoked buffers under it. The
      handling of revoke tables as well as cleaning of t_reserved_list, and
      checkpoint lists does not need j_state_lock for anything. It is only
      needed to prevent new handles from joining the transaction. Generally
      T_LOCKED transaction state prevents new handles from joining the
      transaction - except for reserved handles which have to be allowed to join
      while we wait for other handles to complete.
      
      To prevent reserved handles from joining the transaction while cleaning
      up lists, add new transaction state T_SWITCH and watch for it when
      starting reserved handles. With this we can just drop the lock for
      operations that don't need it.
      Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com>
      Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      96f1e097
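The join rules described in the commit above can be written down as a small decision function. This is a sketch under assumptions: the `TOY_*` states are hypothetical mirrors of the running, T_LOCKED, and T_SWITCH states named in the message, and `toy_may_join()` is not a kernel function.

```c
/* Hypothetical mirror of the transaction states named in the message. */
enum toy_tstate {
    TOY_RUNNING, /* accepting all handles */
    TOY_LOCKED,  /* like T_LOCKED: only reserved handles may still join */
    TOY_SWITCH,  /* like T_SWITCH: lists are being cleaned up, nobody joins */
    TOY_FLUSH    /* any later commit phase */
};

/* Returns 1 if a handle may join the transaction now, 0 if it must wait. */
int toy_may_join(enum toy_tstate state, int reserved)
{
    switch (state) {
    case TOY_RUNNING:
        return 1;
    case TOY_LOCKED:
        return reserved ? 1 : 0;
    default:
        return 0;
    }
}
```

The extra state is what allows the lock to be dropped during list cleanup: reserved handles see the switch state and wait instead of joining.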
  18. 30 July, 2018 1 commit
  19. 18 December, 2017 1 commit
    • ext4: fix up remaining files with SPDX cleanups · f5166768
      Theodore Ts'o committed
      A number of ext4 source files were skipped because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      
      f5166768
  20. 06 July, 2017 1 commit
  21. 14 January, 2017 1 commit
    • fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex · 6fa7aa50
      Tejun Heo committed
      When an ext4 fs is bogged down by a lot of metadata IOs (in the
      reported case, it was deletion of millions of files, but any massive
      amount of journal writes would do), after the journal is filled up,
      tasks which try to access the filesystem and aren't currently
      performing the journal writes end up waiting in
      __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex.
      
      Because those mutex sleeps aren't marked as iowait, this condition can
      lead to misleadingly low iowait and /proc/stat:procs_blocked.  While
      iowait propagation is far from strict, this condition can be triggered
      fairly easily and annotating these sleeps correctly helps initial
      diagnosis quite a bit.
      
      Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that
      these sleeps are properly marked as iowait.
      Reported-by: Mingbo Wan <mingbo@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6fa7aa50
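The accounting idea behind the change above can be sketched in a single-threaded toy model: the task is flagged as waiting for I/O for the duration of the lock acquisition, and the previous flag value is restored afterwards. Everything here (`toy_task`, `toy_mutex`, both functions) is hypothetical and only illustrates the wrapping pattern, not the kernel's scheduler interfaces.

```c
/* Hypothetical task and mutex records. */
struct toy_task {
    int in_iowait;       /* what iowait accounting would look at */
    int seen_during_wait; /* flag value observed while "sleeping" on the lock */
};

struct toy_mutex {
    int locked;
};

/* Plain lock: a real mutex might sleep here; record what accounting sees. */
void toy_mutex_lock(struct toy_task *t, struct toy_mutex *m)
{
    t->seen_during_wait = t->in_iowait;
    m->locked = 1;
}

/* I/O-annotated lock: mark the task as in iowait around the acquisition
 * and restore the previous value afterwards. */
void toy_mutex_lock_io(struct toy_task *t, struct toy_mutex *m)
{
    int old = t->in_iowait;

    t->in_iowait = 1;
    toy_mutex_lock(t, m);
    t->in_iowait = old;
}
```

In this model a sleep taken through the annotated variant is visible as iowait, while the plain variant leaves the flag untouched.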
  22. 01 November, 2016 1 commit
  23. 12 October, 2016 1 commit
  24. 30 June, 2016 1 commit
  25. 08 June, 2016 1 commit
  26. 24 April, 2016 1 commit
    • jbd2: add support for avoiding data writes during transaction commits · 41617e1a
      Jan Kara committed
      Currently when filesystem needs to make sure data is on permanent
      storage before committing a transaction it adds inode to transaction's
      inode list. During transaction commit, jbd2 writes back all dirty
      buffers that have allocated underlying blocks and waits for the IO to
      finish. However when doing writeback for delayed allocated data, we
      allocate blocks and immediately submit the data. Thus asking jbd2 to
      write dirty pages just unnecessarily adds more work to jbd2 possibly
      writing back other redirtied blocks.
      
      Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
      outstanding data writes before committing a transaction and thus avoid
      unnecessary writes.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      41617e1a
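One way to picture the "wait only" mode described above is a pair of per-inode flags that the commit path inspects. The flag names, struct, and helpers below are hypothetical illustrations, and the counters simply record which actions the commit would take.

```c
/* Hypothetical per-inode flags for the commit path. */
#define TOY_JI_WRITE_DATA 0x1u /* commit must submit dirty data */
#define TOY_JI_WAIT_DATA  0x2u /* commit must wait for outstanding writes */

struct toy_jinode {
    unsigned int flags;
    int writes_submitted;
    int waits_done;
};

/* Filesystem asks for the classic behaviour: write back and wait. */
void toy_add_write(struct toy_jinode *ji)
{
    ji->flags |= TOY_JI_WRITE_DATA | TOY_JI_WAIT_DATA;
}

/* Filesystem already submitted the data itself: only wait for it. */
void toy_add_wait(struct toy_jinode *ji)
{
    ji->flags |= TOY_JI_WAIT_DATA;
}

/* Commit side: act on the flags, then clear them for the next transaction. */
void toy_commit_inode(struct toy_jinode *ji)
{
    if (ji->flags & TOY_JI_WRITE_DATA)
        ji->writes_submitted++;
    if (ji->flags & TOY_JI_WAIT_DATA)
        ji->waits_done++;
    ji->flags = 0;
}
```

For delayed-allocation writeback, where blocks are allocated and data submitted in one step, only the wait flag would be requested, so the commit skips the redundant submission.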
  27. 05 April, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
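Why the first two rewrite rules in the semantic patch above are safe can be checked with a toy example: when the two shift constants are equal, shifting by their difference is a no-op. The `TOY_*` macros are local stand-ins defined only for this sketch, and the value 12 (4 KiB pages) is an assumption.

```c
/* Local stand-ins; 12 corresponds to an assumed 4 KiB page size. */
#define TOY_PAGE_SHIFT       12
#define TOY_PAGE_CACHE_SHIFT TOY_PAGE_SHIFT

/* Old style: convert a page cache index using the shift difference. */
unsigned long toy_index_old(unsigned long index)
{
    return index << (TOY_PAGE_CACHE_SHIFT - TOY_PAGE_SHIFT);
}

/* New style after the rewrite: the shift by zero is simply dropped. */
unsigned long toy_index_new(unsigned long index)
{
    return index;
}

/* Old and new size expressions yield the same value. */
unsigned long toy_size_old(void)
{
    return 1UL << TOY_PAGE_CACHE_SHIFT;
}

unsigned long toy_size_new(void)
{
    return 1UL << TOY_PAGE_SHIFT;
}
```

Since both constants are identical, the old and new forms always agree, which is what makes the mechanical replacement behaviour-preserving.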
  28. 23 February, 2016 4 commits
  29. 18 October, 2015 1 commit