1. 22 5月, 2017 1 次提交
    • T
      jbd2: preserve original nofs flag during journal restart · b4709067
      Tahsin Erdogan 提交于
      When a transaction starts, start_this_handle() saves current
      PF_MEMALLOC_NOFS value so that it can be restored at journal stop time.
      Journal restart is a special case that calls start_this_handle() without
      stopping the transaction. start_this_handle() isn't aware that the
      original value is already stored so it overwrites it with current value.
      
      For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set
      at the end:
      
        jbd2_journal_start()
        jbd2__journal_restart()
        jbd2_journal_stop()
      
      Make jbd2__journal_restart() restore the original value before calling
      start_this_handle().
      
      Fixes: 81378da6 ("jbd2: mark the transaction context with the scope GFP_NOFS context")
      Signed-off-by: NTahsin Erdogan <tahsin@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      b4709067
  2. 16 5月, 2017 2 次提交
  3. 04 5月, 2017 1 次提交
  4. 05 2月, 2017 1 次提交
  5. 13 10月, 2016 1 次提交
  6. 22 9月, 2016 1 次提交
  7. 30 6月, 2016 3 次提交
    • J
      jbd2: track more dependencies on transaction commit · 1eaa566d
      Jan Kara 提交于
      So far we were tracking only dependency on transaction commit due to
      starting a new handle (which may require commit to start a new
      transaction). Now add tracking also for other cases where we wait for
      transaction commit. This way lockdep can catch deadlocks e. g. because we
      call jbd2_journal_stop() for a synchronous handle with some locks held
      which rank below transaction start.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1eaa566d
    • J
      jbd2: move lockdep tracking to journal_s · ab714aff
      Jan Kara 提交于
      Currently lockdep map is tracked in each journal handle. To be able to
      expand lockdep support to cover also other cases where we depend on
      transaction commit and where handle is not available, move lockdep map
      into struct journal_s. Since this makes the lockdep map shared for all
      handles, we have to use rwsem_acquire_read() for acquisitions now.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ab714aff
    • J
      jbd2: move lockdep instrumentation for jbd2 handles · 7a4b188f
      Jan Kara 提交于
      The transaction the handle references is free to commit once we've
      decremented t_updates counter. Move the lockdep instrumentation to that
      place. Currently it was a bit later which did not really matter but
      subsequent improvements to lockdep instrumentation would cause false
      positives with it.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7a4b188f
  8. 24 4月, 2016 1 次提交
    • J
      jbd2: add support for avoiding data writes during transaction commits · 41617e1a
      Jan Kara 提交于
      Currently when filesystem needs to make sure data is on permanent
      storage before committing a transaction it adds inode to transaction's
      inode list. During transaction commit, jbd2 writes back all dirty
      buffers that have allocated underlying blocks and waits for the IO to
      finish. However when doing writeback for delayed allocated data, we
      allocate blocks and immediately submit the data. Thus asking jbd2 to
      write dirty pages just unnecessarily adds more work to jbd2 possibly
      writing back other redirtied blocks.
      
      Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
      outstanding data writes before committing a transaction and thus avoid
      unnecessary writes.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      41617e1a
  9. 18 4月, 2016 1 次提交
  10. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  11. 14 3月, 2016 1 次提交
    • M
      jbd2: do not fail journal because of frozen_buffer allocation failure · 490c1b44
      Michal Hocko 提交于
      Journal transaction might fail prematurely because the frozen_buffer
      is allocated by GFP_NOFS request:
      [   72.440013] do_get_write_access: OOM for frozen_buffer
      [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
      (...snipped....)
      [   72.495559] do_get_write_access: OOM for frozen_buffer
      [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.496839] do_get_write_access: OOM for frozen_buffer
      [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
      [   72.505766] Aborting journal on device sda1-8.
      [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
      
      This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
      allocations upon OOM" because small GPF_NOFS allocations never failed.
      This allocation seems essential for the journal and GFP_NOFS is too
      restrictive to the memory allocator so let's use __GFP_NOFAIL here to
      emulate the previous behavior.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      490c1b44
  12. 07 1月, 2016 1 次提交
  13. 05 12月, 2015 1 次提交
    • J
      jbd2: fix null committed data return in undo_access · 087ffd4e
      Junxiao Bi 提交于
      introduced jbd2_write_access_granted() to improve write|undo_access
      speed, but missed to check the status of b_committed_data which caused
      a kernel panic on ocfs2.
      
      [ 6538.405938] ------------[ cut here ]------------
      [ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400!
      [ 6538.406686] invalid opcode: 0000 [#1] SMP
      [ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
      [ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1
      [ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
      [ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000
      [ 6538.406686] RIP: 0010:[<ffffffffa06a286b>]  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
      [ 6538.406686] RSP: 0018:ffff880075b7b7f8  EFLAGS: 00010246
      [ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0
      [ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430
      [ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001
      [ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001
      [ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0
      [ 6538.406686] FS:  00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000
      [ 6538.406686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0
      [ 6538.406686] Stack:
      [ 6538.406686]  ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800
      [ 6538.406686]  ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d
      [ 6538.406686]  ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000
      [ 6538.406686] Call Trace:
      [ 6538.406686]  [<ffffffffa06a309d>] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a3654>] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a397e>] _ocfs2_free_clusters+0xee/0x210 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0682d50>] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2]
      [ 6538.406686]  [<ffffffffa06a3ad5>] ocfs2_free_clusters+0x15/0x20 [ocfs2]
      [ 6538.406686]  [<ffffffffa065072c>] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2]
      [ 6538.406686]  [<ffffffffa06843ac>] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0654600>] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2]
      [ 6538.406686]  [<ffffffffa0654394>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2]
      [ 6538.406686]  [<ffffffffa065acd4>] ocfs2_remove_btree_range+0x374/0x630 [ocfs2]
      [ 6538.406686]  [<ffffffffa017486b>] ? jbd2_journal_stop+0x25b/0x470 [jbd2]
      [ 6538.406686]  [<ffffffffa065d5b5>] ocfs2_commit_truncate+0x305/0x670 [ocfs2]
      [ 6538.406686]  [<ffffffffa0683430>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
      [ 6538.406686]  [<ffffffffa067adb7>] ocfs2_truncate_file+0x297/0x380 [ocfs2]
      [ 6538.406686]  [<ffffffffa01759e4>] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2]
      [ 6538.406686]  [<ffffffffa067c7a2>] ocfs2_setattr+0x572/0x860 [ocfs2]
      [ 6538.406686]  [<ffffffff810e4a3f>] ? current_fs_time+0x3f/0x50
      [ 6538.406686]  [<ffffffff812124b7>] notify_change+0x1d7/0x340
      [ 6538.406686]  [<ffffffff8121abf9>] ? generic_getxattr+0x79/0x80
      [ 6538.406686]  [<ffffffff811f5876>] do_truncate+0x66/0x90
      [ 6538.406686]  [<ffffffff81120e30>] ? __audit_syscall_entry+0xb0/0x110
      [ 6538.406686]  [<ffffffff811f5bb3>] do_sys_ftruncate.clone.0+0xf3/0x120
      [ 6538.406686]  [<ffffffff811f5bee>] SyS_ftruncate+0xe/0x10
      [ 6538.406686]  [<ffffffff816aa2ae>] entry_SYSCALL_64_fastpath+0x12/0x71
      [ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
      [ 6538.406686] RIP  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
      [ 6538.406686]  RSP <ffff880075b7b7f8>
      [ 6538.691128] ---[ end trace 31cd7011d6770d7e ]---
      [ 6538.694492] Kernel panic - not syncing: Fatal exception
      [ 6538.695484] Kernel Offset: disabled
      
      Fixes: de92c8ca("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      087ffd4e
  14. 25 11月, 2015 1 次提交
    • J
      jbd2: Fix unreclaimed pages after truncate in data=journal mode · bc23f0c8
      Jan Kara 提交于
      Ted and Namjae have reported that truncated pages don't get timely
      reclaimed after being truncated in data=journal mode. The following test
      triggers the issue easily:
      
      for (i = 0; i < 1000; i++) {
      	pwrite(fd, buf, 1024*1024, 0);
      	fsync(fd);
      	fsync(fd);
      	ftruncate(fd, 0);
      }
      
      The reason is that journal_unmap_buffer() finds that truncated buffers
      are not journalled (jh->b_transaction == NULL), they are part of
      checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
      been already written out (!buffer_dirty(bh)). We clean such buffers but
      we leave them in the checkpoint list. Since checkpoint transaction holds
      a reference to the journal head, these buffers cannot be released until
      the checkpoint transaction is cleaned up. And at that point we don't
      call release_buffer_page() anymore so pages detached from mapping are
      lingering in the system waiting for reclaim to find them and free them.
      
      Fix the problem by removing buffers from transaction checkpoint lists
      when journal_unmap_buffer() finds out they don't have to be there
      anymore.
      Reported-and-tested-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Fixes: de1b7941Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bc23f0c8
  15. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  16. 04 8月, 2015 1 次提交
    • L
      jbd2: limit number of reserved credits · 6d3ec14d
      Lukas Czerner 提交于
      Currently there is no limitation on number of reserved credits we can
      ask for. If we ask for more reserved credits than 1/2 of maximum
      transaction size, or if total number of credits exceeds the maximum
      transaction size per operation (which is currently only possible with
      the former) we will spin forever in start_this_handle().
      
      Fix this by adding this limitation at the start of start_this_handle().
      
      This patch also removes the credit limitation 1/2 of maximum transaction
      size, since we really only want to limit the number of reserved credits.
      There is not much point to limit the credits if there is still space in
      the journal.
      
      This accidentally also fixes the online resize, where due to the
      limitation of the journal credits we're unable to grow file systems with
      1k block size and size between 16M and 32M. It has been partially fixed
      by 2c869b26, but not entirely.
      
      Thanks Jan Kara for helping me getting the correct fix.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      6d3ec14d
  17. 13 7月, 2015 1 次提交
    • J
      jbd2: speedup jbd2_journal_dirty_metadata() · 6e06ae88
      Jan Kara 提交于
      It is often the case that we mark buffer as having dirty metadata when
      the buffer is already in that state (frequent for bitmaps, inode table
      blocks, superblock). Thus it is unnecessary to contend on grabbing
      journal head reference and bh_state lock. Avoid that by checking whether
      any modification to the buffer is needed before grabbing any locks or
      references.
      
      [ Note: this is a fixed version of commit 2143c196, which was
        reverted in ebeaa8dd due to a false positive triggering of an
        assertion check. -- Ted ]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      6e06ae88
  18. 28 6月, 2015 1 次提交
    • L
      Revert "jbd2: speedup jbd2_journal_dirty_metadata()" · ebeaa8dd
      Linus Torvalds 提交于
      This reverts commit 2143c196.
      
      This commit seems to be the cause of the following jbd2 assertion
      failure:
      
         ------------[ cut here ]------------
         kernel BUG at fs/jbd2/transaction.c:1325!
         invalid opcode: 0000 [#1] SMP
         Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ...
         CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679 #1
         Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
         task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000
         RIP: jbd2_journal_dirty_metadata+0x237/0x290
         Call Trace:
           __ext4_handle_dirty_metadata+0x43/0x1f0
           ext4_handle_dirty_dirent_node+0xde/0x160
           ? jbd2_journal_get_write_access+0x36/0x50
           ext4_delete_entry+0x112/0x160
           ? __ext4_journal_start_sb+0x52/0xb0
           ext4_unlink+0xfa/0x260
           vfs_unlink+0xec/0x190
           do_unlinkat+0x24a/0x270
           SyS_unlink+0x11/0x20
           entry_SYSCALL_64_fastpath+0x12/0x6a
         ---[ end trace ae033ebde8d080b4 ]---
      
      which is not easily reproducible (I've seen it just once, and then Ted
      was able to reproduce it once).  Revert it while Ted and Jan try to
      figure out what is wrong.
      
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ebeaa8dd
  19. 21 6月, 2015 1 次提交
    • J
      jbd2: speedup jbd2_journal_dirty_metadata() · 2143c196
      Jan Kara 提交于
      It is often the case that we mark buffer as having dirty metadata when
      the buffer is already in that state (frequent for bitmaps, inode table
      blocks, superblock). Thus it is unnecessary to contend on grabbing
      journal head reference and bh_state lock. Avoid that by checking whether
      any modification to the buffer is needed before grabbing any locks or
      references.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2143c196
  20. 09 6月, 2015 4 次提交
    • J
      jbd2: speedup jbd2_journal_get_[write|undo]_access() · de92c8ca
      Jan Kara 提交于
      jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
      frequently called for buffers that are already part of the running
      transaction - most frequently it is the case for bitmaps, inode table
      blocks, and superblock. Since in such cases we have nothing to do, it is
      unfortunate we still grab reference to journal head, lock the bh, lock
      bh_state only to find out there's nothing to do.
      
      Improving this is a bit subtle though since until we find out journal
      head is attached to the running transaction, it can disappear from under
      us because checkpointing / commit decided it's no longer needed. We deal
      with this by protecting journal_head slab with RCU. We still have to be
      careful about journal head being freed & reallocated within slab and
      about exposing journal head in consistent state (in particular
      b_modified and b_frozen_data must be in correct state before we allow
      user to touch the buffer).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      de92c8ca
    • J
      jbd2: more simplifications in do_get_write_access() · 8b00f400
      Jan Kara 提交于
      Check for the simple case of unjournaled buffer first, handle it and
      bail out. This allows us to remove one if and unindent the difficult case
      by one tab. The result is easier to read.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      8b00f400
    • J
      jbd2: simplify error path on allocation failure in do_get_write_access() · d012aa59
      Jan Kara 提交于
      We were acquiring bh_state_lock when allocation of buffer failed in
      do_get_write_access() only to be able to jump to a label that releases
      the lock and does all other checks that don't make sense for this error
      path. Just jump into the right label instead.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d012aa59
    • J
      jbd2: simplify code flow in do_get_write_access() · ee57aba1
      Jan Kara 提交于
      needs_copy is set only in one place in do_get_write_access(), just move
      the frozen buffer copying into that place and factor it out to a
      separate function to make do_get_write_access() slightly more readable.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ee57aba1
  21. 08 6月, 2015 1 次提交
    • M
      jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL · 6ccaf3e2
      Michal Hocko 提交于
      This basically reverts 47def826 (jbd2: Remove __GFP_NOFAIL from jbd2
      layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
      to open coding the endless loop around the allocator rather than
      removing the dependency on the non failing allocation. So the
      deprecation was a clear failure and the reality tells us that
      __GFP_NOFAIL is not even close to go away.
      
      It is still true that __GFP_NOFAIL allocations are generally discouraged
      and new uses should be evaluated and an alternative (pre-allocations or
      reservations) should be considered but it doesn't make any sense to lie
      the allocator about the requirements. Allocator can take steps to help
      making a progress if it knows the requirements.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      6ccaf3e2
  22. 15 5月, 2015 1 次提交
    • L
      ext4: fix NULL pointer dereference when journal restart fails · 9d506594
      Lukas Czerner 提交于
      Currently when journal restart fails, we'll have the h_transaction of
      the handle set to NULL to indicate that the handle has been effectively
      aborted. We handle this situation quietly in the jbd2_journal_stop() and just
      free the handle and exit because everything else has been done before we
      attempted (and failed) to restart the journal.
      
      Unfortunately there are a number of problems with that approach
      introduced with commit
      
      41a5b913 "jbd2: invalidate handle if jbd2_journal_restart()
      fails"
      
      First of all in ext4 jbd2_journal_stop() will be called through
      __ext4_journal_stop() where we would try to get a hold of the superblock
      by dereferencing h_transaction which in this case would lead to NULL
      pointer dereference and crash.
      
      In addition we're going to free the handle regardless of the refcount
      which is bad as well, because others up the call chain will still
      reference the handle so we might potentially reference already freed
      memory.
      
      Moreover it's expected that we'll get aborted handle as well as detached
      handle in some of the journalling function as the error propagates up
      the stack, so it's unnecessary to call WARN_ON every time we get
      detached handle.
      
      And finally we might leak some memory by forgetting to free reserved
      handle in jbd2_journal_stop() in the case where handle was detached from
      the transaction (h_transaction is NULL).
      
      Fix the NULL pointer dereference in __ext4_journal_stop() by just
      calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
      the potential memory leak in jbd2_journal_stop() and use proper
      handle refcounting before we attempt to free it to avoid use-after-free
      issues.
      
      And finally remove all WARN_ON(!transaction) from the code so that we do
      not get random traces when something goes wrong because when journal
      restart fails we will get to some of those functions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      9d506594
  23. 16 7月, 2014 1 次提交
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  24. 06 7月, 2014 1 次提交
  25. 13 3月, 2014 1 次提交
  26. 09 3月, 2014 3 次提交
  27. 18 2月, 2014 1 次提交
  28. 09 12月, 2013 3 次提交
  29. 01 7月, 2013 2 次提交
    • T
      jbd2: invalidate handle if jbd2_journal_restart() fails · 41a5b913
      Theodore Ts'o 提交于
      If jbd2_journal_restart() fails the handle will have been disconnected
      from the current transaction.  In this situation, the handle must not
      be used for for any jbd2 function other than jbd2_journal_stop().
      Enforce this with by treating a handle which has a NULL transaction
      pointer as an aborted handle, and issue a kernel warning if
      jbd2_journal_extent(), jbd2_journal_get_write_access(),
      jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
      
      This commit also fixes a bug where jbd2_journal_stop() would trip over
      a kernel jbd2 assertion check when trying to free an invalid handle.
      
      Also move the responsibility of setting current->journal_info to
      start_this_handle(), simplifying the three users of this function.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NYounger Liu <younger.liu@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
      41a5b913
    • T
      jbd2: fix theoretical race in jbd2__journal_restart · 39c04153
      Theodore Ts'o 提交于
      Once we decrement transaction->t_updates, if this is the last handle
      holding the transaction from closing, and once we release the
      t_handle_lock spinlock, it's possible for the transaction to commit
      and be released.  In practice with normal kernels, this probably won't
      happen, since the commit happens in a separate kernel thread and it's
      unlikely this could all happen within the space of a few CPU cycles.
      
      On the other hand, with a real-time kernel, this could potentially
      happen, so save the tid found in transaction->t_tid before we release
      t_handle_lock.  It would require an insane configuration, such as one
      where the jbd2 thread was set to a very high real-time priority,
      perhaps because a high priority real-time thread is trying to read or
      write to a file system.  But some people who use real-time kernels
      have been known to do insane things, including controlling
      laser-wielding industrial robots.  :-)
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      39c04153