1. 20 11月, 2020 1 次提交
  2. 07 11月, 2020 9 次提交
  3. 22 10月, 2020 3 次提交
  4. 18 10月, 2020 3 次提交
  5. 20 8月, 2020 1 次提交
  6. 08 8月, 2020 3 次提交
  7. 06 8月, 2020 2 次提交
  8. 13 6月, 2020 1 次提交
  9. 04 6月, 2020 1 次提交
    • J
      jbd2: avoid leaking transaction credits when unreserving handle · 14ff6286
      Jan Kara 提交于
      When reserved transaction handle is unused, we subtract its reserved
      credits in __jbd2_journal_unreserve_handle() called from
      jbd2_journal_stop(). However this function forgets to remove reserved
      credits from transaction->t_outstanding_credits and thus the transaction
      space that was reserved remains effectively leaked. The leaked
      transaction space can be quite significant in some cases and leads to
      unnecessarily small transactions and thus reducing throughput of the
      journalling machinery. E.g. fsmark workload creating lots of 4k files
      was observed to have about 20% lower throughput due to this when ext4 is
      mounted with dioread_nolock mount option.
      
      Subtract reserved credits from t_outstanding_credits as well.
      
      CC: stable@vger.kernel.org
      Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20200520133119.1383-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      14ff6286
  10. 22 5月, 2020 1 次提交
  11. 06 3月, 2020 1 次提交
  12. 01 3月, 2020 1 次提交
    • Q
      jbd2: fix data races at struct journal_head · 6c5d9112
      Qian Cai 提交于
      journal_head::b_transaction and journal_head::b_next_transaction could
      be accessed concurrently as noticed by KCSAN,
      
       LTP: starting fsync04
       /dev/zero: Can't open blockdev
       EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
       EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
       ==================================================================
       BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
      
       write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
        __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
        __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
        jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
        (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
        kjournald2+0x13b/0x450 [jbd2]
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
        jbd2_write_access_granted+0x1b2/0x250 [jbd2]
        jbd2_write_access_granted at fs/jbd2/transaction.c:1155
        jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
        __ext4_journal_get_write_access+0x50/0x90 [ext4]
        ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
        ext4_mb_new_blocks+0x54f/0xca0 [ext4]
        ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
        ext4_map_blocks+0x3b4/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block+0x3b/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       5 locks held by fsync04/25724:
        #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
        #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
        #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
        #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
       irq event stamp: 1407125
       hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The plain reads are outside of jh->b_state_lock critical section which result
      in data races. Fix them by adding pairs of READ|WRITE_ONCE().
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      6c5d9112
  13. 22 2月, 2020 1 次提交
    • W
      jbd2: fix ocfs2 corrupt when clearing block group bits · 8eedabfd
      wangyan 提交于
      I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
      The running environment:
      	kernel version: 4.19
      	A cluster with two nodes, 5 luns mounted on two nodes, and do some
      	file operations like dd/fallocate/truncate/rm on every lun with storage
      	network disconnection.
      
      The fallocate operation on dm-23-45 caused an null pointer dereference.
      
      The information of NULL pointer dereference as follows:
      	[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
      	[577992.878290] Aborting journal on device dm-23-45.
      	...
      	[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
      	[577992.890908] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
      	[577992.890918] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
      	[577992.890922] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
      	[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
      	[577992.890928] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
      	[577992.890933] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890939] __journal_remove_journal_head: freeing b_committed_data
      	[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
      	[577992.890950] Mem abort info:
      	[577992.890951]   ESR = 0x96000004
      	[577992.890952]   Exception class = DABT (current EL), IL = 32 bits
      	[577992.890952]   SET = 0, FnV = 0
      	[577992.890953]   EA = 0, S1PTW = 0
      	[577992.890954] Data abort info:
      	[577992.890955]   ISV = 0, ISS = 0x00000004
      	[577992.890956]   CM = 0, WnR = 0
      	[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
      	[577992.890960] [0000000000000020] pgd=0000000000000000
      	[577992.890964] Internal error: Oops: 96000004 [#1] SMP
      	[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
      	[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G        W  OE     4.19.36 #1
      	[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
      	[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
      	[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
      	[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
      	[577992.891084] sp : ffff0000c8e2b810
      	[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
      	[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
      	[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
      	[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
      	[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
      	[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
      	[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
      	[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
      	[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
      	[577992.891103] x11: 0000000000000038 x10: 0101010101010101
      	[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
      	[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
      	[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
      	[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
      	[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
      	[577992.891116] Call trace:
      	[577992.891139]  _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
      	[577992.891162]  _ocfs2_free_clusters+0x100/0x290 [ocfs2]
      	[577992.891185]  ocfs2_free_clusters+0x50/0x68 [ocfs2]
      	[577992.891206]  ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
      	[577992.891227]  ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
      	[577992.891248]  ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
      	[577992.891269]  ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
      	[577992.891290]  __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
      	[577992.891309]  ocfs2_fallocate+0xe4/0x128 [ocfs2]
      	[577992.891316]  vfs_fallocate+0x11c/0x250
      	[577992.891317]  ksys_fallocate+0x54/0x88
      	[577992.891319]  __arm64_sys_fallocate+0x28/0x38
      	[577992.891323]  el0_svc_common+0x78/0x130
      	[577992.891325]  el0_svc_handler+0x38/0x78
      	[577992.891327]  el0_svc+0x8/0xc
      
      My analysis process as follows:
      ocfs2_fallocate
        __ocfs2_change_file_space
          ocfs2_allocate_extents
            ocfs2_extend_allocation
              ocfs2_add_inode_data
                ocfs2_add_clusters_in_btree
                  ocfs2_insert_extent
                    ocfs2_do_insert_extent
                      ocfs2_rotate_tree_right
                        ocfs2_extend_rotate_transaction
                          ocfs2_extend_trans
                            jbd2_journal_restart
                              jbd2__journal_restart
                                /* handle->h_transaction is NULL,
                                 * is_handle_aborted(handle) is true
                                 */
                                handle->h_transaction = NULL;
                                start_this_handle
                                  return -EROFS;
                  ocfs2_free_clusters
                    _ocfs2_free_clusters
                      _ocfs2_free_suballoc_bits
                        ocfs2_block_group_clear_bits
                          ocfs2_journal_access_gd
                            __ocfs2_journal_access
                              jbd2_journal_get_undo_access
                                /* I think jbd2_write_access_granted() will
                                 * return true, because do_get_write_access()
                                 * will return -EROFS.
                                 */
                                if (jbd2_write_access_granted(...)) return 0;
                                do_get_write_access
                                  /* handle->h_transaction is NULL, it will
                                   * return -EROFS here, so do_get_write_access()
                                   * was not called.
                                   */
                                  if (is_handle_aborted(handle)) return -EROFS;
                          /* bh2jh(group_bh) is NULL, caused NULL
                             pointer dereference */
                          undo_bg = (struct ocfs2_group_desc *)
                                      bh2jh(group_bh)->b_committed_data;
      
      If handle->h_transaction == NULL, then jbd2_write_access_granted()
      does not really guarantee that journal_head will stay around,
      not even speaking of its b_committed_data. The bh2jh(group_bh)
      can be removed after ocfs2_journal_access_gd() and before call
      "bh2jh(group_bh)->b_committed_data". So, we should move
      is_handle_aborted() check from do_get_write_access() into
      jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
      before the call to jbd2_write_access_granted().
      
      Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.comSigned-off-by: NYan Wang <wangyan122@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJun Piao <piaojun@huawei.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      8eedabfd
  14. 14 2月, 2020 2 次提交
  15. 04 2月, 2020 1 次提交
  16. 03 2月, 2020 1 次提交
    • C
      fs: Enable bmap() function to properly return errors · 30460e1e
      Carlos Maiolino 提交于
      By now, bmap() will either return the physical block number related to
      the requested file offset or 0 in case of error or the requested offset
      maps into a hole.
      This patch makes the needed changes to enable bmap() to proper return
      errors, using the return value as an error return, and now, a pointer
      must be passed to bmap() to be filled with the mapped physical block.
      
      It will change the behavior of bmap() on return:
      
      - negative value in case of error
      - zero on success or map fell into a hole
      
      In case of a hole, the *block will be zero too
      
      Since this is a prep patch, by now, the only error return is -EINVAL if
      ->bmap doesn't exist.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      30460e1e
  17. 25 1月, 2020 8 次提交