1. 25 6月, 2019 3 次提交
  2. 22 6月, 2019 3 次提交
  3. 19 6月, 2019 2 次提交
    • R
      f2fs: fix to avoid accessing xattr across the boundary · ae3787d4
      Randall Huang 提交于
      [ Upstream commit 2777e654371dd4207a3a7f4fb5fa39550053a080 ]
      
      When we traverse xattr entries via __find_xattr(),
      if the raw filesystem content is faked or any hardware failure occurs,
      out-of-bound error can be detected by KASAN.
      Fix the issue by introducing boundary check.
      
      [   38.402878] c7   1827 BUG: KASAN: slab-out-of-bounds in f2fs_getxattr+0x518/0x68c
      [   38.402891] c7   1827 Read of size 4 at addr ffffffc0b6fb35dc by task
      [   38.402935] c7   1827 Call trace:
      [   38.402952] c7   1827 [<ffffff900809003c>] dump_backtrace+0x0/0x6bc
      [   38.402966] c7   1827 [<ffffff9008090030>] show_stack+0x20/0x2c
      [   38.402981] c7   1827 [<ffffff900871ab10>] dump_stack+0xfc/0x140
      [   38.402995] c7   1827 [<ffffff9008325c40>] print_address_description+0x80/0x2d8
      [   38.403009] c7   1827 [<ffffff900832629c>] kasan_report_error+0x198/0x1fc
      [   38.403022] c7   1827 [<ffffff9008326104>] kasan_report_error+0x0/0x1fc
      [   38.403037] c7   1827 [<ffffff9008325000>] __asan_load4+0x1b0/0x1b8
      [   38.403051] c7   1827 [<ffffff90085fcc44>] f2fs_getxattr+0x518/0x68c
      [   38.403066] c7   1827 [<ffffff90085fc508>] f2fs_xattr_generic_get+0xb0/0xd0
      [   38.403080] c7   1827 [<ffffff9008395708>] __vfs_getxattr+0x1f4/0x1fc
      [   38.403096] c7   1827 [<ffffff9008621bd0>] inode_doinit_with_dentry+0x360/0x938
      [   38.403109] c7   1827 [<ffffff900862d6cc>] selinux_d_instantiate+0x2c/0x38
      [   38.403123] c7   1827 [<ffffff900861b018>] security_d_instantiate+0x68/0x98
      [   38.403136] c7   1827 [<ffffff9008377db8>] d_splice_alias+0x58/0x348
      [   38.403149] c7   1827 [<ffffff900858d16c>] f2fs_lookup+0x608/0x774
      [   38.403163] c7   1827 [<ffffff900835eacc>] lookup_slow+0x1e0/0x2cc
      [   38.403177] c7   1827 [<ffffff9008367fe0>] walk_component+0x160/0x520
      [   38.403190] c7   1827 [<ffffff9008369ef4>] path_lookupat+0x110/0x2b4
      [   38.403203] c7   1827 [<ffffff900835dd38>] filename_lookup+0x1d8/0x3a8
      [   38.403216] c7   1827 [<ffffff900835eeb0>] user_path_at_empty+0x54/0x68
      [   38.403229] c7   1827 [<ffffff9008395f44>] SyS_getxattr+0xb4/0x18c
      [   38.403241] c7   1827 [<ffffff9008084200>] el0_svc_naked+0x34/0x38
      Signed-off-by: NRandall Huang <huangrandall@google.com>
      [Jaegeuk Kim: Fix wrong ending boundary]
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ae3787d4
    • W
      fs/ocfs2: fix race in ocfs2_dentry_attach_lock() · 6b9aa7ac
      Wengang Wang 提交于
      commit be99ca2716972a712cde46092c54dee5e6192bf8 upstream.
      
      ocfs2_dentry_attach_lock() can be executed in parallel threads against the
      same dentry.  Make that race safe.  The race is like this:
      
                  thread A                               thread B
      
      (A1) enter ocfs2_dentry_attach_lock,
      seeing dentry->d_fsdata is NULL,
      and no alias found by
      ocfs2_find_local_alias, so kmalloc
      a new ocfs2_dentry_lock structure
      to local variable "dl", dl1
      
                     .....
      
                                          (B1) enter ocfs2_dentry_attach_lock,
                                          seeing dentry->d_fsdata is NULL,
                                          and no alias found by
                                          ocfs2_find_local_alias so kmalloc
                                          a new ocfs2_dentry_lock structure
                                          to local variable "dl", dl2.
      
                                                         ......
      
      (A2) set dentry->d_fsdata with dl1,
      call ocfs2_dentry_lock() and increase
      dl1->dl_lockres.l_ro_holders to 1 on
      success.
                    ......
      
                                          (B2) set dentry->d_fsdata with dl2
                                          call ocfs2_dentry_lock() and increase
      				    dl2->dl_lockres.l_ro_holders to 1 on
      				    success.
      
                                                        ......
      
      (A3) call ocfs2_dentry_unlock()
      and decrease
      dl2->dl_lockres.l_ro_holders to 0
      on success.
                   ....
      
                                          (B3) call ocfs2_dentry_unlock(),
                                          decreasing
      				    dl2->dl_lockres.l_ro_holders, but
      				    see it's zero now, panic
      
      Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.comSigned-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Reported-by: NDaniel Sobe <daniel.sobe@nxp.com>
      Tested-by: NDaniel Sobe <daniel.sobe@nxp.com>
      Reviewed-by: NChangwei Ge <gechangwei@live.cn>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b9aa7ac
  4. 15 6月, 2019 18 次提交
    • A
      ovl: support stacked SEEK_HOLE/SEEK_DATA · afec7068
      Amir Goldstein 提交于
      commit 9e46b840c7053b5f7a245e98cd239b60d189a96c upstream.
      
      Overlay file f_pos is the master copy that is preserved
      through copy up and modified on read/write, but only real
      fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
      limitations that are more strict than ->s_maxbytes for specific
      files, so we use the real file to perform seeks.
      
      We do not call real fs for SEEK_CUR:0 query and for SEEK_SET:0
      requests.
      
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Reported-by: NEddie Horng <eddiehorng.tw@gmail.com>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afec7068
    • J
      ovl: check the capability before cred overridden · 22dac6cc
      Jiufei Xue 提交于
      commit 98487de318a6f33312471ae1e2afa16fbf8361fe upstream.
      
      We found that it return success when we set IMMUTABLE_FL flag to a file in
      docker even though the docker didn't have the capability
      CAP_LINUX_IMMUTABLE.
      
      The commit d1d04ef8 ("ovl: stack file ops") and dab5ca8f ("ovl: add
      lsattr/chattr support") implemented chattr operations on a regular overlay
      file. ovl_real_ioctl() overridden the current process's subjective
      credentials with ofs->creator_cred which have the capability
      CAP_LINUX_IMMUTABLE so that it will return success in
      vfs_ioctl()->cap_capable().
      
      Fix this by checking the capability before cred overridden. And here we
      only care about APPEND_FL and IMMUTABLE_FL, so get these information from
      inode.
      
      [SzM: move check and call to underlying fs inside inode locked region to
      prevent two such calls from racing with each other]
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22dac6cc
    • A
      nfsd: avoid uninitialized variable warning · 806e8395
      Arnd Bergmann 提交于
      [ Upstream commit 0ab88ca4bcf18ba21058d8f19220f60afe0d34d8 ]
      
      clang warns that 'contextlen' may be accessed without an initialization:
      
      fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
                                                                      contextlen);
                                                                      ^~~~~~~~~~
      fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
              int contextlen;
                            ^
                             = 0
      
      Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
      set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
      Adding another #ifdef like the other two in this function
      avoids the warning.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      806e8395
    • J
      nfsd: allow fh_want_write to be called twice · b4330e4a
      J. Bruce Fields 提交于
      [ Upstream commit 0b8f62625dc309651d0efcb6a6247c933acd8b45 ]
      
      A fuzzer recently triggered lockdep warnings about potential sb_writers
      deadlocks caused by fh_want_write().
      
      Looks like we aren't careful to pair each fh_want_write() with an
      fh_drop_write().
      
      It's not normally a problem since fh_put() will call fh_drop_write() for
      us.  And was OK for NFSv3 where we'd do one operation that might call
      fh_want_write(), and then put the filehandle.
      
      But an NFSv4 protocol fuzzer can do weird things like call unlink twice
      in a compound, and then we get into trouble.
      
      I'm a little worried about this approach of just leaving everything to
      fh_put().  But I think there are probably a lot of
      fh_want_write()/fh_drop_write() imbalances so for now I think we need it
      to be more forgiving.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b4330e4a
    • K
      fuse: retrieve: cap requested size to negotiated max_write · ae35c325
      Kirill Smelkov 提交于
      [ Upstream commit 7640682e67b33cab8628729afec8ca92b851394f ]
      
      FUSE filesystem server and kernel client negotiate during initialization
      phase, what should be the maximum write size the client will ever issue.
      Correspondingly the filesystem server then queues sys_read calls to read
      requests with buffer capacity large enough to carry request header + that
      max_write bytes. A filesystem server is free to set its max_write in
      anywhere in the range between [1*page, fc->max_pages*page]. In particular
      go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
      corresponds to 128K. Libfuse also allows users to configure max_write, but
      by default presets it to possible maximum.
      
      If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
      allow to retrieve more than max_write bytes, corresponding prepared
      NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
      filesystem server, in full correspondence with server/client contract, will
      be only queuing sys_read with ~max_write buffer capacity, and
      fuse_dev_do_read throws away requests that cannot fit into server request
      buffer. In turn the filesystem server could get stuck waiting indefinitely
      for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
      understood by clients as that NOTIFY_REPLY was queued and will be sent
      back.
      
      Cap requested size to negotiate max_write to avoid the problem.  This
      aligns with the way NOTIFY_RETRIEVE handler works, which already
      unconditionally caps requested retrieve size to fuse_conn->max_pages.  This
      way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
      was originally requested.
      
      Please see [1] for context where the problem of stuck filesystem was hit
      for real, how the situation was traced and for more involving patch that
      did not make it into the tree.
      
      [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
      [2] https://github.com/hanwen/go-fuseSigned-off-by: NKirill Smelkov <kirr@nexedi.com>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ae35c325
    • A
      ovl: do not generate duplicate fsnotify events for "fake" path · 06382ad6
      Amir Goldstein 提交于
      [ Upstream commit d989903058a83e8536cc7aadf9256a47d5c173fe ]
      
      Overlayfs "fake" path is used for stacked file operations on underlying
      files.  Operations on files with "fake" path must not generate fsnotify
      events with path data, because those events have already been generated at
      overlayfs layer and because the reported event->fd for fanotify marks on
      underlying inode/filesystem will have the wrong path (the overlayfs path).
      
      Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      06382ad6
    • Y
      configfs: fix possible use-after-free in configfs_register_group · 4dc146d4
      YueHaibing 提交于
      [ Upstream commit 35399f87e271f7cf3048eab00a421a6519ac8441 ]
      
      In configfs_register_group(), if create_default_group() failed, we
      forget to unlink the group. It will left a invalid item in the parent list,
      which may trigger the use-after-free issue seen below:
      
      BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
      Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996
      
      CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
       __list_add include/linux/list.h:60 [inline]
       list_add_tail include/linux/list.h:93 [inline]
       link_obj+0xb0/0x190 fs/configfs/dir.c:759
       link_group+0x1c/0x130 fs/configfs/dir.c:784
       configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751
       configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834
       ? 0xffffffffc1be0000
       iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger]
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      Allocated by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459
       slab_free_hook mm/slub.c:1429 [inline]
       slab_free_freelist_hook mm/slub.c:1456 [inline]
       slab_free mm/slub.c:3003 [inline]
       kfree+0xe1/0x270 mm/slub.c:3955
       configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881ef61ae00
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 32 bytes inside of
       192-byte region [ffff8881ef61ae00, ffff8881ef61aec0)
      The buggy address belongs to the page:
      page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000
      raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
       ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 5cf6a51e ("configfs: allow dynamic group creation")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4dc146d4
    • C
      f2fs: fix to do checksum even if inode page is uptodate · 8d7ebdd1
      Chao Yu 提交于
      [ Upstream commit b42b179bda9ff11075a6fc2bac4d9e400513679a ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8d7ebdd1
    • C
      f2fs: fix to do sanity check on valid block count of segment · 64024854
      Chao Yu 提交于
      [ Upstream commit e95bcdb2fefa129f37bd9035af1d234ca92ee4ef ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      64024854
    • C
      f2fs: fix to use inline space only if inline_xattr is enable · 101e48fe
      Chao Yu 提交于
      [ Upstream commit 622927f3b8809206f6da54a6a7ed4df1a7770fce ]
      
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      101e48fe
    • C
      f2fs: fix to avoid panic in dec_valid_block_count() · 45624f0e
      Chao Yu 提交于
      [ Upstream commit 5e159cd349bf3a31fb7e35c23a93308eb30f4f71 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      45624f0e
    • C
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 47a92acf
      Chao Yu 提交于
      [ Upstream commit 546d22f070d64a7b96f57c93333772085d3a5e6d ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      47a92acf
    • C
      f2fs: fix to do sanity check on free nid · ca9fcbc5
      Chao Yu 提交于
      [ Upstream commit 626bcf2b7ce87211dba565f2bfa7842ba5be5c1b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ca9fcbc5
    • C
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · f3aa313d
      Chao Yu 提交于
      [ Upstream commit 8b6810f8acfe429fde7c7dad4714692cc5f75651 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f3aa313d
    • C
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · 0325c5cc
      Chao Yu 提交于
      [ Upstream commit 05573d6ccf702df549a7bdeabef31e4753df1a90 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_inplace_write_data() will trigger kernel panic due
      to data block locates in node type segment.
      
      To avoid panic, let's just return error code and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0325c5cc
    • C
      f2fs: fix to avoid panic in do_recover_data() · 8490bf2d
      Chao Yu 提交于
      [ Upstream commit 22d61e286e2d9097dae36f75ed48801056b77cac ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203227
      
      - Overview
      When mounting the attached crafted image, following errors are reported.
      Additionally, it hangs on sync after trying to mount it.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/recovery.c:549!
       RIP: 0010:recover_data+0x167a/0x1780
       Call Trace:
        f2fs_recover_fsync_data+0x613/0x710
        f2fs_fill_super+0x1043/0x1aa0
        mount_bdev+0x16d/0x1a0
        mount_fs+0x4a/0x170
        vfs_kern_mount+0x5d/0x100
        do_mount+0x200/0xcf0
        ksys_mount+0x79/0xc0
        __x64_sys_mount+0x1c/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During recovery, if ofs_of_node is inconsistent in between recovered
      node page and original checkpointed node page, let's just fail recovery
      instead of making kernel panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8490bf2d
    • A
      mm: page_mkclean vs MADV_DONTNEED race · 38c5fce7
      Aneesh Kumar K.V 提交于
      [ Upstream commit 024eee0e83f0df52317be607ca521e0fc572aa07 ]
      
      MADV_DONTNEED is handled with mmap_sem taken in read mode.  We call
      page_mkclean without holding mmap_sem.
      
      MADV_DONTNEED implies that pages in the region are unmapped and subsequent
      access to the pages in that range is handled as a new page fault.  This
      implies that if we don't have parallel access to the region when
      MADV_DONTNEED is run we expect those range to be unallocated.
      
      w.r.t page_mkclean() we need to make sure that we don't break the
      MADV_DONTNEED semantics.  MADV_DONTNEED check for pmd_none without holding
      pmd_lock.  This implies we skip the pmd if we temporarily mark pmd none.
      Avoid doing that while marking the page clean.
      
      Keep the sequence same for dax too even though we don't support
      MADV_DONTNEED for dax mapping
      
      The bug was noticed by code review and I didn't observe any failures w.r.t
      test run.  This is similar to
      
      commit 58ceeb6b
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:26 2017 -0700
      
          thp: fix MADV_DONTNEED vs. MADV_FREE race
      
      commit ced10803
      Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Date:   Thu Apr 13 14:56:20 2017 -0700
      
          thp: fix MADV_DONTNEED vs. numa balancing race
      
      Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc:"Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      38c5fce7
    • H
      fs/fat/file.c: issue flush after the writeback of FAT · 5b6619b4
      Hou Tao 提交于
      [ Upstream commit bd8309de0d60838eef6fb575b0c4c7e95841cf73 ]
      
      fsync() needs to make sure the data & meta-data of file are persistent
      after the return of fsync(), even when a power-failure occurs later.  In
      the case of fat-fs, the FAT belongs to the meta-data of file, so we need
      to issue a flush after the writeback of FAT instead before.
      
      Also bail out early when any stage of fsync fails.
      
      Link: http://lkml.kernel.org/r/20190409030158.136316-1-houtao1@huawei.comSigned-off-by: NHou Tao <houtao1@huawei.com>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5b6619b4
  5. 11 6月, 2019 7 次提交
  6. 09 6月, 2019 7 次提交
    • B
      Revert "lockd: Show pid of lockd for remote locks" · 87304193
      Benjamin Coddington 提交于
      commit 141731d15d6eb2fd9aaefbf9b935ce86ae243074 upstream.
      
      This reverts most of commit b8eee0e90f97 ("lockd: Show pid of lockd for
      remote locks"), which caused remote locks to not be differentiated between
      remote processes for NLM.
      
      We retain the fixup for setting the client's fl_pid to a negative value.
      
      Fixes: b8eee0e90f97 ("lockd: Show pid of lockd for remote locks")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: NXueWei Zhang <xueweiz@google.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87304193
    • R
      CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM · 297a2510
      Roberto Bergantinos Corpas 提交于
      commit 31fad7d41e73731f05b8053d17078638cf850fa6 upstream.
      
       In cifs_read_allocate_pages, in case of ENOMEM, we go through
      whole rdata->pages array but we have failed the allocation before
      nr_pages, therefore we may end up calling put_page with NULL
      pointer, causing oops
      Signed-off-by: NRoberto Bergantinos Corpas <rbergant@redhat.com>
      Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      297a2510
    • C
      cifs: fix memory leak of pneg_inbuf on -EOPNOTSUPP ioctl case · 32d57c0c
      Colin Ian King 提交于
      commit 210782038b54ec8e9059a3c12d6f6ae173efa3a9 upstream.
      
      Currently in the case where SMB2_ioctl returns the -EOPNOTSUPP error
      there is a memory leak of pneg_inbuf. Fix this by returning via
      the out_free_inbuf exit path that will perform the relevant kfree.
      
      Addresses-Coverity: ("Resource leak")
      Fixes: 969ae8e8d4ee ("cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED")
      CC: Stable <stable@vger.kernel.org> # v5.1+
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32d57c0c
    • F
      Btrfs: incremental send, fix file corruption when no-holes feature is enabled · 8a652fd1
      Filipe Manana 提交于
      commit 6b1f72e5b82a5c2a4da4d1ebb8cc01913ddbea21 upstream.
      
      When using the no-holes feature, if we have a file with prealloc extents
      with a start offset beyond the file's eof, doing an incremental send can
      cause corruption of the file due to incorrect hole detection. Such case
      requires that the prealloc extent(s) exist in both the parent and send
      snapshots, and that a hole is punched into the file that covers all its
      extents that do not cross the eof boundary.
      
      Example reproducer:
      
        $ mkfs.btrfs -f -O no-holes /dev/sdb
        $ mount /dev/sdb /mnt/sdb
      
        $ xfs_io -f -c "pwrite -S 0xab 0 500K" /mnt/sdb/foobar
        $ xfs_io -c "falloc -k 1200K 800K" /mnt/sdb/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
      
        $ btrfs send -f /tmp/base.snap /mnt/sdb/base
      
        $ xfs_io -c "fpunch 0 500K" /mnt/sdb/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
      
        $ btrfs send -p /mnt/sdb/base -f /tmp/incr.snap /mnt/sdb/incr
      
        $ md5sum /mnt/sdb/incr/foobar
        816df6f64deba63b029ca19d880ee10a   /mnt/sdb/incr/foobar
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt/sdc
      
        $ btrfs receive -f /tmp/base.snap /mnt/sdc
        $ btrfs receive -f /tmp/incr.snap /mnt/sdc
      
        $ md5sum /mnt/sdc/incr/foobar
        cf2ef71f4a9e90c2f6013ba3b2257ed2   /mnt/sdc/incr/foobar
      
          --> Different checksum, because the prealloc extent beyond the
              file's eof confused the hole detection code and it assumed
              a hole starting at offset 0 and ending at the offset of the
              prealloc extent (1200Kb) instead of ending at the offset
              500Kb (the file's size).
      
      Fix this by ensuring we never cross the file's size when issuing the
      write operations for a hole.
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      CC: stable@vger.kernel.org # 3.14+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a652fd1
    • F
      Btrfs: fix fsync not persisting changed attributes of a directory · a8107111
      Filipe Manana 提交于
      commit 60d9f50308e5df19bc18c2fefab0eba4a843900a upstream.
      
      While logging an inode we follow its ancestors and for each one we mark
      it as logged in the current transaction, even if we have not logged it.
      As a consequence if we change an attribute of an ancestor, such as the
      UID or GID for example, and then explicitly fsync it, we end up not
      logging the inode at all despite returning success to user space, which
      results in the attribute being lost if a power failure happens after
      the fsync.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ chown 6007:6007 /mnt/dir
      
        $ sync
      
        $ chown 9003:9003 /mnt/dir
        $ touch /mnt/dir/file
        $ xfs_io -c fsync /mnt/dir/file
      
        # fsync our directory after fsync'ing the new file, should persist the
        # new values for the uid and gid.
        $ xfs_io -c fsync /mnt/dir
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ stat -c %u:%g /mnt/dir
        6007:6007
      
          --> should be 9003:9003, the uid and gid were not persisted, despite
              the explicit fsync on the directory prior to the power failure
      
      Fix this by not updating the logged_trans field of ancestor inodes when
      logging an inode, since we have not logged them. Let only future calls to
      btrfs_log_inode() to mark inodes as logged.
      
      This could be triggered by my recent fsync fuzz tester for fstests, for
      which an fstests patch exists titled "fstests: generic, fsync fuzz tester
      with fsstress".
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8107111
    • F
      Btrfs: fix race updating log root item during fsync · 37fe0383
      Filipe Manana 提交于
      commit 06989c799f04810f6876900d4760c0edda369cf7 upstream.
      
      When syncing the log, the final phase of a fsync operation, we need to
      either create a log root's item or update the existing item in the log
      tree of log roots, and that depends on the current value of the log
      root's log_transid - if it's 1 we need to create the log root item,
      otherwise it must exist already and we update it. Since there is no
      synchronization between updating the log_transid and checking it for
      deciding whether the log root's item needs to be created or updated, we
      end up with a tiny race window that results in attempts to update the
      item to fail because the item was not yet created:
      
                    CPU 1                                    CPU 2
      
        btrfs_sync_log()
      
          lock root->log_mutex
      
          set log root's log_transid to 1
      
          unlock root->log_mutex
      
                                                     btrfs_sync_log()
      
                                                       lock root->log_mutex
      
                                                       sets log root's
                                                       log_transid to 2
      
                                                       unlock root->log_mutex
      
          update_log_root()
      
            sees log root's log_transid
            with a value of 2
      
              calls btrfs_update_root(),
              which fails with -EUCLEAN
              and causes transaction abort
      
      Until recently the race lead to a BUG_ON at btrfs_update_root(), but after
      the recent commit 7ac1e464c4d47 ("btrfs: Don't panic when we can't find a
      root key") we just abort the current transaction.
      
      A sample trace of the BUG_ON() on a SLE12 kernel:
      
        ------------[ cut here ]------------
        kernel BUG at ../fs/btrfs/root-tree.c:157!
        Oops: Exception in kernel mode, sig: 5 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        (...)
        Supported: Yes, External
        CPU: 78 PID: 76303 Comm: rtas_errd Tainted: G                 X 4.4.156-94.57-default #1
        task: c00000ffa906d010 ti: c00000ff42b08000 task.ti: c00000ff42b08000
        NIP: d000000036ae5cdc LR: d000000036ae5cd8 CTR: 0000000000000000
        REGS: c00000ff42b0b860 TRAP: 0700   Tainted: G                 X  (4.4.156-94.57-default)
        MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 22444484  XER: 20000000
        CFAR: d000000036aba66c SOFTE: 1
        GPR00: d000000036ae5cd8 c00000ff42b0bae0 d000000036bda220 0000000000000054
        GPR04: 0000000000000001 0000000000000000 c00007ffff8d37c8 0000000000000000
        GPR08: c000000000e19c00 0000000000000000 0000000000000000 3736343438312079
        GPR12: 3930373337303434 c000000007a3a800 00000000007fffff 0000000000000023
        GPR16: c00000ffa9d26028 c00000ffa9d261f8 0000000000000010 c00000ffa9d2ab28
        GPR20: c00000ff42b0bc48 0000000000000001 c00000ff9f0d9888 0000000000000001
        GPR24: c00000ffa9d26000 c00000ffa9d261e8 c00000ffa9d2a800 c00000ff9f0d9888
        GPR28: c00000ffa9d26028 c00000ffa9d2aa98 0000000000000001 c00000ffa98f5b20
        NIP [d000000036ae5cdc] btrfs_update_root+0x25c/0x4e0 [btrfs]
        LR [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs]
        Call Trace:
        [c00000ff42b0bae0] [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] (unreliable)
        [c00000ff42b0bba0] [d000000036b53610] btrfs_sync_log+0x2d0/0xc60 [btrfs]
        [c00000ff42b0bce0] [d000000036b1785c] btrfs_sync_file+0x44c/0x4e0 [btrfs]
        [c00000ff42b0bd80] [c00000000032e300] vfs_fsync_range+0x70/0x120
        [c00000ff42b0bdd0] [c00000000032e44c] do_fsync+0x5c/0xb0
        [c00000ff42b0be10] [c00000000032e8dc] SyS_fdatasync+0x2c/0x40
        [c00000ff42b0be30] [c000000000009488] system_call+0x3c/0x100
        Instruction dump:
        7f43d378 4bffebb9 60000000 88d90008 3d220000 e8b90000 3b390009 e87a01f0
        e8898e08 e8f90000 4bfd48e5 60000000 <0fe00000> e95b0060 39200004 394a0ea0
        ---[ end trace 8f2dc8f919cabab8 ]---
      
      So fix this by doing the check of log_transid and updating or creating the
      log root's item while holding the root's log_mutex.
      
      Fixes: 7237f183 ("Btrfs: fix tree logs parallel sync")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37fe0383
    • F
      Btrfs: fix wrong ctime and mtime of a directory after log replay · 7301bbea
      Filipe Manana 提交于
      commit 5338e43abbab13791144d37fd8846847062351c6 upstream.
      
      When replaying a log that contains a new file or directory name that needs
      to be added to its parent directory, we end up updating the mtime and the
      ctime of the parent directory to the current time after we have set their
      values to the correct ones (set at fsync time), efectivelly losing them.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ touch /mnt/dir/file
      
        # fsync of the directory is optional, not needed
        $ xfs_io -c fsync /mnt/dir
        $ xfs_io -c fsync /mnt/dir/file
      
        $ stat -c %Y /mnt/dir
        1557856079
      
        <power failure>
      
        $ sleep 3
        $ mount /dev/sdb /mnt
        $ stat -c %Y /mnt/dir
        1557856082
      
          --> should have been 1557856079, the mtime is updated to the current
              time when replaying the log
      
      Fix this by not updating the mtime and ctime to the current time at
      btrfs_add_link() when we are replaying a log tree.
      
      This could be triggered by my recent fsync fuzz tester for fstests, for
      which an fstests patch exists titled "fstests: generic, fsync fuzz tester
      with fsstress".
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7301bbea