1. 03 Jan, 2019 1 commit
    •
      locks: fix error in locks_move_blocks() · bf77ae4c
      Committed by NeilBrown
      After moving all requests from
         fl->fl_blocked_requests
      to
         new->fl_blocked_requests
      
      it is nonsensical to do anything to all the remaining elements, there
      aren't any.  This should do something to all the requests that have been
      moved. For simplicity, it does it to all requests in the target list.
      
      Setting "f->fl_blocker = new" to all members of new->fl_blocked_requests
      is "obviously correct" as it preserves the invariant of the linkage
      among requests.
      
      Reported-by: syzbot+239d99847eb49ecb3899@syzkaller.appspotmail.com
      Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
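The fix described above can be sketched in userspace C. This is a minimal model, not the kernel code: `struct req`, its fields, and `move_blocks()` are hypothetical stand-ins for `struct file_lock`, `fl_blocker`, `fl_blocked_requests`, and `locks_move_blocks()`, and the kernel's list is replaced by a simple singly linked list. It shows why the loop must walk the target list after the splice: the source list is empty by then.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct file_lock. */
struct req {
    struct req *blocker;      /* the lock this request is waiting on */
    struct req *next_blocked; /* next waiter in the same list */
    struct req *blocked_head; /* first request blocked on this lock */
};

/* Move every waiter from fl to new_fl, then repoint each waiter's blocker. */
static void move_blocks(struct req *new_fl, struct req *fl)
{
    struct req *f;

    assert(new_fl != fl);
    if (fl->blocked_head == NULL)
        return;

    /* Splice fl's waiters onto the front of new_fl's list. */
    f = fl->blocked_head;
    while (f->next_blocked != NULL)
        f = f->next_blocked;
    f->next_blocked = new_fl->blocked_head;
    new_fl->blocked_head = fl->blocked_head;
    fl->blocked_head = NULL;

    /* Walk the TARGET list: the source list has no elements left. */
    for (f = new_fl->blocked_head; f != NULL; f = f->next_blocked)
        f->blocker = new_fl;
}
```

Walking the whole target list also touches requests that were already blocked on `new_fl`, which is harmless because their blocker is already `new_fl`.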
  2. 29 Dec, 2018 18 commits
  3. 27 Dec, 2018 14 commits
    •
      f2fs: sanity check of xattr entry size · 64beba05
      Committed by Jaegeuk Kim
      There is a security report where f2fs_getxattr() has a hole to expose wrong
      memory region when the image is malformed like this.
      
      f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
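The commit message does not show the check itself, so the following is only a generic sketch of an entry-size sanity check, not the f2fs patch. `struct xattr_hdr` and `xattr_entry_in_bounds()` are hypothetical, and the layout is not the real f2fs on-disk format. The idea is to reject an entry whose header, name, and value would extend past the end of the xattr area before anything is copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry header; NOT the real f2fs on-disk layout. */
struct xattr_hdr {
    uint8_t  name_len;   /* length of the attribute name in bytes */
    uint16_t value_size; /* length of the attribute value in bytes */
};

/*
 * Return 1 when the header, name and value of the entry at entry_off
 * all fit inside an xattr area of area_len bytes, 0 otherwise.
 */
static int xattr_entry_in_bounds(size_t area_len, size_t entry_off,
                                 const struct xattr_hdr *h)
{
    size_t need;

    assert(h != NULL);
    need = sizeof(*h) + (size_t)h->name_len + (size_t)h->value_size;
    if (entry_off > area_len)
        return 0;
    /* Compare against the remaining space to avoid overflow in the sum. */
    return need <= area_len - entry_off;
}
```

With an assumed 4096-byte area, an entry claiming a 12288-byte value (the size in the log line above) is rejected.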
    •
      f2fs: fix use-after-free issue when accessing sbi->stat_info · 60aa4d55
      Committed by Sahitya Tummala
      iput() on sbi->node_inode can update sbi->stat_info
      in the below context, if the f2fs_write_checkpoint()
      has failed with error.
      
      f2fs_balance_fs_bg+0x1ac/0x1ec
      f2fs_write_node_pages+0x4c/0x260
      do_writepages+0x80/0xbc
      __writeback_single_inode+0xdc/0x4ac
      writeback_single_inode+0x9c/0x144
      write_inode_now+0xc4/0xec
      iput+0x194/0x22c
      f2fs_put_super+0x11c/0x1e8
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      
      Fix this by moving f2fs_destroy_stats() further below iput() in
      both f2fs_put_super() and f2fs_fill_super() paths.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
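The ordering rule behind this fix can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: `struct sb_info`, `put_node_inode()`, `destroy_stats()`, and `put_super()` are hypothetical stand-ins for `sbi`, `iput()` on `sbi->node_inode`, `f2fs_destroy_stats()`, and `f2fs_put_super()`. The release step may still update the statistics, so the statistics must be destroyed only afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sbi->stat_info and sbi. */
struct stats {
    unsigned long writeback_calls;
};

struct sb_info {
    struct stats *stat_info;
    int node_inode_held;
};

/* Models iput() on the node inode: the final writeback updates the stats. */
static void put_node_inode(struct sb_info *sbi)
{
    assert(sbi->stat_info != NULL); /* stats must still be alive here */
    sbi->stat_info->writeback_calls++;
    sbi->node_inode_held = 0;
}

/* Models f2fs_destroy_stats(): frees the stats and clears the pointer. */
static void destroy_stats(struct sb_info *sbi)
{
    free(sbi->stat_info);
    sbi->stat_info = NULL;
}

/* Teardown in the safe order; returns how many writeback calls were seen. */
static unsigned long put_super(struct sb_info *sbi)
{
    unsigned long seen;

    put_node_inode(sbi);      /* may touch stat_info */
    seen = sbi->stat_info->writeback_calls;
    destroy_stats(sbi);       /* only now is it safe to free */
    return seen;
}
```

Swapping the two calls in `put_super()` would trip the assertion in this model, which corresponds to the use-after-free in the real code.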
    •
      f2fs: check PageWriteback flag for ordered case · bae0ee7a
      Committed by Chao Yu
      For all ordered cases in f2fs_wait_on_page_writeback(), we need to
      check PageWriteback status, so let's clean up to relocate the check
      into f2fs_wait_on_page_writeback().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: fix validation of the block count in sanity_check_raw_super · 88960068
      Committed by Martin Blumenstingl
      Treat "block_count" from struct f2fs_super_block as 64-bit little endian
      value in sanity_check_raw_super() because struct f2fs_super_block
      declares "block_count" as "__le64".
      
      This fixes a bug where the superblock validation fails on big endian
      devices with the following error:
        F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
        F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
        F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
        F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
      As a result, the partition cannot be mounted.
      
      With this patch applied the superblock validation works fine and the
      partition can be mounted again:
        F2FS-fs (sda1): Mounted with checkpoint version = 7c84
      
      My little endian x86-64 hardware was able to mount the partition without
      this fix.
      To confirm that mounting f2fs filesystems works on big endian machines
      again I tested this on a 32-bit MIPS big endian (lantiq) device.
      
      Fixes: 0cfe75c5 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
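The "block_count ... 0" in the error above is consistent with a 64-bit little-endian field being converted with a 32-bit helper on a big-endian host. The sketch below emulates that in portable userspace C. It is an illustration under assumptions, not the kernel code: `le64_decode()`, `swap32()`, and `truncated_le32_on_big_endian()` are hypothetical helpers, and the byte array stands in for the on-disk `block_count` field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode eight little-endian bytes; correct on any host byte order. */
static uint64_t le64_decode(const uint8_t *b)
{
    uint64_t v = 0;
    int i;

    assert(b != NULL);
    for (i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Emulates a big-endian host that loads the raw 64-bit field,
 * truncates it to 32 bits, and then byte-swaps those 32 bits.
 */
static uint32_t truncated_le32_on_big_endian(const uint8_t *b)
{
    uint64_t raw = 0;
    int i;

    assert(b != NULL);
    for (i = 0; i < 8; i++)       /* b[0] is the most significant byte */
        raw = (raw << 8) | b[i];
    return swap32((uint32_t)raw); /* only bytes b[4..7] survive */
}
```

For a block count that fits in 32 bits, the surviving bytes are the upper half of the little-endian value, so the truncated conversion yields 0 while the full 64-bit decode yields the real count.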
    •
      f2fs: fix missing unlock(sbi->gc_mutex) · 8f31b466
      Committed by Jaegeuk Kim
      This fixes missing unlock call.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: fix to dirty inode synchronously · b32e0190
      Committed by Chao Yu
      If the user changes an inode's i_flags via ioctl, let's add the inode
      to the global dirty list so that checkpoint can guarantee its persistence
      before fsync. This lets checkpoint keep strong consistency.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: clean up structure extent_node · c0362117
      Committed by Chao Yu
      The union in struct extent_node was only to indicate that the below fields
      
      	struct rb_node rb_node;
      	union {
      		struct {
      			unsigned int fofs;
      			unsigned int len;
      		...
      	...
      
      can be parsed as fields in struct rb_entry, but they were never
      used explicitly before, so let's remove them for cleanup.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: fix block address for __check_sit_bitmap · 9249dded
      Committed by Qiuyang Sun
      Should use lstart (logical start address) instead of start (in dev) here.
      This fixes a bug in multi-device scenarios.
      Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: fix sbi->extent_list corruption issue · e4589fa5
      Committed by Sahitya Tummala
      When there is a failure in f2fs_fill_super() after/during
      the recovery of fsync'd nodes, it frees the current sbi and
      retries. This time the mount is successful, but the files
      that got recovered before the retry still hold the extent tree,
      whose extent node list is corrupted since sbi and sbi->extent_list
      were freed. The list_del corruption issue is observed when the
      file system is being unmounted and the extent nodes of those
      recovered files are being freed in the context below.
      
      list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
      <...>
      kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
      lr : __list_del_entry_valid+0x94/0xb4
      pc : __list_del_entry_valid+0x94/0xb4
      <...>
      Call trace:
      __list_del_entry_valid+0x94/0xb4
      __release_extent_node+0xb0/0x114
      __free_extent_tree+0x58/0x7c
      f2fs_shrink_extent_tree+0xdc/0x3b0
      f2fs_leave_shrinker+0x28/0x7c
      f2fs_put_super+0xfc/0x1e0
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      __cleanup_mnt+0x1c/0x28
      task_work_run+0x48/0xd0
      do_notify_resume+0x678/0xe98
      work_pending+0x8/0x14
      
      Fix this by not creating extents for those recovered files if shrinker is
      not registered yet. Once mount is successful and shrinker is registered,
      those files can have extents again.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: clean up checkpoint flow · 8ec18bff
      Committed by Chao Yu
      This patch cleans up checkpoint flow a bit:
      - remove unneeded circulation of flushing meta pages.
      - don't flush nat_bits pages in prior to other checkpoint pages.
      - add bug_on to check remained meta pages after flushing.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: flush stale issued discard candidates · 76c7bfb3
      Committed by Jaegeuk Kim
      Sometimes, I could observe # of issuing_discard to be 1 which blocks background
      jobs due to is_idle()=false.
      The only way to get out of it was to trigger gc_urgent. This patch avoids that
      by checking any candidates as done in the list.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: correct wrong spelling, issing_* · 72691af6
      Committed by Jaegeuk Kim
      Let's use "queued" instead of "issuing".
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: use kvmalloc, if kmalloc is failed · 5222595d
      Committed by Jaegeuk Kim
      One report says memalloc failure during mount.
      
       (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
       (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
       (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
       (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
       (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
       (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
       (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
       (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
       (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
       (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
       (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
       (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
       (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
       (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    •
      f2fs: remove redundant comment of unused wio_mutex · af56b487
      Committed by Yunlong Song
      Commit 089842de ("f2fs: remove codes of unused wio_mutex") removes codes
      of unused wio_mutex, but missing the comment, so delete it.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 23 Dec, 2018 1 commit
    •
      Revert "vfs: Allow userns root to call mknod on owned filesystems." · 94f82008
      Committed by Christian Brauner
      This reverts commit 55956b59.
      
      commit 55956b59 ("vfs: Allow userns root to call mknod on owned filesystems.")
      enabled mknod() in user namespaces for userns root if CAP_MKNOD is
      available. However, these device nodes are useless since any filesystem
      mounted from a non-initial user namespace will set the SB_I_NODEV flag on
      the filesystem. Now, when a device node is created in a non-initial user
      namespace a call to open() on said device node will fail due to:
      
      bool may_open_dev(const struct path *path)
      {
              return !(path->mnt->mnt_flags & MNT_NODEV) &&
                      !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
      }
      
      The problem with this is that as of the aforementioned commit mknod()
      creates partially functional device nodes in non-initial user namespaces.
      In particular, it has the consequence that as of the aforementioned commit
      open() will be more privileged with respect to device nodes than mknod().
      Before it was the other way around. Specifically, if mknod() succeeded
      then it was transparent for any userspace application that a fatal error
      must have occurred when open() failed.
      
      All of this breaks multiple userspace workloads and a widespread assumption
      about how to handle mknod(). Basically, all container runtimes and systemd
      live by the slogan "ask for forgiveness not permission" when running user
      namespace workloads. For mknod() the assumption is that if the syscall
      succeeds the device nodes are useable irrespective of whether it succeeds
      in a non-initial user namespace or not. This logic was chosen explicitly
      to allow for the glorious day when mknod() will actually be able to create
      fully functional device nodes in user namespaces.
      A specific problem people are already running into when running 4.18 rc
      kernels are failing systemd services. For any distro that is run in a
      container systemd services started with the PrivateDevices= property set
      will fail to start since the device nodes in question cannot be
      opened (cf. the arguments in [1]).
      
      Full disclosure, Seth made the very sound argument that it is already
      possible to end up with partially functional device nodes. Any filesystem
      mounted with MS_NODEV set will allow mknod() to succeed but will not allow
      open() to succeed. The difference to the case here is that the MS_NODEV
      case is transparent to userspace since it is an explicitly set mount option
      while the SB_I_NODEV case is an implicit property enforced by the kernel
      and hence opaque to userspace.
      
      [1]: https://github.com/systemd/systemd/pull/9483

      Signed-off-by: Christian Brauner <christian@brauner.io>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
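The mismatch between mknod() and open() described in this revert can be modelled with two flag checks. This is a userspace sketch, not kernel code: `model_may_open_dev()` mirrors the logic of the quoted `may_open_dev()`, while `model_mknod_allowed()` and the `MODEL_*` constants are hypothetical names with illustrative values.

```c
#include <stdbool.h>

/* Illustrative flag values for this model only. */
#define MODEL_MNT_NODEV  0x02u /* per-mount flag, set by an explicit mount option */
#define MODEL_SB_I_NODEV 0x04u /* superblock flag, set implicitly by the kernel */

/* Same logic as the quoted may_open_dev(): both flags must be clear. */
static bool model_may_open_dev(unsigned int mnt_flags, unsigned int sb_iflags)
{
    return !(mnt_flags & MODEL_MNT_NODEV) &&
           !(sb_iflags & MODEL_SB_I_NODEV);
}

/*
 * Models the reverted behaviour: mknod() by userns root succeeds
 * whenever CAP_MKNOD is available, regardless of SB_I_NODEV.
 */
static bool model_mknod_allowed(bool has_cap_mknod)
{
    return has_cap_mknod;
}
```

In this model a filesystem mounted from a non-initial user namespace carries `MODEL_SB_I_NODEV`, so creating the node succeeds while opening it fails, which is the partially functional state the revert removes.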
  5. 22 Dec, 2018 3 commits
  6. 20 Dec, 2018 3 commits
    •
      iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" · a837eca2
      Committed by Dave Chinner
      This reverts commit 61c6de66.
      
      The reverted commit added page reference counting to iomap page
      structures that are used to track block size < page size state. This
      was supposed to align the code with page migration page accounting
      assumptions, but what it has done instead is break XFS filesystems.
      Every fstests run I've done on sub-page block size XFS filesystems
      since picking up this commit 2 days ago has failed with bad page
      state errors such as:
      
      # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
      ....
      SECTION       -- xfs
      FSTYP         -- xfs (debug)
      PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
      MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
      MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
      
      generic/038 454s ...
       run fstests generic/038 at 2018-12-20 18:43:05
       XFS (sdc): Unmounting Filesystem
       XFS (sdc): Mounting V5 Filesystem
       XFS (sdc): Ending clean mount
       BUG: Bad page state in process kswapd0  pfn:3a7fa
       page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
       flags: 0xfffffc0000000()
       raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
       raw: 0000000000000001 0000000000000000 00000000ffffffff
       page dumped because: non-NULL mapping
       CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
       Call Trace:
        dump_stack+0x67/0x90
        bad_page.cold.116+0x8a/0xbd
        free_pcppages_bulk+0x4bf/0x6a0
        free_unref_page_list+0x10f/0x1f0
        shrink_page_list+0x49d/0xf50
        shrink_inactive_list+0x19d/0x3b0
        shrink_node_memcg.constprop.77+0x398/0x690
        ? shrink_slab.constprop.81+0x278/0x3f0
        shrink_node+0x7a/0x2f0
        kswapd+0x34b/0x6d0
        ? node_reclaim+0x240/0x240
        kthread+0x11f/0x140
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
       Disabling lock debugging due to kernel taint
      ....
      
      The failures come from anywhere that frees pages and empties the
      per-cpu page magazines, so it's not a predictable failure or an easy
      to debug failure.
      
      generic/038 is a reliable reproducer of this problem - it has a 9 in
      10 failure rate on one of my test machines. Failures on other
      machines have been at random points in fstests runs but every run
      has ended up tripping this problem. Hence generic/038 was used to
      bisect the failure because it was the most reliable failure.
      
      It is too close to the 4.20 release (not to mention holidays) to
      try to diagnose, fix and test the underlying cause of the problem,
      so reverting the commit is the only option we have right now. The
      revert has been tested against a current tot 4.20-rc7+ kernel across
      multiple machines running sub-page block size XFS filesystems and
      none of the bad page state failures have been seen.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      xfs: stringify scrub types in ftrace output · 86d163db
      Committed by Darrick J. Wong
      Use __print_symbolic to print the scrub type in ftrace output.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    •
      xfs: stringify btree cursor types in ftrace output · c494213f
      Committed by Darrick J. Wong
      Use __print_symbolic to print the btree type in ftrace output.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>