1. 14 Nov, 2018 32 commits
    • J
      fsnotify: Fix busy inodes during unmount · 778af261
      Jan Kara committed
      commit 721fb6fbfd2132164c2e8777cc837f9b2c1794dc upstream.
      
      Detaching of mark connector from fsnotify_put_mark() can race with
      unmounting of the filesystem like:
      
        CPU1				CPU2
      fsnotify_put_mark()
        spin_lock(&conn->lock);
        ...
        inode = fsnotify_detach_connector_from_object(conn)
        spin_unlock(&conn->lock);
      				generic_shutdown_super()
      				  fsnotify_unmount_inodes()
      				    sees connector detached for inode
      				      -> nothing to do
      				  evict_inode()
      				    barfs on pending inode reference
        iput(inode);
      
      Resulting in "Busy inodes after unmount" message and possible kernel
      oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode
      references from detached connectors.
      
      Note that the accounting of outstanding inode references in the
      superblock can cause some cacheline contention on the counter. OTOH it
      happens only during deletion of the last notification mark from an inode
      (or during unlinking of watched inode) and that is not too bad. I have
      measured time to create & delete inotify watch 100000 times from 64
      processes in parallel (each process having its own inotify group and its
      own file on a shared superblock) on a 64 CPU machine. Average and
      standard deviation of 15 runs look like:
      
      	Avg		Stddev
      Vanilla	9.817400	0.276165
      Fixed	9.710467	0.228294
      
      So there's no statistically significant difference.
      
      Fixes: 6b3f05d2 ("fsnotify: Detach mark from object list when last reference is dropped")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      778af261
    • A
      lockd: fix access beyond unterminated strings in prints · 86edf562
      Amir Goldstein committed
      commit 93f38b6f upstream.
      
      printk format used %*s instead of %.*s, so hostname_len does not limit
      the number of bytes accessed from hostname.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      86edf562
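The difference between the two format specifiers in the lockd fix can be reproduced in userspace C, since printf-style formatting distinguishes field width from precision in the same way. The following is a minimal sketch, not the lockd code: `format_hostname` and the `host=` prefix are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format at most `len` bytes of `name` into `out`.
 *
 * "%.*s" passes `len` as the precision, which caps the number of bytes
 * read from `name`, so `name` does not need a terminating NUL.
 * "%*s" would pass `len` as the minimum field width instead, and the
 * read would continue until a NUL byte is found.
 *
 * Hypothetical helper for illustration only.
 */
static int format_hostname(char *out, size_t outsz, const char *name, int len)
{
    return snprintf(out, outsz, "host=%.*s", len, name);
}
```

Calling `format_hostname(out, sizeof(out), raw, 4)` on a four-byte buffer without a terminator reads exactly those four bytes.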
    • A
      nfsd: correctly decrement odstate refcount in error path · b71f9663
      Andrew Elble committed
      commit bd8d725078867cda250fe94b9c5a067b4a64ca74 upstream.
      
      alloc_init_deleg() both allocates an nfs4_delegation, and
      bumps the refcount on odstate. So after this point, we need to
      put_clnt_odstate() and nfs4_put_stid() to not leave the odstate
      refcount inappropriately bumped.
      Signed-off-by: Andrew Elble <aweits@rit.edu>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b71f9663
    • B
      nfs: Fix a missed page unlock after pg_doio() · d99bbbf1
      Benjamin Coddington committed
      commit fdbd1a2e4a71adcb1ae219fcfd964930d77a7f84 upstream.
      
      We must check pg_error and call error_cleanup after any call to pg_doio.
      Currently, we are skipping the unlock of a page if we encounter an error in
      nfs_pageio_complete() before handing off the work to the RPC layer.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d99bbbf1
    • T
      NFSv4.1: Fix the r/wsize checking · 3a1c13e1
      Trond Myklebust committed
      commit 943cff67b842839f4f35364ba2db5c2d3f025d94 upstream.
      
      The intention of nfs4_session_set_rwsize() was to cap the r/wsize to the
      buffer sizes negotiated by the CREATE_SESSION. The initial code had a
      bug whereby we would not check the values negotiated by nfs_probe_fsinfo()
      (the assumption being that CREATE_SESSION will always negotiate buffer values
      that are sane w.r.t. the server's preferred r/wsizes) but would only check
      values set by the user in the 'mount' command.
      
      The code was changed in 4.11 to _always_ set the r/wsize, meaning that we
      now never use the server preferred r/wsizes. This is the regression that
      this patch fixes.
      Also rename the function to nfs4_session_limit_rwsize() in order to avoid
      future confusion.
      
      Fixes: 03385332 ("NFSv4.1 respect server's max size in CREATE_SESSION")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a1c13e1
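The distinction the r/wsize commit draws between "set" and "limit" can be sketched in a few lines of C. This is an illustrative model only: `limit_rwsize` and its parameters are hypothetical names, and the real function derives the session limit from the negotiated CREATE_SESSION buffer sizes.

```c
#include <stdint.h>

/*
 * Lower *size only when it exceeds the session limit. A smaller value,
 * such as the server's preferred size or one chosen at mount time, is
 * left untouched. Unconditionally assigning session_max here would
 * model the regression the commit describes.
 *
 * Hypothetical helper for illustration only.
 */
static void limit_rwsize(uint32_t *size, uint32_t session_max)
{
    if (*size > session_max)
        *size = session_max;
}
```

With a session limit of 1 MiB, a preferred size of 256 KiB stays at 256 KiB, while a requested 4 MiB is capped to 1 MiB.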
    • S
      smb3: on kerberos mount if server doesn't specify auth type use krb5 · 93e2e867
      Steve French committed
      commit 926674de upstream.
      
      Some servers (e.g. Azure) do not include a spnego blob in the SMB3
      negotiate protocol response, so on kerberos mounts ("sec=krb5")
      we can fail, as we expected the server to list its supported
      auth types (OIDs in the spnego blob in the negprot response).
      Change this so that on krb5 mounts we default to trying krb5 if the
      server doesn't list its supported protocol mechanisms.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      93e2e867
    • S
      smb3: do not attempt cifs operation in smb3 query info error path · 108b981d
      Steve French committed
      commit 1e77a8c2 upstream.
      
      If backupuid mount option is sent, we can incorrectly retry
      (on access denied on query info) with a cifs (FindFirst) operation
      on an smb3 mount which causes the server to force the session close.
      
      We set backup intent on open so no need for this fallback.
      
      See kernel bugzilla 201435
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      108b981d
    • S
      smb3: allow stats which track session and share reconnects to be reset · eb7814c3
      Steve French committed
      commit 2c887635 upstream.
      
      Currently, "echo 0 > /proc/fs/cifs/Stats" resets all of the stats
      except the session and share reconnect counts.  Fix it to
      reset those as well.
      
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb7814c3
    • C
      userfaultfd: disable irqs when taking the waitqueue lock · d2e97f02
      Christoph Hellwig committed
      commit ae62c16e105a869524afcf8a07ee85c5ae5d0479 upstream.
      
      userfaultfd contains home-grown locking of the waitqueue lock, and does
      not disable interrupts.  This relies on the fact that no one else takes it
      from interrupt context and violates an invariant of the normal waitqueue
      locking scheme.  With aio poll it is easy to trigger other locks that
      disable interrupts (or are called from interrupt context).
      
      Link: http://lkml.kernel.org/r/20181018154101.18750-1-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>	[4.19.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2e97f02
    • V
      mm: /proc/pid/smaps_rollup: fix NULL pointer deref in smaps_pte_range() · 30391e41
      Vlastimil Babka committed
      commit fa76da46 upstream.
      
      Leonardo reports an apparent regression in 4.19-rc7:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
       Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
       RIP: 0010:smaps_pte_range+0x32d/0x540
       Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 <48> 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
       RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
       RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
       RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
       R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
       R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
       FS:  00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        __walk_page_range+0x3c2/0x6f0
        walk_page_vma+0x42/0x60
        smap_gather_stats+0x79/0xe0
        ? gather_pte_stats+0x320/0x320
        ? gather_hugetlb_stats+0x70/0x70
        show_smaps_rollup+0xcd/0x1c0
        seq_read+0x157/0x400
        __vfs_read+0x3a/0x180
        ? security_file_permission+0x93/0xc0
        ? security_file_permission+0x93/0xc0
        vfs_read+0x8f/0x140
        ksys_read+0x55/0xc0
        __x64_sys_read+0x1a/0x20
        do_syscall_64+0x5a/0x110
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Decoded code matched to local compilation+disassembly points to
      smaps_pte_entry():
      
              } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
                                                              && pte_none(*pte))) {
                      page = find_get_entry(vma->vm_file->f_mapping,
                                                      linear_page_index(vma, addr));
      
      Here, vma->vm_file is NULL.  mss->check_shmem_swap should be false in that
      case, however for smaps_rollup, smap_gather_stats() can set the flag true
      for one vma and leave it true for subsequent vma's where it should be
      false.
      
      To fix, reset the check_shmem_swap flag to false.  There's also a related
      bug which sets mss->swap to shmem_swapped, which in the context of
      smaps_rollup overwrites any value accumulated from previous vma's.  Fix
      that as well.
      
      Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
      which makes the 4.19 series ending with commit 258f669e ("mm:
      /proc/pid/smaps_rollup: convert to single value seq_file") suspicious.
      But the mss was reused for rollup since 493b0e9d ("mm: add
      /proc/pid/smaps_rollup") so let's play it safe with the stable backport.
      
      Link: http://lkml.kernel.org/r/555fbd1f-4ac9-0b58-dcd4-5dc4380ff7ca@suse.cz
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
      Fixes: 493b0e9d ("mm: add /proc/pid/smaps_rollup")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
      Tested-by: Leonardo Soares Müller <leozinho29_eu@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30391e41
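The two smaps_rollup problems described above (a per-mapping flag that leaks into later mappings, and a total that is overwritten instead of accumulated) can be modelled outside the kernel. The sketch below is a simplified model that does not mirror the kernel's exact conditions; `vma_sketch`, `stats_sketch`, `gather_one` and `gather_all` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a mapping and for the rollup statistics. */
struct vma_sketch {
    bool is_shmem;               /* mapping is backed by shmem */
    unsigned long shmem_swapped; /* swapped-out shmem pages */
};

struct stats_sketch {
    bool check_shmem_swap;       /* describes the current mapping only */
    unsigned long swap;          /* running total over all mappings */
};

static void gather_one(struct stats_sketch *mss, const struct vma_sketch *vma)
{
    /* Reset first, so a value set for an earlier mapping cannot carry over. */
    mss->check_shmem_swap = false;

    if (vma->is_shmem) {
        mss->check_shmem_swap = true;
        /* Accumulate with +=; plain assignment would discard earlier totals. */
        mss->swap += vma->shmem_swapped;
    }
}

/* Roll up statistics over n mappings, reusing one stats structure. */
static void gather_all(struct stats_sketch *mss,
                       const struct vma_sketch *vmas, size_t n)
{
    for (size_t i = 0; i < n; i++)
        gather_one(mss, &vmas[i]);
}
```

Because the same `stats_sketch` is reused for every mapping, both the reset and the `+=` are needed for the rollup to be correct.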
    • J
      crypto: speck - remove Speck · 3252b60c
      Jason A. Donenfeld committed
      commit 578bdaabd015b9b164842c3e8ace9802f38e7ecc upstream.
      
      These are unused, undesired, and have never actually been used by
      anybody. The original authors of this code have changed their mind about
      its inclusion. While originally proposed for disk encryption on low-end
      devices, the idea was discarded [1] in favor of something else before
      that could really get going. Therefore, this patch removes Speck.
      
      [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659

      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Eric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3252b60c
    • T
      ext4: fix use-after-free race in ext4_remount()'s error path · 15f255ec
      Theodore Ts'o committed
      commit 33458eaba4dfe778a426df6a19b7aad2ff9f7eec upstream.
      
      It's possible for ext4_show_quota_options() to try reading
      s_qf_names[i] while it is being modified by ext4_remount() --- most
      notably, in ext4_remount's error path when the original values of the
      quota file names get restored.
      
      Reported-by: syzbot+a2872d6feea6918008a9@syzkaller.appspotmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.2+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      15f255ec
    • W
      ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR · ce1daaa8
      Wang Shilong committed
      commit 182a79e0c17147d2c2d3990a9a7b6b58a1561c7a upstream.
      
      We return most failures of dquot_initialize() except during inode
      eviction. That makes some sense: for example, we allow file removal
      even if the quota files are broken.

      But it doesn't make sense to allow setting the project
      if the quota files etc. are broken.

      Signed-off-by: Wang Shilong <wshilong@ddn.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce1daaa8
    • W
      ext4: fix setattr project check in fssetxattr ioctl · 0d0413e9
      Wang Shilong committed
      commit dc7ac6c4cae3b58724c2f1e21a7c05ce19ecd5a8 upstream.
      
      Currently, the project quota can be changed by the fssetxattr
      ioctl, and the existing permission check inode_owner_or_capable()
      is obviously not enough: ordinary users could change the
      project ID of a file, which would let them break project
      quota easily.

      This patch tries to follow the same rule as xfs project
      quota:
      
      "Project Quota ID state is only allowed to change from
      within the init namespace. Enforce that restriction only
      if we are trying to change the quota ID state.
      Everything else is allowed in user namespaces."
      
      Besides that, checking and setting the project ID state should
      be an atomic operation, so protect the whole operation with the
      inode lock. ext4_ioctl_setproject() is only used for the
      ioctl EXT4_IOC_FSSETXATTR; we already hold mnt_want_write_file()
      before ext4_ioctl_setflags(), and ext4_ioctl_setproject()
      is called after ext4_ioctl_setflags(), so we can share the
      code and remove it from inside ext4_ioctl_setproject().

      Signed-off-by: Wang Shilong <wshilong@ddn.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d0413e9
    • L
      ext4: initialize retries variable in ext4_da_write_inline_data_begin() · 99a3b224
      Lukas Czerner committed
      commit 625ef8a3acd111d5f496d190baf99d1a815bd03e upstream.
      
      The variable retries is not initialized in ext4_da_write_inline_data_begin(),
      which can lead to a nondeterministic number of retries in case we hit
      ENOSPC. Initialize retries to zero as we do everywhere else.

      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Fixes: bc0ca9df ("ext4: retry allocation when inline->extent conversion failed")
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      99a3b224
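Why an uninitialized retry counter makes the number of retries unpredictable can be shown with a bounded retry loop in plain C. This is a sketch under stated assumptions, not the ext4 code: `MAX_RETRIES`, `alloc_stub` and `write_begin_sketch` are hypothetical, and the bound of three retries is chosen only for illustration.

```c
#include <errno.h>

#define MAX_RETRIES 3 /* illustrative bound, not taken from ext4 */

/* Stub standing in for an allocation: fails while *failures_left > 0. */
static int alloc_stub(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENOSPC;
    }
    return 0;
}

/*
 * Retry the allocation on -ENOSPC, at most MAX_RETRIES extra times.
 * `retries` must start at zero: if it held an arbitrary stack value,
 * the comparison below could allow far more or far fewer retries.
 */
static int write_begin_sketch(int (*try_alloc)(void *), void *ctx,
                              int *attempts)
{
    int retries = 0;
    int ret;

    do {
        ret = try_alloc(ctx);
        (*attempts)++;
    } while (ret == -ENOSPC && retries++ < MAX_RETRIES);

    return ret;
}
```

With the counter starting at zero, the loop makes one initial attempt plus at most `MAX_RETRIES` retries, regardless of what was previously on the stack.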
    • T
      ext4: fix EXT4_IOC_SWAP_BOOT · b2af09dd
      Theodore Ts'o committed
      commit 18aded17492088962ef43f00825179598b3e8c58 upstream.
      
      The code for the EXT4_IOC_SWAP_BOOT ioctl hasn't been updated in a while, and
      it's a bit broken with respect to more modern ext4 kernels, especially
      metadata checksums.
      
      Other problems fixed with this commit:
      
      * Don't allow installing a DAX, swap file, or an encrypted file as a
        boot loader.
      
      * Respect the immutable and append-only flags.
      
      * Wait until any DIO operations are finished *before* calling
        truncate_inode_pages().
      
      * Don't swap inode->i_flags, since these flags have nothing to do with
        the inode blocks --- and it will give the IMA/audit code heartburn
        when the inode is evicted.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reported-by: syzbot+e81ccd4744c6c4f71354@syzkaller.appspotmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2af09dd
    • A
      gfs2_meta: ->mount() can get NULL dev_name · 8c448126
      Al Viro committed
      commit 3df629d873f8683af6f0d34dfc743f637966d483 upstream.
      
      get in sync with mount_bdev() handling of the same
      
      Reported-by: syzbot+c54f8e94e6bba03b04e9@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c448126
    • J
      jbd2: fix use after free in jbd2_log_do_checkpoint() · 25881163
      Jan Kara committed
      commit ccd3c4373eacb044eb3832966299d13d2631f66f upstream.
      
      The code cleaning transaction's lists of checkpoint buffers has a bug
      where it increases bh refcount only after releasing
      journal->j_list_lock. Thus the following race is possible:
      
      CPU0					CPU1
      jbd2_log_do_checkpoint()
      					jbd2_journal_try_to_free_buffers()
      					  __journal_try_to_free_buffer(bh)
        ...
        while (transaction->t_checkpoint_io_list)
        ...
          if (buffer_locked(bh)) {
      
      <-- IO completes now, buffer gets unlocked -->
      
            spin_unlock(&journal->j_list_lock);
      					    spin_lock(&journal->j_list_lock);
      					    __jbd2_journal_remove_checkpoint(jh);
      					    spin_unlock(&journal->j_list_lock);
      					  try_to_free_buffers(page);
            get_bh(bh) <-- accesses freed bh
      
      Fix the problem by grabbing bh reference before unlocking
      journal->j_list_lock.
      
      Fixes: dc6e8d66 ("jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint()")
      Fixes: be1158cc ("jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()")
      Reported-by: syzbot+7f4a27091759e2fe7453@syzkaller.appspotmail.com
      CC: stable@vger.kernel.org
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      25881163
    • C
      f2fs: fix to account IO correctly · 5cfb4a68
      Chao Yu committed
      commit 4c58ed076875f36dae0f240da1e25e99e5d4afb8 upstream.
      
      The race below can cause a reversed reference on the dirty count; fix it by
      relocating __submit_bio() and inc_page_count().
      
      Thread A				Thread B
      - f2fs_inplace_write_data
       - f2fs_submit_page_bio
        - __submit_bio
      					- f2fs_write_end_io
      					 - dec_page_count
        - inc_page_count
      
      Cc: <stable@vger.kernel.org>
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cfb4a68
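The ordering problem in the f2fs accounting fix, where a completion handler can decrement a counter before the submitter has incremented it, can be modelled single-threaded by letting the "device" complete the I/O inline. This is a minimal sketch with hypothetical names (`io_counter`, `end_io`, `submit`, `submit_counted`), not the f2fs implementation.

```c
/* Hypothetical in-flight counter; min_seen records the lowest value observed. */
struct io_counter {
    int in_flight;
    int min_seen;
};

/* Completion handler: drops the count taken for this I/O. */
static void end_io(struct io_counter *c)
{
    c->in_flight--;
    if (c->in_flight < c->min_seen)
        c->min_seen = c->in_flight;
}

/* Models a device that completes the I/O before submit() returns. */
static void submit(struct io_counter *c)
{
    end_io(c);
}

/*
 * Account for the I/O first, then submit it. With the two statements
 * swapped, end_io() would run against a count that does not yet
 * include this I/O and the counter would briefly go negative.
 */
static void submit_counted(struct io_counter *c)
{
    c->in_flight++;
    submit(c);
}
```

After `submit_counted()` returns, `in_flight` is back at its starting value and `min_seen` shows that it never dropped below it.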
    • C
      f2fs: fix to recover cold bit of inode block during POR · 56db43fd
      Chao Yu committed
      commit ef2a007134b4eaa39264c885999f296577bc87d2 upstream.
      
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +A /mnt/f2fs/file
      6. xfs_io -f /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. chattr -A /mnt/f2fs/file
      11. xfs_io -f /mnt/f2fs/file -c "fsync"
      12. umount /mnt/f2fs
      13. mount -t f2fs /dev/sdd /mnt/f2fs
      14. lsattr /mnt/f2fs/file
      
      -----------------N- /mnt/f2fs/file
      
      But actually, we expect the correct result to be:
      
      -------A---------N- /mnt/f2fs/file
      
      The reason is that in step 9) we failed to recover the cold bit flag in the
      inode block, so later, in fsync, we skip writing the inode block due to the
      condition check below, resulting in data loss in another SPOR.
      
      f2fs_fsync_node_pages()
      	if (!IS_DNODE(page) || !is_cold_node(page))
      		continue;
      
      Note that I guess some non-dir inodes have already lost the cold bit
      during POR, so in order to re-enable recovery for those inodes, let's
      try to recover the cold bit in f2fs_iget() to save more fsynced data.
      
      Fixes: c5667575 ("f2fs: remove unneeded set_cold_node()")
      Cc: <stable@vger.kernel.org> 4.17+
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      56db43fd
    • J
      f2fs: fix missing up_read · d224115a
      Jaegeuk Kim committed
      commit 89d13c38501df730cbb2e02c4499da1b5187119d upstream.
      
      This patch fixes missing up_read call.
      
      Fixes: c9b60788 ("f2fs: fix to do sanity check with block address in main area")
      Cc: <stable@vger.kernel.org> # 4.19+
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d224115a
    • J
      Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()" · 4561194f
      Jaegeuk Kim committed
      commit 164a63fa6b384e30ceb96ed80bc7dc3379bc0960 upstream.
      
      This reverts commit 66110abc.
      
      If we clear the cold data flag out of the writeback flow, we can miscount
      -1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
      heavy GC.
      
      Balancing F2FS Async:
       - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...
      
      GC thread:                              IRQ
      - move_data_page()
       - set_page_dirty()
        - clear_cold_data()
                                              - f2fs_write_end_io()
                                               - type = WB_DATA_TYPE(page);
                                                 here, we get wrong type
                                               - dec_page_count(sbi, type);
       - f2fs_wait_on_page_writeback()
      
      Cc: <stable@vger.kernel.org>
      Reported-and-Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4561194f
    • C
      f2fs: fix to flush all dirty inodes recovered in readonly fs · cd295fdd
      Chao Yu committed
      [ Upstream commit 1378752b9921e60749eaf18ec6c47b33f9001abb ]
      
      generic/417 reported the following:
      
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      Call Trace:
       ? _raw_spin_unlock+0x2c/0x50
       evict+0xa8/0x170
       dispose_list+0x34/0x40
       evict_inodes+0x118/0x120
       generic_shutdown_super+0x41/0x100
       ? rcu_read_lock_sched_held+0x97/0xa0
       kill_block_super+0x22/0x50
       kill_f2fs_super+0x6f/0x80 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
      
      It can be reproduced simply with these scripts:
      
      Enable quota feature during mkfs.
      
      Testcase1:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
      4. godown /mnt/f2fs
      5. umount /mnt/f2fs
      6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      7. umount /mnt/f2fs
      
      Testcase2:
      1. mkfs.f2fs /dev/zram0
      2. mount -t f2fs /dev/zram0 /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. create process[pid = x] do:
      	a) open /mnt/f2fs/file;
      	b) unlink /mnt/f2fs/file
      5. godown -f /mnt/f2fs
      6. kill process[pid = x]
      7. umount /mnt/f2fs
      8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
      9. umount /mnt/f2fs
      
      The reason is: during recovery, i_{c,m}time of inode will be updated, then
      the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
      global list, so later write_checkpoint will not flush such dirty inode into
      node page.
      
      Once umount is called, sync_filesystem() in generic_shutdown_super() will
      skip syncing dirty inodes due to the sb_rdonly check, leaving dirty inodes
      there.

      To solve this issue, during umount, remove the SB_RDONLY flag from
      sb->s_flags, to make sure sync_filesystem() will not be skipped.

      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd295fdd
    • Y
      f2fs: report error if quota off error during umount · cfc8a57a
      Yunlei He committed
      [ Upstream commit cda9cc595f0bb6ffa51a4efc4b6533dfa4039b4c ]
      
      Now, we depend on fsck to ensure quota file data is OK,
      so we scan the whole partition if the checkpoint lacks the umount
      flag. The same applies to the quota off error case, which may leave
      quota file data inconsistent.
      
      generic/019 reports below error:
      
       __quota_error: 1160 callbacks suppressed
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       Quota error (device zram1): write_blk: dquota write failed
       Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
       VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have a nice day...
      
      If we fail in the path below because writing the dquot block fails, we
      miss releasing the quota inode; fix it.
      
      - f2fs_put_super
       - f2fs_quota_off_umount
        - f2fs_quota_off
         - f2fs_quota_sync   <-- failed
         - dquot_quota_off   <-- missed to call
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfc8a57a
    • Z
      f2fs: avoid sleeping under spin_lock · 207093ca
      Zhikang Zhang committed
      [ Upstream commit b430f7263673eab1dc40e662ae3441a9619d16b8 ]
      
      In the call trace below, we might sleep in function dput().
      
      So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
      from __try_update_largest_extent && __drop_largest_extent.
      
      BUG: sleeping function called from invalid context at fs/dcache.c:796
      Call trace:
      	dump_backtrace+0x0/0x3f4
      	show_stack+0x24/0x30
      	dump_stack+0xe0/0x138
      	___might_sleep+0x2a8/0x2c8
      	__might_sleep+0x78/0x10c
      	dput+0x7c/0x750
      	block_dump___mark_inode_dirty+0x120/0x17c
      	__mark_inode_dirty+0x344/0x11f0
      	f2fs_mark_inode_dirty_sync+0x40/0x50
      	__insert_extent_tree+0x2e0/0x2f4
      	f2fs_update_extent_tree_range+0xcf4/0xde8
      	f2fs_update_extent_cache+0x114/0x12c
      	f2fs_update_data_blkaddr+0x40/0x50
      	write_data_page+0x150/0x314
      	do_write_data_page+0x648/0x2318
      	__write_data_page+0xdb4/0x1640
      	f2fs_write_cache_pages+0x768/0xafc
      	__f2fs_write_data_pages+0x590/0x1218
      	f2fs_write_data_pages+0x64/0x74
      	do_writepages+0x74/0xe4
      	__writeback_single_inode+0xdc/0x15f0
      	writeback_sb_inodes+0x574/0xc98
      	__writeback_inodes_wb+0x190/0x204
      	wb_writeback+0x730/0xf14
      	wb_check_old_data_flush+0x1bc/0x1c8
      	wb_workfn+0x554/0xf74
      	process_one_work+0x440/0x118c
      	worker_thread+0xac/0x974
      	kthread+0x1a0/0x1c8
      	ret_from_fork+0x10/0x1c
      Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      207093ca
    • C
      f2fs: fix to recover inode's i_flags during POR · b1097eb9
      Chao Yu committed
      [ Upstream commit 19c73a691ccf6fb2f12d4e9cf9830023966cec88 ]
      
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +A /mnt/f2fs/file
      6. xfs_io -f /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. lsattr /mnt/f2fs/file
      
      -----------------N- /mnt/f2fs/file
      
      But actually, we expect the correct result to be:
      
      -------A---------N- /mnt/f2fs/file
      
      The reason is that we didn't recover the inode.i_flags field during mount;
      fix it.

      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1097eb9
    • C
      f2fs: fix to recover inode's crtime during POR · 5ce5de03
      Chao Yu committed
      [ Upstream commit 5cd1f387a13b5188b4edb4c834310302a85a6ea2 ]
      
      Testcase to reproduce this bug:
      1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. xfs_io -f /mnt/f2fs/file -c "fsync"
      5. godown /mnt/f2fs
      6. umount /mnt/f2fs
      7. mount -t f2fs /dev/sdd /mnt/f2fs
      8. xfs_io -f /mnt/f2fs/file -c "statx -r"
      
      stat.btime.tv_sec = 0
      stat.btime.tv_nsec = 0
      
      This patch recovers the inode creation time fields during
      mount.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ce5de03
    • T
      ext4: fix argument checking in EXT4_IOC_MOVE_EXT · 3d267c56
      Theodore Ts'o committed
      [ Upstream commit f18b2b83a727a3db208308057d2c7945f368e625 ]
      
      If the starting block number of either the source or destination file
      exceeds the EOF, EXT4_IOC_MOVE_EXT should return EINVAL.
      
      Also fixed the helper function mext_check_coverage() so that if the
      logical block is beyond EOF, it returns immediately instead of
      looping until the block number wraps all the way around.  This takes
      long enough that if there are multiple threads trying to pound on
      the same inode doing nonsensical things, it can end up triggering
      the kernel's soft lockup detector.
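
      The argument check described above can be sketched in plain C. This
      is an illustration only, not the ext4 code: toy_check_move_args() and
      its parameters are hypothetical names, and treating a start block
      equal to the EOF block count as "beyond EOF" is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: reject a move request whose starting block lies
 * at or past the end of either file. The *_eof_blocks arguments are
 * assumed to be the file sizes expressed in blocks.
 */
static int toy_check_move_args(uint64_t orig_start, uint64_t donor_start,
                               uint64_t orig_eof_blocks,
                               uint64_t donor_eof_blocks)
{
    /* Fail fast instead of scanning blocks that cannot exist. */
    if (orig_start >= orig_eof_blocks || donor_start >= donor_eof_blocks)
        return -EINVAL;
    return 0;
}
```

      Rejecting the request up front means no later loop ever has to walk
      block numbers past the end of the file.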
      
      Reported-by: syzbot+c61979f6f2cba5cb3c06@syzkaller.appspotmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d267c56
    • J
      f2fs: clear PageError on the read path · a28549b8
      Jaegeuk Kim committed
      [ Upstream commit fb7d70db305a1446864227abf711b756568f8242 ]
      
      When running fault injection test, I hit somewhat wrong behavior in f2fs_gc ->
      gc_data_segment():
      
      0. fault injection generated some PageError'ed pages
      
      1. gc_data_segment
       -> f2fs_get_read_data_page(REQ_RAHEAD)
      
      2. move_data_page
       -> f2fs_get_lock_data_page()
        -> f2fs_get_read_data_page()
         -> f2fs_submit_page_read()
          -> submit_bio(READ)
        -> return EIO due to PageError
        -> fail to move data
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a28549b8
    • C
      f2fs: fix to account IO correctly for cgroup writeback · 64c90d9c
      Chao Yu committed
      [ Upstream commit 78efac53 ]
      
      Now that we support cgroup writeback, it depends on correct IO
      accounting by the filesystem in question.
      
      But in commit d1b3e72d ("f2fs: submit bio of in-place-update pages"),
      we split write paths from f2fs_submit_page_mbio() to two:
      - f2fs_submit_page_bio() for IPU path
      - f2fs_submit_page_mbio() for OPU path
      
      But we still account write IO only in f2fs_submit_page_mbio(), resulting
      in incorrect IO accounting. Fix it by adding the missing IO accounting
      in the IPU path.
      
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      64c90d9c
    • R
      cifs: fix a credits leak for compound commands · c24f57c6
      Ronnie Sahlberg committed
      [ Upstream commit cb5c2e63 ]
      
      When processing the mids for compounds we would only add credits based on
      the last successful mid in the compound, which would leak credits and
      eventually trigger a reconnect.
      
      Fix this by splitting the mid processing part into two loops instead of one
      where the first loop just waits for all mids and then counts how many
      credits we were granted for the whole compound.
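
      The difference between the two shapes can be sketched in userspace C.
      This is not the cifs code: struct toy_mid and both functions are
      hypothetical, and the "old" single-loop shape is an assumption
      reconstructed from the description above.

```c
#include <stddef.h>

/* Hypothetical stand-in for one response in a compound request. */
struct toy_mid {
    unsigned int credits_granted; /* credits carried by this response */
    int status;                   /* 0 on success, negative on error */
};

/* Assumed old shape: one loop, only the last successful mid counts. */
static unsigned int credits_from_last_mid(const struct toy_mid *mids, size_t n)
{
    unsigned int credits = 0;

    for (size_t i = 0; i < n; i++) {
        if (mids[i].status != 0)
            break;
        credits = mids[i].credits_granted; /* overwrites, never sums */
    }
    return credits;
}

/* Sketch of the fixed shape: count credits first, then handle results. */
static unsigned int credits_for_compound(const struct toy_mid *mids, size_t n,
                                         int *first_error)
{
    unsigned int credits = 0;

    /* Loop 1: every response contributes its credits. */
    for (size_t i = 0; i < n; i++)
        credits += mids[i].credits_granted;

    /* Loop 2: per-mid result handling, here reduced to the first error. */
    *first_error = 0;
    for (size_t i = 0; i < n; i++) {
        if (mids[i].status != 0) {
            *first_error = mids[i].status;
            break;
        }
    }
    return credits;
}
```

      With three responses granting one credit each, the single-loop shape
      reports one credit while the two-loop shape reports three, which is
      the leak the message describes.
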
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c24f57c6
    • H
      jffs2: free jffs2_sb_info through jffs2_kill_sb() · f5f578eb
      Hou Tao committed
      commit 92e2921f upstream.
      
      When an invalid mount option is passed to jffs2, jffs2_parse_options()
      will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will
      be used (use-after-free) and freed (double-free) in jffs2_kill_sb().
      
      Fix it by removing the buggy invocation of kfree() when getting invalid
      mount options.
      
      Fixes: 92abc475 ("jffs2: implement mount option parsing and compression overriding")
      Cc: stable@kernel.org
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5f578eb
  2. 18 Oct, 2018 3 commits
    • E
      fscache: Fix out of bound read in long cookie keys · fa520c47
      Eric Sandeen committed
      fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
      
       BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
       Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
      
      and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
      
        BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
        BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
        Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
      
      This happens for any index_key_len which is not divisible by 4 and is
      larger than the size of the inline key, because the code allocates exactly
      index_key_len for the key buffer, but the hashing loop is stepping through
      it 4 bytes (u32) at a time in the buf[] array.
      
      Fix this by calculating how many u32 buffers we'll need by using
      DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
      buffer to hold the index_key, then using that same count as the hashing
      index limit.
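
      A minimal userspace sketch of that sizing rule follows. It is not the
      fscache code: toy_hash(), key_words() and hash_key() are hypothetical
      names, calloc() stands in for kcalloc(), and the hash itself is a
      placeholder.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Placeholder hash over whole 32-bit words (not the fscache hash). */
static uint32_t toy_hash(const uint32_t *buf, size_t nwords)
{
    uint32_t h = 0;

    for (size_t i = 0; i < nwords; i++)
        h = h * 31u + buf[i];
    return h;
}

/* Number of 32-bit words needed to hold key_len bytes. */
static size_t key_words(size_t key_len)
{
    return DIV_ROUND_UP(key_len, sizeof(uint32_t));
}

/*
 * Copy the key into a zeroed buffer sized in whole words, then hash
 * exactly that many words. Returns 0 on success, -1 if allocation fails.
 */
static int hash_key(const void *key, size_t key_len, uint32_t *out)
{
    size_t nwords = key_words(key_len);
    uint32_t *buf = calloc(nwords ? nwords : 1, sizeof(*buf));

    if (!buf)
        return -1;
    memcpy(buf, key, key_len);      /* tail bytes stay zero */
    *out = toy_hash(buf, nwords);   /* same count bounds the loop */
    free(buf);
    return 0;
}
```

      Because the allocation and the loop bound share one word count, a
      17-byte key gets a 20-byte buffer and the hash reads five words, none
      of them past the end of the allocation.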
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa520c47
    • D
      fscache: Fix incomplete initialisation of inline key space · 1ff22883
      David Howells committed
      The inline key in struct fscache_cookie is insufficiently initialized,
      zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
      bytes will end up hashing uninitialized memory because the memcpy only
      partially fills the last buf[] element.
      
      Fix this by clearing fscache_cookie objects on allocation rather than using
      the slab constructor to initialise them.  We're going to pretty much fill
      in the entire struct anyway, so bringing it into our dcache writably
      shouldn't incur much overhead.
      
      This removes the need to do clearance in fscache_set_key() (where we aren't
      doing it correctly anyway).
      
      Also, we don't need to set cookie->key_len in fscache_set_key() as we
      already did it in the only caller, so remove that.
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ff22883
    • A
      cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) · 169b8033
      Al Viro committed
      the victim might've been rmdir'ed just before the lock_rename();
      unlike the normal callers, we do not look the source up after the
      parents are locked - we know it beforehand and just recheck that it's
      still the child of what used to be its parent.  Unfortunately,
      the check is too weak - we don't spot a dead directory since its
      ->d_parent is unchanged, dentry is positive, etc.  So we sail all
      the way to ->rename(), with hosting filesystems _not_ expecting
      to be asked to rename an rmdir'ed subdirectory.
      
      The fix is easy, fortunately - the lock on parent is sufficient for
      making IS_DEADDIR() on child safe.
      
      Cc: stable@vger.kernel.org
      Fixes: 9ae326a6 (CacheFiles: A cache that backs onto a mounted filesystem)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      169b8033
  3. 15 Oct, 2018 1 commit
    • D
      afs: Fix clearance of reply · f0a7d188
      David Howells committed
      The recent patch to fix the afs_server struct leak didn't actually fix the
      bug, but rather fixed some of the symptoms.  The problem is that an
      asynchronous call that holds a resource pointed to by call->reply[0] will
      find the pointer cleared in the call destructor, thereby preventing the
      resource from being cleaned up.
      
      In the case of the server record leak, the afs_fs_get_capabilities()
      function in devel code sets up a call with reply[0] pointing at the server
      record that should be altered when the result is obtained, but this was
      being cleared before the destructor was called, so the put in the
      destructor does nothing and the record is leaked.
      
      Commit f014ffb0 removed the additional ref obtained by
      afs_install_server(), but the removal of this ref is actually used by the
      garbage collector to mark a server record as being defunct after the record
      has expired through lack of use.
      
      The offending clearance of call->reply[0] upon completion in
      afs_process_async_call() has been there from the origin of the code, but
      none of the asynchronous calls actually use that pointer currently, so it
      should be safe to remove (note that synchronous calls don't involve this
      function).
      
      Fix this by the following means:
      
       (1) Revert commit f014ffb0.
      
       (2) Remove the clearance of reply[0] from afs_process_async_call().
      
      Without this, afs_manage_servers() will suffer an assertion failure if it
      sees a server record that didn't get used because the usage count is not 1.
      
      Fixes: f014ffb0 ("afs: Fix afs_server struct leak")
      Fixes: 08e0e7c8 ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0a7d188
  4. 13 Oct, 2018 3 commits
  5. 12 Oct, 2018 1 commit
    • D
      afs: Fix afs_server struct leak · f014ffb0
      David Howells committed
      Fix a leak of afs_server structs.  The routine that installs them in the
      various lookup lists and trees gets a ref on leaving the function, whether
      it added the server or a server already exists.  It shouldn't increment
      the refcount if it added the server.
      
      The effect of this is that "rmmod kafs" will hang waiting for the leaked
      server to become unused.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f014ffb0