1. 28 1月, 2017 1 次提交
    • B
      xfs: prevent quotacheck from overloading inode lru · e0d76fa4
      Brian Foster 提交于
      Quotacheck runs at mount time in situations where quota accounting must
      be recalculated. In doing so, it uses bulkstat to visit every inode in
      the filesystem. Historically, every inode processed during quotacheck
      was released and immediately tagged for reclaim because quotacheck runs
      before the superblock is marked active by the VFS. In other words,
      the final iput() lead to an immediate ->destroy_inode() call, which
      allowed the XFS background reclaim worker to start reclaiming inodes.
      
      Commit 17c12bcd ("xfs: when replaying bmap operations, don't let
      unlinked inodes get reaped") marks the XFS superblock active sooner as
      part of the mount process to support caching inodes processed during log
      recovery. This occurs before quotacheck and thus means all inodes
      processed by quotacheck are inserted to the LRU on release.  The
      s_umount lock is held until the mount has completed and thus prevents
      the shrinkers from operating on the sb. This means that quotacheck can
      excessively populate the inode LRU and lead to OOM conditions on systems
      without sufficient RAM.
      
      Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
      inodes processed by quotacheck. This causes ->drop_inode() to return 1
      and in turn causes iput_final() to evict the inode. This preserves the
      original quotacheck behavior and prevents it from overloading the LRU
      and running out of memory.
      
      CC: stable@vger.kernel.org # v4.9
      Reported-by: NMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e0d76fa4
  2. 27 1月, 2017 1 次提交
    • D
      xfs: fix bmv_count confusion w/ shared extents · c364b6d0
      Darrick J. Wong 提交于
      In a bmapx call, bmv_count is the total size of the array, including the
      zeroth element that userspace uses to supply the search key.  The output
      array starts at offset 1 so that we can set up the user for the next
      invocation.  Since we now can split an extent into multiple bmap records
      due to shared/unshared status, we have to be careful that we don't
      overflow the output array.
      
      In the original patch f86f4037 ("xfs: teach get_bmapx about shared
      extents and the CoW fork") I used cur_ext (the output index) to check
      for overflows, albeit with an off-by-one error.  Since nexleft no longer
      describes the number of unfilled slots in the output, we can rip all
      that out and use cur_ext for the overflow check directly.
      
      Failure to do this causes heap corruption in bmapx callers such as
      xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      c364b6d0
  3. 26 1月, 2017 2 次提交
    • D
      xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b
      Darrick J. Wong 提交于
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      availalble memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      2aa6ba7b
    • C
      xfs: extsize hints are not unlikely in xfs_bmap_btalloc · 493611eb
      Christoph Hellwig 提交于
      With COW files they are the hotpath, just like for files with the
      extent size hint attribute.  We really shouldn't micro-manage anything
      but failure cases with unlikely.
      
      Additionally Arnd Bergmann recently reported that one of these two
      unlikely annotations causes link failures together with an upcoming
      kernel instrumentation patch, so let's get rid of it ASAP.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      493611eb
  4. 25 1月, 2017 4 次提交
  5. 24 1月, 2017 1 次提交
    • C
      xfs: fix COW writeback race · d2b3964a
      Christoph Hellwig 提交于
      Due to the way how xfs_iomap_write_allocate tries to convert the whole
      found extents from delalloc to real space we can run into a race
      condition with multiple threads doing writes to this same extent.
      For the non-COW case that is harmless as the only thing that can happen
      is that we call xfs_bmapi_write on an extent that has already been
      converted to a real allocation.  For COW writes where we move the extent
      from the COW to the data fork after I/O completion the race is, however,
      not quite as harmless.  In the worst case we are now calling
      xfs_bmapi_write on a region that contains hole in the COW work, which
      will trip up an assert in debug builds or lead to file system corruption
      in non-debug builds.  This seems to be reproducible with workloads of
      small O_DSYNC write, although so far I've not managed to come up with
      a with an isolated reproducer.
      
      The fix for the issue is relatively simple:  tell xfs_bmapi_write
      that we are only asked to convert delayed allocations and skip holes
      in that case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d2b3964a
  6. 19 1月, 2017 5 次提交
  7. 18 1月, 2017 8 次提交
  8. 17 1月, 2017 6 次提交
    • R
      ubifs: Fix journal replay wrt. xattr nodes · 1cb51a15
      Richard Weinberger 提交于
      When replaying the journal it can happen that a journal entry points to
      a garbage collected node.
      This is the case when a power-cut occurred between a garbage collect run
      and a commit. In such a case nodes have to be read using the failable
      read functions to detect whether the found node matches what we expect.
      
      One corner case was forgotten, when the journal contains an entry to
      remove an inode all xattrs have to be removed too. UBIFS models xattr
      like directory entries, so the TNC code iterates over
      all xattrs of the inode and removes them too. This code re-uses the
      functions for walking directories and calls ubifs_tnc_next_ent().
      ubifs_tnc_next_ent() expects to be used only after the journal and
      aborts when a node does not match the expected result. This behavior can
      render an UBIFS volume unmountable after a power-cut when xattrs are
      used.
      
      Fix this issue by using failable read functions in ubifs_tnc_next_ent()
      too when replaying the journal.
      Cc: stable@vger.kernel.org
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Reported-by: NRock Lee <rockdotlee@gmail.com>
      Reviewed-by: NDavid Gstir <david@sigma-star.at>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      1cb51a15
    • E
      ubifs: remove redundant checks for encryption key · 3d4b2fcb
      Eric Biggers 提交于
      In several places, ubifs checked for an encryption key before creating a
      file in an encrypted directory.  This was redundant with
      fscrypt_setup_filename() or ubifs_new_inode(), and in the case of
      ubifs_link() it broke linking to special files.  So remove the extra
      checks.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      3d4b2fcb
    • E
      ubifs: allow encryption ioctls in compat mode · a75467d9
      Eric Biggers 提交于
      The ubifs encryption ioctls did not work when called by a 32-bit program
      on a 64-bit kernel.  Since 'struct fscrypt_policy' is not affected by
      the word size, ubifs just needs to allow these ioctls through, like what
      ext4 and f2fs do.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      a75467d9
    • A
      ubifs: add CONFIG_BLOCK dependency for encryption · 404e0b63
      Arnd Bergmann 提交于
      This came up during the v4.10 merge window:
      
      warning: (UBIFS_FS_ENCRYPTION) selects FS_ENCRYPTION which has unmet direct dependencies (BLOCK)
      fs/crypto/crypto.c: In function 'fscrypt_zeroout_range':
      fs/crypto/crypto.c:355:9: error: implicit declaration of function 'bio_alloc';did you mean 'd_alloc'? [-Werror=implicit-function-declaration]
         bio = bio_alloc(GFP_NOWAIT, 1);
      
      The easiest way out is to limit UBIFS_FS_ENCRYPTION to configurations
      that also enable BLOCK.
      
      Fixes: d475a507 ("ubifs: Add skeleton for fscrypto")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      404e0b63
    • P
      ubifs: fix unencrypted journal write · 507502ad
      Peter Rosin 提交于
      Without this, I get the following on reboot:
      
      UBIFS error (ubi1:0 pid 703): ubifs_load_znode: bad target node (type 1) length (8240)
      UBIFS error (ubi1:0 pid 703): ubifs_load_znode: have to be in range of 48-4144
      UBIFS error (ubi1:0 pid 703): ubifs_load_znode: bad indexing node at LEB 13:11080, error 5
       magic          0x6101831
       crc            0xb1cb246f
       node_type      9 (indexing node)
       group_type     0 (no node group)
       sqnum          546
       len            128
       child_cnt      5
       level          0
       Branches:
       0: LEB 14:72088 len 161 key (133, inode)
       1: LEB 14:81120 len 160 key (134, inode)
       2: LEB 20:26624 len 8240 key (134, data, 0)
       3: LEB 14:81280 len 160 key (135, inode)
       4: LEB 20:34864 len 8240 key (135, data, 0)
      UBIFS warning (ubi1:0 pid 703): ubifs_ro_mode.part.0: switched to read-only mode, error -22
      CPU: 0 PID: 703 Comm: mount Not tainted 4.9.0-next-20161213+ #1197
      Hardware name: Atmel SAMA5
      [<c010d2ac>] (unwind_backtrace) from [<c010b250>] (show_stack+0x10/0x14)
      [<c010b250>] (show_stack) from [<c024df94>] (ubifs_jnl_update+0x2e8/0x614)
      [<c024df94>] (ubifs_jnl_update) from [<c0254bf8>] (ubifs_mkdir+0x160/0x204)
      [<c0254bf8>] (ubifs_mkdir) from [<c01a6030>] (vfs_mkdir+0xb0/0x104)
      [<c01a6030>] (vfs_mkdir) from [<c0286070>] (ovl_create_real+0x118/0x248)
      [<c0286070>] (ovl_create_real) from [<c0283ed4>] (ovl_fill_super+0x994/0xaf4)
      [<c0283ed4>] (ovl_fill_super) from [<c019c394>] (mount_nodev+0x44/0x9c)
      [<c019c394>] (mount_nodev) from [<c019c4ac>] (mount_fs+0x14/0xa4)
      [<c019c4ac>] (mount_fs) from [<c01b5338>] (vfs_kern_mount+0x4c/0xd4)
      [<c01b5338>] (vfs_kern_mount) from [<c01b6b80>] (do_mount+0x154/0xac8)
      [<c01b6b80>] (do_mount) from [<c01b782c>] (SyS_mount+0x74/0x9c)
      [<c01b782c>] (SyS_mount) from [<c0107f80>] (ret_fast_syscall+0x0/0x3c)
      UBIFS error (ubi1:0 pid 703): ubifs_mkdir: cannot create directory, error -22
      overlayfs: failed to create directory /mnt/ovl/work/work (errno: 22); mounting read-only
      
      Fixes: 7799953b ("ubifs: Implement encrypt/decrypt for all IO")
      Signed-off-by: NPeter Rosin <peda@axentia.se>
      Tested-by: NKevin Hilman <khilman@baylibre.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      507502ad
    • C
      ubifs: ensure zero err is returned on successful return · e8f19746
      Colin Ian King 提交于
      err is no longer being set on a successful return path, causing
      a garbage value being returned. Fix this by setting err to zero
      for the successful return path.
      
      Found with static analysis by CoverityScan, CID 1389473
      
      Fixes: 7799953b ("ubifs: Implement encrypt/decrypt for all IO")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      e8f19746
  9. 15 1月, 2017 2 次提交
    • D
      coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp 提交于
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated and can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4d22c75d
    • S
      aio: fix lock dep warning · a12f1ae6
      Shaohua Li 提交于
      lockdep reports a warnning. file_start_write/file_end_write only
      acquire/release the lock for regular files. So checking the files in aio
      side too.
      
      [  453.532141] ------------[ cut here ]------------
      [  453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
      [  453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
      [  453.533011] Modules linked in:
      [  453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
      [  453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
      [  453.533011]  ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
      [  453.533011]  ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
      [  453.533011]  ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
      [  453.533011] Call Trace:
      [  453.533011]  [<ffffffff8196cffb>] dump_stack+0x67/0x9c
      [  453.533011]  [<ffffffff81091ee1>] __warn+0x111/0x130
      [  453.533011]  [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
      [  453.533011]  [<ffffffff81091f00>] ? __warn+0x130/0x130
      [  453.533011]  [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
      [  453.533011]  [<ffffffff811205d4>] lock_release+0x434/0x670
      [  453.533011]  [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
      [  453.533011]  [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
      [  453.533011]  [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
      [  453.533011]  [<ffffffff813aa6c9>] aio_write+0x229/0x280
      [  453.533011]  [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
      [  453.533011]  [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
      [  453.533011]  [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
      [  453.533011]  [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
      [  453.533011]  [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
      [  453.533011]  [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
      [  453.533011]  [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
      [  453.533011]  [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
      [  453.533011]  [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
      [  453.533011]  [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  453.533011]  [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
      [  453.533011]  [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
      [  453.533011]  [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
      [  453.533011] ---[ end trace b2fbe664d1cc0082 ]---
      
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a12f1ae6
  10. 14 1月, 2017 2 次提交
  11. 13 1月, 2017 8 次提交