1. 22 6月, 2015 4 次提交
  2. 21 6月, 2015 2 次提交
    • T
      ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae
      Theodore Ts'o 提交于
      In order to prevent quota block tracking to be inaccurate when
      ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
      file can now use the reserved block (since the quota file is arguably
      file system metadata), and ext4_quota_write() now uses
      ext4_should_retry_alloc() to retry the block allocation after a commit
      has completed and released some blocks for allocation.
      
      This fixes failures of xfstests generic/270:
      
      Quota error (device vdc): write_blk: dquota write failed
      Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c5e298ae
    • T
      ext4: call sync_blockdev() before invalidate_bdev() in put_super() · 89d96a6f
      Theodore Ts'o 提交于
      Normally all of the buffers will have been forced out to disk before
      we call invalidate_bdev(), but there will be some cases, where a file
      system operation was aborted due to an ext4_error(), where there may
      still be some dirty buffers in the buffer cache for the device.  So
      try to force them out to memory before calling invalidate_bdev().
      
      This fixes a warning triggered by generic/081:
      
      WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      89d96a6f
  3. 16 6月, 2015 1 次提交
    • A
      ext4: improve warning directory handling messages · b03a2f7e
      Andreas Dilger 提交于
      Several ext4_warning() messages in the directory handling code do not
      report the inode number of the (potentially corrupt) directory where a
      problem is seen, and others report this in an ad-hoc manner.  Add an
      ext4_warning_inode() helper to print the inode number and command name
      consistent with ext4_error_inode().
      
      Consolidate the place in ext4.h that these macros are defined.
      
      Clean up some other directory error and warning messages to print the
      calling function name.
      
      Minor code style fixes in nearby lines.
      Signed-off-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b03a2f7e
  4. 15 6月, 2015 3 次提交
    • R
      ext4: mballoc: avoid 20-argument function call · 97b4af2f
      Rasmus Villemoes 提交于
      Making a function call with 20 arguments is rather expensive in both
      stack and .text. In this case, doing the formatting manually doesn't
      make it any less readable, so we might as well save 155 bytes of .text
      and 112 bytes of stack.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      97b4af2f
    • L
      ext4: wait for existing dio workers in ext4_alloc_file_blocks() · 0d306dcf
      Lukas Czerner 提交于
      Currently existing dio workers can jump in and potentially increase
      extent tree depth while we're allocating blocks in
      ext4_alloc_file_blocks().  This may cause us to underestimate the
      number of credits needed for the transaction because the extent tree
      depth can change after our estimation.
      
      Fix this by waiting for all the existing dio workers in the same way
      as we do it in ext4_punch_hole.  We've seen errors caused by this in
      xfstest generic/299, however it's really hard to reproduce.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0d306dcf
    • L
      ext4: recalculate journal credits as inode depth changes · 4134f5c8
      Lukas Czerner 提交于
      Currently in ext4_alloc_file_blocks() the number of credits is
      calculated only once before we enter the allocation loop. However within
      the allocation loop the extent tree depth can change, hence the number
      of credits needed can increase potentially exceeding the number of credits
      reserved in the handle which can cause journal failures.
      
      Fix this by recalculating number of credits when the inode depth
      changes. Note that even though ext4_alloc_file_blocks() is only
      currently used by extent base inodes we will avoid recalculating number
      of credits unnecessarily in the case of indirect based inodes.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4134f5c8
  5. 13 6月, 2015 4 次提交
    • F
      ext4: use swap() in mext_page_double_lock() · bf865467
      Fabian Frederick 提交于
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bf865467
    • F
      ext4: use swap() in memswap() · 4b7e2db5
      Fabian Frederick 提交于
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4b7e2db5
    • T
      ext4: fix race between truncate and __ext4_journalled_writepage() · bdf96838
      Theodore Ts'o 提交于
      The commit cf108bca: "ext4: Invert the locking order of page_lock
      and transaction start" caused __ext4_journalled_writepage() to drop
      the page lock before the page was written back, as part of changing
      the locking order to jbd2_journal_start -> page_lock.  However, this
      introduced a potential race if there was a truncate racing with the
      data=journalled writeback mode.
      
      Fix this by grabbing the page lock after starting the journal handle,
      and then checking to see if page had gotten truncated out from under
      us.
      
      This fixes a number of different warnings or BUG_ON's when running
      xfstests generic/086 in data=journalled mode, including:
      
      jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7
      c0, 164), jh->b_transaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0
      
      	      	      	  - and -
      
      kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
          ...
      Call Trace:
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c027d883>] ? lock_buffer+0x36/0x36
       [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
       [<c0229139>] do_invalidatepage+0x22/0x26
       [<c0229198>] truncate_inode_page+0x5b/0x85
       [<c022934b>] truncate_inode_pages_range+0x156/0x38c
       [<c0229592>] truncate_inode_pages+0x11/0x15
       [<c022962d>] truncate_pagecache+0x55/0x71
       [<c02b913b>] ext4_setattr+0x4a9/0x560
       [<c01ca542>] ? current_kernel_time+0x10/0x44
       [<c026c4d8>] notify_change+0x1c7/0x2be
       [<c0256a00>] do_truncate+0x65/0x85
       [<c0226f31>] ? file_ra_state_init+0x12/0x29
      
      	      	      	  - and -
      
      WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
      irty_metadata+0x14a/0x1ae()
          ...
      Call Trace:
       [<c01b879f>] ? console_unlock+0x3a1/0x3ce
       [<c082cbb4>] dump_stack+0x48/0x60
       [<c0178b65>] warn_slowpath_common+0x89/0xa0
       [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c0178bef>] warn_slowpath_null+0x14/0x18
       [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
       [<c02b2f44>] write_end_fn+0x40/0x53
       [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
       [<c02b59e7>] ext4_writepage+0x354/0x3b8
       [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
       [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b5a5b>] __writepage+0x10/0x2e
       [<c0225956>] write_cache_pages+0x22d/0x32c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b6ee8>] ext4_writepages+0x102/0x607
       [<c019adfe>] ? sched_clock_local+0x10/0x10e
       [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
       [<c01a8ad5>] ? lock_is_held+0x43/0x51
       [<c0226dff>] do_writepages+0x1c/0x29
       [<c0276bed>] __writeback_single_inode+0xc3/0x545
       [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
          ...
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bdf96838
    • T
      ext4 crypto: fail the mount if blocksize != pagesize · 1cb767cd
      Theodore Ts'o 提交于
      We currently don't correctly handle the case where blocksize !=
      pagesize, so disallow the mount in those cases.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1cb767cd
  6. 09 6月, 2015 2 次提交
  7. 08 6月, 2015 5 次提交
    • D
      ext4: BUG_ON assertion repeated for inode1, not done for inode2 · 8bc3b1e6
      David Moore 提交于
      During a source code review of fs/ext4/extents.c I noted identical
      consecutive lines. An assertion is repeated for inode1 and never done
      for inode2. This is not in keeping with the rest of the code in the
      ext4_swap_extents function and appears to be a bug.
      
      Assert that the inode2 mutex is not locked.
      Signed-off-by: NDavid Moore <dmoorefo@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      8bc3b1e6
    • T
    • L
      ext4: return error code from ext4_mb_good_group() · 42ac1848
      Lukas Czerner 提交于
      Currently ext4_mb_good_group() only returns 0 or 1 depending on whether
      the allocation group is suitable for use or not. However we might get
      various errors and fail while initializing new group including -EIO
      which would never get propagated up the call chain. This might lead to
      an endless loop at writeback when we're trying to find a good group to
      allocate from and we fail to initialize new group (read error for
      example).
      
      Fix this by returning proper error code from ext4_mb_good_group() and
      using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator()
      we will always return only the first occurred error from
      ext4_mb_good_group() and we only propagate it back  to the caller if we
      do not get any other errors and we fail to allocate any blocks.
      
      Note that with other modes than errors=continue, we will fail
      immediately in ext4_mb_good_group() in case of error, however with
      errors=continue we should try to continue using the file system, that's
      why we're not going to fail immediately when we see an error from
      ext4_mb_good_group(), but rather when we fail to find a suitable block
      group to allocate from due to an problem in group initialization.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      42ac1848
    • L
      ext4: try to initialize all groups we can in case of failure on ppc64 · bbdc322f
      Lukas Czerner 提交于
      Currently on the machines with page size > block size when initializing
      block group buddy cache we initialize it for all the block group bitmaps
      in the page. However in the case of read error, checksum error, or if
      a single bitmap is in any way corrupted we would fail to initialize all
      of the bitmaps. This is problematic because we will not have access to
      the other allocation groups even though those might be perfectly fine
      and usable.
      
      Fix this by reading all the bitmaps instead of error out on the first
      problem and simply skip the bitmaps which were either not read properly,
      or are not valid.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bbdc322f
    • L
      ext4: verify block bitmap even after fresh initialization · 41e5b7ed
      Lukas Czerner 提交于
      If we want to rely on the buffer_verified() flag of the block bitmap
      buffer, we have to set it consistently. However currently if we're
      initializing uninitialized block bitmap in
      ext4_read_block_bitmap_nowait() we're not going to set buffer verified
      at all.
      
      We can do this by simply setting the flag on the buffer, but I think
      it's actually better to run ext4_validate_block_bitmap() to make sure
      that what we did in the ext4_init_block_bitmap() is right.
      
      So run ext4_validate_block_bitmap() even after the block bitmap
      initialization. Also bail out early from ext4_validate_block_bitmap() if
      we see corrupt bitmap, since we already know it's corrupt and we do not
      need to verify that.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      41e5b7ed
  8. 03 6月, 2015 1 次提交
    • T
      ext4 crypto: allocate bounce pages using GFP_NOWAIT · 3dbb5eb9
      Theodore Ts'o 提交于
      Previously we allocated bounce pages using a combination of
      alloc_page() and mempool_alloc() with the __GFP_WAIT bit set.
      Instead, use mempool_alloc() with GFP_NOWAIT.  The mempool_alloc()
      function will try using alloc_pages() initially, and then only use the
      mempool reserve of pages if alloc_pages() is unable to fulfill the
      request.
      
      This minimizes the the impact on the mm layer when we need to do a
      large amount of writeback of encrypted files, as Jaeguk Kim had
      reported that under a heavy fio workload on a system with restricted
      amounts memory (which unfortunately, includes many mobile handsets),
      he had observed the the OOM killer getting triggered several times.
      Using GFP_NOWAIT
      
      If the mempool_alloc() function fails, we will retry the page
      writeback at a later time; the function of the mempool is to ensure
      that we can writeback at least 32 pages at a time, so we can more
      efficiently dispatch I/O under high memory pressure situations.  In
      the future we should make this be a tunable so we can determine the
      best tradeoff between permanently sequestering memory and the ability
      to quickly launder pages so we can free up memory quickly when
      necessary.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3dbb5eb9
  9. 01 6月, 2015 13 次提交
  10. 19 5月, 2015 5 次提交
    • T
      ext4 crypto: get rid of ci_mode from struct ext4_crypt_info · 1aaa6e8b
      Theodore Ts'o 提交于
      The ci_mode field was superfluous, and getting rid of it gets rid of
      an unused hole in the structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1aaa6e8b
    • T
      ext4 crypto: use slab caches · 8ee03714
      Theodore Ts'o 提交于
      Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for
      slighly better memory efficiency and debuggability.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      8ee03714
    • T
      ext4: clean up superblock encryption mode fields · f5aed2c2
      Theodore Ts'o 提交于
      The superblock fields s_file_encryption_mode and s_dir_encryption_mode
      are vestigal, so remove them as a cleanup.  While we're at it, allow
      file systems with both encryption and inline_data enabled at the same
      time to work correctly.  We can't have encrypted inodes with inline
      data, but there's no reason to prohibit unencrypted inodes from using
      the inline data feature.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f5aed2c2
    • T
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o 提交于
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b7236e21
    • T
      ext4 crypto: separate kernel and userspace structure for the key · e2881b1b
      Theodore Ts'o 提交于
      Use struct ext4_encryption_key only for the master key passed via the
      kernel keyring.
      
      For internal kernel space users, we now use struct ext4_crypt_info.
      This will allow us to put information from the policy structure so we
      can cache it and avoid needing to constantly looking up the extended
      attribute.  We will do this in a spearate patch.  This patch is mostly
      mechnical to make it easier for patch review.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e2881b1b