1. 13 6月, 2015 4 次提交
    • F
      ext4: use swap() in mext_page_double_lock() · bf865467
      Fabian Frederick 提交于
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bf865467
    • F
      ext4: use swap() in memswap() · 4b7e2db5
      Fabian Frederick 提交于
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4b7e2db5
    • T
      ext4: fix race between truncate and __ext4_journalled_writepage() · bdf96838
      Theodore Ts'o 提交于
      The commit cf108bca: "ext4: Invert the locking order of page_lock
      and transaction start" caused __ext4_journalled_writepage() to drop
      the page lock before the page was written back, as part of changing
      the locking order to jbd2_journal_start -> page_lock.  However, this
      introduced a potential race if there was a truncate racing with the
      data=journalled writeback mode.
      
      Fix this by grabbing the page lock after starting the journal handle,
      and then checking to see if page had gotten truncated out from under
      us.
      
      This fixes a number of different warnings or BUG_ON's when running
      xfstests generic/086 in data=journalled mode, including:
      
      jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7
      c0, 164), jh->b_transaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0
      
      	      	      	  - and -
      
      kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
          ...
      Call Trace:
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c027d883>] ? lock_buffer+0x36/0x36
       [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
       [<c0229139>] do_invalidatepage+0x22/0x26
       [<c0229198>] truncate_inode_page+0x5b/0x85
       [<c022934b>] truncate_inode_pages_range+0x156/0x38c
       [<c0229592>] truncate_inode_pages+0x11/0x15
       [<c022962d>] truncate_pagecache+0x55/0x71
       [<c02b913b>] ext4_setattr+0x4a9/0x560
       [<c01ca542>] ? current_kernel_time+0x10/0x44
       [<c026c4d8>] notify_change+0x1c7/0x2be
       [<c0256a00>] do_truncate+0x65/0x85
       [<c0226f31>] ? file_ra_state_init+0x12/0x29
      
      	      	      	  - and -
      
      WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
      irty_metadata+0x14a/0x1ae()
          ...
      Call Trace:
       [<c01b879f>] ? console_unlock+0x3a1/0x3ce
       [<c082cbb4>] dump_stack+0x48/0x60
       [<c0178b65>] warn_slowpath_common+0x89/0xa0
       [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c0178bef>] warn_slowpath_null+0x14/0x18
       [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
       [<c02b2f44>] write_end_fn+0x40/0x53
       [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
       [<c02b59e7>] ext4_writepage+0x354/0x3b8
       [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
       [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b5a5b>] __writepage+0x10/0x2e
       [<c0225956>] write_cache_pages+0x22d/0x32c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b6ee8>] ext4_writepages+0x102/0x607
       [<c019adfe>] ? sched_clock_local+0x10/0x10e
       [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
       [<c01a8ad5>] ? lock_is_held+0x43/0x51
       [<c0226dff>] do_writepages+0x1c/0x29
       [<c0276bed>] __writeback_single_inode+0xc3/0x545
       [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
          ...
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bdf96838
    • T
      ext4 crypto: fail the mount if blocksize != pagesize · 1cb767cd
      Theodore Ts'o 提交于
      We currently don't correctly handle the case where blocksize !=
      pagesize, so disallow the mount in those cases.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1cb767cd
  2. 09 6月, 2015 2 次提交
  3. 08 6月, 2015 5 次提交
    • D
      ext4: BUG_ON assertion repeated for inode1, not done for inode2 · 8bc3b1e6
      David Moore 提交于
      During a source code review of fs/ext4/extents.c I noted identical
      consecutive lines. An assertion is repeated for inode1 and never done
      for inode2. This is not in keeping with the rest of the code in the
      ext4_swap_extents function and appears to be a bug.
      
      Assert that the inode2 mutex is not locked.
      Signed-off-by: NDavid Moore <dmoorefo@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      8bc3b1e6
    • T
    • L
      ext4: return error code from ext4_mb_good_group() · 42ac1848
      Lukas Czerner 提交于
      Currently ext4_mb_good_group() only returns 0 or 1 depending on whether
      the allocation group is suitable for use or not. However we might get
      various errors and fail while initializing new group including -EIO
      which would never get propagated up the call chain. This might lead to
      an endless loop at writeback when we're trying to find a good group to
      allocate from and we fail to initialize new group (read error for
      example).
      
      Fix this by returning proper error code from ext4_mb_good_group() and
      using it in ext4_mb_regular_allocator(). In ext4_mb_regular_allocator()
      we will always return only the first occurred error from
      ext4_mb_good_group() and we only propagate it back  to the caller if we
      do not get any other errors and we fail to allocate any blocks.
      
      Note that with other modes than errors=continue, we will fail
      immediately in ext4_mb_good_group() in case of error, however with
      errors=continue we should try to continue using the file system, that's
      why we're not going to fail immediately when we see an error from
      ext4_mb_good_group(), but rather when we fail to find a suitable block
      group to allocate from due to an problem in group initialization.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      42ac1848
    • L
      ext4: try to initialize all groups we can in case of failure on ppc64 · bbdc322f
      Lukas Czerner 提交于
      Currently on the machines with page size > block size when initializing
      block group buddy cache we initialize it for all the block group bitmaps
      in the page. However in the case of read error, checksum error, or if
      a single bitmap is in any way corrupted we would fail to initialize all
      of the bitmaps. This is problematic because we will not have access to
      the other allocation groups even though those might be perfectly fine
      and usable.
      
      Fix this by reading all the bitmaps instead of error out on the first
      problem and simply skip the bitmaps which were either not read properly,
      or are not valid.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bbdc322f
    • L
      ext4: verify block bitmap even after fresh initialization · 41e5b7ed
      Lukas Czerner 提交于
      If we want to rely on the buffer_verified() flag of the block bitmap
      buffer, we have to set it consistently. However currently if we're
      initializing uninitialized block bitmap in
      ext4_read_block_bitmap_nowait() we're not going to set buffer verified
      at all.
      
      We can do this by simply setting the flag on the buffer, but I think
      it's actually better to run ext4_validate_block_bitmap() to make sure
      that what we did in the ext4_init_block_bitmap() is right.
      
      So run ext4_validate_block_bitmap() even after the block bitmap
      initialization. Also bail out early from ext4_validate_block_bitmap() if
      we see corrupt bitmap, since we already know it's corrupt and we do not
      need to verify that.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      41e5b7ed
  4. 03 6月, 2015 1 次提交
    • T
      ext4 crypto: allocate bounce pages using GFP_NOWAIT · 3dbb5eb9
      Theodore Ts'o 提交于
      Previously we allocated bounce pages using a combination of
      alloc_page() and mempool_alloc() with the __GFP_WAIT bit set.
      Instead, use mempool_alloc() with GFP_NOWAIT.  The mempool_alloc()
      function will try using alloc_pages() initially, and then only use the
      mempool reserve of pages if alloc_pages() is unable to fulfill the
      request.
      
      This minimizes the the impact on the mm layer when we need to do a
      large amount of writeback of encrypted files, as Jaeguk Kim had
      reported that under a heavy fio workload on a system with restricted
      amounts memory (which unfortunately, includes many mobile handsets),
      he had observed the the OOM killer getting triggered several times.
      Using GFP_NOWAIT
      
      If the mempool_alloc() function fails, we will retry the page
      writeback at a later time; the function of the mempool is to ensure
      that we can writeback at least 32 pages at a time, so we can more
      efficiently dispatch I/O under high memory pressure situations.  In
      the future we should make this be a tunable so we can determine the
      best tradeoff between permanently sequestering memory and the ability
      to quickly launder pages so we can free up memory quickly when
      necessary.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3dbb5eb9
  5. 01 6月, 2015 13 次提交
  6. 19 5月, 2015 7 次提交
    • T
      ext4 crypto: get rid of ci_mode from struct ext4_crypt_info · 1aaa6e8b
      Theodore Ts'o 提交于
      The ci_mode field was superfluous, and getting rid of it gets rid of
      an unused hole in the structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1aaa6e8b
    • T
      ext4 crypto: use slab caches · 8ee03714
      Theodore Ts'o 提交于
      Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for
      slighly better memory efficiency and debuggability.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      8ee03714
    • T
      ext4: clean up superblock encryption mode fields · f5aed2c2
      Theodore Ts'o 提交于
      The superblock fields s_file_encryption_mode and s_dir_encryption_mode
      are vestigal, so remove them as a cleanup.  While we're at it, allow
      file systems with both encryption and inline_data enabled at the same
      time to work correctly.  We can't have encrypted inodes with inline
      data, but there's no reason to prohibit unencrypted inodes from using
      the inline data feature.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f5aed2c2
    • T
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o 提交于
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b7236e21
    • T
      ext4 crypto: separate kernel and userspace structure for the key · e2881b1b
      Theodore Ts'o 提交于
      Use struct ext4_encryption_key only for the master key passed via the
      kernel keyring.
      
      For internal kernel space users, we now use struct ext4_crypt_info.
      This will allow us to put information from the policy structure so we
      can cache it and avoid needing to constantly looking up the extended
      attribute.  We will do this in a spearate patch.  This patch is mostly
      mechnical to make it easier for patch review.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e2881b1b
    • T
    • T
      ext4 crypto: optimize filename encryption · 5b643f9c
      Theodore Ts'o 提交于
      Encrypt the filename as soon it is passed in by the user.  This avoids
      our needing to encrypt the filename 2 or 3 times while in the process
      of creating a filename.
      
      Similarly, when looking up a directory entry, encrypt the filename
      early, or if the encryption key is not available, base-64 decode the
      file syystem so that the hash value and the last 16 bytes of the
      encrypted filename is available in the new struct ext4_filename data
      structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5b643f9c
  7. 15 5月, 2015 6 次提交
    • T
      ext4: fix an ext3 collapse range regression in xfstests · b9576fc3
      Theodore Ts'o 提交于
      The xfstests test suite assumes that an attempt to collapse range on
      the range (0, 1) will return EOPNOTSUPP if the file system does not
      support collapse range.  Commit 280227a7: "ext4: move check under
      lock scope to close a race" broke this, and this caused xfstests to
      fail when run when testing file systems that did not have the extents
      feature enabled.
      Reported-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b9576fc3
    • E
      ext4: check for zero length extent explicitly · 2f974865
      Eryu Guan 提交于
      The following commit introduced a bug when checking for zero length extent
      
      5946d089 ext4: check for overlapping extents in ext4_valid_extent_entries()
      
      Zero length extent could pass the check if lblock is zero.
      
      Adding the explicit check for zero length back.
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      2f974865
    • L
      ext4: fix NULL pointer dereference when journal restart fails · 9d506594
      Lukas Czerner 提交于
      Currently when journal restart fails, we'll have the h_transaction of
      the handle set to NULL to indicate that the handle has been effectively
      aborted. We handle this situation quietly in the jbd2_journal_stop() and just
      free the handle and exit because everything else has been done before we
      attempted (and failed) to restart the journal.
      
      Unfortunately there are a number of problems with that approach
      introduced with commit
      
      41a5b913 "jbd2: invalidate handle if jbd2_journal_restart()
      fails"
      
      First of all in ext4 jbd2_journal_stop() will be called through
      __ext4_journal_stop() where we would try to get a hold of the superblock
      by dereferencing h_transaction which in this case would lead to NULL
      pointer dereference and crash.
      
      In addition we're going to free the handle regardless of the refcount
      which is bad as well, because others up the call chain will still
      reference the handle so we might potentially reference already freed
      memory.
      
      Moreover it's expected that we'll get aborted handle as well as detached
      handle in some of the journalling function as the error propagates up
      the stack, so it's unnecessary to call WARN_ON every time we get
      detached handle.
      
      And finally we might leak some memory by forgetting to free reserved
      handle in jbd2_journal_stop() in the case where handle was detached from
      the transaction (h_transaction is NULL).
      
      Fix the NULL pointer dereference in __ext4_journal_stop() by just
      calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
      the potential memory leak in jbd2_journal_stop() and use proper
      handle refcounting before we attempt to free it to avoid use-after-free
      issues.
      
      And finally remove all WARN_ON(!transaction) from the code so that we do
      not get random traces when something goes wrong because when journal
      restart fails we will get to some of those functions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      9d506594
    • T
      ext4: remove unused function prototype from ext4.h · 92c82639
      Theodore Ts'o 提交于
      The ext4_extent_tree_init() function hasn't been in the ext4 code for
      a long time ago, except in an unused function prototype in ext4.h
      
      Google-Bug-Id: 4530137
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      92c82639
    • T
      ext4: don't save the error information if the block device is read-only · 1b46617b
      Theodore Ts'o 提交于
      Google-Bug-Id: 20939131
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1b46617b
    • T
      ext4: fix lazytime optimization · 8f4d8558
      Theodore Ts'o 提交于
      We had a fencepost error in the lazytime optimization which means that
      timestamp would get written to the wrong inode.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      8f4d8558
  8. 03 5月, 2015 2 次提交
    • J
      ext4: fix growing of tiny filesystems · 2c869b26
      Jan Kara 提交于
      The estimate of necessary transaction credits in ext4_flex_group_add()
      is too pessimistic. It reserves credit for sb, resize inode, and resize
      inode dindirect block for each group added in a flex group although they
      are always the same block and thus it is enough to account them only
      once. Also the number of modified GDT block is overestimated since we
      fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.
      
      Make the estimation more precise. That reduces number of requested
      credits enough that we can grow 20 MB filesystem (which has 1 MB
      journal, 79 reserved GDT blocks, and flex group size 16 by default).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      2c869b26
    • D
      ext4: move check under lock scope to close a race. · 280227a7
      Davide Italiano 提交于
      fallocate() checks that the file is extent-based and returns
      EOPNOTSUPP in case is not. Other tasks can convert from and to
      indirect and extent so it's safe to check only after grabbing
      the inode mutex.
      Signed-off-by: NDavide Italiano <dccitaliano@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      280227a7