1. 21 Jun, 2015 2 commits
    • ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae
      Theodore Ts'o committed
      In order to prevent quota block tracking from becoming inaccurate when
      ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
      file can now use the reserved blocks (since the quota file is arguably
      file system metadata), and ext4_quota_write() now uses
      ext4_should_retry_alloc() to retry the block allocation after a commit
      has completed and released some blocks for allocation.
      
      This fixes failures of xfstests generic/270:
      
      Quota error (device vdc): write_blk: dquota write failed
      Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
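The retry behaviour described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `toy_fs`, `toy_alloc_block()`, `toy_should_retry_alloc()`, `toy_quota_write()` and the retry limit of 3 are all hypothetical stand-ins chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state: blocks free now, plus blocks that a journal
 * commit would release for allocation. */
struct toy_fs {
    int free_blocks;
    int blocks_pending_release;
};

/* Hypothetical stand-in for a block allocation that can fail with ENOSPC. */
static int toy_alloc_block(struct toy_fs *fs)
{
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    return 0;
}

/* Hypothetical retry decision: simulate a commit that releases pending
 * blocks, and allow only a bounded number of retries. */
static bool toy_should_retry_alloc(struct toy_fs *fs, int *retries)
{
    if (fs->blocks_pending_release == 0 || (*retries)++ >= 3)
        return false;
    fs->free_blocks += fs->blocks_pending_release;
    fs->blocks_pending_release = 0;
    return true;
}

/* Write path: retry the allocation after a commit instead of failing
 * immediately on the first ENOSPC. */
static int toy_quota_write(struct toy_fs *fs)
{
    int retries = 0;
    int err;

    do {
        err = toy_alloc_block(fs);
    } while (err == -ENOSPC && toy_should_retry_alloc(fs, &retries));
    return err;
}
```

With two blocks pending release and none free, the first attempt fails, the simulated commit frees the blocks, and the second attempt succeeds. With nothing pending, the write still fails with ENOSPC.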
    • ext4: call sync_blockdev() before invalidate_bdev() in put_super() · 89d96a6f
      Theodore Ts'o committed
      Normally all of the buffers will have been forced out to disk before
      we call invalidate_bdev(), but in some cases, such as when a file
      system operation was aborted due to an ext4_error(), there may
      still be some dirty buffers in the buffer cache for the device.  So
      try to force them out to disk before calling invalidate_bdev().
      
      This fixes a warning triggered by generic/081:
      
      WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  2. 16 Jun, 2015 1 commit
    • ext4: improve warning directory handling messages · b03a2f7e
      Andreas Dilger committed
      Several ext4_warning() messages in the directory handling code do not
      report the inode number of the (potentially corrupt) directory where a
      problem is seen, and others report this in an ad-hoc manner.  Add an
      ext4_warning_inode() helper to print the inode number and command name
      consistently with ext4_error_inode().
      
      Consolidate the place in ext4.h where these macros are defined.
      
      Clean up some other directory error and warning messages to print the
      calling function name.
      
      Minor code style fixes in nearby lines.
      Signed-off-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  3. 13 Jun, 2015 1 commit
  4. 01 Jun, 2015 2 commits
  5. 19 May, 2015 3 commits
    • ext4: clean up superblock encryption mode fields · f5aed2c2
      Theodore Ts'o committed
      The superblock fields s_file_encryption_mode and s_dir_encryption_mode
      are vestigial, so remove them as a cleanup.  While we're at it, allow
      file systems with both encryption and inline_data enabled at the same
      time to work correctly.  We can't have encrypted inodes with inline
      data, but there's no reason to prohibit unencrypted inodes from using
      the inline data feature.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o committed
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the inode.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This removes an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memzero_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
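Item 5 of this commit, zeroing the derived key when it is freed, can be sketched in user space as follows. This is only an illustration under stated assumptions: `toy_crypt_info`, `TOY_KEY_SIZE`, `toy_zero_explicit()` and `toy_free_crypt_info()` are hypothetical names, and the volatile-pointer loop is a common portable technique rather than the kernel's own helper.

```c
#include <stddef.h>

#define TOY_KEY_SIZE 64  /* hypothetical key size for this sketch */

/* Hypothetical per-inode encryption state. */
struct toy_crypt_info {
    unsigned char derived_key[TOY_KEY_SIZE];
    int key_valid;
};

/* Zero a buffer through a volatile pointer so that the compiler cannot
 * discard the stores as dead writes to memory that is about to go away. */
static void toy_zero_explicit(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--)
        *p++ = 0;
}

/* Called when the inode is released or its key has been revoked: wipe
 * the derived key so it does not linger in memory. */
static void toy_free_crypt_info(struct toy_crypt_info *ci)
{
    toy_zero_explicit(ci->derived_key, sizeof(ci->derived_key));
    ci->key_valid = 0;
}
```

A plain `memset()` right before freeing a buffer may be removed by an optimizing compiler, which is why an explicit variant is used for key material.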
    • ext4 crypto: separate kernel and userspace structure for the key · e2881b1b
      Theodore Ts'o committed
      Use struct ext4_encryption_key only for the master key passed via the
      kernel keyring.
      
      For internal kernel space users, we now use struct ext4_crypt_info.
      This will allow us to put information from the policy structure there
      so we can cache it and avoid constantly looking up the extended
      attribute.  We will do this in a separate patch.  This patch is mostly
      mechanical to make it easier for patch review.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  6. 15 May, 2015 1 commit
  7. 16 Apr, 2015 2 commits
  8. 12 Apr, 2015 1 commit
    • ext4 crypto: add ext4 encryption facilities · b30ab0e0
      Michael Halcrow committed
      On encrypt, we will re-assign the buffer_heads to point to a bounce
      page rather than the control_page (which is the original page to write
      that contains the plaintext). The block I/O occurs against the bounce
      page.  On write completion, we re-assign the buffer_heads to the
      original plaintext page.
      
      On decrypt, we will attach a read completion callback to the bio
      struct. This read completion will decrypt the read contents in-place
      prior to setting the page up-to-date.
      
      The current encryption mode, AES-256-XTS, lacks cryptographic
      integrity. AES-256-GCM is in-plan, but we will need to devise a
      mechanism for handling the integrity data.
      Signed-off-by: Michael Halcrow <mhalcrow@google.com>
      Signed-off-by: Ildar Muslukhov <ildarm@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  9. 03 Apr, 2015 2 commits
  10. 04 Mar, 2015 1 commit
  11. 17 Feb, 2015 1 commit
  12. 13 Feb, 2015 3 commits
  13. 05 Feb, 2015 1 commit
    • ext4: add optimization for the lazytime mount option · a26f4992
      Theodore Ts'o committed
      Add an optimization for the MS_LAZYTIME mount option so that we will
      opportunistically write out any inodes with the I_DIRTY_TIME flag set
      in a particular inode table block when we need to update some inode in
      that inode table block anyway.
      
      Also add some temporary code so that we can set the lazytime mount
      option without needing a modified /sbin/mount program which can set
      MS_LAZYTIME.  We can eventually make this go away once util-linux has
      added support.
      
      Google-Bug-Id: 18297052
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
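The opportunistic write-out described in this commit can be modelled roughly as below. This is a toy sketch, not the kernel implementation: `toy_inode`, `toy_inodes`, `TOY_INODES_PER_BLOCK` and `toy_write_inode_block()` are hypothetical, and the `dirty_time` field merely stands in for the I_DIRTY_TIME flag.

```c
#include <stdbool.h>

#define TOY_INODES_PER_BLOCK 4  /* hypothetical; depends on block and inode size */
#define TOY_NR_INODES 8

/* Hypothetical in-memory inode: only the timestamps are lazily dirty. */
struct toy_inode {
    bool dirty_time;
};

static struct toy_inode toy_inodes[TOY_NR_INODES];

/* Write the inode table block that holds inode 'ino'.  Since the whole
 * block goes to disk anyway, also flush the timestamps of its neighbours.
 * Returns how many neighbouring inodes were flushed along the way. */
static int toy_write_inode_block(int ino)
{
    int start = (ino / TOY_INODES_PER_BLOCK) * TOY_INODES_PER_BLOCK;
    int flushed = 0;

    for (int i = start; i < start + TOY_INODES_PER_BLOCK; i++) {
        if (i != ino && toy_inodes[i].dirty_time) {
            toy_inodes[i].dirty_time = false;
            flushed++;
        }
    }
    toy_inodes[ino].dirty_time = false;
    return flushed;
}
```

Inodes in other table blocks keep their lazily dirty timestamps until their own block is written or they are flushed by another mechanism.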
  14. 30 Jan, 2015 1 commit
    • ext4: Use generic helpers for quotaon and quotaoff · 1fa5efe3
      Jan Kara committed
      Ext4 can just use the generic helpers provided by quota code for turning
      quotas on and off when quota files are stored as system inodes. The only
      difference is the feature test in ext4_quota_on_sysfile() but the same
      is achieved in dquot_quota_enable() by checking whether usage tracking
      for the corresponding quota type is enabled (which can happen only if
      quota feature is set).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  15. 27 Jan, 2015 1 commit
  16. 21 Jan, 2015 1 commit
  17. 03 Jan, 2015 1 commit
  18. 26 Nov, 2014 7 commits
    • ext4: forbid journal_async_commit in data=ordered mode · d4f76107
      Jan Kara committed
      Option journal_async_commit breaks guarantees of data=ordered mode as it
      sends only a single cache flush after writing a transaction commit
      block. Thus even though the transaction including the commit block is
      fully stored on persistent storage, file data may still linger in drive
      caches and will be lost on power failure. Since all checksums match on
      journal recovery, we replay the transaction thus possibly exposing stale
      user data.
      
      To fix this data exposure issue, remove the possibility to use
      journal_async_commit in data=ordered mode.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
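The restriction introduced by this commit amounts to rejecting one combination of mount options. A minimal sketch, assuming a hypothetical `toy_mount_opts` structure and `toy_check_mount_opts()` validator (the kernel's option parsing is organized differently):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical representation of the data journalling mode. */
enum toy_data_mode { TOY_DATA_JOURNAL, TOY_DATA_ORDERED, TOY_DATA_WRITEBACK };

/* Hypothetical parsed mount options. */
struct toy_mount_opts {
    enum toy_data_mode data_mode;
    bool journal_async_commit;
};

/* Reject journal_async_commit together with data=ordered, because the
 * single cache flush no longer guarantees that file data reached stable
 * storage before the commit block did. */
static int toy_check_mount_opts(const struct toy_mount_opts *o)
{
    if (o->data_mode == TOY_DATA_ORDERED && o->journal_async_commit)
        return -EINVAL;
    return 0;
}
```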
    • ext4: don't count external journal blocks as overhead · b003b524
      Eric Sandeen committed
      This was fixed for ext3 with:
      
      e6d8fb34 ext3: Count internal journal as bsddf overhead in ext3_statfs
      
      but was never fixed for ext4.
      
      With a large external journal and no used disk blocks, df comes
      out negative without this, as journal blocks are added to the
      overhead & subtracted from used blocks unconditionally.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
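The arithmetic behind this fix can be shown with a simplified model. The structure and helpers below (`toy_sb`, `toy_overhead()`, `toy_used_blocks()`) are hypothetical and only approximate how statfs-style accounting works; the point is that journal blocks count as overhead only when the journal lives inside the file system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified superblock accounting. */
struct toy_sb {
    uint64_t total_blocks;
    uint64_t free_blocks;
    uint64_t metadata_blocks;   /* bitmaps, inode tables, and so on */
    uint64_t journal_blocks;
    bool journal_is_internal;
};

/* Overhead includes the journal only if it occupies blocks of this
 * file system; an external journal lives on another device. */
static uint64_t toy_overhead(const struct toy_sb *sb)
{
    uint64_t overhead = sb->metadata_blocks;

    if (sb->journal_is_internal)
        overhead += sb->journal_blocks;
    return overhead;
}

/* "Used" as df would report it: (total - overhead) - free. */
static int64_t toy_used_blocks(const struct toy_sb *sb)
{
    return (int64_t)(sb->total_blocks - toy_overhead(sb)) -
           (int64_t)sb->free_blocks;
}
```

If the 5000 blocks of a large external journal were added to the overhead of a 1000-block file system unconditionally, the subtraction would go negative, which matches the symptom the commit message describes.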
    • ext4: create nojournal_checksum mount option · c6d3d56d
      Darrick J. Wong committed
      Create a mount option to disable journal checksumming (because the
      metadata_csum feature turns it on by default now), and fix remount not
      to allow changing the journal checksumming option, since changing the
      mount options has no effect on the journal.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: cleanup GFP flags inside resize path · 4fdb5543
      Dmitry Monakhov committed
      We must use GFP_NOFS instead of GFP_KERNEL inside ext4_mb_add_groupinfo()
      and ext4_calculate_overhead() because they are called from inside a
      journal transaction. Call trace:
      
      ioctl
       ->ext4_group_add
         ->journal_start
         ->ext4_setup_new_descs
           ->ext4_mb_add_groupinfo -> GFP_KERNEL
         ->ext4_flex_group_add
           ->ext4_update_super
             ->ext4_calculate_overhead  -> GFP_KERNEL
         ->journal_stop
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: limit number of scanned extents in status tree shrinker · dd475925
      Jan Kara committed
      Currently we scan extent status trees of inodes until we reclaim nr_to_scan
      extents. This can however require a lot of scanning when there are lots
      of delayed extents (as those cannot be reclaimed).
      
      Change the shrinker to work as shrinkers are supposed to and *scan* only
      nr_to_scan extents regardless of how many extents we actually
      reclaimed. However, we need to be careful and avoid scanning each status
      tree from the beginning - that could lead to a situation where we would
      not be able to reclaim anything at all when the first nr_to_scan extents in
      the tree are always unreclaimable. For each inode, we remember the offset
      where we stopped scanning and continue from there when we next come
      across the inode.
      
      Note that we also need to update places calling __es_shrink() manually
      to pass reasonable nr_to_scan to have a chance of reclaiming anything and
      not just 1.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
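The scanning policy described in this commit, a fixed scan budget plus a remembered resume offset, can be sketched with a flat array in place of the real extent status tree. Everything here (`toy_es_tree`, `TOY_NR_EXTENTS`, `toy_es_shrink()`) is a hypothetical simplification.

```c
#include <stdbool.h>

#define TOY_NR_EXTENTS 8  /* hypothetical fixed-size stand-in for the tree */

struct toy_es_tree {
    bool present[TOY_NR_EXTENTS];   /* extent is still cached */
    bool delayed[TOY_NR_EXTENTS];   /* delayed extents cannot be reclaimed */
    int scan_start;                 /* where the previous scan stopped */
};

/* Scan at most nr_to_scan slots, starting at the remembered offset, and
 * return how many extents were reclaimed.  The budget limits scanning,
 * not reclaiming, so a run of delayed extents cannot cause unbounded work. */
static int toy_es_shrink(struct toy_es_tree *t, int nr_to_scan)
{
    int reclaimed = 0;
    int pos = t->scan_start;

    for (int scanned = 0; scanned < nr_to_scan; scanned++) {
        if (t->present[pos] && !t->delayed[pos]) {
            t->present[pos] = false;
            reclaimed++;
        }
        pos = (pos + 1) % TOY_NR_EXTENTS;
    }
    t->scan_start = pos;  /* resume here next time instead of at slot 0 */
    return reclaimed;
}
```

If the first four slots hold delayed extents, a first call with a budget of four reclaims nothing, but the second call resumes at slot four and makes progress. Restarting from slot zero each time would never get past the unreclaimable prefix.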
    • ext4: move handling of list of shrinkable inodes into extent status code · b0dea4c1
      Jan Kara committed
      Currently callers adding extents to extent status tree were responsible
      for adding the inode to the list of inodes with freeable extents. This
      is error prone and puts list handling in unnecessarily many places.
      
      Just add inode to the list automatically when the first non-delay extent
      is added to the tree and remove inode from the list when the last
      non-delay extent is removed.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
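The automatic list handling described in this commit can be modelled with a per-inode counter of reclaimable (non-delayed) extents. The names below (`toy_inode_es`, `toy_es_insert()`, `toy_es_remove()`) are hypothetical, and a boolean stands in for membership on the real shrinker list.

```c
#include <stdbool.h>

/* Hypothetical per-inode extent status bookkeeping. */
struct toy_inode_es {
    int nr_reclaimable;    /* number of non-delayed extents in the tree */
    bool on_shrink_list;   /* stand-in for membership on the shrinker list */
};

/* Adding the first non-delayed extent puts the inode on the list. */
static void toy_es_insert(struct toy_inode_es *ei, bool delayed)
{
    if (delayed)
        return;
    if (ei->nr_reclaimable++ == 0)
        ei->on_shrink_list = true;
}

/* Removing the last non-delayed extent takes the inode off the list. */
static void toy_es_remove(struct toy_inode_es *ei, bool delayed)
{
    if (delayed)
        return;
    if (--ei->nr_reclaimable == 0)
        ei->on_shrink_list = false;
}
```

Because the transitions happen inside the insert and remove paths, callers no longer need to remember to update the list themselves.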
    • ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca
      Zheng Liu committed
      In this commit we discard the LRU algorithm for inodes with extent
      status tree because it takes significant effort to maintain an LRU list
      in the extent status tree shrinker and the shrinker can take a long time to
      scan this LRU list in order to reclaim some objects.
      
      We replace the LRU ordering with a simple round-robin.  After that we
      never need to keep an LRU list.  That means that the list needn't be
      sorted if the shrinker cannot reclaim any objects in the first round.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  19. 21 Nov, 2014 1 commit
  20. 10 Nov, 2014 1 commit
  21. 30 Oct, 2014 3 commits
  22. 14 Oct, 2014 1 commit
    • ext4: check s_chksum_driver when looking for bg csum presence · 813d32f9
      Darrick J. Wong committed
      Convert the ext4_has_group_desc_csum predicate to look for a checksum
      driver instead of the metadata_csum flag and change the bg checksum
      calculation function to look for GDT_CSUM before taking the crc16
      path.
      
      Without this patch, if we mount with ^uninit_bg,^metadata_csum and
      later metadata_csum gets turned on by accident, the block group
      checksum functions will incorrectly assume that checksumming is
      enabled (metadata_csum) but that crc16 should be used
      (!s_chksum_driver).  This is totally wrong, so fix the predicate
      and the checksum formula selection.
      
      (Granted, if the metadata_csum feature bit gets enabled on a live FS
      then something underhanded is going on, but we could at least avoid
      writing garbage into the on-disk fields.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
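The predicate change described in this commit can be approximated in user space as below. This is a sketch under assumptions: `toy_sb_info`, `toy_has_group_desc_csum()` and `toy_group_desc_csum_algo()` are hypothetical, and the pointer field only imitates the role of s_chksum_driver as "a checksum driver was loaded at mount time".

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_csum_algo { TOY_CSUM_NONE, TOY_CSUM_CRC16, TOY_CSUM_CRC32C };

/* Hypothetical superblock state. */
struct toy_sb_info {
    bool feat_gdt_csum;         /* GDT_CSUM (uninit_bg) feature flag */
    bool feat_metadata_csum;    /* on-disk flag; may change behind our back */
    const void *chksum_driver;  /* non-NULL only if loaded at mount time */
};

/* Group descriptor checksums exist if GDT_CSUM is set or a checksum
 * driver was loaded; the raw metadata_csum flag is not consulted. */
static bool toy_has_group_desc_csum(const struct toy_sb_info *s)
{
    return s->feat_gdt_csum || s->chksum_driver != NULL;
}

/* Pick the formula from the same facts, so that a flag flipped on a
 * live file system cannot select crc16 by mistake. */
static enum toy_csum_algo toy_group_desc_csum_algo(const struct toy_sb_info *s)
{
    if (s->chksum_driver != NULL)
        return TOY_CSUM_CRC32C;
    if (s->feat_gdt_csum)
        return TOY_CSUM_CRC16;
    return TOY_CSUM_NONE;
}
```

In the problem scenario, with both features off at mount and metadata_csum turned on afterwards, the driver pointer is still NULL, so neither function claims that checksumming is active.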
  23. 13 Oct, 2014 1 commit
  24. 06 Oct, 2014 1 commit
    • ext4: add ext4_iget_normal() which is to be used for dir tree lookups · f4bb2981
      Theodore Ts'o committed
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: Sami Liedes <sami.liedes@iki.fi>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
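The check at the heart of this commit can be sketched as a simple range test on the inode number. The helper `toy_iget_normal_check()` and the `TOY_*` constants are hypothetical; the value 11 for the first non-reserved inode is an assumption made for this illustration, since the real value comes from the superblock.

```c
#include <errno.h>

#define TOY_ROOT_INO   2   /* root directory inode number */
#define TOY_FIRST_INO 11   /* assumed first non-reserved inode number */

/* Directory tree lookups may reach the root directory and ordinary
 * inodes, but never the reserved metadata inodes (such as the journal),
 * which a corrupted directory entry might point at. */
static int toy_iget_normal_check(unsigned long ino)
{
    if (ino < TOY_FIRST_INO && ino != TOY_ROOT_INO)
        return -EIO;
    return 0;
}
```

With this check on the lookup path, a directory entry pointing at a reserved inode yields an error instead of a usable inode, so it cannot be opened, modified, or deleted.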