1. 08 12月, 2015 5 次提交
    • J
      ext4: implement allocation of pre-zeroed blocks · c86d8db3
      Jan Kara 提交于
      DAX page fault path needs to get blocks that are pre-zeroed to avoid
      races when two concurrent page faults happen in the same block of a
      file. Implement support for this in ext4_map_blocks().
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c86d8db3
    • J
      ext4: provide ext4_issue_zeroout() · 53085fac
      Jan Kara 提交于
      Create new function ext4_issue_zeroout() to zeroout contiguous (both
      logically and physically) part of inode data. We will need to issue
      zeroout when extent structure is not readily available and this function
      will allow us to do it without making up fake extent structures.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      53085fac
    • J
      ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag · 2dcba478
      Jan Kara 提交于
      When dioread_nolock mode is enabled, we grab i_data_sem in
      ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block()
      not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However
      holding i_data_sem over overwrite direct IO isn't needed these days. We
      have exclusion against truncate / hole punching because we increase
      i_dio_count under i_mutex in ext4_ext_direct_IO() so once
      ext4_file_write_iter() verifies blocks are allocated & written, they are
      guaranteed to stay so during the whole direct IO even after we drop
      i_mutex.
      
      So we can just remove this locking abuse and the no longer necessary
      EXT4_GET_BLOCKS_NO_LOCK flag.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2dcba478
    • J
      ext4: fix races of writeback with punch hole and zero range · 01127848
      Jan Kara 提交于
      When doing delayed allocation, update of on-disk inode size is postponed
      until IO submission time. However hole punch or zero range fallocate
      calls can end up discarding the tail page cache page and thus on-disk
      inode size would never be properly updated.
      
      Make sure the on-disk inode size is updated before truncating page
      cache.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      01127848
    • J
      ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara 提交于
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in page fault faulting in a page into a range that we
      are punching after truncate_pagecache_range() has been called and thus
      we can end up with a page mapped to disk blocks that will be shortly
      freed. Filesystem corruption will shortly follow. Note that the same
      race is avoided for truncate by checking page fault offset against
      i_size but there isn't similar mechanism available for punching holes.
      
      Fix the problem by creating new rw semaphore i_mmap_sem in inode and
      grab it for writing over truncate, hole punching, and other functions
      removing blocks from extent tree and for read over page faults. We
      cannot easily use i_data_sem for this since that ranks below transaction
      start and we need something ranking above it so that it can be held over
      the whole truncate / hole punching operation. Also remove various
      workarounds we had in the code to reduce race window when page fault
      could have created pages with stale mapping information.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ea3d7209
  2. 25 11月, 2015 1 次提交
    • D
      ext4: Fix handling of extended tv_sec · a4dad1ae
      David Turner 提交于
      In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
      the {a,c,m}time fields, deferring the year 2038 problem to the year
      2446.
      
      When decoding these extended fields, for times whose bottom 32 bits
      would represent a negative number, sign extension causes the 64-bit
      extended timestamp to be negative as well, which is not what's
      intended.  This patch corrects that issue, so that the only negative
      {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
      timestamps).
      
      Some older kernels might have written pre-1970 dates with 1,1 in the
      extra bits.  This patch treats those incorrectly-encoded dates as
      pre-1970, instead of post-2311, until kernel 4.20 is released.
      Hopefully by then e2fsck will have fixed up the bad data.
      
      Also add a comment explaining the encoding of ext4's extra {a,c,m}time
      bits.
      Signed-off-by: NDavid Turner <novalis@novalis.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported-by: NMark Harris <mh8928@yahoo.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
      Cc: stable@vger.kernel.org
      a4dad1ae
  3. 19 10月, 2015 1 次提交
    • D
      ext4: do not allow journal_opts for fs w/o journal · 1e381f60
      Dmitry Monakhov 提交于
      It is appeared that we can pass journal related mount options and such options
      be shown in /proc/mounts
      
      Example:
      #mkfs.ext4 -F /dev/vdb
      #tune2fs -O ^has_journal /dev/vdb
      #mount /dev/vdb /mnt/  -ocommit=20,journal_async_commit
      #cat /proc/mounts  | grep /mnt
       /dev/vdb /mnt ext4 rw,relatime,journal_checksum,journal_async_commit,commit=20,data=ordered 0 0
      
      But options:"journal_checksum,journal_async_commit,commit=20,data=ordered" has
      nothing with reality because there is no journal at all.
      
      This patch disallow following options for journalless configurations:
       - journal_checksum
       - journal_async_commit
       - commit=%ld
       - data={writeback,ordered,journal}
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      1e381f60
  4. 18 10月, 2015 4 次提交
  5. 03 10月, 2015 1 次提交
  6. 24 9月, 2015 2 次提交
  7. 09 9月, 2015 1 次提交
  8. 22 7月, 2015 1 次提交
    • T
      ext4: replace ext4_io_submit->io_op with ->io_wbc · 5a33911f
      Tejun Heo 提交于
      ext4_io_submit_init() takes the pointer to writeback_control to test
      its sync_mode and determine between WRITE and WRITE_SYNC and records
      the result in ->io_op.  This patch makes it record the pointer
      directly and moves the test to ext4_io_submit().
      
      This doesn't cause any noticeable differences now but having
      writeback_control available throughout IO submission path will be
      depended upon by the planned cgroup writeback support.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5a33911f
  9. 16 6月, 2015 1 次提交
    • A
      ext4: improve warning directory handling messages · b03a2f7e
      Andreas Dilger 提交于
      Several ext4_warning() messages in the directory handling code do not
      report the inode number of the (potentially corrupt) directory where a
      problem is seen, and others report this in an ad-hoc manner.  Add an
      ext4_warning_inode() helper to print the inode number and command name
      consistent with ext4_error_inode().
      
      Consolidate the place in ext4.h that these macros are defined.
      
      Clean up some other directory error and warning messages to print the
      calling function name.
      
      Minor code style fixes in nearby lines.
      Signed-off-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b03a2f7e
  10. 09 6月, 2015 1 次提交
    • N
      ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate · 331573fe
      Namjae Jeon 提交于
      This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.
      
      1) Make sure that both offset and len are block size aligned.
      2) Update the i_size of inode by len bytes.
      3) Compute the file's logical block number against offset. If the computed
         block number is not the starting block of the extent, split the extent
         such that the block number is the starting block of the extent.
      4) Shift all the extents which are lying between [offset, last allocated extent]
         towards right by len bytes. This step will make a hole of len bytes
         at offset.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      331573fe
  11. 01 6月, 2015 3 次提交
  12. 19 5月, 2015 5 次提交
    • T
      ext4 crypto: use slab caches · 8ee03714
      Theodore Ts'o 提交于
      Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for
      slighly better memory efficiency and debuggability.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      8ee03714
    • T
      ext4: clean up superblock encryption mode fields · f5aed2c2
      Theodore Ts'o 提交于
      The superblock fields s_file_encryption_mode and s_dir_encryption_mode
      are vestigal, so remove them as a cleanup.  While we're at it, allow
      file systems with both encryption and inline_data enabled at the same
      time to work correctly.  We can't have encrypted inodes with inline
      data, but there's no reason to prohibit unencrypted inodes from using
      the inline data feature.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f5aed2c2
    • T
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o 提交于
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b7236e21
    • T
      ext4 crypto: separate kernel and userspace structure for the key · e2881b1b
      Theodore Ts'o 提交于
      Use struct ext4_encryption_key only for the master key passed via the
      kernel keyring.
      
      For internal kernel space users, we now use struct ext4_crypt_info.
      This will allow us to put information from the policy structure so we
      can cache it and avoid needing to constantly looking up the extended
      attribute.  We will do this in a spearate patch.  This patch is mostly
      mechnical to make it easier for patch review.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e2881b1b
    • T
      ext4 crypto: optimize filename encryption · 5b643f9c
      Theodore Ts'o 提交于
      Encrypt the filename as soon it is passed in by the user.  This avoids
      our needing to encrypt the filename 2 or 3 times while in the process
      of creating a filename.
      
      Similarly, when looking up a directory entry, encrypt the filename
      early, or if the encryption key is not available, base-64 decode the
      file syystem so that the hash value and the last 16 bytes of the
      encrypted filename is available in the new struct ext4_filename data
      structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5b643f9c
  13. 15 5月, 2015 1 次提交
  14. 11 5月, 2015 1 次提交
  15. 02 5月, 2015 3 次提交
  16. 16 4月, 2015 3 次提交
  17. 12 4月, 2015 6 次提交