1. 25 12月, 2018 1 次提交
    • T
      ext4: fix a potential fiemap/page fault deadlock w/ inline_data · 2b08b1f1
      Theodore Ts'o 提交于
      The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
      while still holding the xattr semaphore.  This is not necessary and it
      triggers a circular lockdep warning.  This is because
      fiemap_fill_next_extent() could trigger a page fault when it writes
      into page which triggers a page fault.  If that page is mmaped from
      the inline file in question, this could very well result in a
      deadlock.
      
      This problem can be reproduced using generic/519 with a file system
      configuration which has the inline_data feature enabled.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      2b08b1f1
  2. 04 12月, 2018 1 次提交
  3. 03 10月, 2018 1 次提交
  4. 27 8月, 2018 1 次提交
  5. 10 7月, 2018 1 次提交
  6. 17 6月, 2018 1 次提交
  7. 16 6月, 2018 1 次提交
    • T
      ext4: clear i_data in ext4_inode_info when removing inline data · 6e8ab72a
      Theodore Ts'o 提交于
      When converting from an inode from storing the data in-line to a data
      block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
      copy of the i_blocks[] array.  It was not clearing copy of the
      i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
      used by ext4_map_blocks().
      
      This didn't matter much if we are using extents, since the extents
      header would be invalid and thus the extents could would re-initialize
      the extents tree.  But if we are using indirect blocks, the previous
      contents of the i_blocks array will be treated as block numbers, with
      potentially catastrophic results to the file system integrity and/or
      user data.
      
      This gets worse if the file system is using a 1k block size and
      s_first_data is zero, but even without this, the file system can get
      quite badly corrupted.
      
      This addresses CVE-2018-10881.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200015Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      6e8ab72a
  8. 02 6月, 2018 1 次提交
  9. 23 5月, 2018 1 次提交
  10. 01 2月, 2018 1 次提交
  11. 29 1月, 2018 1 次提交
  12. 18 12月, 2017 1 次提交
    • T
      ext4: fix up remaining files with SPDX cleanups · f5166768
      Theodore Ts'o 提交于
      A number of ext4 source files were skipped due because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      
      f5166768
  13. 12 10月, 2017 1 次提交
  14. 02 10月, 2017 1 次提交
  15. 22 6月, 2017 1 次提交
    • A
      ext4: xattr-in-inode support · e50e5129
      Andreas Dilger 提交于
      Large xattr support is implemented for EXT4_FEATURE_INCOMPAT_EA_INODE.
      
      If the size of an xattr value is larger than will fit in a single
      external block, then the xattr value will be saved into the body
      of an external xattr inode.
      
      The also helps support a larger number of xattr, since only the headers
      will be stored in the in-inode space or the single external block.
      
      The inode is referenced from the xattr header via "e_value_inum",
      which was formerly "e_value_block", but that field was never used.
      The e_value_size still contains the xattr size so that listing
      xattrs does not need to look up the inode if the data is not accessed.
      
      struct ext4_xattr_entry {
              __u8    e_name_len;     /* length of name */
              __u8    e_name_index;   /* attribute name index */
              __le16  e_value_offs;   /* offset in disk block of value */
              __le32  e_value_inum;   /* inode in which value is stored */
              __le32  e_value_size;   /* size of attribute value */
              __le32  e_hash;         /* hash value of name and value */
              char    e_name[0];      /* attribute name */
      };
      
      The xattr inode is marked with the EXT4_EA_INODE_FL flag and also
      holds a back-reference to the owning inode in its i_mtime field,
      allowing the ext4/e2fsck to verify the correct inode is accessed.
      
      [ Applied fix by Dan Carpenter to avoid freeing an ERR_PTR. ]
      
      Lustre-Jira: https://jira.hpdd.intel.com/browse/LU-80
      Lustre-bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=4424Signed-off-by: NKalpak Shah <kalpak.shah@sun.com>
      Signed-off-by: NJames Simmons <uja.ornl@gmail.com>
      Signed-off-by: NAndreas Dilger <andreas.dilger@intel.com>
      Signed-off-by: NTahsin Erdogan <tahsin@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      e50e5129
  16. 25 5月, 2017 1 次提交
  17. 30 4月, 2017 1 次提交
  18. 16 3月, 2017 1 次提交
    • E
      ext4: mark inode dirty after converting inline directory · b9cf625d
      Eric Biggers 提交于
      If ext4_convert_inline_data() was called on a directory with inline
      data, the filesystem was left in an inconsistent state (as considered by
      e2fsck) because the file size was not increased to cover the new block.
      This happened because the inode was not marked dirty after i_disksize
      was updated.  Fix this by marking the inode dirty at the end of
      ext4_finish_convert_inline_dir().
      
      This bug was probably not noticed before because most users mark the
      inode dirty afterwards for other reasons.  But if userspace executed
      FS_IOC_SET_ENCRYPTION_POLICY with invalid parameters, as exercised by
      'kvm-xfstests -c adv generic/396', then the inode was never marked dirty
      after updating i_disksize.
      
      Cc: stable@vger.kernel.org  # 3.10+
      Fixes: 3c47d541Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b9cf625d
  19. 05 2月, 2017 2 次提交
  20. 23 1月, 2017 1 次提交
  21. 12 1月, 2017 2 次提交
    • T
      ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores · b907f2d5
      Theodore Ts'o 提交于
      There is no need to call ext4_mark_inode_dirty while holding xattr_sem
      or i_data_sem, so where it's easy to avoid it, move it out from the
      critical region.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b907f2d5
    • T
      ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() · c755e251
      Theodore Ts'o 提交于
      The xattr_sem deadlock problems fixed in commit 2e81a4ee: "ext4:
      avoid deadlock when expanding inode size" didn't include the use of
      xattr_sem in fs/ext4/inline.c.  With the addition of project quota
      which added a new extra inode field, this exposed deadlocks in the
      inline_data code similar to the ones fixed by 2e81a4ee.
      
      The deadlock can be reproduced via:
      
         dmesg -n 7
         mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
         mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
         mkdir /vdc/a
         umount /vdc
         mount -t ext4 /dev/vdc /vdc
         echo foo > /vdc/a/foo
      
      and looks like this:
      
      [   11.158815] 
      [   11.160276] =============================================
      [   11.161960] [ INFO: possible recursive locking detected ]
      [   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W      
      [   11.161960] ---------------------------------------------
      [   11.161960] bash/2519 is trying to acquire lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960] 
      [   11.161960] but task is already holding lock:
      [   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960] 
      [   11.161960] other info that might help us debug this:
      [   11.161960]  Possible unsafe locking scenario:
      [   11.161960] 
      [   11.161960]        CPU0
      [   11.161960]        ----
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960]   lock(&ei->xattr_sem);
      [   11.161960] 
      [   11.161960]  *** DEADLOCK ***
      [   11.161960] 
      [   11.161960]  May be due to missing lock nesting notation
      [   11.161960] 
      [   11.161960] 4 locks held by bash/2519:
      [   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
      [   11.161960]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
      [   11.161960]  #2:  (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
      [   11.161960]  #3:  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
      [   11.161960] 
      [   11.161960] stack backtrace:
      [   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
      [   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
      [   11.161960] Call Trace:
      [   11.161960]  dump_stack+0x72/0xa3
      [   11.161960]  __lock_acquire+0xb7c/0xcb9
      [   11.161960]  ? kvm_clock_read+0x1f/0x29
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  ? __lock_is_held+0x36/0x66
      [   11.161960]  lock_acquire+0x106/0x18a
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  down_write+0x39/0x72
      [   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
      [   11.161960]  ? _raw_read_unlock+0x22/0x2c
      [   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
      [   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
      [   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
      [   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
      [   11.161960]  ext4_try_add_inline_entry+0x69/0x152
      [   11.161960]  ext4_add_entry+0xa3/0x848
      [   11.161960]  ? __brelse+0x14/0x2f
      [   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
      [   11.161960]  ext4_add_nondir+0x17/0x5b
      [   11.161960]  ext4_create+0xcf/0x133
      [   11.161960]  ? ext4_mknod+0x12f/0x12f
      [   11.161960]  lookup_open+0x39e/0x3fb
      [   11.161960]  ? __wake_up+0x1a/0x40
      [   11.161960]  ? lock_acquire+0x11e/0x18a
      [   11.161960]  path_openat+0x35c/0x67a
      [   11.161960]  ? sched_clock_cpu+0xd7/0xf2
      [   11.161960]  do_filp_open+0x36/0x7c
      [   11.161960]  ? _raw_spin_unlock+0x22/0x2c
      [   11.161960]  ? __alloc_fd+0x169/0x173
      [   11.161960]  do_sys_open+0x59/0xcc
      [   11.161960]  SyS_open+0x1d/0x1f
      [   11.161960]  do_int80_syscall_32+0x4f/0x61
      [   11.161960]  entry_INT80_32+0x2f/0x2f
      [   11.161960] EIP: 0xb76ad469
      [   11.161960] EFLAGS: 00000286 CPU: 0
      [   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
      [   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
      [   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      
      Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4ee as a prereq)
      Reported-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c755e251
  22. 10 12月, 2016 1 次提交
  23. 21 11月, 2016 1 次提交
  24. 15 11月, 2016 1 次提交
    • D
      ext4: use current_time() for inode timestamps · eeca7ea1
      Deepa Dinamani 提交于
      CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
      current_time() will be transitioned to be y2038 safe
      along with vfs.
      
      current_time() returns timestamps according to the
      granularities set in the super_block.
      The granularity check in ext4_current_time() to call
      current_time() or CURRENT_TIME_SEC is not required.
      Use current_time() directly to obtain timestamps
      unconditionally, and remove ext4_current_time().
      
      Quota files are assumed to be on the same filesystem.
      Hence, use current_time() for these files as well.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      eeca7ea1
  25. 11 7月, 2016 1 次提交
  26. 27 4月, 2016 1 次提交
  27. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  28. 10 3月, 2016 1 次提交
  29. 09 3月, 2016 1 次提交
  30. 09 1月, 2016 1 次提交
  31. 18 10月, 2015 1 次提交
  32. 19 5月, 2015 1 次提交
    • T
      ext4 crypto: optimize filename encryption · 5b643f9c
      Theodore Ts'o 提交于
      Encrypt the filename as soon it is passed in by the user.  This avoids
      our needing to encrypt the filename 2 or 3 times while in the process
      of creating a filename.
      
      Similarly, when looking up a directory entry, encrypt the filename
      early, or if the encryption key is not available, base-64 decode the
      file syystem so that the hash value and the last 16 bytes of the
      encrypted filename is available in the new struct ext4_filename data
      structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5b643f9c
  33. 16 4月, 2015 1 次提交
  34. 12 4月, 2015 2 次提交
  35. 03 4月, 2015 1 次提交
    • R
      ext4: fix transposition typo in format string · 80cfb71e
      Rasmus Villemoes 提交于
      According to C99, %*.s means the same as %*.0s, in other words, print as
      many spaces as the field width argument says and effectively ignore the
      string argument. That is certainly not what was meant here. The kernel's
      printf implementation, however, treats it as if the . was not there,
      i.e. as %*s. I don't know if de->name is nul-terminated or not, but in
      any case I'm guessing the intention was to use de->name_len as precision
      instead of field width.
      
      [ Note: this is debugging code which is commented out, so this is not
        security issue; a developer would have to explicitly enable
        INLINE_DIR_DEBUG before this would be an issue. ]
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      80cfb71e
  36. 06 12月, 2014 1 次提交
    • D
      ext4: ext4_da_convert_inline_data_to_extent drop locked page after error · 50db71ab
      Dmitry Monakhov 提交于
      Testcase:
      xfstests generic/270
      MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"
      
      Call Trace:
       [<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
       [<ffffffff81145260>] pagecache_get_page+0x65/0x15a
       [<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
       [<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
       [<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
       [<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
       [<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
       [<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
       [<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
       [<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
       [<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
       [<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
       [<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
       [<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
       [<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
       [<ffffffff81185656>] ? file_start_write+0x2a/0x2c
       [<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
       [<ffffffff811858ce>] new_sync_write+0x8a/0xb2
       [<ffffffff81186e7b>] vfs_write+0xb5/0x14d
       [<ffffffff81186ffb>] SyS_write+0x5c/0x8c
       [<ffffffff816f2529>] system_call_fastpath+0x12/0x17
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      50db71ab
  37. 03 12月, 2014 1 次提交