1. 08 10月, 2016 1 次提交
  2. 10 9月, 2016 1 次提交
  3. 12 8月, 2016 2 次提交
    • J
      ext4: avoid deadlock when expanding inode size · 2e81a4ee
      Jan Kara 提交于
      When we need to move xattrs into external xattr block, we call
      ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
      up calling ext4_mark_inode_dirty() again which will recurse back into
      the inode expansion code leading to deadlocks.
      
      Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
      its management into ext4_expand_extra_isize_ea() since its manipulation
      is safe there (due to xattr_sem) from possible races with
      ext4_xattr_set_handle() which plays with it as well.
      
      CC: stable@vger.kernel.org   # 4.4.x
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      2e81a4ee
    • J
      ext4: properly align shifted xattrs when expanding inodes · 443a8c41
      Jan Kara 提交于
      We did not count with the padding of xattr value when computing desired
      shift of xattrs in the inode when expanding i_extra_isize. As a result
      we could create unaligned start of inline xattrs. Account for alignment
      properly.
      
      CC: stable@vger.kernel.org  # 4.4.x-
      Signed-off-by: NJan Kara <jack@suse.cz>
      443a8c41
  4. 11 8月, 2016 2 次提交
    • J
      ext4: fix xattr shifting when expanding inodes part 2 · 418c12d0
      Jan Kara 提交于
      When multiple xattrs need to be moved out of inode, we did not properly
      recompute total size of xattr headers in the inode and the new header
      position. Thus when moving the second and further xattr we asked
      ext4_xattr_shift_entries() to move too much and from the wrong place,
      resulting in possible xattr value corruption or general memory
      corruption.
      
      CC: stable@vger.kernel.org  # 4.4.x
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      418c12d0
    • J
      ext4: fix xattr shifting when expanding inodes · d0141191
      Jan Kara 提交于
      The code in ext4_expand_extra_isize_ea() treated new_extra_isize
      argument sometimes as the desired target i_extra_isize and sometimes as
      the amount by which we need to grow current i_extra_isize. These happen
      to coincide when i_extra_isize is 0 which used to be the common case and
      so nobody noticed this until recently when we added i_projid to the
      inode and so i_extra_isize now needs to grow from 28 to 32 bytes.
      
      The result of these bugs was that we sometimes unnecessarily decided to
      move xattrs out of inode even if there was enough space and we often
      ended up corrupting in-inode xattrs because arguments to
      ext4_xattr_shift_entries() were just wrong. This could demonstrate
      itself as BUG_ON in ext4_xattr_shift_entries() triggering.
      
      Fix the problem by introducing new isize_diff variable and use it where
      appropriate.
      
      CC: stable@vger.kernel.org   # 4.4.x
      Reported-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d0141191
  5. 01 8月, 2016 2 次提交
  6. 27 7月, 2016 2 次提交
    • M
      mm, memcg: use consistent gfp flags during readahead · 8a5c743e
      Michal Hocko 提交于
      Vladimir has noticed that we might declare memcg oom even during
      readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
      restriction) while __do_page_cache_readahead uses
      page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
      OOMs.  This gfp mask discrepancy is really unfortunate and easily
      fixable.  Drop page_cache_alloc_readahead() which only has one user and
      outsource the gfp_mask logic into readahead_gfp_mask and propagate this
      mask from __do_page_cache_readahead down to read_pages.
      
      This alone would have only very limited impact as most filesystems are
      implementing ->readpages and the common implementation mpage_readpages
      does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
      use readahead_gfp_mask instead as this function is called only during
      readahead as well.  The same applies to read_cache_pages.
      
      ext4 has its own ext4_mpage_readpages but the path which has pages !=
      NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
      doing a very similar pattern to mpage_readpages so the same can be
      applied to them as well.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@suse.com: restrict gfp mask in mpage_alloc]
        Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a5c743e
    • R
      dax: remote unused fault wrappers · 6b524995
      Ross Zwisler 提交于
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b524995
  7. 15 7月, 2016 3 次提交
    • V
      ext4: verify extent header depth · 7bc94916
      Vegard Nossum 提交于
      Although the extent tree depth of 5 should enough be for the worst
      case of 2*32 extents of length 1, the extent tree code does not
      currently to merge nodes which are less than half-full with a sibling
      node, or to shrink the tree depth if possible.  So it's possible, at
      least in theory, for the tree depth to be greater than 5.  However,
      even in the worst case, a tree depth of 32 is highly unlikely, and if
      the file system is maliciously corrupted, an insanely large eh_depth
      can cause memory allocation failures that will trigger kernel warnings
      (here, eh_depth = 65280):
      
          JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
          CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
          Stack:
           604a8947 625badd8 0002fd09 00000000
           60078643 00000000 62623910 601bf9bc
           62623970 6002fc84 626239b0 900000125
          Call Trace:
           [<6001c2dc>] show_stack+0xdc/0x1a0
           [<601bf9bc>] dump_stack+0x2a/0x2e
           [<6002fc84>] __warn+0x114/0x140
           [<6002fdff>] warn_slowpath_null+0x1f/0x30
           [<60165829>] start_this_handle+0x569/0x580
           [<60165d4e>] jbd2__journal_start+0x11e/0x220
           [<60146690>] __ext4_journal_start_sb+0x60/0xa0
           [<60120a81>] ext4_truncate+0x131/0x3a0
           [<60123677>] ext4_setattr+0x757/0x840
           [<600d5d0f>] notify_change+0x16f/0x2a0
           [<600b2b16>] do_truncate+0x76/0xc0
           [<600c3e56>] path_openat+0x806/0x1300
           [<600c55c9>] do_filp_open+0x89/0xf0
           [<600b4074>] do_sys_open+0x134/0x1e0
           [<600b4140>] SyS_open+0x20/0x30
           [<6001ea68>] handle_syscall+0x88/0x90
           [<600295fd>] userspace+0x3fd/0x500
           [<6001ac55>] fork_handler+0x85/0x90
      
          ---[ end trace 08b0b88b6387a244 ]---
      
      [ Commit message modified and the extent tree depath check changed
      from 5 to 32 -- tytso ]
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7bc94916
    • V
      ext4: short-cut orphan cleanup on error · c65d5c6c
      Vegard Nossum 提交于
      If we encounter a filesystem error during orphan cleanup, we should stop.
      Otherwise, we may end up in an infinite loop where the same inode is
      processed again and again.
      
          EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
          EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
          Aborting journal on device loop0-8.
          EXT4-fs (loop0): Remounting filesystem read-only
          EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
          EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
          EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
          [...]
      
      See-also: c9eb13a9 ("ext4: fix hang when processing corrupted orphaned inode list")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c65d5c6c
    • V
      ext4: fix reference counting bug on block allocation error · 554a5ccc
      Vegard Nossum 提交于
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      554a5ccc
  8. 11 7月, 2016 1 次提交
  9. 06 7月, 2016 3 次提交
  10. 04 7月, 2016 5 次提交
  11. 30 6月, 2016 1 次提交
    • V
      ext4: check for extents that wrap around · f70749ca
      Vegard Nossum 提交于
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify it by removing the - 1 altogether and changing the test
      to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
      lblock and it fails, and if len > 0 then lblock + len > lblock in order
      to pass (i.e. it doesn't overflow).
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f70749ca
  12. 27 6月, 2016 2 次提交
  13. 09 6月, 2016 1 次提交
    • M
      ext4: use bio op helprs in ext4 crypto code · 60a40096
      Mike Christie 提交于
      This was missed from my last patchset.
      
      This patch has ext4 crypto code use the bio op helper
      to set the operation. The operation (discard, write, writesame,
      etc) is now defined seperately from the other REQ bits. They
      still share the bi_rw field to save space, so we use these
      helpers so modules do not have to worry about setting/overwriting
      info.
      
      Jens, I am not sure how you handle patches on top of patches
      in the next branches. If you merge patches that fix issues
      in previous patches in next, then this patch could be part
      of
      
      commit 95fe6c1a
      Author: Mike Christie <mchristi@redhat.com>
      Date:   Sun Jun 5 14:31:48 2016 -0500
      
          block, fs, mm, drivers: use bio set/get op accessors
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      60a40096
  14. 08 6月, 2016 4 次提交
  15. 30 5月, 2016 1 次提交
  16. 28 5月, 2016 1 次提交
  17. 21 5月, 2016 1 次提交
  18. 17 5月, 2016 2 次提交
  19. 13 5月, 2016 5 次提交
    • J
      ext4: pre-zero allocated blocks for DAX IO · 12735f88
      Jan Kara 提交于
      Currently ext4 treats DAX IO the same way as direct IO. I.e., it
      allocates unwritten extents before IO is done and converts unwritten
      extents afterwards. However this way DAX IO can race with page fault to
      the same area:
      
      ext4_ext_direct_IO()				dax_fault()
        dax_io()
          get_block() - allocates unwritten extent
          copy_from_iter_pmem()
      						  get_block() - converts
      						    unwritten block to
      						    written and zeroes it
      						    out
        ext4_convert_unwritten_extents()
      
      So data written with DAX IO gets lost. Similarly dax_new_buf() called
      from dax_io() can overwrite data that has been already written to the
      block via mmap.
      
      Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
      use them for DAX mmap. The downside of this solution is that every
      allocating write writes each block twice (once zeros, once data). Fixing
      the race with locking is possible as well however we would need to
      lock-out faults for the whole range written to by DAX IO. And that is
      not easy to do without locking-out faults for the whole file which seems
      too aggressive.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      12735f88
    • J
      ext4: refactor direct IO code · 914f82a3
      Jan Kara 提交于
      Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
      and ext4_ind_direct_IO(). However the extent based function calls into
      the indirect based one for some cases and for example it is not able to
      handle file extending. Previously it was not also properly handling
      retries in case of ENOSPC errors. With DAX things would get even more
      contrieved so just refactor the direct IO code and instead of indirect /
      extent split do the split to read vs writes.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      914f82a3
    • J
      ext4: fix race in transient ENOSPC detection · dbc427ce
      Jan Kara 提交于
      When there are blocks to free in the running transaction, block
      allocator can return ENOSPC although the filesystem has some blocks to
      free. We use ext4_should_retry_alloc() to force commit of the current
      transaction and return whether anything was committed so that it makes
      sense to retry the allocation. However the transaction may get committed
      after block allocation fails but before we call
      ext4_should_retry_alloc(). So ext4_should_retry_alloc() returns false
      because there is nothing to commit and we wrongly return ENOSPC.
      
      Fix the race by unconditionally returning 1 from ext4_should_retry_alloc()
      when we tried to commit a transaction. This should not add any
      unnecessary retries since we had a transaction running a while ago when
      trying to allocate blocks and we want to retry the allocation once that
      transaction has committed anyway.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      dbc427ce
    • J
      ext4: handle transient ENOSPC properly for DAX · 7cb476f8
      Jan Kara 提交于
      ext4_dax_get_blocks() was accidentally omitted fixing get blocks
      handlers to properly handle transient ENOSPC errors. Fix it now to use
      ext4_get_blocks_trans() helper which takes care of these errors.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7cb476f8
    • A
      ext4: switch to ->iterate_shared() · ae05327a
      Al Viro 提交于
      Note that we need relax_dir() equivalent for directories
      locked shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ae05327a