1. 30 Aug 2016: 7 commits
  2. 12 Aug 2016: 2 commits
    • ext4: avoid deadlock when expanding inode size · 2e81a4ee
      Committed by Jan Kara
      When we need to move xattrs into an external xattr block, we call
      ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
      up calling ext4_mark_inode_dirty() again, which will recurse back into
      the inode expansion code, leading to deadlocks.

      Protect against recursion using the EXT4_STATE_NO_EXPAND inode flag and
      move its management into ext4_expand_extra_isize_ea(), since
      manipulating it there is protected by xattr_sem against races with
      ext4_xattr_set_handle(), which also manipulates the flag.
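
A minimal userspace sketch of the recursion guard described above. struct toy_inode, toy_expand_isize() and toy_mark_inode_dirty() are hypothetical stand-ins, not kernel code; the no_expand field plays the role of EXT4_STATE_NO_EXPAND. Without the guard this toy would recurse forever, where the kernel path would instead deadlock.

```c
#include <stdbool.h>

/* Hypothetical model of an inode: only the fields this sketch needs. */
struct toy_inode {
    bool no_expand;   /* stands in for the EXT4_STATE_NO_EXPAND state bit */
    int expand_calls; /* how many times the expansion body actually ran */
};

static void toy_mark_inode_dirty(struct toy_inode *inode);

/* Stands in for ext4_expand_extra_isize_ea(): it owns the guard flag,
 * setting it before work that may re-enter mark_inode_dirty() and
 * clearing it afterwards. */
static void toy_expand_isize(struct toy_inode *inode)
{
    if (inode->no_expand)
        return;               /* expansion already in progress: do not recurse */
    inode->no_expand = true;
    inode->expand_calls++;
    /* Moving an xattr to an external block dirties the inode again,
     * which re-enters toy_mark_inode_dirty() and thus this function. */
    toy_mark_inode_dirty(inode);
    inode->no_expand = false;
}

/* Stands in for ext4_mark_inode_dirty(), which may try to expand the inode. */
static void toy_mark_inode_dirty(struct toy_inode *inode)
{
    toy_expand_isize(inode);
}
```

Keeping the flag inside the expansion function mirrors the commit's point that the flag is only safe to touch where the serializing lock is held.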
      
      CC: stable@vger.kernel.org   # 4.4.x
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: properly align shifted xattrs when expanding inodes · 443a8c41
      Committed by Jan Kara
      We did not account for the padding of the xattr value when computing
      the desired shift of xattrs in the inode when expanding i_extra_isize.
      As a result, we could create an unaligned start of inline xattrs.
      Account for alignment properly.
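
A small sketch of why padding matters when summing the space an xattr occupies. It assumes, as we understand ext4's EXT4_XATTR_SIZE(), that values are rounded up to a 4-byte EXT4_XATTR_PAD boundary; the XATTR_* macros and space_freed() are hypothetical illustrations, not the kernel definitions.

```c
#include <stddef.h>

/* Assumed to mirror ext4's xattr value padding: values are rounded up
 * to a 4-byte boundary. */
#define XATTR_PAD   4u
#define XATTR_ROUND (XATTR_PAD - 1u)

/* Round a value size up to the padded size it actually occupies. */
static size_t xattr_padded_size(size_t value_size)
{
    return (value_size + XATTR_ROUND) & ~(size_t)XATTR_ROUND;
}

/* Hypothetical helper: bytes reclaimed by moving a set of values,
 * counting padding (correct) or raw sizes (the bug described above). */
static size_t space_freed(const size_t *sizes, size_t n, int count_padding)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += count_padding ? xattr_padded_size(sizes[i]) : sizes[i];
    return total;
}
```

For values of 5, 1 and 8 bytes the raw sum is 14 while the padded footprint is 20, so a shift computed from raw sizes can leave the next entry off its 4-byte boundary.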
      
      CC: stable@vger.kernel.org  # 4.4.x-
      Signed-off-by: Jan Kara <jack@suse.cz>
  3. 11 Aug 2016: 2 commits
    • ext4: fix xattr shifting when expanding inodes part 2 · 418c12d0
      Committed by Jan Kara
      When multiple xattrs need to be moved out of the inode, we did not
      properly recompute the total size of the xattr headers in the inode or
      the new header position. Thus, when moving the second and subsequent
      xattrs, we asked ext4_xattr_shift_entries() to move too much data and
      from the wrong place, resulting in possible xattr value corruption or
      general memory corruption.
      
      CC: stable@vger.kernel.org  # 4.4.x
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix xattr shifting when expanding inodes · d0141191
      Committed by Jan Kara
      The code in ext4_expand_extra_isize_ea() treated the new_extra_isize
      argument sometimes as the desired target i_extra_isize and sometimes as
      the amount by which we need to grow the current i_extra_isize. These
      happen to coincide when i_extra_isize is 0, which used to be the common
      case, so nobody noticed this until recently, when we added i_projid to
      the inode and i_extra_isize now needs to grow from 28 to 32 bytes.
      
      The result of these bugs was that we sometimes unnecessarily decided to
      move xattrs out of the inode even if there was enough space, and we
      often ended up corrupting in-inode xattrs because the arguments to
      ext4_xattr_shift_entries() were simply wrong. This could manifest
      itself as the BUG_ON in ext4_xattr_shift_entries() triggering.
      
      Fix the problem by introducing a new isize_diff variable and using it
      where appropriate.
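
A worked sketch of the distinction the fix draws: the target size versus the number of bytes the inode must grow by. isize_diff() and fits_in_place() are hypothetical helpers for illustration, not the kernel's code.

```c
#include <stddef.h>

/* Bytes that must be freed in the inode body to grow i_extra_isize
 * from cur_extra_isize to new_extra_isize (the target size). */
static size_t isize_diff(size_t cur_extra_isize, size_t new_extra_isize)
{
    return new_extra_isize > cur_extra_isize ?
           new_extra_isize - cur_extra_isize : 0;
}

/* Nonzero when the free space already in the inode is enough, so no
 * xattr needs to move out. The bug compared against the target size. */
static int fits_in_place(size_t free_bytes, size_t cur, size_t target)
{
    return free_bytes >= isize_diff(cur, target);
}
```

Growing from 0 to 32 gives a difference of 32, the same as the target, which is why the confusion went unnoticed; growing from 28 to 32 needs only 4 bytes, so comparing free space against 32 would wrongly force xattrs out.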
      
      CC: stable@vger.kernel.org   # 4.4.x
      Reported-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 01 Aug 2016: 2 commits
  5. 27 Jul 2016: 2 commits
    • mm, memcg: use consistent gfp flags during readahead · 8a5c743e
      Committed by Michal Hocko
      Vladimir has noticed that we might declare memcg oom even during
      readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
      restriction) while __do_page_cache_readahead uses
      page_cache_alloc_readahead, which adds __GFP_NORETRY to prevent
      OOMs.  This gfp mask discrepancy is really unfortunate and easily
      fixable.  Drop page_cache_alloc_readahead() which only has one user and
      outsource the gfp_mask logic into readahead_gfp_mask and propagate this
      mask from __do_page_cache_readahead down to read_pages.
      
      This alone would have only very limited impact as most filesystems are
      implementing ->readpages and the common implementation mpage_readpages
      does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
      use readahead_gfp_mask instead as this function is called only during
      readahead as well.  The same applies to read_cache_pages.
      
      ext4 has its own ext4_mpage_readpages but the path which has pages !=
      NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
      doing a very similar pattern to mpage_readpages so the same can be
      applied to them as well.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@suse.com: restrict gfp mask in mpage_alloc]
        Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: remote unused fault wrappers · 6b524995
      Committed by Ross Zwisler
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem-independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 15 Jul 2016: 3 commits
    • ext4: verify extent header depth · 7bc94916
      Committed by Vegard Nossum
      Although an extent tree depth of 5 should be enough for the worst
      case of 2^32 extents of length 1, the extent tree code does not
      currently try to merge nodes which are less than half-full with a
      sibling node, or to shrink the tree depth if possible.  So it's
      possible, at least in theory, for the tree depth to be greater than 5.
      However, even in the worst case, a tree depth of 32 is highly unlikely,
      and if the file system is maliciously corrupted, an insanely large
      eh_depth can cause memory allocation failures that will trigger kernel
      warnings (here, eh_depth = 65280):
      
          JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
          CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
          Stack:
           604a8947 625badd8 0002fd09 00000000
           60078643 00000000 62623910 601bf9bc
           62623970 6002fc84 626239b0 900000125
          Call Trace:
           [<6001c2dc>] show_stack+0xdc/0x1a0
           [<601bf9bc>] dump_stack+0x2a/0x2e
           [<6002fc84>] __warn+0x114/0x140
           [<6002fdff>] warn_slowpath_null+0x1f/0x30
           [<60165829>] start_this_handle+0x569/0x580
           [<60165d4e>] jbd2__journal_start+0x11e/0x220
           [<60146690>] __ext4_journal_start_sb+0x60/0xa0
           [<60120a81>] ext4_truncate+0x131/0x3a0
           [<60123677>] ext4_setattr+0x757/0x840
           [<600d5d0f>] notify_change+0x16f/0x2a0
           [<600b2b16>] do_truncate+0x76/0xc0
           [<600c3e56>] path_openat+0x806/0x1300
           [<600c55c9>] do_filp_open+0x89/0xf0
           [<600b4074>] do_sys_open+0x134/0x1e0
           [<600b4140>] SyS_open+0x20/0x30
           [<6001ea68>] handle_syscall+0x88/0x90
           [<600295fd>] userspace+0x3fd/0x500
           [<6001ac55>] fork_handler+0x85/0x90
      
          ---[ end trace 08b0b88b6387a244 ]---
      
      [ Commit message modified and the extent tree depth check changed
      from 5 to 32 -- tytso ]
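
A minimal sketch of validating the header depth before it is used to size anything, with the limit of 32 from the note above. struct toy_extent_header borrows field names from ext4's on-disk extent header but is a simplified stand-in, and toy_check_extent_header() with its -1 return is hypothetical.

```c
#include <stdint.h>

/* Limit taken from the commit note; the constant name is hypothetical. */
#define TOY_MAX_EXTENT_DEPTH 32

/* Simplified stand-in for the on-disk extent header (host byte order). */
struct toy_extent_header {
    uint16_t eh_magic;
    uint16_t eh_entries;
    uint16_t eh_max;
    uint16_t eh_depth;
};

/* Reject an absurd depth (e.g. 65280 from a corrupted image) before it
 * drives journal credit or allocation sizing. 0 = ok, -1 = corrupted. */
static int toy_check_extent_header(const struct toy_extent_header *eh)
{
    if (eh->eh_depth > TOY_MAX_EXTENT_DEPTH)
        return -1;
    return 0;
}
```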
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: short-cut orphan cleanup on error · c65d5c6c
      Committed by Vegard Nossum
      If we encounter a filesystem error during orphan cleanup, we should stop.
      Otherwise, we may end up in an infinite loop where the same inode is
      processed again and again.
      
          EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
          EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
          Aborting journal on device loop0-8.
          EXT4-fs (loop0): Remounting filesystem read-only
          EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
          EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
          EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
          [...]
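
A toy model of the loop described in this commit: once an error stops the head inode from leaving the orphan list, only an explicit error check ends the walk. struct toy_fs, its fail_at field and both functions are hypothetical; they model the control flow, not ext4's orphan handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filesystem state for the sketch. */
struct toy_fs {
    bool error;        /* set once a filesystem error has been recorded */
    size_t orphans;    /* inodes still on the orphan list */
    size_t processed;  /* number of times an orphan was processed */
    size_t fail_at;    /* processing #fail_at hits an error (0 = never) */
};

/* Processing normally removes the head inode; after an error (e.g. the
 * journal aborting) the same inode stays at the head of the list. */
static void toy_process_orphan(struct toy_fs *fs)
{
    fs->processed++;
    if (fs->processed == fs->fail_at)
        fs->error = true;
    if (!fs->error)
        fs->orphans--;
}

static void toy_orphan_cleanup(struct toy_fs *fs)
{
    while (fs->orphans > 0) {
        if (fs->error)
            break;     /* the short-cut: without it this loops forever */
        toy_process_orphan(fs);
    }
}
```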
      
      See-also: c9eb13a9 ("ext4: fix hang when processing corrupted orphaned inode list")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: fix reference counting bug on block allocation error · 554a5ccc
      Committed by Vegard Nossum
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
  7. 11 Jul 2016: 1 commit
  8. 06 Jul 2016: 3 commits
  9. 04 Jul 2016: 5 commits
  10. 30 Jun 2016: 1 commit
    • ext4: check for extents that wrap around · f70749ca
      Committed by Vegard Nossum
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify it by removing the - 1 altogether and changing the test
      to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
      lblock and it fails, and if len > 0 then lblock + len > lblock in order
      to pass (i.e. it doesn't overflow).
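
The old and new checks can be compared directly with 32-bit unsigned arithmetic. toy_lblk_t and the two functions are hypothetical names; the bodies follow the test quoted above and the replacement described in the commit, assuming a 32-bit logical block number as in ext4_lblk_t.

```c
#include <stdint.h>

typedef uint32_t toy_lblk_t;  /* assumed 32-bit, like ext4_lblk_t */

/* Old test: misses the case where lblock + len wraps to exactly 0. */
static int valid_extent_old(toy_lblk_t lblock, uint32_t len)
{
    toy_lblk_t last = (toy_lblk_t)(lblock + len - 1);
    if (len == 0 || lblock > last)
        return 0;
    return 1;
}

/* New test: one comparison rejects both len == 0 and any wrap-around. */
static int valid_extent_new(toy_lblk_t lblock, uint32_t len)
{
    toy_lblk_t end = (toy_lblk_t)(lblock + len);
    if (end <= lblock)
        return 0;
    return 1;
}
```

With lblock = 4294967295 and len = 1 the old test accepts the extent, while the new one rejects it because the end wraps to 0.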
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  11. 27 Jun 2016: 2 commits
  12. 09 Jun 2016: 1 commit
    • ext4: use bio op helprs in ext4 crypto code · 60a40096
      Committed by Mike Christie
      This was missed from my last patchset.
      
      This patch makes the ext4 crypto code use the bio op helper
      to set the operation. The operation (discard, write, writesame,
      etc.) is now defined separately from the other REQ bits. They
      still share the bi_rw field to save space, so we use these
      helpers so modules do not have to worry about setting/overwriting
      info.
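
A sketch of why accessor helpers help when the operation and the flags share one field. It assumes a hypothetical layout with the op in the top 3 bits of a 32-bit word; the toy_* names are loosely modelled on helpers such as bio_set_op_attrs() and bio_op() from that kernel era, but neither the layout nor the names are the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical layout: top TOY_OP_BITS bits hold the op, the rest flags. */
#define TOY_OP_BITS   3u
#define TOY_OP_SHIFT  (32u - TOY_OP_BITS)
#define TOY_FLAG_MASK ((1u << TOY_OP_SHIFT) - 1u)

enum toy_req_op { TOY_OP_READ, TOY_OP_WRITE, TOY_OP_DISCARD };

struct toy_bio { uint32_t bi_rw; };

/* Set op and flags together; flags can never spill into the op bits. */
static void toy_bio_set_op_attrs(struct toy_bio *bio, enum toy_req_op op,
                                 uint32_t flags)
{
    bio->bi_rw = ((uint32_t)op << TOY_OP_SHIFT) | (flags & TOY_FLAG_MASK);
}

static enum toy_req_op toy_bio_op(const struct toy_bio *bio)
{
    return (enum toy_req_op)(bio->bi_rw >> TOY_OP_SHIFT);
}

static uint32_t toy_bio_flags(const struct toy_bio *bio)
{
    return bio->bi_rw & TOY_FLAG_MASK;
}

/* Update only the flags, leaving the op bits untouched. */
static void toy_bio_set_flags(struct toy_bio *bio, uint32_t flags)
{
    bio->bi_rw = (bio->bi_rw & ~TOY_FLAG_MASK) | (flags & TOY_FLAG_MASK);
}
```

Callers that go through these accessors cannot accidentally overwrite the op when they only meant to change flags, which is the point the commit makes about modules not having to worry about the shared field.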
      
      Jens, I am not sure how you handle patches on top of patches
      in the next branches. If you merge patches that fix issues
      in previous patches in next, then this patch could be part
      of
      
      commit 95fe6c1a
      Author: Mike Christie <mchristi@redhat.com>
      Date:   Sun Jun 5 14:31:48 2016 -0500
      
          block, fs, mm, drivers: use bio set/get op accessors

      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  13. 08 Jun 2016: 4 commits
  14. 30 May 2016: 1 commit
  15. 28 May 2016: 1 commit
  16. 21 May 2016: 1 commit
  17. 17 May 2016: 2 commits