1. 11 Apr, 2014 · 1 commit
    • ext4: fix COLLAPSE_RANGE test failure in data journalling mode · 1ce01c4a
      Namjae Jeon committed
      When mounting ext4 with data=journal option, xfstest shared/002 and
      shared/004 are currently failing as checksum computed for testfile
      does not match with the checksum computed in other journal modes.
      In case of data=journal mode, a call to filemap_write_and_wait_range
      will not flush anything to disk as buffers are not marked dirty in
      write_end. In collapse range this call is followed by a call to
      truncate_pagecache_range. Due to this, when checksum is computed,
      a portion of file is re-read from disk, which replaces valid data with
      NULL bytes; this is the reason for the difference in checksum.
      
      Calling ext4_force_commit before filemap_write_and_wait_range solves
      the issue as it will mark the buffers dirty during commit transaction
      which can be later synced by a call to filemap_write_and_wait_range.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 08 Apr, 2014 · 1 commit
    • ext4: update PF_MEMALLOC handling in ext4_write_inode() · 87f7e416
      Theodore Ts'o committed
      The special handling of PF_MEMALLOC callers in ext4_write_inode()
      shouldn't be necessary as there shouldn't be any. Warn about it. Also
      update comment before the function as it seems somewhat outdated.
      
      (Changes modeled on an ext3 patch posted by Jan Kara to the linux-ext4
      mailing list on February 28, 2014, which apparently never went into
      the ext3 tree.)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
  3. 07 Apr, 2014 · 5 commits
    • ext4: fix jbd2 warning under heavy xattr load · ec4cb1aa
      Jan Kara committed
      When heavily exercising xattr code the assertion that
      jbd2_journal_dirty_metadata() shouldn't return error was triggered:
      
      WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
      jbd2_journal_dirty_metadata+0x1ba/0x260()
      
      CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
      Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
       ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
       ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
       0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
      Call Trace:
       [<ffffffff816311b0>] dump_stack+0x19/0x1b
       [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
       [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
       [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
       [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
       [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
       [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
       [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
       [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
       [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
       [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
       [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
       [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
       [<ffffffff8119421e>] setxattr+0x13e/0x1e0
       [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
       [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
       [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
      
      The reason for the warning is that buffer_head passed into
      jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
      caused by the following race of two ext4_xattr_release_block() calls:
      
      CPU1                                CPU2
      ext4_xattr_release_block()          ext4_xattr_release_block()
      lock_buffer(bh);
      /* False */
      if (BHDR(bh)->h_refcount == cpu_to_le32(1))
      } else {
        le32_add_cpu(&BHDR(bh)->h_refcount, -1);
        unlock_buffer(bh);
                                          lock_buffer(bh);
                                          /* True */
                                          if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                            get_bh(bh);
                                            ext4_free_blocks()
                                              ...
                                              jbd2_journal_forget()
                                                jbd2_journal_unfile_buffer()
                                                -> JH is gone
        error = ext4_handle_dirty_xattr_block(handle, inode, bh);
        -> triggers the warning
      
      We fix the problem by moving ext4_handle_dirty_xattr_block() under the
      buffer lock. Sadly this cannot be done in nojournal mode as that
      function can call sync_dirty_buffer() which would deadlock. Luckily in
      nojournal mode the race is harmless (we only dirty already freed buffer)
      and thus for nojournal mode we leave the dirtying outside of the buffer
      lock.
      Reported-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: note the error in ext4_end_bio() · 9503c67c
      Matthew Wilcox committed
      ext4_end_bio() currently throws away the error that it receives.  Chances
      are this is part of a spate of errors, one of which will end up getting
      the error returned to userspace somehow, but we shouldn't take that risk.
      Also print out the errno to aid in debug.
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
    • ext4: initialize multi-block allocator before checking block descriptors · 00764937
      Azat Khuzhin committed
      With EXT4FS_DEBUG, ext4_count_free_clusters() will call
      ext4_read_block_bitmap() without s_group_info initialized, so we need to
      initialize the multi-block allocator first.
      
      Dependencies that must be resolved to allow this:
      - the multi-block allocator needs the group descriptors
      - s_op must be installed before initializing the multi-block allocator,
        because a new inode is created in ext4_mb_init_backend()
      - the number of group descriptor blocks (s_gdb_count) must be initialized,
        otherwise the number of clusters returned by
        ext4_free_clusters_after_init() is not correct
        (see ext4_bg_num_gdb_nometa())
      
      Here is the stack backtrace:
      
      (gdb) bt
       #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
       #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000,
           desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
       #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
       #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0) at balloc.c:489
       #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
       #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>,
           sb=0xffff880079a10000) at super.c:2143
       #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>,
           data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851
           ...
      Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: FIBMAP ioctl causes BUG_ON due to handle EXT_MAX_BLOCKS · 4adb6ab3
      Kazuya Mio committed
      When we try to get block 2^32-1 of a file which has the extent
      (ee_block=2^32-2, ee_len=1) with the FIBMAP ioctl, it causes a BUG_ON
      in ext4_ext_put_gap_in_cache().
      
      To avoid the problem, ext4_map_blocks() needs to check the file logical block
      number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot
      handle 2^32-1 because the maximum file logical block number is 2^32-2.
      
      Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid.
      So ext4_map_blocks() should also return the same errno.
      Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
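The range check described in the FIBMAP commit can be sketched in user space as follows. This is only an illustrative model, not the kernel code: `check_logical_block()` is a hypothetical helper, and the value of `EXT_MAX_BLOCKS` is assumed from the message's statement that the largest valid logical block number is 2^32-2.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: EXT_MAX_BLOCKS is 2^32 - 1, i.e. one past the largest
 * valid file logical block number (2^32 - 2) named in the commit. */
#define EXT_MAX_BLOCKS 0xffffffffU

/* Hypothetical user-space model of the early check: reject a logical
 * block number that the extent code cannot represent, using the same
 * errno (-EIO) that the message says the indirect-map path returns. */
static int check_logical_block(uint32_t lblk)
{
    if (lblk >= EXT_MAX_BLOCKS)
        return -EIO;
    return 0;
}
```

Under these assumptions, block 2^32-2 passes the check while block 2^32-1 is rejected before any gap-caching logic could see it.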
    • ext4: fix 64-bit number truncation warning · 666525df
      Chen Gang committed
      '0x7FDEADBEEF' will be truncated to a 32-bit number under unicore32. We
      need to append 'ULL' to it.
      
      The related warning (with allmodconfig under unicore32):
      
          CC [M]  fs/ext4/extents_status.o
        fs/ext4/extents_status.c: In function "__es_remove_extent":
        fs/ext4/extents_status.c:813: warning: integer constant is too large for "long" type
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
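A small sketch of why the suffix matters, assuming a platform where the constant would otherwise be squeezed into 32 bits. The names `ES_DEBUG_MARKER` and `truncate_to_32()` are hypothetical and exist only for this illustration; the cast stands in for the truncation the compiler warned about.

```c
#include <stdint.h>

/* With the ULL suffix the constant is unambiguously 64 bits wide. */
#define ES_DEBUG_MARKER 0x7FDEADBEEFULL

/* Hypothetical helper modelling the truncation: only the low 32 bits
 * survive, so the leading 0x7F byte of the marker is lost. */
static uint32_t truncate_to_32(uint64_t value)
{
    return (uint32_t)value;
}
```

Here `truncate_to_32(ES_DEBUG_MARKER)` yields 0xDEADBEEF, whereas the untruncated 64-bit value keeps 0x7F in its upper half.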
  4. 04 Apr, 2014 · 2 commits
  5. 02 Apr, 2014 · 1 commit
    • ext4: fix premature freeing of partial clusters split across leaf blocks · ad6599ab
      Eric Whitney committed
      Xfstests generic/311 and shared/298 fail when run on a bigalloc file
      system.  Kernel error messages produced during the tests report that
      blocks to be freed are already on the to-be-freed list.  When e2fsck
      is run at the end of the tests, it typically reports bad i_blocks and
      bad free blocks counts.
      
      The bug that causes these failures is located in ext4_ext_rm_leaf().
      Code at the end of the function frees a partial cluster if it's not
      shared with an extent remaining in the leaf.  However, if all the
      extents in the leaf have been removed, the code dereferences an
      invalid extent pointer (off the front of the leaf) when the check for
      sharing is made.  This generally has the effect of unconditionally
      freeing the partial cluster, which leads to the observed failures
      when the partial cluster is shared with the last extent in the next
      leaf.
      
      Fix this by attempting to free the cluster only if extents remain in
      the leaf.  Any remaining partial cluster will be freed if possible
      when the next leaf is processed or when leaf removal is complete.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  6. 01 Apr, 2014 · 6 commits
    • ext4: add cross rename support · bd42998a
      Miklos Szeredi committed
      Implement RENAME_EXCHANGE flag in renameat2 syscall.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: rename: split out helper functions · bd1af145
      Miklos Szeredi committed
      Cross rename (exchange source and dest) will need to call some of these
      helpers for both source and dest, while overwriting rename currently only
      calls them for one or the other.  This also makes the code easier to
      follow.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: rename: move EMLINK check up · 0d7d5d67
      Miklos Szeredi committed
      Move checking i_nlink from after ext4_get_first_dir_block() to before.  The
      check doesn't rely on the result of that function and the function only
      fails on fs corruption, so the order shouldn't matter.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: rename: create ext4_renament structure for local vars · c0d268c3
      Miklos Szeredi committed
      Need to split up ext4_rename() into helpers but there are too many local
      variables involved, so create a new structure.  This also, apparently,
      makes the generated code size slightly smaller.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • vfs: add RENAME_NOREPLACE flag · 0a7c3937
      Miklos Szeredi committed
      If this flag is specified and the target of the rename exists then the
      rename syscall fails with EEXIST.
      
      The VFS does the existence checking, so it is trivial to enable for most
      local filesystems.  This patch only enables it in ext4.
      
      For network filesystems the VFS check is not enough as there may be a race
      between a remote create and the rename, so these filesystems need to handle
      this flag in their ->rename() implementations to ensure atomicity.
      
      Andy writes about why this is useful:
      
      "The trivial answer: to eliminate the race condition from 'mv -i'.
      
      Another answer: there's a common pattern to atomically create a file
      with contents: open a temporary file, write to it, optionally fsync
      it, close it, then link(2) it to the final name, then unlink the
      temporary file.
      
      The reason to use link(2) is because it won't silently clobber the destination.
      
      This is annoying:
       - It requires an extra system call that shouldn't be necessary.
       - It doesn't work on (IMO sensible) filesystems that don't support
      hard links (e.g. vfat).
       - It's not atomic -- there's an intermediate state where both files exist.
       - It's ugly.
      
      The new rename flag will make this totally sensible.
      
      To be fair, on new enough kernels, you can also use O_TMPFILE and
      linkat to achieve the same thing even more cleanly."
      
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: J. Bruce Fields <bfields@redhat.com>
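From user space, the flag described above might be exercised roughly as follows. This is a sketch under assumptions: it goes through the raw `renameat2` system call via `syscall()`, assumes kernel headers that define `SYS_renameat2`, and `rename_noreplace()` is a hypothetical wrapper name, not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2 (assumed to be defined) */
#include <unistd.h>       /* syscall() */

/* Fallback in case the installed headers do not expose the flag. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)
#endif

/* Hypothetical wrapper: rename `from` to `to`, failing instead of
 * overwriting. Returns 0 on success, -1 with errno set on failure;
 * per the commit message, errno is EEXIST when the target exists. */
static int rename_noreplace(const char *from, const char *to)
{
    return (int)syscall(SYS_renameat2, AT_FDCWD, from,
                        AT_FDCWD, to, RENAME_NOREPLACE);
}
```

A caller would write the temporary file, optionally fsync it, and then call `rename_noreplace(tmp, final)`, treating EEXIST as "someone else created the target first". Filesystems that do not handle the flag are expected to reject the call rather than silently overwrite.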
    • ext4: remove unneeded test of ret variable · e5b30416
      Lukas Czerner committed
      Currently in ext4_fallocate() and ext4_zero_range() we're testing the ret
      variable along with new_size. However, in ext4_fallocate() we just tested
      ret before, and in ext4_zero_range() it will always be zero when we get
      there, so there is no need to test it in either case.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 31 Mar, 2014 · 1 commit
  8. 25 Mar, 2014 · 4 commits
  9. 20 Mar, 2014 · 1 commit
    • ext4: kill i_version support for Hurd-castrated file systems · c4f65706
      Theodore Ts'o committed
      The Hurd file system uses the inode field which is now used for
      i_version for its translator block.  This means that ext2 file systems
      that are formatted for GNU Hurd can't be used to support NFSv4.  Given
      that Hurd file systems don't support extents, and a huge number of
      modern file system features, this is no great loss.
      
      If we don't do this, the attempt to update the i_version field will
      stomp over the translator block field, which will cause file system
      corruption for Hurd file systems.  This can be replicated via:
      
      mke2fs -t ext2 -o hurd /dev/vdc
      mount -t ext4 /dev/vdc /vdc
      touch /vdc/bug0000
      umount /dev/vdc
      e2fsck -f /dev/vdc
      
      Addresses-Debian-Bug: #738758
      Reported-By: Gabriele Giacone <1o5g4r8o@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 19 Mar, 2014 · 4 commits
    • ext4: each filesystem creates and uses its own mb_cache · 9c191f70
      T Makphaibulchoke committed
      This patch adds new interfaces to create and destroy the cache,
      ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and removes
      the cache creation and destroy calls from ext4_init_xattr() and
      ext4_exit_xattr() in fs/ext4/xattr.c.
      
      fs/ext4/super.c has been changed so that when a filesystem is mounted
      a cache is allocated and attached to its ext4_sb_info structure.
      
      fs/mbcache.c has been changed so that only one slab allocator is
      allocated and used by all mbcache structures.
      Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
    • ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate · b8a86845
      Lukas Czerner committed
      Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
      functionality as xfs ioctl XFS_IOC_ZERO_RANGE.
      
      It can be used to convert a range of a file to zeros, preferably without
      issuing data IO. Blocks should be preallocated for the regions that span
      holes in the file, and the entire range is preferably converted to
      unwritten extents.
      
      This can also be used to preallocate blocks past EOF in the same way as
      with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
      size to remain the same.
      
      Also add appropriate tracepoints.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: refactor ext4_fallocate code · 0e8b6879
      Lukas Czerner committed
      Move block allocation out of ext4_fallocate into a separate function
      called ext4_alloc_file_blocks(). This will allow us to use the same
      allocation code for other allocation operations such as zero range, which
      is added in the next patch.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Update inode i_size after the preallocation · f282ac19
      Lukas Czerner committed
      Currently in ext4_fallocate we update the inode size and c_time and sync
      the file with every partial allocation, which is entirely unnecessary. It
      is true that if a crash happens in the middle of truncate we might end
      up with an unchanged i_size or c_time, which I do not think is really a
      problem - it does not mean file system corruption in any way. Note that
      xfs does things the same way, i.e. it updates all of the mentioned fields
      after the allocation is done.
      
      This commit moves all the updates after the allocation is done. In
      addition we also need to change m_time, as not only the inode has been
      changed but also data regions might have changed (unwritten extents).
      However, m_time will only be updated when i_size changed.
      
      Also we do not need to be paranoid about changing the c_time only if the
      actual allocation has happened; we can change it even if we try to
      allocate only to find out that there are already blocks allocated. It's
      not really a big deal and it will save us some additional complexity.
      
      Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG
      section.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ---
      v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag
      
       fs/ext4/extents.c | 96 ++++++++++++++++++++++++-------------------------------
       1 file changed, 42 insertions(+), 54 deletions(-)
  11. 14 Mar, 2014 · 3 commits
  12. 13 Mar, 2014 · 2 commits
    • fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o committed
      Previously, the no-op "mount -o remount /dev/xxx" operation when the
      file system is already mounted read-write caused an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      not documented or guaranteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
    • jbd2: improve error messages for inconsistent journal heads · 66a4cb18
      Theodore Ts'o committed
      Fix up error messages printed when the transaction pointers in a
      journal head are inconsistent.  This improves the error messages which
      are printed when running xfstests generic/068 in data=journal mode.
      See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. 04 Mar, 2014 · 1 commit
    • ext4: Speedup WB_SYNC_ALL pass called from sync(2) · 10542c22
      Jan Kara committed
      When doing filesystem wide sync, there's no need to force transaction
      commit (or synchronously write inode buffer) separately for each inode
      because ext4_sync_fs() takes care of forcing commit at the end (VFS
      takes care of flushing buffer cache, respectively). Most of the time
      this slowness doesn't manifest because previous WB_SYNC_NONE writeback
      doesn't leave much to write but when there are processes aggressively
      creating new files and several filesystems to sync, the sync slowness
      can be noticeable. In the following test script sync(1) takes around 6
      minutes when there are two ext4 filesystems mounted on a standard SATA
      drive. After this patch sync takes a couple of seconds so we have about
      two orders of magnitude improvement.
      
            function run_writers
            {
              for (( i = 0; i < 10; i++ )); do
                mkdir $1/dir$i
                for (( j = 0; j < 40000; j++ )); do
                  dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
                done &
              done
            }
      
            for dir in "$@"; do
              run_writers $dir
            done
      
            sleep 40
            time sync
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  14. 24 Feb, 2014 · 1 commit
  15. 22 Feb, 2014 · 1 commit
  16. 21 Feb, 2014 · 5 commits
    • ext4: merge uninitialized extents · a9b82415
      Darrick J. Wong committed
      Allow for merging uninitialized extents.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: avoid exposure of stale data in ext4_punch_hole() · e251f9bc
      Maxim Patlasov committed
      While handling punch-hole fallocate, it's useless to truncate page cache
      before removing the range from extent tree (or block map in indirect case)
      because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
      read) immediately after truncating page cache, but before updating extent
      tree (or block map). In that case the user will see stale data even after
      fallocate is completed.
      
      Until the problem of data corruption resulting from pages backed by
      already freed blocks is fully resolved, the simple thing we can do now
      is to add another truncation of pagecache after punch hole is done.
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: silence warnings in extent status tree debugging code · ce140cdd
      Eric Whitney committed
      Adjust the conversion specifications in a few optionally compiled debug
      messages to match the return type of ext4_es_status().  Also, make a
      couple of minor grammatical message edits while we're at it.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove unused ac_ex_scanned · dc9ddd98
      Eric Sandeen committed
      When looking at a bug report with:
      
      > kernel: EXT4-fs: 0 scanned, 0 found
      
      I thought wow, 0 scanned, that's odd?  But it's not odd; it's printing
      a variable that is initialized to 0 and never touched again.
      
      It's never been used since the original merge, so I don't really even
      know what the original intent was, either.
      
      If anyone knows how to hook it up, speak now via patch, otherwise just
      yank it so it's not making a confusing situation more confusing in
      kernel logs.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: avoid possible overflow in ext4_map_blocks() · e861b5e9
      Theodore Ts'o committed
      The ext4_map_blocks() function returns the number of blocks which
      satisfy the caller's request.  The number of blocks requested by
      the caller is specified by an unsigned integer, but the return value
      of ext4_map_blocks() is a signed integer (to accommodate error codes
      per the kernel's standard error signalling convention).
      
      Historically, overflows could never happen since mballoc() will refuse
      to allocate more than 2048 blocks at a time (which is something we
      should fix), and if the blocks were already allocated, the fact that
      there would be some number of intervening metadata blocks pretty much
      guaranteed that there could never be a contiguous region of data
      blocks that was greater than 2**31 blocks.
      
      However, this is now possible if there is a file system which is a bit
      bigger than 8TB, and is created using the new mke2fs hugeblock
      feature, which can create a perfectly contiguous file.  In that case,
      if a userspace program attempted to call fallocate() on this already
      fully allocated file, it's possible that ext4_map_blocks() could
      return a number large enough that it would overflow a signed integer,
      resulting in ext4 thinking that the ext4_map_blocks() call had
      failed with some strange error code.
      
      Since ext4_map_blocks() is always free to return a smaller number of
      blocks than what was requested by the caller, fix this by capping the
      number of blocks that ext4_map_blocks() will ever try to map to 2**31
      - 1.  In practice this should never get hit, except by someone
      deliberately trying to provoke the above-described bug.
      
      Thanks to the PaX team for asking whether this could possibly happen
      in some off-line discussions about using some static code checking
      technology they are developing to find bugs in kernel code.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
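The capping idea in the commit above can be modelled in a few lines of user-space C. This is a sketch, not the kernel change itself: `cap_map_len()` is a hypothetical helper, and it assumes a 32-bit `int`, so that `INT_MAX` equals the 2**31 - 1 limit named in the message.

```c
#include <limits.h>

/* Hypothetical helper: clamp an unsigned block count so that it can be
 * returned through a signed int without wrapping to a negative value
 * that a caller would mistake for an error code. */
static unsigned int cap_map_len(unsigned int requested)
{
    if (requested > (unsigned int)INT_MAX)
        return (unsigned int)INT_MAX;
    return requested;
}
```

Because the function is always allowed to map fewer blocks than requested, a caller that loops until the whole range is covered is unaffected by the cap.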
  17. 20 Feb, 2014 · 1 commit
    • ext4: make sure ex.fe_logical is initialized · ab0c00fc
      Theodore Ts'o committed
      The lowest levels of mballoc set all of the fields of struct
      ext4_free_extent except for fe_logical, since they are just trying to
      find the requested free set of blocks, and the logical block hasn't
      been set yet.  This makes some static code checkers sad.  Set it to
      various different debug values, which would be useful when
      debugging mballoc if these values were ever to show up due to
      parts of mballoc trying to use ac->ac_b_ex.fe_logical before it is
      properly set by the upper layers of mballoc, usually by
      ext4_mb_use_best_found().
      
      Addresses-Coverity-Id: #139697
      Addresses-Coverity-Id: #139698
      Addresses-Coverity-Id: #139699
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>