1. 24 1月, 2019 1 次提交
  2. 02 10月, 2018 4 次提交
    • E
      ext4: adjust reserved cluster count when removing extents · 9fe67149
      Eric Whitney 提交于
      Modify ext4_ext_remove_space() and the code it calls to correct the
      reserved cluster count for pending reservations (delayed allocated
      clusters shared with allocated blocks) when a block range is removed
      from the extent tree.  Pending reservations may be found for the clusters
      at the ends of written or unwritten extents when a block range is removed.
      If a physical cluster at the end of an extent is freed, it's necessary
      to increment the reserved cluster count to maintain correct accounting
      if the corresponding logical cluster is shared with at least one
      delayed and unwritten extent as found in the extents status tree.
      
      Add a new function, ext4_rereserve_cluster(), to reapply a reservation
      on a delayed allocated cluster sharing blocks with a freed allocated
      cluster.  To avoid ENOSPC on reservation, a flag is applied to
      ext4_free_blocks() to briefly defer updating the freeclusters counter
      when an allocated cluster is freed.  This prevents another thread
      from allocating the freed block before the reservation can be reapplied.
      
      Redefine the partial cluster object as a struct to carry more state
      information and to clarify the code using it.
      
      Adjust the conditional code structure in ext4_ext_remove_space to
      reduce the indentation level in the main body of the code to improve
      readability.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      9fe67149
    • E
      ext4: reduce reserved cluster count by number of allocated clusters · b6bf9171
      Eric Whitney 提交于
      Ext4 does not always reduce the reserved cluster count by the number
      of clusters allocated when mapping a delayed extent.  It sometimes
      adds back one or more clusters after allocation if delalloc blocks
      adjacent to the range allocated by ext4_ext_map_blocks() share the
      clusters newly allocated for that range.  However, this overcounts
      the number of clusters needed to satisfy future mapping requests
      (holding one or more reservations for clusters that have already been
      allocated) and premature ENOSPC and quota failures, etc., result.
      
      Ext4 also does not reduce the reserved cluster count when allocating
      clusters for non-delayed allocated writes that have previously been
      reserved for delayed writes.  This also results in overcounts.
      
      To make it possible to handle reserved cluster accounting for
      fallocated regions in the same manner as used for other non-delayed
      writes, do the reserved cluster accounting for them at the time of
      allocation.  In the current code, this is only done later when a
      delayed extent sharing the fallocated region is finally mapped.
      
      Address comment correcting handling of unsigned long long constant
      from Jan Kara's review of RFC version of this patch.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b6bf9171
    • E
      ext4: fix reserved cluster accounting at delayed write time · 0b02f4c0
      Eric Whitney 提交于
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0b02f4c0
    • E
      ext4: generalize extents status tree search functions · ad431025
      Eric Whitney 提交于
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ad431025
  3. 30 7月, 2018 1 次提交
  4. 15 6月, 2018 1 次提交
  5. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  6. 12 4月, 2018 1 次提交
    • E
      ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS · 349fa7d6
      Eric Biggers 提交于
      During the "insert range" fallocate operation, extents starting at the
      range offset are shifted "right" (to a higher file offset) by the range
      length.  But, as shown by syzbot, it's not validated that this doesn't
      cause extents to be shifted beyond EXT_MAX_BLOCKS.  In that case
      ->ee_block can wrap around, corrupting the extent tree.
      
      Fix it by returning an error if the space between the end of the last
      extent and EXT4_MAX_BLOCKS is smaller than the range being inserted.
      
      This bug can be reproduced by running the following commands when the
      current directory is on an ext4 filesystem with a 4k block size:
      
              fallocate -l 8192 file
              fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file
              fallocate --insert-range -l 8192 file
      
      Then after unmounting the filesystem, e2fsck reports corruption.
      
      Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com
      Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      349fa7d6
  7. 26 3月, 2018 1 次提交
  8. 22 3月, 2018 1 次提交
  9. 18 12月, 2017 1 次提交
    • T
      ext4: fix up remaining files with SPDX cleanups · f5166768
      Theodore Ts'o 提交于
      A number of ext4 source files were skipped due because their copyright
      permission statements didn't match the expected text used by the
      automated conversion utilities.  I've added SPDX tags for the rest.
      
      While looking at some of these files, I've noticed that we have quite
      a bit of variation on the licenses that were used --- in particular
      some of the Red Hat licenses on the jbd2 files use a GPL2+ license,
      and we have some files that have a LGPL-2.1 license (which was quite
      surprising).
      
      I've not attempted to do any license changes.  Even if it is perfectly
      legal to relicense to GPL 2.0-only for consistency's sake, that should
      be done with ext4 developer community discussion.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      
      f5166768
  10. 04 12月, 2017 1 次提交
    • E
      ext4: fix fdatasync(2) after fallocate(2) operation · c894aa97
      Eryu Guan 提交于
      Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2)
      then crash, we'll see wrong allocated block number (stat -c %b), the
      blocks allocated beyond EOF are all lost. fstests generic/468
      exposes this bug.
      
      Commit 67a7d5f5 ("ext4: fix fdatasync(2) after extent
      manipulation operations") fixed all the other extent manipulation
      operation paths such as hole punch, zero range, collapse range etc.,
      but forgot the fallocate case.
      
      So similarly, fix it by recording the correct journal tid in ext4
      inode in fallocate(2) path, so that ext4_sync_file() will wait for
      the right tid to be committed on fdatasync(2).
      
      This addresses the test failure in xfstests test generic/468.
      Signed-off-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c894aa97
  11. 07 10月, 2017 1 次提交
    • T
      ext4: fix interaction between i_size, fallocate, and delalloc after a crash · 51e3ae81
      Theodore Ts'o 提交于
      If there are pending writes subject to delayed allocation, then i_size
      will show size after the writes have completed, while i_disksize
      contains the value of i_size on the disk (since the writes have not
      been persisted to disk).
      
      If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
      with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
      after the fallocate(2) is between i_size and i_disksize, then after a
      crash, if a journal commit has resulted in the changes made by the
      fallocate() call to be persisted after a crash, but the delayed
      allocation write has not resolved itself, i_size would not be updated,
      and this would cause the following e2fsck complaint:
      
      Inode 12, end of extent exceeds allowed value
      	(logical block 33, physical block 33441, len 7)
      
      This can only take place on a sparse file, where the fallocate(2) call
      is allocating blocks in a range which is before a pending delayed
      allocation write which is extending i_size.  Since this situation is
      quite rare, and the window in which the crash must take place is
      typically < 30 seconds, in practice this condition will rarely happen.
      
      Nevertheless, it can be triggered in testing, and in particular by
      xfstests generic/456.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported-by: NAmir Goldstein <amir73il@gmail.com>
      Cc: stable@vger.kernel.org
      51e3ae81
  12. 06 8月, 2017 2 次提交
  13. 22 6月, 2017 1 次提交
  14. 30 5月, 2017 1 次提交
    • J
      ext4: fix fdatasync(2) after extent manipulation operations · 67a7d5f5
      Jan Kara 提交于
      Currently, extent manipulation operations such as hole punch, range
      zeroing, or extent shifting do not record the fact that file data has
      changed and thus fdatasync(2) has a work to do. As a result if we crash
      e.g. after a punch hole and fdatasync, user can still possibly see the
      punched out data after journal replay. Test generic/392 fails due to
      these problems.
      
      Fix the problem by properly marking that file data has changed in these
      operations.
      
      CC: stable@vger.kernel.org
      Fixes: a4bb6b64Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      67a7d5f5
  15. 27 5月, 2017 1 次提交
    • J
      ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO · 4f8caa60
      Jan Kara 提交于
      When ext4_map_blocks() is called with EXT4_GET_BLOCKS_ZERO to zero-out
      allocated blocks and these blocks are actually converted from unwritten
      extent the following race can happen:
      
      CPU0					CPU1
      
      page fault				page fault
      ...					...
      ext4_map_blocks()
        ext4_ext_map_blocks()
          ext4_ext_handle_unwritten_extents()
            ext4_ext_convert_to_initialized()
      	- zero out converted extent
      	ext4_zeroout_es()
      	  - inserts extent as initialized in status tree
      
      					ext4_map_blocks()
      					  ext4_es_lookup_extent()
      					    - finds initialized extent
      					write data
        ext4_issue_zeroout()
          - zeroes out new extent overwriting data
      
      This problem can be reproduced by generic/340 for the fallocated case
      for the last block in the file.
      
      Fix the problem by avoiding zeroing out the area we are mapping with
      ext4_map_blocks() in ext4_ext_convert_to_initialized(). It is pointless
      to zero out this area in the first place as the caller asked us to
      convert the area to initialized because he is just going to write data
      there before the transaction finishes. To achieve this we delete the
      special case of zeroing out full extent as that will be handled by the
      cases below zeroing only the part of the extent that needs it. We also
      instruct ext4_split_extent() that the middle of extent being split
      contains data so that ext4_split_extent_at() cannot zero out full extent
      in case of ENOSPC.
      
      CC: stable@vger.kernel.org
      Fixes: 12735f88Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4f8caa60
  16. 09 1月, 2017 2 次提交
    • R
      ext4: do not polute the extents cache while shifting extents · 03e916fa
      Roman Pen 提交于
      Inside ext4_ext_shift_extents() function ext4_find_extent() is called
      without EXT4_EX_NOCACHE flag, which should prevent cache population.
      
      This leads to oudated offsets in the extents tree and wrong blocks
      afterwards.
      
      Patch fixes the problem providing EXT4_EX_NOCACHE flag for each
      ext4_find_extents() call inside ext4_ext_shift_extents function.
      
      Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: stable@vger.kernel.org
      03e916fa
    • R
      ext4: Include forgotten start block on fallocate insert range · 2a9b8cba
      Roman Pen 提交于
      While doing 'insert range' start block should be also shifted right.
      The bug can be easily reproduced by the following test:
      
          ptr = malloc(4096);
          assert(ptr);
      
          fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
          assert(fd >= 0);
      
          rc = fallocate(fd, 0, 0, 8192);
          assert(rc == 0);
          for (i = 0; i < 2048; i++)
                  *((unsigned short *)ptr + i) = 0xbeef;
          rc = pwrite(fd, ptr, 4096, 0);
          assert(rc == 4096);
          rc = pwrite(fd, ptr, 4096, 4096);
          assert(rc == 4096);
      
          for (block = 2; block < 1000; block++) {
                  rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
                  assert(rc == 0);
      
                  for (i = 0; i < 2048; i++)
                          *((unsigned short *)ptr + i) = block;
      
                  rc = pwrite(fd, ptr, 4096, 4096);
                  assert(rc == 4096);
          }
      
      Because start block is not included in the range the hole appears at
      the wrong offset (just after the desired offset) and the following
      pwrite() overwrites already existent block, keeping hole untouched.
      
      Simple way to verify wrong behaviour is to check zeroed blocks after
      the test:
      
         $ hexdump ./ext4.file | grep '0000 0000'
      
      The root cause of the bug is a wrong range (start, stop], where start
      should be inclusive, i.e. [start, stop].
      
      This patch fixes the problem by including start into the range.  But
      not to break left shift (range collapse) stop points to the beginning
      of the a block, not to the end.
      
      The other not obvious change is an iterator check on validness in a
      main loop.  Because iterator is unsigned the following corner case
      should be considered with care: insert a block at 0 offset, when stop
      variables overflows and never becomes less than start, which is 0.
      To handle this special case iterator is set to NULL to indicate that
      end of the loop is reached.
      
      Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: stable@vger.kernel.org
      2a9b8cba
  17. 25 12月, 2016 1 次提交
  18. 04 12月, 2016 1 次提交
  19. 15 11月, 2016 1 次提交
    • D
      ext4: use current_time() for inode timestamps · eeca7ea1
      Deepa Dinamani 提交于
      CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
      current_time() will be transitioned to be y2038 safe
      along with vfs.
      
      current_time() returns timestamps according to the
      granularities set in the super_block.
      The granularity check in ext4_current_time() to call
      current_time() or CURRENT_TIME_SEC is not required.
      Use current_time() directly to obtain timestamps
      unconditionally, and remove ext4_current_time().
      
      Quota files are assumed to be on the same filesystem.
      Hence, use current_time() for these files as well.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      eeca7ea1
  20. 14 11月, 2016 1 次提交
  21. 05 11月, 2016 1 次提交
  22. 15 9月, 2016 3 次提交
    • F
      ext4: create EXT4_MAX_BLOCKS() macro · 518eaa63
      Fabian Frederick 提交于
      Create a macro to calculate length + offset -> maximum blocks
      This adds more readability.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      518eaa63
    • F
      ext4: remove unneeded test in ext4_alloc_file_blocks() · c3fe493c
      Fabian Frederick 提交于
      ext4_alloc_file_blocks() is called from ext4_zero_range() and
      ext4_fallocate() both already testing EXT4_INODE_EXTENTS
      We can call ext_depth(inode) unconditionnally.
      
      [ Added BUG_ON check to make sure ext4_alloc_file_blocks() won't get
        called for a indirect-mapped inode in the future.  -- tytso ]
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c3fe493c
    • F
      ext4: fix memory leak in ext4_insert_range() · edf15aa1
      Fabian Frederick 提交于
      Running xfstests generic/013 with kmemleak gives the following:
      
      unreferenced object 0xffff8801d3d27de0 (size 96):
        comm "fsstress", pid 4941, jiffies 4294860168 (age 53.485s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff818eaaf3>] kmemleak_alloc+0x23/0x40
          [<ffffffff81179805>] __kmalloc+0xf5/0x1d0
          [<ffffffff8122ef5c>] ext4_find_extent+0x1ec/0x2f0
          [<ffffffff8123530c>] ext4_insert_range+0x34c/0x4a0
          [<ffffffff81235942>] ext4_fallocate+0x4e2/0x8b0
          [<ffffffff81181334>] vfs_fallocate+0x134/0x210
          [<ffffffff8118203f>] SyS_fallocate+0x3f/0x60
          [<ffffffff818efa9b>] entry_SYSCALL_64_fastpath+0x13/0x8f
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Problem seems mitigated by dropping refs and freeing path
      when there's no path[depth].p_ext
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      edf15aa1
  23. 15 7月, 2016 1 次提交
    • V
      ext4: verify extent header depth · 7bc94916
      Vegard Nossum 提交于
      Although the extent tree depth of 5 should enough be for the worst
      case of 2*32 extents of length 1, the extent tree code does not
      currently to merge nodes which are less than half-full with a sibling
      node, or to shrink the tree depth if possible.  So it's possible, at
      least in theory, for the tree depth to be greater than 5.  However,
      even in the worst case, a tree depth of 32 is highly unlikely, and if
      the file system is maliciously corrupted, an insanely large eh_depth
      can cause memory allocation failures that will trigger kernel warnings
      (here, eh_depth = 65280):
      
          JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
          CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
          Stack:
           604a8947 625badd8 0002fd09 00000000
           60078643 00000000 62623910 601bf9bc
           62623970 6002fc84 626239b0 900000125
          Call Trace:
           [<6001c2dc>] show_stack+0xdc/0x1a0
           [<601bf9bc>] dump_stack+0x2a/0x2e
           [<6002fc84>] __warn+0x114/0x140
           [<6002fdff>] warn_slowpath_null+0x1f/0x30
           [<60165829>] start_this_handle+0x569/0x580
           [<60165d4e>] jbd2__journal_start+0x11e/0x220
           [<60146690>] __ext4_journal_start_sb+0x60/0xa0
           [<60120a81>] ext4_truncate+0x131/0x3a0
           [<60123677>] ext4_setattr+0x757/0x840
           [<600d5d0f>] notify_change+0x16f/0x2a0
           [<600b2b16>] do_truncate+0x76/0xc0
           [<600c3e56>] path_openat+0x806/0x1300
           [<600c55c9>] do_filp_open+0x89/0xf0
           [<600b4074>] do_sys_open+0x134/0x1e0
           [<600b4140>] SyS_open+0x20/0x30
           [<6001ea68>] handle_syscall+0x88/0x90
           [<600295fd>] userspace+0x3fd/0x500
           [<6001ac55>] fork_handler+0x85/0x90
      
          ---[ end trace 08b0b88b6387a244 ]---
      
      [ Commit message modified and the extent tree depath check changed
      from 5 to 32 -- tytso ]
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7bc94916
  24. 30 6月, 2016 1 次提交
    • V
      ext4: check for extents that wrap around · f70749ca
      Vegard Nossum 提交于
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify it by removing the - 1 altogether and changing the test
      to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
      lblock and it fails, and if len > 0 then lblock + len > lblock in order
      to pass (i.e. it doesn't overflow).
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f70749ca
  25. 06 5月, 2016 1 次提交
  26. 27 4月, 2016 1 次提交
  27. 26 4月, 2016 1 次提交
    • T
      ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() · 7b808191
      Theodore Ts'o 提交于
      The function jbd2_journal_extend() takes as its argument the number of
      new credits to be added to the handle.  We weren't taking into account
      the currently unused handle credits; worse, we would try to extend the
      handle by N credits when it had N credits available.
      
      In the case where jbd2_journal_extend() fails because the transaction
      is too large, when jbd2_journal_restart() gets called, the N credits
      owned by the handle gets returned to the transaction, and the
      transaction commit is asynchronously requested, and then
      start_this_handle() will be able to successfully attach the handle to
      the current transaction since the required credits are now available.
      
      This is mostly harmless, but since ext4_ext_truncate_extend_restart()
      returns EAGAIN, the truncate machinery will once again try to call
      ext4_ext_truncate_extend_restart(), which will do the above sequence
      over and over again until the transaction has committed.
      
      This was found while I was debugging a lockup in caused by running
      xfstests generic/074 in the data=journal case.  I'm still not sure why
      we ended up looping forever, which suggests there may still be another
      bug hiding in the transaction accounting machinery, but this commit
      prevents us from looping in the first place.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7b808191
  28. 10 3月, 2016 3 次提交
  29. 09 3月, 2016 1 次提交
    • J
      ext4: simplify io_end handling for AIO DIO · 109811c2
      Jan Kara 提交于
      When mapping blocks for direct IO, we allocate io_end structure before
      mapping blocks and store pointer to it in the inode. This creates a
      requirement that any AIO DIO using io_end must be protected by i_mutex.
      This created problems in the past with dioread_nolock mode which was
      corrupting io_end pointers. Also io_end is allocated unnecessarily in
      case where we don't need to convert any extents (which is a common case
      for example when overwriting file).
      
      We fix the problem by allocating io_end only once we return unwritten
      extent from block mapping function for AIO DIO (so we can save some
      pointless io_end allocations) and we pass pointer to it in bh->b_private
      which generic DIO code later passes to our end IO callback. That way we
      remove any need for global pointer to io_end structure and thus fix the
      races.
      
      The downside of this change is that the checking for unwritten IO in
      flight in ext4_extents_can_be_merged() is more racy since we now
      increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
      i_data_sem. However the check has been racy already before because
      ext4_writepages() already increment i_unwritten after dropping
      i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
      case.
      Signed-off-by: NJan Kara <jack@suse.cz>
      109811c2
  30. 23 2月, 2016 1 次提交
  31. 12 2月, 2016 1 次提交