1. 26 11月, 2014 5 次提交
    • D
      ext4: create nojournal_checksum mount option · c6d3d56d
      Darrick J. Wong 提交于
      Create a mount option to disable journal checksumming (because the
      metadata_csum feature turns it on by default now), and fix remount not
      to allow changing the journal checksumming option, since changing the
      mount options has no effect on the journal.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c6d3d56d
    • D
      ext4: cleanup GFP flags inside resize path · 4fdb5543
      Dmitry Monakhov 提交于
      We must use GFP_NOFS instead GFP_KERNEL inside ext4_mb_add_groupinfo
      and ext4_calculate_overhead() because they are called from inside a
      journal transaction. Call trace:
      
      ioctl
       ->ext4_group_add
         ->journal_start
         ->ext4_setup_new_descs
           ->ext4_mb_add_groupinfo -> GFP_KERNEL
         ->ext4_flex_group_add
           ->ext4_update_super
             ->ext4_calculate_overhead  -> GFP_KERNEL
         ->journal_stop
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4fdb5543
    • J
      ext4: limit number of scanned extents in status tree shrinker · dd475925
      Jan Kara 提交于
      Currently we scan extent status trees of inodes until we reclaim nr_to_scan
      extents. This can however require a lot of scanning when there are lots
      of delayed extents (as those cannot be reclaimed).
      
      Change shrinker to work as shrinkers are supposed to and *scan* only
      nr_to_scan extents regardless of how many extents did we actually
      reclaim. We however need to be careful and avoid scanning each status
      tree from the beginning - that could lead to a situation where we would
      not be able to reclaim anything at all when first nr_to_scan extents in
      the tree are always unreclaimable. We remember with each inode offset
      where we stopped scanning and continue from there when we next come
      across the inode.
      
      Note that we also need to update places calling __es_shrink() manually
      to pass reasonable nr_to_scan to have a chance of reclaiming anything and
      not just 1.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      dd475925
    • J
      ext4: move handling of list of shrinkable inodes into extent status code · b0dea4c1
      Jan Kara 提交于
      Currently callers adding extents to extent status tree were responsible
      for adding the inode to the list of inodes with freeable extents. This
      is error prone and puts list handling in unnecessarily many places.
      
      Just add inode to the list automatically when the first non-delay extent
      is added to the tree and remove inode from the list when the last
      non-delay extent is removed.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      b0dea4c1
    • Z
      ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca
      Zheng Liu 提交于
      In this commit we discard the lru algorithm for inodes with extent
      status tree because it takes significant effort to maintain a lru list
      in extent status tree shrinker and the shrinker can take a long time to
      scan this lru list in order to reclaim some objects.
      
      We replace the lru ordering with a simple round-robin.  After that we
      never need to keep a lru list.  That means that the list needn't be
      sorted if the shrinker can not reclaim any objects in the first round.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      edaa53ca
  2. 21 11月, 2014 1 次提交
  3. 30 10月, 2014 3 次提交
  4. 14 10月, 2014 1 次提交
    • D
      ext4: check s_chksum_driver when looking for bg csum presence · 813d32f9
      Darrick J. Wong 提交于
      Convert the ext4_has_group_desc_csum predicate to look for a checksum
      driver instead of the metadata_csum flag and change the bg checksum
      calculation function to look for GDT_CSUM before taking the crc16
      path.
      
      Without this patch, if we mount with ^uninit_bg,^metadata_csum and
      later metadata_csum gets turned on by accident, the block group
      checksum functions will incorrectly assume that checksumming is
      enabled (metadata_csum) but that crc16 should be used
      (!s_chksum_driver).  This is totally wrong, so fix the predicate
      and the checksum formula selection.
      
      (Granted, if the metadata_csum feature bit gets enabled on a live FS
      then something underhanded is going on, but we could at least avoid
      writing garbage into the on-disk fields.)
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: stable@vger.kernel.org
      813d32f9
  5. 13 10月, 2014 1 次提交
  6. 06 10月, 2014 1 次提交
    • T
      ext4: add ext4_iget_normal() which is to be used for dir tree lookups · f4bb2981
      Theodore Ts'o 提交于
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: NSami Liedes <sami.liedes@iki.fi>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      f4bb2981
  7. 19 9月, 2014 3 次提交
  8. 18 9月, 2014 1 次提交
  9. 17 9月, 2014 1 次提交
  10. 11 9月, 2014 5 次提交
  11. 08 9月, 2014 1 次提交
    • T
      percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo 提交于
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  12. 05 9月, 2014 1 次提交
  13. 02 9月, 2014 2 次提交
    • Z
      ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu 提交于
      This commit adds some statictics in extent status tree shrinker.  The
      purpose to add these is that we want to collect more details when we
      encounter a stall caused by extent status tree shrinker.  Here we count
      the following statictics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      eb68d0e2
    • D
      ext4: enable block_validity by default · 45f1a9c3
      Darrick J. Wong 提交于
      Enable by default the block_validity feature, which checks for
      collisions between newly allocated blocks and critical system
      metadata.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      45f1a9c3
  14. 30 8月, 2014 1 次提交
  15. 29 8月, 2014 1 次提交
    • D
      jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Darrick J. Wong 提交于
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: NTR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
  16. 15 7月, 2014 1 次提交
    • T
      ext4: rearrange initialization to fix EXT4FS_DEBUG · d5e03cbb
      Theodore Ts'o 提交于
      The EXT4FS_DEBUG is a *very* developer specific #ifdef designed for
      ext4 developers only.  (You have to modify fs/ext4/ext4.h to enable
      it.)
      
      Rearrange how we initialize data structures to avoid calling
      ext4_count_free_clusters() until the multiblock allocator has been
      initialized.
      
      This also allows us to only call ext4_count_free_clusters() once, and
      simplifies the code somewhat.
      
      (Thanks to Chen Gang <gang.chen.5i5j@gmail.com> for pointing out a
      !CONFIG_SMP compile breakage in the original patch.)
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      d5e03cbb
  17. 12 7月, 2014 1 次提交
  18. 06 7月, 2014 2 次提交
  19. 13 5月, 2014 2 次提交
  20. 12 5月, 2014 2 次提交
  21. 22 4月, 2014 1 次提交
  22. 21 4月, 2014 1 次提交
    • L
      ext4: rename uninitialized extents to unwritten · 556615dc
      Lukas Czerner 提交于
      Currently in ext4 there is quite a mess when it comes to naming
      unwritten extents. Sometimes we call it uninitialized and sometimes we
      refer to it as unwritten.
      
      The right name for the extent which has been allocated but does not
      contain any written data is _unwritten_. Other file systems are
      using this name consistently, even the buffer head state refers to it as
      unwritten. We need to fix this confusion in ext4.
      
      This commit changes every reference to an uninitialized extent (meaning
      allocated but unwritten) to unwritten extent. This includes comments,
      function names and variable names. It even covers abbreviation of the
      word uninitialized (such as uninit) and some misspellings.
      
      This commit does not change any of the code paths at all. This has been
      confirmed by comparing md5sums of the assembly code of each object file
      after all the function names were stripped from it.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      556615dc
  23. 07 4月, 2014 1 次提交
    • A
      ext4: initialize multi-block allocator before checking block descriptors · 00764937
      Azat Khuzhin 提交于
      With EXT4FS_DEBUG ext4_count_free_clusters() will call
      ext4_read_block_bitmap() without s_group_info initialized, so we need to
      initialize multi-block allocator before.
      
      And dependencies that must be solved, to allow this:
      - multi-block allocator needs in group descriptors
      - need to install s_op before initializing multi-block allocator,
        because in ext4_mb_init_backend() new inode is created.
      - initialize number of group desc blocks (s_gdb_count) otherwise
        number of clusters returned by ext4_free_clusters_after_init() is not correct.
        (see ext4_bg_num_gdb_nometa())
      
      Here is the stack backtrace:
      
      (gdb) bt
       #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
       #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000,
           desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
       #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0,
           bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
       #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000,
           block_group=block_group@entry=0) at balloc.c:489
       #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
       #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>,
           sb=0xffff880079a10000) at super.c:2143
       #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>,
           data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851
           ...
      Signed-off-by: NAzat Khuzhin <a3at.mail@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      00764937
  24. 25 3月, 2014 1 次提交