1. 14 February 2020, 4 commits
  2. 25 January 2020, 5 commits
  3. 24 January 2020, 1 commit
    • ext4: fix extent_status fragmentation for plain files · 4068664e
      Committed by Dmitry Monakhov
      Extents are only cached in read_extent_tree_block(), which is never
      called for inodes whose extent tree has depth == 0; as a result, the
      extents of such inodes are not cached when we look them up with
      ext4_find_extent().  The result of the lookup is cached in
      ext4_map_blocks(), but it covers only a subset of the extent on disk.
      Consequently, the extent status cache can become very badly fragmented
      for certain workloads, such as a random 4k read workload.
      
      File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
       ext:     logical_offset:        physical_offset: length:   expected: flags:
         0:        0..    8191:      40960..     49151:   8192:             last,eof
      
      $ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
      $ perf script | grep ext4_es_insert_extent | head -n 10
                   fio   131 [000]    13.975421:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
                   fio   131 [000]    13.975939:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
                   fio   131 [000]    13.976467:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
                   fio   131 [000]    13.976937:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
                   fio   131 [000]    13.977440:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
                   fio   131 [000]    13.977931:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
                   fio   131 [000]    13.978376:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
                   fio   131 [000]    13.978957:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
                   fio   131 [000]    13.979474:           ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
      
      Fix this by caching the extents for inodes with depth == 0 in
      ext4_find_extent().
      
      [ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
        newly added function is not in extents_status.c, and to avoid
        potential visual confusion with ext4_es_cache_extent().  -TYT ]
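
      For illustration only, here is a simplified sketch of the approach (not
      the patch itself; the helper in the patch may do more, e.g. also record
      the holes between extents).  The new helper walks the extents in a
      depth-0 inode's leaf header and pushes each of them into the extent
      status cache, and ext4_find_extent() calls it when the tree depth is 0:

      /* Sketch: cache every on-disk extent of a depth == 0 inode in the
       * extent status tree, so later lookups see whole extents instead of
       * the single-block entries that caused the fragmentation above. */
      static void ext4_cache_extents(struct inode *inode,
                                     struct ext4_extent_header *eh)
      {
              struct ext4_extent *ex = EXT_FIRST_EXTENT(eh);
              int i;

              for (i = le16_to_cpu(eh->eh_entries); i > 0; i--, ex++) {
                      unsigned int status = ext4_ext_is_unwritten(ex) ?
                              EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;

                      ext4_es_cache_extent(inode, le32_to_cpu(ex->ee_block),
                                           ext4_ext_get_actual_len(ex),
                                           ext4_ext_pblock(ex), status);
              }
      }
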
      Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
      Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 18 January 2020, 25 commits
  5. 15 January 2020, 1 commit
    • fs-verity: implement readahead of Merkle tree pages · fd39073d
      Committed by Eric Biggers
      When fs-verity verifies data pages, currently it reads each Merkle tree
      page synchronously using read_mapping_page().
      
      Therefore, when the Merkle tree pages aren't already cached, fs-verity
      causes an extra 4 KiB I/O request for every 512 KiB of data: with
      SHA-256 and 4 KiB blocks, one 4 KiB Merkle tree page holds 128 32-byte
      hashes and so covers 512 KiB of data.  This results in more I/O
      requests, and more performance loss, than is strictly necessary.
      
      To avoid this, implement readahead of the Merkle tree pages.
      
      For simplicity, we take advantage of the fact that the kernel already
      does readahead of the file's *data*, just like it does for any other
      file.  Due to this, we don't really need a separate readahead state
      (struct file_ra_state) just for the Merkle tree, but rather we just need
      to piggy-back on the existing data readahead requests.
      
      We also only really need to bother with the first level of the Merkle
      tree, since the usual fan-out factor is 128: each higher level is 1/128
      the size of the level below it, so normally over 99% of Merkle tree I/O
      requests are for the first level.
      
      Therefore, make fsverity_verify_bio() enable readahead of the first
      Merkle tree level, for up to 1/4 the number of pages in the bio, when it
      sees that the REQ_RAHEAD flag is set on the bio.  The readahead size is
      then passed down to ->read_merkle_tree_page() for the filesystem to
      (optionally) implement if it sees that the requested page is uncached.
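
      As a rough sketch of that calculation (the helper name below is
      invented for illustration; per the description above, the patch does
      this in fsverity_verify_bio() and hands the result down to
      ->read_merkle_tree_page()):

      #include <linux/bio.h>

      /* Illustrative sketch: how many Merkle tree pages the filesystem may
       * read ahead for a given data bio.  If the bio itself was issued as
       * readahead (REQ_RAHEAD), allow tree readahead of up to 1/4 of the
       * number of data pages in the bio; otherwise read on demand only. */
      static unsigned long fsverity_tree_ra_pages(const struct bio *bio)
      {
              if (bio->bi_opf & REQ_RAHEAD)
                      return bio->bi_iter.bi_size >> (PAGE_SHIFT + 2);
              return 0;
      }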
      
      While we're at it, also make build_merkle_tree_level() set the Merkle
      tree readahead size, since it's easy to do there.
      
      However, for now don't set the readahead size in fsverity_verify_page(),
      since currently it's only used to verify holes on ext4 and f2fs, and it
      would need parameters added to know how much to read ahead.
      
      This patch significantly improves fs-verity sequential read performance.
      Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:
      
          On an ARM64 phone (using sha256-ce):
              Before: 217 MB/s
              After: 263 MB/s
              (compare to sha256sum of non-verity file: 357 MB/s)
      
          In an x86_64 VM (using sha256-avx2):
              Before: 173 MB/s
              After: 215 MB/s
              (compare to sha256sum of non-verity file: 223 MB/s)
      
      Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  6. 10 January 2020, 1 commit
    • kunit: allow kunit tests to be loaded as a module · c475c77d
      Committed by Alan Maguire
      As tests are added to kunit, it will become less feasible to execute
      all built tests together.  By supporting modular tests we provide
      a simple way to do selective execution on a running system; specifying
      
      CONFIG_KUNIT=y
      CONFIG_KUNIT_EXAMPLE_TEST=m
      
      ...means we can simply "insmod example-test.ko" to run the tests.
      
      To achieve this we need to do the following:
      
      o export the required symbols in kunit
      o string-stream tests utilize non-exported symbols, so for now we skip
        building them when CONFIG_KUNIT_TEST=m.
      o drivers/base/power/qos-test.c contains a few unexported interface
        references, namely freq_qos_read_value() and freq_constraints_init().
        Both of these could potentially be defined as static inline functions
        in include/linux/pm_qos.h, but for now we simply avoid supporting a
        module build for that test suite.
      o support a new way of declaring test suites.  Because a module cannot
        do multiple late_initcall()s, we provide a kunit_test_suites() macro
        to declare multiple suites within the same module at once.
      o some test module names would have been too general ("test-test"
        and "example-test" for kunit tests, "inode-test" for ext4 tests);
        rename these as appropriate ("kunit-test", "kunit-example-test"
        and "ext4-inode-test" respectively).
      
      Also define kunit_test_suite() via kunit_test_suites(), since callers
      in other trees may still need the old definition.
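
      For illustration, a minimal single-suite test module using the new
      macro could look like the sketch below (the suite and test names are
      made up for this example; the declarations come from <kunit/test.h>):

      #include <kunit/test.h>
      #include <linux/module.h>

      /* Trivial example case; the names are illustrative only. */
      static void my_example_add_test(struct kunit *test)
      {
              KUNIT_EXPECT_EQ(test, 4, 2 + 2);
      }

      static struct kunit_case my_example_cases[] = {
              KUNIT_CASE(my_example_add_test),
              {}
      };

      static struct kunit_suite my_example_suite = {
              .name = "my-example",
              .test_cases = my_example_cases,
      };

      /* Registers the suite whether built in or built as a module, avoiding
       * the need for each test module to provide its own late_initcall(). */
      kunit_test_suites(&my_example_suite);

      MODULE_LICENSE("GPL");

      Built with CONFIG_KUNIT=y and the test's own Kconfig option set to m,
      such a module can then be loaded with insmod and its results read from
      the kernel log.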
      Co-developed-by: Knut Omang <knut.omang@oracle.com>
      Signed-off-by: Knut Omang <knut.omang@oracle.com>
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4 bits
      Acked-by: David Gow <davidgow@google.com> # For list-test
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
  7. 01 January 2020, 2 commits
  8. 27 December 2019, 1 commit
    • ext4: Optimize ext4 DIO overwrites · 8cd115bd
      Committed by Jan Kara
      Currently we start a journal transaction when mapping every extent for
      a direct IO write.  This is unnecessary when we know we are overwriting
      already-allocated blocks, and the overhead of starting a transaction
      can be significant, especially for multithreaded workloads doing small
      writes.  Use iomap operations that avoid starting a transaction for
      direct IO overwrites.
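
      As a hedged sketch of the check this relies on (paraphrasing ext4's
      existing overwrite test rather than quoting the patch): before taking
      the no-transaction path, the write range must lie entirely within
      i_size and map to already-written, allocated blocks.

      /* Sketch: return true only if [pos, pos + len) is inside i_size and is
       * fully mapped to written blocks, so a direct IO write may reuse them
       * without starting a journal transaction. */
      static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len)
      {
              struct ext4_map_blocks map;
              unsigned int blkbits = inode->i_blkbits;
              int err, blklen;

              if (pos + len > i_size_read(inode))
                      return false;

              map.m_lblk = pos >> blkbits;
              map.m_len = EXT4_MAX_BLOCKS(len, pos, blkbits);
              blklen = map.m_len;

              /* Read-only lookup (no handle); the whole range must come back
               * mapped as written for this to count as an overwrite. */
              err = ext4_map_blocks(NULL, inode, &map, 0);
              return err == blklen && (map.m_flags & EXT4_MAP_MAPPED);
      }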
      
      This improves throughput of 4k random writes - fio jobfile:
      [global]
      rw=randrw
      norandommap=1
      invalidate=0
      bs=4k
      numjobs=16
      time_based=1
      ramp_time=30
      runtime=120
      group_reporting=1
      ioengine=psync
      direct=1
      size=16G
      filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
      file_service_type=random
      nrfiles=32
      
      from 3018MB/s to 4059MB/s in my test VM, running the test against a
      simulated pmem device (note that before the iomap conversion this
      workload was able to achieve 3708MB/s, because the old direct IO path
      also avoided the transaction start for overwrites).  For DAX, the win
      is even larger, improving throughput from 3042MB/s to 4311MB/s.
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>