1. 06 12月, 2022 40 次提交
    • J
      btrfs: move the printk helpers out of ctree.h · 9b569ea0
      Josef Bacik 提交于
      We have a bunch of printk helpers that are in ctree.h.  These have
      nothing to do with ctree.c, so move them into their own header.
      Subsequent patches will cleanup the printk helpers.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9b569ea0
    • J
      btrfs: move assert helpers out of ctree.h · e118578a
      Josef Bacik 提交于
      These call functions that aren't defined in, or will be moved out of,
      ctree.h  Move them to super.c where the other assert/error message code
      is defined. Drop the __noreturn attribute for btrfs_assertfail as
      objtool does not like it and fails with warnings like
      
        fs/btrfs/dir-item.o: warning: objtool: .text.unlikely: unexpected end of section
        fs/btrfs/xattr.o: warning: objtool: btrfs_setxattr() falls through to next function btrfs_setxattr_trans.cold()
        fs/btrfs/xattr.o: warning: objtool: .text.unlikely: unexpected end of section
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e118578a
    • J
      btrfs: move fs wide helpers out of ctree.h · c7f13d42
      Josef Bacik 提交于
      We have several fs wide related helpers in ctree.h.  The bulk of these
      are the incompat flag test helpers, but there are things such as
      btrfs_fs_closing() and the read only helpers that also aren't directly
      related to the ctree code.  Move these into a fs.h header, which will
      serve as the location for file system wide related helpers.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c7f13d42
    • W
      btrfs: send add define for v2 buffer size · 875c627c
      Wang Yugui 提交于
      Add a define for the data buffer size (though the maximum size is not
      limited by it) BTRFS_SEND_BUF_SIZE_V2 so it's more visible.
      Signed-off-by: NWang Yugui <wangyugui@e16-tech.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      875c627c
    • D
      btrfs: simplify generation check in btrfs_get_dentry · b307f06d
      David Sterba 提交于
      Callers that pass non-zero generation always want to perform the
      generation check, we can simply encode that in one parameter and drop
      check_generation. Add function documentation.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b307f06d
    • D
      btrfs: auto enable discard=async when possible · 63a7cb13
      David Sterba 提交于
      There's a request to automatically enable async discard for capable
      devices. We can do that, the async mode is designed to wait for larger
      freed extents and is not intrusive, with limits to iops, kbps or latency.
      
      The status and tunables will be exported in /sys/fs/btrfs/FSID/discard .
      
      The automatic selection is done if there's at least one discard capable
      device in the filesystem (not capable devices are skipped). Mounting
      with any other discard option will honor that option, notably mounting
      with nodiscard will keep it disabled.
      
      Link: https://lore.kernel.org/linux-btrfs/CAEg-Je_b1YtdsCR0zS5XZ_SbvJgN70ezwvRwLiCZgDGLbeMB=w@mail.gmail.com/Reviewed-by: NBoris Burkov <boris@bur.io>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      63a7cb13
    • D
      btrfs: sysfs: convert remaining scnprintf to sysfs_emit · 467761f9
      David Sterba 提交于
      The sysfs_emit is the safe API for writing to the sysfs files,
      previously converted from scnprintf, there's one left to do in
      btrfs_read_policy_show.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      467761f9
    • J
      btrfs: do not panic if we can't allocate a prealloc extent state · 5a75034e
      Josef Bacik 提交于
      We sometimes have to allocate new extent states when clearing or setting
      new bits in an extent io tree.  Generally we preallocate this before
      taking the tree spin lock, but we can use this preallocated extent state
      sometimes and then need to try to do a GFP_ATOMIC allocation under the
      lock.
      
      Unfortunately sometimes this fails, and then we hit the BUG_ON() and
      bring the box down.  This happens roughly 20 times a week in our fleet.
      
      However the vast majority of callers use GFP_NOFS, which means that if
      this GFP_ATOMIC allocation fails, we could simply drop the spin lock, go
      back and allocate a new extent state with our given gfp mask, and begin
      again from where we left off.
      
      For the remaining callers that do not use GFP_NOFS, they are generally
      using GFP_NOWAIT, which still allows for some reclaim.  So allow these
      allocations to attempt to happen outside of the spin lock so we don't
      need to rely on GFP_ATOMIC allocations.
      
      This in essence creates an infinite loop for anything that isn't
      GFP_NOFS.  To address this we may want to migrate to using mempools for
      extent states so that we will always have emergency reserves in order to
      make our allocations.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5a75034e
    • J
      btrfs: remove unused unlock_extent_atomic · da2a071b
      Josef Bacik 提交于
      As of "btrfs: do not use GFP_ATOMIC in the read endio" we no longer have
      any users of unlock_extent_atomic, remove it.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      da2a071b
    • J
      btrfs: do not use GFP_ATOMIC in the read endio · 48acc47d
      Josef Bacik 提交于
      We have done read endio in an async thread for a very, very long time,
      which makes the use of GFP_ATOMIC and unlock_extent_atomic() unneeded in
      our read endio path.  We've noticed under heavy memory pressure in our
      fleet that we can fail these allocations, and then often trip a
      BUG_ON(!allocation), which isn't an ideal outcome.  Begin to address
      this by simply not using GFP_ATOMIC, which will allow us to do things
      like actually allocate a extent state when doing
      set_extent_bits(UPTODATE) in the endio handler.
      
      End io handlers are not called in atomic context, besides we have been
      allocating failrec with GFP_NOFS so we'd notice there's a problem.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      48acc47d
    • Q
      btrfs: skip update of block group item if used bytes are the same · 7248e0ce
      Qu Wenruo 提交于
      [BACKGROUND]
      
      When committing a transaction, we will update block group items for all
      dirty block groups.
      
      But in fact, dirty block groups don't always need to update their block
      group items.
      It's pretty common to have a metadata block group which experienced
      several COW operations, but still have the same amount of used bytes.
      
      In that case, we may unnecessarily COW a tree block doing nothing.
      
      [ENHANCEMENT]
      
      This patch will introduce btrfs_block_group::commit_used member to
      remember the last used bytes, and use that new member to skip
      unnecessary block group item update.
      
      This would be more common for large filesystems, where metadata block
      group can be as large as 1GiB, containing at most 64K metadata items.
      
      In that case, if COW added and then deleted one metadata item near the
      end of the block group, then it's completely possible we don't need to
      touch the block group item at all.
      
      [BENCHMARK]
      
      The change itself can have quite a high chance (20~80%) to skip block
      group item updates in lot of workloads.
      
      As a result, it would result shorter time spent on
      btrfs_write_dirty_block_groups(), and overall reduce the execution time
      of the critical section of btrfs_commit_transaction().
      
      Here comes a fio command, which will do random writes in 4K block size,
      causing a very heavy metadata updates.
      
      fio --filename=$mnt/file --size=512M --rw=randwrite --direct=1 --bs=4k \
          --ioengine=libaio --iodepth=64 --runtime=300 --numjobs=4 \
          --name=random_write --fallocate=none --time_based --fsync_on_close=1
      
      The file size (512M) and number of threads (4) means 2GiB file size in
      total, but during the full 300s run time, my dedicated SATA SSD is able
      to write around 20~25GiB, which is over 10 times the file size.
      
      Thus after we fill the initial 2G, we should not cause much block group
      item updates.
      
      Please note, the fio numbers by themselves don't have much change, but
      if we look deeper, there is some reduced execution time, especially for
      the critical section of btrfs_commit_transaction().
      
      I added extra trace_printk() to measure the following per-transaction
      execution time:
      
      - Critical section of btrfs_commit_transaction()
        By re-using the existing update_commit_stats() function, which
        has already calculated the interval correctly.
      
      - The while() loop for btrfs_write_dirty_block_groups()
        Although this includes the execution time of btrfs_run_delayed_refs(),
        it should still be representative overall.
      
      Both result involves transid 7~30, the same amount of transaction
      committed.
      
      The result looks like this:
      
                            |      Before       |     After      |  Diff
      ----------------------+-------------------+----------------+--------
      Transaction interval  | 229247198.5       | 215016933.6    | -6.2%
      Block group interval  | 23133.33333       | 18970.83333    | -18.0%
      
      The change in block group item updates is more obvious, as skipped block
      group item updates also mean less delayed refs.
      
      And the overall execution time for that block group update loop is
      pretty small, thus we can assume the extent tree is already mostly
      cached.  If we can skip an uncached tree block, it would cause more
      obvious change.
      
      Unfortunately the overall reduction in commit transaction critical
      section is much smaller, as the block group item updates loop is not
      really the major part, at least not for the above fio script.
      
      But still we have a observable reduction in the critical section.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7248e0ce
    • D
      btrfs: convert __TRANS_* defines to enum bits · cc37ea61
      David Sterba 提交于
      The base transaction bits can be defined as bits in a contiguous
      sequence, although right now there's a hole from bit 1 to 8.
      
      The bits are used for btrfs_trans_handle::type, and there's another set
      of TRANS_STATE_* defines that are for btrfs_transaction::state. They are
      mutually exclusive though the hole in the sequence looks like was made
      for the states.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cc37ea61
    • D
      btrfs: convert QGROUP_* defines to enum bits · e0a8b9a7
      David Sterba 提交于
      The defines/enums are used only for tracepoints and are not part of the
      on-disk format.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e0a8b9a7
    • D
      btrfs: convert EXTENT_* bits to enums · d3b4d0fd
      David Sterba 提交于
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d3b4d0fd
    • D
      fd8d2951
    • D
      c7321b76
    • D
      btrfs: add helper for bit enumeration · d549ff7b
      David Sterba 提交于
      Define helper macro that can be used in enum {} to utilize the automatic
      increment to define all bits without directly defining the values or
      using additional linear bits.
      
      1. capture the sequence value, N
      2. use the value to define the given enum with N-th bit set
      3. reset the sequence back to N
      
      Use for enums that do not require fixed values for symbolic names (like
      for on-disk structures):
      
      enum {
      	ENUM_BIT(FIRST),
      	ENUM_BIT(SECOND),
      	ENUM_BIT(THIRD)
      };
      
      Where the values would be 0x1, 0x2 and 0x4.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d549ff7b
    • Q
      btrfs: make module init/exit match their sequence · 5565b8e0
      Qu Wenruo 提交于
      [BACKGROUND]
      In theory init_btrfs_fs() and exit_btrfs_fs() should match their
      sequence, thus normally they should look like this:
      
          init_btrfs_fs()   |   exit_btrfs_fs()
      ----------------------+------------------------
          init_A();         |
          init_B();         |
          init_C();         |
                            |   exit_C();
                            |   exit_B();
                            |   exit_A();
      
      So is for the error path of init_btrfs_fs().
      
      But it's not the case, some exit functions don't match their init
      functions sequence in init_btrfs_fs().
      
      Furthermore in init_btrfs_fs(), we need to have a new error label for
      each new init function we added.  This is not really expandable,
      especially recently we may add several new functions to init_btrfs_fs().
      
      [ENHANCEMENT]
      The patch will introduce the following things to enhance the situation:
      
      - struct init_sequence
        Just a wrapper of init and exit function pointers.
      
        The init function must use int type as return value, thus some init
        functions need to be updated to return 0.
      
        The exit function can be NULL, as there are some init sequence just
        outputting a message.
      
      - struct mod_init_seq[] array
        This is a const array, recording all the initialization we need to do
        in init_btrfs_fs(), and the order follows the old init_btrfs_fs().
      
      - bool mod_init_result[] array
        This is a bool array, recording if we have initialized one entry in
        mod_init_seq[].
      
        The reason to split mod_init_seq[] and mod_init_result[] is to avoid
        section mismatch in reference.
      
        All init function are in .init.text, but if mod_init_seq[] records
        the @initialized member it can no longer be const, thus will be put
        into .data section, and cause modpost warning.
      
      For init_btrfs_fs() we just call all init functions in their order in
      mod_init_seq[] array, and after each call, setting corresponding
      mod_init_result[] to true.
      
      For exit_btrfs_fs() and error handling path of init_btrfs_fs(), we just
      iterate mod_init_seq[] in reverse order, and skip all uninitialized
      entry.
      
      With this patch, init_btrfs_fs()/exit_btrfs_fs() will be much easier to
      expand and will always follow the strict order.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5565b8e0
    • F
      btrfs: remove gfp_t flag from btrfs_tree_mod_log_insert_key() · 33cff222
      Filipe Manana 提交于
      All callers of btrfs_tree_mod_log_insert_key() are now passing a GFP_NOFS
      flag to it, so remove the flag from it and from alloc_tree_mod_elem() and
      use it directly within alloc_tree_mod_elem().
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      33cff222
    • F
      btrfs: switch GFP_ATOMIC to GFP_NOFS when fixing up low keys · 879b2221
      Filipe Manana 提交于
      When fixing up the first key of each node above the current level, at
      fixup_low_keys(), we are doing a GFP_ATOMIC allocation for inserting an
      operation record for the tree mod log. However we can do just fine with
      GFP_NOFS nowadays. The need for GFP_ATOMIC was for the old days when we
      had custom locks with spinning behaviour for extent buffers and we were
      in spinning mode while at fixup_low_keys(). Now we use rw semaphores for
      extent buffer locks, so we can safely use GFP_NOFS.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      879b2221
    • B
      btrfs: re-check reclaim condition in reclaim worker · 81531225
      Boris Burkov 提交于
      I have observed the following case play out and lead to unnecessary
      relocations:
      
      1. write a file across multiple block groups
      2. delete the file
      3. several block groups fall below the reclaim threshold
      4. reclaim the first, moving extents into the others
      5. reclaim the others which are now actually very full, leading to poor
         reclaim behavior with lots of writing, allocating new block groups,
         etc.
      
      I believe the risk of missing some reasonable reclaims is worth it
      when traded off against the savings of avoiding overfull reclaims.
      
      Going forward, it could be interesting to make the check more advanced
      (zoned aware, fragmentation aware, etc...) so that it can be a really
      strong signal both at extent delete and reclaim time.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      81531225
    • B
      btrfs: skip reclaim if block_group is empty · cc4804bf
      Boris Burkov 提交于
      As we delete extents from a block group, at some deletion we cross below
      the reclaim threshold. It is possible we are still in the middle of
      deleting more extents and might soon hit 0. If the block group is empty
      by the time the reclaim worker runs, we will still relocate it.
      
      This works just fine, as relocating an empty block group ultimately
      results in properly deleting it. However, we have more direct ways of
      removing empty block groups in the cleaner thread. Those are either
      async discard or the unused_bgs list. In fact, when we decide whether to
      relocate a block group during extent deletion, we do check for emptiness
      and prefer the discard/unused_bgs mechanisms when possible.
      
      Not using relocation for this case reduces some modest overhead from
      empty bg relocation:
      
      - extra transactions
      - extra metadata use/churn for creating relocation metadata
      - trying to read the extent tree to look for extents (and in this case
        finding none)
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cc4804bf
    • F
      btrfs: avoid unnecessary resolution of indirect backrefs during fiemap · 6976201f
      Filipe Manana 提交于
      During fiemap, when determining if a data extent is shared or not, if we
      don't find the extent is directly shared, then we need to determine if
      it's shared through subtrees. For that we need to resolve the indirect
      reference we found in order to figure out the path in the inode's fs tree,
      which is a path starting at the fs tree's root node and going down to the
      leaf that contains the file extent item that points to the data extent.
      We then proceed to determine if any extent buffer in that path is shared
      with other trees or not.
      
      However when the generation of the data extent is more recent than the
      last generation used to snapshot the root, we don't need to determine
      the path, since the data extent can not be shared through snapshots.
      For this case we currently still determine the leaf of that path (at
      find_parent_nodes(), but then stop determining the other nodes in the
      path (at btrfs_is_data_extent_shared()) as it's pointless.
      
      So do the check of the data extent's generation earlier, at
      find_parent_nodes(), before trying to resolve the indirect reference to
      determine the leaf in the path. This saves us from doing one expensive
      b+tree search in the fs tree of our target inode, as well as other minor
      work.
      
      The following test was run on a non-debug kernel (Debian's default kernel
      config):
      
         $ cat test-fiemap.sh
         #!/bin/bash
      
         DEV=/dev/sdi
         MNT=/mnt/sdi
      
         umount $DEV &> /dev/null
         mkfs.btrfs -f $DEV
         # Use compression to quickly create files with a lot of extents
         # (each with a size of 128K).
         mount -o compress=lzo $DEV $MNT
      
         # 40G gives 327680 extents, each with a size of 128K.
         xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
      
         # Add some more files to increase the size of the fs and extent
         # trees (in the real world there's a lot of files and extents
         # from other files).
         xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1
         xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2
         xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3
      
         umount $MNT
         mount -o compress=lzo $DEV $MNT
      
         start=$(date +%s%N)
         filefrag $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds (metadata not cached)"
         echo
      
         start=$(date +%s%N)
         filefrag $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds (metadata cached)"
      
         umount $MNT
      
      Before applying this patch:
      
         (...)
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 1285 milliseconds (metadata not cached)
      
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 742 milliseconds (metadata cached)
      
      After applying this patch:
      
         (...)
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 689 milliseconds (metadata not cached)
      
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 393 milliseconds (metadata cached)
      
      That's a -46.4% total reduction for the metadata not cached case, and
      a -47.0% reduction for the cached metadata case.
      
      The test is somewhat limited in the sense the gains may be higher in
      practice, because in the test the filesystem is small, so we have small
      fs and extent trees, plus there's no concurrent access to the trees as
      well, therefore no lock contention there.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6976201f
    • F
      btrfs: avoid duplicated resolution of indirect backrefs during fiemap · 877c1476
      Filipe Manana 提交于
      During fiemap, when determining if a data extent is shared or not, if we
      don't find the extent is directly shared, then we need to determine if
      it's shared through subtrees. For that we need to resolve the indirect
      reference we found in order to figure out the path in the inode's fs tree,
      which is a path starting at the fs tree's root node and going down to the
      leaf that contains the file extent item that points to the data extent.
      We then proceed to determine if any extent buffer in that path is shared
      with other trees or not.
      
      Currently whenever we find the data extent that a file extent item points
      to is not directly shared, we always resolve the path in the fs tree, and
      then check if any extent buffer in the path is shared. This is a lot of
      work and when we have file extent items that belong to the same leaf, we
      have the same path, so we only need to calculate it once.
      
      This change does that, it keeps track of the current and previous leaf,
      and when we find that a data extent is not directly shared, we try to
      compute the fs tree path only once and then use it for every other file
      extent item in the same leaf, using the existing cached path result for
      the leaf as long as the cache results are valid.
      
      This saves us from doing expensive b+tree searches in the fs tree of our
      target inode, as well as other minor work.
      
      The following test was run on a non-debug kernel (Debian's default kernel
      config):
      
         $ cat test-with-snapshots.sh
         #!/bin/bash
      
         DEV=/dev/sdi
         MNT=/mnt/sdi
      
         umount $DEV &> /dev/null
         mkfs.btrfs -f $DEV
         # Use compression to quickly create files with a lot of extents
         # (each with a size of 128K).
         mount -o compress=lzo $DEV $MNT
      
         # 40G gives 327680 extents, each with a size of 128K.
         xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
      
         # Add some more files to increase the size of the fs and extent
         # trees (in the real world there's a lot of files and extents
         # from other files).
         xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1
         xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2
         xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3
      
         # Create a snapshot so all the extents become indirectly shared
         # through subtrees, with a generation less than or equals to the
         # generation used to create the snapshot.
         btrfs subvolume snapshot -r $MNT $MNT/snap1
      
         umount $MNT
         mount -o compress=lzo $DEV $MNT
      
         start=$(date +%s%N)
         filefrag $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds (metadata not cached)"
         echo
      
         start=$(date +%s%N)
         filefrag $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds (metadata cached)"
      
         umount $MNT
      
      Result before applying this patch:
      
         (...)
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 1204 milliseconds (metadata not cached)
      
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 729 milliseconds (metadata cached)
      
      Result after applying this patch:
      
         (...)
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 732 milliseconds (metadata not cached)
      
         /mnt/sdi/foobar: 327680 extents found
         fiemap took 421 milliseconds (metadata cached)
      
      That's a -46.1% total reduction for the metadata not cached case, and
      a -42.2% reduction for the cached metadata case.
      
      The test is somewhat limited in the sense the gains may be higher in
      practice, because in the test the filesystem is small, so we have small
      fs and extent trees, plus there's no concurrent access to the trees as
      well, therefore no lock contention there.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      877c1476
    • F
      btrfs: move up backref sharedness cache store and lookup functions · 583f4ac5
      Filipe Manana 提交于
      Move the static functions to lookup and store sharedness check of an
      extent buffer to a location above find_all_parents(), because in the
      next patch the lookup function will be used by find_all_parents().
      The store function is also moved just because it's the counter part
      to the lookup function and it's best to have their definitions close
      together.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      583f4ac5
    • F
      btrfs: cache sharedness of the last few data extents during fiemap · 73e339e6
      Filipe Manana 提交于
      During fiemap we process all the file extent items of an inode, by their
      file offset order (left to right b+tree order), and then check if the data
      extent they point at is shared or not. Until now we didn't cache those
      results, we only did it for b+tree nodes/leaves since for each unique
      b+tree path we have access to hundreds of file extent items. However, it
      is also common to repeat checking the sharedness of a particular data
      extent in a very short time window, and the cases that lead to that are
      the following:
      
      1) COW writes.
      
         If have a file extent item like this:
      
                        [ bytenr X, offset = 0, num_bytes = 512K ]
         file offset    0                                        512K
      
         Then a 4K write into file offset 64K happens, we end up with the
         following file extent item layout:
      
                        [ bytenr X, offset = 0, num_bytes = 64K ]
         file offset    0                                       64K
      
                        [ bytenr Y, offset = 0, num_bytes = 4K ]
         file offset   64K                                     68K
      
                        [ bytenr X, offset = 68K, num_bytes = 444K ]
         file offset   68K                                         512K
      
         So during fiemap we well check for the sharedness of the data extent
         with bytenr X twice. Typically for COW writes and for at least
         moderately updated files, we end up with many file extent items that
         point to different sections of the same data extent.
      
      2) Writing into a NOCOW file after a snapshot is taken.
      
         This happens if the target extent was created in a generation older
         than the generation where the last snapshot for the root (the tree the
         inode belongs to) was made.
      
         This leads to a scenario like the previous one.
      
      3) Writing into sections of a preallocated extent.
      
         For example if a file has the following layout:
      
         [ bytenr X, offset = 0, num_bytes = 1M, type = prealloc ]
         0                                                       1M
      
         After doing a 4K write into file offset 0 and another 4K write into
         offset 512K, we get the following layout:
      
            [ bytenr X, offset = 0, num_bytes = 4K, type = regular ]
            0                                                      4K
      
            [ bytenr X, offset = 4K, num_bytes = 508K, type = prealloc ]
           4K                                                          512K
      
            [ bytenr X, offset = 512K, num_bytes = 4K, type = regular ]
         512K                                                         516K
      
            [ bytenr X, offset = 516K, num_bytes = 508K, type = prealloc ]
         516K                                                            1M
      
         So we end up with 4 consecutive file extent items pointing to the data
         extent at bytenr X.
      
      4) Hole punching in the middle of an extent.
      
         For example if a file has the following file extent item:
      
         [ bytenr X, offset = 0, num_bytes = 8M ]
         0                                      8M
      
         And then hole is punched for the file range [4M, 6M[, we our file
         extent item split into two:
      
         [ bytenr X, offset = 0, num_bytes = 4M  ]
         0                                       4M
      
         [ 2M hole, implicit or explicit depending on NO_HOLES feature ]
         4M                                                            6M
      
         [ bytenr X, offset = 6M, num_bytes = 2M  ]
         6M                                       8M
      
         Again, we end up with two file extent items pointing to the same
         data extent.
      
      5) When reflinking (clone and deduplication) within the same file.
         This is probably the least common case of all.
      
      In cases 1, 2, 4 and 4, when we have multiple file extent items that point
      to the same data extent, their distance is usually short, typically
      separated by a few slots in a b+tree leaf (or across sibling leaves). For
      case 5, the distance can vary a lot, but it's typically the less common
      case.
      
      This change caches the result of the sharedness checks for data extents,
      but only for the last 8 extents that we notice that our inode refers to
      with multiple file extent items. Whenever we want to check if a data
      extent is shared, we lookup the cache which consists of doing a linear
      scan of an 8 elements array, and if we find the data extent there, we
      return the result and don't check the extent tree and delayed refs.
      
      The array/cache is small so that doing the search has no noticeable
      negative impact on the performance in case we don't have file extent items
      within a distance of 8 slots that point to the same data extent.
      
      Slots in the cache/array are overwritten in a simple round robin fashion,
      as that approach fits very well.
      
      Using this simple approach with only the last 8 data extents seen is
      effective as usually when multiple file extents items point to the same
      data extent, their distance is within 8 slots. It also uses very little
      memory and the time to cache a result or lookup the cache is negligible.
      
      The following test was run on non-debug kernel (Debian's default kernel
      config) to measure the impact in the case of COW writes (first example
      given above), where we run fiemap after overwriting 33% of the blocks of
      a file:
      
         $ cat test.sh
         #!/bin/bash
      
         DEV=/dev/sdi
         MNT=/mnt/sdi
      
         umount $DEV &> /dev/null
         mkfs.btrfs -f $DEV
         mount $DEV $MNT
      
         FILE_SIZE=$((1 * 1024 * 1024  * 1024))
      
         # Create the file full of 1M extents.
         xfs_io -f -s -c "pwrite -b 1M -S 0xab 0 $FILE_SIZE" $MNT/foobar
      
         block_count=$((FILE_SIZE / 4096))
         # Overwrite about 33% of the file blocks.
         overwrite_count=$((block_count / 3))
      
         echo -e "\nOverwriting $overwrite_count 4K blocks (out of $block_count)..."
         RANDOM=123
         for ((i = 1; i <= $overwrite_count; i++)); do
             off=$(((RANDOM % block_count) * 4096))
             xfs_io -c "pwrite -S 0xcd $off 4K" $MNT/foobar > /dev/null
             echo -ne "\r$i blocks overwritten..."
         done
         echo -e "\n"
      
         # Unmount and mount to clear all cached metadata.
         umount $MNT
         mount $DEV $MNT
      
         start=$(date +%s%N)
         filefrag $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds"
      
         umount $MNT
      
      Result before applying this patch:
      
         fiemap took 128 milliseconds
      
      Result after applying this patch:
      
         fiemap took 92 milliseconds   (-28.1%)
      
      The test is somewhat limited in the sense the gains may be higher in
      practice, because in the test the filesystem is small, so we have small
      fs and extent trees, plus there's no concurrent access to the trees as
      well, therefore no lock contention there.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      73e339e6
    • F
      btrfs: remove useless logic when finding parent nodes · 56f5c199
      Filipe Manana 提交于
      At find_parent_nodes(), at its last step, when iterating over all direct
      references, we are checking if we have a share context and if we have
      a reference with a different root from the one in the share context.
      However that logic is pointless because of two reasons:
      
      1) After the previous patch in the series (subject "btrfs: remove roots
         ulist when checking data extent sharedness"), the roots argument is
         always NULL when using a share check context (struct share_check), so
         this code is never triggered;
      
      2) Even before that previous patch, we could not hit this code because
         if we had a reference with a root different from the one in our share
         context, then we would have exited earlier when doing either of the
         following:
      
            - Adding a second direct ref to the direct refs red black tree
              resulted in extent_is_shared() returning true when called from
              add_direct_ref() -> add_prelim_ref(), after processing delayed
              references or while processing references in the extent tree;
      
            - When adding a second reference to the indirect refs red black
              tree (same as above, extent_is_shared() returns true);
      
            - If we only have one indirect reference and no direct references,
              then when resolving it at resolve_indirect_refs() we immediately
              return that the target extent is shared, therefore never reaching
              that loop that iterates over all direct references at
              find_parent_nodes();
      
            - If we have 1 indirect reference and 1 direct reference, then we
              also exit early because extent_is_shared() ends up returning true
              when called through add_prelim_ref() (by add_direct_ref() or
              add_indirect_ref()) or add_delayed_refs(). Same applies as when
              having a combination of direct, indirect and indirect with missing
              key references.
      
         This logic had been obsoleted since commit 3ec4d323 ("btrfs:
         allow backref search checks for shared extents"), which introduced the
         early exits in case an extent is shared.
      
      So just remove that logic, and assert at find_parent_nodes() that when we
      have a share context we don't have a roots ulist and that we haven't found
      the extent to be directly shared after processing delayed references and
      all references from the extent tree.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      56f5c199
    • F
      btrfs: remove roots ulist when checking data extent sharedness · b6296858
      Filipe Manana 提交于
      Currently btrfs_is_data_extent_shared() is passing a ulist for the roots
      argument of find_parent_nodes(), however it does not use that ulist for
      anything and for this context that list always ends up with at most one
      element.
      
      Since find_parent_nodes() is able to deal with a NULL ulist for its roots
      argument, make btrfs_is_data_extent_shared() pass it NULL and avoid the
      burden of allocating memory for the unnused roots ulist, initializing it,
      releasing it and allocating one struct ulist_node for it during the call
      to find_parent_nodes().
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b6296858
    • F
      btrfs: move ulists to data extent sharedness check context · 84a7949d
      Filipe Manana 提交于
      When calling btrfs_is_data_extent_shared() we pass two ulists that were
      allocated by the caller. This is because the single caller, fiemap, calls
      btrfs_is_data_extent_shared() multiple times and the ulists can be reused,
      instead of allocating new ones before each call and freeing them after
      each call.
      
      Now that we have a context structure/object that we pass to
      btrfs_is_data_extent_shared(), we can move those ulists to it, and hide
      their allocation and the context's allocation in a helper function, as
      well as the freeing of the ulists and the context object. This allows to
      reduce the number of parameters passed to btrfs_is_data_extent_shared(),
      the need to pass the ulists from extent_fiemap() to fiemap_process_hole()
      and having the caller deal with allocating and releasing the ulists.
      
      Also rename one of the ulists from 'tmp' / 'tmp_ulist' to 'refs', since
      that's a much better name as it reflects what the list is used for (and
      matching the argument name for find_parent_nodes()).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      84a7949d
    • F
      btrfs: turn the backref sharedness check cache into a context object · 61dbb952
      Filipe Manana 提交于
      Right now we are using a struct btrfs_backref_shared_cache to pass state
      across multiple btrfs_is_data_extent_shared() calls. The structure's name
      closely follows its current purpose, which is to cache previous checks
      for the sharedness of metadata extents. However we will start using the
      structure for more things other than caching sharedness checks, so rename
      it to struct btrfs_backref_share_check_ctx.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      61dbb952
    • F
      btrfs: directly pass the inode to btrfs_is_data_extent_shared() · ceb707da
      Filipe Manana 提交于
      Currently we pass a root and an inode number as arguments for
      btrfs_is_data_extent_shared() and the inode number is always from an
      inode that belongs to that root (it wouldn't make sense otherwise).
      In every context that we call btrfs_is_data_extent_shared() (fiemap only),
      we have an inode available, so directly pass the inode to the function
      instead of a root and inode number. This reduces the number of parameters
      and it makes the function's signature conform to most other functions we
      have.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ceb707da
    • F
      btrfs: remove checks for a 0 inode number during backref walking · a0a5472a
      Filipe Manana 提交于
      When doing backref walking to determine if an extent is shared, we are
      testing if the inode number, stored in the 'inum' field of struct
      share_check, is 0. However that can never be case, since the all instances
      of the structure are created at btrfs_is_data_extent_shared(), which
      always initializes it with the inode number from a fs tree (and the number
      for any inode from any tree can never be 0). So remove the checks.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a0a5472a
    • F
      btrfs: remove checks for a root with id 0 during backref walking · c9024219
      Filipe Manana 提交于
      When doing backref walking to determine if an extent is shared, we are
      testing the root_objectid of the given share_check struct is 0, but that
      is an impossible case, since btrfs_is_data_extent_shared() always
      initializes the root_objectid field with the id of the given root, and
      no root can have an objectid of 0. So remove those checks.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c9024219
    • F
      btrfs: drop redundant bflags initialization when allocating extent buffer · 206c1d32
      Filipe Manana 提交于
      When allocating an extent buffer, at __alloc_extent_buffer(), there's no
      point in explicitly assigning zero to the bflags field of the new extent
      buffer because we allocated it with kmem_cache_zalloc().
      
      So just remove the redundant initialization, it saves one mov instruction
      in the generated assembly code for x86_64 ("movq $0x0,0x10(%rax)").
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      206c1d32
    • F
      btrfs: drop pointless memset when cloning extent buffer · b98c6cd5
      Filipe Manana 提交于
      At btrfs_clone_extent_buffer(), before allocating the pages array for the
      new extent buffer we are calling memset() to zero out the pages array of
      the extent buffer. This is pointless however, because the extent buffer
      already has every element in its pages array pointing to NULL, as it was
      allocated with kmem_cache_zalloc(). The memset() was introduced with
      commit dd137dd1 ("btrfs: factor out allocating an array of pages"),
      but even before that commit we already depended on the pages array being
      initialized to NULL for the error paths that need to call
      btrfs_release_extent_buffer().
      
      So remove the memset(), it's useless and slightly increases the object
      text size.
      
      Before this change:
      
         $ size fs/btrfs/extent_io.o
            text	   data	    bss	    dec	    hex	filename
           70580	   5469	     40	  76089	  12939	fs/btrfs/extent_io.o
      
      After this change:
      
         $ size fs/btrfs/extent_io.o
            text	   data	    bss	    dec	    hex	filename
           70564	   5469	     40	  76073	  12929	fs/btrfs/extent_io.o
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b98c6cd5
    • F
      btrfs: skip unnecessary delalloc search during fiemap and lseek · a2853ffc
      Filipe Manana 提交于
      During fiemap and lseek (hole and data seeking), there's no point in
      iterating the inode's io tree to count delalloc bits if the inode's
      delalloc bytes counter has a value of zero, as that counter is updated
      whenever we set a range for delalloc or clear a range from delalloc.
      
      So skip the counting and io tree iteration if the inode's delalloc bytes
      counter has a value of zero. This helps save time when processing a file
      range corresponding to a hole or prealloc (unwritten) extent.
      
      This patch is part of a series comprised of the following patches:
      
        btrfs: get the next extent map during fiemap/lseek more efficiently
        btrfs: skip unnecessary extent map searches during fiemap and lseek
        btrfs: skip unnecessary delalloc search during fiemap and lseek
      
      The following test was performed on a release kernel (Debian's default
      kernel config) before and after applying those 3 patches.
      
         # Wrapper to call fiemap in extent count only mode.
         # (struct fiemap::fm_extent_count set to 0)
         $ cat fiemap.c
         #include <stdio.h>
         #include <unistd.h>
         #include <stdlib.h>
         #include <fcntl.h>
         #include <errno.h>
         #include <string.h>
         #include <sys/ioctl.h>
         #include <linux/fs.h>
         #include <linux/fiemap.h>
      
         int main(int argc, char **argv)
         {
                  struct fiemap fiemap = { 0 };
                  int fd;
      
                  if (argc != 2) {
                          printf("usage: %s <path>\n", argv[0]);
                          return 1;
                  }
                  fd = open(argv[1], O_RDONLY);
                  if (fd < 0) {
                          fprintf(stderr, "error opening file: %s\n",
                                  strerror(errno));
                          return 1;
                  }
      
                  /* fiemap.fm_extent_count set to 0, to count extents only. */
                  fiemap.fm_length = FIEMAP_MAX_OFFSET;
                  if (ioctl(fd, FS_IOC_FIEMAP, &fiemap) < 0) {
                          fprintf(stderr, "fiemap error: %s\n",
                                  strerror(errno));
                          return 1;
                  }
                  close(fd);
                  printf("fm_mapped_extents = %d\n", fiemap.fm_mapped_extents);
      
                  return 0;
         }
      
         $ gcc -o fiemap fiemap.c
      
      And the wrapper shell script that creates a file with many holes and runs
      fiemap against it:
      
         $ cat test.sh
         #!/bin/bash
      
         DEV=/dev/sdi
         MNT=/mnt/sdi
      
         mkfs.btrfs -f $DEV
         mount $DEV $MNT
      
         FILE_SIZE=$((1 * 1024 * 1024 * 1024))
         echo -n > $MNT/foobar
         for ((off = 0; off < $FILE_SIZE; off += 8192)); do
                 xfs_io -c "pwrite -S 0xab $off 4K" $MNT/foobar > /dev/null
         done
      
         # flush all delalloc
         sync
      
         start=$(date +%s%N)
         ./fiemap $MNT/foobar
         end=$(date +%s%N)
         dur=$(( (end - start) / 1000000 ))
         echo "fiemap took $dur milliseconds"
      
         umount $MNT
      
      Result before applying patchset:
      
         fm_mapped_extents = 131072
         fiemap took 63 milliseconds
      
      Result after applying patchset:
      
         fm_mapped_extents = 131072
         fiemap took 39 milliseconds   (-38.1%)
      
      Running the same test for a 512M file instead of a 1G file, gave the
      following results.
      
      Result before applying patchset:
      
         fm_mapped_extents = 65536
         fiemap took 29 milliseconds
      
      Result after applying patchset:
      
         fm_mapped_extents = 65536
         fiemap took 20 milliseconds    (-31.0%)
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a2853ffc
    • F
      btrfs: skip unnecessary extent map searches during fiemap and lseek · 013f9c70
      Filipe Manana 提交于
      If we have no outstanding extents it means we don't have any extent maps
      corresponding to delalloc that is flushing, as when an ordered extent is
      created we increment the number of outstanding extents to 1 and when we
      remove the ordered extent we decrement them by 1. So skip extent map tree
      searches if the number of outstanding ordered extents is 0, saving time as
      the tree is not empty if we have previously made some reads or flushed
      delalloc, as in those cases it can have a very large number of extent maps
      for files with many extents.
      
      This helps save time when processing a file range corresponding to a hole
      or prealloc (unwritten) extent.
      
      The next patch in the series has a performance test in its changelog and
      its subject is:
      
          "btrfs: skip unnecessary delalloc search during fiemap and lseek"
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      013f9c70
    • F
      btrfs: get the next extent map during fiemap/lseek more efficiently · d47704bd
      Filipe Manana 提交于
      At find_delalloc_subrange(), when we need to get the next extent map, we
      do a full search on the extent map tree (a red black tree). This is fine
      but it's a lot more efficient to simply use rb_next(), which typically
      requires iterating over less nodes of the tree and never needs to compare
      the ranges of nodes with the one we are looking for.
      
      So add a public helper to extent_map.{h,c} to get the extent map that
      immediately follows another extent map, using rb_next(), and use that
      helper at find_delalloc_subrange().
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d47704bd
    • Q
      btrfs: raid56: make it more explicit that cache rbio should have all its data sectors uptodate · 88074c8b
      Qu Wenruo 提交于
      For Btrfs RAID56, we have a caching system for btrfs raid bios (rbio).
      
      We call cache_rbio_pages() to mark a qualified rbio ready for cache.
      
      The timing happens at:
      
      - finish_rmw()
      
        At this timing, we have already read all necessary sectors, along with
        the rbio sectors, we have covered all data stripes.
      
      - __raid_recover_end_io()
      
        At this timing, we have rebuild the rbio, thus all data sectors
        involved (either from stripe or bio list) are uptodate now.
      
      Thus at the timing of cache_rbio_pages(), we should have all data
      sectors uptodate.
      
      This patch will make it explicit that all data sectors are uptodate at
      cache_rbio_pages() timing, mostly to prepare for the incoming
      verification at RMW time.
      
      This patch will add:
      
      - Extra ASSERT()s in cache_rbio_pages()
        This is to make sure all data sectors, which are not covered by bio,
        are already uptodate.
      
      - Extra ASSERT()s in steal_rbio()
        Since only cached rbio can be stolen, thus every data sector should
        already be uptodate in the source rbio.
      
      - Update __raid_recover_end_io() to update recovered sector->uptodate
        Previously __raid_recover_end_io() will only mark failed sectors
        uptodate if it's doing an RMW.
      
        But this can trigger new ASSERT()s, as for recovery case, a recovered
        failed sector will not be marked uptodate, and trigger ASSERT() in
        later cache_rbio_pages() call.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      88074c8b
    • Q
      btrfs: raid56: allocate memory separately for rbio pointers · 797d74b7
      Qu Wenruo 提交于
      Currently inside alloc_rbio(), we allocate a larger memory to contain
      the following members:
      
      - struct btrfs_raid_rbio itself
      - stripe_pages array
      - bio_sectors array
      - stripe_sectors array
      - finish_pointers array
      
      Then update rbio pointers to point the extra space after the rbio
      structure itself.
      
      Thus it introduced a complex CONSUME_ALLOC() macro to help the thing.
      
      This is too hacky, and is going to make later pointers expansion harder.
      
      This patch will change it to use regular kcalloc() for each pointer
      inside btrfs_raid_bio, making the later expansion much easier.
      
      And introduce a helper free_raid_bio_pointers() to free up all the
      pointer members in btrfs_raid_bio, which will be used in both
      free_raid_bio() and error path of alloc_rbio().
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      797d74b7