1. 09 Sep, 2019: 1 commit
  2. 25 Feb, 2019: 1 commit
    • btrfs: introduce new ioctl to unregister a btrfs device · 228a73ab
      Anand Jain committed
      Add support for a new command that can be used, for example, as
      
        $ btrfs device scan --forget [dev]
      (the final name may change, though)
      
      to undo the effects of 'btrfs device scan [dev]'. For this purpose
      this patch proposes to use ioctl #5 as it was empty and is next to the
      SCAN ioctl.
      
      The new ioctl BTRFS_IOC_FORGET_DEV works only on the control device
      (/dev/btrfs-control) and unregisters either one device or all devices
      that are not mounted.
      
      The argument is struct btrfs_ioctl_vol_args; ::name specifies the device
      path. To unregister all devices, the path is an empty string.
      
      Again, the devices are removed only if they aren't part of a mounted
      filesystem.
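      A minimal userspace sketch of preparing and issuing the ioctl described
      above. It assumes that struct btrfs_ioctl_vol_args is a 64-bit fd
      followed by a 4088-byte name buffer and that the btrfs ioctl magic is
      0x94; the ex_ names are hypothetical local mirrors, so check them
      against linux/btrfs.h. Actually issuing the ioctl needs a kernel with
      FORGET_DEV support and, presumably, root privileges.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical local mirror of struct btrfs_ioctl_vol_args (assumed layout). */
#define EX_BTRFS_IOCTL_MAGIC 0x94
#define EX_BTRFS_PATH_NAME_MAX 4087

struct ex_btrfs_ioctl_vol_args {
	int64_t fd;
	char name[EX_BTRFS_PATH_NAME_MAX + 1];
};

/* ioctl #5, next to the SCAN ioctl (#4), as proposed in the commit message. */
#define EX_BTRFS_IOC_FORGET_DEV \
	_IOW(EX_BTRFS_IOCTL_MAGIC, 5, struct ex_btrfs_ioctl_vol_args)

/* Fill the argument: a device path forgets that device; NULL leaves ::name
 * empty, which asks the kernel to forget all unmounted devices.
 * Returns 0, or -1 if the path does not fit. */
int ex_forget_args(struct ex_btrfs_ioctl_vol_args *args, const char *dev)
{
	memset(args, 0, sizeof(*args));
	if (dev == NULL)
		return 0;
	if (strlen(dev) > EX_BTRFS_PATH_NAME_MAX)
		return -1;
	strcpy(args->name, dev);
	return 0;
}

/* Issue the ioctl on the control device; returns the ioctl() result or -1. */
int ex_forget_device(const char *dev)
{
	struct ex_btrfs_ioctl_vol_args args;
	int fd, ret;

	if (ex_forget_args(&args, dev) < 0)
		return -1;
	fd = open("/dev/btrfs-control", O_RDWR);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, EX_BTRFS_IOC_FORGET_DEV, &args);
	close(fd);
	return ret;
}
```

      Keeping the argument setup in its own function makes the "empty string
      means all devices" rule explicit and testable without a btrfs device.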
      
      This new ioctl provides:
      
      - release of unwanted btrfs_fs_devices and btrfs_devices structures
        from memory if the device is not going to be mounted
      
      - the ability to mount a filesystem in degraded mode when one device is
        corrupted, as in a split-brain raid1
      
      - running test cases that would otherwise require reloading the kernel
        module, which is not possible e.g. due to a mounted filesystem or when
        btrfs is built in
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 17 Dec, 2018: 1 commit
    • btrfs: Introduce support for FSID change without metadata rewrite · 7239ff4b
      Nikolay Borisov committed
      The new metadata_uuid field is going to be used when the user wants to
      change the UUID of the filesystem without having to rewrite all metadata
      blocks. It adds another level of indirection: when the FSID is changed,
      the current UUID (the one the fs was created with) is copied to the
      'metadata_uuid' field in the superblock and a new incompat flag,
      METADATA_UUID, is set. When the kernel detects that this flag is set, it
      knows that the superblock in fact has 2 UUIDs:
      
      1. The user-visible UUID, currently known as FSID.
      2. The metadata UUID: the UUID stamped into all on-disk
         data structures belonging to this file system.
      
      When the new incompat flag is present, device scanning checks whether
      both the fsid and the metadata_uuid of the scanned device match any of
      the registered filesystems. When the flag is not set, both UUIDs are
      equal and only the FSID is stored on disk; metadata_uuid is set only
      in memory during mount.
      
      Additionally, a new metadata_uuid field is added to the fs_info
      struct. It is initialised either with the FSID, if the METADATA_UUID
      incompat flag is not set, or with the metadata_uuid of the superblock
      otherwise.
      
      This commit introduces the new fields as well as the new incompat flag
      and switches all users of the fsid to the new logic.
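      A simplified sketch of the selection and matching rules described above.
      The trimmed struct ex_super and the ex_ helpers are hypothetical, and the
      flag value (bit 10) is assumed to match
      BTRFS_FEATURE_INCOMPAT_METADATA_UUID; the kernel's real scanning code
      handles more cases than this.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_UUID_SIZE 16
/* Assumed to match BTRFS_FEATURE_INCOMPAT_METADATA_UUID. */
#define EX_INCOMPAT_METADATA_UUID (1ULL << 10)

/* Hypothetical, trimmed-down view of the superblock fields involved. */
struct ex_super {
	uint8_t fsid[EX_UUID_SIZE];          /* user-visible UUID */
	uint8_t metadata_uuid[EX_UUID_SIZE]; /* meaningful only with the flag */
	uint64_t incompat_flags;
};

/* UUID stamped into on-disk structures: metadata_uuid when the flag is set,
 * otherwise the FSID (the two are equal in that case). */
const uint8_t *ex_metadata_uuid(const struct ex_super *sb)
{
	if (sb->incompat_flags & EX_INCOMPAT_METADATA_UUID)
		return sb->metadata_uuid;
	return sb->fsid;
}

/* A scanned device belongs to a registered filesystem only if both the
 * FSID and the effective metadata UUID match. */
bool ex_same_fs(const struct ex_super *a, const struct ex_super *b)
{
	return memcmp(a->fsid, b->fsid, EX_UUID_SIZE) == 0 &&
	       memcmp(ex_metadata_uuid(a), ex_metadata_uuid(b),
		      EX_UUID_SIZE) == 0;
}
```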
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor updates in comments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 31 May, 2018: 3 commits
  5. 22 Jan, 2018: 1 commit
  6. 02 Nov, 2017: 2 commits
    • License cleanup: add SPDX license identifier to uapi header files with a license · e2be04c7
      Greg Kroah-Hartman committed
      Many user space API headers have licensing information, which is either
      incomplete, badly formatted or just a shorthand for referring to the
      license under which the file is supposed to be.  This makes it hard for
      compliance tools to determine the correct license.
      
      Update these files with an SPDX license identifier.  The identifier was
      chosen based on the license information in the file.
      
      GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
      identifier with the added 'WITH Linux-syscall-note' exception, which is
      the officially assigned exception identifier for the kernel syscall
      exception:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      This exception makes it possible to include GPL headers into non GPL
      code, without confusing license compliance tools.
      
      Headers which have either explicit dual licensing or are just licensed
      under a non GPL license are updated with the corresponding SPDX
      identifier and the GPLv2 with syscall exception identifier.  The format
      is:
              ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
      
      SPDX license identifiers are a legally binding shorthand, which can be
      used instead of the full boilerplate text.  The update does not remove
      existing license information as this has to be done on a case by case
      basis and the copyright holders might have to be consulted. This will
      happen in a separate step.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 · d24a67b2
      Zygo Blaxell committed
      Now that check_extent_in_eb()'s extent offset filter can be turned off,
      we need a way to do it from userspace.
      
      Add a 'flags' field to the btrfs_logical_ino_args structure to disable
      extent offset filtering, taking the place of one of the existing
      reserved[] fields.
      
      Previous versions of LOGICAL_INO neglected to check whether any of the
      reserved fields have non-zero values.  Assigning meaning to those fields
      now may change the behavior of existing programs that left these fields
      uninitialized.  The lack of a zero check also means that new programs
      have no way to know whether the kernel is honoring the flags field.
      
      To avoid these problems, define a new ioctl LOGICAL_INO_V2.  We can
      use the same argument layout as LOGICAL_INO, but shorten the reserved[]
      array by one element and turn it into the 'flags' field.  The V2 ioctl
      explicitly checks that reserved fields and unsupported flag bits are zero
      so that userspace can negotiate future feature bits as they are defined.
      
      Since the memory layouts of the two ioctls' arguments are compatible,
      there is no need for a separate function for logical_to_ino_v2 (contrast
      with tree_search_v2 vs tree_search where the layout and code are quite
      different).  A version parameter and an 'if' statement will suffice.
      
      Now that we have a flags field in logical_ino_args, add a flag
      BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want,
      and pass it down the stack to iterate_inodes_from_logical.
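      A sketch of the shared argument layout and the version check described
      above. The field order (logical, size, reserved[3], flags, inodes) and
      the flag value bit 0 are assumptions to verify against linux/btrfs.h;
      ex_check_logical_ino_args() is a hypothetical helper, not the kernel's
      function.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed value of BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET. */
#define EX_LOGICAL_INO_ARGS_IGNORE_OFFSET (1ULL << 0)
#define EX_LOGICAL_INO_ARGS_KNOWN_FLAGS   EX_LOGICAL_INO_ARGS_IGNORE_OFFSET

/* The last reserved[] slot of LOGICAL_INO becomes 'flags', so both ioctl
 * versions share one layout and size. */
struct ex_logical_ino_args {
	uint64_t logical;     /* in: logical address to resolve */
	uint64_t size;        /* in: size of the output buffer */
	uint64_t reserved[3]; /* in: must be zero for V2 */
	uint64_t flags;       /* in: V2 only */
	uint64_t inodes;      /* in: user pointer to the output buffer */
};

/* One code path for both versions: V1 ignores reserved/flags as before; V2
 * rejects non-zero reserved fields and unknown flag bits so userspace can
 * probe for future features. Sets *ignore_offset; returns 0 or -EINVAL. */
int ex_check_logical_ino_args(const struct ex_logical_ino_args *args,
			      int version, int *ignore_offset)
{
	*ignore_offset = 0;
	if (version == 1)
		return 0;
	for (int i = 0; i < 3; i++)
		if (args->reserved[i])
			return -EINVAL;
	if (args->flags & ~EX_LOGICAL_INO_ARGS_KNOWN_FLAGS)
		return -EINVAL;
	*ignore_offset = !!(args->flags & EX_LOGICAL_INO_ARGS_IGNORE_OFFSET);
	return 0;
}
```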
      
      Motivation and background, copied from the patchset cover letter:
      
      Suppose we have a file with one extent:
      
          root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a
          root@tester:~# sync
      
      Split the extent by overwriting it in the middle:
      
          root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a
      
      We should now have 3 extent refs to 2 extents, with one block unreachable.
      The extent tree looks like:
      
          root@tester:~# btrfs-debug-tree /dev/vdc -t 2
          [...]
                  item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53
                          extent refs 2 gen 29 flags DATA
                          extent data backref root 5 objectid 261 offset 0 count 2
          [...]
                  item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53
                          extent refs 1 gen 30 flags DATA
                          extent data backref root 5 objectid 261 offset 8192 count 1
          [...]
      
      and the ref tree looks like:
      
          root@tester:~# btrfs-debug-tree /dev/vdc -t 5
          [...]
                  item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53
                          extent data disk byte 1103101952 nr 73728
                          extent data offset 0 nr 8192 ram 73728
                          extent compression(none)
                  item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53
                          extent data disk byte 1103175680 nr 4096
                          extent data offset 0 nr 4096 ram 4096
                          extent compression(none)
                  item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53
                          extent data disk byte 1103101952 nr 73728
                          extent data offset 12288 nr 61440 ram 73728
                          extent compression(none)
          [...]
      
      There are two references to the same extent with different, non-overlapping
      byte offsets:
      
          [------------------72K extent at 1103101952----------------------]
          [--8K----------------|--4K unreachable----|--60K-----------------]
          ^                                         ^
          |                                         |
          [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--]
                               |
                               v
                               [-----4K extent-----] at 1103175680
      
      We want to find all of the references to extent bytenr 1103101952.
      
      Without the patch (and without running btrfs-debug-tree), we have to
      do it with 18 LOGICAL_INO calls:
      
          root@tester:~# btrfs ins log 1103101952 -P /test/
          Using LOGICAL_INO
          inode 261 offset 0 root 5
      
          root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode
          inode 261 offset 0 root 5
          inode 261 offset 4096 root 5   <- same extent ref as offset 0
                                         (offset 8192 returns empty set, not reachable)
          inode 261 offset 12288 root 5
          inode 261 offset 16384 root 5  \
          inode 261 offset 20480 root 5  |
          inode 261 offset 24576 root 5  |
          inode 261 offset 28672 root 5  |
          inode 261 offset 32768 root 5  |
          inode 261 offset 36864 root 5  \
          inode 261 offset 40960 root 5   > all the same extent ref as offset 12288.
          inode 261 offset 45056 root 5  /  More processing required in userspace
          inode 261 offset 49152 root 5  |  to figure out these are all duplicates.
          inode 261 offset 53248 root 5  |
          inode 261 offset 57344 root 5  |
          inode 261 offset 61440 root 5  |
          inode 261 offset 65536 root 5  |
          inode 261 offset 69632 root 5  /
      
      In the worst case the extents are 128MB long, and we have to do 32768
      iterations of the loop to find one 4K extent ref.
      
      With the patch, we just use one call to map all refs to the extent at once:
      
          root@tester:~# btrfs ins log 1103101952 -P /test/
          Using LOGICAL_INO_V2
          inode 261 offset 0 root 5
          inode 261 offset 12288 root 5
      
      The TREE_SEARCH ioctl allows userspace to retrieve the offset and
      extent bytenr fields easily once the root, inode and offset are known.
      This is sufficient information to build a complete map of the extent
      and all of its references.  Userspace can use this information to make
      better choices to dedup or defrag.
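      The offset filter in the walkthrough above can be modelled in a few
      lines. This is a simplified sketch, not the kernel's
      check_extent_in_eb(): ex_ref and ex_resolve() are hypothetical, and the
      two refs use the values from the btrfs-debug-tree output (extent data
      offset and nr).

```c
#include <stddef.h>
#include <stdint.h>

/* One EXTENT_DATA item referencing the shared 72K extent. */
struct ex_ref {
	uint64_t file_offset; /* key offset of the EXTENT_DATA item */
	uint64_t data_offset; /* "extent data offset" */
	uint64_t num_bytes;   /* "nr" */
};

/* Resolve byte x (relative to the extent start) to file offsets.
 * Without ignore_offset, only refs whose [data_offset, data_offset +
 * num_bytes) range covers x match, and the file offset of that byte is
 * reported. With ignore_offset, every ref matches and its starting file
 * offset is reported. Returns the number of entries written to out. */
size_t ex_resolve(const struct ex_ref *refs, size_t nrefs, uint64_t x,
		  int ignore_offset, uint64_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nrefs; i++) {
		const struct ex_ref *r = &refs[i];
		uint64_t delta = 0;

		if (!ignore_offset) {
			if (x < r->data_offset ||
			    x >= r->data_offset + r->num_bytes)
				continue;
			delta = x - r->data_offset;
		}
		out[n++] = r->file_offset + delta;
	}
	return n;
}
```

      With these refs, x = 8192 matches nothing (the unreachable block), while
      a single call with ignore_offset returns file offsets 0 and 12288, the
      same two results as the LOGICAL_INO_V2 output above.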
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      [ copy background and motivation from cover letter ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 16 Aug, 2017: 1 commit
    • btrfs: Add zstd support · 5c1aab1d
      Nick Terrell committed
      Add zstd compression and decompression support to BtrFS. zstd at its
      fastest level compresses almost as well as zlib, while offering much
      faster compression and decompression, approaching lzo speeds.
      
      I benchmarked btrfs with zstd compression against no compression, lzo
      compression, and zlib compression in two scenarios: copying a set of
      files to btrfs and then reading the files, and copying a tarball to
      btrfs, extracting it to btrfs, and then reading the extracted files.
      After every operation, I call `sync` and include the sync time.
      Between every pair of operations I unmount and remount the filesystem
      to avoid caching. The benchmark files can be found in the upstream
      zstd source repository under
      `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
      [1] [2].
      
      I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and an SSD.
      
      The first compression benchmark is copying 10 copies of the unzipped
      Silesia corpus [3] into a BtrFS filesystem mounted with
      `-o compress-force=Method`. The decompression benchmark times how long
      it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
      measured by comparing the output of `df` and `du`. See the benchmark file
      [1] for details. I benchmarked multiple zstd compression levels, although
      the patch uses zstd level 1.
      
      | Method  | Ratio | Compression MB/s | Decompression MB/s  |
      |---------|-------|------------------|---------------------|
      | None    |  0.99 |              504 |                 686 |
      | lzo     |  1.66 |              398 |                 442 |
      | zlib    |  2.58 |               65 |                 241 |
      | zstd 1  |  2.57 |              260 |                 383 |
      | zstd 3  |  2.71 |              174 |                 408 |
      | zstd 6  |  2.87 |               70 |                 398 |
      | zstd 9  |  2.92 |               43 |                 406 |
      | zstd 12 |  2.93 |               21 |                 408 |
      | zstd 15 |  3.01 |               11 |                 354 |
      
      The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
      measures the compression ratio, extracts the tar, and deletes the tar.
      Then it measures the compression ratio again, and `tar`s the extracted
      files into `/dev/null`. See the benchmark file [2] for details.
      
      | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
      |--------|-----------|---------------|----------|------------|----------|
      | None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
      | lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
      | zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
      | zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |
      
      [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
      [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
      [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
      
      zstd source repository: https://github.com/facebook/zstd
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  8. 20 Jun, 2017: 1 commit
    • Btrfs: btrfs_ioctl_search_key documentation · 1a63143d
      Hans van Kranenburg committed
      A programmer who is trying to implement calling the btrfs SEARCH
      or SEARCH_V2 ioctl will probably soon end up reading this struct
      definition.
      
      Properly document the input fields to prevent common misconceptions:
       1. The search space is linear, not 3-dimensional. The individual min/max
       values for objectid, type and offset cannot be used to filter the
       result; they only define the endpoints of an interval.
       2. The transaction id (a.k.a. generation) filter applies only to the
       transaction id of the last COW operation on a whole metadata page, not
       to individual items.
      
      Ad 1. The first misunderstanding was helped by the previous misleading
      comments on min/max type and offset:
        "keys returned will be >= min and <= max".
      
      Ad 2. For example, running btrfs balance will happily cause rewriting of
      metadata pages that contain a filesystem tree of a read only subvolume,
      causing transids to be increased.
      
      Also, improve descriptions of tree_id and nr_items and add in/out
      annotations.
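      Point 1 above can be made concrete with a key comparison. This sketch
      assumes the usual btrfs key ordering (objectid, then type, then offset);
      struct ex_key and the ex_ helpers are hypothetical, whereas the real
      btrfs_ioctl_search_key carries separate min_/max_ fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* A key as used by the SEARCH ioctls: (objectid, type, offset). */
struct ex_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Lexicographic order: objectid first, then type, then offset. */
int ex_key_cmp(const struct ex_key *a, const struct ex_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* An item is in range if min <= key <= max in that single linear order;
 * the per-field min/max values are NOT independent filters. */
bool ex_in_search_range(const struct ex_key *key, const struct ex_key *min,
			const struct ex_key *max)
{
	return ex_key_cmp(min, key) <= 0 && ex_key_cmp(key, max) <= 0;
}
```

      For example, with min (256, 1, 0) and max (257, 1, 0), the key
      (256, 108, 4096) is returned even though its type exceeds max_type,
      because its objectid already sorts below the upper endpoint.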
      Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 18 Apr, 2017: 1 commit
  10. 07 Mar, 2017: 1 commit
  11. 04 Oct, 2016: 1 commit
    • Btrfs: catch invalid free space trees · 6675df31
      Omar Sandoval committed
      There are two separate issues that can lead to corrupted free space
      trees.
      
      1. The free space tree bitmaps had an endianness issue on big-endian
         systems which is fixed by an earlier patch in this series.
      2. btrfs-progs before v4.7.3 modified filesystems without updating the
         free space tree.
      
      To catch both of these issues at once, we need to force the free space
      tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit.
      If the bit isn't set, we know that it was either produced by a broken
      big-endian kernel or may have been corrupted by btrfs-progs.
      
      This also provides us with a way to add rudimentary read-write support
      for the free space tree to btrfs-progs: it can just clear this bit and
      have the kernel rebuild the free space tree.
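      A sketch of the compat_ro check described above, assuming
      FREE_SPACE_TREE is bit 0 and FREE_SPACE_TREE_VALID is bit 1 of
      compat_ro (verify against linux/btrfs.h); the ex_ helpers are
      hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the compat_ro bits in the uapi header. */
#define EX_COMPAT_RO_FREE_SPACE_TREE       (1ULL << 0)
#define EX_COMPAT_RO_FREE_SPACE_TREE_VALID (1ULL << 1)

/* Rebuild when the tree is in use but not marked valid: it may come from a
 * broken big-endian kernel or have been left stale by old btrfs-progs. */
bool ex_free_space_tree_needs_rebuild(uint64_t compat_ro)
{
	return (compat_ro & EX_COMPAT_RO_FREE_SPACE_TREE) &&
	       !(compat_ro & EX_COMPAT_RO_FREE_SPACE_TREE_VALID);
}

/* What a tool without write support for the tree could do: clear the
 * VALID bit so that the kernel rebuilds the tree. */
uint64_t ex_invalidate_free_space_tree(uint64_t compat_ro)
{
	return compat_ro & ~EX_COMPAT_RO_FREE_SPACE_TREE_VALID;
}
```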
      
      Cc: stable@vger.kernel.org # 4.5+
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  12. 26 Jul, 2016: 1 commit
  13. 30 May, 2016: 1 commit
  14. 28 Apr, 2016: 8 commits
  15. 27 Oct, 2015: 3 commits
  16. 03 Feb, 2015: 1 commit
  17. 21 Nov, 2014: 1 commit
    • Btrfs: return failure if btrfs_dev_replace_finishing() failed · 2fc9f6ba
      Eryu Guan committed
      Device replace can fail due to another running scrub process or any
      other error btrfs_scrub_dev() may hit, but this failure is not
      returned to userspace.
      
      The following steps can reproduce this issue:
      
      	mkfs -t btrfs -f /dev/sdb1 /dev/sdb2
      	mount /dev/sdb1 /mnt/btrfs
      	while true; do btrfs scrub start -B /mnt/btrfs >/dev/null 2>&1; done &
      	btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs
      	# if this replace succeeded, do the following and repeat until
      	# you see this log in dmesg
      	# BTRFS: btrfs_scrub_dev(/dev/sdb2, 2, /dev/sdb3) failed -115
      	#btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs
      
      	# once you see the error log in dmesg, check return value of
      	# replace
      	echo $?
      
      Introduce a new dev replace result
      
      BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS
      
      to catch -EINPROGRESS explicitly and return other errors directly to
      userspace.
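      A sketch of the error handling described above. The -115 in the dmesg
      line is -EINPROGRESS on Linux. The result values (NO_ERROR = 0,
      SCRUB_INPROGRESS = 3) are assumptions to check against linux/btrfs.h,
      and ex_dev_replace_report() is a hypothetical helper, not the kernel
      function.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed result codes; SCRUB_INPROGRESS is the one added by this commit. */
#define EX_DEV_REPLACE_RESULT_NO_ERROR         0
#define EX_DEV_REPLACE_RESULT_SCRUB_INPROGRESS 3

/* -EINPROGRESS (a concurrent scrub) is reported through the result field
 * with a 0 return; any other error is returned directly to userspace
 * instead of being swallowed. */
int ex_dev_replace_report(int err, uint64_t *result)
{
	if (err == -EINPROGRESS) {
		*result = EX_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
		return 0;
	}
	if (err == 0)
		*result = EX_DEV_REPLACE_RESULT_NO_ERROR;
	return err;
}
```

      With this split, `echo $?` in the reproducer above would reflect real
      failures, while the in-progress case is still distinguishable through
      the result field.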
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  18. 29 Jun, 2014: 1 commit
  19. 14 Jun, 2014: 1 commit
  20. 10 Jun, 2014: 2 commits
    • btrfs: retrieve more info from FS_INFO ioctl · 80a773fb
      David Sterba committed
      Provide basic information about the filesystem through the ioctl:
      * b-tree node size (same as leaf size)
      * sector size
      * expected alignment of CLONE_RANGE and EXTENT_SAME ioctl arguments
      
      Backward compatibility: if the values are 0, the kernel does not provide
      this information and applications should ignore them.
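      A sketch of how an application might honour the "0 means not provided"
      rule. struct ex_fs_info is a hypothetical subset of the FS_INFO reply,
      and falling back to the sector size, then to a caller-chosen default,
      is an assumption of this sketch rather than documented behaviour.

```c
#include <stdint.h>

/* Hypothetical subset of the FS_INFO reply with the three new fields. */
struct ex_fs_info {
	uint32_t nodesize;
	uint32_t sectorsize;
	uint32_t clone_alignment;
};

/* 0 means "not provided by this kernel", so use a fallback instead. */
uint32_t ex_clone_alignment(const struct ex_fs_info *info, uint32_t fallback)
{
	if (info->clone_alignment)
		return info->clone_alignment;
	if (info->sectorsize)
		return info->sectorsize;
	return fallback;
}

/* Round a CLONE_RANGE/EXTENT_SAME length down; align must be non-zero. */
uint64_t ex_align_down(uint64_t len, uint32_t align)
{
	return len - (len % align);
}
```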
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: balance filter: add limit of processed chunks · 7d824b6f
      David Sterba committed
      This started as a debugging helper to watch the effects of converting
      between raid levels on multiple devices, but it could be useful standalone.
      
      In my case the usage filter was not fine-grained enough and led to
      converting too many chunks at once. Another example use is in connection
      with the drange+devid or vrange filters, which allow working with a
      specific chunk or even with a chunk on a given device.
      
      The limit filter is applied last; a value of 0 means no limit.
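      A sketch of "applied last, 0 means no limit": only chunks that pass
      every other filter count towards the limit. struct ex_chunk and its
      matches_other_filters flag are hypothetical stand-ins for the results of
      the usage/devid/drange/vrange filters; the real balance code is more
      involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk descriptor. */
struct ex_chunk {
	uint64_t start;
	bool matches_other_filters;
};

/* Count the chunks that would be processed; limit == 0 disables the limit. */
size_t ex_balance_count(const struct ex_chunk *chunks, size_t n,
			uint64_t limit)
{
	size_t processed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!chunks[i].matches_other_filters)
			continue;             /* other filters run first */
		if (limit && processed >= limit)
			break;                /* limit checked last */
		processed++;
	}
	return processed;
}
```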
      
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Hugo Mills <hugo@carfax.org.uk>
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
  21. 15 Feb, 2014: 1 commit
  22. 29 Jan, 2014: 2 commits
    • btrfs: add ioctl to export size of global metadata reservation · 01e219e8
      Jeff Mahoney committed
      btrfs filesystem df output will show the size of the metadata space
      and how much of it is used, and the user assumes that the difference
      is all usable space. Since that's not actually the case due to the
      global metadata reservation, we should provide the full picture to the
      user.
      
      This patch adds an ioctl that exports the size of the global metadata
      reservation so that btrfs filesystem df can report it.
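      The arithmetic behind the message above, as a sketch: the naive free
      space is total minus used, while a closer estimate also subtracts the
      global reservation. Treating the whole reservation as unavailable to
      the user is a simplifying assumption, and the ex_ names are
      hypothetical.

```c
#include <stdint.h>

/* Hypothetical metadata figures, e.g. in MiB. */
struct ex_meta_space {
	uint64_t total;      /* size of the metadata space */
	uint64_t used;       /* bytes in use */
	uint64_t global_rsv; /* size of the global metadata reservation */
};

/* What the user naively assumes is free. */
uint64_t ex_naive_free(const struct ex_meta_space *m)
{
	return m->total - m->used;
}

/* Closer estimate: hold back the reservation too, clamped at zero. */
uint64_t ex_usable_free(const struct ex_meta_space *m)
{
	uint64_t naive = ex_naive_free(m);

	return naive > m->global_rsv ? naive - m->global_rsv : 0;
}
```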
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctls to query/change feature bits online · 2eaa055f
      Jeff Mahoney committed
      There are some feature bits that require no offline setup and can
      be enabled online. I've only reviewed extended irefs, but there will
      probably be more.
      
      We introduce three new ioctls:
      - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
      - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs
        basis, as well as for which features can be changed while mounted.
      - BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.
      
      We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that
      allow us to define which features are safe to change at runtime.
      
      The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
      - Enabling a completely unsupported feature: warns and returns -ENOTSUPP
      - Enabling a feature that can only be done offline: warns and returns -EPERM
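      A sketch of how the _SAFE_SET/_SAFE_CLEAR masks gate a change within
      one feature set. The struct, the ex_ function, and its result enum are
      hypothetical; the kernel itself returns -ENOTSUPP and -EPERM as listed
      above.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum ex_set_result { EX_SET_OK = 0, EX_SET_UNSUPPORTED, EX_SET_OFFLINE_ONLY };

/* One feature set (compat, compat_ro or incompat) and its masks. */
struct ex_feature_set {
	uint64_t supported;  /* features this kernel knows about */
	uint64_t safe_set;   /* may be enabled while mounted */
	uint64_t safe_clear; /* may be disabled while mounted */
};

/* Validate changing the bits selected by 'mask' from 'cur' to 'wanted'. */
enum ex_set_result ex_check_set_features(const struct ex_feature_set *fs,
					 uint64_t cur, uint64_t wanted,
					 uint64_t mask)
{
	uint64_t to_set = wanted & mask & ~cur;
	uint64_t to_clear = ~wanted & mask & cur;

	if (to_set & ~fs->supported)
		return EX_SET_UNSUPPORTED;   /* completely unsupported */
	if (to_set & ~fs->safe_set)
		return EX_SET_OFFLINE_ONLY;  /* can only be enabled offline */
	if (to_clear & ~fs->safe_clear)
		return EX_SET_OFFLINE_ONLY;  /* can only be disabled offline */
	return EX_SET_OK;
}
```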
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  23. 01 Sep, 2013: 2 commits
  24. 14 Jun, 2013: 2 commits
    • btrfs: device delete to get errors from the kernel · 183860f6
      Anand Jain committed
      When the user runs 'btrfs dev del', any error about unmet RAID
      requirements goes to /var/log/messages. It is not a good idea to clutter
      the system log with these user errors; moreover, users should not have
      to review the system messages to find out what went wrong with the CLI
      command. The error should instead be returned to the user as part of
      the CLI result.
      
      To implement this, a set of BTRFS_ERROR_DEV* error codes is defined,
      together with their error strings.
      
      I expect this enum to be extended with other errors that we might
      want to communicate to userland.
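      A sketch of the code-plus-string pattern described above. The ex_ enum
      names are hypothetical stand-ins modelled on the BTRFS_ERROR_DEV*
      codes (starting at 1 so that 0 keeps meaning "no error"), and the
      message wording is illustrative.

```c
#include <stddef.h>

/* Hypothetical error codes returned to userspace instead of only logged. */
enum ex_btrfs_err_code {
	EX_ERROR_DEV_RAID1_MIN_NOT_MET = 1,
	EX_ERROR_DEV_RAID10_MIN_NOT_MET,
	EX_ERROR_DEV_RAID5_MIN_NOT_MET,
	EX_ERROR_DEV_RAID6_MIN_NOT_MET,
};

/* Map a code to a message the CLI can print directly. */
const char *ex_err_str(enum ex_btrfs_err_code code)
{
	switch (code) {
	case EX_ERROR_DEV_RAID1_MIN_NOT_MET:
		return "unable to go below two devices on raid1";
	case EX_ERROR_DEV_RAID10_MIN_NOT_MET:
		return "unable to go below four devices on raid10";
	case EX_ERROR_DEV_RAID5_MIN_NOT_MET:
		return "unable to go below two devices on raid5";
	case EX_ERROR_DEV_RAID6_MIN_NOT_MET:
		return "unable to go below three devices on raid6";
	}
	return NULL; /* unknown code: the caller can fall back to strerror() */
}
```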
      
      v3:
      moved the code within the file, no logical change
      
      v1->v2:
      introduce error codes for the device mgmt usage
      
      v1:
      adds a parameter in the ioctl arg struct to carry the error string
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: add ioctl to wait for qgroup rescan completion · 57254b6e
      Jan Schmidt committed
      btrfs_qgroup_wait_for_completion waits until the currently running qgroup
      operation completes. It returns immediately when no rescan process is in
      progress. This is useful to automate things around the rescan process (e.g.
      testing).
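      A userspace sketch of using the wait ioctl from an automation script.
      The request number _IO(0x94, 46) is assumed to match
      BTRFS_IOC_QUOTA_RESCAN_WAIT in linux/btrfs.h, and ex_wait_for_rescan()
      is a hypothetical wrapper; the ioctl may also require root privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request number; check BTRFS_IOC_QUOTA_RESCAN_WAIT. */
#define EX_BTRFS_IOCTL_MAGIC 0x94
#define EX_BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(EX_BTRFS_IOCTL_MAGIC, 46)

/* Block until the running qgroup rescan on the filesystem containing
 * 'path' finishes (immediately if none is running). Returns 0 or -errno. */
int ex_wait_for_rescan(const char *path)
{
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -errno;
	ret = ioctl(fd, EX_BTRFS_IOC_QUOTA_RESCAN_WAIT);
	if (ret < 0)
		ret = -errno;   /* capture errno before close() can change it */
	close(fd);
	return ret;
}
```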
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>