1. 02 Nov, 2017 2 commits
    • License cleanup: add SPDX license identifier to uapi header files with a license · e2be04c7
      Greg Kroah-Hartman committed
      Many user space API headers have licensing information, which is either
      incomplete, badly formatted or just a shorthand for referring to the
      license under which the file is supposed to be.  This makes it hard for
      compliance tools to determine the correct license.
      
      Update these files with an SPDX license identifier.  The identifier was
      chosen based on the license information in the file.
      
      GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
      identifier with the added 'WITH Linux-syscall-note' exception, which is
      the officially assigned exception identifier for the kernel syscall
      exception:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      This exception makes it possible to include GPL headers into non GPL
      code, without confusing license compliance tools.
      
      Headers which have either explicit dual licensing or are just licensed
      under a non GPL license are updated with the corresponding SPDX
      identifier and the GPLv2 with syscall exception identifier.  The format
      is:
              ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
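
A minimal sketch of how the identifier described above is composed. The helper names are hypothetical, `MIT` is used only as an illustrative "other license", and the `/* ... */` comment style follows the usual kernel convention for header files rather than anything stated in this commit message.

```python
# Sketch: build the SPDX line for a uapi header as described in the text.
# Function names are hypothetical; "MIT" is only an illustrative second license.

SYSCALL_NOTE = "GPL-2.0 WITH Linux-syscall-note"


def uapi_spdx_expression(other_license=None):
    """Return the SPDX expression for a GPL-only or dual-licensed uapi header."""
    if other_license is None:
        return SYSCALL_NOTE
    # Dual-licensed or non-GPL header: format quoted in the commit message.
    return f"(({SYSCALL_NOTE}) OR {other_license})"


def uapi_spdx_line(other_license=None):
    """Wrap the expression in a C comment, as commonly used in header files."""
    return f"/* SPDX-License-Identifier: {uapi_spdx_expression(other_license)} */"


print(uapi_spdx_line())
print(uapi_spdx_line("MIT"))
```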
      
      SPDX license identifiers are a legally binding shorthand, which can be
      used instead of the full boilerplate text.  The update does not remove
      existing license information as this has to be done on a case by case
      basis and the copyright holders might have to be consulted. This will
      happen in a separate step.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 · d24a67b2
      Zygo Blaxell committed
      Now that check_extent_in_eb()'s extent offset filter can be turned off,
      we need a way to do it from userspace.
      
      Add a 'flags' field to the btrfs_logical_ino_args structure to disable
      extent offset filtering, taking the place of one of the existing
      reserved[] fields.
      
      Previous versions of LOGICAL_INO neglected to check whether any of the
      reserved fields have non-zero values.  Assigning meaning to those fields
      now may change the behavior of existing programs that left these fields
      uninitialized.  The lack of a zero check also means that new programs
      have no way to know whether the kernel is honoring the flags field.
      
      To avoid these problems, define a new ioctl LOGICAL_INO_V2.  We can
      use the same argument layout as LOGICAL_INO, but shorten the reserved[]
      array by one element and turn it into the 'flags' field.  The V2 ioctl
      explicitly checks that reserved fields and unsupported flag bits are zero
      so that userspace can negotiate future feature bits as they are defined.
      
      Since the memory layouts of the two ioctls' arguments are compatible,
      there is no need for a separate function for logical_to_ino_v2 (contrast
      with tree_search_v2 vs tree_search where the layout and code are quite
      different).  A version parameter and an 'if' statement will suffice.
      
      Now that we have a flags field in logical_ino_args, add a flag
      BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want,
      and pass it down the stack to iterate_inodes_from_logical.
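
The layout compatibility and the V2 zero checks can be modelled outside the kernel. The sketch below assumes the argument struct is seven 64-bit fields (logical, size, the reserved array, then the inodes pointer) and that the IGNORE_OFFSET flag is bit 0; both are assumptions for illustration and should be checked against the actual header. It does not issue any ioctl.

```python
import struct

# Assumed layouts (illustrative, verify against the real uapi header):
#   V1: logical, size, reserved[4], inodes
#   V2: logical, size, reserved[3], flags, inodes
V1_FORMAT = "=QQ4QQ"
V2_FORMAT = "=QQ3QQQ"

IGNORE_OFFSET = 1 << 0           # assumed value of the new flag
SUPPORTED_FLAGS = IGNORE_OFFSET  # flags this model "kernel" understands


def pack_v2_args(logical, size, inodes_ptr, flags=0):
    """Pack V2 arguments with all reserved fields zeroed."""
    return struct.pack(V2_FORMAT, logical, size, 0, 0, 0, flags, inodes_ptr)


def v2_args_are_valid(packed):
    """Model of the V2 check: reserved fields and unknown flag bits must be zero."""
    fields = struct.unpack(V2_FORMAT, packed)
    reserved, flags = fields[2:5], fields[5]
    return not any(reserved) and (flags & ~SUPPORTED_FLAGS) == 0


# Both layouts occupy the same number of bytes, so one handler can serve both.
print(struct.calcsize(V1_FORMAT), struct.calcsize(V2_FORMAT))

# A V1 caller that left its last reserved slot uninitialized would have that
# garbage interpreted as flags if V1 suddenly honoured the field.
stale_v1 = struct.pack(V1_FORMAT, 4096, 0, 0, 0, 0, 0xDEAD, 0)
print(hex(struct.unpack(V2_FORMAT, stale_v1)[5]))
```

The last two lines illustrate why a new ioctl number was preferred over reinterpreting the old one: the same bytes mean different things depending on which layout the caller had in mind.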
      
      Motivation and background, copied from the patchset cover letter:
      
      Suppose we have a file with one extent:
      
          root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a
          root@tester:~# sync
      
      Split the extent by overwriting it in the middle:
      
          root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a
      
      We should now have 3 extent refs to 2 extents, with one block unreachable.
      The extent tree looks like:
      
          root@tester:~# btrfs-debug-tree /dev/vdc -t 2
          [...]
                  item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53
                          extent refs 2 gen 29 flags DATA
                          extent data backref root 5 objectid 261 offset 0 count 2
          [...]
                  item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53
                          extent refs 1 gen 30 flags DATA
                          extent data backref root 5 objectid 261 offset 8192 count 1
          [...]
      
      and the ref tree looks like:
      
          root@tester:~# btrfs-debug-tree /dev/vdc -t 5
          [...]
                  item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53
                          extent data disk byte 1103101952 nr 73728
                          extent data offset 0 nr 8192 ram 73728
                          extent compression(none)
                  item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53
                          extent data disk byte 1103175680 nr 4096
                          extent data offset 0 nr 4096 ram 4096
                          extent compression(none)
                  item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53
                          extent data disk byte 1103101952 nr 73728
                          extent data offset 12288 nr 61440 ram 73728
                          extent compression(none)
          [...]
      
      There are two references to the same extent with different, non-overlapping
      byte offsets:
      
          [------------------72K extent at 1103101952----------------------]
          [--8K----------------|--4K unreachable----|--60K-----------------]
          ^                                         ^
          |                                         |
          [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--]
                               |
                               v
                               [-----4K extent-----] at 1103175680
      
      We want to find all of the references to extent bytenr 1103101952.
      
      Without the patch (and without running btrfs-debug-tree), we have to
      do it with 18 LOGICAL_INO calls:
      
          root@tester:~# btrfs ins log 1103101952 -P /test/
          Using LOGICAL_INO
          inode 261 offset 0 root 5
      
          root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode
          inode 261 offset 0 root 5
          inode 261 offset 4096 root 5   <- same extent ref as offset 0
                                         (offset 8192 returns empty set, not reachable)
          inode 261 offset 12288 root 5
          inode 261 offset 16384 root 5  \
          inode 261 offset 20480 root 5  |
          inode 261 offset 24576 root 5  |
          inode 261 offset 28672 root 5  |
          inode 261 offset 32768 root 5  |
          inode 261 offset 36864 root 5  \
          inode 261 offset 40960 root 5   > all the same extent ref as offset 12288.
          inode 261 offset 45056 root 5  /  More processing required in userspace
          inode 261 offset 49152 root 5  |  to figure out these are all duplicates.
          inode 261 offset 53248 root 5  |
          inode 261 offset 57344 root 5  |
          inode 261 offset 61440 root 5  |
          inode 261 offset 65536 root 5  |
          inode 261 offset 69632 root 5  /
      
      In the worst case the extents are 128MB long, and we have to do 32768
      iterations of the loop to find one 4K extent ref.
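
The call counts quoted above follow from dividing the extent length by the 4K block size. A small sketch of that arithmetic, with a hypothetical helper name:

```python
BLOCK_SIZE = 4096  # 4K blocks, as in the example above


def probes_needed(extent_bytes, block_size=BLOCK_SIZE):
    """Number of per-block LOGICAL_INO probes needed to cover one extent."""
    return -(-extent_bytes // block_size)  # ceiling division


# The 72K extent from the example: one probe per 4K block.
print(probes_needed(73728))

# Worst case from the text: a 128MB extent.
print(probes_needed(128 * 1024 * 1024))

# Logical addresses probed by the shell loop (x = 0..17).
addresses = [1103101952 + x * BLOCK_SIZE for x in range(probes_needed(73728))]
print(addresses[0], addresses[-1])
```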
      
      With the patch, we just use one call to map all refs to the extent at once:
      
          root@tester:~# btrfs ins log 1103101952 -P /test/
          Using LOGICAL_INO_V2
          inode 261 offset 0 root 5
          inode 261 offset 12288 root 5
      
      The TREE_SEARCH ioctl allows userspace to retrieve the offset and
      extent bytenr fields easily once the root, inode and offset are known.
      This is sufficient information to build a complete map of the extent
      and all of its references.  Userspace can use this information to make
      better choices to dedup or defrag.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      [ copy background and motivation from cover letter ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 16 Aug, 2017 1 commit
    • btrfs: Add zstd support · 5c1aab1d
      Nick Terrell committed
      Add zstd compression and decompression support to BtrFS. zstd at its
      fastest level compresses almost as well as zlib, while offering much
      faster compression and decompression, approaching lzo speeds.
      
      I benchmarked btrfs with zstd compression against no compression, lzo
      compression, and zlib compression. I benchmarked two scenarios: copying
      a set of files to btrfs and then reading the files, and copying a tarball
      to btrfs, extracting it to btrfs, and then reading the extracted files.
      After every operation, I call `sync` and include the sync time.
      Between every pair of operations I unmount and remount the filesystem
      to avoid caching. The benchmark files can be found in the upstream
      zstd source repository under
      `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
      [1] [2].
      
      I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
      The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
      16 GB of RAM, and an SSD.
      
      The first compression benchmark is copying 10 copies of the unzipped
      Silesia corpus [3] into a BtrFS filesystem mounted with
      `-o compress-force=Method`. The decompression benchmark times how long
      it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
      measured by comparing the output of `df` and `du`. See the benchmark file
      [1] for details. I benchmarked multiple zstd compression levels, although
      the patch uses zstd level 1.
      
      | Method  | Ratio | Compression MB/s | Decompression MB/s  |
      |---------|-------|------------------|---------------------|
      | None    |  0.99 |              504 |                 686 |
      | lzo     |  1.66 |              398 |                 442 |
      | zlib    |  2.58 |               65 |                 241 |
      | zstd 1  |  2.57 |              260 |                 383 |
      | zstd 3  |  2.71 |              174 |                 408 |
      | zstd 6  |  2.87 |               70 |                 398 |
      | zstd 9  |  2.92 |               43 |                 406 |
      | zstd 12 |  2.93 |               21 |                 408 |
      | zstd 15 |  3.01 |               11 |                 354 |
      
      The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
      measures the compression ratio, extracts the tar, and deletes the tar.
      Then it measures the compression ratio again, and `tar`s the extracted
      files into `/dev/null`. See the benchmark file [2] for details.
      
      | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
      |--------|-----------|---------------|----------|------------|----------|
      | None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
      | lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
      | zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
      | zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |
      
      [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
      [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
      [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
      [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
      
      zstd source repository: https://github.com/facebook/zstd
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 20 Jun, 2017 1 commit
    • Btrfs: btrfs_ioctl_search_key documentation · 1a63143d
      Hans van Kranenburg committed
      A programmer who is trying to implement calling the btrfs SEARCH
      or SEARCH_V2 ioctl will probably soon end up reading this struct
      definition.
      
      Properly document the input fields to prevent common misconceptions:
       1. The search space is linear, not 3 dimensional. The individual min/max
       values for objectid, type and offset cannot be used to filter the
       result, they only define the endpoints of an interval.
       2. The transaction id (a.k.a. generation) filter applies only to the
       transaction id of the last COW operation on a whole metadata page, not
       on individual items.
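
Point 1 can be illustrated with plain tuple comparison, which orders (objectid, type, offset) the same linear way. The function names and the key values below are illustrative only; no ioctl is issued.

```python
def in_search_interval(key, min_key, max_key):
    """Linear interpretation: keys sort as (objectid, type, offset) tuples and
    the search returns everything between the two endpoints."""
    return min_key <= key <= max_key


def naive_3d_filter(key, min_key, max_key):
    """The common misconception: each field filtered independently."""
    return all(lo <= k <= hi for k, lo, hi in zip(key, min_key, max_key))


# Endpoints ask for "type 1, offset 0" on objectids 256..258 (illustrative values).
min_key = (256, 1, 0)
max_key = (258, 1, 0)

# This item has a different type and a large offset, yet it lies between the
# endpoints because its objectid is strictly inside the range.
item = (257, 108, 4096)

print(in_search_interval(item, min_key, max_key))
print(naive_3d_filter(item, min_key, max_key))
```

A caller that wants only one item type therefore still has to filter the returned items itself.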
      
      Ad 1. The first misunderstanding was helped by the previous misleading
      comments on min/max type and offset:
        "keys returned will be >= min and <= max".
      
      Ad 2. For example, running btrfs balance will happily cause rewriting of
      metadata pages that contain a filesystem tree of a read only subvolume,
      causing transids to be increased.
      
      Also, improve descriptions of tree_id and nr_items and add in/out
      annotations.
      Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 18 Apr, 2017 1 commit
  5. 07 Mar, 2017 1 commit
  6. 04 Oct, 2016 1 commit
    • Btrfs: catch invalid free space trees · 6675df31
      Omar Sandoval committed
      There are two separate issues that can lead to corrupted free space
      trees.
      
      1. The free space tree bitmaps had an endianness issue on big-endian
         systems which is fixed by an earlier patch in this series.
      2. btrfs-progs before v4.7.3 modified filesystems without updating the
         free space tree.
      
      To catch both of these issues at once, we need to force the free space
      tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit.
      If the bit isn't set, we know that it was either produced by a broken
      big-endian kernel or may have been corrupted by btrfs-progs.
      
      This also provides us with a way to add rudimentary read-write support
      for the free space tree to btrfs-progs: it can just clear this bit and
      have the kernel rebuild the free space tree.
      
      Cc: stable@vger.kernel.org # 4.5+
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 26 Jul, 2016 1 commit
  8. 30 May, 2016 1 commit
  9. 28 Apr, 2016 8 commits
  10. 27 Oct, 2015 3 commits
  11. 03 Feb, 2015 1 commit
  12. 21 Nov, 2014 1 commit
    • Btrfs: return failure if btrfs_dev_replace_finishing() failed · 2fc9f6ba
      Eryu Guan committed
      Device replace could fail due to another running scrub process or any
      other errors btrfs_scrub_dev() may hit, but this failure doesn't get
      returned to userspace.
      
      The following steps reproduce this issue:
      
      	mkfs -t btrfs -f /dev/sdb1 /dev/sdb2
      	mount /dev/sdb1 /mnt/btrfs
      	while true; do btrfs scrub start -B /mnt/btrfs >/dev/null 2>&1; done &
      	btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs
      	# if this replace succeeded, do the following and repeat until
      	# you see this log in dmesg
      	# BTRFS: btrfs_scrub_dev(/dev/sdb2, 2, /dev/sdb3) failed -115
      	#btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs
      
      	# once you see the error log in dmesg, check return value of
      	# replace
      	echo $?
      
      Introduce a new dev replace result
      
      BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS
      
      to catch -EINPROGRESS explicitly and return other errors directly to
      userspace.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  13. 29 Jun, 2014 1 commit
  14. 14 Jun, 2014 1 commit
  15. 10 Jun, 2014 2 commits
    • btrfs: retrieve more info from FS_INFO ioctl · 80a773fb
      David Sterba committed
      Provide basic information about the filesystem through the ioctl:
      * b-tree node size (same as leaf size)
      * sector size
      * expected alignment of CLONE_RANGE and EXTENT_SAME ioctl arguments
      
      Backward compatibility: if the values are 0, the kernel does not provide
      this information and applications should ignore them.
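
A sketch of how an application might honour the backward-compatibility rule, treating a zero field as "not reported". The field names (`nodesize`, `sectorsize`, `clone_alignment`) and the helper functions are assumptions for illustration; the values would come from the ioctl result in a real program.

```python
def provided(value):
    """Return the value, or None when the kernel left the field at 0."""
    return value if value != 0 else None


def summarize_fs_info(nodesize, sectorsize, clone_alignment):
    """Map raw FS_INFO fields to optional values (field names are assumed)."""
    return {
        "nodesize": provided(nodesize),
        "sectorsize": provided(sectorsize),
        "clone_alignment": provided(clone_alignment),
    }


# A kernel that fills in the new fields.
print(summarize_fs_info(16384, 4096, 4096))

# An older kernel that leaves them zeroed: the application ignores them.
print(summarize_fs_info(0, 0, 0))
```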
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: balance filter: add limit of processed chunks · 7d824b6f
      David Sterba committed
      This started as a debugging helper, to watch the effects of converting
      between raid levels on multiple devices, but could be useful standalone.
      
      In my case the usage filter was not fine-grained enough and led to
      converting too many chunks at once. Another example use is in connection
      with drange+devid or vrange filters, which make it possible to work with
      a specific chunk or even with a chunk on a given device.
      
      The limit filter applies last; a value of 0 means no limit.
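
The ordering rule can be modelled as "filter first, then truncate". This is only a sketch of the semantics stated above, using hypothetical chunk records and a hypothetical usage predicate, not the kernel's balance code.

```python
def select_chunks(chunks, filters, limit=0):
    """Apply every filter, then apply the limit last; limit 0 means unlimited."""
    matched = [c for c in chunks if all(f(c) for f in filters)]
    return matched if limit == 0 else matched[:limit]


# Hypothetical chunks described only by their usage percentage.
chunks = [{"id": i, "usage": u} for i, u in enumerate([10, 80, 30, 20, 95, 5])]
low_usage = lambda c: c["usage"] < 50  # stand-in for the usage filter

print([c["id"] for c in select_chunks(chunks, [low_usage])])
print([c["id"] for c in select_chunks(chunks, [low_usage], limit=2)])
```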
      
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Hugo Mills <hugo@carfax.org.uk>
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
  16. 15 Feb, 2014 1 commit
  17. 29 Jan, 2014 2 commits
    • btrfs: add ioctl to export size of global metadata reservation · 01e219e8
      Jeff Mahoney committed
      btrfs filesystem df output will show the size of the metadata space
      and how much of it is used, and the user assumes that the difference
      is all usable space. Since that's not actually the case due to the
      global metadata reservation, we should provide the full picture to the
      user.
      
      This patch adds an ioctl that exports the size of the global metadata
      reservation so that btrfs filesystem df can report it.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctls to query/change feature bits online · 2eaa055f
      Jeff Mahoney committed
      There are some feature bits that require no offline setup and can
      be enabled online. I've only reviewed extended irefs, but there will
      probably be more.
      
      We introduce three new ioctls:
      - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
      - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs
        basis, as well as querying for which features are changeable while mounted.
      - BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.
      
      We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that
      allow us to define which features are safe to change at runtime.
      
      The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
      - Enabling a completely unsupported feature: warns and returns -ENOTSUPP
      - Enabling a feature that can only be done offline: warns and returns -EPERM
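
The two failure modes can be sketched as mask checks. The feature bits and the function name are hypothetical, error names are returned as strings to match the text, and the check on cleared bits is an assumption derived from the existence of the _SAFE_CLEAR mask rather than something the message spells out.

```python
def check_feature_change(set_mask, clear_mask, supported, safe_set, safe_clear):
    """Return None if the change is allowed, else the error name from the text."""
    if set_mask & ~supported:
        return "ENOTSUPP"  # enabling a completely unsupported feature
    if set_mask & ~safe_set:
        return "EPERM"     # supported, but can only be enabled offline
    if clear_mask & ~safe_clear:
        return "EPERM"     # assumption: same rule for clearing bits
    return None


# Hypothetical feature bits for illustration.
FEATURE_A = 1 << 0  # supported and safe to enable online
FEATURE_B = 1 << 1  # supported, offline only
FEATURE_C = 1 << 2  # unknown to this kernel

SUPPORTED = FEATURE_A | FEATURE_B
SAFE_SET = FEATURE_A
SAFE_CLEAR = 0

for bit in (FEATURE_A, FEATURE_B, FEATURE_C):
    print(bit, check_feature_change(bit, 0, SUPPORTED, SAFE_SET, SAFE_CLEAR))
```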
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  18. 01 Sep, 2013 2 commits
  19. 14 Jun, 2013 2 commits
    • btrfs: device delete to get errors from the kernel · 183860f6
      Anand Jain committed
      When the user runs btrfs dev del, any RAID requirement error goes to
      /var/log/messages. It is not a good idea to clutter the system log
      with these user (knowledge) errors, and the user should not have to
      review the system messages to find out what went wrong with the CLI;
      the error should be returned to the user as part of the CLI return.
      
      To support this, create a set of BTRFS_ERROR_DEV* error codes and
      their error strings.
      
      I expect other errors that we might want to communicate to userland
      to be added to this enum.
      
      v3:
      moved the code within the file, no logical change
      
      v1->v2:
      introduce error codes for the device mgmt usage
      
      v1:
      adds a parameter in the ioctl arg struct to carry the error string
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: add ioctl to wait for qgroup rescan completion · 57254b6e
      Jan Schmidt committed
      btrfs_qgroup_wait_for_completion waits until the currently running qgroup
      operation completes. It returns immediately when no rescan process is in
      progress. This is useful to automate things around the rescan process (e.g.
      testing).
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  20. 07 May, 2013 2 commits
    • Btrfs: rescan for qgroups · 2f232036
      Jan Schmidt committed
      If qgroup tracking is out of sync, a rescan operation can be started. It
      iterates the complete extent tree and recalculates all qgroup tracking data.
      This is an expensive operation and should not be used unless required.
      
      A filesystem under rescan can still be unmounted. The rescan continues on the
      next mount.  Status information is provided with a separate ioctl while a
      rescan operation is in progress.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: allow omitting stream header and end-cmd for btrfs send · c2c71324
      Stefan Behrens committed
      Two new flags are added to allow omitting the stream header and the
      end command for btrfs send streams. This is used in cases where you
      send multiple snapshots back-to-back in one stream.
      
      This used to be encoded like this (with 2 snapshots in this example):
      <stream header> + <sequence of commands> + <end cmd> +
      <stream header> + <sequence of commands> + <end cmd> + EOF
      
      The new format (if the two new flags are used) is this one:
      <stream header> + <sequence of commands> +
                        <sequence of commands> + <end cmd>
      
      Note that the currently existing receivers treat <end cmd> only as
      an indication that a new <stream header> is following. This means
      you can just skip the sequence <end cmd> <stream header> without
      losing compatibility. As long as an EOF is following, the currently
      existing receivers handle the new format (if the two new flags are
      used) exactly as the old one.
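
The two encodings and the compatibility argument can be modelled with token lists. The `omit_header` and `omit_end` parameters stand in for the two new flags (the message does not name them), and `receiver_view` is a deliberately simplified model of an existing receiver that only cares about the command sequences.

```python
HEADER = "<stream header>"
END = "<end cmd>"


def encode_snapshot(commands, omit_header=False, omit_end=False):
    """Encode one snapshot; the two booleans model the two new send flags."""
    parts = []
    if not omit_header:
        parts.append(HEADER)
    parts.append(commands)
    if not omit_end:
        parts.append(END)
    return parts


def old_format(snapshots):
    """Every snapshot carries its own header and end command."""
    stream = []
    for commands in snapshots:
        stream += encode_snapshot(commands)
    return stream


def new_format(snapshots):
    """Header only on the first snapshot, end command only on the last."""
    stream = []
    last = len(snapshots) - 1
    for i, commands in enumerate(snapshots):
        stream += encode_snapshot(commands, omit_header=i > 0, omit_end=i < last)
    return stream


def receiver_view(stream):
    """Simplified receiver: headers and end commands carry no payload."""
    return [part for part in stream if part not in (HEADER, END)]


snaps = ["<commands 1>", "<commands 2>"]
print(old_format(snaps))
print(new_format(snaps))
print(receiver_view(old_format(snaps)) == receiver_view(new_format(snaps)))
```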
      
      So what is the benefit of this change? The goal is to be able to use
      a single stream (one TCP connection) to multiplex a request/response
      handshake plus Btrfs send streams, all in the same stream. In this
      case you cannot evaluate an EOF condition as an end of the Btrfs send
      stream. You need something else, and the <end cmd> is just perfect
      for this purpose.
      
      The summary is:
      The format change is driven by the need to send several Btrfs send
      streams over a single TCP connection, with the ability for a repeated
      request/response handshake in the middle. This format change does
      not break any existing tool; it is completely compatible.
      
      You could compare the old behaviour of the Btrfs send stream to that
      of ftp, where you need a separate request/response channel and
      newly opened data transfer channels for each file, while the new
      behaviour is more like http, using a single stream for everything.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  21. 21 Feb, 2013 3 commits
  22. 20 Feb, 2013 1 commit
  23. 17 Dec, 2012 1 commit