1. 08 Feb 2014, 1 commit
  2. 29 Jan 2014, 3 commits
    • Btrfs: add support for inode properties · 63541927
      Authored by Filipe David Borba Manana
      This change adds infrastructure to allow for generic properties for
      inodes. Properties are name/value pairs that can be associated with
      inodes for different purposes. They are stored as xattrs with the
      prefix "btrfs."
      
      Properties can be inherited - this means when a directory inode has
      inheritable properties set, these are added to new inodes created
      under that directory. Further, subvolumes can also have properties
      associated with them, and they can be inherited from their parent
      subvolume. Naturally, directory properties have priority over subvolume
      properties (in practice a subvolume property is just a regular
      property associated with the root inode, objectid 256, of the
      subvolume's fs tree).
      
      This change also adds one specific property implementation, named
      "compression", whose values can be "lzo" or "zlib" and it's an
      inheritable property.
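
      Since properties are plain extended attributes under the "btrfs."
      prefix, they should also be readable and settable with the standard
      xattr syscalls. A minimal sketch in C; the directory path is
      hypothetical:

        #include <stdio.h>
        #include <sys/types.h>
        #include <sys/xattr.h>

        int main(void)
        {
            const char *dir = "/mnt/btrfs/testdir";  /* hypothetical path */
            char value[16];
            ssize_t len;

            /* New files created under 'dir' will inherit this property. */
            if (setxattr(dir, "btrfs.compression", "lzo", 3, 0) != 0) {
                perror("setxattr");
                return 1;
            }
            len = getxattr(dir, "btrfs.compression", value, sizeof(value) - 1);
            if (len < 0) {
                perror("getxattr");
                return 1;
            }
            value[len] = '\0';
            printf("btrfs.compression = %s\n", value);
            return 0;
        }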
      
      The corresponding changes to btrfs-progs were also implemented.
      A patch with xfstests for this feature will follow once there's
      agreement on this change/feature.
      
      Further, the script at the bottom of this commit message was used to
      do some benchmarks to measure any performance penalties of this feature.
      
      Basically the tests correspond to:
      
      Test 1 - create a filesystem and mount it with compress-force=lzo,
      then sequentially create N files of 64KiB each, measure how long it took
      to create the files, unmount the filesystem, mount it again and
      perform an 'ls -lha' against the test directory holding the N files, and
      report the time the command took.
      
      Test 2 - create a filesystem and don't use any compression option when
      mounting it - instead set the compression property of the subvolume's
      root to 'lzo'. Then create N files of 64KiB each, and report the time it
      took. Then unmount the filesystem, mount it again and perform an 'ls -lha'
      like in the former test. This means every single file ends up with a
      property (xattr) associated with it.
      
      Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
      compression property and have no real effect other than adding more work
      when inheriting properties and taking up more btree leaf space.
      
      Test 4 - same as test 3 but with 10 properties per file.
      
      Results (in seconds, averages of 5 runs each) for different numbers N
      of files follow.
      
      * Without properties (test 1)
      
                          file creation time        ls -lha time
      10 000 files              3.49                   0.76
      100 000 files            47.19                   8.37
      1 000 000 files         518.51                 107.06
      
      * With 1 property (compression property set to lzo - test 2)
      
                          file creation time        ls -lha time
      10 000 files              3.63                    0.93
      100 000 files            48.56                    9.74
      1 000 000 files         537.72                  125.11
      
      * With 4 properties (test 3)
      
                          file creation time        ls -lha time
      10 000 files              3.94                    1.20
      100 000 files            52.14                   11.48
      1 000 000 files         572.70                  142.13
      
      * With 10 properties (test 4)
      
                          file creation time        ls -lha time
      10 000 files              4.61                    1.35
      100 000 files            58.86                   13.83
      1 000 000 files         656.01                  177.61
      
      The increased latencies with properties are essentially due to:
      
      *) When creating an inode, we now synchronously write 1 more item
         (an xattr item) for each property inherited from the parent dir
         (or subvolume). This could be done asynchronously, as we do for
         dir index items (delayed-inode.c), which could help reduce the
         file creation latency;
      
      *) With properties, we now have larger fs trees. For this particular
         test each xattr item uses 75 bytes of leaf space in the fs tree.
         This could be reduced by using a new item type for xattrs, instead
         of the current btrfs_dir_item, since we could cut the 'location'
         and 'type' fields (the location key alone is 17 bytes, plus 1 byte
         of 'type', saving 18 bytes) and maybe 'transid' too (8 more bytes,
         saving a total of 26 bytes per xattr item).
      
      Also tried batching the xattr insertions (without proper hash
      collision handling, which didn't exist yet) when creating files that
      inherit properties from their parent inode/subvolume, but the end
      results were (surprisingly) essentially the same.
      
      Test script:
      
      $ cat test.pl
        #!/usr/bin/perl -w
      
        use strict;
        use Time::HiRes qw(time);
        use IO::Handle;  # for autoflush() on Perls older than 5.14
        use constant NUM_FILES => 10_000;
        use constant FILE_SIZES => (64 * 1024);
        use constant DEV => '/dev/sdb4';
        use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
        use constant TEST_DIR => (MNT_POINT . '/testdir');
      
        system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";
      
        # following line for testing without properties
        #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        # following 2 lines for testing with properties
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
        system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";
      
        system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
        my ($t1, $t2);
      
        $t1 = time();
        for (my $i = 1; $i <= NUM_FILES; $i++) {
            my $p = TEST_DIR . '/file_' . $i;
            open(my $f, '>', $p) or die "Error opening file!";
            $f->autoflush(1);
            for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
                print $f ('A' x 4096) or die "Error writing to file!";
            }
            close($f);
        }
        $t2 = time();
        print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        $t1 = time();
        system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
        $t2 = time();
        print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctl to export size of global metadata reservation · 01e219e8
      Authored by Jeff Mahoney
      btrfs filesystem df output will show the size of the metadata space
      and how much of it is used, and the user assumes that the difference
      is all usable space. Since that's not actually the case due to the
      global metadata reservation, we should provide the full picture to the
      user.
      
      This patch adds an ioctl that exports the size of the global metadata
      reservation so that btrfs filesystem df can report it.
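
      A minimal userspace sketch; it assumes the new ioctl is exposed in
      <linux/btrfs.h> as BTRFS_IOC_GLOBAL_RSV and copies the reserved byte
      count into a __u64:

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <linux/btrfs.h>

        int main(int argc, char **argv)
        {
            const char *path = argc > 1 ? argv[1] : "/mnt";  /* any path on the fs */
            __u64 rsv = 0;
            int fd = open(path, O_RDONLY);

            if (fd < 0) {
                perror("open");
                return 1;
            }
            if (ioctl(fd, BTRFS_IOC_GLOBAL_RSV, &rsv) < 0) {
                perror("BTRFS_IOC_GLOBAL_RSV");
                return 1;
            }
            printf("global metadata reservation: %llu bytes\n",
                   (unsigned long long)rsv);
            return 0;
        }
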
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctls to query/change feature bits online · 2eaa055f
      Authored by Jeff Mahoney
      There are some feature bits that require no offline setup and can
      be enabled online. I've only reviewed extended irefs, but there will
      probably be more.
      
      We introduce three new ioctls:
      - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
      - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs
        basis, as well as querying which features are changeable while mounted.
      - BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.
      
      We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that
      allow us to define which features are safe to change at runtime.
      
      The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
      - Enabling a completely unsupported feature: warns and returns -ENOTSUPP
      - Enabling a feature that can only be done offline: warns and returns -EPERM
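
      A rough sketch of querying enabled features from userspace; it
      assumes the ioctl fills a struct btrfs_ioctl_feature_flags carrying
      the compat, compat_ro and incompat masks:

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <linux/btrfs.h>

        int main(int argc, char **argv)
        {
            struct btrfs_ioctl_feature_flags flags;
            const char *path = argc > 1 ? argv[1] : "/mnt";  /* any path on the fs */
            int fd = open(path, O_RDONLY);

            if (fd < 0) {
                perror("open");
                return 1;
            }
            if (ioctl(fd, BTRFS_IOC_GET_FEATURES, &flags) < 0) {
                perror("BTRFS_IOC_GET_FEATURES");
                return 1;
            }
            printf("compat 0x%llx compat_ro 0x%llx incompat 0x%llx\n",
                   (unsigned long long)flags.compat_flags,
                   (unsigned long long)flags.compat_ro_flags,
                   (unsigned long long)flags.incompat_flags);
            return 0;
        }
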
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 24 Jan 2014, 6 commits
  4. 23 Jan 2014, 3 commits
  5. 22 Jan 2014, 3 commits
  6. 21 Jan 2014, 8 commits
  7. 20 Jan 2014, 2 commits
  8. 19 Jan 2014, 1 commit
    • net: introduce SO_BPF_EXTENSIONS · ea02f941
      Authored by Michal Sekletar
      For user space packet capturing libraries such as libpcap, there's
      currently only one way to check which BPF extensions are supported
      by the kernel, namely the behavior introduced in commit aa1113d9
      ("net: filter: return -EINVAL if BPF_S_ANC* operation is not
      supported"). For querying all extensions at once this is rather
      inconvenient.
      
      Therefore, this patch introduces a new option which can be used as
      an argument for getsockopt(), and allows one to obtain information
      about which BPF extensions are supported by the current kernel.
      
      As David Miller suggests, we do not need to define any bits right
      now and the status quo can just return 0 in order to state that this
      version supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
      additions to BPF extensions need to add their bits to the
      bpf_tell_extensions() function, as documented in the comment.
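
      A minimal sketch of the intended getsockopt() usage; the fallback
      definition of SO_BPF_EXTENSIONS is an assumption for older headers
      (the value is architecture-dependent):

        #include <stdio.h>
        #include <sys/socket.h>

        #ifndef SO_BPF_EXTENSIONS
        #define SO_BPF_EXTENSIONS 48  /* asm-generic value; assumption */
        #endif

        int main(void)
        {
            int val = 0, fd = socket(AF_INET, SOCK_DGRAM, 0);
            socklen_t len = sizeof(val);

            if (fd < 0) {
                perror("socket");
                return 1;
            }
            if (getsockopt(fd, SOL_SOCKET, SO_BPF_EXTENSIONS, &val, &len) < 0) {
                perror("getsockopt");
                return 1;
            }
            /* 0 currently means SKF_AD_PROTOCOL..SKF_AD_PAY_OFFSET work. */
            printf("BPF extensions: %d\n", val);
            return 0;
        }
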
      Signed-off-by: Michal Sekletar <msekleta@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 18 Jan 2014, 1 commit
  10. 17 Jan 2014, 7 commits
  11. 16 Jan 2014, 2 commits
  12. 14 Jan 2014, 3 commits
    • md: Change handling of save_raid_disk and metadata update during recovery. · f466722c
      Authored by NeilBrown
      Since commit d70ed2e4
         MD: Allow restarting an interrupted incremental recovery.
      
      we don't write out the metadata to devices while they are recovering.
      This had a good reason, but has unfortunate consequences.  This patch
      changes things to make them work better.
      
      At issue is what happens if the array is shut down while a recovery is
      happening, particularly a bitmap-guided recovery.
      Ideally the recovery should pick up where it left off.
      However the metadata cannot represent the state "A recovery is in
      process which is guided by the bitmap".
      
      Before the above mentioned commit, we wrote metadata to the device
      which said "this is being recovered and it is up to <here>".  So after
      a restart, a full recovery (not bitmap-guided) would happen from
      where-ever it was up to.
      
      After the commit the metadata wasn't updated so it still said "This
      device is fully in sync with <this> event count".  That leads to a
      bitmap-based recovery following the whole bitmap, which should be a
      lot less work than a full recovery from some starting point.  So this
      was an improvement.
      
      However, updating some metadata but not all leads to other problems.
      In particular, the metadata written to the fully-up-to-date device
      records that the array has all devices present (even though some are
      recovering).  So on restart, mdadm wants to find all devices and
      expects them to have current event counts.
      Obviously it doesn't (some have old event counts) so (when assembling
      with --incremental) it waits indefinitely for the rest of the expected
      devices.
      
      It really is wrong not to update all the metadata together.  Doing
      that is bound to cause confusion.
      Instead, we should make it possible to record the truth in the
      metadata.  i.e. we need to be able to record that a device is being
      recovered based on the bitmap.
      We already have a Feature flag to say that recovery is happening.  We
      now add another one to say that it is a bitmap-based recovery.
      
      With this we can remove the code that disables the write-out of
      metadata on some devices.
      
      So this patch:
       - moves the setting of 'saved_raid_disk' from add_new_disk to
         the validate_super methods.  This makes sure it is always set
         properly, both when adding a new device to an array, and when
         assembling an array from a collection of devices.
       - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
         used if MD_FEATURE_RECOVERY_OFFSET is set, and records that a
         bitmap-based recovery is allowed.
         This is only present in v1.x metadata. v0.90 doesn't support
         devices which are in the middle of recovery at all.
       - Only skips writing metadata to Faulty devices.
      
       - Also allows rdev state to be set to "-insync" via sysfs.
         This can be used for external-metadata arrays.  When the
         'role' is set, the device is assumed to be in-sync.  If, after
         setting the role, we set the state to "-insync", the role is
         moved to saved_raid_disk, which effectively says the device is
         partly in-sync with that slot and needs a bitmap recovery (see
         the sketch below).
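
      A hedged sketch of that sysfs flow in C; the array and device names
      (md0, dev-sdb) are illustrative only:

        #include <stdio.h>

        int main(void)
        {
            /* Assumes the device's 'role' (slot) was set first; writing
             * "-insync" then moves it to saved_raid_disk, marking the
             * device as needing a bitmap-based recovery. */
            FILE *f = fopen("/sys/block/md0/md/dev-sdb/state", "w");

            if (!f) {
                perror("fopen");
                return 1;
            }
            if (fprintf(f, "-insync\n") < 0)
                perror("write");
            fclose(f);
            return 0;
        }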
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • audit: use define's for audit version · 70249a9c
      Authored by Eric Paris
      Give names to the audit versions, so a userspace programmer knows
      what each version provides.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: add audit_backlog_wait_time configuration option · 51cc83f0
      Authored by Richard Guy Briggs
      readahead-collector abuses the audit logging facility to discover
      which files are accessed at boot time, in order to build a pre-load
      list.
      
      Add a tuning option for audit_backlog_wait_time so that if auditd
      can't keep up, or gets blocked, the callers won't be blocked.
      
      Bump audit_status API version to "2".
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>