1. 08 Feb, 2014 1 commit
  2. 29 Jan, 2014 3 commits
    • Btrfs: add support for inode properties · 63541927
      Filipe David Borba Manana committed
      This change adds infrastructure to allow for generic properties for
      inodes. Properties are name/value pairs that can be associated with
      inodes for different purposes. They are stored as xattrs with the
      prefix "btrfs."
      
      Properties can be inherited - this means when a directory inode has
      inheritable properties set, these are added to new inodes created
      under that directory. Further, subvolumes can also have properties
      associated with them, and they can be inherited from their parent
      subvolume. Naturally, directory properties have priority over subvolume
      properties (in practice a subvolume property is just a regular
      property associated with the root inode, objectid 256, of the
      subvolume's fs tree).
      
      This change also adds one specific property implementation, named
      "compression", whose values can be "lzo" or "zlib" and it's an
      inheritable property.
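The inheritance rule described above can be sketched as a small model. This is only an illustration under stated assumptions: `set_prop`, `new_inode_xattrs`, and the dictionaries standing in for per-inode xattrs are hypothetical names, not kernel or btrfs-progs APIs.

```python
# Toy model of inode properties; all names here are hypothetical.
PROP_PREFIX = "btrfs."
# Property name -> allowed values. "compression" is the only property the
# commit message describes, and it is inheritable.
INHERITABLE_PROPS = {"compression": {"lzo", "zlib"}}


def set_prop(xattrs, name, value):
    """Validate a property and store it as an xattr with the "btrfs." prefix."""
    allowed = INHERITABLE_PROPS.get(name)
    if allowed is None or value not in allowed:
        raise ValueError(f"unsupported property {name}={value}")
    xattrs[PROP_PREFIX + name] = value


def new_inode_xattrs(parent_dir_xattrs):
    """Return the xattrs a new inode inherits from its parent directory."""
    inheritable_keys = {PROP_PREFIX + name for name in INHERITABLE_PROPS}
    return {k: v for k, v in parent_dir_xattrs.items() if k in inheritable_keys}


# The subvolume's root inode (objectid 256) is just another directory here.
subvol_root = {}
set_prop(subvol_root, "compression", "lzo")
subdir = new_inode_xattrs(subvol_root)      # inherits lzo from the subvolume
set_prop(subdir, "compression", "zlib")     # directory-level override
new_file = new_inode_xattrs(subdir)         # sees zlib, not lzo
```

Because a new inode copies from its immediate parent directory in this model, a value set on a directory takes precedence over the one set on the subvolume root for everything created beneath that directory.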
      
      The corresponding changes to btrfs-progs were also implemented.
      A patch with xfstests for this feature will follow once there's
      agreement on this change/feature.
      
      Further, the script at the bottom of this commit message was used to
      do some benchmarks to measure any performance penalties of this feature.
      
      Basically the tests correspond to:
      
      Test 1 - create a filesystem and mount it with compress-force=lzo,
      then sequentially create N files of 64 KiB each, measure how long it took
      to create the files, unmount the filesystem, mount the filesystem and
      perform an 'ls -lha' against the test directory holding the N files, and
      report the time the command took.
      
      Test 2 - create a filesystem and don't use any compression option when
      mounting it - instead set the compression property of the subvolume's
      root to 'lzo'. Then create N files of 64 KiB, and report the time it took.
      Then unmount the filesystem, mount it again and perform an 'ls -lha' like
      in the former test. This means every single file ends up with a property
      (xattr) associated to it.
      
      Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
      compression property, have no real effect other than adding more work
      when inheriting properties and taking more btree leaf space.
      
      Test 4 - same as test 3 but with 10 properties per file.
      
      Results (in seconds, and averages of 5 runs each), for different N
      numbers of files follow.
      
      * Without properties (test 1)
      
                          file creation time        ls -lha time
      10 000 files              3.49                   0.76
      100 000 files            47.19                   8.37
      1 000 000 files         518.51                 107.06
      
      * With 1 property (compression property set to lzo - test 2)
      
                          file creation time        ls -lha time
      10 000 files              3.63                    0.93
      100 000 files            48.56                    9.74
      1 000 000 files         537.72                  125.11
      
      * With 4 properties (test 3)
      
                          file creation time        ls -lha time
      10 000 files              3.94                    1.20
      100 000 files            52.14                   11.48
      1 000 000 files         572.70                  142.13
      
      * With 10 properties (test 4)
      
                          file creation time        ls -lha time
      10 000 files              4.61                    1.35
      100 000 files            58.86                   13.83
      1 000 000 files         656.01                  177.61
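To make the tables easier to compare, the relative overhead against the run without properties can be computed directly from the reported numbers. The sketch below only re-uses the timings listed above; the `RESULTS` layout and the `overhead_percent` helper are illustrative names, not part of the original test script.

```python
# Timings in seconds, copied from the result tables (averages of 5 runs).
# Layout: properties per file -> {number of files: (creation, ls -lha)}.
RESULTS = {
    0: {10_000: (3.49, 0.76), 100_000: (47.19, 8.37), 1_000_000: (518.51, 107.06)},
    1: {10_000: (3.63, 0.93), 100_000: (48.56, 9.74), 1_000_000: (537.72, 125.11)},
    4: {10_000: (3.94, 1.20), 100_000: (52.14, 11.48), 1_000_000: (572.70, 142.13)},
    10: {10_000: (4.61, 1.35), 100_000: (58.86, 13.83), 1_000_000: (656.01, 177.61)},
}


def overhead_percent(num_props, num_files):
    """Return (creation, ls) slowdown in percent relative to test 1."""
    base_create, base_ls = RESULTS[0][num_files]
    create, ls = RESULTS[num_props][num_files]
    return (
        round(100.0 * (create / base_create - 1.0), 1),
        round(100.0 * (ls / base_ls - 1.0), 1),
    )


for props in (1, 4, 10):
    print(props, overhead_percent(props, 1_000_000))
```

For 1 000 000 files this works out to roughly 3.7% slower creation and 16.9% slower listing with one property, rising to roughly 26.5% and 65.9% with ten properties.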
      
      The increased latencies with properties are essentially because of:
      
      *) When creating an inode, we now synchronously write 1 more item
         (an xattr item) for each property inherited from the parent dir
         (or subvolume). This could be done in an asynchronous way such
         as we do for dir index items (delayed-inode.c), which could help
         reduce the file creation latency;
      
      *) With properties, we now have larger fs trees. For this particular
         test each xattr item uses 75 bytes of leaf space in the fs tree.
         This could be less by using a new item for xattr items, instead of
         the current btrfs_dir_item, since we could cut the 'location' and
         'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
         total of 26 bytes per xattr item) from the btrfs_dir_item type.
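The 75-byte figure and the possible savings can be reproduced with a short calculation. The field layouts below are an assumption based on the packed, little-endian btrfs on-disk structures (an item header holding a key plus 32-bit offset and size, and a btrfs_dir_item holding a location key, transid, data_len, name_len and type); the property name and value are the ones used in the test.

```python
import struct

# Assumed packed little-endian layouts ("<" disables padding).
DISK_KEY = struct.calcsize("<QBQ")               # objectid, type, offset
ITEM_HEADER = DISK_KEY + struct.calcsize("<II")  # key + data offset + data size
TRANSID = struct.calcsize("<Q")
LENGTHS = struct.calcsize("<HH")                 # data_len, name_len
TYPE = struct.calcsize("<B")
DIR_ITEM = DISK_KEY + TRANSID + LENGTHS + TYPE   # location key + the rest


def xattr_leaf_bytes(name: bytes, value: bytes) -> int:
    """Leaf space used by one xattr stored as a btrfs_dir_item."""
    return ITEM_HEADER + DIR_ITEM + len(name) + len(value)


total = xattr_leaf_bytes(b"btrfs.compression", b"lzo")
saved_location_and_type = DISK_KEY + TYPE
saved_with_transid = saved_location_and_type + TRANSID
print(total, saved_location_and_type, saved_with_transid)
```

Under these assumptions the item header is 25 bytes, the dir item 30 bytes, the name 17 bytes and the value 3 bytes, which adds up to the 75 bytes quoted above; dropping 'location' and 'type' saves 18 bytes, and dropping 'transid' as well saves 26.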
      
      Also tried batching the xattr insertions (ignoring proper hash
      collision handling, since it didn't exist) when creating files that
      inherit properties from their parent inode/subvolume, but the end
      results were (surprisingly) essentially the same.
      
      Test script:
      
      $ cat test.pl
        #!/usr/bin/perl -w
      
        use strict;
        use Time::HiRes qw(time);
        use IO::Handle;  # provides autoflush() on lexical filehandles in older perls
        use constant NUM_FILES => 10_000;
        use constant FILE_SIZES => (64 * 1024);
        use constant DEV => '/dev/sdb4';
        use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
        use constant TEST_DIR => (MNT_POINT . '/testdir');
      
        system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";
      
        # following line for testing without properties
        #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        # following 2 lines for testing with properties
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
        system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";
      
        system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
        my ($t1, $t2);
      
        $t1 = time();
        for (my $i = 1; $i <= NUM_FILES; $i++) {
            my $p = TEST_DIR . '/file_' . $i;
            open(my $f, '>', $p) or die "Error opening file!";
            $f->autoflush(1);
            for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
                print $f ('A' x 4096) or die "Error writing to file!";
            }
            close($f);
        }
        $t2 = time();
        print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        $t1 = time();
        system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
        $t2 = time();
        print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctl to export size of global metadata reservation · 01e219e8
      Jeff Mahoney committed
      btrfs filesystem df output will show the size of the metadata space
      and how much of it is used, and the user assumes that the difference
      is all usable space. Since that's not actually the case due to the
      global metadata reservation, we should provide the full picture to the
      user.
      
      This patch adds an ioctl that exports the size of the global metadata
      reservation so that btrfs filesystem df can report it.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: add ioctls to query/change feature bits online · 2eaa055f
      Jeff Mahoney committed
      There are some feature bits that require no offline setup and can
      be enabled online. I've only reviewed extended irefs, but there will
      probably be more.
      
      We introduce three new ioctls:
      - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
      - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs
        basis, as well as querying for which features are changeable while mounted.
      - BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.
      
      We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that
      allow us to define which features are safe to change at runtime.
      
      The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
      - Enabling a completely unsupported feature: warns and returns -ENOTSUPP
      - Enabling a feature that can only be done offline: warns and returns -EPERM
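The failure modes listed above can be sketched as a mask check. This is a simplified model, not the kernel implementation: the function name, the string return values, and the bit values are hypothetical, and the check on cleared bits is an assumption extrapolated from the existence of the _SAFE_CLEAR mask.

```python
def check_feature_change(set_mask, clear_mask, supported, safe_set, safe_clear):
    """Return an error name, or None if the change may be applied online.

    All arguments are integer bit masks for one feature set.
    """
    # Enabling something the kernel does not know about at all.
    if set_mask & ~supported:
        return "ENOTSUPP"
    # Enabling something that is supported but needs offline setup.
    if set_mask & ~safe_set:
        return "EPERM"
    # Assumption: clearing a bit outside _SAFE_CLEAR is refused the same way.
    if clear_mask & ~safe_clear:
        return "EPERM"
    return None


# Hypothetical bit assignments, for illustration only.
FEATURE_A = 1 << 0   # can be enabled online
FEATURE_B = 1 << 1   # supported, but offline-only
FEATURE_X = 1 << 9   # unknown to this kernel

SUPPORTED = FEATURE_A | FEATURE_B
SAFE_SET = FEATURE_A
SAFE_CLEAR = 0

print(check_feature_change(FEATURE_A, 0, SUPPORTED, SAFE_SET, SAFE_CLEAR))
print(check_feature_change(FEATURE_B, 0, SUPPORTED, SAFE_SET, SAFE_CLEAR))
print(check_feature_change(FEATURE_X, 0, SUPPORTED, SAFE_SET, SAFE_CLEAR))
```

Checking the supported mask before the safe mask keeps the two error cases distinct: a completely unknown bit reports the "unsupported" error rather than the "not permitted online" one.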
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 24 Jan, 2014 5 commits
  4. 23 Jan, 2014 3 commits
  5. 22 Jan, 2014 2 commits
  6. 21 Jan, 2014 5 commits
  7. 20 Jan, 2014 2 commits
  8. 18 Jan, 2014 1 commit
  9. 17 Jan, 2014 3 commits
    • floppy: bail out in open() if drive is not responding to block0 read · 7b7b68bb
      Jiri Kosina committed
      In case reading of block 0 during open() fails, it is not the right thing
      to let open() succeed.
      
      Fix this by introducing FD_OPEN_SHOULD_FAIL_BIT flag, and setting it in
      case the bio callback encounters an error while trying to read block 0.
      
      As a bonus, this works around certain broken userspace (blkid), which is
      not able to properly handle read()s returning IO errors. Hence be nice to
      those, and bail out during open() already; if block 0 is not readable,
      read()s are not going to provide any meaningful data anyway.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • add support for Hyper-V reference time counter · e984097b
      Vadim Rozenfeld committed
      Signed-off: Peter Lieven <pl@kamp.de>
      Signed-off: Gleb Natapov
      Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>
      
      After some consideration I decided to submit only Hyper-V reference
      counters support this time. I will submit iTSC support as a separate
      patch as soon as it is ready.
      
      v1 -> v2
      1. mark TSC page dirty as suggested by
          Eric Northup <digitaleric@google.com> and Gleb
      2. disable local irq when calling get_kernel_ns,
          as it was done by Peter Lieven <pl@kamp.de>
      3. move check for TSC page enable from second patch
          to this one.
      
      v3 -> v4
          Get rid of ref counter offset.
      
      v4 -> v5
          replace __copy_to_user with kvm_write_guest
          when updating iTSC page.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • net_sched: act: pick a different type for act_xt · 6c80563c
      WANG Cong committed
      In tcf_register_action() we check either ->type or ->kind to see if
      there is an existing action registered, but ipt action registers two
      actions with same type but different kinds. They should have different
      types too.
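The clash described above can be sketched with a toy registry that, like the check described for tcf_register_action(), treats a match on either field as a duplicate. The class, the exception used, and the numeric type values are hypothetical and chosen only for illustration.

```python
class ActionRegistry:
    """Toy stand-in for the list of registered actions (hypothetical)."""

    def __init__(self):
        self._ops = []  # list of (kind, type_id) pairs

    def register(self, kind, type_id):
        for known_kind, known_type in self._ops:
            # A match on EITHER the type or the kind counts as a duplicate.
            if known_type == type_id or known_kind == kind:
                raise ValueError(f"{kind!r} clashes with {known_kind!r}")
        self._ops.append((kind, type_id))


registry = ActionRegistry()
registry.register("ipt", 6)          # first action registers fine
try:
    registry.register("xt", 6)       # same type, different kind: rejected
except ValueError as err:
    print("rejected:", err)
registry.register("xt", 10)          # a distinct type resolves the clash
```

Giving the second action its own type value is what lets both registrations pass a check of this shape.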
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 16 Jan, 2014 2 commits
  11. 14 Jan, 2014 4 commits
    • md: Change handling of save_raid_disk and metadata update during recovery. · f466722c
      NeilBrown committed
      Since commit d70ed2e4
         MD: Allow restarting an interrupted incremental recovery.
      
      we don't write out the metadata to devices while they are recovering.
      This had a good reason, but has unfortunate consequences.  This patch
      changes things to make them work better.
      
      At issue is what happens if the array is shut down while a recovery is
      happening, particularly a bitmap-guided recovery.
      Ideally the recovery should pick up where it left off.
      However the metadata cannot represent the state "A recovery is in
      process which is guided by the bitmap".
      
      Before the above mentioned commit, we wrote metadata to the device
      which said "this is being recovered and it is up to <here>".  So after
      a restart, a full recovery (not bitmap-guided) would happen from
      wherever it was up to.
      
      After the commit the metadata wasn't updated so it still said "This
      device is fully in sync with <this> event count".  That leads to a
      bitmap-based recovery following the whole bitmap, which should be a
      lot less work than a full recovery from some starting point.  So this
      was an improvement.
      
      However, updating some metadata but not all leads to other problems.
      In particular, the metadata written to the fully-up-to-date devices
      records that the array has all devices present (even though some are
      recovering).  So on restart, mdadm wants to find all devices and
      expects them to have current event counts.
      Obviously they don't (some have old event counts) so (when assembling
      with --incremental) it waits indefinitely for the rest of the expected
      devices.
      
      It really is wrong to not update all the metadata together.  Doing that
      is bound to cause confusion.
      Instead, we should make it possible to record the truth in the
      metadata.  i.e. we need to be able to record that a device is being
      recovered based on the bitmap.
      We already have a Feature flag to say that recovery is happening.  We
      now add another one to say that it is a bitmap-based recovery.
      
      With this we can remove the code that disables the write-out of
      metadata on some devices.
      
      So this patch:
       - moves the setting of 'saved_raid_disk' from add_new_disk to
         the validate_super methods.  This makes sure it is always set
         properly, both when adding a new device to an array, and when
         assembling an array from a collection of devices.
       - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
         used if MD_FEATURE_RECOVERY_OFFSET is set, and records that a
         bitmap-based recovery is allowed.
         This is only present in v1.x metadata. v0.90 doesn't support
         devices which are in the middle of recovery at all.
       - Only skips writing metadata to Faulty devices.
      
       - Also allows rdev state to be set to "-insync" via sysfs.
         This can be used for external-metadata arrays.  When the
         'role' is set the device is assumed to be in-sync.  If, after
         setting the role, we set the state to "-insync", the role is
         moved to saved_raid_disk which effectively says the device is
         partly in-sync with that slot and needs a bitmap recovery.
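How the two feature flags combine can be sketched as follows. This is a simplified reading of the description above, not the md code: the helper name and the returned labels are hypothetical, and the numeric flag values are assumptions included only so the example runs.

```python
# Assumed bit values; treat them as illustrative.
MD_FEATURE_RECOVERY_OFFSET = 2
MD_FEATURE_RECOVERY_BITMAP = 128


def recovery_kind(feature_map: int) -> str:
    """Classify what a device's metadata says about its recovery state."""
    if not feature_map & MD_FEATURE_RECOVERY_OFFSET:
        # The bitmap flag is only meaningful together with the offset flag.
        return "no recovery recorded"
    if feature_map & MD_FEATURE_RECOVERY_BITMAP:
        return "bitmap-guided recovery"
    return "full recovery from recorded offset"


print(recovery_kind(0))
print(recovery_kind(MD_FEATURE_RECOVERY_OFFSET))
print(recovery_kind(MD_FEATURE_RECOVERY_OFFSET | MD_FEATURE_RECOVERY_BITMAP))
```

With both flags available, the metadata can distinguish a device that needs a full recovery from a recorded position from one that only needs the regions named by the bitmap.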
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • audit: use define's for audit version · 70249a9c
      Eric Paris committed
      Give names to the audit versions.  Just something for a userspace
      programmer to know what the version provides.
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: add audit_backlog_wait_time configuration option · 51cc83f0
      Richard Guy Briggs committed
      readahead-collector abuses the audit logging facility to discover which files
      are accessed at boot time to make a pre-load list.
      
      Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
      or gets blocked, the callers won't be blocked.
      
      Bump audit_status API version to "2".
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: clean up AUDIT_GET/SET local variables and future-proof API · 09f883a9
      Richard Guy Briggs committed
      Re-named confusing local variable names (status_set and status_get didn't agree
      with their command type name) and reduced their scope.
      
      Future-proof API changes by not depending on the exact size of the audit_status
      struct and by adding an API version field.
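One common way to avoid depending on the exact struct size is to zero-fill a full-size buffer and copy only as many bytes as the caller supplied. The sketch below illustrates that idea in user space; the field list, the native-order 32-bit layout, and the `decode_status` helper are assumptions for illustration and are not taken from the patch.

```python
import struct

# Hypothetical field list: a run of unsigned 32-bit values with newer
# fields (such as an API version) appended at the end.
FIELDS = ("mask", "enabled", "failure", "pid", "rate_limit",
          "backlog_limit", "lost", "backlog", "version", "backlog_wait_time")
STATUS_FMT = "=%dI" % len(FIELDS)
STATUS_SIZE = struct.calcsize(STATUS_FMT)


def decode_status(payload: bytes) -> dict:
    """Decode a status message that may be shorter or longer than expected."""
    usable = payload[:STATUS_SIZE]                 # ignore unknown trailing fields
    padded = usable.ljust(STATUS_SIZE, b"\x00")    # missing fields read as zero
    return dict(zip(FIELDS, struct.unpack(STATUS_FMT, padded)))


# A sender that only knows the first eight fields.
old_message = struct.pack("=8I", 1, 1, 0, 0, 0, 64, 0, 0)
status = decode_status(old_message)
print(status["backlog_limit"], status["version"])
```

A short message from an older sender then decodes with the newer fields set to zero, and a longer message from a newer sender is truncated to the fields this side understands.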
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
  12. 13 Jan, 2014 3 commits
  13. 12 Jan, 2014 1 commit
  14. 11 Jan, 2014 1 commit
  15. 10 Jan, 2014 2 commits
    • netfilter: introduce l2tp match extension · 74f77a6b
      James Chapman committed
      Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2
      and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id
      and session-id, the filtering decision can also include the L2TP
      packet type (control or data), protocol version (2 or 3) and
      encapsulation type (UDP or IP).
      
      The most common use for this will likely be to filter L2TP data
      packets of individual L2TP tunnels or sessions. While a u32 match can
      be used, the L2TP protocol headers are such that field offsets differ
      depending on bits set in the header, making rules for matching generic
      L2TP connections cumbersome. This match extension takes care of all
      that.
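The point about field offsets depending on header bits can be illustrated with a minimal L2TPv2 header parser. This sketch assumes the header layout from RFC 2661 (a 16-bit flags/version word, an optional 16-bit length present only when the L bit is set, then 16-bit tunnel and session IDs) and handles only the UDP payload; `parse_l2tpv2` is a hypothetical helper, not part of the match extension.

```python
import struct

L2TP_FLAG_T = 0x8000   # 1 = control message, 0 = data message
L2TP_FLAG_L = 0x4000   # length field present
L2TP_VER_MASK = 0x000F


def parse_l2tpv2(payload: bytes) -> dict:
    """Extract packet type, tunnel ID and session ID from an L2TPv2 header."""
    (flags,) = struct.unpack_from("!H", payload, 0)
    if (flags & L2TP_VER_MASK) != 2:
        raise ValueError("not an L2TPv2 header")
    offset = 2
    if flags & L2TP_FLAG_L:
        offset += 2            # skip the optional length field
    tunnel_id, session_id = struct.unpack_from("!HH", payload, offset)
    return {
        "type": "control" if flags & L2TP_FLAG_T else "data",
        "tunnel_id": tunnel_id,
        "session_id": session_id,
    }


without_length = struct.pack("!HHH", 0x0002, 7, 42)
with_length = struct.pack("!HHHH", 0x4002, 8, 7, 42)
print(parse_l2tpv2(without_length))
print(parse_l2tpv2(with_length))
```

The tunnel ID sits at byte offset 2 in the first packet and at offset 4 in the second, which is why a fixed-offset u32 rule cannot cover both cases with one pattern.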
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_ct: Add support to set the connmark · c4ede3d3
      Kristian Evensen committed
      This patch adds kernel support for setting properties of tracked
      connections. Currently, only connmark is supported. One use-case
      for this feature is to provide the same functionality as
      -j CONNMARK --save-mark in iptables.
      
      Some restructuring was needed to implement the set op. The new
      structure follows that of nft_meta.
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  16. 09 Jan, 2014 2 commits