1. May 29, 2015 (1 commit)
    • D
      percpu_counter: batch size aware __percpu_counter_compare() · 80188b0d
      Committed by Dave Chinner
      XFS uses non-standard batch sizes to avoid frequent global
      counter updates on its allocated inode counters, as they increment
      or decrement in batches of 64 inodes. Hence the standard percpu
      counter batch of 32 means that the counter is effectively a global
      counter. Currently XFS uses a batch size of 128 so that it doesn't
      take the global lock on every single modification.
      
      However, XFS also needs to compare accurately against zero, which
      means we need to use percpu_counter_compare(), and that has a
      hard-coded batch size of 32, and hence will spuriously fail to
      detect when it is supposed to use precise comparisons and hence
      the accounting goes wrong.
      
      Add __percpu_counter_compare() to take a custom batch size so we can
      use it sanely in XFS and factor percpu_counter_compare() to use it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      80188b0d
  2. May 28, 2015 (2 commits)
    • R
      cpumask_set_cpu_local_first => cpumask_local_spread, lament · f36963c9
      Committed by Rusty Russell
      da91309e (cpumask: Utility function to set n'th cpu...) created a
      genuinely weird function.  I never saw it before, it went through DaveM.
      (He only does this to make us other maintainers feel better about our own
      mistakes.)
      
      cpumask_set_cpu_local_first's purpose is to say "I need to spread things
      across N online cpus, choose the ones on this numa node first"; you call
      it in a loop.
      
      It can fail.  One of the two callers ignores this, the other aborts and
      fails the device open.
      
      It can fail in two ways: allocating the off-stack cpumask, or through a
      convoluted codepath which AFAICT can only occur if cpu_online_mask
      changes.  Which shouldn't happen, because if cpu_online_mask can change
      while you call this, it could return a now-offline cpu anyway.
      
      It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
      drawn to my attention by Geert, who said this causes a warning on Sparc.
      It sets a single bit in a cpumask instead of returning a cpu number,
      because that's what the callers want.
      
      It could be made more efficient by passing the previous cpu rather than
      an index, but that would be more invasive to the callers.
      
      Fixes: da91309e
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
      Tested-by: Amir Vadai <amirv@mellanox.com>
      Acked-by: Amir Vadai <amirv@mellanox.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      f36963c9
    • D
      test_bpf: add similarly conflicting jump test case only for classic · bde28bc6
      Committed by Daniel Borkmann
      While 3b529602 ("test_bpf: add more eBPF jump torture cases")
      added the int3 bug test case only for eBPF, which needs exactly 11
      passes to converge, here's a version for classic BPF with 11 passes,
      and one that would need 70 passes on x86_64 to actually converge for
      being successfully JITed. Effectively, all jumps are being optimized
      out resulting in a JIT image of just 89 bytes (from originally max
      BPF insns), only returning K.
      
      Might be useful as a recipe for folks wanting to craft a test case
      when backporting the fix in commit 3f7352bf ("x86: bpf_jit: fix
      compilation of large bpf programs") while not having eBPF. The 2nd
      one is delegated to the interpreter as the last pass still results
      in shrinking, in other words, this one won't be JITed on x86_64.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bde28bc6
  3. May 25, 2015 (1 commit)
  4. May 23, 2015 (1 commit)
  5. May 17, 2015 (1 commit)
    • H
      rhashtable: Add cap on number of elements in hash table · 07ee0722
      Committed by Herbert Xu
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
      degenerate.  Others may encounter OOM when growing and if we allow
      insertions when that happens, the hash table performance may also
      suffer.
      
      This patch adds a new parameter insecure_max_entries which becomes
      the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
      The reasoning is that we're only guarding against a gross over-
      subscription of the table, rather than a small breach of the limit.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07ee0722
  6. May 15, 2015 (2 commits)
    • M
      test_bpf: fix sparse warnings · 56cbaa45
      Committed by Michael Holzheu
      Fix several sparse warnings like:
      lib/test_bpf.c:1824:25: sparse: constant 4294967295 is so big it is long
      lib/test_bpf.c:1878:25: sparse: constant 0x0000ffffffff0000 is so big it is long
      
      Fixes: cffc642d ("test_bpf: add 173 new testcases for eBPF")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56cbaa45
    • D
      test_bpf: add tests related to BPF_MAXINSNS · a4afd37b
      Committed by Daniel Borkmann
      Couple of torture test cases related to the bug fixed in 0b59d880
      ("ARM: net: delegate filter to kernel interpreter when imm_offset()
      return value can't fit into 12bits.").
      
      I've added a helper to allocate and fill the insn space. Output on
      x86_64 from my laptop:
      
      test_bpf: #233 BPF_MAXINSNS: Maximum possible literals jited:0 7 PASS
      test_bpf: #234 BPF_MAXINSNS: Single literal jited:0 8 PASS
      test_bpf: #235 BPF_MAXINSNS: Run/add until end jited:0 11553 PASS
      test_bpf: #236 BPF_MAXINSNS: Too many instructions PASS
      test_bpf: #237 BPF_MAXINSNS: Very long jump jited:0 9 PASS
      test_bpf: #238 BPF_MAXINSNS: Ctx heavy transformations jited:0 20329 20398 PASS
      test_bpf: #239 BPF_MAXINSNS: Call heavy transformations jited:0 32178 32475 PASS
      test_bpf: #240 BPF_MAXINSNS: Jump heavy test jited:0 10518 PASS
      
      test_bpf: #233 BPF_MAXINSNS: Maximum possible literals jited:1 4 PASS
      test_bpf: #234 BPF_MAXINSNS: Single literal jited:1 4 PASS
      test_bpf: #235 BPF_MAXINSNS: Run/add until end jited:1 1625 PASS
      test_bpf: #236 BPF_MAXINSNS: Too many instructions PASS
      test_bpf: #237 BPF_MAXINSNS: Very long jump jited:1 8 PASS
      test_bpf: #238 BPF_MAXINSNS: Ctx heavy transformations jited:1 3301 3174 PASS
      test_bpf: #239 BPF_MAXINSNS: Call heavy transformations jited:1 24107 23491 PASS
      test_bpf: #240 BPF_MAXINSNS: Jump heavy test jited:1 8651 PASS
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4afd37b
  7. May 13, 2015 (1 commit)
  8. May 11, 2015 (1 commit)
  9. May 6, 2015 (4 commits)
  10. May 4, 2015 (7 commits)
    • D
      lib: make memzero_explicit more robust against dead store elimination · 7829fb09
      Committed by Daniel Borkmann
      In commit 0b053c95 ("lib: memzero_explicit: use barrier instead
      of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
      case LTO would decide to inline memzero_explicit() and eventually
      find out it could be eliminated as a dead store.
      
      While using barrier() works well for the case of gcc, recent efforts
      from the LLVMLinux people suggest using llvm as an alternative to gcc,
      and there, Stephan found in a simple stand-alone user space example
      that llvm could nevertheless optimize and thus eliminate the memset().
      A similar issue has been observed in the referenced llvm bug report,
      which is regarded as not-a-bug.
      
      Based on some experiments, icc is a bit special on its own; while it
      doesn't seem to eliminate the memset(), it could do so with its own
      implementation, and then result in similar findings as with llvm.
      
      The fix in this patch now works for all three compilers (also tested
      with more aggressive optimization levels). Arguably, in the current
      kernel tree it's more of a theoretical issue, but imho, it's better
      to be pedantic about it.
      
      It's clearly visible with gcc/llvm though, with the below code: if we
      would have used barrier() only here, llvm would have omitted clearing,
      not so with barrier_data() variant:
      
        static inline void memzero_explicit(void *s, size_t count)
        {
          memset(s, 0, count);
          barrier_data(s);
        }
      
        int main(void)
        {
          char buff[20];
          memzero_explicit(buff, sizeof(buff));
          return 0;
        }
      
        $ gcc -O2 test.c
        $ gdb a.out
        (gdb) disassemble main
        Dump of assembler code for function main:
         0x0000000000400400  <+0>: lea   -0x28(%rsp),%rax
         0x0000000000400405  <+5>: movq  $0x0,-0x28(%rsp)
         0x000000000040040e <+14>: movq  $0x0,-0x20(%rsp)
         0x0000000000400417 <+23>: movl  $0x0,-0x18(%rsp)
         0x000000000040041f <+31>: xor   %eax,%eax
         0x0000000000400421 <+33>: retq
        End of assembler dump.
      
        $ clang -O2 test.c
        $ gdb a.out
        (gdb) disassemble main
        Dump of assembler code for function main:
         0x00000000004004f0  <+0>: xorps  %xmm0,%xmm0
         0x00000000004004f3  <+3>: movaps %xmm0,-0x18(%rsp)
         0x00000000004004f8  <+8>: movl   $0x0,-0x8(%rsp)
         0x0000000000400500 <+16>: lea    -0x18(%rsp),%rax
         0x0000000000400505 <+21>: xor    %eax,%eax
         0x0000000000400507 <+23>: retq
        End of assembler dump.
      
      As gcc, clang, but also icc defines __GNUC__, it's sufficient to define
      this in compiler-gcc.h only to be picked up. For a fallback or otherwise
      unsupported compiler, we define it as a barrier. Similarly, for ecc which
      does not support gcc inline asm.
      
      Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
      Reported-by: Stephan Mueller <smueller@chronox.de>
      Tested-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Stephan Mueller <smueller@chronox.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: mancha security <mancha1@zoho.com>
      Cc: Mark Charlebois <charlebm@gmail.com>
      Cc: Behan Webster <behanw@converseincode.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      7829fb09
    • T
      rhashtable-test: Detect insertion failures · 67b7cbf4
      Committed by Thomas Graf
      Account for failed inserts due to memory pressure or EBUSY and
      ignore failed entries during the consistency check.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67b7cbf4
    • T
      rhashtable-test: Use walker to test bucket statistics · 246b23a7
      Committed by Thomas Graf
      As resizes may continue to run in the background, use a walker to
      ensure we see all entries. Also print the number of rehashes
      queued up while traversing.

      This may lead to warnings due to entries being seen multiple
      times. We consider them non-fatal.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      246b23a7
    • T
      rhashtable-test: Do not allocate individual test objects · fcc57020
      Committed by Thomas Graf
      By far the most expensive part of the selftest was the allocation
      of entries. Using a static array allows us to measure the rhashtable
      operations themselves.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcc57020
    • T
      rhashtable-test: Get rid of ptr in test_obj structure · c2c8a901
      Committed by Thomas Graf
      This only blows up the size of the test structure for no gain
      in test coverage. Reduces the size of test_obj from 24 to 16 bytes.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c2c8a901
    • T
      rhashtable-test: Measure time to insert, remove & traverse entries · 1aa661f5
      Committed by Thomas Graf
      Make the test configurable by allowing all relevant knobs to be
      specified through module parameters.

      Do several test runs and measure the average time it takes to
      insert & remove all entries. Note, a deferred resize might still
      continue to run in the background.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aa661f5
    • T
      f54e84b6
  11. May 1, 2015 (1 commit)
    • D
      test_bpf: indicate whether bpf prog got jited in test suite · 327941f8
      Committed by Daniel Borkmann
      I think this is useful to verify whether a filter could be JITed or
      not in case of bpf_jit_enable >= 1, which otherwise the test suite
      doesn't tell besides taking a good peek at the performance numbers.
      
      Nicolas Schichan reported a bug in the ARM JIT compiler that rejected
      and waved the filter to the interpreter although it shouldn't have.
      Nevertheless, the test passes as expected, but such information is
      not visible.
      
      It's useful, for instance, for the remaining classic JITs, but also for
      implementing remaining opcodes that are not yet present in eBPF JITs
      (e.g. ARM64 waves some of them to the interpreter). This minor patch
      allows to grep through dmesg to find those accordingly, but also
      provides a total summary, i.e.: [<X>/53 JIT'ed]
      
        # echo 1 > /proc/sys/net/core/bpf_jit_enable
        # insmod lib/test_bpf.ko
        # dmesg | grep "jited:0"
      
      dmesg example on the ARM issue with JIT rejection:
      
      [...]
      [   67.925387] test_bpf: #2 ADD_SUB_MUL_K jited:1 24 PASS
      [   67.930889] test_bpf: #3 DIV_MOD_KX jited:0 794 PASS
      [   67.943940] test_bpf: #4 AND_OR_LSH_K jited:1 20 20 PASS
      [...]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Nicolas Schichan <nschichan@freebox.fr>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      327941f8
  12. April 23, 2015 (2 commits)
  13. April 22, 2015 (4 commits)
    • M
      md/raid6 algorithms: xor_syndrome() for SSE2 · a582564b
      Committed by Markus Stockhausen
      The second (and last) optimized XOR syndrome calculation. This version
      supports right and left side optimization. All CPUs with architecture
      older than Haswell will benefit from it.
      
      It should be noted that SSE2 movntdq kills performance for memory areas
      that are read and written simultaneously in chunks smaller than cache
      line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR
      functions.
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      a582564b
    • M
      md/raid6 algorithms: xor_syndrome() for generic int · 9a5ce91d
      Committed by Markus Stockhausen
      Start the algorithms with the very basic one. It is left and right
      optimized. That means we can avoid all calculations for unneeded pages
      above the right stop offset. For pages below the left start offset we
      still need the syndrome multiplication but without reading data pages.
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      9a5ce91d
    • M
      md/raid6 algorithms: improve test program · 7e92e1d7
      Committed by Markus Stockhausen
      It is always helpful to have a test tool in place if we implement
      new data-critical algorithms. So add some test routines to the raid6
      checker that can verify that the new xor_syndrome() works as expected.
      
      Run through all permutations of start/stop pages per algorithm and
      simulate an xor_syndrome()-assisted rmw run. After each rmw, check
      that the recovery algorithm still confirms that the stripe is fine.
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      7e92e1d7
    • M
      md/raid6 algorithms: delta syndrome functions · fe5cbc6e
      Committed by Markus Stockhausen
      v3: s-o-b comment, explanation of performance and decision for
      the start/stop implementation
      
      Implementing rmw functionality for RAID6 requires optimized syndrome
      calculation. Up to now we can only generate a complete syndrome. The
      target P/Q pages are always overwritten. With this patch we provide
      a framework for inplace P/Q modification. In the first place simply
      fill those functions with NULL values.
      
      xor_syndrome() has two additional parameters: start & stop. These
      will indicate the first and last page that are changing during a
      rmw run. That makes it possible to avoid several unnecessary loops
      and speed up calculation. The caller needs to implement the following
      logic to make the functions work.
      
      1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
      blocks inside P/Q between (and including) start and end.
      
      2) modify any block with start <= block <= stop
      
      3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
      source blocks into P/Q between (and including) start and end.
      
      Pages between start and stop that won't be changed should be filled
      with a pointer to the kernel zero page. The reasons for not taking NULL
      pages are:
      
      1) Algorithms cross the whole source data line by line. Thus avoid
      additional branches.
      
      2) Having a NULL page avoids calculating the XOR P parity but still
      needs calculation steps for the Q parity. Depending on the algorithm
      unrolling that might be only a difference of 2 instructions per loop.
      
      The benchmark numbers of the gen_syndrome() functions are displayed in
      the kernel log. Do the same for the xor_syndrome() functions. This
      will help to analyze performance problems and give a rough estimate of
      how well the algorithm works. The choice of the fastest algorithm will
      still depend on the gen_syndrome() performance.
      
      With the start/stop page implementation the speed can vary a lot in real
      life. E.g. a change of page 0 & page 15 on a stripe will be harder to
      compute than the case where page 0 & page 1 are XOR candidates. To not
      be too enthusiastic about the expected speeds we will run a worst-case
      test that simulates a change on the upper half of the stripe. So we do:
      
      1) calculation of P/Q for the upper pages
      
      2) continuation of Q for the lower (empty) pages
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      fe5cbc6e
  14. April 21, 2015 (2 commits)
  15. April 20, 2015 (1 commit)
    • L
      hexdump: avoid warning in test function · 17974c05
      Committed by Linus Torvalds
      The test_data_1_le[] array is a const array of const char *.  To avoid
      dropping any const information, we need to use "const char * const *",
      not just "const char **".
      
      I'm not sure why the different test arrays end up having different
      const'ness, but let's make the pointer we use to traverse them as const
      as possible, since we modify neither the array of pointers _nor_ the
      pointers we find in the array.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17974c05
  16. April 19, 2015 (4 commits)
  17. April 18, 2015 (1 commit)
  18. April 17, 2015 (4 commits)