1. 12 Nov 2013, 5 commits
    • random32: add test cases for taus113 implementation · a6a9c0f1
      Committed by Daniel Borkmann
      We generated a battery of 100 test cases from the GSL taus113
      implementation and compare the results for a particular seed and a
      particular iteration with our implementation in the kernel. We have
      verified on 32 and 64 bit machines that our taus113 kernel
      implementation gives the same results as the GSL taus113
      implementation:
      
        [    0.147370] prandom: seed boundary self test passed
        [    0.148078] prandom: 100 self tests passed
      
      This is a Kconfig option that is disabled by default, just like the
      crc32 init selftests, in order not to unnecessarily slow down the boot
      process. We also refactored out prandom_seed_very_weak(), as it's now
      used in multiple places, in order to reduce redundant code.
      
      GSL code we used for generating test cases:
      
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <gsl/gsl_rng.h>

        int main(void)
        {
          int i, j;
          srand(time(NULL));
          for (i = 0; i < 100; ++i) {
            /* random iteration count in [500, 1000) and a non-zero seed */
            int iteration = 500 + (rand() % 500);
            gsl_rng_default_seed = rand() + 1;
            gsl_rng *r = gsl_rng_alloc(gsl_rng_taus113);
            printf("\t{ %lu, ", gsl_rng_default_seed);
            for (j = 0; j < iteration - 1; ++j)
              gsl_rng_get(r);
            printf("%d, %lu },\n", iteration, gsl_rng_get(r));
            gsl_rng_free(r);
          }
          return 0;
        }
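
      On the kernel side, such a table can then be replayed against
      prandom_u32_state(). A rough sketch of the comparison loop (table and
      helper names here are illustrative; the actual selftest lives in
      lib/random32.c):

        struct prandom_test2 {
          u32 seed;
          u32 iteration;
          u32 result;
        };

        /* entries as generated by the GSL program above (elided here) */
        static const struct prandom_test2 test2[] __initconst = {
          /* { seed, iteration, expected result }, ... */
        };

        static void __init prandom_state_selftest(void)
        {
          int i, j, errors = 0;

          for (i = 0; i < ARRAY_SIZE(test2); i++) {
            struct rnd_state state;

            /* weak seeding plus warm-up, mirroring what GSL does */
            prandom_seed_very_weak(&state, test2[i].seed);
            prandom_warmup(&state);

            for (j = 0; j < test2[i].iteration - 1; j++)
              prandom_u32_state(&state);

            if (prandom_u32_state(&state) != test2[i].result)
              errors++;
          }

          pr_info("prandom: %s\n",
                  errors ? "self test failed" : "self tests passed");
        }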
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6a9c0f1
    • random32: upgrade taus88 generator to taus113 from errata paper · a98814ce
      Committed by Daniel Borkmann
      Since we use the prandom*() functions quite often in networking code,
      e.g. in UDP port selection, netfilter code, etc., upgrade the PRNG
      from Pierre L'Ecuyer's original paper "Maximally Equidistributed
      Combined Tausworthe Generators", Mathematics of Computation, 65,
      213 (1996), 203--213, to the version published in his errata paper [1].
      
      The Tausworthe generator is a maximally equidistributed generator
      that is fast and has good statistical properties [1].
      
      The version presented there upgrades the 3-state LFSR to a 4-state
      LFSR, increasing the period from about 2^88 to 2^113. The algorithm
      is presented in [1] by the very same author who also designed the
      original algorithm in [2].
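
      For reference, the per-call step can be sketched in plain C roughly as
      follows (shift and mask constants as used by GSL's taus113; a sketch
      for illustration, not the kernel code itself):

        #include <stdint.h>

        /* one combined-Tausworthe component:
         * ((s & c) << d) ^ (((s << a) ^ s) >> b)
         */
        #define TAUSWORTHE(s, a, b, c, d) \
          ((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

        struct taus113_state { uint32_t s1, s2, s3, s4; };

        static uint32_t taus113_next(struct taus113_state *st)
        {
          st->s1 = TAUSWORTHE(st->s1,  6, 13, 4294967294u, 18);
          st->s2 = TAUSWORTHE(st->s2,  2, 27, 4294967288u,  2);
          st->s3 = TAUSWORTHE(st->s3, 13, 21, 4294967280u,  7);
          st->s4 = TAUSWORTHE(st->s4,  3, 12, 4294967168u, 13);

          return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
        }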
      
      Also, by increasing the state, we make it a bit harder for attackers
      to "guess" the PRNG's internal state. See also the discussion in [3].
      
      Now, as we use this sort of weak initialization discussed in [3]
      only between core_initcall() and late_initcall() time [*] for
      prandom32*() users, namely in prandom_init(), it is less relevant
      from late_initcall() onwards, as we overwrite the seeds through
      prandom_reseed() anyway with a seed source of higher entropy, that
      is, get_random_bytes(). In other words, an exhaustive key search of
      96 bits would be needed. With the help of this patch, this state
      search increases further to 128 bits. Initialization needs to make
      sure that s1 > 1, s2 > 7, s3 > 15, s4 > 127.
      
      The taus88 and taus113 algorithms are also part of GSL. I added a
      test case in the next patch to verify the internal behaviour of this
      patch against GSL, and ran tests with the dieharder 3.31.1 RNG test
      suite:
      
      $ dieharder -g 052 -a -m 10 -s 1 -S 4137730333 #taus88
      $ dieharder -g 054 -a -m 10 -s 1 -S 4137730333 #taus113
      
      With this seed configuration, in order to compare both, we get
      the following differences:
      
      algorithm                 taus88           taus113
      rands/second [**]         1.61e+08         1.37e+08
      sts_serial(4, 1st run)    WEAK             PASSED
      sts_serial(9, 2nd run)    WEAK             PASSED
      rgb_lagged_sum(31)        WEAK             PASSED
      
      We took out the diehard_sums test as, according to the authors, it is
      considered broken and unusable [4]. Despite that and the slight
      decrease in performance (which is acceptable), taus113 here passes
      all 113 tests (only rgb_minimum_distance_5 is WEAK, the rest PASSED).
      In general, taus/taus113 is considered "very good" by the authors
      of dieharder [5].
      
      The papers [1][2] state that a single warm-up step, running quicktaus
      once on each state, is sufficient to ensure proper initialization
      of ~s_{0}:
      
      Our selection of (s), according to Table 1, row 1 of [1], satisfies
      the condition L - k <= r - s, that is,

        (32 32 32 32) - (31 29 28 25) <= (25 27 15 22) - (18 2 7 13)

      with r = k - q and q = (6 2 13 3), as also stated by the paper.
      So according to [2] we are safe with one round of quicktaus for
      initialization. However, we decided to include the warm-up phase
      of the PRNG, as done in GSL, in every case as a safety net. We also
      use the warm-up phase to make the output of the RNG easier to
      verify against the GSL output.
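
      Continuing the sketch above, the GSL-style seeding and warm-up can be
      pictured roughly as follows (an LCG-based state derivation, enforcement
      of the lower bounds s1 > 1, s2 > 7, s3 > 15, s4 > 127, then a few
      discarded outputs; the kernel code differs in detail):

        #define LCG(x) ((x) * 69069u)  /* simple linear congruential step */

        static void taus113_seed(struct taus113_state *st, uint32_t seed)
        {
          int i;

          if (!seed)
            seed = 1;

          st->s1 = LCG(seed);    if (st->s1 < 2)   st->s1 += 2;
          st->s2 = LCG(st->s1);  if (st->s2 < 8)   st->s2 += 8;
          st->s3 = LCG(st->s2);  if (st->s3 < 16)  st->s3 += 16;
          st->s4 = LCG(st->s3);  if (st->s4 < 128) st->s4 += 128;

          /* warm up: run the recurrence a few times, as GSL does */
          for (i = 0; i < 10; i++)
            taus113_next(st);
        }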
      
      In prandom_init(), we also mix random_get_entropy() into it, just
      like drivers/char/random.c does, i.e. jiffies ^ random_get_entropy().
      random_get_entropy() is get_cycles(). XOR is entropy preserving, so
      it is fine if it is not implemented by some architectures.
      
      Note, this PRNG is *not* used for cryptography in the kernel, but
      rather as a fast PRNG for various randomizations, e.g. in the
      networking code, or elsewhere for debugging purposes.
      
      [*]: In order to generate some sort of pseudo-randomness, since
      get_random_bytes() is not yet available to us, we use jiffies and
      initialize states s1 - s3 with a simple linear congruential generator
      (LCG), that is x <- x * 69069, and derive s2, s3 from the 32-bit
      initialization of s1. So the above quote from [3] accounts only
      for the time from core to late initcall, not afterwards.
      [**] Single threaded run on MacBook Air w/ Intel Core i5-3317U
      
       [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
       [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
       [3] http://thread.gmane.org/gmane.comp.encryption.general/12103/
       [4] http://code.google.com/p/dieharder/source/browse/trunk/libdieharder/diehard_sums.c?spec=svn490&r=490#20
       [5] http://www.phy.duke.edu/~rgb/General/dieharder.php
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a98814ce
    • random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized · 4af712e8
      Committed by Hannes Frederic Sowa
      The Tausworthe PRNG is initialized at late_initcall time. At that time,
      the entropy pool serving get_random_bytes() is not yet sufficiently
      filled. This patch adds an additional reseeding step as soon as the
      nonblocking pool gets marked as initialized.
      
      On some machines it might be possible that late_initcall gets called
      after the pool has been initialized. In this situation we won't reseed
      again.

      (A call to prandom_reseed_late() blocks later invocations of early
      reseed attempts.)
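
      A minimal sketch of that latch (variable and helper names here are
      illustrative, not necessarily the exact ones used in lib/random32.c):

        static void __prandom_reseed(bool late)
        {
          static bool latch;
          static DEFINE_SPINLOCK(lock);
          unsigned long flags;

          spin_lock_irqsave(&lock, flags);

          /* once a late reseed has run, skip any further early attempts */
          if (latch && !late)
            goto out;
          latch = true;

          /* ... reseed all per-cpu states from get_random_bytes() ... */
        out:
          spin_unlock_irqrestore(&lock, flags);
        }

        void prandom_reseed_late(void)
        {
          __prandom_reseed(true);
        }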
      
      Joint work with Daniel Borkmann.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4af712e8
    • random32: add periodic reseeding · 6d319202
      Committed by Hannes Frederic Sowa
      The current Tausworthe PRNG is never reseeded with truly random data
      after the first attempt in late_initcall. As this PRNG is used for some
      critical random data, e.g. UDP port randomization, we should do better
      and reseed the PRNG once in a while with truly random data from
      get_random_bytes().
      
      When we reseed with prandom_seed(), we now also make sure to throw the
      first output away; this suffices for the reseeding procedure.
      
      The delay calculation is based on a proposal from Eric Dumazet.
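
      A rough sketch of such a periodic reseed with a randomized interval
      (constants and names illustrative):

        static struct timer_list seed_timer;

        static void __prandom_timer(unsigned long dontcare)
        {
          u32 entropy;
          unsigned long expires;

          get_random_bytes(&entropy, sizeof(entropy));
          prandom_seed(entropy);

          /* rearm: reseed again in roughly 40 to 80 seconds */
          expires = 40 + (prandom_u32() % 40);
          seed_timer.expires = jiffies + msecs_to_jiffies(expires * MSEC_PER_SEC);
          add_timer(&seed_timer);
        }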
      
      Joint work with Daniel Borkmann.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d319202
    • random32: fix off-by-one in seeding requirement · 51c37a70
      Committed by Daniel Borkmann
      For properly initialising the Tausworthe generator [1], we have
      a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.
      
      Commit 697f8d03 ("random32: seeding improvement") introduced
      a __seed() function that imposes boundary checks proposed by the
      errata paper [2] to properly ensure above conditions.
      
      However, we're off by one, as the function is implemented as
      "return (x < m) ? x + m : x;" and called with __seed(X, 1),
      __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
      would still be possible, whereas the lower boundary should actually
      be at least 2, 8, 16, just as GSL does. Fix this, as otherwise
      an initialization with an unwanted seed could have the effect
      that Tausworthe's PRNG properties cannot be ensured.
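
      The helper and the corrected lower bounds then look roughly like this
      (a sketch of the intent; the wrapper name is made up, see
      lib/random32.c for the real call sites):

        static inline u32 __seed(u32 x, u32 m)
        {
          return (x < m) ? x + m : x;  /* enforces x >= m, not x > m */
        }

        static void prandom_seed_fixed(struct rnd_state *state, u32 i)
        {
          /* Passing 1, 7, 15 as bounds still allowed the unwanted seeds
           * 1, 7, 15. Passing 2, 8, 16 guarantees s1 > 1, s2 > 7, s3 > 15
           * as required by the Tausworthe recurrence.
           */
          state->s1 = __seed(i, 2U);
          state->s2 = __seed(i, 8U);
          state->s3 = __seed(i, 16U);
        }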
      
      Note that this PRNG is *not* used for cryptography in the kernel.
      
       [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
       [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 697f8d03 ("random32: seeding improvement")
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51c37a70
  2. 05 Nov 2013, 2 commits
  3. 04 Nov 2013, 3 commits
  4. 01 Nov 2013, 1 commit
    • lib/scatterlist.c: don't flush_kernel_dcache_page on slab page · 3d77b50c
      Committed by Ming Lei
      Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper
      functions") introduces two sg buffer copy helpers and calls
      flush_kernel_dcache_page() on pages in the SG list after these pages
      are written to.
      
      Unfortunately, the commit may introduce a potential bug:
      
       - Before sending some SCSI commands, a kmalloc() buffer may be passed
         to the block layer, so flush_kernel_dcache_page() may eventually see
         a slab page.
      
       - According to cachetlb.txt, flush_kernel_dcache_page() is only
         supposed to be called on "a user page", which surely can't be a
         slab page.
      
       - An architecture's implementation of flush_kernel_dcache_page() may
         use page mapping information for optimization, so page_mapping()
         will see the slab page, and then VM_BUG_ON() is triggered.
      
      Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
      and this patch fixes the bug by adding a test of '!PageSlab(miter->page)'
      before calling flush_kernel_dcache_page().
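
      The guarded call then has roughly this shape (sketched from the
      description above; see sg_miter_stop() in lib/scatterlist.c for the
      real context):

        /* only non-slab (user) pages may be handed to flush_kernel_dcache_page() */
        if ((miter->__flags & SG_MITER_TO_SG) && !PageSlab(miter->page))
          flush_kernel_dcache_page(miter->page);
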
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Tested-by: Simon Baatz <gmbnomis@gmail.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d77b50c
  5. 29 Oct 2013, 1 commit
  6. 17 Oct 2013, 1 commit
  7. 11 Oct 2013, 1 commit
  8. 28 Sep 2013, 3 commits
  9. 27 Sep 2013, 1 commit
    • sysfs: Allow mounting without CONFIG_NET · 667b4102
      Committed by Eric W. Biederman
      In kobj_ns_current_may_mount the default should be to allow the
      mount. The test is only for a single kobj_ns_type at a time, and
      unless there is a reason to prevent it, mounting sysfs should be
      allowed. Subsystems that are not registered are not involved, so
      they can't have a reason to prevent mounting sysfs.
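
      A sketch of the intended logic (not a verbatim copy of lib/kobject.c):

        bool kobj_ns_current_may_mount(enum kobj_ns_type type)
        {
          bool may_mount = true;  /* default: allow the mount */

          spin_lock(&kobj_ns_type_lock);
          if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
              kobj_ns_ops_tbl[type])
            may_mount = kobj_ns_ops_tbl[type]->current_may_mount();
          spin_unlock(&kobj_ns_type_lock);

          return may_mount;
        }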
      
      This is a bug-fix to:
          commit 7dc5dbc8
          Author: Eric W. Biederman <ebiederm@xmission.com>
          Date:   Mon Mar 25 20:07:01 2013 -0700
      
              sysfs: Restrict mounting sysfs
      
              Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
              over the net namespace.  The principle here is if you create or have
              capabilities over it you can mount it, otherwise you get to live with
              what other people have mounted.
      
              Instead of testing this with a straightforward ns_capable call,
              perform this check the long and torturous way with kobject
              helpers; this keeps direct knowledge of namespaces out of sysfs
              and preserves the existing sysfs abstractions.
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      
      That came in via the userns tree during the 3.12 merge window.
      Reported-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      667b4102
  10. 21 Sep 2013, 2 commits
  11. 13 Sep 2013, 1 commit
  12. 12 Sep 2013, 11 commits
  13. 10 Sep 2013, 1 commit
    • idr: Percpu ida · 798ab48e
      Committed by Kent Overstreet
      Percpu frontend for allocating ids. With percpu allocation (that works),
      it's impossible to guarantee it will always be possible to allocate all
      nr_tags - typically, some will be stuck on a remote percpu freelist
      where the current job can't get to them.
      
      We do guarantee that it will always be possible to allocate at least
      (nr_tags / 2) tags - this is done by keeping track of which and how many
      cpus have tags on their percpu freelists. On allocation failure if
      enough cpus have tags that there could potentially be (nr_tags / 2) tags
      stuck on remote percpu freelists, we then pick a remote cpu at random to
      steal from.
      
      Note that there's no cpu hotplug notifier - we don't care, because
      steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
      more allocations if we had a notifier - but we'll still meet our
      guarantees and it's absolutely not a correctness issue, so I don't think
      it's worth the extra code.
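
      Basic usage of the new API might look like this (a sketch assuming the
      gfp-based allocation signature of this version):

        #include <linux/percpu_ida.h>

        static struct percpu_ida tag_pool;

        static int tags_setup(unsigned long nr_tags)
        {
          return percpu_ida_init(&tag_pool, nr_tags);
        }

        static void tags_use_one(void)
        {
          int tag;

          /* with GFP_KERNEL this may sleep until a tag becomes available */
          tag = percpu_ida_alloc(&tag_pool, GFP_KERNEL);
          if (tag >= 0) {
            /* ... use the tag ... */
            percpu_ida_free(&tag_pool, tag);
          }
        }

        static void tags_teardown(void)
        {
          percpu_ida_destroy(&tag_pool);
        }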
      
      From akpm:
      
          "It looks OK to me (that's as close as I get to an ack :))
      
      v6 changes:
        - Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to
          make alpha/arc builds happy (Fengguang)
        - Move second (cpu >= nr_cpu_ids) check inside of first check scope
          in steal_tags() (akpm + nab)
      
      v5 changes:
        - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
        - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
        - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
          cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
        - Add comment for alloc_global_tags() (kmo + akpm)
        - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
        - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
        - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
          (kmo + akpm)
        - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
          (kmo + akpm)
        - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
        - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)
      
      v4 changes:
      
        - Fix tags.c reference in percpu_ida_init (akpm)
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      798ab48e
  14. 08 Sep 2013, 2 commits
    • lockref: add ability to mark lockrefs "dead" · e7d33bb5
      Committed by Linus Torvalds
      The only actual current lockref user (dcache) uses zero reference counts
      even for perfectly live dentries, because it's a cache: there may not be
      any users, but that doesn't mean that we want to throw away the dentry.
      
      At the same time, the dentry cache does have a notion of a truly "dead"
      dentry that we must not even increment the reference count of, because
      we have pruned it and it is not valid.
      
      Currently that distinction is not visible in the lockref itself, and the
      dentry cache validation uses "lockref_get_or_lock()" to either get a new
      reference to a dentry that already had existing references (and thus
      cannot be dead), or get the dentry lock so that we can then verify the
      dentry and increment the reference count under the lock if that
      verification was successful.
      
      That's all somewhat complicated.
      
      This adds the concept of being "dead" to the lockref itself, by simply
      using a count that is negative.  This allows a usage scenario where we
      can increment the refcount of a dentry without having to validate it,
      and pushing the special "we killed it" case into the lockref code.
      
      The dentry code itself doesn't actually use this yet, and it's probably
      too late in the merge window to do that code (the dentry_kill() code
      with its "should I decrement the count" logic really is pretty complex
      code), but let's introduce the concept at the lockref level now.
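
      A sketch of what "dead" means at the lockref level (the negative
      sentinel value and the exact helper bodies are illustrative):

        /* mark a lockref dead: only valid while holding the spinlock */
        void lockref_mark_dead(struct lockref *lockref)
        {
          assert_spin_locked(&lockref->lock);
          lockref->count = -128;  /* any negative value means "dead" */
        }

        /* take a reference only if the lockref has not been killed */
        int lockref_get_not_dead(struct lockref *lockref)
        {
          int retval = 0;

          spin_lock(&lockref->lock);
          if (lockref->count >= 0) {
            lockref->count++;
            retval = 1;
          }
          spin_unlock(&lockref->lock);

          return retval;
        }
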
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7d33bb5
    • lockref: fix docbook argument names · 44a0cf92
      Committed by Linus Torvalds
      The code got rewritten, but the comments got copied as-is from older
      versions, and as a result the argument name in the comment didn't
      actually match the code any more.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44a0cf92
  15. 07 Sep 2013, 1 commit
  16. 05 Sep 2013, 1 commit
    • Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC · cc80ae38
      Committed by Vineet Gupta
      The frame pointer on ARC doesn't serve the conventional purpose of
      stack unwinding due to the way the ABI designates its usage. Thus its
      explicit usage on ARC is discouraged (gcc is free to use it for some
      tricky stack frames, even with -fomit-frame-pointer).

      Hence there is no point enabling it for ARC.
      
      References: http://www.spinics.net/lists/kernel/msg1593937.html
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Paul E. McKenney" <paul.mckenney@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: linux-kernel@vger.kernel.org
      cc80ae38
  17. 04 Sep 2013, 2 commits
  18. 03 Sep 2013, 1 commit
    • lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Committed by Linus Torvalds
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
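
      Conceptually, the lockless path looks like this (a simplified sketch,
      not the kernel's actual CMPXCHG_LOOP macro):

        struct lockref {
          union {
            u64 lock_count;  /* spinlock and count viewed as one 64-bit word */
            struct {
              spinlock_t lock;
              unsigned int count;
            };
          };
        };

        void lockref_get(struct lockref *lockref)
        {
          struct lockref old;

          old.lock_count = ACCESS_ONCE(lockref->lock_count);
          while (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) {
            struct lockref new = old, prev = old;

            new.count++;
            /* install the new count only if lock word and count are unchanged */
            old.lock_count = cmpxchg(&lockref->lock_count,
                                     old.lock_count, new.lock_count);
            if (old.lock_count == prev.lock_count)
              return;  /* lockless update succeeded */
            cpu_relax();
          }

          /* the lock was (or became) held: fall back to the spinlock */
          spin_lock(&lockref->lock);
          lockref->count++;
          spin_unlock(&lockref->lock);
        }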
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc08b449