1. 20 Mar 2014, 1 commit
  2. 17 Jan 2014, 1 commit
  3. 15 Jan 2014, 1 commit
    • lib/percpu_counter.c: fix __percpu_counter_add() · 74e72f89
      Committed by Ming Lei
      __percpu_counter_add() may be called from a softirq/hardirq handler (for
      example, blk_mq_queue_exit() is typically called from a hardirq/softirq
      handler), so we need to call this_cpu_add() (an IRQ-safe helper) to update
      the percpu counter; otherwise counts may be lost.
      
      This fixes 'rmmod null_blk' hanging in blk_cleanup_queue() because
      request_queue->mq_usage_counter was miscounted.
      
      This patch is a follow-up (v1) to the earlier patch "lib/percpu_counter.c:
      disable local irq when updating percpu couter" and takes Andrew's
      approach, which may be more efficient on architectures (x86, s390) that
      have an optimized this_cpu_add().

      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Shaohua Li <shli@fusionio.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Fan Du <fan.du@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 02 Dec 2013, 1 commit
    • KEYS: Fix multiple key add into associative array · 23fd78d7
      Committed by David Howells
      If sufficient keys (or keyrings) are added into a keyring such that a node in
      the associative array's tree overflows (each node has a capacity N, currently
      16) and such that all N+1 keys have the same index key segment for that level
      of the tree (the level'th nibble of the index key), then assoc_array_insert()
      calls ops->diff_objects() to indicate at which bit position the two index keys
      vary.
      
      However, __key_link_begin() passes a NULL object to assoc_array_insert() with
      the intention of supplying the correct pointer later before we commit the
      change.  This means that keyring_diff_objects() is given a NULL pointer as one
      of its arguments which it does not expect.  This results in an oops like the
      attached.
      
      With the previous patch to fix the keyring hash function, this can be forced
      much more easily by creating a keyring and only adding keyrings to it.  Add any
      other sort of key and a different insertion path is taken - all 16+1 objects
      must want to cluster in the same node slot.
      
      This can be tested by:
      
      	r=`keyctl newring sandbox @s`
      	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
      
      This should work fine, but oopses when the 17th keyring is added.
      
      Since ops->diff_objects() is always called with the first pointer pointing to
      the object to be inserted (i.e. the NULL pointer), we can fix the problem by
      changing the to-be-inserted object pointer to point to the index key passed
      into assoc_array_insert() instead.
      
      Whilst we're at it, we also switch the arguments so that they are the same as
      for ->compare_object().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      Call Trace:
       [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
       [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
       [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
       [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
       [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
       [<ffffffff81400ddb>] tracesys+0xdd/0xe2

      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Stephen Gallagher <sgallagh@redhat.com>
  5. 28 Nov 2013, 1 commit
  6. 20 Nov 2013, 1 commit
  7. 15 Nov 2013, 7 commits
  8. 13 Nov 2013, 7 commits
  9. 12 Nov 2013, 5 commits
    • random32: add test cases for taus113 implementation · a6a9c0f1
      Committed by Daniel Borkmann
      We generated a battery of 100 test cases from the GSL taus113
      implementation and compared the results for a particular seed and a
      particular iteration with our implementation in the kernel. We have
      verified on 32- and 64-bit machines that our taus113 kernel
      implementation gives the same results as the GSL taus113 implementation:
      
        [    0.147370] prandom: seed boundary self test passed
        [    0.148078] prandom: 100 self tests passed
      
      The self-tests sit behind a Kconfig option that is disabled by default,
      just like the crc32 init self-tests, so as not to slow down the boot
      process unnecessarily. We also factored out prandom_seed_very_weak(), as
      it is now used in multiple places, to reduce redundant code.
      
      GSL code we used for generating test cases:
      
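        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <gsl/gsl_rng.h>

        int main(void)
        {
          int i, j;
          srand((unsigned) time(NULL));
          for (i = 0; i < 100; ++i) {
            int iteration = 500 + (rand() % 500);
            /* gsl_rng_alloc() seeds the generator from this global */
            gsl_rng_default_seed = (unsigned long) rand() + 1;
            gsl_rng *r = gsl_rng_alloc(gsl_rng_taus113);
            printf("\t{ %lu, ", gsl_rng_default_seed);
            /* skip iteration - 1 outputs, then print the iteration'th one */
            for (j = 0; j < iteration - 1; ++j)
              gsl_rng_get(r);
            printf("%d, %lu },\n", iteration, gsl_rng_get(r));
            gsl_rng_free(r);
          }
          return 0;
        }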
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • random32: upgrade taus88 generator to taus113 from errata paper · a98814ce
      Committed by Daniel Borkmann
      Since we use prandom*() functions quite often in networking code,
      e.g. in UDP port selection, netfilter code, etc., upgrade the PRNG
      from Pierre L'Ecuyer's original paper "Maximally Equidistributed
      Combined Tausworthe Generators", Mathematics of Computation, 65,
      213 (1996), 203--213 to the version published in his errata paper [1].
      
      The Tausworthe generator is a maximally-equidistributed generator,
      that is fast and has good statistical properties [1].
      
      The version presented there upgrades the 3-state LFSR to a 4-state
      LFSR, increasing the period from about 2^88 to 2^113. The
      algorithm is presented in [1] by the very same author who also
      designed the original algorithm in [2].
      
      Also, by increasing the state, we make it a bit harder for attackers
      to "guess" the PRNG's internal state. See also discussion in [3].
      
      Now, as we use this sort of weak initialization discussed in [3]
      only from core_initcall() until late_initcall() time [*] for
      prandom32*() users, namely in prandom_init(), it is less relevant
      from late_initcall() onwards, as we overwrite the seeds anyway through
      prandom_reseed() with a seed source of higher entropy, that is,
      get_random_bytes(). In other words, an exhaustive keysearch of
      96 bits would be needed. With this patch, the state search
      increases further to 128 bits. Initialization needs
      to make sure that s1 > 1, s2 > 7, s3 > 15, s4 > 127.
      
      The taus88 and taus113 algorithms are also part of GSL. I added a test
      case in the next patch to verify the behaviour of this patch against
      GSL, and ran tests with the dieharder 3.31.1 RNG test suite:
      
      $ dieharder -g 052 -a -m 10 -s 1 -S 4137730333 #taus88
      $ dieharder -g 054 -a -m 10 -s 1 -S 4137730333 #taus113
      
      With this seed configuration, in order to compare both, we get
      the following differences:
      
      algorithm                 taus88           taus113
      rands/second [**]         1.61e+08         1.37e+08
      sts_serial(4, 1st run)    WEAK             PASSED
      sts_serial(9, 2nd run)    WEAK             PASSED
      rgb_lagged_sum(31)        WEAK             PASSED
      
      We left out the diehard_sums test, as its authors consider it
      broken and unusable [4]. Despite that and the slight
      decrease in performance (which is acceptable), taus113 here passes
      all 113 tests (only rgb_minimum_distance_5 is WEAK, the rest PASSED).
      In general, taus/taus113 is considered "very good" by the authors
      of dieharder [5].
      
      The papers [1][2] state that a single warm-up step, running quicktaus
      once on each state, is sufficient to ensure proper initialization
      of ~s_{0}:
      
      Our selection of (s) from row 1 of Table 1 in [1] satisfies the
      condition L - k <= r - s, that is,
      
        (32 32 32 32) - (31 29 28 25) <= (25 27 15 22) - (18 2 7 13)
      
      with r = k - q and q = (6 2 13 3) as also stated by the paper.
      So according to [2] we are safe with one round of quicktaus for
      initialization. However, we decided to always include the warm-up
      phase of the PRNG, as GSL does, as a safety net. The warm-up phase
      also makes the output of the RNG easier to verify against the GSL
      output.
      
      In prandom_init(), we also mix random_get_entropy() into the seed, just
      as drivers/char/random.c does: jiffies ^ random_get_entropy().
      random_get_entropy() is get_cycles(). XOR preserves entropy, so it
      is fine if some architectures do not implement it.
      
      Note, this PRNG is *not* used for cryptography in the kernel, but
      rather as a fast PRNG for various randomizations, e.g. in the
      networking code, or elsewhere for debugging purposes, for example.
      
      [*]: In order to generate some "sort of pseudo-randomness", since
      get_random_bytes() is not yet available to us, we use jiffies and
      initialize states s1 - s3 with a simple linear congruential generator
      (LCG), that is x <- x * 69069; and derive s2, s3, from the 32bit
      initialization from s1. So the above quote from [3] accounts only
      for the time from core to late initcall, not afterwards.
      [**] Single threaded run on MacBook Air w/ Intel Core i5-3317U
      
       [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
       [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
       [3] http://thread.gmane.org/gmane.comp.encryption.general/12103/
       [4] http://code.google.com/p/dieharder/source/browse/trunk/libdieharder/diehard_sums.c?spec=svn490&r=490#20
       [5] http://www.phy.duke.edu/~rgb/General/dieharder.php
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized · 4af712e8
      Committed by Hannes Frederic Sowa
      The Tausworthe PRNG is initialized at late_initcall time. At that time the
      entropy pool serving get_random_bytes is not filled sufficiently. This
      patch adds an additional reseeding step as soon as the nonblocking pool
      gets marked as initialized.
      
      On some machines it might be possible that late_initcall gets called after
      the pool has been initialized. In this situation we won't reseed again.
      
      (A call to prandom_reseed_late() blocks later invocations of early
      reseed attempts.)
      
      Joint work with Daniel Borkmann.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • random32: add periodic reseeding · 6d319202
      Committed by Hannes Frederic Sowa
      The current Tausworthe PRNG is never reseeded with truly random data after
      the first attempt in late_initcall. As this PRNG is used for some critical
      random data, e.g. UDP port randomization, we should try harder and reseed
      the PRNG once in a while with truly random data from get_random_bytes().

      When we reseed with prandom_seed we now also make sure to throw the first
      output away. This is sufficient for the reseeding procedure.
      
      The delay calculation is based on a proposal from Eric Dumazet.
      
      Joint work with Daniel Borkmann.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • random32: fix off-by-one in seeding requirement · 51c37a70
      Committed by Daniel Borkmann
      To properly initialise the Tausworthe generator [1], we have
      a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.
      
      Commit 697f8d03 ("random32: seeding improvement") introduced
      a __seed() function that imposes boundary checks proposed by the
      errata paper [2] to properly ensure above conditions.
      
      However, we're off by one, as the function is implemented as:
      "return (x < m) ? x + m : x;", and called with __seed(X, 1),
      __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
      would be possible, whereas the lower boundary should actually
      be at least 2, 8, 16, just as GSL does. Fix this, as otherwise
      an initialization with an unwanted seed could have the effect
      that the Tausworthe PRNG's properties cannot be ensured.
      
      Note that this PRNG is *not* used for cryptography in the kernel.
      
       [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
       [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 697f8d03 ("random32: seeding improvement")
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 08 Nov 2013, 1 commit
  11. 07 Nov 2013, 1 commit
    • Revert "sysfs: drop kobj_ns_type handling" · a1212d27
      Committed by Linus Torvalds
      This reverts commit cb26a311.
      
      It mysteriously causes NetworkManager to not find the wireless device
      for me.  As far as I can tell, Tejun *meant* for this commit to not make
      any semantic changes, but there clearly are some.  So revert it, taking
      into account some of the calling convention changes that happened in
      this area in subsequent commits.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Nov 2013, 3 commits
  13. 05 Nov 2013, 2 commits
  14. 04 Nov 2013, 3 commits
  15. 01 Nov 2013, 1 commit
    • lib/scatterlist.c: don't flush_kernel_dcache_page on slab page · 3d77b50c
      Committed by Ming Lei
      Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper
      functions") introduces two sg buffer copy helpers, and calls
      flush_kernel_dcache_page() on pages in SG list after these pages are
      written to.
      
      Unfortunately, the commit may introduce a potential bug:
      
       - Before sending some SCSI commands, a kmalloc() buffer may be passed to
         the block layer, so flush_kernel_dcache_page() may eventually see a
         slab page.

       - According to cachetlb.txt, flush_kernel_dcache_page() is only meant
         to be called on "a user page", which surely can't be a slab page.

       - An architecture's implementation of flush_kernel_dcache_page() may
         use page mapping information for optimization, so page_mapping()
         will see the slab page and trigger VM_BUG_ON().

      Aaro Koskinen reported the bug on ARM/kirkwood with DEBUG_VM enabled,
      and this patch fixes it by adding a '!PageSlab(miter->page)' test
      before calling flush_kernel_dcache_page().

      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Tested-by: Simon Baatz <gmbnomis@gmail.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 29 Oct 2013, 1 commit
  17. 25 Oct 2013, 3 commits
    • percpu_ida: add an API to return free tags · 1dddc01a
      Committed by Shaohua Li
      Add an API to return the number of free tags; blk-mq-tag will use it.

      Note, this just returns a snapshot of the free tag count. blk-mq-tag has
      two uses for it. One is info output for diagnosis. The other is to
      quickly check whether there are free tags when dispatching requests.
      Neither requires a precise count.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • percpu_ida: add percpu_ida_for_each_free · 7fc2ba17
      Committed by Shaohua Li
      Add a new API to iterate over free IDs; blk-mq-tag will use it.

      Note, this is not guaranteed to iterate over all free IDs strictly.
      Callers should be aware of this. blk-mq uses it to sanity-check
      timed-out requests, so it can tolerate this limitation.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • percpu_ida: make percpu_ida percpu size/batch configurable · e26b53d0
      Committed by Shaohua Li
      Make percpu_ida's per-CPU cache size and batch size configurable.
      blk-mq-tag will use it.
      
      With blk-mq using percpu_ida to manage tags, performance improves.
      I tested on a two-socket machine with 12 processes spread across both
      sockets, so any lock contention or IPI overhead should be heavily
      stressed. Testing was done with null-blk.

      hw_queue_depth	IOPS without patch	IOPS with patch
      64		~800k/s			~1470k/s
      2048		~4470k/s		~4340k/s
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>