提交 · 6701de6c51c172b5de5633374479503c81fefc0b · openeuler / Kernel

19 5月, 2022 5 次提交

random: remove mostly unused async readiness notifier · 6701de6c

由 Jason A. Donenfeld 提交于 5月 15, 2022

The register_random_ready_notifier() notifier is somewhat complicated,
and was already recently rewritten to use notifier blocks. It is only
used now by one consumer in the kernel, vsprintf.c, for which the async
mechanism is really overly complex for what it actually needs. This
commit removes register_random_ready_notifier() and unregister_random_
ready_notifier(), because it just adds complication with little utility,
and changes vsprintf.c to just check on `!rng_is_initialized() &&
!rng_has_arch_random()`, which will eventually be true. Performance-
wise, that code was already using a static branch, so there's basically
no overhead at all to this change.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Reviewed-by: NPetr Mladek <pmladek@suse.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

6701de6c

random: remove get_random_bytes_arch() and add rng_has_arch_random() · 248561ad

由 Jason A. Donenfeld 提交于 5月 14, 2022

The RNG incorporates RDRAND into its state at boot and every time it
reseeds, so there's no reason for callers to use it directly. The
hashing that the RNG does on it is preferable to using the bytes raw.

The only current use case of get_random_bytes_arch() is vsprintf's
siphash key for pointer hashing, which uses it to initialize the pointer
secret earlier than usual if RDRAND is available. In order to replace
this narrow use case, just expose whether RDRAND is mixed into the RNG,
with a new function called rng_has_arch_random(). With that taken care
of, there are no users of get_random_bytes_arch() left, so it can be
removed.

Later, if trust_cpu gets turned on by default (as most distros are
doing), this one use of rng_has_arch_random() can probably go away as
well.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

248561ad

random: move initialization functions out of hot pages · 560181c2

由 Jason A. Donenfeld 提交于 5月 13, 2022

Much of random.c is devoted to initializing the rng and accounting for
when a sufficient amount of entropy has been added. In a perfect world,
this would all happen during init, and so we could mark these functions
as __init. But in reality, this isn't the case: sometimes the rng only
finishes initializing some seconds after system init is finished.

For this reason, at the moment, a whole host of functions that are only
used relatively close to system init and then never again are intermixed
with functions that are used in hot code all the time. This creates more
cache misses than necessary.

In order to pack the hot code closer together, this commit moves the
initialization functions that can't be marked as __init into
.text.unlikely by way of the __cold attribute.

Of particular note is moving credit_init_bits() into a macro wrapper
that inlines the crng_ready() static branch check. This avoids a
function call to a nop+ret, and most notably prevents extra entropy
arithmetic from being computed in mix_interrupt_randomness().
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

560181c2

random: make consistent use of buf and len · a1940263

由 Jason A. Donenfeld 提交于 5月 13, 2022

The current code was a mix of "nbytes", "count", "size", "buffer", "in",
and so forth. Instead, let's clean this up by naming input parameters
"buf" (or "ubuf") and "len", so that you always understand that you're
reading this variety of function argument.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

a1940263

random: use static branch for crng_ready() · f5bda35f

由 Jason A. Donenfeld 提交于 5月 03, 2022

Since crng_ready() is only false briefly during initialization and then
forever after becomes true, we don't need to evaluate it after, making
it a prime candidate for a static branch.

One complication, however, is that it changes state in a particular call
to credit_init_bits(), which might be made from atomic context, which
means we must kick off a workqueue to change the static key. Further
complicating things, credit_init_bits() may be called sufficiently early
on in system initialization such that system_wq is NULL.

Fortunately, there exists the nice function execute_in_process_context(),
which will immediately execute the function if !in_interrupt(), and
otherwise defer it to a workqueue. During early init, before workqueues
are available, in_interrupt() is always false, because interrupts
haven't even been enabled yet, which means the function in that case
executes immediately. Later on, after workqueues are available,
in_interrupt() might be true, but in that case, the work is queued in
system_wq and all goes well.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

f5bda35f

18 5月, 2022 10 次提交

random: credit architectural init the exact amount · 12e45a2a

由 Jason A. Donenfeld 提交于 5月 12, 2022

RDRAND and RDSEED can fail sometimes, which is fine. We currently
initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits
of those to succeed in order to initialize the RNG. Instead of the
current "all or nothing" approach, actually credit these contributions
the amount that is actually contributed.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

12e45a2a

random: handle latent entropy and command line from random_init() · 2f14062b

由 Jason A. Donenfeld 提交于 5月 05, 2022

Currently, start_kernel() adds latent entropy and the command line to
the entropy bool *after* the RNG has been initialized, deferring when
it's actually used by things like stack canaries until the next time
the pool is seeded. This surely is not intended.

Rather than splitting up which entropy gets added where and when between
start_kernel() and random_init(), just do everything in random_init(),
which should eliminate these kinds of bugs in the future.

While we're at it, rename the awkwardly titled "rand_initialize()" to
the more standard "random_init()" nomenclature.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

2f14062b

random: use proper jiffies comparison macro · 8a5b8a4a

由 Jason A. Donenfeld 提交于 5月 10, 2022

This expands to exactly the same code that it replaces, but makes things
consistent by using the same macro for jiffy comparisons throughout.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

8a5b8a4a

random: remove ratelimiting for in-kernel unseeded randomness · cc1e127b

由 Jason A. Donenfeld 提交于 5月 09, 2022

The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
kernel warns about all unseeded randomness or just the first instance.
There's some complicated rate limiting and comparison to the previous
caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
developers still don't see all the messages or even an accurate count of
how many were missed. This is the result of basically parallel
mechanisms aimed at accomplishing more or less the same thing, added at
different points in random.c history, which sort of compete with the
first-instance-only limiting we have now.

It turns out, however, that nobody cares about the first unseeded
randomness instance of in-kernel users. The same first user has been
there for ages now, and nobody is doing anything about it. It isn't even
clear that anybody _can_ do anything about it. Most places that can do
something about it have switched over to using get_random_bytes_wait()
or wait_for_random_bytes(), which is the right thing to do, but there is
still much code that needs randomness sometimes during init, and as a
geeneral rule, if you're not using one of the _wait functions or the
readiness notifier callback, you're bound to be doing it wrong just
based on that fact alone.

So warning about this same first user that can't easily change is simply
not an effective mechanism for anything at all. Users can't do anything
about it, as the Kconfig text points out -- the problem isn't in
userspace code -- and kernel developers don't or more often can't react
to it.

Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
is set, so that developers can debug things need be, or if it isn't set,
don't show a warning at all.

At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
random.ratelimit_disable=1 on by default, since if you care about one
you probably care about the other too. And we can clean up usage around
the related urandom_warning ratelimiter as well (whose behavior isn't
changing), so that it properly counts missed messages after the 10
message threshold is reached.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

cc1e127b

random: move initialization out of reseeding hot path · 68c9c8b1

由 Jason A. Donenfeld 提交于 5月 09, 2022

Initialization happens once -- by way of credit_init_bits() -- and then
it never happens again. Therefore, it doesn't need to be in
crng_reseed(), which is a hot path that is called multiple times. It
also doesn't make sense to have there, as initialization activity is
better associated with initialization routines.

After the prior commit, crng_reseed() now won't be called by multiple
concurrent callers, which means that we can safely move the
"finialize_init" logic into crng_init_bits() unconditionally.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

68c9c8b1

random: avoid initializing twice in credit race · fed7ef06

由 Jason A. Donenfeld 提交于 5月 09, 2022

Since all changes of crng_init now go through credit_init_bits(), we can
fix a long standing race in which two concurrent callers of
credit_init_bits() have the new bit count >= some threshold, but are
doing so with crng_init as a lower threshold, checked outside of a lock,
resulting in crng_reseed() or similar being called twice.

In order to fix this, we can use the original cmpxchg value of the bit
count, and only change crng_init when the bit count transitions from
below a threshold to meeting the threshold.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

fed7ef06

random: use symbolic constants for crng_init states · e3d2c5e7

由 Jason A. Donenfeld 提交于 5月 08, 2022

crng_init represents a state machine, with three states, and various
rules for transitions. For the longest time, we've been managing these
with "0", "1", and "2", and expecting people to figure it out. To make
the code more obvious, replace these with proper enum values
representing the transition, and then redocument what each of these
states mean.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

e3d2c5e7

siphash: use one source of truth for siphash permutations · e73aaae2

由 Jason A. Donenfeld 提交于 5月 07, 2022

The SipHash family of permutations is currently used in three places:

- siphash.c itself, used in the ordinary way it was intended.
- random32.c, in a construction from an anonymous contributor.
- random.c, as part of its fast_mix function.

Each one of these places reinvents the wheel with the same C code, same
rotation constants, and same symmetry-breaking constants.

This commit tidies things up a bit by placing macros for the
permutations and constants into siphash.h, where each of the three .c
users can access them. It also leaves a note dissuading more users of
them from emerging.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

e73aaae2

random: help compiler out with fast_mix() by using simpler arguments · 791332b3

由 Jason A. Donenfeld 提交于 5月 06, 2022

Now that fast_mix() has more than one caller, gcc no longer inlines it.
That's fine. But it also doesn't handle the compound literal argument we
pass it very efficiently, nor does it handle the loop as well as it
could. So just expand the code to spell out this function so that it
generates the same code as it did before. Performance-wise, this now
behaves as it did before the last commit. The difference in actual code
size on x86 is 45 bytes, which is less than a cache line.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

791332b3

random: do not use input pool from hard IRQs · e3e33fc2

由 Jason A. Donenfeld 提交于 5月 06, 2022

Years ago, a separate fast pool was added for interrupts, so that the
cost associated with taking the input pool spinlocks and mixing into it
would be avoided in places where latency is critical. However, one
oversight was that add_input_randomness() and add_disk_randomness()
still sometimes are called directly from the interrupt handler, rather
than being deferred to a thread. This means that some unlucky interrupts
will be caught doing a blake2s_compress() call and potentially spinning
on input_pool.lock, which can also be taken by unprivileged users by
writing into /dev/urandom.

In order to fix this, add_timer_randomness() now checks whether it is
being called from a hard IRQ and if so, just mixes into the per-cpu IRQ
fast pool using fast_mix(), which is much faster and can be done
lock-free. A nice consequence of this, as well, is that it means hard
IRQ context FPU support is likely no longer useful.

The entropy estimation algorithm used by add_timer_randomness() is also
somewhat different than the one used for add_interrupt_randomness(). The
former looks at deltas of deltas of deltas, while the latter just waits
for 64 interrupts for one bit or for one second since the last bit. In
order to bridge these, and since add_interrupt_randomness() runs after
an add_timer_randomness() that's called from hard IRQ, we add to the
fast pool credit the related amount, and then subtract one to account
for add_interrupt_randomness()'s contribution.

A downside of this, however, is that the num argument is potentially
attacker controlled, which puts a bit more pressure on the fast_mix()
sponge to do more than it's really intended to do. As a mitigating
factor, the first 96 bits of input aren't attacker controlled (a cycle
counter followed by zeros), which means it's essentially two rounds of
siphash rather than one, which is somewhat better. It's also not that
much different from add_interrupt_randomness()'s use of the irq stack
instruction pointer register.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Filipe Manana <fdmanana@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

e3e33fc2

16 5月, 2022 1 次提交

random: order timer entropy functions below interrupt functions · a4b5c26b

由 Jason A. Donenfeld 提交于 5月 06, 2022

There are no code changes here; this is just a reordering of functions,
so that in subsequent commits, the timer entropy functions can call into
the interrupt ones.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

a4b5c26b

15 5月, 2022 1 次提交

random: do not pretend to handle premature next security model · e85c0fc1

由 Jason A. Donenfeld 提交于 4月 30, 2022

Per the thread linked below, "premature next" is not considered to be a
realistic threat model, and leads to more serious security problems.

"Premature next" is the scenario in which:

- Attacker compromises the current state of a fully initialized RNG via
  some kind of infoleak.
- New bits of entropy are added directly to the key used to generate the
  /dev/urandom stream, without any buffering or pooling.
- Attacker then, somehow having read access to /dev/urandom, samples RNG
  output and brute forces the individual new bits that were added.
- Result: the RNG never "recovers" from the initial compromise, a
  so-called violation of what academics term "post-compromise security".

The usual solutions to this involve some form of delaying when entropy
gets mixed into the crng. With Fortuna, this involves multiple input
buckets. With what the Linux RNG was trying to do prior, this involves
entropy estimation.

However, by delaying when entropy gets mixed in, it also means that RNG
compromises are extremely dangerous during the window of time before
the RNG has gathered enough entropy, during which time nonces may become
predictable (or repeated), ephemeral keys may not be secret, and so
forth. Moreover, it's unclear how realistic "premature next" is from an
attack perspective, if these attacks even make sense in practice.

Put together -- and discussed in more detail in the thread below --
these constitute grounds for just doing away with the current code that
pretends to handle premature next. I say "pretends" because it wasn't
doing an especially great job at it either; should we change our mind
about this direction, we would probably implement Fortuna to "fix" the
"problem", in which case, removing the pretend solution still makes
sense.

This also reduces the crng reseed period from 5 minutes down to 1
minute. The rationale from the thread might lead us toward reducing that
even further in the future (or even eliminating it), but that remains a
topic of a future commit.

At a high level, this patch changes semantics from:

    Before: Seed for the first time after 256 "bits" of estimated
    entropy have been accumulated since the system booted. Thereafter,
    reseed once every five minutes, but only if 256 new "bits" have been
    accumulated since the last reseeding.

    After: Seed for the first time after 256 "bits" of estimated entropy
    have been accumulated since the system booted. Thereafter, reseed
    once every minute.

Most of this patch is renaming and removing: POOL_MIN_BITS becomes
POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(),
crng_reseed() loses its "force" parameter since it's now always true,
the drain_entropy() function no longer has any use so it's removed,
entropy estimation is skipped if we've already init'd, the various
notifiers for "low on entropy" are now only active prior to init, and
finally, some documentation comments are cleaned up here and there.

Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nadia Heninger <nadiah@cs.ucsd.edu>
Cc: Tom Ristenpart <ristenpart@cornell.edu>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

e85c0fc1

14 5月, 2022 5 次提交

random: use first 128 bits of input as fast init · 5c3b747e

由 Jason A. Donenfeld 提交于 4月 30, 2022

Before, the first 64 bytes of input, regardless of how entropic it was,
would be used to mutate the crng base key directly, and none of those
bytes would be credited as having entropy. Then 256 bits of credited
input would be accumulated, and only then would the rng transition from
the earlier "fast init" phase into being actually initialized.

The thinking was that by mixing and matching fast init and real init, an
attacker who compromised the fast init state, considered easy to do
given how little entropy might be in those first 64 bytes, would then be
able to bruteforce bits from the actual initialization. By keeping these
separate, bruteforcing became impossible.

However, by not crediting potentially creditable bits from those first 64
bytes of input, we delay initialization, and actually make the problem
worse, because it means the user is drawing worse random numbers for a
longer period of time.

Instead, we can take the first 128 bits as fast init, and allow them to
be credited, and then hold off on the next 128 bits until they've
accumulated. This is still a wide enough margin to prevent bruteforcing
the rng state, while still initializing much faster.

Then, rather than trying to piecemeal inject into the base crng key at
various points, instead just extract from the pool when we need it, for
the crng_init==0 phase. Performance may even be better for the various
inputs here, since there are likely more calls to mix_pool_bytes() then
there are to get_random_bytes() during this phase of system execution.

Since the preinit injection code is gone, bootloader randomness can then
do something significantly more straight forward, removing the weird
system_wq hack in hwgenerator randomness.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

5c3b747e

random: do not use batches when !crng_ready() · cbe89e5a

由 Jason A. Donenfeld 提交于 5月 03, 2022

It's too hard to keep the batches synchronized, and pointless anyway,
since in !crng_ready(), we're updating the base_crng key really often,
where batching only hurts. So instead, if the crng isn't ready, just
call into get_random_bytes(). At this stage nothing is performance
critical anyhow.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

cbe89e5a

random: mix in timestamps and reseed on system restore · b7b67d13

由 Jason A. Donenfeld 提交于 5月 01, 2022

Since the RNG loses freshness with system suspend/hibernation, when we
resume, immediately reseed using whatever data we can, which for this
particular case is the various timestamps regarding system suspend time,
in addition to more generally the RDSEED/RDRAND/RDTSC values that happen
whenever the crng reseeds.

On systems that suspend and resume automatically all the time -- such as
Android -- we skip the reseeding on suspend resumption, since that could
wind up being far too busy. This is the same trade-off made in
WireGuard.

In addition to reseeding upon resumption always mix into the pool these
various stamps on every power notification event.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

b7b67d13

random: vary jitter iterations based on cycle counter speed · 78c768e6

由 Jason A. Donenfeld 提交于 4月 22, 2022

Currently, we do the jitter dance if two consecutive reads to the cycle
counter return different values. If they do, then we consider the cycle
counter to be fast enough that one trip through the scheduler will yield
one "bit" of credited entropy. If those two reads return the same value,
then we assume the cycle counter is too slow to show meaningful
differences.

This methodology is flawed for a variety of reasons, one of which Eric
posted a patch to fix in [1]. The issue that patch solves is that on a
system with a slow counter, you might be [un]lucky and read the counter
_just_ before it changes, so that the second cycle counter you read
differs from the first, even though there's usually quite a large period
of time in between the two. For example:

| real time | cycle counter |
| --------- | ------------- |
| 3         | 5             |
| 4         | 5             |
| 5         | 5             |
| 6         | 5             |
| 7         | 5             | <--- a
| 8         | 6             | <--- b
| 9         | 6             | <--- c

If we read the counter at (a) and compare it to (b), we might be fooled
into thinking that it's a fast counter, when in reality it is not. The
solution in [1] is to also compare counter (b) to counter (c), on the
theory that if the counter is _actually_ slow, and (a)!=(b), then
certainly (b)==(c).

This helps solve this particular issue, in one sense, but in another
sense, it mostly functions to disallow jitter entropy on these systems,
rather than simply taking more samples in that case.

Instead, this patch takes a different approach. Right now we assume that
a difference in one set of consecutive samples means one "bit" of
credited entropy per scheduler trip. We can extend this so that a
difference in two sets of consecutive samples means one "bit" of
credited entropy per /two/ scheduler trips, and three for three, and
four for four. In other words, we can increase the amount of jitter
"work" we require for each "bit", depending on how slow the cycle
counter is.

So this patch takes whole bunch of samples, sees how many of them are
different, and divides to find the amount of work required per "bit",
and also requires that at least some minimum of them are different in
order to attempt any jitter entropy.

Note that this approach is still far from perfect. It's not a real
statistical estimate on how much these samples vary; it's not a
real-time analysis of the relevant input data. That remains a project
for another time. However, it makes the same (partly flawed) assumptions
as the code that's there now, so it's probably not worse than the status
quo, and it handles the issue Eric mentioned in [1]. But, again, it's
probably a far cry from whatever a really robust version of this would
be.

[1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/
    https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/

Cc: Eric Biggers <ebiggers@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

78c768e6

random: insist on random_get_entropy() existing in order to simplify · 4b758eda

由 Jason A. Donenfeld 提交于 4月 12, 2022

All platforms are now guaranteed to provide some value for
random_get_entropy(). In case some bug leads to this not being so, we
print a warning, because that indicates that something is really very
wrong (and likely other things are impacted too). This should never be
hit, but it's a good and cheap way of finding out if something ever is
problematic.

Since we now have viable fallback code for random_get_entropy() on all
platforms, which is, in the worst case, not worse than jiffies, we can
count on getting the best possible value out of it. That means there's
no longer a use for using jiffies as entropy input. It also means we no
longer have a reason for doing the round-robin register flow in the IRQ
handler, which was always of fairly dubious value.

Instead we can greatly simplify the IRQ handler inputs and also unify
the construction between 64-bits and 32-bits. We now collect the cycle
counter and the return address, since those are the two things that
matter. Because the return address and the irq number are likely
related, to the extent we mix in the irq number, we can just xor it into
the top unchanging bytes of the return address, rather than the bottom
changing bytes of the cycle counter as before. Then, we can do a fixed 2
rounds of SipHash/HSipHash. Finally, we use the same construction of
hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're
not actually discarding any entropy, since that entropy is carried
through until the next time. And more importantly, it lets us do the
same sponge-like construction everywhere.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

4b758eda

25 4月, 2022 1 次提交

random: document crng_fast_key_erasure() destination possibility · 8717627d

由 Jason A. Donenfeld 提交于 4月 18, 2022

This reverts 35a33ff3 ("random: use memmove instead of memcpy for
remaining 32 bytes"), which was made on a totally bogus basis. The thing
it was worried about overlapping came from the stack, not from one of
its arguments, as Eric pointed out.

But the fact that this confusion even happened draws attention to the
fact that it's a bit non-obvious that the random_data parameter can
alias chacha_state, and in fact should do so when the caller can't rely
on the stack being cleared in a timely manner. So this commit documents
that.
Reported-by: NEric Biggers <ebiggers@kernel.org>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

8717627d

16 4月, 2022 1 次提交

random: use memmove instead of memcpy for remaining 32 bytes · 35a33ff3

由 Jason A. Donenfeld 提交于 4月 14, 2022

In order to immediately overwrite the old key on the stack, before
servicing a userspace request for bytes, we use the remaining 32 bytes
of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f ->
4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of
memcpy(), this doesn't actually appear to be a problem in practice. But
relying on that characteristic seems a bit brittle. So let's change that
to a proper memmove(), which is the by-the-books way of handling
overlapping memory copies.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

35a33ff3

13 4月, 2022 2 次提交

random: make random_get_entropy() return an unsigned long · b0c3e796

由 Jason A. Donenfeld 提交于 4月 08, 2022

Some implementations were returning type `unsigned long`, while others
that fell back to get_cycles() were implicitly returning a `cycles_t` or
an untyped constant int literal. That makes for weird and confusing
code, and basically all code in the kernel already handled it like it
was an `unsigned long`. I recently tried to handle it as the largest
type it could be, a `cycles_t`, but doing so doesn't really help with
much.

Instead let's just make random_get_entropy() return an unsigned long all
the time. This also matches the commonly used `arch_get_random_long()`
function, so now RDRAND and RDTSC return the same sized integer, which
means one can fallback to the other more gracefully.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

b0c3e796

random: allow partial reads if later user copies fail · 5209aed5

由 Jason A. Donenfeld 提交于 4月 07, 2022

Rather than failing entirely if a copy_to_user() fails at some point,
instead we should return a partial read for the amount that succeeded
prior, unless none succeeded at all, in which case we return -EFAULT as
before.

This makes it consistent with other reader interfaces. For example, the
following snippet for /dev/zero outputs "4" followed by "1":

  int fd;
  void *x = mmap(NULL, 4096, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  assert(x != MAP_FAILED);
  fd = open("/dev/zero", O_RDONLY);
  assert(fd >= 0);
  printf("%zd\n", read(fd, x, 4));
  printf("%zd\n", read(fd, x + 4095, 4));
  close(fd);

This brings that same standard behavior to the various RNG reader
interfaces.

While we're at it, we can streamline the loop logic a little bit.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

5209aed5

07 4月, 2022 1 次提交

random: check for signals every PAGE_SIZE chunk of /dev/[u]random · e3c1c4fd

由 Jason A. Donenfeld 提交于 4月 06, 2022

In 1448769c ("random: check for signal_pending() outside of
need_resched() check"), Jann pointed out that we previously were only
checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
had TIF_NEED_RESCHED set, which meant in practice, super long reads to
/dev/[u]random would delay signal handling by a long time. I tried this
using the below program, and indeed I wasn't able to interrupt a
/dev/urandom read until after several megabytes had been read. The bug
he fixed has always been there, and so code that reads from /dev/urandom
without checking the return value of read() has mostly worked for a long
time, for most sizes, not just for <= 256.

Maybe it makes sense to keep that code working. The reason it was so
small prior, ignoring the fact that it didn't work anyway, was likely
because /dev/random used to block, and that could happen for pretty
large lengths of time while entropy was gathered. But now, it's just a
chacha20 call, which is extremely fast and is just operating on pure
data, without having to wait for some external event. In that sense,
/dev/[u]random is a lot more like /dev/zero.

Taking a page out of /dev/zero's read_zero() function, it always returns
at least one chunk, and then checks for signals after each chunk. Chunk
sizes there are of length PAGE_SIZE. Let's just copy the same thing for
/dev/[u]random, and check for signals and cond_resched() for every
PAGE_SIZE amount of data. This makes the behavior more consistent with
expectations, and should mitigate the impact of Jann's fix for the
age-old signal check bug.

---- test program ----

  #include <unistd.h>
  #include <signal.h>
  #include <stdio.h>
  #include <sys/random.h>

  static unsigned char x[~0U];

  static void handle(int) { }

  int main(int argc, char *argv[])
  {
    pid_t pid = getpid(), child;
    signal(SIGUSR1, handle);
    if (!(child = fork())) {
      for (;;)
        kill(pid, SIGUSR1);
    }
    pause();
    printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
    kill(child, SIGTERM);
    return 0;
  }

Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

e3c1c4fd

06 4月, 2022 2 次提交

random: check for signal_pending() outside of need_resched() check · 1448769c

由 Jann Horn 提交于 4月 05, 2022

signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
signal that the task should bail out of the syscall when possible. This
is a separate concept from need_resched(), which checks
TIF_NEED_RESCHED, signaling that the task should preempt.

In particular, with the current code, the signal_pending() bailout
probably won't work reliably.

Change this to look like other functions that read lots of data, such as
read_zero().

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

1448769c

random: do not allow user to keep crng key around on stack · aba120cc

由 Jason A. Donenfeld 提交于 4月 05, 2022

The fast key erasure RNG design relies on the key that's used to be used
and then discarded. We do this, making judicious use of
memzero_explicit().  However, reads to /dev/urandom and calls to
getrandom() involve a copy_to_user(), and userspace can use FUSE or
userfaultfd, or make a massive call, dynamically remap memory addresses
as it goes, and set the process priority to idle, in order to keep a
kernel stack alive indefinitely. By probing
/proc/sys/kernel/random/entropy_avail to learn when the crng key is
refreshed, a malicious userspace could mount this attack every 5 minutes
thereafter, breaking the crng's forward secrecy.

In order to fix this, we just overwrite the stack's key with the first
32 bytes of the "free" fast key erasure output. If we're returning <= 32
bytes to the user, then we can still return those bytes directly, so
that short reads don't become slower. And for long reads, the difference
is hopefully lost in the amortization, so it doesn't change much, with
that amortization helping variously for medium reads.

We don't need to do this for get_random_bytes() and the various
kernel-space callers, and later, if we ever switch to always batching,
this won't be necessary either, so there's no need to change the API of
these functions.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: NJann Horn <jannh@google.com>
Fixes: c92e040d ("random: add backtracking protection to the CRNG")
Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

aba120cc

05 4月, 2022 2 次提交

random: opportunistically initialize on /dev/urandom reads · 48bff105

由 Jason A. Donenfeld 提交于 4月 05, 2022

In 6f98a4bf ("random: block in /dev/urandom"), we tried to make a
successful try_to_generate_entropy() call *required* if the RNG was not
already initialized. Unfortunately, weird architectures and old
userspaces combined in TCG test harnesses, making that change still not
realistic, so it was reverted in 0313bc27 ("Revert "random: block in
/dev/urandom"").

However, rather than making a successful try_to_generate_entropy() call
*required*, we can instead make it *best-effort*.

If try_to_generate_entropy() fails, it fails, and nothing changes from
the current behavior. If it succeeds, then /dev/urandom becomes safe to
use for free. This way, we don't risk the regression potential that led
to us reverting the required-try_to_generate_entropy() call before.

Practically speaking, this means that at least on x86, /dev/urandom
becomes safe. Probably other architectures with working cycle counters
will also become safe. And architectures with slow or broken cycle
counters at least won't be affected at all by this change.

So it may not be the glorious "all things are unified!" change we were
hoping for initially, but practically speaking, it makes a positive
impact.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

48bff105

random: do not split fast init input in add_hwgenerator_randomness() · 527a9867

由 Jan Varho 提交于 4月 04, 2022

add_hwgenerator_randomness() tries to only use the required amount of input
for fast init, but credits all the entropy, rather than a fraction of
it. Since it's hard to determine how much entropy is left over out of a
non-unformly random sample, either give it all to fast init or credit
it, but don't attempt to do both. In the process, we can clean up the
injection code to no longer need to return a value.
Signed-off-by: NJan Varho <jan.varho@gmail.com>
[Jason: expanded commit message]
Fixes: 73c7733f ("random: do not throw away excess input to crng_fast_load")
Cc: stable@vger.kernel.org # 5.17+, requires af704c85Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

527a9867

01 4月, 2022 1 次提交

random: mix build-time latent entropy into pool at init · 1754abb3

由 Jason A. Donenfeld 提交于 3月 31, 2022

Prior, the "input_pool_data" array needed no real initialization, and so
it was easy to mark it with __latent_entropy to populate it during
compile-time. In switching to using a hash function, this required us to
specifically initialize it to some specific state, which means we
dropped the __latent_entropy attribute. An unfortunate side effect was
this meant the pool was no longer seeded using compile-time random data.
In order to bring this back, we declare an array in rand_initialize()
with __latent_entropy and call mix_pool_bytes() on that at init, which
accomplishes the same thing as before. We make this __initconst, so that
it doesn't take up space at runtime after init.

Fixes: 6e8ec255 ("random: use computational hash for entropy extraction")
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

1754abb3

25 3月, 2022 3 次提交

random: re-add removed comment about get_random_{u32,u64} reseeding · dd7aa36e

由 Jason A. Donenfeld 提交于 3月 22, 2022

The comment about get_random_{u32,u64}() not invoking reseeding got
added in an unrelated commit, that then was recently reverted by
0313bc27 ("Revert "random: block in /dev/urandom""). So this adds
that little comment snippet back, and improves the wording a bit too.
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

dd7aa36e

random: treat bootloader trust toggle the same way as cpu trust toggle · d97c68d1

由 Jason A. Donenfeld 提交于 3月 22, 2022

If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND.
But, the user can disable (or enable) this behavior by setting
`random.trust_cpu=0/1` on the kernel command line. This allows system
builders to do reasonable things while avoiding howls from tinfoil
hatters. (Or vice versa.)

CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards
the seed passed via EFI or device tree, which might come from RDRAND or
a TPM or somewhere else. In order to allow distros to more easily enable
this while avoiding those same howls (or vice versa), this commit adds
the corresponding `random.trust_bootloader=0/1` toggle.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Graham Christensen <graham@grahamc.com>
Reviewed-by: NArd Biesheuvel <ardb@kernel.org>
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Link: https://github.com/NixOS/nixpkgs/pull/165355Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

d97c68d1

random: skip fast_init if hwrng provides large chunk of entropy · af704c85

由 Jason A. Donenfeld 提交于 3月 21, 2022

At boot time, EFI calls add_bootloader_randomness(), which in turn calls
add_hwgenerator_randomness(). Currently add_hwgenerator_randomness()
feeds the first 64 bytes of randomness to the "fast init"
non-crypto-grade phase. But if add_hwgenerator_randomness() gets called
with more than POOL_MIN_BITS of entropy, there's no point in passing it
off to the "fast init" stage, since that's enough entropy to bootstrap
the real RNG. The "fast init" stage is just there to provide _something_
in the case where we don't have enough entropy to properly bootstrap the
RNG. But if we do have enough entropy to bootstrap the RNG, the current
logic doesn't serve a purpose. So, in the case where we're passed
greater than or equal to POOL_MIN_BITS of entropy, this commit makes us
skip the "fast init" phase.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

af704c85

23 3月, 2022 1 次提交

Revert "random: block in /dev/urandom" · 0313bc27

由 Linus Torvalds 提交于 3月 22, 2022

This reverts commit 6f98a4bf.

It turns out we still can't do this.  Way too many platforms that don't
have any real source of randomness at boot and no jitter entropy because
they don't even have a cycle counter.

As reported by Guenter Roeck:

 "This causes a large number of qemu boot test failures for various
  architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I
  observed).

  Common denominator is that boot hangs at 'Saving random seed:'"

This isn't hugely unexpected - we tried it, it failed, so now we'll
revert it.

Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/Reported-and-bisected-by: NGuenter Roeck <linux@roeck-us.net>
Cc: Jason Donenfeld <Jason@zx2c4.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0313bc27

13 3月, 2022 4 次提交

random: check for signal and try earlier when generating entropy · 3e504d20

由 Jason A. Donenfeld 提交于 3月 08, 2022

Rather than waiting a full second in an interruptable waiter before
trying to generate entropy, try to generate entropy first and wait
second. While waiting one second might give an extra second for getting
entropy from elsewhere, we're already pretty late in the init process
here, and whatever else is generating entropy will still continue to
contribute. This has implications on signal handling: we call
try_to_generate_entropy() from wait_for_random_bytes(), and
wait_for_random_bytes() always uses wait_event_interruptible_timeout()
when waiting, since it's called by userspace code in restartable
contexts, where signals can pend. Since try_to_generate_entropy() now
runs first, if a signal is pending, it's necessary for
try_to_generate_entropy() to check for signals, since it won't hit the
wait until after try_to_generate_entropy() has returned. And even before
this change, when entering a busy loop in try_to_generate_entropy(), we
should have been checking to see if any signals are pending, so that a
process doesn't get stuck in that loop longer than expected.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

3e504d20

random: reseed more often immediately after booting · 7a7ff644

由 Jason A. Donenfeld 提交于 3月 08, 2022

In order to chip away at the "premature first" problem, we augment our
existing entropy accounting with more frequent reseedings at boot.

The idea is that at boot, we're getting entropy from various places, and
we're not very sure which of early boot entropy is good and which isn't.
Even when we're crediting the entropy, we're still not totally certain
that it's any good. Since boot is the one time (aside from a compromise)
that we have zero entropy, it's important that we shepherd entropy into
the crng fairly often.

At the same time, we don't want a "premature next" problem, whereby an
attacker can brute force individual bits of added entropy. In lieu of
going full-on Fortuna (for now), we can pick a simpler strategy of just
reseeding more often during the first 5 minutes after boot. This is
still bounded by the 256-bit entropy credit requirement, so we'll skip a
reseeding if we haven't reached that, but in case entropy /is/ coming
in, this ensures that it makes its way into the crng rather rapidly
during these early stages.

Ordinarily we reseed if the previous reseeding is 300 seconds old. This
commit changes things so that for the first 600 seconds of boot time, we
reseed if the previous reseeding is uptime / 2 seconds old. That means
that we'll reseed at the very least double the uptime of the previous
reseeding.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

7a7ff644

random: make consistent usage of crng_ready() · a96cfe2d

由 Jason A. Donenfeld 提交于 3月 08, 2022

Rather than sometimes checking `crng_init < 2`, we should always use the
crng_ready() macro, so that should we change anything later, it's
consistent. Additionally, that macro already has a likely() around it,
which means we don't need to open code our own likely() and unlikely()
annotations.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

a96cfe2d

random: use SipHash as interrupt entropy accumulator · f5eab0e2

由 Jason A. Donenfeld 提交于 2月 11, 2022

The current fast_mix() function is a piece of classic mailing list
crypto, where it just sort of sprung up by an anonymous author without a
lot of real analysis of what precisely it was accomplishing. As an ARX
permutation alone, there are some easily searchable differential trails
in it, and as a means of preventing malicious interrupts, it completely
fails, since it xors new data into the entire state every time. It can't
really be analyzed as a random permutation, because it clearly isn't,
and it can't be analyzed as an interesting linear algebraic structure
either, because it's also not that. There really is very little one can
say about it in terms of entropy accumulation. It might diffuse bits,
some of the time, maybe, we hope, I guess. But for the most part, it
fails to accomplish anything concrete.

As a reminder, the simple goal of add_interrupt_randomness() is to
simply accumulate entropy until ~64 interrupts have elapsed, and then
dump it into the main input pool, which uses a cryptographic hash.

It would be nice to have something cryptographically strong in the
interrupt handler itself, in case a malicious interrupt compromises a
per-cpu fast pool within the 64 interrupts / 1 second window, and then
inside of that same window somehow can control its return address and
cycle counter, even if that's a bit far fetched. However, with a very
CPU-limited budget, actually doing that remains an active research
project (and perhaps there'll be something useful for Linux to come out
of it). And while the abundance of caution would be nice, this isn't
*currently* the security model, and we don't yet have a fast enough
solution to make it our security model. Plus there's not exactly a
pressing need to do that. (And for the avoidance of doubt, the actual
cluster of 64 accumulated interrupts still gets dumped into our
cryptographically secure input pool.)

So, for now we are going to stick with the existing interrupt security
model, which assumes that each cluster of 64 interrupt data samples is
mostly non-malicious and not colluding with an infoleaker. With this as
our goal, we have a few more choices, simply aiming to accumulate
entropy, while discarding the least amount of it.

We know from <https://eprint.iacr.org/2019/198> that random oracles,
instantiated as computational hash functions, make good entropy
accumulators and extractors, which is the justification for using
BLAKE2s in the main input pool. As mentioned, we don't have that luxury
here, but we also don't have the same security model requirements,
because we're assuming that there aren't malicious inputs. A
pseudorandom function instance can approximately behave like a random
oracle, provided that the key is uniformly random. But since we're not
concerned with malicious inputs, we can pick a fixed key, which is not
secret, knowing that "nature" won't interact with a sufficiently chosen
fixed key by accident. So we pick a PRF with a fixed initial key, and
accumulate into it continuously, dumping the result every 64 interrupts
into our cryptographically secure input pool.

For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on
32-bit, which are already in use in the kernel's hsiphash family of
functions and achieve the same performance as the function they replace.
It would be nice to do two rounds, but we don't exactly have the CPU
budget handy for that, and one round alone is already sufficient.

As mentioned, we start with a fixed initial key (zeros is fine), and
allow SipHash's symmetry breaking constants to turn that into a useful
starting point. Also, since we're dumping the result (or half of it on
64-bit so as to tax our hash function the same amount on all platforms)
into the cryptographically secure input pool, there's no point in
finalizing SipHash's output, since it'll wind up being finalized by
something much stronger. This means that all we need to do is use the
ordinary round function word-by-word, as normal SipHash does.
Simplified, the flow is as follows:

Initialize:

siphash_state_t state;
siphash_init(&state, key={0, 0, 0, 0});

Update (accumulate) on interrupt:

siphash_update(&state, interrupt_data_and_timing);

Dump into input pool after 64 interrupts:

blake2s_update(&input_pool, &state, sizeof(state) / 2);

The result of all of this is that the security model is unchanged from
before -- we assume non-malicious inputs -- yet we now implement that
model with a stronger argument. I would like to emphasize, again, that
the purpose of this commit is to improve the existing design, by making
it analyzable, without changing any fundamental assumptions. There may
well be value down the road in changing up the existing design, using
something cryptographically strong, or simply using a ring buffer of
samples rather than having a fast_mix() at all, or changing which and
how much data we collect each interrupt so that we can use something
linear, or a variety of other ideas. This commit does not invalidate the
potential for those in the future.

For example, in the future, if we're able to characterize the data we're
collecting on each interrupt, we may be able to inch toward information
theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s
= ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good
accumulators for 2-monotone distributions, which would apply to
timestamp counters, like random_get_entropy() or jiffies, but would not
apply to our current combination of the two values, or to the various
function addresses and register values we mix in. Alternatively,
<https://eprint.iacr.org/2021/1002> shows that max-period linear
functions with no non-trivial invariant subspace make good extractors,
used in the form `s = f(s) ^ x`. However, this only works if the input
data is both identical and independent, and obviously a collection of
address values and counters fails; so it goes with theoretical papers.
Future directions here may involve trying to characterize more precisely
what we actually need to collect in the interrupt handler, and building
something specific around that.

However, as mentioned, the morass of data we're gathering at the
interrupt handler presently defies characterization, and so we use
SipHash for now, which works well and performs well.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NJean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>

f5eab0e2

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功