- 16 4月, 2022 1 次提交
-
-
由 Jason A. Donenfeld 提交于
In order to immediately overwrite the old key on the stack, before servicing a userspace request for bytes, we use the remaining 32 bytes of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f -> 4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of memcpy(), this doesn't actually appear to be a problem in practice. But relying on that characteristic seems a bit brittle. So let's change that to a proper memmove(), which is the by-the-books way of handling overlapping memory copies. Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 13 4月, 2022 2 次提交
-
-
由 Jason A. Donenfeld 提交于
Some implementations were returning type `unsigned long`, while others that fell back to get_cycles() were implicitly returning a `cycles_t` or an untyped constant int literal. That makes for weird and confusing code, and basically all code in the kernel already handled it like it was an `unsigned long`. I recently tried to handle it as the largest type it could be, a `cycles_t`, but doing so doesn't really help with much. Instead let's just make random_get_entropy() return an unsigned long all the time. This also matches the commonly used `arch_get_random_long()` function, so now RDRAND and RDTSC return the same sized integer, which means one can fallback to the other more gracefully. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Rather than failing entirely if a copy_to_user() fails at some point, instead we should return a partial read for the amount that succeeded prior, unless none succeeded at all, in which case we return -EFAULT as before. This makes it consistent with other reader interfaces. For example, the following snippet for /dev/zero outputs "4" followed by "1": int fd; void *x = mmap(NULL, 4096, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); assert(x != MAP_FAILED); fd = open("/dev/zero", O_RDONLY); assert(fd >= 0); printf("%zd\n", read(fd, x, 4)); printf("%zd\n", read(fd, x + 4095, 4)); close(fd); This brings that same standard behavior to the various RNG reader interfaces. While we're at it, we can streamline the loop logic a little bit. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 07 4月, 2022 1 次提交
-
-
由 Jason A. Donenfeld 提交于
In 1448769c ("random: check for signal_pending() outside of need_resched() check"), Jann pointed out that we previously were only checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process had TIF_NEED_RESCHED set, which meant in practice, super long reads to /dev/[u]random would delay signal handling by a long time. I tried this using the below program, and indeed I wasn't able to interrupt a /dev/urandom read until after several megabytes had been read. The bug he fixed has always been there, and so code that reads from /dev/urandom without checking the return value of read() has mostly worked for a long time, for most sizes, not just for <= 256. Maybe it makes sense to keep that code working. The reason it was so small prior, ignoring the fact that it didn't work anyway, was likely because /dev/random used to block, and that could happen for pretty large lengths of time while entropy was gathered. But now, it's just a chacha20 call, which is extremely fast and is just operating on pure data, without having to wait for some external event. In that sense, /dev/[u]random is a lot more like /dev/zero. Taking a page out of /dev/zero's read_zero() function, it always returns at least one chunk, and then checks for signals after each chunk. Chunk sizes there are of length PAGE_SIZE. Let's just copy the same thing for /dev/[u]random, and check for signals and cond_resched() for every PAGE_SIZE amount of data. This makes the behavior more consistent with expectations, and should mitigate the impact of Jann's fix for the age-old signal check bug. ---- test program ---- #include <unistd.h> #include <signal.h> #include <stdio.h> #include <sys/random.h> static unsigned char x[~0U]; static void handle(int) { } int main(int argc, char *argv[]) { pid_t pid = getpid(), child; signal(SIGUSR1, handle); if (!(child = fork())) { for (;;) kill(pid, SIGUSR1); } pause(); printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0)); kill(child, SIGTERM); return 0; } Cc: Jann Horn <jannh@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 06 4月, 2022 2 次提交
-
-
由 Jann Horn 提交于
signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which signal that the task should bail out of the syscall when possible. This is a separate concept from need_resched(), which checks TIF_NEED_RESCHED, signaling that the task should preempt. In particular, with the current code, the signal_pending() bailout probably won't work reliably. Change this to look like other functions that read lots of data, such as read_zero(). Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
The fast key erasure RNG design relies on the key that's used to be used and then discarded. We do this, making judicious use of memzero_explicit(). However, reads to /dev/urandom and calls to getrandom() involve a copy_to_user(), and userspace can use FUSE or userfaultfd, or make a massive call, dynamically remap memory addresses as it goes, and set the process priority to idle, in order to keep a kernel stack alive indefinitely. By probing /proc/sys/kernel/random/entropy_avail to learn when the crng key is refreshed, a malicious userspace could mount this attack every 5 minutes thereafter, breaking the crng's forward secrecy. In order to fix this, we just overwrite the stack's key with the first 32 bytes of the "free" fast key erasure output. If we're returning <= 32 bytes to the user, then we can still return those bytes directly, so that short reads don't become slower. And for long reads, the difference is hopefully lost in the amortization, so it doesn't change much, with that amortization helping variously for medium reads. We don't need to do this for get_random_bytes() and the various kernel-space callers, and later, if we ever switch to always batching, this won't be necessary either, so there's no need to change the API of these functions. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NJann Horn <jannh@google.com> Fixes: c92e040d ("random: add backtracking protection to the CRNG") Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys") Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 05 4月, 2022 2 次提交
-
-
由 Jason A. Donenfeld 提交于
In 6f98a4bf ("random: block in /dev/urandom"), we tried to make a successful try_to_generate_entropy() call *required* if the RNG was not already initialized. Unfortunately, weird architectures and old userspaces combined in TCG test harnesses, making that change still not realistic, so it was reverted in 0313bc27 ("Revert "random: block in /dev/urandom""). However, rather than making a successful try_to_generate_entropy() call *required*, we can instead make it *best-effort*. If try_to_generate_entropy() fails, it fails, and nothing changes from the current behavior. If it succeeds, then /dev/urandom becomes safe to use for free. This way, we don't risk the regression potential that led to us reverting the required-try_to_generate_entropy() call before. Practically speaking, this means that at least on x86, /dev/urandom becomes safe. Probably other architectures with working cycle counters will also become safe. And architectures with slow or broken cycle counters at least won't be affected at all by this change. So it may not be the glorious "all things are unified!" change we were hoping for initially, but practically speaking, it makes a positive impact. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jan Varho 提交于
add_hwgenerator_randomness() tries to only use the required amount of input for fast init, but credits all the entropy, rather than a fraction of it. Since it's hard to determine how much entropy is left over out of a non-unformly random sample, either give it all to fast init or credit it, but don't attempt to do both. In the process, we can clean up the injection code to no longer need to return a value. Signed-off-by: NJan Varho <jan.varho@gmail.com> [Jason: expanded commit message] Fixes: 73c7733f ("random: do not throw away excess input to crng_fast_load") Cc: stable@vger.kernel.org # 5.17+, requires af704c85Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 01 4月, 2022 1 次提交
-
-
由 Jason A. Donenfeld 提交于
Prior, the "input_pool_data" array needed no real initialization, and so it was easy to mark it with __latent_entropy to populate it during compile-time. In switching to using a hash function, this required us to specifically initialize it to some specific state, which means we dropped the __latent_entropy attribute. An unfortunate side effect was this meant the pool was no longer seeded using compile-time random data. In order to bring this back, we declare an array in rand_initialize() with __latent_entropy and call mix_pool_bytes() on that at init, which accomplishes the same thing as before. We make this __initconst, so that it doesn't take up space at runtime after init. Fixes: 6e8ec255 ("random: use computational hash for entropy extraction") Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 25 3月, 2022 3 次提交
-
-
由 Jason A. Donenfeld 提交于
The comment about get_random_{u32,u64}() not invoking reseeding got added in an unrelated commit, that then was recently reverted by 0313bc27 ("Revert "random: block in /dev/urandom""). So this adds that little comment snippet back, and improves the wording a bit too. Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND. But, the user can disable (or enable) this behavior by setting `random.trust_cpu=0/1` on the kernel command line. This allows system builders to do reasonable things while avoiding howls from tinfoil hatters. (Or vice versa.) CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards the seed passed via EFI or device tree, which might come from RDRAND or a TPM or somewhere else. In order to allow distros to more easily enable this while avoiding those same howls (or vice versa), this commit adds the corresponding `random.trust_bootloader=0/1` toggle. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Graham Christensen <graham@grahamc.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Link: https://github.com/NixOS/nixpkgs/pull/165355Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
At boot time, EFI calls add_bootloader_randomness(), which in turn calls add_hwgenerator_randomness(). Currently add_hwgenerator_randomness() feeds the first 64 bytes of randomness to the "fast init" non-crypto-grade phase. But if add_hwgenerator_randomness() gets called with more than POOL_MIN_BITS of entropy, there's no point in passing it off to the "fast init" stage, since that's enough entropy to bootstrap the real RNG. The "fast init" stage is just there to provide _something_ in the case where we don't have enough entropy to properly bootstrap the RNG. But if we do have enough entropy to bootstrap the RNG, the current logic doesn't serve a purpose. So, in the case where we're passed greater than or equal to POOL_MIN_BITS of entropy, this commit makes us skip the "fast init" phase. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 23 3月, 2022 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 6f98a4bf. It turns out we still can't do this. Way too many platforms that don't have any real source of randomness at boot and no jitter entropy because they don't even have a cycle counter. As reported by Guenter Roeck: "This causes a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'" This isn't hugely unexpected - we tried it, it failed, so now we'll revert it. Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/Reported-and-bisected-by: NGuenter Roeck <linux@roeck-us.net> Cc: Jason Donenfeld <Jason@zx2c4.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 3月, 2022 11 次提交
-
-
由 Jason A. Donenfeld 提交于
Rather than waiting a full second in an interruptable waiter before trying to generate entropy, try to generate entropy first and wait second. While waiting one second might give an extra second for getting entropy from elsewhere, we're already pretty late in the init process here, and whatever else is generating entropy will still continue to contribute. This has implications on signal handling: we call try_to_generate_entropy() from wait_for_random_bytes(), and wait_for_random_bytes() always uses wait_event_interruptible_timeout() when waiting, since it's called by userspace code in restartable contexts, where signals can pend. Since try_to_generate_entropy() now runs first, if a signal is pending, it's necessary for try_to_generate_entropy() to check for signals, since it won't hit the wait until after try_to_generate_entropy() has returned. And even before this change, when entering a busy loop in try_to_generate_entropy(), we should have been checking to see if any signals are pending, so that a process doesn't get stuck in that loop longer than expected. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
In order to chip away at the "premature first" problem, we augment our existing entropy accounting with more frequent reseedings at boot. The idea is that at boot, we're getting entropy from various places, and we're not very sure which of early boot entropy is good and which isn't. Even when we're crediting the entropy, we're still not totally certain that it's any good. Since boot is the one time (aside from a compromise) that we have zero entropy, it's important that we shepherd entropy into the crng fairly often. At the same time, we don't want a "premature next" problem, whereby an attacker can brute force individual bits of added entropy. In lieu of going full-on Fortuna (for now), we can pick a simpler strategy of just reseeding more often during the first 5 minutes after boot. This is still bounded by the 256-bit entropy credit requirement, so we'll skip a reseeding if we haven't reached that, but in case entropy /is/ coming in, this ensures that it makes its way into the crng rather rapidly during these early stages. Ordinarily we reseed if the previous reseeding is 300 seconds old. This commit changes things so that for the first 600 seconds of boot time, we reseed if the previous reseeding is uptime / 2 seconds old. That means that we'll reseed at the very least double the uptime of the previous reseeding. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Rather than sometimes checking `crng_init < 2`, we should always use the crng_ready() macro, so that should we change anything later, it's consistent. Additionally, that macro already has a likely() around it, which means we don't need to open code our own likely() and unlikely() annotations. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
The current fast_mix() function is a piece of classic mailing list crypto, where it just sort of sprung up by an anonymous author without a lot of real analysis of what precisely it was accomplishing. As an ARX permutation alone, there are some easily searchable differential trails in it, and as a means of preventing malicious interrupts, it completely fails, since it xors new data into the entire state every time. It can't really be analyzed as a random permutation, because it clearly isn't, and it can't be analyzed as an interesting linear algebraic structure either, because it's also not that. There really is very little one can say about it in terms of entropy accumulation. It might diffuse bits, some of the time, maybe, we hope, I guess. But for the most part, it fails to accomplish anything concrete. As a reminder, the simple goal of add_interrupt_randomness() is to simply accumulate entropy until ~64 interrupts have elapsed, and then dump it into the main input pool, which uses a cryptographic hash. It would be nice to have something cryptographically strong in the interrupt handler itself, in case a malicious interrupt compromises a per-cpu fast pool within the 64 interrupts / 1 second window, and then inside of that same window somehow can control its return address and cycle counter, even if that's a bit far fetched. However, with a very CPU-limited budget, actually doing that remains an active research project (and perhaps there'll be something useful for Linux to come out of it). And while the abundance of caution would be nice, this isn't *currently* the security model, and we don't yet have a fast enough solution to make it our security model. Plus there's not exactly a pressing need to do that. (And for the avoidance of doubt, the actual cluster of 64 accumulated interrupts still gets dumped into our cryptographically secure input pool.) So, for now we are going to stick with the existing interrupt security model, which assumes that each cluster of 64 interrupt data samples is mostly non-malicious and not colluding with an infoleaker. With this as our goal, we have a few more choices, simply aiming to accumulate entropy, while discarding the least amount of it. We know from <https://eprint.iacr.org/2019/198> that random oracles, instantiated as computational hash functions, make good entropy accumulators and extractors, which is the justification for using BLAKE2s in the main input pool. As mentioned, we don't have that luxury here, but we also don't have the same security model requirements, because we're assuming that there aren't malicious inputs. A pseudorandom function instance can approximately behave like a random oracle, provided that the key is uniformly random. But since we're not concerned with malicious inputs, we can pick a fixed key, which is not secret, knowing that "nature" won't interact with a sufficiently chosen fixed key by accident. So we pick a PRF with a fixed initial key, and accumulate into it continuously, dumping the result every 64 interrupts into our cryptographically secure input pool. For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on 32-bit, which are already in use in the kernel's hsiphash family of functions and achieve the same performance as the function they replace. It would be nice to do two rounds, but we don't exactly have the CPU budget handy for that, and one round alone is already sufficient. As mentioned, we start with a fixed initial key (zeros is fine), and allow SipHash's symmetry breaking constants to turn that into a useful starting point. Also, since we're dumping the result (or half of it on 64-bit so as to tax our hash function the same amount on all platforms) into the cryptographically secure input pool, there's no point in finalizing SipHash's output, since it'll wind up being finalized by something much stronger. This means that all we need to do is use the ordinary round function word-by-word, as normal SipHash does. Simplified, the flow is as follows: Initialize: siphash_state_t state; siphash_init(&state, key={0, 0, 0, 0}); Update (accumulate) on interrupt: siphash_update(&state, interrupt_data_and_timing); Dump into input pool after 64 interrupts: blake2s_update(&input_pool, &state, sizeof(state) / 2); The result of all of this is that the security model is unchanged from before -- we assume non-malicious inputs -- yet we now implement that model with a stronger argument. I would like to emphasize, again, that the purpose of this commit is to improve the existing design, by making it analyzable, without changing any fundamental assumptions. There may well be value down the road in changing up the existing design, using something cryptographically strong, or simply using a ring buffer of samples rather than having a fast_mix() at all, or changing which and how much data we collect each interrupt so that we can use something linear, or a variety of other ideas. This commit does not invalidate the potential for those in the future. For example, in the future, if we're able to characterize the data we're collecting on each interrupt, we may be able to inch toward information theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s = ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good accumulators for 2-monotone distributions, which would apply to timestamp counters, like random_get_entropy() or jiffies, but would not apply to our current combination of the two values, or to the various function addresses and register values we mix in. Alternatively, <https://eprint.iacr.org/2021/1002> shows that max-period linear functions with no non-trivial invariant subspace make good extractors, used in the form `s = f(s) ^ x`. However, this only works if the input data is both identical and independent, and obviously a collection of address values and counters fails; so it goes with theoretical papers. Future directions here may involve trying to characterize more precisely what we actually need to collect in the interrupt handler, and building something specific around that. However, as mentioned, the morass of data we're gathering at the interrupt handler presently defies characterization, and so we use SipHash for now, which works well and performs well. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NJean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Drivers such as WireGuard need to learn when VMs fork in order to clear sessions. This commit provides a simple notifier_block for that, with a register and unregister function. When no VM fork detection is compiled in, this turns into a no-op, similar to how the power notifier works. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
We previously rolled our own randomness readiness notifier, which only has two users in the whole kernel. Replace this with a more standard atomic notifier block that serves the same purpose with less code. Also unexport the symbols, because no modules use it, only unconditional builtins. The only drawback is that it's possible for a notification handler returning the "stop" code to prevent further processing, but given that there are only two users, and that we're unexporting this anyway, that doesn't seem like a significant drawback for the simplification we receive here. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Since add_vmfork_randomness() is only called from vmgenid.o, we can guard it in CONFIG_VMGENID, similarly to how we do with add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things calling into add_vmfork_randomness(), we can add another shared Kconfig symbol for that, but for now, this is good enough. Even though add_vmfork_randomess() is a pretty small function, removing it means that there are only calls to crng_reseed(false) and none to crng_reseed(true), which means the compiler can constant propagate the false, removing branches from crng_reseed() and its descendants. Additionally, we don't even need the symbol to be exported if CONFIG_VMGENID is not a module, so conditionalize that too. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
When a VM forks, we must immediately mix in additional information to the stream of random output so that two forks or a rollback don't produce the same stream of random numbers, which could have catastrophic cryptographic consequences. This commit adds a simple API, add_vmfork_ randomness(), for that, by force reseeding the crng. This has the added benefit of also draining the entropy pool and setting its timer back, so that any old entropy that was there prior -- which could have already been used by a different fork, or generally gone stale -- does not contribute to the accounting of the next 256 bits. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jann Horn <jannh@google.com> Cc: Eric Biggers <ebiggers@google.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
We leave around these old sysctls for compatibility, and we keep them "writable" for compatibility, but even after writing, we should keep reporting the same value. This is consistent with how userspaces tend to use sysctl_random_write_wakeup_bits, writing to it, and then later reading from it and using the value. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This isn't used by anything or anywhere, but we can't delete it due to compatibility. So at least give it the correct value of what it's supposed to be instead of a garbage one. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This topic has come up countless times, and usually doesn't go anywhere. This time I thought I'd bring it up with a slightly narrower focus, updated for some developments over the last three years: we finally can make /dev/urandom always secure, in light of the fact that our RNG is now always seeded. Ever since Linus' 50ee7529 ("random: try to actively add entropy rather than passively wait for it"), the RNG does a haveged-style jitter dance around the scheduler, in order to produce entropy (and credit it) for the case when we're stuck in wait_for_random_bytes(). How ever you feel about the Linus Jitter Dance is beside the point: it's been there for three years and usually gets the RNG initialized in a second or so. As a matter of fact, this is what happens currently when people use getrandom(). It's already there and working, and most people have been using it for years without realizing. So, given that the kernel has grown this mechanism for seeding itself from nothing, and that this procedure happens pretty fast, maybe there's no point any longer in having /dev/urandom give insecure bytes. In the past we didn't want the boot process to deadlock, which was understandable. But now, in the worst case, a second goes by, and the problem is resolved. It seems like maybe we're finally at a point when we can get rid of the infamous "urandom read hole". The one slight drawback is that the Linus Jitter Dance relies on random_ get_entropy() being implemented. The first lines of try_to_generate_ entropy() are: stack.now = random_get_entropy(); if (stack.now == random_get_entropy()) return; On most platforms, random_get_entropy() is simply aliased to get_cycles(). The number of machines without a cycle counter or some other implementation of random_get_entropy() in 2022, which can also run a mainline kernel, and at the same time have a both broken and out of date userspace that relies on /dev/urandom never blocking at boot is thought to be exceedingly low. And to be clear: those museum pieces without cycle counters will continue to run Linux just fine, and even /dev/urandom will be operable just like before; the RNG just needs to be seeded first through the usual means, which should already be the case now. On systems that really do want unseeded randomness, we already offer getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding their hash tables at boot. Nothing in this commit would affect GRND_INSECURE, and it remains the means of getting those types of random numbers. This patch goes a long way toward eliminating a long overdue userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities eliminated. With very minimal downside, we're finally in a position where we can make this change. Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Borislav Petkov <bp@alien8.de> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Joshua Kinard <kumba@gentoo.org> Cc: David Laight <David.Laight@aculab.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 28 2月, 2022 3 次提交
-
-
由 Jason A. Donenfeld 提交于
Taking spinlocks from IRQ context is generally problematic for PREEMPT_RT. That is, in part, why we take trylocks instead. However, a spin_try_lock() is also problematic since another spin_lock() invocation can potentially PI-boost the wrong task, as the spin_try_lock() is invoked from an IRQ-context, so the task on CPU (random task or idle) is not the actual owner. Additionally, by deferring the crng pre-init loading to the worker, we can use the cryptographic hash function rather than xor, which is perhaps a meaningful difference when considering this data has only been through the relatively weak fast_mix() function. The biggest downside of this approach is that the pre-init loading is now deferred until later, which means things that need random numbers after interrupts are enabled, but before workqueues are running -- or before this particular worker manages to run -- are going to get into trouble. Hopefully in the real world, this window is rather small, especially since this code won't run until 64 interrupts had occurred. Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
random_get_entropy() returns a cycles_t, not an unsigned long, which is sometimes 64 bits on various 32-bit platforms, including x86. Conversely, jiffies is always unsigned long. This commit fixes things to use cycles_t for fields that use random_get_entropy(), named "cycles", and unsigned long for fields that use jiffies, named "now". It's also good to mix in a cycles_t and a jiffies in the same way for both add_device_randomness and add_timer_randomness, rather than using xor in one case. Finally, we unify the order of these volatile reads, always reading the more precise cycles counter, and then jiffies, so that the cycle counter is as close to the event as possible. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Rather than hard coding various lengths, we can use the right constants. Strings should be `char *` while buffers should be `u8 *`. Rather than have a nonsensical and unused maxlength, just remove it. Finally, use snprintf instead of sprintf, just out of good hygiene. As well, remove the old comment about returning a binary UUID via the binary sysctl syscall. That syscall was removed from the kernel in 5.5, and actually, the "uuid_strategy" function and related infrastructure for even serving it via the binary sysctl syscall was removed with 894d2491 ("sysctl drivers: Remove dead binary sysctl support") back in 2.6.33. Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 24 2月, 2022 2 次提交
-
-
由 Jason A. Donenfeld 提交于
The only time that we need to wake up /dev/random writers on RNDCLEARPOOL/RNDZAPPOOL is when we're changing from a value that is greater than or equal to POOL_MIN_BITS to zero, because if we're changing from below POOL_MIN_BITS to zero, the writers are already unblocked. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
When the interrupt handler does not have a valid cycle counter, it calls get_reg() to read a register from the irq stack, in round-robin. Currently it does this assuming that registers are 32-bit. This is _probably_ the case, and probably all platforms without cycle counters are in fact 32-bit platforms. But maybe not, and either way, it's not quite correct. This commit fixes that to deal with `unsigned long` rather than `u32`. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
- 22 2月, 2022 11 次提交
-
-
由 Jason A. Donenfeld 提交于
For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire irq handler, simply take care of the extreme edge case of resetting count to zero in the cpuhp online handler, just after workqueues have been reenabled. This simplifies the code a bit and lets us use vanilla variables rather than atomics, and performance should be improved. As well, very early on when the CPU comes up, while interrupts are still disabled, we clear out the per-cpu crng and its batches, so that it always starts with fresh randomness. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This has no real functional change, as crng_pre_init_inject() (and before that, crng_slow_init()) always checks for == 0, not >= 2. So correct the outer unlocked change to reflect that. Before this used crng_ready(), which was not correct. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
crng_fast_load() and crng_slow_load() have different semantics: - crng_fast_load() xors and accounts with crng_init_cnt. - crng_slow_load() hashes and doesn't account. However add_hwgenerator_randomness() can afford to hash (it's called from a kthread), and it should account. Additionally, ones that can afford to hash don't need to take a trylock but can take a normal lock. So, we combine these into one function, crng_pre_init_inject(), which allows us to control these in a uniform way. This will make it simpler later to simplify this all down when the time comes for that. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Since rand_initialize() is run while interrupts are still off and nothing else is running, we don't need to repeatedly take and release the pool spinlock, especially in the RDSEED loop. Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
On PREEMPT_RT, it's problematic to take spinlocks from hard irq handlers. We can fix this by deferring to a workqueue the dumping of the fast pool into the input pool. We accomplish this with some careful rules on fast_pool->count: - When it's incremented to >= 64, we schedule the work. - If the top bit is set, we never schedule the work, even if >= 64. - The worker is responsible for setting it back to 0 when it's done. There are two small issues around using workqueues for this purpose that we work around. The first issue is that mix_interrupt_randomness() might be migrated to another CPU during CPU hotplug. This issue is rectified by checking that it hasn't been migrated (after disabling irqs). If it has been migrated, then we set the count to zero, so that when the CPU comes online again, it can requeue the work. As part of this, we switch to using an atomic_t, so that the increment in the irq handler doesn't wipe out the zeroing if the CPU comes back online while this worker is running. The second issue is that, though relatively minor in effect, we probably want to make sure we get a consistent view of the pool onto the stack, in case it's interrupted by an irq while reading. To do this, we don't reenable irqs until after the copy. There are only 18 instructions between the cli and sti, so this is a pretty tiny window. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Acked-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NSultan Alsawaf <sultan@kerneltoast.com> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
Now that we've re-documented the various sections, we can remove the outdated text here and replace it with a high-level overview. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This pulls all of the sysctl-focused functions into the sixth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This pulls all of the userspace read/write-focused functions into the fifth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This pulls all of the entropy collection-focused functions into the fourth labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This pulls all of the entropy extraction-focused functions into the third labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-
由 Jason A. Donenfeld 提交于
This pulls all of the crng-focused functions into the second labeled section. No functional changes. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: NDominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
-