1. 24 Aug, 2018 (1 commit)
    • getxattr: use correct xattr length · 82c9a927
      Christian Brauner committed
      When running in a container with a user namespace, if you call getxattr
      with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
      silently skips the user namespace fixup that it normally does resulting in
      un-fixed-up data being returned.
      This is caused by posix_acl_fix_xattr_to_user() being passed the total
      buffer size and not the actual size of the xattr as returned by
      vfs_getxattr().
      This commit passes the actual length of the xattr as returned by
      vfs_getxattr() down.
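Why the buffer size matters can be sketched in userspace. This is a minimal model, assuming the POSIX ACL xattr layout of a 4-byte version header followed by 8-byte entries. `acl_xattr_count()` and the struct names are hypothetical, modeled on the kernel's `posix_acl_xattr_count()` check rather than copied from it. A fixup that validates its length this way silently does nothing when it is handed a size that is not exactly one header plus whole entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the POSIX ACL xattr layout. */
struct acl_xattr_header {
        uint32_t a_version;
};

struct acl_xattr_entry {
        uint16_t e_tag;
        uint16_t e_perm;
        uint32_t e_id;
};

/*
 * Number of ACL entries described by 'size' bytes, or -1 when 'size' is
 * not exactly one header plus a whole number of 8-byte entries.
 */
static int acl_xattr_count(size_t size)
{
        if (size < sizeof(struct acl_xattr_header))
                return -1;
        size -= sizeof(struct acl_xattr_header);
        if (size % sizeof(struct acl_xattr_entry))
                return -1;
        return (int)(size / sizeof(struct acl_xattr_entry));
}
```

With the reproducer's buffers, `acl_xattr_count(128)` is -1, so a fixup driven by the 128-byte buffer size is skipped, while 132 happens to pass the check. The real xattr length is by construction a header plus whole entries (44 bytes for a five-entry ACL, for example), so passing the length returned by vfs_getxattr() makes the check reliable.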
      
      A reproducer for the issue is:
      
        touch acl_posix
      
        setfacl -m user:0:rwx acl_posix
      
      and then compile:
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <unistd.h>
        #include <sys/xattr.h>
      
        /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
        int main(int argc, char **argv)
        {
                ssize_t ret1, ret2;
                char buf1[128], buf2[132];
                int fret = EXIT_SUCCESS;
                char *file;
      
                if (argc < 2) {
                        fprintf(stderr,
                                "Please specify a file with "
                                "\"system.posix_acl_access\" permissions set\n");
                        _exit(EXIT_FAILURE);
                }
                file = argv[1];
      
                ret1 = getxattr(file, "system.posix_acl_access",
                                buf1, sizeof(buf1));
                if (ret1 < 0) {
                        fprintf(stderr, "%s - Failed to retrieve "
                                        "\"system.posix_acl_access\" "
                                        "from \"%s\"\n", strerror(errno), file);
                        _exit(EXIT_FAILURE);
                }
      
                ret2 = getxattr(file, "system.posix_acl_access",
                                buf2, sizeof(buf2));
                if (ret2 < 0) {
                        fprintf(stderr, "%s - Failed to retrieve "
                                        "\"system.posix_acl_access\" "
                                        "from \"%s\"\n", strerror(errno), file);
                        _exit(EXIT_FAILURE);
                }
      
                if (ret1 != ret2) {
                        fprintf(stderr, "The value of \"system.posix_acl_"
                                        "access\" for file \"%s\" changed "
                                        "between two successive calls\n", file);
                        _exit(EXIT_FAILURE);
                }
      
                for (ssize_t i = 0; i < ret2; i++) {
                        if (buf1[i] == buf2[i])
                                continue;
      
                        fprintf(stderr,
                                "Unexpected different in byte %zd: "
                                "%02x != %02x\n", i, buf1[i], buf2[i]);
                        fret = EXIT_FAILURE;
                }
      
                if (fret == EXIT_SUCCESS)
                        fprintf(stderr, "Test passed\n");
                else
                        fprintf(stderr, "Test failed\n");
      
                _exit(fret);
        }
      and run:
      
        ./tester acl_posix
      
      On a non-fixed up kernel this should return something like:
      
        root@c1:/# ./t
        Unexpected different in byte 16: ffffffa0 != 00
        Unexpected different in byte 17: ffffff86 != 00
        Unexpected different in byte 18: 01 != 00
      
      and on a fixed kernel:
      
        root@c1:~# ./t
        Test passed
      
      Cc: stable@vger.kernel.org
      Fixes: 2f6f0654 ("userns: Convert vfs posix_acl support to use kuids and kgids")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945
      Reported-by: Colin Watson <cjwatson@ubuntu.com>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  2. 11 Aug, 2018 (3 commits)
    • sys: don't hold uts_sem while accessing userspace memory · 42a0cc34
      Jann Horn committed
      Holding uts_sem as a writer while accessing userspace memory allows a
      namespace admin to stall all processes that attempt to take uts_sem.
      Instead, move data through stack buffers and don't access userspace memory
      while uts_sem is held.
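The reordering can be sketched as a userspace analogue. This assumes a pthread mutex standing in for uts_sem and a plain `memcpy()` callback standing in for copy_to_user()/copy_from_user(); `get_name()`, `set_name()`, `shared_name` and `NAME_LEN` are hypothetical. The only point it illustrates is ordering: the lock is held solely around copies between kernel-side buffers, which cannot fault or stall.

```c
#include <pthread.h>
#include <string.h>

#define NAME_LEN 65     /* hypothetical; sized like a UTS name field */

static pthread_mutex_t uts_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_name[NAME_LEN] = "localhost";

/* Trivial copy helper used in place of copy_to_user()/copy_from_user(). */
static int plain_copy(char *dst, const char *src, size_t len)
{
        memcpy(dst, src, len);
        return 0;
}

/* Read path: snapshot into a stack buffer under the lock, then hand the
 * snapshot to the possibly slow 'copy_out' with the lock dropped. */
static int get_name(char *dst, size_t len,
                    int (*copy_out)(char *, const char *, size_t))
{
        char tmp[NAME_LEN];

        pthread_mutex_lock(&uts_lock);
        memcpy(tmp, shared_name, sizeof(tmp));
        pthread_mutex_unlock(&uts_lock);

        return copy_out(dst, tmp, len < sizeof(tmp) ? len : sizeof(tmp));
}

/* Write path: pull the new value into a stack buffer first, then take the
 * lock only for a short memcpy() that cannot block. */
static int set_name(const char *src, size_t len,
                    int (*copy_in)(char *, const char *, size_t))
{
        char tmp[NAME_LEN];

        if (len >= sizeof(tmp))
                return -1;
        if (copy_in(tmp, src, len))
                return -1;
        tmp[len] = '\0';

        pthread_mutex_lock(&uts_lock);
        memcpy(shared_name, tmp, len + 1);
        pthread_mutex_unlock(&uts_lock);
        return 0;
}
```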
      
      Cc: stable@vger.kernel.org
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • userns: move user access out of the mutex · 5820f140
      Jann Horn committed
      The old code would hold the userns_state_mutex indefinitely if
      memdup_user_nul stalled due to e.g. a userfault region. Prevent that by
      moving the memdup_user_nul in front of the mutex_lock().
      
      Note: This changes the error precedence of invalid buf/count/*ppos vs
      map already written / capabilities missing.
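A userspace sketch of the reordering, assuming a pthread mutex for userns_state_mutex, a NULL pointer as the stand-in for an unreadable user buffer, and hypothetical helpers `dup_nul()` and `write_map()`. Returning -EPERM for an already-written map is an assumption for illustration. Because the copy now happens before the mutex is taken, a bad buffer is reported as -EFAULT even when the map has already been written, which is the error-precedence change noted above.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for userns_state_mutex and "map already written". */
static pthread_mutex_t state_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool map_written;

/* Analogue of memdup_user_nul(): copy 'count' bytes and NUL-terminate.
 * NULL input (an "unreadable" buffer) and allocation failure both yield
 * NULL; write_map() reports either as -EFAULT for brevity. */
static char *dup_nul(const char *src, size_t count)
{
        char *p;

        if (!src)
                return NULL;
        p = malloc(count + 1);
        if (!p)
                return NULL;
        memcpy(p, src, count);
        p[count] = '\0';
        return p;
}

static int write_map(const char *buf, size_t count)
{
        /* Copy first: a slow or faulting source can no longer stall
         * everyone else waiting for state_mutex. */
        char *kbuf = dup_nul(buf, count);
        int ret = 0;

        if (!kbuf)
                return -EFAULT;

        pthread_mutex_lock(&state_mutex);
        if (map_written)
                ret = -EPERM;           /* the map may only be written once */
        else
                map_written = true;
        pthread_mutex_unlock(&state_mutex);

        free(kbuf);
        return ret;
}
```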
      
      Fixes: 22d917d8 ("userns: Rework the user_namespace adding uid/gid...")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias() · 355139a8
      Eddie.Horng committed
      The code in cap_inode_getsecurity(), introduced by commit 8db6c34f
      ("Introduce v3 namespaced file capabilities"), should use
      d_find_any_alias() instead of d_find_alias() to handle unhashed dentries
      correctly. This is needed, for example, if execveat() is called with an
      open but unlinked overlayfs file, because overlayfs unhashes the dentry
      on unlink.
      This is a regression affecting a real-life application, first reported at
      https://www.spinics.net/lists/linux-unionfs/msg05363.html
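The difference between the two lookups can be shown with a deliberately simplified toy model, assuming an inode with a small fixed-size alias array. `toy_find_alias()` and `toy_find_any_alias()` are hypothetical and ignore the real functions' locking, reference counting and preference rules. Once the only alias has been unhashed by unlink, the hashed-only lookup finds nothing, which is the situation cap_inode_getsecurity() ran into.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: an inode keeps a small list of dentry aliases. */
struct toy_dentry {
        const char *name;
        bool hashed;            /* false once the name has been unhashed */
};

struct toy_inode {
        struct toy_dentry *aliases[4];
        size_t n_aliases;
};

/* In the spirit of d_find_alias(): only a hashed alias qualifies. */
static struct toy_dentry *toy_find_alias(const struct toy_inode *inode)
{
        for (size_t i = 0; i < inode->n_aliases; i++)
                if (inode->aliases[i]->hashed)
                        return inode->aliases[i];
        return NULL;
}

/* In the spirit of d_find_any_alias(): any alias will do. */
static struct toy_dentry *toy_find_any_alias(const struct toy_inode *inode)
{
        return inode->n_aliases ? inode->aliases[0] : NULL;
}
```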
      
      The reproducer and setup below trigger the issue:
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        int main(void)
        {
          const char *exec = "echo";
          const char *newargv[] = { "echo", "hello", NULL };
          const char *newenviron[] = { NULL };
          int fd = open(exec, O_PATH);

          unlink(exec);
          /* SYS_execveat is syscall 322 on x86_64 */
          if (syscall(SYS_execveat, fd, "", newargv, newenviron,
                      AT_EMPTY_PATH) < 0)
            perror("execveat");
          return 1;
        }
      
      Compile with gcc into ~/test/a.out, then:
      mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w none /mnt/m
      cd /mnt/m
      cp /bin/echo .
      ~/test/a.out
      
      Expected result:
      hello
      Actual result:
      execveat: Invalid argument
      dmesg:
      Invalid argument reading file caps for /dev/fd/3
      
      A second reproducer and setup emulate a similar case on a
      regular filesystem:
        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/xattr.h>
        #include <unistd.h>

        int main(void)
        {
          const char *exec = "echo";
          char buf[256];
          int fd = open(exec, O_RDONLY);

          unlink(exec);
          if (fgetxattr(fd, "security.capability", buf, sizeof(buf)) < 0)
            perror("fgetxattr");
          return 0;
        }
      
      Compile with gcc into ~/test_fgetxattr, then:
      
      cd /tmp
      cp /bin/echo .
      ~/test_fgetxattr
      
      Result:
      fgetxattr: Invalid argument
      
      On a regular filesystem such as ext4, the xattr is read from disk and
      returned to execveat() without triggering this issue; the overlayfs
      xattr handler, however, passes the real dentry to vfs_getxattr(), which
      does trigger it. This reproducer calls fgetxattr() on an unlinked fd,
      which invokes vfs_getxattr() and reproduces the case where
      d_find_alias() in cap_inode_getsecurity() cannot find the unlinked dentry.
      Suggested-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Serge E. Hallyn <serge@hallyn.com>
      Fixes: 8db6c34f ("Introduce v3 namespaced file capabilities")
      Cc: <stable@vger.kernel.org> # v4.14
      Signed-off-by: Eddie Horng <eddie.horng@mediatek.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  3. 24 Jun, 2018 (18 commits)
    • Linux 4.18-rc2 · 7daf201d
      Linus Torvalds committed
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c81b995f
      Linus Torvalds committed
      Pull perf fixes from Thomas Gleixner:
       "A pile of perf updates:
      
        Kernel side:
      
         - Remove an incorrect warning in uprobe_init_insn() when
           insn_get_length() fails. The error return code is handled at the
           call site.
      
         - Move the inline keyword to the right place in the perf ringbuffer
           code to address a W=1 build warning.
      
        Tooling:
      
        perf stat:
      
         - Fix metric column header display alignment
      
         - Improve error messages for default attributes, providing better
           output for errors on the command line.
      
         - Add --interval-clear option, to provide 'watch'-like printing
      
        perf script:
      
         - Show hw-cache events too
      
        perf c2c:
      
         - Fix data dependency problem in layout of 'struct c2c_hist_entry'
      
        Core:
      
         - Do not blindly assume that 'struct perf_evsel' can be obtained via
           a straightforward container_of() as there are call sites which
           hand in a plain 'struct hist' which is not part of a container.
      
         - Fix error index in the PMU event parser, so that error messages can
           point to the problematic token"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Move the inline keyword at the beginning of the function declaration
        uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
        perf script: Show hw-cache events
        perf c2c: Keep struct hist_entry at the end of struct c2c_hist_entry
        perf stat: Add event parsing error handling to add_default_attributes
        perf stat: Allow to specify specific metric column len
        perf stat: Fix metric column header display alignment
        perf stat: Use only color_fprintf call in print_metric_only
        perf stat: Add --interval-clear option
        perf tools: Fix error index for pmu event parser
        perf hists: Reimplement hists__has_callchains()
        perf hists browser gtk: Use hist_entry__has_callchains()
        perf hists: Make hist_entry__has_callchains() work with 'perf c2c'
        perf hists: Save the callchain_size in struct hist_entry
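The container_of() pitfall named in the core item can be illustrated with hypothetical, heavily simplified types; these are not perf's real structures. The pointer arithmetic in container_of() is only valid when the member really is embedded in the enclosing type; for a standalone object it yields a bogus pointer. Keeping the needed value in the object itself, as `has_callchains()` does here, avoids the assumption. The shortlog entry about saving callchain_size in struct hist_entry suggests a similar direction, though the real change may differ in detail.

```c
#include <stddef.h>

/* container_of(): recover the enclosing struct from a pointer to a member.
 * Only valid if 'ptr' really points into an instance of 'type'. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified types for illustration only. */
struct hists {
        int nr_entries;
        int callchain_size;     /* stored directly, usable for any hists */
};

struct evsel_with_hists {
        int sample_type;
        struct hists hists;
};

/* Safe only when the caller guarantees 'h' is embedded in an evsel. */
static int sample_type_of(struct hists *h)
{
        return container_of(h, struct evsel_with_hists, hists)->sample_type;
}

/* Works for standalone and embedded hists alike. */
static int has_callchains(const struct hists *h)
{
        return h->callchain_size != 0;
}
```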
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2ce413ec
      Linus Torvalds committed
      Pull rseq fixes from Thomas Gleixer:
       "A pile of rseq related fixups:
      
         - Prevent infinite recursion when delivering SIGSEGV
      
         - Remove the abort of rseq critical section on fork() as syscalls
           inside rseq critical sections are explicitly forbidden, so there
           is no point in doing the abort in the child.
      
         - Align the rseq structure on 32 bytes in the ARM selftest code.
      
         - Fix file permissions of the test script"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rseq: Avoid infinite recursion when delivering SIGSEGV
        rseq/cleanup: Do not abort rseq c.s. in child on fork()
        rseq/selftests/arm: Align 'struct rseq_cs' on 32 bytes
        rseq/selftests: Make run_param_test.sh executable
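The 32-byte requirement can be expressed in C as below. This is a sketch, assuming the field layout and alignment attribute of the uapi struct rseq_cs; `rseq_cs_sketch` is a hypothetical name, and the ARM selftest fix itself may express the alignment differently (for example in inline assembly).

```c
#include <stdalign.h>
#include <stdint.h>

/* Field layout assumed to match the uapi struct rseq_cs. */
struct rseq_cs_sketch {
        uint32_t version;
        uint32_t flags;
        uint64_t start_ip;
        uint64_t post_commit_offset;
        uint64_t abort_ip;
} __attribute__((aligned(4 * sizeof(uint64_t))));      /* 32 bytes */

/* Compile-time checks: the descriptor fills exactly one 32-byte slot. */
_Static_assert(sizeof(struct rseq_cs_sketch) == 32, "size is 32 bytes");
_Static_assert(alignof(struct rseq_cs_sketch) == 32, "aligned on 32 bytes");
```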
    • Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 64dd7655
      Linus Torvalds committed
      Pull EFI fixes from Thomas Gleixner:
       "Two fixlets for the EFI maze:
      
         - Properly zero variables to prevent an early boot hang on EFI mixed
           mode systems
      
         - Fix the fallout of merging the 32bit and 64bit variants of EFI PCI
           related code which ended up choosing the 32bit variant of the actual
           EFI call invocation which leads to failures on 64bit"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Fix incorrect invocation of PciIo->Attributes()
        efi/libstub/tpm: Initialize efi_physical_addr_t vars to zero for mixed mode
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d3a6749c
      Linus Torvalds committed
      Pull core fixes from Thomas Gleixner:
       "Two tiny fixes:
      
         - Add the missing machine_real_restart() to objtool's noreturn list so
           it stops complaining
      
         - Fix a trivial comment typo"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kernel.h: Fix a typo in comment
        objtool: Add machine_real_restart() to the noreturn list
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d4e860ea
      Linus Torvalds committed
      Pull x86 fixes from Thomas Gleixner:
       "A set of fixes for x86:
      
         - Make Xen PV guest deal with speculative store bypass correctly
      
         - Address more fallout from the 5-Level pagetable handling. Undo an
           __initdata annotation to avoid section mismatch and malfunction
           when post init code would touch the freed variable.
      
         - Handle exception fixup in math_error() before calling notify_die().
           The reverse call order incorrectly triggers notify_die() listeners
           for something which is handled correctly at the site which issues
           the floating point instruction.
      
         - Fix an off by one in the LLC topology calculation on AMD
      
         - Handle non-standard memory block sizes gracefully on UV platforms
      
         - Plug a memory leak in the microcode loader
      
         - Sanitize the purgatory build magic
      
         - Add the x86 specific device tree bindings directory to the x86
           MAINTAINER file patterns"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix 'no5lvl' handling
        Revert "x86/mm: Mark __pgtable_l5_enabled __initdata"
        x86/CPU/AMD: Fix LLC ID bit-shift calculation
        MAINTAINERS: Add file patterns for x86 device tree bindings
        x86/microcode/intel: Fix memleak in save_microcode_patch()
        x86/platform/UV: Add kernel parameter to set memory block size
        x86/platform/UV: Use new set memory block size function
        x86/platform/UV: Add adjustable set memory block size function
        x86/build: Remove unnecessary preparation for purgatory
        Revert "kexec/purgatory: Add clean-up for purgatory directory"
        x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths
        x86: Call fixup_exception() before notify_die() in math_error()
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 177d363e
      Linus Torvalds committed
      Pull x86 pti fixes from Thomas Gleixner:
       "Two small updates for the speculative distractions:
      
         - Make it clearer to the compiler that array_index_mask_nospec()
           is not subject to optimizations. It's not perfect, but ...
      
         - Don't report XEN PV guests as vulnerable because their mitigation
           state depends on the hypervisor. Report unknown and refer to the
           hypervisor requirement"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
        x86/pti: Don't report XenPV as vulnerable
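The idea behind array_index_mask_nospec() can be sketched with the branchless formula used by the kernel's generic C fallback. This assumes `size <= LONG_MAX` and an arithmetic right shift of negative values (true for GCC and Clang, implementation-defined in ISO C). `index_mask_nospec()` is a hypothetical userspace name; the x86 helper that this merge hardens uses inline assembly instead.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Branchless bounds mask: ~0UL when index < size, 0 otherwise.
 * If index < size, neither 'index' nor 'size - 1 - index' has the top bit
 * set, so the complement is negative and the arithmetic shift smears the
 * sign bit into all ones. Otherwise the subtraction wraps, the top bit is
 * set in the OR, and the result is 0.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
        return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}
```

A caller clamps an untrusted index with `idx &= index_mask_nospec(idx, size);` before the array access, so even a mispredicted bounds check cannot speculatively read out of range.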
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2da2ca24
      Linus Torvalds committed
      Pull locking fixes from Thomas Gleixner:
       "A set of fixes and updates for the locking code:
      
         - Prevent lockdep from updating irq state within its own code and
           thereby confusing itself.
      
         - Build fix for older GCCs which mistreat anonymous unions

         - Add a missing lockdep annotation in down_read_non_owner() which
           causes up_read_non_owner() to emit a lockdep splat
      
         - Remove the custom alpha dec_and_lock() implementation which is
           incorrect in terms of ordering and use the generic one.
      
        The remaining two commits are not strictly fixes. They provide irqsave
        variants of atomic_dec_and_lock() and refcount_dec_and_lock(). These
        are required to merge the relevant updates and cleanups into different
        maintainer trees for 4.19, so routing them into mainline without
        actual users is the sanest approach.
      
        They should have been in -rc1, but last weekend I took the liberty to
        just avoid computers in order to regain some mental sanity"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/qspinlock: Fix build for anonymous union in older GCC compilers
        locking/lockdep: Do not record IRQ state within lockdep code
        locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS
        locking/refcounts: Implement refcount_dec_and_lock_irqsave()
        atomic: Add irqsave variant of atomic_dec_and_lock()
        alpha: Remove custom dec_and_lock() implementation
    • Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a43de489
      Linus Torvalds committed
      Pull ras fixes from Thomas Gleixner:
       "A set of fixes for RAS/MCE:
      
         - Improve the error message when the kernel cannot recover from a MCE
           so the maximum amount of information gets provided.
      
         - Individually check MCE recovery features on SkyLake CPUs instead of
           assuming none when the CAPID0 register does not advertise the
           general ability for recovery.
      
         - Prevent MCE from outputting inconsistent messages which first show an
           error location and then claim that the source is unknown.
      
         - Prevent overwriting MCi_STATUS in the attempt to gather more
           information when a fatal MCE has already been detected. This leads
           to empty status values in the printout and failing to react
           promptly on the fatal event"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Fix incorrect "Machine check from unknown source" message
        x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
        x86/mce: Check for alternate indication of machine check recovery on Skylake
        x86/mce: Improve error message when kernel cannot recover
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6242258b
      Linus Torvalds committed
      Pull timer fixes from Thomas Gleixner:
       "A small set of fixes for time(r) related issues:
      
         - Fix a long standing conversion issue in jiffies_to_msecs() for odd
           HZ values like 1024 or 1200 which resulted in returning 0 for small
           jiffies values due to rounding down.
      
         - Use the proper CONFIG symbol in the new Y2038 safe compat code for
           posix-timers. Not yet a visible breakage, but this will immediately
           trigger when the architecture support for the new interfaces is
           merged.
      
         - Return an error code in the STM32 clocksource driver on failure
           instead of success.
      
         - Remove the redundant and stale irq disabled check in the posix cpu
           timer code. The check is at the wrong place anyway and lockdep
           already covers it via the sighand lock locking coverage"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        time: Make sure jiffies_to_msecs() preserves non-zero time periods
        posix-timers: Fix nanosleep_copyout() for CONFIG_COMPAT_32BIT_TIME
        clocksource/drivers/stm32: Fix error return code
        posix-cpu-timers: Remove lockdep_assert_irqs_disabled()
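The rounding problem described in the first item can be reproduced with plain integer arithmetic. This is a sketch, not the kernel's jiffies_to_msecs(), which picks among several compile-time formulas depending on HZ; `msecs_round_down()` and `msecs_round_up()` are hypothetical helpers that take HZ as a parameter. Rounding up (ceiling division) is one way to guarantee that a non-zero jiffies count never converts to 0 ms.

```c
/* Truncating conversion: for HZ = 1024 or 1200, one jiffy becomes 0 ms. */
static unsigned int msecs_round_down(unsigned long j, unsigned int hz)
{
        return (unsigned int)(j * 1000UL / hz);
}

/* Ceiling conversion: any non-zero jiffies count stays non-zero. */
static unsigned int msecs_round_up(unsigned long j, unsigned int hz)
{
        return (unsigned int)((j * 1000UL + hz - 1) / hz);
}
```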
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 78fea633
      Linus Torvalds committed
      Pull irq fixes from Thomas Gleixner:
       "A set of fixes mostly for the ARM/GIC world:
      
         - Fix the MSI affinity handling in the ls-scfg irq chip driver so it
           updates and uses the effective affinity mask correctly
      
         - Prevent binding LPIs to offline CPUs and respect the Cavium erratum
           which requires that LPIs which belong to an offline NUMA node are
           not bound to a CPU on a different NUMA node.
      
         - Free only the amount of allocated interrupts in the GIC-V2M driver
           instead of trying to free log2(nrirqs).
      
         - Prevent emitting SYNC and VSYNC targeting non-existing interrupt
           collections in the GIC-V3 ITS driver

         - Ensure that the GIC-V3 interrupt redistributor is correctly
           reprogrammed on CPU hotplug
      
         - Remove a stale unused helper function"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqdesc: Delete irq_desc_get_msi_desc()
        irqchip/gic-v3-its: Fix reprogramming of redistributors on CPU hotplug
        irqchip/gic-v3-its: Only emit VSYNC if targetting a valid collection
        irqchip/gic-v3-its: Only emit SYNC if targetting a valid collection
        irqchip/gic-v3-its: Don't bind LPI to unavailable NUMA node
        irqchip/gic-v2m: Fix SPI release on error path
        irqchip/ls-scfg-msi: Fix MSI affinity handling
        genirq/debugfs: Add missing IRQCHIP_SUPPORTS_LEVEL_MSI debug
    • Merge tag 'mips_fixes_4.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · e0bc833d
      Linus Torvalds committed
      Pull MIPS fixes from Paul Burton:
       "A few MIPS fixes for 4.18:
      
         - a GPIO device name fix for a regression in v4.15-rc1.
      
         - an errata workaround for the BCM5300X platform.
      
         - a fix to ftrace function graph tracing, broken for a long time with
           the fix applying cleanly back as far as v3.17.
      
         - addition of read barriers to in{b,w,l,q}() functions, matching
           behavior of other architectures & mirroring the equivalent addition
           to read{b,w,l,q} in v4.17-rc2.
      
        Plus changes to wire up new syscalls introduced in the 4.18 cycle:
      
         - Restartable sequences support is added, including MIPS support in
           the selftests.
      
         - io_pgetevents is wired up"
      
      * tag 'mips_fixes_4.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: Wire up io_pgetevents syscall
        rseq/selftests: Implement MIPS support
        MIPS: Wire up the restartable sequences (rseq) syscall
        MIPS: Add syscall detection for restartable sequences
        MIPS: Add support for restartable sequences
        MIPS: io: Add barrier after register read in inX()
        mips: ftrace: fix static function graph tracing
        MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum
        MIPS: pb44: Fix i2c-gpio GPIO descriptor table
    • efi/x86: Fix incorrect invocation of PciIo->Attributes() · 2e6eb40c
      Ard Biesheuvel committed
      The following commit:
      
        2c3625cb ("efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function")
      
      ... merged the two versions of __setup_efi_pciXX(), without taking into
      account that the 32-bit version used a rather dodgy trick to pass an
      immediate 0 constant as argument for a uint64_t parameter.
      
      The issue is caused by the fact that on x86, UEFI protocol method calls
      are redirected via struct efi_config::call(), which is a variadic function,
      and so the compiler has to infer the types of the parameters from the
      arguments rather than from the prototype.
      
      As the 32-bit x86 calling convention passes arguments via the stack,
      passing the unqualified constant 0 twice is the same as passing 0ULL,
      which is why the 32-bit code in __setup_efi_pci32() contained the
      following call:
      
        status = efi_early->call(pci->attributes, pci,
                                 EfiPciIoAttributeOperationGet, 0, 0,
                                 &attributes);
      
      to invoke this UEFI protocol method:
      
        typedef
        EFI_STATUS
        (EFIAPI *EFI_PCI_IO_PROTOCOL_ATTRIBUTES) (
          IN  EFI_PCI_IO_PROTOCOL                     *This,
          IN  EFI_PCI_IO_PROTOCOL_ATTRIBUTE_OPERATION Operation,
          IN  UINT64                                  Attributes,
          OUT UINT64                                  *Result OPTIONAL
          );
      
      After the merge, we inadvertently ended up with this version for both
      32-bit and 64-bit builds, breaking the latter.
      
      So replace the two zeroes with the explicitly typed constant 0ULL,
      which works as expected on both 32-bit and 64-bit builds.
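The hazard can be sketched with a hypothetical variadic dispatcher; `call_attributes()` is an illustration, not the EFI stub's code. Because the callee reads the Attributes slot with `va_arg(ap, unsigned long long)`, only an argument that is itself 64 bits wide, such as `0ULL`, lines up correctly on both 32-bit and 64-bit builds.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a UEFI method reached through a variadic call
 * wrapper with the shape (Operation, UINT64 Attributes, UINT64 *Result).
 * Default argument promotions never widen int to 64 bits, so the caller
 * must pass a 64-bit value such as 0ULL. Passing "0, 0" instead makes a
 * 64-bit callee take the second 0 as 'result' and never see the real
 * result pointer.
 */
static int call_attributes(int operation, ...)
{
        va_list ap;
        unsigned long long attributes;
        uint64_t *result;

        va_start(ap, operation);
        attributes = va_arg(ap, unsigned long long);    /* type of 0ULL */
        result = va_arg(ap, uint64_t *);
        va_end(ap);

        /* Derive the output from both inputs so a caller can check that
         * the arguments arrived intact. */
        if (result != NULL)
                *result = (uint64_t)(attributes + (unsigned long long)operation);
        return 0;
}
```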
      
      Wilfried tested the 64-bit build, and I checked the generated assembly
      of a 32-bit build with and without this patch, and they are identical.
      Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hdegoede@redhat.com
      Cc: linux-efi@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'for-linus-20180623' of git://git.kernel.dk/linux-block · 77072ca5
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
      
       - Further timeout fixes. We aren't quite there yet, so expect another
         round of fixes for that to completely close some of the IRQ vs
         completion races. (Christoph/Bart)
      
       - Set of NVMe fixes from the usual suspects, mostly error handling
      
       - Two off-by-one fixes (Dan)
      
       - Another bdi race fix (Jan)
      
       - Fix nbd reconfigure with NBD_DISCONNECT_ON_CLOSE (Doron)
      
      * tag 'for-linus-20180623' of git://git.kernel.dk/linux-block:
        blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE
        bdi: Fix another oops in wb_workfn()
        lightnvm: Remove depends on HAS_DMA in case of platform dependency
        nvme-pci: limit max IO size and segments to avoid high order allocations
        nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl
        nvme-fc: release io queues to allow fast fail
        nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.
        block: sed-opal: Fix a couple off by one bugs
        blk-mq-debugfs: Off by one in blk_mq_rq_state_name()
        nvmet: reset keep alive timer in controller enable
        nvme-rdma: don't override opts->queue_size
        nvme-rdma: Fix command completion race at error recovery
        nvme-rdma: fix possible free of a non-allocated async event buffer
        nvme-rdma: fix possible double free condition when failing to create a controller
        Revert "block: Add warning for bi_next not NULL in bio_endio()"
        block: fix timeout changes for legacy request drivers
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 2dd3f7c9
      Linus Torvalds committed
      Pull crypto fixes from Herbert Xu:
      
       - Fix use after free in chtls
      
       - Fix RBP breakage in sha3
      
       - Fix use after free in hwrng_unregister
      
       - Fix overread in morus640
      
       - Move sleep out of kernel_neon in arm64/aes-blk
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        hwrng: core - Always drop the RNG in hwrng_unregister()
        crypto: morus640 - Fix out-of-bounds access
        crypto: don't optimize keccakf()
        crypto: arm64/aes-blk - fix and move skcipher_walk_done out of kernel_neon_begin, _end
        crypto: chtls - use after free in chtls_pt_recvmsg()
    • Merge tag 'linux-kselftest-4.18-rc2' of... · b13fbe77
      Linus Torvalds committed
      Merge tag 'linux-kselftest-4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
      
       - fix new sparc64 adi driver test compile errors on non-sparc systems
      
       - fix config fragment for sync framework for improved test coverage
      
       - fix several tests to return correct Kselftest skip code
      
      * tag 'linux-kselftest-4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: sparc64: Add missing SPDX License Identifiers
        selftests: sparc64: delete RUN_TESTS and EMIT_TESTS overrides
        selftests: sparc64: Fix to do nothing on non-sparc64
        selftests: sync: add config fragment for testing sync framework
        selftests: vm: return Kselftest Skip code for skipped tests
        selftests: zram: return Kselftest Skip code for skipped tests
        selftests: user: return Kselftest Skip code for skipped tests
        selftests: sysctl: return Kselftest Skip code for skipped tests
        selftests: static_keys: return Kselftest Skip code for skipped tests
        selftests: pstore: return Kselftest Skip code for skipped tests
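A minimal sketch of the skip convention, assuming the exit-code values from kselftest.h (KSFT_PASS = 0, KSFT_FAIL = 1, KSFT_SKIP = 4); `ksft_result()` is a hypothetical helper. A test that cannot run at all, for example because a module or privilege is missing, should exit with KSFT_SKIP so the runner reports it as skipped rather than failed.

```c
/* Exit codes assumed to match tools/testing/selftests/kselftest.h. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/* Map "could the test run?" and "did it fail?" onto a kselftest exit code. */
static int ksft_result(int prerequisites_met, int test_failed)
{
        if (!prerequisites_met)
                return KSFT_SKIP;       /* missing prerequisite: skip, not fail */
        return test_failed ? KSFT_FAIL : KSFT_PASS;
}
```

A test's main() would then end with something like `exit(ksft_result(prereq_ok, failed));`, where both variables are the test's own hypothetical bookkeeping.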
    • Merge tag 'trace-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 81f9c4e4
      Linus Torvalds committed
      Pull tracing fixes from Steven Rostedt:
       "This contains a few fixes and a clean up.
      
         - a bad merge caused an "endif" to go in the wrong place in
           scripts/Makefile.build
      
         - softirq tracing fix for tracing that corrupts lockdep and causes a
           false splat
      
         - histogram documentation typo fixes
      
         - fix a bad memory reference when passing in no filter to the filter
           code
      
         - simplify code by using the swap macro instead of open coding the
           swap"
      
      * tag 'trace-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix SKIP_STACK_VALIDATION=1 build due to bad merge with -mrecord-mcount
        tracing: Fix some errors in histogram documentation
        tracing: Use swap macro in update_max_tr
        softirq: Reorder trace_softirqs_on to prevent lockdep splat
        tracing: Check for no filter when processing event filters
    • blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE · f5e350f0
      Bart Van Assche 提交于
      Make sure that RQF_TIMED_OUT is cleared when a request is reused
      after a block driver timeout handler has returned BLK_EH_DONE.
      
      Fixes: da661267 ("blk-mq: don't time out requests again that are in the timeout handler")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Andrew Randrianasulu <randrianasulu@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
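      A minimal sketch of the invariant this fix restores. Only the flag name RQF_TIMED_OUT comes from the commit message; the bit value, struct request_sketch, and the three helpers are hypothetical and do not reproduce the blk-mq code or show where the real fix clears the flag. The idea: a flag set on the timeout path must be cleared when the request is started again, otherwise a reused request still looks "already in the timeout handler" and is never timed out again.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit value chosen for illustration only. */
#define RQF_TIMED_OUT (1u << 0)

/* Hypothetical, heavily simplified request. */
struct request_sketch {
	unsigned int rq_flags;
};

/* Timeout path: mark the request as being handled by the timeout handler. */
static void on_timeout_sketch(struct request_sketch *rq)
{
	rq->rq_flags |= RQF_TIMED_OUT;
}

/*
 * (Re)start path: clear the mark so that a request reused after the
 * handler returned BLK_EH_DONE is eligible for timeout handling again.
 */
static void start_request_sketch(struct request_sketch *rq)
{
	rq->rq_flags &= ~RQF_TIMED_OUT;
}

/* A timeout scan skips requests that are already marked. */
static bool timeout_eligible_sketch(const struct request_sketch *rq)
{
	return !(rq->rq_flags & RQF_TIMED_OUT);
}
```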
  4. 23 June 2018: 13 commits
    • Merge tag 'powerpc-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5e220483
      Committed by Linus Torvalds
      Pull powerpc fixes from Michael Ellerman:
      
       - a fix for hugetlb with 4K pages, broken by our recent changes for
         split PMD PTL.
      
       - set the correct assembler machine type on e500mc, needed since
         binutils 2.26 introduced two forms for the "wait" instruction.
      
       - a fix for potential missed TLB flushes with MADV_[FREE|DONTNEED] etc.
         and THP on Power9 Radix.
      
       - three fixes to try and make our panic handling more robust by hard
         disabling interrupts, and not marking stopped CPUs as offline because
         they haven't been properly offlined.
      
       - three other minor fixes.
      
      Thanks to: Aneesh Kumar K.V, Michael Jeanson, Nicholas Piggin.
      
      * tag 'powerpc-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
        powerpc/64s/radix: Fix radix_kvm_prefetch_workaround paca access of not possible CPU
        powerpc/64s: Fix build failures with CONFIG_NMI_IPI=n
        powerpc/64: hard disable irqs on the panic()ing CPU
        powerpc: smp_send_stop do not offline stopped CPUs
        powerpc/64: hard disable irqs in panic_smp_self_stop
        powerpc/64s: Fix DT CPU features Power9 DD2.1 logic
        powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP
        powerpc/e500mc: Set assembler machine type to e500mc
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 7ab366e4
      Committed by Linus Torvalds
      Pull arm64 fixes from Catalin Marinas:
      
       - clear buffers allocated with FORCE_CONTIGUOUS explicitly until the
         CMA code honours __GFP_ZERO
      
       - notrace annotation for secondary_start_kernel()
      
       - use early_param() instead of __setup() for "kpti=" as it is needed
         for the cpufeature callback remapping swapper to non-global mappings
      
       - ensure writes to swapper are ordered wrt subsequent cache maintenance
         in the kpti non-global remapping code
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance
        arm64: kpti: Use early_param for kpti= command-line option
        arm64: make secondary_start_kernel() notrace
        arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8b88ed3c
      Committed by Linus Torvalds
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - Lazy FPSIMD switching fixes
         - Really disable compat ioctls on architectures that don't want it
         - Disable compat on arm64 (it was never implemented...)
         - Rely on architectural requirements for GICV on GICv3
         - Detect bad alignments in unmap_stage2_range
      
        x86:
         - Add nested VM entry checks to avoid broken error recovery path
         - Minor documentation fix"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: fix KVM_CAP_HYPERV_TLBFLUSH paragraph number
        kvm: vmx: Nested VM-entry prereqs for event inj.
        KVM: arm64: Prevent KVM_COMPAT from being selected
        KVM: Enforce error in ioctl for compat tasks when !KVM_COMPAT
        KVM: arm/arm64: add WARN_ON if size is not PAGE_SIZE aligned in unmap_stage2_range
        KVM: arm64: Avoid mistaken attempts to save SVE state for vcpus
        KVM: arm64/sve: Fix SVE trap restoration for non-current tasks
        KVM: arm64: Don't mask softirq with IRQs disabled in vcpu_put()
        arm64: Introduce sysreg_clear_set()
        KVM: arm/arm64: Drop resource size check for GICV window
    • Merge tag 'for-linus-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 4ab59fcf
      Committed by Linus Torvalds
      Pull xen fixes from Juergen Gross:
       "This contains the following fixes/cleanups:
      
         - the removal of a BUG_ON() which wasn't necessary and which could
           trigger now due to a recent change
      
         - a correction of a long-standing bug happening very rarely in Xen
           dom0 when a hypercall buffer from user land was not accessible by
           the hypervisor for very short periods of time due to e.g. page
           migration or compaction
      
         - usage of EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() in a
           Xen-related driver (no breakage possible as using those symbols
           without others already exported via EXPORT_SYMBOL_GPL() wouldn't
           make any sense)
      
         - a simplification for Xen PVH or Xen ARM guests
      
         - some additional error handling for callers of xenbus_printf()"
      
      * tag 'for-linus-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: Remove unnecessary BUG_ON from __unbind_from_irq()
        xen: add new hypercall buffer mapping device
        xen/scsiback: add error handling for xenbus_printf
        scsi: xen-scsifront: add error handling for xenbus_printf
        xen/grant-table: Export gnttab_{alloc|free}_pages as GPL
        xen: add error handling for xenbus_printf
        xen: share start flags between PV and PVH
    • x86/mm: Fix 'no5lvl' handling · 2458e53f
      Committed by Kirill A. Shutemov
      early_identify_cpu() has to use the early version of pgtable_l5_enabled(),
      which doesn't rely on cpu_feature_enabled().
      
      Defining USE_EARLY_PGTABLE_L5 before all includes does the trick.
      
      I lost the define in one of the reworks of the original patch.
      
      Fixes: 372fddf7 ("x86/mm: Introduce the 'no5lvl' kernel parameter")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180622220841.54135-3-kirill.shutemov@linux.intel.com
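      Why the define has to come before the includes can be shown in plain C: a header that picks one of two implementations with #ifdef makes that choice at the point where it is included, so a define placed after the #include has no effect. The sketch inlines a stand-in for such a header; every name ending in _sketch is hypothetical, and the real kernel header is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Must appear before the "header" below to take effect. */
#define USE_EARLY_PGTABLE_L5

/* ---- stand-in for a header included after the define ---- */

/* Plain variable set during early boot; value is illustrative. */
static bool pgtable_l5_enabled_var_sketch = true;

/* Stand-in for a feature check that is not usable yet in early code. */
static inline bool cpu_feature_la57_sketch(void)
{
	return false; /* pretend the feature bits are not populated yet */
}

#ifdef USE_EARLY_PGTABLE_L5
/* Early variant: reads the plain variable. */
static inline bool pgtable_l5_enabled_sketch(void)
{
	return pgtable_l5_enabled_var_sketch;
}
#else
/* Regular variant: relies on the feature-check machinery. */
static inline bool pgtable_l5_enabled_sketch(void)
{
	return cpu_feature_la57_sketch();
}
#endif
/* ---- end of stand-in header ---- */
```

      Moving the #define below the stand-in header would silently select the regular variant, which in this sketch reports false.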
    • Revert "x86/mm: Mark __pgtable_l5_enabled __initdata" · 51be1335
      Committed by Kirill A. Shutemov
      This reverts commit e4e961e3.
      
      We need to use the early version of pgtable_l5_enabled() in
      early_identify_cpu(), as this code runs before cpu_feature_enabled() is
      usable.
      
      But this leads to a section mismatch:
      
      cpu_init()
        load_mm_ldt()
          ldt_slot_va()
            LDT_BASE_ADDR
              LDT_PGD_ENTRY
                pgtable_l5_enabled()
                  __pgtable_l5_enabled
      
      __pgtable_l5_enabled is marked as __initdata, but cpu_init() is not __init.
      
      It's fixable: early code can be isolated into a separate translation unit,
      but such change collides with other work in the area.  That's too much
      hassle to save 4 bytes of memory.
      
      Make __pgtable_l5_enabled __ro_after_init again.
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180622220841.54135-2-kirill.shutemov@linux.intel.com
    • x86/CPU/AMD: Fix LLC ID bit-shift calculation · 964d9784
      Committed by Suravee Suthikulpanit
      The current logic incorrectly calculates the LLC ID from the APIC ID.
      
      Unless specified otherwise, the LLC ID should be calculated by removing
      the Core and Thread ID bits from the least significant end of the APIC
      ID. For more info, see "ApicId Enumeration Requirements" in any Fam17h
      PPR document.
      
      [ bp: Improve commit message. ]
      
      Fixes: 68091ee7 ("Calculate last level cache ID from number of sharing threads")
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1528915390-30533-1-git-send-email-suravee.suthikulpanit@amd.com
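      The rule quoted above (remove the Core and Thread ID bits from the least significant end of the APIC ID) amounts to a right shift by the number of bits needed to enumerate the threads that share the LLC. A minimal sketch, assuming that thread count is already known (on real hardware it is read via CPUID, which is omitted here); count_order_sketch() and llc_id_sketch() are hypothetical helpers, not the kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest order with (1u << order) >= n, i.e. ceil(log2(n)) for n >= 1. */
static unsigned int count_order_sketch(uint32_t n)
{
	unsigned int order = 0;

	assert(n > 0 && n <= (1u << 31)); /* keeps the shift below well-defined */
	while ((1u << order) < n)
		order++;
	return order;
}

/*
 * Derive the LLC ID by dropping the low APIC ID bits that number the
 * threads sharing the last level cache.
 */
static uint32_t llc_id_sketch(uint32_t apicid, uint32_t num_sharing)
{
	return apicid >> count_order_sketch(num_sharing);
}
```

      With 8 threads per LLC, APIC IDs 0-7 map to LLC 0 and 8-15 to LLC 1; shifting by one bit fewer would instead give every group of 4 threads its own ID.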
    • Merge branch 'linus' into x86/urgent · 7731b8bc
      Committed by Thomas Gleixner
      Required to queue a dependent fix.
    • bdi: Fix another oops in wb_workfn() · 3ee7e869
      Committed by Jan Kara
      syzbot is reporting a NULL pointer dereference in wb_workfn() [1] due to
      wb->bdi->dev being NULL. Dmitry confirmed that wb->state was
      WB_shutting_down after wb->bdi->dev became NULL. This indicates that
      bdi_unregister() failed to call wb_shutdown() on one of the wb objects.
      
      The problem is in cgwb_bdi_unregister(), which does cgwb_kill() and thus
      drops bdi's reference to wb structures before going through the list of
      wbs again and calling wb_shutdown() on each of them. This way the loop
      iterating through all wbs can easily miss a wb if that wb has already
      passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
      from cgwb_release_workfn(). As a result, the bdi can be fully shut down
      although wb_workfn() for this wb structure is still running. In fact,
      there are also other ways cgwb_bdi_unregister() can race with
      cgwb_release_workfn(), leading e.g. to use-after-free issues:
      
      CPU1                            CPU2
                                      cgwb_bdi_unregister()
                                        cgwb_kill(*slot);
      
      cgwb_release()
        queue_work(cgwb_release_wq, &wb->release_work);
      cgwb_release_workfn()
                                        wb = list_first_entry(&bdi->wb_list, ...)
                                        spin_unlock_irq(&cgwb_lock);
        wb_shutdown(wb);
        ...
        kfree_rcu(wb, rcu);
                                        wb_shutdown(wb); -> oops use-after-free
      
      We solve these issues by synchronizing writeback structure shutdown from
      cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. This
      also removes the need for synchronization via WB_shutting_down: the mutex
      provides it in the CONFIG_CGROUP_WRITEBACK case, and without
      CONFIG_CGROUP_WRITEBACK, wb_shutdown() can be called only once, from
      bdi_unregister().
      
      Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
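      The mutex-based scheme described above can be sketched in user space with a POSIX mutex. This is a simplified model, not the kernel code: struct wb_sketch, struct bdi_sketch, and the functions are hypothetical, one mutex stands in for the new mutex, and a "registered" flag makes a second shutdown of the same wb a no-op. Because both the unregister path and the release path unlink and shut down a wb only while holding the mutex, unregister can never pick up a wb that the release path is about to free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified writeback structure. */
struct wb_sketch {
	struct wb_sketch *next; /* link in bdi->wb_list */
	bool registered;        /* cleared by the first shutdown */
};

struct bdi_sketch {
	struct wb_sketch *wb_list;     /* wbs not yet shut down */
	pthread_mutex_t release_mutex; /* serializes both shutdown paths */
	int shutdowns;                 /* effective shutdowns, for checking */
};

/* Unlink wb and shut it down at most once. Caller holds release_mutex. */
static void wb_shutdown_sketch(struct bdi_sketch *bdi, struct wb_sketch *wb)
{
	struct wb_sketch **pp = &bdi->wb_list;

	if (!wb->registered)
		return; /* already shut down by the other path */
	wb->registered = false;
	while (*pp && *pp != wb)
		pp = &(*pp)->next;
	if (*pp)
		*pp = wb->next;
	bdi->shutdowns++;
}

/* Release path (models the release work item): shut down, then free. */
static void release_wb_sketch(struct bdi_sketch *bdi, struct wb_sketch *wb)
{
	pthread_mutex_lock(&bdi->release_mutex);
	wb_shutdown_sketch(bdi, wb);
	pthread_mutex_unlock(&bdi->release_mutex);
	free(wb); /* safe: wb is no longer reachable from bdi->wb_list */
}

/* Unregister path: shut down every wb still on the list. */
static void bdi_unregister_sketch(struct bdi_sketch *bdi)
{
	pthread_mutex_lock(&bdi->release_mutex);
	while (bdi->wb_list)
		wb_shutdown_sketch(bdi, bdi->wb_list);
	pthread_mutex_unlock(&bdi->release_mutex);
}
```

      In a real stress test the two paths would run on different threads; the mutex makes either interleaving equivalent to one of the two sequential orders.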
    • lightnvm: Remove depends on HAS_DMA in case of platform dependency · 0ae52ddf
      Committed by Geert Uytterhoeven
      Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
      symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
      In most cases this other symbol is an architecture or platform specific
      symbol, or PCI.
      
      Generic symbols and drivers without platform dependencies keep their
      dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
      cannot work anyway.
      
      This simplifies the dependencies and allows better compile-testing.
      
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300
      Committed by Will Deacon
      When delivering a signal to a task that is using rseq, we call into
      __rseq_handle_notify_resume() so that the registers pushed in the
      sigframe are updated to reflect the state of the restartable sequence
      (for example, ensuring that the signal returns to the abort handler if
      necessary).
      
      However, if the rseq management fails due to an unrecoverable fault when
      accessing userspace or certain combinations of RSEQ_CS_* flags, then we
      will attempt to deliver a SIGSEGV. This has the potential for infinite
      recursion if the rseq code continuously fails on signal delivery.
      
      Avoid this problem by using force_sigsegv() instead of force_sig(), which
      is explicitly designed to reset the SEGV handler to SIG_DFL in the case
      of a recursive fault. In doing so, remove rseq_signal_deliver() from the
      internal rseq API and add an optional struct ksignal * parameter to
      rseq_handle_notify_resume() instead.
      
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: peterz@infradead.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: boqun.feng@gmail.com
      Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
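      The recursion argument can be made concrete with a small simulation. This is a hypothetical model, not kernel code: struct task_sketch, the handler enum, and the *_sketch functions are invented for illustration, and the rseq fixup is assumed to fail on every delivery. With a force_sig()-style policy, each failed delivery queues another SIGSEGV for the same user handler; with a force_sigsegv()-style policy, a failure while delivering SIGSEGV first resets the handler to the default action, so the next delivery terminates the task.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

enum handler_sketch { HANDLER_DEFAULT, HANDLER_USER };

/* Hypothetical per-task state relevant to the argument. */
struct task_sketch {
	enum handler_sketch segv_handler; /* disposition of SIGSEGV */
	bool segv_pending;
	bool killed;
};

/* force_sig()-style: queue SIGSEGV and leave the handler alone. */
static void force_sig_sketch(struct task_sketch *t, int sig)
{
	(void)sig;
	t->segv_pending = true;
}

/* force_sigsegv()-style: a failure during SIGSEGV delivery resets the handler. */
static void force_sigsegv_sketch(struct task_sketch *t, int sig)
{
	if (sig == SIGSEGV)
		t->segv_handler = HANDLER_DEFAULT;
	t->segv_pending = true;
}

/*
 * Deliver 'sig' with a failing rseq fixup, then keep delivering queued
 * SIGSEGVs. Returns the number of SIGSEGV deliveries, capped at
 * max_rounds so that the recursive case terminates.
 */
static int deliver_sketch(struct task_sketch *t, int sig,
			  bool use_sigsegv_policy, int max_rounds)
{
	void (*fail)(struct task_sketch *, int) =
		use_sigsegv_policy ? force_sigsegv_sketch : force_sig_sketch;
	int rounds = 0;

	fail(t, sig); /* initial delivery: rseq fixup fails */
	while (t->segv_pending && !t->killed && rounds < max_rounds) {
		t->segv_pending = false;
		rounds++;
		if (t->segv_handler == HANDLER_DEFAULT) {
			t->killed = true; /* default action terminates the task */
			break;
		}
		fail(t, SIGSEGV); /* setting up the user handler fails again */
	}
	return rounds;
}
```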
    • arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance · 71c8fc0c
      Committed by Will Deacon
      When rewriting swapper using nG mappings, we must perform cache
      maintenance around each page table access in order to avoid coherency
      problems with the host's cacheable alias under KVM. To ensure correct
      ordering of the maintenance with respect to Device memory accesses made
      with the Stage-1 MMU disabled, DMBs need to be added between the
      maintenance and the corresponding memory access.
      
      This patch adds a missing DMB between writing a new page table entry and
      performing a clean+invalidate on the same line.
      
      Fixes: f992b4df ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings")
      Cc: <stable@vger.kernel.org> # 4.16.x-
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: kpti: Use early_param for kpti= command-line option · b5b7dd64
      Committed by Will Deacon
      We inspect __kpti_forced early on as part of the cpufeature enable
      callback which remaps the swapper page table using non-global entries.
      
      Ensure that __kpti_forced has been updated to reflect the kpti=
      command-line option before we start using it.
      
      Fixes: ea1e3de8 ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0")
      Cc: <stable@vger.kernel.org> # 4.16.x-
      Reported-by: Wei Xu <xuwei5@hisilicon.com>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Wei Xu <xuwei5@hisilicon.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  5. 22 June 2018: 5 commits