1. 06 8月, 2018 8 次提交
    • D
      x86/mm/init: Add helper for freeing kernel image pages · 6ea2738e
      Dave Hansen 提交于
      When chunks of the kernel image are freed, free_init_pages() is used
      directly.  Consolidate the three sites that do this.  Also update the
      string to give an incrementally better description of that memory versus
      what was there before.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: keescook@google.com
      Cc: aarcange@redhat.com
      Cc: jgross@suse.com
      Cc: jpoimboe@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: peterz@infradead.org
      Cc: hughd@google.com
      Cc: torvalds@linux-foundation.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: ak@linux.intel.com
      Cc: Kees Cook <keescook@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180802225829.FE0E32EA@viggo.jf.intel.com
      6ea2738e
    • D
      x86/mm/init: Pass unconverted symbol addresses to free_init_pages() · 9f515cdb
      Dave Hansen 提交于
      The x86 code has several places where it frees parts of kernel image:
      
       1. Unused SMP alternative
       2. __init code
       3. The hole between text and rodata
       4. The hole between rodata and data
      
      We call free_init_pages() to do this.  Strangely, we convert the symbol
      addresses to kernel direct map addresses in some cases (#3, #4) but not
      others (#1, #2).
      
      The virt_to_page() and the other code in free_reserved_area() now works
      fine for for symbol addresses on x86, so don't bother converting the
      addresses to direct map addresses before freeing them.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: keescook@google.com
      Cc: aarcange@redhat.com
      Cc: jgross@suse.com
      Cc: jpoimboe@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: peterz@infradead.org
      Cc: hughd@google.com
      Cc: torvalds@linux-foundation.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: ak@linux.intel.com
      Cc: Kees Cook <keescook@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180802225828.89B2D0E2@viggo.jf.intel.com
      9f515cdb
    • D
      mm: Allow non-direct-map arguments to free_reserved_area() · 0d834328
      Dave Hansen 提交于
      free_reserved_area() takes pointers as arguments to show which addresses
      should be freed.  However, it does this in a somewhat ambiguous way.  If it
      gets a kernel direct map address, it always works.  However, if it gets an
      address that is part of the kernel image alias mapping, it can fail.
      
      It fails if all of the following happen:
       * The specified address is part of the kernel image alias
       * Poisoning is requested (forcing a memset())
       * The address is in a read-only portion of the kernel image
      
      The memset() fails on the read-only mapping, of course.
      free_reserved_area() *is* called both on the direct map and on kernel image
      alias addresses.  We've just lucked out thus far that the kernel image
      alias areas it gets used on are read-write.  I'm fairly sure this has been
      just a happy accident.
      
      It is quite easy to make free_reserved_area() work for all cases: just
      convert the address to a direct map address before doing the memset(), and
      do this unconditionally.  There is little chance of a regression here
      because we previously did a virt_to_page() on the address for the memset,
      so we know these are not highmem pages for which virt_to_page() would fail.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: keescook@google.com
      Cc: aarcange@redhat.com
      Cc: jgross@suse.com
      Cc: jpoimboe@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: peterz@infradead.org
      Cc: hughd@google.com
      Cc: torvalds@linux-foundation.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: ak@linux.intel.com
      Cc: Kees Cook <keescook@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180802225826.1287AE3E@viggo.jf.intel.com
      0d834328
    • D
      x86/mm/pti: Clear Global bit more aggressively · eac7073a
      Dave Hansen 提交于
      The kernel image starts out with the Global bit set across the entire
      kernel image.  The bit is cleared with set_memory_nonglobal() in the
      configurations with PCIDs where the performance benefits of the Global bit
      are not needed.
      
      However, this is fragile.  It means that we are stuck opting *out* of the
      less-secure (Global bit set) configuration, which seems backwards.  Let's
      start more secure (Global bit clear) and then let things opt back in if
      they want performance, or are truly mapping common data between kernel and
      userspace.
      
      This fixes a bug.  Before this patch, there are areas that are unmapped
      from the user page tables (like like everything above 0xffffffff82600000 in
      the example below).  These have the hallmark of being a wrong Global area:
      they are not identical in the 'current_kernel' and 'current_user' page
      table dumps.  They are also read-write, which means they're much more
      likely to contain secrets.
      
      Before this patch:
      
      current_kernel:---[ High Kernel Mapping ]---
      current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
      current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
      current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
      current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
      current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
      current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE     GLB NX pmd
      current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                 GLB NX pte
      current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE     GLB NX pmd
      current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
      
       current_user:---[ High Kernel Mapping ]---
       current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
       current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
       current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
       current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                 GLB NX pte
       current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
       current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
      
      After this patch:
      
      current_kernel:---[ High Kernel Mapping ]---
      current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
      current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
      current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
      current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
      current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
      current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
      current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
      current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
      current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
      
        current_user:---[ High Kernel Mapping ]---
        current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
        current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
        current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
        current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
        current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
        current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
      
      Fixes: 0f561fce ("x86/pti: Enable global pages for shared areas")
      Reported-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: keescook@google.com
      Cc: aarcange@redhat.com
      Cc: jgross@suse.com
      Cc: jpoimboe@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: peterz@infradead.org
      Cc: torvalds@linux-foundation.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: ak@linux.intel.com
      Cc: Kees Cook <keescook@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180802225825.A100C071@viggo.jf.intel.com
      eac7073a
    • L
      Linux 4.18-rc8 · 1ffaddd0
      Linus Torvalds 提交于
      1ffaddd0
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a8c19920
      Linus Torvalds 提交于
      Pull x86 fix from Thomas Gleixner:
       "A single fix, which addresses boot failures on machines which do not
        report EBDA correctly, which can place the trampoline into reserved
        memory regions. Validating against E820 prevents that"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/compressed/64: Validate trampoline placement against E820
      a8c19920
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2f3672cb
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "Two oneliners addressing NOHZ failures:
      
         - Use a bitmask to check for the pending timer softirq and not the
           bit number. The existing code using the bit number checked for
           the wrong bit, which caused timers to either expire late or stop
           completely.
      
         - Make the nohz evaluation on interrupt exit more robust. The
           existing code did not re-arm the hardware when interrupting a
           running softirq in task context (ksoftirqd or tail of
           local_bh_enable()), which caused timers to either expire late
           or stop completely"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        nohz: Fix missing tick reprogram when interrupting an inline softirq
        nohz: Fix local_timer_softirq_pending()
      2f3672cb
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0cdf6d46
      Linus Torvalds 提交于
      Pull perf fixes from Thomas Gleixner:
       "A set of fixes for perf:
      
        Kernel side:
      
         - Fix the hardcoded index of extra PCI devices on Broadwell which
           caused a resource conflict and triggered warnings on CPU hotplug.
      
        Tooling:
      
         - Update the tools copy of several files, including perf_event.h,
           powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
           memcpy_64.s (used in 'perf bench mem'), silencing the respective
           warnings during the perf tools build.
      
         - Fix the build on the alpine:edge distro"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
        perf tools: Fix the build on the alpine:edge distro
        tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
        tools headers uapi: Refresh linux/bpf.h copy
        tools headers powerpc: Update asm/unistd.h copy to pick new
        tools headers uapi: Update tools's copy of linux/perf_event.h
      0cdf6d46
  2. 05 8月, 2018 8 次提交
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b9fb1fc7
      Linus Torvalds 提交于
      Pull irq fix from Thomas Gleixner:
       "A single bugfix for the irq core to prevent silent data corruption and
        malfunction of threaded interrupts under certain conditions"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Make force irq threading setup more robust
      b9fb1fc7
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 212dab05
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Handle frames in error situations properly in AF_XDP, from Jakub
          Kicinski.
      
       2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
          Maninder Singh.
      
       3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.
      
       4) Fix regression in netlink bind handling of multicast gruops, from
          Dmitry Safonov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        netlink: Don't shift on 64 for ngroups
        net/smc: no cursor update send in state SMC_INIT
        l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
        mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
        mlxsw: core_acl_flex_actions: Remove redundant counter destruction
        mlxsw: core_acl_flex_actions: Remove redundant resource destruction
        mlxsw: core_acl_flex_actions: Return error for conflicting actions
        selftests/bpf: update test_lwt_seg6local.sh according to iproute2
        drivers: net: lmc: fix case value for target abort error
        selftest/net: fix protocol family to work for IPv4.
        net: xsk: don't return frames via the allocator on error
        tools/bpftool: fix a percpu_array map dump problem
      212dab05
    • L
      Merge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 60f5a217
      Linus Torvalds 提交于
      Pull usercopy whitelisting fix from Kees Cook:
       "Bart Massey discovered that the usercopy whitelist for JFS was
        incomplete: the inline inode data may intentionally "overflow" into
        the neighboring "extended area", so the size of the whitelist needed
        to be raised to include the neighboring field"
      
      * tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        jfs: Fix usercopy whitelist for inline inode data
      60f5a217
    • L
      Merge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f639bef5
      Linus Torvalds 提交于
      Pull xfs bugfix from Darrick Wong:
       "One more patch for 4.18 to fix a coding error in the iomap_bmap()
        function introduced in -rc1: fix incorrect shifting"
      
      * tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: fix iomap_bmap position calculation
      f639bef5
    • L
      Partially revert "block: fail op_is_write() requests to read-only partitions" · a32e236e
      Linus Torvalds 提交于
      It turns out that commit 721c7fc7 ("block: fail op_is_write()
      requests to read-only partitions"), while obviously correct, causes
      problems for some older lvm2 installations.
      
      The reason is that the lvm snapshotting will continue to write to the
      snapshow COW volume, even after the volume has been marked read-only.
      End result: snapshot failure.
      
      This has actually been fixed in newer version of the lvm2 tool, but the
      old tools still exist, and the breakage was reported both in the kernel
      bugzilla and in the Debian bugzilla:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=200439
        https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
      
      The lvm2 fix is here
      
        https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
      
      but until everybody has updated to recent versions, we'll have to weaken
      the "never write to read-only partitions" check.  It now allows the
      write to happen, but causes a warning, something like this:
      
        generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
        Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
        CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
        Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
        Workqueue: ksnaphd do_metadata
        RIP: 0010:generic_make_request_checks+0x4ac/0x600
        ...
        Call Trace:
         generic_make_request+0x64/0x400
         submit_bio+0x6c/0x140
         dispatch_io+0x287/0x430
         sync_io+0xc3/0x120
         dm_io+0x1f8/0x220
         do_metadata+0x1d/0x30
         process_one_work+0x1b9/0x3e0
         worker_thread+0x2b/0x3c0
         kthread+0x113/0x130
         ret_from_fork+0x35/0x40
      
      Note that this is a "revert" in behavior only.  I'm leaving alone the
      actual code cleanups in commit 721c7fc7, but letting the previously
      uncaught request go through with a warning instead of stopping it.
      
      Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions")
      Reported-and-tested-by: NWGH <wgh@torlan.ru>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a32e236e
    • D
      netlink: Don't shift on 64 for ngroups · 91874ecf
      Dmitry Safonov 提交于
      It's legal to have 64 groups for netlink_sock.
      
      As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
      only to first 32 groups.
      
      The check for correctness of .bind() userspace supplied parameter
      is done by applying mask made from ngroups shift. Which broke Android
      as they have 64 groups and the shift for mask resulted in an overflow.
      
      Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Reported-and-Tested-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91874ecf
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5dbfb6ec
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-08-05
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix bpftool percpu_array dump by using correct roundup to next
         multiple of 8 for the value size, from Yonghong.
      
      2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
         allocator since driver will recycle frame anyway in case of an
         error, from Jakub.
      
      3) Fix up BPF test_lwt_seg6local test cases to final iproute2
         syntax, from Mathieu.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbfb6ec
    • U
      net/smc: no cursor update send in state SMC_INIT · 5607016c
      Ursula Braun 提交于
      If a writer blocked condition is received without data, the current
      consumer cursor is immediately sent. Servers could already receive this
      condition in state SMC_INIT without finished tx-setup. This patch
      avoids sending a consumer cursor update in this case.
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5607016c
  3. 04 8月, 2018 14 次提交
  4. 03 8月, 2018 10 次提交