1. 27 Jul, 2018 4 commits
    • include/linux/eventfd.h: include linux/errno.h · fa3fc2ad
      Authored by Arnd Bergmann
      The new gasket staging driver ran into a randconfig build failure when
      CONFIG_EVENTFD is disabled:
      
        In file included from drivers/staging/gasket/gasket_interrupt.h:11,
                         from drivers/staging/gasket/gasket_interrupt.c:4:
        include/linux/eventfd.h: In function 'eventfd_ctx_fdget':
        include/linux/eventfd.h:51:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
      
      I can't see anything wrong with including eventfd.h before err.h, so the
      easiest fix is to make it possible to do this by including the file
      where it is needed.
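
      The failure mode above can be modelled in userspace. This is a minimal
      sketch, not the actual patch: the three helpers are simplified stand-ins
      for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(), and eventfd_ctx_fdget_stub()
      is a hypothetical name for the kind of inline fallback a header provides
      when a feature is compiled out. The point is only that such a stub cannot
      compile unless the error helpers and errno values are already in scope.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified stand-ins for the kernel helpers (assumption: userspace model). */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct eventfd_ctx;

/* Hypothetical fallback used when the feature is disabled at build time. */
static inline struct eventfd_ctx *eventfd_ctx_fdget_stub(int fd)
{
	(void)fd;
	return ERR_PTR(-ENOSYS);
}
```

      A caller would test the result with IS_ERR() and recover the code with
      PTR_ERR() rather than comparing against NULL.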
      
      Link: http://lkml.kernel.org/r/20180724110737.3985088-1-arnd@arndb.de
      Fixes: 9a69f508 ("drivers/staging: Gasket driver framework + Apex driver")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix vma_is_anonymous() false-positives · bfd40eaf
      Authored by Kirill A. Shutemov
      vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
      VMA.  This is unreliable as ->mmap may not set ->vm_ops.
      
      False-positive vma_is_anonymous() may lead to crashes:
      
      	next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0
      	prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000
      	pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000
      	flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare)
      	------------[ cut here ]------------
      	kernel BUG at mm/memory.c:1422!
      	invalid opcode: 0000 [#1] SMP KASAN
      	CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136
      	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
      	01/01/2011
      	RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline]
      	RIP: 0010:zap_pud_range mm/memory.c:1466 [inline]
      	RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline]
      	RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508
      	Call Trace:
      	 unmap_single_vma+0x1a0/0x310 mm/memory.c:1553
      	 zap_page_range_single+0x3cc/0x580 mm/memory.c:1644
      	 unmap_mapping_range_vma mm/memory.c:2792 [inline]
      	 unmap_mapping_range_tree mm/memory.c:2813 [inline]
      	 unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845
      	 unmap_mapping_range+0x48/0x60 mm/memory.c:2880
      	 truncate_pagecache+0x54/0x90 mm/truncate.c:800
      	 truncate_setsize+0x70/0xb0 mm/truncate.c:826
      	 simple_setattr+0xe9/0x110 fs/libfs.c:409
      	 notify_change+0xf13/0x10f0 fs/attr.c:335
      	 do_truncate+0x1ac/0x2b0 fs/open.c:63
      	 do_sys_ftruncate+0x492/0x560 fs/open.c:205
      	 __do_sys_ftruncate fs/open.c:215 [inline]
      	 __se_sys_ftruncate fs/open.c:213 [inline]
      	 __x64_sys_ftruncate+0x59/0x80 fs/open.c:213
      	 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reproducer:
      
      	#include <stdio.h>
      	#include <stddef.h>
      	#include <stdint.h>
      	#include <stdlib.h>
      	#include <string.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <sys/ioctl.h>
      	#include <sys/mman.h>
      	#include <unistd.h>
      	#include <fcntl.h>
      
      	#define KCOV_INIT_TRACE			_IOR('c', 1, unsigned long)
      	#define KCOV_ENABLE			_IO('c', 100)
      	#define KCOV_DISABLE			_IO('c', 101)
      	#define COVER_SIZE			(1024<<10)
      
      	#define KCOV_TRACE_PC  0
      	#define KCOV_TRACE_CMP 1
      
      	int main(int argc, char **argv)
      	{
      		int fd;
      		unsigned long *cover;
      
      		system("mount -t debugfs none /sys/kernel/debug");
      		fd = open("/sys/kernel/debug/kcov", O_RDWR);
      		ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      				PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      		munmap(cover, COVER_SIZE * sizeof(unsigned long));
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      				PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
      		memset(cover, 0, COVER_SIZE * sizeof(unsigned long));
      		ftruncate(fd, 3UL << 20);
      		return 0;
      	}
      
      This can be fixed by assigning anonymous VMAs their own vm_ops and not
      relying on it being NULL.
      
      If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
      dummy_vm_ops.  This way we will have non-NULL ->vm_ops for all VMAs.
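
      The idea can be sketched in plain C. This is a userspace model that
      follows the wording of this message, not the merged diff: the struct
      layouts are reduced to one field, anon_vm_ops and finish_file_mmap() are
      hypothetical names, and only dummy_vm_ops is named in the text above.

```c
#include <stdbool.h>
#include <stddef.h>

struct vm_operations_struct { int unused; };

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Distinct sentinel tables; anon_vm_ops is a hypothetical name. */
static const struct vm_operations_struct anon_vm_ops = { 0 };
static const struct vm_operations_struct dummy_vm_ops = { 0 };

/* Old check: a NULL vm_ops is taken to mean "anonymous". */
static bool vma_is_anonymous_old(const struct vm_area_struct *vma)
{
	return vma->vm_ops == NULL;
}

/* Sketch of the new check: anonymous VMAs carry their own ops table. */
static bool vma_is_anonymous_new(const struct vm_area_struct *vma)
{
	return vma->vm_ops == &anon_vm_ops;
}

/* Models the fallback for a file mapping whose ->mmap() left vm_ops unset. */
static void finish_file_mmap(struct vm_area_struct *vma)
{
	if (!vma->vm_ops)
		vma->vm_ops = &dummy_vm_ops;
}
```

      With this shape, a driver that forgets to set vm_ops ends up with
      dummy_vm_ops and is no longer mistaken for an anonymous mapping.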
      
      Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce vma_init() · 027232da
      Authored by Kirill A. Shutemov
      Not all VMAs are allocated with vm_area_alloc().  Some of them are
      allocated on the stack or in the data segment.

      The new helper can be used to initialize a VMA properly regardless of
      where it was allocated.
      
      Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • delayacct: fix crash in delayacct_blkio_end() after delayacct init failure · b512719f
      Authored by Tejun Heo
      While forking, if delayacct init fails due to memory shortage, it
      continues expecting all delayacct users to check task->delays pointer
      against NULL before dereferencing it, which all of them used to do.
      
      Commit c96f5471 ("delayacct: Account blkio completion on the correct
      task"), while updating delayacct_blkio_end() to take the target task
      instead of always using %current, made the function test NULL on
      %current->delays and then continue to operate on @p->delays.  If
      %current succeeded init while @p didn't, it leads to the following
      crash.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
       IP: __delayacct_blkio_end+0xc/0x40
       PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
       RIP: 0010:__delayacct_blkio_end+0xc/0x40
       Call Trace:
        try_to_wake_up+0x2c0/0x600
        autoremove_wake_function+0xe/0x30
        __wake_up_common+0x74/0x120
        wake_up_page_bit+0x9c/0xe0
        mpage_end_io+0x27/0x70
        blk_update_request+0x78/0x2c0
        scsi_end_request+0x2c/0x1e0
        scsi_io_completion+0x20b/0x5f0
        blk_mq_complete_request+0xa2/0x100
        ata_scsi_qc_complete+0x79/0x400
        ata_qc_complete_multiple+0x86/0xd0
        ahci_handle_port_interrupt+0xc9/0x5c0
        ahci_handle_port_intr+0x54/0xb0
        ahci_single_level_irq_intr+0x3b/0x60
        __handle_irq_event_percpu+0x43/0x190
        handle_irq_event_percpu+0x20/0x50
        handle_irq_event+0x2a/0x50
        handle_edge_irq+0x80/0x1c0
        handle_irq+0xaf/0x120
        do_IRQ+0x41/0xc0
        common_interrupt+0xf/0xf
      
      Fix it by updating delayacct_blkio_end() to check @p->delays instead.
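
      The mismatch between the guard and the dereference can be modelled as
      below. This is a hedged userspace sketch: the structs are cut down to one
      field each, and old_guard_passes(), new_guard_passes() and blkio_end()
      are hypothetical names, not the kernel functions.

```c
#include <stddef.h>

struct task_delay_info { unsigned int blkio_count; };
struct task_struct { struct task_delay_info *delays; };

/* Old guard: decides based on the calling task, not the target task. */
static int old_guard_passes(const struct task_struct *curr,
			    const struct task_struct *p)
{
	(void)p;
	return curr->delays != NULL;
}

/* New guard: checks the pointer that will actually be dereferenced. */
static int new_guard_passes(const struct task_struct *p)
{
	return p->delays != NULL;
}

/* Returns 1 if accounting was updated, 0 if it was skipped. */
static int blkio_end(struct task_struct *p)
{
	if (!new_guard_passes(p))
		return 0;
	p->delays->blkio_count++;
	return 1;
}
```

      When the caller's init succeeded but the target's did not, the old guard
      passes and the subsequent dereference of p->delays would fault; the new
      guard skips the update instead.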
      
      Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
      Fixes: c96f5471 ("delayacct: Account blkio completion on the correct task")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <dsj@fb.com>
      Debugged-by: Dave Jones <dsj@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Josh Snyder <joshs@netflix.com>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 25 Jul, 2018 4 commits
  3. 23 Jul, 2018 1 commit
  4. 22 Jul, 2018 3 commits
  5. 20 Jul, 2018 1 commit
  6. 19 Jul, 2018 2 commits
    • net/mlx5: Fix QP fragmented buffer allocation · d7037ad7
      Authored by Tariq Toukan
      Fix bad alignment of SQ buffer in fragmented QP allocation.
      It should start directly after RQ buffer ends.
      
      Take special care of the end case where the RQ buffer does not occupy
      a whole page. RQ size is a power of two, so this can only happen for
      small RQ sizes (RQ size < PAGE_SIZE).
      
      Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
      and for npages value of RQ and SQ.
      
      Fixes: 3a2f7033 ("net/mlx5: Use order-0 allocations for all WQ types")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • PCI: OF: Fix I/O space page leak · a5fb9fb0
      Authored by Sergei Shtylyov
      When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
      driver was left disabled, the kernel crashed with this BUG:
      
        kernel BUG at lib/ioremap.c:72!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
        Hardware name: Renesas Condor board based on r8a77980 (DT)
        Workqueue: events deferred_probe_work_func
        pstate: 80000005 (Nzcv daif -PAN -UAO)
        pc : ioremap_page_range+0x370/0x3c8
        lr : ioremap_page_range+0x40/0x3c8
        sp : ffff000008da39e0
        x29: ffff000008da39e0 x28: 00e8000000000f07
        x27: ffff7dfffee00000 x26: 0140000000000000
        x25: ffff7dfffef00000 x24: 00000000000fe100
        x23: ffff80007b906000 x22: ffff000008ab8000
        x21: ffff000008bb1d58 x20: ffff7dfffef00000
        x19: ffff800009c30fb8 x18: 0000000000000001
        x17: 00000000000152d0 x16: 00000000014012d0
        x15: 0000000000000000 x14: 0720072007200720
        x13: 0720072007200720 x12: 0720072007200720
        x11: 0720072007300730 x10: 00000000000000ae
        x9 : 0000000000000000 x8 : ffff7dffff000000
        x7 : 0000000000000000 x6 : 0000000000000100
        x5 : 0000000000000000 x4 : 000000007b906000
        x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
        x1 : 0000000040000000 x0 : 00e80000fe100f07
        Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
        Call trace:
         ioremap_page_range+0x370/0x3c8
         pci_remap_iospace+0x7c/0xac
         pci_parse_request_of_pci_ranges+0x13c/0x190
         rcar_pcie_probe+0x4c/0xb04
         platform_drv_probe+0x50/0xbc
         driver_probe_device+0x21c/0x308
         __device_attach_driver+0x98/0xc8
         bus_for_each_drv+0x54/0x94
         __device_attach+0xc4/0x12c
         device_initial_probe+0x10/0x18
         bus_probe_device+0x90/0x98
         deferred_probe_work_func+0xb0/0x150
         process_one_work+0x12c/0x29c
         worker_thread+0x200/0x3fc
         kthread+0x108/0x134
         ret_from_fork+0x10/0x18
        Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)
      
      It turned out that pci_remap_iospace() wasn't undone when the driver's
      probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
      the probe was retried, finally causing the BUG due to trying to remap
      already remapped pages.
      
      Introduce the devm_pci_remap_iospace() managed API and replace the
      pci_remap_iospace() call with it to fix the bug.
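
      Why a managed ("devm") variant fixes a deferred-probe retry can be shown
      with a small userspace model. Everything here is an illustrative
      assumption: device_model, add_release_action(), managed_remap_io() and
      driver_probe() are hypothetical names, and the real devres machinery is
      considerably more involved.

```c
#include <stddef.h>

#define EPROBE_DEFER 517	/* same numeric value the kernel uses */
#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

struct device_model {
	release_fn release[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t nr;
};

static int io_mapped;	/* models "the I/O pages are currently mapped" */

static void unmap_io(void *data)
{
	(void)data;
	io_mapped = 0;
}

static int add_release_action(struct device_model *dev, release_fn fn, void *data)
{
	if (dev->nr == MAX_ACTIONS)
		return -1;
	dev->release[dev->nr] = fn;
	dev->data[dev->nr] = data;
	dev->nr++;
	return 0;
}

/* Runs registered cleanups in reverse order of registration. */
static void release_all(struct device_model *dev)
{
	while (dev->nr > 0) {
		dev->nr--;
		dev->release[dev->nr](dev->data[dev->nr]);
	}
}

/* Maps the region and registers its undo action with the device. */
static int managed_remap_io(struct device_model *dev)
{
	if (io_mapped)
		return -1;	/* stands in for the BUG on remapping mapped pages */
	io_mapped = 1;
	if (add_release_action(dev, unmap_io, NULL)) {
		io_mapped = 0;
		return -1;
	}
	return 0;
}

static int probe(struct device_model *dev, int phy_ready)
{
	int ret = managed_remap_io(dev);

	if (ret)
		return ret;
	return phy_ready ? 0 : -EPROBE_DEFER;
}

/* Models the driver core: a failed probe releases managed resources. */
static int driver_probe(struct device_model *dev, int phy_ready)
{
	int ret = probe(dev, phy_ready);

	if (ret)
		release_all(dev);
	return ret;
}
```

      In this model a first probe that defers leaves nothing mapped, so the
      retry can map the region again without hitting the double-mapping case.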
      
      Fixes: dbf9826d ("PCI: generic: Convert to DT resource parsing API")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      [lorenzo.pieralisi@arm.com: split commit/updated the commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
  7. 18 Jul, 2018 1 commit
  8. 17 Jul, 2018 3 commits
    • net/ethernet/freescale/fman: fix cross-build error · c1334597
      Authored by Randy Dunlap
        CC [M]  drivers/net/ethernet/freescale/fman/fman.o
      In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
      ../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
      ../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
        clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
        ^~~~~~~~~~~~~~~
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Madalin Bucur <madalin.bucur@nxp.com>
      Cc: netdev@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4/igmp: init group mode as INCLUDE when join source group · 6e2059b5
      Authored by Hangbin Liu
      Based on RFC3376 5.1
         If no interface
         state existed for that multicast address before the change (i.e., the
         change consisted of creating a new per-interface record), or if no
         state exists after the change (i.e., the change consisted of deleting
         a per-interface record), then the "non-existent" state is considered
         to have a filter mode of INCLUDE and an empty source list.
      
      Which means a new multicast group should start with state IN().
      
      Function ip_mc_join_group() works correctly for IGMP ASM(Any-Source Multicast)
      mode. It adds a group with state EX() and inits crcount to mc_qrv,
      so the kernel will send a TO_EX() report message after adding group.
      
      But for IGMPv3 SSM(Source-specific multicast) JOIN_SOURCE_GROUP mode, we
      split the group joining into two steps. First we join the group like ASM,
      i.e. via ip_mc_join_group(). So the state changes from IN() to EX().
      
      Then we add the source-specific address with INCLUDE mode. So the state
      changes from EX() to IN(A).
      
      Before the first step sends a group change record, we finished the second
      step. So we will only send the second change record. i.e. TO_IN(A).
      
      According to the RFC, we should actually send an ALLOW(A) message for
      SSM JOIN_SOURCE_GROUP, as the state should mimic the 'IN() to IN(A)'
      transition.
      
      The issue was exposed by commit a052517a ("net/multicast: should not
      send source list records when have filter mode change"). Before this change,
      we used to send both ALLOW(A) and TO_IN(A). After this change we only send
      TO_IN(A).
      
      Fix it by adding a new parameter to init group mode. Also add new wrapper
      functions so we don't need to change too much code.
      
      v1 -> v2:
      In my first version I only cleared the group change record. But this is not
      enough, because when a new group is joined, it is initialized as EXCLUDE and
      triggers a filter mode change in ip/ip6_mc_add_src(), which clears all source
      addresses' sf_crcount. This prevents earlier-joined addresses from sending
      state change records if multiple source addresses are joined at the same time.
      
      In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
      JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
      for IPv4 and IPv6.
      
      Fixes: a052517a ("net/multicast: should not send source list records when have filter mode change")
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: don't do zero_resv_unavail if memmap is not allocated · d1b47a7c
      Authored by Pavel Tatashin
      Moving zero_resv_unavail() before memmap_init_zone() caused a regression
      on x86-32.
      
      The cause is that we access struct pages before they are allocated when
      CONFIG_FLAT_NODE_MEM_MAP is used.
      
      free_area_init_nodes()
        zero_resv_unavail()
          mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced
        free_area_init_node()
          if CONFIG_FLAT_NODE_MEM_MAP
            alloc_node_mem_map()
              memblock_virt_alloc_node_nopanic() <- struct page alloced here
      
      On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory
      that it returns, so we do not need to do zero_resv_unavail() here.
      
      Fixes: e181ae0c ("mm: zero unavailable pages before memmap init")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: Matt Hart <matt@mattface.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 13 Jul, 2018 1 commit
    • net: Don't copy pfmemalloc flag in __copy_skb_header() · 8b700862
      Authored by Stefano Brivio
      The pfmemalloc flag indicates that the skb was allocated from
      the PFMEMALLOC reserves, and the flag is currently copied on skb
      copy and clone.
      
      However, an skb copied from an skb flagged with pfmemalloc
      wasn't necessarily allocated from PFMEMALLOC reserves, and on
      the other hand an skb allocated that way might be copied from an
      skb that wasn't.
      
      So we should not copy the flag on skb copy, and rather decide
      whether to allow an skb to be associated with sockets unrelated
      to page reclaim depending only on how it was allocated.
      
      Move the pfmemalloc flag before headers_start[0] using an
      existing 1-bit hole, so that __copy_skb_header() doesn't copy
      it.
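
      The effect of moving a field ahead of the copied region can be sketched
      as follows. This is a simplified userspace model: struct pkt, HDR_START,
      HDR_END and copy_header() are hypothetical, and plain fields are used in
      place of the bitfields and zero-length marker arrays of the real sk_buff.

```c
#include <stddef.h>
#include <string.h>

struct pkt {
	unsigned char pfmemalloc;	/* sits before the copied region */
	unsigned short protocol;	/* first field of the copied region */
	unsigned int mark;		/* last field of the copied region */
	void *data;			/* after the region, never bulk-copied */
};

#define HDR_START offsetof(struct pkt, protocol)
#define HDR_END   (offsetof(struct pkt, mark) + sizeof(unsigned int))

/* Bulk-copies only the bytes between the two region boundaries. */
static void copy_header(struct pkt *dst, const struct pkt *src)
{
	memcpy((char *)dst + HDR_START, (const char *)src + HDR_START,
	       HDR_END - HDR_START);
}
```

      Because pfmemalloc lies outside [HDR_START, HDR_END), a header copy
      leaves the destination's flag as it was, which is the behaviour the
      paragraph above describes.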
      
      When cloning, we'll now take care of this flag explicitly,
      contrary to the warning comment in __skb_clone().
      
      While at it, restore the newline usage introduced by commit
      b1937227 ("net: reorganize sk_buff for faster
      __copy_skb_header()") to visually separate bytes used in
      bitfields after headers_start[0], that was gone after commit
      a9e419dc ("netfilter: merge ctinfo into nfct pointer storage
      area"), and describe the pfmemalloc flag in the kernel-doc
      structure comment.
      
      This doesn't change the size of sk_buff or cacheline boundaries,
      but consolidates the 15 bits hole before tc_index into a 2 bytes
      hole before csum, that could now be filled more easily.
      Reported-by: Patrick Talbert <ptalbert@redhat.com>
      Fixes: c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC reserves")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 11 Jul, 2018 1 commit
  11. 09 Jul, 2018 1 commit
    • bpf: include errno.h from bpf-cgroup.h · f292b87d
      Authored by Roman Gushchin
      Commit fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency
      wrt CONFIG_CGROUP_BPF") caused some build issues, detected by 0-DAY
      kernel test infrastructure.
      
      The problem is that cgroup_bpf_prog_attach/detach/query() functions
      can return -EINVAL error code, which is not defined. Fix this by adding
      errno.h to includes.
      
      Fixes: fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Sean Young <sean@mess.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  12. 08 Jul, 2018 1 commit
  13. 07 Jul, 2018 2 commits
  14. 04 Jul, 2018 2 commits
  15. 03 Jul, 2018 2 commits
    • compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations · d03db2bc
      Authored by Nick Desaulniers
      Functions marked extern inline do not emit an externally visible
      function when the gnu89 C standard is used. Some KBUILD Makefiles
      overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without
      an explicit C standard specified, the default is gnu11. Since C99, the
      semantics of extern inline have changed such that an externally visible
      function is always emitted. This can lead to multiple definition errors
      of extern inline functions at link time of compilation units whose build
      files have removed an explicit C standard compiler flag for users of GCC
      5.1+ or Clang.
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Acked-by: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: akataria@vmware.com
      Cc: akpm@linux-foundation.org
      Cc: andrea.parri@amarulasolutions.com
      Cc: ard.biesheuvel@linaro.org
      Cc: aryabinin@virtuozzo.com
      Cc: astrachan@google.com
      Cc: boris.ostrovsky@oracle.com
      Cc: brijesh.singh@amd.com
      Cc: caoj.fnst@cn.fujitsu.com
      Cc: geert@linux-m68k.org
      Cc: ghackmann@google.com
      Cc: gregkh@linuxfoundation.org
      Cc: jan.kiszka@siemens.com
      Cc: jarkko.sakkinen@linux.intel.com
      Cc: jpoimboe@redhat.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: kstewart@linuxfoundation.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: manojgupta@google.com
      Cc: mawilcox@microsoft.com
      Cc: michal.lkml@markovi.net
      Cc: mjg59@google.com
      Cc: mka@chromium.org
      Cc: pombredanne@nexb.com
      Cc: rientjes@google.com
      Cc: rostedt@goodmis.org
      Cc: sedat.dilek@gmail.com
      Cc: thomas.lendacky@amd.com
      Cc: tstellar@redhat.com
      Cc: tweek@google.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kthread, sched/core: Fix kthread_parkme() (again...) · 1cef1150
      Authored by Peter Zijlstra
      Gaurav reports that commit:
      
        85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      
      isn't working for him. Because of the following race:
      
      > controller Thread                               CPUHP Thread
      > takedown_cpu
      > kthread_park
      > kthread_parkme
      > Set KTHREAD_SHOULD_PARK
      >                                                 smpboot_thread_fn
      >                                                 set Task interruptible
      >
      >
      > wake_up_process
      >  if (!(p->state & state))
      >                 goto out;
      >
      >                                                 Kthread_parkme
      >                                                 SET TASK_PARKED
      >                                                 schedule
      >                                                 raw_spin_lock(&rq->lock)
      > ttwu_remote
      > waiting for __task_rq_lock
      >                                                 context_switch
      >
      >                                                 finish_lock_switch
      >
      >
      >
      >                                                 Case TASK_PARKED
      >                                                 kthread_park_complete
      >
      >
      > SET Running
      
      Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
      handling is buggered because the TASK_DEAD thing is done with
      preemption disabled, the current code can still complete early on
      preemption :/
      
      So basically revert that earlier fix and go with a variant of the
      alternative mentioned in the commit. Promote TASK_PARKED to special
      state to avoid the store-store issue on task->state leading to the
      WARN in kthread_unpark() -> __kthread_bind().
      
      But in addition, add wait_task_inactive() to kthread_park() to ensure
      the task really is PARKED when we return from kthread_park(). This
      avoids the whole kthread still gets migrated nonsense -- although it
      would be really good to get this done differently.
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 02 Jul, 2018 2 commits
    • ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS · 240630e6
      Authored by Hans de Goede
      There have been several reports of LPM related hard freezes about once
      a day on multiple Lenovo 50 series models. Strangely enough these reports
      were not disk model specific as LPM issues usually are and some users
      with the exact same disk + laptop were seeing them while other users
      were not seeing these issues.
      
      It turns out that enabling LPM triggers a firmware bug somewhere, which
      has been fixed in later BIOS versions.
      
      This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM
      for dealing with this.
      
      The ahci_broken_lpm() function contains DMI match info for the 4 models
      which are known to be affected by this and the DMI BIOS date field for
      known good BIOS versions. If the BIOS date is older than the one in the
      table LPM will be disabled and a warning will be printed.
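
      The date comparison can be sketched like this. It is a minimal userspace
      model under stated assumptions: the BIOS date string is taken to be in
      "MM/DD/YYYY" form, the table entry in "YYYYMMDD" form, and both function
      names are hypothetical rather than the driver's own.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Converts "MM/DD/YYYY" to "YYYYMMDD" so dates compare as plain strings. */
static bool bios_date_to_sortable(const char *date, char out[9])
{
	if (!date || strlen(date) != 10)
		return false;
	snprintf(out, 9, "%.4s%.2s%.2s", date + 6, date, date + 3);
	return true;
}

/* True when the BIOS is older than the known-good build date. */
static bool lpm_should_be_disabled(const char *bios_date, const char *known_good)
{
	char buf[9];

	if (!bios_date_to_sortable(bios_date, buf))
		return false;	/* unknown format: do not apply the quirk */
	return strcmp(buf, known_good) < 0;
}
```

      Reordering to year-month-day is what makes a lexicographic strcmp()
      agree with chronological order; comparing the raw "MM/DD/YYYY" strings
      would not.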
      
      Note the BIOS dates are for known good versions, some older versions may
      work too, but we don't know for sure, the table is using dates from BIOS
      versions for which users have confirmed that upgrading to that version
      makes the problem go away.
      
      Unfortunately I've been unable to get hold of the reporter who reported
      that BIOS version 2.35 fixed the problems on the W541 for him. I've been
      able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older
      dmidecode, but I don't know the exact BIOS date as reported in the DMI.
      Lenovo keeps a changelog with dates in their release notes, but the
      dates there are the release dates not the build dates which are in DMI.
      So I've chosen to set the date to which we compare to one day past the
      release date of the 2.34 BIOS. I plan to fix this with a follow up
      commit once I've the necessary info.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • net: fix use-after-free in GRO with ESP · 603d4cf8
      Authored by Sabrina Dubroca
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
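
      The guard that such a helper provides can be modelled in userspace. This
      is a sketch only: skb_model, err_ptr(), ptr_err() and gro_flush_final()
      are hypothetical stand-ins, and the model shows just the "do not touch
      the skb once it was consumed" rule, not real GRO processing.

```c
#include <errno.h>
#include <stdint.h>

struct skb_model { int flush; };

/* Simplified error-pointer encoding (assumption: userspace stand-ins). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Updates the flush state only if the lower handler did not report
 * -EINPROGRESS, i.e. only while the caller still owns the skb.
 */
static void gro_flush_final(struct skb_model *skb, void *pp, int flush)
{
	if (ptr_err(pp) != -EINPROGRESS)
		skb->flush |= flush;
}
```

      Centralising the check in one helper means each tunnel handler gets the
      same ownership rule instead of open-coding it.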
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 30 Jun, 2018 1 commit
    • bpf: undo prog rejection on read-only lock failure · 85782e03
      Authored by Daniel Borkmann
      Partially undo commit 9facc336 ("bpf: reject any prog that failed
      read-only lock") since it caused a regression, that is, syzkaller was
      able to manage to cause a panic via fault injection deep in set_memory_ro()
      path by letting an allocation fail: In x86's __change_page_attr_set_clr()
      it was able to change the attributes of the primary mapping but not in
      the alias mapping via cpa_process_alias(), so the second, inner call
      to the __change_page_attr() via __change_page_attr_set_clr() had to split
      a larger page and failed in the alloc_pages() with the artificially triggered
      allocation error which is then propagated down to the call site.
      
      Thus, for set_memory_ro() this means that it returned with an error, but
      from debugging a probe_kernel_write() revealed EFAULT on that memory since
      the primary mapping succeeded to get changed. Therefore the subsequent
      hdr->locked = 0 reset triggered the panic as it was performed on read-only
      memory, so call-site assumptions were in fact wrong to assume that it would
      either succeed /or/ not succeed at all since there's no such rollback in
      set_memory_*() calls from partial change of mappings, in other words, we're
      left in a state that is "half done". A later undo via set_memory_rw() is
      succeeding though due to matching permissions on that part (aka due to the
      try_preserve_large_page() succeeding). While reproducing locally with
      explicitly triggering this error, the initial splitting only happens on
      rare occasions and in real world it would additionally need oom conditions,
      but that said, it could partially fail. Therefore, it is definitely wrong
      to bail out on set_memory_ro() error and reject the program with the
      set_memory_*() semantics we have today. Shouldn't have gone the extra mile
      since no other user in tree today in fact checks for any set_memory_*()
      errors, e.g. neither module_enable_ro() / module_disable_ro() for module
      RO/NX handling which is mostly default these days nor kprobes core with
      alloc_insn_page() / free_insn_page() as examples that could be invoked long
      after bootup and original 314beb9b ("x86: bpf_jit_comp: secure bpf jit
      against spraying attacks") did neither when it got first introduced to BPF
      so "improving" with bailing out was clearly not right when set_memory_*()
      cannot handle it today.
      
      Kees suggested that if set_memory_*() can fail, we should annotate it with
      __must_check, and all callers need to deal with it gracefully, given those
      set_memory_*() markings aren't "advisory" but are expected to actually do
      what they say. This might be an option worth pursuing in the future, but
      it would at the same time require that set_memory_*() calls on supporting
      archs are guaranteed to be "atomic", in that they provide rollback if part
      of the range fails. Once that is in place, the RW -> RO transition could
      be made more robust that way, while the subsequent RO -> RW transition
      /must/ continue to guarantee that the undo part always succeeds.
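
The "half done" state and the rollback idea discussed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `demo_region`, `demo_set_perm()` and `demo_set_perm_atomic()` are hypothetical names, page permissions are modelled as a plain array, and none of this is the kernel's set_memory_*() implementation. The `warn_unused_result` attribute is what the kernel's `__must_check` annotation expands to with GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NPAGES 8

enum demo_perm { DEMO_RW, DEMO_RO };

/* Hypothetical stand-in for page-table state: one permission per page. */
struct demo_region {
	enum demo_perm perm[DEMO_NPAGES];
	int fail_at;	/* page index at which a change fails, or -1 for never */
};

/*
 * Non-atomic change, modelling the behaviour the commit describes: pages
 * before the failing one are already switched when the error is returned.
 */
__attribute__((warn_unused_result))
static int demo_set_perm(struct demo_region *r, size_t start, size_t n,
			 enum demo_perm p)
{
	for (size_t i = start; i < start + n; i++) {
		if ((int)i == r->fail_at)
			return -1;	/* "half done": earlier pages stay changed */
		r->perm[i] = p;
	}
	return 0;
}

/*
 * Atomic variant: snapshot the range first and restore it on failure, so the
 * caller sees either the full change or no change at all.
 */
__attribute__((warn_unused_result))
static int demo_set_perm_atomic(struct demo_region *r, size_t start, size_t n,
				enum demo_perm p)
{
	enum demo_perm saved[DEMO_NPAGES];

	assert(start + n <= DEMO_NPAGES);
	memcpy(saved, &r->perm[start], n * sizeof(saved[0]));

	if (demo_set_perm(r, start, n, p) != 0) {
		memcpy(&r->perm[start], saved, n * sizeof(saved[0]));
		return -1;
	}
	return 0;
}
```

With `fail_at` set to a page inside the range, the plain variant leaves the leading pages read-only after reporting failure, whereas the atomic variant leaves every page as it was. Only with the second behaviour would a caller be justified in treating an error as "nothing happened".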
      
      Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
      Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
      Fixes: 9facc336 ("bpf: reject any prog that failed read-only lock")
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      85782e03
  18. 29 Jun, 2018 4 commits
    • J
      sg: remove ->sg_magic member · 9544bc53
      Jens Axboe authored
      This was introduced more than a decade ago when sg chaining was
      added, but we never really caught anything with it. The scatterlist
      entry size can be critical, since drivers allocate it, so remove
      the magic member. Recently it's been triggering allocation stalls
      and failures in NVMe.
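
Why one extra member matters for entry size can be sketched with two simplified structs. The layouts below are hypothetical illustrations (the `demo_` names and field choices are assumptions, with `unsigned long` standing in for the DMA address type); they are not copied from the kernel's scatterlist definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical entry layout that carries a debug magic word. */
struct demo_sg_with_magic {
	unsigned long magic;
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
	unsigned long dma_address;	/* stand-in for a DMA address type */
};

/* The same hypothetical layout with the magic word removed. */
struct demo_sg_without_magic {
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
	unsigned long dma_address;	/* stand-in for a DMA address type */
};

/* Number of whole entries that fit in one allocation of `bytes` bytes. */
static size_t demo_entries_per_alloc(size_t bytes, size_t entry_size)
{
	assert(entry_size > 0);
	return bytes / entry_size;
}
```

Comparing `demo_entries_per_alloc(4096, sizeof(struct demo_sg_with_magic))` against the same call for the struct without the magic word shows how many more entries fit into a fixed-size allocation once the member is gone.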
      Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9544bc53
    • S
      include/linux/dax.h: dax_iomap_fault() returns vm_fault_t · f77bc3a8
      Souptick Joarder authored
      Commit 1c8f4220 ("mm: change return type to vm_fault_t") missed a
      conversion.  It's not a big problem at present because mainline is still
      using
      
      	typedef int vm_fault_t;
      
      Fixes: 1c8f4220 ("mm: change return type to vm_fault_t")
      Link: http://lkml.kernel.org/r/20180620172046.GA27894@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f77bc3a8
    • M
      slub: fix failure when we delete and create a slab cache · d50d82fa
      Mikulas Patocka authored
      In kernel 4.17 I removed some code from dm-bufio that did slab cache
      merging (commit 21bb1327: "dm bufio: remove code that merges slab
      caches") - both slab and slub support merging caches with identical
      attributes, so dm-bufio now just calls kmem_cache_create and relies on
      implicit merging.
      
      This uncovered a bug in the slub subsystem - if we delete a cache and
      immediately create another cache with the same attributes, it fails
      because of a duplicate filename in /sys/kernel/slab/.  The slub subsystem
      offloads freeing the cache to a workqueue - and if we create the new
      cache before the workqueue runs, it complains because of the duplicate
      filename in sysfs.
      
      This patch fixes the bug by moving the call of kobject_del from
      sysfs_slab_remove_workfn to shutdown_cache.  kobject_del must be called
      while we hold slab_mutex - so that the sysfs entry is deleted before a
      cache with the same attributes could be created.
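
The ordering problem described above can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the slub code: `demo_registry` is a hypothetical stand-in for the name directory, the `demo_` functions are invented for illustration, and the locking that the real fix depends on (slab_mutex) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_NAMES 4
#define DEMO_NAME_LEN 32

/* Hypothetical stand-in for the directory of cache names in sysfs. */
struct demo_registry {
	char names[DEMO_MAX_NAMES][DEMO_NAME_LEN];
	bool used[DEMO_MAX_NAMES];
	int pending;	/* slot whose removal is queued, or -1 */
};

/* Return the slot holding `name`, or -1 if it is not registered. */
static int demo_find(const struct demo_registry *r, const char *name)
{
	assert(r != NULL && name != NULL);
	for (int i = 0; i < DEMO_MAX_NAMES; i++)
		if (r->used[i] && strcmp(r->names[i], name) == 0)
			return i;
	return -1;
}

/* Create: fails with -EEXIST while an entry of the same name is registered. */
static int demo_cache_create(struct demo_registry *r, const char *name)
{
	if (demo_find(r, name) >= 0)
		return -EEXIST;
	for (int i = 0; i < DEMO_MAX_NAMES; i++) {
		if (!r->used[i]) {
			strncpy(r->names[i], name, DEMO_NAME_LEN - 1);
			r->names[i][DEMO_NAME_LEN - 1] = '\0';
			r->used[i] = true;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Old ordering: the name is only dropped later, when the queued work runs. */
static void demo_cache_destroy_deferred(struct demo_registry *r,
					const char *name)
{
	r->pending = demo_find(r, name);
}

/* Run the queued removal, as the workqueue eventually would. */
static void demo_run_queued_work(struct demo_registry *r)
{
	if (r->pending >= 0)
		r->used[r->pending] = false;
	r->pending = -1;
}

/* New ordering: the name is dropped synchronously as part of destroy. */
static void demo_cache_destroy_sync(struct demo_registry *r, const char *name)
{
	int i = demo_find(r, name);

	if (i >= 0)
		r->used[i] = false;
}
```

In this model, a create that follows `demo_cache_destroy_deferred()` before `demo_run_queued_work()` has run returns -EEXIST, mirroring the duplicate-filename failure, while a create that follows `demo_cache_destroy_sync()` succeeds immediately.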
      
      Running device-mapper-test-suite with:
      
        dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/
      
      triggered:
      
        Buffer I/O error on dev dm-0, logical block 1572848, async page read
        device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
        device-mapper: thin: 253:1: aborting current metadata transaction
        sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
        CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
        Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
        Workqueue: dm-thin do_worker [dm_thin_pool]
        Call Trace:
         dump_stack+0x5a/0x73
         sysfs_warn_dup+0x58/0x70
         sysfs_create_dir_ns+0x77/0x80
         kobject_add_internal+0xba/0x2e0
         kobject_init_and_add+0x70/0xb0
         sysfs_slab_add+0xb1/0x250
         __kmem_cache_create+0x116/0x150
         create_cache+0xd9/0x1f0
         kmem_cache_create_usercopy+0x1c1/0x250
         kmem_cache_create+0x18/0x20
         dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
         dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
         __create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
         dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
         metadata_operation_failed+0x59/0x100 [dm_thin_pool]
         alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
         process_cell+0x2a3/0x550 [dm_thin_pool]
         do_worker+0x28d/0x8f0 [dm_thin_pool]
         process_one_work+0x171/0x370
         worker_thread+0x49/0x3f0
         kthread+0xf8/0x130
         ret_from_fork+0x35/0x40
        kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
        kmem_cache_create(dm_bufio_buffer-16) failed with error -17
      
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.com
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reported-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d50d82fa
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  19. 27 Jun, 2018 2 commits
  20. 26 Jun, 2018 1 commit
    • S
      bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF · fdb5c453
      Sean Young authored
      If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not
      possible to attach, detach or query IR BPF programs to /dev/lircN devices,
      making them impossible to use. For embedded devices, it should be possible
      to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.
      
      This change requires some refactoring, since the
      bpf_prog_{attach,detach,query} functions are now always compiled, but
      their code paths for cgroups need moving out. Rather than adding an
      #ifdef CONFIG_CGROUP_BPF in kernel/bpf/syscall.c, they are moved to
      kernel/bpf/cgroup.c and kernel/bpf/sockmap.c, which requires no #ifdefs
      since those files are already conditionally compiled.
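
The general pattern behind this refactoring, keeping the dispatcher free of #ifdef by hiding the optional feature behind a stub, might look like the following. It is a sketch only: `DEMO_CONFIG_CGROUP`, the `demo_` functions, the attach-type enum and the error codes are all hypothetical and do not reproduce the kernel's BPF interfaces.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical attach types; the real enum values are not reproduced here. */
enum demo_attach_type { DEMO_ATTACH_CGROUP, DEMO_ATTACH_LIRC };

/*
 * "Header" part: a real implementation when the optional feature is built
 * in, otherwise a static inline stub that reports the feature as absent.
 */
#ifdef DEMO_CONFIG_CGROUP
int demo_cgroup_attach(int prog_fd);	/* would live in its own .c file */
#else
static inline int demo_cgroup_attach(int prog_fd)
{
	(void)prog_fd;
	return -EINVAL;
}
#endif

/* Always built, independent of DEMO_CONFIG_CGROUP. */
static int demo_lirc_attach(int prog_fd)
{
	return prog_fd >= 0 ? 0 : -EBADF;
}

/* Dispatcher: compiled unconditionally and free of #ifdef. */
static int demo_prog_attach(enum demo_attach_type type, int prog_fd)
{
	switch (type) {
	case DEMO_ATTACH_CGROUP:
		return demo_cgroup_attach(prog_fd);
	case DEMO_ATTACH_LIRC:
		return demo_lirc_attach(prog_fd);
	}
	return -EINVAL;
}

/* Usage with the optional feature compiled out: only the IR path works. */
static void demo_usage(void)
{
	assert(demo_prog_attach(DEMO_ATTACH_LIRC, 3) == 0);
#ifndef DEMO_CONFIG_CGROUP
	assert(demo_prog_attach(DEMO_ATTACH_CGROUP, 3) == -EINVAL);
#endif
}
```

The design choice is that the conditional compilation lives in one place, next to the optional code, so the always-built dispatcher stays readable and the IR path works regardless of the configuration.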
      
      Fixes: f4364dcf ("media: rc: introduce BPF_PROG_LIRC_MODE2")
      Signed-off-by: Sean Young <sean@mess.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fdb5c453
  21. 25 Jun, 2018 1 commit
    • A
      disable -Wattribute-alias warning for SYSCALL_DEFINEx() · bee20031
      Arnd Bergmann authored
      gcc-8 warns for every single definition of a system call entry
      point, e.g.:
      
      include/linux/compat.h:56:18: error: 'compat_sys_rt_sigprocmask' alias between functions of incompatible types 'long int(int,  compat_sigset_t *, compat_sigset_t *, compat_size_t)' {aka 'long int(int,  struct <anonymous> *, struct <anonymous> *, unsigned int)'} and 'long int(long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
        asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
                        ^~~~~~~~~~
      include/linux/compat.h:45:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
        COMPAT_SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
        ^~~~~~~~~~~~~~~~~~~~~~
      kernel/signal.c:2601:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE4'
       COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
       ^~~~~~~~~~~~~~~~~~~~~~
      include/linux/compat.h:60:18: note: aliased declaration here
        asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
                        ^~~~~~~~~~
      
      The new warning seems reasonable in principle, but it doesn't
      help us here, since we rely on the type mismatch to sanitize the
      system call arguments. After I reported this as GCC PR82435, a new
      -Wno-attribute-alias option was added that could be used to turn the
      warning off globally on the command line, but I'd prefer to do it a
      little more fine-grained.
      
      Interestingly, turning a warning off and on again inside of
      a single macro doesn't always work; in this case I had to add
      an extra statement in between and decided to copy the __SC_TEST
      one from the native syscall to the compat syscall macro.  See
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 for more details
      about this.
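
A fine-grained suppression of this kind can be sketched with plain diagnostic pragmas around a deliberately mismatched alias. This is an illustration under assumptions: the `demo_` names are hypothetical, the pragmas are written out directly rather than through the kernel's __diag_*() helper macros, and the `alias` attribute needs GCC or Clang on a target that supports symbol aliases (such as ELF). Compilers that do not know the warning option may emit a harmless note about the pragma.

```c
#include <assert.h>

/* Typed body, playing the role of the real system call implementation. */
static long demo_impl(int a, unsigned int b)
{
	return (long)a + (long)b;
}

/* Silence the GCC 8 warning only around the deliberate type mismatch. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wattribute-alias"

/* Typed entry point, emitted as an alias of the all-long wrapper below. */
long demo_sys_add(int a, unsigned int b)
	__attribute__((alias("demo_SyS_add")));

/* All-long wrapper: the casts narrow each argument to its declared type. */
long demo_SyS_add(long a, long b);
long demo_SyS_add(long a, long b)
{
	return demo_impl((int)a, (unsigned int)b);
}

#pragma GCC diagnostic pop

/* Usage: both names refer to the same function body. */
static void demo_check(void)
{
	assert(demo_SyS_add(2, 3) == 5);
	assert(demo_sys_add(2, 3) == 5);
}
```

Scoping the suppression with push and pop keeps the warning active for the rest of the translation unit, which is the "little more fine-grained" alternative to passing -Wno-attribute-alias on the command line.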
      
      [paul.burton@mips.com:
        - Rebase atop current master.
        - Split GCC & version arguments to __diag_ignore() in order to match
          changes to the preceding patch.
        - Add the comment argument to match the preceding patch.]
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82435
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Stafford Horne <shorne@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      bee20031