1. 23 Apr, 2018 5 commits
    • L
      Merge tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · d54b5c13
      Linus Torvalds committed
      Pull btrfs fixes from David Sterba:
       "This contains a few fixups to the qgroup patches that were merged this
        dev cycle, unaligned access fix, blockgroup removal corner case fix
        and a small debugging output tweak"
      
      * tag 'for-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: print-tree: debugging output enhancement
        btrfs: Fix race condition between delayed refs and blockgroup removal
        btrfs: fix unaligned access in readdir
        btrfs: Fix wrong btrfs_delalloc_release_extents parameter
        btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
        btrfs: qgroup: Use independent and accurate per inode qgroup rsv
        btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
      d54b5c13
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 37a535ed
      Linus Torvalds committed
      Pull x86 fixes from Thomas Gleixner:
       "A small set of fixes for x86:
      
         - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which
           causes the possible CPU count to be wrong.
      
         - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC
           calibration to fail
      
         - Fix the page table setup for temporary text mappings in the resume
           code which causes resume failures
      
         - Make the page table dump code handle HIGHPTE correctly instead of
           oopsing
      
         - Support for topologies where NUMA nodes share an LLC to prevent an
           invalid topology warning and further malfunction on such systems.
      
         - Remove the now unused pci-nommu code
      
         - Remove stale function declarations"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/power/64: Fix page-table setup for temporary text mapping
        x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
        x86,sched: Allow topologies where NUMA nodes share an LLC
        x86/processor: Remove two unused function declarations
        x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
        x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
        x86: Remove pci-nommu.c
      37a535ed
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c1e9dae0
      Linus Torvalds committed
      Pull timer fixes from Thomas Gleixner:
       "A small set of timer fixes:
      
         - Evaluate the -ETIME condition correctly in the imx tpm driver
      
         - Fix the evaluation order of a condition in posix cpu timers
      
         - Use pr_cont() in the clockevents code to prevent ugly message
           splitting
      
         - Remove __current_kernel_time() which is now unused to prevent that
           new users show up.
      
         - Remove a stale forward declaration"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/imx-tpm: Correct -ETIME return condition check
        posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated
        timekeeping: Remove __current_kernel_time()
        timers: Remove stale struct tvec_base forward declaration
        clockevents: Fix kernel messages split across multiple lines
      c1e9dae0
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 38f0b33e
      Linus Torvalds committed
      Pull perf fixes from Thomas Gleixner:
       "A larger set of updates for perf.
      
        Kernel:
      
         - Handle the SBOX uncore monitoring correctly on Broadwell CPUs which
           do not have SBOX.
      
         - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The
           percentage of preempting and non-preempting context switches help
           understanding the nature of workloads (CPU or IO bound) that are
           running on a machine. This adds the kernel facility and userspace
           changes needed to show this information in 'perf script' and 'perf
           report -D' (Alexey Budankov)
      
         - Remove a WARN_ON() in the trace/kprobes code which is pointless
           because the return error code is already telling the caller what's
           wrong.
      
         - Revert a fugly workaround for clang BPF targets.
      
         - Fix sample_max_stack maximum check and do not proceed when an error
           has been detected, return them to avoid misidentifying errors (Jiri
           Olsa)
      
         - Add SPDX identifiers and get rid of GPL boilerplate.
      
        Tools:
      
         - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
      
         - Support MAP_FIXED_NOREPLACE, noticed when updating the
           tools/include/ copies (Arnaldo Carvalho de Melo)
      
         - Add '\n' at the end of parse-options error messages (Ravi Bangoria)
      
         - Add s390 support for detailed/verbose PMU event description (Thomas
           Richter)
      
         - perf annotate fixes and improvements:
      
            * Allow showing offsets in more than just jump targets, use the
              new 'O' hotkey in the TUI, config ~/.perfconfig
              annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho
              de Melo)
      
            * Use the resolved variable names from objdump disassembled lines
              to make them more compact, just like was already done for some
              instructions, like "mov", this eventually will be done more
              generally, but lets now add some more to the existing mechanism
              (Arnaldo Carvalho de Melo)
      
         - perf record fixes:
      
            * Change warning for missing topology sysfs entry to debug, as not
              all architectures have those files, s390 being one of those
              (Thomas Richter)
      
            * Remove old error messages about things that are unlikely to be the
              root cause in modern systems (Andi Kleen)
      
         - perf sched fixes:
      
            * Fix -g/--call-graph documentation (Takuya Yamamoto)
      
         - perf stat:
      
            * Enable 1ms interval for printing event counters values in
              (Alexey Budankov)
      
         - perf test fixes:
      
            * Run dwarf unwind on arm32 (Kim Phillips)
      
            * Remove unused ptrace.h include from LLVM test, sidestepping older
              clang's lack of support for some asm constructs (Arnaldo
              Carvalho de Melo)
      
            * Fixup BPF test using epoll_pwait syscall function probe, to cope
              with the syscall routines renames performed in this development
              cycle (Arnaldo Carvalho de Melo)
      
         - perf version fixes:
      
            * Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version
              --build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as
              libaudit won't be used in that case, print info about
              syscall_table support instead (Jin Yao)
      
         - Build system fixes:
      
            * Use HAVE_..._SUPPORT consistently (Jin Yao)
      
            * Restore READ_ONCE() C++ compatibility in tools/include (Mark
              Rutland)
      
            * Give hints about package names needed to build jvmti (Arnaldo
              Carvalho de Melo)"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
        perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
        perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
        coresight: Move to SPDX identifier
        perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
        perf tests mmap: Show which tracepoint is failing
        perf tools: Add '\n' at the end of parse-options error messages
        perf record: Remove suggestion to enable APIC
        perf record: Remove misleading error suggestion
        perf hists browser: Clarify top/report browser help
        perf mem: Allow all record/report options
        perf trace: Support MAP_FIXED_NOREPLACE
        perf: Remove superfluous allocation error check
        perf: Fix sample_max_stack maximum check
        perf: Return proper values for user stack errors
        perf list: Add s390 support for detailed/verbose PMU event description
        perf script: Extend misc field decoding with switch out event type
        perf report: Extend raw dump (-D) out with switch out event type
        perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
        tools/headers: Synchronize kernel ABI headers, v4.17-rc1
        trace_kprobe: Remove warning message "Could not insert probe at..."
        ...
      38f0b33e
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 18de45a9
      Linus Torvalds committed
      Pull objtool fix from Thomas Gleixner:
       "A single fix for objtool so it uses the host C and LD flags and not
        the target ones"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Support HOSTCFLAGS and HOSTLDFLAGS
      18de45a9
  2. 22 Apr, 2018 6 commits
    • L
      Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 285848b0
      Linus Torvalds committed
      Pull /dev/random fixes from Ted Ts'o:
       "Fix some bugs in the /dev/random driver which cause getrandom(2) to
        unblock earlier than designed.
      
        Thanks to Jann Horn from Google's Project Zero for pointing this out
        to me"
      
      * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        random: add new ioctl RNDRESEEDCRNG
        random: crng_reseed() should lock the crng instance that it is modifying
        random: set up the NUMA crng instances after the CRNG is fully initialized
        random: use a different mixing algorithm for add_device_randomness()
        random: fix crng_ready() test
      285848b0
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4c50ceae
      Linus Torvalds committed
      Pull libnvdimm fixes from Dan Williams:
       "A regression fix, new unit test infrastructure and a build fix:
      
         - Regression fix addressing support for the new NVDIMM label storage
           area access commands (_LSI, _LSR, and _LSW).
      
           The Intel specific version of these commands communicated the
           "Device Locked" status on the label-storage-information command.
      
           However, these new commands (standardized in ACPI 6.2) communicate
           the "Device Locked" status on the label-storage-read command, and
           the driver was missing the indication.
      
           Reading from locked persistent memory is similar to reading
           unmapped PCI memory space, returns all 1's.
      
         - Unit test infrastructure is added to regression test the "Device
           Locked" detection failure.
      
         - A build fix is included to allow the "of_pmem" driver to be built
           as a module and translate an Open Firmware described device to its
           local numa node"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        MAINTAINERS: Add backup maintainers for libnvdimm and DAX
        device-dax: allow MAP_SYNC to succeed
        Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"
        libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()
        tools/testing/nvdimm: enable labels for nfit_test.1 dimms
        tools/testing/nvdimm: fix missing newline in nfit_test_dimm 'handle' attribute
        tools/testing/nvdimm: support nfit_test_dimm attributes under nfit_test.1
        tools/testing/nvdimm: allow custom error code injection
        libnvdimm, dimm: handle EACCES failures from label reads
      4c50ceae
    • L
      Merge tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5e7c7806
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "A few small fixes:
      
         - a fix for the NULL-dereference in rawmidi compat ioctls, triggered
           by fuzzer
      
         - HD-audio Realtek codec quirks, a VIA controller fixup
      
         - a long-standing bug fix in LINE6 MIDI"
      
      * tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: rawmidi: Fix missing input substream checks in compat ioctls
        ALSA: hda/realtek - adjust the location of one mic
        ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags
        ALSA: hda - New VIA controller suppor no-snoop path
        ALSA: line6: Use correct endpoint type for midi output
      5e7c7806
    • L
      Merge tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog · e46096b6
      Linus Torvalds committed
      Pull watchdog fixes from Wim Van Sebroeck:
      
       - fall-through fixes
      
       - MAINTAINER change for hpwdt
      
       - renesas-wdt: Add support for WDIOF_CARDRESET
      
       - aspeed: set bootstatus during probe
      
      * tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog:
        aspeed: watchdog: Set bootstatus during probe
        watchdog: renesas-wdt: Add support for WDIOF_CARDRESET
        watchdog: wafer5823wdt: Mark expected switch fall-through
        watchdog: w83977f_wdt: Mark expected switch fall-through
        watchdog: sch311x_wdt: Mark expected switch fall-through
        watchdog: hpwdt: change maintainer.
      e46096b6
    • L
      Merge tag 'linux-kselftest-4.17-rc2' of... · 6488ec26
      Linus Torvalds committed
      Merge tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fix from Shuah Khan:
       "A fix from Michael Ellerman to not run dnotify_test by default to
        prevent Kselftest running forever"
      
      * tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/filesystems: Don't run dnotify_test by default
      6488ec26
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 9409227a
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - kasan: avoid pfn_to_nid() before the page array is initialised
      
       - Fix typo causing the "upgrade" of known signals to SIGKILL
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: signal: don't force known signals to SIGKILL
        arm64: kasan: avoid pfn_to_nid() before page array is initialized
      9409227a
  3. 21 Apr, 2018 28 commits
    • L
      Merge branch 'akpm' (patches from Andrew) · 7a752478
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
      
       - "fork: unconditionally clear stack on fork" is a non-bugfix which got
         lost during the merge window - performance concerns appear to have
         been adequately addressed.
      
       - and a bunch of fixes
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
        mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()
        fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error
        kexec_file: do not add extra alignment to efi memmap
        proc: fix /proc/loadavg regression
        proc: revalidate kernel thread inodes to root:root
        autofs: mount point create should honour passed in mode
        MAINTAINERS: add personal addresses for Sascha and Uwe
        kasan: add no_sanitize attribute for clang builds
        rapidio: fix rio_dma_transfer error handling
        mm: enable thp migration for shmem thp
        writeback: safer lock nesting
        mm, pagemap: fix swap offset value for PMD migration entry
        mm: fix do_pages_move status handling
        fork: unconditionally clear stack on fork
      7a752478
    • I
      Merge tag 'perf-urgent-for-mingo-4.17-20180420' of... · c042f7e9
      Ingo Molnar committed
      Merge tag 'perf-urgent-for-mingo-4.17-20180420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
      
      - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE].
        The percentage of preempting and non-preempting context switches help
        understanding the nature of workloads (CPU or IO bound) that are running
        on a machine. This adds the kernel facility and userspace changes needed
        to show this information in 'perf script' and 'perf report -D' (Alexey Budankov)
      
      - Remove old error messages about things that are unlikely to be the root cause
        in modern systems (Andi Kleen)
      
      - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
      
      - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/
        copies (Arnaldo Carvalho de Melo)
      
      - Fixup BPF test using epoll_pwait syscall function probe, to cope with
        the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo)
      
      - Fix sample_max_stack maximum check and do not proceed when an error
        has been detected, return them to avoid misidentifying errors (Jiri Olsa)
      
      - Add '\n' at the end of parse-options error messages (Ravi Bangoria)
      
      - Add s390 support for detailed/verbose PMU event description (Thomas Richter)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c042f7e9
    • M
      mm/filemap.c: fix NULL pointer in page_cache_tree_insert() · abc1be13
      Matthew Wilcox committed
      f2fs specifies the __GFP_ZERO flag for allocating some of its pages.
      Unfortunately, the page cache also uses the mapping's GFP flags for
      allocating radix tree nodes.  It always masked off the __GFP_HIGHMEM
      flag, and masks off __GFP_ZERO in some paths, but not all.  That causes
      radix tree nodes to be allocated with a NULL list_head, which causes
      backtraces like:
      
        __list_del_entry+0x30/0xd0
        list_lru_del+0xac/0x1ac
        page_cache_tree_insert+0xd8/0x110
      
      The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
      if they are ever used.  Fix them all by using GFP_RECLAIM_MASK at the
      innermost location, and remove it from earlier in the callchain.
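
A rough sketch of why masking with an allow-list is more robust than stripping individual flags. The DEMO_* names and bit values are invented for illustration and are not the kernel's real gfp bits; only the masking idea is taken from the message above.

```c
#include <stdint.h>

/* Illustrative bit values only; the kernel's real gfp bits differ. */
#define DEMO_GFP_DMA      0x01u
#define DEMO_GFP_HIGHMEM  0x02u
#define DEMO_GFP_DMA32    0x04u
#define DEMO_GFP_ZERO     0x08u
#define DEMO_GFP_IO       0x10u
#define DEMO_GFP_FS       0x20u
#define DEMO_GFP_RECLAIM  0x40u

/* Allow-list: only reclaim-behaviour bits may reach the node allocator. */
#define DEMO_RECLAIM_MASK (DEMO_GFP_IO | DEMO_GFP_FS | DEMO_GFP_RECLAIM)

/* Deny-list approach: strips one known-bad flag, lets the rest through. */
static uint32_t node_gfp_denylist(uint32_t mapping_gfp)
{
    return mapping_gfp & ~DEMO_GFP_HIGHMEM;
}

/* Allow-list approach: anything outside the mask is dropped. */
static uint32_t node_gfp_allowlist(uint32_t mapping_gfp)
{
    return mapping_gfp & DEMO_RECLAIM_MASK;
}
```

With a mapping that requests DEMO_GFP_ZERO, the deny-list variant still passes the zeroing flag on to the node allocation, while the allow-list variant drops it along with the DMA-style flags.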
      
      Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Reported-by: Chris Fries <cfries@google.com>
      Debugged-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abc1be13
    • M
      mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create() · c892fd82
      Minchan Kim committed
      If there is heavy memory pressure, page allocation with __GFP_NOWAIT
      fails easily even though it is an order-0 request.  I got the warning
      below 9 times during a normal boot.
      
           <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT|__GFP_NOTRACK)
           .. snip ..
           Call trace:
             dump_backtrace+0x0/0x4
             dump_stack+0xa4/0xc0
             warn_alloc+0xd4/0x15c
             __alloc_pages_nodemask+0xf88/0x10fc
             alloc_slab_page+0x40/0x18c
             new_slab+0x2b8/0x2e0
             ___slab_alloc+0x25c/0x464
             __kmalloc+0x394/0x498
             memcg_kmem_get_cache+0x114/0x2b8
             kmem_cache_alloc+0x98/0x3e8
             mmap_region+0x3bc/0x8c0
             do_mmap+0x40c/0x43c
             vm_mmap_pgoff+0x15c/0x1e4
             sys_mmap+0xb0/0xc8
             el0_svc_naked+0x24/0x28
           Mem-Info:
           active_anon:17124 inactive_anon:193 isolated_anon:0
            active_file:7898 inactive_file:712955 isolated_file:55
            unevictable:0 dirty:27 writeback:18 unstable:0
            slab_reclaimable:12250 slab_unreclaimable:23334
            mapped:19310 shmem:212 pagetables:816 bounce:0
            free:36561 free_pcp:1205 free_cma:35615
           Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
           DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
           lowmem_reserve[]: 0 1842 1842
           Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
           lowmem_reserve[]: 0 0 0
           DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
           Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
           721350 total pagecache pages
           0 pages in swap cache
           Swap cache stats: add 0, delete 0, find 0/0
           Free swap  = 0kB
           Total swap = 0kB
           945512 pages RAM
           0 pages HighMem/MovableOnly
           63408 pages reserved
           51200 pages cma reserved
      
      __memcg_schedule_kmem_cache_create() tries to create a shadow slab cache
      and the worker allocation failure is not really critical because we will
      retry on the next kmem charge.  We might miss some charges but that
      shouldn't be critical.  The excessive allocation failure report is not
      very helpful.
      
      [mhocko@kernel.org: changelog update]
      Link: http://lkml.kernel.org/r/20180418022912.248417-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c892fd82
    • T
      fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error · d23a61ee
      Tetsuo Handa committed
      Commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") is
      printing spurious messages under memory pressure due to map_addr == -ENOMEM.
      
       9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
       14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
       16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already
      
      Complain only if -EEXIST, and use %px for printing the address.
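
The value in parentheses in the messages above, fffffffffffffff4, is -12 when read as a signed 64-bit number, which is -ENOMEM on Linux. A minimal sketch of the check described here, using hypothetical helper names (addr_as_errno, should_complain) that are not taken from the kernel source:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reinterpret a returned "address" as a signed value. Small negative
 * results are error codes. The unsigned-to-signed conversion relies on
 * two's complement, which holds on the platforms Linux supports.
 */
static int64_t addr_as_errno(uint64_t map_addr)
{
    return (int64_t)map_addr;
}

/* Only a clash with an existing mapping (-EEXIST) deserves the message. */
static int should_complain(uint64_t map_addr)
{
    return addr_as_errno(map_addr) == -EEXIST;
}
```

Under this check a failure caused by memory pressure stays silent, and only an address that really collides with an existing mapping triggers the complaint.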
      
      Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d23a61ee
    • D
      kexec_file: do not add extra alignment to efi memmap · a841aa83
      Dave Young committed
      Chun-Yi reported a kernel warning message below:
      
        WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c()
        early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120
      
      The problem is x86 kexec_file_load adds extra alignment to the efi
      memmap: in bzImage64_load():
      
              efi_map_sz = efi_get_runtime_map_size();
              efi_map_sz = ALIGN(efi_map_sz, 16);
      
      And __efi_memmap_init maps with the size including the alignment bytes
      but efi_memmap_unmap uses nr_maps * desc_size which does not include the
      extra bytes.

      The alignment in kexec code is only needed for the kexec buffer internal
      use.  Actually kexec should pass the exact size of the efi memmap to the
      2nd kernel.
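
The two sizes in the warning (00000118 versus 00000120) are what rounding up to 16 bytes produces. A small sketch of the arithmetic, where align_up and memmap_size are illustrative helpers, and the combination of 7 descriptors of 40 bytes is only one hypothetical way to reach 0x118:

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Size the unmap side computes: descriptor count times descriptor size. */
static size_t memmap_size(size_t nr_maps, size_t desc_size)
{
    return nr_maps * desc_size;
}
```

Here memmap_size(7, 40) gives 0x118, while align_up(0x118, 16) gives 0x120, so a map created with the aligned size and released with the exact size disagree by 8 bytes.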
      
      Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Reported-by: joeyli <jlee@suse.com>
      Tested-by: Randy Wright <rwright@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a841aa83
    • A
      proc: fix /proc/loadavg regression · 9a1015b3
      Alexey Dobriyan committed
      Commit 95846ecf ("pid: replace pid bitmap implementation with IDR
      API") changed last field of /proc/loadavg (last pid allocated) to be off
      by one:
      
      	# unshare -p -f --mount-proc cat /proc/loadavg
      	0.00 0.00 0.00 1/60 2	<===
      
      It should be 1 after first fork into pid namespace.
      
      This is formally a regression but given how useless this field is I
      don't think anyone is affected.
      
      Bug was found by /proc testsuite!
      
      Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a1015b3
    • A
      proc: revalidate kernel thread inodes to root:root · 2e0ad552
      Alexey Dobriyan committed
      task_dump_owner() has the following code:
      
      	mm = task->mm;
      	if (mm) {
      		if (get_dumpable(mm) != SUID_DUMP_USER) {
      			uid = ...
      		}
      	}
      
      Check for ->mm is buggy -- kernel thread might be borrowing mm
      and inode will go to some random uid:gid pair.
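
A simplified model of the problem, assuming the task carries a flag that marks it as a kernel thread. The demo_* types and DEMO_PF_KTHREAD are stand-ins invented for this sketch, not the kernel's definitions.

```c
#include <stddef.h>

#define DEMO_PF_KTHREAD 0x1u   /* stand-in for a "this is a kernel thread" flag */

struct demo_mm { int dumpable; };

struct demo_task {
    unsigned int flags;
    struct demo_mm *mm;        /* a kernel thread may temporarily borrow one */
};

/* Buggy test: treats any task with a non-NULL mm as a user process. */
static int looks_like_user_task_buggy(const struct demo_task *t)
{
    return t->mm != NULL;
}

/* Safer test: consult the task flag first, and only then look at mm. */
static int looks_like_user_task(const struct demo_task *t)
{
    if (t->flags & DEMO_PF_KTHREAD)
        return 0;
    return t->mm != NULL;
}
```

For a kernel thread that is borrowing an mm, the first helper answers yes and would go on to derive ownership from the borrowed mm, while the second answers no, so the caller can fall back to root:root.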
      
      Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e0ad552
    • I
      autofs: mount point create should honour passed in mode · 1e630665
      Ian Kent committed
      The autofs file system mkdir inode operation blindly sets the created
      directory mode to S_IFDIR | 0555, ignoring the passed in mode, which can
      cause selinux dac_override denials.
      
      But the function also checks if the caller is the daemon (as no-one else
      should be able to do anything here) so there's no point in not honouring
      the passed in mode, allowing the daemon to set appropriate mode when
      required.
      
      Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1e630665
    • U
      MAINTAINERS: add personal addresses for Sascha and Uwe · 1551cf74
      Uwe Kleine-König committed
      The idea behind using kernel@pengutronix.de (i.e. the mail alias for the
      kernel people at Pengutronix) as email address was to have a backup when
      a given developer is on vacation or run over by a bus. Make this more
      explicit by adding the alias as reviewer and use the personal address
      for Sascha and me.
      
      Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1551cf74
    • A
      kasan: add no_sanitize attribute for clang builds · 12c8f25a
      Andrey Konovalov committed
      KASAN uses the __no_sanitize_address macro to disable instrumentation of
      particular functions.  Right now it's defined only for GCC build, which
      causes false positives when clang is used.
      
      This patch adds a definition for clang.
      
      Note, that clang's revision 329612 or higher is required.
      
      [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
        Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
      Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Lawrence <paullawrence@google.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12c8f25a
    • I
      rapidio: fix rio_dma_transfer error handling · c5157b76
      Ioan Nicu committed
      Some of the mport_dma_req structure members were initialized late
      inside the do_dma_request() function, just before submitting the
      request to the dma engine. But we have some error branches before
      that. In case of such an error, the code would return on the error
      path and trigger the calling of dma_req_free() with a req structure
      which is not completely initialized. This causes a NULL pointer
      dereference in dma_req_free().
      
      This patch fixes these error branches by making sure that all
      necessary mport_dma_req structure members are initialized in
      rio_dma_transfer() immediately after the request structure gets
      allocated.
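      
      The pattern can be sketched in user space as follows.  All names here
      (demo_dev, demo_req, demo_req_free, demo_transfer) are hypothetical
      stand-ins, not the rapidio code; the point is only that everything the
      free path dereferences is set before the first error branch:

```c
#include <stdlib.h>

struct demo_dev {			/* hypothetical owner object */
	int releases;
};

struct demo_req {			/* stand-in for struct mport_dma_req */
	struct demo_dev *owner;		/* dereferenced by the free path */
	void *buf;
};

/* Free path: dereferences req->owner, so it must never see a NULL owner. */
static void demo_req_free(struct demo_req *req)
{
	req->owner->releases++;
	free(req->buf);			/* free(NULL) is a no-op */
	free(req);
}

/* Returns 0 on success and -1 on failure; every path frees the request. */
static int demo_transfer(struct demo_dev *dev, size_t len)
{
	struct demo_req *req = calloc(1, sizeof(*req));

	if (!req)
		return -1;

	/* Initialize what the free path needs right after allocation. */
	req->owner = dev;

	if (len == 0) {			/* early error branch */
		demo_req_free(req);
		return -1;
	}

	req->buf = malloc(len);
	if (!req->buf) {
		demo_req_free(req);
		return -1;
	}

	/* The real code would submit the request to the DMA engine here. */
	demo_req_free(req);
	return 0;
}
```

      If req->owner were assigned only just before submission, the len == 0
      branch would hand demo_req_free() a request with a NULL owner.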
      
      Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com
      Fixes: bbd876ad ("rapidio: use a reference count for struct mport_dma_req")
      Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
      Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
      Cc: Barry Wood <barry.wood@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Frank Kunz <frank.kunz@nokia.com>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5157b76
    • N
      mm: enable thp migration for shmem thp · e71769ae
      Naoya Horiguchi committed
      My testing for the latest kernel supporting thp migration showed an
      infinite loop in offlining the memory block that is filled with shmem
      thps.  We can get out of the loop with a signal, but the kernel should return
      with failure in this case.
      
      What happens in the loop is that scan_movable_pages() repeats returning
      the same pfn without any progress.  That's because page migration always
      fails for shmem thps.
      
      In memory offline code, memory blocks containing unmovable pages should be
      prevented from being offline targets by has_unmovable_pages() inside
      start_isolate_page_range().  So it's possible to change migratability for
      non-anonymous thps to avoid the issue, but it introduces more complex and
      thp-specific handling in migration code, so it might not be good.
      
      So this patch fixes the issue by enabling thp migration for shmem thp.
      Both anon and shmem thps are then migratable, so we don't need to
      precheck the type of thp.
      
      Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp
      Fixes: 72b39cfc ("mm, memory_hotplug: do not fail offlining too early")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zi Yan <zi.yan@sent.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e71769ae
    • G
      writeback: safer lock nesting · 2e898e4c
      Greg Thelen committed
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
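      
      The nesting hazard can be modeled in user space with a single boolean
      standing in for the CPU interrupt flag.  This is a sketch under the
      assumption that the safer nesting is for the inner section to save and
      restore the interrupt state rather than disable and enable it; the
      model_* and inner_* names are hypothetical:

```c
#include <stdbool.h>

static bool irqs_enabled = true;	/* models the CPU interrupt flag */

static void model_irq_save(bool *flags)
{
	*flags = irqs_enabled;		/* remember the previous state */
	irqs_enabled = false;
}

static void model_irq_restore(bool flags)
{
	irqs_enabled = flags;		/* put back whatever was saved */
}

static void model_irq_disable(void)
{
	irqs_enabled = false;
}

static void model_irq_enable(void)
{
	irqs_enabled = true;		/* unconditional, ignores nesting */
}

/* Inner section uses disable/enable, like spin_lock_irq/spin_unlock_irq. */
static bool inner_irq_enable_leaks(void)
{
	bool outer, leaked;

	irqs_enabled = true;
	model_irq_save(&outer);		/* outer lock, irqsave style */
	model_irq_disable();		/* inner lock */
	model_irq_enable();		/* inner unlock */
	leaked = irqs_enabled;		/* interrupts on, outer lock still held */
	model_irq_restore(outer);	/* outer unlock */
	return leaked;
}

/* Inner section saves and restores the state instead. */
static bool inner_irq_restore_leaks(void)
{
	bool outer, inner, leaked;

	irqs_enabled = true;
	model_irq_save(&outer);
	model_irq_save(&inner);
	model_irq_restore(inner);
	leaked = irqs_enabled;		/* still off while outer lock is held */
	model_irq_restore(outer);
	return leaked;
}
```

      In this model only the first variant reports interrupts enabled inside
      the outer critical section, which is the window the trace above shows.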
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should, to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Reported-by: Wang Long <wanglong19@meituan.com>
      Acked-by: Wang Long <wanglong19@meituan.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e898e4c
    • H
      mm, pagemap: fix swap offset value for PMD migration entry · 88c28f24
      Huang Ying committed
      The swap offset reported by /proc/<pid>/pagemap may not be correct for
      PMD migration entries.  If addr passed into pagemap_pmd_range() isn't
      aligned with PMD start address, the swap offset reported doesn't
      reflect this.  And in the loop to report information of each sub-page,
      the swap offset isn't increased accordingly as that for PFN.
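      
      The arithmetic involved can be sketched as below.  The shift values
      assume 4 KiB pages and 2 MiB PMD mappings (as on x86-64), and the demo_*
      names are hypothetical, not taken from the pagemap code:

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define DEMO_PMD_SHIFT	21	/* assumed 2 MiB PMD mappings */
#define DEMO_PMD_SIZE	(UINT64_C(1) << DEMO_PMD_SHIFT)
#define DEMO_PMD_MASK	(~(DEMO_PMD_SIZE - 1))

/* Index of the sub-page that addr falls on within its PMD range. */
static uint64_t demo_subpage_index(uint64_t addr)
{
	return (addr & ~DEMO_PMD_MASK) >> DEMO_PAGE_SHIFT;
}

/*
 * Swap offset to report for addr, given the offset stored in the PMD
 * migration entry (which describes the first sub-page of the range).
 */
static uint64_t demo_swap_offset(uint64_t pmd_entry_offset, uint64_t addr)
{
	return pmd_entry_offset + demo_subpage_index(addr);
}
```

      For example, an address three pages past a PMD boundary should report
      the entry's offset plus 3, and each following sub-page one more.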
      
      This may happen after opening /proc/<pid>/pagemap and seeking to a page
      whose address doesn't align with a PMD start address.  I have verified
      this with a simple test program.
      
      BTW: migration swap entries have PFN information, do we need to restrict
      whether to show them?
      
      [akpm@linux-foundation.org: fix typo, per Huang, Ying]
      Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jerome Glisse" <jglisse@redhat.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88c28f24
    • M
      mm: fix do_pages_move status handling · 8f175cf5
      Michal Hocko committed
      Li Wang has reported that LTP move_pages04 test fails with the current
      tree:
      
      LTP move_pages04:
         TFAIL  :  move_pages04.c:143: status[1] is EPERM, expected EFAULT
      
      The test allocates an array of two pages, one is present while the other
      is not (resp.  backed by zero page) and it expects EFAULT for the second
      page as the man page suggests.  We are reporting EPERM which doesn't make
      any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter
      THP migration").
      
      do_pages_move tries to handle as many pages in one batch as possible so we
      queue all pages with the same node target together and that corresponds to
      [start, i] range which is then used to update status array.
      add_page_for_migration will correctly notice the zero (resp.  !present)
      page and returns with EFAULT which gets written to the status.  But if
      this is the last page in the array we do not update start and so the last
      store_status after the loop will overwrite the range of the last batch
      with NUMA_NO_NODE (which corresponds to EPERM).
      
      Fix this by simply bailing out from the last flush if the pagelist is
      empty as there is clearly nothing more to do.
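      
      A simplified user-space model of the batching described above.  It is a
      sketch, not the do_pages_move() code: the demo_* names, the boolean
      "present" array and the bail_on_empty_list switch are assumptions made
      for illustration.  DEMO_NO_NODE is -1, which equals -EPERM on Linux:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NO_NODE	(-1)	/* same numeric value as -EPERM on Linux */

/* Write value into status[start .. start + count - 1]. */
static void demo_store_status(int *status, size_t start, int value, size_t count)
{
	for (size_t k = 0; k < count; k++)
		status[start + k] = value;
}

static void demo_move_pages(const bool *present, const int *nodes, size_t n,
			    int *status, bool bail_on_empty_list)
{
	size_t start = 0, queued = 0, i;
	int current_node = DEMO_NO_NODE;

	for (i = 0; i < n; i++) {
		if (current_node == DEMO_NO_NODE) {
			current_node = nodes[i];	/* open a new batch */
			start = i;
		} else if (nodes[i] != current_node) {
			/* Target node changed: flush the batch [start, i). */
			demo_store_status(status, start, current_node, i - start);
			start = i;
			current_node = nodes[i];
			queued = 0;
		}

		if (present[i]) {
			queued++;			/* page joins the batch */
			continue;
		}

		/* Page cannot be queued: record the error, flush what we have. */
		status[i] = -EFAULT;
		demo_store_status(status, start, current_node, i - start);
		queued = 0;
		current_node = DEMO_NO_NODE;		/* note: start is not updated */
	}

	/* The fix: nothing is queued, so there is nothing left to flush. */
	if (bail_on_empty_list && queued == 0)
		return;

	demo_store_status(status, start, current_node, i - start);
}
```

      With one present page followed by one missing page on the same node, the
      model without the early return ends with both status slots set to
      -EPERM, while the version with it leaves {0, -EFAULT}.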
      
      Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org
      Fixes: cf5f16b23ec9 ("mm: unclutter THP migration")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Li Wang <liwang@redhat.com>
      Tested-by: Li Wang <liwang@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f175cf5
    • K
      fork: unconditionally clear stack on fork · e01e8063
      Kees Cook committed
      One of the classes of kernel stack content leaks[1] is exposing
      prior heap or stack contents when a new process stack is
      allocated.  Normally, those stacks are not zeroed, and the old contents
      remain in place.  In the face of stack content exposure flaws, those
      contents can leak to userspace.
      
      Fixing this will make the kernel no longer vulnerable to these flaws, as
      the stack will be wiped each time a stack is assigned to a new process.
      There's not a meaningful change in runtime performance; it almost looks
      like it provides a benefit.
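      
      The idea can be illustrated with a user-space sketch of a one-slot stack
      cache that wipes a reused buffer before handing it out.  The demo_*
      names and the 4096-byte size are assumptions for illustration, not the
      fork code:

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_STACK_SIZE	4096	/* assumed size, for illustration */

static unsigned char *demo_cached_stack;	/* one released stack kept for reuse */

/* Hand out a stack that never carries data from a previous owner. */
static unsigned char *demo_alloc_stack(void)
{
	unsigned char *s = demo_cached_stack;

	if (s) {
		demo_cached_stack = NULL;
		memset(s, 0, DEMO_STACK_SIZE);	/* wipe the old contents */
		return s;
	}
	return calloc(1, DEMO_STACK_SIZE);	/* fresh memory, already zeroed */
}

/* Keep one released stack around; free any further ones. */
static void demo_free_stack(unsigned char *s)
{
	if (!demo_cached_stack)
		demo_cached_stack = s;
	else
		free(s);
}
```

      The memset() on the reuse path is the only extra work, which fits the
      observation that the runtime cost is hard to measure.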
      
      Performing back-to-back kernel builds before:
      	Run times: 157.86 157.09 158.90 160.94 160.80
      	Mean: 159.12
      	Std Dev: 1.54
      
      and after:
      	Run times: 159.31 157.34 156.71 158.15 160.81
      	Mean: 158.46
      	Std Dev: 1.46
      
      Instead of making this a build or runtime config, Andy Lutomirski
      recommended this just be enabled by default.
      
      [1] A noisy search for many kinds of stack content leaks can be seen here:
      https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
      
      I did some more with perf and cycle counts on running 100,000 execs of
      /bin/true.
      
      before:
      Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
      Mean:  221015379122.60
      Std Dev: 4662486552.47
      
      after:
      Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
      Mean:  217745009865.40
      Std Dev: 5935559279.99
      
      It continues to look like it's faster, though the deviation is rather
      wide, but I'm not sure what I could do that would be less noisy.  I'm
      open to ideas!
      
      Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e01e8063
    • L
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 83beed7b
      Linus Torvalds committed
      Pull thermal fixes from Eduardo Valentin:
       "A couple of fixes for the thermal subsystem"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
        dt-bindings: thermal: Remove "cooling-{min|max}-level" properties
        dt-bindings: thermal: remove no longer needed samsung thermal properties
      83beed7b
    • L
      Merge tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 7e3cb169
      Linus Torvalds committed
      Pull MMC fixes from Ulf Hansson:
       "A couple of MMC host fixes:
      
         - sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode
      
         - renesas_sdhi_internal_dmac: Avoid data corruption by limiting
           DMA RX"
      
      * tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs
        mmc: sdhci-pci: Only do AMD tuning for HS200
      7e3cb169
    • L
      Merge tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 7768ee3f
      Linus Torvalds committed
      Pull MD fixes from Shaohua Li:
       "Three small fixes for MD:
      
         - md-cluster fix for faulty device from Guoqing
      
         - writehint fix for writebehind IO for raid1 from Mariusz
      
         - a live lock fix for interrupted recovery from Yufen"
      
      * tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        raid1: copy write hint from master bio to behind bio
        md/raid1: exit sync request if MD_RECOVERY_INTR is set
        md-cluster: don't update recovery_offset for faulty device
      7768ee3f
    • Q
      btrfs: print-tree: debugging output enhancement · c0872323
      Qu Wenruo committed
      This patch enhances the following things:
      
      - tree block header
        * add generation and owner output for node and leaf
      - node pointer generation output
      - allow btrfs_print_tree() to not follow nodes
        * just like btrfs-progs
      
      Please note that although btrfs_print_tree() is not called by anyone
      right now, it's still a pretty useful function for debugging the kernel,
      so it is kept for later use.
      
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c0872323
    • N
      btrfs: Fix race condition between delayed refs and blockgroup removal · 5e388e95
      Nikolay Borisov committed
      When the delayed refs for a head are all run, eventually
      cleanup_ref_head is called which (in case of deletion) obtains a
      reference for the relevant btrfs_space_info struct by querying the bg
      for the range. This is problematic because when the last extent of a
      bg is deleted a race window emerges between removal of that bg and the
      subsequent invocation of cleanup_ref_head. This can result in cache being null
      and either a null pointer dereference or assertion failure.
      
      	task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
      	RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
      	RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
      	RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
      	RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
      	RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
      	R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
      	R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
      	FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      	Call Trace:
      	 __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
      	 btrfs_run_delayed_refs+0x68/0x250 [btrfs]
      	 btrfs_should_end_transaction+0x42/0x60 [btrfs]
      	 btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
      	 btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
      	 evict+0xc6/0x190
      	 do_unlinkat+0x19c/0x300
      	 do_syscall_64+0x74/0x140
      	 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      	RIP: 0033:0x7fbf589c57a7
      
      To fix this, introduce a new flag "is_system" to head_ref structs,
      which is populated at insertion time. This allows decoupling the
      querying for the spaceinfo from querying the possibly deleted bg.
      
      Fixes: d7eae340 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
      CC: stable@vger.kernel.org # 4.14+
      Suggested-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5e388e95
    • D
      vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion · a9e5b732
      David Howells committed
      In do_mount() when the MS_* flags are being converted to MNT_* flags,
      MS_RDONLY got accidentally converted to SB_RDONLY.
      
      Undo this change.
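      
      The kind of translation in question can be sketched as follows.  The
      DEMO_* constants and demo_ms_to_mnt() are hypothetical stand-ins, and
      their numeric values are assumptions chosen for illustration.  The test
      on the incoming flags uses the mount(2) namespace (MS_*), not the
      superblock one (SB_*):

```c
/* Illustrative flag values; treat the numbers as assumptions. */
#define DEMO_MS_RDONLY		0x01u	/* user-visible mount(2) flag */
#define DEMO_MS_NOSUID		0x02u
#define DEMO_MNT_NOSUID		0x01u	/* per-mount internal flag */
#define DEMO_MNT_READONLY	0x40u

/* Translate user-supplied MS_* flags into per-mount MNT_* flags. */
static unsigned int demo_ms_to_mnt(unsigned long ms_flags)
{
	unsigned int mnt_flags = 0;

	if (ms_flags & DEMO_MS_NOSUID)
		mnt_flags |= DEMO_MNT_NOSUID;
	if (ms_flags & DEMO_MS_RDONLY)	/* MS_*, since these came from mount(2) */
		mnt_flags |= DEMO_MNT_READONLY;

	return mnt_flags;
}
```

      Keeping the three namespaces apart in the source makes it clear which
      interface each bit belongs to, even where values happen to coincide.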
      
      Fixes: e462ec50 ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9e5b732
    • D
      afs: Fix server record deletion · 66062592
      David Howells committed
      AFS server records get removed from the net->fs_servers tree when
      they're deleted, but not from the net->fs_addresses{4,6} lists, which
      can lead to an oops in afs_find_server() when a server record has been
      removed, for instance during rmmod.
      
      Fix this by deleting the record from the by-address lists before posting
      it for RCU destruction.
      
      The reason this hasn't been noticed before is that the fileserver keeps
      probing the local cache manager, thereby keeping the service record
      alive, so the oops would only happen when a fileserver eventually gets
      bored and stops pinging or if the module gets rmmod'd and a call comes
      in from the fileserver during the window between the server records
      being destroyed and the socket being closed.
      
      The oops looks something like:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
        ...
        Workqueue: kafsd afs_process_async_call [kafs]
        RIP: 0010:afs_find_server+0x271/0x36f [kafs]
        ...
        Call Trace:
         afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
         afs_deliver_to_call+0x1ee/0x5e8 [kafs]
         afs_process_async_call+0x5b/0xd0 [kafs]
         process_one_work+0x2c2/0x504
         worker_thread+0x1d4/0x2ac
         kthread+0x11f/0x127
         ret_from_fork+0x24/0x30
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66062592
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a72db42c
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Unbalanced refcounting in TIPC, from Jon Maloy.
      
       2) Only allow TCP_MD5SIG to be set on sockets in close or listen state.
          Once the connection is established it makes no sense to change this.
          From Eric Dumazet.
      
       3) Missing attribute validation in neigh_dump_table(), also from Eric
          Dumazet.
      
       4) Fix address comparisons in SCTP, from Xin Long.
      
       5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller.
      
       6) Fix tunnel refcounting in l2tp, from Guillaume Nault.
      
       7) Fix double list insert in team driver, from Paolo Abeni.
      
       8) af_vsock.ko module was accidentally made unremovable, from Stefan
          Hajnoczi.
      
       9) Fix reference to freed llc_sap object in llc stack, from Cong Wang.
      
      10) Don't assume netdevice struct is DMA'able memory in virtio_net
          driver, from Michael S. Tsirkin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
        net/smc: fix shutdown in state SMC_LISTEN
        bnxt_en: Fix memory fault in bnxt_ethtool_init()
        virtio_net: sparse annotation fix
        virtio_net: fix adding vids on big-endian
        virtio_net: split out ctrl buffer
        net: hns: Avoid action name truncation
        docs: ip-sysctl.txt: fix name of some ipv6 variables
        vmxnet3: fix incorrect dereference when rxvlan is disabled
        llc: hold llc_sap before release_sock()
        MAINTAINERS: Direct networking documentation changes to netdev
        atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"
        net: qmi_wwan: add Wistron Neweb D19Q1
        net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
        net: stmmac: Disable ACS Feature for GMAC >= 4
        net: mvpp2: Fix DMA address mask size
        net: change the comment of dev_mc_init
        net: qualcomm: rmnet: Fix warning seen with fill_info
        tun: fix vlan packet truncation
        tipc: fix infinite loop when dumping link monitor summary
        tipc: fix use-after-free in tipc_nametbl_stop
        ...
      a72db42c
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · b9abdcfd
      Linus Torvalds committed
      Pull vfs fixes from Al Viro:
       "Assorted fixes.
      
        Some of that is only a matter with fault injection (broken handling of
        small allocation failure in various mount-related places), but the
        last one is a root-triggerable stack overflow, and combined with
        userns it gets really nasty ;-/"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Don't leak MNT_INTERNAL away from internal mounts
        mm,vmscan: Allow preallocating memory for register_shrinker().
        rpc_pipefs: fix double-dput()
        orangefs_kill_sb(): deal with allocation failures
        jffs2_kill_sb(): deal with failed allocations
        hypfs_kill_super(): deal with failed allocations
      b9abdcfd
    • L
      Merge tag 'ecryptfs-4.17-rc2-fixes' of... · 43f70c96
      Linus Torvalds committed
      Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull eCryptfs fixes from Tyler Hicks:
       "Minor cleanups and a bug fix to completely ignore unencrypted
        filenames in the lower filesystem when filename encryption is enabled
        at the eCryptfs layer"
      
      * tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        eCryptfs: don't pass up plaintext names when using filename encryption
        ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
        ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
      43f70c96
    • L
      Merge tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 0d9cf33b
      Linus Torvalds committed
       - isofs memory leak fix
      
       - two fsnotify fixes of event mask handling
      
       - udf fix of UTF-16 handling
      
       - couple other smaller cleanups
      
      * tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Fix leak of UTF-16 surrogates into encoded strings
        fs: ext2: Adding new return type vm_fault_t
        isofs: fix potential memory leak in mount option parsing
        MAINTAINERS: add an entry for FSNOTIFY infrastructure
        fsnotify: fix typo in a comment about mark->g_list
        fsnotify: fix ignore mask logic in send_to_group()
        isofs compress: Remove VLA usage
        fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
        fanotify: fix logic of events on child
      0d9cf33b
  4. 20 Apr, 2018 1 commit
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 4d189053
      Linus Torvalds committed
      Pull HID updates from Jiri Kosina:
      
       - suspend/resume handling fix for Raydium I2C-connected touchscreen
         from Aaron Ma
      
       - protocol fixup for certain BT-connected Wacoms from Aaron Armstrong
         Skomra
      
       - battery level reporting fix on BT-connected mice from Dmitry Torokhov
      
       - hidraw race condition fix from Rodrigo Rivas Costa
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: i2c-hid: fix inverted return value from i2c_hid_command()
        HID: i2c-hid: Fix resume issue on Raydium touchscreen device
        HID: wacom: bluetooth: send exit report for recent Bluetooth devices
        HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
        HID: input: fix battery level reporting on BT mice
      4d189053