1. 25 9月, 2020 1 次提交
  2. 24 8月, 2020 8 次提交
    • L
      Linux 5.9-rc2 · d012a719
      Linus Torvalds 提交于
      d012a719
    • L
      Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · cb957121
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
      
       - Add perf support for emitting extended registers for power10.
      
       - A fix for CPU hotplug on pseries, where on large/loaded systems we
         may not wait long enough for the CPU to be offlined, leading to
         crashes.
      
       - Addition of a raw cputable entry for Power10, which is not required
         to boot, but is required to make our PMU setup work correctly in
         guests.
      
       - Three fixes for the recent changes on 32-bit Book3S to move modules
         into their own segment for strict RWX.
      
       - A fix for a recent change in our powernv PCI code that could lead to
         crashes.
      
       - A change to our perf interrupt accounting to avoid soft lockups when
         using some events, found by syzkaller.
      
       - A change in the way we handle power loss events from the hypervisor
         on pseries. We no longer immediately shut down if we're told we're
         running on a UPS.
      
       - A few other minor fixes.
      
      Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
      Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
      Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
      Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
      Vaidyanathan Srinivasan, Vasant Hegde.
      
      * tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
        powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
        powerpc/pseries: Do not initiate shutdown when system is running on UPS
        powerpc/perf: Fix soft lockups due to missed interrupt accounting
        powerpc/powernv/pci: Fix possible crash when releasing DMA resources
        powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
        powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
        powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
        powerpc/fixmap: Fix the size of the early debug area
        powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
        powerpc/kernel: Cleanup machine check function declarations
        powerpc: Add POWER10 raw mode cputable entry
        powerpc/perf: Add extended regs support for power10 platform
        powerpc/perf: Add support for outputting extended regs in perf intr_regs
        powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
      cb957121
    • L
      Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 550c2129
      Linus Torvalds 提交于
      Pull x86 fix from Thomas Gleixner:
       "A single fix for x86 which removes the RDPID usage from the paranoid
        entry path and unconditionally uses LSL to retrieve the CPU number.
      
        RDPID depends on MSR_TSX_AUX. KVM has an optmization to avoid
        expensive MRS read/writes on VMENTER/EXIT. It caches the MSR values
        and restores them either when leaving the run loop, on preemption or
        when going out to user space. MSR_TSX_AUX is part of that lazy MSR
        set, so after writing the guest value and before the lazy restore any
        exception using the paranoid entry will read the guest value and use
        it as CPU number to retrieve the GSBASE value for the current CPU when
        FSGSBASE is enabled. As RDPID is only used in that particular entry
        path, there is no reason to burden VMENTER/EXIT with two extra MSR
        writes. Remove the RDPID optimization, which is not even backed by
        numbers from the paranoid entry path instead"
      
      * tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
      550c2129
    • L
      Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cea05c19
      Linus Torvalds 提交于
      Pull x86 perf fix from Thomas Gleixner:
       "A single update for perf on x86 which has support for the broken down
        bandwith counters"
      
      * tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
      cea05c19
    • L
      Merge tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 10c091b6
      Linus Torvalds 提交于
      Pull EFI fixes from Thomas Gleixner:
      
       - Enforce NX on RO data in mixed EFI mode
      
       - Destroy workqueue in an error handling path to prevent UAF
      
       - Stop argument parser at '--' which is the delimiter for init
      
       - Treat a NULL command line pointer as empty instead of dereferncing it
         unconditionally.
      
       - Handle an unterminated command line correctly
      
       - Cleanup the 32bit code leftovers and remove obsolete documentation
      
      * tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Documentation: efi: remove description of efi=old_map
        efi/x86: Move 32-bit code into efi_32.c
        efi/libstub: Handle unterminated cmdline
        efi/libstub: Handle NULL cmdline
        efi/libstub: Stop parsing arguments at "--"
        efi: add missed destroy_workqueue when efisubsys_init fails
        efi/x86: Mark kernel rodata non-executable for mixed mode
      10c091b6
    • L
      Merge tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e99b2507
      Linus Torvalds 提交于
      Pull entry fix from Thomas Gleixner:
       "A single bug fix for the common entry code.
      
        The transcription of the x86 version messed up the reload of the
        syscall number from pt_regs after ptrace and seccomp which breaks
        syscall number rewriting"
      
      * tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        core/entry: Respect syscall number rewrites
      e99b2507
    • L
      Merge tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · d9232cb7
      Linus Torvalds 提交于
      Pull EDAC fix from Borislav Petkov:
       "A single fix correcting a reversed error severity determination check
        which lead to a recoverable error getting marked as fatal, by Tony
        Luck"
      
      * tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
      d9232cb7
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9d045ed1
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Nothing earth shattering here, lots of small fixes (f.e. missing RCU
        protection, bad ref counting, missing memset(), etc.) all over the
        place:
      
         1) Use get_file_rcu() in task_file iterator, from Yonghong Song.
      
         2) There are two ways to set remote source MAC addresses in macvlan
            driver, but only one of which validates things properly. Fix this.
            From Alvin Šipraga.
      
         3) Missing of_node_put() in gianfar probing, from Sumera
            Priyadarsini.
      
         4) Preserve device wanted feature bits across multiple netlink
            ethtool requests, from Maxim Mikityanskiy.
      
         5) Fix rcu_sched stall in task and task_file bpf iterators, from
            Yonghong Song.
      
         6) Avoid reset after device destroy in ena driver, from Shay
            Agroskin.
      
         7) Missing memset() in netlink policy export reallocation path, from
            Johannes Berg.
      
         8) Fix info leak in __smc_diag_dump(), from Peilin Ye.
      
         9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
            Tomlinson.
      
        10) Fix number of data stream negotiation in SCTP, from David Laight.
      
        11) Fix double free in connection tracker action module, from Alaa
            Hleihel.
      
        12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
        net: nexthop: don't allow empty NHA_GROUP
        bpf: Fix two typos in uapi/linux/bpf.h
        net: dsa: b53: check for timeout
        tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
        net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
        net: sctp: Fix negotiation of the number of data streams.
        dt-bindings: net: renesas, ether: Improve schema validation
        gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
        hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
        hv_netvsc: Remove "unlikely" from netvsc_select_queue
        bpf: selftests: global_funcs: Check err_str before strstr
        bpf: xdp: Fix XDP mode when no mode flags specified
        selftests/bpf: Remove test_align leftovers
        tools/resolve_btfids: Fix sections with wrong alignment
        net/smc: Prevent kernel-infoleak in __smc_diag_dump()
        sfc: fix build warnings on 32-bit
        net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
        libbpf: Fix map index used in error message
        net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
        net: atlantic: Use readx_poll_timeout() for large timeout
        ...
      9d045ed1
  3. 23 8月, 2020 10 次提交
    • L
      Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f320ac6e
      Linus Torvalds 提交于
      Pull epoll fixes from Al Viro:
       "Fix reference counting and clean up exit paths"
      
      * 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        do_epoll_ctl(): clean the failure exits up a bit
        epoll: Keep a reference on files added to the check list
      f320ac6e
    • A
      do_epoll_ctl(): clean the failure exits up a bit · 52c47969
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      52c47969
    • M
      epoll: Keep a reference on files added to the check list · a9ed4a65
      Marc Zyngier 提交于
      When adding a new fd to an epoll, and that this new fd is an
      epoll fd itself, we recursively scan the fds attached to it
      to detect cycles, and add non-epool files to a "check list"
      that gets subsequently parsed.
      
      However, this check list isn't completely safe when deletions
      can happen concurrently. To sidestep the issue, make sure that
      a struct file placed on the check list sees its f_count increased,
      ensuring that a concurrent deletion won't result in the file
      disapearing from under our feet.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a9ed4a65
    • N
      net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov 提交于
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up derefencing null or random pointers all over the place due to not
      having any nh_grp_entry members allocated, nexthop code relies on having at
      least the first member present. Empty NHA_GROUP doesn't make any sense so
      just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeaac363
    • L
      Merge tag 'kbuild-fixes-v5.9' of... · c3d8f220
      Linus Torvalds 提交于
      Merge tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - move -Wsign-compare warning from W=2 to W=3
      
       - fix the keyword _restrict to __restrict in genksyms
      
       - fix more bugs in qconf
      
      * tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: qconf: replace deprecated QString::sprintf() with QTextStream
        kconfig: qconf: remove redundant help in the info view
        kconfig: qconf: remove qInfo() to get back Qt4 support
        kconfig: qconf: remove unused colNr
        kconfig: qconf: fix the popup menu in the ConfigInfoView window
        kconfig: qconf: fix signal connection to invalid slots
        genksyms: keywords: Use __restrict not _restrict
        kbuild: remove redundant patterns in filter/filter-out
        extract-cert: add static to local data
        Makefile.extrawarn: Move sign-compare from W=2 to W=3
      c3d8f220
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · dd105d64
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
      
       - Allow booting of late secondary CPUs affected by erratum 1418040
         (currently they are parked if none of the early CPUs are affected by
         this erratum).
      
       - Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
         vdso_install' installs the 32-bit compat vdso when it is compiled.
      
       - Print a warning that untrusted guests without a CPU erratum
         workaround (Cortex-A57 832075) may deadlock the affected system.
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        ARM64: vdso32: Install vdso32 from vdso_install
        KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
        arm64: Allow booting of late CPUs affected by erratum 1418040
        arm64: Move handling of erratum 1418040 into C code
      dd105d64
    • L
      Merge tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d57ce840
      Linus Torvalds 提交于
      Pull s390 fixes from Vasily Gorbik:
      
       - a couple of fixes for storage key handling relevant for debugging
      
       - add cond_resched into potentially slow subchannels scanning loop
      
       - fixes for PF/VF linking and to ignore stale PCI configuration request
         events
      
      * tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: fix PF/VF linking on hot plug
        s390/pci: re-introduce zpci_remove_device()
        s390/pci: fix zpci_bus_link_virtfn()
        s390/ptrace: fix storage key handling
        s390/runtime_instrumentation: fix storage key handling
        s390/pci: ignore stale configuration request event
        s390/cio: add cond_resched() in the slow_eval_known_fn() loop
      d57ce840
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b2d9e996
      Linus Torvalds 提交于
      Pull kvm fixes from Paolo Bonzini:
      
       - PAE and PKU bugfixes for x86
      
       - selftests fix for new binutils
      
       - MMU notifier fix for arm64
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
        KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
        kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
        kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
        KVM: x86: fix access code passed to gva_to_gpa
        selftests: kvm: Use a shorter encoding to clear RAX
      b2d9e996
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9e574b74
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "23 fixes in 5 drivers (qla2xxx, ufs, scsi_debug, fcoe, zfcp). The bulk
        of the changes are in qla2xxx and ufs and all are mostly small and
        definitely don't impact the core"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
        Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"
        Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
        scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
        scsi: qla2xxx: Check if FW supports MQ before enabling
        scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
        scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime
        scsi: qla2xxx: Reduce noisy debug message
        scsi: qla2xxx: Fix login timeout
        scsi: qla2xxx: Indicate correct supported speeds for Mezz card
        scsi: qla2xxx: Flush I/O on zone disable
        scsi: qla2xxx: Flush all sessions on zone disable
        scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values
        scsi: scsi_debug: Fix scp is NULL errors
        scsi: zfcp: Fix use-after-free in request timeout handlers
        scsi: ufs: No need to send Abort Task if the task in DB was cleared
        scsi: ufs: Clean up completed request without interrupt notification
        scsi: ufs: Improve interrupt handling for shared interrupts
        scsi: ufs: Fix interrupt error message for shared interrupts
        scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL
        scsi: ufs-mediatek: Fix incorrect time to wait link status
        ...
      9e574b74
    • L
      Merge tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · d6af6330
      Linus Torvalds 提交于
      Pull devicetree fixes from Rob Herring:
       "Another set of DT fixes:
      
         - restore range parsing error check
      
         - workaround PCI range parsing with missing 'device_type' now
           required
      
         - correct description of 'phy-connection-type'
      
         - fix erroneous matching on 'snps,dw-pcie' by 'intel,lgm-pcie' schema
      
         - a couple of grammar and whitespace fixes
      
         - update Shawn Guo's email"
      
      * tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: vendor-prefixes: Remove trailing whitespace
        dt-bindings: net: correct description of phy-connection-type
        dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances
        of: address: Work around missing device_type property in pcie nodes
        dt: writing-schema: Miscellaneous grammar fixes
        dt-bindings: Use Shawn Guo's preferred e-mail for i.MX bindings
        of/address: check for invalid range.cpu_addr
      d6af6330
  4. 22 8月, 2020 21 次提交
    • G
      dt-bindings: vendor-prefixes: Remove trailing whitespace · 5cd841d2
      Geert Uytterhoeven 提交于
      Fixes: f516fb70 ("dt-bindings: Whitespace clean-ups in schema files")
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Link: https://lore.kernel.org/r/20200819092058.1526-1-geert+renesas@glider.beSigned-off-by: NRob Herring <robh@kernel.org>
      5cd841d2
    • W
      KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set · b5331379
      Will Deacon 提交于
      When an MMU notifier call results in unmapping a range that spans multiple
      PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
      since this avoids running into RCU stalls during VM teardown. Unfortunately,
      if the VM is destroyed as a result of OOM, then blocking is not permitted
      and the call to the scheduler triggers the following BUG():
      
       | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
       | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
       | INFO: lockdep is turned off.
       | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
       | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
       | Call trace:
       |  dump_backtrace+0x0/0x284
       |  show_stack+0x1c/0x28
       |  dump_stack+0xf0/0x1a4
       |  ___might_sleep+0x2bc/0x2cc
       |  unmap_stage2_range+0x160/0x1ac
       |  kvm_unmap_hva_range+0x1a0/0x1c8
       |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
       |  __mmu_notifier_invalidate_range_start+0x218/0x31c
       |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
       |  __oom_reap_task_mm+0x128/0x268
       |  oom_reap_task+0xac/0x298
       |  oom_reaper+0x178/0x17c
       |  kthread+0x1e4/0x1fc
       |  ret_from_fork+0x10/0x30
      
      Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
      only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
      flags.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-3-will@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b5331379
    • W
      KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Will Deacon 提交于
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fdfe7cbd
    • M
      dt-bindings: net: correct description of phy-connection-type · 5f53584c
      Madalin Bucur 提交于
      The phy-connection-type parameter is described in ePAPR 1.1:
      
      Specifies interface type between the Ethernet device and a physical
      layer (PHY) device. The value of this property is specific to the
      implementation.
      Signed-off-by: NMadalin Bucur <madalin.bucur@oss.nxp.com>
      Link: https://lore.kernel.org/r/1597917724-11127-1-git-send-email-madalin.bucur@oss.nxp.comSigned-off-by: NRob Herring <robh@kernel.org>
      5f53584c
    • L
      Merge tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block · f873db9a
      Linus Torvalds 提交于
      Pull io_uring fixes from Jens Axboe:
      
       - Make sure the head link cancelation includes async work
      
       - Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as
         a separate function since you moved it into io_uring itself
      
       - io_import_iovec cleanups (Pavel, me)
      
       - Use system_unbound_wq for ring exit work, to avoid spawning tons of
         these if we have tons of rings exiting at the same time
      
       - Fix req->flags overflow flag manipulation (Pavel)
      
      * tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block:
        io_uring: kill extra iovec=NULL in import_iovec()
        io_uring: comment on kfree(iovec) checks
        io_uring: fix racy req->flags modification
        io_uring: use system_unbound_wq for ring exit work
        io_uring: cleanup io_import_iovec() of pre-mapped request
        io_uring: get rid of kiocb_wait_page_queue_init()
        io_uring: find and cancel head link async work on files exit
      f873db9a
    • R
      dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances · a326462c
      Rob Herring 提交于
      The intel,lgm-pcie binding is matching on all snps,dw-pcie instances
      which is wrong. Add a custom 'select' entry to fix this.
      
      Fixes: e54ea45a ("dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller")
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Reviewed-by: NDilip Kota <eswara.kota@linux.intel.com>
      Signed-off-by: NRob Herring <robh@kernel.org>
      a326462c
    • L
      Merge branch 'akpm' (patches from Andrew) · 349111f0
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "11 patches.
      
        Subsystems affected by this: misc, mm/hugetlb, mm/vmalloc, mm/misc,
        romfs, relay, uprobes, squashfs, mm/cma, mm/pagealloc"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, page_alloc: fix core hung in free_pcppages_bulk()
        mm: include CMA pages in lowmem_reserve at boot
        squashfs: avoid bio_alloc() failure with 1Mbyte blocks
        uprobes: __replace_page() avoid BUG in munlock_vma_page()
        kernel/relay.c: fix memleak on destroy relay channel
        romfs: fix uninitialized memory leak in romfs_dev_read()
        mm/rodata_test.c: fix missing function declaration
        mm/vunmap: add cond_resched() in vunmap_pmd_range
        khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
        hugetlb_cgroup: convert comma to semicolon
        mailmap: add Andi Kleen
      349111f0
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 4af7b32f
      David S. Miller 提交于
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-08-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 11 non-merge commits during the last 5 day(s) which contain
      a total of 12 files changed, 78 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) three fixes in BPF task iterator logic, from Yonghong.
      
      2) fix for compressed dwarf sections in vmlinux, from Jiri.
      
      3) fix xdp attach regression, from Andrii.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4af7b32f
    • L
      Merge tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · f22c5579
      Linus Torvalds 提交于
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - The CLINT driver has been split in two: one to handle the M-mode
         CLINT (memory mapped and used on NOMMU systems) and one to handle the
         S-mode CLINT (via SBI).
      
       - The addition of SiFive's drivers to rv32_defconfig
      
      * tag 'riscv-for-linus-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Add SiFive drivers to rv32_defconfig
        dt-bindings: timer: Add CLINT bindings
        RISC-V: Remove CLINT related code from timer and arch
        clocksource/drivers: Add CLINT timer driver
        RISC-V: Add mechanism to provide custom IPI operations
      f22c5579
    • L
      Merge tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · c0a4f5b3
      Linus Torvalds 提交于
      Pull xen fixes from Juergen Gross:
       "One build fix and a minor fix for suppressing a useless warning when
        booting a Xen dom0 via UEFI"
      
      * tag 'for-linus-5.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        Fix build error when CONFIG_ACPI is not set/enabled:
        efi: avoid error message when booting under Xen
      c0a4f5b3
    • L
      Merge tag 'pm-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 985c788b
      Linus Torvalds 提交于
      Pull power management fixes from Rafael Wysocki:
       "These fix a few issues in the operating performance points (OPP)
        framework.
      
        Specifics:
      
         - Fix re-enabling of resources in dev_pm_opp_set_rate() (Rajendra
           Nayak)
      
         - Fix OPP table reference counting in error paths (Stephen Boyd)"
      
      * tag 'pm-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        opp: Enable resources again if they were disabled earlier
        opp: Put opp table in dev_pm_opp_set_rate() if _set_opp_bw() fails
        opp: Put opp table in dev_pm_opp_set_rate() for empty tables
      985c788b
    • T
      bpf: Fix two typos in uapi/linux/bpf.h · b16fc097
      Tobias Klauser 提交于
      Also remove trailing whitespaces in bpf_skb_get_tunnel_key example code.
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200821133642.18870-1-tklauser@distanz.ch
      b16fc097
    • T
      net: dsa: b53: check for timeout · 774d977a
      Tom Rix 提交于
      clang static analysis reports this problem
      
      b53_common.c:1583:13: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
              ent.port &= ~BIT(port);
              ~~~~~~~~ ^
      
      ent is set by a successful call to b53_arl_read().  Unsuccessful
      calls are caught by an switch statement handling specific returns.
      b32_arl_read() calls b53_arl_op_wait() which fails with the
      unhandled -ETIMEDOUT.
      
      So add -ETIMEDOUT to the switch statement.  Because
      b53_arl_op_wait() already prints out a message, do not add another
      one.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: NTom Rix <trix@redhat.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      774d977a
    • S
      ARM64: vdso32: Install vdso32 from vdso_install · 8d75785a
      Stephen Boyd 提交于
      Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
      vdso_install' installs the 32-bit compat vdso when it is compiled.
      
      Fixes: a7f71a2c ("arm64: compat: Add vDSO")
      Signed-off-by: NStephen Boyd <swboyd@chromium.org>
      Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
      Acked-by: NWill Deacon <will@kernel.org>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      8d75785a
    • L
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · d723b99e
      Linus Torvalds 提交于
      Pull ext4 updates from Ted Ts'o:
       "Improvements to ext4's block allocator performance for very large file
        systems, especially when the file system or files which are highly
        fragmented. There is a new mount option, prefetch_block_bitmaps which
        will pull in the block bitmaps and set up the in-memory buddy bitmaps
        when the file system is initially mounted.
      
        Beyond that, a lot of bug fixes and cleanups. In particular, a number
        of changes to make ext4 more robust in the face of write errors or
        file system corruptions"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
        ext4: limit the length of per-inode prealloc list
        ext4: reorganize if statement of ext4_mb_release_context()
        ext4: add mb_debug logging when there are lost chunks
        ext4: Fix comment typo "the the".
        jbd2: clean up checksum verification in do_one_pass()
        ext4: change to use fallthrough macro
        ext4: remove unused parameter of ext4_generic_delete_entry function
        mballoc: replace seq_printf with seq_puts
        ext4: optimize the implementation of ext4_mb_good_group()
        ext4: delete invalid comments near ext4_mb_check_limits()
        ext4: fix typos in ext4_mb_regular_allocator() comment
        ext4: fix checking of directory entry validity for inline directories
        fs: prevent BUG_ON in submit_bh_wbc()
        ext4: correctly restore system zone info when remount fails
        ext4: handle add_system_zone() failure in ext4_setup_system_zone()
        ext4: fold ext4_data_block_valid_rcu() into the caller
        ext4: check journal inode extents more carefully
        ext4: don't allow overlapping system zones
        ext4: handle error of ext4_setup_system_zone() on remount
        ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
        ...
      d723b99e
    • D
      afs: Fix NULL deref in afs_dynroot_depopulate() · 5e0b17b0
      David Howells 提交于
      If an error occurs during the construction of an afs superblock, it's
      possible that an error occurs after a superblock is created, but before
      we've created the root dentry.  If the superblock has a dynamic root
      (ie.  what's normally mounted on /afs), the afs_kill_super() will call
      afs_dynroot_depopulate() to unpin any created dentries - but this will
      oops if the root hasn't been created yet.
      
      Fix this by skipping that bit of code if there is no root dentry.
      
      This leads to an oops looking like:
      
      	general protection fault, ...
      	KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
      	...
      	RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385
      	...
      	Call Trace:
      	 afs_kill_super+0x13b/0x180 fs/afs/super.c:535
      	 deactivate_locked_super+0x94/0x160 fs/super.c:335
      	 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598
      	 vfs_get_tree+0x89/0x2f0 fs/super.c:1547
      	 do_new_mount fs/namespace.c:2875 [inline]
      	 path_mount+0x1387/0x2070 fs/namespace.c:3192
      	 do_mount fs/namespace.c:3205 [inline]
      	 __do_sys_mount fs/namespace.c:3413 [inline]
      	 __se_sys_mount fs/namespace.c:3390 [inline]
      	 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
      	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      which is oopsing on this line:
      
      	inode_lock(root->d_inode);
      
      presumably because sb->s_root was NULL.
      
      Fixes: 0da0b7fd ("afs: Display manually added cells in dynamic root mount")
      Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e0b17b0
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · cd02217a
      Linus Torvalds 提交于
      Pull rdma fixes from Jason Gunthorpe:
       "One regression from 5.8 and a few bugs from earlier kernels:
      
         - Various spelling corrections in kernel prints
      
         - Bug fixes in hfi1 and bntx_re
      
         - Revert a 5.8 patch in hns
      
         - Batch update for Mellanox and Cumulus maintainers emails"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        MAINTAINERS: Update Mellanox and Cumulus Network addresses to new domain
        Revert "RDMA/hns: Reserve one sge in order to avoid local length error"
        RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request
        RDMA/bnxt_re: Do not add user qps to flushlist
        RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"
        RDMA/usnic: Fix spelling mistake "transistion" -> "transition"
        RDMA/hns: Fix spelling mistake "epmty" -> "empty"
      cd02217a
    • L
      Merge tag 'sound-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 7f04f3ed
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes over several drivers, but all are driver-
        specific and nothing looks scary.
      
        Slightly large changes are seen in ASoC qcom driver for the bugs that
        were revealed by the recent ASoC core change to report the invalid
        register access errors. Also ASoC fsl got a slight intensive change
        for the distortion fix.
      
        Others are only trivial fixes or device-specific quirks"
      
      * tag 'sound-5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
        ALSA: hda: avoid reset of sdo_limit
        ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion
        ALSA: usb-audio: ignore broken processing/extension unit
        ASoC: intel: Fix memleak in sst_media_open
        ASoC: wm8994: Avoid attempts to read unreadable registers
        ASoC: msm8916-wcd-analog: fix register Interrupt offset
        ASoC: wm8994: Prevent access to invalid VU register bits on WM1811
        ALSA: hda/realtek: Add model alc298-samsung-headphone
        ALSA: usb-audio: Update documentation comment for MS2109 quirk
        ALSA: isa: fix spelling mistakes in the comments
        ALSA: usb-audio: Add capture support for Saffire 6 (USB 1.1)
        ALSA: hda/realtek: Add quirk for Samsung Galaxy Flex Book
        ASoC: q6routing: add dummy register read/write function
        ASoC: q6afe-dai: mark all widgets registers as SND_SOC_NOPM
        ASoC: Make soc_component_read() returning an error code again
        ASoC: amd: Replacing component->name with codec_dai->name.
        ASoC: fsl: Fix unused variable warning
        ASoC: tegra: tegra210_i2s: Fix compile warning with CONFIG_PM=n
        ASoC: tegra: tegra210_dmic: Fix compile warning with CONFIG_PM=n
        ASoC: tegra: tegra210_ahub: Fix compile warning with CONFIG_PM=n
        ...
      7f04f3ed
    • L
      Merge tag 'drm-fixes-2020-08-21' of git://anongit.freedesktop.org/drm/drm · 43d387a4
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Regular fixes pull for rc2. Usual rc2 doesn't seem too busy, mainly
        i915 and amdgpu. I'd expect the usual uptick for rc3.
      
        amdgpu:
         - Fix allocation size
         - SR-IOV fixes
         - Vega20 SMU feature state caching fix
         - Fix custom pptable handling
         - Arcturus golden settings update
         - Several display fixes
         - Fixes for Navy Flounder
         - Misc display fixes
         - RAS fix
      
        amdkfd:
         - SDMA fix for renoir
      
        i915:
         - Fix device parameter usage for selftest mock i915 device
         - Fix LPSP capability debugfs NULL dereference
         - Fix buddy register pagemask table
         - Fix intel_atomic_check() non-negative return value
         - Fix selftests passing a random 0 into ilog2()
         - Fix TGL power well enable/disable ordering
         - Switch to PMU module refcounting
         - GVT fixes
      
        virtio:
         - Add missing dma_fence_put() in virtio_gpu_execbuffer_ioctl()
         - Fix memory leak in virtio_gpu_cleanup_object()"
      
      * tag 'drm-fixes-2020-08-21' of git://anongit.freedesktop.org/drm/drm: (34 commits)
        Revert "drm/amdgpu: disable gfxoff for navy_flounder"
        drm/i915/tgl: Make sure TC-cold is blocked before enabling TC AUX power wells
        drm/i915/selftests: Avoid passing a random 0 into ilog2
        drm/i915: Fix wrong return value in intel_atomic_check()
        drm/i915: Update bw_buddy pagemask table
        drm/i915/display: Check for an LPSP encoder before dereferencing
        drm/i915: Copy default modparams to mock i915_device
        drm/i915: Provide the perf pmu.module
        drm/amd/display: fix pow() crashing when given base 0
        drm/amd/display: Reset scrambling on Test Pattern
        drm/amd/display: fix dcn3 wide timing dsc validation
        drm/amd/display: Fix DFPstate hang due to view port changed
        drm/amd/display: Assign correct left shift
        drm/amd/display: Call DMUB for eDP power control
        drm/amdkfd: fix the wrong sdma instance query for renoir
        drm/amdgpu: parse ta firmware for navy_flounder
        drm/amdgpu: fix NULL pointer access issue when unloading driver
        drm/amdgpu: fix uninit-value in arcturus_log_thermal_throttling_event()
        drm/amdgpu: disable gfxoff for navy_flounder
        drm/amdgpu/display: use GFP_ATOMIC in dcn20_validate_bandwidth_internal
        ...
      43d387a4
    • C
      mm, page_alloc: fix core hung in free_pcppages_bulk() · 88e8ac11
      Charan Teja Reddy 提交于
      The following race is observed with the repeated online, offline and a
      delay between two successive online of memory blocks of movable zone.
      
      P1						P2
      
      Online the first memory block in
      the movable zone. The pcp struct
      values are initialized to default
      values,i.e., pcp->high = 0 &
      pcp->batch = 1.
      
      					Allocate the pages from the
      					movable zone.
      
      Try to Online the second memory
      block in the movable zone thus it
      entered the online_pages() but yet
      to call zone_pcp_update().
      					This process is entered into
      					the exit path thus it tries
      					to release the order-0 pages
      					to pcp lists through
      					free_unref_page_commit().
      					As pcp->high = 0, pcp->count = 1
      					proceed to call the function
      					free_pcppages_bulk().
      Update the pcp values thus the
      new pcp values are like, say,
      pcp->high = 378, pcp->batch = 63.
      					Read the pcp's batch value using
      					READ_ONCE() and pass the same to
      					free_pcppages_bulk(), pcp values
      					passed here are, batch = 63,
      					count = 1.
      
      					Since num of pages in the pcp
      					lists are less than ->batch,
      					then it will stuck in
      					while(list_empty(list)) loop
      					with interrupts disabled thus
      					a core hung.
      
      Avoid this by ensuring free_pcppages_bulk() is called with proper count of
      pcp list pages.
      
      The mentioned race is some what easily reproducible without [1] because
      pcp's are not updated for the first memory block online and thus there is
      a enough race window for P2 between alloc+free and pcp struct values
      update through onlining of second memory block.
      
      With [1], the race still exists but it is very narrow as we update the pcp
      struct values for the first memory block online itself.
      
      This is not limited to the movable zone, it could also happen in cases
      with the normal zone (e.g., hotplug to a node that only has DMA memory, or
      no other memory yet).
      
      [1]: https://patchwork.kernel.org/patch/11696389/
      
      Fixes: 5f8dcc21 ("page-allocator: split per-cpu list into one-list-per-migrate-type")
      Signed-off-by: NCharan Teja Reddy <charante@codeaurora.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Cc: <stable@vger.kernel.org> [2.6+]
      Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88e8ac11
    • D
      mm: include CMA pages in lowmem_reserve at boot · e08d3fdf
      Doug Berger 提交于
      The lowmem_reserve arrays provide a means of applying pressure against
      allocations from lower zones that were targeted at higher zones.  Its
      values are a function of the number of pages managed by higher zones and
      are assigned by a call to the setup_per_zone_lowmem_reserve() function.
      
      The function is initially called at boot time by the function
      init_per_zone_wmark_min() and may be called later by accesses of the
      /proc/sys/vm/lowmem_reserve_ratio sysctl file.
      
      The function init_per_zone_wmark_min() was moved up from a module_init to
      a core_initcall to resolve a sequencing issue with khugepaged.
      Unfortunately this created a sequencing issue with CMA page accounting.
      
      The CMA pages are added to the managed page count of a zone when
      cma_init_reserved_areas() is called at boot also as a core_initcall.  This
      makes it uncertain whether the CMA pages will be added to the managed page
      counts of their zones before or after the call to
      init_per_zone_wmark_min() as it becomes dependent on link order.  With the
      current link order the pages are added to the managed count after the
      lowmem_reserve arrays are initialized at boot.
      
      This means the lowmem_reserve values at boot may be lower than the values
      used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
      ratio values are unchanged.
      
      In many cases the difference is not significant, but for example
      an ARM platform with 1GB of memory and the following memory layout
      
        cma: Reserved 256 MiB at 0x0000000030000000
        Zone ranges:
          DMA      [mem 0x0000000000000000-0x000000002fffffff]
          Normal   empty
          HighMem  [mem 0x0000000030000000-0x000000003fffffff]
      
      would result in 0 lowmem_reserve for the DMA zone.  This would allow
      userspace to deplete the DMA zone easily.
      
      Funnily enough
      
        $ cat /proc/sys/vm/lowmem_reserve_ratio
      
      would fix up the situation because as a side effect it forces
      setup_per_zone_lowmem_reserve.
      
      This commit breaks the link order dependency by invoking
      init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
      have the chance to be properly accounted in their zone(s) and allowing
      the lowmem_reserve arrays to receive consistent values.
      
      Fixes: bc22af74 ("mm: update min_free_kbytes from khugepaged after core initialization")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e08d3fdf