1. 17 Feb 2021, 3 commits
  2. 15 Feb 2021, 1 commit
  3. 14 Feb 2021, 2 commits
  4. 13 Feb 2021, 1 commit
  5. 11 Feb 2021, 3 commits
  6. 09 Feb 2021, 2 commits
  7. 08 Feb 2021, 1 commit
    • cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there · d11a1d08
      Authored by Rafael J. Wysocki
      If the maximum performance level taken for computing the
      arch_max_freq_ratio value used in the x86 scale-invariance code is
      higher than the one corresponding to the cpuinfo.max_freq value
      coming from the acpi_cpufreq driver, the scale-invariant utilization
      falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
      faster, which causes the schedutil governor to select a frequency
      below cpuinfo.max_freq.  That frequency corresponds to a frequency
      table entry below the maximum performance level necessary to get to
      the "boost" range of CPU frequencies which prevents "boost"
      frequencies from being used in some workloads.
      
      While this issue is related to scale-invariance, it may be amplified
      by commit db865272 ("cpufreq: Avoid configuring old governors as
      default with intel_pstate") from the 5.10 development cycle which
      made it extremely easy to default to schedutil even if the preferred
      driver is acpi_cpufreq as long as intel_pstate is built too, because
      the mere presence of the latter effectively removes the ondemand
      governor from the defaults.  Distro kernels are likely to include
      both intel_pstate and acpi_cpufreq on x86, so their users who cannot
      use intel_pstate or choose to use acpi_cpufreq may easily be
      affected by this issue.
      
      If CPPC is available, it can be used to address this issue by
      extending the frequency tables created by acpi_cpufreq to cover the
      entire available frequency range (including "boost" frequencies) for
      each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
      the maximum "boost" frequency is and the frequency tables created by
      it cannot be extended in a meaningful way, so in that case make it
      ask the arch scale-invariance code to use the "nominal" performance
      level for CPU utilization scaling in order to avoid the issue at hand.
      
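      The effect can be sketched with trivial scaling arithmetic (an
      illustration only; SCHED_CAPACITY and scale_util() below are
      placeholders, not the kernel's fixed-point scale-invariance code):
      if the reference maximum used for the ratio is above the driver's
      cpuinfo.max_freq, running at cpuinfo.max_freq no longer reports
      full capacity.

```c
#include <assert.h>

/* Illustrative sketch, not the kernel's implementation: utilization is
 * scaled against a reference maximum frequency. When that reference is
 * taken from a perf level higher than cpuinfo.max_freq, a CPU running
 * at cpuinfo.max_freq reports less than 100% (1024) capacity. */
#define SCHED_CAPACITY 1024U

static unsigned int scale_util(unsigned int cur_khz, unsigned int ref_max_khz)
{
        return cur_khz * SCHED_CAPACITY / ref_max_khz;
}
```

      With a matching reference the CPU at maximum frequency reports full
      capacity; with an inflated reference it does not, which is what
      nudges schedutil to pick a frequency below cpuinfo.max_freq.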
      Fixes: db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  8. 06 Feb 2021, 5 commits
    • x86/efi: Remove EFI PGD build time checks · 816ef8d7
      Authored by Borislav Petkov
      With CONFIG_X86_5LEVEL, CONFIG_UBSAN and CONFIG_UBSAN_UNSIGNED_OVERFLOW
      enabled, clang fails the build with
      
        x86_64-linux-ld: arch/x86/platform/efi/efi_64.o: in function `efi_sync_low_kernel_mappings':
        efi_64.c:(.text+0x22c): undefined reference to `__compiletime_assert_354'
      
      which happens due to -fsanitize=unsigned-integer-overflow being enabled:
      
        -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
        the result of an unsigned integer computation cannot be represented
        in its type. Unlike signed integer overflow, this is not undefined
        behavior, but it is often unintentional. This sanitizer does not check
        for lossy implicit conversions performed before such a computation
        (see -fsanitize=implicit-conversion).
      
      and that fires when the (intentional) EFI_VA_START/END defines overflow
      an unsigned long, leading to the assertion expressions not getting
      optimized away (on GCC they do)...
      
      However, those checks are superfluous: the runtime services mapping
      code already makes sure the ranges don't overshoot EFI_VA_END as the
      EFI mapping range is hardcoded. On each runtime services call, it is
      switched to the EFI-specific PGD and even if mappings manage to escape
      that last PGD, this won't remain unnoticed for long.
      
      So rip them out.
      
      See https://github.com/ClangBuiltLinux/linux/issues/256 for more info.
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Link: http://lkml.kernel.org/r/20210107223424.4135538-1-arnd@kernel.org
    • powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm · 8c511eff
      Authored by Aneesh Kumar K.V
      This fixes the bad fault reported by KUAP when io_wqe_worker accesses userspace.
      
       Bug: Read fault blocked by KUAP!
       WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
       NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
       LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
      ..........
       Call Trace:
       [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
       [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
       [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
       --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
      ..........
       NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
       LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
       interrupt: 300
       [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
       [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
       [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
       [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
       [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
       [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
       [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
       [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
       [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
       [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
       [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
      
      The kernel considers the thread AMR value for a kernel thread to be
      AMR_KUAP_BLOCKED, hence access to userspace is denied. This is of
      course not correct, and we should allow userspace access after
      kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
      AMR value of the operating address space. But the AMR value is
      thread-specific, and we inherit the address space, not the thread
      access restrictions. Because of this, ignore the AMR value when
      accessing userspace via a kernel thread.
      
      current_thread_amr/iamr() are updated because we use them in the
      stack below.
      ....
      [  530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G      D           5.11.0-rc6+ #3
      ....
      
       NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
       LR [c0000000004b9278] gup_pte_range+0x188/0x420
       --- interrupt: 700
       [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
       [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
       [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
       [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
       [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
       [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
       [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
       [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
       [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
       [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
       [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
       [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
       [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
       [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
       [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
       [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
       [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
      
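      The logic of the fix can be sketched as follows (a minimal
      illustration: effective_amr() is made up for this sketch, while
      PF_KTHREAD is the real kernel task flag):

```c
#include <assert.h>

#define PF_KTHREAD       0x00200000U   /* real kernel task flag */
#define AMR_KUAP_BLOCKED (~0UL)        /* all userspace access blocked */

/* Sketch: a kernel thread that borrowed an mm via kthread_use_mm()
 * must not be gated by its thread-private (blocked) AMR value, so the
 * AMR is ignored for PF_KTHREAD tasks when accessing userspace. */
static unsigned long effective_amr(unsigned int task_flags,
                                   unsigned long thread_amr)
{
        if (task_flags & PF_KTHREAD)
                return 0UL;            /* allow userspace access */
        return thread_amr;
}
```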
      Fixes: 48a8ab4e ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
    • entry: Ensure trap after single-step on system call return · 6342adca
      Authored by Gabriel Krisman Bertazi
      Commit 29915524 ("entry: Drop usage of TIF flags in the generic syscall
      code") introduced a bug on architectures using the generic syscall entry
      code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall
      return after receiving a TIF_SINGLESTEP.
      
      The reason is that the meaning of the TIF_SINGLESTEP flag is
      overloaded to cause the trap after a system call is executed, but
      since the above commit, the syscall handler only checks for the
      SYSCALL_WORK flags on the exit work.
      
      Split the meaning of TIF_SINGLESTEP such that it only means single-step
      mode, and create a new type of SYSCALL_WORK to request a trap immediately
      after a syscall in single-step mode.  In the current implementation, the
      SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity.
      
      Update x86 to flip this bit when a tracer enables single stepping.
      
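      The shadowing idea can be sketched like this (names here are
      illustrative, not the exact kernel symbols): enabling single-step
      also sets a SYSCALL_WORK bit, so the generic exit path, which only
      inspects SYSCALL_WORK, still traps on syscall return.

```c
#include <assert.h>

#define WORK_SYSCALL_EXIT_TRAP (1U << 0)

struct task_flags {
        unsigned int tif_singlestep;   /* now means "single-step mode" only */
        unsigned int syscall_work;     /* checked on syscall exit */
};

/* Setting single-step also sets the SYSCALL_WORK bit (the shadow). */
static void enable_singlestep(struct task_flags *t)
{
        t->tif_singlestep = 1U;
        t->syscall_work |= WORK_SYSCALL_EXIT_TRAP;
}

/* The generic exit path only looks at syscall_work. */
static int traps_on_syscall_exit(const struct task_flags *t)
{
        return (t->syscall_work & WORK_SYSCALL_EXIT_TRAP) != 0;
}
```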
      Fixes: 29915524 ("entry: Drop usage of TIF flags in the generic syscall code")
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kyle Huey <me@kylehuey.com>
      Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com
    • x86/debug: Prevent data breakpoints on cpu_dr7 · 3943abf2
      Authored by Lai Jiangshan
      local_db_save() is called at the start of exc_debug_kernel(), reads DR7 and
      disables breakpoints to prevent recursion.
      
      When running in a guest (X86_FEATURE_HYPERVISOR), local_db_save() reads the
      per-cpu variable cpu_dr7 to check whether a breakpoint is active or not
      before it accesses DR7.
      
      A data breakpoint on cpu_dr7 therefore results in infinite #DB recursion.
      
      Disallow data breakpoints on cpu_dr7 to prevent that.
      
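      The kind of check the fix adds can be sketched as a range-overlap
      test (the helper and addresses below are placeholders for
      illustration): a data breakpoint of length len at addr is rejected
      if it overlaps a protected range such as the cpu_dr7 variable.

```c
#include <assert.h>

/* Sketch: reject a data breakpoint whose [addr, addr + len) range
 * overlaps a forbidden per-CPU variable's [start, end) range. */
static int bp_overlaps(unsigned long addr, unsigned long len,
                       unsigned long start, unsigned long end)
{
        return addr < end && addr + len > start;
}
```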
      Fixes: 84b6a349 ("x86/entry: Optimize local_db_save() for virt")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210204152708.21308-2-jiangshanlai@gmail.com
    • x86/debug: Prevent data breakpoints on __per_cpu_offset · c4bed4b9
      Authored by Lai Jiangshan
      When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value
      via __per_cpu_offset or pcpu_unit_offsets.
      
      When a data breakpoint is set on __per_cpu_offset[cpu] (read-write
      operation), the specific CPU will be stuck in an infinite #DB loop.
      
      RCU will try to send an NMI to the specific CPU, but that does not
      work either, since NMI delivery also relies on paranoid_entry(),
      which means the situation is undebuggable.
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com
  9. 05 Feb 2021, 5 commits
    • ARM: kexec: fix oops after TLB are invalidated · 4d62e81b
      Authored by Russell King
      Giancarlo Ferrari reports the following oops while trying to use kexec:
      
       Unable to handle kernel paging request at virtual address 80112f38
       pgd = fd7ef03e
       [80112f38] *pgd=0001141e(bad)
       Internal error: Oops: 80d [#1] PREEMPT SMP ARM
       ...
      
      This is caused by machine_kexec() trying to set the kernel text to be
      read/write, so it can poke values into the relocation code before
      copying it - and an interrupt occurring which changes the page tables.
      The subsequent writes then hit read-only sections that trigger a
      data abort resulting in the above oops.
      
      Fix this by copying the relocation code, and then writing the variables
      into the destination, thereby avoiding the need to make the kernel text
      read/write.
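      The ordering the fix establishes can be sketched as follows (the
      helper, buffer, and offset are illustrative, not the actual kexec
      code): copy the relocation stub first, then patch the values into
      the copy, so the kernel text never has to be made writable.

```c
#include <assert.h>
#include <string.h>

/* Sketch: 1. copy the relocation code, 2. write the variable into the
 * destination copy rather than into the kernel text. */
static void prepare_reloc(unsigned char *dst, const unsigned char *stub,
                          size_t len, unsigned long value, size_t off)
{
        memcpy(dst, stub, len);                    /* 1. copy the code  */
        memcpy(dst + off, &value, sizeof(value));  /* 2. patch the copy */
}
```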
      Reported-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com>
      Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • ARM: ensure the signal page contains defined contents · 9c698bff
      Authored by Russell King
      Ensure that the signal page contains our poison instruction to
      increase the protection against ROP attacks, and also contains
      well-defined contents.
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • x86/apic: Add extra serialization for non-serializing MSRs · 25a068b8
      Authored by Dave Hansen
      Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
      MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
      MFENCE; LFENCE.
      
      Short summary: we have special MSRs that have weaker ordering than all
      the rest. Add fencing consistent with current SDM recommendations.
      
      This is not known to cause any issues in practice, only in theory.
      
      Longer story below:
      
      The reason the kernel uses a different semantic is that the SDM changed
      (roughly in late 2017). The SDM changed because folks at Intel were
      auditing all of the recommended fences in the SDM and realized that the
      x2apic fences were insufficient.
      
      Why was the plain MFENCE judged insufficient?
      
      WRMSR itself is normally a serializing instruction. No fences are needed
      because the instruction itself serializes everything.
      
      But, there are explicit exceptions for this serializing behavior written
      into the WRMSR instruction documentation for two classes of MSRs:
      IA32_TSC_DEADLINE and the X2APIC MSRs.
      
      Back to x2apic: WRMSR is *not* serializing in this specific case.
      But why is MFENCE insufficient? MFENCE makes writes visible, but
      only affects load/store instructions. WRMSR is unfortunately not a
      load/store instruction and is unaffected by MFENCE. This means that a
      non-serializing WRMSR could be reordered by the CPU to execute before
      the writes made visible by the MFENCE have even occurred in the first
      place.
      
      This means that an x2apic IPI could theoretically be triggered before
      there is any (visible) data to process.
      
      Does this affect anything in practice? I honestly don't know. It seems
      quite possible that by the time an interrupt gets to consume the (not
      yet) MFENCE'd data, it has become visible, mostly by accident.
      
      To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
      
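      The recommended sequence can be sketched as a standalone helper
      (the patch's actual helper name may differ; this is guarded so the
      fences are only emitted when compiling for x86):

```c
#include <assert.h>

/* MFENCE orders the preceding stores; the trailing LFENCE then keeps
 * the non-serializing WRMSR from executing ahead of them. */
static inline void x2apic_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
        __asm__ __volatile__("mfence; lfence" ::: "memory");
#endif
}
```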
      This also leaves open the question of the _other_ weakly-ordered WRMSR:
      MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
      the x2APIC MSRs, it seems substantially less likely to be a problem in
      practice. While writes to the in-memory Local Vector Table (LVT) might
      theoretically be reordered with respect to a weakly-ordered WRMSR like
      TSC_DEADLINE, the SDM has this to say:
      
        In x2APIC mode, the WRMSR instruction is used to write to the LVT
        entry. The processor ensures the ordering of this write and any
        subsequent WRMSR to the deadline; no fencing is required.
      
      But, that might still leave xAPIC exposed. The safest thing to do for
      now is to add the extra, recommended LFENCE.
      
       [ bp: Massage commit message, fix typos, drop accidentally added
         newline to tools/arch/x86/include/asm/barrier.h. ]
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
    • Revert "x86/setup: don't remove E820_TYPE_RAM for pfn 0" · 5c279c4c
      Authored by Mike Rapoport
      This reverts commit bde9cfa3.
      
      Changing the first memory page type from E820_TYPE_RESERVED to
      E820_TYPE_RAM makes it a part of the "System RAM" resource rather
      than a reserved resource, and this in turn causes devmem_is_allowed()
      to treat it as an area that can be accessed, but it is filled with
      zeroes instead of the actual data as previously.
      
      The change in /dev/mem output causes lilo to fail, as was reported
      on the Slackware users forum, and probably other legacy applications
      will experience similar problems.
      
      Link: https://www.linuxquestions.org/questions/slackware-14/slackware-current-lilo-vesa-warnings-after-recent-updates-4175689617/#post6214439
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset · 031b91a5
      Authored by Sean Christopherson
      Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
      reset.  The reserved bits check needs to be done even if userspace never
      configures the guest's CPUID model.
      
      Cc: stable@vger.kernel.org
      Fixes: 0107973a ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 04 Feb 2021, 1 commit
  11. 03 Feb 2021, 11 commits
  12. 02 Feb 2021, 5 commits