1. 14 11月, 2017 3 次提交
    • M
      s390: remove all code using the access register mode · 0aaba41b
      Martin Schwidefsky 提交于
      The vdso code for the getcpu() and the clock_gettime() call use the access
      register mode to access the per-CPU vdso data page with the current code.
      
      An alternative to the complicated AR mode is to use the secondary space
      mode. This makes the vdso faster and quite a bit simpler. The downside is
      that the uaccess code has to be changed quite a bit.
      
      Which instructions are used depends on the machine and what kind of uaccess
      operation is requested. The instruction dictates which ASCE value needs
      to be loaded into %cr1 and %cr7.
      
      The different cases:
      
      * User copy with MVCOS for z10 and newer machines
        The MVCOS instruction can copy between the primary space (aka user) and
        the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
        ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
        loaded in %cr1.
      
      * User copy with MVCP/MVCS for older machines
        To be able to execute the MVCP/MVCS instructions the kernel needs to
        switch to primary mode. The control register %cr1 has to be set to the
        kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
        on set_fs(KERNEL_DS) vs set_fs(USER_DS).
      
      * Data access in the user address space for strnlen / futex
        To use "normal" instruction with data from the user address space the
        secondary space mode is used. The kernel needs to switch to primary mode,
        %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
        kernel ASCE, dependent on set_fs.
      
      To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
      tries to be lazy about it. E.g. for multiple user copies in a row with
      MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
      done only once. On return to user space a CPU bit is checked that loads the
      vdso ASCE again.
      
      To enable and disable the data access via the secondary space two new
      functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
      that a context is in secondary space uaccess mode is stored in the
      mm_segment_t value for the task. The code of an interrupt may use set_fs
      as long as it returns to the previous state it got with get_fs with another
      call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
      set_fs with the current mm_segment_t value for the task.
      
      For CPUs with MVCOS:
      
      CPU running in                        | %cr1 ASCE | %cr7 ASCE |
      --------------------------------------|-----------|-----------|
      user space                            |  user     |  vdso     |
      kernel, USER_DS, normal-mode          |  user     |  vdso     |
      kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
      kernel, USER_DS, sacf-mode            |  kernel   |  user     |
      kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
      kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
      kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |
      
      For CPUs without MVCOS:
      
      CPU running in                        | %cr1 ASCE | %cr7 ASCE |
      --------------------------------------|-----------|-----------|
      user space                            |  user     |  vdso     |
      kernel, USER_DS, normal-mode          |  user     |  vdso     |
      kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
      kernel, USER_DS, sacf-mode            |  kernel   |  user     |
      kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
      kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
      kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |
      
      The lines with "lazy" refer to the state after a copy via the secondary
      space with a delayed reload of %cr1 and %cr7.
      
      There are three hardware address spaces that can cause a DAT exception,
      primary, secondary and home space. The exception can be related to
      four different fault types: user space fault, vdso fault, kernel fault,
      and the gmap faults.
      
      Dependent on the set_fs state and normal vs. sacf mode there are a number
      of fault combinations:
      
      1) user address space fault via the primary ASCE
      2) gmap address space fault via the primary ASCE
      3) kernel address space fault via the primary ASCE for machines with
         MVCOS and set_fs(KERNEL_DS)
      4) vdso address space faults via the secondary ASCE with an invalid
         address while running in secondary space in problem state
      5) user address space fault via the secondary ASCE for user-copy
         based on the secondary space mode, e.g. futex_ops or strnlen_user
      6) kernel address space fault via the secondary ASCE for user-copy
         with secondary space mode with set_fs(KERNEL_DS)
      7) kernel address space fault via the primary ASCE for user-copy
         with secondary space mode with set_fs(USER_DS) on machines without
         MVCOS.
      8) kernel address space fault via the home space ASCE
      
      Replace user_space_fault() with a new function get_fault_type() that
      can distinguish all four different fault types.
      
      With these changes the futex atomic ops from the kernel and the
      strnlen_user will get a little bit slower, as well as the old style
      uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
      fast as before. On the positive side, the user space vdso code is a
      lot faster and Linux ceases to use the complicated AR mode.
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      0aaba41b
    • M
      s390/mm,kvm: improve detection of KVM guest faults · c771320e
      Martin Schwidefsky 提交于
      The identification of guest fault currently relies on the PF_VCPU flag.
      This is set in guest_entry_irqoff and cleared in guest_exit_irqoff.
      Both functions are called by __vcpu_run, the PF_VCPU flag is set for
      quite a lot of kernel code outside of the guest execution.
      
      Replace the PF_VCPU scheme with the PIF_GUEST_FAULT in the pt_regs and
      make the program check handler code in entry.S set the bit only for
      exception that occurred between the .Lsie_gmap and .Lsie_done labels.
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      c771320e
    • R
      x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu() · b29c6ef7
      Rafael J. Wysocki 提交于
      Even though aperfmperf_snapshot_khz() caches the samples.khz value to
      return if called again in a sufficiently short time, its caller,
      arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it
      which may allow user space to trigger an IPI storm by reading from the
      scaling_cur_freq cpufreq sysfs file in a tight loop.
      
      To avoid that, move the decision on whether or not to return the cached
      samples.khz value to arch_freq_get_on_cpu().
      
      This change was part of commit 941f5f0f ("x86: CPU: Fix up "cpu MHz"
      in /proc/cpuinfo"), but it was not the reason for the revert and it
      remains applicable.
      
      Fixes: 4815d3c5 (cpufreq: x86: Make scaling_cur_freq behave more as expected)
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b29c6ef7
  2. 12 11月, 2017 1 次提交
    • X
      x86/intel_rdt: Fix a silent failure when writing zero value schemata · 2244645a
      Xiaochen Shen 提交于
      Writing an invalid schemata with no domain values (e.g., "(L3|MB):"),
      results in a silent failure, i.e. the last_cmd_status returns OK,
      
      Check for an empty value and set the result string with a proper error
      message and return -EINVAL.
      
      Before the fix:
       # mkdir /sys/fs/resctrl/p1
      
       # echo "L3:" > /sys/fs/resctrl/p1/schemata
       (silent failure)
       # cat /sys/fs/resctrl/info/last_cmd_status
       ok
      
       # echo "MB:" > /sys/fs/resctrl/p1/schemata
       (silent failure)
       # cat /sys/fs/resctrl/info/last_cmd_status
       ok
      
      After the fix:
       # mkdir /sys/fs/resctrl/p1
      
       # echo "L3:" > /sys/fs/resctrl/p1/schemata
       -bash: echo: write error: Invalid argument
       # cat /sys/fs/resctrl/info/last_cmd_status
       Missing 'L3' value
      
       # echo "MB:" > /sys/fs/resctrl/p1/schemata
       -bash: echo: write error: Invalid argument
       # cat /sys/fs/resctrl/info/last_cmd_status
       Missing 'MB' value
      
      [ Tony: This is an unintended side effect of the patch earlier to allow the
          	user to just write the value they want to change.  While allowing
          	user to specify less than all of the values, it also allows an
          	empty value. ]
      
      Fixes: c4026b7b ("x86/intel_rdt: Implement "update" mode when writing schemata file")
      Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
      2244645a
  3. 11 11月, 2017 4 次提交
    • L
      Revert "x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo" · ea0ee339
      Linus Torvalds 提交于
      This reverts commit 941f5f0f.
      
      Sadly, it turns out that we really can't just do the cross-CPU IPI to
      all CPU's to get their proper frequencies, because it's much too
      expensive on systems with lots of cores.
      
      So we'll have to revert this for now, and revisit it using a smarter
      model (probably doing one system-wide IPI at open time, and doing all
      the frequency calculations in parallel).
      Reported-by: NWANG Chao <chao.wang@ucloud.cn>
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Cc: Rafael J Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea0ee339
    • H
      s390/noexec: execute kexec datamover without DAT · d0e810ee
      Heiko Carstens 提交于
      Rebooting into a new kernel with kexec fails (system dies) if tried on
      a machine that has no-execute support. Reason for this is that the so
      called datamover code gets executed with DAT on (MMU is active) and
      the page that contains the datamover is marked as non-executable.
      Therefore when branching into the datamover an unexpected program
      check happens and afterwards the machine is dead.
      
      This can be simply avoided by disabling DAT, which also disables any
      no-execute checks, just before the datamover gets executed.
      
      In fact the first thing done by the datamover is to disable DAT. The
      code in the datamover that disables DAT can be removed as well.
      
      Thanks to Michael Holzheu and Gerald Schaefer for tracking this down.
      Reviewed-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Reviewed-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Fixes: 57d7f939 ("s390: add no-execute support")
      Cc: <stable@vger.kernel.org> # v4.11+
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      d0e810ee
    • H
      s390: fix transactional execution control register handling · a1c5befc
      Heiko Carstens 提交于
      Dan Horák reported the following crash related to transactional execution:
      
      User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
      CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
      Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      task: 00000000fafc8000 task.stack: 00000000fafc4000
      User PSW : 0705200180000000 000003ff93c14e70
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
      User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
                 0000000000000000 0000000000000002 0000000000000000 000003ff00000000
                 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
                 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
      User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
                 000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
                #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
                >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
                 000003ff93c14e74: a7080000            lhi     %r0,0
                 000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
                 000003ff93c14e7c: b2220000            ipm     %r0
                 000003ff93c14e80: 8800001c            srl     %r0,28
      
      There are several bugs with control register handling with respect to
      transactional execution:
      
      - on task switch update_per_regs() is only called if the next task has
        an mm (is not a kernel thread). This however is incorrect. This
        breaks e.g. for user mode helper handling, where the kernel creates
        a kernel thread and then execve's a user space program. Control
        register contents related to transactional execution won't be
        updated on execve. If the previous task ran with transactional
        execution disabled then the new task will also run with
        transactional execution disabled, which is incorrect. Therefore call
        update_per_regs() unconditionally within switch_to().
      
      - on startup the transactional execution facility is not enabled for
        the idle thread. This is not really a bug, but an inconsistency to
        other facilities. Therefore enable the facility if it is available.
      
      - on fork the new thread's per_flags field is not cleared. This means
        that a child process inherits the PER_FLAG_NO_TE flag. This flag can
        be set with a ptrace request to disable transactional execution for
        the current process. It should not be inherited by new child
        processes in order to be consistent with the handling of all other
        PER related debugging options. Therefore clear the per_flags field in
        copy_thread_tls().
      Reported-and-tested-by: NDan Horák <dan@danny.cz>
      Fixes: d35339a4 ("s390: add support for transactional memory")
      Cc: <stable@vger.kernel.org> # v3.7+
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a1c5befc
    • M
      s390/bpf: take advantage of stack_depth tracking · 78372709
      Michael Holzheu 提交于
      Make use of the "stack_depth" tracking feature introduced with
      commit 8726679a ("bpf: teach verifier to track stack depth") for the
      s390 JIT, so that stack usage can be reduced.
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      78372709
  4. 10 11月, 2017 14 次提交
  5. 09 11月, 2017 8 次提交
    • H
      s390: simplify transactional execution elf hwcap handling · baaf9be8
      Heiko Carstens 提交于
      Just use MACHINE_HAS_TE to decide if HWCAP_S390_TE needs
      to be added to elf_hwcap.
      Suggested-by: NDan Horák <dan@danny.cz>
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      baaf9be8
    • C
      s390/virtio: remove unused header file kvm_virtio.h · a401917b
      Christian Borntraeger 提交于
      With commit 7fb2b2d5 ("s390/virtio: remove the old KVM virtio
      transport") the pre-ccw virtio transport for s390 was removed. To
      complete the removal the uapi header file that contains the related data
      structures must also be removed.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a401917b
    • C
      x86/build: Make the boot image generation less verbose · 7980f029
      Changbin Du 提交于
      This change suppresses the 'dd' output and adds the '-quiet' parameter
      to mkisofs tool. It also removes the 'Using ...' messages, as none of the
      messages matter to the user normally.
      
      "make V=1" can still be used for a more verbose build.
      
      The new build messages are now a streamlined set of:
      
        $ make isoimage
        ...
        Kernel: arch/x86/boot/bzImage is ready  (#75)
          GENIMAGE arch/x86/boot/image.iso
        Kernel: arch/x86/boot/image.iso is ready
      Signed-off-by: NChangbin Du <changbin.du@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1510207751-22166-1-git-send-email-changbin.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7980f029
    • J
      x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability · 87df2617
      Jiri Kosina 提交于
      Commit 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME)
      support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
      to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.
      
      This is because once SME is enabled, sme_me_mask (which is introduced as
      EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
      causing imminent build failure for all the modules which make use of all
      the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
      remap_pfn_range(), ...).
      
      Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
      that take pgprot_t argument, while making it impossible to -- all of a
      sudden -- pass PAGE_KERNEL to it, feels rather incosistent.
      
      Restore the original behavior and make it possible to pass PAGE_KERNEL
      to all its EXPORT_SYMBOL() consumers.
      
      [ This is all so not wonderful. We shouldn't need that "sme_me_mask"
        access at all in all those places that really don't care about that
        level of detail, and just want _PAGE_KERNEL or whatever.
      
        We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
        both of which hide a "cachemode2protval()" call, and which also ends
        up using another EXPORT_SYMBOL(), but at least that only triggers for
        the much more rare cases.
      
        Maybe we could move these dynamic page table bits to be generated much
        deeper down in the VM layer, instead of hiding them in the macros that
        everybody uses.
      
        So this all would merit some cleanup. But not today.   - Linus ]
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Despised-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87df2617
    • H
      s390: avoid undefined behaviour · ead7a22e
      Heiko Carstens 提交于
      At a couple of places smatch emits warnings like this:
      
          arch/s390/mm/vmem.c:409 vmem_map_init() warn:
              right shifting more than type allows
      
      In fact shifting a signed type right is undefined. Avoid this and add
      an unsigned long cast. The shifted values are always positive.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      ead7a22e
    • H
      s390/disassembler: generate opcode tables from text file · 8bc1e4ec
      Heiko Carstens 提交于
      The current way of adding new instructions to the opcode tables is
      painful and error prone. Therefore add, similar to binutils, a text
      file which contains all opcodes and the corresponding mnemonics and
      instruction formats.
      
      A small gen_opcode_table tool then generates a header file with the
      required enums and opcode table initializers at the prepare step of
      the kernel build.
      
      This way only a simple text file has to be maintained, which can be
      rather easily extended.
      
      Unlike before where there were plenty of opcode tables and a large
      switch statement to find the correct opcode table, there is now only
      one opcode table left which contains all instructions. A second opcode
      offset table now contains offsets within the opcode table to find
      instructions which have the same opcode prefix. In order to save space
      all 1-byte opcode instructions are grouped together at the end of the
      opcode table. This is also quite similar to like it was before.
      
      In addition also move and change code and definitions within the
      disassembler. As a side effect this reduces the size required for the
      code and opcode tables by ~1.5k.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      8bc1e4ec
    • H
      s390/disassembler: remove insn_to_mnemonic() · dac6dc26
      Heiko Carstens 提交于
      insn_to_mnemonic() was introduced ages ago for KVM debugging, but is
      unused in the meantime. Therefore remove it.
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      dac6dc26
    • Y
      x86/idt: Remove X86_TRAP_BP initialization in idt_setup_traps() · d0cd64b0
      Yonghong Song 提交于
      Commit b70543a0("x86/idt: Move regular trap init to tables") moves
      regular trap init for each trap vector into a table based
      initialization. It introduced the initialization for vector X86_TRAP_BP
      which was not in the code which it replaced. This breaks uprobe
      functionality for x86_32; the probed program segfaults instead of handling
      the probe proper.
      
      The reason for this is that TRAP_BP is set up as system interrupt gate
      (DPL3) in the early IDT and then replaced by a regular interrupt gate
      (DPL0) in idt_setup_traps(). The DPL0 restriction causes the int3 trap
      to fail with a #GP resulting in a SIGSEGV of the probed program.
      
      On 64bit this does not cause a problem because the IDT entry is replaced
      with a system interrupt gate (DPL3) with interrupt stack afterwards.
      
      Remove X86_TRAP_BP from the def_idts table which is used in
      idt_setup_traps(). Remove a redundant entry for X86_TRAP_NMI in def_idts
      while at it. Tested on both x86_64 and x86_32.
      
      [ tglx: Amended changelog with a description of the root cause ]
      
      Fixes: b70543a0("x86/idt: Move regular trap init to tables")
      Reported-and-tested-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: a.p.zijlstra@chello.nl
      Cc: ast@fb.com
      Cc: oleg@redhat.com
      Cc: luto@kernel.org
      Cc: kernel-team@fb.com
      Link: https://lkml.kernel.org/r/20171108192845.552709-1-yhs@fb.com
      d0cd64b0
  6. 08 11月, 2017 10 次提交
    • O
      MIPS: AR7: Ensure that serial ports are properly set up · b084116f
      Oswald Buddenhagen 提交于
      Without UPF_FIXED_TYPE, the data from the PORT_AR7 uart_config entry is
      never copied, resulting in a dead port.
      
      Fixes: 154615d5 ("MIPS: AR7: Use correct UART port type")
      Signed-off-by: NOswald Buddenhagen <oswald.buddenhagen@gmx.de>
      [jonas.gorski: add Fixes tag]
      Signed-off-by: NJonas Gorski <jonas.gorski@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Nicolas Schichan <nschichan@freebox.fr>
      Cc: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
      Cc: linux-mips@linux-mips.org
      Cc: linux-serial@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Patchwork: https://patchwork.linux-mips.org/patch/17543/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      b084116f
    • J
      MIPS: AR7: Defer registration of GPIO · e6b03ab6
      Jonas Gorski 提交于
      When called from prom init code, ar7_gpio_init() will fail as it will
      call gpiochip_add() which relies on a working kmalloc() to alloc
      the gpio_desc array and kmalloc is not useable yet at prom init time.
      
      Move ar7_gpio_init() to ar7_register_devices() (a device_initcall)
      where kmalloc works.
      
      Fixes: 14e85c0e ("gpio: remove gpio_descs global array")
      Signed-off-by: NJonas Gorski <jonas.gorski@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Nicolas Schichan <nschichan@freebox.fr>
      Cc: linux-mips@linux-mips.org
      Cc: linux-serial@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.19+
      Patchwork: https://patchwork.linux-mips.org/patch/17542/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      e6b03ab6
    • B
      x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context · a743bbee
      Borislav Petkov 提交于
      The warning below says it all:
      
        BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1
        caller is __this_cpu_preempt_check
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc8 #4
        Call Trace:
         dump_stack
         check_preemption_disabled
         ? do_early_param
         __this_cpu_preempt_check
         arch_perfmon_init
         op_nmi_init
         ? alloc_pci_root_info
         oprofile_arch_init
         oprofile_init
         do_one_initcall
         ...
      
      These accessors should not have been used in the first place: it is PPro so
      no mixed silicon revisions and thus it can simply use boot_cpu_data.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Fix-creation-mandated-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Robert Richter <rric@kernel.org>
      Cc: x86@kernel.org
      Cc: stable@vger.kernel.org
      a743bbee
    • R
      x86/traps: Fix up general protection faults caused by UMIP · 6fc9dc81
      Ricardo Neri 提交于
      If the User-Mode Instruction Prevention CPU feature is available and
      enabled, a general protection fault will be issued if the instructions
      sgdt, sldt, sidt, str or smsw are executed from user-mode context
      (CPL > 0). If the fault was caused by any of the instructions protected
      by UMIP, fixup_umip_exception() will emulate dummy results for these
      instructions as follows: in virtual-8086 and protected modes, sgdt, sidt
      and smsw are emulated; str and sldt are not emulated. No emulation is done
      for user-space long mode processes.
      
      If emulation is successful, the emulated result is passed to the user space
      program and no SIGSEGV signal is emitted.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-11-git-send-email-ricardo.neri-calderon@linux.intel.com
      [ Added curly braces. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6fc9dc81
    • R
      x86/umip: Enable User-Mode Instruction Prevention at runtime · aa35f896
      Ricardo Neri 提交于
      User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
      bit in %cr4.
      
      It makes sense to enable UMIP at some point while booting, before user
      spaces come up. Like SMAP and SMEP, is not critical to have it enabled
      very early during boot. This is because UMIP is relevant only when there is
      a user space to be protected from. Given these similarities, UMIP can be
      enabled along with SMAP and SMEP.
      
      At the moment, UMIP is disabled by default at build time. It can be enabled
      at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
      it can be disabled at run time by adding clearcpuid=514 to the kernel
      parameters.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aa35f896
    • R
      x86/umip: Force a page fault when unable to copy emulated result to user · c6a960bb
      Ricardo Neri 提交于
      fixup_umip_exception() will be called from do_general_protection(). If the
      former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
      However, when emulation is successful but the emulated result cannot be
      copied to user space memory, it is more accurate to issue a SIGSEGV with
      SEGV_MAPERR with the offending address. A new function, inspired in
      force_sig_info_fault(), is introduced to model the page fault.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-9-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c6a960bb
    • R
      x86/umip: Add emulation code for UMIP instructions · 1e5db223
      Ricardo Neri 提交于
      The feature User-Mode Instruction Prevention present in recent Intel
      processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and
      str) from being executed with CPL > 0. Otherwise, a general protection
      fault is issued.
      
      Rather than relaying to the user space the general protection fault caused
      by the UMIP-protected instructions (in the form of a SIGSEGV signal), it
      can be trapped and the instruction emulated to provide a dummy result.
      This allows to both conserve the current kernel behavior and not reveal the
      system resources that UMIP intends to protect (i.e., the locations of the
      global descriptor and interrupt descriptor tables, the segment selectors of
      the local descriptor table, the value of the task state register and the
      contents of the CR0 register).
      
      This emulation is needed because certain applications (e.g., WineHQ and
      DOSEMU2) rely on this subset of instructions to function. Given that sldt
      and str are not commonly used in programs that run on WineHQ or DOSEMU2,
      they are not emulated. Also, emulation is provided only for 32-bit
      processes; 64-bit processes that attempt to use the instructions that UMIP
      protects will receive the SIGSEGV signal issued as a consequence of the
      general protection fault.
      
      The instructions protected by UMIP can be split in two groups. Those which
      return a kernel memory address (sgdt and sidt) and those which return a
      value (smsw, sldt and str; the last two not emulated).
      
      For the instructions that return a kernel memory address, applications such
      as WineHQ rely on the result being located in the kernel memory space, not
      the actual location of the table. The result is emulated as a hard-coded
      value that lies close to the top of the kernel memory. The limit for the
      GDT and the IDT are set to zero.
      
      The instruction smsw is emulated to return the value that the register CR0
      has at boot time as set in the head_32.
      
      Care is taken to appropriately emulate the results when segmentation is
      used. That is, rather than relying on USER_DS and USER_CS, the function
      insn_get_addr_ref() inspects the segment descriptor pointed by the
      registers in pt_regs. This ensures that we correctly obtain the segment
      base address and the address and operand sizes even if the user space
      application uses a local descriptor table.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-8-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e5db223
    • R
      x86/cpufeature: Add User-Mode Instruction Prevention definitions · 3522c2a6
      Ricardo Neri 提交于
      User-Mode Instruction Prevention is a security feature present in new
      Intel processors that, when set, prevents the execution of a subset of
      instructions if such instructions are executed in user mode (CPL > 0).
      Attempting to execute such instructions causes a general protection
      exception.
      
      The subset of instructions comprises:
      
       * SGDT - Store Global Descriptor Table
       * SIDT - Store Interrupt Descriptor Table
       * SLDT - Store Local Descriptor Table
       * SMSW - Store Machine Status Word
       * STR  - Store Task Register
      
      This feature is also added to the list of disabled-features to allow
      a cleaner handling of build-time configuration.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3522c2a6
    • R
      x86/insn-eval: Add support to resolve 16-bit address encodings · 9c6c799f
      Ricardo Neri 提交于
      Tasks running in virtual-8086 mode, in protected mode with code segment
      descriptors that specify 16-bit default address sizes via the D bit, or via
      an address override prefix will use 16-bit addressing form encodings as
      described in the Intel 64 and IA-32 Architecture Software Developer's
      Manual Volume 2A Section 2.1.5, Table 2-1.
      
      16-bit addressing encodings differ in several ways from the 32-bit/64-bit
      addressing form encodings: ModRM.rm points to different registers and, in
      some cases, effective addresses are indicated by the addition of the value
      of two registers. Also, there is no support for SIB bytes. Thus, a
      separate function is needed to parse this form of addressing.
      
      Three functions are introduced. get_reg_offset_16() obtains the
      offset from the base of pt_regs of the registers indicated by the ModRM
      byte of the address encoding. get_eff_addr_modrm_16() computes the
      effective address from the value of the register operands.
      get_addr_ref_16() computes the linear address using the obtained effective
      address and the base address of the segment.
      
      Segment limits are enforced when running in protected mode.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-6-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c6c799f
    • R
      x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode · 86cc3510
      Ricardo Neri 提交于
      It is possible to utilize 32-bit address encodings in virtual-8086 mode via
      an address override instruction prefix. However, the range of the
      effective address is still limited to [0x-0xffff]. In such a case, return
      error.
      
      Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
      such limit by truncating the most significant bytes of the computed linear
      address.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: ricardo.neri@intel.com
      Link: http://lkml.kernel.org/r/1509935277-22138-5-git-send-email-ricardo.neri-calderon@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86cc3510