1. 17 11月, 2017 4 次提交
  2. 16 11月, 2017 2 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · be62a320
      Craig Bergstrom 提交于
      One thing /dev/mem access APIs should verify is that there's no way
      that excessively large pfn's can leak into the high bits of the
      page table entry.
      
      In particular, if people can use "very large physical page addresses"
      through /dev/mem to set the bits past bit 58 - SOFTW4 and permission
      key bits and NX bit, that could *really* confuse the kernel.
      
      We had an earlier attempt:
      
        ce56a86e ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      
      ... which turned out to be too restrictive (breaking mem=... bootups for example) and
      had to be reverted in:
      
        90edaac6 ("Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"")
      
      This v2 attempt modifies the original patch and makes sure that mmap(/dev/mem)
      limits the pfns so that it at least fits in the actual pteval_t architecturally:
      
       - Make sure mmap_mem() actually validates that the offset fits in phys_addr_t
      
          ( This may be indirectly true due to some other check, but it's not
            entirely obvious. )
      
       - Change valid_mmap_phys_addr_range() to just use phys_addr_valid()
         on the top byte
      
          ( Top byte is sufficient, because mmap_mem() has already checked that
            it cannot wrap. )
      
       - Add a few comments about what the valid_phys_addr_range() vs.
         valid_mmap_phys_addr_range() difference is.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      [ Fixed the checks and added comments. ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      [ Collected the discussion and patches into a commit. ]
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CA+55aFyEcOMb657vWSmrM13OxmHxC-XxeBmNis=DwVvpJUOogQ@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      be62a320
    • K
      x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border · 1e0f25db
      Kirill A. Shutemov 提交于
      In case of 5-level paging, the kernel does not place any mapping above
      47-bit, unless userspace explicitly asks for it.
      
      Userspace can request an allocation from the full address space by
      specifying the mmap address hint above 47-bit.
      
      Nicholas noticed that the current implementation violates this interface:
      
        If user space requests a mapping at the end of the 47-bit address space
        with a length which causes the mapping to cross the 47-bit border
        (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
        below and above.
      
      Sanity check the mmap address hint so that start and end of the resulting
      vma are on the same side of the 47-bit border. If that's not the case fall
      back to the code path which ignores the address hint and allocate from the
      regular address space below 47-bit.
      
      To make the checks consistent, mask out the address hints lower bits
      (either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
      push them up to the next boundary.
      
      [ tglx: Moved the address check to a function and massaged comment and
        	changelog ]
      Reported-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com
      1e0f25db
  3. 14 11月, 2017 4 次提交
  4. 12 11月, 2017 1 次提交
    • X
      x86/intel_rdt: Fix a silent failure when writing zero value schemata · 2244645a
      Xiaochen Shen 提交于
      Writing an invalid schemata with no domain values (e.g., "(L3|MB):"),
      results in a silent failure, i.e. the last_cmd_status returns OK,
      
      Check for an empty value and set the result string with a proper error
      message and return -EINVAL.
      
      Before the fix:
       # mkdir /sys/fs/resctrl/p1
      
       # echo "L3:" > /sys/fs/resctrl/p1/schemata
       (silent failure)
       # cat /sys/fs/resctrl/info/last_cmd_status
       ok
      
       # echo "MB:" > /sys/fs/resctrl/p1/schemata
       (silent failure)
       # cat /sys/fs/resctrl/info/last_cmd_status
       ok
      
      After the fix:
       # mkdir /sys/fs/resctrl/p1
      
       # echo "L3:" > /sys/fs/resctrl/p1/schemata
       -bash: echo: write error: Invalid argument
       # cat /sys/fs/resctrl/info/last_cmd_status
       Missing 'L3' value
      
       # echo "MB:" > /sys/fs/resctrl/p1/schemata
       -bash: echo: write error: Invalid argument
       # cat /sys/fs/resctrl/info/last_cmd_status
       Missing 'MB' value
      
      [ Tony: This is an unintended side effect of the patch earlier to allow the
          	user to just write the value they want to change.  While allowing
          	user to specify less than all of the values, it also allows an
          	empty value. ]
      
      Fixes: c4026b7b ("x86/intel_rdt: Implement "update" mode when writing schemata file")
      Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
      2244645a
  5. 11 11月, 2017 4 次提交
    • L
      Revert "x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo" · ea0ee339
      Linus Torvalds 提交于
      This reverts commit 941f5f0f.
      
      Sadly, it turns out that we really can't just do the cross-CPU IPI to
      all CPU's to get their proper frequencies, because it's much too
      expensive on systems with lots of cores.
      
      So we'll have to revert this for now, and revisit it using a smarter
      model (probably doing one system-wide IPI at open time, and doing all
      the frequency calculations in parallel).
      Reported-by: NWANG Chao <chao.wang@ucloud.cn>
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Cc: Rafael J Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea0ee339
    • H
      s390/noexec: execute kexec datamover without DAT · d0e810ee
      Heiko Carstens 提交于
      Rebooting into a new kernel with kexec fails (system dies) if tried on
      a machine that has no-execute support. Reason for this is that the so
      called datamover code gets executed with DAT on (MMU is active) and
      the page that contains the datamover is marked as non-executable.
      Therefore when branching into the datamover an unexpected program
      check happens and afterwards the machine is dead.
      
      This can be simply avoided by disabling DAT, which also disables any
      no-execute checks, just before the datamover gets executed.
      
      In fact the first thing done by the datamover is to disable DAT. The
      code in the datamover that disables DAT can be removed as well.
      
      Thanks to Michael Holzheu and Gerald Schaefer for tracking this down.
      Reviewed-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Reviewed-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Fixes: 57d7f939 ("s390: add no-execute support")
      Cc: <stable@vger.kernel.org> # v4.11+
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      d0e810ee
    • H
      s390: fix transactional execution control register handling · a1c5befc
      Heiko Carstens 提交于
      Dan Horák reported the following crash related to transactional execution:
      
      User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
      CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
      Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      task: 00000000fafc8000 task.stack: 00000000fafc4000
      User PSW : 0705200180000000 000003ff93c14e70
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
      User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
                 0000000000000000 0000000000000002 0000000000000000 000003ff00000000
                 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
                 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
      User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
                 000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
                #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
                >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
                 000003ff93c14e74: a7080000            lhi     %r0,0
                 000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
                 000003ff93c14e7c: b2220000            ipm     %r0
                 000003ff93c14e80: 8800001c            srl     %r0,28
      
      There are several bugs with control register handling with respect to
      transactional execution:
      
      - on task switch update_per_regs() is only called if the next task has
        an mm (is not a kernel thread). This however is incorrect. This
        breaks e.g. for user mode helper handling, where the kernel creates
        a kernel thread and then execve's a user space program. Control
        register contents related to transactional execution won't be
        updated on execve. If the previous task ran with transactional
        execution disabled then the new task will also run with
        transactional execution disabled, which is incorrect. Therefore call
        update_per_regs() unconditionally within switch_to().
      
      - on startup the transactional execution facility is not enabled for
        the idle thread. This is not really a bug, but an inconsistency to
        other facilities. Therefore enable the facility if it is available.
      
      - on fork the new thread's per_flags field is not cleared. This means
        that a child process inherits the PER_FLAG_NO_TE flag. This flag can
        be set with a ptrace request to disable transactional execution for
        the current process. It should not be inherited by new child
        processes in order to be consistent with the handling of all other
        PER related debugging options. Therefore clear the per_flags field in
        copy_thread_tls().
      Reported-and-tested-by: NDan Horák <dan@danny.cz>
      Fixes: d35339a4 ("s390: add support for transactional memory")
      Cc: <stable@vger.kernel.org> # v3.7+
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a1c5befc
    • M
      s390/bpf: take advantage of stack_depth tracking · 78372709
      Michael Holzheu 提交于
      Make use of the "stack_depth" tracking feature introduced with
      commit 8726679a ("bpf: teach verifier to track stack depth") for the
      s390 JIT, so that stack usage can be reduced.
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      78372709
  6. 10 11月, 2017 14 次提交
  7. 09 11月, 2017 8 次提交
    • H
      s390: simplify transactional execution elf hwcap handling · baaf9be8
      Heiko Carstens 提交于
      Just use MACHINE_HAS_TE to decide if HWCAP_S390_TE needs
      to be added to elf_hwcap.
      Suggested-by: NDan Horák <dan@danny.cz>
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      baaf9be8
    • C
      s390/virtio: remove unused header file kvm_virtio.h · a401917b
      Christian Borntraeger 提交于
      With commit 7fb2b2d5 ("s390/virtio: remove the old KVM virtio
      transport") the pre-ccw virtio transport for s390 was removed. To
      complete the removal the uapi header file that contains the related data
      structures must also be removed.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a401917b
    • C
      x86/build: Make the boot image generation less verbose · 7980f029
      Changbin Du 提交于
      This change suppresses the 'dd' output and adds the '-quiet' parameter
      to mkisofs tool. It also removes the 'Using ...' messages, as none of the
      messages matter to the user normally.
      
      "make V=1" can still be used for a more verbose build.
      
      The new build messages are now a streamlined set of:
      
        $ make isoimage
        ...
        Kernel: arch/x86/boot/bzImage is ready  (#75)
          GENIMAGE arch/x86/boot/image.iso
        Kernel: arch/x86/boot/image.iso is ready
      Signed-off-by: NChangbin Du <changbin.du@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1510207751-22166-1-git-send-email-changbin.du@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7980f029
    • J
      x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability · 87df2617
      Jiri Kosina 提交于
      Commit 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME)
      support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
      to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.
      
      This is because once SME is enabled, sme_me_mask (which is introduced as
      EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
      causing imminent build failure for all the modules which make use of all
      the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
      remap_pfn_range(), ...).
      
      Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
      that take pgprot_t argument, while making it impossible to -- all of a
      sudden -- pass PAGE_KERNEL to it, feels rather incosistent.
      
      Restore the original behavior and make it possible to pass PAGE_KERNEL
      to all its EXPORT_SYMBOL() consumers.
      
      [ This is all so not wonderful. We shouldn't need that "sme_me_mask"
        access at all in all those places that really don't care about that
        level of detail, and just want _PAGE_KERNEL or whatever.
      
        We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
        both of which hide a "cachemode2protval()" call, and which also ends
        up using another EXPORT_SYMBOL(), but at least that only triggers for
        the much more rare cases.
      
        Maybe we could move these dynamic page table bits to be generated much
        deeper down in the VM layer, instead of hiding them in the macros that
        everybody uses.
      
        So this all would merit some cleanup. But not today.   - Linus ]
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Despised-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87df2617
    • H
      s390: avoid undefined behaviour · ead7a22e
      Heiko Carstens 提交于
      At a couple of places smatch emits warnings like this:
      
          arch/s390/mm/vmem.c:409 vmem_map_init() warn:
              right shifting more than type allows
      
      In fact shifting a signed type right is undefined. Avoid this and add
      an unsigned long cast. The shifted values are always positive.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      ead7a22e
    • H
      s390/disassembler: generate opcode tables from text file · 8bc1e4ec
      Heiko Carstens 提交于
      The current way of adding new instructions to the opcode tables is
      painful and error prone. Therefore add, similar to binutils, a text
      file which contains all opcodes and the corresponding mnemonics and
      instruction formats.
      
      A small gen_opcode_table tool then generates a header file with the
      required enums and opcode table initializers at the prepare step of
      the kernel build.
      
      This way only a simple text file has to be maintained, which can be
      rather easily extended.
      
      Unlike before where there were plenty of opcode tables and a large
      switch statement to find the correct opcode table, there is now only
      one opcode table left which contains all instructions. A second opcode
      offset table now contains offsets within the opcode table to find
      instructions which have the same opcode prefix. In order to save space
      all 1-byte opcode instructions are grouped together at the end of the
      opcode table. This is also quite similar to like it was before.
      
      In addition also move and change code and definitions within the
      disassembler. As a side effect this reduces the size required for the
      code and opcode tables by ~1.5k.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      8bc1e4ec
    • H
      s390/disassembler: remove insn_to_mnemonic() · dac6dc26
      Heiko Carstens 提交于
      insn_to_mnemonic() was introduced ages ago for KVM debugging, but is
      unused in the meantime. Therefore remove it.
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      dac6dc26
    • Y
      x86/idt: Remove X86_TRAP_BP initialization in idt_setup_traps() · d0cd64b0
      Yonghong Song 提交于
      Commit b70543a0("x86/idt: Move regular trap init to tables") moves
      regular trap init for each trap vector into a table based
      initialization. It introduced the initialization for vector X86_TRAP_BP
      which was not in the code which it replaced. This breaks uprobe
      functionality for x86_32; the probed program segfaults instead of handling
      the probe proper.
      
      The reason for this is that TRAP_BP is set up as system interrupt gate
      (DPL3) in the early IDT and then replaced by a regular interrupt gate
      (DPL0) in idt_setup_traps(). The DPL0 restriction causes the int3 trap
      to fail with a #GP resulting in a SIGSEGV of the probed program.
      
      On 64bit this does not cause a problem because the IDT entry is replaced
      with a system interrupt gate (DPL3) with interrupt stack afterwards.
      
      Remove X86_TRAP_BP from the def_idts table which is used in
      idt_setup_traps(). Remove a redundant entry for X86_TRAP_NMI in def_idts
      while at it. Tested on both x86_64 and x86_32.
      
      [ tglx: Amended changelog with a description of the root cause ]
      
      Fixes: b70543a0("x86/idt: Move regular trap init to tables")
      Reported-and-tested-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: a.p.zijlstra@chello.nl
      Cc: ast@fb.com
      Cc: oleg@redhat.com
      Cc: luto@kernel.org
      Cc: kernel-team@fb.com
      Link: https://lkml.kernel.org/r/20171108192845.552709-1-yhs@fb.com
      d0cd64b0
  8. 08 11月, 2017 3 次提交