1. 02 11月, 2017 9 次提交
    • R
      x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file · 32542ee2
      Ricardo Neri 提交于
      Other kernel submodules can benefit from using the utility functions
      defined in mpx.c to obtain the addresses and values of operands contained
      in the general purpose registers. An instance of this is the emulation code
      used for instructions protected by the Intel User-Mode Instruction
      Prevention feature.
      
      Thus, these functions are relocated to a new insn-eval.c file. The reason
      to not relocate these utilities into insn.c is that the latter solely
      analyses instructions given by a struct insn without any knowledge of the
      meaning of the values of instruction operands. This new utility insn-
      eval.c aims to be used to resolve userspace linear addresses based on
      the contents of the instruction operands as well as the contents of pt_regs
      structure.
      
      These utilities come with a separate header. This is to avoid taking insn.c
      out of sync from the instructions decoders under tools/obj and tools/perf.
      This also avoids adding cumbersome #ifdef's for the #include'd files
      required to decode instructions in a kernel context.
      
      Functions are simply relocated. There are not functional or indentation
      changes.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-10-git-send-email-ricardo.neri-calderon@linux.intel.com
      32542ee2
    • R
      x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0 · 4578f06f
      Ricardo Neri 提交于
      Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
      Developer's Manual volume 2A states that if a SIB byte is used and
      SIB.base is 101b and ModRM.mod is zero, then the base part of the base
      part of the effective address computation is null. To signal this
      situation, a -EDOM error is returned to indicate callers to ignore the
      base value present in the register operand.
      
      In this scenario, a 32-bit displacement follows the SIB byte. Displacement
      is obtained when the instruction decoder parses the operands.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Adan Hawthorn <adanhawthorn@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Nathan Howard <liverlint@gmail.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-9-git-send-email-ricardo.neri-calderon@linux.intel.com
      4578f06f
    • R
      x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b · ff9d7802
      Ricardo Neri 提交于
      Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
      Developer's Manual volume 2A states that when ModRM.mod !=11b and
      ModRM.rm = 100b indexed register-indirect addressing is used. In other
      words, a SIB byte follows the ModRM byte. In the specific case of
      SIB.index = 100b, the scale*index portion of the computation of the
      effective address is null. To signal callers of this particular situation,
      get_reg_offset() can return -EDOM (-EINVAL continues to indicate that an
      error when decoding the SIB byte).
      
      An example of this situation can be the following instruction:
      
         8b 4c 23 80       mov -0x80(%rbx,%riz,1),%rcx
         ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
         SIB:              0x23 [scale:0b][index:100b][base:11b]
         Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
      
      The %riz 'register' indicates a null index.
      
      In long mode, a REX prefix may be used. When a REX prefix is present,
      REX.X adds a fourth bit to the register selection of SIB.index. This gives
      the ability to refer to all the 16 general purpose registers. When REX.X is
      1b and SIB.index is 100b, the index is indicated in %r12. In our example,
      this would look like:
      
         42 8b 4c 23 80    mov -0x80(%rbx,%r12,1),%rcx
         REX:              0x42 [W:0b][R:0b][X:1b][B:0b]
         ModRM:            0x4c [mod:1b][reg:1b][rm:100b]
         SIB:              0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
         Displacement:     0x80  (1-byte, as per ModRM.mod = 1b)
      
      %r12 is a valid register to use in the scale*index part of the effective
      address computation.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Adan Hawthorn <adanhawthorn@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Nathan Howard <liverlint@gmail.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-8-git-send-email-ricardo.neri-calderon@linux.intel.com
      ff9d7802
    • R
      x86/mpx: Use signed variables to compute effective addresses · b8d2eff3
      Ricardo Neri 提交于
      Even though memory addresses are unsigned, the operands used to compute the
      effective address do have a sign. This is true for ModRM.rm, SIB.base,
      SIB.index as well as the displacement bytes. Thus, signed variables shall
      be used when computing the effective address from these operands. Once the
      signed effective address has been computed, it is casted to an unsigned
      long to determine the linear address.
      
      Variables are renamed to better reflect the type of address being
      computed.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Adan Hawthorn <adanhawthorn@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Nathan Howard <liverlint@gmail.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-7-git-send-email-ricardo.neri-calderon@linux.intel.com
      b8d2eff3
    • R
      x86/mpx: Simplify handling of errors when computing linear addresses · b15d70df
      Ricardo Neri 提交于
      When errors occur in the computation of the linear address, -1L is
      returned. Rather than having a separate return path for errors, the
      variable used to return the computed linear address can be initialized
      with the error value. Hence, only one return path is needed. This makes
      the function easier to read.
      
      While here, ensure that the error value is -1L, a 64-bit value, rather
      than -1, a 32-bit value.
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Adan Hawthorn <adanhawthorn@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Nathan Howard <liverlint@gmail.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-6-git-send-email-ricardo.neri-calderon@linux.intel.com
      b15d70df
    • R
      uprobes/x86: Use existing definitions for segment override prefixes · ed40a104
      Ricardo Neri 提交于
      Rather than using hard-coded values of the segment override prefixes,
      leverage the existing definitions provided in inat.h.
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-5-git-send-email-ricardo.neri-calderon@linux.intel.com
      ed40a104
    • R
      ptrace,x86: Make user_64bit_mode() available to 32-bit builds · e27c310a
      Ricardo Neri 提交于
      In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
      is selected. This implies that code built with CONFIG_X86_64=n cannot use
      it. If a piece of code needs to be built for both CONFIG_X86_64=y and
      CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
      an #ifdef/#endif; potentially, in multiple places.
      
      This can be easily avoided with a single #ifdef/#endif pair within
      user_64bit_mode() itself.
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-4-git-send-email-ricardo.neri-calderon@linux.intel.com
      e27c310a
    • R
      x86/boot: Relocate definition of the initial state of CR0 · b0ce5b8c
      Ricardo Neri 提交于
      Both head_32.S and head_64.S utilize the same value to initialize the
      control register CR0. Also, other parts of the kernel might want to access
      this initial definition (e.g., emulation code for User-Mode Instruction
      Prevention uses this state to provide a sane dummy value for CR0 when
      emulating the smsw instruction). Thus, relocate this definition to a
      header file from which it can be conveniently accessed.
      Suggested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: linux-mm@kvack.org
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: linux-arch@vger.kernel.org
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/1509135945-13762-3-git-send-email-ricardo.neri-calderon@linux.intel.com
      b0ce5b8c
    • R
      x86/mm: Relocate page fault error codes to traps.h · 1067f030
      Ricardo Neri 提交于
      Up to this point, only fault.c used the definitions of the page fault error
      codes. Thus, it made sense to keep them within such file. Other portions of
      code might be interested in those definitions too. For instance, the User-
      Mode Instruction Prevention emulation code will use such definitions to
      emulate a page fault when it is unable to successfully copy the results
      of the emulated instructions to user space.
      
      While relocating the error code enumeration, the prefix X86_ is used to
      make it consistent with the rest of the definitions in traps.h. Of course,
      code using the enumeration had to be updated as well. No functional changes
      were performed.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1067f030
  2. 27 10月, 2017 1 次提交
    • I
      Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar 提交于
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
  3. 24 10月, 2017 1 次提交
  4. 23 10月, 2017 2 次提交
  5. 22 10月, 2017 1 次提交
  6. 20 10月, 2017 1 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  7. 18 10月, 2017 4 次提交
  8. 17 10月, 2017 1 次提交
  9. 16 10月, 2017 1 次提交
  10. 14 10月, 2017 2 次提交
    • B
      x86/microcode: Do the family check first · 1f161f67
      Borislav Petkov 提交于
      On CPUs like AMD's Geode, for example, we shouldn't even try to load
      microcode because they do not support the modern microcode loading
      interface.
      
      However, we do the family check *after* the other checks whether the
      loader has been disabled on the command line or whether we're running in
      a guest.
      
      So move the family checks first in order to exit early if we're being
      loaded on an unsupported family.
      Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.11..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
      Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f161f67
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  11. 12 10月, 2017 4 次提交
  12. 11 10月, 2017 1 次提交
  13. 10 10月, 2017 11 次提交
  14. 09 10月, 2017 1 次提交