1. 07 Nov 2015 (1 commit)
  2. 06 Jul 2015 (1 commit)
    • x86/kasan: Fix KASAN shadow region page tables · 5d5aa3cf
      Authored by Alexander Popov
      Currently, the KASAN shadow region page tables are created without
      taking the physical offset (phys_base) into account. This causes a
      kernel halt when phys_base is not zero.
      
      So let's initialize the KASAN shadow region page tables in
      kasan_early_init() using __pa_nodebug(), which takes phys_base
      into account (a C sketch of the idea follows this entry).
      
      This patch also separates x86_64_start_kernel() from KASAN low
      level details by moving kasan_map_early_shadow(init_level4_pgt)
      into kasan_early_init().
      
      Also remove the comment before clear_bss(), which no longer adds
      much to the code's readability; describing all the new ordering
      dependencies there would be too verbose.
      Signed-off-by: Alexander Popov <alpopov@ptsecurity.com>
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: <stable@vger.kernel.org> # 4.0+
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5d5aa3cf
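      A minimal C sketch of the idea, assuming simplified identifiers and
      flags modeled on the commit text (not the literal kernel source):

        /* One page worth of early shadow PTEs, all pointing at the zero page. */
        static __initdata pte_t kasan_zero_pte[PTRS_PER_PTE] __aligned(PAGE_SIZE);

        void __init kasan_early_init(void)
        {
                int i;

                /*
                 * Build the entries from a *physical* address: __pa_nodebug()
                 * accounts for phys_base, so this also works when the kernel
                 * is not loaded at physical offset zero.
                 */
                for (i = 0; i < PTRS_PER_PTE; i++)
                        kasan_zero_pte[i] =
                                __pte(__pa_nodebug(kasan_zero_page) | __PAGE_KERNEL);

                /* Hook the early shadow into the boot page tables here, so
                 * x86_64_start_kernel() needs no KASAN knowledge at all. */
                kasan_map_early_shadow(init_level4_pgt);
        }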
  3. 02 Jun 2015 (1 commit)
    • x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers · 425be567
      Authored by Andy Lutomirski
      The early_idt_handlers asm code generates an array of entry
      points spaced nine bytes apart.  It's not really clear from that
      code or from the places that reference it what's going on, and
      the code only works in the first place because GAS never
      generates two-byte JMP instructions when jumping to global
      labels.
      
      Clean up the code to generate the correct array stride (member size)
      explicitly. This should be considerably more robust against
      screw-ups, as GAS will warn if a .fill directive has a negative
      count.  Using '. =' to advance would have been even more robust
      (it would generate an actual error if it tried to move
      backwards), but it would pad with nulls, confusing anyone who
      tries to disassemble the code.  The new scheme should be much
      clearer to future readers.
      
      While we're at it, improve the comments and rename the array and
      the common code (a sketch of the C-side consumer of the renamed
      array follows this entry).
      
      Binutils may start relaxing jumps to non-weak labels.  If so,
      this change will fix our build, and we may need to backport this
      change.
      
      Before, on x86_64:
      
        0000000000000000 <early_idt_handlers>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                                5: R_X86_64_PC32        early_idt_handler-0x4
        ...
          48:   66 90                   xchg   %ax,%ax
          4a:   6a 08                   pushq  $0x8
          4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                                4d: R_X86_64_PC32       early_idt_handler-0x4
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                                11c: R_X86_64_PC32      early_idt_handler-0x4
      
      After:
      
        0000000000000000 <early_idt_handler_array>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
        ...
          48:   6a 08                   pushq  $0x8
          4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
          4f:   cc                      int3
          50:   cc                      int3
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   eb 03                   jmp    120 <early_idt_handler_common>
         11d:   cc                      int3
         11e:   cc                      int3
         11f:   cc                      int3
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Binutils <binutils@sourceware.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      425be567
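      A rough C-side sketch of how the renamed, fixed-stride array is
      consumed (the setup helper name here is illustrative, not the
      kernel's):

        /* Each entry: two 2-byte pushes plus a 5-byte jmp = 9 bytes. */
        #define EARLY_IDT_HANDLER_SIZE 9

        extern const char
                early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];

        static void __init setup_early_exception_gates(void)
        {
                int i;

                /* Entry i starts at offset i * EARLY_IDT_HANDLER_SIZE, so the
                 * stride no longer depends on GAS always emitting 5-byte jumps. */
                for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
                        set_intr_gate(i, early_idt_handler_array[i]);
        }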
  4. 24 May 2015 (1 commit)
    • x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers · cdeb6048
      Authored by Andy Lutomirski
      The early_idt_handlers asm code generates an array of entry
      points spaced nine bytes apart.  It's not really clear from that
      code or from the places that reference it what's going on, and
      the code only works in the first place because GAS never
      generates two-byte JMP instructions when jumping to global
      labels.
      
      Clean up the code to generate the correct array stride (member size)
      explicitly. This should be considerably more robust against
      screw-ups, as GAS will warn if a .fill directive has a negative
      count.  Using '. =' to advance would have been even more robust
      (it would generate an actual error if it tried to move
      backwards), but it would pad with nulls, confusing anyone who
      tries to disassemble the code.  The new scheme should be much
      clearer to future readers.
      
      While we're at it, improve the comments and rename the array and
      common code.
      
      Binutils may start relaxing jumps to non-weak labels.  If so,
      this change will fix our build, and we may need to backport this
      change.
      
      Before, on x86_64:
      
        0000000000000000 <early_idt_handlers>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                                5: R_X86_64_PC32        early_idt_handler-0x4
        ...
          48:   66 90                   xchg   %ax,%ax
          4a:   6a 08                   pushq  $0x8
          4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                                4d: R_X86_64_PC32       early_idt_handler-0x4
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                                11c: R_X86_64_PC32      early_idt_handler-0x4
      
      After:
      
        0000000000000000 <early_idt_handler_array>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
        ...
          48:   6a 08                   pushq  $0x8
          4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
          4f:   cc                      int3
          50:   cc                      int3
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   eb 03                   jmp    120 <early_idt_handler_common>
         11d:   cc                      int3
         11e:   cc                      int3
         11f:   cc                      int3
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Binutils <binutils@sourceware.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cdeb6048
  5. 17 May 2015 (1 commit)
    • x86/asm/head*.S: Change global labels to local · e839004b
      Authored by Borislav Petkov
      Make the disassembly look less confusing:
      
        -- head_64.o.before.asm
        ++ head_64.o.after.asm
         0000000000000120 <early_idt_handler>:
          120:	fc                   	cld
          121:	83 3c 24 02          	cmpl   $0x2,(%rsp)
        - 125:	0f 84 9d 00 00 00    	je     1c8 <is_nmi>
        + 125:	0f 84 9d 00 00 00    	je     1c8 <early_idt_handler+0xa8>
          12b:	83 3d 00 00 00 00 02 	cmpl   $0x2,0x0(%rip)        # 132 <early_idt_handler+0x12>
          132:	74 7e                	je     1b2 <early_idt_handler+0x92>
          134:	ff 05 00 00 00 00    	incl   0x0(%rip)        # 13a <early_idt_handler+0x1a>
        @@ -1198,9 +1198,7 @@ Disassembly of section .init.text:
          1bf:	5a                   	pop    %rdx
          1c0:	59                   	pop    %rcx
          1c1:	58                   	pop    %rax
        - 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <is_nmi>
        -
        -00000000000001c8 <is_nmi>:
        + 1c2:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # 1c8 <early_idt_handler+0xa8>
          1c8:	48 83 c4 10          	add    $0x10,%rsp
          1cc:	48 cf                	iretq
      
        -- head_32.o.before.asm
        ++ head_32.o.after.asm
         0000016c <early_idt_handler>:
          16c:  fc                      cld
          16d:  83 3c 24 02             cmpl   $0x2,(%esp)
        - 171:  74 73                   je     1e6 <is_nmi>
        + 171:  74 73                   je     1e6 <ex_entry+0xc>
          173:  36 83 3d 00 00 00 00    cmpl   $0x2,%ss:0x0
          17a:  02
          17b:  74 5a                   je     1d7 <hlt_loop>
        @@ -483,8 +483,6 @@ Disassembly of section .init.text:
          1dd:  59                      pop    %ecx
          1de:  58                      pop    %eax
          1df:  36 ff 0d 00 00 00 00    decl   %ss:0x0
        -
        -000001e6 <is_nmi>:
          1e6:  83 c4 08                add    $0x8,%esp
          1e9:  cf                      iret
          1ea:  66 90                   xchg   %ax,%ax
      
      No functionality change.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e839004b
  6. 07 Mar 2015 (1 commit)
    • x86/asm: Optimize unnecessarily wide TEST instructions · 3e1aa7cb
      Authored by Denys Vlasenko
      By the nature of the TEST operation, it is often possible to test
      a narrower part of the operand:
      
          "testl $3,  mem"  ->  "testb $3, mem",
          "testq $3, %rcx"  ->  "testb $3, %cl"
      
      This results in shorter instructions, because the TEST instruction,
      unlike other ALU ops, has no sign-extending byte-immediate form.
      
      Note that this change does not create any LCP (Length-Changing
      Prefix) stalls. Those occur when a 0x66 prefix is added for 16-bit
      immediates, which changes such TEST instructions from:
      
        [test_opcode] [modrm] [imm32]
      
      to:
      
        [0x66] [test_opcode] [modrm] [imm16]
      
      where [imm16] has a *different length* now: 2 bytes instead of 4.
      This confuses the decoder and slows down execution.
      
      REX prefixes were carefully designed to almost never hit this case:
      adding a REX prefix does not change instruction length, except for
      the MOVABS and MOV [addr],RAX instructions.
      
      This patch does not add any instructions that would use a 0x66
      prefix; the code changes in assembly are:
      
          -48 f7 07 01 00 00 00 	testq  $0x1,(%rdi)
          +f6 07 01             	testb  $0x1,(%rdi)
          -48 f7 c1 01 00 00 00 	test   $0x1,%rcx
          +f6 c1 01             	test   $0x1,%cl
          -48 f7 c1 02 00 00 00 	test   $0x2,%rcx
          +f6 c1 02             	test   $0x2,%cl
          -41 f7 c2 01 00 00 00 	test   $0x1,%r10d
          +41 f6 c2 01          	test   $0x1,%r10b
          -48 f7 c1 04 00 00 00 	test   $0x4,%rcx
          +f6 c1 04             	test   $0x4,%cl
          -48 f7 c1 08 00 00 00 	test   $0x8,%rcx
          +f6 c1 08             	test   $0x8,%cl
      
      Linus further notes:
      
         "There are no stalls from using 8-bit instruction forms.
      
          Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
          *could* cause problems if it ends up having forwarding issues, so that
          instead of just forwarding the result, you end up having to wait for
          it to be stable in the L1 cache (or possibly the register file). The
          forwarding from the store buffer is simplest and most reliable if the
          read is done at the exact same address and the exact same size as the
          write that gets forwarded.
      
          But that's true only if:
      
           (a) the write was very recent and is still in the write queue. I'm
               not sure that's the case here anyway.
      
           (b) on at least most Intel microarchitectures, you have to test a
               different byte than the lowest one (so forwarding a 64-bit write
               to a 8-bit read ends up working fine, as long as the 8-bit read
               is of the low 8 bits of the written data).
      
          A very similar issue *might* show up for registers too, not just
          memory writes, if you use 'testb' with a high-byte register (where
          instead of forwarding the value from the original producer it needs to
          go through the register file and then shifted). But it's mainly a
          problem for store buffers.
      
          But afaik, the way Denys changed the test instructions, neither of the
          above issues should be true.
      
          The real problem for store buffer forwarding tends to be "write 8
          bits, read 32 bits". That can be really surprisingly expensive,
          because the read ends up having to wait until the write has hit the
          cacheline, and we might talk tens of cycles of latency here. But
          "write 32 bits, read the low 8 bits" *should* be fast on pretty much
          all x86 chips, afaik."
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3e1aa7cb
  7. 19 Feb 2015 (1 commit)
  8. 14 Feb 2015 (1 commit)
    • x86_64: add KASan support · ef7f0d6a
      Authored by Andrey Ryabinin
      This patch adds the arch-specific code for the kernel address
      sanitizer.
      
      16TB of virtual address space is used for shadow memory. It is
      located in the range [ffffec0000000000 - fffffc0000000000], between
      vmemmap and the %esp fixup stacks (the address-to-shadow translation
      is sketched after this entry).
      
      At an early stage we map the whole shadow region with the zero page.
      Later, after pages are mapped into the direct-mapping address range,
      we unmap the zero pages from the corresponding shadow (see
      kasan_map_shadow()) and allocate and map real shadow memory, reusing
      the vmemmap_populate() function.
      
      Also replace __pa with __pa_nodebug before the shadow is
      initialized. With CONFIG_DEBUG_VIRTUAL=y, __pa makes an external
      function call (__phys_addr); __phys_addr is instrumented, so
      __asan_load could be called before the shadow area is initialized.
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef7f0d6a
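      The address-to-shadow translation implied by the layout above, as a
      sketch (the offset constant is derived from the stated range and
      should be treated as illustrative):

        /* One shadow byte describes 8 bytes of address space. */
        #define KASAN_SHADOW_SCALE_SHIFT   3
        #define KASAN_SHADOW_OFFSET        0xdffffc0000000000UL

        static inline void *kasan_mem_to_shadow(const void *addr)
        {
                /* shadow(0xffff800000000000) == 0xffffec0000000000, the
                 * start of the [ffffec0000000000 - fffffc0000000000] range. */
                return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                                + KASAN_SHADOW_OFFSET);
        }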
  9. 08 Mar 2014 (2 commits)
    • x86: fix compile error due to X86_TRAP_NMI use in asm files · b01d4e68
      Authored by Linus Torvalds
      It's an enum, not a #define, so you can't use it in asm files.
      
      Introduced in commit 5fa10196 ("x86: Ignore NMIs that come in during
      early boot"), and sadly I didn't compile-test things like I should have
      before pushing out.
      
      My weak excuse is that the x86 tree generally doesn't introduce stupid
      things like this (and the ARM pull afterwards doesn't cause me to do a
      compile-test either, since I don't cross-compile).
      
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b01d4e68
    • x86: Ignore NMIs that come in during early boot · 5fa10196
      Authored by H. Peter Anvin
      Don Zickus reports:
      
      A customer generated an external NMI using their iLO to test that
      kdump worked. Unfortunately, the machine hung. Disabling the
      nmi_watchdog made things work.
      
      I speculated the external NMI fired, caused the machine to panic (as
      expected) and the perf NMI from the watchdog came in and was latched.
      My guess was this somehow caused the hang.
      
         ----
      
      It appears that the latched NMI stays latched until the early page
      table generation on 64 bits, which causes exceptions that end in
      IRET, which re-enables NMIs. Therefore, ignore NMIs that come in
      during early execution, until we have proper exception handling.
      Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
      5fa10196
  10. 17 Jul 2013 (1 commit)
    • x86: Make sure IDT is page aligned · 4df05f36
      Authored by Kees Cook
      Since the IDT is referenced from a fixmap, make sure it is page
      aligned. Merge with the 32-bit version, since it was already aligned
      to deal with the F00F bug. Since the bss is cleared before IDT setup,
      the table can live there. This also moves the other *_idt_table
      variables into common locations (the resulting declarations are
      sketched after this entry).
      
      This avoids the risk of the IDT ever being moved in the bss and having
      the mapping be offset, resulting in calling incorrect handlers. In the
      current upstream kernel this is not a manifested bug, but heavily patched
      kernels (such as those using the PaX patch series) did encounter this bug.
      
      The tables other than idt_table technically do not need to be page
      aligned, at least not at the current time, but using a common
      declaration avoids mistakes.  On 64 bits the table is exactly one page
      long, anyway.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
      Reported-by: PaX Team <pageexec@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      4df05f36
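      A sketch of the resulting declaration style, assuming the table names
      mentioned in the commit (simplified):

        /* 256 gates of 16 bytes each: exactly one page on 64 bits. */
        gate_desc idt_table[NR_VECTORS] __page_aligned_bss;
        gate_desc debug_idt_table[NR_VECTORS] __page_aligned_bss;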
  11. 21 Jun 2013 (2 commits)
    • x86, trace: Add irq vector tracepoints · cf910e83
      Authored by Seiji Aguchi
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my use case is tracing just the local timer event
      and getting the value of the instruction pointer.
      
      I previously suggested adding an argument to the local timer event to
      get the instruction pointer, but it can also be obtained with an
      external module such as systemtap, so I no longer need to add any
      argument to the irq vector tracepoints.
      
      [Patch Description]
      
      Vaibhav's patch shared one pair of tracepoints,
      irq_vector_entry/irq_vector_exit, across all events. But the use case
      above is to trace a specific irq vector rather than all events, and in
      that case we are concerned about the overhead of unwanted events.
      
      So, add the following tracepoints instead of introducing
      irq_vector_entry/exit, so that each can be enabled independently:
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce logic to switch the IDT at enable/disable time so that
      the time penalty is zero when the tracepoints are disabled. The details
      are as follows (a sketch of the resulting load_current_idt() follows
      this entry):
       - Create the trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding logic to
         _set_gate(). It is just a copy of the original IDT.
       - Register the new tracepoint handlers in the new IDT by introducing
         macros for alloc_intr_gate(), called when the irq_vector handlers
         are registered.
       - Add a check of whether irq vector tracing is on/off to
         load_current_idt(). This has to be done below the debug check for
         these reasons:
         - Switching to the debug IDT may be triggered while tracing is
           enabled.
         - Switching to the trace IDT, on the other hand, is triggered only
           when debugging is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      cf910e83
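      A sketch of the ordering constraint described above (helper names
      follow the commit's intent and are illustrative):

        static inline void load_current_idt(void)
        {
                /* The debug check must come first: the debug IDT can be
                 * selected while tracing is enabled, whereas the trace IDT
                 * is selected only when debugging is disabled. */
                if (is_debug_idt_enabled())
                        load_debug_idt();
                else if (is_trace_idt_enabled())
                        load_trace_idt();
                else
                        load_idt((const struct desc_ptr *)&idt_descr);
        }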
    • x86: Rename variables for debugging · 629f4f9d
      Authored by Seiji Aguchi
      Rename the debugging-related variables to describe their meaning
      precisely.
      
      Also, introduce a generic way to switch the IDT based on the current
      state, debug on/off.
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      629f4f9d
  12. 29 May 2013 (1 commit)
    • x86-64, init: Fix a possible wraparound bug in switchover in head_64.S · e9d0626e
      Authored by Zhang Yanfei
      In head_64.S, a switchover has been used to handle the kernel
      crossing the 1G and 512G boundaries.
      
      And commit 8170e6be
          x86, 64bit: Use a #PF handler to materialize early mappings on demand
      said:
          During the switchover in head_64.S, before #PF handler is available,
          we use three pages to handle kernel crossing 1G, 512G boundaries with
          sharing page by playing games with page aliasing: the same page is
          mapped twice in the higher-level tables with appropriate wraparound.
      
      But from the switchover code, when we set up the PUD table:
      114         addq    $4096, %rdx
      115         movq    %rdi, %rax
      116         shrq    $PUD_SHIFT, %rax
      117         andl    $(PTRS_PER_PUD-1), %eax
      118         movq    %rdx, (4096+0)(%rbx,%rax,8)
      119         movq    %rdx, (4096+8)(%rbx,%rax,8)
      
      It seems line 119 has a potential bug. For example, if the kernel is
      loaded at physical address 511G+1008M, that is
          000000000 111111111 111111000 000000000000000000000
      and the kernel _end is 512G+2M, that is
          000000001 000000000 000000001 000000000000000000000
      then in this example, when using the 2nd page to set up the PUD
      (lines 114-119), rax is 511. In line 118, we put rdx, which is the
      address of the PMD page (the 3rd page), into entry 511 of the PUD
      table. But in line 119, the entry we calculate from
      (4096+8)(%rbx,%rax,8) is beyond the end of the PUD page. The entry
      written in line 119 should instead wrap around to entry 0 of the PUD
      table (see the C model after this entry).
      
      The patch fixes the bug.
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org> # v3.9
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      e9d0626e
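      A stand-alone C model of the indexing issue (illustrative only; the
      real fix is in the assembly quoted above):

        #include <stdio.h>

        #define PTRS_PER_PUD 512

        int main(void)
        {
                unsigned int idx  = 511;  /* PUD index of the kernel start in the example */
                unsigned int next = (idx + 1) & (PTRS_PER_PUD - 1);  /* wraps around to 0 */

                /* Without the mask, the second store would target "entry 512",
                 * i.e. one slot past the end of the 4K PUD page. */
                printf("first entry: %u, second (wrapped) entry: %u\n", idx, next);
                return 0;
        }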
  13. 03 May 2013 (1 commit)
  14. 26 Feb 2013 (1 commit)
  15. 23 Feb 2013 (1 commit)
  16. 30 Jan 2013 (1 commit)
    • x86, 64bit: Use a #PF handler to materialize early mappings on demand · 8170e6be
      Authored by H. Peter Anvin
      Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
      64-bit code has to use page tables. This makes it awkward, before we
      have properly set up all-covering page tables, to access objects that
      are outside the static kernel range.
      
      So far we have dealt with that simply by mapping a fixed amount of
      low memory, but that fails in at least two upcoming use cases:
      
      1. We will support loading and running the kernel, struct
         boot_params, ramdisk, command line, etc. above the 4 GiB mark.
      2. We need to access the ramdisk early to get microcode, so updates
         can be applied as early as possible.
      
      We could use early_iomap to access them too, but it would make the
      code messy and hard to unify with 32-bit.
      
      Hence, set up a #PF handler and use a fixed number of buffers to set
      up page tables on demand. If the buffers fill up then we simply flush
      them and start over. These buffers are all in __initdata, so they do
      not increase RAM usage at runtime (a simplified sketch of the buffer
      pool follows this entry).
      
      Thus, with the help of the #PF handler, we can build the final kernel
      mapping from scratch, and switch to init_level4_pgt later.
      
      During the switchover in head_64.S, before #PF handler is available,
      we use three pages to handle kernel crossing 1G, 512G boundaries with
      sharing page by playing games with page aliasing: the same page is
      mapped twice in the higher-level tables with appropriate wraparound.
      The kernel region itself will be properly mapped; other mappings may
      be spurious.
      
      early_make_pgtable() uses the kernel's high mapping address to access
      the pages used to set up the page tables.
      
      -v4: Add phys_base offset to make kexec happy, and add
      	init_mapping_kernel()   - Yinghai
      -v5: fix compiling with xen, and add back ident level3 and level2 for xen
           also move back init_level4_pgt from BSS to DATA again.
           because we have to clear it anyway.  - Yinghai
      -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
      -v7: remove not needed clear_page for init_level4_page
           it is with fill 512,8,0 already in head_64.S  - Yinghai
      -v8: we need to keep that handler alive until init_mem_mapping and don't
           let early_trap_init to trash that early #PF handler.
           So split early_trap_pf_init out and move it down. - Yinghai
      -v9: switchover only cover kernel space instead of 1G so could avoid
           touch possible mem holes. - Yinghai
      -v11: change far jmp back to far return to initial_code, that is needed
           to fix failure that is reported by Konrad on AMD systems.  - Yinghai
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      8170e6be
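      A heavily simplified sketch of the buffer pool (names modeled on the
      commit text; the allocator helper, flags, 2M-page handling and error
      paths are illustrative assumptions):

        #define EARLY_DYNAMIC_PAGE_TABLES 64

        static pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD] __initdata;
        static unsigned int next_early_pgt __initdata;

        /* Called from the early #PF handler when a page-table page is needed
         * for a not-yet-mapped address. */
        static pmd_t *early_alloc_pgt(void)
        {
                if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
                        /* Pool exhausted: drop the on-demand mappings and
                         * start over; only the kernel image must stay mapped. */
                        reset_early_page_tables();
                        next_early_pgt = 0;
                }
                return early_dynamic_pgts[next_early_pgt++];
        }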
  17. 15 Nov 2012 (1 commit)
  18. 09 May 2012 (1 commit)
  19. 20 Apr 2012 (3 commits)
  20. 22 Dec 2011 (1 commit)
    • x86: Keep current stack in NMI breakpoints · 228bdaa9
      Authored by Steven Rostedt
      We want to allow NMI handlers to have breakpoints, to be able to
      remove stop_machine from ftrace, kprobes and jump_labels. But if an
      NMI interrupts breakpoint processing and then triggers a breakpoint
      itself, it will switch to the breakpoint stack and corrupt the data
      there for the breakpoint processing it interrupted.
      
      Instead, have the NMI check whether it interrupted breakpoint
      processing by checking whether the stack currently in use is a
      breakpoint stack. If it is, load a special IDT that changes the IST
      for the debug exception so that it keeps using the same stack in
      kernel context. When the NMI is done, the original IDT is restored
      (a simplified sketch follows this entry).
      
      This way, if the NMI does trigger a breakpoint, it will keep
      using the same stack and not stomp on the breakpoint data for
      the breakpoint it interrupted.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      228bdaa9
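      A simplified sketch of the check (per-CPU bookkeeping and nesting
      details omitted; helper names follow the commit's intent and are
      illustrative):

        dotraplinkage notrace void do_nmi(struct pt_regs *regs, long error_code)
        {
                int swapped_idt = 0;

                /* Did this NMI interrupt breakpoint processing? */
                if (unlikely(is_debug_stack(regs->sp))) {
                        /* Load the special IDT: #DB gets IST = 0, so a
                         * breakpoint inside the NMI keeps the current stack. */
                        debug_stack_set_zero();
                        swapped_idt = 1;
                }

                nmi_enter();
                default_do_nmi(regs);
                nmi_exit();

                if (unlikely(swapped_idt))
                        debug_stack_reset();    /* back to the normal IDT */
        }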
  21. 18 Feb 2011 (1 commit)
    • x86, trampoline: Common infrastructure for low memory trampolines · 4822b7fc
      Authored by H. Peter Anvin
      Common infrastructure for low memory trampolines.  This code installs
      the trampolines permanently in low memory very early.  It also permits
      multiple pieces of code to be used for this purpose.
      
      This code also introduces a standard infrastructure for computing
      symbol addresses in the trampoline code.
      
      The only change to the actual SMP trampolines themselves is that the
      64-bit trampoline has been made reusable -- the previous version would
      overwrite the code with a status variable; this moves the status
      variable to a separate location.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <4D5DFBE4.7090104@intel.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Matthieu Castet <castet.matthieu@free.fr>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      4822b7fc
  22. 22 Jul 2010 (1 commit)
  23. 09 Feb 2010 (1 commit)
  24. 26 Nov 2009 (1 commit)
  25. 20 Oct 2009 (1 commit)
    • x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA · b9af7c0d
      Authored by Suresh Siddha
      In the first 2MB, kernel text is co-located with the kernel's static
      page tables set up by head_64.S. CONFIG_DEBUG_RODATA chops this 2MB
      large-page mapping into small 4KB pages when we mark the kernel text
      as RO, leaving the static page tables as RW.
      
      With CONFIG_DEBUG_RODATA disabled, an OLTP run on NHM-EP shows a 1%
      improvement, with a 2% reduction in system time and a 1% improvement
      in iowait idle time.
      
      To recover this, move the kernel's static page tables to the .data
      section, so that we don't have to break the first 2MB of kernel text
      into small pages with CONFIG_DEBUG_RODATA.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      b9af7c0d
  26. 21 Sep 2009 (1 commit)
  27. 19 Sep 2009 (1 commit)
    • x86: convert to use __HEAD and HEAD_TEXT macros. · 4ae59b91
      Authored by Tim Abbott
      This has the consequence of changing the section name use for head
      code from ".text.head" to ".head.text".  It also eliminates the
      ".text.head" output section (instead placing head code at the start of
      the .text output section), which should be harmless.
      
      This patch only changes the sections in the actual kernel, not those
      in the compressed boot loader (the macro definitions are shown after
      this entry for reference).
      Signed-off-by: Tim Abbott <tabbott@ksplice.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      4ae59b91
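      For reference, the two macros are roughly (as introduced around this
      series; shown here for context only):

        /* <linux/init.h> */
        #define __HEAD          .section        ".head.text","ax"

        /* <asm-generic/vmlinux.lds.h> */
        #define HEAD_TEXT       *(.head.text)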
  28. 18 Jun 2009 (1 commit)
  29. 25 Feb 2009 (2 commits)
  30. 09 Feb 2009 (1 commit)
  31. 20 Jan 2009 (2 commits)
  32. 16 Jan 2009 (3 commits)
    • x86: make pda a percpu variable · b12d8db8
      Authored by Tejun Heo
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      As the pda is now allocated in the percpu area, it can easily be made
      a proper percpu variable. Make it so by defining the per-cpu symbol
      from the linker script and declaring it in C code for SMP, and by
      simply defining it for UP. This change cleans up the code and brings
      SMP and UP a bit closer together.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b12d8db8
    • x86: fold pda into percpu area on SMP · 1a51e3a0
      Authored by Tejun Heo
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently, pdas and percpu areas are allocated separately. %gs points
      to the local pda, and the percpu area can be reached using
      pda->data_offset. This patch folds the pda into the percpu area.
      
      Due to a peculiar gcc requirement, the pda needs to be at the
      beginning of the percpu area so that pda->stack_canary is at %gs:40.
      To achieve this, a new percpu output section macro -
      PERCPU_VADDR_PREALLOC() - is added and used to reserve a pda-sized
      chunk at the start of the percpu area.
      
      After this change, for the boot cpu, %gs first points to the pda in
      the data.init area and is later updated during setup_per_cpu_areas()
      to point to the actual pda. This means that setup_per_cpu_areas()
      needs to reload %gs for CPU0, while clearing the pda area only for the
      other cpus, as cpu0 has already modified its pda by the time control
      reaches setup_per_cpu_areas().
      
      This patch also removes the now-unnecessary get_local_pda() and its
      call sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1a51e3a0
    • x86: load pointer to pda into %gs while bringing up a CPU · f32ff538
      Authored by Tejun Heo
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      The CPU startup code in head_64.S loaded the address of a zero page
      into %gs for temporary use until the pda was loaded, but the address
      of the actual pda is already available at that point. Load the real
      address directly instead.
      
      This will help unify percpu and pda handling later on.
      
      This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      f32ff538