1. 29 Apr 2016, 1 commit
    • x86/boot: Move compressed kernel to the end of the decompression buffer · 974f221c
      Authored by Yinghai Lu
      This change makes later calculations about where the kernel is located
      easier to reason about. To better understand this change, we must first
      clarify what 'VO' and 'ZO' are. These values were introduced in commits
      by hpa:
      
        77d1a499 ("x86, boot: make symbols from the main vmlinux available")
        37ba7ab5 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
      
      Specifically:
      
      All names prefixed with 'VO_':
      
       - relate to the uncompressed kernel image
      
       - the size of the VO image is: VO__end - VO__text ("VO_INIT_SIZE" define)
      
      All names prefixed with 'ZO_':
      
       - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
         which is composed of the following memory areas:
           - head text
           - compressed kernel (VO image and relocs table)
           - decompressor code
      
       - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
      
      The 'INIT_SIZE' value is used to find the larger of the two image sizes:
      
       #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
       #define VO_INIT_SIZE    (VO__end - VO__text)
      
       #if ZO_INIT_SIZE > VO_INIT_SIZE
       # define INIT_SIZE ZO_INIT_SIZE
       #else
       # define INIT_SIZE VO_INIT_SIZE
       #endif
      
      The current code uses extract_offset to decide where to position the
      copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
      currently includes the extract_offset.)
      
      Why does z_extract_offset exist? It's needed because we are trying to minimize
      the amount of RAM used for the whole act of creating an uncompressed, executable,
      properly relocation-linked kernel image in system memory. We do this so that
      kernels can be booted on even very small systems.
      
      To achieve the goal of minimal memory consumption we have implemented an in-place
      decompression strategy: instead of cleanly separating the VO and ZO images and
      also allocating some memory for the decompression code's runtime needs, we instead
      create this elaborate layout of memory buffers where the output (decompressed)
      stream, as it progresses, overlaps with and destroys the input (compressed)
      stream. This can only be done safely if the ZO image is placed to the end of the
      VO range, plus a certain amount of safety distance to make sure that when the last
      bytes of the VO range are decompressed, the compressed stream pointer is safely
      beyond the end of the VO range.
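
      The invariant can be sketched in C as follows. This is illustrative
      only: decompress_step() and error() are hypothetical stand-ins here,
      not kernel APIs; the point is just the pointer ordering.

        /* In-place decompression is safe as long as the write (output)
         * cursor never overtakes the read (input) cursor. */
        extern int decompress_step(unsigned char **in, unsigned char **out);
        extern void error(const char *msg);

        static void decompress_in_place(unsigned char *vo_start,
                                        unsigned char *zo_payload)
        {
                unsigned char *in  = zo_payload; /* compressed input, near buffer end */
                unsigned char *out = vo_start;   /* decompressed output, buffer start */

                while (!decompress_step(&in, &out)) /* advances both cursors */
                        if (out > in)
                                error("output overran not-yet-read input");
        }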
      
      z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
      the build process, at a point when we know the exact compressed and
      uncompressed size of the kernel images and can calculate this safe minimum
      offset value. (Note that the mkpiggy.c calculation is not perfect, because
      we don't know the decompressor used at that stage, so the z_extract_offset
      calculation is necessarily imprecise and is mostly based on gzip internals -
      we'll improve that in the next patch.)
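
      For reference, the heuristic in mkpiggy.c at this point looked
      essentially like the following, with ilen/olen being the compressed and
      uncompressed sizes (reproduced from memory, so treat it as a sketch):

        static unsigned long z_extract_offset_sketch(unsigned long ilen,
                                                     unsigned long olen)
        {
                unsigned long offs;

                offs = (olen > ilen) ? olen - ilen : 0; /* worst-case growth */
                offs += olen >> 12;             /* gzip: 8 bytes per 32K block */
                offs += 64 * 1024 + 128;        /* slack */
                return (offs + 4095) & ~4095UL; /* round to a 4K boundary */
        }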
      
      When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
      the copied ZO occupies the memory from extract_offset to the end of the
      decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
      like this:
      
                                  |-----compressed kernel image------|
                                  V                                  V
      0                       extract_offset                      +INIT_SIZE
      |-----------|---------------|-------------------------|--------|
                  |               |                         |        |
                VO__text      startup_32 of ZO          VO__end    ZO__end
                  ^                                         ^
                  |-------uncompressed kernel image---------|
      
      When INIT_SIZE is equal to VO_INIT_SIZE (the likely case), there is still
      space left from the end of ZO to the end of the decompression buffer, like
      below.
      
                                  |-compressed kernel image-|
                                  V                         V
      0                       extract_offset                      +INIT_SIZE
      |-----------|---------------|-------------------------|--------|
                  |               |                         |        |
                VO__text      startup_32 of ZO          ZO__end    VO__end
                  ^                                                  ^
                  |------------uncompressed kernel image-------------|
      
      To simplify calculations and avoid special cases, it is cleaner to
      always place the compressed kernel image in memory so that ZO__end
      is at the end of the decompression buffer, instead of placing it at
      extract_offset as is currently done.
      
      This patch adds BP_init_size (which is the INIT_SIZE as passed in from
      the boot_params) into asm-offsets.c to make it visible to the assembly
      code.
      
      Then, when moving the ZO, it calculates the starting position of
      the copied ZO (via BP_init_size and the ZO run size) so that ZO__end
      will be at the end of the decompression buffer. To make the position
      calculation safe, the end of ZO is page aligned (and a comment is added
      to the existing VO alignment for good measure).
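
      In C terms, the computed copy destination is roughly the following
      (a sketch only: the real computation is boot assembly driven by
      BP_init_size, and the names follow the ZO_/VO_ convention above):

        static unsigned long zo_copy_dest(unsigned long buf_start,
                                          unsigned long init_size,
                                          unsigned long zo_size)
        {
                /* zo_size = ZO__end - ZO_startup_32, page aligned by the
                 * linker script change; ZO__end lands exactly at buf_end. */
                unsigned long buf_end = buf_start + init_size;

                return buf_end - zo_size;
        }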
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Rewrote changelog and comments. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
      [ Rewrote the changelog some more. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 26 Mar 2016, 1 commit
  3. 21 Mar 2016, 1 commit
    • x86/kallsyms: fix GOLD link failure with new relative kallsyms table format · 142b9e6c
      Authored by Ard Biesheuvel
      Commit 2213e9a6 ("kallsyms: add support for relative offsets in
      kallsyms address table") changed the default kallsyms symbol table
      format to use relative references rather than absolute addresses.
      
      This reduces the size of the kallsyms symbol table by 50% on 64-bit
      architectures, and further reduces the size of the relocation tables
      used by relocatable kernels.  Since the memory footprint of the static
      kernel image is always much smaller than 4 GB, these relative references
      are assumed to be representable in 32 bits, even when the native word
      size is 64 bits.
      
      On 64-bit architectures, this obviously only works if the distance
      between each relative reference and the chosen anchor point is
      representable in 32 bits, and so the table generation code in
      scripts/kallsyms.c scans the table for the lowest value that is covered
      by the kernel text, and selects it as the anchor point.
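
      The resulting layout can be sketched like this (illustrative; the real
      generated table uses signed entries and a few more tricks):

        #include <stdint.h>

        /* one 64-bit anchor plus 32-bit offsets, instead of a full
         * 64-bit address per symbol */
        extern uint64_t kallsyms_relative_base; /* the anchor point */
        extern uint32_t kallsyms_offsets[];

        static uint64_t sym_address(unsigned int i)
        {
                return kallsyms_relative_base + kallsyms_offsets[i];
        }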
      
      However, when using the GOLD linker rather than the default BFD linker
      to build the x86_64 kernel, the symbol phys_offset_64, which is the
      result of arithmetic defined in the linker script, is emitted as a 'T'
      rather than an 'A' type symbol, causing scripts/kallsyms.c to
      mistake it for a suitable anchor point, even though it is far away from
      the actual kernel image in the virtual address space.  This results in
      out-of-range warnings from scripts/kallsyms.c and a broken build.
      
      So let's align with the BFD linker, and emit the phys_offset_[32|64]
      symbols as absolute symbols explicitly.  Note that the out-of-range
      issue does not exist on 32-bit x86, but this patch changes both symbols
      for symmetry.
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 Feb 2016, 1 commit
  5. 22 Feb 2016, 1 commit
  6. 30 Jan 2016, 1 commit
  7. 29 Nov 2015, 1 commit
  8. 11 Sep 2015, 1 commit
    • kexec: split kexec_load syscall from kexec core code · 2965faa5
      Authored by Dave Young
      There are two kexec load syscalls: kexec_load and kexec_file_load.
      kexec_file_load has already been split out into kernel/kexec_file.c.  In
      this patch I split the kexec_load syscall code out into kernel/kexec.c.

      Also add a new kconfig option, KEXEC_CORE, so we can disable kexec_load
      and use kexec_file_load only, or vice versa.

      The original requirement came from Ted Ts'o: he wants the kexec kernel
      signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled, but
      kexec-tools can bypass that check by using the kexec_load syscall.

      Vivek Goyal proposed creating a common kconfig option so users can
      compile in only one syscall for loading a kexec kernel.  KEXEC/KEXEC_FILE
      select KEXEC_CORE so that old config files still work.

      Because there is generic code that needs CONFIG_KEXEC_CORE, I updated all
      the architecture Kconfigs with the new KEXEC_CORE option and made KEXEC
      select KEXEC_CORE in each arch Kconfig.  Also updated the generic kernel
      code that refers to the kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 05 Nov 2014, 1 commit
    • x86-64: Use RIP-relative addressing for most per-CPU accesses · 97b67ae5
      Authored by Jan Beulich
      Observing that per-CPU data (in the SMP case) is reachable by
      exploiting 64-bit address wraparound (building on the default kernel
      load address being at 16 MB), the one-byte-shorter RIP-relative
      addressing form can be used for most per-CPU accesses. The one
      exception is the "stable" reads, where the use of the "P" operand
      modifier prevents the compiler from using RIP-relative addressing, but
      is unavoidable due to the use of the "p" constraint (side note: with
      gcc 4.9.x the intended effect of this isn't being achieved anymore,
      see gcc bug 63637).
      
      With the dependency on the minimum kernel load address, arbitrarily
      low values for CONFIG_PHYSICAL_START are no longer possible. A
      link-time assertion is added which, when it triggers, points to the
      need to increase that value.
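
      A hedged sketch of the accessor pattern (not the kernel's actual
      percpu.h macros): an "m" operand lets the compiler emit the RIP-relative
      form, e.g. 'mov %gs:var(%rip),%rax', which is one byte shorter than the
      32-bit absolute 'mov %gs:var,%rax' that the "P"/"p" combination forces.

        #define percpu_read_sketch(var) ({                        \
                typeof(var) val_;                                 \
                asm("mov %%gs:%1, %0" : "=r" (val_) : "m" (var)); \
                val_; })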
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 19 Mar 2014, 2 commits
  11. 18 Oct 2013, 1 commit
  12. 11 Mar 2013, 1 commit
  13. 09 May 2012, 1 commit
  14. 11 Aug 2011, 1 commit
  15. 05 Aug 2011, 2 commits
  16. 15 Jul 2011, 1 commit
  17. 06 Jun 2011, 3 commits
  18. 24 May 2011, 1 commit
    • x86-64: Clean up vdso/kernel shared variables · 8c49d9a7
      Authored by Andy Lutomirski
      Variables that are shared between the vdso and the kernel are
      currently a bit of a mess.  They are each defined with their own
      magic, they are accessed differently in the kernel, the vsyscall page,
      and the vdso, and one of them (vsyscall_clock) doesn't even really
      exist.
      
      This changes them all to use a common mechanism.  All of them are
      declared in vvar.h with a fixed address (validated by the linker
      script).  In the kernel (as before), they look like ordinary
      read-write variables.  In the vsyscall page and the vdso, they are
      accessed through a new macro VVAR, which gives read-only access.
      
      The vdso is now loaded verbatim into memory without any fixups.  As a
      side bonus, access from the vdso is faster because a level of
      indirection is removed.
      
      While we're at it, pack jiffies and vgetcpu_mode into the same
      cacheline.
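
      The mechanism can be sketched as follows; this is a simplified sketch,
      and the base address and the jiffies example are illustrative rather
      than the exact vvar.h contents:

        #define VVAR_ADDRESS (0xffffffffff600800UL)       /* illustrative */

        #define DECLARE_VVAR(offset, type, name)                \
                static type const * const vvaraddr_##name =     \
                        (void *)(VVAR_ADDRESS + (offset));

        #define VVAR(name) (*vvaraddr_##name)

        DECLARE_VVAR(0, volatile unsigned long, jiffies)
        /* vdso/vsyscall code reads VVAR(jiffies), read-only; the kernel
         * proper still sees an ordinary read-write variable. */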
      Signed-off-by: Andy Lutomirski <luto@mit.edu>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Borislav Petkov <bp@amd64.org>
      Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  19. 22 May 2011, 1 commit
  20. 25 Mar 2011, 1 commit
    • percpu: Always align percpu output section to PAGE_SIZE · 0415b00d
      Authored by Tejun Heo
      The percpu allocator honors alignment requests up to PAGE_SIZE, and both the
      percpu addresses in the percpu address space and the translated kernel
      addresses should be aligned accordingly.  The calculation of the
      former depends on the alignment of percpu output section in the kernel
      image.
      
      The linker script macros PERCPU_VADDR() and PERCPU() are used to
      define this output section, and the latter takes an @align parameter.
      Several architectures use an @align smaller than PAGE_SIZE, breaking
      percpu memory alignment.
      
      This patch removes @align parameter from PERCPU(), renames it to
      PERCPU_SECTION() and makes it always align to PAGE_SIZE.  While at it,
      add PCPU_SETUP_BUG_ON() checks such that alignment problems are
      reliably detected and remove percpu alignment comment recently added
      in workqueue.c as the condition would trigger BUG way before reaching
      there.
      
      For um, this patch raises the alignment of the percpu area.  As the area
      is in .init, there shouldn't be any noticeable difference.
      
      This problem was discovered by David Howells while debugging boot
      failure on mn10300.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Cc: uclinux-dist-devel@blackfin.uclinux.org
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: user-mode-linux-devel@lists.sourceforge.net
  21. 09 Mar 2011, 1 commit
    • x86: Separate out entry text section · ea714547
      Authored by Jiri Olsa
      Put x86 entry code into a separate link section: .entry.text.
      
      Separating the entry text section seems to have performance
      benefits - caused by more efficient instruction cache usage.
      
      Running hackbench with perf stat --repeat showed that the change
      compresses the icache footprint. The icache load miss rate went
      down by about 15%:
      
       before patch:
               19417627  L1-icache-load-misses      ( +-   0.147% )
      
       after patch:
               16490788  L1-icache-load-misses      ( +-   0.180% )
      
      The motivation of the patch was to fix a particular kprobes
      bug that relates to the entry text section, the performance
      advantage was discovered accidentally.
      
      Whole perf output follows:
      
       - results for current tip tree:
      
        Performance counter stats for './hackbench/hackbench 10' (500 runs):
      
               19417627  L1-icache-load-misses      ( +-   0.147% )
             2676914223  instructions             #      0.497 IPC     ( +- 0.079% )
             5389516026  cycles                     ( +-   0.144% )
      
            0.206267711  seconds time elapsed   ( +-   0.138% )
      
       - results for current tip tree with the patch applied:
      
        Performance counter stats for './hackbench/hackbench 10' (500 runs):
      
               16490788  L1-icache-load-misses      ( +-   0.180% )
             2717734941  instructions             #      0.502 IPC     ( +- 0.079% )
             5414756975  cycles                     ( +-   0.148% )
      
            0.206747566  seconds time elapsed   ( +-   0.137% )
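
      For reference, a C-level sketch of what the dedicated section means (the
      real entry code is assembly and uses a '.section .entry.text' directive;
      the attribute form below is illustrative):

        #define __entry_text __attribute__((__section__(".entry.text")))

        __entry_text static void example_entry_stub(void)
        {
                /* entry-path code gets packed here, next to its peers,
                 * which is what improves icache locality */
        }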
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: masami.hiramatsu.pt@hitachi.com
      Cc: ananth@in.ibm.com
      Cc: davem@davemloft.net
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 18 Feb 2011, 1 commit
    • x86, trampoline: Common infrastructure for low memory trampolines · 4822b7fc
      Authored by H. Peter Anvin
      Common infrastructure for low memory trampolines.  This code installs
      the trampolines permanently in low memory very early.  It also permits
      multiple pieces of code to be used for this purpose.
      
      This code also introduces a standard infrastructure for computing
      symbol addresses in the trampoline code.
      
      The only change to the actual SMP trampolines themselves is that the
      64-bit trampoline has been made reusable -- the previous version would
      overwrite the code with a status variable; this moves the status
      variable to a separate location.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <4D5DFBE4.7090104@intel.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Matthieu Castet <castet.matthieu@free.fr>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
  23. 10 Feb 2011, 1 commit
  24. 25 Jan 2011, 1 commit
    • percpu: align percpu readmostly subsection to cacheline · 19df0c2f
      Authored by Tejun Heo
      Currently the percpu readmostly subsection may share cachelines with
      other percpu subsections, which may result in unnecessary cacheline
      bouncing and performance degradation.
      
      This patch adds a @cacheline parameter to the PERCPU() and PERCPU_VADDR()
      linker macros, makes each arch linker script specify its cacheline
      size, and uses it to align percpu subsections.
      
      This is based on Shaohua's x86 only patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Shaohua Li <shaohua.li@intel.com>
  25. 19 Jan 2011, 1 commit
  26. 18 Jan 2011, 1 commit
    • x86: Make relocatable kernel work with new binutils · 86b1e8dd
      Authored by Shaohua Li
      The CONFIG_RELOCATABLE=y option is broken with new binutils, which makes
      the kernel panic at boot.
      
      According to Lu Hongjiu, the affected binutils versions are from 2.20.51.0.12
      to 2.21.51.0.3, which were released since Oct 22. At least Ubuntu 10.10
      ships such binutils. See:
      
          http://sourceware.org/bugzilla/show_bug.cgi?id=12327
      
      The reason for the boot panic is that we have 'jiffies = jiffies_64;' in
      vmlinux.lds.S, so jiffies isn't in any section. During the kernel build
      there is a warning saying that jiffies is an absolute address and can't
      be relocated. At runtime, jiffies ends up with virtual address 0.
      
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Lu Hongjiu <hongjiu.lu@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  27. 18 Nov 2010, 1 commit
    • x86: Add NX protection for kernel data · 5bd5a452
      Authored by Matthieu Castet
      This patch expands the functionality of CONFIG_DEBUG_RODATA to mark the
      main (static) kernel data area as NX.
      
      The following steps are taken to achieve this:
      
       1. The linker script is adjusted so .text always starts and ends on a page boundary
       2. The linker script is adjusted so .rodata always starts and ends on a page boundary
       3. NX is set for all pages from _etext through _end in mark_rodata_ro (see the sketch below)
       4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
       5. The BIOS ROM is set executable (x) when pcibios is used.
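
      A sketch of step 3 in C (simplified; header locations and helper names
      vary across kernel versions):

        #include <linux/mm.h>           /* PFN_ALIGN(), PAGE_SHIFT */
        #include <asm/cacheflush.h>     /* set_memory_nx() */

        extern char _etext[], _end[];

        static void mark_nxdata_nx_sketch(void)
        {
                unsigned long start = PFN_ALIGN(_etext);
                unsigned long size  = PFN_ALIGN(_end) - start;

                set_memory_nx(start, size >> PAGE_SHIFT);
        }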
      
      The results of patch application may be observed in the diff of kernel page
      table dumps:
      
      pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
       +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      No pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc0100000           1M     RW             GLB NX pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      The patch was originally developed for Linux 2.6.34-rc2 x86 by
      Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
      
       -v1:  initial patch for 2.6.30
       -v2:  patch for 2.6.31-rc7
       -v3:  moved all code into arch/x86, adjusted credits
       -v4:  fixed ifdef, removed credits from CREDITS
       -v5:  fixed an address calculation bug in mark_nxdata_nx()
       -v6:  added acked-by and PT dump diff to commit log
       -v7:  minor adjustments for -tip
       -v8:  rework with the merge of "Set first MB as RW+NX"
      Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
      Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
      Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <4CE2F82E.60601@free.fr>
      [ minor cleanliness edits ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  28. 07 Sep 2010, 1 commit
  29. 31 Aug 2010, 1 commit
    • x86, iommu: Fix IOMMU_INIT alignment rules · 7ac41ccf
      Authored by Konrad Rzeszutek Wilk
      This boot crash was observed:
      
       DMA-API: preallocated 32768 debug entries
       DMA-API: debugging enabled by kernel config
       BUG: unable to handle kernel paging request at 19da8955
       IP: [<f4ffffff>] 0xf4ffffff
       *pde = 00000000
      
      The crux of the failure was that even if we did not use the
      .iommu_table section at all, the linker would still insert it
      into the vmlinux file. This patch fixes that, and also fixes the
      runtime crash where we would try to access the array.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  30. 28 Aug 2010, 1 commit
  31. 27 Aug 2010, 1 commit
    • x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure · 0444ad93
      Authored by Konrad Rzeszutek Wilk
      This patch set adds a mechanism to "modularize" the IOMMUs we have
      on x86. Currently there are up to six IOMMUs, and they have a complex
      relationship that requires careful execution order. 'pci_iommu_alloc'
      does that today, but most folks are unhappy with how it does it.
      This patch set addresses this and also paves the way for jettisoning
      unused IOMMUs during run-time. For the details that sparked this, please
      refer to: http://lkml.org/lkml/2010/8/2/282
      
      The first solution that comes to mind is to convert the IOMMU detection
      routines wholesale to be called during the initcall time frame.
      Unfortunately that misses the dependency relationships that some of the
      IOMMUs have (for example: for the AMD-Vi IOMMU to work, GART detection
      MUST run first, and before all of that SWIOTLB MUST run).
      
      The second solution would be to introduce a registration call wherein
      the IOMMU would provide its detection/init routines as well as what
      MUST run before it. That would work, except that 'pci_iommu_alloc',
      which would run through this list, is called during mem_init. This means we
      don't have any memory allocator, and it is so early that we haven't yet
      started running through the initcall_t list.
      
      This solution borrows concepts from the 2nd idea and from how
      MODULE_INIT works. A macro is provided that each IOMMU uses to define
      its detection function and early_init (run before the memory allocator is
      active), as well as which other IOMMU MUST run before it.  Since most IOMMUs
      depend on having SWIOTLB run first ("pci_swiotlb_detect"), a convenience
      macro that depends on it is also provided.
      
      This macro is similar in design to the MODULE_PARAM macro: we set up
      a .iommu_table section and populate it with values that match a
      struct iommu_table_entry. During bootup we sort the array so that the
      IOMMUs that MUST run earlier come first, and then we just iterate
      through it, calling the detection routine and, if appropriate, the
      init routines.
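
      A hedged sketch of the resulting structures (field names simplified
      from the description above, not quoted from the patch):

        struct iommu_table_entry {
                int  (*detect)(void);     /* nonzero if this IOMMU is present */
                int  (*depend)(void);     /* detect routine that MUST run first */
                void (*early_init)(void); /* no memory allocator available yet */
                void (*late_init)(void);  /* may allocate memory */
                int    flags;
        };

        /* Each IOMMU drops one entry into the .iommu_table section: */
        #define IOMMU_INIT_SKETCH(det, dep, early, late)              \
                static const struct iommu_table_entry                 \
                __iommu_entry_##det                                   \
                __attribute__((used, section(".iommu_table"),         \
                               aligned(sizeof(void *)))) =            \
                        { det, dep, early, late, 0 }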
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  32. 30 Mar 2010, 1 commit
    • x86: Make smp_locks end with page alignment · 596b711e
      Authored by Yinghai Lu
      Fix:
      
       ------------[ cut here ]------------
       WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
       free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
       Modules linked in:
       Pid: 0, comm: swapper Not tainted
       2.6.34-rc2-tip-03946-g4f16b23-dirty #50 Call Trace:
        [<40232e9f>] warn_slowpath_common+0x65/0x7c
        [<4021c9f0>] ? free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40232eea>] warn_slowpath_fmt+0x24/0x27
        [<4021c9f0>] free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40d3f4bd>] alternative_instructions+0xf6/0x100
        [<40d3fe4f>] check_bugs+0xbd/0xbf
        [<40d398a7>] start_kernel+0x2d5/0x2e4
        [<40d390ce>] i386_start_kernel+0xce/0xd5
       ---[ end trace 4eaa2a86a8e2da22 ]---
      
      Comments in vmlinux.lds.S already said:
      
       |        /*
       |         * smp_locks might be freed after init
       |         * start/end must be page aligned
       |         */
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  33. 03 Mar 2010, 2 commits
  34. 15 Dec 2009, 1 commit
    • x86: Regex support and known-movable symbols for relocs, fix _end · 873b5271
      Authored by H. Peter Anvin
      This adds a new category of symbols to the relocs program: symbols
      which are known to be relative, even though the linker emits them as
      absolute; this is the case for symbols that live in the linker script,
      which currently applies to _end.
      
      Unfortunately the previous workaround of putting _end in its own empty
      section was defeated by newer binutils, which remove empty sections
      completely.
      
      This patch also changes the symbol matching to use regular expressions
      instead of hardcoded C for specific patterns.
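
      A minimal sketch of the regex-based classification (the pattern shown is
      illustrative; the real tables in relocs.c are larger and compiled once
      at startup):

        #include <regex.h>

        static int is_linker_script_symbol(const char *symname)
        {
                regex_t re;
                int match;

                /* symbols emitted as absolute but actually relative */
                regcomp(&re, "^(_end)$", REG_EXTENDED | REG_NOSUB);
                match = !regexec(&re, symname, 0, NULL, 0);
                regfree(&re);
                return match;
        }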
      
      This is a decidedly non-minimal patch: a modified version of the
      relocs program is used as part of the Syslinux build, and this is
      basically a backport to Linux of some of those changes; they have
      thus been well tested.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4AF86211.3070103@zytor.com>
      Acked-by: Michal Marek <mmarek@suse.cz>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
  35. 19 Nov 2009, 1 commit
    • x86: Eliminate redundant/contradicting cache line size config options · 350f8f56
      Authored by Jan Beulich
      Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
      (with inconsistent defaults), just having the latter suffices as
      the former can be easily calculated from it.
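
      The derivation is mechanical, essentially what x86's <asm/cache.h> does:

        #define L1_CACHE_SHIFT  (CONFIG_X86_L1_CACHE_SHIFT)
        #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)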
      
      To be consistent, also change X86_INTERNODE_CACHE_BYTES to
      X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
      to account for last level cache line size (which here matters
      more than L1 cache line size).
      
      Finally, make sure the default value for X86_L1_CACHE_SHIFT,
      when X86_GENERIC is selected, is seen before those for the
      individual CPU model options (unlike on x86-64, where
      GENERIC_CPU is part of the choice construct, X86_GENERIC is a
      separate option on ix86).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Acked-by: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>