1. 15 3月, 2009 4 次提交
    • Y
      x86: put initial_pg_tables into .bss · 2bd2753f
      Yinghai Lu 提交于
      Impact: makes vmlinux section information more useful
      
      Don't use ram after _end blindly for pagetables. aka init pages is before _end
      put those pg table into .bss
      
      [Adapted to use brk segment - Jeremy]
      
      v2: keep initial page table up to 512M only.
      v4: put initial page tables just before _end
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      2bd2753f
    • J
      x86: allow extend_brk users to reserve brk space · 796216a5
      Jeremy Fitzhardinge 提交于
      Impact: new interface; remove hard-coded limit
      
      Add RESERVE_BRK(name, size) macro to reserve space in the brk
      area.  This should be a conservative (ie, larger) estimate of
      how much space might possibly be required from the brk area.
      Any unused space will be freed, so there's no real downside
      on making the reservation too large (within limits).
      
      The name should be unique within a given file, and somewhat
      descriptive.
      
      The C definition of RESERVE_BRK() ends up being more complex than
      one would expect to work around a cluster of gcc infelicities:
      
        The first attempt was to simply try putting __section(.brk_reservation)
        on a variable.  This doesn't work because it ends up making it a
        @progbits section, which gets actual space allocated in the vmlinux
        executable.
      
        The second attempt was to emit the space into a section using asm,
        but gcc doesn't allow arguments to be passed to file-level asm()
        statements, making it hard to pass in the size.
      
        The final attempt is to wrap the asm() in a function to allow
        it to have arguments, and put the function itself into the
        .discard section, which vmlinux*.lds drops entirely from the
        emitted vmlinux.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      796216a5
    • J
      x86-32: use brk segment for allocating initial kernel pagetable · ccf3fe02
      Jeremy Fitzhardinge 提交于
      Impact: use new interface instead of previous ad hoc implementation
      
      Rather than having special purpose init_pg_table_start/end variables
      to delimit the kernel pagetable built by head_32.S, just use the brk
      mechanism to extend the bss for the new pagetable.
      
      This patch removes init_pg_table_start/end and pg0, defines __brk_base
      (which is page-aligned and immediately follows _end), initializes
      the brk region to start there, and uses it for the 32-bit pagetable.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ccf3fe02
    • J
      x86: add brk allocation for very, very early allocations · 93dbda7c
      Jeremy Fitzhardinge 提交于
      Impact: new interface
      
      Add a brk()-like allocator which effectively extends the bss in order
      to allow very early code to do dynamic allocations.  This is better than
      using statically allocated arrays for data in subsystems which may never
      get used.
      
      The space for brk allocations is in the bss ELF segment, so that the
      space is mapped properly by the code which maps the kernel, and so
      that bootloaders keep the space free rather than putting a ramdisk or
      something into it.
      
      The bss itself, delimited by __bss_stop, ends before the brk area
      (__brk_base to __brk_limit).  The kernel text, data and bss is reserved
      up to __bss_stop.
      
      Any brk-allocated data is reserved separately just before the kernel
      pagetable is built, as that code allocates from unreserved spaces
      in the e820 map, potentially allocating from any unused brk memory.
      Ultimately any unused memory in the brk area is used in the general
      kernel memory pool.
      
      Initially the brk space is set to 1MB, which is probably much larger
      than any user needs (the largest current user is i386 head_32.S's code
      to build the pagetables to map the kernel, which can get fairly large
      with a big kernel image and no PSE support).  So long as the system
      has sufficient memory for the bootloader to reserve the kernel+1MB brk,
      there are no bad effects resulting from an over-large brk.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      93dbda7c
  2. 14 2月, 2009 1 次提交
  3. 16 1月, 2009 1 次提交
  4. 12 12月, 2008 1 次提交
  5. 16 10月, 2008 2 次提交
  6. 05 9月, 2008 1 次提交
  7. 15 8月, 2008 1 次提交
  8. 25 5月, 2008 3 次提交
  9. 17 4月, 2008 1 次提交
    • T
      x86: use ELF section to list CPU vendor specific code · 03ae5768
      Thomas Petazzoni 提交于
      Replace the hardcoded list of initialization functions for each CPU
      vendor by a list in an ELF section, which is read at initialization in
      arch/x86/kernel/cpu/cpu.c to fill the cpu_devs[] array. The ELF
      section, named .x86cpuvendor.init, is reclaimed after boot, and
      contains entries of type "struct cpu_vendor_dev" which associates a
      vendor number with a pointer to a "struct cpu_dev" structure.
      
      This first modification allows to remove all the VENDOR_init_cpu()
      functions.
      
      This patch also removes the hardcoded calls to early_init_amd() and
      early_init_intel(). Instead, we add a "c_early_init" member to the
      cpu_dev structure, which is then called if not NULL by the generic CPU
      initialization code. Unfortunately, in early_cpu_detect(), this_cpu is
      not yet set, so we have to use the cpu_devs[] array directly.
      
      This patch is part of the Linux Tiny project, and is needed for
      further patch that will allow to disable compilation of unused CPU
      support code.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      03ae5768
  10. 19 2月, 2008 1 次提交
  11. 30 1月, 2008 2 次提交
  12. 29 1月, 2008 1 次提交
  13. 11 10月, 2007 2 次提交
  14. 20 7月, 2007 2 次提交
    • R
      i386: Put allocated ELF notes in read-only data segment · cbe87121
      Roland McGrath 提交于
      This changes the i386 linker script and the asm-generic macro it uses so that
      ELF note sections with SHF_ALLOC set are linked into the kernel image along
      with other read-only data.  The PT_NOTE also points to their location.
      
      This paves the way for putting useful build-time information into ELF notes
      that can be found easily later in a kernel memory dump.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbe87121
    • F
      define new percpu interface for shared data · 5fb7dc37
      Fenghua Yu 提交于
      per cpu data section contains two types of data.  One set which is
      exclusively accessed by the local cpu and the other set which is per cpu,
      but also shared by remote cpus.  In the current kernel, these two sets are
      not clearely separated out.  This can potentially cause the same data
      cacheline shared between the two sets of data, which will result in
      unnecessary bouncing of the cacheline between cpus.
      
      One way to fix the problem is to cacheline align the remotely accessed per
      cpu data, both at the beginning and at the end.  Because of the padding at
      both ends, this will likely cause some memory wastage and also the
      interface to achieve this is not clean.
      
      This patch:
      
      Moves the remotely accessed per cpu data (which is currently marked
      as ____cacheline_aligned_in_smp) into a different section, where all the data
      elements are cacheline aligned. And as such, this differentiates the local
      only data and remotely accessed data cleanly.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fb7dc37
  15. 18 7月, 2007 1 次提交
    • J
      xen: Core Xen implementation · 5ead97c8
      Jeremy Fitzhardinge 提交于
      This patch is a rollup of all the core pieces of the Xen
      implementation, including:
       - booting and setup
       - pagetable setup
       - privileged instructions
       - segmentation
       - interrupt flags
       - upcalls
       - multicall batching
      
      BOOTING AND SETUP
      
      The vmlinux image is decorated with ELF notes which tell the Xen
      domain builder what the kernel's requirements are; the domain builder
      then constructs the address space accordingly and starts the kernel.
      
      Xen has its own entrypoint for the kernel (contained in an ELF note).
      The ELF notes are set up by xen-head.S, which is included into head.S.
      In principle it could be linked separately, but it seems to provoke
      lots of binutils bugs.
      
      Because the domain builder starts the kernel in a fairly sane state
      (32-bit protected mode, paging enabled, flat segments set up), there's
      not a lot of setup needed before starting the kernel proper.  The main
      steps are:
        1. Install the Xen paravirt_ops, which is simply a matter of a
           structure assignment.
        2. Set init_mm to use the Xen-supplied pagetables (analogous to the
           head.S generated pagetables in a native boot).
        3. Reserve address space for Xen, since it takes a chunk at the top
           of the address space for its own use.
        4. Call start_kernel()
      
      PAGETABLE SETUP
      
      Once we hit the main kernel boot sequence, it will end up calling back
      via paravirt_ops to set up various pieces of Xen specific state.  One
      of the critical things which requires a bit of extra care is the
      construction of the initial init_mm pagetable.  Because Xen places
      tight constraints on pagetables (an active pagetable must always be
      valid, and must always be mapped read-only to the guest domain), we
      need to be careful when constructing the new pagetable to keep these
      constraints in mind.  It turns out that the easiest way to do this is
      use the initial Xen-provided pagetable as a template, and then just
      insert new mappings for memory where a mapping doesn't already exist.
      
      This means that during pagetable setup, it uses a special version of
      xen_set_pte which ignores any attempt to remap a read-only page as
      read-write (since Xen will map its own initial pagetable as RO), but
      lets other changes to the ptes happen, so that things like NX are set
      properly.
      
      PRIVILEGED INSTRUCTIONS AND SEGMENTATION
      
      When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
      This means that it is more privileged than user-mode in ring 3, but it
      still can't run privileged instructions directly.  Non-performance
      critical instructions are dealt with by taking a privilege exception
      and trapping into the hypervisor and emulating the instruction, but
      more performance-critical instructions have their own specific
      paravirt_ops.  In many cases we can avoid having to do any hypercalls
      for these instructions, or the Xen implementation is quite different
      from the normal native version.
      
      The privileged instructions fall into the broad classes of:
        Segmentation: setting up the GDT and the GDT entries, LDT,
           TLS and so on.  Xen doesn't allow the GDT to be directly
           modified; all GDT updates are done via hypercalls where the new
           entries can be validated.  This is important because Xen uses
           segment limits to prevent the guest kernel from damaging the
           hypervisor itself.
        Traps and exceptions: Xen uses a special format for trap entrypoints,
           so when the kernel wants to set an IDT entry, it needs to be
           converted to the form Xen expects.  Xen sets int 0x80 up specially
           so that the trap goes straight from userspace into the guest kernel
           without going via the hypervisor.  sysenter isn't supported.
        Kernel stack: The esp0 entry is extracted from the tss and provided to
           Xen.
        TLB operations: the various TLB calls are mapped into corresponding
           Xen hypercalls.
        Control registers: all the control registers are privileged.  The most
           important is cr3, which points to the base of the current pagetable,
           and we handle it specially.
      
      Another instruction we treat specially is CPUID, even though its not
      privileged.  We want to control what CPU features are visible to the
      rest of the kernel, and so CPUID ends up going into a paravirt_op.
      Xen implements this mainly to disable the ACPI and APIC subsystems.
      
      INTERRUPT FLAGS
      
      Xen maintains its own separate flag for masking events, which is
      contained within the per-cpu vcpu_info structure.  Because the guest
      kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
      ignored (and must be, because even if a guest domain disables
      interrupts for itself, it can't disable them overall).
      
      (A note on terminology: "events" and interrupts are effectively
      synonymous.  However, rather than using an "enable flag", Xen uses a
      "mask flag", which blocks event delivery when it is non-zero.)
      
      There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
      are implemented to manage the Xen event mask state.  The only thing
      worth noting is that when events are unmasked, we need to explicitly
      see if there's a pending event and call into the hypervisor to make
      sure it gets delivered.
      
      UPCALLS
      
      Xen needs a couple of upcall (or callback) functions to be implemented
      by each guest.  One is the event upcalls, which is how events
      (interrupts, effectively) are delivered to the guests.  The other is
      the failsafe callback, which is used to report errors in either
      reloading a segment register, or caused by iret.  These are
      implemented in i386/kernel/entry.S so they can jump into the normal
      iret_exc path when necessary.
      
      MULTICALL BATCHING
      
      Xen provides a multicall mechanism, which allows multiple hypercalls
      to be issued at once in order to mitigate the cost of trapping into
      the hypervisor.  This is particularly useful for context switches,
      since the 4-5 hypercalls they would normally need (reload cr3, update
      TLS, maybe update LDT) can be reduced to one.  This patch implements a
      generic batching mechanism for hypercalls, which gets used in many
      places in the Xen code.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Cc: Ian Pratt <ian.pratt@xensource.com>
      Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Cc: Adrian Bunk <bunk@stusta.de>
      5ead97c8
  16. 19 5月, 2007 2 次提交
  17. 11 5月, 2007 1 次提交
    • E
      Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a
      Eric W. Biederman 提交于
      This reverts commit c9ccf30d.
      
      Entering the kernel at startup_32 without passing our real mode data in
      %esi, and without guaranteeing that physical and virtual addresses are
      identity mapped makes head.S impossible to maintain.
      
      The only user of this infrastructure is lguest which is not merged so
      nothing we currently support will break by removing this over designed
      nightmare, and only the pending lguest patches will be affected.  The
      pending Xen patches have a different entry point that they use.
      
      We are currently discussing what Xen and lguest need to do to boot the
      kernel in a more normal fashion so using startup_32 in this weird manner is
      clearly not their long term direction.
      
      So let's remove this code in head.S before it causes brain damage to people
      trying to maintain head.S
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a18c92a
  18. 03 5月, 2007 6 次提交
  19. 16 4月, 2007 1 次提交
    • A
      [PATCH] x86: Fix gcc 4.2 _proxy_pda workaround · 08269c6d
      Andi Kleen 提交于
      Due to an over aggressive optimizer gcc 4.2 cannot optimize away _proxy_pda
      in all cases (counter intuitive, but true).  This breaks loading of some
      modules.
      
      The earlier workaround to just export a dummy symbol didn't work unfortunately
      because the module code ignores exports with 0 value.
      
      Make it 1 instead.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      08269c6d
  20. 13 2月, 2007 1 次提交
    • V
      [PATCH] i386: move startup_32() in text.head section · f8657e1b
      Vivek Goyal 提交于
      o Entry startup_32 was in .text section but it was accessing some init
        data too and it prompts MODPOST to generate compilation warnings.
      
      WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
      .text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
      WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
      .text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
      WARNING: vmlinux - Section mismatch: reference to
      .init.data:init_pg_tables_end from .text between '_text' (at offset
      0xc0100099) and 'startup_32_smp'
      
      o Can't move startup_32 to .init.text as this entry point has to be at the
        start of bzImage. Hence moved startup_32 to a new section .text.head and
        instructed MODPOST to not to generate warnings if init data is being
        accessed from .text.head section. This code has been audited.
      
      o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
        is not supported. Otherwise it generates more warnings
      
      WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
      .text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
      WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
      .text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
      Signed-off-by: NVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      f8657e1b
  21. 12 2月, 2007 1 次提交
  22. 10 12月, 2006 1 次提交
    • A
      [PATCH] x86: Work around gcc 4.2 over aggressive optimizer · 1bac3b38
      Andi Kleen 提交于
      The new PDA code uses a dummy _proxy_pda variable to describe
      memory references to the PDA. It is never referenced
      in inline assembly, but exists as input/output arguments.
      gcc 4.2 in some cases can CSE references to this which causes
      unresolved symbols.  Define it to zero to avoid this.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      1bac3b38
  23. 09 12月, 2006 1 次提交
    • J
      [PATCH] Generic BUG for i386 · 91768d6c
      Jeremy Fitzhardinge 提交于
      This makes i386 use the generic BUG machinery.  There are no functional
      changes from the old i386 implementation.
      
      The main advantage in using the generic BUG machinery for i386 is that the
      inlined overhead of BUG is just the ud2a instruction; the file+line(+function)
      information are no longer inlined into the instruction stream.  This reduces
      cache pollution, and makes disassembly work properly.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Hugh Dickens <hugh@veritas.com>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      91768d6c
  24. 07 12月, 2006 2 次提交