1. 18 Oct, 2007 2 commits
    • i386: print better early fault info · 382f64ab
      Ingo Molnar committed
      improve early fault output.
      
      old format:
      
       Int 14: CR2 010001e3  err 00000002  EIP c011f2f9  CS 00000060  flags 00010046
       Stack: c073695e c0791c10 00000000 ffffffff 00000000 01000000 00001000 c0791c10
      
      new format:
      
       BUG: Int 14: CR2 010001e3
            EDI c1000000  ESI c0693c10  EBP c0637f9c  ESP c0637f08
            EBX 00000000  EDX 0000000e  ECX 00000000  EAX 010001e3
            err 00000002  EIP c0123119   CS 00000060  flg 00010046
       Stack: c064d589 c0693000 00000000 c0637f60 00c001e3 01000000 00038000 00000163
              00000000 00000163 00000000 ffffffff 00038000 00000000 00000000 00001000
              00001000 00000000 c0637f88 c06509be c0a2ae60 00001000 00001000 00000000
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
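
The new layout above can be reproduced in user space with a single format string. This is only a sketch: the real change lives in early boot assembly, and the struct and function names below are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical register snapshot; field names are illustrative only. */
struct early_fault_regs {
    unsigned int vector, cr2;
    unsigned int edi, esi, ebp, esp;
    unsigned int ebx, edx, ecx, eax;
    unsigned int err, eip, cs, eflags;
};

/* Format the register part of the "new format" dump into buf.
 * Returns what snprintf returns: the length the full text needs. */
int format_early_fault(char *buf, size_t len, const struct early_fault_regs *r)
{
    assert(buf != NULL && r != NULL);
    return snprintf(buf, len,
        "BUG: Int %u: CR2 %08x\n"
        "     EDI %08x  ESI %08x  EBP %08x  ESP %08x\n"
        "     EBX %08x  EDX %08x  ECX %08x  EAX %08x\n"
        "     err %08x  EIP %08x   CS %08x  flg %08x\n",
        r->vector, r->cr2,
        r->edi, r->esi, r->ebp, r->esp,
        r->ebx, r->edx, r->ecx, r->eax,
        r->err, r->eip, r->cs, r->eflags);
}
```

Feeding it the register values from the sample dump yields the same three register rows; the stack words would be appended separately.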
    • x86: prepare page allocator for high allocations on PAGEALLOC=y · 1e3e1972
      Ingo Molnar committed
      To preserve the DMA pool in CONFIG_DEBUG_PAGEALLOC=y kernels, we'll
      allocate pagetables from above the 16MB DMA limit, so we'll have to set
      up boot pagetables to cover 16MB more RAM (worst-case).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
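
To get a feel for what "16MB more RAM" costs in boot pagetables, the arithmetic can be sketched as below. It assumes 4 KB pages and the usual x86 table sizes (1024 entries per table without PAE, 512 with PAE); the helper name is hypothetical and the commit's actual constants are not shown in this listing.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096ul   /* x86 small page */

/* Number of page-table pages needed to map `bytes` of RAM with 4 KB pages,
 * given how many entries fit in one page table. */
unsigned long pagetable_pages_for(unsigned long bytes, unsigned long ptes_per_table)
{
    assert(ptes_per_table > 0);
    unsigned long pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    return (pages + ptes_per_table - 1) / ptes_per_table;
}
```

Under these assumptions, covering an extra 16 MB takes 4 additional page-table pages without PAE and 8 with PAE.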
  2. 11 Oct, 2007 3 commits
  3. 12 Aug, 2007 1 commit
  4. 18 Jul, 2007 1 commit
    • xen: Core Xen implementation · 5ead97c8
      Jeremy Fitzhardinge committed
      This patch is a rollup of all the core pieces of the Xen
      implementation, including:
       - booting and setup
       - pagetable setup
       - privileged instructions
       - segmentation
       - interrupt flags
       - upcalls
       - multicall batching
      
      BOOTING AND SETUP
      
      The vmlinux image is decorated with ELF notes which tell the Xen
      domain builder what the kernel's requirements are; the domain builder
      then constructs the address space accordingly and starts the kernel.
      
      Xen has its own entrypoint for the kernel (contained in an ELF note).
      The ELF notes are set up by xen-head.S, which is included into head.S.
      In principle it could be linked separately, but it seems to provoke
      lots of binutils bugs.
      
      Because the domain builder starts the kernel in a fairly sane state
      (32-bit protected mode, paging enabled, flat segments set up), there's
      not a lot of setup needed before starting the kernel proper.  The main
      steps are:
        1. Install the Xen paravirt_ops, which is simply a matter of a
           structure assignment.
        2. Set init_mm to use the Xen-supplied pagetables (analogous to the
           head.S generated pagetables in a native boot).
        3. Reserve address space for Xen, since it takes a chunk at the top
           of the address space for its own use.
        4. Call start_kernel()
      
      PAGETABLE SETUP
      
      Once we hit the main kernel boot sequence, it will end up calling back
      via paravirt_ops to set up various pieces of Xen specific state.  One
      of the critical things which requires a bit of extra care is the
      construction of the initial init_mm pagetable.  Because Xen places
      tight constraints on pagetables (an active pagetable must always be
      valid, and must always be mapped read-only to the guest domain), we
      need to be careful when constructing the new pagetable to keep these
      constraints in mind.  It turns out that the easiest way to do this is
      to use the initial Xen-provided pagetable as a template, and then just
      insert new mappings for memory where a mapping doesn't already exist.
      
      This means that during pagetable setup, it uses a special version of
      xen_set_pte which ignores any attempt to remap a read-only page as
      read-write (since Xen will map its own initial pagetable as RO), but
      lets other changes to the ptes happen, so that things like NX are set
      properly.
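
The behaviour of that special setter can be sketched as follows. This is a user-space illustration, not the patch's code: `init_set_pte` is a hypothetical name, and the bit positions are the standard x86 PAE ones (present = bit 0, writable = bit 1, NX = bit 63).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;             /* PAE-style 64-bit entry */

#define PTE_PRESENT (1ull << 0)
#define PTE_RW      (1ull << 1)
#define PTE_NX      (1ull << 63)

/* Hypothetical init-time setter: never upgrade an existing read-only
 * mapping to read-write, but let every other change through. */
void init_set_pte(pte_t *ptep, pte_t newval)
{
    assert(ptep != NULL);
    pte_t old = *ptep;

    if ((old & PTE_PRESENT) && !(old & PTE_RW))
        newval &= ~PTE_RW;          /* keep the page read-only */

    *ptep = newval;
}
```

An entry that starts out read-only stays read-only even if the caller asks for write access, while a flag such as NX is still applied; an empty slot takes the new value unchanged.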
      
      PRIVILEGED INSTRUCTIONS AND SEGMENTATION
      
      When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
      This means that it is more privileged than user-mode in ring 3, but it
      still can't run privileged instructions directly.  Instructions that
      are not performance-critical are handled by taking a privilege
      exception, trapping into the hypervisor and emulating the instruction,
      but more performance-critical instructions have their own specific
      paravirt_ops.  In many cases we can avoid having to do any hypercalls
      for these instructions, or the Xen implementation is quite different
      from the normal native version.
      
      The privileged instructions fall into the broad classes of:
        Segmentation: setting up the GDT and the GDT entries, LDT,
           TLS and so on.  Xen doesn't allow the GDT to be directly
           modified; all GDT updates are done via hypercalls where the new
           entries can be validated.  This is important because Xen uses
           segment limits to prevent the guest kernel from damaging the
           hypervisor itself.
        Traps and exceptions: Xen uses a special format for trap entrypoints,
           so when the kernel wants to set an IDT entry, it needs to be
           converted to the form Xen expects.  Xen sets int 0x80 up specially
           so that the trap goes straight from userspace into the guest kernel
           without going via the hypervisor.  sysenter isn't supported.
        Kernel stack: The esp0 entry is extracted from the tss and provided to
           Xen.
        TLB operations: the various TLB calls are mapped into corresponding
           Xen hypercalls.
        Control registers: all the control registers are privileged.  The most
           important is cr3, which points to the base of the current pagetable,
           and we handle it specially.
      
      Another instruction we treat specially is CPUID, even though it's not
      privileged.  We want to control what CPU features are visible to the
      rest of the kernel, and so CPUID ends up going into a paravirt_op.
      Xen implements this mainly to disable the ACPI and APIC subsystems.
      
      INTERRUPT FLAGS
      
      Xen maintains its own separate flag for masking events, which is
      contained within the per-cpu vcpu_info structure.  Because the guest
      kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
      ignored (and must be, because even if a guest domain disables
      interrupts for itself, it can't disable them overall).
      
      (A note on terminology: "events" and interrupts are effectively
      synonymous.  However, rather than using an "enable flag", Xen uses a
      "mask flag", which blocks event delivery when it is non-zero.)
      
      There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
      are implemented to manage the Xen event mask state.  The only thing
      worth noting is that when events are unmasked, we need to explicitly
      see if there's a pending event and call into the hypervisor to make
      sure it gets delivered.
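
A minimal sketch of that mask handling, assuming a cut-down `vcpu_info` with only the pending and mask bytes. The `sketch_*` helpers and `force_event_delivery` are hypothetical stand-ins, and a real implementation also needs compiler barriers and preemption care that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for the per-vcpu structure shared with Xen. */
struct vcpu_info {
    uint8_t evtchn_upcall_pending;  /* an event is waiting */
    uint8_t evtchn_upcall_mask;     /* non-zero blocks delivery */
};

static int callbacks_forced;        /* counts simulated hypervisor calls */

/* Stand-in for the call that asks the hypervisor to deliver events. */
static void force_event_delivery(struct vcpu_info *v)
{
    callbacks_forced++;
    v->evtchn_upcall_pending = 0;
}

unsigned long sketch_save_fl(const struct vcpu_info *v)
{
    assert(v != NULL);
    return v->evtchn_upcall_mask;
}

void sketch_irq_disable(struct vcpu_info *v)
{
    assert(v != NULL);
    v->evtchn_upcall_mask = 1;
}

void sketch_irq_enable(struct vcpu_info *v)
{
    assert(v != NULL);
    v->evtchn_upcall_mask = 0;
    /* Unmasking alone delivers nothing: check for a pending event. */
    if (v->evtchn_upcall_pending)
        force_event_delivery(v);
}

void sketch_restore_fl(struct vcpu_info *v, unsigned long flags)
{
    if (flags)
        sketch_irq_disable(v);
    else
        sketch_irq_enable(v);
}
```

Note the inverted sense compared with EFLAGS.IF: a saved value of 1 here means "events blocked".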
      
      UPCALLS
      
      Xen needs a couple of upcall (or callback) functions to be implemented
      by each guest.  One is the event upcall, which is how events
      (interrupts, effectively) are delivered to the guest.  The other is
      the failsafe callback, which is used to report errors in either
      reloading a segment register, or caused by iret.  These are
      implemented in i386/kernel/entry.S so they can jump into the normal
      iret_exc path when necessary.
      
      MULTICALL BATCHING
      
      Xen provides a multicall mechanism, which allows multiple hypercalls
      to be issued at once in order to mitigate the cost of trapping into
      the hypervisor.  This is particularly useful for context switches,
      since the 4-5 hypercalls they would normally need (reload cr3, update
      TLS, maybe update LDT) can be reduced to one.  This patch implements a
      generic batching mechanism for hypercalls, which gets used in many
      places in the Xen code.
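
The batching idea can be illustrated with a small queue that is flushed in one simulated trap. All names here (`struct batch`, `batch_add`, `batch_flush`, `fake_multicall`) are hypothetical; the real code issues a multicall hypercall where this sketch only bumps counters.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8

struct batch_entry {
    int op;                 /* which operation to perform */
    unsigned long arg;      /* its single argument, for simplicity */
};

struct batch {
    struct batch_entry entries[BATCH_MAX];
    size_t count;
};

static int traps;           /* simulated transitions into the hypervisor */
static int ops_done;        /* simulated operations executed */

/* Stand-in for a multicall: one trap executes n queued entries. */
static void fake_multicall(const struct batch_entry *e, size_t n)
{
    (void)e;
    traps++;
    ops_done += (int)n;
}

void batch_flush(struct batch *b)
{
    assert(b != NULL);
    if (b->count == 0)
        return;
    fake_multicall(b->entries, b->count);
    b->count = 0;
}

void batch_add(struct batch *b, int op, unsigned long arg)
{
    assert(b != NULL);
    if (b->count == BATCH_MAX)      /* no room left: issue what we have */
        batch_flush(b);
    b->entries[b->count].op = op;
    b->entries[b->count].arg = arg;
    b->count++;
}
```

Queueing the context-switch operations and flushing once costs a single trap instead of one per operation, which is the saving the commit message describes.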
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Ian Pratt <ian.pratt@xensource.com>
      Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Cc: Adrian Bunk <bunk@stusta.de>
  5. 17 Jul, 2007 1 commit
  6. 11 May, 2007 1 commit
    • Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a
      Eric W. Biederman committed
      This reverts commit c9ccf30d.
      
      Entering the kernel at startup_32 without passing our real mode data in
      %esi, and without guaranteeing that physical and virtual addresses are
      identity mapped makes head.S impossible to maintain.
      
      The only user of this infrastructure is lguest, which is not merged, so
      nothing we currently support will break by removing this over-designed
      nightmare, and only the pending lguest patches will be affected.  The
      pending Xen patches have a different entry point that they use.
      
      We are currently discussing what Xen and lguest need to do to boot the
      kernel in a more normal fashion so using startup_32 in this weird manner is
      clearly not their long term direction.
      
      So let's remove this code in head.S before it causes brain damage to people
      trying to maintain head.S
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 May, 2007 6 commits
  8. 13 Feb, 2007 6 commits
    • [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c · 2a57ff1a
      Rusty Russell committed
      When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
      people couldn't use "var" in a non-percpu context, by prepending
      percpu__.  I never considered that this would allow them to overload
      the same name for a per-cpu and a non-percpu variable.
      
      It is only one of many horrors in the i386 boot code, but let's rename
      the non-percpu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
      that's something else...)
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
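
Why the name clash was possible can be shown with simplified stand-ins for the per-cpu macros. The macros below are not the kernel's definitions, and the `per_cpu__` prefix and `NR_CPUS` value are assumptions made for this sketch.

```c
#include <assert.h>

#define NR_CPUS 4

/* Simplified stand-ins: the per-cpu symbol gets a prefix, so the bare
 * name is still free for an unrelated global. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]
#define per_cpu(name, cpu)         (per_cpu__##name[(cpu)])

struct gdt_descr {
    unsigned short size;
    unsigned long address;
};

/* Per-cpu variable named cpu_gdt_descr ... */
DEFINE_PER_CPU(struct gdt_descr, cpu_gdt_descr);

/* ... and a plain global with the same source-level name.  Both compile,
 * which is the overloading the commit message complains about. */
struct gdt_descr cpu_gdt_descr;
```

Renaming the plain global (to `early_gdt_descr` in the commit) removes the ambiguity for readers without touching the per-cpu variable.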
    • [PATCH] i386: paravirt unhandled fallthrough · 992af681
      Rusty Russell committed
      The current code simply calls "start_kernel" directly if we're under a
      hypervisor and no paravirt_ops backend wants us, because paravirt.c
      registers that as a backend.
      
      This was always a vain hope; start_kernel won't get far without setup.
      It's also impossible for paravirt_ops backends which don't sit in the
      arch/i386/kernel directory: they can't link before paravirt.o anyway.
      
      Keep it simple: if we pass all the registered paravirt probes, BUG().
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: move startup_32() in text.head section · f8657e1b
      Vivek Goyal committed
      o Entry startup_32 was in the .text section but it was also accessing some
        init data, which prompts MODPOST to generate build warnings.
      
      WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
      .text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
      WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
      .text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
      WARNING: vmlinux - Section mismatch: reference to
      .init.data:init_pg_tables_end from .text between '_text' (at offset
      0xc0100099) and 'startup_32_smp'
      
      o Can't move startup_32 to .init.text as this entry point has to be at the
        start of bzImage. Hence moved startup_32 to a new section, .text.head, and
        instructed MODPOST not to generate warnings if init data is being
        accessed from the .text.head section. This code has been audited.
      
      o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
        is not supported. Otherwise it generates more warnings
      
      WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
      .text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
      WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
      .text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd
      Zachary Amsden committed
      Fairly straightforward implementation of VMI backend for paravirt-ops.
      
      [Adrian Bunk: some cleanups]
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: Convert i386 PDA code to use %fs · 464d1a78
      Jeremy Fitzhardinge committed
      Convert the PDA code to use %fs rather than %gs as the segment for
      per-processor data.  This is because some processors show a small but
      measurable performance gain for reloading a NULL segment selector (as %fs
      generally is in user-space) versus a non-NULL one (as %gs generally is).
      
      On modern processors the difference is very small, perhaps undetectable.
      Some old AMD "K6 3D+" processors are noticeably slower when %fs is used
      rather than %gs; I have no idea why this might be, but I think they're
      sufficiently rare that it doesn't matter much.
      
      This patch also fixes the math emulator, which had not been adjusted to
      match the changed struct pt_regs.
      
      [frederik.deweerdt@gmail.com: fixit with gdb]
      [mingo@elte.hu: Fix KVM too]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Ian Campbell <Ian.Campbell@XenSource.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Zachary Amsden <zach@vmware.com>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] Dynamic kernel command-line: i386 · 4e498b66
      Alon Bar-Lev committed
      1. Rename saved_command_line into boot_command_line.
      2. Set command_line as __initdata.
      Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 Dec, 2006 4 commits
    • [PATCH] paravirt: Add startup infrastructure for paravirtualization · c9ccf30d
      Rusty Russell committed
      1) Each hypervisor writes a probe function to detect whether we are
         running under that hypervisor.  paravirt_probe() registers this
         function.
      
      2) If vmlinux is booted with ring != 0, we call all the probe
         functions (with registers except %esp intact) in link order: the
         winner will not return.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
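
The probe scheme can be sketched in user space as a table of function pointers walked in order. In the kernel a winning probe never returns; here, purely as an assumption for testability, a probe returns nonzero to say it would have taken over. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A probe returns nonzero if it recognises its hypervisor. */
typedef int (*probe_fn)(void);

int probe_none(void)  { return 0; }   /* "not my hypervisor" */
int probe_match(void) { return 1; }   /* "this one is mine" */

enum boot_result { BOOT_CLAIMED, BOOT_UNHANDLED };

/* Call every registered probe in table (link) order.  If one claims the
 * boot, report its index through *winner (when winner is not NULL). */
enum boot_result run_probes(const probe_fn *probes, size_t n, size_t *winner)
{
    for (size_t i = 0; i < n; i++) {
        assert(probes[i] != NULL);
        if (probes[i]()) {
            if (winner != NULL)
                *winner = i;
            return BOOT_CLAIMED;
        }
    }
    /* No backend wanted us; the fallthrough commit 992af681 listed
     * above makes this case a BUG(). */
    return BOOT_UNHANDLED;
}
```

The order of the table stands in for link order, so an earlier probe that matches shadows any later ones.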
    • [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge committed
      This patch is the meat of the PDA change.  This patch makes several related
      changes:
      
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
         __KERNEL_PDA.
      
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
      
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space).
      
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: Basic definitions for i386-pda · 9ca36101
      Jeremy Fitzhardinge committed
      This patch has the basic definitions of struct i386_pda, and the segment
      selector in the GDT.
      
      asm-i386/pda.h is more or less a direct copy of asm-x86_64/pda.h.  The most
      interesting difference is the use of _proxy_pda, which is used to give gcc a
      model for the actual memory operations on the real pda structure.  No actual
      reference is ever made to _proxy_pda, so it is never defined.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: espfix cleanup · be44d2aa
      Stas Sergeev committed
      Clean up the espfix code:
      
      - Introduced PER_CPU() macro to be used from asm
      - Introduced GET_DESC_BASE() macro to be used from asm
      - Rewrote the fixup code in asm, as calling C code with the altered %ss
        appeared to be unsafe
      - No longer altering the stack from a .fixup section
      - The 16-bit per-cpu stack is no longer used; instead the stack segment
        base is patched so that the high word of the kernel and user %esp
        are the same.
      - Added the limit-patching for the espfix segment. (Chuck Ebbert)
      
      [jeremy@goop.org: use the x86 scaling addressing mode rather than shifting]
      Signed-off-by: Stas Sergeev <stsp@aknet.ru>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Zachary Amsden <zach@vmware.com>
      Acked-by: Chuck Ebbert <76306.1226@compuserve.com>
      Acked-by: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
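
One reading of the base-patching bullet is the arithmetic below: pick an %esp whose high word matches the user's and whose low word matches the kernel's, then choose the segment base so that base + %esp still lands on the kernel stack. This is a sketch of that interpretation only, with hypothetical names, and not the assembly from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct espfix {
    uint32_t base;   /* base to patch into the espfix stack segment */
    uint32_t esp;    /* %esp to use with that segment */
};

struct espfix espfix_compute(uint32_t kernel_esp, uint32_t user_esp)
{
    struct espfix f;

    /* High word from the user %esp, low word from the kernel %esp. */
    f.esp = (user_esp & 0xffff0000u) | (kernel_esp & 0x0000ffffu);

    /* base + esp must equal the kernel stack address; the subtraction
     * wraps modulo 2^32, as segment address arithmetic does. */
    f.base = kernel_esp - f.esp;
    return f;
}
```

For a kernel %esp of c0637f08 and a user %esp of bfff1234 this gives %esp bfff7f08 and base 00640000. Because the low 16 bits of the two %esp values are equal by construction, the base always has a zero low word.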
  10. 22 Oct, 2006 1 commit
  11. 26 Sep, 2006 1 commit
  12. 31 Aug, 2006 1 commit
  13. 01 Jul, 2006 1 commit
  14. 23 Mar, 2006 1 commit
  15. 25 Feb, 2006 1 commit
    • [PATCH] x86: fix broken SMP boot sequence · 2b932f6c
      James Bottomley committed
      Recent GDT changes broke the SMP boot sequence if the booting CPU is
      numbered anything other than zero.  There's also a subtle source of error
      in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
      for booting CPUs in head.S).  This patch fixes both problems by making GDT
      descriptors themselves allocated from a per_cpu area and switching to them
      in cpu_init(), which now means that cpu_gdt_table is exclusively used for
      booting CPUs again.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 12 Feb, 2006 1 commit
  17. 07 Jan, 2006 4 commits
  18. 10 Sep, 2005 1 commit
    • kbuild: full dependency check on asm-offsets.h · 86feeaa8
      Sam Ravnborg committed
      Building asm-offsets.h has been moved to a separate Kbuild file
      located in the top-level directory. This allows us to share the
      functionality across the architectures.
      
      The old rules in architecture specific Makefiles will die
      in subsequent patches.
      
      Furthermore the usual kbuild dependency tracking is now used
      when deciding to rebuild asm-offsets.s. So we no longer risk
      missing a rebuild when one of the asm-offsets.c dependencies is touched.
      
      With this common rule-set we now force the same name across
      all architectures. Following patches will fix the rest.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  19. 05 Sep, 2005 1 commit
    • [PATCH] kdump: Save parameter segment in protected mode (x86) · 484b90c4
      Vivek Goyal committed
      o With the introduction of kexec as a boot loader, the assumption that the
        parameter segment will always be loaded at a lower address than the kernel
        and will be addressable by the early boot page tables is no longer valid.
        In the kexec-on-panic case the parameter segment might well be loaded
        beyond the kernel image and might not be addressable by the early boot
        page tables.
      o This can happen when the user has reserved a chunk of memory for the
        second kernel, for example 16MB to 64MB, and has also built the second
        kernel for physical memory location 16MB. In this case kexec has no
        choice but to load the parameter segment at a higher address than the new
        kernel image, at a safe location where the new kernel does not stomp on it.
      o The problem should go away automatically once a relocatable kernel for
        i386 is in place and kexec can determine the location of the new kernel at
        run time and load the parameter segment at a lower address than the kernel
        image. Until then this patch can go in (assuming it does not break
        something else).
      o This patch moves up the boot parameter saving code. Boot parameters are
        now copied out in protected mode before page tables are initialized. This
        ensures that the parameter segment is always addressable irrespective of
        its physical location.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 26 Jun, 2005 1 commit
  21. 01 May, 2005 1 commit