1. 15 Mar, 2009 3 commits
    • x86: allow extend_brk users to reserve brk space · 796216a5
      Jeremy Fitzhardinge committed
      Impact: new interface; remove hard-coded limit
      
      Add RESERVE_BRK(name, size) macro to reserve space in the brk
      area.  This should be a conservative (i.e., larger) estimate of
      how much space might possibly be required from the brk area.
      Any unused space will be freed, so there's no real downside
      to making the reservation too large (within limits).
      
      The name should be unique within a given file, and somewhat
      descriptive.
      
      The C definition of RESERVE_BRK() ends up being more complex than
      one would expect to work around a cluster of gcc infelicities:
      
        The first attempt was to simply try putting __section(.brk_reservation)
        on a variable.  This doesn't work because it ends up making it a
        @progbits section, which gets actual space allocated in the vmlinux
        executable.
      
        The second attempt was to emit the space into a section using asm,
        but gcc doesn't allow arguments to be passed to file-level asm()
        statements, making it hard to pass in the size.
      
        The final attempt is to wrap the asm() in a function to allow
        it to have arguments, and put the function itself into the
        .discard section, which vmlinux*.lds drops entirely from the
        emitted vmlinux.
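
The reserve-then-allocate idea can be modelled in user space. The following is a minimal sketch, not the kernel's code: BRK_MODEL_SIZE, model_extend_brk() and model_brk_unused() are hypothetical names, and the fixed array stands in for the sum of all RESERVE_BRK() reservations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the total space reserved in the brk area. */
#define BRK_MODEL_SIZE 4096

unsigned char brk_area[BRK_MODEL_SIZE];
size_t brk_used;   /* bytes handed out so far */

/* Bump-allocate `size` bytes at an offset that is a multiple of `align`
 * (a power of two, measured from the start of the area).  Returns NULL
 * if the request does not fit in what was reserved. */
void *model_extend_brk(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (brk_used + align - 1) & ~(align - 1);
    if (start > BRK_MODEL_SIZE || size > BRK_MODEL_SIZE - start)
        return NULL;
    brk_used = start + size;
    return &brk_area[start];
}

/* Reserved but never allocated: the part that could be given back. */
size_t model_brk_unused(void)
{
    return BRK_MODEL_SIZE - brk_used;
}
```

In this model an oversized reservation costs nothing beyond the unused tail, which mirrors the point that a conservative estimate has no real downside.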
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86-32: compute initial mapping size more accurately · 7543c1de
      Yinghai Lu committed
      Impact: simplification
      
      We only need to map the kernel in head_32.S, not the whole of
      lowmem.  We use 512MB as a reasonable (but arbitrary) limit on
      the maximum size of the kernel image.
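
As a rough illustration of what the 512MB ceiling implies, the arithmetic below counts the page-table pages needed to map a region of a given size. It is a sketch only: pagetable_pages_needed() is a hypothetical helper, and the 4KB page size and 1024/512 entries per table (non-PAE/PAE) are the usual x86-32 values rather than anything taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u

/* Number of page-table pages needed to map `bytes`, given the number of
 * entries per table (1024 without PAE, 512 with PAE). */
uint32_t pagetable_pages_needed(uint64_t bytes, uint32_t ptes_per_table)
{
    assert(ptes_per_table != 0);
    uint64_t bytes_per_table = (uint64_t)ptes_per_table * PAGE_SIZE_BYTES;
    /* Round up so that a partially covered table still counts. */
    return (uint32_t)((bytes + bytes_per_table - 1) / bytes_per_table);
}
```

Under these assumptions, mapping 512MB takes 128 page-table pages without PAE and 256 with PAE.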
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86-32: use brk segment for allocating initial kernel pagetable · ccf3fe02
      Jeremy Fitzhardinge committed
      Impact: use new interface instead of previous ad hoc implementation
      
      Rather than having special purpose init_pg_table_start/end variables
      to delimit the kernel pagetable built by head_32.S, just use the brk
      mechanism to extend the bss for the new pagetable.
      
      This patch removes init_pg_table_start/end and pg0, defines __brk_base
      (which is page-aligned and immediately follows _end), initializes
      the brk region to start there, and uses it for the 32-bit pagetable.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  2. 14 Feb, 2009 1 commit
  3. 11 Feb, 2009 1 commit
    • x86: fix x86_32 stack protector bugs · 5c79d2a5
      Tejun Heo committed
      Impact: fix x86_32 stack protector
      
      Brian Gerst found out that %gs was being initialized to stack_canary
      instead of stack_canary - 20, which basically gave the same canary
      value for all threads.  Fixing this also exposed the following bugs.
      
      * cpu_idle() didn't call boot_init_stack_canary()
      
      * stack canary switching in switch_to() was being done too late making
        the initial run of a new thread use the old stack canary value.
      
      Fix all of them and while at it update comment in cpu_idle() about
      calling boot_init_stack_canary().
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 10 Feb, 2009 1 commit
    • x86: implement x86_32 stack protector · 60a5317f
      Tejun Heo committed
      Impact: stack protector for x86_32
      
      Implement stack protector for x86_32.  GDT entry 28 is used for it.
      It's set to point to stack_canary-20 and has a length of 24 bytes.
      CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
      to the stack canary segment on entry.  As %gs is otherwise unused by
      the kernel, the canary can be anywhere.  It's defined as a percpu
      variable.
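
The "-20" and "24 bytes" follow from simple offset arithmetic, sketched below under the assumption that gcc's i386 stack protector reads the canary at %gs:20 and that the canary is one 32-bit word. The helper names are hypothetical and the addresses are plain integers, not real segment descriptors.

```c
#include <assert.h>
#include <stdint.h>

#define CANARY_GS_OFFSET 20u  /* assumed: gcc reads the canary at %gs:20 */
#define CANARY_SIZE       4u  /* one 32-bit word */

/* Segment base that makes %gs:20 resolve to the canary's own address. */
uint32_t canary_segment_base(uint32_t canary_addr)
{
    return canary_addr - CANARY_GS_OFFSET;
}

/* Smallest segment length that still covers the canary word. */
uint32_t canary_segment_length(void)
{
    return CANARY_GS_OFFSET + CANARY_SIZE;
}
```

With the base set this way, each CPU's percpu canary gets its own segment base, so the same %gs:20 access reads a different word on each CPU.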
      
      x86_32 exception handlers take register frame on stack directly as
      struct pt_regs.  With -fstack-protector turned on, gcc copies the
      whole structure after the stack canary and (of course) doesn't copy
      back on return, thus losing all changes.  For now, -fno-stack-protector
      is added to all files which contain those functions.  We definitely
      need something better.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 26 Jan, 2009 2 commits
  6. 21 Jan, 2009 1 commit
  7. 11 Oct, 2008 1 commit
  8. 28 Jul, 2008 1 commit
  9. 08 Jul, 2008 2 commits
  10. 13 Jun, 2008 1 commit
    • x86: fix asm warning in head_32.S · 86b2b70e
      Joe Korty committed
      On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
      > It also causes these warnings on 32-bit PAE:
      >
      > 	  AS      arch/x86/kernel/head_32.o
      > 	arch/x86/kernel/head_32.S: Assembler messages:
      > 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
      > 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
      >
      > and I do not see why (the end result seems to be identical).
      
      Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.
      
          arch/x86/kernel/head_32.S: Assembler messages:
          arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
          arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
      
      The assembler was stumbling over the 64-bit constant 0x100000000 in the
      KPMDS #define.
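
One way to avoid a constant that needs more than 32 bits is to negate the offset instead of subtracting it from 2^32. The sketch below compares the two forms in C; it is not necessarily the exact expression the commit adopted, and kpmds_wide()/kpmds_narrow() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Form that needs a 33-bit constant (evaluated here in 64-bit arithmetic). */
uint32_t kpmds_wide(uint32_t page_offset)
{
    return (uint32_t)((UINT64_C(0x100000000) - page_offset) >> 30);
}

/* Form that stays within 32 bits: negate modulo 2^32, shift, mask.
 * The two agree for any non-zero page_offset. */
uint32_t kpmds_narrow(uint32_t page_offset)
{
    return ((0u - page_offset) >> 30) & 3u;
}
```

For a page offset of 0xC0000000 both forms give 1, which matches the single 1GB of kernel address space above that offset.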
      
      Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.
      
      Signed-off-by: Joe Korty <joe.korty@ccur.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Gabriel C <nix.or.die@googlemail.com>
      Cc: Keith Packard <keithp@keithp.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
      Cc: bugme-daemon@bugzilla.kernel.org
      Cc: airlied@linux.ie
      Cc: "Barnes Jesse" <jesse.barnes@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 03 Jun, 2008 1 commit
  12. 31 May, 2008 1 commit
  13. 01 May, 2008 1 commit
  14. 20 Apr, 2008 1 commit
  15. 17 Apr, 2008 1 commit
  16. 22 Mar, 2008 1 commit
  17. 26 Feb, 2008 1 commit
  18. 19 Feb, 2008 1 commit
  19. 10 Feb, 2008 1 commit
    • x86: construct 32-bit boot time page tables in native format. · 551889a6
      Ian Campbell committed
      Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
      kernel are in PAE format.
      
      early_ioremap is updated to use the standard page table accessors.
      
      Clear any mappings beyond max_low_pfn from the boot page tables in
      native_pagetable_setup_start because the initial mappings can extend
      beyond the range of physical memory and into the vmalloc area.
      
      Derived from patches by Eric Biederman and H. Peter Anvin.
      
      [ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]
      Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  20. 30 Jan, 2008 4 commits
  21. 02 Jan, 2008 1 commit
  22. 04 Dec, 2007 1 commit
    • x86: fix x86-32 early fixmap initialization. · 17d57a92
      Eric W. Biederman committed
      pageexec@freemail.hu writes:
      
      > i've just noticed that the chunk in i386/kernel/head.S ended up in a
      > weird place, namely, it's not going to be executed as it's just after
      > a 'jmp 3f' and before startup_32_smp, probably not what you intended.
      > on a sidenote, the whole thing can be done in a single insn, like:
      >
      > movl $(swapper_pg_pmd - __PAGE_OFFSET + 0x067), (swapper_pg_dir -
      > __PAGE_OFFSET+ 4092)
      
      Thanks for the reminder; I thought we had fixed this problem a while ago.
      
      Needed to get fixed virtual address for USB debug and earlycon with mmio.
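
The single instruction quoted above stores a directory entry at byte offset 4092 with the low bits 0x067. A small sketch of how those two numbers decode, assuming the non-PAE layout of 1024 four-byte entries and the standard x86 flag bits; make_pde() and pde_index_from_offset() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 page-directory entry flag bits. */
#define PDE_PRESENT  0x001u
#define PDE_RW       0x002u
#define PDE_USER     0x004u
#define PDE_ACCESSED 0x020u
#define PDE_DIRTY    0x040u

/* Build a directory entry from a page-aligned physical address and flags. */
uint32_t make_pde(uint32_t table_phys, uint32_t flags)
{
    assert((table_phys & 0xfffu) == 0);
    return table_phys | flags;
}

/* Convert a byte offset into the directory to an entry index
 * (4-byte entries, so offset 4092 is the last of 1024 slots). */
uint32_t pde_index_from_offset(uint32_t byte_offset)
{
    return byte_offset / 4u;
}
```

So 0x067 is present, writable, user, accessed and dirty, and offset 4092 selects the last directory slot, which covers the top 4MB of the address space where the fixmap lives.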
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 24 Oct, 2007 1 commit
  24. 22 Oct, 2007 1 commit
    • i386: paravirt boot sequence · a24e7851
      Rusty Russell committed
      This patch uses the updated boot protocol to do paravirtualized boot.
      If the boot version is >= 2.07, then it will do two things:
      
       1. Check the bootparams loadflags to see if we should reload the
          segment registers and clear interrupts.  This is appropriate
          for normal native boot and some paravirtualized environments, but
          inappropriate for others.
      
       2. Check the hardware architecture, and dispatch to the appropriate
          kernel entrypoint.  If the bootloader doesn't set this, then we
          simply do the normal boot sequence.
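
The two checks can be sketched as plain C decisions. This is a model, not the head.S code: struct boot_hdr_model is a hypothetical subset of the setup header, and the KEEP_SEGMENTS bit position and "subarch 0 means default PC" are taken from the published boot protocol rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_PROTO_2_07 0x0207u
#define KEEP_SEGMENTS   (1u << 6)   /* loadflags bit, assumed per protocol 2.07 */

struct boot_hdr_model {             /* hypothetical subset of the setup header */
    uint16_t version;
    uint8_t  loadflags;
    uint32_t hardware_subarch;      /* 0 is assumed to mean the default PC */
};

enum entry_kind { ENTRY_NATIVE, ENTRY_SUBARCH };

/* Step 1: reload segments and clear interrupts unless told to keep them. */
int should_reload_segments(const struct boot_hdr_model *h)
{
    if (h->version < BOOT_PROTO_2_07)
        return 1;                   /* old bootloader: normal native path */
    return (h->loadflags & KEEP_SEGMENTS) == 0;
}

/* Step 2: dispatch on the hardware architecture if the bootloader set it. */
enum entry_kind pick_entry(const struct boot_hdr_model *h)
{
    if (h->version < BOOT_PROTO_2_07 || h->hardware_subarch == 0)
        return ENTRY_NATIVE;
    return ENTRY_SUBARCH;
}
```

A bootloader that predates 2.07 falls through to the native path in both steps, which is the "simply do the normal boot sequence" case.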
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 18 Oct, 2007 2 commits
    • i386: print better early fault info · 382f64ab
      Ingo Molnar committed
      Improve early fault output.
      
      old format:
      
       Int 14: CR2 010001e3  err 00000002  EIP c011f2f9  CS 00000060  flags 00010046
       Stack: c073695e c0791c10 00000000 ffffffff 00000000 01000000 00001000 c0791c10
      
      new format:
      
       BUG: Int 14: CR2 010001e3
            EDI c1000000  ESI c0693c10  EBP c0637f9c  ESP c0637f08
            EBX 00000000  EDX 0000000e  ECX 00000000  EAX 010001e3
            err 00000002  EIP c0123119   CS 00000060  flg 00010046
       Stack: c064d589 c0693000 00000000 c0637f60 00c001e3 01000000 00038000 00000163
              00000000 00000163 00000000 ffffffff 00038000 00000000 00000000 00001000
              00001000 00000000 c0637f88 c06509be c0a2ae60 00001000 00001000 00000000
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: prepare page allocator for high allocations on PAGEALLOC=y · 1e3e1972
      Ingo Molnar committed
      To preserve the DMA pool in CONFIG_DEBUG_PAGEALLOC=y kernels, we'll
      allocate pagetables from above the 16MB DMA limit, so we'll have to set
      up boot pagetables to cover 16MB more RAM (worst-case).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  26. 11 Oct, 2007 3 commits
  27. 12 Aug, 2007 1 commit
  28. 18 Jul, 2007 1 commit
    • xen: Core Xen implementation · 5ead97c8
      Jeremy Fitzhardinge committed
      This patch is a rollup of all the core pieces of the Xen
      implementation, including:
       - booting and setup
       - pagetable setup
       - privileged instructions
       - segmentation
       - interrupt flags
       - upcalls
       - multicall batching
      
      BOOTING AND SETUP
      
      The vmlinux image is decorated with ELF notes which tell the Xen
      domain builder what the kernel's requirements are; the domain builder
      then constructs the address space accordingly and starts the kernel.
      
      Xen has its own entrypoint for the kernel (contained in an ELF note).
      The ELF notes are set up by xen-head.S, which is included into head.S.
      In principle it could be linked separately, but it seems to provoke
      lots of binutils bugs.
      
      Because the domain builder starts the kernel in a fairly sane state
      (32-bit protected mode, paging enabled, flat segments set up), there's
      not a lot of setup needed before starting the kernel proper.  The main
      steps are:
        1. Install the Xen paravirt_ops, which is simply a matter of a
           structure assignment.
        2. Set init_mm to use the Xen-supplied pagetables (analogous to the
           head.S generated pagetables in a native boot).
        3. Reserve address space for Xen, since it takes a chunk at the top
           of the address space for its own use.
        4. Call start_kernel()
      
      PAGETABLE SETUP
      
      Once we hit the main kernel boot sequence, it will end up calling back
      via paravirt_ops to set up various pieces of Xen specific state.  One
      of the critical things which requires a bit of extra care is the
      construction of the initial init_mm pagetable.  Because Xen places
      tight constraints on pagetables (an active pagetable must always be
      valid, and must always be mapped read-only to the guest domain), we
      need to be careful when constructing the new pagetable to keep these
      constraints in mind.  It turns out that the easiest way to do this is
      to use the initial Xen-provided pagetable as a template, and then just
      insert new mappings for memory where a mapping doesn't already exist.
      
      This means that during pagetable setup, it uses a special version of
      xen_set_pte which ignores any attempt to remap a read-only page as
      read-write (since Xen will map its own initial pagetable as RO), but
      lets other changes to the ptes happen, so that things like NX are set
      properly.
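
The filtering rule described above can be sketched as a pure function on pte values. This is a simplified model under stated assumptions: model_set_pte_init() is a hypothetical name, ptes are bare 64-bit integers, and the flag positions are the usual x86 ones.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x001ull
#define PTE_RW      0x002ull
#define PTE_NX      (1ull << 63)

/* Returns the value that would be stored.  If the slot already holds a
 * present read-only mapping, the RW bit of the new value is dropped;
 * every other bit (such as NX) passes through unchanged. */
uint64_t model_set_pte_init(uint64_t old_pte, uint64_t new_pte)
{
    if ((old_pte & PTE_PRESENT) && !(old_pte & PTE_RW))
        new_pte &= ~PTE_RW;
    return new_pte;
}
```

In this model, an attempt to remap a read-only pagetable page as read-write leaves it read-only while still picking up NX, and slots that were empty or already writable are written as requested.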
      
      PRIVILEGED INSTRUCTIONS AND SEGMENTATION
      
      When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
      This means that it is more privileged than user-mode in ring 3, but it
      still can't run privileged instructions directly.  Non-performance
      critical instructions are dealt with by taking a privilege exception
      and trapping into the hypervisor and emulating the instruction, but
      more performance-critical instructions have their own specific
      paravirt_ops.  In many cases we can avoid having to do any hypercalls
      for these instructions, or the Xen implementation is quite different
      from the normal native version.
      
      The privileged instructions fall into the broad classes of:
        Segmentation: setting up the GDT and the GDT entries, LDT,
           TLS and so on.  Xen doesn't allow the GDT to be directly
           modified; all GDT updates are done via hypercalls where the new
           entries can be validated.  This is important because Xen uses
           segment limits to prevent the guest kernel from damaging the
           hypervisor itself.
        Traps and exceptions: Xen uses a special format for trap entrypoints,
           so when the kernel wants to set an IDT entry, it needs to be
           converted to the form Xen expects.  Xen sets int 0x80 up specially
           so that the trap goes straight from userspace into the guest kernel
           without going via the hypervisor.  sysenter isn't supported.
        Kernel stack: The esp0 entry is extracted from the tss and provided to
           Xen.
        TLB operations: the various TLB calls are mapped into corresponding
           Xen hypercalls.
        Control registers: all the control registers are privileged.  The most
           important is cr3, which points to the base of the current pagetable,
           and we handle it specially.
      
      Another instruction we treat specially is CPUID, even though it's not
      privileged.  We want to control what CPU features are visible to the
      rest of the kernel, and so CPUID ends up going into a paravirt_op.
      Xen implements this mainly to disable the ACPI and APIC subsystems.
      
      INTERRUPT FLAGS
      
      Xen maintains its own separate flag for masking events, which is
      contained within the per-cpu vcpu_info structure.  Because the guest
      kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
      ignored (and must be, because even if a guest domain disables
      interrupts for itself, it can't disable them overall).
      
      (A note on terminology: "events" and interrupts are effectively
      synonymous.  However, rather than using an "enable flag", Xen uses a
      "mask flag", which blocks event delivery when it is non-zero.)
      
      There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
      are implemented to manage the Xen event mask state.  The only thing
      worth noting is that when events are unmasked, we need to explicitly
      see if there's a pending event and call into the hypervisor to make
      sure it gets delivered.
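
A user-space model of the four operations, with the "check for a pending event on unmask" step made explicit. The struct and function names are hypothetical stand-ins for the vcpu_info fields and paravirt_ops described above, and the hypervisor call is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu vcpu_info fields. */
struct vcpu_info_model {
    uint8_t event_pending;  /* an event arrived while masked */
    uint8_t event_mask;     /* non-zero blocks event delivery */
};

int hypervisor_calls;       /* counts simulated calls that force delivery */

void model_irq_disable(struct vcpu_info_model *v)
{
    v->event_mask = 1;
}

void model_irq_enable(struct vcpu_info_model *v)
{
    v->event_mask = 0;
    if (v->event_pending) {   /* something arrived while we were masked */
        hypervisor_calls++;   /* stand-in for calling into the hypervisor */
        v->event_pending = 0; /* simplification: delivery clears the flag */
    }
}

uint8_t model_save_fl(const struct vcpu_info_model *v)
{
    return v->event_mask;
}

void model_restore_fl(struct vcpu_info_model *v, uint8_t saved_mask)
{
    if (saved_mask)
        model_irq_disable(v);
    else
        model_irq_enable(v);
}
```

Masking and saving touch only memory in this model; the hypervisor is entered only when unmasking finds an event already pending.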
      
      UPCALLS
      
      Xen needs a couple of upcall (or callback) functions to be implemented
      by each guest.  One is the event upcall, which is how events
      (interrupts, effectively) are delivered to the guests.  The other is
      the failsafe callback, which is used to report errors in either
      reloading a segment register, or caused by iret.  These are
      implemented in i386/kernel/entry.S so they can jump into the normal
      iret_exc path when necessary.
      
      MULTICALL BATCHING
      
      Xen provides a multicall mechanism, which allows multiple hypercalls
      to be issued at once in order to mitigate the cost of trapping into
      the hypervisor.  This is particularly useful for context switches,
      since the 4-5 hypercalls they would normally need (reload cr3, update
      TLS, maybe update LDT) can be reduced to one.  This patch implements a
      generic batching mechanism for hypercalls, which gets used in many
      places in the Xen code.
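
The batching idea can be sketched as a small queue that is flushed in one simulated trap. This is an illustrative model only: BATCH_MAX, struct batch_model, batch_add() and batch_flush() are hypothetical names, and the hypervisor entry is reduced to a counter.

```c
#include <assert.h>

#define BATCH_MAX 8   /* hypothetical queue depth */

struct call_model {
    int  op;
    long arg;
};

struct batch_model {
    struct call_model calls[BATCH_MAX];
    int count;   /* queued, not yet issued */
    int traps;   /* simulated hypervisor entries so far */
};

/* Issue everything queued in a single simulated trap. */
void batch_flush(struct batch_model *b)
{
    if (b->count == 0)
        return;
    b->traps++;
    b->count = 0;
}

/* Queue one call, flushing first if the queue is already full. */
void batch_add(struct batch_model *b, int op, long arg)
{
    if (b->count == BATCH_MAX)
        batch_flush(b);
    b->calls[b->count].op = op;
    b->calls[b->count].arg = arg;
    b->count++;
}
```

In this model a context switch that queues three operations and then flushes costs one trap instead of three.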
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Ian Pratt <ian.pratt@xensource.com>
      Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Cc: Adrian Bunk <bunk@stusta.de>
  29. 17 Jul, 2007 1 commit
  30. 11 May, 2007 1 commit
    • Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a
      Eric W. Biederman committed
      This reverts commit c9ccf30d.
      
      Entering the kernel at startup_32 without passing our real mode data in
      %esi, and without guaranteeing that physical and virtual addresses are
      identity mapped makes head.S impossible to maintain.
      
      The only user of this infrastructure is lguest which is not merged so
      nothing we currently support will break by removing this over-designed
      nightmare, and only the pending lguest patches will be affected.  The
      pending Xen patches have a different entry point that they use.
      
      We are currently discussing what Xen and lguest need to do to boot the
      kernel in a more normal fashion so using startup_32 in this weird manner is
      clearly not their long term direction.
      
      So let's remove this code in head.S before it causes brain damage to people
      trying to maintain head.S
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>