1. 24 Feb 2009 (5 commits)
    • x86: add remapping percpu first chunk allocator · 8ac83757
      Tejun Heo authored
      Impact: add better first percpu allocation for NUMA
      
      On NUMA, the embedding allocator can't be used because the
      individual units can't be made to fall on the correct NUMA nodes.
      To use large page mappings, each unit needs to be remapped.
      However, percpu areas are usually much smaller than the large page
      size, and the unused space hurts a lot as the number of cpus
      grows.  This allocator remaps large pages for each chunk but gives
      the unused part back to the bootmem allocator, leaving the large
      pages mapped twice.
      
      This adds slightly to the TLB pressure but is much better than using
      4k mappings while still being NUMA-friendly.
      
      Ingo suggested that this would be the correct approach for NUMA.
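      
      As a rough, hedged sketch of the idea (not the patch's actual
      setup_pcpu_remap(); the helper name below is illustrative, while
      pcpu_alloc_bootmem() is from the earlier modularization patch):
      
        /* Reserve one large page per cpu on its own node, use the
         * leading part for the percpu unit and give the tail back. */
        static void * __init pcpu_remap_unit_sketch(unsigned int cpu,
                                                    size_t unit_size)
        {
                /* PMD_SIZE block allocated on the cpu's NUMA node */
                void *ptr = pcpu_alloc_bootmem(cpu, PMD_SIZE, PMD_SIZE);
        
                if (unit_size < PMD_SIZE)
                        /* returned to bootmem but still covered by the
                         * large page mapping, hence "mapped twice" */
                        free_bootmem(__pa(ptr) + unit_size,
                                     PMD_SIZE - unit_size);
        
                return ptr;     /* caller remaps it into the vmalloc area */
        }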
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86: add embedding percpu first chunk allocator · 89c92151
      Tejun Heo authored
      Impact: add better first percpu allocation for !NUMA
      
      On !NUMA, we can simply allocate contiguous memory and use it for
      the first chunk without mapping it into the vmalloc area.  As the
      memory area is covered by the large-page physical memory mapping,
      the dynamic percpu allocator adds no TLB overhead for the static
      percpu area or anything else that falls into the first chunk, and
      the implementation is very simple too.
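      
      A hedged sketch of the embedding idea (simplified from the real
      setup_pcpu_embed(); error handling omitted):
      
        /* One contiguous block, one unit per possible cpu.  It sits in
         * the kernel's large-page linear mapping, so no vmalloc
         * remapping and no extra TLB pressure. */
        static void * __init pcpu_embed_sketch(size_t unit_size)
        {
                return pcpu_alloc_bootmem(0, unit_size * num_possible_cpus(),
                                          PAGE_SIZE);
        }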
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() · 5f5d8405
      Tejun Heo authored
      Impact: modularize percpu first chunk allocation
      
      x86 is going to have a few different strategies for first chunk
      allocation.  Modularize it by separating the current allocation
      mechanism out into pcpu_alloc_bootmem() and setup_pcpu_4k().
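      
      The shape of the split, as a hedged sketch (the real prototypes in
      arch/x86/kernel/setup_percpu.c may differ in detail):
      
        /* NUMA-friendly bootmem allocation, shared by all first chunk
         * strategies */
        static void * __init pcpu_alloc_bootmem(unsigned int cpu,
                                                unsigned long size,
                                                unsigned long align);
        
        /* the current strategy: populate the first chunk with
         * individual 4k pages; returns the unit size on success */
        static ssize_t __init setup_pcpu_4k(size_t static_size);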
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo authored
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching
        page pointers via a callback and adding optional @unit_size,
        @free_size and @base_addr arguments, which allow archs to
        selectively take over parts of chunk initialization to their
        liking (see the sketch below).
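      
      A hedged sketch of the resulting interface (argument names follow
      this changelog; the exact prototype in mm/percpu.c may differ):
      
        typedef struct page * (*pcpu_get_page_fn_t)(unsigned int cpu,
                                                    int pageno);
        
        size_t __init
        pcpu_setup_first_chunk(pcpu_get_page_fn_t get_page_fn,
                               size_t static_size,
                               size_t unit_size,   /* 0 == auto */
                               size_t free_size,   /* dynamic room */
                               void *base_addr);   /* NULL == auto */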
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: update populate_extra_pte() and add populate_extra_pmd() · 458a3e64
      Tejun Heo authored
      Impact: minor change to populate_extra_pte() and addition of pmd flavor
      
      Update populate_extra_pte() to return a pointer to the pte_t for
      the specified address, and add populate_extra_pmd(), which only
      populates down to the pmd and returns a pointer to the pmd entry
      for the address.
      
      For 64-bit, the pud/pmd/pte fill functions are separated out from
      set_pte_vaddr[_pud]() and used for both set_pte_vaddr[_pud]() and
      populate_extra_{pte|pmd}().
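      
      The resulting interfaces, sketched (hedged; the real definitions
      live in arch/x86/mm/):
      
        /* populate down to the pte and return a pointer to it */
        pte_t * __init populate_extra_pte(unsigned long vaddr);
        
        /* stop at the pmd level and return the pmd entry */
        pmd_t * __init populate_extra_pmd(unsigned long vaddr);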
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 20 Feb 2009 (3 commits)
  3. 12 Feb 2009 (2 commits)
    • x86: merge sys_rt_sigreturn between 32 and 64 bits · 74452509
      H. Peter Anvin authored
      Impact: cleanup
      
      With the recent changes in the 32-bit code to make system calls
      which use struct pt_regs take a pointer, sys_rt_sigreturn() has
      become identical between 32 and 64 bits, and both versions are
      empty wrappers around do_rt_sigreturn().  Remove both wrappers and
      rename both copies of do_rt_sigreturn() to sys_rt_sigreturn().
      
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: use regparm(3) for passed-in pt_regs pointer · b12bdaf1
      Brian Gerst authored
      Some syscalls need to access the pt_regs structure, either to copy
      user register state or to modify it.  This patch adds stubs to
      load the address of the pt_regs struct into the %eax register, and
      changes the syscalls to take the pointer as an argument instead of
      relying on the assumption that the pt_regs structure overlaps the
      function arguments.
      
      Drop the use of regparm(1) due to concern about gcc bugs, and to move
      in the direction of the eventual removal of regparm(0) for asmlinkage.
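      
      The C side of the change looks roughly like this (a hedged sketch
      around a hypothetical syscall; the %eax-loading stubs themselves
      live in entry_32.S):
      
        /* The asm stub loads the pt_regs address into %eax, so with
         * regparm(3) the C function receives it as a normal first
         * argument instead of overlaying pt_regs on its own frame. */
        int sys_example(struct pt_regs *regs)
        {
                /* user register state is read and modified only
                 * through regs */
                return do_example(regs);        /* hypothetical callee */
        }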
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  4. 11 Feb 2009 (4 commits)
  5. 10 Feb 2009 (5 commits)
    • x86: implement x86_32 stack protector · 60a5317f
      Tejun Heo authored
      Impact: stack protector for x86_32
      
      Implement stack protector for x86_32.  GDT entry 28 is used for it.
      It's set to point to stack_canary-20 and has a length of 24 bytes.
      CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
      to the stack canary segment on entry.  As %gs is otherwise unused by
      the kernel, the canary can be anywhere.  It's defined as a percpu
      variable.
      
      x86_32 exception handlers take the register frame on the stack
      directly as struct pt_regs.  With -fstack-protector turned on, gcc
      copies the whole structure after the stack canary and (of course)
      doesn't copy it back on return, thus losing all changes.  For now,
      -fno-stack-protector is added to all files which contain those
      functions.  We definitely need something better.
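      
      A hedged sketch of the segment setup (descriptor type/flags are
      illustrative; the real code is in arch/x86):
      
        DECLARE_PER_CPU(unsigned long, stack_canary);
        
        static inline void setup_canary_segment_sketch(int cpu)
        {
                unsigned long canary =
                        (unsigned long)&per_cpu(stack_canary, cpu);
                struct desc_struct desc;
        
                /* base = canary - 20, limit = 24 bytes, so gcc's fixed
                 * %gs:20 access lands exactly on the canary */
                pack_descriptor(&desc, canary - 20, 24, 0x92, 0);
                write_gdt_entry(get_cpu_gdt_table(cpu),
                                GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
        }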
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: make lazy %gs optional on x86_32 · ccbeed3a
      Tejun Heo authored
      Impact: pt_regs changed, lazy gs handling made optional, adds slight
              overhead to SAVE_ALL, simplifies error_code path a bit
      
      On x86_32, %gs hasn't been used by the kernel and is handled
      lazily.  pt_regs doesn't have a slot for it, and gs is
      saved/loaded only when necessary.  In preparation for stack
      protector support, this patch makes lazy %gs handling optional by
      doing the following.
      
      * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
      
      * Save and restore %gs along with the other registers in
        entry_32.S unless LAZY_GS.  Note that this unfortunately adds
        "pushl $0" to SAVE_ALL even when LAZY_GS.  However, it adds no
        overhead to the common exit path and simplifies the entry path
        with error code.
      
      * Define different user_gs accessors depending on LAZY_GS and add
        lazy_save_gs() and lazy_load_gs(), which are noops if !LAZY_GS.
        The lazy_*_gs() ops are used to save, load and clear %gs lazily.
      
      * Define ELF_CORE_COPY_KERNEL_REGS(), which always reads %gs
        directly.
      
      The xen and lguest changes need to be verified; a hedged sketch of
      the new helpers follows.
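      
        /* hedged sketch, simplified from the real header bits */
        #ifdef CONFIG_X86_32_LAZY_GS
        static inline unsigned long lazy_save_gs(void)
        {
                unsigned long v;
                savesegment(gs, v);     /* %gs still holds the live value */
                return v;
        }
        static inline void lazy_load_gs(unsigned long v)
        {
                loadsegment(gs, v);
        }
        #else
        /* !LAZY_GS: %gs is switched via pt_regs, nothing to do lazily */
        static inline unsigned long lazy_save_gs(void) { return 0; }
        static inline void lazy_load_gs(unsigned long v) { }
        #endif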
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: add %gs accessors for x86_32 · d9a89a26
      Tejun Heo authored
      Impact: cleanup
      
      On x86_32, %gs is handled lazily.  It's not saved and restored on
      kernel entry/exit but only when necessary, which usually is during
      task switch, though there are a few other places.  Currently, this
      is done by calling savesegment() and loadsegment() explicitly.
      Define get_user_gs(), set_user_gs() and task_user_gs() and use
      them instead.
      
      While at it, clean up register access macros in signal.c.
      
      This cleans up code a bit and will help future changes.
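      
      On the lazy configuration, the accessors reduce to the explicit
      segment ops they replace; a hedged sketch:
      
        #define get_user_gs(regs) \
                (u16)({ unsigned long v; savesegment(gs, v); v; })
        #define set_user_gs(regs, v)    loadsegment(gs, (unsigned long)(v))
        #define task_user_gs(tsk)       ((tsk)->thread.gs)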
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: use asm .macro instead of cpp #define in entry_32.S · f0d96110
      Tejun Heo authored
      Impact: cleanup
      
      Use .macro instead of cpp #define where appropriate.  This cleans
      up the code and will ease future changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix math_emu register frame access · d315760f
      Tejun Heo authored
      do_device_not_available() is the handler for #NM.  It declares
      that it takes an unsigned long and calls math_emu(), which takes a
      long argument and, surprisingly, expects that the stack frame
      starting at the zeroth argument matches struct math_emu_info,
      which isn't true in the current code regardless of configuration.
      
      This patch makes do_device_not_available() take struct pt_regs
      like the other exception handlers, initialize struct math_emu_info
      with a pointer to it, and pass a pointer to the math_emu_info to
      math_emulate() like normal C functions do.  This way, unless gcc
      makes a copy of struct pt_regs in do_device_not_available(), the
      register frame is accessed correctly regardless of kernel
      configuration or compiler used.
      
      This doesn't fix all math_emu problems but it at least gets it
      somewhat working.
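      
      The fixed handler, sketched (hedged; the real one lives in
      arch/x86/kernel/traps.c):
      
        dotraplinkage void
        do_device_not_available(struct pt_regs *regs, long error_code)
        {
        #ifdef CONFIG_MATH_EMULATION
                if (read_cr0() & X86_CR0_EM) {
                        struct math_emu_info info = { };
        
                        info.regs = regs;       /* explicit pointer, no
                                                 * frame-layout games */
                        math_emulate(&info);
                        return;
                }
        #endif
                math_state_restore();   /* normal lazy-FPU path */
        }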
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 09 Feb 2009 (4 commits)
  7. 05 Feb 2009 (2 commits)
  8. 04 Feb 2009 (2 commits)
  9. 03 Feb 2009 (3 commits)
    • x86: push old stack address on irqstack for unwinder · a67798cd
      Martin Hicks authored
      Impact: Fixes dumpstack and KDB on 64 bits
      
      This re-adds the old stack pointer to the top of the irqstack to help
      with unwinding.  It was removed in commit d99015b1
      as part of the save_args out-of-line work.
      
      Both dumpstack and KDB require this information.
      Signed-off-by: Martin Hicks <mort@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, percpu: fix kexec with vmlinux · ef3892bd
      Yinghai Lu authored
      Impact: fix regression with kexec with vmlinux
      
      Split data.init into data.init, percpu and data.init2 sections
      instead of letting data.init wrap the percpu section.
      
      kexec loading will then succeed, because the sections no longer
      overlap.
      
      Before the patch we have:
      
      Elf file type is EXEC (Executable file)
      Entry point 0x200000
      There are 6 program headers, starting at offset 64
      
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        LOAD           0x0000000000200000 0xffffffff80200000 0x0000000000200000
                       0x0000000000ca6000 0x0000000000ca6000  R E    200000
        LOAD           0x0000000000ea6000 0xffffffff80ea6000 0x0000000000ea6000
                       0x000000000014dfe0 0x000000000014dfe0  RWE    200000
        LOAD           0x0000000001000000 0xffffffffff600000 0x0000000000ff4000
                       0x0000000000000888 0x0000000000000888  RWE    200000
        LOAD           0x00000000011f6000 0xffffffff80ff6000 0x0000000000ff6000
                       0x0000000000073086 0x0000000000a2d938  RWE    200000
        LOAD           0x0000000001400000 0x0000000000000000 0x000000000106a000
                       0x00000000001d2ce0 0x00000000001d2ce0  RWE    200000
        NOTE           0x00000000009e2c1c 0xffffffff809e2c1c 0x00000000009e2c1c
                       0x0000000000000024 0x0000000000000024         4
      
       Section to Segment mapping:
        Segment Sections...
         00     .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
         01     .data .init.rodata .data.cacheline_aligned .data.read_mostly
         02     .vsyscall_0 .vsyscall_fn .vsyscall_gtod_data .vsyscall_1 .vsyscall_2 .vgetcpu_mode .jiffies
         03     .data.init_task .smp_locks .init.text .init.data .init.setup .initcall.init .con_initcall.init .x86_cpu_dev.init .altinstructions .altinstr_replacement .exit.text .init.ramfs .bss
         04     .data.percpu
         05     .notes
      
      After the patch we've got:
      
      Elf file type is EXEC (Executable file)
      Entry point 0x200000
      There are 7 program headers, starting at offset 64
      
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        LOAD           0x0000000000200000 0xffffffff80200000 0x0000000000200000
                       0x0000000000ca6000 0x0000000000ca6000  R E    200000
        LOAD           0x0000000000ea6000 0xffffffff80ea6000 0x0000000000ea6000
                       0x000000000014dfe0 0x000000000014dfe0  RWE    200000
        LOAD           0x0000000001000000 0xffffffffff600000 0x0000000000ff4000
                       0x0000000000000888 0x0000000000000888  RWE    200000
        LOAD           0x00000000011f6000 0xffffffff80ff6000 0x0000000000ff6000
                       0x0000000000073086 0x0000000000073086  RWE    200000
        LOAD           0x0000000001400000 0x0000000000000000 0x000000000106a000
                       0x00000000001d2ce0 0x00000000001d2ce0  RWE    200000
        LOAD           0x000000000163d000 0xffffffff8123d000 0x000000000123d000
                       0x0000000000000000 0x00000000007e6938  RWE    200000
        NOTE           0x00000000009e2c1c 0xffffffff809e2c1c 0x00000000009e2c1c
                       0x0000000000000024 0x0000000000000024         4
      
       Section to Segment mapping:
        Segment Sections...
         00     .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
         01     .data .init.rodata .data.cacheline_aligned .data.read_mostly
         02     .vsyscall_0 .vsyscall_fn .vsyscall_gtod_data .vsyscall_1 .vsyscall_2 .vgetcpu_mode .jiffies
         03     .data.init_task .smp_locks .init.text .init.data .init.setup .initcall.init .con_initcall.init .x86_cpu_dev.init .altinstructions .altinstr_replacement .exit.text .init.ramfs
         04     .data.percpu
         05     .bss
         06     .notes
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/vmi: fix interrupt enable/disable/save/restore calling convention · 664c7954
      Jeremy Fitzhardinge authored
      Zach says:
      > Enable/Disable have no clobbers at all.
      > Save clobbers only return value, %eax
      > Restore also clobbers nothing.
      
      This is precisely compatible with the calling convention, so we can
      just call them directly without wrapping.
      
      (Compile tested only.)
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  10. 01 Feb 2009 (3 commits)
  11. 31 Jan 2009 (7 commits)
    • x86: split loading percpu segments from loading gdt · 11e3a840
      Jeremy Fitzhardinge authored
      Impact: split out a function, no functional change
      
      Xen needs to be able to access percpu data from very early on.
      For various reasons, it cannot also load the gdt at that time.  It
      does, however, have a perfectly functional gdt at that point, so
      there's no pressing need to reload the gdt.
      
      Split off the function that loads the segment registers so that
      Xen can call it directly.
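      
      The split-off function, sketched (hedged; close to, but not
      necessarily identical with, the real load_percpu_segment()):
      
        void load_percpu_segment(int cpu)
        {
        #ifdef CONFIG_X86_32
                loadsegment(fs, __KERNEL_PERCPU);  /* percpu base via %fs */
        #else
                loadsegment(gs, 0);
                wrmsrl(MSR_GS_BASE,
                       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
        #endif
        }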
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: pass in cpu number to switch_to_new_gdt() · 552be871
      Brian Gerst authored
      Impact: cleanup, prepare for xen boot fix.
      
      Xen needs to call this function very early to set up the GDT and
      per-cpu segments.  Remove the call to smp_processor_id() and just
      pass in the cpu number.
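      
      With the cpu number passed in, the function reduces to roughly
      this (hedged sketch):
      
        void switch_to_new_gdt(int cpu)
        {
                struct desc_ptr gdt_descr;
        
                gdt_descr.address = (long)get_cpu_gdt_table(cpu);
                gdt_descr.size = GDT_SIZE - 1;
                load_gdt(&gdt_descr);
                /* then reload the percpu segment for that cpu */
                load_percpu_segment(cpu);
        }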
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: UV fix uv_flush_send_and_wait() · 2749ebe3
      Cliff Wickman authored
      Impact: fix possible tlb mis-flushing on UV
      
      uv_flush_send_and_wait() should return a pointer when the
      broadcast remote tlb shootdown requests fail; that causes the
      conventional IPI method of shootdown to be used instead.
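      
      The caller-side contract, sketched (hedged; the argument list is
      abbreviated from the era's arch/x86/kernel/tlb_uv.c):
      
        /* a non-NULL return means the broadcast failed and the
         * returned mask must still be flushed by conventional IPIs */
        flush_mask = uv_flush_send_and_wait(cpu, this_blade, bau_desc,
                                            cpumaskp);
        if (flush_mask)
                return flush_mask;      /* fall back to the IPI method */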
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86/paravirt: use callee-saved convention for pte_val/make_pte/etc · da5de7c2
      Jeremy Fitzhardinge authored
      Impact: Optimization
      
      In the native case, pte_val, make_pte, etc. are all just identity
      functions, so there's no need to clobber a lot of registers over
      them.
      
      (This changes the 32-bit callee-save calling convention to return
      both EAX and EDX so that functions can return 64-bit values.)
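      
      A hedged sketch of wiring an identity op up under the new
      convention (the PV_CALLEE_SAVE* names come from the companion
      thunk patch):
      
        static pteval_t sketch_pte_val(pte_t pte)
        {
                return pte.pte;         /* identity on native */
        }
        /* emits __raw_callee_save_sketch_pte_val, an asm thunk that
         * preserves all argument and scratch registers */
        PV_CALLEE_SAVE_REGS_THUNK(sketch_pte_val);
        
        /* registered as: .pte_val = PV_CALLEE_SAVE(sketch_pte_val) */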
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86/paravirt: add register-saving thunks to reduce caller register pressure · ecb93d1c
      Jeremy Fitzhardinge authored
      Impact: Optimization
      
      One of the problems with inserting a pile of C calls where
      previously there were none is that register pressure is greatly
      increased.  The C calling convention says that the caller must
      expect a certain set of registers to be trashed by the callee, and
      that the callee can use those registers without restriction.  This
      includes the function argument registers and several others.
      
      This patch seeks to alleviate this pressure by introducing wrapper
      thunks that will do the register saving/restoring, so that the
      callsite doesn't need to worry about it, but the callee function can
      be conventional compiler-generated code.  In many cases (particularly
      performance-sensitive cases) the callee will be in assembler anyway,
      and need not use the compiler's calling convention.
      
      Standard calling convention is:
      	 arguments	    return	scratch
      x86-32	 eax edx ecx	    eax		?
      x86-64	 rdi rsi rdx rcx    rax		r8 r9 r10 r11
      
      The thunk preserves all argument and scratch registers.  The return
      register is not preserved, and is available as a scratch register for
      unwrapped callee code (and of course the return value).
      
      Wrapped function pointers are themselves wrapped in a struct
      paravirt_callee_save structure, in order to get some warning from the
      compiler when functions with mismatched calling conventions are used.
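      
      The wrapper, sketched (hedged, but close to the form described
      above):
      
        struct paravirt_callee_save {
                void *func;
        };
        
        /* pairs a C function with its register-saving asm thunk */
        #define PV_CALLEE_SAVE(func)                            \
                ((struct paravirt_callee_save)                  \
                        { __raw_callee_save_##func })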
      
      The most common paravirt ops, both statically and dynamically, are
      interrupt enable/disable/save/restore, so handle them first.  This is
      particularly easy since their calls are handled specially anyway.
      
      XXX Deal with VMI.  What's their calling convention?
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: fix paravirt clobber in entry_64.S · b8aa287f
      Jeremy Fitzhardinge authored
      Impact: Fix latent bug
      
      The clobber is trying to say that anything except RDI is available for
      clobbering, but actually clobbers everything.  This hasn't mattered
      because the clobbers were basically ignored, but subsequent patches
      will rely on them.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86/pvops: add paravirt_ident functions to allow special patching · 41edafdb
      Jeremy Fitzhardinge authored
      Impact: Optimization
      
      Several paravirt ops implementations simply return their arguments,
      the most obvious being the make_pte/pte_val class of operations on
      native.
      
      On 32-bit, the identity function is literally a no-op, as the
      calling convention uses the same register for the first argument
      and the return value.  On 64-bit, it can be implemented with a
      single "mov".
      
      This patch adds special identity functions for 32-bit and 64-bit
      arguments, and machinery to recognize them and replace them with
      either nops or a mov as appropriate.
      
      At the moment, the only users for the identity functions are the
      pagetable entry conversion functions.
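      
      The identity functions themselves, sketched (hedged; the real
      ones live in arch/x86/kernel/paravirt.c):
      
        u32 _paravirt_ident_32(u32 x)
        {
                return x;       /* a no-op under the 32-bit convention */
        }
        
        u64 _paravirt_ident_64(u64 x)
        {
                return x;       /* a single mov on 64-bit */
        }
        
        /* the patcher recognizes these and inlines a nop or a mov */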
      
      The result is a measurable improvement on pagetable-heavy
      benchmarks (2-3%, reducing the pvops overhead from 5% to 2%).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>