1. 24 2月, 2009 4 次提交
    • T
      vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo 提交于
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      c0c0a293
    • T
      bootmem: reorder interface functions and add a missing one · 2d0aae41
      Tejun Heo 提交于
      Impact: cleanup and addition of missing interface wrapper
      
      The interface functions in bootmem.h was ordered in not so orderly
      manner.  Reorder them such that
      
      * functions allocating the same area group together -
        ie. alloc_bootmem group and alloc_bootmem_low group.
      
      * functions w/o node parameter come before the ones w/ node parameter.
      
      * nopanic variants are immediately below their panicky counterparts.
      
      While at it, add alloc_bootmem_pages_node_nopanic() which was missing.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      2d0aae41
    • T
      bootmem: clean up arch-specific bootmem wrapping · c1329375
      Tejun Heo 提交于
      Impact: cleaner and consistent bootmem wrapping
      
      By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
      arch-specific wrappers for bootmem allocation.  However, this is done
      a bit strangely in that only the high level convenience macros can be
      changed while lower level, but still exported, interface functions
      can't be wrapped.  This not only is messy but also leads to strange
      situation where alloc_bootmem() does what the arch wants it to do but
      the equivalent __alloc_bootmem() call doesn't although they should be
      able to be used interchangeably.
      
      This patch updates bootmem such that archs can override / wrap the
      backend function - alloc_bootmem_core() instead of the highlevel
      interface functions to allow simpler and consistent wrapping.  Also,
      HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      c1329375
    • T
      percpu: fix pcpu_chunk_struct_size · cb83b42e
      Tejun Heo 提交于
      Impact: fix short allocation leading to memory corruption
      
      While dropping rvalue wrapping macros around global parameters,
      pcpu_chunk_struct_size was set incorrectly resulting in shorter page
      pointer array.  Fix it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      cb83b42e
  2. 21 2月, 2009 1 次提交
    • T
      percpu: clean up size usage · cae3aeb8
      Tejun Heo 提交于
      Andrew was concerned about the unit of variables named or have suffix
      size.  Every usage in percpu allocator is in bytes but make it super
      clear by adding comments.
      
      While at it, make pcpu_depopulate_chunk() take int @off and @size like
      everyone else.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      cae3aeb8
  3. 20 2月, 2009 10 次提交
    • T
      x86: convert to the new dynamic percpu allocator · 11124411
      Tejun Heo 提交于
      Impact: use new dynamic allocator, unified access to static/dynamic
              percpu memory
      
      Convert to the new dynamic percpu allocator.
      
      * implement populate_extra_pte() for both 32 and 64
      * update setup_per_cpu_areas() to use pcpu_setup_static()
      * define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
      * define config HAVE_DYNAMIC_PER_CPU_AREA
      Signed-off-by: NTejun Heo <tj@kernel.org>
      11124411
    • T
      percpu: implement new dynamic percpu allocator · fbf59bc9
      Tejun Heo 提交于
      Impact: new scalable dynamic percpu allocator which allows dynamic
              percpu areas to be accessed the same way as static ones
      
      Implement scalable dynamic percpu allocator which can be used for both
      static and dynamic percpu areas.  This will allow static and dynamic
      areas to share faster direct access methods.  This feature is optional
      and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
      arch.  Please read comment on top of mm/percpu.c for details.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      fbf59bc9
    • T
      vmalloc: add un/map_kernel_range_noflush() · 8fc48985
      Tejun Heo 提交于
      Impact: two more public map/unmap functions
      
      Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
      These functions respectively map and unmap address range in kernel VM
      area but doesn't do any vcache or tlb flushing.  These will be used by
      new percpu allocator.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      8fc48985
    • T
      vmalloc: implement vm_area_register_early() · f0aa6617
      Tejun Heo 提交于
      Impact: allow multiple early vm areas
      
      There are places where kernel VM area needs to be allocated before
      vmalloc is initialized.  This is done by allocating static vm_struct,
      initializing several fields and linking it to vmlist and later vmalloc
      initialization picking up these from vmlist.  This is currently done
      manually and if there's more than one such areas, there's no defined
      way to arbitrate who gets which address.
      
      This patch implements vm_area_register_early(), which takes vm_area
      struct with flags and size initialized, assigns address to it and puts
      it on the vmlist.  This way, multiple early vm areas can determine
      which addresses they should use.  The only current user - alpha mm
      init - is converted to use it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f0aa6617
    • T
      percpu: kill percpu_alloc() and friends · f2a8205c
      Tejun Heo 提交于
      Impact: kill unused functions
      
      percpu_alloc() and its friends never saw much action.  It was supposed
      to replace the cpu-mask unaware __alloc_percpu() but it never happened
      and in fact __percpu_alloc_mask() itself never really grew proper
      up/down handling interface either (no exported interface for
      populate/depopulate).
      
      percpu allocation is about to go through major reimplementation and
      there's no reason to carry this unused interface around.  Replace it
      with __alloc_percpu() and free_percpu().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f2a8205c
    • R
      alloc_percpu: add align argument to __alloc_percpu. · 313e458f
      Rusty Russell 提交于
      This prepares for a real __alloc_percpu, by adding an alignment argument.
      Only one place uses __alloc_percpu directly, and that's for a string.
      
      tj: af_inet also uses __alloc_percpu(), update it.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      313e458f
    • R
      alloc_percpu: change percpu_ptr to per_cpu_ptr · b36128c8
      Rusty Russell 提交于
      Impact: cleanup
      
      There are two allocated per-cpu accessor macros with almost identical
      spelling.  The original and far more popular is per_cpu_ptr (44
      files), so change over the other 4 files.
      
      tj: kill percpu_ptr() and update UP too
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: mingo@redhat.com
      Cc: lenb@kernel.org
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: NTejun Heo <tj@kernel.org>
      b36128c8
    • T
      module: reorder module pcpu related functions · 6b588c18
      Tejun Heo 提交于
      Impact: cleanup
      
      Move percpu_modinit() upwards.  This is to ease further changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6b588c18
    • T
      vmalloc: call flush_cache_vunmap() from unmap_kernel_range() · 73426952
      Tejun Heo 提交于
      Impact: proper vcache flush on unmap_kernel_range()
      
      flush_cache_vunmap() should be called before pages are unmapped.  Add
      a call to it in unmap_kernel_range().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      73426952
    • L
      x86: use percpu data for 4k hardirq and softirq stacks · 42f8faec
      Lai Jiangshan 提交于
      Impact: economize memory for large NR_CPUS
      
      percpu data is setup earlier than irq, we can use percpu data
      to economize memory.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      42f8faec
  4. 12 2月, 2009 3 次提交
    • R
      x86: UV: fix header struct usage · 58105ef1
      Randy Dunlap 提交于
      Impact: Fixes warning
      
      Fix uv.h struct usage:
      
      arch/x86/include/asm/uv/uv.h:16: warning: 'struct mm_struct' declared inside parameter list
      arch/x86/include/asm/uv/uv.h:16: warning: its scope is only this definition or declaration, which is probably not what you want
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      58105ef1
    • H
      x86: merge sys_rt_sigreturn between 32 and 64 bits · 74452509
      H. Peter Anvin 提交于
      Impact: cleanup
      
      With the recent changes in the 32-bit code to make system calls which
      use struct pt_regs take a pointer, sys_rt_sigreturn() have become
      identical between 32 and 64 bits, and both are empty wrappers around
      do_rt_sigreturn().  Remove both wrappers and rename both to
      sys_rt_sigreturn().
      
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      74452509
    • B
      x86: use regparm(3) for passed-in pt_regs pointer · b12bdaf1
      Brian Gerst 提交于
      Some syscalls need to access the pt_regs structure, either to copy
      user register state or to modifiy it.  This patch adds stubs to load
      the address of the pt_regs struct into the %eax register, and changes
      the syscalls to take the pointer as an argument instead of relying on
      the assumption that the pt_regs structure overlaps the function
      arguments.
      
      Drop the use of regparm(1) due to concern about gcc bugs, and to move
      in the direction of the eventual removal of regparm(0) for asmlinkage.
      Signed-off-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      b12bdaf1
  5. 11 2月, 2009 6 次提交
  6. 10 2月, 2009 11 次提交
    • T
      x86: implement x86_32 stack protector · 60a5317f
      Tejun Heo 提交于
      Impact: stack protector for x86_32
      
      Implement stack protector for x86_32.  GDT entry 28 is used for it.
      It's set to point to stack_canary-20 and have the length of 24 bytes.
      CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
      to the stack canary segment on entry.  As %gs is otherwise unused by
      the kernel, the canary can be anywhere.  It's defined as a percpu
      variable.
      
      x86_32 exception handlers take register frame on stack directly as
      struct pt_regs.  With -fstack-protector turned on, gcc copies the
      whole structure after the stack canary and (of course) doesn't copy
      back on return thus losing all changed.  For now, -fno-stack-protector
      is added to all files which contain those functions.  We definitely
      need something better.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60a5317f
    • T
      x86: make lazy %gs optional on x86_32 · ccbeed3a
      Tejun Heo 提交于
      Impact: pt_regs changed, lazy gs handling made optional, add slight
              overhead to SAVE_ALL, simplifies error_code path a bit
      
      On x86_32, %gs hasn't been used by kernel and handled lazily.  pt_regs
      doesn't have place for it and gs is saved/loaded only when necessary.
      In preparation for stack protector support, this patch makes lazy %gs
      handling optional by doing the followings.
      
      * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
      
      * Save and restore %gs along with other registers in entry_32.S unless
        LAZY_GS.  Note that this unfortunately adds "pushl $0" on SAVE_ALL
        even when LAZY_GS.  However, it adds no overhead to common exit path
        and simplifies entry path with error code.
      
      * Define different user_gs accessors depending on LAZY_GS and add
        lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS.  The
        lazy_*_gs() ops are used to save, load and clear %gs lazily.
      
      * Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
      
      xen and lguest changes need to be verified.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ccbeed3a
    • T
      x86: add %gs accessors for x86_32 · d9a89a26
      Tejun Heo 提交于
      Impact: cleanup
      
      On x86_32, %gs is handled lazily.  It's not saved and restored on
      kernel entry/exit but only when necessary which usually is during task
      switch but there are few other places.  Currently, it's done by
      calling savesegment() and loadsegment() explicitly.  Define
      get_user_gs(), set_user_gs() and task_user_gs() and use them instead.
      
      While at it, clean up register access macros in signal.c.
      
      This cleans up code a bit and will help future changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9a89a26
    • T
      x86: use asm .macro instead of cpp #define in entry_32.S · f0d96110
      Tejun Heo 提交于
      Impact: cleanup
      
      Use .macro instead of cpp #define where approriate.  This cleans up
      code and will ease future changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f0d96110
    • T
      x86: no stack protector for vdso · d627ded5
      Tejun Heo 提交于
      Impact: avoid crash on vsyscall
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d627ded5
    • T
      stackprotector: update make rules · 5d707e9c
      Tejun Heo 提交于
      Impact: no default -fno-stack-protector if stackp is enabled, cleanup
      
      Stackprotector make rules had the following problems.
      
      * cc support test and warning are scattered across makefile and
        kernel/panic.c.
      
      * -fno-stack-protector was always added regardless of configuration.
      
      Update such that cc support test and warning are contained in makefile
      and -fno-stack-protector is added iff stackp is turned off.  While at
      it, prepare for 32bit support.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5d707e9c
    • T
      x86: stackprotector.h misc update · 76397f72
      Tejun Heo 提交于
      Impact: misc udpate
      
      * wrap content with CONFIG_CC_STACK_PROTECTOR so that other arch files
        can include it directly
      
      * add missing includes
      
      This will help future changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76397f72
    • T
      elf: add ELF_CORE_COPY_KERNEL_REGS() · 6cd61c0b
      Tejun Heo 提交于
      ELF core dump is used for both user land core dump and kernel crash
      dump.  Depending on architecture, register might need to be accessed
      differently for userland and kernel.  Allow architectures to define
      ELF_CORE_COPY_KERNEL_REGS() and use different operation for kernel
      register dump.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6cd61c0b
    • I
      Merge branch 'x86/urgent' into core/percpu · 92e2d508
      Ingo Molnar 提交于
      Conflicts:
      	arch/x86/kernel/acpi/boot.c
      92e2d508
    • I
      Merge branch 'x86/uaccess' into core/percpu · 5d96218b
      Ingo Molnar 提交于
      5d96218b
    • T
      x86: fix math_emu register frame access · d315760f
      Tejun Heo 提交于
      do_device_not_available() is the handler for #NM and it declares that
      it takes a unsigned long and calls math_emu(), which takes a long
      argument and surprisingly expects the stack frame starting at the zero
      argument would match struct math_emu_info, which isn't true regardless
      of configuration in the current code.
      
      This patch makes do_device_not_available() take struct pt_regs like
      other exception handlers and initialize struct math_emu_info with
      pointer to it and pass pointer to the math_emu_info to math_emulate()
      like normal C functions do.  This way, unless gcc makes a copy of
      struct pt_regs in do_device_not_available(), the register frame is
      correctly accessed regardless of kernel configuration or compiler
      used.
      
      This doesn't fix all math_emu problems but it at least gets it
      somewhat working.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d315760f
  7. 09 2月, 2009 5 次提交
    • I
      Merge commit 'v2.6.29-rc4' into core/percpu · 249d51b5
      Ingo Molnar 提交于
      Conflicts:
      	arch/x86/mach-voyager/voyager_smp.c
      	arch/x86/mm/fault.c
      249d51b5
    • T
      x86: math_emu info cleanup · ae6af41f
      Tejun Heo 提交于
      Impact: cleanup
      
      * Come on, struct info?  s/struct info/struct math_emu_info/
      
      * Use struct pt_regs and kernel_vm86_regs instead of defining its own
        register frame structure.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ae6af41f
    • T
      x86: include correct %gs in a.out core dump · 914c3d63
      Tejun Heo 提交于
      Impact: dump the correct %gs into a.out core dump
      
      aout_dump_thread() read %gs but didn't include it in core dump.  Fix
      it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      914c3d63
    • A
      x86, vmi: put a missing paravirt_release_pmd in pgd_dtor · 55a8ba4b
      Alok Kataria 提交于
      Commit 6194ba6f ("x86: don't special-case
      pmd allocations as much") made changes to the way we handle pmd allocations,
      and while doing that it dropped a call to  paravirt_release_pd on the
      pgd page from the pgd_dtor code path.
      
      As a result of this missing release, the hypervisor is now unaware of the
      pgd page being freed, and as a result it ends up tracking this page as a
      page table page.
      
      After this the guest may start using the same page for other purposes, and
      depending on what use the page is put to, it may result in various performance
      and/or functional issues ( hangs, reboots).
      
      Since this release is only required for VMI, I now release the pgd page from
      the (vmi)_pgd_free hook.
      Signed-off-by: NAlok N Kataria <akataria@vmware.com>
      Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      55a8ba4b
    • Y
      x86: find nr_irqs_gsi with mp_ioapic_routing · 3f4a739c
      Yinghai Lu 提交于
      Impact: find right nr_irqs_gsi on some systems.
      
      One test-system has gap between gsi's:
      
      [    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
      [    0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
      [    0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
      [    0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
      [    0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
      [    0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
      ...
      [    0.000000] nr_irqs_gsi: 38
      
      So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
      
      need to get that with acpi_probe_gsi when acpi io_apic is used
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f4a739c