1. 20 Jan 2009, 1 commit
    • x86: move stack_canary into irq_stack · 947e76cd
      Brian Gerst authored
      Impact: x86_64 percpu area layout change, irq_stack now at the beginning
      
      Now that the PDA is empty except for the stack canary, it can be removed.
      The irq_stack is moved to the start of the per-cpu section.  If the stack
      protector is enabled, the canary overlaps the bottom 48 bytes of the
      irq_stack.
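      
      A minimal sketch of how such an overlap can be expressed (a plain-C
      illustration; the size and field names here are assumed, not quoted
      from the patch):
      
        #define IRQ_STACK_SIZE (16 * 1024)  /* size assumed for the sketch */
      
        union irq_stack_union {
                char irq_stack[IRQ_STACK_SIZE];
                /*
                 * gcc hardcodes the stack-protector canary at %gs:40, so
                 * 40 reserved bytes plus the 8-byte canary together cover
                 * the bottom 48 bytes of the irq stack.
                 */
                struct {
                        char gs_base[40];
                        unsigned long stack_canary;
                };
        };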
      
      tj: * updated subject
          * dropped asm relocation of irq_stack_ptr
          * updated comments a bit
          * rebased on top of stack canary changes
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 18 Jan 2009, 3 commits
  3. 16 Jan 2009, 7 commits
    • x86: fix build bug introduced during merge · a338af2c
      Tejun Heo authored
      An EXPORT_PER_CPU_SYMBOL() got misplaced during the merge, leading to a
      build failure.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: make pda a percpu variable · b12d8db8
      Tejun Heo authored
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      As the pda is now allocated in the percpu area, it can easily be made a
      proper percpu variable.  Make it so by defining the per-cpu symbol from
      the linker script and declaring it in C code for SMP, and by simply
      defining it for UP.  This change cleans up the code and brings SMP and
      UP a bit closer.
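      
      A sketch of the resulting declarations (the variable name follows the
      x86_64 pda; the linker-script side that emits the SMP symbol is not
      shown, and this is an illustration rather than the patch itself):
      
        #ifdef CONFIG_SMP
        /* the symbol itself comes from the linker script; C declares it */
        DECLARE_PER_CPU(struct x8664_pda, pda);
        #else
        /* no linker-script trickery needed on UP; define it normally */
        DEFINE_PER_CPU(struct x8664_pda, pda);
        #endif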
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: merge 64 and 32 SMP percpu handling · 9939ddaf
      Tejun Heo authored
      Now that the pda is allocated as part of the percpu area, percpu data
      doesn't need to be accessed through the pda.  Unify x86_64 SMP percpu
      access with the x86_32 SMP one.  Other than the segment register, the
      operand size and the base of the percpu symbols, they now behave
      identically.
      
      This patch replaces the now-unnecessary pda->data_offset with a dummy
      field, which is needed to keep stack_canary in its place.  It also
      moves per_cpu_offset initialization out of init_gdt() into
      setup_per_cpu_areas().  Note that this change also necessitates
      explicit per_cpu_offset initializations in voyager_smp.c.
      
      With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
      x86_32, and x86_64 can also use the assembly PER_CPU macros.
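      
      A sketch of the unified access pattern (the macro and symbol names here
      are illustrative assumptions); only the segment prefix and the operand
      size differ between the two architectures:
      
        #ifdef CONFIG_X86_64
        #define __percpu_seg_str "%%gs:"        /* x86_64 percpu segment */
        #else
        #define __percpu_seg_str "%%fs:"        /* x86_32 percpu segment */
        #endif
      
        #define x86_read_percpu(var)                                    \
        ({                                                              \
                typeof(per_cpu__##var) ret__;                           \
                asm("mov " __percpu_seg_str "%1, %0"                    \
                    : "=r" (ret__)                                      \
                    : "m" (per_cpu__##var));                            \
                ret__;                                                  \
        })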
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fold pda into percpu area on SMP · 1a51e3a0
      Tejun Heo authored
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently, pdas and percpu areas are allocated separately.  %gs points
      to the local pda, and the percpu area can be reached using
      pda->data_offset.  This patch folds the pda into the percpu area.
      
      Due to a strange gcc requirement, the pda needs to be at the beginning
      of the percpu area so that pda->stack_canary ends up at %gs:40.  To
      achieve this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() -
      is added and used to reserve a pda-sized chunk at the start of the
      percpu area.
      
      After this change, for the boot cpu, %gs first points to the pda in the
      data.init area and later, during setup_per_cpu_areas(), gets updated to
      point to the actual pda.  This means that setup_per_cpu_areas() needs
      to reload %gs for CPU0 while only clearing the pda area for the other
      cpus, as CPU0 has already modified its pda by the time control reaches
      setup_per_cpu_areas().
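      
      The CPU0 switch-over might look roughly like this (the helpers are real
      kernel primitives of the era, but this exact sequence is an assumption,
      not a quote from the patch):
      
        /* point the boot CPU's %gs base at the relocated pda */
        loadsegment(gs, 0);                     /* drop the stale cached base */
        wrmsrl(MSR_GS_BASE, (unsigned long)cpu_pda(0));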
      
      This patch also removes the now-unnecessary get_local_pda() and its
      call sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: use static _cpu_pda array · c8f3329a
      Tejun Heo authored
      The _cpu_pda array first uses statically allocated storage in data.init
      and then switches to bootmem-allocated storage to conserve space.
      However, after folding the pda area into the percpu area, the _cpu_pda
      array will be removed completely.  Drop the reallocation part to
      simplify the code for the soon-to-follow changes.
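      
      With the reallocation gone, the arrangement reduces to something like
      this (declarations assumed for illustration, not copied from the patch):
      
        /* permanent static storage; never swapped for bootmem */
        static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
        struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;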
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: make percpu symbols zerobased on SMP · 3e5d8f97
      Tejun Heo authored
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      This patch makes percpu symbols zero-based on x86_64 SMP by adding
      PERCPU_VADDR() to vmlinux.lds.h, which helps set an explicit vaddr on
      the percpu output section, and by using it in vmlinux_64.lds.S.  A new
      PHDR is added, as existing ones cannot contain sections near address
      zero.  PERCPU_VADDR() also adds a new symbol, __per_cpu_load, which
      always points to the vaddr of the loaded percpu data.init region.
      
      The following adjustments have been made to accommodate the address
      change.
      
      * code to locate percpu gdt_page in head_64.S is updated to add the
        load address to the gdt_page offset.
      
      * __per_cpu_load is used in places where access to the init data area
        is necessary.
      
      * pda->data_offset is initialized soon after C code is entered, as a
        zero value doesn't work anymore (see the sketch below).
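      
      A sketch of that early initialization (the exact expression is an
      assumption; with zero-based symbols the boot copy lives at
      __per_cpu_load, so CPU0's offset initially equals the load address):
      
        /* zero-based percpu: offset 0 now aliases the symbols themselves,
         * so the boot pda must point at the data.init copy instead */
        cpu_pda(0)->data_offset = (unsigned long)__per_cpu_load;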
      
      This patch is mostly taken from Mike Travis' "x86_64: Base percpu
      variables at zero" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: cleanup early setup_percpu references · c90aa894
      Mike Travis authored
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
        * Ruggedize some calls in setup_percpu.c to prevent mishaps in early
          calls, particularly for non-critical functions (see the sketch
          below).

        * Clean up DEBUG_PER_CPU_MAPS usages and some comments.
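      
      The ruggedizing pattern amounts to detecting the not-yet-initialized
      case and degrading gracefully; a sketch (the function body and fallback
      here are illustrative assumptions, not the patch's code):
      
        int early_cpu_to_node(int cpu)
        {
                if (!cpu_to_node_map)           /* called before the map exists */
                        return NUMA_NO_NODE;    /* safe fallback, not a crash */
                return cpu_to_node_map[cpu];
        }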
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 11 Jan 2009, 1 commit
  5. 05 Jan 2009, 1 commit
  6. 04 Jan 2009, 2 commits
  7. 26 Dec 2008, 1 commit
  8. 17 Dec 2008, 1 commit
  9. 13 Dec 2008, 1 commit
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse,
      and cpulist_scnprintf to take pointers · 29c0177e
      Rusty Russell authored
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
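      
      The changed convention in use (a sketch; cpu_online_mask and the
      cpumask_var_t helpers belong to the same reworked API family, and the
      buffer size is arbitrary):
      
        char buf[64];
        cpumask_var_t mask;
      
        /* printing: destination buffer plus a struct cpumask pointer */
        cpulist_scnprintf(buf, sizeof(buf), cpu_online_mask);   /* e.g. "0-3" */
        cpumask_scnprintf(buf, sizeof(buf), cpu_online_mask);   /* e.g. "f" */
      
        /* parsing: fills in a struct cpumask through a pointer */
        if (alloc_cpumask_var(&mask, GFP_KERNEL)) {
                cpulist_parse("0-2,4", mask);
                free_cpumask_var(mask);
        }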
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  10. 22 Oct 2008, 1 commit
  11. 16 Oct 2008, 3 commits
  12. 01 Aug 2008, 1 commit
  13. 29 Jul 2008, 1 commit
    • cpu masks: optimize and clean up cpumask_of_cpu() · e56b3bc7
      Linus Torvalds authored
      Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
      
      Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
      creating a huge array of constant bitmasks, realize that the zero words
      can be shared.
      
      In other words, on a 64-bit architecture, we only ever need 64 of these
      arrays - each with a different bit set in one single word (with enough
      zero words around it so that we can create any bitmask by just
      offsetting into that big array).  And then we just put enough zeroes
      around it that we can point every single cpumask at one of those things.
      
      So when we have 4k CPUs, instead of having 4k arrays (of 4k bits each,
      with one bit set in each array - 2MB of memory total), we have exactly
      64 arrays instead, each 8k bits in size (64kB total).
      
      And then we just point cpumask(n) to the right position (which we can
      calculate dynamically). Once we have the right arrays, getting
      "cpumask(n)" ends up being:
      
        static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
        {
                const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
                p -= cpu / BITS_PER_LONG;
                return (const cpumask_t *)p;
        }
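      
      For reference, the backing table the function indexes might be declared
      like this (a sketch of the declaration; the real definition lives
      elsewhere in the kernel):
      
        /* 65 rows: row 0 is all zeroes, row i+1 has bit i set in its
         * first word; adjacent rows supply the shared zero words the
         * pointer arithmetic above relies on */
        extern const unsigned long
                cpu_bit_bitmap[BITS_PER_LONG + 1][BITS_TO_LONGS(NR_CPUS)];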
      
      This brings other advantages and simplifications as well:
      
       - we are not wasting memory that is just filled with a single bit in
         various different places
      
       - we don't need all those games to re-create the arrays in some dense
         format, because they're already going to be dense enough.
      
      If we compile a kernel for up to 4k CPUs, "wasting" that 64kB of memory
      is a non-issue (especially since, by doing this "overlapping" trick, we
      probably get better cache behaviour anyway).
      
      [ mingo@elte.hu:
      
        Converted Linus's mails into a commit. See:
      
           http://lkml.org/lkml/2008/7/27/156
           http://lkml.org/lkml/2008/7/28/320
      
        Also applied a family filter - which also has the side-effect of leaving
        out the bits where Linus calls me an idio... Oh, never mind ;-)
      ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 26 Jul 2008, 1 commit
  15. 22 Jul 2008, 1 commit
  16. 14 Jul 2008, 1 commit
  17. 08 Jul 2008, 13 commits