  1. 31 May, 2010 1 commit
  2. 28 May, 2010 1 commit
  3. 03 Mar, 2010 1 commit
  4. 26 Feb, 2010 1 commit
  5. 10 Dec, 2009 1 commit
  6. 14 Aug, 2009 9 commits
    • x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA · 4518e6a0
      Tejun Heo committed
      Embedding percpu first chunk allocator can now handle very sparse unit
      mapping.  Use embedding allocator instead of lpage for 64bit NUMA.
      This removes extra TLB pressure and the need to do complex and fragile
      dancing when changing page attributes.
      
      For 32bit, using very sparse unit mapping isn't a good idea because
      the vmalloc space is very constrained.  32bit NUMA machines aren't
      exactly the focus of optimization and it isn't very clear whether
      lpage performs better than page.  Use page first chunk allocator for
      32bit NUMAs.
      
      As this leaves setup_pcpu_*() functions pretty much empty, fold them
      into setup_per_cpu_areas().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <andi@firstfloor.org>
    • percpu: update embedding first chunk allocator to handle sparse units · c8826dd5
      Tejun Heo committed
      Now that percpu core can handle very sparse units, given that vmalloc
      space is large enough, embedding first chunk allocator can use any
      memory to build the first chunk.  This patch teaches
      pcpu_embed_first_chunk() about distances between cpus and to use
      alloc/free callbacks to allocate node specific areas for each group
      and use them for the first chunk.
      
      This brings the benefits of embedding allocator to NUMA configurations
      - no extra TLB pressure with the flexibility of unified dynamic
      allocator and no need to restructure arch code to build memory layout
      suitable for percpu.  With units put into atom_size aligned groups
      according to cpu distances, using large pages for dynamic chunks is
      also easily possible, falling back to regular pages if the large
      allocation fails.
      
      Embedding allocator users are converted to specify NULL
      cpu_distance_fn, so this patch doesn't cause any visible behavior
      difference.  Following patches will convert them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: add pcpu_unit_offsets[] · fb435d52
      Tejun Heo committed
      Currently units are mapped sequentially into address space.  This
      patch adds pcpu_unit_offsets[] which allows units to be mapped to
      arbitrary offsets from the chunk base address.  This is necessary to
      allow sparse embedding which might need to allocate address
      ranges and memory areas which aren't aligned to unit size but to the
      allocation atom size (page or large page size).  This also simplifies
      things a bit by removing the need to calculate offset from unit
      number.
      
      With this change, there's no need for the arch code to know
      pcpu_unit_size.  Update pcpu_setup_first_chunk() and first chunk
      allocators to return regular 0 or -errno return code instead of unit
      size or -errno.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
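To illustrate the pcpu_unit_offsets[] change described above, here is a minimal sketch contrasting the two addressing schemes. The struct and function names (`chunk_sketch`, `addr_sequential`, `addr_by_offset`) are hypothetical and are not taken from mm/percpu.c; only the idea of a per-unit offset table comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chunk: a base address plus a per-unit offset table. */
struct chunk_sketch {
    uintptr_t base_addr;        /* chunk base address */
    const size_t *unit_offsets; /* offset of each unit from base_addr */
    size_t nr_units;            /* number of entries in unit_offsets */
};

/* Sequential scheme: units sit back-to-back, so the offset is computed. */
static uintptr_t addr_sequential(uintptr_t base, size_t unit_size,
                                 size_t unit, size_t off)
{
    return base + unit * unit_size + off;
}

/* Offset-table scheme: the offset is looked up, so units may sit anywhere. */
static uintptr_t addr_by_offset(const struct chunk_sketch *c,
                                size_t unit, size_t off)
{
    assert(unit < c->nr_units);
    return c->base_addr + c->unit_offsets[unit] + off;
}
```

With the table, a unit's position no longer has to be a multiple of the unit size, which is what allows groups to be aligned to the allocation atom size instead.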
    • percpu: introduce pcpu_alloc_info and pcpu_group_info · fd1e8a1f
      Tejun Heo committed
      Till now, non-linear cpu->unit map was expressed using an integer
      array which maps each cpu to a unit and used only by lpage allocator.
      Although how many units have been placed in a single contiguous area
      (group) is known while building unit_map, the information is lost when
      the result is recorded into the unit_map array.  For lpage allocator,
      as all allocations are done by lpages and whether two adjacent lpages
      are in the same group or not is irrelevant, this didn't cause any
      problem.  Non-linear cpu->unit mapping will be used for sparse
      embedding and this grouping information is necessary for that.
      
      This patch introduces pcpu_alloc_info which contains all the
      information necessary for initializing percpu allocator.
      pcpu_alloc_info contains array of pcpu_group_info which describes how
      units are grouped and mapped to cpus.  pcpu_group_info also has
      base_offset field to specify its offset from the chunk's base address.
      pcpu_build_alloc_info() initializes this field as if all groups are
      allocated back-to-back as is currently done but this will be used to
      sparsely place groups.
      
      pcpu_alloc_info is a rather complex data structure which contains a
      flexible array which in turn points to nested cpu_map arrays.
      
      * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to
        help dealing with pcpu_alloc_info.
      
      * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info,
        generalized and renamed to pcpu_build_alloc_info().
        @cpu_distance_fn may be NULL indicating that all cpus are of
        LOCAL_DISTANCE.
      
      * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info,
        generalized and renamed to pcpu_dump_alloc_info().  It now also
        prints which group each alloc unit belongs to.
      
      * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the
        separate parameters.  All first chunk allocators are updated to use
        pcpu_build_alloc_info() to build alloc_info and call
        pcpu_setup_first_chunk() with it.  This has the side effect of
        packing units for sparse possible cpus.  For example, if cpus 0, 2
        and 4 are possible, they'll be assigned unit 0, 1 and 2 instead of
        0, 2 and 4.
      
      * x86 setup_pcpu_lpage() is updated to deal with alloc_info.
      
      * sparc64 setup_per_cpu_areas() is updated to build alloc_info.
      
      Although the changes made by this patch are pretty pervasive, it
      doesn't cause any behavior difference other than packing of sparse
      cpus.  It mostly changes how information is passed among
      initialization functions and makes room for more flexibility.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
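The unit packing side effect mentioned above (cpus 0, 2 and 4 getting units 0, 1 and 2) can be sketched as follows. This is only an illustration under simplifying assumptions: `pack_units` and `NO_UNIT` are hypothetical names, the possible-cpu set is a plain int array, and grouping by cpu distance is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NO_UNIT (-1) /* hypothetical marker for cpus that get no unit */

/*
 * Assign consecutive unit numbers to possible cpus.
 * possible[cpu] is nonzero when the cpu is possible; unit_map[cpu] receives
 * the packed unit number, or NO_UNIT for impossible cpus.
 * Returns the number of units handed out.
 */
static int pack_units(const int *possible, int *unit_map, size_t nr_cpu_ids)
{
    int next_unit = 0;

    assert(possible != NULL && unit_map != NULL);
    for (size_t cpu = 0; cpu < nr_cpu_ids; cpu++)
        unit_map[cpu] = possible[cpu] ? next_unit++ : NO_UNIT;
    return next_unit;
}
```

Calling it with a possible array of {1, 0, 1, 0, 1} fills unit_map with 0, NO_UNIT, 1, NO_UNIT, 2 and returns 3, so no unit is spent on the holes.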
    • percpu: add @align to pcpu_fc_alloc_fn_t · 3cbc8565
      Tejun Heo committed
      pcpu_fc_alloc_fn_t is about to see more interesting usage; add an
      @align parameter.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: drop @static_size from first chunk allocators · 9a773769
      Tejun Heo committed
      First chunk allocators assume percpu areas have been linked using one
      of PERCPU_*() macros and depend on __per_cpu_load symbol defined by
      those macros, so there isn't much point in passing in static area size
      explicitly when it can be easily calculated from __per_cpu_start and
      __per_cpu_end.  Drop @static_size from all percpu first chunk
      allocators and helpers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: generalize first chunk allocator selection · f58dc01b
      Tejun Heo committed
      Now that all first chunk allocators are in mm/percpu.c, it makes sense
      to generalize the percpu_alloc kernel parameter.  Define PCPU_FC_*
      and set pcpu_chosen_fc using early_param() in mm/percpu.c.  Arch code
      can use the set value to determine which first chunk allocator to use.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: rename 4k first chunk allocator to page · 00ae4064
      Tejun Heo committed
      Page size isn't always 4k depending on arch and configuration.  Rename
      4k first chunk allocator to page.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
    • percpu, sparc64: fix sparse possible cpu map handling · 74d46d6b
      Tejun Heo committed
      percpu code has been assuming num_possible_cpus() == nr_cpu_ids which
      is incorrect if cpu_possible_map contains holes.  This causes percpu
      code to access beyond allocated memories and vmalloc areas.  On a
      sparc64 machine with cpus 0 and 2 (u60), this triggers the following
      warning or fails boot.
      
       WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240()
       Modules linked in:
       Call Trace:
        [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240
        [00000000004b1840] map_vm_area+0x20/0x60
        [00000000004b1950] __vmalloc_area_node+0xd0/0x160
        [0000000000593434] deflate_init+0x14/0xe0
        [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0
        [00000000005844f0] crypto_alloc_base+0x50/0xa0
        [000000000058b898] alg_test_comp+0x18/0x80
        [000000000058dad4] alg_test+0x54/0x180
        [000000000058af00] cryptomgr_test+0x40/0x60
        [0000000000473098] kthread+0x58/0x80
        [000000000042b590] kernel_thread+0x30/0x60
        [0000000000472fd0] kthreadd+0xf0/0x160
       ---[ end trace 429b268a213317ba ]---
      
      This patch fixes generic percpu functions and sparc64
      setup_per_cpu_areas() so that they handle sparse cpu_possible_map
      properly.
      
      Please note that on x86, cpu_possible_map doesn't contain holes and
      thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause
      any behavior difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
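The mismatch behind this fix can be shown with a small sketch, assuming the possible-cpu set fits in a 64-bit mask. The helper names are hypothetical: `count_possible` plays the role of num_possible_cpus() and `id_limit` the role of nr_cpu_ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits: the analogue of num_possible_cpus(). */
static unsigned count_possible(uint64_t mask)
{
    unsigned n = 0;

    for (; mask; mask &= mask - 1) /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Highest set bit plus one: the analogue of nr_cpu_ids. */
static unsigned id_limit(uint64_t mask)
{
    unsigned n = 0;

    for (; mask; mask >>= 1)
        n++;
    return n;
}

/* Length an array needs when it is indexed directly by cpu id. */
static size_t per_cpu_array_len(uint64_t mask)
{
    unsigned limit = id_limit(mask);

    assert(limit >= count_possible(mask)); /* holes can only widen the gap */
    return limit;
}
```

For the machine in the commit message (cpus 0 and 2, mask 0x5) the count is 2 but the id limit is 3, so anything sized by the count and indexed by cpu id overruns on cpu 2.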
  7. 04 Jul, 2009 4 commits
    • percpu: teach large page allocator about NUMA · a530b795
      Tejun Heo committed
      Large page first chunk allocator is primarily used for NUMA machines;
      however, its NUMA handling is extremely simplistic.  Regardless of
      their proximity, each cpu is put into a separate large page just to
      return most of the allocated space back, wasting a large amount of
      vmalloc space and increasing cache footprint.
      
      This patch teaches NUMA details to large page allocator.  Given
      processor proximity information, pcpu_lpage_build_unit_map() will find
      fitting cpu -> unit mapping in which cpus in LOCAL_DISTANCE share the
      same large page and not too much virtual address space is wasted.
      
      This greatly reduces the unit and thus chunk size and wastes much less
      address space for the first chunk.  For example, on 4/4 NUMA machine,
      the original code occupied 16MB of virtual space for the first chunk
      while the new code only uses 4MB - one 2MB page for each node.
      
      [ Impact: much better space efficiency on NUMA machines ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Miller <davem@davemloft.net>
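The 16MB versus 4MB figures above can be reproduced with a short calculation. This sketch assumes "4/4" means two nodes with four cpus each, a reading chosen only because it matches both figures; the 256KB unit size used in the usage note and the function names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LPAGE_SIZE ((size_t)2 << 20) /* 2MB large page, as in the example */

/* Before: every cpu gets its own large page. */
static size_t first_chunk_per_cpu(size_t nr_cpus)
{
    return nr_cpus * LPAGE_SIZE;
}

/* After: cpus of one node share large pages; round up to whole pages per node. */
static size_t first_chunk_per_node(size_t nr_nodes, size_t cpus_per_node,
                                   size_t unit_size)
{
    size_t per_node = cpus_per_node * unit_size;
    size_t lpages = (per_node + LPAGE_SIZE - 1) / LPAGE_SIZE;

    assert(unit_size > 0);
    return nr_nodes * lpages * LPAGE_SIZE;
}
```

Under these assumptions, eight cpus at one large page each occupy 16MB, while two nodes whose four 256KB units fit in a single large page occupy 4MB.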
    • x86,percpu: generalize lpage first chunk allocator · 8c4bfc6e
      Tejun Heo committed
      Generalize and move x86 setup_pcpu_lpage() into
      pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
      around the generalized version.  Other than taking size parameters and
      using arch supplied callbacks to allocate/free/map memory,
      pcpu_lpage_first_chunk() is identical to the original implementation.
      
      This simplifies arch code and will help converting more archs to
      dynamic percpu allocator.
      
      While at it, factor out pcpu_calc_fc_sizes() which is common to
      pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
      
      [ Impact: code reorganization and generalization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86,percpu: generalize 4k first chunk allocator · d4b95f80
      Tejun Heo committed
      Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk().
      setup_pcpu_4k() now is a simple wrapper around the generalized
      version.  Other than taking size parameters and using arch supplied
      callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical
      to the original implementation.
      
      This simplifies arch code and will help converting more archs to
      dynamic percpu allocator.
      
      While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for
      consistency.
      
      [ Impact: code reorganization and generalization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • percpu: drop @unit_size from embed first chunk allocator · 788e5abc
      Tejun Heo committed
      The only extra feature @unit_size provides is making dead space at the
      end of the first chunk which doesn't have any valid usecase.  Drop the
      parameter.  This will increase consistency with generalized 4k
      allocator.
      
      James Bottomley spotted a missing conversion for the default
      setup_per_cpu_areas() which caused build breakage on all archs which
      use it.
      
      [ Impact: drop unused code path ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Ingo Molnar <mingo@elte.hu>
  8. 22 Jun, 2009 6 commits
    • x86: ensure percpu lpage doesn't consume too much vmalloc space · 0017c869
      Tejun Heo committed
      On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
      percpu first chunk allocator can consume too much of vmalloc space.
      Make it fall back to 4k allocator if the consumption goes over 20%.
      
      [ Impact: add sanity check for lpage percpu first chunk allocator ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
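The 20% sanity check described above amounts to a single comparison. The sketch below is an illustration only: `lpage_too_large` is a hypothetical name, and the 128MB vmalloc size in the usage note is an assumed figure, not one stated in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the first chunk would take more than 20% of vmalloc space. */
static bool lpage_too_large(size_t chunk_size, size_t vmalloc_total)
{
    assert(vmalloc_total > 0);
    /* chunk_size > total / 5 is the same test as chunk_size * 5 > total,
     * but cannot overflow on large sizes. */
    return chunk_size > vmalloc_total / 5;
}
```

For a 32-way machine with one 2MB large page per cpu, the 64MB first chunk exceeds a fifth of an assumed 128MB vmalloc area, so a caller would fall back to the 4k allocator.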
    • x86: implement percpu_alloc kernel parameter · fa8a7094
      Tejun Heo committed
      According to Andi, it isn't clear whether lpage allocator is worth the
      trouble as there are many processors where PMD TLB is far scarcer than
      PTE TLB.  The advantage or disadvantage probably depends on the actual
      size of percpu area and specific processor.  As performance
      degradation due to TLB pressure tends to be highly workload specific
      and subtle, it is difficult to decide which way to go without more
      data.
      
      This patch implements percpu_alloc kernel parameter to allow selecting
      which first chunk allocator to use to ease debugging and testing.
      
      While at it, make sure all the failure paths report why something
      failed to help determining why certain allocator isn't working.  Also,
      kill the "Great future plan" comment which had already been realized
      quite some time ago.
      
      [ Impact: allow explicit percpu first chunk allocator selection ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Tejun Heo committed
      lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages is changed, the two aliases point to overlapping regions with
      different attributes, which isn't allowed and is known to cause
      subtle data corruption in certain cases.
      
      This can be handled in a similar manner to the x86_64 highmap alias.
      pageattr code should detect if the target pages have PMD alias and
      split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
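The lookup that pcpu_lpage_remapped() is described as doing, a binary search over a sorted array of remapped pages, might look roughly like the sketch below. The record layout and the names `remap_ent` and `lookup_alias` are hypothetical; the real function's signature and data structure are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one remapped large page. */
struct remap_ent {
    uintptr_t start; /* first address of the original range */
    size_t size;     /* bytes covered by the range */
    uintptr_t alias; /* where the same memory is also mapped */
};

/*
 * ents[] must be sorted by start in ascending order and must not overlap.
 * Returns the alias address for addr, or 0 when addr is not remapped.
 */
static uintptr_t lookup_alias(const struct remap_ent *ents, size_t n,
                              uintptr_t addr)
{
    size_t lo = 0, hi = n; /* search window is [lo, hi) */

    assert(ents != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct remap_ent *e = &ents[mid];

        if (addr < e->start)
            hi = mid;               /* target lies in the lower half */
        else if (addr - e->start >= e->size)
            lo = mid + 1;           /* target lies in the upper half */
        else
            return e->alias + (addr - e->start);
    }
    return 0;
}
```

Keeping the array sorted is what makes the logarithmic lookup possible, which matters because the check runs on every page attribute change.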
    • x86: prepare setup_pcpu_lpage() for pageattr fix · 0ff2587f
      Tejun Heo committed
      Make the following changes in preparation of coming pageattr updates.
      
      * Define and use array of struct pcpul_ent instead of array of
        pointers.  The only difference is ->cpu field which is set but
        unused yet.
      
      * Rename variables according to the above change.
      
      * Rename local variable vm to pcpul_vm and move it out of the
        function.
      
      [ Impact: no functional difference ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86: rename remap percpu first chunk allocator to lpage · 97c9bf06
      Tejun Heo committed
      The "remap" allocator remaps large pages to build the first chunk;
      however, the name isn't very good because 4k allocator remaps too and
      the whole point of the remap allocator is using large page mapping.
      The allocator will be generalized and exported outside of x86, rename
      it to lpage before that happens.
      
      percpu_alloc kernel parameter is updated to accept both "remap" and
      "lpage" for lpage allocator.
      
      [ Impact: code cleanup, kernel parameter argument updated ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
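Accepting both "remap" and "lpage" for the same allocator could be handled by a parser along these lines. This is a sketch only: the enum, the function name and the "embed" and "4k" strings are assumptions made for illustration; the log confirms only that "remap" and "lpage" both select the lpage allocator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical allocator ids; the real constants are not shown in this log. */
enum fc_choice { FC_EMBED, FC_LPAGE, FC_4K, FC_INVALID };

/* Map a percpu_alloc= value to an allocator, keeping "remap" as an alias. */
static enum fc_choice parse_percpu_alloc(const char *arg)
{
    assert(arg != NULL);
    if (!strcmp(arg, "embed"))  /* assumed spelling */
        return FC_EMBED;
    if (!strcmp(arg, "lpage") || !strcmp(arg, "remap"))
        return FC_LPAGE;
    if (!strcmp(arg, "4k"))     /* assumed spelling */
        return FC_4K;
    return FC_INVALID;
}
```

Keeping the old spelling as an alias means existing boot command lines continue to work after the rename.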
    • x86: fix duplicate free in setup_pcpu_remap() failure path · c5806df9
      Tejun Heo committed
      In the failure path, setup_pcpu_remap() tries to free the area which
      has already been freed to make holes in the large page.  Fix it.
      
      [ Impact: fix duplicate free in failure path ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
  9. 25 May, 2009 1 commit
  10. 18 May, 2009 1 commit
    • x86: fix system without memory on node0 · 35d5a9a6
      Yinghai Lu committed
      Jack found a boot crash on a system which doesn't have memory on node0.
      
      It turns out that with the recent per_cpu changes, node_number for the
      BSP will always be 0, which is inconsistent with cpu_to_node(), which
      might already have set it to a different (nearer) node.
      
      This happens when numa_set_node() for node0 is called early, before
      the per_cpu area is set up.
      
      Two places touch per_cpu(node_number,):
      
      1. in cpu/common.c::cpu_init() and it is not for BP
      | #ifdef CONFIG_NUMA
      |        if (cpu != 0 && percpu_read(node_number) == 0 &&
      |            cpu_to_node(cpu) != NUMA_NO_NODE)
      |                percpu_write(node_number, cpu_to_node(cpu));
      | #endif
      for BP: traps_init ==> cpu_init
      for AP: start_secondary ==> cpu_init
      
      2. cpu/intel.c or amd.c::srat_detect_node via numa_set_node()
      for BP: check_bugs ==> identify_boot_cpu ==> identify_cpu()
      	 that is rather later before numa_node_id() is used for BP...
      for AP: start_secondary => smp_callin => smp_store_cpu_info() =>
      	=> identify_secondary_cpu => identify_cpu()
      
      So try to set that for the BP earlier in setup_per_cpu_areas(), and
      don't bother to set it for APs there (it will be updated later and
      used later).
      
      (And don't disturb the 0 before copying the BP per_cpu data to APs.)
      
      [ Impact: fix boot crash on memoryless node-0 ]
      Reported-and-tested-by: Jack Steiner <steiner@sgi.com>
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4A0C4A02.7050401@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 02 Apr, 2009 2 commits
  12. 10 Mar, 2009 2 commits
    • percpu: generalize embedding first chunk setup helper · 66c3a757
      Tejun Heo committed
      Impact: code reorganization
      
      Separate out embedding first chunk setup helper from x86 embedding
      first chunk allocator and put it in mm/percpu.c.  This will be used by
      the default percpu first chunk allocator and possibly by other archs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk() · 6074d5b0
      Tejun Heo committed
      Impact: cleanup, more flexibility for first chunk init
      
      Non-negative @dyn_size used to be allowed iff @unit_size wasn't auto.
      This restriction stemmed from implementation detail and made things a
      bit less intuitive.  This patch allows @dyn_size to be specified
      regardless of @unit_size and swaps the positions of @dyn_size and
      @unit_size so that the parameter order makes more sense (static,
      reserved and dyn sizes followed by enclosing unit_size).
      
      While at it, add @unit_size >= PCPU_MIN_UNIT_SIZE sanity check.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  13. 06 Mar, 2009 4 commits
    • x86, percpu: setup reserved percpu area for x86_64 · 6b19b0c2
      Tejun Heo committed
      Impact: fix relocation overflow during module load
      
      x86_64 uses 32bit relocations for symbol access and static percpu
      symbols, whether in core or modules, must be inside 2GB of the percpu
      segment base, which the dynamic percpu allocator doesn't guarantee.
      This patch makes x86_64 reserve PERCPU_MODULE_RESERVE bytes in the
      first chunk so that module percpu areas are always allocated from the
      first chunk which is always inside the relocatable range.
      
      This problem exists for any percpu allocator but is easily triggered
      when using the embedding allocator because the second chunk is located
      beyond 2GB on it.
      
      This patch also changes the meaning of PERCPU_DYNAMIC_RESERVE such
      that it only indicates the size of the area to reserve for dynamic
      allocation as static and dynamic areas can be separate.  The new
      PERCPU_DYNAMIC_RESERVE is increased by 4k for both 32 and 64bits as
      the reserved area separation eats away some allocatable space and
      having slightly more headroom (currently between 4 and 8k after
      minimal boot sans module area) makes sense for common case
      performance.
      
      x86_32 can address anywhere from anywhere and doesn't need reserving.
      
      Mike Galbraith first reported the problem and bisected it to the
      embedding percpu allocator commit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
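The 2GB constraint above follows from a 32-bit relocation holding a signed 32-bit displacement. A minimal sketch of that range check, assuming the displacement is measured from the percpu segment base and using the hypothetical name `within_reloc_range`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when sym can be encoded as a signed 32-bit displacement from base. */
static bool within_reloc_range(uint64_t base, uint64_t sym)
{
    /* Unsigned subtraction wraps; reinterpreting as signed gives the delta. */
    int64_t delta = (int64_t)(sym - base);

    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

A chunk placed more than 2GB away from the base fails this test, which is why module percpu variables are kept in a reserved part of the first chunk.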
    • percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo committed
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, arch can ask to set aside certain number
      of bytes right after the core static area which is available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range to ensure that the module percpu symbols are inside
      the relocatable range.
      
      If reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called static first chunk and the second dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If arch doesn't setup reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: make embedding percpu allocator return excessive free space · 9a4f8a87
      Tejun Heo committed
      Impact: reduce unnecessary memory usage on certain configurations
      
      Embedding percpu allocator allocates unit_size *
      smp_num_possible_cpus() bytes consecutively and uses them for the
      first chunk.  However, if the static area is small, this can result in
      excessive preallocated free space in the first chunk due to the
      PCPU_MIN_UNIT_SIZE restriction.
      
      This patch makes embedding percpu allocator preallocate only what's
      necessary as described by PERCPU_DYNAMIC_RESERVE and return the
      leftover to the bootmem allocator.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Tejun Heo committed
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
      dynamic reserve size is valid.  Also, if arch @dyn_size is calculated
      from other parameters, it might end up passing in 0 @dyn_size and
      malfunction when the size is automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and uses -1 for
      auto sizing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
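The reason for switching from 0 to a negative sentinel can be seen in a small sketch: with a signed size, 0 stays a legitimate request while any negative value asks for auto-sizing. The helper name `resolve_size` and the way the automatic value is supplied are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Resolve a size argument where a negative value requests auto-sizing. */
static size_t resolve_size(ssize_t requested, size_t auto_value)
{
    return requested < 0 ? auto_value : (size_t)requested;
}
```

A caller that computes a dynamic size of 0 now gets exactly 0 instead of silently triggering automatic sizing.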
  14. 25 Feb, 2009 1 commit
  15. 24 Feb, 2009 5 commits
    • x86: add remapping percpu first chunk allocator · 8ac83757
      Tejun Heo committed
      Impact: add better first percpu allocation for NUMA
      
      On NUMA, embedding allocator can't be used as different units can't be
      made to fall in the correct NUMA nodes.  To use large page mapping,
      each unit needs to be remapped.  However, percpu areas are usually
      much smaller than large page size and unused space hurts a lot as the
      number of cpus grows.  This allocator remaps large pages for each chunk
      but gives back unused part to the bootmem allocator making the large
      pages mapped twice.
      
      This adds slightly to the TLB pressure but is much better than using
      4k mappings while still being NUMA-friendly.
      
      Ingo suggested that this would be the correct approach for NUMA.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
    • x86: add embedding percpu first chunk allocator · 89c92151
      Tejun Heo committed
      Impact: add better first percpu allocation for !NUMA
      
      On !NUMA, we can simply allocate contiguous memory and use it for the
      first chunk without mapping it into vmalloc area.  As the memory area
      is covered by the large page physical memory mapping, it allows the
      dynamic percpu allocator to not add any TLB overhead for the static
      percpu area and whatever falls into the first chunk and the
      implementation is very simple too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() · 5f5d8405
      Tejun Heo committed
      Impact: modularize percpu first chunk allocation
      
      x86 is gonna have a few different strategies for the first chunk
      allocation.  Modularize it by separating out the current allocation
      mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo committed
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching page
        pointer by callback and adding optional @unit_size, @free_size and
        @base_addr arguments which allow archs to selectively perform parts
        of chunk initialization to their liking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: update populate_extra_pte() and add populate_extra_pmd() · 458a3e64
      Tejun Heo committed
      Impact: minor change to populate_extra_pte() and addition of pmd flavor
      
      Update populate_extra_pte() to return pointer to the pte_t for the
      specified address and add populate_extra_pmd() which only populates
      till the pmd and returns pointer to the pmd entry for the address.
      
      For 64bit, pud/pmd/pte fill functions are separated out from
      set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
      populate_extra_{pte|pmd}().
      Signed-off-by: Tejun Heo <tj@kernel.org>