1. 29 9月, 2009 1 次提交
    • T
      sparc64: implement page mapping percpu first chunk allocator · a70c6913
      Tejun Heo 提交于
      Implement page mapping percpu first chunk allocator as a fallback to
      the embedding allocator.  The next patch will make the embedding
      allocator check distances between units to determine whether it fits
      within the vmalloc area so that this fallback can be used on such
      cases.
      
      sparc64 currently has relatively small vmalloc area which makes it
      impossible to create any dynamic chunks on certain configurations
      leading to percpu allocation failures.  This and the next patch should
      allow those configurations to keep working until proper solution is
      found.
      
      While at it, mark pcpu_cpu_distance() with __init.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      a70c6913
  2. 14 8月, 2009 4 次提交
    • T
      sparc64: use embedding percpu first chunk allocator · bcb2107f
      Tejun Heo 提交于
      sparc64 currently allocates a large page for each cpu and partially
      remap them into vmalloc area much like what lpage first chunk
      allocator did.  As a 4M page is used for each cpu, this results in
      very large unit size and also adds TLB pressure due to the double
      mapping of pages in the first chunk.
      
      This patch converts sparc64 to use the embedding percpu first chunk
      allocator which now knows how to handle NUMA configurations.  This
      simplifies the code a lot, doesn't incur any extra TLB pressure and
      results in better utilization of address space.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      bcb2107f
    • T
      percpu: add pcpu_unit_offsets[] · fb435d52
      Tejun Heo 提交于
      Currently units are mapped sequentially into address space.  This
      patch adds pcpu_unit_offsets[] which allows units to be mapped to
      arbitrary offsets from the chunk base address.  This is necessary to
      allow sparse embedding which might would need to allocate address
      ranges and memory areas which aren't aligned to unit size but
      allocation atom size (page or large page size).  This also simplifies
      things a bit by removing the need to calculate offset from unit
      number.
      
      With this change, there's no need for the arch code to know
      pcpu_unit_size.  Update pcpu_setup_first_chunk() and first chunk
      allocators to return regular 0 or -errno return code instead of unit
      size or -errno.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      fb435d52
    • T
      percpu: introduce pcpu_alloc_info and pcpu_group_info · fd1e8a1f
      Tejun Heo 提交于
      Till now, non-linear cpu->unit map was expressed using an integer
      array which maps each cpu to a unit and used only by lpage allocator.
      Although how many units have been placed in a single contiguos area
      (group) is known while building unit_map, the information is lost when
      the result is recorded into the unit_map array.  For lpage allocator,
      as all allocations are done by lpages and whether two adjacent lpages
      are in the same group or not is irrelevant, this didn't cause any
      problem.  Non-linear cpu->unit mapping will be used for sparse
      embedding and this grouping information is necessary for that.
      
      This patch introduces pcpu_alloc_info which contains all the
      information necessary for initializing percpu allocator.
      pcpu_alloc_info contains array of pcpu_group_info which describes how
      units are grouped and mapped to cpus.  pcpu_group_info also has
      base_offset field to specify its offset from the chunk's base address.
      pcpu_build_alloc_info() initializes this field as if all groups are
      allocated back-to-back as is currently done but this will be used to
      sparsely place groups.
      
      pcpu_alloc_info is a rather complex data structure which contains a
      flexible array which in turn points to nested cpu_map arrays.
      
      * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to
        help dealing with pcpu_alloc_info.
      
      * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info,
        generalized and renamed to pcpu_build_alloc_info().
        @cpu_distance_fn may be NULL indicating that all cpus are of
        LOCAL_DISTANCE.
      
      * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info,
        generalized and renamed to pcpu_dump_alloc_info().  It now also
        prints which group each alloc unit belongs to.
      
      * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the
        separate parameters.  All first chunk allocators are updated to use
        pcpu_build_alloc_info() to build alloc_info and call
        pcpu_setup_first_chunk() with it.  This has the side effect of
        packing units for sparse possible cpus.  ie. if cpus 0, 2 and 4 are
        possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4.
      
      * x86 setup_pcpu_lpage() is updated to deal with alloc_info.
      
      * sparc64 setup_per_cpu_areas() is updated to build alloc_info.
      
      Although the changes made by this patch are pretty pervasive, it
      doesn't cause any behavior difference other than packing of sparse
      cpus.  It mostly changes how information is passed among
      initialization functions and makes room for more flexibility.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      fd1e8a1f
    • T
      percpu, sparc64: fix sparse possible cpu map handling · 74d46d6b
      Tejun Heo 提交于
      percpu code has been assuming num_possible_cpus() == nr_cpu_ids which
      is incorrect if cpu_possible_map contains holes.  This causes percpu
      code to access beyond allocated memories and vmalloc areas.  On a
      sparc64 machine with cpus 0 and 2 (u60), this triggers the following
      warning or fails boot.
      
       WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240()
       Modules linked in:
       Call Trace:
        [00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240
        [00000000004b1840] map_vm_area+0x20/0x60
        [00000000004b1950] __vmalloc_area_node+0xd0/0x160
        [0000000000593434] deflate_init+0x14/0xe0
        [0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0
        [00000000005844f0] crypto_alloc_base+0x50/0xa0
        [000000000058b898] alg_test_comp+0x18/0x80
        [000000000058dad4] alg_test+0x54/0x180
        [000000000058af00] cryptomgr_test+0x40/0x60
        [0000000000473098] kthread+0x58/0x80
        [000000000042b590] kernel_thread+0x30/0x60
        [0000000000472fd0] kthreadd+0xf0/0x160
       ---[ end trace 429b268a213317ba ]---
      
      This patch fixes generic percpu functions and sparc64
      setup_per_cpu_areas() so that they handle sparse cpu_possible_map
      properly.
      
      Please note that on x86, cpu_possible_map() doesn't contain holes and
      thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause
      any behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      74d46d6b
  3. 04 7月, 2009 3 次提交
    • T
      percpu: allow non-linear / sparse cpu -> unit mapping · 2f39e637
      Tejun Heo 提交于
      Currently cpu and unit are always identity mapped.  To allow more
      efficient large page support on NUMA and lazy allocation for possible
      but offline cpus, cpu -> unit mapping needs to be non-linear and/or
      sparse.  This can be easily implemented by adding a cpu -> unit
      mapping array and using it whenever looking up the matching unit for a
      cpu.
      
      The only unusal conversion is in pcpu_chunk_addr_search().  The passed
      in address is unit0 based and unit0 might not be in use so it needs to
      be converted to address of an in-use unit.  This is easily done by
      adding the unit offset for the current processor.
      
      [ Impact: allows non-linear/sparse cpu -> unit mapping, no visible change yet ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      2f39e637
    • T
      percpu: drop pcpu_chunk->page[] · ce3141a2
      Tejun Heo 提交于
      percpu core doesn't need to tack all the allocated pages.  It needs to
      know whether certain pages are populated and a way to reverse map
      address to page when freeing.  This patch drops pcpu_chunk->page[] and
      use populated bitmap and vmalloc_to_page() lookup instead.  Using
      vmalloc_to_page() exclusively is also possible but complicates first
      chunk handling, inflates cache footprint and prevents non-standard
      memory allocation for percpu memory.
      
      pcpu_chunk->page[] was used to track each page's allocation and
      allowed asymmetric population which happens during failure path;
      however, with single bitmap for all units, this is no longer possible.
      Bite the bullet and rewrite (de)populate functions so that things are
      done in clearly separated steps such that asymmetric population
      doesn't happen.  This makes the (de)population process much more
      modular and will also ease implementing non-standard memory usage in
      the future (e.g. large pages).
      
      This makes @get_page_fn parameter to pcpu_setup_first_chunk()
      unnecessary.  The parameter is dropped and all first chunk helpers are
      updated accordingly.  Please note that despite the volume most changes
      to first chunk helpers are symbol renames for variables which don't
      need to be referenced outside of the helper anymore.
      
      This change reduces memory usage and cache footprint of pcpu_chunk.
      Now only #unit_pages bits are necessary per chunk.
      
      [ Impact: reduced memory usage and cache footprint for bookkeeping ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      ce3141a2
    • T
      percpu: simplify pcpu_setup_first_chunk() · 38a6be52
      Tejun Heo 提交于
      Now that all first chunk allocator helpers allocate and map the first
      chunk themselves, there's no need to have optional default alloc/map
      in pcpu_setup_first_chunk().  Drop @populate_pte_fn and only leave
      @dyn_size optional and make all other params mandatory.
      
      This makes it much easier to follow what pcpu_setup_first_chunk() is
      doing and what actual differences tweaking each parameter results in.
      
      [ Impact: drop unused code path ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      38a6be52
  4. 16 6月, 2009 8 次提交
  5. 15 4月, 2009 1 次提交
  6. 27 3月, 2009 1 次提交
    • D
      sparc64: Fix MM refcount check in smp_flush_tlb_pending(). · f9384d41
      David S. Miller 提交于
      As explained by Benjamin Herrenschmidt:
      
      > CPU 0 is running the context, task->mm == task->active_mm == your
      > context. The CPU is in userspace happily churning things.
      >
      > CPU 1 used to run it, not anymore, it's now running fancyfsd which
      > is a kernel thread, but current->active_mm still points to that
      > same context.
      >
      > Because there's only one "real" user, mm_users is 1 (but mm_count is
      > elevated, it's just that the presence on CPU 1 as active_mm has no
      > effect on mm_count().
      >
      > At this point, fancyfsd decides to invalidate a mapping currently mapped
      > by that context, for example because a networked file has changed
      > remotely or something like that, using unmap_mapping_ranges().
      >
      > So CPU 1 goes into the zapping code, which eventually ends up calling
      > flush_tlb_pending(). Your test will succeed, as current->active_mm is
      > indeed the target mm for the flush, and mm_users is indeed 1. So you
      > will -not- send an IPI to the other CPU, and CPU 0 will continue happily
      > accessing the pages that should have been unmapped.
      
      To fix this problem, check ->mm instead of ->active_mm, and this
      means:
      
      > So if you test current->mm, you effectively account for mm_users == 1,
      > so the only way the mm can be active on another processor is as a lazy
      > mm for a kernel thread. So your test should work properly as long
      > as you don't have a HW that will do speculative TLB reloads into the
      > TLB on that other CPU (and even if you do, you flush-on-switch-in should
      > get rid of any crap here).
      
      And therefore we should be OK.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9384d41
  7. 16 3月, 2009 3 次提交
  8. 07 1月, 2009 1 次提交
    • S
      sparc64: Use unsigned long long for u64. · 90181136
      Sam Ravnborg 提交于
      Andrew Morton wrote:
      
          People keep on doing
      
                  printk("%llu", some_u64);
      
          testing it only on x86_64 and this generates a warning storm on
          powerpc, sparc64, etc.  Because they use `long', not `long long'.
      
          Quite a few 64-bit architectures are using `long' for their
          s64/u64 types.  We should convert them all to `long long'.
      
      Update types.h so we use unsigned long long for u64 and
      fix all warnings in sparc64 code.
      Tested with an allnoconfig, defconfig and allmodconfig builds.
      
      This patch introduces additional warnings in several drivers.
      These will be dealt with in separate patches.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90181136
  9. 08 12月, 2008 1 次提交
  10. 05 12月, 2008 3 次提交
  11. 01 12月, 2008 1 次提交
  12. 13 10月, 2008 1 次提交
  13. 03 9月, 2008 2 次提交
  14. 10 8月, 2008 1 次提交
  15. 05 8月, 2008 9 次提交