1. 16 November 2017, 1 commit
    • mm: remove __GFP_COLD · 453f85d4
      Mel Gorman authored
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Judging from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter which simplifies a number of paths in the page
      allocator.
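
      As a sketch of what the removal means for callers (a hypothetical
      driver-style call site; the flag simply disappears from the gfp
      mask):

          struct page *page;

          /* before: caller hinted it wanted a cache-cold page */
          page = alloc_pages(GFP_KERNEL | __GFP_COLD, 0);

          /* after: no hot/cold hint exists; the allocator decides */
          page = alloc_pages(GFP_KERNEL, 0);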
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no significant difference, which is not surprising
      given that the __GFP_COLD branches are a minuscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 29 June 2017, 1 commit
  3. 21 June 2017, 2 commits
  4. 07 March 2017, 1 commit
  5. 03 September 2014, 4 commits
    • percpu: move region iterations out of pcpu_[de]populate_chunk() · a93ace48
      Tejun Heo authored
      Previously, pcpu_[de]populate_chunk() was called with a range that may
      contain multiple target regions, and it iterated over those regions
      itself.  This has the benefit of batching up cache flushes for all the
      regions; however, we're planning to add more bookkeeping logic around
      [de]population to support atomic allocations, and this delegation of
      the iteration gets in the way.
      
      This patch moves the region iterations out of
      pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
      pcpu_reclaim() - so that we can later add logic to track more states
      around them.  This change may make cache and tlb flushes more frequent
      but multi-region [de]populations are rare anyway and if this actually
      becomes a problem, it's not difficult to factor out cache flushes as
      separate callbacks which are directly invoked from percpu.c.
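
      A rough sketch of the caller-side loop this introduces in
      pcpu_alloc() (simplified; pcpu_for_each_unpop_region() is the
      unpopulated-region iterator in mm/percpu.c, and error handling
      is elided):

          int rs, re, ret;        /* region start/end page indexes */

          pcpu_for_each_unpop_region(chunk, rs, re, page_start, page_end) {
                  /* populate one contiguous unpopulated region at a time */
                  ret = pcpu_populate_chunk(chunk, rs, re);
                  if (ret)
                          goto fail_unlock;
          }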
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: move common parts out of pcpu_[de]populate_chunk() · dca49645
      Tejun Heo authored
      percpu-vm and percpu-km implement separate versions of
      pcpu_[de]populate_chunk(), and some parts which are, or should be,
      common currently live in the specific implementations.  Make the
      following changes.
      
      * Allocated area clearing is moved from the pcpu_populate_chunk()
        implementations to pcpu_alloc().  This makes percpu-km's version a
        noop.
      
      * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
        to their respective callers so that they are applied to percpu-km
        too.  This doesn't make any meaningful difference as both functions
        are noops for percpu-km; however, this is more consistent and will
        help in implementing atomic allocation support.
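
      A minimal sketch of the clearing now done in pcpu_alloc() (assuming
      @off and @size describe the allocated area; simplified from the
      shared path):

          unsigned int cpu;

          /* zero the allocated area in every cpu's unit of the chunk */
          for_each_possible_cpu(cpu)
                  memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off,
                         0, size);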
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: remove @may_alloc from pcpu_get_pages() · cdb4cba5
      Tejun Heo authored
      pcpu_get_pages() creates the temp pages array if not already allocated
      and returns the pointer to it.  As the function is called from both
      [de]population paths and depopulation can only happen after at least
      one successful population, the param doesn't make any difference - the
      allocation will always happen on the population path anyway.
      
      Remove @may_alloc from pcpu_get_pages().  Also, add a lockdep
      assertion on pcpu_alloc_mutex instead of vaguely stating that the
      exclusion is the caller's responsibility.
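
      Roughly, the resulting helper looks like this sketch (a static
      pages array shared by both paths; details may differ from the
      actual patch):

          static struct page **pcpu_get_pages(struct pcpu_chunk *chunk)
          {
                  static struct page **pages;
                  size_t pages_size = pcpu_nr_units * pcpu_unit_pages *
                                      sizeof(pages[0]);

                  /* replaces the old "caller must exclude" comment */
                  lockdep_assert_held(&pcpu_alloc_mutex);

                  if (!pages)
                          pages = pcpu_mem_zalloc(pages_size);
                  return pages;
          }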
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: remove the usage of separate populated bitmap in percpu-vm · fbbb7f4e
      Tejun Heo authored
      percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
      and populated bitmap and uses the two during [de]population.  The temp
      bitmap is used only to build the new bitmap that is copied to
      chunk->populated after the operation succeeds; however, the new bitmap
      can be trivially set after success without using the temp bitmap.
      
      This patch removes the temp populated bitmap usage from percpu-vm.c.
      
      * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
        longer hands out the temp bitmap.
      
      * @populated argument is dropped from all the related functions.
        @populated updates in pcpu_[un]map_pages() are dropped.
      
      * Two loops in pcpu_map_pages() are merged.
      
      * pcpu_[de]populate_chunk() modify chunk->populated bitmap directly
        from @page_start and @page_end after success.
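
      The direct bitmap updates amount to the following sketch
      (bitmap_set()/bitmap_clear() on the chunk's populated map):

          /* populate succeeded: mark [page_start, page_end) populated */
          bitmap_set(chunk->populated, page_start, page_end - page_start);

          /* depopulate: clear the same range */
          bitmap_clear(chunk->populated, page_start, page_end - page_start);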
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Christoph Lameter <cl@linux.com>
  6. 16 August 2014, 2 commits
    • percpu: perform tlb flush after pcpu_map_pages() failure · 849f5169
      Tejun Heo authored
      If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
      Currently, it doesn't flush the tlb after the partial unmapping.  This
      may be okay in most cases, as the established mapping hasn't been used
      at that point, but it can go wrong, and when it goes wrong it'd be
      extremely difficult to track down.
      
      Flush tlb after the partial unmapping.
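
      A sketch of the fixed failure path in pcpu_map_pages() (a hedged
      reconstruction; @cpu is the cpu whose mapping failed, and the
      surrounding function body is elided):

          err:
                  for_each_possible_cpu(tcpu) {
                          if (tcpu == cpu)
                                  break;
                          __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu,
                                                             page_start),
                                             page_end - page_start);
                  }
                  /* the partial mappings may have hit the tlb; flush them */
                  pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
                  return err;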
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
    • percpu: fix pcpu_alloc_pages() failure path · f0d27965
      Tejun Heo authored
      When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
      free what has already been allocated.  The invocation is across the
      whole requested range and pcpu_free_pages() will try to free all
      non-NULL pages; unfortunately, this is incorrect as
      pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
      clear the pages array, and thus the array may have entries from
      previous invocations, making the partial failure path free incorrect
      pages.
      
      Fix it by open-coding the partial freeing of the already allocated
      pages.
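
      The open-coded partial freeing looks roughly like this sketch
      (@cpu and @i are where the allocation loop stopped; context
      outside the error label is elided):

          err:
                  /* free the pages allocated for the failing cpu so far */
                  while (--i >= page_start)
                          __free_page(pages[pcpu_page_idx(cpu, i)]);

                  /* and the full range for every cpu completed before it */
                  for_each_possible_cpu(tcpu) {
                          if (tcpu == cpu)
                                  break;
                          for (i = page_start; i < page_end; i++)
                                  __free_page(pages[pcpu_page_idx(tcpu, i)]);
                  }
                  return -ENOMEM;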
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  7. 21 June 2012, 1 commit
  8. 21 January 2012, 1 commit
  9. 23 November 2011, 2 commits
    • percpu: fix chunk range calculation · a855b84c
      Tejun Heo authored
      The percpu allocator recorded the cpus which map to the first and
      last units in pcpu_first/last_unit_cpu respectively and used them to
      determine the address range of a chunk - e.g. it assumed that the
      first unit has the lowest address in a chunk while the last unit has
      the highest address.
      
      This simply isn't true.  Groups in a chunk can have arbitrary positive
      or negative offsets from the previous one and there is no guarantee
      that the first unit occupies the lowest offset while the last one the
      highest.
      
      Fix it by actually comparing unit offsets to determine the cpus
      occupying the lowest and highest offsets.  Also, rename
      pcpu_first/last_unit_cpu to pcpu_low/high_unit_cpu to avoid confusion.
      
      The chunk address range is used to flush the cache on vmalloc area
      map/unmap and to decide whether a given address is in the first chunk
      by per_cpu_ptr_to_phys(); the bug was discovered through an invalid
      per_cpu_ptr_to_phys() translation for crash_note.
      
      Kudos to Dave Young for tracking down the problem.
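
      The offset comparison boils down to the following sketch, run for
      each cpu during first-chunk setup (assumed shape; NR_CPUS marks
      "not yet set"):

          if (pcpu_low_unit_cpu == NR_CPUS ||
              unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
                  pcpu_low_unit_cpu = cpu;
          if (pcpu_high_unit_cpu == NR_CPUS ||
              unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
                  pcpu_high_unit_cpu = cpu;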
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: WANG Cong <xiyou.wangcong@gmail.com>
      Reported-by: Dave Young <dyoung@redhat.com>
      Tested-by: Dave Young <dyoung@redhat.com>
      LKML-Reference: <4EC21F67.10905@redhat.com>
      Cc: stable@kernel.org
    • percpu: rename pcpu_mem_alloc to pcpu_mem_zalloc · 90459ce0
      Bob Liu authored
      Currently pcpu_mem_alloc() is implemented to always return zeroed
      memory.  Rename it to pcpu_mem_zalloc() so that users like
      pcpu_get_pages_and_bitmap() know they don't need to reinitialize the
      returned memory.
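
      For reference, a sketch of the renamed helper (zeroing via
      kzalloc()/vzalloc() depending on size; details may differ from
      the actual implementation):

          static void *pcpu_mem_zalloc(size_t size)
          {
                  if (WARN_ON_ONCE(!slab_is_available()))
                          return NULL;

                  /* both branches return zero-filled memory */
                  if (size <= PAGE_SIZE)
                          return kzalloc(size, GFP_KERNEL);
                  else
                          return vzalloc(size);
          }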
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 14 January 2011, 1 commit
  11. 01 May 2010, 1 commit
    • percpu: move vmalloc based chunk management into percpu-vm.c · 9f645532
      Tejun Heo authored
      Separate out and move chunk management (creation/destruction and
      [de]population) code into percpu-vm.c which is included by percpu.c
      and compiled together.  The interface for chunk management is defined
      as follows.
      
       * pcpu_populate_chunk		- populate the specified range of a chunk
       * pcpu_depopulate_chunk	- depopulate the specified range of a chunk
       * pcpu_create_chunk		- create a new chunk
       * pcpu_destroy_chunk		- destroy a chunk, always preceded by full depop
       * pcpu_addr_to_page		- translate address to physical page
       * pcpu_verify_alloc_info	- check alloc_info is acceptable during init
      
      Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and
      dummy pcpu_verify_alloc_info() implementation, this patch only moves
      code around.  This separation is to allow alternate chunk management
      implementation.
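
      As a sketch, the interface amounts to these declarations (signatures
      assumed from the descriptions above and the conventions of that era):

          static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
                                         int off, int size);
          static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk,
                                            int off, int size);
          static struct pcpu_chunk *pcpu_create_chunk(void);
          static void pcpu_destroy_chunk(struct pcpu_chunk *chunk);
          static struct page *pcpu_addr_to_page(void *addr);
          static int __init pcpu_verify_alloc_info(
                          const struct pcpu_alloc_info *ai);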
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>