  1. 23 Sep 2016 (1 commit)
  2. 09 Jun 2016 (3 commits)
  3. 05 Nov 2014 (2 commits)
  4. 18 Jun 2014 (2 commits)
    • percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations · 6fbc07bb
      Authored by Tejun Heo
      __verify_pcpu_ptr() is used to verify that a specified parameter is
      actually a percpu pointer by percpu accessor and operation
      implementations.  Currently, where it's called isn't clearly defined
      and we just ensure that it's invoked at least once for all accessors
      and operations.
      
      The lack of clarity on when it should be called isn't nice and given
      that this is a completely generic issue, there's no reason to make
      archs worry about it.
      
      This patch updates __verify_pcpu_ptr() invocations such that it's
      always invoked from the final generic wrapper once per access or
      operation.  As this is already the case for {raw|this}_cpu_*()
      definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr()
      accessors need to be updated.
      
      This change makes it unnecessary for archs to worry about
      __verify_pcpu_ptr().  x86's arch_raw_cpu_ptr() is updated accordingly.
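      
      A minimal sketch of the resulting shape (simplified from the generic
      percpu headers; the __percpu marker is a sparse annotation):
      
        /* compile-time check: assigning anything that is not a percpu
         * pointer to a __percpu-annotated pointer makes sparse complain */
        #define __verify_pcpu_ptr(ptr)                                   \
        do {                                                             \
        	const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL; \
        	(void)__vpp_verify;                                      \
        } while (0)
      
        /* the check lives in the final generic wrapper, exactly once */
        #define per_cpu_ptr(ptr, cpu)                                    \
        ({                                                               \
        	__verify_pcpu_ptr(ptr);                                  \
        	SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)));          \
        })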
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • percpu: introduce arch_raw_cpu_ptr() · bbc344e1
      Authored by Tejun Heo
      Currently, archs can override raw_cpu_ptr() directly; however, we
      want to build a layer of indirection in the generic part of percpu so
      that we can implement generic features there without affecting archs.
      
      Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by
      generic percpu code.  The two are identical for now.  x86 is currently
      the only arch which overrides raw_cpu_ptr() and is converted to
      define arch_raw_cpu_ptr() instead.
      
      This doesn't introduce any functional difference.
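      
      A minimal sketch of the layering (simplified names; the generic
      fallback shown is illustrative):
      
        /* arch override point: x86 supplies its own arch_raw_cpu_ptr() */
        #ifndef arch_raw_cpu_ptr
        #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
        #endif
      
        /* generic percpu code now owns raw_cpu_ptr() itself, so generic
         * features can be layered here without touching arch code */
        #define raw_cpu_ptr(ptr) arch_raw_cpu_ptr(ptr)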
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
  5. 08 Apr 2014 (1 commit)
    • percpu: add raw_cpu_ops · b3ca1c10
      Authored by Christoph Lameter
      The kernel has never been audited to ensure that this_cpu operations are
      consistently used throughout the kernel.  The code generated in many
      places can be improved through the use of this_cpu operations (which
      use a segment register for relocation of per cpu offsets instead of
      performing address calculations).
      
      The patch set also addresses various consistency issues in general with
      the per cpu macros.
      
      A. The semantics of __this_cpu_ptr() differ from this_cpu_ptr() only
         in that checks are skipped. Elsewhere this is typically indicated
         by a raw_ prefix, so this patch set changes the places where
         __this_cpu_ptr() is used to raw_cpu_ptr().
      
      B. There has been the long term wish by some that __this_cpu operations
         would check for preemption. However, there are cases where preemption
         checks need to be skipped. This patch set adds raw_cpu operations that
         do not check for preemption and then adds preemption checks to the
         __this_cpu operations.
      
      C. The use of __get_cpu_var is always a reference to a percpu variable
         that can also be handled via a this_cpu operation. This patch set
         replaces all uses of __get_cpu_var with this_cpu operations.
      
      D. We can then use this_cpu RMW operations in various places replacing
         sequences of instructions by a single one.
      
      E. The use of this_cpu operations throughout will allow arches other
         than x86 to implement optimized references and RMW operations to
         work with per cpu local data.
      
      F. The use of this_cpu operations opens up the possibility to
         further optimize code that relies on synchronization through
         per cpu data.
      
      The patch set works in a couple of stages:
      
      I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
          Also converts the existing __this_cpu_xx_# primitive in the x86
          code to raw_cpu_xx_#.
      
      II. Patch 2-4 use the raw_cpu operations in places that would give
           us false positives once they are enabled.
      
      III. Patch 5 adds preemption checks to __this_cpu operations to allow
          checking if preemption is properly disabled when these functions
          are used.
      
      IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
         with this_cpu_ptr. They do not depend on any changes to the percpu
         code. No preemption tests are skipped if they are applied.
      
      V. Patches 21-46 are conversion patches that use this_cpu operations
         in various kernel subsystems/drivers or arch code.
      
      VI.  Patches 47/48 (not included in this series) remove no longer used
          functions (__this_cpu_ptr and __get_cpu_var).  These should only be
          applied after all the conversion patches have made it and after we
          have done additional passes through the kernel to ensure that none of
          the uses of these functions remain.
      
      This patch (of 46):
      
      The patches following this one will add preemption checks to __this_cpu
      ops so we need to have an alternative way to use this_cpu operations
      without preemption checks.
      
      raw_cpu_ops will be the basis for all other ops since these will be the
      operations that do not implement any checks.
      
      Primitive operations are renamed by this patch from __this_cpu_xxx to
      raw_cpu_xxx.
      
      Also change the uses of the x86 percpu primitives in preempt.h.
      These depend directly on asm/percpu.h (header #include nesting issue).
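      
      In sketch form, the layering this series builds toward (the
      preemption check itself only arrives with patch 5):
      
        /* raw_cpu_*: no checks at all, for callers that legitimately
         * run with preemption enabled */
        #define raw_cpu_add(pcp, val) __pcpu_size_call(raw_cpu_add_, pcp, val)
      
        /* __this_cpu_*: same operation plus a preemption-state check */
        #define __this_cpu_add(pcp, val)                                 \
        do {                                                             \
        	__this_cpu_preempt_check("add");                         \
        	raw_cpu_add(pcp, val);                                   \
        } while (0)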
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bryan Wu <cooloney@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 31 Oct 2013 (1 commit)
    • percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Authored by Greg Thelen
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
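      
      In sketch form, the fix is to widen before negating so the negation
      happens in the counter's own width:
      
        /* before: -(unsigned int)delta == 0xffffffff, then widened */
        #define this_cpu_sub(pcp, val) this_cpu_add((pcp), -(val))
      
        /* after: widen first, then negate: 0xffffffffffffffff */
        #define this_cpu_sub(pcp, val) this_cpu_add((pcp), -(typeof(pcp))(val))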
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko passes, especially the following cases which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 27 Oct 2013 (1 commit)
  8. 30 Nov 2012 (1 commit)
  9. 14 Jun 2012 (1 commit)
  10. 15 May 2012 (2 commits)
  11. 23 Dec 2011 (1 commit)
  12. 15 Dec 2011 (1 commit)
    • x86: Fix and improve percpu_cmpxchg{8,16}b_double() · cebef5be
      Authored by Jan Beulich
      They had several problems/shortcomings:
      
      Only the first memory operand was mentioned in the 2x32-bit asm()
      operands, and the 2x64-bit version had a memory clobber. The first
      allowed the compiler to not recognize the need to re-load the
      data in case it had it cached in some register, and the second
      was overly destructive.
      
      The memory operand in the 2x32-bit asm() was declared to only be
      an output.
      
      The types of the local copies of the old and new values were
      incorrect (as in other per-CPU ops, the types of the per-CPU
      variables accessed should be used here, to make sure the
      respective types are compatible).
      
      The __dummy variable was pointless (and needlessly initialized
      in the 2x32-bit case), given that local copies of the inputs
      already exist.
      
      The 2x64-bit variant forced the address of the first object into
      %rsi, even though this is needed only for the call to the
      emulation function. The real cmpxchg16b can operate on any memory
      location.
      
      While at it, also change the return value type to what it really
      is: 'bool'.
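      
      A user-space sketch of the constraint discipline adopted here (not
      the kernel macro; "=@ccz" flag outputs need gcc 6 or newer):
      
        #include <stdbool.h>
        #include <stdint.h>
      
        /* "+m" marks the operand read-write, so the compiler cannot keep
         * a stale register copy across the asm and no blanket "memory"
         * clobber is needed; locals take the variable's own type */
        static inline bool cmpxchg_u64(uint64_t *ptr, uint64_t old,
                                       uint64_t new)
        {
        	bool ret;
      
        	asm volatile("lock cmpxchgq %3, %1"
        		     : "=@ccz" (ret), "+m" (*ptr), "+a" (old)
        		     : "r" (new));
        	return ret;
        }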
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 12 Jul 2011 (1 commit)
  14. 19 Apr 2011 (1 commit)
  15. 29 Mar 2011 (2 commits)
  16. 28 Mar 2011 (1 commit)
  17. 28 Feb 2011 (1 commit)
  18. 26 Jan 2011 (1 commit)
    • percpu, x86: Fix percpu_xchg_op() · 889a7a6a
      Authored by Eric Dumazet
      These recent percpu commits:
      
        2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section
        8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics
      
      Caused this 'perf top' crash:
      
       Kernel panic - not syncing: Fatal exception in interrupt
        Pid: 0, comm: swapper Tainted: G     D  2.6.38-rc2-00181-gef71723 #413
        Call Trace: <IRQ> [<ffffffff810465b5>]
          ? panic
          ? kmsg_dump
          ? kmsg_dump
          ? oops_end
          ? no_context
          ? __bad_area_nosemaphore
          ? perf_output_begin
          ? bad_area_nosemaphore
          ? do_page_fault
          ? __task_pid_nr_ns
          ? perf_event_tid
          ? __perf_event_header__init_id
          ? validate_chain
          ? perf_output_sample
          ? trace_hardirqs_off
          ? page_fault
          ? irq_work_run
          ? update_process_times
          ? tick_sched_timer
          ? tick_sched_timer
          ? __run_hrtimer
          ? hrtimer_interrupt
          ? account_system_vtime
          ? smp_apic_timer_interrupt
          ? apic_timer_interrupt
       ...
      
      Looking at assembly code, I found:
      
      list = this_cpu_xchg(irq_work_list, NULL);
      
      gives this wrong code : (gcc-4.1.2 cross compiler)
      
      ffffffff810bc45e:
      	mov    %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne    ffffffff810bc45e <irq_work_run+0x3e>
      	test   %rax,%rax
      	je     ffffffff810bc4aa <irq_work_run+0x8a>
      
      Tell gcc we dirty the eax/rax register in percpu_xchg_op(), so the
      compiler must use another register to store pxo_new__.
      
      We also don't need to reload the percpu value after the jump, since
      a 'failed' cmpxchg has already updated eax/rax.
      
      Wrong generated code was :
      	xor     %rax,%rax   /* load 0 into %rax */
      1:	mov     %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne     1b
      	test    %rax,%rax
      
      After patch :
      
      	xor     %rdx,%rdx   /* load 0 into %rdx */
      	mov     %gs:0xead0,%rax
      1:	cmpxchg %rdx,%gs:0xead0
      	jne     1b
      	test    %rax,%rax
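      
      A user-space equivalent of the fixed pattern (sketch only; the
      kernel percpu variant needs no lock prefix since the variable is
      CPU-local):
      
        #include <stdint.h>
      
        /* load once, then loop on cmpxchg: a failed cmpxchg has already
         * refreshed rax with the current value.  The "=&a" early clobber
         * keeps the compiler from also placing 'new' in rax. */
        static inline uint64_t xchg_via_cmpxchg(uint64_t *ptr, uint64_t new)
        {
        	uint64_t old;
      
        	asm volatile("mov %1, %0\n"
        		     "1:\tlock cmpxchgq %2, %1\n\t"
        		     "jnz 1b"
        		     : "=&a" (old), "+m" (*ptr)
        		     : "r" (new)
        		     : "memory");
        	return old;
        }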
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 12 Jan 2011 (1 commit)
  20. 18 Dec 2010 (2 commits)
    • cpuops: Use cmpxchg for xchg to avoid lock semantics · 8270137a
      Authored by Christoph Lameter
      Use cmpxchg instead of xchg to implement this_cpu_xchg.
      
      xchg will cause LOCK overhead since LOCK is always implied, but cmpxchg
      will not.
      
      Baselines:
      
      xchg()		= 18 cycles (no segment prefix, LOCK semantics)
      __this_cpu_xchg = 1 cycle
      
      (simulated using this_cpu_read/write, two prefixes. Looks like the
      cpu can use loop optimization to get rid of most of the overhead)
      
      Cycles before:
      
      this_cpu_xchg	 = 37 cycles (segment prefix and LOCK (implied by xchg))
      
      After:
      
      this_cpu_xchg	= 11 cycles (using cmpxchg without lock semantics)
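      
      At the C level the idea is a cmpxchg retry loop (hedged sketch;
      demo_cpu_xchg is an invented name):
      
        #define demo_cpu_xchg(pcp, nval)                                 \
        ({                                                               \
        	typeof(pcp) __prev, __cur = this_cpu_read(pcp);          \
        	do {                                                     \
        		__prev = __cur;                                  \
        		__cur = this_cpu_cmpxchg((pcp), __prev, (nval)); \
        	} while (__cur != __prev);      /* retry if raced */     \
        	__prev;                                                  \
        })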
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: this_cpu_cmpxchg and this_cpu_xchg operations · 7296e08a
      Authored by Christoph Lameter
      Provide support as far as the hardware capabilities of the x86 cpus
      allow.
      
      Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
      fast cpuops implementations.
      
      V1->V2:
      	- Take out the definition for this_cpu_cmpxchg_8 and move it into
      	  a separate patch.
      
      tj: - Reordered ops to better follow this_cpu_* organization.
          - Renamed macro temp variables similar to their existing
            neighbours.
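      
      Hypothetical usage (demo_state is an invented example variable):
      
        static DEFINE_PER_CPU(int, demo_state);
      
        static void demo(void)
        {
        	/* claim this CPU's slot only if it is currently idle (0) */
        	if (this_cpu_cmpxchg(demo_state, 0, 1) == 0) {
        		int prev = this_cpu_xchg(demo_state, 2);
      
        		(void)prev;     /* prev == 1 here */
        	}
        }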
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  21. 17 Dec 2010 (2 commits)
    • percpu,x86: relocate this_cpu_add_return() and friends · 40304775
      Authored by Tejun Heo
      - include/linux/percpu.h: this_cpu_add_return() and friends were
        located next to __this_cpu_add_return().  However, the overall
        organization is to first group by preemption safeness.  Relocate
        this_cpu_add_return() and friends to preemption-safe area.
      
      - arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
        other more basic operations.  Relocate [__]this_cpu_add_return_8()
        so that they're first grouped by preemption safeness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
    • x86: Support for this_cpu_add, sub, dec, inc_return · 8f1d97c7
      Authored by Christoph Lameter
      Supply an implementation for x86 in order to generate more efficient code.
      
      V2->V3:
      	- Cleanup
      	- Remove strange type checking from percpu_add_return_op.
      
      tj: - Dropped unused typedef from percpu_add_return_op().
          - Renamed ret__ to paro_ret__ in percpu_add_return_op().
          - Minor indentation adjustments.
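      
      Hypothetical usage (demo_counter is invented); on x86 this should
      compile to a single xadd with a %gs segment override:
      
        static DEFINE_PER_CPU(unsigned long, demo_counter);
      
        static unsigned long demo_next_seq(void)
        {
        	/* add and fetch the new value in one instruction */
        	return this_cpu_add_return(demo_counter, 1);
        }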
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  22. 10 Sep 2010 (1 commit)
    • x86, percpu: Optimize this_cpu_ptr · db7829c6
      Authored by Brian Gerst
      Allow arches to implement __this_cpu_ptr, and provide an x86 version.
      
      Before:
      	movq $foo, %rax
      	movq %gs:this_cpu_off, %rdx
      	addq %rdx, %rax
      
      After:
      	movq $foo, %rax
      	addq %gs:this_cpu_off, %rax
      
      The benefit is doing it in one less instruction and not clobbering
      a temporary register.
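      
      Roughly, the x86 override looks like this (sketch; the real macro
      also carries __verify_pcpu_ptr() and sparse annotations):
      
        #define __this_cpu_ptr(ptr)                                      \
        ({                                                               \
        	unsigned long tcp_ptr__;                                 \
        	asm("add " __percpu_arg(1) ", %0"                        \
        	    : "=r" (tcp_ptr__)                                   \
        	    : "m" (this_cpu_off), "0" (ptr));                    \
        	(typeof(*(ptr)) *)tcp_ptr__;                             \
        })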
      
      tj: * Beefed up the comment a bit and renamed in-macro temp variable
            to match neighboring macros.
      
          * Folded fix for const pointer case found in linux-next.
      
          * Fixed sparse notation.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  23. 11 Jun 2010 (1 commit)
  24. 29 Apr 2010 (1 commit)
  25. 20 Apr 2010 (1 commit)
  26. 05 Jan 2010 (1 commit)
  27. 29 Oct 2009 (2 commits)
    • percpu: remove per_cpu__ prefix. · dd17c8f7
      Authored by Rusty Russell
      Now that the return from alloc_percpu is compatible with the address
      of per-cpu vars, it makes sense to hand around the address of per-cpu
      variables.  To make this sane, we remove the per_cpu__ prefix we
      created to stop people accidentally using these vars directly.
      
      Now we have sparse, we can use that (next patch).
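      
      A small sketch of what the change means in practice (demo_read and
      its cpu parameter are invented):
      
        DEFINE_PER_CPU(int, foo);       /* symbol is now plain "foo" */
      
        static int demo_read(int cpu)
        {
        	/* with the per_cpu__ prefix gone, &foo is a real percpu
        	 * address that can be handed around just like an
        	 * alloc_percpu() pointer */
        	int __percpu *p = &foo;
      
        	return *per_cpu_ptr(p, cpu);
        }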
      
      tj: * Updated to convert stuff which were missed by or added after the
            original patch.
      
          * Kill per_cpu_var() macro.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
    • percpu: remove some sparse warnings · 0f5e4816
      Authored by Tejun Heo
      Make the following changes to remove some sparse warnings.
      
      * Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before
        defining it.
      
      * Annotate pcpu_extend_area_map() that it is entered with pcpu_lock
        held, releases it and then reacquires it.
      
      * Make percpu related macros use unique nested variable names (see
        the sketch after this list).
      
      * While at it, add pcpu prefix to __size_call[_return]() macros as
        to-be-implemented sparse annotations will add percpu specific stuff
        to these macros.
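      
      Sketch of why the unique nested names matter (BAD_READ/GOOD_READ are
      invented; pscr_ret__ mirrors the naming style actually used):
      
        #define BAD_READ(var)  ({ typeof(var) ret__ = (var); ret__; })
        #define GOOD_READ(var) ({ typeof(var) pscr_ret__ = (var); pscr_ret__; })
      
        static int demo(void)
        {
        	int ret__ = 42;
        	int a = BAD_READ(ret__);  /* inner ret__ shadows the caller's;
        				     reads an uninitialized temp */
        	int b = GOOD_READ(ret__); /* 42, as intended */
      
        	return a + b;
        }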
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
  28. 03 Oct 2009 (1 commit)
    • this_cpu: Implement X86 optimized this_cpu operations · 30ed1a79
      Authored by Christoph Lameter
      Basically the existing percpu ops can be used for this_cpu variants that allow
      operations also on dynamically allocated percpu data. However, we do not pass a
      reference to a percpu variable in. Instead a dynamically or statically
      allocated percpu variable is provided.
      
      The preempt, non-preempt and irqsafe operations generate the same code.
      On x86 it will always be possible to get the required per cpu atomicity
      in a single RMW instruction with a segment override.
      
      64 bit this_cpu operations are not supported on 32 bit.
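      
      Illustrative use (demo_hits is invented); each op is one RMW
      instruction with a segment override:
      
        static DEFINE_PER_CPU(unsigned long, demo_hits);
      
        static void demo_count(void)
        {
        	/* compiles to roughly: incq %gs:demo_hits */
        	this_cpu_inc(demo_hits);
        }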
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  29. 04 Aug 2009 (1 commit)
  30. 04 Jul 2009 (1 commit)
    • x86,percpu: generalize lpage first chunk allocator · 8c4bfc6e
      Authored by Tejun Heo
      Generalize and move x86 setup_pcpu_lpage() into
      pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
      around the generalized version.  Other than taking size parameters and
      using arch supplied callbacks to allocate/free/map memory,
      pcpu_lpage_first_chunk() is identical to the original implementation.
      
      This simplifies arch code and will help converting more archs to
      dynamic percpu allocator.
      
      While at it, factor out pcpu_calc_fc_sizes() which is common to
      pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
      
      [ Impact: code reorganization and generalization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
  31. 22 Jun 2009 (1 commit)
    • x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Authored by Tejun Heo
      The lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages is changed, this makes the two aliases point to overlapping
      regions with different attributes, which isn't allowed and is known to
      cause subtle data corruption in certain cases.
      
      This can be handled in a similar manner to the x86_64 highmap alias.
      pageattr code should detect if the target pages have a PMD alias and
      split the PMD alias and synchronize the attributes.
      
      pcpul allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
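      
      A hedged sketch of the lookup idea (types and names invented;
      PMD_MASK as in the kernel headers):
      
        /* map of recycled PMD pages, kept sorted by address */
        struct demo_pmd_map { unsigned long pmd_addr; void *base; };
      
        static void *demo_lpage_remapped(const struct demo_pmd_map *map,
        				 int nr, unsigned long addr)
        {
        	unsigned long key = addr & PMD_MASK;
        	int lo = 0, hi = nr;
      
        	while (lo < hi) {               /* lower-bound binary search */
        		int mid = lo + (hi - lo) / 2;
      
        		if (map[mid].pmd_addr < key)
        			lo = mid + 1;
        		else
        			hi = mid;
        	}
        	if (lo < nr && map[lo].pmd_addr == key)
        		return map[lo].base + (addr & ~PMD_MASK);
        	return NULL;                    /* not an alias */
        }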
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>