1. 01 May 2016 (5 commits)
    • powerpc/mm: Drop WIMG in favour of new constants · 30bda41a
      Aneesh Kumar K.V committed
      PowerISA 3.0 introduces two pte bits with the below meaning for radix:
        00 -> Normal Memory
        01 -> Strong Access Order (SAO)
        10 -> Non idempotent I/O (Cache inhibited and guarded)
        11 -> Tolerant I/O (Cache inhibited)
      
      We drop the existing WIMG bits in the Linux page table in favour of the
      above constants. We lose _PAGE_WRITETHRU with this conversion. We only
      use writethrough via pgprot_cached_wthru(), which is used by
      fbdev/controlfb.c (the Apple "control" display driver) and also by PPC32.
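
      For illustration, here is a minimal sketch of how the new two-bit
      attribute could be expressed as pte constants; the names mirror the
      descriptions above, but the bit positions are assumptions, not taken
      from the patch:

        /* Illustrative only: bit positions assumed for this sketch. */
        #define _PAGE_SAO             0x00010 /* 01: strong access order */
        #define _PAGE_NON_IDEMPOTENT  0x00020 /* 10: cache inhibited, guarded */
        #define _PAGE_TOLERANT        0x00030 /* 11: cache inhibited, tolerant I/O */
        #define _PAGE_CACHE_CTL       0x00030 /* mask covering both attribute bits */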
      
      With respect to _PAGE_COHERENCE, we have been marking hpte always
      coherent for some time now. htab_convert_pte_flags() always added
      HPTE_R_M.
      
      NOTE: KVM changes need closer review.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Update _PAGE_KERNEL_RO · e58e87ad
      Aneesh Kumar K.V committed
      PS3 had used a PPP bit hack to implement a read only mapping in the
      kernel area. Since we are bolting the ioremap area, it used the pte
      flags _PAGE_PRESENT | _PAGE_USER to get a PPP value of 0x3, thereby
      resulting in a read only mapping. This means the area can be accessed
      by user space, but the kernel will never return such an address to
      user space.
      
      But we can do better by implementing a read only kernel mapping using
      PPP bits 0b110.
      
      This also allows us to do read only kernel mapping for radix in later
      patches.
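
      As a rough sketch of the idea (not the patch itself), the HPTE
      protection selection for a kernel mapping could then look like the
      following, where HPTE_R_PP0 supplies the third PPP bit; treat the
      exact encoding as an assumption:

        /* Illustrative: pick PPP bits for a kernel mapping.
         * 0b110 = kernel read-only, 0b000 = kernel read-write. */
        unsigned long rflags = 0;

        if (!(pteflags & _PAGE_WRITE))
                rflags |= (HPTE_R_PP0 | 0x2);   /* PPP = 0b110, kernel RO */
        /* else leave PPP = 0b000 for a normal kernel RW mapping */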
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED · ac29c640
      Aneesh Kumar K.V committed
      _PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
      is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
      pages are now marked by clearing _PAGE_PRIVILEGED bit.
      
      Previously we allowed the kernel to have a privileged page in the lower
      address range (USER_REGION). With this patch such access is denied.
      
      We also prevent a kernel access to a non-privileged page in higher
      address range (ie, REGION_ID != 0).
      
      Both the above access scenarios should never happen.
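
      A sketch of the two checks being enforced, in the spirit of the
      description above (placement and helper names in the real patch may
      differ):

        /* Illustrative consistency checks on a hash fault. */
        if (user_region && (pte_val(pte) & _PAGE_PRIVILEGED))
                return 1;       /* privileged page in the user range: deny */
        if (!user_region && !(pte_val(pte) & _PAGE_PRIVILEGED))
                return 1;       /* user-accessible page in a kernel region: deny */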
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jeremy Kerr <jk@ozlabs.org>
      Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/subpage: Clear RWX bit to indicate no access · 73a1441a
      Aneesh Kumar K.V committed
      Subpage protection used to depend on the _PAGE_USER bit to implement
      no-access mode. This patch switches that to use _PAGE_RWX: we now clear
      Read, Write and Execute access from the pte instead of clearing
      _PAGE_USER. This was done so that we can switch to _PAGE_PRIVILEGED in
      a later patch.
      
      subpage_protection() returns pte bits that need to be cleared. Instead
      of updating the interface to handle no-access in a separate way, it
      appears simpler to clear RWX access to indicate no access.
      
      We still don't insert hash ptes for no access implied by !_PAGE_RWX.
      Hence we should not get a PROT_FAULT with this change.
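
      For reference, a sketch of the grouping implied by the description
      (treat the exact definition as an assumption):

        /* Illustrative: clearing all three bits now encodes "no access". */
        #define _PAGE_RWX       (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)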
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Use _PAGE_READ to indicate Read access · c7d54842
      Aneesh Kumar K.V committed
      This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
      removes the dependency on _PAGE_USER for implying read only. A few
      things to note here: read is implied by write and execute permission,
      hence we should always find _PAGE_READ set on a hash pte fault.
      
      We still can't switch PROT_NONE to !(_PAGE_RWX). Automatic NUMA
      balancing depends on marking a prot-none pte _PAGE_WRITE. (For more
      details look at b191f9b1 "mm: numa: preserve PTE write permissions
      across a NUMA hinting fault")
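
      A minimal sketch of the split described above; _PAGE_RW is kept as a
      combination so existing users keep working (bit values here are
      assumptions):

        /* Illustrative: separate read and write permission bits. */
        #define _PAGE_READ      0x00004 /* read access allowed */
        #define _PAGE_WRITE     0x00002 /* write access allowed */
        #define _PAGE_RW        (_PAGE_READ | _PAGE_WRITE)
        /* Write or execute permission implies read, so a hash pte fault
         * should always see _PAGE_READ set. */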
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jeremy Kerr <jk@ozlabs.org>
      Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 18 Mar 2016 (1 commit)
  3. 02 Mar 2016 (1 commit)
  4. 01 Mar 2016 (3 commits)
    • powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      David Gibson committed
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT which is more appropriate than the
          previous EINVAL (the failure is unlikely to be related to the
          function's parameters)
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      
      For errors creating the vmemmap for the hot-added region:
         * This is now reported with EFAULT instead of causing a BUG() - this
           could happen for external reasons (e.g. a full hash table), so it's
           better to handle it non-fatally
         * An error message is also printed, so the failure won't be silent
         * As above, a failure could leave the region partially mapped; we now
           clean this up (see the sketch after this list). [mpe: move
           htab_remove_mapping() out of #ifdef CONFIG_MEMORY_HOTPLUG to enable this]
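
      A sketch of the cleanup-on-failure shape described in the bullets
      above, using the existing htab helpers (argument lists abridged,
      message text illustrative):

        /* Illustrative: map the new region, undo any partial mapping on error. */
        rc = htab_bolt_mapping(start, end, __pa(start), prot,
                               mmu_linear_psize, mmu_kernel_ssize);
        if (rc < 0) {
                int rc2 = htab_remove_mapping(start, end, mmu_linear_psize,
                                              mmu_kernel_ssize);
                BUG_ON(rc2 && (rc2 != -ENOENT));
                pr_warn("Unable to map the hot-added memory region\n");
                return -EFAULT;
        }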
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98
      David Gibson committed
      At the moment the hpte_removebolted callback in ppc_md returns void and
      will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
      place.  This is awkward for the case of cleaning up a mapping which was
      partially made before failing.
      
      So, we add a return value to hpte_removebolted, and have it return ENOENT
      in the case that the HPTE to remove didn't exist in the first place.
      
      In the (sole) caller, we propagate errors in hpte_removebolted to its
      caller to handle.  However, we handle ENOENT specially, continuing to
      complete the unmapping over the specified range before returning the error
      to the caller.
      
      This means that htab_remove_mapping() will work sanely on a partially
      present mapping, removing any HPTEs which are present, while also returning
      ENOENT to its caller in case it's important there.
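
      A sketch of the resulting loop shape in htab_remove_mapping()
      (variable names are illustrative):

        /* Illustrative: keep unmapping even if some HPTEs were never present. */
        int ret = 0;

        for (vaddr = vstart; vaddr < vend; vaddr += step) {
                rc = ppc_md.hpte_removebolted(vaddr, psize, ssize);
                if (rc == -ENOENT) {
                        ret = -ENOENT;  /* remember it, but keep going */
                        continue;
                }
                if (rc < 0)
                        return rc;      /* any other error is fatal here */
        }
        return ret;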
      
      There are two callers of htab_remove_mapping():
         - In remove_section_mapping() we already WARN_ON() any error return,
           which is reasonable - in this case the mapping should be fully
           present
         - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
           just a WARN_ON() in the case of ENOENT, since failing to remove a
           mapping that wasn't there in the first place probably shouldn't be
           fatal.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Clean up error handling for htab_remove_mapping · abd0a0e7
      David Gibson committed
      Currently, the only error that htab_remove_mapping() can report is -EINVAL,
      if removal of bolted HPTEs isn't implemented for this platform.  We make
      a few cleanups to the handling of this:
      
       * EINVAL isn't really the right code - there's nothing wrong with the
         function's arguments - use ENODEV instead
       * We were also printing a warning message, but that's a decision better
         left up to the callers, so remove it
       * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
         error, making the warning message redundant, so no change is needed
         there.
       * The other caller is remove_section_mapping().  This is called in the
         memory hot remove path at a point after vmemmap_remove_mapping(), so
         if hpte_removebolted isn't implemented we'd expect to have already
         BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk(), since
         this really shouldn't be happening (see the sketch below).
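
      A condensed sketch of the revised contract (illustrative only):

        /* In htab_remove_mapping(): no printk, and -ENODEV when unsupported. */
        if (!ppc_md.hpte_removebolted)
                return -ENODEV;

        /* In remove_section_mapping(): warn, since this shouldn't happen. */
        WARN_ON(htab_remove_mapping(start, end, mmu_linear_psize,
                                    mmu_kernel_ssize));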
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 27 Feb 2016 (1 commit)
  6. 09 Jan 2016 (1 commit)
  7. 27 Dec 2015 (1 commit)
  8. 19 Dec 2015 (1 commit)
  9. 14 Dec 2015 (7 commits)
  10. 12 Oct 2015 (1 commit)
    • powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Aneesh Kumar K.V committed
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on the hugepage shift argument to do that, but in some cases that can
      produce wrong results. For example:
      
      On finding a transparent hugepage we set the hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via kick_all_cpus_sync(), but that
      happens after we have updated the pte to 0. Hence in follow_huge_addr()
      we can find the hugepage shift set, but the transparent huge page check
      fails for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit
      691e95fd
      ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
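
      The idea, roughly, is to have the page table walker report THP
      explicitly instead of inferring it from the shift alone; a sketch
      (treat the exact signature as an assumption):

        /* Illustrative: in follow_huge_addr()-like code, with irqs disabled
         * around the walk (see 691e95fd). */
        bool is_thp;
        unsigned int shift;
        pte_t *ptep;

        ptep = find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
        if (!ptep || is_thp)
                return ERR_PTR(-EINVAL); /* let the generic THP path handle it */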
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 18 Aug 2015 (1 commit)
  12. 10 Jun 2015 (1 commit)
    • powerpc/mm: Add trace point for tracking hash pte fault · cfcb3d80
      Aneesh Kumar K.V committed
      This enables us to understand how many hash faults we are taking
      when running benchmarks.
      
      For ex:
      -bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
      ...
      
       Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':
      
             1,10,04,075      powerpc:hash_fault
             1,10,03,429      page-faults
      
            30.865978991 seconds time elapsed
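
      The tracepoint itself is small; a sketch of what the event definition
      and call site could look like (field names here are assumptions):

        /* Illustrative tracepoint definition. */
        TRACE_EVENT(hash_fault,
                TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
                TP_ARGS(addr, access, trap),
                TP_STRUCT__entry(
                        __field(unsigned long, addr)
                        __field(unsigned long, access)
                        __field(unsigned long, trap)
                ),
                TP_fast_assign(
                        __entry->addr   = addr;
                        __entry->access = access;
                        __entry->trap   = trap;
                ),
                TP_printk("hash fault with addr 0x%lx access = 0x%lx trap = 0x%lx",
                          __entry->addr, __entry->access, __entry->trap)
        );

        /* Call site in the hash fault path: */
        trace_hash_fault(ea, access, trap);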
      
      NOTE:
      The impact of the tracepoint was not noticeable when running the test.
      It was within the run-time variance of the test. For ex:
      
      without-patch:
      --------------
      
       Performance counter stats for './a.out 3000 300':
      
      	       643      page-faults               #    0.089 M/sec
      	  7.236562      task-clock (msec)         #    0.928 CPUs utilized
      	 2,179,213      stalled-cycles-frontend   #    0.00% frontend cycles idle
      	17,174,367      stalled-cycles-backend    #    0.00% backend  cycles idle
      		 0      context-switches          #    0.000 K/sec
      
             0.007794658 seconds time elapsed
      
      And with-patch:
      ---------------
      
       Performance counter stats for './a.out 3000 300':
      
      	       643      page-faults               #    0.089 M/sec
      	  7.233746      task-clock (msec)         #    0.921 CPUs utilized
      		 0      context-switches          #    0.000 K/sec
      
             0.007854876 seconds time elapsed
      
       Performance counter stats for './a.out 3000 300':
      
      	       643      page-faults               #    0.087 M/sec
      	       649      powerpc:hash_fault        #    0.087 M/sec
      	  7.430376      task-clock (msec)         #    0.938 CPUs utilized
      	 2,347,174      stalled-cycles-frontend   #    0.00% frontend cycles idle
      	17,524,282      stalled-cycles-backend    #    0.00% backend  cycles idle
      		 0      context-switches          #    0.000 K/sec
      
             0.007920284 seconds time elapsed
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 02 Jun 2015 (1 commit)
  14. 23 Apr 2015 (1 commit)
  15. 17 Apr 2015 (1 commit)
    • powerpc/mm/thp: Make page table walk safe against thp split/collapse · 691e95fd
      Aneesh Kumar K.V committed
      We can disable a THP split or a hugepage collapse by disabling irqs.
      We do send an IPI to all the cpus in the early part of split/collapse,
      and disabling local irqs ensures we don't make progress with
      split/collapse. If the THP is getting split we return NULL from
      find_linux_pte_or_hugepte(). For all the current callers that should be ok.
      We need to be careful if we want to use the returned pte_t pointer outside
      the irq-disabled region. With respect to a THP split the pfn remains the
      same, but a hugepage collapse will result in a pfn change.

      There are a few steps we can take to avoid a hugepage collapse. One way
      is to take a page reference inside the irq-disabled region. Another
      option is to take mmap_sem so that a parallel collapse will not happen.
      We can also disable collapse by taking the pmd_lock. Another method, used
      by the kvm subsystem, is to check whether there was an mmu_notifier
      update in between, using mmu_notifier_retry().
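
      A sketch of the usage pattern this implies for callers (illustrative;
      the helper returns NULL if the THP is being split):

        /* Illustrative: walk the page table with THP split/collapse blocked. */
        unsigned long flags;
        unsigned int shift;
        pte_t *ptep;

        local_irq_save(flags);
        ptep = find_linux_pte_or_hugepte(mm->pgd, addr, &shift);
        if (ptep) {
                pte_t pte = READ_ONCE(*ptep);
                /* Use pte / the pfn only while irqs stay disabled, or take a
                 * page reference (or mmap_sem, or the pmd_lock) first. */
        }
        local_irq_restore(flags);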
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 14 Dec 2014 (1 commit)
    • mm/debug-pagealloc: make debug-pagealloc boottime configurable · 031bc574
      Joonsoo Kim committed
      Now that we have prepared to avoid using debug-pagealloc at boot time,
      introduce a new kernel parameter to disable debug-pagealloc at boot
      time, and disable the related functions in that case.
      
      The only non-intuitive part is the change to the guard page functions.
      Because guard pages are effective only if debug-pagealloc is enabled,
      turning them off along with debug-pagealloc is a reasonable thing to do.
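
      A sketch of how such functions consume the runtime switch (the helper
      and function names reflect this description, not the patch itself;
      treat the details as assumptions):

        /* Illustrative: skip guard-page work when debug-pagealloc is off. */
        static void set_guard_pages(struct page *page, unsigned int order)
        {
                if (!debug_pagealloc_enabled())
                        return;
                /* ... guard-page setup only in the debug case ... */
        }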
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Jungsoo Son <jungsoo.son@lge.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 05 Dec 2014 (1 commit)
    • powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Aneesh Kumar K.V committed
      updatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie.
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a parallel update of the
      linux pte is not possible. On the other hand, mprotect involves
      ptep_modify_prot_start, which causes an hpte invalidate and not updatepp.
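
      A sketch of the resulting shape in the updatepp path, with a flag
      telling it that no earlier translation can be cached in the TLB (the
      flag name is an assumption):

        /* Illustrative: only invalidate the TLB when an old translation
         * could actually have been cached. */
        if (!(flags & HPTE_NOHPTE_UPDATE))
                tlbie(vpn, bpsize, apsize, ssize, local);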
      
      Performance numbers:
      We use random_access_bench, written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  18. 02 Dec 2014 (2 commits)
  19. 03 Nov 2014 (1 commit)
    • powerpc: Replace __get_cpu_var uses · 69111bac
      Christoph Lameter committed
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :

      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  20. 08 Oct 2014 (3 commits)
  21. 25 Sep 2014 (3 commits)
  22. 27 Aug 2014 (2 commits)
    • Revert "powerpc: Replace __get_cpu_var uses" · 23f66e2d
      Tejun Heo committed
      This reverts commit 5828f666 due to
      build failure after merging with pending powerpc changes.
      
      Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Replace __get_cpu_var uses · 5828f666
      Christoph Lameter committed
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      tj: Folded a fix patch.
          http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>