提交 · b1022fbd293564de91596b8775340cf41ad5214c · openanolis / cloud-kernel

30 4月, 2013 21 次提交

powerpc: Decode the pte-lp-encoding bits correctly. · b1022fbd

由 Aneesh Kumar K.V 提交于 4月 28, 2013

We look at both the segment base page size and actual page size and store
the pte-lp-encodings in an array per base page size.

We also update all relevant functions to take actual page size argument
so that we can use the correct PTE LP encoding in HPTE. This should also
get the basic Multiple Page Size per Segment (MPSS) support. This is needed
to enable THP on ppc64.

[Fixed PR KVM build --BenH]
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b1022fbd

powerpc: Use encode avpn where we need only avpn values · 74f227b2

由 Aneesh Kumar K.V 提交于 4月 28, 2013

In all these cases we are doing something similar to

HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit

With MPSS support we would need actual page size to set HPTE_V_LARGE
bit and that won't be available in most of these cases. Since we are ignoring
HPTE_V_LARGE bit, use the  avpn value instead. There should not be any change
in behaviour after this patch.
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

74f227b2

powerpc: Reduce PTE table memory wastage · 5c1f6ee9

由 Aneesh Kumar K.V 提交于 4月 28, 2013

We allocate one page for the last level of linux page table. With THP and
large page size of 16MB, that would mean we are wasting large part
of that page. To map 16MB area, we only need a PTE space of 2K with 64K
page size. This patch reduce the space wastage by sharing the page
allocated for the last level of linux page table with multiple pmd
entries. We call these smaller chunks PTE page fragments and allocated
page, PTE page.

In order to support systems which doesn't have 64K HPTE support, we also
add another 2K to PTE page fragment. The second half of the PTE fragments
is used for storing slot and secondary bit information of an HPTE. With this
we now have a 4K PTE fragment.

We use a simple approach to share the PTE page. On allocation, we bump the
PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
request. This should help in the node locality of the PTE page fragment,
assuming that the immediate pte alloc request will mostly come from the
same NUMA node. We don't try to reuse the freed PTE page fragment. Hence
we could be waisting some space.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

5c1f6ee9

powerpc: Move the pte free routines from common header · d614bb04

由 Aneesh Kumar K.V 提交于 4月 28, 2013

Acked-by: NPaul Mackerras <paulus@samba.org>

This patch moves the common code to 32/64 bit headers and also duplicate
4K_PAGES and 64K_PAGES section. We will later change the 64 bit 64K_PAGES
version to support smaller PTE fragments. The patch doesn't introduce
any functional changes.
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d614bb04

powerpc: Reduce the PTE_INDEX_SIZE · 419df06e

由 Aneesh Kumar K.V 提交于 4月 28, 2013

This make one PMD cover 16MB range. That helps in easier implementation of THP
on power. THP core code make use of one pmd entry to track the hugepage and
the range mapped by a single pmd entry should be equal to the hugepage size
supported by the hardware.

This also switch PGD to cover 16GB. That is needed so that we can simplify the
hugetlb page walking code so that we have same pte format for explicit hugepage
and THP hugepage.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

419df06e

powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format · e2b3d202

由 Aneesh Kumar K.V 提交于 4月 28, 2013

We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation.
With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level.
That means with 32 bit process we cannot allocate normal pages at
all, because we cover the entire address space with one pgd entry. Fix this
by switching to a new page table format for hugepages. With the new page table
format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead
we encode the PTE information directly at the directory level. This forces 16MB
hugepage at PMD level. This will also make the page take walk much simpler later
when we add the THP support.

With the new table format we have 4 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(3) leaf pte for huge page, bottom two bits != 00
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e2b3d202

powerpc: New hugepage directory format · cf9427b8

由 Aneesh Kumar K.V 提交于 4月 28, 2013

Change the hugepage directory format so that we can have leaf ptes directly
at page directory avoiding the allocation of hugepage directory.

With the new table format we have 3 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table

Instead of storing shift value in hugepd pointer we use mmu_psize_def index
so that we can fit all the supported hugepage size in 4 bits
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cf9427b8

powerpc: Don't truncate pgd_index wrongly · 0e5f35d0

由 Aneesh Kumar K.V 提交于 4月 28, 2013

With PGD_INDEX_SIZE set to 12 the existing macro doesn't work. Fix it to
use PTRS_PER_PGD

The idea originally was to have one more bit in the result of
pgd_index() than PGD_INDEX_SIZE, so that if one had an address
corresponding to the last PGD entry, and then incremented that address
by PGD_SIZE, and took pgd_index() of that, you wouldn't end up with
zero.  The commit that introduced that dates back to 2002, and the
code that was sensitive to that edge case has long since been
refactored (several times), so there is no need for it these days.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0e5f35d0

powerpc: Don't hard code the size of pte page · cc3665a6

由 Aneesh Kumar K.V 提交于 4月 28, 2013

USE PTRS_PER_PTE to indicate the size of pte page. To support THP,
later patches will be changing PTRS_PER_PTE value.
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cc3665a6

powerpc: Save DAR and DSISR in pt_regs on MCE · ce54152f

由 Aneesh Kumar K.V 提交于 4月 28, 2013

We were not saving DAR and DSISR on MCE. Save then and also print the values
along with exception details in xmon.
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ce54152f

powerpc: Use signed formatting when printing error · 4b8f63d9

由 Aneesh Kumar K.V 提交于 4月 28, 2013

PAPR defines these errors as negative values. So print them accordingly
for easy debugging.
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4b8f63d9

powerpc/pseries: Correct builds break when CONFIG_SMP not defined · 601abdc3

由 Nathan Fontenot 提交于 4月 29, 2013

Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined.

The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP
is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP
system doesn't really make sense.

This patch ifdef's out the code making the affinity updates for PRRN events to
fix the following build break.

arch/powerpc/mm/numa.c: In function ‘stage_topology_update’:
arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’
arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast
make[1]: *** [arch/powerpc/mm/numa.o] Error 1
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

601abdc3

powerpc/booke: Remove obsolete macro FINISH_EXCEPTION · 177c1923

由 Kevin Hao 提交于 4月 29, 2013

This is stale and not used by anyone now.
Signed-off-by: NKevin Hao <haokexin@gmail.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

177c1923

powerpc/rtas_flash: Fix bad memory access · fb4696c3

由 Vasant Hegde 提交于 4月 28, 2013

We use kmem_cache_alloc() to allocate memory to hold the new firmware
which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to
set memory to NULL. But these constructor is called only for newly
allocated slabs.

If we run below command multiple time without rebooting, allocator may
allocate memory from the area which was free'd by kmem_cache_free and
it will not call constructor. In this situation we may hit kernel oops.

dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096

oops message:
-------------
[ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries
[ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod
[ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58
[ 1602.399823] REGS: c0000003b9937930 TRAP: 0300   Not tainted  (3.9.0-rc4-0.27-ppc64)
[ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 22000428  XER: 20000000
[ 1602.399841] SOFTE: 1
[ 1602.399844] CFAR: c000000000005f24
[ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000
[ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3
GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08
GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080
GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58
GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20
GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff
GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010
GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808
GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7
[ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash]
[ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash]
[ 1602.399934] Call Trace:
[ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable)
[ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0
[ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238
[ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8
[ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0
[ 1602.399973] Instruction dump:
[ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000
[ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000
[ 1602.400012] ---[ end trace b4136d115dc31dac ]---
[ 1602.402178]
[ 1602.402185] Sending IPI to other CPUs
[ 1602.403329] IPI complete

This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to
allocate memory, which makes sure memory is set to 0 before using.
Also removes rtas_block_ctor(), which is no longer required.
Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fb4696c3

powerpc: Fix build failure after merge of the cgroup tree · 9f3a90e8

由 Stephen Rothwell 提交于 4月 28, 2013

After merging the cgroup tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]

Caused by commit 30c05350 ("powerpc/pseries: Use stop machine to
update cpu maps") from the powerpc tree interacting with (probably)
commit ff794dea ("cpuset: remove include of cgroup.h from cpuset.h")
from the cgroup tree. Removing includes from header files is fraught
with danger ...

The former should have added an include of linux/slab.h to
arch/powerpc/mm/numa.c.

I have added the following merge fix patch for today (but it should be
applied to the powerpc tree ASAP).

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 29 Apr 2013 14:01:44 +1000
Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including
slab.h

fixes these build errors:

9f3a90e8

powerpc: Fix usage of setup_pci_atmu() · d5bbe659

由 Michael Neuling 提交于 4月 14, 2013

Linux next is currently failing to compile mpc85xx_defconfig with:
  arch/powerpc/sysdev/fsl_pci.c:944:2: error: too many arguments to function 'setup_pci_atmu'

This is caused by (from Kumar's next branch):
  commit 34642bbb
  Author: Kumar Gala <galak@kernel.crashing.org>
  powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller

Which changed definition of setup_pci_atmu() but didn't update one of
the callers.  Below fixes this.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Reviewed-by: NKim Phillips <kim.phillips@freescale.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d5bbe659

mm: use vm_unmapped_area() on powerpc architecture · fba2369e

由 Michel Lespinasse 提交于 4月 29, 2013

Update the powerpc slice_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: N"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fba2369e

mm: remove free_area_cache use in powerpc architecture · 34d07177

由 Michel Lespinasse 提交于 4月 29, 2013

As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.

This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. Next one will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

34d07177

powerpc/fsl-booke: add the reg prop for pci bridge device node for T4/B4 · 9e2ecdbb

由 Kevin Hao 提交于 4月 14, 2013

The reg property in the pci bridge device node is used to bind this
device node to the pci bridge device. Then all the pci devices under
this bridge could use the interrupt maps defined in this device node
to do the irq translation. So if this property is missed, the pci
traditional irq mechanism will not work.
Signed-off-by: NKevin Hao <haokexin@gmail.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

9e2ecdbb

powerpc/fsl-pci: don't unmap the PCI SoC controller registers in setup_pci_atmu · 04aa99cd

由 Kevin Hao 提交于 4月 13, 2013

In patch 34642bbb (powerpc/fsl-pci: Keep PCI SoC controller registers in
pci_controller) we choose to keep the map of the PCI SoC controller
registers. But we missed to delete the unmap in setup_pci_atmu
function. This will cause the following call trace once we access
the PCI SoC controller registers later.

Unable to handle kernel paging request for data at address 0x8000080080040f14
Faulting instruction address: 0xc00000000002ea58
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=24 T4240 QDS
Modules linked in:
NIP: c00000000002ea58 LR: c00000000002eaf4 CTR: c00000000002eac0
REGS: c00000017e10b4a0 TRAP: 0300   Not tainted  (3.9.0-rc1-00052-gfa3529f-dirty)
MSR: 0000000080029000 <CE,EE,ME>  CR: 28adbe22  XER: 00000000
SOFTE: 0
DEAR: 8000080080040f14, ESR: 0000000000000000
TASK = c00000017e100000[1] 'swapper/0' THREAD: c00000017e108000 CPU: 2
GPR00: 0000000000000000 c00000017e10b720 c0000000009928d8 c00000017e578e00
GPR04: 0000000000000000 000000000000000c 0000000000000001 c00000017e10bb40
GPR08: 0000000000000000 8000080080040000 0000000000000000 0000000000000016
GPR12: 0000000088adbe22 c00000000fffa800 c000000000001ba0 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000008a5b70
GPR24: c0000000008af938 c0000000009a28d8 c0000000009bb5dc c00000017e10bb40
GPR28: c00000017e32a400 c00000017e10bc00 c00000017e32a400 c00000017e578e00
NIP [c00000000002ea58] .fsl_pcie_check_link+0x88/0xf0
LR [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0
Call Trace:
[c00000017e10b720] [c00000017e10b7a0] 0xc00000017e10b7a0 (unreliable)
[c00000017e10ba30] [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0
[c00000017e10bad0] [c00000000033aa08] .pci_bus_read_config_byte+0x88/0xd0
[c00000017e10bb90] [c00000000088d708] .pci_apply_final_quirks+0x9c/0x18c
[c00000017e10bc40] [c0000000000013dc] .do_one_initcall+0x5c/0x1f0
[c00000017e10bcf0] [c00000000086ebac] .kernel_init_freeable+0x180/0x26c
[c00000017e10bdb0] [c000000000001bbc] .kernel_init+0x1c/0x460
[c00000017e10be30] [c000000000000880] .ret_from_kernel_thread+0x64/0xe4
Instruction dump:
38210310 2b800015 4fdde842 7c600026 5463fffe e8010010 7c0803a6 4e800020
60000000 60000000 e92301d0 7c0004ac <80690f14> 0c030000 4c00012c 38210310
---[ end trace 7a8fe0cbccb7d992 ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Signed-off-by: NKevin Hao <haokexin@gmail.com>
Acked-by: NRoy Zang <tie-fei.zang@freescale.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

04aa99cd

powerpc/dts: Fix the dts for p1025rdb 36bit · 9aa171fb

由 Zhicheng Fan 提交于 3月 25, 2013

Fix the following errors:

Error: p1025rdb.dtsi:326.2-3 label or path, 'qe', not found
Error: p1021si-post.dtsi:242.2-3 label or path, 'qe', not found
FATAL ERROR: Syntax error parsing input tree
Signed-off-by: NZhicheng Fan <B32736@freescale.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

9aa171fb

26 4月, 2013 19 次提交

powerpc/perf: Enable branch stack sampling framework · 3925f46b

由 Anshuman Khandual 提交于 4月 22, 2013

Provides basic enablement for perf branch stack sampling framework on
POWER8 processor based platforms. Adds new BHRB related elements into
cpu_hw_event structure to represent current BHRB config, BHRB filter
configuration, manage context and to hold output BHRB buffer during
PMU interrupt before passing to the user space. This also enables
processing of BHRB data and converts them into generic perf branch
stack data format.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3925f46b

powerpc/perf: Define BHRB generic functions, data and flags for POWER8 · b1113557

由 Anshuman Khandual 提交于 4月 22, 2013

This patch populates BHRB specific data for power_pmu structure. It
also implements POWER8 specific BHRB filter and configuration functions.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b1113557

powerpc/perf: Add new BHRB related generic functions, data and flags · 5afc9b52

由 Anshuman Khandual 提交于 4月 22, 2013

This patch adds couple of generic functions to power_pmu structure
which would configure the BHRB and it's filters. It also adds
representation of the number of BHRB entries present on the PMU.
A new PMU flag PPMU_BHRB would indicate presence of BHRB feature.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

5afc9b52

powerpc/perf: Add basic assembly code to read BHRB entries on POWER8 · 73760931

由 Anshuman Khandual 提交于 4月 22, 2013

This patch adds the basic assembly code to read BHRB buffer. BHRB entries
are valid only after a PMU interrupt has happened (when MMCR0[PMAO]=1)
and BHRB has been freezed. BHRB read should not be attempted when it is
still enabled (MMCR0[PMAE]=1) and getting updated, as this can produce
non-deterministic results.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

73760931

powerpc/perf: Add new BHRB related instructions for POWER8 · 95213959

由 Anshuman Khandual 提交于 4月 22, 2013

This patch adds new POWER8 instruction encoding for reading
and clearing Branch History Rolling Buffer entries. The new
instruction 'mfbhrbe' (move from branch history rolling buffer
entry) is used to read BHRB buffer entries and instruction
'clrbhrb' (clear branch history rolling buffer) is used to
clear the entire buffer. The instruction 'clrbhrb' has straight
forward encoding. But the instruction encoding format for
reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it
takes two arguments, i.e the index for the BHRB buffer entry to
read and a general purpose register to put the value which was
read from the buffer entry.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

95213959

powerpc/perf: Power8 PMU support · e05b9b9e

由 Michael Ellerman 提交于 4月 25, 2013

This patch adds support for the power8 PMU to perf.

Work is ongoing to add generic cache events.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e05b9b9e

powerpc/perf: Add support for SIER · 8f61aa32

由 Michael Ellerman 提交于 4月 25, 2013

On power8 we have a new SIER (Sampled Instruction Event Register), which
captures information about instructions when we have random sampling
enabled.

Add support for loading the SIER into pt_regs, overloading regs->dar.
Also set the new NO_SIPR flag in regs->result if we don't have SIPR.

Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8f61aa32

powerpc/perf: Add regs_no_sipr() · 860aad71

由 Michael Ellerman 提交于 4月 25, 2013

On power8 the presence or absence of SIPR depends on settings at runtime,
so convert to using a dynamic flag for NO_SIPR. Existing backends that
set NO_SIPR unconditionally set the dynamic flag obviously.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

860aad71

powerpc/perf: Add an accessor for regs->result · 33904054

由 Michael Ellerman 提交于 4月 25, 2013

Add an accessor for regs->result so we can use it to store more flags in
future.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

33904054

powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv() · 5682c460

由 Michael Ellerman 提交于 4月 25, 2013

On power8 the SIPR and SIHV are not in MMCRA, so convert the routines
to take regs and change the names accordingly.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

5682c460

powerpc/perf: Add an explict flag indicating presence of SLOT field · 7a786832

由 Michael Ellerman 提交于 4月 25, 2013

In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust
the reported IP of a sampled instruction.

Currently the logic is written so that if the backend does NOT have
the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists.

However on power8 we do not want to set ALT_SIPR (it's in a third
location), and we also do not have MMCRA[SLOT].

So add a new flag which only indicates whether MMCRA[SLOT] exists.

Naively we'd set it on everything except power6/7, because they set
ALT_SIPR, and we've reversed the polarity of the flag. But it's more
complicated than that.

mpc7450 is 32-bit, and uses its own version of perf_ip_adjust()
which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and
the behaviour is unchanged.

PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have
the new flag set. This is a behaviour change on those cpus, though we
were probably getting lucky and the bits in question were 0.

power5 and power5+ set the new flag, behaviour unchanged.

power6 & power7 do not set the new flag, behaviour unchanged.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7a786832

powerpc: Initialise PMU related regs on Power8 · 240686c1

由 Michael Ellerman 提交于 4月 25, 2013

For both HV and guest kernels, intialise PMU regs to something sane.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

240686c1

powerpc/powernv: Fix invalid IOMMU table · 959c9bdd

由 Gavin Shan 提交于 4月 25, 2013

Ben found the root cause. Commit 37f02195
("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform")
overwrites the IOMMU table of PCI device while enabling PCI device.
The patch intends to fix the IOMMU table after that point.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

959c9bdd

powerpc/powernv: Build DMA space for PE on PHB3 · 373f5657

由 Gavin Shan 提交于 4月 25, 2013

The patch intends to build 32-bits DMA space for individual PEs on
PHB3. The TVE# is recognized by the combo of PE# and fixed bits
from DMA address, which is zero for 32-bits DMA space.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

373f5657

powerpc/powernv: TCE invalidation for PHB3 · 4cce9550

由 Gavin Shan 提交于 4月 25, 2013

The TCE should be invalidated while it's created or free'd. The
approach to do that for IODA1 and IODA2 compliant PHBs are different.
So the patch differentiate them with different functions called to
do that for IODA1 and IODA2 compliant PHBs. It's notable that the
PCI address is used to invalidate the corresponding TCE on IODA2
compliant PHB3.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4cce9550

powerpc/powernv: Patch MSI EOI handler on P8 · 137436c9

由 Gavin Shan 提交于 4月 25, 2013

The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional
steps to handle the P/Q bits in IVE before EOIing the corresponding
interrupt. The patch changes the EOI handler to cover that. we have
individual IRQ chip in each PHB instance. During the MSI IRQ setup
time, the IRQ chip is copied over from the original one for that IRQ,
and the EOI handler is patched with the one that will handle the P/Q
bits (As Ben suggested).
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

137436c9

powerpc/powernv: Add option CONFIG_POWERNV_MSI · a486bdb0

由 Gavin Shan 提交于 4月 25, 2013

As Michael Ellerman suggested, to add CONFIG_POWERNV_MSI for PowerNV
platform. That's similar to CONFIG_PSERIES_MSI for pSeries platform.
For now, we don't make it dependent on CONFIG_EEH since it's not ready
to enable that yet.

Apart from that, we also enable CONFIG_PPC_MSI_BITMAP on selecting
CONFIG_POWERNV_MSI.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a486bdb0

powerpc/powernv: Supports PHB3 · aa0c033f

由 Gavin Shan 提交于 4月 25, 2013

The patch intends to initialize PHB3 during system boot stage. The
flag "PNV_PHB_MODEL_PHB3" is introduced to differentiate IODA2
compatible PHB3 from other types of PHBs.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

aa0c033f

powerpc: Fix "attempt to move .org backwards" error · a485c709

由 Paul Mackerras 提交于 4月 25, 2013

Building a 64-bit powerpc kernel with PR KVM enabled currently gives
this error:

  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1

This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into
33 instructions, but we only have space for 32 at the decrementer
interrupt vector (from 0x900 to 0x980).

In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we
currently have two instances of the HMT_MEDIUM macro, which has the
effect of setting the SMT thread priority to medium.  One is the
first instruction, and is overwritten by a no-op on processors where
we save the PPR (processor priority register), that is, POWER7 or
later.  The other is after we have saved the PPR.

In order to reduce the code at 0x900 by one instruction, we omit the
first HMT_MEDIUM.  On processors without SMT this will have no effect
since HMT_MEDIUM is a no-op there.  On POWER5 and RS64 machines this
will mean that the first few instructions take a little longer in the
case where a decrementer interrupt occurs when the hardware thread is
running at low SMT priority.  On POWER6 and later machines, the
hardware automatically boosts the thread priority when a decrementer
interrupt is taken if the thread priority was below medium, so this
change won't make any difference.

The alternative would be to branch out of line after saving the CFAR.
However, that would incur an extra overhead on all processors, whereas
the approach adopted here only adds overhead on older threaded processors.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a485c709

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功