omap4: l2x0: enable instruction and data prefetching

Enabling L2 prefetching improves performance as shown on Panda ES2.1 board with mem test, and it has measurable impact on performances. I think we should consider it, even though it damages "writes" a bit. (rebased to k.org) Usually the prefetch is used at both levels together L1 + L2, however, to enable the CP15 prefetch engines, these are under security, and on GP devices, we cannot enable it(e.g. on PandaBoard). However, just enabling PL310 prefetch seems to provide performance improvement, as shown in the data below (from Ubuntu) and would be a great thing to pull in. What prefetch does is enable automatic next line prefetching. With this enabled, whenever the PL310 receives a cachable read request, it automatically prefetches the following cache line as well. Measurement Data: == STOCK 10.10 WITHOUT PATCH ======================== ~# ./memspeed size 8388608 8192k 8M offset 8388608, 0 buffers 0x2aaad000 0x2b2ad000 copy libc 133 MB/s copy Android v5 273 MB/s copy Android NEON 235 MB/s copy INT32 116 MB/s copy ASM ARM 187 MB/s copy ASM VLDM 64 204 MB/s copy ASM VLDM 128 173 MB/s copy ASM VLD1 216 MB/s read ASM ARM 286 MB/s read ASM VLDM 242 MB/s read ASM VLD1 286 MB/s write libc 1947 MB/s write ASM ARM 1943 MB/s write ASM VSTM 1942 MB/s write ASM VST1 1935 MB/s 10.10 + PATCH ============= ~# ./memspeed size 8388608 8192k 8M offset 8388608, 0 buffers 0x2ab17000 0x2b317000 copy libc 129 MB/s copy Android v5 256 MB/s copy Android NEON 356 MB/s copy INT32 127 MB/s copy ASM ARM 321 MB/s copy ASM VLDM 64 337 MB/s copy ASM VLDM 128 321 MB/s copy ASM VLD1 350 MB/s read ASM ARM 496 MB/s read ASM VLDM 470 MB/s read ASM VLD1 488 MB/s write libc 1701 MB/s write ASM ARM 1682 MB/s write ASM VSTM 1693 MB/s write ASM VST1 1681 MB/s Signed-off-by: N Mans Rullgard <mans@mansr.com> Signed-off-by: N Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: N Nishanth Menon <nm@ti.com> Signed-off-by: N Tony Lindgren <tony@atomide.com>

omap4: l2x0: enable instruction and data prefetching
Enabling L2 prefetching improves performance as shown on Panda ES2.1 board with mem test, and it has measurable impact on performances. I think we should consider it, even though it damages "writes" a bit. (rebased to k.org) Usually the prefetch is used at both levels together L1 + L2, however, to enable the CP15 prefetch engines, these are under security, and on GP devices, we cannot enable it(e.g. on PandaBoard). However, just enabling PL310 prefetch seems to provide performance improvement, as shown in the data below (from Ubuntu) and would be a great thing to pull in. What prefetch does is enable automatic next line prefetching. With this enabled, whenever the PL310 receives a cachable read request, it automatically prefetches the following cache line as well. Measurement Data: == STOCK 10.10 WITHOUT PATCH ======================== ~# ./memspeed size 8388608 8192k 8M offset 8388608, 0 buffers 0x2aaad000 0x2b2ad000 copy libc 133 MB/s copy Android v5 273 MB/s copy Android NEON 235 MB/s copy INT32 116 MB/s copy ASM ARM 187 MB/s copy ASM VLDM 64 204 MB/s copy ASM VLDM 128 173 MB/s copy ASM VLD1 216 MB/s read ASM ARM 286 MB/s read ASM VLDM 242 MB/s read ASM VLD1 286 MB/s write libc 1947 MB/s write ASM ARM 1943 MB/s write ASM VSTM 1942 MB/s write ASM VST1 1935 MB/s 10.10 + PATCH ============= ~# ./memspeed size 8388608 8192k 8M offset 8388608, 0 buffers 0x2ab17000 0x2b317000 copy libc 129 MB/s copy Android v5 256 MB/s copy Android NEON 356 MB/s copy INT32 127 MB/s copy ASM ARM 321 MB/s copy ASM VLDM 64 337 MB/s copy ASM VLDM 128 321 MB/s copy ASM VLD1 350 MB/s read ASM ARM 496 MB/s read ASM VLDM 470 MB/s read ASM VLD1 488 MB/s write libc 1701 MB/s write ASM ARM 1682 MB/s write ASM VSTM 1693 MB/s write ASM VST1 1681 MB/s Signed-off-by: N Mans Rullgard <mans@mansr.com> Signed-off-by: N Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: N Nishanth Menon <nm@ti.com> Signed-off-by: N Tony Lindgren <tony@atomide.com>
11e02640 · Mans Rullgard · Tony Lindgren · 1773e60a · 11e02640
隐藏空白更改
内联并排

Showing with 11 addition and 6 deletion

arch/arm/mach-omap2/omap4-common.c arch/arm/mach-omap2/omap4-common.c +11 -6

未找到文件。
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -66,9 +66,6 @@ static int __init omap_l2_cache_init(void)
 	l2cache_base = ioremap(OMAP44XX_L2CACHE_BASE, SZ_4K);
 	BUG_ON(!l2cache_base);

-	/* Enable PL310 L2 Cache controller */
-	omap_smc1(0x102, 0x1);
-
 	/*
 	 * 16-way associativity, parity disabled
 	 * Way size - 32KB (es1.0)
@@ -79,10 +76,18 @@ static int __init omap_l2_cache_init(void)
 			(0x1 << L2X0_AUX_CTRL_NS_LOCKDOWN_SHIFT) |
 			(0x1 << L2X0_AUX_CTRL_NS_INT_CTRL_SHIFT));

-	if (omap_rev() == OMAP4430_REV_ES1_0)
+	if (omap_rev() == OMAP4430_REV_ES1_0) {
 		aux_ctrl |= 0x2 << L2X0_AUX_CTRL_WAY_SIZE_SHIFT;
-	else
-		aux_ctrl |= 0x3 << L2X0_AUX_CTRL_WAY_SIZE_SHIFT;
+	} else {
+		aux_ctrl |= ((0x3 << L2X0_AUX_CTRL_WAY_SIZE_SHIFT) |
+			(1 << L2X0_AUX_CTRL_DATA_PREFETCH_SHIFT) |
+			(1 << L2X0_AUX_CTRL_INSTR_PREFETCH_SHIFT));
+	}
+	if (omap_rev() != OMAP4430_REV_ES1_0)
+		omap_smc1(0x109, aux_ctrl);
+
+	/* Enable PL310 L2 Cache controller */
+	omap_smc1(0x102, 0x1);

 	l2x0_init(l2cache_base, aux_ctrl, L2X0_AUX_CTRL_MASK);