- 18 October 2013, 3 commits
-
-
By Christoffer Dall

Support huge pages in KVM/ARM and KVM/ARM64. The pud_huge check on the unmap path may feel a bit silly, as pud_huge is always defined to false there, but the compiler should be smart about this.

Note: This deals only with VMAs marked as huge, which are allocated by users through hugetlbfs only. Transparent huge pages can only be detected by looking at the underlying pages (or the page tables themselves), and this patch so far simply maps these on a page-by-page level in the Stage-2 page tables.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
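A simplified sketch of the hugetlbfs path this adds to the Stage-2 fault handler (modelled on the arm kvm mmu code; treat names such as stage2_set_pmd_huge as approximations of the patch, not verbatim, and note that icache coherency handling is elided):

    if (is_vm_hugetlb_page(vma)) {
        /* Back the fault with a single stage-2 PMD (2MB)
         * instead of 512 individual PTEs. */
        pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);

        new_pmd = pmd_mkhuge(new_pmd);
        if (writable)
            kvm_set_s2pmd_writable(&new_pmd);
        ret = stage2_set_pmd_huge(kvm, memcache,
                                  fault_ipa & PMD_MASK, &new_pmd);
    }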
-
By Christoffer Dall

Update comments to reflect what is really going on, and add the TWE bit to the comments in kvm_arm.h. Also rename the function to kvm_handle_wfx, as is done on arm64, for consistency and uber-correctness.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
-
By Marc Zyngier

On an (even slightly) oversubscribed system, spinlocks quickly become a bottleneck, as some vcpus are spinning, waiting for a lock to be released, while the vcpu holding the lock may not be running at all. This creates contention, and the observed slowdown is 40x for hackbench. No, this isn't a typo.

The solution is to trap blocking WFEs and tell KVM that we're now spinning. This ensures that other vcpus will get a scheduling boost, allowing the lock to be released more quickly. Also, using CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance when the VM is severely overcommitted.

Quick test to estimate the performance: hackbench 1 process 1000

    2xA15 host (baseline):  1.843s
    2xA15 guest w/o patch:  2.083s
    4xA15 guest w/o patch:  80.212s
    8xA15 guest w/o patch:  could not be bothered to find out
    2xA15 guest w/ patch:   2.102s
    4xA15 guest w/ patch:   3.205s
    8xA15 guest w/ patch:   6.887s

So we go from a 40x degradation to 1.5x in the 2x overcommit case, which is vaguely more acceptable.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
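The handler itself is tiny; a sketch close to the arch/arm/kvm exit-handler code this patch touches (HSR_WFI_IS_WFE distinguishes a trapped WFE from a WFI in the fault syndrome):

    static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
    {
        if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
            kvm_vcpu_on_spin(vcpu);   /* spinning: boost other vcpus */
        else
            kvm_vcpu_block(vcpu);     /* WFI: put the vcpu to sleep */

        return 1;
    }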
-
- 14 October 2013, 1 commit
-
-
By Christoffer Dall

The KVM_HPAGE defines are a little artificial on ARM, since the huge page size is statically defined at compile time and there is only a single huge page size. Now that the main kvm code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- 13 October 2013, 2 commits
-
-
By Jonathan Austin

This patch adds support for running Cortex-A7 guests on Cortex-A7 hosts. As Cortex-A7 is architecturally compatible with A15, this patch is largely just generalising existing code. Code for areas where 'implementation defined' behaviour is identical on A7 and A15 is moved so that it can be used by both cores. The check to ensure that coprocessor register tables are sorted correctly is also moved into 'common' code to avoid each new cpu doing its own check (and possibly forgetting to do so!)

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
-
By Jonathan Austin

The T{0,1}SZ fields of TTBCR are 3 bits wide when using the long descriptor format. Likewise, the T0SZ field of the HTCR is 3 bits. KVM currently defines TTBCR_T{0,1}SZ as 3, not 7.

The T0SZ mask is used to calculate the value for the HTCR, both to pick out TTBCR.T0SZ and to mask off the equivalent field in the HTCR during the read-modify-write. The incorrect mask size causes the (UNKNOWN) reset value of HTCR.T0SZ to leak into the calculated HTCR value, and Linux will hang when initializing KVM if HTCR's reset value has bit 2 set (sometimes the case on A7/TC2).

Fixing T0SZ allows A7 cores to boot; T1SZ is also fixed for completeness.

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
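In other words (a sketch of the shape of the fix; kvm_arm.h holds the real constants, and read_htcr here is a hypothetical stand-in for reading the HTCR):

    /* A 3-bit field needs a 3-bit mask: 0b111 (7), not 0b011 (3). */
    #define TTBCR_T0SZ  (7 << 0)    /* was 3: bit 2 was missing */
    #define TTBCR_T1SZ  (7 << 16)   /* was (3 << 16) */

    /* With the short mask, bit 2 of HTCR.T0SZ - UNKNOWN at reset -
     * survives this read-modify-write and corrupts the result: */
    htcr = (read_htcr() & ~TTBCR_T0SZ) | (ttbcr & TTBCR_T0SZ);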
-
- 03 October 2013, 1 commit
-
-
By Anup Patel

This patch implements the kvm_vcpu_preferred_target() function for KVM ARM, which will help us implement the KVM_ARM_PREFERRED_TARGET ioctl for user space.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
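From user space the pairing looks roughly like this (hedged sketch; error handling abbreviated, fields beyond the target left zeroed):

    struct kvm_vcpu_init init;

    memset(&init, 0, sizeof(init));
    /* Ask the VM which vcpu target best matches the host core... */
    if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
        err(1, "KVM_ARM_PREFERRED_TARGET");
    /* ...and initialise each vcpu with it. */
    if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
        err(1, "KVM_ARM_VCPU_INIT");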
-
- 02 September 2013, 3 commits
-
-
By Dan Aloni

Signed-off-by: Dan Aloni <alonid@stratoscale.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
By Douglas Anderson

It appears that gcc may put some code in ".text.unlikely" or ".text.hot" sections. Right now those aren't accounted for in unwind tables; add them. I found some docs about this at: http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc.pdf

Without this, if you have slub_debug turned on, you can get messages that look like this:

    unwind: Index not found 7f008c50

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Uwe Kleine-König

The newly introduced function is to be used as the .restart callback for ARMv7-M machines. The register used is architecturally defined, so it should work for all M-class machines.

Acked-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
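The sequence is architecturally defined via SCB->AIRCR; a bare-metal C sketch of it (the kernel's actual callback uses its own register definitions and is written against kernel infrastructure):

    #include <stdint.h>

    #define V7M_SCB_AIRCR     (*(volatile uint32_t *)0xE000ED0CUL)
    #define AIRCR_VECTKEY     (0x05FAUL << 16)  /* mandatory write key */
    #define AIRCR_SYSRESETREQ (1UL << 2)        /* request a system reset */

    static void v7m_restart(void)
    {
        __asm__ volatile ("dsb" ::: "memory");  /* drain outstanding writes */
        V7M_SCB_AIRCR = AIRCR_VECTKEY | AIRCR_SYSRESETREQ;
        for (;;)
            ;                                   /* wait for the reset */
    }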
-
- 31 August 2013, 1 commit
-
-
By Christoffer Dall

The kvm_set_pte function was actually assigning the entire struct to the structure member, which should work because the structure only has that one member, but it is still not very nice.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
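The cleanup amounts to assigning like for like (a sketch; pte_t and pte_val() are the kernel's page-table types):

    static inline void kvm_set_pte(pte_t *pte, pte_t new_pte)
    {
        /* Was: pte_val(*pte) = new_pte; - the whole pte_t struct
         * assigned to its single raw-value member. */
        *pte = new_pte;
    }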
-
- 27 August 2013, 1 commit
-
-
By Marek Szyprowski

This patch cleans up the initialization of the dma contiguous framework. The all-in-one dma_declare_contiguous() function is now separated into dma_contiguous_reserve_area(), which only steals the memory from the memblock allocator, and dma_contiguous_add_device(), which assigns the given device to the specified reserved memory area. This improves the flexibility in defining contiguous memory areas and assigning devices to them, because it is now possible to assign more than one device to a given contiguous memory area. Such a split in the initialization procedure is also required for upcoming device tree support.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
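Usage then splits into a boot-time reservation and per-device assignment; a hypothetical sketch (the exact signatures are defined by the patch and may differ from this approximation):

    /* Early boot: carve a 16MB contiguous area out of memblock
     * (base/limit of 0 let the allocator choose the placement). */
    struct cma *camera_area;

    dma_contiguous_reserve_area(SZ_16M, 0, 0, &camera_area);

    /* Device registration: several devices may now share one area. */
    dma_contiguous_add_device(&camera0_dev, camera_area);
    dma_contiguous_add_device(&camera1_dev, camera_area);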
-
- 26 August 2013, 7 commits
-
-
By Russell King

Now that the PL01X debug include can mostly stand alone, without requiring platforms to provide any macros, move it into the debug directory so it can be directly included. This allows us to get rid of a lot of debug-macros include files. The autodetect cases for Versatile Express and the ux500 are left alone; these are more complicated implementations.

Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Russell King

Move the definition of the UART register addresses out of the platform-specific header files into the Kconfig files.

Acked-by: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Russell King

Now that the 8250 debug include can stand alone, without requiring platforms to provide any macros, move it into the debug directory so it can be directly included. This allows us to get rid of a lot of debug-macros include files.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Russell King

Move the definition of the UART register addresses out of the platform-specific header file into the Kconfig files.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Russell King

Move the definition of the UART register shift out of the platform-specific header file into the Kconfig files.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Russell King

Move the definition out of the machine class debug-macro.S header into the Kconfig files.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Ard Biesheuvel

The C99 types uintXX_t that are usually defined in 'stdint.h' are not as unambiguous on ARM as you would expect. For the types below, there is a difference on ARM between GCC built for bare metal, GCC built for glibc, and the kernel itself, which results in build errors if you try to build with -ffreestanding and include 'stdint.h' (such as when you include 'arm_neon.h' in order to use NEON intrinsics).

As the typedefs for these types in 'stdint.h' are based on builtin defines supplied by GCC, we can tweak these to align with the kernel's idea of those types, so 'linux/types.h' and 'stdint.h' can be safely included from the same source file (provided that -ffreestanding is used).

                    int32_t   uint32_t       uintptr_t
    bare metal GCC  long      unsigned long  unsigned int
    glibc GCC       int       unsigned int   unsigned int
    kernel          int       unsigned int   unsigned long

Acked-by: Dave Martin <dave.martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
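The mechanism is to override the builtin macros that stdint.h derives its typedefs from; a sketch of the idea (assuming GCC's __INT32_TYPE__-style builtins, which the freestanding stdint.h consumes):

    /* Make a freestanding stdint.h agree with linux/types.h. */
    #undef  __INT32_TYPE__
    #define __INT32_TYPE__      int
    #undef  __UINT32_TYPE__
    #define __UINT32_TYPE__     unsigned int
    #undef  __UINTPTR_TYPE__
    #define __UINTPTR_TYPE__    unsigned long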
-
- 23 August 2013, 1 commit
-
-
By Rob Herring

Move the outer_cache declaration out of the CONFIG_OUTER_CACHE ifdef, so that outer_cache can be used inside an IS_ENABLED() condition.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
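This enables the usual IS_ENABLED() idiom, where the disabled branch is discarded by the compiler but must still parse and type-check (a sketch; .disable is one of the outer_cache_fns callbacks):

    extern struct outer_cache_fns outer_cache;  /* now visible unconditionally */

    if (IS_ENABLED(CONFIG_OUTER_CACHE) && outer_cache.disable)
        outer_cache.disable();  /* dead code when CONFIG_OUTER_CACHE=n,
                                 * but it still has to compile */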
-
- 20 August 2013, 2 commits
-
-
By Will Deacon

The flush_cache_user_range macro takes a pair of addresses describing the start and end of the virtual address range to flush. Due to an accidental oversight when flush_cache_range_user was introduced, the address range was rounded up so that the start and end addresses were page-aligned.

For historical reference, the interesting commits in history.git are:

    10eacf1775e1 ("[ARM] Clean up ARM cache handling interfaces (part 1)")
    71432e79b76b ("[ARM] Add flush_cache_user_page() for sys_cacheflush()")

This patch removes the alignment code, reducing the amount of flushing required for ranges that are not an exact multiple of PAGE_SIZE.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

Flushing a large, non-faulting VMA from userspace can potentially result in a long time spent flushing the cache line-by-line without preemption occurring (in the case of CONFIG_PREEMPT=n). Whilst this doesn't affect the stability of the system, it can certainly affect the responsiveness and CPU availability for other tasks.

This patch splits up the user cacheflush code so that it flushes in chunks of a page. After each chunk has been flushed, we may reschedule if appropriate and, before processing the next chunk, we allow any pending signals to be handled before resuming from where we left off.

Signed-off-by: Will Deacon <will.deacon@arm.com>
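A simplified sketch of the chunked loop (signal restart handling is more involved in the real patch, which resumes via the restart block):

    static int __do_cache_op(unsigned long start, unsigned long end)
    {
        do {
            unsigned long chunk = min(PAGE_SIZE, end - start);

            if (signal_pending(current))
                return -ERESTART_RESTARTBLOCK;  /* resume from 'start' later */

            if (flush_cache_user_range(start, start + chunk))
                return -EFAULT;

            cond_resched();         /* preemption point between pages */
            start += chunk;
        } while (start < end);

        return 0;
    }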
-
- 16 August 2013, 1 commit
-
-
By Linus Torvalds

Ben Tebulin reported:

    "Since v3.7.2 on two independent machines a very specific Git
    repository fails in 9/10 cases on git-fsck due to an SHA1/memory
    failures. This only occurs on a very specific repository and can be
    reproduced stably on two independent laptops. Git mailing list ran
    out of ideas and for me this looks like some very exotic kernel issue"

and bisected the failure to the backport of commit 53a59fc6 ("mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

That commit itself is not actually buggy, but what it does is to make it much more likely to hit the partial TLB invalidation case, since it introduces a new case in tlb_next_batch() that previously only ever happened when running out of memory.

The real bug is that the TLB gather virtual memory range setup is subtly buggered. It was introduced in commit 597e1c35 ("mm/mmu_gather: enable tlb flush range in generic mmu_gather"), and the range handling was already fixed at least once in commit e6c495a9 ("mm: fix the TLB range flushed when __tlb_remove_page() runs out of slots"), but that fix was not complete.

The problem with the TLB gather virtual address range is that it isn't set up by the initial tlb_gather_mmu() initialization (which didn't get the TLB range information), but it is set up ad-hoc later by the functions that actually flush the TLB. And so any such case that forgot to update the TLB range entries would potentially miss TLB invalidates.

Rather than try to figure out exactly which particular ad-hoc range setup was missing (I personally suspect it's the hugetlb case in zap_huge_pmd(), which didn't have the same logic as zap_pte_range() did), this patch just gets rid of the problem at the source: make the TLB range information available to tlb_gather_mmu(), and initialize it when initializing all the other tlb gather fields.

This makes the patch larger, but conceptually much simpler. And the end result is much more understandable; even if you want to play games with partial ranges when invalidating the TLB contents in chunks, now the range information is always there, and anybody who doesn't want to bother with it won't introduce subtle bugs.

Ben verified that this fixes his problem.

Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
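Concretely, callers now hand the full range to tlb_gather_mmu() up front instead of patching it in later (a sketch of the 2013-era API; the upstream signature has since changed again):

    struct mmu_gather tlb;

    tlb_gather_mmu(&tlb, mm, start, end);   /* range recorded at init... */
    unmap_vmas(&tlb, vma, start, end);
    tlb_finish_mmu(&tlb, start, end);       /* ...so the final flush can't miss it */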
-
- 14 August 2013, 5 commits
-
-
By Rob Herring

In order to specify a DMA zone size of 4GB on LPAE systems, the sizes need to be 64-bit. So make machine_desc.dma_zone_size and arm_dma_zone_size be phys_addr_t instead of unsigned long.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
-
By Christoffer Dall

The L_PTE_USER define actually has nothing to do with stage 2 mappings, and the L_PTE_S2_RDWR value sets the readable bit, which was what L_PTE_USER was used for before proper handling of stage 2 memory defines.

Changelog:
    [v3]: Drop call to kvm_set_s2pte_writable in mmu.c
    [v2]: Change default mappings to be r/w instead of r/o, as per Marc Zyngier's suggestion.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Stephen Warren

Architectures should fully validate whether kexec is possible as part of machine_kexec_prepare(), so that user-space's kexec_load() operation can report any problems. Performing validation in machine_kexec() itself is too late, since it is not allowed to return.

Prior to this patch, ARM's machine_kexec() was testing after-the-fact whether machine_kexec_prepare() was able to disable all but one CPU. Instead, modify machine_kexec_prepare() to validate all conditions necessary for machine_kexec() to succeed, and BUG if the validation succeeded yet disabling the CPUs didn't actually work.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
By Will Deacon

Commit 15e7e5c1 ("ARM: 7749/1: spinlock: retry trylock operation if strex fails on free lock") modified our arch_spin_trylock to retry the acquisition if the lock appeared uncontended, but the strex failed. This patch does the same for rwlocks, which were missed by the original patch.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
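For the write lock, the shape of the change is a retry loop around the ldrex/strex pair (a sketch close to the patched arch_write_trylock; 0x80000000 is the writer bit): if ldrex saw the lock free but strex failed anyway (e.g. the exclusive monitor was cleared), try again instead of reporting a free lock as busy.

    static inline int arch_write_trylock(arch_rwlock_t *rw)
    {
        unsigned long contended, res;

        do {
            __asm__ __volatile__(
            "   ldrex   %0, [%2]\n"     /* contended = rw->lock */
            "   mov     %1, #0\n"
            "   teq     %0, #0\n"       /* free? */
            "   strexeq %1, %3, [%2]"   /* claim it; res = strex status */
            : "=&r" (contended), "=&r" (res)
            : "r" (&rw->lock), "r" (0x80000000)
            : "cc");
        } while (res);                  /* strex failed on a free lock: retry */

        if (!contended) {
            smp_mb();                   /* acquire barrier */
            return 1;
        }
        return 0;
    }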
-
By Will Deacon

The res variable is written before we've finished with the input operands (namely the lock address), so ensure that we mark it as 'early clobber' to avoid unintended register sharing.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
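A minimal illustration of the constraint (hypothetical helper; the '&' modifier is the point):

    static inline int try_inc(unsigned long *addr)
    {
        unsigned long tmp, res;

        __asm__ __volatile__(
        "   ldrex   %0, [%2]\n"     /* tmp written while addr still live */
        "   add     %0, %0, #1\n"
        "   strex   %1, %0, [%2]"   /* res written while addr still live */
        : "=&r" (tmp), "=&r" (res)  /* "&": never share a register with inputs */
        : "r" (addr)
        : "cc", "memory");

        return res == 0;            /* 1 on success, 0 if the monitor was lost */
    }

Without the '&', the compiler may legally allocate an early-written output to the same register as a still-needed input, corrupting the address mid-sequence.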
-
- 12 August 2013, 8 commits
-
-
By Thomas Petazzoni

Some PCI drivers may need to adjust the pci_bus structure after it has been allocated by the Linux PCI core. The PCI core allows architectures to implement pcibios_add_bus() and pcibios_remove_bus() for this purpose. This commit therefore extends the hw_pci and pci_sys_data structures of the ARM PCI core to allow PCI drivers to register ->add_bus() and ->remove_bus() callbacks in hw_pci, which will get called when a bus is added to or removed from the system. This will be used, for example, by the Marvell PCIe driver to connect a particular PCI bus with its corresponding MSI chip to handle Message Signaled Interrupts.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
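A host driver would hook in roughly like this (hypothetical names throughout; the msi_chip attachment mirrors the Marvell use case described above):

    static void my_add_bus(struct pci_bus *bus)
    {
        bus->msi = &my_msi_chip;        /* route MSIs for this bus */
    }

    static struct hw_pci my_pci __initdata = {
        .nr_controllers = 1,
        .setup          = my_setup,
        .map_irq        = my_map_irq,
        .add_bus        = my_add_bus,   /* invoked from pcibios_add_bus() */
    };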
-
By Will Deacon

flush_cache_vmap contains a dsb to ensure that any cacheflushing operations to flush out newly written ptes have completed. This patch adds the -ishst option to the dsb, since that is all that is required for completing cacheflushing in the inner-shareable domain.

Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

When unlocking a spinlock, we use the sev instruction to signal other CPUs waiting on the lock. Since sev is not a memory access instruction, we require a dsb in order to ensure that the sev is not issued ahead of the store placing the lock in an unlocked state. However, as sev is only concerned with other processors in a multiprocessor system, we can restrict the scope of the preceding dsb to the inner-shareable domain. Furthermore, we can restrict the scope to consider only stores, since there are no independent loads on the unlock path.

A side-effect of this change is that a spin_unlock operation no longer forces completion of pending TLB invalidation, something which we rely on when unlocking runqueues to ensure that CPU migration during TLB maintenance routines doesn't cause us to continue before the operation has completed. This patch adds the -ishst suffix to the ARMv7 definition of dsb_sev() and adds an inner-shareable dsb to the context-switch path when running a preemptible, SMP, v7 kernel.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
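After the change, the ARMv7 flavour looks roughly like this (sketch):

    static inline void dsb_sev(void)
    {
        __asm__ __volatile__(
        "   dsb ishst\n"    /* order the unlock store, inner-shareable, stores only */
        "   sev"            /* wake CPUs waiting in wfe */
        : : : "memory");
    }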
-
By Will Deacon

Our TLB invalidation routines may require a barrier before the maintenance (in order to ensure pending page table writes are visible to the hardware walker) and barriers afterwards (in order to ensure completion of the maintenance and visibility in the instruction stream). Whilst this is expensive, the cost can be reduced somewhat by reducing the scope of the barrier instructions:

    - The barrier before only needs to apply to stores (pte writes)
    - Local ops are required only to affect the non-shareable domain
    - Global ops are required only to affect the inner-shareable domain

This patch makes these changes for the TLB flushing code.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

On ARMv7, the memory barrier instructions take an optional 'option' field which can be used to constrain the effects of a memory barrier based on shareability and access type. This patch allows the caller to pass these options if required, and updates the smp_*() barriers to request inner-shareable barriers, affecting only stores for the _wmb variant. wmb() is also changed to use the -st version of dsb.

Reported-by: Albin Tonnerre <albin.tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
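A sketch close to the resulting barrier definitions (simplified from the kernel's barrier.h; the option token is pasted straight into the instruction):

    #define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
    #define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")

    #define smp_mb()    dmb(ish)    /* inner-shareable, loads and stores */
    #define smp_wmb()   dmb(ishst)  /* inner-shareable, stores only */
    #define wmb()       dsb(st)     /* full system, stores only */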
-
By Will Deacon

Now that the ASID allocator doesn't require inner-shareable maintenance, we can convert the local_flush_bp_all function to perform only non-shareable flushing, in a similar manner to the TLB invalidation routines.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

Branch predictor maintenance is only required when we are either changing the kernel's view of memory (switching tables completely) or dealing with ASID rollover. Both of these use-cases require subsequent TLB invalidation, which has the relevant barrier instructions to ensure completion and visibility of the maintenance, so this patch removes the instruction barrier from [local_]flush_bp_all.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
By Will Deacon

Inner-shareable TLB invalidation is typically more expensive than local (non-shareable) invalidation, so performing the broadcast for local_flush_tlb_* operations is a waste of cycles and needlessly clobbers entries in the TLBs of other CPUs.

This patch introduces __flush_tlb_* versions for many of the TLB invalidation functions, which only respect inner-shareable variants of the invalidation instructions when presented with the TLB_V7_UIS_FULL flag. The local version is also inlined to prevent SMP_ON_UP kernels from missing flushes, where the __flush variant would be called with the UP flags.

This gains us around 0.5% in hackbench scores for a dual-core A15, but I would expect this to improve as more cores (and clusters) are added to the equation.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
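The split looks roughly like this (illustrative only: the CP15 encodings are architectural, but the real code goes through the kernel's tlb_op/flag machinery rather than raw asm):

    static inline void local_flush_tlb_all(void)    /* this CPU only */
    {
        asm volatile("mcr p15, 0, %0, c8, c7, 0" : : "r" (0));  /* TLBIALL */
        dsb(nsh);               /* non-shareable completion is enough */
    }

    static inline void flush_tlb_all(void)          /* broadcast variant */
    {
        asm volatile("mcr p15, 0, %0, c8, c3, 0" : : "r" (0));  /* TLBIALLIS */
        dsb(ish);
        isb();
    }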
-
- 03 August 2013, 1 commit
-
-
By Russell King

Olof reports that noMMU builds error out with:

    arch/arm/kernel/signal.c: In function 'setup_return':
    arch/arm/kernel/signal.c:413:25: error: 'mm_context_t' has no member named 'sigpage'

This shows one of the evilnesses of IS_ENABLED(). Get rid of it here and replace it with #ifdef's - and as no noMMU platform can make use of sigpage, depend on CONFIG_MMU not CONFIG_ARM_MPU.

Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
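The failure mode, in miniature (sketch; the sigpage member only exists when CONFIG_MMU=y):

    /* IS_ENABLED() keeps the dead branch in the parse tree, so this
     * must still compile when CONFIG_MMU is off - and it can't: */
    if (IS_ENABLED(CONFIG_MMU))
        addr = current->mm->context.sigpage;

    /* The preprocessor removes the reference outright: */
    #ifdef CONFIG_MMU
        addr = current->mm->context.sigpage;
    #endif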
-
- 01 August 2013, 1 commit
-
-
By Russell King

If kuser helpers are not provided by the kernel, disable user access to the vectors page. With the kuser helpers gone, there is no reason for this page to be visible to userspace.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-