- 12 May 2014, 1 commit
-
-
Submitted by Will Deacon
Since mdscr_el1 is part of the debug register group, it is highly likely to be trapped by a hypervisor to prevent virtual machines from debugging (buggering?) each other. Unfortunately, this absolutely destroys our performance, since we access the register on many of our low-level fault handling paths to keep track of the various debug state machines.

This patch removes our dependency on mdscr_el1 in the case that debugging is not being used. More specifically we:

- Use TIF_SINGLESTEP to indicate that a task is stepping at EL0 and avoid disabling step in the MDSCR when we don't need to. MDSCR_EL1.SS handling is moved to kernel_entry, when trapping from userspace.

- Ensure debug exceptions are re-enabled on *all* exception entry paths, even the debug exception handling path (where we re-enable exceptions after invoking the handler). Since we can now rely on MDSCR_EL1.SS being cleared by the entry code, exception handlers can usually enable debug immediately before enabling interrupts.

- Remove all debug exception unmasking from ret_to_user and el1_preempt, since we will never get here with debug exceptions masked.

This results in a slight change to kernel debug behaviour, where we now step into interrupt handlers and data aborts from EL1 when debugging the kernel, which is actually a useful thing to do. A side-effect of this is that it *does* potentially prevent stepping off {break,watch}points when there is a high-frequency interrupt source (e.g. a timer), so a debugger would need to use either breakpoints or manually disable interrupts to get around this issue.

With this patch applied, guest performance is restored under KVM when debug register accesses are trapped (and we get a measurable performance increase on the host on Cortex-A57 too).

Cc: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
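A minimal C sketch of the entry-path idea (illustrative only; the helper name is invented and the real logic lives in the kernel's entry assembly):

    /* Hedged sketch: skip the (possibly hypervisor-trapped) MDSCR_EL1
     * access unless the current task is single-stepping at EL0. */
    static inline void disable_step_if_stepping(void)
    {
            unsigned long mdscr;

            if (!test_thread_flag(TIF_SINGLESTEP))
                    return;                 /* fast path: no debug sysreg access */

            asm volatile("mrs %0, mdscr_el1" : "=r" (mdscr));
            asm volatile("msr mdscr_el1, %0" : : "r" (mdscr & ~1UL)); /* clear SS */
    }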
-
- 10 May 2014, 8 commits
-
-
Submitted by Will Deacon
In order to ensure ordering and completion of inner-shareable maintenance instructions (cache and TLB) on AArch64, we can use the -ish suffix to the dmb and dsb instructions respectively. This patch updates our low-level cache and tlb maintenance routines to use the inner-shareable barrier variants where appropriate.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Will Deacon
In order to ensure completion of inner-shareable maintenance instructions (cache and TLB) on AArch64, we can use the -ish suffix to the dsb instruction. This patch relaxes our dsb sy instructions to dsb ish where possible.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Will Deacon
set_cpu_boot_mode_flag is used to identify which exception levels are encountered across the system by CPUs trying to enter the kernel. The basic algorithm is: if a CPU is booting at EL2, it will set a flag at an offset of #4 from __boot_cpu_mode, a cacheline-aligned variable. Otherwise, a flag is set at an offset of zero into the same cacheline. This enables us to check that all CPUs booted at the same exception level.

This cacheline is written with the stage-1 MMU off (that is, via a strongly-ordered mapping) and will bypass any clean lines in the cache, leading to potential coherence problems when the variable is later checked via the normal, cacheable mapping of the kernel image.

This patch reworks the broken flushing code so that we:

(1) Use a DMB to order the strongly-ordered write of the cacheline against the subsequent cache-maintenance operation (by-VA operations only hazard against normal, cacheable accesses).

(2) Use a single dc ivac instruction to invalidate any clean lines containing a stale copy of the line after it has been updated.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
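The resulting sequence, sketched in C for illustration (the function name is invented; the real code is assembly run with the MMU off):

    /* Hedged sketch of the flush sequence described in (1) and (2). */
    static inline void publish_boot_mode_flag(unsigned long *flag, unsigned long val)
    {
            *flag = val;                                /* strongly-ordered write */
            asm volatile("dmb sy" ::: "memory");        /* order write vs. D-cache op */
            asm volatile("dc ivac, %0" : : "r" (flag)); /* drop stale clean lines */
            asm volatile("dsb sy" ::: "memory");        /* wait for completion */
    }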
-
Submitted by Will Deacon
The recently introduced acquire/release accessors refer to smp_mb() in the !CONFIG_SMP case. This is confusing when reading the code, so use barrier() directly when we know we're UP.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
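In outline, the pattern looks like this (a hedged sketch, not the verbatim arm64 barrier.h of the time):

    #ifdef CONFIG_SMP
    #define smp_mb()        asm volatile("dmb ish" ::: "memory")
    #else
    #define smp_mb()        barrier()       /* UP: a compiler barrier suffices */
    #endif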
-
Submitted by Will Deacon
Now that all callers of the barrier macros are updated to pass the mandatory options, update the macros so the option is actually used.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Will Deacon
When calling our low-level barrier macros directly, we can often suffice with more relaxed behaviour than the default "all accesses, full system" option. This patch updates the users of dsb() to specify the option which they actually require.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
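An option-taking barrier macro can simply stringise its argument into the instruction; a minimal sketch plus a typical call site:

    /* Hedged sketch of an option-taking barrier macro. */
    #define dsb(opt)        asm volatile("dsb " #opt ::: "memory")

    /* e.g. only wait for inner-shareable store completion: dsb(ishst); */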
-
Submitted by Steve Capper
The tlb maintenance functions __cpu_flush_user_tlb_range and __cpu_flush_kern_tlb_range do not take into consideration the page granule when looping through the address range, and repeatedly flush tlb entries for the same page when operating with 64K pages.

This patch re-works the logic s.t. we instead advance the loop by 1 << (PAGE_SHIFT - 12), to avoid repeating ourselves. Also, the routines have been converted from assembler to static inline functions to aid legibility and potential compiler optimisations.

The isb() has been removed from flush_tlb_kernel_range(.) as it is only needed when changing the execute permission of a mapping. If one needs to set an area of the kernel as execute/non-execute, an isb() must be inserted after the call to flush_tlb_kernel_range.

Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
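The stride works because TLBI-by-VA operands are expressed in units of 4K (VA >> 12); a hedged sketch of the resulting loop shape (simplified: kernel mappings only, no ASID handling, invented function name):

    /* Hedged sketch: advance by one page per iteration regardless of
     * the page size, instead of by one 4K step. */
    static inline void flush_tlb_kernel_range_sketch(unsigned long start,
                                                     unsigned long end)
    {
            unsigned long addr;

            start >>= 12;                   /* TLBI operands are VA >> 12 */
            end >>= 12;

            asm volatile("dsb ishst" ::: "memory");
            for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
                    asm volatile("tlbi vaae1is, %0" : : "r" (addr));
            asm volatile("dsb ish" ::: "memory");
    }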
-
Submitted by Will Deacon
Some users of xchg() don't bother using the return value, which results in a compiler warning like the following (from kgdb):

    In file included from linux/arch/arm64/include/asm/atomic.h:27:0,
                     from include/linux/atomic.h:4,
                     from include/linux/spinlock.h:402,
                     from include/linux/seqlock.h:35,
                     from include/linux/time.h:5,
                     from include/uapi/linux/timex.h:56,
                     from include/linux/timex.h:56,
                     from include/linux/sched.h:19,
                     from include/linux/pid_namespace.h:4,
                     from kernel/debug/debug_core.c:30:
    kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
    linux/arch/arm64/include/asm/cmpxchg.h:75:3: warning: value computed is not used [-Wunused-value]
       ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
       ^
    linux/arch/arm64/include/asm/atomic.h:132:30: note: in expansion of macro ‘xchg’
     #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
    kernel/debug/debug_core.c:504:4: note: in expansion of macro ‘atomic_xchg’
        atomic_xchg(&kgdb_active, cpu);
        ^

This patch makes use of the same trick as we do for cmpxchg, by assigning the return value to a dummy variable in the xchg() macro itself.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
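Routing the result through a local variable inside a ({ ... }) statement expression is what silences -Wunused-value at call sites that ignore it; a hedged sketch of the reworked macro:

    /* Hedged sketch: assign the result to a dummy variable so callers
     * may legitimately discard the macro's value. */
    #define xchg(ptr, x)                                                    \
    ({                                                                      \
            __typeof__(*(ptr)) __ret;                                       \
            __ret = (__typeof__(*(ptr)))                                    \
                    __xchg((unsigned long)(x), (ptr), sizeof(*(ptr)));      \
            __ret;                                                          \
    })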
-
- 09 May 2014, 9 commits
-
-
Submitted by Steve Capper
We have the capability to map 1GB level 1 blocks when using a 4K granule.

This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that support 1GB TLB entries.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Tested-by: Jungseok Lee <jays.lee@samsung.com>
[catalin.marinas@arm.com: s/prot_sect_kernel/PROT_SECT_NORMAL_EXEC/]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
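The alignment test reduces to checking that the current VA, the end of the chunk, and the PA all have zero bits below the 1GB boundary; a hedged sketch (use_1G_block is an invented helper name):

    /* Hedged sketch of the 1GB (PUD-level) block eligibility check. */
    static bool use_1G_block(unsigned long addr, unsigned long next,
                             unsigned long phys)
    {
            if (PAGE_SHIFT != 12)   /* 1GB level-1 blocks need a 4K granule */
                    return false;

            return ((addr | next | phys) & ~PUD_MASK) == 0;
    }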
-
Submitted by Bjorn Helgaas
arm64 sets CONFIG_64BIT=y and hence uses the "long counter" atomic64_t definition from include/linux/types.h. Make atomic64_read() return "long", not "long long".

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas
The primary aim of this patchset is to remove the pgprot_default and prot_sect_default global variables and rely strictly on predefined values. The original goal was to be able to run SMP kernels on UP hardware by not setting the Shareability bit. However, it is unlikely to see UP ARMv8 hardware and even if we do, the Shareability bit is no longer assumed to disable cacheable accesses.

A side effect is that the device mappings now have the Shareability attribute set. The hardware, however, should ignore it since Device accesses are always Outer Shareable.

Following the removal of the two global variables, there is some PROT_* macro reshuffling and cleanup, including the __PAGE_* macros (replaced by PAGE_*).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
-
Submitted by Catalin Marinas
The ARMv8 architecture allows execute-only user permissions by clearing the PTE_UXN and PTE_USER bits. The kernel, however, can still access such a page, so execute-only page permission does not protect against read(2)/write(2) etc. accesses. Systems requiring such protection must implement/enable features like SECCOMP.

This patch changes the arm64 __P100 and __S100 protection_map[] macros to the new __PAGE_EXECONLY attributes. A side effect is that pte_valid_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't set. To work around this, the check is done on the PTE_NG bit via the pte_valid_ng() macro. VM_READ is also checked now for page faults.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
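From userspace, such a mapping is requested with PROT_EXEC and no PROT_READ; a hedged illustration:

    /* Hedged userspace illustration: request an execute-only mapping. */
    #include <stddef.h>
    #include <sys/mman.h>

    void *map_exec_only(size_t len)
    {
            /* No PROT_READ: on arm64 this can now be honoured as
             * genuinely execute-only for userspace accesses. */
            return mmap(NULL, len, PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
-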
Submitted by Catalin Marinas
This information (the ESR_EL1 fault syndrome) is useful for instruction emulators to detect read/write and access size without having to decode the faulting instruction. The current patch exports it via sigcontext (struct esr_context) and is only valid for SIGSEGV and SIGBUS.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
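A signal handler can locate the record by walking the _aarch64_ctx headers in the mcontext's __reserved area; a hedged sketch (layout as in the arm64 uapi sigcontext headers):

    /* Hedged sketch: find the esr_context record in a signal frame. */
    #include <signal.h>
    #include <asm/sigcontext.h>     /* struct esr_context, ESR_MAGIC */

    static unsigned long long find_esr(ucontext_t *uc)
    {
            struct _aarch64_ctx *head =
                    (struct _aarch64_ctx *)uc->uc_mcontext.__reserved;

            while (head->magic) {   /* a zero magic terminates the list */
                    if (head->magic == ESR_MAGIC)
                            return ((struct esr_context *)head)->esr;
                    head = (struct _aarch64_ctx *)((char *)head + head->size);
            }
            return 0;
    }
-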
Submitted by Catalin Marinas
This patch removes the aux_context structure (and the containing file) to allow the placement of the _aarch64_ctx end magic based on the context stored on the signal stack.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas
For AArch32, bit 11 (WnR) of the FSR/ESR register is set when the fault was caused by a write access, and applications like Qemu rely on such information being provided in sigcontext. This patch introduces ESR_EL1 tracking for the arm64 kernel faults and sets bit 11 accordingly in compat sigcontext.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas
The synchronisation with the boot thread already happens in __cpu_up() via wait_for_completion_timeout(). In addition, __cpu_up() calls are protected by the cpu_add_remove_lock mutex and already serialised.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas
The hardware provides the maximum cache line size in the system via the CTR_EL0.CWG bits. This patch implements the cache_line_size() function to read such information, together with a sanity check if the statically defined L1_CACHE_BYTES is smaller than the hardware value.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
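CWG sits in CTR_EL0 bits [27:24] and encodes the granule as 4 << CWG bytes, with 0 meaning "not provided"; a hedged sketch of such a helper:

    /* Hedged sketch: derive the max cache line size from CTR_EL0.CWG. */
    static inline int cache_line_size_sketch(void)
    {
            unsigned long ctr;
            unsigned int cwg;

            asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
            cwg = (ctr >> 24) & 0xf;                /* Cache Writeback Granule */
            return cwg ? 4 << cwg : L1_CACHE_BYTES; /* 0: fall back to static value */
    }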
-
- 04 May 2014, 5 commits
-
-
Submitted by Catalin Marinas
Since the default DMA ops for arm64 are non-coherent, mark the X-Gene controller explicitly as dma-coherent to avoid additional cache maintenance.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Loc Ho <lho@apm.com>
-
Submitted by Catalin Marinas
Recently, the default DMA ops have been changed to non-coherent for alignment with 32-bit ARM platforms (and DT files). This patch adds bus notifiers to be able to set the coherent DMA ops (with no cache maintenance) for devices explicitly marked as coherent via the "dma-coherent" DT property.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
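The shape of such a notifier, sketched under the assumption of that era's arm64 archdata dma_ops field (coherent_dma_ops is a placeholder for whatever coherent implementation the platform provides):

    /* Hedged sketch of a bus notifier keyed off the "dma-coherent" property. */
    static int dma_coherent_notifier(struct notifier_block *nb,
                                     unsigned long event, void *data)
    {
            struct device *dev = data;

            if (event != BUS_NOTIFY_ADD_DEVICE)
                    return NOTIFY_DONE;

            if (dev->of_node &&
                of_property_read_bool(dev->of_node, "dma-coherent"))
                    dev->archdata.dma_ops = &coherent_dma_ops; /* placeholder */

            return NOTIFY_OK;
    }
-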
Submitted by Ritesh Harjani
Currently the arm64 dma_ops are by default made coherent, which makes the default policy the opposite of arm's. Make the default dma_ops non-coherent (same as arm), as currently there aren't any dma-capable drivers which assume coherent ops.

Signed-off-by: Ritesh Harjani <ritesh.harjani@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Marc Zyngier
Commit d57c33c5 (add generic fixmap.h) added (among other similar things) set_fixmap_io to deal with early ioremap of devices. More recently, commit bf4b558e (arm64: add early_ioremap support) converted the arm64 earlyprintk to use set_fixmap_io.

A side effect of this conversion is that my virtual machines have stopped booting when I pass "earlyprintk=uart8250-8bit,0x3f8" to the guest kernel. It turns out that the new earlyprintk code doesn't care at all about sub-page offsets, and just assumes that the earlyprintk device will be page-aligned. Obviously, that doesn't play well with the above example.

Further investigation shows that set_fixmap_io uses __set_fixmap instead of __set_fixmap_offset. A fix is to introduce a set_fixmap_offset_io that uses the latter, and to remove the superfluous call to fix_to_virt (which only returns the value that set_fixmap_io has already given us).

With this applied, my VMs are back in business. Tested on a Cortex-A57 platform with kvmtool as platform emulation.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
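The offset-preserving variant is a one-line macro in the generic fixmap header; a hedged sketch of its shape:

    /* Hedged sketch: unlike set_fixmap_io(), the _offset variant folds
     * the sub-page offset of 'phys' back into the returned address. */
    #define set_fixmap_offset_io(idx, phys) \
            __set_fixmap_offset(idx, phys, FIXMAP_PAGE_IO)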
-
Submitted by Dave Anderson
Fix for the arm64 kern_addr_valid() function to recognize virtual addresses in the kernel logical memory map. The function fails as written because it does not check whether the addresses in that region are mapped at the pmd level to 2MB or 512MB pages, continues the page table walk to the pte level, and issues a garbage value to pfn_valid(). Tested on 4K-page and 64K-page kernels.

Signed-off-by: Dave Anderson <anderson@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
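The missing step is a section-mapping check at the pmd level of the walk; a hedged sketch of the added logic:

    /* Hedged sketch of the pmd-level check kern_addr_valid() needed. */
    static int pmd_level_check(pmd_t *pmd)
    {
            if (pmd_none(*pmd))
                    return 0;               /* not mapped: invalid */
            if (pmd_sect(*pmd))             /* 2MB/512MB block mapping */
                    return pfn_valid(pmd_pfn(*pmd));
            return -1;                      /* keep walking to the pte level */
    }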
-
- 02 May 2014, 3 commits
-
-
Submitted by Helge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
-
Submitted by John David Anglin
There are only a couple of architectures that override _STK_LIM_MAX to a non-infinity value. This changes the stack allocation semantics in subtle ways. For example, GNU make changes its stack allocation to the hard maximum defined by _STK_LIM_MAX. As a result, threads executed by processes running under make are allocated a stack size of _STK_LIM_MAX rather than a sensible default value. This causes various thread stress tests to fail when they can't muster more than about 50 threads. The attached change implements the default behavior used by the majority of architectures.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Helge Deller <deller@gmx.de>
-
Submitted by Vineet Gupta
Commit 93ea02bb ("arch: Clean up asm/barrier.h implementations") wired generic barrier.h for hexagon, but failed to delete the existing file.

Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Compile-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 April 2014, 1 commit
-
-
Submitted by Vineet Gupta
There was a very small race window where resume to kernel mode from an exception path (or pure kernel mode, which is true for most ARC exceptions anyway) was not disabling interrupts in restore_regs, clobbering the exception regs.

Anton found the culprit call flow (after many sleepless nights):

| 1. We got a Trap from user land.
| 2. Started to service it.
| 3. While doing some stuff on user-land memory (I think it is padzero()),
|    we got a DataTlbMiss.
| 4. On return from it we are taking the "resume_kernel_mode" path.
| 5. NEED_RESCHED is not set, so we go to the "return from exception" path
|    in restore_regs.
| 6. There seems to be an IRQ happening.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: <stable@vger.kernel.org> #3.10, 3.12, 3.13, 3.14
Cc: Anton Kolesov <Anton.Kolesov@synopsys.com>
Cc: Francois Bedard <Francois.Bedard@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 28 April 2014, 13 commits
-
-
Submitted by Mark Salter
The kvm/mmu code shared by arm and arm64 uses kmalloc() to allocate a bounce page (if hypervisor init code crosses a page boundary) and hypervisor PGDs. The problem is that kmalloc() does not guarantee the proper alignment.

In the case of the bounce page, the page-sized buffer allocated may also cross a page boundary, negating the purpose and leading to a hang during kvm initialization. Likewise, the PGDs allocated may not meet the minimum alignment requirements of the underlying MMU.

This patch uses __get_free_page() to guarantee the worst-case alignment needs of the bounce page and PGDs on both arm and arm64.

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
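Pages from the page allocator are page-aligned by construction, which kmalloc() does not promise; a hedged sketch of the swap:

    /* Hedged sketch: allocate the bounce page with guaranteed alignment. */
    static void *alloc_bounce_page(void)
    {
            /* Page-aligned (or NULL), unlike kmalloc(), which only
             * guarantees ARCH_KMALLOC_MINALIGN alignment. */
            return (void *)__get_free_page(GFP_KERNEL);
    }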
-
Submitted by Thomas Gleixner
On x86 the allocation of irq descriptors may allocate interrupts which are in the range of the GSI interrupts. That's wrong, as those interrupts are hardwired and we don't have the irq domain translation like PPC. So one of these interrupts can be hooked up later to one of the devices which are hard wired to it, and the io_apic init code for that particular interrupt line happily reuses that descriptor with a completely different configuration, so hell breaks loose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs, except for a few usage sites which have not yet blown up in our face for whatever reason. But for drivers which need an irq range, like the GPIO drivers, we have no limit in place and we don't want to expose such a detail to a driver.

To cure this, introduce a function which an architecture can implement to impose a lower bound on the dynamic interrupt allocations. Implement it for x86 and set the lower bound to nr_gsi_irqs, which is the end of the hardwired interrupt space, so all dynamic allocations happen above.

That not only allows the GPIO driver to work sanely, it also protects the bogus callsites of create_irq_nr() in the hpet, uv, irq_remapping and htirq code. They need to be cleaned up as well, but that's a separate issue.

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Krogerus Heikki <heikki.krogerus@intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
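The hook follows the usual weak-default pattern; a hedged sketch (the x86 override is described in a comment rather than duplicated, since both definitions share one symbol):

    /* Hedged sketch: generic identity default, overridable per-arch.
     * x86 would provide a non-weak definition returning
     * max(from, nr_gsi_irqs) so dynamic IRQs land above the GSI space. */
    unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
    {
            return from;
    }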
-
Submitted by Bandan Das
We track shadow vmcs fields through two static lists, one for read-only and another for r/w fields. However, with the addition of new vmcs fields, not all fields may be supported on all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on unsupported hosts will result in a vmwrite error. For example, commit 36be0b9d introduced GUEST_BNDCFGS, which is not supported by all processors. Filter out host-unsupported fields before letting guests use shadow vmcs.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
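One way to implement the filter is to compact each static list in place at setup time; a hedged sketch (field_supported_by_host is an invented predicate standing in for the per-field feature checks):

    /* Hedged sketch: keep only host-supported shadow VMCS fields. */
    static void filter_shadow_fields(unsigned long *fields, int *nfields)
    {
            int i, j;

            for (i = j = 0; i < *nfields; i++) {
                    if (!field_supported_by_host(fields[i]))    /* invented */
                            continue;
                    fields[j++] = fields[i];
            }
            *nfields = j;   /* dropped fields are never vmread/vmwritten */
    }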
-
Submitted by Oren Twaig
Correct IRQ routing in case a vSMP box is detected but the Interrupt Routing Comply (IRC) value is set to "comply", which leads to incorrect IRQ routing.

Before the patch: when a vSMP box was detected and IRC was set to "comply", users (and the kernel) couldn't effectively set the destination of the IRQs. This is because the hook inside vsmp_64.c always set up all CPUs as the IRQ destination, using cpumask_setall() as the return value for the IRQ allocation mask. Later, this "overridden" mask caused the kernel to set the IRQ destination to the lowest online CPU in the mask (usually CPU0).

After the patch: when the IRC is set to "comply", users (and the kernel) can control the destination of the IRQs, as we will not be changing the default "apic->vector_allocation_domain".

Signed-off-by: Oren Twaig <oren@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Submitted by Alistair Popple
This patch fixes this section mismatch:

    WARNING: vmlinux.o(.text+0x1efc4): Section mismatch in reference from
    the function apm821xx_pciex_init_port_hw() to the function
    .init.text:ppc4xx_pciex_wait_on_sdr.isra.9()
    The function apm821xx_pciex_init_port_hw() references
    the function __init ppc4xx_pciex_wait_on_sdr.isra.9().
    This is often because apm821xx_pciex_init_port_hw lacks a __init
    annotation or the annotation of ppc4xx_pciex_wait_on_sdr.isra.9 is wrong.

apm821xx_pciex_init_port_hw is only referenced by a struct in __initdata, so it should be safe to add __init to apm821xx_pciex_init_port_hw.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Preeti U Murthy
When the guest cedes the vcpu, or the vcpu has no guest to run, it naps. Clear the runlatch bit of the vcpu before napping to indicate an idle cpu.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Preeti U Murthy
The secondary threads in the core are kept offline before launching guests in kvm on powerpc (371fefd6, "KVM: PPC: Allow book3s_hv guests to use SMT processor modes"). Hence their runlatch bits are cleared. When the secondary threads are called in to start a guest, their runlatch bits need to be set to indicate that they are busy. The primary thread already has its runlatch bit set, but there is no harm in setting this bit once again. Hence set the runlatch bit for all threads before they start the guest.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Preeti U Murthy
Up until now we have been setting the runlatch bit for a busy CPU and clearing it when a CPU enters an idle state. The runlatch bit has thus been consistent with the utilization of a CPU as long as the CPU is online. However, when a CPU is hotplugged out the runlatch bit is not cleared. It needs to be cleared to indicate an unused CPU. Hence this patch clears the runlatch bit for an offline CPU just before it enters an idle state, and sets it immediately after it exits the idle state.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
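Powerpc exposes runlatch helpers for this; the offline-idle path would bracket the nap roughly as follows (a hedged sketch with invented loop helpers, not the exact platform code):

    /* Hedged sketch of the offline-CPU idle loop described above. */
    static void offline_cpu_idle_sketch(void)
    {
            ppc64_runlatch_off();                   /* mark the CPU unused */
            while (cpu_should_stay_idle())          /* invented helper */
                    enter_low_power_state();        /* invented: nap entry */
            ppc64_runlatch_on();                    /* busy again on exit */
    }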
-
Submitted by Li Zhong
While testing memory hot-remove, I found the following deadlock:

Process #1141 is drmgr, trying to remove some memory, i.e. memory499. It holds the memory_hotplug_mutex, and blocks when trying to remove the file "online" under dir memory499, in kernfs_drain(), at:

    wait_event(root->deactivate_waitq,
               atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

Process #1120 is trying to online memory499 by:

    echo 1 > memory499/online

In .kernfs_fop_write, it uses kernfs_get_active() to increase &kn->active, thus blocking process #1141, while it is itself blocked later when trying to acquire memory_hotplug_mutex, which is held by process #1141.

The backtraces of both processes are shown below:

    [<c000000001b18600>] 0xc000000001b18600
    [<c000000000015044>] .__switch_to+0x144/0x200
    [<c000000000263ca4>] .online_pages+0x74/0x7b0
    [<c00000000055b40c>] .memory_subsys_online+0x9c/0x150
    [<c00000000053cbe8>] .device_online+0xb8/0x120
    [<c00000000053cd04>] .online_store+0xb4/0xc0
    [<c000000000538ce4>] .dev_attr_store+0x64/0xa0
    [<c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
    [<c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
    [<c000000000268450>] .vfs_write+0xe0/0x260
    [<c000000000269144>] .SyS_write+0x64/0x110
    [<c000000000009ffc>] syscall_exit+0x0/0x7c

    [<c000000001b18600>] 0xc000000001b18600
    [<c000000000015044>] .__switch_to+0x144/0x200
    [<c00000000030be14>] .__kernfs_remove+0x204/0x300
    [<c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
    [<c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
    [<c000000000539354>] .device_remove_attrs+0x54/0xc0
    [<c000000000539fd8>] .device_del+0x158/0x250
    [<c00000000053a104>] .device_unregister+0x34/0xa0
    [<c00000000055bc14>] .unregister_memory_section+0x164/0x170
    [<c00000000024ee18>] .__remove_pages+0x108/0x4c0
    [<c00000000004b590>] .arch_remove_memory+0x60/0xc0
    [<c00000000026446c>] .remove_memory+0x8c/0xe0
    [<c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
    [<c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
    [<c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
    [<c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
    [<c00000000071ddec>] .of_property_notify+0x7c/0xc0
    [<c00000000071ed3c>] .of_update_property+0x3c/0x1b0
    [<c0000000000756cc>] .ofdt_write+0x3dc/0x740
    [<c0000000002f60fc>] .proc_reg_write+0xac/0x110
    [<c000000000268450>] .vfs_write+0xe0/0x260
    [<c000000000269144>] .SyS_write+0x64/0x110
    [<c000000000009ffc>] syscall_exit+0x0/0x7c

This patch uses lock_device_hotplug() to protect remove_memory() called in pseries_remove_memblock(), which is also stated before function remove_memory():

    /*
     * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
     * and online/offline operations before this call, as required by
     * try_offline_node().
     */
    void __ref remove_memory(int nid, u64 start, u64 size)

With this lock held, the other process (#1120 above) trying to online the memory block will retry the system call when calling lock_device_hotplug_sysfs(), and finally see a "No such device" error.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
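The fix pattern is simply to take the device-hotplug lock around the removal; a hedged sketch of the serialized call site:

    /* Hedged sketch of the serialized removal path. */
    static int pseries_remove_memblock_sketch(int nid, u64 base, u64 size)
    {
            lock_device_hotplug();          /* serialize vs. online/offline */
            remove_memory(nid, base, size);
            unlock_device_hotplug();
            return 0;
    }
-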
Submitted by Anton Blanchard
module_init should return 0 or a negative errno.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Anton Blanchard
Bump the boot wrapper BOOT_COMMAND_LINE_SIZE to match the kernel.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Anton Blanchard
I've had a report that the current limit is too small for an automated network-based installer. Bump it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Submitted by Anton Blanchard
We have two definitions of COMMAND_LINE_SIZE, one for the kernel and one for the boot wrapper. I assume this is so the boot wrapper can be self-sufficient and not rely on kernel headers. Having two defines with the same name is confusing; I just updated the wrong one when trying to bump it. Make the boot wrapper define unique by calling it BOOT_COMMAND_LINE_SIZE.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-