Commits · bac65d9d87b383471d8d29128319508d71b74180 · openanolis / cloud-kernel

07 Sep, 2017 6 commits

x86/mm: Document how CR4.PCIDE restore works · 1c9fe440

Authored by Andy Lutomirski on Sep 06, 2017

While debugging a problem, I thought that using
cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
helpful.  It turns out to be counterproductive.

Add a comment documenting how this works.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86/mm: Reinitialize TLB state on hotplug and resume · 72c0098d

Authored by Andy Lutomirski on Sep 06, 2017

When Linux brings a CPU down and back up, it switches to init_mm and then
loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect
of masking off the ASID bits in CR3.

This can result in some confusion in the TLB handling code. If we
bring a CPU down and back up with any ASID other than 0, we end up
with the wrong ASID active on the CPU after resume. This could
cause our internal state to become corrupt, although major
corruption is unlikely because init_mm doesn't have any user pages.
More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
in the next context switch. The result of *that* is a failure to
resume from suspend with probability 1 - 1/6^(cpus-1).

Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 10af6235 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
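
The failure probability quoted in this commit message, 1 - 1/6^(cpus-1), can be evaluated for a few CPU counts to see how quickly it approaches 1. This is a minimal sketch that takes the formula at face value; the function name is hypothetical and is not part of the kernel.

```c
/*
 * Evaluate 1 - 1/6^(cpus-1), the resume failure probability quoted in the
 * commit message for CONFIG_DEBUG_VM=y kernels.
 * demo_resume_failure_probability() is a hypothetical helper for
 * illustration only.
 */
double demo_resume_failure_probability(unsigned int cpus)
{
    double p_ok = 1.0;

    /* Each CPU beyond the first contributes a factor of 1/6. */
    for (unsigned int i = 1; i < cpus; i++)
        p_ok /= 6.0;

    return 1.0 - p_ok;
}
```

With two CPUs the formula gives 5/6, and with four CPUs it gives 215/216, so on a typical multi-core machine the assertion fires on almost every resume.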

mm,fork: introduce MADV_WIPEONFORK · d2cd9ede

Authored by Rik van Riel on Sep 06, 2017

Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
in the child process after fork.  This differs from MADV_DONTFORK in one
important way.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes.  The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

MADV_WIPEONFORK only works on private, anonymous VMAs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
 - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
   check, which is too slow without a PID cache)
 - PKCS#11 API reinitialization check (mandated by specification)
 - glibc's upcoming PRNG (reseed after fork)
 - OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-initialized PRNG in
every child process are pretty obvious.  However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

[akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCártaigh <colm@allcosts.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86,mpx: make mpx depend on x86-64 to free up VMA flag · df3735c5

Authored by Rik van Riel on Sep 06, 2017

Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes.  The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
 - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
   check, which is too slow without a PID cache)
 - PKCS#11 API reinitialization check (mandated by specification)
 - glibc's upcoming PRNG (reseed after fork)
 - OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-initialized PRNG in
every child process are pretty obvious.  However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

This patch (of 2):

MPX only seems to be available on 64 bit CPUs, starting with Skylake and
Goldmont.  Move VM_MPX into the 64 bit only portion of vma->vm_flags, in
order to free up a VMA flag.

Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Colm MacCártaigh <colm@allcosts.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: arch: consolidate mmap hugetlb size encodings · aafd4562

Authored by Mike Kravetz on Sep 06, 2017

A non-default huge page size can be encoded in the flags argument of the
mmap system call.  The definitions for these encodings are in arch
specific header files.  However, all architectures use the same values.

Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h).  Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.

Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
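
To illustrate how a huge page size can be carried in the mmap flags argument, the sketch below encodes log2 of the page size in a 6-bit field starting at bit 26. The DEMO_ names and helper functions are hypothetical; the shift and mask values are assumptions that mirror the kernel's generic encoding and should be checked against the installed uapi headers.

```c
#include <stdint.h>

/* Assumed layout: log2(page size) in a 6-bit field starting at bit 26. */
#define DEMO_HUGE_SHIFT 26
#define DEMO_HUGE_MASK  0x3fu

/*
 * Encode a power-of-two huge page size into mmap-style flag bits.
 * Returns 0 if the size is zero or not a power of two.
 */
uint32_t demo_encode_huge_size(uint64_t size)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return 0;

    uint32_t order = 0;
    while ((size >> order) != 1)
        order++;

    return (order & DEMO_HUGE_MASK) << DEMO_HUGE_SHIFT;
}

/* Decode the field again; returns 0 when no explicit size is encoded. */
uint64_t demo_decode_huge_size(uint32_t flags)
{
    uint32_t order = (flags >> DEMO_HUGE_SHIFT) & DEMO_HUGE_MASK;

    return order ? (UINT64_C(1) << order) : 0;
}
```

Under these assumptions a 2 MiB page encodes as 21 << 26 and a 1 GiB page as 30 << 26, which is why one set of definitions can serve every architecture.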

metag/numa: remove the unused parent_node() macro · f0cd3406

Authored by Dou Liyang on Sep 06, 2017

Commit a7be6e5a ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in METAG architecture is unnecessary.

Remove it for cleanup.

Link: http://lkml.kernel.org/r/1501076076-1974-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Sep, 2017 10 commits

alpha: math-emu: Fix modular build · d9e3cb2f

Authored by Ben Hutchings on Jul 19, 2017

Commit 00fc0e0d ("alpha: move exports to actual definitions") also
removed the exports of the math emulator hooks, which are defined in C
code. In case anyone cares about the option of CONFIG_MATHEMU=m, add
exports next to those definitions. Also add a MODULE_LICENSE.

Fixes: 00fc0e0d ("alpha: move exports to actual definitions")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: Restore symbol versions for symbols exported from assembly · 873f9b5b

Authored by Ben Hutchings on Jul 19, 2017

Add <asm/asm-prototypes.h> so that genksyms knows the types of
these symbols and can generate CRCs for them.

Fixes: 00fc0e0d ("alpha: move exports to actual definitions")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: defconfig: Cleanup from old Kconfig options · 81f166c2

Authored by Krzysztof Kozlowski on Jul 20, 2017

Remove old, dead Kconfig options (in order appearing in this commit):
 - IP_NF_QUEUE: commit 3dd6664f ("netfilter: remove unused "config
   IP_NF_QUEUE"");
 - AUTOFS_FS: commit 561c5cf9 ("staging: Remove autofs3").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: use kobj_to_dev() · 8c9b839c

Authored by Geliang Tang on Jan 06, 2016

Use kobj_to_dev() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
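
For readers unfamiliar with the idiom, a helper like kobj_to_dev() recovers the enclosing structure from a pointer to an embedded member. The user-space sketch below uses mock types; every demo_ name is hypothetical and only mirrors the general container_of pattern, not the kernel's actual definitions.

```c
#include <stddef.h>

/* Mock stand-ins for the kernel's kobject and device structures. */
struct demo_kobject {
    const char *name;
};

struct demo_device {
    int id;
    struct demo_kobject kobj;  /* embedded member */
};

/* Step back from a member pointer to the structure that contains it. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Named helper, so that callers do not open-code the pointer arithmetic. */
static inline struct demo_device *demo_kobj_to_dev(struct demo_kobject *kobj)
{
    return demo_container_of(kobj, struct demo_device, kobj);
}
```

Wrapping the arithmetic in one named helper keeps call sites short and makes the intent obvious.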

alpha: squash lines for immediate return · 203308a5

Authored by Masahiro Yamada on Sep 11, 2016

Remove unneeded variables and assignments.

While we are here, fix the coding style of SMC37c669_read_config():
  - replace whitespaces at the start of lines with tabs
  - remove unneeded whitespaces around parentheses

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: kernel: Use vma_pages() · 236d62b0

Authored by Shyam Saini on Oct 10, 2016

Replace the explicit computation of the VMA page count with a call to
vma_pages().

Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
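
The computation that a helper such as vma_pages() wraps is the length of the mapping divided by the page size. The sketch below models it with a mock structure; the demo_ names are hypothetical, and the page shift of 13 is an assumption reflecting alpha's 8 KiB pages.

```c
/* Assumed page shift: alpha uses 8 KiB pages, so 2^13 bytes per page. */
#define DEMO_PAGE_SHIFT 13

/* Mock of the two fields of a VMA that matter here. */
struct demo_vma {
    unsigned long vm_start;  /* first byte of the mapping */
    unsigned long vm_end;    /* first byte after the mapping */
};

/* Number of pages spanned by the mapping. */
static inline unsigned long demo_vma_pages(const struct demo_vma *vma)
{
    return (vma->vm_end - vma->vm_start) >> DEMO_PAGE_SHIFT;
}
```

A mapping from 0x10000 to 0x18000 spans 0x8000 bytes, which is four pages under the 8 KiB assumption.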

alpha: silence a buffer overflow warning · 03e1f044

Authored by Dan Carpenter on Nov 14, 2016

We check that "member" is in bounds for the first line, but we also use
it on the next line without checking, which is a mistake.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: marvel: make use of raw_spinlock variants · b5a3a128

Authored by Julia Cartwright on Mar 21, 2017

The alpha/marvel code currently implements an irq_chip for handling
interrupts; due to how irq_chip handling is done, it's necessary for the
irq_chip methods to be invoked from hardirq context, even on a
real-time kernel. Because the spinlock_t type becomes a "sleeping"
spinlock w/ RT kernels, it is not suitable to be used with irq_chips.

A quick audit of the operations under the lock reveals that they do only
minimal, bounded work, and are therefore safe to do under a raw spinlock.

Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: cleanup: remove __NR_sys_epoll_*, leave __NR_epoll_* · beb1057f

Authored by Sergei Trofimovich on Apr 08, 2017

__NR_sys_epoll_create and friends are alpha-specific
while __NR_epoll_create is a generic name for other
arches.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>

alpha: use generic fb.h · 1c0234aa

Authored by Tobias Klauser on May 17, 2017

The arch uses a verbatim copy of the asm-generic version and does not
add any implementations of its own to the header, so use asm-generic/fb.h
instead of duplicating code.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Matt Turner <mattst88@gmail.com>

04 Sep, 2017 3 commits

net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f

Authored by Varsha Rao on Aug 30, 2017

This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
are no longer required. Replace _ASSERT() macros with WARN_ON().

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

powerpc/xive: Fix section __init warning · 265601f0

Authored by Cédric Le Goater on Sep 04, 2017

xive_spapr_init() is called from a __init routine and calls __init
routines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Fix kernel crash in emulation of vector loads and stores · 4716e488

Authored by Paul Mackerras on Sep 04, 2017

Commit 350779a2 ("powerpc: Handle most loads and stores in
instruction emulation code", 2017-08-30) changed the register usage
in get_vr and put_vr with the aim of leaving the register number in
r3 untouched on return.  Unfortunately, r6 was not a good choice, as
the callers as of 350779a2 store a MSR value in r6.  Then, in
commit c22435a5 ("powerpc: Emulate FP/vector/VSX loads/stores
correctly when regs not live", 2017-08-30), the saving and restoring
of the MSR got moved into get_vr and put_vr.  Either way, the effect
is that we put a value in MSR that only has the 0x3f8 bits non-zero,
meaning that we are switching to 32-bit mode.  That leads to a crash
like this:

Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x0007bea0
Oops: Kernel access of bad area, sig: 11 [#12]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: vmx_crypto binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
CPU: 6 PID: 32659 Comm: trashy_testcase Tainted: G      D         4.13.0-rc2-00313-gf3026f57e6ed-dirty #23
task: c000000f1bb9e780 task.stack: c000000f1ba98000
NIP:  000000000007bea0 LR: c00000000007b054 CTR: c00000000007be70
REGS: c000000f1ba9b960 TRAP: 0400   Tainted: G      D          (4.13.0-rc2-00313-gf3026f57e6ed-dirty)
MSR:  10000000400010a1 <HV,ME,IR,LE>  CR: 48000228  XER: 00000000
CFAR: c00000000007be74 SOFTE: 1
GPR00: c00000000007b054 c000000f1ba9bbe0 c000000000e6e000 000000000000001d
GPR04: c000000f1ba9bc00 c00000000007be70 00000000000000e8 9000000002009033
GPR08: 0000000002000000 100000000282f033 000000000b0a0900 0000000000001009
GPR12: 0000000000000000 c00000000fd42100 0706050303020100 a5a5a5a5a5a5a5a5
GPR16: 2e2e2e2e2e2de70c 2e2e2e2e2e2e2e2d 0000000000ff00ff 0606040202020000
GPR20: 000000000000005b ffffffffffffffff 0000000003020100 0000000000000000
GPR24: c000000f1ab90020 c000000f1ba9bc00 0000000000000001 0000000000000001
GPR28: c000000f1ba9bc90 c000000f1ba9bea0 000000000b0a0908 0000000000000001
NIP [000000000007bea0] 0x7bea0
LR [c00000000007b054] emulate_loadstore+0x1044/0x1280
Call Trace:
[c000000f1ba9bbe0] [c000000000076b80] analyse_instr+0x60/0x34f0 (unreliable)
[c000000f1ba9bc70] [c00000000007b7ec] emulate_step+0x23c/0x544
[c000000f1ba9bce0] [c000000000053424] arch_uprobe_skip_sstep+0x24/0x40
[c000000f1ba9bd00] [c00000000024b2f8] uprobe_notify_resume+0x598/0xba0
[c000000f1ba9be00] [c00000000001c284] do_notify_resume+0xd4/0xf0
[c000000f1ba9be30] [c00000000000bd44] ret_from_except_lite+0x70/0x74
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace a7ae7a7f3e0256b5 ]---

To fix this, we just revert to using r3 as before, since the callers
don't rely on r3 being left unmodified.

Fortunately, this can't be triggered by a misaligned load or store,
because vector loads and stores truncate misaligned addresses rather
than taking an alignment interrupt.  It can be triggered using
uprobes.

Fixes: 350779a2 ("powerpc: Handle most loads and stores in instruction emulation code")
Reported-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

02 Sep, 2017 9 commits

powerpc/xive: improve debugging macros · 5f121292

Authored by Cédric Le Goater on Aug 30, 2017

Having the CPU identifier in the debug logs is helpful when tracking
issues. Also add some more logging and fix a compile issue in
xive_do_source_eoi().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: add XIVE Exploitation Mode to CAS · ac5e5a54

Authored by Cédric Le Goater on Aug 30, 2017

On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's inform the hypervisor what we can
do.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
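
The two-bit field described in this commit message can be decoded with a small helper. This sketch makes two assumptions that the message does not spell out: that "bits 0-1" use IBM (MSB-0) numbering, so they are the two most significant bits of byte 23, and that the buffer passed in is the option vector's byte array indexed from 0. All demo_ names are hypothetical.

```c
#include <stdint.h>

/* Values of the two-bit XIVE mode field listed in the commit message. */
enum demo_xive_mode {
    DEMO_XIVE_LEGACY_ONLY  = 0,  /* 0b00 */
    DEMO_XIVE_EXPLOIT_ONLY = 1,  /* 0b01 */
    DEMO_XIVE_EITHER       = 2,  /* 0b10, platform property only */
    DEMO_XIVE_RESERVED     = 3   /* 0b11, not listed in the message */
};

/*
 * Extract the mode from byte 23 of an option vector 5 buffer.
 * Assumes MSB-0 bit numbering, i.e. bits 0-1 are the top two bits.
 * The caller must supply at least 24 bytes.
 */
static inline enum demo_xive_mode demo_xive_mode_from_vec5(const uint8_t *vec5)
{
    return (enum demo_xive_mode)((vec5[23] >> 6) & 0x3);
}
```

Under these assumptions a byte value of 0x40 decodes to exploitation mode only and 0x80 to legacy or exploitation mode.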

powerpc/xive: introduce H_INT_ESB hcall · bed81ee1

Authored by Cédric Le Goater on Aug 30, 2017

The H_INT_ESB hcall() is used to issue a load or store to the ESB page
instead of using the MMIO pages. This can be used as a workaround on
some HW issues. The OS knows that this hcall should be used on an
interrupt source when the ESB hcall flag is set to 1 in the hcall
H_INT_GET_SOURCE_INFO.

To maintain the frontier between the xive frontend and backend, we
introduce a new xive operation 'esb_rw' to be used in the routines
doing memory accesses on the ESBs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: add the HW IRQ number under xive_irq_data · c58a14a9

Authored by Cédric Le Goater on Aug 30, 2017

It will be required later by the H_INT_ESB hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: introduce xive_esb_write() · 99f12257

Authored by Cédric Le Goater on Aug 30, 2017

Some sources support MMIO stores on the ESB page to perform EOI. Let's
introduce a specific routine for this case even if this should be the
only use of it.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: rename xive_poke_esb() in xive_esb_read() · 59fc2724

Authored by Cédric Le Goater on Aug 30, 2017

xive_poke_esb() is performing a load/read so it is better named as
xive_esb_read() as we will need to introduce a xive_esb_write()
routine. Also use the XIVE_ESB_LOAD_EOI offset when EOI'ing LSI
interrupts.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: guest exploitation of the XIVE interrupt controller · eac1e731

Authored by Cédric Le Goater on Aug 30, 2017

This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.

Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.

Instead of OPAL calls, a set of hypervisor calls is used to configure
the interrupt sources and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" are assigned to a source.

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all
   notifications have reached their queue.

As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   management areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt numbers ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also :

 - "/ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use.

Tested with a QEMU XIVE model for pseries and with the Power hypervisor.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: introduce a common routine xive_queue_page_alloc() · 994ea2f4

Authored by Cédric Le Goater on Aug 30, 2017

This routine will be used in the spapr backend. Also introduce a short
xive_alloc_order() helper.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/sstep: Avoid used uninitialized error · 3b79b261

Authored by Michael Ellerman on Sep 02, 2017

Older compilers think val may be used uninitialized:

arch/powerpc/lib/sstep.c: In function 'emulate_loadstore':
arch/powerpc/lib/sstep.c:2758:23: error: 'val' may be used uninitialized in this function

We know better, but initialise val to 0 to avoid breaking the build.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

01 Sep, 2017 12 commits

x86/idt: Fix the X86_TRAP_BP gate · c6ef8942

Authored by Ingo Molnar on Sep 01, 2017

Andrei Vagin reported a CRIU regression and bisected it back to:

  90f6225f ("x86/idt: Move IST stack based traps to table init")

This table init conversion loses the system-gate property of X86_TRAP_BP
and erroneously moves it from DPL3 to DPL0.

Fix it.

Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dvlasenk@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: peterz@infradead.org
Cc: brgerst@gmail.com
Cc: rostedt@goodmis.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: jpoimboe@redhat.com
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: torvalds@linux-foundation.org
Cc: tip-bot for Jacob Shin <tipbot@zytor.com>
Link: http://lkml.kernel.org/r/20170901082630.xvyi5bwk6etmppqc@gmail.com

axonram: Return directly after a failed kzalloc() in axon_ram_probe() · fdbb9457

Authored by Markus Elfring on Aug 03, 2017

* Return directly after a call of the function "kzalloc" failed
  at the beginning.

* Delete a repeated check for the local variable "bank"
  which became unnecessary with this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

axonram: Improve a size determination in axon_ram_probe() · a1bddf39

Authored by Markus Elfring on Aug 03, 2017

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
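
The sizeof idiom this commit refers to can be shown in user space. The sketch below uses calloc() as a stand-in for kzalloc(); the structure and function names are hypothetical and are not taken from the axonram driver.

```c
#include <stdlib.h>

/* Hypothetical structure standing in for the driver's per-bank data. */
struct demo_bank {
    unsigned long size;
    unsigned int irq;
};

/*
 * Allocate one zeroed demo_bank.
 * sizeof(*bank) is derived from the pointer's own type, so the size stays
 * correct even if the declaration of bank is later changed to another type.
 */
struct demo_bank *demo_bank_alloc(void)
{
    struct demo_bank *bank;

    bank = calloc(1, sizeof(*bank));  /* rather than sizeof(struct demo_bank) */
    return bank;                      /* NULL on allocation failure */
}
```

Writing the type name only once removes the chance that the allocation size and the pointer type drift apart.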

axonram: Delete an error message for a failed memory allocation in axon_ram_probe() · c86a9397

Authored by Markus Elfring on Aug 03, 2017

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv/npu: Move tlb flush before launching ATSD · bab9f954

Authored by Alistair Popple on Aug 11, 2017

The nest MMU tlb flush needs to happen before the GPU translation
shootdown is launched to avoid the GPU refilling its tlb with stale
nmmu translations prior to the nmmu flush completing.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/iommu: Use permission-specific DEVICE_ATTR variants · 8a7aef2c

Authored by Julia Lawall on Oct 29, 2016

Use DEVICE_ATTR_RW for read-write attributes.  This simplifies the
source code, improves readability, and reduces the chance of
inconsistencies.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/eeh: Delete an error out of memory message at init time · 6ab41161

Authored by Markus Elfring on Aug 04, 2017

Omit an extra message for a memory allocation failure in
eeh_dev_init().

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Do not drop the message that can happen at runtime and lead to
 an event not being handled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/mm: Use seq_putc() in two functions · aae85e3c

Authored by Markus Elfring on May 07, 2017

Two single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

crypto/nx: Add P9 NX specific error codes for 842 engine · 146e9f1b

Authored by Haren Myneni on Aug 31, 2017

This patch adds changes for checking P9 specific 842 engine
error codes. These errors are reported in coprocessor status
block (CSB) for failures.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/32: remove a NOP from memset() · ad1b0122

Authored by Christophe Leroy on Aug 23, 2017

memset() is patched after initialisation to activate the
optimised part which uses cache instructions.

Today we have a 'b 2f' to skip the optimised patch, which then gets
replaced by a NOP, implying a useless cycle consumption.
As we have a 'bne 2f' just before, we could use that instruction
for the live patching, hence removing the need to have a
dedicated 'b 2f' to be replaced by a NOP.

This patch replaces the 'bne 2f' with a 'b 2f'. During init, that
'b 2f' is then replaced by 'bne 2f'.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/32: optimise memset() · 7bf6057b

Authored by Christophe Leroy on Aug 23, 2017

There is no need to extend the set value to an int when the length
is lower than 4 as in that case we only do byte stores.
We can therefore immediately branch to the part handling it.
By separating it from the normal case, we are able to eliminate
a few actions on the destination pointer.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
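
The idea behind this optimisation can be modelled in portable C, even though the actual change is in powerpc assembly. The sketch below is a hypothetical illustration and not the kernel routine: for lengths below 4 it goes straight to byte stores and never widens the fill value, while longer fills replicate the byte into a 32-bit word first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the short-length fast path; not the kernel's memset. */
void *demo_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Short case: only byte stores are needed, so skip the widening. */
    if (n < 4) {
        while (n--)
            *p++ = byte;
        return dst;
    }

    /* Normal case: replicate the byte into all four lanes of a word. */
    uint32_t word = byte * 0x01010101u;

    /* Byte stores until the destination is 4-byte aligned (at most 3). */
    while (((uintptr_t)p & 3) != 0) {
        *p++ = byte;
        n--;
    }

    /* Word stores for the bulk of the buffer. */
    while (n >= 4) {
        memcpy(p, &word, sizeof(word));
        p += 4;
        n -= 4;
    }

    /* Trailing bytes. */
    while (n--)
        *p++ = byte;

    return dst;
}
```

Separating the short case means the common path no longer has to account for buffers too small to hold a single word.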

powerpc: fix location of two EXPORT_SYMBOL · c0622167

Authored by Christophe Leroy on Aug 23, 2017

Commit 9445aa1a ("ppc: move exports to definitions")
added EXPORT_SYMBOL() for memset() and flush_hash_pages() in
the middle of the functions.

This patch moves them to the end of the two functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

openanolis / cloud-kernel synced successfully about 1 year ago
