Commits · 37f0e8fe6b10ee2ab52576caa721ee1282de74a6 · openanolis / cloud-kernel

09 Jan, 2017 14 commits

kvm: x86: mmu: Do not use bit 63 for tracking special SPTEs · 37f0e8fe

Authored by Junaid Shahid on Dec 06, 2016

MMIO SPTEs currently set both bits 62 and 63 to distinguish them as special
PTEs. However, bit 63 is used as the SVE bit in Intel EPT PTEs. The SVE bit
is ignored for misconfigured PTEs but not necessarily for not-Present PTEs.
Since MMIO SPTEs use an EPT misconfiguration, using bit 63 for them is
acceptable. However, the upcoming fast access tracking feature adds another
type of special tracking PTE, which uses not-Present PTEs and hence should
not set bit 63.

In order to use common bits to distinguish both types of special PTEs, we
now use only bit 62 as the special bit.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

37f0e8fe
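The bit layout change described in the commit above can be sketched in user-space C. This is only an illustration: the macro and function names are assumptions made for the example, not the kernel's definitions, and only the bit positions (62 and 63) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only bit positions 62 and 63 come from the commit text. */
#define EPT_SVE_BIT       (1ULL << 63)  /* SVE bit in Intel EPT PTEs */
#define OLD_SPECIAL_MASK  (3ULL << 62)  /* before: bits 62 and 63 */
#define NEW_SPECIAL_MASK  (1ULL << 62)  /* after: bit 62 only */

/* Returns true if the SPTE carries the new special marker. */
static bool spte_is_special(uint64_t spte)
{
	return (spte & NEW_SPECIAL_MASK) != 0;
}

/* Marks an SPTE as special without touching bit 63. */
static uint64_t spte_mark_special(uint64_t spte)
{
	assert((spte & NEW_SPECIAL_MASK) == 0); /* caller passes an unmarked SPTE */
	return spte | NEW_SPECIAL_MASK;
}
```

With the old mask, every special SPTE would also have set the SVE bit; with the new mask, a not-Present tracking PTE can be marked special while leaving bit 63 clear.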

kvm: x86: mmu: Introduce a no-tracking version of mmu_spte_update · f39a058d

Authored by Junaid Shahid on Dec 06, 2016

mmu_spte_update() tracks changes in the accessed/dirty state of
the SPTE being updated and calls kvm_set_pfn_accessed/dirty
appropriately. However, in some cases (e.g. when aging the SPTE),
this shouldn't be done. mmu_spte_update_no_track() is introduced
for use in such cases.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f39a058d

kvm: x86: mmu: Refactor accessed/dirty checks in mmu_spte_update/clear · 83ef6c81

Authored by Junaid Shahid on Dec 06, 2016

This simplifies mmu_spte_update() a little bit.
The checks for clearing of accessed and dirty bits are refactored into
separate functions, which are used inside both mmu_spte_update() and
mmu_spte_clear_track_bits(), as well as kvm_test_age_rmapp(). The new
helper functions handle both the case when A/D bits are supported in
hardware and the case when they are not.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

83ef6c81

kvm: x86: mmu: Fast Page Fault path retries · 97dceba2

Authored by Junaid Shahid on Dec 06, 2016

This change adds retries into the Fast Page Fault path. Without the
retries, the code still works, but if a retry does end up being needed,
then it will result in a second page fault for the same memory access,
which will cause much more overhead compared to just retrying within the
original fault.

This would be especially useful with the upcoming fast access tracking
change, as that would make it more likely for retries to be needed
(e.g. due to read and write faults happening on different CPUs at
the same time).
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

97dceba2
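The retry idea in the commit above can be illustrated with a small compare-and-swap loop. This is a user-space sketch using C11 atomics, not the kernel's fast page fault code; the bit position, the retry bound and the function name are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WRITABLE_BIT  (1ULL << 1)  /* hypothetical bit position */
#define MAX_RETRIES   4            /* hypothetical bound */

/*
 * Try to make *sptep writable without taking a lock. If another CPU
 * changes the entry between our read and our compare-and-swap, re-read
 * and try again instead of giving up and taking a second page fault.
 */
static bool fast_fix_write_fault(_Atomic uint64_t *sptep)
{
	assert(sptep != NULL);

	for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
		uint64_t old = atomic_load(sptep);

		if (old & WRITABLE_BIT)
			return true;  /* someone else already fixed it */

		uint64_t new = old | WRITABLE_BIT;

		/* Succeeds only if the entry still equals the value we read. */
		if (atomic_compare_exchange_strong(sptep, &old, new))
			return true;
	}
	return false;  /* caller falls back to the slow path */
}
```

The bounded loop keeps the common case cheap while still guaranteeing that the function returns when the entry keeps changing underneath it.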

kvm: x86: mmu: Rename spte_is_locklessly_modifiable() · ea4114bc

Authored by Junaid Shahid on Dec 06, 2016

This change renames spte_is_locklessly_modifiable() to
spte_can_locklessly_be_made_writable() to distinguish it from other
forms of lockless modifications. The full set of lockless modifications
is covered by spte_has_volatile_bits().
Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ea4114bc

kvm: x86: mmu: Use symbolic constants for EPT Violation Exit Qualifications · 27959a44

Authored by Junaid Shahid on Dec 06, 2016

This change adds some symbolic constants for VM Exit Qualifications
related to EPT Violations and updates handle_ept_violation() to use
these constants instead of hard-coded numbers.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

27959a44
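As a rough illustration of what replacing hard-coded numbers with symbolic constants looks like, the sketch below decodes the access-type bits of an EPT violation exit qualification. The macro, struct and function names are illustrative assumptions and may not match what the commit actually added; the bit positions (0 = data read, 1 = data write, 2 = instruction fetch) follow the exit qualification layout in the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit positions follow the Intel SDM layout. */
#define EPT_VIOLATION_READ   (1ULL << 0)  /* access was a data read */
#define EPT_VIOLATION_WRITE  (1ULL << 1)  /* access was a data write */
#define EPT_VIOLATION_INSTR  (1ULL << 2)  /* access was an instruction fetch */

struct ept_fault {
	bool read;
	bool write;
	bool fetch;
};

/* Decodes the access type without any magic numbers at the call site. */
static struct ept_fault decode_exit_qualification(uint64_t qual)
{
	struct ept_fault f = {
		.read  = (qual & EPT_VIOLATION_READ)  != 0,
		.write = (qual & EPT_VIOLATION_WRITE) != 0,
		.fetch = (qual & EPT_VIOLATION_INSTR) != 0,
	};
	return f;
}
```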

kvm: x86: reduce collisions in mmu_page_hash · 114df303

Authored by David Matlack on Dec 19, 2016

When using two-dimensional paging, the mmu_page_hash (which provides
lookups for existing kvm_mmu_page structs) becomes imbalanced, with
too many collisions in buckets 0 and 512. This has been seen to cause
mmu_lock to be held for multiple milliseconds in kvm_mmu_get_page on
VMs with a large amount of RAM mapped with 4K pages.

The current hash function uses the lower 10 bits of gfn to index into
mmu_page_hash. When doing shadow paging, gfn is the address of the
guest page table being shadowed. These tables are 4K-aligned, which
makes the low bits of gfn a good hash. However, with two-dimensional
paging, no guest page tables are being shadowed, so gfn is the base
address that is mapped by the table. Thus page tables (level=1) have
a 2MB aligned gfn, page directories (level=2) have a 1GB aligned gfn,
etc. This means hashes will only differ in their 10th bit.

hash_64() provides a better hash. For example, on a VM with ~200G
(99458 direct=1 kvm_mmu_page structs):

hash            max_mmu_page_hash_collisions
--------------------------------------------
low 10 bits     49847
hash_64         105
perfect         97

While we're changing the hash, increase the table size by 4x to better
support large VMs (further reduces number of collisions in 200G VM to
29).

Note that hash_64() does not provide a good distribution prior to commit
ef703f49 ("Eliminate bad hash multipliers from hash_32() and
hash_64()").
Signed-off-by: David Matlack <dmatlack@google.com>
Change-Id: I5aa6b13c834722813c6cca46b8b1ed6f53368ade
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

114df303
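The collision pattern described in the mmu_page_hash commit above can be reproduced with a small user-space model. This is a sketch under stated assumptions: both schemes use a 1024-bucket table here for a like-for-like comparison, the multiplier is assumed to match the one hash_64() uses after the hash multiplier fix, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS        10                     /* assumed: 1024 buckets */
#define NUM_BUCKETS      (1u << HASH_BITS)
#define GOLDEN_RATIO_64  0x61C8864680B583EBULL  /* assumed hash_64() multiplier */

/* Old scheme: index by the low bits of gfn. */
static unsigned hash_low_bits(uint64_t gfn)
{
	return (unsigned)(gfn & (NUM_BUCKETS - 1));
}

/* Multiplicative hash in the style of hash_64(). */
static unsigned hash_mult(uint64_t gfn)
{
	return (unsigned)((gfn * GOLDEN_RATIO_64) >> (64 - HASH_BITS));
}

/* Largest bucket load when hashing n gfns spaced 512 pages (2MB) apart. */
static size_t max_bucket_load(unsigned (*hash)(uint64_t), size_t n)
{
	size_t buckets[NUM_BUCKETS] = {0};
	size_t max = 0;

	assert(hash != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t load = ++buckets[hash((uint64_t)i * 512)];

		if (load > max)
			max = load;
	}
	return max;
}
```

With 2MB-aligned gfns, the low 9 bits are always zero, so `hash_low_bits()` can only return 0 or 512 and each of those two buckets receives half of the entries. The multiplicative hash spreads the same keys over many more buckets.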

kvm: x86: export maximum number of mmu_page_hash collisions · f3414bc7

Authored by David Matlack on Dec 20, 2016

Report the maximum number of mmu_page_hash collisions as a per-VM stat.
This will make it easy to identify problems with the mmu_page_hash in
the future.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f3414bc7

KVM: x86: simplify conditions with split/kernel irqchip · 826da321

Authored by Radim Krčmář on Dec 16, 2016

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

826da321

KVM: x86: prevent setup of invalid routes · 8231f50d

Authored by Radim Krčmář on Dec 16, 2016

The check in kvm_set_pic_irq() and kvm_set_ioapic_irq() was just a
temporary measure until the code improved enough for us to do this.

This changes APIC in a case when KVM_SET_GSI_ROUTING is called to set up pic
and ioapic routes before KVM_CREATE_IRQCHIP. Those rules would get overwritten
by KVM_CREATE_IRQCHIP at best, so it is pointless to allow it. Userspaces
hopefully noticed that things don't work if they do that and don't do that.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8231f50d

KVM: x86: refactor pic setup in kvm_set_routing_entry · e5dc4877

Authored by Radim Krčmář on Dec 16, 2016

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e5dc4877

KVM: x86: make pic setup code look like ioapic setup · 09941366

Authored by Radim Krčmář on Dec 16, 2016

We don't treat kvm->arch.vpic specially anymore, so the setup can look
like ioapic.  This gets a bit more information out of return values.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

09941366

KVM: x86: decouple irqchip_in_kernel() and pic_irqchip() · 49776faf

Authored by Radim Krčmář on Dec 16, 2016

irqchip_in_kernel() tried to save a bit by reusing pic_irqchip(), but it
just complicated the code.
Add a separate state for the irqchip mode.
Reviewed-by: David Hildenbrand <david@redhat.com>
[Used Paolo's version of condition in irqchip_in_kernel().]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

49776faf

KVM: x86: don't allow kernel irqchip with split irqchip · 35e6eaa3

Authored by Radim Krčmář on Dec 16, 2016

Split irqchip cannot be created after creating the kernel irqchip, but
we forgot to restrict the other way.  This is an API change.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

35e6eaa3

05 Jan, 2017 5 commits

KVM: VMX: remove duplicated declaration · 69130ea1

Authored by Jan Dakinevich on Dec 23, 2016

The declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occurs twice in the code.
It was probably introduced by an unsuccessful merge.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

69130ea1

KVM: MIPS: Flush KVM entry code from icache globally · 32eb12a6

Authored by James Hogan on Jan 03, 2017

Flush the KVM entry code from the icache on all CPUs, not just the one
that built the entry code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.16.x-
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

32eb12a6

KVM: MIPS: Don't clobber CP0_Status.UX · 4c881451

Authored by James Hogan on Jan 03, 2017

On 64-bit kernels, MIPS KVM will clear CP0_Status.UX to prevent the
guest (running in user mode) from accessing the 64-bit memory segments.
However the previous value of CP0_Status.UX is never restored when
exiting from the guest.

If the user process uses 64-bit addressing (the n64 ABI) this can result
in address error exceptions from the kernel if it needs to deliver a
signal before returning to user mode, as the kernel will need to write a
sigframe to high user addresses on the user stack which are disallowed
by CP0_Status.UX=0.

This is fixed by explicitly setting SX and UX again when exiting from
the guest, and explicitly clearing those bits when returning to the
guest. Having the SX and UX bits set when handling guest exits (rather
than only when exiting to userland) will be helpful when we support VZ,
since we shouldn't need to directly read or write guest memory, so it
will be valid for cache management IPIs to access host user addresses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 4.8.x-
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

4c881451

arm64: restore get_current() optimisation · 9d84fb27

Authored by Mark Rutland on Jan 03, 2017

Commit c02433dd ("arm64: split thread_info from task stack")
inverted the relationship between get_current() and
current_thread_info(), with sp_el0 now holding the current task_struct
rather than the current thread_info. The new implementation of
get_current() prevents the compiler from being able to optimize repeated
calls to either, resulting in a noticeable penalty in some
microbenchmarks.

This patch restores the previous optimisation by implementing
get_current() in the same way as our old current_thread_info(), using a
non-volatile asm statement.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

9d84fb27

arm64: mm: fix show_pte KERN_CONT fallout · 6ef4fb38

Authored by Mark Rutland on Jan 03, 2017

Recent changes made KERN_CONT mandatory for continued lines. In the
absence of KERN_CONT, a newline may be implicitly inserted by the core
printk code.

In show_pte, we (erroneously) use printk without KERN_CONT for continued
prints, resulting in output being split across a number of lines, and
not matching the intended output, e.g.

[ff000000000000] *pgd=00000009f511b003
, *pud=00000009f4a80003
, *pmd=0000000000000000

Fix this by using pr_cont() for all the continuations.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

6ef4fb38

04 Jan, 2017 3 commits

ARM64: defconfig: enable DRM_MESON as module · fcdaf1a2

Authored by Kevin Hilman on Dec 08, 2016

Signed-off-by: Kevin Hilman <khilman@baylibre.com>

fcdaf1a2

ARM64: dts: meson-gx: Add Graphic Controller nodes · fafdbdf7

Authored by Neil Armstrong on Dec 01, 2016

Add Video Processing Unit and CVBS Output nodes, and enable CVBS on selected
boards.
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>

fafdbdf7

ARM64: dts: meson-gxl: fix GPIO include · 1cf3df8a

Authored by Kevin Hilman on Nov 07, 2016

Signed-off-by: Kevin Hilman <khilman@baylibre.com>

1cf3df8a

03 Jan, 2017 3 commits

ARM: dts: imx6: Disable "weim" node in the dtsi files · 116dad7d

Authored by Fabio Estevam on Dec 30, 2016

Commit 1be81ea5 ("ARM: dts: imx6: Add imx-weim parameters to
dtsi's") causes the following probe error when the weim node is not
present on the board dts (such as imx6q-sabresd):

imx-weim 21b8000.weim: Invalid 'ranges' configuration
imx-weim: probe of 21b8000.weim failed with error -22

There is no need to always enable the "weim" node on mx6. Do the same
as in the other i.MX dtsi files where "weim" is disabled and only gets
enabled on a per dts basis.

All the imx6 weim dts users explicitly provide 'status = "okay"', so
this change has no impact on current imx6 weim users.

If a board does not use the weim driver it will not describe its 'ranges'
property, so simply disable the 'weim' node in the imx6 dtsi files to
avoid such probe error message.

Fixes: 1be81ea5 ("ARM: dts: imx6: Add imx-weim parameters to dtsi's")
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

116dad7d

parisc: Add line-break when printing segfault info · b4a9eb4c

Authored by Helge Deller on Jan 02, 2017

Add a leading line break; otherwise the printed line gets too long.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.9

b4a9eb4c

ARM: dts: qcom: apq8064: Add missing scm clock · 542b9f07

Authored by Bjorn Andersson on Dec 29, 2016

As per the device tree binding the apq8064 scm node requires the core
clock to be specified, so add this.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>

542b9f07

02 Jan, 2017 9 commits

ARM: davinci: da8xx: Fix sleeping function called from invalid context · d1df1e01

Authored by Alexandre Bailon on Dec 09, 2016

Every time the usb20 phy is enabled, there is a
"sleeping function called from invalid context" BUG.
In addition, recursive locking occurs
because of the recursive call to clk_enable().

clk_enable() from arch/arm/mach-davinci/clock.c uses
spin_lock_irqsave() before invoking the callback
usb20_phy_clk_enable(). usb20_phy_clk_enable() uses
clk_get() and clk_prepare_enable(), which may sleep.

Replace clk_prepare_enable() by davinci_clk_enable().
Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
Suggested-by: David Lechner <david@lechnology.com>
[nsekhar@ti.com: minor commit description adjustment]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>

d1df1e01

ARM: davinci: Make __clk_{enable,disable} functions public · 48cd30b4

Authored by Alexandre Bailon on Dec 09, 2016

In some cases, there is a need to enable a clock as part of
clock enable callback of a different clock. For example, USB
2.0 PHY clock enable requires USB 2.0 clock to be enabled.
In this case, it is safe to instead call __clk_enable()
since the clock framework lock is already taken. Calling
clk_enable() causes recursive locking error.

A similar case arises in the clock disable path.

To enable such usage, make __clk_{enable,disable} functions
publicly available outside of clock.c. Also, call them
davinci_clk_{enable|disable} now to be consistent with how
other davinci-specific clock functions are named.

Note that these functions are not exported to drivers. They
are meant for usage in platform specific clock management
code.
Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
Suggested-by: David Lechner <david@lechnology.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>

48cd30b4
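The locking rule behind the commit above can be modelled in a few lines of user-space C. This is only a sketch: a boolean stands in for the non-recursive clock framework spinlock, and every name below is hypothetical rather than taken from arch/arm/mach-davinci/clock.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the clock framework spinlock: non-recursive by design. */
static bool clock_lock_held;

struct clk {
	int usecount;
	void (*enable_cb)(struct clk *clk);  /* runs with the lock held */
};

/* Caller must already hold the lock (the role davinci_clk_enable() plays). */
static void model_clk_enable_locked(struct clk *clk)
{
	assert(clock_lock_held);
	if (clk->usecount++ == 0 && clk->enable_cb)
		clk->enable_cb(clk);
}

/* Public entry point: takes the lock, then does the real work. */
static void model_clk_enable(struct clk *clk)
{
	assert(!clock_lock_held);  /* taking it twice would be recursive locking */
	clock_lock_held = true;
	model_clk_enable_locked(clk);
	clock_lock_held = false;
}

static struct clk usb20_clk;

/* PHY enable callback: needs the USB 2.0 clock, so it uses the locked variant. */
static void usb20_phy_enable_cb(struct clk *phy)
{
	(void)phy;
	model_clk_enable_locked(&usb20_clk);
}

static struct clk usb20_phy_clk = { .enable_cb = usb20_phy_enable_cb };
```

If the callback called `model_clk_enable()` instead, the first assertion in that function would fire, which is the model's equivalent of the recursive locking error the commit message describes.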

ARM: davinci: da850: don't add emac clock to lookup table twice · ef37427a

Authored by Bartosz Golaszewski on Dec 07, 2016

Similarly to the aemif clock - this screws up the linked list of clock
children. Create a separate clock for mdio inheriting the rate from
emac_clk.

Cc: <stable@vger.kernel.org> # 3.12.x-
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[nsekhar@ti.com: add a comment over mdio_clk explaining its existence +
		 commit headline updates]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>

ef37427a

ARM: davinci: da850: fix infinite loop in clk_set_rate() · 5d45b011

Authored by Bartosz Golaszewski on Dec 07, 2016

The aemif clock is added twice to the lookup table in da850.c. This
breaks the children list of pll0_sysclk3 as we're using the same list
links in struct clk. When calling clk_set_rate(), we get stuck in
propagate_rate().

Create a separate clock for nand, inheriting the rate of the aemif
clock and retrieve it in the davinci_nand module.

Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>

5d45b011
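One way that linking the same node into a circular list twice can produce a walk that never returns to the list head is shown below. This is a user-space model with a kernel-style list; it is not the da850 clock code, the function names are chosen for the example, and the exact corruption seen on real hardware may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same pointer updates as a kernel-style list_add_tail(). */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;

	node->next = head;
	node->prev = prev;
	prev->next = node;
	head->prev = node;
}

/* Walks the list, giving up after 'limit' steps so a broken list cannot hang. */
static bool list_walk_terminates(const struct list_head *head, size_t limit)
{
	const struct list_head *pos = head->next;

	assert(limit > 0);
	for (size_t steps = 0; steps < limit; steps++) {
		if (pos == head)
			return true;
		pos = pos->next;
	}
	return false;
}
```

Adding a node a second time while it is already the tail makes `prev` equal to the node itself, so `prev->next = node` leaves the node pointing at itself and an unbounded walk would spin forever.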

ARM: i.MX: remove map_io callback · d7da1ccf

Authored by Vladimir Murzin on Dec 02, 2016

There is no need to define map_io only for debug_ll_io_init() since it
is already called in devicemaps_init() if map_io is NULL.

Apart from that, for NOMMU builds debug_ll_io_init() is a nop, which
leads to the following error:

CC      arch/arm/mach-imx/mach-imx1.o
arch/arm/mach-imx/mach-imx1.c:40:13: error: 'debug_ll_io_init' undeclared here (not in a function)
  .map_io  = debug_ll_io_init,
             ^
make[1]: *** [arch/arm/mach-imx/mach-imx1.o] Error 1

Cc: Alexander Shiyan <shc_work@mail.ru>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

d7da1ccf

ARM: dts: vf610-zii-dev-rev-b: Add missing newline · 4c51de45

Authored by Andreas Färber on Nov 27, 2016

Found while reviewing Marvell dsa bindings usage.

Fixes: f283745b ("arm: vf610: zii devel b: Add support for switch interrupts")
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

4c51de45

ARM: dts: imx6qdl-nitrogen6x: remove duplicate iomux entry · db9e1886

Authored by Gary Bisson on Nov 25, 2016

The NANDF_CS2 pad is also part of the wlan-vmmcgrp iomux group.

Removing it from the usdhc2grp group avoids the following error:
imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_NANDF_CS2 already requested
by regulators:regulator@4; cannot claim for 2194000.usdhc
imx6q-pinctrl 20e0000.iomuxc: pin-187 (2194000.usdhc) status -22
imx6q-pinctrl 20e0000.iomuxc: could not request pin 187
(MX6Q_PAD_NANDF_CS2) from group usdhc2grp on device 20e0000.iomuxc
Signed-off-by: Gary Bisson <gary.bisson@boundarydevices.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

db9e1886

ARM: dts: imx31: fix AVIC base address · af92305e

Authored by Vladimir Zapolskiy on Nov 17, 2016

On i.MX31, the AVIC interrupt controller base address is 0x68000000.

The problem was shadowed by the AVIC driver, which takes the correct
base address from a SoC specific header file.

Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>

af92305e

openrisc: Add _text symbol to fix ksym build error · 086cc1c3

Authored by Stafford Horne on Dec 14, 2016

The build robot reports:

   .tmp_kallsyms1.o: In function `kallsyms_relative_base':
>> (.rodata+0x8a18): undefined reference to `_text'

This happens when using 'make alldefconfig'. Adding the _text symbol to mark
the start of the kernel, as in other architectures, fixes this.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Acked-by: Jonas Bonn <jonas@southpole.se>

086cc1c3

31 Dec, 2016 1 commit

ARM: dts: am572x-idk: Add gpios property to control PCIE_RESETn · 1a38de88

Authored by Kishon Vijay Abraham I on Dec 30, 2016

Add 'gpios' property to pcie1 dt node and populate it with
GPIO3_23 in order to drive PCIE_RESETn high.

This allows PCIe cards to be detected on the AM572X IDK board.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

1a38de88

30 Dec, 2016 5 commits

arm64: dts: vexpress: Support GICC_DIR operations · 1dff32d7

Authored by Sudeep Holla on Dec 13, 2016

The GICv2 CPU interface registers span across 8K, not 4K as indicated in
the DT. Only the GICC_DIR register is located after the initial 4K
boundary, leaving a functional system but without support for separately
EOI'ing and deactivating interrupts.

After this change the system supports split priority drop and interrupt
deactivation. This patch is based on a similar one from Christoffer Dall:
commit 368400e2 ("ARM: dts: vexpress: Support GICC_DIR operations")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

1dff32d7

ARM: dts: vexpress: Support GICC_DIR operations · 368400e2

Authored by Christoffer Dall on Dec 10, 2016

The GICv2 CPU interface registers span across 8K, not 4K as indicated in
the DT.  Only the GICC_DIR register is located after the initial 4K
boundary, leaving a functional system but without support for separately
EOI'ing and deactivating interrupts.

After this change the system supports split priority drop and interrupt
deactivation.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
[sudeep.holla@arm.com: included same fix for tc1 platform too]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

368400e2
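The effect of the region size in the two vexpress commits above comes down to simple offset arithmetic, sketched here in C. The register offsets are the GICv2 CPU interface offsets for GICC_EOIR (0x0010) and GICC_DIR (0x1000); the helper function is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GICC_EOIR  0x0010u  /* end of interrupt */
#define GICC_DIR   0x1000u  /* deactivate interrupt */

/* True if a 32-bit register at 'offset' lies inside a region of 'size' bytes. */
static bool reg_in_region(uint32_t offset, uint32_t size)
{
	assert(offset % 4 == 0);  /* registers are word-aligned */
	return size >= 4 && offset <= size - 4;
}
```

With a 4K (0x1000) region, GICC_EOIR is reachable but GICC_DIR is the first word past the end; an 8K (0x2000) region covers both.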

parisc: Drop TIF_RESTORE_SIGMASK and switch to generic code · 1fe0a7e0

Authored by Helge Deller on Dec 27, 2016

Commit 7e781418 ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code")
introduced code with which the "restore sigmask" flag lives in task_struct
instead of ti->flags. Let's use this optimization on parisc too.
Signed-off-by: Helge Deller <deller@gmx.de>

1fe0a7e0

parisc: Mark cr16 clocksource unstable on SMP systems · 41744213

Authored by Helge Deller on Dec 26, 2016

The cr16 interval timer of each CPU is not synchronized to the cr16
timers of other CPUs in an SMP system. So, delay the registration of the
cr16 clocksource until all CPUs have been detected and then - if we are
on an SMP machine - mark the cr16 clocksource as unstable and lower its
rating before registering it with the clocksource framework.

This patch fixes the stalled CPU warnings which we have seen since
introduction of the cr16 clocksource.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.8+

41744213

mm: optimize PageWaiters bit use for unlock_page() · b91e1302

Authored by Linus Torvalds on Dec 27, 2016

In commit 62906027 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd.  The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result.  However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too.  And on architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86.  We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better.  According to Nick, Power has the same hiccup x86 has, for
example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test".  After this, it's down to
0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b91e1302
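The semantics of the primitive introduced in the commit above (clear one bit, then report the state of bit 7 of the result) can be sketched with C11 atomics. This is a portable stand-in, not the x86 implementation or the kernel's generic fallback: the function name is shortened, and the bit number used for the lock bit is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LOCKED   0  /* lock bit; the bit number is an assumption */
#define PG_WAITERS  7  /* waiters bit, the sign bit of the low byte */

/*
 * Clear bit 'nr' with release ordering and report whether bit 7 is set
 * in the resulting value. One atomic read-modify-write, no second load
 * of the word that was just updated.
 */
static bool clear_unlock_is_negative_byte(unsigned nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;
	unsigned long old;

	assert(nr < PG_WAITERS);  /* the cleared bit must sit below bit 7 */
	old = atomic_fetch_and_explicit(word, ~mask, memory_order_release);
	return ((old & ~mask) & (1UL << PG_WAITERS)) != 0;
}
```

A caller in the style of unlock_page() would wake waiters only when this returns true, which keeps the common no-waiters path to a single atomic operation.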
