Commits · 74703cc4e08372b8aedfd687bef8182797215d30 · openeuler / Kernel

20 Aug, 2015 1 commit

powerpc/powernv: Fix the log message when disabling VF · 74703cc4

Authored by Wei Yang on Jul 20, 2015

On the powernv platform, the IOV BAR is shifted if necessary. However, the log
message is not correct when disabling VFs.

This patch fixes this by printing the correct message based on the offset value.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

74703cc4

19 Aug, 2015 2 commits

powerpc/512x: silence a USB Kconfig dependency warning · acf6cec8

Authored by Gerhard Sittig on Jun 03, 2013

the PPC_MPC512x config automatically selected USB_EHCI_BIG_ENDIAN_*
switches, which made Kconfig warn about "unmet direct dependencies":

scripts/kconfig/conf --silentoldconfig Kconfig
warning: (PPC_MPC512x && 440EPX) selects USB_EHCI_BIG_ENDIAN_DESC which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && PPC_PS3 && PPC_CELLEB && 440EPX) selects USB_EHCI_BIG_ENDIAN_MMIO which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && 440EPX) selects USB_EHCI_BIG_ENDIAN_DESC which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && PPC_PS3 && PPC_CELLEB && 440EPX) selects USB_EHCI_BIG_ENDIAN_MMIO which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)

Make the selected entries additionally depend on USB_EHCI_HCD, which
silences the warning.
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

acf6cec8

powerpc/nvram: print no error when pstore backend is not nvram · 74943dab

Authored by Hari Bathini on May 11, 2015

Pstore only supports one backend at a time. The preferred
pstore backend is set by passing the pstore.backend=<name>
argument to the kernel at boot time. Currently, while trying
to register with pstore, nvram throws an error message even
when "pstore.backend != nvram", which is unnecessary. This
patch removes the error message in case "pstore.backend != nvram".
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

74943dab
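
The decision described in the nvram commit above can be sketched in user space. This is a minimal illustration, not the kernel's code: `should_report_failure()` and its parameters are hypothetical names, and the real patch may express the same check differently.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a failed pstore registration deserves an error message.
 * 'rc' is the registration result (0 means success) and
 * 'requested_backend' stands in for the pstore.backend=<name> argument
 * (NULL when none was given). Both are assumptions for illustration.
 */
static int should_report_failure(int rc, const char *requested_backend)
{
	if (rc == 0)
		return 0;	/* registration succeeded, nothing to report */
	if (requested_backend && strcmp(requested_backend, "nvram") != 0)
		return 0;	/* another backend was requested on purpose */
	return 1;		/* nvram was wanted and still failed */
}
```

With this shape, a failure while `pstore.backend=ramoops` is in effect stays silent, while a failure with `pstore.backend=nvram` is still reported.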

18 Aug, 2015 12 commits

powerpc/nvram: use kmemdup rather than duplicating its implementation · fc9e9cbf

Authored by Andrzej Hajda on Aug 07, 2015

The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fc9e9cbf

powerpc/pseries: use kmemdup rather than duplicating its implementation · 2e16acc5

Authored by Andrzej Hajda on Aug 07, 2015

The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2e16acc5
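
The two kmemdup conversions above replace an open-coded "allocate, then copy" sequence with a single helper call. The following user-space sketch shows the pattern, assuming `malloc()` as a stand-in for the kernel allocator; `dup_open_coded()` and `memdup_sketch()` are hypothetical names, and `memdup_sketch()` only mimics what `kmemdup()` does.

```c
#include <stdlib.h>
#include <string.h>

/* Open-coded form: allocate, check, then copy (the pattern being replaced). */
static void *dup_open_coded(const void *src, size_t len)
{
	void *p = malloc(len);

	if (!p)
		return NULL;
	memcpy(p, src, len);
	return p;
}

/* User-space stand-in for kmemdup(): the same result behind one call. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Both functions return a freshly allocated copy or NULL, so a caller that used the open-coded form can switch to the helper without changing its error handling.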

powerpc/eeh: Disable automatically blocked PCI config · 39bfd715

Authored by Gavin Shan on Jul 30, 2015

pcibios_set_pcie_reset_state() can be called to complete a reset
request when passing through a PCI device. The flag
EEH_PE_ISOLATED is set before saving the PCI config space.
On some Broadcom adapters, EEH_PE_CFG_BLOCKED is automatically
set when the flag EEH_PE_ISOLATED is marked. This causes bogus
data to be saved from the PCI config space, which is then restored
to the PCI adapter after the reset. Eventually, the hardware
can't work with the corrupted data in its PCI config space.

The patch fixes the issue with eeh_pe_state_mark_no_cfg(), which
doesn't set EEH_PE_CFG_BLOCKED when seeing EEH_PE_ISOLATED on the
PE, in order to avoid the bogus data saved and restored to the PCI
config space.
Reported-by: Rajanikanth H. Adaveeshaiah <rajanikanth.ha@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

39bfd715

powerpc: Export include/uapi/asm/eeh.h · dd497154

Authored by Gavin Shan on Aug 18, 2015

This adds include/uapi/asm/eeh.h to kbuild. The header file was added
by commit ed3e81ff ("powerpc/eeh: Move PE state constants around"), and
with this change it is exported automatically by the command below:

   make INSTALL_HDR_PATH=/tmp/headers \
        SRCARCH=powerpc headers_install
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

dd497154

powerpc/pseries: enable RTC class support · e0ad784b

Authored by Vaibhav Jain on Jul 03, 2015

A working rtc kernel driver is needed so that hwclock can synchronize
system clock to rtc during shutdown/boot. We already have a powernv
platform rtc driver located at drivers/rtc/rtc-opal.c. However it depends
on CONFIG_RTC_CLASS which is disabled by default. Hence the driver isn't
enabled and not compiled for the powernv kernel.

We fix this by enabling rtc class support in pseries defconfig which
enables this driver and compiles it into the pseries kernel. In case
CONFIG_PPC_POWERNV is not enabled we fall back to the 'Generic RTC support'
driver which emulates the legacy 'PC RTC driver'.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e0ad784b

powerpc/numa: initialize distance lookup table from drconf path · 1d805440

Authored by Nikunj A Dadhania on Jul 02, 2015

In some situations, a NUMA guest that supports
ibm,dynamic-memory-reconfiguration node will end up having flat NUMA
distances between nodes. This is because of two problems in the
current code.

1) Different representations of associativity lists.

   There is an assumption about the associativity list in
   initialize_distance_lookup_table(). Associativity list has two forms:

   a) [cpu,memory]@x/ibm,associativity has following
      format:
           <N> <N integers>

   b) ibm,dynamic-reconfiguration-memory/ibm,associativity-lookup-arrays

           <M> <N> <M associativity lists each having N integers>
           M = the number of associativity lists
           N = the number of entries per associativity list

   Fix initialize_distance_lookup_table() so that it does not assume
   "case a". And update the caller to skip the length field before
   sending the associativity list.

2) Distance table not getting updated from drconf path.

   Node distance table will not get initialized in certain cases as
   ibm,dynamic-reconfiguration-memory path does not initialize the
   lookup table.

   Call initialize_distance_lookup_table() from drconf path with
   appropriate associativity list.
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1d805440
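
The two associativity layouts described in the NUMA commit above can be illustrated with a small sketch. It assumes host-endian `uint32_t` cells (real device tree properties are big-endian), and all function names are hypothetical. The point is that both callers strip their own header fields, so the consumer always receives a bare list of N integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Form (a): <N> <N integers>. Skip the leading length field. */
static const uint32_t *assoc_from_node(const uint32_t *prop, uint32_t *n)
{
	*n = prop[0];
	return &prop[1];
}

/* Form (b): <M> <N> <M lists of N integers>. Return list number 'index'. */
static const uint32_t *assoc_from_lookup_array(const uint32_t *prop,
					       uint32_t index, uint32_t *n)
{
	uint32_t m = prop[0];

	*n = prop[1];
	if (index >= m)
		return NULL;
	return &prop[2 + (size_t)index * *n];
}

/* The consumer sees a bare list in both cases, with no length field. */
static uint32_t assoc_entry(const uint32_t *list, uint32_t n, uint32_t i)
{
	return i < n ? list[i] : 0;
}
```

Because the length is passed separately, a consumer such as `assoc_entry()` no longer has to guess which of the two layouts it was handed.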

powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops · 53522982

Authored by Andrew Donnellan on Aug 07, 2015

Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv:
Move dma_set_mask from pnv_phb to pci_controller_ops").

Previous call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
     points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
  2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
     (platforms/powernv/pci.c)
  3) call phb->dma_get_required_mask if it exists
  4) it only exists in the ioda case, where it points to
       pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)

New call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
     if it exists
  2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
     (platforms/powernv/pci-ioda.c)

In the p5ioc2 case, the call chain remains the same -
dma_get_required_mask() does not find either a ppc_md call or
pci_controller_ops call, so it calls __dma_get_required_mask().
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

53522982
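
The shortened call chain above boils down to "use the controller hook if one is installed, otherwise fall back to the generic code". The sketch below models that with a trimmed-down ops table. Every type and function name is hypothetical, the mask values are invented, and the `ppc_md` hook that other platforms may still provide is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

struct device_sketch;

/* Hypothetical, trimmed-down version of a controller ops table. */
struct controller_ops_sketch {
	uint64_t (*dma_get_required_mask)(struct device_sketch *dev);
};

struct device_sketch {
	int is_pci;
	const struct controller_ops_sketch *ops;
};

/* Stand-in for the generic fallback; the value is invented. */
static uint64_t generic_required_mask(struct device_sketch *dev)
{
	(void)dev;
	return 0xffffffffULL;
}

/* Stand-in for an IODA-specific hook; the value is invented. */
static uint64_t ioda_required_mask(struct device_sketch *dev)
{
	(void)dev;
	return ~0ULL;
}

/* New chain: PCI device, then controller hook if present, else generic. */
static uint64_t get_required_mask(struct device_sketch *dev)
{
	if (dev->is_pci && dev->ops && dev->ops->dma_get_required_mask)
		return dev->ops->dma_get_required_mask(dev);
	return generic_required_mask(dev);
}
```

A controller that leaves the hook unset, as in the p5ioc2 case, simply takes the generic path.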

powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K · 73b341ef

Authored by Michael Ellerman on Aug 07, 2015

The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
painfully complicated.

But if we rearrange it enough we can see that PPC_HAS_HASH_64K
essentially depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

We can then notice that PPC_HAS_HASH_64K is used in files that are only
built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

So replace all uses and drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

73b341ef

powerpc/mm: Simplify page size kconfig dependencies · 55f8b5b8

Authored by Michael Ellerman on Aug 07, 2015

For config options with only a single value, guarding the single value
with 'if' is the same as adding a 'depends' statement. And it's more
standard to just use 'depends'.

And if the option has both an 'if' guard and a 'depends' we can collapse
them into a single 'depends' by combining them with &&.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

55f8b5b8

powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index() · 95300577

Authored by Michael Ellerman on Aug 07, 2015

Now that support for 64k pages with a 4K kernel is removed, this code is
unreachable.

CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
also true.

But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
crucially __real_pte, which means this definition can never be used.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

95300577

powerpc/cell: Drop support for 64K local store on 4K kernels · f444f1f8

Authored by Michael Ellerman on Aug 07, 2015

Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>

f444f1f8

powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 74b5037b

Authored by Michael Ellerman on Aug 07, 2015

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

74b5037b
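
The failure mode and the fix described above can be modelled in a few lines. This is a sketch under stated assumptions: the slice geometry, the `PSIZE_*` values and all function names are illustrative and do not match the kernel's definitions. Only the idea is taken from the commit message: an address-indexed array lookup is valid for user addresses only, so kernel addresses are answered before the lookup.

```c
#include <stdint.h>

#define SLICE_SHIFT   40			/* illustrative slice size */
#define NUM_SLICES    64			/* illustrative user range */
#define KERNEL_START  0xc000000000000000ULL	/* assumed kernel region start */
#define PSIZE_4K      0				/* illustrative index values */
#define PSIZE_64K     1

static uint8_t slice_psize[NUM_SLICES];

/* Only valid for user addresses: a kernel address indexes past the array. */
static int user_slice_psize(uint64_t addr)
{
	return slice_psize[addr >> SLICE_SHIFT];
}

/* Guarded lookup: kernel addresses are always 4K in this configuration. */
static int pagesize_index_sketch(uint64_t addr)
{
	if (addr >= KERNEL_START)
		return PSIZE_4K;
	return user_slice_psize(addr);
}
```

Without the guard, `KERNEL_START >> SLICE_SHIFT` would be far larger than `NUM_SLICES`, which is the out-of-bounds read the commit message describes as returning random junk.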

14 Aug, 2015 2 commits

powerpc/eeh: Probe after unbalanced kref check · e642d11b

Authored by Daniel Axtens on Aug 14, 2015

In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explicitly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.

Fixes: d91dafc0 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: stable@vger.kernel.org
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e642d11b

powerpc: Add an inline function to update POWER8 HID0 · e63dbd16

Authored by Gautham R. Shenoy on Aug 05, 2015

Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function named update_power8_hid0() which follows this
recipe and invoke it from the static split core path.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e63dbd16

12 Aug, 2015 6 commits

powerpc/prom: Use DRCONF flags while processing detected LMBs · 9afac933

Authored by Anshuman Khandual on Aug 06, 2015

Replace hard-coded values with existing DRCONF flags while processing
detected LMBs from the device tree. Does not change any functionality.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9afac933

powerpc/xmon: Drop the valid variable completely in dump_segments() · 8218a303

Authored by Anshuman Khandual on Jul 29, 2015

The value of 'valid' is always zero when 'esid' is zero, and if 'esid'
is non-zero then the value of 'valid' is irrelevant because we are using
logical or in the if expression.

In fact 'valid' can be dropped completely from dump_segments() by
simply doing the check with SLB_ESID_V directly in the if.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8218a303
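
The argument in the xmon commit above is a small boolean identity: a value derived by masking 'esid' can never be non-zero when 'esid' is zero, so OR-ing it with 'esid' adds nothing. The sketch below checks that identity. The shape of the old test is reconstructed from the description, and the bit value used for `SLB_ESID_V_SKETCH` as well as the function names are assumptions.

```c
#include <stdint.h>

#define SLB_ESID_V_SKETCH 0x0000000008000000ULL	/* assumed valid bit */

/* Old shape: 'valid' is derived from esid, then OR-ed together with it. */
static int should_dump_old(uint64_t esid, uint64_t vsid)
{
	uint64_t valid = esid & SLB_ESID_V_SKETCH;

	return (valid | esid | vsid) != 0;
}

/* New shape: 'valid' adds nothing to the test, so it is dropped. */
static int should_dump_new(uint64_t esid, uint64_t vsid)
{
	return (esid | vsid) != 0;
}

/* The valid bit is tested directly at the point where it matters. */
static int entry_is_valid(uint64_t esid)
{
	return (esid & SLB_ESID_V_SKETCH) != 0;
}
```

For every input, `should_dump_old()` and `should_dump_new()` agree, which is why the variable can be removed without changing behaviour.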

powerpc/prom: Simplify the logic to fetch SLB size · 9c61f7a0

Authored by Anshuman Khandual on Jul 29, 2015

The code to fetch the SLB size from the device tree wants to first look
for "slb-size" and then if that's not found "ibm,slb-size".

We can simplify the code by looking for the properties and then if we
find one of them we set mmu_slb_size.

We also change the function name from check_cpu_slb_size() to
init_mmu_slb_size() as the function doesn't check anything, it only
initialises mmu_slb_size.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9c61f7a0
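
The lookup order described above ("slb-size" first, then "ibm,slb-size", and only assign if one was found) can be sketched as follows. The property table is a hypothetical stand-in for a device tree node, the default of 64 is illustrative, and the names carry a `_sketch` suffix to make clear they are not the kernel's symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat property table standing in for a device tree node. */
struct prop_sketch {
	const char *name;
	uint32_t value;
};

static const uint32_t *find_prop(const struct prop_sketch *props, size_t n,
				 const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return &props[i].value;
	return NULL;
}

/* Illustrative default, kept when neither property is present. */
static uint32_t mmu_slb_size_sketch = 64;

/* Try "slb-size" first, then "ibm,slb-size"; assign only if one exists. */
static void init_mmu_slb_size_sketch(const struct prop_sketch *props, size_t n)
{
	const uint32_t *p = find_prop(props, n, "slb-size");

	if (!p)
		p = find_prop(props, n, "ibm,slb-size");
	if (p)
		mmu_slb_size_sketch = *p;
}
```

The function has a single assignment at the end, which matches the rename in the commit: it initialises a value and does not check anything.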

powerpc/slb: Add documentation on runtime patching of SLB encoding · 79d0be74

Authored by Anshuman Khandual on Jul 29, 2015

This patch adds some documentation to patch_slb_encoding() explaining
how it works.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Update change log and mention the signedness of the immediate]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

79d0be74

powerpc/slb: Rename all the 'slot' occurrences to 'entry' · 2be682af

Authored by Anshuman Khandual on Jul 29, 2015

The SLB code uses 'slot' and 'entry' interchangeably, change it to always
use 'entry'.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2be682af

powerpc/slb: Remove a duplicate extern variable · 752b8ade

Authored by Anshuman Khandual on Jul 29, 2015

This patch just removes one redundant entry for one extern variable
'slb_compare_rr_to_size' from the scope. This patch does not change
any functionality.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

752b8ade

06 Aug, 2015 8 commits

powerpc/ftrace: add powerpc timebase as a trace clock source · 197165d4

Authored by Naveen N. Rao on Apr 24, 2015

Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
  hypervisor and guests.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

197165d4

powerpc/4xx: Fix return value check in hsta_msi_probe() · 35a7f41c

Authored by Wei Yongjun on Apr 16, 2015

In case of error, the functions platform_get_resource() and kmalloc()
return NULL, not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with a NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

35a7f41c
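
Why an IS_ERR() test misses a NULL return can be shown with a simplified user-space model. `is_err_sketch()` below imitates the usual "top MAX_ERRNO addresses are error codes" convention; it is an approximation for illustration, not the kernel macro, and the other function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Simplified model of IS_ERR(): true only for the top 4095 addresses. */
static int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Stand-in for an allocator or lookup that reports failure with NULL. */
static void *lookup_that_fails(void)
{
	return NULL;
}

/* Buggy check: NULL is not an error pointer, so the failure is missed. */
static int failure_detected_with_is_err(void)
{
	return is_err_sketch(lookup_that_fails());
}

/* Correct check for a NULL-returning function. */
static int failure_detected_with_null_test(void)
{
	return lookup_that_fails() == NULL;
}
```

The first check reports no failure at all, so the caller would go on to dereference NULL; the second check catches it.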

powerpc: Remove redundant breaks · a825ac07

Authored by Joe Perches on Jun 29, 2015

break; break; isn't useful.

Remove one.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a825ac07

powerpc: pci: use %pR for printing struct resource · ae2a84b4

Authored by Kevin Hao on Jun 12, 2015

Use %pR to simplify the debug code. This also makes the debug info more
readable.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
[mpe: Unsplit multi-line printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ae2a84b4

powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI. · 62521ea6

Authored by Mahesh Salgaonkar on Aug 04, 2015

Invoke new opal_cec_reboot2() call with reboot type
OPAL_REBOOT_PLATFORM_ERROR (for unrecoverable HMI interrupts) to inform
BMC/OCC about this error, so that BMC can collect relevant data for error
analysis and decide what component to de-configure before rebooting.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

62521ea6

powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors. · e784b649

Authored by Mahesh Salgaonkar on Jul 31, 2015

On non-recoverable MCE errors in kernel space, the Linux kernel panics
and the system reboots. On BMC-based systems opal-prd runs as a daemon
in the host. Hence, a kernel crash may prevent opal-prd from detecting and
analyzing the MCE error. This may land us in a situation where the faulty
memory never gets de-configured and Linux keeps hitting the same MCE error
again and again. If this happens at an early stage of kernel initialization,
then Linux will keep crashing and rebooting in a loop.

This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.

This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e784b649

powerpc/powernv: Pull all HMI events before panic. · 1852ae27

Authored by Mahesh Salgaonkar on May 05, 2015

In the event of an unrecovered HMI, the existing code panics as soon as
it receives the first unrecovered HMI event. This makes the host report
only partial information about HMIs before the panic. There may be more
errors that caused the HMI, and hence more HMI events may have been
generated and be waiting to be pulled by the host. This patch implements
logic to pull and display all the HMI events before going down the panic path.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1852ae27

powerpc/powernv: display reason for Malfunction Alert HMI. · c33e11d0

Authored by Mahesh Salgaonkar on May 05, 2015

The V2 version of HMI event now carries additional information for
Malfunction Alert. It now contains error information about CORE and NX
checkstop. This patch checks and displays the check stop reason before
panic.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c33e11d0

30 Jul, 2015 1 commit

powerpc/kernel: Enable seccomp filter · 2449acc5

Authored by Michael Ellerman on Jul 23, 2015

This commit enables seccomp filter on powerpc, now that we have all the
necessary pieces in place.

To support seccomp's desire to modify the syscall return value under
some circumstances, we use a different ABI to the ptrace ABI. That is we
use r3 as the syscall return value, and orig_gpr3 is the first syscall
parameter.

This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will
see -ENOSYS preloaded in r3. This is identical to the behaviour on x86,
and allows seccomp or the ptracer to either leave the -ENOSYS or change
it to something else, as well as rejecting or not the syscall by
modifying r0.

If seccomp does not reject the syscall, we restore the register state to
match what ptrace and audit expect, ie. r3 is the first syscall
parameter again. We do this restore using orig_gpr3, which may have been
modified by seccomp, which allows seccomp to modify the first syscall
parameter and allow the syscall to proceed.

We need to #ifdef the additional handling of r3 for seccomp, so move
it all out of line.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

2449acc5

29 Jul, 2015 8 commits

powerpc/kernel: Add SIG_SYS support for compat tasks · 1b60bab0

Authored by Michael Ellerman on Jul 23, 2015

SIG_SYS was added in commit a0727e8c "signal, x86: add SIGSYS info
and make it synchronous."

Because we use the asm-generic struct siginfo, we got support for
SIG_SYS for free as part of that commit.

However there was no compat handling added for powerpc. That means we've
been advertising the existence of siginfo._sifields._sigsys to compat
tasks, but not actually filling in the fields correctly.

Luckily it looks like no one has noticed, presumably because the only
user of SIGSYS in the kernel is seccomp filter, which we don't support
yet.

So before we enable seccomp filter, add compat handling for SIGSYS.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

1b60bab0

powerpc: Change syscall_get_nr() to return int · e9fbe686

Authored by Michael Ellerman on Jul 23, 2015

The documentation for syscall_get_nr() in asm-generic says:

 Note this returns int even on 64-bit machines. Only 32 bits of
 system call number can be meaningful. If the actual arch value
 is 64 bits, this truncates to 32 bits so 0xffffffff means -1.

However our implementation was never updated to reflect this.

Generally it's not important, but there is one case where it matters.

For seccomp filter with SECCOMP_RET_TRACE, the tracer will set
regs->gpr[0] to -1 to reject the syscall. When the task is a compat
task, this means we end up with 0xffffffff in r0 because ptrace will
zero extend the 32-bit value.

If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will
see a positive value in r0 and will incorrectly allow the syscall
through seccomp.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

e9fbe686
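
The truncation issue in the syscall_get_nr() commit above can be reproduced with plain integer conversions. In this sketch `r0` is passed as a 64-bit value and the two helpers, whose names are hypothetical, model the old and new return types. The narrowing cast relies on the two's-complement truncation that GCC and Clang implement.

```c
#include <stdint.h>

/* r0 as a 64-bit kernel sees it after a compat -1 was zero-extended. */
#define COMPAT_MINUS_ONE 0x00000000ffffffffULL

/* Old shape: a wide return type makes -1 look like a large positive number. */
static int64_t nr_as_wide(uint64_t r0)
{
	return (int64_t)r0;
}

/* New shape: truncating to int turns 0xffffffff back into -1. */
static int nr_as_int(uint64_t r0)
{
	return (int)r0;
}
```

Only the `int` version yields a negative syscall number for `COMPAT_MINUS_ONE`, which is what a "reject this syscall" check needs to see.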

powerpc: Use orig_gpr3 in syscall_get_arguments() · 1cb9839b

Authored by Michael Ellerman on Jul 23, 2015

Currently syscall_get_arguments() is used by syscall tracepoints, and
collect_syscall() which is used in some debugging as well as
/proc/pid/syscall.

The current implementation just copies regs->gpr[3 .. 5] out, which is
fine for all the current use cases.

When we enable seccomp filter, that will also start using
syscall_get_arguments(). However for seccomp filter we want to use r3
as the return value of the syscall, and orig_gpr3 as the first
parameter. This will allow seccomp to modify the return value in r3.

To support this we need to modify syscall_get_arguments() to return
orig_gpr3 instead of r3. This is safe for all uses because orig_gpr3
always contains the r3 value that was passed to the syscall. We store it
in the syscall entry path and never modify it.

Update syscall_set_arguments() while we're here, even though it's never
used.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

1cb9839b

powerpc: Rework syscall_get_arguments() so there is only one loop · a7657844

Authored by Michael Ellerman on Jul 23, 2015

Currently syscall_get_arguments() has two loops, one for compat and one
for regular tasks. In preparation for the next patch, which changes which
registers we use, switch it to only have one loop, so we only have one
place to update.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

a7657844

powerpc: Don't negate error in syscall_set_return_value() · 1b1a3702

Authored by Michael Ellerman on Jul 23, 2015

Currently the only caller of syscall_set_return_value() is seccomp
filter, which is not enabled on powerpc.

This means we have not noticed that our implementation of
syscall_set_return_value() negates error, even though the value passed
in is already negative.

So remove the negation in syscall_set_return_value(), and expect the
caller to do it like all other implementations do.

Also add a comment about the ccr handling.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

1b1a3702

powerpc: Drop unused syscall_get_error() · 2923e6d5

Authored by Michael Ellerman on Jul 23, 2015

syscall_get_error() is unused, and never has been.

It's also probably wrong, as it negates r3 before returning it, but that
depends on what the caller is expecting.

It also doesn't deal with compat, and doesn't deal with TIF_NOERROR.

Although we could fix those, until it has a caller and it's clear what
semantics the caller wants it's just untested code. So drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

2923e6d5

powerpc/kernel: Change the do_syscall_trace_enter() API · d3837414

Authored by Michael Ellerman on Jul 23, 2015

The API for calling do_syscall_trace_enter() is currently sensible
enough, it just returns the (modified) syscall number.

However once we enable seccomp filter it will get more complicated. When
seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or
a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may
*not* set a return value in r3.

That means the assembler that calls do_syscall_trace_enter() can not
blindly return ENOSYS, it needs to only return ENOSYS if a return value
has not already been set.

There is no way to implement that logic with the current API. So change
the do_syscall_trace_enter() API to make it deal with the return code
juggling, and the assembler can then just return whatever return code it
is given.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

d3837414

powerpc/kernel: Switch to using MAX_ERRNO · c3525940

Authored by Michael Ellerman on Jul 23, 2015

Currently on powerpc we have our own #define for the highest (negative)
errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
which are not clear.

The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.

In particular seccomp uses MAX_ERRNO to restrict the value that a
seccomp filter can return.

Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
tracer wanting to return 600, expecting it to be seen as an error, would
instead find on powerpc that userspace sees a successful syscall with a
return value of 600.

To avoid this inconsistency, switch powerpc to use MAX_ERRNO.

We are somewhat confident that generic syscalls that can return a
non-error value above negative MAX_ERRNO have already been updated to
use force_successful_syscall_return().

I have also checked all the powerpc specific syscalls, and believe that
none of them expect to return a non-error value between -MAX_ERRNO and
-516. So this change should be safe ...
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>

c3525940
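
The effect of the two limits in the MAX_ERRNO commit above can be checked with a one-line range test. This is a sketch: `is_error_return()` is a hypothetical helper that treats a raw return value as an error when it lies in [-limit, -1], and the two constants carry the values quoted in the commit message.

```c
#include <stdint.h>

#define LAST_ERRNO_OLD 516	/* the old powerpc limit, _LAST_ERRNO */
#define MAX_ERRNO_NEW  4095	/* the generic limit, MAX_ERRNO */

/* A raw return value counts as an error if it lies in [-limit, -1]. */
static int is_error_return(int64_t ret, int64_t limit)
{
	return ret < 0 && ret >= -limit;
}
```

With the old limit, `is_error_return(-600, LAST_ERRNO_OLD)` is 0, so an errno of 600 is not recognised as an error; with `MAX_ERRNO_NEW` the same value is classified as one.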
