1. 10 Aug 2017, 1 commit
  2. 15 Jul 2017, 1 commit
    • sparc64: Measure receiver forward progress to avoid send mondo timeout · 9d53caec
      Authored by Jane Chu
      A large sun4v SPARC system may have moments of intensive xcall activity,
      usually caused by unmapping many pages on many CPUs concurrently. This can
      flood receivers with CPU mondo interrupts for an extended period, causing
      some unlucky senders to hit a send-mondo timeout. The problem gets worse
      as the CPU count increases, because sometimes mappings must be invalidated
      on all CPUs, and sometimes all CPUs may gang up on a single CPU.
      
      But a busy system is not a broken system. In the above scenario, as long
      as the receiver is making forward progress processing mondo interrupts,
      the sender should continue to retry.
      
      This patch implements the receiver's forward-progress meter by introducing
      a per-CPU counter 'cpu_mondo_counter[cpu]', where 'cpu' is in the range
      0..NR_CPUS. The receiver increments its counter as soon as it receives
      a mondo, and the sender tracks the receiver's counter. If the receiver has
      stopped making forward progress by the time the retry limit is reached, the
      sender declares a send-mondo timeout and panics; otherwise, the receiver is
      allowed to keep making forward progress.
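      
      A minimal sketch of the sender-side check described above; only
      cpu_mondo_counter comes from the patch, while the helper names
      (mondo_sent, MONDO_RETRY_LIMIT) and the exact retry budget are
      illustrative:
      
        /* kernel context; the receiver does cpu_mondo_counter[smp_processor_id()]++
         * in its mondo interrupt handler */
        u32 cpu_mondo_counter[NR_CPUS];
        
        static void send_mondo_with_progress_check(int cpu)
        {
                u32 seen = cpu_mondo_counter[cpu];      /* snapshot receiver */
                int retries = 0;
        
                while (!mondo_sent(cpu)) {              /* hypothetical helper */
                        if (++retries < MONDO_RETRY_LIMIT) {
                                udelay(1);
                                continue;
                        }
                        if (cpu_mondo_counter[cpu] != seen) {
                                /* receiver is making forward progress:
                                 * take a new snapshot and keep retrying */
                                seen = cpu_mondo_counter[cpu];
                                retries = 0;
                                continue;
                        }
                        panic("SUN4V: send mondo timeout, cpu %d stuck", cpu);
                }
        }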
      
      In addition, it has been observed that PCIe hotplug events generate
      Correctable Errors that are handled by the hypervisor and then the OS. The
      hypervisor 'borrows' a guest CPU strand briefly to provide the service. If
      that CPU strand is simultaneously the only CPU targeted by a mondo, it may
      not be available for the mondo within 20 msec, causing a SUN4V mondo
      timeout. It appears that 1 second is the agreed wait time between the
      hypervisor and the guest OS, so this patch makes that adjustment.
      
      Orabug: 25476541
      Orabug: 26417466
      Signed-off-by: Jane Chu <jane.chu@oracle.com>
      Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
      Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
      Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
      Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 13 Jul 2017, 2 commits
    • mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Authored by Michal Hocko
      __GFP_REPEAT was designed to allow a retry-but-eventually-fail semantic
      in the page allocator.  This has been true, but only for allocation
      requests larger than PAGE_ALLOC_COSTLY_ORDER; it has always been
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests, and they are
      considered too important to fail, so they might end up looping in the
      page allocator forever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misguided
      usage of the __GFP_REPEAT flag has been removed for !costly requests, we
      can give the original flag a better name and, more importantly, a more
      useful semantic.  Let's rename it to __GFP_RETRY_MAYFAIL, which tells the
      user that the allocator will try really hard but there is no promise of
      success.  This works independently of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee-vs-cost options (take GFP_KERNEL as an example):
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most lightweight mode, which
         doesn't even kick background reclaim. Should be used carefully because
         it might deplete memory and the next user might hit the more
         aggressive reclaim.
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
         allocation without any attempt to free memory from the current
         context, but it can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from atomic contexts or when the
         request is a performance optimization and there is another fallback
         for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non-sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow-path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail, but there is no guarantee of
         that behavior, so failures have to be checked properly by callers
         (e.g. an OOM killer victim is currently allowed to fail).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had this semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer), and a user-defined fallback
      behavior is more sensible than retrying forever in the page allocator.
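      
      A minimal sketch of the intended usage pattern (the function name and
      the size argument are illustrative, not part of the patch):
      
        #include <linux/slab.h>
        #include <linux/vmalloc.h>
        
        /* Prefer a large physically contiguous buffer, but fall back to
         * vmalloc rather than triggering the OOM killer. */
        static void *alloc_big_buffer(size_t size)
        {
                void *buf;
        
                /* try hard, yet accept failure instead of OOM killing */
                buf = kmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
                if (!buf)
                        buf = vmalloc(size);    /* user-defined fallback */
                return buf;
        }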
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/watchdog: introduce arch_touch_nmi_watchdog() · f2e0cff8
      Authored by Nicholas Piggin
      For architectures that define HAVE_NMI_WATCHDOG, instead of having them
      provide the complete touch_nmi_watchdog() function, just have them
      provide arch_touch_nmi_watchdog().
      
      This gives the generic code more flexibility in implementing this
      function, and arch implementations don't miss out on touching the
      softlockup watchdog or other generic details.
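      
      Roughly, the resulting split looks like this (a sketch of the pattern,
      not the exact upstream code):
      
        /* arch code (HAVE_NMI_WATCHDOG) now only supplies this hook */
        void arch_touch_nmi_watchdog(void)
        {
                /* pet the architecture-specific hardlockup detector */
        }
        
        /* generic code composes the full operation, so arch implementations
         * no longer skip the softlockup side */
        static inline void touch_nmi_watchdog(void)
        {
                arch_touch_nmi_watchdog();
                touch_softlockup_watchdog();
        }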
      
      Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Reviewed-by: Babu Moger <babu.moger@oracle.com>
      Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 12 Jul 2017, 1 commit
  5. 11 Jul 2017, 2 commits
  6. 07 Jul 2017, 1 commit
    • mm/hugetlb: add size parameter to huge_pte_offset() · 7868a208
      Authored by Punit Agrawal
      A poisoned or migrated hugepage is stored as a swap entry in the page
      tables.  On architectures that support hugepages consisting of
      contiguous page table entries (such as arm64) this leads to ambiguity
      in determining the page table entry to return from huge_pte_offset()
      when a poisoned entry is encountered.
      
      Let's remove the ambiguity by adding a size parameter to convey
      additional information about the requested address.  Also fix up the
      definition and usage of huge_pte_offset() throughout the tree.
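      
      The shape of the interface change, roughly (the exact prototypes live
      in the per-arch headers; this is a sketch):
      
        /* before: only the mm and the address were available */
        pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr);
        
        /* after: callers also pass the expected hugepage size so that
         * architectures with contiguous-PTE hugepages can pick the right
         * page table level */
        pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
                               unsigned long sz);
        
        /* typical caller, where 'h' is the VMA's hstate */
        pte_t *pte = huge_pte_offset(mm, address, huge_page_size(h));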
      
      Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
      Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Jul 2017, 1 commit
  8. 03 Jul 2017, 1 commit
  9. 30 Jun 2017, 1 commit
  10. 29 Jun 2017, 1 commit
  11. 28 Jun 2017, 3 commits
  12. 26 Jun 2017, 15 commits
  13. 21 Jun 2017, 1 commit
    • net: introduce SO_PEERGROUPS getsockopt · 28b5ba2a
      Authored by David Herrmann
      This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
      retrieve the auxiliary groups of the remote peer. It is designed to
      naturally extend SO_PEERCRED. That is, the underlying data is from the
      same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
      is, if the provided buffer is too small, ERANGE is returned and @optlen
      is updated. Otherwise, the information is copied, @optlen is set to the
      actual size, and 0 is returned.
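      
      A minimal user-space sketch of that calling convention (error handling
      trimmed; the fallback SO_PEERGROUPS define is an assumption for older
      headers):
      
        #include <errno.h>
        #include <stdlib.h>
        #include <sys/types.h>
        #include <sys/socket.h>
        
        #ifndef SO_PEERGROUPS
        #define SO_PEERGROUPS 59        /* assumed value for older headers */
        #endif
        
        /* Return the peer's auxiliary groups on a connected AF_UNIX socket
         * 'fd', or NULL; '*n_out' receives the number of groups. */
        static gid_t *peer_groups(int fd, int *n_out)
        {
                socklen_t len = 0;
                gid_t *groups;
        
                *n_out = 0;
                /* probe with a zero-sized buffer: ERANGE updates 'len' to the
                 * required size, success means the peer has no extra groups */
                if (getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, NULL, &len) == 0)
                        return NULL;
                if (errno != ERANGE)
                        return NULL;    /* real error */
        
                groups = malloc(len);
                if (!groups ||
                    getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, groups, &len) < 0) {
                        free(groups);
                        return NULL;
                }
                *n_out = len / sizeof(gid_t);
                return groups;
        }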
      
      While SO_PEERCRED (and thus `struct ucred') already returns the primary
      group, it lacks the auxiliary group vector. However, nearly all access
      controls (including kernel side VFS and SYSVIPC, but also user-space
      polkit, DBus, ...) consider the entire set of groups, rather than just
      the primary group. But this is currently not possible with pure
      SO_PEERCRED. Instead, user-space has to work around this and query the
      system database for the auxiliary groups of a UID retrieved via
      SO_PEERCRED.
      
      Unfortunately, there is no race-free way to query the auxiliary groups
      of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
      solution is to use getgrouplist(3p), which itself falls back to NSS and
      whatever is configured in nsswitch.conf(3). This effectively checks
      which groups we *would* assign to the user if they logged in *now*. On
      normal systems it is as easy as reading /etc/group, but with NSS it can
      resort to querying network databases (e.g., LDAP), using IPC or network
      communication.
      
      Long story short: Whenever we want to use auxiliary groups for access
      checks on IPC, we need further IPC to talk to the user/group databases,
      rather than just relying on SO_PEERCRED and the incoming socket. This
      is unfortunate, and might even result in deadlocks if the database
      query uses the same IPC as the original request.
      
      So far, those recursions/deadlocks have been avoided by using
      primitive IPC for all crucial NSS modules. However, we want to avoid
      re-inventing the wheel for each NSS module that might be involved in
      user/group queries. Hence, we would preferably make DBus (and other IPC
      that supports access management based on groups) work without resorting
      to the user/group database. The new SO_PEERGROUPS option would allow us
      to make dbus-daemon work without ever calling into NSS.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 20 Jun 2017, 3 commits
  15. 19 Jun 2017, 1 commit
    • mm: larger stack guard gap, between vmas · 1be7107f
      Authored by Hugh Dickins
      The stack guard page is a useful feature to reduce the risk of the stack
      smashing into a different mapping. We have been using a single-page gap,
      which is sufficient to prevent the stack from becoming adjacent to a
      different mapping. But this seems to be insufficient in light of the
      stack usage in userspace. E.g. glibc uses alloca() allocations as large
      as 64kB in many commonly used functions. Others use constructs like
      gid_t buffer[NGROUPS_MAX], which is 256kB, or stack strings with
      MAX_ARG_STRLEN.
      
      This becomes especially dangerous for suid binaries run with the default
      unlimited stack size, because those applications can be tricked into
      consuming a large portion of the stack, and a single glibc call could
      then jump over the guard page. These attacks are not theoretical,
      unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      PAGE_SIZE units), which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somewhat inherent, but it should reduce the attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping the gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
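      
      The core of the new helper looks roughly like this (a sketch;
      stack_guard_gap defaults to 256 pages, i.e. 1MB with 4k pages):
      
        /* gap below a VM_GROWSDOWN stack vma, tunable via stack_guard_gap= */
        unsigned long stack_guard_gap = 256UL << PAGE_SHIFT;
        
        static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
        {
                unsigned long vm_start = vma->vm_start;
        
                if (vma->vm_flags & VM_GROWSDOWN) {
                        vm_start -= stack_guard_gap;
                        if (vm_start > vma->vm_start)   /* underflow check */
                                vm_start = 0;
                }
                return vm_start;
        }
      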
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Jun 2017, 3 commits
  17. 13 Jun 2017, 2 commits