提交 · c7d28eca1d58d335ff8de6f33559b221bdd029f9 · openanolis / cloud-kernel

07 7月, 2017 9 次提交

mm, memory_hotplug: drop CONFIG_MOVABLE_NODE · f70029bb

由 Michal Hocko 提交于 7月 06, 2017

Commit 20b2f52b ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation on why it is actually useful.

It makes a lot of sense to make movable node semantic opt in but we
already have that because the feature has to be explicitly enabled on
the kernel command line.  A config option on top only makes the
configuration space larger without a good reason.  It also adds an
additional ifdefery that pollutes the code.

Just drop the config option and make it de-facto always enabled.  This
shouldn't introduce any change to the semantic.

Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NReza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f70029bb

mm: vmstat: move slab statistics from zone to node counters · 385386cf

由 Johannes Weiner 提交于 7月 06, 2017

Patch series "mm: per-lruvec slab stats"

Josef is working on a new approach to balancing slab caches and the page
cache.  For this to work, he needs slab cache statistics on the lruvec
level.  These patches implement that by adding infrastructure that
allows updating and reading generic VM stat items per lruvec, then
switches some existing VM accounting sites, including the slab
accounting ones, to this new cgroup-aware API.

I'll follow up with more patches on this, because there is actually
substantial simplification that can be done to the memory controller
when we replace private memcg accounting with making the existing VM
accounting sites cgroup-aware.  But this is enough for Josef to base his
slab reclaim work on, so here goes.

This patch (of 5):

To re-implement slab cache vs.  page cache balancing, we'll need the
slab counters at the lruvec level, which, ever since lru reclaim was
moved from the zone to the node, is the intersection of the node, not
the zone, and the memcg.

We could retain the per-zone counters for when the page allocator dumps
its memory information on failures, and have counters on both levels -
which on all but NUMA node 0 is usually redundant.  But let's keep it
simple for now and just move them.  If anybody complains we can restore
the per-zone counters.

[hannes@cmpxchg.org: fix oops]
  Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org
Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

385386cf

mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone · c246a213

由 Michal Hocko 提交于 7月 06, 2017

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:

  DMA      [mem 0x0000000000000000-0x000000007fffffff]
  Normal   [mem 0x0000000080000000-0x000000017fffffff]

  $ cat /sys/devices/system/memory/block_size_bytes
  10000000
  $ cat /sys/devices/system/memory/memory5/valid_zones
  DMA
  $ echo 0 > /sys/devices/system/memory/memory5/online
  $ cat /sys/devices/system/memory/memory5/valid_zones
  Normal
  $ echo 1 > /sys/devices/system/memory/memory5/online
  Normal

  $ cat /proc/zoneinfo
  Node 0, zone      DMA
  spanned  524288        <-----
  present  458752
  managed  455078
  start_pfn:           0 <-----

  Node 0, zone   Normal
  spanned  720896
  present  589824
  managed  571648
  start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL.  This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one.  If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c246a213

mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1

由 Michal Hocko 提交于 7月 06, 2017

The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed.  Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable.  This essentially means that the online type changes as
the new memblocks are added.

Let's simulate memory hot online manually
  $ echo 0x100000000 > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory32/valid_zones
  Normal Movable

  $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  $ echo online_movable > /sys/devices/system/memory/memory34/state
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g.  association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages with
any zone at all.  All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request.  There are only two requirements

	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

the latter one is not an inherent requirement and can be changed in the
future.  It preserves the current behavior and made the code slightly
simpler.  This is subject to change in future.

This means that the same physical online steps as above will lead to the
following state: Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node.  __add_pages is updated to not require the
zone and only initializes sections in the range.  This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).

devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way.  It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs.  This means that this
particular code path has to call move_pfn_range_to_zone explicitly.

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs.  Movable)
used to allow to change its movable type.  This will be handled later.

[richard.weiyang@gmail.com: simplify zone_intersects()]
  Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
  Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
Tested-by: NDan Williams <dan.j.williams@intel.com>
Tested-by: NReza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f1dd2cd1

mm, memory_hotplug: consider offline memblocks removable · 8b0662f2

由 Michal Hocko 提交于 7月 06, 2017

is_pageblock_removable_nolock() relies on having zone association to
examine all the page blocks to check whether they are movable or free.
This is just wasting of cycles when the memblock is offline.  Later
patch in the series will also change the time when the page is
associated with a zone so we let's bail out early if the memblock is
offline.

Link: http://lkml.kernel.org/r/20170515085827.16474-7-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NIgor Mammedov <imammedo@redhat.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b0662f2

mm, memory_hotplug: split up register_one_node() · 9037a993

由 Michal Hocko 提交于 7月 06, 2017

Memory hotplug (add_memory_resource) has to reinitialize node
infrastructure if the node is offline (one which went through the
complete add_memory(); remove_memory() cycle).  That involves node
registration to the kobj infrastructure (register_node), the proper
association with cpus (register_cpu_under_node) and finally creation of
node<->memblock symlinks (link_mem_sections).

The last part requires to know node_start_pfn and node_spanned_pages
which we currently have but a leter patch will postpone this
initialization to the onlining phase which happens later.  In fact we do
not need to rely on the early pgdat initialization even now because the
currently hot added pfn range is currently known.

Split register_one_node into core which does all the common work for the
boot time NUMA initialization and the hotplug (__register_one_node).
register_one_node keeps the full initialization while hotplug calls
__register_one_node and manually calls link_mem_sections for the proper
range.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-6-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9037a993

mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec

由 Michal Hocko 提交于 7月 06, 2017

Device memory hotplug hooks into regular memory hotplug only half way.
It needs memory sections to track struct pages but there is no
need/desire to associate those sections with memory blocks and export
them to the userspace via sysfs because they cannot be onlined anyway.

This is currently expressed by for_device argument to arch_add_memory
which then makes sure to associate the given memory range with
ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
to distinguish special memory hotplug from the regular one.  While this
works now, later patches in this series want to move __add_zone outside
of arch_add_memory path so we have to come up with something else.

Add want_memblock down the __add_pages path and use it to control
whether the section->memblock association should be done.
arch_add_memory then just trivially want memblock for everything but
for_device hotplug.

remove_memory_section doesn't need is_zone_device_section either.  We
can simply skip all the memblock specific cleanup if there is no
memblock for the given section.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Tested-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b862aec

mm: drop page_initialized check from get_nid_for_pfn · bfe63d3b

由 Michal Hocko 提交于 7月 06, 2017

Commit c04fc586 ("mm: show node to memory section relationship with
symlinks in sysfs") has added means to export memblock<->node
association into the sysfs.  It has also introduced get_nid_for_pfn
which is a rather confusing counterpart of pfn_to_nid which checks also
whether the pfn page is already initialized (page_initialized).

This is done by checking page::lru != NULL which doesn't make any sense
at all.  Nothing in this path really relies on the lru list being used
or initialized.  Just remove it because this will become a problem with
later patches.

Thanks to Reza Arbab for testing which revealed this to be a problem
(http://lkml.kernel.org/r/20170403202337.GA12482@dhcp22.suse.cz)

Link: http://lkml.kernel.org/r/20170515085827.16474-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bfe63d3b

mm, sparsemem: break out of loops early · c4e1be9e

由 Dave Hansen 提交于 7月 06, 2017

There are a number of times that we loop over NR_MEM_SECTIONS, looking
for section_present() on each section. But, when we have very large
physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS
becomes very large, making the loops quite long.

With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops
are 512k iterations, which we barely notice on modern hardware. But,
raising MAX_PHYSMEM_BITS higher (like we will see on systems that
support 5-level paging) makes this 64x longer and we start to notice,
especially on slower systems like simulators. A 10-second delay for
512k iterations is annoying. But, a 640- second delay is crippling.

This does not help if we have extremely sparse physical address spaces,
but those are quite rare. We expect that most of the "slow" systems
where this matters will also be quite small and non-sparse.

To fix this, we track the highest section we've ever encountered. This
lets us know when we will *never* see another section_present(), and
lets us break out of the loops earlier.

Doing the whole for_each_present_section_nr() macro is probably
overkill, but it will ensure that any future loop iterations that we
grow are more likely to be correct.

Kirrill said "It shaved almost 40 seconds from boot time in qemu with
5-level paging enabled for me".

Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.comSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Tested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c4e1be9e

03 7月, 2017 1 次提交

platform: Accept const properties · 277036f0

由 Jan Kiszka 提交于 6月 02, 2017

Aligns us with device_add_properties, the function we call.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>

277036f0

01 7月, 2017 1 次提交

drivers: dma-mapping: allow dma_common_mmap() for NOMMU · 07c75d7a

由 Vladimir Murzin 提交于 6月 28, 2017

Currently, internals of dma_common_mmap() is compiled out if build is
done for either NOMMU or target which explicitly says it does not
have/want coherent DMA mmap. It turned out that dma_common_mmap() can
be handy in NOMMU setup (at least for ARM).

This patch converts exitent NOMMU targets to use ARCH_NO_COHERENT_DMA_MMAP,
thus when CONFIG_MMU is gone from dma_common_mmap() their behaviour stays
unchanged.

ARM is not converted to ARCH_NO_COHERENT_DMA_MMAP because it 1)
already has mmap callback which can handle (at some extent) NOMMU 2)
already defines dummy pgprot_noncached() for NOMMU build.

c6x and frv stay untouched since they already have ARCH_NO_COHERENT_DMA_MMAP.

Cc: Steven Miao <realmz6@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
Tested-by: NBenjamin Gaignard <benjamin.gaignard@linaro.org>

07c75d7a

29 6月, 2017 7 次提交

PM / Domains: Fix missing default_power_down_ok comment · 268cd2ed

由 Krzysztof Kozlowski 提交于 6月 28, 2017

Commit fc5cbf0c (PM / Domains: Support for multiple states) split
out some code out of default_power_down_ok() function so the
documentation has to be moved to appropriate place.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

268cd2ed

PM / Domains: Fix unsafe iteration over modified list of domains · a7e2d1bc

由 Krzysztof Kozlowski 提交于 6月 28, 2017

of_genpd_remove_last() iterates over list of domains and removes
matching element thus it has to use safe version of list iteration.

Fixes: 17926551 (PM / Domains: Add support for removing nested PM domains by provider)
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a7e2d1bc

PM / Domains: Fix unsafe iteration over modified list of domain providers · b556b15d

由 Krzysztof Kozlowski 提交于 6月 28, 2017

of_genpd_del_provider() iterates over list of domain provides and
removes matching element thus it has to use safe version of list
iteration.

Fixes: aa42240a (PM / Domains: Add generic OF-based PM domain look-up)
Cc: 3.19+ <stable@vger.kernel.org> # 3.19+
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

b556b15d

PM / Domains: Fix unsafe iteration over modified list of device links · c6e83cac

由 Krzysztof Kozlowski 提交于 6月 28, 2017

pm_genpd_remove_subdomain() iterates over domain's master_links list and
removes matching element thus it has to use safe version of list
iteration.

Fixes: f721889f ("PM / Domains: Support for generic I/O PM domains (v8)")
Cc: 3.1+ <stable@vger.kernel.org> # 3.1+
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c6e83cac

PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device · 8b55e55e

由 Krzysztof Kozlowski 提交于 6月 28, 2017

genpd_syscore_switch() had two problems:
 1. It silently assumed that device, it is being called for, belongs
    to generic power domain and used container_of() on its power
    domain pointer.  Such assumption might not be true always.

2. It iterated over list of generic power domains without holding
   gpd_list_lock mutex thus list could have been modified at the same
   time.

Usage of genpd_lookup_dev() solves both problems as it is safe a call
for non-generic power domains and uses mutex when iterating.
Reported-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8b55e55e

PM / Domains: Call driver's noirq callbacks · 10da6542

由 Mikko Perttunen 提交于 6月 22, 2017

Currently genpd installs its own noirq callbacks, but never calls down
to the driver's corresponding callbacks. Add these calls.
Signed-off-by: NMikko Perttunen <mperttunen@nvidia.com>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

10da6542

regmap: irq: add chip option mask_writeonly · a71411db

由 Michael Grzeschik 提交于 6月 23, 2017

Some irq controllers have writeonly/multipurpose register layouts. In
those cases we read invalid data back. Here we add the option
mask_writeonly as masking option.
Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: NMark Brown <broonie@kernel.org>

a71411db

28 6月, 2017 9 次提交

drivers: dma-coherent: Introduce default DMA pool · 93228b44

由 Vladimir Murzin 提交于 6月 26, 2017

This patch introduces default coherent DMA pool similar to default CMA
area concept. To keep other users safe code kept under CONFIG_ARM.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: NRobin Murphy <robin.murphy@arm.com>
Tested-by: NBenjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: NAndras Szemzo <sza@esh.hu>
Tested-by: NAlexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

93228b44

drivers: dma-coherent: Account dma_pfn_offset when used with device tree · c41f9ea9

由 Vladimir Murzin 提交于 6月 26, 2017

dma_declare_coherent_memory() and friends are designed to account
difference in CPU and device addresses. However, when it is used with
reserved memory regions there is assumption that CPU and device have
the same view on address space. This assumption gets invalid when
reserved memory for coherent DMA allocations is referenced by device
with non-empty "dma-range" property.

Simply feeding device address as rmem->base + dev->dma_pfn_offset
would not work due to reserved memory region can be shared, so this
patch turns device address to be expressed with help of CPU address
and device's dma_pfn_offset in case memory reservation has been done
via device tree; non device tree users continue to use the old scheme.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Roger Quadros <rogerq@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: NBenjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: NAndras Szemzo <sza@esh.hu>
Tested-by: NAlexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

c41f9ea9

dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs · 63d36c95

由 Christoph Hellwig 提交于 6月 12, 2017

dmam_alloc_noncoherent is a trivial wrapper around dmam_alloc_attrs,
that hardcodes one particular flag.  Make the devres code more
flexible by allowing the callers to pass arbitrary flags.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NTejun Heo <tj@kernel.org>

63d36c95

dma-mapping: remove dmam_free_noncoherent · 03b64386

由 Christoph Hellwig 提交于 6月 12, 2017

This function was never used since it was added.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NTejun Heo <tj@kernel.org>

03b64386

PM / QoS: constify *_attribute_group. · dbb1d8b7

由 Arvind Yadav 提交于 6月 22, 2017

File size before:
   text	   data	    bss	    dec	    hex	filename
   3890	   1152	      8	   5050	   13ba	drivers/base/power/sysfs.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   4250	    800	      8	   5058	   13c2	drivers/base/power/sysfs.o
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

dbb1d8b7

PM / sysfs: Constify attribute groups · cbcba35d

由 Krzysztof Kozlowski 提交于 6月 12, 2017

Local instances of struct attribute_group are not modified so they can
be made const to increase code safeness.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

cbcba35d

PM: Constify info string used in messages · e3771fa9

由 Krzysztof Kozlowski 提交于 6月 12, 2017

The 'info' string appearing in many places points to a .rodata string so
it should be passes as pointer to const.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e3771fa9

PM: Constify returned PM event name · 952856db

由 Krzysztof Kozlowski 提交于 6月 12, 2017

The pm_verb() returns a pointer to string from .rodata so it should be
marked as const.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

952856db

PM / Domains: Constify genpd pointer · d8600c8b

由 Krzysztof Kozlowski 提交于 6月 12, 2017

Mark pointer to struct generic_pm_domain const (either passed in
argument or used localy in a function), whenever it is not modifed by
the function itself.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d8600c8b

27 6月, 2017 1 次提交

PM / wakeirq: Convert to SRCU · ea0212f4

由 Thomas Gleixner 提交于 6月 25, 2017

The wakeirq infrastructure uses RCU to protect the list of wakeirqs. That
breaks the irq bus locking infrastructure, which is allows sleeping
functions to be called so interrupt controllers behind slow busses,
e.g. i2c, can be handled.

The wakeirq functions hold rcu_read_lock and call into irq functions, which
in case of interrupts using the irq bus locking will trigger a
might_sleep() splat.

Convert the wakeirq infrastructure to Sleepable RCU and unbreak it.

Fixes: 4990d4fe (PM / Wakeirq: Add automated device wake IRQ handling)
Reported-by: NBrian Norris <briannorris@chromium.org>
Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NTony Lindgren <tony@atomide.com>
Tested-by: NBrian Norris <briannorris@chromium.org>
Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ea0212f4

24 6月, 2017 1 次提交

PM / OPP: Add dev_pm_opp_{set|put}_clkname() · 829a4e8c

由 Viresh Kumar 提交于 6月 21, 2017

In order to support OPP switching, OPP layer needs to get pointer to the
clock for the device. Simple cases work fine without using the routines
added by this patch (i.e.  by passing connection-id as NULL), but for a
device with multiple clocks available, the OPP core needs to know the
exact name of the clk to use.

Add a new set of APIs to get that done.
Tested-by: NRajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

829a4e8c

23 6月, 2017 1 次提交

irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access · 96f0d93a

由 Marc Zyngier 提交于 6月 22, 2017

Now that we have irq_domain_update_bus_token(), switch everyone over
to it. The debugfs code thanks you for your continued support.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

96f0d93a

22 6月, 2017 5 次提交

PM / OPP: Don't create debugfs "supply-0" directory unnecessarily · 1fae788e

由 Viresh Kumar 提交于 5月 23, 2017

We create "supply-0" debugfs directory even if the device doesn't do
voltage scaling. That looks confusing, as if the regulator is found but
we never managed to get voltage levels for it.

Avoid creating such a directory unnecessarily.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1fae788e

PM / OPP: opp-microvolt is not optional if regulators are set · 688a48b0

由 Viresh Kumar 提交于 5月 23, 2017

If dev_pm_opp_set_regulators() is called for a device and its regulators
are set in the OPP core, the OPP nodes for the device must contain the
"opp-microvolt" property, otherwise there is something wrong and we
better error out.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

688a48b0

PM / OPP: Don't create copy of regulators unnecessarily · 478256bd

由 Viresh Kumar 提交于 5月 23, 2017

This code was required while the OPP core was managed with help of RCUs,
but not anymore. Get rid of unnecessary alloc/memcpy operations.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

478256bd

PM / OPP: Reorganize _generic_set_opp_regulator() · c74b32fa

由 Viresh Kumar 提交于 5月 23, 2017

The code was overly complicated here because of the limitations that we
had with RCUs (Couldn't use opp-table and OPPs outside RCU protected
section and can't call sleep-able routines from within that). But that
is long gone now.

Reorganize _generic_set_opp_regulator() in order to avoid using "struct
dev_pm_set_opp_data" and copying data into it for the case where
opp_table->set_opp is not set.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c74b32fa

PM / Domains: pdd->dev can't be NULL in genpd_dev_pm_qos_notifier() · b4883ca4

由 Viresh Kumar 提交于 5月 16, 2017

The pm_domain_data (pdd) pointer is set from genpd_alloc_dev_data() and
pdd->dev is guaranteed to be valid. There is no need to check pdd and
pdd->dev in rest of the code as pdd->dev will always be valid for a non
NULL pdd pointer.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

b4883ca4

15 6月, 2017 2 次提交

ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle · 33e4f80e

由 Rafael J. Wysocki 提交于 6月 12, 2017

The ACPI SCI (System Control Interrupt) is set up as a wakeup IRQ
during suspend-to-idle transitions and, consequently, any events
signaled through it wake up the system from that state.  However,
on some systems some of the events signaled via the ACPI SCI while
suspended to idle should not cause the system to wake up.  In fact,
quite often they should just be discarded.

Arguably, systems should not resume entirely on such events, but in
order to decide which events really should cause the system to resume
and which are spurious, it is necessary to resume up to the point
when ACPI SCIs are actually handled and processed, which is after
executing dpm_resume_noirq() in the system resume path.

For this reasons, add a loop around freeze_enter() in which the
platforms can process events signaled via multiplexed IRQ lines
like the ACPI SCI and add suspend-to-idle hooks that can be
used for this purpose to struct platform_freeze_ops.

In the ACPI case, the ->wake hook is used for checking if the SCI
has triggered while suspended and deferring the interrupt-induced
system wakeup until the events signaled through it are actually
processed sufficiently to decide whether or not the system should
resume.  In turn, the ->sync hook allows all of the relevant event
queues to be flushed so as to prevent events from being missed due
to race conditions.

In addition to that, some ACPI code processing wakeup events needs
to be modified to use the "hard" version of wakeup triggers, so that
it will cause a system resume to happen on device-induced wakeup
events even if the "soft" mechanism to prevent the system from
suspending is not enabled.  However, to preserve the existing
behavior with respect to suspend-to-RAM, this only is done in
the suspend-to-idle case and only if an SCI has occurred while
suspended.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

33e4f80e

PM / sleep: Print timing information if debug is enabled · 604d8958

由 Rafael J. Wysocki 提交于 6月 12, 2017

Avoid printing the device suspend/resume timing information if
CONFIG_PM_DEBUG is not set to reduce the log noise level.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

604d8958

13 6月, 2017 3 次提交

PM / Domains: Allow overriding the ->xlate() callback · 40845524

由 Thierry Reding 提交于 3月 29, 2017

Allow generic power domain providers to override the ->xlate() callback
in case the default genpd_xlate_onecell() translation callback is not
good enough.

One potential use-case for this is to allow generic power domains to be
specified by an ID rather than an index.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NThierry Reding <treding@nvidia.com>

40845524

driver core: fix automatic pinctrl management · ed032c1b

由 Johan Hovold 提交于 6月 06, 2017

Commit ab78029e ("drivers/pinctrl: grab default handles from device
core") added automatic pin-control management to driver core by looking
up and setting any default pinctrl state found in device tree while a
device is being probed.

This obviously runs into problems as soon as device-tree nodes are
reused for child devices which are later also probed as pins would
already have been claimed by the ancestor device.

For example if a USB host controller claims a pin, its root hub would
consequently fail to probe when its device-tree node is set to the node
of the controller:

    pinctrl-single 48002030.pinmux: pin PIN204 already requested by 48064800.ehci; cannot claim for usb1
    pinctrl-single 48002030.pinmux: pin-204 (usb1) status -22
    pinctrl-single 48002030.pinmux: could not request pin 204 (PIN204) from group usb_dbg_pins  on device pinctrl-single
    usb usb1: Error applying setting, reverse things back
    usb: probe of usb1 failed with error -22

Fix this by checking the new of_node_reused flag and skipping automatic
pinctrl configuration during probe if set.

Note that the flag is checked in driver core rather than in pinctrl
(e.g. in pinctrl_dt_to_map()) which would specifically have prevented
intentional use of a parent's pinctrl properties by a child device
(should such a need ever arise).

Fixes: ab78029e ("drivers/pinctrl: grab default handles from device core")
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Signed-off-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ed032c1b

driver core: add helper to reuse a device-tree node · 4e75e1d7

由 Johan Hovold 提交于 6月 06, 2017

Add a helper function to be used when reusing the device-tree node of
another device.

It is fairly common for drivers to reuse the device-tree node of a
parent (or other ancestor) device when creating class or bus devices
(e.g. gpio chips, i2c adapters, iio chips, spi masters, serdev, phys,
usb root hubs). But reusing a device-tree node may cause problems if the
new device is later probed as for example driver core would currently
attempt to reinitialise an already active associated pinmux
configuration.

Other potential issues include the platform-bus code unconditionally
dropping the device-tree node reference in its device destructor,
reinitialisation of other bus-managed resources such as clocks, and the
recently added DMA-setup in driver core.

Note that for most examples above this is currently not an issue as the
devices are never probed, but this is a problem for the USB bus which
has recently gained device-tree support. This was discovered and
worked-around in a rather ad-hoc fashion by commit dc5878ab ("usb:
core: move root hub's device node assignment after it is added to bus")
by not setting the of_node pointer until after the root-hub device has
been registered.

Instead we can allow devices to reuse a device-tree node by setting a
flag in their struct device that can be used by core, bus and driver
code to avoid resources from being over-allocated.

Note that the helper also grabs an extra reference to the device node,
which specifically balances the unconditional put in the platform-device
destructor.
Signed-off-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4e75e1d7

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功