1. 11 Aug 2017, 1 commit
  2. 04 Aug 2017, 1 commit
    • initcall_debug: add deferred probe times · 1f5000bd
      Committed by Todd Poynor
      initcall_debug attributes all deferred device probe retries for the
      late_initcall level to the function deferred_probe_initcall.  Add logs
      of the individual device probe routines called, to identify which
      drivers are executing, and for how long, during the initcall path.
      Deferred probes that occur after initcall processing are not shown.
      
      Example log messages added:
      
      [    0.505119] deferred probe my-sound-device @ 6
      [    0.517656] deferred probe my-sound-device returned after 1227 usecs
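
      A minimal sketch of how such per-probe timing can be taken (the
      function name and placement follow the changelog; the details are
      illustrative, not necessarily the exact patch):

        /* sketch: time one deferred probe when initcall_debug is set */
        static void deferred_probe_debug(struct device *dev)
        {
                ktime_t calltime, rettime;

                printk(KERN_DEBUG "deferred probe %s @ %i\n",
                       dev_name(dev), task_pid_nr(current));
                calltime = ktime_get();
                bus_probe_device(dev);
                rettime = ktime_get();
                printk(KERN_DEBUG "deferred probe %s returned after %lld usecs\n",
                       dev_name(dev), ktime_us_delta(rettime, calltime));
        }
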
      Signed-off-by: Todd Poynor <toddpoynor@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f5000bd
  3. 22 Jul 2017, 3 commits
    • driver core: add devm_device_add_group() and friends · 57b8ff07
      Committed by Dmitry Torokhov
      Many drivers create additional driver-specific device attributes when
      binding to the device, and providing a managed version of
      device_add_group() will simplify unbinding and error handling in the
      probe path for such drivers.

      Without a managed version, driver writers either have to mix manual and
      managed resources, which is prone to errors, or open-code this function
      by providing a wrapper around device_add_group() and using it with
      devm_add_action() or devm_add_action_or_reset().
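
      A sketch of the difference for a hypothetical driver (my_attr_group,
      my_probe and my_remove_group are made-up names for illustration):

        /* open-coded today: pair device_add_group() with a devm action */
        static void my_remove_group(void *data)
        {
                device_remove_group(data, &my_attr_group);
        }

        static int my_probe(struct device *dev)
        {
                int error = device_add_group(dev, &my_attr_group);

                if (error)
                        return error;
                return devm_add_action_or_reset(dev, my_remove_group, dev);
        }

        /* with the new helper this collapses to a single managed call */
        static int my_probe_managed(struct device *dev)
        {
                return devm_device_add_group(dev, &my_attr_group);
        }
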
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57b8ff07
    • driver core: make device_{add|remove}_groups() public · a7670d42
      Committed by Dmitry Torokhov
      Many drivers create additional driver-specific device attributes when
      binding to the device.  To avoid having them call the sysfs API
      directly, let's export these helpers.
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7670d42
    • driver core: emit uevents when device is bound to a driver · 1455cf8d
      Committed by Dmitry Torokhov
      There are certain touch controllers that may come up in either normal
      (application) or boot mode, depending on whether their
      firmware/configuration was corrupted when they were powered on.  In
      boot mode the kernel does not create an input device instance (because
      it does not necessarily know the characteristics of the input device in
      question).

      A number of other controllers do not store firmware in non-volatile
      memory, and they similarly need to have firmware loaded before the
      input device instance is created.  There are also other types of
      devices with similar behavior.
      
      There is a desire to be able to trigger firmware loading via udev, but
      it has to happen only when a driver is bound to a physical device (i2c
      or spi).  These udev actions cannot use ADD events, as those happen too
      early, so we introduce BIND and UNBIND events that are emitted at the
      right moment.
      
      Also, many drivers create additional driver-specific device attributes
      when binding to the device, to provide userspace with additional controls.
      The new events allow userspace to adjust these driver-specific attributes
      without worrying that they are not there yet.
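
      A condensed sketch of the emission point (assuming the KOBJ_BIND action
      this patch introduces; the surrounding code is elided):

        /* sketch: notify userspace once the driver is fully bound */
        static void driver_bound(struct device *dev)
        {
                /* ... existing klist/bus bookkeeping ... */
                kobject_uevent(&dev->kobj, KOBJ_BIND);
        }

      Userspace can then match the new action, e.g. with a hypothetical rule
      such as ACTION=="bind", DRIVER=="my_touch", RUN+="/usr/bin/load-fw",
      provided the running udev understands bind/unbind events.
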
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1455cf8d
  4. 17 Jul 2017, 5 commits
  5. 12 Jul 2017, 2 commits
  6. 11 Jul 2017, 1 commit
  7. 07 Jul 2017, 10 commits
    • Add "shutdown" to "struct class". · f77af151
      Committed by Josh Zimmerman
      The TPM class has some common shutdown code that must be executed for
      all drivers. This adds some needed functionality for that.
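
      A sketch of the hook this adds, with an illustrative class (my_class
      and my_class_shutdown are made up; the .shutdown member is the one
      added here):

        /* sketch: per-class callback run for each device at system shutdown */
        static void my_class_shutdown(struct device *dev)
        {
                /* common cleanup shared by every device of this class */
        }

        static struct class my_class = {
                .name     = "my_class",
                .owner    = THIS_MODULE,
                .shutdown = my_class_shutdown,
        };
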
      Signed-off-by: Josh Zimmerman <joshz@google.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 74d6b3ce ("tpm: fix suspend/resume paths for TPM 2.0")
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      f77af151
    • mm, memory_hotplug: drop CONFIG_MOVABLE_NODE · f70029bb
      Committed by Michal Hocko
      Commit 20b2f52b ("numa: add CONFIG_MOVABLE_NODE for
      movable-dedicated node") introduced CONFIG_MOVABLE_NODE without a good
      explanation of why it is actually useful.

      It makes a lot of sense to make the movable-node semantic opt-in, but
      we already have that, because the feature has to be explicitly enabled
      on the kernel command line.  A config option on top only makes the
      configuration space larger without a good reason.  It also adds
      ifdefery that pollutes the code.

      Just drop the config option and make it de facto always enabled.  This
      shouldn't introduce any change to the semantics.
      
      Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f70029bb
    • mm: vmstat: move slab statistics from zone to node counters · 385386cf
      Committed by Johannes Weiner
      Patch series "mm: per-lruvec slab stats"
      
      Josef is working on a new approach to balancing slab caches and the page
      cache.  For this to work, he needs slab cache statistics on the lruvec
      level.  These patches implement that by adding infrastructure that
      allows updating and reading generic VM stat items per lruvec, then
      switches some existing VM accounting sites, including the slab
      accounting ones, to this new cgroup-aware API.
      
      I'll follow up with more patches on this, because there is actually
      substantial simplification that can be done to the memory controller
      when we replace private memcg accounting with making the existing VM
      accounting sites cgroup-aware.  But this is enough for Josef to base his
      slab reclaim work on, so here goes.
      
      This patch (of 5):
      
      To re-implement slab cache vs.  page cache balancing, we'll need the
      slab counters at the lruvec level, which, ever since lru reclaim was
      moved from the zone to the node, is the intersection of the node, not
      the zone, and the memcg.
      
      We could retain the per-zone counters for when the page allocator dumps
      its memory information on failures, and have counters on both levels -
      which on all but NUMA node 0 is usually redundant.  But let's keep it
      simple for now and just move them.  If anybody complains we can restore
      the per-zone counters.
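
      In code terms this is mostly moving the items between the stat enums
      and switching the accounting call sites; roughly (abridged):

        /* sketch: slab counters become node counters */
        enum node_stat_item {
                NR_LRU_BASE,
                /* ... existing node counters ... */
                NR_SLAB_RECLAIMABLE,    /* moved from enum zone_stat_item */
                NR_SLAB_UNRECLAIMABLE,  /* moved from enum zone_stat_item */
                NR_VM_NODE_STAT_ITEMS
        };

        /* accounting sites then update node, not zone, counters, e.g.: */
        /*   mod_node_page_state(page_pgdat(page), NR_SLAB_RECLAIMABLE, nr); */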
      
      [hannes@cmpxchg.org: fix oops]
        Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org
      Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      385386cf
    • mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone · c246a213
      Committed by Michal Hocko
      Heiko Carstens has noticed that he can generate overlapping zones for
      ZONE_DMA and ZONE_NORMAL:
      
        DMA      [mem 0x0000000000000000-0x000000007fffffff]
        Normal   [mem 0x0000000080000000-0x000000017fffffff]
      
        $ cat /sys/devices/system/memory/block_size_bytes
        10000000
        $ cat /sys/devices/system/memory/memory5/valid_zones
        DMA
        $ echo 0 > /sys/devices/system/memory/memory5/online
        $ cat /sys/devices/system/memory/memory5/valid_zones
        Normal
        $ echo 1 > /sys/devices/system/memory/memory5/online
        Normal
      
        $ cat /proc/zoneinfo
        Node 0, zone      DMA
        spanned  524288        <-----
        present  458752
        managed  455078
        start_pfn:           0 <-----
      
        Node 0, zone   Normal
        spanned  720896
        present  589824
        managed  571648
        start_pfn:           327680 <-----
      
      The reason is that we assume the default zone for kernel onlining is
      ZONE_NORMAL.  This was a simplification introduced by the memory
      hotplug rework, and it is easily fixable by checking the range overlap
      in zone order and considering the first matching zone as the default
      one.  If there is no such zone, assume ZONE_NORMAL, as we have been
      doing so far.
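
      A sketch of the fixed default-zone selection (the helper name and the
      zone_intersects()-style overlap check are assumptions based on the
      description above):

        /* sketch: pick the first kernel zone the range overlaps, else NORMAL */
        static struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
                                                 unsigned long nr_pages)
        {
                struct pglist_data *pgdat = NODE_DATA(nid);
                int zid;

                for (zid = 0; zid <= ZONE_NORMAL; zid++) {
                        struct zone *zone = &pgdat->node_zones[zid];

                        if (zone_intersects(zone, start_pfn, nr_pages))
                                return zone;
                }
                return &pgdat->node_zones[ZONE_NORMAL];
        }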
      
      Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
      Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c246a213
    • mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1
      Committed by Michal Hocko
      The current memory hotplug implementation relies on having all the
      struct pages associated with a zone/node during the physical hotplug
      phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In
      the vast majority of cases this means that they are added to
      ZONE_NORMAL.  This has been so since 9d99aaa3 ("[PATCH] x86_64:
      Support memory hotadd without sparsemem"), and it wasn't a big deal
      back then because movable onlining didn't exist yet.
      
      Much later, memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
      onlining, 511c2aba ("mm, memory-hotplug: dynamic configure movable
      memory and portion memory"), and then things got more complicated.
      Rather than reconsidering the zone association, which was no longer
      needed (because memory hotplug already depended on SPARSEMEM), a
      convoluted semantic of zone shifting was developed.  Only the currently
      last memblock, or the one adjacent to zone_movable, can be onlined
      movable.  This essentially means that the online type changes as new
      memblocks are added.
      
      Let's simulate memory hot online manually
        $ echo 0x100000000 > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory32/valid_zones
        Normal Movable
      
        $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        $ echo online_movable > /sys/devices/system/memory/memory34/state
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable Normal
      
      This is an awkward semantic, because a udev event is sent as soon as
      the block is onlined, and a udev handler might want to online it based
      on some policy (e.g.  association with a node), but it will inherently
      race with new blocks showing up.
      
      This patch changes the physical online phase to not associate pages
      with any zone at all.  All the pages are just marked reserved and wait
      for the onlining phase to be associated with a zone as per the online
      request.  There are only two requirements:

      	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

      	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

      The latter is not an inherent requirement and can be changed in the
      future; it preserves the current behavior and keeps the code slightly
      simpler.
      
      This means that the same physical online steps as above will lead to the
      following state: Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable
      
      Implementation: the current move_pfn_range is reimplemented to check
      the above requirements (allow_online_pfn_range) and then update the
      respective zone (move_pfn_range_to_zone) and the pgdat, and link all
      the pages in the pfn range with the zone/node.  __add_pages is updated
      to not require the zone and only initializes sections in the range.
      This allowed simplifying the arch_add_memory code (s390 could get rid
      of quite some code).
      
      devm_memremap_pages is the only user of arch_add_memory that relies on
      the zone association, because it hooks into memory hotplug only half
      way.  It uses arch_add_memory to associate the new memory with
      ZONE_DEVICE but doesn't allow it to be {on,off}lined via sysfs.  This
      means that this particular code path has to call move_pfn_range_to_zone
      explicitly.
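
      The overlap requirement comes down to a small helper; a sketch
      consistent with the zone_intersects() simplification credited in the
      notes below:

        /* sketch: does [start_pfn, start_pfn + nr_pages) overlap the zone? */
        static bool zone_intersects(struct zone *zone,
                                    unsigned long start_pfn,
                                    unsigned long nr_pages)
        {
                if (zone_is_empty(zone))
                        return false;
                if (start_pfn >= zone_end_pfn(zone) ||
                    start_pfn + nr_pages <= zone->zone_start_pfn)
                        return false;
                return true;
        }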
      
      The original zone shifting code is kept in place and will be removed in
      the follow-up patch, for easier review.
      
      Please note that this patch also changes the original behavior:
      offlining a memory block adjacent to another zone (Normal vs.  Movable)
      used to allow changing its movable type.  This will be handled later.
      
      [richard.weiyang@gmail.com: simplify zone_intersects()]
        Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
      [richard.weiyang@gmail.com: remove duplicate call for set_page_links]
        Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
      [akpm@linux-foundation.org: remove unused local `i']
      Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f1dd2cd1
    • mm, memory_hotplug: consider offline memblocks removable · 8b0662f2
      Committed by Michal Hocko
      is_pageblock_removable_nolock() relies on having a zone association in
      order to examine all the page blocks and check whether they are movable
      or free.  This is just a waste of cycles when the memblock is offline.
      A later patch in the series will also change when the page is
      associated with a zone, so let's bail out early if the memblock is
      offline.
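
      A sketch of the early bail-out in the sysfs "removable" handler
      (structure approximated; an offline block is simply reported
      removable):

        /* sketch: an offline block needs no per-pageblock scan */
        static ssize_t show_mem_removable(struct device *dev,
                                          struct device_attribute *attr,
                                          char *buf)
        {
                struct memory_block *mem = to_memory_block(dev);
                int ret = 1;

                if (mem->state != MEM_ONLINE)
                        goto out;

                /* ... walk pageblocks via is_pageblock_removable_nolock() ... */
        out:
                return sprintf(buf, "%d\n", ret);
        }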
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-7-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b0662f2
    • mm, memory_hotplug: split up register_one_node() · 9037a993
      Committed by Michal Hocko
      Memory hotplug (add_memory_resource) has to reinitialize node
      infrastructure if the node is offline (one which went through the
      complete add_memory(); remove_memory() cycle).  That involves node
      registration to the kobj infrastructure (register_node), the proper
      association with cpus (register_cpu_under_node) and finally creation of
      node<->memblock symlinks (link_mem_sections).
      
      The last part requires knowing node_start_pfn and node_spanned_pages,
      which we currently have, but a later patch will postpone this
      initialization to the onlining phase, which happens later.  In fact, we
      do not need to rely on the early pgdat initialization even now, because
      the hot-added pfn range is already known.

      Split register_one_node into a core which does all the common work for
      the boot-time NUMA initialization and for hotplug (__register_one_node).
      register_one_node keeps the full initialization, while hotplug calls
      __register_one_node and manually calls link_mem_sections for the proper
      range.
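
      A sketch of the resulting split, using the names from the changelog
      (exact signatures approximated):

        /* sketch: the boot path keeps the full initialization */
        static inline int register_one_node(int nid)
        {
                struct pglist_data *pgdat = NODE_DATA(nid);
                int error;

                error = __register_one_node(nid); /* common kobject/cpu setup */
                if (error)
                        return error;

                /* boot time already knows the span, so link memblocks here */
                return link_mem_sections(nid, pgdat->node_start_pfn,
                                         pgdat->node_spanned_pages);
        }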
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-6-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9037a993
    • mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec
      Committed by Michal Hocko
      Device memory hotplug hooks into regular memory hotplug only half way.
      It needs memory sections to track struct pages, but there is no
      need/desire to associate those sections with memory blocks and export
      them to userspace via sysfs, because they cannot be onlined anyway.

      This is currently expressed by the for_device argument to
      arch_add_memory, which then makes sure to associate the given memory
      range with ZONE_DEVICE.  register_new_memory then relies on
      is_zone_device_section to distinguish special memory hotplug from the
      regular kind.  While this works now, later patches in this series want
      to move __add_zone out of the arch_add_memory path, so we have to come
      up with something else.

      Pass want_memblock down the __add_pages path and use it to control
      whether the section<->memblock association should be done.
      arch_add_memory then trivially wants a memblock for everything but
      for_device hotplug.
      
      remove_memory_section doesn't need is_zone_device_section either.  We
      can simply skip all the memblock specific cleanup if there is no
      memblock for the given section.
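
      A condensed sketch of the plumbing (surrounding section setup elided):

        /* sketch: only regular hotplug wants a memory block for the section */
        static int __add_section(int nid, unsigned long phys_start_pfn,
                                 bool want_memblock)
        {
                /* ... sparse_add_one_section() and page initialization ... */

                if (!want_memblock)
                        return 0; /* e.g. ZONE_DEVICE: no sysfs memory block */

                return register_new_memory(nid,
                                           __pfn_to_section(phys_start_pfn));
        }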
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b862aec
    • mm: drop page_initialized check from get_nid_for_pfn · bfe63d3b
      Committed by Michal Hocko
      Commit c04fc586 ("mm: show node to memory section relationship with
      symlinks in sysfs") added a means to export the memblock<->node
      association into sysfs.  It also introduced get_nid_for_pfn, which is a
      rather confusing counterpart of pfn_to_nid that also checks whether the
      pfn's page is already initialized (page_initialized).

      This is done by checking page::lru != NULL, which doesn't make any
      sense at all.  Nothing in this path really relies on the lru list being
      used or initialized.  Just remove it, because it will become a problem
      with later patches.
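
      After the removal the helper reduces to a plain translation; a sketch:

        /* sketch: no more page->lru based "initialized" guessing */
        static int get_nid_for_pfn(unsigned long pfn)
        {
                if (!pfn_valid_within(pfn))
                        return -1;
                return pfn_to_nid(pfn);
        }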
      
      Thanks to Reza Arbab for testing which revealed this to be a problem
      (http://lkml.kernel.org/r/20170403202337.GA12482@dhcp22.suse.cz)
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bfe63d3b
    • mm, sparsemem: break out of loops early · c4e1be9e
      Committed by Dave Hansen
      There are a number of times that we loop over NR_MEM_SECTIONS, looking
      for section_present() on each section.  But, when we have very large
      physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS
      becomes very large, making the loops quite long.
      
      With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops
      are 512k iterations, which we barely notice on modern hardware.  But
      raising MAX_PHYSMEM_BITS higher (as we will see on systems that support
      5-level paging) makes this 64x longer, and we start to notice,
      especially on slower systems like simulators.  A 10-second delay for
      512k iterations is annoying, but a 640-second delay is crippling.
      
      This does not help if we have extremely sparse physical address spaces,
      but those are quite rare.  We expect that most of the "slow" systems
      where this matters will also be quite small and non-sparse.
      
      To fix this, we track the highest section we've ever encountered.  This
      lets us know when we will *never* see another section_present(), and
      lets us break out of the loops earlier.
      
      Doing the whole for_each_present_section_nr() macro is probably
      overkill, but it will ensure that any future loop iterations that we
      grow are more likely to be correct.
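
      A sketch of the tracking and the iterator, close to the shape described
      above (next_present_section_nr is the companion helper this implies):

        /* sketch: remember the highest section ever marked present */
        static unsigned long __highest_present_section_nr;

        static void section_mark_present(struct mem_section *ms)
        {
                unsigned long section_nr = __section_nr(ms);

                if (section_nr > __highest_present_section_nr)
                        __highest_present_section_nr = section_nr;

                ms->section_mem_map |= SECTION_MARKED_PRESENT;
        }

        /* loops stop at the high-water mark instead of NR_MEM_SECTIONS */
        #define for_each_present_section_nr(start, section_nr)           \
                for (section_nr = next_present_section_nr((start) - 1);  \
                     section_nr >= 0 &&                                  \
                     section_nr <= __highest_present_section_nr;         \
                     section_nr = next_present_section_nr(section_nr))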
      
      Kirill said, "It shaved almost 40 seconds from boot time in qemu with
      5-level paging enabled for me".
      
      Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.com
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4e1be9e
  8. 05 Jul 2017, 2 commits
  9. 03 Jul 2017, 1 commit
  10. 01 Jul 2017, 1 commit
    • drivers: dma-mapping: allow dma_common_mmap() for NOMMU · 07c75d7a
      Committed by Vladimir Murzin
      Currently, the internals of dma_common_mmap() are compiled out if the
      build is done for either NOMMU or a target which explicitly says it
      does not have/want coherent DMA mmap.  It turns out that
      dma_common_mmap() can be handy in a NOMMU setup (at least for ARM).

      This patch converts the existing NOMMU targets to use
      ARCH_NO_COHERENT_DMA_MMAP, so that when the CONFIG_MMU check is gone
      from dma_common_mmap() their behaviour stays unchanged.

      ARM is not converted to ARCH_NO_COHERENT_DMA_MMAP because it 1) already
      has an mmap callback which can handle (to some extent) NOMMU and 2)
      already defines a dummy pgprot_noncached() for NOMMU builds.
      
      c6x and frv stay untouched since they already have ARCH_NO_COHERENT_DMA_MMAP.
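
      The mechanical part is the preprocessor guard in dma_common_mmap(); a
      sketch of the before/after (signature abridged; the Kconfig symbol
      spelling is an assumption):

        int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
                            void *cpu_addr, dma_addr_t dma_addr, size_t size)
        {
                int ret = -ENXIO;
        /* was: #if defined(CONFIG_MMU) && \
         *          !defined(CONFIG_ARCH_NO_COHERENT_DMA_MMAP) */
        #ifndef CONFIG_ARCH_NO_COHERENT_DMA_MMAP
                /* ... remap_pfn_range() based implementation ... */
        #endif
                return ret;
        }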
      
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
      07c75d7a
  11. 29 Jun 2017, 7 commits
  12. 28 Jun 2017, 6 commits