提交 · c7d28eca1d58d335ff8de6f33559b221bdd029f9 · openanolis / cloud-kernel

07 7月, 2017 11 次提交

mm, memory_hotplug: drop CONFIG_MOVABLE_NODE · f70029bb

由 Michal Hocko 提交于 7月 06, 2017

Commit 20b2f52b ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation on why it is actually useful.

It makes a lot of sense to make movable node semantic opt in but we
already have that because the feature has to be explicitly enabled on
the kernel command line.  A config option on top only makes the
configuration space larger without a good reason.  It also adds an
additional ifdefery that pollutes the code.

Just drop the config option and make it de-facto always enabled.  This
shouldn't introduce any change to the semantic.

Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NReza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f70029bb

mm: vmstat: move slab statistics from zone to node counters · 385386cf

由 Johannes Weiner 提交于 7月 06, 2017

Patch series "mm: per-lruvec slab stats"

Josef is working on a new approach to balancing slab caches and the page
cache.  For this to work, he needs slab cache statistics on the lruvec
level.  These patches implement that by adding infrastructure that
allows updating and reading generic VM stat items per lruvec, then
switches some existing VM accounting sites, including the slab
accounting ones, to this new cgroup-aware API.

I'll follow up with more patches on this, because there is actually
substantial simplification that can be done to the memory controller
when we replace private memcg accounting with making the existing VM
accounting sites cgroup-aware.  But this is enough for Josef to base his
slab reclaim work on, so here goes.

This patch (of 5):

To re-implement slab cache vs.  page cache balancing, we'll need the
slab counters at the lruvec level, which, ever since lru reclaim was
moved from the zone to the node, is the intersection of the node, not
the zone, and the memcg.

We could retain the per-zone counters for when the page allocator dumps
its memory information on failures, and have counters on both levels -
which on all but NUMA node 0 is usually redundant.  But let's keep it
simple for now and just move them.  If anybody complains we can restore
the per-zone counters.

[hannes@cmpxchg.org: fix oops]
  Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org
Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

385386cf

mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone · c246a213

由 Michal Hocko 提交于 7月 06, 2017

Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:

  DMA      [mem 0x0000000000000000-0x000000007fffffff]
  Normal   [mem 0x0000000080000000-0x000000017fffffff]

  $ cat /sys/devices/system/memory/block_size_bytes
  10000000
  $ cat /sys/devices/system/memory/memory5/valid_zones
  DMA
  $ echo 0 > /sys/devices/system/memory/memory5/online
  $ cat /sys/devices/system/memory/memory5/valid_zones
  Normal
  $ echo 1 > /sys/devices/system/memory/memory5/online
  Normal

  $ cat /proc/zoneinfo
  Node 0, zone      DMA
  spanned  524288        <-----
  present  458752
  managed  455078
  start_pfn:           0 <-----

  Node 0, zone   Normal
  spanned  720896
  present  589824
  managed  571648
  start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL.  This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one.  If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c246a213

mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1

由 Michal Hocko 提交于 7月 06, 2017

The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed.  Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable.  This essentially means that the online type changes as
the new memblocks are added.

Let's simulate memory hot online manually
  $ echo 0x100000000 > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory32/valid_zones
  Normal Movable

  $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  $ echo online_movable > /sys/devices/system/memory/memory34/state
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g.  association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages with
any zone at all.  All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request.  There are only two requirements

	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

the latter one is not an inherent requirement and can be changed in the
future.  It preserves the current behavior and made the code slightly
simpler.  This is subject to change in future.

This means that the same physical online steps as above will lead to the
following state: Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node.  __add_pages is updated to not require the
zone and only initializes sections in the range.  This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).

devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way.  It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs.  This means that this
particular code path has to call move_pfn_range_to_zone explicitly.

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs.  Movable)
used to allow to change its movable type.  This will be handled later.

[richard.weiyang@gmail.com: simplify zone_intersects()]
  Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
  Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
Tested-by: NDan Williams <dan.j.williams@intel.com>
Tested-by: NReza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f1dd2cd1

mm, memory_hotplug: consider offline memblocks removable · 8b0662f2

由 Michal Hocko 提交于 7月 06, 2017

is_pageblock_removable_nolock() relies on having zone association to
examine all the page blocks to check whether they are movable or free.
This is just wasting of cycles when the memblock is offline.  Later
patch in the series will also change the time when the page is
associated with a zone so we let's bail out early if the memblock is
offline.

Link: http://lkml.kernel.org/r/20170515085827.16474-7-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NIgor Mammedov <imammedo@redhat.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b0662f2

mm, memory_hotplug: split up register_one_node() · 9037a993

由 Michal Hocko 提交于 7月 06, 2017

Memory hotplug (add_memory_resource) has to reinitialize node
infrastructure if the node is offline (one which went through the
complete add_memory(); remove_memory() cycle).  That involves node
registration to the kobj infrastructure (register_node), the proper
association with cpus (register_cpu_under_node) and finally creation of
node<->memblock symlinks (link_mem_sections).

The last part requires to know node_start_pfn and node_spanned_pages
which we currently have but a leter patch will postpone this
initialization to the onlining phase which happens later.  In fact we do
not need to rely on the early pgdat initialization even now because the
currently hot added pfn range is currently known.

Split register_one_node into core which does all the common work for the
boot time NUMA initialization and the hotplug (__register_one_node).
register_one_node keeps the full initialization while hotplug calls
__register_one_node and manually calls link_mem_sections for the proper
range.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-6-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9037a993

mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec

由 Michal Hocko 提交于 7月 06, 2017

Device memory hotplug hooks into regular memory hotplug only half way.
It needs memory sections to track struct pages but there is no
need/desire to associate those sections with memory blocks and export
them to the userspace via sysfs because they cannot be onlined anyway.

This is currently expressed by for_device argument to arch_add_memory
which then makes sure to associate the given memory range with
ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
to distinguish special memory hotplug from the regular one.  While this
works now, later patches in this series want to move __add_zone outside
of arch_add_memory path so we have to come up with something else.

Add want_memblock down the __add_pages path and use it to control
whether the section->memblock association should be done.
arch_add_memory then just trivially want memblock for everything but
for_device hotplug.

remove_memory_section doesn't need is_zone_device_section either.  We
can simply skip all the memblock specific cleanup if there is no
memblock for the given section.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Tested-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b862aec

mm: drop page_initialized check from get_nid_for_pfn · bfe63d3b

由 Michal Hocko 提交于 7月 06, 2017

Commit c04fc586 ("mm: show node to memory section relationship with
symlinks in sysfs") has added means to export memblock<->node
association into the sysfs.  It has also introduced get_nid_for_pfn
which is a rather confusing counterpart of pfn_to_nid which checks also
whether the pfn page is already initialized (page_initialized).

This is done by checking page::lru != NULL which doesn't make any sense
at all.  Nothing in this path really relies on the lru list being used
or initialized.  Just remove it because this will become a problem with
later patches.

Thanks to Reza Arbab for testing which revealed this to be a problem
(http://lkml.kernel.org/r/20170403202337.GA12482@dhcp22.suse.cz)

Link: http://lkml.kernel.org/r/20170515085827.16474-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bfe63d3b

zram: count same page write as page_stored · 51f9f82c

由 Minchan Kim 提交于 7月 06, 2017

Regardless of whether it is same page or not, it's surely write and
stored to zram so we should increase pages_stored stat. Otherwise, user
can see zero value via mm_stats although he writes a lot of pages to
zram.

Link: http://lkml.kernel.org/r/1494834068-27004-1-git-send-email-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

51f9f82c

mm, sparsemem: break out of loops early · c4e1be9e

由 Dave Hansen 提交于 7月 06, 2017

There are a number of times that we loop over NR_MEM_SECTIONS, looking
for section_present() on each section. But, when we have very large
physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS
becomes very large, making the loops quite long.

With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops
are 512k iterations, which we barely notice on modern hardware. But,
raising MAX_PHYSMEM_BITS higher (like we will see on systems that
support 5-level paging) makes this 64x longer and we start to notice,
especially on slower systems like simulators. A 10-second delay for
512k iterations is annoying. But, a 640- second delay is crippling.

This does not help if we have extremely sparse physical address spaces,
but those are quite rare. We expect that most of the "slow" systems
where this matters will also be quite small and non-sparse.

To fix this, we track the highest section we've ever encountered. This
lets us know when we will *never* see another section_present(), and
lets us break out of the loops earlier.

Doing the whole for_each_present_section_nr() macro is probably
overkill, but it will ensure that any future loop iterations that we
grow are more likely to be correct.

Kirrill said "It shaved almost 40 seconds from boot time in qemu with
5-level paging enabled for me".

Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.comSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Tested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c4e1be9e

drivers/sh/intc/virq.c: delete an error message for a failed memory allocation... · c509e05f

由 SF Markus Elfring 提交于 7月 06, 2017

drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq()

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Link: http://lkml.kernel.org/r/54e30d61-5183-9911-cf35-1410fb78da5a@users.sourceforge.netSigned-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c509e05f

06 7月, 2017 5 次提交

A
Fix trivial misannotations · b87b786b
由 Al Viro 提交于 7月 06, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b87b786b

IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev · 8e959601

由 Niranjana Vishwanathapura 提交于 6月 30, 2017

IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has
returned -EOPNOTSUPP.
Move free_rdma_netdev from ib_device structure to rdma_netdev structure
thus ensuring proper cleanup function is called for the rdma net device.

Fix the following trace:

ib0: Failed to modify QP to ERROR state
BUG: unable to handle kernel paging request at 0000000000001d20
IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1]
Call Trace:
 ipoib_remove_one+0xbe/0x160 [ib_ipoib]
 ib_unregister_device+0xd0/0x170 [ib_core]
 rvt_unregister_device+0x29/0x90 [rdmavt]
 hfi1_unregister_ib_device+0x1a/0x100 [hfi1]
 remove_one+0x4b/0x220 [hfi1]
 pci_device_remove+0x39/0xc0
 device_release_driver_internal+0x141/0x200
 driver_detach+0x3f/0x80
 bus_remove_driver+0x55/0xd0
 driver_unregister+0x2c/0x50
 pci_unregister_driver+0x2a/0xa0
 hfi1_mod_cleanup+0x10/0xf65 [hfi1]
 SyS_delete_module+0x171/0x250
 do_syscall_64+0x67/0x150
 entry_SYSCALL64_slow_path+0x25/0x25
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NNiranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8e959601

dm zoned: fix overflow when converting zone ID to sectors · 3908c983

由 Damien Le Moal 提交于 7月 03, 2017

A zone ID is a 32 bits unsigned int which can overflow when doing the
bit shifts in dmz_start_sect().  With a 256 MB zone size drive, the
overflow happens for a zone ID >= 8192.

Fix this by casting the zone ID to a sector_t before doing the bit
shift.  While at it, similarly fix dmz_start_block().
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3908c983

Cavium CNN55XX: fix broken default Kconfig entry · b4b8cbf6

由 Linus Torvalds 提交于 7月 05, 2017

Every developer always thinks that _their_ code is so special and
magical that it should be enabled by default.

And most of them are completely and utterly wrong.  That's definitely
the case when you write a specialty driver for a very unsual "security
processor". It does *not* get to mark itself as "default m".

If you solve world hunger, and make a driver that cures people of
cancer, by all means enable it by default.  But afaik, the Cavium
CNN55XX does neither.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b4b8cbf6

parisc: ->mapping_error · 227145eb

由 Christoph Hellwig 提交于 7月 04, 2017

DMA_ERROR_CODE already went away in linux-next, but parisc unfortunately
added a new instance of it without any review as far as I can tell.

Move the two iommu drivers to report errors through ->mapping_error.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NHelge Deller <deller@gmx.de>

227145eb

05 7月, 2017 24 次提交

net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap · 37144476

由 Murali Karicheri 提交于 7月 04, 2017

The data manual for DP83867IR/CR, SNLS484E[1], revised march 2017,
advises that strapping RX_DV/RX_CTRL pin in mode 1 and 2 is not
supported (see note below Table 5 (4-Level Strap Pins)).

There are some boards which have the pin strapped this way and need
software workaround suggested by the data manual. Bit[7] of
Configuration Register 4 (address 0x0031) must be cleared to 0. This
ensures proper operation of the PHY.

Implement driver support for device-tree property meant to advertise
the wrong strapping.

[1] http://www.ti.com/lit/ds/snls484e/snls484e.pdfSigned-off-by: NMurali Karicheri <m-karicheri2@ti.com>
[nsekhar@ti.com: rebase to mainline, code simplification]
Signed-off-by: NSekhar Nori <nsekhar@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37144476

cxgb4: Support for get_ts_info ethtool method · c3ff08eb

由 Atul Gupta 提交于 7月 04, 2017

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3ff08eb

cxgb4: Add PTP Hardware Clock (PHC) support · 9c33e420

由 Atul Gupta 提交于 7月 04, 2017

Add PTP IEEE-1588 support and make it accessible via PHC subsystem.
The functionality is enabled for T5/T6 adapters. Driver interfaces with
Firmware to program and adjust the clock offset.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c33e420

cxgb4: time stamping interface for PTP · a4569504

由 Atul Gupta 提交于 7月 04, 2017

Supports hardware and software time stamping via the
Linux SO_TIMESTAMPING socket option.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4569504

nfp: default to chained metadata prepend format · 64a919a9

由 Jakub Kicinski 提交于 7月 04, 2017

ABI 4.x introduced the chained metadata format and made it the
only one possible.  There are cases, however, where the old
format is preferred - mostly to make interoperation with VFs
using ABI 3.x easier for the datapath.  In ABI 5.x we allowed
for more flexibility by selecting the metadata format based
on capabilities.  The default was left to non-chained.

In case of fallback traffic, there is no capability telling the
driver there may be chained metadata.  With a very stripped-
-down FW the default old metadata format would be selected
making the driver drop all fallback traffic.

This patch changes the default selection in the driver. It
should not hurt with old firmwares, because if they don't
advertise RSS they will not produce metadata anyway.  New
firmwares advertising ABI 5.x, however, can depend on the
driver defaulting to chained format.

Fixes: f9380629 ("nfp: advertise support for NFD ABI 0.5")
Suggested-by: NMichael Rapson <michael.rapson@netronome.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64a919a9

nfp: remove legacy MAC address lookup · cb2cda48

由 Jakub Kicinski 提交于 7月 04, 2017

The legacy MAC address lookup doesn't work well with breakout
cables.  We are probably better off picking random addresses
than the wrong ones in the theoretical scenario where management
FW didn't tell us what the port config is.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb2cda48

nfp: improve order of interfaces in breakout mode · 2eb333c4

由 Jakub Kicinski 提交于 7月 04, 2017

For historical reasons we enumerate the vNICs in order.  This means
that if user configures breakout on a multiport card, the first
interface of the second port will have its MAC address changed.

What's worse, when moved from static information (HWInfo) to using
management FW (NSP), more features started depending on the port ids.
Right now in case of breakout first subport of the second port and
second subport of the first port will have their link info swapped.

Revise the ordering scheme so that first subport maintains its address.
Side effect of this change is that we will use base lane ids in
devlink (i.e. 40G ports will be 4 ids apart), e.g.:

pci/0000:04:00.0/0: type eth netdev p6p1
pci/0000:04:00.0/4: type eth netdev p6p2

Note that behaviour of phys_port_id is not changed since there is
a separate id number for the subport there.

Fixes: ec8b1fbe ("nfp: support port splitting via devlink")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2eb333c4

net: macb: remove extraneous return when MACB_EXT_DESC is defined · 42af627b

由 Colin Ian King 提交于 7月 04, 2017

When macro MACB_EXT_DESC is defined we end up with two identical
return statements and just one is sufficient. Remove the extra
return.

Detected by CoverityScan, CID#1449361 ("Structurally dead code")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NNicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42af627b

s390/vfio_ccw: remove unused variable · c14b7a85

由 Sebastian Ott 提交于 6月 26, 2017

Fix this set but not used warning:

drivers/s390/cio/vfio_ccw_drv.c: In function 'vfio_ccw_sch_io_todo':
drivers/s390/cio/vfio_ccw_drv.c:72:21: warning: variable 'sch' set but not used [-Wunused-but-set-variable]
  struct subchannel *sch;
                     ^
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: NDong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

c14b7a85

s390/dasd: remove unneeded code · 4bca698f

由 Sebastian Ott 提交于 6月 26, 2017

Fix these set but not used warnings:

drivers/s390/block/dasd.c:3933:6: warning: variable 'rc' set but not used [-Wunused-but-set-variable]
drivers/s390/block/dasd_alias.c:757:6: warning: variable 'rc' set but not used [-Wunused-but-set-variable]

In addition to that remove the test if an unsigned is < 0:

drivers/s390/block/dasd_devmap.c:153:11: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

4bca698f

s390/zcrypt: Fix missing newlines at some debug feature messages. · 792e0e00

由 Harald Freudenberger 提交于 6月 29, 2017

On some debug feature invocations the newline was missing.
Signed-off-by: NHarald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

792e0e00

s390/dasd: Make raw I/O usable without prefix support · 9d2be0c1

由 Jan Höppner 提交于 3月 22, 2017

The Prefix CCW is not mandatory and raw I/O can also be issued without
it. Check whether the Prefix CCW is supported and if not use the
combination of Define Extent and Locate Record Extended instead.

While at it, sort the variable declarations, replace the gotos with
early exits, and remove an error check at the end which is irrelevant.
Also, remove the XRC check as it is not relevant for raw I/O.
Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

9d2be0c1

s390/dasd: Rename dasd_raw_build_cp() · bbc7f7ea

由 Jan Höppner 提交于 5月 05, 2017

Rename dasd_raw_build_cp() to dasd_eckd_build_cp_raw() to fit the
scheme.
Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

bbc7f7ea

s390/dasd: Refactor prefix_LRE() and related functions · 45f186be

由 Jan Höppner 提交于 3月 14, 2017

We already have define_extent() that prepares necessary data for the
Define Extent CCW. The exact same thing is done in prefix_LRE().

Remove the duplicate code and move commands that were only used in
combination with the Prefix command to define_extent(). One of these
commands needs the blocksize to be specified. Add the blksize parameter
to define_extent() to account for that.

In addition, the check_XRC() function can be made more generic. Do this
and remove the Prefix-specific check_XRC_on_prefix() function.

Furthermore, prefix_LRE() uses fill_LRE_data() to prepare Locate Record
Extended data. Rename the function to fit the scheme better and make it
usable outside of the Prefix context by adding the corresponding CCW
command.
Reviewed-by: NStefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: NJan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

45f186be

S
s390: fix up for "blk-mq: switch ->queue_rq return value to blk_status_t" · dd1023c8
由 Stephen Rothwell 提交于 7月 04, 2017
```
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
```
dd1023c8

net, vxlan: convert vxlan_sock.refcnt from atomic_t to refcount_t · 66af846f

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66af846f

A
mga: switch compat ioctls to drm_ioctl_kernel() · aeba0390
由 Al Viro 提交于 6月 03, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
aeba0390

radeon: take out dead compat ioctls · ff32d39b

由 Al Viro 提交于 6月 03, 2017

	Compat wrappers in radeon_ioc32.c had been unreachable since
"drm/radeon: remove UMS support" has removed radeon_driver_old_fops.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ff32d39b

A
drm compat: ia64 is not biarch · 9cc73ce2
由 Al Viro 提交于 5月 25, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
9cc73ce2
A
drm_compat_ioctl(): tidy up a bit · 88e3cb07
由 Al Viro 提交于 5月 25, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
88e3cb07

switch compat_drm_mapbufs() to drm_ioctl_kernel() · 87d3ce11

由 Al Viro 提交于 5月 25, 2017

Another horror like addbufs; this one is even uglier.  With that
done, drm_ioc32.c should be sane.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

87d3ce11

A
switch compat_drm_rmmap() to drm_ioctl_kernel() · 6113252d
由 Al Viro 提交于 5月 25, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6113252d
A
switch compat_drm_mode_addfb2() to drm_ioctl_kernel() · d6c56613
由 Al Viro 提交于 5月 25, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
d6c56613
A
switch compat_drm_wait_vblank() to drm_ioctl_kernel() · d5288c88
由 Al Viro 提交于 5月 25, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
d5288c88

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功