提交 · 41fc1e0fe2102435b7e35c36dd2d61669cdf787b · openeuler / Kernel

02 8月, 2016 11 次提交

qed: do not use unitialized variable · 41fc1e0f

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

Do not write random bytes from the kernel stack when
calling qed_wr.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41fc1e0f

wan/fsl_ucc_hdlc: avoid possible NULL pointer dereference · 8c57a3a7

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

All assignments to components of priv should only
occur after the check if prif is NULL.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c57a3a7

net: qlge: remove superfluous statement · 4fb482f7

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

Variable length is not used after the deleted line.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fb482f7

net: s2io: simplify logical constraint · 6d85a1bf

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

(!A || (A && B)) is equivalent to (!A || B)
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d85a1bf

net: enic: use correct type specifier · 8d546f58

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

i is defined as unsigned.
So print it with %u.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d546f58

net: bna: use correct type specifier (2) · 4c2e9e29

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

add and val are read with
sscanf(kern_buf, "%x:%x", &addr, &val);
and used as arguments for bna_reg_offset_check and writel
so they have to be unsigned.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c2e9e29

net: bna: use correct type specifications · 112b6b79

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

addr and len are read with
sscanf(kern_buf, "%x:%x", &addr, &len);
and used as arguments for
bna_reg_offset_check.

So they have to be unsigned.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

112b6b79

net: bcm63xx: avoid possible null pointer dereference · 323b15b9

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

If dev_get_platdata has failed pd is null.
Do not dereference a null pointer.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

323b15b9

net: amd-xgbe: use correct format specifier · fb160ebd

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

i has been defined as unsigned int.
So use %u for output.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb160ebd

net: ethernet: ax88796: avoid null pointer dereference · 09d306d7

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

If platform_get_resource fails, mem2 is null.
Do not dereference null.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09d306d7

net: caif: use correct format specifier · 59d53bc2

由 xypron.glpk@gmx.de 提交于 7月 31, 2016

%u is the wrong format specifier for int.
size_t cannot be converted to int without possible
loss of information.

So leave the result as size_t and use %zu as format specifier.

cf. Documentation/printk-formats.txt
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59d53bc2

01 8月, 2016 4 次提交

phy/micrel: Change phy_id_mask for KSZ8721 · ecd5a323

由 Alexander Stein 提交于 7月 29, 2016

There are KSZ8721 PHYs with phy_id 0x00221619. In order to detect them
as PHY_ID_KSZ8001 compatible while staying different to PHY_ID_KSZ9021
ignore the last two bits when matching PHY_ID
Signed-off-by: NAlexander Stein <alexanders83@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecd5a323

r8169: fix nic may not work after changing mac address. · f51d4a10

由 Chun-Hao Lin 提交于 7月 29, 2016

When there is no AC power, NIC may not work after changing mac address.
Please refer to following link.
http://www.spinics.net/lists/netdev/msg356572.html

This issue is caused by runtime power management. When there is no AC
power, if we put NIC down (ifconfig down), the driver will be in runtime
suspend state and hardware will be put into D3 state. During this time,
driver cannot access hardware regisers. So if you set new mac address
during this time, it will not be set to hardware. After resume, NIC will
keep using the old mac address and the network will not work normally.

In this patch I add detecting runtime pm status when setting mac address.
If driver is in runtime suspend state, it will skip setting mac address, keep
the new mac address, and set the new mac address during runtime resume.
Signed-off-by: NChunhao Lin <hau@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f51d4a10

r8169: add checking driver's runtime pm status in rtl8169_get_ethtool_stats() · e0636236

由 Chun-Hao Lin 提交于 7月 29, 2016

Not to call rtl8169_update_counters() to dump tally counter when driver
is in runtime suspend state.

Calling rtl8169_update_counters() in runtime suspend state will produce
warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)".
Signed-off-by: NChunhao Lin <hau@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0636236

r8169: fix kernel log spam when set or get hardware wol setting. · 5fa80a32

由 Chun-Hao Lin 提交于 7月 29, 2016

NIC will be put into D3 state during runtime suspend state. When set or
get hardware wol setting, driver will write or read hardware registers.
If we set or get hardware wol setting in runtime suspend state, because
NIC will in D3 state, the hardware registers read by driver will return all
0xff. That will let driver thinking register flag is not toggled and
then prints the warning message "rtl_counters_cond == 1 (loop: 1000,
delay: 10)" to kernel log.

For fixing this issue, add checking driver's pm runtime status in
rtl8169_get_wol() and rtl8169_set_wol().
Signed-off-by: NChunhao Lin <hau@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5fa80a32

31 7月, 2016 16 次提交

net: dsa: bcm_sf2: Unwind errors in correct order · bb9c0fa3

由 Florian Fainelli 提交于 7月 29, 2016

In case we cannot complete bcm_sf2_sw_setup() for any reason, and we
go to the out_unmap label, but the MDIO bus has not been registered yet,
we will hit the BUG condition in drivers/net/phy/mdio_bus.c about the
bus not being registered. Fix this by dedicating a specific lable for
when we fail after the MDIO bus has been successfully registered.

Fixes: 461cd1b0 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb9c0fa3

net: tulip: fix spelling mistake: "attemping" -> "attempting" · 33c77efb

由 Colin Ian King 提交于 7月 30, 2016

trivial fix to spelling mistake in printk message
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33c77efb

macsec: fix negative refcnt on parent link · 0759e552

由 Sabrina Dubroca 提交于 7月 29, 2016

When creation of a macsec device fails because an identical device
already exists on this link, the current code decrements the refcnt on
the parent link (in ->destructor for the macsec device), but it had not
been incremented yet.

Move the dev_hold(parent_link) call earlier during macsec device
creation.

Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0759e552

macsec: RXSAs don't need to hold a reference on RXSCs · 36b232c8

由 Sabrina Dubroca 提交于 7月 29, 2016

Following the previous patch, RXSCs are held and properly refcounted in
the RX path (instead of being implicitly held by their SA), so the SA
doesn't need to hold a reference on its parent RXSC.

This also avoids panics on module unload caused by the double layer of
RCU callbacks (call_rcu frees the RXSA, which puts the final reference
on the RXSC and allows to free it in its own call_rcu) that commit
b196c22a ("macsec: add rcu_barrier() on module exit") didn't
protect against.
There were also some refcounting bugs in macsec_add_rxsa where I didn't
put the reference on the RXSC on the error paths, which would lead to
memory leaks.

Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36b232c8

macsec: fix reference counting on RXSC in macsec_handle_frame · c78ebe1d

由 Sabrina Dubroca 提交于 7月 29, 2016

Currently, we lookup the RXSC without taking a reference on it.  The
RXSA holds a reference on the RXSC, but the SA and SC could still both
disappear before we take a reference on the SA.

Take a reference on the RXSC in macsec_handle_frame.

Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c78ebe1d

drivers: net: cpsw: use of_platform_depopulate() · 3bf2cb3a

由 Grygorii Strashko 提交于 7月 28, 2016

Use of_platform_depopulate() in cpsw_remove() instead of
of_device_unregister(), because CSPW child devices will not be
recreated otherwise on next insmod. of_platform_depopulate() is
correct way now as it will ensure that all steps done in
of_platform_populate() are reverted, including cleaning up of
OF_POPULATED flag.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NMugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bf2cb3a

drivers: net: cpsw: fix wrong regs access in cpsw_remove · 8a0b6dc9

由 Grygorii Strashko 提交于 7月 28, 2016

The L3 error will be generated and system will crash during unloading
of CPSW driver if CPSW is used as module and ethX devices are down.
This happens because CPSW can be power off by PM runtime now when ethX
devices are down.

Hence, ensure that CPSW powered up by PM runtime before performing any
deinitialization actions which require CPSW registers access. In case
of PM runtime error just leave cpsw_remove() as we can't do anything
anymore.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NMugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a0b6dc9

net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy() · fccd5bad

由 Grygorii Strashko 提交于 7月 28, 2016

Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
cpsw module removal:
 cpsw_remove()
 - cpdma_ctlr_destroy()
   - spin_lock_irqsave(&ctlr->lock, flags)
   - cpdma_ctlr_stop()
     - spin_lock_irqsave(&ctlr->lock, flags);
   - cpdma_chan_destroy()
     - spin_lock_irqsave(&ctlr->lock, flags);

The issue has not been observed before because CPDMA channels have
been destroyed manually by CPSW until commit d941ebe8 ("net:
ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NMugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fccd5bad

cxgb4/cxgb4vf: Fixes regression in perf when tx vlan offload is disabled · 8d09e6b8

由 Hariprasad Shenai 提交于 7月 28, 2016

The commit 637d3e99 ("cxgb4: Discard the packet if the length is
greater than mtu") introduced a regression in the VLAN interface
performance when Tx VLAN offload is disabled.

Check if skb is tagged, regardless of whether it is hardware accelerated
or not. Presently we were checking only for hardware acclereated one,
which caused performance to drop to ~0.17Mbps on a 10GbE adapter for
VLAN interface, when tx vlan offload is turned off using ethtool.
The ethernet head length calculation was going wrong in this case, and
driver ended up dropping packets.

Fixes: 637d3e99 ("cxgb4: Discard the packet if the length is greater than mtu")
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d09e6b8

drivers: net: phy: xgene: Remove redundant dev_err call in xgene_mdio_probe() · b2df430b

由 Wei Yongjun 提交于 7月 28, 2016

There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
Acked-By: NIyappan Subramanian <isubramanian@apm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2df430b

qed: Prevent over-usage of vlan credits by PF · 25eb8d46

由 Yuval Mintz 提交于 7月 27, 2016

Each PF/VF has a limited number of vlan filters for
configuration purposes; This information is passed to qede
and is used to prevent over-usage - once a vlan is to be
configured and no filter credit is available, the driver
would switch into working in vlan-promisc mode.

Problem is the credit pool is shared by both PFs and VFs,
and currently PFs aren't deducting the filters that are
reserved for their VFs from their quota, which may lead
to some vlan filters failing unknowingly due to lack of credit.
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25eb8d46

qed: Correct min bandwidth for 100g · d572c430

由 Yuval Mintz 提交于 7月 27, 2016

Driver uses reverse logic when checking if minimum
bandwidth configuration applied, causing it to
configure the guarantee only on the first hw-function.

Fixes: a0d26d5a ("qed*: Don't reset statistics on inner reload")
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d572c430

qede: Reset statistics on explicit down · 7f7a144f

由 Yuval Mintz 提交于 7月 27, 2016

Adding the necessary logic to prevet statistics reset
on inner-reload introduced a bug, and now statistics
are reset only when re-probing the driver.

Fixes: a0d26d5a ("qed*: Don't reset statistics on inner reload")
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f7a144f

qed: Don't over-do producer cleanup for Rx · b21290b7

由 Yuval Mintz 提交于 7月 27, 2016

Before requesting the firmware to start Rx queues,
driver goes and sets the queue producer in the device to 0.
But while the producer is 32-bit, the driver currently clears 64 bits,
effectively zeroing an additional CID's producer as well.
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b21290b7

qed: Fix removal of spoof checking for VFs · cb1fa088

由 Yuval Mintz 提交于 7月 27, 2016

Driver has reverse logic for checking the result of the
spoof-checking configuration. As a result, it would log that
the configuration failed [even though it succeeded], and will
no longer do anything when requested to remove the configuration,
as it's accounting of the feature will be incorrect.

Fixes: 6ddc7608 ("qed*: IOV support spoof-checking")
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb1fa088

qede: Don't try removing unconfigured vlans · c524e2f5

由 Yuval Mintz 提交于 7月 27, 2016

As part of ndo_vlan_rx_kill_vid() implementation,
qede is requesting firmware to remove the vlan filter.
This currently happens even if the vlan wasn't previously
added [In case device ran out of vlan credits].

For PFs this doesn't cause any issues as the firmware
would simply ignore the removal request. But for VFs their
parent PF is holding an accounting of the configured vlans,
and such a request would cause the PF to fail the VF's
removal request.

Simply fix this for both PFs & VFs and don't remove filters
that were not previously added.

Fixes: 7c1bfcad ("qede: Add vlan filtering offload support")
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c524e2f5

29 7月, 2016 7 次提交

mm: track NR_KERNEL_STACK in KiB instead of number of stacks · d30dd8be

由 Andy Lutomirski 提交于 7月 28, 2016

Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone.
This only makes sense if each kernel stack exists entirely in one zone,
and allowing vmapped stacks could break this assumption.

Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all
architectures. Keep it simple and use KiB.

Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.orgSigned-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d30dd8be

mm: move most file-based accounting to the node · 11fb9989

由 Mel Gorman 提交于 7月 28, 2016

There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11fb9989

mm: rename NR_ANON_PAGES to NR_ANON_MAPPED · 4b9d0fab

由 Mel Gorman 提交于 7月 28, 2016

NR_FILE_PAGES  is the number of        file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_PAGES  is the number of mapped anon pages.

This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and
NR_ANON_PAGES for mapped pages.  This patch renames NR_ANON_PAGES so we
have

NR_FILE_PAGES  is the number of        file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_MAPPED is the number of mapped anon pages.

Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b9d0fab

mm: move page mapped accounting to the node · 50658e2e

由 Mel Gorman 提交于 7月 28, 2016

Reclaim makes decisions based on the number of pages that are mapped but
it's mixing node and zone information.  Account NR_FILE_MAPPED and
NR_ANON_PAGES pages on the node.

Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50658e2e

mm, vmscan: move LRU lists to node · 599d0c95

由 Mel Gorman 提交于 7月 28, 2016

This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic. Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes. It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.

Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies. For example, the scans are
per-zone but using per-node counters. We also mark a node as congested
when a zone is congested. This causes weird problems that are fixed
later but is easier to review.

In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions

1. Long-term isolation of highmem pages when reclaim is lowmem

When pages are skipped, they are immediately added back onto the LRU
list. If lowmem reclaim persisted for long periods of time, the same
highmem pages get continually scanned. The idea would be that lowmem
keeps those pages on a separate list until a reclaim for highmem pages
arrives that splices the highmem pages back onto the LRU. It potentially
could be implemented similar to the UNEVICTABLE list.

That would reduce the skip rate with the potential corner case is that
highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

This will break LRU ordering but may be preferable and faster during
memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

599d0c95

mm, vmstat: add infrastructure for per-node vmstats · 75ef7184

由 Mel Gorman 提交于 7月 28, 2016

Patchset: "Move LRU page reclaim from zones to nodes v9"

This series moves LRUs from the zones to the node.  While this is a
current rebase, the test results were based on mmotm as of June 23rd.
Conceptually, this series is simple but there are a lot of details.
Some of the broad motivations for this are;

1. The residency of a page partially depends on what zone the page was
   allocated from.  This is partially combatted by the fair zone allocation
   policy but that is a partial solution that introduces overhead in the
   page allocator paths.

2. Currently, reclaim on node 0 behaves slightly different to node 1. For
   example, direct reclaim scans in zonelist order and reclaims even if
   the zone is over the high watermark regardless of the age of pages
   in that LRU. Kswapd on the other hand starts reclaim on the highest
   unbalanced zone. A difference in distribution of file/anon pages due
   to when they were allocated results can result in a difference in
   again. While the fair zone allocation policy mitigates some of the
   problems here, the page reclaim results on a multi-zone node will
   always be different to a single-zone node.
   it was scheduled on as a result.

3. kswapd and the page allocator scan zones in the opposite order to
   avoid interfering with each other but it's sensitive to timing.  This
   mitigates the page allocator using pages that were allocated very recently
   in the ideal case but it's sensitive to timing. When kswapd is allocating
   from lower zones then it's great but during the rebalancing of the highest
   zone, the page allocator and kswapd interfere with each other. It's worse
   if the highest zone is small and difficult to balance.

4. slab shrinkers are node-based which makes it harder to identify the exact
   relationship between slab reclaim and LRU reclaim.

The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have relatively low highmem:lowmem
ratios than we worried about in the past.

Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes.

The series has been tested on a 16 core UMA machine and a 2-socket 48
core NUMA machine. The UMA results are presented in most cases as the NUMA
machine behaved similarly.

pagealloc
---------

This is a microbenchmark that shows the benefit of removing the fair zone
allocation policy. It was tested uip to order-4 but only orders 0 and 1 are
shown as the other orders were comparable.

                                           4.7.0-rc4                  4.7.0-rc4
                                      mmotm-20160623                 nodelru-v9
Min      total-odr0-1               490.00 (  0.00%)           457.00 (  6.73%)
Min      total-odr0-2               347.00 (  0.00%)           329.00 (  5.19%)
Min      total-odr0-4               288.00 (  0.00%)           273.00 (  5.21%)
Min      total-odr0-8               251.00 (  0.00%)           239.00 (  4.78%)
Min      total-odr0-16              234.00 (  0.00%)           222.00 (  5.13%)
Min      total-odr0-32              223.00 (  0.00%)           211.00 (  5.38%)
Min      total-odr0-64              217.00 (  0.00%)           208.00 (  4.15%)
Min      total-odr0-128             214.00 (  0.00%)           204.00 (  4.67%)
Min      total-odr0-256             250.00 (  0.00%)           230.00 (  8.00%)
Min      total-odr0-512             271.00 (  0.00%)           269.00 (  0.74%)
Min      total-odr0-1024            291.00 (  0.00%)           282.00 (  3.09%)
Min      total-odr0-2048            303.00 (  0.00%)           296.00 (  2.31%)
Min      total-odr0-4096            311.00 (  0.00%)           309.00 (  0.64%)
Min      total-odr0-8192            316.00 (  0.00%)           314.00 (  0.63%)
Min      total-odr0-16384           317.00 (  0.00%)           315.00 (  0.63%)
Min      total-odr1-1               742.00 (  0.00%)           712.00 (  4.04%)
Min      total-odr1-2               562.00 (  0.00%)           530.00 (  5.69%)
Min      total-odr1-4               457.00 (  0.00%)           433.00 (  5.25%)
Min      total-odr1-8               411.00 (  0.00%)           381.00 (  7.30%)
Min      total-odr1-16              381.00 (  0.00%)           356.00 (  6.56%)
Min      total-odr1-32              372.00 (  0.00%)           346.00 (  6.99%)
Min      total-odr1-64              372.00 (  0.00%)           343.00 (  7.80%)
Min      total-odr1-128             375.00 (  0.00%)           351.00 (  6.40%)
Min      total-odr1-256             379.00 (  0.00%)           351.00 (  7.39%)
Min      total-odr1-512             385.00 (  0.00%)           355.00 (  7.79%)
Min      total-odr1-1024            386.00 (  0.00%)           358.00 (  7.25%)
Min      total-odr1-2048            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-4096            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-8192            388.00 (  0.00%)           363.00 (  6.44%)

This shows a steady improvement throughout. The primary benefit is from
reduced system CPU usage which is obvious from the overall times;

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623nodelru-v8
User          189.19      191.80
System       2604.45     2533.56
Elapsed      2855.30     2786.39

The vmstats also showed that the fair zone allocation policy was definitely
removed as can be seen here;

                             4.7.0-rc3   4.7.0-rc3
                         mmotm-20160623 nodelru-v8
DMA32 allocs               28794729769           0
Normal allocs              48432501431 77227309877
Movable allocs                       0           0

tiobench on ext4
----------------

tiobench is a benchmark that artifically benefits if old pages remain resident
while new pages get reclaimed. The fair zone allocation policy mitigates this
problem so pages age fairly. While the benchmark has problems, it is important
that tiobench performance remains constant as it implies that page aging
problems that the fair zone allocation policy fixes are not re-introduced.

                                         4.7.0-rc4             4.7.0-rc4
                                    mmotm-20160623            nodelru-v9
Min      PotentialReadSpeed        89.65 (  0.00%)       90.21 (  0.62%)
Min      SeqRead-MB/sec-1          82.68 (  0.00%)       82.01 ( -0.81%)
Min      SeqRead-MB/sec-2          72.76 (  0.00%)       72.07 ( -0.95%)
Min      SeqRead-MB/sec-4          75.13 (  0.00%)       74.92 ( -0.28%)
Min      SeqRead-MB/sec-8          64.91 (  0.00%)       65.19 (  0.43%)
Min      SeqRead-MB/sec-16         62.24 (  0.00%)       62.22 ( -0.03%)
Min      RandRead-MB/sec-1          0.88 (  0.00%)        0.88 (  0.00%)
Min      RandRead-MB/sec-2          0.95 (  0.00%)        0.92 ( -3.16%)
Min      RandRead-MB/sec-4          1.43 (  0.00%)        1.34 ( -6.29%)
Min      RandRead-MB/sec-8          1.61 (  0.00%)        1.60 ( -0.62%)
Min      RandRead-MB/sec-16         1.80 (  0.00%)        1.90 (  5.56%)
Min      SeqWrite-MB/sec-1         76.41 (  0.00%)       76.85 (  0.58%)
Min      SeqWrite-MB/sec-2         74.11 (  0.00%)       73.54 ( -0.77%)
Min      SeqWrite-MB/sec-4         80.05 (  0.00%)       80.13 (  0.10%)
Min      SeqWrite-MB/sec-8         72.88 (  0.00%)       73.20 (  0.44%)
Min      SeqWrite-MB/sec-16        75.91 (  0.00%)       76.44 (  0.70%)
Min      RandWrite-MB/sec-1         1.18 (  0.00%)        1.14 ( -3.39%)
Min      RandWrite-MB/sec-2         1.02 (  0.00%)        1.03 (  0.98%)
Min      RandWrite-MB/sec-4         1.05 (  0.00%)        0.98 ( -6.67%)
Min      RandWrite-MB/sec-8         0.89 (  0.00%)        0.92 (  3.37%)
Min      RandWrite-MB/sec-16        0.92 (  0.00%)        0.93 (  1.09%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 approx-v9
User          645.72      525.90
System        403.85      331.75
Elapsed      6795.36     6783.67

This shows that the series has little or not impact on tiobench which is
desirable and a reduction in system CPU usage. It indicates that the fair
zone allocation policy was removed in a manner that didn't reintroduce
one class of page aging bug. There were only minor differences in overall
reclaim activity

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Minor Faults                    645838      647465
Major Faults                       573         640
Swap Ins                             0           0
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                  46041453    44190646
Normal allocs                 78053072    79887245
Movable allocs                       0           0
Allocation stalls                   24          67
Stall zone DMA                       0           0
Stall zone DMA32                     0           0
Stall zone Normal                    0           2
Stall zone HighMem                   0           0
Stall zone Movable                   0          65
Direct pages scanned             10969       30609
Kswapd pages scanned          93375144    93492094
Kswapd pages reclaimed        93372243    93489370
Direct pages reclaimed           10969       30609
Kswapd efficiency                  99%         99%
Kswapd velocity              13741.015   13781.934
Direct efficiency                 100%        100%
Direct velocity                  1.614       4.512
Percentage direct scans             0%          0%

kswapd activity was roughly comparable. There were differences in direct
reclaim activity but negligible in the context of the overall workload
(velocity of 4 pages per second with the patches applied, 1.6 pages per
second in the baseline kernel).

pgbench read-only large configuration on ext4
---------------------------------------------

pgbench is a database benchmark that can be sensitive to page reclaim
decisions. This also checks if removing the fair zone allocation policy
is safe

pgbench Transactions
                        4.7.0-rc4             4.7.0-rc4
                   mmotm-20160623            nodelru-v8
Hmean    1       188.26 (  0.00%)      189.78 (  0.81%)
Hmean    5       330.66 (  0.00%)      328.69 ( -0.59%)
Hmean    12      370.32 (  0.00%)      380.72 (  2.81%)
Hmean    21      368.89 (  0.00%)      369.00 (  0.03%)
Hmean    30      382.14 (  0.00%)      360.89 ( -5.56%)
Hmean    32      428.87 (  0.00%)      432.96 (  0.95%)

Negligible differences again. As with tiobench, overall reclaim activity
was comparable.

bonnie++ on ext4
----------------

No interesting performance difference, negligible differences on reclaim
stats.

paralleldd on ext4
------------------

This workload uses varying numbers of dd instances to read large amounts of
data from disk.

                               4.7.0-rc3             4.7.0-rc3
                          mmotm-20160623            nodelru-v9
Amean    Elapsd-1       186.04 (  0.00%)      189.41 ( -1.82%)
Amean    Elapsd-3       192.27 (  0.00%)      191.38 (  0.46%)
Amean    Elapsd-5       185.21 (  0.00%)      182.75 (  1.33%)
Amean    Elapsd-7       183.71 (  0.00%)      182.11 (  0.87%)
Amean    Elapsd-12      180.96 (  0.00%)      181.58 ( -0.35%)
Amean    Elapsd-16      181.36 (  0.00%)      183.72 ( -1.30%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 nodelru-v9
User         1548.01     1552.44
System       8609.71     8515.08
Elapsed      3587.10     3594.54

There is little or no change in performance but some drop in system CPU usage.

                             4.7.0-rc3   4.7.0-rc3
                        mmotm-20160623  nodelru-v9
Minor Faults                    362662      367360
Major Faults                      1204        1143
Swap Ins                            22           0
Swap Outs                         2855        1029
DMA allocs                           0           0
DMA32 allocs                  31409797    28837521
Normal allocs                 46611853    49231282
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned          40845270    40869088
Kswapd pages reclaimed        40830976    40855294
Direct pages reclaimed               0           0
Kswapd efficiency                  99%         99%
Kswapd velocity              11386.711   11369.769
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Page writes by reclaim            2855        1029
Page writes file                     0           0
Page writes anon                  2855        1029
Page reclaim immediate             771        1628
Sector Reads                 293312636   293536360
Sector Writes                 18213568    18186480
Page rescued immediate               0           0
Slabs scanned                   128257      132747
Direct inode steals                181          56
Kswapd inode steals                 59        1131

It basically shows that kswapd was active at roughly the same rate in
both kernels. There was also comparable slab scanning activity and direct
reclaim was avoided in both cases. There appears to be a large difference
in numbers of inodes reclaimed but the workload has few active inodes and
is likely a timing artifact.

stutter
-------

stutter simulates a simple workload. One part uses a lot of anonymous
memory, a second measures mmap latency and a third copies a large file.
The primary metric is checking for mmap latency.

stutter
                             4.7.0-rc4             4.7.0-rc4
                        mmotm-20160623            nodelru-v8
Min         mmap     16.6283 (  0.00%)     13.4258 ( 19.26%)
1st-qrtle   mmap     54.7570 (  0.00%)     34.9121 ( 36.24%)
2nd-qrtle   mmap     57.3163 (  0.00%)     46.1147 ( 19.54%)
3rd-qrtle   mmap     58.9976 (  0.00%)     47.1882 ( 20.02%)
Max-90%     mmap     59.7433 (  0.00%)     47.4453 ( 20.58%)
Max-93%     mmap     60.1298 (  0.00%)     47.6037 ( 20.83%)
Max-95%     mmap     73.4112 (  0.00%)     82.8719 (-12.89%)
Max-99%     mmap     92.8542 (  0.00%)     88.8870 (  4.27%)
Max         mmap   1440.6569 (  0.00%)    121.4201 ( 91.57%)
Mean        mmap     59.3493 (  0.00%)     42.2991 ( 28.73%)
Best99%Mean mmap     57.2121 (  0.00%)     41.8207 ( 26.90%)
Best95%Mean mmap     55.9113 (  0.00%)     39.9620 ( 28.53%)
Best90%Mean mmap     55.6199 (  0.00%)     39.3124 ( 29.32%)
Best50%Mean mmap     53.2183 (  0.00%)     33.1307 ( 37.75%)
Best10%Mean mmap     45.9842 (  0.00%)     20.4040 ( 55.63%)
Best5%Mean  mmap     43.2256 (  0.00%)     17.9654 ( 58.44%)
Best1%Mean  mmap     32.9388 (  0.00%)     16.6875 ( 49.34%)

This shows a number of improvements with the worst-case outlier greatly
improved.

Some of the vmstats are interesting

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Swap Ins                           163         502
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                 618719206  1381662383
Normal allocs                891235743   564138421
Movable allocs                       0           0
Allocation stalls                 2603           1
Direct pages scanned            216787           2
Kswapd pages scanned          50719775    41778378
Kswapd pages reclaimed        41541765    41777639
Direct pages reclaimed          209159           0
Kswapd efficiency                  81%         99%
Kswapd velocity              16859.554   14329.059
Direct efficiency                  96%          0%
Direct velocity                 72.061       0.001
Percentage direct scans             0%          0%
Page writes by reclaim         6215049           0
Page writes file               6215049           0
Page writes anon                     0           0
Page reclaim immediate           70673          90
Sector Reads                  81940800    81680456
Sector Writes                100158984    98816036
Page rescued immediate               0           0
Slabs scanned                  1366954       22683

While this is not guaranteed in all cases, this particular test showed
a large reduction in direct reclaim activity. It's also worth noting
that no page writes were issued from reclaim context.

This series is not without its hazards. There are at least three areas
that I'm concerned with even though I could not reproduce any problems in
that area.

1. Reclaim/compaction is going to be affected because the amount of reclaim is
   no longer targetted at a specific zone. Compaction works on a per-zone basis
   so there is no guarantee that reclaiming a few THP's worth page pages will
   have a positive impact on compaction success rates.

2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers
   are called is now different. This may or may not be a problem but if it
   is, it'll be because shrinkers are not called enough and some balancing
   is required.

3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are
   distributed between zones and the fair zone allocation policy used to do
   something very similar for anon. The distribution is now different but not
   necessarily in any way that matters but it's still worth bearing in mind.

VM statistic counters for reclaim decisions are zone-based.  If the kernel
is to reclaim on a per-node basis then we need to track per-node
statistics but there is no infrastructure for that.  The most notable
change is that the old node_page_state is renamed to
sum_zone_node_page_state.  The new node_page_state takes a pglist_data and
uses per-node stats but none exist yet.  There is some renaming such as
vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming
of mod_state to mod_zone_state.  Otherwise, this is mostly a mechanical
patch with no functional change.  There is a lot of similarity between the
node and zone helpers which is unfortunate but there was no obvious way of
reusing the code and maintaining type safety.

Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75ef7184

MD: fix null pointer deference · 5d881783

由 Shaohua Li 提交于 7月 28, 2016

The md device might not have personality (for example, ddf raid array). The
issue is introduced by 8430e7e0(md: disconnect device from personality
before trying to remove it)
Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

5d881783

28 7月, 2016 2 次提交

sparc: serial: sunhv: fix a double lock bug · 344e3c77

由 Dan Carpenter 提交于 7月 15, 2016

We accidentally take the "port->lock" twice in a row.  This old code
was supposed to be deleted.

Fixes: e58e241c ('sparc: serial: Clean up the locking for -rt')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

344e3c77

random: use for_each_online_node() to iterate over NUMA nodes · 59b8d4f1

由 Theodore Ts'o 提交于 7月 27, 2016

This fixes a crash on s390 with fake NUMA enabled.
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Fixes: 1e7f583a ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

59b8d4f1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功