提交 · 43a0a98aa8da71583f84b84fd72e265c24d4c5f8 · openanolis / cloud-kernel

01 8月, 2016 8 次提交

hwmon: (adt7411) set sane values for CFG1 and CFG3 · 601807bb

由 Michael Walle 提交于 7月 25, 2016

According to the datasheet we have to set some bits as 0 and others as 1.
Make sure we do this for CFG1 and CFG3.
Signed-off-by: NMichael Walle <michael@walle.cc>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

601807bb

hwmon: (iio_hwmon) fix memory leak in name attribute · 5d17d3b4

由 Quentin Schulz 提交于 7月 26, 2016

The "name" variable's memory is now freed when the device is destructed
thanks to devm function.
Signed-off-by: NQuentin Schulz <quentin.schulz@free-electrons.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Fixes: e0f8a24e ("staging:iio::hwmon interface client driver.")
Fixes: 61bb53bc ("hwmon: (iio_hwmon) Add support for humidity sensors")
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

5d17d3b4

hwmon: (ftsteutates) Fix potential memory access error · 4c8702b3

由 Guenter Roeck 提交于 7月 25, 2016

Using set_bit() to set a bit in an integer is not a good idea, since
the function expects an unsigned long as argument, which can be 64 bit
wide. Coverity reports this problem as

>>>     CID 1364488:  Memory - illegal accesses  (INCOMPATIBLE_CAST)
>>>     Pointer "&ret" points to an object whose effective type is "int"
>>>	(32 bits, signed) but is dereferenced as a wider "unsigned
+long" (64 bits, unsigned).  This may lead to memory corruption.
245                     set_bit(1, (unsigned long *)&ret);

Just use BIT instead.

Cc: Thilo Cestonaro <thilo@cestona.ro>
Fixes: 08426eda ("hwmon: Add driver for FTS BMC chip "Teutates"")
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

4c8702b3

hwmon: (tmp102) Improve error handling · 1aa4f028

由 Guenter Roeck 提交于 7月 25, 2016

Use devm_add_action_or_reset() instead of devm_add_action(), and
check its return code.
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

1aa4f028

hwmon: (lm75) Improve error handling · 90e2b545

由 Guenter Roeck 提交于 7月 25, 2016

Use devm_add_action_or_reset() instead of devm_add_action(), and
check its return value.
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

90e2b545

hwmon: (lm90) Improve error handling · c5fcf01b

由 Guenter Roeck 提交于 7月 25, 2016

Replace devm_add_action() with devm_add_action_or_reset(),
and check its return value.
Reviewed-by: NJean Delvare <jdelvare@suse.de>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

c5fcf01b

hwmon: (lm90) Add missing assignment · be9d6374

由 Guenter Roeck 提交于 7月 25, 2016

Coverity reports the following error.

>>>     CID 1364474:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "lm90_read_reg" without checking return value (as is done
>>>     elsewhere 28 out of 29 times).
532             lm90_read_reg(client, LM90_REG_R_REMOTE_LOWH);
533             if (val < 0)
534                     return val;

Fixes: 10bfef47 ("hwmon: (lm90) Read limit registers only once")
Reviewed-by: NJean Delvare <jdelvare@suse.de>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

be9d6374

hwmon: (sht3x) set initial jiffies to last_update · 991f9fa9

由 Matt Ranostay 提交于 7月 24, 2016

Handling the wraparound requires the data->last_update to be set to an
initial jiffies value. Otherwise on 32-bit systems you will not be able
to request a reading till the 5 minute jiffies rollover happens.

Cc: Guenter Roeck <linux@roeck-us.net>
Cc: David Frey <david.frey@sensirion.com>
Signed-off-by: NMatt Ranostay <mranostay@gmail.com>
Reviewed-by: NJean Delvare <jdelvare@suse.de>
Fixes: 7c84f7f8 ("hwmon: add support for Sensirion SHT3x sensors")
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

991f9fa9

31 7月, 2016 1 次提交

random: Fix crashes with sparse node ids · dd0f0cf5

由 Michael Ellerman 提交于 7月 31, 2016

On a system with sparse node ids, eg. a powerpc system with 4 nodes
numbered like so:

  node   0: [mem 0x0000000000000000-0x00000007ffffffff]
  node   1: [mem 0x0000000800000000-0x0000000fffffffff]
  node  16: [mem 0x0000001000000000-0x00000017ffffffff]
  node  17: [mem 0x0000001800000000-0x0000001fffffffff]

The code in rand_initialize() will allocate 4 pointers for the pool
array, and initialise them correctly.

However when go to use the pool, in eg. extract_crng(), we use the
numa_node_id() to index into the array. For the higher numbered node ids
this leads to random memory corruption, depending on what was kmalloc'ed
adjacent to the pool array.

Fix it by using nr_node_ids to size the pool array.

Fixes: 1e7f583a ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd0f0cf5

29 7月, 2016 16 次提交

crypto: marvell - Don't copy IV vectors from the _process op for ciphers · 8cf740ae

由 Romain Perier 提交于 7月 28, 2016

The IV output vectors should only be copied from the _complete operation
and not from the _process operation, i.e only from the operation that is
designed to copy the result of the request to the right location. This
copy is already done in the _complete operation, so this commit removes
the duplicated code in the _process op.

Fixes: 3610d6cd5231 ("crypto: marvell - Add a complete...")
Signed-off-by: NRomain Perier <romain.perier@free-electrons.com>
Acked-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

8cf740ae

mmc: rtsx_pci: Remove deprecated create_singlethread_workqueue · 6ea62579

由 Bhaktipriya Shridhar 提交于 7月 26, 2016

The workqueue "workq" provides support for sd/mmc async request, which
makes next request do dma_map_sg() while previous request transferring
data.

The workqueue has a single workitem(&host->work) and hence doesn't require
ordering. Also, it is not being used on a memory reclaim path. Hence,
the singlethreaded workqueue has been replaced with the use of system_wq.

System workqueues have been able to handle high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

Work item has been flushed in rtsx_pci_sdmmc_drv_remove() to ensure that
there are no pending tasks while disconnecting the driver.
Signed-off-by: NBhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

6ea62579

mmc: rtsx_pci: Enable MMC_CAP_ERASE to allow erase/discard/trim requests · 9bce7fd6

由 Ulf Hansson 提交于 7月 26, 2016

Cc: Micky Ching <micky_ching@realsil.com.cn>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Tested-by: NMauro Santos <registo.mailling@gmail.com>

9bce7fd6

mmc: rtsx_pci: Use the provided busy timeout from the mmc core · 27f4bf7d

由 Ulf Hansson 提交于 7月 26, 2016

The rtsx_pci driver is using a fixed 3s timeout for R1B responses, which
in some cases isn't suffient. For example, erase/discard requests may
require longer timeouts.

Instead of always using a fixed timeout, let's use the per request
calculated busy timeout from the mmc core.

Cc: Micky Ching <micky_ching@realsil.com.cn>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Tested-by: NMauro Santos <registo.mailling@gmail.com>

27f4bf7d

mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS · fa243f64

由 Ulf Hansson 提交于 7月 27, 2016

Due to previous changes this define has no longer a purpose. Instead move
the sdhci-pltfm drivers over to use the exported struct sdhci_pltfm_pmops.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

fa243f64

mmc: sdhci-pltfm: Convert to use the SET_SYSTEM_SLEEP_PM_OPS · 2b330999

由 Ulf Hansson 提交于 7月 27, 2016

Move the system PM callbacks within #ifdef CONFIG_PM_SLEEP as to avoid
them being build when not used. This also allows us to use the
SET_SYSTEM_SLEEP_PM_OPS macro which simplifies the code.

Within this context it also makes sense to move the declaration of the
struct sdhci_pltfm_pmops, outside the #ifdef CONFIG_PM as the
SET_SYSTEM_SLEEP_PM_OPS deals with this. This further simplifies the code.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

2b330999

mmc: sdhci-pltfm: Make sdhci_pltfm_suspend|resume() static · 21b8fe0f

由 Ulf Hansson 提交于 7月 27, 2016

There are no users left of these exported APIs, so let's make them static.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

21b8fe0f

mmc: sdhci-esdhc-imx: Use common sdhci_suspend|resume_host() · 3e3274ab

由 Ulf Hansson 提交于 7月 27, 2016

To prepare to make the sdhci_pltfm_suspend|resume() static functions, move
sdhci-esdhc-imx over to use the sdhci_suspend|resume_host().
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Acked-by: NDong Aisheng <aisheng.dong@nxp.com>

3e3274ab

mmc: sdhci-esdhc-imx: Assign system PM ops within #ifdef CONFIG_PM_SLEEP · 2788ed42

由 Ulf Hansson 提交于 7月 27, 2016

The system PM callbacks isn't used unless CONFIG_PM_SLEEP is set, thus it
triggers a compiler warning about unused functions. Avoid this by changing
from CONFIG_PM to CONFIG_PM_SLEEP.
Reported-by: NArnd Bergmann <arnd@arndb.de>
Fixes: b70d0b3b5b29 ("mmc: sdhci-esdhc-imx: add esdhc specific suspend resume callback")
Cc: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Acked-by: NDong Aisheng <aisheng.dong@nxp.com>

2788ed42

mm: track NR_KERNEL_STACK in KiB instead of number of stacks · d30dd8be

由 Andy Lutomirski 提交于 7月 28, 2016

Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone.
This only makes sense if each kernel stack exists entirely in one zone,
and allowing vmapped stacks could break this assumption.

Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all
architectures. Keep it simple and use KiB.

Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.orgSigned-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d30dd8be

mm: move most file-based accounting to the node · 11fb9989

由 Mel Gorman 提交于 7月 28, 2016

There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11fb9989

mm: rename NR_ANON_PAGES to NR_ANON_MAPPED · 4b9d0fab

由 Mel Gorman 提交于 7月 28, 2016

NR_FILE_PAGES  is the number of        file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_PAGES  is the number of mapped anon pages.

This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and
NR_ANON_PAGES for mapped pages.  This patch renames NR_ANON_PAGES so we
have

NR_FILE_PAGES  is the number of        file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_MAPPED is the number of mapped anon pages.

Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b9d0fab

mm: move page mapped accounting to the node · 50658e2e

由 Mel Gorman 提交于 7月 28, 2016

Reclaim makes decisions based on the number of pages that are mapped but
it's mixing node and zone information.  Account NR_FILE_MAPPED and
NR_ANON_PAGES pages on the node.

Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50658e2e

mm, vmscan: move LRU lists to node · 599d0c95

由 Mel Gorman 提交于 7月 28, 2016

This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic. Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes. It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.

Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies. For example, the scans are
per-zone but using per-node counters. We also mark a node as congested
when a zone is congested. This causes weird problems that are fixed
later but is easier to review.

In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions

1. Long-term isolation of highmem pages when reclaim is lowmem

When pages are skipped, they are immediately added back onto the LRU
list. If lowmem reclaim persisted for long periods of time, the same
highmem pages get continually scanned. The idea would be that lowmem
keeps those pages on a separate list until a reclaim for highmem pages
arrives that splices the highmem pages back onto the LRU. It potentially
could be implemented similar to the UNEVICTABLE list.

That would reduce the skip rate with the potential corner case is that
highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

This will break LRU ordering but may be preferable and faster during
memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

599d0c95

mm, vmstat: add infrastructure for per-node vmstats · 75ef7184

由 Mel Gorman 提交于 7月 28, 2016

Patchset: "Move LRU page reclaim from zones to nodes v9"

This series moves LRUs from the zones to the node.  While this is a
current rebase, the test results were based on mmotm as of June 23rd.
Conceptually, this series is simple but there are a lot of details.
Some of the broad motivations for this are;

1. The residency of a page partially depends on what zone the page was
   allocated from.  This is partially combatted by the fair zone allocation
   policy but that is a partial solution that introduces overhead in the
   page allocator paths.

2. Currently, reclaim on node 0 behaves slightly different to node 1. For
   example, direct reclaim scans in zonelist order and reclaims even if
   the zone is over the high watermark regardless of the age of pages
   in that LRU. Kswapd on the other hand starts reclaim on the highest
   unbalanced zone. A difference in distribution of file/anon pages due
   to when they were allocated results can result in a difference in
   again. While the fair zone allocation policy mitigates some of the
   problems here, the page reclaim results on a multi-zone node will
   always be different to a single-zone node.
   it was scheduled on as a result.

3. kswapd and the page allocator scan zones in the opposite order to
   avoid interfering with each other but it's sensitive to timing.  This
   mitigates the page allocator using pages that were allocated very recently
   in the ideal case but it's sensitive to timing. When kswapd is allocating
   from lower zones then it's great but during the rebalancing of the highest
   zone, the page allocator and kswapd interfere with each other. It's worse
   if the highest zone is small and difficult to balance.

4. slab shrinkers are node-based which makes it harder to identify the exact
   relationship between slab reclaim and LRU reclaim.

The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have relatively low highmem:lowmem
ratios than we worried about in the past.

Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes.

The series has been tested on a 16 core UMA machine and a 2-socket 48
core NUMA machine. The UMA results are presented in most cases as the NUMA
machine behaved similarly.

pagealloc
---------

This is a microbenchmark that shows the benefit of removing the fair zone
allocation policy. It was tested uip to order-4 but only orders 0 and 1 are
shown as the other orders were comparable.

                                           4.7.0-rc4                  4.7.0-rc4
                                      mmotm-20160623                 nodelru-v9
Min      total-odr0-1               490.00 (  0.00%)           457.00 (  6.73%)
Min      total-odr0-2               347.00 (  0.00%)           329.00 (  5.19%)
Min      total-odr0-4               288.00 (  0.00%)           273.00 (  5.21%)
Min      total-odr0-8               251.00 (  0.00%)           239.00 (  4.78%)
Min      total-odr0-16              234.00 (  0.00%)           222.00 (  5.13%)
Min      total-odr0-32              223.00 (  0.00%)           211.00 (  5.38%)
Min      total-odr0-64              217.00 (  0.00%)           208.00 (  4.15%)
Min      total-odr0-128             214.00 (  0.00%)           204.00 (  4.67%)
Min      total-odr0-256             250.00 (  0.00%)           230.00 (  8.00%)
Min      total-odr0-512             271.00 (  0.00%)           269.00 (  0.74%)
Min      total-odr0-1024            291.00 (  0.00%)           282.00 (  3.09%)
Min      total-odr0-2048            303.00 (  0.00%)           296.00 (  2.31%)
Min      total-odr0-4096            311.00 (  0.00%)           309.00 (  0.64%)
Min      total-odr0-8192            316.00 (  0.00%)           314.00 (  0.63%)
Min      total-odr0-16384           317.00 (  0.00%)           315.00 (  0.63%)
Min      total-odr1-1               742.00 (  0.00%)           712.00 (  4.04%)
Min      total-odr1-2               562.00 (  0.00%)           530.00 (  5.69%)
Min      total-odr1-4               457.00 (  0.00%)           433.00 (  5.25%)
Min      total-odr1-8               411.00 (  0.00%)           381.00 (  7.30%)
Min      total-odr1-16              381.00 (  0.00%)           356.00 (  6.56%)
Min      total-odr1-32              372.00 (  0.00%)           346.00 (  6.99%)
Min      total-odr1-64              372.00 (  0.00%)           343.00 (  7.80%)
Min      total-odr1-128             375.00 (  0.00%)           351.00 (  6.40%)
Min      total-odr1-256             379.00 (  0.00%)           351.00 (  7.39%)
Min      total-odr1-512             385.00 (  0.00%)           355.00 (  7.79%)
Min      total-odr1-1024            386.00 (  0.00%)           358.00 (  7.25%)
Min      total-odr1-2048            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-4096            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-8192            388.00 (  0.00%)           363.00 (  6.44%)

This shows a steady improvement throughout. The primary benefit is from
reduced system CPU usage which is obvious from the overall times;

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623nodelru-v8
User          189.19      191.80
System       2604.45     2533.56
Elapsed      2855.30     2786.39

The vmstats also showed that the fair zone allocation policy was definitely
removed as can be seen here;

                             4.7.0-rc3   4.7.0-rc3
                         mmotm-20160623 nodelru-v8
DMA32 allocs               28794729769           0
Normal allocs              48432501431 77227309877
Movable allocs                       0           0

tiobench on ext4
----------------

tiobench is a benchmark that artifically benefits if old pages remain resident
while new pages get reclaimed. The fair zone allocation policy mitigates this
problem so pages age fairly. While the benchmark has problems, it is important
that tiobench performance remains constant as it implies that page aging
problems that the fair zone allocation policy fixes are not re-introduced.

                                         4.7.0-rc4             4.7.0-rc4
                                    mmotm-20160623            nodelru-v9
Min      PotentialReadSpeed        89.65 (  0.00%)       90.21 (  0.62%)
Min      SeqRead-MB/sec-1          82.68 (  0.00%)       82.01 ( -0.81%)
Min      SeqRead-MB/sec-2          72.76 (  0.00%)       72.07 ( -0.95%)
Min      SeqRead-MB/sec-4          75.13 (  0.00%)       74.92 ( -0.28%)
Min      SeqRead-MB/sec-8          64.91 (  0.00%)       65.19 (  0.43%)
Min      SeqRead-MB/sec-16         62.24 (  0.00%)       62.22 ( -0.03%)
Min      RandRead-MB/sec-1          0.88 (  0.00%)        0.88 (  0.00%)
Min      RandRead-MB/sec-2          0.95 (  0.00%)        0.92 ( -3.16%)
Min      RandRead-MB/sec-4          1.43 (  0.00%)        1.34 ( -6.29%)
Min      RandRead-MB/sec-8          1.61 (  0.00%)        1.60 ( -0.62%)
Min      RandRead-MB/sec-16         1.80 (  0.00%)        1.90 (  5.56%)
Min      SeqWrite-MB/sec-1         76.41 (  0.00%)       76.85 (  0.58%)
Min      SeqWrite-MB/sec-2         74.11 (  0.00%)       73.54 ( -0.77%)
Min      SeqWrite-MB/sec-4         80.05 (  0.00%)       80.13 (  0.10%)
Min      SeqWrite-MB/sec-8         72.88 (  0.00%)       73.20 (  0.44%)
Min      SeqWrite-MB/sec-16        75.91 (  0.00%)       76.44 (  0.70%)
Min      RandWrite-MB/sec-1         1.18 (  0.00%)        1.14 ( -3.39%)
Min      RandWrite-MB/sec-2         1.02 (  0.00%)        1.03 (  0.98%)
Min      RandWrite-MB/sec-4         1.05 (  0.00%)        0.98 ( -6.67%)
Min      RandWrite-MB/sec-8         0.89 (  0.00%)        0.92 (  3.37%)
Min      RandWrite-MB/sec-16        0.92 (  0.00%)        0.93 (  1.09%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 approx-v9
User          645.72      525.90
System        403.85      331.75
Elapsed      6795.36     6783.67

This shows that the series has little or not impact on tiobench which is
desirable and a reduction in system CPU usage. It indicates that the fair
zone allocation policy was removed in a manner that didn't reintroduce
one class of page aging bug. There were only minor differences in overall
reclaim activity

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Minor Faults                    645838      647465
Major Faults                       573         640
Swap Ins                             0           0
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                  46041453    44190646
Normal allocs                 78053072    79887245
Movable allocs                       0           0
Allocation stalls                   24          67
Stall zone DMA                       0           0
Stall zone DMA32                     0           0
Stall zone Normal                    0           2
Stall zone HighMem                   0           0
Stall zone Movable                   0          65
Direct pages scanned             10969       30609
Kswapd pages scanned          93375144    93492094
Kswapd pages reclaimed        93372243    93489370
Direct pages reclaimed           10969       30609
Kswapd efficiency                  99%         99%
Kswapd velocity              13741.015   13781.934
Direct efficiency                 100%        100%
Direct velocity                  1.614       4.512
Percentage direct scans             0%          0%

kswapd activity was roughly comparable. There were differences in direct
reclaim activity but negligible in the context of the overall workload
(velocity of 4 pages per second with the patches applied, 1.6 pages per
second in the baseline kernel).

pgbench read-only large configuration on ext4
---------------------------------------------

pgbench is a database benchmark that can be sensitive to page reclaim
decisions. This also checks if removing the fair zone allocation policy
is safe

pgbench Transactions
                        4.7.0-rc4             4.7.0-rc4
                   mmotm-20160623            nodelru-v8
Hmean    1       188.26 (  0.00%)      189.78 (  0.81%)
Hmean    5       330.66 (  0.00%)      328.69 ( -0.59%)
Hmean    12      370.32 (  0.00%)      380.72 (  2.81%)
Hmean    21      368.89 (  0.00%)      369.00 (  0.03%)
Hmean    30      382.14 (  0.00%)      360.89 ( -5.56%)
Hmean    32      428.87 (  0.00%)      432.96 (  0.95%)

Negligible differences again. As with tiobench, overall reclaim activity
was comparable.

bonnie++ on ext4
----------------

No interesting performance difference, negligible differences on reclaim
stats.

paralleldd on ext4
------------------

This workload uses varying numbers of dd instances to read large amounts of
data from disk.

                               4.7.0-rc3             4.7.0-rc3
                          mmotm-20160623            nodelru-v9
Amean    Elapsd-1       186.04 (  0.00%)      189.41 ( -1.82%)
Amean    Elapsd-3       192.27 (  0.00%)      191.38 (  0.46%)
Amean    Elapsd-5       185.21 (  0.00%)      182.75 (  1.33%)
Amean    Elapsd-7       183.71 (  0.00%)      182.11 (  0.87%)
Amean    Elapsd-12      180.96 (  0.00%)      181.58 ( -0.35%)
Amean    Elapsd-16      181.36 (  0.00%)      183.72 ( -1.30%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 nodelru-v9
User         1548.01     1552.44
System       8609.71     8515.08
Elapsed      3587.10     3594.54

There is little or no change in performance but some drop in system CPU usage.

                             4.7.0-rc3   4.7.0-rc3
                        mmotm-20160623  nodelru-v9
Minor Faults                    362662      367360
Major Faults                      1204        1143
Swap Ins                            22           0
Swap Outs                         2855        1029
DMA allocs                           0           0
DMA32 allocs                  31409797    28837521
Normal allocs                 46611853    49231282
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned          40845270    40869088
Kswapd pages reclaimed        40830976    40855294
Direct pages reclaimed               0           0
Kswapd efficiency                  99%         99%
Kswapd velocity              11386.711   11369.769
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Page writes by reclaim            2855        1029
Page writes file                     0           0
Page writes anon                  2855        1029
Page reclaim immediate             771        1628
Sector Reads                 293312636   293536360
Sector Writes                 18213568    18186480
Page rescued immediate               0           0
Slabs scanned                   128257      132747
Direct inode steals                181          56
Kswapd inode steals                 59        1131

It basically shows that kswapd was active at roughly the same rate in
both kernels. There was also comparable slab scanning activity and direct
reclaim was avoided in both cases. There appears to be a large difference
in numbers of inodes reclaimed but the workload has few active inodes and
is likely a timing artifact.

stutter
-------

stutter simulates a simple workload. One part uses a lot of anonymous
memory, a second measures mmap latency and a third copies a large file.
The primary metric is checking for mmap latency.

stutter
                             4.7.0-rc4             4.7.0-rc4
                        mmotm-20160623            nodelru-v8
Min         mmap     16.6283 (  0.00%)     13.4258 ( 19.26%)
1st-qrtle   mmap     54.7570 (  0.00%)     34.9121 ( 36.24%)
2nd-qrtle   mmap     57.3163 (  0.00%)     46.1147 ( 19.54%)
3rd-qrtle   mmap     58.9976 (  0.00%)     47.1882 ( 20.02%)
Max-90%     mmap     59.7433 (  0.00%)     47.4453 ( 20.58%)
Max-93%     mmap     60.1298 (  0.00%)     47.6037 ( 20.83%)
Max-95%     mmap     73.4112 (  0.00%)     82.8719 (-12.89%)
Max-99%     mmap     92.8542 (  0.00%)     88.8870 (  4.27%)
Max         mmap   1440.6569 (  0.00%)    121.4201 ( 91.57%)
Mean        mmap     59.3493 (  0.00%)     42.2991 ( 28.73%)
Best99%Mean mmap     57.2121 (  0.00%)     41.8207 ( 26.90%)
Best95%Mean mmap     55.9113 (  0.00%)     39.9620 ( 28.53%)
Best90%Mean mmap     55.6199 (  0.00%)     39.3124 ( 29.32%)
Best50%Mean mmap     53.2183 (  0.00%)     33.1307 ( 37.75%)
Best10%Mean mmap     45.9842 (  0.00%)     20.4040 ( 55.63%)
Best5%Mean  mmap     43.2256 (  0.00%)     17.9654 ( 58.44%)
Best1%Mean  mmap     32.9388 (  0.00%)     16.6875 ( 49.34%)

This shows a number of improvements with the worst-case outlier greatly
improved.

Some of the vmstats are interesting

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Swap Ins                           163         502
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                 618719206  1381662383
Normal allocs                891235743   564138421
Movable allocs                       0           0
Allocation stalls                 2603           1
Direct pages scanned            216787           2
Kswapd pages scanned          50719775    41778378
Kswapd pages reclaimed        41541765    41777639
Direct pages reclaimed          209159           0
Kswapd efficiency                  81%         99%
Kswapd velocity              16859.554   14329.059
Direct efficiency                  96%          0%
Direct velocity                 72.061       0.001
Percentage direct scans             0%          0%
Page writes by reclaim         6215049           0
Page writes file               6215049           0
Page writes anon                     0           0
Page reclaim immediate           70673          90
Sector Reads                  81940800    81680456
Sector Writes                100158984    98816036
Page rescued immediate               0           0
Slabs scanned                  1366954       22683

While this is not guaranteed in all cases, this particular test showed
a large reduction in direct reclaim activity. It's also worth noting
that no page writes were issued from reclaim context.

This series is not without its hazards. There are at least three areas
that I'm concerned with even though I could not reproduce any problems in
that area.

1. Reclaim/compaction is going to be affected because the amount of reclaim is
   no longer targetted at a specific zone. Compaction works on a per-zone basis
   so there is no guarantee that reclaiming a few THP's worth page pages will
   have a positive impact on compaction success rates.

2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers
   are called is now different. This may or may not be a problem but if it
   is, it'll be because shrinkers are not called enough and some balancing
   is required.

3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are
   distributed between zones and the fair zone allocation policy used to do
   something very similar for anon. The distribution is now different but not
   necessarily in any way that matters but it's still worth bearing in mind.

VM statistic counters for reclaim decisions are zone-based.  If the kernel
is to reclaim on a per-node basis then we need to track per-node
statistics but there is no infrastructure for that.  The most notable
change is that the old node_page_state is renamed to
sum_zone_node_page_state.  The new node_page_state takes a pglist_data and
uses per-node stats but none exist yet.  There is some renaming such as
vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming
of mod_state to mod_zone_state.  Otherwise, this is mostly a mechanical
patch with no functional change.  There is a lot of similarity between the
node and zone helpers which is unfortunate but there was no obvious way of
reusing the code and maintaining type safety.

Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75ef7184

MD: fix null pointer deference · 5d881783

由 Shaohua Li 提交于 7月 28, 2016

The md device might not have personality (for example, ddf raid array). The
issue is introduced by 8430e7e0(md: disconnect device from personality
before trying to remove it)
Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

5d881783

28 7月, 2016 8 次提交

mailbox: Fix format and type mismatches in Broadcom PDC driver · a68b2166

由 Rob Rice 提交于 7月 28, 2016

Fix format and type mismatches in a couple debug prints in the
Broadcom PDC driver. Use %pad for dma_addr_t and %pa for
resource_size_t.
Signed-off-by: NRob Rice <rob.rice@broadcom.com>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJassi Brar <jaswinder.singh@linaro.org>

a68b2166

sparc: serial: sunhv: fix a double lock bug · 344e3c77

由 Dan Carpenter 提交于 7月 15, 2016

We accidentally take the "port->lock" twice in a row.  This old code
was supposed to be deleted.

Fixes: e58e241c ('sparc: serial: Clean up the locking for -rt')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

344e3c77

crypto: marvell - Update cache with input sg only when it is unmapped · 64ec6ccb

由 Romain Perier 提交于 7月 22, 2016

So far, the cache of the ahash requests was updated from the 'complete'
operation. This complete operation is called from mv_cesa_tdma_process
before the cleanup operation, which means that the content of req->src
can be read and copied when it is still mapped. This commit fixes the
issue by moving this cache update from mv_cesa_ahash_complete to
mv_cesa_ahash_req_cleanup, so the copy is done once the sglist is
unmapped.

Fixes: 1bf6682c ("crypto: marvell - Add a complete operation for..")
Signed-off-by: NRomain Perier <romain.perier@free-electrons.com>
Acked-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

64ec6ccb

crypto: marvell - Don't chain at DMA level when backlog is disabled · 603e989e

由 Romain Perier 提交于 7月 22, 2016

The flag CRYPTO_TFM_REQ_MAY_BACKLOG is optional and can be set from the
user to put requests into the backlog queue when the main cryptographic
queue is full. Before calling mv_cesa_tdma_chain we must check the value
of the return status to be sure that the current request has been
correctly queued or added to the backlog.

Fixes: 85030c51 ("crypto: marvell - Add support for chaining...")
Signed-off-by: NRomain Perier <romain.perier@free-electrons.com>
Acked-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

603e989e

crypto: marvell - Fix memory leaks in TDMA chain for cipher requests · ec38f82e

由 Romain Perier 提交于 7月 22, 2016

So far in mv_cesa_ablkcipher_dma_req_init, if an error is thrown while
the tdma chain is built there is a memory leak. This issue exists
because the chain is assigned later at the end of the function, so the
cleanup function is called with the wrong version of the chain.

Fixes: db509a45 ("crypto: marvell/cesa - add TDMA support")
Signed-off-by: NRomain Perier <romain.perier@free-electrons.com>
Acked-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ec38f82e

mailbox: Add Broadcom PDC mailbox driver · a24532f8

由 Rob Rice 提交于 6月 30, 2016

The Broadcom PDC mailbox driver is a mailbox controller that
manages data transfers to and from one or more offload engines.
Signed-off-by: NRob Rice <rob.rice@broadcom.com>
Reviewed-by: NScott Branden <scott.branden@broadcom.com>
Reviewed-by: NRay Jui <ray.jui@broadcom.com>
Signed-off-by: NJassi Brar <jaswinder.singh@linaro.org>

a24532f8

random: use for_each_online_node() to iterate over NUMA nodes · 59b8d4f1

由 Theodore Ts'o 提交于 7月 27, 2016

This fixes a crash on s390 with fake NUMA enabled.
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Fixes: 1e7f583a ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

59b8d4f1

Add braces to avoid "ambiguous ‘else’" compiler warnings · 194dc870

由 Linus Torvalds 提交于 7月 27, 2016

Some of our "for_each_xyz()" macro constructs make gcc unhappy about
lack of braces around if-statements inside or outside the loop, because
the loop construct itself has a "if-then-else" statement inside of it.

The resulting warnings look something like this:

  drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’:
  drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses]
     if (ctx != dev_priv->kernel_context)
        ^

even if the code itself is fine.

Since the warning is fairly easy to avoid by adding a braces around the
if-statement near the for_each_xyz() construct, do so, rather than
disabling the otherwise potentially useful warning.

(The if-then-else statements used in the "for_each_xyz()" constructs are
designed to be inherently safe even with no braces, but in this case
it's quite understandable that gcc isn't really able to tell that).

This finally leaves the standard "allmodconfig" build with just a
handful of remaining warnings, so new and valid warnings hopefully will
stand out.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

194dc870

27 7月, 2016 7 次提交

ipmi: remove trydefaults parameter and default init · b07b58a3

由 Tony Camuso 提交于 6月 22, 2016

Parameter trydefaults=1 causes the ipmi_init to initialize ipmi through
the legacy port io space that was designated for ipmi. Architectures
that do not map legacy port io can panic when trydefaults=1.

Rather than implement build-time conditional exceptions for each
architecture that does not map legacy port io, we have removed legacy
port io from the driver.

Parameter 'trydefaults' has been removed. Attempts to use it hereafter
will evoke the "Unknown symbol in module, or unknown parameter" message.

The patch was built against a number of architectures and tested for
regressions and functionality on x86_64 and ARM64.
Signed-off-by: NTony Camuso <tcamuso@redhat.com>

Removed the config entry and the address source entry for default,
since neither were used any more.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

b07b58a3

mmc: sdhci-sirf: Remove non needed #ifdef CONFIG_PM* for dev_pm_ops · ee4cf97c

由 Ulf Hansson 提交于 7月 27, 2016

The SIMPLE_DEV_PM_OPS macro deals with the CONFIG_PM options when
assigning the PM callbacks, thus it's not needed to control this when
using the macro. By removing the non needed #ifdef, the code becomes a
bit cleaner.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

ee4cf97c

mmc: sdhci-s3c: Remove non needed #ifdef CONFIG_PM for dev_pm_ops · 6b3a194b

由 Ulf Hansson 提交于 7月 27, 2016

As the SET_SYSTEM_SLEEP_PM_OPS and the SET_RUNTIME_PM_OPS macro deals with
the CONFIG_PM options when assigning the callbacks, it becomes redundant
to control this when declaring the struct dev_pm_ops. Instead let's always
declare it as it simplifies the code.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

6b3a194b

mmc: sdhci-pxav3: Remove non needed #ifdef CONFIG_PM for dev_pm_ops · a81ce772

由 Ulf Hansson 提交于 7月 27, 2016

a81ce772

mmc: sdhci-of-esdhc: Simplify code by using SIMPLE_DEV_PM_OPS · 9e48b336

由 Ulf Hansson 提交于 7月 27, 2016

Let's use the SIMPLE_DEV_PM_OPS macro when declaring/assigning the system
PM callbacks, as the code gets simplified.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

9e48b336

watchdog: gpio_wdt: Fix missing platform_set_drvdata() in gpio_wdt_probe() · 1ac06563

由 Wei Yongjun 提交于 7月 26, 2016

Add missing platform_set_drvdata() in gpio_wdt_probe(), otherwise
calling platform_get_drvdata() in remove returns NULL.

This is detected by Coccinelle semantic patch.
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NWim Van Sebroeck <wim@iguana.be>

1ac06563

mmc: sdhci-acpi: Simplify code by using SET_SYSTEM_SLEEP_PM_OPS · dafed447

由 Ulf Hansson 提交于 7月 27, 2016

By using the SET_SYSTEM_SLEEP_PM_OPS when assigning the system PM
callbacks, we can remove some #ifdefs so code becomes a bit cleaner.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

dafed447

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功