Commits · 8496afaba93ece80a83cbd096f0675a1020ddfc4 · openeuler / raspberrypi-kernel

08 Oct, 2016 16 commits

mm,oom_reaper: do not attempt to reap a task twice · 8496afab

Authored by Tetsuo Handa on Oct 07, 2016

"mm, oom_reaper: do not attempt to reap a task twice" tried to give the
OOM reaper one more chance to retry using MMF_OOM_NOT_REAPABLE flag.
But the usefulness of the flag is rather limited and actually never
shown in practice. If the flag is set, it means that the holder of
mm->mmap_sem cannot call up_write(), presumably because it is blocked in
an unkillable wait for another thread's memory allocation. But since
one of the threads sharing that mm will queue that mm immediately via
the task_will_free_mem() shortcut (otherwise, oom_badness() will select
the same mm again because the oom_score_adj value is unchanged), retrying
an MMF_OOM_NOT_REAPABLE mm is unlikely to help.

Let's always set MMF_OOM_REAPED.

Link: http://lkml.kernel.org/r/1472119394-11342-3-git-send-email-mhocko@kernel.org
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, swap: add swap_cluster_list · 6b534915

Authored by Huang Ying on Oct 07, 2016

This is a code clean up patch without functionality changes.  The
swap_cluster_list data structure and its operations are introduced to
provide some better encapsulation for the free cluster and discard
cluster list operations.  This avoids some code duplication, improves
code readability, and reduces the total line count.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1472067356-16004-1-git-send-email-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: pagewalk: fix the comment for test_walk · f7e2355f

Authored by James Morse on Oct 07, 2016

Modify the comment describing struct mm_walk->test_walk()'s behaviour to
match the comment on walk_page_test() and the behaviour of
walk_page_vma().

Fixes: fafaa426 ("pagewalk: improve vma handling")
Link: http://lkml.kernel.org/r/1471622518-21980-1-git-send-email-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_owner: don't define fields on struct page_ext by hard-coding · 9300d8df

Authored by Joonsoo Kim on Oct 07, 2016

There is a memory waste problem if we define fields on struct page_ext by
hard-coding.  The entry size of struct page_ext includes the size of those
fields even if the feature is disabled at runtime.  Now that extra memory
can be requested at runtime, page_owner doesn't need to define its own
fields by hard-coding.

This patch removes the hard-coded definition and uses extra memory for
storing page_owner information in page_owner.  Most of the code changes
are just mechanical.

Link: http://lkml.kernel.org/r/1471315879-32294-7-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_ext: support extra space allocation by page_ext user · 980ac167

Authored by Joonsoo Kim on Oct 07, 2016

Until now, if a page_ext user wanted its own field in page_ext, the
field had to be defined in struct page_ext by hard-coding.  This
wastes memory in the following situation.

  struct page_ext {
  #ifdef CONFIG_A
  	int a;
  #endif
  #ifdef CONFIG_B
  	int b;
  #endif
  };

Assume that the kernel is built with both CONFIG_A and CONFIG_B.  Even if
we enable feature A and don't enable feature B at runtime, each entry of
struct page_ext takes two ints rather than one.  That is an undesirable
result, so this patch tries to fix it.

To solve the above problem, this patch implements support for extra space
allocation at runtime.  When a user's need() callback returns true, its
extra memory requirement is added to the entry size of page_ext.  Also, the
offset of each user's extra memory space is returned.  With this offset, a
user can use this extra space and there is no need to define the needed
field on page_ext by hard-coding.

This patch only implements the infrastructure.  A following patch will use
it for page_owner, which is the only user with its own fields on page_ext.
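
The offset scheme described above can be modelled in userspace roughly as follows. This is only a sketch: struct ext_base, struct ext_user, layout_entries() and user_data() are hypothetical names invented for illustration, not the kernel's, and alignment of the extra space is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed part that every entry carries. */
struct ext_base { unsigned long flags; };

/* Hypothetical description of one user of the extension area. */
struct ext_user {
	size_t size;          /* extra bytes this user wants per entry */
	size_t offset;        /* filled in: where its bytes start within an entry */
	bool (*need)(void);   /* runtime decision: is the feature enabled? */
};

static bool feature_a_enabled(void) { return true; }
static bool feature_b_enabled(void) { return false; }

/* Sum the requirements of enabled users and hand each one its offset. */
static size_t layout_entries(struct ext_user *users, size_t n)
{
	size_t entry_size = sizeof(struct ext_base);

	for (size_t i = 0; i < n; i++) {
		if (!users[i].need || !users[i].need())
			continue;               /* disabled users cost nothing */
		users[i].offset = entry_size;
		entry_size += users[i].size;
	}
	return entry_size;
}

/* A user reaches its private data by adding its offset to the entry address. */
static void *user_data(void *entry, const struct ext_user *u)
{
	return (char *)entry + u->offset;
}
```

With feature A enabled and feature B disabled, layout_entries() returns the base size plus one int, which is the saving the message describes.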

Link: http://lkml.kernel.org/r/1471315879-32294-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_owner: move page_owner specific function to page_owner.c · e2f612e6

Authored by Joonsoo Kim on Oct 07, 2016

There is no reason for a page_owner-specific function to reside in
vmstat.c.

Link: http://lkml.kernel.org/r/1471315879-32294-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, vmscan: get rid of throttle_vm_writeout · bf484383

Authored by Michal Hocko on Oct 07, 2016

throttle_vm_writeout() was introduced back in 2005 to fix OOMs caused by
excessive pageout activity during the reclaim.  Too many pages could be
put under writeback therefore LRUs would be full of unreclaimable pages
until the IO completes and in turn the OOM killer could be invoked.

There have been some important changes introduced since then in the
reclaim path though.  Writers are throttled by balance_dirty_pages when
initiating the buffered IO and later during the memory pressure, the
direct reclaim is throttled by wait_iff_congested if the node is
considered congested by dirty pages on LRUs and the underlying bdi is
congested by the queued IO.  The kswapd is throttled as well if it
encounters pages marked for immediate reclaim or under writeback which
signals that there are too many pages under writeback already.
Finally should_reclaim_retry does congestion_wait if the reclaim cannot
make any progress and there are too many dirty/writeback pages.

Another important aspect is that we do not issue any IO from the direct
reclaim context anymore.  In a heavy parallel load this could queue a
lot of IO which would be very scattered and thus inefficient which would
just make the problem worse.

These three mechanisms should throttle and keep the amount of IO in a
steady state even under heavy IO and memory pressure so yet another
throttling point doesn't really seem helpful.  Quite the contrary, Mikulas
Patocka has reported that swap backed by dm-crypt doesn't work properly
because the swapout IO cannot make sufficient progress as the writeout
path depends on the dm_crypt worker which has to allocate memory to perform
the encryption.  In order to guarantee forward progress it relies on
the mempool allocator.  mempool_alloc(), however, prefers to use the
underlying (usually page) allocator before it grabs objects from the
pool.  Such an allocation can dive into the memory reclaim and
consequently into throttle_vm_writeout.  If there are too many dirty pages
or pages under writeback it will get throttled even though it is in fact a
flusher to clear pending pages.

  kworker/u4:0    D ffff88003df7f438 10488     6      2	0x00000000
  Workqueue: kcryptd kcryptd_crypt [dm_crypt]
  Call Trace:
    schedule+0x3c/0x90
    schedule_timeout+0x1d8/0x360
    io_schedule_timeout+0xa4/0x110
    congestion_wait+0x86/0x1f0
    throttle_vm_writeout+0x44/0xd0
    shrink_zone_memcg+0x613/0x720
    shrink_zone+0xe0/0x300
    do_try_to_free_pages+0x1ad/0x450
    try_to_free_pages+0xef/0x300
    __alloc_pages_nodemask+0x879/0x1210
    alloc_pages_current+0xa1/0x1f0
    new_slab+0x2d7/0x6a0
    ___slab_alloc+0x3fb/0x5c0
    __slab_alloc+0x51/0x90
    kmem_cache_alloc+0x27b/0x310
    mempool_alloc_slab+0x1d/0x30
    mempool_alloc+0x91/0x230
    bio_alloc_bioset+0xbd/0x260
    kcryptd_crypt+0x114/0x3b0 [dm_crypt]

Let's just drop throttle_vm_writeout altogether.  It is not very
helpful anymore.

I have tried to test a potential writeback IO runaway similar to the one
described in the original patch which has introduced that [1].  Small
virtual machine (512MB RAM, 4 CPUs, 2G of swap space and disk image on a
rather slow NFS in a sync mode on the host) with 8 parallel writers each
writing 1G worth of data.  As soon as the pagecache fills up and the
direct reclaim hits then I start anon memory consumer in a loop
(allocating 300M and exiting after populating it) in the background to
make the memory pressure even stronger as well as to disrupt the steady
state for the IO.  The direct reclaim is throttled because of the
congestion as well as kswapd hitting congestion_wait due to nr_immediate
but throttle_vm_writeout doesn't ever trigger the sleep throughout the
test.  Dirty+writeback are close to nr_dirty_threshold with some
fluctuations caused by the anon consumer.

[1] https://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm3/broken-out/vm-pageout-throttling.patch
Link: http://lkml.kernel.org/r/1471171473-21418-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, compaction: create compact_gap wrapper · 9861a62c

Authored by Vlastimil Babka on Oct 07, 2016

Compaction uses a watermark gap of (2UL << order) pages at various
places and it's not immediately obvious why.  Abstract it through a
compact_gap() wrapper to create a single place with a thorough
explanation.
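
As a rough illustration of the wrapper idea, the gap expression from the message can be placed behind one function. The compact_gap() body below just restates the (2UL << order) expression quoted above; enough_free_for_compaction() is a hypothetical caller added only to show how such a gap might be applied on top of a watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Single place that encodes the gap: twice the size of the requested block. */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

/* Hypothetical check: is there room for the watermark plus the compaction gap? */
static bool enough_free_for_compaction(unsigned long free_pages,
				       unsigned long watermark,
				       unsigned int order)
{
	return free_pages >= watermark + compact_gap(order);
}
```

For an order-9 request the gap is 1024 pages, so a zone with a watermark of 100 pages would need at least 1124 free pages to pass this sketch's check.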

[vbabka@suse.cz: clarify the comment of compact_gap()]
 Link: http://lkml.kernel.org/r/7b6aed1f-fdf8-2063-9ff4-bbe4de712d37@suse.cz
Link: http://lkml.kernel.org/r/20160810091226.6709-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, compaction: add the ultimate direct compaction priority · a8e025e5

Authored by Vlastimil Babka on Oct 07, 2016

During reclaim/compaction loop, it's desirable to get a final answer
from unsuccessful compaction so we can either fail the allocation or
invoke the OOM killer.  However, heuristics such as deferred compaction
or pageblock skip bits can cause compaction to skip parts or whole zones
and lead to premature OOM's, failures or excessive reclaim/compaction
retries.

To remedy this, we introduce a new direct compaction priority called
COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:

 - ignore deferred compaction status for a zone
 - ignore pageblock skip hints
 - ignore cached scanner positions and scan the whole zone

The new priority should eventually get picked up by
should_compact_retry() and this should improve success rates for costly
allocations using __GFP_REPEAT, such as hugetlbfs allocations, and
reduce some corner-case OOM's for non-costly allocations.

Link: http://lkml.kernel.org/r/20160810091226.6709-6-vbabka@suse.cz
[vbabka@suse.cz: use the MIN_COMPACT_PRIORITY alias]
  Link: http://lkml.kernel.org/r/d443b884-87e7-1c93-8684-3a3a35759fb1@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS · cf378319

Authored by Vlastimil Babka on Oct 07, 2016

COMPACT_PARTIAL has historically meant that compaction returned after
doing some work without fully compacting a zone.  It however didn't
distinguish if compaction terminated because it succeeded in creating
the requested high-order page.  This has changed recently and now we
only return COMPACT_PARTIAL when compaction thinks it succeeded, or the
high-order watermark check in compaction_suitable() passes and no
compaction needs to be done.

So at this point we can make the return value clearer by renaming it to
COMPACT_SUCCESS.  The next patch will remove some redundant tests for
success where compaction just returned COMPACT_SUCCESS.

Link: http://lkml.kernel.org/r/20160810091226.6709-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, compaction: cleanup unused functions · 791cae96

Authored by Vlastimil Babka on Oct 07, 2016

Since kswapd compaction moved to kcompactd, compact_pgdat() is not
called anymore, so we remove it.  The only caller of __compact_pgdat()
is compact_node(), so we merge them and remove code that was only
reachable from kswapd.

Link: http://lkml.kernel.org/r/20160810091226.6709-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/vmalloc.c: fix align value calculation error · 252e5c6e

Authored by zijun_hu on Oct 07, 2016

The alignment requirement for __get_vm_area_node() is doubled if the
size parameter is a power of 2 and VM_IOREMAP is set in the flags
parameter, for example
size=0x10000 -> fls_long(0x10000)=17 -> align=0x20000

get_count_order_long() is implemented and can be used instead of
fls_long() to fix the bug, for example size=0x10000 ->
get_count_order_long(0x10000)=16 -> align=0x10000
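
The arithmetic in the two examples can be reproduced with a small userspace model. The functions below are stand-ins written for illustration (the _sketch suffix marks them as not being the kernel implementations), and any clamping the real code applies to the result is left out.

```c
#include <assert.h>

/* 1-based index of the highest set bit; 0 for an input of 0 (fls convention). */
static int fls_long_sketch(unsigned long x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Smallest order such that (1UL << order) >= x; -1 for an input of 0. */
static int count_order_long_sketch(unsigned long x)
{
	if (x == 0)
		return -1;
	if (x & (x - 1))
		return fls_long_sketch(x);      /* not a power of two: round up */
	return fls_long_sketch(x) - 1;          /* exact power of two: keep it */
}
```

For size=0x10000 the first helper yields 17 (align 0x20000) and the second yields 16 (align 0x10000), matching the numbers in the message; the two only differ when the size is an exact power of 2.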

[akpm@linux-foundation.org: s/get_order_long()/get_count_order_long()/]
[zijun_hu@zoho.com: fixes]
 Link: http://lkml.kernel.org/r/57AABC8B.1040409@zoho.com
[akpm@linux-foundation.org: locate get_count_order_long() next to get_count_order()]
[akpm@linux-foundation.org: move get_count_order[_long] definitions to pick up fls_long()]
[zijun_hu@htc.com: move out get_count_order[_long]() from __KERNEL__ scope]
 Link: http://lkml.kernel.org/r/57B2C4CE.80303@zoho.com
Link: http://lkml.kernel.org/r/fc045ecf-20fa-0722-b3ac-9a6140488fad@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: oom: deduplicate victim selection code for memcg and global oom · 7c5f64f8

Authored by Vladimir Davydov on Oct 07, 2016

When selecting an oom victim, we use the same heuristic for both memory
cgroup and global oom. The only difference is the scope of tasks to
select the victim from. So we could just export an iterator over all
memcg tasks and keep all oom related logic in oom_kill.c, but instead we
duplicate pieces of it in memcontrol.c reusing some initially private
functions of oom_kill.c in order to not duplicate all of it. That looks
ugly and error prone, because any modification of select_bad_process
should also be propagated to mem_cgroup_out_of_memory.

Let's rework this as follows: keep all oom heuristic related code private
to oom_kill.c and make oom_kill.c use exported memcg functions when it's
really necessary (like in case of iterating over memcg tasks).

Link: http://lkml.kernel.org/r/1470056933-7505-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

jiffies: add time comparison functions for 64 bit jiffies · 3740dcdf

Authored by Jason A. Donenfeld on Oct 07, 2016

Though the time_before and time_after family of functions were nicely
extended to support jiffies64, so that the interface would be consistent,
it was forgotten to also extend the before/after jiffies functions to
support jiffies64. This commit brings the interface to parity between
jiffies and jiffies64, which is quite convenient.
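
A userspace sketch of what "compare a 64-bit timestamp against now" can look like is given below. All names here (fake_jiffies64, after64(), is_before_now64() and so on) are hypothetical stand-ins chosen for illustration; the signed-difference comparison is the usual wrap-tolerant idiom and is assumed rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a 64-bit tick counter (hypothetical; set by hand in this sketch). */
static uint64_t fake_jiffies64;

/* True if a is later than b; the signed difference keeps this valid across wraparound. */
static bool after64(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

static bool before64(uint64_t a, uint64_t b)
{
	return after64(b, a);
}

/* Has timestamp a already passed, relative to the current counter value? */
static bool is_before_now64(uint64_t a)
{
	return after64(fake_jiffies64, a);
}

/* Is timestamp a still in the future, relative to the current counter value? */
static bool is_after_now64(uint64_t a)
{
	return before64(fake_jiffies64, a);
}
```

The last two helpers are the convenience layer: callers pass only the timestamp and the current counter value is supplied for them.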

Link: http://lkml.kernel.org/r/20160929033319.12188-1-Jason@zx2c4.com
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fanotify: use notification_lock instead of access_lock · 073f6552

Authored by Jan Kara on Oct 07, 2016

Fanotify code has its own lock (access_lock) to protect a list of events
waiting for a response from userspace.

However this is somewhat awkward as the same list_head in the event is
protected by notification_lock if it is part of the notification queue
and by access_lock if it is part of the fanotify private queue which
makes it difficult for any reliable checks in the generic code.  So make
fanotify use the same lock - notification_lock - for protecting its
private event list.

Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fsnotify: convert notification_mutex to a spinlock · c21dbe20

Authored by Jan Kara on Oct 07, 2016

notification_mutex is used to protect the list of pending events.  As such
there's no reason to use a sleeping lock for it.  Convert it to a
spinlock.

[jack@suse.cz: fixed version]
  Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

06 Oct, 2016 2 commits

netfilter: merge fixup for "nf_tables_netdev: remove redundant ip_hdr assignment" · a44c984f

Authored by Stephen Rothwell on Sep 13, 2016

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mm: filemap: don't plant shadow entries without radix tree node · d3798ae8

Authored by Johannes Weiner on Oct 04, 2016

When the underflow checks were added to workingset_node_shadow_dec(),
they triggered immediately:

  kernel BUG at ./include/linux/swap.h:276!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
   soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
  CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60b #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
  RIP: page_cache_tree_insert+0xf1/0x100
  Call Trace:
    __add_to_page_cache_locked+0x12e/0x270
    add_to_page_cache_lru+0x4e/0xe0
    mpage_readpages+0x112/0x1d0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1ad/0x290
    force_page_cache_readahead+0xaa/0x100
    page_cache_sync_readahead+0x3f/0x50
    generic_file_read_iter+0x5af/0x740
    blkdev_read_iter+0x35/0x40
    __vfs_read+0xe1/0x130
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x13/0x8f
  Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
  RIP  page_cache_tree_insert+0xf1/0x100

This is a long-standing bug in the way shadow entries are accounted in
the radix tree nodes. The shrinker needs to know when radix tree nodes
contain only shadow entries, no pages, so node->count is split in half
to count shadows in the upper bits and pages in the lower bits.

Unfortunately, the radix tree implementation doesn't know of this and
assumes all entries are in node->count. When there is a shadow entry
directly in root->rnode and the tree is later extended, the radix tree
implementation will copy that entry into the new node and bump its
node->count, i.e. increase the page count bits. Once the shadow gets
removed and we subtract from the upper counter, node->count underflows
and triggers the warning. Afterwards, without node->count reaching 0
again, the radix tree node is leaked.
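
The accounting mismatch described above can be modelled with a small userspace sketch. The split point COUNT_SHIFT and every function name below are assumptions made for illustration; only the idea of counting shadows in the upper bits and pages in the lower bits of one counter comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define COUNT_SHIFT 16u                         /* hypothetical split point */
#define COUNT_MASK  ((1u << COUNT_SHIFT) - 1)   /* low half counts pages */

struct node { unsigned int count; };

static unsigned int node_pages(const struct node *n)
{
	return n->count & COUNT_MASK;
}

static unsigned int node_shadows(const struct node *n)
{
	return n->count >> COUNT_SHIFT;
}

static void shadow_inc(struct node *n)
{
	n->count += 1u << COUNT_SHIFT;
}

/* Returns false instead of underflowing when no shadow entry is accounted. */
static bool shadow_dec(struct node *n)
{
	if (node_shadows(n) == 0)
		return false;
	n->count -= 1u << COUNT_SHIFT;
	return true;
}

/* What code unaware of the split does when it copies any entry into a new node. */
static void generic_entry_inc(struct node *n)
{
	n->count++;
}
```

If a shadow entry is copied in through generic_entry_inc(), it is recorded as a page, so a later shadow_dec() finds no shadow to subtract. That is the underflow condition the checks caught.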

Limit shadow entries to when we have actual radix tree nodes and can
count them properly. That means we lose the ability to detect refaults
from files that had only the first page faulted in at eviction time.

Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Oct, 2016 22 commits

mfd: arizona: Remove arizona_of_get_named_gpio helper function · 1961531d

Authored by Charles Keepax on Sep 20, 2016

This function is only used in a single place and no new users will be
added as all the device's other required GPIOs are already handled. As
such just merge the code back into the calling function.

Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>

mfd: twl6040: Register child device for twl6040-pdmclk · 0133d323

Authored by Peter Ujfalusi on Aug 31, 2016

The McPDM in OMAP4/5 is using the pdmclk from twl6040 as functional clock.
The twl6040-pdmclk driver provides a clock which can be used to make sure
that the pdmclk is active when the McPDM is in use.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: db8500-prcmu: Remove unused *prcmu_set_ddr_opp() calls · 45ff2b68

Authored by Lee Jones on Sep 14, 2016

There are no call sites for these functions.  Strip them out.

Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: ab8500-debugfs: Prevent initialised field from being over-written · c45eab2c

Authored by Lee Jones on Sep 14, 2016

Due to the lack of parity in the way array fields have been named/
numbered, a mistake was made where more debug fields were declared
than actually existed.  In doing so, 2 fields were added, which
although unclear, were already declared in the array.  The result
was that the latter declarations trashed the former ones.

This patch places the array back in the correct order and removes
the offending NULL entries.

While we're at it, let's ensure this doesn't happen again by naming
each field properly and add a new *_LAST define to describe how
many fields there should be.

Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: rk808: Fix RK818_IRQ_DISCHG_ILIM initializer · fae5e033

Authored by Arnd Bergmann on Sep 06, 2016

When building with -Woverride-init, we get a warning about an incorrect
initializer:

drivers/mfd/rk808.c:244:8: error: initialized field overwritten [-Werror=override-init]
  [RK818_IRQ_DISCHG_ILIM] = {

This is clearly a mistake, as both RK818_IRQ_DISCHG_ILIM and RK818_IRQ_USB_OV
are defined as '7', but they refer to different register bits. Changing
RK818_IRQ_DISCHG_ILIM to 15 is consistent with how all other 14 interrupts are
handled here, so I'm assuming this is what it should have been.

Fixes: 2eedcbfc ("mfd: rk808: Add RK818 support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andy Yan <andy.yan@rock-chips.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: lp873x: Remove unused mutex lock from struct lp873x · fe62c477

Authored by Axel Lin on Sep 06, 2016

The mutex is not used, so remove it.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: tps65217: Add support for IRQs · 6556bdac

Authored by Marcin Niestroj on Sep 09, 2016

Add support for handling IRQs: power button, AC and USB power state
changes. Mask and interrupt bits are shared within one register, which
prevents us from using the regmap_irq implementation. A new irq_domain is
created in order to add interrupt handling for each of the tps65217's
subsystems. IRQ resources have been added for the charger subsystem so
that it can be notified about AC and USB state changes.

Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: Add Samsung Exynos Low Power Audio Subsystem driver · c695abab

Authored by Sylwester Nawrocki on Aug 10, 2016

This patch adds a common driver for the Top block of the Samsung Exynos
SoC Low Power Audio Subsystem.  This is a minimal driver which prepares
resources for IP blocks like I2S, audio DMA and UART and exposes
a regmap for the Top block registers.  System power ops are also added
to ensure the Audio Subsystem is operational after a system
suspend/resume cycle.

Signed-off-by: Inha Song <ideal.song@samsung.com>
Signed-off-by: Beomho Seo <beomho.seo@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: max14577: Change Krzysztof Kozlowski's email to kernel.org · 8c5d0571

Authored by Krzysztof Kozlowski on Aug 17, 2016

Change my email address to kernel.org instead of Samsung one for the
purpose of any future contact.  The copyrights remain untouched and are
attributed to Samsung.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: 88pm80x: Double shifting bug in suspend/resume · 9a6dc644

Authored by Dan Carpenter on Aug 04, 2016

set_bit() and clear_bit() take the bit number so this code is really
doing "1 << (1 << irq)" which is a double shift bug.  It's done
consistently so it won't cause a problem unless "irq" is more than 4.
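
The double shift can be shown with a userspace model. set_bit_nr() and clear_bit_nr() below are simplified, non-atomic stand-ins that, like the helpers named in the message, take a bit number; buggy_mark() and fixed_mark() are hypothetical callers written only to contrast the two call patterns.

```c
#include <assert.h>
#include <limits.h>

/* Simplified stand-ins: both take a bit NUMBER, not a mask. */
static void set_bit_nr(unsigned int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}

static void clear_bit_nr(unsigned int nr, unsigned long *word)
{
	*word &= ~(1UL << nr);
}

/* Buggy pattern: a mask is passed in, so the helper shifts a second time. */
static unsigned long buggy_mark(unsigned int irq)
{
	unsigned long word = 0;
	unsigned int nr = 1u << irq;

	/* This sketch refuses bit numbers that do not fit in one word. */
	assert(nr < sizeof(word) * CHAR_BIT);
	set_bit_nr(nr, &word);
	return word;
}

/* Fixed pattern: the bit number is passed straight through. */
static unsigned long fixed_mark(unsigned int irq)
{
	unsigned long word = 0;

	set_bit_nr(irq, &word);
	return word;
}
```

For irq 3 the fixed call sets bit 3 (0x8) while the buggy call sets bit 8 (0x100). Because set and clear were wrong in the same way they stayed paired, which is why the message says the bug only bites for larger irq values.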

Fixes: 70c6cce0 ('mfd: Support 88pm80x in 80x driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: da9063: Update author information to remove incorrect e-mail addresses · 37778d83

Authored by Steve Twiss on Aug 08, 2016

Remove incorrect e-mail addresses from the copyright header and
MODULE_AUTHOR() macro. These e-mail addresses are no longer in use.

The author names have not been changed, only the e-mail addresses have
been deleted from the source files.

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

mfd: arizona: Add gating of external MCLKn clocks · cdd8da8c

Authored by Sylwester Nawrocki on Sep 02, 2016

This patch adds requesting of the clocks supplied on the MCLK1 and MCLK2
pins.  Gating of the 32k clock is added to the arizona_clk32k_enable()
and arizona_clk32k_disable() helpers.

It's a temporary change until the CODEC's clock controller gets exposed
through the clk API and is helpful for board configurations where the
MCLK clocks are not provided by always-on oscillators.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>

net/ncsi: Introduce ncsi_stop_dev() · c0cd1ba4

Authored by Gavin Shan on Oct 04, 2016

This introduces ncsi_stop_dev(), as the counterpart to ncsi_start_dev(),
to stop the NCSI device so that it can be re-enabled in the future. This
API should be called when the network device driver is going to
shut down the device. There are 3 things done in the function: stop
the channel monitoring; reset channels to inactive state; report
NCSI link down.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add Edge-rate driver for Microsemi PHYs. · a4cc96d1

Authored by Raju Lakkaraju on Oct 03, 2016

Edge-rate:
As system and networking speeds increase, a signal's output transition,
also known as the edge rate or slew rate (V/ns), takes on greater importance
because high-speed signals come with a price. That price is an assortment of
interference problems like ringing on the line, signal overshoot and
undershoot, extended signal settling times, crosstalk noise, transmission
line reflections, false signal detection by the receiving device and
electromagnetic interference (EMI) -- all of which can negate the potential
gains designers are seeking when they try to increase system speeds through
the use of higher performance logic devices. The fact is, faster signaling
edge rates can cause a higher level of electrical noise or other types of
interference that can actually lead to slower line speeds and lower maximum
system frequencies. This parameter allows board designers to change the
driving strength, and thereby change the EMI behavior.

Edge-rate parameters (vddmac, edge-slowdown) are read from the Device Tree.

Tested on Beaglebone Black with VSC 8531 PHY.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Using BUG_ON() as an assert() is _never_ acceptable · 21f54dda

Authored by Linus Torvalds on Oct 03, 2016

That just generally kills the machine, and makes debugging only much
harder, since the traces may long be gone.

Debugging by assert() is a disease.  Don't do it.  If you can continue,
you're much better off doing so with a live machine where you have a
much higher chance that the report actually makes it to the system logs,
rather than result in a machine that is just completely dead.

The only valid situation for BUG_ON() is when continuing is not an
option, because there is massive corruption.  But if you are just
verifying that something is true, you warn about your broken assumptions
(preferably just once), and limp on.
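
A userspace sketch of the "warn once and limp on" pattern might look like the following. warn_once() and dec_checked() are hypothetical names, and this warn_once() keeps a single flag for the whole program, which is a simplification compared with a per-call-site check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for demonstration only). */
static int warnings_printed;

/* Report the first violation, stay silent afterwards, and never abort. */
static bool warn_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Decrement a counter; on a broken assumption, warn and leave it unchanged. */
static unsigned int dec_checked(unsigned int count)
{
	if (warn_once(count == 0, "counter underflow"))
		return count;
	return count - 1;
}
```

Calling dec_checked(0) repeatedly prints one warning and returns 0 each time, so the caller keeps running and the report has a chance to reach the logs.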

Fixes: 22f2ac51 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

qed: Add RoCE ll2 & GSI support · abd49676

Authored by Ram Amrani on Oct 01, 2016

Add the RoCE-specific LL2 logic [as well as GSI support] over
the 'generic' LL2 interface.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for memory registeration verbs · ee8eaea3

Authored by Ram Amrani on Oct 01, 2016

Add slowpath configuration support for user, dma and memory
regions registration.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for QP verbs · f1093940

Authored by Ram Amrani on Oct 01, 2016

Add support for the slowpath configurations of Queue Pair verbs
which add, delete, modify and query Queue Pairs.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: PD,PKEY and CQ verb support · c295f86e

Authored by Ram Amrani on Oct 01, 2016

Add support for the configurations of the protection domain and
completion queues.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for RoCE hw init · 51ff1725

Authored by Ram Amrani on Oct 01, 2016

This adds the backbone required for the various HW initializations
which are necessary for the qedr driver - FW notification, resource
initializations, etc.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Add qedr framework · cee9fbd8

Authored by Ram Amrani on Oct 01, 2016

Adds a skeletal implementation of the qede RoCE driver.
The qedr has some dependencies on the state of the underlying base
interface. This adds some logic required for mutual registration
and the ability to pass updates on 'interesting' events.

Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add Light L2 support · 0a7fb11c

Authored by Yuval Mintz on Oct 01, 2016

Other protocols besides the networking driver need the ability
to pass some L2 traffic, usually [although not exclusively] for the
purpose of management traffic.

Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>