提交 · cc252eae85e09552f9c1e7ac0c3227f835efdf2d · openeuler / Kernel

27 10月, 2018 13 次提交

mm, slab: combine kmalloc_caches and kmalloc_dma_caches · cc252eae

由 Vlastimil Babka 提交于 10月 26, 2018

Patch series "kmalloc-reclaimable caches", v4.

As discussed at LSF/MM [1] here's a patchset that introduces
kmalloc-reclaimable caches (more details in the second patch) and uses
them for dcache external names.  That allows us to repurpose the
NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.

With patch 3/6, dcache external names are allocated from kmalloc-rcl-*
caches, eliminating the need for manual accounting.  More importantly, it
also ensures the reclaimable kmalloc allocations are grouped in pages
separate from the regular kmalloc allocations.  The need for proper
accounting of dcache external names has shown it's easy for misbehaving
process to allocate lots of them, causing premature OOMs.  Without the
added grouping, it's likely that a similar workload can interleave the
dcache external names allocations with regular kmalloc allocations (note:
I haven't searched myself for an example of such regular kmalloc
allocation, but I would be very surprised if there wasn't some).  A
pathological case would be e.g.  one 64byte regular allocations with 63
external dcache names in a page (64x64=4096), which means the page is not
freed even after reclaiming after all dcache names, and the process can
thus "steal" the whole page with single 64byte allocation.

If other kmalloc users similar to dcache external names become identified,
they can also benefit from the new functionality simply by adding
__GFP_RECLAIMABLE to the kmalloc calls.

Side benefits of the patchset (that could be also merged separately)
include removed branch for detecting __GFP_DMA kmalloc(), and shortening
kmalloc cache names in /proc/slabinfo output.  The latter is potentially
an ABI break in case there are tools parsing the names and expecting the
values to be in bytes.

This is how /proc/slabinfo looks like after booting in virtme:

...
kmalloc-rcl-4M         0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
...
kmalloc-rcl-96         7     32    128   32    1 : tunables  120   60    8 : slabdata      1      1      0
kmalloc-rcl-64        25    128     64   64    1 : tunables  120   60    8 : slabdata      2      2      0
kmalloc-rcl-32         0      0     32  124    1 : tunables  120   60    8 : slabdata      0      0      0
kmalloc-4M             0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
kmalloc-2M             0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
kmalloc-1M             0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
...

/proc/vmstat with renamed nr_indirectly_reclaimable_bytes counter:

...
nr_slab_reclaimable 2817
nr_slab_unreclaimable 1781
...
nr_kernel_misc_reclaimable 0
...

/proc/meminfo with new KReclaimable counter:

...
Shmem:               564 kB
KReclaimable:      11260 kB
Slab:              18368 kB
SReclaimable:      11260 kB
SUnreclaim:         7108 kB
KernelStack:        1248 kB
...

This patch (of 6):

The kmalloc caches currently mainain separate (optional) array
kmalloc_dma_caches for __GFP_DMA allocations.  There are tests for
__GFP_DMA in the allocation hotpaths.  We can avoid the branches by
combining kmalloc_caches and kmalloc_dma_caches into a single
two-dimensional array where the outer dimension is cache "type".  This
will also allow to add kmalloc-reclaimable caches as a third type.

Link: http://lkml.kernel.org/r/20180731090649.16028-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cc252eae

mm: remove vm_insert_pfn() · ae2b01f3

由 Matthew Wilcox 提交于 10月 26, 2018

All callers are now converted to vmf_insert_pfn() so convert
vmf_insert_pfn() from being a compatibility wrapper around vm_insert_pfn()
to being a compatibility wrapper around vmf_insert_pfn_prot().

Link: http://lkml.kernel.org/r/20180828145728.11873-8-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae2b01f3

mm: remove references to vm_insert_pfn() · 67fa1666

由 Matthew Wilcox 提交于 10月 26, 2018

Documentation and comments.

Link: http://lkml.kernel.org/r/20180828145728.11873-7-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

67fa1666

mm: make vm_insert_pfn_prot() static · bc12e6ad

由 Matthew Wilcox 提交于 10月 26, 2018

Now this is no longer used outside mm/memory.c, make it static.

Link: http://lkml.kernel.org/r/20180828145728.11873-6-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bc12e6ad

mm: introduce vmf_insert_pfn_prot() · f5e6d1d5

由 Matthew Wilcox 提交于 10月 26, 2018

Like vm_insert_pfn_prot(), but returns a vm_fault_t instead of an errno.
Also unexport vm_insert_pfn_prot as it has no modular users.

Link: http://lkml.kernel.org/r/20180828145728.11873-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5e6d1d5

mm: remove vm_insert_mixed() · 5d747637

由 Matthew Wilcox 提交于 10月 26, 2018

All callers are now converted to vmf_insert_mixed() so convert
vmf_insert_mixed() from being a compatibility wrapper into the real
function.

Link: http://lkml.kernel.org/r/20180828145728.11873-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d747637

Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks" · 4e15a073

由 Michal Hocko 提交于 10月 26, 2018

Revert 5ff7091f ("mm, mmu_notifier: annotate mmu notifiers with
blockable invalidate callbacks").

MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no
longer needed since 93065ac7 ("mm, oom: distinguish blockable mode for
mmu notifiers").  We now have a full support for per range !blocking
behavior so we can drop the stop gap workaround which the per notifier
flag was used for.

Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e15a073

mm, mmu_notifier: be explicit about range invalition non-blocking mode · 33490af3

由 Michal Hocko 提交于 10月 26, 2018

If invalidate_range_start() is called for !blocking mode then all
callbacks have to guarantee they will no block/sleep.  The same obviously
applies to invalidate_range_end because this operation pairs with the
former and they are called from the same context.  Make sure this is
appropriately documented.

Link: http://lkml.kernel.org/r/20180827112623.8992-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NJerome Glisse <jglisse@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

33490af3

mm: don't miss the last page because of round-off error · 68600f62

由 Roman Gushchin 提交于 10月 26, 2018

I've noticed, that dying memory cgroups are often pinned in memory by a
single pagecache page.  Even under moderate memory pressure they sometimes
stayed in such state for a long time.  That looked strange.

My investigation showed that the problem is caused by applying the LRU
pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator, if the initial scan
size is 1, the result is always 0.

This means the last page is not scanned and has
no chances to be reclaimed.

Fix this by rounding up the result of the division.

In practice this change significantly improves the speed of dying cgroups
reclaim.

[guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
  Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68600f62

mm: rework memcg kernel stack accounting · 9b6f7e16

由 Roman Gushchin 提交于 10月 26, 2018

If CONFIG_VMAP_STACK is set, kernel stacks are allocated using
__vmalloc_node_range() with __GFP_ACCOUNT.  So kernel stack pages are
charged against corresponding memory cgroups on allocation and uncharged
on releasing them.

The problem is that we do cache kernel stacks in small per-cpu caches and
do reuse them for new tasks, which can belong to different memory cgroups.

Each stack page still holds a reference to the original cgroup, so the
cgroup can't be released until the vmap area is released.

To make this happen we need more than two subsequent exits without forks
in between on the current cpu, which makes it very unlikely to happen.  As
a result, I saw a significant number of dying cgroups (in theory, up to 2
* number_of_cpu + number_of_tasks), which can't be released even by
significant memory pressure.

As a cgroup structure can take a significant amount of memory (first of
all, per-cpu data like memcg statistics), it leads to a noticeable waste
of memory.

Link: http://lkml.kernel.org/r/20180827162621.30187-1-guro@fb.com
Fixes: ac496bf4 ("fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y")
Signed-off-by: NRoman Gushchin <guro@fb.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NMichal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b6f7e16

fs/iomap.c: change return type to vm_fault_t · 5780a02f

由 Souptick Joarder 提交于 10月 26, 2018

Change iomap_page_mkwrite() return type to vm_fault_t.

see commit 1c8f4220 ("mm: change return type to vm_fault_t") for
reference.

Link: http://lkml.kernel.org/r/20180827172050.GA18673@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5780a02f

include/linux/linkage.h: align weak symbols · 74f213ea

由 Andrey Ryabinin 提交于 10月 26, 2018

Since WEAK() supposed to be used instead of ENTRY() to define weak
symbols, but unlike ENTRY() it doesn't have ALIGN directive. It seems
there is no actual reason to not have, so let's add ALIGN to WEAK() too.

Link: http://lkml.kernel.org/r/20180920135631.23833-1-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Will Deacon <will.deacon@arm.com>, Catalin Marinas <catalin.marinas@arm.com>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

74f213ea

include/linux/pfn_t.h: force '~' to be parsed as an unary operator · 4d54954a

由 Sebastien Boisvert 提交于 10月 26, 2018

Tracing the event "fs_dax:dax_pmd_insert_mapping" with perf produces this
warning:

      [fs_dax:dax_pmd_insert_mapping] unknown op '~'

It is printed in process_op (tools/lib/traceevent/event-parse.c) because
'~' is parsed as a binary operator.

perf reads the format of fs_dax:dax_pmd_insert_mapping ("print fmt") from
/sys/kernel/debug/tracing/events/fs_dax/dax_pmd_insert_mapping/format .

The format contains:

~(((u64) ~(~(((1UL) << 12)-1)))
         ^
         \ interpreted as a binary operator by process_op().

This part is generated in the declaration of the event class
dax_pmd_insert_mapping_class in include/trace/events/fs_dax.h :

		__print_flags_u64(__entry->pfn_val & PFN_FLAGS_MASK, "|",
			PFN_FLAGS_TRACE),

This patch adds a pair of parentheses in the declaration of PFN_FLAGS_MASK
to make sure that '~' is parsed as a unary operator by perf.

The part of the format that was problematic is now:

~(((u64) (~(~(((1UL) << 12)-1))))

Now, all the '~' are parsed as unary operators.

Link: http://lkml.kernel.org/r/20181021145939.8760-1-sebhtml@videotron.qc.caSigned-off-by: NSebastien Boisvert <sebhtml@videotron.qc.ca>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Elenie Godzaridis <arangradient@gmail.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d54954a

23 10月, 2018 11 次提交

mfd: ti-lmu: Switch to GPIOD · 7a6a395b

由 Pavel Machek 提交于 9月 11, 2018

Use new descriptor based API instead of the legacy one.
Signed-off-by: NSebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

7a6a395b

mfd: max8997: Enale irq-wakeup unconditionally · efddff27

由 Marek Szyprowski 提交于 9月 05, 2018

IRQ wake up support for MAX8997 driver was initially configured by
respective property in pdata. However, after the driver conversion to
device-tree, setting it was left as 'todo'. Nowadays most of other PMIC MFD
drivers initialized from device-tree assume that they can be an irq wakeup
source, so enable it also for MAX8997. This fixes support for wakeup from
MAX8997 RTC alarm.
Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

efddff27

mfd: Convert Intel PMIC drivers to use SPDX identifier · 26c7e05a

由 Andy Shevchenko 提交于 8月 30, 2018

1;5201;0c
Reduce size of duplicated comments by switching to use SPDX identifier.

No functional change.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

26c7e05a

mfd: intel_soc_pmic_bxtwc: Chain power button IRQs as well · 9f8ddee1

由 Andy Shevchenko 提交于 8月 30, 2018

Power button IRQ actually has a second level of interrupts to
distinguish between UI and POWER buttons. Moreover, current
implementation looks awkward in approach to handle second level IRQs by
first level related IRQ chip.

To address above issues, split power button IRQ to be chained as well.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

9f8ddee1

mfd: mc13xxx-core: Fix PMIC shutdown when reading ADC values · 55143439

由 Fabio Estevam 提交于 8月 28, 2018

When trying to read any MC13892 ADC channel on a imx51-babbage board:

The MC13892 PMIC shutdowns completely.

After debugging this issue and comparing the MC13892 and MC13783
initializations done in the vendor kernel, it was noticed that the
CHRGRAWDIV bit of the ADC0 register was not being set.

This bit is set by default after power on, but the driver was
clearing it.

After setting this bit it is possible to read the ADC values correctly.
Signed-off-by: NFabio Estevam <fabio.estevam@nxp.com>
Tested-by: NChris Healy <cphealy@gmail.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

55143439

mfd: madera: Remove unused forward reference · 6360e40f

由 Richard Fitzgerald 提交于 8月 28, 2018

The madera_irqchip_pdata struct was replaced by the irq_flags
member of struct madera_pdata so the forward reference is
obsolete.
Signed-off-by: NRichard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

6360e40f

mfd: Add ingenic-tcu.h header · 3d51ec93

由 Paul Cercueil 提交于 8月 21, 2018

This header contains macros for the registers that are present in the
regmap shared by all the drivers related to the TCU (Timer Counter Unit)
of the Ingenic JZ47xx SoCs.
Signed-off-by: NPaul Cercueil <paul@crapouillou.net>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

3d51ec93

mfd: maxim: Add SPDX license identifiers · d7d8d7a2

由 Krzysztof Kozlowski 提交于 8月 07, 2018

Replace GPL v2.0+ license statements with SPDX license identifiers.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

d7d8d7a2

mfd: sec-core: Add SPDX license identifiers · 39b27ad9

由 Krzysztof Kozlowski 提交于 8月 07, 2018

Replace GPL v2.0+ license statements with SPDX license identifiers.
Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: NLee Jones <lee.jones@linaro.org>

39b27ad9

umh: Add command line to user mode helpers · 876dcf2f

由 Olivier Brunel 提交于 10月 20, 2018

User mode helpers were spawned without a command line, and because
an empty command line is used by many tools to identify processes as
kernel threads, this could cause some issues.

Notably during killing spree on shutdown, since such helper would then
be skipped (i.e. not killed) which would result in the process remaining
alive, and thus preventing unmouting of the rootfs (as experienced with
the bpfilter umh).

Fixes: 449325b5 ("umh: introduce fork_usermode_blob() helper")
Signed-off-by: NOlivier Brunel <jjk@jjacky.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

876dcf2f

f2fs: guarantee journalled quota data by checkpoint · af033b2a

由 Chao Yu 提交于 9月 20, 2018

For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

af033b2a

22 10月, 2018 1 次提交

pstore/ram: Clarify resource reservation labels · 1227daa4

由 Kees Cook 提交于 10月 17, 2018

When ramoops reserved a memory region in the kernel, it had an unhelpful
label of "persistent_memory". When reading /proc/iomem, it would be
repeated many times, did not hint that it was ramoops in particular,
and didn't clarify very much about what each was used for:

400000000-407ffffff : Persistent Memory (legacy)
  400000000-400000fff : persistent_memory
  400001000-400001fff : persistent_memory
...
  4000ff000-4000fffff : persistent_memory

Instead, this adds meaningful labels for how the various regions are
being used:

400000000-407ffffff : Persistent Memory (legacy)
  400000000-400000fff : ramoops:dump(0/252)
  400001000-400001fff : ramoops:dump(1/252)
...
  4000fc000-4000fcfff : ramoops:dump(252/252)
  4000fd000-4000fdfff : ramoops:console
  4000fe000-4000fe3ff : ramoops:ftrace(0/3)
  4000fe400-4000fe7ff : ramoops:ftrace(1/3)
  4000fe800-4000febff : ramoops:ftrace(2/3)
  4000fec00-4000fefff : ramoops:ftrace(3/3)
  4000ff000-4000fffff : ramoops:pmsg
Signed-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: NSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Tested-by: NGuenter Roeck <groeck@chromium.org>

1227daa4

21 10月, 2018 3 次提交

blkcg: reassociate bios when make_request() is called recursively · d459d853

由 Dennis Zhou 提交于 10月 20, 2018

When submitting a bio, multiple recursive calls to make_request() may
occur. This causes the initial associate done in blkcg_bio_issue_check()
to be incorrect and reference the prior request_queue. This introduces
a helper to do reassociation when make_request() is recursively called.

Fixes: a7b39b4e ("blkcg: always associate a bio with a blkg")
Reported-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: NDennis Zhou <dennis@kernel.org>
Tested-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d459d853

blkcg: fix edge case for blk_get_rl() under memory pressure · b2c3fa54

由 Dennis Zhou 提交于 10月 20, 2018

It is possible for blkg creation to fail when in blk_get_rl(). In this
situation, the fallback logic returns the nearest created blkg. There is
however special handling for the request_list for the root blkcg. This
fixes the missing edge case from the earlier series changing
blk_get_rl().

Fixes: e2b09899 ("blkcg: cleanup and make blk_get_rl use blkg_lookup_create")
Signed-off-by: NDennis Zhou <dennis@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b2c3fa54

bpf: sk_msg program helper bpf_msg_push_data · 6fff607e

由 John Fastabend 提交于 10月 19, 2018

This allows user to push data into a msg using sk_msg program types.
The format is as follows,

	bpf_msg_push_data(msg, offset, len, flags)

this will insert 'len' bytes at offset 'offset'. For example to
prepend 10 bytes at the front of the message the user can,

	bpf_msg_push_data(msg, 0, 10, 0);

This will invalidate data bounds so BPF user will have to then recheck
data bounds after calling this. After this the msg size will have been
updated and the user is free to write into the added bytes. We allow
any offset/len as long as it is within the (data, data_end) range.
However, a copy will be required if the ring is full and its possible
for the helper to fail with ENOMEM or EINVAL errors which need to be
handled by the BPF program.

This can be used similar to XDP metadata to pass data between sk_msg
layer and lower layers.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6fff607e

20 10月, 2018 7 次提交

net: phy: micrel: add Microchip KSZ9131 initial driver · bff5b4b3

由 Yuiko Oshino 提交于 10月 18, 2018

Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY
Signed-off-by: NYuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bff5b4b3

netpoll: allow cleanup to be synchronous · c9fbd71f

由 Debabrata Banerjee 提交于 10月 18, 2018

This fixes a problem introduced by:
commit 2cde6acd ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")

When using netconsole on a bond, __netpoll_cleanup can asynchronously
recurse multiple times, each __netpoll_free_async call can result in
more __netpoll_free_async's. This means there is now a race between
cleanup_work queues on multiple netpoll_info's on multiple devices and
the configuration of a new netpoll. For example if a netconsole is set
to enable 0, reconfigured, and enable 1 immediately, this netconsole
will likely not work.

Given the reason for __netpoll_free_async is it can be called when rtnl
is not locked, if it is locked, we should be able to execute
synchronously. It appears to be locked everywhere it's called from.

Generalize the design pattern from the teaming driver for current
callers of __netpoll_free_async.

CC: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9fbd71f

bpf: skmsg, fix psock create on existing kcm/tls port · 5032d079

由 John Fastabend 提交于 10月 18, 2018

Before using the psock returned by sk_psock_get() when adding it to a
sockmap we need to ensure it is actually a sockmap based psock.
Previously we were only checking this after incrementing the reference
counter which was an error. This resulted in a slab-out-of-bounds
error when the psock was not actually a sockmap type.

This moves the check up so the reference counter is only used
if it is a sockmap psock.

Eric reported the following KASAN BUG,

BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387

CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
 refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
 sk_psock_get include/linux/skmsg.h:379 [inline]
 sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178
 sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669
 sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738
 map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

5032d079

bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB · b39b5f41

由 Song Liu 提交于 10月 19, 2018

BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.

Two helper functions bpf_compute_and_save_data_end() and
bpf_restore_data_end() are introduced. There are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

b39b5f41

bpf: add queue and stack maps · f1a2e44a

由 Mauricio Vasquez B 提交于 10月 18, 2018

Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs.
These maps support peek, pop and push operations that are exposed to eBPF
programs through the new bpf_map[peek/pop/push] helpers.  Those operations
are exposed to userspace applications through the already existing
syscalls in the following way:

BPF_MAP_LOOKUP_ELEM            -> peek
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
BPF_MAP_UPDATE_ELEM            -> push

Queue/stack maps are implemented using a buffer, tail and head indexes,
hence BPF_F_NO_PREALLOC is not supported.

As opposite to other maps, queue and stack do not use RCU for protecting
maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE
argument that is a pointer to a memory zone where to save the value of a
map.  Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not
be passed as an extra argument.

Our main motivation for implementing queue/stack maps was to keep track
of a pool of elements, like network ports in a SNAT, however we forsee
other use cases, like for exampling saving last N kernel events in a map
and then analysing from userspace.
Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

f1a2e44a

bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE · 2ea864c5

由 Mauricio Vasquez B 提交于 10月 18, 2018

ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone
used to save the value of a map.  Basically the same as
ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra
argument.

This will be used in the following patch that implements some new
helpers that receive a pointer to be filled with a map value.
Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

2ea864c5

bpf: rename stack trace map operations · 14499160

由 Mauricio Vasquez B 提交于 10月 18, 2018

In the following patches queue and stack maps (FIFO and LIFO
datastructures) will be implemented.  In order to avoid confusion and
a possible name clash rename stack_map_ops to stack_trace_map_ops
Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

14499160

19 10月, 2018 5 次提交

regulator: bd718x7: Remove struct bd718xx_pmic · bcb047eb

由 Axel Lin 提交于 10月 04, 2018

All the fields in struct bd718xx_pmic are not really necessary.
Remove struct bd718xx_pmic to simplify the code.
Signed-off-by: NAxel Lin <axel.lin@ingics.com>
Reviewed-by: NMatti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

bcb047eb

regmap: Add regmap_noinc_write API · cdf6b11d

由 Ben Whitten 提交于 10月 19, 2018

The regmap API had a noinc_read function added for instances where devices
supported returning data from an internal FIFO in a single read.

This commit adds the noinc_write variant to allow writing to a non
incrementing register, this is used in devices such as the sx1301 for
loading firmware.
Signed-off-by: NBen Whitten <ben.whitten@lairdtech.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

cdf6b11d

locking/lockdep: Make global debug_locks* variables read-mostly · 01a14bda

由 Waiman Long 提交于 10月 18, 2018

Make the frequently used lockdep global variable debug_locks read-mostly.
As debug_locks_silent is sometime used together with debug_locks,
it is also made read-mostly so that they can be close together.

With false cacheline sharing, cacheline contention problem can happen
depending on what get put into the same cacheline as debug_locks.
Signed-off-by: NWaiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1539913518-15598-2-git-send-email-longman@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

01a14bda

net/mlx5: Added "per_lane_error_counters" cap bit to PCAM · 67daf118

由 Shay Agroskin 提交于 9月 30, 2018

Added "Per lane raw errors" capability bit in
Ports Capabilities Mask (PCAM) enhanced features
layout.

This bit determines if the fields "phy_raw_errors_laneX"
in "Physical Layer statistical" counters group are supported.
Signed-off-by: NShay Agroskin <shayag@mellanox.com>
Reviewed-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

67daf118

net/mlx5: Add FEC fields to Port Phy Link Mode (PPLM) reg · 4b5b9c7d

由 Shay Agroskin 提交于 10月 09, 2018

Added FEC related fields to PPLM layout.
These fields are needed to set and query FEC policy
for different link speeds.
Signed-off-by: NShay Agroskin <shayag@mellanox.com>
Reviewed-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

4b5b9c7d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功