提交 · b9bb6fb73b3e112d241a5edd146740be9a0c3cc0 · gsplhtlxg / clone-Linux

23 4月, 2015 1 次提交

Revert "mm: avoid tail page refcounting on non-THP compound pages" · f3ca10dd

由 Linus Torvalds 提交于 4月 22, 2015

This reverts commit 8d63d99a.

It causes in VM mapping refcount errors:

  page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
  flags: 0x8000000000008014(referenced|dirty|tail)
  page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at mm/swap.c:134!

as reported by Borislav Petkov
Reported-and-tested-by: NBorislav Petkov <bp@alien8.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3ca10dd

21 4月, 2015 2 次提交

net: add skb_checksum_complete_unset · 4e18b9ad

由 Tom Herbert 提交于 4月 20, 2015

This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE
is set. This is called to discard checksum-complete when packet
is being modified and checksum is not pulled for headers in a layer.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e18b9ad

smp: don't use 16-bit words for atomic accesses · f4d03bd1

由 Linus Torvalds 提交于 4月 20, 2015

Yes, it should work, but it's a bad idea. Not only did ARM64 not have
the 16-bit access code (there's a separate patch to add it), it's just
not a good atomic type. Some architectures fundamentally don't do
atomic accesses in them (alpha), and it's not like it saves any space
here anyway because of structure packing issues.

We normally should aim for flags to be "unsigned int" or "unsigned
long". And if space is at a premium, use a single byte (although that
causes problems on alpha again). There might be very special cases
where a 16-byte entity is really wanted, but this is not one of them.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f4d03bd1

20 4月, 2015 1 次提交

media-bus: Fixup RGB444_1X12, RGB565_1X16, and YUV8_1X24 media bus format · cec32a47

由 Philipp Zabel 提交于 4月 17, 2015

Change the constant values for RGB444_1X12, RGB565_1X16, and YUV8_1X24 media
bus formats in anticipation of a merge conflict with the media tree, where
the old values are already taken by RBG888_1X24, RGB888_1X32_PADHI, and
VUY8_1X24, respectively.
Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: NDave Airlie <airlied@redhat.com>

cec32a47

19 4月, 2015 2 次提交

Break up monolithic iommu table/lock into finer graularity pools and lock · ff7d37a5

由 Sowmini Varadhan 提交于 4月 09, 2015

Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_map_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff7d37a5

sparc: Revert generic IOMMU allocator. · c12f048f

由 David S. Miller 提交于 4月 18, 2015

I applied the wrong version of this patch series, V4 instead
of V10, due to a patchwork bundling snafu.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c12f048f

18 4月, 2015 3 次提交

netns: remove BUG_ONs from net_generic() · 2591ffd3

由 Denys Vlasenko 提交于 4月 17, 2015

This inline has ~500 callsites.

On 04/14/2015 08:37 PM, David Miller wrote:
> That BUG_ON() was added 7 years ago, and I don't remember it ever
> triggering or helping us diagnose something, so just remove it and
> keep the function inlined.

On x86 allyesconfig build:

    text     data      bss       dec     hex filename
82447071 22255384 20627456 125329911 77861f7 vmlinux4
82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: David S. Miller <davem@davemloft.net>
CC: Jan Engelhardt <jengelh@medozas.de>
CC: Jiri Pirko <jpirko@redhat.com>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2591ffd3

net: remove unused 'dev' argument from netif_needs_gso() · 8b86a61d

由 Johannes Berg 提交于 4月 17, 2015

In commit 04ffcb25 ("net: Add ndo_gso_check") Tom originally
added the 'dev' argument to be able to call ndo_gso_check().

Then later, when generalizing this in commit 5f35227e
("net: Generalize ndo_gso_check to ndo_features_check")
Jesse removed the call to ndo_gso_check() in netif_needs_gso()
by calling the new ndo_features_check() in a different place.
This made the 'dev' argument unused.

Remove the unused argument and go back to the code as before.

Cc: Tom Herbert <therbert@google.com>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b86a61d

inet_diag: fix access to tcp cc information · 521f1cf1

由 Eric Dumazet 提交于 4月 16, 2015

Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

521f1cf1

17 4月, 2015 17 次提交

blk-mq: fix iteration of busy bitmap · 569fd0ce

由 Jens Axboe 提交于 4月 17, 2015

Commit 889fa31f was a bit too eager in reducing the loop count,
so we ended up missing queues in some configurations. Ensure that
our division rounds up, so that's not the case.
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Fixes: 889fa31f ("blk-mq: reduce unnecessary software queue looping")
Signed-off-by: NJens Axboe <axboe@fb.com>

569fd0ce

proc: show locks in /proc/pid/fdinfo/X · 6c8c9031

由 Andrey Vagin 提交于 4月 16, 2015

Let's show locks which are associated with a file descriptor in
its fdinfo file.

Currently we don't have a reliable way to determine who holds a lock.  We
can find some information in /proc/locks, but PID which is reported there
can be wrong.  For example, a process takes a lock, then forks a child and
dies.  In this case /proc/locks contains the parent pid, which can be
reused by another process.

$ cat /proc/locks
...
6: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF
...

$ ps -C rpcbind
  PID TTY          TIME CMD
  332 ?        00:00:00 rpcbind

$ cat /proc/332/fdinfo/4
pos:	0
flags:	0100000
mnt_id:	22
lock:	1: FLOCK  ADVISORY  WRITE 324 00:13:13431 0 EOF

$ ls -l /proc/332/fd/4
lr-x------ 1 root root 64 Mar  5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock

$ ls -l /proc/324/fd/
total 0
lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0

You can see that the process with the 324 pid doesn't hold the lock.

This information is required for proper dumping and restoring file
locks.
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: NJeff Layton <jlayton@poochiereds.net>
Acked-by: N"J. Bruce Fields" <bfields@fieldses.org>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6c8c9031

seccomp: allow COMPAT sigreturn overrides · ddaa27ee

由 Kees Cook 提交于 4月 16, 2015

Most architectures don't need to do much special for the strict-mode
seccomp syscall entries.  Remove the redundant headers and reduce the
others.

This patch (of 8):

Some architectures may need to override the compat sigreturn definition,
as is already possible in the non-compat case.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ddaa27ee

include/linux/kconfig.h: ese macros which are already defined · 02d699f1

由 Michal Simek 提交于 4月 16, 2015

It is better to use macros which are already available because then there
is just one location which needs to be change.

[akpm@linux-foundation.org: move IS_ENABLED definition to after the IS_BUILTIN and IS_MODULE definitions]
Signed-off-by: NMichal Simek <michal.simek@xilinx.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02d699f1

mm: rcu-protected get_mm_exe_file() · 90f31d0e

由 Konstantin Khlebnikov 提交于 4月 16, 2015

This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90f31d0e

kernel/sysctl.c: threads-max observe limits · 16db3d3f

由 Heinrich Schuchardt 提交于 4月 16, 2015

Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.

With the patch the value entered is checked against the same limits that
apply when fork_init is called.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16db3d3f

drivers/rtc/rtc-s5m.c: add support for S2MPS13 RTC · 5281f94a

由 Krzysztof Kozlowski 提交于 4月 16, 2015

The S2MPS13 RTC is almost the same as S2MPS14. The differences when
updating alarm are:
1. Set WUDR+AUDR field instead of WUDR+RUDR.
2. Clear the AUDR field later (it is not auto-cleared).
Signed-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5281f94a

errno.h: Improve ENOSYS's comment · e15f431f

由 Andy Lutomirski 提交于 4月 16, 2015

ENOSYS is the mechanism used by user code to detect whether the running
kernel implements a given system call.  It should not be returned by
anything except an unimplemented system call.

Unfortunately, it is rather frequently used in the kernel to indicate that
various new functions of existing system calls are not implemented.  This
should be discouraged.

Improve the comment in errno.h to help clarify ENOSYS's purpose.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e15f431f

lib/bitmap.c: bitmap_[empty,full]: remove code duplication · 2afe27c7

由 Yury Norov 提交于 4月 16, 2015

bitmap_empty() has its own implementation.  But it's clearly as simple as:

	find_first_bit(src, nbits) == nbits

The same is true for 'bitmap_full'.
Signed-off-by: NYury Norov <yury.norov@gmail.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2afe27c7

kernel.h: implement DIV_ROUND_CLOSEST_ULL · f766093e

由 Javi Merino 提交于 4月 16, 2015

We have grown a number of different implementations of
DIV_ROUND_CLOSEST_ULL throughout the kernel.  Move the i915 one to
kernel.h so that it can be reused.
Signed-off-by: NJavi Merino <javi.merino@arm.com>
Reviewed-by: NJeff Epler <jepler@unpythonic.net>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Guenter Roeck <linux@roeck-us.net>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Alex Elder <elder@linaro.org>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f766093e

util_macros.h: add find_closest() macro · 95d11952

由 Bartosz Golaszewski 提交于 4月 16, 2015

This series unduplicates the code used to find the member in an array
closest to 'x'.

The first patch adds a macro implementing the algorithm in two flavors -
for arrays sorted in ascending and descending order.  The second updates
Documentation/CodingStyle on the naming convention for local variables in
macros resembling functions.  Other three patches replace duplicated code
with calls to one of these macros in some hwmon drivers.

This patch (of 5):

Searching for the member of an array closest to 'x' is duplicated in
several places.

Add a new include - util_macros.h - and two macros that implement this
algorithm for arrays sorted both in ascending and descending order.

Uses linear search.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

95d11952

lib: find_*_bit reimplementation · 2c57a0e2

由 Yury Norov 提交于 4月 16, 2015

This patchset does rework to find_bit function family to achieve better
performance, and decrease size of text.  All rework is done in patch 1.
Patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were ran on userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz measurements are: (for
find_next_bit, nbits is 8M, for find_first_bit - 80K)

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

find_next_bit performance gain is 35-40%;
find_first_bit - no measurable difference.

On ARM machine, there is arch-specific implementation for find_bit.

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.

This patch (of 3):

New implementations takes less space in source file (see diffstat) and in
object.  For me it's 710 vs 453 bytes of text.  It also shows better
performance.

find_last_bit description fixed due to obvious typo.

[akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
Signed-off-by: NYury Norov <yury.norov@gmail.com>
Reviewed-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: NGeorge Spelvin <linux@horizon.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2c57a0e2

Revert "mmc: core: Convert mmc_driver to device_driver" · 96541bac

由 Ulf Hansson 提交于 4月 14, 2015

This reverts commit 6685ac62 ("mmc: core: Convert mmc_driver to
device_driver")

The reverted commit went too far in simplifing the device driver parts
for mmc.

Let's restore the old mmc_driver to enable driver core to sooner
or later to remove the ->probe(), ->remove() and ->shutdown() callbacks
from the struct device_driver.

Note that, the old ->suspend|resume() callbacks in the struct
mmc_driver don't need to be restored, since the mmc block layer has
converted to the modern system PM ops.

Fixes: 6685ac62 ("mmc: core: Convert mmc_driver to device_driver")
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Acked-by: NJaehoon Chung <jh80.chung@samsung.com>

96541bac

sparc: Break up monolithic iommu table/lock into finer graularity pools and lock · 10b88a4b

由 Sowmini Varadhan 提交于 3月 12, 2015

Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10b88a4b

bpf: fix bpf helpers to use skb->mac_header relative offsets · a166151c

由 Alexei Starovoitov 提交于 4月 15, 2015

For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a ("tc: bpf: generalize pedit action")
Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a166151c

stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree · e7877f52

由 Vince Bridgers 提交于 4月 15, 2015

Read the tx-fifo-depth and rx-fifo-depth from the devicetree. The Synopsys
stmmac controller fifos are configurable per product instance, and the fifo
sizes are needed to configure certain features correctly such as flow control.
Signed-off-by: NVince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7877f52

f2fs: pass checkpoint reason on roll-forward recovery · 10027551

由 Jaegeuk Kim 提交于 4月 09, 2015

This patch adds CP_RECOVERY to remain recovery information for checkpoint.
And, it makes sure writing checkpoint in this case.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

10027551

16 4月, 2015 14 次提交

cpumask: resurrect CPU_MASK_CPU0 · 1527781d

由 Rusty Russell 提交于 4月 16, 2015

We removed it in 2f0f267e (cpumask: remove deprecated functions.),
but grep shows it still used by MIPS, and not unreasonably.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

1527781d

linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK · 89c1e79e

由 Rasmus Villemoes 提交于 4月 15, 2015

The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional,
which will generally lead to slightly better generated code (221 bytes
saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL).  As a small
bonus, this also ensures that the nbits parameter is expanded exactly
once.

In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed
to assume it is positive (or divisible by BITS_PER_LONG), and hence just
do the simple mask.  It doesn't seem to use this, and even on an
architecture like x86 where the shift only depends on the lower 5 or 6
bits, and these bits are not affected by the signedness of the expression,
gcc still generates code to compute the C99 mandated value of start %
BITS_PER_LONG.  So just use a mask explicitly, also for consistency with
BITMAP_LAST_WORD_MASK.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: NGeorge Spelvin <linux@horizon.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89c1e79e

lib/string_helpers.c: change semantics of string_escape_mem · 41416f23

由 Rasmus Villemoes 提交于 4月 15, 2015

The current semantics of string_escape_mem are inadequate for one of its
current users, vsnprintf().  If that is to honour its contract, it must
know how much space would be needed for the entire escaped buffer, and
string_escape_mem provides no way of obtaining that (short of allocating a
large enough buffer (~4 times input string) to let it play with, and
that's definitely a big no-no inside vsnprintf).

So change the semantics for string_escape_mem to be more snprintf-like:
Return the size of the output that would be generated if the destination
buffer was big enough, but of course still only write to the part of dst
it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
It is then up to the caller to detect whether output was truncated and to
append a '\0' if desired.  Also, we must output partial escape sequences,
otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
they previously contained.

This also fixes a bug in the escaped_string() helper function, which used
to unconditionally pass a length of "end-buf" to string_escape_mem();
since the latter doesn't check osz for being insanely large, it would
happily write to dst.  For example, kasprintf(GFP_KERNEL, "something and
then %pE", ...); is an easy way to trigger an oops.

In test-string_helpers.c, the -ENOMEM test is replaced with testing for
getting the expected return value even if the buffer is too small.  We
also ensure that nothing is written (by relying on a NULL pointer deref)
if the output size is 0 by passing NULL - this has to work for
kasprintf("%pE") to work.

In net/sunrpc/cache.c, I think qword_add still has the same semantics.
Someone should definitely double-check this.

In fs/proc/array.c, I made the minimum possible change, but longer-term it
should stop poking around in seq_file internals.

[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41416f23

printk: comment pr_cont() stating it is only to continue a line · 7b1460ec

由 Steven Rostedt 提交于 4月 15, 2015

KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used
more often, and it lacks the comment stating what it is used for.  It can
be confused as continuing the log level, but that is not its purpose.  Its
purpose is to continue a line that had no newline enclosed.  This should
be documented by pr_cont() as well.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7b1460ec

kernel/reboot.c: add orderly_reboot for graceful reboot · 7a54f46b

由 Joel Stanley 提交于 4月 15, 2015

The kernel has orderly_poweroff which allows the kernel to initiate a
graceful shutdown of userspace, by running /sbin/poweroff.  This adds
orderly_reboot that will cause userspace to shut itself down by calling
/sbin/reboot.

This will be used for shutdown initiated by a system controller on
platforms that do not use ACPI.

orderly_reboot() should be used when the system wants to allow userspace
to gracefully shut itself down.  For cases where the system may imminently
catch on fire, the existing emergency_restart() provides an immediate
reboot without involving userspace.
Signed-off-by: NJoel Stanley <joel@jms.id.au>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7a54f46b

kernel/resource.c: remove deprecated __check_region() and friends · 96831c0a

由 Jakub Sitnicki 提交于 4月 15, 2015

All users of __check_region(), check_region(), and check_mem_region() are
gone.  We got rid of the last user in v4.0-rc1.  Remove them.

bloat-o-meter on x86_64 shows:

add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102)
function                                     old     new   delta
__kstrtab___check_region                      15       -     -15
__ksymtab___check_region                      16       -     -16
__check_region                                71       -     -71
Signed-off-by: NJakub Sitnicki <jsitnicki@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

96831c0a

kernel: conditionally support non-root users, groups and capabilities · 2813893f

由 Iulia Manda 提交于 4月 15, 2015

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root.  For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional.  It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build.  The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB.  (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu.  All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2813893f

include/linux: remove empty conditionals · 23f40a94

由 Rasmus Villemoes 提交于 4月 15, 2015

Commit 607ca46e ("UAPI: (Scripted) Disintegrate include/linux") left
behind some empty conditional blocks.  Since they are useless and may
cause a reader to wonder whether something is missing, remove them.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23f40a94

zsmalloc: support compaction · 312fcae2

由 Minchan Kim 提交于 4月 15, 2015

This patch provides core functions for migration of zsmalloc.  Migraion
policy is simple as follows.

for each size class {
        while {
                src_page = get zs_page from ZS_ALMOST_EMPTY
                if (!src_page)
                        break;
                dst_page = get zs_page from ZS_ALMOST_FULL
                if (!dst_page)
                        dst_page = get zs_page from ZS_ALMOST_EMPTY
                if (!dst_page)
                        break;
                migrate(from src_page, to dst_page);
        }
}

For migration, we need to identify which objects in zspage are allocated
to migrate them out.  We could know it by iterating of freed objects in a
zspage because first_page of zspage keeps free objects singly-linked list
but it's not efficient.  Instead, this patch adds a tag(ie,
OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check
whether the object is allocated easily.

This patch adds another status bit in handle to synchronize between user
access through zs_map_object and migration.  During migration, we cannot
move objects user are using due to data coherency between old object and
new object.

[akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

312fcae2

dax: use pfn_mkwrite to update c/mtime + freeze protection · 0e3b210c

由 Boaz Harrosh 提交于 4月 15, 2015

From: Yigal Korman <yigal@plexistor.com>

[v1]
Without this patch, c/mtime is not updated correctly when mmap'ed page is
first read from and then written to.

A new xfstest is submitted for testing this (generic/080)

[v2]
Jan Kara has pointed out that if we add the
sb_start/end_pagefault pair in the new pfn_mkwrite we
are then fixing another bug where: A user could start
writing to the page while filesystem is frozen.
Signed-off-by: NYigal Korman <yigal@plexistor.com>
Signed-off-by: NBoaz Harrosh <boaz@plexistor.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0e3b210c

mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP · dd906184

由 Boaz Harrosh 提交于 4月 15, 2015

This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to
get notified when access is a write to a read-only PFN.

This can happen if we mmap() a file then first mmap-read from it to
page-in a read-only PFN, than we mmap-write to the same page.

We need this functionality to fix a DAX bug, where in the scenario above
we fail to set ctime/mtime though we modified the file.  An xfstest is
attached to this patchset that shows the failure and the fix.  (A DAX
patch will follow)

This functionality is extra important for us, because upon dirtying of a
pmem page we also want to RDMA the page to a remote cluster node.

We define a new pfn_mkwrite and do not reuse page_mkwrite because
  1 - The name ;-)
  2 - But mainly because it would take a very long and tedious
      audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
      users. To make sure they do not now CRASH. For example current
      DAX code (which this is for) would crash.
      If we would want to reuse page_mkwrite, We will need to first
      patch all users, so to not-crash-on-no-page. Then enable this
      patch. But even if I did that I would not sleep so well at night.
      Adding a new vector is the safest thing to do, and is not that
      expensive. an extra pointer at a static function vector per driver.
      Also the new vector is better for performance, because else we
      Will call all current Kernel vectors, so to:
        check-ha-no-page-do-nothing and return.

No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.
Signed-off-by: NYigal Korman <yigal@plexistor.com>
Signed-off-by: NBoaz Harrosh <boaz@plexistor.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd906184

mm/mempool.c: kasan: poison mempool elements · 92393615

由 Andrey Ryabinin 提交于 4月 15, 2015

Mempools keep allocated objects in reserved for situations when ordinary
allocation may not be possible to satisfy.  These objects shouldn't be
accessed before they leave the pool.

This patch poison elements when get into the pool and unpoison when they
leave it.  This will let KASan to detect use-after-free of mempool's
elements.
Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
Tested-by: NDavid Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Chernenkov <drcheren@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92393615

mm: uninline and cleanup page-mapping related helpers · e39155ea

由 Kirill A. Shutemov 提交于 4月 15, 2015

Most-used page->mapping helper -- page_mapping() -- has already uninlined.
 Let's uninline also page_rmapping() and page_anon_vma().  It saves us
depending on configuration around 400 bytes in text:

   text	   data	    bss	    dec	    hex	filename
 660318	  99254	 410000	1169572	 11d8a4	mm/built-in.o-before
 659854	  99254	 410000	1169108	 11d6d4	mm/built-in.o

I also tried to make code a bit more clean.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e39155ea

mm: cma: add trace events for CMA allocations and freeings · 99e8ea6c

由 Stefan Strogin 提交于 4月 15, 2015

Add trace events for cma_alloc() and cma_release().

The cma_alloc tracepoint is used both for successful and failed allocations,
in case of allocation failure pfn=-1UL is stored and printed.
Signed-off-by: NStefan Strogin <stefan.strogin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mpn@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

99e8ea6c