提交 · f356d7a7ddb5ea545e81c84eecfdf1b5ab4647fc · openeuler / Kernel

14 11月, 2015 1 次提交

由 Andreas Gruenbacher 提交于 10月 04, 2015

Now that the xattr handler is passed to the xattr handler operations, we
can use the same get and set operations for the user, trusted, and security
xattr namespaces.  In those namespaces, we can access the full attribute
name by "reattaching" the name prefix the vfs has skipped for us.  Add a
xattr_full_name helper to make this obvious in the code.

For the "system.posix_acl_access" and "system.posix_acl_default"
attributes, handler->prefix is the full attribute name; the suffix is the
empty string.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e409de99

12 11月, 2015 1 次提交
- J
  block: don't hardcode blk_qc_t -> tag mask · e3a7a3bf
  由 Jens Axboe 提交于 11月 11, 2015
```
Use the shift/mask we use elsewhere.
Signed-off-by: NJens Axboe <axboe@fb.com>
```
  e3a7a3bf
11 11月, 2015 5 次提交

irqchip: irq-mips-gic: Provide function to map GIC user section · c0a9f72c

由 Alex Smith 提交于 10月 12, 2015

The GIC provides a "user-mode visible" section containing a mirror of
the counter registers which can be mapped into user memory. This will
be used by the VDSO time function implementations, so provide a
function to map it in.

When the GIC is not enabled in Kconfig a dummy inline version of this
function is provided, along with "#define gic_present 0", so that we
don't have to litter the VDSO code with ifdefs.

[markos.chandras@imgtec.com:
  - Move mapping code to arch/mips/kernel/vdso.c and use a resource
    type to get the GIC usermode information
  - Avoid renaming function arguments and use __gic_base_addr to hold
    the base GIC address prior to ioremap.]
[ralf@linux-mips.org: Fix up gic_get_usm_range() to compile and make inline
again.]
Signed-off-by: NAlex Smith <alex.smith@imgtec.com>
Signed-off-by: NMarkos Chandras <markos.chandras@imgtec.com>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/11281/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

c0a9f72c

vfs: remove stale comment in inode_operations · c8fffa64

由 Ross Zwisler 提交于 10月 08, 2015

The big warning comment that is currently at the end of struct
inode_operations was added as part of this commit:

4aa7c634 ("vfs: add i_op->dentry_open()")

It was added to warn people not to use the newly added 'dentry_open'
function pointer.

This function pointer was removed as part of this commit:

4bacc9c9 ("overlayfs: Make f_path always point to the overlay and
		f_inode to the underlay")

The comment was left behind and now refers to nothing, so remove it.
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c8fffa64

vfs: remove unused wrapper block_page_mkwrite() · 5c500029

由 Ross Zwisler 提交于 10月 13, 2015

The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:

commit 24da4fab ("vfs: Create __block_page_mkwrite() helper passing
	error values back")

This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: NJan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5c500029

pci: remove pci_dma_supported · 247e75db

由 Christoph Hellwig 提交于 11月 10, 2015

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

247e75db

of: Provide static inline function for of_translate_address if needed · b1d06b60

由 Guenter Roeck 提交于 11月 06, 2015

If OF_ADDRESS is not configured, builds can fail with errors such as

drivers/net/ethernet/hisilicon/hns_mdio.c:
	In function 'hns_mdio_bus_name':
drivers/net/ethernet/hisilicon/hns_mdio.c:411:3:
	error: implicit declaration of function 'of_translate_address'

as currently seen when building sparc:allmodconfig.

Introduce a static inline function if OF_ADDRESS is not configured to fix
the build failure. Return OF_BAD_ADDR in this case. For this to work, the
definition of OF_BAD_ADDR has to be moved outside CONFIG_OF conditional
code.

Fixes: 876133d3 ("net: hisilicon: add OF dependency")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Reviewed-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NFrank Rowand <frank.rowand@sonymobile.com>
Signed-off-by: NRob Herring <robh@kernel.org>

b1d06b60

10 11月, 2015 12 次提交

pwm: Set enable state properly on failed call to enable · d1cd2142

由 Jonathan Richardson 提交于 10月 16, 2015

The pwm_enable() function didn't clear the enabled bit if a call to the
driver's ->enable() callback returned an error. The result was that the
state of the PWM core was wrong. Clearing the bit when enable returns
an error ensures the state is properly set.
Tested-by: NJonathan Richardson <jonathar@broadcom.com>
Reviewed-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: NJonathan Richardson <jonathar@broadcom.com>
[thierry.reding@gmail.com: add missing kerneldoc for the lock]
Signed-off-by: NThierry Reding <thierry.reding@gmail.com>

d1cd2142

context_tracking: avoid irq_save/irq_restore on guest entry and exit · d0e536d8

由 Paolo Bonzini 提交于 10月 28, 2015

guest_enter and guest_exit must be called with interrupts disabled,
since they take the vtime_seqlock with write_seq{lock,unlock}.
Therefore, it is not necessary to check for exceptions, nor to
save/restore the IRQ state, when context tracking functions are
called by guest_enter and guest_exit.

Split the body of context_tracking_entry and context_tracking_exit
out to __-prefixed functions, and use them from KVM.

Rik van Riel has measured this to speed up a tight vmentry/vmexit
loop by about 2%.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Tested-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0e536d8

context_tracking: remove duplicate enabled check · f70cd6b0

由 Paolo Bonzini 提交于 10月 28, 2015

All calls to context_tracking_enter and context_tracking_exit
are already checking context_tracking_is_enabled, except the
context_tracking_user_enter and context_tracking_user_exit
functions left in for the benefit of assembly calls.

Pull the check up to those functions, by making them simple
wrappers around the user_enter and user_exit inline functions.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Tested-by: NRik van Riel <riel@redhat.com>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f70cd6b0

KVM: x86: Replace call-back set_tsc_khz() with a common function · 381d585c

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

381d585c

KVM: x86: Add a common TSC scaling function · 35181e86

由 Haozhong Zhang 提交于 10月 20, 2015

VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35181e86

scsi: use host wide tags by default · 64d513ac

由 Christoph Hellwig 提交于 10月 08, 2015

This patch changes the !blk-mq path to the same defaults as the blk-mq
I/O path by always enabling block tagging, and always using host wide
tags.  We've had blk-mq available for a few releases so bugs with
this mode should have been ironed out, and this ensures we get better
coverage of over tagging setup over different configs.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJames Bottomley <JBottomley@Odin.com>

64d513ac

include/linux/kdev_t.h: old/new_valid_dev() can return bool · 8b9758b9

由 Yaowei Bai 提交于 11月 09, 2015

Make old/new_valid_dev return bool due to these two particular functions
only using either one or zero as their return value.

No functional change.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b9758b9

include/linux/kdev_t.h: remove unused huge_valid_dev() · 7bc4f1d2

由 Yaowei Bai 提交于 11月 09, 2015

There's no user of huge_valid_dev() any more, so remove it.

No functional change.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7bc4f1d2

kmap_atomic_to_page() has no users, remove it · 77c5b5da

由 Nicolas Pitre 提交于 11月 09, 2015

Removal started in commit 5bbeed12 ("sparc32: drop unused
kmap_atomic_to_page").  Let's do it across the whole tree.
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

77c5b5da

remove abs64() · 79211c8e

由 Andrew Morton 提交于 11月 09, 2015

Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79211c8e

kernel.h: make abs() work with 64-bit types · c8299cb6

由 Michal Nazarewicz 提交于 11月 09, 2015

For 64-bit arguments, the abs macro casts it to an int which leads to
lost precision and may cause incorrect results.  To deal with 64-bit
types abs64 macro has been introduced but still there are places where
abs macro is used incorrectly.

To deal with the problem, expand abs macro such that it operates on s64
type when dealing with 64-bit types while still returning long when
dealing with smaller types.

This fixes one known bug (per John):

The internal clocksteering done for fine-grained error correction uses a
: logarithmic approximation, so any time adjtimex() adjusts the clock
: steering, timekeeping_freqadjust() quickly approximates the correct clock
: frequency over a series of ticks.
:
: Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit
: dc491596 (Rework frequency adjustments to work better w/ nohz),
: used the abs() function with a s64 error value to calculate the size of
: the approximated adjustment to be made.
:
: Per include/linux/kernel.h: "abs() should not be used for 64-bit types
: (s64, u64, long long) - use abs64()".
:
: Thus on 32-bit platforms, this resulted in the clocksteering to take a
: quite dampended random walk trying to converge on the proper frequency,
: which caused the adjustments to be made much slower then intended (most
: easily observed when large adjustments are made).
Signed-off-by: NMichal Nazarewicz <mina86@mina86.com>
Reported-by: NJohn Stultz <john.stultz@linaro.org>
Tested-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c8299cb6

coredump: add DAX filtering for ELF coredumps · 5037835c

由 Ross Zwisler 提交于 10月 05, 2015

Add two new flags to the existing coredump mechanism for ELF files to
allow us to explicitly filter DAX mappings.  This is desirable because
DAX mappings, like hugetlb mappings, have the potential to be very
large.

Update the coredump_filter documentation in
Documentation/filesystems/proc.txt so that it addresses the new DAX
coredump flags.  Also update the documented default value of
coredump_filter to be consistent with the core(5) man page.  The
documentation being updated talks about bit 4, Dump ELF headers, which
is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
kernel config.  This kernel config option defaults to "y" if both ELF
binaries and coredump are enabled.
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5037835c

09 11月, 2015 1 次提交

net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid() · 54abc686

由 Eric Dumazet 提交于 11月 08, 2015

Generalize selinux_skb_sk() added in commit 212cd089
("selinux: fix random read in selinux_ip_postroute_compat()")
so that we can use it other contexts.

Use it right away in selinux_netlbl_skbuff_setsid()

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54abc686

08 11月, 2015 3 次提交

asm-generic: {get,put}_user ptr argument evaluate only 1 time · a02613a4

由 Yoshinori Sato 提交于 7月 16, 2015

Current implemantation ptr argument evaluate 2 times.
It'll be an unexpected result.

Changes v5:
Remove unnecessary const.
Changes v4:
Temporary pointer type change to const void*
Changes v3:
Some build error fix.
Changes v2:
Argument x protect.
Signed-off-by: NYoshinori Sato <ysato@users.sourceforge.jp>

a02613a4

block: add block polling support · 05229bee

由 Jens Axboe 提交于 11月 05, 2015

Add basic support for polling for specific IO to complete. This uses
the cookie that blk-mq passes back, which enables the block layer
to pass this cookie to the driver to spin for a specific request.

This will be combined with request latency tracking, so we can make
qualified decisions about when to poll and when not to. For now, for
benchmark purposes, we add a sysfs file that controls whether polling
is enabled or not.
Signed-off-by: NJens Axboe <axboe@fb.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NKeith Busch <keith.busch@intel.com>

05229bee

block: change ->make_request_fn() and users to return a queue cookie · dece1635

由 Jens Axboe 提交于 11月 05, 2015

No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: NJens Axboe <axboe@fb.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NKeith Busch <keith.busch@intel.com>

dece1635

07 11月, 2015 17 次提交

include/linux/zutil.h: fix usage example of zlib_adler32() · cb7ae262

由 Anish Bhatt 提交于 11月 06, 2015

alder32 was renamed to zlib_adler32 since before 2.6.11.
Signed-off-by: NAnish Bhatt <anish@chelsio.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb7ae262

dma-mapping: tidy up dma_parms default handling · 002edb6f

由 Robin Murphy 提交于 11月 06, 2015

Many DMA controllers and other devices set max_segment_size to
indicate their scatter-gather capability, but have no interest in
segment_boundary_mask. However, the existence of a dma_parms structure
precludes the use of any default value, leaving them as zeros (assuming
a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
then tries to respect this by ensuring a mapped segment does not cross
a zero-byte boundary, hilarity ensues.

Since zero is a nonsensical value for either parameter, treat it as an
indicator for "default", as might be expected. In the process, clean up
a bit by replacing the bare constants with slightly more meaningful
macros and removing the superfluous "else" statements.

[akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Reviewed-by: NSumit Semwal <sumit.semwal@linaro.org>
Acked-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Sakari Ailus <sakari.ailus@iki.fi>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

002edb6f

signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() · 9a13049e

由 Oleg Nesterov 提交于 11月 06, 2015

jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
TASK_STOPPED state after it was already sent. Add the new helper,
kernel_signal_stop(), which does this correctly.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NTejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9a13049e

signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29

由 Oleg Nesterov 提交于 11月 06, 2015

1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
   matches another "for kthreads only" kernel_sigaction() helper.

2. Remove the "tsk" and "mask" arguments, they are always current
   and current->blocked. And it is simply wrong if tsk != current.

3. We could also remove the 3rd "siginfo_t *info" arg but it looks
   potentially useful. However we can simplify the callers if we
   change kernel_dequeue_signal() to accept info => NULL.

4. Remove _irqsave, it is never called from atomic context.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NTejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be0e6f29

signals: kill block_all_signals() and unblock_all_signals() · 2e01fabe

由 Oleg Nesterov 提交于 11月 06, 2015

It is hardly possible to enumerate all problems with block_all_signals()
and unblock_all_signals().  Just for example,

1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
   multithreaded. Another thread can dequeue the signal and force the
   group stop.

2. Even is the caller is single-threaded, it will "stop" anyway. It
   will not sleep, but it will spin in kernel space until SIGCONT or
   SIGKILL.

And a lot more. In short, this interface doesn't work at all, at least
the last 10+ years.

Daniel said:

  Yeah the only times I played around with the DRM_LOCK stuff was when
  old drivers accidentally deadlocked - my impression is that the entire
  DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
  purging where this leaks out of the drm subsystem.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NDave Airlie <airlied@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e01fabe

nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c

由 Hitoshi Mitake 提交于 11月 06, 2015

This patch adds tracepoints for analyzing requests of reading and writing
metadata files. The tracepoints cover every in-place mdt files (cpfile,
sufile, and datfile).

Example of tracing mdt_insert_new_block():
cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a9cd207c

nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6

由 Hitoshi Mitake 提交于 11月 06, 2015

This patch adds tracepoints which would be useful for analyzing segment
usage from a perspective of high level sufile manipulation (check, alloc,
free).  sufile is an important in-place updated metadata file, so
analyzing the behavior would be useful for performance turning.

example of usage (a case of allocation):

$ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
        segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
        segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benixon Dhas <benixon.dhas@wdc.com>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83eec5e6

nilfs2: add a tracepoint for transaction events · 44fda114

由 Hitoshi Mitake 提交于 11月 06, 2015

This patch adds a tracepoint for transaction events of nilfs. With the
tracepoint, these events can be tracked: begin, abort, commit, trylock,
lock, and unlock. Basically, these events have corresponding functions
e.g. begin event corresponds nilfs_transaction_begin(). The unlock event
is an exception. It corresponds to the iteration in
nilfs_transaction_lock().

Only one tracepoint is introcued: nilfs2_transaction_transition. The
above events are distinguished with newly introduced enum. With this
tracepoint, we can analyse a critical section of segment constructoin.

Sample output by tpoint of perf-tools:
cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK

This patch also does trivial cleaning of comma usage in collection stage
transition event for consistent coding style.
Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

44fda114

nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703

由 Hitoshi Mitake 提交于 11月 06, 2015

This patch adds a tracepoint for tracking stage transition of block
collection in segment construction.  With the tracepoint, we can analysis
the behavior of segment construction in depth.  It would be useful for
bottleneck detection and debugging, etc.

The tracepoint is created with the standard trace API of linux (like ext3,
ext4, f2fs and btrfs).  So we can analysis with existing tools easily.  Of
course, more detailed analysis will be possible if we can create nilfs
specific analysis tools.

Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools).  Time consumption between
each stage can be obtained.

$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
        segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
        segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
        segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
        segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
        segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
        segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
        segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage.  With this change, every transition of the
stage can produce trace event in a correct manner.
Signed-off-by: NHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

58497703

rbtree: clarify documentation of rbtree_postorder_for_each_entry_safe() · 8de1ee7e

由 Cody P Schafer 提交于 11月 06, 2015

I noticed that commit a20135ff ("writeback: don't drain
bdi_writeback_congested on bdi destruction") added a usage of
rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears
to try to rb_erase() elements from an rbtree while iterating over it using
rbtree_postorder_for_each_entry_safe().

Doing this will cause random nodes to be missed by the iteration because
rb_erase() may rebalance the tree, changing the ordering that we're trying
to iterate over.

The previous documentation for rbtree_postorder_for_each_entry_safe()
wasn't clear that this wasn't allowed, it was taken from the docs for
list_for_each_entry_safe(), where erasing isn't a problem due to
list_del() not reordering.

Explicitly warn developers about this potential pit-fall.

Note that I haven't fixed the actual issue that (it appears) the commit
referenced above introduced (not familiar enough with that code).

In general (and in this case), the patterns to follow are:
 - switch to rb_first() + rb_erase(), don't use
   rbtree_postorder_for_each_entry_safe().
 - keep the postorder iteration and don't rb_erase() at all. Instead
   just clear the fields of rb_node & cgwb_congested_tree as required by
   other users of those structures.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: NCody P Schafer <dev@codyps.com>
Cc: John de la Garza <john@jjdev.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8de1ee7e

lib/kasprintf.c: introduce kvasprintf_const · 0a9df786

由 Rasmus Villemoes 提交于 11月 06, 2015

This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const.  Otherwise, we fall back to
kvasprintf.

Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.

The return value should be freed by kfree_const, just like for
kstrdup_const.

There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0a9df786

bitops.h: add sign_extend64() · 48e203e2

由 Martin Kepplinger 提交于 11月 06, 2015

Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289
The result was the 64-bit version being "likely fine", "valuable" and
"correct".  The discussion fell asleep but since there are possible users,
let's add it.
Signed-off-by: NMartin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

48e203e2

bitops.h: improve sign_extend32()'s documentation · e2eb53aa

由 Martin Kepplinger 提交于 11月 06, 2015

It is often overlooked that sign_extend32(), despite its name, is safe to
use for 16 and 8 bit types as well.  This should help prevent sign
extension being done manually some other way.
Signed-off-by: NMartin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e2eb53aa

include/linux/compiler-gcc.h: improve __visible documentation · 9add850c

由 Andrew Morton 提交于 11月 06, 2015

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9add850c

mm: use 'unsigned int' for compound_dtor/compound_order on 64BIT · 1965c8b7

由 Kirill A. Shutemov 提交于 11月 06, 2015

On 64 bit system we have enough space in struct page to encode
compound_dtor and compound_order with unsigned int.

On x86-64 it leads to slightly smaller code size due usesage of plain
MOV instead of MOVZX (zero-extended move) or similar effect.

allyesconfig:

   text	   data	    bss	    dec	    hex	filename
159520446	48146736	72196096	279863278	10ae5fee	vmlinux.pre
159520382	48146736	72196096	279863214	10ae5fae	vmlinux.post

On other architectures without native support of 16-bit data types the
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1965c8b7

mm: use 'unsigned int' for page order · d00181b9

由 Kirill A. Shutemov 提交于 11月 06, 2015

Let's try to be consistent about data type of page order.

[sfr@canb.auug.org.au: fix build (type of pageblock_order)]
[hughd@google.com: some configs end up with MAX_ORDER and pageblock_order having different types]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d00181b9

mm: make compound_head() robust · 1d798ca3

由 Kirill A. Shutemov 提交于 11月 06, 2015

Hugh has pointed that compound_head() call can be unsafe in some
context. There's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail->first_page = NULL
      head = tail->first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail->first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>

The race is pure theoretical. I don't it's possible to trigger it in
practice. But who knows.

We can fix the race by changing how encode PageTail() and compound_head()
within struct page to be able to update them in one shot.

The patch introduces page->compound_head into third double word block in
front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest bits are pointer to head page if bit zero is set.

The patch moves page->pmd_huge_pte out of word, just in case if an
architecture defines pgtable_t into something what can have the bit 0
set.

hugetlb_cgroup uses page->lru.next in the second tail page to store
pointer struct hugetlb_cgroup. The patch switch it to use page->private
in the second tail page instead. The space is free since ->first_page is
removed from the union.

The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER
limitation, since there's now space in first tail page to store struct
hugetlb_cgroup pointer. But that's out of scope of the patch.

That means page->compound_head shares storage space with:

 - page->lru.next;
 - page->next;
 - page->rcu_head.next;

That's too long list to be absolutely sure, but looks like nobody uses
bit 0 of the word.

page->rcu_head.next guaranteed[1] to have bit 0 clean as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future
call_rcu_lazy() is not allowed as it makes use of the bit and we can
get false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d798ca3

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功