提交 · d70ab4fb723cf7acfa656cb2ad1e75be7ed94bef · openeuler / Kernel

28 3月, 2014 1 次提交

由 Mikulas Patocka 提交于 3月 03, 2014

Remove dm_get_mapinfo() because no target uses it.  Targets can allocate
per-bio data using ti->per_bio_data_size, this is much more flexible
than union map_info.

Leave union map_info only for the request-based multipath target's use.
Also delete the unused "unsigned long long ll" field of union map_info.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d70ab4fb

21 3月, 2014 2 次提交

mm: fix swapops.h:131 bug if remap_file_pages raced migration · 7e09e738

由 Hugh Dickins 提交于 3月 20, 2014

Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
indicating that remove_migration_ptes() failed to find one of the
migration entries that was temporarily inserted.

The problem comes from remap_file_pages()'s switch from vma_interval_tree
(good for inserting the migration entry) to i_mmap_nonlinear list (no good
for locating it again); but can only be a problem if the remap_file_pages()
range does not cover the whole of the vma (zap_pte() clears the range).

remove_migration_ptes() needs a file_nonlinear method to go down the
i_mmap_nonlinear list, applying linear location to look for migration
entries in those vmas too, just in case there was this race.

The file_nonlinear method does need rmap_walk_control.arg to do this;
but it never needed vma passed in - vma comes from its own iteration.
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Reported-and-tested-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e09e738

tracing: Fix array size mismatch in format string · 87291347

由 Vaibhav Nagarnaik 提交于 2月 13, 2014

In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca07 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

87291347

19 3月, 2014 1 次提交

net: cdc_ncm: fix control message ordering · ff0992e9

由 Bjørn Mork 提交于 3月 17, 2014

This is a context modified revert of commit 6a9612e2
("net: cdc_ncm: remove ncm_parm field") which introduced
a NCM specification violation, causing setup errors for
some devices. These errors resulted in the device and
host disagreeing about shared settings, with complete
failure to communicate as the end result.

The NCM specification require that many of the NCM specific
control reuests are sent only while the NCM Data Interface
is in alternate setting 0. Reverting the commit ensures that
we follow this requirement.

Fixes: 6a9612e2 ("net: cdc_ncm: remove ncm_parm field")
Reported-and-tested-by: NPasi Kärkkäinen <pasik@iki.fi>
Reported-by: NThomas Schäfer <tschaefer@t-online.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff0992e9

11 3月, 2014 1 次提交

mm: fix GFP_THISNODE callers and clarify · e97ca8e5

由 Johannes Weiner 提交于 3月 10, 2014

GFP_THISNODE is for callers that implement their own clever fallback to
remote nodes.  It restricts the allocation to the specified node and
does not invoke reclaim, assuming that the caller will take care of it
when the fallback fails, e.g.  through a subsequent allocation request
without GFP_THISNODE set.

However, many current GFP_THISNODE users only want the node exclusive
aspect of the flag, without actually implementing their own fallback or
triggering reclaim if necessary.  This results in things like page
migration failing prematurely even when there is easily reclaimable
memory available, unless kswapd happens to be running already or a
concurrent allocation attempt triggers the necessary reclaim.

Convert all callsites that don't implement their own fallback strategy
to __GFP_THISNODE.  This restricts the allocation a single node too, but
at the same time allows the allocator to enter the slowpath, wake
kswapd, and invoke direct reclaim if necessary, to make the allocation
happen when memory is full.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e97ca8e5

10 3月, 2014 3 次提交

get rid of fget_light() · bd2a31d5

由 Al Viro 提交于 3月 04, 2014

instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bd2a31d5

vfs: atomic f_pos accesses as per POSIX · 9c225f26

由 Linus Torvalds 提交于 3月 03, 2014

Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.

This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:

 "2.9.7 Thread Interactions with Regular File Operations

  All of the following functions shall be atomic with respect to each
  other in the effects specified in POSIX.1-2008 when they operate on
  regular files or symbolic links: [...]"

and one of the effects is the file position update.

This unprotected file position behavior is not new behavior, and nobody
has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
Michael Kerrisk that was due to this.

This resolves the issue with a f_pos-specific lock that is taken by
read/write/lseek on file descriptors that may be shared across threads
or processes.
Reported-by: NYongzhi Pan <panyongzhi@gmail.com>
Reported-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9c225f26

selinux: add gfp argument to security_xfrm_policy_alloc and fix callers · 52a4c640

由 Nikolay Aleksandrov 提交于 3月 07, 2014

security_xfrm_policy_alloc can be called in atomic context so the
allocation should be done with GFP_ATOMIC. Add an argument to let the
callers choose the appropriate way. In order to do so a gfp argument
needs to be added to the method xfrm_policy_alloc_security in struct
security_operations and to the internal function
selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
callers and leave GFP_KERNEL as before for the rest.
The path that needed the gfp argument addition is:
security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)

Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
add it to security_context_to_sid which is used inside and prior to this
patch did only GFP_KERNEL allocation. So add gfp argument to
security_context_to_sid and adjust all of its callers as well.

CC: Paul Moore <paul@paul-moore.com>
CC: Dave Jones <davej@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Fan Du <fan.du@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: LSM list <linux-security-module@vger.kernel.org>
CC: SELinux list <selinux@tycho.nsa.gov>
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

52a4c640

07 3月, 2014 1 次提交

firewire: don't use PREPARE_DELAYED_WORK · 70044d71

由 Tejun Heo 提交于 3月 07, 2014

PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

firewire core-device and sbp2 have been been multiplexing work items
with multiple work functions.  Introduce fw_device_workfn() and
sbp2_lu_workfn() which invoke fw_device->workfn and
sbp2_logical_unit->workfn respectively and always use the two
functions as the work functions and update the users to set the
->workfn fields instead of overriding work functions using
PREPARE_DELAYED_WORK().

This fixes a variety of possible regressions since a2c1c57b
"workqueue: consider work function when searching for busy work items"
due to which fw_workqueue lost its required non-reentrancy property.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux1394-devel@lists.sourceforge.net
Cc: stable@vger.kernel.org # v3.9+
Cc: stable@vger.kernel.org # v3.8.2+
Cc: stable@vger.kernel.org # v3.4.60+
Cc: stable@vger.kernel.org # v3.2.40+

70044d71

04 3月, 2014 4 次提交

mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS · 1ae71d03

由 Liu Ping Fan 提交于 3月 03, 2014

When doing some numa tests on powerpc, I triggered an oops bug.  I find
it is caused by using page->_last_cpupid.  It should be initialized as
"-1 & LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(),
we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And
finally cause an oops bug in task_numa_group(), since the online cpu is
less than possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled

Call trace:

  SMP NR_CPUS=64 NUMA PowerNV
  Modules linked in:
  CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
  task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
  REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
  MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
  CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
  NIP  .task_numa_fault+0x1470/0x2370
  LR  .task_numa_fault+0x1468/0x2370
  Call Trace:
   .task_numa_fault+0x1468/0x2370 (unreliable)
   .do_numa_page+0x480/0x4a0
   .handle_mm_fault+0x4ec/0xc90
   .do_page_fault+0x3a8/0x890
   handle_page_fault+0x10/0x30
  Instruction dump:
  3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
  3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
  ---[ end trace 15f2510da5ae07cf ]---
Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ae71d03

mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb

由 Vlastimil Babka 提交于 3月 03, 2014

Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da6 ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522b ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Reported-by: NDaniel Borkmann <dborkman@redhat.com>
Tested-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9050d7eb

mm: close PageTail race · 668f9abb

由 David Rientjes 提交于 3月 03, 2014

Commit bf6bddf1 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

668f9abb

tracing: Do not add event files for modules that fail tracepoints · 45ab2813

由 Steven Rostedt (Red Hat) 提交于 2月 26, 2014

If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org # 2.6.31+
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

45ab2813

02 3月, 2014 1 次提交

NFSv4: Fix another nfs4_sequence corruptor · b7e63a10

由 Trond Myklebust 提交于 2月 26, 2014

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b7e63a10

01 3月, 2014 1 次提交

audit: Send replies in the proper network namespace. · 6f285b19

由 Eric W. Biederman 提交于 2月 28, 2014

In perverse cases of file descriptor passing the current network
namespace of a process and the network namespace of a socket used by
that socket may differ.  Therefore use the network namespace of the
appropiate socket to ensure replies always go to the appropiate
socket.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

6f285b19

26 2月, 2014 1 次提交

ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9

由 Davidlohr Bueso 提交于 2月 25, 2014

Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

 "This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fallback to the
original way of dealing queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.
Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
Reported-by: NMadars Vitolins <m@silodev.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>	[3.5+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3713fd9

25 2月, 2014 2 次提交

sysfs: fix namespace refcnt leak · fed95bab

由 Li Zefan 提交于 2月 25, 2014

As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NTejun Heo <tj@kernel.org>
 ---
 fs/kernfs/mount.c      | 8 +++++++-
 fs/sysfs/mount.c       | 5 +++--
 include/linux/kernfs.h | 9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fed95bab

fsnotify: Allocate overflow events with proper type · ff57cd58

由 Jan Kara 提交于 2月 21, 2014

Commit 7053aee2 "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.

Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.
Signed-off-by: NJan Kara <jack@suse.cz>

ff57cd58

22 2月, 2014 4 次提交

Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3

由 Jan Kara 提交于 2月 21, 2014

This reverts commit c4a391b5. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: NJan Kara <jack@suse.cz>

0dc83bd3

sched: Add 'flags' argument to sched_{set,get}attr() syscalls · 6d35ab48

由 Peter Zijlstra 提交于 2月 14, 2014

Because of a recent syscall design debate; its deemed appropriate for
each syscall to have a flags argument for future extension; without
immediately requiring new syscalls.

Cc: juri.lelli@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Suggested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140214161929.GL27965@twins.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6d35ab48

blk-mq: support partial I/O completions · d6a25b31

由 Christoph Hellwig 提交于 2月 20, 2014

Add a new blk_mq_end_io_partial function to partially complete requests
as needed by the SCSI layer.  We do this by reusing blk_update_request
to advance the bio instead of having a simplified version of it in
the blk-mq code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d6a25b31

blk-mq: merge blk_mq_insert_request and blk_mq_run_request · feb71dae

由 Christoph Hellwig 提交于 2月 20, 2014

It's almost identical to blk_mq_insert_request, so fold the two into one
slightly more generic function by making the flush special case a bit
smarted.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

feb71dae

20 2月, 2014 1 次提交

ARM: OMAP2+: clock: fix clkoutx2 with CLK_SET_RATE_PARENT · 994c41ee

由 Tomi Valkeinen 提交于 1月 30, 2014

If CLK_SET_RATE_PARENT is set for a clkoutx2 clock, calling
clk_set_rate() on the clock "skips" the x2 multiplier as there are no
set_rate and round_rate functions defined for the clkoutx2.

This results in getting double the requested clock rates, breaking the
display on omap3430 based devices. This got broken when
d0f58bd3 and related patches were merged
for v3.14, as omapdss driver now relies more on the clk-framework and
CLK_SET_RATE_PARENT.

This patch implements set_rate and round_rate for clkoutx2.

Tested on OMAP3430, OMAP3630, OMAP4460.
Signed-off-by: NTomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: NTero Kristo <t-kristo@ti.com>
Signed-off-by: NPaul Walmsley <paul@pwsan.com>

994c41ee

19 2月, 2014 3 次提交

mfd: tps65217: Naturalise cross-architecture discrepancies · 5c6fbd56

由 Lee Jones 提交于 2月 03, 2014

If we compile the TPS65217 for a 64bit architecture we receive the following
warnings:

drivers/mfd/tps65217.c: In function ‘tps65217_probe’:
drivers/mfd/tps65217.c:173:13:
  warning: cast from pointer to integer of different size
   chip_id = (unsigned int)match->data;
             ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

5c6fbd56

mfd: max8998: Naturalise cross-architecture discrepancies · 8bace2d5

由 Lee Jones 提交于 2月 03, 2014

If we compile the MAX8998 for a 64bit architecture we receive the following
warnings:

  drivers/mfd/max8998.c: In function ‘max8998_i2c_get_driver_data’:
  drivers/mfd/max8998.c:178:10:
    warning: cast from pointer to integer of different size
     return (int)match->data;
            ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

8bace2d5

mfd: max8997: Naturalise cross-architecture discrepancies · 05fb7a56

由 Lee Jones 提交于 1月 23, 2014

If we compile the MAX8997 for a 64bit architecture we receive the following
warnings:

  drivers/mfd/max8997.c: In function ‘max8997_i2c_get_driver_data’:
  drivers/mfd/max8997.c:173:10:
    warning: cast from pointer to integer of different size
     return (int)match->data;
            ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

05fb7a56

18 2月, 2014 2 次提交

inotify: Fix reporting of cookies for inotify events · 45a22f4c

由 Jan Kara 提交于 2月 17, 2014

My rework of handling of notification events (namely commit 7053aee2
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.

Fix the problem by passing the cookie value properly.

Fixes: 7053aee2Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>

45a22f4c

ceph: remove xattr when null value is given to setxattr() · bcdfeb2e

由 Yan, Zheng 提交于 2月 11, 2014

For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>

bcdfeb2e

17 2月, 2014 3 次提交

netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n · 478b360a

由 Florian Westphal 提交于 2月 15, 2014

When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get
lots of "TRACE: filter:output:policy:1 IN=..." warnings as several
places will leave skb->nf_trace uninitialised.

Unlike iptables tracing functionality is not conditional in nftables,
so always copy/zero nf_trace setting when nftables is enabled.

Move this into __nf_copy() helper.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

478b360a

netdevice: move netdev_cap_txqueue for shared usage to header · b9507bda

由 Daniel Borkmann 提交于 2月 16, 2014

In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9507bda

netdevice: add queue selection fallback handler for ndo_select_queue · 99932d4f

由 Daniel Borkmann 提交于 2月 16, 2014

Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99932d4f

14 2月, 2014 5 次提交

workqueue: add args to workqueue lockdep name · fada94ee

由 Li Zhong 提交于 2月 14, 2014

Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
process_one_work().

Maybe #fmt plus #args could be used as the lock_name to give some more
information for some fmt string like the above.

__builtin_constant_p() check is removed (as there seems no good way to
check all the variables in args list). However, by removing the check,
it only adds two additional "s for those constants.

Some lockdep name examples printed out after the change:

lockdep name                    wq->name

"events_long"                   events_long
"%s"("khelper")                 khelper
"xfs-data/%s"mp->m_fsname       xfs-data/dm-3
Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

fada94ee

mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use · 6ecde51d

由 Roland Dreier 提交于 2月 13, 2014

On some architectures (for example, arm), we don't end up indirectly
pulling in the declaration of kzalloc() and kfree(), and so building
anything that includes <linux/mlx5/driver.h> breaks.  Fix this by adding
an explicit include to get the declaration.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6ecde51d

net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f

由 Florian Westphal 提交于 2月 13, 2014

Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe6cc55f

net: core: introduce netif_skb_dev_features · d2069403

由 Florian Westphal 提交于 2月 13, 2014

Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2069403

PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() · 3ce4e860

由 Alexander Gordeev 提交于 2月 13, 2014

The new functions are special cases for pci_enable_msi_range() and
pci_enable_msix_range() when a particular number of MSI or MSI-X
is needed.

By contrast with pci_enable_msi_range() and pci_enable_msix_range()
functions, pci_enable_msi_exact() and pci_enable_msix_exact()
return zero in case of success, which indicates MSI or MSI-X
interrupts have been successfully allocated.
Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

3ce4e860

13 2月, 2014 2 次提交

compiler/gcc4: Make quirk for asm_volatile_goto() unconditional · a9f18034

由 Steven Noonan 提交于 2月 12, 2014

I started noticing problems with KVM guest destruction on Linux
3.12+, where guest memory wasn't being cleaned up. I bisected it
down to the commit introducing the new 'asm goto'-based atomics,
and found this quirk was later applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
known 'asm goto' bug) I am still getting some kind of
miscompilation. If I enable the asm_volatile_goto quirk for my
compiler, KVM guests are destroyed correctly and the memory is
cleaned up.

So make the quirk unconditional for now, until bug is found
and fixed.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSteven Noonan <steven@uplinklabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1392274867-15236-1-git-send-email-steven@uplinklabs.net
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670Signed-off-by: NIngo Molnar <mingo@kernel.org>

a9f18034

dma-buf: update debugfs output · c0b00a52

由 Sumit Semwal 提交于 2月 03, 2014

Russell King observed 'wierd' looking output from debugfs, and also suggested
better ways of getting device names (use KBUILD_MODNAME, dev_name())

This patch addresses these issues to make the debugfs output correct and better
looking.

While at it, replace seq_printf with seq_puts to remove the checkpatch.pl
warnings.
Reported-by: NRussell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: NSumit Semwal <sumit.semwal@linaro.org>

c0b00a52

11 2月, 2014 2 次提交

block: Fix cloning of discard/write same bios · 8423ae3d

由 Kent Overstreet 提交于 2月 10, 2014

Immutable biovecs changed the way bio segments are treated in such a way that
bio_for_each_segment() cannot now do what we want for discard/write same bios,
since bi_size means something completely different for them.

Fortunately discard and write same bios never have more than a single biovec, so
bio_for_each_segment() is unnecessary and not terribly meaningful for them, but
we still have to special case them in a few places.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

8423ae3d

cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8

由 Li Zefan 提交于 2月 11, 2014

Setup cgroupfs like this:
  # mount -t cgroup -o cpuacct xxx /cgroup
  # mkdir /cgroup/sub1
  # mkdir /cgroup/sub2

Then run these two commands:
  # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
  # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
 [<ffffffff8156063c>] dump_stack+0x7a/0x96
 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
 [<ffffffff81300bf5>] idr_remove+0x25/0xd0
 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
 [<ffffffff81085f96>] kthread+0xe6/0xf0
 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
cgroup_create()").

Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <stable@vger.kernel.org> #3.12+
Reported-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

0ab02ca8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功