提交 · d2312e3379d581d2c3603357a0181046448e1de3 · openeuler / raspberrypi-kernel

07 3月, 2014 1 次提交

firewire: don't use PREPARE_DELAYED_WORK · 70044d71

由 Tejun Heo 提交于 3月 07, 2014

PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

firewire core-device and sbp2 have been been multiplexing work items
with multiple work functions.  Introduce fw_device_workfn() and
sbp2_lu_workfn() which invoke fw_device->workfn and
sbp2_logical_unit->workfn respectively and always use the two
functions as the work functions and update the users to set the
->workfn fields instead of overriding work functions using
PREPARE_DELAYED_WORK().

This fixes a variety of possible regressions since a2c1c57b
"workqueue: consider work function when searching for busy work items"
due to which fw_workqueue lost its required non-reentrancy property.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux1394-devel@lists.sourceforge.net
Cc: stable@vger.kernel.org # v3.9+
Cc: stable@vger.kernel.org # v3.8.2+
Cc: stable@vger.kernel.org # v3.4.60+
Cc: stable@vger.kernel.org # v3.2.40+

70044d71

04 3月, 2014 4 次提交

mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS · 1ae71d03

由 Liu Ping Fan 提交于 3月 03, 2014

When doing some numa tests on powerpc, I triggered an oops bug.  I find
it is caused by using page->_last_cpupid.  It should be initialized as
"-1 & LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(),
we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And
finally cause an oops bug in task_numa_group(), since the online cpu is
less than possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled

Call trace:

  SMP NR_CPUS=64 NUMA PowerNV
  Modules linked in:
  CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
  task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
  REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
  MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
  CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
  NIP  .task_numa_fault+0x1470/0x2370
  LR  .task_numa_fault+0x1468/0x2370
  Call Trace:
   .task_numa_fault+0x1468/0x2370 (unreliable)
   .do_numa_page+0x480/0x4a0
   .handle_mm_fault+0x4ec/0xc90
   .do_page_fault+0x3a8/0x890
   handle_page_fault+0x10/0x30
  Instruction dump:
  3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
  3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
  ---[ end trace 15f2510da5ae07cf ]---
Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ae71d03

mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb

由 Vlastimil Babka 提交于 3月 03, 2014

Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da6 ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522b ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Reported-by: NDaniel Borkmann <dborkman@redhat.com>
Tested-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9050d7eb

mm: close PageTail race · 668f9abb

由 David Rientjes 提交于 3月 03, 2014

Commit bf6bddf1 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

668f9abb

tracing: Do not add event files for modules that fail tracepoints · 45ab2813

由 Steven Rostedt (Red Hat) 提交于 2月 26, 2014

If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org # 2.6.31+
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

45ab2813

02 3月, 2014 1 次提交

NFSv4: Fix another nfs4_sequence corruptor · b7e63a10

由 Trond Myklebust 提交于 2月 26, 2014

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b7e63a10

26 2月, 2014 1 次提交

ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9

由 Davidlohr Bueso 提交于 2月 25, 2014

Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

 "This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fallback to the
original way of dealing queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.
Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
Reported-by: NMadars Vitolins <m@silodev.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>	[3.5+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3713fd9

25 2月, 2014 2 次提交

sysfs: fix namespace refcnt leak · fed95bab

由 Li Zefan 提交于 2月 25, 2014

As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NTejun Heo <tj@kernel.org>
 ---
 fs/kernfs/mount.c      | 8 +++++++-
 fs/sysfs/mount.c       | 5 +++--
 include/linux/kernfs.h | 9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fed95bab

fsnotify: Allocate overflow events with proper type · ff57cd58

由 Jan Kara 提交于 2月 21, 2014

Commit 7053aee2 "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.

Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.
Signed-off-by: NJan Kara <jack@suse.cz>

ff57cd58

22 2月, 2014 4 次提交

Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3

由 Jan Kara 提交于 2月 21, 2014

This reverts commit c4a391b5. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: NJan Kara <jack@suse.cz>

0dc83bd3

sched: Add 'flags' argument to sched_{set,get}attr() syscalls · 6d35ab48

由 Peter Zijlstra 提交于 2月 14, 2014

Because of a recent syscall design debate; its deemed appropriate for
each syscall to have a flags argument for future extension; without
immediately requiring new syscalls.

Cc: juri.lelli@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Suggested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140214161929.GL27965@twins.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6d35ab48

blk-mq: support partial I/O completions · d6a25b31

由 Christoph Hellwig 提交于 2月 20, 2014

Add a new blk_mq_end_io_partial function to partially complete requests
as needed by the SCSI layer.  We do this by reusing blk_update_request
to advance the bio instead of having a simplified version of it in
the blk-mq code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d6a25b31

blk-mq: merge blk_mq_insert_request and blk_mq_run_request · feb71dae

由 Christoph Hellwig 提交于 2月 20, 2014

It's almost identical to blk_mq_insert_request, so fold the two into one
slightly more generic function by making the flush special case a bit
smarted.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

feb71dae

20 2月, 2014 1 次提交

ARM: OMAP2+: clock: fix clkoutx2 with CLK_SET_RATE_PARENT · 994c41ee

由 Tomi Valkeinen 提交于 1月 30, 2014

If CLK_SET_RATE_PARENT is set for a clkoutx2 clock, calling
clk_set_rate() on the clock "skips" the x2 multiplier as there are no
set_rate and round_rate functions defined for the clkoutx2.

This results in getting double the requested clock rates, breaking the
display on omap3430 based devices. This got broken when
d0f58bd3 and related patches were merged
for v3.14, as omapdss driver now relies more on the clk-framework and
CLK_SET_RATE_PARENT.

This patch implements set_rate and round_rate for clkoutx2.

Tested on OMAP3430, OMAP3630, OMAP4460.
Signed-off-by: NTomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: NTero Kristo <t-kristo@ti.com>
Signed-off-by: NPaul Walmsley <paul@pwsan.com>

994c41ee

19 2月, 2014 3 次提交

mfd: tps65217: Naturalise cross-architecture discrepancies · 5c6fbd56

由 Lee Jones 提交于 2月 03, 2014

If we compile the TPS65217 for a 64bit architecture we receive the following
warnings:

drivers/mfd/tps65217.c: In function ‘tps65217_probe’:
drivers/mfd/tps65217.c:173:13:
  warning: cast from pointer to integer of different size
   chip_id = (unsigned int)match->data;
             ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

5c6fbd56

mfd: max8998: Naturalise cross-architecture discrepancies · 8bace2d5

由 Lee Jones 提交于 2月 03, 2014

If we compile the MAX8998 for a 64bit architecture we receive the following
warnings:

  drivers/mfd/max8998.c: In function ‘max8998_i2c_get_driver_data’:
  drivers/mfd/max8998.c:178:10:
    warning: cast from pointer to integer of different size
     return (int)match->data;
            ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

8bace2d5

mfd: max8997: Naturalise cross-architecture discrepancies · 05fb7a56

由 Lee Jones 提交于 1月 23, 2014

If we compile the MAX8997 for a 64bit architecture we receive the following
warnings:

  drivers/mfd/max8997.c: In function ‘max8997_i2c_get_driver_data’:
  drivers/mfd/max8997.c:173:10:
    warning: cast from pointer to integer of different size
     return (int)match->data;
            ^
Signed-off-by: NLee Jones <lee.jones@linaro.org>

05fb7a56

18 2月, 2014 2 次提交

inotify: Fix reporting of cookies for inotify events · 45a22f4c

由 Jan Kara 提交于 2月 17, 2014

My rework of handling of notification events (namely commit 7053aee2
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.

Fix the problem by passing the cookie value properly.

Fixes: 7053aee2Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>

45a22f4c

ceph: remove xattr when null value is given to setxattr() · bcdfeb2e

由 Yan, Zheng 提交于 2月 11, 2014

For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>

bcdfeb2e

17 2月, 2014 3 次提交

netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n · 478b360a

由 Florian Westphal 提交于 2月 15, 2014

When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get
lots of "TRACE: filter:output:policy:1 IN=..." warnings as several
places will leave skb->nf_trace uninitialised.

Unlike iptables tracing functionality is not conditional in nftables,
so always copy/zero nf_trace setting when nftables is enabled.

Move this into __nf_copy() helper.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

478b360a

netdevice: move netdev_cap_txqueue for shared usage to header · b9507bda

由 Daniel Borkmann 提交于 2月 16, 2014

In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9507bda

netdevice: add queue selection fallback handler for ndo_select_queue · 99932d4f

由 Daniel Borkmann 提交于 2月 16, 2014

Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99932d4f

14 2月, 2014 5 次提交

workqueue: add args to workqueue lockdep name · fada94ee

由 Li Zhong 提交于 2月 14, 2014

Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
process_one_work().

Maybe #fmt plus #args could be used as the lock_name to give some more
information for some fmt string like the above.

__builtin_constant_p() check is removed (as there seems no good way to
check all the variables in args list). However, by removing the check,
it only adds two additional "s for those constants.

Some lockdep name examples printed out after the change:

lockdep name                    wq->name

"events_long"                   events_long
"%s"("khelper")                 khelper
"xfs-data/%s"mp->m_fsname       xfs-data/dm-3
Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

fada94ee

mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use · 6ecde51d

由 Roland Dreier 提交于 2月 13, 2014

On some architectures (for example, arm), we don't end up indirectly
pulling in the declaration of kzalloc() and kfree(), and so building
anything that includes <linux/mlx5/driver.h> breaks.  Fix this by adding
an explicit include to get the declaration.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6ecde51d

net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f

由 Florian Westphal 提交于 2月 13, 2014

Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe6cc55f

net: core: introduce netif_skb_dev_features · d2069403

由 Florian Westphal 提交于 2月 13, 2014

Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2069403

PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() · 3ce4e860

由 Alexander Gordeev 提交于 2月 13, 2014

The new functions are special cases for pci_enable_msi_range() and
pci_enable_msix_range() when a particular number of MSI or MSI-X
is needed.

By contrast with pci_enable_msi_range() and pci_enable_msix_range()
functions, pci_enable_msi_exact() and pci_enable_msix_exact()
return zero in case of success, which indicates MSI or MSI-X
interrupts have been successfully allocated.
Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

3ce4e860

13 2月, 2014 2 次提交

compiler/gcc4: Make quirk for asm_volatile_goto() unconditional · a9f18034

由 Steven Noonan 提交于 2月 12, 2014

I started noticing problems with KVM guest destruction on Linux
3.12+, where guest memory wasn't being cleaned up. I bisected it
down to the commit introducing the new 'asm goto'-based atomics,
and found this quirk was later applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
known 'asm goto' bug) I am still getting some kind of
miscompilation. If I enable the asm_volatile_goto quirk for my
compiler, KVM guests are destroyed correctly and the memory is
cleaned up.

So make the quirk unconditional for now, until bug is found
and fixed.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSteven Noonan <steven@uplinklabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1392274867-15236-1-git-send-email-steven@uplinklabs.net
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670Signed-off-by: NIngo Molnar <mingo@kernel.org>

a9f18034

dma-buf: update debugfs output · c0b00a52

由 Sumit Semwal 提交于 2月 03, 2014

Russell King observed 'wierd' looking output from debugfs, and also suggested
better ways of getting device names (use KBUILD_MODNAME, dev_name())

This patch addresses these issues to make the debugfs output correct and better
looking.

While at it, replace seq_printf with seq_puts to remove the checkpatch.pl
warnings.
Reported-by: NRussell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: NSumit Semwal <sumit.semwal@linaro.org>

c0b00a52

11 2月, 2014 5 次提交

block: Fix cloning of discard/write same bios · 8423ae3d

由 Kent Overstreet 提交于 2月 10, 2014

Immutable biovecs changed the way bio segments are treated in such a way that
bio_for_each_segment() cannot now do what we want for discard/write same bios,
since bi_size means something completely different for them.

Fortunately discard and write same bios never have more than a single biovec, so
bio_for_each_segment() is unnecessary and not terribly meaningful for them, but
we still have to special case them in a few places.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

8423ae3d

cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8

由 Li Zefan 提交于 2月 11, 2014

Setup cgroupfs like this:
  # mount -t cgroup -o cpuacct xxx /cgroup
  # mkdir /cgroup/sub1
  # mkdir /cgroup/sub2

Then run these two commands:
  # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
  # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
 [<ffffffff8156063c>] dump_stack+0x7a/0x96
 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
 [<ffffffff81300bf5>] idr_remove+0x25/0xd0
 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
 [<ffffffff81085f96>] kthread+0xe6/0xf0
 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
cgroup_create()").

Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <stable@vger.kernel.org> #3.12+
Reported-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

0ab02ca8

smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls · fb37bb04

由 Paul Gortmaker 提交于 2月 10, 2014

Use what we already do for arch_disable_smp_support() to fix these:

arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb37bb04

blk-mq: rework flush sequencing logic · 18741986

由 Christoph Hellwig 提交于 2月 10, 2014

Witch to using a preallocated flush_rq for blk-mq similar to what's done
with the old request path.  This allows us to set up the request properly
with a tag from the actually allowed range and ->rq_disk as needed by
some drivers.  To make life easier we also switch to dynamic allocation
of ->flush_rq for the old path.

This effectively reverts most of

    "blk-mq: fix for flush deadlock"

and

    "blk-mq: Don't reserve a tag for flush request"
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

18741986

blk-mq: rework I/O completions · 30a91cb4

由 Christoph Hellwig 提交于 2月 10, 2014

Rework I/O completions to work more like the old code path. blk_mq_end_io
now stays out of the business of deferring completions to others CPUs
and calling blk_mark_rq_complete. The latter is very important to allow
completing requests that have timed out and thus are already marked completed,
the former allows using the IPI callout even for driver specific completions
instead of having to reimplement them.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

30a91cb4

10 2月, 2014 2 次提交

fs: Add prototype declaration to appropriate header file include/linux/bio.h · c4540a7d

由 Rashika Kheria 提交于 2月 09, 2014

Add prototype declaration to header file include/linux/bio.h because it
is used by more than one file.

This eliminates the following warning in bio-integrity.c:
fs/bio-integrity.c:214:14: warning: no previous prototype for ‘bio_integrity_tag_size’ [-Wmissing-prototypes]
Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

c4540a7d

fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d

由 Al Viro 提交于 2月 09, 2014

It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d311d79d

09 2月, 2014 1 次提交

genirq: Add devm_request_any_context_irq() · 0668d306

由 Stephen Boyd 提交于 1月 02, 2014

Some drivers use request_any_context_irq() but there isn't a
devm_* function for it. Add one so that these drivers don't need
to explicitly free the irq on driver detach.
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: http://lkml.kernel.org/r/1388709460-19222-3-git-send-email-sboyd@codeaurora.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

0668d306

08 2月, 2014 3 次提交

Revert "usb: xhci: Link TRB must not occur within a USB payload burst" · 3d4b81ed

由 Sarah Sharp 提交于 1月 31, 2014

This reverts commit 35773dac.  It's a
hack that caused regressions in the usb-storage and userspace USB
drivers that use usbfs and libusb.  Commit 70cabb7d992f "xhci 1.0: Limit
arbitrarily-aligned scatter gather." should fix the issues seen with the
ax88179_178a driver on xHCI 1.0 hosts, without causing regressions.
Signed-off-by: NSarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org # 3.12

3d4b81ed

blk-mq: support at_head inserations for blk_execute_rq · 72a0a36e

由 Christoph Hellwig 提交于 2月 07, 2014

This is neede for proper SG_IO operation as well as various uses of
blk_execute_rq from the SCSI midlayer.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

72a0a36e

Drivers: hv: vmbus: Specify the target CPU that should receive notification · e28bab48

由 K. Y. Srinivasan 提交于 1月 15, 2014

During the initial VMBUS connect phase, starting with WS2012 R2, we should
specify the VPCU in the guest that should receive the notification. Fix this
issue. This fix is required to properly connect to the host in the kexeced
kernel.
Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>        [3.9+]
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e28bab48