提交 · 9c62a68d13119a1ca9718381d97b0cb415ff4e9d · openeuler / raspberrypi-kernel

18 3月, 2014 5 次提交

netpoll: Remove dead packet receive code (CONFIG_NETPOLL_TRAP) · 9c62a68d

由 Eric W. Biederman 提交于 3月 14, 2014

The netpoll packet receive code only becomes active if the netpoll
rx_skb_hook is implemented, and there is not a single implementation
of the netpoll rx_skb_hook in the kernel.

All of the out of tree implementations I have found all call
netpoll_poll which was removed from the kernel in 2011, so this
change should not add any additional breakage.

There are problems with the netpoll packet receive code.  __netpoll_rx
does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq
context.  netpoll_neigh_reply leaks every skb it receives.  Reception
of packets does not work successfully on stacked devices (aka bonding,
team, bridge, and vlans).

Given that the netpoll packet receive code is buggy, there are no
out of tree users that will be merged soon, and the code has
not been used for in tree for a decade let's just remove it.

Reverting this commit can server as a starting point for anyone
who wants to resurrect netpoll packet reception support.
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c62a68d

netpoll: Move all receive processing under CONFIG_NETPOLL_TRAP · e1bd4d3d

由 Eric W. Biederman 提交于 3月 14, 2014

Make rx_skb_hook, and rx in struct netpoll depend on
CONFIG_NETPOLL_TRAP Make rx_lock, rx_np, and neigh_tx in struct
netpoll_info depend on CONFIG_NETPOLL_TRAP

Make the functions netpoll_rx_on, netpoll_rx, and netpoll_receive_skb
no-ops when CONFIG_NETPOLL_TRAP is not set.

Only build netpoll_neigh_reply, checksum_udp service_neigh_queue,
pkt_is_ns, and __netpoll_rx when CONFIG_NETPOLL_TRAP is defined.

Add helper functions netpoll_trap_setup, netpoll_trap_setup_info,
netpoll_trap_cleanup, and netpoll_trap_cleanup_info that initialize
and cleanup the struct netpoll and struct netpoll_info receive
specific fields when CONFIG_NETPOLL_TRAP is enabled and do nothing
otherwise.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1bd4d3d

netpoll: Move netpoll_trap under CONFIG_NETPOLL_TRAP · ad8d4752

由 Eric W. Biederman 提交于 3月 14, 2014

Now that we no longer need to receive packets to safely drain the
network drivers receive queue move netpoll_trap and netpoll_set_trap
under CONFIG_NETPOLL_TRAP

Making netpoll_trap and netpoll_set_trap noop inline functions
when CONFIG_NETPOLL_TRAP is not set.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad8d4752

netpoll: Don't drop all received packets. · b6bacd55

由 Eric W. Biederman 提交于 3月 14, 2014

Change the strategy of netpoll from dropping all packets received
during netpoll_poll_dev to calling napi poll with a budget of 0
(to avoid processing drivers rx queue), and to ignore packets received
with netif_rx (those will safely be placed on the backlog queue).

All of the netpoll supporting drivers have been reviewed to ensure
either thay use netif_rx or that a budget of 0 is supported by their
napi poll routine and that a budget of 0 will not process the drivers
rx queues.

Not dropping packets makes NETPOLL_RX_DROP unnecesary so it is removed.

npinfo->rx_flags is removed  as rx_flags with just the NETPOLL_RX_ENABLED
flag becomes just a redundant mirror of list_empty(&npinfo->rx_np).
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6bacd55

netpoll: Add netpoll_rx_processing · ff607631

由 Eric W. Biederman 提交于 3月 14, 2014

Add a helper netpoll_rx_processing that reports when netpoll has
receive side processing to perform.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff607631

15 3月, 2014 1 次提交

net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq · 57a7744e

由 Eric W. Biederman 提交于 3月 13, 2014

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57a7744e

13 3月, 2014 4 次提交

mlx4: Implement IP based gids support for RoCE/SRIOV · 5ea8bbfc

由 Jack Morgenstein 提交于 3月 12, 2014

Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,

This is achieved as follows:
Outgoing MADs:
    1. Source MAC: obtained from the CQ completion structure
       (struct ib_wc, smac field).
    2. Destination MAC: obtained from the tunnel header
    3. vlan_id: obtained from the tunnel header.
Incoming MADs
    1. The source (i.e., remote) MAC and vlan_id are passed in
       the tunnel header to the proxy QP1.

VST mode support:
     For outgoing MADs,  the vlan_id obtained from the header is
        discarded, and the vlan_id specified by the Hypervisor is used
        instead.
     For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
        "invalid" vlan (0xffff)  is substituted when forwarding to the slave.
Signed-off-by: NMoni Shoua <monis@mellanox.co.il>
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ea8bbfc

mlx4: In RoCE allow guests to have multiple GIDS · b6ffaeff

由 Jack Morgenstein 提交于 3月 12, 2014

The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs:  Remaining GIDS are divided evenly between VFs activated by the driver.
      If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of gids per guest remains as before: one gid per guest.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6ffaeff

mlx4_core: For RoCE, allow slaves to set the GID entry at that slave's index · 9cd59352

由 Jack Morgenstein 提交于 3月 12, 2014

For IB transport, the host determines the slave GIDs. For ETH (RoCE),
however, the slave's GID is determined by the IP address that the slave
itself assigns to the ETH device used by RoCE.

In this case, the slave must be able to write its GIDs to the HCA gid table
(at the GID indices that slave "owns").

This commit adds processing for the SET_PORT_GID_TABLE opcode modifier
for the SET_PORT command wrapper (so that slaves may modify their GIDS
for RoCE).
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cd59352

mlx4: Adjust QP1 multiplexing for RoCE/SRIOV · 6ee51a4e

由 Jack Morgenstein 提交于 3月 12, 2014

This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields
2. Adjust mux and demux QP1 flow to support RoCE.

This commit still assumes only one GID per slave for RoCE.
The commit enabling multiple GIDs is a subsequent commit, and
is done separately because of its complexity.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ee51a4e

12 3月, 2014 1 次提交

netdev: set __percpu attribute on netdev_alloc_pcpu_stats · 693350c2

由 stephen hemminger 提交于 3月 10, 2014

This patch fixes sparse warnings in vlan driver.
It propagates the sparse __percpu attribute from alloc_percpu
into netdev_alloc_pcpu_stats. I expect it may trigger additional
sparse warnings from other drivers that are missing the __percpu
attribute.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Acked-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

693350c2

11 3月, 2014 1 次提交

mm: fix GFP_THISNODE callers and clarify · e97ca8e5

由 Johannes Weiner 提交于 3月 10, 2014

GFP_THISNODE is for callers that implement their own clever fallback to
remote nodes.  It restricts the allocation to the specified node and
does not invoke reclaim, assuming that the caller will take care of it
when the fallback fails, e.g.  through a subsequent allocation request
without GFP_THISNODE set.

However, many current GFP_THISNODE users only want the node exclusive
aspect of the flag, without actually implementing their own fallback or
triggering reclaim if necessary.  This results in things like page
migration failing prematurely even when there is easily reclaimable
memory available, unless kswapd happens to be running already or a
concurrent allocation attempt triggers the necessary reclaim.

Convert all callsites that don't implement their own fallback strategy
to __GFP_THISNODE.  This restricts the allocation a single node too, but
at the same time allows the allocator to enter the slowpath, wake
kswapd, and invoke direct reclaim if necessary, to make the allocation
happen when memory is full.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e97ca8e5

10 3月, 2014 2 次提交

get rid of fget_light() · bd2a31d5

由 Al Viro 提交于 3月 04, 2014

instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bd2a31d5

vfs: atomic f_pos accesses as per POSIX · 9c225f26

由 Linus Torvalds 提交于 3月 03, 2014

Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.

This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:

 "2.9.7 Thread Interactions with Regular File Operations

  All of the following functions shall be atomic with respect to each
  other in the effects specified in POSIX.1-2008 when they operate on
  regular files or symbolic links: [...]"

and one of the effects is the file position update.

This unprotected file position behavior is not new behavior, and nobody
has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
Michael Kerrisk that was due to this.

This resolves the issue with a f_pos-specific lock that is taken by
read/write/lseek on file descriptors that may be shared across threads
or processes.
Reported-by: NYongzhi Pan <panyongzhi@gmail.com>
Reported-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9c225f26

07 3月, 2014 3 次提交

firewire: don't use PREPARE_DELAYED_WORK · 70044d71

由 Tejun Heo 提交于 3月 07, 2014

PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

firewire core-device and sbp2 have been been multiplexing work items
with multiple work functions.  Introduce fw_device_workfn() and
sbp2_lu_workfn() which invoke fw_device->workfn and
sbp2_logical_unit->workfn respectively and always use the two
functions as the work functions and update the users to set the
->workfn fields instead of overriding work functions using
PREPARE_DELAYED_WORK().

This fixes a variety of possible regressions since a2c1c57b
"workqueue: consider work function when searching for busy work items"
due to which fw_workqueue lost its required non-reentrancy property.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux1394-devel@lists.sourceforge.net
Cc: stable@vger.kernel.org # v3.9+
Cc: stable@vger.kernel.org # v3.8.2+
Cc: stable@vger.kernel.org # v3.4.60+
Cc: stable@vger.kernel.org # v3.2.40+

70044d71

can: allow to change the device mtu for CAN FD capable devices · bc05a894

由 Oliver Hartkopp 提交于 2月 28, 2014

The configuration for CAN FD depends on CAN_CTRLMODE_FD enabled in the driver
specific ctrlmode_supported capabilities.

The configuration can be done either with the 'fd { on | off }' option in the
'ip' tool from iproute2 or by setting the CAN netdevice MTU to CAN_MTU (16) or
to CANFD_MTU (72).
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Acked-by: NStephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

bc05a894

can: introduce the data bitrate configuration for CAN FD · 9859ccd2

由 Oliver Hartkopp 提交于 2月 28, 2014

As CAN FD offers a second bitrate for the data section of the CAN frame the
infrastructure for storing and configuring this second bitrate is introduced.
Improved the readability of the if-statement by inserting some newlines.
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Acked-by: NStephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

9859ccd2

06 3月, 2014 3 次提交

netfilter: ipset: add forceadd kernel support for hash set types · 07cf8f5a

由 Josh Hunt 提交于 2月 28, 2014

Adds a new property for hash set types, where if a set is created
with the 'forceadd' option and the set becomes full the next addition
to the set may succeed and evict a random entry from the set.

To keep overhead low eviction is done very simply. It checks to see
which bucket the new entry would be added. If the bucket's pos value
is non-zero (meaning there's at least one entry in the bucket) it
replaces the first entry in the bucket. If pos is zero, then it continues
down the normal add process.

This property is useful if you have a set for 'ban' lists where it may
not matter if you release some entries from the set early.
Signed-off-by: NJosh Hunt <johunt@akamai.com>
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

07cf8f5a

J
netfilter: ipset: Prepare the kernel for create option flags when no extension is needed · af284ece
由 Jozsef Kadlecsik 提交于 2月 13, 2014
```
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
af284ece

netfilter: ipset: add hash:ip,mark data type to ipset · 3b02b56c

由 Vytas Dauksa 提交于 12月 17, 2013

Introduce packet mark support with new ip,mark hash set. This includes
userspace and kernelspace code, hash:ip,mark set tests and man page
updates.

The intended use of ip,mark set is similar to the ip:port type, but for
protocols which don't use a predictable port number. Instead of port
number it matches a firewall mark determined by a layer 7 filtering
program like opendpi.

As well as allowing or blocking traffic it will also be used for
accounting packets and bytes sent for each protocol.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

3b02b56c

05 3月, 2014 1 次提交

UAPI: add MPLS label stack definition · f3baa393

由 Simon Wunderlich 提交于 3月 03, 2014

Labels for the Multiprotocol Label Switching are defined in RFC 3032
which was superseded by RFC 5462. Add the definition to UAPI and a stub
header for include/linux.
Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: NMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3baa393

04 3月, 2014 4 次提交

mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS · 1ae71d03

由 Liu Ping Fan 提交于 3月 03, 2014

When doing some numa tests on powerpc, I triggered an oops bug.  I find
it is caused by using page->_last_cpupid.  It should be initialized as
"-1 & LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(),
we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And
finally cause an oops bug in task_numa_group(), since the online cpu is
less than possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled

Call trace:

  SMP NR_CPUS=64 NUMA PowerNV
  Modules linked in:
  CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
  task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
  REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
  MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
  CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
  NIP  .task_numa_fault+0x1470/0x2370
  LR  .task_numa_fault+0x1468/0x2370
  Call Trace:
   .task_numa_fault+0x1468/0x2370 (unreliable)
   .do_numa_page+0x480/0x4a0
   .handle_mm_fault+0x4ec/0xc90
   .do_page_fault+0x3a8/0x890
   handle_page_fault+0x10/0x30
  Instruction dump:
  3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
  3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
  ---[ end trace 15f2510da5ae07cf ]---
Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ae71d03

mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb

由 Vlastimil Babka 提交于 3月 03, 2014

Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da6 ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522b ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Reported-by: NDaniel Borkmann <dborkman@redhat.com>
Tested-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9050d7eb

mm: close PageTail race · 668f9abb

由 David Rientjes 提交于 3月 03, 2014

Commit bf6bddf1 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

668f9abb

tracing: Do not add event files for modules that fail tracepoints · 45ab2813

由 Steven Rostedt (Red Hat) 提交于 2月 26, 2014

If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org # 2.6.31+
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

45ab2813

03 3月, 2014 2 次提交

net/mlx4_en: Use union for BlueFlame WQE · ec570940

由 Amir Vadai 提交于 3月 02, 2014

When BlueFlame is turned on, control segment of the TX WQE is changed,
and the second line of it is used for QPN.
Changed code to use a union in the mlx4_wqe_ctrl_seg instead of casting.
This makes the code clearer and solves the static checker warning:

drivers/net/ethernet/mellanox/mlx4/en_tx.c:839 mlx4_en_xmit()
	warn: potential memory corrupting cast 4 vs 2 bytes

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec570940

net/mlx4: Replace mlx4_en_mac_to_u64() with mlx4_mac_to_u64() · 9813337a

由 Eugenia Emantayev 提交于 3月 02, 2014

Currently, the EN driver uses a private static function
mlx4_en_mac_to_u64(). Move it to a common include file (driver.h)
for mlx4_en and mlx4_ib for further use.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9813337a

02 3月, 2014 1 次提交

NFSv4: Fix another nfs4_sequence corruptor · b7e63a10

由 Trond Myklebust 提交于 2月 26, 2014

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b7e63a10

01 3月, 2014 4 次提交

audit: Send replies in the proper network namespace. · 6f285b19

由 Eric W. Biederman 提交于 2月 28, 2014

In perverse cases of file descriptor passing the current network
namespace of a process and the network namespace of a socket used by
that socket may differ.  Therefore use the network namespace of the
appropiate socket to ensure replies always go to the appropiate
socket.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

6f285b19

ieee80211: remove function ieee80211_{dsss_chan_to_freq, freq_to_dsss_chan} · 3ebe8e25

由 Zhao, Gang 提交于 2月 18, 2014

Function ieee80211_{dsss_chan_to_freq, freq_to_dsss_chan} have been
replaced with ieee80211_{channel_to_frequency, frequency_to_channel}.

There should be no users of the two functions now. So remove them.

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: NZhao, Gang <gamerh2o@gmail.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

3ebe8e25

wl1251: move power GPIO handling into the driver · 1d207cd3

由 Sebastian Reichel 提交于 2月 15, 2014

Move the power GPIO handling from the board code into
the driver. This is a dependency for device tree support.
Signed-off-by: NSebastian Reichel <sre@debian.org>
Reviewed-by: NPavel Machek <pavel@ucw.cz>
Acked-by: NTony Lindgren <tony@atomide.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

1d207cd3

wl1251: split wl251 platform data to a separate structure · 946651cb

由 Luciano Coelho 提交于 2月 15, 2014

Move the wl1251 part of the wl12xx platform data structure into a new
structure specifically for wl1251.  Change the platform data built-in
block and board files accordingly.
Signed-off-by: NLuciano Coelho <coelho@ti.com>
Acked-by: NTony Lindgren <tony@atomide.com>
Reviewed-by: NFelipe Balbi <balbi@ti.com>
Reviewed-by: NSebastian Reichel <sre@debian.org>
Reviewed-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

946651cb

28 2月, 2014 1 次提交

net: move net_device priv_flags out from UAPI · 7aa98047

由 Luis R. Rodriguez 提交于 2月 25, 2014

These are private to userspace, and they're unstable
anyway and can be shuffled at will (see 080e4130)
so any userspace application relying on them is on crack.

Test compiled with allyesconfig.

mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ make allyesconfig
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ time make -j 20
...
  BUILD   arch/x86/boot/bzImage
Setup is 16992 bytes (padded to 17408 bytes).
System is 56153 kB
CRC 721d2751
Kernel: arch/x86/boot/bzImage is ready  (#1)
real    19m35.744s
user    280m37.984s
sys     27m54.104s

Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7aa98047

27 2月, 2014 3 次提交

tcp: switch rtt estimations to usec resolution · 740b0f18

由 Eric Dumazet 提交于 2月 26, 2014

Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

lpk51:~# ./ss -i dst lpk52
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
Address:Port
tcp    ESTAB      0      1         10.246.11.51:42959
10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51

Old binary displays :

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

740b0f18

net: add skb_mstamp infrastructure · 363ec392

由 Eric Dumazet 提交于 2月 26, 2014

ktime_get() is too expensive on some cases, and we'd like to get
usec resolution timestamps in TCP stack.

This patch adds a light weight facility using a combination of
local_clock() and jiffies samples.

Instead of :

        u64 t0, t1;

        t0 = ktime_get();
        // stuff
        t1 = ktime_get();
        delta_us = ktime_us_delta(t1, t0);

use :
        struct skb_mstamp t0, t1;

        skb_mstamp_get(&t0);
        // stuff
        skb_mstamp_get(&t1);
        delta_us = skb_mstamp_us_delta(&t1, &t0);

Note : local_clock() might have a (bounded) drift between cpus.

Do not use this infra in place of ktime_get() without understanding the
issues.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

363ec392

net: Add sysfs file for port number · 3f85944f

由 Amir Vadai 提交于 2月 25, 2014

Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f85944f

26 2月, 2014 1 次提交

ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9

由 Davidlohr Bueso 提交于 2月 25, 2014

Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

 "This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fallback to the
original way of dealing queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.
Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
Reported-by: NMadars Vitolins <m@silodev.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>	[3.5+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3713fd9

25 2月, 2014 3 次提交

sysfs: fix namespace refcnt leak · fed95bab

由 Li Zefan 提交于 2月 25, 2014

As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NTejun Heo <tj@kernel.org>
 ---
 fs/kernfs/mount.c      | 8 +++++++-
 fs/sysfs/mount.c       | 5 +++--
 include/linux/kernfs.h | 9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fed95bab

netfilter: nfnetlink: add rcu_dereference_protected() helpers · 0eb5db7a

由 Patrick McHardy 提交于 2月 18, 2014

Add a lockdep_nfnl_is_held() function and a nfnl_dereference() macro for
RCU dereferences protected by a NFNL subsystem mutex.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0eb5db7a

fsnotify: Allocate overflow events with proper type · ff57cd58

由 Jan Kara 提交于 2月 21, 2014

Commit 7053aee2 "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.

Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.
Signed-off-by: NJan Kara <jack@suse.cz>

ff57cd58