提交 · cc110922da7e902b62d18641a370fec01a9fa794 · openanolis / cloud-kernel

27 8月, 2012 1 次提交

Bluetooth: Change signature of smp_conn_security() · cc110922

由 Vinicius Costa Gomes 提交于 8月 23, 2012

To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.
Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

cc110922

15 8月, 2012 1 次提交

Bluetooth: Fix use-after-free bug in SMP · 61a0cfb0

由 Andre Guedes 提交于 8月 01, 2012

If SMP fails, we should always cancel security_timer delayed work.
Otherwise, security_timer function may run after l2cap_conn object
has been freed.

This patch fixes the following warning reported by ODEBUG:

WARNING: at lib/debugobjects.c:261 debug_print_object+0x7c/0x8d()
Hardware name: Bochs
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x27
Modules linked in: btusb bluetooth
Pid: 440, comm: kworker/u:2 Not tainted 3.5.0-rc1+ #4
Call Trace:
 [<ffffffff81174600>] ? free_obj_work+0x4a/0x7f
 [<ffffffff81023eb8>] warn_slowpath_common+0x7e/0x97
 [<ffffffff81023f65>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811746b1>] debug_print_object+0x7c/0x8d
 [<ffffffff810394f0>] ? __queue_work+0x241/0x241
 [<ffffffff81174fdd>] debug_check_no_obj_freed+0x92/0x159
 [<ffffffff810ac08e>] slab_free_hook+0x6f/0x77
 [<ffffffffa0019145>] ? l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffff810ae408>] kfree+0x59/0xac
 [<ffffffffa0019145>] l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffffa001b9a2>] l2cap_recv_frame+0xa77/0xfa4 [bluetooth]
 [<ffffffff810592f9>] ? trace_hardirqs_on_caller+0x112/0x1ad
 [<ffffffffa001c86c>] l2cap_recv_acldata+0xe2/0x264 [bluetooth]
 [<ffffffffa0002b2f>] hci_rx_work+0x235/0x33c [bluetooth]
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81038e22>] process_one_work+0x185/0x2fe
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81059f2e>] ? lock_acquired+0x1b5/0x1cf
 [<ffffffffa00028fa>] ? le_scan_work+0x11d/0x11d [bluetooth]
 [<ffffffff81036fb6>] ? spin_lock_irq+0x9/0xb
 [<ffffffff81039209>] worker_thread+0xcf/0x175
 [<ffffffff8103913a>] ? rescuer_thread+0x175/0x175
 [<ffffffff8103cfe0>] kthread+0x95/0x9d
 [<ffffffff812c5054>] kernel_threadi_helper+0x4/0x10
 [<ffffffff812c36b0>] ? retint_restore_args+0x13/0x13
 [<ffffffff8103cf4b>] ? flush_kthread_worker+0xdb/0xdb
 [<ffffffff812c5050>] ? gs_change+0x13/0x13

This bug can be reproduced using hctool lecc or l2test tools and
bluetoothd not running.
Signed-off-by: NAndre Guedes <andre.guedes@openbossa.org>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

61a0cfb0

07 8月, 2012 8 次提交

cfg80211: process pending events when unregistering net device · 1f6fc43e

由 Daniel Drake 提交于 8月 02, 2012

libertas currently calls cfg80211_disconnected() when it is being
brought down. This causes an event to be allocated, but since the
wdev is already removed from the rdev by the time that the event
processing work executes, the event is never processed or freed.
http://article.gmane.org/gmane.linux.kernel.wireless.general/95666

Fix this leak, and other possible situations, by processing the event
queue when a device is being unregistered. Thanks to Johannes Berg for
the suggestion.
Signed-off-by: NDaniel Drake <dsd@laptop.org>
Cc: stable@vger.kernel.org
Reviewed-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

1f6fc43e

Bluetooth: Fix socket not getting freed if l2cap channel create fails · 49dfbb91

由 Jaganath Kanakkassery 提交于 7月 19, 2012

If l2cap_chan_create() fails then it will return from l2cap_sock_kill
since zapped flag of sk is reset.
Signed-off-by: NJaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

49dfbb91

Bluetooth: smp: Fix possible NULL dereference · d08fd0e7

由 Andrei Emeltchenko 提交于 7月 19, 2012

smp_chan_create might return NULL so we need to check before
dereferencing smp.
Signed-off-by: NAndrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

d08fd0e7

Bluetooth: Set name_state to unknown when entry name is empty · c3e7c0d9

由 Ram Malovany 提交于 7月 19, 2012

When the name of the given entry is empty , the state needs to be
updated accordingly.

Cc: stable@vger.kernel.org
Signed-off-by: NRam Malovany <ramm@ti.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

c3e7c0d9

Bluetooth: Fix using a NULL inquiry cache entry · 7cc8380e

由 Ram Malovany 提交于 7月 19, 2012

If the device was not found in a list of found devices names of which
are pending.This may happen in a case when HCI Remote Name Request
was sent as a part of incoming connection establishment procedure.
Hence there is no need to continue resolving a next name as it will
be done upon receiving another Remote Name Request Complete Event.
This will fix a kernel crash when trying to use this entry to resolve
the next name.

Cc: stable@vger.kernel.org
Signed-off-by: NRam Malovany <ramm@ti.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

7cc8380e

Bluetooth: Fix using NULL inquiry entry · c810089c

由 Ram Malovany 提交于 7月 19, 2012

If entry wasn't found in the hci_inquiry_cache_lookup_resolve do not
resolve the name.This will fix a kernel crash when trying to use NULL
pointer.

Cc: stable@vger.kernel.org
Signed-off-by: NRam Malovany <ramm@ti.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

c810089c

Bluetooth: Fix legacy pairing with some devices · a9ea3ed9

由 Szymon Janc 提交于 7月 19, 2012

Some devices e.g. some Android based phones don't do SDP search before
pairing and cancel legacy pairing when ACL is disconnected.

PIN Code Request event which changes ACL timeout to HCI_PAIRING_TIMEOUT
is only received after remote user entered PIN.

In that case no L2CAP is connected so default HCI_DISCONN_TIMEOUT
(2 seconds) is being used to timeout ACL connection. This results in
problems with legacy pairing as remote user has only few seconds to
enter PIN before ACL is disconnected.

Increase disconnect timeout for incomming connection to
HCI_PAIRING_TIMEOUT if SSP is disabled and no linkey exists.

To avoid keeping ACL alive for too long after SDP search set ACL
timeout back to HCI_DISCONN_TIMEOUT when L2CAP is connected.

2012-07-19 13:24:43.413521 < HCI Command: Create Connection (0x01|0x0005) plen 13
    bdaddr 00:02:72:D6:6A:3F ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
    Packet type: DM1 DM3 DM5 DH1 DH3 DH5
2012-07-19 13:24:43.425224 > HCI Event: Command Status (0x0f) plen 4
    Create Connection (0x01|0x0005) status 0x00 ncmd 1
2012-07-19 13:24:43.885222 > HCI Event: Role Change (0x12) plen 8
    status 0x00 bdaddr 00:02:72:D6:6A:3F role 0x01
    Role: Slave
2012-07-19 13:24:44.054221 > HCI Event: Connect Complete (0x03) plen 11
    status 0x00 handle 42 bdaddr 00:02:72:D6:6A:3F type ACL encrypt 0x00
2012-07-19 13:24:44.054313 < HCI Command: Read Remote Supported Features (0x01|0x001b) plen 2
    handle 42
2012-07-19 13:24:44.055176 > HCI Event: Page Scan Repetition Mode Change (0x20) plen 7
    bdaddr 00:02:72:D6:6A:3F mode 0
2012-07-19 13:24:44.056217 > HCI Event: Max Slots Change (0x1b) plen 3
    handle 42 slots 5
2012-07-19 13:24:44.059218 > HCI Event: Command Status (0x0f) plen 4
    Read Remote Supported Features (0x01|0x001b) status 0x00 ncmd 0
2012-07-19 13:24:44.062192 > HCI Event: Command Status (0x0f) plen 4
    Unknown (0x00|0x0000) status 0x00 ncmd 1
2012-07-19 13:24:44.067219 > HCI Event: Read Remote Supported Features (0x0b) plen 11
    status 0x00 handle 42
    Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87
2012-07-19 13:24:44.067248 < HCI Command: Read Remote Extended Features (0x01|0x001c) plen 3
    handle 42 page 1
2012-07-19 13:24:44.071217 > HCI Event: Command Status (0x0f) plen 4
    Read Remote Extended Features (0x01|0x001c) status 0x00 ncmd 1
2012-07-19 13:24:44.076218 > HCI Event: Read Remote Extended Features (0x23) plen 13
    status 0x00 handle 42 page 1 max 1
    Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
2012-07-19 13:24:44.076249 < HCI Command: Remote Name Request (0x01|0x0019) plen 10
    bdaddr 00:02:72:D6:6A:3F mode 2 clkoffset 0x0000
2012-07-19 13:24:44.081218 > HCI Event: Command Status (0x0f) plen 4
    Remote Name Request (0x01|0x0019) status 0x00 ncmd 1
2012-07-19 13:24:44.105214 > HCI Event: Remote Name Req Complete (0x07) plen 255
    status 0x00 bdaddr 00:02:72:D6:6A:3F name 'uw000951-0'
2012-07-19 13:24:44.105284 < HCI Command: Authentication Requested (0x01|0x0011) plen 2
    handle 42
2012-07-19 13:24:44.111207 > HCI Event: Command Status (0x0f) plen 4
    Authentication Requested (0x01|0x0011) status 0x00 ncmd 1
2012-07-19 13:24:44.112220 > HCI Event: Link Key Request (0x17) plen 6
    bdaddr 00:02:72:D6:6A:3F
2012-07-19 13:24:44.112249 < HCI Command: Link Key Request Negative Reply (0x01|0x000c) plen 6
    bdaddr 00:02:72:D6:6A:3F
2012-07-19 13:24:44.115215 > HCI Event: Command Complete (0x0e) plen 10
    Link Key Request Negative Reply (0x01|0x000c) ncmd 1
    status 0x00 bdaddr 00:02:72:D6:6A:3F
2012-07-19 13:24:44.116215 > HCI Event: PIN Code Request (0x16) plen 6
    bdaddr 00:02:72:D6:6A:3F
2012-07-19 13:24:48.099184 > HCI Event: Auth Complete (0x06) plen 3
    status 0x13 handle 42
    Error: Remote User Terminated Connection
2012-07-19 13:24:48.179182 > HCI Event: Disconn Complete (0x05) plen 4
    status 0x00 handle 42 reason 0x13
    Reason: Remote User Terminated Connection

Cc: stable@vger.kernel.org
Signed-off-by: NSzymon Janc <szymon.janc@tieto.com>
Acked-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

a9ea3ed9

Bluetooth: Fix possible deadlock in SCO code · 269c4845

由 Gustavo Padovan 提交于 6月 15, 2012

sco_chan_del() only has conn != NULL when called from sco_conn_del() so
just move the code from it that deal with conn to sco_conn_del().

[  120.765529]
[  120.765529] ======================================================
[  120.766529] [ INFO: possible circular locking dependency detected ]
[  120.766529] 3.5.0-rc1-10292-g3701f944-dirty #70 Tainted: G        W
[  120.766529] -------------------------------------------------------
[  120.766529] kworker/u:3/1497 is trying to acquire lock:
[  120.766529]  (&(&conn->lock)->rlock#2){+.+...}, at:
[<ffffffffa00b7ecc>] sco_chan_del+0x4c/0x170 [bluetooth]
[  120.766529]
[  120.766529] but task is already holding lock:
[  120.766529]  (slock-AF_BLUETOOTH-BTPROTO_SCO){+.+...}, at:
[<ffffffffa00b8401>] sco_conn_del+0x61/0xe0 [bluetooth]
[  120.766529]
[  120.766529] which lock already depends on the new lock.
[  120.766529]
[  120.766529]
[  120.766529] the existing dependency chain (in reverse order) is:
[  120.766529]
[  120.766529] -> #1 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.+...}:
[  120.766529]        [<ffffffff8107980e>] lock_acquire+0x8e/0xb0
[  120.766529]        [<ffffffff813c19e0>] _raw_spin_lock+0x40/0x80
[  120.766529]        [<ffffffffa00b85e9>] sco_connect_cfm+0x79/0x300
[bluetooth]
[  120.766529]        [<ffffffffa0094b13>]
hci_sync_conn_complete_evt.isra.90+0x343/0x400 [bluetooth]
[  120.766529]        [<ffffffffa009d447>] hci_event_packet+0x317/0xfb0
[bluetooth]
[  120.766529]        [<ffffffffa008aa68>] hci_rx_work+0x2c8/0x890
[bluetooth]
[  120.766529]        [<ffffffff81047db7>] process_one_work+0x197/0x460
[  120.766529]        [<ffffffff810489d6>] worker_thread+0x126/0x2d0
[  120.766529]        [<ffffffff8104ee4d>] kthread+0x9d/0xb0
[  120.766529]        [<ffffffff813c4294>] kernel_thread_helper+0x4/0x10
[  120.766529]
[  120.766529] -> #0 (&(&conn->lock)->rlock#2){+.+...}:
[  120.766529]        [<ffffffff81078a8a>] __lock_acquire+0x154a/0x1d30
[  120.766529]        [<ffffffff8107980e>] lock_acquire+0x8e/0xb0
[  120.766529]        [<ffffffff813c19e0>] _raw_spin_lock+0x40/0x80
[  120.766529]        [<ffffffffa00b7ecc>] sco_chan_del+0x4c/0x170
[bluetooth]
[  120.766529]        [<ffffffffa00b8414>] sco_conn_del+0x74/0xe0
[bluetooth]
[  120.766529]        [<ffffffffa00b88a2>] sco_disconn_cfm+0x32/0x60
[bluetooth]
[  120.766529]        [<ffffffffa0093a82>]
hci_disconn_complete_evt.isra.53+0x242/0x390 [bluetooth]
[  120.766529]        [<ffffffffa009d747>] hci_event_packet+0x617/0xfb0
[bluetooth]
[  120.766529]        [<ffffffffa008aa68>] hci_rx_work+0x2c8/0x890
[bluetooth]
[  120.766529]        [<ffffffff81047db7>] process_one_work+0x197/0x460
[  120.766529]        [<ffffffff810489d6>] worker_thread+0x126/0x2d0
[  120.766529]        [<ffffffff8104ee4d>] kthread+0x9d/0xb0
[  120.766529]        [<ffffffff813c4294>] kernel_thread_helper+0x4/0x10
[  120.766529]
[  120.766529] other info that might help us debug this:
[  120.766529]
[  120.766529]  Possible unsafe locking scenario:
[  120.766529]
[  120.766529]        CPU0                    CPU1
[  120.766529]        ----                    ----
[  120.766529]   lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
[  120.766529]
lock(&(&conn->lock)->rlock#2);
[  120.766529]
lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
[  120.766529]   lock(&(&conn->lock)->rlock#2);
[  120.766529]
[  120.766529]  *** DEADLOCK ***
Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>

269c4845

02 8月, 2012 8 次提交

cfg80211: Clear "beacon_found" on regulatory restore · 899852af

由 Paul Stewart 提交于 8月 01, 2012

Restore the default state to the "beacon_found" flag when
the channel flags are restored.  Otherwise, we can end up
with a channel that we can no longer transmit on even when
we can see beacons on that channel.
Signed-off-by: NPaul Stewart <pstew@chromium.org>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

899852af

cfg80211: add channel flag to prohibit OFDM operation · 03f6b084

由 Seth Forshee 提交于 8月 01, 2012

Currently the only way for wireless drivers to tell whether or not OFDM
is allowed on the current channel is to check the regulatory
information. However, this requires hodling cfg80211_mutex, which is not
visible to the drivers.

Other regulatory restrictions are provided as flags in the channel
definition, so let's do similarly with OFDM. This patch adds a new flag,
IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not
allowed. This flag is set on any channels for which regulatory indicates
that OFDM is prohibited.
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Tested-by: NArend van Spriel <arend@broadcom.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

03f6b084

ipv4: route.c cleanup · e33cdac0

由 Eric Dumazet 提交于 8月 01, 2012

Remove unused includes after IP cache removal
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e33cdac0

Fix unexpected SA hard expiration after changing date · e3c0d047

由 Fan Du 提交于 7月 30, 2012

After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.
Signed-off-by: NFan Du <fdu@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3c0d047

tcp: Apply device TSO segment limit earlier · 1485348d

由 Ben Hutchings 提交于 7月 30, 2012

Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1485348d

net: Allow driver to limit number of GSO segments per skb · 30b678d8

由 Ben Hutchings 提交于 7月 30, 2012

A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps).  Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once.  This
results in a single skb that expands to 861 segments.

In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring.  The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset).  This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.

Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
   limit.
2. In netif_skb_features(), if the number of segments is too high then
   mask out GSO features to force fall back to software GSO.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30b678d8

mac80211: cancel mesh path timer · dd4c9260

由 Johannes Berg 提交于 8月 01, 2012

The mesh path timer needs to be canceled when
leaving the mesh as otherwise it could fire
after the interface has been removed already.

Cc: stable@vger.kernel.org
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

dd4c9260

mac80211: clear timer bits when disconnecting · 2d9957cc

由 Johannes Berg 提交于 8月 01, 2012

There's a corner case that can happen when we
suspend with a timer running, then resume and
disconnect. If we connect again, suspend and
resume we might start timers that shouldn't be
running. Reset the timer flags to avoid this.

This affects both mesh and managed modes.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2d9957cc

01 8月, 2012 11 次提交

nfs: enable swap on NFS · a564b8f0

由 Mel Gorman 提交于 7月 31, 2012

Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a564b8f0

netvm: prevent a stream-specific deadlock · c76562b6

由 Mel Gorman 提交于 7月 31, 2012

This patch series is based on top of "Swap-over-NBD without deadlocking
v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  In diskless systems this is not an option so if swap if
required then swapping over the network is considered.  The two likely
scenarios are when blade servers are used as part of a cluster where the
form factor or maintenance costs do not allow the use of disks and thin
clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap but this is not always an option.  There is no
guarantee that the network attached storage (NAS) device is running Linux
or supports NBD.  However, it is likely that it supports NFS so there are
users that want support for swapping over NFS despite any performance
concern.  Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
	reserves.

Patch 3 adds three helpers for filesystems to handle swap cache pages.
	For example, page_file_mapping() returns page->mapping for
	file-backed pages and the address_space of the underlying
	swap file for swap cache pages.

Patch 4 adds two address_space_operations to allow a filesystem
	to pin all metadata relevant to a swapfile in memory. Upon
	successful activation, the swapfile is marked SWP_FILE and
	the address space operation ->direct_IO is used for writing
	and ->readpage for reading in swap pages.

Patch 5 notes that patch 3 is bolting
	filesystem-specific-swapfile-support onto the side and that
	the default handlers have different information to what
	is available to the filesystem. This patch refactors the
	code so that there are generic handlers for each of the new
	address_space operations.

Patch 6 adds an API to allow a vector of kernel addresses to be
	translated to struct pages and pinned for IO.

Patch 7 adds support for using highmem pages for swap by kmapping
	the pages before calling the direct_IO handler.

Patch 8 updates NFS to use the helpers from patch 3 where necessary.

Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.

Patch 10 implements the new swapfile-related address_space operations
	for NFS and teaches the direct IO handler how to manage
	kernel addresses.

Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 12 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

With the patches applied, it is possible to mount a swapfile that is on an
NFS filesystem.  Swap performance is not great with a swap stress test
taking roughly twice as long to complete than if the swap device was
backed by NBD.

This patch: netvm: prevent a stream-specific deadlock

It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
buffers from receiving data, which will prevent userspace from running,
which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
this change it applied, it is important that sockets that set
SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
If this happens, a warning is generated and the tokens reclaimed to avoid
accounting errors until the bug is fixed.

[davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c76562b6

netvm: set PF_MEMALLOC as appropriate during SKB processing · b4b9e355

由 Mel Gorman 提交于 7月 31, 2012

In order to make sure pfmemalloc packets receive all memory needed to
proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC.
This is limited to a subset of protocols that are expected to be used for
writing to swap.  Taps are not allowed to use PF_MEMALLOC as these are
expected to communicate with userspace processes which could be paged out.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[jslaby@suse.cz: Lock imbalance fix]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b4b9e355

netvm: allow skb allocation to use PFMEMALLOC reserves · c93bdd0e

由 Mel Gorman 提交于 7月 31, 2012

Change the skb allocation API to indicate RX usage and use this to fall
back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
reserve and the socket is later found to be unrelated to page reclaim, the
packet is dropped so that the memory remains available for page reclaim.
Network protocols are expected to recover from this packet loss.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[davem@davemloft.net: Use static branches, coding style corrections]
[sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c93bdd0e

netvm: allow the use of __GFP_MEMALLOC by specific sockets · 7cb02404

由 Mel Gorman 提交于 7月 31, 2012

Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC
for their allocations.  These sockets will be able to go below watermarks
and allocate from the emergency reserve.  Such sockets are to be used to
service the VM (iow.  to swap over).  They must be handled kernel side,
exposing such a socket to user-space is a bug.

There is a risk that the reserves be depleted so for now, the
administrator is responsible for increasing min_free_kbytes as necessary
to prevent deadlock for their workloads.

[a.p.zijlstra@chello.nl: Original patches]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7cb02404

net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket · 99a1dec7

由 Mel Gorman 提交于 7月 31, 2012

Introduce sk_gfp_atomic(), this function allows to inject sock specific
flags to each sock related allocation.  It is only used on allocation
paths that may be required for writing pages back to network storage.

[davem@davemloft.net: Use sk_gfp_atomic only when necessary]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

99a1dec7

memcg: rename config variables · c255a458

由 Andrew Morton 提交于 7月 31, 2012

Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c255a458

ipv4: Properly purge netdev references on uncached routes. · caacf05e

由 David S. Miller 提交于 7月 31, 2012

When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

caacf05e

D
ipv4: Cache routes in nexthop exception entries. · c5038a83
由 David S. Miller 提交于 7月 31, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
c5038a83

ipv4: percpu nh_rth_output cache · d26b3a7c

由 Eric Dumazet 提交于 7月 31, 2012

Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d26b3a7c

ipv4: Restore old dst_free() behavior. · 54764bb6

由 Eric Dumazet 提交于 7月 31, 2012

commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54764bb6

31 7月, 2012 11 次提交

libceph: recheck con state after allocating incoming message · 61399191

由 Sage Weil 提交于 7月 30, 2012

We drop the lock when calling the ->alloc_msg() con op, which means
we need to (a) not clobber con->in_msg without the mutex held, and (b)
we need to verify that we are still in the OPEN state when we retake
it to avoid causing any mayhem.  If the state does change, -EAGAIN
will get us back to con_work() and loop.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

61399191

libceph: change ceph_con_in_msg_alloc convention to be less weird · 4740a623

由 Sage Weil 提交于 7月 30, 2012

This function's calling convention is very limiting.  In particular,
we can't return any error other than ENOMEM (and only implicitly),
which is a problem (see next patch).

Instead, return an normal 0 or error code, and make the skip a pointer
output parameter.  Drop the useless in_hdr argument (we have the con
pointer).
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

4740a623

libceph: avoid dropping con mutex before fault · 8636ea67

由 Sage Weil 提交于 7月 30, 2012

The ceph_fault() function takes the con mutex, so we should avoid
dropping it before calling it.  This fixes a potential race with
another thread calling ceph_con_close(), or _open(), or similar (we
don't reverify con->state after retaking the lock).

Add annotation so that lockdep realizes we will drop the mutex before
returning.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

8636ea67

libceph: verify state after retaking con lock after dispatch · 7b862e07

由 Sage Weil 提交于 7月 30, 2012

We drop the con mutex when delivering a message.  When we retake the
lock, we need to verify we are still in the OPEN state before
preparing to read the next tag, or else we risk stepping on a
connection that has been closed.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

7b862e07

libceph: revoke mon_client messages on session restart · 4f471e4a

由 Sage Weil 提交于 7月 30, 2012

Revoke all mon_client messages when we shut down the old connection.
This is mostly moot since we are re-using the same ceph_connection,
but it is cleaner.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

4f471e4a

libceph: fix handling of immediate socket connect failure · 8007b8d6

由 Sage Weil 提交于 7月 30, 2012

If the connect() call immediately fails such that sock == NULL, we
still need con_close_socket() to reset our socket state to CLOSED.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

8007b8d6

libceph: be less chatty about stray replies · 756a16a5

由 Sage Weil 提交于 7月 30, 2012

There are many (normal) conditions that can lead to us getting
unexpected replies, include cluster topology changes, osd failures,
and timeouts.  There's no need to spam the console about it.
Signed-off-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>

756a16a5

S
libceph: clear all flags on con_close · 43c7427d
由 Sage Weil 提交于 7月 20, 2012
```
Signed-off-by: NSage Weil <sage@inktank.com>
```
43c7427d

libceph: clean up con flags · 4a861692

由 Sage Weil 提交于 7月 20, 2012

Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.
Signed-off-by: NSage Weil <sage@inktank.com>

4a861692

libceph: replace connection state bits with states · 8dacc7da

由 Sage Weil 提交于 7月 20, 2012

Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits. All of the con->state checks are
now under the protection of the con mutex, so this is safe. It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.

This appears to hold up well to stress testing both with and without socket
failure injection on the server side.
Signed-off-by: NSage Weil <sage@inktank.com>

8dacc7da

S
libceph: drop unnecessary CLOSED check in socket state change callback · d7353dd5
由 Sage Weil 提交于 7月 20, 2012
```
If we are CLOSED, the socket is closed and we won't get these.
Signed-off-by: NSage Weil <sage@inktank.com>
```
d7353dd5

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功