Commits · 6e8575faa8fa680d59404a4d58d12190667be815 · openeuler / raspberrypi-kernel

28 Dec, 2012 · 4 commits

libceph: fix protocol feature mismatch failure path · 0fa6ebc6

Committed by Sage Weil on Dec 27, 2012

We should not set con->state to CLOSED here; that happens in
ceph_fault() in the caller, where it first asserts that the state
is not yet CLOSED.  Avoids a BUG when the features don't match.

Since the fail_protocol() has become a trivial wrapper, replace
calls to it with direct calls to reset_connection().
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: WARN, don't BUG on unexpected connection states · 122070a2

Committed by Alex Elder on Dec 26, 2012

A number of assertions in the ceph messenger are implemented with
BUG_ON(), killing the system if a connection's state doesn't match
what's expected.  At this point our state model is (evidently) not
well understood enough for these assertions to trigger a BUG().
Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
so we learn about these issues without killing the machine.

We now recognize that a connection fault can occur due to a socket
closure at any time, regardless of the state of the connection.  So
there is really nothing we can assert about the state of the
connection at that point, so eliminate that assertion.
Reported-by: Ugis <ugis22@gmail.com>
Tested-by: Ugis <ugis22@gmail.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
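To make the BUG_ON()/WARN_ON() difference concrete, here is a minimal userspace sketch. It assumes only what the kernel's WARN_ON() is documented to do: report the condition and evaluate to it, so the caller can bail out instead of halting. The `warn_report()` helper, the `toy_con` structure and its states are hypothetical stand-ins, not messenger code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for WARN_ON(): log and return the condition. */
static int warn_report(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define WARN_ON(cond) warn_report(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical connection states, loosely modeled on the messenger's. */
enum con_state { CON_STATE_CLOSED = 1, CON_STATE_PREOPEN, CON_STATE_OPEN };

struct toy_con {
	enum con_state state;
};

/* Returns 0 on success, -1 if the state was unexpected; never kills the process. */
int toy_open(struct toy_con *con)
{
	if (WARN_ON(con->state != CON_STATE_PREOPEN))
		return -1;
	con->state = CON_STATE_OPEN;
	return 0;
}
```

With BUG_ON() the unexpected-state branch would never return; with the WARN_ON() shape the problem is logged and the caller simply gets an error back.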

libceph: always reset osds when kicking · e6d50f67

Committed by Alex Elder on Dec 26, 2012

When ceph_osdc_handle_map() is called to process a new osd map,
kick_requests() is called to ensure all affected requests are
updated if necessary to reflect changes in the osd map.  This
happens in two cases:  whenever an incremental map update is
processed; and when a full map update (or the last one if there is
more than one) gets processed.

In the former case, the kick_requests() call is followed immediately
by a call to reset_changed_osds() to ensure any connections to osds
affected by the map change are reset.  But for full map updates
this isn't done.

Both cases should be doing this osd reset.

Rather than duplicating the reset_changed_osds() call, move it to
the end of kick_requests().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: move linger requests sooner in kick_requests() · ab60b16d

Committed by Alex Elder on Dec 19, 2012

The kick_requests() function is called by ceph_osdc_handle_map()
when an osd map change has been indicated.  Its purpose is to
re-queue any request whose target osd is different from what it
was when it was originally sent.

It is structured as two loops, one for incomplete but registered
requests, and a second for handling completed linger requests.
As a special case, in the first loop if a request marked to linger
has not yet completed, it is moved from the request list to the
linger list.  This is a quick and dirty way to have the second
loop handle sending the request along with all the other linger
requests.

Because of the way it's done now, however, this quick and dirty
solution can result in these incomplete linger requests never
getting re-sent as desired.  The problem lies in the fact that
the second loop only arranges for a linger request to be sent
if it appears its target osd has changed.  This is the proper
handling for *completed* linger requests (it avoids issuing
the same linger request twice to the same osd).

But although the linger requests added to the list in the first loop
may have been sent, they have not yet completed, so they need to be
re-sent regardless of whether their target osd has changed.

The first required fix is to avoid calling __map_request()
on any incomplete linger request.  Otherwise the subsequent
__map_request() call in the second loop will find the target osd
has not changed and will therefore not re-send the request.

Second, we need to be sure that a sent but incomplete linger request
gets re-sent.  If the target osd is the same with the new osd map as
it was when the request was originally sent, this won't happen.
This can be fixed through careful handling when we move these
requests from the request list to the linger list, by unregistering
the request *before* it is registered as a linger request.  This
works because a side-effect of unregistering the request is to make
the request's r_osd pointer be NULL, and *that* will ensure the
second loop actually re-sends the linger request.

Processing of such a request is done at that point, so continue with
the next one once it's been moved.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
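The interaction between the two loops can be modeled in a few lines. This is a hypothetical sketch, not osd_client code: an integer osd id replaces the `r_osd` pointer (with `NO_OSD` playing the role of NULL), `toy_map_request()` stands in for __map_request(), and a counter records re-sends. It shows why mapping in the first loop hides a target change from the second loop, and why clearing `r_osd` (the side effect of unregistering) forces a re-send even when the target is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OSD (-1)   /* stands in for a NULL r_osd pointer */

/* Hypothetical, heavily simplified linger request. */
struct toy_req {
	int r_osd;        /* osd the request was last mapped to, or NO_OSD */
	int resends;      /* times the sketch "re-sent" the request */
};

/* Stand-in for __map_request(): returns true if the target osd changed. */
static bool toy_map_request(struct toy_req *req, int new_target)
{
	if (req->r_osd == new_target)
		return false;
	req->r_osd = new_target;
	return true;
}

/* Second loop: re-send a linger request only if its target changed. */
void toy_kick_linger(struct toy_req *req, int new_target)
{
	if (toy_map_request(req, new_target))
		req->resends++;
}

/* Old first-loop handling: mapping here hides the change from the second loop. */
void toy_move_linger_old(struct toy_req *req, int new_target)
{
	(void)toy_map_request(req, new_target);
	toy_kick_linger(req, new_target);
}

/* Fixed handling: unregistering first clears r_osd, so the second loop re-sends. */
void toy_move_linger_fixed(struct toy_req *req, int new_target)
{
	req->r_osd = NO_OSD;          /* side effect of unregistering */
	toy_kick_linger(req, new_target);
}
```

A completed linger request whose target is unchanged is still left alone, which is the behavior the second loop is meant to keep.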

21 Dec, 2012 · 5 commits

libceph: register request before unregister linger · c89ce05e

Committed by Alex Elder on Dec 06, 2012

In kick_requests(), we need to register the request before we
unregister the linger request.  Otherwise the unregister will
reset the request's osd pointer to NULL.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: don't use rb_init_node() in ceph_osdc_alloc_request() · a978fa20

Committed by Alex Elder on Dec 17, 2012

The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rb_init_node().  We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized.  But rb_init_node() is apparently overkill, and
may in fact be on its way out.  So use RB_CLEAR_NODE() instead.

For a little more background, see this commit:
    4c199a93 "rbtree: empty nodes have no color"
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
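Why the node must be initialized before RB_EMPTY_NODE() is meaningful can be seen with a small userspace mirror of the rbtree macros. The sketch assumes the post-4c199a93 convention, as we understand it: parent pointer and color share one word, and a node counts as "empty" when that word points back at the node itself. `struct toy_request` and `toy_request_init()` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Minimal userspace mirror of struct rb_node: parent pointer and color share one word. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* A node is "empty" (in no tree) when its parent word points to itself. */
#define RB_EMPTY_NODE(node) ((node)->__rb_parent_color == (unsigned long)(node))
#define RB_CLEAR_NODE(node) ((node)->__rb_parent_color = (unsigned long)(node))

/* Hypothetical request structure embedding a tree node. */
struct toy_request {
	int tid;
	struct rb_node r_node;
};

void toy_request_init(struct toy_request *req, int tid)
{
	memset(req, 0, sizeof(*req));
	req->tid = tid;
	RB_CLEAR_NODE(&req->r_node);   /* makes a later RB_EMPTY_NODE() check valid */
}
```

Note that merely zeroed memory does not satisfy RB_EMPTY_NODE(), since a zero parent word never equals the node's own address; an explicit RB_CLEAR_NODE() is what makes the later check meaningful. The same reasoning applies to the event and osd nodes in the next two commits.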

libceph: init event->node in ceph_osdc_create_event() · 3ee5234d

Committed by Alex Elder on Dec 17, 2012

The red-black node in the ceph osd event structure is not
initialized in ceph_osdc_create_event().  Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: init osd->o_node in create_osd() · f407731d

Committed by Alex Elder on Dec 06, 2012

The red-black node in the ceph osd structure is not initialized
in create_osd().  Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that.  Add a call to RB_CLEAR_NODE() to
initialize it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: report connection fault with warning · 28362986

Committed by Alex Elder on Dec 14, 2012

When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically).  We
currently get an error message on the log whenever this occurs.

A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs.  This means that these error messages
will get repeatedly added to the log, which is undesirable.

Change the error message to be a warning, so they don't get
logged by default.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

18 Dec, 2012 · 2 commits

libceph: socket can close in any connection state · 7bb21d68

Committed by Alex Elder on Dec 07, 2012

A connection's socket can close for any reason, independent of the
state of the connection (and irrespective of the connection
mutex).  As a result, the connection can be in pretty much any state
at the time its socket is closed.

Handle those other cases at the top of con_work().  Pull this whole
block of code into a separate function to reduce the clutter.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

rbd: remove linger unconditionally · 61c74035

Committed by Alex Elder on Dec 06, 2012

In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer.  It should be done whether or not
the request currently has an osd.

This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

17 Dec, 2012 · 2 commits

libceph: avoid using freed osd in __kick_osd_requests() · 685a7555

Committed by Alex Elder on Dec 07, 2012

If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd().  That drops
a reference to the osd, and therefore the osd may have been freed
by the time __reset_osd() returns.  That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.

Change __reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset.  And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
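The calling convention described above can be sketched in userspace as follows. The `toy_osd` structure and both functions are hypothetical simplifications (a plain free() stands in for dropping the last reference in __remove_osd()); the point is only that the reset function reports "I freed it" through its return value, and the caller stops touching the pointer on any error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified osd. */
struct toy_osd {
	int nr_requests;
	int nr_linger;
	int kicked;      /* requests re-sent by the caller */
};

/* Frees an idle osd and says so with -ENODEV; otherwise "reopens" it. */
int toy_reset_osd(struct toy_osd *osd)
{
	if (osd->nr_requests == 0 && osd->nr_linger == 0) {
		free(osd);          /* last reference dropped: osd is gone */
		return -ENODEV;
	}
	/* ... the connection would be closed and reopened here ... */
	return 0;
}

/* Returns the number of requests kicked, or the error from toy_reset_osd(). */
int toy_kick_osd_requests(struct toy_osd *osd)
{
	int err = toy_reset_osd(osd);

	if (err)
		return err;         /* do not dereference osd: it may be freed */
	osd->kicked += osd->nr_requests;
	return osd->kicked;
}
```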

ceph: don't reference req after put · 7d5f2481

Committed by Alex Elder on Nov 29, 2012

In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line.  This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().

Fix this by reversing the order of these lines.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
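The corrected ordering can be illustrated with a tiny copy of the `<linux/list.h>` style circular list and a refcounted object. The `toy_req` type, its `r_osd_item` field and `toy_put_request()` are hypothetical; the sketch only shows the rule "unlink while you still hold a reference, then drop it".

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical refcounted request sitting on an osd's list. */
struct toy_req {
	int refs;
	struct list_head r_osd_item;
};

static void toy_put_request(struct toy_req *req)
{
	if (--req->refs == 0)
		free(req);
}

/* Fixed order: unlink first, then drop the reference; never touch req afterwards. */
void toy_unregister_request(struct toy_req *req)
{
	list_del_init(&req->r_osd_item);
	toy_put_request(req);
}
```

Swapping the two calls in toy_unregister_request() would make list_del_init() write through a pointer that toy_put_request() may already have freed.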

13 Dec, 2012 · 1 commit

libceph: remove 'osdtimeout' option · 83aff95e

Committed by Sage Weil on Nov 28, 2012

This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds.  The idea was that if the
OSD was buggy, the client could compensate by resending the request.

In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while.  Moreover, the userspace
client code never did this.

More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

01 Nov, 2012 · 1 commit

libceph: define ceph_pg_pool_name_by_id() · 72afc71f

Committed by Alex Elder on Oct 30, 2012

Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given.  This will be used by
the next patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

30 Oct, 2012 · 1 commit

libceph: fix osdmap decode error paths · 0ed7285e

Committed by Sage Weil on Oct 29, 2012

Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code.  (In particular,
osd_client.c handle_map() will BUG(!newmap)).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
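The pitfall is that ERR_PTR(0) is simply NULL, which IS_ERR() does not treat as an error. Below is a userspace sketch with local copies of the `<linux/err.h>` helpers; the two `toy_decode_*` functions are hypothetical decoders showing the buggy and fixed shapes of an error path, not the osdmap code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace copies of the <linux/err.h> idiom, for illustration only. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static int the_map;   /* hypothetical decoded map */

/* Buggy shape: the error path is taken while err is still 0. */
void *toy_decode_buggy(int bad_input)
{
	int err = 0;

	if (bad_input)
		goto bad;
	return &the_map;
bad:
	return ERR_PTR(err);    /* ERR_PTR(0) is just NULL */
}

/* Fixed shape: set err before every jump to the error label. */
void *toy_decode_fixed(int bad_input)
{
	int err = 0;

	if (bad_input) {
		err = -EINVAL;
		goto bad;
	}
	return &the_map;
bad:
	return ERR_PTR(err);
}
```

A caller that checks only IS_ERR() then receives NULL from the buggy version and treats it as a valid map, which matches the BUG(!newmap) symptom described above.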

27 Oct, 2012 · 1 commit

libceph: avoid NULL kref_put from NULL alloc_msg return · 7246240c

Committed by Sage Weil on Oct 25, 2012

The ceph_con_in_msg_alloc() method calls the ->alloc_msg() helper which
may return NULL. It also drops con->mutex while it allocates a message,
which means that the connection state may change (e.g., get closed). If
that happens, we clean up and bail out. Avoid calling ceph_msg_put() on
a NULL return value and triggering a crash.

This was observed when an ->alloc_msg() call races with a timeout that
resends a zillion messages and resets the connection, and ->alloc_msg()
returns NULL (because the request was resent to another target).

Fixes http://tracker.newdream.net/issues/3342
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

10 Oct, 2012 · 3 commits

rbd: define common queue_con_delay() · 802c6d96

Committed by Alex Elder on Oct 08, 2012

This patch defines a single function, queue_con_delay(), to call
queue_delayed_work() for a connection.  It basically generalizes
what was previously queue_con() by adding the delay argument.
queue_con() is now a simple helper that passes 0 for its delay.
queue_con_delay() returns 0 if it queued work or an errno if it
did not for some reason.

If con_work() finds the BACKOFF flag set for a connection, it now
calls queue_con_delay() to handle arranging to start again after a
delay.

Note about connection reference counts:  con_work() only ever gets
called as a work item function.  At the time that work is scheduled,
a reference to the connection is acquired, and the corresponding
con_work() call is then responsible for dropping that reference
before it returns.

Previously, the backoff handling inside con_work() silently handed
off its reference to delayed work it scheduled.  Now that
queue_con_delay() is used, a new reference is acquired for the
newly-scheduled work, and the original reference is dropped by the
con->ops->put() call at the end of the function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
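The reference-count rules above (one reference per queued work item, dropped if queueing fails) can be modeled in userspace. Everything here is a hypothetical stand-in: `toy_queue_delayed_work()` just fails when a flag says the work is already pending, and the error codes are chosen for illustration. The backoff step also re-sets the BACKOFF flag when queueing fails, as described in commit 588377d6 further down this page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection: a refcount, one work slot and a BACKOFF flag. */
struct toy_con {
	int refs;
	bool work_queued;
	bool backoff;
};

static bool toy_get(struct toy_con *con)
{
	if (con->refs == 0)
		return false;
	con->refs++;
	return true;
}

static void toy_put(struct toy_con *con) { con->refs--; }

/* Stand-in for queue_delayed_work(): fails if the work is already queued. */
static bool toy_queue_delayed_work(struct toy_con *con)
{
	if (con->work_queued)
		return false;
	con->work_queued = true;
	return true;
}

/* Takes a reference for the newly queued work; drops it again if queueing fails. */
int toy_queue_con_delay(struct toy_con *con)
{
	if (!toy_get(con))
		return -ENOENT;
	if (!toy_queue_delayed_work(con)) {
		toy_put(con);
		return -EBUSY;
	}
	return 0;
}

/* Backoff step of a work function entered holding exactly one reference. */
void toy_con_work_backoff(struct toy_con *con)
{
	if (con->backoff) {
		con->backoff = false;
		if (toy_queue_con_delay(con))
			con->backoff = true;   /* the already-queued run will back off */
	}
	toy_put(con);                      /* drop the reference held on entry, once */
}
```

In both paths the running work drops exactly one reference, and whichever work item is left queued owns exactly one.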

rbd: let con_work() handle backoff · 8618e30b

Committed by Alex Elder on Oct 08, 2012

Both ceph_fault() and con_work() include handling for imposing a
delay before doing further processing on a faulted connection.
The latter is used only if ceph_fault() is unable to queue the delayed work itself.

Instead, just let con_work() always be responsible for implementing
the delay.  After setting up the delay value, set the BACKOFF flag
on the connection unconditionally and call queue_con() to ensure
con_work() will get called to handle it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

rbd: reset BACKOFF if unable to re-queue · 588377d6

Committed by Alex Elder on Oct 08, 2012

If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.

In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired.  There are two
problems with this--one of which is a bug.

The first problem is simply that the intended behavior is to back
off, and if we aren't able to queue the work item to run after a delay
we're not doing that.

The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued.  In the messenger, this
means that con_work() is already scheduled to be run again.  So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.

The second problem--the bug--is a leak of a reference count.  If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work().  However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.

This patch fixes both problems.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

02 Oct, 2012 · 5 commits

ceph: propagate layout error on osd request creation · 6816282d

Committed by Sage Weil on Sep 24, 2012

If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller.  We switch up the return to have an error
code instead of NULL implying -ENOMEM.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

libceph: check for invalid mapping · d63b77f4

Committed by Sage Weil on Sep 24, 2012

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
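A sketch of the kind of guard this commit describes: striping arithmetic divides by layout fields, so a zeroed layout must be rejected before any division. The `toy_layout` fields (stripe unit, stripe count, object size) and the simplified offset-to-object arithmetic are assumptions for illustration, not the exact ceph mapping code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of a striping layout; zero fields make it invalid. */
struct toy_layout {
	uint32_t stripe_unit;    /* bytes per stripe unit */
	uint32_t stripe_count;   /* objects per stripe */
	uint32_t object_size;    /* bytes per object */
};

/* Maps a file offset to an object number, rejecting layouts that would divide by zero. */
int toy_file_to_object(const struct toy_layout *l, uint64_t off, uint64_t *objno)
{
	uint32_t su = l->stripe_unit, sc = l->stripe_count;

	if (!su || !sc || l->object_size < su)
		return -EINVAL;                      /* su_per_object would be 0 */

	uint32_t su_per_object = l->object_size / su;
	uint64_t blockno = off / su;                 /* which stripe unit */
	uint64_t stripeno = blockno / sc;            /* which stripe */
	uint64_t stripepos = blockno % sc;           /* position within the stripe */
	uint64_t objsetno = stripeno / su_per_object;

	*objno = objsetno * sc + stripepos;
	return 0;
}
```

For example, with a 4 KiB stripe unit, one object per stripe and 4 MiB objects, offset 8192 maps to object 0 and offset 4194304 to object 1.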

ceph: use list_move_tail instead of list_del/list_add_tail · cc4829e5

Committed by Wei Yongjun on Sep 05, 2012

Use list_move_tail() instead of list_del() + list_add_tail().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>
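For readers unfamiliar with the helper, list_move_tail() unlinks an entry from whatever list it is on and appends it to another. The simplified userspace copy below is written in the `<linux/list.h>` style for illustration; it is not the kernel header itself.

```c
#include <assert.h>

/* Simplified userspace copy of the <linux/list.h> circular list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Equivalent to unlinking the entry and then calling list_add_tail(). */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	__list_del(e->prev, e->next);
	list_add_tail(e, head);
}
```

Folding the two steps into one call removes the chance of forgetting one of them and states the intent ("move to the back of that list") directly.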

libceph: Fix sparse warning · 7698f2f5

Committed by Iulius Curt on Aug 23, 2012

Make ceph_monc_do_poolop() static to remove the following sparse warning:
 * net/ceph/mon_client.c:616:5: warning: symbol 'ceph_monc_do_poolop' was not
   declared. Should it be static?
Also drops the 'ceph_monc_' prefix, now being a private function.
Signed-off-by: Iulius Curt <icurt@ixiacom.com>
Signed-off-by: Sage Weil <sage@inktank.com>

libceph: remove unused monc->have_fsid · 290e3359

Committed by Sage Weil on Aug 17, 2012

This is unused; use monc->client->have_fsid.
Signed-off-by: Sage Weil <sage@inktank.com>

28 Sep, 2012 · 2 commits

inetpeer: fix token initialization · bc9259a8

Committed by Nicolas Dichtel on Sep 27, 2012

When jiffies wraps around (for example, 5 minutes after the boot, see
INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be
< XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus
some icmp packets can be unexpectedly dropped.

Fix this case by initializing rate_last to 60 seconds in the past.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
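The effect of the initial `rate_last` value can be reproduced with a userspace model of the token bucket. The sketch assumes HZ = 1000, a burst factor of 6 (XRLIM_BURST_FACTOR), a fresh peer with zero tokens, and an update rule modeled on inet_peer_xrlim_allow() as we understand it; `toy_peer` and `toy_xrlim_allow()` are hypothetical names. Unsigned subtraction makes `now - rate_last` wrap correctly.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000UL                 /* assumption for the sketch */
#define XRLIM_BURST_FACTOR 6UL

struct toy_peer {
	unsigned long rate_tokens;
	unsigned long rate_last;
};

/* Token-bucket check: credit elapsed jiffies, cap at the burst, spend one timeout. */
bool toy_xrlim_allow(struct toy_peer *peer, unsigned long now, unsigned long timeout)
{
	unsigned long token = peer->rate_tokens + (now - peer->rate_last);
	bool ok = false;

	peer->rate_last = now;
	if (token > XRLIM_BURST_FACTOR * timeout)
		token = XRLIM_BURST_FACTOR * timeout;
	if (token >= timeout) {
		token -= timeout;
		ok = true;
	}
	peer->rate_tokens = token;
	return ok;
}
```

Shortly after jiffies wraps to 0, a peer created with `rate_last = 0` has earned almost no tokens and its first ICMP packet is dropped, while one created with `rate_last = now - 60 * HZ` starts with a full burst.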

l2tp: fix return value check · 7f8436a1

Committed by Wei Yongjun on Sep 24, 2012

In case of error, the function genlmsg_put() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test.

The dpatch engine was used to auto-generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Sep, 2012 · 3 commits

netfilter: xt_limit: have r->cost != 0 case work · 82e6bfe2

Committed by Jan Engelhardt on Sep 21, 2012

Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.

Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matches occur due to the use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipv6: mip6: fix mip6_mh_filter() · 96af69ea

Committed by Eric Dumazet on Sep 25, 2012

mip6_mh_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.

Use skb_header_pointer() instead of pskb_may_pull().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: raw: fix icmpv6_filter() · 1b05c4b5

Committed by Eric Dumazet on Sep 25, 2012

icmpv6_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.

Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.

Also, if icmpv6 header cannot be found, do not deliver the packet,
as we do in IPv4.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Sep, 2012 · 1 commit

net: guard tcp_set_keepalive() to tcp sockets · 3e10986d

Committed by Eric Dumazet on Sep 24, 2012

It's possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()

Fix is to make sure socket is a SOCK_STREAM one.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Sep, 2012 · 2 commits

batman-adv: Fix symmetry check / route flapping in multi interface setups · 7caf69fb

Committed by Linus Lüssing on Sep 18, 2012

If we receive an OGM from a neighbor other than the currently selected
one, and it has the same TQ, then we are supposed to switch if this
neighbor provides a more symmetric link than the currently selected one.

However this symmetry check currently is broken if the interface of the
neighbor we received the OGM from and the one of the currently selected
neighbor differ: We are currently trying to determine the symmetry of the
link towards the selected router via the link we received the OGM from
instead of just checking via the link towards the currently selected
router.

This leads to way more route switches than necessary and can lead to
permanent route flapping in many common multi interface setups.

This patch fixes this issue by using the right interface for this
symmetry check.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>

batman-adv: Fix change mac address of soft iface. · 40a3eb33

Committed by Def on Sep 20, 2012

In the function interface_set_mac_addr(), tt_local_add() was invoked
before dev->dev_addr was updated, so the new MAC address was not
tagged as NoPurge.
Signed-off-by: Def <def@laposte.net>

23 Sep, 2012 · 1 commit

ipv4: raw: fix icmp_filter() · ab43ed8b

Committed by Eric Dumazet on Sep 22, 2012

icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.

Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Sep, 2012 · 3 commits

libceph: only kunmap kmapped pages · 5ce765a5

Committed by Alex Elder on Sep 21, 2012

In write_partial_msg_pages(), pages need to be kmapped in order to
perform a CRC-32c calculation on them.  As an artifact of the way
this code used to be structured, the kunmap() call was separated
from the kmap() call and both were done conditionally.  But the
conditions under which the kmap() and kunmap() calls were made
differed, so there was a chance a kunmap() call would be done on a
page that had not been mapped.

The symptom of this was tripping a BUG() in kunmap_high() when
pkmap_count[nr] became 0.
Reported-by: Bryan K. Wright <bryan@virginia.edu>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
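One simple way to guarantee that kmap() and kunmap() run under the same condition is to keep both inside the branch that needs the mapping. The userspace sketch below models that idea: `toy_page`, its `map_count` (standing in for a pkmap_count[] slot) and the byte sum (standing in for crc32c()) are hypothetical, and the assert marks where the kernel would trip a BUG() in kunmap_high().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page; map_count stands in for a pkmap_count[] slot. */
struct toy_page {
	int map_count;
	unsigned char data[16];
};

static void *toy_kmap(struct toy_page *p)
{
	p->map_count++;
	return p->data;
}

static void toy_kunmap(struct toy_page *p)
{
	assert(p->map_count > 0);   /* unbalanced unmap: the kernel BUG()s here */
	p->map_count--;
}

/* Map and unmap live in the same branch, so they can never disagree. */
uint32_t toy_send_page(struct toy_page *p, bool do_crc)
{
	uint32_t sum = 0;

	if (do_crc) {
		const unsigned char *kaddr = toy_kmap(p);
		for (unsigned i = 0; i < sizeof(p->data); i++)
			sum += kaddr[i];            /* stand-in for crc32c() */
		toy_kunmap(p);
	}
	/* ... the page would be sent here ... */
	return sum;
}
```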

net: change return values from -EACCES to -EPERM · bf5b30b8

Committed by Zhao Hongjiang on Sep 20, 2012

Change return value from -EACCES to -EPERM when the permission check fails.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix return value check in fib6_add() · f950c0ec

Committed by Wei Yongjun on Sep 20, 2012

In case of error, the function fib6_add_1() returns ERR_PTR()
or NULL pointer. The ERR_PTR() case check is missing in fib6_add().

The dpatch engine was used to generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
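A caller of a function that can fail with either NULL or an ERR_PTR() value has to test both, because IS_ERR(NULL) is false. That same fact is why the IS_ERR() test in the l2tp commit above never caught genlmsg_put() returning NULL. The sketch uses local userspace copies of the `<linux/err.h>` helpers; `toy_add_node()` is a hypothetical three-outcome helper in the spirit of fib6_add_1(), and mapping NULL to -ENOMEM is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace copies of the <linux/err.h> idiom, for illustration only. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical helper: 0 = success, 1 = NULL failure, 2 = ERR_PTR failure. */
void *toy_add_node(int mode)
{
	static int node;

	if (mode == 1)
		return NULL;
	if (mode == 2)
		return ERR_PTR(-EINVAL);
	return &node;
}

/* Caller that handles both failure encodings. */
int toy_add(int mode)
{
	void *fn = toy_add_node(mode);

	if (IS_ERR(fn))
		return (int)PTR_ERR(fn);
	if (!fn)
		return -ENOMEM;
	return 0;
}
```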

21 Sep, 2012 · 3 commits

net: do not disable sg for packets requiring no checksum · c0d680e5

Committed by Ed Cashin on Sep 19, 2012

A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE).  Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.

  commit f01a5236
  Author: Jesse Gross <jesse@nicira.com>
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming.  Calling it for a
protocol that requires no checksum is inappropriate.

The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets.  Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:

  http://www.spinics.net/lists/linux-mm/msg15184.html

The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:

  https://lkml.org/lkml/2012/8/28/140

The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
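The corrected decision can be expressed as: consult the checksum capability only when the packet actually needs a checksum. The sketch below models that with hypothetical feature bits (`F_SG`, `F_IP_CSUM`) and a two-value checksum state; it is an illustration of the logic described above, not the kernel's harmonize_features() or its netdev_features_t flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits and checksum states for the sketch. */
#define F_SG      0x1u   /* scatter-gather */
#define F_IP_CSUM 0x2u   /* NIC can checksum IPv4 */

enum toy_csum { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL };

static bool toy_can_checksum(unsigned features, bool is_ipv4)
{
	return is_ipv4 && (features & F_IP_CSUM);
}

/* Drop SG (and csum) only when a checksum is needed and the NIC cannot do it. */
unsigned toy_harmonize_features(unsigned features, enum toy_csum ip_summed, bool is_ipv4)
{
	if (ip_summed != TOY_CHECKSUM_NONE && !toy_can_checksum(features, is_ipv4))
		features &= ~(F_SG | F_IP_CSUM);
	return features;
}
```

A non-IP frame such as AoE, which carries no checksum request, keeps scatter-gather even on a NIC with no checksum offload, so the stack no longer has to linearize it.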

xfrm_user: don't copy esn replay window twice for new states · e3ac104d

Committed by Mathias Krause on Sep 19, 2012

The ESN replay window was already fully initialized in
xfrm_alloc_replay_state_esn(). No need to copy it again.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_user: ensure user supplied esn replay window is valid · ecd79187

Committed by Mathias Krause on Sep 20, 2012

The current code fails to ensure that the netlink message actually
contains as many bytes as the header indicates. If a user creates a new
state or updates an existing one but does not supply the bytes for the
whole ESN replay window, the kernel copies random heap bytes into the
replay bitmap, the ones that happen to follow the XFRMA_REPLAY_ESN_VAL
netlink attribute. This leads to the following issues:

1. The replay window has random bits set confusing the replay handling
   code later on.

2. A malicious user could use this flaw to leak up to ~3.5kB of heap
   memory when she has access to the XFRM netlink interface (requires
   CAP_NET_ADMIN).

Known users of the ESN replay window are strongSwan and Steffen's
iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
uses the interface with a bitmap supplied while the former does not.
strongSwan is therefore prone to run into issue 1.

To fix both issues without breaking existing userland, allow using the
XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
fully specified one. For the former case we initialize the in-kernel
bitmap with zero, for the latter we copy the user supplied bitmap. For
state updates the full bitmap must be supplied.

To prevent overflows in the bitmap length calculation the maximum size
of bmp_len is limited to 128 by this patch -- resulting in a maximum
replay window of 4096 packets. This should be sufficient for all real
life scenarios (RFC 4303 recommends a default replay window size of 64).

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Martin Willi <martin@revosec.ch>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
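The acceptance rule and the size cap described above can be sketched as a validation function. The structure layout is assumed to mirror the uapi `struct xfrm_replay_state_esn` (bmp_len in 32-bit words, five 32-bit counters, then the bitmap). `toy_verify_esn()` is a hypothetical name, and it covers only the new-state case; for state updates the commit additionally requires the full bitmap. The cap works out to 4096 packets / 32 bits per word = 128 words.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the layout of the uapi struct xfrm_replay_state_esn. */
struct toy_replay_esn {
	uint32_t bmp_len;        /* bitmap length in 32-bit words */
	uint32_t oseq, seq, oseq_hi, seq_hi, replay_window;
	uint32_t bmp[];          /* bmp_len words follow */
};

#define TOY_REPLAY_ESN_MAX 4096  /* packets; 4096 / 32 bits = 128 words */

/* Validate an attribute of attr_len bytes that claims to hold *rs (new-state case). */
int toy_verify_esn(const struct toy_replay_esn *rs, size_t attr_len)
{
	if (attr_len < sizeof(*rs))
		return -EINVAL;
	if (rs->bmp_len > TOY_REPLAY_ESN_MAX / (8 * sizeof(rs->bmp[0])))
		return -EINVAL;                  /* caps bmp_len at 128 */

	size_t full = sizeof(*rs) + rs->bmp_len * sizeof(rs->bmp[0]);

	/* Accept a header-only (empty) bitmap or a fully supplied one, nothing between. */
	if (attr_len != sizeof(*rs) && attr_len < full)
		return -EINVAL;
	return 0;
}
```

With the partial-bitmap case rejected, the kernel never copies heap bytes that merely happen to follow the attribute.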