Commits · af296bdb8da4d0a4284de10fc4a61497272ddf11 · openeuler / raspberrypi-kernel

23 Jun, 2014 · 8 commits

mac80211: move csa counters from sdata to beacon/presp · af296bdb

Committed by Michal Kazior on Jun 05, 2014

Having csa counters part of beacon and probe_resp
structures makes it easier to get rid of possible
races between setting a beacon and updating
counters on SMP systems by guaranteeing counters
are always consistent against given beacon struct.

While at it relax WARN_ON into WARN_ON_ONCE to
prevent spamming logs and racing.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[remove pointless array check]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: allow tx via monitor iface when DFS · b4932836

Committed by Janusz Dziedzic on Jun 05, 2014

Allow sending frames using a monitor interface
when a DFS chandef is in use and CAC has passed
(beaconing allowed).

This fixes a problem when an old kernel and new backports are used;
in such a case hostapd also creates/uses a monitor interface.
Before this patch, all frames hostapd sent using the monitor
iface were dropped when the AP was configured on a DFS channel.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: make ethtool the driver's responsibility · b7ffbd7e

Committed by Johannes Berg on Jun 04, 2014

Currently, cfg80211 tries to implement ethtool, but that doesn't
really scale well, with all the different operations. Make the
lower-level driver responsible for it, which currently only has
an effect on mac80211. It will similarly not scale well at that
level though, since mac80211 also has many drivers.

To cleanly implement this in mac80211, introduce a new file and
move some code to appropriate places.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove weak WEP IV accounting · ba9030c2

Committed by Johannes Berg on Jun 04, 2014

Since WEP is practically dead, there seems very little
point in keeping WEP weak IV accounting.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

trivial: net/mac80211/mesh.c: fix typo s/Substract/Subtract/ · b314c669

Committed by Antonio Ospite on Jun 04, 2014

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove ignore_plink_timer flag · 2b470c39

Committed by Bob Copeland on Jun 04, 2014

The mesh_plink code is doing some interesting things with the
ignore_plink_timer flag.  It seems the original intent was to
handle this race:

cpu 0                           cpu 1
-----                           -----
                                start timer handler for state X
acquire sta_lock
change state from X to Y
mod_timer() / del_timer()
release sta_lock
                                acquire sta_lock
                                execute state Y timer too soon

However, using the mod_timer()/del_timer() return values to
detect these cases is broken.  As a result, timers get ignored
unnecessarily, and stations can get stuck in the peering state
machine.

Instead, we can detect the case by looking at the timer expiration.
In the case of del_timer, just ignore the timers in the following
(LISTEN/ESTAB) states since they won't have timers anyway.
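
The expiration check described above can be sketched as a small user-space Python model. This is not the kernel code: `PlinkTimer`, `should_ignore_timer`, the tick values and the `OPN_SNT` example state are hypothetical stand-ins, and the real handler works on jiffies with the station lock held.

```python
from dataclasses import dataclass

# States that never arm a peer-link timer, per the commit message.
TIMERLESS_STATES = {"LISTEN", "ESTAB"}


@dataclass
class PlinkTimer:
    expires: int  # absolute time (in ticks) at which the timer should fire


def should_ignore_timer(now: int, state: str, timer: PlinkTimer) -> bool:
    """Return True if an already-running timer handler should do nothing.

    Models the check made after the handler has acquired the station lock.
    """
    # Another CPU re-armed the timer (the mod_timer() case) after this
    # handler started: the new deadline is still in the future, so this
    # expiry belongs to the previous state and is stale.
    if now < timer.expires:
        return True
    # Another CPU deleted the timer (the del_timer() case) while moving
    # to a state that has no timer at all.
    if state in TIMERLESS_STATES:
        return True
    return False
```

The point of the design is that the decision depends only on data visible under the lock (the deadline and the current state), not on the return value of the call that changed the timer.
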
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix station/driver powersave race · 5ac2e350

Committed by Johannes Berg on May 27, 2014

It is currently possible to have a race due to the station PS
unblock work like this:
 * station goes to sleep with frames buffered in the driver
 * driver blocks wakeup
 * station wakes up again
 * driver flushes/returns frames, and unblocks, which schedules
   the unblock work
 * unblock work starts to run, and checks that the station is
   awake (i.e. that the WLAN_STA_PS_STA flag isn't set)
 * we process a received frame with PM=1, setting the flag again
 * ieee80211_sta_ps_deliver_wakeup() runs, delivering all frames
   to the driver, and then clearing the WLAN_STA_PS_DRIVER and
   WLAN_STA_PS_STA flags

In this scenario, mac80211 will think that the station is awake,
while it really is asleep, and any TX'ed frames should be filtered
by the device (it will know that the station is sleeping) but then
passed to mac80211 again, which will not buffer it either as it
thinks the station is awake, and eventually the packets will be
dropped.

Fix this by moving the clearing of the flags to exactly where we
learn about the situation. This creates a problem of reordering,
so introduce another flag indicating that delivery is being done,
this new flag also queues frames and is cleared only while the
spinlock is held (which the queuing code also holds) so that any
concurrent delivery/TX is handled correctly.
Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove PID rate control · 20edb50e

Committed by John W. Linville on May 30, 2014

Minstrel has long since proven its worth.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

15 Jun, 2014 · 2 commits

net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1

Committed by Daniel Borkmann on Jun 15, 2014

Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
to jiffies conversions.") has silently changed permissions for
rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
this was to discourage users from tweaking rto_alpha and
rto_beta knobs in production environments since they are key
to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding
rto_alpha and rto_beta under rule C3 and C4:

  [...]
  C3)  When a new RTT measurement R' is made, set

       RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

       and

       SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

       Note: The value of SRTT used in the update to RTTVAR
       is its value before updating SRTT itself using the
       second assignment. After the computation, update
       RTO <- SRTT + 4 * RTTVAR.

  C4)  When data is in flight and when allowed by rule C5
       below, a new RTT measurement MUST be made each round
       trip. Furthermore, new RTT measurements SHOULD be
       made no more than once per round trip for a given
       destination transport address. There are two reasons
       for this recommendation: First, it appears that
       measuring more frequently often does not in practice
       yield any significant benefit [ALLMAN99]; second,
       if measurements are made more often, then the values
       of RTO.Alpha and RTO.Beta in rule C3 above should be
       adjusted so that SRTT and RTTVAR still adjust to
       changes at roughly the same rate (in terms of how many
       round trips it takes them to reflect new values) as
       they would if making only one measurement per
       round-trip and using RTO.Alpha and RTO.Beta as given
       in rule C3. However, the exact nature of these
       adjustments remains a research issue.
  [...]

While it is discouraged to adjust rto_alpha and rto_beta
and not further specified how to adjust them, the RFC also
doesn't explicitly forbid it, but rather gives a RECOMMENDED
default value (rto_alpha=3, rto_beta=2). We have a couple
of users relying on the old permissions before they got
changed. That said, if someone really has the urge to adjust
them, we could allow it with a warning in the log.
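
To make the role of the two knobs concrete, rule C3 quoted above can be written out as a short Python sketch. The function name `update_rto` is hypothetical, times are plain floats in milliseconds, and the clamping of RTO to its minimum and maximum is left out. The RFC's recommended values are RTO.Alpha = 1/8 and RTO.Beta = 1/4; the knob values 3 and 2 mentioned above appear to express these as power-of-two exponents (1/2^3 and 1/2^2).

```python
def update_rto(srtt: float, rttvar: float, r_new: float,
               alpha: float = 1 / 8, beta: float = 1 / 4):
    """Apply RFC 4960 rule C3 for one new RTT measurement r_new.

    Returns the updated (srtt, rttvar, rto) triple.
    """
    # RTTVAR is updated first and uses the SRTT value from *before*
    # this measurement, as the RFC note requires.
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - r_new)
    srtt = (1 - alpha) * srtt + alpha * r_new
    rto = srtt + 4 * rttvar
    return srtt, rttvar, rto


# One measurement of 200 ms against SRTT = 100 ms, RTTVAR = 50 ms.
print(update_rto(100.0, 50.0, 200.0))
```

A larger alpha or beta makes SRTT and RTTVAR follow new measurements faster, which is why changing them alters how quickly the retransmission timeout reacts.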

Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix save software checksum complete · 46fb51eb

Committed by Tom Herbert on Jun 14, 2014

Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead5
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.
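
Points 2 and 3 above can be illustrated with a user-space Python model of the Internet checksum. This is only a sketch: `csum_partial` here is a plain reimplementation of the 16-bit ones' complement sum, `validate` is a hypothetical helper, and the returned values merely play the roles that skb->csum and skb->csum_valid play in the commit message.

```python
def csum_partial(data: bytes, initial: int = 0) -> int:
    """16-bit ones' complement sum of data (folded, not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def validate(packet: bytes, pseudo_header: bytes):
    """Return (saved_csum, csum_valid) for a received packet."""
    # Sum over the whole packet, without the pseudo header.
    # This is the value that would be saved for later reuse.
    saved_csum = csum_partial(packet)
    # Add the pseudo header on top of the saved sum for validation.
    folded = csum_partial(pseudo_header, saved_csum)
    # A correct packet yields an inverted sum of zero.
    csum_valid = (~folded & 0xFFFF) == 0
    return saved_csum, csum_valid
```

Keeping the packet-only sum separate from the pseudo header is what lets a later verification step reuse the saved value instead of walking the payload again.
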
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Jun, 2014 · 1 commit

udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup · 63c6f81c

Committed by Eric Dumazet on Jun 12, 2014

It's too easy to add thousands of UDP sockets on a particular bucket,
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization, we should avoid spending
too much time in it.

It is interesting to note __udp4_lib_demux_lookup() only tries to
match first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to secondary hash.
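
The idea of bounding the work done in early demux can be sketched in a few lines of Python. This is a simplified model, not the kernel function: `mcast_early_demux` and `matches` are hypothetical names, the hash chain is a plain list, and how a match is chosen is deliberately reduced to "first match wins".

```python
MAX_EARLY_DEMUX_CHAIN = 10  # the threshold cited in the commit message


def mcast_early_demux(chain, matches):
    """Return a matching socket from chain, or None to skip early demux.

    Returning None is always safe here: the regular lookup path runs
    later and finds the socket anyway.
    """
    # Early demux is only an optimization, so refuse to walk long chains.
    if len(chain) > MAX_EARLY_DEMUX_CHAIN:
        return None
    for sk in chain:
        if matches(sk):
            return sk
    return None
```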

Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Held <drheld@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Jun, 2014 · 5 commits

rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · e5eca6d4

Committed by Michal Schmidt on May 28, 2014

When running RHEL6 userspace on a current upstream kernel, "ip link"
fails to show VF information.

The reason is a kernel<->userspace API change introduced by commit
88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
in the netlink request.

iproute2 adjusted for the API change in its commit 63338dca4513
("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").

The problem has been noticed before:
http://marc.info/?l=linux-netdev&m=136692296022182&w=2
(Subject: Re: getting VF link info seems to be broken in 3.9-rc8)

We can do better than tell those with old userspace to upgrade. We can
recognize the old iproute2 in the kernel by checking the netlink message
length. Even when including the IFLA_EXT_MASK attribute, its netlink
message is shorter than struct ifinfomsg.

With this patch "ip link" shows VF information in both old and new
iproute2 versions.
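
The length check described above can be illustrated with a small Python calculation. This is a sketch of the idea, not the kernel code: `pick_header_len` and `request_len` are hypothetical helpers, and the structure sizes are stated here as assumptions about the usual Linux netlink layout (16-byte nlmsghdr, 1-byte rtgenmsg, 16-byte ifinfomsg, 4-byte attribute header, 4-byte alignment).

```python
NLMSG_HDRLEN = 16      # assumed sizeof(struct nlmsghdr)
SIZEOF_RTGENMSG = 1    # assumed sizeof(struct rtgenmsg)
SIZEOF_IFINFOMSG = 16  # assumed sizeof(struct ifinfomsg)
NLA_HDRLEN = 4         # assumed netlink attribute header size


def align4(n: int) -> int:
    """Round n up to the 4-byte netlink alignment."""
    return (n + 3) & ~3


def request_len(family_hdr_len: int, attr_payload_len: int) -> int:
    """Total length of a request carrying one attribute."""
    return (NLMSG_HDRLEN + align4(family_hdr_len)
            + align4(NLA_HDRLEN + attr_payload_len))


def pick_header_len(nlmsg_len: int) -> int:
    """Guess which family header the sender used from the message size."""
    payload_len = nlmsg_len - NLMSG_HDRLEN
    # Even with the 4-byte IFLA_EXT_MASK attribute, an old request is
    # shorter than a bare struct ifinfomsg.
    if payload_len < SIZEOF_IFINFOMSG:
        return SIZEOF_RTGENMSG
    return SIZEOF_IFINFOMSG


old_request = request_len(SIZEOF_RTGENMSG, 4)
new_request = request_len(SIZEOF_IFINFOMSG, 4)
print(old_request, pick_header_len(old_request))
print(new_request, pick_header_len(new_request))
```
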
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fixing TLP's FIN recovery · bef1909e

Committed by Per Hurtig on Jun 12, 2014

Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.
Signed-off-by: Per Hurtig <per.hurtig@kau.se>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix compile error when compiling without IPv6 support · 3993c4e1

Committed by Linus Lüssing on Jun 12, 2014

Some fields in "struct net_bridge" aren't available when compiling the
kernel without IPv6 support. Therefore adding a check/macro to skip the
complaining code sections in that case.

Introduced by 2cd41431
("bridge: memorize and export selected IGMP/MLD querier port")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix smatch warning / potential null pointer dereference · 6c03ee8b

Committed by Linus Lüssing on Jun 12, 2014

"New smatch warnings:
  net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error:
    we previously assumed 'group' could be null (see line 1349)"

In the rare (sort of broken) case of a query having a Maximum
Response Delay of zero, we could create a potential null pointer
dereference.

Fixing this by skipping the multicast specific MLD Query parsing again
if no multicast group address is available.

Introduced by dc4eb53a
("bridge: adhere to querier election mechanism specified by RFCs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Fix sk_ack_backlog wrap-around problem · d3217b15

Committed by Xufeng Zhang on Jun 12, 2014

Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks,
a new association is created in sctp_unpack_cookie(). If some later
processing fails, sctp_association_free() is called to free the
previously allocated association, and in sctp_association_free() the
sk_ack_backlog value is decremented for this socket. Since the initial
value of sk_ack_backlog is 0, the decrement wraps it around to 65535.
If we then try to establish new associations on the same socket, ABORT
is triggered because sctp deems the accept queue full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.
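
The wrap-around itself is ordinary unsigned 16-bit arithmetic and can be reproduced in Python. The helper `free_association` and its `fixed` switch are hypothetical, introduced only to contrast the behavior before and after the change described above; the 16-bit width is inferred from the 65535 value in the message.

```python
U16_MASK = 0xFFFF  # the counter is modeled as an unsigned 16-bit value


def free_association(backlog: int, on_endpoint_list: bool,
                     fixed: bool) -> int:
    """Return the new backlog value after freeing one association."""
    # With the fix, associations that never made it onto the endpoint's
    # list do not touch the counter.
    if fixed and not on_endpoint_list:
        return backlog
    return (backlog - 1) & U16_MASK


print(free_association(0, on_endpoint_list=False, fixed=False))  # 65535
print(free_association(0, on_endpoint_list=False, fixed=True))   # 0
```
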
Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jun, 2014 · 20 commits

net/core: Add VF link state control policy · c5b46160

Committed by Doug Ledford on Jun 11, 2014

Commit 1d8faf48 (net/core: Add VF link state control) added VF link state
control to the netlink VF nested structure, but failed to add a proper entry
for the new structure into the VF policy table.  Add the missing entry so
the table and the actual data copied into the netlink nested struct are in
sync.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: drr: warn when qdisc is not work conserving · 6e765a00

Committed by Florian Westphal on Jun 11, 2014

The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.

So, warn users that this configuration won't work as expected; we
already do this in a couple of other qdiscs, see e.g.

commit b00355db
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')

The 'const' change is needed to avoid compiler warning ("discards 'const'
qualifier from pointer target type").

tested with:
drr_hier() {
        parent=$1
        classes=$2
        for i in  $(seq 1 $classes); do
                classid=$parent$(printf %x $i)
                tc class add dev eth0 parent $parent classid $classid drr
                tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
        done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add skb_gro_postpull_rcsum to udp and vxlan · 6bae1d4c

Committed by Tom Herbert on Jun 10, 2014

We need to call skb_gro_postpull_rcsum for GRO to work with checksum complete.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Save software checksum complete · 7e3cead5

Committed by Tom Herbert on Jun 10, 2014

In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, added csum_complete_sw flag to distinguish between software and
hardware generated checksum complete, we should always be able to trust
the software computation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceph: remove bogus extern · f6479449

Committed by stephen hemminger on Jun 10, 2014

Sparse complained about this bogus extern on definition of
a function.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fix a race in ip4_datagram_release_cb() · 9709674e

Committed by Eric Dumazet on Jun 10, 2014

Alexey gave an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk->sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [<     inlined    >] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
<4>[196727.311203] general protection fault: 0000 [#1] SMP
<4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
<4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
<4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
<4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
<4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
<4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
<4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
<4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
<4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
<4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
<4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
<4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[196727.311713] Stack:
<4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
<4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
<4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
<4>[196727.311885] Call Trace:
<4>[196727.311907]  <IRQ>
<4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
<4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
<4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
<4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
<4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
<4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
<4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
<4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
<4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
<4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
<4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
<4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
<4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
<4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
<4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
<4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
<4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
<4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
<4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
<4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
<4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
<4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
<4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
<4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
<4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
<4>[196727.312722]  <EOI>
<1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.313100]  RSP <ffff885effd23a70>
<4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
<0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
Reported-by: Alexey Preobrazhensky <preobr@google.com>
Reported-by: dormando <dormando@rydia.ne>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8141ed9f ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d

Committed by Octavian Purdila on Jun 12, 2014

There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Support 802.1ad vlan filtering · 204177f3

Committed by Toshiaki Makita on Jun 10, 2014

This enables us to change the vlan protocol for vlan filtering,
so that frames can be filtered on the basis of 802.1ad vlan tags
through a bridge.

This also changes br->group_addr if it has not been set by user.
This is needed for an 802.1ad bridge.
(See IEEE 802.1Q-2011 8.13.5.)

Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
bridge can forward the Nearest Customer Bridge group addresses except
for br->group_addr, which should be passed to higher layer.

To change the vlan protocol, write a protocol in sysfs:
# echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Prepare for forwarding another bridge group addresses · f2808d22

Committed by Toshiaki Makita on Jun 10, 2014

If a bridge is an 802.1ad bridge, it must forward other bridge group
addresses (the Nearest Customer Bridge group addresses).
(For details, see IEEE 802.1Q-2011 8.6.3.)

As the user might not want group_fwd_mask to be modified by enabling 802.1ad,
introduce a new mask, group_fwd_mask_required, which indicates addresses
the bridge wants to forward. This will be set by enabling 802.1ad.
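
How the two masks might combine can be sketched as follows. This is a user-space Python model, not the bridge code: the function name `should_forward` is hypothetical, and the convention that bit X of a mask covers the link-local group address 01-80-C2-00-00-0X is an assumption of the sketch.

```python
def should_forward(dest_last_octet: int,
                   group_fwd_mask: int,
                   group_fwd_mask_required: int) -> bool:
    """Decide whether a frame sent to 01-80-C2-00-00-0X is forwarded."""
    # The user-configured mask is left untouched; the required mask is
    # only OR-ed in when the forwarding decision is made.
    effective = group_fwd_mask | group_fwd_mask_required
    return bool(effective & (1 << dest_last_octet))


# The user mask is empty, but the required mask asks for address ...-00.
print(should_forward(0x00, 0x0000, 0x0001))
```

Keeping the required bits in a separate field means that turning 802.1ad off again can clear them without losing whatever the user had configured in group_fwd_mask.
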
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Prepare for 802.1ad vlan filtering support · 8580e211

Committed by Toshiaki Makita on Jun 10, 2014

This enables a bridge to have vlan protocol information and allows vlan
tag manipulation (retrieve, insert and remove tags) according to the vlan
protocol.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Add 802.1ad tx vlan acceleration · 1c5abb6c

Committed by Toshiaki Makita on Jun 10, 2014

The bridge device doesn't need to embed the S-tag into skb->data.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: fix warning on 32-bit arch · 61f83d0d

Committed by Alexei Starovoitov on Jun 11, 2014

Fix compiler warnings on 32-bit architectures:

net/core/filter.c: In function '__sk_run_filter':
net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix potential bug in function tipc_backlog_rcv · 02c00c2a

Committed by Jon Paul Maloy on Jun 09, 2014

In commit 4f4482dc ("tipc: compensate for double accounting in socket
rcv buffer") we access 'truesize' of a received buffer after it might
have been released by the function filter_rcv().

In this commit we correct this by reading the value of 'truesize' to
the stack before delivering the buffer to filter_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: fix incorrect type in gfp initializer · 9b87d465

Committed by Daniel Borkmann on Jun 11, 2014

This fixes the following sparse warning:

net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload
net/sctp/associola.c:1556:29: got restricted gfp_t
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: improve sctp_select_active_and_retran_path selection · a7288c4d

Committed by Daniel Borkmann on Jun 11, 2014

In function sctp_select_active_and_retran_path(), we walk the
transport list in order to look for the two most recently used
ACTIVE transports (trans_pri, trans_sec). In case we didn't find
anything ACTIVE, we currently just camp on a possibly PF or
INACTIVE transport that is primary path; this behavior actually
dates back to linux-history tree of the very early days of
lksctp, and can yield a behavior that chooses suboptimal
transport paths.

Instead, be a bit more clever by reusing and extending the
recently introduced sctp_trans_elect_best() handler. In case
both transports are evaluated to have the same score resulting
from their states, break the tie by looking at: 1) transport
path error count 2) last_time_heard value from each transport.

This is analogous to Nishida's Quick Failover draft [1],
section 5.1, 3:

  The sender SHOULD avoid data transmission to PF destinations.
  When all destinations are in either PF or Inactive state,
  the sender MAY either move the destination from PF to active
  state (and transmit data to the active destination) or the
  sender MAY transmit data to a PF destination. In the former
  scenario, (i) the sender MUST NOT notify the ULP about the
  state transition, and (ii) MUST NOT clear the destination's
  error counter. It is recommended that the sender picks the
  PF destination with least error count (fewest consecutive
  timeouts) for data transmission. In case of a tie (multiple PF
  destinations with same error count), the sender MAY choose the
  last active destination.

Thus for sctp_select_active_and_retran_path(), we keep track of
the best, if any, transport that is in PF state and in case no
ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
we select the best out of the three: current primary_path and
retran_path as well as a possible PF transport.
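
The selection order described above (state score first, then error count, then last_time_heard) can be sketched in Python. This is an illustrative model rather than the SCTP code: the `Transport` class, the function names and the numeric state scores are assumptions made for the example; only the order of the tie-breakers comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ranking for illustration: a higher score is a better state.
STATE_SCORE = {"ACTIVE": 3, "UNKNOWN": 2, "PF": 1, "INACTIVE": 0}


@dataclass
class Transport:
    name: str
    state: str
    error_count: int
    last_time_heard: int  # timestamp, larger means more recent


def elect_best(curr: Transport, best: Optional[Transport]) -> Transport:
    """Return the better of two transports."""
    if best is None:
        return curr
    cs, bs = STATE_SCORE[curr.state], STATE_SCORE[best.state]
    if cs != bs:
        return curr if cs > bs else best
    # Tie on state: prefer the path with fewer errors.
    if curr.error_count != best.error_count:
        return curr if curr.error_count < best.error_count else best
    # Still tied: prefer the path heard from most recently.
    return curr if curr.last_time_heard > best.last_time_heard else best


def select_fallback(primary, retran, best_pf):
    """Pick the best of the three candidates when nothing is ACTIVE."""
    best = None
    for cand in (primary, retran, best_pf):
        if cand is not None:
            best = elect_best(cand, best)
    return best
```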

The secondary may still camp on the original primary_path as
before. The change in sctp_trans_elect_best() with a more fine
grained tie selection also improves at the same time path selection
for sctp_assoc_update_retran_path() in case of non-ACTIVE states.

  [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: migrate most recently used transport to ktime · e575235f

Committed by Daniel Borkmann on Jun 11, 2014

Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e575235f

net: sctp: refactor active path selection · b82e8f31

Committed by Daniel Borkmann on Jun 11, 2014

This patch just refactors and moves the code for the active
path selection into its own helper function outside of
sctp_assoc_control_transport() which is already big enough.
No functional changes here.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b82e8f31

ktime: add ktime_after and ktime_before helper · 67cb9366

Committed by Daniel Borkmann on Jun 11, 2014

Add two minimal helper functions analogous to time_before() and
time_after() that will later on both be needed by SCTP code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
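
As a rough illustration of what such before/after helpers look like, here is a userspace sketch. It models the timestamp as a plain signed 64-bit nanosecond count, which is an assumption of this example; the `ktime_sketch_*` names are hypothetical and chosen so they are not mistaken for the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of a timestamp: signed 64-bit nanoseconds. */
typedef int64_t ktime_sketch_t;

/* Returns <0, 0 or >0 when a is earlier than, equal to or later than b. */
static int ktime_sketch_compare(ktime_sketch_t a, ktime_sketch_t b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* True when a is strictly later than b. */
static bool ktime_sketch_after(ktime_sketch_t a, ktime_sketch_t b)
{
    return ktime_sketch_compare(a, b) > 0;
}

/* True when a is strictly earlier than b. */
static bool ktime_sketch_before(ktime_sketch_t a, ktime_sketch_t b)
{
    return ktime_sketch_compare(a, b) < 0;
}
```

Both helpers are strict: for equal timestamps neither "after" nor "before" holds.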

67cb9366

mac802154: don't deliver packets to devices that are down · 2d3b5b0a

Committed by Phoebe Buckheister on Jun 11, 2014

Only one WPAN device can be active at any given time, so only deliver
packets to that one interface that is actually up. Multiple monitors may
be up at any given time, but we don't have to deliver to monitors that
are down either.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d3b5b0a

mac802154: properly free incoming skbs on decryption failure · a374eeb5

Committed by Phoebe Buckheister on Jun 11, 2014

mac802154 RX did not free skbs on decryption failure, assuming that the
caller would when the local rx handler returned _DROP. This was false.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
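
The ownership rule behind this fix can be modeled in userspace: when the receive handler reports a drop, the handler itself must release the buffer, because the caller does not. Everything in the sketch below (`struct frame`, `frame_alloc()`, `frame_free()`, `rx_handle()`, the `frames_alive` counter) is hypothetical and only stands in for the skb handling the message describes.

```c
#include <stdbool.h>
#include <stdlib.h>

enum rx_verdict { RX_SUCCESS, RX_DROP };

/* Hypothetical stand-in for an skb; only what the sketch needs. */
struct frame {
    bool decrypt_ok;  /* pretend outcome of the decryption step */
};

static int frames_alive;  /* counts unreleased frames to expose leaks */

static struct frame *frame_alloc(bool decrypt_ok)
{
    struct frame *f = malloc(sizeof(*f));

    if (f) {
        f->decrypt_ok = decrypt_ok;
        frames_alive++;
    }
    return f;
}

static void frame_free(struct frame *f)
{
    if (f) {
        free(f);
        frames_alive--;
    }
}

/* Takes ownership of f on every path, including the failure path. */
static enum rx_verdict rx_handle(struct frame *f)
{
    if (!f->decrypt_ok) {
        frame_free(f);  /* the caller will not free it for us */
        return RX_DROP;
    }
    /* Delivery to the upper layer would consume the frame here. */
    frame_free(f);
    return RX_SUCCESS;
}
```

With the `frame_free()` call on the failure path removed, `frames_alive` would stay above zero after a dropped frame, which is the kind of leak the message describes.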

a374eeb5

11 Jun, 2014 4 commits

net: fix UDP tunnel GSO of frag_list GRO packets · 5882a07c

Committed by Wei-Chun Chao on Jun 08, 2014

This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.

During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.

Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.

kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
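
Why the order of the push and the protocol lookup matters can be shown with a toy model of a packet buffer. The sketch assumes a standard 14-byte Ethernet header with the EtherType at byte offset 12; `struct pkt_view`, `read_ethertype()` and `view_push()` are hypothetical names and do not correspond to kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH 14  /* length of an Ethernet header in bytes */

/* Hypothetical, read-only view of a frame with a movable data offset. */
struct pkt_view {
    const uint8_t *buf;  /* start of the frame (MAC header) */
    size_t len;          /* total bytes available in buf */
    size_t data_off;     /* where the "data" pointer currently sits */
};

/* Reads the EtherType, assuming data_off points at the MAC header.
 * Returns 0 when fewer than ETH_HLEN_SKETCH bytes are available. */
static uint16_t read_ethertype(const struct pkt_view *p)
{
    const uint8_t *mac;

    if (p->data_off + ETH_HLEN_SKETCH > p->len)
        return 0;
    mac = p->buf + p->data_off;
    return (uint16_t)((mac[12] << 8) | mac[13]);
}

/* Moves the data offset back by n bytes, exposing earlier headers. */
static void view_push(struct pkt_view *p, size_t n)
{
    if (n <= p->data_off)
        p->data_off -= n;
}
```

If `data_off` still points past the MAC header, `read_ethertype()` interprets payload bytes as a header and returns a meaningless value. Calling `view_push()` first lines the offset up with the MAC header, so the lookup reads the real EtherType.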

5882a07c

net: ipv6: Fixed up ipsec packet be re-routing issue · f6c20c59

Committed by huizhang on Jun 09, 2014

Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781

When a locally generated ipsec packet matches a mangle table rule
and has a mark value set, the packet is routed again in
route_me_harder -> _session_decoder6

In this case, the nhoff in the CB of the skb is still the default
value 0, so the protocol match cannot succeed and the packet cannot
match the correct SA rule. The packet is then sent out in plaintext.

To fix the issue, CB->nhoff must be set.
Signed-off-by: Hui Zhang <huizhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6c20c59

ip_tunnel: fix i_key matching in ip_tunnel_find · 5ce54af1

Committed by Dmitry Popov on Jun 08, 2014

Some tunnels (only vti for now) can use i_key purely for internal use:
for example vti uses it for fwmark'ing incoming packets. So raw i_key value
shouldn't be treated as a distinguisher for them. ip_tunnel_key_match exists for
cases when we want to compare two ip_tunnel_parms' i_keys.

Example bug:
ip link add type vti ikey 1 local 1.0.0.1 remote 2.0.0.2
ip link add type vti ikey 2 local 1.0.0.1 remote 2.0.0.2
spawned two tunnels, although it doesn't make sense.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
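
A flag-aware key comparison of the kind this message refers to can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's definition: the bit value `SKETCH_TUNNEL_KEY`, the struct `tunnel_parm_sketch` and the function `key_match_sketch()` are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TUNNEL_KEY 0x0004u  /* placeholder bit for "key in use" */

/* Hypothetical, reduced version of a tunnel parameter block. */
struct tunnel_parm_sketch {
    uint16_t i_flags;
    uint32_t i_key;
};

/* The raw i_key only distinguishes tunnels when the key flag is set.
 * Without the flag, i_key is treated as private data and ignored. */
static bool key_match_sketch(const struct tunnel_parm_sketch *p,
                             uint16_t flags, uint32_t key)
{
    if (p->i_flags & SKETCH_TUNNEL_KEY)
        return (flags & SKETCH_TUNNEL_KEY) && key == p->i_key;
    return !(flags & SKETCH_TUNNEL_KEY);
}
```

In this model, an existing vti-like entry without the key flag and i_key 1 still matches a lookup carrying i_key 2. The second `ip link add` in the example bug would therefore find the existing tunnel instead of creating another one.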

5ce54af1

ip_vti: Fix 'ip tunnel add' with 'key' parameters · 7c8e6b9c

Committed by Dmitry Popov on Jun 08, 2014

ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2
translates to p->iflags = VTI_ISVTI|GRE_KEY and p->i_key = 1, but GRE_KEY !=
TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key)
making us unable to create vti tunnels with [io]key via ip tunnel.

We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because
vti_tunnels with same local/remote addresses but different ikeys will be treated
as different then. So, imo the best option here is to move p->i_flags & *_KEY
check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark
field for ip_tunnel_parm in the future.
Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

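
The flag mismatch described above, and the effect of moving the check into the vti code, can be modeled with a few lines of C. The bit values and the names `parm_sketch`, `clear_unflagged_keys()`, `generic_ioctl_sketch()` and `vti_ioctl_sketch()` are placeholders invented for this sketch; only the idea (two different flag bits, and a check that tests the wrong one) comes from the message.

```c
#include <stdint.h>

/* Placeholder bit values; what matters is that they differ. */
#define SK_ISVTI      0x0001u
#define SK_TUNNEL_KEY 0x0004u
#define SK_GRE_KEY    0x2000u

/* Hypothetical, reduced tunnel parameter block. */
struct parm_sketch {
    uint16_t i_flags, o_flags;
    uint32_t i_key, o_key;
};

/* Zero each key whose flag word lacks the given key bit. */
static void clear_unflagged_keys(struct parm_sketch *p, uint16_t key_flag)
{
    if (!(p->i_flags & key_flag))
        p->i_key = 0;
    if (!(p->o_flags & key_flag))
        p->o_key = 0;
}

/* Generic path: skips the TUNNEL_KEY check for vti parameters. */
static void generic_ioctl_sketch(struct parm_sketch *p)
{
    if (!(p->i_flags & SK_ISVTI))
        clear_unflagged_keys(p, SK_TUNNEL_KEY);
}

/* vti path: tests the flag userspace actually sets, then goes generic. */
static void vti_ioctl_sketch(struct parm_sketch *p)
{
    clear_unflagged_keys(p, SK_GRE_KEY);
    generic_ioctl_sketch(p);
}
```

Calling `clear_unflagged_keys()` with `SK_TUNNEL_KEY` on parameters that carry only `SK_GRE_KEY` zeroes both keys, which reproduces the reported behavior. Routing the same parameters through `vti_ioctl_sketch()` keeps ikey and okey intact.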
7c8e6b9c