提交 · f89b7755f517cdbb755d7543eef986ee9d54e654 · openanolis / cloud-kernel

28 10月, 2014 6 次提交

由 Alexei Starovoitov 提交于 10月 23, 2014

introduce two configs:
- hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
  depend on
- visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

that solves several problems:
- tracing and others that wish to use eBPF don't need to depend on NET.
  They can use BPF_SYSCALL to allow loading from userspace or select BPF
  to use it directly from kernel in NET-less configs.
- in 3.18 programs cannot be attached to events yet, so don't force it on
- when the rest of eBPF infra is there in 3.19+, it's still useful to
  switch it off to minimize kernel size

bloat-o-meter on x64 shows:
add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

tested with many different config combinations. Hopefully didn't miss anything.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f89b7755

Merge branch 'cxgb4-net' · 8ae3c911

由 David S. Miller 提交于 10月 27, 2014

Anish Bhatt says:

====================
cxgb4 : DCBx fixes for apps/host lldp agents

This patchset  contains some minor fixes for cxgb4 DCBx code. Chiefly, cxgb4
was not cleaning up any apps added to kernel app table when link was lost.
Disabling DCBx in firmware would automatically set DCBx state to host-managed
and enabled, we now wait for an explicit enable call from an lldp agent instead

First patch was originally sent to net-next, but considering it applies to
correcting behaviour of code already in net, I think it qualifies as a bug fix.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ae3c911

cxgb4 : Handle dcb enable correctly · 3bb06261

由 Anish Bhatt 提交于 10月 23, 2014

Disabling DCBx in firmware automatically enables DCBx for control via host
lldp agents. Wait for an explicit setstate call from an lldp agents to enable
 DCBx instead.

Fixes: 76bcb31e ("cxgb4 : Add DCBx support codebase and dcbnl_ops")
Signed-off-by: NAnish Bhatt <anish@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bb06261

cxgb4 : Improve handling of DCB negotiation or loss thereof · 2376c879

由 Anish Bhatt 提交于 10月 23, 2014

Clear out any DCB apps we might have added to kernel table when we lose DCB
sync (or IEEE equivalent event). These were previously left behind and not
cleaned up correctly. IEEE allows individual components to work independently,
so improve check for IEEE completion by specifying individual components.

Fixes: 10b00466 ("cxgb4: IEEE fixes for DCBx state machine")
Signed-off-by: NAnish Bhatt <anish@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2376c879

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 5d26b1f5

由 David S. Miller 提交于 10月 27, 2014

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Allow to recycle a TCP port in conntrack when the change role from
   server to client, from Marcelo Leitner.

2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch
   from Dan Carpenter.

3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables
   chain statistic updates. From Sabrina Dubroca.

4) Don't compile ip options in bridge netfilter, this mangles the packet
   and bridge should not alter layer >= 3 headers when forwarding packets.
   Patch from Herbert Xu and tested by Florian Westphal.

5) Account the final NLMSG_DONE message when calculating the size of the
   nflog netlink batches. Patch from Florian Westphal.

6) Fix a possible netlink attribute length overflow with large packets.
   Again from Florian Westphal.

7) Release the skbuff if nfnetlink_log fails to put the final
   NLMSG_DONE message. This fixes a leak on error. This shouldn't ever
   happen though, otherwise this means we miscalculate the netlink batch
   size, so spot a warning if this ever happens so we can track down the
   problem. This patch from Houcheng Lin.

8) Look at the right list when recycling targets in the nft_compat,
   patch from Arturo Borrero.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d26b1f5

netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() · 7965ee93

由 Arturo Borrero 提交于 10月 26, 2014

The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.
Signed-off-by: NArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7965ee93

27 10月, 2014 4 次提交

net: napi_reuse_skb() should check pfmemalloc · 93a35f59

由 Eric Dumazet 提交于 10月 23, 2014

Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NRoman Gushchin <klamm@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93a35f59

Merge branch 'mellanox' · aa9c5579

由 David S. Miller 提交于 10月 26, 2014

Eli Cohen says:

====================
irq sync fixes

This two patch series fixes a race where an interrupt handler could access a
freed memory.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa9c5579

net/mlx4_core: Call synchronize_irq() before freeing EQ buffer · bf1bac5b

由 Eli Cohen 提交于 10月 23, 2014

After moving the EQ ownership to software effectively destroying it, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf1bac5b

net/mlx5_core: Call synchronize_irq() before freeing EQ buffer · 96e4be06

由 Eli Cohen 提交于 10月 23, 2014

After destroying the EQ, the object responsible for generating interrupts, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer. This patch solves a very
rare case when we get panic on driver unload.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96e4be06

26 10月, 2014 8 次提交

drivers: net: xgene: Rewrite buggy loop in xgene_enet_ecc_init() · b71e821d

由 Geert Uytterhoeven 提交于 10月 23, 2014

drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c: In function ‘xgene_enet_ecc_init’:
drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:126: warning: ‘data’ may be used uninitialized in this function

Depending on the arbitrary value on the stack, the loop may terminate
too early, and cause a bogus -ENODEV failure.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b71e821d

i40e: _MASK vs _SHIFT typo in i40e_handle_mdd_event() · 013f6579

由 Dan Carpenter 提交于 10月 22, 2014

We accidentally mask by the _SHIFT variable.  It means that "event" is
always zero.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Tested-by: NJim Young <jamesx.m.young@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

013f6579

macvlan: fix a race on port dismantle and possible skb leaks · fe0ca732

由 Eric Dumazet 提交于 10月 22, 2014

We need to cancel the work queue after rcu grace period,
otherwise it can be rescheduled by incoming packets.

We need to purge queue if some skbs are still in it.

We can use __skb_queue_head_init() variant in
macvlan_process_broadcast()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe0ca732

tcp: md5: do not use alloc_percpu() · 349ce993

由 Eric Dumazet 提交于 10月 23, 2014

percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().

-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().

Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.

Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 765cf997 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: NCrestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

349ce993

Merge branch 'xen-netback' · 4cc40af0

由 David S. Miller 提交于 10月 25, 2014

David Vrabel says:

====================
xen-netback: guest Rx queue drain and stall fixes

This series fixes two critical xen-netback bugs.

1. Netback may consume all of host memory by queuing an unlimited
   number of skb on the internal guest Rx queue.  This behaviour is
   guest triggerable.

2. Carrier flapping under high traffic rates which reduces
   performance.

The first patch is a prerequite.  Removing support for frontends with
feature-rx-notify makes it easier to reason about the correctness of
netback since it no longer has to support this outdated and broken
mode.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cc40af0

xen-netback: reintroduce guest Rx stall detection · ecf08d2d

由 David Vrabel 提交于 10月 22, 2014

If a frontend not receiving packets it is useful to detect this and
turn off the carrier so packets are dropped early instead of being
queued and drained when they expire.

A to-guest queue is stalled if it doesn't have enough free slots for a
an extended period of time (default 60 s).

If at least one queue is stalled, the carrier is turned off (in the
expectation that the other queues will soon stall as well).  The
carrier is only turned on once all queues are ready.

When the frontend connects, all the queues start in the stalled state
and only become ready once the frontend queues enough Rx requests.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecf08d2d

xen-netback: fix unlimited guest Rx internal queue and carrier flapping · f48da8b1

由 David Vrabel 提交于 10月 22, 2014

Netback needs to discard old to-guest skb's (guest Rx queue drain) and
it needs detect guest Rx stalls (to disable the carrier so packets are
discarded earlier), but the current implementation is very broken.

1. The check in hard_start_xmit of the slot availability did not
   consider the number of packets that were already in the guest Rx
   queue.  This could allow the queue to grow without bound.

   The guest stops consuming packets and the ring was allowed to fill
   leaving S slot free.  Netback queues a packet requiring more than S
   slots (ensuring that the ring stays with S slots free).  Netback
   queue indefinately packets provided that then require S or fewer
   slots.

2. The Rx stall detection is not triggered in this case since the
   (host) Tx queue is not stopped.

3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
   will consider this an Rx purge event which may result in it taking
   the carrier down unnecessarily.  It also considers a queue with
   only 1 slot free as unstalled (even though the next packet might
   not fit in this).

The internal guest Rx queue is limited by a byte length (to 512 Kib,
enough for half the ring).  The (host) Tx queue is stopped and started
based on this limit.  This sets an upper bound on the amount of memory
used by packets on the internal queue.

This allows the estimatation of the number of slots for an skb to be
removed (it wasn't a very good estimate anyway).  Instead, the guest
Rx thread just waits for enough free slots for a maximum sized packet.

skbs queued on the internal queue have an 'expires' time (set to the
current time plus the drain timeout).  The guest Rx thread will detect
when the skb at the head of the queue has expired and discard expired
skbs.  This sets a clear upper bound on the length of time an skb can
be queued for.  For a guest being destroyed the maximum time needed to
wait for all the packets it sent to be dropped is still the drain
timeout (10 s) since it will not be sending new packets.

Rx stall detection is reintroduced in a later commit.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f48da8b1

xen-netback: make feature-rx-notify mandatory · bc96f648

由 David Vrabel 提交于 10月 22, 2014

Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).

This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Acked-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc96f648

25 10月, 2014 1 次提交

ptp: restore the makefile for building the test program. · 5345c1d4

由 Richard Cochran 提交于 10月 22, 2014

This patch brings back the makefile called testptp.mk which was removed
in commit adb19fb6 (Documentation: add makefiles for more targets).

While the idea of that commit was to improve build coverage of the
examples, the new Makefile is unable to cross compile the testptp program.
In contrast, the deleted makefile was able to do this just fine.

This patch fixes the regression by restoring the original makefile.
Signed-off-by: NRichard Cochran <richardcochran@gmail.com>
Acked-by: NPeter Foley <pefoley2@pefoley.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5345c1d4

24 10月, 2014 4 次提交

netfilter: nf_log: release skbuff on nlmsg put failure · b51d3fa3

由 Houcheng Lin 提交于 10月 23, 2014

The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, in case of e.g. new attribute
erronously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]
Signed-off-by: NHoucheng Lin <houcheng@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

b51d3fa3

netfilter: nfnetlink_log: fix maximum packet length logged to userspace · c1e7dc91

由 Florian Westphal 提交于 10月 23, 2014

don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9 (netfilter: nfnetlink_queue: cleanup copy_range usage).
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c1e7dc91

netfilter: nf_log: account for size of NLMSG_DONE attribute · 9dfa1dfe

由 Florian Westphal 提交于 10月 23, 2014

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).
Reported-by: NHoucheng Lin <houcheng@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

9dfa1dfe

bridge: Do not compile options in br_parse_ip_options · 7677e868

由 Herbert Xu 提交于 10月 04, 2014

Commit 462fb2af

	bridge : Sanitize skb before it enters the IP stack

broke when IP options are actually used because it mangles the
skb as if it entered the IP stack which is wrong because the
bridge is supposed to operate below the IP stack.

Since nobody has actually requested for parsing of IP options
this patch fixes it by simply reverting to the previous approach
of ignoring all IP options, i.e., zeroing the IPCB.

If and when somebody who uses IP options and actually needs them
to be parsed by the bridge complains then we can revisit this.
Reported-by: NDavid Newall <davidn@davidnewall.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Tested-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7677e868

23 10月, 2014 10 次提交

hyperv: Fix the total_data_buflen in send path · 942396b0

由 Haiyang Zhang 提交于 10月 22, 2014

total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a messge with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.

[Request to include this patch to the Stable branches]
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: NK. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

942396b0

Merge branch 'amd-xgbe' · f765678e

由 David S. Miller 提交于 10月 22, 2014

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-10-22

The following series of patches includes fixes to the driver.

- Properly handle feature changes via ethtool by using correctly sized
  variables
- Perform proper napi packet counting and budget checking

This patch series is based on net.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f765678e

amd-xgbe: Fix napi Rx budget accounting · 55ca6bcd

由 Lendacky, Thomas 提交于 10月 22, 2014

Currently the amd-xgbe driver increments the packets processed counter
each time a descriptor is processed.  Since a packet can be represented
by more than one descriptor incrementing the counter in this way is not
appropriate.  Also, since multiple descriptors cause the budget check
to be short circuited, sometimes the returned value from the poll
function would be larger than the budget value resulting in a WARN_ONCE
being triggered.

Update the polling logic to properly account for the number of packets
processed and exit when the budget value is reached.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55ca6bcd

amd-xgbe: Properly handle feature changes via ethtool · 386f1c96

由 Lendacky, Thomas 提交于 10月 22, 2014

The ndo_set_features callback function was improperly using an unsigned
int to save the current feature value for features such as NETIF_F_RXCSUM.
Since that feature is in the upper 32 bits of a 64 bit variable the
result was always 0 making it not possible to actually turn off the
hardware RX checksum support. Change the unsigned int type to the
netdev_features_t type in order to properly capture the current value
and perform the proper operation.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

386f1c96

net: fec: ptp: fix NULL pointer dereference if ptp_clock is not set · 81f35ffd

由 Philipp Zabel 提交于 10月 22, 2014

Since commit 278d2404 (net: fec: ptp: Enable PPS output based on ptp clock)
fec_enet_interrupt calls fec_ptp_check_pps_event unconditionally, which calls
into ptp_clock_event. If fep->ptp_clock is NULL, ptp_clock_event tries to
dereference the NULL pointer.
Since on i.MX53 fep->bufdesc_ex is not set, fec_ptp_init is never called,
and fep->ptp_clock is NULL, which reliably causes a kernel panic.

This patch adds a check for fep->ptp_clock == NULL in fec_enet_interrupt.
Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81f35ffd

net: fix saving TX flow hash in sock for outgoing connections · 9e7ceb06

由 Sathya Perla 提交于 10月 22, 2014

The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the the above routines
after the source port is available for the socket.

Fixes: b73c3d0e("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e7ceb06

xfrm6: fix a potential use after free in xfrm6_policy.c · 789f2023

由 Li RongQing 提交于 10月 22, 2014

pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

789f2023

net: fs_enet: set back promiscuity mode after restart · 8751b12c

由 LEROY Christophe 提交于 10月 22, 2014

After interface restart (eg: after link disconnection/reconnection), the bridge
function doesn't work anymore. This is due to the promiscuous mode being cleared
by the restart.

The mac-fcc already includes code to set the promiscuous mode back during the restart.
This patch adds the same handling to mac-fec and mac-scc.

Tested with bridge function on MPC885 with FEC.
Reported-by: NGermain Montoies <germain.montoies@c-s.fr>
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8751b12c

net: tso: fix unaligned access to crafted TCP header in helper API · a63ba13e

由 Karl Beldan 提交于 10月 21, 2014

The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-bytes boundary.
However ATM the TSO helper API is only used by ethernet drivers and
the tcp header will then be aligned to a 2-bytes only boundary from the
header start address.
Signed-off-by: NKarl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a63ba13e

sfc: remove incorrect EFX_BUG_ON_PARANOID check · 8fc96351

由 Jon Cooper 提交于 10月 21, 2014

write_count and insert_count can wrap around, making > check invalid.

Fixes: 70b33fb0 ("sfc: add support for
 skb->xmit_more").
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fc96351

22 10月, 2014 7 次提交

netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation · c123bb71

由 Sabrina Dubroca 提交于 10月 21, 2014

alloc_percpu returns NULL on failure, not a negative error code.

Fixes: ff3cd7b3 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c123bb71

netfilter: ipset: off by one in ip_set_nfnl_get_byindex() · 0f9f5e1b

由 Dan Carpenter 提交于 10月 21, 2014

The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0f9f5e1b

netfilter: nf_conntrack: allow server to become a client in TW handling · e37ad9fd

由 Marcelo Leitner 提交于 10月 13, 2014

When a port that was used to listen for inbound connections gets closed
and reused for outgoing connections (like rsh ends up doing for stderr
flow), current we may reject the SYN/ACK packet for the new connection
because tcp_conntracks states forbirds a port to become a client while
there is still a TIME_WAIT entry in there for it.

As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout
for it is 120s, there is a ~60s window that the application can end up
opening a port that conntrack will end up blocking.

This patch fixes this by simply allowing such state transition: if we
see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note
that the rest of the code already handles this situation, more
specificly in tcp_packet(), first switch clause.
Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e37ad9fd

net: sched: initialize bstats syncp · 7c1c97d5

由 Sabrina Dubroca 提交于 10月 21, 2014

Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b9 "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c1c97d5

bpf: fix bug in eBPF verifier · 32bf08a6

由 Alexei Starovoitov 提交于 10月 20, 2014

while comparing for verifier state equivalency the comparison
was missing a check for uninitialized register.
Make sure it does so and add a testcase.

Fixes: f1bca824 ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32bf08a6

netlink: Re-add locking to netlink_lookup() and seq walker · 78fd1d0a

由 Thomas Graf 提交于 10月 21, 2014

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: NSteinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78fd1d0a

tipc: fix lockdep warning when intra-node messages are delivered · 1a194c2d

由 Ying Xue 提交于 10月 20, 2014

When running tipcTC&tipcTS test suite, below lockdep unsafe locking
scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762]  Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762]        CPU0
[ 1109.998762]        ----
[ 1109.998762]   lock(slock-AF_TIPC);
[ 1109.998762]   <Interrupt>
[ 1109.998762]     lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762]  *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0

When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() on process context to forward messages to destination
socket. Meanwhile, if messages delivered by remote node arrive at the
node and their destinations are also the same socket, tipc_sk_rcv()
running on process context might be preempted by tipc_sk_rcv() running
BH context. As a result, the latter cannot obtain the socket lock as
the lock was obtained by the former, however, the former has no chance
to be run as the latter is owning the CPU now, so headlock happens. To
avoid it, BH should be always disabled in tipc_sk_rcv().
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a194c2d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功