Commits · 0f2be423f1fa70df4e3b91224bcdded76675308c · openanolis / cloud-kernel

07 Sep, 2017 4 commits

tipc: remove unnecessary call to dev_net() · 8e0deed9

Authored by Kleber Sacilotto de Souza on Sep 06, 2017

The net device is already stored in the 'net' variable, so no need to call
dev_net() again.
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: access nlk groups safely in netlink bind and getname · f7736080

Authored by Xin Long on Sep 06, 2017

Currently no lock protects access to nlk ngroups/groups in
netlink bind and getname. This is safe against nlk groups being set in
netlink_release, but not against netlink_realloc_groups called by
netlink_setsockopt.

netlink_lock_table is needed in both netlink bind and getname when
accessing nlk groups.
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: fix an use-after-free issue for nlk groups · be82485f

Authored by Xin Long on Sep 06, 2017

ChunYu found a netlink use-after-free issue by syzkaller:

[28448.842981] BUG: KASAN: use-after-free in __nla_put+0x37/0x40 at addr ffff8807185e2378
[28448.969918] Call Trace:
[...]
[28449.117207]  __nla_put+0x37/0x40
[28449.132027]  nla_put+0xf5/0x130
[28449.146261]  sk_diag_fill.isra.4.constprop.5+0x5a0/0x750 [netlink_diag]
[28449.176608]  __netlink_diag_dump+0x25a/0x700 [netlink_diag]
[28449.202215]  netlink_diag_dump+0x176/0x240 [netlink_diag]
[28449.226834]  netlink_dump+0x488/0xbb0
[28449.298014]  __netlink_dump_start+0x4e8/0x760
[28449.317924]  netlink_diag_handler_dump+0x261/0x340 [netlink_diag]
[28449.413414]  sock_diag_rcv_msg+0x207/0x390
[28449.432409]  netlink_rcv_skb+0x149/0x380
[28449.467647]  sock_diag_rcv+0x2d/0x40
[28449.484362]  netlink_unicast+0x562/0x7b0
[28449.564790]  netlink_sendmsg+0xaa8/0xe60
[28449.661510]  sock_sendmsg+0xcf/0x110
[28449.865631]  __sys_sendmsg+0xf3/0x240
[28450.000964]  SyS_sendmsg+0x32/0x50
[28450.016969]  do_syscall_64+0x25c/0x6c0
[28450.154439]  entry_SYSCALL64_slow_path+0x25/0x25

It was caused by the lack of protection between freeing nlk groups in
netlink_release and accessing them in sk_diag_dump_groups. A similar issue
also exists in netlink_seq_show().

This patch defers freeing nlk groups to deferred_put_nlk_sk.
Reported-by: ChunYu Wang <chunwang@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: Use __qdisc_drop instead of kfree_skb in sch_prio and sch_qfq · 39ad1297

Authored by Gao Feng on Sep 04, 2017

The commit 520ac30f ("net_sched: drop packets after root qdisc lock
is released") made a big change to tc for performance. Two places in
sch_prio and sch_qfq were not converted by that commit. Convert them
now to use __qdisc_drop.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Sep, 2017 12 commits

mac80211: fix deadlock in driver-managed RX BA session start · bde59c47

Authored by Johannes Berg on Sep 06, 2017

When an RX BA session is started by the driver, and it has to tell
mac80211 about it, the corresponding bit in tid_rx_manage_offl gets
set and the BA session work is scheduled. Upon testing this bit, the
work calls __ieee80211_start_rx_ba_session() and deadlocks: it already
holds ampdu_mlme.mtx, which that function acquires again.

Fix this by adding ___ieee80211_start_rx_ba_session(), a version of
the function that requires the mutex already held.

Cc: stable@vger.kernel.org
Fixes: 699cb58c ("mac80211: manage RX BA session offload without SKB queue")
Reported-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
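
The locking pattern behind this fix (a public entry point that takes the mutex, plus a variant that requires it already held) can be sketched in user space. This is a minimal illustration, not the mac80211 code: the class and method names are hypothetical, and `threading.Lock` stands in for the kernel mutex because it is likewise non-reentrant.

```python
import threading


class BaSessionState:
    """Toy model of state guarded by one non-reentrant mutex (hypothetical names)."""

    def __init__(self):
        self.mtx = threading.Lock()  # non-reentrant, like a kernel mutex
        self.started = []            # TIDs for which a session was started

    def _start_rx_ba_locked(self, tid):
        # Variant for callers that already hold self.mtx.
        assert self.mtx.locked(), "caller must hold mtx"
        self.started.append(tid)

    def start_rx_ba(self, tid):
        # Public entry point: takes the mutex, then defers to the locked variant.
        with self.mtx:
            self._start_rx_ba_locked(tid)

    def session_work(self, pending_tids):
        # The work item already holds the mutex while scanning pending bits,
        # so it must use the locked variant; calling start_rx_ba() here
        # would block forever on the same mutex.
        with self.mtx:
            for tid in pending_tids:
                self._start_rx_ba_locked(tid)
```

Splitting the function this way keeps a single implementation while making the locking requirement explicit at each call site.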

mac80211: Complete ampdu work schedule during session tear down · 98e93e96

Authored by Ilan peer on Sep 06, 2017

Commit 7a7c0a64 ("mac80211: fix TX aggregation start/stop callback race")
added a cancellation of the ampdu work after the loop that stopped the
Tx and Rx BA sessions. However, in some cases, e.g., during HW reconfig,
the low level driver might call mac80211 APIs to complete the stopping
of the BA sessions, which would queue the ampdu work to handle the actual
completion. This work needs to be performed as otherwise mac80211 data
structures would not be properly synced.

Fix this by checking if BA session STOP_CB bit is set after the BA session
cancellation and properly clean the session.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[Johannes: the work isn't flushed because that could do other things we
 don't want, and the locking situation isn't clear]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: honor NL80211_RRF_NO_HT40{MINUS,PLUS} · 4e0854a7

Authored by Emmanuel Grumbach on Sep 06, 2017

Honor the NL80211_RRF_NO_HT40{MINUS,PLUS} flags in
reg_process_ht_flags_channel. Not doing so can lead
to a firmware assert in iwlwifi, for example.

Fixes: b0d7aa59 ("cfg80211: allow wiphy specific regdomain management")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

rds: Fix non-atomic operation on shared flag variable · f530f39f

Authored by Håkon Bugge on Sep 05, 2017

The bits in m_flags in struct rds_message are used for a plurality of
reasons, and from different contexts. To avoid any missing updates to
m_flags, use the atomic set_bit() instead of the non-atomic equivalent.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: don't use GFP_KERNEL under spin lock · 2c8468dc

Authored by Jakub Kicinski on Sep 05, 2017

The new TC IDR code uses GFP_KERNEL under spin lock, which leads
to:

[  582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416
[  582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc
[  582.636939] 2 locks held by tc/3379:
[  582.641049]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400
[  582.650958]  #1:  (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
[  582.662217] Preemption disabled at:
[  582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
[  582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G        W       4.13.0-rc7-debug-00648-g43503a79b9f0 #287
[  582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[  582.691937] Call Trace:
...
[  582.742460]  kmem_cache_alloc+0x286/0x540
[  582.747055]  radix_tree_node_alloc.constprop.6+0x4a/0x450
[  582.753209]  idr_get_free_cmn+0x627/0xf80
...
[  582.815525]  idr_alloc_cmn+0x1a8/0x270
...
[  582.833804]  tcf_idr_create+0x31b/0x8e0
...

Try to preallocate the memory with idr_prealloc(GFP_KERNEL)
(as suggested by Eric Dumazet), and change the allocation
flags under spin lock.

Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Make service connection lookup always check for retry · fdade4f6

Authored by David Howells on Sep 04, 2017

When an RxRPC service packet comes in, the target connection is looked up
by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry
check is, however, currently skipped if we got a match, but probably
shouldn't be in case the connection we found gets replaced whilst we're
doing a search.

Make the lookup procedure always go through need_seqretry(), even if the
lookup was successful.  This makes sure we always pick up on a write-lock
event.

On the other hand, since we don't take a ref on the object, but rely on RCU
to prevent its destruction after dropping the seqlock, I'm not sure this is
necessary.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
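
The idea of always running the retry check, even after a successful lookup, can be illustrated with a toy sequence counter. This is only a sketch under simplifying assumptions: it models a plain sequence-counter read loop, not the kernel's seqlock or need_seqretry() helpers, and every name in it (`SeqCount`, `lookup`, `during_read`) is hypothetical.

```python
class SeqCount:
    """Toy sequence counter: the value is odd while a write is in progress."""

    def __init__(self):
        self.seq = 0

    def write_begin(self):
        self.seq += 1

    def write_end(self):
        self.seq += 1

    def read_begin(self):
        return self.seq

    def read_retry(self, start):
        # Retry if a write was in progress at the start or happened since.
        return bool(start & 1) or self.seq != start


def lookup(table, key, seq, during_read=None):
    """Look up key and always run the retry check, even on a match."""
    while True:
        start = seq.read_begin()
        found = table.get(key)
        if during_read is not None:
            during_read()       # test hook standing in for a concurrent writer
            during_read = None  # fire only once
        if seq.read_retry(start):
            continue            # a writer interfered, so search again
        return found
```

If the retry check were skipped whenever `found` is not None, the first pass in the scenario above would return the entry that the writer had just replaced.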

net: dsa: tag_brcm: Set output queue from skb queue mapping · 0f15b098

Authored by Florian Fainelli on Sep 03, 2017

We originally used skb->priority but that was not quite correct as this
bitfield needs to contain the egress switch queue we intend to send this
SKB to.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6

Authored by Florian Fainelli on Sep 03, 2017

Let switch drivers indicate how many TX queues they support. Some
switches, such as the Broadcom Starfighter 2, are designed with 8 egress
queues. Future changes will allow us to leverage the queue mapping and
direct the transmission towards a particular queue.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: switchdev: Use an helper to clear forward mark · f1c2eddf

Authored by Ido Schimmel on Sep 03, 2017

Instead of using ifdef in the C file.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Yotam Gigi <yotamg@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: Add limit for number of headers to dissect · 1eed4dfb

Authored by Tom Herbert on Sep 01, 2017

In flow dissector there are no limits to the number of nested
encapsulations or headers that might be dissected which makes for a
nice DOS attack. This patch sets a limit of the number of headers
that flow dissector will parse.

Headers include network layer headers, transport layer headers, shim
headers for encapsulation, IPv6 extension headers, etc. The limit for
the maximum number of headers to parse has been set to fifteen to account
for a reasonable number of encapsulations, extension headers, and VLAN
tags in a packet. Note that this limit does not supersede the STOP_AT_*
flags which may stop processing before the headers limit is reached.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
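
A header-count limit of this kind can be sketched as a bounded walk over nested headers. The snippet below is a toy model, not the flow dissector: `dissect` and its return convention are hypothetical, and only the limit of fifteen comes from the message above. What the kernel reports when the limit is hit is not stated in the message, so this sketch simply flags the result as incomplete.

```python
MAX_HEADERS = 15  # limit stated in the commit message


def dissect(headers, max_headers=MAX_HEADERS):
    """Walk a sequence of nested header names.

    Returns (parsed, complete): the headers that were parsed, and whether
    the whole packet was covered before the limit was reached.
    """
    parsed = []
    for name in headers:
        if len(parsed) >= max_headers:
            # Limit reached: stop instead of following further nesting.
            return parsed, False
        parsed.append(name)
    return parsed, True
```

An ordinary packet such as `["eth", "vlan", "ipv4", "udp"]` is parsed in full, while a crafted packet with a hundred nested encapsulations costs at most fifteen parsing steps.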

flow_dissector: Cleanup control flow · 3a1214e8

Authored by Tom Herbert on Sep 01, 2017

__skb_flow_dissect is riddled with gotos that make discerning the flow,
debugging, and extending the capability difficult. This patch
reorganizes things so that we only perform goto's after the two main
switch statements (no gotos within the cases now). It also eliminates
several goto labels so that there are only two labels that can be target
for goto.
Reported-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7

Authored by Arnd Bergmann on Sep 05, 2017

We get a new link error in allmodconfig kernels after ftgmac100
started using the ncsi helpers:

ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!

Related to that, we get another error when CONFIG_NET_NCSI is disabled:

drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?

This fixes both problems at once, using a 'static inline' stub helper
for the disabled case, and exporting the functions when they are present.

Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Sep, 2017 8 commits

nl80211: look for HT/VHT capabilities in beacon's tail · ba83bfb1

Authored by Igor Mitsyanko on Aug 30, 2017

There are no HT/VHT capabilities in cfg80211_ap_settings::beacon_ies,
these should be looked for in beacon's tail instead.

Fixes: 66cd794e ("nl80211: add HT/VHT capabilities to AP parameters")
Signed-off-by: Igor Mitsyanko <igor.mitsyanko.os@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: flush hw_roc_start work before cancelling the ROC · 6e46d8ce

Authored by Avraham Stern on Aug 18, 2017

When HW ROC is supported it is possible that after the HW notified
that the ROC has started, the ROC was cancelled and another ROC was
added while the hw_roc_start worker is waiting on the mutex (since
cancelling the ROC and adding another one also holds the same mutex).
As a result, the hw_roc_start worker will continue to run after the
new ROC is added but before it is actually started by the HW.
This may result in notifying userspace that the ROC has started before
it actually does, or in case of management tx ROC, in an attempt to
tx while not on the right channel.

In addition, when the driver will notify mac80211 that the second ROC
has started, mac80211 will warn that this ROC has already been
notified.

Fix this by flushing the hw_roc_start work before cancelling an ROC.

Cc: stable@vger.kernel.org
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: agg-tx: call drv_wake_tx_queue in proper context · 979e1f08

Authored by Johannes Berg on Jun 22, 2017

Since drv_wake_tx_queue() is normally called in the TX path, which
is already in an RCU critical section, we should call it the same
way in the aggregation code path, so if the driver expects to be
able to use RCU, it'll already be protected without having to enter
a nested critical section.

Additionally, disable soft-IRQs, since not doing so could cause
issues in a driver that relies on them already being disabled like
in the other path.

Fixes: ba8c3d6f ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Fix null pointer dereference with iTXQ support · 89e9bfc4

Authored by Chunho Lee on Jul 07, 2017

This change adds null pointer check before dereferencing pointer dev on
netif_tx_start_all_queues() when an interface is added.
With iTXQ support, netif_tx_start_all_queues() is always called while
an interface is added. However, the netdev queues are not associated
and dev is null when the interface is either NL80211_IFTYPE_P2P_DEVICE
or NL80211_IFTYPE_NAN.
Signed-off-by: Chunho Lee <ch.lee@newracom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: add MESH IE in the correct order · b44eebea

Authored by Liad Kaufman on Aug 05, 2017

VHT MESH support was added, but the order of the IEs
wasn't enforced. Fix that.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: shorten debug prints using ht_dbg() to avoid warning · d81b0fd0

Authored by Sharon Dvir on Aug 05, 2017

Invoking ht_dbg() with too long of a string will print a warning.
Shorten the messages while retaining the printed parameters.
Signed-off-by: Sharon Dvir <sharon.dvir@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix VLAN handling with TXQs · 53168215

Authored by Johannes Berg on Jun 22, 2017

With TXQs, the AP_VLAN interfaces are resolved to their owner AP
interface when enqueuing the frame, which makes sense since the
frame really goes out on that as far as the driver is concerned.

However, this introduces a problem: frames to be encrypted with
a VLAN-specific GTK will now be encrypted with the AP GTK, since
the information about which virtual interface to use to select
the key is taken from the TXQ.

Fix this by preserving info->control.vif and using that in the
dequeue function. This now requires doing the driver-mapping
in the dequeue as well.

Since there's no way to filter the frames that are sitting on a
TXQ, drop all frames, which may affect other interfaces, when an
AP_VLAN is removed.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix incorrect assignment of reassoc value · 7a7d3e4c

Authored by Simon Dinkin on Aug 31, 2017

This fixes a minor issue in the log message.
In the ieee80211_rx_mgmt_assoc_resp function, when assigning the
reassoc value from the mgmt frame control, the
ieee80211_is_reassoc_resp function needs to be used instead of the
ieee80211_is_reassoc_req function.
Signed-off-by: Simon Dinkin <simon.dinkin@tandemg.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

04 Sep, 2017 15 commits

netfilter: nf_tables: support for recursive chain deletion · 9dee1474

Authored by Pablo Neira Ayuso on Sep 03, 2017

This patch sorts out an asymmetry in deletions. Currently, table and set
deletion commands come with an implicit content flush on deletion.
However, chain deletion results in -EBUSY if there is content in this
chain, so no implicit flush happens. You therefore have to send a flush
command first in order to delete chains, which is inconsistent and can
be annoying for users.

This patch uses the new NLM_F_NONREC flag to request non-recursive chain
deletion, i.e. if the chain to be removed contains rules, then this
returns EBUSY. This problem was discussed during the NFWS'17 in Faro,
Portugal. In iptables, you hit -EBUSY if you try to delete a chain that
contains rules, so you have to flush first before you can remove
anything. Since iptables-compat uses the nf_tables netlink interface, it
has to use the NLM_F_NONREC flag from userspace to retain the original
iptables semantics, i.e. bail out on removing chains that contain rules.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
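
The deletion semantics described above can be modelled in a few lines. This is a sketch of the behaviour only, not the nf_tables implementation: `delete_chain` and the dictionary of chains are hypothetical, and the numeric flag value is chosen for illustration rather than taken from the message.

```python
import errno

# Flag name from the commit message; the numeric value here is illustrative.
NLM_F_NONREC = 0x100


def delete_chain(chains, name, flags=0):
    """Delete a chain from a {name: [rules]} mapping.

    Without NLM_F_NONREC the chain is removed together with its rules.
    With NLM_F_NONREC a non-empty chain is refused with -EBUSY.
    """
    rules = chains[name]
    if rules and (flags & NLM_F_NONREC):
        return -errno.EBUSY
    del chains[name]
    return 0
```

A caller that wants the original iptables behaviour passes the flag and gets -EBUSY for a chain that still holds rules, while a caller that omits it gets the implicit flush.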

netfilter: nf_tables: use NLM_F_NONREC for deletion requests · a8278400

Authored by Pablo Neira Ayuso on Sep 03, 2017

Bail out if the user requests non-recursive deletion for tables and sets.
This new flag tells the nf_tables netlink interface to reject deletions if
tables and sets have content.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nf_tables_addchain() · 4035285f

Authored by Pablo Neira Ayuso on Sep 03, 2017

Wrap the chain addition path in a function to make it more maintainable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nf_tables_updchain() · 2c4a488a

Authored by Pablo Neira Ayuso on Sep 03, 2017

nf_tables_newchain() is too large, wrap the chain update path in a
function to make it more maintainable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f

Authored by Varsha Rao on Aug 30, 2017

This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
are no longer required. Replace _ASSERT() macros with WARN_ON().
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: Replace NF_CT_ASSERT() with WARN_ON(). · 44d6e2f2

Authored by Varsha Rao on Aug 30, 2017

This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>

netfilter: remove unused hooknum arg from packet functions · d1c1e39d

Authored by Florian Westphal on Aug 29, 2017

Tested with allmodconfig build.
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nft_limit: add stateful object type · a6912055

Authored by Pablo M. Bermudo Garay on Aug 23, 2017

Register a new limit stateful object type into the stateful object
infrastructure.
Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_limit: replace pkt_bytes with bytes · 6e323887

Authored by Pablo M. Bermudo Garay on Aug 23, 2017

Just a small refactor patch in order to improve the code readability.
Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add select_ops for stateful objects · dfc46034

Authored by Pablo M. Bermudo Garay on Aug 23, 2017

This patch adds support for overloading stateful objects operations
through the select_ops() callback, just as it is implemented for
expressions.

This change is needed for upcoming additions to the stateful objects
infrastructure.
Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_hashlimit: add rate match mode · bea74641

Authored by Vishwanath Pai on Aug 18, 2017

This patch adds a new feature to hashlimit that allows matching on the
current packet/byte rate without rate limiting. This can be enabled
with a new flag --hashlimit-rate-match. The match returns true if the
current rate of packets is above/below the user specified value.

The main difference between the existing algorithm and the new one is
that the existing algorithm rate-limits the flow whereas the new
algorithm does not. Instead it *classifies* the flow based on whether
it is above or below a certain rate. I will demonstrate this with an
example below. Let us assume this rule:

iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain

If the packet rate is 15/s, the existing algorithm would ACCEPT 10
packets every second and send 5 packets to "new_chain".

But with the new algorithm, as long as the rate of 15/s is sustained,
all packets will continue to match and every packet is sent to new_chain.

This new functionality will let us classify different flows based on
their current rate, so that further decisions can be made on them based on
what the current rate is.

This is how the new algorithm works:
We divide time into intervals of 1 (sec/min/hour) as specified by
the user. We keep track of the number of packets/bytes processed in the
current interval. After each interval we reset the counter to 0.

When we receive a packet for match, we look at the packet rate
during the current interval and the previous interval to make a
decision:

if [ prev_rate < user and cur_rate < user ]
        return Below
else
        return Above

Where cur_rate is the number of packets/bytes seen in the current
interval, prev_rate is the number of packets/bytes seen in the previous
interval and 'user' is the rate specified by the user.

We also provide flexibility to the user for choosing the time
interval using the option --hashlimit-rate-interval. For example the user
can keep a low rate like x/hour but still keep the interval as small as 1
second.

To preserve backwards compatibility we have to add this feature in a new
revision, so I've created revision 3 for hashlimit. The two new options
we add are:

--hashlimit-rate-match
--hashlimit-rate-interval

I have updated the help text to add these new options. Also added a few
tests for the new options.
Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
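
The rate-match decision described in this message can be sketched as a small classifier. This is a toy model under stated assumptions, not the xt_hashlimit code: the class name, the millisecond timestamps, and the choice to count the current packet before comparing are all assumptions made for illustration. Only the two-interval rule (below if both prev_rate and cur_rate are under the user rate, above otherwise) comes from the message.

```python
class RateMatcher:
    """Toy model of the rate-match decision (hypothetical names)."""

    def __init__(self, user_rate, interval_ms=1000):
        self.user_rate = user_rate      # threshold in packets per interval
        self.interval_ms = interval_ms  # interval length in milliseconds
        self.window_start = 0           # start of the current interval
        self.cur = 0                    # packets seen in the current interval
        self.prev = 0                   # packets seen in the previous interval

    def _roll(self, now_ms):
        # Advance to the interval containing now_ms, resetting the counter.
        elapsed = (now_ms - self.window_start) // self.interval_ms
        if elapsed >= 1:
            # If more than one interval passed, the previous one was empty.
            self.prev = self.cur if elapsed == 1 else 0
            self.cur = 0
            self.window_start += elapsed * self.interval_ms

    def classify(self, now_ms):
        self._roll(now_ms)
        self.cur += 1  # assumption: the current packet is counted first
        if self.prev < self.user_rate and self.cur < self.user_rate:
            return "below"
        return "above"
```

With a threshold of 10 per second and a sustained 15 packets per second, the model reports "below" only for the first nine packets; from then on every packet is "above", because either the current or the previous interval exceeds the threshold.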

l2tp: pass tunnel pointer to ->session_create() · f026bc29

Authored by Guillaume Nault on Sep 01, 2017

Using l2tp_tunnel_find() in pppol2tp_session_create() and
l2tp_eth_create() is racy, because no reference is held on the
returned tunnel. These functions are only used to implement the
->session_create callback which is run by l2tp_nl_cmd_session_create().
Therefore searching for the parent tunnel isn't necessary because
l2tp_nl_cmd_session_create() already has a pointer to it and holds a
reference.

This patch modifies ->session_create()'s prototype to directly pass
the parent tunnel as parameter, thus avoiding searching for it in
pppol2tp_session_create() and l2tp_eth_create().

Since we have to touch the ->session_create() call in
l2tp_nl_cmd_session_create(), let's also remove the useless conditional:
we know that ->session_create isn't NULL at this point because it's
already been checked earlier in this same function.

Finally, one might be tempted to think that the removed
l2tp_tunnel_find() calls were harmless because they would return the
same tunnel as the one held by l2tp_nl_cmd_session_create() anyway.
But that tunnel might be removed and a new one created with same tunnel
Id before the l2tp_tunnel_find() call. In this case l2tp_tunnel_find()
would return the new tunnel which wouldn't be protected by the
reference held by l2tp_nl_cmd_session_create().

Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: prevent creation of sessions on terminated tunnels · f3c66d4e

Authored by Guillaume Nault on Sep 01, 2017

l2tp_tunnel_destruct() sets tunnel->sock to NULL, then removes the
tunnel from the pernet list and finally closes all its sessions.
Therefore, it's possible to add a session to a tunnel that is still
reachable, but for which tunnel->sock has already been reset. This can
make l2tp_session_create() dereference a NULL pointer when calling
sock_hold(tunnel->sock).

This patch adds the .acpt_newsess field to struct l2tp_tunnel, which is
used by l2tp_tunnel_closeall() to prevent addition of new sessions to
tunnels. Resetting tunnel->sock is done after l2tp_tunnel_closeall()
returned, so that l2tp_session_add_to_tunnel() can safely take a
reference on it when .acpt_newsess is true.

The .acpt_newsess field is modified in l2tp_tunnel_closeall(), rather
than in l2tp_tunnel_destruct(), so that it benefits all tunnel removal
mechanisms. E.g. on UDP tunnels, a session could be added to a tunnel
after l2tp_udp_encap_destroy() proceeded. This would prevent the tunnel
from being removed because of the references held by this new session
on the tunnel and its socket. Even though the session could be removed
manually later on, this defeats the purpose of
commit 9980d001 ("l2tp: add udp encap socket destroy handler").

Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: fix percpu memory leaks" · 5a63643e

Authored by Jesper Dangaard Brouer on Sep 01, 2017

This reverts commit 1d6119ba.

After reverting commit 6d7b857d ("net: use lib/percpu_counter API
for fragmentation mem accounting"), there is no need for this
fix-up patch. As percpu_counter is no longer used, it can no
longer leak memory.

Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119ba ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: use lib/percpu_counter API for fragmentation mem accounting" · fb452a1a

Authored by Jesper Dangaard Brouer on Sep 01, 2017

This reverts commit 6d7b857d.

There is a bug in the fragmentation code's use of the percpu_counter API
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering that other CPUs can have up to batch size (130K) that
hasn't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this becomes dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note: commit 1d6119ba ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead uses inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119ba ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
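
The arithmetic in the revert message can be checked with a toy batched counter. This is a user-space sketch, not the lib/percpu_counter code: `BatchedCounter` and its method names are hypothetical, and only the batch size (130000) and the 3 MByte lower threshold are taken from the message.

```python
BATCH = 130_000               # per-CPU batch size used in the message's arithmetic
LOW_THRESH = 3 * 1024 * 1024  # 3 MBytes lower threshold quoted in the message


class BatchedCounter:
    """Toy per-CPU counter: local deltas are folded into the global count
    only once they reach the batch size."""

    def __init__(self, ncpus, batch=BATCH):
        self.count = 0             # global value, what a cheap read sees
        self.local = [0] * ncpus   # per-CPU deltas not yet folded in
        self.batch = batch

    def add(self, cpu, amount):
        self.local[cpu] += amount
        if abs(self.local[cpu]) >= self.batch:
            self.count += self.local[cpu]
            self.local[cpu] = 0

    def read_fast(self):
        # Cheap read: ignores whatever is still parked in per-CPU deltas.
        return self.count

    def read_sum(self):
        # Exact read: walks every CPU, which is the expensive path.
        return self.count + sum(self.local)


def cpus_to_hide_thresh(thresh=LOW_THRESH, batch=BATCH):
    """Number of full batches that fit under the threshold."""
    return thresh // batch
```

`cpus_to_hide_thresh()` reproduces the figure of 24 from the message, and with 32 CPUs each holding just under one batch the cheap read still reports zero while the exact sum is already above the 3 MByte threshold.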

02 Sep, 2017 1 commit

net: Add module reference to FIB notifiers · 864150df

Authored by Ido Schimmel on Sep 01, 2017

When a listener registers to the FIB notification chain it receives a
dump of the FIB entries and rules from existing address families by
invoking their dump operations.

While we call into these modules we need to make sure they aren't
removed. Do that by increasing their reference count before invoking
their dump operations and decrease it afterwards.

Fixes: 04b1d4e5 ("net: core: Make the FIB notification chain generic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
