提交 · 0973dd45ecefd746569d414406f5733062fe2817 · openeuler / Kernel

19 12月, 2017 1 次提交

Revert "mac80211: Add airtime account and scheduling to TXQs" · 0973dd45

由 Johannes Berg 提交于 12月 19, 2017

This reverts commit b0d52ad8.

We need to revert the TXQ scheduling API due to conflicts
with a new driver, and this depends on that API.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

0973dd45

11 12月, 2017 14 次提交

mac80211: Add airtime account and scheduling to TXQs · b0d52ad8

由 Toke Høiland-Jørgensen 提交于 10月 31, 2017

This adds airtime accounting and scheduling to the mac80211 TXQ
scheduler. A new hardware flag, AIRTIME_ACCOUNTING, is added that
drivers can set if they support reporting airtime usage of
transmissions. When this flag is set, mac80211 will expect the actual
airtime usage to be reported in the tx_time and rx_time fields of the
respective status structs.

When airtime information is present, mac80211 will schedule TXQs
(through ieee80211_next_txq()) in a way that enforces airtime fairness
between active stations. This scheduling works the same way as the ath9k
in-driver airtime fairness scheduling.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

b0d52ad8

mac80211: Add TXQ scheduling API · e937b8da

由 Toke Høiland-Jørgensen 提交于 10月 31, 2017

This adds an API to mac80211 to handle scheduling of TXQs and changes the
interface between driver and mac80211 for TXQ handling as follows:

- The wake_tx_queue callback interface no longer includes the TXQ. Instead,
  the driver is expected to retrieve that from ieee80211_next_txq()

- Two new mac80211 functions are added: ieee80211_next_txq() and
  ieee80211_schedule_txq(). The former returns the next TXQ that should be
  scheduled, and is how the driver gets a queue to pull packets from. The
  latter is called internally by mac80211 to start scheduling a queue, and
  the driver is supposed to call it to re-schedule the TXQ after it is
  finished pulling packets from it (unless the queue emptied).

The ath9k and ath10k drivers are changed to use the new API.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

e937b8da

mac80211: Add MIC space only for TX key option · 9de18d81

由 David Spinadel 提交于 12月 01, 2017

Add a key flag to indicates that the device only needs
MIC space and not a real MIC.
In such cases, keep the MIC zeroed for ease of debug.
Signed-off-by: NDavid Spinadel <david.spinadel@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

9de18d81

mac80211: don't warn on AID field without top two MSBs set · c7c477b5

由 Johannes Berg 提交于 12月 01, 2017

While the change between 802.11-2012 and 802.11-2016 to move from
requiring APs to set the two top bits to now requiring them to be
cleared was apparently unintentional and will be fixed, clients
should either way assume that the top five bits are reserved and
ignore them.

Implement that in mac80211.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

c7c477b5

nl80211: add a few extended error strings to key parsing · 768075eb

由 Johannes Berg 提交于 11月 13, 2017

This mostly serves as an example for how to add error strings
and erroneous attribute pointers.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

768075eb

cfg80211: cleanup signal strength units notation · 6c2fb1e6

由 Sergey Matyukevich 提交于 11月 09, 2017

Both cfg80211_rx_mgmt and cfg80211_report_obss_beacon functions send
reports to userspace using NL80211_ATTR_RX_SIGNAL_DBM attribute w/o
any processing of their input signal values. Which means that in
order to match userspace tools expectations, input signal values
for those functions are supposed to be in dBm units.

This patch cleans up comments, variable names, and trace reports
for those functions, replacing confusing 'mBm' by 'dBm'.
Signed-off-by: NSergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6c2fb1e6

cfg80211: IBSS: Add support for static WEP in driver for IBSS · 9ae3b172

由 Tova Mussai 提交于 10月 29, 2017

Add support for drivers that implement static WEP internally for IBSS.
Add the WEP keys to the IBSS params struct, that will allow the driver
to use the keys in the join flow, and not only after the connection.
Signed-off-by: NTova Mussai <tova.mussai@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

9ae3b172

mac80211: remove BUG() when interface type is invalid · c7976f52

由 Luca Coelho 提交于 10月 29, 2017

In the ieee80211_setup_sdata() we check if the interface type is valid
and, if not, call BUG().  This should never happen, but if there is
something wrong with the code, it will not be caught until the bug
happens when an interface is being set up.  Calling BUG() is too
extreme for this and a WARN_ON() would be better used instead.  Change
that.
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

c7976f52

mac80211: call synchronize_net once in the restart flow · 2316380f

由 Sara Sharon 提交于 10月 29, 2017

Currently the restart flow enables RX back, and then proceeds
to tear down RX and TX aggregations.
The TX aggregation tear down calls synchronize_net(), which
waits for packet receiving to be done.
This is done for every session, while RX processing is already
active, and in some reproductions it takes up to 3 seconds.
Add a call once in the restart_work, before we have traffic
active again, and remove the subsequent calls when tearing
down the aggregation.
This requires to move down the code that turns off the
reconfig flag in order to be able to test it in
_ieee80211_stop_tx_ba_session().
Signed-off-by: NSara Sharon <sara.sharon@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2316380f

mac80211: always update the PM state of a peer on MGMT / DATA frames · 9fef6544

由 Emmanuel Grumbach 提交于 10月 29, 2017

The 2016 version of the spec is more generic about when the
AP should update the power management state of the peer:
the AP shall update the state based on any management or
data frames. This means that even non-bufferable management
frames should be looked at to update to maintain the power
management state of the peer.

This can avoid problematic cases for example if a station
disappears while being asleep and then re-appears. The AP
would remember it as in power save, but the Authentication
frame couldn't be used to set the peer as awake again.
Note that this issues wasn't really critical since at some
point (after the association) we would have removed the
station and created another one with all the states cleared.
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

9fef6544

mac80211: make __ieee80211_start_rx_ba_session static · a9d09bc1

由 Johannes Berg 提交于 11月 13, 2017

The function is only used with the file, so make it static.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a9d09bc1

mac80211: enable TDLS peer buffer STA feature · e2fb1b83

由 Yingying Tang 提交于 10月 24, 2017

Allow drivers to set the buffer station extended capability
for TDLS links, with a new hardware flag indicating this.
Signed-off-by: NYingying Tang <yintang@qti.qualcomm.com>
[change commit log/documentation wording]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

e2fb1b83

mac80211: avoid looking up tid_tx/tid_rx from timers · d559e303

由 Johannes Berg 提交于 10月 18, 2017

There's no need to re-lookup the data structures now that
we actually get them immediately with from_timer(), just
avoid that. The struct has to be valid anyway, otherwise
the timer object itself would no longer be valid, and we
can't have a different version of the struct since only a
single session per TID is permitted.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

d559e303

mac80211: mark expected switch fall-throughs · 02049ce2

由 Gustavo A. R. Silva 提交于 10月 17, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in some cases I replaced "fall through on else" and
"otherwise fall through" comments with just a "fall through" comment,
which is what GCC is expecting to find.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

02049ce2

09 12月, 2017 21 次提交

rtnetlink: allow GSO maximums to be set on device creation · 46e6b992

由 Stephen Hemminger 提交于 12月 07, 2017

Netlink device already allows changing GSO sizes with
ip set command. The part that is missing is allowing overriding
GSO settings on device creation.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46e6b992

tcp: evaluate packet losses upon RTT change · 6065fd0d

由 Yuchung Cheng 提交于 12月 07, 2017

RACK skips an ACK unless it advances the most recently delivered
TX timestamp (rack.mstamp). Since RACK also uses the most recent
RTT to decide if a packet is lost, RACK should still run the
loss detection whenever the most recent RTT changes. For example,
an ACK that does not advance the timestamp but triggers the cwnd
undo due to reordering, would then use the most recent (higher)
RTT measurement to detect further losses.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Reviewed-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6065fd0d

tcp: fix off-by-one bug in RACK · 428aec5e

由 Yuchung Cheng 提交于 12月 07, 2017

RACK should mark a packet lost when remaining wait time is zero.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Reviewed-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

428aec5e

tcp: always evaluate losses in RACK upon undo · cd1fc85b

由 Yuchung Cheng 提交于 12月 07, 2017

When sender detects spurious retransmission, all packets
marked lost are remarked to be in-flight. However some may
be considered lost based on its timestamps in RACK. This patch
forces RACK to re-evaluate, which may be skipped previously if
the ACK does not advance RACK timestamp.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Reviewed-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd1fc85b

tcp: correctly test congestion state in RACK · 0ce294d8

由 Yuchung Cheng 提交于 12月 07, 2017

RACK does not test the loss recovery state correctly to compute
the reordering window. It assumes if lost_out is zero then TCP is
not in loss recovery. But it can be zero during recovery before
calling tcp_rack_detect_loss(): when an ACK acknowledges all
packets marked lost before receiving this ACK, but has not yet
to discover new ones by tcp_rack_detect_loss(). The fix is to
simply test the congestion state directly.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Reviewed-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NPriyaranjan Jha <priyarjha@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ce294d8

net: sched: fix use-after-free in tcf_block_put_ext · df45bf84

由 Jiri Pirko 提交于 12月 08, 2017

Since the block is freed with last chain being put, once we reach the
end of iteration of list_for_each_entry_safe, the block may be
already freed. I'm hitting this only by creating and deleting clsact:

[  202.171952] ==================================================================
[  202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
[  202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
[  202.194508]
[  202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
[  202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[  202.213613] Call Trace:
[  202.216369]  dump_stack+0xda/0x169
[  202.220192]  ? dma_virt_map_sg+0x147/0x147
[  202.224790]  ? show_regs_print_info+0x54/0x54
[  202.229691]  ? tcf_chain_destroy+0x1dc/0x250
[  202.234494]  print_address_description+0x83/0x3d0
[  202.239781]  ? tcf_block_put_ext+0x240/0x390
[  202.244575]  kasan_report+0x1ba/0x460
[  202.248707]  ? tcf_block_put_ext+0x240/0x390
[  202.253518]  tcf_block_put_ext+0x240/0x390
[  202.258117]  ? tcf_chain_flush+0x290/0x290
[  202.262708]  ? qdisc_hash_del+0x82/0x1a0
[  202.267111]  ? qdisc_hash_add+0x50/0x50
[  202.271411]  ? __lock_is_held+0x5f/0x1a0
[  202.275843]  clsact_destroy+0x3d/0x80 [sch_ingress]
[  202.281323]  qdisc_destroy+0xcb/0x240
[  202.285445]  qdisc_graft+0x216/0x7b0
[  202.289497]  tc_get_qdisc+0x260/0x560

Fix this by holding the block also by chain 0 and put chain 0
explicitly, out of the list_for_each_entry_safe loop at the very
end of tcf_block_put_ext.

Fixes: efbf7897 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df45bf84

net: sched: pfifo_fast use skb_array · c5ad119f

由 John Fastabend 提交于 12月 07, 2017

This converts the pfifo_fast qdisc to use the skb_array data structure
and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
the lockless bit that can be a child of a qdisc requiring locking. So
we add logic to clear the lock bit on initialization in these cases when
the qdisc graft operation occurs.

This also removes the logic used to pick the next band to dequeue from
and instead just checks a per priority array for packets from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5ad119f

net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio · ce679e8d

由 John Fastabend 提交于 12月 07, 2017

The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce679e8d

net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq · b01ac095

由 John Fastabend 提交于 12月 07, 2017

The sch_mq qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Also exports __gnet_stats_copy_queue() to use as a helper function.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b01ac095

net: sched: helpers to sum qlen and qlen for per cpu logic · 7e66016f

由 John Fastabend 提交于 12月 07, 2017

Add qdisc qlen helper routines for lockless qdiscs to use.

The qdisc qlen is no longer used in the hotpath but it is reported
via stats query on the qdisc so it still needs to be tracked. This
adds the per cpu operations needed along with a helper to return
the summation of per cpu stats.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e66016f

net: sched: check for frozen queue before skb_bad_txq check · fd8e8d1a

由 John Fastabend 提交于 12月 07, 2017

I can not think of any reason to pull the bad txq skb off the qdisc if
the txq we plan to send this on is still frozen. So check for frozen
queue first and abort before dequeuing either skb_bad_txq skb or
normal qdisc dequeue() skb.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd8e8d1a

net: sched: use skb list for skb_bad_tx · 70e57d5e

由 John Fastabend 提交于 12月 07, 2017

Similar to how gso is handled use skb list for skb_bad_tx this is
required with lockless qdiscs because we may have multiple cores
attempting to push skbs into skb_bad_tx concurrently
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70e57d5e

net: sched: drop qdisc_reset from dev_graft_qdisc · 7bbde83b

由 John Fastabend 提交于 12月 07, 2017

In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
operation is called on the old qdisc. The destroy operation will wait
a rcu grace period and call qdisc_rcu_free(). At which point
gso_cpu_skb is free'd along with all stats so no need to zero stats
and gso_cpu_skb from the graft operation itself.

Further after dropping the qdisc locks we can not continue to call
qdisc_reset before waiting an rcu grace period so that the qdisc is
detached from all cpus. By removing the qdisc_reset() here we get
the correct property of waiting an rcu grace period and letting the
qdisc_destroy operation clean up the qdisc correctly.

Note, a refcnt greater than 1 would cause the destroy operation to
be aborted however if this ever happened the reference to the qdisc
would be lost and we would have a memory leak.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bbde83b

net: sched: explicit locking in gso_cpu fallback · a53851e2

由 John Fastabend 提交于 12月 07, 2017

This work is preparing the qdisc layer to support egress lockless
qdiscs. If we are running the egress qdisc lockless in the case we
overrun the netdev, for whatever reason, the netdev returns a busy
error code and the skb is parked on the gso_skb pointer. With many
cores all hitting this case at once its possible to have multiple
sk_buffs here so we turn gso_skb into a queue.

This should be the edge case and if we see this frequently then
the netdev/qdisc layer needs to back off.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a53851e2

net: sched: a dflt qdisc may be used with per cpu stats · d59f5ffa

由 John Fastabend 提交于 12月 07, 2017

Enable dflt qdisc support for per cpu stats before this patch a dflt
qdisc was required to use the global statistics qstats and bstats.

This adds a static flags field to qdisc_ops that is propagated
into qdisc->flags in qdisc allocate call. This allows the allocation
block to completely allocate the qdisc object so we don't have
dangling allocations after qdisc init.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d59f5ffa

net: sched: remove remaining uses for qdisc_qlen in xmit path · 29b86cda

由 John Fastabend 提交于 12月 07, 2017

sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
of the routine only check if it is zero or not. Simplify the logic so
that we don't need to return an actual queue length value.

This introduces a case now where sch_direct_xmit would have returned
a qlen of zero but now it returns true. However in this case all
call sites of sch_direct_xmit will implement a dequeue() and get
a null skb and abort. This trades tracking qlen in the hotpath for
an extra dequeue operation. Overall this seems to be good for
performance.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29b86cda

net: sched: allow qdiscs to handle locking · 6b3ba914

由 John Fastabend 提交于 12月 07, 2017

This patch adds a flag for queueing disciplines to indicate the stack
does not need to use the qdisc lock to protect operations. This can
be used to build lockless scheduling algorithms and improving
performance.

The flag is checked in the tx path and the qdisc lock is only taken
if it is not set. For now use a conditional if statement. Later we
could be more aggressive if it proves worthwhile and use a static key
or wrap this in a likely().

Also the lockless case drops the TCQ_F_CAN_BYPASS logic. The reason
for this is synchronizing a qlen counter across threads proves to
cost more than doing the enqueue/dequeue operations when tested with
pktgen.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b3ba914

net: sched: cleanup qdisc_run and __qdisc_run semantics · 6c148184

由 John Fastabend 提交于 12月 07, 2017

Currently __qdisc_run calls qdisc_run_end() but does not call
qdisc_run_begin(). This makes it hard to track pairs of
qdisc_run_{begin,end} across function calls.

To simplify reading these code paths this patch moves begin/end calls
into qdisc_run().
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c148184

tcp_bbr: reset long-term bandwidth sampling on loss recovery undo · 600647d4

由 Neal Cardwell 提交于 12月 07, 2017

Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

600647d4

tcp_bbr: reset full pipe detection on loss recovery undo · 2f6c498e

由 Neal Cardwell 提交于 12月 07, 2017

Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.

Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f6c498e

tcp_bbr: record "full bw reached" decision in new full_bw_reached bit · c589e69b

由 Neal Cardwell 提交于 12月 07, 2017

This commit records the "full bw reached" decision in a new
full_bw_reached bit. This is a pure refactor that does not change the
current behavior, but enables subsequent fixes and improvements.

In particular, this enables simple and clean fixes because the full_bw
and full_bw_cnt can be unconditionally zeroed without worrying about
forgetting that we estimated we filled the pipe in Startup. And it
enables future improvements because multiple code paths can be used
for estimating that we filled the pipe in Startup; any new code paths
only need to set this bit when they think the pipe is full.

Note that this fix intentionally reduces the width of the full_bw_cnt
counter, since we have never used the most significant bit.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Reviewed-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c589e69b

08 12月, 2017 4 次提交

tcp: invalidate rate samples during SACK reneging · d4761754

由 Yousuk Seung 提交于 12月 07, 2017

Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.

< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000

In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.

This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.

Fixes: b9f64820 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: NYousuk Seung <ysseung@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NPriyaranjan Jha <priyarjha@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4761754

smc: support variable CLC proposal messages · e7b7a64a

由 Ursula Braun 提交于 12月 07, 2017

According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure, SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NHans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7b7a64a

smc: no consumer update in tasklet context · 6b5771aa

由 Ursula Braun 提交于 12月 07, 2017

The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
When receiving a blocked signal from the sender, this update is sent
already in tasklet context. In addition consumer cursor updates are
sent after data receival.
Sending of cursor updates is controlled by sequence numbers.
Assuming receiving stray messages the receiver drops updates with older
sequence numbers than an already received cursor update with a higher
sequence number.
Sending consumer cursor updates in tasklet context may result in
wrong order sends and its corresponding drops at the receiver. Since
it is sufficient to send consumer cursor updates once the data is
received, this patch gets rid of the consumer cursor update in tasklet
context to guarantee in-sequence arrival of cursor updates.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b5771aa

smc: cleanup close checking during data receival · 71c125c3

由 Ursula Braun 提交于 12月 07, 2017

When waiting for data to be received it must be checked if the
peer signals shutdown. The SMC code uses two different checks
for this purpose, even though just one check is sufficient.
This patch removes the superfluous test for SOCK_DONE.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71c125c3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功