提交 · dfafb73f355940989507c56b9282773132e86a9f · openeuler / Kernel

06 9月, 2013 18 次提交

icplus: Use netif_running to determine device state · dfafb73f

由 Jon Mason 提交于 9月 04, 2013

Remove the __LINK_STATE_START check to verify the device is running, in
favor of netif_running().  netif_running() performs the same check of
__LINK_STATE_START, so the code should behave the same.
Signed-off-by: NJon Mason <jdmason@kudzu.us>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Sorbica Shieh <sorbica@icplus.com.tw>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfafb73f

ethernet/arc/arc_emac: Fix huge delays in large file copies · 27082ee1

由 Vineet Gupta 提交于 9月 04, 2013

copying large files to a NFS mounted host was taking absurdly large
time.

Turns out that TX BD reclaim had a sublte bug.

Loop starts off from @txbd_dirty cursor and stops when it hits a BD
still in use by controller. However when it stops it needs to keep the
cursor at that very BD to resume scanning in next iteration. However it
was erroneously incrementing the cursor, causing the next scan(s) to
fail too, unless the BD chain was completely drained out.

[ARCLinux]$ ls -l -sh /disk/log.txt
 17976 -rw-r--r--    1 root     root       17.5M Sep  /disk/log.txt

========== Before =====================
[ARCLinux]$ time cp /disk/log.txt /mnt/.
real    31m 7.95s
user    0m 0.00s
sys     0m 0.10s

========== After =====================
[ARCLinux]$ time cp /disk/log.txt /mnt/.
real    0m 24.33s
user    0m 0.00s
sys     0m 0.19s
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
Cc: Alexey Brodkin <abrodkin@synopsys.com> (commit_signer:3/4=75%)
Cc: "David S. Miller" <davem@davemloft.net> (commit_signer:3/4=75%)
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: arc-linux-dev@synopsys.com
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27082ee1

tuntap: orphan frags before trying to set tx timestamp · 7bf66305

由 Jason Wang 提交于 9月 05, 2013

sock_tx_timestamp() will clear all zerocopy flags of skb which may lead the
frags never to be orphaned. This will break guest to guest traffic when zerocopy
is enabled. Fix this by orphaning the frags before trying to set tx time stamp.

The issue were introduced by commit eda29772
(tun: Support software transmit time stamping).

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bf66305

tuntap: purge socket error queue on detach · 4bfb0513

由 Jason Wang 提交于 9月 05, 2013

Commit eda29772
(tun: Support software transmit time stamping) will queue skbs into error queue
when tx stamping is enabled. But it forgets to purge the error queue during
detach. This patch fixes this.

Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bfb0513

qlcnic: use standard NAPI weights · df95fc44

由 Michal Schmidt 提交于 9月 04, 2013

Since commit 82dc3c63 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

qlcnic requests the weight as large as 256 for some of its rings, and
smaller values for other rings. For instance in qlcnic_82xx_napi_add()
I think the intention was to give the tx+rx ring a bigger weight than
to rx-only rings, but it's actually doing the opposite. So I'm assuming
the weights do not really matter much.

Just use the standard NAPI weights for all rings.
Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
Acked-by: NHimanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df95fc44

ipv6:introduce function to find route for redirect · b55b76b2

由 Duan Jiong 提交于 9月 04, 2013

RFC 4861 says that the IP source address of the Redirect is the
same as the current first-hop router for the specified ICMP
Destination Address, so the gateway should be taken into
consideration when we find the route for redirect.

There was once a check in commit
a6279458 ("NDISC: Search over
all possible rules on receipt of redirect.") and the check
went away in commit b94f1c09
("ipv6: Use icmpv6_notify() to propagate redirect, instead of
rt6_redirect()").

The bug is only "exploitable" on layer-2 because the source
address of the redirect is checked to be a valid link-local
address but it makes spoofing a lot easier in the same L2
domain nonetheless.

Thanks very much for Hannes's help.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b55b76b2

bnx2x: VF RSS support - VF side · 60cad4e6

由 Ariel Elior 提交于 9月 04, 2013

In this patch capabilities are added to the Vf driver to request
multiple queues over the VF PF channel, and the logic for requesting
rss configuration for said queues.
Signed-off-by: NAriel Elior <ariele@broadcom.com>
Signed-off-by: NEilong Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60cad4e6

bnx2x: VF RSS support - PF side · b9871bcf

由 Ariel Elior 提交于 9月 04, 2013

This patch adds support for Receive Side Scaling for queues of
Virtual Functions on the PF side. This includes support for the
requests for multiple queues from VF drivers, configuration of the
HW for multiple queues per VF, and support for rss configuration
of said queues.
Signed-off-by: NAriel Elior <ariele@broadcom.com>
Signed-off-by: NEilon Greenstein <eilong@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9871bcf

vxlan: Notify drivers for listening UDP port changes · 53cf5275

由 Joseph Gasparakis 提交于 9月 04, 2013

This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
ndo_del_rx_vxlan_port().

Drivers can get notifications through the above functions about changes
of the UDP listening port of VXLAN. Also, when physical ports come up,
now they can call vxlan_get_rx_port() in order to obtain the port number(s)
of the existing VXLAN interface in case they already up before them.

This information about the listening UDP port would be used for VXLAN
related offloads.

A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
input and his suggestions on this patch set.

CC: John Fastabend <john.r.fastabend@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJoseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53cf5275

net: usbnet: update addr_assign_type if appropriate · eef23b53

由 Bjørn Mork 提交于 9月 04, 2013

This module generates a common default address on init,
using eth_random_addr. Set addr_assign_type to let
userspace know the address is random unless it was
overridden by the minidriver.
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NOliver Neukum <oliver@neukum.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eef23b53

Merge branch 'enic' · c3de991f

由 David S. Miller 提交于 9月 05, 2013

Govindarajulu Varadarajan says:

====================
The following patch adds multi tx support for enic.
Signed-off-by: NNishank Trivedi <nistrive@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NGovindarajulu Varadarajan <govindarajulu90@gmail.com>
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3de991f

driver/net: enic: update enic maintainers and driver · 001e1c1d

由 govindarajulu.v 提交于 9月 04, 2013

Signed-off-by: NGovindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: NSujith Sankar <ssujith@cisco.com>
Signed-off-by: NNishank Trivedi <nistrive@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

001e1c1d

driver/net: enic: Exposing symbols for Cisco's low latency driver · 4a50ddfd

由 govindarajulu.v 提交于 9月 04, 2013

This patch exposes symbols for usnic low latency driver that can be used to
register and unregister vNics as well to traverse the resources on vNics.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NNishank Trivedi <nistrive@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a50ddfd

driver/net: enic: Try DMA 64 first, then failover to DMA · 624dbf55

由 govindarajulu.v 提交于 9月 04, 2013

In servers with more than 1.1 TB of RAM, the existing 40/32 bit DMA
could cause failure as the DMA-able address could go outside the range
addressable using 40/32 bits.

The following patch first tried 64 bit DMA if possible, failover to 32
bit.
Signed-off-by: NSujith Sankar <ssujith@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NGovindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

624dbf55

driver/net: enic: record q_number and rss_hash for skb · bf751ba8

由 govindarajulu.v 提交于 9月 04, 2013

The following patch sets the skb->rxhash and skb->q_number.
This is used by RPS and RFS. Kernel can make use of hw provided hash
instead of calculating the hash.
Signed-off-by: NGovindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: NNishank Trivedi <nistrive@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf751ba8

driver/net: enic: Add multi tx support for enic · 822473b6

由 govindarajulu.v 提交于 9月 04, 2013

The following patch adds multi tx support for enic.
Signed-off-by: NNishank Trivedi <nistrive@cisco.com>
Signed-off-by: NChristian Benvenuti <benve@cisco.com>
Signed-off-by: NGovindarajulu Varadarajan <govindarajulu90@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

822473b6

bridge: apply multicast snooping to IPv6 link-local, too · 3c3769e6

由 Linus Lüssing 提交于 9月 04, 2013

The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.
Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c3769e6

bridge: prevent flooding IPv6 packets that do not have a listener · 8fad9c39

由 Linus Lüssing 提交于 9月 04, 2013

Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.

With this commit they are only forwarded to ports with a multicast
router.

Just like commit bd4265fe ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.
Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fad9c39

05 9月, 2013 19 次提交

net: ipv6: mld: document force_mld_version in ip-sysctl.txt · f2127810

由 Daniel Borkmann 提交于 9月 04, 2013

Document force_mld_version parameter in ip-sysctl.txt.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2127810

net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions · b4af8def

由 Daniel Borkmann 提交于 9月 04, 2013

We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4af8def

net: ipv6: mld: refactor query processing into v1/v2 functions · 2b7c121f

由 Daniel Borkmann 提交于 9月 04, 2013

Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b7c121f

net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 · cc7f7ab7

由 Daniel Borkmann 提交于 9月 04, 2013

Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc7f7ab7

net: ipv6: mld: implement RFC3810 MLDv2 mode only · 58c0ecfd

由 Daniel Borkmann 提交于 9月 04, 2013

RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58c0ecfd

net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170

由 Daniel Borkmann 提交于 9月 04, 2013

Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3f5b170

net: ipv6: mld: clean up MLD_V1_SEEN macro · 6c567b78

由 Daniel Borkmann 提交于 9月 04, 2013

Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c567b78

net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c

由 Daniel Borkmann 提交于 9月 04, 2013

i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89225d1c

tcp: better comments for RTO initiallization · 52f20e65

由 Yuchung Cheng 提交于 9月 03, 2013

Commit 1b7fdd2a("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52f20e65

vxlan: Optimize vxlan rcv · 430eda6d

由 Pravin B Shelar 提交于 9月 03, 2013

vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.
I have open coded rcu-api to store and read vxlan_sock from
sk_user_data to avoid sparse warning as sk_user_data is not
__rcu pointer.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

430eda6d

atm: he: print MAC via %pM · f8de3104

由 Andy Shevchenko 提交于 9月 03, 2013

Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8de3104

atm: nicstar: re-use native mac_pton() helper · 8390f814

由 Andy Shevchenko 提交于 9月 03, 2013

There is a nice helper to parse MAC. Let's use it and remove custom
implementation.
Signed-off-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8390f814

driver:stmmac: Adjust time stamp increase for 0.465 ns accurate only when Time... · 0cf91580

由 Sonic Zhang 提交于 9月 03, 2013

driver:stmmac: Adjust time stamp increase for 0.465 ns accurate only when Time stamp binary rollover is set.

The synopsys spec says When TSCRLSSR is cleard, the rollover value of
sub-second register is 0x7FFFFFFF(0.465 ns per clock).
Signed-off-by: NSonic Zhang <sonic.zhang@analog.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cf91580

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4 · c08751c8

由 Alexander Sverdlin 提交于 9月 02, 2013

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.

With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.

After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

The bug was there for years already, but
- is a performance issue, the packets are still transmitted
- doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)
Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c08751c8

drivers:net: delete premature free_irq · 0a171933

由 Julia Lawall 提交于 9月 02, 2013

Free_irq is not needed if there has been no request_irq.  Free_irq is
removed from both the probe and remove functions.  The correct request_irq
and free_irq are found in the open and close functions.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@

*e = platform_get_irq(...);
... when != request_irq(e,...)
*free_irq(e,...)
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a171933

net: sync some IP headers with glibc · cfd280c9

由 Carlos O'Donell 提交于 8月 15, 2013

Solution:
=========

- Synchronize linux's `include/uapi/linux/in6.h'
  with glibc's `inet/netinet/in.h'.
- Synchronize glibc's `inet/netinet/in.h with linux's
  `include/uapi/linux/in6.h'.
- Allow including the headers in either other.
- First header included defines the structures and macros.

Details:
========

The kernel promises not to break the UAPI ABI so I don't
see why we can't just have the two userspace headers
coordinate?

If you include the kernel headers first you get those,
and if you include the glibc headers first you get those,
and the following patch arranges a coordination and
synchronization between the two.

Let's handle `include/uapi/linux/in6.h' from linux,
and `inet/netinet/in.h' from glibc and ensure they compile
in any order and preserve the required ABI.

These two patches pass the following compile tests:

cat >> test1.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test1.c

cat >> test2.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test2.c

One wrinkle is that the kernel has a different name for one of
the members in ipv6_mreq. In the kernel patch we create a macro
to cover the uses of the old name, and while that's not entirely
clean it's one of the best solutions (aside from an anonymous
union which has other issues).

I've reviewed the code and it looks to me like the ABI is
assured and everything matches on both sides.

Notes:
- You want netinet/in.h to include bits/in.h as early as possible,
  but it needs in_addr so define in_addr early.
- You want bits/in.h included as early as possible so you can use
  the linux specific code to define __USE_KERNEL_DEFS based on
  the _UAPI_* macro definition and use those to cull in.h.
- glibc was missing IPPROTO_MH, added here.

Compile tested and inspected.
Reported-by: NThomas Backlund <tmb@mageia.org>
Cc: Thomas Backlund <tmb@mageia.org>
Cc: libc-alpha@sourceware.org
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NCarlos O'Donell <carlos@redhat.com>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfd280c9

sfc: check for allocation failure · 42a5a5c1

由 Dan Carpenter 提交于 9月 04, 2013

It upsets static analyzers when we don't check for allocation failure.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42a5a5c1

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · b163b42f

由 David S. Miller 提交于 9月 04, 2013

Jeff Kirsher says:

====================
This series contains updates to igb only.

Todd provides a fix for igb to not look for a PBA in the iNVM on
devices that are flashless.

Akeem provides igb patches to add a new PHY id for i354, as well as
a couple of patches to implement the new PHY id.  He also provides
several patches to correctly report the appropriate media type as
well as correctly report advertised/supported link for i354 devices.
Lastly Akeem implements a 1 second delay mechanism for i210 devices
to avoid erroneous link issue with the link partner.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b163b42f

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 48f8e0af

由 David S. Miller 提交于 9月 04, 2013

Pablo Neira Ayuso says:

====================
The following batch contains:

* Three fixes for the new synproxy target available in your
  net-next tree, from Jesper D. Brouer and Patrick McHardy.

* One fix for TCPMSS to correctly handling the fragmentation
  case, from Phil Oester. I'll pass this one to -stable.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48f8e0af

04 9月, 2013 3 次提交

igb: Update version number · 66f40b8a

由 Akeem G Abodunrin 提交于 8月 22, 2013

This patch updates igb driver version to 5.0.5
Signed-off-by: NAkeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

66f40b8a

igb: Implementation to report advertised/supported link on i354 devices · 41fcfbea

由 Akeem G Abodunrin 提交于 8月 30, 2013

This patch changes the way we report supported/advertised link for i354
devices, especially for 2.5 GB. Instead of reporting 2.5 GB for all i354
devices erroneously, check first, if it is 2.5 GB capable.
Signed-off-by: NAkeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

41fcfbea

igb: Get speed and duplex for 1G non_copper devices · f6878e39

由 Akeem G Abodunrin 提交于 8月 28, 2013

This patch changes how we get speed/duplex for non_copper devices; it
now uses pcs register to get current speed and duplex instead of using
generic status register that we use to detect speed/duplex for copper
devices.
Signed-off-by: NAkeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f6878e39

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功