Commits · 57b55a7ec684d8b846d6d5e67f4982363a83db7e · openeuler / Kernel

03 May, 2012 15 commits

tcp: Move code related to head frag in tcp_try_coalesce · 57b55a7e

Committed by Alexander Duyck on May 02, 2012

This change reorders the code related to the use of an skb->head_frag so it
is placed before we check the rest of the frags.  This allows the code to
read more linearly instead of like some sort of loop.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix truesize accounting in tcp_try_coalesce · c73c3d9c

Committed by Alexander Duyck on May 02, 2012

This patch addresses several issues in the way we were tracking the
truesize in tcp_try_coalesce.

First it was using ksize which prevents us from having a 0 sized head frag
and getting a usable result.  To resolve that this patch uses the end
pointer which is set based off either ksize, or the frag_size supplied in
build_skb.  This allows us to compute the original truesize of the entire
buffer and remove that value leaving us with just what was added as pages.

The second issue was the use of skb->len if there is a mergeable head frag.
We should only need to remove the size of a data-aligned sk_buff from our
current skb->truesize to compute the delta for a buffer with a reused head.
By using skb->len the value of truesize was being artificially reduced
which means that head frags could use more memory than buffers using
standard allocations.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
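The truesize arithmetic described in this commit message can be modelled roughly as follows. This is a sketch in Python, not the kernel code: the structure sizes and the cache-line size are hypothetical placeholders, and `coalesce_delta` is an illustrative name. It assumes that `SKB_DATA_ALIGN` rounds up to a cache line and that `SKB_TRUESIZE(x)` adds the aligned `sk_buff` and shared-info sizes to `x`.

```python
# All three constants are hypothetical placeholders, not values taken from the commit.
CACHE_LINE = 64        # stand-in for SMP_CACHE_BYTES
SIZEOF_SK_BUFF = 232   # stand-in for sizeof(struct sk_buff)
SIZEOF_SHINFO = 320    # stand-in for sizeof(struct skb_shared_info)


def skb_data_align(size: int) -> int:
    """Round size up to a multiple of the cache line (models SKB_DATA_ALIGN)."""
    return (size + CACHE_LINE - 1) & ~(CACHE_LINE - 1)


def skb_truesize(data_size: int) -> int:
    """Models SKB_TRUESIZE: data area plus aligned sk_buff and shared info."""
    return data_size + skb_data_align(SIZEOF_SK_BUFF) + skb_data_align(SIZEOF_SHINFO)


def coalesce_delta(truesize: int, head_size: int, head_is_reused: bool) -> int:
    """Truesize that moves to the destination skb when the source is merged.

    head_size models (end pointer - head) of the source skb.
    """
    if head_is_reused:
        # The head buffer travels along as a page fragment, so only the
        # sk_buff structure itself is subtracted.
        return truesize - skb_data_align(SIZEOF_SK_BUFF)
    # The head is freed with the skb: subtract the whole original buffer,
    # leaving only what was added as pages.
    return truesize - skb_truesize(head_size)
```

With these placeholder sizes, a source skb with a 192-byte head area and one 4096-byte page attached yields a delta of exactly 4096 in the non-reused case.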

net: Add missing linux/prefetch.h include to net/core/sock.c · 8c1ae10d

Committed by David S. Miller on May 03, 2012

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Stop decapitating clones that have a head_frag · 2996d31f

Committed by Alexander Duyck on May 02, 2012

This change is meant to prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: implement tcp coalescing in tcp_queue_rcv() · b081f85c

Committed by Eric Dumazet on May 02, 2012

Extend TCP coalescing by implementing it in tcp_queue_rcv(), the main
receiver function when the application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow it to be called from
tcp_data_queue().

This gives good results, especially if GRO could not kick in and if the skb
head is a fragment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: take care of cloned skbs in tcp_try_coalesce() · 923dd347

Committed by Eric Dumazet on May 02, 2012

Before stealing fragments or the skb head, we must make sure the skbs are
not cloned.

Alexander was worried about the destination skb being cloned: in bridge
setups, a driver could be fooled if skb->data_len did not match the skb's
nr_frags.

If the source skb is cloned, we must take references on pages instead.

The bug happened when using tcpdump (if not using mmap()).

Introduce a kfree_skb_partial() helper to clean up the code.
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix EEH error reset before a flash dump completes · eeb7fc7b

Committed by Somnath Kotur on May 02, 2012

An EEH error can cause the FW to trigger a flash debug dump.
Resetting the card while flash dump is in progress can cause it not to recover.
Wait for it to finish before letting EEH flow to reset the card.
Signed-off-by: Sathya Perla <Sathya.Perla@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Record receive queue index in skb to aid RPS. · aaa6daec

Committed by Somnath Kotur on May 02, 2012

Signed-off-by: Sarveshwar Bandi <Sarveshwar.Bandi@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix to apply duplex value as unknown when link is down. · 682256db

Committed by Somnath Kotur on May 02, 2012

Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix to not set link speed for disabled functions of a UMC card · 22ca7a6e

Committed by Somnath Kotur on May 02, 2012

This renders the interface view somewhat inconsistent from the Host OS POV
considering the rest of the interfaces are showing their respective speeds
based on the bandwidth assigned to them.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: early retransmit: delayed fast retransmit · 750ea2ba

Committed by Yuchung Cheng on May 02, 2012

Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
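The timer handling described in this message (borrow the RTO timer for an RTT/4 delay, then restore the RTO minus the elapsed time if the delay is cancelled) might be sketched as below. This is a toy model in Python, not the kernel implementation: the class, its fields, and the string return values are all hypothetical, and times are plain floats in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedErTimer:
    """Toy model of one retransmission timer shared by RTO and delayed ER."""
    rto: float                 # original RTO interval
    rto_armed_at: float        # time at which the RTO timer was started
    mode: str = "rto"          # "rto" or "delayed_er"
    expires_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.expires_at = self.rto_armed_at + self.rto

    def arm_delayed_er(self, now: float, rtt: float) -> None:
        """Borrow the timer to delay the fast retransmit by RTT/4."""
        self.mode = "delayed_er"
        self.expires_at = now + rtt / 4

    def cancel_delayed_er(self, now: float) -> None:
        """Another ACK arrived or new data was sent: restore the RTO,
        offset by the time already elapsed since it was armed."""
        elapsed = now - self.rto_armed_at
        self.mode = "rto"
        self.expires_at = now + max(self.rto - elapsed, 0.0)

    def on_expiry(self) -> str:
        """What the sender does when the timer fires in the current mode."""
        return "enter_fast_recovery" if self.mode == "delayed_er" else "rto_timeout"
```

For example, with an RTO of 1.0 s armed at t=0 and an RTT of 0.5 s, arming the delay at t=0.25 sets the expiry to t=0.375. Cancelling at t=0.3125 puts the expiry back at t=1.0.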

tcp: early retransmit · eed530b6

Committed by Yuchung Cheng on May 02, 2012

This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enters
false recovery and degrades performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP",
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are that THIN_DUPACK does not mitigate reordering and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER.
  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.
  2: (Default) Reduce dupthresh as in mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
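The three sysctl modes listed in this message can be summarized in a small Python sketch. It is only an illustration of the rule as stated in the text, not the kernel logic: the function name is hypothetical, the default threshold of 3 is the conventional DUPACK threshold and an assumption here, and the sketch ignores the reordering and THIN_DUPACK conditions the message mentions.

```python
def er_dupthresh(packets_out: int, mode: int, default_dupthresh: int = 3) -> int:
    """DUPACK threshold under the early-retransmit modes described above.

    packets_out is the number of outstanding packets; the sketch assumes
    at least two packets are outstanding when ER applies.
    """
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    if mode == 0 or packets_out >= 4:
        # ER disabled, or enough packets in flight for normal fast retransmit.
        return default_dupthresh
    # Modes 1 and 2 both reduce the threshold; mode 2 additionally delays
    # entering fast recovery by RTT/4, which is not modelled here.
    return packets_out - 1
```

For instance, with three packets outstanding and mode 1, two duplicate ACKs are enough to trigger recovery instead of waiting for a timeout.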

tcp: early retransmit: tcp_enter_recovery() · 1fbc3405

Committed by Yuchung Cheng on May 02, 2012

This is a preparation patch that refactors the code to enter recovery
into a new function tcp_enter_recovery(). It's needed to implement
the delayed fast retransmit in ER.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/pasemi: fix compiler warning · 5c6239c8

Committed by Stephen Rothwell on May 03, 2012

Fix this compiler warning (on PowerPC) by not marking a parameter as
const:

drivers/net/ethernet/pasemi/pasemi_mac.c: In function 'pasemi_mac_replenish_rx_ring':
drivers/net/ethernet/pasemi/pasemi_mac.c:646:3: warning: passing argument 1 of 'netdev_alloc_skb' discards qualifiers from pointer target type
include/linux/skbuff.h:1706:31: note: expected 'struct net_device *' but argument is of type 'const struct net_device *'

Cc: Olof Johansson <olof@lixom.net>
Cc: Pradeep A. Dalvi <netdev@pradeepdalvi.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix handling single MSIX mode for 57710/57711 · 69c326b3

Committed by Dmitry Kravkov on May 02, 2012

Commit 30a5de77 added the ability to use a single MSI-X vector, but lacks
proper handling for 57710/57711 HW.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 May, 2012 7 commits

ixgbe: Reset max_vfs to zero when user request is out of range · 6b42a9c5

Committed by Greg Rose on Apr 17, 2012

If the user request for the number of VFs in the max_vfs parameter is
out of range then reset the value to the default value of zero. This
makes the behavior of the ixgbe driver the same as for the igb driver.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
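The behavior described here, falling back to zero rather than clamping an out-of-range request, could look like the following Python sketch. The function name and the `upper_limit` parameter are hypothetical. The actual limit is defined by the driver and is not stated in this message.

```python
def sanitize_max_vfs(requested: int, upper_limit: int) -> int:
    """Return the number of VFs to enable for a user-supplied max_vfs.

    An out-of-range request is reset to the default of zero (no VFs)
    instead of being clamped to the nearest valid value.
    """
    if requested < 0 or requested > upper_limit:
        return 0
    return requested
```

Resetting to zero rather than clamping means a mistyped value disables the feature instead of silently enabling a different number of VFs than the user asked for.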

ixgbe: Deny MACVLAN requests from VFs with admin set MAC · 2ee7065f

Committed by Greg Rose on Mar 24, 2012

If the host VMM administrator has set the virtual function device's
MAC address then also deny VF requests for MACVLAN filters.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Garrett, Robert <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add hwmon interface to export thermal data · 3ca8bc6d

Committed by Don Skidmore on Apr 12, 2012

Some of our adapters have thermal data available. This patch exports
this data via the hwmon sysfs interface.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add support functions to access thermal data · e1ea9158

Committed by Don Skidmore on Feb 17, 2012

Some 82599 adapters contain thermal data that we can get to via
an i2c interface.  These functions provide support to get at that
data.  A following patch will export this data.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix .ndo_set_rx_mode for 82579 · 69e1e019

Committed by Bruce Allan on Apr 14, 2012

Secondary unicast and multicast addresses are added to the Receive
Address registers (RAR) for most parts supported by the driver. For
82579, there is only one actual RAR and a number of Shared Receive Address
registers (SHRAR) that are shared among the driver and f/w which can be
reserved and write-protected by the f/w. On this device, use the SHRARs
that are not taken by f/w for the additional addresses.

Add a MAC ops function pointer infrastructure (similar to other MAC
operations in the driver) for setting RARs, introduce a new rar_set
function for 82579 and convert the existing code that sets RARs on other
devices to a generic rar_set function.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: PHY initialization flow changes for 82577/8/9 · cb17aab9

Committed by Bruce Allan on Apr 13, 2012

The PHY initialization flows and assorted workarounds for 82577/8/9 done
during driver load and resume from Sx should be the same yet they are not.
Combine the current flows/workarounds into a common set of functions that
are called during the different code paths.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: workaround EEPROM configuration change on 82579 · 62bc813e

Committed by Bruce Allan on Mar 20, 2012

An update to the EEPROM on 82579 will extend a delay in hardware to fix an
issue with WoL not working after a G3->S5 transition which is unrelated to
the driver. However, this extended delay conflicts with nominal operation
of the device when it is initialized by the driver and after every reset
of the hardware (i.e. the driver starts configuring the device before the
hardware is done with its own configuration work). The workaround for
when the driver is in control of the device is to tell the hardware after
every reset the configuration delay should be the original shorter one.

Some pre-existing variables are renamed generically to be re-used with
new register accesses.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

01 May, 2012 18 commits

netem: add ECN capability · e4ae004b

Committed by Eric Dumazet on Apr 30, 2012

Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
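The "mark instead of drop" idea can be illustrated with the ECN field from RFC 3168, which occupies the two low-order bits of the IP TOS / traffic-class byte. The Python sketch below models only the decision taken once netem has already chosen a packet for loss. The function name and return shape are hypothetical. Treating packets that are not ECN-capable as dropped is an assumption read from the word "try" in the message.

```python
from typing import Tuple

# ECN codepoints from RFC 3168 (two low-order bits of the TOS byte).
NOT_ECT = 0b00   # transport is not ECN-capable
ECT_1 = 0b01     # ECN-capable transport, codepoint 1
ECT_0 = 0b10     # ECN-capable transport, codepoint 0
CE = 0b11        # congestion experienced


def netem_loss_event(tos: int, ecn_enabled: bool) -> Tuple[str, int]:
    """Decide what happens to a packet that was selected for loss.

    Returns ("marked", new_tos) if the packet is CE-marked and kept,
    or ("dropped", tos) if it is discarded.
    """
    ecn_bits = tos & 0b11
    if ecn_enabled and ecn_bits != NOT_ECT:
        # ECN-capable packet: set both ECN bits (CE) and let it through.
        return ("marked", tos | CE)
    # ECN option off, or the packet is not ECN-capable: fall back to dropping.
    return ("dropped", tos)
```

A packet carrying ECT(0) with the option enabled comes back as `("marked", ...)` with its DSCP bits untouched, while a Not-ECT packet is still dropped.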

net: skb_peek()/skb_peek_tail() cleanups · 18d07000

Committed by Eric Dumazet on Apr 30, 2012

remove useless casts and rename variables for less confusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a prefetch in socket backlog processing · e4cbb02a

Committed by Eric Dumazet on Apr 30, 2012

TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6 · 5dac94e1

Committed by James Chapman on Apr 29, 2012

The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: introduce L2TPv3 IP encapsulation support for IPv6 · a32e0eec

Committed by Chris Elston on Apr 29, 2012

L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Export ipv6 functions for use by other protocols · a495f836

Committed by Chris Elston on Apr 29, 2012

For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels · f9bac8df

Committed by Chris Elston on Apr 29, 2012

This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: show IPv6 addresses in l2tp debugfs file · 2121c3f5

Committed by Chris Elston on Apr 29, 2012

If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: pppol2tp_connect() handles ipv6 sockaddr variants · b79585f5

Committed by James Chapman on Apr 29, 2012

Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pppox: Replace __attribute__((packed)) in if_pppox.h · 9d4ec1ae

Committed by James Chapman on Apr 29, 2012

Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove unused stats from l2tp_ip socket · c8657fd5

Committed by James Chapman on Apr 29, 2012

The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: Use ip4_datagram_connect() in l2tp_ip_connect() · de3c7a18

Committed by James Chapman on Apr 29, 2012

Clean up the l2tp_ip code to make use of an existing ipv4 support function.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: fix locking of 64-bit counters for smp · 5de7aee5

Committed by James Chapman on Apr 29, 2012

L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
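The underlying problem is that a multi-word counter update is not a single atomic step, so concurrent writers and readers need some form of synchronization. The Python sketch below illustrates the general idea with a plain mutex. It is an analogy only: the kernel uses its own primitives, the message does not say which mechanism the patch chose, and the class and method names are hypothetical.

```python
import threading
from typing import Tuple


class TunnelStats:
    """Packet/byte counters that stay consistent under concurrent updates."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tx_packets = 0
        self._tx_bytes = 0

    def record_tx(self, nbytes: int) -> None:
        # Both counters are updated under the lock so a reader never
        # observes one of them updated without the other.
        with self._lock:
            self._tx_packets += 1
            self._tx_bytes += nbytes

    def snapshot(self) -> Tuple[int, int]:
        """Return a consistent (packets, bytes) pair."""
        with self._lock:
            return (self._tx_packets, self._tx_bytes)


def hammer(stats: TunnelStats, count: int, nbytes: int) -> None:
    """Simulate one CPU sending `count` packets of `nbytes` each."""
    for _ in range(count):
        stats.record_tx(nbytes)
```

Running several `hammer` threads against one `TunnelStats` instance and then calling `snapshot()` should always give totals that match the number of recorded packets.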

atl1c: remove PHY polling from atl1c_change_mtu · 80bcb423