Commits · 0fe6de490320bfbf1b82a33d7ee49b62af5f29db · gsplhtlxg / clone-Linux

14 Jan, 2015 24 commits

bridge: fix uninitialized variable warning · 0fe6de49

Committed by Roopa Prabhu on Jan 12, 2015

net/bridge/br_netlink.c: In function ‘br_fill_ifinfo’:
net/bridge/br_netlink.c:146:32: warning: ‘vid_range_flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  err = br_fill_ifvlaninfo_range(skb, vid_range_start,
                                ^
net/bridge/br_netlink.c:108:6: note: ‘vid_range_flags’ was declared here
  u16 vid_range_flags;
Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

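The pattern behind this kind of warning can be reproduced outside the kernel. The sketch below is a hypothetical, simplified stand-in (struct vlan_entry and last_run_flags() are illustrative names, not code from br_netlink.c): a variable that is written only inside a loop branch and read after the loop is what -Wmaybe-uninitialized tends to flag, and an explicit initializer at the declaration is one common way to address it. The commit message does not show the actual change, so this is an assumption about the general shape of the fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one VLAN entry; not the kernel's type. */
struct vlan_entry {
    uint16_t vid;
    uint16_t flags;
};

/*
 * Return the flags of the last run of consecutive VLAN ids.
 * 'run_flags' is written only inside the loop, so without the
 * "= 0" initializer the compiler cannot prove that it is set
 * before the final read (for example when n == 0).
 */
uint16_t last_run_flags(const struct vlan_entry *v, size_t n)
{
    uint16_t run_flags = 0; /* explicit initializer avoids the warning */
    size_t i;

    for (i = 0; i < n; i++) {
        /* A new run starts at the first entry or after a gap. */
        if (i == 0 || v[i].vid != v[i - 1].vid + 1)
            run_flags = v[i].flags;
    }
    return run_flags;
}
```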

ipv6: directly include libc-compat.h in ipv6.h · c66ad9ca

Committed by Willem de Bruijn on Jan 12, 2015

Patch 3b50d902 ("ipv6: fix redefinition of in6_pktinfo ...")
fixed a libc compatibility issue in ipv6 structure definitions
as described in include/uapi/linux/libc-compat.h.

It relies on including linux/in6.h to include libc-compat.h itself.
Include that file directly to clearly communicate the dependency
(libc-compat.h: "This include must be as early as possible").
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

As discussed in http://patchwork.ozlabs.org/patch/427384/
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 721f7951

Committed by David S. Miller on Jan 13, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-01-13

This series contains updates to i40e and i40evf.

Mitch provides a fix for i40e to move the call to pci_disable_sriov() so
that it is called earlier to ensure that the PF driver won't free VF
resources before the VF remove routine can complete.  Also cleans up
redundant and duplicate code in the i40evf.  Refactors the i40evf
shutdown code and let the watchdog take care of shutting things down.
Fix a possible memory leak, if we are using VLANs and the communication
with the PF fails during shutdown.  On some versions of the firmware, the
VF admin send queue may become stalled.  In this case, the easiest
solution is to place another descriptor on the queue and the firmware
will then process both requests.

Greg adds a warning when NPAR-enabled partitions detect a link speed
less than 10 Gbps.

Vasu removes redundant VN2VN MAC addresses which were already added by
the FCoE stack.

Shannon adds code to find how many partitions there are per port and
what is the current partition_id when in NPAR mode.  In multifunction
mode, make sure we only allow SR/IOV on the master PF of a port and
only allow partition 1 to set WoL, speed and flow control.

Kamil adds code to read the PBA block from shadow RAM and returns
the part number in a string format.

Catherine provides a fix to check if link state and link speed have
changed before exiting the link event.

v2: remove un-needed {} in patch #3 of the series based on feedback from
    Sergei Shtylyov
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: limit sriov to partition 1 of NPAR configurations · ba252f13

Committed by Shannon Nelson on Dec 11, 2014

Make sure we only allow SR/IOV on the master PF of a port in multifunction
mode. This should be the case anyway based on the num_vfs configured in
the NVM, but this will help make sure there's no question. If we're not
in multifunction mode the partition_id will always be 1.

Change-ID: I8b2592366fe6782f15301bde2ebd1d4da240109d
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Don't exit link event early if link speed has changed · fef59ddf

Committed by Catherine Sullivan on Dec 11, 2014

Previously we were only checking whether the link up state had changed
and, if it had not, exiting the link event routine early. We should
also check whether the speed has changed and, if it has, stay and finish
processing the link event.

Change-ID: I9c8e0991b3f0279108a7858898c3c5ce0a9856b8
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

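The early-exit condition described above can be sketched as a small predicate. This is an illustrative model only: struct link_snapshot and link_event_needs_processing() are hypothetical names, and the real driver reads this state from its own hardware structures.

```c
#include <stdbool.h>

/* Hypothetical snapshot of link status. */
struct link_snapshot {
    bool up;
    unsigned int speed_mbps;
};

/*
 * Return true when the link event still needs full processing,
 * i.e. when either the up/down state or the speed has changed.
 * Checking only 'up' would miss a speed renegotiation on a link
 * that stays up.
 */
bool link_event_needs_processing(const struct link_snapshot *prev,
                                 const struct link_snapshot *cur)
{
    if (prev->up != cur->up)
        return true;
    if (prev->speed_mbps != cur->speed_mbps)
        return true;
    return false;
}
```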

i40e: limit WoL and link settings to partition 1 · f0d8c733

Committed by Shannon Nelson on Dec 11, 2014

When in multi-function mode, e.g. Dell's NPAR, only partition 1
of each MAC is allowed to set WoL, speed, and flow control.

Change-ID: I87a9debc7479361c55a71f0120294ea319f23588
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Adding function for reading PBA String · 18f680c6

Committed by Kamil Krawczyk on Dec 11, 2014

Function will read PBA Block from Shadow RAM and return it in a string format.

Change-ID: I4ee7059f6e21bd0eba38687da15e772e0b4ab36e
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: find partition_id in npar mode · 9fee9db5

Committed by Shannon Nelson on Dec 11, 2014

When in NPAR mode the driver instance might be controlling the base
partition or one of the other "fake" PFs.  There are some things that
can only be done by the base partition, aka partition_id 1.  This code
does a bit of work to find how many partitions are there per port and
what is the current partition_id.

Change-ID: Iba427f020a1983d02147d86f121b3627e20ee21d
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove VN2VN related mac filters · b2d4d905

Committed by Vasu Dev on Dec 11, 2014

These MAC addresses are already added by the FCoE stack above netdev,
therefore adding them here is redundant.

Change-ID: Ia5b59f426f57efd20f8945f7c6cc5d741fbe06e5
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add warning for NPAR partitions with link speed less than 10Gbps · 148c2d80

Committed by Greg Rose on Dec 11, 2014

NPAR-enabled partitions should warn the user when the detected link speed is
less than 10 Gbps.

Change-ID: I7728bb8ce279bf0f4f755d78d7071074a4eb5f69
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: kick a stalled admin queue · 0758e7cb

Committed by Mitch A Williams on Dec 09, 2014

On some versions of the firmware, the VF admin send queue may become
stalled. In this case, the easiest solution is to just place another
descriptor on the queue; the firmware will then process both requests.

The early init code already accounts for this, but the runtime code does
not. In the watchdog task, check for the stall condition, and if it's
found, send our API version to the PF. When the PF replies, just ignore
the reply.

Change-ID: I380d78185a4f284d649c44d263e648afc9b4d50c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: enable interrupt 0 appropriately · 7235448c

Committed by Mitch A Williams on Dec 09, 2014

Don't enable vector 0 in the ISR, just schedule the adminq task and let
it enable the vector. This prevents the task from being called
reentrantly. Make sure that the vector is enabled on all exit paths of
the adminq task, including error exits.

Change-ID: I53f3d14f91ed7a9e90291ea41c681122a5eca5b5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: don't fire traffic IRQs when the interface is down · 4870e176

Committed by Mitch A Williams on Dec 09, 2014

There is always a possibility that MSI-X interrupts can get lost. To
keep this problem from stalling the driver, we fire all of our MSI-X
vectors during the watchdog routine. However, we should not fire the
traffic vectors when the interface is closed. In this case, just fire
vector 0, which is used for admin queue events.

As a result, we do not enable the interrupt cause for vector 0. This
can cause the admin queue handler to be called reentrantly, which
causes a scary "critical section violation" message to be logged,
even though no real damage is done.

Change-ID: Ic43a5184708ab2cb9a23fca7dedd808a46717795
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: remove leftover VLAN filters · 37dfdf37

Committed by Mitch A Williams on Dec 09, 2014

If we're using VLANs and communications with the PF fail during
shutdown, we will leak memory because not all of the VLAN filters will
be removed. To eliminate this possibility, go through the list again
right before the module is removed and delete any leftover entries.

Change-ID: Id3b5315c47ca0a61ae123a96ff345d010bc41aed
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

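The final cleanup pass described above amounts to draining whatever is still on the filter list. A minimal userspace sketch, assuming a singly linked list and hypothetical names (struct vlan_filter, add_filter(), free_leftover_filters()); the driver itself uses kernel list helpers and its own filter structure.

```c
#include <stdlib.h>

/* Hypothetical VLAN filter entry kept on a singly linked list. */
struct vlan_filter {
    unsigned short vlan;
    struct vlan_filter *next;
};

/* Push a new filter onto the list; returns 0 on success, -1 on failure. */
int add_filter(struct vlan_filter **head, unsigned short vlan)
{
    struct vlan_filter *f = malloc(sizeof(*f));

    if (!f)
        return -1;
    f->vlan = vlan;
    f->next = *head;
    *head = f;
    return 0;
}

/*
 * Free every entry still on the list and leave it empty.
 * Returns how many leftover entries were released, so a caller
 * can tell whether an earlier removal path missed anything.
 */
size_t free_leftover_filters(struct vlan_filter **head)
{
    size_t freed = 0;

    while (*head) {
        struct vlan_filter *f = *head;

        *head = f->next;
        free(f);
        freed++;
    }
    return freed;
}
```

Calling free_leftover_filters() a second time is harmless because the list head is already NULL.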

i40evf: refactor shutdown code · 53d0b3ae

Committed by Mitch A Williams on Dec 09, 2014

If the VF driver is running in the host, the shutdown code is completely
broken. We cannot wait in our down routine for the PF to respond to our
requests, as its admin queue task will never run while we hold the lock.

Instead, we schedule operations, then let the watchdog take care of
shutting things down. If the driver is being removed, then wait in the
remove routine until the watchdog is done before continuing.

Change-ID: I93a58d17389e8d6b58f21e430b56ed7b4590b2c5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

openvswitch: Remove unnecessary version.h inclusion · a440edf1

Committed by Syam Sidhardhan on Jan 12, 2015

version.h inclusion is not necessary as detected by versioncheck.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: Remove some scary log messages · d4f82fd3

Committed by Mitch A Williams on Dec 09, 2014

These messages may be triggered during normal init of the driver if the
PF or FW take a long time to respond. There's nothing really wrong, so
don't freak people out logging messages.

If the communication channel really is dead, then we'll retry a few
times and give up. This will log a different more scary message that
should cause consternation. This allows the user to more easily detect a
genuine failure.

Change-ID: I6e2b758d4234a3a09c1015c82c8f2442a697cbdb
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: remove redundant code · ff30cb6b

Committed by Mitch A Williams on Dec 09, 2014

These functions are redundant and duplicate functionality found in
i40evf_free_all_[tx|rx]_resources.

Change-ID: Ia199908926d7a1a4b8247f75f89b5da24c9b149c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: disable IOV before freeing resources · 6a9ddb36

Committed by Mitch A Williams on Dec 09, 2014

If VF drivers are loaded in the host OS, the call to pci_disable_sriov()
will cause these drivers' remove routines to be called. If the PF driver
has already freed VF resources before this happens, then the VF remove
routine can't properly communicate with the PF driver causing all sorts
of mayhem and error messages and hurt feelings.

To fix this, we move the call to pci_disable_sriov() up to the top of
the function and let it complete before freeing any VF resources.

Change-ID: I397c3997a00f6408e32b7735273911e499600236
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

tcp: avoid reducing cwnd when ACK+DSACK is received · 08abdffa

Committed by Sébastien Barré on Jan 12, 2015

With TLP, the peer may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.

This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.

The following packetdrill test, from Neal Cardwell, validates this patch:

// Establish a connection.
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

// Send 1 packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1:1001(1000) ack 1

// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Send another packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1001:2001(1000) ack 1

// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

Credits:
-Gregory helped in finding that tcp_process_tlp_ack was where the cwnd
got reduced in our MPTCP tests.
-Neal wrote the packetdrill test above
-Yuchung reworked the patch to make it more readable.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

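The decision described in this commit message can be modeled as a small classifier. The sketch below is written from the prose above, not copied from the kernel: tlp_classify_ack(), enum tlp_outcome and the two boolean inputs are hypothetical simplifications of what tcp_process_tlp_ack() works with, and the handling of pure duplicate ACKs is an assumption about the surrounding logic.

```c
#include <stdbool.h>
#include <stdint.h>

enum tlp_outcome {
    TLP_STILL_PENDING,    /* TLP episode continues */
    TLP_DONE_NO_LOSS,     /* episode ends, cwnd is left alone */
    TLP_DONE_REDUCE_CWND  /* episode ends, a loss is inferred */
};

/* Wraparound-safe sequence comparison: true if a is before b. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

enum tlp_outcome tlp_classify_ack(uint32_t ack, uint32_t tlp_high_seq,
                                  bool has_dsack, bool pure_dupack)
{
    /* ACKs below tlp_high_seq say nothing about the probe yet. */
    if (seq_before(ack, tlp_high_seq))
        return TLP_STILL_PENDING;

    /* ACK + DSACK: original segment and probe both arrived. */
    if (has_dsack)
        return TLP_DONE_NO_LOSS;

    /* ACK advanced past the probe without a DSACK: assume a loss. */
    if (seq_before(tlp_high_seq, ack))
        return TLP_DONE_REDUCE_CWND;

    /* ack == tlp_high_seq as a pure duplicate ACK: no loss either. */
    if (pure_dupack)
        return TLP_DONE_NO_LOSS;

    return TLP_STILL_PENDING;
}
```

In the packetdrill scenario, the ACK of 1001 carrying a DSACK ends the episode without a reduction, so the later ACK of 2001 is no longer evaluated against tlp_high_seq.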

Merge branch 'rhashtable-next' · 52e3ad9f

Committed by David S. Miller on Jan 13, 2015

Ying Xue says:

====================
remove nl_sk_hash_lock from netlink socket

After tipc socket successfully avoids the involvement of an extra lock
with rhashtable_lookup_insert(), it's possible for netlink socket to
remove its hash socket lock now. But as netlink socket needs a compare
function to look for an object, we first introduce a new function
called rhashtable_lookup_compare_insert() in commit #1 which is
implemented based on original rhashtable_lookup_insert(). We
subsequently remove nl_sk_hash_lock from netlink socket with the newly
introduced function in commit #2. Lastly, as Thomas requested, we add
commit #3 to note that implementations of the grow and shrink decision
functions must enforce min/max shift.

v2:
 As Thomas pointed out, there was a race between checking portid and
 then setting it in commit #2. Now use socket lock to make the process
 of both checking and setting portid atomic, and then eliminate the
 race.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: add a note for grow and shrink decision functions · 6f73d3b1

Committed by Ying Xue on Jan 12, 2015

As commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for
worker queue") moves condition statements of verifying whether hash
table size exceeds its maximum threshold or reaches its minimum
threshold from resizing functions to resizing decision functions,
we should add a note in rhashtable.h stating that implementations
of the grow and shrink decision functions must enforce min/max
shift; otherwise, the watermarks set by min/max shift do not take
effect.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: eliminate nl_sk_hash_lock · c5adde94

Committed by Ying Xue on Jan 12, 2015

As rhashtable_lookup_compare_insert() can guarantee the process
of search and insertion is atomic, it's safe to eliminate the
nl_sk_hash_lock. After this, object insertion or removal will
be protected with per bucket lock on write side while object
lookup is guarded with rcu read lock on read side.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: involve rhashtable_lookup_compare_insert routine · 7a868d1e

Committed by Ying Xue on Jan 12, 2015

Introduce a new function called rhashtable_lookup_compare_insert()
which is very similar to rhashtable_lookup_insert(). But the former
makes use of a user-supplied compare function to look for an object,
and then inserts it into the hash table only if no match is found. As
the entire process of search and insertion is under protection of a
per bucket lock, this can help users avoid the involvement of an extra
lock.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

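A minimal userspace sketch of the lookup-then-insert semantics described above, assuming a fixed-size table, singly linked chains, and a C11 atomic_flag spinlock standing in for the per bucket lock. All names here (struct table, lookup_compare_insert(), key_equals()) are hypothetical; this does not reproduce the kernel rhashtable API, its RCU read side, or its resizing logic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8

struct node {
    unsigned int key;
    struct node *next;
};

struct bucket {
    atomic_flag lock;   /* stand-in for the per bucket lock */
    struct node *head;
};

struct table {
    struct bucket b[NBUCKETS];
};

/* Caller-supplied match function, as in the commit description. */
typedef bool (*cmp_fn)(const struct node *n, const void *arg);

/* Example compare function: match on the key pointed to by arg. */
bool key_equals(const struct node *n, const void *arg)
{
    return n->key == *(const unsigned int *)arg;
}

void table_init(struct table *t)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++) {
        atomic_flag_clear(&t->b[i].lock);
        t->b[i].head = NULL;
    }
}

/*
 * Search the bucket with 'cmp'; insert 'obj' only if nothing matches.
 * Search and insertion happen under the same bucket lock, so two
 * callers cannot both conclude "not found" and insert duplicates.
 * Returns true if 'obj' was inserted.
 */
bool lookup_compare_insert(struct table *t, struct node *obj,
                           cmp_fn cmp, const void *arg)
{
    struct bucket *b = &t->b[obj->key % NBUCKETS];
    struct node *n;
    bool inserted = false;

    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin until the bucket lock is free */

    for (n = b->head; n; n = n->next)
        if (cmp(n, arg))
            goto out;   /* a matching object already exists */

    obj->next = b->head;
    b->head = obj;
    inserted = true;
out:
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
    return inserted;
}
```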

13 Jan, 2015 16 commits

Merge branch 'tuntap_queues' · d2c60b13

Committed by David S. Miller on Jan 12, 2015

Pankaj Gupta says:

====================
Increase the limit of tuntap queues

Networking under KVM works best if we allocate a per-vCPU rx and tx
queue in a virtual NIC. This requires a per-vCPU queue on the host side.
Modern physical NICs have multiqueue support for large number of queues.
To scale a vNIC to run as many queues in parallel as the maximum number of vCPUs,
we need to increase the number of queues supported in tuntap.

Changes from v4:
PATCH2: Michael.S.Tsirkin - Updated change comment message.

Changes from v3:
PATCH1: Michael.S.Tsirkin - Some cleanups and updated commit message.
                            Perf numbers on 10 Gbs NIC
Changes from v2:
PATCH 3: David Miller     - flex array adds extra level of indirection
                            for preallocated array.(dropped, as flow array
                            is allocated using kzalloc with failover to vzalloc).
Changes from v1:
PATCH 2: David Miller     - sysctl changes to limit number of queues
                            not required for unprivileged users(dropped).

Changes from RFC
PATCH 1: Sergei Shtylyov  - Add an empty line after declarations.
PATCH 2: Jiri Pirko -       Do not introduce new module parameters.
	 Michael.S.Tsirkin- We can use sysctl for limiting max number
                            of queues.

This series is to increase the number of tuntap queues. Original work is being
done by 'jasowang@redhat.com'. I am taking this 'https://lkml.org/lkml/2013/6/19/29'
patch series as a reference. As per discussion in the patch series:

There were two reasons which prevented us from increasing number of tun queues:

- The netdev_queue array in netdevice was allocated through kmalloc, which may
  cause a high order memory allocation when we have several queues.
  E.g. sizeof(netdev_queue) is 320, which means a high order allocation would
  happen when the device has more than 16 queues.

- We store the hash buckets in tun_struct, which results in a very large
  tun_struct; this high order memory allocation fails easily when the memory is
  fragmented.

The patch 60877a32 increases the number of tx
queues. Memory allocation falls back to vzalloc() when kmalloc() fails.

This series tries to address following issues:

- Increase the number of netdev_queue queues for rx, similarly to how it is done
  for tx queues, by falling back to vzalloc() when allocation with kmalloc() fails.

- Increase the number of queues to 256; the maximum number is equal to the maximum
  number of vCPUs allowed in a guest.

I have also done testing with multiple parallel Netperf sessions for different
combinations of queues and CPUs. It seems to work fine, without much increase
in CPU load as the number of queues increases. I also see a good increase in throughput
with more queues, though I was limited to 8 physical CPUs.

For this test: Two Hosts(Host1 & Host2) are directly connected with cable
Host1 is running Guest1. Data is sent from Host2 to Guest1 via Host1.

Host kernel: 3.19.0-rc2+, AMD Opteron(tm) Processor 6320
NIC : Emulex Corporation OneConnect 10Gb NIC (be3)

Patch Applied  %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle  throughput
Single Queue, 2 vCPU's
-------------
Before Patch :all    0.19    0.00    0.16    0.07    0.04    0.10    0.00    0.18    0.00   99.26  57864.18
After  Patch :all    0.99    0.00    0.64    0.69    0.07    0.26    0.00    1.58    0.00   95.77  57735.77

With 2 Queues, 2 vCPU's
---------------
Before Patch :all    0.19    0.00    0.19    0.10    0.04    0.11    0.00    0.28    0.00   99.08  63083.09
After  Patch :all    0.87    0.00    0.73    0.78    0.09    0.35    0.00    2.04    0.00   95.14  62917.03

With 4 Queues, 4 vCPU's
--------------
Before Patch :all    0.20    0.00    0.21    0.11    0.04    0.12    0.00    0.32    0.00   99.00  80865.06
After  Patch :all    0.71    0.00    0.93    0.85    0.11    0.51    0.00    2.62    0.00   94.27  86463.19

With 8 Queues, 8 vCPU's
--------------
Before Patch :all    0.19    0.00    0.18    0.09    0.04    0.11    0.00    0.23    0.00   99.17  86795.31
After  Patch :all    0.65    0.00    1.18    0.93    0.13    0.68    0.00    3.38    0.00   93.05  89459.93

With 16 Queues, 8 vCPU's
--------------
After  Patch :all    0.61    0.00    1.59    0.97    0.18    0.92    0.00    4.32    0.00   91.41  120951.60
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tuntap: Increase the number of queues in tun. · baf71c5c

Committed by Pankaj Gupta on Jan 12, 2015

Networking under kvm works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues.
Preceding patch: 'net: allow large number of rx queues'
made sure this won't cause failures due to high order memory
allocations. Increase it to 256: this is the max number of vCPUs
KVM supports.

Size of tun_struct changes from 8512 to 10496 after this patch. This keeps
pages allocated for tun_struct before and after the patch to 3.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

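The page count quoted above can be checked with a ceiling division, assuming 4096-byte pages (an assumption; the commit message does not state the page size). The helper name pages_needed() is illustrative.

```c
#include <stddef.h>

/* Number of whole pages needed to hold 'bytes' (ceiling division). */
size_t pages_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

With 4096-byte pages, both 8512 and 10496 bytes fall between two and three pages, so each rounds up to 3.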

net: allow large number of rx queues · 10595902

Committed by Pankaj Gupta on Jan 12, 2015

netif_alloc_rx_queues() uses kcalloc() to allocate memory
for "struct netdev_queue *_rx" array.
If we are doing large rx queue allocation kcalloc() might
fail, so this patch does a fallback to vzalloc().
Similar implementation is done for tx queue allocation in
netif_alloc_netdev_queues().

We avoid failure of high order memory allocation
with the help of vzalloc(), this allows us to do large
rx and tx queue allocation which in turn helps us to
increase the number of queues in tun.

As vmalloc() adds overhead on a critical network path,
__GFP_REPEAT flag is used with kzalloc() to do this fallback
only when really needed.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

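The allocate-then-fall-back pattern described above can be sketched in userspace. This is an analogy only: the kernel code uses kcalloc()/kzalloc() and vzalloc(), while the sketch below takes two hypothetical allocator callbacks (alloc_fn, always_fail()) so the fallback path can be exercised with standard malloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocator callback type; stands in for kzalloc()/vzalloc(). */
typedef void *(*alloc_fn)(size_t bytes);

/* Demo stand-in for a primary allocator that cannot satisfy the request. */
void *always_fail(size_t bytes)
{
    (void)bytes;
    return NULL;
}

/*
 * Allocate a zeroed array of n elements. Try the primary allocator
 * first and use the fallback only when the primary fails, so the
 * slower path is taken only when really needed.
 */
void *zalloc_array_with_fallback(size_t n, size_t size,
                                 alloc_fn primary, alloc_fn fallback)
{
    size_t bytes;
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would overflow */
    bytes = n * size;

    p = primary(bytes);
    if (!p)
        p = fallback(bytes);    /* reached only when the primary fails */
    if (p)
        memset(p, 0, bytes);
    return p;
}
```

A caller that allocates through two different allocators also has to free through the matching one; the kernel patch has to handle that on the release side as well, which this sketch leaves out.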

team: Remove dead code · e350a96e

Committed by Kenneth Williams on Jan 11, 2015

The deleted lines are called from a function which is called:
1) Only through __team_options_register via team_options_register and
2) Only during initialization / mode initialization when there are no
ports attached.
Therefore the ports list is guaranteed to be empty and this code will
never be executed.
Signed-off-by: Kenneth Williams <ken@williamsclan.us>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bnx2x: avoid macro redefinition · b1e8bc61

Committed by David Decotigny on Jan 11, 2015

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: sch_teql: Remove unused function · ddcde70c

Committed by Rickard Strandqvist on Jan 11, 2015

Remove the function teql_neigh_release() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xfrm: xfrm_algo: Remove unused function · 83400b99

Committed by Rickard Strandqvist on Jan 11, 2015

Remove the function aead_entries() that is not used anywhere.


Merge branch 'bridge_vlan_ranges' · d0d2cc53

Committed by David S. Miller on Jan 12, 2015

Roopa Prabhu says:

====================
bridge: support for vlan range in setlink/dellink

This series adds new flags in IFLA_BRIDGE_VLAN_INFO to indicate
vlan range.

Will post corresponding iproute2 patches if these get accepted.

v1-> v2
    - changed patches to use a nested list attribute
    IFLA_BRIDGE_VLAN_INFO_LIST as suggested by Scott Feldman
    - dropped notification changes from the series. Will post them
    separately after this range message is accepted.

v2 -> v3
    - incorporated some review feedback
    - include patches to fill vlan ranges during getlink
    - Dropped IFLA_BRIDGE_VLAN_INFO_LIST. I think it may get
    confusing to userspace if we introduce yet another way to
    send lists. With getlink already sending nested
    IFLA_BRIDGE_VLAN_INFO in IFLA_AF_SPEC, it seems better to
    use the existing format for lists and just use the flags from v2
    to mark vlan ranges
====================
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: new function to pack vlans into ranges during gets · 36cd0ffb

Committed by Roopa Prabhu on Jan 10, 2015

This patch adds a new function to pack vlans into ranges
wherever applicable using the flags BRIDGE_VLAN_INFO_RANGE_BEGIN
and BRIDGE_VLAN_INFO_RANGE_END.

Old vlan packing code is moved to a new function and continues to be
called when filter_mask is RTEXT_FILTER_BRVLAN.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

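Packing consecutive VLAN ids into ranges can be sketched as follows. This is an illustrative model, not the kernel function: struct vlan_info, pack_vlan_ranges(), and the VLAN_RANGE_BEGIN / VLAN_RANGE_END bit values are assumptions made for the example. The sketch also assumes the input is sorted by vid and that only entries with identical flags are merged.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
#define VLAN_RANGE_BEGIN (1u << 3)
#define VLAN_RANGE_END   (1u << 4)

struct vlan_info {
    uint16_t flags;
    uint16_t vid;
};

/* Append either a single entry or a BEGIN/END pair; returns the new count. */
static size_t emit_range(struct vlan_info *out, size_t n,
                         uint16_t start, uint16_t end, uint16_t flags)
{
    if (start == end) {
        out[n].vid = start;
        out[n].flags = flags;
        return n + 1;
    }
    out[n].vid = start;
    out[n].flags = flags | VLAN_RANGE_BEGIN;
    out[n + 1].vid = end;
    out[n + 1].flags = flags | VLAN_RANGE_END;
    return n + 2;
}

/*
 * Compress a vid-sorted list into ranges. 'out' must have room for
 * n_in entries, which is the worst case (no two entries mergeable).
 * Returns the number of entries written.
 */
size_t pack_vlan_ranges(const struct vlan_info *in, size_t n_in,
                        struct vlan_info *out)
{
    size_t i, n_out = 0;
    uint16_t start = 0, end = 0, flags = 0;

    for (i = 0; i < n_in; i++) {
        /* Extend the current range if the vid is adjacent and flags match. */
        if (i > 0 && in[i].vid == end + 1 && in[i].flags == flags) {
            end = in[i].vid;
            continue;
        }
        if (i > 0)
            n_out = emit_range(out, n_out, start, end, flags);
        start = end = in[i].vid;
        flags = in[i].flags;
    }
    if (n_in > 0)
        n_out = emit_range(out, n_out, start, end, flags);
    return n_out;
}
```

For example, vids 1, 2, 3 with the same flags collapse into two entries (begin 1, end 3), while an isolated vid stays a single entry without range flags.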

rtnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED · 35a27cee

Committed by Roopa Prabhu on Jan 10, 2015

This filter is the same as RTEXT_FILTER_BRVLAN except that it tries
to compress the consecutive vlans into ranges.

This helps on systems with large number of configured vlans.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: support for multiple vlans and vlan ranges in setlink and dellink requests · bdced7ef

Committed by Roopa Prabhu on Jan 10, 2015

This patch changes bridge IFLA_AF_SPEC netlink attribute parser to
look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
userspace to pack more than one vlan in the setlink msg.

The dumps were already sending more than one vlan info in the getlink msg.

This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END to indicate start and end of vlan range

This patch also deletes unused ifla_br_policy.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: xen-netfront: remove residual dead code · dd2e8bf5

Committed by Vincenzo Maffione on Jan 10, 2015

This patch removes some unused arrays from the netfront private
data structures. These arrays were used in "flip" receive mode.
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate · 5ec82c1e

Committed by Shrikrishna Khare on Jan 09, 2015

Failing to reinitialize on wakeup results in loss of network connectivity for
vmxnet3 interface.
Signed-off-by: Srividya Murali <smurali@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: cleanup bond_opts array · 7bfa0145

Committed by Jonathan Toppins on Jan 09, 2015

Remove the empty array element initializer and size the array with
BOND_OPT_LAST so the compiler will complain if more elements are in
there than should be.

An interesting unwanted side effect of this initializer is that if one
inserts new options into the middle of the array then this initializer
will zero out the option that equals BOND_OPT_TLB_DYNAMIC_LB+1.

Example:
Extend the OPTS enum:
enum {
   ...
   BOND_OPT_TLB_DYNAMIC_LB,
   BOND_OPT_LACP_NEW1,
   BOND_OPT_LAST
};

Now insert into bond_opts array:
static const struct bond_option bond_opts[] = {
      ...
      [BOND_OPT_LACP_RATE] = { .... unchanged stuff .... },
      [BOND_OPT_LACP_NEW1] = { ... new stuff ... },
      ...
      [BOND_OPT_TLB_DYNAMIC_LB] = { .... unchanged stuff ....},
      { } // MARK A
};

Since BOND_OPT_LACP_NEW1 = BOND_OPT_TLB_DYNAMIC_LB+1, the last
initializer (MARK A) will overwrite the contents of BOND_OPT_LACP_NEW1
and can be easily viewed with the crash utility.
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

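The overwrite described above follows from how C designated initializers work: an initializer without a designator applies to the element after the most recently designated one, and a later initializer for the same element replaces an earlier one. A self-contained sketch with hypothetical names (OPT_TAIL plays the role of BOND_OPT_TLB_DYNAMIC_LB and OPT_NEW the role of BOND_OPT_LACP_NEW1):

```c
#include <stddef.h>

enum {
    OPT_FIRST,
    OPT_TAIL,   /* previously the last real option */
    OPT_NEW,    /* newly added, so OPT_NEW == OPT_TAIL + 1 */
    OPT_LAST
};

struct opt {
    int id;
    const char *name;
};

/* Old style: unsized array ending with an empty terminator element. */
const struct opt broken_opts[] = {
    [OPT_FIRST] = { 1, "first" },
    [OPT_NEW]   = { 3, "new" },
    [OPT_TAIL]  = { 2, "tail" },
    { 0 },      /* lands at OPT_TAIL + 1 == OPT_NEW and zeroes it */
};

/* New style: sized with OPT_LAST, no trailing terminator element. */
const struct opt fixed_opts[OPT_LAST] = {
    [OPT_FIRST] = { 1, "first" },
    [OPT_NEW]   = { 3, "new" },
    [OPT_TAIL]  = { 2, "tail" },
};
```

Some compilers can report the silent overwrite in broken_opts (for example GCC with -Woverride-init), while sizing the array with OPT_LAST makes an extra element beyond the enum a hard error.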

Merge branch 'tipc-namespaces' · d9fbfb94

Committed by David S. Miller on Jan 12, 2015

Ying Xue says:

====================
tipc: make tipc support namespace

This patchset aims to add net namespace support for TIPC stack.

Currently TIPC module declares the following global resources:
- TIPC network identification number
- TIPC node table
- TIPC bearer list table
- TIPC broadcast link
- TIPC socket reference table
- TIPC name service table
- TIPC node address
- TIPC service subscriber server
- TIPC random value
- TIPC netlink

For TIPC to be namespace-aware, each resource above must be
allocated, initialized and destroyed per namespace. Therefore,
the major work of this patchset is to isolate these global resources
and make them private to each namespace. However, before these changes
can be made, some necessary preparation work must first be done: convert
the socket reference table to the generic rhashtable, clean up the core.c and
core.h files, remove unnecessary wrapper functions for kernel timer
interfaces and so on.

It should be noted that commit #1 ("tipc: fix bug in broadcast
retransmit code") was already submitted to 'net' tree, so please see
below link:

http://patchwork.ozlabs.org/patch/426717/

Since it is a prerequisite for the rest of the series to apply, I
prepend it to the series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: make netlink support net namespace · d49e2041

Committed by Ying Xue on Jan 09, 2015

Currently tipc module only allows users sitting on "init_net" namespace
to configure it through netlink interface. But now almost each tipc
component is able to be aware of net namespace, so it's time to open
the permission for users residing in other namespaces, allowing them
to configure their own tipc stack instance through netlink interface.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>