提交 · 0e033e04c2678dbbe74a46b23fffb7bb918c288e · openanolis / cloud-kernel

06 11月, 2013 8 次提交

ipv6: fix headroom calculation in udp6_ufo_fragment · 0e033e04

由 Hannes Frederic Sowa 提交于 11月 05, 2013

Commit 1e2bd517 ("udp6: Fix udp
fragmentation for tunnel traffic.") changed the calculation if
there is enough space to include a fragment header in the skb from a
skb->mac_header dervived one to skb_headroom. Because we already peeled
off the skb to transport_header this is wrong. Change this back to check
if we have enough room before the mac_header.

This fixes a panic Saran Neti reported. He used the tbf scheduler which
skb_gso_segments the skb. The offsets get negative and we panic in memcpy
because the skb was erroneously not expanded at the head.
Reported-by: NSaran Neti <Saran.Neti@telus.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e033e04

net: mv643xx_eth: Add missing phy_addr_set in DT mode · 1cce16d3

由 Jason Gunthorpe 提交于 11月 04, 2013

Commit cc9d4598 'net: mv643xx_eth: use of_phy_connect if phy_node
present' made the call to phy_scan optional, if the DT has a link to
the phy node.

However phy_scan has the side effect of calling phy_addr_set, which
writes the phy MDIO address to the ethernet controller. If phy_addr_set
is not called, and the bootloader has not set the correct address then
the driver will fail to function.

Tested on Kirkwood.
Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: NSebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: NArnaud Ebalard <arno@natisbad.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cce16d3

ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE · 482fc609

由 Hannes Frederic Sowa 提交于 11月 05, 2013

Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: NFlorian Weimer <fweimer@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

482fc609

Merge branch 'huawei_cdc_ncm' · b9155501

由 David S. Miller 提交于 11月 05, 2013

Bjørn Mork says:

====================
The huawei_cdc_ncm driver.

Enrico has been kind enough to let me repost his driver with the changes
requested by Oliver Neukum during the last review of this series.

The changes I have made from Enricos original v5 series to this version
are:

v6:
 - fix to avoid corrupting drvstate->pmcount
 - fix error return value from huawei_cdc_ncm_suspend()
 - drop redundant testing for subdriver->suspend during resume
 - broke a few lines to keep within the 80 columns recommendation
 - rebased on top of current net-next

Enrico's orginal introduction to the v5 series follows below.  It explains
the background much better than I can.

Bjørn

[quote Enrico Mioso]

So this is a new, revised, edition of the huawei_cdc_ncm.c driver, which
supports devices resembling the NCM standard, but using it also as a mean
to encapsulate other protocols, as is the case for the Huawei E3131 and
E3251 modem devices.
Some precisations are needed however - and I encourage discussion on this: and
that's why I'm sending this message with a broader CC.
Merging those patches might change:
- the way Modem Manager interacts with those devices
- some regressions might be possible if there are some unknown firmware
  variants around (Franko?)

First of all: I observed the behaviours of two devices.
Huawei E3131: this device doesn't accept NDIS setup requests unless they're
sent via the embedded AT channel exposed by this driver.
So actually we gain funcionality in this case!

The second case, is the Huawei E3251: which works with standard NCM driver,
still exposing an AT embedded channel. Whith this patch set applied, you gain
some funcionality, loosing the ability to catch standard NCM events for now.
The device will work in both ways with no problems, but this has to be
acknowledged and discussed. Might be we can develop this driver further to
change this, when more devices are tested.

We where thinking Huawei changed their interfaces on new devices - but probably
this driver only works around a nice firmware bug present in E3131, which
prevented the modem from being used in NDIS mode.

I think committing this is definitely wortth-while, since it will allow for
more Huawei devices to be used without serial connection. Some devices like the
E3251 also, reports some status information only via the embedded AT channel,
at least in my case.
Note: I'm not subscribed to any list except the Modem Manager's one, so please
CC me, thanks!!

[/quote]

Enrico Mioso (3):
  net: cdc_ncm: Export cdc_ncm_{tx,rx}_fixup functions for re-use
  net: huawei_cdc_ncm: Introduce the huawei_cdc_ncm driver
  net: cdc_ncm: remove non-standard NCM device IDs
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9155501

net: cdc_ncm: remove non-standard NCM device IDs · 9fea037d

由 Enrico Mioso 提交于 11月 04, 2013

Remove device IDs of NCM-like (but not NCM-conformant) devices, that are
handled by the huawwei_cdc_ncm driver now.
Signed-off-by: NEnrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fea037d

net: huawei_cdc_ncm: Introduce the huawei_cdc_ncm driver · 41c47d8c

由 Enrico Mioso 提交于 11月 04, 2013

This driver supports devices using the NCM protocol as an encapsulation layer
for other protocols, like the E3131 Huawei 3G modem. This drivers approach was
heavily inspired by the qmi_wwan/cdc_mbim approach & code model.
Signed-off-by: NEnrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41c47d8c

net: cdc_ncm: Export cdc_ncm_{tx, rx}_fixup functions for re-use · 2f69702c

由 Enrico Mioso 提交于 11月 04, 2013

Some drivers implementing NCM-like protocols, may re-use those functions, as is
the case in the huawei_cdc_ncm driver.
Export them via EXPORT_SYMBOL_GPL, in accordance with how other functions have
been exported.
Signed-off-by: NEnrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f69702c

ipv6: remove old conditions on flow label sharing · b579035f

由 Florent Fourcot 提交于 11月 02, 2013

The code of flow label in Linux Kernel follows
the rules of RFC 1809 (an informational one) for
conditions on flow label sharing. There rules are
not in the last proposed standard for flow label
(RFC 6437), or in the previous one (RFC 3697).

Since this code does not follow any current or
old standard, we can remove it.

With this removal, the ipv6_opt_cmp function is
now a dead code and it can be removed too.

Changelog to v1:
 * add justification for the change
 * remove the condition on IPv6 options

[ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ]
Signed-off-by: NFlorent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b579035f

05 11月, 2013 32 次提交

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · cfce0a2b

由 David S. Miller 提交于 11月 05, 2013

John W. Linville says:

====================
Please accept the following pull request intended for the 3.13 tree...

I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling.  On the bright
side, at least these have been baking in linux-next for some time!

For the mac80211 bits, Johannes says:

"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."

For the iwlwifi bits, Johannes says:

"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."

And:

"For this round, I have a lot of changes:
 * power management improvements
 * BT coexistence improvements/updates
 * new device support
 * VHT support
 * IBSS support (though due to a small bug it requires new firmware)
 * various other fixes/improvements."

For the Bluetooth bits, Gustavo says:

"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."

For the ath10k and ath6kl bits, Kalle says:

"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.

Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."

For the NFC bits, Samuel says:

"It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates.  A few other drivers get hit here or there as well.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfce0a2b

virtio-net: coalesce rx frags when possible during rx · ba275241

由 Jason Wang 提交于 11月 01, 2013

Commit 2613af0e (virtio_net: migrate mergeable
rx buffers to page frag allocators) try to increase the payload/truesize for
MTU-sized traffic. But this will introduce the extra overhead for GSO packets
received because of the frag list. This commit tries to reduce this issue by
coalesce the possible rx frags when possible during rx. Test result shows the
about 15% improvement on full size GSO packet receiving (and even better than
before commit 2613af0e).

Before this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
() port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    20303.87

After this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
() port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    23841.26

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba275241

net: introduce skb_coalesce_rx_frag() · f8e617e1

由 Jason Wang 提交于 11月 01, 2013

Sometimes we need to coalesce the rx frags to avoid frag list. One example is
virtio-net driver which tries to use small frags for both MTU sized packet and
GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8e617e1

vxlan: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)) · e50fddc8

由 Duan Jiong 提交于 11月 01, 2013

trivial patch converting ERR_PTR(PTR_ERR()) into ERR_CAST().
No functional changes.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e50fddc8

net: codel: Avoid undefined behavior from signed overflow · 1ba3aab3

由 Jesper Dangaard Brouer 提交于 10月 31, 2013

As described in commit 5a581b36 (jiffies: Avoid undefined
behavior from signed overflow), according to the C standard
3.4.3p3, overflow of a signed integer results in undefined
behavior.

To fix this, do as the above commit, and do an unsigned
subtraction, and interpreting the result as a signed
two's-complement number.  This is based on the theory from
RFC 1982 and is nicely described in wikipedia here:
 https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution

A side-note, I have seen practical issues with the previous logic
when dealing with 16-bit, on a 64-bit machine (gcc version
4.4.5). This were 32-bit, which I have not observed issues with.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NJesper Dangaard Brouer <netoptimizer@brouer.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ba3aab3

Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next · 13521a57

由 David S. Miller 提交于 11月 04, 2013

Marc Kleine-Budde says:

====================
here's a pull request for net-next.

It includes a patch by Oliver Hartkopp et al. that adds documentation
for the broadcast manager to Documentation/networking/can.txt. Three
patches by me that clean up the netlink handling code in the CAN core.
And another patch that removes a not needed function from the ti_hecc
driver.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13521a57

tcp: properly handle stretch acks in slow start · 9f9843a7

由 Yuchung Cheng 提交于 10月 31, 2013

Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc).  But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.

In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f9843a7

tcp: enable sockets to use MSG_FASTOPEN by default · 0d41cca4

由 Yuchung Cheng 提交于 10月 31, 2013

Applications have started to use Fast Open (e.g., Chrome browser has
such an optional flag) and the feature has gone through several
generations of kernels since 3.7 with many real network tests. It's
time to enable this flag by default for applications to test more
conveniently and extensively.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d41cca4

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · f8785c55

由 David S. Miller 提交于 11月 04, 2013

Pablo Neira Ayuso says:

====================
This batch contains fives nf_tables patches for your net-next tree,
they are:

* Fix possible use after free in the module removal path of the
  x_tables compatibility layer, from Dan Carpenter.

* Add filter chain type for the bridge family, from myself.

* Fix Kconfig dependencies of the nf_tables bridge family with
  the core, from myself.

* Fix sparse warnings in nft_nat, from Tomasz Bursztyka.

* Remove duplicated include in the IPv4 family support for nf_tables,
  from Wei Yongjun.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8785c55

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 72c39a0a

由 David S. Miller 提交于 11月 04, 2013

Pablo Neira Ayuso says:

====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:

* Six patches to make the ipt_CLUSTERIP target support netnamespace,
  from Gao feng.

* Two cleanups for the nf_conntrack_acct infrastructure, introducing
  a new structure to encapsulate conntrack counters, from Holger
  Eitzenberger.

* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.

* Skip checksum recalculation in SCTP support for IPVS, also from
  Daniel Borkmann.

* Fix behavioural change in xt_socket after IP early demux, from
  Florian Westphal.

* Fix bogus large memory allocation in the bitmap port set type in ipset,
  from Jozsef Kadlecsik.

* Fix possible compilation issues in the hash netnet set type in ipset,
  also from Jozsef Kadlecsik.

* Define constants to identify netlink callback data in ipset dumps,
  again from Jozsef Kadlecsik.

* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
  from Eric Dumazet.

* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.

* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
  cleanup path, from Julian Anastasov.

* Save some cycles in ip6t_REJECT by skipping checksum validation in
  packets leaving from our stack, from Stanislav Fomichev.

* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
  Julian Anastasov.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72c39a0a

netfilter: nft_compat: use _safe version of list_for_each · c359c415

由 Dan Carpenter 提交于 11月 04, 2013

We need to use the _safe version of list_for_each_entry() here otherwise
we have a use after free bug.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c359c415

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 6fcf018a

由 David S. Miller 提交于 11月 04, 2013

Jesse Gross says:

====================
Open vSwitch

A set of updates for net-next/3.13. Major changes are:
 * Restructure flow handling code to be more logically organized and
   easier to read.
 * Rehashing of the flow table is moved from a workqueue to flow
   installation time. Before, heavy load could block the workqueue for
   excessive periods of time.
 * Additional debugging information is provided to help diagnose megaflows.
 * It's now possible to match on TCP flags.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6fcf018a

Merge branch 'mlx4' · 5a6e55c4

由 David S. Miller 提交于 11月 04, 2013

Or Gerlitz says:

====================
Mellanox driver updates

This patch set from Jack Morgenstein does the following:

1. Fix MAC/VLAN SRIOV implementation, and add wrapper functions for VLAN allocation
   and de-allocation (patches 1-6).

2. Implements resource quotas when running under SRIOV (patches 7-10).
   Patch 7 is a small bug fix, and patches 8-10 implement the quotas.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

The series is against net-next commit ba486502 "ipv6: remove the unnecessary statement in find_match()"

changes from V0:
 - dropped the 1st patch which needs to go to -stable and hence through net,
   not net-next
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a6e55c4

net/mlx4_core: Implement resource quota enforcement · 146f3ef4

由 Jack Morgenstein 提交于 11月 03, 2013

Implements resource quota grant decision when resources are requested,
for the following resources:  QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
and Counters.

When granting a resource, the quota system increases the allocated-count
for that slave.

When the slave later frees the resource, its allocated-count is reduced.

A spinlock is used to protect the integrity of each resource's free-pool counter.
(One slave may be in the process of being granted a resource while another
slave has crashed, initiating cleanup of that slave's resource quotas).
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

146f3ef4

net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper · eb456a68

由 Jack Morgenstein 提交于 11月 03, 2013

In current kernels, the mlx4 driver running on a VM does not
differentiate between max resource numbers for the HCA and
max quotas -- it simply takes the quota values passed to it
as max-resource values.

However, the driver actually requires the VFs to be aware of
the actual number of resources that the HCA was initialized with,
for QPs, CQs, SRQs and MPTs.

For QPs, CQs and SRQs, the reason is that in completion handling
the driver must know which of the 24 bits are the actual resource
number, and which are "padding" bits.

For MPTs, also, the driver assumes knowledge of the number of MPTs
in the system.

The previous commit fixes the quota logic on the VM for the quota values
passed to it by QUERY_FUNC_CAPS.

For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
from QUERY_HCA (and not QUERY_FUNC_CAPS).  The quotas passed
in QUERY_FUNC_CAPS are used to report max resource number values
in the response to ib_query_device.

However, the Hypervisor driver must consider that VMs
may be running previous kernels, and compatibility must be preserved.

To resolve the incompatibility with previous kernels running on VMs,
we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
deprecated fields, we pass the max-resource values from INIT_HCA

The quota fields are moved to a new location, and the current kernel
driver takes the proper values from that location. There is
also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
if this flag is set, the (VM) driver takes the quota values from the
new location.

VMs running previous kernels will work properly, except that the max resource
numbers reported in ib_query_device for these resources will be
too high.  The Hypervisor driver will, however, enforce the quotas
for these VMs.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb456a68

mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61

由 Jack Morgenstein 提交于 11月 03, 2013

This is step #1 for implementing SRIOV resource quotas for VFs.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                             VLAN, and Counters.

The quota system works as follows:
Each entity (VF or PF) is given a max number of a given resource (its quota),
and a guaranteed minimum number for each resource (starvation prevention).

For QPs, CQs, SRQs, MPTs and MTTs:
50% of the available quantity for the resource is divided equally among
the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
The other 50% is the "free pool", allocated on a first-come-first-serve basis.
For each VF/PF, resources are first allocated from its "guaranteed-minimum"
pool. When that pool is exhausted, the driver attempts to allocate from
the resource "free-pool".

The quota (i.e., max) for the VFs and the PF is:
  The free-pool amount (50% of the real max) + the guaranteed minimum

For MACs:
  Guarantee 2 MACs per VF/PF per port. As a result, since we have only
  128 MACs per port, reduce the allowable number of VFs from 64 to 63.
  Any remaining MACs are put into a free pool.

For VLANs:
  For the PF, the per-port quota is 128 and guarantee is 64
     (to allow the PF to register at least a VLAN per VF in VST mode).
  For the VFs, the per-port quota is 64 and the guarantee is 0.
      We assume that VGT VFs are trusted not to abuse the VLAN resource.

For Counters:
  For all functions (PF and VFs), the quota is 128 and the guarantee is 0.

In this patch, we define the needed structures, which are added to the
resource-tracker struct.  In addition, we do initialization
for the resource quota, and adjust the query_device response to use quotas
rather than resource maxima.

As part of the implementation, we introduce a new field in
mlx4_dev: quotas.  This field holds the resource quotas used
to report maxima to the upper layers (ib_core, via query_device).

The HCA maxima of these values are passed to the VFs (via
QUERY_HCA) so that they may continue to use these in handling
QPs, CQs, SRQs and MPTs.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a0d0a61

net/mlx4_core: Fix checking order in MR table init · a30f1bc5

由 Jack Morgenstein 提交于 11月 03, 2013

In procedure mlx4_init_mr_table(), slaves should do no processing,
but should return success. This initialization is hypervisor-only.

However, the check for num_mpts being a power-of-2 was performed
before the check to return immediately if the driver is for a slave.
This resulted in spurious failures.

The order of performing the checks is reversed, so that if the
driver is for a slave, no processing is done and success is returned.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a30f1bc5

net/mlx4_core: Don't fail reg/unreg vlan for older guests · 2c957ff2

由 Jack Morgenstein 提交于 11月 03, 2013

In upstream kernels under SRIOV, the vlan register/unregister calls
were NOPs (doing nothing and returning OK). We detect these old
calls from guests (via the comm channel), since previously the
port number in mlx4_register_vlan was passed (improperly) in the
out_param. This has been corrected so that the port number is now
passed in bits 8..15 of the in_modifier field.

For old calls, these bits will be zero, so if the passed port
number is zero, we can still look at the out_param field to see
if it contains a valid port number. If yes, the VM is running
an old driver.

Since for old drivers, the register/unregister_vlan wrappers were
NOPs, we continue this policy -- the reason being that upstream
had an additional bug in eth driver running on guests (where
procedure mlx4_en_vlan_rx_kill_vid() had the following code:

if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
        mlx4_unregister_vlan(mdev->dev, priv->port, idx);
else
        en_err(priv, "could not find vid %d in cache\n", vid);

On a VM, mlx4_find_cached_vlan() will always fail, since the
vlan cache is located on the Hypervisor; on guests it is empty.

Therefore, if we allow upstream guests to register vlans, we will
have vlan leakage since the unregister will never be performed.
Leaving vlan reg/unreg for old guest drivers as a NOP is not a
feature regression, since in upstream the register/unregister
vlan wrapper is a NOP.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c957ff2

net/mlx4_core: Resource tracker for reg/unreg vlans · 4874080d

由 Jack Morgenstein 提交于 11月 03, 2013

Add resource tracker support for reg/unreg vlans calls done by VFs.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4874080d

net/mlx4_en: Use vlan id instead of vlan index for unregistration · 2009d005

由 Jack Morgenstein 提交于 11月 03, 2013

Use of vlan_index created problems unregistering vlans on guests.

In addition, tools delete vlan by tag, not by index, lets follow that.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2009d005

net/mlx4_core: Fix reg/unreg vlan/mac to conform to the firmware spec · acddd5dd

由 Jack Morgenstein 提交于 11月 03, 2013

The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
to pass the port number. The firmware spec specifies that the port number
should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
FREE_RES (sections 20.15.1 and 20.15.2).

For MAC register/unregister, this patch contains workarounds so that guests
running previous kernels continue to work on a new Hypervisor, and guests
running the new kernel will continue to work on old hypervisors.

Vlan registeration capability is still not operational in multifunction mode,
since the vlan wrapper functions are not implemented in this patch.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acddd5dd

net/mlx4_core: Fix register/unreg vlan flow · 162226a1

由 Jack Morgenstein 提交于 11月 03, 2013

The reg/unreg vlan code was broken:

1. a wrapped function called another wrapped function, causing a deadlock.

2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to
   incorrectly passed parameters.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

162226a1

sh_eth: check platform data pointer · 3b4c5cbf

由 Sergei Shtylyov 提交于 10月 30, 2013

Check the platform data pointer before dereferencing it and error out of the
probe() method if it's NULL.

This has additional effect of preventing kernel oops with outdated platform data
containing zero PHY address instead (such as on SolutionEngine7710).
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: NSimon Horman <horms+renesas@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b4c5cbf

Merge branch 'usbnet' · 6f9fed0b

由 David S. Miller 提交于 11月 04, 2013

Bjørn Mork says:

====================
cdc_mbim + qmi_wwan trivial fixes

This series fixes three problems Oliver pointed out during the
review of the new huawei_cdc_ncm driver:
http://patchwork.ozlabs.org/patch/278903/

That innocent driver only used cdc_mbim as a blueprint, and
all the blame should really have gone to me....

I do have a similar fix for the manage_power issue in the
cdc-wdm USB class driver as well.  It will be submitted to
linux-usb as soon as Greg opens up his mailbox again :-)
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f9fed0b

net: cdc_mbim: fixup error return value · e62416e8

由 Bjørn Mork 提交于 11月 01, 2013

Reported-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e62416e8

net: cdc_mbim: no need to check for resume if suspend exists · c37ac9b2

由 Bjørn Mork 提交于 11月 01, 2013

Reported-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c37ac9b2

net: qmi_wwan: no need to check for resume if suspend exists · a298807b

由 Bjørn Mork 提交于 11月 01, 2013

Reported-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a298807b

net: qmi_wwan: manage_power should always set needs_remote_wakeup · 79d9b62a

由 Bjørn Mork 提交于 11月 01, 2013

Reported-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79d9b62a

net: cdc_mbim: manage_power should always set needs_remote_wakeup · 04d948a9

由 Bjørn Mork 提交于 11月 01, 2013

Reported-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04d948a9

Merge branch 'qlcnic' · 96635fbd

由 David S. Miller 提交于 11月 04, 2013

Himanshu Madhani says:

====================
qlcnic: Multiple Tx queue support and code refactoring

This Patch series contains following changes

o Refactored code to calculate, validate and assign Tx/SDS rings for  various modes of driver.
o Enhanced ethtool statistics for multi Tx queue on all supported adapters.
o Enable multiple Tx queue for 83xx and 84xx Series adapters.
o Register netdev for failed device state.

changes from v1 -> v2
o Dropped patch to replace inappropriate usage of kzalloc() with vzalloc().

Please apply to net-next.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96635fbd

qlcnic: update version to 5.3.52 · db62d7d9

由 Himanshu Madhani 提交于 11月 04, 2013

Signed-off-by: NHimanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db62d7d9

qlcnic: Enable multiple Tx queue support for 83xx/84xx Series adapters. · 18afc102

由 Himanshu Madhani 提交于 11月 04, 2013

o 83xx and 84xx firmware is capable of multiple Tx queues.
  This patch will enable multiple Tx queues for 83xx/84xx
  series adapters. Max number of Tx queues supported will be 8.
Signed-off-by: NHimanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18afc102

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功