Commits · e51271d4ce7b229f5c02903e3c44bf92c0dbef6b · openeuler / raspberrypi-kernel

12 Feb, 2016 31 commits

Merge branch 'tcp_dccp_ports' · e51271d4

Committed by David S. Miller on Feb 12, 2016

Eric Dumazet says:

====================
tcp/dccp: better use of ephemeral ports

Big servers have a bloated bind table, which makes it very hard for
ephemeral port allocations to succeed without special container/namespace tricks.

This patch series extends the strategy added in commit 07f4c900
("tcp/dccp: try to not exhaust ip_local_port_range in connect()").

Since ports used by connect() are much more likely to be shared among them,
we give a hint to both bind() and connect() to keep the crowds separated
if possible.

Of course, if on a specific host an application needs to allocate ~30000
ports using bind(), it will still be able to do so. The same holds for ~30000
connect() calls to a unique 2-tuple (dst addr, dst port).

The new implementation is also more friendly to softirqs and reschedules.

v2: rebase after TCP SO_REUSEPORT changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: better use of ephemeral ports in bind() · ea8add2b

Committed by Eric Dumazet on Feb 11, 2016

Implement the strategy used in __inet_hash_connect(), but in the opposite way:

Try to find a candidate using odd ports, then fall back to even ports.

We no longer disable BH for the whole traversal, but one bucket at a time.
We also use cond_resched() to yield the CPU to other tasks if needed.

I removed one indentation level and tried to mirror the loop we have
in __inet_hash_connect() and its variable names to ease code maintenance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: better use of ephemeral ports in connect() · 1580ab63

Committed by Eric Dumazet on Feb 11, 2016

In commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), I added a very simple heuristic, so that we got better
chances to use even ports, and allow bind() users to have more available
slots.

It gave nice results, but with more than 200,000 TCP sessions on a typical
server, the ~30,000 ephemeral ports are still a rare resource.

I chose to go a step further, by looking at all even ports, and if none
was available, falling back to odd ports.

The companion patch does the same in bind(), but in the opposite way.

I've seen exec times of up to 30ms on busy servers, so I no longer
disable BH for the whole traversal, but only for each hash bucket.
I also call cond_resched() to be gentle to other tasks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
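
The two-pass parity search described in the connect() and bind() commits can be sketched roughly as follows. This is a minimal userspace illustration under stated assumptions, not the kernel code: `pick_port`, `port_free_fn`, `struct used_ports` and `not_in_used` are hypothetical names, the scan simply walks the range from its low end (the kernel's starting offset and per-bucket locking are not modelled), and BH handling and cond_resched() are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: true if `port` may be handed out. */
typedef bool (*port_free_fn)(uint16_t port, void *ctx);

/* Toy stand-in for the bind table: a flat list of ports in use. */
struct used_ports {
    const uint16_t *ports;
    size_t count;
};

static bool not_in_used(uint16_t port, void *ctx)
{
    const struct used_ports *u = ctx;

    for (size_t i = 0; i < u->count; i++)
        if (u->ports[i] == port)
            return false;
    return true;
}

/*
 * Scan [low, high] in two passes. The first pass only visits ports whose
 * parity equals first_parity (0 = even, 1 = odd); the second pass visits
 * the other parity. Returns the chosen port, or -1 if none is free.
 */
static int pick_port(uint16_t low, uint16_t high, unsigned first_parity,
                     port_free_fn is_free, void *ctx)
{
    for (unsigned pass = 0; pass < 2; pass++) {
        unsigned parity = (first_parity ^ pass) & 1u;
        /* First port >= low with the wanted parity (32-bit to avoid wrap). */
        uint32_t port = (uint32_t)low + (((low & 1u) != parity) ? 1u : 0u);

        for (; port <= high; port += 2)
            if (is_free((uint16_t)port, ctx))
                return (int)port;
    }
    return -1;
}

/* connect() side: even ports first, odd ports as the fallback. */
static int pick_connect_port(uint16_t low, uint16_t high,
                             port_free_fn is_free, void *ctx)
{
    return pick_port(low, high, 0, is_free, ctx);
}

/* bind() side: odd ports first, even ports as the fallback. */
static int pick_bind_port(uint16_t low, uint16_t high,
                          port_free_fn is_free, void *ctx)
{
    return pick_port(low, high, 1, is_free, ctx);
}
```

Because each side starts on a different parity, the two groups of users only meet once one half of the range is exhausted, which is the "keep the crowds separated" hint from the series description.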

Merge branch 'net-mitigate-kmem_free-slowpath' · 3134b9f0

Committed by David S. Miller on Feb 11, 2016

Jesper Dangaard Brouer says:

====================
net: mitigating kmem_cache free slowpath

This patchset is the first real use-case for kmem_cache bulk _free_.
The use of bulk _alloc_ is NOT included in this patchset. The full use
has previously been posted here [1].

The bulk free side has the largest benefit for the network stack
use-case, because the network stack is hitting the kmem_cache/SLUB
slowpath when freeing SKBs, due to the amount of outstanding SKBs.
This is solved by using the new API kmem_cache_free_bulk().

Introduce new API napi_consume_skb(), that hides/handles bulk freeing
for the caller.  The drivers simply need to use this call when freeing
SKBs in NAPI context, e.g. replacing their calls to dev_kfree_skb() /
dev_consume_skb_any().

Driver ixgbe is the first user of this new API.

[1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: bulk free SKBs during TX completion cleanup cycle · a3a8749d

Committed by Jesper Dangaard Brouer on Feb 08, 2016

There is an opportunity to bulk free SKBs during reclaiming of
resources after DMA transmit completes in ixgbe_clean_tx_irq.  Thus,
bulk freeing at this point does not introduce any added latency.

Simply use napi_consume_skb(), which was recently introduced.  The
napi_budget parameter is needed by napi_consume_skb() to detect if it
is called from netpoll.

Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
 Single CPU/flow numbers: before: 1982144 pps ->  after : 2064446 pps
 Improvement: +82302 pps, -20 nanosec, +4.1%
 (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))

Joint work with Alexander Duyck.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bulk free SKBs that were delay free'ed due to IRQ context · 15fad714

Committed by Jesper Dangaard Brouer on Feb 08, 2016

The network stack defers freeing SKBs in case the free happens in IRQ
context or when IRQs are disabled. This happens in __dev_kfree_skb_irq(),
which writes SKBs that were free'ed during IRQ to the softirq completion
queue (softnet_data.completion_queue).

These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
in function net_tx_action().  Take advantage of this and use the skb
defer and flush API, as we are already in softirq context.

For modern drivers this rarely happens, although most drivers do call
dev_kfree_skb_any(), which detects the situation and calls
__dev_kfree_skb_irq() when needed.  This is because netpoll can call from
IRQ context.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bulk free infrastructure for NAPI context, use napi_consume_skb · 795bb1c0

Committed by Jesper Dangaard Brouer on Feb 08, 2016

Discovered that the network stack was hitting the kmem_cache/SLUB
slowpath when freeing SKBs.  Doing bulk free with kmem_cache_free_bulk
can speed up this slowpath.

NAPI context is a bit special, let's take advantage of that for bulk
free'ing SKBs.

In NAPI context we are running in softirq, which gives us certain
protection.  A softirq can run on several CPUs at once.  BUT the
important part is a softirq will never preempt another softirq running
on the same CPU.  This gives us the opportunity to access per-cpu
variables in softirq context.

Extend napi_alloc_cache (before only contained page_frag_cache) to be
a struct with a small array based stack for holding SKBs.  Introduce a
SKB defer and flush API for accessing this.

Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any()
when running in NAPI context.  A small trick to handle/detect if we
are called from netpoll is to see if budget is 0.  In that case, we
need to invoke dev_consume_skb_irq().

Joint work with Alexander Duyck.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
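
The defer-and-flush idea, together with the budget == 0 check, can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct defer_cache`, `DEFER_CACHE_SIZE`, `consume_obj`, `defer_flush` and the counting callbacks are hypothetical stand-ins, not the kernel's data structures. The sketch is single-threaded, whereas the kernel relies on softirq context to access its per-CPU cache safely.

```c
#include <stddef.h>

#define DEFER_CACHE_SIZE 64 /* assumed capacity of the small array stack */

/* Hypothetical small array-based stack holding objects awaiting a bulk free. */
struct defer_cache {
    size_t count;
    void *objs[DEFER_CACHE_SIZE];
};

typedef void (*bulk_free_fn)(size_t n, void **objs);
typedef void (*single_free_fn)(void *obj);

/* Counting stand-ins for a bulk free and an immediate single free. */
static size_t bulk_calls, bulk_objs, single_calls;

static void count_bulk(size_t n, void **objs)
{
    (void)objs;
    bulk_calls++;
    bulk_objs += n;
}

static void count_single(void *obj)
{
    (void)obj;
    single_calls++;
}

/* Hand every deferred object to the bulk free routine in one call. */
static void defer_flush(struct defer_cache *c, bulk_free_fn bulk_free)
{
    if (c->count) {
        bulk_free(c->count, c->objs);
        c->count = 0;
    }
}

/*
 * Consume one object. A budget of 0 is treated as "not in the normal
 * polling path" (the netpoll case in the commit text), so the object is
 * freed immediately; otherwise it is deferred and flushed in bulk once
 * the stack is full.
 */
static void consume_obj(struct defer_cache *c, void *obj, int budget,
                        bulk_free_fn bulk_free, single_free_fn free_one)
{
    if (!obj)
        return;

    if (budget == 0) {
        free_one(obj);
        return;
    }

    c->objs[c->count++] = obj;
    if (c->count == DEFER_CACHE_SIZE)
        defer_flush(c, bulk_free);
}
```

A caller would also invoke `defer_flush()` at the end of its polling cycle so that a partially filled stack does not hold objects indefinitely.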

Merge branch 'virtio_net-ethtool-validation' · 18ac5590

Committed by David S. Miller on Feb 11, 2016

Nikolay Aleksandrov says:

====================
virtio_net: better ethtool setting validation

This small set is a follow-up for the recent patches that added ethtool
get/set settings. Patch 1 changes the speed validation routine to check
if the speed is between 0 and INT_MAX (or SPEED_UNKNOWN) and patch 2 adds
port validation to virtio_net and better validation comment.

This set is on top of Michael's patch which explains that speeds from 0
to INT_MAX are valid:
http://patchwork.ozlabs.org/patch/578911/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: validate ethtool port setting and explain the user validation · 0cf3ace9

Committed by Nikolay Aleksandrov on Feb 07, 2016

We should validate the port setting that we got from the user and check
if it's what we've set it to (PORT_OTHER). Also add an explanation that
ignoring advertising is good as long as we don't have autonegotiation.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: make validate_speed accept all speeds between 0 and INT_MAX · e02564ee

Committed by Nikolay Aleksandrov on Feb 07, 2016

Devices these days can have any speed and, as was recently pointed out,
any speed from 0 to INT_MAX is valid, so adjust speed validation to
accept such values.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
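
The rule stated here (any speed from 0 to INT_MAX, or the "unknown" sentinel) is small enough to write down directly. The sketch below is a standalone illustration: `speed_is_valid` and `SPEED_UNKNOWN_SKETCH` are hypothetical names, and the choice of an all-ones 32-bit value as the sentinel is an assumption of this example.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel for "speed not known": all bits set in a 32-bit value. */
#define SPEED_UNKNOWN_SKETCH ((uint32_t)-1)

/* Accept every speed in [0, INT_MAX] plus the unknown sentinel. */
static bool speed_is_valid(uint32_t speed)
{
    return speed <= (uint32_t)INT_MAX || speed == SPEED_UNKNOWN_SKETCH;
}
```

With this shape there is no table of "known" link speeds to keep up to date; only values above INT_MAX other than the sentinel are rejected.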

Merge branch 'dp83848-TLK10x' · 83840f5b

Committed by David S. Miller on Feb 11, 2016

Andrew F. Davis says:

====================
net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs

This series is [0] split into its logical components.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Add comments for register definitions · 5fed0393

Committed by Andrew F. Davis on Feb 07, 2016

Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Add support for TI TLK10x Ethernet PHYs · d1782f7b

Committed by Andrew F. Davis on Feb 07, 2016

The TI TLK10x Ethernet PHYs are similar in the interrupt-relevant
registers and so are compatible with the DP83848x devices already
supported.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Reorganize code for readability and safety · cf13be5a

Committed by Andrew F. Davis on Feb 07, 2016

Reorganize code by moving the desired interrupt mask definition
out of the function. Also rearrange the enable/disable interrupt function
to prevent accidental over-writing of values in registers.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Add PHY ID for TI version of DP83848C · 68336293

Committed by Andrew F. Davis on Feb 07, 2016

After acquiring National Semiconductor, TI appears to have
changed the Vendor Model Number for the DP83848C PHYs.
Add this new ID to the supported IDs.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Add macro for dp83848 compatible devices · 2f67864b

Committed by Andrew F. Davis on Feb 07, 2016

Add a helper macro for defining dp83848 compatible phy devices.
Update copyright info.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: don't report EVB for older chipsets when SR-IOV is disabled · 8431706b

Committed by Ivan Vecera on Feb 11, 2016

The EVB (virtual bridge) functionality should be disabled on older BE3
and Lancer chips if SR-IOV is disabled in the NIC's BIOS. This setting
is identified by the zero value of total VFs reported by the card.
The GET_HSW_CONFIG command cannot be used as it is not supported by
the firmware of these older chipsets.

v2: added the comment

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'spi_ks8995' · 6dca8d45

Committed by David S. Miller on Feb 11, 2016

Helmut Buchsbaum says:

====================
Add support for MICREL KSZ8795CLX 5-port switch

This patch series refactors the spi-ks8995 driver to finally add support
for the MICREL KSZ8795CLX. Additionally support for controlling a GPIO
line for resetting the switch is added.

Helmut

Changes since v2:
 - use GPIO_ACTIVE_LOW according to Andrew's remark.
 - use ePAPR compliant node name in example, thanks to Sergei for
   pointing out
Changes since v1:
 - removed initializing registers from Device Tree following Florian's
   advice
 - fixed GPIO handling for reset according to Andrew's remark.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: ks8995: add bindings documentation for ks8995 · 7e406d12

Committed by Helmut Buchsbaum on Feb 09, 2016

Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: spi_ks8995: add support for MICREL KSZ8795CLX · c0e6cb1f

Committed by Helmut Buchsbaum on Feb 09, 2016

Add support for MICREL KSZ8795CLX Integrated 5-Port, 10-/100-Managed
Ethernet Switch with Gigabit GMII/RGMII and MII/RMII interfaces.
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: spi_ks8995: generalize creation of SPI commands · 6665e623

Committed by Helmut Buchsbaum on Feb 09, 2016

Prepare creating SPI reads and writes for other switch families.
The KS8995 family uses the straightforward
	<8bit CMD><8bit ADDR>
sequence.
To be able to support the KSZ8795 family, which uses
	<3bit CMD><12bit ADDR><1 bit TR>
make the SPI command creation chip variant dependent.
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
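
The two command layouts named in this commit can be packed into a 16-bit word as sketched below. This is an illustration under assumptions rather than the driver's code: the function names and `enum spi_op` are hypothetical, the opcode values (2 for write, 3 for read) are assumed, the TR bit is left at 0, and the word is assumed to go out most significant byte first.

```c
#include <stdint.h>

/* Assumed opcodes, chosen for illustration only. */
enum spi_op { SPI_OP_WRITE = 0x2, SPI_OP_READ = 0x3 };

/* KS8995 style: <8bit CMD><8bit ADDR>. */
static uint16_t ks8995_cmd(enum spi_op op, uint8_t addr)
{
    return (uint16_t)(((uint16_t)op << 8) | addr);
}

/* KSZ8795 style: <3bit CMD><12bit ADDR><1bit TR>, with TR left at 0. */
static uint16_t ksz8795_cmd(enum spi_op op, uint16_t addr)
{
    return (uint16_t)((((uint16_t)op & 0x7u) << 13) |
                      ((addr & 0x0FFFu) << 1));
}

/* Serialize a command word MSB first (assumed wire order). */
static void cmd_to_wire(uint16_t cmd, uint8_t out[2])
{
    out[0] = (uint8_t)(cmd >> 8);
    out[1] = (uint8_t)(cmd & 0xFFu);
}
```

Making command creation depend on the chip variant then comes down to selecting one of the two packing functions (or the equivalent shift and mask parameters) per device entry.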

net: phy: spi_ks8995: add support for resetting switch using GPIO · cd6f288c

Committed by Helmut Buchsbaum on Feb 09, 2016

When using device tree it is no longer possible to reset the PHY at board
level. Furthermore, doing it in the driver allows powering down the switch
when it is no longer used.

The patch introduces a new optional property "reset-gpios" denoting an
appropriate GPIO handle, e.g.:

reset-gpios = <&gpio0 46 GPIO_ACTIVE_LOW>
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: spi_ks8995: verify chip and determine revision · 484e36ff

Committed by Helmut Buchsbaum on Feb 09, 2016

Since the chip variant is now determined by spi_device_id, verify
family and chip id and determine the revision id.
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: spi_ks8995: introduce spi_device_id table · aa54c8da

Committed by Helmut Buchsbaum on Feb 09, 2016

Refactor to use spi_device_id table to facilitate easy
extensibility.
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'thunderx-irq-hints' · 74630534

Committed by David S. Miller on Feb 11, 2016

Sunil Goutham says:

====================
net: thunderx: Setting IRQ affinity hints and other optimizations

This patch series contains changes to:
- Add support for the virtual function's IRQ affinity hint
- Replace napi_schedule() with napi_schedule_irqoff()
- Reduce page allocation overhead by allocating pages
  of higher order when pagesize is 4KB.
- Add a couple of stats which help in debugging
- Make some miscellaneous changes to the BGX driver.

Changes from v1:
- As suggested changed MAC address invalid log message
  to dev_err() instead of dev_warn().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Alloc higher order pages when pagesize is small · 6e4be8d6

Committed by Sunil Goutham on Feb 11, 2016

Allocate higher-order pages when the page size is small; this
reduces the number of calls to the page allocator and the wastage of memory.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: bgx: Add log message when setting mac address · 1d82efac

Committed by Robert Richter on Feb 11, 2016

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: bgx: Use standard firmware node infrastructure. · eee326fd

Committed by David Daney on Feb 11, 2016

In the case of OF device tree, the firmware information is attached to
the BGX device structure in the standard manner, so use the firmware
iterators and accessors where possible.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Assign affinity hints to vf's interrupts · fb4b7d98

Committed by Sunil Goutham on Feb 11, 2016

This affinity hint can be used by the user space irqbalance tool to set
the preferred CPU mask for IRQs registered by this VF. Irqbalance needs
to be in 'exact' mode to set the IRQ affinity to match the
affinity hint.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Use napi_schedule_irqoff() · ef0a4d86

Committed by Sunil Goutham on Feb 11, 2016

napi_schedule is being called from hard irq context, hence
switch to napi_schedule_irqoff, which avoids unneeded calls
to local_irq_save and local_irq_restore.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, thunderx: Add TX timeout and RX buffer alloc failure stats. · a05d4845

Committed by Thanneeru Srinivasulu on Feb 11, 2016

When the system is low on atomic memory, too many error messages are logged.
Since this is not a total failure but a simple switch to non-atomic allocation,
it is better to have a stat.

Also add a stat for resets kicked off due to transmit watchdog timeout.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Feb, 2016 9 commits

Merge branch 'igmp-ns' · 65411adb

Committed by David S. Miller on Feb 11, 2016

Nikolay Borisov says:

====================
Make igmp sysctl knobs namespace aware

This series continues making more of the net related sysctls
namespace aware. The first 2 and last patches are straightforward
and convert sysctls which weren't defined to be
namespace aware. The only thing in them is that each removes
a define which is used in only one place (to initialise
the respective sysctl) so I don't think this is a huge loss.

The third patch however, converts igmp_llm_reports which was
already defined in the ipv4_net_table but wasn't using any of
the net namespace infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Namespacify igmp_qrv sysctl knob · 165094af

Committed by Nikolay Borisov on Feb 08, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Namespaceify igmp_llm_reports sysctl knob · 87a8a2ae

Committed by Nikolay Borisov on Feb 09, 2016

This was initially introduced in df2cf4a7 ("IGMP: Inhibit
reports for local multicast groups") by defining the sysctl in the
ipv4_net_table array, however it was never implemented to be
namespace aware. Fix this by changing the code accordingly.
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Namespaceify igmp_max_msf sysctl knob · 166b6b2d

Committed by Nikolay Borisov on Feb 08, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: Namespaceify igmp_max_memberships sysctl knob · 815c5270

Committed by Nikolay Borisov on Feb 08, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: use return instead of goto · 1e2a8868

Committed by Zhang Shengju on Feb 09, 2016

Replace 'goto' with 'return' to remove an unnecessary check at label:
err_undo_flags.

The reason is that 'err_undo_flags' does two things for the first slave device:
1. revert the bond mac address if it is set by the slave device.
2. revert the bond device type if it's not ARPHRD_ETHER.

This is not necessary in the following three places, since they changed neither
the bond mac address nor the type. It's straightforward to return directly.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add wake-on-lan support via magic packet · 3e2a5e15

Committed by Sergio Prado on Feb 09, 2016

Tested on Acqua A5 SoM (http://www.acmesystems.it/acqua).
Signed-off-by: Sergio Prado <sergio.prado@e-labworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hamradio: baycom_ser_fdx: Replace timeval with timespec64 · e6515203

Committed by Amitoj Kaur Chawla on Feb 10, 2016

32 bit systems using 'struct timeval' will break in the year 2038, so
we replace the code appropriately. However, this driver is not broken
in 2038 since we are only using microseconds portion of the time.

This patch replaces 'struct timeval' with 'struct timespec64'. We only
need to find elapsed microseconds rather than absolute time, so it's
better to use monotonic time, so using ktime_get_ts64() makes the code
more efficient and more robust against concurrent settimeofday()
calls.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Sailer <t.sailer@alumni.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
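
The "elapsed microseconds from a monotonic clock" idea can be shown with plain timestamp arithmetic. This is a userspace sketch, not the driver change: `elapsed_us` is a hypothetical helper, and `struct timespec` stands in for the kernel's `struct timespec64`. In a real program the two samples would come from a monotonic clock source, which is what keeps the result unaffected by settimeofday().

```c
#include <stdint.h>
#include <time.h>

/*
 * Microseconds elapsed between two samples of a monotonic clock.
 * The difference is taken in nanoseconds first so that a smaller
 * tv_nsec in `end` than in `start` is handled correctly.
 */
static int64_t elapsed_us(const struct timespec *start,
                          const struct timespec *end)
{
    int64_t ns = ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL
               + ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

    return ns / 1000;
}
```

Only the difference between the two samples matters, so the absolute value of the clock, and therefore the year 2038 limit of a 32-bit seconds field, does not enter the result.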

openvswitch: allow management from inside user namespaces · 4a92602a

Committed by Tycho Andersen on Feb 05, 2016

Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one
Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Pravin Shelar <pshelar@ovn.org>
CC: Justin Pettit <jpettit@nicira.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>