提交 · d3a9632f09153bc46a8077844e05e179f1c10c3f · openeuler / raspberrypi-kernel

10 12月, 2014 16 次提交

A
skb_copy_datagram_iovec() can die · d3a9632f
由 Al Viro 提交于 11月 24, 2014
```
no callers other than itself.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
d3a9632f

switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives · e5a4b0bb

由 Al Viro 提交于 11月 24, 2014

... making both non-draining.  That means that tcp_recvmsg() becomes
non-draining.  And _that_ would break iscsit_do_rx_data() unless we
	a) make sure tcp_recvmsg() is uniformly non-draining (it is)
	b) make sure it copes with arbitrary (including shifted)
iov_iter (it does, all it uses is iov_iter primitives)
	c) make iscsit_do_rx_data() initialize ->msg_iter only once.

Fortunately, (c) is doable with minimal work and we are rid of one
the two places where kernel send/recvmsg users would be unhappy with
non-draining behaviour.

Actually, that makes all but one of ->recvmsg() instances iov_iter-clean.
The exception is skcipher_recvmsg() and it also isn't hard to convert
to primitives (iov_iter_get_pages() is needed there).  That'll wait
a bit - there's some interplay with ->sendmsg() path for that one.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e5a4b0bb

first fruits - kill l2cap ->memcpy_fromiovec() · 17836394

由 Al Viro 提交于 11月 24, 2014

Just use copy_from_iter().  That's what this method is trying to do
in all cases, in a very convoluted fashion.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

17836394

put iov_iter into msghdr · c0371da6

由 Al Viro 提交于 11月 24, 2014

Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c0371da6

A
vmci: propagate msghdr all way down to __qp_memcpy_from_queue() · d838df2e
由 Al Viro 提交于 11月 24, 2014
```
... and switch it to memcpy_to_msg()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
d838df2e

switch l2cap ->memcpy_fromiovec() to msghdr · 56c39fb6

由 Al Viro 提交于 11月 24, 2014

it'll die soon enough - now that kvec-backed iov_iter works regardless
of set_fs(), both instances will become copy_from_iter() as soon as
we introduce ->msg_iter...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

56c39fb6

A
switch tcp_sock->ucopy from iovec (ucopy.iov) to msghdr (ucopy.msg) · f4362a2c
由 Al Viro 提交于 11月 24, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f4362a2c
A
ip_generic_getfrag, udplite_getfrag: switch to passing msghdr · f69e6d13
由 Al Viro 提交于 11月 24, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f69e6d13

ipvlan: move the device check function into netdevice.h · 5933fea7

由 Mahesh Bandewar 提交于 12月 06, 2014

Move the port check [ipvlan_dev_master()] and device check
[ipvlan_dev_slave()] functions to netdevice.h and rename them
netif_is_ipvlan_port() and netif_is_ipvlan() resp. to be
consistent with macvlan api naming.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5933fea7

netdevice: Add a function to check macvlan port · 2f33e7d5

由 Mahesh Bandewar 提交于 12月 06, 2014

Similar to a check for macvlan device, netif_is_macvlan(), add
another function to check if a device is used as macvlan port.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f33e7d5

dst: no need to take reference on DST_NOCACHE dsts · dbfc4fb7

由 Hannes Frederic Sowa 提交于 12月 06, 2014

Since commit f8864972 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dbfc4fb7

tcp_cubic: add SNMP counters to track how effective is Hystart · 6e3a8a93

由 Eric Dumazet 提交于 12月 04, 2014

When deploying FQ pacing, one thing we noticed is that CUBIC Hystart
triggers too soon.

Having SNMP counters to have an idea of how often the various Hystart
methods trigger is useful prior to any modifications.

This patch adds SNMP counters tracking, how many time "ack train" or
"Delay" based Hystart triggers, and cumulative sum of cwnd at the time
Hystart decided to end SS (Slow Start)

myhost:~# nstat -a | grep Hystart
TcpExtTCPHystartTrainDetect     9                  0.0
TcpExtTCPHystartTrainCwnd       20650              0.0
TcpExtTCPHystartDelayDetect     10                 0.0
TcpExtTCPHystartDelayCwnd       360                0.0

->
 Train detection was triggered 9 times, and average cwnd was
 20650/9=2294,
 Delay detection was triggered 10 times and average cwnd was 36
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e3a8a93

net: sched: cls: remove unused op put from tcf_proto_ops · 57d743a3

由 Jiri Pirko 提交于 12月 04, 2014

It is never called and implementations are void. So just remove it.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57d743a3

net: avoid two atomic operations in fast clones · 6ffe75eb

由 Eric Dumazet 提交于 12月 03, 2014

Commit ce1a4ea3 ("net: avoid one atomic operation in skb_clone()")
took the wrong way to save one atomic operation.

It is actually possible to avoid two atomic operations, if we
do not change skb->fclone values, and only rely on clone_ref
content to signal if the clone is available or not.

skb_clone() can simply use the fast clone if clone_ref is 1.

kfree_skbmem() can avoid the atomic_dec_and_test() if clone_ref is 1.

Note that because we usually free the clone before the original skb,
this particular attempt is only done for the original skb to have better
branch prediction.

SKB_FCLONE_FREE is removed.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ffe75eb

rtnetlink: delay RTM_DELLINK notification until after ndo_uninit() · 395eea6c

由 Mahesh Bandewar 提交于 12月 03, 2014

The commit 56bfa7ee ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this ealier but while doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed call to fill_info(). So this translated into asking driver
to remove private state and then query it's private state. This
could have catastropic consequences.

This change breaks the rtmsg_ifinfo() into two parts - one takes the
precise snapshot of the device by called fill_info() before calling
the ndo_uninit() and the second part sends the notification using
collected snapshot.

It was brought to notice when last link is deleted from an ipvlan device
when it has free-ed the port and the subsequent .fill_info() call is
trying to get the info from the port.

kernel: [ 255.139429] ------------[ cut here ]------------
kernel: [ 255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [ 255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [ 255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [ 255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [ 255.139515] 0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [ 255.139516] 0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [ 255.139518] 00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [ 255.139520] Call Trace:
kernel: [ 255.139527] [<ffffffff815d87f4>] dump_stack+0x46/0x58
kernel: [ 255.139531] [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
kernel: [ 255.139540] [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
kernel: [ 255.139544] [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
kernel: [ 255.139547] [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
kernel: [ 255.139549] [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
kernel: [ 255.139551] [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
kernel: [ 255.139553] [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
kernel: [ 255.139557] [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
kernel: [ 255.139558] [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
kernel: [ 255.139562] [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
kernel: [ 255.139563] [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
kernel: [ 255.139565] [<ffffffff8152c398>] netlink_unicast+0x178/0x230
kernel: [ 255.139567] [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
kernel: [ 255.139571] [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
kernel: [ 255.139575] [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
kernel: [ 255.139577] [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [ 255.139578] [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
kernel: [ 255.139581] [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
kernel: [ 255.139585] [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
kernel: [ 255.139589] [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
kernel: [ 255.139597] [<ffffffff811e8336>] ? dput+0xb6/0x190
kernel: [ 255.139606] [<ffffffff811f05f6>] ? mntput+0x26/0x40
kernel: [ 255.139611] [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
kernel: [ 255.139613] [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
kernel: [ 255.139615] [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
kernel: [ 255.139617] [<ffffffff815df092>] system_call_fastpath+0x12/0x17
kernel: [ 255.139619] ---[ end trace 5e6703e87d984f6b ]---
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Reported-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

395eea6c

tc_act: export uapi header file · 5f4d8d97

由 stephen hemminger 提交于 12月 03, 2014

This file is used by iproute2 and should be exported.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f4d8d97

09 12月, 2014 11 次提交

ethtool: Support for configurable RSS hash function · 892311f6

由 Eyal Perry 提交于 12月 02, 2014

This patch extends the set/get_rxfh ethtool-options for getting or
setting the RSS hash function.

It modifies drivers implementation of set/get_rxfh accordingly.

This change also delegates the responsibility of checking whether a
modification to a certain RX flow hash parameter is supported to the
driver implementation of set_rxfh.

User-kernel API is done through the new hfunc bitmask field in the
ethtool_rxfh struct. A bit set in the hfunc field is corresponding to an
index in the new string-set ETH_SS_RSS_HASH_FUNCS.

Got approval from most of the relevant driver maintainers that their
driver is using Toeplitz, and for the few that didn't answered, also
assumed it is Toeplitz.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

892311f6

net: Add functions for handling padding frame and adding to length · 9c0c1124

由 Alexander Duyck 提交于 12月 03, 2014

This patch adds two new helper functions skb_put_padto and eth_skb_pad.
These functions deviate from the standard skb_pad or skb_padto in that they
will also update the length and tail pointers so that they reflect the
padding added to the frame.

The eth_skb_pad helper is meant to be used with Ethernet devices to update
either Rx or Tx frames so that they report the correct size. The
skb_put_padto helper is meant to be used primarily in the transmit path for
network devices that need frames to be padded up to some minimum size and
don't wish to simply update the length somewhere external to the frame.

The motivation behind this is that there are a number of implementations
throughout the network device drivers that are all doing the same thing,
but each a little bit differently and as a result several implementations
contain bugs such as updating the length without updating the tail offset
and other similar issues.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c0c1124

net/mlx5_core: Remove unused dev cap enum fields · 0c7aac85

由 Eli Cohen 提交于 12月 02, 2014

These enumerations are not used so remove them.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c7aac85

tipc: convert name table read-write lock to RCU · 97ede29e

由 Ying Xue 提交于 12月 02, 2014

Convert tipc name table read-write lock to RCU. After this change,
a new spin lock is used to protect name table on write side while
RCU is applied on read side.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Tested-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

97ede29e

net: bcmgenet: enable driver to work without a device tree · b0ba512e

由 Petri Gynther 提交于 12月 01, 2014

Modify bcmgenet driver so that it can be used on Broadcom 7xxx
MIPS-based STB platforms without a device tree.
Signed-off-by: NPetri Gynther <pgynther@google.com>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0ba512e

copy_from_iter_nocache() · aa583096

由 Al Viro 提交于 11月 27, 2014

BTW, do we want memcpy_nocache()?
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aa583096

new helper: iov_iter_kvec() · abb78f87

由 Al Viro 提交于 11月 24, 2014

initialization of kvec-backed iov_iter
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

abb78f87

A
csum_and_copy_..._iter() · a604ec7e
由 Al Viro 提交于 11月 24, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a604ec7e

hyperv: Add support for vNIC hot removal · c3582a2c

由 Haiyang Zhang 提交于 12月 01, 2014

This patch adds proper handling of the vNIC hot removal event, which includes
a rescind-channel-offer message from the host side that triggers vNIC close and
removal. In this case, the notices to the host during close and removal is not
necessary because the channel is rescinded. This patch blocks these unnecessary
messages, and lets vNIC removal process complete normally.
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: NK. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3582a2c

net-timestamp: allow reading recv cmsg on errqueue with origin tstamp · 829ae9d6

由 Willem de Bruijn 提交于 11月 30, 2014

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

829ae9d6

iov_iter.c: handle ITER_KVEC directly · a280455f

由 Al Viro 提交于 11月 27, 2014

... without bothering with copy_..._user()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a280455f

06 12月, 2014 2 次提交

net: sock: allow eBPF programs to be attached to sockets · 89aa0758

由 Alexei Starovoitov 提交于 12月 01, 2014

introduce new setsockopt() command:

setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.

The same eBPF program can be attached to multiple sockets.

User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89aa0758

bpf: verifier: add checks for BPF_ABS | BPF_IND instructions · ddd872bc

由 Alexei Starovoitov 提交于 12月 01, 2014

introduce program type BPF_PROG_TYPE_SOCKET_FILTER that is used
for attaching programs to sockets where ctx == skb.

add verifier checks for ABS/IND instructions which can only be seen
in socket filters, therefore the check:
  if (env->prog->aux->prog_type != BPF_PROG_TYPE_SOCKET_FILTER)
    verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddd872bc

03 12月, 2014 11 次提交

netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace · a51b9199

由 Jozsef Kadlecsik 提交于 11月 30, 2014

Sven-Haegar Koch reported the issue:

sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.

In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32

which was introduced by the counter extension in ipset.

The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a51b9199

bridge: add brport flags to dflt bridge_getlink · 2c3c031c