- 25 December 2016, 1 commit
-
-
Committed by Linus Torvalds

This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT" | grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 December 2016, 1 commit
-
-
Committed by Ido Schimmel

neigh_cleanup_and_release() is always called after marking a neighbour as dead, but it only notifies user space and not in-kernel listeners on the netevent notification chain. This can cause multiple problems. In my specific use case, it causes the listener (a switch driver capable of L3 offloads) to believe a neighbour entry is still valid, so the entry is erroneously kept in the device's table. Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
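A rough sketch of the shape of this fix, assuming the netevent call lands directly in neigh_cleanup_and_release() (the exact placement is my assumption, not taken from the patch):

    static void neigh_cleanup_and_release(struct neighbour *neigh)
    {
        /* user-space notification, as before */
        __neigh_notify(neigh, RTM_DELNEIGH, 0);
        /* new: also tell in-kernel listeners such as switch drivers */
        call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
        neigh_release(neigh);
    }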
-
- 18 December 2016, 1 commit
-
-
Committed by John Fastabend

This adds a warning for drivers to use when encountering an invalid buffer for XDP. This should not happen in normal cases, but a standard warning is useful for catching unexpected behaviour from the emulation layer in virtual/qemu setups.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 December 2016, 1 commit
-
-
Committed by Paul Moore

Bring back commit bc51dddf ("netns: avoid disabling irq for netns id") now that we've fixed some audit multicast issues that caused problems with the original attempt. Additional information, and history, can be found in the links below:

* https://github.com/linux-audit/audit-kernel/issues/22
* https://github.com/linux-audit/audit-kernel/issues/23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-
- 10 December 2016, 1 commit
-
-
Committed by Eric Dumazet

It seems attackers can also send UDP packets with no payload at all. skb_condense() can still be a win in this case. It will then be possible to replace the custom code in tcp_add_backlog() to get the full benefit of skb_condense().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 December 2016, 4 commits
-
-
Committed by Martin KaFai Lau

This patch allows an XDP prog to extend/remove the packet data at the head (like adding or removing a header). It is done by adding a new XDP helper, bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that an XDP prog does not work on skbs.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
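For illustration, a minimal XDP program using the new helper to strip a hypothetical 4-byte tag from the front of each packet (the program and the tag size are made up for this sketch):

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_strip_tag(struct xdp_md *ctx)
    {
        /* a positive delta removes bytes at the head */
        if (bpf_xdp_adjust_head(ctx, 4))
            return XDP_DROP;

        /* pointers must be re-validated after the adjustment */
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        if (data + sizeof(struct ethhdr) > data_end)
            return XDP_DROP;
        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";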
-
Committed by Eric Dumazet

Under UDP flood, many softirq producers try to add packets to the UDP receive queue, while one user thread burns one cpu trying to dequeue packets as fast as possible. Two parts of the per-packet cost are:
- copying payload from kernel space to user space,
- freeing memory pieces associated with the skb.

If the socket is under pressure, the softirq handler(s) can try to pull the payload of the packet into skb->head if it fits. This means the softirq handler(s) can free/reuse the page fragment immediately, instead of letting udp_recvmsg() do this hundreds of usec later, possibly from another node.

Additional gains:
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF.
- We avoid cache line misses at copyout() time and consume_skb() time, and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to the available tail room, which is usually small. (We might have to fix GRO_MAX_HEAD, which looks bigger than necessary.)

This patch gave me about a 5% increase in throughput in my tests.

The skb_condense() helper could probably be used in other contexts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
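A condensed sketch of what such a helper boils down to (simplified; the real checks around clones and tailroom are more careful than this):

    void skb_condense(struct sk_buff *skb)
    {
        if (skb->data_len) {
            /* the paged payload must fit the existing tailroom */
            if (skb->data_len > skb->end - skb->tail ||
                skb_cloned(skb))
                return;
            /* pull it in: the page frag(s) can be freed right now */
            __pskb_pull_tail(skb, skb->data_len);
        }
        /* truesize now only accounts for skb->head */
        skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
    }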
-
Committed by Eric Dumazet

RFS is not commonly used, so add a jump label to avoid some conditionals in the fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
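A sketch of the jump-label pattern this describes (the key name rfs_needed and the helper names are assumptions):

    DEFINE_STATIC_KEY_FALSE(rfs_needed);

    /* fast path: compiles to a NOP until the key is flipped on */
    static inline void sock_rps_record_flow(const struct sock *sk)
    {
        if (static_branch_unlikely(&rfs_needed))
            sock_rps_record_flow_hash(sk->sk_rxhash);
    }

    /* control path: enabled only once RFS tables are configured */
    void rps_flow_tables_configured(void)
    {
        static_branch_enable(&rfs_needed);
    }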
-
Committed by Simon Horman

Allow dissection of ICMP(V6) type and code. This should only occur if the packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set.

There are currently no users of FLOW_DISSECTOR_KEY_ICMP. A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by the flower classifier.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 December 2016, 3 commits
-
-
Committed by Eric Dumazet

In the UDP recvmsg() path we currently access 3 cache lines from an skb while holding the receive queue lock, plus another one if the packet is dequeued, since we need to change skb->next->prev:

    1st cache line (contains the ->next/->prev pointers, offsets 0x00 and 0x08)
    2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
    3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)

skb->peeked is only needed to make sure 0-length packets are properly handled while MSG_PEEK is in use. My first intent was to remove skb->peeked, but the "MSG_PEEK at non-zero offset" support added by Sam Kumar makes this impossible.

This patch avoids one cache line miss during the locked section, when skb->len and skb->peeked do not have to be read. It also avoids the skb_set_peeked() cost for non-empty UDP datagrams.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Borkmann

Commit d691f9e8 ("bpf: allow programs to write to certain skb fields") pushed the access type check outside of __is_valid_access() to allow different restrictions for socket filters and tc programs. type is thus not used anymore within __is_valid_access() and should be removed as a function argument. Same for __is_valid_xdp_access(), introduced by 6a773a15 ("bpf: add XDP prog type for early driver filter").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

1) Old code was hard to maintain, due to complex lock chains. (We probably will be able to remove some kfree_rcu() in callers.)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernels (WRITE_ONCE() on a 64bit quantity is not supposed to work well).

In this rewrite:

- I removed the RB tree that had to be scanned in gen_estimator_active(); qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in a net_rate_estimator structure, instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper support for 32bit kernels.
- We reduce memory needs when estimators are not used, since we store a pointer instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock. (In the future, xt_rateest_tg() could be switched to per cpu counters.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
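The 32bit support mentioned above hinges on a seqcount-protected read; a sketch (field names are assumptions):

    /* torn-read-safe snapshot of 64bit averages on 32bit kernels */
    static void est_read(const struct net_rate_estimator *est,
                         struct gnet_stats_rate_est64 *sample)
    {
        unsigned int seq;

        do {
            seq = read_seqcount_begin(&est->seq);
            sample->bps = est->avbps >> 8;
            sample->pps = est->avpps >> 8;
        } while (read_seqcount_retry(&est->seq, seq));
    }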
-
- 04 December 2016, 4 commits
-
-
Committed by Eric Dumazet

Under heavy stress, the timer used in estimators tends to slowly be delayed by a few jiffies, leading to inaccuracies. Let's remember what the last scheduled jiffies value was, so that we get more precise estimations without having to add a multiply/divide in the loop to account for the drifts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Alexey Dobriyan

The net_generic() function is both a) inline and b) used ~600 times. It has the following code inside:

    ...
    ptr = ng->ptr[id - 1];
    ...

"id" is never a compile time constant, so the compiler is forced to subtract 1, and those decrements or LEA [r32 - 1] instructions add up.

We also start id'ing from 1 to catch bugs where a pernet subsystem id is not initialized and is still 0. This is a quite pointless idea (nothing will work, or there will be immediate interference with the first registered subsystem) in general, but it hints at what needs to be done for code size reduction: namely, overlaying the allocation of the pointer array and the fixed part of the structure at the beginning, and using usual base-0 addressing. Ids are just cookies, their exact values do not matter, so let's start with 3 on x86_64.

Code size savings (oh boy): -4.2 KB. As usual, ignore the initial compiler stupidity part of the table.

    add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208)
    function                         old    new  delta
    tipc_nametbl_insert_publ        1250   1270    +20
    nlmclnt_lookup_host              686    703    +17
    nfsd4_encode_fattr              5930   5941    +11
    nfs_get_client                  1050   1061    +11
    register_pernet_operations       333    342     +9
    tcf_mirred_init                  843    849     +6
    tcf_bpf_init                    1143   1149     +6
    gss_setup_upcall                 990    994     +4
    idmap_name_to_id                 432    434     +2
    ops_init                         274    275     +1
    nfsd_inject_forget_client        259    260     +1
    nfs4_alloc_client                612    613     +1
    tunnel_key_walker                164    163     -1
    ...
    tipc_bcbase_select_primary       392    360    -32
    mac80211_hwsim_new_radio        2808   2767    -41
    ipip6_tunnel_ioctl              2228   2186    -42
    tipc_bcast_rcv                   715    672    -43
    tipc_link_build_proto_msg       1140   1089    -51
    nfsd4_lock                      3851   3796    -55
    tipc_mon_rcv                    1012    956    -56
    Total: Before=156643951, After=156639743, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
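A sketch of the resulting layout, where the fixed header overlays the first pointer slots and lookups use plain base-0 indexing (simplified from the description above; field names are assumptions):

    struct net_generic {
        union {
            struct {
                unsigned int len;   /* number of slots */
                struct rcu_head rcu;
            } s;
            void *ptr[0];   /* the header occupies the first few slots,
                             * real subsystem ids start after it */
        };
    };

    static inline void *net_generic(const struct net *net, unsigned int id)
    {
        /* callers run under rcu_read_lock() */
        struct net_generic *ng = rcu_dereference(net->gen);

        return ng->ptr[id];     /* no "id - 1" subtraction anymore */
    }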
-
Committed by Alexey Dobriyan

This is a precursor to fixing the "[id - 1]" bloat inside net_generic(). The name "s" is chosen to complement the name "u", often used for dummy unions.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Alexey Dobriyan

Publishing the net_generic pointer is done with a silly mistake: the new array is published BEFORE setting the freshly acquired pernet subsystem pointer:

    memcpy
    rcu_assign_pointer
    kfree_rcu

    ng->ptr[id - 1] = data;

This bug was introduced with commit dec827d1 ("[NETNS]: The generic per-net pointers.") in the glorious days of chopping the networking stack into containers proper, 8.5 years ago (whee...).

How did it not trigger for so long? Well, you need a quite specific set of conditions:

*) the race window opens once per pernet subsystem addition (read: modprobe or boot),
*) not every pernet subsystem is eligible (need ->id and ->size),
*) not every pernet subsystem is vulnerable (need incorrect ordering, or absence of ordering, of register_pernet_subsys() and actually using net_generic()),
*) to hide the bug even more, the default is to preallocate 13 pointers, which is actually quite a lot. You need IPv6, netfilter, bridging etc. loaded together to trigger reallocation in the first place. Trimmed-down configs are OK.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
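In other words, the fix is to fill the slot before publishing the new array; roughly (simplified, variable names assumed):

    /* buggy: readers can observe ng before the slot is written */
    memcpy(ng->ptr, old_ng->ptr, old_count * sizeof(void *));
    rcu_assign_pointer(net->gen, ng);
    kfree_rcu(old_ng, rcu);
    ng->ptr[id - 1] = data;

    /* fixed: write the slot first, then publish */
    memcpy(ng->ptr, old_ng->ptr, old_count * sizeof(void *));
    ng->ptr[id - 1] = data;
    rcu_assign_pointer(net->gen, ng);
    kfree_rcu(old_ng, rcu);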
-
- 03 December 2016, 5 commits
-
-
Committed by Eric Dumazet

CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM...

Note that before commit 82981930 ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable.

This needs to be backported to all known linux kernels.

Again, many thanks to the syzkaller team for discovering this gem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
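The shape of the check this implies, sketched against sock_setsockopt() (a signed max so negative values cannot slip through; simplified, not the literal patch):

    case SO_SNDBUF:
        val = min_t(u32, val, sysctl_wmem_max);
        sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
        /* max_t(int, ...) rather than max_t(u32, ...): a negative
         * val must clamp to SOCK_MIN_SNDBUF instead of wrapping to
         * a huge unsigned buffer size */
        sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
        break;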
-
Committed by David Ahern

Add socket family, type and protocol to bpf_sock, allowing bpf programs read-only access.

Add __sk_flags_offset[0] to struct sock before the bitfield to programmatically determine the offset of the unsigned int containing protocol and type.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
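For illustration, a toy program reading the new read-only fields (the policy and the verdict semantics here are assumptions for the sketch):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define AF_INET6   10  /* from the socket ABI */
    #define SOCK_DGRAM 2

    SEC("cgroup/sock")
    int sock_inspect(struct bpf_sock *sk)
    {
        /* family, type and protocol are readable, not writable */
        if (sk->family == AF_INET6 && sk->type == SOCK_DGRAM)
            return 0;   /* reject IPv6 datagram sockets */
        return 1;       /* allow everything else */
    }

    char _license[] SEC("license") = "GPL";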
-
Committed by David Ahern

Add a new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB, programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
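Continuing the sketch style above, the VRF use case: every socket opened in the cgroup gets bound to a fixed interface (the ifindex is a placeholder):

    SEC("cgroup/sock")
    int bind_to_l3_domain(struct bpf_sock *sk)
    {
        sk->bound_dev_if = 42;  /* placeholder ifindex of the VRF device */
        return 1;
    }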
-
Committed by Florian Westphal

Eric says: "By looking at tcpdump, and the TS val of xmit packets of multiple flows, we can deduce the relative qdisc delays (think of fq pacing). This should work even if we have one flow per remote peer." Having random per-flow (or per-host) offsets doesn't allow that anymore, so add a way to turn this off.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Florian Westphal

jiffies-based timestamps allow for easy inference of the number of devices behind NAT translators and also make tracking of hosts simpler.

commit ceaa1fef ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization; in particular, writing/reading the on-wire tcp header format takes the offset into account, so the rest of the stack can use normal tcp_time_stamp (jiffies).

So only two items are left:
- add a tsoffset for request sockets
- extend the tcp isn generator to also return another 32bit number in addition to the ISN.

Re-use of the ISN generator also means timestamps are still monotonically increasing for the same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 December 2016, 4 commits
-
-
Committed by Soheil Hassas Yeganeh

Only when ICMP packets are enqueued onto the error queue is sk_err also set. Before f5f99309 ("sock: do not set sk_err in sock_dequeue_err_skb"), a subsequent error queue read would set sk_err to the next error on the queue, or 0 if empty. As no error types other than ICMP set this field, sk_err should not be modified upon dequeuing them.

Only for ICMP errors, reset the (racy) sk_err. Some applications, like traceroute, rely on it and go into a futile busy POLLERR loop otherwise.

In principle, sk_err has to be set while an ICMP error is queued. Testing is_icmp_err_skb(skb_next) approximates this without requiring a full queue walk. Applications that receive both ICMP and other errors cannot rely on this legacy behavior, as other errors do not set sk_err in the first place.

Fixes: f5f99309 ("sock: do not set sk_err in sock_dequeue_err_skb")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
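The approximation mentioned can be as small as checking the origin of a queued error; a sketch of such a helper:

    static bool is_icmp_err_skb(const struct sk_buff *skb)
    {
        /* an skb on the error queue carries its origin in the
         * sock_extended_err stored in skb->cb */
        return skb && (SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
                       SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP6);
    }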
-
Committed by Thomas Graf

Registers new BPF program types which correspond to the LWT hooks:

- BPF_PROG_TYPE_LWT_IN   => dst_input()
- BPF_PROG_TYPE_LWT_OUT  => dst_output()
- BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()

The separate program types are required to differentiate between the capabilities each LWT hook allows:

* Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevents modification and possible invalidation of already validated packet headers on receive, and the construction of illegal headers while the IP headers are still being assembled.

* Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepend an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely.

All BPF programs receive an skb with L3 headers attached and may return one of the following error codes:

    BPF_OK       - Continue routing as per nexthop
    BPF_DROP     - Drop skb and return EPERM
    BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                   (Only valid in lwtunnel_xmit() context)

The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
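As an illustration, a minimal lwtunnel_xmit program that prepends a made-up Ethernet header with the new helper (MAC addresses and the section name are placeholders):

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>

    SEC("lwt_xmit")
    int push_l2(struct __sk_buff *skb)
    {
        struct ethhdr eth = {
            .h_dest   = { 0x02, 0, 0, 0, 0, 0x01 },
            .h_source = { 0x02, 0, 0, 0, 0, 0x02 },
            .h_proto  = __constant_htons(ETH_P_IP),
        };

        /* prepending an L2 header is only legal in the xmit hook */
        if (bpf_skb_change_head(skb, sizeof(eth), 0))
            return BPF_DROP;
        if (bpf_skb_store_bytes(skb, 0, &eth, sizeof(eth), 0))
            return BPF_DROP;
        return BPF_OK;
    }

    char _license[] SEC("license") = "GPL";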
-
Committed by Tobias Klauser

Use the correct attribute constant names IFLA_GSO_MAX_{SEGS,SIZE} instead of IFLA_MAX_GSO_{SEGS,SIZE} in the comments in nlmsg_size().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Zhang Shengju

Before this patch, the function ndo_dflt_fdb_dump() would always return the code from the uc fdb dump; the return code of the mc fdb dump was lost.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 December 2016, 1 commit
-
-
Committed by Zhang Shengju

Currently the loop index 'idx' is used as the index in the neigh list of interest. It is increased only when a neigh is dumped, so it is not the absolute index in the list, and there is no info to record which neighs have already been scanned by previous loops. This causes filtered-out neighs to be scanned multiple times.

This patch makes idx the absolute index in the list: it increases regardless of whether the neigh is filtered. This prevents the above problem, and is in line with other dump functions.

v2:
- take David Ahern's advice to do a simple change

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 November 2016, 2 commits
-
-
Committed by Daniel Borkmann

Add an IFLA_XDP_FLAGS attribute that can be passed for setting up XDP along with IFLA_XDP_FD, which eventually allows user space to implement typical add/replace/delete logic for programs. Right now, calling into dev_change_xdp_fd() will always replace previous programs. When passed XDP_FLAGS_UPDATE_IF_NOEXIST, we can handle this more gracefully when requested, by returning -EBUSY in case we try to attach a new program but find that another one is already attached. This will be used by an upcoming front-end for iproute2 as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
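From user space, the new attribute nests under IFLA_XDP next to the program fd; a libnl-flavoured sketch (error handling omitted, exact API usage assumed):

    struct nlattr *xdp = nla_nest_start(msg, IFLA_XDP);

    nla_put_s32(msg, IFLA_XDP_FD, prog_fd);  /* fd of the loaded program */
    /* ask the kernel to fail with -EBUSY instead of replacing */
    nla_put_u32(msg, IFLA_XDP_FLAGS, XDP_FLAGS_UPDATE_IF_NOEXIST);
    nla_nest_end(msg, xdp);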
-
Committed by Francis Yan

This patch exports the sender chronograph stats via the socket SO_TIMESTAMPING channel. Currently we can instrument how long a particular application unit of data was queued in TCP by tracking SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having these sender chronograph stats exported simultaneously along with these timestamps allows further breaking down the various sender limitations. For example, a video server can tell if a particular chunk of video on a connection takes a long time to deliver because TCP was experiencing a small receive window. This is not possible to tell without packet traces before this patch.

To prepare these stats, the user needs to set the SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags while requesting other SOF_TIMESTAMPING TX timestamps. When the timestamps are available in the error queue, the stats are returned in a separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME, TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. The unit is microseconds.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
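A user-space sketch of requesting the stats and reading one of them (simplified, error handling omitted):

    int val = SOF_TIMESTAMPING_TX_SCHED | SOF_TIMESTAMPING_TX_SOFTWARE |
              SOF_TIMESTAMPING_SOFTWARE |
              SOF_TIMESTAMPING_OPT_TSONLY | SOF_TIMESTAMPING_OPT_STATS;

    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

    /* later, after recvmsg(fd, &msg, MSG_ERRQUEUE) returned >= 0: */
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET &&
            cm->cmsg_type == SCM_TIMESTAMPING_OPT_STATS) {
            /* payload is a list of struct nlattr TLVs */
            struct nlattr *nla = (struct nlattr *)CMSG_DATA(cm);

            if (nla->nla_type == TCP_NLA_BUSY_TIME) {
                __u64 busy_us;

                memcpy(&busy_us, (char *)nla + NLA_HDRLEN,
                       sizeof(busy_us));
                printf("busy: %llu us\n",
                       (unsigned long long)busy_us);
            }
        }
    }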
-
- 28 November 2016, 1 commit
-
-
Committed by Daniel Borkmann

Commit dcf80034 ("net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's also useful elsewhere, move it to if_arp.h and reuse it for BPF.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 November 2016, 4 commits
-
-
Committed by Miroslav Lichvar

The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET command, and likewise it shouldn't require the CAP_NET_ADMIN capability.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

Typical NAPI drivers use napi_consume_skb(skb) at TX completion time. This puts the skb in a percpu special queue, napi_alloc_cache, to get bulk frees. It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE limit quite often, with skbs that were queued hundreds of usec earlier. I measured that one flush can take ~6000 nsec.

__kfree_skb_flush() can be called from two points right now:

1) From net_tx_action(), but only for skbs that were queued to sd->completion_queue. -> Irrelevant for NAPI drivers in normal operation.

2) From net_rx_action(), but only under high stress or if RPS/RFS has a pending action.

This patch changes net_rx_action() to perform the flush in all cases, and after the more urgent operations have happened (like kicking remote CPUs for RPS/RFS).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Mack

If the cgroup associated with the receiving socket has eBPF programs installed, run them from sk_filter_trim_cap().

eBPF programs used in this context are expected to either return 1 to let the packet pass, or != 1 to drop it. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3).

Note that cgroup_bpf_run_filter() is stubbed out as a static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Mack

This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that it does not allow BPF_LD_[ABS|IND] instructions and hooks up the bpf_skb_load_bytes() helper.

Programs of this type will be attached to cgroups for network filtering and accounting.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
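A toy program of this type, using the semantics described in the entry above (return 1 to pass, anything else to drop; payload starts at L3):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("cgroup/skb")
    int cg_skb_filter(struct __sk_buff *skb)
    {
        __u8 ver;

        /* read the first payload byte; offset 0 is the IP header */
        if (bpf_skb_load_bytes(skb, 0, &ver, 1) < 0)
            return 1;   /* unreadable: let it pass */
        if ((ver >> 4) == 6)
            return 0;   /* drop IPv6 in this cgroup */
        return 1;       /* pass */
    }

    char _license[] SEC("license") = "GPL";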
-
- 25 November 2016, 4 commits
-
-
Committed by Florian Fainelli

PHY drivers should be able to rely on the caller of {get,set}_tunable having acquired the PHY device mutex, in order to both serialize against concurrent calls of these functions and against PHY state machine changes. All ethtool PHY-level functions do this, except {get,set}_tunable, so we make them consistent here as well.

We need to update the Microsemi PHY driver in the same commit to avoid introducing either deadlocks or a lack of proper locking.

Fixes: 968ad9da ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
Fixes: 310d9ad5 ("net: phy: Add downshift get/set support in Microsemi PHYs driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
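The convention being enforced, sketched from the ethtool core's side (the surrounding function is omitted, so this is a fragment):

    /* the core takes the PHY mutex, so drivers never have to */
    mutex_lock(&phydev->lock);
    ret = phydev->drv->get_tunable(phydev, &tuna, data);
    mutex_unlock(&phydev->lock);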
-
Committed by Roi Dayan

Some HWs need the VF driver to put part of the packet headers on the TX descriptor so the e-switch can do proper matching and steering. The supported modes are: none, link, network, transport.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Or Gerlitz

Some drivers need to check a few internal matters for that. To be used in a downstream mlx5 commit.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

In commit 2331ccc5 ("tcp: enhance tcp collapsing"), we made a first step allowing copying the right skb into the left skb head. Since all skbs in the socket write queue are headless (except possibly the very first one), this strategy often does not work.

This patch extends tcp_collapse_retrans() to perform frag shifting, thanks to the skb_shift() helper. This helper needs to not BUG on non-headless skbs, as callers are OK with that.

Tested: the following packetdrill test now passes:

    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
       +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
       +0 bind(3, ..., ...) = 0
       +0 listen(3, 1) = 0

       +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
       +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
    +.100 < . 1:1(0) ack 1 win 257
       +0 accept(3, ..., ...) = 4

       +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
       +0 write(4, ..., 200) = 200
       +0 > P. 1:201(200) ack 1
    +.001 write(4, ..., 200) = 200
       +0 > P. 201:401(200) ack 1
    +.001 write(4, ..., 200) = 200
       +0 > P. 401:601(200) ack 1
    +.001 write(4, ..., 200) = 200
       +0 > P. 601:801(200) ack 1
    +.001 write(4, ..., 200) = 200
       +0 > P. 801:1001(200) ack 1
    +.001 write(4, ..., 100) = 100
       +0 > P. 1001:1101(100) ack 1
    +.001 write(4, ..., 100) = 100
       +0 > P. 1101:1201(100) ack 1
    +.001 write(4, ..., 100) = 100
       +0 > P. 1201:1301(100) ack 1
    +.001 write(4, ..., 100) = 100
       +0 > P. 1301:1401(100) ack 1
    +.099 < . 1:1(0) ack 201 win 257
    +.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401>
       +0 > P. 201:1001(800) ack 1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 November 2016, 1 commit
-
-
Committed by Zhang Shengju

For RT netlink, the calcit() function should return the minimal size for the netlink dump message. This makes sure that the dump message for every network device can be stored.

Currently, the rtnl_calcit() function doesn't account for the size of the netlink message header; this patch fixes that.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 November 2016, 2 commits
-
-
Committed by Miroslav Urbanek

The threshold for OOM protection is too small for systems with a large number of CPUs. Applications report ENOBUFs on connect() every 10 minutes.

The problem is that the variable net->xfrm.flow_cache_gc_count is a global counter, while the variable fc->high_watermark is a per-CPU constant. Take the number of CPUs into account as well.

Fixes: 6ad3122a ("flowcache: Avoid OOM condition under preasure")
Reported-by: Lukáš Koldrt <lk@excello.cz>
Tested-by: Jan Hejl <jh@excello.cz>
Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
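A sketch of a CPU-scaled check (the exact multiplier here is my assumption, not the patch's):

    /* refuse new entries once the global GC backlog exceeds a
     * watermark scaled by the number of CPUs */
    if (atomic_read(&net->xfrm.flow_cache_gc_count) >
        2 * num_online_cpus() * fc->high_watermark)
        return NULL;    /* caller sees this as ENOBUFS pressure */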
-
Committed by Eric Dumazet

Andre Noll reported panics after my recent fix (commit 34fad54c "net: __skb_flow_dissect() must cap its return value"). After some more headaches, Alexander root-caused the problem to init_default_flow_dissectors() being called too late, in case a network driver like IGB is not a module and receives a DHCP message very early.

The fix is to call init_default_flow_dissectors() much earlier, as it is core infrastructure and does not depend on any other kernel service.

Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andre Noll <maan@tuebingen.mpg.de>
Diagnosed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
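The nature of the fix, sketched with the standard initcall levels (the chosen level is my assumption of what "much earlier" means here):

    /* flow dissector setup depends on no other kernel service, so it
     * can run at core_initcall time, well before device drivers
     * (typically device/late initcalls) can feed it packets */
    core_initcall(init_default_flow_dissectors);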
-