- 27 8月, 2014 1 次提交
-
-
由 Christoph Lameter 提交于
Replace uses of get_cpu_var for address calculation through this_cpu_ptr. Cc: netdev@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 12 8月, 2014 1 次提交
-
-
由 Vlad Yasevich 提交于
Currently the functionality to untag traffic on input resides as part of the vlan module and is build only when VLAN support is enabled in the kernel. When VLAN is disabled, the function vlan_untag() turns into a stub and doesn't really untag the packets. This seems to create an interesting interaction between VMs supporting checksum offloading and some network drivers. There are some drivers that do not allow the user to change tx-vlan-offload feature of the driver. These drivers also seem to assume that any VLAN-tagged traffic they transmit will have the vlan information in the vlan_tci and not in the vlan header already in the skb. When transmitting skbs that already have tagged data with partial checksum set, the checksum doesn't appear to be updated correctly by the card thus resulting in a failure to establish TCP connections. The following is a packet trace taken on the receiver where a sender is a VM with a VLAN configued. The host VM is running on doest not have VLAN support and the outging interface on the host is tg3: 10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294837885 ecr 0,nop,wscale 7], length 0 10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294838888 ecr 0,nop,wscale 7], length 0 This connection finally times out. I've only access to the TG3 hardware in this configuration thus have only tested this with TG3 driver. There are a lot of other drivers that do not permit user changes to vlan acceleration features, and I don't know if they all suffere from a similar issue. The patch attempt to fix this another way. It moves the vlan header stipping code out of the vlan module and always builds it into the kernel network core. This way, even if vlan is not supported on a virtualizatoin host, the virtual machines running on top of such host will still work with VLANs enabled. CC: Patrick McHardy <kaber@trash.net> CC: Nithin Nayak Sujir <nsujir@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com> Acked-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2014 1 次提交
-
-
由 Jiri Benc 提交于
Commit 1d8faf48 ("net/core: Add VF link state control") added new attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we may trigger warnings in rtnl_getlink and similar functions when many VF links are enabled, as the information does not fit into the allocated skb. Fixes: 1d8faf48 ("net/core: Add VF link state control") Reported-by: NYulong Pei <ypei@redhat.com> Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2014 5 次提交
-
-
由 Willem de Bruijn 提交于
TCP timestamping extends SO_TIMESTAMPING to bytestreams. Bytestreams do not have a 1:1 relationship between send() buffers and network packets. The feature interprets a send call on a bytestream as a request for a timestamp for the last byte in that send() buffer. The choice corresponds to a request for a timestamp when all bytes in the buffer have been sent. That assumption depends on in-order kernel transmission. This is the common case. That said, it is possible to construct a traffic shaping tree that would result in reordering. The guarantee is strong, then, but not ironclad. This implementation supports send and sendpages (splice). GSO replaces one large packet with multiple smaller packets. This patch also copies the option into the correct smaller packet. This patch does not yet support timestamping on data in an initial TCP Fast Open SYN, because that takes a very different data path. If ID generation in ee_data is enabled, bytestream timestamps return a byte offset, instead of the packet counter for datagrams. The implementation supports a single timestamp per packet. It silenty replaces requests for previous timestamps. To avoid missing tstamps, flush the tcp queue by disabling Nagle, cork and autocork. Missing tstamps can be detected by offset when the ee_data ID is enabled. Implementation details: - On GSO, the timestamping code can be included in the main loop. I moved it into its own loop to reduce the impact on the common case to a single branch. - To avoid leaking the absolute seqno to userspace, the offset returned in ee_data must always be relative. It is an offset between an skb and sk field. The first is always set (also for GSO & ACK). The second must also never be uninitialized. Only allow the ID option on sockets in the ESTABLISHED state, for which the seqno is available. Never reset it to zero (instead, move it to the current seqno when reenabling the option). Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Kernel transmit latency is often incurred in the packet scheduler. Introduce a new timestamp on transmission just before entering the scheduler. When data travels through multiple devices (bonding, tunneling, ...) each device will export an individual timestamp. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
sk_flags is reaching its limit. New timestamping options will not fit. Move all of them into a new field sk->sk_tsflags. Added benefit is that this removes boilerplate code to convert between SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt. SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive timestamp logic (netstamp_needed). That can be simplified and this last key removed, but will leave that for a separate patch. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs, though that scatters tstamp fields throughout the struct. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Applications that request kernel tx timestamps with SO_TIMESTAMPING read timestamps as recvmsg() ancillary data. The response is defined implicitly as timespec[3]. 1) define struct scm_timestamping explicitly and 2) add support for new tstamp types. On tx, scm_timestamping always accompanies a sock_extended_err. Define previously unused field ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to define the existing behavior. The reception path is not modified. On rx, no struct similar to sock_extended_err is passed along with SCM_TIMESTAMPING. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 8月, 2014 5 次提交
-
-
由 Alexei Starovoitov 提交于
clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
to indicate that this function is converting classic BPF into eBPF and not related to sockets Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
trivial rename to indicate that this functions performs classic BPF checking Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
trivial rename to better match semantics of macro Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
attaching bpf program to a socket involves multiple socket memory arithmetic, since size of 'sk_filter' is changing when classic BPF is converted to eBPF. Also common path of program creation has to deal with two ways of freeing the memory. Simplify the code by delaying socket charging until program is ready and its size is known Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2014 1 次提交
-
-
由 Vlad Yasevich 提交于
When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in then new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: NVlad Yasevich <vyasevic@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2014 2 次提交
-
-
由 Pablo Neira 提交于
sk_unattached_filter_destroy() does not always need to release the filter object via rcu. Since this filter is never attached to the socket, the caller should be responsible for releasing the filter in a safe way, which may not necessarily imply rcu. This is a short summary of clients of this function: 1) xt_bpf.c and cls_bpf.c use the bpf matchers from rules, these rules are removed from the packet path before the filter is released. Thus, the framework makes sure the filter is safely removed. 2) In the ppp driver, the ppp_lock ensures serialization between the xmit and filter attachment/detachment path. This doesn't use rcu so deferred release via rcu makes no sense. 3) In the isdn/ppp driver, it is called from isdn_ppp_release() the isdn_ppp_ioctl(). This driver uses mutex and spinlocks, no rcu. Thus, deferred rcu makes no sense to me either, the deferred releases may be just masking the effects of wrong locking strategy, which should be fixed in the driver itself. 4) In the team driver, this is the only place where the rcu synchronization with unattached filter is used. Therefore, this patch introduces synchronize_rcu() which is called from the genetlink path to make sure the filter doesn't go away while packets are still walking over it. I think we can revisit this once struct bpf_prog (that only wraps specific bpf code bits) is in place, then add some specific struct rcu_head in the scope of the team driver if Jiri thinks this is needed. Deferred rcu release for unattached filters was originally introduced in 302d6637 ("filter: Allow to create sk-unattached filters"). Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
No need for the unlikely(), WARN_ON() and BUG_ON() internally use unlikely() on the condition. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2014 3 次提交
-
-
由 Eric W. Biederman 提交于
The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit cf7b708c Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Andrey Ryabinin 提交于
Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d33426 (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Reported-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
The SO_TIMESTAMPING API defines three types of timestamps: software, hardware in raw format (hwtstamp) and hardware converted to system format (syststamp). The last has been deprecated in favor of combining hwtstamp with a PTP clock driver. There are no active users in the kernel. The option was device driver dependent. If set, but without hardware support, the correct behavior is to return zero in the relevant field in the SCM_TIMESTAMPING ancillary message. Without device drivers implementing the option, this field is effectively always zero. Remove the internal plumbing to dissuage new drivers from implementing the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to avoid breaking existing applications that request the timestamp. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 7月, 2014 1 次提交
-
-
由 Jun Zhao 提交于
ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST. NDA_DST is for netlink TLV type, hence it's not right value in this context. Signed-off-by: NJun Zhao <mypopydev@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2014 2 次提交
-
-
由 WANG Cong 提交于
"net" is normally for struct net*, pointer to struct net_device should be named to either "dev" or "ndev" etc. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
eBPF is used by socket filtering, seccomp and soon by tracing and exposed to userspace, therefore 'sock_filter_int' name is not accurate. Rename it to 'bpf_insn' Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 7月, 2014 2 次提交
-
-
由 Alexei Starovoitov 提交于
BPF is used in several kernel components. This split creates logical boundary between generic eBPF core and the rest kernel/bpf/core.c: eBPF interpreter net/core/filter.c: classic->eBPF converter, classic verifiers, socket filters This patch only moves functions. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sorin Dumitru 提交于
It hasn't been used since commit 0fd7bac6(net: relax rcvbuf limits). Signed-off-by: NSorin Dumitru <sorin@returnze.ro> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 7月, 2014 2 次提交
-
-
由 Veaceslav Falico 提交于
Currently it's done silently (from the kernel part), and thus it might be hard to track the renames from logs. Add a simple netdev_info() to notify the rename, but only in case the previous name was valid. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: NVeaceslav Falico <vfalico@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Veaceslav Falico 提交于
This way we'll always know in what status the device is, unless it's running normally (i.e. NETDEV_REGISTERED). Also, emit a warning once in case of a bad reg_state. CC: "David S. Miller" <davem@davemloft.net> CC: Jason Baron <jbaron@akamai.com> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Joe Perches <joe@perches.com> Signed-off-by: NVeaceslav Falico <vfalico@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 7月, 2014 3 次提交
-
-
由 Alexander Duyck 提交于
This change cleans up ndo_dflt_fdb_del to drop the ENOTSUPP return value since that isn't actually returned anywhere in the code. As a result we are able to drop a few lines by just defaulting this to -EINVAL. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 françois romieu 提交于
Signed-off-by: NFrancois Romieu <romieu@fr.zoreil.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jerry Chu 提交于
Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: NH.K. Jerry Chu <hkchu@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 7月, 2014 6 次提交
-
-
由 Fabian Frederick 提交于
Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Fabian Frederick 提交于
Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Gundersen 提交于
This passes down NET_NAME_USER (or NET_NAME_ENUM) to alloc_netdev(), for any device created over rtnetlink. v9: restore reverse-christmas-tree order of local variables Signed-off-by: NTom Gundersen <teg@jklm.no> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Gundersen 提交于
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: NTom Gundersen <teg@jklm.no> Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Gundersen 提交于
Based on a patch from David Herrmann. This is the only place devices can be renamed. v9: restore revers-christmas-tree order of local variables Signed-off-by: NTom Gundersen <teg@jklm.no> Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Gundersen 提交于
Based on a patch by David Herrmann. The name_assign_type attribute gives hints where the interface name of a given net-device comes from. These values are currently defined: NET_NAME_ENUM: The ifname is provided by the kernel with an enumerated suffix, typically based on order of discovery. Names may be reused and unpredictable. NET_NAME_PREDICTABLE: The ifname has been assigned by the kernel in a predictable way that is guaranteed to avoid reuse and always be the same for a given device. Examples include statically created devices like the loopback device and names deduced from hardware properties (including being given explicitly by the firmware). Names depending on the order of discovery, or in any other way on the existence of other devices, must not be marked as PREDICTABLE. NET_NAME_USER: The ifname was provided by user-space during net-device setup. NET_NAME_RENAMED: The net-device has been renamed from userspace. Once this type is set, it cannot change again. NET_NAME_UNKNOWN: This is an internal placeholder to indicate that we yet haven't yet categorized the name. It will not be exposed to userspace, rather -EINVAL is returned. The aim of these patches is to improve user-space renaming of interfaces. As a general rule, userspace must rename interfaces to guarantee that names stay the same every time a given piece of hardware appears (at boot, or when attaching it). However, there are several situations where userspace should not perform the renaming, and that depends on both the policy of the local admin, but crucially also on the nature of the current interface name. If an interface was created in repsonse to a userspace request, and userspace already provided a name, we most probably want to leave that name alone. The main instance of this is wifi-P2P devices created over nl80211, which currently have a long-standing bug where they are getting renamed by udev. We label such names NET_NAME_USER. If an interface, unbeknown to us, has already been renamed from userspace, we most probably want to leave also that alone. This will typically happen when third-party plugins (for instance to udev, but the interface is generic so could be from anywhere) renames the interface without informing udev about it. A typical situation is when you switch root from an installer or an initrd to the real system and the new instance of udev does not know what happened before the switch. These types of problems have caused repeated issues in the past. To solve this, once an interface has been renamed, its name is labelled NET_NAME_RENAMED. In many cases, the kernel is actually able to name interfaces in such a way that there is no need for userspace to rename them. This is the case when the enumeration order of devices, or in fact any other (non-parent) device on the system, can not influence the name of the interface. Examples include statically created devices, or any naming schemes based on hardware properties of the interface. In this case the admin may prefer to use the kernel-provided names, and to make that possible we label such names NET_NAME_PREDICTABLE. We want the kernel to have tho possibilty of performing predictable interface naming itself (and exposing to userspace that it has), as the information necessary for a proper naming scheme for a certain class of devices may not be exposed to userspace. The case where renaming is almost certainly desired, is when the kernel has given the interface a name using global device enumeration based on order of discovery (ethX, wlanY, etc). These naming schemes are labelled NET_NAME_ENUM. Lastly, a fallback is left as NET_NAME_UNKNOWN, to indicate that a driver has not yet been ported. This is mostly useful as a transitionary measure, allowing us to label the various naming schemes bit by bit. v8: minor documentation fixes v9: move comment to the right commit Signed-off-by: NTom Gundersen <teg@jklm.no> Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com> Reviewed-by: NKay Sievers <kay@vrfy.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 7月, 2014 2 次提交
-
-
由 Tejun Heo 提交于
Currently, cgroup_subsys->base_cftypes is used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype arrays and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch renames cgroup_subsys->base_cftypes to cgroup_subsys->legacy_cftypes. This patch is pure rename. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NLi Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Mathias Krause 提交于
The code in neigh_sysctl_register() relies on a specific layout of struct neigh_table, namely that the 'gc_*' variables are directly following the 'parms' member in a specific order. The code, though, expresses this in the most ugly way. Get rid of the ugly casts and use the 'tbl' pointer to get a handle to the table. This way we can refer to the 'gc_*' variables directly. Similarly seen in the grsecurity patch, written by Brad Spengler. Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 7月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Add const attribute to filter argument to make clear it is no longer modified. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 7月, 2014 2 次提交
-
-
由 Jamal Hadi Salim 提交于
Actually better than brctl showmacs because we can filter by bridge port in the kernel. The current bridge netlink interface doesnt scale when you have many bridges each with large fdbs or even bridges with many bridge ports And now for the science non-fiction novel you have all been waiting for.. //lets see what bridge ports we have root@moja-1:/configs/may30-iprt/bridge# ./bridge link show 8: eth1 state DOWN : <BROADCAST,MULTICAST> mtu 1500 master br0 state disabled priority 32 cost 19 17: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master br0 state disabled priority 32 cost 100 // show all.. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show 33:33:00:00:00:01 dev bond0 self permanent 33:33:00:00:00:01 dev dummy0 self permanent 33:33:00:00:00:01 dev ifb0 self permanent 33:33:00:00:00:01 dev ifb1 self permanent 33:33:00:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth0 self permanent 33:33:ff:22:01:01 dev eth0 self permanent 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 33:33:00:00:00:01 dev gretap0 self permanent da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent //filter by bridge root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent // bridge sw1 has no ports attached.. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br sw1 //filter by port root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show brport eth1 02:00:00:12:01:02 vlan 0 master br0 permanent 00:17:42:8a:b4:05 vlan 0 master br0 permanent 00:17:42:8a:b4:07 self permanent 33:33:00:00:00:01 self permanent // filter by port + bridge root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 brport sw1-p1 da:ac:46:27:d9:53 vlan 0 master br0 permanent 33:33:00:00:00:01 self permanent // for shits and giggles (as they say in New Brunswick), lets // change the mac that br0 uses // Note: a magical fdb entry with no brport is added ... root@moja-1:/configs/may30-iprt/bridge# ip link set dev br0 address 02:00:00:12:01:04 // lets see if we can see the unicorn .. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show 33:33:00:00:00:01 dev bond0 self permanent 33:33:00:00:00:01 dev dummy0 self permanent 33:33:00:00:00:01 dev ifb0 self permanent 33:33:00:00:00:01 dev ifb1 self permanent 33:33:00:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth0 self permanent 33:33:ff:22:01:01 dev eth0 self permanent 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 33:33:00:00:00:01 dev gretap0 self permanent 02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent //can we see it if we filter by bridge? root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jamal Hadi Salim 提交于
Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-