- 16 11月, 2015 5 次提交
-
-
由 Daniel Borkmann 提交于
Since it's introduction in commit 69e3c75f ("net: TX_RING and packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW side. When used with SOCK_DGRAM only, the size_max > dev->mtu + reserve check should have reserve as 0, but currently, this is unconditionally set (in it's original form as dev->hard_header_len). I think this is not correct since tpacket_fill_skb() would then take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM, the extra VLAN_HLEN could be possible in both cases. Presumably, the reserve code was copied from packet_snd(), but later on missed the check. Make it similar as we have it in packet_snd(). Fixes: 69e3c75f ("net: TX_RING and packet mmap") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
In case no struct sockaddr_ll has been passed to packet socket's sendmsg() when doing a TX_RING flush run, then skb->protocol is set to po->num instead, which is the protocol passed via socket(2)/bind(2). Applications only xmitting can go the path of allocating the socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the TX_RING with sll_protocol of 0. That way, register_prot_hook() is neither called on creation nor on bind time, which saves cycles when there's no interest in capturing anyway. That leaves us however with po->num 0 instead and therefore the TX_RING flush run sets skb->protocol to 0 as well. Eric reported that this leads to problems when using tools like trafgen over bonding device. I.e. the bonding's hash function could invoke the kernel's flow dissector, which depends on skb->protocol being properly set. In the current situation, all the traffic is then directed to a single slave. Fix it up by inferring skb->protocol from the Ethernet header when not set and we have ARPHRD_ETHER device type. This is only done in case of SOCK_RAW and where we have a dev->hard_header_len length. In case of ARPHRD_ETHER devices, this is guaranteed to cover ETH_HLEN, and therefore being accessed on the skb after the skb_store_bits(). Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Packet sockets can be used by various net devices and are not really restricted to ARPHRD_ETHER device types. However, when currently checking for the extra 4 bytes that can be transmitted in VLAN case, our assumption is that we generally probe on ARPHRD_ETHER devices. Therefore, before looking into Ethernet header, check the device type first. This also fixes the issue where non-ARPHRD_ETHER devices could have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus the check would test unfilled linear part of the skb (instead of non-linear). Fixes: 57f89bfa ("network: Allow af_packet to transmit +4 bytes for VLAN packets.") Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
We concluded that the skb_probe_transport_header() should better be called unconditionally. Avoiding the call into the flow dissector has also not really much to do with the direct xmit mode. While it seems that only virtio_net code makes use of GSO from non RX/TX ring packet socket paths, we should probe for a transport header nevertheless before they hit devices. Reference: http://thread.gmane.org/gmane.linux.network/386173/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
In tpacket_fill_skb() commit c1aad275 ("packet: set transport header before doing xmit") and later on 40893fd0 ("net: switch to use skb_probe_transport_header()") was probing for a transport header on the skb from a ring buffer slot, but at a time, where the skb has _not even_ been filled with data yet. So that call into the flow dissector is pretty useless. Lets do it after we've set up the skb frags. Fixes: c1aad275 ("packet: set transport header before doing xmit") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2015 1 次提交
-
-
由 Francesco Ruggeri 提交于
There is a race conditions between packet_notifier and packet_bind{_spkt}. It happens if packet_notifier(NETDEV_UNREGISTER) executes between the time packet_bind{_spkt} takes a reference on the new netdevice and the time packet_do_bind sets po->ifindex. In this case the notification can be missed. If this happens during a dev_change_net_namespace this can result in the netdevice to be moved to the new namespace while the packet_sock in the old namespace still holds a reference on it. When the netdevice is later deleted in the new namespace the deletion hangs since the packet_sock is not found in the new namespace' &net->packet.sklist. It can be reproduced with the script below. This patch makes packet_do_bind check again for the presence of the netdevice in the packet_sock's namespace after the synchronize_net in unregister_prot_hook. More in general it also uses the rcu lock for the duration of the bind to stop dev_change_net_namespace/rollback_registered_many from going past the synchronize_net following unlist_netdevice, so that no NETDEV_UNREGISTER notifications can happen on the new netdevice while the bind is executing. In order to do this some code from packet_bind{_spkt} is consolidated into packet_do_dev. import socket, os, time, sys proto=7 realDev='em1' vlanId=400 if len(sys.argv) > 1: vlanId=int(sys.argv[1]) dev='vlan%d' % vlanId os.system('taskset -p 0x10 %d' % os.getpid()) s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto) os.system('ip link add link %s name %s type vlan id %d' % (realDev, dev, vlanId)) os.system('ip netns add dummy') pid=os.fork() if pid == 0: # dev should be moved while packet_do_bind is in synchronize net os.system('taskset -p 0x20000 %d' % os.getpid()) os.system('ip link set %s netns dummy' % dev) os.system('ip netns exec dummy ip link del %s' % dev) s.close() sys.exit(0) time.sleep(.004) try: s.bind(('%s' % dev, proto+1)) except: print 'Could not bind socket' s.close() os.system('ip netns del dummy') sys.exit(0) os.waitpid(pid, 0) s.close() os.system('ip netns del dummy') sys.exit(0) Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 10月, 2015 3 次提交
-
-
由 Eric W. Biederman 提交于
The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Recent TCP listener patches exposed a prior af_packet bug : match_fanout_group() blindly assumes it is always safe to cast sk to a packet socket to compare fanout with af_packet_priv But SYNACK packets can be sent while attached to request_sock, which are smaller than a "struct sock". We can read non existent memory and crash. Fixes: c0de08d0 ("af_packet: don't emit packet on orig fanout group") Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eric Leblond <eric@regit.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edward Jee 提交于
Signed-off-by: NEdward Hyunkoo Jee <edjee@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 10月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
eBPF socket filter programs may see junk in 'u32 cb[5]' area, since it could have been used by protocol layers earlier. For socket filter programs used in af_packet we need to clean 20 bytes of skb->cb area if it could be used by the program. For programs attached to TCP/UDP sockets we need to save/restore these 20 bytes, since it's used by protocol layers. Remove SK_RUN_FILTER macro, since it's no longer used. Long term we may move this bpf cb area to per-cpu scratch, but that requires addition of new 'per-cpu load/store' instructions, so not suitable as a short term fix. Fixes: d691f9e8 ("bpf: allow programs to write to certain skb fields") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 10月, 2015 1 次提交
-
-
由 Daniel Borkmann 提交于
The current ongoing effort to dump existing cBPF seccomp filters back to user space requires to hold the pre-transformed instructions like we do in case of socket filters from sk_attach_filter() side, so they can be reloaded in original form at a later point in time by utilities such as criu. To prepare for this, simply extend the bpf_prog_create_from_user() API to hold a flag that tells whether we should store the original or not. Also, fanout filters could make use of that in future for things like diag. While fanout filters already use bpf_prog_destroy(), move seccomp over to them as well to handle original programs when present. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Tested-by: NTycho Andersen <tycho.andersen@canonical.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 9月, 2015 1 次提交
-
-
由 David Woodhouse 提交于
Commit 7d824109 ("virtio: add explicit big-endian support to memory accessors") accidentally changed the virtio_net header used by AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian. Since virtio_legacy_is_little_endian() is a very long identifier, define a vio_le macro and use that throughout the code instead of the hard-coded 'false' for little-endian. This restores the ABI to match 4.1 and earlier kernels, and makes my test program work again. Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 8月, 2015 2 次提交
-
-
由 Willem de Bruijn 提交于
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF program to select a socket. Update the internal eBPF program by passing to socket option SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf(). Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program to select a socket. This avoids having to keep adding special case fanout modes. One example use case is application layer load balancing. The QUIC protocol, for instance, encodes a connection ID in UDP payload. Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the only user so far. Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2015 1 次提交
-
-
由 Tobias Klauser 提交于
Follow e8e85cc5 ("packet: remove handling of tx_ring") and remove the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only called with tx_ring = 0. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 7月, 2015 1 次提交
-
-
由 Alexander Drozdov 提交于
tpacket_fill_skb() can return a negative value (-errno) which is stored in tp_len variable. In that case the following condition will be (but shouldn't be) true: tp_len > dev->mtu + dev->hard_header_len as dev->mtu and dev->hard_header_len are both unsigned. That may lead to just returning an incorrect EMSGSIZE errno to the user. Fixes: 52f1454f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 7月, 2015 1 次提交
-
-
由 Lars Westerhoff 提交于
When binding a PF_PACKET socket, the use count of the bound interface is always increased with dev_hold in dev_get_by_{index,name}. However, when rebound with the same protocol and device as in the previous bind the use count of the interface was not decreased. Ultimately, this caused the deletion of the interface to fail with the following message: unregister_netdevice: waiting for dummy0 to become free. Usage count = 1 This patch moves the dev_put out of the conditional part that was only executed when either the protocol or device changed on a bind. Fixes: 902fefb8 ('packet: improve socket create/bind latency in some cases') Signed-off-by: NLars Westerhoff <lars.westerhoff@newtec.eu> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 6月, 2015 1 次提交
-
-
由 Maninder Singh 提交于
Remove handling of tx_ring in prb_setup_retire_blk_timer for TPACKET_V3 because init_prb_bdqc is called only for zero tx_ring and thus prb_setup_retire_blk_timer for zero tx_ring only. And also in functon init_prb_bdqc there is no usage of tx_ring. Thus removing tx_ring from init_prb_bdqc. Signed-off-by: NManinder Singh <maninder1.s@samsung.com> Suggested-by: NFrans Klaver <fransklaver@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 6月, 2015 3 次提交
-
-
由 Willem de Bruijn 提交于
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo f->num_members. It returns the old value unconditionally, but f->num_members may have changed since the last store. Ensure that the return value is always < num. When modifying the logic, simplify it further by replacing the loop with an unconditional atomic increment. Fixes: dc99f600 ("packet: Add fanout support.") Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Destruction of the po->rollover must be delayed until there are no more packets in flight that can access it. The field is destroyed in packet_release, before synchronize_net. Delay using rcu. Fixes: 0648ab70 ("packet: rollover prepare: per-socket state") Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We need to tell compiler it must not read f->num_members multiple times. Otherwise testing if num is not zero is flaky, and we could attempt an invalid divide by 0 in fanout_demux_cpu() Note bug was present in packet_rcv_fanout_hash() and packet_rcv_fanout_lb() but final 3.1 had a simple location after commit 95ec3eb4 ("packet: Add 'cpu' fanout policy.") Fixes: dc99f600 ("packet: Add fanout support.") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
Rollover can be enabled as flag or mode. Allocate state in both cases. This solves a NULL pointer exception in fanout_demux_rollover on referencing po->rollover if using mode rollover. Also make sure that in rollover mode each silo is tried (contrary to rollover flag, where the main socket is excluded after an initial try_self). Tested: Passes tools/testing/net/psock_fanout.c, which tests both modes and flag. My previous tests were limited to bench_rollover, which only stresses the flag. The test now completes safely. it still gives an error for mode rollover, because it does not expect the new headroom (ROOM_NORMAL) requirement. I will send a separate patch to the test. Fixes: 0648ab70 ("packet: rollover prepare: per-socket state") Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- I should have run this test and caught this before submission, of course. Apologies for the oversight. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 5月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
Avoid two xchg calls whose return values were unused, causing a warning on some architectures. The relevant variable is a hint and read without mutual exclusion. This fix makes all writers hold the receive_queue lock. Suggested-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 5月, 2015 6 次提交
-
-
由 Willem de Bruijn 提交于
Rollover indicates exceptional conditions. Export a counter to inform socket owners of this state. If no socket with sufficient room is found, rollover fails. Also count these events. Finally, also count when flows are rolled over early thanks to huge flow detection, to validate its correctness. Tested: Read counters in bench_rollover on all other tests in the patchset Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Migrate flows from a socket to another socket in the fanout group not only when the socket is full. Start migrating huge flows early, to divert possible 4-tuple attacks without affecting normal traffic. Introduce fanout_flow_is_huge(). This detects huge flows, which are defined as taking up more than half the load. It does so cheaply, by storing the rxhashes of the N most recent packets. If over half of these are the same rxhash as the current packet, then drop it. This only protects against 4-tuple attacks. N is chosen to fit all data in a single cache line. Tested: Ran bench_rollover for 10 sec with 1.5 Mpps of single flow input. lpbb5:/export/hda3/willemb# ./bench_rollover -l 1000 -r -s cpu rx rx.k drop.k rollover r.huge r.failed 0 14 14 0 0 0 0 1 20 20 0 0 0 0 2 16 16 0 0 0 0 3 6168824 6168824 0 4867721 4867721 0 4 4867741 4867741 0 0 0 0 5 12 12 0 0 0 0 6 15 15 0 0 0 0 7 17 17 0 0 0 0 Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Rollover has to call packet_rcv_has_room on sockets in the fanout group to find a socket to migrate to. This operation is expensive especially if the packet sockets use rings, when a lock has to be acquired. Avoid pounding on the lock by all sockets by temporarily marking a socket as "under memory pressure" when such pressure is detected. While set, only the socket owner may call packet_rcv_has_room on the socket. Once it detects normal conditions, it clears the flag. The socket is not used as a victim by any other socket in the meantime. Under reasonably balanced load, each socket writer frequently calls packet_rcv_has_room and clears its own pressure field. As a backup for when the socket is rarely written to, also clear the flag on reading (packet_recvmsg, packet_poll) if this can be done cheaply (i.e., without calling packet_rcv_has_room). This is only for edge cases. Tested: Ran bench_rollover: a process with 8 sockets in a single fanout group, each pinned to a single cpu that receives one nic recv interrupt. RPS and RFS are disabled. The benchmark uses packet rx_ring, which has to take a lock when determining whether a socket has room. Sent 3.5 Mpps of UDP traffic with sufficient entropy to spread uniformly across the packet sockets (and inserted an iptables rule to drop in PREROUTING to avoid protocol stack processing). Without this patch, all sockets try to migrate traffic to neighbors, causing lock contention when searching for a non- empty neighbor. The lock is the top 9 entries. perf record -a -g sleep 5 - 17.82% bench_rollover [kernel.kallsyms] [k] _raw_spin_lock - _raw_spin_lock - 99.00% spin_lock + 81.77% packet_rcv_has_room.isra.41 + 18.23% tpacket_rcv + 0.84% packet_rcv_has_room.isra.41 + 5.20% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock + 5.15% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock + 5.14% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock + 5.12% ksoftirqd/7 [kernel.kallsyms] [k] _raw_spin_lock + 5.12% ksoftirqd/5 [kernel.kallsyms] [k] _raw_spin_lock + 5.10% ksoftirqd/4 [kernel.kallsyms] [k] _raw_spin_lock + 4.66% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock + 4.45% ksoftirqd/3 [kernel.kallsyms] [k] _raw_spin_lock + 1.55% bench_rollover [kernel.kallsyms] [k] packet_rcv_has_room.isra.41 On net-next with this patch, this lock contention is no longer a top entry. Most time is spent in the actual read function. Next up are other locks: + 15.52% bench_rollover bench_rollover [.] reader + 4.68% swapper [kernel.kallsyms] [k] memcpy_erms + 2.77% swapper [kernel.kallsyms] [k] packet_lookup_frame.isra.51 + 2.56% ksoftirqd/1 [kernel.kallsyms] [k] memcpy_erms + 2.16% swapper [kernel.kallsyms] [k] tpacket_rcv + 1.93% swapper [kernel.kallsyms] [k] mlx4_en_process_rx_cq Looking closer at the remaining _raw_spin_lock, the cost of probing in rollover is now comparable to the cost of taking the lock later in tpacket_rcv. - 1.51% swapper [kernel.kallsyms] [k] _raw_spin_lock - _raw_spin_lock + 33.41% packet_rcv_has_room + 28.15% tpacket_rcv + 19.54% enqueue_to_backlog + 6.45% __free_pages_ok + 2.78% packet_rcv_fanout + 2.13% fanout_demux_rollover + 2.01% netif_receive_skb_internal Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Only migrate flows to sockets that have sufficient headroom, where sufficient is defined as having at least 25% empty space. The kernel has three different buffer types: a regular socket, a ring with frames (TPACKET_V[12]) or a ring with blocks (TPACKET_V3). The latter two do not expose a read pointer to the kernel, so headroom is not computed easily. All three needs a different implementation to estimate free space. Tested: Ran bench_rollover for 10 sec with 1.5 Mpps of single flow input. bench_rollover has as many sockets as there are NIC receive queues in the system. Each socket is owned by a process that is pinned to one of the receive cpus. RFS is disabled. RPS is enabled with an identity mapping (cpu x -> cpu x), to count drops with softnettop. lpbb5:/export/hda3/willemb# ./bench_rollover -r -l 1000 -s Press [Enter] to exit cpu rx rx.k drop.k rollover r.huge r.failed 0 16 16 0 0 0 0 1 21 21 0 0 0 0 2 5227502 5227502 0 0 0 0 3 18 18 0 0 0 0 4 6083289 6083289 0 5227496 0 0 5 22 22 0 0 0 0 6 21 21 0 0 0 0 7 9 9 0 0 0 0 Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Replace rollover state per fanout group with state per socket. Future patches will add fields to the new structure. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
packet_rcv_fanout calls fanout_demux_rollover twice. Move all rollover logic into the callee to simplify these callsites, especially with upcoming changes. The main differences between the two callsites is that the FLAG variant tests whether the socket previously selected by another mode (RR, RND, HASH, ..) has room before migrating flows, whereas the rollover mode has no original socket to test. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 5月, 2015 2 次提交
-
-
由 Eric W. Biederman 提交于
In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kretschmer, Mathias 提交于
This patch fixes an issue where the send(MSG_DONTWAIT) call on a TX_RING is not fully non-blocking in cases where the device's sndBuf is full. We pass nonblock=true to sock_alloc_send_skb() and return any possibly occuring error code (most likely EGAIN) to the caller. As the fast-path stays as it is, we keep the unlikely() around skb == NULL. Signed-off-by: NMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2015 2 次提交
-
-
由 Alexander Drozdov 提交于
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. For now, the flag may be set for incoming packets only. Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Drozdov 提交于
It is just an optimization. We don't need the value of status variable if the packet is filtered. Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 3月, 2015 1 次提交
-
-
由 Francesco Ruggeri 提交于
When an interface is deleted from a net namespace the ifindex in the corresponding entries in PF_PACKET sockets' mclists becomes stale. This can create inconsistencies if later an interface with the same ifindex is moved from a different namespace (not that unlikely since ifindexes are per-namespace). In particular we saw problems with dev->promiscuity, resulting in "promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken" warnings and EOVERFLOW failures of setsockopt(PACKET_ADD_MEMBERSHIP). This patch deletes the mclist entries for interfaces that are deleted. Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with EADDRNOTAVAIL if called after the interface is deleted, also make packet_mc_drop not fail. Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 3月, 2015 1 次提交
-
-
由 Ying Xue 提交于
After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2015 3 次提交
-
-
由 Eyal Birger 提交于
As part of an effort to move skb->dropcount to skb->cb[], use a common function in order to set dropcount in struct sk_buff. Signed-off-by: NEyal Birger <eyal.birger@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eyal Birger 提交于
As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: NEyal Birger <eyal.birger@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eyal Birger 提交于
As part of an effort to move skb->dropcount to skb->cb[], 4 bytes of additional room are needed in skb->cb[] in packet sockets. Store the skb original length in the first two fields of sockaddr_ll (sll_family and sll_protocol) as they can be derived from the skb when needed. Signed-off-by: NEyal Birger <eyal.birger@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 2月, 2015 1 次提交
-
-
由 Alexander Drozdov 提交于
Before da413eec ("packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet") poll listening for an af_packet socket was not signaled if there was no packets to process. After the patch poll is signaled evety time when block retire timer expires. That happens because af_packet closes the current block on timeout even if the block is empty. Passing empty blocks to the user not only wastes CPU but also wastes ring buffer space increasing probability of packets dropping on small timeouts. Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Cc: Dan Collins <dan@dcollins.co.nz> Cc: Willem de Bruijn <willemb@google.com> Cc: Guy Harris <guy@alum.mit.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 2月, 2015 1 次提交
-
-
由 Alexander Drozdov 提交于
Packets defragmentation was introduced for PACKET_FANOUT_HASH only, see 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts") It may be useful to have defragmentation enabled regardless of fanout type. Without that, the AF_PACKET user may have to: 1. Collect fragments from different rings 2. Defragment by itself Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-