- 04 4月, 2015 2 次提交
-
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for non-NULL pointer is done as x != NULL and sometimes as x. x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2015 2 次提交
-
-
由 Marcelo Ricardo Leitner 提交于
in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
There are some setsockopt operations in ipv4 and ipv6 that are grabbing rtnl after having grabbed the socket lock. Yet this makes it impossible to do operations that have to lock the socket when already within a rtnl protected scope, like ndo dev_open and dev_stop. We normally take coarse grained locks first but setsockopt inverted that. So this patch invert the lock logic for these operations and makes setsockopt grab rtnl if it will be needed prior to grabbing socket lock. Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 3月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
When reading from the error queue, msg_name and msg_control are only populated for some errors. A new exception for empty timestamp skbs added a false positive on icmp errors without payload. `traceroute -M udpconn` only displayed gateways that return payload with the icmp error: the embedded network headers are pulled before sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise. Fix this regression by refining when msg_name and msg_control branches are taken. The solutions for the two fields are independent. msg_name only makes sense for errors that configure serr->port and serr->addr_offset. Test the first instead of skb->len. This also fixes another issue. saddr could hold the wrong data, as serr->addr_offset is not initialized in some code paths, pointing to the start of the network header. It is only valid when serr->port is set (non-zero). msg_control support differs between IPv4 and IPv6. IPv4 only honors requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The skb->len test can simply be removed, because skb->dev is also tested and never true for empty skbs. IPv6 honors requests for all errors aside from local errors and timestamps on empty skbs. In both cases, make the policy more explicit by moving this logic to a new function that decides whether to process msg_control and that optionally prepares the necessary fields in skb->cb[]. After this change, the IPv4 and IPv6 paths are more similar. The last case is rxrpc. Here, simply refine to only match timestamps. Fixes: 49ca0d8b ("net-timestamp: no-payload option") Reported-by: NJan Niehusmann <jan@gondor.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Changes v1->v2 - fix local origin test inversion in ip6_datagram_support_cmsg - make v4 and v6 code paths more similar by introducing analogous ipv4_datagram_support_cmsg - fix compile bug in rxrpc Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 2月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit timestamps, this loops timestamps on top of empty packets. Doing so reduces the pressure on SO_RCVBUF. Payload inspection and cmsg reception (aside from timestamps) are no longer possible. This works together with a follow on patch that allows administrators to only allow tx timestamping if it does not loop payload or metadata. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Changes (rfc -> v1) - add documentation - remove unnecessary skb->len test (thanks to Richard Cochran) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 1月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That structure is defined and allocated on the stack as struct { struct sock_extended_err ee; struct sockaddr_in(6) offender; } errhdr; The second part is only initialized for certain SO_EE_ORIGIN values. Always initialize it completely. An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that would return uninitialized bytes. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Also verified that there is no padding between errhdr.ee and errhdr.offender that could leak additional kernel data. Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2015 3 次提交
-
-
由 Tom Herbert 提交于
Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that they can be referenced in other source files. Restructure ip_cmsg_recv to not go through flags using shift, check for flags by 'and'. This eliminates both the shift and a conditional per flag check. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 12月, 2014 1 次提交
-
-
由 Gu Zheng 提交于
Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating cmsghdr from msghdr, just cleanup. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 12月, 2014 2 次提交
-
-
由 Willem de Bruijn 提交于
Allow reading of timestamps and cmsg at the same time on all relevant socket families. One use is to correlate timestamps with egress device, by asking for cmsg IP_PKTINFO. on AF_INET sockets, call the relevant function (ip_cmsg_recv). To avoid changing legacy expectations, only do so if the caller sets a new timestamping flag SOF_TIMESTAMPING_OPT_CMSG. on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already returned for all origins. only change is to set ifindex, which is not initialized for all error origins. In both cases, only generate the pktinfo message if an ifindex is known. This is not the case for ACK timestamps. The difference between the protocol families is probably a historical accident as a result of the different conditions for generating cmsg in the relevant ip(v6)_recv_error function: ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) { ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) { At one time, this was the same test bar for the ICMP/ICMP6 distinction. This is no longer true. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Changes v1 -> v2 large rewrite - integrate with existing pktinfo cmsg generation code - on ipv4: only send with new flag, to maintain legacy behavior - on ipv6: send at most a single pktinfo cmsg - on ipv6: initialize fields if not yet initialized The recv cmsg interfaces are also relevant to the discussion of whether looping packet headers is problematic. For v6, cmsgs that identify many headers are already returned. This patch expands that to v4. If it sounds reasonable, I will follow with patches 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY (http://patchwork.ozlabs.org/patch/366967/) 2. sysctl to conditionally drop all timestamps that have payload or cmsg from users without CAP_NET_RAW. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
One line change, in response to catching an occurrence of this bug. See also fix f4713a3d ("net-timestamp: make tcp_recvmsg call ...") Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 11月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is a module. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: c8e6ad08 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg") Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2014 1 次提交
-
-
由 David S. Miller 提交于
This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 9月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Remove one sparse warning : net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces) net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>*next net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain *[assigned] ra And replace one rcu_assign_ptr() by RCU_INIT_POINTER() where applicable. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2014 1 次提交
-
-
由 Willem de Bruijn 提交于
sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable inbetween releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2014 1 次提交
-
-
由 Karoly Kemeny 提交于
Sparse warns because of implicit pointer cast. v2: subject line correction, space between "void" and "*" Signed-off-by: NKaroly Kemeny <karoly.kemeny@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the generation of fragments if the interface mtu is exceeded, it is very hard to make use of this option in already deployed name server software for which I introduced this option. This patch adds yet another new IP_MTU_DISCOVER option to not honor any path mtu information and not accepting new icmp notifications destined for the socket this option is enabled on. But we allow outgoing fragmentation in case the packet size exceeds the outgoing interface mtu. As such this new option can be used as a drop-in replacement for IP_PMTUDISC_DONT, which is currently in use by most name server software making the adoption of this option very smooth and easy. The original advantage of IP_PMTUDISC_INTERFACE is still maintained: ignoring incoming path MTU updates and not honoring discovered path MTUs in the output path. Fixes: 482fc609 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 2月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
In case we decide in udp6_sendmsg to send the packet down the ipv4 udp_sendmsg path because the destination is either of family AF_INET or the destination is an ipv4 mapped ipv6 address, we don't honor the maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO. We simply can check for this option in ip_cmsg_send because no calls to ipv6 module functions are needed to do so. Reported-by: NGert Doering <gert@space.net> Cc: Tore Anderson <tore@fud.no> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data for IPv4 datagrams on IPv6 sockets. This patch splits the ip6_datagram_recv_ctl into two functions, one which handles both protocol families, AF_INET and AF_INET6, while the ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data. ip6_datagram_recv_*_ctl never reported back any errors, so we can make them return void. Also provide a helper for protocols which don't offer dual personality to further use ip6_datagram_recv_ctl, which is exported to modules. I needed to shuffle the code for ping around a bit to make it easier to implement dual personality for ping ipv6 sockets in future. Reported-by: NGert Doering <gert@space.net> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 1月, 2014 1 次提交
-
-
由 Steffen Hurrle 提交于
This is a follow-up patch to f3d33426 ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: NSteffen Hurrle <steffen@hurrle.net> Suggested-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 12月, 2013 1 次提交
-
-
由 stephen hemminger 提交于
Various spelling fixes in networking stack Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 11月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Commit bceaa902 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") conditionally updated addr_len if the msg_name is written to. The recv_error and rxpmtu functions relied on the recvmsg functions to set up addr_len before. As this does not happen any more we have to pass addr_len to those functions as well and set it to the size of the corresponding sockaddr length. This broke traceroute and such. Fixes: bceaa902 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") Reported-by: NBrad Spengler <spender@grsecurity.net> Reported-by: Tom Labanowski Cc: mpb <mpb.mail@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery, their sockets won't accept and install new path mtu information and they will always use the interface mtu for outgoing packets. It is guaranteed that the packet is not fragmented locally. But we won't set the DF-Flag on the outgoing frames. Florian Weimer had the idea to use this flag to ensure DNS servers are never generating outgoing fragments. They may well be fragmented on the path, but the server never stores or usees path mtu values, which could well be forged in an attack. (The root of the problem with path MTU discovery is that there is no reliable way to authenticate ICMP Fragmentation Needed But DF Set messages because they are sent from intermediate routers with their source addresses, and the IMCP payload will not always contain sufficient information to identify a flow.) Recent research in the DNS community showed that it is possible to implement an attack where DNS cache poisoning is feasible by spoofing fragments. This work was done by Amir Herzberg and Haya Shulman: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf> This issue was previously discussed among the DNS community, e.g. <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>, without leading to fixes. This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK" for the enforcement of the non-fragmentable checks. If other users than ip_append_page/data should use this semantic too, we have to add a new flag to IPCB(skb)->flags to suppress local fragmentation and check for this in ip_finish_output. Many thanks to Florian Weimer for the idea and feedback while implementing this patch. Cc: David S. Miller <davem@davemloft.net> Suggested-by: NFlorian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2013 1 次提交
-
-
由 Shawn Bohrer 提交于
The since the removal of the routing cache computing fib_compute_spec_dst() does a fib_table lookup for each UDP multicast packet received. This has introduced a performance regression for some UDP workloads. This change skips populating the packet info for sockets that do not have IP_PKTINFO set. Benchmark results from a netperf UDP_RR test: Before 89789.68 transactions/s After 90587.62 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.63us RTT After 12.48us RTT Signed-off-by: NShawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2013 1 次提交
-
-
由 Francesco Fusco 提交于
This patch enables the IP_TTL and IP_TOS values passed from userspace to be stored in the ipcm_cookie struct. Three fields are added to the struct: - the TTL, expressed as __u8. The allowed values are in the [1-255]. A value of 0 means that the TTL is not specified. - the TOS, expressed as __s16. The allowed values are in the range [0,255]. A value of -1 means that the TOS is not specified. - the priority, expressed as a char and computed when handling the ancillary data. Signed-off-by: NFrancesco Fusco <ffusco@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 1月, 2013 1 次提交
-
-
由 Cong Wang 提交于
A regression is introduced by the following commit: commit 4d52cfbe Author: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue Jun 2 00:42:16 2009 -0700 net: ipv4/ip_sockglue.c cleanups Pure cleanups but it is not a pure cleanup... - if (val != -1 && (val < 1 || val>255)) + if (val != -1 && (val < 0 || val > 255)) Since there is no reason provided to allow ttl=0, change it back. Reported-by: Nnitin padalia <padalia.nitin@gmail.com> Cc: nitin padalia <padalia.nitin@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 11月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow creating raw sockets. Allow the SIOCSARP ioctl to control the arp cache. Allow the SIOCSIFFLAG ioctl to allow setting network device flags. Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address. Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address. Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address. Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask. Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting gre tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipip tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipsec virtual tunnel interfaces. Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC, MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing sockets. Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and arbitrary ip options. Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option. Allow setting the IP_TRANSPARENT ipv4 socket option. Allow setting the TCP_REPAIR socket option. Allow setting the TCP_CONGESTION socket option. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 11月, 2012 1 次提交
-
-
由 Xi Wang 提交于
(1<<optname) is undefined behavior in C with a negative optname or optname larger than 31. In those cases the result of the shift is not necessarily zero (e.g., on x86). This patch simplifies the code with a switch statement on optname. It also allows the compiler to generate better code (e.g., using a 64-bit mask). Signed-off-by: NXi Wang <xi.wang@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 7月, 2012 1 次提交
-
-
由 David S. Miller 提交于
Use inet_iif() consistently, and for TCP record the input interface of cached RX dst in inet sock. rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index" the rt->rt_iif will be set to zero. This forces us to move the TCP RX dst cache installation into the ipv4 specific code, and as well it should since doing the route caching for ipv6 is pointless at the moment since it is not inspected in the ipv6 input paths yet. Also, remove the unlikely on dst->obsolete, all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2012 1 次提交
-
-
由 David S. Miller 提交于
The specific destination is the host we direct unicast replies to. Usually this is the original packet source address, but if we are responding to a multicast or broadcast packet we have to use something different. Specifically we must use the source address we would use if we were to send a packet to the unicast source of the original packet. The routing cache precomputes this value, but we want to remove that precomputation because it creates a hard dependency on the expensive rpfilter source address validation which we'd like to make cheaper. There are only three places where this matters: 1) ICMP replies. 2) pktinfo CMSG 3) IP options Now there will be no real users of rt->rt_spec_dst and we can simply remove it altogether. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2012 1 次提交
-
-
由 Jiri Pirko 提交于
Until now, struct mreq has not been recognized and it was worked with as with struct in_addr. That means imr_multiaddr was copied to imr_address. So do recognize struct mreq here and copy that correctly. Signed-off-by: NJiri Pirko <jpirko@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Baluta 提交于
Fix checkpatch errors of the following type: * ERROR: "foo * bar" should be "foo *bar" * ERROR: "(foo*)" should be "(foo *)" Signed-off-by: NDaniel Baluta <dbaluta@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 2月, 2012 1 次提交
-
-
由 Paul E. McKenney 提交于
The call_rcu() in do_ip_setsockopt() invokes opt_kfree_rcu(), which just calls kfree(). So convert the call_rcu() to kfree_rcu(), which allows opt_kfree_rcu() to be eliminated. Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org
-
- 13 2月, 2012 1 次提交
-
-
由 Jiri Benc 提交于
Currently, it is not easily possible to get TOS/DSCP value of packets from an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt with IP_RECVTOS set, the same way as incoming TTL can be queried. This is not actually implemented for TOS, though. This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6 (IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of the TOS field is stored from the other party's ACK. This is needed for proxies which require DSCP transparency. One such example is at http://zph.bratcheda.org/. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 2月, 2012 1 次提交
-
-
由 Erich E. Hoover 提交于
The IP_UNICAST_IF feature is needed by the Wine project. This patch implements the feature by setting the outgoing interface in a similar fashion to that of IP_MULTICAST_IF. A separate option is needed to handle this feature since the existing options do not provide all of the characteristics required by IP_UNICAST_IF, a summary is provided below. SO_BINDTODEVICE: * SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF does not. From reading some old mailing list articles my understanding is that SO_BINDTODEVICE requires administrative privileges because it can override the administrator's routing settings. * The SO_BINDTODEVICE option restricts both outbound and inbound traffic, IP_UNICAST_IF only impacts outbound traffic. IP_PKTINFO: * Since IP_PKTINFO and IP_UNICAST_IF are independent options, implementing IP_UNICAST_IF with IP_PKTINFO will likely break some applications. * Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly complicates the Wine codebase and reduces the socket performance (doing this requires a lot of extra communication between the "server" and "user" layers). bind(): * bind() does not work on broadcast packets, IP_UNICAST_IF is specifically intended to work with broadcast packets. * Like SO_BINDTODEVICE, bind() restricts both outbound and inbound traffic. Signed-off-by: NErich E. Hoover <ehoover@mines.edu> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 12月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 11月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit : > At least, in recent kernels we dont change dst->refcnt in forwarding > patch (usinf NOREF skb->dst) > > One particular point is the atomic_inc(dst->refcnt) we have to perform > when queuing an UDP packet if socket asked PKTINFO stuff (for example a > typical DNS server has to setup this option) > > I have one patch somewhere that stores the information in skb->cb[] and > avoid the atomic_{inc|dec}(dst->refcnt). > OK I found it, I did some extra tests and believe its ready. [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference When a socket uses IP_PKTINFO notifications, we currently force a dst reference for each received skb. Reader has to access dst to get needed information (rt_iif & rt_spec_dst) and must release dst reference. We also forced a dst reference if skb was put in socket backlog, even without IP_PKTINFO handling. This happens under stress/load. We can instead store the needed information in skb->cb[], so that only softirq handler really access dst, improving cache hit ratios. This removes two atomic operations per packet, and false sharing as well. On a benchmark using a mono threaded receiver (doing only recvmsg() calls), I can reach 720.000 pps instead of 570.000 pps. IP_PKTINFO is typically used by DNS servers, and any multihomed aware UDP application. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-