- 27 7月, 2012 1 次提交
-
-
由 David S. Miller 提交于
With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 7月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.) introduced rt_cache_valid() helper. It unfortunately doesn't check if route is expired before caching it. I noticed sk_setup_caps() was constantly called on a tcp workload. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2012 2 次提交
-
-
由 Alan Cox 提交于
gcc really should warn about these ! Signed-off-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
1) Remove a non needed pskb_may_pull() in tcp_v4_early_demux() and fix a potential bug if skb->head was reallocated (iph & th pointers were not reloaded) TCP stack will pull/check headers anyway. 2) must reload iph in ip_rcv_finish() after early_demux() call since skb->head might have changed. 3) skb->dev->ifindex can be now replaced by skb->skb_iif Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 7月, 2012 9 次提交
-
-
由 WANG Cong 提交于
lockdep_is_held() is defined when CONFIG_LOCKDEP, not CONFIG_PROVE_LOCKING. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation have to adjust. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Make it follow device decapsulation, from things such as VLAN and bonding. The stuff that actually cares about pre-demuxed device pointers, is handled by the "orig_dev" variable in __netif_receive_skb(). And the only consumer of that is the po->origdev feature of AF_PACKET sockets. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Use inet_iif() consistently, and for TCP record the input interface of cached RX dst in inet sock. rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index" the rt->rt_iif will be set to zero. This forces us to move the TCP RX dst cache installation into the ipv4 specific code, and as well it should since doing the route caching for ipv6 is pointless at the moment since it is not inspected in the ipv6 input paths yet. Also, remove the unlikely on dst->obsolete, all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The last and final kernel user, ICMP address replies, has been removed. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Alexey removed kernel side support for requests, and the only thing we do for replies is log a message if something doesn't look right. As Alexey's comment indicates, this belongs in userspace (if anywhere), and thus we can safely just get rid of this code. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
It's an ipv4 defined route flag, and only ipv4 uses it. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Saurabh 提交于
With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not specificy __rcu in ip_vti.c Signed-off-by: NSaurabh Mohan <saurabh.mohan@vyatta.com> Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lin Ming 提交于
It is redundant to set no_addr and accept_local to 0 and then set them with other values just after that. Signed-off-by: NLin Ming <mlin@ss.pku.edu.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 7月, 2012 14 次提交
-
-
由 Weiping Pan 提交于
Jay Fenlason (fenlason@redhat.com) found a bug, that recvfrom() on an RDS socket can return the contents of random kernel memory to userspace if it was called with a address length larger than sizeof(struct sockaddr_in). rds_recvmsg() also fails to set the addr_len paramater properly before returning, but that's just a bug. There are also a number of cases wher recvfrom() can return an entirely bogus address. Anything in rds_recvmsg() that returns a non-negative value but does not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path at the end of the while(1) loop will return up to 128 bytes of kernel memory to userspace. And I write two test programs to reproduce this bug, you will see that in rds_server, fromAddr will be overwritten and the following sock_fd will be destroyed. Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is better to make the kernel copy the real length of address to user space in such case. How to run the test programs ? I test them on 32bit x86 system, 3.5.0-rc7. 1 compile gcc -o rds_client rds_client.c gcc -o rds_server rds_server.c 2 run ./rds_server on one console 3 run ./rds_client on another console 4 you will see something like: server is waiting to receive data... old socket fd=3 server received data from client:data from client msg.msg_namelen=32 new socket fd=-1067277685 sendmsg() : Bad file descriptor /***************** rds_client.c ********************/ int main(void) { int sock_fd; struct sockaddr_in serverAddr; struct sockaddr_in toAddr; char recvBuffer[128] = "data from client"; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if (sock_fd < 0) { perror("create socket error\n"); exit(1); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4001); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind() error\n"); close(sock_fd); exit(1); } memset(&toAddr, 0, sizeof(toAddr)); toAddr.sin_family = AF_INET; toAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); toAddr.sin_port = htons(4000); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = strlen(recvBuffer) + 1; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendto() error\n"); close(sock_fd); exit(1); } printf("client send data:%s\n", recvBuffer); memset(recvBuffer, '\0', 128); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("receive data from server:%s\n", recvBuffer); close(sock_fd); return 0; } /***************** rds_server.c ********************/ int main(void) { struct sockaddr_in fromAddr; int sock_fd; struct sockaddr_in serverAddr; unsigned int addrLen; char recvBuffer[128]; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if(sock_fd < 0) { perror("create socket error\n"); exit(0); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4000); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind error\n"); close(sock_fd); exit(1); } printf("server is waiting to receive data...\n"); msg.msg_name = &fromAddr; /* * I add 16 to sizeof(fromAddr), ie 32, * and pay attention to the definition of fromAddr, * recvmsg() will overwrite sock_fd, * since kernel will copy 32 bytes to userspace. * * If you just use sizeof(fromAddr), it works fine. * */ msg.msg_namelen = sizeof(fromAddr) + 16; /* msg.msg_namelen = sizeof(fromAddr); */ msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; while (1) { printf("old socket fd=%d\n", sock_fd); if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("server received data from client:%s\n", recvBuffer); printf("msg.msg_namelen=%d\n", msg.msg_namelen); printf("new socket fd=%d\n", sock_fd); strcat(recvBuffer, "--data from server"); if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendmsg()\n"); close(sock_fd); exit(1); } } close(sock_fd); return 0; } Signed-off-by: NWeiping Pan <wpan@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dan Carpenter 提交于
If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and it leads to a NULL dereference when we call nla_len(acts_list). This is a static checker fix, not something I have seen in testing. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ICMP messages generated in output path if frame length is bigger than mtu are actually lost because socket is owned by user (doing the xmit) One example is the ipgre_tunnel_xmit() calling icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); We had a similar case fixed in commit a34a101e (ipv6: disable GSO on sockets hitting dst_allfrag). Problem of such fix is that it relied on retransmit timers, so short tcp sessions paid a too big latency increase price. This patch uses the tcp_release_cb() infrastructure so that MTU reduction messages (ICMP messages) are not lost, and no extra delay is added in TCP transmits. Reported-by: NMaciej Żenczykowski <maze@google.com> Diagnosed-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Tore Anderson <tore@fud.no> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
In tcp_tw_remember_stamp we incorrectly checked tw instead of tm, it can lead to oops if the cached entry is not found. tcpm_stamp was not updated in tcpm_check_stamp when tcpm_suck_dst was called, move the update into tcpm_suck_dst, so that we do not call it infinitely on every next cache hit after TCP_METRICS_TIMEOUT. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Juhl 提交于
In net/compat.c::put_cmsg_compat() we may assign 'data' the address of either the 'ctv' or 'cts' local variables inside the 'if (!COMPAT_USE_64BIT_TIME)' branch. Those variables go out of scope at the end of the 'if' statement, so when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what it may be refering to - not good. Fix the problem by simply giving 'ctv' and 'cts' function scope. Signed-off-by: NJesper Juhl <jj@chaosbits.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Al Viro 提交于
recursion in __scm_destroy() will be cut by delaying final fput() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 John Fastabend 提交于
Instead of updating the sk_cgrp_prioidx struct field on every send this only updates the field when a task is moved via cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed. For example in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set but as soon as data is sent the kernel worker thread isssues a send and sk_cgrp_prioidx is updated with the kernel worker threads value which is the default case. It seems more correct to only update the field when the user explicitly sets it via control group infrastructure. This allows the users to manage sockets that may be used with other threads. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
Export skb_copy_ubufs so that modules can orphan frags. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
zero copy packets are normally sent to the outside network, but bridging, tun etc might loop them back to host networking stack. If this happens destructors will never be called, so orphan the frags immediately on receive. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
Reduce code duplication a bit using the new helper. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mark A. Greer 提交于
Commit 76ff5cc9 (rtnl: allow to specify number of rx and tx queues on device creation) added a reference to the net_device structure's 'num_rx_queues' member in net/core/rtnetlink.c:rtnl_fill_ifinfo() However, the definition for 'num_rx_queues' is surrounded by an '#ifdef CONFIG_RPS' while the new reference to it is not. This causes a compile error when CONFIG_RPS is not defined. Fix the compile error by surrounding the new reference to 'num_rx_queues' by an '#ifdef CONFIG_RPS'. CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NMark A. Greer <mgreer@animalcreek.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Horman 提交于
I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kevin Groeneveld 提交于
Fix race condition in several network drivers when reading stats on 32bit UP architectures. These drivers update their stats in a BH context and therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the stats. Signed-off-by: NKevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Set unicast_sock uc_ttl to -1 so that we select the right ttl, instead of sending packets with a 0 ttl. Bug added in commit be9f4a44 (ipv4: tcp: remove per net tcp_sock) Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 7月, 2012 13 次提交
-
-
由 Ben Pfaff 提交于
At the point where it was used, skb_shinfo(skb)->gso_type referred to a post-GSO sk_buff. Thus, it would always be 0. We want to know the pre-GSO gso_type, so we need to obtain it before segmenting. Before this change, the kernel would pass inconsistent data to userspace: packets for UDP fragments with nonzero offset would be passed along with flow keys that indicate a zero offset (that is, the flow key for "later" fragments claimed to be "first" fragments). This inconsistency tended to confuse Open vSwitch userspace, causing it to log messages about "failed to flow_del" the flows with "later" fragments. Signed-off-by: NBen Pfaff <blp@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Pravin B Shelar 提交于
Fix return check typo. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 David S. Miller 提交于
It's not really needed. We only grabbed a reference to the fib_info for the sake of fib_info local metrics. However, fib_info objects are freed using RCU, as are therefore their private metrics (if any). We would have triggered a route cache flush if we eliminated a reference to a fib_info object in the routing tables. Therefore, any existing cached routes will first check and see that they have been invalidated before an errant reference to these metric values would occur. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
That is this value's only use, as a boolean to indicate whether a route is an input route or not. So implement it that way, using a u16 gap present in the struct already. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Never actually used. It was being set on output routes to the original OIF specified in the flow key used for the lookup. Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of the flowi4_oif and flowi4_iif values, thanks to feedback from Julian Anastasov. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Don't bother incrementing dst->__use and setting dst->lastuse, they are completely pointless and just slow things down. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Caching input routes is slightly simpler than output routes, since we don't need to be concerned with nexthop exceptions. (locally destined, and routed packets, never trigger PMTU events or redirects that will be processed by us). However, we have to elide caching for the DIRECTSRC and non-zero itag cases. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
If we have an output route that lacks nexthop exceptions, we can cache it in the FIB info nexthop. Such routes will have DST_HOST cleared because such routes refer to a family of destinations, rather than just one. The sequence of the handling of exceptions during route lookup is adjusted to make the logic work properly. Before we allocate the route, we lookup the exception. Then we know if we will cache this route or not, and therefore whether DST_HOST should be set on the allocated route. Then we use DST_HOST to key off whether we should store the resulting route, during rt_set_nexthop(), in the FIB nexthop cache. With help from Eric Dumazet. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Mark them obsolete so there will be a re-lookup to fetch the FIB nexthop exception info. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Add a big comment explaining how the field works, and use defines instead of magic constants for the values assigned to it. Suggested by Joe Perches. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted. The new interpretation is: 1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr 2) rt_gateway != 0, destination requires a nexthop gateway Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-