- 10 Apr, 2012 1 commit
-
-
Committed by Ansis Atteka
There is no need to send a notification if ovs_vport_set_options() failed and ovs_vport_cmd_set() did not change anything.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
-
- 03 Apr, 2012 2 commits
-
-
Committed by Jesse Gross
We currently check that a packet is IPv4 and TCP before fetching the TCP flags. This enables fetching from IPv6 packets as well.

Reported-by: Michael Mao <mmao@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
-
Committed by Jesse Gross
When collecting TCP flags we check that the IP header indicates that a TCP header is present, but not that the packet is actually long enough to contain the header. This adds a check to prevent reading off the end of the packet. In practice, this is only likely to result in reading bad data rather than a crash, due to the presence of struct skb_shared_info at the end of the packet.

Signed-off-by: Jesse Gross <jesse@nicira.com>
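The length check described in this commit can be illustrated with a minimal userspace sketch. It assumes a flat byte buffer instead of a real sk_buff; the function name `read_tcp_flags`, its parameters, and the two constants are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_HDR_MIN_LEN  20  /* minimum TCP header size in bytes */
#define TCP_FLAGS_OFFSET 13  /* offset of the flags byte within the TCP header */

/*
 * Hypothetical helper: return the TCP flags byte, or -1 if the buffer is
 * too short to hold a minimal TCP header starting at offset l4_off.
 */
int read_tcp_flags(const uint8_t *pkt, size_t len, size_t l4_off)
{
    /* Refuse to read when the packet ends before the TCP header does. */
    if (l4_off > len || len - l4_off < TCP_HDR_MIN_LEN)
        return -1;

    return pkt[l4_off + TCP_FLAGS_OFFSET];
}
```

The comparison is written as `len - l4_off < TCP_HDR_MIN_LEN` (after checking `l4_off > len`) so that it cannot overflow for large offsets.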
-
- 29 Mar, 2012 1 commit
-
-
Committed by David Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 28 Mar, 2012 2 commits
-
-
Committed by Eric Dumazet
Commit f2c31e32 (net: fix NULL dereferences in check_peer_redir()) added a regression in rt6_fill_node(), leading to an rcu_read_lock() imbalance. That's because NLA_PUT() can jump to the nla_put_failure label. Fix this by using nla_put().

Many thanks to Ben Greear for his help.

Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
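The hazard of a macro that hides a jump can be shown with a small userspace sketch. Everything here is a stand-in: `lock_depth` models the RCU read-side nesting count, and `fake_put()`, `FAKE_PUT_OR_FAIL()`, and the two `fill_node_*` functions are hypothetical names, not the kernel's code.

```c
#include <stddef.h>

int lock_depth;                        /* stand-in for RCU read-side nesting */
void fake_read_lock(void)   { lock_depth++; }
void fake_read_unlock(void) { lock_depth--; }

/* Hypothetical put helper: fails when the message has no room left. */
int fake_put(size_t *room, size_t need)
{
    if (*room < need)
        return -1;
    *room -= need;
    return 0;
}

/* Macro form: hides a goto, so an unlock placed after it can be skipped. */
#define FAKE_PUT_OR_FAIL(room, need) \
    do { if (fake_put(room, need) < 0) goto put_failure; } while (0)

/* Buggy pattern: on failure the hidden goto skips fake_read_unlock(). */
int fill_node_buggy(size_t room)
{
    fake_read_lock();
    FAKE_PUT_OR_FAIL(&room, 8);
    fake_read_unlock();
    return 0;
put_failure:
    return -1;
}

/* Fixed pattern: call the function form and unlock on every path. */
int fill_node_fixed(size_t room)
{
    fake_read_lock();
    if (fake_put(&room, 8) < 0) {
        fake_read_unlock();            /* drop the lock before bailing out */
        goto put_failure;
    }
    fake_read_unlock();
    return 0;
put_failure:
    return -1;
}
```

Calling `fill_node_buggy()` with too little room leaves `lock_depth` at 1, while `fill_node_fixed()` always returns it to its starting value.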
-
Committed by Bryan Schumaker
rpcb_getport_async() was looking up the rpc_xprt (reference++) and then later looking it up again (reference++) to pass through the rpcbind_args. The xprt would only be dereferenced once, when we were done with the rpcbind_args (reference--). This leaves an extra reference to the transport that would never go away.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
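The reference-count arithmetic in this message can be traced with a toy model. This is only a sketch under assumptions: `struct fake_xprt`, `struct fake_args`, and all function names below are hypothetical and do not mirror the SUNRPC code.

```c
#include <stddef.h>

struct fake_xprt { int refcount; };
struct fake_args { struct fake_xprt *xprt; };

/* Take a reference (reference++). */
struct fake_xprt *fake_xprt_get(struct fake_xprt *x)
{
    x->refcount++;
    return x;
}

/* Drop a reference (reference--). */
void fake_xprt_put(struct fake_xprt *x)
{
    x->refcount--;
}

/* Leaky pattern: two lookups, but only the one stored in args is ever put. */
void getport_leaky(struct fake_xprt *x, struct fake_args *args)
{
    struct fake_xprt *local = fake_xprt_get(x);  /* first lookup */

    (void)local;                                 /* this reference is lost */
    args->xprt = fake_xprt_get(x);               /* second lookup */
}

/* Fixed pattern: a single lookup whose reference is handed to args. */
void getport_fixed(struct fake_xprt *x, struct fake_args *args)
{
    args->xprt = fake_xprt_get(x);
}

/* Called when the args are no longer needed; drops exactly one reference. */
void release_args(struct fake_args *args)
{
    fake_xprt_put(args->xprt);
    args->xprt = NULL;
}
```

Starting from a count of 1, the leaky path ends at 2 after `release_args()`, whereas the fixed path returns to 1.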
-
- 26 Mar, 2012 3 commits
-
-
Committed by J. Bruce Fields
There's obviously no point to doing portmap calls over the sessions backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Jeff Layton
Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid upcall.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Eric Dumazet
The skb_add_rx_frag() API is misleading. Network skbs built with this helper can use uncharged kernel memory and eventually stress or crash the machine through OOM. Add a 'truesize' parameter and then fix drivers in followup patches.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Mar, 2012 1 commit
-
-
Committed by Hans Verkuil
In some cases the poll() implementation in a driver has to do different things depending on the events the caller wants to poll for. An example is when a driver needs to start a DMA engine if the caller polls for POLLIN, but doesn't want to do that if POLLIN is not requested but instead only POLLOUT or POLLPRI is requested. This is something that can happen in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't provide that information reliably. The poll_table_struct does have it: it has a key field with the event mask. But once a poll() call matches one or more bits of that mask, any following poll() calls are passed a NULL poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead of using the requested events mask. This was changed in eventpoll.c so the key field now contains the actual events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to NULL in poll_table once poll() matches the events, not the poll_table pointer itself. That way drivers can obtain the mask through a new poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In that case poll_requested_events() returns ~0 (i.e. all events).

Very rarely drivers might want to know whether poll_wait will actually wait. If another earlier file descriptor in the set already matched the events the caller wanted to wait for, then the kernel will return from the select() call without waiting. This might be useful information in order to avoid doing expensive work. A new helper function poll_does_not_wait() is added that drivers can use to detect this situation. This is now used in sock_poll_wait() in include/net/sock.h. This was the only place in the kernel that needed this information.

Drivers should no longer access any of the poll_table internals, but use the poll_requested_events() and poll_does_not_wait() access functions instead. To enforce that, the poll_table fields are now prefixed with an underscore and a comment was added warning against using them directly. This required a change in unix_dgram_poll() in unix/af_unix.c, which used the key field to get the requested events. It's been replaced by a call to poll_requested_events().

For qproc it was especially important to change its name, since the behavior of that field changes with this patch: this function pointer can now be NULL when that wasn't possible in the past. Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll() function is called with a 'struct poll_table_struct *wait' argument. This pointer may or may not be NULL; drivers can never rely on it being one or the other, as that depends on whether or not an earlier file descriptor in the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field: events = wait ? wait->key : ~0; This will still work, although it should be replaced with the new poll_requested_events() function (which does exactly the same). This will now even work better, since wait is no longer set to NULL unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be NULL. Renaming qproc should prevent this from happening. There are no kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without waiting. This is no longer sufficient, as the correct test is now wait == NULL || wait->_qproc == NULL. However, the worst that can happen here is a slight performance hit in the case where wait != NULL and wait->_qproc == NULL. In that case the driver will assume that poll_wait() will actually add the fd to the set of waiting file descriptors. Of course, poll_wait() will not do that since it tests for wait->_qproc. This will not break anything, though. There is only one place in the whole kernel where this happens (sock_poll_wait() in include/net/sock.h) and that code will be replaced by a call to poll_does_not_wait() in the next patch.

Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait() actually waiting. The next file descriptor from the set might match the event mask and thus any possible waits will never happen.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
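The two accessors described in this commit can be modelled in userspace as follows. This is a sketch under assumptions, not the kernel header: the struct and every identifier carry a `fake_` prefix to mark them as hypothetical, and only the two tests quoted in the message (`wait == NULL || wait->_qproc == NULL` and `wait ? wait->key : ~0`) are taken from the source.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_poll_table;
typedef void (*fake_queue_proc)(struct fake_poll_table *pt);

/* Mock of the table: underscore-prefixed fields are private to the core. */
struct fake_poll_table {
    fake_queue_proc _qproc;  /* NULL once an earlier fd matched the events */
    unsigned long _key;      /* event mask requested by the caller */
};

/* Placeholder callback so a table can have a non-NULL _qproc. */
void fake_noop_qproc(struct fake_poll_table *pt)
{
    (void)pt;
}

/* True when a poll_wait()-style call would not queue the caller. */
bool fake_poll_does_not_wait(const struct fake_poll_table *p)
{
    return p == NULL || p->_qproc == NULL;
}

/* Requested event mask; a NULL table is treated as "all events". */
unsigned long fake_poll_requested_events(const struct fake_poll_table *p)
{
    return p ? p->_key : ~0UL;
}
```

A driver-style caller would ask `fake_poll_requested_events()` which events matter before starting expensive work, instead of inspecting the fields directly.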
-
- 23 Mar, 2012 10 commits
-
-
Committed by Andy Gospodarek
The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values.

    commit 917fbdb3
    Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
    Date: Wed Nov 23 23:37:15 2011 +0000

        bonding: only use primary address for ARP

That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected.

When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then be detected as secondary and ignored, but the damage was already done.

This was not an issue when using an application that used netlink sockets, as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved.

While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list, as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea.

As the Subject indicates, this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all been contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed.

I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch.

Thanks to Ralf for all his help on this.

v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller.

v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Ralf Zeidler <ralf.zeidler@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rusty Russell
It used to be an int, and it got changed to a bool parameter at least 7 years ago. It happens that NF_ACCEPT and NF_DROP are 0 and 1, so this works, but it's unclear, and the check that it's in range is not required.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Pablo Neira Ayuso
We need to permanently attach the timeout policy to the conntrack, otherwise we may apply the custom timeout policy inconsistently. Without this patch, the following example:

    nfct timeout add test inet icmp timeout 100
    iptables -I PREROUTING -t raw -p icmp -s 1.1.1.1 -j CT --timeout test

will only apply the custom timeout policy to outgoing packets from 1.1.1.1, but not to reply packets from 2.2.2.2 going to 1.1.1.1.

To fix this issue, this patch modifies the current logic to attach the timeout policy when the first packet is seen (which is when the conntrack entry is created). Then, we keep using the attached timeout policy until the conntrack entry is destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
`iptables -p all' uses 0 to match all protocols, while the conntrack subsystem uses 255. We still need `-p all' to attach the custom timeout policies for the generic protocol tracker.

Moreover, we may use `iptables -p sctp' while the SCTP tracker is not loaded. In that case, we have to fall back to the generic protocol tracker. Another possibility is `iptables -p ip', which should be supported as well. This patch makes sure we validate all possible scenarios.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
Fix a pointer dereference made without rcu_read_lock held.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
This patch introduces nf_conntrack_l4proto_find_get() and nf_conntrack_l4proto_put() to fix module dependencies between timeout objects and l4-protocol conntrack modules. Thus, we make sure that the module cannot be removed if it is used by any of the cttimeout objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Steffen Klassert
We call the wrong replay notify function when we use ESN replay handling. As a result, we don't send notifications if we use ESN. Fix this by calling the registered callbacks instead of xfrm_replay_notify().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Steffen Klassert
The xfrm_state argument is unused in this function, so remove it. Also, the name xfrm_state_check_space does not really match what this function does. It actually checks if we have enough head and tailroom on the skb. So we rename the function to xfrm_skb_check_space.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dan Carpenter
We should be using the gfp flags the caller specified here, instead of GFP_KERNEL. I think this might be a bugfix, depending on the value of "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in rds_sendmsg(). Otherwise, it's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dan Carpenter
This function takes GFP flags as a parameter, but they are never used. We don't take a lock in this function, so there is no reason to prefer GFP_ATOMIC over the caller's GFP flags.

There is only one caller, cipso_v4_map_cat_rng_ntoh(), and it passes GFP_ATOMIC as the GFP flags, so this doesn't change how the code works. It's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Mar, 2012 20 commits
-
-
Committed by Alex Elder
In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic.

We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift", defies the Linux convention of using that name for log-base-2(page size).

Since it's only used in the bio case, rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRC's in that function. This makes the bio case match the others more closely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages().

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case, we can use the sendpage interface instead. Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
CRC's are computed for all messages between ceph entities. The CRC computation for the data portion of a message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRC's have *not* been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Nothing too big here.

- define the size of the buffer used for consuming ignored incoming data using a symbolic constant
- simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Move blocks of code out of loops in read_partial_message_section() and read_partial_message(). They were only getting called at the end of the last iteration of the loop anyway.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Calculate CRC in a separate step from rearranging the byte order of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the CRC calculation.

In read_partial_message(), switch which value gets byte-swapped, since the just-computed CRC is already likely to be in a register.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
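A userspace sketch of the two ideas in this commit: sizing the CRC input with offsetof(), and keeping the byte-order conversion as a separate step. The header layout, the bitwise CRC-32C routine, and all names below are assumptions made for illustration; the real ceph header and the kernel's crc32c() differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout; the real ceph header has other fields. */
struct fake_msg_header {
    uint64_t seq;
    uint32_t data_len;
    uint32_t crc;      /* covers every byte that precedes this field */
};

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t fake_crc32c(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Step 1: CRC over the bytes before the crc field, sized by offsetof(). */
uint32_t header_crc(const struct fake_msg_header *hdr)
{
    return fake_crc32c(0, hdr, offsetof(struct fake_msg_header, crc));
}

/* Step 2, kept separate: store the value in little-endian wire order. */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Because the length comes from offsetof(), adding a field before `crc` automatically extends the covered range without touching `header_crc()`.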
-
Committed by Alex Elder
Change the name (and type) of a few CRC-related Boolean local variables so they contain the word "do", to distinguish their purpose from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious logic error in write_partial_msg_pages(): the value of "do_crc" assigned appears to be the opposite of what it should be. No attempt to fix this is made here; this change preserves the erroneous behavior. The problem I found is documented here:

http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do.

Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
This gathers a number of very minor changes:

- use %hu when formatting a socket address's address family
- null out the ceph_msgr_wq pointer after the queue has been destroyed
- drop a needless cast in ceph_write_space()
- add a WARN() call in ceph_state_change() in the event an unrecognized socket state is encountered
- rearrange the logic in ceph_con_get() and ceph_con_put() so that:
    - the reference counts are only atomically read once
    - the values displayed via dout() calls are known to be meaningful at the time they are formatted

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
There is no real need for ceph_tcp_connect() to return the socket pointer it creates, since it already assigns it to con->sock, which is visible to the caller. Instead, have it return an error code, which tidies things up a bit.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Define a helper function to perform various cleanup operations. Use it both in the exit routine and in the init routine in the event of an error.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
The messenger workqueue has no need to be public. So give it static scope.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a "reset" operation to make subsequent add operations start at the beginning of the array again.

Use these routines throughout, avoiding duplicate code and ensuring all calls are handled consistently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
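The "add" and "reset" operations described here might look roughly like the following userspace sketch. The struct layout, the capacity constant, the field names other than `out_kvec`, and the function names are all assumptions for illustration; the actual messenger code may differ, for example in how it handles a full array.

```c
#include <stddef.h>

#define FAKE_MAX_KVEC 8   /* hypothetical capacity of the array */

struct fake_kvec {
    const void *base;
    size_t len;
};

/* Hypothetical subset of a connection's outgoing state. */
struct fake_connection {
    struct fake_kvec out_kvec[FAKE_MAX_KVEC];
    int out_kvec_used;       /* number of slots currently filled */
    size_t out_kvec_bytes;   /* total bytes queued across all slots */
};

/* Reset: make subsequent adds start at the beginning of the array. */
void fake_out_kvec_reset(struct fake_connection *con)
{
    con->out_kvec_used = 0;
    con->out_kvec_bytes = 0;
}

/* Add: place one chunk in the next open slot; -1 if the array is full. */
int fake_out_kvec_add(struct fake_connection *con, const void *data,
                      size_t size)
{
    int idx = con->out_kvec_used;

    if (idx >= FAKE_MAX_KVEC)
        return -1;

    con->out_kvec[idx].base = data;
    con->out_kvec[idx].len = size;
    con->out_kvec_used++;
    con->out_kvec_bytes += size;
    return 0;
}
```

Routing every caller through one add helper keeps the slot index and the byte total in step, which is the consistency benefit the commit message points to.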
-
Committed by Alex Elder
One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner().

Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the "after_banner" argument so it indicates that prepare_write_connect() should *make* the call rather than should know it has already been made.

This was split out from the next patch to highlight this change in logic.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Alex Elder
ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface it is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
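The "pointer-coded error code" convention can be sketched in userspace as below. The helpers imitate the idea behind the kernel's ERR_PTR(), PTR_ERR(), and IS_ERR(), but they are re-implemented here under `fake_` names as an assumption-laden illustration; `fake_parse_options()` and `struct fake_options` are hypothetical and do not reflect the real parser.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define FAKE_MAX_ERRNO 4095   /* error codes occupy the top 4095 addresses */

/* Encode a negative errno value as a pointer. */
void *fake_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
long fake_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer falls in the range reserved for error codes. */
int fake_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-FAKE_MAX_ERRNO;
}

struct fake_options {
    int flags;
};

/*
 * Hypothetical parser: returns the allocated structure directly, or an
 * error-coded pointer, instead of filling in a caller-supplied pointer.
 */
struct fake_options *fake_parse_options(const char *s)
{
    struct fake_options *opt;

    if (s == NULL)
        return fake_err_ptr(-EINVAL);

    opt = calloc(1, sizeof(*opt));
    if (opt == NULL)
        return fake_err_ptr(-ENOMEM);

    return opt;
}
```

With this shape, a call site either holds a valid pointer or an error it must test with `fake_is_err()`, so there is no window in which an output pointer might be left uninitialized.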
-
Committed by Alex Elder
This fixes some spots where a type cast to (void *) was used as a universal type hiding mechanism. Instead, properly cast the type to the intended target type.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
-