- 26 3月, 2016 11 次提交
-
-
由 Yan, Zheng 提交于
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This can happen if __close_session() in ceph_monc_stop() races with a connection reset. We need to ignore such faults, otherwise it's likely we would take !hunting, call __schedule_delayed() and end up with delayed_work() executing on invalid memory, among other things. The (two!) con->private tests are useless, as nothing ever clears con->private. Nuke them. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Doing __schedule_delayed() in the hunting branch is pointless, as the tick will have already been scheduled by then. What we need to do instead is *reschedule* it in the !hunting branch, after reopen_session() changes hunt_mult, which affects the delay. This helps with spacing out connection attempts and avoiding things like two back-to-back attempts followed by a longer period of waiting around. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
hunting is now set in __open_session() and cleared in finish_hunting(), instead of all around. The "session lost" message is printed not only on connection resets, but also on keepalive timeouts. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multipler by 50%. Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Split ping interval and ping timeout: ping interval is 10s; keepalive timeout is 30s. Make monc_ping_timeout a constant while at it - it's not actually exported as a mount option (and the rest of tick-related settings won't be either), so it's got no place in ceph_options. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Don't try to reconnect to the same monitor when we fail to establish a session within a timeout or it's lost. For that, pick_new_mon() needs to see the old value of cur_mon, so don't clear it in __close_session() - all calls to __close_session() but one are followed by __open_session() anyway. __open_session() is only called when a new session needs to be established, so the "already open?" branch, which is now in the way, is simply dropped. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Coupling hunting state with subscribe state is not a good idea. Clear hunting when we complete the authentication handshake. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Our debugfs dir name is a concatenation of cluster fsid and client unique ID ("global_id"). It used to be the case that we learned global_id first, nowadays we always learn fsid first - the monmap is sent before any auth replies are. ceph_debugfs_client_init() call in ceph_monc_handle_map() is therefore never executed and can be removed. Its counterpart in handle_auth_reply() doesn't really belong there either: having to do monc->client and unlocking early to work around lockdep is a testament to that. Move it into __ceph_open_session(), where it can be called unconditionally. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 19 3月, 2016 9 次提交
-
-
由 Sowmini Varadhan 提交于
RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14 ("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune"). Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
Add per-net sysctl tunables to set the size of sndbuf and rcvbuf on the kernel tcp socket. The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf. These values must be set before accept() or connect(), and there may be an arbitrary number of existing rds-tcp sockets when the tunable is modified. To make sure that all connections in the netns pick up the same value for the tunable, we reset existing rds-tcp connections in the netns, so that they can reconnect with the new parameters. Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
We can just add a small helper dst_tclassid() for retrieving the dst->tclassid value. It makes the code a bit better in that we can get rid of the ifdef from filter.c by moving this into the header. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Currently, the tc_classid from eBPF skb context is write-only, but there's no good reason for tc programs to limit it to write-only. For example, it can be used to transfer its state via tail calls where the resulting tc_classid gets filled gradually. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
There are two issues with the current code. First one is that we need to set res->class to 0 in case we use non-default classid matching. This is important for the case where cls_bpf was initially set up with an optional binding to a default class with tcf_bind_filter(), where the underlying qdisc implements bind_tcf() that fills res->class and tests for it later on when doing the classification. Convention for these cases is that after tc_classify() was called, such qdiscs (atm, drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup based on classid. Second, there's a bug with da mode, where res->classid is only assigned a 16 bit minor, but it needs to expand to the full 32 bit major/minor combination instead, therefore we need to expand with the bound major. This is fine as classes belonging to a classful qdisc must share the same major. Fixes: 045efa82 ("cls_bpf: introduce integrated actions") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Currently output of MPLS packets on tunnel vports is not allowed by Open vSwitch. This is because historically encapsulation was done in such a way that the inner_protocol field of the skb needed to hold the inner protocol for both MPLS and tunnel encapsulation in order for GSO segmentation to be performed correctly. Since b2acd1dc ("openvswitch: Use regular GRE net_device instead of vport") Open vSwitch makes use of lwt to output to tunnel netdevs which perform encapsulation. As no drivers expose support for MPLS offloads this means that GSO packets are segmented in software by validate_xmit_skb(), which is called from __dev_queue_xmit(), before tunnel encapsulation occurs. This means that the inner protocol of MPLS is no longer needed by the time encapsulation occurs and the contention on the inner_protocol field of the skb no longer occurs. Thus it is now safe to output MPLS to tunnel vports. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NJesse Gross <jesse@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wu Fengguang 提交于
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wu Fengguang 提交于
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
Now SYN_RECV request sockets are installed in ehash table, an ICMP handler can find a request socket while another cpu handles an incoming packet transforming this SYN_RECV request socket into an ESTABLISHED socket. We need to remove the now obsolete WARN_ON(req->sk), since req->sk is set when a new child is created and added into listener accept queue. If this race happens, the ICMP will do nothing special. Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NBen Lazarus <blazarus@google.com> Reported-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joonsoo Kim 提交于
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 3月, 2016 2 次提交
-
-
由 Ben Hutchings 提交于
When the ETHTOOL_GLINKSETTINGS implementation finds that userland is using the wrong number of words of link mode bitmaps (or is trying to find out the right numbers) it sets the cmd field to 0 in the response structure. This is inconsistent with the implementation of every other ethtool command, so let's remove that inconsistency before it gets into a stable release. Fixes: 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicholas Mc Guire 提交于
local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for the unlock/enable case, so replace the calls by the appropriate wrappers. Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 3月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
$ make tags GEN tags ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1" ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1" ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1" ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1" ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1" ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1" Which are all the result of the DEFINE_PER_CPU pattern: scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/' scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/' The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ do you have any preference on how to fix the wq one, or shall we just not care its too long? Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 3月, 2016 14 次提交
-
-
由 Nicholas Mc Guire 提交于
The spin_lock()/spin_unlock() is synchronizing on the nf_conntrack_locks_all_lock which is equivalent to spin_unlock_wait() but the later should be more efficient. Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Eric Dumazet 提交于
On loaded TCP servers, looking at millions of sockets can hold cpu for many seconds, if the lookup condition is very narrow. (eg : ss dst 1.2.3.4 ) Better add a cond_resched() to allow other processes to access the cpu. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarno Rajahalme 提交于
Extend OVS conntrack interface to cover NAT. New nested OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action. A bare OVS_CT_ATTR_NAT only mangles existing and expected connections. If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested attributes, new (non-committed/non-confirmed) connections are mangled according to the rest of the nested attributes. The corresponding OVS userspace patch series includes test cases (in tests/system-traffic.at) that also serve as example uses. This work extends on a branch by Thomas Graf at https://github.com/tgraf/ovs/tree/nat. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NThomas Graf <tgraf@suug.ch> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
There is no need to help connections that are not confirmed, so we can delay helping new connections to the time when they are confirmed. This change is needed for NAT support, and having this as a separate patch will make the following NAT patch a bit easier to review. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This avoids dropping a SYN packet re-opening an existing TCP connection. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
Add a new function ovs_ct_find_existing() to find an existing conntrack entry for which this packet was already applied to. This is only to be called when there is evidence that the packet was already tracked and committed, but we lost the ct reference due to an userspace upcall. ovs_ct_find_existing() is called from skb_nfct_cached(), which can now hide the fact that the ct reference may have been lost due to an upcall. This allows ovs_ct_commit() to be simplified. This patch is needed by later "openvswitch: Interface with NAT" patch, as we need to be able to pass the packet through NAT using the original ct reference also after the reference is lost after an upcall. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
Only a successful nf_conntrack_in() call can effect a connection state change, so it suffices to update the key only after the nf_conntrack_in() returns. This change is needed for the later NAT patches. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
This makes the code easier to understand and the following patches more focused. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
NAT checksum recalculation code assumes existence of skb_dst, which becomes a problem for a later patch in the series ("openvswitch: Interface with NAT."). Simplify this by removing the check on skb_dst, as the checksum will be dealt with later in the stack. Suggested-by: NPravin Shelar <pshelar@nicira.com> Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does not make sense. This allows the definition of IP_CT_NUMBER to be simplified as well. Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Vivien Didelot 提交于
Rework the netdev event handler, similar to what the Mellanox Spectrum driver does, to easily welcome more events later (for example NETDEV_PRECHANGEUPPER) and use netdev helpers (such as netif_is_bridge_master). Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns void, as well as del_nbp(). So there's no advantage to catch an eventual error from the port_bridge_leave routine at the DSA level. Make this routine void for the DSA layer and its existing drivers. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Rename DSA port_join_bridge and port_leave_bridge routines to respectively port_bridge_join and port_bridge_leave in order to respect an implicit Port::Bridge namespace. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Zefir Kurtisi reported kernel panic with an openwrt specific patch. However, it turns out that mainline has a similar bug waiting to happen. Once NF_HOOK() returns the skb is in undefined state and must not be used. Moreover, the okfn must consume the skb to support async processing (NF_QUEUE). Current okfn in this spot doesn't consume it and caller assumes that NF_HOOK return value tells us if skb was freed or not, but thats wrong. It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at LOCAL_IN that returns STOLEN or NF_QUEUE verdicts. Once we add NF_QUEUE support for nftables bridge this will break -- NF_QUEUE holds the skb for async processing, caller will erronoulsy return RX_HANDLER_PASS and on reinject netfilter will access free'd skb. Fix this by pushing skb up the stack in the okfn instead. NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING completely in this case but this is how its been forever so it seems preferable to not change this. Cc: Felix Fietkau <nbd@openwrt.org> Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Tested-by: NZefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-