- 06 10月, 2019 2 次提交
-
-
由 Jiri Pirko 提交于
As this is the last user of genl_family_attrbuf, convert to allocate attrs locally and do it in a similar way this is done in compat_doit(). Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Benefit from the fact that the generic netlink code can parse the attrs for dumpit op and avoid need to parse it in the op callback. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 10月, 2019 1 次提交
-
-
由 Tuong Lien 提交于
We have identified a problem with the "oversubscription" policy in the link transmission code. When small messages are transmitted, and the sending link has reached the transmit window limit, those messages will be bundled and put into the link backlog queue. However, bundles of data messages are counted at the 'CRITICAL' level, so that the counter for that level, instead of the counter for the real, bundled message's level is the one being increased. Subsequent, to-be-bundled data messages at non-CRITICAL levels continue to be tested against the unchanged counter for their own level, while contributing to an unrestrained increase at the CRITICAL backlog level. This leaves a gap in congestion control algorithm for small messages that can result in starvation for other users or a "real" CRITICAL user. Even that eventually can lead to buffer exhaustion & link reset. We fix this by keeping a 'target_bskb' buffer pointer at each levels, then when bundling, we only bundle messages at the same importance level only. This way, we know exactly how many slots a certain level have occupied in the queue, so can manage level congestion accurately. By bundling messages at the same level, we even have more benefits. Let consider this: - One socket sends 64-byte messages at the 'CRITICAL' level; - Another sends 4096-byte messages at the 'LOW' level; When a 64-byte message comes and is bundled the first time, we put the overhead of message bundle to it (+ 40-byte header, data copy, etc.) for later use, but the next message can be a 4096-byte one that cannot be bundled to the previous one. This means the last bundle carries only one payload message which is totally inefficient, as for the receiver also! Later on, another 64-byte message comes, now we make a new bundle and the same story repeats... With the new bundling algorithm, this will not happen, the 64-byte messages will be bundled together even when the 4096-byte message(s) comes in between. However, if the 4096-byte messages are sent at the same level i.e. 'CRITICAL', the bundling algorithm will again cause the same overhead. Also, the same will happen even with only one socket sending small messages at a rate close to the link transmit's one, so that, when one message is bundled, it's transmitted shortly. Then, another message comes, a new bundle is created and so on... We will solve this issue radically by another patch. Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion") Reported-by: NHoang Le <hoang.h.le@dektech.com.au> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 9月, 2019 1 次提交
-
-
由 Xin Long 提交于
Unlike kfree(p), kfree_rcu(p, rcu) won't do NULL pointer check. When tipc_nametbl_remove_publ returns NULL, the panic below happens: BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 RIP: 0010:__call_rcu+0x1d/0x290 Call Trace: <IRQ> tipc_publ_notify+0xa9/0x170 [tipc] tipc_node_write_unlock+0x8d/0x100 [tipc] tipc_node_link_down+0xae/0x1d0 [tipc] tipc_node_check_dest+0x3ea/0x8f0 [tipc] ? tipc_disc_rcv+0x2c7/0x430 [tipc] tipc_disc_rcv+0x2c7/0x430 [tipc] ? tipc_rcv+0x6bb/0xf20 [tipc] tipc_rcv+0x6bb/0xf20 [tipc] ? ip_route_input_slow+0x9cf/0xb10 tipc_udp_recv+0x195/0x1e0 [tipc] ? tipc_udp_is_known_peer+0x80/0x80 [tipc] udp_queue_rcv_skb+0x180/0x460 udp_unicast_rcv_skb.isra.56+0x75/0x90 __udp4_lib_rcv+0x4ce/0xb90 ip_local_deliver_finish+0x11c/0x210 ip_local_deliver+0x6b/0xe0 ? ip_rcv_finish+0xa9/0x410 ip_rcv+0x273/0x362 Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU") Reported-by: NLi Shuang <shuali@redhat.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 8月, 2019 1 次提交
-
-
由 Jon Maloy 提交于
The policy for handling the skb list locks on the send and receive paths is simple. - On the send path we never need to grab the lock on the 'xmitq' list when the destination is an exernal node. - On the receive path we always need to grab the lock on the 'inputq' list, irrespective of source node. However, when transmitting node local messages those will eventually end up on the receive path of a local socket, meaning that the argument 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the function tipc_sk_rcv(). This has been handled by always initializing the spinlock of the 'xmitq' list at message creation, just in case it may end up on the receive path later, and despite knowing that the lock in most cases never will be used. This approach is inaccurate and confusing, and has also concealed the fact that the stated 'no lock grabbing' policy for the send path is violated in some cases. We now clean up this by never initializing the lock at message creation, instead doing this at the moment we find that the message actually will enter the receive path. At the same time we fix the four locations where we incorrectly access the spinlock on the send/error path. This patch also reverts commit d12cffe9 ("tipc: ensure head->lock is initialised") which has now become redundant. CC: Eric Dumazet <edumazet@google.com> Reported-by: NChris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 8月, 2019 1 次提交
-
-
由 Tuong Lien 提交于
This commit eliminates the use of the link 'stale_limit' & 'prev_from' (besides the already removed - 'stale_cnt') variables in the detection of repeated retransmit failures as there is no proper way to initialize them to avoid a false detection, i.e. it is not really a retransmission failure but due to a garbage values in the variables. Instead, a jiffies variable will be added to individual skbs (like the way we restrict the skb retransmissions) in order to mark the first skb retransmit time. Later on, at the next retransmissions, the timestamp will be checked to see if the skb in the link transmq is "too stale", that is, the link tolerance time has passed, so that a link reset will be ordered. Note, just checking on the first skb in the queue is fine enough since it must be the oldest one. A counter is also added to keep track the actual skb retransmissions' number for later checking when the failure happens. The downside of this approach is that the skb->cb[] buffer is about to be exhausted, however it is always able to allocate another memory area and keep a reference to it when needed. Fixes: 77cf8edb ("tipc: simplify stale link failure criteria") Reported-by: NHoang Le <hoang.h.le@dektech.com.au> Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2019 1 次提交
-
-
由 Chris Packham 提交于
We set the field 'addr_trial_end' to 'jiffies', instead of the current value 0, at the moment the node address is initialized. This guarantees we don't inadvertently enter an address trial period when the node address is explicitly set by the user. Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2019 1 次提交
-
-
由 John Rutherford 提交于
Since node internal messages are passed directly to the socket, it is not possible to observe those messages via tcpdump or wireshark. We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible. The feature is enabled if network taps are active for the loopback device. pcap filtering restrictions require the messages to be presented to the receiving side of the loopback device. v3 - Function dev_nit_active used to check for network taps. - Procedure netif_rx_ni used to send cloned messages to loopback device. Signed-off-by: NJohn Rutherford <john.rutherford@dektech.com.au> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 8月, 2019 2 次提交
-
-
由 Jon Maloy 提交于
In commit 365ad353 ("tipc: reduce risk of user starvation during link congestion") we allowed senders to add exactly one list of extra buffers to the link backlog queues during link congestion (aka "oversubscription"). However, the criteria for when to stop adding wakeup messages to the input queue when the overload abates is inaccurate, and may cause starvation problems during very high load. Currently, we stop adding wakeup messages after 10 total failed attempts where we find that there is no space left in the backlog queue for a certain importance level. The counter for this is accumulated across all levels, which may lead the algorithm to leave the loop prematurely, although there may still be plenty of space available at some levels. The result is sometimes that messages near the wakeup queue tail are not added to the input queue as they should be. We now introduce a more exact algorithm, where we keep adding wakeup messages to a level as long as the backlog queue has free slots for the corresponding level, and stop at the moment there are no more such slots or when there are no more wakeup messages to dequeue. Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion") Reported-by: NTung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taras Kondratiuk 提交于
Commit 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package): % tipc-config -p operation not supported The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments): % grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */ #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */ #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */ #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */ #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */ #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */ #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */ #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */ This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero). Fixes: 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: stable@vger.kernel.org Signed-off-by: NTaras Kondratiuk <takondra@cisco.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2019 1 次提交
-
-
由 Jon Maloy 提交于
Our test suite somtimes provokes the following crash: Description of problem: [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 [ 1092.605072] PGD 0 P4D 0 [ 1092.607620] Oops: 0000 [#1] SMP PTI [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1 [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018 [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc] [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7 [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282 [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045 [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000 [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000 [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000 [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0 [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1092.727843] PKRU: 55555554 [ 1092.730556] Call Trace: [ 1092.733010] <IRQ> [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc] [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290 [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210 [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc] [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc] [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc] [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0 [ 1092.766432] ? tpacket_rcv+0x50/0x9f0 [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc] [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0 [ 1092.779164] ? enqueue_entity+0xf6/0x630 [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0 [ 1092.787272] ? __build_skb+0x25/0xd0 [ 1092.790849] netif_receive_skb_internal+0x42/0xf0 [ 1092.795557] napi_gro_receive+0xba/0xe0 [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core] [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core] [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core] [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0 [ 1092.818861] net_rx_action+0x149/0x3b0 [ 1092.822616] __do_softirq+0xe3/0x30a [ 1092.826193] irq_exit+0x100/0x110 [ 1092.829512] do_IRQ+0x85/0xd0 [ 1092.832483] common_interrupt+0xf/0xf [ 1092.836147] </IRQ> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0 [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000 [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940 [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78 [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000 [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0 [ 1092.909555] do_idle+0x236/0x280 [ 1092.912785] cpu_startup_entry+0x6f/0x80 [ 1092.916715] start_secondary+0x1a7/0x200 [ 1092.920642] secondary_startup_64+0xb7/0xc0 [...] The reason is that the skb list tipc_socket::mc_method.deferredq only is initialized for connectionless sockets, while nothing stops arriving multicast messages from being filtered by connection oriented sockets, with subsequent access to the said list. We fix this by initializing the list unconditionally at socket creation. This eliminates the crash, while the message still is dropped further down in tipc_sk_filter_rcv() as it should be. Reported-by: NLi Shuang <shuali@redhat.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 7月, 2019 2 次提交
-
-
由 Tuong Lien 提交于
In conjunction with changing the interfaces' MTU (e.g. especially in the case of a bonding) where the TIPC links are brought up and down in a short time, a couple of issues were detected with the current link changeover mechanism: 1) When one link is up but immediately forced down again, the failover procedure will be carried out in order to failover all the messages in the link's transmq queue onto the other working link. The link and node state is also set to FAILINGOVER as part of the process. The message will be transmited in form of a FAILOVER_MSG, so its size is plus of 40 bytes (= the message header size). There is no problem if the original message size is not larger than the link's MTU - 40, and indeed this is the max size of a normal payload messages. However, in the situation above, because the link has just been up, the messages in the link's transmq are almost SYNCH_MSGs which had been generated by the link synching procedure, then their size might reach the max value already! When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size will exceed the link's MTU. As a result, the messages are dropped silently and the failover procedure will never end up, the link will not be able to exit the FAILINGOVER state, so cannot be re-established. 2) The same scenario above can happen more easily in case the MTU of the links is set differently or when changing. In that case, as long as a large message in the failure link's transmq queue was built and fragmented with its link's MTU > the other link's one, the issue will happen (there is no need of a link synching in advance). 3) The link synching procedure also faces with the same issue but since the link synching is only started upon receipt of a SYNCH_MSG, dropping the message will not result in a state deadlock, but it is not expected as design. The 1) & 3) issues are resolved by the last commit that only a dummy SYNCH_MSG (i.e. without data) is generated at the link synching, so the size of a FAILOVER_MSG if any then will never exceed the link's MTU. For the 2) issue, the only solution is trying to fragment the messages in the failure link's transmq queue according to the working link's MTU so they can be failovered then. A new function is made to accomplish this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the original message size is too large, it will be fragmented & reassembled at the receiving side. Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
This commit along with the next one are to resolve the issues with the link changeover mechanism. See that commit for details. Basically, for the link synching, from now on, we will send only one single ("dummy") SYNCH message to peer. The SYNCH message does not contain any data, just a header conveying the synch point to the peer. A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for backward compatible! Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Suggested-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2019 1 次提交
-
-
由 Christophe JAILLET 提交于
s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/ (r and s switched in topsrv) Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 7月, 2019 1 次提交
-
-
由 Matteo Croce 提交于
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.comSigned-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2019 1 次提交
-
-
由 Jon Maloy 提交于
The tipc_msg_validate() function leaves a boolean flag 'validated' in the validated buffer's control block, to avoid performing this action more than once. However, at reception of new packets, the position of this field may already have been set by lower layer protocols, so that the packet is erroneously perceived as already validated by TIPC. We fix this by initializing the said field to 'false' before performing the initial validation. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2019 1 次提交
-
-
由 Chris Packham 提交于
tipc_named_node_up() creates a skb list. It passes the list to tipc_node_xmit() which has some code paths that can call skb_queue_purge() which relies on the list->lock being initialised. The spin_lock is only needed if the messages end up on the receive path but when the list is created in tipc_named_node_up() we don't necessarily know if it is going to end up there. Once all the skb list users are updated in tipc it will then be possible to update them to use the unlocked variants of the skb list functions and initialise the lock when we know the message will follow the receive path. Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2019 1 次提交
-
-
由 Xin Long 提交于
For these places are protected by rcu_read_lock, we change from rcu_dereference_rtnl to rcu_dereference, as there is no need to check if rtnl lock is held. For these places are protected by rtnl_lock, we change from rcu_dereference_rtnl to rtnl_dereference/rcu_dereference_protected, as no extra memory barriers are needed under rtnl_lock() which also protects tn->bearer_list[] and dev->tipc_ptr/b->media_ptr updating. rcu_dereference_rtnl will be only used in the places where it could be under rcu_read_lock or rtnl_lock. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 7月, 2019 1 次提交
-
-
由 Xin Long 提交于
Both tipc_udp_enable and tipc_udp_disable are called under rtnl_lock, ub->ubsock could never be NULL in tipc_udp_disable and cleanup_bearer, so remove the check. Also remove the one in tipc_udp_enable by adding "free" label. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 7月, 2019 1 次提交
-
-
由 Jon Maloy 提交于
The macro TIPC_BC_RETR_LIM is always used in combination with 'jiffies', so we can just as well perform the addition in the macro itself. This way, we get a few shorter code lines and one less line break. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2019 1 次提交
-
-
由 Xin Long 提交于
As other udp/ip tunnels do, tipc udp media should also have a lockless dst_cache supported on its tx path. Here we add dst_cache into udp_replicast to support dst cache for both rmcast and rcast, and rmcast uses ub->rcast and each rcast uses its own node in ub->rcast.list. Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2019 3 次提交
-
-
由 Jon Maloy 提交于
We rename the inline function msg_get_wrapped() to the more comprehensible msg_inner_hdr(). Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
We increase the allocated headroom for the buffer copies to be retransmitted. This eliminates the need for the lower stack levels (UDP/IP/L2) to expand the headroom in order to add their own headers. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
In commit a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission") we made link retransmission failure events dependent on the link tolerance, and not only of the number of failed retransmission attempts, as we did earlier. This works well. However, keeping the original, additional criteria of 99 failed retransmissions is now redundant, and may in some cases lead to failure detection times in the order of minutes instead of the expected 1.5 sec link tolerance value. We now remove this criteria altogether. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 6月, 2019 1 次提交
-
-
由 Xin Long 提交于
This patch is to fix an uninit-value issue, reported by syzbot: BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:981 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310 memchr+0xce/0x110 lib/string.c:981 string_is_valid net/tipc/netlink_compat.c:176 [inline] tipc_nl_compat_bearer_disable+0x2a1/0x480 net/tipc/netlink_compat.c:449 __tipc_nl_compat_doit net/tipc/netlink_compat.c:327 [inline] tipc_nl_compat_doit+0x3ac/0xb00 net/tipc/netlink_compat.c:360 tipc_nl_compat_handle net/tipc/netlink_compat.c:1178 [inline] tipc_nl_compat_recv+0x1b1b/0x27b0 net/tipc/netlink_compat.c:1281 TLV_GET_DATA_LEN() may return a negtive int value, which will be used as size_t (becoming a big unsigned long) passed into memchr, cause this issue. Similar to what it does in tipc_nl_compat_bearer_enable(), this fix is to return -EINVAL when TLV_GET_DATA_LEN() is negtive in tipc_nl_compat_bearer_disable(), as well as in tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats(). v1->v2: - add the missing Fixes tags per Eric's request. Fixes: 0762216c ("tipc: fix uninit-value in tipc_nl_compat_bearer_enable") Fixes: 8b66fee7 ("tipc: fix uninit-value in tipc_nl_compat_link_reset_stats") Reported-by: syzbot+30eaa8bf392f7fafffaf@syzkaller.appspotmail.com Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 6月, 2019 2 次提交
-
-
由 Xin Long 提交于
tipc_nl_compat_bearer_set() is only called by tipc_nl_compat_link_set() which already does the check for msg->req check, so remove it from tipc_nl_compat_bearer_set(), and do the same in tipc_nl_compat_media_set(). Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Fix misalignment of policy statement in netlink.c due to automatic spatch code transformation. Fixes: 3b0f31f2 ("genetlink: make policy common to family") Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NJohn Rutherford <john.rutherford@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 6月, 2019 1 次提交
-
-
由 Xin Long 提交于
This patch is to fix a dst defcnt leak, which can be reproduced by doing: # ip net a c; ip net a s; modprobe tipc # ip net e s ip l a n eth1 type veth peer n eth1 netns c # ip net e c ip l s lo up; ip net e c ip l s eth1 up # ip net e s ip l s lo up; ip net e s ip l s eth1 up # ip net e c ip a a 1.1.1.2/8 dev eth1 # ip net e s ip a a 1.1.1.1/8 dev eth1 # ip net e c tipc b e m udp n u1 localip 1.1.1.2 # ip net e s tipc b e m udp n u1 localip 1.1.1.1 # ip net d c; ip net d s; rmmod tipc and it will get stuck and keep logging the error: unregister_netdevice: waiting for lo to become free. Usage count = 1 The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx path with udp_early_demux == 1, and this dst (eventually holding lo dev) can't be released as bearer's removal in tipc pernet .exit happens after lo dev's removal, default_device pernet .exit. "There are two distinct types of pernet_operations recognized: subsys and device. At creation all subsys init functions are called before device init functions, and at destruction all device exit functions are called before subsys exit function." So by calling register_pernet_device instead to register tipc_net_ops, the pernet .exit() will be invoked earlier than loopback dev's removal when a netns is being destroyed, as fou/gue does. Note that vxlan and geneve udp tunnels don't have this issue, as the udp sock is released in their device ndo_stop(). This fix is also necessary for tipc dst_cache, which will hold dsts on tx path and I will introduce in my next patch. Reported-by: NLi Shuang <shuali@redhat.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 6月, 2019 2 次提交
-
-
由 Xin Long 提交于
udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a perpcu variable. However, TIPC is using udp tunnel with no tunnel device, and pass the lower dev, like veth device that only initializes dev->lstats(a perpcu variable) when creating it. Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() thinks the dev as a tunnel device, and uses dev->tstats instead of dev->lstats. tstats' each pointer points to a bigger struct than lstats, so when tstats->tx_bytes is increased, other percpu variable's members could be overwritten. syzbot has reported quite a few crashes due to fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten, call traces are like: BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ... or: kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ... The issue exists since tunnel stats update is moved to iptunnel_xmit by Commit 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()"), and here to fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb so that the packets counting won't happen on dev->tstats. Reported-by: syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com Reported-by: syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com Reported-by: syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com Reported-by: syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com Reported-by: syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com Fixes: 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()") Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tuong Lien 提交于
It appears that a FAILOVER_MSG can come from peer even when the failure link is resetting (i.e. just after the 'node_write_unlock()'...). This means the failover procedure on the node has not been started yet. The situation is as follows: node1 node2 linkb linka linka linkb | | | | | | x failure | | | RESETTING | | | | | | x failure RESET | | RESETTING FAILINGOVER | | | (FAILOVER_MSG) | | |<-------------------------------------------------| | *FAILINGOVER | | | | | (dummy FAILOVER_MSG) | | |------------------------------------------------->| | RESET | | FAILOVER_END | FAILINGOVER RESET | . . . . . . . . . . . . Once this happens, the link failover procedure will be triggered wrongly on the receiving node since the node isn't in FAILINGOVER state but then another link failover will be carried out. The consequences are: 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' was set, reset and set incorrectly, the criteria to end the failover would not be met, it could keep waiting for a message that has already received. 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq but would be purged or not pulled out because the 'drop_point' was not set correctly. 3) The early FAILOVER_MSG(s) could be dropped too. 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state shortly, but later on it would be restarted. The same situation can also happen when the link is in PEER_RESET state and a FAILOVER_MSG arrives. The commit resolves the issues by forcing the link down immediately, so the failover procedure will be started normally (which is the same as when receiving a FAILOVER_MSG and the link is in up state). Also, the function "tipc_node_link_failover()" is toughen to avoid such a situation from happening. Acked-by: NJon Maloy <jon.maloy@ericsson.se> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2019 1 次提交
-
-
由 Tuong Lien 提交于
In patch series, commit 9195948f ("tipc: improve TIPC throughput by Gap ACK blocks"), as for simplicity, the repeated retransmit failures' detection in the function - "tipc_link_retrans()" was kept there for broadcast retransmissions only. This commit now reapplies this feature for link unicast retransmissions that has been done via the function - "tipc_link_advance_transmq()". Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()" as it is used only for broadcast. Acked-by: NJon Maloy <jon.maloy@ericsson.se> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 6月, 2019 1 次提交
-
-
由 Xin Long 提交于
Syzbot reported a memleak caused by grp members' deferredq list not purged when the grp is be deleted. The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in tipc_group_filter_msg() and the skb will stay in deferredq. So fix it by calling __skb_queue_purge for each member's deferredq in tipc_group_delete() when a tipc sk leaves the grp. Fixes: b87a5ea3 ("tipc: guarantee group unicast doesn't bypass group broadcast") Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2019 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
The kbuild documentation clearly shows that the documents there are written at different times: some use markdown, some use their own peculiar logic to split sections. Convert everything to ReST without affecting too much the author's style and avoiding adding uneeded markups. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: NJonathan Corbet <corbet@lwn.net>
-
- 21 5月, 2019 2 次提交
-
-
由 Thomas Gleixner 提交于
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Junwei Hu 提交于
Error message printed: modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol. when modprobe tipc after the following patch: switch order of device registration, commit 7e27e8d6 ("tipc: switch order of device registration to fix a crash") Because sock_create_kern(net, AF_TIPC, ...) called by tipc_topsrv_create_listener() in the initialization process of tipc_init_net(), so tipc_socket_init() must be execute before that. Meanwhile, tipc_net_id need to be initialized when sock_create() called, and tipc_socket_init() is no need to be called for each namespace. I add a variable tipc_topsrv_net_ops, and split the register_pernet_subsys() of tipc into two parts, and split tipc_socket_init() with initialization of pernet params. By the way, I fixed resources rollback error when tipc_bcast_init() failed in tipc_init_net(). Fixes: 7e27e8d6 ("tipc: switch order of device registration to fix a crash") Signed-off-by: NJunwei Hu <hujunwei4@huawei.com> Reported-by: NWang Wang <wangwang2@huawei.com> Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com Reviewed-by: NKang Zhou <zhoukang7@huawei.com> Reviewed-by: NSuanming Mou <mousuanming@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2019 2 次提交
-
-
由 David S. Miller 提交于
This reverts commit 532b0f7e. More revisions coming up. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Junwei Hu 提交于
Error message printed: modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol. when modprobe tipc after the following patch: switch order of device registration, commit 7e27e8d6 ("tipc: switch order of device registration to fix a crash") Because sock_create_kern(net, AF_TIPC, ...) is called by tipc_topsrv_create_listener() in the initialization process of tipc_net_ops, tipc_socket_init() must be execute before that. I move tipc_socket_init() into function tipc_init_net(). Fixes: 7e27e8d6 ("tipc: switch order of device registration to fix a crash") Signed-off-by: NJunwei Hu <hujunwei4@huawei.com> Reported-by: NWang Wang <wangwang2@huawei.com> Reviewed-by: NKang Zhou <zhoukang7@huawei.com> Reviewed-by: NSuanming Mou <mousuanming@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2019 1 次提交
-
-
由 Junwei Hu 提交于
When tipc is loaded while many processes try to create a TIPC socket, a crash occurs: PANIC: Unable to handle kernel paging request at virtual address "dfff20000000021d" pc : tipc_sk_create+0x374/0x1180 [tipc] lr : tipc_sk_create+0x374/0x1180 [tipc] Exception class = DABT (current EL), IL = 32 bits Call trace: tipc_sk_create+0x374/0x1180 [tipc] __sock_create+0x1cc/0x408 __sys_socket+0xec/0x1f0 __arm64_sys_socket+0x74/0xa8 ... This is due to race between sock_create and unfinished register_pernet_device. tipc_sk_insert tries to do "net_generic(net, tipc_net_id)". but tipc_net_id is not initialized yet. So switch the order of the two to close the race. This can be reproduced with multiple processes doing socket(AF_TIPC, ...) and one process doing module removal. Fixes: a62fbcce ("tipc: make subscriber server support net namespace") Signed-off-by: NJunwei Hu <hujunwei4@huawei.com> Reported-by: NWang Wang <wangwang2@huawei.com> Reviewed-by: NXiaogang Wang <wangxiaogang3@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 5月, 2019 1 次提交
-
-
由 Parthasarathy Bhuvaragan 提交于
commit 517d7c79 ("tipc: fix hanging poll() for stream sockets") introduced a regression for clients using non-blocking sockets. After the commit, we send EPOLLOUT event to the client even in TIPC_CONNECTING state. This causes the subsequent send() to fail with ENOTCONN, as the socket is still not in TIPC_ESTABLISHED state. In this commit, we: - improve the fix for hanging poll() by replacing sk_data_ready() with sk_state_change() to wake up all clients. - revert the faulty updates introduced by commit 517d7c79 ("tipc: fix hanging poll() for stream sockets"). Fixes: 517d7c79 ("tipc: fix hanging poll() for stream sockets") Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Acked-by: NJon Maloy <jon.maloy@ericsson.se> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2019 1 次提交
-
-
由 Tuong Lien 提交于
TIPC link can temporarily fall into "half-establish" that only one of the link endpoints is ESTABLISHED and starts to send traffic, PROTOCOL messages, whereas the other link endpoint is not up (e.g. immediately when the endpoint receives ACTIVATE_MSG, the network interface goes down...). This is a normal situation and will be settled because the link endpoint will be eventually brought down after the link tolerance time. However, the situation will become worse when the second link is established before the first link endpoint goes down, For example: 1. Both links <1A-2A>, <1B-2B> down 2. Link endpoint 2A up, but 1A still down (e.g. due to network disturbance, wrong session, etc.) 3. Link <1B-2B> up 4. Link endpoint 2A down (e.g. due to link tolerance timeout) 5. Node B starts failover onto link <1B-2B> ==> Node A does never start link failover. When the "half-failover" situation happens, two consequences have been observed: a) Peer link/node gets stuck in FAILINGOVER state; b) Traffic or user messages that peer node is trying to failover onto the second link can be partially or completely dropped by this node. The consequence a) was actually solved by commit c140eb16 ("tipc: fix failover problem"), but that commit didn't cover the b). It's due to the fact that the tunnel link endpoint has never been prepared for a failover, so the 'l->drop_point' (and the other data...) is not set correctly. When a TUNNEL_MSG from peer node arrives on the link, depending on the inner message's seqno and the current 'l->drop_point' value, the message can be dropped (- treated as a duplicate message) or processed. At this early stage, the traffic messages from peer are likely to be NAME_DISTRIBUTORs, this means some name table entries will be missed on the node forever! The commit resolves the issue by starting the FAILOVER process on this node as well. Another benefit from this solution is that we ensure the link will not be re-established until the failover ends. Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-