Commits · 00483690552c5fb6aa30bf3acb75b0ee89b4c0fd · openanolis / cloud-kernel

11 May, 2018 · 11 commits

tcp: Add mark for TIMEWAIT sockets · 00483690

Committed by Jon Maxwell on May 10, 2018

This version has some suggestions by Eric Dumazet:

- Use a local variable for the mark in IPv6 instead of ctl_sk to avoid SMP
races.
- Use the more elegant "IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark"
statement.
- Factorize code as sk_fullsock() check is not necessary.

Aidan McGurn from Openwave Mobility systems reported the following bug:

"Marked routing is broken on customer deployment. Its effects are large
increase in Uplink retransmissions caused by the client never receiving
the final ACK to their FINACK - this ACK misses the mark and routes out
of the incorrect route."

Currently marks are added to sk_buffs for replies when the "fwmark_reflect"
sysctl is enabled. But not for TW sockets that had sk->sk_mark set via
setsockopt(SO_MARK..).

Fix this in IPv4/v6 by adding tw->tw_mark for TIME_WAIT sockets. Copy the
original sk->sk_mark in __inet_twsk_hashdance() to the new tw->tw_mark location.
Then propagate this so that the skb gets sent with the correct mark. Do the same
for resets. Give the "fwmark_reflect" sysctl precedence over sk->sk_mark so that
netfilter rules are still honored.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
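
The precedence rule described above can be sketched in a few lines of Python. This is an illustrative model, not the kernel code: it assumes IP4_REPLY_MARK() yields the incoming packet's mark when "fwmark_reflect" is enabled and 0 otherwise, and the function and parameter names are hypothetical.

```python
def reply_mark(fwmark_reflect: bool, skb_mark: int, sock_mark: int) -> int:
    """Pick the mark for a reply sent on behalf of a (TIME_WAIT) socket.

    Models the C idiom `IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark`:
    a non-zero reflected mark wins, otherwise the mark saved on the
    socket (tw_mark for a TIME_WAIT socket) is used.
    """
    # Assumption: the reflected mark is 0 when fwmark_reflect is off.
    reflected = skb_mark if fwmark_reflect else 0
    # `a or b` in Python plays the role of GNU C's `a ?: b`.
    return reflected or sock_mark


# fwmark_reflect enabled: the incoming packet's mark takes precedence.
print(reply_mark(True, skb_mark=7, sock_mark=3))
# fwmark_reflect disabled: fall back to the mark copied from sk->sk_mark.
print(reply_mark(False, skb_mark=7, sock_mark=3))
```

With this ordering a socket mark set via SO_MARK is only used when no reflected mark applies, which is how the commit keeps netfilter rules honored.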

00483690

net: ipv4: remove define INET_CSK_DEBUG and unnecessary EXPORT_SYMBOL · 03bdfc00

Committed by Joe Perches on May 09, 2018

INET_CSK_DEBUG is always set and is only used for 2 pr_debug calls.

EXPORT_SYMBOL(inet_csk_timer_bug_msg) is only used by these 2
pr_debug calls and is also unnecessary as the exported string can
be used directly by these calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03bdfc00

net/ipv6: fix lock imbalance in ip6_route_del() · 9e575010

Committed by Eric Dumazet on May 09, 2018

WARNING: lock held when returning to user space!
4.17.0-rc3+ #37 Not tainted

syz-executor1/27662 is leaving the kernel with locks still held!
1 lock held by syz-executor1/27662:
 #0: 00000000f661aee7 (rcu_read_lock){....}, at: ip6_route_del+0xea/0x13f0 net/ipv6/route.c:3206
BUG: scheduling while atomic: syz-executor1/27662/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Kernel panic - not syncing: scheduling while atomic

CPU: 1 PID: 27662 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __schedule_bug.cold.85+0xdf/0xdf kernel/sched/core.c:3290
 schedule_debug kernel/sched/core.c:3307 [inline]
 __schedule+0x139e/0x1e30 kernel/sched/core.c:3412
 schedule+0xef/0x430 kernel/sched/core.c:3549
 exit_to_usermode_loop+0x220/0x310 arch/x86/entry/common.c:152
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455979
RSP: 002b:00007fbf4051dc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: 0000000000000000 RBX: 00007fbf4051e6d4 RCX: 0000000000455979
RDX: 00000000200001c0 RSI: 000000000000890c RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000003c8 R14: 00000000006f9b60 R15: 0000000000000000
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 23fb93a4 ("net/ipv6: Cleanup exception and cache route handling")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e575010

net: dsa: fix added_by_user switchdev notification · a37fb855

Committed by Vivien Didelot on May 08, 2018

Commit 161d82de ("net: bridge: Notify about !added_by_user FDB
entries") causes the below oops when bringing up a slave interface,
because dsa_port_fdb_add is still scheduled, but with a NULL address.

To fix this, keep the dsa_slave_switchdev_event function agnostic of the
notified info structure and handle the added_by_user flag in the
specific dsa_slave_switchdev_event_work function.

    [   75.512263] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [   75.519063] pgd = (ptrval)
    [   75.520545] [00000000] *pgd=00000000
    [   75.522839] Internal error: Oops: 17 [#1] ARM
    [   75.525898] Modules linked in:
    [   75.527673] CPU: 0 PID: 9 Comm: kworker/u2:1 Not tainted 4.17.0-rc2 #78
    [   75.532988] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
    [   75.538153] Workqueue: dsa_ordered dsa_slave_switchdev_event_work
    [   75.542970] PC is at mv88e6xxx_port_db_load_purge+0x60/0x1b0
    [   75.547341] LR is at mdiobus_read_nested+0x6c/0x78
    [   75.550833] pc : [<804cd5c0>]    lr : [<804bba84>]    psr: 60070013
    [   75.555796] sp : 9f54bd78  ip : 9f54bd87  fp : 9f54bddc
    [   75.559719] r10: 00000000  r9 : 0000000e  r8 : 9f6a6010
    [   75.563643] r7 : 00000000  r6 : 81203048  r5 : 9f6a6010  r4 : 9f6a601c
    [   75.568867] r3 : 00000000  r2 : 00000000  r1 : 0000000d  r0 : 00000000
    [   75.574094] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    [   75.579933] Control: 10c53c7d  Table: 9de20059  DAC: 00000051
    [   75.584384] Process kworker/u2:1 (pid: 9, stack limit = 0x(ptrval))
    [   75.589349] Stack: (0x9f54bd78 to 0x9f54c000)
    [   75.592406] bd60:                                                       00000000 00000000
    [   75.599295] bd80: 00000391 9f299d10 9f299d68 8014317c 9f7f0000 8120af00 00006dc2 00000000
    [   75.606186] bda0: 8120af00 00000000 9f54bdec 1c9f5d92 8014317c 9f6a601c 9f6a6010 00000000
    [   75.613076] bdc0: 00000000 00000000 9dd1141c 8125a0b4 9f54be0c 9f54bde0 804cd8a8 804cd56c
    [   75.619966] bde0: 0000000e 80143680 00000001 9dce9c1c 81203048 9dce9c10 00000003 00000000
    [   75.626858] be00: 9f54be5c 9f54be10 806abcac 804cd864 9f54be54 80143664 8014317c 80143054
    [   75.633748] be20: ffcaa81d 00000000 812030b0 1c9f5d92 00000000 81203048 9f54beb4 00000003
    [   75.640639] be40: ffffffff 00000000 9dd1141c 8125a0b4 9f54be84 9f54be60 80138e98 806abb18
    [   75.647529] be60: 81203048 9ddc4000 9dce9c54 9f72a300 00000000 00000000 9f54be9c 9f54be88
    [   75.654420] be80: 801390bc 80138e50 00000000 9dce9c54 9f54beac 9f54bea0 806a9524 801390a0
    [   75.661310] bea0: 9f54bedc 9f54beb0 806a9c7c 806a950c 9f54becc 00000000 00000000 00000000
    [   75.668201] bec0: 9f540000 1c9f5d92 805fe604 9ddffc00 9f54befc 9f54bee0 806ab228 806a9c38
    [   75.675092] bee0: 806ab178 9ddffc00 9f4c1900 9f40d200 9f54bf34 9f54bf00 80131e30 806ab184
    [   75.681983] bf00: 9f40d214 9f54a038 9f40d200 9f40d200 9f4c1918 812119a0 9f40d214 9f54a038
    [   75.688873] bf20: 9f40d200 9f4c1900 9f54bf7c 9f54bf38 80132124 80131d1c 9f5f2dd8 00000000
    [   75.695764] bf40: 812119a0 9f54a038 812119a0 81259c5b 9f5f2dd8 9f5f2dc0 9f53dbc0 00000000
    [   75.702655] bf60: 9f4c1900 801320b4 9f5f2dd8 9f4f7e88 9f54bfac 9f54bf80 80137ad0 801320c0
    [   75.709544] bf80: 9f54a000 9f53dbc0 801379a0 00000000 00000000 00000000 00000000 00000000
    [   75.716434] bfa0: 00000000 9f54bfb0 801010e8 801379ac 00000000 00000000 00000000 00000000
    [   75.723324] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [   75.730206] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
    [   75.737083] Backtrace:
    [   75.738252] [<804cd560>] (mv88e6xxx_port_db_load_purge) from [<804cd8a8>] (mv88e6xxx_port_fdb_add+0x50/0x68)
    [   75.746795]  r10:8125a0b4 r9:9dd1141c r8:00000000 r7:00000000 r6:00000000 r5:9f6a6010
    [   75.753323]  r4:9f6a601c
    [   75.754570] [<804cd858>] (mv88e6xxx_port_fdb_add) from [<806abcac>] (dsa_switch_event+0x1a0/0x660)
    [   75.762238]  r8:00000000 r7:00000003 r6:9dce9c10 r5:81203048 r4:9dce9c1c
    [   75.767655] [<806abb0c>] (dsa_switch_event) from [<80138e98>] (notifier_call_chain+0x54/0x94)
    [   75.774893]  r10:8125a0b4 r9:9dd1141c r8:00000000 r7:ffffffff r6:00000003 r5:9f54beb4
    [   75.781423]  r4:81203048
    [   75.782672] [<80138e44>] (notifier_call_chain) from [<801390bc>] (raw_notifier_call_chain+0x28/0x30)
    [   75.790514]  r9:00000000 r8:00000000 r7:9f72a300 r6:9dce9c54 r5:9ddc4000 r4:81203048
    [   75.796982] [<80139094>] (raw_notifier_call_chain) from [<806a9524>] (dsa_port_notify+0x24/0x38)
    [   75.804483] [<806a9500>] (dsa_port_notify) from [<806a9c7c>] (dsa_port_fdb_add+0x50/0x6c)
    [   75.811371] [<806a9c2c>] (dsa_port_fdb_add) from [<806ab228>] (dsa_slave_switchdev_event_work+0xb0/0x10c)
    [   75.819635]  r4:9ddffc00
    [   75.820885] [<806ab178>] (dsa_slave_switchdev_event_work) from [<80131e30>] (process_one_work+0x120/0x3a4)
    [   75.829241]  r6:9f40d200 r5:9f4c1900 r4:9ddffc00 r3:806ab178
    [   75.833612] [<80131d10>] (process_one_work) from [<80132124>] (worker_thread+0x70/0x574)
    [   75.840415]  r10:9f4c1900 r9:9f40d200 r8:9f54a038 r7:9f40d214 r6:812119a0 r5:9f4c1918
    [   75.846945]  r4:9f40d200
    [   75.848191] [<801320b4>] (worker_thread) from [<80137ad0>] (kthread+0x130/0x160)
    [   75.854300]  r10:9f4f7e88 r9:9f5f2dd8 r8:801320b4 r7:9f4c1900 r6:00000000 r5:9f53dbc0
    [   75.860830]  r4:9f5f2dc0
    [   75.862076] [<801379a0>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
    [   75.867999] Exception stack(0x9f54bfb0 to 0x9f54bff8)
    [   75.871753] bfa0:                                     00000000 00000000 00000000 00000000
    [   75.878640] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [   75.885519] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [   75.890844]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801379a0
    [   75.897377]  r4:9f53dbc0 r3:9f54a000
    [   75.899663] Code: e3a02000 e3a03000 e14b26f4 e24bc055 (e5973000)
    [   75.904575] ---[ end trace fbca818a124dbf0d ]---

Fixes: 816a3bed ("switchdev: Add fdb.added_by_user to switchdev notifications")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a37fb855

tipc: clean up removal of binding table items · 5f30721c

Committed by Jon Maloy on May 09, 2018

In commit be47e41d ("tipc: fix use-after-free in tipc_nametbl_stop")
we fixed a problem caused by premature release of service range items.

That fix is correct, and solved the problem. However, it doesn't address
the root of the problem, which is that we don't look up the tipc_service
 -> service_range -> publication items in the correct hierarchical
order.

In this commit we try to make this right, and as a side effect obtain
some code simplification.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f30721c

net/udp: Update udp_encap_needed static key to modern api · 88ab3108

Committed by Davidlohr Bueso on May 08, 2018

No changes in refcount semantics -- key init is false; replace

static_key_enable         with   static_branch_enable
static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely

Added a '_key' suffix to udp and udpv6 encap_needed, for better
self documentation.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
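
The old-to-new name table in this message (and in the similar static key commits below) can be expressed as a small rename map. The following is a hypothetical Python sketch of the call-name mapping only; it is not how the patch was produced, the sample input line is invented, and a real conversion presumably also has to change how each key is declared.

```python
import re

# Old-to-new call names, as listed in the commit message.
RENAMES = {
    "static_key_enable": "static_branch_enable",
    "static_key_slow_inc": "static_branch_inc",
    "static_key_slow_dec": "static_branch_dec",
    "static_key_false": "static_branch_unlikely",
}

# Match any legacy name as a whole identifier.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(source: str) -> str:
    """Rewrite legacy static_key_* call names to static_branch_* names."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


# Hypothetical input line, for illustration only.
print(modernize("if (static_key_false(&udp_encap_needed_key))"))
```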

88ab3108

net: Update generic_xdp_needed static key to modern api · 02786475

Committed by Davidlohr Bueso on May 08, 2018

No changes in refcount semantics -- key init is false; replace

static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely

Added a '_key' suffix to generic_xdp_needed, for better self
documentation.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

02786475

net: Update netstamp_needed static key to modern api · 39e83922

Committed by Davidlohr Bueso on May 08, 2018

No changes in refcount semantics -- key init is false; replace

static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely

Added a '_key' suffix to netstamp_needed, for better self
documentation.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

39e83922

net: Update [e/in]gress_needed static key to modern api · aabf6772

Committed by Davidlohr Bueso on May 08, 2018

No changes in semantics -- key init is false; replace

static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely

Added a '_key' suffix to both ingress_needed and egress_needed,
for better self documentation.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

aabf6772

net/sock: Update memalloc_socks static key to modern api · a7950ae8

Committed by Davidlohr Bueso on May 08, 2018

No changes in refcount semantics -- key init is false; replace

static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely

Added a '_key' suffix to memalloc_socks, for better self
documentation.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7950ae8

net/ipv4: Update ip_tunnel_metadata_cnt static key to modern api · 5263a98f

Committed by Davidlohr Bueso on May 08, 2018

No changes in refcount semantics -- key init is false; replace

static_key_slow_inc|dec   with   static_branch_inc|dec
static_key_false          with   static_branch_unlikely
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

5263a98f

09 May, 2018 · 6 commits

udp: Do not copy destructor if one is not present · 04d55b25

Committed by Alexander Duyck on May 07, 2018

This patch makes it so that if a destructor is not present we avoid trying
to update the skb socket or any reference counting that would be associated
with the NULL socket and/or descriptor. By doing this we can support
traffic coming from another namespace without any issues.
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04d55b25

udp: Add support for software checksum and GSO_PARTIAL with GSO offload · 6053d0f1

Committed by Alexander Duyck on May 07, 2018

This patch adds support for a software provided checksum and GSO_PARTIAL
segmentation support. With this we can offload UDP segmentation on devices
that only have partial support for tunnels.

Since we are no longer needing the hardware checksum we can drop the checks
in the segmentation code that were verifying if it was present.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6053d0f1

udp: Partially unroll handling of first segment and last segment · 0ad65095

Committed by Alexander Duyck on May 07, 2018

This patch allows us to take care of unrolling the first segment and the
last segment of the loop for processing the segmented skb. Part of the
motivation for this is that it makes it easier to process the fact that the
first frame and all of the frames in between should be mostly identical
in terms of header data, and the last frame has differences in the length
and partial checksum.

In addition I am dropping the header length calculation since we don't
really need it for anything but the last frame and it can be easily
obtained by just pulling the data_len and offset of tail from the transport
header.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ad65095

udp: Do not pass checksum as a parameter to GSO segmentation · 9a0d41b3

Committed by Alexander Duyck on May 07, 2018

This patch is meant to allow us to avoid having to recompute the checksum
from scratch and have it passed as a parameter.

Instead of taking that approach we can take advantage of the fact that the
length that was used to compute the existing checksum is included in the
UDP header.

Finally to avoid the need to invert the result we can just call csum16_add
and csum16_sub directly. By doing this we can avoid a number of
instructions in the loop that is handling segmentation.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
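
The idea of adjusting an existing checksum instead of recomputing it can be sketched with 16-bit one's-complement arithmetic. The Python below is a stand-in that borrows the names csum16_add and csum16_sub from the message; it is not the kernel implementation, and the sample words are invented. It shows that replacing one 16-bit word (for example a length field) only requires subtracting the old value and adding the new one.

```python
def csum16_add(csum: int, addend: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    total = csum + addend
    return (total & 0xFFFF) + (total >> 16)


def csum16_sub(csum: int, addend: int) -> int:
    """One's-complement subtraction: add the bitwise complement."""
    return csum16_add(csum, ~addend & 0xFFFF)


def ones_complement_sum(words):
    """Full sum over all 16-bit words, used here only as a reference."""
    total = 0
    for word in words:
        total = csum16_add(total, word)
    return total


# Invented sample data: the third word stands in for a length field.
words = [0x1234, 0xABCD, 0x0010, 0xFFEE]
old_sum = ones_complement_sum(words)

# Incremental update: remove the old length, add the new one.
new_len = 0x0008
adjusted = csum16_add(csum16_sub(old_sum, words[2]), new_len)

# Reference: recompute from scratch with the new length in place.
recomputed = ones_complement_sum(words[:2] + [new_len] + words[3:])
print(hex(adjusted), hex(recomputed))
```

For this sample input both values agree, so the per-segment work reduces to two 16-bit operations rather than a pass over the payload.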

9a0d41b3

udp: Do not pass MSS as parameter to GSO segmentation · b21c034b

Committed by Alexander Duyck on May 07, 2018

There is no point in passing MSS as a parameter for the GSO
segmentation call as it is already available via the shared info for the
skb itself.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b21c034b

udp: Record gso_segs when supporting UDP segmentation offload · dfec0ee2

Committed by Alexander Duyck on May 07, 2018

We need to record the number of segments that will be generated when this
frame is segmented. The expectation is that if gso_size is set then
gso_segs is set as well. Without this some drivers such as ixgbe get
confused if they attempt to offload this as they record 0 segments for the
entire packet instead of the correct value.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
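
The relationship between gso_size and gso_segs can be illustrated with a short calculation. This sketch assumes the segment count is the payload length divided by gso_size, rounded up; the commit message does not spell out the formula, and the function name and sample sizes are hypothetical.

```python
def gso_segs(payload_len: int, gso_size: int) -> int:
    """Number of segments produced when a payload is cut into gso_size chunks."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    # Ceiling division: a trailing partial chunk still needs its own segment.
    return -(-payload_len // gso_size)


# Invented sizes: 3000 bytes of payload in 1472-byte segments.
print(gso_segs(3000, 1472))
```

A driver that sees gso_size set but a segment count of 0 has no way to account for the packets it is about to emit, which is the confusion the message describes.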

dfec0ee2

08 May, 2018 · 6 commits

flow_dissector: do not rely on implicit casts · d869dea6

Committed by Paolo Abeni on May 07, 2018

This change fixes a couple of type mismatches reported by the sparse
tool, explicitly using the requested type for the offending arguments.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d869dea6

net: core: rework basic flow dissection helper · 72a338bc

Committed by Paolo Abeni on May 04, 2018

When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, reworks the basic flow dissect
helper to use it, and applies this new helper where possible - namely in
skb_probe_transport_header(). The flow dissector data structures used
are renamed to match their new role more closely.

The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. A small but measurable improvement is also seen
in macro benchmarking.

v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
  as per DaveM suggestion
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72a338bc

net: ipv6/gre: Add GRO support · 0c1dd2a1

Committed by Eran Ben Elisha on May 07, 2018

Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells
infrastructure.

Performance testing: 55% higher bandwidth.
Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel
while GRO on the physical interface is disabled.
CPU: Intel Xeon E312xx (Sandy Bridge)
NIC: Mellanox Technologies MT27700 Family [ConnectX-4]
Before (GRO not working in tunnel) : 2.47 Gbits/sec
After  (GRO working in tunnel)     : 3.85 Gbits/sec
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c1dd2a1

net: ipv6: Fix typo in ipv6_find_hdr() documentation · 6f2f8212

Committed by Tariq Toukan on May 07, 2018

Fix 'an' into 'and', and use a comma instead of a period.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f2f8212

net/9p: correct the variable name in v9fs_get_trans_by_name() comment · 3a443bd6

Committed by Sun Lianwen on May 05, 2018

The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".
Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a443bd6

vlan: correct the file path in vlan_dev_change_flags() comment · 84a2fba2

Committed by Sun Lianwen on May 05, 2018

The vlan_flags enum is defined in the include/uapi/linux/if_vlan.h file,
not in the include/linux/if_vlan.h file.
Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
84a2fba2

07 May, 2018 · 7 commits

netfilter: nft_dynset: fix timeout updates on 32bit · b13468dc

Committed by Florian Westphal on Apr 27, 2018

This must now use a 64bit jiffies value, else we set
a bogus timeout on 32bit.

Fixes: 8e1102d5 ("netfilter: nf_tables: support timeouts larger than 23 days")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b13468dc

netfilter: ctnetlink: export nf_conntrack_max · 538c5672

Committed by Florent Fourcot on May 06, 2018

The IPCTNL_MSG_CT_GET_STATS netlink command allows monitoring the current
number of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
the sysctl value.

This patch adds the nf_conntrack_max value to the netlink message, and
simplifies monitoring for applications built on the netlink API.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

538c5672

netfilter: extract Passive OS fingerprint infrastructure from xt_osf · bfb15f2a

Committed by Fernando Fernandez Mancera on May 03, 2018

Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bfb15f2a

netfilter: nf_nat: remove unused ct arg from lookup functions · 3a2e86f6

Committed by Florian Westphal on Apr 26, 2018

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3a2e86f6

netfilter: ip6t_srh: extend SRH matching for previous, next and last SID · c1c7e44b

Committed by Ahmed Abdelsalam on Apr 25, 2018

IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by SR encapsulated packet. Each SID is encoded as an IPv6 prefix.

When a firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID), which represents the
final destination of the packet in case of inline SR mode.

An example use case for these features could be a SID list that
includes two firewalls. When the second firewall receives a packet,
it can check whether the packet has been processed by the first firewall
or not. Based on that check, it decides to apply all rules, apply just
a subset of the rules, or totally skip all rules and forward the packet to
the next SID.

This patch extends SRH match to support matching previous SID, next SID,
and last SID.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
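
The two-firewall use case can be sketched as lookups into the SID list. The Python model below is an assumption-laden illustration, not the patch's code: it assumes the usual SRH layout in which the segment list is stored in reverse path order (index 0 holds the final SID) and segments_left is the index of the currently active SID. The class, function names, and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentRoutingHeader:
    segments_left: int   # index of the currently active SID
    segments: List[str]  # reverse path order: segments[0] is the final SID


def previous_sid(srh: SegmentRoutingHeader) -> Optional[str]:
    """SID of the node that processed the packet before the active one."""
    index = srh.segments_left + 1
    return srh.segments[index] if index < len(srh.segments) else None


def next_sid(srh: SegmentRoutingHeader) -> Optional[str]:
    """SID of the node that will process the packet after the active one."""
    return srh.segments[srh.segments_left - 1] if srh.segments_left > 0 else None


def last_sid(srh: SegmentRoutingHeader) -> Optional[str]:
    """Final SID on the path."""
    return srh.segments[0] if srh.segments else None


# Path: firewall 1 -> firewall 2 -> server (invented addresses).
FW1, FW2, SERVER = "fc00::f1", "fc00::f2", "fc00::5"
at_fw2 = SegmentRoutingHeader(segments_left=1, segments=[SERVER, FW2, FW1])

# The second firewall can tell that the first one already saw the packet.
print(previous_sid(at_fw2) == FW1, next_sid(at_fw2), last_sid(at_fw2))
```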

c1c7e44b

netfilter: nft_numgen: enable hashing of one element · 75e72f05

Committed by Laura Garcia Liebana on Apr 23, 2018

The modulus in the hash function was limited to > 1 as initially
it made no sense to hash to just one element.

Nevertheless, there are certain cases, especially for load balancing,
where this case needs to be addressed.

This patch fixes the following error.

Error: Could not process rule: Numerical result out of range
add rule ip nftlb lb01 dnat to jhash ip saddr mod 1 map { 0: 192.168.0.10 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The solution is to force the hash to 0 when the modulus is 1.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
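
The behavior described above can be sketched as follows. This is a minimal Python model under stated assumptions: the function name is hypothetical, and a plain `%` stands in for whatever scaling the kernel actually applies to the hash value.

```python
def hash_bucket(hash32: int, modulus: int) -> int:
    """Map a hash value to a bucket in the range [0, modulus)."""
    if modulus < 1:
        raise ValueError("modulus must be at least 1")
    if modulus == 1:
        # A single-element map has only one possible bucket.
        return 0
    return hash32 % modulus


# With one backend, every source address maps to bucket 0.
print(hash_bucket(0xDEADBEEF, 1))
```

Accepting a modulus of 1 lets a rule such as the one in the error message above keep working when a load-balanced pool shrinks to a single backend.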

75e72f05

netfilter: nft_numgen: add map lookups for numgen statements · d734a288

Committed by Laura Garcia Liebana on Apr 22, 2018

This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.

For this purpose, different ops have been included to extend the
current numgen inc functions.

Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d734a288

05 May, 2018 · 1 commit

net/ipv6: rename rt6_next to fib6_next · 8fb11a9a

Committed by David Ahern on May 04, 2018

This slipped through the cracks in the followup set to the fib6_info flip.
Rename rt6_next to fib6_next.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fb11a9a

04 May, 2018 · 9 commits

smc: add support for splice() · 9014db20

Committed by Stefan Raspl on May 03, 2018

Provide an implementation for splice() when we are using SMC. See
smc_splice_read() for further details.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9014db20

smc: allocate RMBs as compound pages · 2ef4f27a

Committed by Stefan Raspl on May 03, 2018

Preparatory work for splice() support.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ef4f27a

smc: make smc_rx_wait_data() generic · b51fa1b1

Committed by Stefan Raspl on May 03, 2018

Turn smc_rx_wait_data into a generic function that can be used at various
instances to wait on traffic to complete with varying criteria.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b51fa1b1

smc: simplify abort logic · c8b8ec8e

Committed by Stefan Raspl on May 03, 2018

Some of the conditions to exit recv() are common to two paths. Clean up the
code by moving the check up so we have it only once.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8b8ec8e

xfrm: use a dedicated slab cache for struct xfrm_state · 565f0fa9

Committed by Mathias Krause on May 03, 2018

struct xfrm_state is rather large (768 bytes here) and therefore wastes
quite a lot of memory as it falls into the kmalloc-1024 slab cache,
leaving 256 bytes of unused memory per XFRM state object -- a net waste
of 25%.

Using a dedicated slab cache for struct xfrm_state reduces the level of
internal fragmentation to a minimum.

On my configuration SLUB chooses to create a slab cache covering 4
pages holding 21 objects, resulting in an average memory waste of ~13
bytes per object -- a net waste of only 1.6%.

In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
states.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
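
The figures in this message can be reproduced with a few lines of arithmetic. The sketch below assumes 4096-byte pages, which the message does not state, and ignores any per-slab metadata; the function name is hypothetical.

```python
PAGE_SIZE = 4096    # assumption: the commit message does not state the page size
OBJECT_SIZE = 768   # size of struct xfrm_state quoted in the message


def waste_stats(object_size: int, slab_bytes: int, objects_per_slab: int):
    """Return (bytes wasted per object, waste as a percentage of the slot)."""
    per_object = slab_bytes / objects_per_slab
    wasted = per_object - object_size
    return wasted, 100 * wasted / per_object


# Generic kmalloc-1024 cache: one object per 1024-byte slot.
generic = waste_stats(OBJECT_SIZE, 1024, 1)

# Dedicated cache: 4 pages shared by 21 objects.
dedicated = waste_stats(OBJECT_SIZE, 4 * PAGE_SIZE, 21)

# Memory saved for 10k states when moving from generic to dedicated.
saved_mib = 10_000 * (1024 - 4 * PAGE_SIZE / 21) / 2**20

print(f"generic:   {generic[0]:.1f} bytes/object, {generic[1]:.1f}%")
print(f"dedicated: {dedicated[0]:.1f} bytes/object, {dedicated[1]:.1f}%")
print(f"saved for 10k states: {saved_mib:.2f} MiB")
```

Under these assumptions the results line up with the 25%, roughly 1.6%, and roughly 2.3MB figures quoted above, with the per-object waste landing just over 12 bytes.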

565f0fa9

bpf: add skb_load_bytes_relative helper · 4e1ec56c

Committed by Daniel Borkmann on May 04, 2018

This adds a small BPF helper similar to bpf_skb_load_bytes() that
is able to load relative to mac/net header offset from the skb's
linear data. Compared to bpf_skb_load_bytes(), it takes a fifth
argument namely start_header, which is either BPF_HDR_START_MAC
or BPF_HDR_START_NET. This allows for a more flexible alternative
compared to LD_ABS/LD_IND with negative offset. It's enabled for
tc BPF programs as well as sock filter program types where it's
mainly useful in reuseport programs to ease access to lower header
data.

Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
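
The effect of the start_header argument can be modeled in user space. The Python sketch below is illustrative only: the Packet class, the function, and the numeric values given to BPF_HDR_START_MAC and BPF_HDR_START_NET are placeholders, and unlike the helper described above it returns the bytes instead of copying them into a caller-supplied buffer.

```python
from dataclasses import dataclass

# Placeholder values; only the two names come from the commit message.
BPF_HDR_START_MAC = 0
BPF_HDR_START_NET = 1


@dataclass
class Packet:
    data: bytes       # linear packet data
    mac_offset: int   # where the MAC header starts within data
    net_offset: int   # where the network header starts within data


def load_bytes_relative(pkt: Packet, offset: int, length: int, start_header: int) -> bytes:
    """Read `length` bytes at `offset` from the chosen header start."""
    if start_header == BPF_HDR_START_MAC:
        base = pkt.mac_offset
    elif start_header == BPF_HDR_START_NET:
        base = pkt.net_offset
    else:
        raise ValueError("unknown start_header")
    start = base + offset
    if offset < 0 or length <= 0 or start + length > len(pkt.data):
        raise ValueError("access outside packet data")
    return pkt.data[start:start + length]


# Invented frame: 14-byte Ethernet header (EtherType 0x0800) + 20-byte IPv4 header.
frame = bytes(12) + b"\x08\x00" + b"\x45" + bytes(19)
pkt = Packet(data=frame, mac_offset=0, net_offset=14)

ethertype = load_bytes_relative(pkt, 12, 2, BPF_HDR_START_MAC)
version_ihl = load_bytes_relative(pkt, 0, 1, BPF_HDR_START_NET)
print(ethertype.hex(), version_ihl.hex())
```

Addressing relative to a named header start is what replaces the negative offsets that LD_ABS/LD_IND users relied on.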

4e1ec56c

bpf: implement ld_abs/ld_ind in native bpf · e0cea7ce

Committed by Daniel Borkmann on May 04, 2018

The main part of this work is to finally allow removal of LD_ABS
and LD_IND from the BPF core by reimplementing them through native
eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and
keeping them around in native eBPF caused way more trouble than
actually worth it. To just list some of the security issues in
the past:

  * fdfaf64e ("x86: bpf_jit: support negative offsets")
  * 35607b02 ("sparc: bpf_jit: fix loads from negative offsets")
  * e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
  * 07aee943 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
  * 6d59b7db ("bpf, s390x: do not reload skb pointers in non-skb context")
  * 87338c8e ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
these days due to their limitations and more efficient/flexible
alternatives that have been developed over time such as direct
packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
register, the load happens in host endianness and its exception
handling can yield unexpected behavior. The latter is explained
in depth in f6b1b3bf ("bpf: fix subprog verifier bypass by
div/mod by 0 exception") with similar cases of exceptions we had.
In native eBPF more recent program types will disable LD_ABS/LD_IND
altogether through may_access_skb() in verifier, and given the
limitations in terms of exception handling, it's also disabled
in programs that use BPF to BPF calls.

In terms of cBPF, the LD_ABS/LD_IND is used in networking programs
to access packet data. It is not used in seccomp-BPF but programs
that use it for socket filtering or reuseport for demuxing with
cBPF. This is mostly relevant for applications that have not yet
migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND is coming
from their implementation in the various JITs. Most of them keep
the model around from cBPF times by implementing a fastpath written
in asm. They use typically two from the BPF program hidden CPU
registers for caching the skb's headlen (skb->len - skb->data_len)
and skb->data. Throughout the JIT phase this requires to keep track
whether LD_ABS/LD_IND are used and if so, the two registers need
to be recached each time a BPF helper would change the underlying
packet data in native eBPF case. At least in eBPF case, available
CPU registers are rare and the additional exit path out of the
asm written JIT helper makes it also inflexible since not all
parts of the JITer are in control from plain C. A LD_ABS/LD_IND
implementation in eBPF therefore allows to significantly reduce
the complexity in JITs with comparable performance results for
them, e.g.:

test_bpf             tcpdump port 22             tcpdump complex
x64      - before    15 21 10                    14 19  18
         - after      7 10 10                     7 10  15
arm64    - before    40 91 92                    40 91 151
         - after     51 64 73                    51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
and cache the skb's headlen and data in the cBPF prologue. The
BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
used as a local temporary variable. This allows to shrink the
image on x86_64 also for seccomp programs slightly since mapping
to %rsi is not an ereg. In callee-saved R8 and R9 we now track
skb data and headlen, respectively. For normal prologue emission
in the JITs this does not add any extra instructions since R8, R9
are pushed to stack in any case from eBPF side. cBPF uses the
convert_bpf_ld_abs() emitter which probes the fast path inline
already and falls back to bpf_skb_load_helper_{8,16,32}() helper
relying on the cached skb data and headlen as well. R8 and R9
never need to be reloaded due to bpf_helper_changes_pkt_data()
since all skb access in cBPF is read-only. Then, for the case
of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
does neither cache skb data and headlen nor has an inlined fast
path. The reason for the latter is that native eBPF does not have
any extra registers available anyway, but even if there were, it
avoids any reload of skb data and headlen in the first place.
Additionally, for the negative offsets, we provide an alternative
bpf_skb_load_bytes_relative() helper in eBPF which operates
similarly as bpf_skb_load_bytes() and allows for more flexibility.
Tested myself on x64, arm64, s390x, from Sandipan on ppc64.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e0cea7ce

bpf: migrate ebpf ld_abs/ld_ind tests to test_verifier · 93731ef0

Committed by Daniel Borkmann on May 04, 2018

Remove all eBPF tests involving LD_ABS/LD_IND from test_bpf.ko. Reason
is that the eBPF tests from test_bpf module do not go via BPF verifier
and therefore any instruction rewrites from verifier cannot take place.

Therefore, move them into test_verifier which runs out of user space,
so that the verifier can rewrite LD_ABS/LD_IND internally in upcoming patches.
It will have the same effect since runtime tests are also performed from
there. This also allows to finally unexport bpf_skb_vlan_{push,pop}_proto
and keep it internal to core kernel.

Additionally, also add further cBPF LD_ABS/LD_IND test coverage into
test_bpf.ko suite.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

93731ef0

bpf: prefix cbpf internal helpers with bpf_ · b390134c

Committed by Daniel Borkmann on May 04, 2018

No change in functionality, just remove the '__' prefix and replace it
with a 'bpf_' prefix instead. We later on add a couple of more helpers
for cBPF and keeping the scheme with '__' is suboptimal there.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

b390134c

openanolis / cloud-kernel: last synced successfully almost 2 years ago