提交 · c42d7121fbee1ee30fd9221d594e9c5a4bc1fed6 · openeuler / Kernel

23 7月, 2016 3 次提交

netfilter: nft_compat: fix crash when related match/target module is removed · 4b512e1c

由 Liping Zhang 提交于 7月 23, 2016

We "cache" the loaded match/target modules and reuse them, but when the
modules are removed, we still point to them. Then we may end up with
invalid memory references when using iptables-compat to add rules later.

Input the following commands will reproduce the kernel crash:
  # iptables-compat -A INPUT -j LOG
  # iptables-compat -D INPUT -j LOG
  # rmmod xt_LOG
  # iptables-compat -A INPUT -j LOG
  BUG: unable to handle kernel paging request at ffffffffa05a9010
  IP: [<ffffffff813f783e>] strcmp+0xe/0x30
  Call Trace:
  [<ffffffffa05acc43>] nft_target_select_ops+0x83/0x1f0 [nft_compat]
  [<ffffffffa058a177>] nf_tables_expr_parse+0x147/0x1f0 [nf_tables]
  [<ffffffffa058e541>] nf_tables_newrule+0x301/0x810 [nf_tables]
  [<ffffffff8141ca00>] ? nla_parse+0x20/0x100
  [<ffffffffa057fa8f>] nfnetlink_rcv+0x33f/0x53d [nfnetlink]
  [<ffffffffa057f94b>] ? nfnetlink_rcv+0x1fb/0x53d [nfnetlink]
  [<ffffffff817116b8>] netlink_unicast+0x178/0x220
  [<ffffffff81711a5b>] netlink_sendmsg+0x2fb/0x3a0
  [<ffffffff816b7fc8>] sock_sendmsg+0x38/0x50
  [<ffffffff816b8a7e>] ___sys_sendmsg+0x28e/0x2a0
  [<ffffffff816bcb7e>] ? release_sock+0x1e/0xb0
  [<ffffffff81804ac5>] ? _raw_spin_unlock_bh+0x35/0x40
  [<ffffffff816bcbe2>] ? release_sock+0x82/0xb0
  [<ffffffff816b93d4>] __sys_sendmsg+0x54/0x90
  [<ffffffff816b9422>] SyS_sendmsg+0x12/0x20
  [<ffffffff81805172>] entry_SYSCALL_64_fastpath+0x1a/0xa9

So when nobody use the related match/target module, there's no need to
"cache" it. And nft_[match|target]_release are useless anymore, remove
them.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4b512e1c

netfilter: nft_compat: put back match/target module if init fail · 2bf4fade

由 Liping Zhang 提交于 7月 23, 2016

If the user specify the invalid NFTA_MATCH_INFO/NFTA_TARGET_INFO attr
or memory alloc fail, we should call module_put to the related match
or target. Otherwise, we cannot remove the module even nobody use it.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2bf4fade

netfilter: h323: Use mod_timer instead of set_expect_timeout · 96d1327a

由 Gao Feng 提交于 7月 22, 2016

Simplify the code without any side effect. The set_expect_timeout is
used to modify the timer expired time.  It tries to delete timer, and
add it again.  So we could use mod_timer directly.
Signed-off-by: NGao Feng <fgao@ikuai8.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

96d1327a

22 7月, 2016 3 次提交

netfilter: connlabels: move set helper to xt_connlabel · 857ed310

由 Florian Westphal 提交于 7月 21, 2016

xt_connlabel is the only user so move it.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

857ed310

netfilter: conntrack: support a fixed size of 128 distinct labels · 23014011

由 Florian Westphal 提交于 7月 21, 2016

The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.

We track size of each label storage area in the 'words' member.

But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.

As most arches are 64bit we need to allocate 24 bytes in this case:

struct nf_conn_labels {
    u8            words;   /*     0     1 */
    /* XXX 7 bytes hole, try to pack */
    long unsigned bits[2]; /*     8     24 */

Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.

We still only allocate the extension if its needed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

23014011

packet: propagate sock_cmsg_send() error · f8e7718c

由 Soheil Hassas Yeganeh 提交于 7月 20, 2016

sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.

Fixes: c14ac945 ("sock: enable timestamping using control messages")
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8e7718c

21 7月, 2016 6 次提交

rtnl: protect do_setlink from IFLA_XDP_ATTACHED · 262d8625

由 Brenden Blanco 提交于 7月 20, 2016

The IFLA_XDP_ATTACHED nested attribute is meant for read-only, and while
do_setlink properly ignores it, it should be more paranoid and reject
commands that try to set it.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

262d8625

netfilter: nf_tables: allow to filter out rules by table and chain · 6e1f760e

由 Pablo Neira Ayuso 提交于 7月 19, 2016

If the table and/or chain attributes are set in a rule dump request,
we filter out the rules based on this selection.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

6e1f760e

netfilter: nft_log: fix snaplen does not truncate packets · cc37c1ad

由 Liping Zhang 提交于 7月 18, 2016

There's a similar problem in xt_NFLOG, and was fixed by commit 7643507f
("netfilter: xt_NFLOG: nflog-range does not truncate packets"). Only set
copy_len here does not work, so we should enable NF_LOG_F_COPY_LEN also.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

cc37c1ad

netfilter: nft_log: check the validity of log level · 1bc4e013

由 Liping Zhang 提交于 7月 18, 2016

User can specify the log level larger than 7(debug level) via
nfnetlink, this is invalid. So in this case, we should report
EINVAL to the userspace.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1bc4e013

netfilter: nft_log: fix possible memory leak if log expr init fail · c2d9a429

由 Liping Zhang 提交于 7月 18, 2016

Suppose that we specify the NFTA_LOG_PREFIX, then NFTA_LOG_LEVEL
and NFTA_LOG_GROUP are specified together or nf_logger_find_get
call returns fail, i.e. expr init fail, memory leak will happen.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c2d9a429

netfilter: Add helper array register/unregister functions · 82de0be6

由 Gao Feng 提交于 7月 18, 2016

Add nf_ct_helper_init(), nf_conntrack_helpers_register() and
nf_conntrack_helpers_unregister() functions to avoid repetitive
opencoded initialization in helpers.

This patch keeps an id parameter for nf_ct_helper_init() not to break
helper matching by name that has been inconsistently exposed to
userspace through ports, eg. ftp-2121, and through an incremental id,
eg. tftp-1.
Signed-off-by: NGao Feng <fgao@ikuai8.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

82de0be6

20 7月, 2016 13 次提交

rtnl: add option for setting link xdp prog · d1fdd913

由 Brenden Blanco 提交于 7月 19, 2016

Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible, therefore reading of this value merely
provides a bool whether the program is valid on the link or not.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1fdd913

net: add ndo to setup/query xdp prog in adapter rx · a7862b45

由 Brenden Blanco 提交于 7月 19, 2016

Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7862b45

bpf: add XDP prog type for early driver filter · 6a773a15

由 Brenden Blanco 提交于 7月 19, 2016

Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a773a15

packet: fix second argument of sock_tx_timestamp() · edbe7746

由 Yoshihiro Shimoda 提交于 7月 19, 2016

This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have msg_control buffer,
the sock_tx_timestamp() in packet_snd() cannot work correctly because
the socks.tsflags is set to 0.
So, this patch sets the socks.tsflags to sk->sk_tsflags as default.

Fixes: c14ac945 ("sock: enable timestamping using control messages")
Reported-by: NKazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Reported-by: NKeita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edbe7746

net/ncsi: NCSI AEN packet handler · 7a82ecf4

由 Gavin Shan 提交于 7月 19, 2016

This introduces NCSI AEN packet handlers that result in (A) the
currently active channel is reconfigured; (B) Currently active
channel is deconfigured and disabled, another channel is chosen
as active one and configured. Case (B) won't happen if hardware
arbitration has been enabled, the channel that was in active
state is suspended simply.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a82ecf4

net/ncsi: Package and channel management · e6f44ed6

由 Gavin Shan 提交于 7月 19, 2016

This manages NCSI packages and channels:

 * The available packages and channels are enumerated in the first
   time of calling ncsi_start_dev(). The channels' capabilities are
   probed in the meanwhile. The NCSI network topology won't change
   until the NCSI device is destroyed.
 * There in a queue in every NCSI device. The element in the queue,
   channel, is waiting for configuration (bringup) or suspending
   (teardown). The channel's state (inactive/active) indicates the
   futher action (configuration or suspending) will be applied on the
   channel. Another channel's state (invisible) means the requested
   action is being applied.
 * The hardware arbitration will be enabled if all available packages
   and channels support it. All available channels try to provide
   service when hardware arbitration is enabled. Otherwise, one channel
   is selected as the active one at once.
 * When channel is in active state, meaning it's providing service, a
   timer started to retrieve the channe's link status. If the channel's
   link status fails to be updated in the determined period, the channel
   is going to be reconfigured. It's the error handling implementation
   as defined in NCSI spec.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6f44ed6

net/ncsi: NCSI response packet handler · 138635cc

由 Gavin Shan 提交于 7月 19, 2016

The NCSI response packets are sent to MC (Management Controller)
from the remote end. They are responses of NCSI command packets
for multiple purposes: completion status of NCSI command packets,
return NCSI channel's capability or configuration etc.

This defines struct to represent NCSI response packets and introduces
function ncsi_rcv_rsp() which will be used to receive NCSI response
packets and parse them.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

138635cc

net/ncsi: NCSI command packet handler · 6389eaa7

由 Gavin Shan 提交于 7月 19, 2016

The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.

This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6389eaa7

net/ncsi: Resource management · 2d283bdd

由 Gavin Shan 提交于 7月 19, 2016

NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.

   * The user (e.g. netdev driver) dereference NCSI device by
     "struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
     The later one is used by NCSI stack internally.
   * Every NCSI device can have multiple packages simultaneously, up
     to 8 packages. It's represented by "struct ncsi_package" and
     identified by 3-bits ID.
   * Every NCSI package can have multiple channels, up to 32. It's
     represented by "struct ncsi_channel" and identified by 5-bits ID.
   * Every NCSI channel has version, statistics, various modes and
     filters. They are represented by "struct ncsi_channel_version",
     "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
     "struct ncsi_channel_filter" separately.
   * Apart from AEN (Asynchronous Event Notification), the NCSI stack
     works in terms of command and response. This introduces "struct
     ncsi_req" to represent a complete NCSI transaction made of NCSI
     request and response.

link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdfSigned-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d283bdd

net: dsa: support switchdev ageing time attr · 34a79f63

由 Vivien Didelot 提交于 7月 18, 2016

Add a new function for DSA drivers to handle the switchdev
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.

The ageing time is passed as milliseconds.

Also because we can have multiple logical bridges on top of a physical
switch and ageing time are switch-wide, call the driver function with
the fastest ageing time in use on the chip instead of the requested one.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34a79f63

net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow... · b8247f09

由 Shmulik Ladkani 提交于 7月 18, 2016

net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs

Given:
 - tap0 and vxlan0 are bridged
 - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).

After encapsulation these skbs have skb_gso_network_seglen that exceed
eth0's ip_skb_dst_mtu.

These skbs are accidentally passed to ip_finish_output2 AS IS.
Alas, each final segment (segmented either by validate_xmit_skb or by
hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain networks.

This behavior is not aligned with the NON-GSO case:
Assume a non-gso 1500-sized IP packet arrives from tap0. After
encapsulation, the vxlan datagram is fragmented normally at the
ip_finish_output-->ip_fragment code path.

The expected behavior for the GSO case would be segmenting the
"gso-oversized" skb first, then fragmenting each segment according to
dst mtu, and finally passing the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "Slowpath" behavior,
according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
forwarding (not set in the bridged case).

In order to support the bridged case, we'll mark skbs arriving from an
ingress interface that get udp-encaspulated as "allowed to be fragmented",
causing their network_seglen to be validated by 'ip_finish_output_gso'
(and fragment if needed).

Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
gso and non-gso cases), which serves users wishing to forbid
fragmentation at the udp tunnel endpoint.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8247f09

net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags · 359ebda2

由 Shmulik Ladkani 提交于 7月 18, 2016

This flag indicates whether fragmentation of segments is allowed.

Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
either ip_forward or ipmr_forward).

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

359ebda2

netfilter: nft_ct: fix unpaired nf_connlabels_get/put call · 590025a2

由 Liping Zhang 提交于 7月 16, 2016

We only get nf_connlabels if the user add ct label set expr successfully,
but we will also put nf_connlabels if the user delete ct lable get expr.
This is mismathced, and will cause ct label expr cannot work properly.

Also, if we init something fail, we should put nf_connlabels back.
Otherwise, we may waste to alloc the memory that will never be used.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Acked-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

590025a2

19 7月, 2016 3 次提交

sctp: load transport header after sk_filter · c74bfbdb

由 Willem de Bruijn 提交于 7月 16, 2016

Do not cache pointers into the skb linear segment across sk_filter.
The function call can trigger pskb_expand_head.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c74bfbdb

net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int · 0564bf0a

由 Konstantin Khlebnikov 提交于 7月 16, 2016

In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink
protocol these values are converted into pshed ticks (64ns for now) and
truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens"
are declared as unsigned 32-bit but they could be negative thus tool 'tc'
prints them as signed. Big values loose higher bits and/or become negative.

This patch clamps tokens in xstat into range from INT_MIN to INT_MAX.
In this way it's easier to understand what's going on here.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0564bf0a

netfilter: x_tables: speed up jump target validation · f4dc7771

由 Florian Westphal 提交于 7月 14, 2016

The dummy ruleset I used to test the original validation change was broken,
most rules were unreachable and were not tested by mark_source_chains().

In some cases rulesets that used to load in a few seconds now require
several minutes.

sample ruleset that shows the behaviour:

echo "*filter"
for i in $(seq 0 100000);do
        printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 100000);do
   printf -- "-A INPUT -j chain_%06x\n" $i
   printf -- "-A INPUT -j chain_%06x\n" $i
   printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT

[ pipe result into iptables-restore ]

This ruleset will be about 74mbyte in size, with ~500k searches
though all 500k[1] rule entries. iptables-restore will take forever
(gave up after 10 minutes)

Instead of always searching the entire blob for a match, fill an
array with the start offsets of every single ipt_entry struct,
then do a binary search to check if the jump target is present or not.

After this change ruleset restore times get again close to what one
gets when reverting 36472341 (~3 seconds on my workstation).

[1] every user-defined rule gets an implicit RETURN, so we get
300k jumps + 100k userchains + 100k returns -> 500k rule entries

Fixes: 36472341 ("netfilter: x_tables: validate targets of jumps")
Reported-by: NJeff Wu <wujiafu@gmail.com>
Tested-by: NJeff Wu <wujiafu@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f4dc7771

18 7月, 2016 2 次提交

Bluetooth: Add debugfs fields for hardware and firmware info · 5177a838

由 Marcel Holtmann 提交于 7月 17, 2016

Some Bluetooth controllers allow for reading hardware and firmware
related vendor specific infos. If they are available, then they can be
exposed via debugfs now.
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>

5177a838

Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU · 23bc6ab0

由 Amadeusz Sławiński 提交于 7月 14, 2016

When we retrieve imtu value from userspace we should use 16 bit pointer
cast instead of 32 as it's defined that way in headers. Fixes setsockopt
calls on big-endian platforms.
Signed-off-by: NAmadeusz Sławiński <amadeusz.slawinski@tieto.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org

23bc6ab0

17 7月, 2016 8 次提交

sctp: fix GSO for IPv6 · c5c4e45c

由 Marcelo Ricardo Leitner 提交于 7月 15, 2016

commit 90017acc ("sctp: Add GSO support") didn't register SCTP GSO
offloading for IPv6 and yet didn't put any restrictions on generating
GSO packets while in IPv6, which causes all IPv6 GSO'ed packets to be
silently dropped.

The fix is to properly register the offload this time.

Fixes: 90017acc ("sctp: Add GSO support")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5c4e45c

sctp: recvmsg should be able to run even if sock is in closing state · e5b13f34

由 Marcelo Ricardo Leitner 提交于 7月 15, 2016

Commit d46e416c missed to update some other places which checked for
the socket being TCP-style AND Established state, as Closing state has
some overlapping with the previous understanding of Established.

Without this fix, one of the effects is that some already queued rx
messages may not be readable anymore depending on how the association
teared down, and sending may also not be possible if peer initiated the
shutdown.

Also merge two if() blocks into one condition on sctp_sendmsg().

Cc: Xin Long <lucien.xin@gmail.com>
Fixes: d46e416c ("sctp: sctp should change socket state when shutdown is received")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5b13f34

net: ipmr/ip6mr: add support for keeping an entry age · 43b9e127

由 Nikolay Aleksandrov 提交于 7月 14, 2016

In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
CC: Satish Ashok <sashok@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43b9e127

vlan: use a valid default mtu value for vlan over macsec · 18d3df3e

由 Paolo Abeni 提交于 7月 14, 2016

macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18d3df3e

net: bridge: remove _deliver functions and consolidate forward code · 37b090e6

由 Nikolay Aleksandrov 提交于 7月 14, 2016

Before this patch we had two flavors of most forwarding functions -
_forward and _deliver, the difference being that the latter are used
when the packets are locally originated. Instead of all this function
pointer passing and code duplication, we can just pass a boolean noting
that the packet was locally originated and use that to perform the
necessary checks in __br_forward. This gives a minor performance
improvement but more importantly consolidates the forwarding paths.
Also add a kernel doc comment to explain the exported br_forward()'s
arguments.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37b090e6

net: bridge: drop skb2/skb0 variables and use a local_rcv boolean · b35c5f63

由 Nikolay Aleksandrov 提交于 7月 14, 2016

Currently if the packet is going to be received locally we set skb0 or
sometimes called skb2 variables to the original skb. This can get
confusing and also we can avoid one conditional on the fast path by
simply using a boolean and passing it around. Thanks to Roopa for the
name suggestion.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b35c5f63

net: bridge: rearrange flood vs unicast receive paths · e151aab9

由 Nikolay Aleksandrov 提交于 7月 14, 2016

This patch removes one conditional from the unicast path by using the fact
that skb is NULL only when the packet is multicast or is local.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e151aab9

net: bridge: minor style adjustments in br_handle_frame_finish · 46c0772d

由 Nikolay Aleksandrov 提交于 7月 14, 2016

Trivial style changes in br_handle_frame_finish.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46c0772d

16 7月, 2016 2 次提交

tcp_timer.c: Add kernel-doc function descriptions · c380d37e

由 Richard Sailer 提交于 7月 16, 2016

This adds kernel-doc style descriptions for 6 functions and
fixes 1 typo.
Signed-off-by: NRichard Sailer <richard@weltraumpflege.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c380d37e

bpf: avoid stack copy and use skb ctx for event output · 555c8a86

由 Daniel Borkmann 提交于 7月 14, 2016

This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

555c8a86

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功