Commits · 4b5026ade59e07c8913ba8c4d19db2438fdeed1a · openanolis / cloud-kernel

Feb 18, 2017 · 1 commit

net/sched: Reflect HW offload status · e696028a

Committed by Or Gerlitz on Feb 16, 2017

Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If neither of these two flags is set, this signals running
over an older kernel.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
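
How user space might interpret the two new flags can be sketched as follows. This is a minimal illustration, not the kernel or iproute2 code: the bit values are assumed to match include/uapi/linux/pkt_cls.h and should be checked against your headers, and `offload_status` is a hypothetical helper name.

```python
# Assumed bit values (check include/uapi/linux/pkt_cls.h on your system).
TCA_CLS_FLAGS_SKIP_HW = 1 << 0
TCA_CLS_FLAGS_SKIP_SW = 1 << 1
TCA_CLS_FLAGS_IN_HW = 1 << 2
TCA_CLS_FLAGS_NOT_IN_HW = 1 << 3


def offload_status(flags: int) -> str:
    """Classify a dumped filter by the offload flags the kernel reported."""
    if flags & TCA_CLS_FLAGS_IN_HW:
        return "in hw"
    if flags & TCA_CLS_FLAGS_NOT_IN_HW:
        return "not in hw"
    # Neither flag set: the kernel predates this commit.
    return "unknown (older kernel)"
```

With the "both" policy the user sets neither skip flag, so only the reported "in hw" / "not in hw" bit tells the two cases apart.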


Feb 16, 2017 · 3 commits

rhashtable: Revert nested table changes. · bf3f14d6

Committed by David S. Miller on Feb 15, 2017

This reverts commits:

6a254780
9dbbfb0a
40137906

It's too risky to put in this late in the release
cycle.  We'll put these changes into the next merge
window instead.
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add infrastructure for PTP support · c78c70fa

Committed by Sudarsana Reddy Kalluru on Feb 15, 2017

The patch adds the required qed interfaces for configuring/reading
the PTP clock on the adapter.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: have stub for tcf_destroy_chain in case NET_CLS is not configured · 8ae70032

Committed by Jiri Pirko on Feb 15, 2017

This fixes broken build for !NET_CLS:

net/built-in.o: In function `fq_codel_destroy':
/home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to `tcf_destroy_chain'

Fixes: cf1facda ("sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Feb 15, 2017 · 7 commits

esp: Add a software GRO codepath · 7785bba2

Committed by Steffen Klassert on Feb 15, 2017

This patch adds GRO infrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected into layer 2.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: Extend the sec_path for IPsec offloading · 54ef207a

Committed by Steffen Klassert on Feb 15, 2017

We need to keep per-packet offloading information across
the layers. So we extend the sec_path to carry it for
the input and output offload codepath.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: Export xfrm_parse_spi. · 1e295370

Committed by Steffen Klassert on Feb 15, 2017

We need it in the ESP offload handlers, so export it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

net: Prepare gro for packet consuming gro callbacks · 25393d3f

Committed by Steffen Klassert on Feb 15, 2017

The upcoming IPsec ESP gro callbacks will consume the skb,
so prepare for that.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

net: Add a skb_gro_flush_final helper. · 5f114163

Committed by Steffen Klassert on Feb 15, 2017

Add a skb_gro_flush_final helper to prepare for consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcoming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronously. The handler is used in all gro_receive functions
that can call the ESP gro handlers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: Add a secpath_set helper. · b0fcee82

Committed by Steffen Klassert on Feb 15, 2017

Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

uapi: fix linux/if_pppol2tp.h userspace compilation errors · a725eb15

Committed by Dmitry V. Levin on Feb 15, 2017

Because of <linux/libc-compat.h> interface limitations, <netinet/in.h>
provided by libc cannot be included after <linux/in.h>, therefore any
header that includes <netinet/in.h> cannot be included after <linux/in.h>.

Change uapi/linux/l2tp.h, the last uapi header that includes
<netinet/in.h>, to include <linux/in.h> and <linux/in6.h> instead of
<netinet/in.h> and use __SOCK_SIZE__ instead of sizeof(struct sockaddr)
the same way as uapi/linux/in.h does, to fix linux/if_pppol2tp.h userspace
compilation errors like this:

In file included from /usr/include/linux/l2tp.h:12:0,
                 from /usr/include/linux/if_pppol2tp.h:21,
/usr/include/netinet/in.h:31:8: error: redefinition of 'struct in_addr'

Fixes: 47c3e778 ("net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Feb 14, 2017 · 4 commits

net: make net_device members garp_port and mrp_port conditional · fb585b44

Committed by Tobias Klauser on Feb 10, 2017

garp_port is only used in net/802/garp.c which is only compiled with
CONFIG_GARP enabled. Same goes for mrp_port which is only used in
net/802/mrp.c with CONFIG_MRP enabled.

Only include the two members in struct net_device if their respective
CONFIG_* is enabled. This saves a few bytes in struct net_device in case
CONFIG_GARP or CONFIG_MRP are not enabled.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: busy-poll: remove LL_FLUSH_FAILED and LL_FLUSH_BUSY · 37fabbf4

Committed by Eric Dumazet on Feb 10, 2017

Commit 79e7fff4 ("net: remove support for per driver
ndo_busy_poll()") made them obsolete.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add nested tables · 40137906

Committed by Herbert Xu on Feb 11, 2017

This patch adds code that handles GFP_ATOMIC kmalloc failure on
insertion.  As we cannot use vmalloc, we solve it by making our
hash table nested.  That is, we allocate single pages at each level
and reach our desired table size by nesting them.

When a nested table is created, only a single page is allocated
at the top-level.  Lower levels are allocated on demand during
insertion.  Therefore for each insertion to succeed, only two
(non-consecutive) pages are needed.

After a nested table is created, a rehash will be scheduled in
order to switch to a vmalloced table as soon as possible.  Also,
the rehash code will never rehash into a nested table.  If we
detect a nested table during a rehash, the rehash will be aborted
and a new rehash will be scheduled.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
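
The idea of building a large bucket array out of single pages that are allocated on demand can be sketched as below. This is a toy two-level model written for illustration only; `PAGE_SLOTS`, `NestedTable` and its methods are hypothetical names, and the kernel code is structured differently.

```python
PAGE_SLOTS = 512  # hypothetical number of bucket slots that fit in one page


class NestedTable:
    def __init__(self, size):
        self.size = size
        # Only the top-level page exists up front; leaf pages start unallocated.
        self.top = [None] * -(-size // PAGE_SLOTS)

    def bucket(self, index, allocate=False):
        """Return the bucket for index, allocating its leaf page if asked to."""
        leaf_no, slot = divmod(index, PAGE_SLOTS)
        if self.top[leaf_no] is None:
            if not allocate:
                return None
            self.top[leaf_no] = [[] for _ in range(PAGE_SLOTS)]
        return self.top[leaf_no][slot]

    def insert(self, key, value):
        # An insertion needs at most one new leaf page besides the top page.
        self.bucket(hash(key) % self.size, allocate=True).append((key, value))
```

Lookups into a leaf that was never allocated simply find nothing, which mirrors the "lower levels are allocated on demand during insertion" behavior described above.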


[media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB · 35879ee4

Committed by Hans Verkuil on Feb 10, 2017

This reverts commit 7e0739cd ("[media] videodev2.h: fix
sYCC/AdobeYCC default quantization range").

The problem is that many drivers can convert R'G'B' content (often
from sensors) to Y'CbCr, but they all produce limited range Y'CbCr.

To stay backwards compatible the default quantization range for
sRGB and AdobeRGB Y'CbCr encoding should be limited range, not full
range, even though the corresponding standards specify full range.

Update the V4L2_MAP_QUANTIZATION_DEFAULT define accordingly and
also update the documentation.

Fixes: 7e0739cd ("[media] videodev2.h: fix sYCC/AdobeYCC default quantization range")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org> # for v4.9 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

Feb 13, 2017 · 1 commit

bpf: introduce BPF_F_ALLOW_OVERRIDE flag · 7f677633

Committed by Alexei Starovoitov on Feb 10, 2017

If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 30070984 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
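
The three examples in this message can be replayed against a small model. This is a toy sketch for a single attach type, written only to reproduce the examples; `CgroupBpfModel` is a hypothetical name, the EPERM return code is an assumption (the message only says the attach "fails"), and the kernel's bookkeeping is different.

```python
import errno
import posixpath


class CgroupBpfModel:
    def __init__(self):
        self.attached = {}  # cgroup path -> (program name, allow_override)

    def _ancestors(self, path):
        while path != "/":
            path = posixpath.dirname(path)
            yield path

    def attach(self, path, prog, allow_override=False):
        for anc in self._ancestors(path):
            if anc in self.attached:
                # The nearest attached ancestor decides: both it and the new
                # program must use allow_override (EPERM is an assumption).
                if not (self.attached[anc][1] and allow_override):
                    return -errno.EPERM
                break
        if path in self.attached and self.attached[path][1] != allow_override:
            return -errno.EPERM  # the user has to detach first to switch mode
        self.attached[path] = (prog, allow_override)
        return 0

    def detach(self, path):
        if path not in self.attached:
            return -errno.ENOENT  # nothing attached here
        del self.attached[path]
        return 0

    def effective(self, path):
        """Program that runs for path: its own, else the nearest ancestor's."""
        for p in (path, *self._ancestors(path)):
            if p in self.attached:
                return self.attached[p][0]
        return None
```

For example 2, attaching X to /A and M to /A/B, both with `allow_override=True`, leaves `effective("/A/B")` at M, while a default-mode attach of Y to /A/B is refused.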


Feb 12, 2017 · 9 commits

netfilter: nf_tables: add NFTA_RULE_ID attribute · 1a94e38d

Committed by Pablo Neira Ayuso on Feb 10, 2017

This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: allow to check for generation ID · 8c4d4e8b

Committed by Pablo Neira Ayuso on Feb 10, 2017

This patch allows userspace to specify the generation ID that has been
used to build an incremental batch update.

If userspace specifies the generation ID in the batch message as
attribute, then nfnetlink compares it to the current generation ID so
you make sure that you work against the right baseline. Otherwise, bail
out with ERESTART so userspace knows that its changeset is stale and
needs to respin. Userspace can do this transparently at the cost of
taking slightly more time to refresh caches and rework the changeset.

This check is optional, if there is no NFNL_BATCH_GENID attribute in the
batch begin message, then no check is performed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
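
The check-and-respin flow described above can be sketched with a toy model. Nothing here talks to netlink: `Ruleset`, `commit` and `commit_with_retry` are hypothetical names standing in for the kernel-side generation counter and a user-space retry loop, and the fallback value 85 for ERESTART is the Linux errno number.

```python
import errno

ERESTART = getattr(errno, "ERESTART", 85)  # 85 on Linux; fallback elsewhere


class Ruleset:
    """Toy stand-in for the kernel-side ruleset and its generation ID."""

    def __init__(self):
        self.genid = 1
        self.rules = []

    def commit(self, batch, genid=None):
        # The check is optional: without a generation ID, nothing is compared.
        if genid is not None and genid != self.genid:
            return -ERESTART
        self.rules.extend(batch)
        self.genid += 1
        return 0


def commit_with_retry(ruleset, build_batch, max_tries=3):
    """Rebuild the changeset against a fresh baseline until it applies."""
    for _ in range(max_tries):
        baseline = ruleset.genid  # refresh the cached generation ID
        batch = build_batch(ruleset.rules)
        if ruleset.commit(batch, genid=baseline) == 0:
            return True
    return False
```

The retry costs one extra cache refresh and rebuild, which is the "slightly more time" trade-off the message mentions.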


net: rename dst_neigh_output back to neigh_output · c16ec185

Committed by Julian Anastasov on Feb 11, 2017

After the dst->pending_confirm flag was removed, we no longer
need to provide the dst arg to dst_neigh_output.
So, rename it to neigh_output as before commit 5110effe
("net: Do delayed neigh confirmation.").
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: tap as an independent module · 9a393b5d

Committed by Sainath Grandhi on Feb 10, 2017

This patch makes tap a separate module so that other types of virtual interfaces, for example
ipvlan, can use it.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: Extending tap device create/destroy APIs · d9f1f61c

Committed by Sainath Grandhi on Feb 10, 2017

Extend the tap APIs get/free_minor and create/destroy_cdev to handle more than one
type of virtual interface.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: Abstract type of virtual interface from tap implementation · 6fe3faf8

Committed by Sainath Grandhi on Feb 10, 2017

macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. A few of the tap functions are modified to
accept tap_dev as an argument. The tap_dev object includes callbacks to be used by
the underlying virtual interface to take care of tx and rx accounting.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: Tap character device creation/destroy API · ebc05ba7

Committed by Sainath Grandhi on Feb 10, 2017

This patch provides tap device create/destroy APIs in tap.c.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: Renaming tap related APIs, data structures, macros · 635b8c8e

Committed by Sainath Grandhi on Feb 10, 2017

Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.*
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: Refactoring macvtap.c · a8e04698

Committed by Sainath Grandhi on Feb 10, 2017

macvtap module has code for tap/queue management and link management. This patch splits
the code into macvtap_main.c for link management and tap.c for tap/queue management.
Functionality in tap.c can be re-used for implementing tap on other virtual interfaces.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Feb 11, 2017 · 12 commits

bitfield.h: add FIELD_FIT() helper · 1697599e

Committed by Jakub Kicinski on Feb 09, 2017

Add a helper for checking at runtime that a value will fit inside
a specified field/mask.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
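
What "fits inside a specified field/mask" means can be shown with a small sketch. `field_fit` is a hypothetical Python rendering of the idea, not the C macro: it assumes a non-zero mask with contiguous bits and a non-negative value, and Python integers do not truncate the way a fixed-width C type would.

```python
def field_fit(mask: int, value: int) -> bool:
    """Return True if value fits in the contiguous bit field selected by mask."""
    if mask <= 0:
        raise ValueError("mask must be a positive bit mask")
    shift = (mask & -mask).bit_length() - 1  # position of the lowest set bit
    # Shift the value into place; any bit landing outside the mask means overflow.
    return ((value << shift) & ~mask) == 0
```

For a 4-bit field at bits 4..7 (mask 0x00F0), the value 15 fits and 16 does not.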


devlink: fix the name of eswitch commands · adf200f3

Committed by Jiri Pirko on Feb 09, 2017

The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b591 ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now and keep the original
enum value in place, but obsolete.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: remove unnecessary phy*.h includes · 4d56a29f

Committed by Russell King on Feb 07, 2017

Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
unnecessary dependency for quite a large amount of the kernel.  There's
very little which actually requires definitions from phy.h in net/dsa.h
- the include itself only wants the declaration of a couple of
structures and IFNAMSIZ.

Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
and phy_fixed.h from net/dsa.h.

This patch reduces the rebuild from around 800 files to around 40 - even
with ccache, the time difference is noticeable.
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/act_pedit: Introduce 'add' operation · 853a14ba

Committed by Amir Vadai on Feb 07, 2017

This command could be useful to inc/dec fields.

For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower ip_proto tcp \
    action pedit munge ip ttl add 0xff pipe \
    action mirred egress redirect dev veth0

In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
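
The arithmetic behind "adding 0xff decreases the field by one" is plain modular addition on an 8-bit field. The helper below is a hypothetical illustration of that arithmetic only; it does not model how act_pedit stores its keys or masks.

```python
def masked_add(value: int, addend: int, width_mask: int = 0xFF) -> int:
    """Add addend to a header field and keep only the bits of the field."""
    return (value + addend) & width_mask


ttl = 64
ttl = masked_add(ttl, 0xFF)  # 64 + 255 = 319, and 319 & 0xFF = 63
```

Note that the same arithmetic wraps a TTL of 0 around to 255, so the rule does not by itself guard against underflow.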


net/act_pedit: Support using offset relative to the conventional network headers · 71d0ed70

Committed by Amir Vadai on Feb 07, 2017

Extend pedit to enable the user setting offset relative to network
headers. This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough.

After this patch, the action has information about the exact header type
and field inside this header. This information could be used later on
for hardware offloading of pedit.

Backward compatibility was being kept:
1. Old kernel <-> new userspace
2. New kernel <-> old userspace
3. add rule using new userspace <-> dump using old userspace
4. add rule using old userspace <-> dump using new userspace

When using the extended api, new netlink attributes are being used. This
way, operation will fail in (1) and (3) - and no malformed rule be added
or dumped. Of course, new user space that doesn't need the new
functionality can use the old netlink attributes and operation will
succeed.
Since action can support both api's, (2) should work, and it is easy to
write the new user space to have (4) work.

The action is having a strict check that only header types and commands
it can handle are accepted. This way future additions will be much
easier.

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
  flower \
    ip_proto tcp \
    dst_port 80 \
  action pedit munge tcp dport set 8080 pipe \
  action mirred egress redirect dev veth0

This forwards TCP packets whose original destination port is 80, while
rewriting the destination port to 8080.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/skbuff: Introduce skb_mac_offset() · ea6da4fd

Committed by Amir Vadai on Feb 07, 2017

Introduce skb_mac_offset() that could be used to get mac header offset.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: bridge: Offload mc router ports · 6d549648

Committed by Nogah Frankel on Feb 09, 2017

Offload the mc router ports list whenever it changes.
This is done because in some cases mc packets need to be flooded to all
the ports in this list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: bridge: Offload multicast disabled · 147c1e9b

Committed by Nogah Frankel on Feb 09, 2017

Offload multicast disabled flag, for more accurate mc flood behavior:
When it is on, the mdb should be ignored.
When it is off, unregistered mc packets should be flooded to mc router
ports.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api · cf1facda

Committed by Jiri Pirko on Feb 09, 2017

Creation is done in this file, move destruction to be at the same place.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: rename tcf_destroy to tcf_destroy_proto · 79112c26

Committed by Jiri Pirko on Feb 09, 2017

This function destroys TC filter protocol, not TC filter. So name it
accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

afs: Move UUID struct to linux/uuid.h · ff548773

Committed by David Howells on Feb 10, 2017

Move the afs_uuid struct to linux/uuid.h, rename it to uuid_v1 and change
the u16/u32 fields to __be16/__be32 instead so that the structure can be
cast to a 16-octet network-order buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
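
Why big-endian fields make the structure usable as a 16-octet network-order buffer can be illustrated from user space. The sketch below uses Python's `uuid` and `struct` modules; the six-field split (32-bit, two 16-bit, two 8-bit, 48-bit node) follows the usual version 1 UUID layout and is an assumption about the struct, and `pack_uuid_v1` is a hypothetical helper.

```python
import struct
import uuid


def pack_uuid_v1(u: uuid.UUID) -> bytes:
    """Serialize the UUID fields in network (big-endian) byte order."""
    time_low, time_mid, time_hi_version, clock_hi, clock_low, node = u.fields
    return struct.pack(
        ">IHHBB6s",  # be32, be16, be16, u8, u8, 6 node octets
        time_low, time_mid, time_hi_version,
        clock_hi, clock_low, node.to_bytes(6, "big"),
    )


u = uuid.UUID("12345678-9abc-1def-8123-456789abcdef")
wire = pack_uuid_v1(u)
```

Because every multi-byte field is already big-endian, the packed fields are byte-for-byte the wire representation, so no per-field swapping is needed when the buffer is sent.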


ipv4: fib: Add events for FIB replace and append · 2f3a5272

Committed by Ido Schimmel on Feb 09, 2017

The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Feb 10, 2017 · 3 commits

openvswitch: Add force commit. · dd41d33f

Committed by Jarno Rajahalme on Feb 09, 2017

Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.

Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched.  If this "directionality" is
wrong w.r.t. the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.

This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry.  If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890

Committed by Jarno Rajahalme on Feb 09, 2017

Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.

The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.

The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.

When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.

It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.

The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Unionize ovs_key_ct_label with a u32 array. · cb80d58f

Committed by Jarno Rajahalme on Feb 09, 2017

Make the array of labels in struct ovs_key_ct_label a union, adding a
u32 array of the same byte size as the existing u8 array.  It is
faster to loop through the labels 32 bits at a time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
