1. 08 May 2019, 1 commit
  2. 06 May 2019, 1 commit
  3. 28 Apr 2019, 2 commits
    • netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg authored
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
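
      As a rough sketch (illustrative only; names and bit values are not
      taken verbatim from the patch), the levels above could be expressed
      as flag combinations:

          /* sketch: strictness as individual validation flags */
          enum netlink_validation {
                  NL_VALIDATE_LIBERAL      = 0,
                  NL_VALIDATE_TRAILING     = BIT(0),
                  NL_VALIDATE_MAXTYPE      = BIT(1),
                  NL_VALIDATE_UNSPEC       = BIT(2),
                  NL_VALIDATE_STRICT_ATTRS = BIT(3),
          };

          /* today's *_strict(): TRAILING + MAXTYPE */
          #define NL_VALIDATE_DEPRECATED_STRICT \
                  (NL_VALIDATE_TRAILING | NL_VALIDATE_MAXTYPE)

          /* the default for future things: everything */
          #define NL_VALIDATE_STRICT \
                  (NL_VALIDATE_DEPRECATED_STRICT | \
                   NL_VALIDATE_UNSPEC | NL_VALIDATE_STRICT_ATTRS)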
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, and would then prevent
      forgetting attribute entries going forward. The same can apply to
      the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
       * nlmsg_validate      -> nlmsg_validate_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet, so that compilation breaks if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc., as re-introduced in
      the next patch, while existing things will continue to work as-is.

      In effect then, this adds fully strict validation for any new
      command.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek authored
      Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
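
      The new wrapper then reduces to a one-liner, roughly:

          /* sketch: nla_nest_start() now always sets NLA_F_NESTED */
          static inline struct nlattr *nla_nest_start(struct sk_buff *skb,
                                                      int attrtype)
          {
                  return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
          }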
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 25 Apr 2019, 1 commit
    • net: sched: flower: refactor reoffload for concurrent access · c049d56e
      Vlad Buslov authored
      Recent changes that introduced unlocked flower did not properly
      account for the case where reoffload is initiated concurrently with
      filter updates. To fix the issue, extend flower with a 'hw_filters'
      list that stores filters that don't have the 'skip_hw' flag set. A
      filter is added to the list when it is inserted to hardware and only
      removed from it after being unoffloaded from all drivers that the
      parent block is attached to. This ensures that concurrent reoffload
      can still access a filter that is being deleted, and prevents the
      race where a driver callback could be removed while the filter is no
      longer accessible through the idr but is still present in hardware.
      
      Refactor fl_change() to respect the new filter reference counter and
      to release the filter reference with __fl_put() in case of error,
      instead of deallocating the filter memory directly. This allows
      concurrent access to the filter from fl_reoffload() and protects it
      with reference counting. Refactor fl_reoffload() to iterate over the
      hw_filters list instead of the idr. Implement the helper function
      fl_get_next_hw_filter(), which iterates over the hw_filters list
      with reference counting and skips filters that are being
      concurrently deleted.
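
      A simplified sketch of that iteration (list-cursor handling is
      abbreviated; tp->lock is the classifier spinlock from the
      unlocked-flower work):

          /* sketch: advance over hw_filters under tp->lock, skipping
           * filters whose refcount already dropped to zero because a
           * concurrent delete is in flight */
          spin_lock(&tp->lock);
          list_for_each_entry_continue(f, &head->hw_filters, hw_list) {
                  if (refcount_inc_not_zero(&f->refcnt)) {
                          spin_unlock(&tp->lock);
                          return f; /* caller drops it with __fl_put() */
                  }
          }
          spin_unlock(&tp->lock);
          return NULL;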
      
      Fixes: 92149190 ("net: sched: flower: set unlocked flag for flower proto ops")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 Apr 2019, 2 commits
    • net: sched: flower: fix filter net reference counting · 9994677c
      Vlad Buslov authored
      Fix net reference counting in fl_change() and remove the redundant
      call to tcf_exts_get_net() from __fl_delete(). __fl_put() already
      tries to get the net before releasing exts and deallocating a
      filter, so this code caused the flower classifier to obtain the net
      twice per deleted filter.
      
      The implementation of __fl_delete() called tcf_exts_get_net() to
      pass its result as the 'async' flag to fl_mask_put(). However, the
      'async' flag is redundant and only complicates the fl_mask_put()
      implementation. This functionality seems to have been copied from
      the filter cleanup code, where it was added by Cong with the
      following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to the flower mask, which doesn't call any tc
      action code during cleanup. Simplify fl_mask_put() by removing the
      flag parameter and always using tcf_queue_work() to free mask
      objects.
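
      After the change, fl_mask_put() is roughly the following (a sketch;
      helper names such as fl_mask_free_work are assumed from the
      surrounding code):

          static bool fl_mask_put(struct cls_fl_head *head,
                                  struct fl_flow_mask *mask)
          {
                  if (!refcount_dec_and_test(&mask->refcnt))
                          return false;

                  rhashtable_remove_fast(&head->ht, &mask->ht_node,
                                         mask_ht_params);
                  /* no 'async' flag: always free via the tc workqueue */
                  tcf_queue_work(&mask->rwork, fl_mask_free_work);
                  return true;
          }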
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: use correct ht function to prevent duplicates · 9e35552a
      Vlad Buslov authored
      The implementation of rhashtable_insert_fast() checks whether its
      internal helper __rhashtable_insert_fast() returns a non-NULL
      pointer and seemingly returns -EEXIST in that case. However, since
      __rhashtable_insert_fast() is called with a NULL key pointer, it
      never actually checks for duplicates, which means that -EEXIST is
      never returned to the user. Use the rhashtable_lookup_insert_fast()
      hash table API instead. To verify that it works as expected and to
      prevent the problem from recurring, extend tc-tests with a new test
      that verifies that no new filter with an existing key can be
      inserted into the flower classifier.
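
      In cls_flower terms the fix is a one-call swap, roughly:

          /* before: never reports duplicates, since the internal key
           * pointer is NULL */
          err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
                                       fnew->mask->filter_ht_params);

          /* after: looks the key up first, so a duplicate yields -EEXIST */
          err = rhashtable_lookup_insert_fast(&fnew->mask->ht,
                                              &fnew->ht_node,
                                              fnew->mask->filter_ht_params);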
      
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 08 Apr 2019, 1 commit
    • net: sched: flower: insert filter to ht before offloading it to hw · 1f17f774
      Vlad Buslov authored
      John reports:
      
      Recent refactoring of fl_change aims to use the classifier spinlock
      to avoid the need for the rtnl lock. In doing so, the
      fl_hw_replace_filter() function was moved to before the lock is
      taken. This can create problems for drivers if duplicate filters are
      created (common in ovs tc offload due to filters being triggered by
      user-space matches).
      
      Drivers registered for such filters will now receive multiple copies of
      the same rule, each with a different cookie value. This means that the
      drivers would need to do a full match field lookup to determine
      duplicates, repeating work that will happen in flower __fl_lookup().
      Currently, drivers do not expect to receive duplicate filters.
      
      To fix this, verify that a filter with the same key is not present
      in the flower classifier hash table, and insert the new filter into
      the hash table before offloading it to hardware. Implement the
      helper function fl_ht_insert_unique() to atomically verify and
      insert a filter (see the sketch below).
      
      This change makes the filter visible to the fast path at the
      beginning of the fl_change() function, which means it can no longer
      be freed directly in case of error. Refactor the fl_change() error
      handling code to deallocate the filter after an RCU grace period
      instead.
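
      A sketch of what such a helper can look like (shown with the
      duplicate-detecting rhashtable call from the later fix 9e35552a
      above; the version merged here used rhashtable_insert_fast()):

          static int fl_ht_insert_unique(struct cls_fl_filter *fnew,
                                         struct cls_fl_filter *fold,
                                         bool *in_ht)
          {
                  int err;

                  err = rhashtable_lookup_insert_fast(&fnew->mask->ht,
                                                      &fnew->ht_node,
                                                      fnew->mask->filter_ht_params);
                  if (err) {
                          *in_ht = false;
                          /* a duplicate is fine when overwriting */
                          return fold && err == -EEXIST ? 0 : err;
                  }
                  *in_ht = true;
                  return 0;
          }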
      
      Fixes: 620da486 ("net: sched: flower: refactor fl_change")
      Reported-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Apr 2019, 1 commit
  8. 22 Mar 2019, 12 commits
  9. 07 Mar 2019, 1 commit
    • net: sched: flower: insert new filter to idr after setting its mask · ecb3dea4
      Vlad Buslov authored
      When adding a new filter to the flower classifier, fl_change()
      inserts it into handle_idr before initializing filter extensions
      and assigning it a mask. Normally this ordering doesn't matter
      because all flower classifier ops callbacks assume rtnl lock
      protection. However, when the filter has an action whose kernel
      module is not loaded, the rtnl lock is released before the call to
      request_module(). During this time the filter can be accessed by a
      concurrent task before its initialization is completed, which can
      lead to a crash.
      
      Example case of NULL pointer dereference in concurrent dump:
      
      Task 1                           Task 2

      tc_new_tfilter()
       fl_change()
        idr_alloc_u32(fnew)
        fl_set_parms()
         tcf_exts_validate()
          tcf_action_init()
           tcf_action_init_1()
            rtnl_unlock()
            request_module()
            ...                        rtnl_lock()
                                       tc_dump_tfilter()
                                        tcf_chain_dump()
                                         fl_walk()
                                          idr_get_next_ul()
                                          tcf_node_dump()
                                           tcf_fill_node()
                                            fl_dump()
                                             mask = &f->mask->key; <- NULL ptr
            rtnl_lock()
      
      Extension initialization and mask assignment don't depend on
      fnew->handle, which is allocated by idr_alloc_u32(). Move the idr
      allocation code after action creation and mask assignment in
      fl_change() to prevent concurrent access to a not yet fully
      initialized filter while the rtnl lock is released to load an
      action module.
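
      The reordered sequence, as a comment sketch (function names taken
      from the trace above; error handling omitted):

          /* sketch of the reordered fl_change():
           *
           *   1. fl_set_parms()         - init extensions/actions; may
           *                               drop rtnl to request_module()
           *   2. fl_check_assign_mask() - assign fnew->mask
           *   3. idr_alloc_u32(&head->handle_idr, fnew, &handle,
           *                    handle, GFP_ATOMIC)
           *                             - only now does fnew become
           *                               visible to fl_walk()/fl_dump()
           */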
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 Feb 2019, 1 commit
  11. 14 Feb 2019, 1 commit
  12. 13 Feb 2019, 2 commits
  13. 07 Feb 2019, 5 commits
  14. 05 Feb 2019, 1 commit
  15. 18 Jan 2019, 1 commit
  16. 20 Dec 2018, 1 commit
  17. 15 Dec 2018, 1 commit
  18. 10 Dec 2018, 1 commit
  19. 16 Nov 2018, 1 commit
    • net: sched: cls_flower: Classify packets using port ranges · 5c72299f
      Amritha Nambiar authored
      Added support in tc flower for filtering based on port ranges.
      
      Example:
      1. Match on a port range:
      -------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
        action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        ip_proto tcp
        dst_port range 20-30
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 85 sec used 3 sec
              Action statistics:
              Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      2. Match on IP address and port range:
      --------------------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
        skip_hw action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0 handle 0x2
        eth_type ipv4
        ip_proto tcp
        dst_ip 192.168.1.1
        dst_port range 100-200
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 2 ref 1 bind 1 installed 58 sec used 2 sec
              Action statistics:
              Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
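
      Conceptually, the range lookup replaces the exact port compare with
      a bounds check; a minimal sketch (helper name hypothetical):

          /* hypothetical helper: true if port falls within [min, max] */
          static bool fl_port_in_range(__be16 port, __be16 min, __be16 max)
          {
                  u16 val = ntohs(port);

                  return val >= ntohs(min) && val <= ntohs(max);
          }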
      
      v4:
      1. Added a condition before setting the port key.
      2. Organized setting and dumping of port range keys into functions
         and added validation of the input range.

      v3:
      1. Moved the new fields in the UAPI enum to the end of the enum.
      2. Removed a couple of empty lines.

      v2:
      Addressed Jiri's comments:
      1. Added separate functions for dst and src comparisons.
      2. Removed the endpoint enum.
      3. Added a new bit, TCA_FLOWER_FLAGS_RANGE, to select normal vs.
         range lookup.
      4. Cleaned up the fl_lookup() function.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 11 Nov 2018, 1 commit
  21. 05 Oct 2018, 1 commit
    • net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Cong Wang authored
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.

      2. The chain refcnt is already 0, so we can't even just hold it
         again. We could check refcnt==1, but that is ugly.

      3. The chain with refcnt 0 is still visible in its block, which
         means it could still be found and used!

      4. The block has a refcnt too; we can't hold it without introducing
         a proper API either.

      We can make it work, but the end result is ugly. Instead of wasting
      time on reviewing it, let's just convert the troubling spinlock to
      a mutex, which allows us to use non-atomic allocations too.
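
      The conversion itself is mechanical; the payoff is that allocations
      under the lock no longer need to be atomic. A sketch:

          struct tcf_idrinfo {
                  struct mutex    lock;   /* was: spinlock_t lock */
                  struct idr      action_idr;
          };

          mutex_lock(&idrinfo->lock);
          /* sleeping is now allowed while the lock is held */
          p = kzalloc(sizeof(*p), GFP_KERNEL);
          mutex_unlock(&idrinfo->lock);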
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 21 Sep 2018, 1 commit
    • net_sched: change tcf_del_walker() to take idrinfo->lock · ec3ed293
      Vlad Buslov authored
      The action API was changed to work with actions and action_idr in a
      concurrency-safe manner; however, tcf_del_walker() still uses
      actions without first taking a reference or idrinfo->lock, and
      deletes them directly, disregarding possible concurrent deletes.

      Change tcf_del_walker() to take idrinfo->lock while iterating over
      actions, and use the new tcf_idr_release_unsafe() to release them
      while holding the lock (see the sketch below).
      
      And the blocking function fl_hw_destroy_tmplt() could be called when we
      put a filter chain, so defer it to a work queue.
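
      A sketch of the resulting walk, with the iteration written out via
      idr_get_next_ul() (the in-tree loop may differ in detail):

          unsigned long id = 0;
          struct tc_action *p;

          spin_lock(&idrinfo->lock);
          while ((p = idr_get_next_ul(&idrinfo->action_idr, &id))) {
                  /* lock already held: use the _unsafe release variant */
                  tcf_idr_release_unsafe(p);
                  id++;
          }
          spin_unlock(&idrinfo->lock);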
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      [xiyou.wangcong@gmail.com: heavily modify the code and changelog]
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>