Commits · 0e9796b4af9ef490e203158cb738a5a4986eb75c · openanolis / cloud-kernel

23 May, 2014 18 commits

openvswitch: Reduce locking requirements. · 0e9796b4

Committed by Jarno Rajahalme on May 05, 2014

Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(),
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info().

A datapath pointer is available only when holding a lock.  Change
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a
dp_ifindex directly, rather than a datapath pointer that is then
(only) used to get the dp_ifindex.  This is useful, since the
dp_ifindex is available even when the datapath pointer is not, both
before and after taking a lock, which makes further critical section
reduction possible.

Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead of a
'flow' pointer.  This allows some future patches to do the allocation
before acquiring the flow pointer.

The locking requirements after this patch are:

ovs_flow_cmd_alloc_info(): May be called without locking, must not be
called while holding the RCU read lock (due to memory allocation).
If 'acts' belong to a flow in the flow table, however, then the
caller must hold ovs_mutex.

ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held.

ovs_flow_cmd_build_info(): This calls both of the above, so the caller
must hold ovs_mutex.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

0e9796b4

openvswitch: Fix ovs_flow_stats_get/clear RCU dereference. · 86ec8dba

Committed by Jarno Rajahalme on May 05, 2014

For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
flow dumps call this with RCU read lock.

ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

86ec8dba

openvswitch: Fix typo. · eb072659

Committed by Jarno Rajahalme on May 05, 2014

Incorrect struct name was confusing, even though otherwise
inconsequential.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

eb072659

openvswitch: Minimize dp and vport critical sections. · 6093ae9a

Committed by Jarno Rajahalme on May 05, 2014

Move most memory allocations away from the ovs_mutex critical
sections.  vport allocations still happen while the lock is taken, as
changing that would require major refactoring. Also, vports are
created very rarely so it should not matter.

ovs_dp_cmd_get() now only takes the rcu_read_lock(), rather than
ovs_lock(), as nothing needs to be changed.  This was done by
ovs_vport_cmd_get() already.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

6093ae9a

openvswitch: Make flow mask removal symmetric. · 56c19868

Committed by Jarno Rajahalme on May 05, 2014

Masks are inserted when flows are inserted to the table, so it is
logical to correspondingly remove masks when flows are removed from
the table, in ovs_flow_table_remove().

This allows ovs_flow_free() to be called without locking, which will
be used by later patches.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

56c19868

openvswitch: Build flow cmd netlink reply only if needed. · fb5d1e9e

Committed by Jarno Rajahalme on May 05, 2014

Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a
reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or
OVS_FLOW_CMD_DEL.  Currently, OVS userspace does not request a reply
for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats
may have changed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

fb5d1e9e

openvswitch: Clarify locking. · bb6f9a70

Committed by Jarno Rajahalme on May 05, 2014

Remove unnecessary locking from functions that are always called with
appropriate locking.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

bb6f9a70

openvswitch: Avoid assigning a NULL pointer to flow actions. · be52c9e9

Committed by Jarno Rajahalme on May 05, 2014

Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified.  This seems to have
been broken after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on.  This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

be52c9e9

openvswitch: Compact sw_flow_key. · 1139e241

Committed by Jarno Rajahalme on May 05, 2014

Minimize padding in sw_flow_key and move 'tp' to the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and make the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes).  These changes also make the keys for IPv4
packets fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is always
  64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every
  second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
  it is on the stack (in tunnel input functions), where the compiler
  has full control of the alignment.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

1139e241
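
The padding effect described in the sw_flow_key commit above can be sketched in plain userspace C. The struct and member names below are hypothetical and do not reproduce the real sw_flow_key layout; the exact sizes depend on the ABI, so the sketch only relies on the compact ordering never being larger.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members interleaved with 64-bit members
 * make the compiler insert padding in front of each 64-bit member. */
struct key_padded {
    uint8_t  proto;
    uint64_t tun_id;
    uint16_t tp_src;
    uint64_t mark;
    uint16_t tp_dst;
};

/* Same members, ordered from the largest alignment to the smallest,
 * so the small members share a single trailing word. */
struct key_compact {
    uint64_t tun_id;
    uint64_t mark;
    uint16_t tp_src;
    uint16_t tp_dst;
    uint8_t  proto;
};

/* Bytes saved by the reordering (ABI dependent). */
static size_t key_bytes_saved(void)
{
    return sizeof(struct key_padded) - sizeof(struct key_compact);
}
```

Keeping the 64-bit members first also keeps them naturally aligned as long as the enclosing struct is, which is the same argument the commit makes for tun_id.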

ieee802154: missing put_dev() on error · b3f7a7b4

Committed by Dan Carpenter on May 22, 2014

We should call put_dev() on the error path here.

Fixes: 3e9c156e ('ieee802154: add netlink interfaces for llsec')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3f7a7b4

bridge: make br_device_notifier static · b1282726

Committed by Cong Wang on May 20, 2014

Merge net/bridge/br_notify.c into net/bridge/br.c,
since it has only br_device_event() and br.c is small.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1282726

net/dccp/timer.c: use 'u64' instead of 's64' to avoid compiler's warning · 5c4a43b0

Committed by Chen Gang on May 21, 2014

'dccp_timestamp_seed' is initialized once by ktime_get_real() in
dccp_timestamping_init(). It is always less than ktime_get_real()
in dccp_timestamp().

Then, ktime_us_delta() in dccp_timestamp() will always return a positive
number, so a manual type cast can be used to let the compiler and
do_div() know about it and avoid the warning.

The related warning (with allmodconfig under unicore32):

    CC [M]  net/dccp/timer.o
  net/dccp/timer.c: In function ‘dccp_timestamp’:
  net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c4a43b0
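
A rough userspace sketch of why the cast in the dccp commit above silences the warning. `do_div_sketch()` is a simplified, hypothetical stand-in for the kernel's do_div() that keeps only the pointer-type check and the in-place division; it relies on the GCC/Clang statement-expression and `__typeof__` extensions. The 10-microsecond unit and the function name `timestamp_10us()` are assumptions made for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for do_div(): divides the 64-bit lvalue n in
 * place by base and yields the remainder. The pointer comparison
 * mimics the type check that produces "comparison of distinct pointer
 * types" when n is a signed 64-bit variable instead of an unsigned one. */
#define do_div_sketch(n, base) __extension__ ({            \
    uint32_t rem_;                                         \
    (void)(((__typeof__(n) *)0) == ((uint64_t *)0));       \
    rem_ = (uint32_t)((n) % (base));                       \
    (n) /= (base);                                         \
    rem_;                                                  \
})

/* Elapsed time since seed_us in units of 10 microseconds. The delta is
 * assumed to be non-negative (now_us >= seed_us), so storing it in an
 * unsigned 64-bit variable is safe and satisfies the type check. */
static uint32_t timestamp_10us(int64_t now_us, int64_t seed_us)
{
    uint64_t delta = (uint64_t)(now_us - seed_us);

    (void)do_div_sketch(delta, 10);
    return (uint32_t)delta;
}
```

Declaring `delta` as `int64_t` instead would make the two pointer types in the check differ, which is the situation the warning in the commit message reports.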

mac802154: llsec: correctly lookup implicit-indexed keys · 53819a6c

Committed by Phoebe Buckheister on May 20, 2014

Key id comparison for type 1 keys (implicit source, with index) should
return true if mode and id are equal, not false.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

53819a6c
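
The comparison rule from the llsec commit above, sketched with hypothetical types. The enum values, `struct key_id` and `key_id_equal()` are illustrative names, not the mac802154 definitions, and the implicit mode is simplified (real code would have to compare device addresses there).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key id modes; mode 1 is "implicit source, with index". */
enum key_mode {
    KEY_IMPLICIT    = 0,
    KEY_INDEX       = 1,
    KEY_SHORT_INDEX = 2,
    KEY_HW_INDEX    = 3
};

struct key_id {
    enum key_mode mode;
    uint8_t  id;      /* key index, used by modes 1..3 */
    uint64_t source;  /* explicit key source, used by modes 2..3 */
};

static bool key_id_equal(const struct key_id *a, const struct key_id *b)
{
    if (a->mode != b->mode)
        return false;
    if (a->mode == KEY_IMPLICIT)
        return true;  /* simplification: no address comparison here */
    if (a->id != b->id)
        return false;

    switch (a->mode) {
    case KEY_INDEX:
        return true;  /* mode and id are equal: the keys match */
    case KEY_SHORT_INDEX:
    case KEY_HW_INDEX:
        return a->source == b->source;
    default:
        return false;
    }
}
```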

mac802154: llsec: fold useless return value check · 62e9c117

Committed by Phoebe Buckheister on May 20, 2014

llsec_do_encrypt will never return a positive value, so the restriction
to 0-or-negative on return is useless.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

62e9c117

mac802154: llsec: fix incorrect lock pairing · 6f3eabcd

Committed by Phoebe Buckheister on May 20, 2014

In encrypt, sec->lock is taken with read_lock_bh, so in the error path,
we must read_unlock_bh.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f3eabcd

vlan: more careful checksum features handling · da08143b

Committed by Michal Kubeček on May 20, 2014

When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.

Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

da08143b
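
The intersection logic described in the vlan commit above, as a minimal sketch. The flag bit values and the names `features_t`, `F_*` and `intersect_features()` are hypothetical; only the idea (treat the generic checksum feature as implying the protocol-specific ones before intersecting) is taken from the commit message.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical bit values for this sketch; the kernel defines its own. */
#define F_IP_CSUM   (1ULL << 0)  /* checksums TCP/UDP over IPv4 */
#define F_HW_CSUM   (1ULL << 1)  /* checksums any protocol */
#define F_IPV6_CSUM (1ULL << 2)  /* checksums TCP/UDP over IPv6 */
#define F_SG        (1ULL << 3)  /* unrelated feature, for contrast */

static features_t intersect_features(features_t f1, features_t f2)
{
    /* HW_CSUM covers what IP_CSUM and IPV6_CSUM can do, so expand it
     * on both sides before taking the plain bitwise intersection. */
    if (f1 & F_HW_CSUM)
        f1 |= F_IP_CSUM | F_IPV6_CSUM;
    if (f2 & F_HW_CSUM)
        f2 |= F_IP_CSUM | F_IPV6_CSUM;

    return f1 & f2;
}
```

With one side offering `F_HW_CSUM | F_SG` and the other `F_IP_CSUM | F_SG`, a plain AND keeps only `F_SG`, while the helper keeps `F_IP_CSUM | F_SG`.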

net: Add a software TSO helper API · e876f208

Committed by Ezequiel Garcia on May 19, 2014

Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e876f208

tcp: make cwnd-limited checks measurement-based, and gentler · ca8a2263

Committed by Neal Cardwell on May 22, 2014

Experience with the recent e114a710 ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.

The main problems seemed to be:

(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.

(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).

This commit improves both of these, by:

(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.

(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca8a2263
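
The two rules from the tcp commit above can be sketched as a single predicate. `struct conn_state` and `may_grow_cwnd()` are hypothetical names, and the slow-start test (`snd_cwnd <= snd_ssthresh`) is an assumption about how slow start is detected; only `max_packets_out` and the two growth rules come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of per-connection state for this sketch. */
struct conn_state {
    uint32_t snd_cwnd;          /* congestion window, in packets */
    uint32_t snd_ssthresh;      /* slow start threshold */
    uint32_t max_packets_out;   /* largest flight in the past window */
    bool     was_cwnd_limited;  /* cwnd-limited when that flight ended */
};

static bool may_grow_cwnd(const struct conn_state *c)
{
    /* Slow start: allow growth up to twice the measured flight. */
    if (c->snd_cwnd <= c->snd_ssthresh)
        return c->snd_cwnd < 2 * c->max_packets_out;

    /* Congestion avoidance: grow only if cwnd was fully utilized. */
    return c->was_cwnd_limited;
}
```

Both branches depend only on values measured while sending, not on a prediction derived from the write queue.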

22 May, 2014 3 commits

net: filter: cleanup invocation of internal BPF · 5fe821a9

Committed by Alexei Starovoitov on May 19, 2014

Kernel API for classic BPF socket filters is:

sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter

Clean up the internal BPF kernel API as follows:

sk_filter_select_runtime() - final step of internal BPF creation.
  Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program

Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.

Example of internal BPF create, run, destroy:

  struct sk_filter *fp;

  fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
  memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
  fp->len = prog_len;

  sk_filter_select_runtime(fp);

  SK_RUN_FILTER(fp, ctx);

  sk_filter_free(fp);

Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fe821a9

ipv6: slight optimization in ip6_dst_gc · 14956643

Committed by Li RongQing on May 19, 2014

entries is always greater than rt_max_size here, since if entries is less
than rt_max_size, the fib6_run_gc function will be skipped
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14956643

net: tunnels - enable module autoloading · f98f89a0

Committed by Tom Gundersen on May 15, 2014

Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

f98f89a0

19 May, 2014 19 commits

netfilter: nf_tables: defer all object release via rcu · c7c32e72

Committed by Pablo Neira Ayuso on April 10, 2014

Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c7c32e72

netfilter: nf_tables: remove skb and nlh from context structure · 128ad332

Committed by Pablo Neira Ayuso on May 09, 2014

Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

128ad332

netfilter: nf_tables: simplify nf_tables_*_notify · 35151d84

Committed by Pablo Neira Ayuso on May 05, 2014

Now that all these functions are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

35151d84

netfilter: nf_tables: use new transaction infrastructure to handle elements · 60319eb1

Committed by Pablo Neira Ayuso on April 04, 2014

Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

60319eb1

netfilter: nf_tables: use new transaction infrastructure to handle table · 55dd6f93

Committed by Pablo Neira Ayuso on April 03, 2014

This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

55dd6f93

netfilter: nf_tables: pass context to nf_tables_updtable() · e1aaca93

Committed by Pablo Neira Ayuso on March 30, 2014

So nf_tables_updtable() only takes one single parameter.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e1aaca93

netfilter: nf_tables: disabling table hooks always succeeds · f75edf5e

Committed by Pablo Neira Ayuso on March 30, 2014

nf_tables_table_disable() always succeeds, make this function void.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f75edf5e

netfilter: nf_tables: use new transaction infrastructure to handle chain · 91c7b38d

Committed by Pablo Neira Ayuso on April 09, 2014

This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

91c7b38d

netfilter: nf_tables: refactor chain statistic routines · ff3cd7b3

Committed by Pablo Neira Ayuso on April 09, 2014

Add new routines to encapsulate chain statistics allocation and
replacement.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ff3cd7b3

netfilter: nf_tables: use new transaction infrastructure to handle sets · 958bee14

Committed by Pablo Neira Ayuso on April 03, 2014

This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:

 1) create the set and send netlink message to the kernel
 2) process the response from the kernel that contains the allocated name.
 3) add the set elements and send netlink message to the kernel.
 4) process the response from the kernel (to check for errors).

To:

 1) add the set to the batch.
 2) add the set elements to the batch.
 3) add the rule that points to the set.
 4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

958bee14

netfilter: nf_tables: add message type to transactions · b380e5c7

Committed by Pablo Neira Ayuso on April 04, 2014

The patch adds message type to the transaction to simplify the
commit and the abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b380e5c7

netfilter: nf_tables: relocate commit and abort routines in the source file · 37082f93

Committed by Pablo Neira Ayuso on April 03, 2014

Move the commit and abort routines to the bottom of the source code
file. This change is required by the follow up patches that add the
set, chain and table transaction support.

This patch is just a cleanup to access several functions without
having to declare their prototypes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

37082f93

netfilter: nf_tables: generalise transaction infrastructure · 1081d11b

Committed by Pablo Neira Ayuso on April 04, 2014

This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1081d11b
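
One common way to give a generic transaction object a type-dependent private data area, as described in the commit above, is a trailing flexible array member. The sketch below is a userspace illustration under that assumption; `struct trans`, `trans_alloc()` and the private structs are hypothetical names and not the nf_tables definitions.

```c
#include <stdlib.h>

enum msg_type { MSG_NEWTABLE, MSG_NEWCHAIN, MSG_NEWRULE };

/* Generic transaction header followed by a private data area whose
 * size and meaning depend on the message type. */
struct trans {
    struct trans *next;
    enum msg_type type;
    unsigned char data[];  /* private area, sized at allocation time */
};

/* Hypothetical per-type private data. Only int members are used so the
 * data area's offset within struct trans satisfies their alignment. */
struct trans_rule  { int handle; };
struct trans_chain { int policy; };

#define trans_rule(t)  ((struct trans_rule *)(void *)(t)->data)
#define trans_chain(t) ((struct trans_chain *)(void *)(t)->data)

/* Allocate a zeroed transaction with priv_size bytes of private data. */
static struct trans *trans_alloc(enum msg_type type, size_t priv_size)
{
    struct trans *t = calloc(1, sizeof(*t) + priv_size);

    if (t)
        t->type = type;
    return t;
}
```

A commit or abort routine can then switch on `t->type` and use the matching accessor, which is where a stored message type simplifies the code.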

netfilter: nf_tables: deconstify table and chain in context structure · 7c95f6d8

Committed by Pablo Neira Ayuso on April 04, 2014

The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7c95f6d8

can: add hash based access to single EFF frame filters · 45c70029

Committed by Oliver Hartkopp on April 02, 2014

In contrast to the direct access to the single SFF frame filters (which are
indexed by the SFF CAN ID itself) the single EFF frame filters are arranged
in a single linked hlist. To reduce the hlist traversal in the case of many
filter subscriptions a hash based access is introduced for single EFF filters.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

45c70029
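
A minimal sketch of hashing a 29-bit extended (EFF) CAN identifier into a small bucket index, so that a lookup walks one short list instead of a single long one. The 10-bit table size, the XOR folding and the names `EFF_HASH_BITS` and `eff_hash()` are assumptions chosen for illustration, not necessarily what the commit implements.

```c
#include <stdint.h>

#define EFF_HASH_BITS 10           /* assumed: 1024 buckets */
#define EFF_ID_MASK   0x1FFFFFFFU  /* 29-bit extended CAN identifier */

/* Fold the 29-bit identifier into EFF_HASH_BITS bits by XOR-ing its
 * three 10-bit chunks together. */
static unsigned int eff_hash(uint32_t can_id)
{
    uint32_t id = can_id & EFF_ID_MASK;
    unsigned int hash = id;

    hash ^= id >> EFF_HASH_BITS;
    hash ^= id >> (2 * EFF_HASH_BITS);

    return hash & ((1U << EFF_HASH_BITS) - 1);
}
```

Identifiers that collide (for example 0x1 and 0x400 in this sketch) end up in the same bucket, so each bucket still needs a list that is compared against the full identifier.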

can: proc: make array printing function independent from sff frames · e3d3917f

Committed by Oliver Hartkopp on April 02, 2014

The can_rcvlist_sff_proc_show_one() function which prints the array of filters
for the single SFF CAN identifiers is prepared to be used by a second caller.
Therefore it is also renamed to properly describe its future functionality.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

e3d3917f

net: rds: Use time_after() for time comparison · 71fd762f

Committed by Manuel Schölling on May 18, 2014

To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

71fd762f
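
Why a helper like time_after() is more robust than raw comparison can be shown with a small userspace sketch. `tick_after()` is a hypothetical function written for illustration; it uses the signed-difference idea and assumes the two tick values are less than half the counter range apart and that unsigned-to-signed conversion wraps (as on two's complement targets).

```c
#include <limits.h>
#include <stdbool.h>

/* True if tick a is later than tick b, even if the free-running
 * counter wrapped around between the two samples. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}
```

For example, with `b = ULONG_MAX - 5` taken just before the counter wraps and `a = 5` taken just after, the raw test `a > b` is false although `a` is later, while `tick_after(a, b)` is true.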

ipv4: minor spelling fix · 614d056c

Committed by stephen hemminger on May 16, 2014

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

614d056c

bridge: fix spelling of promiscuous · 025559ee

Committed by stephen hemminger on May 16, 2014

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

025559ee

openanolis / cloud-kernel synced successfully more than 1 year ago