提交 · 0c317a02ca982ca093e71bf07cb562265ba40032 · openeuler / raspberrypi-kernel

13 10月, 2016 1 次提交

cfg80211: support virtual interfaces with different beacon intervals · 0c317a02

由 Purushottam Kushwaha 提交于 10月 12, 2016

This commit provides a mechanism for the host drivers to advertise the
support for different beacon intervals among the respective interface
combinations in a group, through NL80211_IFACE_COMB_BI_MIN_GCD (u32).

This value will be compared against GCD of all beaconing interfaces of
matching combinations.

If the driver doesn't advertise this value, the old behaviour where
all beacon intervals must be identical is retained.

If it is specified, then any beacon interval for an interface in the
interface combination as well as the GCD of all active beacon intervals
in the combination must be greater or equal to this value.
Signed-off-by: NPurushottam Kushwaha <pkushwah@qti.qualcomm.com>
[change commit message, some variable names, small other things]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

0c317a02

12 10月, 2016 2 次提交

autofs4: move linux/auto_dev-ioctl.h to uapi/linux · 9b88ee0f

由 Ian Kent 提交于 10月 11, 2016

Since linux/auto_dev-ioctl.h wasn't included in include/linux/Kbuild
it wasn't moved to uapi/linux as part of the uapi series.

Link: http://lkml.kernel.org/r/20160812024901.12352.10984.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b88ee0f

autofs: move inclusion of linux/limits.h to uapi · f58b3c91

由 Tomohiro Kusumi 提交于 10月 11, 2016

linux/limits.h should be included by uapi instead of linux/auto_fs.h
so as not to cause compile error in userspace.

 # cat << EOF > ./test1.c
 > #include <stdio.h>
 > #include <linux/auto_fs.h>
 > int main(void) {
 >     return 0;
 > }
 > EOF
 # gcc -Wall -g ./test1.c
 In file included from ./test1.c:2:0:
 /usr/include/linux/auto_fs.h:54:12: error: 'NAME_MAX' undeclared here (not in a function)
   char name[NAME_MAX+1];
             ^

Link: http://lkml.kernel.org/r/20160812024856.12352.24092.stgit@pluto.themaw.netSigned-off-by: NTomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: NIan Kent <ikent@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f58b3c91

08 10月, 2016 10 次提交

IB/mthca: Move user vendor structures · 486f6095

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves mthca vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libmthca) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

486f6095

IB/nes: Move user vendor structures · c546b2a3

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves nes vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libmlx4) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c546b2a3

IB/ocrdma: Move user vendor structures · a7fe7380

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves ocrdma vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libmlx4) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.

In addition, it changes types to be __uXX instead of uXX.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Acked-By: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a7fe7380

IB/mlx4: Move user vendor structures · 9ce28a20

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves mlx4 vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libmlx4) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9ce28a20

IB/cxgb4: Move user vendor structures · e44ee2fd

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves cxgb4 vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libcxgb4) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e44ee2fd

IB/cxgb3: Move user vendor structures · a85fb338

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves cxgb3 vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libcxgb3) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a85fb338

IB/mlx5: Move and decouple user vendor structures · 3085e29e

由 Leon Romanovsky 提交于 9月 22, 2016

This patch decouples and moves vendors specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libmlx5) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3085e29e

IB/core: Add more fields to IPv6 flow specification · a72c6a2b

由 Maor Gottlieb 提交于 8月 30, 2016

Add the following fields to IPv6 flow filter specification:
1. Traffic Class
2. Flow Label
3. Next Header
4. Hop Limit
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a72c6a2b

IB/uverbs: Add more fields to IPv4 flow specification · 989a3a8f

由 Maor Gottlieb 提交于 8月 30, 2016

Add the following fields to IPv4 flow filter specification:
1. Type of Service
2. Time to Live
3. Flags
4. Protocol
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

989a3a8f

IB/uverbs: Expose RSS related capabilities · 47adf2f4

由 Yishai Hadas 提交于 8月 28, 2016

Query RSS related attributes and return them to user-space via the
extended query device uverbs command.

It includes both direct ones (i.e. struct ib_uverbs_rss_caps) and
max_wq_type_rq which may be used in both RSS and non RSS flows.
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

47adf2f4

04 10月, 2016 1 次提交

uapi: add missing install of sync_file.h · 58f0f9f7

由 Emilio López 提交于 9月 27, 2016

As part of the sync framework destaging, the sync_file.h header
was moved, but an entry was not added on Kbuild to install it.
This patch resolves this omission so that "make headers_install"
installs this header.

Fixes: 460bfc41 ("dma-buf/sync_file: de-stage sync_file headers")
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: NEmilio López <emilio.lopez@collabora.co.uk>
Signed-off-by: NSean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20160927143142.8975-1-emilio.lopez@collabora.co.uk

58f0f9f7

01 10月, 2016 2 次提交

fuse: Add posix ACL support · 60bcc88a

由 Seth Forshee 提交于 8月 29, 2016

Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with
userspace.  When it is set in the INIT response, ACL support will be
enabled.  ACL support also implies "default_permissions".

When ACL support is enabled, the kernel will cache and have responsibility
for enforcing ACLs.  ACL xattrs will be passed to userspace, which is
responsible for updating the ACLs in the filesystem, keeping the file mode
in sync, and inheritance of default ACLs when new filesystem nodes are
created.
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

60bcc88a

fuse: handle killpriv in userspace fs · 5e940c1d

由 Miklos Szeredi 提交于 10月 01, 2016

Only userspace filesystem can do the killing of suid/sgid without races.
So introduce an INIT flag and negotiate support for this.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

5e940c1d

30 9月, 2016 7 次提交

cfg80211: Provide an API to report NAN function termination · 368e5a7b

由 Ayala Beker 提交于 9月 20, 2016

Provide a function that reports NAN DE function termination. The function
may be terminated due to one of the following reasons: user request,
ttl expiration or failure.
If the NAN instance is tied to the owner, the notification will be
sent to the socket that started the NAN interface only
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

368e5a7b

cfg80211: provide a function to report a match for NAN · 50bcd31d

由 Ayala Beker 提交于 9月 20, 2016

Provide a function the driver can call to report a match.
This will send the event to the user space.
If the NAN instance is tied to the owner, the notifications will be
sent to the socket that started the NAN interface only.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

50bcd31d

cfg80211: allow the user space to change current NAN configuration · a5a9dcf2

由 Ayala Beker 提交于 9月 20, 2016

Some NAN configuration paramaters may change during the operation of
the NAN device. For example, a user may want to update master preference
value when the device gets plugged/unplugged to the power.
Add API that allows to do so.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a5a9dcf2

cfg80211: add add_nan_func / del_nan_func · a442b761

由 Ayala Beker 提交于 9月 20, 2016

A NAN function can be either publish, subscribe or follow
up. Make all the necessary verifications and just pass the
request to the driver.
Allow the user space application that starts NAN to
forbid any other socket to add or remove functions.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NAyala Beker <ayala.beker@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a442b761

cfg80211: add start / stop NAN commands · cb3b7d87

由 Ayala Beker 提交于 9月 20, 2016

This allows user space to start/stop NAN interface.
A NAN interface is like P2P device in a few aspects: it
doesn't have a netdev associated to it.
Add the new interface type and prevent operations that
can't be executed on NAN interface like scan.

Define several attributes that may be configured by user space
when starting NAN functionality (master preference and dual
band operation)
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

cb3b7d87

ipv6 addrconf: implement RFC7559 router solicitation backoff · bd11f074

由 Maciej Żenczykowski 提交于 9月 27, 2016

This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (inline with the RFC).

We also add a new setting:
  /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd11f074

audit: add exclude filter extension to feature bitmap · 7ff89ac6

由 Richard Guy Briggs 提交于 8月 18, 2016

Add to the audit feature bitmap to indicate availability of the
extension of the exclude filter to include PID, UID, AUID, GID, SUBJ_*.

RFE: add additional fields for use in audit filter exclude rules
https://github.com/linux-audit/audit-kernel/issues/5Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <paul@paul-moore.com>

7ff89ac6

28 9月, 2016 1 次提交

posix_acl: uapi header split · bc8bcf3b

由 Andreas Gruenbacher 提交于 9月 27, 2016

Export the base definitions and the xattr representation of POSIX ACLs
to user space.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bc8bcf3b

27 9月, 2016 1 次提交

serial: 8250: Set Altera 16550 TX FIFO Threshold · 8e5470c9

由 Thor Thayer 提交于 9月 22, 2016

The Altera 16550 soft IP UART requires 2 additional registers for
TX FIFO threshold support. These 2 registers enable the TX FIFO
Low Watermark and set the TX FIFO Low Watermark.
Set the TX FIFO threshold to the FIFO size - tx_loadsz.
Signed-off-by: NThor Thayer <tthayer@opensource.altera.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8e5470c9

26 9月, 2016 4 次提交

cfg80211: add checks for beacon rate, extend to mesh · 8564e382

由 Johannes Berg 提交于 9月 19, 2016

The previous commit added support for specifying the beacon rate
for AP mode. Add features checks to this, and extend it to also
support the rate configuration for mesh networks. For IBSS it's
not as simple due to joining etc., so that's not yet supported.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

8564e382

netfilter: nft_log: complete NFTA_LOG_FLAGS attr support · ff107d27

由 Liping Zhang 提交于 9月 25, 2016

NFTA_LOG_FLAGS attribute is already supported, but the related
NF_LOG_XXX flags are not exposed to the userspace. So we cannot
explicitly enable log flags to log uid, tcp sequence, ip options
and so on, i.e. such rule "nft add rule filter output log uid"
is not supported yet.

So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
order to keep consistent with other modules, change NF_LOG_MASK to
refer to all supported log flags. On the other hand, add a new
NF_LOG_DEFAULT_MASK to refer to the original default log flags.

Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
userspace.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ff107d27

netfilter: nf_tables: add range expression · 0f3cd9b3

由 Pablo Neira Ayuso 提交于 9月 23, 2016

Inverse ranges != [a,b] are not currently possible because rules are
composites of && operations, and we need to express this:

	data < a || data > b

This patch adds a new range expression. Positive ranges can be already
through two cmp expressions:

	cmp(sreg, data, >=)
	cmp(sreg, data, <=)

This new range expression provides an alternative way to express this.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0f3cd9b3

ALSA: control: cage TLV_DB_RANGE_HEAD in kernel land because it was obsoleted · 318824d3

由 Takashi Sakamoto 提交于 9月 24, 2016

In commit bf1d1c9b ("ALSA: tlv: add DECLARE_TLV_DB_RANGE()"), the new
macro was added so that "dB range information can be specified without
having to count the items manually for TLV_DB_RANGE_HEAD()". In short,
TLV_DB_RANGE_HEAD macro was obsoleted.

In commit 46e860f7 ("ALSA: rename TLV-related macros so that they're
friendly to user applications"), TLV-related macros are exposed for
applications in user land to get content of data structured by
Type/Length/Value shape. The commit managed to expose TLV-related macros
as many as possible, while obsoleted TLV_DB_RANGE_HEAD() was included to
the list of exposed macros.

This situation brings some confusions to application developers because
they might think all exposed macros have their own purpose and useful for
applications.

For the reason, this commit moves TLV_DB_RANGE_HEAD macro from UAPI header
to a header for kernel land, again. The above commit is done within the
same development period for kernel 4.9, thus not published yet. This
commit might certainly brings no confusions to user land.

Reference: commit bf1d1c9b ("ALSA: tlv: add DECLARE_TLV_DB_RANGE()")
Reference: commit 46e860f7 ("ALSA: rename TLV-related macros so that they're friendly to user applications")
Signed-off-by: NTakashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

318824d3

25 9月, 2016 1 次提交

netfilter: xt_hashlimit: Create revision 2 to support higher pps rates · 11d5f157

由 Vishwanath Pai 提交于 9月 22, 2016

Create a new revision for the hashlimit iptables extension module. Rev 2
will support higher pps of upto 1 million, Version 1 supports only 10k.

To support this we have to increase the size of the variables avg and
burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2
and xt_hashlimit_mtinfo2 and also create newer versions of all the
functions for match, checkentry and destroy.

Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very
similar in both rev1 and rev2 with only minor changes, so I have split
those functions and moved all the common code to a *_common function.
Signed-off-by: NVishwanath Pai <vpai@akamai.com>
Signed-off-by: NJoshua Hunt <johunt@akamai.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

11d5f157

24 9月, 2016 1 次提交

net: Update API for VF vlan protocol 802.1ad support · 79aab093

由 Moshe Shemesh 提交于 9月 22, 2016

Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.

For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.

Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
  Set vf vlan protocol 802.1ad:
    ip link set eth0 vf 1 vlan 100 proto 802.1ad
  Set vf to VST (802.1Q) mode:
    ip link set eth0 vf 1 vlan 100 proto 802.1Q
  Or by omitting the new parameter
    ip link set eth0 vf 1 vlan 100
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79aab093

23 9月, 2016 5 次提交

bpf: add helper to invalidate hash · 7a4b28c6

由 Daniel Borkmann 提交于 9月 23, 2016

Add a small helper that complements 36bbef52 ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after mangling on headers via direct packet write.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a4b28c6

net_sched: sch_fq: account for schedule/timers drifts · fefa569a

由 Eric Dumazet 提交于 9月 22, 2016

It looks like the following patch can make FQ very precise, even in VM
or stressed hosts. It matters at high pacing rates.

We take into account the difference between the time that was programmed
when last packet was sent, and current time (a drift of tens of usecs is
often observed)

Add an EWMA of the unthrottle latency to help diagnostics.

This latency is the difference between current time and oldest packet in
delayed RB-tree. This accounts for the high resolution timer latency,
but can be different under stress, as fq_check_throttled() can be
opportunistically be called from a dequeue() called after an enqueue()
for a different flow.

Tested:
// Start a 10Gbit flow
$ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr &

Before patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17106.04 756876.84   1102.75 1119049.02      0.00      0.00      0.52

After patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17867.00 800245.90   1151.77 1183172.12      0.00      0.00      0.52

A new iproute2 tc can output the 'unthrottle latency' :

$ tc -s qd sh dev eth0 | grep latency
  0 gc, 0 highprio, 32490767 throttled, 2382 ns latency
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fefa569a

netfilter: nft_queue: add _SREG_QNUM attr to select the queue number · 8061bb54

由 Liping Zhang 提交于 9月 14, 2016

Currently, the user can specify the queue numbers by _QUEUE_NUM and
_QUEUE_TOTAL attributes, this is enough in most situations.

But acctually, it is not very flexible, for example:
  tcp dport 80 mapped to queue0
  tcp dport 81 mapped to queue1
  tcp dport 82 mapped to queue2
In order to do this thing, we must add 3 nft rules, and more
mapping meant more rules ...

So take one register to select the queue number, then we can add one
simple rule to mapping queues, maybe like this:
  queue num tcp dport map { 80:0, 81:1, 82:2 ... }

Florian Westphal also proposed wider usage scenarios:
  queue num jhash ip saddr . ip daddr mod ...
  queue num meta cpu ...
  queue num meta mark ...

The last point is how to load a queue number from sreg, although we can
use *(u16*)&regs->data[reg] to load the queue number, just like nat expr
to load its l4port do.

But we will cooperate with hash expr, meta cpu, meta mark expr and so on.
They all store the result to u32 type, so cast it to u16 pointer and
dereference it will generate wrong result in the big endian system.

So just keep it simple, we treat queue number as u32 type, although u16
type is already enough.
Suggested-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8061bb54

nsfs: add ioctl to get a parent namespace · a7306ed8

由 Andrey Vagin 提交于 9月 06, 2016

Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.

In a future we will use this interface to dump and restore nested
namespaces.
Acked-by: NSerge Hallyn <serge@hallyn.com>
Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

a7306ed8

nsfs: add ioctl to get an owning user namespace for ns file descriptor · 6786741d

由 Andrey Vagin 提交于 9月 06, 2016

Each namespace has an owning user namespace and now there is not way
to discover these relationships.

Understending namespaces relationships allows to answer the question:
what capability does process X have to perform operations on a resource
governed by namespace Y?

After a long discussion, Eric W. Biederman proposed to use ioctl-s for
this purpose.

The NS_GET_USERNS ioctl returns a file descriptor to an owning user
namespace.
It returns EPERM if a target namespace is outside of a current user
namespace.

v2: rename parent to relative

v3: Add a missing mntput when returning -EAGAIN --EWB
Acked-by: NSerge Hallyn <serge@hallyn.com>
Link: https://lkml.org/lkml/2016/7/6/158Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

6786741d

22 9月, 2016 3 次提交

netfilter: nft_numgen: add number generation offset · 2b03bf73

由 Laura Garcia Liebana 提交于 9月 13, 2016

Add support of an offset value for incremental counter and random. With
this option the sysadmin is able to start the counter to a certain value
and then apply the generated number.

Example:

	meta mark set numgen inc mod 2 offset 100

This will generate marks with the serie 100, 101, 100, 101, ...
Suggested-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NLaura Garcia Liebana <nevola@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2b03bf73

net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action · 45a497f2

由 Shmulik Ladkani 提交于 9月 19, 2016

TCA_VLAN_ACT_MODIFY allows one to change an existing tag.

It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
priority).
If packet is vlan tagged, then the tag gets overwritten according to
user specified attributes.

For example, this allows user to replace a tag's vid while preserving
its priority bits (as opposed to "action vlan pop pipe action vlan push").
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45a497f2

net: cls_bpf: limit hardware offload by software-only flag · 0d01d45f

由 Jakub Kicinski 提交于 9月 21, 2016

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
Unlike U32 and flower cls_bpf already has some netlink
flags defined.  Create a new attribute to be able to use
the same flag values as the above.

Unlike U32 and flower reject unknown flags.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d01d45f

21 9月, 2016 1 次提交

tcp_bbr: add BBR congestion control · 0f8782ea

由 Neal Cardwell 提交于 9月 19, 2016

This commit implements a new TCP congestion control algorithm: BBR
(Bottleneck Bandwidth and RTT). A detailed description of BBR will be
published in ACM Queue, Vol. 14 No. 5, September-October 2016, as
"BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for
connections on Google's internal backbone networks and google.com and
YouTube Web servers.

BBR requires only changes on the sender side, not in the network or
the receiver side. Thus it can be incrementally deployed on today's
Internet, or in datacenters.

The Internet has predominantly used loss-based congestion control
(largely Reno or CUBIC) since the 1980s, relying on packet loss as the
signal to slow down. While this worked well for many years, loss-based
congestion control is unfortunately out-dated in today's networks. On
today's Internet, loss-based congestion control causes the infamous
bufferbloat problem, often causing seconds of needless queuing delay,
since it fills the bloated buffers in many last-mile links. On today's
high-speed long-haul links using commodity switches with shallow
buffers, loss-based congestion control has abysmal throughput because
it over-reacts to losses caused by transient traffic bursts.

In 1981 Kleinrock and Gale showed that the optimal operating point for
a network maximizes delivered bandwidth while minimizing delay and
loss, not only for single connections but for the network as a
whole. Finding that optimal operating point has been elusive, since
any single network measurement is ambiguous: network measurements are
the result of both bandwidth and propagation delay, and those two
cannot be measured simultaneously.

While it is impossible to disambiguate any single bandwidth or RTT
measurement, a connection's behavior over time tells a clearer
story. BBR uses a measurement strategy designed to resolve this
ambiguity. It combines these measurements with a robust servo loop
using recent control systems advances to implement a distributed
congestion control algorithm that reacts to actual congestion, not
packet loss or transient queue delay, and is designed to converge with
high probability to a point near the optimal operating point.

In a nutshell, BBR creates an explicit model of the network pipe by
sequentially probing the bottleneck bandwidth and RTT. On the arrival
of each ACK, BBR derives the current delivery rate of the last round
trip, and feeds it through a windowed max-filter to estimate the
bottleneck bandwidth. Conversely it uses a windowed min-filter to
estimate the round trip propagation delay. The max-filtered bandwidth
and min-filtered RTT estimates form BBR's model of the network pipe.

Using its model, BBR sets control parameters to govern sending
behavior. The primary control is the pacing rate: BBR applies a gain
multiplier to transmit faster or slower than the observed bottleneck
bandwidth. The conventional congestion window (cwnd) is now the
secondary control; the cwnd is set to a small multiple of the
estimated BDP (bandwidth-delay product) in order to allow full
utilization and bandwidth probing while bounding the potential amount
of queue at the bottleneck.

When a BBR connection starts, it enters STARTUP mode and applies a
high gain to perform an exponential search to quickly probe the
bottleneck bandwidth (doubling its sending rate each round trip, like
slow start). However, instead of continuing until it fills up the
buffer (i.e. a loss), or until delay or ACK spacing reaches some
threshold (like Hystart), it uses its model of the pipe to estimate
when that pipe is full: it estimates the pipe is full when it notices
the estimated bandwidth has stopped growing. At that point it exits
STARTUP and enters DRAIN mode, where it reduces its pacing rate to
drain the queue it estimates it has created.

Then BBR enters steady state. In steady state, PROBE_BW mode cycles
between first pacing faster to probe for more bandwidth, then pacing
slower to drain any queue that created if no more bandwidth was
available, and then cruising at the estimated bandwidth to utilize the
pipe without creating excess queue. Occasionally, on an as-needed
basis, it sends significantly slower to probe for RTT (PROBE_RTT
mode).

BBR has been fully deployed on Google's wide-area backbone networks
and we're experimenting with BBR on Google.com and YouTube on a global
scale.  Replacing CUBIC with BBR has resulted in significant
improvements in network latency and application (RPC, browser, and
video) metrics. For more details please refer to our upcoming ACM
Queue publication.

Example performance results, to illustrate the difference between BBR
and CUBIC:

Resilience to random loss (e.g. from shallow buffers):
  Consider a netperf TCP_STREAM test lasting 30 secs on an emulated
  path with a 10Gbps bottleneck, 100ms RTT, and 1% packet loss
  rate. CUBIC gets 3.27 Mbps, and BBR gets 9150 Mbps (2798x higher).

Low latency with the bloated buffers common in today's last-mile links:
  Consider a netperf TCP_STREAM test lasting 120 secs on an emulated
  path with a 10Mbps bottleneck, 40ms RTT, and 1000-packet bottleneck
  buffer. Both fully utilize the bottleneck bandwidth, but BBR
  achieves this with a median RTT 25x lower (43 ms instead of 1.09
  secs).

Our long-term goal is to improve the congestion control algorithms
used on the Internet. We are hopeful that BBR can help advance the
efforts toward this goal, and motivate the community to do further
research.

Test results, performance evaluations, feedback, and BBR-related
discussions are very welcome in the public e-mail list for BBR:

  https://groups.google.com/forum/#!forum/bbr-dev

NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing
enabled, since pacing is integral to the BBR design and
implementation. BBR without pacing would not function properly, and
may incur unnecessary high packet loss rates.
Signed-off-by: NVan Jacobson <vanj@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f8782ea