提交 · 541b6fe63023f3059cf85d47ff2767a3e42a8e44 · openanolis / cloud-kernel

31 10月, 2016 1 次提交

usb: add helper to extract bits 12:11 of wMaxPacketSize · 541b6fe6

由 Felipe Balbi 提交于 9月 26, 2016

According to USB Specification 2.0 table 9-4,
wMaxPacketSize is a bitfield. Endpoint's maxpacket
is laid out in bits 10:0. For high-speed,
high-bandwidth isochronous endpoints, bits 12:11
contain a multiplier to tell us how many
transactions we want to try per uframe.

This means that if we want an isochronous endpoint
to issue 3 transfers of 1024 bytes per uframe,
wMaxPacketSize should contain the value:

	1024 | (2 << 11)

or 5120 (0x1400). In order to make Host and
Peripheral controller drivers' life easier, we're
adding a helper which returns bits 12:11. Note that
no care is made WRT to checking endpoint type and
gadget's speed. That's left for drivers to handle.
Signed-off-by: NFelipe Balbi <felipe.balbi@linux.intel.com>

541b6fe6

15 10月, 2016 1 次提交

qedr: Add RoCE driver framework · 2e0cbc4d

由 Ram Amrani 提交于 10月 10, 2016

Adds a skeletal implementation of the qed* RoCE driver -
basically the ability to communicate with the qede driver and
receive notifications from it regarding various init/exit events.
Signed-off-by: NRajesh Borundia <rajesh.borundia@cavium.com>
Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2e0cbc4d

12 10月, 2016 2 次提交

autofs4: move linux/auto_dev-ioctl.h to uapi/linux · 9b88ee0f

由 Ian Kent 提交于 10月 11, 2016

Since linux/auto_dev-ioctl.h wasn't included in include/linux/Kbuild
it wasn't moved to uapi/linux as part of the uapi series.

Link: http://lkml.kernel.org/r/20160812024901.12352.10984.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b88ee0f

autofs: move inclusion of linux/limits.h to uapi · f58b3c91

由 Tomohiro Kusumi 提交于 10月 11, 2016

linux/limits.h should be included by uapi instead of linux/auto_fs.h
so as not to cause compile error in userspace.

 # cat << EOF > ./test1.c
 > #include <stdio.h>
 > #include <linux/auto_fs.h>
 > int main(void) {
 >     return 0;
 > }
 > EOF
 # gcc -Wall -g ./test1.c
 In file included from ./test1.c:2:0:
 /usr/include/linux/auto_fs.h:54:12: error: 'NAME_MAX' undeclared here (not in a function)
   char name[NAME_MAX+1];
             ^

Link: http://lkml.kernel.org/r/20160812024856.12352.24092.stgit@pluto.themaw.netSigned-off-by: NTomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: NIan Kent <ikent@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f58b3c91

04 10月, 2016 4 次提交

uapi: add missing install of sync_file.h · 58f0f9f7

由 Emilio López 提交于 9月 27, 2016

As part of the sync framework destaging, the sync_file.h header
was moved, but an entry was not added on Kbuild to install it.
This patch resolves this omission so that "make headers_install"
installs this header.

Fixes: 460bfc41 ("dma-buf/sync_file: de-stage sync_file headers")
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: NEmilio López <emilio.lopez@collabora.co.uk>
Signed-off-by: NSean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20160927143142.8975-1-emilio.lopez@collabora.co.uk

58f0f9f7

Btrfs: catch invalid free space trees · 6675df31

由 Omar Sandoval 提交于 9月 22, 2016

There are two separate issues that can lead to corrupted free space
trees.

1. The free space tree bitmaps had an endianness issue on big-endian
   systems which is fixed by an earlier patch in this series.
2. btrfs-progs before v4.7.3 modified filesystems without updating the
   free space tree.

To catch both of these issues at once, we need to force the free space
tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit.
If the bit isn't set, we know that it was either produced by a broken
big-endian kernel or may have been corrupted by btrfs-progs.

This also provides us with a way to add rudimentary read-write support
for the free space tree to btrfs-progs: it can just clear this bit and
have the kernel rebuild the free space tree.

Cc: stable@vger.kernel.org # 4.5+
Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6675df31

vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks · 71be6b49

由 Darrick J. Wong 提交于 10月 03, 2016

Add a new fallocate mode flag that explicitly unshares blocks on
filesystems that support such features.  The new flag can only
be used with an allocate-mode fallocate call.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

71be6b49

vfs: support FS_XFLAG_COWEXTSIZE and get/set of CoW extent size hint · 0a6eab8b

由 Darrick J. Wong 提交于 10月 03, 2016

Introduce XFLAGs for the new XFS CoW extent size hint, and actually
plumb the CoW extent size hint into the fsxattr structure.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

0a6eab8b

01 10月, 2016 2 次提交

fuse: Add posix ACL support · 60bcc88a

由 Seth Forshee 提交于 8月 29, 2016

Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with
userspace.  When it is set in the INIT response, ACL support will be
enabled.  ACL support also implies "default_permissions".

When ACL support is enabled, the kernel will cache and have responsibility
for enforcing ACLs.  ACL xattrs will be passed to userspace, which is
responsible for updating the ACLs in the filesystem, keeping the file mode
in sync, and inheritance of default ACLs when new filesystem nodes are
created.
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

60bcc88a

fuse: handle killpriv in userspace fs · 5e940c1d

由 Miklos Szeredi 提交于 10月 01, 2016

Only userspace filesystem can do the killing of suid/sgid without races.
So introduce an INIT flag and negotiate support for this.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

5e940c1d

30 9月, 2016 8 次提交

cfg80211: Provide an API to report NAN function termination · 368e5a7b

由 Ayala Beker 提交于 9月 20, 2016

Provide a function that reports NAN DE function termination. The function
may be terminated due to one of the following reasons: user request,
ttl expiration or failure.
If the NAN instance is tied to the owner, the notification will be
sent to the socket that started the NAN interface only
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

368e5a7b

cfg80211: provide a function to report a match for NAN · 50bcd31d

由 Ayala Beker 提交于 9月 20, 2016

Provide a function the driver can call to report a match.
This will send the event to the user space.
If the NAN instance is tied to the owner, the notifications will be
sent to the socket that started the NAN interface only.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

50bcd31d

cfg80211: allow the user space to change current NAN configuration · a5a9dcf2

由 Ayala Beker 提交于 9月 20, 2016

Some NAN configuration paramaters may change during the operation of
the NAN device. For example, a user may want to update master preference
value when the device gets plugged/unplugged to the power.
Add API that allows to do so.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a5a9dcf2

cfg80211: add add_nan_func / del_nan_func · a442b761

由 Ayala Beker 提交于 9月 20, 2016

A NAN function can be either publish, subscribe or follow
up. Make all the necessary verifications and just pass the
request to the driver.
Allow the user space application that starts NAN to
forbid any other socket to add or remove functions.
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NAyala Beker <ayala.beker@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a442b761

cfg80211: add start / stop NAN commands · cb3b7d87

由 Ayala Beker 提交于 9月 20, 2016

This allows user space to start/stop NAN interface.
A NAN interface is like P2P device in a few aspects: it
doesn't have a netdev associated to it.
Add the new interface type and prevent operations that
can't be executed on NAN interface like scan.

Define several attributes that may be configured by user space
when starting NAN functionality (master preference and dual
band operation)
Signed-off-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

cb3b7d87

ipv6 addrconf: implement RFC7559 router solicitation backoff · bd11f074

由 Maciej Żenczykowski 提交于 9月 27, 2016

This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations
to a negative value meaning an unlimited number of retransmits,
and we make this the new default (inline with the RFC).

We also add a new setting:
  /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval
defaulting to 1 hour (per RFC recommendation).
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd11f074

ipmi: add an Aspeed BT IPMI BMC driver · 54f9c4d0

由 Alistair Popple 提交于 9月 20, 2016

This patch adds a simple device driver to expose the iBT interface on
Aspeed SOCs (AST2400 and AST2500) as a character device. Such SOCs are
commonly used as BMCs (BaseBoard Management Controllers) and this
driver implements the BMC side of the BT interface.

The BT (Block Transfer) interface is used to perform in-band IPMI
communication between a host and its BMC. Entire messages are buffered
before sending a notification to the other end, host or BMC, that
there is data to be read. Usually, the host emits requests and the BMC
responses but the specification provides a mean for the BMC to send
SMS Attention (BMC-to-Host attention or System Management Software
attention) messages.

For this purpose, the driver introduces a specific ioctl on the
device: 'BT_BMC_IOCTL_SMS_ATN' that can be used by the system running
on the BMC to signal the host of such an event.

The device name defaults to '/dev/ipmi-bt-host'
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NJoel Stanley <joel@jms.id.au>
[clg: - checkpatch fixes
      - added a devicetree binding documentation
      - replace 'bt_host' by 'bt_bmc' to reflect that the driver is
        the BMC side of the IPMI BT interface
      - renamed the device to 'ipmi-bt-host'
      - introduced a temporary buffer to copy_{to,from}_user
      - used platform_get_irq()
      - moved the driver under drivers/char/ipmi/ but kept it as a misc
        device
      - changed the compatible cell to "aspeed,ast2400-bt-bmc"
]
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
[clg: - checkpatch --strict fixes
      - removed the use of devm_iounmap, devm_kfree in cleanup paths
      - introduced an atomic-t to limit opens to 1
      - introduced a mutex to protect write/read operations]
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

54f9c4d0

audit: add exclude filter extension to feature bitmap · 7ff89ac6

由 Richard Guy Briggs 提交于 8月 18, 2016

Add to the audit feature bitmap to indicate availability of the
extension of the exclude filter to include PID, UID, AUID, GID, SUBJ_*.

RFE: add additional fields for use in audit filter exclude rules
https://github.com/linux-audit/audit-kernel/issues/5Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <paul@paul-moore.com>

7ff89ac6

28 9月, 2016 1 次提交

posix_acl: uapi header split · bc8bcf3b

由 Andreas Gruenbacher 提交于 9月 27, 2016

Export the base definitions and the xattr representation of POSIX ACLs
to user space.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bc8bcf3b

27 9月, 2016 2 次提交

serial: 8250: Set Altera 16550 TX FIFO Threshold · 8e5470c9

由 Thor Thayer 提交于 9月 22, 2016

The Altera 16550 soft IP UART requires 2 additional registers for
TX FIFO threshold support. These 2 registers enable the TX FIFO
Low Watermark and set the TX FIFO Low Watermark.
Set the TX FIFO threshold to the FIFO size - tx_loadsz.
Signed-off-by: NThor Thayer <tthayer@opensource.altera.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8e5470c9

nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant · b4c8eb03

由 Jeff Layton 提交于 9月 16, 2016

As defined in RFC 5661, section 18.16.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b4c8eb03

26 9月, 2016 3 次提交

cfg80211: add checks for beacon rate, extend to mesh · 8564e382

由 Johannes Berg 提交于 9月 19, 2016

The previous commit added support for specifying the beacon rate
for AP mode. Add features checks to this, and extend it to also
support the rate configuration for mesh networks. For IBSS it's
not as simple due to joining etc., so that's not yet supported.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

8564e382

netfilter: nft_log: complete NFTA_LOG_FLAGS attr support · ff107d27

由 Liping Zhang 提交于 9月 25, 2016

NFTA_LOG_FLAGS attribute is already supported, but the related
NF_LOG_XXX flags are not exposed to the userspace. So we cannot
explicitly enable log flags to log uid, tcp sequence, ip options
and so on, i.e. such rule "nft add rule filter output log uid"
is not supported yet.

So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
order to keep consistent with other modules, change NF_LOG_MASK to
refer to all supported log flags. On the other hand, add a new
NF_LOG_DEFAULT_MASK to refer to the original default log flags.

Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
userspace.
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ff107d27

netfilter: nf_tables: add range expression · 0f3cd9b3

由 Pablo Neira Ayuso 提交于 9月 23, 2016

Inverse ranges != [a,b] are not currently possible because rules are
composites of && operations, and we need to express this:

	data < a || data > b

This patch adds a new range expression. Positive ranges can be already
through two cmp expressions:

	cmp(sreg, data, >=)
	cmp(sreg, data, <=)

This new range expression provides an alternative way to express this.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0f3cd9b3

25 9月, 2016 1 次提交

netfilter: xt_hashlimit: Create revision 2 to support higher pps rates · 11d5f157

由 Vishwanath Pai 提交于 9月 22, 2016

Create a new revision for the hashlimit iptables extension module. Rev 2
will support higher pps of upto 1 million, Version 1 supports only 10k.

To support this we have to increase the size of the variables avg and
burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2
and xt_hashlimit_mtinfo2 and also create newer versions of all the
functions for match, checkentry and destroy.

Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very
similar in both rev1 and rev2 with only minor changes, so I have split
those functions and moved all the common code to a *_common function.
Signed-off-by: NVishwanath Pai <vpai@akamai.com>
Signed-off-by: NJoshua Hunt <johunt@akamai.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

11d5f157

24 9月, 2016 1 次提交

net: Update API for VF vlan protocol 802.1ad support · 79aab093

由 Moshe Shemesh 提交于 9月 22, 2016

Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.

For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.

Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
  Set vf vlan protocol 802.1ad:
    ip link set eth0 vf 1 vlan 100 proto 802.1ad
  Set vf to VST (802.1Q) mode:
    ip link set eth0 vf 1 vlan 100 proto 802.1Q
  Or by omitting the new parameter
    ip link set eth0 vf 1 vlan 100
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79aab093

23 9月, 2016 6 次提交

bpf: add helper to invalidate hash · 7a4b28c6

由 Daniel Borkmann 提交于 9月 23, 2016

Add a small helper that complements 36bbef52 ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after mangling on headers via direct packet write.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a4b28c6

net_sched: sch_fq: account for schedule/timers drifts · fefa569a

由 Eric Dumazet 提交于 9月 22, 2016

It looks like the following patch can make FQ very precise, even in VM
or stressed hosts. It matters at high pacing rates.

We take into account the difference between the time that was programmed
when last packet was sent, and current time (a drift of tens of usecs is
often observed)

Add an EWMA of the unthrottle latency to help diagnostics.

This latency is the difference between current time and oldest packet in
delayed RB-tree. This accounts for the high resolution timer latency,
but can be different under stress, as fq_check_throttled() can be
opportunistically be called from a dequeue() called after an enqueue()
for a different flow.

Tested:
// Start a 10Gbit flow
$ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr &

Before patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17106.04 756876.84   1102.75 1119049.02      0.00      0.00      0.52

After patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17867.00 800245.90   1151.77 1183172.12      0.00      0.00      0.52

A new iproute2 tc can output the 'unthrottle latency' :

$ tc -s qd sh dev eth0 | grep latency
  0 gc, 0 highprio, 32490767 throttled, 2382 ns latency
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fefa569a

netfilter: nft_queue: add _SREG_QNUM attr to select the queue number · 8061bb54

由 Liping Zhang 提交于 9月 14, 2016

Currently, the user can specify the queue numbers by _QUEUE_NUM and
_QUEUE_TOTAL attributes, this is enough in most situations.

But acctually, it is not very flexible, for example:
  tcp dport 80 mapped to queue0
  tcp dport 81 mapped to queue1
  tcp dport 82 mapped to queue2
In order to do this thing, we must add 3 nft rules, and more
mapping meant more rules ...

So take one register to select the queue number, then we can add one
simple rule to mapping queues, maybe like this:
  queue num tcp dport map { 80:0, 81:1, 82:2 ... }

Florian Westphal also proposed wider usage scenarios:
  queue num jhash ip saddr . ip daddr mod ...
  queue num meta cpu ...
  queue num meta mark ...

The last point is how to load a queue number from sreg, although we can
use *(u16*)&regs->data[reg] to load the queue number, just like nat expr
to load its l4port do.

But we will cooperate with hash expr, meta cpu, meta mark expr and so on.
They all store the result to u32 type, so cast it to u16 pointer and
dereference it will generate wrong result in the big endian system.

So just keep it simple, we treat queue number as u32 type, although u16
type is already enough.
Suggested-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8061bb54

nsfs: add ioctl to get a parent namespace · a7306ed8

由 Andrey Vagin 提交于 9月 06, 2016

Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.

In a future we will use this interface to dump and restore nested
namespaces.
Acked-by: NSerge Hallyn <serge@hallyn.com>
Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

a7306ed8

nsfs: add ioctl to get an owning user namespace for ns file descriptor · 6786741d

由 Andrey Vagin 提交于 9月 06, 2016

Each namespace has an owning user namespace and now there is not way
to discover these relationships.

Understending namespaces relationships allows to answer the question:
what capability does process X have to perform operations on a resource
governed by namespace Y?

After a long discussion, Eric W. Biederman proposed to use ioctl-s for
this purpose.

The NS_GET_USERNS ioctl returns a file descriptor to an owning user
namespace.
It returns EPERM if a target namespace is outside of a current user
namespace.

v2: rename parent to relative

v3: Add a missing mntput when returning -EAGAIN --EWB
Acked-by: NSerge Hallyn <serge@hallyn.com>
Link: https://lkml.org/lkml/2016/7/6/158Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

6786741d

nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant · eed7c414

由 Jeff Layton 提交于 9月 17, 2016

As defined in RFC 5661, section 18.16.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eed7c414

22 9月, 2016 3 次提交

netfilter: nft_numgen: add number generation offset · 2b03bf73

由 Laura Garcia Liebana 提交于 9月 13, 2016

Add support of an offset value for incremental counter and random. With
this option the sysadmin is able to start the counter to a certain value
and then apply the generated number.

Example:

	meta mark set numgen inc mod 2 offset 100

This will generate marks with the serie 100, 101, 100, 101, ...
Suggested-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NLaura Garcia Liebana <nevola@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2b03bf73

net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action · 45a497f2

由 Shmulik Ladkani 提交于 9月 19, 2016

TCA_VLAN_ACT_MODIFY allows one to change an existing tag.

It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
priority).
If packet is vlan tagged, then the tag gets overwritten according to
user specified attributes.

For example, this allows user to replace a tag's vid while preserving
its priority bits (as opposed to "action vlan pop pipe action vlan push").
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45a497f2

net: cls_bpf: limit hardware offload by software-only flag · 0d01d45f

由 Jakub Kicinski 提交于 9月 21, 2016

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
Unlike U32 and flower cls_bpf already has some netlink
flags defined.  Create a new attribute to be able to use
the same flag values as the above.

Unlike U32 and flower reject unknown flags.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d01d45f

21 9月, 2016 4 次提交

tcp_bbr: add BBR congestion control · 0f8782ea

由 Neal Cardwell 提交于 9月 19, 2016

This commit implements a new TCP congestion control algorithm: BBR
(Bottleneck Bandwidth and RTT). A detailed description of BBR will be
published in ACM Queue, Vol. 14 No. 5, September-October 2016, as
"BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for
connections on Google's internal backbone networks and google.com and
YouTube Web servers.

BBR requires only changes on the sender side, not in the network or
the receiver side. Thus it can be incrementally deployed on today's
Internet, or in datacenters.

The Internet has predominantly used loss-based congestion control
(largely Reno or CUBIC) since the 1980s, relying on packet loss as the
signal to slow down. While this worked well for many years, loss-based
congestion control is unfortunately out-dated in today's networks. On
today's Internet, loss-based congestion control causes the infamous
bufferbloat problem, often causing seconds of needless queuing delay,
since it fills the bloated buffers in many last-mile links. On today's
high-speed long-haul links using commodity switches with shallow
buffers, loss-based congestion control has abysmal throughput because
it over-reacts to losses caused by transient traffic bursts.

In 1981 Kleinrock and Gale showed that the optimal operating point for
a network maximizes delivered bandwidth while minimizing delay and
loss, not only for single connections but for the network as a
whole. Finding that optimal operating point has been elusive, since
any single network measurement is ambiguous: network measurements are
the result of both bandwidth and propagation delay, and those two
cannot be measured simultaneously.

While it is impossible to disambiguate any single bandwidth or RTT
measurement, a connection's behavior over time tells a clearer
story. BBR uses a measurement strategy designed to resolve this
ambiguity. It combines these measurements with a robust servo loop
using recent control systems advances to implement a distributed
congestion control algorithm that reacts to actual congestion, not
packet loss or transient queue delay, and is designed to converge with
high probability to a point near the optimal operating point.

In a nutshell, BBR creates an explicit model of the network pipe by
sequentially probing the bottleneck bandwidth and RTT. On the arrival
of each ACK, BBR derives the current delivery rate of the last round
trip, and feeds it through a windowed max-filter to estimate the
bottleneck bandwidth. Conversely it uses a windowed min-filter to
estimate the round trip propagation delay. The max-filtered bandwidth
and min-filtered RTT estimates form BBR's model of the network pipe.

Using its model, BBR sets control parameters to govern sending
behavior. The primary control is the pacing rate: BBR applies a gain
multiplier to transmit faster or slower than the observed bottleneck
bandwidth. The conventional congestion window (cwnd) is now the
secondary control; the cwnd is set to a small multiple of the
estimated BDP (bandwidth-delay product) in order to allow full
utilization and bandwidth probing while bounding the potential amount
of queue at the bottleneck.

When a BBR connection starts, it enters STARTUP mode and applies a
high gain to perform an exponential search to quickly probe the
bottleneck bandwidth (doubling its sending rate each round trip, like
slow start). However, instead of continuing until it fills up the
buffer (i.e. a loss), or until delay or ACK spacing reaches some
threshold (like Hystart), it uses its model of the pipe to estimate
when that pipe is full: it estimates the pipe is full when it notices
the estimated bandwidth has stopped growing. At that point it exits
STARTUP and enters DRAIN mode, where it reduces its pacing rate to
drain the queue it estimates it has created.

Then BBR enters steady state. In steady state, PROBE_BW mode cycles
between first pacing faster to probe for more bandwidth, then pacing
slower to drain any queue that created if no more bandwidth was
available, and then cruising at the estimated bandwidth to utilize the
pipe without creating excess queue. Occasionally, on an as-needed
basis, it sends significantly slower to probe for RTT (PROBE_RTT
mode).

BBR has been fully deployed on Google's wide-area backbone networks
and we're experimenting with BBR on Google.com and YouTube on a global
scale.  Replacing CUBIC with BBR has resulted in significant
improvements in network latency and application (RPC, browser, and
video) metrics. For more details please refer to our upcoming ACM
Queue publication.

Example performance results, to illustrate the difference between BBR
and CUBIC:

Resilience to random loss (e.g. from shallow buffers):
  Consider a netperf TCP_STREAM test lasting 30 secs on an emulated
  path with a 10Gbps bottleneck, 100ms RTT, and 1% packet loss
  rate. CUBIC gets 3.27 Mbps, and BBR gets 9150 Mbps (2798x higher).

Low latency with the bloated buffers common in today's last-mile links:
  Consider a netperf TCP_STREAM test lasting 120 secs on an emulated
  path with a 10Mbps bottleneck, 40ms RTT, and 1000-packet bottleneck
  buffer. Both fully utilize the bottleneck bandwidth, but BBR
  achieves this with a median RTT 25x lower (43 ms instead of 1.09
  secs).

Our long-term goal is to improve the congestion control algorithms
used on the Internet. We are hopeful that BBR can help advance the
efforts toward this goal, and motivate the community to do further
research.

Test results, performance evaluations, feedback, and BBR-related
discussions are very welcome in the public e-mail list for BBR:

  https://groups.google.com/forum/#!forum/bbr-dev

NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing
enabled, since pacing is integral to the BBR design and
implementation. BBR without pacing would not function properly, and
may incur unnecessary high packet loss rates.
Signed-off-by: NVan Jacobson <vanj@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f8782ea

tcp: export data delivery rate · eb8329e0

由 Yuchung Cheng 提交于 9月 19, 2016

This commit export two new fields in struct tcp_info:

  tcpi_delivery_rate: The most recent goodput, as measured by
    tcp_rate_gen(). If the socket is limited by the sending
    application (e.g., no data to send), it reports the highest
    measurement instead of the most recent. The unit is bytes per
    second (like other rate fields in tcp_info).

  tcpi_delivery_rate_app_limited: A boolean indicating if the goodput
    was measured when the socket's throughput was limited by the
    sending application.

This delivery rate information can be useful for applications that
want to know the current throughput the TCP connection is seeing,
e.g. adaptive bitrate video streaming. It can also be very useful for
debugging or troubleshooting.
Signed-off-by: NVan Jacobson <vanj@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb8329e0

net_sched: sch_fq: add low_rate_threshold parameter · 77879147

由 Eric Dumazet 提交于 9月 19, 2016

This commit adds to the fq module a low_rate_threshold parameter to
insert a delay after all packets if the socket requests a pacing rate
below the threshold.

This helps achieve more precise control of the sending rate with
low-rate paths, especially policers. The basic issue is that if a
congestion control module detects a policer at a certain rate, it may
want fq to be able to shape to that policed rate. That way the sender
can avoid policer drops by having the packets arrive at the policer at
or just under the policed rate.

The default threshold of 550Kbps was chosen analytically so that for
policers or links at 500Kbps or 512Kbps fq would very likely invoke
this mechanism, even if the pacing rate was briefly slightly above the
available bandwidth. This value was then empirically validated with
two years of production testing on YouTube video servers.
Signed-off-by: NVan Jacobson <vanj@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77879147

bpf: direct packet write and access for helpers for clsact progs · 36bbef52

由 Daniel Borkmann 提交于 9月 20, 2016

This work implements direct packet access for helpers and direct packet
write in a similar fashion as already available for XDP types via commits
4acf6c0b ("bpf: enable direct packet data write for xdp progs") and
6841de8b ("bpf: allow helpers access the packet directly"), and as a
complementary feature to the already available direct packet read for tc
(cls/act) programs.

For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
and bpf_csum_update(). The first is generally needed for both, read and
write, because they would otherwise only be limited to the current linear
skb head. Usually, when the data_end test fails, programs just bail out,
or, in the direct read case, use bpf_skb_load_bytes() as an alternative
to overcome this limitation. If such data sits in non-linear parts, we
can just pull them in once with the new helper, retest and eventually
access them.

At the same time, this also makes sure the skb is uncloned, which is, of
course, a necessary condition for direct write. As this needs to be an
invariant for the write part only, the verifier detects writes and adds
a prologue that is calling bpf_skb_pull_data() to effectively unclone the
skb from the very beginning in case it is indeed cloned. The heuristic
makes use of a similar trick that was done in 233577a2 ("net: filter:
constify detection of pkt_type_offset"). This comes at zero cost for other
programs that do not use the direct write feature. Should a program use
this feature only sparsely and has read access for the most parts with,
for example, drop return codes, then such write action can be delegated
to a tail called program for mitigating this cost of potential uncloning
to a late point in time where it would have been paid similarly with the
bpf_skb_store_bytes() as well. Advantage of direct write is that the
writes are inlined whereas the helper cannot make any length assumptions
and thus needs to generate a call to memcpy() also for small sizes, as well
as cost of helper call itself with sanity checks are avoided. Plus, when
direct read is already used, we don't need to cache or perform rechecks
on the data boundaries (due to verifier invalidating previous checks for
helpers that change skb->data), so more complex programs using rewrites
can benefit from switching to direct read plus write.

For direct packet access to helpers, we save the otherwise needed copy into
a temp struct sitting on stack memory when use-case allows. Both facilities
are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
this to map helpers and csum_diff, and can successively enable other helpers
where we find it makes sense. Helpers that definitely cannot be allowed for
this are those part of bpf_helper_changes_skb_data() since they can change
underlying data, and those that write into memory as this could happen for
packet typed args when still cloned. bpf_csum_update() helper accommodates
for the fact that we need to fixup checksum_complete when using direct write
instead of bpf_skb_store_bytes(), meaning the programs can use available
helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
csum_block_add(), csum_block_sub() equivalents in eBPF together with the
new helper. A usage example will be provided for iproute2's examples/bpf/
directory.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36bbef52

20 9月, 2016 1 次提交

dma-buf/sync_file: fix documentation error · 823d1bc1

由 Emilio López 提交于 9月 19, 2016

The ioctl name and description on the documentation block don't
match the ioctl being defined. This was probably overlooked while
renaming the ioctls during the sync file destaging. This patch
provides a more accurate description of what the ioctl actually does.
Signed-off-by: NEmilio López <emilio.lopez@collabora.co.uk>
Reviewed-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: NSumit Semwal <sumit.semwal@linaro.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20160919042120.6280-1-emilio.lopez@collabora.co.uk

823d1bc1

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功