提交 · d1afdc51399c53791ad9affbef67ba3aa206c379 · openanolis / cloud-kernel

22 7月, 2018 16 次提交

Merge branch 'tcp-improve-setsockopt-TCP_USER_TIMEOUT-accuracy' · d1afdc51

由 David S. Miller 提交于 7月 21, 2018

Jon Maxwell says:

====================
tcp: improve setsockopt() TCP_USER_TIMEOUT accuracy

The patch was becoming bigger based on feedback therefore I have
implemented a series of 3 commits instead in V4.

This series is a continuation based on V3 here and associated feedback:

https://patchwork.kernel.org/patch/10516195/

Suggestions by Neal Cardwell:

1) Fix up units mismatch regarding msec/jiffies.
2) Address possiblility of time_remaining being negative.
3) Add a helper routine tcp_clamp_rto_to_user_timeout() to do the rto
calculation.
4) Move start_ts logic into helper routine tcp_retrans_stamp() to
validate tcp_sk(sk)->retrans_stamp.
5) Some u32 declation and return refactoring.
6) Return 0 instead of false in tcp_retransmit_stamp(), it's not a bool.

Suggestions by David Laight:

1) Don't cache rto in tcp_clamp_rto_to_user_timeout().

Suggestions by Eric Dumazet:

1) Make u32 declartions consistent.
2) Use patch series for easier review.
3) Convert icsk->icsk_user_timeout to millisconds to avoid jiffie to
msec dance.
4) Use seperate titles for each commit in the series.
5) Fix fuzzy indentation and line wrap issues.
6) Make commit titles descriptive.

Changes:

1) Call tcp_clamp_rto_to_user_timeout(sk) as an argument to
inet_csk_reset_xmit_timer() to save on rto declaration.

Every time the TCP retransmission timer fires. It checks to see if
there is a timeout before scheduling the next retransmit timer. The
retransmit interval between each retransmission increases
exponentially. The issue is that in order for the timeout to occur the
retransmit timer needs to fire again. If the user timeout check happens
after the 9th retransmit for example. It needs to wait for the 10th
retransmit timer to fire in order to evaluate whether a timeout has
occurred or not. If the interval is large enough then the timeout will
be inaccurate.

For example with a TCP_USER_TIMEOUT of 10 seconds without patch:

1st retransmit:

22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]

Last retransmit:

22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]

Timeout:

send: Connection timed out
Sun Jul  1 22:25:34 EDT 2018

We can see that last retransmit took ~7 seconds. Which pushed the total
timeout to ~15 seconds instead of the expected 10 seconds. This gets
more inaccurate the larger the TCP_USER_TIMEOUT value. As the interval
increases.

Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has
expired. Or whether the rto interval needs to be recalculated. Use the
original interval if user rto is not set.

Test results with the patch is the expected 10 second timeout:

1st retransmit:

01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]

Last retransmit:

01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]

Timeout:

send: Connection timed out
Mon Jul  2 01:38:09 EDT 2018
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1afdc51

tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy · b701a99e

由 Jon Maxwell 提交于 7月 19, 2018

Create the tcp_clamp_rto_to_user_timeout() helper routine. To calculate
the correct rto, so that the TCP_USER_TIMEOUT socket option is more
accurate. Taking suggestions and feedback into account from
Eric Dumazet, Neal Cardwell and David Laight. Due to the 1st commit we
can avoid the msecs_to_jiffies() and jiffies_to_msecs() dance.
Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b701a99e

tcp: Add tcp_retransmit_stamp() helper routine · a7fa3770

由 Jon Maxwell 提交于 7月 19, 2018

Create a seperate helper routine as per Neal Cardwells suggestion. To
be used by the final commit in this series and retransmits_timed_out().
Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7fa3770

tcp: convert icsk_user_timeout from jiffies to msecs · 9bcc66e1

由 Jon Maxwell 提交于 7月 19, 2018

This is a preparatory commit. Part of this series that improves the
socket TCP_USER_TIMEOUT option accuracy. Implement Eric Dumazets idea
to convert icsk->icsk_user_timeout from jiffies to msecs. To eliminate
the msecs_to_jiffies() and jiffies_to_msecs() dance in future.
Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bcc66e1

Merge branch 's390-qeth-updates' · 975cd350

由 David S. Miller 提交于 7月 21, 2018

Julian Wiedmann says:

====================
s390/qeth: updates 2018-07-19

please apply one more round of qeth patches to net-next.
This brings additional performance improvements for the transmit code,
and some refactoring to pave the way for using netdev_priv.
Also, two minor fixes for rare corner cases.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

975cd350

s390/qeth: speed up L2 IQD xmit · 5f89eca5

由 Julian Wiedmann 提交于 7月 19, 2018

Modify the L2 OSA xmit path so that it also supports L2 IQD devices
(in particular, their HW header requirements). This allows IQD devices
to advertise NETIF_F_SG support, and eliminates the allocation overhead
for the HW header.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f89eca5

s390/qeth: add support for constrained HW headers · a7c2f4a3

由 Julian Wiedmann 提交于 7月 19, 2018

Some transmit modes require that the HW header is located in the same
page as the initial protocol headers in skb->data. Let callers specify
the size of this contiguous header range, and enforce it when building
the HW header.

While at it, apply some gentle renaming to the relevant L2 code so that
it matches the L3 code.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7c2f4a3

s390/qeth: merge linearize-check into HW header construction · ba86ceee

由 Julian Wiedmann 提交于 7月 19, 2018

When checking whether an skb needs to be linearized to fit into an IO
buffer, it's desirable to consider the skb's final size and layout
(ie. after the HW header was added). But a subsequent linearization can
then cause the re-positioned HW header to violate its alignment
restrictions.

Dealing with this situation in two different code paths is quite tricky.
This patch integrates a) linearize-check and b) HW header construction
into one 3 step-sequence:
1. evaluate how the HW header needs to be added (to identify if it takes
   up an additional buffer element), then
2. check if the required buffer elements exceed the device's limit.
   Linearize when necessary and re-evaluate the HW header placement.
3. Add the HW header in the best-possible way:
   a) push, without taking up an additional buffer element
   b) push, but consume another buffer element
   c) allocate a header object from the cache.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba86ceee

s390/qeth: add statistics for consumed buffer elements · d2a274b2

由 Julian Wiedmann 提交于 7月 19, 2018

Nowadays an skb fragment typically spans over multiple pages. So replace
the obsolete, SG-only 'fragments' counter with one that tracks the
consumed buffer elements. This is what actually matters for performance.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2a274b2

s390/qeth: use core MTU range checking · 72f219da

由 Julian Wiedmann 提交于 7月 19, 2018

qeth's ndo_change_mtu() only applies some trivial bounds checking. Set
up dev->min_mtu properly, so that dev_set_mtu() can do this for us.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72f219da

s390/qeth: simplify max MTU handling · 8ce7a9e0

由 Julian Wiedmann 提交于 7月 19, 2018

When the MPC initialization code discovers the HW-specific max MTU,
apply the resulting changes straight to the netdevice.

If this is the device's first initialization, also set its MTU
(HiperSockets: the max MTU; else: a layer-specific default value).
Then cap the current MTU by the new max MTU.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ce7a9e0

s390/qeth: don't cache HW port number · 92d27209

由 Julian Wiedmann 提交于 7月 19, 2018

The netdevice is always available now, so get the portno from there.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92d27209

s390/qeth: allocate netdevice early · d3d1b205

由 Julian Wiedmann 提交于 7月 19, 2018

Allocation of the netdevice is currently delayed until a qeth card first
goes online. This complicates matters in several places, where we need
to cache values instead of applying them straight to the netdevice.

Improve on this by moving the allocation up to where the qeth card
itself is created. This is also one step in direction of eventually
placing the qeth card into netdev_priv().

In all subsequent code, remove the now redundant checks whether
card->dev is valid.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3d1b205

s390/qeth: remove redundant netif_carrier_ok() checks · addc5ee8

由 Julian Wiedmann 提交于 7月 19, 2018

netif_carrier_off() does its own checking.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

addc5ee8

s390/qeth: reset layer2 attribute on layer switch · 70551dc4

由 Julian Wiedmann 提交于 7月 19, 2018

After the subdriver's remove() routine has completed, the card's layer
mode is undetermined again. Reflect this in the layer2 field.

If qeth_dev_layer2_store() hits an error after remove() was called, the
card _always_ requires a setup(), even if the previous layer mode is
requested again.
But qeth_dev_layer2_store() bails out early if the requested layer mode
still matches the current one. So unless we reset the layer2 field,
re-probing the card back to its previous mode is currently not possible.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70551dc4

s390/qeth: fix race in used-buffer accounting · a702349a

由 Julian Wiedmann 提交于 7月 19, 2018

By updating q->used_buffers only _after_ do_QDIO() has completed, there
is a potential race against the buffer's TX completion. In the unlikely
case that the TX completion path wins, qeth_qdio_output_handler() would
decrement the counter before qeth_flush_buffers() even incremented it.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a702349a

21 7月, 2018 24 次提交

Merge branch 'hns3-misc-cleanups' · d528114b

由 David S. Miller 提交于 7月 21, 2018

Salil Mehta says:

====================
Misc. cleanups for HNS3 ethernet driver

This patch-set presents some cleanups for HNS3 Ethernet Driver.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d528114b

net: hns3: Add SPDX tags to HNS3 PF driver · d71d8381

由 Jian Shen 提交于 7月 19, 2018

Add the SPDX identifiers to HNS3 PF driver.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d71d8381

net: hns3: Remove unused struct member and definition · 584b464f

由 Jian Shen 提交于 7月 19, 2018

The struct hclge_desc_cb and hclge_desc_cb are never used in
anywhere. This patch removes them.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

584b464f

net: hns3: Fix misleading parameter name · ef0c5009

由 Jian Shen 提交于 7月 19, 2018

The input parameter "dev" of hns3_irq_handle() is indeed
used as a tqp vector, it is misleadin.

The struct member "flag" is used to indicate ring type,
so rename it.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef0c5009

net: hns3: Modify inconsistent bit mask macros · c79301d8

由 Jian Shen 提交于 7月 19, 2018

Use BIT() and GENMASK() to convert the bit mask, modify
the inconsistent ones, and remove useless ones.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c79301d8

net: hns3: Use decimal for bit offset macros · f8a91784

由 Jian Shen 提交于 7月 19, 2018

Using hex for bit offsets is inconsistent with the rest
of the file. Change them to decimal.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8a91784

net: hns3: Correct unreasonable code comments · fdace1bc

由 Jian Shen 提交于 7月 19, 2018

This patch fixes some comment spelling errors, removes
redundant comments, rewrites misleading comments, and
adds some necessary comments.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fdace1bc

net: hns3: Remove extra space and brackets · a10829c4

由 Jian Shen 提交于 7月 19, 2018

Remove extra space and brackets.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a10829c4

net: hns3: Standardize the handle of return value · 3f639907

由 Jian Shen 提交于 7月 19, 2018

Apply the standard minor cleanup by returning ret outside
the brackets.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f639907

net: hns3: Remove some redundant assignments · 646cb512

由 Jian Shen 提交于 7月 19, 2018

Remove some redundant assignments, because they have
been set to zero when allocate hdev.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NPeng Li <lipeng321@huawei.com>
Signed-off-by: NSalil Mehta <salil.mehta@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

646cb512

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · eae249b2

由 David S. Miller 提交于 7月 20, 2018

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-07-20

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add sharing of BPF objects within one ASIC: this allows for reuse of
   the same program on multiple ports of a device, and therefore gains
   better code store utilization. On top of that, this now also enables
   sharing of maps between programs attached to different ports of a
   device, from Jakub.

2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
   detections and unused variable exports, also from Jakub.

3) First batch of RCU annotation fixes in prog array handling, i.e.
   there are several __rcu markers which are not correct as well as
   some of the RCU handling, from Roman.

4) Two fixes in BPF sample files related to checking of the prog_cnt
   upper limit from sample loader, from Dan.

5) Minor cleanup in sockmap to remove a set but not used variable,
   from Colin.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eae249b2

Merge branch 'Make-sys-class-net-per-net-namespace-objects-belong-to-container' · c59e18b8

由 David S. Miller 提交于 7月 20, 2018

Tyler Hicks says:

====================
Make /sys/class/net per net namespace objects belong to container

This is a revival of an older patch set from Dmitry Torokhov:

 https://lore.kernel.org/lkml/1471386795-32918-1-git-send-email-dmitry.torokhov@gmail.com/

My submission of v2 is here:

 https://lore.kernel.org/lkml/1531497949-1766-1-git-send-email-tyhicks@canonical.com/

Here's Dmitry's description:

 There are objects in /sys hierarchy (/sys/class/net/) that logically
 belong to a namespace/container. Unfortunately all sysfs objects start
 their life belonging to global root, and while we could change
 ownership manually, keeping tracks of all objects that come and go is
 cumbersome. It would be better if kernel created them using correct
 uid/gid from the beginning.

 This series changes kernfs to allow creating object's with arbitrary
 uid/gid, adds get_ownership() callback to ktype structure so subsystems
 could supply their own logic (likely tied to namespace support) for
 determining ownership of kobjects, and adjusts sysfs code to make use
 of this information. Lastly net-sysfs is adjusted to make sure that
 objects in net namespace are owned by the root user from the owning
 user namespace.

 Note that we do not adjust ownership of objects moved into a new
 namespace (as when moving a network device into a container) as
 userspace can easily do it.

I'm reviving this patch set because we would like this feature for
system containers. One specific use case that we have is that libvirt is
unable to configure its bridge device inside of a system container due
to the bridge files in /sys/class/net/ being owned by init root instead
of container root. The last two patches in this set are patches that
I've added to Dmitry's original set to allow such configuration of the
bridge device.

Eric had previously provided feedback that he didn't favor these changes
affecting all layers of the stack and that most of the changes could
remain local to drivers/base/core.c. That feedback is certainly sensible
but I wanted to send out v2 of the patch set without making that large
of a change since quite a bit of time has passed and the bridge changes
in the last patch of this set shows that not all of the changes will be
local to drivers/base/core.c. I'm happy to make the changes if the
original request still stands.

* Changes since v2:
  - Added my Co-Developed-by and Signed-off-by tags to all of Dmitry's
    patches that I've modified
  - Patch 1 received build failure fixes in
    arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
  - Patch 2 was updated to drop the declaration of sysfs_add_file() from
    sysfs.h since the patch removed all other uses of the function
  - Patch 5 is a new patch that prevents tx_maxrate from being written
    to from inside of a container
    + Maybe I'm being too cautious here but the restriction can always
      be loosened up later
  - Patches 6 and 7 were updated to make net_ns_get_ownership() always
    initialize uid and gid, even when the network namespace is NULL, so
    that it isn't a dangerous function to reuse
    + Requested by Christian Brauner
  - I've looked at all sysfs attributes affected by this patch set and
    feel comfortable about the changes. There are quite a few affected
    attributes that don't have any capable()/ns_capable() checks in
    their store operations (per_bond_attrs, at91_sysfs_attrs,
    sysfs_grcan_attrs, ican3_sysfs_attrs, cdc_ncm_sysfs_attrs,
    qmi_wwan_sysfs_attrs) but I think this is acceptable. It means that
    container root, rather than specifically CAP_NET_ADMIN inside of the
    network namespace that the device belongs to, can write to those
    device attributes. It's the same situation that those devices have
    today in that init root is able to write to the attributes without
    necessarily having CAP_NET_ADMIN. I think that this should probably
    be fixed in order to be consistent with what netdev_store() does by
    verifying CAP_NET_ADMIN in the network namespace but that it doesn't
    need to happen in this patch set.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c59e18b8

bridge: make sure objects belong to container's owner · 705e0dea

由 Tyler Hicks 提交于 7月 20, 2018

When creating various bridge objects in /sys/class/net/... make sure
that they belong to the container's owner instead of global root (if
they belong to a container/namespace).
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

705e0dea

net: create reusable function for getting ownership info of sysfs inodes · fbdeaed4

由 Tyler Hicks 提交于 7月 20, 2018

Make net_ns_get_ownership() reusable by networking code outside of core.
This is useful, for example, to allow bridge related sysfs files to be
owned by container root.

Add a function comment since this is a potentially dangerous function to
use given the way that kobject_get_ownership() works by initializing uid
and gid before calling .get_ownership().
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbdeaed4

net-sysfs: make sure objects belong to container's owner · b0e37c0d

由 Dmitry Torokhov 提交于 7月 20, 2018

When creating various objects in /sys/class/net/... make sure that they
belong to container's owner instead of global root (if they belong to a
container/namespace).
Co-Developed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0e37c0d

net-sysfs: require net admin in the init ns for setting tx_maxrate · 3033fced

由 Tyler Hicks 提交于 7月 20, 2018

An upcoming change will allow container root to open some /sys/class/net
files for writing. The tx_maxrate attribute can result in changes
to actual hardware devices so err on the side of caution by requiring
CAP_NET_ADMIN in the init namespace in the corresponding attribute store
operation.
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3033fced

driver core: set up ownership of class devices in sysfs · 9944e894

由 Dmitry Torokhov 提交于 7月 20, 2018

Plumb in get_ownership() callback for devices belonging to a class so that
they can be created with uid/gid different from global root. This will
allow network devices in a container to belong to container's root and not
global root.
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9944e894

kobject: kset_create_and_add() - fetch ownership info from parent · d028b6f7

由 Dmitry Torokhov 提交于 7月 20, 2018

This change implements get_ownership() for ksets created with
kset_create_and_add() call by fetching ownership data from parent kobject.
This is done mostly for benefit of "queues" attribute of net devices so
that corresponding directory belongs to container's root instead of global
root for network devices in a container.
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d028b6f7

sysfs, kobject: allow creating kobject belonging to arbitrary users · 5f81880d

由 Dmitry Torokhov 提交于 7月 20, 2018

Normally kobjects and their sysfs representation belong to global root,
however it is not necessarily the case for objects in separate namespaces.
For example, objects in separate network namespace logically belong to the
container's root and not global root.

This change lays groundwork for allowing network namespace objects
ownership to be transferred to container's root user by defining
get_ownership() callback in ktype structure and using it in sysfs code to
retrieve desired uid/gid when creating sysfs objects for given kobject.
Co-Developed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f81880d

kernfs: allow creating kernfs objects with arbitrary uid/gid · 488dee96

由 Dmitry Torokhov 提交于 7月 20, 2018

This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.

When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.
Co-Developed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

488dee96

net: Init backlog NAPI's gro_hash. · 7c4ec749

由 David S. Miller 提交于 7月 20, 2018

Based upon a patch by Sean Tranchetti.

Fixes: d4546c25 ("net: Convert GRO SKB handling to list_head.")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c4ec749

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 99d20a46

由 David S. Miller 提交于 7月 20, 2018

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree:

1) No need to set ttl from reject action for the bridge family, from
   Taehee Yoo.

2) Use a fixed timeout for flow that are passed up from the flowtable
   to conntrack, from Florian Westphal.

3) More preparation patches for tproxy support for nf_tables, from Mate
   Eckl.

4) Remove unnecessary indirection in core IPv6 checksum function, from
   Florian Westphal.

5) Use nf_ct_get_tuplepr() from openvswitch, instead of opencoding it.
   From Florian Westphal.

6) socket match now selects socket infrastructure, instead of depending
   on it. From Mate Eckl.

7) Patch series to simplify conntrack tuple building/parsing from packet
   path and ctnetlink, from Florian Westphal.

8) Fetch timeout policy from protocol helpers, instead of doing it from
   core, from Florian Westphal.

9) Merge IPv4 and IPv6 protocol trackers into conntrack core, from
   Florian Westphal.

10) Depend on CONFIG_NF_TABLES_IPV6 and CONFIG_IP6_NF_IPTABLES
    respectively, instead of IPV6. Patch from Mate Eckl.

11) Add specific function for garbage collection in conncount,
    from Yi-Hung Wei.

12) Catch number of elements in the connlimit list, from Yi-Hung Wei.

13) Move locking to nf_conncount, from Yi-Hung Wei.

14) Series of patches to add lockless tree traversal in nf_conncount,
    from Yi-Hung Wei.

15) Resolve clash in matching conntracks when race happens, from
    Martynas Pumputis.

16) If connection entry times out, remove template entry from the
    ip_vs_conn_tab table to improve behaviour under flood, from
    Julian Anastasov.

17) Remove useless parameter from nf_ct_helper_ext_add(), from Gao feng.

18) Call abort from 2-phase commit protocol before requesting modules,
    make sure this is done under the mutex, from Florian Westphal.

19) Grab module reference when starting transaction, also from Florian.

20) Dynamically allocate expression info array for pre-parsing, from
    Florian.

21) Add per netns mutex for nf_tables, from Florian Westphal.

22) A couple of patches to simplify and refactor nf_osf code to prepare
    for nft_osf support.

23) Break evaluation on missing socket, from Mate Eckl.

24) Allow to match socket mark from nft_socket, from Mate Eckl.

25) Remove dependency on nf_defrag_ipv6, now that IPv6 tracker is
    built-in into nf_conntrack. From Florian Westphal.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99d20a46

Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux · c4c5551d

由 David S. Miller 提交于 7月 20, 2018

All conflicts were trivial overlapping changes, so reasonably
easy to resolve.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4c5551d

Merge tag 'vfio-v4.18-rc6' of git://github.com/awilliam/linux-vfio · 48e5aee8

由 Linus Torvalds 提交于 7月 20, 2018

Pull VFIO fix from Alex Williamson:
 "Harden potential Spectre v1 issue (Gustavo A. R. Silva)"

* tag 'vfio-v4.18-rc6' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix potential Spectre v1

48e5aee8

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功