- 01 October 2018, 32 commits
-
-
Committed by Trond Myklebust
Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
This no longer causes them to lose their place in the transmission queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Rather than forcing each and every RPC task to grab the socket write lock in order to send itself, we allow whichever task is holding the write lock to attempt to drain the entire transmit queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Avoid memory starvation by giving RPCs that are tagged with the RPC_TASK_SWAPPER flag the highest priority.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Both RDMA and UDP transports require the request to get a "congestion control" credit before it can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time. In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit modifies the transmit queue to group requests by task->tk_owner, and ensures that we round-robin among those groups.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Remove the checks for whether or not we need to transmit, and whether or not a reply has been received. Those are already handled in call_transmit() itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
If the request is still on the queue, this will be incorrect behaviour.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
When we shift to using the transmit queue, the task that holds the write lock will not necessarily be the same as the one being transmitted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Fix up the back channel code to recognise that it has already been transmitted, so that it does not need to be called again. Also ensure that we set req->rq_task.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Move the call encoding so that it occurs before the transport connection, etc.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Add the queue that will enforce the ordering of RPC task transmission.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
When storing a struct rpc_rqst on the slot allocation list, we currently use the same field 'rq_list' as we use to store the request on the receive queue. Since the structure is never on both lists at the same time, this is OK. However, for clarity, let's make that a union with different names for the different lists so that we can more easily distinguish between the two states.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
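As a rough sketch of the idea (one member name below is illustrative, not necessarily the one used in the actual patch), the shared linkage becomes an anonymous union:

```c
/*
 * Illustrative sketch only -- "rq_recv" is a hypothetical name; check the
 * actual patch for the real one. The point is that the single rq_list head
 * becomes a union of two differently named heads, which is safe because a
 * request is never on the slot allocation list and the receive queue at the
 * same time. Requires <linux/list.h> for struct list_head.
 */
struct rpc_rqst {
	/* ... other fields ... */
	union {
		struct list_head rq_list;	/* slot allocation list linkage */
		struct list_head rq_recv;	/* receive queue linkage (hypothetical name) */
	};
	/* ... other fields ... */
};
```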
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Allow the caller in clnt.c to call into the code to wait for a reply after calling xprt_transmit(). Again, the reason is that the backchannel code does not need this functionality.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Separate out the action of adding a request to the reply queue so that the backchannel code can simply skip calling it altogether.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
We will use the same lock to protect both the transmit and receive queues.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Rather than waking up the entire queue of RPC messages a second time, just wake up the task that was put to sleep.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
When asked to wake up an RPC task, it makes sense to test whether or not the task is still queued.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Add a helper that will wake up a task that is sleeping on a specific queue, and will set the value of task->tk_status. This is mainly intended for use by the transport layer to notify the task of an error condition.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
We are going to need to pin the request for both send and receive.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
If the previous message was only partially transmitted, we need to close the socket in order to avoid corruption of the message stream. To do so, we currently hijack the unlocking of the socket in order to schedule the close. Now that we track the message offset in the socket state, we can move that kind of checking out of the socket lock code, which is needed to allow messages to remain queued after dropping the socket lock.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Rather than resetting state variables in the socket state_change() callback, do it in the sunrpc TCP connect function itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Since we will want to introduce similar TCP state variables for the transmission of requests, let's rename the existing ones to indicate that they are for the receive side.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Currently, we grab the socket bit lock before we allow the message to be XDR encoded. That significantly slows down the transmission rate, since we serialise on a potentially blocking operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Add states to indicate that the message send and receive are not yet complete.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
If a message has been encoded using RPCSEC_GSS, the server is maintaining a window of sequence numbers that it considers valid. The client should normally be tracking that window, and needs to verify that the sequence number used by the message being transmitted still lies inside the window of validity. So far, we've been able to assume this condition would be realised automatically, since the client has been encoding the message only after taking the socket lock. Once we change that condition, we will need the explicit check.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Committed by Trond Myklebust
Move the initialisation back into xprt.c.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 25 September 2018, 2 commits
-
-
Committed by Paolo Abeni
Cong noted that we need the same checks introduced by commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the inner header") even for ipv4 tunnels.
Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Saif Hasan
Summary: This appears to be the necessary and sufficient change to enable MPLS on ip6gre tunnels (RFC 4023). This diff allows IP6GRE devices to be recognized by the MPLS kernel module, so a user can configure the interface to accept packets with MPLS headers as well as set up MPLS routes on them.

Test Plan: The test plan consists of multiple containers connected via a GRE-V6 tunnel, then carrying out the testing steps below.

- Carry out the necessary sysctl settings on all containers:

```
sysctl -w net.mpls.platform_labels=65536
sysctl -w net.mpls.ip_ttl_propagate=1
sysctl -w net.mpls.conf.lo.input=1
```

- Establish IP6GRE tunnels:

```
ip -6 tunnel add name if_1_2_1 mode ip6gre \
    local 2401:db00:21:6048:feed:0::1 \
    remote 2401:db00:21:6048:feed:0::2 key 1
ip link set dev if_1_2_1 up
sysctl -w net.mpls.conf.if_1_2_1.input=1
ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link

ip -6 tunnel add name if_1_3_1 mode ip6gre \
    local 2401:db00:21:6048:feed:0::1 \
    remote 2401:db00:21:6048:feed:0::3 key 1
ip link set dev if_1_3_1 up
sysctl -w net.mpls.conf.if_1_3_1.input=1
ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
```

- Install MPLS encap rules on node-1 towards node-2:

```
ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
    via inet 169.254.0.3 dev if_1_2_1
```

- Install MPLS forwarding rules on node-2 and node-3:

```
// node2
ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1

// node3
ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
```

- Ping 192.168.0.11 (node4) from 192.168.0.1 (node1), where routing towards 192.168.0.1 is via an IP route directly towards node1 from node4:

```
ping 192.168.0.11
```

- tcpdump on the interface to capture ping packets wrapped within an MPLS header, which in turn is wrapped within an IP6GRE header:

```
16:43:41.121073 IP6 2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2: DSTOPT GREv0, key=0x1, length 100: MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255) IP 192.168.0.1 > 192.168.0.11: ICMP echo request, id 1208, seq 45, length 64
    0x0000:  6000 2cdb 006c 3c3f 2401 db00 0021 6048  `.,..l<?$....!`H
    0x0010:  feed 0000 0000 0001 2401 db00 0021 6048  ........$....!`H
    0x0020:  feed 0000 0000 0002 2f00 0401 0401 0100  ......../.......
    0x0030:  2000 8847 0000 0001 0002 00ff 0004 01ff  ...G............
    0x0040:  4500 0054 3280 4000 ff01 c7cb c0a8 0001  E..T2.@.........
    0x0050:  c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b  ...........--<.[
    0x0060:  0000 0000 bcd8 0100 0000 0000 1011 1213  ................
    0x0070:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
    0x0080:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
    0x0090:  3435 3637                                4567
```

Signed-off-by: Saif Hasan <has@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 September 2018, 2 commits
-
-
Committed by Eric Dumazet
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for an unlimited amount of time, since one cpu is generally not able to drain all the queues under load.

It seems that all networking drivers that use NAPI for their TX completions should not provide an ndo_poll_controller(). NAPI drivers already have netpoll support handled in the core networking stack, since netpoll_poll_dev() uses poll_napi(dev) to iterate through the registered NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI contexts even for drivers not providing ndo_poll_controller(), allowing for following patches in NAPI drivers. Also, we export netpoll_poll_dev() so that it can be called by the bonding/team drivers in following patches.

Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
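A hedged sketch of what this means for a NAPI driver (the example_* handlers are hypothetical stand-ins; only the .ndo_* member names come from struct net_device_ops in <linux/netdevice.h>):

```c
#include <linux/netdevice.h>

/* Minimal stub handlers so the sketch is self-contained; a real driver has
 * its own implementations. These names are made up for illustration. */
static int example_open(struct net_device *dev) { return 0; }
static int example_stop(struct net_device *dev) { return 0; }
static netdev_tx_t example_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	dev_kfree_skb(skb);
	return NETDEV_TX_OK;
}

/*
 * With this patch, a NAPI driver can simply omit .ndo_poll_controller:
 * netpoll_poll_dev() then falls back to poll_napi() and services the
 * driver's registered NAPI contexts directly.
 */
static const struct net_device_ops example_netdev_ops = {
	.ndo_open	= example_open,
	.ndo_stop	= example_stop,
	.ndo_start_xmit	= example_start_xmit,
	/* no .ndo_poll_controller: netpoll handles NAPI drivers itself */
};
```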
-
Committed by David S. Miller
Use DECLARE_* not DEFINE_*.
Fixes: 8360ed67 ("RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats")
Signed-off-by: David S. Miller <davem@davemloft.net>
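The rule being enforced here is that a header should only declare the per-cpu variable, while exactly one .c file defines it. A hedged sketch (the macro names are the real ones from <linux/percpu-defs.h>, as quoted in the Clang warning in the 22 September commit below; treat the placement as an illustration of the rule rather than the exact hunk, and assume struct rds_ib_statistics is defined in net/rds/ib.h):

```c
#include <linux/percpu-defs.h>

/* In the header (e.g. net/rds/ib.h): declare only, so every includer sees
 * a declaration whose section attributes match the definition. */
DECLARE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);

/* In exactly one .c file (e.g. net/rds/ib_stats.c): the actual definition. */
DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
```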
-
- 23 September 2018, 1 commit
-
-
Committed by Maciej Żenczykowski
So it should not fail with EPERM even though it is no longer implemented...

This is a fix for:

```
(userns)$ egrep ^Cap /proc/self/status
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000003fffffffff

(userns)$ tcpdump -i usb_rndis0
tcpdump: WARNING: usb_rndis0: SIOCETHTOOL(ETHTOOL_GUFO) ioctl failed: Operation not permitted
Warning: Kernel filter failed: Bad file descriptor
tcpdump: can't remove kernel filter: Bad file descriptor
```

With this change it returns EOPNOTSUPP instead of EPERM.

See also https://github.com/the-tcpdump-group/libpcap/issues/689

Fixes: 08a00fea ("net: Remove references to NETIF_F_UFO from ethtool.")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 September 2018, 3 commits
-
-
Committed by Nathan Chancellor
Clang warns when two declarations' section attributes don't match.

```
net/rds/ib_stats.c:40:1: warning: section does not match previous declaration [-Wsection]
DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:142:2: note: expanded from macro 'DEFINE_PER_CPU_SHARED_ALIGNED'
        DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
        ^
./include/linux/percpu-defs.h:93:9: note: expanded from macro 'DEFINE_PER_CPU_SECTION'
        extern __PCPU_ATTRS(sec) __typeof__(type) name; \
               ^
./include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
        __percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
                                ^
net/rds/ib.h:446:1: note: previous attribute is here
DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:111:2: note: expanded from macro 'DECLARE_PER_CPU'
        DECLARE_PER_CPU_SECTION(type, name, "")
        ^
./include/linux/percpu-defs.h:87:9: note: expanded from macro 'DECLARE_PER_CPU_SECTION'
        extern __PCPU_ATTRS(sec) __typeof__(type) name
               ^
./include/linux/percpu-defs.h:49:26: note: expanded from macro '__PCPU_ATTRS'
        __percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
                                ^
1 warning generated.
```

The initial definition was added in commit ec16227e ("RDS/IB: Infiniband transport") and the cache aligned definition was added in commit e6babe4c ("RDS/IB: Stats and sysctls") right after. The definition probably should have been updated in net/rds/ib.h, which is what this patch does.

Link: https://github.com/ClangBuiltLinux/linux/issues/114
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dan Carpenter
Smatch reports that devlink_dpipe_send_and_alloc_skb() frees the skb on error so this is a double free. We fixed a bunch of these bugs in commit 7fe4d6dc ("devlink: Remove redundant free on error path") but we accidentally overlooked this one.
Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
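A small, generic C illustration of this bug class (this is not the devlink code; every name below is made up, and only devlink_dpipe_send_and_alloc_skb() mentioned above is real): when a callee is documented to free the buffer itself on failure, the caller must not free it again on that path.

```c
#include <stdlib.h>

struct msg { char *buf; };

/* Made-up helper mirroring the contract described above: on failure it
 * frees the caller's buffer itself before returning an error. */
static int send_and_free_on_error(struct msg *m)
{
	/* pretend the send failed */
	free(m->buf);
	m->buf = NULL;
	return -1;
}

static int demo(struct msg *m)
{
	int err = send_and_free_on_error(m);

	if (err)
		return err;	/* correct: no free(m->buf) here -- it is already gone */

	free(m->buf);		/* the success path still owns and releases the buffer */
	m->buf = NULL;
	return 0;
}

int main(void)
{
	struct msg m = { .buf = malloc(16) };

	return demo(&m) ? 1 : 0;
}
```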
-
Committed by Jeff Barnhill
The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly handle starting/stopping the iteration. The problem is that at some point during the iteration, an overflow is detected and the process is subsequently stopped. The item being shown via seq_printf() when the overflow occurs is not actually shown, though. When start() is subsequently called to resume iterating, it returns the next item, and thus the item that was being processed when the overflow occurred never gets printed.

Alter the meaning of the private data member "offset". Currently, when it is not 0 (which only happens at the very beginning), "offset" represents the next hlist item to be printed. After this change, "offset" always represents the current item. This is also consistent with the private data member "bucket", which represents the current bucket, and also the use of "pos" as defined in seq_file.txt: "The pos passed to start() will always be either zero, or the most recent pos used in the previous session."

Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
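A hedged sketch of the iterator state involved (the "bucket" and "offset" members are the ones named above; the real structure in net/ipv6/addrconf.c also carries seq_file private data and may differ in detail, so this is an illustration rather than the actual definition):

```c
/*
 * Simplified sketch of the /proc/net/if_inet6 iterator state. After this
 * change, "offset" refers to the entry currently being shown, so a restarted
 * ->start() can return that same entry again instead of skipping it when the
 * previous ->show() overflowed the seq_file buffer.
 */
struct if6_iter_state_sketch {
	int bucket;	/* current bucket in the address hash table */
	int offset;	/* position of the CURRENT entry within that bucket */
};
```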
-