提交 · 161cd45ff0670c3068adb3de33e26495b648e906 · openanolis / cloud-kernel

15 6月, 2016 40 次提交

Merge branch 'rds-mprds-foundations' · 161cd45f

由 David S. Miller 提交于 6月 14, 2016

Sowmini Varadhan says:

====================
RDS: multiple connection paths for scaling

Today RDS-over-TCP is implemented by demux-ing multiple PF_RDS sockets
between any 2 endpoints (where endpoint == [IP address, port]) over a
single TCP socket between the 2 IP addresses involved. This has the
limitation that it ends up funneling multiple RDS flows over a single
TCP flow, thus the rds/tcp connection is
   (a) upper-bounded to the single-flow bandwidth,
   (b) suffers from head-of-line blocking for the RDS sockets.

Better throughput (for a fixed small packet size, MTU) can be achieved
by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed
RDS (mprds).  Each such TCP/IP flow constitutes a path for the rds/tcp
connection. RDS sockets will be attached to a path based on some hash
(e.g., of local address and RDS port number) and packets for that RDS
socket will be sent over the attached path using TCP to segment/reassemble
RDS datagrams on that path.

The table below, generated using a prototype that implements mprds,
shows that this is significant for scaling to 40G.  Packet sizes
used were: 8K byte req, 256 byte resp. MTU: 1500.  The parameters for
RDS-concurrency used below are described in the rds-stress(1) man page-
the number listed is proportional to the number of threads at which max
throughput was attained.

  -------------------------------------------------------------------
     RDS-concurrency   Num of       tx+rx K/s (iops)       throughput
     (-t N -d N)       TCP paths
  -------------------------------------------------------------------
        16             1             600K -  700K            4 Gbps
        28             8            5000K - 6000K           32 Gbps
  -------------------------------------------------------------------

FAQ: what is the relation between mprds and mptcp?
  mprds is orthogonal to mptcp. Whereas mptcp creates
  sub-flows for a single TCP connection, mprds parallelizes tx/rx
  at the RDS layer. MPRDS with N paths will allow N datagrams to
  be sent in parallel; each path will continue to send one
  datagram at a time, with sender and receiver keeping track of
  the retransmit and dgram-assembly state based on the RDS header.
  If desired, mptcp can additionally be used to speed up each TCP
  path. That acceleration is orthogonal to the parallelization benefits
  of mprds.

This patch series lays down the foundational data-structures to support
mprds in the kernel. It implements the changes to split up the
rds_connection structure into a common (to all paths) part,
and a per-path rds_conn_path. All I/O workqs are driven from
the rds_conn_path.

Note that this patchset does not (yet) actually enable multipathing
for any of the transports; all transports will continue to use a
single path with the refactored data-structures. A subsequent patchset
will  add the changes to the rds-tcp module to actually use mprds
in rds-tcp.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

161cd45f

RDS: Update rds_conn_destroy to be MP capable · 3ecc5693

由 Sowmini Varadhan 提交于 6月 13, 2016

Refactor rds_conn_destroy() so that the per-path dismantling
is done in rds_conn_path_destroy, and then iterate as needed
over rds_conn_path_destroy().
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ecc5693

RDS: Update rds_conn_shutdown to work with rds_conn_path · d769ef81

由 Sowmini Varadhan 提交于 6月 13, 2016

This commit changes rds_conn_shutdown to take a rds_conn_path *
argument, allowing it to shutdown paths other than c_path[0] for
MP-capable transports.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d769ef81

RDS: Initialize all RDS_MPATH_WORKERS in __rds_conn_create · 1c5113cf

由 Sowmini Varadhan 提交于 6月 13, 2016

Add a for() loop in __rds_conn_create to initialize all the
conn_paths, in preparate for MP capable transports.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c5113cf

RDS: Add rds_conn_path_error() · fb1b3dc4

由 Sowmini Varadhan 提交于 6月 13, 2016

rds_conn_path_error() is the MP-aware analog of rds_conn_error,
to be used by multipath-capable callers.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb1b3dc4

RDS: update rds-info related functions to traverse multiple conn_paths · 992c9ec5

由 Sowmini Varadhan 提交于 6月 13, 2016

This commit updates the callbacks related to the rds-info command
so that they walk through all the rds_conn_path structures and
report the requested info.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

992c9ec5

RDS: Add rds_conn_path_connect_if_down() for MP-aware callers · 3c0a5900

由 Sowmini Varadhan 提交于 6月 13, 2016

rds_conn_path_connect_if_down() works on the rds_conn_path
that it is passed. Callers who are not t_m_capable may continue
calling rds_conn_connect_if_down, which will invoke
rds_conn_path_connect_if_down() with the default c_path[0].
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c0a5900

RDS: Make rds_send_pong() take a rds_conn_path argument · 45997e9e

由 Sowmini Varadhan 提交于 6月 13, 2016

This commit allows rds_send_pong() callers to send back
the rds pong message on some path other than c_path[0] by
passing in a struct rds_conn_path * argument.  It also
removes the last dependency on the #defines in rds_single.h
from send.c
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45997e9e

RDS: Extract rds_conn_path from i_conn_path in rds_send_drop_to() for MP-capable transports · 01ff34ed

由 Sowmini Varadhan 提交于 6月 13, 2016

Explicitly set up rds_conn_path, either from i_conn_path (for
MP capable transpots) or as c_path[0], and use this in
rds_send_drop_to()
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01ff34ed

RDS: Pass rds_conn_path to rds_send_xmit() · 1f9ecd7e

由 Sowmini Varadhan 提交于 6月 13, 2016

Pass a struct rds_conn_path to rds_send_xmit so that MP capable
transports can transmit packets on something other than c_path[0].
The eventual goal for MP capable transports is to hash the rds
socket to a path based on the bound local address/port, and use
this path as the argument to rds_send_xmit()
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f9ecd7e

RDS: Make rds_send_queue_rm() rds_conn_path aware · 780a6d9e

由 Sowmini Varadhan 提交于 6月 13, 2016

Pass the rds_conn_path to rds_send_queue_rm, and use it to initialize
the i_conn_path field in struct rds_incoming. This commit also makes
rds_send_queue_rm() MP capable, because it now takes locks
specific to the rds_conn_path passed in, instead of defaulting to
the c_path[0] based defines from rds_single_path.h
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

780a6d9e

RDS: Remove stale function rds_send_get_message() · 7d885d0f

由 Sowmini Varadhan 提交于 6月 13, 2016

The only caller of rds_send_get_message() was
rds_iw_send_cq_comp_handler() which was removed as part of
commit dcdede04 ("RDS: Drop stale iWARP RDMA transport"),
so remove rds_send_get_message() for the same reason.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d885d0f

RDS: Add rds_send_path_drop_acked() · 5c3d274c

由 Sowmini Varadhan 提交于 6月 13, 2016

rds_send_path_drop_acked() is the path-specific version of
rds_send_drop_acked() to be invoked by MP capable callers.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c3d274c

RDS: Add rds_send_path_reset() · 4e9b551c

由 Sowmini Varadhan 提交于 6月 13, 2016

rds_send_path_reset() is the path specific version of rds_send_reset()
intended for MP capable callers.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e9b551c

RDS: rds_inc_path_init() helper function for MP capable transports · 5e833e02

由 Sowmini Varadhan 提交于 6月 13, 2016

t_mp_capable transports can use rds_inc_path_init to initialize
all fields in struct rds_incoming, including the i_conn_path.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e833e02

RDS: recv path gets the conn_path from rds_incoming for MP capable transports · ef9e62c2

由 Sowmini Varadhan 提交于 6月 13, 2016

Transports that are t_mp_capable should set the rds_conn_path
on which the datagram was recived in the ->i_conn_path field
of struct rds_incoming.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef9e62c2

RDS: add t_mp_capable bit to be set by MP capable transports · 7e8f4413

由 Sowmini Varadhan 提交于 6月 13, 2016

The t_mp_capable bit will be used in the core rds module
to support multipathing logic when the transport supports it.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e8f4413

RDS: split out connection specific state from rds_connection to rds_conn_path · 0cb43965

由 Sowmini Varadhan 提交于 6月 13, 2016

In preparation for multipath RDS, split the rds_connection
structure into a base structure, and a per-path struct rds_conn_path.
The base structure tracks information and locks common to all
paths. The workqs for send/recv/shutdown etc are tracked per
rds_conn_path. Thus the workq callbacks now work with rds_conn_path.

This commit allows for one rds_conn_path per rds_connection, and will
be extended into multiple conn_paths in subsequent commits.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cb43965

tcp: return sizeof tcp_dctcp_info in dctcp_get_info() · dcf1158b

由 Neal Cardwell 提交于 6月 13, 2016

Make sure that dctcp_get_info() returns only the size of the
info->dctcp struct that it zeroes out and fills in. Previously it had
been returning the size of the enclosing tcp_cc_info union,
sizeof(*info).  There is no problem yet, but that union that may one
day be larger than struct tcp_dctcp_info, in which case the
TCP_CC_INFO code might accidentally copy uninitialized bytes from the
stack.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcf1158b

sctp: fix error return code in sctp_init() · a5e27d18

由 Wei Yongjun 提交于 6月 13, 2016

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5e27d18

Merge tag 'rxrpc-rewrite-20160613' of... · d4c76c1a

由 David S. Miller 提交于 6月 14, 2016

Merge tag 'rxrpc-rewrite-20160613' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Rename rxrpc source files

Here's the next part of the AF_RXRPC rewrite.  In this set I rename some of
the files in the net/rxrpc/ directory and adjust the Makefile and
ar-internal.h to reflect the changes.

The aim is twofold:

 (1) Remove the "ar-" prefix on those files that have it as it's not really
     useful, especially now that I'm building rxkad in.

 (2) To aid splitting the local, peer, connection and call handling code
     into separate files for object and event handling in future patches by
     making it easier to come up with new filenames.

There are two commits:

 (1) The first commit does a bunch of renames of .c files and alters the
     Makefile.  ar-internal.h isn't renamed at this time to avoid having to
     change the contents of the files being renamed.

 (2) The second commit changes the section label comments in ar-internal.h
     to reflect the changed filenames and reorders the file so that the
     sections are back in filename order.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-rewrite-20160613
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4c76c1a

net: hns: update the dependency · 4a63538e

由 Kejian Yan 提交于 6月 13, 2016

After the patchset about adding support of ACPI (commit id is 63434888)
being applied, HNS does not depend on OF. It depends on OF or ACPI, so
the Kconfig file needs to be updated.
Signed-off-by: NKejian Yan <yankejian@huawei.com>
Signed-off-by: NYisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a63538e

Merge branch 'r8152-phy-adjustments' · 7d7549ed

由 David S. Miller 提交于 6月 14, 2016

Hayes Wang says:

====================
r8152: code adjustment for PHY

These patches are for adjusting the code about PHY and setting speed.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d7549ed

r8152: save the speed · aa7e26b6

由 hayeswang 提交于 6月 13, 2016

The user may change the speed. Use it to replace the default one.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa7e26b6

r8152: move the setting for the default speed · 9d21c0d8

由 hayeswang 提交于 6月 13, 2016

Move calling set_speed() from open() to rtl_hw_phy_work_func_t().
Then, we would set the default speed only for first initialization
or after resuming.

Besides, the set_speed() could handle the flag of PHY_RESET which
would be set in rtl_ops.hw_phy_cfg().
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d21c0d8

r8152: move the settings of PHY to a work queue · a028a9e0

由 hayeswang 提交于 6月 13, 2016

Move the settings of PHY to a work queue and schedule it after
rtl_ops.init().

There are some reasons for this. First, the settings are only
needed for the first time initialization or after the power
down occurs.

Second, the settings are independent with the others.

Last, the settings may take more time than the others. Leave
they in probe() or open() may delay the following flows.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a028a9e0

net/sched: flower: Return error when hw can't offload and skip_sw is set · e8eb36cd

由 Amir Vadai 提交于 6月 13, 2016

When skip_sw is set and hardware fails to apply filter, return error to
user. This will make error propagation logic similar to the one
currently used in u32 classifier.
Also, changed code to use tc_skip_sw() utility function.
Signed-off-by: NAmir Vadai <amirva@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8eb36cd

Merge branch 'bnxt_en-updates' · ce9355ac

由 David S. Miller 提交于 6月 14, 2016

Michael Chan says:

====================
bnxt_en: Updates for net-next.

-Add default VLAN support for VFs.
-Add NPAR (NIC partioning) support.
-Add support for new device 5731x and 5741x. GRO logic is different.
-Support new ETHTOOL_{G|S}LINKSETTINGS.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce9355ac

bnxt_en: Support new ETHTOOL_{G|S}LINKSETTINGS API. · 00c04a92

由 Michael Chan 提交于 6月 13, 2016

To fully support 25G and 50G link settings.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00c04a92

bnxt_en: Don't allow autoneg on cards that don't support it. · 93ed8117

由 Michael Chan 提交于 6月 13, 2016

Some cards do not support autoneg.  The current code does not prevent the
user from enabling autoneg with ethtool on such cards, causing confusion.
Firmware provides the autoneg capability information and we just need to
store it in the support_auto_speeds field in bnxt_link_info struct.
The ethtool set_settings() call will check this field before proceeding
with autoneg.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93ed8117

bnxt_en: Add BCM5731X and BCM5741X device IDs. · b24eb6ae

由 Michael Chan 提交于 6月 13, 2016

Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b24eb6ae

bnxt_en: Add GRO logic for BCM5731X chips. · 94758f8d

由 Michael Chan 提交于 6月 13, 2016

Add bnxt_gro_func_5731x() to handle GRO packets for this chip. The
completion structures used in the new chip have new data to help determine
the header offsets. The offsets can be off by 4 if the packet is an
internal loopback packet (e.g. from one VF to another VF). Some additional
logic is added to adjust the offsets if it is a loopback packet.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94758f8d

bnxt_en: Refactor bnxt_gro_skb(). · 309369c9

由 Michael Chan 提交于 6月 13, 2016

Newer chips require different logic to handle GRO packets. So refactor
the code so that we can call different functions depending on the chip.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

309369c9

bnxt_en: Define the supported chip numbers. · 659c805c

由 Michael Chan 提交于 6月 13, 2016

Define all the supported chip numbers and chip categories.  Store the
chip_num returned by firmware.  If the call to get the version and chip
number fails, we should abort.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

659c805c

bnxt_en: Add PCI device ID for 57404 NPAR devices. · ebcd4eeb

由 Michael Chan 提交于 6月 13, 2016

Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebcd4eeb

bnxt_en: Enable NPAR (NIC Partitioning) Support. · 567b2abe

由 Satish Baddipadige 提交于 6月 13, 2016

NPAR type is read from bnxt_hwrm_func_qcfg.  Do not allow changing link
parameters if in NPAR mode sinc ethe port is shared among multiple
partitions.  The link parameters are set up by firmware.
Signed-off-by: NSatish Baddipadige <sbaddipa@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

567b2abe

bnxt_en: Handle VF_CFG_CHANGE event from firmware. · fc0f1929

由 Michael Chan 提交于 6月 13, 2016

When the VF driver gets this event, the VF configuration has changed (such
as default VLAN).  The VF driver will initiate a silent reset to pick up
the new configuration.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc0f1929

bnxt_en: Add new function bnxt_reset(). · 6988bd92

由 Michael Chan 提交于 6月 13, 2016

When a default VLAN is added to the VF, the VF driver needs to reset to
pick up the default VLAN ID. We can use the same tx timeout reset logic
to do that, without the debug output. This new function, with the
silent parameter to suppress debug output will now serve both purposes.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6988bd92

bnxt_en: Add function for VF driver to query default VLAN. · cf6645f8

由 Michael Chan 提交于 6月 13, 2016

The PF can setup a default VLAN for a VF.  The default VLAN tag is
automatically inserted and stripped without the knowledge of the
stack running on the VF.  The VF driver needs to know that default
VLAN is enabled as VLAN acceleration on the RX side is no longer
supported.  Call netdev_update_features() to fix up the VLAN features
as necessary.  Also, VLAN strip mode must be enabled to strip out
the default VLAN tag.

Only allow VF default VLAN to be set if the firmware spec is >= 1.2.1.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf6645f8

net: ethernet: enic: move to new ethtool api {get|set}_link_ksettings · 7659f50c

由 Philippe Reynes 提交于 6月 12, 2016

The ethtool api {get|set}_settings is deprecated.
We move the enic driver to new api {get|set}_link_ksettings.
Signed-off-by: NPhilippe Reynes <tremyfr@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7659f50c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功