Commits · 4333d619f9e30592426bc1315243fa0754e62c39 · openeuler / Kernel

29 Mar, 2017 33 commits

net: dsa: fix copyright holder · 4333d619

Committed by Vivien Didelot on Mar 28, 2017

I do not hold the copyright of the DSA core and drivers source files,
since these changes have been written as an initiative of my day job.
Fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4333d619

net: dsa: mv88e6xxx: unconditionally set ATU trunk · 64014fe6

Committed by Vivien Didelot on Mar 28, 2017

Set the trunk member of the mv88e6xxx_atu_entry structure regardless of its
value, so that uninitialized structures get the correct boolean value.

Note that no mainline code is affected by the current behavior.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64014fe6

ipv6: add support for NETDEV_RESEND_IGMP event · 382ed724

Committed by Vlad Yasevich on Mar 28, 2017

This patch adds support for the NETDEV_RESEND_IGMP event, similar
to how it works for IPv4.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

382ed724

Merge branch 'dsa-mv88e6xxx-fix-chip-definitions' · 25501837

Committed by David S. Miller on Mar 28, 2017

Vivien Didelot says:

====================
net: dsa: fix chip definitions

The definitions of some of the mv88e6xxx_ops and mv88e6xxx_info
structures are misordered and erroneous for 88E6191 and 88E6391.

This patch series cleans that up.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

25501837

net: dsa: mv88e6xxx: remove 88E6391 ops · 63709570

Committed by Vivien Didelot on Mar 28, 2017

We don't support 88E6391 anywhere in the code, so remove the unused
mv88e6391_ops structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

63709570

net: dsa: mv88e6xxx: fix 88E6191 ops · 2cf4cefb

Committed by Vivien Didelot on Mar 28, 2017

The mv88e6xxx_info structure for the 88E6191 chip was pointing to the
mv88e6391_ops definition instead of mv88e6191_ops. Fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cf4cefb

net: dsa: mv88e6xxx: reorder 88E6341 definitions · 16e329ae

Committed by Vivien Didelot on Mar 28, 2017

The related mv88e6xxx_ops structure was misplaced. Reorder it correctly
to fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

16e329ae

net: dsa: mv88e6xxx: reorder 88E6141 definitions · 990e27b0

Committed by Vivien Didelot on Mar 28, 2017

The related mv88e6xxx_ops and mv88e6xxx_info structures were misplaced.
Reorder them correctly to fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

990e27b0

Merge branch 'qed-load-unload-mfw' · c552a50e

Committed by David S. Miller on Mar 28, 2017

Yuval Mintz says:

====================
qed: load/unload mfw series

This series corrects the unload flow and greatly enhances the
initialization flow in regard to interactions between driver
and management firmware.

Patch #1 makes sure unloading is done under management-firmware's
'critical section' protection.

Patches #2 - #4 move the driver to a newer scheme for loading
in regard to the MFW; this newer scheme would help clean the device
in case a previous instance has dirtied it [preboot, PDA, etc.].

Patches #5 - #6 let the driver inform management-firmware of the number of
resources which are dependent on the non-management firmware used.
Patch #7 then uses a new resource [BDQ] instead of some set value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c552a50e

qed: Use BDQ resource for storage protocols · d0d40a73

Committed by Mintz, Yuval on Mar 28, 2017

Until now, qed used some port-defined value as BDQ index for both iSCSI
and FCoE.

As management firmware now treats BDQ as a resource and tells each PF
its BDQ-range, start using a value from that range instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d40a73

qed: Utilize resource-lock based scheme · 9c8517c4

Committed by Tomer Tayar on Mar 28, 2017

Management firmware is used as an arbiter between the various PFs
in matters of resources, but some of the resources that need to
be divided are dependent on the non-management firmware used,
so management firmware first needs to be told how many resources
there are before trying to divide them.

As part of the initialization sequence, driver would first inform
the management firmware of the available resources under
a dedicated resource lock, and afterwards request for various
resources which might be based on the previous set values.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c8517c4

qed: Support management-based resource locking · 95691c9c

Committed by Tomer Tayar on Mar 28, 2017

Global locking can't properly be used to synchronize between different
PFs in all scenarios, as those instances might reside in different
logical partitions [e.g., when a PF is assigned via PDA to some VM].

The management firmware provides a generic infrastructure for
device locks. For each 'resource', it's guaranteed it could be acquired
by at most a single PF at any given time [or by management firmware].

This patch adds the necessary logic in qed for utilizing said
infrastructure, implementing lock/unlock internal APIs.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95691c9c

qed: Send pf-flr as part of initialization · 18a69e36

Committed by Mintz, Yuval on Mar 28, 2017

During HW initialization, driver would set various registers to their
needed values - but it assumes all registers start at their reset-value,
so there's no need to re-configure a register's default value.

This assumption might be incorrect, e.g., in case of a preboot driver
running and initializing the device prior to our driver.

To overcome this, we now ask management firmware to initiate a PF-flr
early during the initialization sequence. That would return everything
in the PF's scope back to default and prevent previous configurations
from still being applied.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18a69e36

qed: Move to new load request scheme · 5d24bcf1

Committed by Tomer Tayar on Mar 28, 2017

Management firmware is used as an arbiter between the various PFs
in regard to loading - it causes the various PFs to load/unload
sequentially and informs each of its appropriate role in the init.

But the existing flow is too weak to handle some scenarios where
PFs aren't properly cleaned prior to loading.
The significant scenarios falling under these criteria:
  a. Preboot drivers in some environment can't properly unload.
  b. Unexpected driver replacement [kdump, PDA].

Modern management firmware supports a more intricate loading flow,
where the driver has the ability to overcome previous limitations.
This moves qed into using this newer scheme.

Notice that the new scheme is backward compatible, so new drivers would
still be able to load properly on top of older management firmware
and vice versa.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d24bcf1

qed: hw_init() to receive parameter-struct · c0c2d0b4

Committed by Mintz, Yuval on Mar 28, 2017

We'll soon need additional information, so start by changing
the infrastructure to receive the initializing variables
via a parameter struct.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0c2d0b4
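
The motivation for passing a parameter struct can be sketched in plain C. The names below (`hw_init_params`, `hw_init` and all field names) are hypothetical stand-ins and not the actual qed definitions; the point is only that later inputs can be added as fields without changing every caller's argument list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter bundle; field names are illustrative only. */
struct hw_init_params {
	bool start_hw;   /* start the hardware after configuring it */
	int int_mode;    /* requested interrupt mode, must be >= 0 */
};

/* Returns 0 on success, -1 on a NULL or invalid parameter set. */
int hw_init(const struct hw_init_params *params)
{
	if (params == NULL || params->int_mode < 0)
		return -1;
	/* A real driver would program the device here. */
	return 0;
}
```

Adding a new input later means adding one field and one initializer at the call sites that care about it; callers using designated initializers get a zero default for free.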

qed: Correct HW stop flow · 1226337a

Committed by Tomer Tayar on Mar 28, 2017

Management firmware is used as arbiter between different PFs
which are loading/unloading, but in order to use the synchronization
it offers the contending configurations need to be applied either
between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
management firmware commands.

Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
qed_hw_reset() which don't abide by this requirement; most of the closure
is done outside the scope of the unload request.

This patch removes qed_hw_reset() and places the relevant stop
functionality underneath the management firmware protection.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1226337a

Merge branch 'tipc-subscription-refcount-simplifications' · 30b38236

Committed by David S. Miller on Mar 28, 2017

Parthasarathy Bhuvaragan says:

====================
tipc: subscription refcount simplifications

The first patch makes the subscription refcount cleanup lockless and
the second updates the subscription refcount policy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

30b38236

tipc: adjust the policy of holding subscription kref · 7efea60d

Committed by Ying Xue on Mar 28, 2017

When a new subscription object is inserted into name_seq->subscriptions
list, it's under name_seq->lock protection; when a subscription is
deleted from the list, it's also under the same lock protection;
similarly, when accessing a subscription by going through subscriptions
list, the entire process is also protected by the name_seq->lock.

Therefore, if subscription refcount is increased before it's inserted
into subscriptions list, and its refcount is decreased after it's
deleted from the list, it will be unnecessary to hold refcount at all
before accessing subscription object which is obtained by going through
subscriptions list under name_seq->lock protection.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7efea60d

tipc: advance the time of deleting subscription from subscriber->subscrp_list · 139bb36f

Committed by Ying Xue on Mar 28, 2017

After a subscription object is created, it's inserted into its
subscriber's subscrp_list under subscriber lock protection;
similarly, before it's destroyed, it should first be removed from
its subscriber->subscrp_list. Since the subscription list is
accessed with subscriber lock, all the subscriptions are valid
during the lock duration. Hence in tipc_subscrb_subscrp_delete(), we
remove subscription get/put and the extra subscriber unlock/lock.

After this change, the subscriptions refcount cleanup is very simple
and does not access any lock.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

139bb36f

stmmac: use netif_set_real_num_{rx,tx}_queues · 589a1a2e

Committed by Arnd Bergmann on Mar 28, 2017

A driver must not access the two fields directly but should instead use
the helper functions to set the values and keep a consistent internal
state:

ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe':
ethernet/stmicro/stmmac/stmmac_main.c:4083:8: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

Fixes: a8f5102a ("net: stmmac: TX and RX queue priority configuration")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

589a1a2e

soc: qcom: smd-rpm: Add msm8996 compatibility · 2b624250

Committed by Bjorn Andersson on Mar 27, 2017

With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM
driver on top of GLINK for 8996, without any modifications.
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b624250

soc: qcom: smd: Remove standalone driver · 395a4805

Committed by Bjorn Andersson on Mar 27, 2017

Remove the standalone SMD implementation as we have transitioned the
client drivers to use the RPMSG based one.

Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
keep them selectable in the absence of the removed symbol.
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

395a4805

soc: qcom: smd: Transition client drivers from smd to rpmsg · 5052de8d

Committed by Bjorn Andersson on Mar 27, 2017

By moving these client drivers to use RPMSG instead of the direct SMD
API we can reuse them on top of the newly added GLINK wire-protocol
support found in the 820 and 835 Qualcomm platforms.

As the new (RPMSG-based) and old SMD implementations are mutually
exclusive we have to change all client drivers in one commit, to make
sure we have a working system before and after this transition.
Acked-by: Andy Gross <andy.gross@linaro.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5052de8d

vxlan: don't age NTF_EXT_LEARNED fdb entries · def499c9

Committed by Roopa Prabhu on Mar 27, 2017

vxlan driver already implicitly supports installing
of external fdb entries with NTF_EXT_LEARNED. This
patch just makes sure these entries are not aged
by the vxlan driver. An external entity managing these
entries will age them out. This is consistent with
the use of NTF_EXT_LEARNED in the bridge driver.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

def499c9
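
The aging rule described in the commit above can be sketched as a standalone loop. This is not the vxlan driver code: `struct fdb_entry`, `fdb_age()` and `FLAG_EXT_LEARNED` are hypothetical names chosen for illustration, with the flag playing the role that NTF_EXT_LEARNED plays in the commit message.

```c
#include <stddef.h>

/* Illustrative flag bit standing in for NTF_EXT_LEARNED. */
#define FLAG_EXT_LEARNED 0x10u

struct fdb_entry {
	unsigned int flags;
	unsigned long last_used; /* time of last use, arbitrary ticks */
	int live;                /* 1 while the entry is installed */
};

/* Expire entries idle for longer than max_age; returns how many were aged out. */
size_t fdb_age(struct fdb_entry *tbl, size_t n, unsigned long now,
	       unsigned long max_age)
{
	size_t aged = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].live)
			continue;
		/* Externally learned entries are aged by their external owner. */
		if (tbl[i].flags & FLAG_EXT_LEARNED)
			continue;
		if (now - tbl[i].last_used > max_age) {
			tbl[i].live = 0;
			aged++;
		}
	}
	return aged;
}
```

In this sketch an externally learned entry stays installed no matter how long it has been idle, which matches the stated intent that an external entity is responsible for removing it.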

Merge branch 'net-dpipe' · 2a69ca71

Committed by David S. Miller on Mar 28, 2017

Jiri Pirko says:

====================
Add support for pipeline debug (dpipe)

Arkadi says:

While doing the hardware offloading process much of the hardware
specifics cannot be presented. An example of this is the routing
LPM algorithm, which differs in hardware implementation from the
kernel software implementation. The only information the user receives
is whether a specific route is offloaded or not, but they cannot really
understand the underlying implementation nor get the specific statistics
related to that process.

Another example is ACL offload using TC which is commonly implemented
using TCAM memory. Currently there is no capability to gain visibility
into the TCAM structure and to debug suboptimal resource allocation.

This patchset introduces the capability for exporting the ASIC's pipeline
abstraction via the devlink infrastructure, which should serve as a
complementary tool. This infrastructure allows the user to get visibility
into the ASIC by modeling it as a set of match/action tables.

The main objects defined:
Table - abstraction for a single pipeline stage. Contains the
        available match/actions and counter availability.
Entry - entry in a specific table with specific matches/actions
        values and dedicated counter.
Header/field - tuples which describe the table's behavior.

As an example one of the ASIC's L3 blocks will be modeled. The egress
rif (router interface) table is the final step in the L3 pipeline
processing which does match on the internal rif index which was
determined before by the routing logic. The erif table determines
whether to forward or drop the packet and updates the corresponding
rif L3 statistics.

To expose these internal resources a special metadata header will
be introduced that describes the internal information gathered by
the ASIC's pipeline and contains the following fields: rif_port_index,
forward and drop.

Some internal hardware resources have direct mapping to kernel
objects. For example the rif_port_index is mapped to the net-device's
ifindex. By providing this mapping the user gains visibility into
the offloading process.

Follow-up work will include exporting more L3 tables which will give
visibility into the routing process.

First stage is adding support for dpipe in devlink. Next add support
in spectrum driver. Finally implement egress router interface
(erif) table for spectrum ASIC as an example.

---
v1->v2: Please see individual patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2a69ca71

mlxsw: spectrum: Add Support for erif table entries access · 2ba5999f

Committed by Arkadi Sharshevsky on Mar 28, 2017

Implement dpipe's table ops for erif table which provide:
1. Getting the entries in the table with the associated values.
	- match on "mlxsw_meta:erif_index"
	- action on "mlxsw_meta:forwared_out"
2. Synchronizing the hardware in case of enabling/disabling counters, which
   means removing erif counters from all interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ba5999f

mlxsw: spectrum_router: Add rif helper functions · fd1b9d41

Committed by Arkadi Sharshevsky on Mar 28, 2017

Add rif helper functions to access the rif index and the rif device's ifindex.
These functions will be used by dpipe in order to dump the rif table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd1b9d41

mlxsw: spectrum: Support for counters on router interfaces · e0c0afd8

Committed by Arkadi Sharshevsky on Mar 28, 2017

Add support for counter allocation on router interfaces. The allocation
depends on the counter state of the relevant table. If counting is
disabled or no counters are left, the counter index is set as invalid.

Also a counter pool for router allocation is added.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0c0afd8

mlxsw: reg: Add Router Interface Counter Register · ba73e97a

Committed by Arkadi Sharshevsky on Mar 28, 2017

The RICNT register retrieves per-port performance counters. It will be
used to query the router interface statistics.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba73e97a

mlxsw: spectrum: Add definition for egress rif table · d54b70fe

Committed by Arkadi Sharshevsky on Mar 28, 2017

Add definition for egress router interface table. This table describes
the final part in the routing pipeline. This table matches the egress
interface index (rif index, which is set by the previous stages and
determines the out port) and makes the decision of forwarding the packet
towards the L2 logic or dropping it.

The metadata header is added to represent this internal information.
The rif index field is mapped logically to netdevice ifindex.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d54b70fe

mlxsw: spectrum: Add placeholder for dpipe · 230ead01

Committed by Arkadi Sharshevsky on Mar 28, 2017

Add placeholder for dpipe. Support for specific tables and headers will
be introduced in following patches. The headers are shared between all
mlxsw_sp instances.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

230ead01

mlxsw: reg: Add counter fields to RITR register · 0f630fcb

Committed by Arkadi Sharshevsky on Mar 28, 2017

Update RITR for counter support. This allows adding counters for
ASIC's router ports.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f630fcb

devlink: Support for pipeline debug (dpipe) · 1555d204

Committed by Arkadi Sharshevsky on Mar 28, 2017

The pipeline debug is used to export the pipeline abstractions for the
main objects - tables, headers and entries. The only support for set is
for changing the counter parameter on specific table.

The basic structures:

Header - can represent a real protocol header information or internal
         metadata. Generic protocol headers like IPv4 can be shared
         between drivers. Each driver can add local headers.

Field - part of a header. Can represent protocol field or specific ASIC
        metadata field. Hardware special metadata fields can be mapped
        to different resources, for example switch ASIC ports can have
        internal number which from the system's point of view is mapped
        to netdevice ifindex.

Match - represents specific match rule. Can describe match on specific
        field or header. The header index should be specified as well
        in order to support several header instances of the same type
        (tunneling).

Action - represents specific action rule. Actions can describe operations
         on specific field values for example like set, increment, etc.
         And header operation like add and delete.

Value - represents value which can be associated with specific match or
        action.

Table - represents a hardware block which can be described with match/
        action behavior. The match/action can be done on the packet's
        data or on the internal metadata that it gathered along the
        packet's traversal through the pipeline, which is vendor specific
        and should be exported in order to provide understanding of
        ASIC's behavior.

Entry - represents single record in a specific table. The entry is
        identified by specific combination of values for match/action.

Prior to accessing the tables/entries the drivers provide the header/
field data base which is used by driver to user-space. The data base
is split between the shared headers and unique headers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1555d204
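
The header/field/table/entry vocabulary from the commit above can be modeled in a few lines of C. This is a rough sketch only: the `dp_*` types and `dp_table_lookup()` are hypothetical and are not the devlink structures or API. It models a single-field exact-match table, such as an egress rif table that matches on an index and either forwards or drops.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; these are not the devlink definitions. */
struct dp_field { const char *name; unsigned int bitwidth; };
struct dp_header { const char *name; const struct dp_field *fields; size_t n_fields; };

struct dp_entry {
	uint32_t match_value;  /* value of the matched field */
	bool action_forward;   /* true: forward, false: drop */
	uint64_t counter;      /* packets that hit this entry */
};

struct dp_table {
	const char *name;
	const struct dp_header *match_header;
	size_t match_field;    /* index into match_header->fields */
	bool counters_enabled;
	struct dp_entry *entries;
	size_t n_entries;
};

/* Look up value in the table; returns the entry hit, or NULL on a miss. */
struct dp_entry *dp_table_lookup(struct dp_table *t, uint32_t value)
{
	for (size_t i = 0; i < t->n_entries; i++) {
		if (t->entries[i].match_value == value) {
			if (t->counters_enabled)
				t->entries[i].counter++;
			return &t->entries[i];
		}
	}
	return NULL;
}
```

The per-table `counters_enabled` switch mirrors the one settable parameter the commit message mentions; everything else in the model is read-only from the user's point of view.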

28 Mar, 2017 7 commits

Merge tag 'mlx5e-failsafe' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · cc628c96

Committed by David S. Miller on Mar 27, 2017

Saeed Mahameed says:

====================
mlx5e-failsafe 27-03-2017

This series provides a fail-safe mechanism to allow safely re-configuring
the mlx5e netdevice and provides resiliency against sporadic
configuration failures.

To enable this we do some refactoring and code reorganizing to allow
breaking the driver's open/close flows into stages:
      open -> activate -> deactivate -> close.

In addition we need to allow creating fresh HW ring resources
(mlx5e_channels) with their own "new" set of parameters, while keeping
the current ones running and active until the new channels are
successfully created with the new configuration, and only then can we
safely replace (switch) old channels with new ones.

For that we introduce the mlx5e_channels object and an API to manage it:
 - channels = open_channels(new_params):
   open fresh TX/RX channels
 - activate_channels(channels):
   redirect traffic to them and attach them to the netdev
 - deactivate_channels(channels):
   stop traffic and detach from netdev
 - close(channels):
   free the TX/RX HW resources of those channels

With the above strategy it is straightforward to achieve the desired
behavior of fail-safe configuration.  In pseudo code:

make_new_config(new_params)
{
	old_channels = current_active_channels;
	new_channels = create_channels(new_params);
	if (!new_channels)
		return "Failed, but current channels are still active :)";

	deactivate_channels(old_channels); /* Can't fail */
	set_hw_new_state();                /* If needed  */
	activate_channels(new_channels);   /* Can't fail */
	close_channels(old_channels);
	current_active_channels = new_channels;

	return "SUCCESS";
}

At the top of this series, we change the following flows to be fail-safe:
ethtool:
   - ring parameters
   - coalesce parameters
   - tx copy break parameters
   - cqe compressing/moderation mode setting (priv flags)
ndos:
   - tc setup
   - set features: LRO
   - change mtu
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cc628c96
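
The fail-safe switch in the pseudo code above can be written as compilable C. This is a minimal sketch, not the mlx5e implementation: `struct params`, `struct channels` and `switch_channels()` are hypothetical, and "opening" a channel set is reduced to an allocation that fails on an invalid channel count.

```c
#include <stdlib.h>

struct params { int num_channels; };
struct channels { struct params params; int active; };

/* Allocate a fresh channel set; returns NULL on invalid params or OOM. */
static struct channels *open_channels(const struct params *p)
{
	struct channels *chs;

	if (p->num_channels <= 0)
		return NULL;
	chs = malloc(sizeof(*chs));
	if (!chs)
		return NULL;
	chs->params = *p;
	chs->active = 0;
	return chs;
}

static void activate_channels(struct channels *chs)   { chs->active = 1; }
static void deactivate_channels(struct channels *chs) { chs->active = 0; }
static void close_channels(struct channels *chs)      { free(chs); }

/* Swap *cur for a set built from new_params; *cur is untouched on failure. */
int switch_channels(struct channels **cur, const struct params *new_params)
{
	struct channels *old_chs = *cur;
	struct channels *new_chs = open_channels(new_params);

	if (!new_chs)
		return -1; /* the old channels are still active */

	if (old_chs)
		deactivate_channels(old_chs); /* cannot fail */
	activate_channels(new_chs);           /* cannot fail */
	if (old_chs)
		close_channels(old_chs);
	*cur = new_chs;
	return 0;
}
```

The only step that can fail, creating the new channels, runs before anything is torn down, so a failed reconfiguration leaves the running configuration as it was.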

Merge branch 'bond-link-status-fixes' · 95ed0edd

Committed by David S. Miller on Mar 27, 2017

Mahesh Bandewar says:

====================
link-status fixes for mii-monitoring

The mii monitoring is divided into two phases - inspect and commit. The
inspect phase technically should not make any changes to the state and
should defer them to the commit phase. However, we detected link state
inconsistencies on several machines and discovered that they are the
result of some inconsistent updates to link states and the assumption
that you *always* get rtnl-mutex. In reality, when trylock() fails to
acquire rtnl-mutex, the commit phase is postponed until the next mii-mon
run. At the next round, because of the state change performed in the
previous inspect-run, no changes are detected and the commit phase is
skipped. This results in an inconsistent state until the next link event
happens (if it ever happens).

During the commit phase, it's always assumed that the speed and duplex
fetch is successful, but that's not always the case. However, the
slave state is marked UP irrespective of the speed / duplex fetch result.
If the speed / duplex fetch operation results in insane values for either
of these two fields, then keeping the internal link state UP is not going
to provide fruitful results either.

Please see the individual patches for more details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

95ed0edd

bonding: avoid printing while holding a spinlock · e292dcae

Committed by Mahesh Bandewar on Mar 27, 2017

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e292dcae

bonding: correctly update link status during mii-commit phase · b5bf0f5b

Committed by Mahesh Bandewar on Mar 27, 2017

bond_miimon_commit() marks the link UP after attempting to get the speed
and duplex settings for the link. There is a possibility that
bond_update_speed_duplex() could fail. This is another place where it
could result in an inconsistent bonding link state.

With this patch the link will be marked UP, and processed further, only
if the speed and duplex values retrieved are sane.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5bf0f5b

bonding: make speed, duplex setting consistent with link state · c4adfc82

Committed by Mahesh Bandewar on Mar 27, 2017

bond_update_speed_duplex() retrieves speed and duplex settings. There
is a possibility of failure in retrieving these values but caller has
to assume it's always successful. This leads to having inconsistent
slave link settings. If these (speed, duplex) values cannot be
retrieved, then keeping the link UP causes problems.

The updated bond_update_speed_duplex() returns 0 on success if it
retrieves sane values for speed and duplex. On failure it returns 1
and marks the link down.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4adfc82
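
The return convention described in the commit above (0 on sane values, 1 on failure with the link marked down) can be sketched outside the kernel. The code below is not the bonding driver: `struct slave`, `update_speed_duplex()` and `commit_link_up()` are hypothetical, the device query is replaced by explicit arguments, and the two `*_UNKNOWN` values are illustrative placeholders.

```c
#include <stdbool.h>

#define SPEED_UNKNOWN  (-1)  /* illustrative placeholder */
#define DUPLEX_UNKNOWN 0xff  /* illustrative placeholder */

enum link_state { LINK_DOWN, LINK_UP };

struct slave {
	int speed;             /* Mb/s, or SPEED_UNKNOWN */
	int duplex;            /* 0 = half, 1 = full, or DUPLEX_UNKNOWN */
	enum link_state link;
};

/*
 * query_ok, speed and duplex stand in for the result of asking the device.
 * Returns 0 and stores the values when they are sane; otherwise returns 1,
 * resets both fields to "unknown" and marks the link down.
 */
int update_speed_duplex(struct slave *s, bool query_ok, int speed, int duplex)
{
	s->speed = SPEED_UNKNOWN;
	s->duplex = DUPLEX_UNKNOWN;

	if (!query_ok || speed <= 0 || (duplex != 0 && duplex != 1)) {
		s->link = LINK_DOWN;
		return 1;
	}
	s->speed = speed;
	s->duplex = duplex;
	return 0;
}

/* Commit-phase caller: mark the link UP only when the values are sane. */
void commit_link_up(struct slave *s, bool query_ok, int speed, int duplex)
{
	if (update_speed_duplex(s, query_ok, speed, duplex) == 0)
		s->link = LINK_UP;
}
```

With this shape the caller can no longer end up with a link that is UP while its speed and duplex are unknown, which is the inconsistency the series describes.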

bonding: improve link-status update in mii-monitoring · de77ecd4

Committed by Mahesh Bandewar on Mar 27, 2017

The primary issue is that mii-inspect phase updates link-state and
expects changes to be committed during the mii-commit phase. After
the inspect phase if it fails to acquire rtnl-mutex, the commit
phase (bond_mii_commit) doesn't get to run. This partially updated
state stays and makes the internal-state inconsistent.

e.g. setup bond0 => slaves: eth1, eth2
eth1 goes DOWN -> UP
   mii_monitor()
	mii-inspect()
	    bond_set_slave_link_state(eth1, UP, DontNotify)
	rtnl_trylock() <- fails!

Next mii-monitor round
eth1: No change
   mii_monitor()
	mii-inspect()
	    eth1->link == current-status (ethtool_ops->get_link)
	    no-change-detected

End result:
    eth1:
      Link = BOND_LINK_UP
      Speed = 0xfffff  [SpeedUnknown]
      Duplex = 0xff    [DuplexUnknown]

This doesn't always happen but for some unlucky machines in a large set
of machines it creates problems.

The fix for this is to avoid making changes during inspect phase and
postpone them until acquiring the rtnl-mutex / invoking commit phase.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de77ecd4
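
The fix described in the commit above, where the inspect phase only records a proposal and the commit phase applies it under the lock, can be sketched as follows. This is an illustration under assumptions, not the bonding code: `struct slave`, `mii_inspect()`, `mii_commit()` and `mii_monitor()` are hypothetical, and the outcome of the trylock is passed in as a boolean.

```c
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };
#define LINK_NOCHANGE (-1)

struct slave {
	enum link_state link; /* committed state */
	int new_link;         /* proposed state, or LINK_NOCHANGE */
};

/* Inspect: record a proposal only; returns true if a commit is needed. */
bool mii_inspect(struct slave *s, enum link_state carrier)
{
	s->new_link = (carrier != s->link) ? (int)carrier : LINK_NOCHANGE;
	return s->new_link != LINK_NOCHANGE;
}

/* Commit: apply the proposal; call only with the lock held. */
void mii_commit(struct slave *s)
{
	if (s->new_link != LINK_NOCHANGE) {
		s->link = (enum link_state)s->new_link;
		s->new_link = LINK_NOCHANGE;
	}
}

/* One monitor round; lock_acquired models the result of a trylock. */
void mii_monitor(struct slave *s, enum link_state carrier, bool lock_acquired)
{
	if (mii_inspect(s, carrier) && lock_acquired)
		mii_commit(s);
}
```

Because inspect never touches the committed state, a round that loses the trylock leaves `link` unchanged, and the next round detects the same difference again and retries the commit.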

bonding: split bond_set_slave_link_state into two parts · f307668b

Committed by Mahesh Bandewar on Mar 27, 2017

Split the function into two phases, (a) propose and (b) commit, without
changing the semantics of the original API.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f307668b
