1. 08 Feb, 2019 22 commits
    • D
      Merge branch 'smc-next' · f06f095f
      David S. Miller committed
      Ursula Braun says:
      
      ====================
      net/smc: patches 2019-02-07
      
      here are patches for SMC:
      * patches 1, 3, and 6 are cleanups without functional change
      * patch 2 postpones closing of internal clcsock
      * patches 4 and 5 improve link group creation locking
      * patch 7 restores AF_SMC as diag_family field
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f06f095f
    • K
      net/smc: original socket family in inet_sock_diag · 232dc8ef
      Karsten Graul committed
      Commit ed75986f ("net/smc: ipv6 support for smc_diag.c") changed the
      value of the diag_family field. The idea was to indicate the family of
      the IP address in the inet_diag_sockid field. But the change makes it
      impossible to distinguish an inet_sock_diag response message from SMC
      sock_diag response. This patch restores the original behaviour and sends
      AF_SMC as value of the diag_family field.
      
      Fixes: ed75986f ("net/smc: ipv6 support for smc_diag.c")
      Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      232dc8ef
    • K
      net/smc: move code to clear the conn->lgr field · 8fc002b0
      Karsten Graul committed
      The lgr field of an smc_connection is set in smc_conn_create() and
      should be cleared in smc_conn_free() for consistency reasons, so move
      the responsible code.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8fc002b0
    • H
      net/smc: use client and server LGR pending locks for SMC-R · 72a36a8a
      Hans Wippel committed
      If SMC client and server connections are both established at the same
      time, smc_connect_rdma() cannot send a CLC confirm message while
      smc_listen_work() is waiting for one due to lock contention. This can
      result in timeouts in smc_clc_wait_msg() and failed SMC connections.
      
      In case of SMC-R, there are two types of LGRs (client and server LGRs)
      which can be protected by separate locks. So, this patch splits the LGR
      pending lock into two separate locks for client and server to avoid the
      locking issue for SMC-R.
      Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72a36a8a
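The lock split described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: it uses pthread mutexes instead of kernel mutexes, and the names `lgr_role`, `client_lgr_pending`, `server_lgr_pending`, and `lgr_pending_lock()` are assumptions chosen for illustration. The point is only that a client-side path no longer blocks on a lock held by a server-side path that is waiting for a message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical role tag; SMC-R distinguishes client and server LGRs. */
enum lgr_role { LGR_CLIENT, LGR_SERVER };

/* One pending lock per role instead of a single shared lock. */
static pthread_mutex_t client_lgr_pending = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t server_lgr_pending = PTHREAD_MUTEX_INITIALIZER;

/* Pick the pending lock that protects link groups of the given role. */
static pthread_mutex_t *lgr_pending_lock(enum lgr_role role)
{
    assert(role == LGR_CLIENT || role == LGR_SERVER);
    return role == LGR_CLIENT ? &client_lgr_pending : &server_lgr_pending;
}

/*
 * Simulate the contended case: the server side holds its pending lock
 * (as if waiting for a CLC confirm message) while the client side tries
 * to take its own lock. With separate locks the client is not blocked.
 */
static bool client_can_proceed_while_server_waits(void)
{
    bool ok;

    pthread_mutex_lock(lgr_pending_lock(LGR_SERVER));
    ok = pthread_mutex_trylock(lgr_pending_lock(LGR_CLIENT)) == 0;
    if (ok)
        pthread_mutex_unlock(lgr_pending_lock(LGR_CLIENT));
    pthread_mutex_unlock(lgr_pending_lock(LGR_SERVER));
    return ok;
}
```

With a single shared lock, the `trylock` in the sketch would fail, which corresponds to the timeout scenario the commit message describes.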
    • H
      net/smc: unlock LGR pending lock earlier for SMC-D · 62c7139f
      Hans Wippel committed
      If SMC client and server connections are both established at the same
      time, smc_connect_ism() cannot send a CLC confirm message while
      smc_listen_work() is waiting for one due to lock contention. This can
      result in timeouts in smc_clc_wait_msg() and failed SMC connections.
      
      In case of SMC-D, the LGR pending lock is not needed while
      smc_listen_work() is waiting for the CLC confirm message. So, this patch
      releases the lock earlier for SMC-D to avoid the locking issue.
      Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62c7139f
    • U
      net/smc: use smc_curs_copy() for SMC-D · a225d2cd
      Ursula Braun committed
      SMC already provides a wrapper for atomic64 calls to be
      architecture independent. Use this wrapper for SMC-D as well.
      Reported-by: Jens Remus <jremus@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a225d2cd
    • U
      net/smc: postpone release of clcsock · b03faa1f
      Ursula Braun committed
      According to RFC7609 (http://www.rfc-editor.org/info/rfc7609)
      first the SMC-R connection is shut down and then the normal TCP
      connection FIN processing drives cleanup of the internal TCP connection.
      The unconditional release of the clcsock during active socket closing
      has to be postponed if the peer has not yet signalled socket closing.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b03faa1f
    • U
      s390/net: move pnet constants · 41c80be2
      Ursula Braun committed
      There is no need to define these PNETID related constants in
      the pnet.h file, since they are just used locally within pnet.c.
      Just code cleanup, no functional change.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41c80be2
    • P
      net: vxlan: Free a leaked vetoed multicast rdst · fc4aa1ca
      Petr Machata committed
      When an rdst is rejected by a driver, the current code removes it from
      the remote list, but neglects to free it. This is triggered by
      tools/testing/selftests/drivers/net/mlxsw/vxlan_fdb_veto.sh and shows as
      the following kmemleak trace:
      
      unreferenced object 0xffff88817fa3d888 (size 96):
        comm "softirq", pid 0, jiffies 4372702718 (age 165.252s)
        hex dump (first 32 bytes):
          02 00 00 00 c6 33 64 03 80 f5 a2 61 81 88 ff ff  .....3d....a....
          06 df 71 ae ff ff ff ff 0c 00 00 00 04 d2 6a 6b  ..q...........jk
        backtrace:
          [<00000000296b27ac>] kmem_cache_alloc_trace+0x1ae/0x370
          [<0000000075c86dc6>] vxlan_fdb_append.part.12+0x62/0x3b0 [vxlan]
          [<00000000e0414b63>] vxlan_fdb_update+0xc61/0x1020 [vxlan]
          [<00000000f330c4bd>] vxlan_fdb_add+0x2e8/0x3d0 [vxlan]
          [<0000000008f81c2c>] rtnl_fdb_add+0x4c2/0xa10
          [<00000000bdc4b270>] rtnetlink_rcv_msg+0x6dd/0x970
          [<000000006701f2ce>] netlink_rcv_skb+0x290/0x410
          [<00000000c08a5487>] rtnetlink_rcv+0x15/0x20
          [<00000000d5f54b1e>] netlink_unicast+0x43f/0x5e0
          [<00000000db4336bb>] netlink_sendmsg+0x789/0xcd0
          [<00000000e1ee26b6>] sock_sendmsg+0xba/0x100
          [<00000000ba409802>] ___sys_sendmsg+0x631/0x960
          [<000000003c332113>] __sys_sendmsg+0xea/0x180
          [<00000000f4139144>] __x64_sys_sendmsg+0x78/0xb0
          [<000000006d1ddc59>] do_syscall_64+0x94/0x410
          [<00000000c8defa9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Move vxlan_dst_free() up and schedule a call thereof to plug this leak.
      
      Fixes: 61f46fe8 ("vxlan: Allow vetoing of FDB notifications")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc4aa1ca
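The leak pattern described in this commit (an entry unlinked from a list on rejection but never freed) can be sketched in plain user-space C. This is a simplified illustration, not the vxlan code: the types `rdst` and `fdb`, the helper names, the `live_rdsts` counter, and the `accept_even_ids` veto callback are all hypothetical, and the sketch frees the entry immediately, whereas the commit message says the kernel schedules the call to vxlan_dst_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a remote destination and its FDB entry. */
struct rdst { int id; struct rdst *next; };
struct fdb { struct rdst *remotes; };

/* Counts allocated-but-not-freed entries so a leak becomes visible. */
static int live_rdsts;

static struct rdst *rdst_append(struct fdb *f, int id)
{
    struct rdst *rd = calloc(1, sizeof(*rd));

    if (!rd)
        return NULL;
    rd->id = id;
    rd->next = f->remotes;
    f->remotes = rd;
    live_rdsts++;
    return rd;
}

/* Remove rd from the remote list without freeing it. */
static void rdst_unlink(struct fdb *f, struct rdst *rd)
{
    for (struct rdst **pp = &f->remotes; *pp; pp = &(*pp)->next) {
        if (*pp == rd) {
            *pp = rd->next;
            return;
        }
    }
}

static void rdst_free(struct rdst *rd)
{
    assert(rd);
    free(rd);
    live_rdsts--;
}

/* Add a remote; if the driver vetoes it, unlink AND free it. */
static bool fdb_add_remote(struct fdb *f, int id, bool (*driver_accepts)(int))
{
    struct rdst *rd = rdst_append(f, id);

    if (!rd)
        return false;
    if (!driver_accepts(id)) {
        rdst_unlink(f, rd);
        rdst_free(rd); /* omitting this step reproduces the leak */
        return false;
    }
    return true;
}

/* Example veto policy for the sketch: only even ids are accepted. */
static bool accept_even_ids(int id)
{
    return id % 2 == 0;
}
```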
    • D
      Merge branch 'devlink-health' · 0739d24d
      David S. Miller committed
      Eran Ben Elisha says:
      
      ====================
      Devlink health reporting and recovery system
      
      The health mechanism is targeted at real-time alerting, in order to know when
      something bad has happened to a PCI device:
      - Provide alert debug information
      - Self healing
      - If problem needs vendor support, provide a way to gather all needed debugging
        information.
      
      The main idea is to unify and centralize driver health reports in the
      generic devlink instance and allow the user to set different
      attributes of the health reporting and recovery procedures.
      
      The devlink health reporter:
      The device driver creates a "health reporter" for each error/health type.
      An error/health type can be known/generic (e.g. pci error, fw error, rx/tx error)
      or unknown (driver specific).
      For each registered health reporter a driver can issue error/health reports
      asynchronously. All health reports handling is done by devlink.
      Device driver can provide specific callbacks for each "health reporter", e.g.
       - Recovery procedures
       - Diagnostics and object dump procedures
       - OOB initial attributes
      Different parts of the driver can register different types of health reporters
      with different handlers.
      
      Once an error is reported, devlink health will do the following actions:
        * A log is being sent to the kernel trace events buffer
        * Health status and statistics are being updated for the reporter instance
        * Object dump is being taken and saved at the reporter instance (as long as
          there is no other dump which is already stored)
        * Auto recovery attempt is being done. Depends on:
          - Auto-recovery configuration
          - Grace period vs. time passed since last recover
      
      The user interface:
      The user can access/change each reporter's attributes and driver-specific callbacks
      via devlink, e.g. per error type (per health reporter):
       - Configure reporter's generic attributes (like: Disable/enable auto recovery)
       - Invoke recovery procedure
       - Run diagnostics
       - Object dump
      
      The devlink health interface (via netlink):
      DEVLINK_CMD_HEALTH_REPORTER_GET
        Retrieves status and configuration info per DEV and reporter.
      DEVLINK_CMD_HEALTH_REPORTER_SET
        Allows reporter-related configuration setting.
      DEVLINK_CMD_HEALTH_REPORTER_RECOVER
        Triggers a reporter's recovery procedure.
      DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
        Retrieves diagnostics data from a reporter on a device.
      DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
        Retrieves the last stored dump. Devlink health
        saves a single dump. If a dump is not already stored by devlink
        for this reporter, devlink generates a new dump.
        Dump output is defined by the reporter.
      DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
        Clears the last saved dump file for the specified reporter.
      
                                                     netlink
                                            +--------------------------+
                                            |                          |
                                            |            +             |
                                            |            |             |
                                            +--------------------------+
                                                         |request for ops
                                                         |(diagnose,
       mlx5_core                             devlink     |recover,
                                                         |dump)
      +--------+                            +--------------------------+
      |        |                            |    reporter|             |
      |        |                            |  +---------v----------+  |
      |        |   ops execution            |  |                    |  |
      |     <----------------------------------+                    |  |
      |        |                            |  |                    |  |
      |        |                            |  + ^------------------+  |
      |        |                            |    | request for ops     |
      |        |                            |    | (recover, dump)     |
      |        |                            |    |                     |
      |        |                            |  +-+------------------+  |
      |        |     health report          |  | health handler     |  |
      |        +------------------------------->                    |  |
      |        |                            |  +--------------------+  |
      |        |     health reporter create |                          |
      |        +---------------------------->                          |
      +--------+                            +--------------------------+
      
      In this patchset, mlx5e TX reporter is implemented.
      
      Cmdline format:
          devlink health show [DEV reporter REPORTER_NAME]
          devlink health recover DEV reporter REPORTER_NAME
          devlink health diagnose DEV reporter REPORTER_NAME
          devlink health dump show DEV reporter REPORTER_NAME
          devlink health dump clear DEV reporter REPORTER_NAME
          devlink health set DEV reporter REPORTER_NAME NAME VALUE
      
      Cmdline examples:
      $devlink health show
      pci/0000:00:09.0:
        name tx
          state healthy #err 1 #recover 0 last_dump_ts N/A
          parameters:
            grace_period 500 auto_recover false
      
      $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
      {
          "SQs": [ {
                  "sqn": 138,
                  "HW state": 1,
                  "stopped": false
              },{
                  "sqn": 142,
                  "HW state": 1,
                  "stopped": false
              } ]
      }
      
      $devlink health diagnose pci/0000:00:09.0 reporter tx
      SQs:
        sqn: 138 HW state: 1 stopped: false
        sqn: 142 HW state: 1 stopped: false
      
      $devlink health recover pci/0000:00:09 reporter tx
      
      $devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
      
      $devlink health set pci/0000:00:09.0 reporter tx auto_recover false
      
      Changelog:
      v4:
      - Rebase on latest net-next
      - Remove trace_devlink_health signature exposure in case CONFIG_NET_DEVLINK is
        not defined as it shall only be used from devlink.
      
      v3:
      - Redesign of devlink <-> driver fmsg API
      - Various bug fixes
      
      v2:
      - Remove FW* reporters to decrease the amount of patches in the patchset
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0739d24d
    • A
      devlink: Add Documentation/networking/devlink-health.txt · db2ab7a0
      Aya Levin committed
      This patch adds a new file to add information about devlink health
      mechanism.
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db2ab7a0
    • E
      net/mlx5e: Add tx timeout support for mlx5e tx reporter · 7d91126b
      Eran Ben Elisha committed
      With this patch, the ndo_tx_timeout callback is redirected to the tx
      reporter in order to detect a tx timeout error and report it to
      devlink health. (The watchdog detects tx timeouts, but the driver verifies
      that the issue still exists before launching any recover method.)
      
      In addition, recovery from a tx timeout caused by a lost interrupt was added
      to the tx reporter recover method. Recovery from a tx timeout caused by a lost
      interrupt is not a new feature in the driver; this patch re-organizes the
      functionality and moves it to the tx reporter recovery flow.
      
      tx timeout example:
      (with auto_recover set to false, if set to true, the manual recover and
      diagnose sections are irrelevant)
      
      $cat /sys/kernel/debug/tracing/trace
      ...
      devlink_health_report: bus_name=pci dev_name=0000:00:09.0
      driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
      CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000
      
      $devlink health show
      pci/0000:00:09.0:
        name tx
          state healthy #err 1 #recover 0 last_dump_ts N/A
          parameters:
            grace_period 500 auto_recover false
      
      $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
      {
          "SQs": [ {
                  "sqn": 138,
                  "HW state": 1,
                  "stopped": true
              },{
                  "sqn": 142,
                  "HW state": 1,
                  "stopped": false
              } ]
      }
      
      $devlink health diagnose pci/0000:00:09.0 reporter tx
      SQs:
        sqn: 138 HW state: 1 stopped: true
        sqn: 142 HW state: 1 stopped: false
      
      $devlink health recover pci/0000:00:09 reporter tx
      $devlink health show
      pci/0000:00:09.0:
        name tx
          state healthy #err 1 #recover 1 last_dump_ts N/A
          parameters:
            grace_period 500 auto_recover false
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d91126b
    • E
      net/mlx5e: Add tx reporter support · de8650a8
      Eran Ben Elisha committed
      Add mlx5e tx reporter to devlink health reporters. This reporter will be
      responsible for diagnosing, reporting and recovering of tx errors.
      This patch declares the TX reporter operations and creates it using the
      devlink health API. Currently, this reporter supports reporting and
      recovering from send error CQE only. In addition, it adds diagnose
      information for the open SQs.
      
      For a local SQ recover (due to driver error report), in case of SQ recover
      failure, the recover operation will be considered as a failure.
      For a full tx recover, an attempt to close and open the channels will be
      done. If this one passed successfully, it will be considered as a
      successful recover.
      
      The SQ recover from error CQE flow is not a new feature in the driver;
      this patch re-organizes the functions and adapts them for the devlink
      health API. For this purpose, it moves code from en_main.c to a new file
      named reporter_tx.c.
      
      Diagnose output:
      $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
      {
          "SQs": [ {
                  "sqn": 138,
                  "HW state": 1,
                  "stopped": false
              },{
                  "sqn": 142,
                  "HW state": 1,
                  "stopped": false
              } ]
      }
      
      $devlink health diagnose pci/0000:00:09.0 reporter tx
      SQs:
        sqn: 138 HW state: 1 stopped: false
        sqn: 142 HW state: 1 stopped: false
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de8650a8
    • E
      devlink: Add health dump {get,clear} commands · 35455e23
      Eran Ben Elisha committed
      Add devlink health dump commands, in order to run a dump operation
      over a specific reporter.
      
      The supported operations are dump_get, to get the last saved
      dump (if none exists, take a dump now), and dump_clear, to clear the
      last saved dump.
      
      It is expected from driver's callback for diagnose command to fill it
      via the devlink fmsg API. Devlink will parse it and convert it to
      netlink nla API in order to pass it to the user.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35455e23
    • E
      devlink: Add health diagnose command · fca42a27
      Eran Ben Elisha committed
      Add devlink health diagnose command, in order to run a diagnose
      operation over a specific reporter.
      
      It is expected from driver's callback for diagnose command to fill it
      via the devlink fmsg API. Devlink will parse it and convert it to
      netlink nla API in order to pass it to the user.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fca42a27
    • E
      devlink: Add health recover command · 20a0943a
      Eran Ben Elisha committed
      Add devlink health recover command to the uapi, in order to allow the user
      to execute a recover operation over a specific reporter.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20a0943a
    • E
      devlink: Add health set command · a1e55ec0
      Eran Ben Elisha committed
      Add devlink health set command, in order to set configuration parameters
      for a specific reporter.
      Supported parameters are:
      - graceful_period: Time interval between auto recoveries (in msec)
      - auto_recover: Determines if the devlink shall execute recover upon
      		receiving error for the reporter
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1e55ec0
    • E
      devlink: Add health get command · 7afe335a
      Eran Ben Elisha committed
      Add devlink health get command to provide reporter/s data for user space.
      Add the ability to get data per reporter or dump data from all available
      reporters.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7afe335a
    • E
      devlink: Add health report functionality · c8e1da0b
      Eran Ben Elisha committed
      Upon discovering an error, a driver can report it to the devlink health
      mechanism via devlink_health_report function, using the appropriate
      reporter registered to it. Driver can pass error specific context which
      will be delivered to it as part of the dump / recovery callbacks.
      
      Once an error is reported, devlink health will do the following actions:
      * A log is being sent to the kernel trace events buffer
      * Health status and statistics are being updated for the reporter instance
      * Object dump is being taken and stored at the reporter instance (as long
        as there is no other dump which is already stored)
      * Auto recovery attempt is being done. Depends on:
        - Auto Recovery configuration
        - Grace period vs. Time since last recover
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8e1da0b
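The report-handling steps listed in this commit can be sketched as a small state update. This is a user-space illustration under stated assumptions, not the devlink implementation: the `reporter` struct, its field names, and `health_report()` are hypothetical, the trace-event step is omitted, and the grace period is interpreted here as "skip auto recovery if the last recovery happened less than `grace_period_ms` ago".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-reporter state; field names are illustrative only. */
struct reporter {
    bool auto_recover;          /* auto-recovery configuration */
    uint64_t grace_period_ms;   /* minimum gap between auto recoveries */
    uint64_t last_recovery_ms;  /* timestamp of the last recovery */
    bool recovered_before;      /* whether last_recovery_ms is valid */
    bool dump_stored;           /* a single dump is kept per reporter */
    unsigned int error_count;
    unsigned int recovery_count;
};

/*
 * Handle one error report at time now_ms.
 * Returns true if an auto-recovery attempt was made.
 */
static bool health_report(struct reporter *r, uint64_t now_ms)
{
    assert(r);

    /* Update health statistics for the reporter instance. */
    r->error_count++;

    /* Take a dump only if no other dump is already stored. */
    if (!r->dump_stored)
        r->dump_stored = true;

    /* Auto recovery depends on configuration ... */
    if (!r->auto_recover)
        return false;

    /* ... and on the grace period versus time since the last recovery. */
    if (r->recovered_before &&
        now_ms - r->last_recovery_ms < r->grace_period_ms)
        return false;

    r->recovery_count++;
    r->last_recovery_ms = now_ms;
    r->recovered_before = true;
    return true;
}
```

With `grace_period_ms` set to 500, a report at 1000 ms triggers recovery, a second report at 1200 ms only updates the counters, and a third at 1600 ms triggers recovery again.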
    • E
      devlink: Add health reporter create/destroy functionality · a0bdcc59
      Eran Ben Elisha committed
      Devlink health reporter is an instance for reporting, diagnosing and
      recovering from run time errors discovered by the reporters.
      Define its data structure and supported operations.
      In addition, expose devlink API to create and destroy a reporter.
      Each devlink instance will hold its own reporters list.
      
      As part of the allocation, driver shall provide a set of callbacks which
      will be used by devlink in order to handle health reports and user
      commands related to this reporter. In addition, driver is entitled to
      provide some priv pointer, which can be fetched from the reporter by
      devlink_health_reporter_priv function.
      
      For each reporter, devlink will hold a metadata of statistics,
      dump msg and status.
      
      For passing dumps and diagnose data to the user-space, it will use devlink
      fmsg API.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0bdcc59
    • E
      devlink: Add devlink formatted message (fmsg) API · 1db64e87
      Eran Ben Elisha committed
      Devlink fmsg is a mechanism to pass descriptors between drivers and
      devlink, in json-like format. The API allows the driver to add nested
      attributes such as object, object pair and value array, in addition to
      attributes such as name and value.
      
      Driver can use this API to fill the fmsg context in a format which will be
      translated by the devlink to the netlink message later.
      There is no memory allocation in advance (other than the initial list
      head); the API dynamically allocates message descriptors and adds them to
      the list on the fly.
      
      When it needs to send the data using SKBs to the netlink layer, it
      fragments the data between different SKBs. In order to do this
      fragmentation, it uses virtual nests attributes, to avoid actual
      nesting use which cannot be divided between different SKBs.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1db64e87
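The idea of a json-like message built from dynamically allocated descriptors on a list can be sketched in user-space C. This is an illustration only and does not use the devlink fmsg API: the `msg` and `item` types and the functions `msg_put()`, `msg_render()`, `msg_free()`, and `build_sq_example()` are hypothetical names. Each call appends one flat descriptor (nest start, nest end, name, or value); nesting is implied by the order of start/end markers rather than by nested storage, which is loosely analogous to the "virtual nest" approach the commit describes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum item_type {
    ITEM_OBJ_START, ITEM_OBJ_END, ITEM_ARR_START, ITEM_ARR_END,
    ITEM_NAME, ITEM_U32, ITEM_BOOL
};

/* One flat descriptor on the list. */
struct item {
    enum item_type type;
    char name[32];
    unsigned int value;
    struct item *next;
};

/* Only the list head exists up front; items are allocated on the fly. */
struct msg { struct item *head, *tail; };

static int msg_put(struct msg *m, enum item_type type, const char *name,
                   unsigned int value)
{
    struct item *it = calloc(1, sizeof(*it));

    if (!it)
        return -1;
    it->type = type;
    it->value = value;
    if (name)
        snprintf(it->name, sizeof(it->name), "%s", name);
    if (m->tail)
        m->tail->next = it;
    else
        m->head = it;
    m->tail = it;
    return 0;
}

static void msg_free(struct msg *m)
{
    while (m->head) {
        struct item *next = m->head->next;
        free(m->head);
        m->head = next;
    }
    m->tail = NULL;
}

/* Walk the flat list and emit compact JSON into buf. */
static size_t msg_render(const struct msg *m, char *buf, size_t len)
{
    size_t off = 0;
    int need_comma = 0;

    assert(buf && len > 0);
    buf[0] = '\0';
    for (const struct item *it = m->head; it && off < len; it = it->next) {
        int closing = it->type == ITEM_OBJ_END || it->type == ITEM_ARR_END;
        const char *sep = (need_comma && !closing) ? "," : "";
        char *p = buf + off;
        size_t room = len - off;
        int n = 0;

        switch (it->type) {
        case ITEM_OBJ_START: n = snprintf(p, room, "%s{", sep); break;
        case ITEM_OBJ_END:   n = snprintf(p, room, "}"); break;
        case ITEM_ARR_START: n = snprintf(p, room, "%s[", sep); break;
        case ITEM_ARR_END:   n = snprintf(p, room, "]"); break;
        case ITEM_NAME:
            n = snprintf(p, room, "%s\"%s\":", sep, it->name);
            break;
        case ITEM_U32:
            n = snprintf(p, room, "%s%u", sep, it->value);
            break;
        case ITEM_BOOL:
            n = snprintf(p, room, "%s%s", sep, it->value ? "true" : "false");
            break;
        }
        off += (size_t)n;
        /* A comma is needed after a value or after a closed nest. */
        need_comma = closing || it->type == ITEM_U32 || it->type == ITEM_BOOL;
    }
    return off;
}

/* Build a message shaped like the "SQs" diagnose output shown earlier. */
static int build_sq_example(struct msg *m)
{
    static const unsigned int sqns[] = { 138, 142 };
    int err = 0;

    err |= msg_put(m, ITEM_OBJ_START, NULL, 0);
    err |= msg_put(m, ITEM_NAME, "SQs", 0);
    err |= msg_put(m, ITEM_ARR_START, NULL, 0);
    for (size_t i = 0; i < sizeof(sqns) / sizeof(sqns[0]); i++) {
        err |= msg_put(m, ITEM_OBJ_START, NULL, 0);
        err |= msg_put(m, ITEM_NAME, "sqn", 0);
        err |= msg_put(m, ITEM_U32, NULL, sqns[i]);
        err |= msg_put(m, ITEM_NAME, "stopped", 0);
        err |= msg_put(m, ITEM_BOOL, NULL, 0);
        err |= msg_put(m, ITEM_OBJ_END, NULL, 0);
    }
    err |= msg_put(m, ITEM_ARR_END, NULL, 0);
    err |= msg_put(m, ITEM_OBJ_END, NULL, 0);
    return err;
}
```

Because the list is flat, a sender could in principle cut it at any descriptor boundary and continue in the next buffer, which is the property the commit message attributes to virtual nests when fragmenting across SKBs.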
    • M
      net: phy: fixed_phy: Fix fixed_phy not checking GPIO · 8f289805
      Moritz Fischer committed
      Fix fixed_phy not checking GPIO if no link_update callback
      is registered.
      
      In the original version all users registered a link_update
      callback so the issue was masked.
      
      Fixes: a5597008 ("phy: fixed_phy: Add gpio to determine link up/down.")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Moritz Fischer <mdf@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f289805
  2. 07 Feb, 2019 18 commits