1. 17 Feb, 2019 2 commits
    • sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values · 4057765f
      Guillaume Nault committed
      SO_SNDBUF and SO_RCVBUF (and their *BUFFORCE version) may overflow or
      underflow their input value. This patch aims at providing explicit
      handling of these extreme cases, to get a clear behaviour even with
      values bigger than INT_MAX / 2 or lower than INT_MIN / 2.
      
      For simplicity, only SO_SNDBUF and SO_SNDBUFFORCE are described here,
      but the same explanation and fix apply to SO_RCVBUF and SO_RCVBUFFORCE
      (with 'SNDBUF' replaced by 'RCVBUF' and 'wmem_max' by 'rmem_max').
      
      Overflow of positive values
      ===========================
      
      When handling SO_SNDBUF or SO_SNDBUFFORCE, if 'val' exceeds
      INT_MAX / 2, the buffer size is set to its minimum value because
      'val * 2' overflows, and max_t() considers that it's smaller than
      SOCK_MIN_SNDBUF. For SO_SNDBUF, this can only happen with
      net.core.wmem_max > INT_MAX / 2.
      
      SO_SNDBUF and SO_SNDBUFFORCE are actually designed to let users probe
      for the maximum buffer size by setting an arbitrarily large number that
      gets capped to the maximum allowed/possible size. Having the upper
      half of the positive integer space to potentially reduce the buffer
      size to its minimum value defeats this purpose.
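
      The probing idiom described above can be sketched from user space as
      follows. This is a minimal illustration, not part of the patch: the
      helper name probe_max_sndbuf() is hypothetical, and a UDP socket is
      assumed only because it is cheap to create.

```c
#include <limits.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: request the largest representable send buffer and
 * read back what the kernel actually granted.
 * Returns the effective size in bytes, or -1 on error.
 */
int probe_max_sndbuf(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int request = INT_MAX;	/* arbitrarily large; the kernel caps it */
	int granted = 0;
	socklen_t len = sizeof(granted);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &request, sizeof(request)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
		granted = -1;

	close(fd);
	return granted;
}
```

      The value read back depends on net.core.wmem_max on the running
      system, so no particular result should be expected.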
      
      This patch caps the base value to INT_MAX / 2, so that bigger values
      don't overflow and keep setting the buffer size to its maximum.
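
      The effect of the cap can be modelled in user space as below. This is
      a sketch under stated assumptions, not the kernel source: MIN_LEN is a
      placeholder for SOCK_MIN_SNDBUF (whose real value depends on the
      build), the function names are hypothetical, and the wrapping
      multiplication is emulated with unsigned arithmetic because signed
      overflow is undefined in ISO C.

```c
#include <limits.h>

#define MIN_LEN 4608	/* placeholder for SOCK_MIN_SNDBUF (assumption) */

/* Double v with two's-complement wraparound, avoiding signed-overflow UB. */
static int wrap_double(int v)
{
	return (int)((unsigned int)v * 2u);
}

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Unsigned comparison: a negative val looks huge and is capped to wmem_max. */
static int cap_unsigned(int val, int wmem_max)
{
	return (unsigned int)val < (unsigned int)wmem_max ? val : wmem_max;
}

/* Model of SO_SNDBUF before the patch: doubling may wrap to a negative. */
int sndbuf_old(int val, int wmem_max)
{
	val = cap_unsigned(val, wmem_max);
	return max_int(wrap_double(val), MIN_LEN);
}

/* Model of SO_SNDBUF after the patch: cap to INT_MAX / 2 before doubling. */
int sndbuf_new(int val, int wmem_max)
{
	val = cap_unsigned(val, wmem_max);
	if (val > INT_MAX / 2)
		val = INT_MAX / 2;
	return max_int(val * 2, MIN_LEN);
}
```

      With wmem_max set to INT_MAX, the old model collapses a request of
      INT_MAX to MIN_LEN, while the new model yields INT_MAX - 1.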
      
      Underflow of negative values
      ============================
      
      For negative numbers, SO_SNDBUF always considers them bigger than
      net.core.wmem_max, which is bounded by [SOCK_MIN_SNDBUF, INT_MAX].
      Therefore such values are set to net.core.wmem_max and we're back to
      the behaviour of positive integers described above (return maximum
      buffer size if wmem_max <= INT_MAX / 2, return SOCK_MIN_SNDBUF
      otherwise).
      
      However, SO_SNDBUFFORCE behaves differently. The user value is
      directly multiplied by two and compared with SOCK_MIN_SNDBUF. If
      'val * 2' doesn't underflow or if it underflows to a value smaller
      than SOCK_MIN_SNDBUF, then the buffer size is set to its minimum value.
      Otherwise the buffer size is set to the underflowed value.
      
      This patch treats negative values passed to SO_SNDBUFFORCE as zero, to
      prevent underflows. Therefore negative values now always set the buffer
      size to its minimum value.
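
      The SO_SNDBUFFORCE underflow and its fix can be modelled the same way.
      Again this is only a user-space sketch: MIN_LEN stands in for
      SOCK_MIN_SNDBUF, the function names are hypothetical, and wraparound
      is emulated with unsigned arithmetic.

```c
#include <limits.h>

#define MIN_LEN 4608	/* placeholder for SOCK_MIN_SNDBUF (assumption) */

/* Double v with two's-complement wraparound, avoiding signed-overflow UB. */
static int wrap_double(int v)
{
	return (int)((unsigned int)v * 2u);
}

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Model before the patch: the user value is doubled with no clamping. */
int sndbufforce_old(int val)
{
	return max_int(wrap_double(val), MIN_LEN);
}

/* Model after the patch: negatives become 0, large values are capped. */
int sndbufforce_new(int val)
{
	if (val < 0)
		val = 0;
	if (val > INT_MAX / 2)
		val = INT_MAX / 2;
	return max_int(val * 2, MIN_LEN);
}
```

      For val = INT_MIN / 2 - 1, the old model returns the underflowed value
      INT_MAX - 1, whereas the new model returns MIN_LEN.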
      
      Even though SO_SNDBUF behaves inconsistently by setting buffer size to
      the maximum value when passed a negative number, no attempt is made to
      modify this behaviour. There may exist some programs that rely on using
      negative numbers to set the maximum buffer size. Avoiding overflows
      because of extreme net.core.wmem_max values is the most we can do here.
      
      Summary of altered behaviours
      =============================
      
      val      : user-space value passed to setsockopt()
      val_uf   : the underflowed value resulting from doubling val when
                 val < INT_MIN / 2
      wmem_max : short for net.core.wmem_max
      val_cap  : min(val, wmem_max)
      min_len  : minimal buffer length (that is, SOCK_MIN_SNDBUF)
      max_len  : maximal possible buffer length, regardless of wmem_max (that
                 is, INT_MAX - 1)
      ^^^^     : altered behaviour
      
      SO_SNDBUF:
      +-------------------------+-------------+------------+----------------+
      |       CONDITION         | OLD RESULT  | NEW RESULT |    COMMENT     |
      +-------------------------+-------------+------------+----------------+
      | val < 0 &&              |             |            | No overflow,   |
      | wmem_max <= INT_MAX/2   | wmem_max*2  | wmem_max*2 | keep original  |
      |                         |             |            | behaviour      |
      +-------------------------+-------------+------------+----------------+
      | val < 0 &&              |             |            | Cap wmem_max   |
      | INT_MAX/2 < wmem_max    | min_len     | max_len    | to prevent     |
      |                         |             | ^^^^^^^    | overflow       |
      +-------------------------+-------------+------------+----------------+
      | 0 <= val <= min_len/2   | min_len     | min_len    | Ordinary case  |
      +-------------------------+-------------+------------+----------------+
      | min_len/2 < val &&      | val_cap*2   | val_cap*2  | Ordinary case  |
      | val_cap <= INT_MAX/2    |             |            |                |
      +-------------------------+-------------+------------+----------------+
      | min_len < val &&        |             |            | Cap val_cap    |
      | INT_MAX/2 < val_cap     | min_len     | max_len    | again to       |
      | (implies that           |             | ^^^^^^^    | prevent        |
      | INT_MAX/2 < wmem_max)   |             |            | overflow       |
      +-------------------------+-------------+------------+----------------+
      
      SO_SNDBUFFORCE:
      +------------------------------+---------+---------+------------------+
      |          CONDITION           | BEFORE  | AFTER   |     COMMENT      |
      |                              | PATCH   | PATCH   |                  |
      +------------------------------+---------+---------+------------------+
      | val < INT_MIN/2 &&           | min_len | min_len | Underflow with   |
      | val_uf <= min_len            |         |         | no consequence   |
      +------------------------------+---------+---------+------------------+
      | val < INT_MIN/2 &&           | val_uf  | min_len | Set val to 0 to  |
      | val_uf > min_len             |         | ^^^^^^^ | avoid underflow  |
      +------------------------------+---------+---------+------------------+
      | INT_MIN/2 <= val < 0         | min_len | min_len | No underflow     |
      +------------------------------+---------+---------+------------------+
      | 0 <= val <= min_len/2        | min_len | min_len | Ordinary case    |
      +------------------------------+---------+---------+------------------+
      | min_len/2 < val <= INT_MAX/2 | val*2   | val*2   | Ordinary case    |
      +------------------------------+---------+---------+------------------+
      | INT_MAX/2 < val              | min_len | max_len | Cap val to       |
      |                              |         | ^^^^^^^ | prevent overflow |
      +------------------------------+---------+---------+------------------+
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2019-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f2281c24
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)
      
      Bodong Wang says,
      
      BlueField device is a multi-core ARM processor in a highly integrated
      system on chip coupled with the ConnectX interconnect controller.
      BlueField device can be presented in one out of two modes:
      
      - SEPARATED_HOST: ARM processors as a separated and orthogonal host
        like any other external host in the multi-host virtualization model.
      - EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
        external hosts virtualization model.
      
      While the existing driver already supports the device in separated_host
      mode, this patch series focuses on the functionality of embedded_cpu
      mode.
      
      In embedded_cpu mode, the BlueField device exposes a regular network
      controller PCI function on the BlueField host (e.g., x86). However, a
      separate PCI function called the Embedded CPU Physical Function (ECPF)
      is also added on the ARM host side, where standard Linux distributions
      are able to run on the ARM cores. Depending on the NV configuration
      from firmware, the ECPF can be the e-switch manager and firmware pages
      supplier. If the ECPF is configured as e-switch manager and page
      supplier, it takes over the following responsibilities from the PF on
      the BlueField host:
      - Owns, controls and manages all e-switch parts, and takes e-switch
        traffic by default. It also should perform ENABLE_HCA for the host
        PF just like a PF does for its VFs.
      - Provides and manages the ICM host memory required for the HCA to
        store various contexts for itself, the PF, and the VFs belonging to
        the e-switch it manages.
      
      The PF on the BlueField host side is still responsible for:
      - Controlling its own permanent MAC.
      - PCI and SR-IOV configuration, and performing ENABLE_HCA for its VFs.
      
      The ECPF can also retrieve information about the external host it
      controls, like host identifier, PCI BDF and number of virtual functions.
      As these parameters may change dynamically, an event is sent to the
      driver on the ECPF side when they do.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Feb, 2019 32 commits
  3. 15 Feb, 2019 6 commits