1. 25 Apr, 2020 10 commits
  2. 24 Apr, 2020 29 commits
    • D
      Merge branch 'ovs-meter-tables' · 18021360
      David S. Miller committed
      Tonghao Zhang says:
      
      ====================
      openvswitch: expand meter tables and fix bug
      
      This patch set expands or shrinks the meter table when necessary,
      and the other patches fix bugs or improve the code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      18021360
    • T
      net: openvswitch: use u64 for meter bucket · e5735887
      Tonghao Zhang committed
      When the meter rate is set to 4 Gbps or more, an overflow
      occurs and the meters don't work as expected.
      
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Andy Zhou <azhou@ovn.org>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5735887
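The overflow described in the commit above can be illustrated with a small standalone sketch. It assumes, purely for illustration, that a rate expressed in kbit/s is scaled by 1000 into a per-second bucket value; the helper names are hypothetical and this is not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical helper: scale a rate in kbit/s to bit/s using 32-bit math.
 * The product wraps modulo 2^32 once rate_kbps exceeds 4294967,
 * that is, a little above 4 Gbit/s. */
static uint32_t bucket_bits_u32(uint32_t rate_kbps)
{
	return rate_kbps * 1000u;
}

/* Same computation, widened to 64 bits before multiplying, so the
 * result cannot wrap for any 32-bit input. */
static uint64_t bucket_bits_u64(uint32_t rate_kbps)
{
	return (uint64_t)rate_kbps * 1000u;
}
```

With a 5 Gbit/s rate (5000000 kbit/s), the 32-bit helper wraps to a much smaller value while the 64-bit helper keeps the full 5000000000.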
    • T
      net: openvswitch: make EINVAL return value more obvious · c7735008
      Tonghao Zhang committed
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Andy Zhou <azhou@ovn.org>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7735008
    • T
      net: openvswitch: remove the unnecessary check · a8e38738
      Tonghao Zhang committed
      Before invoking ovs_meter_cmd_reply_stats, "meter"
      was already checked, so don't check it again in that function.
      
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Andy Zhou <azhou@ovn.org>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8e38738
    • T
      net: openvswitch: set max limitation to meters · eb58eebc
      Tonghao Zhang committed
      Don't allow users to create an unlimited number of meters, which could
      consume a large amount of kernel memory. The max number
      supported is decided by physical memory, with 20K meters as default.
      
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Andy Zhou <azhou@ovn.org>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb58eebc
    • T
      net: openvswitch: expand the meters supported number · c7c4c44c
      Tonghao Zhang committed
      In the kernel datapath of Open vSwitch, there are only 1024
      buckets of meters in one datapath. Installing more than
      1024 (e.g. 8192) meters may lead to a performance drop.
      But in some cases, for example when Open vSwitch is used as an edge
      gateway, there should be at least 20K meters, where meters are used for
      IP address bandwidth limitation.
      
      [Open vSwitch userspace datapath has this issue too.]
      
      For more scalable meters, this patch uses a meter array instead of
      hash tables, and expands/shrinks the array when necessary. So we
      can install more meters than before in the datapath.
      With the new struct *dp_meter_instance, it's easy to
      expand the meters by changing the *ti pointer in the struct
      *dp_meter_table.
      
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Andy Zhou <azhou@ovn.org>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7c4c44c
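The idea of resizing by swapping an instance pointer, as described in the commit above, can be sketched in userspace C. This is only an illustration under stated assumptions: the struct and function names are hypothetical stand-ins for dp_meter_instance and dp_meter_table, slots are indexed directly by meter id, growth is by doubling, and concurrency is ignored entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dp_meter_instance: one fixed-size slot array. */
struct meter_instance {
	size_t n_slots;
	void *slots[];		/* slot i holds the meter with id i, or NULL */
};

/* Hypothetical stand-in for dp_meter_table: only points at the instance. */
struct meter_table {
	struct meter_instance *ti;
};

static struct meter_instance *instance_alloc(size_t n_slots)
{
	struct meter_instance *ti;

	ti = calloc(1, sizeof(*ti) + n_slots * sizeof(ti->slots[0]));
	if (ti)
		ti->n_slots = n_slots;
	return ti;
}

static int table_init(struct meter_table *tbl, size_t n_slots)
{
	assert(n_slots > 0);
	tbl->ti = instance_alloc(n_slots);
	return tbl->ti ? 0 : -1;
}

/* Expand or shrink: fill a new instance, then swap the ti pointer.
 * When shrinking, slots at or beyond n_slots are dropped, so the
 * caller must make sure they are empty. */
static int table_resize(struct meter_table *tbl, size_t n_slots)
{
	struct meter_instance *old = tbl->ti;
	struct meter_instance *new_ti;
	size_t keep;

	assert(n_slots > 0);
	new_ti = instance_alloc(n_slots);
	if (!new_ti)
		return -1;
	keep = old->n_slots < n_slots ? old->n_slots : n_slots;
	memcpy(new_ti->slots, old->slots, keep * sizeof(old->slots[0]));
	tbl->ti = new_ti;
	free(old);
	return 0;
}

static int table_set(struct meter_table *tbl, size_t id, void *meter)
{
	size_t n = tbl->ti->n_slots;

	while (id >= n)
		n *= 2;		/* double until the id fits */
	if (n != tbl->ti->n_slots && table_resize(tbl, n))
		return -1;
	tbl->ti->slots[id] = meter;
	return 0;
}

static void *table_get(const struct meter_table *tbl, size_t id)
{
	return id < tbl->ti->n_slots ? tbl->ti->slots[id] : NULL;
}
```

Because readers only ever dereference `tbl->ti`, a resize touches a single pointer once the new instance is fully populated, which is the property the commit message points at.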
    • C
      net: phy: bcm54140: fix less than zero comparison on an unsigned · efcd549d
      Colin Ian King committed
      Currently the unsigned variable tmp is being checked for a negative
      error return from the call to bcm_phy_read_rdb and this can never
      be true since tmp is unsigned.  Fix this by making tmp a plain int.
      
      Addresses-Coverity: ("Unsigned compared against 0")
      Fixes: 4406d36d ("net: phy: bcm54140: add hwmon support")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efcd549d
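The class of bug fixed in the commit above can be reproduced in a few lines. The sketch below uses a hypothetical `fake_read()` in place of the real PHY read helper; it only shows why a negative error code stored in an unsigned variable slips past a `< 0` check.

```c
#include <stdint.h>

/* Hypothetical stand-in for a register read: returns the value on
 * success or a negative errno-style code on failure. */
static int fake_read(int fail)
{
	return fail ? -5 : 0x1234;
}

/* Buggy pattern: the result lives in an unsigned variable, so the
 * "< 0" test is always false and the error is treated as data. */
static int read_checked_unsigned(int fail, uint16_t *out)
{
	unsigned int tmp = (unsigned int)fake_read(fail);

	if (tmp < 0)		/* can never be true */
		return -1;
	*out = (uint16_t)tmp;
	return 0;
}

/* Fixed pattern: a plain int keeps the sign, so errors propagate. */
static int read_checked_int(int fail, uint16_t *out)
{
	int tmp = fake_read(fail);

	if (tmp < 0)
		return tmp;
	*out = (uint16_t)tmp;
	return 0;
}
```

Compilers and static analyzers typically flag the always-false comparison in the first variant, which matches the Coverity report quoted in the commit.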
    • Z
      qed: Make ll2_cbs static · 8ffe2df6
      Zou Wei committed
      Fix the following sparse warning:
      
      drivers/net/ethernet/qlogic/qed/qed_ll2.c:2334:20: warning: symbol 'll2_cbs'
      was not declared. Should it be static?
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Zou Wei <zou_wei@huawei.com>
      Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ffe2df6
    • X
      net: sched : Remove unnecessary cast in kfree · 3c9143d9
      Xu Wang committed
      Remove unnecessary casts in the argument to kfree.
      Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c9143d9
    • D
      Merge branch 'net-ethernet-ti-cpts-add-irq-and-HW_TS_PUSH-events' · 92a8da46
      David S. Miller committed
      Grygorii Strashko says:
      
      ====================
      net: ethernet: ti: cpts: add irq and HW_TS_PUSH events
      
      This is a re-spin of the patches to add CPSW IRQ and HW_TS_PUSH event support that I
      sent a long time ago [1]. In this series, I've tried to restructure and split the changes,
      and also added a few additional optimizations compared to the initial RFC submission [1].
      
      The HW_TS_PUSH events are intended to serve different timesync purposes, one of
      which is to add a PPS generation function, which can be implemented as below:
      
                           +-----------------+
                           | Control         |
                           | application     |
                  +------->+                 +----------+
                  |        |                 |          |
                  |        |                 |          |
                  |        +-----------------+          |
                  |                                     |
                  |                                     |
                  | PTP_EXTTS_REQUEST                   |
                  |                                     |
                  |                                     |
       +----------------------------------------------------------------+
                  |                                     |    Kernel
          +-------+----------+                  +-------v--------+
          |  /dev/ptpX       |                  | /sys/class/pwm/|
          |                  |                  |                |
          +-------^----------+                  +-------+--------+
                  |                                     |
                  |                                     |
                  |                             +-------v-------------------+
          +-------+----------+                  |                           |
          | CPTS driver      |                  |pwm/pwm-omap-dmtimer.c     |
          |                  |                  +---------------------------+
          +-------^----------+                  |clocksource/timer_ti_dm.c  |
                  |                             +-------+-------------------+
                  |HWx_TS_PUSH evt                      |
       +----------------------------------------------------------------+
                  |                                     |         HW
          +-------+----------+                  +-------v--------+
          | CPTS             |                  | DMTimer        |
          |                  |                  |                |
          |      HWx_TS_PUSH X<-----------------+                |
          |                  +                  |                |
          +------------------+                  +-------+--------+
                                                        |
                                                        X timer4
      
      To my knowledge, there is at least one public implementation of the above PPS generation
      scheme, by Tusori Tibor [2], based on the initial HW_TS_PUSH enablement submission [1].
      There is now also work by Lokesh Vutla <lokeshvutla@ti.com>, published to
      enable and improve PWM adjustment from user space [3][4][5].
      
      Main changes compared to the initial submission:
      - TX timestamp processing deferred to ptp worker only
      - both CPTS IRQ and polling events processing supported to make it work for
        Keystone 2 also
      - switch to use new .gettimex64() interface
      - no DT updates as number of HWx_TS_PUSH inputs is static per HW
      
      Testing on am571x-idk/omap2plus_defconfig/+CONFIG_PREEMPT=y:
      1) testing HW_TS_PUSH
       - enable pwm in DT
      	pwm16: dmtimer-pwm {
      		compatible = "ti,omap-dmtimer-pwm";
      		ti,timers = <&timer16>;
      		#pwm-cells = <3>;
      	};
       - configure and start pwm
        echo 0 > /sys/class/pwm/pwmchip0/export
        echo 1000000000 > /sys/class/pwm/pwmchip0/pwm0/period
        echo 500000000 > /sys/class/pwm/pwmchip0/pwm0/duty_cycle
        echo 1 > /sys/class/pwm/pwmchip0/pwm0/enable
       - test HWx_TS_PUSH using Kernel selftest testptp application
        ./tools/testing/selftests/ptp/testptp -d /dev/ptp0 -e 1000 -i 3
      
      2) testing phc2sys
      phc2sys[1616.791]: eth0 rms 408190379792180864 max 1580914543017209856 freq +864 +/- 4635 delay 645 +/- 29
      phc2sys[1646.795]: eth0 rms 41 max 108 freq +0 +/- 36 delay 656 +/- 29
      phc2sys[1676.800]: eth0 rms 43 max 83 freq +2 +/- 38 delay 650 +/- 0
      phc2sys[1706.804]: eth0 rms 39 max 87 freq +4 +/- 34 delay 672 +/- 55
      phc2sys[1736.808]: eth0 rms 35 max 66 freq +1 +/- 30 delay 667 +/- 49
      phc2sys[1766.813]: eth0 rms 38 max 79 freq +2 +/- 33 delay 656 +/- 29
      phc2sys[1796.817]: eth0 rms 45 max 98 freq +1 +/- 39 delay 656 +/- 29
      phc2sys[1826.821]: eth0 rms 40 max 87 freq +5 +/- 35 delay 650 +/- 0
      phc2sys[1856.826]: eth0 rms 29 max 76 freq -0 +/- 25 delay 656 +/- 29
      phc2sys[1886.830]: eth0 rms 40 max 97 freq +4 +/- 35 delay 667 +/- 49
      phc2sys[1916.834]: eth0 rms 42 max 94 freq +2 +/- 36 delay 661 +/- 41
      phc2sys[1946.839]: eth0 rms 40 max 91 freq +2 +/- 35 delay 661 +/- 41
      phc2sys[1976.843]: eth0 rms 46 max 88 freq -0 +/- 40 delay 667 +/- 49
      phc2sys[2006.847]: eth0 rms 49 max 97 freq +2 +/- 43 delay 650 +/- 0
      
      3) testing ptp4l
      - 1G connection
      ptp4l[862.891]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
      ptp4l[923.894]: rms 1019697354682 max 5768279314068 freq +26053 +/- 72 delay 488 +/- 1
      ptp4l[987.896]: rms 13 max 26 freq +26005 +/- 29 delay 488 +/- 1
      ptp4l[1051.899]: rms 14 max 50 freq +25895 +/- 21 delay 488 +/- 1
      ptp4l[1115.901]: rms 11 max 27 freq +25878 +/- 17 delay 488 +/- 1
      ptp4l[1179.904]: rms 10 max 27 freq +25857 +/- 12 delay 488 +/- 1
      ptp4l[1243.906]: rms 14 max 37 freq +25851 +/- 15 delay 488 +/- 1
      ptp4l[1307.909]: rms 12 max 33 freq +25835 +/- 15 delay 488 +/- 1
      ptp4l[1371.911]: rms 11 max 27 freq +25832 +/- 14 delay 488 +/- 1
      ptp4l[1435.914]: rms 11 max 26 freq +25823 +/- 11 delay 488 +/- 1
      ptp4l[1499.916]: rms 10 max 29 freq +25829 +/- 11 delay 489 +/- 1
      ptp4l[1563.919]: rms 11 max 27 freq +25827 +/- 12 delay 488 +/- 1
      
      - 10M connection
      ptp4l[51.955]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
      ptp4l[112.957]: rms 279468848453933920 max 1580914542977391360 freq +25390 +/- 3207 delay 8222 +/- 36
      ptp4l[176.960]: rms 254 max 522 freq +25809 +/- 219 delay 8271 +/- 30
      ptp4l[240.962]: rms 271 max 684 freq +25868 +/- 234 delay 8249 +/- 22
      ptp4l[304.965]: rms 263 max 556 freq +25894 +/- 227 delay 8225 +/- 47
      ptp4l[368.967]: rms 238 max 648 freq +25908 +/- 204 delay 8234 +/- 40
      ptp4l[432.970]: rms 274 max 658 freq +25932 +/- 237 delay 8241 +/- 22
      ptp4l[496.972]: rms 247 max 557 freq +25943 +/- 213 delay 8223 +/- 26
      ptp4l[560.974]: rms 291 max 756 freq +25968 +/- 251 delay 8244 +/- 41
      ptp4l[624.977]: rms 249 max 697 freq +25975 +/- 216 delay 8258 +/- 22
      
      Changes in v5:
       - fixed build issue
      
      Changes in v4:
       - fixed comments from Richard Cochran
       - dropped patch "net: ethernet: ti: cpts: move rx timestamp processing to ptp
         worker only"
       - added "Acked-by" from Richard Cochran <richardcochran@gmail.com>
       - dependencies resolved, patch merged
      
      Changes in v3:
       - fixed rebase mess
       - fixed build issues
      
      Changes in v2 (broken):
       - fixed (formatting) comments from David Miller <davem@davemloft.net>
      
      v4: https://patchwork.ozlabs.org/project/netdev/cover/20200422201254.15232-1-grygorii.strashko@ti.com/
      v3: https://patchwork.ozlabs.org/project/netdev/cover/20200320194244.4703-1-grygorii.strashko@ti.com/
      v2: https://patchwork.ozlabs.org/cover/1258339/
      v1: https://patchwork.ozlabs.org/cover/1254708/
      
      [1] https://lore.kernel.org/patchwork/cover/799251/
      [2] https://usermanual.wiki/Document/SetupGuide.632280828.pdf
          https://github.com/t-tibor/msc_thesis
      [3] https://patchwork.kernel.org/cover/11421329/
      [4] https://patchwork.kernel.org/cover/11433197/
      [5] https://sourceforge.net/p/linuxptp/mailman/message/36943248/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92a8da46
    • G
      net: ethernet: ti: cpsw: enable cpts irq · 84ea9c0a
      Grygorii Strashko committed
      The CPSW misc IRQ needs to be enabled for CPTS event_pend IRQ processing. This
      patch adds the corresponding support to the CPSW driver.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84ea9c0a
    • G
      net: ethernet: ti: cpts: add support for HW_TS_PUSH events · b78aba49
      Grygorii Strashko committed
      Now that CPTS IRQ support is in place, the HW_TS_PUSH events can be added.
      PWM-capable DmTimers can be used to generate input signals for CPTS on TI
      AM335x/AM437x/DRA7 SoCs to be timestamped:
      AM335x/AM437x: timer4 - timer7
      DRA7/AM57xx: timer13 - timer16
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b78aba49
    • G
      net: ethernet: ti: cpts: add irq support · 85624412
      Grygorii Strashko committed
      Add CPTS IRQ support, but do not enable it. By default, the CPTS driver
      will continue working in polling mode, which is required for CPTS to
      continue working on platforms other than CPSW, like Keystone 2.
      
      The CPTS IRQ support is required to enable support for HW_TS_PUSH events.
      The CPSW CPTS IRQ and HW_TS_PUSH events support will be enabled in follow-up
      patches.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85624412
    • G
      net: ethernet: ti: cpts: rework locking · ba107428
      Grygorii Strashko committed
      Currently a spinlock is used to synchronize everything, which is not required. Add
      a mutex and use it to sync access to the PTP interface and PTP worker, and use
      the spinlock only to sync FIFO/events processing.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba107428
    • G
      net: ethernet: ti: cpts: move tx timestamp processing to ptp worker only · c8f8e47e
      Grygorii Strashko committed
      Currently the tx timestamp processing happens from different contexts: softirq
      and thread/PTP worker. Enabling IRQ will add one more hard_irq context.
      This makes the overall deferred TX timestamp processing and locking
      overcomplicated. Always move tx timestamp processing to the PTP worker instead.
      
      napi_rx->cpts_tx_timestamp
       if ptp_packet then
          push to txq
          ptp_schedule_worker()
      
      do_aux_work->cpts_overflow_check
       cpts_process_events()
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8f8e47e
    • G
      net: ethernet: ti: cpts: optimize packet to event matching · 3bfd41b5
      Grygorii Strashko committed
      Currently the CPTS driver parses the packet (skb) every time it needs
      to match a packet to a CPTS event (including ptp_classify_raw() calls).
      
      This patch optimizes the matching process by parsing the packet only once upon
      arrival and storing PTP-specific data in skb->cb using the same format as in
      the CPTS HW event. As a result, all future matching reduces to comparing two u32
      values.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3bfd41b5
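The parse-once, compare-u32 idea from the commit above might look roughly like the following. The bit layout, macro names, and `struct pkt_cb` are assumptions made for this sketch only; the real layout is whatever the CPTS hardware event register defines.

```c
#include <stdint.h>

/* Illustrative layout (an assumption of this sketch):
 * bits 0-15 sequence id, bits 16-19 message type, bits 20-23 event type. */
#define SEQ_ID_MASK	0xffffu
#define MSG_TYPE_SHIFT	16
#define MSG_TYPE_MASK	0xfu
#define EVT_TYPE_SHIFT	20
#define EVT_TYPE_MASK	0xfu

/* Pack the PTP fields once, when the packet is first seen. */
static uint32_t make_match_key(uint8_t evt_type, uint8_t msg_type,
			       uint16_t seq_id)
{
	return ((uint32_t)(evt_type & EVT_TYPE_MASK) << EVT_TYPE_SHIFT) |
	       ((uint32_t)(msg_type & MSG_TYPE_MASK) << MSG_TYPE_SHIFT) |
	       ((uint32_t)seq_id & SEQ_ID_MASK);
}

/* Hypothetical per-packet control block, playing the role of skb->cb. */
struct pkt_cb {
	uint32_t match_key;
};

/* Later matching is a single integer comparison, with no re-parsing. */
static int pkt_matches_event(const struct pkt_cb *cb, uint32_t event_key)
{
	return cb->match_key == event_key;
}
```

The cost of classification is paid once per packet; every later attempt to pair that packet with a hardware event is one comparison.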
    • G
      net: ethernet: ti: cpts: switch to use new .gettimex64() interface · 856e59ab
      Grygorii Strashko committed
      The CPTS HW latches and saves the CPTS counter value in the CPTS FIFO immediately
      after a write to CPSW_CPTS_PUSH.TS_PUSH (bit 0), so the total time that the
      driver needs to read the CPTS timestamp is the time required for the CPSW_CPTS_PUSH
      write to actually reach the HW.
      
      Hence switch the CPTS driver to implement the new .gettimex64() callback for more
      precise measurement of the offset between a PHC and the system clock, which
      is measured as the time between
        write(CPSW_CPTS_PUSH)
        read(CPSW_CPTS_PUSH)
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      856e59ab
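The measurement window described in the commit above (system time sampled just before and just after the push) can be sketched in userspace. Everything here is hypothetical: `struct sys_ts_pair`, the callback type, and `counting_push()` stand in for the driver's register write and read, and CLOCK_MONOTONIC stands in for the system clock source.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical pair of system timestamps bracketing one push. */
struct sys_ts_pair {
	struct timespec pre;
	struct timespec post;
};

/* Hypothetical callback standing in for write + read of the push register. */
typedef void (*push_fn)(void *ctx);

/* Sample the system clock as tightly as possible around the push. */
static void sample_around_push(push_fn push, void *ctx,
			       struct sys_ts_pair *sts)
{
	clock_gettime(CLOCK_MONOTONIC, &sts->pre);
	push(ctx);
	clock_gettime(CLOCK_MONOTONIC, &sts->post);
}

/* Width of the window in nanoseconds; smaller means a tighter bound. */
static int64_t window_ns(const struct sys_ts_pair *sts)
{
	return (int64_t)(sts->post.tv_sec - sts->pre.tv_sec) * 1000000000LL +
	       (sts->post.tv_nsec - sts->pre.tv_nsec);
}

/* Example callback: counts how many pushes were issued. */
static void counting_push(void *ctx)
{
	(*(int *)ctx)++;
}
```

The narrower the window between the two samples, the less uncertainty there is about which system time corresponds to the latched hardware timestamp.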
    • G
      net: ethernet: ti: cpts: move tc mult update in cpts_fifo_read() · 0d6df3e6
      Grygorii Strashko committed
      Currently the CPTS driver's .adjfreq() generates a request to read the CPTS current time
      (CPTS_EV_PUSH) with the intention of processing all pending events using the previous
      frequency adjustment values before switching to the new ones. So
      CPTS_EV_PUSH works as a marker to switch to the new frequency adjustment
      values. The current code assumes that all the work is done in .adjfreq(), but after
      enabling the IRQ this will no longer be true.
      
      Hence save the new frequency adjustment values (mult) and perform the actual frequency
      adjustment in cpts_fifo_read() immediately after CPTS_EV_PUSH is received.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d6df3e6
    • G
      net: ethernet: ti: cpts: separate hw counter read from timecounter · e66dccce
      Grygorii Strashko committed
      Currently the CPTS HW time reading code is implemented in the timecounter->cyclecounter
      .read() callback and performs the following operations:
      timecounter_read() ->cc.read() -> cpts_systim_read()
       - request current CPTS HW time CPTS_TS_PUSH.TS_PUSH = 1
       - poll CPTS FIFO for CPTS_EV_PUSH event with current HW timestamp
      
      This approach needs to be changed for the future switch to the PTP PHC
      .gettimex64() callback, which requires separating the request for the current CPTS
      HW time from the processing of the CPTS FIFO, and for the follow-up patch, which
      improves the .adjfreq() implementation.
      
      This patch moves code accessing CPTS HW out of timecounter code as
      follows:
      - convert the HW timestamp of every CPTS event to PTP time (us) and store it as
      part of struct cpts_event;
      - add a CPTS context field to store the current CPTS HW time (counter) value and
      update it on CPTS_EV_PUSH reception;
      - move code accessing CPTS HW out of timecounter code and use the current CPTS
      HW time (counter) from the CPTS context instead;
      - ensure timecounter->cycle_last is updated on CPTS_EV_PUSH reception.
      
      After this change the CPTS timecounter will only perform the timekeeper role
      without actually accessing CPTS HW.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e66dccce
    • G
      net: ethernet: ti: cpts: use dev_yy() api for logs · 79d6e755
      Grygorii Strashko committed
      Use dev_yy() API instead of pr_yy() for log outputs.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79d6e755
    • D
      Merge branch 'net-napi-addition-of-napi_defer_hard_irqs' · 4c532b14
      David S. Miller committed
      Eric Dumazet says:
      
      ====================
      net: napi: addition of napi_defer_hard_irqs
      
      This patch series augments the gro_flush_timeout feature with napi_defer_hard_irqs.
      
      As extensively described in the first patch changelog, this can suppress
      the chit-chat traffic between NIC and host to signal interrupts and re-arm
      them, since this can be an issue on high speed NICs with many queues.
      
      The last patch in this series converts mlx4 TX completion to
      napi_complete_done(), to enable this new mechanism.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c532b14
    • E
      net/mlx4_en: use napi_complete_done() in TX completion · cf4058db
      Eric Dumazet committed
      In order to benefit from the new napi_defer_hard_irqs feature,
      we need to use the napi_complete_done() variant in this driver.
      
      The RX path is already using it; this patch implements the TX completion side.
      
      mlx4_en_process_tx_cq() now returns the number of retired packets,
      instead of a boolean, so that mlx4_en_poll_tx_cq() can pass
      this value to napi_complete_done().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf4058db
    • E
      net: napi: use READ_ONCE()/WRITE_ONCE() · 7e417a66
      Eric Dumazet committed
      gro_flush_timeout and napi_defer_hard_irqs can be read
      from napi_complete_done() while other cpus write the value,
      without explicit synchronization.
      
      Use READ_ONCE()/WRITE_ONCE() to annotate the races.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e417a66
    • E
      net: napi: add hard irqs deferral feature · 6f8b12d6
      Eric Dumazet committed
      Back in commit 3b47d303 ("net: gro: add a per device gro flush timer")
      we added the ability to arm one high resolution timer, that we used
      to keep not-complete packets in GRO engine a bit longer, hoping that further
      frames might be added to them.
      
      Since then, we added the napi_complete_done() interface, and commit
      364b6055 ("net: busy-poll: return busypolling status to drivers")
      allowed drivers to avoid re-arming NIC interrupts if we made a promise
      that their NAPI poll() handler would be called in the near future.
      
      This infrastructure can be leveraged, thanks to a new device parameter,
      which allows arming the napi hrtimer instead of re-arming the device
      hard IRQ.
      
      We have noticed that on some servers with 32 RX queues or more, the chit-chat
      between the NIC and the host caused by IRQ delivery and re-arming could hurt
      throughput by ~20% on 100Gbit NIC.
      
      In contrast, hrtimers are using local (percpu) resources and might have lower
      cost.
      
      The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
      as gro_flush_timeout (/sys/class/net/ethX/)
      
      By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
      
      This patch does not change the prior behavior of gro_flush_timeout
      if used alone : NIC hard irqs should be rearmed as before.
      
      One concrete usage can be :
      
      echo 20000 >/sys/class/net/eth1/gro_flush_timeout
      echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
      
      If at least one packet is retired, then we will reset napi counter
      to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
      of the queue.
      
      On busy queues, this should avoid NIC hard IRQs, while before this patch IRQ
      avoidance was only possible if napi->poll() was exhausting its budget
      and not calling napi_complete_done().
      
      This feature also can be used to work around some non-optimal NIC irq
      coalescing strategies.
      
      Having the ability to insert XX usec delays between each napi->poll()
      can increase cache efficiency, since we increase batch sizes.
      
      It also keeps serving cpus not idle too long, reducing tail latencies.
      Co-developed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f8b12d6
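The counter behaviour described in the commit above (reset to napi_defer_hard_irqs whenever packets are retired, otherwise count down and keep using the timer) can be modelled in a few lines. This is a simplified userspace model, not the kernel function: the struct names, the `gro_pending` flag, and the result struct are assumptions of the sketch.

```c
#include <stdbool.h>

/* Hypothetical per-device tunables, mirroring the two sysfs files. */
struct dev_cfg {
	unsigned long gro_flush_timeout;	/* ns, 0 disables the timer */
	int napi_defer_hard_irqs;		/* 0 disables deferral */
};

/* Hypothetical per-NAPI state. */
struct napi_state {
	int defer_hard_irqs_count;
};

/* What the completion path decided to do. */
struct complete_result {
	bool rearm_irq;		/* re-arm the device hard IRQ */
	unsigned long timer_ns;	/* arm the hrtimer if non-zero */
};

static struct complete_result complete_done(struct napi_state *n,
					    const struct dev_cfg *dev,
					    int work_done, bool gro_pending)
{
	struct complete_result r = { .rearm_irq = true, .timer_ns = 0 };

	if (work_done) {
		/* Prior behaviour: flush held GRO packets after a timeout. */
		if (gro_pending)
			r.timer_ns = dev->gro_flush_timeout;
		/* Packets were retired: reset the deferral counter. */
		n->defer_hard_irqs_count = dev->napi_defer_hard_irqs;
	}
	if (n->defer_hard_irqs_count > 0) {
		n->defer_hard_irqs_count--;
		r.timer_ns = dev->gro_flush_timeout;
		if (r.timer_ns)
			r.rearm_irq = false;	/* poll again from the timer */
	}
	return r;
}
```

With both tunables at zero the model always re-arms the IRQ, and with only gro_flush_timeout set it still re-arms the IRQ while arming the flush timer, consistent with the commit's statement that prior behaviour is unchanged.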
    • D
      Merge branch 'qed-aer' · e6acd2b6
      David S. Miller committed
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Add support for pcie advanced error recovery.
      
      The patch series adds qed/qede driver changes for PCIe Advanced Error
      Recovery (AER) support.
      Patch (1) adds qed changes to enable the device to send error messages
      to root port when detected.
      Patch (2) adds qede support for handling the detected errors (AERs).
      
      Changes from previous version:
      -------------------------------
      v2: use pci_num_vf() instead of caching the value in edev.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6acd2b6
    • S
      qede: Add support for handling the pcie errors. · 731815e7
      Sudarsana Reddy Kalluru committed
      The error recovery is handled by management firmware (MFW) with the help of
      the qed/qede drivers. Upon detecting errors, the driver informs MFW about the
      event, which in turn starts a recovery process. MFW sends an ERROR_RECOVERY
      notification to the driver, which performs the required cleanup/recovery
      from the driver side.
      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      731815e7
    • S
      qed: Enable device error reporting capability. · 2196d831
      Sudarsana Reddy Kalluru committed
      The patch enables the device to send error messages to the root port when
      an error is detected.
      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2196d831
    • A
      net: dsa: add GRO support via gro_cells · e131a563
      Alexander Lobakin committed
      gro_cells lib is used by different encapsulating netdevices, such as
      geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
      CPU tag is a sort of "encapsulation", and we can use the same mechs to
      greatly improve overall DSA performance.
      skbs are passed to the GRO layer after removing CPU tags, so we don't
      need any new packet offload types, as I first proposed in
      the first GRO-over-DSA variant [1].
      
      The size of struct gro_cells is sizeof(void *), so hot struct
      dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
      remain in one 32-byte cacheline.
      The other positive side effect is that drivers for network devices
      that can be shipped as CPU ports of DSA-driven switches can now use
      napi_gro_frags() to pass skbs to the kernel. Packets built that way are
      completely non-linear and would likely be dropped without GRO.
      
      This was tested on to-be-mainlined-soon Ethernet driver that uses
      napi_gro_frags(), and the overall performance was on par with the
      variant from [1], sometimes even better due to minimal overhead.
      net.core.gro_normal_batch tuning may help to push it to the limit
      on particular setups and platforms.
      
      iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
      on 1.2 GHz MIPS board:
      
      5.7-rc2 baseline:
      
      [ID]  Interval         Transfer     Bitrate        Retr
      [ 5]  0.00-120.01 sec  9.00 GBytes  644 Mbits/sec  413  sender
      [ 5]  0.00-120.00 sec  8.99 GBytes  644 Mbits/sec       receiver
      
      Iface      RX packets  TX packets
      eth0       7097731     7097702
      port0      426050      6671829
      port1      6671681     425862
      port1.218  6671677     425851
      
      With this patch:
      
      [ID]  Interval         Transfer     Bitrate        Retr
      [ 5]  0.00-120.01 sec  12.2 GBytes  870 Mbits/sec  122  sender
      [ 5]  0.00-120.00 sec  12.2 GBytes  870 Mbits/sec       receiver
      
      Iface      RX packets  TX packets
      eth0       9474792     9474777
      port0      455200      353288
      port1      9019592     455035
      port1.218  353144      455024
      
      v2:
       - Add some performance examples in the commit message;
       - No functional changes.
      
      [1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/
      Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e131a563
    • F
      ipv6: Honor all IPv6 PIO Valid Lifetime values · b75326c2
      Fernando Gont committed
      RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
      the Valid Lifetime of configured addresses to less than two hours, thus
      preventing hosts from reacting to the information provided by a router
      that has positive knowledge that a prefix has become invalid.
      
      This patch makes hosts honor all Valid Lifetime values, as per
      draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
      mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
      
      Note: Attacks aiming at disabling an advertised prefix via a Valid
      Lifetime of 0 are not really more harmful than other attacks
      that can be performed via forged RA messages, such as those
      aiming at completely disabling a next-hop router via an RA that
      advertises a Router Lifetime of 0, or performing a Denial of
      Service (DoS) attack by advertising illegitimate prefixes via
      forged PIOs.  In scenarios where RA-based attacks are of concern,
      proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
      be implemented.
      Signed-off-by: Fernando Gont <fgont@si6networks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b75326c2
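The behavioural difference described in the commit above can be put side by side in a short sketch. The first function follows the three-case rule of RFC 4862 section 5.5.3 e) for unauthenticated Router Advertisements; the second simply honors the advertised value. Function names are hypothetical, and lifetimes are in seconds.

```c
#include <stdint.h>

#define TWO_HOURS 7200u

/* RFC 4862 5.5.3 e): an RA cannot cut the remaining valid lifetime
 * of an address below two hours. */
static uint32_t valid_lft_rfc4862(uint32_t remaining, uint32_t advertised)
{
	if (advertised > TWO_HOURS || advertised > remaining)
		return advertised;	/* case 1: accept as advertised */
	if (remaining <= TWO_HOURS)
		return remaining;	/* case 2: ignore the advertised value */
	return TWO_HOURS;		/* case 3: clamp to two hours */
}

/* Behaviour after the patch: every advertised value is honored. */
static uint32_t valid_lft_honor_all(uint32_t remaining, uint32_t advertised)
{
	(void)remaining;
	return advertised;
}
```

For an address with one day remaining and an advertised Valid Lifetime of 0, the first function yields 7200 seconds while the second yields 0, so the prefix is invalidated immediately.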
  3. 23 Apr, 2020 1 commit
    • D
      Merge branch 'dpaa2-eth-add-support-for-xdp-bulk-enqueue' · 30685b2a
      David S. Miller committed
      Ioana Ciornei says:
      
      ====================
      dpaa2-eth: add support for xdp bulk enqueue
      
      The first patch moves the DEV_MAP_BULK_SIZE macro into the xdp.h header
      file so that drivers can take advantage of it and use it.
      
      The following 3 patches are there to set the scene for using the bulk
      enqueue feature.  First of all, the prototype of the enqueue function is
      changed so that it returns the number of enqueued frames. Second, the
      bulk enqueue interface is used but without any functional changes: one
      frame at a time is still enqueued.  Third, the .ndo_xdp_xmit callback is
      split into two stages, create all FDs for the xdp_frames received and
      then enqueue them.
      
      The last patch of the series builds on top of the others and instead of
      issuing an enqueue operation for each FD it issues a bulk enqueue call
      for as many frames as possible. This is repeated until all frames are
      enqueued or the maximum number of retries is hit. We do not use the
      XDP_XMIT_FLUSH flag since the architecture is not capable of storing all
      frames dequeued in a NAPI cycle; instead we send out right away all
      frames received in a .ndo_xdp_xmit call.
      
      Changes in v2:
       - statically allocate an array of dpaa2_fd by frame queue
       - use the DEV_MAP_BULK_SIZE as the maximum number of xdp_frames
         received in .ndo_xdp_xmit()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30685b2a
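The bulk-enqueue loop outlined in the cover letter above (repeat until all frames are enqueued or the retry limit is hit) can be sketched generically. The callback type, the `demo_queue` used to exercise it, and the choice to count a retry only when a call makes no progress are all assumptions of this sketch, not the driver's actual code.

```c
#include <stddef.h>

/* Hypothetical bulk enqueue: tries to enqueue up to n frames and
 * returns how many were accepted (0..n). */
typedef size_t (*enqueue_bulk_fn)(void *ctx, const int *frames, size_t n);

/* Keep issuing bulk enqueues for the remaining frames until all are
 * enqueued or max_retries calls in total made no progress.
 * Returns the number of frames actually enqueued. */
static size_t enqueue_all(enqueue_bulk_fn enq, void *ctx, const int *frames,
			  size_t total, unsigned int max_retries)
{
	size_t done = 0;
	unsigned int retries = 0;

	while (done < total && retries < max_retries) {
		size_t sent = enq(ctx, frames + done, total - done);

		if (sent == 0)
			retries++;
		done += sent;
	}
	return done;
}

/* Demo queue: accepts at most per_call_cap frames per call and
 * capacity_left frames overall. */
struct demo_queue {
	size_t per_call_cap;
	size_t capacity_left;
};

static size_t demo_enqueue(void *ctx, const int *frames, size_t n)
{
	struct demo_queue *q = ctx;
	size_t take = n < q->per_call_cap ? n : q->per_call_cap;

	(void)frames;
	if (take > q->capacity_left)
		take = q->capacity_left;
	q->capacity_left -= take;
	return take;
}
```

Returning the count of enqueued frames, rather than a boolean, is what lets the caller resume from the first frame that was not accepted.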