1. 27 Mar, 2020 1 commit
  2. 24 Mar, 2020 1 commit
  3. 20 Mar, 2020 9 commits
  4. 19 Mar, 2020 1 commit
  5. 18 Mar, 2020 5 commits
    • R
      net: phylink: pcs: add 802.3 clause 22 helpers · 74db1c18
      Committed by Russell King
      Implement helpers for PCS accessed via the MII bus using 802.3 clause
      22 cycles, conforming to 802.3 clause 37 and Cisco SGMII specifications
      for the advertisement word.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
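As a rough illustration of the advertisement word these helpers deal with, the sketch below builds a clause 37 (1000BASE-X) word from pause settings and returns the fixed word used on the MAC side of Cisco SGMII. The bit values are assumed to match the `ADVERTISE_*` constants in `linux/mii.h`; the function name and its arguments are hypothetical and are not taken from the patch.

```python
# Advertisement bits, assumed to match the ADVERTISE_* constants in linux/mii.h.
ADVERTISE_SGMII = 0x0001          # fixed word sent by the MAC side in Cisco SGMII
ADVERTISE_1000XFULL = 0x0020      # clause 37: 1000BASE-X full duplex
ADVERTISE_1000XPAUSE = 0x0080     # clause 37: symmetric pause
ADVERTISE_1000XPSE_ASYM = 0x0100  # clause 37: asymmetric pause


def encode_adv(interface, pause=False, asym_pause=False):
    """Hypothetical helper: return the 16-bit advertisement word."""
    if interface == "sgmii":
        return ADVERTISE_SGMII
    if interface == "1000base-x":
        adv = ADVERTISE_1000XFULL
        if pause:
            adv |= ADVERTISE_1000XPAUSE
        if asym_pause:
            adv |= ADVERTISE_1000XPSE_ASYM
        return adv
    raise ValueError("unsupported interface mode: " + interface)
```

For example, `hex(encode_adv("1000base-x", pause=True))` gives the word that would be written to the advertisement register when only symmetric pause is requested.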
    • N
      net: bridge: vlan options: add support for tunnel mapping set/del · 569da082
      Committed by Nikolay Aleksandrov
      This patch adds support for manipulating vlan/tunnel mappings. The
      tunnel ids are globally unique and are one per-vlan. There were two
      trickier issues - first in order to support vlan ranges we have to
      compute the current tunnel id in the following way:
       - base tunnel id (attr) + current vlan id - starting vlan id
      This is in line with how the old API does vlan/tunnel mapping with ranges. We
      already have the vlan range present, so it's redundant to add another
      attribute for the tunnel range end. It's simply base tunnel id + vlan
      range. Second, to support removing mappings, we need an out-of-band way
      to tell the option manipulating function because there are no
      special/reserved tunnel id values, so we use a vlan flag to denote the
      operation is tunnel mapping removal.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
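The range arithmetic described in this commit message can be sketched as follows. This is only an illustration of the formula "base tunnel id + current vlan id - starting vlan id"; the function names are hypothetical and do not appear in the patch.

```python
def tunnel_id_for_vlan(base_tunnel_id, range_start_vid, vid):
    """Tunnel id for one vlan inside a range, per the formula in the message."""
    return base_tunnel_id + (vid - range_start_vid)


def map_vlan_range(base_tunnel_id, start_vid, end_vid):
    """Expand a vlan range into a {vlan id: tunnel id} mapping."""
    return {
        vid: tunnel_id_for_vlan(base_tunnel_id, start_vid, vid)
        for vid in range(start_vid, end_vid + 1)
    }
```

Calling `map_vlan_range(1000, 10, 12)` maps vlans 10, 11 and 12 to tunnel ids 1000, 1001 and 1002, which is why no separate attribute for the tunnel range end is needed.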
    • N
      net: bridge: vlan options: add support for tunnel id dumping · 188c67dd
      Committed by Nikolay Aleksandrov
      Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
      the tunnel id mapping. Since they're unique per vlan they can enter a
      vlan range if they're consecutive, thus we can calculate the tunnel id
      range map simply as: vlan range end id - vlan range start id. The
      starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
      is similar to how the tunnel entries can be created in a range via the
      old API (a vlan range maps to a tunnel range).
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
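A minimal sketch of the dump-side idea: consecutive vlans whose tunnel ids are also consecutive collapse into one range, and the tunnel range end follows from the vlan range length. The function name and the tuple layout are assumptions made for illustration; the real dump code likely compares other per-vlan state before merging entries.

```python
def compress_tunnel_ranges(mappings):
    """Collapse (vid, tunnel_id) pairs, sorted by vid, into ranges.

    Returns a list of (start_vid, end_vid, start_tunnel_id) tuples.
    """
    ranges = []
    for vid, tunnel_id in mappings:
        if ranges:
            start_vid, end_vid, start_tid = ranges[-1]
            # Extend the range only if both the vlan id and tunnel id continue it.
            if vid == end_vid + 1 and tunnel_id == start_tid + (vid - start_vid):
                ranges[-1] = (start_vid, vid, start_tid)
                continue
        ranges.append((vid, vid, tunnel_id))
    return ranges


def tunnel_range_end(start_vid, end_vid, start_tunnel_id):
    """Tunnel range end derived from the vlan range, as the message describes."""
    return start_tunnel_id + (end_vid - start_vid)
```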
    • E
      net_sched: sch_fq: enable use of hrtimer slack · 583396f4
      Committed by Eric Dumazet
      Add a new attribute to control the fq qdisc hrtimer slack.
      
      Default is set to 10 usec.
      
      When/if packets are throttled, fq sets up an hrtimer that can
      lead to one interrupt per packet in the throttled queue.
      
      By using a timer slack, we allow better use of timer interrupts,
      by giving them a chance to call multiple timer callbacks
      at each hardware interrupt.
      
      Also, giving a slack allows FQ to dequeue batches of packets
      instead of a single one, thus increasing xmit_more efficiency.
      
      This has no negative effect on the rate a TCP flow can sustain,
      since each TCP flow maintains its own precise vtime (tp->tcp_wstamp_ns)
      
      v2: added strict netlink checking (as feedback from Jakub Kicinski)
      
      Tested:
       1000 concurrent flows all using paced packets.
       1,000,000 packets sent per second.
      
      Before the patch :
      
      $ vmstat 2 10
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       0  0      0 60726784  23628 3485992    0    0   138     1  977  535  0 12 87  0  0
       0  0      0 60714700  23628 3485628    0    0     0     0 1568827 26462  0 22 78  0  0
       1  0      0 60716012  23628 3485656    0    0     0     0 1570034 26216  0 22 78  0  0
       0  0      0 60722420  23628 3485492    0    0     0     0 1567230 26424  0 22 78  0  0
       0  0      0 60727484  23628 3485556    0    0     0     0 1568220 26200  0 22 78  0  0
       2  0      0 60718900  23628 3485380    0    0     0    40 1564721 26630  0 22 78  0  0
       2  0      0 60718096  23628 3485332    0    0     0     0 1562593 26432  0 22 78  0  0
       0  0      0 60719608  23628 3485064    0    0     0     0 1563806 26238  0 22 78  0  0
       1  0      0 60722876  23628 3485236    0    0     0   130 1565874 26566  0 22 78  0  0
       1  0      0 60722752  23628 3484908    0    0     0     0 1567646 26247  0 22 78  0  0
      
      After the patch, slack of 10 usec, we can see a reduction of interrupts
      per second, and a small decrease of reported cpu usage.
      
      $ vmstat 2 10
      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       1  0      0 60722564  23628 3484728    0    0   133     1  696  545  0 13 87  0  0
       1  0      0 60722568  23628 3484824    0    0     0     0 977278 25469  0 20 80  0  0
       0  0      0 60716396  23628 3484764    0    0     0     0 979997 25326  0 20 80  0  0
       0  0      0 60713844  23628 3484960    0    0     0     0 981394 25249  0 20 80  0  0
       2  0      0 60720468  23628 3484916    0    0     0     0 982860 25062  0 20 80  0  0
       1  0      0 60721236  23628 3484856    0    0     0     0 982867 25100  0 20 80  0  0
       1  0      0 60722400  23628 3484456    0    0     0     8 982698 25303  0 20 80  0  0
       0  0      0 60715396  23628 3484428    0    0     0     0 981777 25176  0 20 80  0  0
       0  0      0 60716520  23628 3486544    0    0     0    36 978965 27857  0 21 79  0  0
       0  0      0 60719592  23628 3486516    0    0     0    22 977318 25106  0 20 80  0  0
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
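To put a number on the reduction, the sketch below averages the `in` (interrupts per second) column of the two vmstat runs quoted in this commit message. The first sample of each run is left out on the assumption that, as usual for vmstat, it reports an average since boot rather than the measurement interval.

```python
# "in" column from the vmstat output above, first sample of each run omitted.
BEFORE = [1568827, 1570034, 1567230, 1568220, 1564721,
          1562593, 1563806, 1565874, 1567646]
AFTER = [977278, 979997, 981394, 982860, 982867,
         982698, 981777, 978965, 977318]


def mean(values):
    return sum(values) / len(values)


def interrupt_reduction(before, after):
    """Fractional drop in mean interrupts per second."""
    return 1.0 - mean(after) / mean(before)


reduction = interrupt_reduction(BEFORE, AFTER)
print("interrupts/s reduced by {:.1%}".format(reduction))
```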
    • L
      netfilter: Introduce egress hook · 8537f786
      Committed by Lukas Wunner
      Commit e687ad60 ("netfilter: add netfilter ingress hook after
      handle_ing() under unique static key") introduced the ability to
      classify packets on ingress.
      
      Allow the same on egress.  Position the hook immediately before a packet
      is handed to tc and then sent out on an interface, thereby mirroring the
      ingress order.  This order allows marking packets in the netfilter
      egress hook and subsequently using the mark in tc.  Another benefit of
      this order is consistency with a lot of existing documentation which
      says that egress tc is performed after netfilter hooks.
      
      Egress hooks already exist for the most common protocols, such as
      NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
      they are executed earlier during packet processing.  However for more
      exotic protocols, there is currently no provision to apply netfilter on
      egress.  A common workaround is to enslave the interface to a bridge and
      use ebtables, or to resort to tc.  But when the ingress hook was
      introduced, consensus was that users should be given the choice to use
      netfilter or tc, whichever tool suits their needs best:
      https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
      This hook is also useful for NAT46/NAT64, tunneling and filtering of
      locally generated af_packet traffic such as dhclient.
      
      There have also been occasional user requests for a netfilter egress
      hook in the past, e.g.:
      https://www.spinics.net/lists/netfilter/msg50038.html
      
      Performance measurements with pktgen surprisingly show a speedup rather
      than a slowdown with this commit:
      
      * Without this commit:
        Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
        2920481pps 1401Mb/sec (1401830880bps) errors: 0
      
      * With this commit:
        Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
        2941410pps 1411Mb/sec (1411876800bps) errors: 0
      
      * Without this commit + tc egress:
        Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
        2562631pps 1230Mb/sec (1230062880bps) errors: 0
      
      * With this commit + tc egress:
        Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
        2659259pps 1276Mb/sec (1276444320bps) errors: 0
      
      * With this commit + nft egress:
        Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
        2413320pps 1158Mb/sec (1158393600bps) errors: 0
      
      Tested on a bare-metal Core i7-3615QM, each measurement was performed
      three times to verify that the numbers are stable.
      
      Commands to perform a measurement:
      modprobe pktgen
      echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
      samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000
      
      Commands for testing tc egress:
      tc qdisc add dev lo clsact
      tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32
      
      Commands for testing nft egress:
      nft add table netdev t
      nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
      nft add rule netdev t co ip daddr 4.3.2.1/32 drop
      
      All testing was performed on the loopback interface to avoid distorting
      measurements by the packet handling in the low-level Ethernet driver.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
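The pktgen figures quoted in this commit message can be compared with a few lines of arithmetic. The sketch below only restates the packets-per-second values from the message as relative changes; the dictionary keys are labels chosen for this illustration.

```python
# Packets per second, copied from the pktgen results above.
PPS = {
    "without commit": 2920481,
    "with commit": 2941410,
    "without commit + tc egress": 2562631,
    "with commit + tc egress": 2659259,
    "with commit + nft egress": 2413320,
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


plain = pct_change(PPS["without commit"], PPS["with commit"])
tc = pct_change(PPS["without commit + tc egress"], PPS["with commit + tc egress"])
nft = pct_change(PPS["with commit"], PPS["with commit + nft egress"])
print("plain: {:+.2f}%  tc egress: {:+.2f}%  nft egress rule: {:+.2f}%".format(plain, tc, nft))
```

The last figure compares an active nft egress rule against the patched kernel with no rule loaded, so it reflects the cost of using the hook rather than the cost of having it compiled in.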
  6. 16 Mar, 2020 1 commit
    • E
      macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw) · 48ef50fa
      Committed by Era Mayflower
      Netlink support of extended packet number cipher suites,
      allows adding and updating XPN macsec interfaces.
      
      Added support in:
          * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
          * Setting and getting 64-bit packet numbers of SAs.
          * Setting (only on SA creation) and getting ssci of SAs.
          * Setting salt when installing a SAK.
      
      Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
          * MACSEC_CIPHER_ID_GCM_AES_XPN_128
          * MACSEC_CIPHER_ID_GCM_AES_XPN_256
      
      In addition, added 2 new netlink attribute types:
          * MACSEC_SA_ATTR_SSCI
          * MACSEC_SA_ATTR_SALT
      
      Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.
      Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
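For context on why the SSCI and salt attributes exist, here is a minimal sketch of how an XPN cipher suite is generally understood to form its 96-bit IV: the 32-bit SSCI concatenated with the 64-bit packet number, XORed with the 96-bit salt. This reflects my reading of IEEE 802.1AEbw rather than anything stated in the commit message, and the function name is hypothetical.

```python
def xpn_iv(salt, ssci, pn):
    """Sketch of the XPN IV: (SSCI || PN) XOR salt, all big-endian.

    salt: 12 bytes, ssci: 32-bit integer, pn: 64-bit packet number.
    """
    if len(salt) != 12:
        raise ValueError("salt must be 96 bits (12 bytes)")
    block = ssci.to_bytes(4, "big") + pn.to_bytes(8, "big")
    return bytes(s ^ b for s, b in zip(salt, block))
```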
  7. 15 Mar, 2020 5 commits
    • G
      netfilter: Replace zero-length array with flexible-array member · 6daf1414
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      Lastly, fix checkpatch.pl warning
      WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
      in net/bridge/netfilter/ebtables.c
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • X
      netfilter: nft_tunnel: add support for geneve opts · 925d8446
      Committed by Xin Long
      Like vxlan and erspan opts, geneve opts should also be supported in
      nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14)
      allows a geneve packet to carry multiple geneve opts. So with this
      patch, nftables/libnftnl would do:
      
        # nft add table ip filter
        # nft add chain ip filter input { type filter hook input priority 0 \; }
        # nft add tunnel filter geneve_02 { type geneve\; id 2\; \
          ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \
          sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \
          opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; }
        # nft list tunnels table filter
          table ip filter {
          	tunnel geneve_02 {
          		id 2
          		ip saddr 192.168.1.1
          		ip daddr 192.168.1.2
          		sport 9000
          		dport 9001
          		tos 18
          		ttl 64
          		flags 1
          		geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890
          	}
          }
      
      v1->v2:
        - no changes, just post it separately.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
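The `opts` string in the example above carries several geneve options at once. A small sketch of how such a string could be split, assuming each comma-separated item is `class:type:data` with class and type read as hexadecimal and data as a hex byte string (this layout is an assumption based on the example, and the function name is hypothetical):

```python
def parse_geneve_opts(text):
    """Split 'class:type:data,...' into (class, type, data bytes) tuples."""
    opts = []
    for item in text.split(","):
        opt_class, opt_type, data_hex = item.split(":")
        data = bytes.fromhex(data_hex)
        # Geneve option data is carried in multiples of 4 bytes.
        if len(data) % 4 != 0:
            raise ValueError("option data must be a multiple of 4 bytes: " + item)
        opts.append((int(opt_class, 16), int(opt_type, 16), data))
    return opts


opts = parse_geneve_opts("1:1:34567890,2:2:12121212,3:3:1212121234567890")
```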
    • M
      netfilter: xtables: Add snapshot of hardidletimer target · 68983a35
      Committed by Manoj Basapathi
      This is a snapshot of hardidletimer netfilter target.
      
      This patch implements a hardidletimer Xtables target that can be
      used to identify when interfaces have been idle for a certain period
      of time.
      
      Timers are identified by labels and are created when a rule is set
      with a new label. The rules also take a timeout value (in seconds) as
      an option. If more than one rule uses the same timer label, the timer
      will be restarted whenever any of the rules get a hit.
      
      One entry for each timer is created in sysfs. This attribute contains
      the time remaining until the timer expires. The attributes are
      located under the xt_idletimer class:
      
      /sys/class/xt_idletimer/timers/<label>
      
      When the timer expires, the target module sends a sysfs notification
      to the userspace, which can then decide what to do (e.g. disconnect to
      save power)
      
      Compared to IDLETIMER, HARDIDLETIMER can send notifications when
      CPU is in suspend too, to notify the timer expiry.
      
      v1->v2: Moved all functionality into IDLETIMER module to avoid
      code duplication per comment from Florian.
      Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
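From userspace, the per-timer sysfs attribute described in this commit message could be read along these lines. This sketch assumes the attribute holds a single decimal number of seconds; the function name and the `base` parameter are hypothetical, with `base` added only so the path can be overridden.

```python
import os


def read_remaining_seconds(label, base="/sys/class/xt_idletimer/timers"):
    """Return the remaining time for the timer with the given label.

    Assumes the attribute content is one decimal integer (seconds).
    """
    with open(os.path.join(base, label)) as attr:
        return int(attr.read().strip())
```

A daemon would typically wait for the sysfs notification mentioned above and then read the attribute, rather than polling it in a loop.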
    • P
      net: sched: RED: Introduce an ECN nodrop mode · 0a7fad23
      Committed by Petr Machata
      When the RED Qdisc is currently configured to enable ECN, the RED algorithm
      is used to decide whether a certain SKB should be marked. If that SKB is
      not ECN-capable, it is early-dropped.
      
      It is also possible to keep all traffic in the queue, and just mark the
      ECN-capable subset of it, as appropriate under the RED algorithm. Some
      switches support this mode, and some installations make use of it.
      
      To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
      configured with this flag, non-ECT traffic is enqueued instead of being
      early-dropped.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
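The behaviour change can be summarised as a small decision function. This is a sketch of the logic described in the commit message, not the qdisc code: it covers only the case where the RED algorithm has already selected a packet for early congestion signalling, and the function and parameter names are hypothetical.

```python
def red_early_action(ecn_enabled, nodrop, packet_is_ect):
    """Action for a packet that RED has selected for marking or early drop."""
    if ecn_enabled and packet_is_ect:
        return "mark"      # ECN-capable traffic is marked, with or without nodrop
    if ecn_enabled and nodrop:
        return "enqueue"   # nodrop mode: non-ECT traffic stays in the queue
    return "drop"          # previous behaviour: early drop
```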
    • P
      net: sched: Allow extending set of supported RED flags · 14bc175d
      Committed by Petr Machata
      The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool
      of global RED flags. These are passed in tc_red_qopt.flags. However none of
      these qdiscs validate the flag field, and just copy it over wholesale to
      internal structures, and later dump it back. (An exception is GRED, which
      does validate for VQs -- however not for the main setup.)
      
      A broken userspace can therefore configure a qdisc with arbitrary
      unsupported flags, and later expect to see the flags on qdisc dump. The
      current ABI therefore allows storage of several bits of custom data to
      qdisc instances of the types mentioned above. How many bits, depends on
      which flags are meaningful for the qdisc in question. E.g. SFQ recognizes
      flags ECN and HARDDROP, and the rest is not interpreted.
      
      If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it,
      and at the same time it needs to retain the possibility to store 6 bits of
      uninterpreted data. Likewise RED, which adds a new flag later in this
      patchset.
      
      To that end, this patch adds a new function, red_get_flags(), to split the
      passed flags of RED-like qdiscs to flags and user bits, and
      red_validate_flags() to validate the resulting configuration. It further
      adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
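The split between interpreted flags and opaque user bits can be sketched as a pair of masks over the 8-bit flags field. The flag values are assumed to match the `TC_RED_*` constants in `linux/pkt_sched.h`; `split_red_flags` is a hypothetical name that illustrates the idea and does not reproduce the signature of `red_get_flags()`.

```python
# Flag values assumed to match the TC_RED_* constants in linux/pkt_sched.h.
TC_RED_ECN = 1
TC_RED_HARDDROP = 2
TC_RED_ADAPTATIVE = 4


def split_red_flags(qopt_flags, historic_mask):
    """Split the 8-bit flags field into (interpreted flags, user bits)."""
    flags = qopt_flags & historic_mask
    userbits = qopt_flags & ~historic_mask & 0xFF
    return flags, userbits


# SFQ interprets only ECN and HARDDROP, so the other 6 bits are user data.
SFQ_MASK = TC_RED_ECN | TC_RED_HARDDROP
flags, userbits = split_red_flags(0xB5, SFQ_MASK)
```

Keeping the user bits separate lets a dump hand back exactly what userspace stored, while new flags can be passed through the dedicated `TCA_RED_FLAGS` attribute.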
  8. 13 Mar, 2020 14 commits
  9. 12 Mar, 2020 1 commit
  10. 10 Mar, 2020 1 commit
  11. 09 Mar, 2020 1 commit