1. 08 4月, 2019 7 次提交
    • F
      xfrm: remove output2 indirection from xfrm_mode · 1de70830
      Florian Westphal 提交于
      similar to previous patch: no external module dependencies,
      so we can avoid the indirection by placing this in the core.
      
      This change removes the last indirection from xfrm_mode and the
      xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.
      
      Before:
         text    data     bss     dec     hex filename
         3957     136       0    4093     ffd net/xfrm/xfrm_output.o
          587      44       0     631     277 net/ipv4/xfrm4_mode_beet.o
          649      32       0     681     2a9 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669     29d net/ipv6/xfrm6_mode_beet.o
          599      32       0     631     277 net/ipv6/xfrm6_mode_tunnel.o
      After:
         text    data     bss     dec     hex filename
         5359     184       0    5543    15a7 net/xfrm/xfrm_output.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_beet.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_tunnel.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_beet.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: fold the *encap_add functions into xfrm*_prepare_output
          preserve (move) output2 comment (Sabrina)
          use x->outer_mode->encap, not inner
          fix a build breakage on ppc (kbuild robot)
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      1de70830
    • F
      xfrm: remove input2 indirection from xfrm_mode · b3284df1
      Florian Westphal 提交于
      No external dependencies on any module, place this in the core.
      Increase is about 1800 byte for xfrm_input.o.
      
      The beet helpers get added to internal header, as they can be reused
      from xfrm_output.c in the next patch (kernel contains several
      copies of them in the xfrm{4,6}_mode_beet.c files).
      
      Before:
         text    data     bss     dec filename
         5578     176    2364    8118 net/xfrm/xfrm_input.o
         1180      64       0    1244 net/ipv4/xfrm4_mode_beet.o
          171      40       0     211 net/ipv4/xfrm4_mode_transport.o
         1163      40       0    1203 net/ipv4/xfrm4_mode_tunnel.o
         1083      52       0    1135 net/ipv6/xfrm6_mode_beet.o
          172      40       0     212 net/ipv6/xfrm6_mode_ro.o
          172      40       0     212 net/ipv6/xfrm6_mode_transport.o
         1056      40       0    1096 net/ipv6/xfrm6_mode_tunnel.o
      
      After:
         text    data     bss     dec filename
         7373     200    2364    9937 net/xfrm/xfrm_input.o
          587      44       0     631 net/ipv4/xfrm4_mode_beet.o
          171      32       0     203 net/ipv4/xfrm4_mode_transport.o
          649      32       0     681 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669 net/ipv6/xfrm6_mode_beet.o
          172      32       0     204 net/ipv6/xfrm6_mode_ro.o
          172      32       0     204 net/ipv6/xfrm6_mode_transport.o
          599      32       0     631 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix
          AF_UNSPEC selector breakage (bisected by Benedict Wong)
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      b3284df1
    • F
      xfrm: remove gso_segment indirection from xfrm_mode · 7613b92b
      Florian Westphal 提交于
      These functions are small and we only have versions for tunnel
      and transport mode for ipv4 and ipv6 respectively.
      
      Just place the 'transport or tunnel' conditional in the protocol
      specific function instead of using an indirection.
      
      Before:
          3226       12       0     3238   net/ipv4/esp4_offload.o
          7004      492       0     7496   net/ipv4/ip_vti.o
          3339       12       0     3351   net/ipv6/esp6_offload.o
         11294      460       0    11754   net/ipv6/ip6_vti.o
          1180       72       0     1252   net/ipv4/xfrm4_mode_beet.o
           428       48       0      476   net/ipv4/xfrm4_mode_transport.o
          1271       48       0     1319   net/ipv4/xfrm4_mode_tunnel.o
          1083       60       0     1143   net/ipv6/xfrm6_mode_beet.o
           172       48       0      220   net/ipv6/xfrm6_mode_ro.o
           429       48       0      477   net/ipv6/xfrm6_mode_transport.o
          1164       48       0     1212   net/ipv6/xfrm6_mode_tunnel.o
      15730428  6937008 4046908 26714344   vmlinux
      
      After:
          3461       12       0     3473   net/ipv4/esp4_offload.o
          7000      492       0     7492   net/ipv4/ip_vti.o
          3574       12       0     3586   net/ipv6/esp6_offload.o
         11295      460       0    11755   net/ipv6/ip6_vti.o
          1180       64       0     1244   net/ipv4/xfrm4_mode_beet.o
           171       40       0      211   net/ipv4/xfrm4_mode_transport.o
          1163       40       0     1203   net/ipv4/xfrm4_mode_tunnel.o
          1083       52       0     1135   net/ipv6/xfrm6_mode_beet.o
           172       40       0      212   net/ipv6/xfrm6_mode_ro.o
           172       40       0      212   net/ipv6/xfrm6_mode_transport.o
          1056       40       0     1096   net/ipv6/xfrm6_mode_tunnel.o
      15730424  6937008 4046908 26714340   vmlinux
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      7613b92b
    • F
      xfrm: remove xmit indirection from xfrm_mode · 303c5fab
      Florian Westphal 提交于
      There are only two versions (tunnel and transport). The ip/ipv6 versions
      are only differ in sizeof(iphdr) vs ipv6hdr.
      
      Place this in the core and use x->outer_mode->encap type to call the
      correct adjustment helper.
      
      Before:
         text   data    bss     dec      filename
      15730311  6937008 4046908 26714227 vmlinux
      
      After:
      15730428  6937008 4046908 26714344 vmlinux
      
      (about 117 byte increase)
      
      v2: use family from x->outer_mode, not inner
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      303c5fab
    • F
      xfrm: remove output indirection from xfrm_mode · 0c620e97
      Florian Westphal 提交于
      Same is input indirection.  Only exception: we need to export
      xfrm_outer_mode_output for pktgen.
      
      Increases size of vmlinux by about 163 byte:
      Before:
         text    data     bss     dec      filename
      15730208  6936948 4046908 26714064   vmlinux
      
      After:
      15730311  6937008 4046908 26714227   vmlinux
      
      xfrm_inner_extract_output has no more external callers, make it static.
      
      v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
          add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
          add WARN_ON_ONCE for 'call AF_INET6 related output function, but
          CONFIG_IPV6=n' case.
          make xfrm_inner_extract_output static
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      0c620e97
    • F
      xfrm: remove input indirection from xfrm_mode · c2d305e5
      Florian Westphal 提交于
      No need for any indirection or abstraction here, both functions
      are pretty much the same and quite small, they also have no external
      dependencies.
      
      xfrm_prepare_input can then be made static.
      
      With allmodconfig build, size increase of vmlinux is 25 byte:
      
      Before:
         text   data     bss     dec      filename
      15730207  6936924 4046908 26714039  vmlinux
      
      After:
      15730208  6936948 4046908 26714064 vmlinux
      
      v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca)
          change copied comment to refer to transport and network header,
          not skb->{h,nh}, which don't exist anymore. (Sabrina)
          make xfrm_prepare_input static (Eyal Birger)
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      c2d305e5
    • F
      xfrm: place af number into xfrm_mode struct · b262a695
      Florian Westphal 提交于
      This will be useful to know if we're supposed to decode ipv4 or ipv6.
      
      While at it, make the unregister function return void, all module_exit
      functions did just BUG(); there is never a point in doing error checks
      if there is no way to handle such error.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      b262a695
  2. 22 3月, 2019 4 次提交
    • J
      genetlink: make policy common to family · 3b0f31f2
      Johannes Berg 提交于
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b0f31f2
    • J
      net: dst: remove gc leftovers · 02afc7ad
      Julian Wiedmann 提交于
      Get rid of some obsolete gc-related documentation and macros that were
      missed in commit 5b7c9a8f ("net: remove dst gc related code").
      
      CC: Wei Wang <weiwan@google.com>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02afc7ad
    • D
      ipv4: Allow amount of dirty memory from fib resizing to be controllable · 9ab948a9
      David Ahern 提交于
      fib_trie implementation calls synchronize_rcu when a certain amount of
      pages are dirty from freed entries. The number of pages was determined
      experimentally in 2009 (commit c3059477).
      
      At the current setting, synchronize_rcu is called often -- 51 times in a
      second in one test with an average of an 8 msec delay adding a fib entry.
      The total impact is a lot of slow down modifying the fib. This is seen
      in the output of 'time' - the difference between real time and sys+user.
      For example, using 720,022 single path routes and 'ip -batch'[1]:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m14.214s
          user    0m2.513s
          sys     0m6.783s
      
      So roughly 35% of the actual time to install the routes is from the ip
      command getting scheduled out, most notably due to synchronize_rcu (this
      is observed using 'perf sched timehist').
      
      This patch makes the amount of dirty memory configurable between 64k where
      the synchronize_rcu is called often (small, low end systems that are memory
      sensitive) to 64M where synchronize_rcu is called rarely during a large
      FIB change (for high end systems with lots of memory). The default is 512kB
      which corresponds to the current setting of 128 pages with a 4kB page size.
      
      As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
      in a second blocking for up to 30 msec in a single instance, and a total
      of almost 100 msec across the 4 calls in the second. The trade off is
      allowing FIB entries to consume more memory in a given time window but
      but with much better fib insertion rates (~30% increase in prefixes/sec).
      With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
      file runs in:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m9.692s
          user    0m2.491s
          sys     0m6.769s
      
      So the dead time is reduced to about 1/2 second or <5% of the real time.
      
      [1] 'ip' modified to not request ACK messages which improves route
          insertion times by about 20%
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab948a9
    • D
      ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create · c7a1ce39
      David Ahern 提交于
      Change addrconf_f6i_alloc to generate a fib6_config and call
      ip6_route_info_create. addrconf_f6i_alloc is the last caller to
      fib6_info_alloc besides ip6_route_info_create, and there is no
      reason for it to do its own initialization on a fib6_info.
      
      Host routes need to be created even if the device is down, so add a
      new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init
      to not error out if device is not up.
      
      Notes on the conversion:
      - ip_fib_metrics_init is the same as fib6_config has fc_mx set to NULL
        and fc_mx_len set to 0
      - dst_nocount is handled by the RTF_ADDRCONF flag
      - dst_host is handled by fc_dst_len = 128
      
      nh_gw does not get set after the conversion to ip6_route_info_create
      but it should not be set in addrconf_f6i_alloc since this is a host
      route not a gateway route.
      
      Everything else is a straight forward map between fib6_info and
      fib6_config.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7a1ce39
  3. 21 3月, 2019 2 次提交
  4. 20 3月, 2019 2 次提交
  5. 19 3月, 2019 2 次提交
    • X
      sctp: get sctphdr by offset in sctp_compute_cksum · 273160ff
      Xin Long 提交于
      sctp_hdr(skb) only works when skb->transport_header is set properly.
      
      But in Netfilter, skb->transport_header for ipv6 is not guaranteed
      to be right value for sctphdr. It would cause to fail to check the
      checksum for sctp packets.
      
      So fix it by using offset, which is always right in all places.
      
      v1->v2:
        - Fix the changelog.
      
      Fixes: e6d8b64b ("net: sctp: fix and consolidate SCTP checksumming code")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      273160ff
    • M
      packets: Always register packet sk in the same order · a4dc6a49
      Maxime Chevallier 提交于
      When using fanouts with AF_PACKET, the demux functions such as
      fanout_demux_cpu will return an index in the fanout socket array, which
      corresponds to the selected socket.
      
      The ordering of this array depends on the order the sockets were added
      to a given fanout group, so for FANOUT_CPU this means sockets are bound
      to cpus in the order they are configured, which is OK.
      
      However, when stopping then restarting the interface these sockets are
      bound to, the sockets are reassigned to the fanout group in the reverse
      order, due to the fact that they were inserted at the head of the
      interface's AF_PACKET socket list.
      
      This means that traffic that was directed to the first socket in the
      fanout group is now directed to the last one after an interface restart.
      
      In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
      socket that used to receive traffic from the last CPU after an interface
      restart.
      
      This commit introduces a helper to add a socket at the tail of a list,
      then uses it to register AF_PACKET sockets.
      
      Note that this changes the order in which sockets are listed in /proc and
      with sock_diag.
      
      Fixes: dc99f600 ("packet: Add fanout support")
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4dc6a49
  6. 16 3月, 2019 1 次提交
  7. 13 3月, 2019 1 次提交
  8. 11 3月, 2019 1 次提交
  9. 10 3月, 2019 1 次提交
  10. 08 3月, 2019 1 次提交
    • P
      netfilter: nf_tables: fix set double-free in abort path · 40ba1d9b
      Pablo Neira Ayuso 提交于
      The abort path can cause a double-free of an anonymous set.
      Added-and-to-be-aborted rule looks like this:
      
      udp dport { 137, 138 } drop
      
      The to-be-aborted transaction list looks like this:
      
      newset
      newsetelem
      newsetelem
      rule
      
      This gets walked in reverse order, so first pass disables the rule, the
      set elements, then the set.
      
      After synchronize_rcu(), we then destroy those in same order: rule, set
      element, set element, newset.
      
      Problem is that the anonymous set has already been bound to the rule, so
      the rule (lookup expression destructor) already frees the set, when then
      cause use-after-free when trying to delete the elements from this set,
      then try to free the set again when handling the newset expression.
      
      Rule releases the bound set in first place from the abort path, this
      causes the use-after-free on set element removal when undoing the new
      element transactions. To handle this, skip new element transaction if
      set is bound from the abort path.
      
      This is still causes the use-after-free on set element removal.  To
      handle this, remove transaction from the list when the set is already
      bound.
      
      Joint work with Florian Westphal.
      
      Fixes: f6ac8585 ("netfilter: nf_tables: unbind set in rule from commit path")
      Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      40ba1d9b
  11. 05 3月, 2019 1 次提交
  12. 04 3月, 2019 3 次提交
    • B
      tls: Fix write space handling · 7463d3a2
      Boris Pismenny 提交于
      TLS device cannot use the sw context. This patch returns the original
      tls device write space handler and moves the sw/device specific portions
      to the relevant files.
      
      Also, we remove the write_space call for the tls_sw flow, because it
      handles partial records in its delayed tx work handler.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Reviewed-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7463d3a2
    • B
      tls: Fix tls_device handling of partial records · 94850257
      Boris Pismenny 提交于
      Cleanup the handling of partial records while fixing a bug where the
      tls_push_pending_closed_record function is using the software tls
      context instead of the hardware context.
      
      The bug resulted in the following crash:
      [   88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [   88.793271] #PF error: [normal kernel read fault]
      [   88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
      [   88.795958] Oops: 0000 [#1] SMP PTI
      [   88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
      [   88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
      [   88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
      4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [   88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
      [   88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
      [   88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
      [   88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
      [   88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
      [   88.814506] FS:  00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
      [   88.816337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
      [   88.819328] Call Trace:
      [   88.820123]  tls_push_data+0x628/0x6a0 [tls]
      [   88.821283]  ? remove_wait_queue+0x20/0x60
      [   88.822383]  ? n_tty_read+0x683/0x910
      [   88.823363]  tls_device_sendmsg+0x53/0xa0 [tls]
      [   88.824505]  sock_sendmsg+0x36/0x50
      [   88.825492]  sock_write_iter+0x87/0x100
      [   88.826521]  __vfs_write+0x127/0x1b0
      [   88.827499]  vfs_write+0xad/0x1b0
      [   88.828454]  ksys_write+0x52/0xc0
      [   88.829378]  do_syscall_64+0x5b/0x180
      [   88.830369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   88.831603] RIP: 0033:0x7fcad1451680
      
      [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [ 1248.472564] #PF error: [normal kernel read fault]
      [ 1248.473790] PGD 0 P4D 0
      [ 1248.474642] Oops: 0000 [#1] SMP PTI
      [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G           OE 5.0.0-rc4+ #3
      [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
      [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
      1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
      [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
      [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
      [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
      [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
      [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
      [ 1248.494923] FS:  0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
      [ 1248.496847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
      [ 1248.500136] Call Trace:
      [ 1248.500998]  ? tcp_check_oom+0xd0/0xd0
      [ 1248.502106]  tls_sk_proto_close+0x127/0x1e0 [tls]
      [ 1248.503411]  inet_release+0x3c/0x60
      [ 1248.504530]  __sock_release+0x3d/0xb0
      [ 1248.505611]  sock_close+0x11/0x20
      [ 1248.506612]  __fput+0xb4/0x220
      [ 1248.507559]  task_work_run+0x88/0xa0
      [ 1248.508617]  do_exit+0x2cb/0xbc0
      [ 1248.509597]  ? core_sys_select+0x17a/0x280
      [ 1248.510740]  do_group_exit+0x39/0xb0
      [ 1248.511789]  get_signal+0x1d0/0x630
      [ 1248.512823]  do_signal+0x36/0x620
      [ 1248.513822]  exit_to_usermode_loop+0x5c/0xc6
      [ 1248.515003]  do_syscall_64+0x157/0x180
      [ 1248.516094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1248.517456] RIP: 0033:0x7fb398bd3f53
      [ 1248.518537] Code: Bad RIP value.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94850257
    • T
      net: dsa: add KSZ9893 switch tagging support · 88b573af
      Tristram Ha 提交于
      KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag
      has 1 byte instead of 2 bytes.  The size of the portmap is smaller and
      so the override and lookup bits are also moved.
      Signed-off-by: NTristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88b573af
  13. 03 3月, 2019 1 次提交
    • E
      net: sched: put back q.qlen into a single location · 46b1c18f
      Eric Dumazet 提交于
      In the series fc8b81a5 ("Merge branch 'lockless-qdisc-series'")
      John made the assumption that the data path had no need to read
      the qdisc qlen (number of packets in the qdisc).
      
      It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
      children.
      
      But pfifo_fast can be used as leaf in class full qdiscs, and existing
      logic needs to access the child qlen in an efficient way.
      
      HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
        htb_activate() -> WARN_ON()
        htb_dequeue_tree() to decide if a class can be htb_deactivated
        when it has no more packets.
      
      HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
      qdisc_tree_reduce_backlog() also read q.qlen directly.
      
      Using qdisc_qlen_sum() (which iterates over all possible cpus)
      in the data path is a non starter.
      
      It seems we have to put back qlen in a central location,
      at least for stable kernels.
      
      For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
      so the existing q.qlen{++|--} are correct.
      
      For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
      because the spinlock might be not held (for example from
      pfifo_fast_enqueue() and pfifo_fast_dequeue())
      
      This patch adds atomic_qlen (in the same location than qlen)
      and renames the following helpers, since we want to express
      they can be used without qdisc lock, and that qlen is no longer percpu.
      
      - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
      - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
      
      Later (net-next) we might revert this patch by tracking all these
      qlen uses and replace them by a more efficient method (not having
      to access a precise qlen, but an empty/non_empty status that might
      be less expensive to maintain/track).
      
      Another possibility is to have a legacy pfifo_fast version that would
      be used when used a a child qdisc, since the parent qdisc needs
      a spinlock anyway. But then, future lockless qdiscs would also
      have the same problem.
      
      Fixes: 7e66016f ("net: sched: helpers to sum qlen and qlen for per cpu logic")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46b1c18f
  14. 02 3月, 2019 2 次提交
  15. 01 3月, 2019 2 次提交
  16. 28 2月, 2019 4 次提交
  17. 27 2月, 2019 5 次提交