1. 05 9月, 2018 7 次提交
    • S
      mac80211: support reporting 0-length PSDU in radiotap · c3d1f875
      Shaul Triebitz 提交于
      For certain sounding frames, it may be useful to report them
      to userspace even though they don't have a PSDU in order to
      determine the PHY parameters (e.g. VHT rate/stream config.)
      Add support for this to mac80211.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NShaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      c3d1f875
    • A
      mac80211: Fix PTK rekey freezes and clear text leak · 62872a9b
      Alexander Wetzel 提交于
      Rekeying PTK keys without "Extended Key ID for Individually Addressed
      Frames" did use a procedure not suitable to replace in-use keys and
      could caused the following issues:
      
       1) Freeze caused by incoming frames:
          If the local STA installed the key prior to the remote STA we still
          had the old key active in the hardware when mac80211 switched over
          to the new key.
          Therefore there was a window where the card could hand over frames
          decoded with the old key to mac80211 and bump the new PN (IV) value
          to an incorrect high number. When it happened the local replay
          detection silently started to drop all frames sent with the new key.
      
       2) Freeze caused by outgoing frames:
          If mac80211 was providing the PN (IV) and handed over a clear text
          frame for encryption to the hardware prior to a key change the
          driver/card could have processed the queued frame after switching
          to the new key. This bumped the PN value on the remote STA to an
          incorrect high number, tricking the remote STA to discard all frames
          we sent later.
      
       3) Freeze caused by RX aggregation reorder buffer:
          An aggregation session started with the old key and ending after the
          switch to the new key also bumped the PN to an incorrect high number,
          freezing the connection quite similar to 1).
      
       4) Freeze caused by repeating lost frames in an aggregation session:
          A driver could repeat a lost frame and encrypt it with the new key
          while in a TX aggregation session without updating the PN for the
          new key. This also could freeze connections similar to 2).
      
       5) Clear text leak:
          Removing encryption offload from the card cleared the encryption
          offload flag only after the card had deleted the key and we did not
          stop TX during the rekey. The driver/card could therefore get
          unencrypted frames from mac80211 while no longer be instructed to
          encrypt them.
      
      To prevent those issues the key install logic has been changed:
       - Mac80211 divers known to be able to rekey PTK0 keys have to set
         @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0,
       - mac80211 stops queuing frames depending on the key during the replace
       - the key is first replaced in the hardware and after that in mac80211
       - and mac80211 stops/blocks new aggregation sessions during the rekey.
      
      For drivers not setting
      @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 the user space must avoid PTK
      rekeys if "Extended Key ID for Individually Addressed Frames" is not
      being used. Rekeys for mac80211 drivers without this flag will generate a
      warning and use an extra call to ieee80211_flush_queues() to both
      highlight and try to prevent the issues with not updated drivers.
      
      The core of the fix changes the key install procedure from:
       - atomic switch over to the new key in mac80211
       - remove the old key in the hardware (stops encryption offloading, fall
         back to software encryption with a potential clear text packet leak
         in between)
       - delete the inactive old key in mac80211
       - enable hardware encryption offloading for the new key
      to:
       - if it's a PTK mark the old key as tainted to drop TX frames with the
         outgoing key
       - replace the key in hardware with the new one
       - atomic switch over to the new (not marked as tainted) key in
         mac80211 (which also resumes TX)
       - delete the inactive old key in mac80211
      
      With the new sequence the hardware will be unable to decrypt frames
      encrypted with the old key prior to switching to the new key in mac80211
      and thus prevent PNs from packets decrypted with the old key to be
      accounted against the new key.
      
      For that to work the drivers have to provide a clear boundary.
      Mac80211 drivers setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 confirm
      to provide it and mac80211 will then be able to correctly rekey in-use
      PTK keys with those drivers.
      
      The mac80211 requirements for drivers to set the flag have been added to
      the "Hardware crypto acceleration" documentation section. It drills down
      to:
      The drivers must not hand over frames decrypted with the old key to
      mac80211 once the call to set_key() with %DISABLE_KEY has been
      completed. It's allowed to either drop or continue to use the old key
      for any outgoing frames which are already in the queues, but it must not
      send out any of them unencrypted or encrypted with the new key.
      
      Even with the new boundary in place aggregation sessions with the
      reorder buffer are problematic:
      RX aggregation session started prior and completed after the rekey could
      still dump frames received with the old key at mac80211 after it
      switched over to the new key. This is side stepped by stopping all (RX
      and TX) aggregation sessions when replacing a PTK key and hardware key
      offloading.
      Stopping TX aggregation sessions avoids the need to get
      the PNs (IVs) updated in frames prepared for the old key and
      (re)transmitted after the switch to the new key. As a bonus it improves
      the compatibility when the remote STA is not handling rekeys as it
      should.
      
      When using software crypto aggregation sessions are not stopped.
      Mac80211 won't be able to decode the dangerous frames and discard them
      without special handling.
      Signed-off-by: NAlexander Wetzel <alexander@wetzel-home.de>
      [trim overly long rekey warning]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      62872a9b
    • S
      mac80211: support radiotap L-SIG data · d1332e7b
      Shaul Triebitz 提交于
      As before with HE, the data needs to be provided by the
      driver in the skb head, since there's not enough space
      in the skb CB.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NShaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      d1332e7b
    • W
      mac80211: Store sk_pacing_shift in ieee80211_hw · 70e53669
      Wen Gong 提交于
      Make it possibly for drivers to adjust the default skb_pacing_shift
      by storing it in the hardware struct.
      Signed-off-by: NWen Gong <wgong@codeaurora.org>
      [adjust commit log, move & adjust comment]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      70e53669
    • J
      mac80211: introduce capability flags for VHT EXT NSS support · 09b4a4fa
      Johannes Berg 提交于
      Depending on whether or not rate control supports selecting
      rates depending on the bandwidth, we can use VHT extended
      NSS support. In essence, this is dot11VHTExtendedNSSBWCapable
      from the spec, since depending on that we'll need to parse
      the bandwidth.
      
      If needed, also set/clear the VHT Capability Element bit for
      this capability so that we don't advertise it erroneously or
      don't advertise it when we actually use it.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      09b4a4fa
    • S
      cfg80211: add he_capabilities (ext) IE to AP settings · 244eb9ae
      Shaul Triebitz 提交于
      Same as for HT and VHT.
      This helps the lower level to know whether the AP supports HE.
      Signed-off-by: NShaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      244eb9ae
    • J
      mac80211: add an optional TXQ for other PS-buffered frames · adf8ed01
      Johannes Berg 提交于
      Some drivers may want to also use the TXQ abstraction with
      non-data packets that need powersave buffering, so add a
      hardware flag to allow this.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      adf8ed01
  2. 02 9月, 2018 1 次提交
    • V
      net/tls: Add support for async decryption of tls records · 94524d8f
      Vakul Garg 提交于
      When tls records are decrypted using asynchronous acclerators such as
      NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
      getting -EINPROGRESS, the tls record processing stops till the time the
      crypto accelerator finishes off and returns the result. This incurs a
      context switch and is not an efficient way of accessing the crypto
      accelerators. Crypto accelerators work efficient when they are queued
      with multiple crypto jobs without having to wait for the previous ones
      to complete.
      
      The patch submits multiple crypto requests without having to wait for
      for previous ones to complete. This has been implemented for records
      which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
      for all the asynchronous decryption requests to complete.
      
      The references to records which have been sent for async decryption are
      dropped. For cases where record decryption is not possible in zero-copy
      mode, asynchronous decryption is not used and we wait for decryption
      crypto api to complete.
      
      For crypto requests executing in async fashion, the memory for
      aead_request, sglists and skb etc is freed from the decryption
      completion handler. The decryption completion handler wakesup the
      sleeping user context when recvmsg() flags that it has done sending
      all the decryption requests and there are no more decryption requests
      pending to be completed.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Reviewed-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94524d8f
  3. 01 9月, 2018 2 次提交
  4. 30 8月, 2018 4 次提交
  5. 28 8月, 2018 4 次提交
    • A
      cfg80211: Add support for 60GHz band channels 5 and 6 · 9cf0a0b4
      Alexei Avshalom Lazar 提交于
      The current support in the 60GHz band is for channels 1-4.
      Add support for channels 5 and 6.
      This requires enlarging ieee80211_channel.center_freq from u16 to u32.
      Signed-off-by: NAlexei Avshalom Lazar <ailizaro@codeaurora.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      9cf0a0b4
    • M
      mac80211: add stop/start logic for software TXQs · 21a5d4c3
      Manikanta Pubbisetty 提交于
      Sometimes, it is required to stop the transmissions momentarily and
      resume it later; stopping the txqs becomes very critical in scenarios where
      the packet transmission has to be ceased completely. For example, during
      the hardware restart, during off channel operations,
      when initiating CSA(upon detecting a radar on the DFS channel), etc.
      
      The TX queue stop/start logic in mac80211 works well in stopping the TX
      when drivers make use of netdev queues, i.e, when Qdiscs in network layer
      take care of traffic scheduling. Since the devices implementing
      wake_tx_queue can run without Qdiscs, packets will be handed to mac80211
      directly without queueing them in the netdev queues.
      
      Also, mac80211 does not invoke any of the
      netif_stop_*/netif_wake_* APIs if wake_tx_queue is implemented.
      Since the queues are not stopped in this case, transmissions can continue
      and this will impact negatively on the operation of the wireless device.
      
      For example,
      During hardware restart, we stop the netdev queues so that packets are
      not sent to the driver. Since ath10k implements wake_tx_queue,
      TX queues will not be stopped and packets might reach the hardware while
      it is restarting; this can make hardware unresponsive and the only
      possible option for recovery is to reboot the entire system.
      
      There is another problem to this, it is observed that the packets
      were sent on the DFS channel for a prolonged duration after radar
      detection impacting the channel closing time.
      
      We can still invoke netif stop/wake APIs when wake_tx_queue is implemented
      but this could lead to packet drops in network layer; adding stop/start
      logic for software TXQs in mac80211 instead makes more sense; the change
      proposed adds the same in mac80211.
      Signed-off-by: NManikanta Pubbisetty <mpubbise@codeaurora.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      21a5d4c3
    • D
      cfg80211/mac80211: make ieee80211_send_layer2_update a public function · 30ca1aa5
      Dedy Lansky 提交于
      Make ieee80211_send_layer2_update() a common function so other drivers
      can re-use it.
      Signed-off-by: NDedy Lansky <dlansky@codeaurora.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      30ca1aa5
    • S
      cfg80211: make wmm_rule part of the reg_rule structure · 38cb87ee
      Stanislaw Gruszka 提交于
      Make wmm_rule be part of the reg_rule structure. This simplifies the
      code a lot at the cost of having bigger memory usage. However in most
      cases we have only few reg_rule's and when we do have many like in
      iwlwifi we do not save memory as it allocates a separate wmm_rule for
      each channel anyway.
      
      This also fixes a bug reported in various places where somewhere the
      pointers were corrupted and we ended up doing a null-dereference.
      
      Fixes: 230ebaa1 ("cfg80211: read wmm rules from regulatory database")
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      [rephrase commit message slightly]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      38cb87ee
  6. 23 8月, 2018 1 次提交
    • A
      net_sched: fix unused variable warning in stmmac · 191672ca
      Arnd Bergmann 提交于
      The new tcf_exts_for_each_action() macro doesn't reference its
      arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
      a harmless warning in at least one driver:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
      drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]
      
      Adding a cast to void lets us avoid this kind of warning.
      To be on the safe side, do it for all three arguments, not
      just the one that caused the warning.
      
      Fixes: 244cd96a ("net_sched: remove list_head from tc_action")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      191672ca
  7. 22 8月, 2018 4 次提交
  8. 17 8月, 2018 3 次提交
    • D
      tcp, ulp: add alias for all ulp modules · 037b0b86
      Daniel Borkmann 提交于
      Lets not turn the TCP ULP lookup into an arbitrary module loader as
      we only intend to load ULP modules through this mechanism, not other
      unrelated kernel modules:
      
        [root@bar]# cat foo.c
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <linux/tcp.h>
        #include <linux/in.h>
      
        int main(void)
        {
            int sock = socket(PF_INET, SOCK_STREAM, 0);
            setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
            return 0;
        }
      
        [root@bar]# gcc foo.c -O2 -Wall
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        sctp                 1077248  4
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
        [root@bar]#
      
      Fix it by adding module alias to TCP ULP modules, so probing module
      via request_module() will be limited to tcp-ulp-[name]. The existing
      modules like kTLS will load fine given tcp-ulp-tls alias, but others
      will fail to load:
      
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        [root@bar]#
      
      Sockmap is not affected from this since it's either built-in or not.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      037b0b86
    • F
      netfilter: nf_tables: fix register ordering · d209df3e
      Florian Westphal 提交于
      We must register nfnetlink ops last, as that exposes nf_tables to
      userspace.  Without this, we could theoretically get nfnetlink request
      before net->nft state has been initialized.
      
      Fixes: 99633ab2 ("netfilter: nf_tables: complete net namespace support")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d209df3e
    • T
      netfilter: nft_set: fix allocation size overflow in privsize callback. · 4ef360dd
      Taehee Yoo 提交于
      In order to determine allocation size of set, ->privsize is invoked.
      At this point, both desc->size and size of each data structure of set
      are used. desc->size means number of element that is given by user.
      desc->size is u32 type. so that upperlimit of set element is 4294967295.
      but return type of ->privsize is also u32. hence overflow can occurred.
      
      test commands:
         %nft add table ip filter
         %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; }
         %nft list ruleset
      
      splat looks like:
      [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled
      [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7
      [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.229091] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.229091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.229091] Call Trace:
      [ 1239.229091]  ? nft_hash_remove+0xf0/0xf0 [nf_tables_set]
      [ 1239.229091]  ? memset+0x1f/0x40
      [ 1239.229091]  ? __nla_reserve+0x9f/0xb0
      [ 1239.229091]  ? memcpy+0x34/0x50
      [ 1239.229091]  nf_tables_dump_set+0x9a1/0xda0 [nf_tables]
      [ 1239.229091]  ? __kmalloc_reserve.isra.29+0x2e/0xa0
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_commit+0x2c60/0x2c60 [nf_tables]
      [ 1239.229091]  netlink_dump+0x470/0xa20
      [ 1239.229091]  __netlink_dump_start+0x5ae/0x690
      [ 1239.229091]  nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables]
      [ 1239.229091]  nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables]
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables]
      [ 1239.229091]  ? nla_parse+0xab/0x230
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? debug_show_all_locks+0x290/0x290
      [ 1239.229091]  ? sched_clock_cpu+0x132/0x170
      [ 1239.229091]  ? find_held_lock+0x39/0x1b0
      [ 1239.229091]  ? sched_clock_local+0x10d/0x130
      [ 1239.229091]  netlink_rcv_skb+0x211/0x320
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? netlink_ack+0x7b0/0x7b0
      [ 1239.229091]  ? ns_capable_common+0x6e/0x110
      [ 1239.229091]  nfnetlink_rcv+0x2d1/0x310 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink]
      [ 1239.229091]  ? netlink_deliver_tap+0x829/0x930
      [ 1239.229091]  ? lock_acquire+0x265/0x2e0
      [ 1239.229091]  netlink_unicast+0x406/0x520
      [ 1239.509725]  ? netlink_attachskb+0x5b0/0x5b0
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  netlink_sendmsg+0x987/0xa20
      [ 1239.509725]  ? netlink_unicast+0x520/0x520
      [ 1239.509725]  ? _copy_from_user+0xa9/0xc0
      [ 1239.509725]  __sys_sendto+0x21a/0x2c0
      [ 1239.509725]  ? __ia32_sys_getpeername+0xa0/0xa0
      [ 1239.509725]  ? retint_kernel+0x10/0x10
      [ 1239.509725]  ? sched_clock_cpu+0x132/0x170
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  ? lock_downgrade+0x540/0x540
      [ 1239.509725]  ? up_read+0x1c/0x100
      [ 1239.509725]  ? __do_page_fault+0x763/0x970
      [ 1239.509725]  ? retint_user+0x18/0x18
      [ 1239.509725]  __x64_sys_sendto+0x177/0x180
      [ 1239.509725]  do_syscall_64+0xaa/0x360
      [ 1239.509725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1239.509725] RIP: 0033:0x7f5a8f468e03
      [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
      [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03
      [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003
      [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c
      [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0
      [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0
      [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables
      [ 1239.670713] ---[ end trace 39375adcda140f11 ]---
      [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.751785] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.760993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.775679] Kernel panic - not syncing: Fatal exception
      [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 1239.776630] Rebooting in 5 seconds..
      
      Fixes: 20a69341 ("netfilter: nf_tables: add netlink set API")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4ef360dd
  9. 15 8月, 2018 1 次提交
  10. 13 8月, 2018 4 次提交
  11. 12 8月, 2018 6 次提交
  12. 11 8月, 2018 3 次提交
    • M
      bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection · 8217ca65
      Martin KaFai Lau 提交于
      This patch allows a BPF_PROG_TYPE_SK_REUSEPORT bpf prog to select a
      SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY introduced in
      the earlier patch.  "bpf_run_sk_reuseport()" will return -ECONNREFUSED
      when the BPF_PROG_TYPE_SK_REUSEPORT prog returns SK_DROP.
      The callers, in inet[6]_hashtable.c and ipv[46]/udp.c, are modified to
      handle this case and return NULL immediately instead of continuing the
      sk search from its hashtable.
      
      It re-uses the existing SO_ATTACH_REUSEPORT_EBPF setsockopt to attach
      BPF_PROG_TYPE_SK_REUSEPORT.  The "sk_reuseport_attach_bpf()" will check
      if the attaching bpf prog is in the new SK_REUSEPORT or the existing
      SOCKET_FILTER type and then check different things accordingly.
      
      One level of "__reuseport_attach_prog()" call is removed.  The
      "sk_unhashed() && ..." and "sk->sk_reuseport_cb" tests are pushed
      back to "reuseport_attach_prog()" in sock_reuseport.c.  sock_reuseport.c
      seems to have more knowledge on those test requirements than filter.c.
      In "reuseport_attach_prog()", after new_prog is attached to reuse->prog,
      the old_prog (if any) is also directly freed instead of returning the
      old_prog to the caller and asking the caller to free.
      
      The sysctl_optmem_max check is moved back to the
      "sk_reuseport_attach_filter()" and "sk_reuseport_attach_bpf()".
      As of other bpf prog types, the new BPF_PROG_TYPE_SK_REUSEPORT is only
      bounded by the usual "bpf_prog_charge_memlock()" during load time
      instead of bounded by both bpf_prog_charge_memlock and sysctl_optmem_max.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8217ca65
    • M
      bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT · 2dbb9b9e
      Martin KaFai Lau 提交于
      This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
      a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
      non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.
      
      BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
      to store the bpf context instead of using the skb->cb[48].
      
      At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
      from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp).  At this
      point,  it is not always clear where the bpf context can be appended
      in the skb->cb[48] to avoid saving-and-restoring cb[].  Even putting
      aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is not
      clear if the lower layer is only ipv4 and ipv6 in the future and
      will it not touch the cb[] again before transiting to the upper
      layer.
      
      For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
      instead of IP[6]CB and it may still modify the cb[] after calling
      the udp[46]_lib_lookup_skb().  Because of the above reason, if
      sk->cb is used for the bpf ctx, saving-and-restoring is needed
      and likely the whole 48 bytes cb[] has to be saved and restored.
      
      Instead of saving, setting and restoring the cb[], this patch opts
      to create a new "struct sk_reuseport_kern" and setting the needed
      values in there.
      
      The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
      will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
      specific usage at this point and it is also inline with the current
      sock_reuseport.c implementation (i.e. no protocol specific requirement).
      
      In "struct sk_reuseport_md", this patch exposes data/data_end/len
      with semantic similar to other existing usages.  Together
      with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
      the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
      the bpf prog that the reuseport group is bind-ed to a local
      INANY address which cannot be learned from skb.
      
      The new "bind_inany" is added to "struct sock_reuseport" which will be
      used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
      to avoid repeating the "bind INANY" test on
      "sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
      only be properly initialized when a "sk->sk_reuseport" enabled sk is
      adding to a hashtable (i.e. during "reuseport_alloc()" and
      "reuseport_add_sock()").
      
      The new "sk_select_reuseport()" is the main helper that the
      bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
      that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
      the earlier patch, the validity of a selected sk is checked in
      run time in "sk_select_reuseport()".  Doing the check in
      verification time is difficult and inflexible (consider the map-in-map
      use case).  The runtime check is to compare the selected sk's reuseport_id
      with the reuseport_id that we want.  This helper will return -EXXX if the
      selected sk cannot serve the incoming request (e.g. reuseport_id
      not match).  The bpf prog can decide if it wants to do SK_DROP as its
      discretion.
      
      When the bpf prog returns SK_PASS, the kernel will check if a
      valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
      If it does , it will use the selected sk.  If not, the kernel
      will select one from "reuse->socks[]" (as before this patch).
      
      The SK_DROP and SK_PASS handling logic will be in the next patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2dbb9b9e
    • M
      net: Add ID (if needed) to sock_reuseport and expose reuseport_lock · 736b4602
      Martin KaFai Lau 提交于
      A later patch will introduce a BPF_MAP_TYPE_REUSEPORT_ARRAY which
      allows a SO_REUSEPORT sk to be added to a bpf map.  When a sk
      is removed from reuse->socks[], it also needs to be removed from
      the bpf map.  Also, when adding a sk to a bpf map, the bpf
      map needs to ensure it is indeed in a reuse->socks[].
      Hence, reuseport_lock is needed by the bpf map to ensure its
      map_update_elem() and map_delete_elem() operations are in-sync with
      the reuse->socks[].  The BPF_MAP_TYPE_REUSEPORT_ARRAY map will only
      acquire the reuseport_lock after ensuring the adding sk is already
      in a reuseport group (i.e. reuse->socks[]).  The map_lookup_elem()
      will be lockless.
      
      This patch also adds an ID to sock_reuseport.  A later patch
      will introduce BPF_PROG_TYPE_SK_REUSEPORT which allows
      a bpf prog to select a sk from a bpf map.  It is inflexible to
      statically enforce a bpf map can only contain the sk belonging to
      a particular reuse->socks[] (i.e. same IP:PORT) during the bpf
      verification time. For example, think about the the map-in-map situation
      where the inner map can be dynamically changed in runtime and the outer
      map may have inner maps belonging to different reuseport groups.
      Hence, when the bpf prog (in the new BPF_PROG_TYPE_SK_REUSEPORT
      type) selects a sk,  this selected sk has to be checked to ensure it
      belongs to the requesting reuseport group (i.e. the group serving
      that IP:PORT).
      
      The "sk->sk_reuseport_cb" pointer cannot be used for this checking
      purpose because the pointer value will change after reuseport_grow().
      Instead of saving all checking conditions like the ones
      preced calling "reuseport_add_sock()" and compare them everytime a
      bpf_prog is run, a 32bits ID is introduced to survive the
      reuseport_grow().  The ID is only acquired if any of the
      reuse->socks[] is added to the newly introduced
      "BPF_MAP_TYPE_REUSEPORT_ARRAY" map.
      
      If "BPF_MAP_TYPE_REUSEPORT_ARRAY" is not used,  the changes in this
      patch is a no-op.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      736b4602