1. 22 4月, 2017 6 次提交
    • W
      net_sched: remove useless NULL to tp->root · 43920538
      WANG Cong 提交于
      There is no need to NULL tp->root in ->destroy(), since tp is
      going to be freed very soon, and existing readers are still
      safe to read them.
      
      For cls_route, we always init its tp->root, so it can't be NULL,
      we can drop more useless code.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43920538
    • W
      net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      WANG Cong 提交于
      We could have a race condition where in ->classify() path we
      dereference tp->root and meanwhile a parallel ->destroy() makes it
      a NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called for deleting a filter to
      check if we are the last one in tp, this tp is still linked and
      visible at that time. The root cause of this problem is the semantic
      of ->destroy(), it does two things (for non-force case):
      
      1) check if tp is empty
      2) if tp is empty we could really destroy it
      
      and its caller, if cares, needs to check its return value to see if it
      is really destroyed. Therefore we can't unlink tp unless we know it is
      empty.
      
      As suggested by Daniel, we could actually move the test logic to ->delete()
      so that we can safely unlink tp after ->delete() tells us the last one is
      just deleted and before ->destroy().
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      763dbf63
    • D
      bpf: add napi_id read access to __sk_buff · b1d9fc41
      Daniel Borkmann 提交于
      Add napi_id access to __sk_buff for socket filter program types, tc
      program types and other bpf_convert_ctx_access() users. Having access
      to skb->napi_id is useful for per RX queue listener siloing, f.e.
      in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
      used, meaning SO_REUSEPORT enabled listeners can then select the
      corresponding socket at SYN time already [1]. The skb is marked via
      skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
      
      Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d433902
      ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
      the NAPI ID associated with the queue for steering, which requires a
      prior sk_mark_napi_id() after the socket was looked up.
      
      Semantics for the __sk_buff napi_id access are similar, meaning if
      skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
      then an invalid napi_id of 0 is returned to the program, otherwise a
      valid non-zero napi_id.
      
        [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdfSuggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1d9fc41
    • M
      Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning · 7acf8a1e
      Matthew Whitehead 提交于
      Constants used for tuning are generally a bad idea, especially as hardware
      changes over time. Replace the constant 2 jiffies with sysctl variable
      netdev_budget_usecs to enable sysadmins to tune the softirq processing.
      Also document the variable.
      
      For example, a very fast machine might tune this to 1000 microseconds,
      while my regression testing 486DX-25 needs it to be 4000 microseconds on
      a nearly idle network to prevent time_squeeze from being incremented.
      
      Version 2: changed jiffies to microseconds for predictable units.
      Signed-off-by: NMatthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7acf8a1e
    • C
      ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Craig Gallek 提交于
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share datastructures
      (which can not be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9830ad4c
    • C
      ip6_tunnel: Allow policy-based routing through tunnels · 0a473b82
      Craig Gallek 提交于
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a473b82
  2. 21 4月, 2017 7 次提交
  3. 19 4月, 2017 5 次提交
    • I
      esp4/6: Fix GSO path for non-GSO SW-crypto packets · 8f92e03e
      Ilan Tayari 提交于
      If esp*_offload module is loaded, outbound packets take the
      GSO code path, being encapsulated at layer 3, but encrypted
      in layer 2. validate_xmit_xfrm calls esp*_xmit for that.
      
      esp*_xmit was wrongfully detecting these packets as going
      through hardware crypto offload, while in fact they should
      be encrypted in software, causing plaintext leakage to
      the network, and also dropping at the receiver side.
      
      Perform the encryption in esp*_xmit, if the SA doesn't have
      a hardware offload_handle.
      
      Also, align esp6 code to esp4 logic.
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Fixes: 383d0350 ("esp6: Reorganize esp_output")
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      8f92e03e
    • C
      esp6: fix incorrect null pointer check on xo · ffa6f571
      Colin Ian King 提交于
      The check for xo being null is incorrect, currently it is checking
      for non-null, it should be checking for null.
      
      Detected with CoverityScan, CID#1429349 ("Dereference after null check")
      
      Fixes: 7862b405 ("esp: Add gso handlers for esp4 and esp6")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      ffa6f571
    • X
      sctp: process duplicated strreset asoc request correctly · 6c801387
      Xin Long 提交于
      This patch is to fix the replay attack issue for strreset asoc requests.
      
      When a duplicated strreset asoc request is received, reply it with bad
      seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the
      result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.
      
      But note that if the result saved in asoc is performed, the sender's next
      tsn and receiver's next tsn for the response chunk should be set. It's
      safe to get them from asoc. Because if it's changed, which means the peer
      has received the response already, the new response with wrong tsn won't
      be accepted by peer.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c801387
    • X
      sctp: process duplicated strreset in and addstrm in requests correctly · d0f025e6
      Xin Long 提交于
      This patch is to fix the replay attack issue for strreset and addstrm in
      requests.
      
      When a duplicated strreset in or addstrm in request is received, reply it
      with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with
      the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.
      
      For strreset in or addstrm in request, if the receiver side processes it
      successfully, a strreset out or addstrm out request(as a response for that
      request) will be sent back to peer. reconf_time will retransmit the out
      request even if it's lost.
      
      So when receiving a duplicated strreset in or addstrm in request and it's
      result was performed, it shouldn't reply this request, but drop it instead.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0f025e6
    • X
      sctp: process duplicated strreset out and addstrm out requests correctly · e4dc99c7
      Xin Long 提交于
      Now sctp stream reconf will process a request again even if it's seqno is
      less than asoc->strreset_inseq.
      
      If one request has been done successfully and some data chunks have been
      accepted and then a duplicated strreset out request comes, the streamin's
      ssn will be cleared. It will cause that stream will never receive chunks
      any more because of unsynchronized ssn. It allows a replay attack.
      
      A similar issue also exists when processing addstrm out requests. It will
      cause more extra streams being added.
      
      This patch is to fix it by saving the last 2 results into asoc. When a
      duplicated strreset out or addstrm out request is received, reply it with
      bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the
      result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.
      
      Note that it saves last 2 results instead of only last 1 result, because
      two requests can be sent together in one chunk.
      
      And note that when receiving a duplicated request, the receiver side will
      still reply it even if the peer has received the response. It's safe, As
      the response will be dropped by the peer.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4dc99c7
  4. 18 4月, 2017 18 次提交
  5. 17 4月, 2017 1 次提交
  6. 14 4月, 2017 3 次提交