1. 15 10月, 2014 6 次提交
    • A
      net: filter: move common defines into bpf_common.h · c15952dc
      Alexei Starovoitov 提交于
      userspace programs that use eBPF instruction macros need to include two files:
      uapi/linux/filter.h and uapi/linux/bpf.h
      Move common macro definitions that are shared between classic BPF and eBPF
      into uapi/linux/bpf_common.h, so that user app can include only one bpf.h file
      
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c15952dc
    • T
      isdn/capi: correct capi20_manufacturer argument type mismatch · 9ea8aa8d
      Tilman Schmidt 提交于
      Function capi20_manufacturer() is declared with unsigned int cmd
      argument but called with unsigned long.
      Fix by correcting the function prototype since the actual argument
      is part of the user visible API.
      
      Spotted with Coverity.
      Signed-off-by: NTilman Schmidt <tilman@imap.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ea8aa8d
    • L
      ipv6: remove aca_lock spinlock from struct ifacaddr6 · 02ea8074
      Li RongQing 提交于
      no user uses this lock.
      Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02ea8074
    • A
      skbuff: fix ftrace handling in skb_unshare · 31eff81e
      Alexander Aring 提交于
      If the skb is not dropped afterwards we should run consume_skb instead
      kfree_skb. Inside of function skb_unshare we do always a kfree_skb,
      doesn't depend if skb_copy failed or was successful.
      
      This patch switch this behaviour like skb_share_check, if allocation of
      sk_buff failed we use kfree_skb otherwise consume_skb.
      Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31eff81e
    • D
      net: sctp: fix panic on duplicate ASCONF chunks · b69040d8
      Daniel Borkmann 提交于
      When receiving a e.g. semi-good formed connection scan in the
      form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---------------- ASCONF_a; ASCONF_b ----------------->
      
      ... where ASCONF_a equals ASCONF_b chunk (at least both serials
      need to be equal), we panic an SCTP server!
      
      The problem is that good-formed ASCONF chunks that we reply with
      ASCONF_ACK chunks are cached per serial. Thus, when we receive a
      same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
      not need to process them again on the server side (that was the
      idea, also proposed in the RFC). Instead, we know it was cached
      and we just resend the cached chunk instead. So far, so good.
      
      Where things get nasty is in SCTP's side effect interpreter, that
      is, sctp_cmd_interpreter():
      
      While incoming ASCONF_a (chunk = event_arg) is being marked
      !end_of_packet and !singleton, and we have an association context,
      we do not flush the outqueue the first time after processing the
      ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
      queued up, although we set local_cork to 1. Commit 2e3216cd
      changed the precedence, so that as long as we get bundled, incoming
      chunks we try possible bundling on outgoing queue as well. Before
      this commit, we would just flush the output queue.
      
      Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
      continue to process the same ASCONF_b chunk from the packet. As
      we have cached the previous ASCONF_ACK, we find it, grab it and
      do another SCTP_CMD_REPLY command on it. So, effectively, we rip
      the chunk->list pointers and requeue the same ASCONF_ACK chunk
      another time. Since we process ASCONF_b, it's correctly marked
      with end_of_packet and we enforce an uncork, and thus flush, thus
      crashing the kernel.
      
      Fix it by testing if the ASCONF_ACK is currently pending and if
      that is the case, do not requeue it. When flushing the output
      queue we may relink the chunk for preparing an outgoing packet,
      but eventually unlink it when it's copied into the skb right
      before transmission.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b69040d8
    • D
      net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks · 9de7922b
      Daniel Borkmann 提交于
      Commit 6f4c618d ("SCTP : Add paramters validity check for
      ASCONF chunk") added basic verification of ASCONF chunks, however,
      it is still possible to remotely crash a server by sending a
      special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
      
      skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
       head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
       end:0x440 dev:<NULL>
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:129!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff8144fb1c>] skb_put+0x5c/0x70
       [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
       [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
       [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
       [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
       [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
       [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
       [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
       [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
       [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
       [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
       [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
       [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
       [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
       [<ffffffff81496ac5>] ip_rcv+0x275/0x350
       [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
       [<ffffffff81460588>] netif_receive_skb+0x58/0x60
      
      This can be triggered e.g., through a simple scripted nmap
      connection scan injecting the chunk after the handshake, for
      example, ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ------------------ ASCONF; UNKNOWN ------------------>
      
      ... where ASCONF chunk of length 280 contains 2 parameters ...
      
        1) Add IP address parameter (param length: 16)
        2) Add/del IP address parameter (param length: 255)
      
      ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
      Address Parameter in the ASCONF chunk is even missing, too.
      This is just an example and similarly-crafted ASCONF chunks
      could be used just as well.
      
      The ASCONF chunk passes through sctp_verify_asconf() as all
      parameters passed sanity checks, and after walking, we ended
      up successfully at the chunk end boundary, and thus may invoke
      sctp_process_asconf(). Parameter walking is done with
      WORD_ROUND() to take padding into account.
      
      In sctp_process_asconf()'s TLV processing, we may fail in
      sctp_process_asconf_param() e.g., due to removal of the IP
      address that is also the source address of the packet containing
      the ASCONF chunk, and thus we need to add all TLVs after the
      failure to our ASCONF response to remote via helper function
      sctp_add_asconf_response(), which basically invokes a
      sctp_addto_chunk() adding the error parameters to the given
      skb.
      
      When walking to the next parameter this time, we proceed
      with ...
      
        length = ntohs(asconf_param->param_hdr.length);
        asconf_param = (void *)asconf_param + length;
      
      ... instead of the WORD_ROUND()'ed length, thus resulting here
      in an off-by-one that leads to reading the follow-up garbage
      parameter length of 12336, and thus throwing an skb_over_panic
      for the reply when trying to sctp_addto_chunk() next time,
      which implicitly calls the skb_put() with that length.
      
      Fix it by using sctp_walk_params() [ which is also used in
      INIT parameter processing ] macro in the verification *and*
      in ASCONF processing: it will make sure we don't spill over,
      that we walk parameters WORD_ROUND()'ed. Moreover, we're being
      more defensive and guard against unknown parameter types and
      missized addresses.
      
      Joint work with Vlad Yasevich.
      
      Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9de7922b
  2. 11 10月, 2014 1 次提交
  3. 09 10月, 2014 3 次提交
  4. 08 10月, 2014 4 次提交
  5. 07 10月, 2014 6 次提交
  6. 06 10月, 2014 8 次提交
  7. 05 10月, 2014 2 次提交
  8. 04 10月, 2014 10 次提交
    • T
      ip_tunnel: Add GUE support · bc1fc390
      Tom Herbert 提交于
      This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
      This is very similar to fou excpet that we need to insert the GUE header
      in addition to the UDP header on transmit.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc1fc390
    • T
      gue: Receive side for Generic UDP Encapsulation · 37dd0247
      Tom Herbert 提交于
      This patch adds support receiving for GUE packets in the fou module. The
      fou module now supports direct foo-over-udp (no encapsulation header)
      and GUE. To support this a type parameter is added to the fou netlink
      parameters.
      
      For a GUE socket we define gue_udp_recv, gue_gro_receive, and
      gue_gro_complete to handle the specifics of the GUE protocol. Most
      of the code to manage and configure sockets is common with the fou.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37dd0247
    • T
      fou: eliminate IPv4,v6 specific GRO functions · efc98d08
      Tom Herbert 提交于
      This patch removes fou[46]_gro_receive and fou[46]_gro_complete
      functions. The v4 or v6 variants were chosen for the UDP offloads
      based on the address family of the socket this is not necessary
      or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
      This is set in udp6_gro_receive and unset in udp4_gro_receive. In
      fou_gro_receive the value is used to select the correct inet_offloads
      for the protocol of the outer IP header.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efc98d08
    • E
      net/mlx5_core: Identify resources by their type · 5903325a
      Eli Cohen 提交于
      This patch puts a common part as the first field of mlx5_core_qp. This field is
      used to identify which resource generated an event. This is required since upcoming
      new resource types such as DC targets are allocated for the same numerical space
      as regular QPs and may generate the same events. By searching the resource in the
      same table we can then look at the common field to identify the resource.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5903325a
    • E
      net/mlx5_core: use set/get macros in device caps · b775516b
      Eli Cohen 提交于
      Transform device capabilities related commands to use set/get macros to
      manipulate command mailboxes.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b775516b
    • E
      net/mlx5_core: Use hardware registers description header file · d29b796a
      Eli Cohen 提交于
      Add an auto generated header file that describes hardware registers along with
      set of macros that set/get values. The macros do static checks to avoid
      overflow, handle endianess, and overall provide a clean way to code commands.
      Currently the header file is small and we will add structs as we make use of
      the macros.
      A few commands were removed from the commands enum since they are not supported
      currently and will be added when support is available.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d29b796a
    • E
      net/mlx5_core: Update device capabilities handling · c7a08ac7
      Eli Cohen 提交于
      Rearrange struct mlx5_caps so it has a "gen" field to represent the current
      capabilities configured for the device. Max capabilities can also be queried
      from the device. Also update capabilities struct to contain more fields as per
      the latest revision if firmware specification.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7a08ac7
    • E
      qdisc: validate skb without holding lock · 55a93b3e
      Eric Dumazet 提交于
      Validation of skb can be pretty expensive :
      
      GSO segmentation and/or checksum computations.
      
      We can do this without holding qdisc lock, so that other cpus
      can queue additional packets.
      
      Trick is that requeued packets were already validated, so we carry
      a boolean so that sch_direct_xmit() can validate a fresh skb list,
      or directly use an old one.
      
      Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
      host.
      
      Turning TSO on or off had no effect on throughput, only few more cpu
      cycles. Lock contention on qdisc lock disappeared.
      
      Same if disabling TX checksum offload.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55a93b3e
    • J
      dynamic_debug: change __dynamic_<foo>_dbg return types to void · 906d2015
      Joe Perches 提交于
      The return value is not used by callers of these functions
      so change the functions to return void.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Acked-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      906d2015
    • J
      qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE · 5772e9a3
      Jesper Dangaard Brouer 提交于
      Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
      sending/processing an entire skb list.
      
      This patch implements qdisc bulk dequeue, by allowing multiple packets
      to be dequeued in dequeue_skb().
      
      The optimization principle for this is two fold, (1) to amortize
      locking cost and (2) avoid expensive tailptr update for notifying HW.
       (1) Several packets are dequeued while holding the qdisc root_lock,
      amortizing locking cost over several packet.  The dequeued SKB list is
      processed under the TXQ lock in dev_hard_start_xmit(), thus also
      amortizing the cost of the TXQ lock.
       (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
      API to delay HW tailptr update, which also reduces the cost per
      packet.
      
      One restriction of the new API is that every SKB must belong to the
      same TXQ.  This patch takes the easy way out, by restricting bulk
      dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
      qdisc only have attached a single TXQ.
      
      Some detail about the flow; dev_hard_start_xmit() will process the skb
      list, and transmit packets individually towards the driver (see
      xmit_one()).  In case the driver stops midway in the list, the
      remaining skb list is returned by dev_hard_start_xmit().  In
      sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
      
      To avoid overshooting the HW limits, which results in requeuing, the
      patch limits the amount of bytes dequeued, based on the drivers BQL
      limits.  In-effect bulking will only happen for BQL enabled drivers.
      
      Small amounts for extra HoL blocking (2x MTU/0.24ms) were
      measured at 100Mbit/s, with bulking 8 packets, but the
      oscillating nature of the measurement indicate something, like
      sched latency might be causing this effect. More comparisons
      show, that this oscillation goes away occationally. Thus, we
      disregard this artifact completely and remove any "magic" bulking
      limit.
      
      For now, as a conservative approach, stop bulking when seeing TSO and
      segmented GSO packets.  They already benefit from bulking on their own.
      A followup patch add this, to allow easier bisect-ability for finding
      regressions.
      
      Jointed work with Hannes, Daniel and Florian.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5772e9a3