1. 03 1月, 2013 1 次提交
  2. 15 12月, 2012 2 次提交
    • C
      inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock · e337e24d
      Christoph Paasch 提交于
      If in either of the above functions inet_csk_route_child_sock() or
      __inet_inherit_port() fails, the newsk will not be freed:
      
      unreferenced object 0xffff88022e8a92c0 (size 1592):
        comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
        hex dump (first 32 bytes):
          0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
          02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
          [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
          [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
          [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
          [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
          [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
          [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
          [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
          [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
          [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
          [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
          [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
          [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
          [<ffffffff814cee68>] ip_rcv+0x217/0x267
          [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
          [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82
      
      This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
      a single sock_put() is not enough to free the memory. Additionally, things
      like xfrm, memcg, cookie_values,... may have been initialized.
      We have to free them properly.
      
      This is fixed by forcing a call to tcp_done(), ending up in
      inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
      because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
      xfrm,...
      
      Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
      force it entering inet_csk_destroy_sock. To avoid the warning in
      inet_csk_destroy_sock, inet_num has to be set to 0.
      As inet_csk_destroy_sock does a dec on orphan_count, we first have to
      increase it.
      
      Calling tcp_done() allows us to remove the calls to
      tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().
      
      A similar approach is taken for dccp by calling dccp_done().
      
      This is in the kernel since 093d2823 (tproxy: fix hash locking issue
      when using port redirection in __inet_inherit_port()), thus since
      version >= 2.6.37.
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e337e24d
    • D
      ipv6: Change skb->data before using icmpv6_notify() to propagate redirect · 093d04d4
      Duan Jiong 提交于
      In function ndisc_redirect_rcv(), the skb->data points to the transport
      header, but function icmpv6_notify() need the skb->data points to the
      inner IP packet. So before using icmpv6_notify() to propagate redirect,
      change skb->data to point the inner IP packet that triggered the sending
      of the Redirect, and introduce struct rd_msg to make it easy.
      Signed-off-by: NDuan Jiong <djduanjiong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      093d04d4
  3. 13 12月, 2012 1 次提交
  4. 12 12月, 2012 2 次提交
    • E
      pkt_sched: avoid requeues if possible · 1abbe139
      Eric Dumazet 提交于
      With BQL being deployed, we can more likely have following behavior :
      
      We dequeue a packet from qdisc in dequeue_skb(), then we realize target
      tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
      skb into gso_skb for later.
      
      This shows in stats (tc -s qdisc dev eth0) as requeues.
      
      Problem of these requeues is that high priority packets can not be
      dequeued as long as this (possibly low prio and big TSO packet) is not
      removed from gso_skb.
      
      At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
      
      In some cases, we know that all packets dequeued from a qdisc are
      for a particular and known txq :
      
      - If device is non multi queue
      - For all MQ/MQPRIO slave qdiscs
      
      This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
      this capability, so that dequeue_skb() is allowed to dequeue a packet
      only if the associated txq is not stopped.
      
      This indeed reduce latencies for high prio packets (or improve fairness
      with sfq/fq_codel), and almost remove qdisc 'requeues'.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1abbe139
    • E
      net: fix a race in gro_cell_poll() · f8e8f97c
      Eric Dumazet 提交于
      Dmitry Kravkov reported packet drops for GRE packets since GRO support
      was added.
      
      There is a race in gro_cell_poll() because we call napi_complete()
      without any synchronization with a concurrent gro_cells_receive()
      
      Once bug was triggered, we queued packets but did not schedule NAPI
      poll.
      
      We can fix this issue using the spinlock protected the napi_skbs queue,
      as we have to hold it to perform skb dequeue anyway.
      
      As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
      producer and consumer run under BH context.
      
      Bug added in commit c9e6bc64 (net: add gro_cells infrastructure)
      Reported-by: NDmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NDmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8e8f97c
  5. 08 12月, 2012 2 次提交
    • Y
      tcp: bug fix Fast Open client retransmission · 93b174ad
      Yuchung Cheng 提交于
      If SYN-ACK partially acks SYN-data, the client retransmits the
      remaining data by tcp_retransmit_skb(). This increments lost recovery
      state variables like tp->retrans_out in Open state. If loss recovery
      happens before the retransmission is acked, it triggers the WARN_ON
      check in tcp_fastretrans_alert(). For example: the client sends
      SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
      another 4 data packets and get 3 dupacks.
      
      Since the retransmission is not caused by network drop it should not
      update the recovery state variables. Further the server may return a
      smaller MSS than the cached MSS used for SYN-data, so the retranmission
      needs a loop. Otherwise some data will not be retransmitted until timeout
      or other loss recovery events.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93b174ad
    • T
      sctp: Add RCU protection to assoc->transport_addr_list · 45122ca2
      Thomas Graf 提交于
      peer.transport_addr_list is currently only protected by sk_sock
      which is inpractical to acquire for procfs dumping purposes.
      
      This patch adds RCU protection allowing for the procfs readers to
      enter RCU read-side critical sections.
      
      Modification of the list continues to be serialized via sk_lock.
      
      V2: Use list_del_rcu() in sctp_association_free() to be safe
          Skip transports marked dead when dumping for procfs
      
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45122ca2
  6. 05 12月, 2012 2 次提交
  7. 04 12月, 2012 5 次提交
  8. 03 12月, 2012 5 次提交
  9. 02 12月, 2012 1 次提交
  10. 01 12月, 2012 2 次提交
    • E
      net: move inet_dport/inet_num in sock_common · ce43b03e
      Eric Dumazet 提交于
      commit 68835aba (net: optimize INET input path further)
      moved some fields used for tcp/udp sockets lookup in the first cache
      line of struct sock_common.
      
      This patch moves inet_dport/inet_num as well, filling a 32bit hole
      on 64 bit arches and reducing number of cache line misses in lookups.
      
      Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
      before addresses match, as this check is more discriminant.
      
      Remove the hash check from MATCH() macros because we dont need to
      re validate the hash value after taking a refcount on socket, and
      use likely/unlikely compiler hints, as the sk_hash/hash check
      makes the following conditional tests 100% predicted by cpu.
      
      Introduce skc_addrpair/skc_portpair pair values to better
      document the alignment requirements of the port/addr pairs
      used in the various MATCH() macros, and remove some casts.
      
      The namespace check can also be done at last.
      
      This slightly improves TCP/UDP lookup times.
      
      IP/TCP early demux needs inet->rx_dst_ifindex and
      TCP needs inet->min_ttl, lets group them together in same cache line.
      
      With help from Ben Hutchings & Joe Perches.
      
      Idea of this patch came after Ling Ma proposal to move skc_hash
      to the beginning of struct sock_common, and should allow him
      to submit a final version of his patch. My tests show an improvement
      doing so.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce43b03e
    • R
      rtnelink: remove unused parameter from rtnl_create_link(). · c0713563
      Rami Rosen 提交于
      This patch removes an unused parameter (src_net) from rtnl_create_link()
      method and from the method single invocation, in veth.
      This parameter was used in the past when calling
      ops->get_tx_queues(src_net, tb) in rtnl_create_link().
      The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
      get_num_tx_queues() and get_num_rx_queues(), which do not get any
      parameter. This was done in commit d40156aa by
      Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
      Signed-off-by: NRami Rosen <ramirose@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0713563
  11. 30 11月, 2012 2 次提交
  12. 27 11月, 2012 3 次提交
  13. 26 11月, 2012 8 次提交
    • J
      mac80211: support VHT rates in TX info · 8bc83c24
      Johannes Berg 提交于
      To achieve this, limit the number of retries to
      31 (instead of 255) and use the three bits that
      are then free for VHT flags.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      8bc83c24
    • J
      mac80211: support drivers reporting VHT RX · 5614618e
      Johannes Berg 提交于
      Add support to mac80211 for having drivers report
      received VHT MCS information.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      5614618e
    • J
      nl80211/cfg80211: add VHT MCS support · db9c64cf
      Johannes Berg 提交于
      Add support for reporting and calculating VHT MCSes.
      
      Note that I'm not completely sure that the bitrate
      calculations are correct, nor that they can't be
      simplified.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      db9c64cf
    • J
      mac80211: convert to channel definition struct · 4bf88530
      Johannes Berg 提交于
      Convert mac80211 (and where necessary, some drivers a
      little bit) to the new channel definition struct.
      
      This will allow extending mac80211 for VHT, which is
      currently restricted to channel contexts since there
      are no drivers using that which makes it easier. As
      I also don't care about VHT for drivers not using the
      channel context API, I won't convert the previous API
      to VHT support.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      4bf88530
    • J
      nl80211/cfg80211: support VHT channel configuration · 3d9d1d66
      Johannes Berg 提交于
      Change nl80211 to support specifying a VHT (or HT)
      using the control channel frequency (as before) and
      new attributes for the channel width and first and
      second center frequency. The old channel type is of
      course still supported for HT.
      
      Also change the cfg80211 channel definition struct
      to support these by adding the relevant fields to
      it (and removing the _type field.)
      
      This also adds new helper functions:
       - cfg80211_chandef_create to create a channel def
         struct given the control channel and channel type,
       - cfg80211_chandef_identical to check if two channel
         definitions are identical
       - cfg80211_chandef_compatible to check if the given
         channel definitions are compatible, and return the
         wider of the two
      
      This isn't entirely complete, but that doesn't matter
      until we have a driver using it. In particular, it's
      missing
       - regulatory checks on the usable bandwidth (if that
         even makes sense)
       - regulatory TX power (database can't deal with it)
       - a proper channel compatibility calculation for the
         new channel types
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      3d9d1d66
    • J
      cfg80211: pass a channel definition struct · 683b6d3b
      Johannes Berg 提交于
      Instead of passing a channel pointer and channel type
      to all functions and driver methods, pass a new channel
      definition struct. Right now, this struct contains just
      the control channel and channel type, but for VHT this
      will change.
      
      Also, add a small inline cfg80211_get_chandef_type() so
      that drivers don't need to use the _type field of the
      new structure all the time, which will change.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      683b6d3b
    • J
      cfg80211: remove remain-on-channel channel type · 42d97a59
      Johannes Berg 提交于
      As mwifiex (and mac80211 in the software case) are the
      only drivers actually implementing remain-on-channel
      with channel type, userspace can't be relying on it.
      This is the case, as it's used only for P2P operations
      right now.
      
      Rather than adding a flag to tell userspace whether or
      not it can actually rely on it, simplify all the code
      by removing the ability to use different channel types.
      Leave only the validation of the attribute, so that if
      we extend it again later (with the needed capability
      flag), it can't break userspace sending invalid data.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      42d97a59
    • A
      cfg80211: change function signature of cfg80211_get_p2p_attr() · c216e641
      Arend van Spriel 提交于
      The function cfg80211_get_p2p_attr() can fail and returns
      a negative error code. However, the return type is unsigned
      int. The largest positive number is determined by desired_len
      variable in the function, which is u16. So changing the return
      type to int to allow easy error checking. Also change the type
      for the attribute to enum for improved type checking.
      Signed-off-by: NArend van Spriel <arend@broadcom.com>
      [fix indentation, don't use u8 attr variable]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      c216e641
  14. 24 11月, 2012 1 次提交
  15. 22 11月, 2012 1 次提交
  16. 21 11月, 2012 2 次提交