1. 07 8月, 2014 1 次提交
    • K
      list: fix order of arguments for hlist_add_after(_rcu) · 1d023284
      Ken Helias 提交于
      All other add functions for lists have the new item as first argument
      and the position where it is added as second argument.  This was changed
      for no good reason in this function and makes using it unnecessary
      confusing.
      
      The name was changed to hlist_add_behind() to cause unconverted code to
      generate a compile error instead of using the wrong parameter order.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKen Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d023284
  2. 06 8月, 2014 9 次提交
    • E
      sctp: fix possible seqlock seadlock in sctp_packet_transmit() · 757efd32
      Eric Dumazet 提交于
      Dave reported following splat, caused by improper use of
      IP_INC_STATS_BH() in process context.
      
      BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
      caller is __this_cpu_preempt_check+0x13/0x20
      CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
       ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
       0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
       ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
      Call Trace:
       [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
       [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
       [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
       [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
       [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
       [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
       [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
       [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
       [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
       [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
       [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
       [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
       [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
       [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
       [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
       [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
       [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
       [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
       [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
       [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
       [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
       [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
       [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
       [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2
      
      This is a followup of commits f1d8cba6 ("inet: fix possible
      seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock
      deadlock in ip6_finish_output2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NDave Jones <davej@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      757efd32
    • T
      bridge: Update outdated comment on promiscuous mode · fdb0a662
      Toshiaki Makita 提交于
      Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an
      unnecessary operation.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fdb0a662
    • W
      net-timestamp: ACK timestamp for bytestreams · e1c8a607
      Willem de Bruijn 提交于
      Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
      in the send() call is acknowledged. It implements the feature for TCP.
      
      The timestamp is generated when the TCP socket cumulative ACK is moved
      beyond the tracked seqno for the first time. The feature ignores SACK
      and FACK, because those acknowledge the specific byte, but not
      necessarily the entire contents of the buffer up to that byte.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1c8a607
    • W
      net-timestamp: TCP timestamping · 4ed2d765
      Willem de Bruijn 提交于
      TCP timestamping extends SO_TIMESTAMPING to bytestreams.
      
      Bytestreams do not have a 1:1 relationship between send() buffers and
      network packets. The feature interprets a send call on a bytestream as
      a request for a timestamp for the last byte in that send() buffer.
      
      The choice corresponds to a request for a timestamp when all bytes in
      the buffer have been sent. That assumption depends on in-order kernel
      transmission. This is the common case. That said, it is possible to
      construct a traffic shaping tree that would result in reordering.
      The guarantee is strong, then, but not ironclad.
      
      This implementation supports send and sendpages (splice). GSO replaces
      one large packet with multiple smaller packets. This patch also copies
      the option into the correct smaller packet.
      
      This patch does not yet support timestamping on data in an initial TCP
      Fast Open SYN, because that takes a very different data path.
      
      If ID generation in ee_data is enabled, bytestream timestamps return a
      byte offset, instead of the packet counter for datagrams.
      
      The implementation supports a single timestamp per packet. It silenty
      replaces requests for previous timestamps. To avoid missing tstamps,
      flush the tcp queue by disabling Nagle, cork and autocork. Missing
      tstamps can be detected by offset when the ee_data ID is enabled.
      
      Implementation details:
      
      - On GSO, the timestamping code can be included in the main loop. I
      moved it into its own loop to reduce the impact on the common case
      to a single branch.
      
      - To avoid leaking the absolute seqno to userspace, the offset
      returned in ee_data must always be relative. It is an offset between
      an skb and sk field. The first is always set (also for GSO & ACK).
      The second must also never be uninitialized. Only allow the ID
      option on sockets in the ESTABLISHED state, for which the seqno
      is available. Never reset it to zero (instead, move it to the
      current seqno when reenabling the option).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed2d765
    • W
      net-timestamp: SCHED timestamp on entering packet scheduler · e7fd2885
      Willem de Bruijn 提交于
      Kernel transmit latency is often incurred in the packet scheduler.
      Introduce a new timestamp on transmission just before entering the
      scheduler. When data travels through multiple devices (bonding,
      tunneling, ...) each device will export an individual timestamp.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7fd2885
    • W
      net-timestamp: add key to disambiguate concurrent datagrams · 09c2d251
      Willem de Bruijn 提交于
      Datagrams timestamped on transmission can coexist in the kernel stack
      and be reordered in packet scheduling. When reading looped datagrams
      from the socket error queue it is not always possible to unique
      correlate looped data with original send() call (for application
      level retransmits). Even if possible, it may be expensive and complex,
      requiring packet inspection.
      
      Introduce a data-independent ID mechanism to associate timestamps with
      send calls. Pass an ID alongside the timestamp in field ee_data of
      sock_extended_err.
      
      The ID is a simple 32 bit unsigned int that is associated with the
      socket and incremented on each send() call for which software tx
      timestamp generation is enabled.
      
      The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
      avoid changing ee_data for existing applications that expect it 0.
      The counter is reset each time the flag is reenabled. Reenabling
      does not change the ID of already submitted data. It is possible
      to receive out of order IDs if the timestamp stream is not quiesced
      first.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09c2d251
    • W
      net-timestamp: move timestamp flags out of sk_flags · b9f40e21
      Willem de Bruijn 提交于
      sk_flags is reaching its limit. New timestamping options will not fit.
      Move all of them into a new field sk->sk_tsflags.
      
      Added benefit is that this removes boilerplate code to convert between
      SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.
      
      SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
      timestamp logic (netstamp_needed). That can be simplified and this
      last key removed, but will leave that for a separate patch.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      
      ----
      
      The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
      though that scatters tstamp fields throughout the struct.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9f40e21
    • W
      net-timestamp: extend SCM_TIMESTAMPING ancillary data struct · f24b9be5
      Willem de Bruijn 提交于
      Applications that request kernel tx timestamps with SO_TIMESTAMPING
      read timestamps as recvmsg() ancillary data. The response is defined
      implicitly as timespec[3].
      
      1) define struct scm_timestamping explicitly and
      
      2) add support for new tstamp types. On tx, scm_timestamping always
         accompanies a sock_extended_err. Define previously unused field
         ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
         define the existing behavior.
      
      The reception path is not modified. On rx, no struct similar to
      sock_extended_err is passed along with SCM_TIMESTAMPING.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f24b9be5
    • N
      tcp: reduce spurious retransmits due to transient SACK reneging · 5ae344c9
      Neal Cardwell 提交于
      This commit reduces spurious retransmits due to apparent SACK reneging
      by only reacting to SACK reneging that persists for a short delay.
      
      When a sequence space hole at snd_una is filled, some TCP receivers
      send a series of ACKs as they apparently scan their out-of-order queue
      and cumulatively ACK all the packets that have now been consecutiveyly
      received. This is essentially misbehavior B in "Misbehaviors in TCP
      SACK generation" ACM SIGCOMM Computer Communication Review, April
      2011, so we suspect that this is from several common OSes (Windows
      2000, Windows Server 2003, Windows XP). However, this issue has also
      been seen in other cases, e.g. the netdev thread "TCP being hoodwinked
      into spurious retransmissions by lack of timestamps?" from March 2014,
      where the receiver was thought to be a BSD box.
      
      Since snd_una would temporarily be adjacent to a previously SACKed
      range in these scenarios, this receiver behavior triggered the Linux
      SACK reneging code path in the sender. This led the sender to clear
      the SACK scoreboard, enter CA_Loss, and spuriously retransmit
      (potentially) every packet from the entire write queue at line rate
      just a few milliseconds before the ACK for each packet arrives at the
      sender.
      
      To avoid such situations, now when a sender sees apparent reneging it
      does not yet retransmit, but rather adjusts the RTO timer to give the
      receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs
      that will restore sanity to the SACK scoreboard. If the reneging
      persists until this RTO then, as before, we clear the SACK scoreboard
      and enter CA_Loss.
      
      A 10ms delay tolerates a receiver sending such a stream of ACKs at
      56Kbit/sec. And to allow for receivers with slower or more congested
      paths, we wait for at least RTT/2.
      
      We validated the resulting max(RTT/2, 10ms) delay formula with a mix
      of North American and South American Google web server traffic, and
      found that for ACKs displaying transient reneging:
      
       (1) 90% of inter-ACK delays were less than 10ms
       (2) 99% of inter-ACK delays were less than RTT/2
      
      In tests on Google web servers this commit reduced reneging events by
      75%-90% (as measured by the TcpExtTCPSACKReneging counter), without
      any measurable impact on latency for user HTTP and SPDY requests.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ae344c9
  3. 05 8月, 2014 7 次提交
  4. 04 8月, 2014 1 次提交
  5. 03 8月, 2014 14 次提交
  6. 02 8月, 2014 1 次提交
  7. 01 8月, 2014 7 次提交
    • P
      netlabel: shorter names for the NetLabel catmap funcs/structs · 4fbe63d1
      Paul Moore 提交于
      Historically the NetLabel LSM secattr catmap functions and data
      structures have had very long names which makes a mess of the NetLabel
      code and anyone who uses NetLabel.  This patch renames the catmap
      functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
      which improves things greatly.
      
      There are no substantial code or logic changes in this patch.
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      4fbe63d1
    • P
      netlabel: fix the catmap walking functions · d960a618
      Paul Moore 提交于
      The two NetLabel LSM secattr catmap walk functions didn't handle
      certain edge conditions correctly, causing incorrect security labels
      to be generated in some cases.  This patch corrects these problems and
      converts the functions to use the new _netlbl_secattr_catmap_getnode()
      function in order to reduce the amount of repeated code.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      d960a618
    • P
      netlabel: fix the horribly broken catmap functions · 4b8feff2
      Paul Moore 提交于
      The NetLabel secattr catmap functions, and the SELinux import/export
      glue routines, were broken in many horrible ways and the SELinux glue
      code fiddled with the NetLabel catmap structures in ways that we
      probably shouldn't allow.  At some point this "worked", but that was
      likely due to a bit of dumb luck and sub-par testing (both inflicted
      by yours truly).  This patch corrects these problems by basically
      gutting the code in favor of something less obtuse and restoring the
      NetLabel abstractions in the SELinux catmap glue code.
      
      Everything is working now, and if it decides to break itself in the
      future this code will be much easier to debug than the code it
      replaces.
      
      One noteworthy side effect of the changes is that it is no longer
      necessary to allocate a NetLabel catmap before calling one of the
      NetLabel APIs to set a bit in the catmap.  NetLabel will automatically
      allocate the catmap nodes when needed, resulting in less allocations
      when the lowest bit is greater than 255 and less code in the LSMs.
      
      Cc: stable@vger.kernel.org
      Reported-by: NChristian Evans <frodox@zoho.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      4b8feff2
    • P
      netlabel: fix a problem when setting bits below the previously lowest bit · 41c3bd20
      Paul Moore 提交于
      The NetLabel category (catmap) functions have a problem in that they
      assume categories will be set in an increasing manner, e.g. the next
      category set will always be larger than the last.  Unfortunately, this
      is not a valid assumption and could result in problems when attempting
      to set categories less than the startbit in the lowest catmap node.
      In some cases kernel panics and other nasties can result.
      
      This patch corrects the problem by checking for this and allocating a
      new catmap node instance and placing it at the front of the list.
      
      Cc: stable@vger.kernel.org
      Reported-by: NChristian Evans <frodox@zoho.com>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      41c3bd20
    • D
      4330487a
    • V
      net: Correctly set segment mac_len in skb_segment(). · fcdfe3a7
      Vlad Yasevich 提交于
      When performing segmentation, the mac_len value is copied right
      out of the original skb.  However, this value is not always set correctly
      (like when the packet is VLAN-tagged) and we'll end up copying a bad
      value.
      
      One way to demonstrate this is to configure a VM which tags
      packets internally and turn off VLAN acceleration on the forwarding
      bridge port.  The packets show up corrupt like this:
      16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
      (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
              0x0000:  8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
              0x0010:  0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
              0x0020:  29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
              0x0030:  000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
              0x0040:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0050:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0060:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              ...
      
      This also leads to awful throughput as GSO packets are dropped and
      cause retransmissions.
      
      The solution is to set the mac_len using the values already available
      in then new skb.  We've already adjusted all of the header offset, so we
      might as well correctly figure out the mac_len using skb_reset_mac_len().
      After this change, packets are segmented correctly and performance
      is restored.
      
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcdfe3a7
    • T
      netlink: Use PAGE_ALIGNED macro · 74e83b23
      Tobias Klauser 提交于
      Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE).
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74e83b23