1. 01 10月, 2013 5 次提交
    • E
      tcp: TSQ can use a dynamic limit · c9eeec26
      Eric Dumazet 提交于
      When TCP Small Queues was added, we used a sysctl to limit amount of
      packets queues on Qdisc/device queues for a given TCP flow.
      
      Problem is this limit is either too big for low rates, or too small
      for high rates.
      
      Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
      auto sizing, it can better control number of packets in Qdisc/device
      queues.
      
      New limit is two packets or at least 1 to 2 ms worth of packets.
      
      Low rates flows benefit from this patch by having even smaller
      number of packets in queues, allowing for faster recovery,
      better RTT estimations.
      
      High rates flows benefit from this patch by allowing more than 2 packets
      in flight as we had reports this was a limiting factor to reach line
      rate. [ In particular if TX completion is delayed because of coalescing
      parameters ]
      
      Example for a single flow on 10Gbp link controlled by FQ/pacing
      
      14 packets in flight instead of 2
      
      $ tc -s -d qd
      qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
      buckets 1024 quantum 3028 initial_quantum 15140
       Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
      requeues 6822476)
       rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
        2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
        2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
      
      Note that sk_pacing_rate is currently set to twice the actual rate, but
      this might be refined in the future when a flow is in congestion
      avoidance.
      
      Additional change : skb->destructor should be set to tcp_wfree().
      
      A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9eeec26
    • E
      pkt_sched: fq: qdisc dismantle fixes · 8d34ce10
      Eric Dumazet 提交于
      fq_reset() should drops all packets in queue, including
      throttled flows.
      
      This patch moves code from fq_destroy() to fq_reset()
      to do the cleaning.
      
      fq_change() must stop calling fq_dequeue() if all remaining
      packets are from throttled flows.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d34ce10
    • E
      net: flow_dissector: fix thoff for IPPROTO_AH · b8678358
      Eric Dumazet 提交于
      In commit 8ed78166 ("flow_keys: include thoff into flow_keys for
      later usage"), we missed that existing code was using nhoff as a
      temporary variable that could not always contain transport header
      offset.
      
      This is not a problem for TCP/UDP because port offset (@poff)
      is 0 for these protocols.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8678358
    • P
      ipv6: Fix preferred_lft not updating in some cases · c9d55d5b
      Paul Marks 提交于
      Consider the scenario where an IPv6 router is advertising a fixed
      preferred_lft of 1800 seconds, while the valid_lft begins at 3600
      seconds and counts down in realtime.
      
      A client should reset its preferred_lft to 1800 every time the RA is
      received, but a bug is causing Linux to ignore the update.
      
      The core problem is here:
        if (prefered_lft != ifp->prefered_lft) {
      
      Note that ifp->prefered_lft is an offset, so it doesn't decrease over
      time.  Thus, the comparison is always (1800 != 1800), which fails to
      trigger an update.
      
      The most direct solution would be to compute a "stored_prefered_lft",
      and use that value in the comparison.  But I think that trying to filter
      out unnecessary updates here is a premature optimization.  In order for
      the filter to apply, both of these would need to hold:
      
        - The advertised valid_lft and preferred_lft are both declining in
          real time.
        - No clock skew exists between the router & client.
      
      So in this patch, I've set "update_lft = 1" unconditionally, which
      allows the surrounding code to be greatly simplified.
      Signed-off-by: NPaul Marks <pmarks@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9d55d5b
    • P
      ip_tunnel: Do not use stale inner_iph pointer. · d4a71b15
      Pravin B Shelar 提交于
      While sending packet skb_cow_head() can change skb header which
      invalidates inner_iph pointer to skb header. Following patch
      avoid using it. Found by code inspection.
      
      This bug was introduced by commit 0e6fbc5b (ip_tunnels: extend
      iptunnel_xmit()).
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4a71b15
  2. 29 9月, 2013 3 次提交
    • E
      net: net_secret should not depend on TCP · 9a3bab6b
      Eric Dumazet 提交于
      A host might need net_secret[] and never open a single socket.
      
      Problem added in commit aebda156
      ("net: defer net_secret[] initialization")
      
      Based on prior patch from Hannes Frederic Sowa.
      Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@strressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a3bab6b
    • E
      net: Delay default_device_exit_batch until no devices are unregistering v2 · 50624c93
      Eric W. Biederman 提交于
      There is currently serialization network namespaces exiting and
      network devices exiting as the final part of netdev_run_todo does not
      happen under the rtnl_lock.  This is compounded by the fact that the
      only list of devices unregistering in netdev_run_todo is local to the
      netdev_run_todo.
      
      This lack of serialization in extreme cases results in network devices
      unregistering in netdev_run_todo after the loopback device of their
      network namespace has been freed (making dst_ifdown unsafe), and after
      the their network namespace has exited (making the NETDEV_UNREGISTER,
      and NETDEV_UNREGISTER_FINAL callbacks unsafe).
      
      Add the missing serialization by a per network namespace count of how
      many network devices are unregistering and having a wait queue that is
      woken up whenever the count is decreased.  The count and wait queue
      allow default_device_exit_batch to wait until all of the unregistration
      activity for a network namespace has finished before proceeding to
      unregister the loopback device and then allowing the network namespace
      to exit.
      
      Only a single global wait queue is used because there is a single global
      lock, and there is a single waiter, per network namespace wait queues
      would be a waste of resources.
      
      The per network namespace count of unregistering devices gives a
      progress guarantee because the number of network devices unregistering
      in an exiting network namespace must ultimately drop to zero (assuming
      network device unregistration completes).
      
      The basic logic remains the same as in v1.  This patch is now half
      comment and half rtnl_lock_unregistering an expanded version of
      wait_event performs no extra work in the common case where no network
      devices are unregistering when we get to default_device_exit_batch.
      Reported-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50624c93
    • C
      IPv6 NAT: Do not drop DNATed 6to4/6rd packets · 7df37ff3
      Catalin\(ux\) M. BOIE 提交于
      When a router is doing DNAT for 6to4/6rd packets the latest
      anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
      6to4 and 6rd") will drop them because the IPv6 address embedded does
      not match the IPv4 destination. This patch will allow them to pass by
      testing if we have an address that matches on 6to4/6rd interface.  I
      have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
      Also, log the dropped packets (with rate limit).
      Signed-off-by: NCatalin(ux) M. BOIE <catab@embedromix.ro>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7df37ff3
  3. 24 9月, 2013 5 次提交
  4. 21 9月, 2013 1 次提交
    • G
      Bluetooth: don't release the port in rfcomm_dev_state_change() · 29cd718b
      Gianluca Anzolin 提交于
      When the dlc is closed, rfcomm_dev_state_change() tries to release the
      port in the case it cannot get a reference to the tty. However this is
      racy and not even needed.
      
      Infact as Peter Hurley points out:
      
      1. Only consider dlcs that are 'stolen' from a connected socket, ie.
         reused. Allocated dlcs cannot have been closed prior to port
         activate and so for these dlcs a tty reference will always be avail
         in rfcomm_dev_state_change() -- except for the conditions covered by
         #2b below.
      2. If a tty was at some point previously created for this rfcomm, then
         either
         (a) the tty reference is still avail, so rfcomm_dev_state_change()
             will perform a hangup. So nothing to do, or,
         (b) the tty reference is no longer avail, and the tty_port will be
             destroyed by the last tty_port_put() in rfcomm_tty_cleanup.
             Again, no action required.
      3. Prior to obtaining the dlc lock in rfcomm_dev_add(),
         rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to
         do here.
      4. After releasing the dlc lock in rfcomm_dev_add(),
         rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a
         tty reference could not be obtained. Again, the best thing to do here
         is nothing. Any future attempted open() will block on
         rfcomm_dev_carrier_raised(). The unconnected device will exist until
         released by ioctl(RFCOMMRELEASEDEV).
      
      The patch removes the aforementioned code and uses the
      tty_port_tty_hangup() helper to hangup the tty.
      Signed-off-by: NGianluca Anzolin <gianluca@sottospazio.it>
      Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      29cd718b
  5. 20 9月, 2013 3 次提交
  6. 19 9月, 2013 3 次提交
  7. 18 9月, 2013 2 次提交
  8. 17 9月, 2013 10 次提交
    • G
      netfilter: nfnetlink_queue: use network skb for sequence adjustment · 0a0d80eb
      Gao feng 提交于
      Instead of the netlink skb.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0a0d80eb
    • D
      net: sctp: rfc4443: do not report ICMP redirects to user space · 3f96a532
      Daniel Borkmann 提交于
      Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
      messages. For IPv6, RFC4443, section 2.4. says:
      
        ...
        (e) An ICMPv6 error message MUST NOT be originated as a result of
            receiving the following:
        ...
             (e.2) An ICMPv6 redirect message [IPv6-DISC].
        ...
      
      Therefore, do not report an error to user space, just invoke dst's redirect
      callback and leave, same for IPv4 as done in TCP as well. The implication
      w/o having this patch could be that the reception of such packets would
      generate a poll notification and in worst case it could even tear down the
      whole connection. Therefore, stop updating sk_err on redirects.
      Reported-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Suggested-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f96a532
    • D
      ip6_tunnels: raddr and laddr are inverted in nl msg · 0d2ede92
      Ding Zhi 提交于
      IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.
      
      Introduced by c075b130 (ip6tnl: advertise tunnel param via rtnl).
      Signed-off-by: NDing Zhi <zhi.ding@6wind.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d2ede92
    • O
      netfilter: ipset: Fix serious failure in CIDR tracking · 2cf55125
      Oliver Smith 提交于
      This fixes a serious bug affecting all hash types with a net element -
      specifically, if a CIDR value is deleted such that none of the same size
      exist any more, all larger (less-specific) values will then fail to
      match. Adding back any prefix with a CIDR equal to or more specific than
      the one deleted will fix it.
      
      Steps to reproduce:
      ipset -N test hash:net
      ipset -A test 1.1.0.0/16
      ipset -A test 2.2.2.0/24
      ipset -T test 1.1.1.1           #1.1.1.1 IS in set
      ipset -D test 2.2.2.0/24
      ipset -T test 1.1.1.1           #1.1.1.1 IS NOT in set
      
      This is due to the fact that the nets counter was unconditionally
      decremented prior to the iteration that shifts up the entries. Now, we
      first check if there is a proceeding entry and if not, decrement it and
      return. Otherwise, we proceed to iterate and then zero the last element,
      which, in most cases, will already be zero.
      Signed-off-by: NOliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      2cf55125
    • J
      netfilter: ipset: Validate the set family and not the set type family at swapping · 169faa2e
      Jozsef Kadlecsik 提交于
      This closes netfilter bugzilla #843, reported by Quentin Armitage.
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      169faa2e
    • J
      netfilter: ipset: Consistent userspace testing with nomatch flag · 0f1799ba
      Jozsef Kadlecsik 提交于
      The "nomatch" commandline flag should invert the matching at testing,
      similarly to the --return-nomatch flag of the "set" match of iptables.
      Until now it worked with the elements with "nomatch" flag only. From
      now on it works with elements without the flag too, i.e:
      
       # ipset n test hash:net
       # ipset a test 10.0.0.0/24 nomatch
       # ipset t test 10.0.0.1
       10.0.0.1 is NOT in set test.
       # ipset t test 10.0.0.1 nomatch
       10.0.0.1 is in set test.
      
       # ipset a test 192.168.0.0/24
       # ipset t test 192.168.0.1
       192.168.0.1 is in set test.
       # ipset t test 192.168.0.1 nomatch
       192.168.0.1 is NOT in set test.
      
       Before the patch the results were
      
       ...
       # ipset t test 192.168.0.1
       192.168.0.1 is in set test.
       # ipset t test 192.168.0.1 nomatch
       192.168.0.1 is in set test.
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      0f1799ba
    • J
    • S
      Bluetooth: Fix ACL alive for long in case of non pariable devices · 330b6c15
      Syam Sidhardhan 提交于
      For certain devices (ex: HID mouse), support for authentication,
      pairing and bonding is optional. For such devices, the ACL alive
      for too long after the L2CAP disconnection.
      
      To avoid the ACL alive for too long after L2CAP disconnection, reset the
      ACL disconnect timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect.
      
      While merging the commit id:a9ea3ed9
      this issue might have introduced.
      
      Hcidump info:
      sh-4.1# /opt/hcidump -Xt
      2013-08-05 16:49:00.894129 < ACL data: handle 12 flags 0x00 dlen 12
          L2CAP(s): Disconn req: dcid 0x004a scid 0x0041
      2013-08-05 16:49:00.894195 < HCI Command: Exit Sniff Mode (0x02|0x0004)
          plen 2
          handle 12
      2013-08-05 16:49:00.894269 < ACL data: handle 12 flags 0x00 dlen 12
          L2CAP(s): Disconn req: dcid 0x0049 scid 0x0040
      2013-08-05 16:49:00.895645 > HCI Event: Command Status (0x0f) plen 4
          Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1
      2013-08-05 16:49:00.934391 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 12 mode 0x00 interval 0
          Mode: Active
      2013-08-05 16:49:00.936592 > HCI Event: Number of Completed Packets
          (0x13) plen 5
          handle 12 packets 2
      2013-08-05 16:49:00.951577 > ACL data: handle 12 flags 0x02 dlen 12
          L2CAP(s): Disconn rsp: dcid 0x004a scid 0x0041
      2013-08-05 16:49:00.952820 > ACL data: handle 12 flags 0x02 dlen 12
          L2CAP(s): Disconn rsp: dcid 0x0049 scid 0x0040
      2013-08-05 16:49:00.969165 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 12 mode 0x02 interval 50
          Mode: Sniff
      
      2013-08-05 16:49:48.175533 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 12 mode 0x00 interval 0
          Mode: Active
      2013-08-05 16:49:48.219045 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 12 mode 0x02 interval 108
          Mode: Sniff
      
      2013-08-05 16:51:00.968209 < HCI Command: Disconnect (0x01|0x0006) plen 3
          handle 12 reason 0x13
          Reason: Remote User Terminated Connection
      2013-08-05 16:51:00.969056 > HCI Event: Command Status (0x0f) plen 4
          Disconnect (0x01|0x0006) status 0x00 ncmd 1
      2013-08-05 16:51:01.013495 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 12 mode 0x00 interval 0
          Mode: Active
      2013-08-05 16:51:01.073777 > HCI Event: Disconn Complete (0x05) plen 4
          status 0x00 handle 12 reason 0x16
          Reason: Connection Terminated by Local Host
      
      ============================ After  fix ================================
      
      2013-08-05 16:57:35.986648 < ACL data: handle 11 flags 0x00 dlen 12
          L2CAP(s): Disconn req: dcid 0x004c scid 0x0041
      2013-08-05 16:57:35.986713 < HCI Command: Exit Sniff Mode (0x02|0x0004)
          plen 2
          handle 11
      2013-08-05 16:57:35.986785 < ACL data: handle 11 flags 0x00 dlen 12
          L2CAP(s): Disconn req: dcid 0x004b scid 0x0040
      2013-08-05 16:57:35.988110 > HCI Event: Command Status (0x0f) plen 4
          Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1
      2013-08-05 16:57:36.030714 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 11 mode 0x00 interval 0
          Mode: Active
      2013-08-05 16:57:36.032950 > HCI Event: Number of Completed Packets
          (0x13) plen 5
          handle 11 packets 2
      2013-08-05 16:57:36.047926 > ACL data: handle 11 flags 0x02 dlen 12
          L2CAP(s): Disconn rsp: dcid 0x004c scid 0x0041
      2013-08-05 16:57:36.049200 > ACL data: handle 11 flags 0x02 dlen 12
          L2CAP(s): Disconn rsp: dcid 0x004b scid 0x0040
      2013-08-05 16:57:36.065509 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 11 mode 0x02 interval 50
          Mode: Sniff
      
      2013-08-05 16:57:40.052006 < HCI Command: Disconnect (0x01|0x0006) plen 3
          handle 11 reason 0x13
          Reason: Remote User Terminated Connection
      2013-08-05 16:57:40.052869 > HCI Event: Command Status (0x0f) plen 4
          Disconnect (0x01|0x0006) status 0x00 ncmd 1
      2013-08-05 16:57:40.104731 > HCI Event: Mode Change (0x14) plen 6
          status 0x00 handle 11 mode 0x00 interval 0
          Mode: Active
      2013-08-05 16:57:40.146935 > HCI Event: Disconn Complete (0x05) plen 4
          status 0x00 handle 11 reason 0x16
          Reason: Connection Terminated by Local Host
      Signed-off-by: NSang-Ki Park <sangki79.park@samsung.com>
      Signed-off-by: NChan-yeol Park <chanyeol.park@samsung.com>
      Signed-off-by: NJaganath Kanakkassery <jaganath.k@samsung.com>
      Signed-off-by: NSzymon Janc <szymon.janc@tieto.com>
      Signed-off-by: NSyam Sidhardhan <s.syam@samsung.com>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      330b6c15
    • A
      Bluetooth: Fix encryption key size for peripheral role · 89cbb4da
      Andre Guedes 提交于
      This patch fixes the connection encryption key size information when
      the host is playing the peripheral role. We should set conn->enc_key_
      size in hci_le_ltk_request_evt, otherwise it is left uninitialized.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NAndre Guedes <andre.guedes@openbossa.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      89cbb4da
    • A
      Bluetooth: Fix security level for peripheral role · f8776218
      Andre Guedes 提交于
      While playing the peripheral role, the host gets a LE Long Term Key
      Request Event from the controller when a connection is established
      with a bonded device. The host then informs the LTK which should be
      used for the connection. Once the link is encrypted, the host gets
      an Encryption Change Event.
      
      Therefore we should set conn->pending_sec_level instead of conn->
      sec_level in hci_le_ltk_request_evt. This way, conn->sec_level is
      properly updated in hci_encrypt_change_evt.
      
      Moreover, since we have a LTK associated to the device, we have at
      least BT_SECURITY_MEDIUM security level.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NAndre Guedes <andre.guedes@openbossa.org>
      Signed-off-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
      f8776218
  9. 16 9月, 2013 2 次提交
  10. 13 9月, 2013 6 次提交
    • M
      Remove GENERIC_HARDIRQ config option · 0244ad00
      Martin Schwidefsky 提交于
      After the last architecture switched to generic hard irqs the config
      options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
      for !CONFIG_GENERIC_HARDIRQS can be removed.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      0244ad00
    • P
      netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pkt · d830f0fa
      Phil Oester 提交于
      In commit 58a317f1 (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt
      was added with an incorrect comparison of ICMP codes to types.  This causes
      problems when using NAT rules with the --random option.  Correct the
      comparison.
      
      This closes netfilter bugzilla #851, reported by Alexander Neumann.
      Signed-off-by: NPhil Oester <kernel@linuxace.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d830f0fa
    • H
      bridge: Clamp forward_delay when enabling STP · be4f154d
      Herbert Xu 提交于
      At some point limits were added to forward_delay.  However, the
      limits are only enforced when STP is enabled.  This created a
      scenario where you could have a value outside the allowed range
      while STP is disabled, which then stuck around even after STP
      is enabled.
      
      This patch fixes this by clamping the value when we enable STP.
      
      I had to move the locking around a bit to ensure that there is
      no window where someone could insert a value outside the range
      while we're in the middle of enabling STP.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      
      Cheers,
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be4f154d
    • C
      resubmit bridge: fix message_age_timer calculation · 9a062013
      Chris Healy 提交于
      This changes the message_age_timer calculation to use the BPDU's max age as
      opposed to the local bridge's max age.  This is in accordance with section
      8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification.
      
      With the current implementation, when running with very large bridge
      diameters, convergance will not always occur even if a root bridge is
      configured to have a longer max age.
      
      Tested successfully on bridge diameters of ~200.
      Signed-off-by: NChris Healy <cphealy@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a062013
    • S
      memcg: rename RESOURCE_MAX to RES_COUNTER_MAX · 6de5a8bf
      Sha Zhengju 提交于
      RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX.
      Signed-off-by: NSha Zhengju <handai.szj@taobao.com>
      Signed-off-by: NQiang Huang <h.huangqiang@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Jeff Liu <jeff.liu@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6de5a8bf
    • D
      net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit · 95ee6208
      Daniel Borkmann 提交于
      Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not
      being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport
      does not seem to have the desired effect:
      
      SCTP + IPv4:
      
        22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116)
          192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72
        22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340)
          192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1):
      
      SCTP + IPv6:
      
        22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364)
          fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp
          1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10]
      
      Moreover, Alan says:
      
        This problem was seen with both Racoon and Racoon2. Other people have seen
        this with OpenSwan. When IPsec is configured to encrypt all upper layer
        protocols the SCTP connection does not initialize. After using Wireshark to
        follow packets, this is because the SCTP packet leaves Box A unencrypted and
        Box B believes all upper layer protocols are to be encrypted so it drops
        this packet, causing the SCTP connection to fail to initialize. When IPsec
        is configured to encrypt just SCTP, the SCTP packets are observed unencrypted.
      
      In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext"
      string on the other end, results in cleartext on the wire where SCTP eventually
      does not report any errors, thus in the latter case that Alan reports, the
      non-paranoid user might think he's communicating over an encrypted transport on
      SCTP although he's not (tcpdump ... -X):
      
        ...
        0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000  ]p.......}.l....
        0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000  ....plaintext...
      
      Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the
      receiver side. Initial follow-up analysis from Alan's bug report was done by
      Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this.
      
      SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit().
      This has the implication that it probably never really got updated along with
      changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers.
      
      SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since
      a call to inet6_csk_xmit() would solve this problem, but result in unecessary
      route lookups, let us just use the cached flowi6 instead that we got through
      sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(),
      we do the route lookup / flow caching in sctp_transport_route(), hold it in
      tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in
      sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect
      of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst()
      instead to get the correct source routed dst entry, which we assign to the skb.
      
      Also source address routing example from 62503411 ("sctp: fix sctp to work with
      ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095
      it is actually 'recommended' to not use that anyway due to traffic amplification [1].
      So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if
      we overwrite the flow destination here, the lower IPv6 layer will be unable to
      put the correct destination address into IP header, as routing header is added in
      ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside,
      result of this patch is that we do not have any XfrmInTmplMismatch increase plus on
      the wire with this patch it now looks like:
      
      SCTP + IPv6:
      
        08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba:
          AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72
        08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a:
          AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296
      
      This fixes Kernel Bugzilla 24412. This security issue seems to be present since
      2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have
      its fun with that. lksctp-tools IPv6 regression test suite passes as well with
      this patch.
      
       [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdfReported-by: NAlan Chester <alan.chester@tekelec.com>
      Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95ee6208