1. 05 Apr, 2013 2 commits
  2. 03 Apr, 2013 4 commits
    • netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths · 906b1c39
      Committed by Matthias Schiffer
      The bitmask used for the prefix mangling was being calculated
      incorrectly, leading to the wrong part of the address being replaced
      when the prefix length wasn't a multiple of 32.
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      906b1c39
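The masking issue described in the commit above can be illustrated with a minimal sketch. This is not the kernel's code: `keep_mask` and `replace_prefix` are hypothetical helpers, and the address is assumed to be stored as four host-order 32-bit words whose most significant bit is the first address bit. The point is only that a prefix such as /48 ends in the middle of a word, so that word needs a partial mask.

```c
#include <stdint.h>

/* Hypothetical helper: mask of the bits in 32-bit word `idx` (0..3) that
 * lie OUTSIDE a prefix of `pfx_len` bits and must therefore be kept.
 * Assumption: words are host-order, MSB = first address bit. */
static uint32_t keep_mask(unsigned int pfx_len, unsigned int idx)
{
    unsigned int start = idx * 32;          /* first bit covered by this word */

    if (pfx_len >= start + 32)
        return 0;                           /* whole word belongs to the prefix */
    if (pfx_len <= start)
        return UINT32_MAX;                  /* whole word is outside the prefix */
    return UINT32_MAX >> (pfx_len - start); /* prefix ends inside this word */
}

/* Replace the first `pfx_len` bits of `addr` with those of `new_pfx`,
 * leaving all remaining bits untouched. */
static void replace_prefix(uint32_t addr[4], const uint32_t new_pfx[4],
                           unsigned int pfx_len)
{
    for (unsigned int i = 0; i < 4; i++) {
        uint32_t keep = keep_mask(pfx_len, i);

        addr[i] = (addr[i] & keep) | (new_pfx[i] & ~keep);
    }
}
```

With a /48 prefix, word 1 gets the mask `0x0000FFFF`, so only its upper 16 bits are rewritten. An inverted or mis-shifted mask in that word would replace the wrong half, which is the class of error the commit message describes.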
    • VSOCK: Handle changes to the VMCI context ID. · 990454b5
      Committed by Reilly Grant
      The VMCI context ID of a virtual machine may change at any time. There
      is a VMCI event which signals this but datagrams may be processed before
      this is handled. It is therefore necessary to be flexible about the
      destination context ID of any datagrams received. (It can be assumed to
      be correct because it is provided by the hypervisor.) The context ID on
      existing sockets should be updated to reflect how the hypervisor is
      currently referring to the system.
      Signed-off-by: Reilly Grant <grantr@vmware.com>
      Acked-by: Andy King <acking@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      990454b5
    • net IPv6 : Fix broken IPv6 routing table after loopback down-up · 25fb6ca4
      Committed by Balakumaran Kannan
      The IPv6 routing table becomes broken after an ifdown/ifup cycle of the
      loopback (lo) interface. After the down-up, routes to other interfaces'
      IPv6 addresses through 'lo' are lost.
      
      IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
      communication. Once 'lo' is down, those routing entries are removed from the
      routing table, but they are not re-created properly when 'lo' is brought back
      up. As a result, the IPv6 addresses of other interfaces become unreachable from
      the same machine. This also breaks communication with other machines because
      NDISC packet processing fails.
      
      This patch fixes the issue by reading all interfaces' IPv6 addresses and adding
      them to the IPv6 routing table while bringing up 'lo'.
      
      ==Testing==
      Before applying the patch:
      $ route -A inet6
      Kernel IPv6 routing table
      Destination                    Next Hop                   Flag Met Ref Use If
      2000::20/128                   ::                         U    256 0     0 eth0
      fe80::/64                      ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      ::1/128                        ::                         Un   0   1     0 lo
      2000::20/128                   ::                         Un   0   1     0 lo
      fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
      ff00::/8                       ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      $ sudo ifdown lo
      $ sudo ifup lo
      $ route -A inet6
      Kernel IPv6 routing table
      Destination                    Next Hop                   Flag Met Ref Use If
      2000::20/128                   ::                         U    256 0     0 eth0
      fe80::/64                      ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      ::1/128                        ::                         Un   0   1     0 lo
      ff00::/8                       ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      $
      
      After applying the patch:
      $ route -A inet6
      Kernel IPv6 routing table
      Destination                    Next Hop                   Flag Met Ref Use If
      2000::20/128                   ::                         U    256 0     0 eth0
      fe80::/64                      ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      ::1/128                        ::                         Un   0   1     0 lo
      2000::20/128                   ::                         Un   0   1     0 lo
      fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
      ff00::/8                       ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      $ sudo ifdown lo
      $ sudo ifup lo
      $ route -A inet6
      Kernel IPv6 routing table
      Destination                    Next Hop                   Flag Met Ref Use If
      2000::20/128                   ::                         U    256 0     0 eth0
      fe80::/64                      ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      ::1/128                        ::                         Un   0   1     0 lo
      2000::20/128                   ::                         Un   0   1     0 lo
      fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
      ff00::/8                       ::                         U    256 0     0 eth0
      ::/0                           ::                         !n   -1  1     1 lo
      $
      Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
      Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25fb6ca4
    • cbq: incorrect processing of high limits · f0f6ee1f
      Committed by Vasily Averin
      Currently, cbq works incorrectly for limits > 10% of the real link bandwidth,
      and practically does not work for limits > 50% of the real link bandwidth.
      Below are the results of experiments run on a 1 Gbit link:
      
       In shaper | Actual Result
      -----------+---------------
        100M     | 108 Mbps
        200M     | 244 Mbps
        300M     | 412 Mbps
        500M     | 893 Mbps
      
      This happens because q->now changes incorrectly in cbq_dequeue():
      when it is called before the real end of packet transmission,
      L2T is greater than the real time delay, so q->now gets an extra boost
      that is never compensated.
      
      To fix this problem, we prevent q->now from changing until it is
      synchronized with real time.
      Signed-off-by: Vasily Averin <vvs@openvz.org>
      Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0f6ee1f
  3. 30 Mar, 2013 5 commits
    • net: add a synchronize_net() in netdev_rx_handler_unregister() · 00cfec37
      Committed by Eric Dumazet
      Commit 35d48903 (bonding: fix rx_handler locking) added a race
      in the bonding driver, reported by Steven Rostedt, who did a very good
      diagnosis:
      
      <quoting Steven>
      
      I'm currently debugging a crash in an old 3.0-rt kernel that one of our
      customers is seeing. The bug happens with a stress test that loads and
      unloads the bonding module in a loop (I don't know all the details as
      I'm not the one that is directly interacting with the customer). But the
      bug looks to be something that may still be present and possibly present
      in mainline too. It will just be much harder to trigger it in mainline.
      
      In -rt, interrupts are threads, and can schedule in and out just like
      any other thread. Note, mainline now supports interrupt threads so this
      may be easily reproducible in mainline as well. I don't have the ability
      to tell the customer to try mainline or other kernels, so my hands are
      somewhat tied to what I can do.
      
      But according to a core dump, I tracked down that the eth irq thread
      crashed in bond_handle_frame() here:
      
              slave = bond_slave_get_rcu(skb->dev);
              bond = slave->bond; <--- BUG
      
      the slave returned was NULL and accessing slave->bond caused a NULL
      pointer dereference.
      
      Looking at the code that unregisters the handler:
      
      void netdev_rx_handler_unregister(struct net_device *dev)
      {
      
              ASSERT_RTNL();
              RCU_INIT_POINTER(dev->rx_handler, NULL);
              RCU_INIT_POINTER(dev->rx_handler_data, NULL);
      }
      
      Which is basically:
              dev->rx_handler = NULL;
              dev->rx_handler_data = NULL;
      
      And looking at __netif_receive_skb() we have:
      
              rx_handler = rcu_dereference(skb->dev->rx_handler);
              if (rx_handler) {
                      if (pt_prev) {
                              ret = deliver_skb(skb, pt_prev, orig_dev);
                              pt_prev = NULL;
                      }
                      switch (rx_handler(&skb)) {
      
      My question to all of you is, what stops this interrupt from happening
      while the bonding module is unloading?  What happens if the interrupt
      triggers and we have this:
      
              CPU0                    CPU1
              ----                    ----
        rx_handler = skb->dev->rx_handler
      
                              netdev_rx_handler_unregister() {
                                 dev->rx_handler = NULL;
                                 dev->rx_handler_data = NULL;
      
        rx_handler()
         bond_handle_frame() {
          slave = skb->dev->rx_handler_data;
          bond = slave->bond; <-- NULL pointer dereference!!!
      
      What protection am I missing in the bond release handler that would
      prevent the above from happening?
      
      </quoting Steven>
      
      We can fix this bug in two ways. The first is to add a test in
      bond_handle_frame() and other handlers to check whether rx_handler_data is NULL.
      
      The second is to add a synchronize_net() in
      netdev_rx_handler_unregister() to guarantee that an RCU-protected reader
      sees a non-NULL rx_handler_data.
      
      The second way is better as it avoids an extra test in the fast path.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jpirko@redhat.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00cfec37
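The interleaving in the report above can be replayed deterministically on a single thread. The sketch below is a toy model, not kernel code: every `toy_*` name is hypothetical, and it demonstrates only the first option mentioned in the commit message (a NULL test inside the handler). The commit itself prefers the second option, which a single-threaded sketch cannot show.

```c
#include <stddef.h>

struct toy_slave { int bond_id; };
struct toy_dev;
typedef int (*toy_rx_handler_t)(struct toy_dev *dev);

struct toy_dev {
    toy_rx_handler_t rx_handler;
    void *rx_handler_data;
};

/* Handler with the defensive test: returns -1 instead of dereferencing
 * a NULL slave pointer. */
static int toy_handle_frame(struct toy_dev *dev)
{
    struct toy_slave *slave = dev->rx_handler_data;

    if (slave == NULL)
        return -1;
    return slave->bond_id;
}

/* Models the unregister path quoted in the report: both fields are
 * cleared back to back, with nothing waiting for readers in between. */
static void toy_unregister(struct toy_dev *dev)
{
    dev->rx_handler = NULL;
    dev->rx_handler_data = NULL;
}

/* Replays the CPU0/CPU1 interleaving from the report in program order. */
static int replay_race(void)
{
    struct toy_slave slave = { .bond_id = 7 };
    struct toy_dev dev = { toy_handle_frame, &slave };

    toy_rx_handler_t h = dev.rx_handler; /* "CPU0": handler pointer loaded */
    toy_unregister(&dev);                /* "CPU1": both fields cleared */
    return h ? h(&dev) : 0;              /* "CPU0": handler runs, data is NULL */
}
```

Here `replay_race()` returns -1: the reader still holds a valid handler pointer but finds no data. Without the NULL test the handler would dereference NULL, which matches the crash in the report. The cost of this option is the extra branch on every received frame, which is why the commit message prefers waiting for readers in the unregister path.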
    • net: fq_codel: Fix off-by-one error · cd68ddd4
      Committed by Vijay Subramanian
      Currently, we hold a max of sch->limit -1 number of packets instead of
      sch->limit packets. Fix this off-by-one error.
      Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd68ddd4
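A toy model of the off-by-one described above, assuming a queue that counts the new packet first and then compares against the limit. The `toy_queue` type and both enqueue functions are hypothetical, and for simplicity the sketch rejects the incoming packet when over the limit, which is not necessarily what fq_codel does.

```c
#include <stdbool.h>

struct toy_queue {
    unsigned int qlen;  /* packets currently held */
    unsigned int limit; /* configured maximum */
};

/* Off-by-one variant: with a strict comparison after the increment,
 * the queue never holds more than limit - 1 packets. */
static bool enqueue_strict(struct toy_queue *q)
{
    if (++q->qlen < q->limit)
        return true;
    q->qlen--;          /* toy behaviour: reject the new packet */
    return false;
}

/* Inclusive comparison: the queue can hold exactly `limit` packets. */
static bool enqueue_inclusive(struct toy_queue *q)
{
    if (++q->qlen <= q->limit)
        return true;
    q->qlen--;
    return false;
}

/* Enqueue until the first rejection and report how many packets fit. */
static unsigned int fill(struct toy_queue *q, bool (*enq)(struct toy_queue *))
{
    while (enq(q))
        ;
    return q->qlen;
}
```

With `limit = 4`, filling through `enqueue_strict` stops at 3 packets, while `enqueue_inclusive` reaches 4.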
    • net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb' · a561cf7e
      Committed by Shmulik Ladkani
      'nf_reset' is called just prior to calling 'netif_rx'.
      There is no need to call it twice.
      Reported-by: Igor Michailov <rgohita@gmail.com>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a561cf7e
    • net: fix the use of this_cpu_ptr · 50eab050
      Committed by Li RongQing
      flush_tasklet is not a percpu variable, but percpu is, and
      	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
      is not equal to
      	&this_cpu_ptr(info->cache->percpu)->flush_tasklet
      
      Commit 1f743b07 (use this_cpu_ptr per-cpu helper) introduced this bug.
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50eab050
    • ipv6: don't accept node local multicast traffic from the wire · 1c4a154e
      Committed by Hannes Frederic Sowa
      Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
      verified: http://www.rfc-editor.org/errata_search.php?eid=3480
      
      We have to check for pkt_type and loopback flag because either the
      packets are allowed to travel over the loopback interface (in which case
      pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
      over a non-loopback interface back to us (in which case pkt_type is
      PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).
      
      Cc: Erik Hugne <erik.hugne@ericsson.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c4a154e
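The two acceptance cases described in the commit above can be written as a small predicate. This is a sketch under assumptions, not the kernel's code: the `toy_*` enum and all function names are hypothetical stand-ins for the packet type and the IFF_LOOPBACK device flag. The scope test relies on RFC 4291, where the low four bits of the second address byte carry the multicast scope and scope 1 is interface-local.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the packet types named in the commit message. */
enum toy_pkt_type { TOY_PACKET_HOST, TOY_PACKET_LOOPBACK, TOY_PACKET_OTHER };

/* True for ff01::/16-style addresses: multicast (first byte 0xff) with
 * scope 1 in the low four bits of the second byte. */
static bool is_node_local_mcast(const uint8_t addr[16])
{
    return addr[0] == 0xff && (addr[1] & 0x0f) == 0x01;
}

/* The packet is legitimate if it was looped back by the local stack or
 * arrived on a loopback device; these are the two cases in the message. */
static bool came_from_this_node(enum toy_pkt_type type, bool dev_is_loopback)
{
    return type == TOY_PACKET_LOOPBACK || dev_is_loopback;
}

/* Drop node-local multicast that reached us from the wire. */
static bool should_drop(const uint8_t addr[16], enum toy_pkt_type type,
                        bool dev_is_loopback)
{
    return is_node_local_mcast(addr) &&
           !came_from_this_node(type, dev_is_loopback);
}
```

Checking only one of the two conditions would wrongly drop one of the legitimate local paths, which is why the message insists on testing both the packet type and the loopback flag.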
  4. 28 Mar, 2013 3 commits
  5. 27 Mar, 2013 2 commits
  6. 26 Mar, 2013 3 commits
  7. 25 Mar, 2013 10 commits
  8. 24 Mar, 2013 2 commits
    • mac80211: Don't restart sta-timer if not associated. · 370bd005
      Committed by Ben Greear
      I found another crash when deleting lots of virtual stations
      in a congested environment.  I think the problem is that
      the ieee80211_mlme_notify_scan_completed could call
      ieee80211_restart_sta_timer for a stopped interface
      that was about to be deleted.
      
      With the following patch I am unable to reproduce the
      crash.
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      [move check, also make the same change in mesh]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      370bd005
    • cfg80211: always check for scan end on P2P device · f9f47529
      Committed by Johannes Berg
      If a P2P device wdev is removed while it has a scan, then the
      scan completion might crash later as it is already freed by
      that time. To avoid the crash always check the scan completion
      when the P2P device is being removed for some reason. If the
      driver already canceled it, don't warn and free it, otherwise
      warn and leak it to avoid later crashes.
      
      In order to do this, locking needs to be changed away from the
      rdev mutex (which can't always be guaranteed). For now, use
      the sched_scan_mtx instead, I'll rename it to just scan_mtx in
      a later patch.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      f9f47529
  9. 22 Mar, 2013 1 commit
    • tcp: preserve ACK clocking in TSO · f4541d60
      Committed by Eric Dumazet
      A long-standing problem with TSO is that tcp_tso_should_defer()
      rearms the deferred timer when it should not.
      
      The current code leads to the following bursty behavior:
      
      20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
      20:11:24.484337 IP B > A: . ack 263721 win 1117
      20:11:24.485086 IP B > A: . ack 265241 win 1117
      20:11:24.485925 IP B > A: . ack 266761 win 1117
      20:11:24.486759 IP B > A: . ack 268281 win 1117
      20:11:24.487594 IP B > A: . ack 269801 win 1117
      20:11:24.488430 IP B > A: . ack 271321 win 1117
      20:11:24.489267 IP B > A: . ack 272841 win 1117
      20:11:24.490104 IP B > A: . ack 274361 win 1117
      20:11:24.490939 IP B > A: . ack 275881 win 1117
      20:11:24.491775 IP B > A: . ack 277401 win 1117
      20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
      20:11:24.492620 IP B > A: . ack 278921 win 1117
      20:11:24.493448 IP B > A: . ack 280441 win 1117
      20:11:24.494286 IP B > A: . ack 281961 win 1117
      20:11:24.495122 IP B > A: . ack 283481 win 1117
      20:11:24.495958 IP B > A: . ack 285001 win 1117
      20:11:24.496791 IP B > A: . ack 286521 win 1117
      20:11:24.497628 IP B > A: . ack 288041 win 1117
      20:11:24.498459 IP B > A: . ack 289561 win 1117
      20:11:24.499296 IP B > A: . ack 291081 win 1117
      20:11:24.500133 IP B > A: . ack 292601 win 1117
      20:11:24.500970 IP B > A: . ack 294121 win 1117
      20:11:24.501388 IP B > A: . ack 295641 win 1117
      20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
      
      While the expected behavior is more like :
      
      20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
      20:19:49.260446 IP B > A: . ack 154281 win 1212
      20:19:49.261282 IP B > A: . ack 155801 win 1212
      20:19:49.262125 IP B > A: . ack 157321 win 1212
      20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
      20:19:49.262958 IP B > A: . ack 158841 win 1212
      20:19:49.263795 IP B > A: . ack 160361 win 1212
      20:19:49.264628 IP B > A: . ack 161881 win 1212
      20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
      20:19:49.265465 IP B > A: . ack 163401 win 1212
      20:19:49.265886 IP B > A: . ack 164921 win 1212
      20:19:49.266722 IP B > A: . ack 166441 win 1212
      20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
      20:19:49.267559 IP B > A: . ack 167961 win 1212
      20:19:49.268394 IP B > A: . ack 169481 win 1212
      20:19:49.269232 IP B > A: . ack 171001 win 1212
      20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4541d60
  10. 21 Mar, 2013 8 commits