1. 11 2月, 2014 10 次提交
    • D
      Merge branch 'bridge' · 684bd2e1
      David S. Miller 提交于
      Toshiaki Makita says:
      
      ====================
      bridge: Fix corner case problems around local fdb entries
      
      There are so many corner cases that are not handled properly around local
      fdb entries.
      - We might fail to delete the old entry and might delete an arbitrary local
        entry when changing mac address of a bridge port.
      - We always fail to delete the old entry when changing mac address of the
        bridge device.
      - We might incorrectly delete a necessary entry when detaching a bridge port.
      - We might incorrectly delete a necessary entry when deleting a vlan.
      and so on.
      
      This is a patch series to fix these issues.
      
      v3:
      - Handle NTF_USE case in patch 1/9, commented by Vlad Yasevich.
      
      - Tested port detach/attach and didn't find any problem with patch 5/9,
        suggested by Stephen Hemminger.
      
      - Add comments about possible inconsistent state in current implementation
        into commit log of patch 5/9, found by the above test.
      
      - Reword unintensive changelog of patch 7/9, commented by Vlad Yasevich.
      
      v2:
      - Change the way to find the old entry in br_fdb_changeaddr() from memorizing
        previous port address to introducing a new flag indicating whether a fdb
        entry is added by user or not, commented by Stephen Hemminger.
      
      - Add a fix for the way to insert a new address in br_fdb_changeaddr().
      
      - Prevent creating an entry such that its dst is NULL in br_add_if() to
        preserve old behavior, commented by Vlad Yasevich.
      
      - Add more comments about slight behavior change, where the bridge device
        come to be able to receive traffic to an address it has during short
        window, to changelogs, commented by Vlad Yasevich.
      
      - Add a fix for possible race in br_fdb_change_mac_address().
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      684bd2e1
    • T
      bridge: Prevent possible race condition in br_fdb_change_mac_address · ac4c8868
      Toshiaki Makita 提交于
      br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without
      br->hash_lock.
      
      These hash list updates are racy with br_fdb_update()/br_fdb_cleanup().
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac4c8868
    • T
      bridge: Properly check if local fdb entry can be deleted when deleting vlan · 424bb9c9
      Toshiaki Makita 提交于
      Vlan codes unconditionally delete local fdb entries.
      We should consider the possibility that other ports have the same
      address and vlan.
      
      Example of problematic case:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
        bridge vlan add dev eth0 vid 10
        bridge vlan add dev eth1 vid 10
        bridge vlan add dev br0 vid 10 self
      We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and
      f->addr == 12:34:56:78:90:ab at this time.
      Next, delete eth0 vlan 10.
        bridge vlan del dev eth0 vid 10
      In this case, we still need the entry for br0, but it will be deleted.
      
      Note that br0 needs the entry even though its mac address is not set
      manually. To delete the entry with proper condition checking,
      fdb_delete_local() is suitable to use.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      424bb9c9
    • T
      bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port · a778e6d1
      Toshiaki Makita 提交于
      br_fdb_delete_by_port() doesn't care about vlan and mac address of the
      bridge device.
      
      As the check is almost the same as mac address changing, slightly modify
      fdb_delete_local() and use it.
      
      Note that we can always set added_by_user to 0 in fdb_delete_local() because
      - br_fdb_delete_by_port() calls fdb_delete_local() for local entries
        regardless of its added_by_user. In this case, we have to check if another
        port has the same address and vlan, and if found, we have to create the
        entry (by changing dst). This is kernel-added entry, not user-added.
      - br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a778e6d1
    • T
      bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address · 960b589f
      Toshiaki Makita 提交于
      br_fdb_change_mac_address() doesn't check if the local entry has the
      same address as any of bridge ports.
      Although I'm not sure when it is beneficial, current implementation allow
      the bridge device to receive any mac address of its ports.
      To preserve this behavior, we have to check if the mac address of the
      entry being deleted is identical to that of any port.
      
      As this check is almost the same as that in br_fdb_changeaddr(), create
      a common function fdb_delete_local() and call it from
      br_fdb_changeadddr() and br_fdb_change_mac_address().
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      960b589f
    • T
      bridge: Fix the way to check if a local fdb entry can be deleted · 2b292fb4
      Toshiaki Makita 提交于
      We should take into account the followings when deleting a local fdb
      entry.
      
      - nbp_vlan_find() can be used only when vid != 0 to check if an entry is
        deletable, because a fdb entry with vid 0 can exist at any time while
        nbp_vlan_find() always return false with vid 0.
      
        Example of problematic case:
          ip link set eth0 address 12:34:56:78:90:ab
          ip link set eth1 address 12:34:56:78:90:ab
          brctl addif br0 eth0
          brctl addif br0 eth1
          ip link set eth0 address aa:bb:cc:dd:ee:ff
        Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the
        bridge port eth1 still has that address.
      
      - The port to which the bridge device is attached might needs a local entry
        if its mac address is set manually.
      
        Example of problematic case:
          ip link set eth0 address 12:34:56:78:90:ab
          brctl addif br0 eth0
          ip link set br0 address 12:34:56:78:90:ab
          ip link set eth0 address aa:bb:cc:dd:ee:ff
        Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be
        deleted.
      
      We can use br->dev->addr_assign_type to check if the address is manually
      set or not, but I propose another approach.
      
      Since we delete and insert local entries whenever changing mac address
      of the bridge device, we can change dst of the entry to NULL regardless of
      addr_assign_type when deleting an entry associated with a certain port,
      and if it is found to be unnecessary later, then delete it.
      That is, if changing mac address of a port, the entry might be changed
      to its dst being NULL first, but is eventually deleted when recalculating
      and changing bridge id.
      
      This approach is especially useful when we want to share the code with
      deleting vlan in which the bridge device might want such an entry regardless
      of addr_assign_type, and makes things easy because we don't have to consider
      if mac address of the bridge device will be changed or not at the time we
      delete a local entry of a port, which means fdb code will not be bothered
      even if the bridge id calculating logic is changed in the future.
      
      Also, this change reduces inconsistent state, where frames whose dst is the
      mac address of the bridge, can't reach the bridge because of premature fdb
      entry deletion. This change reduces the possibility that the bridge device
      replies unreachable mac address to arp requests, which could occur during
      the short window between calling del_nbp() and br_stp_recalculate_bridge_id()
      in br_del_if(). This will effective after br_fdb_delete_by_port() starts to
      use the same code by following patch.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b292fb4
    • T
      bridge: Change local fdb entries whenever mac address of bridge device changes · a4b816d8
      Toshiaki Makita 提交于
      Vlan code may need fdb change when changing mac address of bridge device
      even if it is caused by the mac address changing of a bridge port.
      
      Example configuration:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
        bridge vlan add dev br0 vid 10 self
        bridge vlan add dev eth0 vid 10
      We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and
      f->addr == 12:34:56:78:90:ab at this time.
      Next, change the mac address of eth0 to greater value.
        ip link set eth0 address ee:ff:12:34:56:78
      Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff.
      However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not
      able to communicate using br0 on vlan 10.
      
      Address this issue by deleting and adding local entries whenever
      changing the mac address of the bridge device.
      
      If there already exists an entry that has the same address, for example,
      in case that br_fdb_changeaddr() has already inserted it,
      br_fdb_change_mac_address() will simply fail to insert it and no
      duplicated entry will be made, as it was.
      
      This approach also needs br_add_if() to call br_fdb_insert() before
      br_stp_recalculate_bridge_id() so that we don't create an entry whose
      dst == NULL in this function to preserve previous behavior.
      
      Note that this is a slight change in behavior where the bridge device can
      receive the traffic to the new address before calling
      br_stp_recalculate_bridge_id() in br_add_if().
      However, it is not a problem because we have already the address on the
      new port and such a way to insert new one before recalculating bridge id
      is taken in br_device_event() as well.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4b816d8
    • T
      bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address · a3ebb7ef
      Toshiaki Makita 提交于
      We have been always failed to delete the old entry at
      br_fdb_change_mac_address() because br_set_mac_address() updates
      dev->dev_addr before calling br_fdb_change_mac_address() and
      br_fdb_change_mac_address() uses dev->dev_addr to find the old entry.
      
      That update of dev_addr is completely unnecessary because the same work
      is done in br_stp_change_bridge_id() which is called right away after
      calling br_fdb_change_mac_address().
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3ebb7ef
    • T
      bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr · 2836882f
      Toshiaki Makita 提交于
      Since commit bc9a25d2 ("bridge: Add vlan support for local fdb entries"),
      br_fdb_changeaddr() has inserted a new local fdb entry only if it can
      find old one. But if we have two ports where they have the same address
      or user has deleted a local entry, there will be no entry for one of the
      ports.
      
      Example of problematic case:
        ip link set eth0 address aa:bb:cc:dd:ee:ff
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # eth1 will not have a local entry due to dup.
        ip link set eth1 address 12:34:56:78:90:ab
      Then, the new entry for the address 12:34:56:78:90:ab will not be
      created, and the bridge device will not be able to communicate.
      
      Insert new entries regardless of whether we can find old entries or not.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2836882f
    • T
      bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr · a5642ab4
      Toshiaki Makita 提交于
      br_fdb_changeaddr() assumes that there is at most one local entry per port
      per vlan. It used to be true, but since commit 36fd2b63 ("bridge: allow
      creating/deleting fdb entries via netlink"), it has not been so.
      Therefore, the function might fail to search a correct previous address
      to be deleted and delete an arbitrary local entry if user has added local
      entries manually.
      
      Example of problematic case:
        ip link set eth0 address ee:ff:12:34:56:78
        brctl addif br0 eth0
        bridge fdb add 12:34:56:78:90:ab dev eth0 master
        ip link set eth0 address aa:bb:cc:dd:ee:ff
      Then, the address 12:34:56:78:90:ab might be deleted instead of
      ee:ff:12:34:56:78, the original mac address of eth0.
      
      Address this issue by introducing a new flag, added_by_user, to struct
      net_bridge_fdb_entry.
      
      Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases
      like:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master
        brctl addif br0 eth1
        brctl delif br0 eth0
      In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff,
      but it also should have been added by "brctl addif br0 eth1" originally,
      so we don't delete it and treat it a new kernel-created entry.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5642ab4
  2. 10 2月, 2014 22 次提交
  3. 08 2月, 2014 3 次提交
  4. 07 2月, 2014 5 次提交
    • P
      netfilter: nft_rbtree: fix data handling of end interval elements · 2fb91ddb
      Pablo Neira Ayuso 提交于
      This patch fixes several things which related to the handling of
      end interval elements:
      
      * Chain use underflow with intervals and map: If you add a rule
        using intervals+map that introduces a loop, the error path of the
        rbtree set decrements the chain refcount for each side of the
        interval, leading to a chain use counter underflow.
      
      * Don't copy the data part of the end interval element since, this
        area is uninitialized and this confuses the loop detection code.
      
      * Don't allocate room for the data part of end interval elements
        since this is unused.
      
      So, after this patch the idea is that end interval elements don't
      have a data part.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NPatrick McHardy <kaber@trash.net>
      2fb91ddb
    • P
      netfilter: nf_tables: do not allow NFT_SET_ELEM_INTERVAL_END flag and data · bd7fc645
      Pablo Neira Ayuso 提交于
      This combination is not allowed since end interval elements cannot
      contain data.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NPatrick McHardy <kaber@trash.net>
      bd7fc645
    • E
      tcp: remove 1ms offset in srtt computation · 4a5ab4e2
      Eric Dumazet 提交于
      TCP pacing depends on an accurate srtt estimation.
      
      Current srtt estimation is using jiffie resolution,
      and has an artificial offset of at least 1 ms, which can produce
      slowdowns when FQ/pacing is used, especially in DC world,
      where typical rtt is below 1 ms.
      
      We are planning a switch to usec resolution for linux-3.15,
      but in the meantime, this patch removes the 1 ms offset.
      
      All we need is to have tp->srtt minimal value of 1 to differentiate
      the case of srtt being initialized or not, not 8.
      
      The problematic behavior was observed on a 40Gbit testbed,
      where 32 concurrent netperf were reaching 12Gbps of aggregate
      speed, instead of line speed.
      
      This patch also has the effect of reporting more accurate srtt and send
      rates to iproute2 ss command as in :
      
      $ ss -i dst cca2
      Netid  State      Recv-Q Send-Q          Local Address:Port
      Peer Address:Port
      tcp    ESTAB      0      0                10.244.129.1:56984
      10.244.129.2:12865
      	 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send
      463.4Mbps rcv_rtt:1 rcv_space:29200
      tcp    ESTAB      0      390960           10.244.129.1:60247
      10.244.129.2:50204
      	 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51
      send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200
      Reported-by: NVytautas Valancius <valas@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a5ab4e2
    • C
      bridge: fix netconsole setup over bridge · dbe17307
      Cong Wang 提交于
      Commit 93d8bf9f ("bridge: cleanup netpoll code") introduced
      a check in br_netpoll_enable(), but this check is incorrect for
      br_netpoll_setup(). This patch moves the code after the check
      into __br_netpoll_enable() and calls it in br_netpoll_setup().
      For br_add_if(), the check is still needed.
      
      Fixes: 93d8bf9f ("bridge: cleanup netpoll code")
      Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Tested-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbe17307
    • E
      net: use __GFP_NORETRY for high order allocations · ed98df33
      Eric Dumazet 提交于
      sock_alloc_send_pskb() & sk_page_frag_refill()
      have a loop trying high order allocations to prepare
      skb with low number of fragments as this increases performance.
      
      Problem is that under memory pressure/fragmentation, this can
      trigger OOM while the intent was only to try the high order
      allocations, then fallback to order-0 allocations.
      
      We had various reports from unexpected regressions.
      
      According to David, setting __GFP_NORETRY should be fine,
      as the asynchronous compaction is still enabled, and this
      will prevent OOM from kicking as in :
      
      CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
      oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
      CFSClientEventm
      
      Call Trace:
       [<ffffffff8043766c>] dump_header+0xe1/0x23e
       [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
       [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
       [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
       [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
       [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
       [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
       [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0
       [<ffffffff802a5037>] inet_sendmsg+0x67/0x100
       [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90
       [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0
       [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430
       [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110
       [<ffffffff80711472>] system_call_fastpath+0x16/0x1b
      Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed98df33