1. 22 5月, 2014 1 次提交
    • C
      net_sched: fix an oops in tcindex filter · bf63ac73
      Cong Wang 提交于
      Kelly reported the following crash:
      
              IP: [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
              PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
              Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
              CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
              Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
              task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
              RIP: 0010:[<ffffffff817a993d>]  [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
              RSP: 0018:ffff8800d21b9b90  EFLAGS: 00010283
              RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
              RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
              RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
              R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
              R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
              FS:  00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
              CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
              CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
              Stack:
               ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
               ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
               ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
              Call Trace:
               [<ffffffff817c55bb>] tcindex_classify+0x88/0x9b
               [<ffffffff817a7f7d>] tc_classify_compat+0x3e/0x7b
               [<ffffffff817a7fdf>] tc_classify+0x25/0x9f
               [<ffffffff817b0e68>] htb_enqueue+0x55/0x27a
               [<ffffffff817b6c2e>] dsmark_enqueue+0x165/0x1a4
               [<ffffffff81775642>] __dev_queue_xmit+0x35e/0x536
               [<ffffffff8177582a>] dev_queue_xmit+0x10/0x12
               [<ffffffff818f8ecd>] packet_sendmsg+0xb26/0xb9a
               [<ffffffff810b1507>] ? __lock_acquire+0x3ae/0xdf3
               [<ffffffff8175cf08>] __sock_sendmsg_nosec+0x25/0x27
               [<ffffffff8175d916>] sock_aio_write+0xd0/0xe7
               [<ffffffff8117d6b8>] do_sync_write+0x59/0x78
               [<ffffffff8117d84d>] vfs_write+0xb5/0x10a
               [<ffffffff8117d96a>] SyS_write+0x49/0x7f
               [<ffffffff8198e212>] system_call_fastpath+0x16/0x1b
      
      This is because we memcpy struct tcindex_filter_result which contains
      struct tcf_exts, obviously struct list_head can not be simply copied.
      This is a regression introduced by commit 33be6271
      (net_sched: act: use standard struct list_head).
      
      It's not very easy to fix it as the code is a mess:
      
             if (old_r)
                     memcpy(&cr, r, sizeof(cr));
             else {
                     memset(&cr, 0, sizeof(cr));
                     tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
             }
             ...
             tcf_exts_change(tp, &cr.exts, &e);
             ...
             memcpy(r, &cr, sizeof(cr));
      
      the above code should equal to:
      
              tcindex_filter_result_init(&cr);
              if (old_r)
                     cr.res = r->res;
              ...
              if (old_r)
                     tcf_exts_change(tp, &r->exts, &e);
              else
                     tcf_exts_change(tp, &cr.exts, &e);
              ...
              r->res = cr.res;
      
      after this change, since there is no need to copy struct tcf_exts.
      
      And it also fixes other places zero'ing struct's contains struct tcf_exts.
      
      Fixes: commit 33be6271 (net_sched: act: use standard struct list_head)
      Reported-by: NKelly Anderson <kelly@xilka.com>
      Tested-by: NKelly Anderson <kelly@xilka.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf63ac73
  2. 21 5月, 2014 3 次提交
  3. 20 5月, 2014 1 次提交
  4. 19 5月, 2014 3 次提交
  5. 17 5月, 2014 20 次提交
    • D
      Merge branch 'bond_stacked_vlans' · a8d0d841
      David S. Miller 提交于
      Vlad Yasevich says:
      
      ====================
      Fixed stacked vlan usage on top of bonds
      
      Bonding device driver now support q-in-q on top for bonds.  There are
      a few issues here though.
      
      First, when arp monitoring is used, bonding driver will not correctly
      tag traffic if the source of the arp device was configured on top of
      q-in-q.  It may also incorrectly pick the wrong vlan id if the ordering
      of that upper devices isn't as expected (there is no guarntee on ordering).
      
      Second, the alb/tlb may use what would be considered 'inner' vlans in
      its learning announcements, as it simply announces all vlans configured
      on top of the bond without regard for encapsulation/stacking.
      
      This series fixes the above 2 issues.  This series also depends on the
      functionality introduced in
      	http://patchwork.ozlabs.org/patch/349766/
      
      Since v1:
        - Changed how patch1 verifies the device path.  We no longer use the
          _all_upper version of the function.  We find the path and if it was
          found, then collect the vlan information.
        - Use the constant to devine maximum vlan nest level support on top
          of bonding.  This can be changed if 2 is too low.
        - Inlude patch2 into the series.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8d0d841
    • V
      bonding: Fix alb mode to only use first level vlans. · f60c3704
      Vlad Yasevich 提交于
      ALB/TLB learning packets use all vlans configured on top
      of the bond.  This ends up being incorrect if we have a stack
      of vlans on top of the bond.  ALB/TLB should only use
      first level/outer most vlans in its announcements.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f60c3704
    • V
      bonding: Fix stacked device detection in arp monitoring · 44a40855
      Vlad Yasevich 提交于
      Prior to commit fbd929f2
      	bonding: support QinQ for bond arp interval
      
      the arp monitoring code allowed for proper detection of devices
      stacked on top of vlans.  Since the above commit, the
      code can still detect a device stacked on top of single
      vlan, but not a device stacked on top of Q-in-Q configuration.
      The search will only set the inner vlan tag if the route
      device is the vlan device.  However, this is not always the
      case, as it is possible to extend the stacked configuration.
      
      With this patch it is possible to provision devices on
      top Q-in-Q vlan configuration that should be used as
      a source of ARP monitoring information.
      
      For example:
      ip link add link bond0 vlan10 type vlan proto 802.1q id 10
      ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
      ip link add link vlan100 type macvlan
      
      Note:  This patch limites the number of stacked VLANs to 2,
      just like before.  The original, however had another issue
      in that if we had more then 2 levels of VLANs, we would end
      up generating incorrectly tagged traffic.  This is no longer
      possible.
      
      Fixes: fbd929f2 (bonding: support QinQ for bond arp interval)
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@redhat.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Ding Tianhong <dingtianhong@huawei.com>
      CC: Patric McHardy <kaber@trash.net>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44a40855
    • D
      Merge branch 'stacked_netdevice_locking' · 6bd64ac0
      David S. Miller 提交于
      Vlad Yasevich says:
      
      ====================
      Fix lockdep issues with stacked devices
      
      Recent commit dc8eaaa0
          vlan: Fix lockdep warning when vlan dev handle notification
      
      attempted to solve lockdep issues with vlans where multiple
      vlans were stacked.  However, the code does not work correctly
      when the vlan stack is interspersed with other devices in between
      the vlans.  Additionally, similar lockdep issues show up with other
      devices.
      
      This series provides a generic way to solve these issue for any
      devices that can be stacked.  It also addresses the concern for
      vlan and macvlan devices.  I am not sure whether it makes sense
      to do so for other types like team, vxlan, and bond.
      
      Thanks
      -vlad
      
      Since v2:
        - Remove rcu variants from patch1, since that function is called
          only under rtnl.
        - Fix whitespace problems reported by checkpatch
      
      Since v1:
        - Fixed up a goofed-up rebase.
          * is_vlan_dev() should be bool and that change belongs in patch3.
          * patch4 should not have any vlan changes in it.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bd64ac0
    • V
      macvlan: Fix lockdep warnings with stacked macvlan devices · c674ac30
      Vlad Yasevich 提交于
      Macvlan devices try to avoid stacking, but that's not always
      successfull or even desired.  As an example, the following
      configuration is perefectly legal and valid:
      
      eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1
      
      However, this configuration produces the following lockdep
      trace:
      [  115.620418] ======================================================
      [  115.620477] [ INFO: possible circular locking dependency detected ]
      [  115.620516] 3.15.0-rc1+ #24 Not tainted
      [  115.620540] -------------------------------------------------------
      [  115.620577] ip/1704 is trying to acquire lock:
      [  115.620604]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
      [  115.620686]
      but task is already holding lock:
      [  115.620723]  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
      [  115.620795]
      which lock already depends on the new lock.
      
      [  115.620853]
      the existing dependency chain (in reverse order) is:
      [  115.620894]
      -> #1 (&macvlan_netdev_addr_lock_key){+.....}:
      [  115.620935]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
      [  115.620974]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
      [  115.621019]        [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q]
      [  115.621066]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
      [  115.621105]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
      [  115.621143]        [<ffffffff815da6be>] __dev_open+0xde/0x140
      [  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
      [  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
      [  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
      [  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
      [  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
      [  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
      [  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
      [  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
      [  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
      [  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
      [  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
      [  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
      [  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
      [  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
      [  115.621174]
      -> #0 (&vlan_netdev_addr_lock_key/1){+.....}:
      [  115.621174]        [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
      [  115.621174]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
      [  115.621174]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
      [  115.621174]        [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
      [  115.621174]        [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
      [  115.621174]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
      [  115.621174]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
      [  115.621174]        [<ffffffff815da6be>] __dev_open+0xde/0x140
      [  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
      [  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
      [  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
      [  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
      [  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
      [  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
      [  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
      [  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
      [  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
      [  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
      [  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
      [  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
      [  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
      [  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
      [  115.621174]
      other info that might help us debug this:
      
      [  115.621174]  Possible unsafe locking scenario:
      
      [  115.621174]        CPU0                    CPU1
      [  115.621174]        ----                    ----
      [  115.621174]   lock(&macvlan_netdev_addr_lock_key);
      [  115.621174]                                lock(&vlan_netdev_addr_lock_key/1);
      [  115.621174]                                lock(&macvlan_netdev_addr_lock_key);
      [  115.621174]   lock(&vlan_netdev_addr_lock_key/1);
      [  115.621174]
       *** DEADLOCK ***
      
      [  115.621174] 2 locks held by ip/1704:
      [  115.621174]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40
      [  115.621174]  #1:  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
      [  115.621174]
      stack backtrace:
      [  115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24
      [  115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
      [  115.621174]  ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0
      [  115.621174]  ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8
      [  115.621174]  0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230
      [  115.621174] Call Trace:
      [  115.621174]  [<ffffffff816ee20c>] dump_stack+0x4d/0x66
      [  115.621174]  [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e
      [  115.621174]  [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
      [  115.621174]  [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0
      [  115.621174]  [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
      [  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
      [  115.621174]  [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
      [  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
      [  115.621174]  [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
      [  115.621174]  [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
      [  115.621174]  [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
      [  115.621174]  [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
      [  115.621174]  [<ffffffff815da6be>] __dev_open+0xde/0x140
      [  115.621174]  [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
      [  115.621174]  [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
      [  115.621174]  [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30
      [  115.621174]  [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
      [  115.621174]  [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60
      [  115.621174]  [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
      [  115.621174]  [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730
      [  115.621174]  [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
      [  115.621174]  [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10
      [  115.621174]  [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40
      [  115.621174]  [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40
      [  115.621174]  [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
      [  115.621174]  [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
      [  115.621174]  [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
      [  115.621174]  [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
      [  115.621174]  [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
      [  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
      [  115.621174]  [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0
      [  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
      [  115.621174]  [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0
      [  115.621174]  [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
      [  115.621174]  [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570
      [  115.621174]  [<ffffffff810cfe9f>] ? up_read+0x1f/0x40
      [  115.621174]  [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570
      [  115.621174]  [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0
      [  115.621174]  [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0
      [  115.621174]  [<ffffffff8120a284>] ? mntput+0x24/0x40
      [  115.621174]  [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
      [  115.621174]  [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
      [  115.621174]  [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
      
      Fix this by correctly providing macvlan lockdep class.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c674ac30
    • V
      vlan: Fix lockdep warning with stacked vlan devices. · d38569ab
      Vlad Yasevich 提交于
      This reverts commit dc8eaaa0.
      	vlan: Fix lockdep warning when vlan dev handle notification
      
      Instead we use the new new API to find the lock subclass of
      our vlan device.  This way we can support configurations where
      vlans are interspersed with other devices:
        bond -> vlan -> macvlan -> vlan
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d38569ab
    • V
      net: Allow for more then a single subclass for netif_addr_lock · 25175ba5
      Vlad Yasevich 提交于
      Currently netif_addr_lock_nested assumes that there can be only
      a single nesting level between 2 devices.  However, if we
      have multiple devices of the same type stacked, this fails.
      For example:
       eth0 <-- vlan0.10 <-- vlan0.10.20
      
      A more complicated configuration may stack more then one type of
      device in different order.
      Ex:
        eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1
      
      This patch adds an ndo_* function that allows each stackable
      device to report its nesting level.  If the device doesn't
      provide this function default subclass of 1 is used.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25175ba5
    • V
      net: Find the nesting level of a given device by type. · 4085ebe8
      Vlad Yasevich 提交于
      Multiple devices in the kernel can be stacked/nested and they
      need to know their nesting level for the purposes of lockdep.
      This patch provides a generic function that determines a nesting
      level of a particular device by its type (ex: vlan, macvlan, etc).
      We only care about nesting of the same type of devices.
      
      For example:
        eth0 <- vlan0.10 <- macvlan0 <- vlan1.20
      
      The nesting level of vlan1.20 would be 1, since there is another vlan
      in the stack under it.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4085ebe8
    • E
      net: gro: make sure skb->cb[] initial content has not to be zero · 29e98242
      Eric Dumazet 提交于
      Starting from linux-3.13, GRO attempts to build full size skbs.
      
      Problem is the commit assumed one particular field in skb->cb[]
      was clean, but it is not the case on some stacked devices.
      
      Timo reported a crash in case traffic is decrypted before
      reaching a GRE device.
      
      Fix this by initializing NAPI_GRO_CB(skb)->last at the right place,
      this also removes one conditional.
      
      Thanks a lot to Timo for providing full reports and bisecting this.
      
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Bisected-by: NTimo Teras <timo.teras@iki.fi>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NTimo Teräs <timo.teras@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29e98242
    • T
      ipv4: ip_tunnels: disable cache for nbma gre tunnels · 22fb22ea
      Timo Teräs 提交于
      The connected check fails to check for ip_gre nbma mode tunnels
      properly. ip_gre creates temporary tnl_params with daddr specified
      to pass-in the actual target on per-packet basis from neighbor
      layer. Detect these tunnels by inspecting the actual tunnel
      configuration.
      
      Minimal test case:
       ip route add 192.168.1.1/32 via 10.0.0.1
       ip route add 192.168.1.2/32 via 10.0.0.2
       ip tunnel add nbma0 mode gre key 1 tos c0
       ip addr add 172.17.0.0/16 dev nbma0
       ip link set nbma0 up
       ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0
       ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0
       ping 172.17.0.1
       ping 172.17.0.2
      
      The second ping should be going to 192.168.1.2 and head 10.0.0.2;
      but cached gre tunnel level route is used and it's actually going
      to 192.168.1.1 via 10.0.0.1.
      
      The lladdr's need to go to separate dst for the bug to trigger.
      Test case uses separate route entries, but this can also happen
      when the route entry is same: if there is a nexthop exception or
      the GRE tunnel is IPsec'ed in which case the dst points to xfrm
      bundle unique to the gre lladdr.
      
      Fixes: 7d442fab ("ipv4: Cache dst in tunnels")
      Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22fb22ea
    • F
      net/dsa/dsa.c: increment chip_index during of_node handling on dsa_of_probe() · d1c0b471
      Fabian Godehardt 提交于
      Adding more than one chip on device-tree currently causes the probing
      routine to always use the first chips data pointer.
      Signed-off-by: NFabian Godehardt <fg@emlix.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1c0b471
    • L
      net: ipv6: make "ip -6 route get mark xyz" work. · 2e47b291
      Lorenzo Colitti 提交于
      Currently, "ip -6 route get mark xyz" ignores the mark passed in
      by userspace. Make it honour the mark, just like IPv4 does.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e47b291
    • D
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge · 2f67cc87
      David S. Miller 提交于
      Include changes:
      - fix NULL dereference in batadv_orig_hardif_seq_print_text()
      - fix reference counting imbalance when using fragmentation
      - avoid access to orig_node objects after they have been free'd
      - fix local TT check for outgoing arp requests in DAT
      2f67cc87
    • D
      xen-netback: fix race between napi_complete() and interrupt handler · 0d08fceb
      David Vrabel 提交于
      When the NAPI budget was not all used, xenvif_poll() would call
      napi_complete() /after/ enabling the interrupt.  This resulted in a
      race between the napi_complete() and the napi_schedule() in the
      interrupt handler.  The use of local_irq_save/restore() avoided by
      race iff the handler is running on the same CPU but not if it was
      running on a different CPU.
      
      Fix this properly by calling napi_complete() before reenabling
      interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NWei Liu <wei.liu2@citrix.com>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d08fceb
    • D
      Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 202630b4
      David S. Miller 提交于
      John W. Linville says:
      
      ====================
      pull request: wireless 2014-05-15
      
      Please pull this batch of fixes for the 3.15 stream...
      
      For the mac80211 bits, Johannes says:
      
      "One fix is to get better VHT performance and the other fixes tracing
      garbage or other potential issues with the interface name tracing."
      
      And...
      
      "This has a fix from Emmanuel for a problem I failed to fix - when
      association is in progress then it needs to be cancelled while
      suspending (I had fixed the same for authentication). Also included a
      fix from myself for a userspace API problem that hit the iw tool and a
      fix to the remain-on-channel framework."
      
      For the iwlwifi bits, Emmanuel says:
      
      "Alex fixes the scan by disabling the fragmented scan. David prevents
      scan offload while associated, the firmware seems not to like it. I
      fix a stupid bug I made in BT Coex, and fix a bad #ifdef clause in rate
      scaling.  Along with that there is a fix for a NULL pointer exception
      that can happen if we load the driver and our ISR gets called because
      the interrupt line is shared. The fix has been tested by the reporter."
      
      And...
      
      "We have here a fix from David Spinadel that makes a previous fix more
      complete, and an off-by-one issue fixed by Eliad in the same area.
      I fix the monitor that broke on the way."
      
      Beyond that...
      
      Daniel Kim's one-liner fixes a brcmfmac regression caused by a typo
      in an earlier commit..
      
      Rajkumar Manoharan fixes an ath9k oops reported by David Herrmann.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      202630b4
    • N
      af_rxrpc: Fix XDR length check in rxrpc key demarshalling. · fde0133b
      Nathaniel W Filardo 提交于
      There may be padding on the ticket contained in the key payload, so just ensure
      that the claimed token length is large enough, rather than exactly the right
      size.
      Signed-off-by: NNathaniel Wesley Filardo <nwf@cs.jhu.edu>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fde0133b
    • Z
      net: phy: resume phydev when going to RESUMING · 6e14a5ee
      Zhangfei Gao 提交于
      With commit be9dad1f ("net: phy: suspend phydev when going
      to HALTED"), an unused PHY device will be put in a low-power mode
      using BMCR_PDOWN. Some Ethernet drivers might be calling phy_start()
      and phy_stop() from ndo_open and ndo_close() respectively, while
      calling phy_connect() and phy_disconnect() from probe and remove.
      In such a case, the PHY will be powered down during the phy_stop()
      call, but will fail to be powered up in phy_start().
      This patch fixes this scenario.
      Signed-off-by: NJiancheng Xue <xuejiancheng@huawei.com>
      Signed-off-by: NZhangfei Gao <zhangfei.gao@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e14a5ee
    • D
      Merge branch 'mlx4-net' · 0c2e3fa9
      David S. Miller 提交于
      Or Gerlitz says:
      
      ====================
      mlx4: Fix VF MAC address change under RoCE usage
      
      This short series provides proper handling for the case where a
      VF netdevice change their MAC address under a RoCE use case. The code
      it deals with was introduced in 3.15-rc1
      
      Prior to this series the source MAC used for the VM RoCE CM
      packets remains as before the MAC modification. Hence RoCE CM
      packets sent by the VF will not carry the same source MAC
      address as the non-CM packets.
      
      Earlier 3.15-rc commit f24f790f "net/mlx4_core: Load the Eth driver
      first" handled just one instance of the problem, but this one
      provides a more generic and proper solution which covers all
      cases of VF mac change.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c2e3fa9
    • M
      IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes · 9433c188
      Matan Barak 提交于
      When we receive a netdev event indicating a netdev change and/or
      a netdev address change, we must change the MAC index used by the
      proxy QP1 (in the QP context), otherwise RoCE CM packets sent by the
      VF will not carry the same source MAC address as the non-CM packets.
      
      We use the UPDATE_QP command to perform this change.
      
      In order to avoid modifying a QP context based on netdev event,
      while the driver attempts to destroy this QP (e.g either the mlx4_ib
      or ib_mad modules are unloaded), we use mutex locking in both flows.
      
      Since the relevant mlx4 proxy GSI QP is created indirectly by the
      mad module when they create their GSI QP, the mlx4 didn't need to
      keep track on that QP prior to this change.
      
      Now, when QP modifications are needed to this QP from within the
      driver, we added refernece to it.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9433c188
    • M
      net/mlx4_core: Add UPDATE_QP SRIOV wrapper support · ce8d9e0d
      Matan Barak 提交于
      This patch adds UPDATE_QP SRIOV wrapper support.
      
      The mechanism is a general one, but currently only source MAC
      index changes are allowed for VFs.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce8d9e0d
  6. 16 5月, 2014 12 次提交
    • N
      bonding: fix out of range parameters for bond_intmax_tbl · 81c70806
      Nikolay Aleksandrov 提交于
      I've missed to add a NULL entry to the bond_intmax_tbl when I introduced
      it with the conversion of arp_interval so add it now.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      
      Fixes: 7bdb04ed ("bonding: convert arp_interval to use the new option API")
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: NVeaceslav Falico <vfalico@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81c70806
    • Z
      xen-netback: Fix grant ref resolution in RX path · 58375744
      Zoltan Kiss 提交于
      The original series for reintroducing grant mapping for netback had a patch [1]
      to handle receiving of packets from an another VIF. Grant copy on the receiving
      side needs the grant ref of the page to set up the op.
      The original patch assumed (wrongly) that the frags array haven't changed. In
      the case reported by Sander, the sending guest sent a packet where the linear
      buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
      xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
      first frag. The receiving side had an off-by-one problem when gathered the grant
      refs.
      This patch fixes that by checking whether the actual frag's page pointer is the
      same as the page in the original frag list. It can handle any kind of changes on
      the original frags array, like:
      - removing granted frags from the array at any point
      - adding local pages to the frags list anywhere
      - reordering the frags
      It's optimized to the most common case, when there is 1:1 relation between the
      frags and the list, plus works optimal when frags are removed from the end or
      the beginning.
      
      [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
      Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58375744
    • D
      ipv6: update Destination Cache entries when gateway turn into host · be7a010d
      Duan Jiong 提交于
      RFC 4861 states in 7.2.5:
      
      	The IsRouter flag in the cache entry MUST be set based on the
               Router flag in the received advertisement.  In those cases
               where the IsRouter flag changes from TRUE to FALSE as a result
               of this update, the node MUST remove that router from the
               Default Router List and update the Destination Cache entries
               for all destinations using that neighbor as a router as
               specified in Section 7.3.3.  This is needed to detect when a
               node that is used as a router stops forwarding packets due to
               being configured as a host.
      
      Currently, when dealing with NA Message which IsRouter flag changes from
      TRUE to FALSE, the kernel only removes router from the Default Router List,
      and don't update the Destination Cache entries.
      
      Now in order to update those Destination Cache entries, i introduce
      function rt6_clean_tohost().
      Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be7a010d
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · f895f0cf
      David S. Miller 提交于
      Conflicts:
      	net/ipv4/ip_vti.c
      
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2014-05-15
      
      This pull request has a merge conflict in net/ipv4/ip_vti.c
      between commit 8d89dcdf ("vti: don't allow to add the same
      tunnel twice") and commit a3245236  ("vti4:Don't count header
      length twice"). It can be solved like it is done in linux-next.
      
      1) Fix a ipv6 xfrm output crash when a packet is rerouted
         by netfilter to not use IPsec.
      
      2) vti4 counts some header lengths twice leading to an incorrect
         device mtu. Fix this by counting these headers only once.
      
      3) We don't catch the case if an unsupported protocol is submitted
         to the xfrm protocol handlers, this can lead to NULL pointer
         dereferences. Fix this by adding the appropriate checks.
      
      4) vti6 may unregister pernet ops twice on init errors.
         Fix this by removing one of the calls to do it only once.
         From Mathias Krause.
      
      5) Set the vti tunnel mark before doing a lookup in the error
         handlers. Otherwise we don't find the correct xfrm state.
      ====================
      
      The conflict in ip_vti.c was simple, 'net' had a commit
      removing a line from vti_tunnel_init() and this tree
      being merged had a commit adding a line to the same
      location.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f895f0cf
    • G
      net: phy: Don't call phy_resume if phy_init_hw failed · b394745d
      Guenter Roeck 提交于
      After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called
      to detach the phy device from its network device. If the attached driver is a
      generic phy driver, this also detaches the driver. Subsequently phy_resume
      is called, which assumes without checking that a driver is attached to the
      device. This will result in a crash such as
      
      Unable to handle kernel paging request for data at address 0xffffffffffffff90
      Faulting instruction address: 0xc0000000003a0e18
      Oops: Kernel access of bad area, sig: 11 [#1]
      ...
      NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c
      LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c
      Call Trace:
      [c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable)
      [c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98
      [c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4
      
      Only call phy_resume if phy_init_hw was successful.
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b394745d
    • D
      Merge branch 'altera_tse' · 48f0459f
      David S. Miller 提交于
      Vince Bridgers says:
      
      ====================
      Altera TSE: Fix Sparse errors and misc issues
      
      This is version 2 of a patch series to correct sparse errors, cppcheck
      warnings, and workaound a multicast filtering issue in the Altera TSE
      Ethernet driver. Multicast filtering is not working as expected, so if
      present in the hardware will not be used and promiscuous mode enabled
      instead. This workaround will be replaced with a working solution when
      completely debugged, integrated and tested.
      
      Version 2 is different from the first submission by breaking out the
      workaround as a seperate patch and addressing a few structure instance
      declarations by making them const per review comments.
      
      If you find this patch acceptable, please consider this for inclusion into
      the Altera TSE driver source code.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48f0459f
    • V
      Altera TSE: Disable Multicast filtering to workaround problem · d91e5c02
      Vince Bridgers 提交于
      This patch disables multicast hash filtering if present in the hardware
      and uses promiscuous mode instead until the problem with multicast
      filtering has been debugged, integrated and tested.
      Signed-off-by: NVince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d91e5c02
    • V
      Altera TSE: Fix sparse errors and warnings · 89830580
      Vince Bridgers 提交于
      This patch fixes the many sparse errors and warnings contained in the
      initial submission of the Altera Triple Speed Ethernet driver, and a
      few minor cppcheck warnings. Changes are tested on ARM and NIOS2
      example designs, and compile tested against multiple architectures.
      Typical issues addressed were as follows:
      
      altera_tse_ethtool.c:136:19: warning: incorrect type in argument
          1 (different address spaces)
      altera_tse_ethtool.c:136:19:    expected void const volatile
          [noderef] <asn:2>*addr
      altera_tse_ethtool.c:136:19:    got unsigned int *<noident>
      ...
      altera_sgdma.c:129:31: warning: cast removes address space of
          expression
      Signed-off-by: NVince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89830580
    • C
      rtnetlink: wait for unregistering devices in rtnl_link_unregister() · 200b916f
      Cong Wang 提交于
      From: Cong Wang <cwang@twopensource.com>
      
      commit 50624c93 (net: Delay default_device_exit_batch until no
      devices are unregistering) introduced rtnl_lock_unregistering() for
      default_device_exit_batch(). Same race could happen we when rmmod a driver
      which calls rtnl_link_unregister() as we call dev->destructor without rtnl
      lock.
      
      For long term, I think we should clean up the mess of netdev_run_todo()
      and net namespce exit code.
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      200b916f
    • A
      batman-adv: fix local TT check for outgoing arp requests in DAT · cc2f3386
      Antonio Quartulli 提交于
      Change introduced by 88e48d7b
      ("batman-adv: make DAT drop ARP requests targeting local clients")
      implements a check that prevents DAT from using the caching
      mechanism when the client that is supposed to provide a reply
      to an arp request is local.
      
      However change brought by be1db4f6
      ("batman-adv: make the Distributed ARP Table vlan aware")
      has not converted the above check into its vlan aware version
      thus making it useless when the local client is behind a vlan.
      
      Fix the behaviour by properly specifying the vlan when
      checking for a client being local or not.
      Reported-by: NSimon Wunderlich <simon@open-mesh.com>
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
      cc2f3386
    • A
      batman-adv: increase orig refcount when storing ref in gw_node · 377fe0f9
      Antonio Quartulli 提交于
      A pointer to the orig_node representing a bat-gateway is
      stored in the gw_node->orig_node member, but the refcount
      for such orig_node is never increased.
      This leads to memory faults when gw_node->orig_node is accessed
      and the originator has already been freed.
      
      Fix this by increasing the refcount on gw_node creation
      and decreasing it on gw_node free.
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
      377fe0f9
    • A
      batman-adv: fix reference counting imbalance while sending fragment · be181015
      Antonio Quartulli 提交于
      In the new fragmentation code the batadv_frag_send_packet()
      function obtains a reference to the primary_if, but it does
      not release it upon return.
      
      This reference imbalance prevents the primary_if (and then
      the related netdevice) to be properly released on shut down.
      
      Fix this by releasing the primary_if in batadv_frag_send_packet().
      
      Introduced by ee75ed88
      ("batman-adv: Fragment and send skbs larger than mtu")
      
      Cc: Martin Hundebøll <martin@hundeboll.net>
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
      Acked-by: NMartin Hundebøll <martin@hundeboll.net>
      be181015