1. 08 Oct, 2015 5 commits
  2. 30 Sep, 2015 6 commits
  3. 29 Sep, 2015 4 commits
    • net: sctp: Don't use 64 kilobyte lookup table for four elements · 2103d6b8
      Denys Vlasenko committed
      Seemingly innocuous sctp_trans_state_to_prio_map[] array
      is way bigger than it looks, since
      "[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" !
      
      This patch replaces it with switch() statement.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      CC: linux-sctp@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2103d6b8
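      The size effect described above can be reproduced in a few lines of
      user-space C. This is a minimal sketch, not the kernel code: the enum
      and function names are hypothetical, and only the "[SCTP_UNKNOWN] = 2"
      entry and the 0xffff value come from the commit message. The other
      priorities are assumed for illustration.

```c
/* Hypothetical state enum: three small values plus one large sentinel. */
enum trans_state {
	ST_INACTIVE = 0,
	ST_PF       = 1,
	ST_ACTIVE   = 2,
	ST_UNKNOWN  = 0xffff,	/* the large value that inflates the table */
};

/* Designated initializers size the array by the largest index used,
 * so this table has 0xffff + 1 one-byte entries (64 KiB), not four. */
static const unsigned char prio_map[] = {
	[ST_ACTIVE]   = 3,	/* assumed priority */
	[ST_UNKNOWN]  = 2,	/* from the commit message */
	[ST_PF]       = 1,	/* assumed priority */
	[ST_INACTIVE] = 0,	/* assumed priority */
};

/* Same mapping as a switch statement: no table at all. */
static unsigned char prio_switch(enum trans_state state)
{
	switch (state) {
	case ST_ACTIVE:
		return 3;
	case ST_UNKNOWN:
		return 2;
	case ST_PF:
		return 1;
	default:	/* ST_INACTIVE */
		return 0;
	}
}
```

      Checking sizeof(prio_map) in this sketch gives 0x10000 bytes, while
      the switch version returns the same values without any table.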
    • l2tp: protect tunnel->del_work by ref_count · 06a15f51
      Alexander Couzens committed
      There is a small chance that tunnel_free() is called before
      tunnel->del_work is scheduled, resulting in a NULL pointer dereference.
      Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
      Acked-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06a15f51
    • sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · 635682a1
      Karl Heiss committed
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_lock_sock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
      Signed-off-by: Karl Heiss <kheiss@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      635682a1
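      The lock imbalance and the "save a copy of the socket" fix can be
      modelled in single-threaded C. This is only a sketch under stated
      assumptions: all toy_* names are hypothetical, a "lock" is reduced to
      a depth counter, and the racing accept is simulated by an in-line call
      so the outcome is deterministic.

```c
/* Toy stand-ins (hypothetical): a "lock" is just a depth counter here. */
struct toy_sock { int lock_depth; };
struct toy_assoc { struct toy_sock *sk; };

static void toy_lock(struct toy_sock *sk)   { sk->lock_depth++; }
static void toy_unlock(struct toy_sock *sk) { sk->lock_depth--; }

/* Models accept() migrating the association to a new socket. */
static void toy_migrate(struct toy_assoc *asoc, struct toy_sock *newsk)
{
	asoc->sk = newsk;
}

/* Buggy shape: re-reads asoc->sk at unlock time. */
static void timeout_event_buggy(struct toy_assoc *asoc, struct toy_sock *newsk)
{
	toy_lock(asoc->sk);
	toy_migrate(asoc, newsk);	/* stands in for the racing accept */
	toy_unlock(asoc->sk);		/* releases the wrong socket */
}

/* Fixed shape: save the socket once, then lock and unlock that copy. */
static void timeout_event_fixed(struct toy_assoc *asoc, struct toy_sock *newsk)
{
	struct toy_sock *sk = asoc->sk;

	toy_lock(sk);
	toy_migrate(asoc, newsk);
	toy_unlock(sk);
}
```

      In the buggy shape the listening socket is left locked and the new
      socket is unlocked without having been locked, which mirrors the
      "bad unlock balance" report. The fixed shape leaves both balanced.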
    • sctp: Whitespace fix · f05940e6
      Karl Heiss committed
      Fix indentation in sctp_generate_heartbeat_event.
      Signed-off-by: Karl Heiss <kheiss@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f05940e6
  4. 28 Sep, 2015 1 commit
  5. 26 Sep, 2015 1 commit
    • net: Fix panic in icmp_route_lookup · bdb06cbf
      David Ahern committed
      Andrey reported a panic:
      
      [ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
      [ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
      [ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
      [ 7249.865637] Oops: 0000 [#1]
      ...
      [ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-999-generic #201509220155
      [ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
      [ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
      [ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
      [ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
      [ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
      [ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
      [ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
      [ 7249.867141] Stack:
      [ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00 f3920240 f0c69180
      [ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000 e659266c f483ba54
      [ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0 c16b0e00 00000064
      [ 7249.867448] Call Trace:
      [ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
      [ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
      [ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
      [ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
      [ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
      [ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
      [ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0 [nf_conntrack]
      [ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
      [ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
      [ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
      [ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
      [ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
      [ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
      [ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
      [ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
      [ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
      [ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
      [ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
      [ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
      [ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
      [ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
      [ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
      [ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
      [ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
      [ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
      ...
      
      Prior to the VRF change the oif was not set in the flow struct, so the
      VRF support should really have only added the vrf_master_ifindex lookup.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Cc: Andrey Melnikov <temnota.am@gmail.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bdb06cbf
  6. 25 Sep, 2015 10 commits
    • xprtrdma: Replace global lkey with lkey local to PD · bb6c96d7
      Chuck Lever committed
      The core API has changed so that devices that do not have a global
      DMA lkey automatically create an mr, per-PD, and make that lkey
      available. The global DMA lkey interface is going away in favor of
      the per-PD DMA lkey.
      
      The per-PD DMA lkey is always available. Convert xprtrdma to use the
      device's per-PD DMA lkey for regbufs, no matter which memory
      registration scheme is in use.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Cc: linux-nfs <linux-nfs@vger.kernel.org>
      Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      bb6c96d7
    • net: fix net_device refcounting · 9861f720
      Russell King committed
      of_find_net_device_by_node() uses class_find_device() internally to
      look up the corresponding network device.  class_find_device() returns
      a reference to the embedded struct device, with its refcount
      incremented.
      
      Add a comment to the definition in net/core/net-sysfs.c indicating the
      need to drop this refcount, and fix the DSA code to drop this refcount
      when the OF-generated platform data is cleaned up and freed.  Also
      arrange for the ref to be dropped when handling errors.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9861f720
    • net: dsa: fix of_mdio_find_bus() device refcount leak · e496ae69
      Russell King committed
      Current users of of_mdio_find_bus() leak a struct device refcount, as
      they fail to clean up the reference obtained inside class_find_device().
      
      Fix the DSA code to properly refcount the returned MDIO bus by:
      1. taking a reference on the struct device whenever we assign it to
         pd->chip[x].host_dev.
      2. dropping the reference when we overwrite the existing reference.
      3. dropping the reference when we free the data structure.
      4. dropping the initial reference we obtained after setting up the
         platform data structure, or on failure.
      
      In step 2 above, where we obtain a new MDIO bus, there is no need to
      take a reference on it as we would only have to drop it immediately
      after assignment again, iow:
      
      	put_device(cd->host_dev);	/* drop original assignment ref */
      	cd->host_dev = get_device(&mdio_bus_switch->dev); /* get our ref */
      	put_device(&mdio_bus_switch->dev); /* drop of_mdio_find_bus ref */
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e496ae69
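      The three-line get/put sequence quoted in the message can be traced
      with a toy reference counter. This is a sketch only: struct toy_dev
      and the toy_* helpers are hypothetical stand-ins for struct device,
      get_device() and put_device(), and toy_find_bus() merely models a
      lookup that returns its result with a reference already held.

```c
/* Toy refcounted device (hypothetical stand-in for struct device). */
struct toy_dev { int refs; };

static struct toy_dev *toy_get(struct toy_dev *d) { d->refs++; return d; }
static void toy_put(struct toy_dev *d) { d->refs--; }

/* Models a lookup that returns its result with a reference held. */
static struct toy_dev *toy_find_bus(struct toy_dev *bus)
{
	return toy_get(bus);
}

struct toy_chip { struct toy_dev *host_dev; };

/* Swap the chip's host device for a freshly looked-up bus. */
static void toy_switch_bus(struct toy_chip *cd, struct toy_dev *bus)
{
	struct toy_dev *found = toy_find_bus(bus);	/* +1 on bus */

	toy_put(cd->host_dev);		/* drop original assignment ref */
	cd->host_dev = toy_get(found);	/* get our ref */
	toy_put(found);			/* drop the lookup's ref */
}
```

      After the swap, the old device has lost exactly the reference the
      chip held, and the new bus has gained exactly one, owned by the chip.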
    • ip6_tunnel: Reduce log level in ip6_tnl_err() to debug · 17a10c92
      Matt Bennett committed
      Currently error log messages in ip6_tnl_err are printed at 'warn'
      level. This is different to other tunnel types which don't print
      any messages. These log messages don't provide any information that
      couldn't be deduced with networking tools. Also it can be annoying
      to have one end of the tunnel go down and have the logs fill with
      pointless messages such as "Path to destination invalid or inactive!".
      
      This patch reduces the log level of these messages to 'dbg' level to
      bring the visible behaviour into line with other tunnel types.
      Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17a10c92
    • ip6_gre: Reduce log level in ip6gre_err() to debug · a46496ce
      Matt Bennett committed
      Currently error log messages in ip6gre_err are printed at 'warn'
      level. This is different to most other tunnel types which don't
      print any messages. These log messages don't provide any information
      that couldn't be deduced with networking tools. Also it can be annoying
      to have one end of the tunnel go down and have the logs fill with
      pointless messages such as "Path to destination invalid or inactive!".
      
      This patch reduces the log level of these messages to 'dbg' level to
      bring the visible behaviour into line with other tunnel types.
      Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a46496ce
    • fib_rules: fix fib rule dumps across multiple skbs · 41fc0143
      Wilson Kok committed
      dump_rules returns skb length and not error.
      But when family == AF_UNSPEC, the caller of dump_rules
      assumes that it returns an error. Hence, when family == AF_UNSPEC,
      we continue trying to dump on -EMSGSIZE errors resulting in
      incorrect dump idx carried between skbs belonging to the same dump.
      This results in fib rule dump always only dumping rules that fit
      into the first skb.
      
      This patch fixes dump_rules to return error so that we exit correctly
      and idx is correctly maintained between skbs that are part of the
      same dump.
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41fc0143
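      The contract described above (return an error when the buffer is
      full, and keep the resume index accurate between buffers) can be
      sketched in user-space C. Everything here is hypothetical: the toy_*
      names, the fixed buffer capacity, and the integer "rules" are
      illustration only and do not reflect the kernel's data structures.

```c
#include <errno.h>

#define TOY_SKB_ROOM 3		/* hypothetical: rules that fit in one buffer */

struct toy_skb { int used; int rules[TOY_SKB_ROOM]; };
struct toy_cb  { int idx; };	/* resume position carried between buffers */

/* Dump rules starting at cb->idx. Returns 0 when everything has been
 * dumped, or -EMSGSIZE when the buffer filled up first. cb->idx is
 * always left at the first rule that has not been written yet. */
static int toy_dump_rules(struct toy_skb *skb, struct toy_cb *cb,
			  const int *rules, int nrules)
{
	int idx;
	int err = 0;

	for (idx = cb->idx; idx < nrules; idx++) {
		if (skb->used == TOY_SKB_ROOM) {
			err = -EMSGSIZE;
			break;
		}
		skb->rules[skb->used++] = rules[idx];
	}
	cb->idx = idx;
	return err;
}

/* Caller: keep handing out fresh buffers until the dump reports success.
 * Copies everything dumped into out[] and returns the number of rules. */
static int toy_dump_all(const int *rules, int nrules, int *out)
{
	struct toy_cb cb = { 0 };
	int n = 0, err;

	do {
		struct toy_skb skb = { 0 };

		err = toy_dump_rules(&skb, &cb, rules, nrules);
		for (int i = 0; i < skb.used; i++)
			out[n++] = skb.rules[i];
	} while (err == -EMSGSIZE);

	return n;
}
```

      Because the dump function reports "buffer full" as an error rather
      than a length, the caller can tell when to stop, and every rule is
      emitted exactly once across buffers.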
    • net: revert "net_sched: move tp->root allocation into fw_init()" · d8aecb10
      WANG Cong committed
      fw filter uses tp->root==NULL to check if it is the old method,
      so it doesn't need allocation at all in this case. This patch
      reverts the offending commit and adds some comments for old
      method to make it obvious.
      
      Fixes: 33f8b9ec ("net_sched: move tp->root allocation into fw_init()")
      Reported-by: Akshat Kakkar <akshat.1984@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8aecb10
    • lwtunnel: remove source and destination UDP port config option · b194f30c
      Jiri Benc committed
      The UDP tunnel config is asymmetric with respect to the ports used. The source and
      destination ports from one direction of the tunnel are not related to the
      ports of the other direction. We need to be able to respond to ARP requests
      using the correct ports without involving routing.
      
      As a consequence, UDP ports need to be a fixed property of the tunnel
      interface and cannot be set per route. Remove the ability to set ports per
      route. This is still okay to do, as no kernel has been released with these
      attributes yet.
      
      Note that the ability to specify source and destination ports is preserved
      for other users of the lwtunnel API which don't use routes for tunnel key
      specification (like openvswitch).
      
      If in the future we rework ARP handling to allow port specification, the
      attributes can be added back.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b194f30c
    • ipv4: send arp replies to the correct tunnel · 63d008a4
      Jiri Benc committed
      When using ip lwtunnels, the additional data for xmit (basically, the actual
      tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
      metadata dst. When replying to ARP requests, we need to send the reply to
      the same tunnel the request came from. This means we need to construct
      proper metadata dst for ARP replies.
      
      We could perform another route lookup to get a dst entry with the correct
      lwtstate. However, this won't always ensure that the outgoing tunnel is the
      same as the incoming one, and it won't work anyway for IPv4 duplicate
      address detection.
      
      The only thing to do is to "reverse" the ip_tunnel_info.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63d008a4
    • netlink: Replace rhash_portid with bound · da314c99
      Herbert Xu committed
      On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
      >
      > store_release and load_acquire are different from the usual memory
      > barriers and can't be paired this way.  You have to pair store_release
      > and load_acquire.  Besides, it isn't a particularly good idea to
      
      OK I've decided to drop the acquire/release helpers as they don't
      help us at all and simply pessimise the code by using full memory
      barriers (on some architectures) where only a write or read barrier
      is needed.
      
      > depend on memory barriers embedded in other data structures like the
      > above.  Here, especially, rhashtable_insert() would have write barrier
      > *before* the entry is hashed not necessarily *after*, which means that
      > in the above case, a socket which appears to have set bound to a
      > reader might not be visible when the reader tries to look up the socket
      > on the hashtable.
      
      But you are right we do need an explicit write barrier here to
      ensure that the hashing is visible.
      
      > There's no reason to be overly smart here.  This isn't a crazy hot
      > path, write barriers tend to be very cheap, store_release more so.
      > Please just do smp_store_release() and note what it's paired with.
      
      It's not about being overly smart.  It's about actually understanding
      what's going on with the code.  I've seen too many instances of
      people simply sprinkling synchronisation primitives around without
      any knowledge of what is happening underneath, which is just a recipe
      for creating hard-to-debug races.
      
      > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
      > >  		}
      > >  	}
      > >
      > > -	if (!nlk->portid) {
      > > +	if (!nlk->bound) {
      >
      > I don't think you can skip load_acquire here just because this is the
      > second deref of the variable.  That doesn't change anything.  Race
      > condition could still happen between the first and second tests and
      > skipping the second would lead to the same kind of bug.
      
      The reason this one is OK is because we do not use nlk->portid or
      try to get nlk from the hash table before we return to user-space.
      
      However, there is a real bug here that none of these acquire/release
      helpers discovered.  The two bound tests here used to be a single
      one.  Now that they are separate it is entirely possible for another
      thread to come in the middle and bind the socket.  So we need to
      repeat the portid check in order to maintain consistency.
      
      > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
      > >  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
      > >  		return -EPERM;
      > >
      > > -	if (!nlk->portid)
      > > +	if (!nlk->bound)
      >
      > Don't we need load_acquire here too?  Is this path holding a lock
      > which makes that unnecessary?
      
      Ditto.
      
      ---8<---
      Commit 1f770c0a ("netlink: Fix autobind race condition that leads
      to zero port ID") created some new races that can occur due to
      inconsistencies between the two port IDs.
      
      Tejun is right that a barrier is unavoidable.  Therefore I am
      reverting to the original patch that used a boolean to indicate
      that a user netlink socket has been bound.
      
      Barriers have been added where necessary to ensure that a valid
      portid and the hashed socket is visible.
      
      I have also changed netlink_insert to only return EBUSY if the
      socket is bound to a portid different to the requested one.  This
      combined with only reading nlk->bound once in netlink_bind fixes
      a race where two threads that bind the socket at the same time
      with different port IDs may both succeed.
      
      Fixes: 1f770c0a ("netlink: Fix autobind race condition that leads to zero port ID")
      Reported-by: Tejun Heo <tj@kernel.org>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Nacked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da314c99
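      The "set the data, then a write barrier, then the flag" pattern
      discussed in this message has a user-space analogue in C11 fences.
      This is a sketch under assumptions, not the netlink code: the
      toy_* names are hypothetical, and C11 release/acquire fences are used
      here as a stand-in for the kernel's write and read barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_nlsock {
	uint32_t portid;	/* plain data, published via 'bound' */
	atomic_bool bound;	/* set last, checked first */
};

/* Writer: fill in portid, then publish. The release fence plays the
 * role of a write barrier between the two stores. */
static void toy_bind(struct toy_nlsock *nlk, uint32_t portid)
{
	nlk->portid = portid;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&nlk->bound, true, memory_order_relaxed);
}

/* Reader: check the flag, then fence, then read the data. Returns
 * false if the socket is not bound yet. */
static bool toy_get_portid(struct toy_nlsock *nlk, uint32_t *portid)
{
	if (!atomic_load_explicit(&nlk->bound, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);
	*portid = nlk->portid;
	return true;
}
```

      A reader that observes bound as true is then guaranteed to see the
      portid written before it. The sketch does not model the separate
      fix for two threads binding the same socket concurrently.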
  7. 24 Sep, 2015 3 commits
    • Fix AF_PACKET ABI breakage in 4.2 · d3869efe
      David Woodhouse committed
      Commit 7d824109 ("virtio: add explicit big-endian support to memory
      accessors") accidentally changed the virtio_net header used by
      AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
      
      Since virtio_legacy_is_little_endian() is a very long identifier,
      define a vio_le macro and use that throughout the code instead of the
      hard-coded 'false' for little-endian.
      
      This restores the ABI to match 4.1 and earlier kernels, and makes my
      test program work again.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3869efe
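      The difference between a hard-coded big-endian decode and a
      host-endian decode can be shown with a 16-bit field. This is a
      minimal user-space sketch: host_is_little_endian(), hdr16_decode()
      and hdr16_decode_host() are hypothetical helpers for illustration
      and are not the kernel's virtio accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the machine running this code stores integers LSB first. */
static bool host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/* Decode a 16-bit header field from its two bytes in memory. */
static uint16_t hdr16_decode(bool little_endian, const uint8_t bytes[2])
{
	if (little_endian)
		return (uint16_t)(bytes[0] | (bytes[1] << 8));
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* Host-endian decode: pass the host's endianness instead of a
 * hard-coded 'false', which would always mean big-endian. */
static uint16_t hdr16_decode_host(const uint8_t bytes[2])
{
	return hdr16_decode(host_is_little_endian(), bytes);
}
```

      On a little-endian host, a hard-coded 'false' decodes the bytes
      0x34 0x12 as 0x3412, while the host-endian helper yields 0x1234.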
    • netpoll: Close race condition between poll_one_napi and napi_disable · 2d8bff12
      Neil Horman committed
      Drivers might call napi_disable while not holding the napi instance poll_lock.
      In those instances, it's possible for a race condition to exist between
      poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
      NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
      the following may happen:
      
      CPU0				CPU1
      ndo_tx_timeout			napi_poll_dev
       napi_disable			 poll_one_napi
        test_and_set_bit (ret 0)
      				  test_bit (ret 1)
         reset adapter		   napi_poll_routine
      
      If the adapter gets a tx timeout without a napi instance scheduled, it's possible
      for the adapter to think it has exclusive access to the hardware  (as the napi
      instance is now scheduled via the napi_disable call), while the netpoll code
      thinks there is simply work to do.  The result is parallel hardware access
      leading to corrupt data structures in the driver, and a crash.
      
      Additionally, there is another, more critical race between netpoll and
      napi_disable.  The disabled napi state is actually identical to the scheduled
      state for a given napi instance.  The implication being that, if a napi instance
      is disabled, a netconsole instance would see the napi state of the device as
      having been scheduled, and poll it, likely while the driver was doing something
      requiring exclusive access.  In the case above, it's fairly clear that not having
      the rings in a state ready to be polled will cause any number of crashes.
      
      The fix should be pretty easy.  netpoll uses its own bit to indicate that
      the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
      We can just gate disabling on that bit as well as the sched bit.  That should
      prevent netpoll from conducting a napi poll if we convert its set bit to a
      test_and_set_bit operation to provide mutual exclusion.
      
      Change notes:
      V2)
      	Remove a trailing whitespace
      	Resubmit with proper subject prefix
      
      V3)
      	Clean up spacing nits
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: jmaxwell@redhat.com
      Tested-by: jmaxwell@redhat.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d8bff12
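      The mutual exclusion idea (gate disabling on the netpoll service
      bit as well as the sched bit, and have the poller claim that bit with
      test-and-set) can be sketched with C11 atomics. All names and bit
      numbers below are hypothetical, and toy_try_disable() is a
      simplified non-blocking stand-in rather than the kernel's
      napi_disable().

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { TOY_SCHED = 0, TOY_NPSVC = 1 };	/* hypothetical bit numbers */

static bool toy_test_bit(int bit, atomic_ulong *word)
{
	return (atomic_load(word) >> bit) & 1UL;
}

/* Atomically set the bit and report whether it was already set. */
static bool toy_test_and_set_bit(int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

static void toy_clear_bit(int bit, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << bit));
}

/* Non-blocking stand-in for disabling: succeed only if both bits were
 * clear, and leave both set so pollers stay out. */
static bool toy_try_disable(atomic_ulong *state)
{
	if (toy_test_and_set_bit(TOY_SCHED, state))
		return false;
	if (toy_test_and_set_bit(TOY_NPSVC, state)) {
		toy_clear_bit(TOY_SCHED, state);
		return false;
	}
	return true;
}

/* Poll attempt from the netpoll side: claim NPSVC with test-and-set,
 * and back off if someone else (a disable, say) already holds it. */
static bool toy_netpoll_poll(atomic_ulong *state, int *polls)
{
	if (!toy_test_bit(TOY_SCHED, state))
		return false;		/* no work scheduled */
	if (toy_test_and_set_bit(TOY_NPSVC, state))
		return false;		/* excluded by the current holder */
	(*polls)++;			/* stands in for the driver's poll routine */
	toy_clear_bit(TOY_NPSVC, state);
	return true;
}
```

      Once the disable path holds both bits, a poller that sees the
      sched bit set still fails its test-and-set on the service bit and
      leaves the hardware alone.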
    • tcp: add proper TS val into RST packets · 675ee231
      Eric Dumazet committed
      RST packets sent on behalf of TCP connections with TS option (RFC 7323
      TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
      
      A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0
      B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0
      A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0
      
      B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0
      
      We need to call skb_mstamp_get() to get proper TS val,
      derived from skb->skb_mstamp
      
      Note that RFC 1323 advocated not sending the TS option in RST segments,
      but RFC 7323 recommends the opposite:
      
        Once TSopt has been successfully negotiated, that is both <SYN> and
        <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
        segment for the duration of the connection, and SHOULD be sent in an
        <RST> segment (see Section 5.2 for details)
      
      Note that this RFC recommends sending TS val = 0, but we believe that is
      premature: we do not know whether all TCP stacks handle the receive
      side properly:
      
         When an <RST> segment is
         received, it MUST NOT be subjected to the PAWS check by verifying an
         acceptable value in SEG.TSval, and information from the Timestamps
         option MUST NOT be used to update connection state information.
         SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
      
      In 5 years, if/when all TCP stacks are RFC 7323 ready, we might consider
      sending TS val = 0, if it buys something.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      675ee231
  8. 23 Sep, 2015 3 commits
    • net: dsa: Fix Marvell Egress Trailer check · fbd03513
      Neil Armstrong committed
      The Marvell Egress rx trailer check must be fixed to
      correctly detect bad bits in the third byte of the
      Egress trailer, as described in Table 28 of the
      88E6060 datasheet.
      The current code fails to check the third byte and
      checks the fourth byte twice.
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbd03513
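      The shape of this bug (one byte tested twice, its neighbour never)
      is easy to see side by side. The sketch below is illustrative only:
      the function names are hypothetical, and the expected first byte and
      the bit masks are assumptions chosen for the example, not values
      taken from the datasheet or the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real ones come from the datasheet. */
#define T0_EXPECTED 0x80
#define T1_BAD_BITS 0xf8
#define T2_BAD_BITS 0xef

/* Buggy shape: the fourth byte (index 3) is tested twice and the
 * third byte (index 2) is never looked at. */
static bool trailer_ok_buggy(const uint8_t t[4])
{
	return t[0] == T0_EXPECTED && (t[1] & T1_BAD_BITS) == 0 &&
	       (t[3] & T2_BAD_BITS) == 0 && t[3] == 0;
}

/* Fixed shape: each of the four bytes is tested exactly once. */
static bool trailer_ok_fixed(const uint8_t t[4])
{
	return t[0] == T0_EXPECTED && (t[1] & T1_BAD_BITS) == 0 &&
	       (t[2] & T2_BAD_BITS) == 0 && t[3] == 0;
}
```

      With these assumed masks, a trailer whose third byte carries a bad
      bit passes the buggy check and is rejected by the fixed one.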
    • openvswitch: Zero flows on allocation. · ae5f2fb1
      Jesse Gross committed
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case as the mask optimizations were
      really targeting per-packet flow operations.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae5f2fb1
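      The leak described above comes from skipping bytes whose mask is
      zero. A minimal sketch of the two masking shapes, with hypothetical
      function names and a plain byte array standing in for a flow key:

```c
#include <stddef.h>
#include <stdint.h>

/* Optimized shape: only touch bytes covered by a non-zero mask and
 * leave the rest of dst as it was (possibly uninitialized). */
static void toy_mask_partial(uint8_t *dst, const uint8_t *src,
			     const uint8_t *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (mask[i])
			dst[i] = src[i] & mask[i];
}

/* Full shape for flows that get installed: every byte of dst is
 * written, so masked-out bytes are zero rather than leftover memory. */
static void toy_mask_full(uint8_t *dst, const uint8_t *src,
			  const uint8_t *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] & mask[i];
}
```

      Both shapes match identically once the mask is applied, but only
      the full shape is safe to serialize, because the partial shape
      leaves whatever was previously in the destination buffer.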
    • userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4
      Andrea Arcangeli committed
      This reverts commit 51360155 and adapts
      fs/userfaultfd.c to use the old version of that function.
      
      It didn't look robust to call __wake_up_common with "nr == 1" when we
      absolutely require wakeall semantics, but we have full control of what we
      insert in the two waitqueue heads of the blocked userfaults.  There is no
      risk of an exclusive waitqueue being inserted into those two waitqueue
      heads, so we can just as well stick to the "nr == 1" of the old code and
      rely purely on the fact that no waitqueue inserted in either of the two
      waitqueue heads, which we must treat as wakeall, has wait->flags
      WQ_FLAG_EXCLUSIVE set.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac5be6b4
  9. 22 Sep, 2015 4 commits
    • mac80211: reset CQM history upon reconfiguration · babc305e
      Sara Sharon committed
      The current behavior of notifying CQM events is inconsistent:
      Upon first configuration there is a cqm event with the current
      status according to threshold configured, regardless of signal
      stability.
      When there is reconfiguration no event is sent unless there is
      a significant change to the signal level according to the new
      configuration.
      
      Since the current reconfiguration behavior might cause missing
      CQM events in case the current signal did not change but is on
      the other side of the new threshold, fix that by resetting the
      stored signal level upon reconfiguration.
      Signed-off-by: Sara Sharon <sara.sharon@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      babc305e
    • mac80211: fix VHT MCS mask array overrun · 2df1b131
      Johannes Berg committed
      The HT MCS mask has 9 bytes, the VHT one only has 8 streams.
      Split the loops to handle this correctly.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      2df1b131
    • inet: fix races in reqsk_queue_hash_req() · 29c68526
      Eric Dumazet committed
      Before allowing lockless LISTEN processing, we need to make
      sure to arm the SYN_RECV timer before the req socket is visible
      in hash tables.
      
      Also, req->rsk_hash should be written before we set rsk_refcnt
      to a non zero value.
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ying Cai <ycai@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29c68526
    • tcp/dccp: fix timewait races in timer handling · ed2e9239
      Eric Dumazet committed
      When creating a timewait socket, we need to arm the timer before
      allowing other cpus to find it. The signal allowing cpus to find
      the socket is setting tw_refcnt to non zero value.
      
      As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
      call inet_twsk_schedule() first.
      
      This also means we need to remove tw_refcnt changes from
      inet_twsk_schedule() and let the caller handle it.
      
      Note that because we use mod_timer_pinned(), we have the guarantee
      that the timer won't expire before we set tw_refcnt, as we run in BH context.
      
      To make things more readable I introduced inet_twsk_reschedule() helper.
      
      When rearming the timer, we can use mod_timer_pending() to make sure
      we do not rearm a canceled timer.
      
      Note: This bug can possibly trigger if packets of a flow can hit
      multiple cpus. This does not normally happen, unless flow steering
      is broken somehow. This explains why this bug was spotted ~5 months after
      its introduction.
      
      A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
      but will be provided in a separate patch for proper tracking.
      
      Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Ying Cai <ycai@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed2e9239
  10. 21 Sep, 2015 3 commits