1. 20 11月, 2013 7 次提交
    • J
      quota/genetlink: use proper genetlink multicast APIs · 2ecf7536
      Johannes Berg 提交于
      The quota code is abusing the genetlink API and is using
      its family ID as the multicast group ID, which is invalid
      and may belong to somebody else (and likely will.)
      
      Make the quota code use the correct API, but since this
      is already used as-is by userspace, reserve a family ID
      for this code and also reserve that group ID to not break
      userspace assumptions.
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ecf7536
    • J
      drop_monitor/genetlink: use proper genetlink multicast APIs · e5dcecba
      Johannes Berg 提交于
      The drop monitor code is abusing the genetlink API and is
      statically using the generic netlink multicast group 1, even
      if that group belongs to somebody else (which it invariably
      will, since it's not reserved.)
      
      Make the drop monitor code use the proper APIs to reserve a
      group ID, but also reserve the group id 1 in generic netlink
      code to preserve the userspace API. Since drop monitor can
      be a module, don't clear the bit for it on unregistration.
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5dcecba
    • J
      genetlink: only pass array to genl_register_family_with_ops() · c53ed742
      Johannes Berg 提交于
      As suggested by David Miller, make genl_register_family_with_ops()
      a macro and pass only the array, evaluating ARRAY_SIZE() in the
      macro, this is a little safer.
      
      The openvswitch has some indirection, assing ops/n_ops directly in
      that code. This might ultimately just assign the pointers in the
      family initializations, saving the struct genl_family_and_ops and
      code (once mcast groups are handled differently.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c53ed742
    • A
      tcp: don't update snd_nxt, when a socket is switched from repair mode · dbde4979
      Andrey Vagin 提交于
      snd_nxt must be updated synchronously with sk_send_head.  Otherwise
      tp->packets_out may be updated incorrectly, what may bring a kernel panic.
      
      Here is a kernel panic from my host.
      [  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
      ...
      [  146.301158] Call Trace:
      [  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0
      
      Before this panic a tcp socket was restored. This socket had sent and
      unsent data in the write queue. Sent data was restored in repair mode,
      then the socket was switched from reapair mode and unsent data was
      restored. After that the socket was switched back into repair mode.
      
      In that moment we had a socket where write queue looks like this:
      snd_una    snd_nxt   write_seq
         |_________|________|
                   |
      	  sk_send_head
      
      After a second switching from repair mode the state of socket was
      changed:
      
      snd_una          snd_nxt, write_seq
         |_________ ________|
                   |
      	  sk_send_head
      
      This state is inconsistent, because snd_nxt and sk_send_head are not
      synchronized.
      
      Bellow you can find a call trace, how packets_out can be incremented
      twice for one skb, if snd_nxt and sk_send_head are not synchronized.
      In this case packets_out will be always positive, even when
      sk_write_queue is empty.
      
      tcp_write_wakeup
      	skb = tcp_send_head(sk);
      	tcp_fragment
      		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
      			tcp_adjust_pcount(sk, skb, diff);
      	tcp_event_new_data_sent
      		tp->packets_out += tcp_skb_pcount(skb);
      
      I think update of snd_nxt isn't required, when a socket is switched from
      repair mode.  Because it's initialized in tcp_connect_init. Then when a
      write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
      so it's always is in consistent state.
      
      I have checked, that the bug is not reproduced with this patch and
      all tests about restoring tcp connections work fine.
      
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbde4979
    • Y
      atm: idt77252: fix dev refcnt leak · b5de4a22
      Ying Xue 提交于
      init_card() calls dev_get_by_name() to get a network deceive. But it
      doesn't decrease network device reference count after the device is
      used.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5de4a22
    • F
      xfrm: Release dst if this dst is improper for vti tunnel · 236c9f84
      fan.du 提交于
      After searching rt by the vti tunnel dst/src parameter,
      if this rt has neither attached to any transformation
      nor the transformation is not tunnel oriented, this rt
      should be released back to ip layer.
      
      otherwise causing dst memory leakage.
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      236c9f84
    • J
      netlink: fix documentation typo in netlink_set_err() · 840e93f2
      Johannes Berg 提交于
      The parameter is just 'group', not 'groups', fix the documentation typo.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      840e93f2
  2. 19 11月, 2013 14 次提交
  3. 16 11月, 2013 10 次提交
  4. 15 11月, 2013 9 次提交
    • J
      6lowpan: Uncompression of traffic class field was incorrect · 1188f054
      Jukka Rissanen 提交于
      If priority/traffic class field in IPv6 header is set (seen when
      using ssh), the uncompression sets the TC and Flow fields incorrectly.
      
      Example:
      
      This is IPv6 header of a sent packet. Note the priority/TC (=1) in
      the first byte.
      
      00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
      00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
      00000020: 02 1e ab ff fe 4c 52 57
      
      This gets compressed like this in the sending side
      
      00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
      00000010: aa 2d fe 92 86 4e be c6 ....
      
      In the receiving end, the packet gets uncompressed to this
      IPv6 header
      
      00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
      00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
      00000020: ab ff fe 4c 52 57 ec c2
      
      First four bytes are set incorrectly and we have also lost
      two bytes from destination address.
      
      The fix is to switch the case values in switch statement
      when checking the TC field.
      Signed-off-by: NJukka Rissanen <jukka.rissanen@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1188f054
    • E
      tipc: fix dereference before check warning · 3db0a197
      Erik Hugne 提交于
      This fixes the following Smatch warning:
      net/tipc/link.c:2364 tipc_link_recv_fragment()
          warn: variable dereferenced before check '*head' (see line 2361)
      
      A null pointer might be passed to skb_try_coalesce if
      a malicious sender injects orphan fragments on a link.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3db0a197
    • E
      ipv4: fix possible seqlock deadlock · c9e90429
      Eric Dumazet 提交于
      ip4_datagram_connect() being called from process context,
      it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
      otherwise we can deadlock on 32bit arches, or get corruptions of
      SNMP counters.
      
      Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9e90429
    • G
      net/hsr: Fix possible leak in 'hsr_get_node_status()' · 84a035f6
      Geyslan G. Bem 提交于
      If 'hsr_get_node_data()' returns error, going directly to 'fail' label
      doesn't free the memory pointed by 'skb_out'.
      Signed-off-by: NGeyslan G. Bem <geyslan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84a035f6
    • D
      Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 8422d1f1
      David S. Miller 提交于
      John W. Linville says:
      
      ====================
      pull request: wireless 2013-11-14
      
      Please pull this batch of fixes intended for the 3.13 stream!
      
      Amitkumar Karwar offers a quartet of mwifiex fixes, including an
      endian fix and three fixes for invalid memory access.
      
      Avinash Patil trims the packet length value for packets received from
      an SDIO interface.
      
      Colin Ian King fixes a NULL pointer dereference in the rtlwifi
      efuse code.
      
      Dan Carpenter cleans-up an mwifiex integer underflow, a potential
      libertas oops, a memory corrupion bug in wcn36xx, and a locking issue
      also in wcn36xx.
      
      Dan Williams helps prism54 devices to avoid being misclassified as
      Ethernet devices.
      
      Felipe Pena fixes a couple of typo errors, one in rt2x00 and the
      other in rtlwifi.
      
      Janusz Dziedzic corrects a pair of DFS-related problems in ath9k.
      
      Larry Finger patches three rtlwifi drivers to correctly report signal
      strength even for an unassociated AP.
      
      Mark Cave-Ayland rewrites some endian-illiterate packet type extraction
      code in rtlwifi.
      
      Stanislaw Gruszka addresses an rt2x00 regression related to setting
      HT station WCID and AMPDU density parameters.
      
      Sujith Manoharan corrects the initvals settings for AR9485.
      
      Ujjal Roy patches an obscure bit of code in mwifiex that was using
      the wrong definition of eth_hdr when briding patches in AP mode.
      
      Wei Yongjun fixes a couple of bugs: one is a return code handling
      bug in libertas; and, the other is a locking issue in wcn36xx.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8422d1f1
    • M
      virtio-net: mergeable buffer size should include virtio-net header · 5061de36
      Michael Dalton 提交于
      Commit 2613af0e ("virtio_net: migrate mergeable rx buffers to page
      frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
      to MTU-size. However, the merge buffer size does not take into account the
      size of the virtio-net header. Consequently, packets that are MTU-size
      will take two buffers intead of one (to store the virtio-net header),
      substantially decreasing the throughput of MTU-size traffic due to TCP
      window / SKB truesize effects.
      
      This commit changes the mergeable buffer size to include the virtio-net
      header. The buffer size is cacheline-aligned because skb_page_frag_refill
      will not automatically align the requested size.
      
      Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
      between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
      vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
      cpuset, using cgroups to ensure that other processes in the system will not
      be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
      buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
      force MTU-sized packets on the receiver.
      
      next-net trunk before 2613af0e (PAGE_SIZE buf): 3861.08Gb/s
      net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
      net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
      net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s
      Suggested-by: NEric Northup <digitaleric@google.com>
      Signed-off-by: NMichael Dalton <mwdalton@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5061de36
    • C
      connector: improved unaligned access error fix · 1ca1a4cf
      Chris Metcalf 提交于
      In af3e095a, Erik Jacobsen fixed one type of unaligned access
      bug for ia64 by converting a 64-bit write to use put_unaligned().
      Unfortunately, since gcc will convert a short memset() to a series
      of appropriately-aligned stores, the problem is now visible again
      on tilegx, where the memset that zeros out proc_event is converted
      to three 64-bit stores, causing an unaligned access panic.
      
      A better fix for the original problem is to ensure that proc_event
      is aligned to 8 bytes here.  We can do that relatively easily by
      arranging to start the struct cn_msg aligned to 8 bytes and then
      offset by 4 bytes.  Doing so means that the immediately following
      proc_event structure is then correctly aligned to 8 bytes.
      
      The result is that the memset() stores are now aligned, and as an
      added benefit, we can remove the put_unaligned() calls in the code.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ca1a4cf
    • M
      pkt_sched: fq: change classification of control packets · 2abc2f07
      Maciej Żenczykowski 提交于
      Initial sch_fq implementation copied code from pfifo_fast to classify
      a packet as a high prio packet.
      
      This clashes with setups using PRIO with say 7 bands, as one of the
      band could be incorrectly (mis)classified by FQ.
      
      Packets would be queued in the 'internal' queue, and no pacing ever
      happen for this special queue.
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2abc2f07
    • H
      alx: Reset phy speed after resume · b54629e2
      hahnjo 提交于
      This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
      After resuming some users got the following error flooding the kernel log:
      alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
      Signed-off-by: NJonas Hahnfeld <linux@hahnjo.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b54629e2