1. 08 Jun 2013: 1 commit
  2. 05 Jun 2013: 5 commits
  3. 03 Jun 2013: 2 commits
  4. 01 Jun 2013: 2 commits
  5. 29 May 2013: 1 commit
  6. 26 May 2013: 2 commits
    • net: ipv6: Add IPv6 support to the ping socket. · 6d0bfe22
      Authored by Lorenzo Colitti
      This adds the ability to send ICMPv6 echo requests without a
      raw socket. The equivalent ability for ICMPv4 was added in
      2011.
      
      Instead of having separate code paths for IPv4 and IPv6, make
      most of the code in net/ipv4/ping.c dual-stack and only add a
      few IPv6-specific bits (like the protocol definition) to a new
      net/ipv6/ping.c. Hopefully this will reduce divergence and/or
      duplication of bugs in the future.
      
      Caveats:
      
      - Setting options via ancillary data (e.g., using IPV6_PKTINFO
        to specify the outgoing interface) is not yet supported.
      - There are no separate security settings for IPv4 and IPv6;
        everything is controlled by /proc/net/ipv4/ping_group_range.
      - The proc interface does not yet display IPv6 ping sockets
        properly.
      
      Tested with a patched copy of ping6 and using raw socket calls.
      Compiles and works with all of CONFIG_IPV6={n,m,y}.
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d0bfe22
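      A minimal userspace sketch (not part of the patch) of what the new socket type enables:
      an unprivileged ICMPv6 echo request over a datagram socket. The caller's gid must fall
      inside the ping_group_range sysctl; destination, sequence number and error handling
      below are illustrative only.

      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <netinet/icmp6.h>

      int main(void)
      {
              /* IPPROTO_ICMPV6 datagram socket: no CAP_NET_RAW needed, but the
               * gid must be within /proc/sys/net/ipv4/ping_group_range */
              int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6);
              if (fd < 0) {
                      perror("socket");
                      return 1;
              }

              struct icmp6_hdr req;
              memset(&req, 0, sizeof(req));
              req.icmp6_type = ICMP6_ECHO_REQUEST; /* checksum and id filled in by the kernel */
              req.icmp6_seq = htons(1);

              struct sockaddr_in6 dst;
              memset(&dst, 0, sizeof(dst));
              dst.sin6_family = AF_INET6;
              inet_pton(AF_INET6, "::1", &dst.sin6_addr);

              if (sendto(fd, &req, sizeof(req), 0,
                         (struct sockaddr *)&dst, sizeof(dst)) < 0)
                      perror("sendto");

              char buf[1500];
              ssize_t n = recv(fd, buf, sizeof(buf), 0); /* echo reply, ICMPv6 header onwards */
              printf("received %zd bytes\n", n);
              close(fd);
              return 0;
      }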
    • ipvs: change type of netns_ipvs->sysctl_sync_qlen_max · 07995674
      Authored by Zhang Yanfei
      This member of struct netns_ipvs is calculated from nr_free_buffer_pages,
      so change its type to unsigned long to avoid overflow.  The type of the
      related proc variable sync_qlen_max and the return type of the function
      sysctl_sync_qlen_max() should be changed to unsigned long, too.

      The type of ipvs_master_sync_state->sync_queue_len should be changed to
      unsigned long accordingly.
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      07995674
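      A compact sketch of the type widening described above; the struct and accessor below
      are simplified stand-ins, and only the unsigned long types mirror the commit.

      /* was: int sysctl_sync_qlen_max; the value is derived from
       * nr_free_buffer_pages and may exceed INT_MAX on large machines */
      struct netns_ipvs_sketch {
              unsigned long sysctl_sync_qlen_max;
      };

      static unsigned long sysctl_sync_qlen_max(struct netns_ipvs_sketch *ipvs)
      {
              return ipvs->sysctl_sync_qlen_max;
      }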
  7. 23 May 2013: 2 commits
    • netfilter: {ipt,ebt}_ULOG: rise warning on deprecation · de94c459
      Authored by Pablo Neira Ayuso
      This target has been superseded by NFLOG. Emit a warning
      so we can prepare for removal in a couple of years.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
      de94c459
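      A hedged sketch of the kind of notice this adds; the hook where it is printed and
      the exact wording are assumptions, only the pr_warn_once() idea reflects the
      commit's intent.

      #include <linux/printk.h>

      static void ulog_deprecation_notice(void)
      {
              /* printed once per boot when an ULOG rule is installed */
              pr_warn_once("ULOG is deprecated and will be removed soon, "
                           "please use NFLOG instead\n");
      }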
    • netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6 · 2a7851bf
      Authored by Florian Westphal
      Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:
      
      [ ip6tables -m addrtype ]
      When I tried to use in the nat/PREROUTING it messes up the
      routing cache even if the rule didn't matched at all.
      [..]
      If I remove the --limit-iface-in from the non-working scenario, so just
      use the -m addrtype --dst-type LOCAL it works!
      
      This happens when LOCAL type matching is requested with --limit-iface-in,
      and the default ipv6 route is via the interface the packet we test
      arrived on.
      
      Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
      creates an unwanted cached entry, and the packet won't make it to the
      real/expected destination.
      
      Silently ignoring --limit-iface-in makes the routing work but it breaks
      rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
      match if the dst address is configured on the incoming interface;
      without --limit-iface-in it will match if the address is reachable
      via lo).
      
      The test should call ipv6_chk_addr() instead.  However, this would add
      a link-time dependency on ipv6.
      
      There are two possible solutions:
      
      1) Revert the commit that moved ipt_addrtype to xt_addrtype,
         and put ipv6 specific code into ip6t_addrtype.
      2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.
      
      While the former might seem preferable, Pablo pointed out that there
      are more xt modules with link-time dependency issues regarding ipv6,
      so let's go for 2).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2a7851bf
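      A hedged sketch of option 2): core netfilter holds an RCU-protected pointer to an
      ops table that the ipv6 code registers at init time, so callers like xt_addrtype
      can reach ipv6_chk_addr() without a hard link-time dependency. Struct, field and
      function names below are illustrative, not the exact upstream API.

      #include <linux/rcupdate.h>

      struct nf_ipv6_ops_sketch {
              int (*chk_addr)(struct net *net, const struct in6_addr *addr,
                              const struct net_device *dev, int strict);
      };

      static const struct nf_ipv6_ops_sketch __rcu *nf_ipv6_ops_ptr;

      /* called from ipv6 init: publish the function table */
      static void nf_register_ipv6_ops(const struct nf_ipv6_ops_sketch *ops)
      {
              rcu_assign_pointer(nf_ipv6_ops_ptr, ops);
      }

      /* callers hold rcu_read_lock() and must handle NULL:
       * ipv6 may be a module that is not loaded */
      static const struct nf_ipv6_ops_sketch *nf_get_ipv6_ops(void)
      {
              return rcu_dereference(nf_ipv6_ops_ptr);
      }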
  8. 21 May 2013: 2 commits
  9. 20 May 2013: 2 commits
    • tcp: remove bad timeout logic in fast recovery · 3e59cb0d
      Authored by Yuchung Cheng
      tcp_timeout_skb() was intended to trigger fast recovery on timeout;
      unfortunately, in reality it often causes spurious retransmission
      storms during fast recovery. The particular sign is a fast retransmit
      over the highest sacked sequence (SND.FACK).
      
      Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
      to avoid spurious timeout: when SND.UNA advances the sender re-arms
      RTO and extends the timeout by icsk_rto. The sender does not offset
      the time elapsed since the packet at SND.UNA was sent.
      
      But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
      tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
      sent before the icsk_rto interval as lost, including one that's above
      the highest sacked sequence. Most likely a large part of the
      scoreboard will be marked.
      
      If most packets are not lost then the subsequent DUPACKs with new SACK
      blocks will cause the sender to continue to retransmit packets beyond
      SND.FACK spuriously. Even if only one packet is lost the sender may
      falsely retransmit almost the entire window.
      
      The situation becomes common in the world of bufferbloat: the RTT
      continues to grow as the queue builds up but RTTVAR remains small and
      close to the minimum 200ms. If a data packet is lost and the DUPACK
      triggered by the next data packet is slightly delayed, then a spurious
      retransmission storm forms.
      
      As the original comment on tcp_timeout_skb() suggests: the usefulness
      of this feature is questionable. It also wastes cycles walking the
      sack scoreboard and is actually harmful because of false recovery.
      
      It's time to remove this.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e59cb0d
    • ipv6: add support of peer address · caeaba79
      Authored by Nicolas Dichtel
      This patch adds the support of peer address for IPv6. For example, it is
      possible to specify the remote end of a 6inY tunnel.
      This was already possible in IPv4:
       ip addr add ip1 peer ip2 dev dev1
      
      The peer address is specified with IFA_ADDRESS and the local address with
      IFA_LOCAL (as explained in include/uapi/linux/if_addr.h).
      Note that the API is not changed, because before this patch it was not
      possible to specify two different addresses in IFA_LOCAL and IFA_ADDRESS.
      There is a small change in the dump: if the peer is different from ::,
      IFA_ADDRESS will contain the peer address instead of the local address.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      caeaba79
  10. 17 May 2013: 1 commit
  11. 15 May 2013: 1 commit
  12. 12 May 2013: 1 commit
    • ipv6: do not clear pinet6 field · f77d6021
      Authored by Eric Dumazet
      We have seen multiple NULL dereferences in __inet6_lookup_established()
      
      After analysis, I found that inet6_sk() could be NULL while the
      check for sk_family == AF_INET6 was true.
      
      Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
      and TCP stacks.
      
      Once an IPv6 socket using SLAB_DESTROY_BY_RCU is inserted in a hash
      table, we can no longer clear the pinet6 field.

      This patch extends the logic used in commit fcbdf09d
      ("net: fix nulls list corruptions in sk_prot_alloc").

      The TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
      to make sure we do not clear the pinet6 field.

      At the socket clone phase, we do not really care, as cloning the parent's
      (non-NULL) pinet6 does not add a fatal race.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f77d6021
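      A hedged sketch of the per-protocol .clear_sk() approach described above: wipe the
      socket for reuse while leaving inet_sk(sk)->pinet6 intact, since RCU readers in
      __inet6_lookup_established() may still dereference it. Helper names follow the
      commit text; exact layout details are simplified.

      #include <net/sock.h>
      #include <net/inet_sock.h>

      static void ipv6_clear_sk_sketch(struct sock *sk, int size)
      {
              struct inet_sock *inet = inet_sk(sk);

              /* clear everything up to, but not including, pinet6 */
              sk_prot_clear_nulls(sk, offsetof(struct inet_sock, pinet6));

              /* then clear everything after pinet6 */
              size -= offsetof(struct inet_sock, pinet6) + sizeof(inet->pinet6);
              memset(&inet->pinet6 + 1, 0, size);
      }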
  13. 06 May 2013: 1 commit
    • net: frag, fix race conditions in LRU list maintenance · b56141ab
      Authored by Konstantin Khlebnikov
      This patch fixes a race between inet_frag_lru_move() and inet_frag_lru_add()
      which was introduced in commit 3ef0eb0d
      ("net: frag, move LRU list maintenance outside of rwlock").

      One cpu has already added a new fragment queue into the hash but not into
      the LRU. Another cpu finds it in the hash and tries to move it to the end
      of the LRU. This leads to a NULL pointer dereference inside list_move_tail().

      Another possible race condition is between inet_frag_lru_move() and
      inet_frag_lru_del(): the move can happen after the deletion.

      This patch initializes the LRU list head before adding the fragment into
      the hash, and inet_frag_lru_move() doesn't touch it if it's empty.

      I saw this kernel oops twice in a couple of days.
      
      [119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
      [119482.140221] Oops: 0000 [#1] SMP
      [119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
      [119482.152692] CPU 3
      [119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
      [119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
      [119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
      [119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
      [119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
      [119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
      [119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
      [119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
      [119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
      [119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
      [119482.223113] Stack:
      [119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
      [119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
      [119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
      [119482.243090] Call Trace:
      [119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
      [119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
      [119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
      [119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
      [119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
      [119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
      [119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
      [119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
      [119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
      [119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
      [119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
      [119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
      [119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
      [119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
      [119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
      [119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
      [119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
      [119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
      [119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
      [119482.348675]  RSP <ffff880216d5db58>
      [119482.353493] CR2: 0000000000000000
      
      Oops happened on this path:
      ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b56141ab
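      A hedged sketch of the two-part fix: the LRU hook is initialized before the queue
      becomes visible in the hash, and the mover skips entries that are not (or no
      longer) on the LRU. Names below are simplified; only the early INIT_LIST_HEAD()
      and the list_empty() guard mirror the commit.

      #include <linux/list.h>
      #include <linux/spinlock.h>

      struct frag_queue_sketch {
              struct list_head lru_list;
      };

      static void frag_queue_create(struct frag_queue_sketch *q)
      {
              /* initialize before hash insert, so list_empty() is well defined
               * for any cpu that finds the queue in the hash early */
              INIT_LIST_HEAD(&q->lru_list);
              /* ... insert q into the hash table ... */
      }

      static void frag_lru_move(spinlock_t *lru_lock, struct list_head *lru,
                                struct frag_queue_sketch *q)
      {
              spin_lock(lru_lock);
              /* not yet added to the LRU, or already deleted: leave it alone */
              if (!list_empty(&q->lru_list))
                      list_move_tail(&q->lru_list, lru);
              spin_unlock(lru_lock);
      }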
  14. 02 May 2013: 1 commit
    • af_unix: fix a fatal race with bit fields · 60bc851a
      Authored by Eric Dumazet
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to a fatal deadlock, as one or several cpus spin forever
      on a spinlock that will never be available again.

      A safer way is to use a long to store the flags.
      This way we are sure the compiler/arch won't do bad things.

      As we hold the unix_gc_lock spinlock when clearing or setting bits,
      we can use the non-atomic __set_bit()/__clear_bit().

      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
      
      [1] bug fixed in gcc-4.8.0:
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
      Reported-by: Ambrose Feinstein <ambrose@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60bc851a
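      A hedged sketch of the layout change: the GC state moves out of bit fields into a
      dedicated unsigned long that no longer shares a word with the spinlock, and is
      manipulated with the non-atomic bit helpers under unix_gc_lock. Flag names and
      the helper below are illustrative.

      #include <linux/bitops.h>

      #define UNIX_GC_CANDIDATE       0
      #define UNIX_GC_MAYBE_CYCLE     1

      struct unix_sock_sketch {
              unsigned long gc_flags; /* its own word: no spinlock shares it */
      };

      /* unix_gc_lock is assumed held, so the cheap non-atomic op is safe */
      static void unix_mark_candidate(struct unix_sock_sketch *u)
      {
              __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
      }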
  15. 30 Apr 2013: 5 commits
    • hostap: Don't use create_proc_read_entry() · 6bbefe86
      Authored by David Howells
      Don't use create_proc_read_entry(), as it is deprecated; use
      proc_create_data() and seq_file instead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc: Jouni Malinen <j@w1.fi>
      cc: John W. Linville <linville@tuxdriver.com>
      cc: Johannes Berg <johannes@sipsolutions.net>
      cc: linux-wireless@vger.kernel.org
      cc: netdev@vger.kernel.org
      cc: devel@driverdev.osuosl.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6bbefe86
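      A hedged sketch of the conversion pattern: a seq_file show routine plus
      single_open(), registered with proc_create_data(), replaces the deprecated
      create_proc_read_entry(). The file name, payload and PDE_DATA usage are
      illustrative of that era's API, not the exact hostap code.

      #include <linux/module.h>
      #include <linux/fs.h>
      #include <linux/proc_fs.h>
      #include <linux/seq_file.h>

      static int example_proc_show(struct seq_file *m, void *v)
      {
              seq_printf(m, "stats for %s\n", (const char *)m->private);
              return 0;
      }

      static int example_proc_open(struct inode *inode, struct file *file)
      {
              return single_open(file, example_proc_show, PDE_DATA(inode));
      }

      static const struct file_operations example_proc_fops = {
              .owner   = THIS_MODULE,
              .open    = example_proc_open,
              .read    = seq_read,
              .llseek  = seq_lseek,
              .release = single_release,
      };

      /* registration:
       * proc_create_data("example", 0444, parent_dir, &example_proc_fops, priv); */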
    • net: defer net_secret[] initialization · aebda156
      Authored by Eric Dumazet
      Instead of feeding net_secret[] at boot time, defer the init to the
      point where the first socket is created.
      
      This permits some platforms to use better entropy sources than
      the ones available at boot time.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aebda156
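      A hedged sketch of the deferral: the secret array is filled the first time a
      socket is created rather than at boot. The once-guard below is a simplification
      of the real synchronization, kept only to show the idea.

      #include <linux/random.h>
      #include <linux/spinlock.h>

      static u32 net_secret_sketch[16];
      static bool net_secret_seeded;
      static DEFINE_SPINLOCK(net_secret_seed_lock);

      /* called from socket creation instead of from an initcall */
      static void net_secret_init_sketch(void)
      {
              if (likely(net_secret_seeded))
                      return;
              spin_lock(&net_secret_seed_lock);
              if (!net_secret_seeded) {
                      get_random_bytes(net_secret_sketch, sizeof(net_secret_sketch));
                      net_secret_seeded = true;
              }
              spin_unlock(&net_secret_seed_lock);
      }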
    • sctp: Correct type and usage of sctp_end_cksum() · eee1d5a1
      Authored by Simon Horman
      Change the type of the crc32 parameter of sctp_end_cksum()
      from __be32 to __u32 to reflect the fact that it is passed
      to cpu_to_le32().
      
      There are five in-tree users of sctp_end_cksum().
      The following four had warnings flagged by sparse which are
      no longer present with this change.
      
      net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum()
      net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check()
      net/sctp/input.c:sctp_rcv_checksum()
      net/sctp/output.c:sctp_packet_transmit()
      
      The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt().
      It has been updated to pass a __u32 instead of a __be32,
      the value in question was already calculated in cpu byte-order.
      
      net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also
      been updated to assign the return value of sctp_end_cksum()
      directly to a variable of type __le32, matching the
      type of the return value. Previously the return value
      was assigned to a variable of type __be32 and then that variable
      was finally assigned to another variable of type __le32.
      
      Problems flagged by sparse.
      Compile and sparse tested only.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      eee1d5a1
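      A hedged sketch of the corrected signature: the running CRC comes in as a plain
      cpu-order __u32 and the finalized value goes out as __le32, which is what
      cpu_to_le32() actually produces. The bitwise-complement finalization shown here is
      the usual CRC32c convention and is an assumption of the sketch.

      #include <linux/types.h>
      #include <asm/byteorder.h>

      static inline __le32 sctp_end_cksum_sketch(__u32 crc32)
      {
              return cpu_to_le32(~crc32); /* sparse-clean: no __be32 involved */
      }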
    • netfilter: move skb_gso_segment into nfnetlink_queue module · a5fedd43
      Authored by Florian Westphal
      skb_gso_segment is expensive, so it would be nice if we could
      avoid it in the future. However, userspace needs to be prepared
      to receive larger-than-mtu-packets (which will also have incorrect
      l3/l4 checksums), so we cannot simply remove it.
      
      The plan is to add a per-queue feature flag that userspace can
      set when binding the queue.
      
      The problem is that in nf_queue, we only have a queue number,
      not the queue context/configuration settings.
      
      This patch should have no impact other than the skb_gso_segment
      call now being in a function that has access to the queue config
      data.
      
      A new size attribute in nf_queue_entry is needed so
      nfnetlink_queue can duplicate the entry of the gso skb
      when segmenting the skb while also copying the route key.
      
      The follow-up patch adds a switch to disable skb_gso_segment when
      the queue config says so.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a5fedd43
    • net: increase frag hash size · a4c4009f
      Authored by Jesper Dangaard Brouer
      Increase the fragmentation hash bucket count to 1024 from the old 64 elements.

      After we increased the frag mem limits in commit c2a93660 ("net: increase
      fragment memory usage limits"), the hash size of 64 elements is simply
      too small.  Also consider that the mem limit is per netns while the hash
      table is shared across all netns.
      
      For the embedded people, note that this increase will change the hash
      table/array from using approx 1 Kbytes to 16 Kbytes.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4c4009f
  16. 25 Apr 2013: 2 commits
  17. 24 Apr 2013: 1 commit
  18. 23 Apr 2013: 3 commits
  19. 22 Apr 2013: 3 commits
  20. 21 Apr 2013: 1 commit
  21. 20 Apr 2013: 1 commit
    • irda: small read past the end of array in debug code · e15465e1
      Authored by Dan Carpenter
      The "reason" can come from skb->data[] and it hasn't been capped so it
      can be from 0-255 instead of just 0-6.  For example in irlmp_state_dtr()
      the code does:
      
      	reason = skb->data[3];
      	...
      	irlmp_disconnect_indication(self, reason, skb);
      
      Also, LMREASON has a couple of other values which don't have entries in
      the irlmp_reasons[] array.  And 0xff is a valid reason as well, which
      means "unknown".

      As far as I can see we don't actually care about "reason" except in
      the debug code.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e15465e1
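      A hedged sketch of one possible guard (not necessarily the exact fix): range-check
      the wire value before indexing irlmp_reasons[], since skb->data[3] may be anything
      from 0 to 255 while the table covers only a handful of codes. The table below is
      shortened for illustration.

      #include <linux/kernel.h>

      static const char *irlmp_reason_str_sketch(unsigned int reason)
      {
              static const char * const irlmp_reasons[] = {
                      "ERROR, NOT USED",
                      "LM_USER_REQUEST",
                      "LM_LAP_DISCONNECT",
                      /* ... table shortened for the sketch ... */
              };

              if (reason >= ARRAY_SIZE(irlmp_reasons))
                      return "UNKNOWN"; /* also covers the valid 0xff reason */
              return irlmp_reasons[reason];
      }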