1. 13 2月, 2010 2 次提交
    • S
      IPv6: keep permanent addresses on admin down · dc2b99f7
      stephen hemminger 提交于
      Permanent IPV6 addresses should not be removed when the link is
      set to admin down, only when device is removed.
      
      When link is lost permanent addresses should be marked as tentative
      so that when link comes back they are subject to duplicate address
      detection (if DAD was enabled for that address).
      
      Other routing systems keep manually configured IPv6 addresses
      when link is set down.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc2b99f7
    • P
      ipv6: fib: fix crash when changing large fib while dumping it · 2bec5a36
      Patrick McHardy 提交于
      When the fib size exceeds what can be dumped in a single skb, the
      dump is suspended and resumed once the last skb has been received
      by userspace. When the fib is changed while the dump is suspended,
      the walker might contain stale pointers, causing a crash when the
      dump is resumed.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
      PGD 5347a067 PUD 65c7067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      ...
      RIP: 0010:[<ffffffffa01bce04>]
      [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
      ...
      Call Trace:
       [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
       [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
       [<ffffffff81371af4>] netlink_dump+0x5b/0x19e
       [<ffffffff8134f288>] ? consume_skb+0x28/0x2a
       [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
       [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
       [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
       [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
       [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
       [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
       [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
       [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
       [<ffffffff810ef152>] ? fget_light+0x2f/0xac
       [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
       [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223
      
      Store the serial number when beginning to walk the fib and reload
      pointers when continuing to walk after a change occured. Similar
      to other dumping functions, this might cause unrelated entries to
      be missed when entries are deleted.
      Tested-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2bec5a36
  2. 09 2月, 2010 1 次提交
  3. 28 1月, 2010 3 次提交
  4. 26 1月, 2010 1 次提交
  5. 25 1月, 2010 1 次提交
  6. 23 1月, 2010 1 次提交
  7. 18 1月, 2010 2 次提交
  8. 14 1月, 2010 1 次提交
    • D
      ipv6: skb_dst() can be NULL in ipv6_hop_jumbo(). · 2570a4f5
      David S. Miller 提交于
      This fixes CERT-FI FICORA #341748
      
      Discovered by Olli Jarva and Tuomo Untinen from the CROSS
      project at Codenomicon Ltd.
      
      Just like in CVE-2007-4567, we can't rely upon skb_dst() being
      non-NULL at this point.  We fixed that in commit
      e76b2b25 ("[IPV6]: Do no rely on
      skb->dst before it is assigned.")
      
      However commit 483a47d2 ("ipv6: added
      net argument to IP6_INC_STATS_BH") put a new version of the same bug
      into this function.
      
      Complicating analysis further, this bug can only trigger when network
      namespaces are enabled in the build.  When namespaces are turned off,
      the dev_net() does not evaluate it's argument, so the dereference
      would not occur.
      
      So, for a long time, namespaces couldn't be turned on unless SYSFS was
      disabled.  Therefore, this code has largely been disabled except by
      people turning it on explicitly for namespace development.
      
      With help from Eugene Teo <eugene@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2570a4f5
  9. 11 1月, 2010 1 次提交
    • J
      NET: ipv6, remove unnecessary check · c3f6c21d
      Jiri Slaby 提交于
      Stanse found a potential null dereference in snmp6_unregister_dev.
      There is a check for idev being NULL, but it is dereferenced
      earlier. But idev cannot be NULL when passed to
      snmp6_unregister_dev, so remove the test.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3f6c21d
  10. 08 1月, 2010 1 次提交
  11. 07 1月, 2010 1 次提交
    • O
      ip: fix mc_loop checks for tunnels with multicast outer addresses · 7ad6848c
      Octavian Purdila 提交于
      When we have L3 tunnels with different inner/outer families
      (i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
      destination address, multicast packets will be loopbacked back to the
      sending socket even if IP*_MULTICAST_LOOP is set to disabled.
      
      The mc_loop flag is present in the family specific part of the socket
      (e.g. the IPv4 or IPv4 specific part).  setsockopt sets the inner
      family mc_loop flag. When the packet is pushed through the L3 tunnel
      it will eventually be processed by the outer family which if different
      will check the flag in a different part of the socket then it was set.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ad6848c
  12. 24 12月, 2009 1 次提交
    • L
      net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      laurent chavey 提交于
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.
      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start. For short lived
      TCP connections used for transaction type of traffic (i.e. http
      requests), bounding the advertised TCP initial receive window results
      in increased latency to complete the transaction.
      Support for setting initial congestion window is already supported
      using rtnetlink init_cwnd, but the feature is useless without the
      ability to set a larger TCP initial receive window.
      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, allowing TCP connection to advertise larger TCP receive window
      than the ones bounded by slow start.
      Signed-off-by: NLaurent Chavey <chavey@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31d12926
  13. 19 12月, 2009 2 次提交
    • Y
      ipv6: fix an oops when force unload ipv6 module · 3705e11a
      Yang Hongyang 提交于
      When I do an ipv6 module force unload,I got the following oops:
      #rmmod -f ipv6
      ------------[ cut here ]------------
      kernel BUG at mm/slub.c:2969!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:03.0/net/eth2/ifindex
      Modules linked in: ipv6(-) dm_multipath uinput ppdev tpm_tis tpm tpm_bios pcspkr pcnet32 mii parport_pc i2c_piix4 parport i2c_core floppy mptspi mptscsih mptbase scsi_transport_spi
      
      Pid: 2530, comm: rmmod Tainted: G  R        2.6.32 #2 440BX Desktop Reference Platform/VMware Virtual Platform
      EIP: 0060:[<c04b73f2>] EFLAGS: 00010246 CPU: 0
      EIP is at kfree+0x6a/0xdd
      EAX: 00000000 EBX: c09e86bc ECX: c043e4dd EDX: c14293e0
      ESI: e141f1d8 EDI: e140fc31 EBP: dec58ef0 ESP: dec58ed0
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process rmmod (pid: 2530, ti=dec58000 task=decb1940 task.ti=dec58000)
      Stack:
       c14293e0 00000282 df624240 c0897d08 c09e86bc c09e86bc e141f1d8 dec58f1c
      <0> dec58f00 e140fc31 c09e84c4 e141f1bc dec58f14 c0689d21 dec58f1c e141f1bc
      <0> 00000000 dec58f2c c0689eff c09e84d8 c09e84d8 e141f1bc bff33a90 dec58f38
      Call Trace:
       [<e140fc31>] ? ipv6_frags_exit_net+0x22/0x32 [ipv6]
       [<c0689d21>] ? ops_exit_list+0x19/0x3d
       [<c0689eff>] ? unregister_pernet_operations+0x2a/0x51
       [<c0689f70>] ? unregister_pernet_subsys+0x17/0x24
       [<e140fbfe>] ? ipv6_frag_exit+0x21/0x32 [ipv6]
       [<e141a361>] ? inet6_exit+0x47/0x122 [ipv6]
       [<c045f5de>] ? sys_delete_module+0x198/0x1f6
       [<c04a8acf>] ? remove_vma+0x57/0x5d
       [<c070f63f>] ? do_page_fault+0x2e7/0x315
       [<c0403218>] ? sysenter_do_call+0x12/0x28
      Code: 86 00 00 00 40 c1 e8 0c c1 e0 05 01 d0 89 45 e0 66 83 38 00 79 06 8b 40 0c 89 45 e0 8b 55 e0 8b 02 84 c0 78 14 66 a9 00 c0 75 04 <0f> 0b eb fe 8b 45 e0 e8 35 15 fe ff eb 5d 8b 45 04 8b 55 e0 89
      EIP: [<c04b73f2>] kfree+0x6a/0xdd SS:ESP 0068:dec58ed0
      ---[ end trace 4475d1a5b0afa7e5 ]---
      
      It's because in ip6_frags_ns_sysctl_register,
      "table" only alloced when "net" is not equals
      to "init_net".So when we free "table" in 
      ip6_frags_ns_sysctl_unregister,we should check
      this first.
      
      This patch fix the problem.
      Signed-off-by: NYang Hongyang <yanghy@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3705e11a
    • A
      netns: fix net.ipv6.route.gc_min_interval_ms in netns · 9c69fabe
      Alexey Dobriyan 提交于
      sysctl table was copied, all right, but ->data for net.ipv6.route.gc_min_interval_ms
      was not reinitialized for "!= &init_net" case.
      
      In init_net everthing works by accident due to correct ->data initialization
      in source table.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c69fabe
  14. 16 12月, 2009 1 次提交
    • D
      tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller 提交于
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series is fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later
      after we setup all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6bSigned-off-by: NDavid S. Miller <davem@davemloft.net>
      bb5b7c11
  15. 15 12月, 2009 2 次提交
  16. 09 12月, 2009 1 次提交
  17. 04 12月, 2009 4 次提交
    • E
      tcp: connect() race with timewait reuse · 13475a30
      Eric Dumazet 提交于
      Its currently possible that several threads issuing a connect() find
      the same timewait socket and try to reuse it, leading to list
      corruptions.
      
      Condition for bug is that these threads bound their socket on same
      address/port of to-be-find timewait socket, and connected to same
      target. (SO_REUSEADDR needed)
      
      To fix this problem, we could unhash timewait socket while holding
      ehash lock, to make sure lookups/changes will be serialized. Only
      first thread finds the timewait socket, other ones find the
      established socket and return an EADDRNOTAVAIL error.
      
      This second version takes into account Evgeniy's review and makes sure
      inet_twsk_put() is called outside of locked sections.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13475a30
    • E
      net: Batch inet_twsk_purge · b099ce26
      Eric W. Biederman 提交于
      This function walks the whole hashtable so there is no point in
      passing it a network namespace.  Instead I purge all timewait
      sockets from dead network namespaces that I find.  If the namespace
      is one of the once I am trying to purge I am guaranteed no new timewait
      sockets can be formed so this will get them all.  If the namespace
      is one I am not acting for it might form a few more but I will
      call inet_twsk_purge again and  shortly to get rid of them.  In
      any even if the network namespace is dead timewait sockets are
      useless.
      
      Move the calls of inet_twsk_purge into batch_exit routines so
      that if I am killing a bunch of namespaces at once I will just
      call inet_twsk_purge once and save a lot of redundant unnecessary
      work.
      
      My simple 4k network namespace exit test the cleanup time dropped from
      roughly 8.2s to 1.6s.  While the time spent running inet_twsk_purge fell
      to about 2ms.  1ms for ipv4 and 1ms for ipv6.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b099ce26
    • E
      net: Allow fib_rule_unregister to batch · e9c5158a
      Eric W. Biederman 提交于
      Refactor the code so fib_rules_register always takes a template instead
      of the actual fib_rules_ops structure that will be used.  This is
      required for network namespace support so 2 out of the 3 callers already
      do this, it allows the error handling to be made common, and it allows
      fib_rules_unregister to free the template for hte caller.
      
      Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu
      to allw multiple namespaces to be cleaned up in the same rcu grace
      period.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9c5158a
    • P
      net 04/05: fib_rules: allow to delete local rule · 5adef180
      Patrick McHardy 提交于
      commit d124356ce314fff22a047ea334379d5105b2d834
      Author: Patrick McHardy <kaber@trash.net>
      Date:   Thu Dec 3 12:16:35 2009 +0100
      
          net: fib_rules: allow to delete local rule
      
          Allow to delete the local rule and recreate it with a higher priority. This
          can be used to force packets with a local destination out on the wire instead
          of routing them to loopback. Additionally this patch allows to recreate rules
          with a priority of 0.
      
          Combined with the previous patch to allow oif classification, a socket can
          be bound to the desired interface and packets routed to the wire like this:
      
          # move local rule to lower priority
          ip rule add pref 1000 lookup local
          ip rule del pref 0
      
          # route packets of sockets bound to eth0 to the wire independant
          # of the destination address
          ip rule add pref 100 oif eth0 lookup 100
          ip route add default dev eth0 table 100
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5adef180
  18. 03 12月, 2009 3 次提交
    • W
      TCPCT part 1g: Responder Cookie => Initiator · 4957faad
      William Allen Simpson 提交于
      Parse incoming TCP_COOKIE option(s).
      
      Calculate <SYN,ACK> TCP_COOKIE option.
      
      Send optional <SYN,ACK> data.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1d: define TCP cookie option, extend existing struct's
         TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1f: Initiator Cookie => Responder
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4957faad
    • W
      TCPCT part 1d: define TCP cookie option, extend existing struct's · 435cf559
      William Allen Simpson 提交于
      Data structures are carefully composed to require minimal additions.
      For example, the struct tcp_options_received cookie_plus variable fits
      between existing 16-bit and 8-bit variables, requiring no additional
      space (taking alignment into consideration).  There are no additions to
      tcp_request_sock, and only 1 pointer in tcp_sock.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.  This is more flexible and
      less subject to user configuration error.  Such a cookie option has been
      suggested for many years, and is also useful without SYN data, allowing
      several related concepts to use the same extension option.
      
          "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
          http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
      
          "Re: what a new TCP header might look like", May 12, 1998.
          ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435cf559
    • W
      TCPCT part 1a: add request_values parameter for sending SYNACK · e6b4d113
      William Allen Simpson 提交于
      Add optional function parameters associated with sending SYNACK.
      These parameters are not needed after sending SYNACK, and are not
      used for retransmission.  Avoids extending struct tcp_request_sock,
      and avoids allocating kernel memory.
      
      Also affects DCCP as it uses common struct request_sock_ops,
      but this parameter is currently reserved for future use.
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6b4d113
  19. 02 12月, 2009 2 次提交
  20. 26 11月, 2009 2 次提交
  21. 25 11月, 2009 1 次提交
  22. 24 11月, 2009 1 次提交
  23. 18 11月, 2009 1 次提交
  24. 14 11月, 2009 3 次提交
  25. 12 11月, 2009 1 次提交
    • E
      sysctl net: Remove unused binary sysctl code · f8572d8f
      Eric W. Biederman 提交于
      Now that sys_sysctl is a compatiblity wrapper around /proc/sys
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables are unused, and can be
      revmoed.
      
      In addition neigh_sysctl_register has been modified to no longer
      take a strategy argument and it's callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      f8572d8f