1. 21 8月, 2013 4 次提交
  2. 16 8月, 2013 1 次提交
  3. 15 8月, 2013 2 次提交
  4. 14 8月, 2013 1 次提交
    • H
      ipv6: make unsolicited report intervals configurable for mld · fc4eba58
      Hannes Frederic Sowa 提交于
      Commit cab70040 ("net: igmp:
      Reduce Unsolicited report interval to 1s when using IGMPv3") and
      2690048c ("net: igmp: Allow user-space
      configuration of igmp unsolicited report interval") by William Manley made
      igmp unsolicited report intervals configurable per interface and corrected
      the interval of unsolicited igmpv3 report messages resendings to 1s.
      
      Same needs to be done for IPv6:
      
      MLDv1 (RFC2710 7.10.): 10 seconds
      MLDv2 (RFC3810 9.11.): 1 second
      
      Both intervals are configurable via new procfs knobs
      mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
      
      (also added .force_mld_version to ipv6_devconf_dflt to bring structs in
      line without semantic changes)
      
      v2:
      a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
         unsolicited_report_interval procfs knobs.
      b) incorporate stylistic feedback from William Manley
      
      v3:
      a) add new DEVCONF_* values to the end of the enum (thanks to David
         Miller)
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: William Manley <william.manley@youview.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4eba58
  5. 09 8月, 2013 2 次提交
  6. 08 8月, 2013 1 次提交
  7. 06 8月, 2013 1 次提交
    • D
      net: esp{4,6}: fix potential MTU calculation overflows · 7921895a
      Daniel Borkmann 提交于
      Commit 91657eaf ("xfrm: take net hdr len into account for esp payload
      size calculation") introduced a possible interger overflow in
      esp{4,6}_get_mtu() handlers in case of x->props.mode equals
      XFRM_MODE_TUNNEL. Thus, the following expression will overflow
      
        unsigned int net_adj;
        ...
        <case ipv{4,6} XFRM_MODE_TUNNEL>
               net_adj = 0;
        ...
        return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
                 net_adj) & ~(align - 1)) + (net_adj - 2);
      
      where (net_adj - 2) would be evaluated as <foo> + (0 - 2) in an unsigned
      context. Fix it by simply removing brackets as those operations here
      do not need to have special precedence.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Benjamin Poirier <bpoirier@suse.de>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: NBenjamin Poirier <bpoirier@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7921895a
  8. 04 8月, 2013 1 次提交
    • S
      fib_rules: fix suppressor names and default values · 73f5698e
      Stefan Tomanek 提交于
      This change brings the suppressor attribute names into line; it also changes
      the data types to provide a more consistent interface.
      
      While -1 indicates that the suppressor is not enabled, values >= 0 for
      suppress_prefixlen or suppress_ifgroup  reject routing decisions violating the
      constraint.
      
      This changes the previously presented behaviour of suppress_prefixlen, where a
      prefix length _less_ than the attribute value was rejected. After this change,
      a prefix length less than *or* equal to the value is considered a violation of
      the rule constraint.
      
      It also changes the default values for default and newly added rules (disabling
      any suppression for those).
      Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f5698e
  9. 03 8月, 2013 3 次提交
  10. 02 8月, 2013 5 次提交
    • F
      ipv6: bump genid when delete/add address · 439677d7
      fan.du 提交于
      Server           Client
      2001:1::803/64  <-> 2001:1::805/64
      2001:2::804/64  <-> 2001:2::806/64
      
      Server side fib binary tree looks like this:
      
                                         (2001:/64)
                                         /
                                        /
                         ffff88002103c380
                       /                 \
           (2)        /                   \
       (2001::803/128)                     ffff880037ac07c0
                                          /               \
                                         /                 \  (3)
                            ffff880037ac0640               (2001::806/128)
                             /             \
                   (1)      /               \
              (2001::804/128)               (2001::805/128)
      
      Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
      destinate to 2001::806 with source address as 2001::804/64. That's because
      2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
      where the substantial difference between same prefix configuration and
      different prefix configuration :) So packet are still transmitted out to
      2001::806 with source address as 2001::804/64.
      
      So bump genid will clear rt in (3), and up layer protocol will eventually
      find the right one for themselves.
      
      This problem arised from the discussion in here:
      http://marc.info/?l=linux-netdev&m=137404469219410&w=4Signed-off-by: NFan Du <fan.du@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      439677d7
    • J
      ipv6: prevent race between address creation and removal · 8a226b2c
      Jiri Benc 提交于
      There's a race in IPv6 automatic addess assignment. The address is created
      with zero lifetime when it's added to various address lists. Before it gets
      assigned the correct lifetime, there's a window where a new address may be
      configured. This causes the semi-initiated address to be deleted in
      addrconf_verify.
      
      This was discovered as a reference leak caused by concurrent run of
      __ipv6_ifa_notify for both RTM_NEWADDR and RTM_DELADDR with the same
      address.
      
      Fix this by setting the lifetime before the address is added to
      inet6_addr_lst.
      
      A few notes:
      
      1. In addrconf_prefix_rcv, by setting update_lft to zero, the
         if (update_lft) { ... } condition is no longer executed for newly
         created addresses. This is okay, as the ifp fields are set in
         ipv6_add_addr now and ipv6_ifa_notify is called (and has been called)
         through addrconf_dad_start.
      
      2. The removal of the whole block under ifp->lock in inet6_addr_add is okay,
         too, as tstamp is initialized to jiffies in ipv6_add_addr.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a226b2c
    • J
      ipv6: move peer_addr init into ipv6_add_addr() · 3f8f5298
      Jiri Pirko 提交于
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f8f5298
    • M
      ipv6: update ip6_rt_last_gc every time GC is run · 49a18d86
      Michal Kubeček 提交于
      As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
      hold the last time garbage collector was run so that we should
      update it whenever fib6_run_gc() calls fib6_clean_all(), not only
      if we got there from ip6_dst_gc().
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49a18d86
    • M
      ipv6: prevent fib6_run_gc() contention · 2ac3ac8f
      Michal Kubeček 提交于
      On a high-traffic router with many processors and many IPv6 dst
      entries, soft lockup in fib6_run_gc() can occur when number of
      entries reaches gc_thresh.
      
      This happens because fib6_run_gc() uses fib6_gc_lock to allow
      only one thread to run the garbage collector but ip6_dst_gc()
      doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
      returns. On a system with many entries, this can take some time
      so that in the meantime, other threads pass the tests in
      ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
      the lock. They then have to run the garbage collector one after
      another which blocks them for quite long.
      
      Resolve this by replacing special value ~0UL of expire parameter
      to fib6_run_gc() by explicit "force" parameter to choose between
      spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
      force=false if gc_thresh is reached but not max_size.
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ac3ac8f
  11. 01 8月, 2013 3 次提交
    • H
      ipv6: fib6_rules should return exact return value · 46b3a421
      Hannes Frederic Sowa 提交于
      With the addition of the suppress operation
      (7764a45a ("fib_rules: add .suppress
      operation") we rely on accurate error reporting of the fib_rules.actions.
      
      fib6_rule_action always returned -EAGAIN in case we could not find a
      matching route and 0 if a rule was matched. This also included a match
      for blackhole or prohibited rule actions which could get suppressed by
      the new logic.
      
      So adapt fib6_rule_action to always return the correct error code as
      its counterpart fib4_rule_action does. This also fixes a possiblity of
      nullptr-deref where we don't find a table, thus rt == NULL. Because
      the condition rt != ip6_null_entry still holdes it seems we could later
      get a nullptr bug on dereference rt->dst.
      
      v2:
      a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
         changes to the patch.
      
      Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46b3a421
    • S
      fib_rules: add .suppress operation · 7764a45a
      Stefan Tomanek 提交于
      This change adds a new operation to the fib_rules_ops struct; it allows the
      suppression of routing decisions if certain criteria are not met by its
      results.
      
      The first implemented constraint is a minimum prefix length added to the
      structures of routing rules. If a rule is added with a minimum prefix length
      >0, only routes meeting this threshold will be considered. Any other (more
      general) routing table entries will be ignored.
      
      When configuring a system with multiple network uplinks and default routes, it
      is often convinient to reference the main routing table multiple times - but
      omitting the default route. Using this patch and a modified "ip" utility, this
      can be achieved by using the following command sequence:
      
        $ ip route add table secuplink default via 10.42.23.1
      
        $ ip rule add pref 100            table main prefixlength 1
        $ ip rule add pref 150 fwmark 0xA table secuplink
      
      With this setup, packets marked 0xA will be processed by the additional routing
      table "secuplink", but only if no suitable route in the main routing table can
      be found. By using a minimal prefixlength of 1, the default route (/0) of the
      table "main" is hidden to packets processed by rule 100; packets traveling to
      destinations with more specific routing entries are processed as usual.
      Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7764a45a
    • F
      net: split rt_genid for ipv4 and ipv6 · ca4c3fc2
      fan.du 提交于
      Current net name space has only one genid for both IPv4 and IPv6, it has below
      drawbacks:
      
      - Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
      - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
        entries even when the policy is only applied for one address family.
      
      Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
      separately in a fine granularity.
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca4c3fc2
  12. 31 7月, 2013 1 次提交
  13. 29 7月, 2013 1 次提交
  14. 25 7月, 2013 4 次提交
    • E
      tcp: TCP_NOTSENT_LOWAT socket option · c9bee3b7
      Eric Dumazet 提交于
      Idea of this patch is to add optional limitation of number of
      unsent bytes in TCP sockets, to reduce usage of kernel memory.
      
      TCP receiver might announce a big window, and TCP sender autotuning
      might allow a large amount of bytes in write queue, but this has little
      performance impact if a large part of this buffering is wasted :
      
      Write queue needs to be large only to deal with large BDP, not
      necessarily to cope with scheduling delays (incoming ACKS make room
      for the application to queue more bytes)
      
      For most workloads, using a value of 128 KB or less is OK to give
      applications enough time to react to POLLOUT events in time
      (or being awaken in a blocking sendmsg())
      
      This patch adds two ways to set the limit :
      
      1) Per socket option TCP_NOTSENT_LOWAT
      
      2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
      not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
      Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
      
      This changes poll()/select()/epoll() to report POLLOUT
      only if number of unsent bytes is below tp->nosent_lowat
      
      Note this might increase number of sendmsg()/sendfile() calls
      when using non blocking sockets,
      and increase number of context switches for blocking sockets.
      
      Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
      defined as :
       Specify the minimum number of bytes in the buffer until
       the socket layer will pass the data to the protocol)
      
      Tested:
      
      netperf sessions, and watching /proc/net/protocols "memory" column for TCP
      
      With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
      used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
      TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
      
      Using 128KB has no bad effect on the throughput or cpu usage
      of a single flow, although there is an increase of context switches.
      
      A bonus is that we hold socket lock for a shorter amount
      of time and should improve latencies of ACK processing.
      
      lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
                 412,514 context-switches
      
           200.034645535 seconds time elapsed
      
      lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
      lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
      OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
      Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
      Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
      Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
      Final       Final                                             %     Method %      Method
      1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB
      
       Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
      
               2,675,818 context-switches
      
           200.029651391 seconds time elapsed
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-By: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9bee3b7
    • H
      ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup · 905a6f96
      Hannes Frederic Sowa 提交于
      Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
      which leads to a panic (from Srivatsa S. Bhat):
      
      BUG: unable to handle kernel paging request at ffff882018552020
      IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
      PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
      +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
      CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
      Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
      Workqueue: netns cleanup_net
      task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
      RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
      RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
      RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
      RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
      RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
      R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
      R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
      Stack:
       ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
       ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
       ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
      Call Trace:
       [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
       [<ffffffff815bdecb>] inet_release+0xfb/0x220
       [<ffffffff815bddf2>] ? inet_release+0x22/0x220
       [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
       [<ffffffff8151c1d9>] sock_release+0x29/0xa0
       [<ffffffff81525520>] sk_release_kernel+0x30/0x70
       [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
       [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
       [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
       [<ffffffff81075e3a>] process_one_work+0x1da/0x610
       [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
       [<ffffffff81076390>] worker_thread+0x120/0x3a0
       [<ffffffff81076270>] ? process_one_work+0x610/0x610
       [<ffffffff8107da2e>] kthread+0xee/0x100
       [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
      Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
      RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
       RSP <ffff881039367bd8>
      CR2: ffff882018552020
      Reported-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      905a6f96
    • F
      net: ipv6 eliminate parameter "int addrlen" in function fib6_add_1 · 9225b230
      fan.du 提交于
      The "int addrlen" in fib6_add_1 is rebundant, as we can get it from
      parameter "struct in6_addr *addr" once we modified its type.
      And also fix some coding style issues in fib6_add_1
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9225b230
    • F
      net ipv6: Remove rebundant rt6i_nsiblings initialization · 86a37def
      fan.du 提交于
      Seting rt->rt6i_nsiblings to zero is rebundant, because above memset
      zeroed the rest of rt excluding the first dst memember.
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86a37def
  15. 24 7月, 2013 1 次提交
  16. 23 7月, 2013 1 次提交
  17. 17 7月, 2013 1 次提交
  18. 13 7月, 2013 1 次提交
    • H
      ipv6: only static routes qualify for equal cost multipathing · 307f2fb9
      Hannes Frederic Sowa 提交于
      Static routes in this case are non-expiring routes which did not get
      configured by autoconf or by icmpv6 redirects.
      
      To make sure we actually get an ecmp route while searching for the first
      one in this fib6_node's leafs, also make sure it matches the ecmp route
      assumptions.
      
      v2:
      a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
         already ensures that this route, even if added again without
         RTF_EXPIRES (in case of a RA announcement with infinite timeout),
         does not cause the rt6i_nsiblings logic to go wrong if a later RA
         updates the expiration time later.
      
      v3:
      a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
         because an pmtu event could update the RTF_EXPIRES flag and we would
         not count this route, if another route joins this set. We now filter
         only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
         don't get changed after rt6_info construction.
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      307f2fb9
  19. 12 7月, 2013 1 次提交
    • H
      ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF · afc154e9
      Hannes Frederic Sowa 提交于
      This is a follow-up patch to 3630d400
      ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
      information are available").
      
      Since the removal of rt->n in rt6_info we can end up with a dst ==
      NULL in rt6_check_neigh. In case the kernel is not compiled with
      CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
      NUD state but we must not avoid doing round robin selection on routes
      with the same target. So introduce and pass down a boolean ``do_rr'' to
      indicate when we should update rt->rr_ptr. As soon as no route is valid
      we do backtracking and do a lookup on a higher level in the fib trie.
      
      v2:
      a) Improved rt6_check_neigh logic (no need to create neighbour there)
         and documented return values.
      
      v3:
      a) Introduce enum rt6_nud_state to get rid of the magic numbers
         (thanks to David Miller).
      b) Update and shorten commit message a bit to actualy reflect
         the source.
      Reported-by: NPierre Emeriaud <petrus.lt@gmail.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afc154e9
  20. 11 7月, 2013 3 次提交
    • H
      ipv6: in case of link failure remove route directly instead of letting it expire · 1eb4f758
      Hannes Frederic Sowa 提交于
      We could end up expiring a route which is part of an ecmp route set. Doing
      so would invalidate the rt->rt6i_nsiblings calculations and could provoke
      the following panic:
      
      [   80.144667] ------------[ cut here ]------------
      [   80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
      [   80.145172] invalid opcode: 0000 [#1] SMP
      [   80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
      +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
      [   80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
      [   80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
      [   80.145172] RIP: 0010:[<ffffffff815f3b5d>]  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172] RSP: 0018:ffff880118771798  EFLAGS: 00010202
      [   80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
      [   80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
      [   80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
      [   80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
      [   80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
      [   80.145172] FS:  00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
      [   80.145172] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
      [   80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   80.145172] Stack:
      [   80.145172]  0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
      [   80.145172]  0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
      [   80.145172]  0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
      [   80.145172] Call Trace:
      [   80.145172]  [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
      [   80.145172]  [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
      [   80.145172]  [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
      [   80.145172]  [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
      [   80.145172]  [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
      [   80.145172]  [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
      [   80.145172]  [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
      [   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
      [   80.145172]  [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
      [   80.145172]  [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
      [   80.145172]  [<ffffffff813007b1>] ? list_del+0x11/0x40
      [   80.145172]  [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
      [   80.145172]  [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
      [   80.145172]  [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
      [   80.145172]  [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
      [   80.145172]  [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
      [   80.145172]  [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
      [   80.145172]  [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
      [   80.145172]  [<ffffffff8109825c>] ? update_curr+0xec/0x170
      [   80.145172]  [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
      [   80.145172]  [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
      [   80.145172]  [<ffffffff8152509e>] SyS_sendto+0xe/0x10
      [   80.145172]  [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
      [   80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
      [   80.145172] RIP  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
      [   80.145172]  RSP <ffff880118771798>
      [   80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
      [   80.390154] Kernel panic - not syncing: Fatal exception in interrupt
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1eb4f758
    • E
      net: rename ll methods to busy-poll · 8b80cda5
      Eliezer Tamir 提交于
      Rename ndo_ll_poll to ndo_busy_poll.
      Rename sk_mark_ll to sk_mark_napi_id.
      Rename skb_mark_ll to skb_mark_napi_id.
      Correct all useres of these functions.
      Update comments and defines  in include/net/busy_poll.h
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b80cda5
    • E
      net: rename include/net/ll_poll.h to include/net/busy_poll.h · 076bb0c8
      Eliezer Tamir 提交于
      Rename the file and correct all the places where it is included.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      076bb0c8
  21. 05 7月, 2013 1 次提交
  22. 04 7月, 2013 1 次提交
    • H
      ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available · 3630d400
      Hannes Frederic Sowa 提交于
      After the removal of rt->n we do not create a neighbour entry at route
      insertion time (rt6_bind_neighbour is gone). As long as no neighbour is
      created because of "useful traffic" we skip this routing entry because
      rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and
      thus returns false.
      
      This change was introduced by commit
      887c95cc ("ipv6: Complete neighbour
      entry removal from dst_entry.")
      
      To quote RFC4191:
      "If the host has no information about the router's reachability, then
      the host assumes the router is reachable."
      
      and also:
      "A host MUST NOT probe a router's reachability in the absence of useful
      traffic that the host would have sent to the router if it were reachable."
      
      So, just assume the router is reachable and let's rt6_probe do the
      rest. We don't need to create a neighbour on route insertion time.
      
      If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support)
      a neighbour is only valid if its nud_state is NUD_VALID. I did not find
      any references that we should probe the router on route insertion time
      via the other RFCs. So skip this route in that case.
      
      v2:
      a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)
      Reported-by: NPierre Emeriaud <petrus.lt@gmail.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3630d400