1. 09 8月, 2013 1 次提交
    • E
      net: add SNMP counters tracking incoming ECN bits · 1f07d03e
      Eric Dumazet 提交于
      With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
      counters do not count the number of frames, but number of aggregated
      segments.
      
      Its probably too late to change this now.
      
      This patch adds four new counters, tracking number of frames, regardless
      of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.
      
      Ip[6]NoECTPkts : Number of packets received with NOECT
      Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
      Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
      Ip[6]CEPkts    : Number of packets received with Congestion Experienced
      
      lph37:~# nstat | egrep "Pkts|InReceive"
      IpInReceives                    1634137            0.0
      Ip6InReceives                   3714107            0.0
      Ip6InNoECTPkts                  19205              0.0
      Ip6InECT0Pkts                   52651828           0.0
      IpExtInNoECTPkts                33630              0.0
      IpExtInECT0Pkts                 15581379           0.0
      IpExtInCEPkts                   6                  0.0
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f07d03e
  2. 08 8月, 2013 8 次提交
  3. 05 8月, 2013 1 次提交
  4. 04 8月, 2013 2 次提交
    • S
      fib_rules: fix suppressor names and default values · 73f5698e
      Stefan Tomanek 提交于
      This change brings the suppressor attribute names into line; it also changes
      the data types to provide a more consistent interface.
      
      While -1 indicates that the suppressor is not enabled, values >= 0 for
      suppress_prefixlen or suppress_ifgroup  reject routing decisions violating the
      constraint.
      
      This changes the previously presented behaviour of suppress_prefixlen, where a
      prefix length _less_ than the attribute value was rejected. After this change,
      a prefix length less than *or* equal to the value is considered a violation of
      the rule constraint.
      
      It also changes the default values for default and newly added rules (disabling
      any suppression for those).
      Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f5698e
    • W
      vlan: cleanup the usage of vlan_dev_priv(dev) · 0c0667a8
      Wang Sheng-Hui 提交于
      This patch cleanup 2 points for the usage of vlan_dev_priv(dev):
      * In vlan_dev.c/vlan_dev_hard_header, we should use the var *vlan directly
        after grabing the pointer at the beginning with
              *vlan = vlan_dev_priv(dev);
        when we need to access the fields of *vlan.
      * In vlan.c/register_vlan_device, add the var *vlan pointer
              struct vlan_dev_priv *vlan;
      to cleanup the code to access the fields of vlan_dev_priv(new_dev).
      Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c0667a8
  5. 03 8月, 2013 13 次提交
  6. 02 8月, 2013 7 次提交
    • F
      ipv6: bump genid when delete/add address · 439677d7
      fan.du 提交于
      Server           Client
      2001:1::803/64  <-> 2001:1::805/64
      2001:2::804/64  <-> 2001:2::806/64
      
      Server side fib binary tree looks like this:
      
                                         (2001:/64)
                                         /
                                        /
                         ffff88002103c380
                       /                 \
           (2)        /                   \
       (2001::803/128)                     ffff880037ac07c0
                                          /               \
                                         /                 \  (3)
                            ffff880037ac0640               (2001::806/128)
                             /             \
                   (1)      /               \
              (2001::804/128)               (2001::805/128)
      
      Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
      destinate to 2001::806 with source address as 2001::804/64. That's because
      2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
      where the substantial difference between same prefix configuration and
      different prefix configuration :) So packet are still transmitted out to
      2001::806 with source address as 2001::804/64.
      
      So bump genid will clear rt in (3), and up layer protocol will eventually
      find the right one for themselves.
      
      This problem arised from the discussion in here:
      http://marc.info/?l=linux-netdev&m=137404469219410&w=4Signed-off-by: NFan Du <fan.du@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      439677d7
    • Y
      tipc: fix oops when creating server socket fails · c756891a
      Ying Xue 提交于
      When creation of TIPC internal server socket fails,
      we get an oops with the following dump:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc]
      PGD 13719067 PUD 12008067 PMD 0
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: tipc(+)
      CPU: 4 PID: 4340 Comm: insmod Not tainted 3.10.0+ #1
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      task: ffff880014360000 ti: ffff88001374c000 task.ti: ffff88001374c000
      RIP: 0010:[<ffffffffa0011f49>]  [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc]
      RSP: 0018:ffff88001374dc98  EFLAGS: 00010292
      RAX: 0000000000000000 RBX: ffff880012ac09d8 RCX: 0000000000000000
      RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffff880014360000
      RBP: ffff88001374dcb8 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0016fa0
      R13: ffffffffa0017010 R14: ffffffffa0017010 R15: ffff880012ac09d8
      FS:  0000000000000000(0000) GS:ffff880016600000(0063) knlGS:00000000f76668d0
      CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 0000000012227000 CR4: 00000000000006e0
      Stack:
      ffff88001374dcb8 ffffffffa0016fa0 0000000000000000 0000000000000001
      ffff88001374dcf8 ffffffffa0012922 ffff88001374dce8 00000000ffffffea
      ffffffffa0017100 0000000000000000 ffff8800134241a8 ffffffffa0017150
      Call Trace:
      [<ffffffffa0012922>] tipc_server_stop+0xa2/0x1b0 [tipc]
      [<ffffffffa0009995>] tipc_subscr_stop+0x15/0x20 [tipc]
      [<ffffffffa00130f5>] tipc_core_stop+0x1d/0x33 [tipc]
      [<ffffffffa001f0d4>] tipc_init+0xd4/0xf8 [tipc]
      [<ffffffffa001f000>] ? 0xffffffffa001efff
      [<ffffffff8100023f>] do_one_initcall+0x3f/0x150
      [<ffffffff81082f4d>] ? __blocking_notifier_call_chain+0x7d/0xd0
      [<ffffffff810cc58a>] load_module+0x11aa/0x19c0
      [<ffffffff810c8d60>] ? show_initstate+0x50/0x50
      [<ffffffff8190311c>] ? retint_restore_args+0xe/0xe
      [<ffffffff810cce79>] SyS_init_module+0xd9/0x110
      [<ffffffff8190dc65>] sysenter_dispatch+0x7/0x1f
      Code: 6c 24 70 4c 89 ef e8 b7 04 8f e1 8b 73 04 4c 89 e7 e8 7c 9e 32 e1 41 83 ac 24
      b8 00 00 00 01 4c 89 ef e8 eb 0a 8f e1 48 8b 43 08 <4c> 8b 68 20 4d 8d a5 48 03 00
      00 4c 89 e7 e8 04 05 8f e1 4c 89
      RIP  [<ffffffffa0011f49>] tipc_close_conn+0x59/0xb0 [tipc]
      RSP <ffff88001374dc98>
      CR2: 0000000000000020
      ---[ end trace b02321f40e4269a3 ]---
      
      We have the following call chain:
      
      tipc_core_start()
          ret = tipc_subscr_start()
              ret = tipc_server_start(){
                        server->enabled = 1;
                        ret = tipc_open_listening_sock()
                    }
      
      I.e., the server->enabled flag is unconditionally set to 1, whatever
      the return value of tipc_open_listening_sock().
      
      This causes a crash when tipc_core_start() tries to clean up
      resources after a failed initialization:
      
          if (ret == failed)
              tipc_subscr_stop()
                  tipc_server_stop(){
                      if (server->enabled)
                          tipc_close_conn(){
                              NULL reference of con->sock-sk
                              OOPS!
                      }
                  }
      
      To avoid this, tipc_server_start() should only set server->enabled
      to 1 in case of a succesful socket creation. In case of failure, it
      should release all allocated resources before returning.
      
      Problem introduced in commit c5fa7b3c
      ("tipc: introduce new TIPC server infrastructure") in v3.11-rc1.
      Note that it won't be seen often; it takes a module load under memory
      constrained conditions in order to trigger the failure condition.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c756891a
    • C
      net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL · e0d1095a
      Cong Wang 提交于
      Eliezer renames several *ll_poll to *busy_poll, but forgets
      CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.
      
      Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0d1095a
    • J
      ipv6: prevent race between address creation and removal · 8a226b2c
      Jiri Benc 提交于
      There's a race in IPv6 automatic addess assignment. The address is created
      with zero lifetime when it's added to various address lists. Before it gets
      assigned the correct lifetime, there's a window where a new address may be
      configured. This causes the semi-initiated address to be deleted in
      addrconf_verify.
      
      This was discovered as a reference leak caused by concurrent run of
      __ipv6_ifa_notify for both RTM_NEWADDR and RTM_DELADDR with the same
      address.
      
      Fix this by setting the lifetime before the address is added to
      inet6_addr_lst.
      
      A few notes:
      
      1. In addrconf_prefix_rcv, by setting update_lft to zero, the
         if (update_lft) { ... } condition is no longer executed for newly
         created addresses. This is okay, as the ifp fields are set in
         ipv6_add_addr now and ipv6_ifa_notify is called (and has been called)
         through addrconf_dad_start.
      
      2. The removal of the whole block under ifp->lock in inet6_addr_add is okay,
         too, as tstamp is initialized to jiffies in ipv6_add_addr.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a226b2c
    • J
      ipv6: move peer_addr init into ipv6_add_addr() · 3f8f5298
      Jiri Pirko 提交于
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f8f5298
    • M
      ipv6: update ip6_rt_last_gc every time GC is run · 49a18d86
      Michal Kubeček 提交于
      As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
      hold the last time garbage collector was run so that we should
      update it whenever fib6_run_gc() calls fib6_clean_all(), not only
      if we got there from ip6_dst_gc().
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49a18d86
    • M
      ipv6: prevent fib6_run_gc() contention · 2ac3ac8f
      Michal Kubeček 提交于
      On a high-traffic router with many processors and many IPv6 dst
      entries, soft lockup in fib6_run_gc() can occur when number of
      entries reaches gc_thresh.
      
      This happens because fib6_run_gc() uses fib6_gc_lock to allow
      only one thread to run the garbage collector but ip6_dst_gc()
      doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
      returns. On a system with many entries, this can take some time
      so that in the meantime, other threads pass the tests in
      ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
      the lock. They then have to run the garbage collector one after
      another which blocks them for quite long.
      
      Resolve this by replacing special value ~0UL of expire parameter
      to fib6_run_gc() by explicit "force" parameter to choose between
      spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
      force=false if gc_thresh is reached but not max_size.
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ac3ac8f
  7. 01 8月, 2013 8 次提交
    • J
      svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall · 7193bd17
      J. Bruce Fields 提交于
      The change made to rsc_parse() in
      0dc1531a "svcrpc: store gss mech in
      svc_cred" should also have been propagated to the gss-proxy codepath.
      This fixes a crash in the gss-proxy case.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7193bd17
    • J
      svcrpc: fix kfree oops in gss-proxy code · 743e2171
      J. Bruce Fields 提交于
      mech_oid.data is an array, not kmalloc()'d memory.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      743e2171
    • J
      svcrpc: fix gss-proxy xdr decoding oops · dc43376c
      J. Bruce Fields 提交于
      Uninitialized stack data was being used as the destination for memcpy's.
      
      Longer term we'll just delete some of this code; all we're doing is
      skipping over xdr that we don't care about.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      dc43376c
    • J
      svcrpc: fix gss_rpc_upcall create error · 9f96392b
      J. Bruce Fields 提交于
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9f96392b
    • N
      NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure. · 447383d2
      NeilBrown 提交于
      Since we enabled auto-tuning for sunrpc TCP connections we do not
      guarantee that there is enough write-space on each connection to
      queue a reply.
      
      If memory pressure causes the window to shrink too small, the request
      throttling in sunrpc/svc will not accept any requests so no more requests
      will be handled.  Even when pressure decreases the window will not
      grow again until data is sent on the connection.
      This means we get a deadlock:  no requests will be handled until there
      is more space, and no space will be allocated until a request is
      handled.
      
      This can be simulated by modifying svc_tcp_has_wspace to inflate the
      number of byte required and removing the 'svc_sock_setbufsize' calls
      in svc_setup_socket.
      
      I found that multiplying by 16 was enough to make the requirement
      exceed the default allocation.  With this modification in place:
         mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
      would block and eventually time out because the nfs server could not
      accept any requests.
      
      This patch relaxes the request throttling to always allow at least one
      request through per connection.  It does this by checking both
        sk_stream_min_wspace() and xprt->xpt_reserved
      are zero.
      The first is zero when the TCP transmit queue is empty.
      The second is zero when there are no RPC requests being processed.
      When both of these are zero the socket is idle and so one more
      request can safely be allowed through.
      
      Applying this patch allows the above mount command to succeed cleanly.
      Tracing shows that the allocated write buffer space quickly grows and
      after a few requests are handled, the extra tests are no longer needed
      to permit further requests to be processed.
      
      The main purpose of request throttling is to handle the case when one
      client is slow at collecting replies and the send queue gets full of
      replies that the client hasn't acknowledged (at the TCP level) yet.
      As we only change behaviour when the send queue is empty this main
      purpose is still preserved.
      Reported-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      447383d2
    • H
      ipv6: fib6_rules should return exact return value · 46b3a421
      Hannes Frederic Sowa 提交于
      With the addition of the suppress operation
      (7764a45a ("fib_rules: add .suppress
      operation") we rely on accurate error reporting of the fib_rules.actions.
      
      fib6_rule_action always returned -EAGAIN in case we could not find a
      matching route and 0 if a rule was matched. This also included a match
      for blackhole or prohibited rule actions which could get suppressed by
      the new logic.
      
      So adapt fib6_rule_action to always return the correct error code as
      its counterpart fib4_rule_action does. This also fixes a possiblity of
      nullptr-deref where we don't find a table, thus rt == NULL. Because
      the condition rt != ip6_null_entry still holdes it seems we could later
      get a nullptr bug on dereference rt->dst.
      
      v2:
      a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
         changes to the patch.
      
      Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46b3a421
    • L
      bridge: disable snooping if there is no querier · b00589af
      Linus Lüssing 提交于
      If there is no querier on a link then we won't get periodic reports and
      therefore won't be able to learn about multicast listeners behind ports,
      potentially leading to lost multicast packets, especially for multicast
      listeners that joined before the creation of the bridge.
      
      These lost multicast packets can appear since c5c23260
      ("bridge: Add multicast_querier toggle and disable queries by default")
      in particular.
      
      With this patch we are flooding multicast packets if our querier is
      disabled and if we didn't detect any other querier.
      
      A grace period of the Maximum Response Delay of the querier is added to
      give multicast responses enough time to arrive and to be learned from
      before disabling the flooding behaviour again.
      Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b00589af
    • S
      fib_rules: add .suppress operation · 7764a45a
      Stefan Tomanek 提交于
      This change adds a new operation to the fib_rules_ops struct; it allows the
      suppression of routing decisions if certain criteria are not met by its
      results.
      
      The first implemented constraint is a minimum prefix length added to the
      structures of routing rules. If a rule is added with a minimum prefix length
      >0, only routes meeting this threshold will be considered. Any other (more
      general) routing table entries will be ignored.
      
      When configuring a system with multiple network uplinks and default routes, it
      is often convinient to reference the main routing table multiple times - but
      omitting the default route. Using this patch and a modified "ip" utility, this
      can be achieved by using the following command sequence:
      
        $ ip route add table secuplink default via 10.42.23.1
      
        $ ip rule add pref 100            table main prefixlength 1
        $ ip rule add pref 150 fwmark 0xA table secuplink
      
      With this setup, packets marked 0xA will be processed by the additional routing
      table "secuplink", but only if no suitable route in the main routing table can
      be found. By using a minimal prefixlength of 1, the default route (/0) of the
      table "main" is hidden to packets processed by rule 100; packets traveling to
      destinations with more specific routing entries are processed as usual.
      Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7764a45a