1. 01 9月, 2012 3 次提交
  2. 31 8月, 2012 7 次提交
    • P
      netfilter: nf_conntrack: fix racy timer handling with reliable events · 5b423f6a
      Pablo Neira Ayuso 提交于
      Existing code assumes that del_timer returns true for alive conntrack
      entries. However, this is not true if reliable events are enabled.
      In that case, del_timer may return true for entries that were
      just inserted in the dying list. Note that packets / ctnetlink may
      hold references to conntrack entries that were just inserted to such
      list.
      
      This patch fixes the issue by adding an independent timer for
      event delivery. This increases the size of the ecache extension.
      Still we can revisit this later and use variable size extensions
      to allocate this area on demand.
      Tested-by: NOliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      5b423f6a
    • E
      ipv4: must use rcu protection while calling fib_lookup · c5ae7d41
      Eric Dumazet 提交于
      Following lockdep splat was reported by Pavel Roskin :
      
      [ 1570.586223] ===============================
      [ 1570.586225] [ INFO: suspicious RCU usage. ]
      [ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
      [ 1570.586229] -------------------------------
      [ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
      [ 1570.586233]
      [ 1570.586233] other info that might help us debug this:
      [ 1570.586233]
      [ 1570.586236]
      [ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
      [ 1570.586238] 2 locks held by Chrome_IOThread/4467:
      [ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
      [ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
      [ 1570.586260]
      [ 1570.586260] stack backtrace:
      [ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
      [ 1570.586265] Call Trace:
      [ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
      [ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
      [ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
      [ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
      [ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
      [ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
      [ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
      [ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
      [ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
      [ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
      [ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
      [ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
      [ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
      [ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
      [ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
      [ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
      [ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
      [ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
      [ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NPavel Roskin <proski@gnu.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5ae7d41
    • F
      net: ipv4: ipmr_expire_timer causes crash when removing net namespace · acbb219d
      Francesco Ruggeri 提交于
      When tearing down a net namespace, ipv4 mr_table structures are freed
      without first deactivating their timers. This can result in a crash in
      run_timer_softirq.
      This patch mimics the corresponding behaviour in ipv6.
      Locking and synchronization seem to be adequate.
      We are about to kfree mrt, so existing code should already make sure that
      no other references to mrt are pending or can be created by incoming traffic.
      The functions invoked here do not cause new references to mrt or other
      race conditions to be created.
      Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
      Both ipmr_expire_process (whose completion we may have to wait in
      del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
      or other synchronizations when needed, and they both only modify mrt.
      
      Tested in Linux 3.4.8.
      Signed-off-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acbb219d
    • E
      netpoll: provide an IP ident in UDP frames · ee130409
      Eric Dumazet 提交于
      Let's fill IP header ident field with a meaningful value,
      it might help some setups.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee130409
    • X
      l2tp: avoid to use synchronize_rcu in tunnel free function · 99469c32
      xeb@mail.ru 提交于
      Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
      atomic.
      Signed-off-by: NDmitry Kozlov <xeb@mail.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99469c32
    • P
      netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation · 3f509c68
      Pablo Neira Ayuso 提交于
      We're hitting bug while trying to reinsert an already existing
      expectation:
      
      kernel BUG at kernel/timer.c:895!
      invalid opcode: 0000 [#1] SMP
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
       [<ffffffff812d423a>] ? in4_pton+0x72/0x131
       [<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
       [<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
       [<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
       [<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
       [<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
      
      We have to remove the RTP expectation if the RTCP expectation hits EBUSY
      since we keep trying with other ports until we succeed.
      Reported-by: NRafal Fitt <rafalf@aplusc.com.pl>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3f509c68
    • G
      net: dev: fix the incorrect hold of net namespace's lo device · 6549dd43
      Gao feng 提交于
      When moving a net device from one net namespace to another
      net namespace,dev_change_net_namespace calls NETDEV_DOWN
      event,so the original net namespace's dst entries which
      beloned to this net device will be put into dst_garbage
      list.
      
      then dev_change_net_namespace will set this net device's
      net to the new net namespace.
      
      If we unregister this net device's driver, this will trigger
      the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
      and get this net device's dst entries from dst_garbage list,
      put these entries' dev to the new net namespace's lo device.
      
      It's not what we want,actually we need these dst entries hold
      the original net namespace's lo device,this incorrect device
      holding will trigger emg message like below.
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      so we should call NETDEV_UNREGISTER_FINAL event in
      dev_change_net_namespace too,in order to make sure dst entries
      already in the dst_garbage list, we need rcu_barrier before we
      call NETDEV_UNREGISTER_FINAL event.
      
      With help form Eric Dumazet.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6549dd43
  3. 30 8月, 2012 4 次提交
  4. 25 8月, 2012 5 次提交
    • Y
      tcp: fix cwnd reduction for non-sack recovery · 7c4a56fe
      Yuchung Cheng 提交于
      The cwnd reduction in fast recovery is based on the number of packets
      newly delivered per ACK. For non-sack connections every DUPACK
      signifies a packet has been delivered, but the sender mistakenly
      skips counting them for cwnd reduction.
      
      The fix is to compute newly_acked_sacked after DUPACKs are accounted
      in sacked_out for non-sack connections.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c4a56fe
    • J
      vlan: add helper which can be called to see if device is used by vlan · 9b361c13
      Jiri Pirko 提交于
      also, remove unused vlan_info definition from header
      
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b361c13
    • P
      netlink: fix possible spoofing from non-root processes · 20e1db19
      Pablo Neira Ayuso 提交于
      Non-root user-space processes can send Netlink messages to other
      processes that are well-known for being subscribed to Netlink
      asynchronous notifications. This allows ilegitimate non-root
      process to send forged messages to Netlink subscribers.
      
      The userspace process usually verifies the legitimate origin in
      two ways:
      
      a) Socket credentials. If UID != 0, then the message comes from
         some ilegitimate process and the message needs to be dropped.
      
      b) Netlink portID. In general, portID == 0 means that the origin
         of the messages comes from the kernel. Thus, discarding any
         message not coming from the kernel.
      
      However, ctnetlink sets the portID in event messages that has
      been triggered by some user-space process, eg. conntrack utility.
      So other processes subscribed to ctnetlink events, eg. conntrackd,
      know that the event was triggered by some user-space action.
      
      Neither of the two ways to discard ilegitimate messages coming
      from non-root processes can help for ctnetlink.
      
      This patch adds capability validation in case that dst_pid is set
      in netlink_sendmsg(). This approach is aggressive since existing
      applications using any Netlink bus to deliver messages between
      two user-space processes will break. Note that the exception is
      NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
      userspace communication.
      
      Still, if anyone wants that his Netlink bus allows netlink-to-netlink
      userspace, then they can set NL_NONROOT_SEND. However, by default,
      I don't think it makes sense to allow to use NETLINK_ROUTE to
      communicate two processes that are sending no matter what information
      that is not related to link/neighbouring/routing. They should be using
      NETLINK_USERSOCK instead for that.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20e1db19
    • B
      net: Set device operstate at registration time · 8f4cccbb
      Ben Hutchings 提交于
      The operstate of a device is initially IF_OPER_UNKNOWN and is updated
      asynchronously by linkwatch after each change of carrier state
      reported by the driver.  The default carrier state of a net device is
      on, and this will never be changed on drivers that do not support
      carrier detection, thus the operstate remains IF_OPER_UNKNOWN.
      
      For devices that do support carrier detection, the driver must set the
      carrier state to off initially, then poll the hardware state when the
      device is opened.  However, we must not activate linkwatch for a
      unregistered device, and commit b4730016 ('net: Do not fire linkwatch
      events until the device is registered.') ensured that we don't.  But
      this means that the operstate for many devices that support carrier
      detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.
      
      The same issue exists with the dormant state.
      
      The proper initialisation sequence, avoiding a race with opening of
      the device, is:
      
              rtnl_lock();
              rc = register_netdevice(dev);
              if (rc)
                      goto out_unlock;
              netif_carrier_off(dev); /* or netif_dormant_on(dev) */
              rtnl_unlock();
      
      but it seems silly that this should have to be repeated in so many
      drivers.  Further, the operstate seen immediately after opening the
      device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
      linkwatch.
      
      Commit 22604c86 ('net: Fix for initial link state in 2.6.28') attempted
      to fix this by setting the operstate synchronously, but it was
      reverted as it could lead to deadlock.
      
      This initialises the operstate synchronously at registration time
      only.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f4cccbb
    • N
      cls_cgroup: Allow classifier cgroups to have their classid reset to 0 · 3afa6d00
      Neil Horman 提交于
      The network classifier cgroup initalizes each cgroups instance classid value to
      0.  However, the sock_update_classid function only updates classid's in sockets
      if the tasks cgroup classid is not zero, and if it differs from the current
      classid.  The later check is to prevent cache line dirtying, but the former is
      detrimental, as it prevents resetting a classid for a cgroup to 0.  While this
      is not a common action, it has administrative usefulness (if the admin wants to
      disable classification of a certain group temporarily for instance).
      
      Easy fix, just remove the zero check.  Tested successfully by myself
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3afa6d00
  5. 24 8月, 2012 3 次提交
  6. 23 8月, 2012 18 次提交