1. 30 3月, 2015 4 次提交
    • Y
      tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue 提交于
      TIPC node hash node table is protected with rcu lock on read side.
      tipc_node_find() is used to look for a node object with node address
      through iterating the hash node table. As the entire process of what
      tipc_node_find() traverses the table is guarded with rcu read lock,
      it's safe for us. However, when callers use the node object returned
      by tipc_node_find(), there is no rcu read lock applied. Therefore,
      this is absolutely unsafe for callers of tipc_node_find().
      
      Now we introduce a reference counter for node structure. Before
      tipc_node_find() returns node object to its caller, it first increases
      the reference counter. Accordingly, after its caller used it up,
      it decreases the counter again. This can prevent a node being used by
      one thread from being freed by another thread.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a0f6ebe
    • Y
      tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue 提交于
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct the sequence of grabbing n_ptr->lock and bclink->lock
      should be that the former is first held and the latter is then taken,
      which exactly happened on CPU1. But especially when the retransmission
      of broadcast link is failed, bclink->lock is first held in
      tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
      called by tipc_link_retransmit() subsequently, which is demonstrated on
      CPU0. As a result, deadlock occurs.
      
      If the order of holding the two locks happening on CPU0 is reversed, the
      deadlock risk will be relieved. Therefore, the node lock taken in
      link_retransmit_failure() originally is moved to tipc_bclink_rcv()
      so that it's obtained before bclink lock. But the precondition of
      the adjustment of node lock is that responding to bclink reset event
      must be moved from tipc_bclink_unlock() to tipc_node_unlock().
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b952b2be
    • E
      tcp: tcp_syn_flood_action() can be static · 41d25fe0
      Eric Dumazet 提交于
      After commit 1fb6f159 ("tcp: add tcp_conn_request"),
      tcp_syn_flood_action() is no longer used from IPv6.
      
      We can make it static, by moving it above tcp_conn_request()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NOctavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41d25fe0
    • W
      fib6: install fib6 ops in the last step · 85b99092
      WANG Cong 提交于
      We should not commit the new ops until we finish
      all the setup, otherwise we have to NULL it on failure.
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85b99092
  2. 26 3月, 2015 6 次提交
  3. 25 3月, 2015 15 次提交
  4. 24 3月, 2015 15 次提交
    • H
      ipv6: introduce idgen_delay and idgen_retries knobs · 1855b7c3
      Hannes Frederic Sowa 提交于
      This is specified by RFC 7217.
      
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1855b7c3
    • H
      ipv6: do retries on stable privacy addresses · 5f40ef77
      Hannes Frederic Sowa 提交于
      If a DAD conflict is detected, we want to retry privacy stable address
      generation up to idgen_retries (= 3) times with a delay of idgen_delay
      (= 1 second). Add the logic to addrconf_dad_failure.
      
      By design, we don't clean up dad failed permanent addresses.
      
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f40ef77
    • H
      ipv6: collapse state_lock and lock · 8e8e676d
      Hannes Frederic Sowa 提交于
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e8e676d
    • H
      ipv6: introduce IFA_F_STABLE_PRIVACY flag · 64236f3f
      Hannes Frederic Sowa 提交于
      We need to mark appropriate addresses so we can do retries in case their
      DAD failed.
      
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64236f3f
    • H
      ipv6: generation of stable privacy addresses for link-local and autoconf · 622c81d5
      Hannes Frederic Sowa 提交于
      This patch implements the stable privacy address generation for
      link-local and autoconf addresses as specified in RFC7217.
      
        RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)
      
      is the RID (random identifier). As the hash function F we chose one
      round of sha1. Prefix will be either the link-local prefix or the
      router advertised one. As Net_Iface we use the MAC address of the
      device. DAD_Counter and secret_key are implemented as specified.
      
      We don't use Network_ID, as it couples the code too closely to other
      subsystems. It is specified as optional in the RFC.
      
      As Net_Iface we only use the MAC address: we simply have no stable
      identifier in the kernel we could possibly use: because this code might
      run very early, we cannot depend on names, as they might be changed by
      user space early on during the boot process.
      
      A new address generation mode is introduced,
      IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to
      none or eui64 address configuration mode although the stable_secret is
      already set.
      
      We refuse writes to ipv6/conf/all/stable_secret but only allow
      ipv6/conf/default/stable_secret and the interface specific file to be
      written to. The default stable_secret is used as the parameter for the
      namespace, the interface specific can overwrite the secret, e.g. when
      switching a network configuration from one system to another while
      inheriting the secret.
      
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      622c81d5
    • H
      ipv6: introduce secret_stable to ipv6_devconf · 3d1bec99
      Hannes Frederic Sowa 提交于
      This patch implements the procfs logic for the stable_address knob:
      The secret is formatted as an ipv6 address and will be stored per
      interface and per namespace. We track initialized flag and return EIO
      errors until the secret is set.
      
      We don't inherit the secret to newly created namespaces.
      
      Cc: Erik Kline <ek@google.com>
      Cc: Fernando Gont <fgont@si6networks.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d1bec99
    • H
      tipc: Use default rhashtable hashfn · 6d022949
      Herbert Xu 提交于
      This patch removes the explicit jhash value for the hashfn parameter
      of rhashtable.  The default is now jhash so removing the setting
      makes no difference apart from making one less copy of jhash in
      the kernel.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d022949
    • H
      netlink: Use default rhashtable hashfn · 11b58ba1
      Herbert Xu 提交于
      This patch removes the explicit jhash value for the hashfn parameter
      of rhashtable.  As the key length is a multiple of 4, this means that
      we will actually end up using jhash2.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11b58ba1
    • A
      af_packet: pass checksum validation status to the user · 682f048b
      Alexander Drozdov 提交于
      Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
      af_packet user that at least the transport header checksum
      has been already validated.
      
      For now, the flag may be set for incoming packets only.
      Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      682f048b
    • A
      af_packet: make tpacket_rcv to not set status value before run_filter · 68c2e5de
      Alexander Drozdov 提交于
      It is just an optimization. We don't need the value of status variable
      if the packet is filtered.
      Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68c2e5de
    • F
      inet: fix double request socket freeing · c6973669
      Fan Du 提交于
      Eric Hugne reported following error :
      
      I'm hitting this warning on latest net-next when i try to SSH into a machine
      with eth0 added to a bridge (but i think the problem is older than that)
      
      Steps to reproduce:
      node2 ~ # brctl addif br0 eth0
      [  223.758785] device eth0 entered promiscuous mode
      node2 ~ # ip link set br0 up
      [  244.503614] br0: port 1(eth0) entered forwarding state
      [  244.505108] br0: port 1(eth0) entered forwarding state
      node2 ~ # [  251.160159] ------------[ cut here ]------------
      [  251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
      [  251.162077] Modules linked in:
      [  251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
      [  251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  251.164078]  ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
      [  251.165084]  0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
      [  251.166195]  ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
      [  251.167203] Call Trace:
      [  251.167533]  [<ffffffff8162ace4>] dump_stack+0x45/0x57
      [  251.168206]  [<ffffffff8104da85>] warn_slowpath_common+0x85/0xc0
      [  251.169239]  [<ffffffff8104db65>] warn_slowpath_null+0x15/0x20
      [  251.170271]  [<ffffffff81559d51>] tcp_v4_err+0x6b1/0x720
      [  251.171408]  [<ffffffff81630d03>] ? _raw_read_lock_irq+0x3/0x10
      [  251.172589]  [<ffffffff81534e20>] ? inet_del_offload+0x40/0x40
      [  251.173366]  [<ffffffff81569295>] icmp_socket_deliver+0x65/0xb0
      [  251.174134]  [<ffffffff815693a2>] icmp_unreach+0xc2/0x280
      [  251.174820]  [<ffffffff8156a82d>] icmp_rcv+0x2bd/0x3a0
      [  251.175473]  [<ffffffff81534ea2>] ip_local_deliver_finish+0x82/0x1e0
      [  251.176282]  [<ffffffff815354d8>] ip_local_deliver+0x88/0x90
      [  251.177004]  [<ffffffff815350f0>] ip_rcv_finish+0xf0/0x310
      [  251.177693]  [<ffffffff815357bc>] ip_rcv+0x2dc/0x390
      [  251.178336]  [<ffffffff814f5da3>] __netif_receive_skb_core+0x713/0xa20
      [  251.179170]  [<ffffffff814f7fca>] __netif_receive_skb+0x1a/0x80
      [  251.179922]  [<ffffffff814f97d4>] process_backlog+0x94/0x120
      [  251.180639]  [<ffffffff814f9612>] net_rx_action+0x1e2/0x310
      [  251.181356]  [<ffffffff81051267>] __do_softirq+0xa7/0x290
      [  251.182046]  [<ffffffff81051469>] run_ksoftirqd+0x19/0x30
      [  251.182726]  [<ffffffff8106cc23>] smpboot_thread_fn+0x153/0x1d0
      [  251.183485]  [<ffffffff8106cad0>] ? SyS_setgroups+0x130/0x130
      [  251.184228]  [<ffffffff8106935e>] kthread+0xee/0x110
      [  251.184871]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.185690]  [<ffffffff81631108>] ret_from_fork+0x58/0x90
      [  251.186385]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
      [  251.187216] ---[ end trace c947fc7b24e42ea1 ]---
      [  259.542268] br0: port 1(eth0) entered forwarding state
      
      Remove the double calls to reqsk_put()
      
      [edumazet] :
      
      I got confused because reqsk_timer_handler() _has_ to call
      reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
      the timer handler holds a reference on req.
      Signed-off-by: NFan Du <fan.du@intel.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NErik Hugne <erik.hugne@ericsson.com>
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6973669
    • A
      fib_trie: Fix regression in handling of inflate/halve failure · b6f15f82
      Alexander Duyck 提交于
      When I updated the code to address a possible null pointer dereference in
      resize I ended up reverting an exception handling fix for the suffix length
      in the event that inflate or halve failed.  This change is meant to correct
      that by reverting the earlier fix and instead simply getting the parent
      again after inflate has been completed to avoid the possible null pointer
      issue.
      
      Fixes: ddb4b9a1 ("fib_trie: Address possible NULL pointer dereference in resize")
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6f15f82
    • Y
      net: Move the comment about unsettable socket-level options to default clause... · 443b5991
      YOSHIFUJI Hideaki/吉藤英明 提交于
      net: Move the comment about unsettable socket-level options to default clause and update its reference.
      
      We implement the SO_SNDLOWAT etc not to be settable and return
      ENOPROTOOPT per 1003.1g 7.  Move the comment to appropriate
      position and update the reference.
      Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      443b5991
    • E
      ipv6: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets · 52036a43
      Eric Dumazet 提交于
      dccp_v6_err() can restrict lookups to ehash table, and not to listeners.
      
      Note this patch creates the infrastructure, but this means that ICMP
      messages for request sockets are ignored until complete conversion.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52036a43
    • E
      ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets · 85645bab
      Eric Dumazet 提交于
      dccp_v4_err() can restrict lookups to ehash table, and not to listeners.
      
      Note this patch creates the infrastructure, but this means that ICMP
      messages for request sockets are ignored until complete conversion.
      
      New dccp_req_err() helper is exported so that we can use it in IPv6
      in following patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85645bab