1. 19 3月, 2015 9 次提交
  2. 18 3月, 2015 15 次提交
    • Y
      tipc: withdraw tipc topology server name when namespace is deleted · 2b9bb7f3
      Ying Xue 提交于
      The TIPC topology server is a per namespace service associated with the
      tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
      before we call sk_release_kernel because the kernel socket release is
      done in init_net and trying to withdraw a TIPC name published in another
      namespace will fail with an error as:
      
      [  170.093264] Unable to remove local publication
      [  170.093264] (type=1, lower=1, ref=2184244004, key=2184244005)
      
      We fix this by breaking the association between the topology server name
      and socket before calling sk_release_kernel.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b9bb7f3
    • Y
      tipc: fix a potential deadlock when nametable is purged · 8460504b
      Ying Xue 提交于
      [   28.531768] =============================================
      [   28.532322] [ INFO: possible recursive locking detected ]
      [   28.532322] 3.19.0+ #194 Not tainted
      [   28.532322] ---------------------------------------------
      [   28.532322] insmod/583 is trying to acquire lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]
      [   28.532322] but task is already holding lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] other info that might help us debug this:
      [   28.532322]  Possible unsafe locking scenario:
      [   28.532322]
      [   28.532322]        CPU0
      [   28.532322]        ----
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]
      [   28.532322]  *** DEADLOCK ***
      [   28.532322]
      [   28.532322]  May be due to missing lock nesting notation
      [   28.532322]
      [   28.532322] 3 locks held by insmod/583:
      [   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
      [   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
      [   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] stack backtrace:
      [   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
      [   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
      [   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
      [   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
      [   28.532322] Call Trace:
      [   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
      [   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
      [   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
      [   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
      [   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
      [   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
      [   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
      [   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
      [   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
      [   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
      [   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
      [   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
      [   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
      [   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
      [   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
      [   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
      [   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
      [   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
      [   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
      [   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
      
      Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
      remove a publication with a name sequence, the name sequence's lock
      is held. However, when tipc_nametbl_remove_publ() calling
      tipc_nameseq_remove_publ() to remove the publication, it first tries
      to query name sequence instance with the publication, and then holds
      the lock of the found name sequence. But as the lock may be already
      taken in tipc_purge_publications(), deadlock happens like above
      scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name
      sequence's lock, the deadlock can be avoided if it's directly invoked
      by tipc_purge_publications().
      
      Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8460504b
    • Y
      tipc: fix netns refcnt leak · 76100a8a
      Ying Xue 提交于
      When the TIPC module is loaded, we launch a topology server in kernel
      space, which in its turn is creating TIPC sockets for communication
      with topology server users. Because both the socket's creator and
      provider reside in the same module, it is necessary that the TIPC
      module's reference count remains zero after the server is started and
      the socket created; otherwise it becomes impossible to perform "rmmod"
      even on an idle module.
      
      Currently, we achieve this by defining a separate "tipc_proto_kern"
      protocol struct, that is used only for kernel space socket allocations.
      This structure has the "owner" field set to NULL, which restricts the
      module reference count from being be bumped when sk_alloc() for local
      sockets is called. Furthermore, we have defined three kernel-specific
      functions, tipc_sock_create_local(), tipc_sock_release_local() and
      tipc_sock_accept_local(), to avoid the module counter being modified
      when module local sockets are created or deleted. This has worked well
      until we introduced name space support.
      
      However, after name space support was introduced, we have observed that
      a reference count leak occurs, because the netns counter is not
      decremented in tipc_sock_delete_local().
      
      This commit remedies this problem. But instead of just modifying
      tipc_sock_delete_local(), we eliminate the whole parallel socket
      handling infrastructure, and start using the regular sk_create_kern(),
      kernel_accept() and sk_release_kernel() calls. Since those functions
      manipulate the module counter, we must now compensate for that by
      explicitly decrementing the counter after module local sockets are
      created, and increment it just before calling sk_release_kernel().
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reported-by: NCong Wang <cwang@twopensource.com>
      Tested-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76100a8a
    • E
      inet: fix request sock refcounting · 0470c8ca
      Eric Dumazet 提交于
      While testing last patch series, I found req sock refcounting was wrong.
      
      We must set skc_refcnt to 1 for all request socks added in hashes,
      but also on request sockets created by FastOpen or syncookies.
      
      It is tricky because we need to defer this initialization so that
      future RCU lookups do not try to take a refcount on a not yet
      fully initialized request socket.
      
      Also get rid of ireq_refcnt alias.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: 13854e5a ("inet: add proper refcounting to request sock")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0470c8ca
    • E
      inet: avoid fastopen lock for regular accept() · e3d95ad7
      Eric Dumazet 提交于
      It is not because a TCP listener is FastOpen ready that
      all incoming sockets actually used FastOpen.
      
      Avoid taking queue->fastopenq->lock if not needed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3d95ad7
    • E
      tcp: rename struct tcp_request_sock listener · 9439ce00
      Eric Dumazet 提交于
      The listener field in struct tcp_request_sock is a pointer
      back to the listener. We now have req->rsk_listener, so TCP
      only needs one boolean and not a full pointer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9439ce00
    • E
      inet: add rsk_listener field to struct request_sock · 4e9a578e
      Eric Dumazet 提交于
      Once we'll be able to lookup request sockets in ehash table,
      we'll need to get access to listener which created this request.
      
      This avoid doing a lookup to find the listener, which benefits
      for a more solid SO_REUSEPORT, and is needed once we no
      longer queue request sock into a listener private queue.
      
      Note that 'struct tcp_request_sock'->listener could be reduced
      to a single bit, as TFO listener should match req->rsk_listener.
      TFO will no longer need to hold a reference on the listener.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e9a578e
    • E
      inet: uninline inet_reqsk_alloc() · e49bb337
      Eric Dumazet 提交于
      inet_reqsk_alloc() is becoming fat and should not be inlined.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e49bb337
    • E
      inet: add sk_listener argument to inet_reqsk_alloc() · 407640de
      Eric Dumazet 提交于
      listener socket can be used to set net pointer, and will
      be later used to hold a reference on listener.
      
      Add a const qualifier to first argument (struct request_sock_ops *),
      and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      407640de
    • E
      tcp: uninline tcp_oow_rate_limited() · 7970ddc8
      Eric Dumazet 提交于
      tcp_oow_rate_limited() is hardly used in fast path, there is
      no point inlining it.
      Signed-of-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7970ddc8
    • E
      tcp: move tcp_openreq_init() to tcp_input.c · 1bfc4438
      Eric Dumazet 提交于
      This big helper is called once from tcp_conn_request(), there is no
      point having it in an include. Compiler will inline it anyway.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bfc4438
    • E
      netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support · a9407000
      Eric Dumazet 提交于
      TCP request socks soon will be visible in ehash table.
      
      xt_socket will be able to match them, but first we need
      to make sure to not consider them as full sockets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9407000
    • E
      netfilter: tproxy: prepare TCP_NEW_SYN_RECV support · 8b580147
      Eric Dumazet 提交于
      TCP request socks soon will be visible in ehash table.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b580147
    • E
      netfilter: use sk_fullsock() helper · a8399231
      Eric Dumazet 提交于
      Upcoming request sockets have TCP_NEW_SYN_RECV state and should
      be special cased a bit like TCP_TIME_WAIT sockets.
      
      Signed-off-by; Eric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8399231
    • A
      bpf: allow BPF programs access 'protocol' and 'vlan_tci' fields · c2497395
      Alexei Starovoitov 提交于
      as a follow on to patch 70006af9 ("bpf: allow eBPF access skb fields")
      this patch allows 'protocol' and 'vlan_tci' fields to be accessible
      from extended BPF programs.
      
      The usage of 'protocol', 'vlan_present' and 'vlan_tci' fields is the same as
      corresponding SKF_AD_PROTOCOL, SKF_AD_VLAN_TAG_PRESENT and SKF_AD_VLAN_TAG
      accesses in classic BPF.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2497395
  3. 17 3月, 2015 7 次提交
  4. 16 3月, 2015 4 次提交
  5. 15 3月, 2015 5 次提交
    • F
      net: dsa: do not use slave MII bus for fixed PHYs · 96026d05
      Florian Fainelli 提交于
      Commit cd28a1a9 ("net: dsa: fully divert PHY reads/writes if
      requested") introduced a check for particular PHYs that need to be
      accessed using the slave MII bus created by DSA, but this check was too
      inclusive. This would prevent fixed PHYs from being successfully
      registered because those should not go through the slave MII bus created
      by DSA.
      
      Make sure we check that the PHY is not a fixed PHY to prevent that from
      happening.
      
      Fixes: cd28a1a9 ("net: dsa: fully divert PHY reads/writes if requested")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96026d05
    • E
      inet_diag: factorize code in new inet_diag_msg_common_fill() helper · a4458343
      Eric Dumazet 提交于
      Now the three type of sockets share a common base, we can factorize
      code in inet_diag_msg_common_fill().
      
      inet_diag_entry no longer requires saddr_storage & daddr_storage
      and the extra copies.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4458343
    • E
      inet_diag: adjust inet_sk_diag_fill() bug condition · a07c9207
      Eric Dumazet 提交于
      inet_sk_diag_fill() only copes with non timewait and non request socks
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a07c9207
    • E
      inet: fill request sock ir_iif for IPv4 · 16f86165
      Eric Dumazet 提交于
      Once request socks will be in ehash table, they will need to have
      a valid ir_iff field.
      
      This is currently true only for IPv6. This patch extends support
      for IPv4 as well.
      
      This means inet_diag_fill_req() can now properly use ir_iif,
      which is better for IPv6 link locals anyway, as request sockets
      and established sockets will propagate consistent netlink idiag_if.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16f86165
    • J
      tipc: clean up handling of message priorities · e3eea1eb
      Jon Paul Maloy 提交于
      Messages transferred by TIPC are assigned an "importance priority", -an
      integer value indicating how to treat the message when there is link or
      destination socket congestion.
      
      There is no separate header field for this value. Instead, the message
      user values have been chosen in ascending order according to perceived
      importance, so that the message user field can be used for this.
      
      This is not a good solution. First, we have many more users than the
      needed priority levels, so we end up with treating more priority
      levels than necessary. Second, the user field cannot always
      accurately reflect the priority of the message. E.g., a message
      fragment packet should really have the priority of the enveloped
      user data message, and not the priority of the MSG_FRAGMENTER user.
      Until now, we have been working around this problem in different ways,
      but it is now time to implement a consistent way of handling such
      priorities, although still within the constraint that we cannot
      allocate any more bits in the regular data message header for this.
      
      In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
      that will be the only one used apart from the four (lower) user data
      levels. All non-data messages map down to this priority. Furthermore,
      we take some free bits from the MSG_FRAGMENTER header and allocate
      them to store the priority of the enveloped message. We then adjust
      the functions msg_importance()/msg_set_importance() so that they
      read/set the correct header fields depending on user type.
      
      This small protocol change is fully compatible, because the code at
      the receiving end of a link currently reads the importance level
      only from user data messages, where there is no change.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3eea1eb