1. 01 Nov, 2017 13 commits
  2. 31 Oct, 2017 1 commit
  3. 29 Oct, 2017 26 commits
    • C
      net_sched: fix call_rcu() race on act_sample module removal · 46e235c1
      Cong Wang committed
      Similar to commit c78e1746
      ("net: sched: fix call_rcu() race on classifier module unloads"),
      we need to wait for the in-flight RCU callback tcf_sample_cleanup_rcu().

      Cc: Yotam Gigi <yotamg@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba
      Cong Wang committed
      After the previous patches, it is now safe to assert that
      tcf_exts_destroy() is always called with the RTNL lock held.

      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in tcindex filter · 27ce4f05
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in rsvp filter · d4f84a41
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in route filter · c2f3f31d
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in u32 filter · c0d378ef
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in matchall filter · df2735ee
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in fw filter · e071dff2
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in flower filter · 0552c8af
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in flow filter · 94cdb475
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in cgroup filter · b1b5b04f
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in bpf filter · e910af67
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: use tcf_queue_work() in basic filter · c96a4838
      Cong Wang committed
      Defer the tcf_exts_destroy() call in the RCU callback to the
      tc filter workqueue, and take the RTNL lock there.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Cong Wang committed
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback can defer its
      action-destroy work to this workqueue. The helper
      tcf_queue_work() is introduced for them to use.

      Because we hold the RTNL lock when calling tcf_block_put(), we
      cannot simply flush works inside it. Therefore we have to
      defer it again to this workqueue and make sure all in-flight RCU
      callbacks have already queued their work before this one. In
      other words, we ensure this is the last one to execute, to
      prevent any use-after-free.

      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.

      Please also see the code comments below.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
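The ordering argument in the workqueue commit above (every filter's destroy work is queued first, the block's own free work is queued last, and the queue runs in FIFO order) can be illustrated with a toy, single-threaded user-space model. This is only a sketch under stated assumptions: `ordered_wq`, `wq_queue()`, `wq_flush()`, `block` and `demo_teardown()` are hypothetical names invented for illustration, and the model ignores RCU grace periods and the RTNL lock entirely. It is not the kernel's implementation.

```c
#include <stddef.h>

/* Toy model of an ordered work queue; all names here are hypothetical. */
#define MAX_WORK 16

typedef void (*work_fn)(void *arg);

struct work_item {
    work_fn fn;
    void *arg;
};

struct ordered_wq {
    struct work_item items[MAX_WORK];
    size_t head, tail;
};

/* Append work at the tail; returns 0 on success, -1 if the queue is full. */
static int wq_queue(struct ordered_wq *wq, work_fn fn, void *arg)
{
    if (wq->tail == MAX_WORK)
        return -1;
    wq->items[wq->tail].fn = fn;
    wq->items[wq->tail].arg = arg;
    wq->tail++;
    return 0;
}

/* Run all queued work strictly in the order it was queued. */
static void wq_flush(struct ordered_wq *wq)
{
    while (wq->head < wq->tail) {
        struct work_item *w = &wq->items[wq->head++];
        w->fn(w->arg);
    }
}

struct block {
    int live_filters;    /* filters whose destroy work has not run yet */
    int freed;           /* set once the block itself has been "freed" */
    int freed_too_early; /* set if any work observed a bad ordering */
};

/* Stands in for a filter's deferred destroy work. */
static void filter_destroy_work(void *arg)
{
    struct block *b = arg;

    if (b->freed)
        b->freed_too_early = 1; /* would be a use-after-free */
    b->live_filters--;
}

/* Stands in for the block's final free work; must run last. */
static void block_free_work(void *arg)
{
    struct block *b = arg;

    if (b->live_filters != 0)
        b->freed_too_early = 1;
    b->freed = 1;
}

/* Returns 1 if the teardown ran in a safe order, 0 otherwise. */
static int demo_teardown(int nfilters)
{
    struct ordered_wq wq = { .head = 0, .tail = 0 };
    struct block b = { .live_filters = nfilters };
    int i;

    for (i = 0; i < nfilters; i++)
        if (wq_queue(&wq, filter_destroy_work, &b) != 0)
            return 0;
    if (wq_queue(&wq, block_free_work, &b) != 0)
        return 0;
    wq_flush(&wq);
    return b.freed && !b.freed_too_early && b.live_filters == 0;
}
```

Because the free work is queued after every destroy work and the queue preserves order, the block is only released once no filter work can still touch it.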
    • X
      sctp: fix some type cast warnings introduced since very beginning · 978aa047
      Xin Long committed
      These warnings were found by running 'make C=2 M=net/sctp/'.
      They have been there since the very beginning.

      Note that after this patch, there is still one warning left in
      sctp_outq_flush():
        sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)

      Since it has been moved to sctp_stream_outq_migrate on net-next,
      to avoid extra work when merging net-next to net, I will post
      the fix for it after the merge is done.

      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • X
      sctp: fix a type cast warnings that causes a_rwnd gets the wrong value · f6fc6bc0
      Xin Long committed
      These warnings were found by running 'make C=2 M=net/sctp/'.

      Commit d4d6fb57 ("sctp: Try not to change a_rwnd when faking a
      SACK from SHUTDOWN.") expected to use the peer's old rwnd and add
      our flight size to the a_rwnd. But with the wrong endianness, it may
      not work as well as expected.

      So fix it by converting to the right value.

      Fixes: d4d6fb57 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
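The endianness problem described in the a_rwnd commit above can be sketched in user space: arithmetic has to happen on the host-order value, and the result is converted back to network order afterwards. The function names below are hypothetical and this is not the kernel's code; it only shows the general pattern with the standard `ntohl()`/`htonl()` helpers.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * a_rwnd_be is in network byte order (as carried on the wire),
 * flight_size is in host byte order. Returns network byte order.
 */
static uint32_t a_rwnd_add_flight(uint32_t a_rwnd_be, uint32_t flight_size)
{
    /* Convert to host order, add, then convert back. */
    return htonl(ntohl(a_rwnd_be) + flight_size);
}

/*
 * The buggy pattern: adding a host-order value directly to a
 * network-order value. On a little-endian machine this generally
 * yields the wrong number; on big-endian it happens to work.
 */
static uint32_t a_rwnd_add_flight_buggy(uint32_t a_rwnd_be, uint32_t flight_size)
{
    return a_rwnd_be + flight_size;
}
```

Sparse (`make C=2`) flags the second form when the field is annotated as `__be32`, which is the kind of type cast warning this series addresses.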
    • X
      sctp: fix some type cast warnings introduced by transport rhashtable · 8d32503e
      Xin Long committed
      These warnings were found by running 'make C=2 M=net/sctp/'.

      They were introduced by not accounting for the endianness of the
      port when coding the transport rhashtable patches.

      Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • X
      sctp: fix some type cast warnings introduced by stream reconf · 1da4fc97
      Xin Long committed
      These warnings were found by running 'make C=2 M=net/sctp/'.

      They were introduced by not accounting for endianness when coding
      the stream reconf patches.

      Since commit c0d8bab6 ("sctp: add get and set sockopt for
      reconf_enable") enabled the stream reconf feature for users, the
      Fixes tag below uses it.

      Fixes: c0d8bab6 ("sctp: add get and set sockopt for reconf_enable")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net_sched: avoid matching qdisc with zero handle · 50317fce
      Cong Wang committed
      Davide found that the following script triggers a NULL pointer
      dereference:

        ip l a name eth0 type dummy
        tc q a dev eth0 parent :1 handle 1: htb

      This is because noop_qdisc is attached to a freshly created
      netdevice, and when 'parent :1' is passed, the kernel actually
      tries to match the major handle, which is 0. noop_qdisc
      has handle 0, so it is matched by mistake. Commit 69012ae4
      tries to fix a similar bug but still misses this case.

      Handle 0 is not a valid handle and should simply be skipped. In
      fact, the kernel uses it as TC_H_UNSPEC.

      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: Davide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
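The zero-handle case in the qdisc commit above can be reproduced in a small user-space sketch. The macros mirror the handle layout used by the kernel's traffic-control headers (major number in the upper 16 bits, 0 meaning unspecified), but they are redefined locally here; `toy_qdisc` and `toy_qdisc_lookup()` are hypothetical names, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the handle layout; treat these as assumptions. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MAJ(h)   ((h) & TC_H_MAJ_MASK)
#define TC_H_UNSPEC   0U

struct toy_qdisc {
    uint32_t handle;
    const char *name;
};

/* Look up a qdisc by handle, refusing to match the invalid handle 0. */
static const struct toy_qdisc *
toy_qdisc_lookup(const struct toy_qdisc *qs, size_t n, uint32_t handle)
{
    size_t i;

    if (handle == TC_H_UNSPEC) /* handle 0 never names a real qdisc */
        return NULL;
    for (i = 0; i < n; i++)
        if (qs[i].handle == handle)
            return &qs[i];
    return NULL;
}
```

With 'parent :1' the parent value is 0x00000001, so its major part is 0. Without the early return, a table entry with handle 0 (standing in for noop_qdisc) would be matched by mistake.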
    • W
      ipv6: prevent user from adding cached routes · 2ea2352e
      Wei Wang committed
      Cached routes should only be created by the system when receiving pmtu
      discovery or ip redirect msg. Users should not be allowed to create
      cached routes.
      
      Furthermore, after the patch series to move cached routes into exception
      table, user added cached routes will trigger the following warning in
      fib6_add():
      
      WARNING: CPU: 0 PID: 2985 at net/ipv6/ip6_fib.c:1137
      fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 2985 Comm: syzkaller320388 Not tainted 4.14.0-rc3+ #74
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:181
       __warn+0x1c4/0x1d9 kernel/panic.c:542
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
       do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      RSP: 0018:ffff8801cf09f6a0 EFLAGS: 00010297
      RAX: ffff8801ce45e340 RBX: 1ffff10039e13eec RCX: ffff8801d749c814
      RDX: 0000000000000000 RSI: ffff8801d749c700 RDI: ffff8801d749c780
      RBP: ffff8801cf09fa08 R08: 0000000000000000 R09: ffff8801cf09f360
      R10: ffff8801cf09f2d8 R11: 1ffff10039c8befb R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d749c700 R15: ffffffff860655c0
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_add+0x148/0x1a0 net/ipv6/route.c:2782
       ipv6_route_ioctl+0x4d5/0x690 net/ipv6/route.c:3291
       inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:521
       sock_do_ioctl+0x65/0xb0 net/socket.c:961
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      So we fix this by failing the attempt to add cached routes from userspace
      and returning an EINVAL error.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
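The fix described in the ipv6 commit above amounts to rejecting a user request whose flags mark the route as cached. A minimal sketch of that kind of check follows; `toy_check_user_route_flags()` is a hypothetical helper, and `TOY_RTF_CACHE` is a local stand-in assumed to match the kernel's RTF_CACHE flag value. The real check lives in the kernel's route creation path and is not reproduced here.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror RTF_CACHE from the kernel's uapi headers. */
#define TOY_RTF_CACHE 0x01000000U

/*
 * Validate route flags supplied by userspace.
 * Returns 0 if acceptable, -EINVAL if the caller tries to
 * create a cached route, which only the system may do.
 */
static int toy_check_user_route_flags(uint32_t flags)
{
    if (flags & TOY_RTF_CACHE)
        return -EINVAL;
    return 0;
}
```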
    • X
      sctp: reset owner sk for data chunks on out queues when migrating a sock · d04adf1b
      Xin Long committed
      Currently, when migrating a sock to another one in sctp_sock_migrate(),
      it only resets the owner sk for the data in receive queues, not for the
      chunks on out queues.

      As a result, the data chunk length accounted on the sock is not consistent
      with sk_wmem_alloc. When closing the sock or freeing these chunks,
      the old sk would never be freed, and the new sock may crash due to
      the overflowed sk_wmem_alloc.
      
      syzbot found this issue with this series:
      
        r0 = socket$inet_sctp()
        sendto$inet(r0)
        listen(r0)
        accept4(r0)
        close(r0)
      
      Although listen() should have returned an error when a TCP-style socket
      is connecting (I may fix this in another patch), the issue can also
      be reproduced by peeling off an assoc.

      This issue has been there since the very beginning.

      This patch resets the owner sk for the chunks on out queues so that
      sk_wmem_alloc has the correct value after accepting a sock or peeling
      off an assoc to a sock.

      Note that when resetting the owner sk for chunks on the outqueue, the
      chunks must first go through sctp_clear_owner_w/skb_orphan before
      assoc->base.sk is changed, and then through sctp_set_owner_w after it
      is changed, because sctp_wfree and its callees use assoc->base.sk.

      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
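The ordering constraint in the sock migration commit above (release the write-memory accounting while the association still points at the old sock, switch the sock, then charge the new one) can be modelled with a few toy structures. Everything below is hypothetical and heavily simplified: `toy_sock`, `toy_assoc`, `toy_chunk` and the helpers only imitate the idea that the accounting helpers find the sock through the association, as the commit message says of sctp_wfree.

```c
#include <stddef.h>

struct toy_sock {
    int wmem_alloc; /* bytes charged to this sock's send side */
};

struct toy_assoc {
    struct toy_sock *sk; /* stands in for assoc->base.sk */
};

struct toy_chunk {
    struct toy_assoc *asoc;
    int len;
};

/* Charge the chunk to whichever sock the association points at now. */
static void toy_set_owner_w(struct toy_chunk *c)
{
    c->asoc->sk->wmem_alloc += c->len;
}

/* Uncharge the chunk from whichever sock the association points at now. */
static void toy_clear_owner_w(struct toy_chunk *c)
{
    c->asoc->sk->wmem_alloc -= c->len;
}

/* Migrate an association and its queued chunks to a new sock. */
static void toy_migrate(struct toy_assoc *asoc, struct toy_chunk *chunks,
                        size_t n, struct toy_sock *newsk)
{
    size_t i;

    for (i = 0; i < n; i++)      /* 1. uncharge the old sock first */
        toy_clear_owner_w(&chunks[i]);
    asoc->sk = newsk;            /* 2. only now switch the sock */
    for (i = 0; i < n; i++)      /* 3. charge the new sock */
        toy_set_owner_w(&chunks[i]);
}
```

If step 2 ran before step 1, the uncharge would hit the new sock, leaving the old sock's counter stuck above zero and driving the new sock's counter negative.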
    • J
      bpf: rename sk_actions to align with bpf infrastructure · bfa64075
      John Fastabend committed
      Recent additions to support multiple programs in cgroups impose
      a strict requirement, "all yes is yes, any no is no". To enforce
      this the infrastructure requires the 'no' return code, SK_DROP in
      this case, to be 0.
      
      To apply these rules to SK_SKB program types the sk_actions return
      codes need to be adjusted.
      
      This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
      SK_ABORTED to remove any chance that the API may allow aborted
      program flows to be passed up the stack. This would be incorrect
      behavior and allow programs to break existing policies.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
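The "all yes is yes, any no is no" rule in the sk_actions commit above is why the 'no' code has to be 0: if verdicts are combined with a bitwise AND, a single 0 forces the result to 0. The sketch below shows that combination in plain C. The `toy_` names are hypothetical, and the enum only imitates the SK_DROP = 0 / SK_PASS layout the commit describes; it is not the kernel's program-array runner.

```c
#include <stddef.h>

/* Imitates the layout described in the commit: drop must be 0. */
enum toy_sk_action {
    TOY_SK_DROP = 0,
    TOY_SK_PASS = 1
};

typedef enum toy_sk_action (*toy_prog)(const void *pkt);

/* Run every program; the packet passes only if all of them say pass. */
static enum toy_sk_action
toy_run_all(const toy_prog *progs, size_t n, const void *pkt)
{
    unsigned int verdict = TOY_SK_PASS;
    size_t i;

    for (i = 0; i < n; i++)
        verdict &= (unsigned int)progs[i](pkt); /* any 0 sticks */
    return verdict ? TOY_SK_PASS : TOY_SK_DROP;
}

/* Two trivial example programs. */
static enum toy_sk_action toy_allow(const void *pkt)
{
    (void)pkt;
    return TOY_SK_PASS;
}

static enum toy_sk_action toy_deny(const void *pkt)
{
    (void)pkt;
    return TOY_SK_DROP;
}
```

Had the drop code been non-zero, an AND over the verdicts could not express "any no is no" this simply.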
    • J
      bpf: bpf_compute_data uses incorrect cb structure · 8108a775
      John Fastabend committed
      SK_SKB program types use bpf_compute_data to store the end of the
      packet data. However, bpf_compute_data assumes the cb is stored in the
      qdisc layer format, and for SK_SKB programs that is the wrong layer
      of the stack.

      It happens to work (sort of!) because in most cases nothing happens
      to be overwritten today. This is very fragile and error prone.
      Fortunately, we have another hole in tcp_skb_cb we can use, so let's
      put the data_end value there.

      Note that SK_SKB program types do not use data_meta; such accesses are
      rejected by sk_skb_is_valid_access().

      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      l2tp: initialise PPP sessions before registering them · f98be6c6
      Guillaume Nault committed
      pppol2tp_connect() initialises L2TP sessions after they've been exposed
      to the rest of the system by l2tp_session_register(). This puts
      sessions into transient states that are the source of several races, in
      particular with session's deletion path.
      
      This patch centralises the initialisation code into
      pppol2tp_session_init(), which is called before the registration phase.
      The only field that can't be set before session registration is the
      pppol2tp socket pointer, which has already been converted to RCU. So
      pppol2tp_connect() should now be race-free.
      
      The session's .session_close() callback is now set before registration.
      Therefore, it's always called when l2tp_core deletes the session, even
      if it was created by pppol2tp_session_create() and hasn't been plugged
      to a pppol2tp socket yet. That'd prevent session free because the extra
      reference taken by pppol2tp_session_close() wouldn't be dropped by the
      socket's ->sk_destruct() callback (pppol2tp_session_destruct()).
      We could set .session_close() only while connecting a session to its
      pppol2tp socket, or teach pppol2tp_session_close() to avoid grabbing a
      reference when the session isn't connected, but that'd require adding
      some form of synchronisation to be race free.
      
      Instead of that, we can just let the pppol2tp socket hold a reference
      on the session as soon as it starts depending on it (that is, in
      pppol2tp_connect()). Then we don't need to utilise
      pppol2tp_session_close() to hold a reference at the last moment to
      prevent l2tp_core from dropping it.
      
      When releasing the socket, pppol2tp_release() now deletes the session
      using the standard l2tp_session_delete() function, instead of merely
      removing it from hash tables. l2tp_session_delete() drops the reference
      the session holds on itself, but also makes sure it doesn't remove a
      session twice. So it can safely be called, even if l2tp_core already
      tried, or is concurrently trying, to remove the session.
      Finally, pppol2tp_session_destruct() drops the reference held by the
      socket.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      l2tp: protect sock pointer of struct pppol2tp_session with RCU · ee40fb2e
      Guillaume Nault committed
      pppol2tp_session_create() registers sessions that can't have their
      corresponding socket initialised. This socket has to be created by
      userspace, then connected to the session by pppol2tp_connect().
      Therefore, we need to protect the pppol2tp socket pointer of L2TP
      sessions, so that it can safely be updated when userspace is connecting
      or closing the socket. This will eventually allow pppol2tp_connect()
      to avoid generating transient states while initialising its parts of the
      session.
      
      To this end, this patch protects the pppol2tp socket pointer using RCU.
      
      The pppol2tp socket pointer is still set in pppol2tp_connect(), but
      only once we know the function isn't going to fail. It's eventually
      reset by pppol2tp_release(), which now has to wait for a grace period
      to elapse before it can drop the last reference on the socket. This
      ensures that pppol2tp_session_get_sock() can safely grab a reference
      on the socket, even after ps->sk is reset to NULL but before this
      operation actually gets visible from pppol2tp_session_get_sock().
      
      The rest is standard RCU conversion: pppol2tp_recv(), which already
      runs in atomic context, is simply enclosed by rcu_read_lock() and
      rcu_read_unlock(), while other functions are converted to use
      pppol2tp_session_get_sock() followed by sock_put().
      pppol2tp_session_setsockopt() is a special case. It used to retrieve
      the pppol2tp socket from the L2TP session, which itself was retrieved
      from the pppol2tp socket. Therefore we can just avoid dereferencing
      ps->sk and directly use the original socket pointer instead.
      
      With all users of ps->sk now handling NULL and concurrent updates, the
      L2TP ->ref() and ->deref() callbacks aren't needed anymore. Therefore,
      rather than converting pppol2tp_session_sock_hold() and
      pppol2tp_session_sock_put(), we can just drop them.
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      l2tp: initialise l2tp_eth sessions before registering them · ee28de6b
      Guillaume Nault committed
      Sessions must be initialised before being made externally visible by
      l2tp_session_register(). Otherwise the session may be concurrently
      deleted before being initialised, which can confuse the deletion path
      and eventually lead to kernel oops.
      
      Therefore, we need to move l2tp_session_register() down in
      l2tp_eth_create(), but also handle the intermediate step where only the
      session or the netdevice has been registered.
      
      We can't just call l2tp_session_register() in ->ndo_init() because
      we'd have no way to properly undo this operation in ->ndo_uninit().
      Instead, let's register the session and the netdevice in two different
      steps and protect the session's device pointer with RCU.
      
      And now that we allow the session's .dev field to be NULL, we don't
      need to prevent the netdevice from being removed anymore. So we can
      drop the dev_hold() and dev_put() calls in l2tp_eth_create() and
      l2tp_eth_dev_uninit().
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>