1. 22 1月, 2016 3 次提交
    • I
      libceph: fix ceph_msg_revoke() · 67645d76
      Ilya Dryomov 提交于
      There are a number of problems with revoking a "was sending" message:
      
      (1) We never make any attempt to revoke data - only kvecs contibute to
      con->out_skip.  However, once the header (envelope) is written to the
      socket, our peer learns data_len and sets itself to expect at least
      data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
      is called while the messenger is sending message's data portion,
      anything we send after that call is counted by the OSD towards the now
      revoked message's data portion.  The effects vary, the most common one
      is the eventual hang - higher layers get stuck waiting for the reply to
      the message that was sent out after ceph_msg_revoke() returned and
      treated by the OSD as a bunch of data bytes.  This is what Matt ran
      into.
      
      (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
      is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
      while the messenger is sending the header, we will get a connection
      reset, either due to a bad tag (0 is not a valid tag) or a bad header
      CRC, which kind of defeats the purpose of revoke.  Currently the kernel
      client refuses to work with header CRCs disabled, but that will likely
      change in the future, making this even worse.
      
      (3) con->out_skip is not reset on connection reset, leading to one or
      more spurious connection resets if we happen to get a real one between
      con->out_skip is set in ceph_msg_revoke() and before it's cleared in
      write_partial_skip().
      
      Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
      zero the tag or the header, i.e. send out tag+header regardless of when
      ceph_msg_revoke() is called.  That way the header is always correct, no
      unnecessary resets are induced and revoke stands ready for disabled
      CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
      "message out temp" and copy the header into it before sending.
      
      Cc: stable@vger.kernel.org # 4.0+
      Reported-by: NMatt Conner <matt.conner@keepertech.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Tested-by: NMatt Conner <matt.conner@keepertech.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      67645d76
    • G
      libceph: use list_for_each_entry_safe · 10bcee14
      Geliang Tang 提交于
      Use list_for_each_entry_safe() instead of list_for_each_safe() to
      simplify the code.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      [idryomov@gmail.com: nuke call to list_splice_init() as well]
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      10bcee14
    • G
      libceph: use list_next_entry instead of list_entry_next · 17ddc49b
      Geliang Tang 提交于
      list_next_entry has been defined in list.h, so I replace list_entry_next
      with it.
      Signed-off-by: NGeliang Tang <geliangtang@163.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      17ddc49b
  2. 07 1月, 2016 2 次提交
    • Y
      tcp: fix zero cwnd in tcp_cwnd_reduction · 8b8a321f
      Yuchung Cheng 提交于
      Patch 3759824d ("tcp: PRR uses CRB mode by default and SS mode
      conditionally") introduced a bug that cwnd may become 0 when both
      inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
      to a div-by-zero if the connection starts another cwnd reduction
      phase by setting tp->prior_cwnd to the current cwnd (0) in
      tcp_init_cwnd_reduction().
      
      To prevent this we skip PRR operation when nothing is acked or
      sacked. Then cwnd must be positive in all cases as long as ssthresh
      is positive:
      
      1) The proportional reduction mode
         inflight > ssthresh > 0
      
      2) The reduction bound mode
        a) inflight == ssthresh > 0
      
        b) inflight < ssthresh
           sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh
      
      Therefore in all cases inflight and sndcnt can not both be 0.
      We check invalid tp->prior_cwnd to avoid potential div0 bugs.
      
      In reality this bug is triggered only with a sequence of less common
      events.  For example, the connection is terminating an ECN-triggered
      cwnd reduction with an inflight 0, then it receives reordered/old
      ACKs or DSACKs from prior transmission (which acks nothing). Or the
      connection is in fast recovery stage that marks everything lost,
      but fails to retransmit due to local issues, then receives data
      packets from other end which acks nothing.
      
      Fixes: 3759824d ("tcp: PRR uses CRB mode by default and SS mode conditionally")
      Reported-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b8a321f
    • F
      net: possible use after free in dst_release · 07a5d384
      Francesco Ruggeri 提交于
      dst_release should not access dst->flags after decrementing
      __refcnt to 0. The dst_entry may be in dst_busy_list and
      dst_gc_task may dst_destroy it before dst_release gets a chance
      to access dst->flags.
      
      Fixes: d69bbf88 ("net: fix a race in dst_release()")
      Fixes: 27b75c95 ("net: avoid RCU for NOCACHE dst")
      Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07a5d384
  3. 06 1月, 2016 2 次提交
  4. 05 1月, 2016 2 次提交
    • R
      af_unix: Fix splice-bind deadlock · c845acb3
      Rainer Weikusat 提交于
      On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
      system call and AF_UNIX sockets,
      
      http://lists.openwall.net/netdev/2015/11/06/24
      
      The situation was analyzed as
      
      (a while ago) A: socketpair()
      B: splice() from a pipe to /mnt/regular_file
      	does sb_start_write() on /mnt
      C: try to freeze /mnt
      	wait for B to finish with /mnt
      A: bind() try to bind our socket to /mnt/new_socket_name
      	lock our socket, see it not bound yet
      	decide that it needs to create something in /mnt
      	try to do sb_start_write() on /mnt, block (it's
      	waiting for C).
      D: splice() from the same pipe to our socket
      	lock the pipe, see that socket is connected
      	try to lock the socket, block waiting for A
      B:	get around to actually feeding a chunk from
      	pipe to file, try to lock the pipe.  Deadlock.
      
      on 2015/11/10 by Al Viro,
      
      http://lists.openwall.net/netdev/2015/11/10/4
      
      The patch fixes this by removing the kern_path_create related code from
      unix_mknod and executing it as part of unix_bind prior acquiring the
      readlock of the socket in question. This means that A (as used above)
      will sb_start_write on /mnt before it acquires the readlock, hence, it
      won't indirectly block B which first did a sb_start_write and then
      waited for a thread trying to acquire the readlock. Consequently, A
      being blocked by C waiting for B won't cause a deadlock anymore
      (effectively, both A and B acquire two locks in opposite order in the
      situation described above).
      
      Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c845acb3
    • D
      net: Propagate lookup failure in l3mdev_get_saddr to caller · b5bdacf3
      David Ahern 提交于
      Commands run in a vrf context are not failing as expected on a route lookup:
          root@kenny:~# ip ro ls table vrf-red
          unreachable default
      
          root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
          ping: Warning: source address might be selected on device other than vrf-red.
          PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.
      
          --- 10.100.1.254 ping statistics ---
          2 packets transmitted, 0 received, 100% packet loss, time 999ms
      
      Since the vrf table does not have a route for 10.100.1.254 the ping
      should have failed. The saddr lookup causes a full VRF table lookup.
      Propogating a lookup failure to the user allows the command to fail as
      expected:
      
          root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
          connect: No route to host
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5bdacf3
  5. 31 12月, 2015 2 次提交
    • X
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3
      Xin Long 提交于
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      068d8bd3
    • N
      net, socket, socket_wq: fix missing initialization of flags · 574aab1e
      Nicolai Stange 提交于
      Commit ceb5d58b ("net: fix sock_wake_async() rcu protection") from
      the current 4.4 release cycle introduced a new flags member in
      struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
      from struct socket's flags member into that new place.
      
      Unfortunately, the new flags field is never initialized properly, at least
      not for the struct socket_wq instance created in sock_alloc_inode().
      
      One particular issue I encountered because of this is that my GNU Emacs
      failed to draw anything on my desktop -- i.e. what I got is a transparent
      window, including the title bar. Bisection lead to the commit mentioned
      above and further investigation by means of strace told me that Emacs
      is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
      reproducible 100% of times and the fact that properly initializing the
      struct socket_wq ->flags fixes the issue leads me to the conclusion that
      somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
      preventing my Emacs from receiving any SIGIO's due to data becoming
      available and it got stuck.
      
      Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
      member to zero.
      
      Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection")
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      574aab1e
  6. 30 12月, 2015 1 次提交
    • J
      openvswitch: Fix template leak in error cases. · 90c7afc9
      Joe Stringer 提交于
      Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
      reference leak on helper objects, but inadvertently introduced a leak on
      the ct template.
      
      Previously, ct_info.ct->general.use was initialized to 0 by
      nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
      returned successful. If an error occurred while adding the helper or
      adding the action to the actions buffer, the __ovs_ct_free_action()
      cleanup would use nf_ct_put() to free the entry; However, this relies on
      atomic_dec_and_test(ct_info.ct->general.use). This reference must be
      incremented first, or nf_ct_put() will never free it.
      
      Fix the issue by acquiring a reference to the template immediately after
      allocation.
      
      Fixes: cae3a262 ("openvswitch: Allow attaching helpers to ct action")
      Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
      Signed-off-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90c7afc9
  7. 28 12月, 2015 2 次提交
  8. 24 12月, 2015 1 次提交
  9. 23 12月, 2015 3 次提交
  10. 19 12月, 2015 2 次提交
  11. 18 12月, 2015 5 次提交
  12. 17 12月, 2015 1 次提交
  13. 16 12月, 2015 4 次提交
  14. 15 12月, 2015 10 次提交