1. 10 7月, 2019 5 次提交
    • S
      rds: avoid version downgrade to legitimate newer peer connections · dc205a8d
      Santosh Shilimkar 提交于
      Connections with legitimate tos values can get into usual connection
      race. It can result in consumer reject. We don't want tos value or
      protocol version to be demoted for such connections otherwise
      piers would end up different tos values which can results in
      no connection. Example a peer initiated connection with say
      tos 8 while usual connection racing can get downgraded to tos 0
      which is not desirable.
      
      Patch fixes above issue introduced by commit
      commit d021fabf ("rds: rdma: add consumer reject")
      Reported-by: NYanjun Zhu <yanjun.zhu@oracle.com>
      Tested-by: NYanjun Zhu <yanjun.zhu@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      dc205a8d
    • G
      rds: Return proper "tos" value to user-space · fc640d4c
      Gerd Rausch 提交于
      The proper "tos" value needs to be returned
      to user-space (sockopt RDS_INFO_CONNECTIONS).
      
      Fixes: 3eb45036 ("rds: add type of service(tos) infrastructure")
      Signed-off-by: NGerd Rausch <gerd.rausch@oracle.com>
      Reviewed-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      fc640d4c
    • G
      rds: Accept peer connection reject messages due to incompatible version · 8c6166cf
      Gerd Rausch 提交于
      Prior to
      commit d021fabf ("rds: rdma: add consumer reject")
      
      function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
      connection attempt by issuing a "rds_conn_drop".
      
      The commit mentioned above added a "break", eliminating
      the "fallthrough" case and made the "rds_conn_drop" rather conditional:
      
      Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
      carries an integer-value of "1" inside "private_data":
      
        if (!conn)
          break;
          err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
          if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
            pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
                    &conn->c_laddr, &conn->c_faddr);
                    conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
                    rds_conn_drop(conn);
          }
          rdsdebug("Connection rejected: %s\n",
                   rdma_reject_msg(cm_id, event->status));
          break;
          /* FALLTHROUGH */
      A number of issues are worth mentioning here:
         #1) Previous versions of the RDS code simply rejected a connection
             by calling "rdma_reject(cm_id, NULL, 0);"
             So the value of the payload in "private_data" will not be "1",
             but "0".
      
         #2) Now the code has become dependent on host byte order and sizing.
             If one peer is big-endian, the other is little-endian,
             or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
             the *err check does not work as intended.
      
         #3) There is no check for "len" to see if the data behind *err is even valid.
             Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
             carry 148 bytes of zeroized payload.
             But that should probably not be relied upon here.
      
         #4) With the added "break;",
             we might as well drop the misleading "/* FALLTHROUGH */" comment.
      
      This commit does _not_ address issue #2, as the sender would have to
      agree on a byte order as well.
      
      Here is the sequence of messages in this observed error-scenario:
         Host-A is pre-QoS changes (excluding the commit mentioned above)
         Host-B is post-QoS changes (including the commit mentioned above)
      
         #1 Host-B
            issues a connection request via function "rds_conn_path_transition"
            connection state transitions to "RDS_CONN_CONNECTING"
      
         #2 Host-A
            rejects the incompatible connection request (from #1)
            It does so by calling "rdma_reject(cm_id, NULL, 0);"
      
         #3 Host-B
            receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
            But since the code is changed in the way described above,
            it won't drop the connection here, simply because "*err == 0".
      
         #4 Host-A
            issues a connection request
      
         #5 Host-B
            receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
            and ends up calling "rds_ib_cm_handle_connect".
            But since the state is already in "RDS_CONN_CONNECTING"
            (as of #1) it will end up issuing a "rdma_reject" without
            dropping the connection:
               if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
                   /* Wait and see - our connect may still be succeeding */
                   rds_ib_stats_inc(s_ib_connect_raced);
               }
               goto out;
      
         #6 Host-A
            receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
            drops the connection and tries again (goto #4) until it gives up.
      Tested-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: NGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      8c6166cf
    • G
      Revert "RDS: IB: split the mr registration and invalidation path" · a5520788
      Gerd Rausch 提交于
      This reverts commit 56012459.
      
      RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
      "i_fastreg_wrs" to become incremented:
               while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
                       atomic_inc(&ibmr->ic->i_fastreg_wrs);
                       cpu_relax();
               }
      
      Looking at the original commit:
      
      commit 56012459 ("RDS: IB: split the mr registration and
      invalidation path")
      
      In there, the "rds_ib_mr_cqe_handler" was changed in the following
      way:
      
       void rds_ib_mr_cqe_handler(struct
       rds_ib_connection *ic,
       struct ib_wc *wc)
              if (frmr->fr_inv) {
                        frmr->fr_state = FRMR_IS_FREE;
                        frmr->fr_inv = false;
                      atomic_inc(&ic->i_fastreg_wrs);
              } else {
                      atomic_inc(&ic->i_fastunreg_wrs);
              }
      
      It looks like it's got it exactly backwards:
      
      Function "rds_ib_post_reg_frmr" keeps track of the outstanding
      requests via "i_fastreg_wrs".
      
      Function "rds_ib_post_inv" keeps track of the outstanding requests
      via "i_fastunreg_wrs" (post original commit). It also sets:
               frmr->fr_inv = true;
      
      However the completion handler "rds_ib_mr_cqe_handler" adjusts
      "i_fastreg_wrs" when "fr_inv" had been true, and adjusts
      "i_fastunreg_wrs" otherwise.
      
      The original commit was done in the name of performance:
      to remove the performance bottleneck
      
      No performance benefit could be observed with a fixed-up version
      of the original commit measured between two Oracle X7 servers,
      both equipped with Mellanox Connect-X5 HCAs.
      
      The prudent course of action is to revert this commit.
      Signed-off-by: NGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      a5520788
    • S
      rds: fix reordering with composite message notification · 616d37a0
      Santosh Shilimkar 提交于
      RDS composite message(rdma + control) user notification needs to be
      triggered once the full message is delivered and such a fix was
      added as part of commit 941f8d55 ("RDS: RDMA: Fix the composite
      message user notification"). But rds_send_remove_from_sock is missing
      data part notify check and hence at times the user don't get
      notification which isn't desirable.
      
      One way is to fix the rds_send_remove_from_sock to check of that case
      but considering the ordering complexity with completion handler and
      rdma + control messages are always dispatched back to back in same send
      context, just delaying the signaled completion on rmda work request also
      gets the desired behaviour. i.e Notifying application only after
      RDMA + control message send completes. So patch updates the earlier
      fix with this approach. The delay signaling completions of rdma op
      till the control message send completes fix was done by Venkat
      Venkatsubra in downstream kernel.
      Reviewed-and-tested-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: NGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      616d37a0
  2. 07 6月, 2019 1 次提交
    • Z
      net: rds: fix memory leak in rds_ib_flush_mr_pool · 85cb9287
      Zhu Yanjun 提交于
      When the following tests last for several hours, the problem will occur.
      
      Server:
          rds-stress -r 1.1.1.16 -D 1M
      Client:
          rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
      
      The following will occur.
      
      "
      Starting up....
      tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu
      %
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
      "
      >From vmcore, we can find that clean_list is NULL.
      
      >From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
      Then rds_ib_mr_pool_flush_worker calls
      "
       rds_ib_flush_mr_pool(pool, 0, NULL);
      "
      Then in function
      "
      int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                               int free_all, struct rds_ib_mr **ibmr_ret)
      "
      ibmr_ret is NULL.
      
      In the source code,
      "
      ...
      list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
      if (ibmr_ret)
              *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
      
      /* more than one entry in llist nodes */
      if (clean_nodes->next)
              llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
      ...
      "
      When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
      instead of clean_nodes is added in clean_list.
      So clean_nodes is discarded. It can not be used again.
      The workqueue is executed periodically. So more and more clean_nodes are
      discarded. Finally the clean_list is NULL.
      Then this problem will occur.
      
      Fixes: 1bc144b6 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85cb9287
  3. 06 6月, 2019 2 次提交
    • Z
      net: rds: fix memory leak when unload rds_rdma · b50e0587
      Zhu Yanjun 提交于
      When KASAN is enabled, after several rds connections are
      created, then "rmmod rds_rdma" is run. The following will
      appear.
      
      "
      BUG rds_ib_incoming (Not tainted): Objects remaining
      in rds_ib_incoming on __kmem_cache_shutdown()
      
      Call Trace:
       dump_stack+0x71/0xab
       slab_err+0xad/0xd0
       __kmem_cache_shutdown+0x17d/0x370
       shutdown_cache+0x17/0x130
       kmem_cache_destroy+0x1df/0x210
       rds_ib_recv_exit+0x11/0x20 [rds_rdma]
       rds_ib_exit+0x7a/0x90 [rds_rdma]
       __x64_sys_delete_module+0x224/0x2c0
       ? __ia32_sys_delete_module+0x2c0/0x2c0
       do_syscall_64+0x73/0x190
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      "
      This is rds connection memory leak. The root cause is:
      When "rmmod rds_rdma" is run, rds_ib_remove_one will call
      rds_ib_dev_shutdown to drop the rds connections.
      rds_ib_dev_shutdown will call rds_conn_drop to drop rds
      connections as below.
      "
      rds_conn_path_drop(&conn->c_path[0], false);
      "
      In the above, destroy is set to false.
      void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
      {
              atomic_set(&cp->cp_state, RDS_CONN_ERROR);
      
              rcu_read_lock();
              if (!destroy && rds_destroy_pending(cp->cp_conn)) {
                      rcu_read_unlock();
                      return;
              }
              queue_work(rds_wq, &cp->cp_down_w);
              rcu_read_unlock();
      }
      In the above function, destroy is set to false. rds_destroy_pending
      is called. This does not move rds connections to ib_nodev_conns.
      So destroy is set to true to move rds connections to ib_nodev_conns.
      In rds_ib_unregister_client, flush_workqueue is called to make rds_wq
      finsh shutdown rds connections. The function rds_ib_destroy_nodev_conns
      is called to shutdown rds connections finally.
      Then rds_ib_recv_exit is called to destroy slab.
      
      void rds_ib_recv_exit(void)
      {
              kmem_cache_destroy(rds_ib_incoming_slab);
              kmem_cache_destroy(rds_ib_frag_slab);
      }
      The above slab memory leak will not occur again.
      
      >From tests,
      256 rds connections
      [root@ca-dev14 ~]# time rmmod rds_rdma
      
      real    0m16.522s
      user    0m0.000s
      sys     0m8.152s
      512 rds connections
      [root@ca-dev14 ~]# time rmmod rds_rdma
      
      real    0m32.054s
      user    0m0.000s
      sys     0m15.568s
      
      To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed.
      And with 512 rds connections, about 32 seconds are needed.
      >From ftrace, when one rds connection is destroyed,
      
      "
       19)               |  rds_conn_destroy [rds]() {
       19)   7.782 us    |    rds_conn_path_drop [rds]();
       15)               |  rds_shutdown_worker [rds]() {
       15)               |    rds_conn_shutdown [rds]() {
       15)   1.651 us    |      rds_send_path_reset [rds]();
       15)   7.195 us    |    }
       15) + 11.434 us   |  }
       19)   2.285 us    |    rds_cong_remove_conn [rds]();
       19) * 24062.76 us |  }
      "
      So if many rds connections will be destroyed, this function
      rds_ib_destroy_nodev_conns uses most of time.
      Suggested-by: NHåkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b50e0587
    • Z
      net: rds: add per rds connection cache statistics · fe3475af
      Zhu Yanjun 提交于
      The variable cache_allocs is to indicate how many frags (KiB) are in one
      rds connection frag cache.
      The command "rds-info -Iv" will output the rds connection cache
      statistics as below:
      "
      RDS IB Connections:
            LocalAddr RemoteAddr Tos SL  LocalDev            RemoteDev
            1.1.1.14 1.1.1.14   58 255  fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
            send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
            rdma_mr_size=257, cache_allocs=12
      "
      This means that there are about 12KiB frag in this rds connection frag
      cache.
      Since rds.h in rds-tools is not related with the kernel rds.h, the change
      in kernel rds.h does not affect rds-tools.
      rds-info in rds-tools 2.0.5 and 2.0.6 is tested with this commit. It works
      well.
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe3475af
  4. 21 5月, 2019 1 次提交
  5. 15 5月, 2019 1 次提交
    • I
      mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b
      Ira Weiny 提交于
      To facilitate additional options to get_user_pages_fast() change the
      singular write parameter to be gup_flags.
      
      This patch does not change any functionality.  New functionality will
      follow in subsequent patches.
      
      Some of the get_user_pages_fast() call sites were unchanged because they
      already passed FOLL_WRITE or 0 for the write parameter.
      
      NOTE: It was suggested to change the ordering of the get_user_pages_fast()
      arguments to ensure that callers were converted.  This breaks the current
      GUP call site convention of having the returned pages be the final
      parameter.  So the suggestion was rejected.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NMike Marshall <hubcap@omnibond.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73b0140b
  6. 06 5月, 2019 1 次提交
  7. 02 5月, 2019 1 次提交
  8. 25 4月, 2019 1 次提交
    • Z
      net: rds: exchange of 8K and 1M pool · 4b9fc714
      Zhu Yanjun 提交于
      Before the commit 490ea596 ("RDS: IB: move FMR code to its own file"),
      when the dirty_count is greater than 9/10 of max_items of 8K pool,
      1M pool is used, Vice versa. After the commit 490ea596 ("RDS: IB: move
      FMR code to its own file"), the above is removed. When we make the
      following tests.
      
      Server:
        rds-stress -r 1.1.1.16 -D 1M
      
      Client:
        rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
      
      The following will appear.
      "
      connecting to 1.1.1.16:4000
      negotiated options, tasks will start in 2 seconds
      Starting up..header from 1.1.1.166:4001 to id 4001 bogus
      ..
      tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us
      cpu %
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
      ...
      "
      So this exchange between 8K and 1M pool is added back.
      
      Fixes: commit 490ea596 ("RDS: IB: move FMR code to its own file")
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b9fc714
  9. 13 4月, 2019 1 次提交
  10. 29 3月, 2019 1 次提交
    • M
      net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock(). · cb66ddd1
      Mao Wenan 提交于
      When it is to cleanup net namespace, rds_tcp_exit_net() will call
      rds_tcp_kill_sock(), if t_sock is NULL, it will not call
      rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
      connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
      net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
      and reference 'net' which has already been freed.
      
      In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
      sock->ops->connect, but if connect() is failed, it will call
      rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
      failed, rds_connect_worker() will try to reconnect all the time, so
      rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
      connections.
      
      Therefore, the condition !tc->t_sock is not needed if it is going to do
      cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
      NULL, and there is on other path to cancel cp_conn_w and free
      connection. So this patch is to fix this.
      
      rds_tcp_kill_sock():
      ...
      if (net != c_net || !tc->t_sock)
      ...
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
      net/ipv4/af_inet.c:340
      Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
      
      CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
      Hardware name: linux,dummy-virt (DT)
      Workqueue: krdsd rds_connect_worker
      Call trace:
       dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x120/0x188 lib/dump_stack.c:113
       print_address_description+0x68/0x278 mm/kasan/report.c:253
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x21c/0x348 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
       inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
       __sock_create+0x4f8/0x770 net/socket.c:1276
       sock_create_kern+0x50/0x68 net/socket.c:1322
       rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
       rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      Allocated by task 687:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2705 [inline]
       slab_alloc mm/slub.c:2713 [inline]
       kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
       kmem_cache_zalloc include/linux/slab.h:697 [inline]
       net_alloc net/core/net_namespace.c:384 [inline]
       copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
       create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
       ksys_unshare+0x340/0x628 kernel/fork.c:2577
       __do_sys_unshare kernel/fork.c:2645 [inline]
       __se_sys_unshare kernel/fork.c:2643 [inline]
       __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
      
      Freed by task 264:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
       kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
       slab_free_hook mm/slub.c:1370 [inline]
       slab_free_freelist_hook mm/slub.c:1397 [inline]
       slab_free mm/slub.c:2952 [inline]
       kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
       net_free net/core/net_namespace.c:400 [inline]
       net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
       net_drop_ns net/core/net_namespace.c:406 [inline]
       cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      The buggy address belongs to the object at ffff8003496a3f80
       which belongs to the cache net_namespace of size 7872
      The buggy address is located 1796 bytes inside of
       7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
      The buggy address belongs to the page:
      page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
      index:0x0 compound_mapcount: 0
      flags: 0xffffe0000008100(slab|head)
      raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
      raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 467fa153("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb66ddd1
  11. 05 2月, 2019 6 次提交
  12. 04 2月, 2019 3 次提交
  13. 01 2月, 2019 1 次提交
    • E
      rds: fix refcount bug in rds_sock_addref · 6fa19f56
      Eric Dumazet 提交于
      syzbot was able to catch a bug in rds [1]
      
      The issue here is that the socket might be found in a hash table
      but that its refcount has already be set to 0 by another cpu.
      
      We need to use refcount_inc_not_zero() to be safe here.
      
      [1]
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
       panic+0x2cb/0x65c kernel/panic.c:214
       __warn.cold+0x20/0x48 kernel/panic.c:571
       report_bug+0x263/0x2b0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       fixup_bug arch/x86/kernel/traps.c:173 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
      RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
      RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
      RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
      RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
      RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
      R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
      R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
       sock_hold include/net/sock.h:647 [inline]
       rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
       rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
       rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
       rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
       rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
       rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xdd/0x130 net/socket.c:631
       __sys_sendto+0x387/0x5f0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto net/socket.c:1796 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
       do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458089
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
      RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
      R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
      
      Fixes: cc4dfb7f ("rds: fix two RCU related problems")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: rds-devel@oss.oracle.com
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fa19f56
  14. 07 1月, 2019 1 次提交
  15. 02 1月, 2019 1 次提交
  16. 20 12月, 2018 3 次提交
    • D
      rds: Fix warning. · d84e7bc0
      David S. Miller 提交于
      >> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer
      
      Fixes: ea010070 ("net/rds: fix warn in rds_message_alloc_sgs")
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d84e7bc0
    • S
      net/rds: remove user triggered WARN_ON in rds_sendmsg · c75ab8a5
      shamir rabinovitch 提交于
      per comment from Leon in rdma mailing list
      https://lkml.org/lkml/2018/10/31/312 :
      
      Please don't forget to remove user triggered WARN_ON.
      https://lwn.net/Articles/769365/
      "Greg Kroah-Hartman raised the problem of core kernel API code that will
      use WARN_ON_ONCE() to complain about bad usage; that will not generate
      the desired result if WARN_ON_ONCE() is configured to crash the machine.
      He was told that the code should just call pr_warn() instead, and that
      the called function should return an error in such situations. It was
      generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
      triggered from user space need to be fixed."
      
      in addition harden rds_sendmsg to detect and overcome issues with
      invalid sg count and fail the sendmsg.
      Suggested-by: NLeon Romanovsky <leon@kernel.org>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Nshamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c75ab8a5
    • S
      net/rds: fix warn in rds_message_alloc_sgs · ea010070
      shamir rabinovitch 提交于
      redundant copy_from_user in rds_sendmsg system call expose rds
      to issue where rds_rdma_extra_size walk the rds iovec and and
      calculate the number pf pages (sgs) it need to add to the tail of
      rds message and later rds_cmsg_rdma_args copy the rds iovec again
      and re calculate the same number and get different result causing
      WARN_ON in rds_message_alloc_sgs.
      
      fix this by doing the copy_from_user only once per rds_sendmsg
      system call.
      
      When issue occur the below dump is seen:
      
      WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x244/0x39d lib/dump_stack.c:113
       panic+0x2ad/0x55c kernel/panic.c:188
       __warn.cold.8+0x20/0x45 kernel/panic.c:540
       report_bug+0x254/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
      RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
      RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
      RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
      RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
      RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
      R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
      R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
       rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
       rds_cmsg_send net/rds/send.c:971 [inline]
       rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
       __sys_sendmsg+0x11d/0x280 net/socket.c:2155
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44a859
      Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
      RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
      RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
      R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Nshamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea010070
  17. 12 12月, 2018 1 次提交
  18. 11 10月, 2018 1 次提交
  19. 24 9月, 2018 1 次提交
  20. 22 9月, 2018 1 次提交
    • N
      RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats · 8360ed67
      Nathan Chancellor 提交于
      Clang warns when two declarations' section attributes don't match.
      
      net/rds/ib_stats.c:40:1: warning: section does not match previous
      declaration [-Wsection]
      DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
      ^
      ./include/linux/percpu-defs.h:142:2: note: expanded from macro
      'DEFINE_PER_CPU_SHARED_ALIGNED'
              DEFINE_PER_CPU_SECTION(type, name,
      PER_CPU_SHARED_ALIGNED_SECTION) \
              ^
      ./include/linux/percpu-defs.h:93:9: note: expanded from macro
      'DEFINE_PER_CPU_SECTION'
              extern __PCPU_ATTRS(sec) __typeof__(type) name;
      \
                     ^
      ./include/linux/percpu-defs.h:49:26: note: expanded from macro
      '__PCPU_ATTRS'
              __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
      \
                                      ^
      net/rds/ib.h:446:1: note: previous attribute is here
      DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
      ^
      ./include/linux/percpu-defs.h:111:2: note: expanded from macro
      'DECLARE_PER_CPU'
              DECLARE_PER_CPU_SECTION(type, name, "")
              ^
      ./include/linux/percpu-defs.h:87:9: note: expanded from macro
      'DECLARE_PER_CPU_SECTION'
              extern __PCPU_ATTRS(sec) __typeof__(type) name
                     ^
      ./include/linux/percpu-defs.h:49:26: note: expanded from macro
      '__PCPU_ATTRS'
              __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
      \
                                      ^
      1 warning generated.
      
      The initial definition was added in commit ec16227e ("RDS/IB:
      Infiniband transport") and the cache aligned definition was added in
      commit e6babe4c ("RDS/IB: Stats and sysctls") right after. The
      definition probably should have been updated in net/rds/ib.h, which is
      what this patch does.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/114Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8360ed67
  21. 17 9月, 2018 1 次提交
  22. 12 9月, 2018 1 次提交
    • C
      rds: fix two RCU related problems · cc4dfb7f
      Cong Wang 提交于
      When a rds sock is bound, it is inserted into the bind_hash_table
      which is protected by RCU. But when releasing rds sock, after it
      is removed from this hash table, it is freed immediately without
      respecting RCU grace period. This could cause some use-after-free
      as reported by syzbot.
      
      Mark the rds sock with SOCK_RCU_FREE before inserting it into the
      bind_hash_table, so that it would be always freed after a RCU grace
      period.
      
      The other problem is in rds_find_bound(), the rds sock could be
      freed in between rhashtable_lookup_fast() and rds_sock_addref(),
      so we need to extend RCU read lock protection in rds_find_bound()
      to close this race condition.
      
      Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
      Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: rds-devel@oss.oracle.com
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oarcle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc4dfb7f
  23. 02 9月, 2018 1 次提交
  24. 01 9月, 2018 1 次提交
  25. 28 8月, 2018 1 次提交
  26. 22 8月, 2018 1 次提交