1. 01 Feb 2020, 2 commits
    • mm, tracing: print symbol name for kmem_alloc_node call_site events · 7e168b9b
      Committed by Junyong Sun
      Print the call_site ip of kmem_alloc_node using '%pS' to improve the
      readability of raw slab trace points.
      
      Link: http://lkml.kernel.org/r/1577949568-4518-1-git-send-email-sunjunyong@xiaomi.com
      Signed-off-by: Junyong Sun <sunjunyong@xiaomi.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e168b9b
    • memcg: fix a crash in wb_workfn when a device disappears · 68f23b89
      Committed by Theodore Ts'o
      Without memcg, there is a one-to-one mapping between the bdi and
      bdi_writeback structures.  In this world, things are fairly
      straightforward; the first thing bdi_unregister() does is to shut down
      the bdi_writeback structure (or wb), and part of that shutdown ensures
      that no other work is queued against the wb, and that the wb is fully
      drained.
      
      With memcg, however, there is a one-to-many relationship between the bdi
      and bdi_writeback structures; that is, there are multiple wb objects
      which can all point to a single bdi.  There is a refcount which prevents
      the bdi object from being released (and hence, unregistered).  So in
      theory, the bdi_unregister() *should* only get called once its refcount
      goes to zero (bdi_put will drop the refcount, and when it is zero,
      release_bdi gets called, which calls bdi_unregister).
      
      Unfortunately, del_gendisk() in block/genhd.c never got the memo about
      the Brave New memcg World, and calls bdi_unregister directly.  It does
      this without informing the file system, or the memcg code, or anything
      else.  This causes the root wb associated with the bdi to be
      unregistered, but none of the memcg-specific wbs is shut down.  So when
      one of these wbs is woken up to do delayed work, it tries to
      dereference wb->bdi->dev to fetch the device name, but
      unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
      called by del_gendisk().  As a result, *boom*.
      
      Fortunately, it looks like the rest of the writeback path is perfectly
      happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
      create a bdi_dev_name() function which can handle bdi->dev being NULL.
      This also allows us to bulletproof the writeback tracepoints to prevent
      them from dereferencing a NULL pointer and crashing the kernel if one is
      tracing with memcg's enabled, and an iSCSI device dies or a USB storage
      stick is pulled.
      
      The most common way of triggering this will be hotremoval of a device
      while writeback with memcg enabled is going on.  It was triggering
      several times a day in a heavily loaded production environment.
      
      Google Bug Id: 145475544
      
      Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
      Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68f23b89
  2. 25 Jan 2020, 3 commits
  3. 24 Jan 2020, 1 commit
  4. 21 Jan 2020, 1 commit
  5. 20 Jan 2020, 2 commits
  6. 18 Jan 2020, 2 commits
    • f2fs: show the CP_PAUSE reason in checkpoint traces · fad5fbce
      Committed by Sahitya Tummala
      Remove the duplicate CP_UMOUNT enum and add the new CP_PAUSE
      enum to show the checkpoint reason in the trace prints.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fad5fbce
    • f2fs: support data compression · 4c8ff709
      Committed by Chao Yu
      This patch tries to support compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a file
      can be divided into multiple clusters logically. One cluster contains 4 << n
      (n >= 0) logical pages, the compression size equals the cluster size, and
      each cluster can be compressed or left uncompressed.
      
      - In the cluster metadata layout, one special flag indicates whether a
      cluster is compressed or normal; for a compressed cluster, the following
      metadata maps the cluster to [1, 4 << n - 1] physical blocks, in which f2fs
      stores the compress header and the compressed data.
      
      - In order to eliminate write amplification during overwrite, f2fs only
      supports compression on write-once files; data can be compressed only when
      all logical blocks in the file are valid and the cluster compress ratio is
      lower than the specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Previously f2fs processed post-read work in multiple workqueues, which
        showed low performance due to the scheduling overhead of the workqueues
        executing in order.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      code, and add kconfig F2FS_FS_{LZO,LZ4} to cover the backend algorithms.
      Note: the lzo backend will be removed if Jaegeuk agrees.
      - update codes according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      4c8ff709
  7. 17 Jan 2020, 3 commits
    • devmap: Adjust tracepoint for map-less queue flush · 58aa94f9
      Committed by Jesper Dangaard Brouer
      Now that we don't have a reference to a devmap when flushing the device
      bulk queue, let's change the devmap_xmit tracepoint to remove the
      map_id and map_index fields entirely. Rearrange the fields so 'drops' and
      'sent' stay in the same position in the tracepoint struct, to make it
      possible for the xdp_monitor utility to read both the old and the new
      format.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
      58aa94f9
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Committed by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to have the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
      1d233886
    • xdp: Move devmap bulk queue into struct net_device · 75ccae62
      Committed by Toke Høiland-Jørgensen
      Commit 96360004 ("xdp: Make devmap flush_list common for all map
      instances"), changed devmap flushing to be a global operation instead of a
      per-map operation. However, the queue structure used for bulking was still
      allocated as part of the containing map.
      
      This patch moves the devmap bulk queue into struct net_device. The
      motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
      which will be changed in a subsequent commit.  To avoid other fields of
      struct net_device moving to different cache lines, we also move a couple of
      other members around.
      
      We defer the actual allocation of the bulk queue structure until the
      NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
      ndo_xdp_xmit support before allocating the structure, which is not possible
      at the time struct net_device is allocated. However, we keep the freeing in
      free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
      
      Because of this change, we lose the reference back to the map that
      originated the redirect, so change the tracepoint to always return 0 as the
      map ID and index. Otherwise no functional change is intended with this
      patch.
      
      After this patch, the relevant part of struct net_device looks like this,
      according to pahole:
      
      	/* --- cacheline 14 boundary (896 bytes) --- */
      	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
      	unsigned int               num_tx_queues;        /*   904     4 */
      	unsigned int               real_num_tx_queues;   /*   908     4 */
      	struct Qdisc *             qdisc;                /*   912     8 */
      	unsigned int               tx_queue_len;         /*   920     4 */
      	spinlock_t                 tx_global_lock;       /*   924     4 */
      	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
      	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
      	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
      	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
      	/* --- cacheline 15 boundary (960 bytes) --- */
      	struct hlist_head  qdisc_hash[16];               /*   960   128 */
      	/* --- cacheline 17 boundary (1088 bytes) --- */
      	struct timer_list  watchdog_timer;               /*  1088    40 */
      
      	/* XXX last struct has 4 bytes of padding */
      
      	int                        watchdog_timeo;       /*  1128     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct list_head   todo_list;                    /*  1136    16 */
      	/* --- cacheline 18 boundary (1152 bytes) --- */
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
      75ccae62
  8. 16 Jan 2020, 2 commits
  9. 15 Jan 2020, 4 commits
  10. 14 Jan 2020, 1 commit
  11. 13 Jan 2020, 1 commit
  12. 10 Jan 2020, 2 commits
  13. 08 Jan 2020, 2 commits
    • RDMA/core: Add trace points to follow MR allocation · 622db5b6
      Committed by Chuck Lever
      Track the lifetime of ib_mr objects. Here's sample output from a test run
      with NFS/RDMA:
      
                 <...>-361   [009] 79238.772782: mr_alloc:             pd.id=3 mr.id=11 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772812: mr_alloc:             pd.id=3 mr.id=12 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772839: mr_alloc:             pd.id=3 mr.id=13 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772866: mr_alloc:             pd.id=3 mr.id=14 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772893: mr_alloc:             pd.id=3 mr.id=15 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772921: mr_alloc:             pd.id=3 mr.id=16 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772947: mr_alloc:             pd.id=3 mr.id=17 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.772974: mr_alloc:             pd.id=3 mr.id=18 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.773001: mr_alloc:             pd.id=3 mr.id=19 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.773028: mr_alloc:             pd.id=3 mr.id=20 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79238.773055: mr_alloc:             pd.id=3 mr.id=21 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.270942: mr_alloc:             pd.id=3 mr.id=22 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.270975: mr_alloc:             pd.id=3 mr.id=23 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271007: mr_alloc:             pd.id=3 mr.id=24 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271036: mr_alloc:             pd.id=3 mr.id=25 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271067: mr_alloc:             pd.id=3 mr.id=26 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271095: mr_alloc:             pd.id=3 mr.id=27 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271121: mr_alloc:             pd.id=3 mr.id=28 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271153: mr_alloc:             pd.id=3 mr.id=29 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271181: mr_alloc:             pd.id=3 mr.id=30 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271208: mr_alloc:             pd.id=3 mr.id=31 type=MEM_REG max_num_sg=30 rc=0
                 <...>-361   [009] 79240.271236: mr_alloc:             pd.id=3 mr.id=32 type=MEM_REG max_num_sg=30 rc=0
                 <...>-4351  [001] 79242.299400: mr_dereg:             mr.id=32
                 <...>-4351  [001] 79242.299467: mr_dereg:             mr.id=31
                 <...>-4351  [001] 79242.299554: mr_dereg:             mr.id=30
                 <...>-4351  [001] 79242.299615: mr_dereg:             mr.id=29
                 <...>-4351  [001] 79242.299684: mr_dereg:             mr.id=28
                 <...>-4351  [001] 79242.299748: mr_dereg:             mr.id=27
                 <...>-4351  [001] 79242.299812: mr_dereg:             mr.id=26
                 <...>-4351  [001] 79242.299874: mr_dereg:             mr.id=25
                 <...>-4351  [001] 79242.299944: mr_dereg:             mr.id=24
                 <...>-4351  [001] 79242.300009: mr_dereg:             mr.id=23
                 <...>-4351  [001] 79242.300190: mr_dereg:             mr.id=22
                 <...>-4351  [001] 79242.300263: mr_dereg:             mr.id=21
                 <...>-4351  [001] 79242.300326: mr_dereg:             mr.id=20
                 <...>-4351  [001] 79242.300388: mr_dereg:             mr.id=19
                 <...>-4351  [001] 79242.300450: mr_dereg:             mr.id=18
                 <...>-4351  [001] 79242.300516: mr_dereg:             mr.id=17
                 <...>-4351  [001] 79242.300629: mr_dereg:             mr.id=16
                 <...>-4351  [001] 79242.300718: mr_dereg:             mr.id=15
                 <...>-4351  [001] 79242.300784: mr_dereg:             mr.id=14
                 <...>-4351  [001] 79242.300879: mr_dereg:             mr.id=13
                 <...>-4351  [001] 79242.300945: mr_dereg:             mr.id=12
                 <...>-4351  [001] 79242.301012: mr_dereg:             mr.id=11
      
      Some features of the output:
      - The lifetime and owner PD of each MR is clearly visible.
      - The type of MR is captured, as is the SGE array size.
      - Failing MR allocation can be recorded.
      
      Link: https://lore.kernel.org/r/20191218201820.30584.34636.stgit@manet.1015granger.net
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      622db5b6
    • RDMA/core: Trace points for diagnosing completion queue issues · 3e5901cb
      Committed by Chuck Lever
      Sample trace events:
      
         kworker/u29:0-300   [007]   120.042217: cq_alloc:             cq.id=4 nr_cqe=161 comp_vector=2 poll_ctx=WORKQUEUE
                <idle>-0     [002]   120.056292: cq_schedule:          cq.id=4
          kworker/2:1H-482   [002]   120.056402: cq_process:           cq.id=4 wake-up took 109 [us] from interrupt
          kworker/2:1H-482   [002]   120.056407: cq_poll:              cq.id=4 requested 16, returned 1
                <idle>-0     [002]   120.067503: cq_schedule:          cq.id=4
          kworker/2:1H-482   [002]   120.067537: cq_process:           cq.id=4 wake-up took 34 [us] from interrupt
          kworker/2:1H-482   [002]   120.067541: cq_poll:              cq.id=4 requested 16, returned 1
                <idle>-0     [002]   120.067657: cq_schedule:          cq.id=4
          kworker/2:1H-482   [002]   120.067672: cq_process:           cq.id=4 wake-up took 15 [us] from interrupt
          kworker/2:1H-482   [002]   120.067674: cq_poll:              cq.id=4 requested 16, returned 1
      
       ...
      
               systemd-1     [002]   122.392653: cq_schedule:          cq.id=4
          kworker/2:1H-482   [002]   122.392688: cq_process:           cq.id=4 wake-up took 35 [us] from interrupt
          kworker/2:1H-482   [002]   122.392693: cq_poll:              cq.id=4 requested 16, returned 16
          kworker/2:1H-482   [002]   122.392836: cq_poll:              cq.id=4 requested 16, returned 16
          kworker/2:1H-482   [002]   122.392970: cq_poll:              cq.id=4 requested 16, returned 16
          kworker/2:1H-482   [002]   122.393083: cq_poll:              cq.id=4 requested 16, returned 16
          kworker/2:1H-482   [002]   122.393195: cq_poll:              cq.id=4 requested 16, returned 3
      
      Several features to note in this output:
       - The WCE count and context type are reported at allocation time
       - The CPU and kworker for each CQ is evident
       - The CQ's restracker ID is tagged on each trace event
       - CQ poll scheduling latency is measured
       - Details about how often single completions occur versus multiple
         completions are evident
       - The cost of the ULP's completion handler is recorded
      
      Link: https://lore.kernel.org/r/20191218201815.30584.3481.stgit@manet.1015granger.net
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      3e5901cb
  14. 07 Jan 2020, 1 commit
    • iommu/vt-d: trace: Extend map_sg trace event · 984d03ad
      Committed by Lu Baolu
      The current map_sg implementation stores trace messages in a coarse
      manner. This extends it so that more detailed messages can be traced.
      
      The map_sg trace message looks like:
      
      map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096
      map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096
      map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096
      map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192
      map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096
      map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096
      map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096
      map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096
      map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      984d03ad
  15. 04 Jan 2020, 1 commit
  16. 03 Jan 2020, 1 commit
  17. 30 Dec 2019, 1 commit
  18. 27 Dec 2019, 1 commit
    • sctp: move trace_sctp_probe_path into sctp_outq_sack · f643ee29
      Committed by Kevin Kou
      The original patch that brought in the "SCTP ACK tracking trace event"
      feature was committed on Dec 20, 2017. It replaced jprobe usage with
      trace events and introduced two trace events: one is
      TRACE_EVENT(sctp_probe), the other is TRACE_EVENT(sctp_probe_path).
      The original patch intended to trigger trace_sctp_probe_path from
      within TRACE_EVENT(sctp_probe), as in the code below:
      
      +TRACE_EVENT(sctp_probe,
      +
      +	TP_PROTO(const struct sctp_endpoint *ep,
      +		 const struct sctp_association *asoc,
      +		 struct sctp_chunk *chunk),
      +
      +	TP_ARGS(ep, asoc, chunk),
      +
      +	TP_STRUCT__entry(
      +		__field(__u64, asoc)
      +		__field(__u32, mark)
      +		__field(__u16, bind_port)
      +		__field(__u16, peer_port)
      +		__field(__u32, pathmtu)
      +		__field(__u32, rwnd)
      +		__field(__u16, unack_data)
      +	),
      +
      +	TP_fast_assign(
      +		struct sk_buff *skb = chunk->skb;
      +
      +		__entry->asoc = (unsigned long)asoc;
      +		__entry->mark = skb->mark;
      +		__entry->bind_port = ep->base.bind_addr.port;
      +		__entry->peer_port = asoc->peer.port;
      +		__entry->pathmtu = asoc->pathmtu;
      +		__entry->rwnd = asoc->peer.rwnd;
      +		__entry->unack_data = asoc->unack_data;
      +
      +		if (trace_sctp_probe_path_enabled()) {
      +			struct sctp_transport *sp;
      +
      +			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
      +					    transports) {
      +				trace_sctp_probe_path(sp, asoc);
      +			}
      +		}
      +	),
      
      But when I tested it, I found it did not work: trace_sctp_probe_path
      produced no output. I finally found that there is a trace buffer lock
      operation (trace_event_buffer_reserve) in include/trace/trace_events.h:
      
      static notrace void							\
      trace_event_raw_event_##call(void *__data, proto)			\
      {									\
      	struct trace_event_file *trace_file = __data;			\
      	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
      	struct trace_event_buffer fbuffer;				\
      	struct trace_event_raw_##call *entry;				\
      	int __data_size;						\
      									\
      	if (trace_trigger_soft_disabled(trace_file))			\
      		return;							\
      									\
      	__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
      									\
      	entry = trace_event_buffer_reserve(&fbuffer, trace_file,	\
      				 sizeof(*entry) + __data_size);		\
      									\
      	if (!entry)							\
      		return;							\
      									\
      	tstruct								\
      									\
      	{ assign; }							\
      									\
      	trace_event_buffer_commit(&fbuffer);				\
      }
      
      The reason trace_sctp_probe_path produces no output is that it is
      written in the TP_fast_assign part of TRACE_EVENT(sctp_probe), and so
      it is placed ( { assign; } ) after the trace_event_buffer_reserve()
      call when the compiler expands the macro:
      
              entry = trace_event_buffer_reserve(&fbuffer, trace_file,        \
                                       sizeof(*entry) + __data_size);         \
                                                                              \
              if (!entry)                                                     \
                      return;                                                 \
                                                                              \
              tstruct                                                         \
                                                                              \
              { assign; }                                                     \
      
      So trace_sctp_probe_path cannot acquire the trace event buffer and
      produces no output; that is to say, nesting of tracepoint entry
      functions is not allowed. The function call flow is:
      
      trace_sctp_probe()
      -> trace_event_raw_event_sctp_probe()
       -> lock buffer
       -> trace_sctp_probe_path()
         -> trace_event_raw_event_sctp_probe_path()  --nested
         -> buffer has been locked and return no output.
      
      This patch removes trace_sctp_probe_path from the TP_fast_assign part
      of TRACE_EVENT(sctp_probe) to avoid nesting entry functions, and
      triggers trace_sctp_probe_path in sctp_outq_sack instead.
      
      After this patch, you can enable both events individually,
        # cd /sys/kernel/debug/tracing
        # echo 1 > events/sctp/sctp_probe/enable
        # echo 1 > events/sctp/sctp_probe_path/enable
      
      Or, you can enable all the events under sctp.
      
        # echo 1 > events/sctp/enable
      Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f643ee29
  19. 10 Dec 2019, 2 commits
    • rcu: Make PREEMPT_RCU be a modifier to TREE_RCU · b3e627d3
      Committed by Lai Jiangshan
      Currently PREEMPT_RCU and TREE_RCU are mutually exclusive Kconfig
      options.  But PREEMPT_RCU actually specifies a kind of TREE_RCU,
      namely a preemptible TREE_RCU. This commit therefore makes PREEMPT_RCU
      be a modifier to the TREE_RCU Kconfig option.  This has the benefit of
      simplifying several of the #if expressions that formerly needed to
      check both, but now need only check one or the other.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b3e627d3
    • rcu: Fix data-race due to atomic_t copy-by-value · 6cf539a8
      Committed by Marco Elver
      This fixes a data-race where `atomic_t dynticks` is copied by value. The
      copy is performed non-atomically, resulting in a data-race if `dynticks`
      is updated concurrently.
      
      This data-race was found with KCSAN:
      ==================================================================
      BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter
      
      write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3:
       atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline]
       rcu_dynticks_snap kernel/rcu/tree.c:310 [inline]
       dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984
       force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286
       rcu_gp_fqs kernel/rcu/tree.c:1601 [inline]
       rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653
       rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799
       kthread+0x1b5/0x200 kernel/kthread.c:255
       <snip>
      
      read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7:
       rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline]
       rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870
       irq_enter+0x5/0x50 kernel/softirq.c:347
       <snip>
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      Workqueue: kblockd blk_mq_run_work_fn
      ==================================================================
Signed-off-by: Marco Elver <elver@google.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: rcu@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      6cf539a8
  20. 01 Dec 2019 (2 commits)
    • J
      rss_stat: add support to detect RSS updates of external mm · e4dcad20
Joel Fernandes (Google) authored
When a process updates the RSS of a different process, the rss_stat
tracepoint appears in the context of the process doing the update.  This
can mislead userspace into thinking that the RSS of the updating process
changed, when in reality a different process's RSS was updated.
      
      This issue happens in reclaim paths such as with direct reclaim or
      background reclaim.
      
      This patch adds more information to the tracepoint about whether the mm
      being updated belongs to the current process's context (curr field).  We
also include a hash of the mm pointer so that the process that the mm
belongs to can be uniquely identified (mm_id field).
      
      Also vsprintf.c is refactored a bit to allow reuse of hashing code.
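
A hedged illustration of deriving a stable, non-reversible ID from an mm pointer. The kernel's ptr_to_hashval() uses a keyed siphash; the splitmix64-style finalizer below is a stand-in for illustration, not the actual algorithm:

```c
#include <assert.h>
#include <stdint.h>

/* Mix the pointer bits so the ID is stable per mm but does not leak the
 * raw kernel address (illustrative mix, not the kernel's keyed siphash). */
static uint32_t mm_id_of(const void *mm)
{
    uint64_t x = (uint64_t)(uintptr_t)mm;

    x ^= x >> 33;                  /* splitmix64-style finalizer */
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return (uint32_t)x;
}
```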
      
      [akpm@linux-foundation.org: remove unused local `str']
      [joelaf@google.com: inline call to ptr_to_hashval]
        Link: http://lore.kernel.org/r/20191113153816.14b95acd@gandalf.local.home
        Link: http://lkml.kernel.org/r/20191114164622.GC233237@google.com
Link: http://lkml.kernel.org/r/20191106024452.81923-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: Ioannis Ilkos <ilkos@google.com>
      Acked-by: Petr Mladek <pmladek@suse.com>	[lib/vsprintf.c]
      Cc: Tim Murray <timmurray@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Carmen Jackson <carmenjackson@google.com>
      Cc: Mayank Gupta <mayankgupta@google.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4dcad20
    • J
      mm: emit tracepoint when RSS changes · b3d1411b
Joel Fernandes (Google) authored
      Useful to track how RSS is changing per TGID to detect spikes in RSS and
      memory hogs.  Several Android teams have been using this patch in
various kernel trees for half a year now.  Many reported to me that it
is really useful, so I'm posting it upstream.
      
Initial patch developed by Tim Murray.  Changes I made from the original
patch:
  o Prevent any additional space consumed by mm_struct.
      
Regarding the fact that the RSS may change too often, thus flooding the
traces - note that there is already some "hysteresis" here.  That is, we
update the counter only after receiving 64 page faults, due to
SPLIT_RSS_ACCOUNTING.  However, during zapping or copying of a pte
range, the RSS is updated immediately, which can be noisy and flood the
traces.  In a previous discussion, we agreed that BPF or ftrace can be
used to rate-limit the signal if this becomes an issue.
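
The batching ("hysteresis") described above can be sketched in userspace C. The threshold of 64 matches the batch size mentioned; the structure and function names are simplified stand-ins, not the kernel's actual SPLIT_RSS_ACCOUNTING code:

```c
#include <assert.h>

#define RSS_EVENTS_THRESH 64   /* fold per-task deltas every 64 events */

struct task_rss {
    int events;     /* page-fault events since the last fold */
    long pending;   /* RSS delta not yet visible in the shared counter */
};

static long shared_rss;    /* stands in for the mm-wide counter */
static int trace_emitted;  /* counts would-be trace_rss_stat() emissions */

static void add_rss_counter(struct task_rss *t, long delta)
{
    t->pending += delta;
    if (++t->events >= RSS_EVENTS_THRESH) {
        /* fold into the shared counter and emit the tracepoint once */
        shared_rss += t->pending;
        trace_emitted++;
        t->pending = 0;
        t->events = 0;
    }
}
```

With this shape, 64 single-page faults produce exactly one counter update and one trace event, which is the rate limiting the paragraph above refers to.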
      
Also note that I added wrappers around trace_rss_stat to prevent
compiler errors when linux/mm.h is included from tracing code, which
otherwise caused errors such as:
      
          CC      kernel/trace/power-traces.o
        In file included from ./include/trace/define_trace.h:102,
                         from ./include/trace/events/kmem.h:342,
                         from ./include/linux/mm.h:31,
                         from ./include/linux/ring_buffer.h:5,
                         from ./include/linux/trace_events.h:6,
                         from ./include/trace/events/power.h:12,
                         from kernel/trace/power-traces.c:15:
        ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type
           struct trace_entry ent;    \
      
      Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org
Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org
Co-developed-by: Tim Murray <timmurray@google.com>
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Carmen Jackson <carmenjackson@google.com>
      Cc: Mayank Gupta <mayankgupta@google.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3d1411b
  21. 27 Nov 2019 (1 commit)
    • P
      ftrace: Rework event_create_dir() · 04ae87a5
Peter Zijlstra authored
      Rework event_create_dir() to use an array of static data instead of
      function pointers where possible.
      
The problem is that it would call the function pointer on module load,
before parse_args(), possibly even before jump_labels were initialized.
Luckily the generated functions don't use jump_labels, but it still
seems fragile.  It also gets in the way of changing when we make the
module map executable.
      
The generated functions basically call trace_define_field() with a
bunch of static arguments.  So instead of a function, capture these
arguments in a static array, avoiding the function call.
      
There are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes) and a static array does not work;
for these we preserve the function call.  Luckily, none of these
cases are related to modules, so retaining the call for them is safe.
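
A hedged sketch of that data-driven shape; the struct and field values below are simplified stand-ins for the kernel's struct trace_event_fields, not the real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Static data capturing what the generated function used to pass to
 * trace_define_field(); nothing executes at module-definition time. */
struct event_field {
    const char *type;
    const char *name;
    int offset, size, is_signed;
};

static const struct event_field example_fields[] = {
    { "char[16]", "prev_comm",  0, 16, 0 },
    { "pid_t",    "prev_pid",  16,  4, 1 },
    { "pid_t",    "next_pid",  20,  4, 1 },
    { NULL, NULL, 0, 0, 0 }    /* sentinel terminates the array */
};

/* Walked later by the core, once it is safe to do so; real code would
 * call trace_define_field() for each entry instead of just counting. */
static int define_fields(const struct event_field *f)
{
    int n = 0;
    for (; f->type; f++)
        n++;
    return n;
}
```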
      
      Also fix up all broken tracepoint definitions that now generate a
      compile error.
Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      04ae87a5
  22. 26 Nov 2019 (1 commit)
  23. 25 Nov 2019 (1 commit)
    • Q
      writeback: fix -Wformat compilation warnings · 40363cf1
Qian Cai authored
The commit f05499a0 ("writeback: use ino_t for inodes in
tracepoints") introduced a number of GCC compilation warnings on s390:
      
      In file included from ./include/trace/define_trace.h:102,
                       from ./include/trace/events/writeback.h:904,
                       from fs/fs-writeback.c:82:
      ./include/trace/events/writeback.h: In function
      'trace_raw_output_writeback_page_template':
      ./include/trace/events/writeback.h:76:12: warning: format '%lu' expects
      argument of type 'long unsigned int', but argument 4 has type 'ino_t'
      {aka 'unsigned int'} [-Wformat=]
        TP_printk("bdi %s: ino=%lu index=%lu",
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/trace/trace_events.h:360:22: note: in definition of macro
      'DECLARE_EVENT_CLASS'
        trace_seq_printf(s, print);     \
                            ^~~~~
      ./include/trace/events/writeback.h:76:2: note: in expansion of macro
      'TP_printk'
        TP_printk("bdi %s: ino=%lu index=%lu",
        ^~~~~~~~~
      
Fix them by adding the necessary casts, since ino_t can be either
"unsigned int" or "unsigned long" depending on the architecture.
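
A minimal userspace sketch of the fix; format_ino() is a hypothetical helper, not the tracepoint code, and the point is only the explicit cast that makes the argument match %lu on every architecture:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* ino_t may be 'unsigned int' (e.g. on s390 as in the warning above) or
 * 'unsigned long'; casting to unsigned long makes %lu correct either way. */
static int format_ino(char *buf, size_t len, ino_t ino)
{
    return snprintf(buf, len, "ino=%lu", (unsigned long)ino);
}
```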
      
      Fixes: f05499a0 ("writeback: use ino_t for inodes in tracepoints")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Tejun Heo <tj@kernel.org>
      40363cf1
  24. 23 Nov 2019 (1 commit)
    • C
      SUNRPC: Capture completion of all RPC tasks · a264abad
Chuck Lever authored
RPC tasks on the backchannel never invoke xprt_complete_rqst(), so
there is no way to report their tk_status at completion.  Likewise, any
RPC task that exits via rpc_exit_task() before it is replied to
disappears without a trace.
      
Introduce a tracepoint, symmetrical with rpc_task_begin, that
captures the termination status of each RPC task.
      
      Sample trace output for callback requests initiated on the server:
         kworker/u8:12-448   [003]   127.025240: rpc_task_end:         task:50@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
         kworker/u8:12-448   [002]   127.567310: rpc_task_end:         task:51@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
         kworker/u8:12-448   [001]   130.506817: rpc_task_end:         task:52@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task
      
      Odd, though, that I never see trace_rpc_task_complete, either in the
      forward or backchannel. Should it be removed?
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      a264abad
  25. 21 Nov 2019 (1 commit)