1. 18 Apr 2017 — 1 commit
  2. 03 Mar 2017 — 1 commit
  3. 02 Mar 2017 — 2 commits
    • sched/headers: Prepare for new header dependencies before moving code to... · 6a3827d7
      Ingo Molnar committed
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/numa_balancing.h>
      
      We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6a3827d7
    • rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      David Howells committed
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connecting to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      540b1c48
  4. 28 Feb 2017 — 1 commit
  5. 25 Feb 2017 — 1 commit
  6. 24 Feb 2017 — 3 commits
  7. 23 Feb 2017 — 14 commits
  8. 15 Feb 2017 — 1 commit
  9. 14 Feb 2017 — 1 commit
    • btrfs: Make btrfs_ino take a struct btrfs_inode · 4a0cc7ca
      Nikolay Borisov committed
      Currently btrfs_ino takes a struct inode, and this causes a lot of
      internal btrfs functions which consume this ino to take a VFS inode
      rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
      of VFS structs into the internals of btrfs, it is first necessary to
      eliminate all uses of struct inode where a btrfs_inode is what is
      actually meant. This patch does that by using BTRFS_I to convert a VFS
      inode to a btrfs_inode. With this problem eliminated, subsequent
      patches will start eliminating the passing of struct inode altogether,
      eventually resulting in much cleaner code.
      Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
      [ fix btrfs_get_extent tracepoint prototype ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      4a0cc7ca
  10. 10 Feb 2017 — 1 commit
  11. 01 Feb 2017 — 1 commit
    • timers/itimer: Convert internal cputime_t units to nsec · 858cf3a8
      Frederic Weisbecker committed
      Use the new nsec based cputime accessors as part of the whole cputime
      conversion from cputime_t to nsecs.
      
      Also convert itimers to use nsec based internal counters. This simplifies
      it and removes the whole game with error/inc_error which served to deal
      with cputime_t random granularity.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-20-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      858cf3a8
  12. 29 Jan 2017 — 2 commits
  13. 28 Jan 2017 — 1 commit
    • block: cleanup tracing · 48b77ad6
      Christoph Hellwig committed
      A couple of tweaks to the tracing code:
      
       - trace the request size for all requests
       - trace request sector and nr_sectors only for fs requests, enforced by
         helpers
       - drop SCSI CDB tracing - we have SCSI tracing for this and are going
         to move the CDB out of the generic struct request soon.
      
      With this, the tracing code no longer knows about BLOCK_PC requests at
      all; it's just FS vs passthrough requests now, where the latter includes
      any driver-private requests.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      48b77ad6
  14. 26 Jan 2017 — 1 commit
    • bpf: add initial bpf tracepoints · a67edbf4
      Daniel Borkmann committed
      This work adds a number of tracepoints to paths that are either
      considered slow-path or exception-like states, where monitoring or
      inspecting them would be desirable.
      
      For the bpf(2) syscall, tracepoints have been placed on the main
      commands when they succeed. In the XDP case, the tracepoint is for
      exceptions, that is, e.g. on abnormal BPF program exit such as an
      unknown or XDP_ABORTED return code, or when an error occurs during an
      XDP_TX action and the packet could not be forwarded.
      
      Both have been split into separate event headers and can be further
      extended. Worst case, if they unexpectedly get in our way in future,
      they can also be removed [1]. Of course, these tracepoints (like
      any other) can be analyzed by eBPF itself, etc. Example output:
      
        # ./perf record -a -e bpf:* sleep 10
        # ./perf script
        sock_example  6197 [005]   283.980322:      bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
        sock_example  6197 [005]   283.980721:       bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
        sock_example  6197 [005]   283.988423:   bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
        sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
        [...]
        sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
             swapper     0 [005]   289.338243:    bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
      
        [1] https://lwn.net/Articles/705270/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a67edbf4
  15. 24 Jan 2017 — 1 commit
    • rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead · 3a19b46a
      Paul E. McKenney committed
      Commit 4a81e832 ("rcu: Reduce overhead of cond_resched() checks
      for RCU") moved quiescent-state generation out of cond_resched()
      and commit bde6c3aa ("rcu: Provide cond_resched_rcu_qs() to force
      quiescent states in long loops") introduced cond_resched_rcu_qs(), and
      commit 5cd37193 ("rcu: Make cond_resched_rcu_qs() apply to normal RCU
      flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently
      polled by the RCU core state machine.
      
      This frequent polling can increase the grace-period rate, which in turn
      increases grace-period overhead, which is visible in some benchmarks
      (for example, the "open1" benchmark in Anton Blanchard's "will it scale"
      suite).  This commit therefore reduces the rate at which rcu_qs_ctr
      is polled by moving that polling into the force-quiescent-state (FQS)
      machinery, and by further polling it only after the grace period has
      been in effect for at least jiffies_till_sched_qs jiffies.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      3a19b46a
  16. 11 Jan 2017 — 1 commit
  17. 09 Jan 2017 — 6 commits
    • afs: Refcount the afs_call struct · 341f741f
      David Howells committed
      A static checker warning occurs in the AFS filesystem:
      
      	fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
      	error: dereferencing freed memory 'call'
      
      due to the reply being sent before we access the server it points to.  The
      act of sending the reply causes the call to be freed if an error occurs
      (but not if it doesn't).
      
      On top of this, the lifetime handling of afs_call structs is fragile
      because they get passed around through workqueues without any sort of
      refcounting.
      
      Deal with the issues by:
      
       (1) Fix the maybe/maybe not nature of the reply sending functions with
           regards to whether they release the call struct.
      
       (2) Refcount the afs_call struct and sort out places that need to get/put
           references.
      
       (3) Pass a ref through the work queue and release (or pass on) that ref in
           the work function.  Care has to be taken because a work queue may
           already own a ref to the call.
      
       (4) Do the cleaning up in the put function only.
      
       (5) Simplify module cleanup by always incrementing afs_outstanding_calls
           whenever a call is allocated.
      
       (6) Set the backlog to 0 with kernel_listen() at the beginning of the
           process of closing the socket to prevent new incoming calls from
           occurring and to remove the contribution of preallocated calls from
           afs_outstanding_calls before we wait on it.
      
      A tracepoint is also added to monitor the afs_call refcount and lifetime.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Fixes: 08e0e7c8: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
      341f741f
    • btrfs: make tracepoint format strings more compact · 562a7a07
      David Sterba committed
      We've recently added the fsid to trace events, which makes the line
      quite long. To reduce it again, remove the extra spaces around '=' and
      drop the ','.
      Signed-off-by: David Sterba <dsterba@suse.com>
      562a7a07
    • Btrfs: add truncated_len for ordered extent tracepoints · 78566548
      Liu Bo committed
      This can help us monitor truncated ordered extents.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      78566548
    • Btrfs: add 'inode' for extent map tracepoint · 92a1bf76
      Liu Bo committed
      'inode' is an important field for btrfs_get_extent, so let's trace it.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      92a1bf76
    • btrfs: fix crash when tracepoint arguments are freed by wq callbacks · ac0c7cf8
      David Sterba committed
      Enabling btrfs tracepoints leads to an instant crash, as reported. The
      wq callbacks could free the memory, and the tracepoints then
      dereference the freed members to get to fs_info.
      
      The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
      removed the tracepoints but we could preserve them by passing only the
      required data in a safe way.
      
      Fixes: bc074524 ("btrfs: prefix fsid to all trace events")
      CC: stable@vger.kernel.org # 4.8+
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ac0c7cf8
    • afs: Add some tracepoints · 8e8d7f13
      David Howells committed
      Add three tracepoints to the AFS filesystem:
      
       (1) The afs_recv_data tracepoint logs data segments that are extracted
           from the data received from the peer through afs_extract_data().
      
       (2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
           coming in to an asynchronous call.
      
       (3) The afs_cb_call tracepoint logs incoming calls that have had their
           operation ID extracted and mapped into a supported cache manager
           service call.
      
      To make (3) work, the name strings in the afs_call_type struct objects have
      to be annotated with __tracepoint_string.  This is done with the CM_NAME()
      macro.
      
      Further, the AFS call state enum needs a name so that it can be used to
      declare parameter types.
      Signed-off-by: David Howells <dhowells@redhat.com>
      8e8d7f13
  18. 06 Jan 2017 — 1 commit
    • scsi: ufs: add trace event for ufs commands · 1a07f2d9
      Lee Susman committed
      Use the ftrace infrastructure to conditionally trace ufs command events.
      A new trace event is created, which samples the following ufs command data:
      - device name
      - optional identification string
      - task tag
      - doorbell register
      - number of transfer bytes
      - interrupt status register
      - request start LBA
      - command opcode
      
      Currently we only fully trace read(10) and write(10) commands.
      All other commands which pass through ufshcd_send_command() will be
      printed with "-1" in the lba and transfer_len fields.
      
      Usage:
      	echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
      	cat /sys/kernel/debug/tracing/trace_pipe
      Signed-off-by: Lee Susman <lsusman@codeaurora.org>
      Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      1a07f2d9