1. 14 Jun 2017, 8 commits
  2. 13 Jun 2017, 1 commit
  3. 19 May 2017, 1 commit
  4. 18 May 2017, 3 commits
    • ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub · 8a49f3e0
      Authored by Steven Rostedt (VMware)
      No need to add ugly #ifdefs in the code. Having a standard stub file is much
      prettier.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace/instances: Clear function triggers when removing instances · a0e6369e
      Authored by Naveen N. Rao
      If instance directories are deleted while there are registered function
      triggers:
      
        # cd /sys/kernel/debug/tracing/instances
        # mkdir test
        # echo "schedule:enable_event:sched:sched_switch" > test/set_ftrace_filter
        # rmdir test
        Unable to handle kernel paging request for data at address 0x00000008
        Unable to handle kernel paging request for data at address 0x00000008
        Faulting instruction address: 0xc0000000021edde8
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048
        NUMA
        pSeries
        Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter fuse binfmt_misc pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath virtio_net virtio_blk virtio_pci crc32c_vpmsum virtio_ring virtio
        CPU: 8 PID: 8694 Comm: rmdir Not tainted 4.11.0-nnr+ #113
        task: c0000000bab52800 task.stack: c0000000baba0000
        NIP: c0000000021edde8 LR: c0000000021f0590 CTR: c000000002119620
        REGS: c0000000baba3870 TRAP: 0300   Not tainted  (4.11.0-nnr+)
        MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
          CR: 22002422  XER: 20000000
        CFAR: 00007fffabb725a8 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0
        GPR00: c00000000220f750 c0000000baba3af0 c000000003157e00 0000000000000000
        GPR04: 0000000000000040 00000000000000eb 0000000000000040 0000000000000000
        GPR08: 0000000000000000 0000000000000113 0000000000000000 c00000000305db98
        GPR12: c000000002119620 c00000000fd42c00 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: 0000000000000000 0000000000000000 c0000000bab52e90 0000000000000000
        GPR24: 0000000000000000 00000000000000eb 0000000000000040 c0000000baba3bb0
        GPR28: c00000009cb06eb0 c0000000bab52800 c00000009cb06eb0 c0000000baba3bb0
        NIP [c0000000021edde8] ring_buffer_lock_reserve+0x8/0x4e0
        LR [c0000000021f0590] trace_event_buffer_lock_reserve+0xe0/0x1a0
        Call Trace:
        [c0000000baba3af0] [c0000000021f96c8] trace_event_buffer_commit+0x1b8/0x280 (unreliable)
        [c0000000baba3b60] [c00000000220f750] trace_event_buffer_reserve+0x80/0xd0
        [c0000000baba3b90] [c0000000021196b8] trace_event_raw_event_sched_switch+0x98/0x180
        [c0000000baba3c10] [c0000000029d9980] __schedule+0x6e0/0xab0
        [c0000000baba3ce0] [c000000002122230] do_task_dead+0x70/0xc0
        [c0000000baba3d10] [c0000000020ea9c8] do_exit+0x828/0xd00
        [c0000000baba3dd0] [c0000000020eaf70] do_group_exit+0x60/0x100
        [c0000000baba3e10] [c0000000020eb034] SyS_exit_group+0x24/0x30
        [c0000000baba3e30] [c00000000200bcec] system_call+0x38/0x54
        Instruction dump:
        60000000 60420000 7d244b78 7f63db78 4bffaa09 393efff8 793e0020 39200000
        4bfffecc 60420000 3c4c00f7 3842a020 <81230008> 2f890000 409e02f0 a14d0008
        ---[ end trace b917b8985d0e650b ]---
        Unable to handle kernel paging request for data at address 0x00000008
        Faulting instruction address: 0xc0000000021edde8
        Unable to handle kernel paging request for data at address 0x00000008
        Faulting instruction address: 0xc0000000021edde8
        Faulting instruction address: 0xc0000000021edde8
      
      To address this, let's clear all registered function probes before
      deleting the ftrace instance.
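The ordering matters here: triggers must be detached before the instance they reference goes away. A minimal userspace sketch of that pattern (all names are illustrative stand-ins, not the kernel's actual structures or functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An instance owns a list of registered triggers.  The teardown path
 * must clear them *before* the instance memory is freed; otherwise a
 * trigger firing during teardown dereferences freed memory, as in the
 * oops above. */

struct trigger {
    struct trigger *next;
    int armed;
};

struct instance {
    struct trigger *triggers;
};

/* Analogue of clear_ftrace_function_probes(): detach every trigger. */
static void clear_triggers(struct instance *inst)
{
    struct trigger *t = inst->triggers;

    while (t) {
        struct trigger *next = t->next;
        t->armed = 0;
        free(t);
        t = next;
    }
    inst->triggers = NULL;
}

/* Analogue of the instance-rmdir path: clear triggers first, then free. */
static void instance_remove(struct instance *inst)
{
    clear_triggers(inst);
    free(inst);
}

static struct instance *instance_create_with_trigger(void)
{
    struct instance *inst = calloc(1, sizeof(*inst));
    struct trigger *t = calloc(1, sizeof(*t));

    t->armed = 1;
    t->next = inst->triggers;
    inst->triggers = t;
    return inst;
}
```

Reversing the two calls in `instance_remove()` reproduces the bug class: the freed instance's trigger list would be walked after the fact.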
      
      Link: http://lkml.kernel.org/r/c5f1ca624043690bd94642bb6bffd3f2fc504035.1494956770.git.naveen.n.rao@linux.vnet.ibm.com
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Move postpone selftests to core from early_initcall · b9ef0326
      Authored by Steven Rostedt
      I hit the following lockdep splat when booting with ftrace selftests
      enabled, as well as CONFIG_PREEMPT and LOCKDEP.
      
       Testing dynamic ftrace ops #1:
       (1 0 1 0 0)
       (1 1 2 0 0)
       (2 1 3 0 169)
       (2 2 4 0 50066)
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 13 at kernel/rcu/srcutree.c:202 check_init_srcu_struct+0x60/0x70
       Modules linked in:
       CPU: 0 PID: 13 Comm: rcu_tasks_kthre Not tainted 4.12.0-rc1-test+ #587
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880119628040 task.stack: ffffc900006a4000
       RIP: 0010:check_init_srcu_struct+0x60/0x70
       RSP: 0000:ffffc900006a7d98 EFLAGS: 00010246
       RAX: 0000000000000246 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: ffff880119628040 RSI: 00000000ffffffff RDI: ffffffff81e5fb40
       RBP: ffffc900006a7e20 R08: 00000023b403c000 R09: 0000000000000001
       R10: ffffc900006a7e40 R11: 0000000000000000 R12: ffffffff81e5fb40
       R13: 0000000000000286 R14: ffff880119628040 R15: ffffc900006a7e98
       FS:  0000000000000000(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff88011edff000 CR3: 0000000001e0f000 CR4: 00000000001406f0
       Call Trace:
        ? __synchronize_srcu+0x6e/0x140
        ? lock_acquire+0xdc/0x1d0
        ? ktime_get_mono_fast_ns+0x5d/0xb0
        synchronize_srcu+0x6f/0x110
        ? synchronize_srcu+0x6f/0x110
        rcu_tasks_kthread+0x20a/0x540
        kthread+0x114/0x150
        ? __rcu_read_unlock+0x70/0x70
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x2e/0x40
       Code: f6 83 70 06 00 00 03 49 89 c5 74 0d be 01 00 00 00 48 89 df e8 42 fa ff ff 4c 89 ee 4c 89 e7 e8 b7 42 75 00 5b 41 5c 41 5d 5d c3 <0f> ff eb aa 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
       ---[ end trace 5c3f4206ce50f6ac ]---
      
      What happens is that the selftests include the creation of a dynamically
      allocated ftrace_ops, which requires the use of synchronize_rcu_tasks(),
      which uses srcu, and triggers the above warning.
      
      It appears that synchronize_rcu_tasks() is not set up at early_initcall()
      time, but it is at core_initcall(). Moving the tests down to that location
      makes everything work out properly.
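The resulting scheme can be sketched in miniature (names and the single-file structure are illustrative, not the kernel's): registrations that arrive before the system can run selftests are queued on a list, and a later initcall stage drains the queue.

```c
#include <assert.h>
#include <stddef.h>

/* A tracer selftest returns 0 on success. */
typedef int (*selftest_fn)(void);

#define MAX_PENDING 8

static selftest_fn pending[MAX_PENDING];
static int npending;
static int core_init_done;

/* Registering before "core init" defers the selftest; afterwards it
 * runs immediately. */
static int register_tracer(selftest_fn test)
{
    if (!core_init_done) {
        pending[npending++] = test;   /* too early: save it for later */
        return 0;
    }
    return test();
}

/* Simulated core_initcall stage: run everything that was deferred. */
static int run_deferred_selftests(void)
{
    int failures = 0;

    for (int i = 0; i < npending; i++)
        if (pending[i]() != 0)
            failures++;
    npending = 0;
    core_init_done = 1;
    return failures;
}

/* Stand-in selftest for demonstration. */
static int dummy_test(void)
{
    return 0;
}
```

The point of the commit is only the choice of *when* the queue drains: early_initcall() is before synchronize_rcu_tasks() is usable, core_initcall() is after.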
      
      Link: http://lkml.kernel.org/r/20170517111435.7388c033@gandalf.local.home
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  5. 09 May 2017, 1 commit
  6. 04 May 2017, 1 commit
  7. 01 May 2017, 1 commit
    • ring-buffer: Return reader page back into existing ring buffer · 73a757e6
      Authored by Steven Rostedt (VMware)
      When reading the ring buffer for consuming, it is optimized for splice,
      where a page is taken out of the ring buffer (zero copy) and sent to the
      reading consumer. When the reader is finished with the page, it calls
      ring_buffer_free_read_page(), which simply frees the page. The next time the
      reader needs to get a page from the ring buffer, it must call
      ring_buffer_alloc_read_page(), which allocates and initializes a reader page
      that can be swapped into the ring buffer in exchange for a newly filled
      page.
      
      The problem is that there's no reason to actually free the page when it is
      passed back to the ring buffer. It can hold on to it and reuse it for the
      next iteration. This completely removes the interaction with the page_alloc
      mechanism.
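The reuse idea reduces to a small caching pattern, sketched here in userspace (the one-slot cache and all names are illustrative simplifications of the kernel change, not its API):

```c
#include <assert.h>
#include <stdlib.h>

/* Instead of returning a consumed reader page to the allocator, park it
 * in a one-slot cache and hand it back on the next allocation. */

static void *cached_page;
static int allocator_calls;   /* counts trips to the real allocator */

static void *alloc_read_page(size_t size)
{
    if (cached_page) {
        void *p = cached_page;
        cached_page = NULL;
        return p;             /* reuse: no allocator interaction */
    }
    allocator_calls++;
    return malloc(size);
}

static void free_read_page(void *page)
{
    if (!cached_page)
        cached_page = page;   /* park it for the next reader */
    else
        free(page);           /* cache occupied: really free it */
}
```

With a steady alloc/free rhythm, as in the trace-cmd run measured below, almost every allocation is served from the cache, which is why the mm_page_alloc hit count collapses.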
      
      Using the trace-cmd utility to record all events (causing trace-cmd to
      require reading lots of pages from the ring buffer, and calling
      ring_buffer_alloc/free_read_page() several times), and also assigning a
      stack trace trigger to the mm_page_alloc event, we can see how many times
      the ring_buffer_alloc_read_page() needed to allocate a page for the ring
      buffer.
      
      Before this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        9968
      
      After this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        4
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  8. 21 Apr 2017, 10 commits
    • tracing/ftrace: Allow for the traceonoff probe be unique to instances · 2290f2c5
      Authored by Steven Rostedt (VMware)
      Have the traceon/off function probe triggers affect only the instance they
      are set in. This required making the trace_on/off accessible for other files
      in the tracing directory.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing/ftrace: Enable snapshot function trigger to work with instances · cab50379
      Authored by Steven Rostedt (VMware)
      Modify the snapshot probe trigger to work with instances. This way the
      snapshot function trigger will only affect the instance that it is added to
      in the set_ftrace_filter file.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing/ftrace: Add a better way to pass data via the probe functions · 6e444319
      Authored by Steven Rostedt (VMware)
      With the redesign of the registration and execution of the function probes
      (triggers), data can now be passed from the setup of the probe to the probe
      callers that are specific to the trace_array it is on. Although all probes
      still only affect the toplevel trace array, this change will allow
      instances to have their own probes separated from other instances and the
      top array.
      
      That is, something like the stacktrace probe can be set to trace only in an
      instance and not the toplevel trace array. This isn't implemented yet, but
      this change lays the groundwork for it.
      
      When a probe callback is triggered (someone writes the probe format into
      set_ftrace_filter), it calls register_ftrace_function_probe() passing in
      init_data that will be used to initialize the probe. Then for every matching
      function, register_ftrace_function_probe() will call the probe_ops->init()
      function with the init data that was passed to it, as well as an address to
      a place holder that is associated with the probe and the instance. The first
      occurrence will have a NULL in the pointer. The init() function will then
      initialize it. If other probes are added, or more functions become part of
      the probe, the place holder will be passed to the init() function
      containing the data it was initialized with the previous time.
      
      Then this place_holder is passed to each of the other probe_ops functions,
      where it can be used in the function callback. When the probe_ops free()
      function is called, it can be called either with the rip of the function
      that is being removed from the probe, or zero, indicating that there are no
      more functions attached to the probe, and the place holder is about to be
      freed. This gives the probe_ops a way to free the data it assigned to the
      place holder if it was allocade during the first init call.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Dynamically create the probe ftrace_ops for the trace_array · 7b60f3d8
      Authored by Steven Rostedt (VMware)
      In order to eventually have each trace_array instance have its own unique
      set of function probes (triggers), the trace array needs to hold the ops and
      the filters for the probes.
      
      This is the first step to accomplish this. Instead of having the private
      data of the probe ops point to the trace_array, create a separate list that
      data of the probe ops point to the trace_array, create a separate list that
      the trace_array holds. There's only one private_data for a probe, but we
      need one per trace_array. The probe ftrace_ops will be dynamically created
      for
      each instance, instead of being static.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Pass the trace_array into ftrace_probe_ops functions · b5f081b5
      Authored by Steven Rostedt (VMware)
      Pass the trace_array associated with an ftrace_probe_ops into the probe_ops
      func(), init() and free() functions. The trace_array is the descriptor that
      describes a tracing instance. This will help create the infrastructure that
      will allow having function probes unique to tracing instances.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Have the trace_array hold the list of registered func probes · 04ec7bb6
      Authored by Steven Rostedt (VMware)
      Add a linked list to the trace_array to hold func probes that are registered.
      Currently, all function probes are the same for all instances as it was
      before, that is, only the top level trace_array holds the function probes.
      But this lays the ground work to have function probes be attached to
      individual instances, and having the event trigger only affect events in the
      given instance. But that work is still to be done.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Have unregister_ftrace_function_probe_func() return a value · d3d532d7
      Authored by Steven Rostedt (VMware)
      Currently unregister_ftrace_function_probe_func() is a void function. It
      does not give any feedback if an error occurred or no item was found to
      remove and nothing was done.
      
      Change it to return a status: success if it removed something, an error
      otherwise. Also update the callers to relay that feedback to the user.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Remove data field from ftrace_func_probe structure · 1a48df00
      Authored by Steven Rostedt (VMware)
      No users of the function probes use the data field anymore. Remove it, and
      change the init function to take a void *data parameter instead of a
      void **data, because the init will just get the data that the registering
      function received, and there's no state after it is called.
      
      The other functions for ftrace_probe_ops still take the data parameter, but
      it will currently only be passed NULL. It will stay as a parameter for
      future data to be passed to these functions.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Have the snapshot trigger use the mapping helper functions · 1a93f8bd
      Authored by Steven Rostedt (VMware)
      As the data pointer for individual ips will soon be removed and no longer
      passed to the callback function probe handlers, convert the snapshot
      trigger counter over to the new ftrace_func_mapper helper functions.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Pass probe ops to probe function · bca6c8d0
      Authored by Steven Rostedt (VMware)
      In preparation to cleaning up the probe function registration code, the
      "data" parameter will eventually be removed from the probe->func() call.
      Instead it will receive its own "ops" function, in which it can set up its
      own data that it needs to map.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  9. 20 Apr 2017, 1 commit
    • tracing: Allocate the snapshot buffer before enabling probe · df62db5b
      Authored by Steven Rostedt (VMware)
      Currently the snapshot trigger enables the probe and then allocates the
      snapshot. If the probe triggers before the allocation, it could cause the
      snapshot to fail and turn tracing off. It's best to allocate the snapshot
      buffer first, and then enable the trigger. If something goes wrong in the
      enabling of the trigger, the snapshot buffer is still allocated, but it can
      also be freed by the user by writing zero into the snapshot buffer file.
      
      Also add a check of the return status of alloc_snapshot().
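The allocate-then-enable ordering is a generic hardening pattern; a minimal userspace sketch (all names and the simulated failure are illustrative, not the kernel's code):

```c
#include <assert.h>
#include <stdlib.h>

static void *snapshot_buf;
static int trigger_armed;

/* Allocate the buffer the trigger will need (idempotent). */
static int alloc_snapshot(size_t size)
{
    if (!snapshot_buf)
        snapshot_buf = malloc(size);
    return snapshot_buf ? 0 : -1;
}

/* Allocate first, check the result, and only then arm the trigger.
 * If arming fails, the buffer survives rather than leaving an armed
 * trigger pointing at a missing buffer. */
static int enable_trigger(int simulate_failure)
{
    int ret = alloc_snapshot(4096);

    if (ret)
        return ret;           /* allocation failed: nothing was armed */
    if (simulate_failure)
        return -1;            /* arming failed: buffer stays allocated */
    trigger_armed = 1;
    return 0;
}
```

Doing it the other way around (arm, then allocate) is exactly the race the commit removes: the probe can fire in the window before the buffer exists.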
      
      Cc: stable@vger.kernel.org
      Fixes: 77fd5c15 ("tracing: Add snapshot trigger to function probes")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  10. 18 Apr 2017, 2 commits
  11. 25 Mar 2017, 4 commits
    • tracing: Move trace_handle_return() out of line · af0009fc
      Authored by Steven Rostedt (VMware)
      Currently trace_handle_return() looks like this:
      
       static inline enum print_line_t trace_handle_return(struct trace_seq *s)
       {
              return trace_seq_has_overflowed(s) ?
                      TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
      
      Where trace_seq_has_overflowed(s) is:
      
       static inline bool trace_seq_has_overflowed(struct trace_seq *s)
       {
      	return s->full || seq_buf_has_overflowed(&s->seq);
       }
      
      And seq_buf_has_overflowed(&s->seq) is:
      
       static inline bool
       seq_buf_has_overflowed(struct seq_buf *s)
       {
      	return s->len > s->size;
       }
      
      Making trace_handle_return() into:
      
       return (s->full || (s->seq->len > s->seq->size)) ?
                 TRACE_TYPE_PARTIAL_LINE :
                 TRACE_TYPE_HANDLED;
      
      One would think this is not an issue to keep as an inline. But because this
      is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
      the system. Taking a look at a single tracepoint, x86_irq_vector (the
      first one I randomly chose): as trace_handle_return() is used in the
      TRACE_EVENT() macro's trace_raw_output_##call(), we disassemble
      trace_raw_output_x86_irq_vector and do a diff:
      
      - is the original
      + is the out-of-line code
      
      I removed identical lines that were different just due to different
      addresses.
      
      --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
      +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
      @@ -6,27 +6,23 @@
              53                      push   %rbx
              48 89 fb                mov    %rdi,%rbx
              4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
              e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
              83 f8 01                cmp    $0x1,%eax
              74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
              5b                      pop    %rbx
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
              41 8b 54 24 08          mov    0x8(%r12),%edx
      -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
      +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
      -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
      +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
      -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
      
       === here's the start of the main difference ===
      
      +       48 89 df                mov    %rbx,%rdi
      +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
      -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
      -       31 c0                   xor    %eax,%eax
      -       85 d2                   test   %edx,%edx
      -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
      -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
      -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
      -       0f 93 c0                setae  %al
      +       48 89 df                mov    %rbx,%rdi
      +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
              5b                      pop    %rbx
      -       0f b6 c0                movzbl %al,%eax
      
       === end ===
      
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
      
      If you notice, the original has 22 bytes of text more than the out of line
      version. As this is for every TRACE_EVENT() defined in the system, this can
      become quite large.
      
         text	   data	    bss	    dec	    hex	filename
      8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
      8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
      
      This change has a total of 8580 bytes in savings.
      
       $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      324
      
      That's 324 tracepoints. But this does not include modules (which contain
      many more tracepoints). For an allyesconfig build:
      
       $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      1401
      
      That's 1401 tracepoints giving us:
      
         text    data     bss     dec     hex filename
      137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
      137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
      
      92920 bytes in savings!!!
      
      Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Have function tracing start in early boot up · dbeafd0d
      Authored by Steven Rostedt (VMware)
      Register the function tracer right after the tracing buffers are initialized
      in early boot up. This will allow function tracing to begin early if it is
      enabled via the kernel command line.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Postpone tracer start-up tests till the system is more robust · 9afecfbb
      Authored by Steven Rostedt (VMware)
      As tracing can now be enabled very early in boot up, even before some
      critical system services (like scheduling), do not run the tracer selftests
      until after early_initcall() is performed. If a tracer is registered before
      such time, it is saved off in a list and the test is run when the system is
      able to handle more diverse functions.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Split tracing initialization into two for early initialization · e725c731
      Authored by Steven Rostedt (VMware)
      Create an early_trace_init() function that will initialize the buffers and
      allow for earlier use of trace_printk(). This will also allow future work
      to have function tracing start earlier at boot up.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  12. 04 Mar 2017, 1 commit
  13. 01 Mar 2017, 1 commit
  14. 17 Feb 2017, 1 commit
  15. 03 Feb 2017, 1 commit
  16. 01 Feb 2017, 1 commit
    • fs: Better permission checking for submounts · 93faccbb
      Authored by Eric W. Biederman
      To support unprivileged users mounting filesystems, two permission
      checks have to be performed: a test to see if the user is allowed to
      create a mount in the mount namespace, and a test to see if
      the user is allowed to access the specified filesystem.
      
      The automount case is special in that mounting the original filesystem
      grants permission to mount the sub-filesystems to any user who
      happens to stumble across their mountpoint and satisfies the
      ordinary filesystem permission checks.
      
      Attempting to handle the automount case by using override_creds
      almost works.  It preserves the idea that permission to mount
      the original filesystem is permission to mount the sub-filesystem.
      Unfortunately using override_creds messes up the filesystems
      ordinary permission checks.
      
      Solve this by being explicit that a mount is a submount by introducing
      vfs_submount, and using it where appropriate.
      
      vfs_submount uses a new internal mount flag, MS_SUBMOUNT, to let
      sget and friends know that a mount is a submount so they can take
      appropriate action.
      
      sget and sget_userns are modified to not perform any permission checks
      on submounts.
      
      follow_automount is modified to stop using override_creds as that
      has proven problematic.
      
      do_mount is modified to always remove the new MS_SUBMOUNT flag so
      that we know userspace will never be able to specify it.
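Masking an internal-only flag at the userspace entry point is a tiny but important pattern; a hedged sketch (the flag value and function names are illustrative, not necessarily the kernel's):

```c
#include <assert.h>

/* Internal-only bit: only in-kernel callers (the vfs_submount analogue)
 * may set it.  The value is a placeholder for this demo. */
#define MS_SUBMOUNT_DEMO (1UL << 26)

/* Analogue of the do_mount() change: whatever flags userspace passed,
 * the internal bit is unconditionally cleared before use. */
static unsigned long sanitize_mount_flags(unsigned long flags)
{
    return flags & ~MS_SUBMOUNT_DEMO;
}
```

Because the mask is applied unconditionally on the syscall path, the relaxed permission checks in sget/sget_userns can trust that the flag was set by kernel code.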
      
      autofs4 is modified to stop using current_real_cred that was put in
      there to handle the previous version of submount permission checking.
      
      cifs is modified to pass the mountpoint all of the way down to vfs_submount.
      
      debugfs is modified to pass the mountpoint all of the way down to
      trace_automount by adding a new parameter.  To make this change easier
      a new typedef debugfs_automount_t is introduced to capture the type of
      the debugfs automount function.
      
      Cc: stable@vger.kernel.org
      Fixes: 069d5ac9 ("autofs:  Fix automounts by using current_real_cred()->uid")
      Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds")
      Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  17. 27 Dec 2016, 1 commit
  18. 25 Dec 2016, 1 commit