1. 03 Aug 2016, 10 commits
  2. 29 Jul 2016, 9 commits
  3. 27 Jul 2016, 4 commits
  4. 26 Jul 2016, 2 commits
    • bpf: Add bpf_probe_write_user BPF helper to be called in tracers · 96ae5227
      Sargun Dhillon committed
      This allows user memory to be written to during the course of a kprobe.
      It shouldn't be used to implement any kind of security mechanism
      because of TOC-TOU attacks, but rather to debug, divert, and
      manipulate execution of semi-cooperative processes.
      
      Although it uses probe_kernel_write, we limit the address space
      the probe can write into by checking the space with access_ok.
      We do this as opposed to calling copy_to_user directly, in order
      to avoid sleeping. In addition we ensure the thread's current fs /
      segment is USER_DS and the thread is neither exiting nor a kernel
      thread.
      
      Given this feature is meant for experiments, and it carries a risk
      of crashing the system and running programs, we print a warning,
      along with the pid and process name, when a program that attempts to
      use this helper is installed.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
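
      A minimal usage sketch in restricted BPF C, in the spirit of the
      samples/bpf test that accompanied this change; the map name, attach
      point and section names here are illustrative assumptions rather
      than the patch's exact sample code:

      /* Hypothetical kprobe program: on sys_connect, look up a replacement
       * sockaddr in a map and write it back into the caller's own user
       * memory with bpf_probe_write_user(). */
      #include <uapi/linux/bpf.h>
      #include <linux/ptrace.h>
      #include <linux/in.h>
      #include "bpf_helpers.h"

      struct bpf_map_def SEC("maps") dnat_map = {
              .type        = BPF_MAP_TYPE_HASH,
              .key_size    = sizeof(struct sockaddr_in),
              .value_size  = sizeof(struct sockaddr_in),
              .max_entries = 256,
      };

      SEC("kprobe/sys_connect")
      int rewrite_connect_addr(struct pt_regs *ctx)
      {
              struct sockaddr_in orig = {};
              struct sockaddr_in *mapped;
              void *uaddr = (void *)PT_REGS_PARM2(ctx);
              int addrlen = (int)PT_REGS_PARM3(ctx);

              if (addrlen > sizeof(orig))
                      return 0;
              /* Read the user-supplied sockaddr ... */
              if (bpf_probe_read(&orig, sizeof(orig), uaddr) != 0)
                      return 0;
              /* ... and, if a mapping exists, overwrite it in user memory. */
              mapped = bpf_map_lookup_elem(&dnat_map, &orig);
              if (mapped)
                      bpf_probe_write_user(uaddr, mapped, sizeof(*mapped));
              return 0;
      }

      char _license[] SEC("license") = "GPL";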
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Daniel Borkmann committed
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances not only the destination but also the
      source buffer by the number of bytes written. For __output_custom()
      this is wrong, since in that case the source buffer points to a
      non-linear object, in our case an skb, which the copy_func helper is
      supposed to walk. Since the source is non-linear, we need to pass the
      offset into the helper so that copy_func can use it to extract the
      data from the source object.
      
      Therefore, adjust the callback signatures accordingly and pass the
      offset into the skb_header_pointer() invoked from the bpf_skb_copy()
      callback. __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate two
      things: i) to pass in whether we should advance the source buffer or
      not, which is a compile-time constant condition, and ii) to pass in
      the offset for __output_custom(), which we do with the help of
      __VA_ARGS__, so everything can stay inlined as it currently is. Both
      changes allow the __output_* fast-path helpers to be adapted without
      extra overhead.
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
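
      For illustration, a sketch of the adjusted bpf_skb_copy() callback as
      described above; it follows net/core/filter.c of that period closely,
      but treat the details as approximate rather than the exact patch:

      static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
                                        unsigned long off, unsigned long len)
      {
              /* Walk the (possibly non-linear) skb starting at 'off' and
               * pull 'len' bytes into dst_buff. */
              void *ptr = skb_header_pointer((const struct sk_buff *)skb,
                                             off, len, dst_buff);

              if (unlikely(!ptr))
                      return len;     /* number of bytes we failed to copy */
              if (ptr != dst_buff)    /* data was linear; copy it over */
                      memcpy(dst_buff, ptr, len);

              return 0;
      }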
  5. 22 Jul 2016, 2 commits
    • PM / hibernate: Introduce test_resume mode for hibernation · fe12c00d
      Chen Yu committed
      The test_resume mode verifies whether the snapshot data written to
      the swap device can be successfully restored to memory. It is useful
      for debugging hibernation, since this mode bypasses not only the
      BIOS/bootloader but also system re-initialization.
      
      To avoid the risk of breaking the filesystem on persistent storage,
      this patch resumes the image with tasks frozen.
      
      For example:
      echo test_resume > /sys/power/disk
      echo disk > /sys/power/state
      
      [  187.306470] PM: Image saving progress:  70%
      [  187.395298] PM: Image saving progress:  80%
      [  187.476697] PM: Image saving progress:  90%
      [  187.554641] PM: Image saving done.
      [  187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
      [  187.566000] PM: S|
      [  187.589742] PM: Basic memory bitmaps freed
      [  187.594694] PM: Checking hibernation image
      [  187.599865] PM: Image signature found, resuming
      [  187.605209] PM: Loading hibernation image.
      [  187.665753] PM: Basic memory bitmaps created
      [  187.691397] PM: Using 3 thread(s) for decompression.
      [  187.691397] PM: Loading and decompressing image data (148650 pages)...
      [  187.889719] PM: Image loading progress:   0%
      [  188.100452] PM: Image loading progress:  10%
      [  188.244781] PM: Image loading progress:  20%
      [  189.057305] PM: Image loading done.
      [  189.068793] PM: Image successfully loaded
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: schedutil: map raw required frequency to driver frequency · 5cbea469
      Steve Muckle committed
      The slow-path frequency transition path is relatively expensive, as it
      requires waking up a thread to do the work. Should support be added
      for remote CPU cpufreq updates, that would also be expensive since it
      requires an IPI. These activities should be avoided if they are not
      necessary.
      
      To that end, calculate the actual driver-supported frequency required by
      the new utilization value in schedutil by using the recently added
      cpufreq_driver_resolve_freq API. If it is the same as the previously
      requested driver frequency then there is no need to continue with the
      update assuming the cpu frequency limits have not changed. This will
      have additional benefits should the semantics of the rate limit be
      changed to apply solely to frequency transitions rather than to
      frequency calculations in schedutil.
      
      The last raw required frequency is cached. This allows the driver
      frequency lookup to be skipped in the event that the new raw required
      frequency matches the last one, assuming a frequency update has not been
      forced due to limits changing (indicated by a next_freq value of
      UINT_MAX, see sugov_should_update_freq).
      Signed-off-by: Steve Muckle <smuckle@linaro.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
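
      A rough sketch of the two-level check described above; names follow
      kernel/sched/cpufreq_schedutil.c of that period, but consider this an
      illustration rather than the exact patch:

      static unsigned int get_next_freq(struct sugov_cpu *sg_cpu,
                                        unsigned long util, unsigned long max)
      {
              struct sugov_policy *sg_policy = sg_cpu->sg_policy;
              struct cpufreq_policy *policy = sg_policy->policy;
              unsigned int freq = arch_scale_freq_invariant() ?
                                  policy->cpuinfo.max_freq : policy->cur;

              /* Raw required frequency derived from the utilization value. */
              freq = (freq + (freq >> 2)) * util / max;

              /* If the raw value is unchanged and no update was forced
               * (next_freq == UINT_MAX marks forced updates), reuse the
               * previously resolved driver frequency. */
              if (freq == sg_cpu->cached_raw_freq &&
                  sg_policy->next_freq != UINT_MAX)
                      return sg_policy->next_freq;

              sg_cpu->cached_raw_freq = freq;
              /* Map the raw value onto a frequency the driver can actually set. */
              return cpufreq_driver_resolve_freq(policy, freq);
      }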
  6. 21 7月, 2016 1 次提交
    • audit: fix a double fetch in audit_log_single_execve_arg() · 43761473
      Paul Moore committed
      There is a double fetch problem in audit_log_single_execve_arg()
      where we first check the execve(2) arguments for any "bad" characters
      which would require hex encoding and then re-fetch the arguments for
      logging in the audit record [1].  Of course this leaves a window of
      opportunity for an unsavory application to tamper with the data.
      
      This patch reworks things by only fetching the argument data once [2]
      into a buffer where it is scanned and logged into the audit
      record(s).  In addition to fixing the double fetch, this patch
      improves on the original code in a few other ways: better handling
      of large arguments which require encoding, stricter record length
      checking, and some performance improvements (completely unverified,
      but we got rid of some strlen() calls, and that's got to be a good
      thing).
      
      As part of the development of this patch, I've also created a basic
      regression test for the audit-testsuite; the test can be tracked on
      GitHub at the following link:
      
       * https://github.com/linux-audit/audit-testsuite/issues/25
      
      [1] If you pay careful attention, there is actually a triple fetch
      problem due to a strnlen_user() call at the top of the function.
      
      [2] This is a tiny white lie: we do make a call to strnlen_user()
      prior to fetching the argument data.  I don't like it, but due to the
      way the audit record is structured we really have no choice unless we
      copy the entire argument at once (which would require a rather
      wasteful allocation).  The good news is that with this patch the
      kernel no longer relies on this strnlen_user() value for anything
      beyond recording it in the log, and we also update it with a
      trustworthy value whenever possible.
      Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
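
      A heavily simplified, hypothetical sketch of the single-fetch pattern
      this moves to (names are invented for illustration; the real logic
      lives in audit_log_single_execve_arg() and is considerably more
      involved):

      /* Fetch the argument exactly once into a kernel buffer; both the
       * "does it need hex encoding?" scan and the record logging then
       * operate on this copy, so user space cannot change the data
       * between the check and the use. */
      static long audit_fetch_arg_once(char *buf, size_t buf_len,
                                       const char __user *uptr)
      {
              long len = strncpy_from_user(buf, uptr, buf_len);

              if (len < 0)
                      return -EFAULT;
              return len;     /* bytes copied, excluding the NUL terminator */
      }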
  7. 20 Jul 2016, 5 commits
  8. 19 Jul 2016, 3 commits
  9. 17 Jul 2016, 1 commit
  10. 16 Jul 2016, 3 commits
    • bpf: avoid stack copy and use skb ctx for event output · 555c8a86
      Daniel Borkmann committed
      This work addresses a couple of issues the bpf_skb_event_output()
      helper currently has: i) We need two copies instead of just a
      single one for the skb data when it should be part of a sample.
      The data can be non-linear and thus needs to be extracted via
      bpf_skb_load_bytes() helper first, and then copied once again
      into the ring buffer slot. ii) Since bpf_skb_load_bytes()
      currently needs to be used first, the helper needs to see a
      constant size on the passed stack buffer to make sure BPF
      verifier can do sanity checks on it during verification time.
      Thus, just passing skb->len (or any other non-constant value)
      wouldn't work, but changing bpf_skb_load_bytes() is also not
      the proper solution, since the two copies are generally still
      needed. iii) bpf_skb_load_bytes() is just for rather small
      buffers like headers, since they need to sit on the limited
      BPF stack anyway. Instead of working around this in
      bpf_skb_load_bytes(), this work improves the bpf_skb_event_output()
      helper to address all three issues at once.
      
      We can make use of the passed in skb context that we have in
      the helper anyway, and use some of the reserved flag bits as
      a length argument. The helper will use the new __output_custom()
      facility from perf side with bpf_skb_copy() as callback helper
      to walk and extract the data. It will pass the data for setup
      to bpf_event_output(), which generates and pushes the raw record
      with an additional frag part. The linear data used in the first
      frag of the record serves as programmatically defined meta data
      passed along with the appended sample.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
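
      A hedged usage sketch from the BPF program side: the upper bits of
      the flags word (covered by BPF_F_CTXLEN_MASK) carry the number of skb
      bytes to append, while the skb context itself is handed to the helper
      so the kernel-side copy callback can walk it. Map and section names
      here are illustrative assumptions:

      #include <uapi/linux/bpf.h>
      #include "bpf_helpers.h"

      struct bpf_map_def SEC("maps") events = {
              .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
              .key_size    = sizeof(int),
              .value_size  = sizeof(__u32),
              .max_entries = 64,   /* sized to the number of CPUs by user space */
      };

      SEC("classifier")
      int sample_skb(struct __sk_buff *skb)
      {
              /* Linear meta data that becomes the first fragment ... */
              struct { __u32 ifindex; __u32 len; } meta = {
                      .ifindex = skb->ifindex,
                      .len     = skb->len,
              };
              /* ... followed by skb->len bytes of packet data, extracted by
               * the kernel directly from the (possibly non-linear) skb. */
              __u64 flags = BPF_F_CURRENT_CPU |
                            (((__u64)skb->len << 32) & BPF_F_CTXLEN_MASK);

              bpf_perf_event_output(skb, &events, flags, &meta, sizeof(meta));
              return 0;
      }

      char _license[] SEC("license") = "GPL";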
    • bpf, perf: split bpf_perf_event_output · 8e7a3920
      Daniel Borkmann committed
      As a preparation, split the bpf_perf_event_output() helper into two
      parts. The new bpf_perf_event_output() will prepare the raw record
      itself and test for unknown flags from the BPF trace context, while
      __bpf_perf_event_output() does the core work. The latter will be
      reused later on from bpf_event_output() directly.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
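
      The rough shape of the split (signatures abridged and partly assumed;
      an outline rather than the exact code):

      /* Inner part: shared core that sets up the perf sample data and
       * emits the raw record; later reusable from bpf_event_output(). */
      static __always_inline u64
      __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
                              u64 flags, struct perf_raw_record *raw);

      /* Outer part: the helper entry point seen by BPF trace programs.
       * It rejects unknown flag bits, builds the raw record and defers
       * to the inner function. */
      static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
      {
              struct perf_raw_record raw = {
                      .frag = {
                              .size = size,
                              .data = (void *)(long)r4,
                      },
              };

              if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
                      return -EINVAL;

              return __bpf_perf_event_output((struct pt_regs *)(long)r1,
                                             (struct bpf_map *)(long)r2,
                                             flags, &raw);
      }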
    • perf, events: add non-linear data support for raw records · 7e3f977e
      Daniel Borkmann committed
      This patch adds support for non-linear data on raw records. It
      extends raw records to have one or multiple fragments that will
      be written linearly into the ring slot, where each fragment can
      optionally have a custom callback handler to walk and extract
      complex, possibly non-linear data.
      
      If a callback handler is provided for a fragment, then the new
      __output_custom() will be used instead of __output_copy() for
      the perf_output_sample() part. perf_prepare_sample() does all
      the size calculation only once, so perf_output_sample() doesn't
      need to redo the same work anymore, meaning real_size and padding
      will be cached in the raw record. The raw record becomes 32 bytes
      in size without holes; to avoid increasing it further and to avoid
      doing unnecessary recalculations in the fast path, we can reuse the
      next pointer of the last fragment, an idea borrowed from
      ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
      path for PERF_SAMPLE_RAW minimal.
      
      This facility is needed for BPF's event output helper as a first
      user that will, in a follow-up, add an additional perf_raw_frag
      to its perf_raw_record in order to be able to more efficiently
      dump skb context after a linear head meta data related to it.
      skbs can be non-linear and thus need a custom output function to
      dump buffers. Currently, the skb data needs to be copied twice;
      with the help of __output_custom() this work only needs to be
      done once. Future users could be things like XDP/BPF programs
      that work on different context though and would thus also have
      a different callback function.
      
      The few users of raw records are adapted to initialize their frag
      data from the raw record itself; there is no change in behavior for
      them. The code is based upon a PoC diff provided by Peter Zijlstra [1].
      
        [1] http://thread.gmane.org/gmane.linux.network/421294
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
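
      For reference, a sketch of the fragment layout this introduces,
      mirroring the structures added to include/linux/perf_event.h
      (comments added here; the copy callback is shown with the offset
      parameter that the later aa7145c1 fix above adds):

      typedef unsigned long (*perf_copy_f)(void *dst, const void *src,
                                           unsigned long off, unsigned long len);

      struct perf_raw_frag {
              union {
                      struct perf_raw_frag    *next;  /* next fragment; the last
                                                       * fragment's pointer is
                                                       * reused ZERO_OR_NULL_PTR()-
                                                       * style */
                      unsigned long           pad;
              };
              perf_copy_f                     copy;   /* optional custom walker,
                                                       * e.g. bpf_skb_copy() */
              void                            *data;  /* linear buffer or, with a
                                                       * copy callback, an skb */
              u32                             size;
      } __packed;

      struct perf_raw_record {
              struct perf_raw_frag            frag;
              u32                             size;   /* total size incl. padding,
                                                       * computed once in
                                                       * perf_prepare_sample() */
      };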