1. 13 Nov, 2017 · 5 commits
  2. 12 Nov, 2017 · 1 commit
  3. 09 Nov, 2017 · 3 commits
    • sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds · 765cc3a4
      Committed by Patrick Bellasi
      When the kernel is compiled without CONFIG_SCHED_DEBUG, we expect all
      SCHED_FEAT flags to become compile-time constants that the compiler can
      propagate and use for optimization.
      
      Specifically, we expect that code blocks like this:
      
         if (sched_feat(FEATURE_NAME) [&& <other_conditions>]) {
      	/* FEATURE CODE */
         }
      
      are turned into dead code when FEATURE_NAME defaults to FALSE, and are thus
      removed by the compiler from the final image.
      
      For this mechanism to properly work it's required for the compiler to
      have full access, from each translation unit, to whatever is the value
      defined by the sched_feat macro. This macro is defined as:
      
         #define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
      
      and thus, the compiler can optimize that code only if the value of
      sysctl_sched_features is visible within each translation unit.
      
      Since:
      
         029632fb ("sched: Make separate sched*.c translation units")
      
      the scheduler code has been split into separate translation units.
      However, the definition of sysctl_sched_features is part of
      kernel/sched/core.c while, for all the other scheduler modules, it is
      visible only via kernel/sched/sched.h as:
      
         extern const_debug unsigned int sysctl_sched_features
      
      Unfortunately, an extern reference does not allow the compiler to apply
      constant propagation. Thus, on a !CONFIG_SCHED_DEBUG kernel we still end up
      with code that loads a value from memory and (eventually) performs an
      unconditional jump over a chunk of code.
      
      This mechanism is unavoidable when sched_features can be turned on and off at
      run-time. However, this is not the case for "production" kernels compiled with
      !CONFIG_SCHED_DEBUG. In this case, sysctl_sched_features is just a constant value
      which cannot be changed at run-time and thus memory loads and jumps can be
      avoided altogether.
      
      This patch fixes the case of !CONFIG_SCHED_DEBUG kernels by declaring a local
      version of the sysctl_sched_features constant in each translation unit. This
      will ultimately allow the compiler to perform constant propagation and
      dead-code pruning.
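
The effect described above can be illustrated with a minimal user-space sketch. It is not the kernel code: the feature names DEMO_ON and DEMO_OFF and the function pick_path() are hypothetical, and whether a branch is actually dropped depends on the compiler and optimization level.

```c
/* Hypothetical feature indices; the real list lives in the kernel sources. */
enum { __SCHED_FEAT_DEMO_ON, __SCHED_FEAT_DEMO_OFF };

/*
 * A per-translation-unit constant. Because its value is visible here,
 * an optimizing compiler can fold every sched_feat() test at compile time.
 */
static const unsigned int sysctl_sched_features =
	(1U << __SCHED_FEAT_DEMO_ON) | (0U << __SCHED_FEAT_DEMO_OFF);

#define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))

/* Hypothetical caller with one disabled and one enabled feature block. */
int pick_path(int load)
{
	if (sched_feat(DEMO_OFF) && load > 10)
		return 2;	/* candidate for dead-code removal */
	if (sched_feat(DEMO_ON))
		return 1;	/* condition folds to "always true" */
	return 0;
}
```

With an `extern` declaration instead of the `static const` definition, the compiler would have to emit a load and a test for each branch, which is the situation the commit message describes.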
      
      Tests have been done, with !CONFIG_SCHED_DEBUG on a v4.14-rc8 with and without
      the patch, by running 30 iterations of:
      
         perf bench sched messaging --pipe --thread --group 4 --loop 50000
      
      on a 40-core Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz using the
      powersave governor to rule out variations due to frequency scaling.
      
      Statistics on the reported completion time:
      
                         count     mean       std     min       99%     max
        v4.14-rc8         30.0  15.7831  0.176032  15.442  16.01226  16.014
        v4.14-rc8+patch   30.0  15.5033  0.189681  15.232  15.93938  15.962
      
      ... show a 1.8% speedup in average completion time and a 0.5% speedup in
      the 99th percentile.
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Chris Redpath <chris.redpath@arm.com>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171108184101.16006-1-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • stop using '%pK' for /proc/kallsyms pointer values · c0f3ea15
      Committed by Linus Torvalds
      Not only is it annoying to have one single flag for all pointers, as if
      that was a global choice and all kernel pointers are the same, but %pK
      can't get the 'access' vs 'open' time check right anyway.
      
      So make the /proc/kallsyms pointer value code use logic specific to that
      particular file.  We do continue to honor kptr_restrict, but the default
      (which is unrestricted) is changed to instead take expected users into
      account, and restrict access by default.
      
      Right now the only actual expected user is kernel profiling, which has a
      separate sysctl flag for kernel profile access.  There may be others.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • module: export module signature enforcement status · fda784e5
      Committed by Bruno E. O. Meneguele
      A static variable, sig_enforce, is used as a status variable to indicate
      the effective value of CONFIG_MODULE_SIG_FORCE. When that option is set,
      the variable holds true; when it is not set, the variable holds whatever
      value is given by the module.sig_enforce kernel command-line parameter:
      true when =1 and false when =0 or not present.

      Because this command-line parameter takes precedence over the CONFIG
      value when the latter is not set, other places in the kernel could
      misbehave, since they would have only the CONFIG_MODULE_SIG_FORCE value
      to rely on. Exporting this status variable allows the kernel to rely on
      the effective value of module signature enforcement, whether it comes
      from the CONFIG value or from the command-line parameter.
      Signed-off-by: Bruno E. O. Meneguele <brdeoliv@redhat.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
  4. 08 Nov, 2017 · 12 commits
  5. 07 Nov, 2017 · 5 commits
  6. 05 Nov, 2017 · 1 commit
  7. 02 Nov, 2017 · 5 commits
    • futex: futex_wake_op, do not fail on invalid op · e78c38f6
      Committed by Jiri Slaby
      In commit 30d6e0a4 ("futex: Remove duplicated code and fix undefined
      behaviour"), I made FUTEX_WAKE_OP fail on an invalid op, namely when op
      should be treated as a shift and the shift is out of range (< 0 or > 31).
      
      But strace's test suite does this madness:
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);
      
      When I pick the first 0xa0caffee, it decodes as:
      
        0x80000000 & 0xa0caffee: oparg is shift
        0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
        0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
        0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
        0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18
      
      That means the op tries to do this:
      
        (futex |= (1 << (-849))) == -18
      
      which is completely bogus. The new check of op in the code is:
      
              if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
                      if (oparg < 0 || oparg > 31)
                              return -EINVAL;
                      oparg = 1 << oparg;
              }
      
      which obviously results in the "Invalid argument" errno:
      
        FAIL: futex
        ===========
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
        futex.test: failed test: ../futex failed with code 1
      
      So let us soften the failure to print only a (ratelimited) message, crop
      the value and continue as if it were right.  When userspace keeps up, we
      can switch this to return -EINVAL again.
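
The decoding walked through above can be reproduced with a small user-space sketch. The struct and function names are hypothetical, and cropping the shift with `& 31` is an assumption about what "crop the value" means here; the commit message does not spell out the exact masking.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of a FUTEX_WAKE_OP word. */
struct futex_op_fields {
	int shift;	/* bit 31: oparg is a shift count */
	int op;		/* bits 28-30: operation, 2 is FUTEX_OP_OR */
	int cmp;	/* bits 24-27: comparison, 0 is FUTEX_OP_CMP_EQ */
	int oparg;	/* bits 12-23, sign-extended */
	int cmparg;	/* bits 0-11, sign-extended */
};

/* Sign-extend the low 12 bits of v. */
static int sign_extend12(uint32_t v)
{
	return (int)((v & 0xfffu) ^ 0x800u) - 0x800;
}

struct futex_op_fields decode_futex_op(uint32_t encoded)
{
	struct futex_op_fields f;

	f.shift  = (int)((encoded >> 31) & 1u);
	f.op     = (int)((encoded >> 28) & 7u);
	f.cmp    = (int)((encoded >> 24) & 0xfu);
	f.oparg  = sign_extend12(encoded >> 12);
	f.cmparg = sign_extend12(encoded);
	return f;
}

/* Assumed cropping: keep only the low five bits of the shift count. */
unsigned int cropped_shift(int oparg)
{
	return (unsigned int)oparg & 31u;
}
```

For 0xa0caffee this yields oparg = -849 and cmparg = -18, matching the values in the commit message; under the assumed masking, the shift count would be cropped to 15.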
      
      [v2] Do not return 0 immediately, proceed with the cropped value.
      
      Fixes: 30d6e0a4 ("futex: Remove duplicated code and fix undefined behaviour")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <dvhart@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to apply to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • signal: Fix name of SIGEMT in #if defined() check · c3aff086
      Committed by Andrew Clayton
      Commit cc731525 ("signal: Remove kernel interal si_code magic")
      added a check for SIGMET and NSIGEMT being defined. That SIGMET should
      in fact be SIGEMT, with SIGEMT being defined in
      arch/{alpha,mips,sparc}/include/uapi/asm/signal.h
      
      This was actually pointed out by Ben Hutchings in an lwn.net comment
      here: https://lwn.net/Comments/734608/
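
A misspelled macro name inside `#if defined()` fails silently, which is why this kind of typo is easy to miss. The sketch below uses hypothetical DEMO_ macros rather than the real signal definitions so that it behaves the same on every architecture.

```c
/* Hypothetical stand-ins for the arch-specific SIGEMT and NSIGEMT macros. */
#define DEMO_SIGEMT  7
#define DEMO_NSIGEMT 1

/* Typo: DEMO_SIGMET is never defined, so this branch is compiled out. */
int emt_check_with_typo(void)
{
#if defined(DEMO_SIGMET) && defined(DEMO_NSIGEMT)
	return 1;
#else
	return 0;
#endif
}

/* Corrected spelling: both macros are defined, so the branch is kept. */
int emt_check_fixed(void)
{
#if defined(DEMO_SIGEMT) && defined(DEMO_NSIGEMT)
	return 1;
#else
	return 0;
#endif
}
```

The preprocessor emits no diagnostic for the misspelled name; the guarded code is simply dropped.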
      
      Fixes: cc731525 ("signal: Remove kernel interal si_code magic")
      Signed-off-by: Andrew Clayton <andrew@digital-domain.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • watchdog/hardlockup/perf: Use atomics to track in-use cpu counter · 42f930da
      Committed by Don Zickus
      Guenter reported:
        There is still a problem. When running 
          echo 6 > /proc/sys/kernel/watchdog_thresh
          echo 5 > /proc/sys/kernel/watchdog_thresh
        repeatedly, the message
       
         NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
       
        stops after a while (after ~10-30 iterations, with fluctuations).
        Maybe watchdog_cpus needs to be atomic ?
      
      That's correct as this again is affected by the asynchronous nature of the
      smpboot thread unpark mechanism.
      
      CPU 0				CPU1			CPU2
      write(watchdog_thresh, 6)	
        stop()
          park()
        update()
        start()
          unpark()
      				thread->unpark()
      				  cnt++;
      write(watchdog_thresh, 5)				thread->unpark()
        stop()
          park()			thread->park()
      				   cnt--;		  cnt++;
        update()
        start()
          unpark()
      
      That's not a functional problem, it just affects the informational message.
      
      Convert watchdog_cpus to atomic_t to prevent the problem.
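
The idea of an atomic in-use counter can be sketched in user space with C11 atomics. This is only an analogy to the kernel's atomic_t: the function names are hypothetical, and the returned flag stands in for the decision to print the informational message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of CPUs with an active watchdog event; zero-initialized. */
static atomic_int watchdog_cpus;

/*
 * Hypothetical enable hook: returns true only for the caller that moves
 * the counter from 0 to 1, i.e. the one that should print the message.
 * The increment and the read of the new value form one atomic step.
 */
bool watchdog_cpu_enable(void)
{
	return atomic_fetch_add(&watchdog_cpus, 1) + 1 == 1;
}

/* Hypothetical disable hook: drops this CPU from the in-use count. */
void watchdog_cpu_disable(void)
{
	atomic_fetch_sub(&watchdog_cpus, 1);
}
```

With a plain integer, concurrent `cnt++` and `cnt--` from different CPUs can lose updates, so the counter may never return to zero and the "first user" test stops firing.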
      Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com
      
    • watchdog/harclockup/perf: Revert a33d4484 ("watchdog/hardlockup/perf:... · 9c388a5e
      Committed by Thomas Gleixner
      watchdog/harclockup/perf: Revert a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
      
      Guenter reported a crash in the watchdog/perf code, which is caused by
      cleanup() and enable() running concurrently. The reason for this is:
      
      The watchdog functions are serialized via the watchdog_mutex and cpu
      hotplug locking, but the enable of the perf based watchdog happens in
      context of the unpark callback of the smpboot thread. But that unpark
      function is not synchronous inside the locking. The unparking of the thread
      just wakes it up and leaves so there is no guarantee when the thread is
      executing.
      
      If it starts running _before_ the cleanup happened then it will create an
      event and overwrite the dead event pointer. The new event is then cleaned
      up because the event is marked dead.
      
          lock(watchdog_mutex);
          lockup_detector_reconfigure();
              cpus_read_lock();
      	stop();
      	   park()
      	update();
      	start();
      	   unpark()
      	cpus_read_unlock();		thread runs()
      					  overwrite dead event ptr
      	cleanup();
      	  free new event, which is active inside perf....
          unlock(watchdog_mutex);
      
      The park side is safe as that actually waits for the thread to reach
      parked state.
      
      Commit a33d4484 removed the protection against this kind of scenario
      under the stupid assumption that the hotplug serialization and the
      watchdog_mutex cover everything. 
      
      Bring it back.
      
      Reverts: a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
      Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
      
      
  8. 01 Nov, 2017 · 3 commits
  9. 31 Oct, 2017 · 1 commit
  10. 30 Oct, 2017 · 2 commits
    • workqueue: Fix NULL pointer dereference · cef572ad
      Committed by Li Bin
      When queue_work() is used in irq context (not in task context), there is
      a potential case that triggers a NULL pointer dereference.
      ----------------------------------------------------------------
      worker_thread()
      |-spin_lock_irq()
      |-process_one_work()
      	|-worker->current_pwq = pwq
      	|-spin_unlock_irq()
      	|-worker->current_func(work)
      	|-spin_lock_irq()
       	|-worker->current_pwq = NULL
      |-spin_unlock_irq()
      
      				//interrupt here
      				|-irq_handler
      					|-__queue_work()
      						//assuming that the wq is draining
      						|-is_chained_work(wq)
      							|-current_wq_worker()
      							//Here, 'current' is the interrupted worker!
      								|-current->current_pwq is NULL here!
      |-schedule()
      ----------------------------------------------------------------
      
      Avoid it by checking for task context in current_wq_worker(): when
      not in task context, 'current' must not be used to check the
      condition.
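
The shape of the check can be sketched as a user-space simulation. Everything here is a hypothetical stand-in: `sim_current` and `sim_in_task` model the kernel's notion of the current task and of task context, and the struct layouts are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct sim_worker { int id; };

struct sim_task {
	bool is_wq_worker;		/* models a "this task is a worker" flag */
	struct sim_worker *worker;	/* worker data attached to the task */
};

/* Simulated execution state (hypothetical, for illustration only). */
static struct sim_task *sim_current;
static bool sim_in_task = true;

void sim_set_context(struct sim_task *task, bool in_task)
{
	sim_current = task;
	sim_in_task = in_task;
}

/*
 * Only trust the current task when running in task context. In irq
 * context the "current" task is merely whatever got interrupted.
 */
struct sim_worker *sim_current_wq_worker(void)
{
	if (sim_in_task && sim_current && sim_current->is_wq_worker)
		return sim_current->worker;
	return NULL;
}
```

In the interrupted-worker scenario from the diagram, the lookup returns NULL instead of handing back a worker whose current_pwq may already have been cleared.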
      Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 8d03ecfe ("workqueue: reimplement is_chained_work() using current_wq_worker()")
      Cc: stable@vger.kernel.org # v3.9+
    • perf/cgroup: Fix perf cgroup hierarchy support · be96b316
      Committed by Tejun Heo
      The following commit:
      
        864c2357 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
      
      made list_update_cgroup_event() skip setting cpuctx->cgrp if no cgroup event
      targets %current's cgroup.
      
      This breaks perf_event's hierarchical support because events which target one
      of the ancestors get ignored.
      
      Fix it by using a cgroup_is_descendant() test instead of an equality test.
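
The difference between an equality test and a descendant test can be sketched on a toy hierarchy. The `demo_cgroup` type and `demo_is_descendant()` are hypothetical; the sketch assumes, as the kernel helper does, that a group counts as a descendant of itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal hierarchy node: each group knows only its parent. */
struct demo_cgroup {
	struct demo_cgroup *parent;
};

/*
 * Walk from cg up to the root and report whether ancestor is on the path.
 * A group is treated as a descendant of itself.
 */
bool demo_is_descendant(const struct demo_cgroup *cg,
			const struct demo_cgroup *ancestor)
{
	for (; cg != NULL; cg = cg->parent)
		if (cg == ancestor)
			return true;
	return false;
}
```

An event that targets an ancestor group matches a task in a leaf group under the descendant test, whereas a plain `==` comparison would ignore it.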
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Cc: stable@vger.kernel.org # v4.9+
      Fixes: 864c2357 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
      Link: http://lkml.kernel.org/r/20171028164237.GA972780@devbig577.frc2.facebook.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 29 Oct, 2017 · 2 commits
    • genirq: Document vcpu_info usage for percpu_devid interrupts · 250a53d6
      Committed by Christoffer Dall
      It is currently unclear how to set the VCPU affinity for a percpu_devid
      interrupt, since the Linux irq_data structure describes the state for
      multiple interrupts, one for each physical CPU on the system.  Since
      each such interrupt can be associated with different VCPUs or none at
      all, associating a single VCPU state with such an interrupt does not
      capture the necessary semantics.
      
      The implementers of irq_set_affinity are the Intel and AMD IOMMUs, and
      the ARM GIC irqchip.  The Intel and AMD callers do not appear to use
      percpu_devid interrupts, and the ARM GIC implementation only checks the
      pointer against NULL vs. non-NULL.
      
      Therefore, simply update the function documentation to explain the
      expected use in the context of percpu_devid interrupts, allowing future
      changes or additions to irqchip implementers to do the right thing.
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: kvm@vger.kernel.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Eric Auger <eric.auger@redhat.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lkml.kernel.org/r/1509093281-15225-13-git-send-email-cdall@linaro.org
    • bpf: rename sk_actions to align with bpf infrastructure · bfa64075
      Committed by John Fastabend
      Recent additions to support multiple programs in cgroups impose
      a strict requirement, "all yes is yes, any no is no". To enforce
      this, the infrastructure requires the 'no' return code, SK_DROP in
      this case, to be 0.
      
      To apply these rules to SK_SKB program types the sk_actions return
      codes need to be adjusted.
      
      This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
      SK_ABORTED to remove any chance that the API may allow aborted
      program flows to be passed up the stack. This would be incorrect
      behavior and allow programs to break existing policies.
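
Why the 'no' code has to be 0 becomes clear once several verdicts are combined with a bitwise AND. The sketch below is a user-space illustration only: the enum mirrors the two values named in the commit message, while the program type, the two sample programs and `demo_run_progs()` are hypothetical.

```c
/* The 'no' verdict must be 0 so that AND-ing verdicts lets any 'no' win. */
enum demo_sk_action {
	DEMO_SK_DROP = 0,
	DEMO_SK_PASS = 1,
};

/* Hypothetical program type: inspects a packet length, returns a verdict. */
typedef enum demo_sk_action (*demo_prog_fn)(int pkt_len);

enum demo_sk_action demo_allow_all(int pkt_len)
{
	(void)pkt_len;
	return DEMO_SK_PASS;
}

enum demo_sk_action demo_drop_large(int pkt_len)
{
	return pkt_len > 1500 ? DEMO_SK_DROP : DEMO_SK_PASS;
}

/* "All yes is yes, any no is no": AND together every program's verdict. */
enum demo_sk_action demo_run_progs(const demo_prog_fn *progs, int n,
				   int pkt_len)
{
	unsigned int verdict = DEMO_SK_PASS;

	for (int i = 0; i < n; i++)
		verdict &= (unsigned int)progs[i](pkt_len);
	return verdict ? DEMO_SK_PASS : DEMO_SK_DROP;
}
```

If the drop code were nonzero, a single dropping program could be masked by the others, which is the policy break the commit message warns about.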
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>