1. 03 10月, 2014 1 次提交
  2. 25 9月, 2014 2 次提交
    • Z
      cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags · 2ad654bc
      Zefan Li 提交于
      When we change cpuset.memory_spread_{page,slab}, cpuset will flip
      PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
      This should be done using atomic bitops, but currently we don't,
      which is broken.
      
      Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
      when one thread tried to clear PF_USED_MATH while at the same time another
      thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
      the same task.
      
      Here's the full report:
      https://lkml.org/lkml/2014/9/19/230
      
      To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
      
      v4:
      - updated mm/slab.c. (Fengguang Wu)
      - updated Documentation.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time")
      Cc: <stable@vger.kernel.org> # 2.6.31+
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2ad654bc
    • R
      Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()" · 5c4dd348
      Rafael J. Wysocki 提交于
      Revert commit 6efde38f (PM / Hibernate: Iterate over set bits
      instead of PFNs in swsusp_free()) that introduced a NULL pointer
      dereference during system resume from hibernation:
      
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
      PGD b39c2067 PUD b39c1067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: <irrelevant list of modules>
      CPU: 1 PID: 4898 Comm: s2disk Tainted: G         C     3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1
      Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
      task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000
      RIP: 0010:[<ffffffff810a8cc1>]  [<ffffffff810a8cc1>]
      swsusp_free+0x21/0x190
      RSP: 0018:ffff8800b3b17ea8  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001
      RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000
      RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000
      R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590
      FS:  00007f406d83b700(0000) GS:ffff88023bc80000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0
      Stack:
       ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a
       ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000
       ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007
      Call Trace:
       [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0
       [<ffffffff811a955a>] ? __fput+0xca/0x1d0
       [<ffffffff81080627>] ? task_work_run+0x97/0xd0
       [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0
       [<ffffffff8151452a>] ? int_signal+0x12/0x17
      Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40
      RIP  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
       RSP <ffff8800b3b17ea8>
      CR2: 0000000000000000
      ---[ end trace f02be86a1ec0cccb ]---
      
      due to forbidden_pages_map being NULL in swsusp_free().
      
      Fixes: 6efde38f "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
      Reported-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5c4dd348
  3. 19 9月, 2014 1 次提交
  4. 13 9月, 2014 5 次提交
    • R
      alarmtimer: Lock k_itimer during timer callback · 474e941b
      Richard Larocque 提交于
      Locks the k_itimer's it_lock member when handling the alarm timer's
      expiry callback.
      
      The regular posix timers defined in posix-timers.c have this lock held
      during timout processing because their callbacks are routed through
      posix_timer_fn().  The alarm timers follow a different path, so they
      ought to grab the lock somewhere else.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: NRichard Larocque <rlarocque@google.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      474e941b
    • R
      alarmtimer: Do not signal SIGEV_NONE timers · 265b81d2
      Richard Larocque 提交于
      Avoids sending a signal to alarm timers created with sigev_notify set to
      SIGEV_NONE by checking for that special case in the timeout callback.
      
      The regular posix timers avoid sending signals to SIGEV_NONE timers by
      not scheduling any callbacks for them in the first place.  Although it
      would be possible to do something similar for alarm timers, it's simpler
      to handle this as a special case in the timeout.
      
      Prior to this patch, the alarm timer would ignore the sigev_notify value
      and try to deliver signals to the process anyway.  Even worse, the
      sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
      specified, so the signal number could be bogus.  If sigev_signo was an
      unitialized value (as it often would be if SIGEV_NONE is used), then
      it's hard to predict which signal will be sent.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: NRichard Larocque <rlarocque@google.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      265b81d2
    • R
      alarmtimer: Return relative times in timer_gettime · e86fea76
      Richard Larocque 提交于
      Returns the time remaining for an alarm timer, rather than the time at
      which it is scheduled to expire.  If the timer has already expired or it
      is not currently scheduled, the it_value's members are set to zero.
      
      This new behavior matches that of the other posix-timers and the POSIX
      specifications.
      
      This is a change in user-visible behavior, and may break existing
      applications.  Hopefully, few users rely on the old incorrect behavior.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: NRichard Larocque <rlarocque@google.com>
      [jstultz: minor style tweak]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e86fea76
    • A
      jiffies: Fix timeval conversion to jiffies · d78c9300
      Andrew Hunter 提交于
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
      
      Tested: the following program:
      
      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
          return 0;
      }
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: NPaul Turner <pjt@google.com>
      Reported-by: NAaron Jacobs <jacobsa@google.com>
      Signed-off-by: NAndrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      d78c9300
    • T
      futex: Unlock hb->lock in futex_wait_requeue_pi() error path · 13c42c2f
      Thomas Gleixner 提交于
      futex_wait_requeue_pi() calls futex_wait_setup(). If
      futex_wait_setup() succeeds it returns with hb->lock held and
      preemption disabled. Now the sanity check after this does:
      
              if (match_futex(&q.key, &key2)) {
      	   	ret = -EINVAL;
      		goto out_put_keys;
      	}
      
      which releases the keys but does not release hb->lock.
      
      So we happily return to user space with hb->lock held and therefor
      preemption disabled.
      
      Unlock hb->lock before taking the exit route.
      Reported-by: NDave "Trinity" Jones <davej@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDarren Hart <dvhart@linux.intel.com>
      Reviewed-by: NDavidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      13c42c2f
  5. 11 9月, 2014 2 次提交
    • R
      kcmp: fix standard comparison bug · acbbe6fb
      Rasmus Villemoes 提交于
      The C operator <= defines a perfectly fine total ordering on the set of
      values representable in a long.  However, unlike its namesake in the
      integers, it is not translation invariant, meaning that we do not have
      "b <= c" iff "a+b <= a+c" for all a,b,c.
      
      This means that it is always wrong to try to boil down the relationship
      between two longs to a question about the sign of their difference,
      because the resulting relation [a LEQ b iff a-b <= 0] is neither
      anti-symmetric or transitive.  The former is due to -LONG_MIN==LONG_MIN
      (take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
      b).  The latter can either be seen observing that x LEQ x+1 for all x,
      implying x LEQ x+1 LEQ x+2 ...  LEQ x-1 LEQ x; or more directly with the
      simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
      0.
      
      Note that it makes absolutely no difference that a transmogrying bijection
      has been applied before the comparison is done.  In fact, had the
      obfuscation not been done, one could probably not observe the bug
      (assuming all values being compared always lie in one half of the address
      space, the mathematical value of a-b is always representable in a long).
      As it stands, one can easily obtain three file descriptors exhibiting the
      non-transitivity of kcmp().
      
      Side note 1: I can't see that ensuring the MSB of the multiplier is
      set serves any purpose other than obfuscating the obfuscating code.
      
      Side note 2:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <assert.h>
      #include <sys/syscall.h>
      
      enum kcmp_type {
              KCMP_FILE,
              KCMP_VM,
              KCMP_FILES,
              KCMP_FS,
              KCMP_SIGHAND,
              KCMP_IO,
              KCMP_SYSVSEM,
              KCMP_TYPES,
      };
      pid_t pid;
      
      int kcmp(pid_t pid1, pid_t pid2, int type,
      	 unsigned long idx1, unsigned long idx2)
      {
      	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
      }
      int cmp_fd(int fd1, int fd2)
      {
      	int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2);
      	if (c < 0) {
      		perror("kcmp");
      		exit(1);
      	}
      	assert(0 <= c && c < 3);
      	return c;
      }
      int cmp_fdp(const void *a, const void *b)
      {
      	static const int normalize[] = {0, -1, 1};
      	return normalize[cmp_fd(*(int*)a, *(int*)b)];
      }
      #define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */
      int main(int argc, char *argv[])
      {
      	int r, s, count = 0;
      	int REL[3] = {0,0,0};
      	int fd[MAX];
      	pid = getpid();
      	while (count < MAX) {
      		r = open("/dev/null", O_RDONLY);
      		if (r < 0)
      			break;
      		fd[count++] = r;
      	}
      	printf("opened %d file descriptors\n", count);
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	qsort(fd, count, sizeof(fd[0]), cmp_fdp);
      	memset(REL, 0, sizeof(REL));
      
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	return (REL[0] + REL[2] != 0);
      }
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
      "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      acbbe6fb
    • P
      kernel/printk/printk.c: fix faulty logic in the case of recursive printk · 000a7d66
      Patrick Palka 提交于
      We shouldn't set text_len in the code path that detects printk recursion
      because text_len corresponds to the length of the string inside textbuf.
      A few lines down from the line
      
          text_len = strlen(recursion_msg);
      
      is the line
      
          text_len += vscnprintf(text + text_len, ...);
      
      So if printk detects recursion, it sets text_len to 29 (the length of
      recursion_msg) and logs an error.  Then the message supplied by the
      caller of printk is stored inside textbuf but offset by 29 bytes.  This
      means that the output of the recursive call to printk will contain 29
      bytes of garbage in front of it.
      
      This defect is caused by commit 458df9fd ("printk: remove separate
      printk_sched buffers and use printk buf instead") which turned the line
      
          text_len = vscnprintf(text, ...);
      
      into
      
          text_len += vscnprintf(text + text_len, ...);
      
      To fix this, this patch avoids setting text_len when logging the printk
      recursion error.  This patch also marks unlikely() the branch leading up
      to this code.
      
      Fixes: 458df9fd ("printk: remove separate printk_sched buffers and use printk buf instead")
      Signed-off-by: NPatrick Palka <patrick@parcs.ath.cx>
      Reviewed-by: NPetr Mladek <pmladek@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      000a7d66
  6. 09 9月, 2014 1 次提交
  7. 06 9月, 2014 2 次提交
  8. 05 9月, 2014 3 次提交
    • F
      nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Frederic Weisbecker 提交于
      The local nohz kick is currently used by perf which needs it to be
      NMI-safe. Recent commit though (7d1311b9)
      changed its implementation to fire the local kick using the remote kick
      API. It was convenient to make the code more generic but the remote kick
      isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Lets fix this by restoring the use of local irq work for the nohz local
      kick.
      Reported-by: NCatalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      40bea039
    • L
      cgroup: check cgroup liveliness before unbreaking kernfs · aa32362f
      Li Zefan 提交于
      When cgroup_kn_lock_live() is called through some kernfs operation and
      another thread is calling cgroup_rmdir(), we'll trigger the warning in
      cgroup_get().
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0()
      ...
      Call Trace:
       [<c16ee73d>] dump_stack+0x41/0x52
       [<c10468ef>] warn_slowpath_common+0x7f/0xa0
       [<c104692d>] warn_slowpath_null+0x1d/0x20
       [<c10bb999>] cgroup_get+0x89/0xa0
       [<c10bbe58>] cgroup_kn_lock_live+0x28/0x70
       [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230
       [<c10be5b2>] cgroup_tasks_write+0x12/0x20
       [<c10bb7b0>] cgroup_file_write+0x40/0x130
       [<c11aee71>] kernfs_fop_write+0xd1/0x160
       [<c1148e58>] vfs_write+0x98/0x1e0
       [<c114934d>] SyS_write+0x4d/0xa0
       [<c16f656b>] sysenter_do_call+0x12/0x12
      ---[ end trace 6f2e0c38c2108a74 ]---
      
      Fix this by calling css_tryget() instead of cgroup_get().
      
      v2:
      - move cgroup_tryget() right below cgroup_get() definition. (Tejun)
      
      Cc: <stable@vger.kernel.org> # 3.15+
      Reported-by: NToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      aa32362f
    • L
      cgroup: delay the clearing of cgrp->kn->priv · a4189487
      Li Zefan 提交于
      Run these two scripts concurrently:
      
          for ((; ;))
          {
              mkdir /cgroup/sub
              rmdir /cgroup/sub
          }
      
          for ((; ;))
          {
              echo $$ > /cgroup/sub/cgroup.procs
              echo $$ > /cgroup/cgroup.procs
          }
      
      A kernel bug will be triggered:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000038
      IP: [<c10bbd69>] cgroup_put+0x9/0x80
      ...
      Call Trace:
       [<c10bbe19>] cgroup_kn_unlock+0x39/0x50
       [<c10bbe91>] cgroup_kn_lock_live+0x61/0x70
       [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230
       [<c10be5b2>] cgroup_tasks_write+0x12/0x20
       [<c10bb7b0>] cgroup_file_write+0x40/0x130
       [<c11aee71>] kernfs_fop_write+0xd1/0x160
       [<c1148e58>] vfs_write+0x98/0x1e0
       [<c114934d>] SyS_write+0x4d/0xa0
       [<c16f656b>] sysenter_do_call+0x12/0x12
      
      We clear cgrp->kn->priv in the end of cgroup_rmdir(), but another
      concurrent thread can access kn->priv after the clearing.
      
      We should move the clearing to css_release_work_fn(). At that time
      no one is holding reference to the cgroup and no one can gain a new
      reference to access it.
      
      v2:
      - move RCU_INIT_POINTER() into the else block. (Tejun)
      - remove the cgroup_parent() check. (Tejun)
      - update the comment in css_tryget_online_from_dir().
      
      Cc: <stable@vger.kernel.org> # 3.15+
      Reported-by: NToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a4189487
  9. 03 9月, 2014 1 次提交
  10. 30 8月, 2014 2 次提交
    • V
      kexec: create a new config option CONFIG_KEXEC_FILE for new syscall · 74ca317c
      Vivek Goyal 提交于
      Currently new system call kexec_file_load() and all the associated code
      compiles if CONFIG_KEXEC=y.  But new syscall also compiles purgatory
      code which currently uses gcc option -mcmodel=large.  This option seems
      to be available only gcc 4.4 onwards.
      
      Hiding new functionality behind a new config option will not break
      existing users of old gcc.  Those who wish to enable new functionality
      will require new gcc.  Having said that, I am trying to figure out how
      can I move away from using -mcmodel=large but that can take a while.
      
      I think there are other advantages of introducing this new config
      option.  As this option will be enabled only on x86_64, other arches
      don't have to compile generic kexec code which will never be used.  This
      new code selects CRYPTO=y and CRYPTO_SHA256=y.  And all other arches had
      to do this for CONFIG_KEXEC.  Now with introduction of new config
      option, we can remove crypto dependency from other arches.
      
      Now CONFIG_KEXEC_FILE is available only on x86_64.  So whereever I had
      CONFIG_X86_64 defined, I got rid of that.
      
      For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
      "depends on CRYPTO=y".  This should be safer as "select" is not
      recursive.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Tested-by: NShaun Ruffell <sruffell@digium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74ca317c
    • V
      resource: fix the case of null pointer access · 800df627
      Vivek Goyal 提交于
      Richard and Daniel reported that UML is broken due to changes to
      resource traversal functions.  Problem is that iomem_resource.child can
      be null and new code does not consider that possibility.  Old code used
      a for loop and that loop will not even execute if p was null.
      
      Revert back to for() loop logic and bail out if p is null.
      
      I also moved sibling_only check out of resource_lock. There is no
      reason to keep it inside the lock.
      
      Following is backtrace of the UML crash.
      
      RIP: 0033:[<0000000060039b9f>]
      RSP: 0000000081459da0  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000219b3fff RCX: 000000006010d1d9
      RDX: 0000000000000001 RSI: 00000000602dfb94 RDI: 0000000081459df8
      RBP: 0000000081459de0 R08: 00000000601b59f4 R09: ffffffff0000ff00
      R10: ffffffff0000ff00 R11: 0000000081459e88 R12: 0000000081459df8
      R13: 00000000219b3fff R14: 00000000602dfb94 R15: 0000000000000000
      Kernel panic - not syncing: Segfault with no mm
      CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-10454-g58d08e3b #13
      Stack:
       00000000 000080d0 81459df0 219b3fff
       81459e70 6010d1d9 ffffffff 6033e010
       81459e50 6003a269 81459e30 00000000
      Call Trace:
       [<6010d1d9>] ? kclist_add_private+0x0/0xe7
       [<6003a269>] walk_system_ram_range+0x61/0xb7
       [<6000e859>] ? proc_kcore_init+0x0/0xf1
       [<6010d574>] kcore_update_ram+0x4c/0x168
       [<6010d72e>] ? kclist_add+0x0/0x2e
       [<6000e943>] proc_kcore_init+0xea/0xf1
       [<6000e859>] ? proc_kcore_init+0x0/0xf1
       [<6000e859>] ? proc_kcore_init+0x0/0xf1
       [<600189f0>] do_one_initcall+0x13c/0x204
       [<6004ca46>] ? parse_args+0x1df/0x2e0
       [<6004c82d>] ? parameq+0x0/0x3a
       [<601b5990>] ? strcpy+0x0/0x18
       [<60001e1a>] kernel_init_freeable+0x240/0x31e
       [<6026f1c0>] kernel_init+0x12/0x148
       [<60019fad>] new_thread_handler+0x81/0xa3
      
      Fixes 8c86e70a ("resource: provide new functions to walk
      through resources").
      Reported-by: NDaniel Walter <sahne@0x90.at>
      Tested-by: NRichard Weinberger <richard@nod.at>
      Tested-by: NToralf Förster <toralf.foerster@gmx.de>
      Tested-by: NDaniel Walter <sahne@0x90.at>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      800df627
  11. 28 8月, 2014 1 次提交
  12. 26 8月, 2014 2 次提交
  13. 23 8月, 2014 6 次提交
    • S
      ftrace: Use current addr when converting to nop in __ftrace_replace_code() · 39b5552c
      Steven Rostedt (Red Hat) 提交于
      In __ftrace_replace_code(), when converting the call to a nop in a function
      it needs to compare against the "curr" (current) value of the ftrace ops, and
      not the "new" one. It currently does not affect x86 which is the only arch
      to do the trampolines with function graph tracer, but when other archs that do
      depend on this code implement the function graph trampoline, it can crash.
      
      Here's an example when ARM uses the trampolines (in the future):
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
       Modules linked in: omap_rng rng_core ipv6
       CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28-dirty #52
       [<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
       [<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
       [<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
       [<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
       [<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
       [<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
       [<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
       [<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
       [<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
       [<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
       [<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
       [<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
       [<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
       ---[ end trace dc9ce72c5b617d8f ]---
      [   65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
      [   65.054070]  actual: 85:1b:00:eb
      
      Fixes: 7413af1f "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      39b5552c
    • S
      ftrace: Fix function_profiler and function tracer together · 5f151b24
      Steven Rostedt (Red Hat) 提交于
      The latest rewrite of ftrace removed the separate ftrace_ops of
      the function tracer and the function graph tracer and had them
      share the same ftrace_ops. This simplified the accounting by removing
      the multiple layers of functions called, where the global_ops func
      would call a special list that would iterate over the other ops that
      were registered within it (like function and function graph), which
      itself was registered to the ftrace ops list of all functions
      currently active. If that sounds confusing, the code that implemented
      it was also confusing and its removal is a good thing.
      
      The problem with this change was that it assumed that the function
      and function graph tracer can never be used at the same time.
      This is mostly true, but there is an exception. That is when the
      function profiler uses the function graph tracer to profile.
      The function profiler can be activated the same time as the function
      tracer, and this breaks the assumption and the result is that ftrace
      will crash (it detects the error and shuts itself down, it does not
      cause a kernel oops).
      
      To solve this issue, a previous change allowed the hash tables
      for the functions traced by a ftrace_ops to be a pointer and let
      multiple ftrace_ops share the same hash. This allows the function
      and function_graph tracer to have separate ftrace_ops, but still
      share the hash, which is what is done.
      
      Now the function and function graph tracers have separate ftrace_ops
      again, and the function tracer can be run while the function_profile
      is active.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5f151b24
    • S
      ftrace: Fix up trampoline accounting with looping on hash ops · bce0b6c5
      Steven Rostedt (Red Hat) 提交于
      Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec
      the rec->flags by more than once (one per those that share the ftrace_hash).
      This means that the tramp_hash may not have a hash item when it was added.
      
      For example, if two ftrace_ops share a hash for a ftrace record, and the
      first ops has a trampoline, when it adds itself it will set the rec->flags
      TRAMP flag and increments its nr_trampolines counter. When the second ops
      is added, it must clear that tramp flag but also decrement the other ops
      that shares its hash. As the update to the function callbacks has not yet
      been performed, the other ops will not have the tramp hash set yet and it
      can not be used to know to decrement its nr_trampolines.
      
      Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
      held, a ops with a trampoline to a record during an update of another ops
      that shares the record will have its func_hash pointing to it. Since a
      trampoline can only be set for a record if only one ops is attached to it,
      we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
      is set) and then find the ops that has this record in its hashes.
      
      Also added some output to help debug when things go wrong.
      
      Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bce0b6c5
    • S
      ftrace: Update all ftrace_ops for a ftrace_hash_ops update · 84261912
      Steven Rostedt (Red Hat) 提交于
      When updating what an ftrace_ops traces, if it is registered (that is,
      actively tracing), and that ftrace_ops uses the shared global_ops
      local_hash, then we need to update all tracers that are active and
      also share the global_ops' ftrace_hash_ops.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      84261912
    • V
      cgroup: Display legacy cgroup files on default hierarchy · fa8137be
      Vivek Goyal 提交于
      Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
      legacy cgroup files to show up on default hierarhcy if susbsystem does
      not have any files defined for default hierarchy.
      
      But this seems to be working only if legacy files are defined in
      ss->legacy_cftypes. If one adds some cftypes later using
      cgroup_add_legacy_cftypes(), these files don't show up on default
      hierarchy.  Update the function accordingly so that the dynamically
      added legacy files also show up in the default hierarchy if the target
      subsystem is also using the base legacy files for the default
      hierarchy.
      
      tj: Patch description and comment updates.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      fa8137be
    • S
      ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) 提交于
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users uses the function tracing ftrace_ops
      this causes a conflict and when a user enables both function profiling
      as well as the function tracer it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      33b7f99c
  14. 20 8月, 2014 1 次提交
  15. 18 8月, 2014 1 次提交
  16. 16 8月, 2014 1 次提交
  17. 15 8月, 2014 1 次提交
  18. 13 8月, 2014 1 次提交
  19. 12 8月, 2014 1 次提交
  20. 09 8月, 2014 5 次提交
    • V
      kexec: verify the signature of signed PE bzImage · 8e7d8381
      Vivek Goyal 提交于
      This is the final piece of the puzzle of verifying kernel image signature
      during kexec_file_load() syscall.
      
      This patch calls into PE file routines to verify signature of bzImage.  If
      signature are valid, kexec_file_load() succeeds otherwise it fails.
      
      Two new config options have been introduced.  First one is
      CONFIG_KEXEC_VERIFY_SIG.  This option enforces that kernel has to be
      validly signed otherwise kernel load will fail.  If this option is not
      set, no signature verification will be done.  Only exception will be when
      secureboot is enabled.  In that case signature verification should be
      automatically enforced when secureboot is enabled.  But that will happen
      when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      
      I tested these patches with both "pesign" and "sbsign" signed bzImages.
      
      I used signing_key.priv key and signing_key.x509 cert for signing as
      generated during kernel build process (if module signing is enabled).
      
      Used following method to sign bzImage.
      
      pesign
      ======
      - Convert DER format cert to PEM format cert
      openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform
      PEM
      
      - Generate a .p12 file from existing cert and private key file
      openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in
      signing_key.x509.PEM
      
      - Import .p12 file into pesign db
      pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign
      
      - Sign bzImage
      pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign
      -c "Glacier signing key - Magrathea" -s
      
      sbsign
      ======
      sbsign --key signing_key.priv --cert signing_key.x509.PEM --output
      /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+
      
      Patch details:
      
      Well all the hard work is done in previous patches.  Now bzImage loader
      has just call into that code and verify whether bzImage signature are
      valid or not.
      
      Also create two config options.  First one is CONFIG_KEXEC_VERIFY_SIG.
      This option enforces that kernel has to be validly signed otherwise kernel
      load will fail.  If this option is not set, no signature verification will
      be done.  Only exception will be when secureboot is enabled.  In that case
      signature verification should be automatically enforced when secureboot is
      enabled.  But that will happen when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e7d8381
    • V
      kexec: support for kexec on panic using new system call · dd5f7260
      Vivek Goyal 提交于
      This patch adds support for loading a kexec on panic (kdump) kernel usning
      new system call.
      
      It prepares ELF headers for memory areas to be dumped and for saved cpu
      registers.  Also prepares the memory map for second kernel and limits its
      boot to reserved areas only.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd5f7260
    • V
      kexec-bzImage64: support for loading bzImage using 64bit entry · 27f48d3e
      Vivek Goyal 提交于
      This is loader specific code which can load bzImage and set it up for
      64bit entry.  This does not take care of 32bit entry or real mode entry.
      
      32bit mode entry can be implemented if somebody needs it.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27f48d3e
    • V
      kexec: load and relocate purgatory at kernel load time · 12db5562
      Vivek Goyal 提交于
      Load purgatory code in RAM and relocate it based on the location.
      Relocation code has been inspired by module relocation code and purgatory
      relocation code in kexec-tools.
      
      Also compute the checksums of loaded kexec segments and store them in
      purgatory.
      
      Arch independent code provides this functionality so that arch dependent
      bootloaders can make use of it.
      
      Helper functions are provided to get/set symbol values in purgatory which
      are used by bootloaders later to set things like stack and entry point of
      second kernel etc.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12db5562
    • V
      kexec: implementation of new syscall kexec_file_load · cb105258
      Vivek Goyal 提交于
      Previous patch provided the interface definition and this patch prvides
      implementation of new syscall.
      
      Previously segment list was prepared in user space.  Now user space just
      passes kernel fd, initrd fd and command line and kernel will create a
      segment list internally.
      
      This patch contains generic part of the code.  Actual segment preparation
      and loading is done by arch and image specific loader.  Which comes in
      next patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb105258