1. 13 12月, 2013 3 次提交
    • D
      X.509: Fix certificate gathering · d7ec435f
      David Howells 提交于
      Fix the gathering of certificates from both the source tree and the build tree
      to correctly calculate the pathnames of all the certificates.
      
      The problem was that if the default generated cert, signing_key.x509, didn't
      exist then it would not have a path attached and if it did, it would have a
      path attached.
      
      This means that the contents of kernel/.x509.list would change between the
      first compilation in a directory and the second.  After the second it would
      remain stable because the signing_key.x509 file exists.
      
      The consequence was that the kernel would get relinked unconditionally on the
      second recompilation.  The second recompilation would also show something like
      this:
      
         X.509 certificate list changed
           CERTS   kernel/x509_certificate_list
           - Including cert /home/torvalds/v2.6/linux/signing_key.x509
           AS      kernel/system_certificates.o
           LD      kernel/built-in.o
      
      which is why the relink would happen.
      
      
      Unfortunately, it isn't a simple matter of just sticking a path on the front
      of the filename of the certificate in the build directory as make can't then
      work out how to build it.
      
      So the path has to be prepended to the name for sorting and duplicate
      elimination and then removed for the make rule if it is in the build tree.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d7ec435f
    • L
      futex: move user address verification up to common code · 5cdec2d8
      Linus Torvalds 提交于
      When debugging the read-only hugepage case, I was confused by the fact
      that get_futex_key() did an access_ok() only for the non-shared futex
      case, since the user address checking really isn't in any way specific
      to the private key handling.
      
      Now, it turns out that the shared key handling does effectively do the
      equivalent checks inside get_user_pages_fast() (it doesn't actually
      check the address range on x86, but does check the page protections for
      being a user page).  So it wasn't actually a bug, but the fact that we
      treat the address differently for private and shared futexes threw me
      for a loop.
      
      Just move the check up, so that it gets done for both cases.  Also, use
      the 'rw' parameter for the type, even if it doesn't actually matter any
      more (it's a historical artifact of the old racy i386 "page faults from
      kernel space don't check write protections").
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5cdec2d8
    • L
      futex: fix handling of read-only-mapped hugepages · f12d5bfc
      Linus Torvalds 提交于
      The hugepage code had the exact same bug that regular pages had in
      commit 7485d0d3 ("futexes: Remove rw parameter from
      get_futex_key()").
      
      The regular page case was fixed by commit 9ea71503 ("futex: Fix
      regression with read only mappings"), but the transparent hugepage case
      (added in a5b338f2: "thp: update futex compound knowledge") case
      remained broken.
      
      Found by Dave Jones and his trinity tool.
      Reported-and-tested-by: NDave Jones <davej@fedoraproject.org>
      Cc: stable@kernel.org # v2.6.38+
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f12d5bfc
  2. 11 12月, 2013 2 次提交
    • H
      KEYS: correct alignment of system_certificate_list content in assembly file · 62226983
      Hendrik Brueckner 提交于
      Apart from data-type specific alignment constraints, there are also
      architecture-specific alignment requirements.
      For example, on s390 symbols must be on even addresses implying a 2-byte
      alignment.  If the system_certificate_list_end symbol is on an odd address
      and if this address is loaded, the least-significant bit is ignored.  As a
      result, the load_system_certificate_list() fails to load the certificates
      because of a wrong certificate length calculation.
      
      To be safe, align system_certificate_list on an 8-byte boundary.  Also improve
      the length calculation of the system_certificate_list content.  Introduce a
      system_certificate_list_size (8-byte aligned because of unsigned long) variable
      that stores the length.  Let the linker calculate this size by introducing
      a start and end label for the certificate content.
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      62226983
    • R
      Ignore generated file kernel/x509_certificate_list · 7cfe5b33
      Rusty Russell 提交于
      $ git status
      # On branch pending-rebases
      # Untracked files:
      #   (use "git add <file>..." to include in what will be committed)
      #
      #	kernel/x509_certificate_list
      nothing added to commit but untracked files present (use "git add" to track)
      $
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7cfe5b33
  3. 06 12月, 2013 1 次提交
  4. 29 11月, 2013 2 次提交
  5. 28 11月, 2013 2 次提交
    • T
      cgroup: fix cgroup_subsys_state leak for seq_files · e605b365
      Tejun Heo 提交于
      If a cgroup file implements either read_map() or read_seq_string(),
      such file is served using seq_file by overriding file->f_op to
      cgroup_seqfile_operations, which also overrides the release method to
      single_release() from cgroup_file_release().
      
      Because cgroup_file_open() didn't use to acquire any resources, this
      used to be fine, but since f7d58818 ("cgroup: pin
      cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open()
      pins the css (cgroup_subsys_state) which is put by
      cgroup_file_release().  The patch forgot to update the release path
      for seq_files and each open/release cycle leaks a css reference.
      
      Fix it by updating cgroup_file_release() to also handle seq_files and
      using it for seq_file release path too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v3.12
      e605b365
    • P
      cpuset: Fix memory allocator deadlock · 0fc0287c
      Peter Zijlstra 提交于
      Juri hit the below lockdep report:
      
      [    4.303391] ======================================================
      [    4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
      [    4.303394] 3.12.0-dl-peterz+ #144 Not tainted
      [    4.303395] ------------------------------------------------------
      [    4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [    4.303399]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
      [    4.303417]
      [    4.303417] and this task is already holding:
      [    4.303418]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
      [    4.303431] which would create a new lock dependency:
      [    4.303432]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
      [    4.303436]
      
      [    4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
      [    4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
      [    4.303922]    HARDIRQ-ON-W at:
      [    4.303923]                     [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
      [    4.303926]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303929]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303931]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303933]    SOFTIRQ-ON-W at:
      [    4.303933]                     [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
      [    4.303935]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303940]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303955]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303959]    INITIAL USE at:
      [    4.303960]                    [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
      [    4.303963]                    [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303966]                    [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303969]                    [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303972]  }
      
      Which reports that we take mems_allowed_seq with interrupts enabled. A
      little digging found that this can only be from
      cpuset_change_task_nodemask().
      
      This is an actual deadlock because an interrupt doing an allocation will
      hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
      forever waiting for the write side to complete.
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Reported-by: NJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Tested-by: NJuri Lelli <juri.lelli@gmail.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      0fc0287c
  6. 27 11月, 2013 1 次提交
  7. 26 11月, 2013 2 次提交
    • S
      ftrace: Fix function graph with loading of modules · 8a56d776
      Steven Rostedt (Red Hat) 提交于
      Commit 8c4f3c3f "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above mentioned commit didn't go far enough. Well, it covered the
      function tracer by adding checks in __register_ftrace_function(). The
      problem is that the function graph tracer circumvents that (for a slight
      efficiency gain when function graph trace is running with a function
      tracer. The gain was not worth this).
      
      The problem came with ftrace_startup() which should always be called after
      __register_ftrace_function(), if you want this bug to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
      Reported-by: NDave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace")
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8a56d776
    • L
      irq: Enable all irqs unconditionally in irq_resume · ac01810c
      Laxman Dewangan 提交于
      When the system enters suspend, it disables all interrupts in
      suspend_device_irqs(), including the interrupts marked EARLY_RESUME.
      
      On the resume side things are different. The EARLY_RESUME interrupts
      are reenabled in sys_core_ops->resume and the non EARLY_RESUME
      interrupts are reenabled in the normal system resume path.
      
      When suspend_noirq() failed or suspend is aborted for any other
      reason, we might omit the resume side call to sys_core_ops->resume()
      and therefor the interrupts marked EARLY_RESUME are not reenabled and
      stay disabled forever.
      
      To solve this, enable all irqs unconditionally in irq_resume()
      regardless whether interrupts marked EARLY_RESUMEhave been already
      enabled or not.
      
      This might try to reenable already enabled interrupts in the non
      failure case, but the only affected platform is XEN and it has been
      confirmed that it does not cause any side effects.
      
      [ tglx: Massaged changelog. ]
      Signed-off-by: NLaxman Dewangan <ldewangan@nvidia.com>
      Acked-by-and-tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NHeiko Stuebner <heiko@sntech.de>
      Reviewed-by: NPavel Machek <pavel@ucw.cz>
      Cc: <ian.campbell@citrix.com>
      Cc: <rjw@rjwysocki.net>
      Cc: <len.brown@intel.com>
      Cc: <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      ac01810c
  8. 23 11月, 2013 6 次提交
    • L
      workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues · 4e8b22bd
      Li Bin 提交于
      When one work starts execution, the high bits of work's data contain
      pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID
      is assigned WORK_OFFQ_POOL_NONE when the work being initialized
      indicating that no pool is associated and get_work_pool() uses it to
      check the associated pool. So if worker_pool_assign_id() assigns a
      ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers
      leakage, and it may break the non-reentrance guarantee.
      
      This patch fix this issue by modifying the worker_pool_assign_id()
      function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE.
      
      Furthermore, in the current implementation, the BUILD_BUG_ON() in
      init_workqueues makes no sense. The number of worker pools needed
      cannot be determined at compile time, because the number of backing
      pools for UNBOUND workqueues is dynamic based on the assigned custom
      attributes. So remove it.
      
      tj: Minor comment and indentation updates.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4e8b22bd
    • L
      workqueue: fix comment typo for __queue_work() · 9ef28a73
      Li Bin 提交于
      It seems the "dying" should be "draining" here.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9ef28a73
    • T
      workqueue: fix ordered workqueues in NUMA setups · 8a2b7538
      Tejun Heo 提交于
      An ordered workqueue implements execution ordering by using single
      pool_workqueue with max_active == 1.  On a given pool_workqueue, work
      items are processed in FIFO order and limiting max_active to 1
      enforces the queued work items to be processed one by one.
      
      Unfortunately, 4c16bd32 ("workqueue: implement NUMA affinity for
      unbound workqueues") accidentally broke this guarantee by applying
      NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
      workqueue would end up with separate pool_workqueues for different
      nodes.  Each pool_workqueue still limits max_active to 1 but multiple
      work items may be executed concurrently and out of order depending on
      which node they are queued to.
      
      Fix it by using dedicated ordered_wq_attrs[] when creating ordered
      workqueues.  The new attrs match the unbound ones except that no_numa
      is always set thus forcing all NUMA nodes to share the default
      pool_workqueue.
      
      While at it, add sanity check in workqueue creation path which
      verifies that an ordered workqueues has only the default
      pool_workqueue.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLibin <huawei.libin@huawei.com>
      Cc: stable@vger.kernel.org
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      8a2b7538
    • O
      workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY · 91151228
      Oleg Nesterov 提交于
      Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
      in create_worker(). Otherwise userland can change ->cpus_allowed
      in between.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      91151228
    • T
      cgroup: use a dedicated workqueue for cgroup destruction · e5fca243
      Tejun Heo 提交于
      Since be445626 ("cgroup: remove synchronize_rcu() from
      cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
      freeing is performed from a work item from that point on and a later
      commit, ea15f8cc ("cgroup: split cgroup destruction into two
      steps"), moves css offlining to workqueue too.
      
      As cgroup destruction isn't depended upon for memory reclaim, the
      destruction work items were put on the system_wq; unfortunately, some
      controller may block in the destruction path for considerable duration
      while holding cgroup_mutex.  As large part of destruction path is
      synchronized through cgroup_mutex, when combined with high rate of
      cgroup removals, this has potential to fill up system_wq's max_active
      of 256.
      
      Also, it turns out that memcg's css destruction path ends up queueing
      and waiting for work items on system_wq through work_on_cpu().  If
      such operation happens while system_wq is fully occupied by cgroup
      destruction work items, work_on_cpu() can't make forward progress
      because system_wq is full and other destruction work items on
      system_wq can't make forward progress because the work item waiting
      for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
      
      This can be fixed by queueing destruction work items on a separate
      workqueue.  This patch creates a dedicated workqueue -
      cgroup_destroy_wq - for this purpose.  As these work items shouldn't
      have inter-dependencies and mostly serialized by cgroup_mutex anyway,
      giving high concurrency level doesn't buy anything and the workqueue's
      @max_active is set to 1 so that destruction work items are executed
      one by one on each CPU.
      
      Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
      cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
      separate core_initcall().  In the future, we probably want to reorder
      so that workqueue init happens before cgroup_init().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NHugh Dickins <hughd@google.com>
      Reported-by: NShawn Bohrer <shawn.bohrer@gmail.com>
      Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
      Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
      Cc: stable@vger.kernel.org # v3.9+
      e5fca243
    • M
      time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD · 4be77398
      Martin Schwidefsky 提交于
      Since commit 1e75fa8b (time: Condense timekeeper.xtime
      into xtime_sec - merged in v3.6), there has been an problem
      with the error accounting in the timekeeping code, such that
      when truncating to nanoseconds, we round up to the next nsec,
      but the balancing adjustment to the ntp_error value was dropped.
      
      This causes 1ns per tick drift forward of the clock.
      
      In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD
      architectures (s390, ia64, powerpc).
      
      The fix is simply to balance the accounting and to subtract the
      added nanosecond from ntp_error. This allows the internal long-term
      clock steering to keep the clock accurate.
      
      While this fix removes the regression added in 1e75fa8b, the
      ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD
      and use the new VSYSCALL method, which avoids entirely the
      nanosecond granular rounding, and the resulting short-term clock
      adjustment oscillation needed to keep long term accurate time.
      
      [ jstultz: Many thanks to Martin for his efforts identifying this
        	   subtle bug, and providing the fix. ]
      
      Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable <stable@vger.kernel.org>  #v3.6+
      Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      4be77398
  9. 20 11月, 2013 5 次提交
  10. 19 11月, 2013 6 次提交
  11. 16 11月, 2013 1 次提交
  12. 15 11月, 2013 9 次提交