1. 29 Nov 2012, 1 commit
  2. 13 Oct 2012, 1 commit
  3. 09 Oct 2012, 1 commit
  4. 06 Oct 2012, 1 commit
  5. 03 Oct 2012, 1 commit
  6. 01 Oct 2012, 1 commit
    • preparation for generic kernel_thread() · 2aa3a7f8
      Committed by Al Viro
      Let architectures select GENERIC_KERNEL_THREAD and have their copy_thread()
      treat NULL regs as "it came from kernel_thread(), sp argument contains
      the function new thread will be calling and stack_size - the argument for
      that function".  Switching the architectures begins shortly...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
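      To make the convention concrete, here is a schematic copy_thread()
      under GENERIC_KERNEL_THREAD (an illustrative sketch; the helper
      names are placeholders, not any architecture's actual code):

          static int copy_thread(unsigned long clone_flags, unsigned long sp,
                                 unsigned long stack_size, struct task_struct *p,
                                 struct pt_regs *regs)
          {
                  if (!regs) {
                          /* From kernel_thread(): sp is the function the new
                           * thread will call, stack_size is its argument. */
                          setup_kthread_frame(p, (void *)sp,
                                              (void *)stack_size); /* placeholder */
                          return 0;
                  }
                  /* Ordinary fork/clone: duplicate the user register state. */
                  setup_user_frame(p, regs, sp); /* placeholder */
                  return 0;
          }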
  7. 26 Sep 2012, 1 commit
    • rcu: Switch task's syscall hooks on context switch · 04e7e951
      Committed by Frederic Weisbecker
      Clear the syscalls hook of a task when it's scheduled out so that if
      the task migrates, it doesn't run the syscall slow path on a CPU
      that might not need it.
      
      Also set the syscalls hook on the next task if needed.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
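      The mechanism amounts to moving a per-task thread flag from the
      outgoing to the incoming task at context switch. A hedged sketch of
      the idea (the flag and the predicate are placeholders for whatever
      the adaptive-tick code actually uses; set_tsk_thread_flag() and
      clear_tsk_thread_flag() are the real kernel primitives):

          static void switch_syscall_hooks(struct task_struct *prev,
                                           struct task_struct *next)
          {
                  /* Stop running the syscall slow path for the outgoing task,
                   * so it doesn't drag the hook along if it migrates. */
                  clear_tsk_thread_flag(prev, TIF_SYSCALL_HOOK); /* placeholder */

                  /* Arm the hook on the incoming task only if this CPU needs it. */
                  if (cpu_needs_syscall_hooks(smp_processor_id())) /* placeholder */
                          set_tsk_thread_flag(next, TIF_SYSCALL_HOOK);
          }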
  8. 25 Sep 2012, 1 commit
    • net: use a per task frag allocator · 5640f768
      Committed by Eric Dumazet
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      It's done to increase the probability of coalescing small write()s
      into single segments in skbs still in the write queue (not yet sent).

      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

      It's also quite inefficient to build TSO 64KB packets, because we need
      about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
      page allocator more often than wanted.
      
      This patch adds a per-task frag allocator and uses bigger pages if
      available. An automatic fallback is done in case of memory pressure.
      
      (up to 32768 bytes per frag, that's order-3 pages on x86)
      
      This increases TCP stream performance by 20% on loopback device,
      but also benefits on other network devices, since 8x less frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.
      
      It's possible some SG-enabled hardware can't cope with bigger
      fragments, but their ndo_start_xmit() should already handle this,
      splitting a fragment into sub-fragments, since some arches have
      PAGE_SIZE=65536.
      
      Successfully tested on various ethernet devices.
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
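      The allocator pattern is simple enough to sketch in plain C: one
      buffer plus an offset per task, fragments carved from the front, and
      a refill with a smaller fallback size under memory pressure. This is
      a userspace illustration of the idea only - the kernel version works
      on refcounted pages, while this sketch just frees the exhausted
      buffer:

          #include <stdlib.h>

          struct frag_cache {
                  char *page;     /* current backing buffer ("page") */
                  size_t size;    /* its size; ideally order-3 (32768 bytes) */
                  size_t offset;  /* first free byte */
          };

          static void *frag_alloc(struct frag_cache *fc, size_t len)
          {
                  void *ret;

                  if (!fc->page || fc->offset + len > fc->size) {
                          size_t want = 32768;       /* order-3 pages on x86 */
                          char *p = malloc(want);

                          if (!p) {                  /* memory pressure: fall back */
                                  want = 4096;
                                  p = malloc(want);
                                  if (!p)
                                          return NULL;
                          }
                          free(fc->page);            /* kernel: put_page() instead */
                          fc->page = p;
                          fc->size = want;
                          fc->offset = 0;
                  }
                  if (fc->offset + len > fc->size)   /* request bigger than a page */
                          return NULL;
                  ret = fc->page + fc->offset;
                  fc->offset += len;
                  return ret;
          }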
  9. 18 Sep 2012, 1 commit
    • userns: Convert the audit loginuid to be a kuid · e1760bd5
      Committed by Eric W. Biederman
      Always store audit loginuids in type kuid_t.
      
      Print loginuids by converting them into uids in the appropriate user
      namespace, and then printing the resulting uid.
      
      Modify audit_get_loginuid to return a kuid_t.
      
      Modify audit_set_loginuid to take a kuid_t.
      
      Modify /proc/<pid>/loginuid on read to convert the loginuid into the
      user namespace of the opener of the file.
      
      Modify /proc/<pid>/loginuid on write to convert the loginuid
      from the user namespace of the opener of the file.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
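      The conversions at the /proc boundary follow the standard kuid_t
      pattern; roughly (a sketch of the pattern using the real uidgid.h
      helpers, not the actual fs/proc code - file_ns, task_loginuid and
      written_uid are placeholder names):

          /* Read: translate the internal kuid into the opener's namespace. */
          uid_t shown = from_kuid(file_ns, task_loginuid);  /* kuid_t -> uid_t */

          /* Write: translate the opener's value back into a kuid. */
          kuid_t stored = make_kuid(file_ns, written_uid);  /* uid_t -> kuid_t */
          if (!uid_valid(stored))
                  return -EINVAL;  /* no mapping in this namespace */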
  10. 17 Sep 2012, 1 commit
  11. 15 Sep 2012, 1 commit
    • uprobes: Introduce MMF_RECALC_UPROBES · 9f68f672
      Committed by Oleg Nesterov
      Add the new MMF_RECALC_UPROBES flag; it means that MMF_HAS_UPROBES
      can be a false positive after remove_breakpoint() or uprobe_munmap().
      It is also set by uprobe_dup_mmap(); this is not optimal but simple.
      We could add the new hook, uprobe_dup_vma(), to set MMF_HAS_UPROBES
      only if the new mm actually has uprobes, but I don't think this
      makes sense.
      
      The next patch will use this flag to clear MMF_HAS_UPROBES.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
  12. 13 Sep 2012, 2 commits
  13. 29 Aug 2012, 1 commit
    • uprobes: Introduce MMF_HAS_UPROBES · f8ac4ec9
      Committed by Oleg Nesterov
      Add the new MMF_HAS_UPROBES flag. It is set by install_breakpoint()
      and it is copied by dup_mmap(), uprobe_pre_sstep_notifier() checks
      it to avoid the slow path if the task was never probed. Perhaps it
      makes sense to check it in valid_vma(is_register => false) as well.
      
      This needs the new dup_mmap()->uprobe_dup_mmap() hook. We can't use
      uprobe_reset_state() or put MMF_HAS_UPROBES into MMF_INIT_MASK, we
      need oldmm->mmap_sem to avoid the race with uprobe_register() or
      mmap() from another thread.
      
      Currently we never clear this bit, it can be false-positive after
      uprobe_unregister() or uprobe_munmap() or if dup_mmap() hits the
      probed VM_DONTCOPY vma. But this is fine correctness-wise and has
      no effect unless the task hits the non-uprobe breakpoint.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
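      Both flags are just bits in mm->flags, so the fast path described in
      these two entries is a bit test; schematically (a sketch of the
      pattern with the standard bitops, not the exact uprobes hunks):

          /* install_breakpoint(): remember this mm has (or had) uprobes. */
          set_bit(MMF_HAS_UPROBES, &mm->flags);

          /* uprobe_pre_sstep_notifier(): skip the slow path if never probed. */
          if (!test_bit(MMF_HAS_UPROBES, &current->mm->flags))
                  return 0;

          /* remove_breakpoint()/uprobe_munmap(): the bit may now be stale;
           * mark it for re-evaluation (see MMF_RECALC_UPROBES above). */
          set_bit(MMF_RECALC_UPROBES, &mm->flags);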
  14. 14 Aug 2012, 1 commit
  15. 09 Aug 2012, 1 commit
  16. 01 Aug 2012, 2 commits
    • mm: allow PF_MEMALLOC from softirq context · 907aed48
      Committed by Mel Gorman
      This is needed to allow network softirq packet processing to make use of
      PF_MEMALLOC.
      
      Currently softirq context cannot use PF_MEMALLOC because it is not
      associated with a task, and therefore has no task flags to fiddle with
      - thus the gfp-to-alloc-flags mapping ignores the task flags when in
      interrupt (hard or soft) context.
      
      Allowing softirqs to make use of PF_MEMALLOC therefore requires some
      trickery.  This patch borrows the task flags from whatever process happens
      to be preempted by the softirq.  It then modifies the gfp to alloc flags
      mapping to not exclude task flags in softirq context, and modifies the
      softirq code to save, clear, and restore the PF_MEMALLOC flag.
      
      The save and clear ensure the preempted task's PF_MEMALLOC flag doesn't
      leak into the softirq.  The restore ensures a softirq's PF_MEMALLOC flag
      cannot leak back into the preempted process.  This should be safe for
      the following reasons:

      Softirqs can run on multiple CPUs, but the same task should not be
              executing the same softirq code. Nor should the softirq
              handler be preempted by any other softirq handler, so the flags
              should not leak to an unrelated softirq.
      
      Softirqs re-enable hardware interrupts in __do_softirq(), so they can
              be preempted by hardware interrupts and PF_MEMALLOC is inherited
              by the hard IRQ. However, this is similar to a process in
              reclaim being preempted by a hardirq: while PF_MEMALLOC is
              set, gfp_to_alloc_flags() distinguishes between hard and
              soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
              flag.
      
      If the softirq is deferred to ksoftirqd then its flags may be used
              instead of a normal task's, but as the softirq cannot be preempted,
              the PF_MEMALLOC flag does not leak to other code by accident.
      
      [davem@davemloft.net: Document why PF_MEMALLOC is safe]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
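      The save/clear/restore dance is the heart of the patch; schematically
      (a sketch of the pattern, not the literal kernel/softirq.c change -
      handle_pending_softirqs() is a placeholder for the softirq loop):

          unsigned long pflags = current->flags & PF_MEMALLOC;

          current->flags &= ~PF_MEMALLOC;  /* don't inherit the task's flag */
          handle_pending_softirqs();       /* placeholder */
          /* Drop anything the softirq set, put back what the task had. */
          current->flags = (current->flags & ~PF_MEMALLOC) | pflags;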
    • memcg: rename config variables · c255a458
      Committed by Andrew Morton
      Sanity:
      
      CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
      CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM
      
      [mhocko@suse.cz: fix missed bits]
      Cc: Glauber Costa <glommer@parallels.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 31 Jul 2012, 2 commits
    • NMI watchdog: fix for lockup detector breakage on resume · 45226e94
      Committed by Sameer Nanda
      On the suspend/resume path the boot CPU does not go through an
      offline->online transition.  This breaks the NMI detector post-resume
      since it depends on PMU state that is lost when the system gets
      suspended.
      
      Fix this by forcing a CPU offline->online transition for the lockup
      detector on the boot CPU during resume.
      
      To provide more context, we enable the NMI watchdog on Chrome OS.  We
      have seen several reports of systems freezing up completely, which
      indicated that the NMI watchdog was not firing for some reason.
      
      Debugging further, we found a simple way of repro'ing system freezes --
      issuing the command 'taskset 1 sh -c "echo nmilockup > /proc/breakme"'
      after the system has been suspended/resumed one or more times.
      
      With this patch in place, the system freeze results in panics, as
      expected.
      
      These panics provide a nice stack trace for us to debug the actual issue
      causing the freeze.
      
      [akpm@linux-foundation.org: fiddle with code comment]
      [akpm@linux-foundation.org: make lockup_detector_bootcpu_resume() conditional on CONFIG_SUSPEND]
      [akpm@linux-foundation.org: fix section errors]
      Signed-off-by: Sameer Nanda <snanda@chromium.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
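      The shape of the fix, per the changelog notes above, is a forced
      disable/re-enable of the lockup detector on the boot CPU in the
      resume path (a heavily hedged sketch - only the hook name comes from
      the changelog above; the body and helper names are illustrative):

          #ifdef CONFIG_SUSPEND
          void lockup_detector_bootcpu_resume(void)
          {
                  /* Replay the offline->online transition the boot CPU never
                   * took, so the watchdog reprograms its lost PMU state. */
                  watchdog_cpu_disable(0);  /* illustrative names */
                  watchdog_cpu_enable(0);
          }
          #endif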
    • coredump: warn about unsafe suid_dumpable / core_pattern combo · 54b50199
      Committed by Kees Cook
      When suid_dumpable=2, detect unsafe core_pattern settings and warn when
      they are seen.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
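      The check itself is small: with suid_dumpable set to the unsafe
      value, a core_pattern that is neither a pipe handler nor an absolute
      path earns a warning; roughly (a sketch of the logic, not the exact
      fs/exec.c hunk - the message text is illustrative):

          if (suid_dumpable == 2 &&
              core_pattern[0] != '|' && core_pattern[0] != '/')
                  printk(KERN_WARNING
                         "Unsafe core_pattern used with suid_dumpable=2: "
                         "pipe handler or fully qualified core dump path required\n");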
  18. 24 Jul 2012, 2 commits
  19. 23 Jul 2012, 3 commits
  20. 06 Jul 2012, 1 commit
  21. 03 Jul 2012, 1 commit
  22. 08 Jun 2012, 1 commit
  23. 06 Jun 2012, 2 commits
  24. 02 Jun 2012, 3 commits
  25. 30 May 2012, 2 commits
  26. 24 May 2012, 3 commits
    • keys: kill task_struct->replacement_session_keyring · f23ca335
      Committed by Oleg Nesterov
      Kill the no-longer-used task_struct->replacement_session_keyring, and
      update copy_creds() and exit_creds().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • genirq: reimplement exit_irq_thread() hook via task_work_add() · 4d1d61a6
      Committed by Oleg Nesterov
      exit_irq_thread() and task->irq_thread are needed to handle the unexpected
      (and unlikely) exit of an irq thread.

      We can use task_work instead and make this all private to
      kernel/irq/manage.c - a cleanup plus a micro-optimization.
      
      1. rename exit_irq_thread() to irq_thread_dtor(), make it
         static, and move it up before irq_thread().
      
      2. change irq_thread() to do task_work_add(irq_thread_dtor)
         at the start and task_work_cancel() before return.
      
         tracehook_notify_resume() can never play with kthreads,
         only do_exit()->exit_task_work() can call the callback
         and this is what we want.
      
      3. remove task_struct->irq_thread and the special hook
         in do_exit().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • task_work_add: generic process-context callbacks · e73f8959
      Committed by Oleg Nesterov
      Provide a simple mechanism that allows running code in the (nonatomic)
      context of an arbitrary task.
      
      The caller does task_work_add(task, task_work) and this task executes
      task_work->func() either from do_notify_resume() or from do_exit().  The
      callback can rely on PF_EXITING to detect the latter case.
      
      "struct task_work" can be embedded in another struct, still it has "void
      *data" to handle the most common/simple case.
      
      This allows us to kill the ->replacement_session_keyring hack, and
      potentially this can have more users.
      
      Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into
      tracehook_notify_resume() and do_exit().  But at the same time we can
      remove the "replacement_session_keyring != NULL" checks from
      arch/*/signal.c and exit_creds().
      
      Note: task_work_add/task_work_run abuses ->pi_lock.  This is only because
      this lock is already used by lookup_pi_state() to synchronize with
      do_exit() setting PF_EXITING.  Fortunately the scope of this lock in
      task_work.c is really tiny, and the code is unlikely anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
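      Usage, per the description above, looks roughly like this (hedged:
      the names follow this changelog and the API was reworked in later
      kernels; my_state, my_dtor and cleanup_late are placeholders):

          static void my_dtor(struct task_work *twork)
          {
                  struct my_state *s = twork->data;  /* the common/simple case */

                  /* Runs in task context from do_notify_resume() or do_exit();
                   * PF_EXITING tells the two apart. */
                  if (current->flags & PF_EXITING)
                          cleanup_late(s);  /* placeholder */
          }

          init_task_work(&s->twork, my_dtor, s);
          task_work_add(task, &s->twork, true);  /* true: notify the task */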
  27. 17 May 2012, 1 commit
    • sched: Remove stale power aware scheduling remnants and dysfunctional knobs · 8e7fbcbc
      Committed by Peter Zijlstra
      It's been broken forever (i.e. it's not scheduling in a power-aware
      fashion), as reported by Suresh and others sending patches, and nobody
      cares enough to fix it properly ... so remove it to make space for
      something better.
      
      There are various problems with the code as it stands today, first
      and foremost the user interface, which is bound to topology
      levels and has multiple values per level. This results in a
      state explosion which the administrator or distro needs to
      master, and almost nobody does.
      
      Furthermore, large configuration state spaces aren't good: it
      means the thing doesn't just work right, because it's either
      under so many impossible-to-meet constraints, or, even if
      there's an achievable state, workloads have to be aware of
      it precisely and can never meet it for dynamic workloads.
      
      So pushing this kind of decision to user-space was a bad idea
      even with a single knob - it's exponentially worse with knobs
      on every node of the topology.
      
      There is a proposal to replace the user interface with a single
      3 state knob:
      
       sched_balance_policy := { performance, power, auto }
      
      where 'auto' would be the preferred default which looks at things
      like Battery/AC mode and possible cpufreq state or whatever the hw
      exposes to show us power use expectations - but there's been no
      progress on it in the past many months.
      
      Aside from that, the actual implementation of the various knobs
      is known to be broken. There have been sporadic attempts at
      fixing things but these always stop short of reaching a mergeable
      state.
      
      Therefore this wholesale removal with the hopes of spurring
      people who care to come forward once again and work on a
      coherent replacement.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  28. 14 May 2012, 1 commit
    • sched/fair: Revert sched-domain iteration breakage · 04f733b4
      Committed by Peter Zijlstra
      Patches c22402a2 ("sched/fair: Let minimally loaded cpu balance the
      group") and 0ce90475 ("sched/fair: Add some serialization to the
      sched_domain load-balance walk") are horribly broken so revert them.
      
      The problem is that while it sounds good to have the minimally loaded
      cpu do the pulling of more load, the way we walk the domains there is
      absolutely no guarantee this cpu will actually get to the domain. In
      fact it's very likely it won't. Therefore the higher up the tree we
      get, the less likely it is we'll balance at all.
      
      The first-of-mask cpu always walks up; while sucky in that it
      accumulates load on the first cpu and needs extra passes to spread it
      out, it at least guarantees a cpu gets up that far and load-balancing
      happens at all.

      Since it's now always the first cpu, and idle cpus should always be
      able to balance so they get a task as fast as possible, we can also do
      away with the added serialization.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-rpuhs5s56aiv1aw7khv9zkw6@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>