1. 18 9月, 2012 1 次提交
    • E
      userns: Convert taskstats to handle the user and pid namespaces. · 4bd6e32a
      Eric W. Biederman 提交于
      - Explicitly limit exit task stat broadcast to the initial user and
        pid namespaces, as it is already limited to the initial network
        namespace.
      
      - For broadcast task stats explicitly generate all of the idenitiers
        in terms of the initial user namespace and the initial pid
        namespace.
      
      - For request stats report them in terms of the current user namespace
        and the current pid namespace.  Netlink messages are delivered
        syncrhonously to the kernel allowing us to get the user namespace
        and the pid namespace from the current task.
      
      - Pass the namespaces for representing pids and uids and gids
        into bacct_add_task.
      
      Cc: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      4bd6e32a
  2. 31 7月, 2012 1 次提交
  3. 20 9月, 2011 1 次提交
  4. 04 8月, 2011 2 次提交
  5. 27 7月, 2011 1 次提交
  6. 28 6月, 2011 1 次提交
    • V
      taskstats: don't allow duplicate entries in listener mode · 26c4caea
      Vasiliy Kulikov 提交于
      Currently a single process may register exit handlers unlimited times.
      It may lead to a bloated listeners chain and very slow process
      terminations.
      
      Eg after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
      kernel memory is stolen for the handlers chain and "time id" shows 2-7
      seconds instead of normal 0.003.  It makes it possible to exhaust all
      kernel memory and to eat much of CPU time by triggerring numerous exits
      on a single CPU.
      
      The patch limits the number of times a single process may register
      itself on a single CPU to one.
      
      One little issue is kept unfixed - as taskstats_exit() is called before
      exit_files() in do_exit(), the orphaned listener entry (if it was not
      explicitly deregistered) is kept until the next someone's exit() and
      implicit deregistration in send_cpu_listeners().  So, if a process
      registered itself as a listener exits and the next spawned process gets
      the same pid, it would inherit taskstats attributes.
      Signed-off-by: NVasiliy Kulikov <segooon@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26c4caea
  7. 24 3月, 2011 1 次提交
  8. 14 1月, 2011 1 次提交
    • J
      taskstats: use better ifdef for alignment · 9ab020cf
      Jeff Mahoney 提交于
      Commit 4be2c95d ("taskstats: pad taskstats netlink response for aligment
      issues on ia64") added a null field to align the taskstats structure but
      the discussion centered around ia64.  The issue exists on other platforms
      with inefficient unaligned access and adding them piecemeal would be an
      unmaintainable mess.
      
      This patch uses Dave Miller's suggestion of using a combination of
      CONFIG_64BIT && !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      whether alignment is needed.
      
      Note that this will cause breakage on those platforms with applications
      like iotop which had hard-coded offsets into the packet to access the
      taskstats structure.
      
      The message seen on systems without the alignment fixes looks like: kernel
      unaligned access to 0xe000023879dca9bc, ip=0xa000000100133d10
      
      The addresses may vary but resolve to locations inside __delayacct_add_tsk.
      
      iotop makes what I'd call unreasonable assumptions about the contents of a
      netlink genetlink packet containing generic attributes.  They're typed and
      have headers that specify value lengths, so the client can (should)
      identify and skip the ones the client doesn't understand.
      
      The kernel, as of version 2.6.36, presented a packet like so:
      +--------------------------------+
      | genlmsghdr - 4 bytes           |
      +--------------------------------+
      | NLA header - 4 bytes           | /* Aggregate header */
      +-+------------------------------+
      | | NLA header - 4 bytes         | /* PID header */
      | +------------------------------+
      | | pid/tgid   - 4 bytes         |
      | +------------------------------+
      | | NLA header - 4 bytes         | /* stats header */
      | + -----------------------------+ <- oops. aligned on 4 byte boundary
      | | struct taskstats - 328 bytes |
      +-+------------------------------+
      
      The iotop code expects that the kernel will behave as it did then,
      assuming that the packet format is set in stone.  The format is set in
      stone, but the packet offsets are not.  There's nothing in the packet
      format that guarantees that the packet will be sent in exactly the same
      way.  The attribute contents are set (or versioned) and the aggregate
      contents are set but they can be anywhere in the packet.
      
      The issue here isn't that an unaligned structure gets passed to userspace,
      it's that the NLA infrastructure has something of a weakness: The 4 byte
      attribute header may force the payload to be unaligned.  The taskstats
      structure is created at an unaligned location and then 64-bit values are
      operated on inside the kernel, so the unaligned access warnings gets
      spewed everywhere.
      
      It's possible to use the unaligned access API to operate on the structure
      in the kernel but it seems like a wasted effort to work around userspace
      code that isn't following the packet format.  Any new additions would also
      need the be worked around.  It's a maintenance nightmare.
      
      The conclusion of the earlier discussion seemed to be "ok fine, if we have
      to break it, don't break it on arches that don't have the problem." Dave
      pointed out that the unaligned access problem doesn't only exist on ia64,
      but also on other 64-bit arches that don't have efficient unaligned access
      and it should be fixed there as well.  The committed version of the patch
      and this addition keep with the conclusion of that discussion not to break
      it unnecessarily, which the pid padding and the packet padding fixes did
      do.  x86_64 and powerpc don't suffer this problem so they shouldn't suffer
      the solution.  Other 64-bit architectures do and will, though.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reported-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Dan Carpenter <error27@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Florian Mickler <florian@mickler.org>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ab020cf
  9. 23 12月, 2010 1 次提交
    • J
      taskstats: pad taskstats netlink response for aligment issues on ia64 · 4be2c95d
      Jeff Mahoney 提交于
      The taskstats structure is internally aligned on 8 byte boundaries but the
      layout of the aggregrate reply, with two NLA headers and the pid (each 4
      bytes), actually force the entire structure to be unaligned.  This causes
      the kernel to issue unaligned access warnings on some architectures like
      ia64.  Unfortunately, some software out there doesn't properly unroll the
      NLA packet and assumes that the start of the taskstats structure will
      always be 20 bytes from the start of the netlink payload.  Aligning the
      start of the taskstats structure breaks this software, which we don't
      want.  So, for now the alignment only happens on architectures that
      require it and those users will have to update to fixed versions of those
      packages.  Space is reserved in the packet only when needed.  This ifdef
      should be removed in several years e.g.  2012 once we can be confident
      that fixed versions are installed on most systems.  We add the padding
      before the aggregate since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reported-by: NBrian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4be2c95d
  10. 17 12月, 2010 1 次提交
  11. 28 10月, 2010 3 次提交
  12. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  13. 19 2月, 2010 1 次提交
  14. 13 7月, 2009 1 次提交
    • J
      genetlink: make netns aware · 134e6375
      Johannes Berg 提交于
      This makes generic netlink network namespace aware. No
      generic netlink families except for the controller family
      are made namespace aware, they need to be checked one by
      one and then set the family->netnsok member to true.
      
      A new function genlmsg_multicast_netns() is introduced to
      allow sending a multicast message in a given namespace,
      for example when it applies to an object that lives in
      that namespace, a new function genlmsg_multicast_allns()
      to send a message to all network namespaces (for objects
      that do not have an associated netns).
      
      The function genlmsg_multicast() is changed to multicast
      the message in just init_net, which is currently correct
      for all generic netlink families since they only work in
      init_net right now. Some will later want to work in all
      net namespaces because they do not care about the netns
      at all -- those will have to be converted to use one of
      the new functions genlmsg_multicast_allns() or
      genlmsg_multicast_netns() whenever they are made netns
      aware in some way.
      
      After this patch families can easily decide whether or
      not they should be available in all net namespaces. Many
      genl families us it for objects not related to networking
      and should therefore be available in all namespaces, but
      that will have to be done on a per family basis.
      
      Note that this doesn't touch on the checkpoint/restart
      problem where network namespaces could be used, genl
      families and multicast groups are numbered globally and
      I see no easy way of changing that, especially since it
      must be possible to multicast to all network namespaces
      for those families that do not care about netns.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      134e6375
  15. 01 1月, 2009 1 次提交
  16. 13 12月, 2008 1 次提交
    • R
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e
      Rusty Russell 提交于
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
      29c0177e
  17. 26 7月, 2008 1 次提交
  18. 24 5月, 2008 1 次提交
  19. 30 4月, 2008 1 次提交
    • P
      Use find_task_by_vpid in taskstats · cb41d6d0
      Pavel Emelyanov 提交于
      The pid to lookup a task by is passed inside taskstats code via genetlink
      message.
      
      Since netlink packets are now processed in the context of the sending task,
      this is correct to lookup the task with find_task_by_vpid() here.
      
      Besides, I fix the call to fill_pid() from taskstats_exit(), since the
      tsk->pid is not required in fill_pid() in this case, and the pid field on
      task_struct is going to be deprecated as well.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Cc: Jonathan Lim <jlim@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb41d6d0
  20. 15 11月, 2007 1 次提交
  21. 20 10月, 2007 2 次提交
    • R
      Fix misspellings of "system", "controller", "interrupt" and "necessary". · 3a4fa0a2
      Robert P. J. Day 提交于
      Fix the various misspellings of "system", controller", "interrupt" and
      "[un]necessary".
      Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      3a4fa0a2
    • B
      Add cgroupstats · 846c7bb0
      Balbir Singh 提交于
      This patch is inspired by the discussion at
      http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
      as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.  The
      patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
      ported)
      
      This patch implements per cgroup statistics infrastructure and re-uses
      code from the taskstats interface.  A new set of cgroup operations are
      registered with commands and attributes.  It should be very easy to
      *extend* per cgroup statistics, by adding members to the cgroupstats
      structure.
      
      The current model for cgroupstats is a pull, a push model (to post
      statistics on interesting events), should be very easy to add.  Currently
      user space requests for statistics by passing the cgroup file
      descriptor.  Statistics about the state of all the tasks in the cgroup
      is returned to user space.
      
      TODO's/NOTE:
      
      This patch provides an infrastructure for implementing cgroup statistics.
      Based on the needs of each controller, we can incrementally add more statistics,
      event based support for notification of statistics, accumulation of taskstats
      into cgroup statistics in the future.
      
      Sample output
      
      # ./cgroupstats -C /cgroup/a
      sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0
      
      # ./cgroupstats -C /cgroup/
      sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0
      
      If the approach looks good, I'll enhance and post the user space utility for
      the same
      
      Feedback, comments, test results are always welcome!
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      846c7bb0
  22. 17 10月, 2007 1 次提交
  23. 17 7月, 2007 1 次提交
    • M
      taskstats: add context-switch counters · b663a79c
      Maxim Uvarov 提交于
      Make available to the user the following task and process performance
      statistics:
      
      	* Involuntary Context Switches (task_struct->nivcsw)
      	* Voluntary Context Switches (task_struct->nvcsw)
      
      Statistics information is available from:
      	1. taskstats interface (Documentation/accounting/)
      	2. /proc/PID/status (task only).
      
      This data is useful for detecting hyperactivity patterns between processes.
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: NMaxim Uvarov <muvarov@ru.mvista.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Cc: Jonathan Lim <jlim@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b663a79c
  24. 08 5月, 2007 1 次提交
  25. 26 4月, 2007 1 次提交
  26. 08 12月, 2006 8 次提交
  27. 03 12月, 2006 3 次提交