    x86/perf/cqm: Wipe out perf based cqm · c39a0e2c
    Submitted by Vikas Shivappa
    'perf cqm' never worked due to the incompatibility between the perf
    infrastructure and the cqm hardware support.  The hardware uses RMIDs to
    track the llc occupancy of tasks, and these RMIDs are per package. This
    makes monitoring a hierarchy such as a cgroup alongside monitoring of
    individual tasks difficult, and several patches sent to LKML to fix this
    were NACKed. Furthermore, the following issues in the current perf cqm
    make it almost unusable:
    
        1. No support to monitor the same group of tasks for which we do
        allocation using resctrl.
    
        2. It gives random and inaccurate data (mostly 0s) once we run out
        of RMIDs, due to issues in recycling.
    
        3. Recycling results in inaccurate data because we cannot guarantee
        that an RMID is stolen from a task only when the task is not
        pulling data into the cache, or even that it pulled the least data.
        Also, for monitoring llc_occupancy, if we stop using RMID_x and
        then start using RMID_y after reclaiming an RMID from another
        event, we miss accounting all the occupancy that was tagged to
        RMID_x at a later perf count.
    
        4. Recycling makes the monitoring code complex, including the
        scheduling path, because an event can lose its RMID at any time.
        Since MBM counters measure bandwidth over a period of time by
        taking snapshots of total bytes at two different times, recycling
        complicates the way we count MBM in a hierarchy. We also need a
        spin lock while processing MBM counter overflow, and we currently
        use a spin lock in scheduling to prevent the RMID from being taken
        away.
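
        The MBM snapshot arithmetic described above can be sketched as
        follows. This is a minimal illustration, not the kernel's actual
        implementation: the 24-bit counter width and the helper name
        mbm_bytes_delta() are assumptions for the example.

        ```c
        #include <stdint.h>
        #include <stdio.h>

        /* Assumed counter width for illustration: MBM hardware counters are
         * narrower than 64 bits, so the delta of two snapshots must handle
         * the counter wrapping between reads. */
        #define MBM_CNTR_WIDTH 24
        #define MBM_CNTR_MASK  ((1ULL << MBM_CNTR_WIDTH) - 1)

        /* Bandwidth over an interval is the difference between two
         * total-bytes snapshots, corrected for counter overflow. */
        static uint64_t mbm_bytes_delta(uint64_t prev, uint64_t cur)
        {
                if (cur >= prev)
                        return cur - prev;
                /* Counter wrapped between the two snapshots. */
                return (MBM_CNTR_MASK - prev) + cur + 1;
        }

        int main(void)
        {
                /* No wrap: 3000 - 1000 = 2000 bytes in the interval. */
                printf("%llu\n",
                       (unsigned long long)mbm_bytes_delta(1000, 3000));
                /* Wrap: counter passed 2^24 - 1 between snapshots. */
                printf("%llu\n",
                       (unsigned long long)mbm_bytes_delta(MBM_CNTR_MASK - 10, 5));
                return 0;
        }
        ```

        If an RMID can be recycled between the two snapshots, the prev
        value no longer belongs to the same task, which is why recycling
        forces the extra locking and bookkeeping described above.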
    
        5. Lack of support for running different kinds of events, such as
        task, system-wide and cgroup events, together. Data mostly prints
        0s. This is because the cqm hardware allows only one RMID to be
        tied to a CPU at a time, but perf can tie multiple events to it
        during one sched_in.
    
        6. No support for monitoring a group of tasks. There is partial
        support for cgroups, but it does not work once there is a
        hierarchy of cgroups, or if we want to monitor a task in a cgroup
        and the cgroup itself.
    
        7. No support for monitoring tasks for their whole lifetime
        without perf overhead.
    
        8. It reported the aggregate cache occupancy or memory bandwidth
        over all sockets. But most cloud and VMM based use cases want to
        know the individual per-socket usage.
    Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: ravi.v.shankar@intel.com
    Cc: tony.luck@intel.com
    Cc: fenghua.yu@intel.com
    Cc: peterz@infradead.org
    Cc: eranian@google.com
    Cc: vikas.shivappa@intel.com
    Cc: ak@linux.intel.com
    Cc: davidcc@google.com
    Cc: reinette.chatre@intel.com
    Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com