1. 24 Sep 2013, 8 commits
    • rcu: Fix CONFIG_RCU_NOCB_CPU_ALL panic on machines with sparse CPU mask · 5d5a0800
      Kirill Tkhai authored
      Some architectures have a sparse CPU mask. UltraSparc's cpuinfo, for example:
      
      CPU0: online
      CPU2: online
      
      So, set only possible CPUs when CONFIG_RCU_NOCB_CPU_ALL is enabled.
      
      Also, check that the user passes a valid 'rcu_nocbs=' option.
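      
      A minimal sketch of both changes, using cpumask helpers that existed in the
      2013 tree (rcu_nocb_mask is the real variable name, but the surrounding code
      is illustrative rather than the verbatim patch):
      
      	/* With CONFIG_RCU_NOCB_CPU_ALL, offload only CPUs that can exist. */
      	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_ALL))
      		cpumask_copy(rcu_nocb_mask, cpu_possible_mask); /* not cpumask_setall() */
      
      	/* Sanity-check the user's rcu_nocbs= mask against possible CPUs. */
      	if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
      		pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n");
      		cpumask_and(rcu_nocb_mask, cpu_possible_mask, rcu_nocb_mask);
      	}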
      Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
      CC: Dipankar Sarma <dipankar@in.ibm.com>
      [ paulmck: Fix pr_info() issue noted by scripts/checkpatch.pl. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5d5a0800
    • rcu: Have rcutiny tracepoints use tracepoint_string() · 0d752924
      Paul E. McKenney authored
      This commit extends the work done in f7f7bac9 (rcu: Have the RCU
      tracepoints use the tracepoint_string infrastructure) to cover rcutiny.
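      
      A hedged sketch of the pattern being propagated to rcutiny; tracepoint_string()
      comes from the infrastructure referenced above, while the tracepoint and string
      chosen here are illustrative rather than the exact rcutiny call sites:
      
      	/* tracepoint_string() exports the literal to the tracer and returns
      	 * a pointer that is cheap to hand to tracepoints at runtime. */
      	static const char * const tps_start = tracepoint_string("Start");
      
      	trace_rcu_dyntick(tps_start, oldval, newval);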
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      0d752924
    • rcu: Reject memory-order-induced stall-warning false positives · 26cdfedf
      Paul E. McKenney authored
      If a system is idle from an RCU perspective for longer than specified
      by CONFIG_RCU_CPU_STALL_TIMEOUT, and if one CPU starts a grace period
      just as a second checks for CPU stalls, and if this second CPU happens
      to see the old value of rsp->jiffies_stall, it will incorrectly report a
      CPU stall.  This is quite rare, but apparently occurs deterministically
      on systems with about 6TB of memory.
      
      This commit therefore orders accesses to the data used to determine
      whether or not a CPU stall is in progress.  Grace-period initialization
      and cleanup first increment rsp->completed to mark the end of the
      previous grace period, then records the current jiffies in rsp->gp_start,
      then records the jiffies at which a stall can be expected to occur in
      rsp->jiffies_stall, and finally increments rsp->gpnum to mark the start
      of the new grace period.  Now, this ordering by itself does not prevent
      false positives.  For example, if grace-period initialization was delayed
      between recording rsp->gp_start and rsp->jiffies_stall, the CPU stall
      warning code might still see an old value of rsp->jiffies_stall.
      
      Therefore, this commit also orders the CPU stall warning accesses as
      well, loading rsp->gpnum and jiffies, then rsp->jiffies_stall, then
      rsp->gp_start, and finally rsp->completed.  This ordering means that
      the false-positive scenario in the previous paragraph would result
      in rsp->completed being greater than or equal to rsp->gpnum, which is
      never valid for a CPU stall, allowing the false positive to be rejected.
      Furthermore, any fetch that gets an old value of rsp->jiffies_stall
      must also get an old value of rsp->gpnum, which will again be rejected
      by the comparison of rsp->gpnum and rsp->completed.  Situations where
      rsp->gp_start is later than rsp->jiffies_stall are also rejected, as
      are situations where jiffies is less than rsp->jiffies_stall.
      
      Although use of unsynchronized accesses means that there are likely
      still some false-positive scenarios (synchronization has proven to be
      a very bad idea on large systems), this should get rid of a large class
      of these scenarios.
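      
      The two orderings can be sketched as follows (a hedged sketch, not the literal
      patch: ACCESS_ONCE(), smp_wmb()/smp_rmb() and the ULONG_CMP_*() helpers are
      real 2013 kernel primitives, while stall_timeout is an illustrative name):
      
      	/* Grace-period cleanup/initialization: publish in this order. */
      	ACCESS_ONCE(rsp->completed) = rsp->gpnum;	/* end of previous GP */
      	smp_wmb();
      	rsp->gp_start = jiffies;
      	rsp->jiffies_stall = jiffies + stall_timeout;	/* stall_timeout: illustrative */
      	smp_wmb();
      	ACCESS_ONCE(rsp->gpnum) = rsp->gpnum + 1;	/* start of new GP */
      
      	/* Stall check: read in the reverse order, then reject
      	 * inconsistent snapshots instead of warning on them. */
      	gpnum = ACCESS_ONCE(rsp->gpnum);
      	smp_rmb();
      	j = ACCESS_ONCE(jiffies);
      	js = ACCESS_ONCE(rsp->jiffies_stall);
      	gps = ACCESS_ONCE(rsp->gp_start);
      	smp_rmb();
      	completed = ACCESS_ONCE(rsp->completed);
      	if (ULONG_CMP_GE(completed, gpnum) ||	/* no GP in progress */
      	    ULONG_CMP_LT(j, js) ||		/* not yet past the stall time */
      	    ULONG_CMP_GE(gps, js))		/* GP started after stall time was set */
      		return;				/* reject as a false positive */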
      Reported-by: Fabian Herschel <fabian.herschel@suse.com>
      Reported-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Jochen Striepe <jochen@tolot.escape.de>
      26cdfedf
    • rcu: Micro-optimize rcu_cpu_has_callbacks() · 69c8d28c
      Paul E. McKenney authored
      The for_each_rcu_flavor() loop unconditionally scans all flavors, even
      after an earlier flavor has already turned up a non-lazy callback.  Once
      the loop has seen a non-lazy callback, further passes through the loop
      cannot change the outcome.  This is not a huge problem, given that there
      can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched),
      but this code is on the path to idle, so speeding it up even a small
      amount would have some benefit.
      
      This commit therefore does two things:
      
      1.	Rearranges the order of the list of RCU flavors in order to
      	place the most active flavor first in the list.  The most active
      	RCU flavor is RCU-preempt, or, if there is no RCU-preempt,
      	RCU-sched.
      
      2.	Reworks the for_each_rcu_flavor() to exit early when the first
      	non-lazy callback is seen, or, in the case where the caller
      	does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n),
      	when the first callback is seen (see the sketch below).
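      
      Item 2 amounts to a loop of the following shape (a hedged sketch: the
      rcu_data field names follow the 2013 layout but are not guaranteed verbatim):
      
      	static int rcu_cpu_has_callbacks(int cpu, bool *all_lazy)
      	{
      		bool al = true;			/* all callbacks seen so far are lazy */
      		bool hc = false;		/* at least one callback is queued    */
      		struct rcu_data *rdp;
      		struct rcu_state *rsp;
      
      		for_each_rcu_flavor(rsp) {	/* most active flavor comes first */
      			rdp = per_cpu_ptr(rsp->rda, cpu);
      			if (!rdp->nxtlist)
      				continue;
      			hc = true;
      			if (rdp->qlen != rdp->qlen_lazy || !all_lazy) {
      				al = false;
      				break;		/* non-lazy seen, or caller does not care */
      			}
      		}
      		if (all_lazy)
      			*all_lazy = al;
      		return hc;
      	}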
      Reported-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      69c8d28c
    • rcu: Silence unused-variable warnings · 289828e6
      Paul E. McKenney authored
      The "idle" variable in both rcu_eqs_enter_common() and
      rcu_eqs_exit_common() is only used in a WARN_ON_ONCE().  If the kernel
      is built disabling WARN_ON_ONCE(), the compiler will complain (rightly)
      that "idle" is unused.  This commit therefore adds a __maybe_unused to
      the declaration of both variables.
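      
      A hedged illustration of the pattern (the initializer and the warning
      condition are made up; the __maybe_unused annotation is the point):
      
      	/* "idle" feeds only the warning below, so if WARN_ON_ONCE() is
      	 * configured away the variable would otherwise be flagged as unused. */
      	int __maybe_unused idle = is_idle_task(current);
      
      	WARN_ON_ONCE(!user && !idle);	/* "user" is the caller's flag */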
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      289828e6
    • rcu: Replace __get_cpu_var() uses · c9d4b0af
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One
      of them is address calculation via the form &__get_cpu_var(x). This
      calculates the address for the instance of the percpu variable of the
      current processor based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      	#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() only ever does an address determination. However,
      store and retrieve operations could use a segment prefix (or global
      register on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into
      a percpu area and use optimized assembly code to read and write per
      cpu variables.
      
      This patch converts __get_cpu_var() into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations
      that use the offset. Thereby address calculations are avoided and fewer
      registers are used when code is generated.
      
      At the end of the patchset all uses of __get_cpu_var have been removed
      so the macro is removed too.
      
      The patchset includes passes over all arches as well. Once these
      operations are used throughout, specialized macros can be defined in
      non-x86 arches as well in order to optimize per-cpu access, e.g. by using
      a global register that may be set to the per-cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
         variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	this_cpu_inc(y);
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [ paulmck: Address conflicts. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c9d4b0af
    • rcu: Fix dubious "if" condition in __call_rcu_nocb_enqueue() · 829511d8
      Paul E. McKenney authored
      This commit replaces an incorrect (but fortunately functional)
      bitwise OR ("|") operator with the correct logical OR ("||").
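      
      A hypothetical illustration of the difference (all names below are made up,
      not the code this commit touches):
      
      	bool was_empty   = (head == NULL);
      	bool was_alldone = (qlen == 0);
      
      	/* With C booleans, "|" happens to compute the same truth value,
      	 * which is why the old code still worked, but "||" states the
      	 * intent and short-circuits the second test. */
      	if (was_empty || was_alldone)	/* previously: was_empty | was_alldone */
      		wake_nocb_kthread();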
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      829511d8
    • rcu: Convert local functions to static · 01896f7e
      Paul E. McKenney authored
      The rcu_cpu_stall_timeout kernel parameter, the rcu_dynticks per-CPU
      variable, and the rcu_gp_fqs() function are used only locally.  This
      commit therefore marks them as static.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      01896f7e
  2. 21 Sep 2013, 1 commit
  3. 13 Sep 2013, 5 commits
  4. 12 Sep 2013, 23 commits
  5. 11 Sep 2013, 2 commits
    • perf: Fix up MMAP2 buffer space reservation · d008d525
      Arnaldo Carvalho de Melo authored
      The ino_generation field was added to the PERF_RECORD_MMAP2 record in
      the 13d7a241 cset, but no space for it was reserved, corrupting the
      PERF_FORMAT_{TIME,CPU,TID,etc} area (sample_type/sample_id_all); fix it.
      
      Detected with one of the regression tests done by 'perf test':
      
        [root@sandy ~]# perf test -v 7
         7: Validate PERF_RECORD_* events & perf_sample fields     :
        --- start ---
        61315294449606 0 PERF_RECORD_SAMPLE
        61315294453161 0 PERF_RECORD_SAMPLE
        61315294454441 0 PERF_RECORD_SAMPLE
        61315294455709 0 PERF_RECORD_SAMPLE
        61315295600899 0 PERF_RECORD_COMM: sleep:6500
        27917287430500 342521613 PERF_RECORD_MMAP2 6500/6500: [0x400000(0x7000) @ 0 00:1d 311442 9016]: /usr/bin/sleep
        MMAP2 going backwards in time, prev=61315295600899, curr=27917287430500
        MMAP2 with unexpected cpu, expected 0, got 342521613
        MMAP2 with unexpected pid, expected 6500, got 1701606191
        MMAP2 with unexpected tid, expected 6500, got 28773
        27917287430500 342561333 PERF_RECORD_MMAP2 6500/6500: [0x3b7e000000(0x223000) @ 0 00:1d 309186 9016]: /usr/lib64/ld-2.16.so
        MMAP2 with unexpected cpu, expected 0, got 342561333
        MMAP2 with unexpected pid, expected 6500, got 1932408369
        MMAP2 with unexpected tid, expected 6500, got 111
        27917287430500 342600095 PERF_RECORD_MMAP2 6500/6500: [0x7fffbd7dc000(0x1000) @ 0x7fffbd7dc000 00:00 0 0]: [vdso]
        MMAP2 with unexpected cpu, expected 0, got 342600095
        MMAP2 with unexpected pid, expected 6500, got 1935963739
        MMAP2 with unexpected tid, expected 6500, got 23919
        27917287430500 342882834 PERF_RECORD_MMAP2 6500/6500: [0x3b7e400000(0x3b8000) @ 0 00:1d 309187 9016]: /usr/lib64/libc-2.16.so
        MMAP2 with unexpected cpu, expected 0, got 342882834
        MMAP2 with unexpected pid, expected 6500, got 909192754
        MMAP2 with unexpected tid, expected 6500, got 7303982
        61316297195411 0 PERF_RECORD_EXIT(6500:6500):(6500:6500)
        ---- end ----
        Validate PERF_RECORD_* events & perf_sample fields: FAILED!
        [root@sandy ~]#
      
      After this patch:
      
        [root@sandy ~]# perf test 7
         7: Validate PERF_RECORD_* events & perf_sample fields     : Ok
        [root@sandy ~]#
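      
      The underlying rule can be sketched as follows (a hedged sketch, not the
      literal patch): every byte written into the MMAP2 record body must also be
      counted into header.size before the sample_id trailer is appended, otherwise
      the trailer lands on top of the new field.
      
      	if (event->attr.mmap2)
      		mmap_event->event_id.header.size += sizeof(mmap_event->ino_generation);
      
      	/* ... later, when the record body is emitted ... */
      	if (event->attr.mmap2)
      		perf_output_put(&handle, mmap_event->ino_generation);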
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Stephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-heeuv986b8ha7whqg4o3he7c@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d008d525
    • fs: bump inode and dentry counters to long · 3942c07c
      Glauber Costa authored
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort in providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify the up-to-now hammer called shrink_slab to proceed
         with node-reclaim instead of always searching memory from all over like
         it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
      
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537), showing no
      visible performance regressions while yielding better qualitative
      behavior on NUMA machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, where every live object will eventually be
      discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just converts the counters from int to long.  Machines where it
      matters should have a 64-bit long anyway.
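      
      A hedged sketch of what the bump looks like for one of the affected
      structures (the field set mirrors the 2013 include/linux/dcache.h layout but
      is not guaranteed verbatim; every field below was previously an int):
      
      	struct dentry_stat_t {
      		long nr_dentry;
      		long nr_unused;
      		long age_limit;		/* age in seconds */
      		long want_pages;	/* pages requested by system */
      		long dummy[2];
      	};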
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3942c07c
  6. 10 Sep 2013, 1 commit