1. 28 October 2010 (12 commits)
  2. 27 October 2010 (13 commits)
  3. 26 October 2010 (6 commits)
    • fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8
      Authored by Christoph Hellwig
      The nr_dentry stat is a globally touched cacheline, updated with an atomic
      operation twice over the lifetime of a dentry, and it exists only for the
      benefit of userspace.  Turn it into a per-cpu counter and always decrement
      it in d_free instead of doing various batching operations to reduce lock
      hold times in the callers.
      
      Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
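      A minimal sketch of the pattern described above, using the generic
      percpu_counter API (the counter and helper names here are hypothetical;
      the real code lives in fs/dcache.c):

              #include <linux/percpu_counter.h>

              static struct percpu_counter nr_dentry;   /* percpu_counter_init() at boot */

              static void dentry_stat_inc(void)
              {
                      /* cheap, mostly cpu-local increment instead of a global atomic */
                      percpu_counter_inc(&nr_dentry);
              }

              static void dentry_stat_dec(void)
              {
                      /* always done in d_free, with no batching in the callers */
                      percpu_counter_dec(&nr_dentry);
              }

              static s64 dentry_stat_read(void)
              {
                      /* only the (rare) userspace read pays for an exact sum */
                      return percpu_counter_sum_positive(&nr_dentry);
              }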
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: do not assign default i_ino in new_inode · 85fe4025
      Authored by Christoph Hellwig
      Instead of always assigning an increasing inode number in new_inode
      move the call to assign it into those callers that actually need it.
      For now the set of callers that need it is estimated conservatively,
      that is, the call is added to all filesystems that do not assign an
      i_ino by themselves.  For a few more filesystems we can avoid assigning
      any inode number given that they aren't user visible, and for others
      it could be done lazily when an inode number is actually needed,
      but that's left for later patches.
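      A sketch of the resulting caller-side pattern, assuming get_next_ino()
      is the helper this series provides for filesystems that still want a
      kernel-assigned number (sb stands in for the caller's super_block):

              struct inode *inode = new_inode(sb);

              if (inode) {
                      /* new_inode() no longer provides a default i_ino; simple
                         in-memory filesystems now ask for one explicitly */
                      inode->i_ino = get_next_ino();
              }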
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helper: ihold() · 7de9c6ee
      Authored by Al Viro
      Clones an existing reference to inode; caller must already hold one.
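      A sketch of what such a helper looks like, assuming i_count is still a
      plain atomic_t at this point in the series:

              void ihold(struct inode *inode)
              {
                      /* the caller already holds a reference, so the count can
                         never legitimately be seen going from 0 to 1 here */
                      WARN_ON(atomic_inc_return(&inode->i_count) < 2);
              }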
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa
      Authored by Dave Chinner
      The number of inodes allocated does not need to be tied to the
      addition or removal of an inode to/from a list. If we are not tied
      to a list lock, we could update the counters when inodes are
      initialised or destroyed, but to do that we need to convert the
      counters to be per-cpu (i.e. independent of a lock). This means that
      we have the freedom to change the list/locking implementation
      without needing to care about the counters.
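      A minimal sketch of the read side of such a counter, assuming a plain
      per-cpu variable that is only summed when userspace asks for the value:

              static DEFINE_PER_CPU(unsigned long, nr_inodes);

              static unsigned long get_nr_inodes(void)
              {
                      int cpu;
                      unsigned long sum = 0;

                      /* no list or lock involved; the result is approximate anyway */
                      for_each_possible_cpu(cpu)
                              sum += per_cpu(nr_inodes, cpu);
                      return sum;
              }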
      
      Based on a patch originally from Eric Dumazet.
      
      [AV: cleaned up a bit, fixed build breakage on weird configs]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: allow for more than 2^31 files · 7e360c38
      Authored by Eric Dumazet
      Andrew,
      
      Could you please review this patch?  You are probably the right person to
      take it, because it crosses the fs and net trees.

      Note: /proc/sys/fs/file-nr is a read-only file, so this patch doesn't
      depend on the previous patch (sysctl: fix min/max handling in
      __do_proc_doulongvec_minmax()).

      Thanks!
      
      [PATCH V4] fs: allow for more than 2^31 files
      
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32bit value :
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
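      To make the overflow concrete, a small stand-alone check with the numbers
      from the quoted report (illustration only, not kernel code; assumes a
      64bit long):

              #include <limits.h>
              #include <stdio.h>

              int main(void)
              {
                      long max_files = 1503238553;    /* (0xe0000000 * 4) / 10 */
                      long doubled   = 2 * max_files; /* 3006477106 */

                      printf("2 * max_files = %ld\n", doubled);
                      printf("INT_MAX       = %d\n", INT_MAX);  /* 2147483647 */
                      printf("fits in int?    %s\n", doubled > INT_MAX ? "no" : "yes");
                      return 0;
              }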
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of
      atomic_t.
      
      get_max_files() is changed to return an unsigned long.
      get_nr_files() is changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although that is
      not strictly needed to address Robin's problem.
      
      Before patch (on a 64bit kernel) :
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • MN10300: Fix the PERCPU() alignment to allow for workqueues · 52605627
      Authored by David Howells
      In the MN10300 arch, we occasionally see an assertion being tripped in
      alloc_cwqs() at the following line:
      
              /* just in case, make sure it's actually aligned */
        --->  BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align));
              return wq->cpu_wq.v ? 0 : -ENOMEM;
      
      The values are:
      
              wq->cpu_wq.v => 0x902776e0
              align => 0x100
      
      and align is calculated by the following:
      
              const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
                                         __alignof__(unsigned long long));
      
      This is because the pointer in question (wq->cpu_wq.v) loses some of its
      lower bits to control flags, and so the object it points to must be
      sufficiently aligned to avoid the need to use those bits for pointing to
      things.
      
      Currently, 4 control bits and 4 colour bits are used in normal
      circumstances, plus a debugging bit if debugging is set.  This requires
      the cpu_workqueue_struct struct to be at least 256 bytes aligned (or 512
      bytes aligned with debugging).
      
      PERCPU() alignment on MN10300, however, is only 32 bytes as set in
      vmlinux.lds.S.  So we set this to PAGE_SIZE (4096) to match most other
      arches and stick a comment in alloc_cwqs() for anyone else who triggers
      the assertion.
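      With 4 flag bits and 4 colour bits the required alignment is therefore
      1 << 8 = 256 bytes (512 with the extra debug bit), which a 32-byte
      per-cpu section alignment cannot guarantee.  A sketch of the linker
      script change, assuming the PERCPU(align) helper of that era:

              /* arch/mn10300/kernel/vmlinux.lds.S */
              PERCPU(PAGE_SIZE)       /* was PERCPU(32); PAGE_SIZE satisfies any cwq alignment */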
      Reported-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Mark Salter <msalter@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 23 October 2010 (9 commits)
    • kdb,debug_core: adjust master cpu switch logic against new debug_core locking · 495363d3
      Authored by Jason Wessel
      The kdb shell needs to enforce switching back to the original CPU that
      took the exception before restoring normal kernel execution.  Resuming
      from a different CPU than the one that took the original exception will
      cause problems with spin locks that are released by a different processor
      than the one that took the lock.
      
      The special logic in dbg_cpu_switch() can go away entirely because the
      state of which cpus want to be masters or slaves will remain unchanged
      between entry and exit of the debug_core exception context.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • debug_core: refactor locking for master/slave cpus · dfee3a7b
      Authored by Jason Wessel
      For quite some time there have been problems with memory barriers and
      various races with NMI on multi processor systems using the kernel
      debugger.  The algorithm for entering the kernel debug core and
      resuming kernel execution was racy and had several known edge case
      problems with attempting to debug something on a heavily loaded system
      using breakpoints that are hit repeatedly and quickly.
      
      The prior "locking" design for entry worked as follows:
      
        * The atomic counter kgdb_active was used with atomic exchange in
          order to elect a master cpu out of all the cpus that may have
          taken a debug exception.
        * The master cpu increments all elements of passive_cpu_wait[].
        * The master cpu issues the round up cpus message.
        * Each "slave cpu" that enters the debug core increments its own
          element in cpu_in_kgdb[].
        * Each "slave cpu" spins on passive_cpu_wait[] until it becomes 0.
        * The master cpu debugs the system.
      
      The new scheme removes the two arrays of atomic counters and replaces
      them with two single counters.  One counter is used to count the number
      of cpus waiting to become a master cpu (because one or more hit an
      exception).  The second counter is used to indicate how many cpus have
      entered as slave cpus.
      
      The new entry logic works as follows:
      
        * One or more cpus enters via kgdb_handle_exception() and increments
          the masters_in_kgdb. Each cpu attempts to get the spin lock called
          dbg_master_lock.
        * The master cpu sets kgdb_active to the current cpu.
        * The master cpu takes the spinlock dbg_slave_lock.
        * The master cpu asks to round up all the other cpus.
        * Each slave cpu that is not already in kgdb_handle_exception()
          will enter and increment slaves_in_kgdb.  Each slave then spins,
          repeatedly try-locking dbg_slave_lock.
        * The master cpu waits for the sum of masters_in_kgdb and slaves_in_kgdb
          to equal the number of online cpus.
        * The master cpu debugs the system.
      
      In the new design kgdb_active can only be changed while holding
      dbg_master_lock.  Stress testing has not turned up any of the entry/exit
      races that existed in the prior locking design, which suffered from
      atomic variables not being truly atomic (in the way kgdb used them)
      along with memory barrier races.
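      Condensed into code, the master-side entry sequence described above looks
      roughly like this (a sketch only; the real logic in
      kernel/debug/debug_core.c handles nesting, NMIs and error paths, and the
      exact lock primitives may differ):

              atomic_inc(&masters_in_kgdb);
              raw_spin_lock(&dbg_master_lock);        /* exactly one cpu wins */
              atomic_set(&kgdb_active, raw_smp_processor_id());
              raw_spin_lock(&dbg_slave_lock);         /* held for the whole session */
              kgdb_roundup_cpus(flags);               /* flags: saved irq flags from entry */

              /* slaves increment slaves_in_kgdb and spin try-locking dbg_slave_lock */
              while (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb) !=
                     num_online_cpus())
                      cpu_relax();

              /* ... debug the system, then drop dbg_slave_lock and dbg_master_lock ... */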
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
    • debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter() · c1bb9a9c
      Authored by Dongdong Deng
      The slave cpus do not have the hw breakpoints disabled upon entry to
      the debug_core and as a result could cause unrecoverable recursive
      faults on badly placed breakpoints, or get out of sync with the arch
      specific hw breakpoint operations.
      
      This patch addresses the problem by invoking kgdb_disable_hw_debug()
      earlier in kgdb_cpu_enter() for each cpu that enters the debug core.
      
      The hw breakpoint dis/enable flow should be:
      
      master_debug_cpu   slave_debug_cpu
               \              /
                kgdb_cpu_enter
                      |
              kgdb_disable_hw_debug --> uninstall pre-enabled hw_breakpoint
                      |
       do add/rm and dis/enable operations on hw_breakpoints on master_debug_cpu..
                      |
              correct_hw_break --> correct/install the enabled hw_breakpoint
                      |
                 leave_kgdb
      Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • kdb,kgdb: fix sparse fixups · 91b152aa
      Authored by Jason Wessel
      Fix the following sparse warnings:
      
      kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static?
      kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static?
      kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces)
      kgdb.c:652:26:    expected void const *ptr
      kgdb.c:652:26:    got struct perf_event *[noderef] <asn:3>*pev
      
      The one in kgdb.c required the (void * __force) cast because the return
      statement in register_wide_hw_breakpoint() looks like:
      
              return (void __percpu __force *)ERR_PTR(err);
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • kdb: Fix oops in kdb_unregister · 75d14ede
      Authored by Jason Wessel
      Nothing should try to use kdb_commands directly as sometimes it is
      null.  Instead, use the for_each_kdbcmd() iterator.
      
      This particular problem dates back to the initial kdb merge (2.6.35),
      but at that point nothing was dynamically unregistering commands from
      the kdb shell.
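      Inside kdb_unregister(char *cmd) the safe loop then looks roughly like
      this (a sketch; the iterator is assumed to yield each table slot together
      with its index):

              kdbtab_t *kp;
              int i;

              for_each_kdbcmd(kp, i) {
                      /* empty slots have a NULL cmd_name, so check before comparing */
                      if (kp->cmd_name && strcmp(kp->cmd_name, cmd) == 0) {
                              kp->cmd_name = NULL;    /* release the slot */
                              return 0;
                      }
              }
              return 1;       /* command was not registered */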
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • kdb,ftdump: Remove reference to internal kdb include · e3bda3ac
      Authored by Jason Wessel
      Now that include/linux/kdb.h properly exports all the functions
      required to dynamically add a kdb shell command, the reference to the
      private kdb header can be removed.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • kdb: Allow kernel loadable modules to add kdb shell functions · f7030bbc
      Authored by Jason Wessel
      In order to allow kernel modules to dynamically add a command to the
      kdb shell the kdb_register, kdb_register_repeat, kdb_unregister, and
      kdb_printf need to be exported as GPL symbols.
      
      Any kernel module that adds a dynamic kdb shell function should only
      need to include linux/kdb.h.
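      A sketch of what such a module might look like, assuming the
      kdb_register()/kdb_unregister() signatures exported through linux/kdb.h
      at the time (the command itself is made up for illustration):

              #include <linux/kdb.h>
              #include <linux/module.h>

              /* matches kdb_func_t: int (*)(int argc, const char **argv) */
              static int kdb_hello(int argc, const char **argv)
              {
                      kdb_printf("hello from a module (argc=%d)\n", argc);
                      return 0;
              }

              static int __init kdb_hello_init(void)
              {
                      return kdb_register("hello", kdb_hello, "",
                                          "print a test message", 0);
              }

              static void __exit kdb_hello_exit(void)
              {
                      kdb_unregister("hello");
              }

              module_init(kdb_hello_init);
              module_exit(kdb_hello_exit);
              MODULE_LICENSE("GPL");

      The MODULE_LICENSE("GPL") tag matters here, since the symbols above are
      exported as GPL-only.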
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • debug_core: stop rcu warnings on kernel resume · fb70b588
      Authored by Jason Wessel
      When returning from the kernel debugger, reset the rcu jiffies_stall
      value to prevent the rcu stall detector from sending NMI events that
      invoke a stack dump for each cpu in the system.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • debug_core: move all watch dog syncs to a single function · 16cdc628
      Authored by Jason Wessel
      Move the various clock and watch dog syncs to a single function in
      advance of adding another sync for the rcu stall detector.
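      A sketch of such a consolidated helper, folding in the rcu reset from the
      entry above (the function name dbg_touch_watchdogs is an assumption; the
      individual touch/reset calls are existing kernel APIs):

              static void dbg_touch_watchdogs(void)
              {
                      touch_softlockup_watchdog_sync();
                      clocksource_touch_watchdog();
                      rcu_cpu_stall_reset();  /* keep the rcu stall detector quiet on resume */
              }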
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>