1. 30 3月, 2012 1 次提交
  2. 29 3月, 2012 9 次提交
  3. 28 3月, 2012 8 次提交
  4. 27 3月, 2012 3 次提交
    • D
      xfs: Account log unmount transaction correctly · 3948659e
      Dave Chinner 提交于
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes every unmount record that is
      written.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3948659e
    • D
      xfs: don't cache inodes read through bulkstat · 5132ba8f
      Dave Chinner 提交于
      When we read inodes via bulkstat, we generally only read them once
      and then throw them away - they never get used again. If we retain
      them in cache, then it simply causes the working set of inodes and
      other cached items to be reclaimed just so the inode cache can grow.
      
      Avoid this problem by marking inodes read by bulkstat not to be
      cached and check this flag in .drop_inode to determine whether the
      inode should be added to the VFS LRU or not. If the inode lookup
      hits an already cached inode, then don't set the flag. If the inode
      lookup hits an inode marked with no cache flag, remove the flag and
      allow it to be cached once the current reference goes away.
      
      Inodes marked as not cached will get cleaned up by the background
      inode reclaim or via memory pressure, so they will still generate
      some short term cache pressure. They will, however, be reclaimed
      much sooner and in preference to cache hot inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5132ba8f
    • C
      xfs: trace xfs_name strings correctly · f6161375
      Christoph Hellwig 提交于
      Strings store in an xfs_name structure are often not NUL terminated,
      print them using the correct printf specifiers that make use of the
      string length store in the xfs_name structure.
      Reported-by: NBrian Candler <B.Candler@pobox.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f6161375
  5. 26 3月, 2012 7 次提交
    • J
      nfsd4: allow numeric idmapping · e9541ce8
      J. Bruce Fields 提交于
      Mimic the client side by providing a module parameter that turns off
      idmapping in the auth_sys case, for backwards compatibility with NFSv2
      and NFSv3.
      
      Unlike in the client case, we don't have any way to negotiate, since the
      client can return an error to us if it doesn't like the id that we
      return to it in (for example) a getattr call.
      
      However, it has always been possible for servers to return numeric id's,
      and as far as we're aware clients have always been able to handle them.
      
      Also, in the auth_sys case clients already need to have numeric id's the
      same between client and server.
      
      Therefore we believe it's safe to default this to on; but the module
      parameter is available to return to previous behavior if this proves to
      be a problem in some unexpected setup.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      e9541ce8
    • J
      nfsd: don't allow legacy client tracker init for anything but init_net · cc27e0d4
      Jeff Layton 提交于
      This code isn't set up for containers, so don't allow it to be
      used for anything but init_net.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      cc27e0d4
    • J
      nfsd: add notifier to handle mount/unmount of rpc_pipefs sb · 813fd320
      Jeff Layton 提交于
      In the event that rpc_pipefs isn't mounted when nfsd starts, we
      must register a notifier to handle creating the dentry once it
      is mounted, and to remove the dentry on unmount.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      813fd320
    • J
      nfsd: add the infrastructure to handle the cld upcall · f3f80148
      Jeff Layton 提交于
      ...and add a mechanism for switching between the "legacy" tracker and
      the new one. The decision is made by looking to see whether the
      v4recoverydir exists. If it does, then the legacy client tracker is
      used.
      
      If it's not, then the kernel will create a "cld" pipe in rpc_pipefs.
      That pipe is used to talk to a daemon for handling the upcall.
      
      Most of the data structures for the new client tracker are handled on a
      per-namespace basis, so this upcall should be essentially ready for
      containerization. For now however, nfsd just starts it by calling the
      initialization and exit functions for init_net.
      
      I'm making the assumption that at some point in the future we'll be able
      to determine the net namespace from the nfs4_client. Until then, this
      patch hardcodes init_net in those places. I've sprinkled some "FIXME"
      comments around that code to attempt to make it clear where we'll need
      to fix that up later.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f3f80148
    • J
      nfsd: add a per-net-namespace struct for nfsd · 7ea34ac1
      Jeff Layton 提交于
      Eventually, we'll need this when nfsd gets containerized fully. For
      now, create a struct on a per-net-namespace basis that will just hold
      a pointer to the cld_net structure. That struct will hold all of the
      per-net data that we need for the cld tracker.
      
      Eventually we can add other pernet objects to struct nfsd_net.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7ea34ac1
    • J
      nfsd: add nfsd4_client_tracking_ops struct and a way to set it · 2a4317c5
      Jeff Layton 提交于
      Abstract out the mechanism that we use to track clients into a set of
      client name tracking functions.
      
      This gives us a mechanism to plug in a new set of client tracking
      functions without disturbing the callers. It also gives us a way to
      decide on what tracking scheme to use at runtime.
      
      For now, this just looks like pointless abstraction, but later we'll
      add a new alternate scheme for tracking clients on stable storage.
      
      Note too that this patch anticipates the eventual containerization
      of this code by passing in struct net pointers in places. No attempt
      is made to containerize the legacy client tracker however.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2a4317c5
    • J
      nfsd: convert nfs4_client->cl_cb_flags to a generic flags field · a52d726b
      Jeff Layton 提交于
      We'll need a way to flag the nfs4_client as already being recorded on
      stable storage so that we don't continually upcall. Currently, that's
      recorded in the cl_firststate field of the client struct. Using an
      entire u32 to store a flag is rather wasteful though.
      
      The cl_cb_flags field is only using 2 bits right now, so repurpose that
      to a generic flags field. Rename NFSD4_CLIENT_KILL to
      NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
      flags. Add a mask that we can use for existing checks that look to see
      whether any flags are set, so that the new flags don't interfere.
      
      Convert all references to cl_firstate to the NFSD4_CLIENT_STABLE flag,
      and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a52d726b
  6. 25 3月, 2012 3 次提交
  7. 24 3月, 2012 9 次提交
    • K
      seq_file: add seq_set_overflow(), seq_overflow() · e075f591
      KAMEZAWA Hiroyuki 提交于
      It is undocumented but a seq_file's overflow state is indicated by
      m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
      set/check overflow status explicitly.
      
      Based on an idea from Eric Dumazet.
      
      [akpm@linux-foundation.org: tweak code comment]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e075f591
    • P
      proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate(). · 1b26c9b3
      Pravin B Shelar 提交于
      The namespace cleanup path leaks a dentry which holds a reference count
      on a network namespace.  Keeping that network namespace from being freed
      when the last user goes away.  Leaving things like vlan devices in the
      leaked network namespace.
      
      If you use ip netns add for much real work this problem becomes apparent
      pretty quickly.  It light testing the problem hides because frequently
      you simply don't notice the leak.
      
      Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
      
      This issue exists back to 3.0.
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: NJustin Pettit <jpettit@nicira.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b26c9b3
    • K
      procfs: speed up /proc/pid/stat, statm · bda7bad6
      KAMEZAWA Hiroyuki 提交于
      Process accounting applications as top, ps visit some files under
      /proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
      and /proc/<pid>/statm files.
      
      This patch adds
        - seq_put_decimal_ll() for signed values.
        - allow delimiter == 0.
        - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.
      
      Test result on a system with 2000+ procs.
      
      Before patch:
        [kamezawa@bluextal test]$ top -b -n 1 | wc -l
        2223
        [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
      
        real    0m0.675s
        user    0m0.044s
        sys     0m0.121s
      
        [kamezawa@bluextal test]$ time ps -elf > /dev/null
      
        real    0m0.236s
        user    0m0.056s
        sys     0m0.176s
      
      After patch:
        kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
      
        real    0m0.657s
        user    0m0.052s
        sys     0m0.100s
      
        [kamezawa@bluextal ~]$ time ps -elf > /dev/null
      
        real    0m0.198s
        user    0m0.050s
        sys     0m0.145s
      
      Considering top, ps tend to scan /proc periodically, this will reduce cpu
      consumption by top/ps to some extent.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bda7bad6
    • K
      procfs: add num_to_str() to speed up /proc/stat · 1ac101a5
      KAMEZAWA Hiroyuki 提交于
      == stat_check.py
      num = 0
      with open("/proc/stat") as f:
              while num < 1000 :
                      data = f.read()
                      f.seek(0, 0)
                      num = num + 1
      ==
      
      perf shows
      
          20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
          13.41%  stat_check.py  [kernel.kallsyms]    [k] number
          12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
          10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
           4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
           4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf
      
      This patch removes most of calls to vsnprintf() by adding num_to_str()
      and seq_print_decimal_ull(), which prints decimal numbers without rich
      functions provided by printf().
      
      On my 8cpu box.
      == Before patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.150s
      user    0m0.026s
      sys     0m0.121s
      
      == After patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.055s
      user    0m0.022s
      sys     0m0.030s
      
      [akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
      [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrea Righi <andrea@betterlinux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Turner <pjt@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ac101a5
    • E
      proc: speed up /proc/stat handling · 59a32e2c
      Eric Dumazet 提交于
      On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
      and is slow :
      
        # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
        5826
        # grep "cpu " /tmp/STRACE
        read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>
      
      Thats partly because show_stat() must be called twice since initial
      buffer size is too small (4096 bytes for less than 32 possible cpus)
      
      Fix this by :
      
       1) Taking into account nr_irqs in the initial buffer sizing.
      
       2) Using ksize() to allow better filling of initial buffer.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59a32e2c
    • D
      fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static · b908243c
      Djalal Harouni 提交于
      get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b908243c
    • J
      coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP · accb61fe
      Jason Baron 提交于
      Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
      for 'VM_NODUMP' flag.  The idea is is to add a new madvise() flag:
      MADV_DONTDUMP, which can be set by applications to specifically request
      memory regions which should not dump core.
      
      The specific application I have in mind is qemu: we can add a flag there
      that wouldn't dump all of guest memory when qemu dumps core.  This flag
      might also be useful for security sensitive apps that want to absolutely
      make sure that parts of memory are not dumped.  To clear the flag use:
      MADV_DODUMP.
      
      [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
      [akpm@linux-foundation.org: fix up the architectures which broke]
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NRoland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      accb61fe
    • J
      coredump: remove VM_ALWAYSDUMP flag · 909af768
      Jason Baron 提交于
      The motivation for this patchset was that I was looking at a way for a
      qemu-kvm process, to exclude the guest memory from its core dump, which
      can be quite large.  There are already a number of filter flags in
      /proc/<pid>/coredump_filter, however, these allow one to specify 'types'
      of kernel memory, not specific address ranges (which is needed in this
      case).
      
      Since there are no more vma flags available, the first patch eliminates
      the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
      the kernel to mark vdso and vsyscall pages.  However, it is simple
      enough to check if a vma covers a vdso or vsyscall page without the need
      for this flag.
      
      The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
      'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
      'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
      continue to work the same as before unless 'MADV_DONTDUMP' is set on the
      region.
      
      The qemu code which implements this features is at:
      
        http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
      
      In my testing the qemu core dump shrunk from 383MB -> 13MB with this
      patch.
      
      I also believe that the 'MADV_DONTDUMP' flag might be useful for
      security sensitive apps, which might want to select which areas are
      dumped.
      
      This patch:
      
      The VM_ALWAYSDUMP flag is currently used by the coredump code to
      indicate that a vma is part of a vsyscall or vdso section.  However, we
      can determine if a vma is in one these sections by checking it against
      the gate_vma and checking for a non-NULL return value from
      arch_vma_name().  Thus, freeing a valuable vma bit.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NRoland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      909af768
    • N
      fat: fix bug in enforcing Long File Name length · d533df07
      Namjae Jeon 提交于
      Since '*outlen' is initialized to zero, it is currently possible to
      create a filename of length (FAT_LFN_LEN + 1) when utf8 is not enabled.
      To enforce the FAT_LFN_LEN limit, we must perform one less iteration.
      Signed-off-by: NNamjae Jeon <linkinjeon@gmail.com>
      Signed-off-by: NRavishankar N <cyberax82@gmail.com>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d533df07