1. 10 10月, 2010 1 次提交
  2. 07 10月, 2010 4 次提交
    • S
      · 43695d09
      Sunil Mushran 提交于
      ocfs2/cluster: Show per region heartbeat elapsed time
      
      This patch adds a per region debugfs file that shows the elapsed time
      since the time the o2hb timer was last armed.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      43695d09
    • S
      · d6aa1c7c
      Sunil Mushran 提交于
      ocfs2/cluster: Add mlogs for heartbeat up/down events
      
      This patch adds mlogs for o2hb up and down events.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      d6aa1c7c
    • S
      · 1f285305
      Sunil Mushran 提交于
      ocfs2/cluster: Create debugfs dir/files for each region
      
      This patch creates debugfs directory for each o2hb region and creates
      files to expose the region number and the per region live node bitmap.
      This information will be useful in debugging cluster issues.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      1f285305
    • S
      · a6de0136
      Sunil Mushran 提交于
      ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps
      
      This patch prints the bitmaps of live, quorum and failed regions. This
      information will be useful in debugging cluster issues.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      a6de0136
  3. 08 10月, 2010 1 次提交
    • S
      · b1c5ebfb
      Sunil Mushran 提交于
      ocfs2/cluster: Maintain bitmap of failed regions
      
      In global heartbeat mode, we track the bitmap of regions that have seen
      heartbeat timeouts. We fence if the number of such regions is greater than
      or equal to half the number of quorum regions.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      b1c5ebfb
  4. 07 10月, 2010 2 次提交
    • S
      · 43182d2a
      Sunil Mushran 提交于
      ocfs2/cluster: Maintain bitmap of quorum regions
      
      o2hb allows online adding of regions. However, a newly added region is not
      used in quorum calculations unless it has been added on all nodes. This patch
      tracks a bitmap of such quorum regions.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      43182d2a
    • S
      · e7d656ba
      Sunil Mushran 提交于
      ocfs2/cluster: Track bitmap of live heartbeat regions
      
      A heartbeat region becomes live (or active) after a fixed number of (steady)
      iterations.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      e7d656ba
  5. 08 10月, 2010 1 次提交
    • S
      · 536f0741
      Sunil Mushran 提交于
      ocfs2/cluster: Track number of global heartbeat regions
      
      In global heartbeat mode, we have a upper limit for the number of active regions.
      This patch adds the facility to track the number of active global heartbeat
      regions and fails to start heartbeat if the number exceeds the maximum.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      536f0741
  6. 07 10月, 2010 1 次提交
    • S
      · 823a637a
      Sunil Mushran 提交于
      ocfs2/cluster: Maintain live node bitmap per heartbeat region
      
      Currently we track a global livenode bitmap that keeps track of all nodes
      that are heartbeating in all regions.
      
      This patch adds the ability to track the livenode bitmap on a per region basis.
      We will use this facility in a later patch to allow us to withstand the loss of
      a minority number of regions.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      823a637a
  7. 08 10月, 2010 3 次提交
  8. 07 10月, 2010 1 次提交
    • S
      · 18c50cb0
      Sunil Mushran 提交于
      ocfs2/cluster: Print messages when adding/removing heartbeat regions
      
      Prints messages when the user adds or removes heartbeat regions in global
      heartbeat mode. These messages are useful when debugging cluster related issues.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      18c50cb0
  9. 08 10月, 2010 1 次提交
    • S
      · 18cfdf1b
      Sunil Mushran 提交于
      ocfs2/dlm: Add message DLM_QUERY_NODEINFO
      
      Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all
      registered nodes. This message is sent if the negotiated dlm protocol is
      1.1 or higher. If the information of the joining node does not match
      that of any existing nodes, the join domain request is rejected.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      18cfdf1b
  10. 07 10月, 2010 1 次提交
    • S
      · 5f3c6d9c
      Sunil Mushran 提交于
      ocfs2: Print message if user mounts without starting global heartbeat
      
      In global heartbeat mode, the heartbeat is started by the user. This patch
      prints an error if the user attempts to mount a volume without starting the
      heartbeat.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      5f3c6d9c
  11. 10 10月, 2010 1 次提交
    • S
      · ea203441
      Sunil Mushran 提交于
      ocfs2/dlm: Add message DLM_QUERY_REGION
      
      Adds new dlm message DLM_QUERY_REGION that sends the names of all active
      heartbeat regions. This message is only sent in the global heartbeat
      mode. If the regions in the joining node do not fully match the ones in
      the active nodes, the join domain request is rejected.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      ea203441
  12. 08 10月, 2010 1 次提交
    • S
      · b3c85c4c
      Sunil Mushran 提交于
      ocfs2/cluster: Get all heartbeat regions
      
      Export function in o2hb to get a list of heartbeat regions. It also adds an
      upper limit to the length of the heartbeat region name.
      
      o2hb_global_heartbeat_active() currently disables global heartbeat. It will
      be enabled in a later patch after all the code is added.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      b3c85c4c
  13. 07 10月, 2010 1 次提交
  14. 08 10月, 2010 1 次提交
    • S
      · 2c442719
      Sunil Mushran 提交于
      ocfs2: Add support for heartbeat=global mount option
      
      Adds support for heartbeat=global mount option. It ensures that the heartbeat
      mode passed matches the one enabled on disk.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      2c442719
  15. 10 10月, 2010 1 次提交
    • S
      · 98f486f2
      Sunil Mushran 提交于
      ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO
      
      OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for
      both userspace and o2cb cluster stacks. It also allows us to extend cluster
      info to include stack flags.
      
      This patch also adds stackflags to sb->s_clusterinfo. It also introduces a
      clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled
      global heartbeat mode.
      
      This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The
      clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      98f486f2
  16. 08 10月, 2010 1 次提交
    • S
      · 54b5187b
      Sunil Mushran 提交于
      ocfs2/cluster: Add heartbeat mode configfs parameter
      
      Add heartbeat mode parameter to the configfs tree. This will be used
      to set/show the heartbeat mode. The user is free to toggle the mode
      between local and global as long as there is no active heartbeat region.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      54b5187b
  17. 04 10月, 2010 2 次提交
    • C
      writeback: always use sb->s_bdi for writeback purposes · aaead25b
      Christoph Hellwig 提交于
      We currently use struct backing_dev_info for various different purposes.
      Originally it was introduced to describe a backing device which includes
      an unplug and congestion function and various bits of readahead information
      and VM-relevant flags.  We're also using for tracking dirty inodes for
      writeback.
      
      To make writeback properly find all inodes we need to only access the
      per-filesystem backing_device pointed to by the superblock in ->s_bdi
      inside the writeback code, and not the instances pointeded to by
      inode->i_mapping->backing_dev which can be overriden by special devices
      or might not be set at all by some filesystems.
      
      Long term we should split out the writeback-relevant bits of struct
      backing_device_info (which includes more than the current bdi_writeback)
      and only point to it from the superblock while leaving the traditional
      backing device as a separate structure that can be overriden by devices.
      
      The one exception for now is the block device filesystem which really
      wants different writeback contexts for it's different (internal) inodes
      to handle the writeout more efficiently.  For now we do this with
      a hack in fs-writeback.c because we're so late in the cycle, but in
      the future I plan to replace this with a superblock method that allows
      for multiple writeback contexts per filesystem.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      aaead25b
    • G
      fuse: Initialize total_len in fuse_retrieve() · 0157443c
      Geert Uytterhoeven 提交于
      fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this
      function
      
      Initialize total_len to zero, else its value will be undefined.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      0157443c
  18. 02 10月, 2010 4 次提交
    • F
      reiserfs: fix unwanted reiserfs lock recursion · 9d8117e7
      Frederic Weisbecker 提交于
      Prevent from recursively locking the reiserfs lock in reiserfs_unpack()
      because we may call journal_begin() that requires the lock to be taken
      only once, otherwise it won't be able to release the lock while taking
      other mutexes, ending up in inverted dependencies between the journal
      mutex and the reiserfs lock for example.
      
      This fixes:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.35.4.4a #3
        -------------------------------------------------------
        lilo/1620 is trying to acquire lock:
         (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
      
        but task is already holding lock:
         (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
               [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs]
               [<c10b6a20>] do_remount_sb+0x60/0x110
               [<c10cee25>] do_mount+0x625/0x790
               [<c10cf014>] sys_mount+0x84/0xb0
               [<c12fca3d>] syscall_call+0x7/0xb
      
        -> #0 (&journal->j_mutex){+.+...}:
               [<c10560f6>] __lock_acquire+0x1026/0x1180
               [<c10562b7>] lock_acquire+0x67/0x80
               [<c12facad>] __mutex_lock_common+0x4d/0x410
               [<c12fb0c8>] mutex_lock_nested+0x18/0x20
               [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
               [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
               [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
               [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
               [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
               [<c10dbbe6>] block_prepare_write+0x26/0x40
               [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
               [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
               [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
               [<c10c3188>] vfs_ioctl+0x28/0xa0
               [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
               [<c10c3eb3>] sys_ioctl+0x63/0x70
               [<c12fca3d>] syscall_call+0x7/0xb
      
        other info that might help us debug this:
      
        2 locks held by lilo/1620:
         #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs]
         #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        stack backtrace:
        Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3
        Call Trace:
         [<c10560f6>] __lock_acquire+0x1026/0x1180
         [<c10562b7>] lock_acquire+0x67/0x80
         [<c12facad>] __mutex_lock_common+0x4d/0x410
         [<c12fb0c8>] mutex_lock_nested+0x18/0x20
         [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs]
         [<d0325f77>] journal_begin+0x77/0x140 [reiserfs]
         [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs]
         [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs]
         [<c10db9db>] __block_prepare_write+0x1bb/0x3a0
         [<c10dbbe6>] block_prepare_write+0x26/0x40
         [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs]
         [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs]
         [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3188>] vfs_ioctl+0x28/0xa0
         [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3eb3>] sys_ioctl+0x63/0x70
         [<c12fca3d>] syscall_call+0x7/0xb
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: All since 2.6.32 <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d8117e7
    • F
      reiserfs: fix dependency inversion between inode and reiserfs mutexes · 3f259d09
      Frederic Weisbecker 提交于
      The reiserfs mutex already depends on the inode mutex, so we can't lock
      the inode mutex in reiserfs_unpack() without using the safe locking API,
      because reiserfs_unpack() is always called with the reiserfs mutex locked.
      
      This fixes:
      
        =======================================================
        [ INFO: possible circular locking dependency detected ]
        2.6.35c #13
        -------------------------------------------------------
        lilo/1606 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
      
        but task is already holding lock:
         (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
               [<c1056347>] lock_acquire+0x67/0x80
               [<c12f083d>] __mutex_lock_common+0x4d/0x410
               [<c12f0c58>] mutex_lock_nested+0x18/0x20
               [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
               [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs]
               [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs]
               [<c10b7d17>] get_sb_bdev+0x117/0x170
               [<d0313e21>] get_super_block+0x21/0x30 [reiserfs]
               [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0
               [<c10b7659>] do_kern_mount+0x39/0xe0
               [<c10cebe0>] do_mount+0x340/0x790
               [<c10cf0b4>] sys_mount+0x84/0xb0
               [<c12f25cd>] syscall_call+0x7/0xb
      
        -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
               [<c1056186>] __lock_acquire+0x1026/0x1180
               [<c1056347>] lock_acquire+0x67/0x80
               [<c12f083d>] __mutex_lock_common+0x4d/0x410
               [<c12f0c58>] mutex_lock_nested+0x18/0x20
               [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
               [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
               [<c10c3228>] vfs_ioctl+0x28/0xa0
               [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
               [<c10c3f53>] sys_ioctl+0x63/0x70
               [<c12f25cd>] syscall_call+0x7/0xb
      
        other info that might help us debug this:
      
        1 lock held by lilo/1606:
         #0:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs]
      
        stack backtrace:
        Pid: 1606, comm: lilo Not tainted 2.6.35c #13
        Call Trace:
         [<c1056186>] __lock_acquire+0x1026/0x1180
         [<c1056347>] lock_acquire+0x67/0x80
         [<c12f083d>] __mutex_lock_common+0x4d/0x410
         [<c12f0c58>] mutex_lock_nested+0x18/0x20
         [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs]
         [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs]
         [<c10c3228>] vfs_ioctl+0x28/0xa0
         [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0
         [<c10c3f53>] sys_ioctl+0x63/0x70
         [<c12f25cd>] syscall_call+0x7/0xb
      Reported-by: NJarek Poplawski <jarkao2@gmail.com>
      Tested-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: <stable@kernel.org>		[2.6.32 and later]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f259d09
    • J
      proc: make /proc/pid/limits world readable · 3036e7b4
      Jiri Olsa 提交于
      Having the limits file world readable will ease the task of system
      management on systems where root privileges might be restricted.
      
      Having admin restricted with root priviledges, he/she could not check
      other users process' limits.
      
      Also it'd align with most of the /proc stat files.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3036e7b4
    • J
      cifs: prevent infinite recursion in cifs_reconnect_tcon · f569599a
      Jeff Layton 提交于
      cifs_reconnect_tcon is called from smb_init. After a successful
      reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That
      function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo.
      Those functions also call smb_init.
      
      It's possible for the session and tcon reconnect to succeed, and then
      for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or
      CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call
      smb_init and cifs_reconnect_tcon again, ad infinitum...
      
      Break the infinite recursion by having those functions use a new
      smb_init variant that doesn't attempt to perform a reconnect.
      Reported-and-Tested-by: NMichal Suchanek <hramrach@centrum.cz>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      f569599a
  19. 30 9月, 2010 2 次提交
  20. 29 9月, 2010 1 次提交
    • D
      xfs: force background CIL push under sustained load · 80168676
      Dave Chinner 提交于
      I have been seeing occasional pauses in transaction throughput up to
      30s long under heavy parallel workloads. The only notable thing was
      that the xfsaild was trying to be active during the pauses, but
      making no progress. It was running exactly 20 times a second (on the
      50ms no-progress backoff), and the number of pushbuf events was
      constant across this time as well.  IOWs, the xfsaild appeared to be
      stuck on buffers that it could not push out.
      
      Further investigation indicated that it was trying to push out inode
      buffers that were pinned and/or locked. The xfsbufd was also getting
      woken at the same frequency (by the xfsaild, no doubt) to push out
      delayed write buffers. The xfsbufd was not making any progress
      because all the buffers in the delwri queue were pinned. This scan-
      and-make-no-progress dance went one in the trace for some seconds,
      before the xfssyncd came along an issued a log force, and then
      things started going again.
      
      However, I noticed something strange about the log force - there
      were way too many IO's issued. 516 log buffers were written, to be
      exact. That added up to 129MB of log IO, which got me very
      interested because it's almost exactly 25% of the size of the log.
      He delayed logging code is suppose to aggregate the minimum of 25%
      of the log or 8MB worth of changes before flushing. That's what
      really puzzled me - why did a log force write 129MB instead of only
      8MB?
      
      Essentially what has happened is that no CIL pushes had occurred
      since the previous tail push which cleared out 25% of the log space.
      That caused all the new transactions to block because there wasn't
      log space for them, but they kick the xfsaild to push the tail.
      However, the xfsaild was not making progress because there were
      buffers it could not lock and flush, and the xfsbufd could not flush
      them because they were pinned. As a result, both the xfsaild and the
      xfsbufd could not move the tail of the log forward without the CIL
      first committing.
      
      The cause of the problem was that the background CIL push, which
      should happen when 8MB of aggregated changes have been committed, is
      being held off by the concurrent transaction commit load. The
      background push does a down_write_trylock() which will fail if there
      is a concurrent transaction commit holding the push lock in read
      mode. With 8 CPUs all doing transactions as fast as they can, there
      was enough concurrent transaction commits to hold off the background
      push until tail-pushing could no longer free log space, and the halt
      would occur.
      
      It should be noted that there is no reason why it would halt at 25%
      of log space used by a single CIL checkpoint. This bug could
      definitely violate the "no transaction should be larger than half
      the log" requirement and hence result in corruption if the system
      crashed under heavy load. This sort of bug is exactly the reason why
      delayed logging was tagged as experimental....
      
      The fix is to start blocking background pushes once the threshold
      has been exceeded. Rework the threshold calculations to keep the
      amount of log space a CIL checkpoint can use to below that of the
      AIL push threshold to avoid the problem completely.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      80168676
  21. 24 9月, 2010 5 次提交
  22. 23 9月, 2010 4 次提交
    • K
      /proc/pid/smaps: fix dirty pages accounting · 1c2499ae
      KOSAKI Motohiro 提交于
      Currently, /proc/<pid>/smaps has wrong dirty pages accounting.
      Shared_Dirty and Private_Dirty output only pte dirty pages and ignore
      PG_dirty page flag.  It is difference against documentation, but also
      inconsistent against Referenced field.  (Referenced checks both pte and
      page flags)
      
      This patch fixes it.
      
      Test program:
      
       large-array.c
       ---------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <unistd.h>
      
       char array[1*1024*1024*1024L];
      
       int main(void)
       {
               memset(array, 1, sizeof(array));
               pause();
      
               return 0;
       }
       ---------------------------------------------------
      
      Test case:
       1. run ./large-array
       2. cat /proc/`pidof large-array`/smaps
       3. swapoff -a
       4. cat /proc/`pidof large-array`/smaps again
      
      Test result:
       <before patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:    218992 kB   <-- showed pages as clean incorrectly
      Private_Dirty:    829584 kB
      Referenced:       388364 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      
       <after patch>
      
      00601000-40601000 rw-p 00000000 00:00 0
      Size:            1048576 kB
      Rss:             1048576 kB
      Pss:             1048576 kB
      Shared_Clean:          0 kB
      Shared_Dirty:          0 kB
      Private_Clean:         0 kB
      Private_Dirty:   1048576 kB  <-- fixed
      Referenced:       388480 kB
      Swap:                  0 kB
      KernelPageSize:        4 kB
      MMUPageSize:           4 kB
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c2499ae
    • J
      aio: do not return ERESTARTSYS as a result of AIO · a0c42bac
      Jan Kara 提交于
      OCFS2 can return ERESTARTSYS from its write function when the process is
      signalled while waiting for a cluster lock (and the filesystem is mounted
      with intr mount option).  Generally, it seems reasonable to allow
      filesystems to return this error code from its IO functions.  As we must
      not leak ERESTARTSYS (and similar error codes) to userspace as a result of
      an AIO operation, we have to properly convert it to EINTR inside AIO code
      (restarting the syscall isn't really an option because other AIO could
      have been already submitted by the same io_submit syscall).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0c42bac
    • A
      /proc/vmcore: fix seeking · c227e690
      Arnd Bergmann 提交于
      Commit 73296bc6 ("procfs: Use generic_file_llseek in /proc/vmcore")
      broke seeking on /proc/vmcore.  This changes it back to use default_llseek
      in order to restore the original behaviour.
      
      The problem with generic_file_llseek is that it only allows seeks up to
      inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
      file systems.  We should merge generic_file_llseek and default_llseek some
      day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
      is the safer solution.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c227e690
    • D
      Prevent freeing uninitialized pointer in compat_do_readv_writev · 767b68e9
      Dan Rosenberg 提交于
      In 32-bit compatibility mode, the error handling for
      compat_do_readv_writev() may free an uninitialized pointer, potentially
      leading to all sorts of ugly memory corruption.  This is reliably
      triggerable by unprivileged users by invoking the readv()/writev()
      syscalls with an invalid iovec pointer.  The below patch fixes this to
      emulate the non-compat version.
      
      Introduced by commit b8373363 ("compat: factor out
      compat_rw_copy_check_uvector from compat_do_readv_writev")
      Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: stable@kernel.org (2.6.35)
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      767b68e9