  1. 18 May, 2010 1 commit
  2. 17 May, 2010 1 commit
    • writeback: fix WB_SYNC_NONE writeback from umount · e913fc82
      Committed by Jens Axboe
      When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
      writeback to kick off writeback of pending dirty inodes, then follow
      that up with a WB_SYNC_ALL to wait for it. Since umount already holds
      the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
      writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
      since WB_SYNC_ALL writeback is a data integrity operation and thus
      a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
      it's a lot slower.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 11 May, 2010 1 commit
    • block: allow initialization of previously allocated request_queue · 01effb0d
      Committed by Mike Snitzer
      blk_init_queue() allocates the request_queue structure and then
      initializes it as needed (request_fn, elevator, etc).
      
      Split initialization out to blk_init_allocated_queue_node.
      Introduce blk_init_allocated_queue wrapper function to model existing
      blk_init_queue and blk_init_queue_node interfaces.
      
      Export elv_register_queue to allow a newly added elevator to be
      registered with sysfs.  Export elv_unregister_queue for symmetry.
      
      These changes allow DM to initialize a device's request_queue with more
      precision.  In particular, DM no longer unconditionally initializes a
      full request_queue (elevator et al).  It only does so for a
      request-based DM device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 29 April, 2010 4 commits
    • sctp: Fix oops when sending queued ASCONF chunks · c0786693
      Committed by Vlad Yasevich
      When we finish processing an ASCONF_ACK chunk, we try to send
      the next queued ASCONF.  This action runs the sctp state
      machine recursively, which it is not prepared to do.
      
      kernel BUG at kernel/timer.c:790!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/module/ipv6/initstate
      Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
      uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
      floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]
      
      Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
      EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
      EIP is at add_timer+0xd/0x1b
      EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
      ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
      Stack:
       c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
      <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004
      <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0
      Call Trace:
       [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
       [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
       [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
       [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
       [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
       [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
       [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
       [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
       [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
       [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
       [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
      Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
      Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: avoid irq lock inversion while call sk->sk_data_ready() · 561b1733
      Committed by Wei Yongjun
      sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
      contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
      not be used in this case. Therefore, we have to make a new function
      sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.
      
      =========================================================
      [ INFO: possible irq lock inversion dependency detected ]
      2.6.33-rc6 #129
      ---------------------------------------------------------
      sctp_darn/1517 just changed the state of lock:
       (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
      but this lock took another, SOFTIRQ-unsafe lock in the past:
       (slock-AF_INET){+.-...}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      1 lock held by sctp_darn/1517:
       #0:  (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • blkdev: add blkdev_issue_zeroout helper function · 3f14d792
      Committed by Dmitry Monakhov
      - Add the bio_batch helper primitive. This is a rather generic primitive
        for submitting and waiting on a complex request that consists of
        several bios.
      - blkdev_issue_zeroout() generates a number of zero-filled write bios.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkdev: generalize flags for blkdev_issue_fn functions · fbd9b09a
      Committed by Dmitry Monakhov
      The patch just converts all blkdev_issue_xxx functions to a common
      set of flags. Wait/allocation semantics are preserved.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  5. 28 April, 2010 1 commit
  6. 27 April, 2010 1 commit
    • block: implement bd_claiming and claiming block · 6b4517a7
      Committed by Tejun Heo
      Currently, device claiming for exclusive open is done after low level
      open - disk->fops->open() - has completed successfully.  This means
      that exclusive open attempts while a device is already exclusively
      open will fail only after disk->fops->open() is called.
      
      cdrom driver issues commands during open() which means that O_EXCL
      open attempt can unintentionally inject commands to in-progress
      command stream for burning thus disturbing burning process.  In most
      cases, this doesn't cause problems because the first command to be
      issued is TUR which most devices can process in the middle of burning.
      However, depending on how a device replies to TUR during burning,
      cdrom driver may end up issuing further commands.
      
      This can't be resolved trivially by moving bd_claim() before doing
      actual open() because that means an open attempt which will end up
      failing could interfere with other legitimate O_EXCL open attempts,
      i.e. unconfirmed open attempts can fail others.
      
      This patch resolves the problem by introducing claiming block which is
      started by bd_start_claiming() and terminated either by bd_claim() or
      bd_abort_claiming().  bd_claim() from inside a claiming block is
      guaranteed to succeed and once a claiming block is started, other
      bd_start_claiming() or bd_claim() attempts block till the current
      claiming block is terminated.
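
      The claiming-block protocol described above can be modelled in a few
      lines of userspace C. This is only a sketch under stated assumptions:
      the struct and the *_sketch functions are hypothetical stand-ins, not
      the kernel's bd_start_claiming(), bd_claim() or bd_abort_claiming(),
      and where the real code would sleep until the other claiming block
      ends, the sketch returns -EAGAIN instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a block device's exclusive-open state. */
struct bdev_sketch {
    const void *holder;   /* current exclusive holder, or NULL */
    const void *claiming; /* owner of an open claiming block, or NULL */
};

/*
 * Begin a claiming block. Returns 0 on success, -EBUSY if someone else
 * already holds the device, and -EAGAIN where the real code would block
 * until the other claiming block is terminated.
 */
static int start_claiming_sketch(struct bdev_sketch *b, const void *who)
{
    if (b->claiming && b->claiming != who)
        return -EAGAIN;
    if (b->holder && b->holder != who)
        return -EBUSY;
    b->claiming = who;
    return 0;
}

/*
 * Claim the device. Inside one's own claiming block this cannot fail,
 * because start_claiming_sketch() already ruled out other holders.
 * It also works standalone, in which case it performs the same checks.
 */
static int claim_sketch(struct bdev_sketch *b, const void *who)
{
    if (b->claiming && b->claiming != who)
        return -EAGAIN;
    if (b->holder && b->holder != who)
        return -EBUSY;
    b->holder = who;
    b->claiming = NULL; /* a successful claim terminates the block */
    return 0;
}

/* Give up a claiming block, e.g. after the low level open failed. */
static void abort_claiming_sketch(struct bdev_sketch *b, const void *who)
{
    assert(b->claiming == who);
    b->claiming = NULL;
}
```

      The point of the split is that the low level open runs between
      start_claiming_sketch() and claim_sketch(), so a failed open is undone
      with abort_claiming_sketch() without ever having become the holder.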
      
      bd_claim() can still be used standalone although now it always
      synchronizes against claiming blocks, so the existing users will keep
      working without any change.
      
      blkdev_open() and open_bdev_exclusive() are converted to use claiming
      blocks so that exclusive open attempts from these functions don't
      interfere with the existing exclusive open.
      
      This problem was discovered while investigating bko#15403.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=15403
      
      The burning problem itself can be resolved by updating userspace
      probing tools to always open w/ O_EXCL.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Matthias-Christian Ott <ott@mirix.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 25 April, 2010 2 commits
  8. 24 April, 2010 1 commit
  9. 22 April, 2010 4 commits
  10. 21 April, 2010 2 commits
    • blkio: Fix blkio crash during rq stat update · 7f1dc8a2
      Committed by Vivek Goyal
      blkio + cfq was crashing even when two sequential readers were put in two
      separate cgroups (group_isolation=0).
      
      The reason is that a cfqq can migrate across groups depending on whether
      it is sync-noidle. It can therefore happen that at request insertion time
      the cfqq belonged to one cfqg and at request dispatch time it belonged to
      the root group. In this case the per-cgroup request stats can go wrong,
      and the code also runs into BUG_ON().
      
      This patch makes the rq stash away a cfq group pointer instead of relying
      on the cfqq->cfqg pointer alone for rq stat accounting.
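
      The stashing idea can be illustrated with a small userspace model. It
      is a sketch only: group_sketch, queue_sketch and request_sketch are
      hypothetical stand-ins for cfqg, cfqq and the request, and the assert
      plays the role that BUG_ON() plays in the trace below.

```c
#include <assert.h>

struct group_sketch { long queued; };                /* per-cgroup stat */
struct queue_sketch { struct group_sketch *group; }; /* like cfqq->cfqg */

struct request_sketch {
    struct queue_sketch *queue;
    struct group_sketch *stashed_group; /* group captured at insertion */
};

/* Insertion charges the queue's current group and remembers it. */
static void insert_request(struct request_sketch *rq, struct queue_sketch *q)
{
    rq->queue = q;
    rq->stashed_group = q->group;
    rq->stashed_group->queued++;
}

/*
 * Removal through the queue charges whatever group the queue belongs to
 * now, which is the wrong one if the queue migrated in between.
 */
static void remove_request_via_queue(struct request_sketch *rq)
{
    rq->queue->group->queued--;
}

/* Removal through the stashed pointer always undoes the insertion. */
static void remove_request_via_stash(struct request_sketch *rq)
{
    assert(rq->stashed_group->queued > 0); /* would be BUG_ON() in-kernel */
    rq->stashed_group->queued--;
}
```

      If the queue moves to the root group between insertion and removal,
      the first removal variant leaves the original group at 1 and drives
      the root group to -1, while the second leaves both at 0.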
      
      [   65.163523] ------------[ cut here ]------------
      [   65.164301] kernel BUG at block/blk-cgroup.c:117!
      [   65.164301] invalid opcode: 0000 [#1] SMP
      [   65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
      [   65.164301] CPU 1
      [   65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [   65.164301]
      [   65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
      [   65.164301] RIP: 0010:[<ffffffff8121924f>]  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301] RSP: 0018:ffff8800ba5a79e8  EFLAGS: 00010046
      [   65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
      [   65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
      [   65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
      [   65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
      [   65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
      [   65.164301] FS:  00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [   65.164301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
      [   65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
      [   65.164301] Stack:
      [   65.164301]  ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
      [   65.164301] Call Trace:
      [   65.164301]  [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
      [   65.164301]  [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
      [   65.164301]  [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
      [   65.164301]  [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
      [   65.164301]  [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
      [   65.164301]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
      [   65.164301]  [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
      [   65.164301]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [   65.164301]  [<ffffffff8120b063>] blk_unplug+0x29/0x2d
      [   65.164301]  [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
      [   65.164301]  [<ffffffff81108a82>] block_sync_page+0x35/0x39
      [   65.164301]  [<ffffffff810ad64e>] sync_page+0x41/0x4a
      [   65.164301]  [<ffffffff810ad665>] sync_page_killable+0xe/0x35
      [   65.164301]  [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
      [   65.164301]  [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
      [   65.164301]  [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
      [   65.164301]  [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
      [   65.164301]  [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
      [   65.164301]  [<ffffffff810e906c>] do_sync_read+0xcb/0x108
      [   65.164301]  [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
      [   65.164301]  [<ffffffff810e96d3>] vfs_read+0xab/0x108
      [   65.164301]  [<ffffffff810e97f0>] sys_read+0x4a/0x6e
      [   65.164301]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [   65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
      [   65.164301] RIP  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301]  RSP <ffff8800ba5a79e8>
      [   65.164301] ---[ end trace 1b2b828753032e68 ]---
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • pcmcia: pcmcia_dev_present bugfix · 04de0816
      Committed by Dominik Brodowski
      pcmcia_dev_present is in and by itself buggy. Add a note specifying
      why it is broken, and replace the broken locking -- taking a mutex
      is a bad idea in IRQ context, from which this function is rarely
      called -- by an atomic_t.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
  11. 20 April, 2010 2 commits
  12. 19 April, 2010 3 commits
  13. 16 April, 2010 1 commit
  14. 15 April, 2010 1 commit
  15. 14 April, 2010 2 commits
    • rcu: Better explain the condition parameter of rcu_dereference_check() · c08c68dd
      Committed by David Howells
      Better explain the condition parameter of
      rcu_dereference_check() that describes the conditions under
      which the dereference is permitted to take place (and
      incorporate Yong Zhang's suggestion).  This condition is only
      checked under lockdep proving.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add rcu_access_pointer and rcu_dereference_protected · b62730ba
      Committed by Paul E. McKenney
      This patch adds variants of rcu_dereference() that handle
      situations where the RCU-protected data structure cannot change,
      perhaps due to our holding the update-side lock, or where the
      RCU-protected pointer is only to be fetched, not dereferenced.
      These are needed due to some performance concerns with using
      rcu_dereference() where it is not required, aside from the need
      for lockdep/sparse checking.
      
      The new rcu_access_pointer() primitive is for the case where the
      pointer is to be fetched and not dereferenced.  This primitive may be
      used without protection, RCU or otherwise, due to the fact that
      it uses ACCESS_ONCE().
      
      The new rcu_dereference_protected() primitive is for the case
      where updates are prevented, for example, due to holding the
      update-side lock.  This primitive does neither ACCESS_ONCE() nor
      smp_read_barrier_depends(), so can only be used when updates are
      somehow prevented.
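
      The difference between the two primitives can be shown with a
      userspace model. This is a sketch under loud assumptions: the
      *_sketch functions below are hypothetical stand-ins written for one
      pointer type, not the kernel macros; a volatile load stands in for
      ACCESS_ONCE(), a plain bool stands in for the update-side lock, and
      an assert stands in for the lockdep-only condition check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct config_sketch { int value; };

static bool update_lock_held;                /* stand-in for a real lock */
static struct config_sketch *current_config; /* RCU-protected in-kernel */

/* Fetch only: one volatile load; the result must not be dereferenced. */
static struct config_sketch *access_pointer_sketch(struct config_sketch **pp)
{
    return *(struct config_sketch *volatile *)pp;
}

/* Updates excluded: a plain load, valid only while cond really holds. */
static struct config_sketch *
dereference_protected_sketch(struct config_sketch **pp, bool cond)
{
    assert(cond);
    return *pp;
}

/* Reader that only tests the pointer against NULL. */
static bool config_is_set(void)
{
    return access_pointer_sketch(&current_config) != NULL;
}

/* Updater: returns the old value (or -1 if none) and installs next. */
static int config_replace(struct config_sketch *next)
{
    update_lock_held = true;
    struct config_sketch *old =
        dereference_protected_sketch(&current_config, update_lock_held);
    int old_value = old ? old->value : -1;
    /* Real updaters publish with rcu_assign_pointer(); a plain store
     * is enough for this single-threaded sketch. */
    current_config = next;
    update_lock_held = false;
    return old_value;
}
```

      The NULL test never follows the pointer, so it needs no read-side
      protection, while the updater may follow it only because the lock
      keeps every other updater out.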
      Suggested-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 12 April, 2010 1 commit
    • NFSv4: fix delegated locking · 0df5dd4a
      Committed by Trond Myklebust
      Arnaud Giersch reports that NFSv4 locking is broken when we hold a
      delegation since commit 8e469ebd (NFSv4:
      Don't allow posix locking against servers that don't support it).
      
      According to Arnaud, the lock succeeds the first time he opens the file
      (since we cannot do a delegated open) but then fails after we start using
      delegated opens.
      
      The following patch fixes it by ensuring that locking behaviour is
      governed by a per-filesystem capability flag that is initially set, but
      gets cleared if the server ever returns an OPEN without the
      NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
      Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
  17. 10 April, 2010 4 commits
  18. 09 April, 2010 3 commits
    • blkio: Add io_merged stat · 812d4026
      Committed by Divyesh Shah
      This includes both the number of bios merged into requests belonging to this
      cgroup as well as the number of requests merged together.
      In the past, we've observed different merging behavior across upstream kernels,
      some by design and some due to actual bugs. This stat helps a lot in debugging such
      problems when applications report decreased throughput with a new kernel
      version.
      
      This needed adding an extra elevator function to capture bios being merged as I
      did not want to pollute elevator code with blkiocg knowledge and hence needed
      the accounting invocation to come from CFQ.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Changes to IO controller additional stats patches · 84c124da
      Committed by Divyesh Shah
      that include some minor fixes and address all comments.
      
      Changelog: (most based on Vivek Goyal's comments)
      o renamed blkiocg_reset_write to blkiocg_reset_stats
      o more clarification in the documentation on io_service_time and io_wait_time
      o Initialize blkg->stats_lock
      o rename io_add_stat to blkio_add_stat and declare it static
      o use bool for direction and sync
      o derive direction and sync info from existing rq methods
      o use 12 for major:minor string length
      o define io_service_time better to cover the NCQ case
      o add a separate reset_stats interface
      o make the indexed stats a 2d array to simplify macro and function pointer code
      o blkio.time now exports in jiffies as before
      o Added stats description in patch description and
        Documentation/cgroup/blkio-controller.txt
      o Prefix all stats functions with blkio and make them static as applicable
      o replace IO_TYPE_MAX with IO_TYPE_TOTAL
      o Moved #define constant to top of blk-cgroup.c
      o Pass dev_t around instead of char *
      o Add note to documentation file about resetting stats
      o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
        statements
      o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
        rq_direction() and rq_sync() functions which are used by CFQ and when using
        io-controller at a higher level, bio_* functions can be added.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2) · 45c4d015
      Committed by Mark Lord
      Most drives from Seagate, Hitachi, and possibly other brands,
      do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1).
      So instead use LBA48 for such accesses.
      
      This bug could bite a lot of systems, especially when the user has
      taken care to align partitions to 4KB boundaries. On misaligned systems,
      it is less likely to be encountered, since a 4KB read would end at
      0x10000000 rather than at 0x0fffffff.
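
      The boundary arithmetic can be checked with a small helper. This is
      a sketch, not the libata implementation: lba28_ok_sketch() is a
      hypothetical name, sectors are assumed to be 512 bytes (so a 4KB read
      is 8 sectors), and the 256-sector cap assumes the 8-bit sector count
      of LBA28 commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest sector number expressible in 28 bits: 2^28 - 1. */
#define LBA28_LAST_SECTOR ((uint64_t)0x0fffffff)

/*
 * May a transfer of n_block sectors starting at block be issued with
 * LBA28? Stricter than "fits in 28 bits": any transfer whose last sector
 * is 0x0fffffff is refused too, so the caller falls back to LBA48.
 */
static bool lba28_ok_sketch(uint64_t block, uint32_t n_block)
{
    assert(n_block > 0);
    uint64_t last = block + n_block - 1; /* last sector touched */
    return last < LBA28_LAST_SECTOR && n_block <= 256;
}
```

      An aligned 4KB read starting at 0x0ffffff8 touches sector 0x0fffffff
      and is therefore sent as LBA48, while the same read starting one
      sector later ends at 0x10000000 and needed LBA48 in any case.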
      Signed-off-by: Mark Lord <mlord@pobox.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  19. 08 April, 2010 2 commits
  20. 07 April, 2010 3 commits
    • memcg: fix race in file_mapped accounting · 8725d541
      Committed by KAMEZAWA Hiroyuki
      Presently, memcg's FILE_MAPPED accounting has the following race with
      move_account (which happens at rmdir()).
      
          increment page->mapcount (rmap.c)
          mem_cgroup_update_file_mapped()           move_account()
      					      lock_page_cgroup()
      					      check page_mapped() if
      					      page_mapped(page)>1 {
      						FILE_MAPPED -1 from old memcg
      						FILE_MAPPED +1 to new memcg
      					      }
      					      .....
      					      overwrite pc->mem_cgroup
      					      unlock_page_cgroup()
          lock_page_cgroup()
          FILE_MAPPED + 1 to pc->mem_cgroup
          unlock_page_cgroup()
      
      Then,
      	old memcg (-1 file mapped)
      	new memcg (+2 file mapped)
      
      This happens because move_account sees page_mapped(), which is not guarded
      by lock_page_cgroup().  This patch adds a FILE_MAPPED flag to page_cgroup
      and moves account information based on it.  Now, all checks are synchronous
      with lock_page_cgroup().
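
      The interleaving above can be replayed deterministically in
      userspace. This is a sketch with hypothetical names: memcg_sketch and
      page_sketch stand in for the memcg and for the page plus its
      page_cgroup, locking is left out because the steps run in the fixed
      order of the diagram, and "mapped" is simplified to mapcount > 0.

```c
#include <assert.h>
#include <stdbool.h>

struct memcg_sketch { long file_mapped; };

struct page_sketch {
    int mapcount;               /* stands in for page->mapcount */
    bool pc_file_mapped;        /* stands in for the new page_cgroup flag */
    struct memcg_sketch *memcg; /* stands in for pc->mem_cgroup */
};

/* Before the patch: move_account trusts the page's mapcount. */
static void move_account_racy(struct page_sketch *pg, struct memcg_sketch *to)
{
    if (pg->mapcount > 0) {
        pg->memcg->file_mapped--;
        to->file_mapped++;
    }
    pg->memcg = to;
}

/* After the patch: it trusts only the flag kept with the accounting. */
static void move_account_flag(struct page_sketch *pg, struct memcg_sketch *to)
{
    if (pg->pc_file_mapped) {
        pg->memcg->file_mapped--;
        to->file_mapped++;
    }
    pg->memcg = to;
}

/* Accounting half of the file_mapped update. */
static void update_file_mapped(struct page_sketch *pg)
{
    pg->memcg->file_mapped++;
    pg->pc_file_mapped = true;
}

/* Replay: mapcount++, then move_account, then the delayed update. */
static void replay_interleaving(bool fixed, long *old_count, long *dst_count)
{
    struct memcg_sketch old = { 0 }, dst = { 0 };
    struct page_sketch pg = { 0, false, &old };

    pg.mapcount++; /* rmap.c bumps the mapcount first */
    if (fixed)
        move_account_flag(&pg, &dst);
    else
        move_account_racy(&pg, &dst);
    update_file_mapped(&pg);

    *old_count = old.file_mapped;
    *dst_count = dst.file_mapped;
}
```

      With the mapcount-based check the replay ends with the old group at
      -1 and the new group at +2; with the flag-based check the counters
      end at 0 and +1.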
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Balbir Singh <balbir@in.ibm.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Andrea Righi <arighi@develer.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pagemap: fix pfn calculation for hugepage · 116354d1
      Committed by Naoya Horiguchi
      When we look into pagemap using page-types with option -p, the value of
      pfn for hugepages looks wrong (see below.) This is because pte was
      evaluated only once for one vma although it should be updated for each
      hugepage.  This patch fixes it.
      
        $ page-types -p 3277 -Nl -b huge
        voffset   offset  len     flags
        7f21e8a00 11e400  1       ___U___________H_G________________
        7f21e8a01 11e401  1ff     ________________TG________________
                     ^^^
        7f21e8c00 11e400  1       ___U___________H_G________________
        7f21e8c01 11e401  1ff     ________________TG________________
                     ^^^
      
      One hugepage contains 1 head page and 511 tail pages on x86_64, and each
      pair of lines represents one hugepage.  Voffset and offset mean the virtual
      address and the physical address in page units, respectively.
      Different hugepages should not have the same offset value.
      
      With this patch applied:
      
        $ page-types -p 3386 -Nl -b huge
        voffset   offset   len    flags
        7fec7a600 112c00   1      ___UD__________H_G________________
        7fec7a601 112c01   1ff    ________________TG________________
                     ^^^
        7fec7a800 113200   1      ___UD__________H_G________________
        7fec7a801 113201   1ff    ________________TG________________
                     ^^^
                     OK
      
      More info:
      
      - This patch modifies walk_page_range()'s hugepage walker.  But the
        change only affects pagemap_read(), which is the only caller of hugepage
        callback.
      
      - Without this patch, hugetlb_entry() callback is called per vma, that
        doesn't match the natural expectation from its name.
      
      - With this patch, hugetlb_entry() is called per hugepte entry and the
        callback can become much simpler.
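
      The effect of evaluating the entry once per vma versus once per
      hugepage can be sketched as follows. The functions are hypothetical
      and only model the arithmetic: head_pfn[] stands in for the per
      hugepage pte lookup, and 512 pages per hugepage matches the 1 head
      plus 0x1ff tail pages shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_HUGEPAGE 512u /* 1 head + 511 tail pages on x86_64 */

/* Old behaviour: the entry is evaluated once and reused for the vma. */
static void fill_pfns_per_vma(const uint64_t *head_pfn, size_t n_huge,
                              uint64_t *out)
{
    assert(head_pfn != NULL && out != NULL);
    uint64_t base = head_pfn[0]; /* never refreshed */
    for (size_t h = 0; h < n_huge; h++)
        for (size_t i = 0; i < PAGES_PER_HUGEPAGE; i++)
            out[h * PAGES_PER_HUGEPAGE + i] = base + i;
}

/* New behaviour: the entry is looked up again for every hugepage. */
static void fill_pfns_per_hugepage(const uint64_t *head_pfn, size_t n_huge,
                                   uint64_t *out)
{
    assert(head_pfn != NULL && out != NULL);
    for (size_t h = 0; h < n_huge; h++) {
        uint64_t base = head_pfn[h];
        for (size_t i = 0; i < PAGES_PER_HUGEPAGE; i++)
            out[h * PAGES_PER_HUGEPAGE + i] = base + i;
    }
}
```

      Fed the head pfns 0x112c00 and 0x113200 from the fixed output above,
      the first variant reports 0x112c00 for both hugepages, which is the
      duplicated offset column seen in the unpatched output.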
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel.h: fix wrong usage of __ratelimit() · bb1dc0ba
      Committed by Yong Zhang
      When __ratelimit() returns 1 this means that we can go ahead.
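
      The convention can be shown with a userspace stand-in. This is a
      sketch only: ratelimit_state_sketch and ratelimit_ok_sketch() are
      hypothetical and merely mimic the return convention of
      __ratelimit(); window expiry is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter: allow at most burst events. */
struct ratelimit_state_sketch {
    int burst; /* events allowed */
    int used;  /* events already let through */
};

/* Returns 1 when the caller may go ahead, 0 when it must hold back. */
static int ratelimit_ok_sketch(struct ratelimit_state_sketch *rs)
{
    assert(rs != NULL);
    if (rs->used < rs->burst) {
        rs->used++;
        return 1;
    }
    return 0;
}

/* Correct usage: act only when the limiter returns nonzero. */
static int log_limited(struct ratelimit_state_sketch *rs, int *emitted)
{
    if (ratelimit_ok_sketch(rs)) {
        (*emitted)++; /* stands in for the actual printk() */
        return 1;
    }
    return 0;
}
```

      Testing for zero instead would invert the behaviour: nothing would be
      logged until the limit was hit, and everything after it.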
      Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>