1. 31 January 2014, 10 commits
    • zram: remove zram->lock in read path and change it with mutex · e46e3315
      Authored by Minchan Kim
      Finally, we have separated the zram->lock dependency from the 32-bit
      stat/table handling, so there is no reason left to use a rw_semaphore
      between the read and write paths.  This patch removes the lock from the
      read path entirely and replaces the rw_semaphore with a mutex, so the
      allowed concurrency changes as follows:
      
      old:
      
        read-read: OK
        read-write: NO
        write-write: NO
      
      Now:
      
        read-read: OK
        read-write: OK
        write-write: NO
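
      A minimal sketch of the resulting dispatch (zram_bvec_read() and
      zram_bvec_write() are the existing zram helpers; this is an
      illustration of the locking change, not the verbatim patch):

        #include <linux/mutex.h>

        /* zram->lock is now a mutex; it used to be an rw_semaphore. */
        static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec,
                                u32 index, int offset, struct bio *bio, int rw)
        {
                int ret;

                if (rw == READ) {
                        /* No zram-wide lock: readers rely on meta->tb_lock. */
                        ret = zram_bvec_read(zram, bvec, index, offset, bio);
                } else {
                        /* Writers still exclude one another. */
                        mutex_lock(&zram->lock);
                        ret = zram_bvec_write(zram, bvec, index, offset);
                        mutex_unlock(&zram->lock);
                }

                return ret;
        }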
      
      The data below shows that the mixed workload performs about 11 times
      better, and the write-write path also improves because the current
      rw_semaphore does not support SPIN_ON_OWNER.  That is a side effect,
      but a welcome one for us.

      Write-related tests improve by 61% to 1058%, while the read path moves
      between -2.22% and 1.45%; the read-side differences are all marginal,
      within the stddev.  (In the tables below, the left column is before
      the change and the right column after.)
      
        12 CPUs
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 10                    records: 10
        avg:  516189.16                avg:  839907.96
        std:   22486.53 (4.36%)        std:   47902.17 (5.70%)
        max:  546970.60                max:  909910.35
        min:  481131.54                min:  751148.38
        ==Rewrite                      ==Rewrite
        records: 10                    records: 10
        avg:  509527.98                avg: 1050156.37
        std:   45799.94 (8.99%)        std:   40695.44 (3.88%)
        max:  611574.27                max: 1111929.26
        min:  443679.95                min:  980409.62
        ==Read                         ==Read
        records: 10                    records: 10
        avg: 4408624.17                avg: 4472546.76
        std:  281152.61 (6.38%)        std:  163662.78 (3.66%)
        max: 4867888.66                max: 4727351.03
        min: 4058347.69                min: 4126520.88
        ==Re-read                      ==Re-read
        records: 10                    records: 10
        avg: 4462147.53                avg: 4363257.75
        std:  283546.11 (6.35%)        std:  247292.63 (5.67%)
        max: 4912894.44                max: 4677241.75
        min: 4131386.50                min: 4035235.84
        ==Reverse Read                 ==Reverse Read
        records: 10                    records: 10
        avg: 4565865.97                avg: 4485818.08
        std:  313395.63 (6.86%)        std:  248470.10 (5.54%)
        max: 5232749.16                max: 4789749.94
        min: 4185809.62                min: 3963081.34
        ==Stride read                  ==Stride read
        records: 10                    records: 10
        avg: 4515981.80                avg: 4418806.01
        std:  211192.32 (4.68%)        std:  212837.97 (4.82%)
        max: 4889287.28                max: 4686967.22
        min: 4210362.00                min: 4083041.84
        ==Random read                  ==Random read
        records: 10                    records: 10
        avg: 4410525.23                avg: 4387093.18
        std:  236693.22 (5.37%)        std:  235285.23 (5.36%)
        max: 4713698.47                max: 4669760.62
        min: 4057163.62                min: 3952002.16
        ==Mixed workload               ==Mixed workload
        records: 10                    records: 10
        avg:  243234.25                avg: 2818677.27
        std:   28505.07 (11.72%)       std:  195569.70 (6.94%)
        max:  288905.23                max: 3126478.11
        min:  212473.16                min: 2484150.69
        ==Random write                 ==Random write
        records: 10                    records: 10
        avg:  555887.07                avg: 1053057.79
        std:   70841.98 (12.74%)       std:   35195.36 (3.34%)
        max:  683188.28                max: 1096125.73
        min:  437299.57                min:  992481.93
        ==Pwrite                       ==Pwrite
        records: 10                    records: 10
        avg:  501745.93                avg:  810363.09
        std:   16373.54 (3.26%)        std:   19245.01 (2.37%)
        max:  518724.52                max:  833359.70
        min:  464208.73                min:  765501.87
        ==Pread                        ==Pread
        records: 10                    records: 10
        avg: 4539894.60                avg: 4457680.58
        std:  197094.66 (4.34%)        std:  188965.60 (4.24%)
        max: 4877170.38                max: 4689905.53
        min: 4226326.03                min: 4095739.72
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove workqueue for freeing removed pending slot · f614a9f4
      Authored by Minchan Kim
      Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
      introduced pending-free-request code to avoid sleeping on a mutex under
      a spinlock, but it was a mess that made the code lengthy and increased
      overhead.
      
      Now that zram->lock is no longer needed to free a slot, this patch
      reverts that code; tb_lock protects the free instead, as sketched below.
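
      A hedged sketch of the notify path after the revert (names follow the
      zram sources, simplified):

        /* With tb_lock in place the slot is freed inline; the pending
         * list and workqueue from a0c516cb are gone. */
        static void zram_slot_free_notify(struct block_device *bdev,
                                          unsigned long index)
        {
                struct zram *zram = bdev->bd_disk->private_data;
                struct zram_meta *meta = zram->meta;

                write_lock(&meta->tb_lock);
                zram_free_page(zram, index);    /* free now, no queue_work() */
                write_unlock(&meta->tb_lock);
                atomic64_inc(&zram->stats.notify_free);
        }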
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: introduce zram->tb_lock · 92967471
      Authored by Minchan Kim
      Currently, the zram table is protected by zram->lock, but that is a
      rather coarse-grained lock and it hurts scalability.
      
      Let's give the table its own rwlock instead of depending on zram->lock.
      This patch adds new locking, so it will obviously slow things down, but
      it is only preparation for removing the coarse-grained rw_semaphore
      (i.e., zram->lock), which is the hurdle for zram scalability.
      
      The final patch in this series will remove the lock from the read path
      and replace the rw_semaphore with a mutex in the write path.  As a
      bonus, the next patch can drop the pending-slot-free mess.  A sketch of
      the new lock follows.
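
      A sketch of the addition, assuming the zram_meta layout from the
      sources (the lookup helper below is hypothetical, shown only to
      illustrate the read side):

        #include <linux/spinlock.h>     /* rwlock_t */

        struct zram_meta {
                rwlock_t tb_lock;       /* protects table[] entries */
                void *mem_pool;
                struct table *table;
        };

        /* Hypothetical helper: snapshot one entry under the read lock. */
        static void zram_table_entry(struct zram_meta *meta, u32 index,
                                     unsigned long *handle, u16 *size)
        {
                read_lock(&meta->tb_lock);
                *handle = meta->table[index].handle;
                *size = meta->table[index].size;
                read_unlock(&meta->tb_lock);
        }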
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: use atomic operation for stat · deb0bdeb
      Authored by Minchan Kim
      Some of the fields in zram->stats are protected by zram->lock, which is
      rather coarse-grained, so let's use atomic operations without explicit
      locking.
      
      This patch prepares for removing the read path's dependency on
      zram->lock, a very coarse-grained rw_semaphore.  Of course, it adds new
      atomic operations, so it could slow things down, but my 12-CPU test
      could not spot any regression: every gain/loss is marginal, within the
      stddev.  A sketch of the conversion follows; in the numbers after it,
      the left column is before the change and the right column after.
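
      A sketch of the conversion (the field names here are illustrative; the
      counters the patch touches live in struct zram_stats):

        #include <linux/atomic.h>

        struct zram_stats {
                atomic64_t compr_size;  /* was a plain counter under zram->lock */
                atomic64_t num_reads;
                atomic64_t num_writes;
        };

        /* Update locklessly from any path. */
        static inline void zram_account_write(struct zram_stats *stats, u64 clen)
        {
                atomic64_add(clen, &stats->compr_size);
                atomic64_inc(&stats->num_writes);
        }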
      
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 50                    records: 50
        avg:  412875.17                avg:  415638.23
        std:   38543.12 (9.34%)        std:   36601.11 (8.81%)
        max:  521262.03                max:  502976.72
        min:  343263.13                min:  351389.12
        ==Rewrite                      ==Rewrite
        records: 50                    records: 50
        avg:  416640.34                avg:  397914.33
        std:   60798.92 (14.59%)       std:   46150.42 (11.60%)
        max:  543057.07                max:  522669.17
        min:  304071.67                min:  316588.77
        ==Read                         ==Read
        records: 50                    records: 50
        avg: 4147338.63                avg: 4070736.51
        std:  179333.25 (4.32%)        std:  223499.89 (5.49%)
        max: 4459295.28                max: 4539514.44
        min: 3753057.53                min: 3444686.31
        ==Re-read                      ==Re-read
        records: 50                    records: 50
        avg: 4096706.71                avg: 4117218.57
        std:  229735.04 (5.61%)        std:  171676.25 (4.17%)
        max: 4430012.09                max: 4459263.94
        min: 2987217.80                min: 3666904.28
        ==Reverse Read                 ==Reverse Read
        records: 50                    records: 50
        avg: 4062763.83                avg: 4078508.32
        std:  186208.46 (4.58%)        std:  172684.34 (4.23%)
        max: 4401358.78                max: 4424757.22
        min: 3381625.00                min: 3679359.94
        ==Stride read                  ==Stride read
        records: 50                    records: 50
        avg: 4094933.49                avg: 4082170.22
        std:  185710.52 (4.54%)        std:  196346.68 (4.81%)
        max: 4478241.25                max: 4460060.97
        min: 3732593.23                min: 3584125.78
        ==Random read                  ==Random read
        records: 50                    records: 50
        avg: 4031070.04                avg: 4074847.49
        std:  192065.51 (4.76%)        std:  206911.33 (5.08%)
        max: 4356931.16                max: 4399442.56
        min: 3481619.62                min: 3548372.44
        ==Mixed workload               ==Mixed workload
        records: 50                    records: 50
        avg:  149925.73                avg:  149675.54
        std:    7701.26 (5.14%)        std:    6902.09 (4.61%)
        max:  191301.56                max:  175162.05
        min:  133566.28                min:  137762.87
        ==Random write                 ==Random write
        records: 50                    records: 50
        avg:  404050.11                avg:  393021.47
        std:   58887.57 (14.57%)       std:   42813.70 (10.89%)
        max:  601798.09                max:  524533.43
        min:  325176.99                min:  313255.34
        ==Pwrite                       ==Pwrite
        records: 50                    records: 50
        avg:  411217.70                avg:  411237.96
        std:   43114.99 (10.48%)       std:   33136.29 (8.06%)
        max:  530766.79                max:  471899.76
        min:  320786.84                min:  317906.94
        ==Pread                        ==Pread
        records: 50                    records: 50
        avg: 4154908.65                avg: 4087121.92
        std:  151272.08 (3.64%)        std:  219505.04 (5.37%)
        max: 4459478.12                max: 4435857.38
        min: 3730512.41                min: 3101101.67
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove unnecessary free · 874e3cdd
      Authored by Minchan Kim
      Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
      introduced a pending zram slot free in zram's write path to cover a
      slot free missed because of a memory allocation failure in
      zram_slot_free_notify, but it is unnecessary: we have already freed the
      slot right before overwriting it, as the sketch below shows.
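
      A sketch of the write path showing the free that makes the extra pass
      redundant (simplified from the zram sources):

        static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
                                   u32 index, int offset)
        {
                /* ... compress the incoming data ... */

                /*
                 * Free the memory associated with this sector now, before
                 * the entry is overwritten.  This is why a separately
                 * queued "pending free" for the same index has nothing
                 * left to do.
                 */
                zram_free_page(zram, index);

                /* ... store the compressed page and update table[index] ... */
                return 0;
        }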
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: delay pending free request in read path · 9b353db1
      Authored by Minchan Kim
      Sergey reported that we do not need to handle pending free requests on
      every I/O, so this patch removes that handling from the read path while
      keeping it in the write path (see the sketch after the example).
      
      Consider the following example.

      The swap subsystem asks zram to free block "A" via
      swap_slot_free_notify, but zram queues the request without actually
      freeing.  The swap subsystem then allocates block "A" for new data;
      when the long-pending request is finally handled, zram blindly frees
      the new data in block "A".  :(

      That is why we cannot remove the pending-free handling right before a
      zram write.
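
      A sketch of the resulting asymmetry (handle_pending_slot_free() follows
      the naming introduced by a0c516cb; simplified):

        static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec,
                                u32 index, int offset, struct bio *bio, int rw)
        {
                int ret;

                if (rw == READ) {
                        down_read(&zram->lock);
                        /* No pending-free scan here any more. */
                        ret = zram_bvec_read(zram, bvec, index, offset, bio);
                        up_read(&zram->lock);
                } else {
                        down_write(&zram->lock);
                        /* Must drain before overwriting, per the example. */
                        handle_pending_slot_free(zram);
                        ret = zram_bvec_write(zram, bvec, index, offset);
                        up_write(&zram->lock);
                }

                return ret;
        }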
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: fix race between reset and flushing pending work · da4a0412
      Authored by Minchan Kim
      Dan and Sergey reported a race between reset and the flushing of
      pending work: reset could free zram->meta while zram_slot_free can
      still access zram->meta if a new request arrives during the race
      window, leading to an oops.

      This patch moves the flush to after init_lock is taken; that blocks
      new requests and so closes the race, as sketched below.
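
      A sketch of the reordering (simplified from zram_reset_device()):

        static void zram_reset_device(struct zram *zram, bool reset_capacity)
        {
                /* Take init_lock first so no new request can come in ... */
                down_write(&zram->init_lock);

                /* ... then drain pending frees; zram->meta is still valid. */
                flush_work(&zram->free_work);

                if (!zram->init_done) {
                        up_write(&zram->init_lock);
                        return;
                }

                /* ... free zram->meta, reset stats and capacity ... */
                up_write(&zram->init_lock);
        }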
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add copyright · 7bfb3de8
      Authored by Minchan Kim
      Add my copyright to the zram source code, which I maintain.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove old private project comment · 49061236
      Authored by Minchan Kim
      Remove the old private compcache project address: upcoming patches
      should be sent to LKML, since the Linux kernel community will take care
      of zram from now on.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: promote zram from staging · cd67e10a
      Authored by Minchan Kim
      Zram has lived in staging for a LONG LONG time and has been
      fixed/improved by many contributors, so the code is clean and stable
      now.  And there are lots of products using zram in practice.

      Major TV companies have used zram as swap for the past two years; our
      production team recently released an Android smartphone that uses zram
      as swap, and Android KitKat has started to use zram on small-memory
      smartphones.  Google reportedly ships Chrome OS with zram, and
      CyanogenMod has used zram for a long time.  I have also heard that some
      distros use the zram block device for tmpfs, and I have seen many
      reports from other people; Lubuntu, for example, has started to use it.
      
      The benefit of zram is very clear.  In my experience, one benefit was
      removing the jitter of a video application under background memory
      pressure.  Part of that comes from the efficient memory usage that
      compression brings, but the bigger issue is whether the system has swap
      at all.  Recent mobile platforms use Java, so there are many anonymous
      pages, yet embedded systems are normally reluctant to use eMMC or SD
      cards as swap because of wear-leveling and latency issues.  If we do
      not use swap, we cannot reclaim anonymous pages, and in the end we can
      run into the OOM killer.  :(
      
      Even having real storage as swap is a problem, because slow swap
      storage sometimes ends up making the system very unresponsive.
      
      Quote from Luigi at Google:
       "Since Chrome OS was mentioned: the main reason why we don't use swap
        to a disk (rotating or SSD) is because it doesn't degrade gracefully
        and leads to a bad interactive experience.  Generally we prefer to
        manage RAM at a higher level, by transparently killing and restarting
        processes.  But we noticed that zram is fast enough to be competitive
        with the latter, and it lets us make more efficient use of the
        available RAM."

      He announced this at
      http://www.spinics.net/lists/linux-mm/msg57717.html
      
      Another use case is zram as a plain block device: since zram is a block
      device, anyone can format and mount it, and some people on the internet
      use zram for /var/tmp:
      http://forums.gentoo.org/viewtopic-t-838198-start-0.html
      
      Let's promote zram and enhance/maintain it instead of removing it.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Nitin Gupta <ngupta@vflare.org>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 January 2014, 4 commits
  3. 24 January 2014, 1 commit
  4. 22 January 2014, 9 commits
  5. 17 January 2014, 1 commit
    • floppy: bail out in open() if drive is not responding to block0 read · 7b7b68bb
      Authored by Jiri Kosina
      In case reading of block 0 during open() fails, it is not the right thing
      to let open() succeed.
      
      Fix this by introducing FD_OPEN_SHOULD_FAIL_BIT flag, and setting it in
      case the bio callback encounters an error while trying to read block 0.
      
      As a bonus, this works around certain broken userspace (blkid), which is
      not able to properly handle read()s returning IO errors. Hence be nice to
      those, and bail out during open() already; if block 0 is not readable,
      read()s are not going to provide any meaningful data anyway.
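
      A sketch of how the flag travels (FD_OPEN_SHOULD_FAIL_BIT is the flag
      the patch adds; the callback and drive-state plumbing below are
      simplified assumptions, not the exact floppy.c code):

        struct rb0_cbdata {
                int drive;
                struct completion complete;
        };

        /* bio completion for the block-0 probe read */
        static void floppy_rb0_cb(struct bio *bio, int err)
        {
                struct rb0_cbdata *cbdata = bio->bi_private;

                if (err) {
                        pr_info("floppy: error %d while reading block 0\n", err);
                        set_bit(FD_OPEN_SHOULD_FAIL_BIT,
                                &drive_state[cbdata->drive].flags);
                }
                complete(&cbdata->complete);
        }

        /* in floppy_open(), after the media probe that reads block 0: */
        if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, &drive_state[drive].flags))
                goto out;       /* refuse the open with an error */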
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  6. 12 January 2014, 1 commit
  7. 04 January 2014, 1 commit
    • xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4). · 51c71a3b
      Authored by Konrad Rzeszutek Wilk
      The user has the option of disabling the platform driver:
      00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
      
      which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
      and allow the PV drivers to take over. If the user wishes
      to disable that they can set:
      
        xen_platform_pci=0
        (in the guest config file)
      
      or
        xen_emul_unplug=never
        (on the Linux command line)
      
      except that this does not work properly: the PV drivers still try to
      load, and since the Xen platform driver has not run and has not
      initialized the grant tables, most of the PV drivers stumble over:
      
      input: Xen Virtual Keyboard as /devices/virtual/input/input5
      input: Xen Virtual Pointer as /devices/virtual/input/input6
      ------------[ cut here ]------------
      kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
      CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
      Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
      RIP: 0010:[<ffffffff813ddc40>]  [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300
      Call Trace:
       [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
       [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
       [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
       [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
       [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
       [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
       [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230
       [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
       [<ffffffff8145e7d9>] driver_attach+0x19/0x20
       [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220
       [<ffffffff8145f1ff>] driver_register+0x5f/0xf0
       [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
       [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
       [<ffffffffa0015000>] ? 0xffffffffa0014fff
       [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
       [<ffffffff81002049>] do_one_initcall+0x49/0x170
      
      .. snip..
      
      which is hardly nice.  This patch fixes it by having each PV driver
      perform the checks below (see the sketch after the list):
       - if running in PV, then it is fine to execute (as that is their
         native environment).
       - if running in HVM, check if user wanted 'xen_emul_unplug=never',
         in which case bail out and don't load any PV drivers.
       - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
         does not exist, then bail out and not load PV drivers.
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
         then bail out for all PV devices _except_ the block one.
         Ditto for the network one ('nics').
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
         then load block PV driver, and also setup the legacy IDE paths.
         In (v3) make it actually load PV drivers.
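
      A hedged sketch of the gating helper those rules boil down to
      (xen_has_pv_devices() follows the patch's naming; the unplug-flag
      handling is simplified, not the exact logic):

        bool xen_has_pv_devices(void)
        {
                if (!xen_domain())
                        return false;           /* not under Xen at all */

                if (xen_pv_domain())
                        return true;            /* native PV environment */

                /* HVM guest from here on. */
                if (xen_platform_pci_unplug & XEN_UNPLUG_NEVER)
                        return false;           /* xen_emul_unplug=never */

                if (!xen_platform_pci_unplug)
                        return false;           /* platform device 5853:0001 absent */

                return true;
        }

        /* A frontend then bails out early, e.g. in xen_kbdfront's init: */
        static int __init xenkbd_init(void)
        {
                if (!xen_has_pv_devices())
                        return -ENODEV;         /* don't touch grant tables */

                return xenbus_register_frontend(&xenkbd_driver);
        }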
      
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Reported-by: Anthony PERARD <anthony.perard@citrix.com>
      Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v2: Add extra logic to handle the myriad ways 'xen_emul_unplug'
      can be used, per Ian's and Stefano's suggestion]
      [v3: Make the unnecessary case work properly]
      [v4: s/disks/ide-disks/ spotted by Fabio]
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
      CC: stable@vger.kernel.org
  8. 03 January 2014, 1 commit
  9. 01 January 2014, 10 commits
  10. 22 December 2013, 2 commits