1. 09 6月, 2016 2 次提交
  2. 08 6月, 2016 9 次提交
  3. 26 5月, 2016 10 次提交
    • I
      libceph: replace ceph_monc_request_next_osdmap() · 7cca78c9
      Ilya Dryomov 提交于
      ... with a wrapper around maybe_request_map() - no need for two
      osdmap-specific functions.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      7cca78c9
    • I
      libceph: async MON client generic requests · d0b19705
      Ilya Dryomov 提交于
      For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
      messages asynchronously and get a callback on completion.  Refactor MON
      client to allow firing off generic requests asynchronously and add an
      async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
      switched over and remains sync.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d0b19705
    • I
      libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Ilya Dryomov 提交于
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggy back on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph, the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      922dab61
    • I
      rbd: rbd_dev_header_unwatch_sync() variant · c525f036
      Ilya Dryomov 提交于
      Introduce __rbd_dev_header_unwatch_sync(), which doesn't flush notify
      callbacks.  This is for the new rados_watcherrcb_t, which would be
      called from a notify callback.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c525f036
    • I
      libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov 提交于
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      85e084fe
    • I
      libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov 提交于
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all call to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      bb873b53
    • I
      rbd: use header_oid instead of header_name · c41d13a3
      Ilya Dryomov 提交于
      Switch to ceph_object_id and use ceph_oid_aprintf() instead of a bare
      const char *.  This reduces noise in rbd_dev_header_name().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c41d13a3
    • I
      libceph: variable-sized ceph_object_id · d30291b9
      Ilya Dryomov 提交于
      Currently ceph_object_id can hold object names of up to 100
      (CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
      expect one - long rbd image names:
      
      - a format 1 header is named "<imgname>.rbd"
      - an object that points to a format 2 header is named "rbd_id.<imgname>"
      
      We operate on these potentially long-named objects during rbd map, and,
      for format 1 images, during header refresh.  (A format 2 header name is
      a small system-generated string.)
      
      Lift this 100 character limit by making ceph_object_id be able to point
      to an externally-allocated string.  Apart from being able to work with
      almost arbitrarily-long named objects, this allows us to reduce the
      size of ceph_object_id from >100 bytes to 64 bytes.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d30291b9
    • I
      libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16
      Ilya Dryomov 提交于
      The size of ->r_request and ->r_reply messages depends on the size of
      the object name (ceph_object_id), while the size of ceph_osd_request is
      fixed.  Move message allocation into a separate function that would
      have to be called after ceph_object_id and ceph_object_locator (which
      is also going to become variable in size with RADOS namespaces) have
      been filled in:
      
          req = ceph_osdc_alloc_request(...);
          <fill in req->r_base_oid>
          <fill in req->r_base_oloc>
          ceph_osdc_alloc_messages(req);
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      13d1ad16
    • I
      rbd: get/put img_request in rbd_img_request_submit() · 663ae2cc
      Ilya Dryomov 提交于
      By the time we get to checking for_each_obj_request_safe(img_request)
      terminating condition, all obj_requests may be complete and img_request
      ref, that rbd_img_request_submit() takes away from its caller, may be
      put.  Moving the next_obj_request cursor is then a use-after-free on
      img_request.
      
      It's totally benign, as the value that's read is never used, but
      I think it's still worth fixing.
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      663ae2cc
  4. 21 5月, 2016 4 次提交
    • S
      zram: introduce per-device debug_stat sysfs node · 623e47fc
      Sergey Senozhatsky 提交于
      debug_stat sysfs is read-only and represents various debugging data that
      zram developers may need.  This file is not meant to be used by anyone
      else: its content is not documented and will change any time w/o any
      notice.  Therefore, the output of debug_stat file contains a version
      string.  To avoid any confusion, we will increase the version number
      every time we modify the output.
      
      At the moment this file exports only one value -- the number of
      re-compressions, IOW, the number of times compression fast path has
      failed.  This stat is temporary any will be useful in case if any
      per-cpu compression streams regressions will be reported.
      
      Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox
      Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      623e47fc
    • S
      zram: remove max_comp_streams internals · 43209ea2
      Sergey Senozhatsky 提交于
      Remove the internal part of max_comp_streams interface, since we
      switched to per-cpu streams.  We will keep RW max_comp_streams attr
      around, because:
      
      a) we may (silently) switch back to idle compression streams list and
         don't want to disturb user space
      
      b) max_comp_streams attr must wait for the next 'lay off cycle'; we
         give user space 2 years to adjust before we remove/downgrade the attr,
         and there are already several attrs scheduled for removal in 4.11, so
         it's too late for max_comp_streams.
      
      This slightly change a user visible behaviour:
      
      - First, reading from max_comp_stream file now will always return the
        number of online CPUs.
      
      - Second, writing to max_comp_stream will not take any effect.
      
      Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43209ea2
    • S
      zram: user per-cpu compression streams · da9556a2
      Sergey Senozhatsky 提交于
      Remove idle streams list and keep compression streams in per-cpu data.
      This removes two contented spin_lock()/spin_unlock() calls from write
      path and also prevent write OP from being preempted while holding the
      compression stream, which can cause slow downs.
      
      For instance, let's assume that we have N cpus and N-2
      max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in
      with the write requests:
      
        TASK1            TASK2              TASK3
       zram_bvec_write()
        spin_lock
        find stream
        spin_unlock
      
        compress
      
        <<preempted>>   zram_bvec_write()
                         spin_lock
                         find stream
                         spin_unlock
                           no_stream
                             schedule
                                           zram_bvec_write()
                                            spin_lock
                                            find_stream
                                            spin_unlock
                                              no_stream
                                                schedule
         spin_lock
         release stream
         spin_unlock
           wake up TASK2
      
      not only TASK2 and TASK3 will not get the stream, TASK1 will be
      preempted in the middle of its operation; while we would prefer it to
      finish compression and release the stream.
      
      Test environment: x86_64, 4 CPU box, 3G zram, lzo
      
      The following fio tests were executed:
            read, randread, write, randwrite, rw, randrw
      with the increasing number of jobs from 1 to 10.
      
                        4 streams        8 streams       per-cpu
        ===========================================================
        jobs1
        READ:           2520.1MB/s       2566.5MB/s      2491.5MB/s
        READ:           2102.7MB/s       2104.2MB/s      2091.3MB/s
        WRITE:          1355.1MB/s       1320.2MB/s      1378.9MB/s
        WRITE:          1103.5MB/s       1097.2MB/s      1122.5MB/s
        READ:           434013KB/s       435153KB/s      439961KB/s
        WRITE:          433969KB/s       435109KB/s      439917KB/s
        READ:           403166KB/s       405139KB/s      403373KB/s
        WRITE:          403223KB/s       405197KB/s      403430KB/s
        jobs2
        READ:           7958.6MB/s       8105.6MB/s      8073.7MB/s
        READ:           6864.9MB/s       6989.8MB/s      7021.8MB/s
        WRITE:          2438.1MB/s       2346.9MB/s      3400.2MB/s
        WRITE:          1994.2MB/s       1990.3MB/s      2941.2MB/s
        READ:           981504KB/s       973906KB/s      1018.8MB/s
        WRITE:          981659KB/s       974060KB/s      1018.1MB/s
        READ:           937021KB/s       938976KB/s      987250KB/s
        WRITE:          934878KB/s       936830KB/s      984993KB/s
        jobs3
        READ:           13280MB/s        13553MB/s       13553MB/s
        READ:           11534MB/s        11785MB/s       11755MB/s
        WRITE:          3456.9MB/s       3469.9MB/s      4810.3MB/s
        WRITE:          3029.6MB/s       3031.6MB/s      4264.8MB/s
        READ:           1363.8MB/s       1362.6MB/s      1448.9MB/s
        WRITE:          1361.9MB/s       1360.7MB/s      1446.9MB/s
        READ:           1309.4MB/s       1310.6MB/s      1397.5MB/s
        WRITE:          1307.4MB/s       1308.5MB/s      1395.3MB/s
        jobs4
        READ:           20244MB/s        20177MB/s       20344MB/s
        READ:           17886MB/s        17913MB/s       17835MB/s
        WRITE:          4071.6MB/s       4046.1MB/s      6370.2MB/s
        WRITE:          3608.9MB/s       3576.3MB/s      5785.4MB/s
        READ:           1824.3MB/s       1821.6MB/s      1997.5MB/s
        WRITE:          1819.8MB/s       1817.4MB/s      1992.5MB/s
        READ:           1765.7MB/s       1768.3MB/s      1937.3MB/s
        WRITE:          1767.5MB/s       1769.1MB/s      1939.2MB/s
        jobs5
        READ:           18663MB/s        18986MB/s       18823MB/s
        READ:           16659MB/s        16605MB/s       16954MB/s
        WRITE:          3912.4MB/s       3888.7MB/s      6126.9MB/s
        WRITE:          3506.4MB/s       3442.5MB/s      5519.3MB/s
        READ:           1798.2MB/s       1746.5MB/s      1935.8MB/s
        WRITE:          1792.7MB/s       1740.7MB/s      1929.1MB/s
        READ:           1727.6MB/s       1658.2MB/s      1917.3MB/s
        WRITE:          1726.5MB/s       1657.2MB/s      1916.6MB/s
        jobs6
        READ:           21017MB/s        20922MB/s       21162MB/s
        READ:           19022MB/s        19140MB/s       18770MB/s
        WRITE:          3968.2MB/s       4037.7MB/s      6620.8MB/s
        WRITE:          3643.5MB/s       3590.2MB/s      6027.5MB/s
        READ:           1871.8MB/s       1880.5MB/s      2049.9MB/s
        WRITE:          1867.8MB/s       1877.2MB/s      2046.2MB/s
        READ:           1755.8MB/s       1710.3MB/s      1964.7MB/s
        WRITE:          1750.5MB/s       1705.9MB/s      1958.8MB/s
        jobs7
        READ:           21103MB/s        20677MB/s       21482MB/s
        READ:           18522MB/s        18379MB/s       19443MB/s
        WRITE:          4022.5MB/s       4067.4MB/s      6755.9MB/s
        WRITE:          3691.7MB/s       3695.5MB/s      5925.6MB/s
        READ:           1841.5MB/s       1933.9MB/s      2090.5MB/s
        WRITE:          1842.7MB/s       1935.3MB/s      2091.9MB/s
        READ:           1832.4MB/s       1856.4MB/s      1971.5MB/s
        WRITE:          1822.3MB/s       1846.2MB/s      1960.6MB/s
        jobs8
        READ:           20463MB/s        20194MB/s       20862MB/s
        READ:           18178MB/s        17978MB/s       18299MB/s
        WRITE:          4085.9MB/s       4060.2MB/s      7023.8MB/s
        WRITE:          3776.3MB/s       3737.9MB/s      6278.2MB/s
        READ:           1957.6MB/s       1944.4MB/s      2109.5MB/s
        WRITE:          1959.2MB/s       1946.2MB/s      2111.4MB/s
        READ:           1900.6MB/s       1885.7MB/s      2082.1MB/s
        WRITE:          1896.2MB/s       1881.4MB/s      2078.3MB/s
        jobs9
        READ:           19692MB/s        19734MB/s       19334MB/s
        READ:           17678MB/s        18249MB/s       17666MB/s
        WRITE:          4004.7MB/s       4064.8MB/s      6990.7MB/s
        WRITE:          3724.7MB/s       3772.1MB/s      6193.6MB/s
        READ:           1953.7MB/s       1967.3MB/s      2105.6MB/s
        WRITE:          1953.4MB/s       1966.7MB/s      2104.1MB/s
        READ:           1860.4MB/s       1897.4MB/s      2068.5MB/s
        WRITE:          1858.9MB/s       1895.9MB/s      2066.8MB/s
        jobs10
        READ:           19730MB/s        19579MB/s       19492MB/s
        READ:           18028MB/s        18018MB/s       18221MB/s
        WRITE:          4027.3MB/s       4090.6MB/s      7020.1MB/s
        WRITE:          3810.5MB/s       3846.8MB/s      6426.8MB/s
        READ:           1956.1MB/s       1994.6MB/s      2145.2MB/s
        WRITE:          1955.9MB/s       1993.5MB/s      2144.8MB/s
        READ:           1852.8MB/s       1911.6MB/s      2075.8MB/s
        WRITE:          1855.7MB/s       1914.6MB/s      2078.1MB/s
      
      perf stat
      
                                        4 streams                       8 streams                       per-cpu
        ====================================================================================================================
        jobs1
        stalled-cycles-frontend      23,174,811,209 (  38.21%)     23,220,254,188 (  38.25%)       23,061,406,918 (  38.34%)
        stalled-cycles-backend       11,514,174,638 (  18.98%)     11,696,722,657 (  19.27%)       11,370,852,810 (  18.90%)
        instructions                 73,925,005,782 (    1.22)     73,903,177,632 (    1.22)       73,507,201,037 (    1.22)
        branches                     14,455,124,835 ( 756.063)     14,455,184,779 ( 755.281)       14,378,599,509 ( 758.546)
        branch-misses                    69,801,336 (   0.48%)         80,225,529 (   0.55%)           72,044,726 (   0.50%)
        jobs2
        stalled-cycles-frontend      49,912,741,782 (  46.11%)     50,101,189,290 (  45.95%)       32,874,195,633 (  35.11%)
        stalled-cycles-backend       27,080,366,230 (  25.02%)     27,949,970,232 (  25.63%)       16,461,222,706 (  17.58%)
        instructions                122,831,629,690 (    1.13)    122,919,846,419 (    1.13)      121,924,786,775 (    1.30)
        branches                     23,725,889,239 ( 692.663)     23,733,547,140 ( 688.062)       23,553,950,311 ( 794.794)
        branch-misses                    90,733,041 (   0.38%)         96,320,895 (   0.41%)           84,561,092 (   0.36%)
        jobs3
        stalled-cycles-frontend      66,437,834,608 (  45.58%)     63,534,923,344 (  43.69%)       42,101,478,505 (  33.19%)
        stalled-cycles-backend       34,940,799,661 (  23.97%)     34,774,043,148 (  23.91%)       21,163,324,388 (  16.68%)
        instructions                171,692,121,862 (    1.18)    171,775,373,044 (    1.18)      170,353,542,261 (    1.34)
        branches                     32,968,962,622 ( 628.723)     32,987,739,894 ( 630.512)       32,729,463,918 ( 717.027)
        branch-misses                   111,522,732 (   0.34%)        110,472,894 (   0.33%)           99,791,291 (   0.30%)
        jobs4
        stalled-cycles-frontend      98,741,701,675 (  49.72%)     94,797,349,965 (  47.59%)       54,535,655,381 (  33.53%)
        stalled-cycles-backend       54,642,609,615 (  27.51%)     55,233,554,408 (  27.73%)       27,882,323,541 (  17.14%)
        instructions                220,884,807,851 (    1.11)    220,930,887,273 (    1.11)      218,926,845,851 (    1.35)
        branches                     42,354,518,180 ( 592.105)     42,362,770,587 ( 590.452)       41,955,552,870 ( 716.154)
        branch-misses                   138,093,449 (   0.33%)        131,295,286 (   0.31%)          121,794,771 (   0.29%)
        jobs5
        stalled-cycles-frontend     116,219,747,212 (  48.14%)    110,310,397,012 (  46.29%)       66,373,082,723 (  33.70%)
        stalled-cycles-backend       66,325,434,776 (  27.48%)     64,157,087,914 (  26.92%)       32,999,097,299 (  16.76%)
        instructions                270,615,008,466 (    1.12)    270,546,409,525 (    1.14)      268,439,910,948 (    1.36)
        branches                     51,834,046,557 ( 599.108)     51,811,867,722 ( 608.883)       51,412,576,077 ( 729.213)
        branch-misses                   158,197,086 (   0.31%)        142,639,805 (   0.28%)          133,425,455 (   0.26%)
        jobs6
        stalled-cycles-frontend     138,009,414,492 (  48.23%)    139,063,571,254 (  48.80%)       75,278,568,278 (  32.80%)
        stalled-cycles-backend       79,211,949,650 (  27.68%)     79,077,241,028 (  27.75%)       37,735,797,899 (  16.44%)
        instructions                319,763,993,731 (    1.12)    319,937,782,834 (    1.12)      316,663,600,784 (    1.38)
        branches                     61,219,433,294 ( 595.056)     61,250,355,540 ( 598.215)       60,523,446,617 ( 733.706)
        branch-misses                   169,257,123 (   0.28%)        154,898,028 (   0.25%)          141,180,587 (   0.23%)
        jobs7
        stalled-cycles-frontend     162,974,812,119 (  49.20%)    159,290,061,987 (  48.43%)       88,046,641,169 (  33.21%)
        stalled-cycles-backend       92,223,151,661 (  27.84%)     91,667,904,406 (  27.87%)       44,068,454,971 (  16.62%)
        instructions                369,516,432,430 (    1.12)    369,361,799,063 (    1.12)      365,290,380,661 (    1.38)
        branches                     70,795,673,950 ( 594.220)     70,743,136,124 ( 597.876)       69,803,996,038 ( 732.822)
        branch-misses                   181,708,327 (   0.26%)        165,767,821 (   0.23%)          150,109,797 (   0.22%)
        jobs8
        stalled-cycles-frontend     185,000,017,027 (  49.30%)    182,334,345,473 (  48.37%)       99,980,147,041 (  33.26%)
        stalled-cycles-backend      105,753,516,186 (  28.18%)    107,937,830,322 (  28.63%)       51,404,177,181 (  17.10%)
        instructions                418,153,161,055 (    1.11)    418,308,565,828 (    1.11)      413,653,475,581 (    1.38)
        branches                     80,035,882,398 ( 592.296)     80,063,204,510 ( 589.843)       79,024,105,589 ( 730.530)
        branch-misses                   199,764,528 (   0.25%)        177,936,926 (   0.22%)          160,525,449 (   0.20%)
        jobs9
        stalled-cycles-frontend     210,941,799,094 (  49.63%)    204,714,679,254 (  48.55%)      114,251,113,756 (  33.96%)
        stalled-cycles-backend      122,640,849,067 (  28.85%)    122,188,553,256 (  28.98%)       58,360,041,127 (  17.35%)
        instructions                468,151,025,415 (    1.10)    467,354,869,323 (    1.11)      462,665,165,216 (    1.38)
        branches                     89,657,067,510 ( 585.628)     89,411,550,407 ( 588.990)       88,360,523,943 ( 730.151)
        branch-misses                   218,292,301 (   0.24%)        191,701,247 (   0.21%)          178,535,678 (   0.20%)
        jobs10
        stalled-cycles-frontend     233,595,958,008 (  49.81%)    227,540,615,689 (  49.11%)      160,341,979,938 (  43.07%)
        stalled-cycles-backend      136,153,676,021 (  29.03%)    133,635,240,742 (  28.84%)       65,909,135,465 (  17.70%)
        instructions                517,001,168,497 (    1.10)    516,210,976,158 (    1.11)      511,374,038,613 (    1.37)
        branches                     98,911,641,329 ( 585.796)     98,700,069,712 ( 591.583)       97,646,761,028 ( 728.712)
        branch-misses                   232,341,823 (   0.23%)        199,256,308 (   0.20%)          183,135,268 (   0.19%)
      
      per-cpu streams tend to cause significantly less stalled cycles; execute
      less branches and hit less branch-misses.
      
      perf stat reported execution time
      
                                4 streams        8 streams       per-cpu
        ====================================================================
        jobs1
        seconds elapsed        20.909073870     20.875670495    20.817838540
        jobs2
        seconds elapsed        18.529488399     18.720566469    16.356103108
        jobs3
        seconds elapsed        18.991159531     18.991340812    16.766216066
        jobs4
        seconds elapsed        19.560643828     19.551323547    16.246621715
        jobs5
        seconds elapsed        24.746498464     25.221646740    20.696112444
        jobs6
        seconds elapsed        28.258181828     28.289765505    22.885688857
        jobs7
        seconds elapsed        32.632490241     31.909125381    26.272753738
        jobs8
        seconds elapsed        35.651403851     36.027596308    29.108024711
        jobs9
        seconds elapsed        40.569362365     40.024227989    32.898204012
        jobs10
        seconds elapsed        44.673112304     43.874898137    35.632952191
      
      Please see
        Link: http://marc.info/?l=linux-kernel&m=146166970727530
        Link: http://marc.info/?l=linux-kernel&m=146174716719650
      for more test results (under low memory conditions).
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da9556a2
    • S
      zsmalloc: require GFP in zs_malloc() · d0d8da2d
      Sergey Senozhatsky 提交于
      Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
      zs_create_pool(), so we can be more flexible, but, more importantly, we
      need this to switch zram to per-cpu compression streams -- zram will try
      to allocate handle with preemption disabled in a fast path and switch to
      a slow path (using different gfp mask) if the fast one has failed.
      
      Apart from that, this also align zs_malloc() interface with zspool/zbud.
      
      [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
        Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
      Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfishSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0d8da2d
  5. 20 5月, 2016 1 次提交
    • J
      mm: rename _count, field of the struct page, to _refcount · 0139aa7b
      Joonsoo Kim 提交于
      Many developers already know that field for reference count of the
      struct page is _count and atomic type.  They would try to handle it
      directly and this could break the purpose of page reference count
      tracepoint.  To prevent direct _count modification, this patch rename it
      to _refcount and add warning message on the code.  After that, developer
      who need to handle reference count will find that field should not be
      accessed directly.
      
      [akpm@linux-foundation.org: fix comments, per Vlastimil]
      [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
      [sfr@canb.auug.org.au: sync ethernet driver changes]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Sunil Goutham <sgoutham@cavium.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Manish Chopra <manish.chopra@qlogic.com>
      Cc: Yuval Mintz <yuval.mintz@qlogic.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0139aa7b
  6. 19 5月, 2016 1 次提交
  7. 11 5月, 2016 1 次提交
  8. 28 4月, 2016 2 次提交
    • I
      rbd: report unsupported features to syslog · d3767f0f
      Ilya Dryomov 提交于
      ... instead of just returning an error.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      d3767f0f
    • I
      rbd: fix rbd map vs notify races · 811c6688
      Ilya Dryomov 提交于
      A while ago, commit 9875201e ("rbd: fix use-after free of
      rbd_dev->disk") fixed rbd unmap vs notify race by introducing
      an exported wrapper for flushing notifies and sticking it into
      do_rbd_remove().
      
      A similar problem exists on the rbd map path, though: the watch is
      registered in rbd_dev_image_probe(), while the disk is set up quite
      a few steps later, in rbd_dev_device_setup().  Nothing prevents
      a notify from coming in and crashing on a NULL rbd_dev->disk:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
          Call Trace:
           [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
           [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
           [<ffffffff8109d5db>] process_one_work+0x17b/0x470
           [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
           [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
           [<ffffffff810a5acf>] kthread+0xcf/0xe0
           [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
           [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
           [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
           [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
          RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
      
      If an error occurs during rbd map, we have to error out, potentially
      tearing down a watch.  Just like on rbd unmap, notifies have to be
      flushed, otherwise rbd_watch_cb() may end up trying to read in the
      image header after rbd_dev_image_release() has run:
      
          Assertion failure in rbd_dev_header_info() at line 4722:
      
           rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
      
          Call Trace:
           [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
           [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
           [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
           [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
           [<ffffffff81107799>] process_one_work+0x689/0x1a80
           [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
           [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
           [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
           [<ffffffff81108c69>] worker_thread+0xd9/0x1320
           [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
           [<ffffffff8111b02d>] kthread+0x21d/0x2e0
           [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
           [<ffffffff82022802>] ret_from_fork+0x22/0x40
           [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
          RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30
      
      To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
      revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
      rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
      header read-in into a critical section.  The latter also happens to
      take care of rbd map foo@bar vs rbd snap rm foo@bar race.
      
      Fixes: http://tracker.ceph.com/issues/15490Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      811c6688
  9. 26 4月, 2016 1 次提交
    • J
      skd: remove broken discard support · 49bdedb3
      Jeff Moyer 提交于
      Simply creating a file system on an skd device, followed by mount and
      fstrim will result in errors in the logs and then a BUG().  Let's remove
      discard support from that driver.  As far as I can tell, it hasn't
      worked right since it was merged.  This patch also has a side-effect of
      cleaning up an unintentional shadowed declaration inside of
      skd_end_request.
      
      I tested to ensure that I can still do I/O to the device using xfstests
      ./check -g quick.  I didn't do anything more extensive than that,
      though.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      49bdedb3
  10. 15 4月, 2016 1 次提交
    • M
      block: loop: fix filesystem corruption in case of aio/dio · a7297a6a
      Ming Lei 提交于
      Starting from commit e36f6204(block: split bios to max possible length),
      block core starts to split bio in the middle of bvec.
      
      Unfortunately loop dio/aio doesn't consider this situation, and
      always treat 'iter.iov_offset' as zero. Then filesystem corruption
      is observed.
      
      This patch figures out the offset of the base bvevc via
      'bio->bi_iter.bi_bvec_done' and fixes the issue by passing the offset
      to iov iterator.
      
      Fixes: e36f6204 (block: split bios to max possible length)
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org (4.5)
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      a7297a6a
  11. 14 4月, 2016 1 次提交
  12. 13 4月, 2016 7 次提交