1. 31 5月, 2015 9 次提交
  2. 25 4月, 2015 2 次提交
    • E
      fix I_DIO_WAKEUP definition · ac74d8d6
      Eric Sandeen 提交于
      I_DIO_WAKEUP is never directly used, but fix it up anyway.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ac74d8d6
    • J
      direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0
      Jens Axboe 提交于
      do_blockdev_direct_IO() increments and decrements the inode
      ->i_dio_count for each IO operation. It does this to protect against
      truncate of a file. Block devices don't need this sort of protection.
      
      For a capable multiqueue setup, this atomic int is the only shared
      state between applications accessing the device for O_DIRECT, and it
      presents a scaling wall for that. In my testing, as much as 30% of
      system time is spent incrementing and decrementing this value. A mixed
      read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
      better latencies too. Before:
      
      clat percentiles (usec):
       |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
       | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
       | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
       | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
       | 99.99th=[  165]
      
      After:
      
      clat percentiles (usec):
       |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
       | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
       | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
       | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
       | 99.99th=[  438]
      
      In other setups, Robert Elliott reported seeing good performance
      improvements:
      
      https://lkml.org/lkml/2015/4/3/557
      
      The more applications accessing the device, the worse it gets.
      
      Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
      do_blockdev_direct_IO() that it need not worry about incrementing
      or decrementing the inode i_dio_count for this caller.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe0f07d0
  3. 24 4月, 2015 4 次提交
  4. 23 4月, 2015 1 次提交
  5. 22 4月, 2015 7 次提交
    • I
      libceph: announce support for straw2 buckets · 7c1c4747
      Ilya Dryomov 提交于
      Sync up feature bits and enable CEPH_FEATURE_CRUSH_V4.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      7c1c4747
    • I
      crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765
      Ilya Dryomov 提交于
      This is an improved straw bucket that correctly avoids any data movement
      between items A and B when neither A nor B's weights are changed.  Said
      differently, if we adjust the weight of item C (including adding it anew
      or removing it completely), we will only see inputs move to or from C,
      never between other items in the bucket.
      
      Notably, there is not intermediate scaling factor that needs to be
      calculated.  The mapping function is a simple function of the item weights.
      
      The below commits were squashed together into this one (mostly to avoid
      adding and then yanking a ~6000 lines worth of crush_ln_table):
      
      - crush: add a straw2 bucket type
      - crush: add crush_ln to calculate nature log efficently
      - crush: improve straw2 adjustment slightly
      - crush: change crush_ln to provide 32 more digits
      - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
      - crush/mapper: fix divide-by-0 in straw2
        (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
         to create a proper compat.h in ceph.git)
      
      Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                                32a1ead92efcd351822d22a5fc37d159c65c1338,
                                6289912418c4a3597a11778bcf29ed5415117ad9,
                                35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                                6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                                b5921d55d16796e12d66ad2c4add7305f9ce2353.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      958a2765
    • Y
      ceph: rename snapshot support · 0ea611a3
      Yan, Zheng 提交于
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      0ea611a3
    • M
      md/raid5: activate raid6 rmw feature · 584acdd4
      Markus Stockhausen 提交于
      Glue it altogehter. The raid6 rmw path should work the same as the
      already existing raid5 logic. So emulate the prexor handling/flags
      and split functions as needed.
      
      1) Enable xor_syndrome() in the async layer.
      
      2) Split ops_run_prexor() into RAID4/5 and RAID6 logic. Xor the syndrome
      at the start of a rmw run as we did it before for the single parity.
      
      3) Take care of rmw run in ops_run_reconstruct6(). Again process only
      the changed pages to get syndrome back into sync.
      
      4) Enhance set_syndrome_sources() to fill NULL pages if we are in a rmw
      run. The lower layers will calculate start & end pages from that and
      call the xor_syndrome() correspondingly.
      
      5) Adapt the several places where we ignored Q handling up to now.
      
      Performance numbers for a single E5630 system with a mix of 10 7200k
      desktop/server disks. 300 seconds random write with 8 threads onto a
      3,2TB (10*400GB) RAID6 64K chunk without spare (group_thread_cnt=4)
      
      bsize   rmw_level=1   rmw_level=0   rmw_level=1   rmw_level=0
              skip_copy=1   skip_copy=1   skip_copy=0   skip_copy=0
         4K      115 KB/s      141 KB/s      165 KB/s      140 KB/s
         8K      225 KB/s      275 KB/s      324 KB/s      274 KB/s
        16K      434 KB/s      536 KB/s      640 KB/s      534 KB/s
        32K      751 KB/s    1,051 KB/s    1,234 KB/s    1,045 KB/s
        64K    1,339 KB/s    1,958 KB/s    2,282 KB/s    1,962 KB/s
       128K    2,673 KB/s    3,862 KB/s    4,113 KB/s    3,898 KB/s
       256K    7,685 KB/s    7,539 KB/s    7,557 KB/s    7,638 KB/s
       512K   19,556 KB/s   19,558 KB/s   19,652 KB/s   19,688 Kb/s
      Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      584acdd4
    • M
      md/raid6 algorithms: delta syndrome functions · fe5cbc6e
      Markus Stockhausen 提交于
      v3: s-o-b comment, explanation of performance and descision for
      the start/stop implementation
      
      Implementing rmw functionality for RAID6 requires optimized syndrome
      calculation. Up to now we can only generate a complete syndrome. The
      target P/Q pages are always overwritten. With this patch we provide
      a framework for inplace P/Q modification. In the first place simply
      fill those functions with NULL values.
      
      xor_syndrome() has two additional parameters: start & stop. These
      will indicate the first and last page that are changing during a
      rmw run. That makes it possible to avoid several unneccessary loops
      and speed up calculation. The caller needs to implement the following
      logic to make the functions work.
      
      1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
      blocks inside P/Q between (and including) start and end.
      
      2) modify any block with start <= block <= stop
      
      3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
      source blocks into P/Q between (and including) start and end.
      
      Pages between start and stop that won't be changed should be filled
      with a pointer to the kernel zero page. The reasons for not taking NULL
      pages are:
      
      1) Algorithms cross the whole source data line by line. Thus avoid
      additional branches.
      
      2) Having a NULL page avoids calculating the XOR P parity but still
      need calulation steps for the Q parity. Depending on the algorithm
      unrolling that might be only a difference of 2 instructions per loop.
      
      The benchmark numbers of the gen_syndrome() functions are displayed in
      the kernel log. Do the same for the xor_syndrome() functions. This
      will help to analyze performance problems and give an rough estimate
      how well the algorithm works. The choice of the fastest algorithm will
      still depend on the gen_syndrome() performance.
      
      With the start/stop page implementation the speed can vary a lot in real
      life. E.g. a change of page 0 & page 15 on a stripe will be harder to
      compute than the case where page 0 & page 1 are XOR candidates. To be not
      to enthusiatic about the expected speeds we will run a worse case test
      that simulates a change on the upper half of the stripe. So we do:
      
      1) calculation of P/Q for the upper pages
      
      2) continuation of Q for the lower (empty) pages
      Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      fe5cbc6e
    • A
      uapi: Remove kernel internal declaration · bff17523
      Andreas Gruenbacher 提交于
      The enum nfs4_acl_whotype is only used in nfs4d's internal nfs4 acl
      representation. No longer expose it to user space.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bff17523
    • M
      nfsd: eliminate NFSD_DEBUG · 135dd002
      Mark Salter 提交于
      Commit f895b252 ("sunrpc: eliminate RPC_DEBUG") introduced
      use of IS_ENABLED() in a uapi header which leads to a build
      failure for userspace apps trying to use <linux/nfsd/debug.h>:
      
         linux/nfsd/debug.h:18:15: error: missing binary operator before token "("
        #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
                      ^
      
      Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG
      is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG.
      
      Cc: stable@vger.kernel.org
      Fixes: f895b252 "sunrpc: eliminate RPC_DEBUG"
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      135dd002
  6. 21 4月, 2015 4 次提交
  7. 20 4月, 2015 6 次提交
    • I
      libceph: simplify our debugfs attr macro · 9571eb4f
      Ilya Dryomov 提交于
      No need to do single_open()'s job ourselves.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      9571eb4f
    • I
      libceph: expose client options through debugfs · 5cf7bd30
      Ilya Dryomov 提交于
      Add a client_options attribute for showing libceph options.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      5cf7bd30
    • I
      libceph, ceph: split ceph_show_options() · ff40f9ae
      Ilya Dryomov 提交于
      Split ceph_show_options() into two pieces and move the piece
      responsible for printing client (libceph) options into net/ceph.  This
      way people adding a libceph option wouldn't have to remember to update
      code in fs/ceph.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      ff40f9ae
    • J
      libceph: osdmap.h: Add missing format newlines · 3ef650d3
      Joe Perches 提交于
      To avoid possible interleaving, add missing '\n' to formats.
      
      Convert pr_warning to pr_warn while there.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      3ef650d3
    • A
      target: Version 2 of TCMU ABI · 0ad46af8
      Andy Grover 提交于
      The initial version of TCMU (in 3.18) does not properly handle
      bidirectional SCSI commands -- those with both an in and out buffer. In
      looking to fix this it also became clear that TCMU's support for adding
      new types of entries (opcodes) to the command ring was broken. We need
      to fix this now, so that future issues can be handled properly by adding
      new opcodes.
      
      We make the most of this ABI break by enabling bidi cmd handling within
      TCMP_OP_CMD opcode. Add an iov_bidi_cnt field to tcmu_cmd_entry.req.
      This enables TCMU to describe bidi commands, but further kernel work is
      needed for full bidi support.
      
      Enlarge tcmu_cmd_entry_hdr by 32 bits by pulling in cmd_id and __pad1. Turn
      __pad1 into two 8 bit flags fields, for kernel-set and userspace-set flags,
      "kflags" and "uflags" respectively.
      
      Update version fields so userspace can tell the interface is changed.
      
      Update tcmu-design.txt with details of how new stuff works:
      - Specify an additional requirement for userspace to set UNKNOWN_OP
        (bit 0) in hdr.uflags for unknown/unhandled opcodes.
      - Define how Data-In and Data-Out fields are described in req.iov[]
      
      Changed in v2:
      - Change name of SKIPPED bit to UNKNOWN bit
      - PAD op does not set the bit any more
      - Change len_op helper functions to take just len_op, not the whole struct
      - Change version to 2 in missed spots, and use defines
      - Add 16 unused bytes to cmd_entry.req, in case additional SAM cmd
        parameters need to be included
      - Add iov_dif_cnt field to specify buffers used for DIF info in iov[]
      - Rearrange fields to naturally align cdb_off
      - Handle if userspace sets UNKNOWN_OP by indicating failure of the cmd
      - Wrap some overly long UPDATE_HEAD lines
      
      (Add missing req.iov_bidi_cnt + req.iov_dif_cnt zeroing - Ilias)
      Signed-off-by: NAndy Grover <agrover@redhat.com>
      Reviewed-by: NIlias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      0ad46af8
    • P
      media-bus: Fixup RGB444_1X12, RGB565_1X16, and YUV8_1X24 media bus format · cec32a47
      Philipp Zabel 提交于
      Change the constant values for RGB444_1X12, RGB565_1X16, and YUV8_1X24 media
      bus formats in anticipation of a merge conflict with the media tree, where
      the old values are already taken by RBG888_1X24, RGB888_1X32_PADHI, and
      VUY8_1X24, respectively.
      Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      cec32a47
  8. 19 4月, 2015 2 次提交
  9. 18 4月, 2015 5 次提交
    • V
      ALSA: asound.h - use SNDRV_CTL_ELEM_ID_NAME_MAXLEN · 43c499dc
      Vinod Koul 提交于
      we have defined SNDRV_CTL_ELEM_ID_NAME_MAXLEN as size of name array so use
      this define instead of numeric value
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      43c499dc
    • D
      netns: remove BUG_ONs from net_generic() · 2591ffd3
      Denys Vlasenko 提交于
      This inline has ~500 callsites.
      
      On 04/14/2015 08:37 PM, David Miller wrote:
      > That BUG_ON() was added 7 years ago, and I don't remember it ever
      > triggering or helping us diagnose something, so just remove it and
      > keep the function inlined.
      
      On x86 allyesconfig build:
      
          text     data      bss       dec     hex filename
      82447071 22255384 20627456 125329911 77861f7 vmlinux4
      82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Jan Engelhardt <jengelh@medozas.de>
      CC: Jiri Pirko <jpirko@redhat.com>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2591ffd3
    • K
      dmaengine: vdma: Fix compilation warnings · e60841b4
      Kedareswara rao Appana 提交于
      This patch fixes the following compilation warnings.
      In file included from drivers/dma/xilinx/xilinx_vdma.c:26:0:
      include/linux/dmapool.h:18:4: warning: 'struct device' declared inside parameter list
          size_t size, size_t align, size_t allocation);
          ^
      include/linux/dmapool.h:18:4: warning: its scope is only this definition or declaration, which is probably not what you want
      include/linux/dmapool.h:31:7: warning: 'struct device' declared inside parameter list
             size_t size, size_t align, size_t allocation);
             ^
      drivers/dma/xilinx/xilinx_vdma.c: In function 'xilinx_vdma_alloc_chan_resources':
      drivers/dma/xilinx/xilinx_vdma.c:501:20: warning: passing argument 2 of 'dma_pool_create' from incompatible pointer type
        chan->desc_pool = dma_pool_create("xilinx_vdma_desc_pool",
                          ^
      In file included from drivers/dma/xilinx/xilinx_vdma.c:26:0:
      include/linux/dmapool.h:17:18: note: expected 'struct device *' but argument is of type 'struct device *'
       struct dma_pool *dma_pool_create(const char *name, struct device *dev, .
      Signed-off-by: NKedareswara rao Appana <appanad@xilinx.com>
      Signed-off-by: NVinod Koul <vinod.koul@intel.com>
      e60841b4
    • J
      net: remove unused 'dev' argument from netif_needs_gso() · 8b86a61d
      Johannes Berg 提交于
      In commit 04ffcb25 ("net: Add ndo_gso_check") Tom originally
      added the 'dev' argument to be able to call ndo_gso_check().
      
      Then later, when generalizing this in commit 5f35227e
      ("net: Generalize ndo_gso_check to ndo_features_check")
      Jesse removed the call to ndo_gso_check() in netif_needs_gso()
      by calling the new ndo_features_check() in a different place.
      This made the 'dev' argument unused.
      
      Remove the unused argument and go back to the code as before.
      
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b86a61d
    • E
      inet_diag: fix access to tcp cc information · 521f1cf1
      Eric Dumazet 提交于
      Two different problems are fixed here :
      
      1) inet_sk_diag_fill() might be called without socket lock held.
         icsk->icsk_ca_ops can change under us and module be unloaded.
         -> Access to freed memory.
         Fix this using rcu_read_lock() to prevent module unload.
      
      2) Some TCP Congestion Control modules provide information
         but again this is not safe against icsk->icsk_ca_ops
         change and nla_put() errors were ignored. Some sockets
         could not get the additional info if skb was almost full.
      
      Fix this by returning a status from get_info() handlers and
      using rcu protection as well.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      521f1cf1