1. 06 May, 2015 1 commit
    • C
      splice: sendfile() at once fails for big files · 0ff28d9f
      Christophe Leroy committed
      Using sendfile() with the small program below to get MD5 sums of some
      files, it appears that big files (over 64 kbytes on a system with 4k
      pages) get a wrong MD5 sum while small files get the correct sum.
      This program uses sendfile() to send a file to an AF_ALG socket
      for hashing.
      
      /* md5sum2.c */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <string.h>
      #include <fcntl.h>
      #include <sys/socket.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <sys/sendfile.h>
      #include <linux/if_alg.h>
      
      int main(int argc, char **argv)
      {
      	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
      	struct stat st;
      	struct sockaddr_alg sa = {
      		.salg_family = AF_ALG,
      		.salg_type = "hash",
      		.salg_name = "md5",
      	};
      	int n;
      
      	bind(sk, (struct sockaddr*)&sa, sizeof(sa));
      
      	for (n = 1; n < argc; n++) {
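      		ssize_t size;
      		off_t offset = 0;		/* sendfile() takes an off_t * */
      		unsigned char buf[4096];	/* unsigned, so "%2.2x" prints two digits per byte */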
      		int fd;
      		int sko;
      		int i;
      
      		fd = open(argv[n], O_RDONLY);
      		sko = accept(sk, NULL, 0);
      		fstat(fd, &st);
      		size = st.st_size;
      		sendfile(sko, fd, &offset, size);
      		size = read(sko, buf, sizeof(buf));
      		for (i = 0; i < size; i++)
      			printf("%2.2x", buf[i]);
      		printf("  %s\n", argv[n]);
      		close(fd);
      		close(sko);
      	}
      	exit(0);
      }
      
      The test below uses official Linux patch files. The first result is
      from a software-based md5sum. The second is from the program above.
      
      root@vgoip:~# ls -l patch-3.6.*
      -rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
      -rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz
      
      After investigation, it appears that sendfile() sends the file in blocks
      of 64 kbytes (16 times PAGE_SIZE). The problem is that the SPLICE_F_MORE
      flag is missing at the end of each block, so the hashing operation is
      reset as if the end of the file had been reached.
      
      This patch adds SPLICE_F_MORE to the flags when more data is pending.
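
The flag decision described above can be modelled in user space. This is a minimal sketch, not the kernel patch itself: `CHUNK_SIZE` and `chunk_flags()` are hypothetical names introduced for illustration, and only `SPLICE_F_MORE` is a real identifier (a fallback definition is given in case the headers do not expose it).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>	/* SPLICE_F_MORE on Linux with _GNU_SOURCE */
#include <stddef.h>

#ifndef SPLICE_F_MORE
#define SPLICE_F_MORE 4	/* fallback for headers that do not expose the flag */
#endif

/* Hypothetical chunk size: 16 pages of 4 KiB, as in the commit message. */
#define CHUNK_SIZE ((size_t)16 * 4096)

/*
 * Return the flags for the chunk that starts at `offset` in a transfer of
 * `total` bytes: SPLICE_F_MORE is added only when more data follows.
 */
static unsigned int chunk_flags(size_t offset, size_t total,
				unsigned int base_flags)
{
	size_t len;

	assert(offset < total);
	len = total - offset;
	if (len > CHUNK_SIZE)
		len = CHUNK_SIZE;

	if (offset + len < total)
		return base_flags | SPLICE_F_MORE;	/* more data is pending */
	return base_flags;				/* last chunk: let the hash finish */
}
```

Under this model, the 64011-byte file fits in a single chunk and never needs the flag, while the 94131-byte file needs it on its first chunk, which is consistent with only the larger file getting a wrong sum.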
      
      With the patch applied, we get the correct sums:
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 05 May, 2015 2 commits
    • S
      blk-mq: don't lose requests if a stopped queue restarts · 9ba52e58
      Shaohua Li committed
      Normally, if the driver is too busy to dispatch a request, the logic is as follows:
      block layer:					driver:
      	__blk_mq_run_hw_queue
      a.						blk_mq_stop_hw_queue
      b.	rq add to ctx->dispatch
      
      later:
      1.						blk_mq_start_hw_queue
      2.	__blk_mq_run_hw_queue
      
      However, steps 1 and 2 may run between a and b. Since rq isn't in
      ctx->dispatch yet, step 2 will not run rq, and the rq might be lost if
      no subsequent requests kick in.
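
The interleaving above can be replayed in a single-threaded toy model. This is only a sketch of the ordering problem, not kernel code and not the fix: every `toy_` name is hypothetical, and the dispatch list is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one hardware queue; all names here are hypothetical. */
struct toy_hwq {
	bool stopped;	/* set by the driver when it is busy */
	int dispatch;	/* requests parked on the dispatch list */
	int issued;	/* requests handed to the driver */
};

/* Models a queue run: drain the dispatch list unless the queue is stopped. */
static void toy_run_queue(struct toy_hwq *q)
{
	assert(q);
	if (q->stopped)
		return;
	q->issued += q->dispatch;
	q->dispatch = 0;
}

/* Normal ordering a, b, 1, 2: returns the number of stranded requests. */
static int toy_replay_normal(void)
{
	struct toy_hwq q = { false, 0, 0 };

	q.stopped = true;	/* a. driver stops the queue */
	q.dispatch++;		/* b. rq is added to the dispatch list */
	q.stopped = false;	/* 1. driver restarts the queue */
	toy_run_queue(&q);	/* 2. the queue runs and picks up rq */
	return q.dispatch;
}

/* Racy ordering a, 1, 2, b: the queue run happens before rq is parked. */
static int toy_replay_race(void)
{
	struct toy_hwq q = { false, 0, 0 };

	q.stopped = true;	/* a. driver stops the queue */
	q.stopped = false;	/* 1. driver restarts the queue */
	toy_run_queue(&q);	/* 2. the queue runs and finds nothing */
	q.dispatch++;		/* b. only now is rq added to the dispatch list */
	return q.dispatch;
}
```

In this model `toy_replay_normal()` leaves nothing behind, while `toy_replay_race()` leaves one request on the list with no further queue run scheduled to pick it up.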
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • S
      blk-mq: fix FUA request hang · b2387ddc
      Shaohua Li committed
      When a FUA request enters the DATA stage of the flush pipeline, the
      request is added to the mq requeue list and then to ctx->rq_list.
      blk_mq_attempt_merge() might merge the request with a bio. Later, when
      the request finishes the flush pipeline, request->__data_len is 0. I
      then only saw endio called for the bio, and the original request never
      finishes.
      
      Adding REQ_FLUSH_SEQ into REQ_NOMERGE_FLAGS looks like an easy fix.
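
The effect of widening a no-merge mask can be sketched as follows. The bit positions, the `TOY_` names, and the assumed contents of the old mask are hypothetical and chosen for illustration. The real definitions live in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real values live in the kernel headers. */
#define TOY_REQ_NOMERGE		(1u << 0)
#define TOY_REQ_STARTED		(1u << 1)
#define TOY_REQ_FLUSH_SEQ	(1u << 2)

/* Assumed mask before the fix, and the same mask with FLUSH_SEQ added. */
#define TOY_NOMERGE_FLAGS_OLD	(TOY_REQ_NOMERGE | TOY_REQ_STARTED)
#define TOY_NOMERGE_FLAGS_NEW	(TOY_NOMERGE_FLAGS_OLD | TOY_REQ_FLUSH_SEQ)

/* A request is a merge candidate only if none of the masked flags is set. */
static bool toy_rq_mergeable(unsigned int rq_flags, unsigned int nomerge_mask)
{
	return (rq_flags & nomerge_mask) == 0;
}
```

With the old mask, a request that is in the flush sequence still passes the check. With the new mask it is rejected, so it can no longer be merged with a bio.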
      
      stable: 3.15+
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  3. 28 April, 2015 1 commit
    • N
      block: destroy bdi before blockdev is unregistered. · 6cd18e71
      NeilBrown committed
      Because of the peculiar way that md devices are created (automatically
      when the device node is opened), a new device can be created and
      registered immediately after the
      	blk_unregister_region(disk_devt(disk), disk->minors);
      call in del_gendisk().
      
      Therefore it is important that all visible artifacts of the previous
      device are removed before this call.  In particular, the 'bdi'.
      
      Since:
      commit c4db59d3
      Author: Christoph Hellwig <hch@lst.de>
          fs: don't reassign dirty inodes to default_backing_dev_info
      
      moved the
         device_unregister(bdi->dev);
      call from bdi_unregister() to bdi_destroy() it has been quite easy to
      lose a race and have a new (e.g.) "md127" be created after the
      blk_unregister_region() call and before bdi_destroy() is ultimately
      called by the final 'put_disk', which must come after del_gendisk().
      
      The new device finds that the bdi name is already registered in sysfs
      and complains
      
      > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
      > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
      
      We can fix this by moving the bdi_destroy() call out of
      blk_release_queue() (which can happen very late when a refcount
      reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
      device driver calls it.
      
      Then it is only necessary for md to call blk_cleanup_queue() before
      del_gendisk().  As loop.c devices are also created on demand by
      opening the device node, we make the same change there.
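
The ordering requirement can be illustrated with a toy name registry standing in for sysfs. This is a sketch under stated assumptions, not the md or block-layer code: all `toy_` names are hypothetical, and the only detail taken from the report is the bdi name "9:127".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NAMES 8

/* Toy stand-in for the set of names registered in sysfs. */
struct toy_registry {
	const char *names[TOY_MAX_NAMES];
	int count;
};

/* Register a name; fails on a duplicate, like the warning in the report. */
static bool toy_register(struct toy_registry *r, const char *name)
{
	int i;

	assert(r && name);
	for (i = 0; i < r->count; i++)
		if (strcmp(r->names[i], name) == 0)
			return false;
	if (r->count == TOY_MAX_NAMES)
		return false;
	r->names[r->count++] = name;
	return true;
}

/* Remove a name if present by moving the last entry into its slot. */
static void toy_unregister(struct toy_registry *r, const char *name)
{
	int i;

	assert(r && name);
	for (i = 0; i < r->count; i++) {
		if (strcmp(r->names[i], name) == 0) {
			r->names[i] = r->names[--r->count];
			return;
		}
	}
}

/* Returns true if a re-created device can register its bdi name. */
static bool toy_recreate_succeeds(bool destroy_bdi_first)
{
	struct toy_registry sysfs = { { 0 }, 0 };

	toy_register(&sysfs, "9:127");		/* bdi of the old device */
	if (destroy_bdi_first)
		toy_unregister(&sysfs, "9:127");	/* bdi removed before the region is released */
	/* The device number is released here; a new device may appear at once. */
	return toy_register(&sysfs, "9:127");
}
```

In this model the new device registers cleanly only when the old bdi name has been removed before the device number becomes available again.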
      
      Fixes: c4db59d3
      Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org (v4.0)
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  4. 27 April, 2015 1 commit
    • W
      block:bounce: fix call inc_|dec_zone_page_state on different pages confuse value of NR_BOUNCE · 393a3397
      Wang YanQing committed
      Commit d2c5e30c
      ("[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter")
      converted the nr_bounce statistic to a per-zone counter plus one global
      value in vm_stat, but it calls inc_|dec_zone_page_state on different pages,
      and therefore on different zones, which causes unexpected values of NR_BOUNCE.
      
      Below is the result on my machine:
      Mar  2 09:26:08 udknight kernel: [144766.778265] Mem-Info:
      Mar  2 09:26:08 udknight kernel: [144766.778266] DMA per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778268] CPU    0: hi:    0, btch:   1 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778269] CPU    1: hi:    0, btch:   1 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778270] Normal per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778271] CPU    0: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778273] CPU    1: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778274] HighMem per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778275] CPU    0: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778276] CPU    1: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778279] active_anon:46926 inactive_anon:287406 isolated_anon:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  active_file:105085 inactive_file:139432 isolated_file:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  unevictable:653 dirty:0 writeback:0 unstable:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  free:178957 slab_reclaimable:6419 slab_unreclaimable:9966
      Mar  2 09:26:08 udknight kernel: [144766.778279]  mapped:4426 shmem:305277 pagetables:784 bounce:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  free_cma:0
      Mar  2 09:26:08 udknight kernel: [144766.778286] DMA free:3324kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Mar  2 09:26:08 udknight kernel: [144766.778287] lowmem_reserve[]: 0 822 3754 3754
      Mar  2 09:26:08 udknight kernel: [144766.778293] Normal free:26828kB min:3632kB low:4540kB high:5448kB active_anon:4872kB inactive_anon:68kB active_file:1796kB inactive_file:1796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:842560kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4144kB slab_reclaimable:25676kB slab_unreclaimable:39864kB kernel_stack:1944kB pagetables:3136kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2412612 all_unreclaimable? yes
      Mar  2 09:26:08 udknight kernel: [144766.778294] lowmem_reserve[]: 0 0 23451 23451
      Mar  2 09:26:08 udknight kernel: [144766.778299] HighMem free:685676kB min:512kB low:3748kB high:6984kB active_anon:182832kB inactive_anon:1149556kB active_file:418544kB inactive_file:555932kB unevictable:2612kB isolated(anon):0kB isolated(file):0kB present:3001732kB managed:3001732kB mlocked:0kB dirty:0kB writeback:0kB mapped:17704kB shmem:1216964kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:75771152kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Mar  2 09:26:08 udknight kernel: [144766.778300] lowmem_reserve[]: 0 0 0 0
      
      You can see bounce:75771152kB for HighMem, but bounce:0 for lowmem and global.
      
      This patch fixes it.
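
The drift can be reproduced with a toy pair of counters. This is a sketch, not the kernel's vmstat code: all `toy_` names are hypothetical, and which zone receives the increment and which the decrement is an assumption made only to show the effect.

```c
#include <assert.h>

enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM, TOY_NR_ZONES };

/* One per-zone counter array plus one global sum, as described above. */
struct toy_counters {
	long per_zone[TOY_NR_ZONES];
	long global;
};

static void toy_inc(struct toy_counters *c, enum toy_zone z)
{
	assert(c);
	c->per_zone[z]++;
	c->global++;
}

static void toy_dec(struct toy_counters *c, enum toy_zone z)
{
	assert(c);
	c->per_zone[z]--;
	c->global--;
}

/*
 * Account `n` completed bounce I/Os where the increment is charged to the
 * zone of one page and the decrement to the zone of a different page.
 */
static void toy_bounce_io(struct toy_counters *c, enum toy_zone inc_zone,
			  enum toy_zone dec_zone, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		toy_inc(c, inc_zone);
		toy_dec(c, dec_zone);
	}
}
```

When both calls use the same zone, every counter returns to zero. When they use different zones, the global sum still returns to zero but the two per-zone counters drift apart without bound, which matches the shape of the log above: a large HighMem value next to a global value of 0.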
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 24 April, 2015 5 commits
  6. 18 April, 2015 2 commits
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · e076b7c1
      Linus Torvalds committed
      Pull block core fix from Jens Axboe:
       "A commit in the previous pull request introduced a regression.  So far
        only observed on qemu-sparc64, but it's a general bug.  Please pull
        this single fix to rectify that, thanks"
      
      [ And it turns out that it's been seen outside of that qemu-sparc64
        case, and is easy to trigger with small number of CPUs and blk-mq
        enabled by default - Linus ]
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: fix iteration of busy bitmap
    • L
      Merge tag 'acpica-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0f5abd40
      Linus Torvalds committed
      Pull ACPICA updates from Rafael Wysocki:
       "This updates the kernel's ACPICA code to upstream revision 20150410
        and adds a fix for a GPE handling regression introduced during the
        3.19 cycle on top of that.
      
        Included are two stable-candidate bug fixes (one of them fixing a 3.16
        regression), multiple other fixes and a bunch of cleanups.
      
        Specifics:
      
         - Fix for a GPE handling regression on Dell Latitude D600 that caused
           GPE signaling to stop working on that machine, which appears to be
           due to a hardware glitch, but it used to work and it can be made
           to work again in a relatively straightforward way (Rafael J Wysocki).
      
         - Fix for a mutex unlock regression related to the handling of ACPI
           tables introduced during the 3.16 development cycle (Octavian
           Purdila).
      
         - _REV modification to always return 2 which has been done by all
           versions of Windows since NT and the firmware people started to use
           it to distinguish between OSes in their AML and do some silly and
           wrong things on that basis (Bob Moore).
      
         - Fixes and cleanups related to the acpi_physical_address data type
           including one stable-candidate fix for an issue occasionally
           occurring on 64-bit machines running 32-bit kernels where using
           offsets provided by the firmware may lead to address overflows (Lv
           Zheng).
      
         - External() opcode support infrastructure needed for recompiling
           disassembled ACPI tables in some cases including interpreter
           modification to ignore that opcode (Bob Moore).
      
         - Support for the "Windows 2015" string in _OSI (Bob Moore).
      
         - GPE debug interface change to return values read from hardware
           registers (Lv Zheng).
      
         - Removal of the __DATE__ macro usage in tools (Rasmus Villemoes).
      
         - Assorted minor fixes and cleanups (Lv Zheng, Rickard Strandqvist,
           Bob Moore)"
      
      * tag 'acpica-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
        ACPICA: Store GPE register enable masks upfront
        ACPICA: Update version to 20150410.
        ACPICA: Fix a couple issues with the local printf module.
        ACPICA: Disassembler: Some cleanup of the table dump module.
        ACPICA: iASL: Add support for MSDM ACPI table.
        ACPICA: Update for SLIC ACPI table.
        ACPICA: Add "//" before ascii output of buffers.
        ACPICA: Remove unused internal AML opcode.
        ACPICA: Permanently set _REV to the value '2'.
        ACPICA: Add "Windows 2015" string to _OSI support.
        ACPICA: Add infrastructure for External() opcode.
        ACPICA: iASL: Enhancement for constant folding.
        ACPICA: iASL/Disassembler: Add option to assume table contains valid AML.
        ACPICA: Update AML Debugger global variables.
        ACPICA: Update Resource descriptor dump module.
        ACPICA: Fix a sscanf format string.
        ACPICA: Casting changes around acpi_physical_address/acpi_size.
        ACPICA: Resources: Correct conditional compilation definitions.
        ACPICA: Utilities: Correct conditional compilation definitions.
        ACPICA: Tables: Move an iasl specific table function to iasl source file.
        ...
  7. 17 April, 2015 28 commits