1. 08 Nov, 2013 1 commit
    • loop: fix crash if blk_alloc_queue fails · 3ec981e3
      Committed by Mikulas Patocka
      If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
      identifier allocated with idr_alloc. That causes a crash on module unload
      in idr_for_each(&loop_index_idr, &loop_exit_cb, NULL), where we attempt to
      remove a nonexistent device with that id.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
      IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
      PGD 43d399067 PUD 43d0ad067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor button unix
      CPU: 7 PID: 2735 Comm: rmmod Tainted: G        W    3.10.15-devel #15
      Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
      task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
      RIP: 0010:[<ffffffff812057c9>]  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
      RSP: 0018:ffff88043d21fe10  EFLAGS: 00010282
      RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
      RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
      R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
      FS:  00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
       00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
       ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
      Call Trace:
       [<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
       [<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
       [<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
       [<ffffffff81217b74>] idr_for_each+0x104/0x190
       [<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
       [<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
       [<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
       [<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
       [<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
      Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00
      00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
      RIP  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
       RSP <ffff88043d21fe10>
      CR2: 0000000000000380
      ---[ end trace 64ec069ec70f1309 ]---
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org	# 3.1+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
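The error-path problem described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver's code: the fixed-size table stands in for the IDR, and every `sketch_` name is hypothetical. It shows why an id reserved early must be released again when a later allocation step fails, so that a teardown loop never visits a half-built device.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 8

/* Toy stand-in for the driver's IDR: slot i holds the device with id i. */
static void *sketch_ids[SKETCH_MAX_IDS];

struct sketch_dev {
    int id;
    int has_queue;
};

/* Reserve the lowest free id for dev; returns the id, or -1 if full. */
static int sketch_id_alloc(void *dev)
{
    for (int i = 0; i < SKETCH_MAX_IDS; i++) {
        if (sketch_ids[i] == NULL) {
            sketch_ids[i] = dev;
            return i;
        }
    }
    return -1;
}

/* Release a previously reserved id. */
static void sketch_id_remove(int id)
{
    assert(id >= 0 && id < SKETCH_MAX_IDS);
    sketch_ids[id] = NULL;
}

/* Count the entries a teardown loop would visit. */
static int sketch_id_count(void)
{
    int n = 0;
    for (int i = 0; i < SKETCH_MAX_IDS; i++)
        if (sketch_ids[i] != NULL)
            n++;
    return n;
}

/*
 * Hypothetical add routine. queue_ok simulates whether the queue
 * allocation (blk_alloc_queue in the commit) succeeds.
 */
static int sketch_dev_add(struct sketch_dev *dev, int queue_ok)
{
    int id = sketch_id_alloc(dev);
    if (id < 0)
        return -1;
    dev->id = id;

    if (!queue_ok)
        goto out_free_id;
    dev->has_queue = 1;
    return id;

out_free_id:
    /* Without this step the table keeps pointing at a half-built device. */
    sketch_id_remove(id);
    return -1;
}
```

Calling `sketch_dev_add(&dev, 0)` leaves the table empty, which is the property the fix restores for the real IDR.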
  2. 29 Jun, 2013 1 commit
  3. 07 May, 2013 1 commit
  4. 10 Apr, 2013 1 commit
  5. 08 Apr, 2013 1 commit
    • Revert "loop: cleanup partitions when detaching loop device" · c2fccc1c
      Committed by Jens Axboe
      This reverts commit 8761a3dc.
      
      There are situations where the destruction path is called
      with the bdev->bd_mutex already held, which then deadlocks in
      loop_clr_fd(). The normal partition cleanup does a trylock()
      on the mutex, but it'd be nice to have a more bullet proof
      method in loop. So punt this more involved fix to the next
      merge window, and just back out this buggy fix for now.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 02 Apr, 2013 1 commit
    • loop: prevent bdev freeing while device in use · c1681bf8
      Committed by Anatol Pomozov
      The lifecycle of struct block_device is defined by its inode (see
      fs/block_dev.c): the block_device is allocated the first time we access
      /dev/loopXX and deallocated in bdev_destroy_inode. When we create the
      device with "losetup /dev/loopXX afile", we want that block_device to stay
      alive until we destroy the loop device with "losetup -d".
      
      But because we do not hold the /dev/loopXX inode, its reference count drops
      to 0 and the inode/bdev can be destroyed at any moment. Usually this happens
      under memory pressure or when the user drops the inode cache (as in the test
      below). When we later use the bdev in loop_clr_fd(), we hit a use-after-free
      with the following stack:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
        bd_set_size+0x10/0xa0
        loop_clr_fd+0x1f8/0x420 [loop]
        lo_ioctl+0x200/0x7e0 [loop]
        lo_compat_ioctl+0x47/0xe0 [loop]
        compat_blkdev_ioctl+0x341/0x1290
        do_filp_open+0x42/0xa0
        compat_sys_ioctl+0xc1/0xf20
        do_sys_open+0x16e/0x1d0
        sysenter_dispatch+0x7/0x1a
      
      To prevent use-after-free we need to grab the device in loop_set_fd()
      and put it later in loop_clr_fd().
      
      The issue is reproducible on current Linus head and v3.3. Here is the test:
      
        dd if=/dev/zero of=loop.file bs=1M count=1
        while [ true ]; do
          losetup /dev/loop0 loop.file
          echo 2 > /proc/sys/vm/drop_caches
          losetup -d /dev/loop0
        done
      
      [ Doing bdgrab/bdput in loop_set_fd/loop_clr_fd is safe, because every
        time we call loop_set_fd() we check that loop_device->lo_state is
        Lo_unbound and set it to Lo_bound. If somebody tries to set_fd again,
        they will get EBUSY. And if we try to loop_clr_fd() on an unbound loop
        device, we'll get ENXIO.
      
        loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
        loop_device->lo_ctl_mutex. ]
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 23 Mar, 2013 1 commit
    • loop: cleanup partitions when detaching loop device · 8761a3dc
      Committed by Phillip Susi
      Any partitions added by user space to the loop device were being
      left in place after detaching the loop device.  This was because
      the detach path issued a BLKRRPART to clean up partitions if
      LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
      scanned on attach.  Replace this BLKRRPART with code that
      unconditionally cleans up partitions on detach instead.
      Signed-off-by: Phillip Susi <psusi@ubuntu.com>
      
      Modified by Jens to export delete_partition().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 22 Mar, 2013 1 commit
  9. 28 Feb, 2013 2 commits
  10. 26 Feb, 2013 1 commit
  11. 22 Feb, 2013 5 commits
    • loopdev: ignore negative offset when calculate loop device size · b7a1da69
      Committed by Guo Chao
      A negative offset may make the loop device size larger than the backing
      file size.
      
       $ fallocate -l 1M a
       $ losetup --offset 0xffffffffffff0000 /dev/loop0 a
       $ blockdev --getsize64 /dev/loop0
       1114112
       $ ls -l a
       -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a
       $ cat /dev/loop0
       cat: /dev/loop0: Input/output error
      
      It makes no sense to do that. Only apply offset when it's positive.
      
      Fix a typo in the comment by the way.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
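The arithmetic behind this commit can be checked with a small user-space sketch. It is an illustration under assumptions, not the driver's function: the `sketch_` names are hypothetical, byte counts are modelled as signed 64-bit values, and the clamp to zero for an offset beyond the end of the file is an extra safeguard added here. The offset 0xffffffffffff0000 from the example above reads as -65536 when treated as signed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Device size in 512-byte sectors. A negative offset is ignored,
 * which is the behaviour the commit message asks for.
 */
static int64_t sketch_loop_sectors(int64_t file_bytes, int64_t offset,
                                   int64_t sizelimit)
{
    assert(file_bytes >= 0);
    int64_t size = file_bytes;

    if (offset > 0)            /* only apply a positive offset */
        size -= offset;
    if (size < 0)              /* offset beyond end of file (assumed clamp) */
        return 0;
    if (sizelimit > 0 && sizelimit < size)
        size = sizelimit;
    return size >> 9;          /* bytes to 512-byte sectors */
}

/* The unguarded arithmetic, for comparison: the offset is always subtracted. */
static int64_t sketch_loop_sectors_unchecked(int64_t file_bytes, int64_t offset)
{
    return (file_bytes - offset) >> 9;
}
```

With a 1048576-byte file and an offset of -65536, the unguarded version yields 2176 sectors, that is 1114112 bytes, matching the `blockdev --getsize64` output quoted in the commit message, while the guarded version yields 2048 sectors.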
    • loopdev: remove an user triggerable oops · b1a66504
      Committed by Guo Chao
      When loopdev is built as a module and we pass an invalid parameter,
      loop_init() returns directly without deregistering the misc device, which
      causes an oops the next time the loop module is inserted, because we left
      some garbage in the misc device list.
      
      Test case:
      sudo modprobe loop max_part=1024
      (failed due to invalid parameter)
      sudo modprobe loop
      (oops)
      
      Clean up nicely to avoid such oops.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loopdev: move common code into loop_figure_size() · 7b0576a3
      Committed by Guo Chao
      Update block device size in accord with gendisk size and let userspace
      know the change in loop_figure_size(). This is a clean up to remove
      common code of loop_figure_size()'s two callers.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loopdev: update block device size in loop_set_status() · 541c742a
      Committed by Guo Chao
      The loop device driver sometimes fails to impose the size limit on the
      device. Keep issuing the following two commands:
      
      losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file
      blockdev --getsize64 /dev/loop0
      
      blockdev reports the file size instead of sizelimit several times out of 100.
      
      The problems are:
      
      	- losetup sets up the device in two ioctls:
      		  LOOP_SET_FD and LOOP_SET_STATUS64.
      
      	- LOOP_SET_STATUS64 only updates the size of the gendisk.
      
      The block device size is updated lazily when the device comes into use. If
      udev rushes in between the two ioctls, it brings in a block device whose
      size is the backing file size. If the device is not released after the
      LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size.
      
      Update the block device size in the LOOP_SET_STATUS64 ioctl.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Reported-by: M. Hindess <hindessm@uk.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loopdev: fix a deadlock · 5370019d
      Committed by Guo Chao
      bd_mutex and lo_ctl_mutex can be held in different order.
      
      Path #1:
      
      blkdev_open
       blkdev_get
        __blkdev_get (hold bd_mutex)
         lo_open (hold lo_ctl_mutex)
      
      Path #2:
      
      blkdev_ioctl
       lo_ioctl (hold lo_ctl_mutex)
        lo_set_capacity (hold bd_mutex)
      
      Lockdep does not report it, because path #2 actually holds a subclass of
      lo_ctl_mutex.  This subclass seems to have crept into the code by mistake.
      The patch author actually just mentioned it in the changelog, see commit
      f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:
      
      	http://marc.info/?l=linux-kernel&m=123806169129727&w=2
      
      Path #2 holds bd_mutex to call bd_set_size(). I've protected it
      with i_mutex in a previous patch, so drop bd_mutex at this site.
      Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: M. Hindess <hindessm@uk.ibm.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 30 Nov, 2012 1 commit
    • loop: Limit the number of requests in the bio list · 7b5a3522
      Committed by Lukas Czerner
      Currently there is no limit on the number of requests in the loop bio
      list. This can lead to some nasty situations when the caller spawns
      tons of bio requests taking a huge amount of memory. This is even more
      obvious with discard, where blkdev_issue_discard() will submit all bios
      for the range and wait for them to finish afterwards. On really big loop
      devices and a slow backing file system this can lead to an OOM situation,
      as reported by Dave Chinner.
      
      With this patch we will wait in loop_make_request() if the number of
      bios in the loop bio list would exceed 'nr_congestion_on'.
      We'll wake up the process as we process the bios from the list. Some
      threshold hysteresis is in place to avoid high frequency oscillation.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reported-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
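The threshold hysteresis mentioned in this commit can be modelled in a few lines. This is a simplified single-threaded sketch, assuming an upper mark at which submitters start to wait and a lower mark below which they are released; the `sketch_` names and the boolean return in place of a real wait queue are illustrative choices, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_gate {
    unsigned pending;   /* bios currently queued */
    unsigned on;        /* start throttling at this many pending bios */
    unsigned off;       /* stop throttling once pending drops below this */
    bool throttled;
};

static void sketch_gate_init(struct sketch_gate *g, unsigned on, unsigned off)
{
    assert(off < on);   /* the gap between the marks is the hysteresis */
    g->pending = 0;
    g->on = on;
    g->off = off;
    g->throttled = false;
}

/* Try to queue one bio; returns false if the submitter would have to wait. */
static bool sketch_gate_submit(struct sketch_gate *g)
{
    if (g->pending >= g->on)
        g->throttled = true;
    if (g->throttled)
        return false;
    g->pending++;
    return true;
}

/* One queued bio finished; lift throttling only below the lower mark. */
static void sketch_gate_complete(struct sketch_gate *g)
{
    assert(g->pending > 0);
    g->pending--;
    if (g->throttled && g->pending < g->off)
        g->throttled = false;
}
```

Because submitters are released only once the backlog falls below the lower mark, a single completion near the upper mark does not immediately let a new bio in, which avoids rapid flipping between the waiting and running states.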
  13. 30 Oct, 2012 1 commit
    • loop: Make explicit loop device destruction lazy · a1ecac3b
      Committed by Dave Chinner
      xfstests has always had random failures of tests due to loop devices
      failing to be torn down and hence leaving filesystems that cannot be
      unmounted. This causes test runs to immediately stop.
      
      Over the past 6 or 7 years we've added hacks like explicit unmount
      -d commands for loop mounts, losetup -d after unmount -d fails, etc,
      but still the problems persist.  Recently, the frequency of loop
      related failures increased again to the point that xfstests 259 will
      reliably fail with a stray loop device that was not torn down.
      
      That is despite the fact the test is about as simple as it gets -
      loop 5 or 6 times running mkfs.xfs with different parameters:
      
              lofile=$(losetup -f)
              losetup $lofile "$testfile"
              "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
              sync
              losetup -d $lofile
      
      And losetup -d $lofile is failing with EBUSY on 1-3 of these loops
      every time the test is run.
      
      Turns out that blkid is running simultaneously with losetup -d, and
      so it sees an elevated reference count and returns EBUSY.  But why
      is blkid running? It's obvious, isn't it? udev has decided to try
      and find out what is on the block device as a result of a creation
      notification. And it is racing with mkfs, so might still be scanning
      the device when mkfs finishes and we try to tear it down.
      
      So, make losetup -d force autoremove behaviour. That is, when the
      last reference goes away, tear down the device. xfstests wants it
      *gone*, not causing random teardown failures when we know that all
      the operations the tests have specifically run on the device have
      completed and are no longer referencing the loop device.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 21 Sep, 2012 1 commit
  15. 15 Jul, 2012 1 commit
  16. 20 Mar, 2012 1 commit
  17. 09 Feb, 2012 1 commit
  18. 04 Jan, 2012 1 commit
  19. 02 Dec, 2011 1 commit
  20. 25 Nov, 2011 1 commit
  21. 16 Nov, 2011 2 commits
    • loop: cleanup set_status interface · 7035b5df
      Committed by Dmitry Monakhov
      1) Anyone who has read access to loopdev has permission to call set_status
         and may change important parameters such as lo_offset, lo_sizelimit and
         so on, which contradicts the read access pattern and is effectively
         write access.
      2) Add a check for lo_offset beyond i_size to prevent blkdev_size overflow.
         ##Testcase_begin
         #dd if=/dev/zero of=./file bs=1k count=1
         #losetup /dev/loop0 ./file
         /* userspace_application */
         struct loop_info64 loinf;
         fd = open("/dev/loop0", O_RDONLY);
         ioctl(fd, LOOP_GET_STATUS64, &loinf);
         /* Set offset to any value which is bigger than i_size, and sizelimit
          * to nonzero value*/
         loinf.lo_offset = 4096*1024;
         loinf.lo_sizelimit = 1024;
         ioctl(fd, LOOP_SET_STATUS64, &loinf);
         /* After this loop device will have size similar to 0x7fffffffffxxxx */
         #blockdev --getsz /dev/loop0
         ##OUTPUT: 36028797018955968
         ##Testcase_end
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loop: prevent information leak after failed read · 3bb90682
      Committed by Dmitry Monakhov
      If a read was not fully successful, we have to fail the whole bio to
      prevent an information leak of old pages.
      
      ##Testcase_begin
      dd if=/dev/zero of=./file bs=1M count=1
      losetup /dev/loop0 ./file -o 4096
      truncate -s 0 ./file
      # Oops, the loop offset is now beyond i_size, so the read will silently fail.
      # So the bio's pages would not be cleared, which may result in an information leak.
      hexdump -C /dev/loop0
      ##testcase_end
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  22. 17 Oct, 2011 1 commit
    • loop: remove the incorrect write_begin/write_end shortcut · 456be148
      Committed by Christoph Hellwig
      Currently the loop device tries to call directly into write_begin/write_end
      instead of going through ->write if it can.  This is a fairly nasty shortcut
      as write_begin and write_end are only callbacks for the generic write code
      and expect to be called with filesystem specific locks held.
      
      This code currently causes various issues for clustered filesystems as it
      doesn't take the required cluster locks, and it also causes issues for XFS
      as it doesn't properly lock against the swapext ioctl as called by the
      defragmentation tools.  This in turn causes data corruption if
      defragmentation hits a busy loop device in the wrong time window, as
      reported by RH QA.
      
      The reason why we have this shortcut is that it saves a data copy when
      doing a transformation on the loop device, which is the technical term
      for using cryptoloop (or an XOR transformation).  Given that cryptoloop
      has been deprecated in favour of dm-crypt my opinion is that we should
      simply drop this shortcut instead of finding complicated ways to
      introduce a formal interface for this shortcut.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. 21 Sep, 2011 2 commits
  24. 12 Sep, 2011 1 commit
  25. 24 Aug, 2011 1 commit
    • loop: always allow userspace partitions and optionally support automatic scanning · e03c8dd1
      Committed by Kay Sievers
      Automatic partition scanning can be requested individually per loop
      device during its setup by setting LO_FLAGS_PARTSCAN. By default, no
      partition tables are scanned.
      
      Userspace can now always add and remove partitions from all loop
      devices, regardless if the in-kernel partition scanner is enabled or
      not.
      
      The needed partition minor numbers are allocated from the extended
      minors space, the main loop device numbers will continue to match the
      loop minors, regardless of the number of partitions used.
      
        # grep . /sys/class/block/loop1/loop/*
        /sys/block/loop1/loop/autoclear:0
        /sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img
        /sys/block/loop1/loop/offset:0
        /sys/block/loop1/loop/partscan:1
        /sys/block/loop1/loop/sizelimit:0
      
        # ls -l /dev/loop*
        brw-rw---- 1 root disk   7,   0 Aug 14 20:22 /dev/loop0
        brw-rw---- 1 root disk   7,   1 Aug 14 20:23 /dev/loop1
        brw-rw---- 1 root disk 259,   0 Aug 14 20:23 /dev/loop1p1
        brw-rw---- 1 root disk 259,   1 Aug 14 20:23 /dev/loop1p2
        brw-rw---- 1 root disk   7,  99 Aug 14 20:23 /dev/loop99
        brw-rw---- 1 root disk 259,   2 Aug 14 20:23 /dev/loop99p1
        brw-rw---- 1 root disk 259,   3 Aug 14 20:23 /dev/loop99p2
        crw------T 1 root root  10, 237 Aug 14 20:22 /dev/loop-control
      
      Cc: Karel Zak  <kzak@redhat.com>
      Cc: Davidlohr Bueso <dave@gnu.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  26. 19 Aug, 2011 1 commit
    • loop: add discard support for loop devices · dfaa2ef6
      Committed by Lukas Czerner
      This commit adds discard support for loop devices. Discard is usually
      supported by SSD and thinly provisioned devices as a method for
      reclaiming unused space. This is no different than trying to reclaim
      back space which is not used by the file system on the image, but it
      still occupies space on the host file system.
      
      We can do the reclamation on file system which does support hole
      punching. So when discard request gets to the loop driver we can
      translate that to punch a hole to the underlying file, hence reclaim
      the free space.
      
      This is very useful for trimming down the size of the image to only what
      is really used by the file system on that image. Fstrim may be used for
      that purpose.
      
      It has been tested on ext4, xfs and btrfs with the image file systems
      ext4, ext3, xfs and btrfs. ext4, or ext3 image on ext4 file system has
      some problems but it seems that ext4 punch hole implementation is
      somewhat flawed and it is unrelated to this commit.
      
      Also this is a very good method of validating file systems punch hole
      implementation.
      
      Note that when encryption is used, discard support is disabled, because
      using it might leak some information useful for possible attacker.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  27. 01 Aug, 2011 4 commits
    • loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other · 05eb0f25
      Committed by Kay Sievers
      LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs
      files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD
      waits for show() to finish to remove the sysfs file.
      
        cat /sys/class/block/loop0/loop/backing_file
          mutex_lock_nested+0x176/0x350
          ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
          ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
          loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
          dev_attr_show+0x1b/0x60
          ? sysfs_read_file+0x86/0x1a0
          ? __get_free_pages+0x12/0x50
          sysfs_read_file+0xaf/0x1a0
      
        ioctl(LOOP_CLR_FD):
          wait_for_common+0x12c/0x180
          ? try_to_wake_up+0x2a0/0x2a0
          wait_for_completion+0x18/0x20
          sysfs_deactivate+0x178/0x180
          ? sysfs_addrm_finish+0x43/0x70
          ? sysfs_addrm_start+0x1d/0x20
          sysfs_addrm_finish+0x43/0x70
          sysfs_hash_and_remove+0x85/0xa0
          sysfs_remove_group+0x59/0x100
          loop_clr_fd+0x1dc/0x3f0 [loop]
          lo_ioctl+0x223/0x7a0 [loop]
      
      Instead of taking the lo_ctl_mutex from sysfs code, take the inner
      lo->lo_lock, to protect the access to the backing_file data.
      
      Thanks to Tejun for help debugging and finding a solution.
      
      Cc: Milan Broz <mbroz@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices · d134b00b
      Committed by Kay Sievers
      Instead of unconditionally creating a fixed number of dead loop
      devices which need to be investigated by storage handling services,
      even when they are never used, we allow distros to start with 0
      loop devices and have losetup(8) and similar tools switch to the dynamic
      /dev/loop-control interface instead of searching /dev/loop%i for free
      devices.
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • loop: add management interface for on-demand device allocation · 770fe30a
      Committed by Kay Sievers
      Loop devices today have a fixed pre-allocated number of usually 8.
      The number can only be changed at module init time. To find a free
      device to use, /dev/loop%i needs to be scanned, and all devices need
      to be opened until a free one is possibly found.
      
      This adds a new /dev/loop-control device node that allows userspace to
      dynamically find or allocate a free device, and to add and remove loop
      devices from the running system:
       LOOP_CTL_ADD adds a specific device. Arg is the number
       of the device. It returns the device i or a negative
       error code.
      
       LOOP_CTL_REMOVE removes a specific device. Arg is the
       number of the device. It returns the device i or a negative
       error code.
      
       LOOP_CTL_GET_FREE finds the next unbound device or allocates
       a new one. No arg is given. It returns the device i or a
       negative error code.
      
      The loop kernel module gets automatically loaded when
      /dev/loop-control is accessed the first time. The alias
      specified in the module, instructs udev to create this
      'dead' device node, even when the module is not loaded.
      
      Example:
       cfd = open("/dev/loop-control", O_RDWR);
      
       /* add a new specific loop device */
       err = ioctl(cfd, LOOP_CTL_ADD, devnr);
      
       /* remove a specific loop device */
       err = ioctl(cfd, LOOP_CTL_REMOVE, devnr);
      
       /* find or allocate a free loop device to use */
       devnr = ioctl(cfd, LOOP_CTL_GET_FREE);
      
       sprintf(loopname, "/dev/loop%i", devnr);
       ffd = open("backing-file", O_RDWR);
       lfd = open(loopname, O_RDWR);
       err = ioctl(lfd, LOOP_SET_FD, ffd);
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Karel Zak  <kzak@redhat.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
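The example in this commit message omits declarations and error handling. A fuller user-space sketch of the find-and-attach sequence might look like the following. It relies only on the `LOOP_CTL_GET_FREE` and `LOOP_SET_FD` ioctls from `<linux/loop.h>`; the `sketch_` function names, the buffer size, and the choice to open the backing file first are assumptions made for illustration. Running it for real typically needs root privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/* Format the device node name for index devnr into buf; 0 on success. */
static int sketch_loop_name(char *buf, size_t len, int devnr)
{
    assert(devnr >= 0);
    int n = snprintf(buf, len, "/dev/loop%d", devnr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Attach backing_path to a free loop device.
 * Returns the device index, or -1 on any failure.
 */
static int sketch_loop_attach(const char *backing_path)
{
    char name[32];
    int cfd = -1, lfd = -1, devnr, ret = -1;

    /* Open the backing file first so a bad path fails before any ioctl. */
    int ffd = open(backing_path, O_RDWR);
    if (ffd < 0)
        return -1;

    cfd = open("/dev/loop-control", O_RDWR);
    if (cfd < 0)
        goto out;

    /* Find or allocate a free loop device. */
    devnr = ioctl(cfd, LOOP_CTL_GET_FREE);
    if (devnr < 0 || sketch_loop_name(name, sizeof(name), devnr) < 0)
        goto out;

    lfd = open(name, O_RDWR);
    if (lfd < 0)
        goto out;

    /* Bind the backing file to the loop device. */
    if (ioctl(lfd, LOOP_SET_FD, ffd) == 0)
        ret = devnr;
out:
    if (lfd >= 0)
        close(lfd);
    if (cfd >= 0)
        close(cfd);
    close(ffd);
    return ret;
}
```

A caller would use something like `int devnr = sketch_loop_attach("backing-file");` and treat a negative result as failure. Another process may claim the same free device between `LOOP_CTL_GET_FREE` and `LOOP_SET_FD`, so robust callers usually retry.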
    • loop: replace linked list of allocated devices with an idr index · 34dd82af
      Committed by Kay Sievers
      Replace the linked list, that keeps track of allocated devices, with an
      idr index to allow a more efficient lookup of devices.
      
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  28. 27 May, 2011 1 commit
    • loop: export module parameters · ac04fee0
      Committed by Namhyung Kim
      Export the 'max_loop' and 'max_part' parameters to sysfs so that users can
      see how many devices are allowed and how many partitions are supported.
      
      If 'max_loop' is 0, there is no restriction on the number of loop devices.
      Users can create and use as many devices as there are minor numbers
      available. If 'max_part' is 0, the device simply doesn't support
      partitioning.
      
      Also note that 'max_part' may be adjusted to the form of a power of 2 minus
      1 if needed. Users should check this value after loading the module if they
      want to use that number correctly (e.g. with fdisk, mknod, etc.).
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Laurent Vivier <Laurent.Vivier@bull.net>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
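The adjustment of 'max_part' to a power of 2 minus 1 can be sketched as follows. This is an illustration of the rounding rule as described in the commit message, assuming the value is rounded up; the `sketch_` names are hypothetical and `sketch_fls` is a plain reimplementation of a find-last-set helper, not a kernel call.

```c
#include <assert.h>

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned sketch_fls(unsigned v)
{
    unsigned bits = 0;
    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Round max_part up to the nearest value of the form 2^n - 1. */
static unsigned sketch_effective_max_part(unsigned max_part)
{
    assert(max_part < (1u << 31));   /* keep the shift below well defined */
    if (max_part == 0)
        return 0;                    /* 0 means no partition support */
    return (1u << sketch_fls(max_part)) - 1;
}
```

Under this rule a requested value of 10 becomes 15 and a requested value of 16 becomes 31, which is why the effective value is worth reading back from sysfs before creating device nodes by hand.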
  29. 24 May, 2011 2 commits
    • loop: handle on-demand devices correctly · a1c15c59
      Committed by Namhyung Kim
      When finding or allocating a loop device, loop_probe() did not take
      partition numbers into account, so it could resolve to a different
      device. Consider the following example:
      
      $ sudo modprobe loop max_part=15
      $ ls -l /dev/loop*
      brw-rw---- 1 root disk 7,   0 2011-05-24 22:16 /dev/loop0
      brw-rw---- 1 root disk 7,  16 2011-05-24 22:16 /dev/loop1
      brw-rw---- 1 root disk 7,  32 2011-05-24 22:16 /dev/loop2
      brw-rw---- 1 root disk 7,  48 2011-05-24 22:16 /dev/loop3
      brw-rw---- 1 root disk 7,  64 2011-05-24 22:16 /dev/loop4
      brw-rw---- 1 root disk 7,  80 2011-05-24 22:16 /dev/loop5
      brw-rw---- 1 root disk 7,  96 2011-05-24 22:16 /dev/loop6
      brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
      $ sudo mknod /dev/loop8 b 7 128
      $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
      $ sudo losetup -a
      /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
      $ ls -l /dev/loop*
      brw-rw---- 1 root disk 7,    0 2011-05-24 22:16 /dev/loop0
      brw-rw---- 1 root disk 7,   16 2011-05-24 22:16 /dev/loop1
      brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
      brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
      brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
      brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
      brw-rw---- 1 root disk 7,   32 2011-05-24 22:16 /dev/loop2
      brw-rw---- 1 root disk 7,   48 2011-05-24 22:16 /dev/loop3
      brw-rw---- 1 root disk 7,   64 2011-05-24 22:16 /dev/loop4
      brw-rw---- 1 root disk 7,   80 2011-05-24 22:16 /dev/loop5
      brw-rw---- 1 root disk 7,   96 2011-05-24 22:16 /dev/loop6
      brw-rw---- 1 root disk 7,  112 2011-05-24 22:16 /dev/loop7
      brw-r--r-- 1 root root 7,  128 2011-05-24 22:17 /dev/loop8
      
      After this patch, /dev/loop8 - instead of /dev/loop128 - was
      accessed correctly.
      
      In addition, 'range' passed to blk_register_region() should
      include all range of dev_t that LOOP_MAJOR can address. It does
      not need to be limited by partition numbers unless 'max_loop'
      param was specified.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Laurent Vivier <Laurent.Vivier@bull.net>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
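The minor numbers in the listing above follow a simple shift relationship. The sketch below is a user-space illustration with hypothetical `sketch_` names, assuming each loop device reserves 2^part_shift consecutive minors (the whole disk plus its partitions). With max_part=15 the shift is 4, so /dev/loop8 sits at minor 128.

```c
#include <assert.h>

/* Minor number of whole-disk loop device `index`. */
static unsigned sketch_loop_minor(unsigned index, unsigned part_shift)
{
    assert(part_shift < 20);
    return index << part_shift;
}

/* Inverse lookup: which loop device owns a given minor. */
static unsigned sketch_loop_index(unsigned minor, unsigned part_shift)
{
    assert(part_shift < 20);
    return minor >> part_shift;
}
```

Treating minor 128 as a device index without shifting gives device 128, whose own minor is 128 << 4 = 2048, as seen in the second listing. Shifting first gives device 8, the node the user actually opened.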
    • loop: limit 'max_part' module param to DISK_MAX_PARTS · 78f4bb36
      Committed by Namhyung Kim
      The 'max_part' parameter controls the maximum number of partitions
      a loop block device can have. However, if a user specifies a very
      large value, it would exceed the limit of the device minor number
      and can cause a kernel panic (or, at least, produce invalid
      device nodes in some cases).
      
      On my desktop system, the following command kills the kernel. On qemu,
      it triggers a similar oops but the kernel stayed alive:
      
      $ sudo modprobe loop max_part=100000
       ------------[ cut here ]------------
       kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
       invalid opcode: 0000 [#1] SMP
       last sysfs file:
       CPU 0
       Modules linked in: loop(+)
      
       Pid: 43, comm: insmod Tainted: G        W   2.6.39-qemu+ #155 Bochs Bochs
       RIP: 0010:[<ffffffff8113ce61>]  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
       RSP: 0018:ffff880007b3fde8  EFLAGS: 00000246
       RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
       RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
       RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
       R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
       R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
       FS:  0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
       Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
       Stack:
        ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
        0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
        0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
       Call Trace:
        [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
        [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
        [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
        [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
        [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
        [<ffffffff8116f527>] add_disk+0xdf/0x290
        [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
        [<ffffffffa0006000>] ? 0xffffffffa0005fff
        [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
        [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
        [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
       Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
       RIP  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
        RSP <ffff880007b3fde8>
       ---[ end trace a123eb592043acad ]---
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Laurent Vivier <Laurent.Vivier@bull.net>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>