1. 04 11月, 2017 2 次提交
  2. 10 8月, 2017 3 次提交
  3. 05 6月, 2017 1 次提交
  4. 22 4月, 2017 1 次提交
    • I
      block: get rid of blk_integrity_revalidate() · 19b7ccf8
      Ilya Dryomov 提交于
      Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
      introduced blk_integrity_revalidate(), which seems to assume ownership
      of the stable pages flag and unilaterally clears it if no blk_integrity
      profile is registered:
      
          if (bi->profile)
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          else
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
      
      It's called from revalidate_disk() and rescan_partitions(), making it
      impossible to enable stable pages for drivers that support partitions
      and don't use blk_integrity: while the call in revalidate_disk() can be
      trivially worked around (see zram, which doesn't support partitions and
      hence gets away with zram_revalidate_disk()), rescan_partitions() can
      be triggered from userspace at any time.  This breaks rbd, where the
      ceph messenger is responsible for generating/verifying CRCs.
      
      Since blk_integrity_{un,}register() "must" be used for (un)registering
      the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
      setting there.  This way drivers that call blk_integrity_register() and
      use integrity infrastructure won't interfere with drivers that don't
      but still want stable pages.
      
      Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.4+, needs backporting
      Tested-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      19b7ccf8
  5. 25 3月, 2017 1 次提交
  6. 09 3月, 2017 1 次提交
  7. 02 2月, 2017 1 次提交
    • D
      scsi, block: fix duplicate bdi name registration crashes · 0dba1314
      Dan Williams 提交于
      Warnings of the following form occur because scsi reuses a devt number
      while the block layer still has it referenced as the name of the bdi
      [1]:
      
       WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
       sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
       [..]
       Call Trace:
        dump_stack+0x86/0xc3
        __warn+0xcb/0xf0
        warn_slowpath_fmt+0x5f/0x80
        ? kernfs_path_from_node+0x4f/0x60
        sysfs_warn_dup+0x62/0x80
        sysfs_create_dir_ns+0x77/0x90
        kobject_add_internal+0xb2/0x350
        kobject_add+0x75/0xd0
        device_add+0x15a/0x650
        device_create_groups_vargs+0xe0/0xf0
        device_create_vargs+0x1c/0x20
        bdi_register+0x90/0x240
        ? lockdep_init_map+0x57/0x200
        bdi_register_owner+0x36/0x60
        device_add_disk+0x1bb/0x4e0
        ? __pm_runtime_use_autosuspend+0x5c/0x70
        sd_probe_async+0x10d/0x1c0
        async_run_entry_fn+0x39/0x170
      
      This is a brute-force fix to pass the devt release information from
      sd_probe() to the locations where we register the bdi,
      device_add_disk(), and unregister the bdi, blk_cleanup_queue().
      
      Thanks to Omar for the quick reproducer script [2]. This patch survives
      where an unmodified kernel fails in a few seconds.
      
      [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
      [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2
      
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: NOmar Sandoval <osandov@osandov.com>
      Tested-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0dba1314
  8. 23 12月, 2016 1 次提交
  9. 11 10月, 2016 1 次提交
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy 提交于
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provide some level of
      latent entropy.
      Signed-off-by: NEmese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      0766f788
  10. 28 6月, 2016 1 次提交
  11. 16 6月, 2016 1 次提交
  12. 21 5月, 2016 1 次提交
  13. 10 1月, 2016 2 次提交
  14. 22 10月, 2015 3 次提交
    • D
      block: move blk_integrity to request_queue · ac6fc48c
      Dan Williams 提交于
      A trace like the following proceeds a crash in bio_integrity_process()
      when it goes to use an already freed blk_integrity profile.
      
       BUG: unable to handle kernel paging request at ffff8800d31b10d8
       IP: [<ffff8800d31b10d8>] 0xffff8800d31b10d8
       PGD 2f65067 PUD 21fffd067 PMD 80000000d30001e3
       Oops: 0011 [#1] SMP
       Dumping ftrace buffer:
       ---------------------------------
          ndctl-2222    2.... 44526245us : disk_release: pmem1s
       systemd--2223    4.... 44573945us : bio_integrity_endio: pmem1s
          <...>-409     4.... 44574005us : bio_integrity_process: pmem1s
       ---------------------------------
      [..]
        Call Trace:
        [<ffffffff8144e0f9>] ? bio_integrity_process+0x159/0x2d0
        [<ffffffff8144e4f6>] bio_integrity_verify_fn+0x36/0x60
        [<ffffffff810bd2dc>] process_one_work+0x1cc/0x4e0
      
      Given that a request_queue is pinned while i/o is in flight and that a
      gendisk is allowed to have a shorter lifetime, move blk_integrity to
      request_queue to satisfy requests arriving after the gendisk has been
      torn down.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ac6fc48c
    • M
      block: Inline blk_integrity in struct gendisk · 25520d55
      Martin K. Petersen 提交于
      Up until now the_integrity profile has been dynamically allocated and
      attached to struct gendisk after the disk has been made active.
      
      This causes problems because NVMe devices need to register the profile
      prior to the partition table being read due to a mandatory metadata
      buffer requirement. In addition, DM goes through hoops to deal with
      preallocating, but not initializing integrity profiles.
      
      Since the integrity profile is small (4 bytes + a pointer), Christoph
      suggested moving it to struct gendisk proper. This requires several
      changes:
      
       - Moving the blk_integrity definition to genhd.h.
      
       - Inlining blk_integrity in struct gendisk.
      
       - Removing the dynamic allocation code.
      
       - Adding helper functions which allow gendisk to set up and tear down
         the integrity sysfs dir when a disk is added/deleted.
      
       - Adding a blk_integrity_revalidate() callback for updating the stable
         pages bdi setting.
      
       - The calls that depend on whether a device has an integrity profile or
         not now key off of the bi->profile pointer.
      
       - Simplifying the integrity support routines in DM (Mike Snitzer).
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      25520d55
    • M
      block: Move integrity kobject to struct gendisk · aff34e19
      Martin K. Petersen 提交于
      The integrity kobject purely exists to support the integrity
      subdirectory in sysfs and doesn't really have anything to do with the
      blk_integrity data structure. Move the kobject to struct gendisk where
      it belongs.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      aff34e19
  15. 17 7月, 2015 2 次提交
  16. 18 4月, 2014 1 次提交
  17. 26 2月, 2013 1 次提交
  18. 23 11月, 2012 1 次提交
    • S
      block: store partition_meta_info.uuid as a string · 1ad7e899
      Stephen Warren 提交于
      This will allow other types of UUID to be stored here, aside from true
      UUIDs.  This also simplifies code that uses this field, since it's usually
      constructed from a, used as a, or compared to other, strings.
      
      Note: A simplistic approach here would be to set uuid_str[36]=0 whenever a
      /PARTNROFF option was found to be present.  However, this modifies the
      input string, and causes subsequent calls to devt_from_partuuid() not to
      see the /PARTNROFF option, which causes different results.  In order to
      avoid misleading future maintainers, this parameter is marked const.
      Signed-off-by: NStephen Warren <swarren@nvidia.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1ad7e899
  19. 01 8月, 2012 1 次提交
    • V
      block: add partition resize function to blkpg ioctl · c83f6bf9
      Vivek Goyal 提交于
      Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
      allows altering the size of an existing partition, even if it is currently
      in use.
      
      This patch converts hd_struct->nr_sects into sequence counter because
      One might extend a partition while IO is happening to it and update of
      nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
      can lead to issues like reading inconsistent size of a partition. Sequence
      counter have been used so that readers don't have to take bdev mutex lock
      as we call sector_in_part() very frequently.
      
      Now all the access to hd_struct->nr_sects should happen using sequence
      counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
      There is one exception though, set_capacity()/get_capacity(). I think
      theoritically race should exist there too but this patch does not
      modify set_capacity()/get_capacity() due to sheer number of call sites
      and I am afraid that change might break something. I have left that as a
      TODO item. We can handle it later if need be. This patch does not introduce
      any new races as such w.r.t set_capacity()/get_capacity().
      
      v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NPhillip Susi <psusi@ubuntu.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c83f6bf9
  20. 17 7月, 2012 1 次提交
  21. 15 5月, 2012 1 次提交
    • T
      block: fix buffer overflow when printing partition UUIDs · 05c69d29
      Tejun Heo 提交于
      6d1d8050 "block, partition: add partition_meta_info to hd_struct"
      added part_unpack_uuid() which assumes that the passed in buffer has
      enough space for sprintfing "%pU" - 37 characters including '\0'.
      
      Unfortunately, b5af921e "init: add support for root devices
      specified by partition UUID" supplied 33 bytes buffer to the function
      leading to the following panic with stackprotector enabled.
      
        Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e
      
        [<ffffffff815e226b>] panic+0xba/0x1c6
        [<ffffffff81b14c7e>] ? printk_all_partitions+0x259/0x26xb
        [<ffffffff810566bb>] __stack_chk_fail+0x1b/0x20
        [<ffffffff81b15c7e>] printk_all_paritions+0x259/0x26xb
        [<ffffffff81aedfe0>] mount_block_root+0x1bc/0x27f
        [<ffffffff81aee0fa>] mount_root+0x57/0x5b
        [<ffffffff81aee23b>] prepare_namespace+0x13d/0x176
        [<ffffffff8107eec0>] ? release_tgcred.isra.4+0x330/0x30
        [<ffffffff81aedd60>] kernel_init+0x155/0x15a
        [<ffffffff81087b97>] ? schedule_tail+0x27/0xb0
        [<ffffffff815f4d24>] kernel_thread_helper+0x5/0x10
        [<ffffffff81aedc0b>] ? start_kernel+0x3c5/0x3c5
        [<ffffffff815f4d20>] ? gs_change+0x13/0x13
      
      Increase the buffer size, remove the dangerous part_unpack_uuid() and
      use snprintf() directly from printk_all_partitions().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NSzymon Gruszczynski <sz.gruszczynski@googlemail.com>
      Cc: Will Drewry <wad@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      05c69d29
  22. 02 3月, 2012 1 次提交
  23. 04 1月, 2012 1 次提交
  24. 10 11月, 2011 1 次提交
  25. 29 8月, 2011 1 次提交
  26. 24 8月, 2011 1 次提交
    • T
      block: add GENHD_FL_NO_PART_SCAN · d27769ec
      Tejun Heo 提交于
      There are cases where suppressing partition scan is useful - e.g. for
      lo devices and pseudo SATA devices which advertise to be a disk but
      get upset on partition scan (some port multiplier control devices show
      such behavior).
      
      This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
      regardless of the number of possible partitions.  disk_partitionable()
      is renamed to disk_part_scan_enabled() as suppressing partition scan
      doesn't imply the device can't be partitioned using
      BLKPG_ADD/DEL_PARTITION calls from userland.  show_partition() now
      directly tests disk_max_parts() to maintain backward-compatibility.
      
      -v2: Updated to make it clear that only partition scan is suppressed
           not partitioning itself as suggested by Kay Sievers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      d27769ec
  27. 01 7月, 2011 1 次提交
    • T
      block: flush MEDIA_CHANGE from drivers on close(2) · 85ef06d1
      Tejun Heo 提交于
      Currently, only open(2) is defined as the 'clearing' point.  It has
      two roles - first, it's an acknowledgement from userland indicating
      that the event has been received and kernel can clear pending states
      and proceed to generate more events.  Secondly, it's passed on to
      device drivers as a hint indicating that a synchronization point has
      been reached and it might want to take a deeper look at the device.
      
      The latter currently is only used by sr which uses two different
      mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
      to discover events, where the former is lighter weight and safe to be
      used repeatedly but may not provide full coverage.  Among other
      things, GET_EVENT can't detect media removal while TUR can.
      
      This patch makes close(2) - blkdev_put() - indicate clearing hint for
      MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
      disk_flush_events() and updated to take @mask for events to flush
      which is or'd to ev->clearing and will be passed to the driver on the
      next ->check_events() invocation.
      
      This change makes sr generate MEDIA_CHANGE when media is ejected from
      userland - e.g. with eject(1).
      
      Note: Given the current usage, it seems @clearing hint is needlessly
      complex.  disk_clear_events() can simply clear all events and the hint
      can be boolean @flush.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      85ef06d1
  28. 30 5月, 2011 1 次提交
    • J
      Revert "block: Remove extra discard_alignment from hd_struct." · a1706ac4
      Jens Axboe 提交于
      It was not a good idea to start dereferencing disk->queue from
      the fs sysfs strategy for displaying discard alignment. We ran
      into first a NULL pointer deref, and after fixing that we sometimes
      see unvalid disk->queue pointer values.
      
      Since discard is the only one of the bunch actually looking into
      the queue, just revert the change.
      
      This reverts commit 23ceb5b7.
      
      Conflicts:
      	fs/partitions/check.c
      a1706ac4
  29. 07 5月, 2011 1 次提交
  30. 22 4月, 2011 1 次提交
  31. 22 3月, 2011 1 次提交
  32. 07 1月, 2011 1 次提交
  33. 05 1月, 2011 1 次提交
    • J
      block: fix accounting bug on cross partition merges · 09e099d4
      Jerome Marchand 提交于
      /proc/diskstats would display a strange output as follows.
      
      $ cat /proc/diskstats |grep sda
         8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
         8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                      ~~~~~~~~~~
         8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
         8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
         8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
         8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
      
      Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
      merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
      
      The detailed root cause is as follows.
      
      Assuming that there are two partition, sda1 and sda2.
      
      1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
         is 0 and sda2's one is 1.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      2. A bio belongs to sda1 is issued and is merged into the request mentioned on
         step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
         from sda2 region to sda1 region. However the two partition's
         hd_struct->in_flight are not changed.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      3. The request is finished and blk_account_io_done() is called. In this case,
         sda2's hd_struct->in_flight, not a sda1's one, is decremented.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |         -1
         sda2 |          1
         ---------------------------
      
      The patch fixes the problem by caching the partition lookup
      inside the request structure, hence making sure that the increment
      and decrement will always happen on the same partition struct. This
      also speeds up IO with accounting enabled, since it cuts down on
      the number of lookups we have to do.
      
      Also add a refcount to struct hd_struct to keep the partition in
      memory as long as users exist. We use kref_test_and_get() to ensure
      we don't add a reference to a partition which is going away.
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      09e099d4