1. 28 Mar, 2014: 2 commits
  2. 19 Feb, 2014: 1 commit
  3. 14 Feb, 2014: 1 commit
    • lockd: send correct lock when granting a delayed lock. · 2ec197db
      Committed by NeilBrown
      If an NFS client attempts to get a lock (using NLM) and the lock is
      not available, the server will remember the request and when the lock
      becomes available it will send a GRANT request to the client to
      provide the lock.
      
      If the client already held an adjacent lock, the GRANT callback will
      report the union of the existing and new locks, which can confuse the
      client.
      
      This happens because __posix_lock_file (called by vfs_lock_file)
      updates the passed-in file_lock structure when adjacent or
      over-lapping locks are found.
      
      To avoid this problem we take a copy of the two fields that can
      be changed (fl_start and fl_end) before the call and restore them
      afterwards.
      An alternate would be to allocate a 'struct file_lock', initialise it,
      use locks_copy_lock() to take a copy, then locks_release_private()
      after the vfs_lock_file() call.  But that is a lot more work.
      Reported-by: Olaf Kirch <okir@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      
      --
      v1 had a couple of issues (large on-stack struct and didn't really work properly).
      This version is much better tested.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  4. 12 Feb, 2014: 1 commit
    • nfsd4: fix acl buffer overrun · 09bdc2d7
      Committed by J. Bruce Fields
      Commit 4ac7249e ("nfsd: use get_acl and ->set_acl") forgets to set the
      size in the case where get_acl() succeeds, so _posix_to_nfsv4_one() can
      then write past the end of its allocation.
      Symptoms were slab corruption warnings.
      
      Also, some minor cleanup while we're here.  (Among other things, note
      that the first few lines guarantee that pacl is non-NULL.)
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  5. 10 Feb, 2014: 8 commits
  6. 09 Feb, 2014: 10 commits
    • Btrfs: fix data corruption when reading/updating compressed extents · a2aa75e1
      Committed by Filipe David Borba Manana
      When using a mix of compressed file extents and prealloc extents, it
      is possible to fill a page of a file with random, garbage data from
      some unrelated previous use of the page, instead of a sequence of zeroes.
      
      A simple sequence of steps to get into such case, taken from the test
      case I made for xfstests, is:
      
         _scratch_mkfs
         _scratch_mount "-o compress-force=lzo"
         $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
      This results in the following file items in the fs tree:
      
         item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
             inode generation 6 transid 6 size 542872 block group 0 mode 100600
         item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
             inode ref index 2 namelen 6 name: foobar
         item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
             extent data disk byte 0 nr 0 gen 6
             extent data offset 0 nr 24576 ram 266240
             extent compression 0
         item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
             prealloc data disk byte 12849152 nr 241664 gen 6
             prealloc data offset 0 nr 241664
         item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
             extent data disk byte 12845056 nr 4096 gen 6
             extent data offset 0 nr 20480 ram 20480
             extent compression 2
         item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
             prealloc data disk byte 13090816 nr 405504 gen 6
             prealloc data offset 0 nr 258048
      
      The on-disk extent at offset 266240 (which corresponds to a single disk block)
      contains 5 compressed chunks of file data. Each of the first 4 compresses 4096
      bytes of file data, while the last one compresses only 3024 bytes of file data.
      Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
      1072 bytes) should always return zeroes (our next extent is a prealloc one).
      
      The solution is for the compression code path to zero the remaining (untouched)
      bytes of the last page it decompressed data into, because the upper layer,
      fs/btrfs/extent_io.c:__do_readpage(), does not know how much of the last page
      the file data occupies. __do_readpage() was already zeroing the remainder of the
      page correctly, but only if it was the last page of the inode and the inode's
      size was not a multiple of the page size.
      
      This caused not only random data to be returned on reads, but also random data
      to be stored permanently when updating parts of the region that should be zeroed.
      For the example above, updating a single byte in the region [285648 ; 286720[
      would store that byte correctly but also store random data on disk.
      
      A test case for xfstests follows soon.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: don't loop forever if we can't run because of the tree mod log · 27a377db
      Committed by Josef Bacik
      A user reported a 100% cpu hang with my new delayed ref code.  Turns out I
      forgot to increase the count check when we can't run a delayed ref because of
      the tree mod log.  If we can't run any delayed refs during this there is no
      point in continuing to look, and we need to break out.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: reserve no transaction units in btrfs_ioctl_set_features · 8051aa1a
      Committed by David Sterba
      The ioctl was added in patch "btrfs: add ioctls to query/change feature
      bits online". Its modifications to the superblock don't need to reserve
      metadata blocks when starting a transaction.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: commit transaction after setting label and features · d0270aca
      Committed by Jeff Mahoney
      The set_fslabel ioctl uses btrfs_end_transaction, which means it's
      possible that the change will be lost if the system crashes, same for
      the newly set features. Let's use btrfs_commit_transaction instead.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix assert screwup for the pending move stuff · 6cc98d90
      Committed by Josef Bacik
      Wang noticed that btrfs/030 was failing for him even though Filipe and I
      couldn't reproduce the failure.  It turns out this is because Wang didn't have
      CONFIG_BTRFS_ASSERT set, which meant that a key part of Filipe's original patch
      was not being built in.  This appears to be a mistake made when merging Filipe's
      patch, as it does not exist in his original patch.  Fix this by changing how we
      make sure del_waiting_dir_move asserts that it did not error, and take the
      function out of the ifdef check.
      This makes btrfs/030 pass with the assert on or off.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Filipe Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Merge tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 49447903
      Committed by Linus Torvalds
      Pull pinctrl fixes from Linus Walleij:
       "First round of pin control fixes for v3.14:
      
         - Protect pinctrl_list_add() with the proper mutex.  This was
           identified by Red Hat.  It caused nasty locking warnings and was
           root-caused by Stanislaw Gruszka.
      
         - Avoid adding dangerous debugfs files when either half of the
           subsystem is unused: pinmux or pinconf.
      
         - Various fixes to various drivers: locking, hardware particulars, DT
           parsing, error codes"
      
      * tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: tegra: return correct error type
        pinctrl: do not init debugfs entries for unimplemented functionalities
        pinctrl: protect pinctrl_list add
        pinctrl: sirf: correct the pin index of ac97_pins group
        pinctrl: imx27: fix offset calculation in imx_read_2bit
        pinctrl: vt8500: Change devicetree data parsing
        pinctrl: imx27: fix wrong offset to ICONFB
        pinctrl: at91: use locked variant of irq_set_handler
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c132adef
      Committed by Linus Torvalds
      Pull irq fix from Thomas Gleixner:
       "Add a missing Kconfig dependency"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Generic irq chip requires IRQ_DOMAIN
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c1ff8431
      Committed by Linus Torvalds
      Pull x86 fixes from Peter Anvin:
       "Quite a varied little collection of fixes.  Most of them are
        relatively small or isolated; the biggest one is Mel Gorman's fixes
        for TLB range flushing.
      
        A couple of AMD-related fixes (including not crashing when given an
        invalid microcode image) and fix a crash when compiled with gcov"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, microcode, AMD: Unify valid container checks
        x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
        x86/efi: Allow mapping BGRT on x86-32
        x86: Fix the initialization of physnode_map
        x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
        x86/intel/mid: Fix X86_INTEL_MID dependencies
        arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
        mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
        x86: mm: change tlb_flushall_shift for IvyBridge
        x86/mm: Eliminate redundant page table walk during TLB range flushing
        x86/mm: Clean up inconsistencies when flushing TLB ranges
        mm, x86: Account for TLB flushes only when debugging
        x86/AMD/NB: Fix amd_set_subcaches() parameter type
        x86/quirks: Add workaround for AMD F16h Erratum792
        x86, doc, kconfig: Fix dud URL for Microcode data
    • Merge tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy · ec2e6cb2
      Committed by Linus Torvalds
      Pull jfs fix from David Kleikamp:
       "Fix regression"
      
      * tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy:
        jfs: fix generic posix ACL regression
    • jfs: fix generic posix ACL regression · c18f7b51
      Committed by Dave Kleikamp
      I missed a couple of errors when reviewing the patches converting jfs
      to use the generic posix ACL function. Setting ACLs currently
      fails with -EOPNOTSUPP.
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Reported-by: Michael L. Semon <mlsemon35@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  7. 08 Feb, 2014: 15 commits
    • watchdog: dw_wdt: Add dependency on HAS_IOMEM · 1ccfe6f9
      Committed by Richard Weinberger
      On architectures like S390 or um, this driver can neither build nor work.
      Make it depend on HAS_IOMEM to avoid build failures.
      
      drivers/built-in.o: In function `dw_wdt_drv_probe':
      drivers/watchdog/dw_wdt.c:302: undefined reference to `devm_ioremap_resource'
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • Merge tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 34a9bff4
      Committed by Linus Torvalds
      Pull driver core fix from Greg KH:
       "Here is a single kernfs fix to resolve a much-reported lockdep issue
        with the removal of entries in sysfs"
      
      * tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 41f76d8b
      Committed by Linus Torvalds
      Pull ceph fixes from Sage Weil:
       "There is an RBD fix for a crash due to the immutable bio changes, an
        error path fix, and a locking fix in the recent redirect support"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: do not dereference a NULL bio pointer
        libceph: take map_sem for read in handle_reply()
        libceph: factor out logic from ceph_osdc_start_request()
        libceph: fix error handling in ceph_osdc_init()
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 42be3f35
      Committed by Linus Torvalds
      Pull arm64 fixes from Catalin Marinas:
       - Relax VDSO alignment requirements so that the kernel-picked one (4K)
         does not conflict with the dynamic linker's one (64K)
       - VDSO gettimeofday fix
       - Barrier fixes for atomic operations and cache flushing
       - TLB invalidation when overriding early page mappings during boot
       - Wired up new 32-bit arm (compat) syscalls
       - LSM_MMAP_MIN_ADDR when COMPAT is enabled
       - defconfig update
       - Clean-up (comments, pgd_alloc).
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: defconfig: Expand default enabled features
        arm64: asm: remove redundant "cc" clobbers
        arm64: atomics: fix use of acquire + release for full barrier semantics
        arm64: barriers: allow dsb macro to take option parameter
        security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
        arm64: compat: Wire up new AArch32 syscalls
        arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
        arm64: vdso: fix coarse clock handling
        arm64: simplify pgd_alloc
        arm64: fix typo: s/SERRROR/SERROR/
        arm64: Invalidate the TLB when replacing pmd entries during boot
        arm64: Align CMA sizes to PAGE_SIZE
        arm64: add DSB after icache flush in __flush_icache_all()
        arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · d94d0e27
      Committed by Linus Torvalds
      Pull MIPS updates from Ralf Baechle:
       "Three minor patches.  All have sat in -next for a few days"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: fpu.h: Fix build when CONFIG_BUG is not set
        MIPS: Wire up sched_setattr/sched_getattr syscalls
        MIPS: Alchemy: Fix DB1100 GPIO registration
    • Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 3e382dd9
      Committed by Linus Torvalds
      Pull media fixes from Mauro Carvalho Chehab:
       "A series of small fixes.  Mostly driver ones.  There is one core
        regression fix on a patch that was meant to fix some race issues on
        vb2, but that actually caused more harm than good.  So, we're just
        reverting it for now"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] adv7842: Composite free-run platfrom-data fix
        [media] v4l2-dv-timings: fix GTF calculation
        [media] hdpvr: Fix memory leak in debug
        [media] af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2
        [media] mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset
        [media] mxl111sf: Fix unintentional garbage stack read
        [media] cx24117: use a valid dev pointer for dev_err printout
        [media] cx24117: remove dead code in always 'false' if statement
        [media] update Michael Krufky's email address
        [media] vb2: Check if there are buffers before streamon
        [media] Revert "[media] videobuf_vm_{open,close} race fixes"
        [media] go7007-loader: fix usb_dev leak
        [media] media: bt8xx: add missing put_device call
        [media] exynos4-is: Compile in fimc-lite runtime PM callbacks conditionally
        [media] exynos4-is: Compile in fimc runtime PM callbacks conditionally
        [media] exynos4-is: Fix error paths in probe() for !pm_runtime_enabled()
        [media] s5p-jpeg: Fix wrong NV12 format parameters
        [media] s5k5baf: allow to handle arbitrary long i2c sequences
    • Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 2091f435
      Committed by Linus Torvalds
      Pull hwmon fixes from Guenter Roeck:
       "Fix PMBus driver problem with some multi-page voltage sensors and fix
        da9055 interrupt initialization"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (da9055) Remove use of regmap_irq_get_virq()
        hwmon: (pmbus) Support per-page exponent in linear mode
    • Merge tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 22446d3f
      Committed by Linus Torvalds
      Pull ACPI and power management fixes from Rafael Wysocki:
       "These include a fix for a recent ACPI hotplug regression, four
        concurrency related fixes and one PCI device removal fix for
        ACPI-based PCI hotplug (ACPIPHP), intel_pstate fix that should go into
        stable, three simple ACPI cleanups and a new entry for the ACPI video
        blacklist.
      
        Specifics:
      
         - Fix for a recent ACPI hotplug regression causing a NULL pointer
           dereference to occur while handling ACPI eject notifications for
           already ejected devices.  From Toshi Kani.
      
         - Four concurrency-related fixes for ACPIPHP.  Two of them add
           missing locking and the other two fix race conditions related to
           reference counting.
      
         - ACPIPHP fix to avoid NULL pointer dereferences during device
           removal involving Virtual Functions.
      
         - intel_pstate fix to make it compute the percentage of time the CPU
           is busy properly.  From Dirk Brandewie.
      
         - Removal of two unnecessary NULL pointer checks in ACPI code and a
           fix for sscanf() format string from Dan Carpenter and Luis G.F.
      
         - New ACPI video blacklist entry for HP EliteBook Revolve 810 from
           Mika Westerberg"
      
      * tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / hotplug: Fix panic on eject to ejected device
        ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
        ACPI / proc: remove unneeded NULL check
        ACPI / utils: remove a pointless NULL check
        ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
        intel_pstate: Take core C0 time into account for core busy calculation
        ACPI / hotplug / PCI: Fix bridge removal race vs dock events
        ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
        ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
        ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
        ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order
    • libceph: do not dereference a NULL bio pointer · 0ec1d15e
      Committed by Ilya Dryomov
      Commit f38a5181 ("ceph: Convert to immutable biovecs") introduced
      a NULL pointer dereference, which broke rbd in -rc1.  Fix it.
      
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • Merge tag 'efi-urgent' into x86/urgent · a3b072cd
      Committed by H. Peter Anvin
       * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • libceph: take map_sem for read in handle_reply() · ff513ace
      Committed by Ilya Dryomov
      Handling redirect replies requires both map_sem and request_mutex.
      Taking map_sem unconditionally near the top of handle_reply() avoids
      possible race conditions that arise from releasing request_mutex to be
      able to acquire map_sem in redirect reply case.  (Lock ordering is:
      map_sem, request_mutex, crush_mutex.)
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: factor out logic from ceph_osdc_start_request() · 0bbfdfe8
      Committed by Ilya Dryomov
      Factor out logic from ceph_osdc_start_request() into a new helper,
      __ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
      taking locks and calling __ceph_osdc_start_request().
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • arm64: defconfig: Expand default enabled features · 55834a77
      Committed by Mark Rutland
      FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
      in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
      Versatile Express. As these attach to a Motherboard Express V2M-P1 it
      would be useful to have support for some V2M-P1 peripherals enabled by
      default.
      
      Additionally, a couple of features have been introduced since the last
      defconfig update (CMA, jump labels) that would be good to have enabled
      by default to ensure they are built and boot tested.
      
      This patch updates the arm64 defconfig to enable support for these
      devices and features. The arm64 Kconfig is modified to select
      HAVE_PATA_PLATFORM, which is required to enable support for the
      CompactFlash controller on the V2M-P1.
      
      A few options which don't need to appear in defconfig are trimmed:
      
      * BLK_DEV - selected by default
      * EXPERIMENTAL - otherwise gone from the kernel
      * MII - selected by drivers which require it
      * USB_SUPPORT - selected by default
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: asm: remove redundant "cc" clobbers · 95c41896
      Committed by Will Deacon
      cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
      from inline asm blocks that only use these instructions to implement
      conditional branches.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4
      Committed by Will Deacon
      Linux requires a number of atomic operations to provide full barrier
      semantics; that is, no memory accesses after the operation can be
      observed before any accesses up to and including the operation in
      program order.
      
      On arm64, these operations have been incorrectly implemented as follows:
      
      	// A, B, C are independent memory locations
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldaxr	x0, [B]		// Exclusive load with acquire
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      
      	<Access [C]>
      
      The assumption here is that two half barriers are equivalent to a
      full barrier, so the only permitted ordering would be A -> B -> C
      (where B is the atomic operation involving both a load and a store).
      
      Unfortunately, this is not the case by the letter of the architecture
      and, in fact, the accesses to A and C are permitted to pass their
      nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
      or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
      store-release on B). This is a clear violation of the full barrier
      requirement.
      
      The simple way to fix this is to implement the same algorithm as ARMv7
      using explicit barriers:
      
      	<Access [A]>
      
      	// atomic_op (B)
      	dmb	ish		// Full barrier
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stxr	w1, x0, [B]	// Exclusive store
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      but this has the undesirable effect of introducing *two* full barrier
      instructions. A better approach is actually the following, non-intuitive
      sequence:
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      The simple observations here are:
      
        - The dmb ensures that no subsequent accesses (e.g. the access to C)
          can enter or pass the atomic sequence.
      
        - The dmb also ensures that no prior accesses (e.g. the access to A)
          can pass the atomic sequence.
      
        - Therefore, no prior access can pass a subsequent access, or
          vice-versa (i.e. A is strictly ordered before C).
      
        - The stlxr ensures that no prior access can pass the store component
          of the atomic operation.
      
      The only tricky part remaining is the ordering between the ldxr and the
      access to A, since the absence of the first dmb means that we're now
      permitting re-ordering between the ldxr and any prior accesses.
      
      From an (arbitrary) observer's point of view, there are two scenarios:
      
        1. We have observed the ldxr. This means that if we perform a store to
           [B], the ldxr will still return older data. If we can observe the
           ldxr, then we can potentially observe the permitted re-ordering
           with the access to A, which is clearly an issue when compared to
           the dmb variant of the code. Thankfully, the exclusive monitor will
           save us here since it will be cleared as a result of the store and
           the ldxr will retry. Notice that any use of a later memory
           observation to imply observation of the ldxr will also imply
           observation of the access to A, since the stlxr/dmb ensure strict
           ordering.
      
        2. We have not observed the ldxr. This means we can perform a store
           and influence the later ldxr. However, that doesn't actually tell
           us anything about the access to [A], so we've not lost anything
           here either when compared to the dmb variant.
      
      This patch implements this solution for our barriered atomic operations,
      ensuring that we satisfy the full barrier requirements where they are
      needed.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  8. 07 Feb, 2014: 2 commits