Commits · 479badc364b52774d77264aaf81f4d4b375a4a97 · openanolis / cloud-kernel

24 Dec 2011 · 21 commits

m68k: make fp register stores consistent for m68k and ColdFire · 479badc3

Committed by Greg Ungerer on Nov 02, 2011

There is no reason we can't make the saved fp registers the same for all
m68k types and ColdFire. There is a little wasted space, but the code
consistency and cleanliness is a big win.

sigcontext.h is an exported header, but currently there are no in-mainline
users of the !__uClinux__ and __mcoldfire__ case that this change affects.
Even better, this change actually makes this structure consistent with
the out-of-mainline ColdFire/MMU code.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

m68knommu: no need to set register marker on traps · 46729d0e

Committed by Greg Ungerer on Oct 31, 2011

Commit 61619b12 ("m68k: merge mmu and
non-mmu include/asm/entry.h files") made the trap entry code basically
the same for mmu and non-mmu builds. This means we no longer need code
to mark the stack frame as "system-call" type or other in the non-mmu
trap handling entry points. This is done in the SAVE_ALL_INT macro now.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: support configure time command line for MMU m68k · d1db9120

Committed by Greg Ungerer on Oct 19, 2011

The non-MMU builds of m68k allow a fixed kernel boot command line to
be configured at configure time. Allow this for MMU builds as well.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: print memory layout info in boot log · e87c09a8

Committed by Greg Ungerer on Oct 13, 2011

Output a table of the kernel memory regions at boot time.
This is taken directly from the ARM architecture code that does this.
The table looks like this:

Virtual kernel memory layout:
    vector  : 0x00000000 - 0x00000400   (   0 KiB)
    kmap    : 0xd0000000 - 0xe0000000   ( 256 MiB)
    vmalloc : 0xc0000000 - 0xcfffffff   ( 255 MiB)
    lowmem  : 0x00000000 - 0x02000000   (  32 MiB)
      .init : 0x00128000 - 0x00134000   (  48 KiB)
      .text : 0x00020000 - 0x00118d54   ( 996 KiB)
      .data : 0x00118d60 - 0x00126000   (  53 KiB)
      .bss  : 0x00134000 - 0x001413e0   (  53 KiB)
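The sizes in the table are simple arithmetic on each region's bounds: the large regions are truncated to whole MiB, while the kernel sections are consistent with rounding up to whole KiB (truncation would give 995 KiB for .text rather than 996). The user-space sketch below reproduces that arithmetic for the kmap, vmalloc, lowmem and section rows. Because the commit says the code came from ARM, it assumes ARM-style MLM/MLK_ROUNDUP helper macros; those names and the hard-coded addresses are illustrative, not copied from the m68k source.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed ARM-style helpers: each expands to the three printf arguments
 * start, end, size. */
#define SZ_1K              1024UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MLM(b, t)          (b), (t), (((t) - (b)) >> 20)            /* truncated MiB */
#define MLK_ROUNDUP(b, t)  (b), (t), DIV_ROUND_UP((t) - (b), SZ_1K) /* rounded-up KiB */

/* The same size computations, exposed so they can be checked directly. */
unsigned long size_mib(unsigned long b, unsigned long t)
{
	return (t - b) >> 20;
}

unsigned long size_kib_roundup(unsigned long b, unsigned long t)
{
	return DIV_ROUND_UP(t - b, SZ_1K);
}

/* Print part of the example layout (addresses taken from the boot log). */
void print_layout(void)
{
	printf("Virtual kernel memory layout:\n"
	       "    kmap    : 0x%08lx - 0x%08lx   (%4lu MiB)\n"
	       "    vmalloc : 0x%08lx - 0x%08lx   (%4lu MiB)\n"
	       "    lowmem  : 0x%08lx - 0x%08lx   (%4lu MiB)\n"
	       "      .init : 0x%08lx - 0x%08lx   (%4lu KiB)\n"
	       "      .text : 0x%08lx - 0x%08lx   (%4lu KiB)\n"
	       "      .data : 0x%08lx - 0x%08lx   (%4lu KiB)\n"
	       "      .bss  : 0x%08lx - 0x%08lx   (%4lu KiB)\n",
	       MLM(0xd0000000UL, 0xe0000000UL),
	       MLM(0xc0000000UL, 0xcfffffffUL),
	       MLM(0x00000000UL, 0x02000000UL),
	       MLK_ROUNDUP(0x00128000UL, 0x00134000UL),
	       MLK_ROUNDUP(0x00020000UL, 0x00118d54UL),
	       MLK_ROUNDUP(0x00118d60UL, 0x00126000UL),
	       MLK_ROUNDUP(0x00134000UL, 0x001413e0UL));
}
```

Calling print_layout() prints the rows above in the same column layout as the boot log.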

This has been very useful while debugging the ColdFire virtual memory
support code. But in general I think it is nice to know exactly where
the kernel has laid everything out on boot.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68knommu: move definition of mach_gettod to where it is used · 361a541d

Committed by Greg Ungerer on Oct 19, 2011

The mach_gettod function pointer is only called from the time_no.c
code. So move its actual definition to there too. It is currently in
setup_no.c for no particularly good reason.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: selection of GENERIC_ATOMIC64 is not MMU specific · 5717a02b

Committed by Greg Ungerer on Oct 19, 2011

The selection of the CONFIG_GENERIC_ATOMIC64 option is not specific to the
MMU being present and enabled. It is a property of certain CPU families.
So select it based on those CPU types being selected.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: remove thread_info struct from thread struct · d25ba98a

Committed by Greg Ungerer on Sep 02, 2011

Currently on m68k we have a complete thread_info structure stored inside
of the thread_struct, and we also have it in the initial part of the kernel
stack. Mostly the code currently uses the one inside of the thread_struct,
only using the "task" pointer from the stack-based one.

This is wasteful and confusing; we should only have a single instance of
thread_info inside the stack page. This is the norm for all other
architectures.

This change makes m68k handle thread_info consistently on both MMU enabled
and non-MMU setups.
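Keeping the only thread_info at the base of the kernel stack works because the stack is THREAD_SIZE-aligned: masking any address inside the stack (such as the current stack pointer) with ~(THREAD_SIZE - 1) lands exactly on that base. The user-space sketch below illustrates the idea. THREAD_SIZE = 8192, the trimmed struct, and both helper names are assumptions for illustration, not m68k definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192UL   /* assumed; the real value is per-architecture */

struct thread_info {          /* trimmed-down stand-in */
	void *task;
	unsigned long flags;
};

/* Allocate a THREAD_SIZE-aligned "kernel stack" with thread_info at its base. */
void *alloc_thread_stack(void)
{
	void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
	if (stack)
		((struct thread_info *)stack)->flags = 0;
	return stack;
}

/* Recover thread_info from any address inside the stack, e.g. the stack pointer. */
struct thread_info *thread_info_from_sp(uintptr_t sp)
{
	return (struct thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}
```

The same alignment requirement is why the linker script must place the initial thread's stack on a THREAD_SIZE boundary.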
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: remove duplicate asm offset for task thread.info · 8d362b0d

Committed by Greg Ungerer on Sep 02, 2011

We have a duplicate name and definition for the offset of the thread.info
struct within the task struct in our asm-offsets.c code. Remove one of them,
and consolidate to use a single define, TASK_INFO.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

m68k: merge the init_task code for mmu and non-mmu targets · 409ee245

Committed by Greg Ungerer on Aug 30, 2011

The init_task code can be the same for both mmu and non-mmu targets.
None of the alignment carried out in the current init_task code
is necessary. The linker script takes care of aligning the init_thread
structure to a THREAD_SIZE boundary, and that is all we need.

So use the init_task.c code for all target types; this makes m68k
code consistent with what most other architectures do.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

m68knommu: remove unused fasthandler declaration · ed3da2c4

Committed by Greg Ungerer on Aug 30, 2011

The fasthandler code was removed long ago. Remove the now unused
declaration of it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68k: Fall back to __gpio_to_irq() for non-arch GPIOs · d85b4094

Committed by Mark Brown on Oct 26, 2011

gpiolib provides __gpio_to_irq() to map gpiolib gpios to interrupts - hook
that up on m68k.
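The fallback amounts to: map the GPIOs the platform code knows about directly to interrupt numbers, and hand anything else to gpiolib. In the sketch below the range constants are made-up stand-ins for the ColdFire MCFGPIO_* values, and gpiolib_to_irq_stub() is a hypothetical placeholder for gpiolib's __gpio_to_irq(), which in the kernel asks the owning gpio_chip for the mapping.

```c
#include <assert.h>
#include <errno.h>

/* Made-up stand-ins for the platform's MCFGPIO_* constants. */
#define MCFGPIO_IRQ_MIN     8
#define MCFGPIO_IRQ_MAX     16
#define MCFGPIO_IRQ_VECBASE 64

/* Hypothetical placeholder for __gpio_to_irq(): pretend GPIOs 100 and up
 * belong to an expander whose IRQs start at 200. */
int gpiolib_to_irq_stub(unsigned gpio)
{
	return gpio >= 100 ? 200 + (int)(gpio - 100) : -ENXIO;
}

int gpio_to_irq(unsigned gpio)
{
	/* On-chip GPIOs with a fixed interrupt mapping. */
	if (gpio >= MCFGPIO_IRQ_MIN && gpio < MCFGPIO_IRQ_MAX)
		return (int)gpio + MCFGPIO_IRQ_VECBASE;
	/* Everything else is resolved by gpiolib. */
	return gpiolib_to_irq_stub(gpio);
}
```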
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

clocksource: m86k: Convert to clocksource_register_hz/khz · a2a3dfb8

Committed by john stultz on Oct 25, 2011

Updated to merge the valid bits of the two m68k patches.

This converts the m68k clocksources to use clocksource_register_hz/khz

This is untested, so any assistance in testing would be appreciated!
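With clocksource_register_hz()/clocksource_register_khz() a driver passes only the counter frequency and the clocksource core chooses mult and shift itself, instead of the driver typically hard-coding a shift and computing the matching mult. The sketch below shows that fixed-shift arithmetic which drivers no longer need to do by hand: hz2mult() uses the same rounding as the kernel's clocksource_hz2mult(), and cyc2ns() applies ns = (cycles * mult) >> shift. The 1 MHz rate and the shift of 20 are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult such that ns ~= (cycles * mult) >> shift, rounded to nearest. */
uint32_t hz2mult(uint32_t hz, uint32_t shift)
{
	uint64_t tmp = NSEC_PER_SEC << shift;
	tmp += hz / 2;
	return (uint32_t)(tmp / hz);
}

/* Convert a cycle delta to nanoseconds with a precomputed mult/shift. */
uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

For a 1 MHz counter and shift 20 this gives mult = 1048576000, so 5 cycles convert to 5000 ns.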

CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

Linux 3.2-rc7 · 5f0a6e2d
Committed by Linus Torvalds on Dec 23, 2011

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a22681fa
Committed by Linus Torvalds on Dec 23, 2011
```
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Fix race between CPU hotplug and lglocks
```

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux · 6d451c57

Committed by Linus Torvalds on Dec 23, 2011

for linus: writeback reason binary tracing format fix

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: show writeback reason with __print_symbolic

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 71448c1f
Committed by Linus Torvalds on Dec 23, 2011
```
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kconfig: adapt update-po-config to new UML layout
```

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 4d18de94

Committed by Linus Torvalds on Dec 23, 2011

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes


Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 827fa4c7

Committed by Linus Torvalds on Dec 23, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: call d_instantiate after all ops are setup
  Btrfs: fix worker lock misuse in find_worker


Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 5d219c6b

Committed by Linus Torvalds on Dec 23, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq().


Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 155d4551

Committed by Linus Torvalds on Dec 23, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: xt_connbytes: handle negation correctly
  net: relax rcvbuf limits
  rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()
  net: introduce DST_NOPEER dst flag
  mqprio: Avoid panic if no options are provided
  bridge: provide a mtu() method for fake_dst_ops

Merge branch 'nf' of git://1984.lsi.us.es/net · 6350323a
Committed by David S. Miller on Dec 23, 2011

23 Dec 2011 · 18 commits

netfilter: xt_connbytes: handle negation correctly · 0354b48f

Committed by Florian Westphal on Dec 16, 2011

"! --connbytes 23:42" should match if the packet/byte count is not in range.

As there is no explicit "invert match" toggle in the match structure,
userspace swaps the from and to arguments
(i.e., as if "--connbytes 42:23" were given).

However, "what <= 23 && what >= 42" will always be false.

Change things so we use "||" in case "from" is larger than "to".

This change may look like it breaks backwards compatibility when "to" is 0.
However, older iptables binaries will refuse "connbytes 42:0",
and current releases treat it to mean "! --connbytes 0:42",
so we should be fine.
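The corrected range test can be sketched as follows. The function names are hypothetical and the counter is a plain integer rather than the conntrack accounting the real match reads; from and to are the values as userspace passes them, already swapped for the negated form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The plain range test quoted above: never true once from > to. */
bool connbytes_match_old(uint64_t what, uint64_t from, uint64_t to)
{
	return what <= to && what >= from;
}

/* Fixed test: a normal range when to >= from; otherwise the bounds were
 * swapped to express negation, so match anything outside [to, from]. */
bool connbytes_match(uint64_t what, uint64_t from, uint64_t to)
{
	if (to >= from)
		return what >= from && what <= to;
	return what < to || what > from;
}
```

With this rule "42:0" (from = 42, to = 0) matches only counts above 42, which is the same as "! --connbytes 0:42".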
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Btrfs: call d_instantiate after all ops are setup · 08c422c2

Committed by Al Viro on Dec 23, 2011

This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully set up in memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix worker lock misuse in find_worker · 8d532b2a

Committed by Chris Mason on Dec 23, 2011

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

net: relax rcvbuf limits · 0fd7bac6

Committed by Eric Dumazet on Dec 21, 2011

skb->truesize might be big even for a small packet.

It's even bigger after commit 87fb4b7b (net: more accurate skb
truesize) and with a big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.
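A simplified model of the relaxation: an admission test that adds the incoming skb's truesize before comparing against the receive buffer limit can refuse a single large-truesize skb even when nothing is queued, whereas a test that only looks at what is already queued always lets the first packet through. The function names and the exact comparisons are assumptions for illustration; the actual patch touches several receive paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Strict check: would queueing this skb push us past the limit? */
bool rcvbuf_full_old(unsigned int queued, unsigned int truesize, unsigned int rcvbuf)
{
	return queued + truesize > rcvbuf;
}

/* Relaxed check: refuse only once what is already queued exceeds the
 * limit, so an empty queue always accepts one packet. */
bool rcvbuf_full_relaxed(unsigned int queued, unsigned int rcvbuf)
{
	return queued > rcvbuf;
}
```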
Reported-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt() · a0a129f8

Committed by Xi Wang on Dec 22, 2011

Setting a large rps_flow_cnt like (1 << 30) on a 32-bit platform will
cause a kernel oops due to insufficient bounds checking.

	if (count > 1<<30) {
		/* Enforce a limit to prevent overflow */
		return -EINVAL;
	}
	count = roundup_pow_of_two(count);
	table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count));

Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as:

	... + (count * sizeof(struct rps_dev_flow))

where sizeof(struct rps_dev_flow) is 8.  (1 << 30) * 8 will overflow
32 bits.

This patch replaces the magic number (1 << 30) with a symbolic bound.
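The overflow and a symbolic bound can be demonstrated in user space by modelling a 32-bit size_t with uint32_t. ENTRY_SIZE is the 8-byte sizeof(struct rps_dev_flow) quoted above; HDR is a made-up header size and the helper names are hypothetical. The sketch re-checks the bound after rounding, because rounding up to a power of two can carry a count that passed the first check past the limit; that is a choice made for this sketch, not a description of the upstream patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ENTRY_SIZE 8u
#define HDR        16u                                /* assumed header size */
#define MAX_COUNT  ((UINT32_MAX - HDR) / ENTRY_SIZE)  /* symbolic bound */

/* Size computation as a 32-bit kernel would do it: wraps modulo 2^32. */
uint32_t table_size32(uint32_t count)
{
	return HDR + count * ENTRY_SIZE;
}

/* Smallest power of two >= x (x must not exceed 2^31). */
uint32_t roundup_pow2(uint32_t x)
{
	uint32_t p = 1;
	while (p < x)
		p <<= 1;
	return p;
}

/* Validate a requested count and compute the allocation size without overflow. */
int flow_table_bytes(uint32_t count, uint32_t *bytes)
{
	if (count > MAX_COUNT)
		return -EINVAL;
	count = roundup_pow2(count);
	if (count > MAX_COUNT)      /* rounding up may cross the bound again */
		return -EINVAL;
	*bytes = table_size32(count);
	return 0;
}
```

Here table_size32(1 << 30) wraps to just the header size, which is the undersized allocation the commit describes.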
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce DST_NOPEER dst flag · e688a604

Committed by Eric Dumazet on Dec 22, 2011

Chris Boot reported crashes occurring in ipv6_select_ident().

[  461.457562] RIP: 0010:[<ffffffff812dde61>]  [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7

[  461.578229] Call Trace:
[  461.580742] <IRQ>
[  461.582870]  [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2
[  461.589054]  [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155
[  461.595140]  [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b
[  461.601198]  [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
[  461.608786]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.614227]  [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543
[  461.620659]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.626440]  [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge]
[  461.633581]  [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459
[  461.639577]  [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
[  461.646887]  [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge]
[  461.653997]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.659473]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.665485]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.671234]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.677299]  [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
[  461.684891]  [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
[  461.691520]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
[  461.697572]  [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
[  461.704616]  [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
[  461.712329]  [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge]
[  461.719490]  [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
[  461.727223]  [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
[  461.734292]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
[  461.739758]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.746203]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
[  461.751950]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
[  461.758378]  [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]

This is caused by the bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.

The problem has been present since commit 87c48fa3 (ipv6: make fragment
identifications less predictable).

Introduce the DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fall back to the 'no peer attached' handling.
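The resulting control flow can be modelled in user space: use per-peer identifiers only for entries that are allowed to carry a peer, and otherwise fall back to one shared counter (which, in this model, never hands out 0). The structs, the DST_NOPEER bit value and select_ident() are hypothetical simplifications, not the kernel's dst_entry/inet_peer code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DST_NOPEER 0x0040u   /* hypothetical bit value */

struct peer { uint32_t next_id; };                  /* stand-in for inet_peer */
struct dst  { unsigned flags; struct peer *peer; }; /* stand-in for dst_entry */

static atomic_uint shared_id;   /* fallback counter for entries without a peer */

/* Pick a fragment identification: per-peer when allowed, else shared. */
uint32_t select_ident(struct dst *d)
{
	if (d && !(d->flags & DST_NOPEER) && d->peer)
		return d->peer->next_id++;

	unsigned old = atomic_load(&shared_id), next;
	do {
		next = old + 1;
		if (next == 0)          /* never hand out 0 */
			next = 1;
	} while (!atomic_compare_exchange_weak(&shared_id, &old, next));
	return next;
}
```

A shared entry such as fake_rtable would carry DST_NOPEER, so it always takes the shared-counter path and never touches a peer.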
Reported-by: Chris Boot <bootc@bootc.net>
Tested-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mqprio: Avoid panic if no options are provided · 7838f2ce

Committed by Thomas Graf on Dec 22, 2011

Userspace may not provide TCA_OPTIONS; in fact, tc currently does
not do so if no arguments are specified on the command line.
Return EINVAL instead of panicking.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: provide a mtu() method for fake_dst_ops · a13861a2

Committed by Eric Dumazet on Dec 21, 2011

Commit 618f9bc7 (net: Move mtu handling down to the protocol
depended handlers) forgot the bridge netfilter case, adding a NULL
dereference in ip_fragment().
Reported-by: Chris Boot <bootc@bootc.net>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://neil.brown.name/md · ad1fca20

Committed by Linus Torvalds on Dec 22, 2011

* 'for-linus' of git://neil.brown.name/md:
  md/bitmap: It is OK to clear bits during recovery.
  md: don't give up looking for spares on first failure-to-add
  md/raid5: ensure correct assessment of drives during degraded reshape.
  md/linear: fix hot-add of devices to linear arrays.


md/bitmap: It is OK to clear bits during recovery. · 961902c0

Committed by NeilBrown on Dec 23, 2011

commit d0a4bb49 introduced a
regression which is annoying but fairly harmless.

When writing to an array that is undergoing recovery (a spare
is being integrated into the array), the write will set bits
in the bitmap, but they will not be cleared when the
write completes.

For bits covering areas that have not been recovered yet this is not a
problem, as the recovery will clear the bits. However, bits set in an
already-recovered region will stay set and never be cleared.
This doesn't risk data integrity.  The only negatives are:
 - next time there is a crash, more resyncing than necessary will
   be done.
 - the bitmap doesn't look clean, which is confusing.

While an array is recovering we don't want to update the
'events_cleared' setting in the bitmap but we do still want to clear
bits that have very recently been set - providing they were written to
the recovering device.

So split those two needs, which previously both depended on 'success',
and always clear the bit if the write went to all devices.
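The split can be expressed as two independent decisions taken when a write completes. This is a hypothetical model: the struct and function names are illustrative, and "degraded" here simply means at least one device is missing or still recovering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a completed write. */
struct write_done {
	bool reached_all_devices;  /* including any device that is still recovering */
	bool array_degraded;       /* a device is missing or not yet fully recovered */
};

/* The bit can be cleared as soon as every device has the data. */
bool may_clear_bit(struct write_done w)
{
	return w.reached_all_devices;
}

/* 'events_cleared' only advances when the array is fully in sync. */
bool may_advance_events_cleared(struct write_done w)
{
	return w.reached_all_devices && !w.array_degraded;
}
```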
Signed-off-by: NeilBrown <neilb@suse.de>

md: don't give up looking for spares on first failure-to-add · 60fc1370

Committed by NeilBrown on Dec 23, 2011

Before performing a recovery we try to remove any spares that
might not be working, then add any that might have become relevant.

Currently we abort on the first spare that cannot be added.
This is a false optimisation.
It is conceivable that (depending on rules in the personality) a
subsequent spare might be accepted.
Also the loop does other things like count the available spares and
reset the 'recovery_offset' value.

If we abort early these might not happen properly.

So remove the early abort.
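A minimal model of the loop without the early abort is sketched below. The struct and its fields are hypothetical; resetting recovery_offset to 0 stands in for "recovery of this device starts from the beginning".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical candidate spare. */
struct spare {
	bool accepted;                      /* would the personality accept it? */
	unsigned long long recovery_offset;
};

/* Walk every candidate: skip rejected ones instead of aborting, so later
 * spares are still tried and the count and offsets are always updated. */
int add_spares(struct spare *s, int n)
{
	int added = 0;
	for (int i = 0; i < n; i++) {
		if (!s[i].accepted)
			continue;               /* old behaviour: stop the loop here */
		s[i].recovery_offset = 0;
		added++;
	}
	return added;
}
```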

In particular, if you have an array that is undergoing recovery and
which has extra spares, then the recovery may not restart after a
reboot, as the count of 'spares' might end up as zero.
Reported-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: ensure correct assessment of drives during degraded reshape. · 30d7a483

Committed by NeilBrown on Dec 23, 2011

While reshaping a degraded array (as when reshaping a RAID0 by first
converting it to a degraded RAID4) we currently get confused about
which devices are in_sync. In most cases we get it right, but in the
region that is being reshaped we need to treat non-failed devices as
in-sync when we have the data but haven't actually written it out yet.
Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/linear: fix hot-add of devices to linear arrays. · 09cd9270

Committed by NeilBrown on Dec 23, 2011

commit d70ed2e4
broke hot-add to a linear array.
After that commit, metadata is not written to devices until they
have been fully integrated into the array as determined by
saved_raid_disk.  That patch arranged to clear that field after
a recovery completed.

However for linear arrays, there is no recovery - the integration is
instantaneous.  So we need to explicitly clear the saved_raid_disk
field.
Signed-off-by: NeilBrown <neilb@suse.de>

sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq(). · 7cc85833

Committed by David S. Miller on Dec 22, 2011

This was silently working for many years and stopped working on
Niagara-T3 machines.

We need to set the MSIQ to VALID before we can set its state to IDLE.

On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
errors.  The hypervisor documentation says, rather ambiguously, that
the MSIQ must be "initialized" before one can set the state.

I previously understood this to mean merely that a successful setconf()
operation has been performed on the MSIQ, which we have done at this
point.  But it seems to also mean that it has been set VALID too.
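The required ordering can be modelled with a mock hypervisor that, like the Niagara-T3 behaviour described above, refuses a state change on a queue that is not yet VALID. All function names, status codes and the struct are hypothetical stand-ins, not the sun4v hypervisor API.

```c
#include <assert.h>

/* Hypothetical status codes and queue model. */
enum { HV_EOK = 0, HV_EINVAL = 1 };
enum msiq_state { MSIQ_STATE_IDLE, MSIQ_STATE_ERROR };

struct msiq {
	int configured;
	int valid;
	enum msiq_state state;
};

int hv_msiq_conf(struct msiq *q)
{
	q->configured = 1;
	return HV_EOK;
}

int hv_msiq_setvalid(struct msiq *q)
{
	if (!q->configured)
		return HV_EINVAL;
	q->valid = 1;
	return HV_EOK;
}

/* Models the stricter behaviour: state changes need a VALID queue. */
int hv_msiq_setstate(struct msiq *q, enum msiq_state s)
{
	if (!q->valid)
		return HV_EINVAL;
	q->state = s;
	return HV_EOK;
}

/* Configure, then mark VALID, and only then move the queue to IDLE. */
int build_msiq(struct msiq *q)
{
	int err = hv_msiq_conf(q);
	if (err == HV_EOK)
		err = hv_msiq_setvalid(q);
	if (err == HV_EOK)
		err = hv_msiq_setstate(q, MSIQ_STATE_IDLE);
	return err;
}
```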
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · b3b1b70e

Committed by Linus Torvalds on Dec 22, 2011

* 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: Fix usb/isp1760 build on sparc
  usb: gadget: epautoconf: do not change number of streams
  usb: dwc3: core: fix cached revision on our structure
  usb: musb: fix reset issue with full speed device


Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev · abe8809c
Committed by Linus Torvalds on Dec 22, 2011
```
* 'upstream-linus' of git://github.com/jgarzik/libata-dev:
  pata_of_platform: Add missing CONFIG_OF_IRQ dependency.
```
pata_of_platform: Add missing CONFIG_OF_IRQ dependency. · 19d40dca
Committed by David Miller on Dec 21, 2011
```
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
```

ipv4: using prefetch requires including prefetch.h · b9eda06f

Committed by Stephen Rothwell on Dec 22, 2011

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

22 Dec 2011 · 1 commit

VFS: Fix race between CPU hotplug and lglocks · e30e2fdf

Committed by Srivatsa S. Bhat on Dec 22, 2011

Currently, the *_global_[un]lock_online() routines are not at all synchronized
with CPU hotplug. Soft-lockups detected as a consequence of this race were
reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
for finding out that the root-cause of this issue is the race condition
between br_write_[un]lock() and CPU hotplug, which results in the lock states
getting messed up).

Fixing this race by just adding {get,put}_online_cpus() at appropriate places
in *_global_[un]lock_online() is not a good option, because, then suddenly
br_write_[un]lock() would become blocking, whereas they have been kept as
non-blocking all this time, and we would want to keep them that way.

So, overall, we want to ensure 3 things:
1. br_write_lock() and br_write_unlock() must remain as non-blocking.
2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
   for different sets of CPUs.
3. Either prevent any new CPU online operation in between this lock-unlock, or
   ensure that the newly onlined CPU does not proceed with its corresponding
   per-cpu spinlock unlocked.

To achieve all this:
(a) We introduce a new spinlock that is taken by the *_global_lock_online()
    routine and released by the *_global_unlock_online() routine.
(b) We register a callback for CPU hotplug notifications, and this callback
    takes the same spinlock as above.
(c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
    initialized in the lock_init() code, all future updates to it are done in
    the callback, under the above spinlock.
(d) The above bitmap is used (instead of cpu_online_mask) while locking and
    unlocking the per-cpu locks.

The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
br_write_lock-unlock sequence is in progress, the callback keeps spinning,
thus preventing the CPU online operation till the lock-unlock sequence is
complete. This takes care of requirement (3).

The bitmap that we maintain remains unmodified throughout the lock-unlock
sequence, since all updates to it are managed by the callback, which takes
the same spinlock as the one taken by the lock code and released only by the
unlock routine. Combining this with (d) above satisfies requirement (2).

Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
operations from racing with br_write_lock-unlock, requirement (1) is also
taken care of.
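A single-threaded user-space model of points (a) to (d) follows. The names, NR_CPUS = 4, and the spinlock helpers built on C11 atomics are assumptions for illustration, not the kernel's lglock API. Because the model runs in one thread, the hotplug callback uses a trylock and reports failure where the real callback would keep spinning until the unlock; calling it with coming_up = true stands for CPU_UP_PREPARE and false for CPU_DEAD.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

static atomic_bool percpu_lock[NR_CPUS];  /* the per-CPU spinlocks */
static atomic_bool cpu_lock;              /* (a) the new spinlock */
static bool lg_online[NR_CPUS];           /* (c) private copy of the online mask */

static void spin_lock(atomic_bool *l)    { while (atomic_exchange(l, true)) { /* spin */ } }
static bool spin_trylock(atomic_bool *l) { return !atomic_exchange(l, true); }
static void spin_unlock(atomic_bool *l)  { atomic_store(l, false); }

/* (c) initialise the private bitmap once, from the current online mask. */
void lg_lock_init(const bool online[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		lg_online[i] = online[i];
}

/* (a) + (d): take cpu_lock, then every per-CPU lock in the private bitmap. */
void global_lock_online(void)
{
	spin_lock(&cpu_lock);
	for (int i = 0; i < NR_CPUS; i++)
		if (lg_online[i])
			spin_lock(&percpu_lock[i]);
}

/* Release exactly the set that was locked: the bitmap cannot change meanwhile. */
void global_unlock_online(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (lg_online[i])
			spin_unlock(&percpu_lock[i]);
	spin_unlock(&cpu_lock);
}

/* (b) hotplug callback: updates the bitmap only under cpu_lock.
 * Returns false where the kernel callback would keep spinning. */
bool hotplug_callback_try(int cpu, bool coming_up)
{
	if (!spin_trylock(&cpu_lock))
		return false;
	lg_online[cpu] = coming_up;
	spin_unlock(&cpu_lock);
	return true;
}
```

While the global lock is held, a CPU cannot be added to the bitmap, so the unlock loop always releases the same set of per-CPU locks that the lock loop took.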

By the way, it is to be noted that a CPU offline operation can actually run
in parallel with our lock-unlock sequence, because our callback doesn't react
to notifications earlier than CPU_DEAD (in order to maintain our bitmap
properly). And this means, since we use our own bitmap (which is stale, on
purpose) during the lock-unlock sequence, we could end up unlocking the
per-cpu lock of an offline CPU (because we had locked it earlier, when the
CPU was online), in order to satisfy requirement (2). But this is harmless,
though it looks a bit awkward.
Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
