Commits · a878185c3b93e692ace0d1628a47f3d75504ab4f · openeuler / raspberrypi-kernel

14 Dec, 2011 6 commits

[SCSI] bnx2i: Fixed kernel panic caused by unprotected task->sc->request deref · a878185c

Committed by Eddie Wai on Dec 06, 2011

During session recovery, the conn_stop call will trigger a flush
to all outstanding SCSI cmds in the xmit queue.  This will set
all outstanding task->sc to NULL prior to the session_teardown
call which frees the task memory.

In the bnx2i SCSI response processing path, only the task was being checked
for NULL under the session lock before the task->sc->request dereferencing.
If there are outstanding SCSI cmd responses still pending processing, the
following kernel panic can occur, where task->sc was found to be NULL.

 Call Trace:
[   69.720205]  [<ffffffffa040d0d0>] bnx2i_process_new_cqes+0x290/0x3c0 [bnx2i]
[   69.804289]  [<ffffffffa040d233>] bnx2i_fastpath_notification+0x33/0xa0 [bnx2i]
[   69.891490]  [<ffffffffa040d37b>] bnx2i_indicate_kcqe+0xdb/0x330 [bnx2i]
[   69.971427]  [<ffffffffa03eac5e>] service_kcqes+0x16e/0x1d0 [cnic]
[   70.045132]  [<ffffffffa03eacea>] cnic_service_bnx2x_kcq+0x2a/0x50 [cnic]
[   70.126105]  [<ffffffffa03ead53>] cnic_service_bnx2x_bh+0x43/0x140 [cnic]
[   70.207081]  [<ffffffff81060676>] tasklet_action+0x66/0x110
[   70.273521]  [<ffffffff8106025f>] __do_softirq+0xef/0x220
[   70.337887]  [<ffffffff81447ebc>] call_softirq+0x1c/0x30

This patch adds the !task->sc check and also protects the sc dereferencing
under the session lock.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

a878185c
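
The check described in the bnx2i commit message above can be sketched in userspace C. This is only an illustration under stated assumptions: the struct definitions, the toy `session_lock`/`session_unlock` helpers, and `response_tag` are hypothetical stand-ins, not the driver's actual code. It shows both `task` and `task->sc` being tested, and the dereference being done, while the lock is held.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct request    { int tag; };
struct scsi_cmnd  { struct request *request; };
struct iscsi_task { struct scsi_cmnd *sc; };
struct session    { int locked; };   /* toy lock, not a real spinlock */

static void session_lock(struct session *s)   { s->locked = 1; }
static void session_unlock(struct session *s) { s->locked = 0; }

/*
 * Return the request tag for a response, or -1 if the task is gone or
 * its command was already flushed (task->sc == NULL).
 */
int response_tag(struct session *s, struct iscsi_task *task)
{
    int tag = -1;

    session_lock(s);
    if (task && task->sc)              /* check task AND task->sc */
        tag = task->sc->request->tag;  /* dereference under the lock */
    session_unlock(s);

    return tag;
}
```

A caller that races with a flush then sees -1 instead of dereferencing a NULL `task->sc`.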

[SCSI] qla4xxx: check for failed conn setup · ff1d0319

Committed by Mike Christie on Dec 01, 2011

iscsi_conn_setup can fail so we must check for NULL being
returned.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

ff1d0319

[SCSI] qla4xxx: a small loop fix · e1cd89c5

Committed by Tomas Henzl on Dec 01, 2011

When qla4xxx_get_fwddb_entry returns QLA_ERROR,
next_idx is not updated:
      for (idx = 0; idx < max_ddbs; idx = next_idx) {
              ret = qla4xxx_get_fwddb_entry(ha, idx, NULL, 0, NULL,
                                            &next_idx, &state, &conn_err,
                                            NULL, NULL);
              if (ret == QLA_ERROR)
                      continue;

This means idx may never advance, so the 'idx < max_ddbs' condition stays
true and the loop runs forever.
Fix this by explicitly increasing next_idx in the error condition.

Maybe a break instead of continue is more appropriate; the decision is left
to the qlogic maintainer.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

e1cd89c5
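
The loop problem in the qla4xxx commit above can be reproduced in a small userspace model. The sketch below is an assumption-laden illustration, not the driver code: `get_entry` is a hypothetical stand-in for qla4xxx_get_fwddb_entry that fails for one chosen index without writing `*next_idx`, and `count_entries` shows one way to advance `next_idx` on the error path so the loop still terminates.

```c
#define QLA_SUCCESS 0
#define QLA_ERROR   1

/*
 * Hypothetical stand-in for the firmware query: fails for index `bad`
 * and, like the failure case described above, leaves *next_idx untouched.
 */
static int get_entry(int idx, int bad, int *next_idx)
{
    if (idx == bad)
        return QLA_ERROR;
    *next_idx = idx + 1;
    return QLA_SUCCESS;
}

/* Count the entries that could be read; terminates even when one fails. */
int count_entries(int max_ddbs, int bad)
{
    int idx, next_idx = 0, found = 0;

    for (idx = 0; idx < max_ddbs; idx = next_idx) {
        if (get_entry(idx, bad, &next_idx) == QLA_ERROR) {
            next_idx++;   /* without this, idx would never advance */
            continue;
        }
        found++;
    }
    return found;
}
```

Removing the `next_idx++` line makes `count_entries(4, 2)` spin forever at `idx == 2`, which is the hang the commit message describes.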

[SCSI] qla4xxx: fix flash/ddb support · 13483730

Committed by Mike Christie on Dec 01, 2011

With open-iscsi support, target entries persisted in the FLASH were not
logged in. Added support in the qla4xxx driver to log in, at probe
time, to the target entries saved in the FLASH by the user.
With these changes, upgrading to the new kernel with open-iscsi support in
qla4xxx ensures the user's original target entries are logged in on driver load.
Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Ravi Anand <ravi.anand@qlogic.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

13483730

[SCSI] zfcp: return early from slave_destroy if slave_alloc returned early · 44f747ff

Committed by Steffen Maier on Nov 18, 2011

zfcp_scsi_slave_destroy erroneously always tried to finish its task
even if the corresponding previous zfcp_scsi_slave_alloc returned
early. This can lead to kernel page faults on accessing uninitialized
fields of struct zfcp_scsi_dev in zfcp_erp_lun_shutdown_wait. Take the
port field of the struct to determine if slave_alloc returned early.

This zfcp bug is exposed by 4e6c82b3 (in turn fixing f7c9c6bb to be
compatible with 21208ae5) which can call slave_destroy for a
corresponding previous slave_alloc that did not finish.

This patch is based on James Bottomley's fix suggestion in
http://www.spinics.net/lists/linux-scsi/msg55449.html.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Cc: <stable@kernel.org> #2.6.38+
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

44f747ff

[SCSI] fcoe: Fix preempt count leak in fcoe_filter_frames() · 7e1e7ead

Committed by Thomas Gleixner on Nov 11, 2011

The error exit path leaks preempt count. Add the missing put_cpu().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

7e1e7ead
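
The fcoe fix above is about keeping get_cpu() and put_cpu() balanced on every exit path. A minimal userspace model of that rule follows; `preempt_count`, the toy `get_cpu`/`put_cpu` helpers, and `filter_frame` are assumptions made for illustration and do not reproduce the kernel implementation.

```c
/* Toy model: a plain counter stands in for the kernel's preempt count. */
int preempt_count;

static int  get_cpu(void) { preempt_count++; return 0; }
static void put_cpu(void) { preempt_count--; }

/* Returns 0 if the frame is accepted, -1 if it is dropped. */
int filter_frame(int frame_ok)
{
    get_cpu();

    if (!frame_ok) {
        put_cpu();   /* the error exit must drop the count as well */
        return -1;
    }

    put_cpu();
    return 0;
}
```

After any call, `preempt_count` is back at its previous value. Dropping the `put_cpu()` in the error branch would leave it incremented by one for every rejected frame.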

12 Dec, 2011 15 commits

[SCSI] qla2xxx: Update version number to 8.03.07.12-k. · 09b4402d

Committed by Chad Dupuis on Nov 18, 2011

Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

09b4402d

[SCSI] qla2xxx: Submit all chained IOCBs for passthrough commands on request queue 0. · 0d2aa38e

Committed by Giridhar Malavali on Nov 18, 2011

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

0d2aa38e

[SCSI] qla2xxx: Correct fc_host port_state display. · 49e85c23

Committed by Saurav Kashyap on Nov 18, 2011

[jejb: checkpatch fixes]
Add more fine-grained parsing of vha->loop_state to export a more accurate
fc_host port_state.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

49e85c23

[SCSI] qla2xxx: Disable generating pause frames when firmware hang detected for ISP82xx. · 63154916

Committed by Giridhar Malavali on Nov 18, 2011

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

63154916

[SCSI] qla2xxx: Clear mailbox busy flag during premature mailbox completion for ISP82xx. · 8937f2f1

Committed by Giridhar Malavali on Nov 18, 2011

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

8937f2f1

[SCSI] qla2xxx: Encapsulate prematurely completing mailbox commands during ISP82xx firmware hang. · c8f6544e

Committed by Chad Dupuis on Nov 18, 2011

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

c8f6544e

[SCSI] qla2xxx: Display IPE error message for ISP82xx. · 10a340e6

Committed by Chad Dupuis on Nov 18, 2011

[jejb: fixup checkpatch error]
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

10a340e6

[SCSI] qla2xxx: Return the correct value for a mailbox command if 82xx is in reset recovery. · 1806fcd5

Committed by Andrew Vasquez on Nov 18, 2011

We need to return QLA_FUNCTION_TIMEOUT immediately; otherwise we mess up the
mailbox command state machine.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

1806fcd5

[SCSI] qla2xxx: Enable Minidump by default with default capture mask 0x1f. · 3aadff35

Committed by Giridhar Malavali on Nov 18, 2011

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

3aadff35

[SCSI] qla2xxx: Stop unconditional completion of mailbox commands issued in... · 841c5e5c

Committed by Giridhar Malavali on Nov 18, 2011

[SCSI] qla2xxx: Stop unconditional completion of mailbox commands issued in interrupt mode during firmware hang.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

841c5e5c

[SCSI] qla2xxx: Revert back the request queue mapping to request queue 0. · 0cd33fcf

Committed by Giridhar Malavali on Nov 18, 2011

If there is an error creating multiple response queues then we need to revert
the request queue mapping back to request queue 0.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

0cd33fcf

[SCSI] qla2xxx: Don't call alloc_fw_dump for ISP82XX. · be5ea3cf

Committed by Saurav Kashyap on Nov 18, 2011

Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

be5ea3cf

[SCSI] qla2xxx: Check for SCSI status on underruns. · 4e85e3d9

Committed by Arun Easi on Nov 18, 2011

Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

4e85e3d9

[SCSI] qla2xxx: Remove qla2x00_wait_for_loop_ready function. · ad537689

Committed by Saurav Kashyap on Nov 18, 2011

This function can wait for 5 minutes under certain scenarios. One of them is when
the port is down from the switch and a bus reset is issued. The bus reset used to
wait 5 minutes for the loop, and upper-layer callers used to hang and print a
stack trace after being stuck for 120 seconds. It is legacy code that was
used when the driver did its own queuing of commands.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

ad537689

[SCSI] mpt2sas: _scsih_smart_predicted_fault uses GFP_KERNEL in interrupt context · f6a290b4

Committed by Anton Blanchard on Nov 07, 2011

_scsih_smart_predicted_fault is called in an interrupt and therefore
must allocate memory using GFP_ATOMIC.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

f6a290b4

10 Dec, 2011 9 commits

Linux 3.2-rc5 · dc47ce90
Committed by Linus Torvalds on Dec 09, 2011

dc47ce90

Merge git://git.samba.org/sfrench/cifs-2.6 · 8def5f51

Committed by Linus Torvalds on Dec 09, 2011

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions

8def5f51

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a776878d

Committed by Linus Torvalds on Dec 09, 2011

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Calling __pa() with an ioremap()ed address is invalid
  x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
  x86/intel_mid: Kconfig select fix
  x86/intel_mid: Fix the Kconfig for MID selection

a776878d

Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6 · e2f4e0bc

Committed by Linus Torvalds on Dec 09, 2011

* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
  spi/gpio: fix section mismatch warning
  spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
  spi/nuc900: Include linux/module.h
  spi/ath79: fix compile error due to missing include

e2f4e0bc

Merge branch 'for-linus' of git://neil.brown.name/md · af209e0a

Committed by Linus Torvalds on Dec 09, 2011

* 'for-linus' of git://neil.brown.name/md:
  md: raid5 crash during degradation
  md/raid5: never wait for bad-block acks on failed device.
  md: ensure new badblocks are handled promptly.
  md: bad blocks shouldn't cause a Blocked status on a Faulty device.
  md: take a reference to mddev during sysfs access.
  md: refine interpretation of "hold_active == UNTIL_IOCTL".
  md/lock: ensure updates to page_attrs are properly locked.

af209e0a

Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 53523d52

Committed by Linus Torvalds on Dec 09, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.

53523d52

Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 592d44a5

Committed by Linus Torvalds on Dec 09, 2011

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Update amd-iommu F: patterns
  iommu/amd: Fix typo in kernel-parameters.txt
  iommu/msm: Fix compile error in mach-msm/devices-iommu.c
  Fix comparison using wrong pointer variable in dma debug code

592d44a5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 3ab345fc

Committed by Linus Torvalds on Dec 09, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Fix lost speaker volume controls
  ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
  ALSA: hda/realtek - Don't create extra controls with channel suffix
  ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
  ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
  ASoC: Provide a more complete DMA driver stub
  ASoC: Remove references to corgi and spitz from machine driver document
  ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
  ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
  ASoC: uda1380: Return proper error in uda1380_modinit failure path
  ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
  ASoC: Mark WM8994 ADC muxes as virtual
  ALSA: hda/realtek - Fix Oops in alc_mux_select()
  ALSA: sis7019 - give slow codecs more time to reset

3ab345fc

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 975e32c2

Committed by Linus Torvalds on Dec 09, 2011

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do no try to schedule task events if there are none
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  perf header: Use event_name() to get an event name
  perf stat: Failure with "Operation not supported"

975e32c2

09 Dec, 2011 10 commits

sys_getppid: add missing rcu_dereference · 031af165

Committed by Mandeep Singh Baines on Dec 08, 2011

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

031af165

rapidio/tsi721: modify PCIe capability settings · 1cee22b7

Committed by Alexandre Bounine on Dec 08, 2011

Modify initialization of PCIe capability registers in Tsi721 mport driver:
 - change Completion Timeout value to avoid unexpected data transfer
   aborts during intensive traffic.
 - replace hardcoded offset of PCIe capability block by making it use the
   common function.

This patch is applicable to kernel versions starting from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1cee22b7

rapidio/tsi721: fix mailbox resource reporting · b439e66f

Committed by Alexandre Bounine on Dec 08, 2011

Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by the RapidIO specification.  Mailbox
resources have to be properly reported to allow use of all available
mailboxes (the initial version reports only MBOX0).

This patch is applicable to kernel versions starting from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b439e66f

rapidio/tsi721: switch to dma_zalloc_coherent · ceb96398

Committed by Alexandre Bounine on Dec 08, 2011

Replace the pair dma_alloc_coherent()+memset() with the new
dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ceb96398

procfs: do not overflow get_{idle,iowait}_time for nohz · 2a95ea6c

Committed by Michal Hocko on Dec 08, 2011

Since commit a25cac51 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate a time in
microseconds to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This resulted in a bug where people saw [h]top going mad, reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
to a 32-bit type and is much more appropriate for cumulative time values
(unlike usecs_to_jiffies, which is intended for timeout calculations).
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2a95ea6c
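
The narrowing problem described in the procfs commit above can be illustrated with plain integer arithmetic. The sketch below is a simplified model under stated assumptions: `HZ` is fixed at 250, `usecs32_to_ticks` and `nsecs_to_ticks64` are hypothetical helpers rather than the kernel functions, and only the truncation to a 32-bit argument is modelled (the MAX_JIFFY_OFFSET clamp mentioned in the message is left out).

```c
#include <stdint.h>

#define HZ 250   /* assumed tick rate for this illustration */

/*
 * Model of a conversion whose argument is only 32 bits wide:
 * the upper bits of a cumulative microsecond counter are lost.
 */
uint64_t usecs32_to_ticks(uint64_t usecs)
{
    uint32_t narrowed = (uint32_t)usecs;   /* truncation happens here */

    return (uint64_t)narrowed * HZ / 1000000u;
}

/* 64-bit conversion from nanoseconds: stays monotonic for long uptimes. */
uint64_t nsecs_to_ticks64(uint64_t nsecs)
{
    return nsecs / (1000000000ull / HZ);
}
```

In this model a cumulative idle time of 5000 s converts to fewer ticks than 4000 s on the 32-bit path, because 5000 s in microseconds no longer fits in 32 bits, while the 64-bit path keeps growing as expected.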

mm: vmalloc: check for page allocation failure before vmlist insertion · 1368edf0

Committed by Mel Gorman on Dec 08, 2011

Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised.  Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area.  In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist, leading to a crash later in
get_vmalloc_info().

This patch adds a check for __vmalloc_area_node() failure within
__vmalloc_node_range.  It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.

Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>		[3.1.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1368edf0

mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks · d0215638

Committed by Michal Hocko on Dec 08, 2011

setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

  IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180
  *pdpt = 0000000000000000 *pde = f000ff53f000ff53
  Oops: 0000 [#1] SMP
  Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
  EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0
  EIP is at setup_zone_migrate_reserve+0xcd/0x180
  EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
  ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
  Call Trace:
  [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160
  [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20
  [<c08a771c>] init_per_zone_wmark_min+0x27/0x86
  [<c020111b>] do_one_initcall+0x2b/0x160
  [<c086639d>] kernel_init+0xbe/0x157
  [<c05cae26>] kernel_thread_helper+0x6/0xd
  Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
  EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
  CR2: 00000000000012b4

We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.

The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").

Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid is always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Dang Bo <bdang@vmware.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d0215638
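
The alignment step described in the commit above amounts to rounding a page frame number up to the next pageblock boundary. A minimal sketch, assuming the pageblock size is a power of two; `align_start_pfn` is a hypothetical helper name and the kernel's own rounding macro may differ.

```c
/*
 * Round pfn up to the next multiple of pageblock_nr_pages.
 * Assumes pageblock_nr_pages is a power of two.
 */
unsigned long align_start_pfn(unsigned long pfn,
                              unsigned long pageblock_nr_pages)
{
    return (pfn + pageblock_nr_pages - 1) & ~(pageblock_nr_pages - 1);
}
```

With an assumed pageblock size of 512 pages, an unaligned sample value such as 0x36ffe becomes 0x37000, while an already aligned pfn is returned unchanged, so every later step of pageblock_nr_pages lands on the start of a pageblock.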

mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages · 09761333

Committed by Hillf Danton on Dec 08, 2011

Avoid unlocking an unlocked page if we failed to lock it.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09761333
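
The pairing rule in the mm/migrate.c commit above (only unlock what was actually locked) can be shown with a small userspace model. Everything here is an assumption for illustration: `struct page` is a one-field toy, `trylock_page`/`unlock_page` are simplified stand-ins that merely share their names with the kernel helpers, and `migrate_one` is a hypothetical function, not the kernel's migration code.

```c
#include <errno.h>

/* Toy page with a single lock flag. */
struct page { int locked; };

/* Returns 1 if the lock was taken, 0 if someone else holds it. */
static int trylock_page(struct page *p)
{
    if (p->locked)
        return 0;
    p->locked = 1;
    return 1;
}

static void unlock_page(struct page *p) { p->locked = 0; }

/* Returns 0 on success, -EAGAIN if the page could not be locked. */
int migrate_one(struct page *p)
{
    int rc = -EAGAIN;

    if (!trylock_page(p))
        goto out;        /* lock not taken: skip the unlock below */

    rc = 0;              /* the actual migration work would go here */
    unlock_page(p);
out:
    return rc;
}
```

If the failure path jumped to a label placed before `unlock_page()`, a page locked by another holder would be unlocked behind its back, which is the mismatch the commit removes.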

thp: set compound tail page _count to zero · 58a84aa9

Committed by Youquan Song on Dec 08, 2011

Commit 70b50f94 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times.  But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized.  So when an
IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
tail page's _count does not equal zero.

  kernel BUG at include/linux/mm.h:386!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
  RIP  gup_huge_pud+0xf2/0x159
Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

58a84aa9

thp: add compound tail page _mapcount when mapped · b6999b19

Committed by Youquan Song on Dec 08, 2011

With the 3.2-rc kernel, IOMMU 2M pages in KVM work.  But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page allocation calls gup_huge_pmd().  If compound pages are used and the
page is a tail page, gup_huge_pmd() increases _mapcount to record that the
tail page is mapped, while gup_huge_pud() does not.

So when the mapped page is released, it results in a kernel oops
because the page is not marked as mapped.

This patch adds tail page handling for compound pages to the 1GB huge page
path, matching what is done for 2M pages.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
	-net none -device pci-assign,host=07:00.1

  kernel BUG at mm/swap.c:114!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    put_page+0x15/0x37
    kvm_release_pfn_clean+0x31/0x36
    kvm_iommu_put_pages+0x94/0xb1
    kvm_iommu_unmap_memslots+0x80/0xb6
    kvm_assign_device+0xba/0x117
    kvm_vm_ioctl_assigned_device+0x301/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
  RIP  put_compound_page+0xd4/0x168
Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b6999b19