提交 · 151743fd8dfb02956c5184b5f4f0f42677eb75bc · openanolis / cloud-kernel

21 2月, 2013 2 次提交

ahci: Add Device IDs for Intel Wellsburg PCH · 151743fd

由 James Ralston 提交于 2月 08, 2013

This patch adds the AHCI-mode SATA Device IDs for the Intel Wellsburg PCH
Signed-off-by: NJames Ralston <james.d.ralston@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

151743fd

ata_piix: Add Device IDs for Intel Wellsburg PCH · 3aee8bc5

由 James Ralston 提交于 2月 08, 2013

This patch adds the IDE-mode SATA Device IDs for the Intel Wellsburg PCH
Signed-off-by: NJames Ralston <james.d.ralston@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

3aee8bc5

26 1月, 2013 8 次提交

[SCSI] remove can_power_off flag from scsi_device · 44ec657b

由 Aaron Lu 提交于 1月 23, 2013

Commit 166a2967 "libata: tell scsi layer
device supports runtime power off" introduced the can_power_off flag for
scsi_device and is used to support ZPODD implementation in SCSI layer.
Since ZPODD is now implemented in ATA layer, that flag is no longer
needed, so remove it.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

44ec657b

[libata] scsi: no poll when ODD is powered off · 6f4c827e

由 Aaron Lu 提交于 1月 23, 2013

When the ODD is powered off, any action the user did to the ODD that
would generate a media event will trigger an ACPI interrupt, so the
poll for media event is no longer necessary. And the poll will also
cause a runtime status change, which will stop the ODD from staying in
powered off state, so the poll should better be stopped.

But since we don't have access to the gendisk structure in LLDs, here
comes the disk_events_disable_depth for scsi device. This field is a
hint set by LLDs to convey information to upper layer drivers. A value
of 0 means media poll is necessary for the device, while values above 0
means media poll is not needed and should better be skipped. So we can
increase its value when we are to power off the ODD in ATA layer and
decrease its value when the ODD is powered on, effectively silence the
media events poll.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

6f4c827e

[SCSI] sr: support runtime pm · 6c7f1e2f

由 Aaron Lu 提交于 1月 23, 2013

This patch adds runtime pm support for sr.

It did this by increasing the runtime usage_count of the device when
its block device is accessed. And decreasing the runtime usage_count
of the device when the access is done.

If there is media inside, runtime suspend is not allowed as we don't
always know if the ODD is being used or not.

The idea is discussed here:
http://thread.gmane.org/gmane.linux.acpi.devel/55243/focus=52703
and the restriction to check media inside is discussed here:
http://thread.gmane.org/gmane.linux.ide/53665/focus=58836Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Acked-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

6c7f1e2f

ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs · 29e674dd

由 Seth Heasley 提交于 1月 25, 2013

This patch adds the AHCI and RAID-mode SATA DeviceIDs for the Intel Avoton SOC.
Signed-off-by: NSeth Heasley <seth.heasley@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

29e674dd

ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs · aaa51527

由 Seth Heasley 提交于 1月 25, 2013

This patch adds the IDE-mode SATA DeviceIDs for the Intel Avoton SOC.
Signed-off-by: NSeth Heasley <seth.heasley@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

aaa51527

[libata] PM code cleanup for ata port · f5e6d0d0

由 Aaron Lu 提交于 1月 25, 2013

For system freeze, if the port is already runtime suspended, leave it
alone and just return. The port will be resumed on thaw before it will
be used.

And since we will call get_noresume for every device during prepare
phase, and the port is resumed during thaw phase, it can't be in runtime
suspended state during the poweroff phase. So remove the
runtime_suspended check in poweroff callback.

And for all suspend(freeze/suspend/poweroff/etc.), there is no need to
touch the device, so set no_autopsy and no_recovery for them all.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

f5e6d0d0

[libata] pm: differentiate system and runtime pm for ata port · a7ff60db

由 Aaron Lu 提交于 1月 25, 2013

We need to do different things for system PM and runtime PM, e.g. we do
not need to enable runtime wake for ZPODD when we are doing system
suspend, etc.

Currently, we use PMSG_SUSPEND for both system suspend and runtime
suspend and PMSG_ON for both system resume and runtime resume. Change
this by using PMSG_AUTO_SUSPEND for runtime suspend and PMSG_AUTO_RESUME
for runtime resume. And since PMSG_ON means no transition, it is changed
to PMSG_RESUME for ata port's system resume.

The ata_acpi_set_state is modified accordingly, and the sata case and
pata case is seperated for easy reading.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

a7ff60db

J
Revert "libata: export host controller number thru /sys" · e175435e
由 Jeff Garzik 提交于 1月 25, 2013
```
This reverts commit 1757d902.

Discussion continues upstream.
```
e175435e

22 1月, 2013 6 次提交

libata: do not suspend port if normal ODD is attached · 7e15e9be

由 Aaron Lu 提交于 1月 15, 2013

For ODDs, the upper layer will poll for media change every few
seconds, which will make it enter and leave suspend state very
often. And as each suspend will also cause a hard/soft reset,
the gain of runtime suspend is very little while the ODD may
malfunction after constantly being reset. So the idle callback
here will not proceed to suspend if a non-ZPODD capable ODD is
attached to the port.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

7e15e9be

libata: expose pm qos flags for ata device · a59b9aae

由 Aaron Lu 提交于 1月 15, 2013

Expose pm qos flags to user space so that user has a chance to disable
ZPODD feature, if he/she has a broken platform or devices or simply does
not like this feature.

This flag is exposed to user space only for ZPODD devices.

Due to this flag, it is possible the ODD is ZP ready but we didn't power
it off. So the zp_ready flag will need to be cleared whenever we found
the ODD is not in ZP ready state. Previously, once zp_ready is set, the
ODD will always be powered off and the flag will be cleared in
post_poweron. But this is no longer the case now.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

a59b9aae

libata: handle power transition of ODD · 21334205

由 Aaron Lu 提交于 1月 15, 2013

When ata port is runtime suspended, it will check if the ODD attched to
it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero
power ready state. And if this is not the case, the highest acpi state
will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD. And
if the ODD can be powered off, runtime wake capability needs to be
enabled and powered_off flag will be set to let resume code knows that
the ODD was in powered off state.

And on resume, before it is powered on, if it was powered off during
suspend, runtime wake capability needs to be disabled. After it is
recovered, the ODD is considered functional, post power on processing
like eject tray if the ODD is drawer type is done, and several ZPODD
related fields will also be reset.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

21334205

libata: check zero power ready status for ZPODD · 3dc67440

由 Aaron Lu 提交于 1月 15, 2013

Per the Mount Fuji spec, the ODD is considered zero power ready when:
  - For slot type ODD, no media inside;
  - For tray type ODD, no media inside and tray closed.

The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.

The information provided by the media status byte is not accurate, it
is possible that after a new disc is just inserted, the status byte
still returns media not present. So this information can not be used as
the deciding factor, we use sense code to decide if zpready status is
true.

When we first sensed the ODD in the zero power ready state, the
zp_sampled will be set and timestamp will be recoreded. And after ODD
stayed in this state for some pre-defined period, the ODD is considered
as power off ready and the zp_ready flag will be set. The zp_ready flag
serves as the deciding factor other code will use to see if power off is
OK for the ODD.

The Mount Fuji spec suggests a delay should be used here, to avoid the
case user ejects the ODD and then instantly inserts a new one again, so
that we can avoid a power transition. And some ODDs may be slow to place
its head to the home position after disc is ejected, so a delay here is
generally a good idea. And the delay time can be changed via the module
param zpodd_poweroff_delay.

The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

3dc67440

libata: move acpi notification code to zpodd · f064a20d

由 Aaron Lu 提交于 1月 15, 2013

Since the ata acpi notification code introduced in commit
3bd46600 is solely for ZPODD, and we
now have a dedicated place for it, move these code there.

And the ata_acpi_add_pm_notifier code is changed a little bit in that it
is now invoked when scsi device is not bound with ACPI yet, so the way
to get the acpi handle is different with the previous version. And the
ata_acpi_add/remove_pm_notifier is also simplified a little bit in that
it doesn't check if the acpi_device for the handle exists or not as the
odd_can_poweroff function already checked that.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

f064a20d

libata: identify and init ZPODD devices · afe75951

由 Aaron Lu 提交于 1月 15, 2013

The ODD can be enabled for ZPODD if the following three conditions are
satisfied:
1 The ODD supports device attention;
2 The platform can runtime power off the ODD through ACPI;
3 The ODD is either slot type or drawer type.
For such ODDs, zpodd_init is called and a new structure is allocated for
it to store ZPODD related stuffs.

And the zpodd_dev_enabled function is used to test if ZPODD is currently
enabled for this ODD.

A new config CONFIG_SATA_ZPODD is added to selectively build ZPODD code.
Signed-off-by: NAaron Lu <aaron.lu@intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

afe75951

15 1月, 2013 5 次提交

libata: export host controller number thru /sys · 1757d902

由 David Milburn 提交于 1月 14, 2013

As low-level drivers register their host controller(s), keep track
of the number of controllers and export thru /sys in a <host.port>
format so that udev can better match up port numbers with a
specific controller.

# pwd
/sys/devices/pci0000:00
# find . -name 'ata*' -print

(2nd controller with port multiplier attached)

./0000:00:1e.0/0000:05:01.0/ata2.7
./0000:00:1e.0/0000:05:01.0/ata2.7/link7/dev7.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.0/dev7.0.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.0/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.1/dev7.1.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.1/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.2/dev7.2.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.2/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.3/dev7.3.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.3/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.4/dev7.4.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.4/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.5/dev7.5.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.5/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.6/dev7.6.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.6/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.7/dev7.7.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.7/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.8/dev7.8.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.8/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.9/dev7.9.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.9/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/ata_port
./0000:00:1e.0/0000:05:01.0/ata2.7/ata_port/ata2.7
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.10/dev7.10.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.10/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.11/dev7.11.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.11/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.12/dev7.12.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.12/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.13/dev7.13.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.13/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.14/dev7.14.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.7/link7.14/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.8
./0000:00:1e.0/0000:05:01.0/ata2.8/link8/dev8.0/ata_device
./0000:00:1e.0/0000:05:01.0/ata2.8/link8/ata_link
./0000:00:1e.0/0000:05:01.0/ata2.8/ata_port
./0000:00:1e.0/0000:05:01.0/ata2.8/ata_port/ata2.8

(1st controller)

./0000:00:1f.2/ata1.1
./0000:00:1f.2/ata1.1/link1/dev1.0/ata_device
./0000:00:1f.2/ata1.1/link1/ata_link
./0000:00:1f.2/ata1.1/ata_port
./0000:00:1f.2/ata1.1/ata_port/ata1.1
./0000:00:1f.2/ata1.2
./0000:00:1f.2/ata1.2/link2/dev2.0/ata_device
./0000:00:1f.2/ata1.2/link2/ata_link
./0000:00:1f.2/ata1.2/ata_port
./0000:00:1f.2/ata1.2/ata_port/ata1.2
./0000:00:1f.2/ata1.3
./0000:00:1f.2/ata1.3/link3/dev3.0/ata_device
./0000:00:1f.2/ata1.3/link3/ata_link
./0000:00:1f.2/ata1.3/ata_port
./0000:00:1f.2/ata1.3/ata_port/ata1.3
./0000:00:1f.2/ata1.4
./0000:00:1f.2/ata1.4/link4/dev4.0/ata_device
./0000:00:1f.2/ata1.4/link4/ata_link
./0000:00:1f.2/ata1.4/ata_port
./0000:00:1f.2/ata1.4/ata_port/ata1.4
./0000:00:1f.2/ata1.5
./0000:00:1f.2/ata1.5/link5/dev5.0/ata_device
./0000:00:1f.2/ata1.5/link5/ata_link
./0000:00:1f.2/ata1.5/ata_port
./0000:00:1f.2/ata1.5/ata_port/ata1.5
./0000:00:1f.2/ata1.6
./0000:00:1f.2/ata1.6/link6/dev6.0/ata_device
./0000:00:1f.2/ata1.6/link6/ata_link
./0000:00:1f.2/ata1.6/ata_port
./0000:00:1f.2/ata1.6/ata_port/ata1.6
Signed-off-by: NDavid Milburn <dmilburn@redhat.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

1757d902

pata_samsung_cf: Use devm_clk_get() · 25effc36

由 Jingoo Han 提交于 1月 10, 2013

Use devm_clk_get() rather than clk_get() to make cleanup paths
more simple.
Signed-off-by: NJingoo Han <jg1.han@samsung.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

25effc36

[libata] replace sata_settings with devslp_timing · 803739d2

由 Shane Huang 提交于 12月 17, 2012

NCQ capability was used to check availability of SATA Settings page
from Identify Device Data Log, which contains DevSlp timing variables.
It does not work on some HDDs and leads to error messages.

IDENTIFY word 78 bit 5(Hardware Feature Control) can't work either
because it is only the sufficient condition of Identify Device data
log, not the necessary condition.

This patch replaced ata_device->sata_settings with ->devslp_timing
to only save DevSlp timing variables(8 bytes), instead of the whole
SATA Settings page(512 bytes).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=51881Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NShane Huang <shane.huang@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

803739d2

[libata] ahci: Add support for Enmotus Bobcat device. · 7f9c9f8e

由 Hugh Daschbach 提交于 1月 04, 2013

Silicon does not support standard AHCI BAR assignment.  Add
vendor/device exception to force BAR 2.
Signed-off-by: NHugh Daschbach <hugh.daschbach@enmotus.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

7f9c9f8e

[libata] ahci: Fix lack of command retry after a success error handler. · 1eaca39a

由 Bian Yu 提交于 12月 12, 2012

It should be a mistake introduced by commit 8d899e70.

qc->flags can't be set AC_ERR_*
Signed-off-by: NBian Yu <bianyu@kedacom.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

1eaca39a

12 1月, 2013 19 次提交

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · b719f430

由 Linus Torvalds 提交于 1月 11, 2013

Pull a hwmon patch from Guenter Roeck:
 "Fix build error in vexpress driver"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (vexpress) Fix build error seen if CONFIG_OF_DEVICE is not set

b719f430

Merge branch 'akpm' (incoming fixes from Andrew) · c727b4c6

由 Linus Torvalds 提交于 1月 11, 2013

Merge misc fixes from Andrew Morton:
 "The audit fixes have been floating around for a while - Al and Eric
  aren't responding to either myself or Kees so I asked Kees to
  re-review them and here they are."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  lib/rbtree.c: avoid the use of non-static __always_inline
  MAINTAINERS: Omar had moved
  mm: compaction: partially revert capture of suitable high-order page
  linux/audit.h: move ptrace.h include to kernel header
  kernel/audit.c: avoid negative sleep durations
  audit: catch possible NULL audit buffers
  audit: create explicit AUDIT_SECCOMP event type
  MAINTAINERS: fix a status pattern
  MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
  mm: thp: acquire the anon_vma rwsem for write during split
  mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep
  lockdep, rwsem: provide down_write_nest_lock()
  arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64
  mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
  mm: use aligned zone start for pfn_to_bitidx calculation
  fs/exec.c: work around icc miscompilation
  mm: compaction: fix echo 1 > compact_memory return error issue
  mm: memblock: fix wrong memmove size in memblock_merge_regions()
  drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
  mm: migrate: check page_count of THP before migrating
  ...

c727b4c6

lib/rbtree.c: avoid the use of non-static __always_inline · 3cb7a563

由 Michel Lespinasse 提交于 1月 11, 2013

lib/rbtree.c declared __rb_erase_color() as __always_inline void, and
then exported it with EXPORT_SYMBOL.

This was because __rb_erase_color() must be exported for augmented
rbtree users, but it must also be inlined into rb_erase() so that the
dummy callback can get optimized out of that call site.

(Actually with a modern compiler, none of the dummy callback functions
should even be generated as separate text functions).

The above usage is legal C, but it was unusual enough for some compilers
to warn about it.  This change makes things more explicit, with a static
__always_inline ____rb_erase_color function for use in rb_erase(), and a
separate non-inline __rb_erase_color function for use in
rb_erase_augmented call sites.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reported-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3cb7a563

MAINTAINERS: Omar had moved · a8906b0b

由 Chen Gang 提交于 1月 11, 2013

Signed-off-by: NChen Gang <gang.chen@asianux.com>
Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
Cc: Omar Ramirez Luna <omar.ramirez@copitl.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8906b0b

mm: compaction: partially revert capture of suitable high-order page · 8fb74b9f

由 Mel Gorman 提交于 1月 11, 2013

Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket. It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
immediately when it is made available").

The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed. For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped. However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.

In retrospect the capture patch took the wrong approach. What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way. While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested. This patch
partially reverts commit 1fb3f8ca ("mm: compaction: capture a
suitable high-order page immediately when it is made available").
Reported-and-tested-by: NEric Wong <normalperson@yhbt.net>
Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8fb74b9f

linux/audit.h: move ptrace.h include to kernel header · c0a3a20b

由 Mike Frysinger 提交于 1月 11, 2013

While the kernel internals want pt_regs (and so it includes
linux/ptrace.h), the user version of audit.h does not need it.  So move
the include out of the uapi version.

This avoids issues where people want the audit defines and userland
ptrace api.  Including both the kernel ptrace and the userland ptrace
headers can easily lead to failure.
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0a3a20b

kernel/audit.c: avoid negative sleep durations · 82919919

由 Andrew Morton 提交于 1月 11, 2013

audit_log_start() performs the same jiffies comparison in two places.
If sufficient time has elapsed between the two comparisons, the second
one produces a negative sleep duration:

  schedule_timeout: wrong timeout value fffffffffffffff0
  Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
  Call Trace:
    schedule_timeout+0x305/0x340
    audit_log_start+0x311/0x470
    audit_log_exit+0x4b/0xfb0
    __audit_syscall_exit+0x25f/0x2c0
    sysret_audit+0x17/0x21

Fix it by performing the comparison a single time.
Reported-by: NDave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

82919919

audit: catch possible NULL audit buffers · 0644ec0c

由 Kees Cook 提交于 1月 11, 2013

It's possible for audit_log_start() to return NULL.  Handle it in the
various callers.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Cc: Will Drewry <wad@google.com>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0644ec0c

audit: create explicit AUDIT_SECCOMP event type · 7b9205bd

由 Kees Cook 提交于 1月 11, 2013

The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process.  While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.

In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection.  This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Acked-by: NWill Drewry <wad@chromium.org>
Acked-by: NSteve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7b9205bd

MAINTAINERS: fix a status pattern · 56ca9d98

由 Zhang Yanfei 提交于 1月 11, 2013

Change MAINTAINED to Maintained.
Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

56ca9d98

MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h · 8fc8b12b

由 Zhang Yanfei 提交于 1月 11, 2013

This file was moved to arch/arm/mach-omap2/omap=5Fhwmod.h by commit
2a296c8f ("ARM: OMAP: Make plat/omap=5Fhwmod.h local to
mach-omap2").
Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8fc8b12b

mm: thp: acquire the anon_vma rwsem for write during split · 062f1af2

由 Mel Gorman 提交于 1月 11, 2013

Zhouping Liu reported the following against 3.8-rc1 when running a mmap
testcase from LTP.

  mapcount 0 page_mapcount 3
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:1798!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_mod cdc_ether iTCO_wdt i7core_edac coretemp usbnet iTCO_vendor_support mii crc32c_intel edac_core lpc_ich shpchp ioatdma mfd_core i2c_i801 pcspkr serio_raw bnx2 microcode dca vhost_net tun macvtap macvlan kvm_intel kvm uinput mgag200 sr_mod cdrom i2c_algo_bit sd_mod drm_kms_helper crc_t10dif ata_generic pata_acpi ttm ata_piix drm libata i2c_core megaraid_sas
  CPU 1
  Pid: 23217, comm: mmap10 Not tainted 3.8.0-rc1mainline+ #17 IBM IBM System x3400 M3 Server -[7379I08]-/69Y4356
  RIP: __split_huge_page+0x677/0x6d0
  RSP: 0000:ffff88017a03fc08  EFLAGS: 00010293
  RAX: 0000000000000003 RBX: ffff88027a6c22e0 RCX: 00000000000034d2
  RDX: 000000000000748b RSI: 0000000000000046 RDI: 0000000000000246
  RBP: ffff88017a03fcb8 R08: ffffffff819d2440 R09: 000000000000054a
  R10: 0000000000aaaaaa R11: 00000000ffffffff R12: 0000000000000000
  R13: 00007f4f11a00000 R14: ffff880179e96e00 R15: ffffea0005c08000
  FS:  00007f4f11f4a740(0000) GS:ffff88017bc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000037e9ebb404 CR3: 000000017a436000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process mmap10 (pid: 23217, threadinfo ffff88017a03e000, task ffff880172dd32e0)
  Stack:
   ffff88017a540ec8 ffff88017a03fc20 ffffffff816017b5 ffff88017a03fc88
   ffffffff812fa014 0000000000000000 ffff880279ebd5c0 00000000f4f11a4c
   00000007f4f11f49 00000007f4f11a00 ffff88017a540ef0 ffff88017a540ee8
  Call Trace:
    split_huge_page+0x68/0xb0
    __split_huge_page_pmd+0x134/0x330
    split_huge_page_pmd_mm+0x51/0x60
    split_huge_page_address+0x3b/0x50
    __vma_adjust_trans_huge+0x9c/0xf0
    vma_adjust+0x684/0x750
    __split_vma.isra.28+0x1fa/0x220
    do_munmap+0xf9/0x420
    vm_munmap+0x4e/0x70
    sys_munmap+0x2b/0x40
    system_call_fastpath+0x16/0x1b

Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
identified that commit 5a505085 ("mm/rmap: Convert the struct
anon_vma::mutex to an rwsem") and commit 4fc3f1d6 ("mm/rmap,
migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
were likely the problem.  Reverting these commits was reported to solve
the problem for Alexander.

Despite the reason for these commits, NUMA balancing is not the direct
source of the problem.  split_huge_page() expects the anon_vma lock to
be exclusive to serialise the whole split operation.  Ordinarily it is
expected that the anon_vma lock would only be required when updating the
avcs but THP also uses the anon_vma rwsem for collapse and split
operations where the page lock or compound lock cannot be used (as the
page is changing from base to THP or vice versa) and the page table
locks are insufficient.

This patch takes the anon_vma lock for write to serialise against parallel
split_huge_page as THP expected before the conversion to rwsem.
Reported-and-tested-by: NZhouping Liu <zliu@redhat.com>
Reported-by: NAlexander Beregalov <a.beregalov@gmail.com>
Reported-by: NAlex Xu <alex_y_xu@yahoo.ca>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

062f1af2

mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep · 572043c9

由 Jiri Kosina 提交于 1月 11, 2013

Commit 5a505085 ("mm/rmap: Convert the struct anon_vma::mutex to an
rwsem") turned anon_vma mutex to rwsem.

However, the properly annotated nested locking in mm_take_all_locks()
has been converted from

	mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);

to

	down_write(&anon_vma->root->rwsem);

which is incomplete, and causes the false positive report from lockdep
below.

Annotate the fact that mmap_sem is used as an outter lock to serialize
taking of all the anon_vma rwsems at once no matter the order, using the
down_write_nest_lock() primitive.

This patch fixes this lockdep report:

 =============================================
 [ INFO: possible recursive locking detected ]
 3.8.0-rc2-00036-g5f738967 #171 Not tainted
 ---------------------------------------------
 qemu-kvm/2315 is trying to acquire lock:
  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 but task is already holding lock:
  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&anon_vma->rwsem);
   lock(&anon_vma->rwsem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by qemu-kvm/2315:
  #0:  (&mm->mmap_sem){++++++}, at: do_mmu_notifier_register+0xfc/0x170
  #1:  (mm_all_locks_mutex){+.+...}, at: mm_take_all_locks+0x36/0x1b0
  #2:  (&mapping->i_mmap_mutex){+.+...}, at: mm_take_all_locks+0xc9/0x1b0
  #3:  (&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0

 stack backtrace:
 Pid: 2315, comm: qemu-kvm Not tainted 3.8.0-rc2-00036-g5f738967 #171
 Call Trace:
   print_deadlock_bug+0xf2/0x100
   validate_chain+0x4f6/0x720
   __lock_acquire+0x359/0x580
   lock_acquire+0x121/0x190
   down_write+0x3f/0x70
   mm_take_all_locks+0x149/0x1b0
   do_mmu_notifier_register+0x68/0x170
   mmu_notifier_register+0xe/0x10
   kvm_create_vm+0x22b/0x330 [kvm]
   kvm_dev_ioctl+0xf8/0x1a0 [kvm]
   do_vfs_ioctl+0x9d/0x350
   sys_ioctl+0x91/0xb0
   system_call_fastpath+0x16/0x1b
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mel@csn.ul.ie>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

572043c9

lockdep, rwsem: provide down_write_nest_lock() · 1b963c81

由 Jiri Kosina 提交于 1月 11, 2013

down_write_nest_lock() provides a means to annotate locking scenario
where an outer lock is guaranteed to serialize the order nested locks
are being acquired.

This is analogoue to already existing mutex_lock_nest_lock() and
spin_lock_nest_lock().
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mel@csn.ul.ie>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b963c81

arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64 · fef6c12e

由 Andrew Morton 提交于 1月 11, 2013

mn10300 doesn't provide its own atomic64 implementation, so it should pull
in the generic one.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fef6c12e

mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment · 10d73e65

由 Max Filippov 提交于 1月 11, 2013

Currently free_all_bootmem_core ignores that node_min_pfn may be not
multiple of BITS_PER_LONG.  Eg commit 6dccdcbe ("mm: bootmem: fix
checking the bitmap when finally freeing bootmem") shifts vec by lower
bits of start instead of lower bits of idx.  Also

  if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)

assumes that vec bit 0 corresponds to start pfn, which is only true when
node_min_pfn is a multiple of BITS_PER_LONG.  Also loop in the else
clause can double-free pages (e.g.  with node_min_pfn == start == 1,
map[0] == ~0 on 32-bit machine page 32 will be double-freed).

This bug causes the following message during xtensa kernel boot:

  bootmem::free_all_bootmem_core nid=0 start=1 end=8000
  BUG: Bad page state in process swapper  pfn:00001
  page:d04bd020 count:0 mapcount:-127 mapping:  (null) index:0x2
  page flags: 0x0()
  Call Trace:
    bad_page+0x8c/0x9c
    free_pages_prepare+0x5e/0x88
    free_hot_cold_page+0xc/0xa0
    __free_pages+0x24/0x38
    __free_pages_bootmem+0x54/0x56
    free_all_bootmem_core$part$11+0xeb/0x138
    free_all_bootmem+0x46/0x58
    mem_init+0x25/0xa4
    start_kernel+0x11e/0x25c
    should_never_return+0x0/0x3be7

The fix is the following:
 - always align vec so that its bit 0 corresponds to start
 - provide BITS_PER_LONG bits in vec, if those bits are available in the
   map
 - don't free pages past next start position in the else clause.
Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Prasad Koya <prasad.koya@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10d73e65

mm: use aligned zone start for pfn_to_bitidx calculation · c060f943

由 Laura Abbott 提交于 1月 11, 2013

The current calculation in pfn_to_bitidx assumes that (pfn -
zone->zone_start_pfn) >> pageblock_order will return the same bit for
all pfn in a pageblock.  If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.

Consider the following with pageblock order = 10, zone start 2MB:

  pfn     | pfn - zone start | (pfn - zone start) >> page block order
  ----------------------------------------------------------------
  0x26000 | 0x25e00	   |  0x97
  0x26100 | 0x25f00	   |  0x97
  0x26200 | 0x26000	   |  0x98
  0x26300 | 0x26100	   |  0x98

This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block.  Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.

For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.
Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
Acked-by: NMel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c060f943

fs/exec.c: work around icc miscompilation · 6d92d4f6

由 Xi Wang 提交于 1月 11, 2013

The tricky problem is this check:

	if (i++ >= max)

icc (mis)optimizes this check as:

	if (++i > max)

The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).

This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior.  This
optimization effectively reverts the previous commit 362e6663
("exec.c, compat.c: fix count(), compat_count() bounds checking") that
tries to fix the check.

This patch simply moves ++ after the check.
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d92d4f6

mm: compaction: fix echo 1 > compact_memory return error issue · 7964c06d

由 Jason Liu 提交于 1月 11, 2013

when run the folloing command under shell, it will return error

  sh/$ echo 1 > /proc/sys/vm/compact_memory
  sh/$ sh: write error: Bad address

After strace, I found the following log:

  ...
  write(1, "1\n", 2)               = 3
  write(1, "", 4294967295)         = -1 EFAULT (Bad address)
  write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
  ) = 31

This tells system return 3(COMPACT_COMPLETE) after write data to
compact_memory.

The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE)
from sysctl_compaction_handler after compaction_nodes finished.
Signed-off-by: NJason Liu <r64343@freescale.com>
Suggested-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NMel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7964c06d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功