Commit dceef082
Authored April 12, 2010
Author: Norman Clarke
Committer: Jeremy Kemper, April 12, 2010
Improve reliability of Inflector.transliterate. [#4374 state:resolved]
Signed-off-by: Jeremy Kemper <jeremy@bitsweat.net>
Parent: 36f3634a
Showing 4 changed files with 93 additions and 25 deletions (+93 -25)
activesupport/CHANGELOG  +2 -0
activesupport/lib/active_support/inflector/transliterate.rb  +37 -24
activesupport/test/inflector_test_cases.rb  +4 -1
activesupport/test/transliterate_test.rb  +50 -0
activesupport/CHANGELOG @ dceef082
 *Rails 3.0.0 [beta 3] (pending)*
+* Improve transliteration quality. #4374 [Norman Clarke]
 * Speed up and add Ruby 1.9 support for ActiveSupport::Multibyte::Chars#tidy_bytes. #4350 [Norman Clarke]
 ...
activesupport/lib/active_support/inflector/transliterate.rb @ dceef082
 # encoding: utf-8
-require 'iconv'
-require 'kconv'
 require 'active_support/core_ext/string/multibyte'
 module ActiveSupport
   module Inflector
     extend self
-    # Replaces accented characters with their ascii equivalents.
-    def transliterate(string)
-      Iconv.iconv('ascii//ignore//translit', 'utf-8', string).to_s
-    end
-    if RUBY_VERSION >= '1.9'
-      undef_method :transliterate
-      def transliterate(string)
-        proxy = ActiveSupport::Multibyte.proxy_class.new(string)
-        proxy.normalize(:kd).gsub(/[^\x00-\x7F]+/, '')
-      end
+    # UTF-8 byte => ASCII approximate UTF-8 byte(s)
+    ASCII_APPROXIMATIONS = {
+      198 => [65, 69],   # Æ => AE
+      208 => 68,         # Ð => D
+      216 => 79,         # Ø => O
+      222 => [84, 104],  # Þ => Th
+      223 => [115, 115], # ß => ss
+      230 => [97, 101],  # æ => ae
+      240 => 100,        # ð => d
+      248 => 111,        # ø => o
+      254 => [116, 104], # þ => th
+      272 => 68,         # Đ => D
+      273 => 100,        # đ => d
+      294 => 72,         # Ħ => H
+      295 => 104,        # ħ => h
+      305 => 105,        # ı => i
+      306 => [73, 74],   # Ĳ => IJ
+      307 => [105, 106], # ĳ => ij
+      312 => 107,        # ĸ => k
+      319 => 76,         # Ŀ => L
+      320 => 108,        # ŀ => l
+      321 => 76,         # Ł => L
+      322 => 108,        # ł => l
+      329 => 110,        # ŉ => n
+      330 => [78, 71],   # Ŋ => NG
+      331 => [110, 103], # ŋ => ng
+      338 => [79, 69],   # Œ => OE
+      339 => [111, 101], # œ => oe
+      358 => 84,         # Ŧ => T
+      359 => 116         # ŧ => t
+    }
-    # The iconv transliteration code doesn't function correctly
-    # on some platforms, but it's very fast where it does function.
-    elsif "foo" != (Inflector.transliterate("föö") rescue nil)
-      undef_method :transliterate
-      def transliterate(string)
-        string.mb_chars.normalize(:kd). # Decompose accented characters
-          gsub(/[^\x00-\x7F]+/, '')     # Remove anything non-ASCII entirely (e.g. diacritics).
-      end
+    # Replaces accented characters with an ASCII approximation, or deletes it if none exists.
+    def transliterate(string)
+      ActiveSupport::Multibyte::Chars.new(string).tidy_bytes.normalize(:d).unpack("U*").map do |char|
+        ASCII_APPROXIMATIONS[char] || (char if char < 128)
+      end.compact.flatten.pack("U*")
     end
     # Replaces special characters in a string so that it may be used as part of a 'pretty' URL.
 ...
@@ -45,8 +60,6 @@ def transliterate(string)
     #   <%= link_to(@person.name, person_path(@person)) %>
     #   # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
     def parameterize(string, sep = '-')
-      # remove malformed utf8 characters
-      string = string.toutf8 unless string.is_utf8?
       # replace accented chars with their ascii equivalents
       parameterized_string = transliterate(string)
       # Turn unwanted chars into the separator
 ...
@@ -59,6 +72,6 @@ def parameterize(string, sep = '-')
         parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/i, '')
       end
       parameterized_string.downcase
-    end
+    end
   end
 end
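The table-driven approach of the new `transliterate` can be tried outside Rails. What follows is a minimal sketch, not the committed code. It assumes Ruby 2.2 or later and swaps in the standard library's `String#unicode_normalize` for `ActiveSupport::Multibyte::Chars#normalize`, skips `tidy_bytes`, and uses a trimmed-down table. The names `APPROXIMATIONS_SKETCH`, `strip_non_ascii_sketch` and `transliterate_sketch` are hypothetical. The first helper mimics the removed "decompose, then delete non-ASCII" fallback for comparison.

```ruby
# Hypothetical, trimmed-down table: Unicode codepoint => ASCII codepoint(s).
APPROXIMATIONS_SKETCH = {
  198 => [65, 69],   # Æ => AE
  216 => 79,         # Ø => O
  223 => [115, 115], # ß => ss
  230 => [97, 101],  # æ => ae
  248 => 111         # ø => o
}.freeze

# Mimics the removed fallback: decompose, then delete every non-ASCII run.
# Letters without a decomposition (Æ, ø, ß) simply disappear.
def strip_non_ascii_sketch(string)
  string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]+/, '')
end

# Mimics the new approach: decompose, then map each codepoint to a table
# entry, to itself if it is plain ASCII, or to nil (dropped by compact).
def transliterate_sketch(string)
  string.unicode_normalize(:nfd).unpack("U*").map do |codepoint|
    APPROXIMATIONS_SKETCH[codepoint] || (codepoint if codepoint < 128)
  end.compact.flatten.pack("U*")
end

puts strip_non_ascii_sketch("Ærøskøbing") # rskbing
puts transliterate_sketch("Ærøskøbing")   # AEroskobing
puts transliterate_sketch("Aßlar")        # Asslar
puts transliterate_sketch("Malmö")        # Malmo
```

Characters that are neither ASCII nor in the table, such as the kanji in "Japanese: 日本語", are dropped by both helpers.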
activesupport/test/inflector_test_cases.rb @ dceef082
 ...
@@ -188,7 +188,10 @@ module InflectorTestCases
   StringToParameterizedAndNormalized = {
     "Malmö" => "malmo",
     "Garçons" => "garcons",
-    "Ops\331" => "ops"
+    "Ops\331" => "opsu",
+    "Ærøskøbing" => "aeroskobing",
+    "Aßlar" => "asslar",
+    "Japanese: 日本語" => "japanese"
   }
   UnderscoreToHuman = {
 ...
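The changed expectation for "Ops\331" (from "ops" to "opsu") is consistent with `tidy_bytes` reinterpreting the stray byte 0xD9 as the ISO-8859-1 character Ù before transliteration, rather than discarding it. The sketch below illustrates that idea with the standard library only. It is an assumption-laden stand-in, not the Rails implementation: `tidy_bytes_sketch` and `ascii_only_sketch` are hypothetical names, Ruby 2.2 or later is assumed, and CP1252 handling is left out.

```ruby
# Hypothetical stand-in for tidy_bytes: valid UTF-8 is kept, and each
# invalid byte sequence is reinterpreted as ISO-8859-1 and re-encoded.
def tidy_bytes_sketch(string)
  string.dup.force_encoding(Encoding::UTF_8).scrub do |bad_bytes|
    bad_bytes.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
  end
end

# Decompose, then keep only ASCII codepoints (no approximation table here).
def ascii_only_sketch(string)
  string.unicode_normalize(:nfd).unpack("U*").select { |cp| cp < 128 }.pack("U*")
end

repaired = tidy_bytes_sketch("Ops\331")   # \331 is the single byte 0xD9
puts repaired.valid_encoding?             # true
puts ascii_only_sketch(repaired).downcase # opsu
```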
activesupport/test/transliterate_test.rb (new file, mode 0 → 100644) @ dceef082
+# encoding: utf-8
+require 'abstract_unit'
+require 'active_support/inflector/transliterate'
+class TransliterateTest < Test::Unit::TestCase
+  APPROXIMATIONS = {
+    "À" => "A", "Á" => "A", "Â" => "A", "Ã" => "A", "Ä" => "A", "Å" => "A", "Æ" => "AE",
+    "Ç" => "C", "È" => "E", "É" => "E", "Ê" => "E", "Ë" => "E", "Ì" => "I", "Í" => "I",
+    "Î" => "I", "Ï" => "I", "Ð" => "D", "Ñ" => "N", "Ò" => "O", "Ó" => "O", "Ô" => "O",
+    "Õ" => "O", "Ö" => "O", "Ø" => "O", "Ù" => "U", "Ú" => "U", "Û" => "U", "Ü" => "U",
+    "Ý" => "Y", "Þ" => "Th", "ß" => "ss", "à" => "a", "á" => "a", "â" => "a", "ã" => "a",
+    "ä" => "a", "å" => "a", "æ" => "ae", "ç" => "c", "è" => "e", "é" => "e", "ê" => "e",
+    "ë" => "e", "ì" => "i", "í" => "i", "î" => "i", "ï" => "i", "ð" => "d", "ñ" => "n",
+    "ò" => "o", "ó" => "o", "ô" => "o", "õ" => "o", "ö" => "o", "ø" => "o", "ù" => "u",
+    "ú" => "u", "û" => "u", "ü" => "u", "ý" => "y", "þ" => "th", "ÿ" => "y", "Ā" => "A",
+    "ā" => "a", "Ă" => "A", "ă" => "a", "Ą" => "A", "ą" => "a", "Ć" => "C", "ć" => "c",
+    "Ĉ" => "C", "ĉ" => "c", "Ċ" => "C", "ċ" => "c", "Č" => "C", "č" => "c", "Ď" => "D",
+    "ď" => "d", "Đ" => "D", "đ" => "d", "Ē" => "E", "ē" => "e", "Ĕ" => "E", "ĕ" => "e",
+    "Ė" => "E", "ė" => "e", "Ę" => "E", "ę" => "e", "Ě" => "E", "ě" => "e", "Ĝ" => "G",
+    "ĝ" => "g", "Ğ" => "G", "ğ" => "g", "Ġ" => "G", "ġ" => "g", "Ģ" => "G", "ģ" => "g",
+    "Ĥ" => "H", "ĥ" => "h", "Ħ" => "H", "ħ" => "h", "Ĩ" => "I", "ĩ" => "i", "Ī" => "I",
+    "ī" => "i", "Ĭ" => "I", "ĭ" => "i", "Į" => "I", "į" => "i", "İ" => "I", "ı" => "i",
+    "Ĳ" => "IJ", "ĳ" => "ij", "Ĵ" => "J", "ĵ" => "j", "Ķ" => "K", "ķ" => "k", "ĸ" => "k",
+    "Ĺ" => "L", "ĺ" => "l", "Ļ" => "L", "ļ" => "l", "Ľ" => "L", "ľ" => "l", "Ŀ" => "L",
+    "ŀ" => "l", "Ł" => "L", "ł" => "l", "Ń" => "N", "ń" => "n", "Ņ" => "N", "ņ" => "n",
+    "Ň" => "N", "ň" => "n", "ŉ" => "n", "Ŋ" => "NG", "ŋ" => "ng", "Ō" => "O", "ō" => "o",
+    "Ŏ" => "O", "ŏ" => "o", "Ő" => "O", "ő" => "o", "Œ" => "OE", "œ" => "oe", "Ŕ" => "R",
+    "ŕ" => "r", "Ŗ" => "R", "ŗ" => "r", "Ř" => "R", "ř" => "r", "Ś" => "S", "ś" => "s",
+    "Ŝ" => "S", "ŝ" => "s", "Ş" => "S", "ş" => "s", "Š" => "S", "š" => "s", "Ţ" => "T",
+    "ţ" => "t", "Ť" => "T", "ť" => "t", "Ŧ" => "T", "ŧ" => "t", "Ũ" => "U", "ũ" => "u",
+    "Ū" => "U", "ū" => "u", "Ŭ" => "U", "ŭ" => "u", "Ů" => "U", "ů" => "u", "Ű" => "U",
+    "ű" => "u", "Ų" => "U", "ų" => "u", "Ŵ" => "W", "ŵ" => "w", "Ŷ" => "Y", "ŷ" => "y",
+    "Ÿ" => "Y", "Ź" => "Z", "ź" => "z", "Ż" => "Z", "ż" => "z", "Ž" => "Z", "ž" => "z"
+  }
+  def test_transliterate_should_not_change_ascii_chars
+    (0..127).each do |byte|
+      char = [byte].pack("U")
+      assert_equal char, ActiveSupport::Inflector.transliterate(char)
+    end
+  end
+  def test_should_convert_accented_chars_to_approximate_ascii_chars
+    APPROXIMATIONS.each do |given, expected|
+      assert_equal expected, ActiveSupport::Inflector.transliterate(given)
+    end
+  end
+end