udf: Fix leak of UTF-16 surrogates into encoded strings

OSTA UDF specification does not mention whether the CS0 charset in case of two bytes per character encoding should be treated in UTF-16 or UCS-2. The sample code in the standard does not treat UTF-16 surrogates in any special way but on systems such as Windows which work in UTF-16 internally, filenames would be treated as being in UTF-16 effectively. In Linux it is more difficult to handle characters outside of Base Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte characters only. Just make sure we don't leak UTF-16 surrogates into the resulting string when loading names from the filesystem for now. CC: stable@vger.kernel.org # >= v4.6 Reported-by: N Mingye Wang <arthur200126@gmail.com> Signed-off-by: N Jan Kara <jack@suse.cz>

udf: Fix leak of UTF-16 surrogates into encoded strings
OSTA UDF specification does not mention whether the CS0 charset in case of two bytes per character encoding should be treated in UTF-16 or UCS-2. The sample code in the standard does not treat UTF-16 surrogates in any special way but on systems such as Windows which work in UTF-16 internally, filenames would be treated as being in UTF-16 effectively. In Linux it is more difficult to handle characters outside of Base Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte characters only. Just make sure we don't leak UTF-16 surrogates into the resulting string when loading names from the filesystem for now. CC: stable@vger.kernel.org # >= v4.6 Reported-by: N Mingye Wang <arthur200126@gmail.com> Signed-off-by: N Jan Kara <jack@suse.cz>
44f06ba8 · Jan Kara · 06856938 · 44f06ba8
隐藏空白更改
内联并排

Showing with 6 addition and 0 deletion

fs/udf/unicode.c fs/udf/unicode.c +6 -0

未找到文件。
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -28,6 +28,9 @@

 #include "udf_sb.h"

+#define SURROGATE_MASK 0xfffff800
+#define SURROGATE_PAIR 0x0000d800
+
 static int udf_uni2char_utf8(wchar_t uni,
 			     unsigned char *out,
 			     int boundlen)
@@ -37,6 +40,9 @@ static int udf_uni2char_utf8(wchar_t uni,
 	if (boundlen <= 0)
 		return -ENAMETOOLONG;

+	if ((uni & SURROGATE_MASK) == SURROGATE_PAIR)
+		return -EINVAL;
+
 	if (uni < 0x80) {
 		out[u_len++] = (unsigned char)uni;
 	} else if (uni < 0x800) {