Unicode 6.0 has 7 character categories, and each category has subcategories:
Letter (L): lowercase (Ll), modifier (Lm), titlecase (Lt), uppercase (Lu), other (Lo)
Mark (M): spacing combining (Mc), enclosing (Me), non-spacing (Mn)
Number (N): decimal digit (Nd), letter (Nl), other (No)
Punctuation (P): connector (Pc), dash (Pd), initial quote (Pi), final quote (Pf), open (Ps), close (Pe), other (Po)
Symbol (S): currency (Sc), modifier (Sk), math (Sm), other (So)
Separator (Z): line (Zl), paragraph (Zp), space (Zs)
Other (C): control (Cc), format (Cf), not assigned (Cn), private use (Co), surrogate (Cs)
There are 3 ranges reserved for private use (Co subcategory):
U+E000..U+F8FF (6,400 code points), U+F0000..U+FFFFD (65,534) and U+100000..U+10FFFD (65,534).
Surrogates (Cs subcategory) use the range U+D800..U+DFFF (2,048 code points).
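
A minimal sketch of how these categories can be queried from Python, assuming the
standard-library unicodedata module. Its tables follow the Unicode version bundled
with the interpreter, which is not necessarily 6.0. The helper names major_category
and strip_punct_and_symbols are hypothetical and used for illustration only, and
dropping the P and S categories is an assumption about what should be removed:

```python
import unicodedata

def major_category(ch):
    # First letter of the two-letter general category, e.g. 'Lu' -> 'L'.
    return unicodedata.category(ch)[0]

def strip_punct_and_symbols(text):
    # Drop every character whose major category is P (punctuation) or S (symbol).
    return ''.join(ch for ch in text if major_category(ch) not in ('P', 'S'))

print(unicodedata.category('A'))           # Lu
print(unicodedata.category(chr(0xE000)))   # Co (private use)
print(unicodedata.category(chr(0xD800)))   # Cs (surrogate)
print(strip_punct_and_symbols('a+b, c!'))  # ab c
```

Filtering by category avoids having to maintain an explicit list of code point ranges.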
"""
## Brute-force version: list the Unicode ranges to strip explicitly. This list is not complete.
# text = re.sub('[\u0021-\u002f\u003a-\u0040\u005b-\u0060\u007b-\u007e\u00a1-\u00bf\u2000-\u206f\u2013-\u204a\u20a0-\u20bf\u2100-\u214f\u2150-\u218b\u2190-\u21ff\u2200-\u22ff\u2300-\u23ff\u2460-\u24ff\u2500-\u257f\u2580-\u259f\u25a0-\u25ff\u2600-\u26ff\u2e00-\u2e7f\u3000-\u303f\ufe50-\ufe6f\ufe30-\ufe4f\ufe10-\ufe1f\uff00-\uffef─◆╱]+','',text)