Unicode Security Notes
Page
Every Unicode Character Blob Page or
TXT file

Text below is to help with search indexing and copying and pasting, but it is missing some items from the PowerPoint slides.

Character Assassination
Adrian Crenshaw

About Adrian
I run Irongeek.com
I have an interest in InfoSec education
I don’t know everything - I’m just a geek with time on my hands
Sr. Information Security Engineer at a Fortune 1000
Co-Founder of Derbycon

To be clear concerning what this talk is about

Why this subject?
Lots of research has been done, but not many people talk about it
Complexity is the damnable enemy of security, but human language is complex, so what can you do?
Act as a setup for future research
To encourage others who are better at exploit development than me to look into it
Because I wanted to make an animation with cartoon letters stabbing each other

Why Unicode
There are more than English speakers out there
ASCII: American Standard Code for Information Interchange
What about other languages? Cyrillic, Chinese, Hebrew, Arabic, Klingon… (ok, sort of: http://wazu.jp/gallery/Test_Klingon.html)
Unicode lets computer systems support more languages, allowing for worldwide use

Unicode History
ASCII is 7-bit with just 95 printable characters, but an 8th bit was added to make other standards:
Extended ASCII
ISO/IEC 8859
ISO/IEC 8859 uses the last bit to add another 96 characters, plus control characters
You have to specify a part/character set/language to specify those 96
This still was not enough, and did not allow for a lot of mixed languages
The need was to represent all of the characters as unique code points, and not get confused amongst languages

Unicode History
Joe Becker (Xerox), Lee Collins & Mark Davis (Apple) started working on Unicode in 1987 to do this; version 1.0.0 was released in Oct 1991
Unicode started as a 16-bit character model (0x0-0xFFFF), with the first 256 code points the same as ISO-8859-1
Each character has a code point associated with it:
This has since been expanded, so Unicode has points from 0x0 to 0x10FFFF (1,114,112 points dec), though support varies
Most used points will be in the Basic Multilingual Plane (BMP), represented as U+0000 to U+FFFF

Encodings
UTF-8 (UCS Transformation Format 8-bit), meant to be backward compatible with ASCII
UTF-16 (Unicode Transformation Format 16-bit), which superseded UCS-2
UTF-32 (Unicode Transformation Format 32-bit)

BOM (Byte Order Marks)
UTF-8 prepends EFBBBF to data
UTF-16: FEFF Unicode Big Endian, FFFE Little Endian
UTF-32 generally does not use one

Encoding Examples
Omega U+03A9
AΩB
UTF-8
UTF-16
UTF-32

I hate Smart Quotes!
“Smart”
"Not so smart"
�Smart when dumb�
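A minimal Python sketch of how the mangled quotes above can arise, assuming the text was saved as Windows-1252 and then read back with a different encoding. The sample string and variable names are illustrative and do not come from the slides.

```python
# Curly quotes occupy bytes 0x93 and 0x94 in Windows-1252.
smart = "\u201cSmart\u201d"
raw = smart.encode("cp1252")
print(raw)  # b'\x93Smart\x94'

# Read as ISO 8859-1, those two bytes become invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")
print([hex(ord(ch)) for ch in as_latin1 if ord(ch) > 0x7F])  # ['0x93', '0x94']

# Read as UTF-8, they are invalid sequences and are replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8 == "\ufffdSmart\ufffd")  # True
```

Which symptom a reader actually sees (a replacement character, a box, or nothing) depends on the decoder and the font, so treat this as one plausible path rather than the only one.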
Why?
Microsoft extended ISO 8859-1, making some control characters in 80 to 9F printable for Windows-1252
“ ” ‚ ‘ ’ —
If Windows-1252 is confused for ISO 8859-1, you get � for these characters
Makes copying and pasting commands in tutorials a pain!
Related:

UTF-8 Encoding
Lower ASCII is the same in UTF-8; higher code points use continuation bytes
(table bogarded from Wikipedia)

UTF-16 Encoding
In UTF-16, U+10000 to U+10FFFF use surrogate pairs in the range 0xD800 to 0xDFFF
Steps

Mojibake!
Mojibake = "character" "transform"
AΩB✌C
Code Points: U+0041 U+03A9 U+0042 U+270C U+0043
UTF-8 byte string: EF BB BF 41 CE A9 42 E2 9C 8C 43
Mangled by reading as just ISO 8859-1 bytes: ï»¿AÎ©Bâ[9C][8C]C (9C and 8C are control characters in ISO 8859-1; Windows-1252 shows them as œ and Œ)

Find Your Character
Wikipedia List
Unicode Table
File Format
Unicode Code Converter v7.05

Typing Unicode
Windows: Alt, + key on keypad, type hex number
May have to edit HKEY_Current_User/Control Panel/Input Method and set EnableHexNumpad to "1".
OS X
Option+Command+t will let you select some
System Preferences -> Language & Text -> Input Sources
Enable “Unicode Hex Input”
Select U+ from the menu bar
Hold Option key, type in hex code

Obligatory XKCD Slide

Homoglyph/Visual Attacks
Confusables and Look-a-likes

Classic Phishing Obfuscations
Would you follow a link in email to AdriansHouseOfPwnage.com?
Text says one thing, link says another:
Confuse user with credentials section of a URL:
Firefox pops up a warning
IE just refuses to connect
Other ideas?

Homographs
Homographs = words that look the same
Homoglyphs = characters that look the same
Examples:
rnicrosoft.com vs. microsoft.com
paypa1.com vs. paypal.com
IR0NGEEK.COM vs. IRONGEEK.COM
Now, what about Unicode?

Problem: DNS is ASCII
DNS labels (the parts separated by dots) follow the LDH rule: Letters Digits Hyphen
This would not allow for international characters in DNS labels
Enter Punycode and IDNA

IDNA
Internationalized Domain Names in Applications (IDNA) allows non-ASCII characters in the host section of a URL to map to DNS host names

What about Homoglyphs in Unicode?
There are homoglyphs in Unicode that look the same as normal Latin characters, and these could be used for spoofing names. Examples:
googlе.com = xn--googl-3we.com
іucu.org = xn--ucu-ihd.org
pаypal.com = xn--pypal-4ve.com

Likely Sources for Homoglyphs
Cyrillic script: a, c, e, o, p, x and y
Latin alphabet appears twice, U+0021-007E (Basic Latin) & U+FF01-FF5E (Full width Latin):
Even some slashes

Slashes?
Can other domains be used?
www.microsoft.com⁄index.html.irongeek.com
Mouse over it

Homoglyph Attack Generator
http://www.irongeek.com/homoglyph-attack-generator.php
Combination of JavaScript and PHP libraries created by phlyLabs as part of phlyMail

Protections Implemented by Browsers
Firefox shows Punycode if:
Not in TLD White List (about:config→network.IDN.whitelist)
network.IDN_show_punycode set to true (default false)
Any of these blacklisted characters appear:
Updated at

Protections Implemented by Browsers
IE 9, and I assume 10, shows Punycode if:
There is a mismatch between the characters used in the URL and the language expectation
A character is not used in any language
There is a mixed set of scripts that do not belong together
Info may be out of date, most material references IE 7

Protections Implemented by Browsers
Chrome shows Punycode if:
Configured language of the browser (configured in the “Fonts and Languages” options) does not match
Incompatible set of scripts that do not belong
But there is a whitelist, so hard-to-confuse scripts like Latin with Chinese can be used
Characters in a black list

Defenses by Registrar
Registrars may not allow the character
For example, one registrar gave the following error when an attempt was made to register іucu.org (Cyrillic small letter Byelorussian-Ukrainian i U+0456):
May be gotten around by using / homoglyphs and a domain you already own; ノ Katakana Letter No (U+30CE) seems to work best

Approach
Used a domain we control, and the local hosts file to map the DNS entries
IE 10.0.8
FireFox 23.0.1
Chrome 28.0.1500.95 mg

Some Results

Other odd balls
іucu.org [xn--ucu-ihd.org] (і U+0456) could not be registered
These seemed to pass Registrar’s tests
ノ Katakana Letter No (U+30CE) seems to work in Firefox for the subdomain trick, but not in Chrome or IE

Display of IDNA in Web Apps
What does the webapp display?
How does it parse links?

Test Strings
Ω U+03A9

Outlook 2010
Sent from Gmail to campus mail
Pink phishing warning that must be clicked past to use links
4th, 7th and 8th links had parse errors

Gmail
Sent from Outlook mail to Gmail
2nd and 3rd links used to have a problem with ɡ (Latin small letter script G U+0261) but now work
4th link had problems with Cyrillic і (U+0456) if no http:// in front
7th and 8th links had parse errors because of ⁄ (fraction slash U+2044) and were split in two

Facebook
Seemed to render all but the fourth link as it was inputted
Punycode versions show
іucu.org without the preceding http:// gave issues.
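For reference, the Punycode forms of the homoglyph domains mentioned in this talk can be reproduced with Python's built-in `idna` codec, which follows the older IDNA 2003 rules. This is only a sketch for checking names by hand; it is not how any of the tested web apps or browsers work, and the helper name `to_punycode` is made up for illustration.

```python
import unicodedata

def to_punycode(host: str) -> str:
    """Return the ASCII (Punycode) form of a host name via the stdlib idna codec."""
    return host.encode("idna").decode("ascii")

# Each name contains one Cyrillic look-alike, written as an escape so it is visible.
for host in ["googl\u0435.com", "\u0456ucu.org", "p\u0430ypal.com"]:
    odd = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in host if ord(ch) > 0x7F]
    print(to_punycode(host), odd)
```

Listing the non-ASCII characters by name makes the substitution obvious even when the rendered host name looks like plain Latin.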
Cyrillic і (U+0456) seemed to confuse the parser
The ⁄ (fraction slash U+2044) in the last two links also seems to cause no oddities

Twitter
Twitter had the effect of rendering all of the URLs as a truncated, URL-shortened (using t.co), Punycode version
Except іucu.org without the preceding http://. Again, the soft-dotted Cyrillic і (U+0456) seemed to confuse the parser.
Twitter makes it pretty obvious that there is something funny about the URLs

Fonts Matter
Calibri:
Courier New:

Ok, besides Homoglyphs?

Steganography
“Covered Writing”
Hide text in text
Easy to detect by looking at the bytes, but may fool the human eye
Some examples look better than others, with Unicode support varying.
Can be used in Botnets:
Play with it here:

Stego Examples
Alternate between Latin and Full-width Latin: easy, just add/subtract 65248 decimal. Use U+205F as space
Use very close homoglyphs to encode single bits, skip if there are no close homoglyphs
Use 8 types of space-like characters (U+0020, U+2004, U+2005, U+2006, U+2008, U+2009, U+202F, U+205F) to encode 3 bits each (000, 001, 010, 011, 100, 101, 110, 111)
Use non-printable Tags in U+E0000 to U+E007F: also easy, just add/subtract 0xE0000
Examples:

Name Spoofing
IP Boards let me spoof Daren from Hak5’s screen name:
Twitter returned the error
Gmail/Google returned the error
More research needs to be done in these areas.

Right to left?
Josh Kelley mentioned this one to me
What about left to right mixed with right to left scripts?
Takes U+202E (Right-to-Left Override), U+202C stops it
http://irongeek.com/moc.tfosorcim//:ptth
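A rough Python sketch of three of the tricks above: the Right-to-Left Override wrapper, hiding ASCII in Tag characters by adding 0xE0000, and the full-width shift of 65248. The function names are hypothetical, and how the results render depends entirely on the application displaying them.

```python
RLO = "\u202e"  # Right-to-Left Override
PDF = "\u202c"  # Pop Directional Formatting, ends the override

def rlo_wrap(text: str) -> str:
    """Wrap text so a bidi-aware renderer displays it reversed."""
    return RLO + text + PDF

def to_tags(secret: str) -> str:
    """Hide ASCII text as non-printable Tag characters (add 0xE0000)."""
    return "".join(chr(ord(ch) + 0xE0000) for ch in secret)

def from_tags(text: str) -> str:
    """Recover hidden ASCII from any Tag characters (subtract 0xE0000)."""
    return "".join(chr(ord(ch) - 0xE0000) for ch in text
                   if 0xE0000 <= ord(ch) <= 0xE007F)

def to_fullwidth(text: str) -> str:
    """Shift printable ASCII U+0021-007E to the full-width forms (add 65248)."""
    return "".join(chr(ord(ch) + 65248) if 0x21 <= ord(ch) <= 0x7E else ch
                   for ch in text)

# The characters after the override are stored reversed; a renderer that
# honors U+202E shows them as http://microsoft.com.
spoof = "http://irongeek.com/" + rlo_wrap("moc.tfosorcim//:ptth")

cover = "hello" + to_tags("hi")
print(len(cover), from_tags(cover))        # 7 hi
print(to_fullwidth("A") == "\uff21")       # True
```

Looking at the code points (or raw bytes) exposes all three tricks immediately, which matches the point that this kind of hiding fools the eye rather than a byte-level inspection.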
More details at:

What about file names?
Just how they are displayed

Non Visual
http://www.unicode.org/reports/tr36/
UTF-8 Exploits
Text Comparison
Buffer Overflows
Property and Character Stability
Deletion of Code Points
Secure Encoding Conversion
Enabling Lossless Conversion

Canonicalization Errors?
Remember when the full-width Latin forms were turned to normal Latin in the URL bar?
< or > filtered?
What if it also tries to canonicalize similar characters like < (U+003C), > (U+003E), ‹ (U+2039), ﹤ (U+FE64), ﹥ (U+FE65), › (U+203A), ＜ (U+FF1C), ＞ (U+FF1E) afterwards?

Other Transforms
Case changes
ß (U+00DF) upper case becomes SS
İ (U+0130) to lower case becomes i (U+0069)
ſ (U+017F) to upper becomes S (U+0053)
ẞ (U+1E9E) to lower becomes ß (U+00DF)
ı (U+0131) to upper becomes I (U+0049)
Apparently, locale matters too: French upper case may drop diacritics, Turkish handles “iIıİ” differently
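The case changes listed above can be checked with Python's built-in string methods. This is a small sketch, with `case_map` as a made-up helper; Python applies the default, locale-independent Unicode mappings, so the Turkish and French behavior is not shown, and İ lower-cases to i followed by U+0307 (combining dot above) rather than a bare i.

```python
def case_map(ch: str, op: str) -> str:
    """Apply str.upper or str.lower and return the result as U+XXXX code points."""
    mapped = ch.upper() if op == "upper" else ch.lower()
    return " ".join(f"U+{ord(c):04X}" for c in mapped)

samples = [
    ("\u00df", "upper"),  # ß
    ("\u0130", "lower"),  # İ
    ("\u017f", "upper"),  # ſ
    ("\u1e9e", "lower"),  # ẞ
    ("\u0131", "upper"),  # ı
]
for ch, op in samples:
    print(f"U+{ord(ch):04X} {op}: {case_map(ch, op)}")

# A comparison made before case mapping can disagree with one made after it.
name = "ADMIN\u017f"
print(name == "ADMINS", name.upper() == "ADMINS")  # False True
```

The last line is the security-relevant part: a filter that compares first and normalizes case later can be slipped past by a character that only becomes a match after the transform.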
http://www.w3.org/International/wiki/Case_folding

UTF-8 Exploits
Overly long encoding, will it bypass filters?
< < = 3C = 00111100
> > = 3E = 00111110
a1 13 a1 03 a1 12 a1 09 a1 10 a1 14
MS00-057 was this problem, but with ../

Text Comparison
Various characters have both their own code point, and can be made with “Combining” characters
Diacritical marks also
A (U+0041) next to U+0300 = À but À is also U+00C0
We want text searches to be equivalent
NFKC - Normalization Form Compatibility Composition
"Ⓓⓔⓛⓔⓣⓔ" into "Delete".
International Phonetic Alphabet has examples in U+0300 to U+036F. Even more in U+1DC0 to U+1DFF

Real-life Example: Spotify
The canonical_username function was not “idempotent” (applying it a second time should change nothing); a function like “toLower” would be.
A user signs up with username IronGeek, normalized to irongeek
Another user signs up as ᴵᴿᴼᴺᴳᴱᴱᴷ (U+1D35 U+1D3F U+1D3C U+1D3A U+1D33 U+1D31 U+1D31 U+1D37 in Phonetic Extensions block)
ᴵᴿᴼᴺᴳᴱᴱᴷ requests a password reset email, and with it can reset IronGeek’s account
Full story here:

Thwart Searches/Obscenity Filters
What if you want to be public, but hard to search for?
What if you want to search for filtered words?
Classic example, no Unicode needed: pr0n
Porn != Pοrn != Pоrn
o=U+006F, ο=U+03BF, о=U+043E
Latin Small o, Greek Small Omicron, Cyrillic Small Letter o
Searches for the above turn up different results in Google
Some items with mixed scripts just get flagged as spam
Just plain fun too

Buffer Overflows
Some expand out

Complexities With Buffer Overflows
Try to overwrite EIP with 0x41414141, you get 0x00410041
Chris Anley came up with “Venetian Shellcode”
Links:
FX of Phenoelit also did some work on this

Fuzzing Suggestions:
Combining Diacritics
Invisible Characters
Malformed UTF-8
Bad Surrogate Pairs
Multiple levels of RTL, LTR reversing
Chris Weber’s Blog:

In recent news, Apple's CoreText API Bug:

Big Thanks
J. Abolins
Chris Weber
Michal Zalewski
William Coppola

Useful Sites
Unicode Security Considerations
Unicode Security Mechanisms
Unicode Converter
Unicode Character Info and List
Homoglyph Attack Generator
Unicode-HAX
OWASP XSS Filter Evasion Cheat Sheet
Fun Unicode “Fonts”
Other Fun

Art
Hand are based on

References
A. Costello, March 2003. [Online]. Available:
http://www.ietf.org/rfc/rfc3492.txt
J. Abolins, December 2010. [Online]. Available: http://www.irongeek.com/i.php?page=videos/dojocon-2010-videos#Internationalized%20Domain%20Names%20&%20Investigations%20in%20the%20Networked%20World
M. Zalewski, The Tangled Web: A Guide to Securing Modern Web Applications, 1st ed., No Starch Press, 2011.
E. Gabrilovich and A. Gontmakher, "The Homograph Attack," Communications of the ACM, vol. 45, no. 2, 2002.
V. Krammer, "Phishing defense against IDN address spoofing attacks," in Proceedings of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services, New York, NY, USA, 2006.
E. Johanson, "The state of homograph attacks," 2005. [Online]. Available: http://www.shmoo.com/idn/. [Accessed 24 4 2012].
D. Kennedy. [Online]. Available: http://www.secmaniac.com/download/
A. Crenshaw, 2012. [Online]. Available: http://www.irongeek.com/homoglyph-attack-generator.php
phlyLabs, 2012. [Online]. Available: http://phlymail.com
Microsoft, September 2006. [Online]. Available: http://msdn.microsoft.com/en-us/library/bb250505%28VS.85%29.aspx
Chromium Project, [Online]. Available: http://www.chromium.org/developers/design-documents/idn-in-google-chrome
C. Weber, July 2009. [Online]. Available: http://www.blackhat.com/presentations/bh-usa-09/WEBER/BHUSA09-Weber-UnicodeSecurityPreview-SLIDES.pdf.
C. Weber, seems to be a longer version of the presentation above: http://www.casaba.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf
C. Weber, July 2009. [Online]. Available: http://www.blackhat.com/presentations/bh-usa-09/WEBER/BHUSA09-Weber-UnicodeSecurityPreview-PAPER.pdf
A. Crenshaw, "Steganographic Command and Control: Building a communication channel that withstands hostile scrutiny," 2010. [Online]. Available: http://www.irongeek.com/i.php?page=security/steganographic-command-and-control [Accessed 23rd April 2012]

Events
Derbycon
Others

Questions?
42
Twitter: @Irongeek_ADC
If you would like to republish one of the articles from this site on your
webpage or print journal please contact IronGeek.
Copyright 2020, IronGeek
Louisville / Kentuckiana Information Security Enthusiast