I’m reading this report on Unicode security and found the following paragraphs confusing:
When converting from a multi-byte encoding, a byte value may not be a valid trailing byte, in a context where it follows a particular leading byte. For example, when converting UTF-8 input, the byte sequence E3 80 22 is malformed because 0x22 is not a valid second trailing byte following the leading byte 0xE3. Some conversion code may report the three-byte sequence E3 80 22 as one illegal sequence and continue converting the rest, while other conversion code may report only the two-byte sequence E3 80 as an illegal sequence and continue converting with the 0x22 byte which is a syntax character in HTML and XML (U+0022 double quote). Implementations that report the 0x22 byte as part of the illegal sequence can be exploited for cross-site-scripting (XSS) attacks.
Therefore, an illegal byte sequence must not include bytes that encode valid characters or are leading bytes for valid characters.
Based on the example described (E3 80 22), it is clear that the byte sequence is not valid:
>>> b'\xe3\x80\x22'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
and the question is how a good parser/converter is supposed to manage this type of error.
Probably I’m misunderstanding something, but it says that some converters may report an error for the whole sequence (E3 80 22), while others may report an error only for E3 80 and continue converting the 22 byte as a double quote. However, it says that when the reported illegal sequence includes the 22 byte, this can be exploited in an XSS attack. That’s the part that is confusing: I would have thought that the second behaviour was the one leading to XSS vulnerabilities. What is the rationale for considering the first behaviour the one vulnerable to XSS?
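To check that I am reading the two behaviours correctly, here is a minimal sketch in Python. The function `decode_swallowing` is hypothetical: it is my own stand-in for a converter that trusts the length announced by the leading byte and discards that many bytes on error. It is not the behaviour of any particular library. The second call uses CPython's built-in decoder with `errors='replace'`.

```python
RAW = b'\xe3\x80\x22'  # the E3 80 22 sequence from the report


def decode_swallowing(data: bytes) -> str:
    """Hypothetical converter: on error, drop as many bytes as the
    leading byte announced, even if some of them are valid on their own."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1  # ASCII
        elif 0xC0 <= lead <= 0xDF:
            n = 2
        elif 0xE0 <= lead <= 0xEF:
            n = 3
        elif 0xF0 <= lead <= 0xF7:
            n = 4
        else:
            n = 1  # stray continuation byte or invalid lead byte
        chunk = data[i:i + n]
        try:
            out.append(chunk.decode('utf-8'))
        except UnicodeDecodeError:
            out.append('\ufffd')  # one replacement character for the whole chunk
        i += n
    return ''.join(out)


# First behaviour: E3 80 22 is reported as one illegal sequence.
print(repr(decode_swallowing(RAW)))                    # '\ufffd'

# Second behaviour: only E3 80 is illegal, 0x22 is converted normally.
print(repr(RAW.decode('utf-8', errors='replace')))     # '\ufffd"'
```

In the first case the double quote disappears from the output; in the second it survives as U+0022.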
An additional question: how is this type of issue exploitable in practice (assuming we are interested in web applications)? Am I supposed to simply use URL encoding or HTML encoding (%E3%80%22 or &#xE3;&#x80;&#x22;, respectively) and hope for the best?
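For concreteness, this is what I mean by the two encodings, sketched with the Python standard library. The strings `%E3%80%22` and `&#xE3;&#x80;&#x22;` are my own assumption of what such a payload would look like; they are not taken from the report.

```python
import html
from urllib.parse import unquote_to_bytes

url_form = '%E3%80%22'             # assumed URL-encoded payload
html_form = '&#xE3;&#x80;&#x22;'   # assumed HTML numeric character references

# Percent-decoding yields the raw bytes E3 80 22.
raw = unquote_to_bytes(url_form)
print(raw)                         # b'\xe3\x80"'

# Character references yield code points, not bytes:
# U+00E3, then U+20AC (HTML remaps 0x80), then U+0022.
text = html.unescape(html_form)
print(text.encode('utf-8'))        # b'\xc3\xa3\xe2\x82\xac"'
```

If this is right, only the URL-encoded form delivers the malformed byte sequence to a UTF-8 decoder. The HTML-encoded form produces three well-formed characters instead.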