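A user is posting some odd characters on my website. I want to stop them from doing this without blocking characters used in foreign languages, so a regex like [a-z0-9!@#$%^&*()...]
is not an option.
Can someone explain what is going on here and why it displays this way? How are these characters created, and how can I prevent this?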
♥̧̧̧̛̣̘̟̘̥͓̫̪̹̪̪̮̯̞̘̙̦̝̭̭͕̜̰̩̗̟̹͔̜̥̟̗̗̥̦̠̖̫͕̺̻̞̥̹͇̱̥̥̻͇̦̙̣͊͗̉̽̈́̉͑̀́̃͒̏͋̃̅̇̊̏̎̈́͊͐̉͑̄̌̉́̈́́́̅̇͌̽̽͗́̄̾̓̈́̇̅͛́̈́͐̽̔̌̋̌̾́̿͌̔͊͆̈́̉́̎̔̊͗̊̂̎̍̏̈̀̏͋͌̋̽̄̐̽͐̀͘̕̕͘̕̚̚̚͘͜͜͜͠͝͠͝͠͝
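Thanks
Edit: So they are used to accent characters? Is there a common practice or method to keep users from abusing them without blocking them entirely? I don't know enough about foreign languages or the actual use/purpose of these marks, so designing something to limit the use of combining characters is beyond me. :-/
Posted on 2014-03-07 02:43:52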
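These are combining diacritical marks. For the character é (e-acute), you can use either the single code point U+00E9 (LATIN_SMALL_LETTER_E_WITH_ACUTE) or the sequence U+0065 U+0301 (LATIN_SMALL_LETTER_E, COMBINING_ACUTE_ACCENT), in which case the text renderer places the accent over the preceding code point.
The user is abusing this by stacking a long series of combining marks on a single character: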
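To make the two representations concrete, here is a minimal Python sketch (Python is my choice for illustration; the variable names are made up). It only uses the standard `unicodedata` module and shows that the precomposed and decomposed forms are different code point sequences that Unicode normalization converts between.

```python
import unicodedata

precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Both render as the same glyph, but they are not equal as raw strings.
same_raw = precomposed == decomposed

# NFC composes base + mark into one code point where a composite exists;
# NFD does the reverse and splits the composite back apart.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(len(precomposed), len(decomposed), same_raw, nfc_equal, nfd_equal)
```

This matters for filtering: if you normalize to NFC first, ordinary accented letters usually collapse into a single code point, so any combining marks left over are easier to treat as suspicious.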
codepoint glyph escaped UTF-8 info
=======================================================================
U+2665 ♥ \u2665 e2,99,a5, MISCELLANEOUS_SYMBOLS, OTHER_SYMBOL
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0351 ͑ \u0351 cd,91, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0340 ̀ \u0340 cd,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0303 ̃ \u0303 cc,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0352 ͒ \u0352 cd,92, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034b ͋ \u034b cd,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0303 ̃ \u0303 cc,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0351 ͑ \u0351 cd,91, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0341 ́ \u0341 cd,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033e ̾ \u033e cc,be, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0343 ̓ \u0343 cd,83, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0307 ̇ \u0307 cc,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0305 ̅ \u0305 cc,85, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035b ͛ \u035b cd,9b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030b ̋ \u030b cc,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030c ̌ \u030c cc,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033e ̾ \u033e cc,be, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0360 ͠ \u0360 cd,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0301 ́ \u0301 cc,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033f ̿ \u033f cc,bf, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034a ͊ \u034a cd,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0346 ͆ \u0346 cd,86, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0344 ̈́ \u0344 cd,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0309 ̉ \u0309 cc,89, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035d ͝ \u035d cd,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0341 ́ \u0341 cd,81, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0314 ̔ \u0314 cc,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0357 ͗ \u0357 cd,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030a ̊ \u030a cc,8a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0315 ̕ \u0315 cc,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0302 ̂ \u0302 cc,82, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030e ̎ \u030e cc,8e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030d ̍ \u030d cc,8d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0308 ̈ \u0308 cc,88, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0340 ̀ \u0340 cd,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030f ̏ \u030f cc,8f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034b ͋ \u034b cd,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031a ̚ \u031a cc,9a, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+034c ͌ \u034c cd,8c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+030b ̋ \u030b cc,8b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0304 ̄ \u0304 cc,84, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0310 ̐ \u0310 cc,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033d ̽ \u033d cc,bd, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0350 ͐ \u0350 cd,90, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031b ̛ \u031b cc,9b, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0358 ͘ \u0358 cd,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0300 ̀ \u0300 cc,80, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0323 ̣ \u0323 cc,a3, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0353 ͓ \u0353 cd,93, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032b ̫ \u032b cc,ab, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032a ̪ \u032a cc,aa, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+035c ͜ \u035c cd,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032e ̮ \u032e cc,ae, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032f ̯ \u032f cc,af, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031e ̞ \u031e cc,9e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0318 ̘ \u0318 cc,98, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0319 ̙ \u0319 cc,99, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031d ̝ \u031d cc,9d, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032d ̭ \u032d cc,ad, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032d ̭ \u032d cc,ad, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0355 ͕ \u0355 cd,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031c ̜ \u031c cc,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0330 ̰ \u0330 cc,b0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0329 ̩ \u0329 cc,a9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0354 ͔ \u0354 cd,94, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031c ̜ \u031c cc,9c, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031f ̟ \u031f cc,9f, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0317 ̗ \u0317 cc,97, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0320 ̠ \u0320 cc,a0, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0316 ̖ \u0316 cc,96, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+032b ̫ \u032b cc,ab, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0355 ͕ \u0355 cd,95, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033a ̺ \u033a cc,ba, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033b ̻ \u033b cc,bb, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+031e ̞ \u031e cc,9e, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0327 ̧ \u0327 cc,a7, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0339 ̹ \u0339 cc,b9, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0347 ͇ \u0347 cd,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0331 ̱ \u0331 cc,b1, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0325 ̥ \u0325 cc,a5, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+033b ̻ \u033b cc,bb, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0347 ͇ \u0347 cd,87, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0326 ̦ \u0326 cc,a6, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0319 ̙ \u0319 cc,99, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
U+0323 ̣ \u0323 cc,a3, COMBINING_DIACRITICAL_MARKS, NON_SPACING_MARK
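A listing like the one above can be reproduced with a few lines of code. The following is a rough Python sketch, not the tool that generated the original table; `describe` is a hypothetical helper name. Python's standard library has no lookup for the Unicode block name, so this version reports the character name and the two-letter general category instead (`Mn` corresponds to NON_SPACING_MARK, `So` to OTHER_SYMBOL).

```python
import unicodedata

def describe(text):
    """Return one row per code point:
    (codepoint, escaped form, UTF-8 bytes, character name, general category)."""
    rows = []
    for ch in text:
        cp = ord(ch)
        # \uXXXX only covers the BMP; use \UXXXXXXXX for higher code points.
        escaped = f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"
        utf8 = ",".join(f"{b:02x}" for b in ch.encode("utf-8"))
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{cp:04x}", escaped, utf8, name, unicodedata.category(ch)))
    return rows

# First four code points of the string from the question.
sample = "\u2665\u034a\u0360\u0357"
for row in describe(sample):
    print(*row)
```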
Some points I made in the comments:
"\u2665\u034a\u0360\u0357"
See the charts
Posted on 2014-03-07 02:38:08
Blocking these code points may be enough for you:
http://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
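Since the question asks for a way to limit the marks rather than ban them outright, one possible compromise is to cap how many combining marks may follow a single base character. The sketch below is only an illustration under my own assumptions: the function name `limit_combining_marks` is made up, the default limit of 3 is an arbitrary guess rather than a recommended value, and it only counts the general categories `Mn` (nonspacing) and `Me` (enclosing). Some scripts legitimately stack several marks, so test against real text in the languages you need to support.

```python
import unicodedata

def limit_combining_marks(text, max_marks=3):
    """Drop combining marks beyond max_marks in any consecutive run.

    max_marks=3 is an arbitrary assumption, not an established standard.
    """
    # Compose first so common accented letters become single code points
    # and do not count against the limit.
    normalized = unicodedata.normalize("NFC", text)
    out = []
    run = 0  # number of consecutive combining marks seen so far
    for ch in normalized:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # skip the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

cleaned = limit_combining_marks("\u2665" + "\u0309" * 50)
print(len(cleaned))
```

Filtering by general category instead of a fixed code point range also catches combining marks that live outside the U+0300 to U+036F block.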
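Posted on 2014-03-07 02:43:21
Allowing users to "post odd characters" may let them do more damage than odd-looking text. For example, look into cross-site scripting and similar attacks. Make sure that whatever tool you use to handle this feature is up to standard (security-wise) and well configured. That should also take care of the side-effect problem you are worried about.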
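As a small illustration of the cross-site scripting point, the sketch below escapes user text before it is placed into HTML. It assumes a Python backend and a hypothetical `render_comment` helper; in a real site you would normally rely on your template engine's auto-escaping rather than building markup by hand.

```python
import html

def render_comment(user_text):
    """Wrap user-supplied text in a paragraph, escaping HTML metacharacters."""
    # html.escape replaces &, <, > and (with quote=True) both quote characters,
    # so the text is displayed literally instead of being parsed as markup.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"

print(render_comment("<script>alert(1)</script>"))
```

Escaping leaves non-ASCII text such as ♥ and legitimate accented letters untouched, so it does not interfere with foreign-language input.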
https://stackoverflow.com/questions/22233001