@Amigo1
To answer the other question, about UTF-16.
Basically it does the same job as UTF-8: it preserves international symbols in their original form.
The biggest difference is that UTF-16 is built from 16-bit units rather than 8-bit ones, so a glyph takes 2 or 4 bytes in UTF-16 and 1 to 4 bytes in UTF-8.
UTF-8 and UTF-16 are popular encodings for text files, data sent over TCP/IP, web pages, and many other things.
If you open a UTF-16 file in a text editor that does not support it, the content will be hard to read. With UTF-8 you can still read most of it, though some symbols will be unreadable.
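To make the editor point concrete, here is a minimal sketch. It assumes Latin-1 as a stand-in for "an editor that does not know Unicode"; the helper name view_as_legacy is made up for this example.

```python
# Sketch: how the same text looks to a tool that only knows single-byte text.
# Assumption: Latin-1 plays the role of the editor without Unicode support.

def view_as_legacy(raw: bytes) -> str:
    """Show raw bytes the way a single-byte (Latin-1) viewer would."""
    return raw.decode("latin-1")

text = "café"
utf8_view = view_as_legacy(text.encode("utf-8"))
utf16_view = view_as_legacy(text.encode("utf-16-le"))

print(repr(utf8_view))   # 'cafÃ©'  -> ASCII letters survive, only é is garbled
print(repr(utf16_view))  # 'c\x00a\x00f\x00é\x00'  -> a NUL byte after every letter
```

The UTF-8 version stays mostly readable because every ASCII letter is still a single unchanged byte. The UTF-16 version has a zero byte after each Latin letter, which many older tools treat as end of string or show as junk.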
For English and other European languages, UTF-16 is overkill.
Most glyphs in European languages are standard Latin with values below 128 (hex 0x80); on top of that, most languages have maybe four special symbols.
But go to the Middle East or Asia and the glyphs are different.
The full list of glyphs and the numbers they use can be found here:
http://unicode-table.com/en/#control-character
Generally speaking, UTF-8 is faster to decode (uses less CPU power) when a glyph fits in one byte; when a glyph needs two bytes, UTF-16 is faster (uses less CPU power).
In other words, if most glyphs need 1 byte, UTF-8 is the best option; if most glyphs need 2 bytes, UTF-16 is the best format.
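A quick way to check which case your own text falls into is to compare the encoded sizes. This is only a sketch: it measures bytes, not decode speed, and the helper name encoded_sizes and the sample strings are my own picks.

```python
# Sketch: compare how many bytes the same text needs in UTF-8 and UTF-16.
# This measures size only; it says nothing directly about decoding speed.

def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count without BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "English": "hello",        # plain ASCII letters
    "Arabic": "سلام",          # 4 letters in the U+0600 block
    "Japanese": "こんにちは",    # 5 hiragana letters
}

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# English: UTF-8 = 5 bytes, UTF-16 = 10 bytes
# Arabic: UTF-8 = 8 bytes, UTF-16 = 8 bytes
# Japanese: UTF-8 = 15 bytes, UTF-16 = 10 bytes
```

So for the Arabic sample the two come out even, and it is the Japanese sample (3 bytes per letter in UTF-8) where UTF-16 is actually smaller.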
While UTF-8 and UTF-16 have many advantages, 7-bit ASCII, or 8-bit ASCII with a translation code page, remains popular when coding in C/ASM because it is easier to work with: you do not need to decode anything, a byte is a symbol and that's it.
Therefore, in my opinion UTF-8 is the best option because of its compatibility with 7-bit ASCII. Older programs might in fact work even if the raw data is UTF-8; just as we can read a text with one or two symbols we do not understand, a program does not need to understand every symbol in a string.
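As a rough illustration of an old byte-oriented routine surviving UTF-8 input, here is a sketch. The function legacy_upper is hypothetical; it stands for any old code that only touches bytes in the ASCII range and passes everything else through.

```python
# Sketch: a byte-at-a-time routine written with only ASCII in mind.
# legacy_upper is a made-up example, not code from any real program.

def legacy_upper(raw: bytes) -> bytes:
    """Uppercase ASCII a-z; leave every other byte untouched."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = "abc".encode("ascii") == "abc".encode("utf-8")
print(same_bytes)  # True

# UTF-8 data passes through: all bytes of a multi-byte glyph are >= 0x80,
# so the routine never mistakes them for ASCII letters.
data = "naïve café".encode("utf-8")
result = legacy_upper(data).decode("utf-8")
print(result)  # NAïVE CAFé
```

The routine does not understand ï or é, but it does not damage them either, which is the kind of legacy behaviour I mean. The same routine on UTF-16 data would still run, but anything in it that treats a zero byte as end of string would stop after the first letter.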
This goes back to the web server that someone decided to feed badly formed UTF-8 strings. The web server did not detect the parent directory ".." and other things because the symbols were hidden in the UTF-8 encoding. The real problem was not that UTF-8 was broken; the problem, I expect, was that the web server did not even try to decode the string, because some of the program code was old and had not been updated. Other parts were somehow able to decode the string when it accessed the filesystem; I assume the string format was auto-detected by the filesystem/IO layer in the OS.
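Here is a minimal sketch of how I understand that kind of bug. It is not the actual server code: the request path, naive_has_dotdot and lenient_decode are all made up for illustration. The trick is an "overlong" encoding, where "." (0x2E) is written as the two bytes 0xC0 0xAE, which a strict UTF-8 decoder must reject.

```python
# Sketch of a path check that is fooled by badly formed UTF-8.
# All names and the request path are hypothetical.

def naive_has_dotdot(raw: bytes) -> bool:
    """Old-style check: look for '..' in the raw, undecoded bytes."""
    return b".." in raw

def lenient_decode(raw: bytes) -> str:
    """Sloppy decoder that accepts overlong 2-byte sequences (a bug)."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:                      # plain ASCII byte
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # Takes the low bits without checking the result is >= 0x80.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled in this sketch")
    return "".join(out)

def strict_is_valid(raw: bytes) -> bool:
    """Python's built-in decoder rejects overlong encodings."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

request = b"/scripts/\xc0\xae\xc0\xae/secret.txt"

print(naive_has_dotdot(request))  # False: the check sees no '..'
print(lenient_decode(request))    # /scripts/../secret.txt
print(strict_is_valid(request))   # False: a strict decoder refuses the input
```

The fix is to decode strictly first and only then check the decoded path, so the check and the filesystem see the same string.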
Edited by LiveForIt on 2014/3/7 0:11:25