Assuming I am using a standard (British or USA) English installation of AmigaOS4, does it USUALLY use the ISO-8859-1 (Latin 1) character set?
Wikipedia states that AmigaDOS (1.x) uses ISO-8859-1 (Latin 1), so I'm guessing that's what future versions of AmigaOS stuck to for English users?
BTW, I see a Libs:Charsets folder - is there a standard Amiga library which allows converting between charsets? The SDKBrowser wasn't much help in answering this question.
That depends on your locale settings. For plain English the system charset should be ISO-8859-1, but e.g. for Czech or other Slavic languages it will be ISO-8859-2.
These two lines will give you the currently used charset:
Don't assume a charset. I use ISO-8859-15, otherwise the Euro symbol is missing (I know we don't use the Euro here, but that doesn't mean I don't want or need to type it...!).
There are various ways of finding the current charset. I use the same as tboeckel's code above, although somebody did mention that I shouldn't be using that, but the alternative looked convoluted, not that I can remember what it was.
uint32 loc_CodeSet Specifies the code set required by this locale. Before V50, this value was always 0. Since V50, this is the IANA charset number (see L:CharSets/character-sets). For compatibility, 0 should be handled as equal to 4, both meaning ISO-8859-1 Latin1.
Starting with V50, locale.library maintains a global environment variable called "Charset" which contains the MIME name of the current default charset as used in the system. This is the name of the charset associated with the Locale structure returned by OpenLocale(NULL).
On its own a "MIBenum" number doesn't look terribly useful. I'll have to see if there is a way to get a meaningful name from it... (Maybe GetDiskFontCtrl() will do the job.)
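As a rough illustration of turning a MIBenum number into a name, here is a minimal portable C sketch. It assumes a small hardcoded table holding a handful of IANA MIBenum values; on a real system the L:CharSets/character-sets file mentioned in the autodoc would be the place to look. The helper name codeset_name() and the table are made up for this example and are not part of any Amiga library. It also applies the autodoc's rule that 0 should be handled as equal to 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partial, hand-picked list of IANA MIBenum values and their MIME names.
 * Illustration only; the full registry is much longer. */
struct mib_name {
    uint32_t    mib;
    const char *name;
};

static const struct mib_name mib_names[] = {
    {   3, "US-ASCII"    },
    {   4, "ISO-8859-1"  },
    {   5, "ISO-8859-2"  },
    {  12, "ISO-8859-9"  },
    { 106, "UTF-8"       },
    { 111, "ISO-8859-15" },
};

/* Hypothetical helper: returns the MIME name for a code set number,
 * or NULL if the number is not in the small table above. */
static const char *codeset_name(uint32_t codeset)
{
    size_t i;

    /* Per the loc_CodeSet autodoc text: 0 means the same as 4 (Latin 1). */
    if (codeset == 0)
        codeset = 4;

    for (i = 0; i < sizeof mib_names / sizeof mib_names[0]; i++) {
        assert(mib_names[i].name != NULL); /* table invariant */
        if (mib_names[i].mib == codeset)
            return mib_names[i].name;
    }
    return NULL;
}
```

With this table, codeset_name(0) and codeset_name(4) both give "ISO-8859-1", and an unknown number gives NULL so the caller can decide how to fall back.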
Quote:
PS: It took me two minutes to go from not knowing, to RTFM, to finding out. Why do autodoc authors even bother?
You didn't even answer my main question (i.e. is ISO-8859-1 the default for English), so no need to be grumpy. Wikipedia was literally the ONLY website with any information.
Quote:
You didn't even answer my main question (i.e. is ISO-8859-1 the default for English),
There is no "default"; always use Locale to determine the charset. For example, once ancilmon creates his own custom locale for English in the Turkish character set, just the fact that it's using English would mess you up.
And as the other Chris said, many English users use ISO-8859-15 these days.
Quote:
so no need to be grumpy.
You beat me to the edit, where I was about to say that I'm only saying this because all the contributors to the thread are established and experienced developers, who should at least know how to read the autodocs. Of course I wouldn't say it to a newbie dev, and if it came over as overly grumpy then sorry.
Quote:
Wikipedia was literally the ONLY website with any information.
I would trust wiki.amigaos.net over Wikipedia in such matters any day.
Quote:
For plain english the system charset should be ISO-8859-1
Thanks! It's amazing how this seems to be assumed as common knowledge, but doesn't actually seem to be stated anywhere (apart from Wikipedia, the unreliable font of all knowledge).
Both of those appear to be V50, so looks like I'll still have to assume ISO-8859-1 for AmigaOS 3.x (and probably MOS+AROS until I can be bothered to find out how they do it).
There you will find how codesets.lib supports all systems to obtain the currently active charset. Eventually it falls back to ISO-8859-1 if all other attempts fail.
If you want consistency, you should store any text string as UTF-8 and convert it to the character set used by the user.
At least if it is a language file.
In 8-bit ASCII, codes 0 to 127 are the typical English characters (7-bit ASCII), while codes 128 to 255 are language-specific characters. The symbols for these are not the same between languages; they are controlled by the code set that the user has selected.
If it's English you want, it makes little difference which charset you use, apart from the "€" symbol.
The codeset defines which symbol the OS should show for each byte, depending on the language. Each byte maps to a value in the UTF-32 table used by the fonts.
Therefore there is no "default character set"; character sets are irrelevant when it comes to 7-bit ASCII.
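To make the point about codes 128 to 255 concrete, here is a small portable C sketch that maps one byte to a Unicode code point under two charsets. It covers only ISO-8859-1 and ISO-8859-15, which differ in eight positions, including the Euro sign at 0xA4. The enum and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical charset selector covering just two charsets. */
enum charset { CS_LATIN1, CS_LATIN9 };   /* ISO-8859-1, ISO-8859-15 */

/* Maps a single byte to its Unicode code point.
 * Bytes below 0x80 are plain ASCII in both charsets. */
static uint32_t byte_to_codepoint(uint8_t b, enum charset cs)
{
    /* ISO-8859-1 bytes are numerically identical to Unicode code points. */
    if (b < 0x80 || cs == CS_LATIN1)
        return b;

    assert(cs == CS_LATIN9);

    /* The eight positions where ISO-8859-15 differs from ISO-8859-1. */
    switch (b) {
    case 0xA4: return 0x20AC; /* Euro sign */
    case 0xA6: return 0x0160; /* S with caron */
    case 0xA8: return 0x0161; /* s with caron */
    case 0xB4: return 0x017D; /* Z with caron */
    case 0xB8: return 0x017E; /* z with caron */
    case 0xBC: return 0x0152; /* OE ligature */
    case 0xBD: return 0x0153; /* oe ligature */
    case 0xBE: return 0x0178; /* Y with diaeresis */
    default:   return b;      /* everything else matches Latin 1 */
    }
}
```

The same byte 0xA4 comes out as U+00A4 (the generic currency sign) under ISO-8859-1 and as U+20AC (the Euro sign) under ISO-8859-15, while 'A' is 0x41 in both.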
Edited by LiveForIt on 2014/11/10 15:57:30
I want to use my AmigaOS4.1 in English but set the fonts to Turkish. I still haven't figured this out.
You will need to set your keyboard map to Turkish to get your charset, then your preferred language to English to get your language. They may or may not produce the effect you requIre?Iim testIng that concept as I type and these arenit typos! There agIn I donit know If the odd characters from they board wIll get trasmItted by AWeb????
BeIng unable to read turkIsh Iim unable to verIfy If that results In turkIsh language beIng dIsplayed correctly?
The only "common" information is... RKRM 3rd Edition based,
ISO-Latin-1 (this is ISO-8859-1 through ISO-8859-15 collectively)
You can only trust the Character Codes up to code 127 (DEL)
Anything above character 127 is subject to change at the user's whims.
Additionally ... I am working with UTF-8 as the codeset of choice for my own projects.
Use Locale.Library to get the MIBenum value and then look it up in the on-disk reference file that maps these values to names (have a look in S: and L:).
DiskFont.Library will only tell you about what is currently displayed (and I am having fun and games with *multiple* Keymaps along with chording whole typed words for presenting small menus of options... 3000+ "daily Kanji" with readings anywhere from 1 through to 8 syllables for common and up to 16 syllables for uncommon readings, each "syllable" is equal to 2 or 3 English Letters...and that is only for the Japanese).
I wonder how anyone will cope when the "system default" is set for Unicode and there is no "upper limit" for Character codes (when a 32bit CodePoint IS reasonable).
Assumptions == Screwups of the worst kind... good to ask and definitely double-check before cutting code out of the frypan :P
Thanks again for everyone's suggestions on how to determine the current character set... even though it wasn't originally my intention to ask for that! Having finally got through a large list of things which were more necessary for my new program to function, I've now looked through those suggestions again, and implemented what is hopefully a good way of getting the current character set.
@Chris Quote:
No. DO NOT ASSUME A CERTAIN CHARSET IS IN USE.
I don't see what's wrong with writing a "stub" function (which always returns ISO-8859-1), until my program becomes functional enough that it's worth finding out how to do it properly. Us solo programmers need to pick our fights carefully, and avoid extra work which isn't strictly necessary: http://www.lispcast.com/how-to-write-software (I agree with virtually everything he writes, apart from the part where he says to spend ages ensuring you write something 100% perfect the first time around.)
@broadblues Thanks for your suggestion of "locale->loc_CodeSet". At the moment I'm using that first, and only if it fails for some reason do I fall back to using "GetDiskFontCtrl(DFCTRL_CHARSET)".
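The fallback order described above could be sketched roughly like this in portable C. The two inputs stand in for the values that would come from locale->loc_CodeSet and GetDiskFontCtrl(DFCTRL_CHARSET) on AmigaOS 4; here they are plain integers so the logic can be compiled anywhere. Treating 0 as "no answer" for both sources is an assumption of this sketch, and choose_codeset() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define MIB_ISO_8859_1 4u   /* IANA MIBenum for ISO-8859-1 */

/* Hypothetical helper: picks the first source that gave a non-zero
 * MIBenum, and falls back to ISO-8859-1 if neither did. */
static uint32_t choose_codeset(uint32_t locale_codeset,
                               uint32_t diskfont_codeset)
{
    if (locale_codeset != 0)
        return locale_codeset;
    if (diskfont_codeset != 0)
        return diskfont_codeset;
    return MIB_ISO_8859_1;
}

/* Small usage example covering the three possible paths. */
static void choose_codeset_selfcheck(void)
{
    assert(choose_codeset(111, 4) == 111);          /* locale wins */
    assert(choose_codeset(0, 5) == 5);              /* diskfont fallback */
    assert(choose_codeset(0, 0) == MIB_ISO_8859_1); /* last resort */
}
```

The last-resort value matches the autodoc note that 0 should be handled as ISO-8859-1.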
Quote:
I would trust wiki.amigaos.net over Wikipedia in such matters any day.
Of course. But Google didn't find the info I was after on wiki.amigaos.net.
@tboeckel Thanks for both of your suggestions. getSystemCodeset() was helpful in seeing how to do it on MorphOS & AROS.
Quote:
Conversion is best done by codesets.library.
I ended up writing my own code to convert to/from other charsets, and read/write UTF-8 (the latter being somewhat time-consuming since I wasn't familiar with how UTF-8 worked before). One benefit of doing it with my own code is that it will work on Windows/etc without any extra effort. Another benefit is that I can convert to/from UTF-8 while simultaneously converting encoded XML characters (rather than doing it less efficiently in two separate passes).
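For readers who are equally unfamiliar with how UTF-8 works, the write side boils down to something like the following portable C sketch. It is not the poster's code, just a minimal encoder for a single Unicode code point; the function name utf8_encode() is chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if cp is not a
 * valid code point (a surrogate, or above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* 7-bit ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Since ISO-8859-1 bytes are numerically equal to their code points, converting Latin 1 text to UTF-8 is just a loop calling this on each byte. For example, 0xE9 (é) becomes the two bytes C3 A9, and U+20AC (€) becomes E2 82 AC.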
Text downloaded from the internet comes in all sorts of encodings, and displaying them correctly is tricky.
Quote:
If you want consistency, you should store any text string as UTF-8 and convert it to the character set used by the user.
That is in fact what I settled on doing, otherwise things get too complicated. Luckily XML tends to be UTF8 in the first place.
@Belxjander Quote:
DiskFont.Library will only tell you about what is currently displayed
I'm afraid I don't understand how that might be a problem. Would "locale->loc_CodeSet" (after "locale=OpenLocale(NULL);") be better than "GetDiskFontCtrl(DFCTRL_CHARSET)"?
Quote:
I wonder how anyone will cope when the "system default" is set for Unicode and there is no "upper limit" for Character codes (when a 32bit CodePoint IS reasonable).
I don't see how AmigaOS can support Unicode as the system default. Even using UTF-8 would cause problems for many programs, which assume 1 byte is 1 character.
About the only solution I *can* see for AmigaOS, would be to have new functions which were explicitly UTF-8 (possibly also allowing UTF-16), and then have all legacy OS functions automatically convert UTF-8 to/from a "legacy character set". Anything which can't be converted would get replaced by a question mark or whatever (which apparently isn't advised for security reasons, but I can't see a better solution).
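The "convert to a legacy character set and replace whatever doesn't fit with a question mark" idea can be sketched in portable C as follows. This is only an illustration with ISO-8859-1 as the target and a made-up function name; it is not a proposal for how AmigaOS would actually do it. It rejects overlong encodings, which is one of the security concerns with lossy conversion, but it is not a complete UTF-8 validator.

```c
#include <assert.h>
#include <stddef.h>

/* Converts NUL-terminated UTF-8 to ISO-8859-1.
 * out must have room for strlen(in) + 1 bytes.
 * Anything that cannot be represented, or is malformed, becomes '?'.
 * Returns the number of bytes written, not counting the NUL. */
static size_t utf8_to_latin1(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*s) {
        unsigned long cp, min;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else                          { cp = 0xFFFD;    extra = 0; min = 0; }
        s++;

        /* Consume the expected continuation bytes (10xxxxxx). */
        while (extra > 0 && (*s & 0xC0) == 0x80) {
            cp = (cp << 6) | (*s & 0x3F);
            s++;
            extra--;
        }

        /* Truncated sequence or overlong encoding: treat as invalid. */
        if (extra > 0 || cp < min)
            cp = 0xFFFD;

        out[n++] = (cp <= 0xFF) ? (char)cp : '?';
    }
    out[n] = '\0';
    return n;
}
```

For example, the UTF-8 string "caf" followed by C3 A9 converts to four Latin 1 bytes ending in 0xE9, while the Euro sign (E2 82 AC) has no ISO-8859-1 equivalent and comes out as a single '?'.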