For problem 1, if it's a SAX parser, then you need your own buffer to collect the data. The StartElement() and EndElement() calls tell you when an element starts and finishes. In between them the character data handler (CharacterElement in your case) can be called multiple times, and you have to append the data chunks together yourself. Does that make sense?
For problem 2, I'd need more info, but you should see a lifecycle like:
  StartElement for Keywords
  StartElement for ReservedWords
  One or more calls to CharacterElement
  EndElement for ReservedWords
  EndElement for Keywords
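To make the buffering concrete, here is a minimal sketch in C. The handler signatures mirror Expat's, but plain char is used instead of XML_Char and expat.h is not included, so it compiles on its own. ParseState and its fields are made up for illustration, and the handler names are just the ones used in this thread. With Expat you would hand the state over with XML_SetUserData() and register the handlers with XML_SetElementHandler() and XML_SetCharacterDataHandler().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-parse state (not part of Expat). */
typedef struct {
    char  *text;       /* accumulated character data, NUL-terminated once allocated */
    size_t len;        /* bytes used, not counting the terminator */
    size_t cap;        /* bytes allocated */
    int    collecting; /* non-zero while inside <ReservedWords> */
} ParseState;

/* Start handler: reset the buffer when the element of interest opens. */
void StartElement(void *userData, const char *name, const char **atts)
{
    ParseState *st = (ParseState *)userData;
    (void)atts;
    if (strcmp(name, "ReservedWords") == 0) {
        st->len = 0;
        if (st->text) st->text[0] = '\0';
        st->collecting = 1;
    }
}

/* Character data handler: s is NOT NUL-terminated and may be only one
   chunk of the element's text, so append it instead of overwriting. */
void CharacterElement(void *userData, const char *s, int len)
{
    ParseState *st = (ParseState *)userData;
    size_t need;
    if (!st->collecting || len <= 0) return;
    need = st->len + (size_t)len + 1;
    if (need > st->cap) {
        size_t newcap = st->cap ? st->cap * 2 : 256;
        char *p;
        while (newcap < need) newcap *= 2;
        p = (char *)realloc(st->text, newcap);
        if (!p) return; /* out of memory: this sketch simply drops the chunk */
        st->text = p;
        st->cap = newcap;
    }
    memcpy(st->text + st->len, s, (size_t)len);
    st->len += (size_t)len;
    st->text[st->len] = '\0';
}

/* End handler: only now is st->text guaranteed to hold the whole content. */
void EndElement(void *userData, const char *name)
{
    ParseState *st = (ParseState *)userData;
    if (strcmp(name, "ReservedWords") == 0) {
        st->collecting = 0;
    }
}
```

The point is that the text is only safe to use in EndElement. Whether you get one CharacterElement call or twenty, the same code works, and the caller frees st->text when parsing is done.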
That makes sense, but I only get 1 call to CharacterElement() for the item I am parsing. I can tell because I put some Printf() calls in there and it only shows once.
On the other hand, the design doesn't make sense to me. Expat could just tell me the total length of its internal buffer and I would copy it to my own buffer of the same length.
You should see the character data handler called more than once, so given the info provided I'm not sure why you are not.
As to the second issue, remember that all character data, including any leading white space, is passed through, so you need to strip any newlines, tabs etc. yourself if present.
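As a rough sketch of that clean-up in C (TrimWhitespace and SplitWords are made-up helper names, and both assume the text has already been collected into a writable, NUL-terminated buffer):

```c
#include <ctype.h>
#include <string.h>

/* Trims leading and trailing white space in place.
   Returns a pointer to the first non-space character inside buf. */
char *TrimWhitespace(char *buf)
{
    char *end;
    while (*buf && isspace((unsigned char)*buf)) buf++;
    end = buf + strlen(buf);
    while (end > buf && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return buf;
}

/* Splits a white-space separated list in place.
   Stores up to maxWords pointers into words[] and returns how many were stored. */
int SplitWords(char *buf, char **words, int maxWords)
{
    int n = 0;
    char *tok = strtok(buf, " \t\r\n");
    while (tok && n < maxWords) {
        words[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

TrimWhitespace suits an element holding a single keyword, while SplitWords handles the space separated list and skips the newlines and tabs used for indentation as a side effect. Note that strtok() modifies the buffer and is not reentrant.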
For more help you will need to share more detailed code....
You could sidestep this issue with a better designed XML structure though. Your current idea of a space separated list inside a single element is not very structured, and structure is what XML is for. You gain no advantage from XML over plain text this way, so you might as well have used plain text.
For a more structured XML approach try something more like...
Construct your own scheme, naturally, but use one tag for each keyword and add attributes with class names or similar to associate metadata, like what colour or style to render with.
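On the parsing side, Expat hands attributes to the start element handler as a NULL-terminated array of alternating names and values, so picking one out takes only a few lines. A small sketch in C, where FindAttribute is a made-up helper and the attribute names in the usage note are only examples, not a suggested scheme:

```c
#include <stddef.h>
#include <string.h>

/* Looks up an attribute in an Expat-style atts array
   (name, value, name, value, ..., NULL).
   Returns the value, or NULL if the attribute is not present. */
const char *FindAttribute(const char **atts, const char *wanted)
{
    size_t i;
    if (!atts) return NULL;
    for (i = 0; atts[i] != NULL && atts[i + 1] != NULL; i += 2) {
        if (strcmp(atts[i], wanted) == 0) return atts[i + 1];
    }
    return NULL;
}
```

Inside the start handler you would call something like FindAttribute(atts, "class") when the element name matches your keyword tag, and fall back to a default style when it returns NULL.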
Your second post is what I am trying to get away from. It is fine for most languages, but some have hundreds or even thousands of keywords (PHP, Python, C#....). It's not a deal breaker if I *have* to, just not my first choice.
I don't give up that easy. I will figure something out.
Workbench Explorer - A better way to browse drawers
With respect, the more keywords there are, the more advantage there is in dealing with them in a structured way. Your thinking is "flawed" if you think otherwise.
You can add new items very quickly with no danger of corrupting existing ones, for example.
I extended the C/C++ syntax file for richeditor (added newlib functions to the existing standard set). The original is nearly 6000 lines long, admittedly, but I built the extra bits with a trivial bit of Perl, and this was easy because of the simple structured nature of the data.
Don't be lazy. It will save you effort in the long run and make your resource much more robust.
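The conversion itself can be scripted in almost anything. This is not the Perl mentioned above, just a comparable sketch in C. EmitKeywords is a made-up name, the Keyword tag is only an example, and it assumes the words are plain identifiers with no characters such as & or < that would need escaping.

```c
#include <stdio.h>
#include <string.h>

/* Writes one <Keyword>...</Keyword> line per white-space separated word.
   Modifies list in place (strtok) and returns the number of words written. */
int EmitKeywords(FILE *out, char *list)
{
    int n = 0;
    char *tok;
    for (tok = strtok(list, " \t\r\n"); tok != NULL; tok = strtok(NULL, " \t\r\n")) {
        fprintf(out, "  <Keyword>%s</Keyword>\n", tok);
        n++;
    }
    return n;
}
```

Feed it an existing space separated list and redirect the output into the new file, so converting even a few thousand keywords is a one-off job.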
Having read about your trouble with Expat, would it be an option for you to try out something else? There's a lightweight XML parser implemented as a link library for C which I have ported to OS4. It's quite small and fast.
Unlike Expat (which is event-driven and you have to write all handling code yourself), libroxml is a tree-based parser, so it makes it much easier to access the XML data. The API is very well documented.
Let me know if you'd like to give it a try. (I haven't released the port on OS4depot yet because I have only done a few basic tests, but so far it seems to work fine.)
I have re-worked my files to be "the old way" I was doing things, <Keyword>xxx</Keyword>, for each entry. It isn't killing me, but it is more setup work.
ezXML was my second choice over expat. It worked fine with the long lines....at first. It is also a tree-based parser. But it has no error checking for bad elements.
Does libroxml verify the document like expat?
But yes, I will take a look at it. I like that it can write files, too. Send it to my email.
Thanks