Strings and C side primitives
Brian T. Rice
water at tunes.org
Mon Aug 16 19:52:44 PDT 2004
I should have replied earlier, but it takes time to think things over,
and I thought Lee might answer this better. In any case, here are my
comments:
Olli wrote:
> Hello.
>
> Disclaimer: I don't know mobius or pidgin too well, so there might be
> errors/misconceptions.
>
>> Hi,
>>
>> Recently we had some discussion about vm.h not being includable, and
>> then we ended up that basically it shouldn't be included at all
>> because C side primitives shouldn't really use anything from the vm.
This makes little to no sense to me. Why would we create a header file
that's not worth including in anything? I wrote the initial system, and
it's based on the Squeak code-generation model where you use the header
to define what operations a native binding can utilize to make some
primitive code work with the VM better.
Of course, we have a different VM architecture, where the only bytecodes
are operations that you can't express in the language (barring a few
optimizations for extremely common cases), and those are relatively safe
to call from outside the image, as long as the controller is working in
a context that has no outside influence; basically a locked Slate process.
If the header is not includable, then it's totally useless, and the
correct solutions are to either remove it or refactor it into something
useful. (I've pointed out some directions via IRC where this could happen.)
>> It brings up a question how should C prims be written platform
>> independently that use strings. File prims are an example, that return
>> unicode file names on windoze.
>>
>> We could fix the interface to communicate through a utf16[], but on
>> some platforms it would only be a hassle, etc...
>
> I think UTF-8 would be easier, since we already have ByteArrays, but not
> 16-bit arrays, IIRC. We could also use arrays of integers, and then there
> would be no need for UTF encoding/decoding in most platforms. Some sort of
> conversion would be needed anyways (except in platforms where file
> names are communicated in 32-bit arrays), but that would be simpler. Not
> that UTF-8 is too complicated, either.
Don't most modern language implementations these days standardize on
Unicode support via external UTF-8 (with Java using UTF-16 internally
while still using UTF-8 externally)? I agree with Olli that the most
natural fit is probably to concentrate on byte-based codings.
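
To make the comparison between "array of integers" and UTF-8 concrete, here
is a minimal sketch in C of turning one code point (an integer) into UTF-8
bytes that would fit in a ByteArray. The name utf8_encode is hypothetical
and is not part of the Slate VM or any existing primitive interface; it only
illustrates how small the conversion is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Slate VM.
 * Encodes one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not encodable
 * (a UTF-16 surrogate value or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {                        /* 1 byte: plain ASCII */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

A primitive that receives integers from the image could call this once per
element and append the result to a byte buffer; ASCII-only file names pass
through unchanged, one byte per character.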
>> But we could also extend vm generation so that it contains all the
>> platform specific code in pidgin reduced to one-liner "directly"
>> entries. What do you think about this?
>>
>> And if it's not the way to go, then what should be the interface
>> between VM and C prims with strings?
>
> I'd say either UTF-8 or array of integers. Also the interface with
> characters should be decided, probably just integers.
An alternative is to just always pass in raw bytes together with their
encoding information, perhaps using lead bytes, but I'm not sure how
feasible this is. This information has to be communicated from the system
somehow, so I'd rather just re-use that and keep the low-level interaction
code as simple as possible without introducing ambiguity.
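
One possible reading of the lead-byte idea, as a rough sketch only: the first
byte of the buffer names the coding and the rest is the untouched payload.
The tag values and the names prim_encoding, prim_pack and prim_unpack are
all assumptions made up for this illustration; nothing like them has been
agreed on for the VM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding tags; the actual set would have to be decided. */
enum prim_encoding {
    PRIM_ENC_RAW     = 0,   /* uninterpreted bytes */
    PRIM_ENC_UTF8    = 1,
    PRIM_ENC_UTF16LE = 2
};

/* Writes the tag byte followed by the payload into dst.
 * Returns the total number of bytes written, or 0 if dst is too small. */
size_t prim_pack(uint8_t *dst, size_t cap, enum prim_encoding enc,
                 const void *payload, size_t len)
{
    if (cap < 1 || len > cap - 1)
        return 0;
    dst[0] = (uint8_t)enc;
    if (len > 0)
        memcpy(dst + 1, payload, len);
    return len + 1;
}

/* Splits a tagged buffer into its tag and a view of the payload.
 * Nothing is copied or decoded. Returns 0 on success, or -1 if the
 * buffer is empty or carries an unknown tag. */
int prim_unpack(const uint8_t *src, size_t n, enum prim_encoding *enc,
                const uint8_t **payload, size_t *len)
{
    if (n < 1 || src[0] > PRIM_ENC_UTF16LE)
        return -1;
    *enc = (enum prim_encoding)src[0];
    *payload = src + 1;
    *len = n - 1;
    return 0;
}
```

With this shape a file primitive on one platform could hand back its native
UTF-16 names tagged as such, another could hand back UTF-8, and the C side
would never need to know what a String is; decoding stays in the image.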
> The unicode library is nearing a point where you can do with unicode
> strings/characters everything you could with the old strings and
> characters. This means that all references to String and Character (or
> StringProto/CharacterProto) used internally in VM should be converted to
> ByteArrays and Bytes (or whatever that is called, ASCIICharacter?). The
> stuff that interacts with the image should be converted to use the proper
> interface, whatever that is decided to be. Just remember that no assumption
> that String == ByteArray should be made anywhere, or that Character ==
> Byte.
Yes, and I have absolutely no objections to making the VM entirely
unaware of what a String is, just to say that "here are some bytes and
here's a code that says how it's coded". In fact, I entirely prefer that
situation if it is a net win.