Strings and C side primitives

Brian T. Rice water at tunes.org
Mon Aug 16 19:52:44 PDT 2004


I should have replied earlier, but it takes time to think things over, 
and I thought Lee might answer this better. In any case, here are my 
comments:

Olli wrote:

> Hello.
> 
> Disclaimer: I don't know mobius or pidgin too well, so there might be
> errors/misconceptions.
> 
>> Hi,
>>
>> Recently we had some discussion about vm.h not being includable, and 
>> then we concluded that basically it shouldn't be included at all
>> because C side primitives shouldn't really use anything from the vm.

This makes little to no sense to me. Why would we create a header file 
that's not worth including in anything? I wrote the initial system, and 
it's based on the Squeak code-generation model where you use the header 
to define what operations a native binding can utilize to make some 
primitive code work with the VM better.

Of course, we have a different VM architecture, where the only bytecodes 
are operations that you can't express in the language (barring a few 
optimizations for extremely common cases), and those are relatively safe 
to call from outside the image, as long as the controller is working in 
a context that has no outside influence; basically a locked Slate process.

If the header is not includable, then it's totally useless, and the 
correct solution is either to remove it or to refactor it into something 
useful. (I've pointed out some directions via IRC where this could happen.)

>> It brings up the question of how C prims that use strings should be 
>> written platform independently. File prims are an example: they return
>> unicode file names on windoze.
>>
>> We could fix the interface to communicate through a utf16[], but on 
>> some platforms it would only be a hassle, etc...
> 
> I think UTF-8 would be easier, since we already have ByteArrays, but not
> 16-bit arrays, IIRC. We could also use arrays of integers, and then there
> would be no need for UTF encoding/decoding in most platforms. Some sort of
> conversion would be needed anyways (except in platforms where file
> names are communicated in 32-bit arrays), but that would be simpler. Not
> that UTF-8 is too complicated, either.

Don't most modern language implementations these days standardize on 
UTF-8 as their external Unicode encoding (with Java using UTF-16 
internally while still using UTF-8 externally)? I agree with Olli that 
concentrating on byte-based codings is probably the most natural fit.
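
To make the comparison between "array of integers" and "UTF-8 in a 
ByteArray" concrete, here is a minimal sketch of the conversion in one 
direction. The names utf8_encode and utf8_from_code_points are 
hypothetical and are not part of vm.h or any existing Slate code; the 
sketch only assumes that code points arrive as 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Slate VM.
   Encode one Unicode code point as UTF-8 into out, which must have room
   for at least 4 bytes. Returns the number of bytes written, or 0 if cp
   is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: convert an array of code points to UTF-8.
   Returns the number of bytes written, or SIZE_MAX if a code point is
   invalid or the output buffer (cap bytes) is too small. */
size_t utf8_from_code_points(const uint32_t *cps, size_t n,
                             uint8_t *out, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t tmp[4];
        size_t len = utf8_encode(cps[i], tmp);
        if (len == 0 || len > cap - used)
            return SIZE_MAX;
        for (size_t j = 0; j < len; j++)
            out[used + j] = tmp[j];
        used += len;
    }
    return used;
}
```

The point is only that the byte-based route costs a few dozen lines on 
the C side, which supports Olli's remark that UTF-8 is not too 
complicated either.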

>> But we could also extend vm generation so that it contains all the 
>> platform specific code in pidgin reduced to one-liner "directly"
>> entries. What do you think about this?
>>
>> And if it's not the way to go, then what should be the interface 
>> between VM and C prims with strings?
> 
> I'd say either UTF-8 or array of integers. Also the interface with
> characters should be decided, probably just integers.

An alternative is to just always pass in raw bytes including encoding 
information, perhaps using lead bytes, but I'm not sure how feasible 
this is. This information has to be communicated from the system somehow 
so I'd rather just re-use that and keep the low-level interaction code 
as simple as possible without introducing ambiguity.

> The unicode library is nearing a point where you can do with unicode
> strings/characters everything you could with the old strings and
> characters. This means that all references to String and Character (or
> StringProto/CharacterProto) used internally in VM should be converted to
> ByteArrays and Bytes (or whatever that is called, ASCIICharacter?). The
> stuff that interacts with the image should be converted to use the proper
> interface, whatever that is decided to be. Just remember that no assumption
> that String == ByteArray should be made anywhere, or that Character == 
> Byte.

Yes, and I have absolutely no objections to making the VM entirely 
unaware of what a String is, so that the interface just says "here are 
some bytes and here's a code that says how they're coded". In fact, I 
entirely prefer that situation if it is a net win.
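
As a rough illustration of the "bytes plus a code" idea, here is a 
minimal sketch of what a C primitive might receive. Every name in it 
(prim_encoding, prim_string, prim_code_unit_size, 
prim_string_has_whole_units) is hypothetical and the set of encoding 
tags is an assumption; nothing here is taken from vm.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags; the real set would be decided by the
   image-side unicode library, not by the VM. */
typedef enum {
    ENC_RAW_BYTES = 0,   /* uninterpreted bytes */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF32LE
} prim_encoding;

/* What the VM hands to a primitive: some bytes and a code that says how
   they are coded. The VM itself never looks inside. */
typedef struct {
    prim_encoding  encoding;
    size_t         length;   /* length in bytes, not in characters */
    const uint8_t *bytes;
} prim_string;

/* Size in bytes of one code unit for the given encoding tag. */
size_t prim_code_unit_size(prim_encoding e)
{
    switch (e) {
    case ENC_UTF16LE: return 2;
    case ENC_UTF32LE: return 4;
    default:          return 1;   /* raw bytes and UTF-8 */
    }
}

/* A cheap sanity check a primitive can run before converting: the byte
   length must be a whole number of code units. Returns 1 or 0. */
int prim_string_has_whole_units(const prim_string *s)
{
    assert(s != NULL);
    return s->length % prim_code_unit_size(s->encoding) == 0;
}
```

With something like this, a file primitive on a platform with 16-bit 
file names could return ENC_UTF16LE data untouched, and the image would 
do the decoding.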
