[ruby-talk:10366] Re: RFC: RubyVM (long)

From: Robert Feldt <feldt@...>
Date: 2001-02-05 08:42:57 UTC
List: ruby-talk #10366
On Sun, 4 Feb 2001, Mathieu Bouchard wrote:

> > From matz's description of the next-generation interpreter (hereafter
> > called MNG) it seems that his main goal is to address issues I1 and
> > I2. He intends to design a bytecode format for Ruby
> 
> I'd like to see a draft of the bytecode format description when Matz
> writes it.
> 
Me too.

> > mSt is a subset of Smalltalk that maps directly onto C
> > constructs. It excludes blocks, message sending and even objects.
> 
> This I'm not sure I agree with. I don't know how mSt can in any way be
> a subset of SmallTalk if it doesn't support the three main elements that
> make SmallTalk what it is. I think our equivalent of mSt should be Ruby
> itself or a real subset of it.
> 
I agree with you. I've looked into Squeak and its VM a bit more and it
actually seems it is written in C with a Smalltalk syntax (IMHO, it can
still be a subset though!), ie. it's so far from Smalltalk that the only
benefit you get compared to writing it in C is that you can develop and
run it in your Smalltalk environment (might be important though).

(On the subset issue: Programs with only integers and arithmetic
operators would be a valid subset of Ruby or ST, so I guess that's why
they can call the system language a subset of ST.)

I agree that the Ruby subset we write RubyVM components in should be a
real subset of Ruby. However, we will need to make some deviations from
Ruby semantics if we want good performance. Otherwise we are faced with
writing a full Ruby->C translator (and not one like rb2c, which (correct
me if I'm wrong) needs matz's Ruby implementation to work), ie.
essentially a Ruby compiler. I think this is doable in the long run but
it'll not be easy and will require tight integration with a VM/run-time.

Stuff that comes to mind where we'll probably need to deviate from full
Ruby (incomplete and off the top of my head):

* No garbage collection. (The idea is that we should write Ruby's GC in
this subset, so we cannot presuppose GC in the subset. Besides, we'd lose
performance. However, a question for matz: Does the interpreter itself
make use of GC heap memory for internal data structures?)

* Statically typed. To get good performance we need to resolve Ruby's
dynamic typing at compile-time. This can be done either by using
(complex) type inference/reconstruction algorithms or by using the
existing Ruby interpreter. (I'm thinking something like: install a trace
function that intercepts each "call" and "c-call" and saves the types of
the arguments to the call (we can get them from the binding, right?),
then run the program and collect type info, then convert each method
invocation with a unique type signature into a unique C function.)

* No full Ruby Std lib. The basic stuff like Fixnum and Float can be
converted directly but we will probably have to deal with the other basic
ones like String, Array and Hash. Does anyone know which data structures
are used in the current interpreter and will be crucial when writing VM
components? We can take two approaches here: (1) Write CArray and CHash
that give functionality similar to Array and Hash but are written directly
in the subset language, or (2) implement ObjectMemory with GC and then
"real" Array and Hash using the ObjectMemory. The latter will be easier
and less error-prone but probably slower. How does matz do it today? I
guess he uses the low-level hash implementation (st?)?

* No dynamic class extension, ie. you cannot redefine methods
dynamically (actually you'll be allowed to do it but only the
latest definition will be used). Since we'll probably have to compile
methods into C functions we need some static mapping. We might work
around this with some name mangling technique but I'm not sure it's worth
the effort.

* No eval. When writing the interpreter it's not there so... (Same goes for
a lot of the stuff currently in Ruby, like safe levels etc.)
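
A minimal sketch of the trace-based type collection idea from the
"Statically typed" item above. It assumes a current Ruby where TracePoint
has replaced the older trace function hook, and it only handles Ruby-level
"call" events (not "c-call"), reading named required/optional arguments
from the binding. The helper name collect_type_signatures and the class
Calc are made up for illustration.

```ruby
require 'set'

# Hypothetical helper: runs the given block and records, for every
# Ruby-level method call, the classes of its named arguments.
def collect_type_signatures
  signatures = Hash.new { |hash, key| hash[key] = Set.new }

  trace = TracePoint.new(:call) do |tp|
    # Keep only named required/optional parameters and look up their
    # current values through the binding of the called method.
    types = tp.parameters.filter_map do |kind, name|
      next unless name && (kind == :req || kind == :opt)
      tp.binding.local_variable_get(name).class
    end
    signatures["#{tp.defined_class}##{tp.method_id}"] << types
  end

  trace.enable { yield }
  signatures
end

# Illustrative class to trace.
class Calc
  def add(a, b)
    a + b
  end
end

sigs = collect_type_signatures do
  calc = Calc.new
  calc.add(1, 2)
  calc.add(1.5, 2)
end

sigs.each do |method_name, type_lists|
  type_lists.each { |types| puts "#{method_name}(#{types.join(', ')})" }
end
```

Each distinct entry collected for a method would then correspond to one
specialized C function in the scheme described above.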

I still think a subset of Ruby with the above restrictions would be close
enough to Ruby to be useful (ie. being more powerful than C directly while
still being compilable to fast C). We would still have:

* Classes and Modules (converted to structs for instance vars and functions
for methods)
* Iterators (converted to iteration, ie. for loops)
* Blocks/closures (converted to functions and function pointers)
* Basic Ruby syntax
* globals, class vars, instance vars and local/temp vars
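
To make the conversions in the list above a bit more concrete, here is a
rough sketch, written in Ruby itself, of what "lowering" might look like.
The first half is ordinary Ruby; the second half is a hand-written
imitation of what a translator could emit before going to C: instance
variables become struct fields, the method becomes a free function taking
the struct as its first argument, and the block becomes an explicit
callable standing in for a C function pointer. All names (Counter,
CounterStruct, counter_each) are made up for illustration.

```ruby
# Ordinary Ruby: a class with one instance variable and an iterator.
class Counter
  def initialize(limit)
    @limit = limit
  end

  def each
    i = 0
    while i < @limit
      yield i
      i += 1
    end
  end
end

# Hand-lowered imitation: struct for the instance variables ...
CounterStruct = Struct.new(:limit)

# ... and a free function for the method. The block is passed as an
# explicit callable, as a function pointer would be in C.
def counter_each(this, fn)
  i = 0
  while i < this.limit
    fn.call(i)
    i += 1
  end
end

high_level = []
Counter.new(3).each { |i| high_level << i * 2 }

lowered = []
counter_each(CounterStruct.new(3), ->(i) { lowered << i * 2 })

puts high_level.inspect
puts lowered.inspect
```

Both versions visit the same values in the same order; only the second one
avoids message sends to user-defined classes and implicit blocks.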

However, I think the way we should develop/design the Ruby subset is by
actually writing some of the current core components (say the GC or the
full memory model (with the object format and GC)) in full Ruby and see
what parts of Ruby are crucial if we don't want to sacrifice readability
and compactness. Then we'll have to think hard about whether we can
convert all the constructs needed to fast C code. If not we might have to
impose some unwanted restrictions on the subset language.

If you want to dig into this some more take a look at Pre-Scheme in the
Scheme48 implementation. There are links on my rubyvm web
page. IMHO, Pre-Scheme is closer to what we should strive for.

> > 1. RubyVM-Core in C (or even assembler). This is basically MNG,
> > possibly with some differences in design.
> > 2. RubyVM-Core in (pure) Ruby.
> 
> I think #1 is a better for a start. We'll see about #2 when the rest is
> done.
> 
I don't agree. If we can have a Ruby subset that is more than C in Ruby
syntax, I think we gain something over writing it in C. We can use
closures/blocks (proc), and the code is more compact and easier for
non-C-experts to read and understand. IMHO, we should first do it in a
Ruby subset and then do the critical parts directly in C if we see the
need for it. And mind you, I'm not proposing a substitute for matz's work
(which is basically 1) but a parallel project with somewhat similar goals.

> > Alternative to compiling mRb to C
> > ---------------------------------
> > Instead of compiling to C we could compile to native code directly. We
> > can probably come up with a nice OO design where code generators for
> > different machines can be plugged in but it will probably be more
> > difficult to implement the many different optimizations of modern C
> > compilers.
> 
> If you want to compile to native code you should first compile to C. The
> mRb-to-C compiler is much more important than any native-code generators. 
> I don't think I want another Self interpreter that runs only on 2
> processors. 
> 
I'm proposing we should translate to C. I'm mentioning this alternative
since it's so common in many VMs/dynamic programming systems. I think the
drawback with generating native code by hand is that we'd have to
implement all of gcc's optimizations to get performance on par with
it. On the other hand, doing it ourselves might be the only way if we want
dynamic/just-in-time/adaptive compilation, so there's a tradeoff here. I
still think we should start by generating C...

> > We should probably learn from the fast Self and Smalltalk
> > implementations around.
> 
> The Self implementation I've tried cannot possibly be described as "fast".
> I guess the PowerPC implementation is not up to par with the SPARC
> implementation. Is that the case?
> 
Don't know, I haven't tried it. My impression from the research papers is
that the Self compiler is very advanced, doing analyses that let them
optimize away dynamic dispatch in many situations.

But if it's not good, I guess Craig Chambers' later project might be a
source of inspiration (Vortex compiler for Cecil, C++ and Smalltalk).

> > * unboxed floats and long integers (Self only supports 30-bit integers)
> 
> Ruby has its Float boxed too, right?
> 
Yes, I think everything is boxed except Fixnums (and possibly nil and
false et al?).
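
One way to observe the boxed/unboxed distinction from inside Ruby is
object identity. This is only a small sketch, run against a current
interpreter rather than the one discussed here, and the method name
identity_report is made up. Note that newer 64-bit MRI builds also
represent many Floats as immediates, so the sketch deliberately leaves
Float out.

```ruby
# Hypothetical helper: compares object identity for equal values.
def identity_report
  a = 1000
  b = 500 + 500

  s = String.new("ruby")
  t = String.new("ruby")

  {
    # Small integers are immediates: equal values are the same object.
    integer_same_object: a.equal?(b),
    # Strings live on the heap: equal contents, but distinct objects.
    string_same_object: s.equal?(t),
    string_equal_contents: s == t,
    # nil and false are singletons.
    nil_same_object: nil.equal?(nil),
    false_same_object: false.equal?(false)
  }
end

identity_report.each { |name, value| puts "#{name}: #{value}" }
```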

> > * arbitrary control flow within a method (Java) (gotos for multiple
> > branching) 
> 
> This is given you depart from the AST system. Converting to a more
> free-form (bytecode) control-flow can hinder some optimizations. OTOH,
> some other optimizations are only possible with bytecode. The idea is to
> make the AST -> bytecode translation at the right time, not too early, and
> not too late. 
> 
Actually I expect we should have different implementations of the
low-level VM components; some working on ASTs (much like the current
interpreter) and some working with byte-code (and some with native
code in the distant future!).
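
As a toy illustration of two interchangeable evaluation components, here
is a sketch for a tiny arithmetic language: one evaluator walks the AST
directly, the other first flattens the AST into stack-machine
instructions and then runs those. The node format, the instruction names
and the method names are all made up for this example and say nothing
about any actual bytecode format.

```ruby
# AST nodes are either an Integer literal or [op, left, right].
OPS = { add: :+, sub: :-, mul: :* }.freeze

# Component 1: evaluate by walking the tree recursively.
def eval_ast(node)
  return node if node.is_a?(Integer)

  op, left, right = node
  eval_ast(left).public_send(OPS.fetch(op), eval_ast(right))
end

# Component 2a: flatten the tree into postfix instructions.
def compile_ast(node, code = [])
  if node.is_a?(Integer)
    code << [:push, node]
  else
    op, left, right = node
    compile_ast(left, code)
    compile_ast(right, code)
    code << [op]
  end
  code
end

# Component 2b: run the flat instruction list on an operand stack.
def run_bytecode(code)
  stack = []
  code.each do |insn, arg|
    if insn == :push
      stack.push(arg)
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(OPS.fetch(insn), right))
    end
  end
  stack.pop
end

ast = [:add, 1, [:mul, 2, 3]]
puts eval_ast(ast)
puts run_bytecode(compile_ast(ast))
```

Both components give the same answer for the same program, which is the
property that would let them be swapped behind a common interface.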

Regards,

Robert
