[#72642] Advantages of Symbols over constants — Marek Janukowicz <childNOSPAM@...17.ds.pwr.wroc.pl>

11 messages 2003/06/01

[#72732] case of sub! not working — Ian Macdonald <ian@...>

Hi,

27 messages 2003/06/03
[#72734] Re: case of sub! not working — Joel VanderWerf <vjoel@...> 2003/06/03

Ian Macdonald wrote:

[#72744] Re: case of sub! not working — Ian Macdonald <ian@...> 2003/06/03

On Tue 03 Jun 2003 at 10:21:43 +0900, Joel VanderWerf wrote:

[#72769] Re: case of sub! not working — Michael Campbell <michael_s_campbell@...> 2003/06/03

[#72907] Syck 0.35 + YAML.rb 0.60 -- the 1st stable release — why the lucky stiff <ruby-talk@...>

Pleased to announce:

18 messages 2003/06/05
[#75182] Re: Syck 0.35 + YAML.rb 0.60 -- the 1st stable release — Richard Zidlicky <rz@...68k.org> 2003/07/04

On Fri, Jun 06, 2003 at 06:15:58AM +0900, why the lucky stiff wrote:

[#72908] Problem with "require" stmt in "test-first " tutorial — RLMuller@... (Richard)

Hi All,

27 messages 2003/06/05

[#72940] VAPOR 0.06, Transparent Persistence to PostgreSQL — "Oliver M. Bolzer" <oliver@...>

Hi!

22 messages 2003/06/06

[#72975] join block — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

29 messages 2003/06/06

[#72986] multiple blocks or proc arguments to method — itsme213@... (you CAN teach an old dog ...)

I was trying to write a collect_if method:

11 messages 2003/06/07

[#73081] requiring standard libs with save level 1 — Eugene Scripnik <Eugene.Scripnik@...>

I've set up new version of Ruby from CVS and my programs failed to work.

13 messages 2003/06/09
[#73114] Re: requiring standard libs with save level 1 — matz@... (Yukihiro Matsumoto) 2003/06/09

Hi,

[#73134] tcltklib does not get compiled. — John Fletcher <J.P.Fletcher@...>

I have installed ruby 1.6.7 on two computers using Red Hat 8.0 Linux.

14 messages 2003/06/10

[#73148] OT: Regexp question — Dominik Werder <dwerder@...>

Hi all,

25 messages 2003/06/10

[#73215] Rubyx (provisionally named) linux distro. Made by and run by Ruby — Andrew Walrond <andrew@...>

I have developed a little script which creates a simple linux distro

38 messages 2003/06/11

[#73260] Multiple Initialize methods? — "Nick" <nick.robinson@...>

Hi,

21 messages 2003/06/11

[#73283] Ruby advantages over Perl — Marek Janukowicz <childNOSPAM@...17.ds.pwr.wroc.pl>

68 messages 2003/06/11
[#73374] Re: Ruby advantages over Perl — Jason Creighton <androflux@...> 2003/06/12

On Thu, 12 Jun 2003 17:56:02 +0900

[#73356] does each work on a copy? — Rasputin <rasputin@...>

17 messages 2003/06/12

[#73372] Reason for implicit block syntax ? — itsme213@... (you CAN teach an old dog ...)

What is the reason for the implicit block in Ruby invocations?

13 messages 2003/06/12

[#73463] Hispeed String concat — Dominik Werder <dwerder@...>

What is the fastest way to add many small Strings to a big buffer?

17 messages 2003/06/13

[#73503] RaaInstallInRuby petition — ptkwt@...1.aracnet.com (Phil Tomson)

18 messages 2003/06/13

[#73555] I need a code beautifier or formatter — joaopedrosa@... (Joao Pedrosa)

Hello,

13 messages 2003/06/14

[#73600] Get songtitle from Winamp — calvin8@... (Andi Scharfstein)

Hi,

26 messages 2003/06/15
[#73601] Re: Get songtitle from Winamp — Daniel Carrera <dcarrera@...> 2003/06/15

-----BEGIN PGP SIGNED MESSAGE-----

[#73602] Re: Get songtitle from Winamp — Chad Fowler <chadfowler@...> 2003/06/15

It's a Win32API convention meaning "Window Handle".

[#73603] Re: Get songtitle from Winamp — Daniel Carrera <dcarrera@...> 2003/06/15

-----BEGIN PGP SIGNED MESSAGE-----

[#73605] Re: Get songtitle from Winamp — Wesley J Landaker <wjl@...> 2003/06/15

On Sunday 15 June 2003 9:34 am, Daniel Carrera wrote:

[#73609] Re: Get songtitle from Winamp — Daniel Carrera <dcarrera@...> 2003/06/15

-----BEGIN PGP SIGNED MESSAGE-----

[#73640] Standardizing Installers — Tom Clarke <tom@...2i.com>

I was thinking about some of the issues raised involving ruby libraries

16 messages 2003/06/16

[#73663] /BEGIN/ .. /END/ file reading — Wild Karl-Heinz <kh.wild@...>

hello

15 messages 2003/06/16
[#73674] Re: /BEGIN/ .. /END/ file reading — "Robert Klemme" <bob.news@...> 2003/06/16

[#73677] Re: /BEGIN/ .. /END/ file reading — Michael Campbell <michael_s_campbell@...> 2003/06/16

> A range operator with a regexp works like a flip flop (bistable

[#73680] Multiline comments? — "Christoph Tapler" <christoph.tapler@...>

I'm new to Ruby and I'm wondering that there is no possibility to write

38 messages 2003/06/16

[#73781] editor / ide recommentation on Windows — itsme213@... (you CAN teach an old dog ...)

What editor / ide would you recommend for serious Ruby work on

20 messages 2003/06/17

[#73787] Array#push(empty array expanded) => no exception — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

This strange behavier really surprised me..

13 messages 2003/06/17

[#73821] European Ruby Conference — "Hal E. Fulton" <hal9000@...>

I don't think I've mentioned this before, but I

15 messages 2003/06/17

[#73924] Re: TCP/IP protocol and Net::HTTP — "J.Hawkesworth" <J.Hawkesworth@...>

Works for me too.

13 messages 2003/06/19
[#73931] Re: TCP/IP protocol and Net::HTTP — Nigel Gilbert <n.gilbert@...> 2003/06/19

I am beginning to wonder if this problem arises from the MacOS X

[#73943] collect info about ruby-api — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I have long been longing for a good description of ruby C api.

35 messages 2003/06/19

[#74039] WxRuby status? — ptkwt@...1.aracnet.com (Phil Tomson)

14 messages 2003/06/20
[#74507] Re: WxRuby status? — Richard Kilmer <rich@...> 2003/06/26

Things are progressing great. Kevin Smith has taken the development

[#74070] How to test if a file exists? — Daniel Carrera <dcarrera@...>

-----BEGIN PGP SIGNED MESSAGE-----

12 messages 2003/06/21

[#74096] Exasperated with ruby/tk - anybody successfully using it? — "Richard Browne" <richb@...>

General question: Is ruby/tk still being maintained in 1.7/1.8 or is it

10 messages 2003/06/22

[#74104] String#decorate — martindemello@... (Martin DeMello)

When chaining methods, it'd be neat to have something that was passed

17 messages 2003/06/22

[#74156] Marshal bug? — Anders Borch <spam@...>

Hi!

15 messages 2003/06/23
[#74161] Re: Marshal bug? — Dave Thomas <dave@...> 2003/06/23

Anders Borch wrote:

[#74205] can't find appropriate regexp — "Patrick Zesar" <jonnypichler@...>

spamassassin blocked my previous post :-((((

17 messages 2003/06/23

[#74279] Ruby Developer's Guide - hurt book sale — dennis@... (Dennis Sutch)

Syngress Publishing is having a hurt book sale. Per Syngress

11 messages 2003/06/24

[#74379] protect parents from children — "Simon Strandgaard" <0bz63fz3m1qt3001@...>

I fell into these pitfalls yesterday.. that a child was modifying a parent!

27 messages 2003/06/25

[#74413] Ruby/Java integration through JNI: working implementation — Mauricio Fern疣dez <batsman.geo@...>

14 messages 2003/06/25
[#74436] Re: Ruby/Java integration through JNI: working implementation — D T <tran55555@...> 2003/06/25

Yet An other JRuby ?? :-)

[#74465] DBD for Oracle9i — Jim Cain <list@...>

Hi all. I was looking for a Ruby interface to 9i that would handle all

25 messages 2003/06/25

[#74478] RPM for 1.8.0 — John Carter <john.carter@...>

I would like to get / build a Mandrake 9.1 RPM for Ruby-1.8.0 Preview 3

17 messages 2003/06/26

[#74506] String#split(' ') and whitespace (perl user's surprise) — mike@... (Mike Stok)

I have to confess that I use a lot of Perl, and some of its idioms are

15 messages 2003/06/26

[#74573] Using & for arrays of objects — "Krishna Dole" <kpdole@...>

Hi,

39 messages 2003/06/27

[#74579] why can't I use $3somevar for global variable in ruby 1.8.0? — Donglai Gong <donglai@...>

Hi, I'm new to Ruby programming and I just upgraded from 1.6.8 to 1.8.0

10 messages 2003/06/27

[#74702] Slides from my talk are up on rubyhacker.com — "Hal E. Fulton" <hal9000@...>

I was pleased to attend the European Ruby Conference

25 messages 2003/06/29

[#74706] Help with UnboundMethod#bind error — gabriele renzi <surrender_it@...1.vip.lng.yahoo.com>

Hi gurus and nubys,

16 messages 2003/06/29
[#74708] Re: Help with UnboundMethod#bind error — nobu.nokada@... 2003/06/29

Hi,

[#74732] Re: Help with UnboundMethod#bind error — matz@... (Yukihiro Matsumoto) 2003/06/30

Hi,

[#74919] Re: Help with UnboundMethod#bind error — "Pit Capitain" <pit@...> 2003/07/02

On 30 Jun 2003 at 17:18, Yukihiro Matsumoto wrote:

[#74717] Re: Message catalogs (I18N) overnight hack... — "Hal E. Fulton" <hal9000@...>

----- Original Message -----

17 messages 2003/06/29

[#74747] Editor like Textpad on Linux? — Dominik Werder <dwerder@...>

Hello,

13 messages 2003/06/30

[#74768] dynamic object creation — Aryeh Friedman <aryeh@...>

If I have something like this:

15 messages 2003/06/30

Re: HTML -> list of sentences? (semi-impossible task)

From: Dave Oshel <dcoshel@...>
Date: 2003-06-12 14:29:23 UTC
List: ruby-talk #73364
It depends on what you mean by "sentence", 'ey?  Do you mean natural 
language (English? Rumanian? Urdu? Hakka? Thai? Japanese?), or 
artificial formalisms like programming languages (Perl, Ruby, FORTH)?

But someone went to a lot of trouble to carve up their perceptions of 
reality (heh) into procrustean HTML, so you may as well begin there.  
Determine the major syntactical units  (TABLE, DIV, P, HR, PRE, TT, H1, 
etc.).  Recursing, determine what is a "sentence" on semantic, 
idiomatic (BR, B, U), or at least grammatical  (カ、ネー、ニ、ヘ、。。。), grounds. 
  Collect these purely formal "sentences" and send the list to 
post-processing (possibly human inspection) to be vetted and refined 
(e.g., does your system account for utterances which are meaningful but 
grammatically abbreviated, like "What up?" (MTV argot used by 
advertisers to slide nickels out of pockets) or "Annta desu" (kids 
choosing sides for oni in Osaka). )

If you have access to a page's CSS, your hints about what the author(s) 
intended are much expanded.  Maybe not so impossible after all?  This 
does not seem like a difficult task to me, but maybe I haven't 
appreciated the context from which the question is posed?  Does the 
solution have to be extremely general, or is it a one-shot?

David


On Wednesday, June 11, 2003, at 09:38  PM, Hal E. Fulton wrote:

> Hello, all.
>
> Here's an idea I'm toying with. Suggestions
> are welcome.
>
> I want to take an HTML document (reasonably
> well-formed, but not guaranteed) and remove
> all the tags from it...
>
> ...and get a list of the *sentences* in the
> document.
>
> There are, of course, several things that make
> this difficult:
>   - need to distinguish between end-of-sentence
>     and embedded punctuation, including both
>     abbreviations and textual references to
>     Ruby methods such as eof? and split!
>   - need to treat sentence fragments as sentences
>   - need to ignore blocks of code
>   - etc.
>
> My current approach is to start with htmlsplit
> from the RAA. This is fairly simplistic, but
> at least it doesn't have any dependencies.
>
> Not sure whether to do it in two steps or not:
> 1. Convert to text
> 2. Process
>
> Might be just as easy to do it in one step if
> I knew what I was doing.
>
> Also not sure what is the best tool/library for
> this job.
>
> Comments welcome.
>
> Hal
>
> --
> Hal Fulton
> hal9000@hypermetrics.com
>
>
>
>
--
David C. Oshel               mailto:dcoshel@mac.com
Cedar Rapids, Iowa       http://homepage.mac.com/dcoshel
``I think most pleasantly in metaphors, and smoking brings metaphors to 
mind." - Augustus Srb, in Alexei Panshin's  _Star Well_


In This Thread