
Message Delimiters


Jeanne Fromer

Jun 28, 1996, 3:00:00 AM
Hi, I'm trying to take a spool on a UNIX machine and parse the messages
into a data structure. I'm having problems finding a good rule that
can find the boundary between 2 messages. All messages in the spool
start with:

From email-address time

after a blank line, but you could easily imagine someone typing such
a line in the body of a message. I notice that commercial mailers can
determine the delimiter/boundary between 2 messages. Does anyone have
any suggestions as to how I can go about this?

Please mail your replies as I don't check this newsgroup very often.
I will compile the answers I receive and summarize them on the group
in case people are interested.

Thank you very much.

Jeannie Fromer

Bill Coutinho

Jul 2, 1996, 3:00:00 AM
to Jeanne Fromer

Jeanne Fromer wrote:
> All messages in the spool
> start with:
>
> From email-address time
>
> after a blank line,

This is the right delimiter. When sending mail, the mailers avoid lines in the form
"From ..." in the body of the message by putting a ">" in front of them: ">From ...".
--
Bill Coutinho mailto:bi...@correionet.com.br
Dextra - Solutions for ISPs http://www.correionet.com.br/dextra/
Campinas, SP voice:+55-19-251-3644
Brazil fax:+55-19-253-0440
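
A minimal sketch of the quoting described above, assuming the message body is held as a single string with "\n" line endings. The function name quote_from_lines is made up for illustration; real delivery agents differ in detail.

```python
def quote_from_lines(body):
    """Prefix '>' to every body line that begins with 'From ' (From-space)."""
    quoted = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "\n".join(quoted)


body = "Hello,\n\nFrom here on things get tricky.\nBye\n"
print(quote_from_lines(body))
```

Only lines that start with the five characters "From " are touched; "From:" header-style lines and "From" in the middle of a line pass through unchanged.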

Eric Tse

Jul 3, 1996, 3:00:00 AM

In article <31D90B...@correionet.com.br>,

Bill Coutinho <bi...@correionet.com.br> wrote:
>Jeanne Fromer wrote:
>> All messages in the spool
>> start with:
>>
>> From email-address time
>>
>> after a blank line,
>
>This is the right delimiter. When sending mail, the mailers avoid lines in the form
>"From ..." in the body of the message by putting a ">" in front of them: ">From ...".

A related question :
Is this delimiter universal (is it true on _all_ e-mailboxes on the
Internet)? Do different e-mailbox formats employ different delimiters?
Thanks in advance.

Eric
--
--
Eric Tse - E-mail : jye...@undergrad.math.uwaterloo.ca
--- ``Life is too short to be taken seriously.'' ---

Russ Allbery

Jul 4, 1996, 3:00:00 AM

In comp.mail.headers, Eric Tse <jye...@undergrad.math.uwaterloo.ca> writes:

>> Jeanne Fromer wrote:

>>> All messages in the spool
>>> start with:
>>>
>>> From email-address time
>>>
>>> after a blank line,

> Is this delimiter universal (is it true on _all_ e-mailboxes on the
> Internet)? Do different e-mailbox formats employ different delimiters?

No, it's not universal. MMDF, for example, uses four Control-As. But
it's pretty close to universal. Most packages and scripts I write assume
"From " as a delimiter; I only bother to support other delimiters when I'm
feeling like being extremely complete.
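
For comparison, here is a rough sketch of splitting an MMDF-style mailbox. It assumes each message is both preceded and followed by a line holding exactly four Control-A (0x01) characters; split_mmdf is a hypothetical helper name, not part of any mail package.

```python
MMDF_DELIM = "\x01\x01\x01\x01"


def split_mmdf(text):
    """Return the messages found between pairs of four-Control-A lines."""
    messages = []
    current = None                      # None while we are outside a message
    for line in text.split("\n"):
        if line == MMDF_DELIM:
            if current is None:
                current = []            # opening delimiter
            else:
                messages.append("\n".join(current) + "\n")
                current = None          # closing delimiter
        elif current is not None:
            current.append(line)
    return messages
```

Because the delimiter cannot occur in ordinary text, no quoting of "From " lines is needed in this format.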

Followups to comp.mail.misc, since mailbox delimiters are not headers.

--
Russ Allbery (r...@cs.stanford.edu) <URL:http://www.eyrie.org/~eagle/>

Phil Edwards

Jul 4, 1996, 3:00:00 AM

Note: all this stuff has been crossposted to c.m.headers and c.m.misc.
Probably time to trim one of them.


With <Dtzpt...@undergrad.math.uwaterloo.ca>,
it seems Eric Tse (jye...@undergrad.math.uwaterloo.ca) sez:

+ In article <31D90B...@correionet.com.br>,
+ Bill Coutinho <bi...@correionet.com.br> wrote:
+ >
+ >This is the right delimiter. When sending mail, the mailers avoid lines in the form
+ >"From ..." in the body of the message by putting a ">" in front of them: ">From ...".
+
+ A related question :
+ Is this delimiter universal (is it true on _all_ e-mailboxes on the
+ Internet)? Do different e-mailbox formats employ different delimiters?
+ Thanks in advance.


Eric,

You've misread the post. The delimiter is placed when /sending/ mail,
i.e., during the actual transmission of the message. The receiving end
removes the delimiter.

It's "universal" on all RFC822/823-compliant Internet hosts. This
probably excludes a great many third-party commercial vendors, and most of
MicroSloth. But it doesn't have anything to do with how the messages are
/stored/.


Luck++;
Phil

--
pedw...@cs.wright.edu http://www.cs.wright.edu/people/students/pedwards/
The gods do not protect fools. Fools are
protected by more capable fools. -Larry Niven

Brian Kantor

Jul 5, 1996, 3:00:00 AM

jye...@undergrad.math.uwaterloo.ca (Eric Tse) writes:
> Is this delimiter universal (is it true on _all_ e-mailboxes on the
>Internet)? Do different e-mailbox formats employ different delimiters?

No, it's not universal. It's very common on Unix systems, but not
everywhere. MH, a popular mail agent, stores messages each in their
own separate files, as one example.
- Brian
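
A small sketch of the one-file-per-message layout, assuming an MH-style folder is a directory whose messages are files named by decimal number. The helper read_mh_folder and the demo file contents are invented for illustration.

```python
import tempfile
from pathlib import Path


def read_mh_folder(folder):
    """Return (number, text) pairs for the numerically named files in folder."""
    entries = []
    for path in Path(folder).iterdir():
        if path.is_file() and path.name.isdecimal():
            entries.append((int(path.name), path.read_text(errors="replace")))
    return sorted(entries)


# Demo on a throwaway directory; non-numeric files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("1", "first\n"), ("2", "second\n"), ("10", "tenth\n")]:
        Path(tmp, name).write_text(text)
    Path(tmp, ".mh_sequences").write_text("unseen: 10\n")
    demo = read_mh_folder(tmp)

print([number for number, _ in demo])
```

With this layout there is no delimiter to find and nothing in the body needs quoting.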

Paul Hethmon

Jul 8, 1996, 3:00:00 AM

:>+ >This is the right delimiter. When sending mail, the mailers avoid lines in the form
:>+ >"From ..." in the body of the message by putting a ">" in front of them: ">From ...".
:>+
:>+ A related question :
:>+ Is this delimiter universal (is it true on _all_ e-mailboxes on the
:>+ Internet)? Do different e-mailbox formats employ different delimiters?
:>+ Thanks in advance.

:>
:>
:>Eric,
:>
:>You've misread the post. The delimiter is placed when /sending/ mail,
:>i.e., during the actual transmission of the message. The receiving end
:>removes the delimiter.

The "From " delimiter is a Unix/sendmail thing; it's not part of the protocol.
When sending mail, RFC 821 is used, and what comes in the DATA phase doesn't
really matter. The only thing you have to do is byte-stuff for transparency.
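
A sketch of the byte stuffing mentioned here, assuming the RFC 821 transparency rule: a line that begins with "." gets one extra "." on the wire, and a line holding a single "." ends the data. The function names are made up, and the body is assumed to use CRLF line endings.

```python
def dot_stuff(body):
    """Prepare a CRLF-terminated body for the SMTP DATA phase."""
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()                      # the body already ended with CRLF
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\r\n".join(stuffed) + "\r\n.\r\n"


def dot_unstuff(data):
    """Undo dot_stuff on the receiving side."""
    if not data.endswith("\r\n.\r\n"):
        raise ValueError("missing end-of-data marker")
    lines = data[:-len(".\r\n")].split("\r\n")
    lines.pop()                          # empty string after the final CRLF
    plain = [line[1:] if line.startswith(".") else line for line in lines]
    return "\r\n".join(plain) + "\r\n"
```

Note that a body line starting with "From " travels unchanged; only leading dots are special during transmission.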

:>It's "universal" on all RFC822/823-compliant Internet hosts. This
:>probably excludes a great many third-party commercial vendors, and most of
:>MicroSloth. But it doesn't have anything to do with how the messages are
:>/stored/.

It's actually only used as a storage delimiter for sendmail, and probably some
other Unix implementations. Since the messages are all stored in a single
file, something has to be unique to delimit them.


Paul Hethmon
phet...@utk.edu
------------------------------------------------------------
Computerman -- Agricultural Policy Analysis Center
------------------------------------------------------------
NeoLogic FTP & Mail Servers
------------------------------------------------------------
Knoxville Warp Home Page: http://apacweb.ag.utk.edu/os2
------------------------------------------------------------


Message has been deleted

Graham Minchin

Jul 12, 1996, 3:00:00 AM

Paul Hethmon (phet...@utk.edu) wrote:
:
: It's actually only used as a storage delimiter for sendmail, and probably some
: other Unix implementations. Since the messages are all stored in a single
: file, something has to be unique to delimit them.

It's actually called the Berkeley file format, although it's possibly the
simplest file format you've ever come across (apart from text), because it
consists of concatenating all the messages together and just saving it as
a text file. It's definitely used by Pine and possibly the other popular
mail readers too, such as elm. The "From " line actually has a more
complex structure, the date and time have to be stored in a specific
format as well (with quite a few variations allowed, I believe) but Pine
converts lines starting with "From " to ">From " so that programs which
can't be bothered testing the format of the whole line will still
correctly parse the folder.

Graham

Graham Minchin

Jul 13, 1996, 3:00:00 AM

Jamie Zawinski (j...@netscape.com) wrote:
: > although it's possibly the simplest file format you've ever come
: > across (apart from text), because it consists of concatenating all the
: > messages together and just saving it as a text file.
:
: No, it's not that simple.
: There are actually two different, incompatible formats that use
: delimiter lines that begin with "From ", and it's impossible to
: programmatically tell them apart. (And it's not that simple in
: either of them.)
I stand corrected! But how is it more complex than just putting your
messages in one after the other? Surely the only change is when some
program tries to put a content-length field in a header that didn't have
one originally.

: You imply here that heavier parsing of that line is desired, or even
: possible, neither of which is true. Here's some stuff I wrote last time
: this came up:
Well, I guess what I was trying to imply was that it's pretty much
impossible to parse all files because there are lots of variations. I
know that's not what I actually said though.
What I will say is that I've written a program which parses the "From "
lines that Pine puts in its folders and my program hasn't made a mistake
yet. It probably wouldn't work with other mail programs but at least
Pine is consistent, which is good enough for me because I can trust my own
parser.
I admit that heavier parsing is impractical, but I think it *would* be
desirable, because it's fairly obvious from human inspection which lines
would be message delimiters and which are meaningful text.

: But, here's the good news, there *is* no true specification of this file
: format, just a collection of word-of-mouth behaviors of the various
: programs over the last few decades which have used that format.
That's hardly good news, in any context! :)

: Essentially the only safe way to parse that file format is to consider
: all lines which begin with the characters "From " (From-space), which
: are preceded by a blank line or beginning-of-file, to be the division
: between messages. That is, the delimiter is "\n\nFrom .*\n" except for
: the very first message in the file, where it is "^From .*\n".

From considering some fairly simple examples, you can see that this is
hardly "safe" on its own, the mangling you mentioned is also necessary. I
don't think this is ideal, but I guess it's the best one can do.
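
The quoted rule can be sketched as below, assuming the whole mailbox fits in one string. It also shows why the rule is not safe on its own: an unquoted paragraph that starts with "From " after a blank line is taken for a new message. The name split_mbox and the sample addresses are illustrative only.

```python
import re

# A delimiter is a line starting with "From " at the very start of the
# file or directly after a blank line.
DELIM = re.compile(r"(?:\A|(?<=\n\n))From .*\n")


def split_mbox(text):
    """Cut text at every delimiter line; each piece starts with its 'From ' line."""
    starts = [match.start() for match in DELIM.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end] for start, end in zip(starts, ends)]


good = ("From a@x Thu Jun 27 12:00:00 1996\nSubject: one\n\nbody\n\n"
        "From b@y Fri Jun 28 12:00:00 1996\nSubject: two\n\n>From the start\n\n")
bad = "From a@x Thu Jun 27 12:00:00 1996\n\nHi\n\nFrom here on it breaks\n"

print(len(split_mbox(good)))   # two real messages
print(len(split_mbox(bad)))    # one real message, but the splitter sees more
```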

Graham

Message has been deleted
Message has been deleted

Jason Adams

Jul 26, 1996, 3:00:00 AM

Graham Minchin (gtm...@tartarus.uwa.edu.au) wrote:
: : You imply here that heavier parsing of that line is desired, or even
: : possible, neither of which is true. Here's some stuff I wrote last time
: : this came up:
: Well, I guess what I was trying to imply was that it's pretty much
: impossible to parse all files because there are lots of variations. I
: know that's not what I actually said though.
: What I will say is that I've written a program which parses the "From "
: lines that Pine puts in its folders and my program hasn't made a mistake
: yet. It probably wouldn't work with other mail programs but at least
: Pine is consistent, which is good enough for me because I can trust my own
: parser.
: : all lines which begin with the characters "From " (From-space), which
: : are preceded by a blank line or beginning-of-file, to be the division
: : between messages. That is, the delimiter is "\n\nFrom .*\n" except for
: : the very first message in the file, where it is "^From .*\n".

: From considering some fairly simple examples, you can see that this is
: hardly "safe" on its own, the mangling you mentioned is also necessary. I
: don't think this is ideal, but I guess it's the best one can do.

I hope you gentlemen don't mind me butting in, but it seems that you're
talking about a dilemma that I am struggling with and would appreciate some
help with. If one parses with "'\n\nFrom ' whatever \n", what happens when
someone has a paragraph in the body that starts with "From " and the
preceding line is blank? Do you suggest parsing the whatever portion to
determine if it is a qualified address, then comparing it to the "From: "
header field and checking for a match? Now you have to deal with the file
position indicator. Or do you?

The reason I suggest the comparison of the addresses is because it does
not seem so unlikely as to never happen that someone might indeed start a
paragraph after a blank line with:

"From so...@dada.edu I heard ... etc."

As I sit here staring at what I just wrote, the light has come on! There is
a high probability that the address in a paragraph will be followed by a
space and/or text, not a \n char. I think I have a formula now, albeit a
painfully long one, but a relatively safe one. If you guys have a simpler one,
outside of asking that they include a delimiter char in RFC 822 (something I
can't understand why it wasn't done), then please let me in on it.

--
Jason Adams

Life improves slowly and goes wrong fast, and only catastrophe is clearly
visible. Edward Teller

Zefram

Jul 28, 1996, 3:00:00 AM

Jason Adams <jad...@mosquito.frcc.cccoes.edu> wrote:
> If one parses with "'\n\nFrom ' whatever \n" what happens when
>someone has a paragraph in the body that starts with "From " and the
>preceding line is blank?

For the third time in two weeks...

There are two basic possibilities. In the original BSD mailbox format,
if "\n\nFrom " occurs in the body of a message, it is mangled to
"\n\n>From ". Consequently if you are only reading a mailbox, you
don't need to worry about the possibility. The second possibility is
the SysV Content-Length format, in which a Content-Length: header is
added to each message, indicating the relative offset of the next
message. In this case "\n\nFrom " can occur in the body of a message,
but as you shouldn't be looking for the magic string there it doesn't
matter.
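
A rough sketch of the second approach, assuming every message carries a correct Content-Length: header giving the body size in bytes and that the mailbox is read as bytes. The function name is hypothetical, header continuation lines are ignored, and a wrong length would derail the reader.

```python
def split_by_content_length(data):
    """Split a bytes mailbox by trusting each message's Content-Length header."""
    messages = []
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            raise ValueError("expected a 'From ' line at offset %d" % pos)
        header_end = data.find(b"\n\n", pos)
        if header_end == -1:
            raise ValueError("headers are not terminated by a blank line")
        length = None
        for line in data[pos:header_end].split(b"\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii"))
        if length is None:
            raise ValueError("no Content-Length header")
        body_end = header_end + 2 + length      # skip the body without scanning it
        messages.append(data[pos:body_end])
        pos = body_end
        while data.startswith(b"\n", pos):      # blank separator line(s)
            pos += 1
    return messages


body1 = b"Hi\n\nFrom here on, unquoted.\n"
body2 = b"bye\n"
box = (b"From a@x Thu Jun 27 12:00:00 1996\nContent-Length: %d\n\n" % len(body1)
       + body1 + b"\n"
       + b"From b@y Fri Jun 28 12:00:00 1996\nContent-Length: %d\n\n" % len(body2)
       + body2 + b"\n")
print(len(split_by_content_length(box)))
```

The unquoted "From here on" paragraph in the first body is never examined, which is the point of the length-based scheme.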

The more popular format seems to be the BSD format. This is mostly a
Good Thing; it's simpler and easier to implement, and doesn't rely on
particular mail headers, but does lose some information. (The
Content-Length format theoretically loses much more information, but
it's always a usually-unimportant header.)

-zefram

Tim Goodwin

Aug 1, 1996, 3:00:00 AM

In article <1996Jul28....@dcs.warwick.ac.uk>,

Zefram <A.M...@dcs.warwick.ac.uk> wrote:
>The more popular format seems to be the BSD format. This is mostly a
>Good Thing; it's simpler and easier to implement, and doesn't rely on
>particular mail headers, but does lose some information.

It needn't. You can do transparent character stuffing in a backward
compatible way by making the substitution

s/^(>*From )/>\1/

when you store a message in an mbox file and

s/^>(>*From )/\1/

when you read it. (In English, at the beginning of a line, a sequence
of zero or more `>' characters followed by the sequence `From ' has one
extra '>' prepended.)

Programs that don't understand this format will not remove the extra `>'
(generally, they don't strip it even where there's only one, because
they can't tell whether it is part of the message or just stuffing), but
they will otherwise process the mbox correctly.
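
The two substitutions translate directly to Python's re module; this is a sketch assuming the message body is one string with "\n" line endings, and the function names are invented here.

```python
import re


def mboxrd_quote(body):
    """Store side: add one '>' before 'From ', '>From ', '>>From ', ..."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)


def mboxrd_unquote(stored):
    """Read side: remove exactly one '>' from '>From ', '>>From ', ..."""
    return re.sub(r"^>(>*From )", r"\1", stored, flags=re.MULTILINE)


original = "From a\n>From b\n>>From c\nFrom: header-like\nplain\n"
stored = mboxrd_quote(original)
print(stored)
print(mboxrd_unquote(stored) == original)
```

Since every affected line gains exactly one ">" on the way in and loses exactly one on the way out, the pair is reversible.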

Of course, all mbox formats are inherently insecure (consider what
happens if the system crashes halfway through writing a message) and any
mail program that takes itself seriously will not use them, except
when necessary to communicate with more frivolous programs.

Tim.
--
Tim Goodwin | "USENET, of course, is a pure and unadultered source
Cambridge, UK | of truth and wisdom." -- Richard Kettlewell

Message has been deleted

D. J. Bernstein

Aug 2, 1996, 3:00:00 AM

Jamie Zawinski <j...@netscape.com> wrote:
[ about mboxrd parsing ]
> No, you can't,

Yes, you can.

If you feed an mboxo to an mboxo reader, you will get something that
_might_ match the original message---say, 99% success.

If you feed an mboxo to an mboxrd reader, you will get something that is
_more likely_ to match the original message---say, 99.99% success.

That's an improvement.

This also offers the possibility of moving to 100% success---just switch
the writer from mboxo to mboxrd too.
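
The trade-off can be illustrated line by line. This sketch assumes an mboxo reader shows stored lines untouched (one plausible behavior, as readers differ) and uses made-up function names; the percentages above are the poster's estimates and are not reproduced here.

```python
import re


def write_mboxo(line):
    """mboxo writer: quote only a bare 'From ' line."""
    return ">" + line if line.startswith("From ") else line


def write_mboxrd(line):
    """mboxrd writer: quote 'From ', '>From ', '>>From ', ..."""
    return re.sub(r"^(>*From )", r">\1", line)


def read_mboxo(line):
    """Assumed mboxo reader: display the stored line as-is."""
    return line


def read_mboxrd(line):
    """mboxrd reader: strip one '>' from a quoted From line."""
    return re.sub(r"^>(>*From )", r"\1", line)


originals = ["From the top", ">From a quoted reply", "plain text"]
for original in originals:
    stored = write_mboxo(original)
    print(repr(original),
          "mboxo reader ok:", read_mboxo(stored) == original,
          "mboxrd reader ok:", read_mboxrd(stored) == original)
```

Each reader gets one of the two awkward lines wrong when the writer is mboxo; the claim in the thread is that the "From the top" case is far more common in real mail. With write_mboxrd on the storing side, read_mboxrd recovers all three lines.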

> It may be a better format, but it is *not* the
> same format in use by the extant MUAs that claim to use either of the
> two "mbox" variants, nor is it completely compatible with them.

Actually, there are at least seven different mbox formats. (Not counting
variations in the ``remote'' syntax and in the addition of a blank line
to a message that already ends with a blank line.)

> If you're going to invent your own format, you might as well design it
> reasonably from the start instead of making "just one small incompatible
> tweak" to the existing format.

This one small tweak is about the best you can do while _preserving_
compatibility for essentially every message. It's a reversible format.
Switching to it is attractive for mboxo writers and for mboxo readers.

---Dan

Zefram

Aug 3, 1996, 3:00:00 AM

Tim Goodwin <t...@pipex.net> wrote:
>It needn't. You can do transparent character stuffing in a backward
>compatible way by making the substitution
>
> s/^(>*From )/>\1/

Oh yes, you *can*, but that's not what actual implementations do. The
Berkeley mailbox format does not include this transformation, and
therefore loses information.

-zefram

D. J. Bernstein

Aug 4, 1996, 3:00:00 AM

Zefram <A.M...@dcs.warwick.ac.uk> wrote:
> Oh yes, you *can*, but that's not what actual implementations do.

It is what some implementations do, and it's going to spread: it's very
easy to implement, it doesn't cause any damage, and it offers visible
benefits.

Here's what you have to do. On the writing side, insert ">" if

!strncmp(line + strspn(line,">"),"From ",5)

On the reading side, skip one character if

(*line == '>') && !strncmp(line + strspn(line,">"),"From ",5)

That's it.

---Dan

Message has been deleted

D. J. Bernstein

Aug 5, 1996, 3:00:00 AM

Jamie Zawinski <j...@netscape.com> wrote:
> That it "does no damage" is completely false: if you have a
> properly-formatted mailbox which uses the 20-year-old BSD format, and
> you allow it to be rewritten by a program which does this *new thing*,
> you will not have the same message bodies.

False. Rewriting an mboxo file with an mboxrd reader-writer will produce
exactly the same file.
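
This claim can be checked on body lines with a short sketch. It assumes the rewrite reads each stored body line with the mboxrd rule and writes it back with the mboxrd rule; delimiter lines are handled separately and are not covered. The function names and sample lines are illustrative.

```python
import re


def read_mboxrd(line):
    """Strip one '>' from '>From ', '>>From ', ..."""
    return re.sub(r"^>(>*From )", r"\1", line)


def write_mboxrd(line):
    """Add one '>' to 'From ', '>From ', '>>From ', ..."""
    return re.sub(r"^(>*From )", r">\1", line)


# Body lines as they could appear inside an mboxo file
# (a bare "From " line cannot, since the writer quoted it).
stored_body_lines = [">From the top", ">>From deeper", "plain text", "> From spaced"]
rewritten = [write_mboxrd(read_mboxrd(line)) for line in stored_body_lines]
print(rewritten == stored_body_lines)
```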

Perhaps what you meant to say is that, after you write a message in
mboxo format, an mboxrd reader won't necessarily produce the original
message. THE SAME IS TRUE OF AN MBOXO READER.

An mboxrd reader is, in fact, much more likely to produce the original
message than an mboxo reader. So it _reduces_ the mboxo damage.

> If you people who are espousing this would at least *admit that you are
> using a different and incompatible format*,

Everybody freely admits that mboxrd readers aren't 100% compatible with
an mboxo writer. MBOXO READERS AREN'T 100% COMPATIBLE EITHER.

There is no way for a reader to produce 100% correct results from mboxo,
because information has been permanently lost.

> This notion that "mostly" and "almost always" are words that one can use
> when describing a file format used for something as critical as mail
> storage is typical of the "40% is good enough" Unix mindset.

There is an _existing_ problem---exacerbated, I should note, by the
``10% is good enough'' Netscape mindset that has screwed up millions of
mail messages around the world. mboxrd solves that problem.

> Give it up.

No. mbox corruption _can_ and _will_ be solved.

> Use a new format which allows you to *know* when the
> conversion between the old and new formats is happening,

I do. It's called maildir. However, I don't think mbox writers are going
to disappear any time soon.

---Dan

Message has been deleted

Rahul Dhesi

Aug 9, 1996, 3:00:00 AM

In <1996Aug901....@koobera.math.uic.edu>
d...@koobera.math.uic.edu (D. J. Bernstein) writes:

>mboxrd, a variant popularized by Rahul Dhesi in 1995, also inserts a >
>before any >From_ line, >>From_ line, >>>From_ line, etc.

Dan, I greatly appreciate the credit. Just for fun (and flames), here's
the original posting.

== begin saved posting ==
Date: 24 Jun 1995 08:41:06 GMT
From: Rahul Dhesi <dhesi>
Newsgroups: comp.mail.sendmail,comp.mail.mime
Message-Id: <3sgj32$k...@hustle.rahul.net>
Subject: Re: How can I send mail with the word "From" at the start of a line?

For storing mailboxes in index format, the simplest approach is the
one used by MH: one message per file, and a mailbox is a directory.

For storing messages in a single file, here is a simple scheme that
will remove all ambiguity. It's a combination of old and new
strategies.

1. (old) 'From ...' (standard pattern) at beginning of line (BOL)
begins a message
2. (old) 'From ' at BOL, when part of the message body, is
converted to '>From ' before the message is added into
a mailbox.
3. (new) '>From ', '>>From ', '>>>From ', etc., at BOL, when part of the
message body, are escaped by prepending one '>' before the message
is added into a mailbox.
4. (new) The mail agent always strips out one '>' from any instance of
'>From ', '>>From ', '>>>From ', etc. at BOL, before showing the message
to the user or moving it from a mailbox into any non-mailbox location.

Ok, look at the advantages of this scheme.

1. A mail reader using the above scheme is 100% compatible with
existing mailbox formats.

2. Said mail reader will correctly show 'From ' at BOL in a message
body, by stripping out the superfluous '>' that is added by existing
mail delivery agents.

3. If said mail reader finds occurrences of '>>From ', '>>>From ',
etc., at BOL in a message body, it may unnecessarily strip out one '>'.
In practice this is unlikely to cause problems.

4. Mail readers and delivery agents can be incrementally revised to
use this scheme.
--
Rahul Dhesi <dh...@rahul.net>
"please ignore Dhesi" -- Mark Crispin <m...@CAC.Washington.EDU>
== end saved posting ==

D. J. Bernstein

Aug 9, 1996, 3:00:00 AM

Jamie Zawinski <j...@netscape.com> wrote:
> I really wish you'd stick to standard terminology, or, since there is
> no standard terminology here, define your terms.

mboxo, the original mbox format, inserts a > before any From_ line, adds
a new From_ line on top, and adds a blank line on bottom. (Two blank
lines, if the message ended with a partial final line.)
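
The mboxo definition above can be sketched as a function that builds one mailbox entry. The name mboxo_entry, its parameters, and the asctime-style date in the From_ line are assumptions made for illustration; actual writers vary in how they format that line.

```python
import time


def mboxo_entry(message, sender, when):
    """Return message as an mboxo entry: From_ line, quoted text, blank line."""
    quoted = "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in message.split("\n")
    )
    if not quoted.endswith("\n"):
        quoted += "\n"                  # finish a partial final line
    from_line = "From %s %s\n" % (sender, time.asctime(time.gmtime(when)))
    return from_line + quoted + "\n"    # blank line on the bottom


entry = mboxo_entry("Subject: x\n\nFrom here\nlast", "a@example.org", 0)
print(entry, end="")
```

A message ending in a partial line therefore gets two newlines appended, one to complete the line and one for the blank separator.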

mboxrd, a variant popularized by Rahul Dhesi in 1995, also inserts a >
before any >From_ line, >>From_ line, >>>From_ line, etc.

> But it *won't* show the same bits to the user.

The same as _what_? There are two different items here:

(1) the bits contained in the original message
(2) the bits printed when an mboxo reader displays an mboxo message

An mboxrd reader is more successful than an mboxo reader at #1.

Obviously this means that it doesn't always do #2. You keep pointing
this out. What you're missing is that what the user _wants_ is #1. #2 is
interesting only to the extent it matches #1.

> > Perhaps what you meant to say is that, after you write a message in
> > mboxo format, an mboxrd reader won't necessarily produce the original
> > message. THE SAME IS TRUE OF AN MBOXO READER.

> No.

You mean ``Yes, but.''

> I'm saying that if you're going to destroy a message,

mboxrd is a reversible format.

> So you're claiming that because Netscape mangles "From " to ">From ",
> it's doing something wrong?

Actually, I was alluding to your handling of blank lines.

> But you've already admitted that your proposed solution is not a
> solution, it's only a heuristic

Uh, no. It's a complete two-part solution. It has the added bonus that
each part is beneficial by itself, even before everyone adopts the other
part.

> Once again, there are existing mail files which:

Yes, it's a shame that some messages have been irreversibly destroyed.
Writers should switch to a reversible format asap.

---Dan

Tim Goodwin

Aug 9, 1996, 3:00:00 AM

In article <32099E...@netscape.com>,

Jamie Zawinski <j...@netscape.com> wrote:
>Once again, there are existing mail files which:
>
> 1: faithfully reproduce the contents of the messages in them, and
> 2: happen to have lines that begin with ">From ".
>
>Your new format will not faithfully present those messages to the user.

This is true. However, you have no way of identifying these mail files,
because the "mboxo" format loses information.

>Therefore, it is not a compatible solution.

By "compatible", I mean that mboxo readers understand mboxrd files,
and that mboxrd readers understand mboxo files. You can even have a file with
messages written in the 2 different formats, and both readers will
continue to understand it. You can upgrade any given writer or reader
independently of all the others.

Sure, mail messages will continue to be displayed corrupted till *all*
software has been upgraded. I don't see this as a problem, because
messages were displayed corrupted *anyway*.

Furthermore, I claim that the sequence `From' is considerably more
common (in original messages) than `>From', `>>From', etc., so an mboxrd
reader sees fewer corrupted messages when reading an mboxo format file
than an mboxo reader would.

>You're proposing a new format, and you're saying, "no, it's not 100%
>compatible." But your new format will cause confusion.

I don't believe that occasionally seeing `From' where you should have
seen `>From' is any more confusing than the other way round. Certainly,
none of my users have complained since I started introducing mboxrd
format over two years ago.

> Because this
>new format has the same problem as all the other formats, which is that
>you cannot examine the file and know which format it is.

No no no. The problem with mboxo is that it *destroys* *information*.
By using an mboxrd writer, you avoid destroying that information, and
you can subsequently retrieve it with an mboxrd reader.

Granted, there's no way to tell an mboxo file from an mboxrd file. But
why do you want to? If your goal is to present corrupted messages on as
few occasions as possible, then, given an mbox file of unknown origin,
your best strategy as a reader is to assume that it is in mboxrd format.
(Per my claim above, this will usually produce better results even if it
isn't.) From this, it follows that your best strategy as a writer is
always to use mboxrd format.

[ I'm using Dan's terminology: "mboxo" is the original mbox format,
using the substitution s/^From />From /; "mboxrd" is Rahul Dhesi's
format which uses s/^(>*)From /\1>From /. Incidentally, I invented this
independently, and first implemented it on 1995-04-04; I'm delighted
that others have had the same idea and the meme is spreading. ]

Rahul Dhesi

Aug 10, 1996, 3:00:00 AM

In <4uffqh$n...@wave.news.pipex.net> t...@pipex.net (Tim Goodwin) writes:

>[ I'm using Dan's terminology: "mboxo" is the original mbox format,
>using the substitution s/^From />From /; "mboxrd" is Rahul Dhesi's
>format which uses s/^(>*)From /\1>From /. Incidentally, I invented this
>independently, and first implemented it on 1995-04-04; I'm delighted
>that others have had the same idea and the meme is spreading. ]

Actually I proposed it on Usenet probably a year before my 1995 posting,
but I didn't save the earlier posting. :-(

Barry Margolin

Aug 11, 1996, 3:00:00 AM

In article <31F95C...@netscape.com>,
Jamie Zawinski <j...@netscape.com> wrote:
>You are trying to solve a different (and useless, and insoluble)
>problem. You are trying to solve the problem of parsing a file format
>where messages are separated by "From " lines, but where *no* quoting of
>interior "From " lines happens, and where *no* length indicators are
>present.
>
>This format you're trying to parse is different from either of the
>formats in use today. It *does not exist*. If it did exist, it could
>only be because of buggy software. I know of no software that writes
>this format.

Actually, I believe that when sendmail writes to archive files (as opposed
to user inbox files) it doesn't perform the s/^From />From /
transformation. So if you try to read an archive file using a mail reader
that just looks for /^From / it will get a number of false hits. Emacs
RMAIL uses a more strict regexp that tries to recognize all the common
ways that the "From " line is written (newlines added to keep the line from
being too long -- all the newlines in the actual pattern are \n):

/^From \([^ \n]*\(\|".*"[^ \n]*\)\|<[^<>\n]+>\) ?\([^ \n]*\) *\([^ ]*\) *
\([0-9]*\) *\([0-9:]*\) *\([A-Z]?[A-Z]?[A-Z][A-Z]\( DST\)?
\|[-+]?[0-9][0-9][0-9][0-9]\|\) * [0-9][0-9]\([0-9]*\) *
\([A-Z]?[A-Z]?[A-Z][A-Z]\( DST\)?\|[-+]?[0-9][0-9][0-9][0-9]\|\) *
\(remote from .*\)?\n/
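
For illustration, here is a much simpler strict matcher in the same spirit. It is not a translation of the RMAIL pattern above; it is a hypothetical pattern that only accepts a sender followed by an asctime-style date with an optional time zone, so it will reject some delimiter lines that real mailers write.

```python
import re

FROM_LINE = re.compile(
    r"^From (\S+) +"                        # envelope sender
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +"        # weekday and month names
    r"\d{1,2} +\d{2}:\d{2}(?::\d{2})? +"    # day of month, hh:mm or hh:mm:ss
    r"(?:[A-Z]{2,4} +|[-+]\d{4} +)?"        # optional time zone
    r"\d{4}$"                               # four-digit year
)


def looks_like_delimiter(line):
    """True if line has the shape 'From sender weekday month day time [zone] year'."""
    return FROM_LINE.match(line.rstrip("\n")) is not None


for candidate in [
    "From a@example.org Thu Jun 27 12:00:00 1996",
    "From a@example.org Thu Jun 27 12:00:00 EDT 1996",
    "From here on it breaks",
]:
    print(candidate, "->", looks_like_delimiter(candidate))
```

A stricter pattern lowers the number of false hits on unquoted body text, at the cost of missing From_ lines written in a variant it does not anticipate.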
--
Barry Margolin
BBN Planet, Cambridge, MA
bar...@bbnplanet.com - Phone (617) 873-3126 - Fax (617) 873-6351
(BBN customers, please call (800) 632-7638 option 1 for support)
