IMAP debate

Derek Martin invalid at pizzashack.org
Tue Oct 28 01:28:29 EST 2003


On Mon, Oct 27, 2003 at 06:09:22PM -0500, bscott at ntisys.com wrote:
> > But the server can use them to speed response time up, and the
> > client can use caching techniques to improve performance on the
> > local end.
> 
>   Ahhh, okay, I see where you're going now.

:)

>   One could say that if you're storing persistent indexes for mbox, then
> you're really not using plain old mbox anymore.  That's a semantic argument,
> and thus not very productive.  I bring it up only because once you start
> playing games with the storage format, you might as well consider all the
> options.

Well, I still say indexing is separate from the storage format, but at
any rate...  When I started writing my code, my intention was to
support all of the most commonly used mail folder formats, including
mbox, maildir, and mh.  So, yes.  :)  As for Cyrus, I don't know
enough about its implementation details to really distinguish it from
maildir.

> Although mbox with separate persistent indexes would have one big
> advantage: Backwards compatibility.  

Indeed...

> Although I imagine that could be another can of worms if you have a
> non-index-aware writer that keeps causing your index to get stale.

This really shouldn't be that bad to deal with.  Sure, you have to do
cache consistency checks, just like you would with any sort of
multi-client cache access (say, SMP CPU caches).  But all that should
amount to is making sure the storage file hasn't been modified more
recently than the index file.  Only rarely should that happen, and
then you simply need to make a call to your update_folder_index()
function (or whatever) to rebuild the cache.
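
To make that concrete, here is a minimal sketch in Python, assuming the
index lives in its own file next to the mbox.  The names index_is_stale()
and open_folder() are hypothetical, and update_folder_index() is only a
stub standing in for whatever really rebuilds the cache.

```python
import os

def index_is_stale(mbox_path, index_path):
    """Return True if the index is missing or older than the mbox file."""
    try:
        index_mtime = os.stat(index_path).st_mtime_ns
    except FileNotFoundError:
        return True
    # Stale only if the storage file was modified more recently than the index.
    return os.stat(mbox_path).st_mtime_ns > index_mtime

def update_folder_index(mbox_path, index_path):
    """Stub: a real version would rescan the mbox and record message offsets."""
    with open(index_path, "w") as f:
        f.write("rebuilt\n")

def open_folder(mbox_path, index_path):
    """Rebuild the index only when needed; return True if a rebuild happened."""
    if index_is_stale(mbox_path, index_path):
        update_folder_index(mbox_path, index_path)
        return True
    return False
```

One caveat with this approach: timestamps have finite granularity, so a
write landing in the same tick as the index update could be missed.
Recording the mbox size in the index as well would be a cheap extra check.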

> > In the case of the IMAP client, as you point out yourself in a
> > message to Paul, the on-disk file format is irrelevant.  I'm not
> > really sure why you mentioned it here...
> 
> When you were talking about client caching, I was thinking an IMAP
> client, which made no sense.  Now that I realize you were talking
> about the "client"  of the mailbox (i.e., the actual mailbox storage
> code, be it an IMAP server or a UA like Pine or Mutt), it makes a
> lot more sense.

OK.

[mailbox corruption]
> > The same kind of failures can cause file system corruption.
> 
>   Two words: Journaling filesystem.  :-)

Indeed.

> > Data corruption happens.
> 
> That is not an acceptable attitude for many.  Nor should it be, IMO.  

In principle I would agree, but be practical: we're talking about
e-mail here.  Those who live and die by it save copies of everything
remotely important they send and receive.  You can have their mailbox
restored in 10 minutes if it's important enough, and anything
important they might have received in between your nightly back-up and
the crash should be in the sender's sent folder.

We're also talking about a disk failure.  Few other situations can
occur (discounting programming errors in the client) which would cause 
such folder corruption.  So in any event, restoring from back-ups will
be NECESSARY.  Either way (i.e. with mbox or maildir) you're going to
have to recover recently sent messages manually...

Of course, if you're on RAID, you can simply replace the drive and
rebuild the array.  No data corruption.

I still think this really isn't an issue worth worrying about.  If
your system administration team is bad enough that you lose mail this
way, you probably have much worse problems...

> Microsoft Exchange, for example, deals with this particular problem very
> well, by using a journaled database for storage (Exchange has a host of
> other problems, of course, but that's not the point I'm making here).

Perhaps so, but that feature may also be part of the cause of some of
those other problems you mentioned.  It requires lots of overhead, and
is probably at least partly why hotmail needed 5 Exchange servers for
every sendmail server (running on much slower hardware, IIRC) which
they replaced...  Sometimes, it's worth it to have a little less
security.

> > True enough, but the same principle holds.  Earlier messages tend to be
> > saved.  Later messages tend to be deleted.  There will always be
> > exceptions, but they will be relatively infrequent.
> 
>   I'm not so sure about that.  For example, I know of many end-users at
> various customers who retain the last, e.g., six months of email in their
> inbox (i.e., FIFO).  Maybe they're exceptions; I dunno.  But do you really
> have any hard evidence for your claims, or are you just assuming?

HARD evidence?  No.  I have only anecdotal evidence based on my own
observations of how the vast majority of the thousands of users I
have supported since about 1995 used their e-mail.

As a side note, I'm curious: how do those users retain only the last 6
months of e-mail?  Is that a feature their client provides?

> >> All I really know is this: In a production environment, with an IMAP
> >> server, given the way most companies use email these days, UW-IMAP+mbox
> >> is orders of magnitude worse than Cyrus.
> > 
> > And there, I can not argue with you.  But the fault is a bad program, not
> > a bad data format.
> 
>   I think it's both.  Even if we accept that the mail software suite you
> keep threatening to write solves most of the problems, you still have to do
> more I/O with mbox than with Cyrus, 

Mostly with expunge, which is a relatively infrequent operation (as
compared to say, receiving new messages, marking messages for
deletion, or opening a mail folder).  Yes, it's more I/O, but if you
have some other reason to keep mbox around (many people feel that they
do), it's a price worth paying.

> and you still have locking issues for mail delivery (which can be
> significant with a big inbox).

You need to lock, but so long as you don't allow REMOTE access to the
mail spools (i.e. NFS), this just isn't a big issue.  The only thing
that makes it a problem is that remote filesystems generally fall down
with locking.  So long as the MDA and MUA (or in this case, the IMAP
server) implement locking correctly, and no remote filesystem access
is allowed to the spools, locking isn't a problem.  It must be done,
but doing it isn't a problem.
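
For a local spool, the kernel's advisory locks are enough.  A minimal
sketch in Python using fcntl.lockf(), assuming every writer (MDA and IMAP
server alike) cooperates by taking the same lock; append_message() is a
hypothetical helper, and real MDAs often combine this with dot-locking.

```python
import fcntl
import os

def append_message(mbox_path, raw_message):
    """Append one message to an mbox while holding an exclusive advisory lock."""
    with open(mbox_path, "ab") as f:
        # Blocks until any other cooperating process releases its lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.seek(0, os.SEEK_END)
            f.write(raw_message)
            if not raw_message.endswith(b"\n"):
                f.write(b"\n")
            f.write(b"\n")          # blank line separates messages
            f.flush()
            os.fsync(f.fileno())    # push the data to disk before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

The lock is only advisory: a writer that never asks for it is not stopped,
which is why both ends have to implement it correctly.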

> Maybe, with the right code, mbox can
> be made to suck a lot less, but it still seems like you're trying to
> improve upon a bad idea to me.  Why bother?  :-)

Three reasons:

  - backward compatibility
  - many people still LIKE mbox, for various reasons
  - I still say that there's nothing inherently wrong with mbox...
    only with how people implemented it and use it.  It has lasted 30
    years and is still in widespread use, despite availability of a
    number of "better" alternatives.  It can't be all THAT bad.

>   And, of course, from a strictly pragmatic standpoint, I don't know if
> there are any IMAP+mbox servers that don't suck.

None that I know of, granted.  :)

> Given an ideal implementation world, about the only thing I can think of
> that mbox would truly be better at would be a sequential search through the
> entire mailbox.  Say, for example, if the user wants to find all messages
> with the word "foobar" in them.  With mbox, that's just a big read on one
> file.  With Cyrus, you're doing a potentially huge number of open/close
> operations.  That would suck.

Anything sequential would be faster with mbox.  If you allow any
sort of live access to the spools (cyrus or mbox), you'll run into the
problem that traditional clients will (generally) know nothing about
their indexes, and hence cache coherency will break periodically.
Rebuilding the index to fix coherency will also be faster with mbox.
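
To illustrate why the rebuild is cheap with mbox: the whole index can be
regenerated in one sequential pass over one file.  A sketch in Python,
assuming the classic convention that a message starts at a line beginning
with "From " which is either the first line of the file or follows a blank
line (and that body lines starting with "From " were escaped on delivery).
build_offset_index() is a hypothetical name.

```python
def build_offset_index(mbox_path):
    """Return the byte offset of each message in an mbox, in one sequential read."""
    offsets = []
    position = 0
    previous_blank = True   # the start of the file counts as a boundary
    with open(mbox_path, "rb") as f:
        for line in f:
            if previous_blank and line.startswith(b"From "):
                offsets.append(position)
            previous_blank = line in (b"\n", b"\r\n")
            position += len(line)
    return offsets
```

With a one-file-per-message layout, the equivalent rebuild has to open and
close every message file instead.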

Maybe we should make mailbox drivers for the operating system, huh?
;-)

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.
Replying to it will result in undeliverable mail.
Sorry for the inconvenience.  Thank the spammers.
