IMAP debate

bscott at ntisys.com
Mon Oct 27 18:09:22 EST 2003


On Wed, 22 Oct 2003, at 3:54pm, invalid at pizzashack.org wrote:
>> I might point out that, for an IMAP server at least, client-side
>> persistent indexes are just about useless.
> 
> But the server can use them to speed response time up, and the client can
> use caching techniques to improve performance on the local end.

  Ahhh, okay, I see where you're going now.

  One could say that if you're storing persistent indexes for mbox, then
you're really not using plain old mbox anymore.  That's a semantic argument,
and thus not very productive.  I bring it up only because once you start
playing games with the storage format, you might as well consider all the
options.  That said, mbox with separate persistent indexes would have one big
advantage: backwards compatibility.  Although I imagine that could open
another can of worms if a non-index-aware writer keeps causing your index
to go stale.
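
  To make the stale-index worry concrete, here is a minimal sketch (not
taken from any real server) of a separate index for an mbox file.  The
function names and the index layout are hypothetical.  The assumption is
that recording the mbox's size and modification time alongside the message
offsets is enough to notice that some other writer touched the file:

```python
import os

def build_index(mbox_path):
    """Scan the mbox once and record where each message starts.

    Hypothetical index layout: a dict holding the file's size and mtime at
    scan time, plus the byte offset of every "From " separator line.
    """
    offsets = []
    pos = 0
    prev_blank = True  # the first line of the file may start a message
    with open(mbox_path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                offsets.append(pos)
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    st = os.stat(mbox_path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "offsets": offsets}

def index_is_stale(mbox_path, index):
    """True if the mbox changed since the index was built (by size or mtime)."""
    st = os.stat(mbox_path)
    return (st.st_size, st.st_mtime_ns) != (index["size"], index["mtime_ns"])
```

  A reader that trusts the index would call index_is_stale() first and
rebuild on a mismatch, which is exactly the full rescan the index was
supposed to avoid.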

  Ehhh.  Nothing is ever simple.  :-)

> In the case of the IMAP client, as you point out yourself in a message to
> Paul, the on-disk file format is irrelevant.  I'm not really sure why you
> mentioned it here...

  When you were talking about client caching, I was thinking of an IMAP
client, which made no sense.  Now that I realize you were talking about the
"client"
of the mailbox (i.e., the actual mailbox storage code, be it an IMAP server
or a UA like Pine or Mutt), it makes a lot more sense.

> Furthermore, the rewrite-copy method is not immune to this problem.  If
> the same sort of interruption were to occur while the copy is being
> re-written to its proper place, you'll still end up with a corrupted mail
> spool.

  Well, if the copy is done sensibly, no.  You write to a new file, unlink
the old one, and link in the new one.  The window of opportunity for
disruption there is very small, and even then, recovery is easily
implemented using a standard file naming scheme.
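
  As a rough illustration of that sequence, here is a minimal sketch.  The
".new" suffix and both function names are hypothetical, and the sketch
assumes the mailbox already exists and that nobody else is writing to it
at the same time (no locking is shown).  It also does not sync the
directory itself:

```python
import os

def rewrite_mailbox(path, data):
    """Replace the existing mailbox at `path` with `data` (bytes)."""
    tmp = path + ".new"  # hypothetical "standard file naming scheme"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # new copy is fully on disk before the swap
    os.unlink(path)           # window opens: no file at `path`
    os.link(tmp, path)        # window closes: new copy is in place
    os.unlink(tmp)

def recover_mailbox(path):
    """Run at startup to clean up after an interrupted rewrite."""
    tmp = path + ".new"
    if not os.path.exists(tmp):
        return                # nothing was in progress
    if not os.path.exists(path):
        # Crashed inside the window; the new copy was already synced.
        os.link(tmp, path)
    # Otherwise the old or new mailbox is intact and the leftover is
    # either a partial copy or a duplicate, so it can be dropped.
    os.unlink(tmp)
```

  On POSIX systems a single rename() over the old name would avoid the
window altogether; the unlink/link form is shown only because it matches
the steps described above.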

  Note that I don't mean to imply that UW-IMAP (or anything else) actually
does this.  :-)

  Cyrus is even better, of course; there's no single file to worry about.  
Delete/expunge is atomic.  New message creation is atomic, although the
message itself might be truncated during a crash.  Index updates could be a
problem; I don't know what, if anything, Cyrus does to protect that.  But as
the index is redundant data, index corruption is only a performance loss,
not a data loss.

> The same kind of failures can cause file system corruption.

  Two words: Journaling filesystem.  :-)

> Data corruption happens.

  That is not an acceptable attitude for many.  Nor should it be, IMO.  
Microsoft Exchange, for example, deals with this particular problem very
well, by using a journaled database for storage (Exchange has a host of
other problems, of course, but that's not the point I'm making here).

> True enough, but the same principle holds.  Earlier messages tend to be
> saved.  Later messages tend to be deleted.  There will always be
> exceptions, but they will be relatively infrequent.

  I'm not so sure about that.  For example, I know of many end-users at
various customers who retain the last, e.g., six months of email in their
inbox (i.e., FIFO).  Maybe they're exceptions; I dunno.  But do you really
have any hard evidence for your claims, or are you just assuming?

> But, I said it isn't a minor implementation detail precisely because how
> you implement this aspect of the program has a profound effect on the
> performance of the server.  Therefore, it is a major implementation
> detail.

  Ahhh, okay, I get what you're saying.  It's a minor detail in the
implementation, with a major impact on how the implementation performs.  
Damn ambiguous English language.  :-)

>> You'll note that I was not talking about Maildir or mh, but Cyrus.  The
>> persistent index which Cyrus maintains eliminates most of them.
> 
> This is not a performance gain inherent to the format

  Well, most of the objections you kept making against Maildir do not exist
in Cyrus, so I thought it was relevant to mention that.  :-)

> ... (which is still maildir).

  I'm not really sure if you're generalizing there or not, but just for the
record: Cyrus is not just Maildir with an index added on.  MH, Maildir, and
Cyrus all use the one-file-per-message approach, but that doesn't make them
the same thing, any more than mbox and mbx are the same just because they
use a one-file-per-mailbox approach.

>> The rest are, as you note, easily eliminated by spec'ing the filesystem
>> to handle Cyrus in the first place.  Which seems like a no-brainer to me.
> 
> Yeah, but it's a consideration for someone who is moving from UW-IMAP to
> something else.

  Fair enough.

>> All I really know is this: In a production environment, with an IMAP
>> server, given the way most companies use email these days, UW-IMAP+mbox
>> is orders of magnitude worse than Cyrus.
> 
> And there, I can not argue with you.  But the fault is a bad program, not
> a bad data format.

  I think it's both.  Even if we accept that the mail software suite you
keep threatening to write solves most of the problems, you still have to do
more I/O with mbox than with Cyrus, and you still have locking issues for
mail delivery (which can be significant with a big inbox).  Maybe, with the
right code, mbox can be made to suck a lot less, but it still seems like
you're trying to improve upon a bad idea to me.  Why bother?  :-)

  And, of course, from a strictly pragmatic standpoint, I don't know if
there are any IMAP+mbox servers that don't suck.

  Given an ideal implementation world, about the only thing I can think of
that mbox would truly be better at would be a sequential search through the
entire mailbox.  Say, for example, if the user wants to find all messages
with the word "foobar" in them.  With mbox, that's just a big read on one
file.  With Cyrus, you're doing a potentially huge number of open/close
operations.  That would suck.
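
  The difference in access pattern can be sketched like this.  Both
functions are hypothetical, and the one-file-per-message directory is a
simplified stand-in, not Cyrus's actual on-disk layout.  For brevity the
mbox version only matches a word that sits on a single line:

```python
import os

def search_mbox(mbox_path, needle):
    """One open, one sequential pass; returns 0-based message numbers."""
    hits = []
    msg = -1
    prev_blank = True  # the first line of the file may start a message
    with open(mbox_path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                msg += 1  # separator line begins the next message
            elif needle in line and msg >= 0 and (not hits or hits[-1] != msg):
                hits.append(msg)
            prev_blank = line in (b"\n", b"\r\n")
    return hits

def search_message_dir(dir_path, needle):
    """One open/read/close per message file; returns matching file names."""
    hits = []
    for name in sorted(os.listdir(dir_path)):
        with open(os.path.join(dir_path, name), "rb") as f:
            if needle in f.read():
                hits.append(name)
    return hits
```

  With N messages the first function costs one open() and the second
costs N, which is the overhead in question.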

-- 
Ben Scott <bscott at ntisys.com>
| The opinions expressed in this message are those of the author and do  |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind.              |
