Lan + DMZ + LargeNumOfFiles = headaches AKA: plz halp and donate urbrain!!

Flaherty, Patrick pflaherty at wsi.com
Mon Sep 8 14:02:14 EDT 2008


> I've been soliciting solutions from everyone I can think of on moving
> a large number of files from inside our LAN to a DMZ on a regular
> basis.
> 
> I have a cluster of machines producing 20k small files (30 KB or so)
> inside our LAN. After the files are created, they are pushed to a few
> web servers in the DMZ using FTP. The push is done by the machine that
> created the file. Ideally, the files make it out to the DMZ in less
> than 30 seconds, but there have been some issues.
> 
> FTP seems to fall down when scaling out to more than a web server or
> two: many retries and transfer failures. It also adds complexity to
> the processing. What if one of the web servers is down? How many times
> do you retry? Should you notify the other hosts in the cluster? All
> that logic needs to be in the pushing script, which becomes a bit
> ungainly. There's also the issue of constantly opening new FTP
> sessions, which is a bit expensive.
> 
> So I'm looking for a cleaner architecture. An ideal solution would be
> an NFS/CIFS share internal to the LAN, replicated read-only to an
> NFS/CIFS share in the DMZ. The cluster can write to the share, the web
> servers can read from it, and everyone is happy. The big sticking
> point is being careful not to violate security by multi-homing the
> storage. Many solutions require an open network connection on many
> ports between the two storage boxes, which would be an easy way into
> our LAN.
> 
> So far I'm poking at (with some downsides):
>  FUSE + (sshfs/ftpfs): high performance hit (60% or so, from what
>  I've read).
>  ZFS + StorageTek: great, but another operating system to train
>  people on.
>  DRBD: requires a full network connection between the LAN and DMZ
>  boxes.
>  DataPlow SFS + DAS box: sales people will promise you the world.
>  Software SAN replicators: too many names to mention.
> 
> This is such a common problem, I'm not sure why there isn't a nice
> canned solution of two cheap pieces of hardware. Maybe I'm just an
> idiot and there is. Oh please please please tell me I'm an idiot.
> 
> Anyone have any brilliant ideas?

I want to thank everyone for their input. 
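
For context, the per-file FTP push we've been fighting looks roughly
like the sketch below (hostnames, credentials, and retry policy are
made up for illustration, not our actual script). Note how every
failure-policy question ends up living in the pushing code, which is
exactly what makes it ungainly:

    import ftplib
    import time

    # Hypothetical DMZ web servers and retry policy, illustration only.
    WEB_SERVERS = ["web1.dmz.example.com", "web2.dmz.example.com"]
    RETRIES = 3

    def push_file(path, dest_name):
        # Every policy question (how many retries, what to do when a
        # host stays down, who else to notify) has to be answered here.
        for host in WEB_SERVERS:
            for attempt in range(RETRIES):
                try:
                    # A fresh FTP session per file per host: the
                    # expensive part of constantly opening sessions.
                    with ftplib.FTP(host, "user", "password") as ftp:
                        with open(path, "rb") as f:
                            ftp.storbinary("STOR " + dest_name, f)
                    break
                except ftplib.all_errors:
                    time.sleep(2 ** attempt)  # back off, then retry
            else:
                # Never succeeded on this host. Now what? Alert the
                # rest of the cluster? Queue for later? More script.
                print("giving up on", host)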

The rsync idea was nice, but it runs into a lot of expensive overhead.
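
(The scheme would have been roughly the following; the host and paths
are hypothetical. The overhead is that rsync has to enumerate and
compare all ~20k files on every run, even when only a handful changed,
which adds up fast against a 30-second freshness target.)

    import subprocess

    # Hypothetical cron-driven mirror. rsync walks the full tree of
    # ~20k small files on every invocation, even if almost nothing
    # changed, so the per-run scan cost dominates at short intervals.
    subprocess.check_call(
        ["rsync", "-a", "--delete", "-e", "ssh",
         "/data/outgoing/",
         "push@web1.dmz.example.com:/var/www/files/"])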

FUSE + transport layers looked all right, but they were expensive and
had a performance impact.
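
(What that looks like in practice, sketched with a made-up host and
mount point: mount the DMZ box's document root locally over sshfs and
let the cluster write straight into it. Every write then pays for ssh
encryption plus a kernel-to-userspace round trip through FUSE, which
is where the reported ~60% hit comes from.)

    import subprocess

    # Hypothetical sshfs mount: the DMZ web root shows up as a local
    # directory, so the cluster just writes files into /mnt/dmz.
    # Each write crosses FUSE (kernel -> userspace) and is encrypted
    # by ssh, hence the large throughput penalty.
    subprocess.check_call(
        ["sshfs", "push@web1.dmz.example.com:/var/www/files",
         "/mnt/dmz", "-o", "reconnect"])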

Almost all clustered filesystems required network access between nodes
on top of access to the shared storage. No violating firewall rules
unless we absolutely need to.

I had a discussion with Steven Soltis at a company called DataPlow. He
has the distinction of being one of the Ph.D. students who worked on
the original GFS. Not holding that against him, his company has a
shared filesystem product where read-only nodes do not need network
access to the read/write nodes, only access to the shared storage.
It's pretty close to the ideal situation. That should keep
vulnerabilities down to driver issues and Fibre Channel hacking.

Thanks again for the input.

Patrick


