Is Amazon AWS/EBS snapshotting just LVM, or what?
Joshua Judson Rosen
rozzin at hackerposse.com
Thu Sep 28 14:00:43 EDT 2017
On 09/28/2017 01:46 PM, Tom Buskey wrote:
> I work with OpenStack. It manages images in Glance which sit above its object storage, Swift.
>
> On the POC clouds, you can use LVM as a backend for Glance. Snapshotting is *very* slow. 30 minutes for a snap of an
> 80GB VM that's shut down.
OK..., that surprises me. A lot.
For comparison, I just made an LVM snapshot of a volume 50% larger than that, that's *in use*
(and mostly not in cache, if that even makes a difference, since my buffer+cache shows as only 17GB *total*),
and the whole operation took only a fraction of a second:
rozzin at zuul:~ $ time sudo lvcreate --name home_snap --size 128G --snapshot zuul-vg/home
Using default stripesize 64.00 KiB.
Logical volume "home_snap" created.
real 0m0.349s
user 0m0.028s
sys 0m0.060s
How in the world does that translate to 30 minutes (*5 thousand* times as long)
for a volume only 0.63x as big?
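
For anyone who wants to check that arithmetic, here is a minimal sketch using only the figures quoted in this thread (30 minutes for the 80GB snapshot, 0.349s "real" time for the lvcreate run above). The `ratio` helper is a hypothetical name introduced purely for illustration, and using the 128G passed to `--size` as the size being compared is an assumption carried over from the comparison above:

```shell
#!/bin/sh
# ratio NUM DEN DECIMALS: print NUM/DEN rounded to DECIMALS places.
# (Hypothetical helper for illustration; awk does the floating-point math.)
ratio() {
    awk -v n="$1" -v d="$2" -v p="$3" 'BEGIN { printf("%." p "f\n", n / d) }'
}

slow_seconds=$((30 * 60))  # the 30-minute snapshot reported for Glance on LVM
fast_seconds=0.349         # "real" time reported by `time` for lvcreate above
slow_gb=80                 # size of the shut-down VM in the quoted report
fast_gb=128                # assumption: the figure passed to --size above

echo "time ratio: $(ratio "$slow_seconds" "$fast_seconds" 0)x"
echo "size ratio: $(ratio "$slow_gb" "$fast_gb" 3)x"
```

Run under any POSIX sh with awk available, this should report a time ratio of about 5158x and a size ratio of 0.625x, which is where the "5 thousand" and "0.63x" figures come from.
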
When you say "snapshotting on top of LVM", does that entail actually making a full copy
after the LVM snapshot is made--or something like that?
> You can use other storage backends in OpenStack that are faster. A full non-LVM Swift. Ceph and glusterfs are common
> choices where performance matters. Amazon wouldn't be using ZFS, but probably something built on their S3 object store.
>
> On Thu, Sep 28, 2017 at 1:32 PM, Ken D'Ambrosio <ken at jots.org> wrote:
>
> I would say it's unlikely to be LVM, because LVM is content-ignorant; it
> snapshots the entire volume, which is inefficient, and when you're
> Amazon, you care a LOT about being efficient. Instead, I imagine
> they're using some content-aware CoW solution such as ZFS. But,
> whatever the mechanism, I agree with your opinion: I doubt that their
> solution -- almost certainly CoW of some sort -- stands a chance of
> having more than a slight impact.
>
> $.02, YMMV and other assorted disclaimers,
>
> -Ken
>
>
> On 2017-09-28 13:16, Joshua Judson Rosen wrote:
> > I'm working on a project that uses Amazon AWS-provided VPS instances,
> > and the other guy on the project is telling me that "snapshotting
> > hourly may degrade performance",
> > and I'm trying to determine whether that's actually true. My gut feeling
> > is that it sounds kind of bogus.
> >
> > From the information I've been able to find about how Amazon's stuff
> > works (either in terms
> > of how it's _implemented_ [for which I'm finding basically no insight]
> > or how it's _characterized_
> > [in the engineering sense, not the literary sense]...), it really
> > sounds a _lot_ like Amazon
> > is just using LVM snapshots, e.g. from
> > <https://aws.amazon.com/ebs/faqs/>:
> >
> > "snapshots can be done in real time while the volume is attached and
> > in use.
> > However, snapshots only capture data that has been written to your
> > Amazon EBS volume,
> > which might exclude any data that has been locally cached by your
> > application or OS."
> >
> > "By design, an EBS Snapshot of an entire 16 TB volume should take no
> > longer than the time
> > it takes to snapshot an entire 1 TB volume. However, the actual time
> > taken to create
> > a snapshot depends on several factors including the amount of data
> > that has changed
> > since the last snapshot of the EBS volume."
> >
> > ... though I'm not entirely sure how to interpret that last bit about
> > "time taken to create a snapshot
> > depends on... the amount of data that has changed since the last
> > snapshot";
> > the _first half of that statement_ reads as "creating a snapshot is
> > constant time",
> > which basically screams to me "copy-on-write just like LVM, and
> > they're probably implemented
> > in terms of LVM".
> >
> > Any insight here as to whether my gut is correct on this, or whether
> > I'm actually likely
> > to notice an impact from hourly snapshots of, say, a 200-GB volume?
> > How about a 1-TB volume?
> >
> > The only thing I'm seeing from Amazon that seems to _vaguely_ support
> > (maybe) the notion
> > that `snapshotting too often' would be something to worry about is
> > this bit from elsewhere
> > in that same FAQ page (under the heading of "performance", whereas the
> > others were
> > under the heading of "snapshots" and a subheading of "performance
> > consistency of my HDD-backed volumes"):
> >
> > Another factor is taking a snapshot which will decrease expected
> > write performance
> > down to the baseline rate, until the snapshot completes.
> >
> > ... and, taken in the context of the previously-cited notes about
> > snapshots being
> > `not based on volume-size but maybe influenced by
> > changed-since-last-snapshot set size'
> > (and in the context of the explanations they give for HDD-backed vs.
> > SSD-backed storage),
> > I'm basically reading that as:
> >
> > `if you're using HDD-backed storage then it's because you care about
> > *throughput*
> > more than *response time* and are likely to be monitoring throughput,
> > and if you're monitoring throughput you may notice a *momentary dip
> > in throughput*
> > as the *HDDs* need to seek around to find the volume boundaries and
> > set up the COW records.'
> >
> > Even if you don't have any insight into what's actually happening
> > under the covers at Amazon,
> > does my reading of all of this sound right to you?
> >
> > And, perhaps more interestingly, are these same caveats from Amazon
> > generally applicable to LVM?
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss at mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
>
>
--
"Don't be afraid to ask (λf.((λx.xx) (λr.f(rr))))."