Is Amazon AWS/EBS snapshotting just LVM, or what?

Thu Sep 28 14:00:43 EDT 2017

On 09/28/2017 01:46 PM, Tom Buskey wrote:
> I work with OpenStack.  It manages images in Glance which sit above its object storage, Swift.
> 
> On the POC clouds, you can use LVM as a backend for Glance.  Snapshotting is *very* slow.  30 minutes for a snap of an
> 80GB VM that's shut down.

OK..., that surprises me. A lot.

For comparison, I just made an LVM snapshot of a volume 50% larger than that, that's *in use*
(and mostly not in cache, if that even makes a difference, since my buffer+cache shows as only 17GB *total*),
and the whole operation took only a fraction of a second:

	rozzin at zuul:~ $ time sudo lvcreate --name home_snap --size 128G --snapshot zuul-vg/home
	  Using default stripesize 64.00 KiB.
	  Logical volume "home_snap" created.

	real	0m0.349s
	user	0m0.028s
	sys	0m0.060s

How in the world does that translate to 30 minutes (*5 thousand* times as long)
for a volume only 0.63x as big?
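
Those two ratios can be rechecked with a quick awk call. This is only a sketch:
`ratio` is a made-up helper name, and 128 GB is assumed to be the volume size
implied by the 0.63x figure (it is the --size given to lvcreate above).

```shell
# ratio A B: print A / B to three decimal places
ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

ratio $((30 * 60)) 0.349   # 30 minutes vs. the 0.349 s "real" time
ratio 80 128               # 80 GB VM vs. the (assumed) 128 GB volume
```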

When you say "snapshotting on top of LVM", does that entail actually making a full copy
after the LVM snapshot is made--or something like that?
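
A rough sanity check on that full-copy guess, assuming "80GB" means 80 GiB and
that the whole 30 minutes went into moving data: the implied average rate comes
out at a few tens of MiB/s, which looks more like a bulk copy than like CoW
bookkeeping. `implied_mib_per_s` is a hypothetical helper, not an LVM or
OpenStack tool.

```shell
# implied_mib_per_s GIB SECONDS: average MiB/s needed to move GIB in SECONDS
implied_mib_per_s() {
	awk -v gib="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / secs }'
}

implied_mib_per_s 80 $((30 * 60))
```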

> You can use other storage backends in OpenStack that are faster.  A full non LVM Swift.  Ceph and glusterfs are common
> choices where performance matters.  They wouldn't be using ZFS but probably something using their S3 object store.
> 
> 
> 
> 
> On Thu, Sep 28, 2017 at 1:32 PM, Ken D'Ambrosio <ken at jots.org> wrote:
> 
>     I would say it's unlikely to be LVM, because LVM is content-ignorant; it
>     snapshots the entire volume, which is inefficient, and when you're
>     Amazon, you care a LOT about being efficient.  Instead, I imagine
>     they're using some content-aware CoW solution such as ZFS.  But,
>     whatever mechanism, I agree with your opinion: I doubt that their
>     solution -- almost certainly CoW of some sort -- stands a chance of
>     being more than even slightly impactful.
> 
>     $.02, YMMV and other assorted disclaimers,
> 
>     -Ken
> 
> 
>     On 2017-09-28 13:16, Joshua Judson Rosen wrote:
>     > I'm working on a project that uses Amazon AWS-provided VPS instances,
>     > and the other guy on the project is telling me that "snapshotting
>     > hourly may degrade performance",
>     > and I'm trying to determine whether that's actually true. My gut feeling
>     > is that it sounds kind of bogus.
>     >
>     > From the information I've been able to find about how Amazon's stuff
>     > works (either in terms
>     > of how it's _implemented_ [for which I'm finding basically no insight]
>     > or how it's _characterized_
>     > [in the engineering sense, not the literary sense]...), it really
>     > sounds a _lot_ like Amazon
>     > is just using LVM snapshots, e.g. from
>     > <https://aws.amazon.com/ebs/faqs/>:
>     >
>     >       "snapshots can be done in real time while the volume is attached and
>     > in use.
>     >        However, snapshots only capture data that has been written to your
>     > Amazon EBS volume,
>     >        which might exclude any data that has been locally cached by your
>     > application or OS."
>     >
>     >       "By design, an EBS Snapshot of an entire 16 TB volume should take no
>     > longer than the time
>     >        it takes to snapshot an entire 1 TB volume. However, the actual time
>     > taken to create
>     >        a snapshot depends on several factors including the amount of data
>     > that has changed
>     >        since the last snapshot of the EBS volume."
>     >
>     > ... though I'm not entirely sure how to interpret that last bit about
>     > "time taken to create a snapshot
>     > depends on... the amount of data that has changed since the last
>     > snapshot";
>     > the _first half of that statement_ reads as "creating a snapshot is
>     > constant time",
>     > which basically screams to me "copy-on-write just like LVM, and
>     > they're probably implemented
>     > in terms of LVM".
>     >
>     > Any insight here as to whether my gut is correct on this, or whether
>     > I'm actually likely
>     > to notice an impact from hourly snapshots of, say, a 200-GB volume?
>     > How about a 1-TB volume?
>     >
>     > The only thing I'm seeing from Amazon that seems to _vaguely_ support
>     > (maybe) the notion
>     > that `snapshotting too often' would be something to worry about is
>     > this bit from elsewhere
>     > in that same FAQ page (under the heading of "performance", whereas the
>     > others were
>     > under the heading of "snapshots" and a subheading of "performance
>     > consistency of my HDD-backed volumes"):
>     >
>     >       Another factor is taking a snapshot which will decrease expected
>     > write performance
>     >       down to the baseline rate, until the snapshot completes.
>     >
>     > ... and, taken in the context of the previously-cited notes about
>     > snapshots being
>     > `not based on volume-size but maybe influenced by
>     > changed-since-last-snapshot set size'
>     > (and in the context of the explanations they give for HDD-backed vs.
>     > SSD-backed storage),
>     > I'm basically reading that as:
>     >
>     >       `if you're using HDD-backed storage then it's because you care about
>     > *throughput*
>     >        more than *response time* and are likely to be monitoring throughput,
>     >        and if you're monitoring throughput you may notice a *momentary dip
>     > in throughput*
>     >        as the *HDDs* need to seek around to find the volume boundaries and
>     > set up the COW records.'
>     >
>     > Even if you don't have any insight into what's actually happening
>     > under the covers at Amazon,
>     > does my reading of all of this sound right to you?
>     >
>     > And, perhaps more interestingly, are these same caveats from Amazon
>     > generally applicable to LVM?
>     _______________________________________________
>     gnhlug-discuss mailing list
>     gnhlug-discuss at mail.gnhlug.org
>     http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
> 
> 

-- 
"Don't be afraid to ask (λf.((λx.xx) (λr.f(rr))))."