So, how big is all of Google+ Communities anyway?
Not how many communities (8 million and a skosh). Not how many actual users (a few tens of millions). But posts and text.
(I'm skipping the boring stuff like images, which blow things up a lot, but we can get to that later if you'd like.)
Suppose someone walked up to you and asked if you'd like a full copy of the Google+ Communities post archive. A few thoughts might occur to you, one being "where would I put that?"
I mean, Google, GOOGLE SCALE.
Maybe ... not. At least by modern hardware standards.
So, back in mid-December I sampled 36,000 randomly-selected G+ communities and got some information on them -- name, description, member count. And the ten most recent posts, along with the elapsed date range between the newest and oldest posts. So what does that give us?
First off, only 4,465 of the 36,000 communities had a full ten posts in their history. The others either never had more than 9 posts submitted, or had them purged (user deletion, spam, other actions, and so on). I'll treat any community with fewer than 10 posts as effectively zero, to simplify things.
The elapsed time gives me a post rate, which is how many posts are submitted over any given time interval. Weeks are a handy unit, and the rate is about 1 post/wk, on average (1.0046/wk, if you want to be precise).
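As a rough sketch of how a per-community rate falls out of that data: divide the number of posts by the span between the oldest and newest post, measured in weeks. The function name and the sample dates here are made up for illustration, and the exact calculation behind the 1.0046 figure may have differed (dividing by nine intervals rather than ten posts, say).

```python
from datetime import datetime

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

def posts_per_week(n_posts, oldest, newest):
    """Posts per week over the span between the oldest and newest post."""
    elapsed_weeks = (newest - oldest).total_seconds() / SECONDS_PER_WEEK
    if elapsed_weeks <= 0:
        raise ValueError("newest post must be later than oldest post")
    return n_posts / elapsed_weeks

# Hypothetical community: ten posts spread over exactly ten weeks (70 days).
rate = posts_per_week(10, datetime(2018, 10, 1), datetime(2018, 12, 10))
print(f"{rate:.4f} posts/wk")
```

Averaging that per-community figure across the communities with a full ten posts would give a single overall rate.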
My 36k sample represents about 1 of every 223 actual communities, so we can multiply 4,465 by 223 and get ... about a million (996,000).
Gee, about a million, and about a post a week, so a million posts a week? Yeah, but let's be precise:
You have: 996000 * 1.0046
Yeah, that's pretty much a million posts/wk.
G+ Communities launched in December, 2012, and will be shut down in April, 2019. I'll simplify again and take Jan 1 2013 - December 31 2018, or six years of 52 weeks. That's 312 weeks at a million posts a week: 312 million posts.
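For anyone who wants to check the back-of-envelope arithmetic, here's a minimal sketch in Python. The variable names are mine, for illustration only. The inputs are the figures above, except that it carries the unrounded 4,465 x 223 product through rather than rounding to 996,000.

```python
# Inputs taken from the sample described above.
FULL_TEN_COMMUNITIES = 4_465   # sampled communities with a full ten posts
SCALE_FACTOR = 223             # sample is about 1 of every 223 communities
POSTS_PER_WEEK_EACH = 1.0046   # average post rate per active community
WEEKS = 6 * 52                 # Jan 1 2013 - Dec 31 2018, simplified

active_communities = FULL_TEN_COMMUNITIES * SCALE_FACTOR
weekly_posts = active_communities * POSTS_PER_WEEK_EACH
total_posts = weekly_posts * WEEKS

print(f"active communities: {active_communities:,}")
print(f"posts per week:     {weekly_posts:,.0f}")
print(f"total posts:        {total_posts / 1e6:,.0f} million")
```

The rounding barely matters: either way it lands on roughly a million posts a week and roughly 312 million posts overall.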
But how long is a post?
I'm punting here (though ... actually I do have some data, come to think), and took a quick look at #Discovfefe, I mean, Google+ Discover. Going by a random few posts, they tend to run about 20-40 words typically, with some of the longer ones weighing in at 100 - 450 words. Mostly, though, about Tweet sized, which probably reflects a number of factors. Call it 250 bytes.
312 million posts * 250 bytes: 78 GB.
Yes: "Google-scale" source text for Google+ Communities is probably under 100 GB total storage.
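The 250-byte figure is the shakiest input here, so a quick sensitivity check seems worthwhile. In this sketch the helper name is hypothetical, and the 500- and 1000-byte rows are what-ifs I've picked for illustration, not measurements.

```python
def storage_gb(posts, bytes_per_post):
    """Raw text size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return posts * bytes_per_post / 1e9

TOTAL_POSTS = 312_000_000

# 250 bytes is the estimate from the text; 500 and 1000 are
# hypothetical what-ifs, not measured values.
for bytes_per_post in (250, 500, 1000):
    size = storage_gb(TOTAL_POSTS, bytes_per_post)
    print(f"{bytes_per_post:>5} bytes/post -> {size:,.0f} GB")
```

Even if the typical post were four times longer than my guess, the text would still come to a few hundred GB.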
Caveats, civet cats, and all that jazz
The short-cuts I took above probably overstate the storage estimate -- Communities were created over time, they did not all exist for the entire life of G+ communities, and posting rates may have changed. The "elapsed" interval also doesn't count communities which have stopped generating new posts.
The page weight
And it's likely that a fair amount of data was submitted but has been removed -- spam, deleted accounts, and the like.
And even if you're only counting text, there's some post-level metadata: the author, date, communityID, and related bits, which pad out the data requirements slightly. These numbers are rough.
But the purpose is to give an approximate sense of the scale.
And photos. About 30% of posts seem to have an image attached. G+ has a max size I'd need to look up -- 2120 x 1192, apparently. But you're looking at about 93 million images or so. Some can be quite large. This beauty (and it is pretty) is 7.4 MB, at about 4k x 6k pixels, 24 MP raw. And this shot of Tulsi Gabbard as a link hero is about 1060x600 pixels.
SanDisk have one of the better image size / storage capacity references I've seen. In 12MP format (3k x 4k pixels), 1 GB can hold about 238 images; 128 GB, over 30,000.
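Here's the image arithmetic as a sketch, using only the figures above. The last line is a deliberately pessimistic what-if: it assumes every image is stored at the 12 MP size. That is almost certainly too high, given that G+ apparently caps dimensions far below that, though a few originals are larger. The variable names are illustrative.

```python
TOTAL_POSTS = 312_000_000
IMAGE_SHARE_PERCENT = 30       # about 30% of posts carry an image
IMAGES_PER_GB_AT_12MP = 238    # SanDisk figure for 12 MP images

# Integer math keeps the count exact.
image_count = TOTAL_POSTS * IMAGE_SHARE_PERCENT // 100
images_per_128gb = IMAGES_PER_GB_AT_12MP * 128

# What-if only: treats every image as 12 MP, which overstates typical size.
what_if_tb = image_count / IMAGES_PER_GB_AT_12MP / 1000

print(f"images:             {image_count:,}")
print(f"images per 128 GB:  {images_per_128gb:,}")
print(f"what-if at 12 MP:   {what_if_tb:,.0f} TB")
```

So images, not text, are where the real storage weight would be.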
Put it on my bill, said the duck
How much did Google pay for all of this?
No, really: How much did Google pay for all of this?
Because one aspect of this whole fiasco I kind of don't begrudge is that a handful of us got a fairly nice playground for a while. That it's getting shuttered wasn't a huge surprise. How it's being shuttered .... Well, I've written about that.
Now, if only I knew someone who might be able to tell me the answer....