Blood, sweat, and tears - Mashable certainly tested our resolve

[caption id="attachment_101" align="aligncenter" width="745" caption="source: my heart and soul"] [/caption] Boy was it a rough couple night... but a great one which I will remember forever. Feb 24 12:15AM - On my way to start doing my homework. As soon as I sit down in the study room, Dan Shipper calls me "We're on Mashable!" I'm stunned, don't know what to say and he tells me to come back to the dorms so we can work. I run over as fast as I can. 12:30AM - I'm back in the quad done dancing around in glee with Dan. We realize, we need to get cracking. Our servers at GoDaddy are getting hit hard. We thought it would handle it, but clearly the virtual dedicatd server was not enough. Dan called up Rackspace, got a Cloud Site squared away, and we began to migrate. 1:00AM - We decide it's worth it to take down the page rather than letting it load poorly for people. So we put up a landing page, and let the migration begin. We finally got a hold of Ajay (where the hell was he?). Ajay helps us wit PR. 1:30AM ish - Our cloud site was up and running! Dan and I were so happy! This was one of our emotional highs. Everything was working smoothly, we were getting hit with traffic at unimaginable loads, but we were handling. We though smooth sailing. So we just thought we could spruce things up a bit, fix a bug here and there, calm down the fire of people that had seen the site down and go to bed. We were wrong. 1:30AM - 4:30AM - Happily responding to tweets, facebook comments, Mashable comments while watching our traffic and user base grow. Through th night we saw user benchmarks hit 1K, then 2K, then 3K, 4K, and by the end of the night (er early morning) we see 5K! Also, Dan and I are busy fixing bugs and making features more robust for users. 5:30AM - Ajay heads to bed. As soon as this happens, Dan and I sort of realize, something is wrong. Things are slowing down significantly for loading the profile page. We expect that it must be something to do with our database because the code was basically unchanged through the night and traffic was consistently high so that was not the big difference. The only thing we could think of was something going on with more and more data being logged into our database. I mean, this was good, but it also hurt our speed - so we though. 6:30AM - We continue to look through our code to see if there are any MySQL calls that are just looping or something extremely inefficient. We catch a couple small things but don't see anything that could really make that much of a difference. We see some calls taking as long as 30s from the webpage, when earlier that evening it took only 2-3s...what is going ON??? 7:00AM - We decide to try logging the times it takes to go through each function that was called to see if it really was our SQL database. 8:00AM - Call Rackspace again, finally they discover it is not a CPU or RAM deficiency from our Cloud Site server but is instead long queues to our database because our site is data-call heavy. We continue to look through our scripts because they claim there is nothing we can do but to wait and let the queues clear out. 9:00AM - We have to head to class and so we need to make a choice: keep WhereMyFriends.Be up, or take a down. We had to choose whether we wanted to leave a sub-par, slow product up that would probably still pick up traffic, or take it down and possibly close off this window of opportunity for virality from Mashable. We eventually chose to take it down. 
To me, the choice was simple, because you never release a product you know is not of the highest quality. We didn't need artificial growth and hits; we wanted to make sure people knew we only produce great stuff. So we ended up using MailChimp to put up a nice apology letter, with a chance for people to sign up and see the product when it was up and running again.

Then I had to start the day... In between classes I was running around checking to make sure our landing page didn't crash (I figured it wouldn't, but... who knows sometimes) and seeing what I could do to make the SQL queries faster. I immediately got some great advice, thought about how to implement it, and started doing some of the coding while sitting around waiting for class to begin. I also made sure to help cool the fire online, with all those hits going to our site and nowhere to go from there.

Then at about 12PM Dan tells me we got on CNN! We were down at the time, but it was still exciting! Now, though, there were more angry and confused comments to respond to and quell.

After finishing some homework that was due the next morning, I met up with Dan again to start coding. I was introduced to a new method of parsing the friends. Before, we were sending off 20 friends at a time to get back 20 locations (or however many revealed their location). I would figure out whether those friends were in our database or not, then send them down different paths from there. In my new implementation, after getting some advice, I parsed the friend list immediately into what-is-in-our-SQL-tables and what-do-I-need-from-Facebook. That made a big difference in terms of time, and along with other tweaks we thought our code was solid. (A rough sketch of that split is at the bottom of this post.)

We were just waiting on Rackspace to migrate us from Cloud Sites over to the more powerful and scalable Cloud Servers. After we migrated over, things seemed to work fine. Mind you, it was around 4AM at this point. We had run into several issues along the way: DNS mis-pointing, private/public IP misdirection, SQL failures, and every other migration issue you can think of. Surprisingly, though, the new, untested code that parsed the friend data in a different way and sent it down unique tracks worked with almost no hiccups! I was pretty proud.

At 5:30AM, we figured everything was pretty much in the clear. We did more cleaning up, some more bug testing, and phew. Up and running. Sleep.

Wake up, things are good. Go to class, get out of class. Boom. Down again. I had a meeting to go to, and Dan said he was taking care of it, so I trusted him to get it fixed. Rackspace told us there really wasn't much they could do, so we put up the landing page again. FAIL. Jeez, I was mad...

So for the past several hours, Dan and I have been implementing memcache and seeing what else we can do to make things more robust and lightweight (also sketched at the bottom of this post). I think we're pretty much done, and we just have a few small things to tweak. We should be ready to release tonight. By the way, Dan is in NYC tonight at a concert, so if sh** hits the wall... I'm all alone.

This project has probably taken up literally half of my life this week. But it's been worth it. Through all the good, and bad, and worse, and better, and best, etc. etc., I've learned a lot about what to look out for and what to do the next time I strike gold (which hopefully is very soon).
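P.S. For the technically curious, here's roughly what the two fixes looked like. First, the friend-list split. This is a Python sketch, and the table and column names (`locations`, `fb_id`) are made up; the real point is doing one local query up front to partition friends into already-in-our-database versus still-need-from-Facebook, instead of round-tripping 20 at a time and sorting them out afterwards.

```python
def partition_friends(friend_ids, db):
    """Split a friend list into cached-vs-missing before any API calls."""
    if not friend_ids:
        return set(), []
    # One query to find which friends are already in our SQL tables.
    placeholders = ",".join(["%s"] * len(friend_ids))
    cursor = db.cursor()
    cursor.execute(
        f"SELECT fb_id FROM locations WHERE fb_id IN ({placeholders})",
        friend_ids,
    )
    known = {row[0] for row in cursor.fetchall()}
    missing = [fid for fid in friend_ids if fid not in known]
    return known, missing

# Friends in `known` render straight from our own database;
# only `missing` gets batched out to the Facebook API.
```

Here `db` is assumed to be a standard DB-API connection (e.g., one opened with a MySQL driver).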
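And second, the memcache idea. Again a hedged sketch, not our exact setup: this version uses the pymemcache client and an assumed 5-minute TTL, but the shape is the same as what we built tonight: check the cache before touching MySQL, and populate it on a miss so repeat lookups skip the database queue entirely.

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
TTL = 300  # seconds; tune to how stale a location is allowed to get

def get_location(fb_id, db):
    """Serve repeat lookups from memcached instead of MySQL."""
    key = f"loc:{fb_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()
    # Cache miss: fall back to the database, then warm the cache.
    cursor = db.cursor()
    cursor.execute("SELECT location FROM locations WHERE fb_id = %s", (fb_id,))
    row = cursor.fetchone()
    if row:
        cache.set(key, row[0], expire=TTL)
        return row[0]
    return None
```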