My Google, Shanghai: Explained
I’ve talked a little about being in China, but I haven’t said much about why. Until recently my duties at Google were unclear, but now I understand my purpose: Christophe dragged me over here to contribute to Hadoop, an open-source MapReduce implementation.
Hadoop is essentially a tool that lets software engineers write programs that use large numbers of computers to process vast amounts of data. Cloud computing is the new buzzword, but Google revolutionized large-scale, or distributed, computing many years ago. Historically, lots of data (like that of the internet) was analyzed by large, expensive computers; in fact, lots of data just flat out wasn’t analyzed. Now, in the wake of MapReduce, Hadoop puts hundreds or even thousands of commodity computers to work analyzing data. Cloud computing is one of the reasons why Google is the best search engine, and industries all over are benefiting from the cloud. Cancer researchers can understand their data more efficiently. Astronomers can crunch their images much faster. Hadoop allows any company to effectively make sense of large amounts of data.
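For the curious, here’s a toy sketch of the MapReduce idea behind Hadoop, using the classic word-count example simulated in a single Python process. This is not Hadoop’s API (Hadoop’s own API is Java). The function names are made up for illustration, and the hard part a real framework handles, spreading map and reduce tasks across many machines, is reduced to plain loops here.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit a (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by its key, the step a framework performs
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all the counts emitted for one word.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster each document could be mapped on a different machine;
    # in this sketch everything runs sequentially in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud is big", "the data is bigger"]
    print(word_count(docs))
```

The appeal of the pattern is that the mapper and reducer only ever see one split or one key at a time, so a framework is free to run thousands of them in parallel on cheap machines.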
It’s not yet clear exactly how I’ll be contributing to Hadoop; those details should surface soon. I admit that Hadoop is my first open-source project, and I’m very, very excited to be contributing to a field that is growing so rapidly. More updates to come!
Bonus story: after slaving away for four days, I finally have Hadoop’s trunk build running on a multi-node cluster. Boom shakalaka!


Dude! That’s awesome. That’s a really cool project.
(and it’s good to hear that even at Google it takes 4 days to get a build of whatever you’re working on running)
Thanks, Jason! Yeah, getting builds and environments up and running sucks.