Monthly Archive for April, 2010

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

I find Google to be one of the most interesting companies in the IT business, especially when it comes to infrastructure – building it, managing it, and using it for fun and profit (guess who owns the most servers? More about Google architecture).

Google published a very interesting paper about their distributed tracing infrastructure, called Dapper. An excellent review was posted on highscalability.com. I found it very interesting and decided to add some notes of my own.

Dapper has been used by Google for the last two years, and “… is part of our basic machine image, making it present on virtually every server at Google [Totaling thousands of different applications]”

Now this is a critical piece of infrastructure that operates at incredibly high data rates. Storing the sheer amount of data is a difficult task, even for Google.

Just to give a basic feeling of complexity:

“A web-search example will illustrate some of the challenges such a system needs to address …. In total, thousands of machines and many different services might be needed to process one universal search query. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.”

So, now on to some details and noteworthy citations:

  • Data is written to local log files, pulled by the collection infrastructure and stored in one of several regional Bigtable repositories.
  • A trace is laid out as a single Bigtable row, with each column corresponding to a Span (Dapper’s term for the basic unit of work in a trace).
  • Span IDs are probabilistically (!) unique 64-bit integers
  • Google production servers generate more than 1 terabyte of sampled trace data per day.
  • Throughput is so high (tens of thousands per second per process) that Google decided to sample the data by keeping only a fraction (1/1024) of it
  • The benefits of increased trace data density must then be weighed against the cost of machines and disk storage for the Dapper repositories. Sampling a high fraction of requests also brings the Dapper collectors uncomfortably close to the write throughput limit for the Dapper Bigtable repository
  • Our experience at Google leads us to believe that, for high-throughput services, aggressive sampling does not hinder most important analyses. [...] If a notable execution pattern surfaces once in such systems, it will surface thousands of times
  • However, lower-traffic workloads may miss important events at such low sampling rates, while they could tolerate higher sampling rates with acceptable performance overhead.
  • We are in the process of deploying an adaptive sampling scheme that is parameterized not by a uniform sampling probability, but by a desired rate of sampled traces per unit time.
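The span-ID and sampling bullets above can be sketched in a few lines. This is not Dapper’s actual code (the paper’s implementation is in C++/Java inside Google’s RPC framework); it’s a minimal illustration of “probabilistically unique 64-bit IDs” and head-based 1/1024 sampling, with all names my own:

```python
import random

SAMPLE_ONE_IN = 1024  # the production sampling rate quoted in the paper

def new_span_id() -> int:
    # "Probabilistically unique": just a random 64-bit integer.
    # Collisions are possible but vanishingly rare at trace scale.
    return random.getrandbits(64)

def should_sample(trace_id: int) -> bool:
    # Head-based sampling: the keep/drop decision is made once per trace
    # (here, from the trace ID), so every span of a trace shares its fate.
    return trace_id % SAMPLE_ONE_IN == 0
```

The point of deciding per trace rather than per span is that a sampled trace stays complete end-to-end; sampling spans independently would leave you with unusable fragments.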

More about Dapper:

  • Nearly all of Google’s inter-process communication is built around a single RPC framework with bindings in both C++ and Java. We have instrumented that framework to define spans around all RPCs.
  • The core instrumentation is less than 1000 lines of code in C++ and under 800 lines in Java
  • Dapper also allows application developers to enrich Dapper traces with additional information that may be useful to monitor higher level system behavior or to help in debugging problems.
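To make the last bullet concrete, here is a toy span with application-supplied annotations. The class, field names, and `annotate` method are my own invention, not Dapper’s API; they only illustrate the idea of developers enriching a trace with higher-level context:

```python
import time

class Span:
    """Toy span: trace/span/parent IDs plus timestamped annotations.
    Names and structure are illustrative, not Dapper's actual API."""
    def __init__(self, name, trace_id, span_id, parent_id=None):
        self.name = name
        self.trace_id = trace_id
        self.span_id = span_id
        self.parent_id = parent_id      # ties the span into the trace tree
        self.annotations = []           # application-supplied extra info

    def annotate(self, message):
        # Developers attach app-level context (cache misses, query text,
        # etc.) that plain RPC instrumentation can't know about.
        self.annotations.append((time.time(), message))

span = Span("frontend.Search", trace_id=1, span_id=2)
span.annotate("cache miss for query 'dapper'")
```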

More about performance:

  • The daemon never uses more than 0.3% of one core of a production machine during collection, and has a very small memory footprint. Trace data collection is responsible for less than 0.01% of the network traffic in Google’s production environment.
  • Each span corresponds to 426 bytes on average
  • Root span creation and destruction takes 204 nanoseconds on average
  • Unrealistically heavy load-testing benchmarks reached data rates of 2M/sec (!) with only 0.267% CPU core usage. (I wonder if this stands for 2M spans, or 2M/426=4694 traces per second?)
  • Writes to local disk are the most expensive operation in Dapper’s runtime library, but their visible overhead is much reduced since each disk write coalesces multiple log file write operations and executes asynchronously with respect to the traced application.
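The last bullet – coalescing multiple log records into one asynchronous disk write – can be sketched with a queue and a background thread. This is only the shape of the idea, assuming a generic `sink` callable standing in for the actual file write; Dapper’s daemon is not implemented this way in any detail I know of:

```python
import queue
import threading

class CoalescingLogWriter:
    """Sketch: the traced application enqueues records cheaply, and a
    background thread batches many records into a single sink write."""
    def __init__(self, sink, batch_size=64):
        self.q = queue.Queue()
        self.sink = sink              # e.g. a file object's write method
        self.batch_size = batch_size
        threading.Thread(target=self._drain, daemon=True).start()

    def log(self, record):
        # Called on the application's hot path: no disk I/O here,
        # just an in-memory enqueue.
        self.q.put(record)

    def _drain(self):
        while True:
            batch = [self.q.get()]    # block until something arrives
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.q.get_nowait())
                except queue.Empty:
                    break
            self.sink("".join(batch)) # one write covers many records
```

The coalescing matters because the per-write cost (syscall, seek) is amortized over the whole batch, and the application thread never waits on the disk.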

Here’s a screenshot of the Dapper user interface, as shown in the article:

Z.

Chinese hackers break into hundreds of Israeli gmail accounts

Looks like several hundred (or thousand?) gmail accounts have been compromised by Chinese hackers.

Over the last couple of weeks I received some spam emails from people I personally know. When I saw the first one, I immediately recognized it as spam/phishing. Normally I’d quickly delete it and move on, but this time I suspected there was something fishy (pun intended) going on:
  • The sender was a valid gmail account
  • gmail didn’t detect this as spam (this rarely happens on my account)
  • The recipient list was not random and it looked like it came from the sender’s address book. I even recognized some of the emails
  • The message was mailed-by and signed-by gmail.com!
Here’s the first email I got:

date Mon, Mar 29, 2010 at 4:08 PM
hi
i am glad to tell you a good news ,and i find a good website

http://www.buusir.info

On this website ,you can find many new and origianl electronic
products .Now they are holding sales promotion activity, all the
product are sold at a discount.
low cost and good quality ,and the delivery is on time .
It is a good chance that you should not lose.
If you need some, visit this website .
Hope everything goes well.
Greetings!

The second email (sent Sat, Apr 10, 2010) was almost identical, but this time the URL pointed to www.buusir888.com.

If this also happened to you, I highly recommend following the instructions described here for securing your compromised gmail account: my contacts are receiving emails from my email account inviting them to website. but its not me who is sending. – Gmail Help
Tip – gmail has a very useful and little-known feature that shows the last activity in your gmail account, including IP address, time and geo-location (country, e.g. China in the above case). Visit gmail, and at the bottom of the screen you’ll see something like this:
Click on the ‘details’ link to see the magic:
Also worth noting that, earlier this year, Google announced that HTTPS will be used by default for gmail. Coincidence? probably not.
Here are some additional interesting links on this subject:
Z

RSA 1024-bit encryption cracked in 100 hours by manipulating voltage supply

This was published over a month ago and generated some headlines, like here, here, and here (the last link leads to the only Hebrew reference I found).

Good reading – a creative attack that shows how you can break a strong crypto-system without trying to attack its main strength (the math!), instead going for a back door (a physical attack on the CPU’s power supply!).

Bruce Schneier writes in ‘Secrets and Lies’ that ‘Security is a system, not a product’, and goes on to cover the failure of PKI on the internet. Here is an excellent excerpt from the book.

From “Secrets and Lies”, by Bruce Schneier, Chapter 15, “Certificates and Credentials”, section “PKIs On The Internet” (page 238):

Most people’s only interaction with a PKI is using SSL. SSL secures web transactions, and sometimes PKI vendors point to it as enabling technology for electronic commerce. This argument is disingenuous; no one is turned away at an online merchant for not using SSL.

SSL does encrypt credit card transactions on the Internet, but it is not the source of security for the participants. That security comes from credit card company procedures, allowing a consumer to repudiate any line item charge before paying the bill. SSL protects the consumer from eavesdroppers, it does not protect against someone breaking into the Web site and stealing a file full of credit card numbers, nor does it protect against a rogue employee at the merchant harvesting credit card numbers. Credit card company procedures protect against those threats.

Has anyone ever sounded the alarm in these cases? Has anyone not bought online products because the name of the certificate didn’t match the name on the Web site? Has anyone but me even noticed?

PKIs are supposed to provide authentication, but they don’t even do that.

Example one: the company F-Secure (formerly Data Fellows) sells software from its Web site at www.datafellows.com. If you click to buy software, you are redirected to the Web site www.netsales.net, which makes an SSL connection with you. The SSL certificate was issued to “NetSales, Inc., Software Review LLC” in Kansas. F-Secure is headquartered in Helsinki and San Jose. By any PKI rules, no one should do business with this site. The certificate received is not from the same company that sells the software. This is exactly what a man-in-the-middle attack looks like, and exactly what PKI is supposed to prevent.

Example two: I visited www.palm.com to purchase something for my PalmPilot. When I went to the online checkout, I was redirected to https://palmorder.modusmedia.com/asp/store.asp. The SSL certificate was registered to Modus Media International; clearly a flagrant attempt to defraud Web customers, which I deftly uncovered because I carefully checked the SSL certificate. Not.

I doubt it. It’s true that VeriSign has certified this man-in-the-middle attack, but no one cares. I made my purchases anyway, because the security comes from credit card rules, not from the SSL. My maximum liability from a stolen card is $50, and I can repudiate a transaction if a fraudulent merchant tries to cheat me. As it is used, with the average user not bothering to verify the certificates exchanged and no revocation mechanism, SSL is just simply a (very slow) Diffie-Hellman key-exchange method. Digital certificates provide no actual security for electronic commerce; it’s a complete sham.

Copyright note – I’m a proud owner of this book, but I didn’t copy this chapter myself; I found it on YURL’s website here.
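Schneier’s jab that unverified SSL “is just simply a (very slow) Diffie-Hellman key-exchange method” is worth unpacking: DH gives two parties a shared secret an eavesdropper can’t compute, but it proves nothing about who is on the other end. A toy sketch (tiny prime for illustration; real TLS uses parameters of 2048 bits or more):

```python
import random

P = 4294967291   # a small prime, toy-sized for illustration only
G = 2            # generator

def keypair():
    priv = random.randrange(2, P - 1)
    pub = pow(G, priv, P)        # g^priv mod p; pub is safe to send
    return priv, pub

a_priv, a_pub = keypair()        # Alice
b_priv, b_pub = keypair()        # Bob

# Each side combines its own private key with the other's public key
# and arrives at the same secret, which is never transmitted.
shared_a = pow(b_pub, a_priv, P)
shared_b = pow(a_pub, b_priv, P)
assert shared_a == shared_b

# Note what's missing: nothing proves WHO sent a_pub or b_pub. A man in
# the middle can substitute his own public keys and run two separate
# exchanges -- exactly the gap certificates were supposed to close, and
# exactly why unchecked certificates reduce SSL to plain DH.
```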

Scalability articles – MySpace, Google, FarmVille

Several interesting articles about scalability and performance war-stories!
(This post was drafted over a month ago and published today.)

High Scalability – How FarmVille Scales to Harvest 75 Million Players a Month

Perspectives – Scaling FarmVille

Perspectives – Scaling at MySpace

Perspectives – Jeff Dean: Design Lessons and Advice from Building Large Scale Distributed Systems

http://www.royans.net/arch/library/

THE RECAP @ engadget.com

Today I visited engadget.com and saw a very nice layout for summarizing all the news/articles that appeared on a site at a specific day.

Here’s how it looks for April 7th:

http://www.engadget.com/2010/04/07/the-daily-roundup-heres-what-you-mightve-missed/

  • 41 articles published that day, generating varying numbers of comments (# comments / time of day).
  • The color of the dots ranges from purple (cold, few comments) to red (hot!, e.g. 200+ comments).
  • Hovering over one of the dots shows the article title, publishing time, and number of comments…
  • Clicking on a dot simply takes you to the article…

Nicely done – fast and efficient!
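The purple-to-red heat coloring is a simple linear interpolation. A sketch of how such a mapping might work (my own guess at the scheme, using the 200+-comment “hot” threshold mentioned above; engadget’s actual colors are unknown to me):

```python
def heat_color(comments, hot=200):
    """Map a comment count to an RGB color, purple (cold) -> red (hot).
    Counts at or above `hot` are clamped to full red."""
    t = min(comments, hot) / hot        # 0.0 (cold) .. 1.0 (hot)
    r = int(128 + 127 * t)              # purple (128,0,128) -> red (255,0,0)
    b = int(128 * (1 - t))
    return (r, 0, b)
```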

Z