Performance optimization for large-scale websites: lessons from the development of LiveJournal's backend

First, LiveJournal's development history

LiveJournal is a project that began on a college campus in 1999. A few people built it as a hobby, to provide the following features:

Blogs and forums
Social networking and finding friends
Aggregation of friends' articles

LiveJournal uses a great deal of open source software, and is itself open source.

After launch, LiveJournal grew very quickly:

April 2004: 2.8 million registered users.
April 2005: 6.8 million registered users.
August 2005: 7.9 million registered users.
More than a thousand page requests handled per second.
A large number of MySQL servers.
Heavy use of general-purpose components.

Second, an overview of LiveJournal's current architecture

Third, learning from LiveJournal's development

LiveJournal grew from one server to 100 servers. It went through a great deal of pain along the way, but it also worked out solutions to those problems. By studying LiveJournal, we can avoid the mistakes LJ made and design our systems well from the outset, sparing ourselves the pain later.

Let's walk through LJ's development step by step.

1. One server

LJ initially ran on a single server that someone had donated, much as Google started out on run-down servers, and that deserves our respect. At this stage, LJ's developers became familiar with Unix administration at a remarkable pace. The server had performance problems, but fortunately small fixes and tweaks were enough to get by. During this stage LJ moved from CGI to FastCGI.

Eventually the site kept getting slower and optimization alone could no longer solve the problem: more servers were needed. LJ then began offering paid services, presumably to raise the money for new servers and get out of that predicament.
There is no doubt that LJ at this point had a huge single point of failure: everything lived inside that one server box.

2. Two servers

With the money earned from paid services, LJ bought two servers: a Dell 6U machine called Kenny to provide web service, and a Dell 6U machine called Cartman to provide database service.

LJ now had larger disks and more computing resources. The network structure was still very simple: each machine had two network cards, and Cartman provided MySQL database service to Kenny over the internal network.

This temporarily solved the load problem, but new problems emerged:

One single point of failure became two.
There was no cold backup or hot backup.
The site started getting slow again; there was no helping it, since growth was too fast.
The web server reached its CPU limit, so more web servers were needed.

3. Four servers

LJ bought two more servers, Kyle and Stan, both 1U machines used to provide web service. LJ now had three web servers and one database server in total, so load had to be balanced across the three web servers.

LJ made Kenny the external gateway and used mod_backhand for load balancing.

Then more problems emerged:

Single points of failure. The database and the gateway web server were each a single point; if either machine failed, the whole service became unavailable. The gateway web server could have been made to fail over quickly to a standby by maintaining a heartbeat, but that would still not remove the database as a single point, and LJ did not do this at the time.
The site was slow again, this time because of I/O and the database. The question was how to add more database servers to the application.

4. Five servers

LJ bought another database server and set up replication between the two database servers (MySQL supports a master-slave mode). All writes go to the master (through the binlog, writes on the master are quickly replicated to the slave), while reads are spread across both databases, which can be seen as a form of load balancing.

Replication requires attention to a few things:

For reads, the database selection algorithm should pick the database whose current load is lighter.
Only read operations may be performed on the slave database server.
Be prepared to deal with replication lag; handled badly, it can cause replication to break. Only writes need this check; reads have no synchronization problem.
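The routing rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the DbServer and ReplicatedPool classes, the load field, and the server name db2 are hypothetical and do not come from LJ's code; only the name cartman appears in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DbServer:
    """Stand-in for a MySQL server; `load` is a hypothetical load metric."""
    name: str
    load: float = 0.0


@dataclass
class ReplicatedPool:
    """One master plus its slaves, with read/write routing."""
    master: DbServer
    slaves: list = field(default_factory=list)

    def for_write(self) -> DbServer:
        # All writes go to the master; replication carries them to the slaves.
        return self.master

    def for_read(self) -> DbServer:
        # Reads may go to any server; pick the one with the lightest load.
        candidates = [self.master] + self.slaves
        return min(candidates, key=lambda server: server.load)


pool = ReplicatedPool(
    master=DbServer("cartman", load=0.8),
    slaves=[DbServer("db2", load=0.3)],  # "db2" is an assumed name
)
write_target = pool.for_write().name
read_target = pool.for_read().name
print(write_target, read_target)
```

A real selector would also need a way to measure load and to skip a slave that has fallen too far behind the master; both are left out here.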

5. More servers

With money, of course, LJ bought more servers. They were deployed quickly, but before long the site began to slow down again. This time there were more web servers and more database servers, with contention for I/O and CPU. So LJ adopted BIG-IP as its load-balancing solution.

6. Where we are now

There are basically enough servers, but performance is still a problem, and the cause lies in the architecture.

The database structure is the biggest problem. Because every additional database is added to the application as a slave, the only benefit is that reads are spread across multiple machines. The cost is that every write is replicated to every machine and must be executed on each of them. The more servers there are, the greater the waste, and as writes increase, fewer resources are left to serve reads.

[Figure: distribution from one server to two]

[Figure: the final result]
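The waste described above can be made concrete with a little arithmetic. The sketch below assumes a deliberately simple model that is not from the original talk: every server can perform a fixed number of operations per second, and every write must be replayed on every server. The figures (4 servers, 1000 operations per second) are made up for illustration.

```python
def read_capacity(servers: int, per_server_ops: int, write_ops: int) -> int:
    """Operations per second left for reads across the whole replica set
    when each write is executed on every server (master and slaves)."""
    left_per_server = max(per_server_ops - write_ops, 0)
    return servers * left_per_server


# Assumed numbers: 4 servers, each able to do 1000 operations per second.
results = {writes: read_capacity(4, 1000, writes) for writes in (100, 500, 900)}
for writes, reads in results.items():
    print(f"{writes} writes/s -> {reads} reads/s available")
```

Under this model, at 900 writes per second each server spends 90% of its capacity replaying writes, and adding another slave adds only 100 reads per second.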

We now realize that there is no need to keep a copy of this data on so many servers. The servers already use RAID and the database is backed up, so these extra copies are a complete waste of resources, redundancy taken to an extreme. Why not distribute the data storage instead?

With the problem identified, we can start thinking about how to solve it. What needs to be done is to store different users' data on different servers. With storage distributed this way, each machine serves only a relatively fixed set of users, which gives a parallel architecture and good scalability.

To group users, each user is assigned a group tag that marks which database server group holds that user's data. Each database group consists of one master and several slaves, with the number of slaves kept at 2-3. This gives the most reasonable allocation of system resources: reads are still distributed, while excessive data redundancy and the excessive resource cost of replication are avoided.

User grouping is controlled by a central server (or group of servers). All user grouping information is stored there. To work with a user, the application first asks this machine for the user's group number and then fetches the data from that database group.

This user structure is already very similar to LJ's current architecture.
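The two-step lookup described above can be sketched as follows. This is a minimal in-memory illustration: GlobalDirectory, DbGroup, and posts_for are hypothetical names, and plain dictionaries stand in for the central server and the database groups.

```python
class GlobalDirectory:
    """Stand-in for the central server that maps users to database groups."""

    def __init__(self):
        self._group_of = {}  # userid -> group number

    def assign(self, userid: int, group: int) -> None:
        self._group_of[userid] = group

    def group_of(self, userid: int) -> int:
        return self._group_of[userid]


class DbGroup:
    """Stand-in for one database group (one master plus a few slaves)."""

    def __init__(self, number: int):
        self.number = number
        self._posts = {}  # userid -> list of post texts

    def add_post(self, userid: int, text: str) -> None:
        self._posts.setdefault(userid, []).append(text)

    def posts(self, userid: int) -> list:
        return list(self._posts.get(userid, []))


directory = GlobalDirectory()
groups = {1: DbGroup(1), 2: DbGroup(2)}


def posts_for(userid: int) -> list:
    # Step 1: ask the central server which group holds this user.
    group_number = directory.group_of(userid)
    # Step 2: fetch the data from that group only.
    return groups[group_number].posts(userid)


directory.assign(101, 1)
directory.assign(102, 2)
groups[1].add_post(101, "first post")
groups[2].add_post(102, "hello")
print(posts_for(101), posts_for(102))
```

Each request touches the directory and exactly one database group, so adding a group adds capacity without adding replication traffic to the existing groups.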

In the actual implementation, there are a few caveats:

Do not use auto-increment IDs inside a database group, so that users can later be migrated between database groups to achieve a more reasonable distribution of I/O, disk space, and load.
Store userid and postid on the global server, where auto-increment can be used; the corresponding values in the database groups must follow the values on the global server. The global server uses the transactional InnoDB storage engine.
Be extremely careful when migrating users between database groups: the user must not be able to write while being migrated.
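A minimal sketch of these caveats, assuming in-memory stand-ins: GlobalServer, write_post, and migrate are hypothetical names, and a set of user IDs plays the role of the "no writes during migration" guard. A real implementation would also have to deal with concurrency and partial failures, which this sketch ignores.

```python
import itertools


class GlobalServer:
    """Stand-in for the global server: allocates IDs and tracks user groups."""

    def __init__(self):
        self._next_userid = itertools.count(1)  # IDs are allocated globally
        self.group_of = {}                      # userid -> group number
        self.read_only = set()                  # users currently being migrated

    def new_user(self, group: int) -> int:
        userid = next(self._next_userid)
        self.group_of[userid] = group
        return userid


def write_post(glob: GlobalServer, groups: dict, userid: int, text: str) -> None:
    if userid in glob.read_only:
        raise RuntimeError("user is being migrated; writes are blocked")
    groups[glob.group_of[userid]].setdefault(userid, []).append(text)


def migrate(glob: GlobalServer, groups: dict, userid: int, target: int) -> None:
    glob.read_only.add(userid)  # block writes for the duration of the move
    try:
        source = glob.group_of[userid]
        # The userid is global, so rows move without any renumbering.
        groups[target][userid] = groups[source].pop(userid, [])
        glob.group_of[userid] = target
    finally:
        glob.read_only.discard(userid)


glob = GlobalServer()
groups = {1: {}, 2: {}}  # group number -> {userid: [posts]}
uid = glob.new_user(group=1)
write_post(glob, groups, uid, "hello")
migrate(glob, groups, uid, target=2)
print(glob.group_of[uid], groups[2][uid])
```

Because the IDs come from the global server rather than from a per-group counter, the migrated rows cannot collide with rows already present in the target group.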

7. Where we are now

Problems:

There is one global master server; if it goes down, registration and write operations for all users go down with it.
Each database group has one master server; if it goes down, write operations for that group's users go down with it.
If a slave in a database group goes down, the load on the group's other servers becomes too high.

To remove the single points of master-slave mode, LJ adopted a master-master mode. Master-master is really a manual arrangement rather than something MySQL provides directly: each of the two machines is both a master and a slave at the same time, and they replicate from each other.

Master-master requires attention to the following:

Recovery after replication on one master fails; ideally the server does this automatically.
ID allocation: because both machines accept writes at the same time, some IDs may conflict.

Solutions:

Allocate IDs by parity: one machine writes odd IDs, the other writes even IDs.
Allocate IDs from the global server (LJ's approach).
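The parity scheme can be illustrated in a few lines. This is a sketch of the idea only, not LJ's code (LJ allocates IDs from the global server); id_sequence is a hypothetical helper. MySQL exposes the same idea through its auto_increment_increment and auto_increment_offset server variables.

```python
import itertools


def id_sequence(start: int, step: int = 2):
    """IDs for one master: begin at `start`, advance by the number of masters."""
    return itertools.count(start, step)


master_a = id_sequence(1)  # odd IDs
master_b = id_sequence(2)  # even IDs

ids_a = [next(master_a) for _ in range(3)]
ids_b = [next(master_b) for _ in range(3)]
print(ids_a, ids_b)  # [1, 3, 5] [2, 4, 6]
```

The two sequences never overlap, so both masters can insert rows at the same time without coordinating.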

There is another way to use master-master mode. As before, the two machines stay synchronized, but only one machine serves traffic (both reads and writes) at a time. The roles are rotated every night, or switched when a problem occurs.

8. Where we are now

Now a short interlude: MyISAM vs. InnoDB.

Use InnoDB:

It supports transactions.
It needs more configuration, but that is worth it: data is stored more safely and it is faster.

Use MyISAM:

For logging (LJ uses it for web access logs).
For storing read-only static data, where it is fast enough.
Concurrency is poor: data cannot be read and written at the same time (appending data is fine).
An abnormal MySQL shutdown or crash can corrupt indexes, which then have to be repaired with myisamchk; under heavy traffic this happens very frequently.

9. Cache

Last year I wrote about memcached, a caching tool developed by the LJ team that stores data as key-value pairs in distributed memory. LJ's cache figures:

12 dedicated servers (not donated)
28 instances
30 GB total capacity
90-93% hit rate (those who have used squid may know that squid's hit rate, memory plus disk, is about 70-80%)

How do you build a caching strategy?

Cache everything? That is not possible. We only need to cache what already is, or may become, a system bottleneck, in order to get the most out of the system. By analyzing the MySQL logs, we can find the objects worth caching.
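The basic pattern for caching an object that is expensive to load can be sketched as follows. This is a generic read-through illustration with a dictionary standing in for the distributed cache; the Cache class and load_profile function are hypothetical and this is not the memcached client API.

```python
class Cache:
    """Toy in-process cache with hit-rate counters."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        # Miss: fall back to the slow source, then remember the result.
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        return value

    def invalidate(self, key) -> None:
        # Call this after a write so the next read reloads fresh data.
        self._data.pop(key, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


db_queries = []


def load_profile(userid):
    """Stand-in for an expensive database query."""
    db_queries.append(userid)
    return {"userid": userid, "name": f"user{userid}"}


cache = Cache()
for _ in range(10):
    profile = cache.get_or_load(42, load_profile)
print(len(db_queries), cache.hit_rate())  # 1 0.9
```

Ten reads of the same object cost a single database query. The invalidate call hints at the extra code that caching demands on every write path.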

What are the drawbacks of caching?

Nothing is perfect, and caching has drawbacks too:
More development work: caching requires special code to be written.
Management becomes harder, and more people are needed for system maintenance.
And of course, large amounts of memory cost money.

10. Web access load balancing

BIG-IP works at the packet level. It knows nothing about our internal processing, so it cannot judge which server should handle a given request. Existing reverse proxies did not work well either and fell short of the effect we wanted.

So LJ developed Perlbal. Features:

A fast, small, manageable HTTP web server and proxy
Can forward requests to internal servers
Written in Perl
Single-threaded, asynchronous, and event-based, using epoll or kqueue
Supports console management and remote management over HTTP, and supports loading configuration dynamically
Multiple modes: web server, reverse proxy, plugin
Plugin support: GIF/PNG interchange?

11. MogileFS

LJ uses the open source MogileFS as its distributed file storage system. MogileFS is very simple to use. Its main design ideas are:

Files belong to classes (the class is the smallest unit of replication)
It tracks where each file is stored
Files are stored on different hosts
A MySQL cluster stores the distribution information in one place
Large, inexpensive disks
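The design ideas above can be sketched as a toy tracker. This illustrates the concepts only and is not MogileFS's actual API or placement policy: Tracker, its methods, the host names, and the least-used placement rule are assumptions, and a dictionary stands in for the MySQL cluster that holds the location data.

```python
class Tracker:
    """Toy tracker: records which hosts hold each file."""

    def __init__(self, hosts, class_copies):
        self.hosts = list(hosts)
        self.class_copies = dict(class_copies)  # class name -> replica count
        self.locations = {}                     # file key -> list of hosts
        self.used = {host: 0 for host in self.hosts}

    def store(self, key: str, file_class: str) -> list:
        copies = self.class_copies[file_class]
        if copies > len(self.hosts):
            raise ValueError("not enough hosts for this class")
        # Take the least-used distinct hosts so replicas never share a host.
        chosen = sorted(self.hosts, key=lambda host: self.used[host])[:copies]
        for host in chosen:
            self.used[host] += 1
        self.locations[key] = chosen
        return chosen

    def where(self, key: str) -> list:
        return list(self.locations[key])


# Assumed classes: photos keep two copies, thumbnails keep one.
tracker = Tracker(["host1", "host2", "host3"], {"photo": 2, "thumbnail": 1})
photo_hosts = tracker.store("cat.jpg", "photo")
thumb_hosts = tracker.store("cat_thumb.jpg", "thumbnail")
print(photo_hosts, thumb_hosts)
```

The replica count is attached to the class rather than to the individual file, which is what "the class is the smallest unit of replication" means in practice: data that is cheap to regenerate can keep fewer copies.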

That is all for now; more can be found in the original document. Its authors took this document to two MySQL Con events, two OS Con events, and numerous other conferences, selflessly sharing their experience so that the rest of us can learn from it. In the Web 2.0 era, rapid development gets more and more attention, but good design is still the foundation of every application. On the way to becoming a Top 500 site, do not let the architecture hold back the site's growth.