key / value database’ performance compare – 查詢 對照表效能 PK 賽

 

From URL : http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/

 

architecture name speed (tweets/sec)
in-memory, within-process python dictionary 2700
on-disk, within-process tokyo cabinet hashtable 1400
on-disk, within-process berkeleydb hashtable 340
on-disk, over socket tokyo tyrant, binary protocol 225
in-memory, over socket memcached 120
in-memory, over socket tokyo tyrant, memcached protocol 85
on-disk, over socket tokyo tyrant, memcached protocol 85
on-disk, over socket memcachedb 0.5

memcache 的測試值跟我測得的數據接近(我用 100K 的 data測)

為了避免該資料不見, 搜藏/節錄一下重點:

More details on the options:

  • Python dictionary: defaultdict(int) is the simplest and most obvious implementation. It’s the baseline and the fastest. This is the best option for many types of experimental NLP code, since it can just be serialized to disk for use later. Only if you want many processes to build it concurrently and incrementally, or want many processes to access the model but not have to hold it in their own process space, do the other options start becoming relevant.
  • BerkeleyDB: a well-known key/value store that I’ve used for a while. Unfortunately it’s been removed from the Python distribution, and there are often version hell issues every time I see people try to use it. (Every Linux/Unix seems to carry a different version, and they’re all not compatible with each other.)
  • Tokyo Cabinet is a newish key/value store that has some impressive benchmarks. I just learned about it from Leonard’s post, and I also found it to be excellent. If Cabinet keeps being so awesome, I might never use BerkeleyDB again. (Though installation issues are worse than BerkeleyDB since it’s new enough to not be a common package; e.g. I found it on MacPorts but not Debian.)
  • Memcached: The most standard in-memory key/value for use over sockets. Usually used for caching results from database queries for web applications — because in-memory caching is way faster than hitting disk on a database query. All data in a Memcached disappears if you turn it off. Clients talk to it via a plaintext protocol over sockets.
    • The fact it was slower than the dictionary or BDB or Cabinet means that the communication overhead was high. The nice thing about Memcached for keeping running counts like this is that it should distribute well: have lots of different processes/machines processing data and asking a central Memcached cluster to increment counters. It might be a little unfair to compare Memcached performance to BerkeleyDB or Cabinet, since it’s designed for the situation of communicating with many clients at once. It’s usually considered a win if Memcached is faster than a parallel-ly accessed RDBMS, which is very much the case.
    • I wonder how this architecture would compare to a Hadoop/HDFS/MapReduce for batch-job term counting performance. Jimmy Lin & other Maryland folks wrote an interesting report (2009) about using Memcached during a Hadoop job in a similar way for, among other things, this same language model use case. In general, lots of machine learning algorithms really don’t parallelize very well in the MapReduce architecture; parameter updates in Gibbs sampling, EM, and any online algorithm (e.g. SGD) are other examples. (An earlier paper on a better-than-mapreduce approach for EM parameters: Jason Wolfe et al. 2008; slides, paper.) A Memcached-like system could be a component of more client-server-ish parallel processing models for these use cases.
    • Note of warning: there are actually 3 different Python libraries to talk to Memcached: (1) memcache.py aka python-memcached; (2) cmemcache which wraps the C library libmemcache, and (3) cmemcached.pyx aka python-libmemcached write wraps a different C library, libmemcached. For each one, the X in import X correlates quite poorly to the project’s name. Bleah. Option #3 seems newest, or at least has the best-maintained websites, so I used that.
  • MemcacheDB is a BerkeleyDB-backed, Memcached-protocol server. Initially I had hoped it was just Memcached over BDB. Unfortunately this is clearly not the case. Its name is so similar yet its effectiveness is so different than Memcached! As Leonard points out, there are lots of half-assed solutions out there. It’s easy for anyone to create a system that works well for their needs, but it’s harder to make something more general.
  • Tokyo Tyrant is a server implemented on top of Cabinet that implements a similar key/value API except over sockets. It’s incredibly flexible; it was very easy to run it in several different configurations. The first one is to use an in-memory data store, and communicate using the memcached protocol. This is, of course, *exactly* comparable to Memcached — behaviorally indistinguishable! — and it does worse. The second option is to do that, except switch to an on-disk data store. It’s pretty ridiculous that that’s still the same speed — communication overhead is completely dominating the time. Fortunately, Tyrant comes with a binary protocol. Using that substantially improves performance past Memcached levels, though less than a direct in-process database. Yes, communication across processes incurs overhead. No news here, I guess.

I can’t say this evaluation tells us too much about the server systems, since it’s all for a single process, which really isn’t their use case. It is interesting, however, to see that memcached’s plaintext protocol causing a big performance hit compared to a binary one. There’s a lot of talk and perhaps code for a binary memcached protocol, but I couldn’t find any docs suggesting whether it currently works. Tyrant seems to work great.

The biggest takeaway is that Tokyo Cabinet is awesome. It has very complete English language documentation — something sadly lacking in many otherwise fine Japanese open-source projects — and appears to be highly performant and very flexible. This presentation by its author (Mikio Hirabayashi) shows a pretty impressive array of different things the suite of packages can do. At the very least, I’ll probably abandon BerkeleyDB if Cabinet keeps working so well; and hopefully, distribution and remote access will be easy to add via Tyrant.

Final note: it’s interesting how many of these new low-latency datastore systems come out of open-sourced projects from social network companies. Tokyo Cabinet/Tyrant is from Mixi, a large Japanese social networking site; Cassandra is from Facebook; and Voldemort is from LinkedIn. (Hadoop HDFS, approximately from Yahoo, is another open-source non-rdbms distributed datastore, though it’s not really low-latency enough to be comparable.) Then there are lots of commercial low-latency and distributed systems for data warehousing (oracle greenplum vertica aster…) but all these large web companies seem happy open-sourcing their infrastructure. This is great for me, but sucks to be a database company.

some mysql tips

  • mysql 的 innodb 重裝或改了 innodb_log_file_size 後, 發現 xxx/yyy.frm 壞了 , 解決辦法是 把 /var/lib/mysql/ib_logfile* 砍了, 再 restart mysql
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html 中提到….
    若要大量 bluk 作 insert 動作前, 下 ALTER TABLE tbl_name DISABLE KEYS , 這樣可以讓 insert 加快,

    但是作完 insert 後還是得 enable keys , 把 missing 的 indexs 補回來, 我想這時也是非常耗時間吧!!

    另外 enable / disable keys 對於 mysql 5.1.1 以前的 partition table 沒有用

MySQL – Optimizing Database Structure

參考: http://dev.mysql.com/doc/refman/5.0/en/optimizing-database-structure.html

7.4.1. Make Your Data as Small as Possible
7.4.2. Column Indexes
7.4.3. Multiple-Column Indexes
7.4.4. How MySQL Uses Indexes
7.4.5. The MyISAM Key Cache
7.4.6. MyISAM Index Statistics Collection
7.4.7. How MySQL Opens and Closes Tables
7.4.8. Disadvantages of Creating Many Tables in the Same Database

減少 record structure 的大小 – Numeric Types 參考表
(用 MEDIUMINT 3bytes 比 INT 4bytes 好 , 若資料內容不可能有負值那加上 UNSIGNED , 則數值範圍可多一倍!)

fd5aa2a415b47b898c229846a7644cf8

mysql index 的建立/使用 , Multiple-Column Indexes

http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

The name index is an index over the last_name and first_name columns. The index can be used for queries that specify values in a known range for last_name, or for both last_name and first_name. Therefore, the name index is used in the following queries:

SELECT * FROM test WHERE last_name='Widenius';

SELECT * FROM test
  WHERE last_name='Widenius' AND first_name='Michael';

SELECT * FROM test
  WHERE last_name='Widenius'
  AND (first_name='Michael' OR first_name='Monty');

SELECT * FROM test
  WHERE last_name='Widenius'
  AND first_name >='M' AND first_name < 'N';

However, the name index is not used in the following queries, 以下的 query 用不到 index —> 多重 index 有先後區別.

SELECT * FROM test WHERE first_name='Michael';

SELECT * FROM test
  WHERE last_name='Widenius' OR first_name='Michael';

Oracle , full-table-scans (FTS) problem

http://www.dba-oracle.com/t_sql_like_clause_index_usage.htm

Indexing when using the SQL "like" clause can be tricky because the wildcard "%" operator can invalidate the index.  For example a last_name index would be OK with a "like ‘SMI%’" query, but unusable with "like ‘%SMI%’.

Solutions to this issue of a leading wildcard can be addressed in several ways::

Burleson Consulting 說:

These unnecessary full-table scans are a problem:
1. Large-table full-table scans increase the load on the disk I/O sub-system

2. Small table full table scans (in the data buffer) cause high consistent gets and drive-up CPU consumption

非必要的 full-table-scans 造成幾個問題 : 大的資料表會增加 disk I/O , 小的資料表則是增加 CPU 消耗. 所以這個現象可以拿來觀察資料庫系統的問題點.

Tuning Oracle Full-table Scans

http://www.orafaq.com/node/39