Tag Archives: tunning

key / value database’ performance compare – 查詢 對照表效能 PK 賽

 

From URL : http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/

 

architecture name speed (tweets/sec)
in-memory, within-process python dictionary 2700
on-disk, within-process tokyo cabinet hashtable 1400
on-disk, within-process berkeleydb hashtable 340
on-disk, over socket tokyo tyrant, binary protocol 225
in-memory, over socket memcached 120
in-memory, over socket tokyo tyrant, memcached protocol 85
on-disk, over socket tokyo tyrant, memcached protocol 85
on-disk, over socket memcachedb 0.5

memcache 的測試值跟我測得的數據接近(我用 100K 的 data測)

為了避免該資料不見, 搜藏/節錄一下重點:

More details on the options:

  • Python dictionary: defaultdict(int) is the simplest and most obvious implementation. It’s the baseline and the fastest. This is the best option for many types of experimental NLP code, since it can just be serialized to disk for use later. Only if you want many processes to build it concurrently and incrementally, or want many processes to access the model but not have to hold it in their own process space, do the other options start becoming relevant.
  • BerkeleyDB: a well-known key/value store that I’ve used for a while. Unfortunately it’s been removed from the Python distribution, and there are often version hell issues every time I see people try to use it. (Every Linux/Unix seems to carry a different version, and they’re all not compatible with each other.)
  • Tokyo Cabinet is a newish key/value store that has some impressive benchmarks. I just learned about it from Leonard’s post, and I also found it to be excellent. If Cabinet keeps being so awesome, I might never use BerkeleyDB again. (Though installation issues are worse than BerkeleyDB since it’s new enough to not be a common package; e.g. I found it on MacPorts but not Debian.)
  • Memcached: The most standard in-memory key/value for use over sockets. Usually used for caching results from database queries for web applications — because in-memory caching is way faster than hitting disk on a database query. All data in a Memcached disappears if you turn it off. Clients talk to it via a plaintext protocol over sockets.
    • The fact it was slower than the dictionary or BDB or Cabinet means that the communication overhead was high. The nice thing about Memcached for keeping running counts like this is that it should distribute well: have lots of different processes/machines processing data and asking a central Memcached cluster to increment counters. It might be a little unfair to compare Memcached performance to BerkeleyDB or Cabinet, since it’s designed for the situation of communicating with many clients at once. It’s usually considered a win if Memcached is faster than a parallel-ly accessed RDBMS, which is very much the case.
    • I wonder how this architecture would compare to a Hadoop/HDFS/MapReduce for batch-job term counting performance. Jimmy Lin & other Maryland folks wrote an interesting report (2009) about using Memcached during a Hadoop job in a similar way for, among other things, this same language model use case. In general, lots of machine learning algorithms really don’t parallelize very well in the MapReduce architecture; parameter updates in Gibbs sampling, EM, and any online algorithm (e.g. SGD) are other examples. (An earlier paper on a better-than-mapreduce approach for EM parameters: Jason Wolfe et al. 2008; slides, paper.) A Memcached-like system could be a component of more client-server-ish parallel processing models for these use cases.
    • Note of warning: there are actually 3 different Python libraries to talk to Memcached: (1) memcache.py aka python-memcached; (2) cmemcache which wraps the C library libmemcache, and (3) cmemcached.pyx aka python-libmemcached write wraps a different C library, libmemcached. For each one, the X in import X correlates quite poorly to the project’s name. Bleah. Option #3 seems newest, or at least has the best-maintained websites, so I used that.
  • MemcacheDB is a BerkeleyDB-backed, Memcached-protocol server. Initially I had hoped it was just Memcached over BDB. Unfortunately this is clearly not the case. Its name is so similar yet its effectiveness is so different than Memcached! As Leonard points out, there are lots of half-assed solutions out there. It’s easy for anyone to create a system that works well for their needs, but it’s harder to make something more general.
  • Tokyo Tyrant is a server implemented on top of Cabinet that implements a similar key/value API except over sockets. It’s incredibly flexible; it was very easy to run it in several different configurations. The first one is to use an in-memory data store, and communicate using the memcached protocol. This is, of course, *exactly* comparable to Memcached — behaviorally indistinguishable! — and it does worse. The second option is to do that, except switch to an on-disk data store. It’s pretty ridiculous that that’s still the same speed — communication overhead is completely dominating the time. Fortunately, Tyrant comes with a binary protocol. Using that substantially improves performance past Memcached levels, though less than a direct in-process database. Yes, communication across processes incurs overhead. No news here, I guess.

I can’t say this evaluation tells us too much about the server systems, since it’s all for a single process, which really isn’t their use case. It is interesting, however, to see that memcached’s plaintext protocol causing a big performance hit compared to a binary one. There’s a lot of talk and perhaps code for a binary memcached protocol, but I couldn’t find any docs suggesting whether it currently works. Tyrant seems to work great.

The biggest takeaway is that Tokyo Cabinet is awesome. It has very complete English language documentation — something sadly lacking in many otherwise fine Japanese open-source projects — and appears to be highly performant and very flexible. This presentation by its author (Mikio Hirabayashi) shows a pretty impressive array of different things the suite of packages can do. At the very least, I’ll probably abandon BerkeleyDB if Cabinet keeps working so well; and hopefully, distribution and remote access will be easy to add via Tyrant.

Final note: it’s interesting how many of these new low-latency datastore systems come out of open-sourced projects from social network companies. Tokyo Cabinet/Tyrant is from Mixi, a large Japanese social networking site; Cassandra is from Facebook; and Voldemort is from LinkedIn. (Hadoop HDFS, approximately from Yahoo, is another open-source non-rdbms distributed datastore, though it’s not really low-latency enough to be comparable.) Then there are lots of commercial low-latency and distributed systems for data warehousing (oracle greenplum vertica aster…) but all these large web companies seem happy open-sourcing their infrastructure. This is great for me, but sucks to be a database company.

some mysql tips

  • mysql 的 innodb 重裝或改了 innodb_log_file_size 後, 發現 xxx/yyy.frm 壞了 , 解決辦法是 把 /var/lib/mysql/ib_logfile* 砍了, 再 restart mysql
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html 中提到….
    若要大量 bluk 作 insert 動作前, 下 ALTER TABLE tbl_name DISABLE KEYS , 這樣可以讓 insert 加快,

    但是作完 insert 後還是得 enable keys , 把 missing 的 indexs 補回來, 我想這時也是非常耗時間吧!!

    另外 enable / disable keys 對於 mysql 5.1.1 以前的 partition table 沒有用

MySQL – Optimizing Database Structure

參考: http://dev.mysql.com/doc/refman/5.0/en/optimizing-database-structure.html

7.4.1. Make Your Data as Small as Possible
7.4.2. Column Indexes
7.4.3. Multiple-Column Indexes
7.4.4. How MySQL Uses Indexes
7.4.5. The MyISAM Key Cache
7.4.6. MyISAM Index Statistics Collection
7.4.7. How MySQL Opens and Closes Tables
7.4.8. Disadvantages of Creating Many Tables in the Same Database

減少 record structure 的大小 – Numeric Types 參考表
(用 MEDIUMINT 3bytes 比 INT 4bytes 好 , 若資料內容不可能有負值那加上 UNSIGNED , 則數值範圍可多一倍!)

fd5aa2a415b47b898c229846a7644cf8

mysql index 的建立/使用 , Multiple-Column Indexes

http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

The name index is an index over the last_name and first_name columns. The index can be used for queries that specify values in a known range for last_name, or for both last_name and first_name. Therefore, the name index is used in the following queries:

SELECT * FROM test WHERE last_name='Widenius';

SELECT * FROM test
  WHERE last_name='Widenius' AND first_name='Michael';

SELECT * FROM test
  WHERE last_name='Widenius'
  AND (first_name='Michael' OR first_name='Monty');

SELECT * FROM test
  WHERE last_name='Widenius'
  AND first_name >='M' AND first_name < 'N';

However, the name index is not used in the following queries, 以下的 query 用不到 index —> 多重 index 有先後區別.

SELECT * FROM test WHERE first_name='Michael';

SELECT * FROM test
  WHERE last_name='Widenius' OR first_name='Michael';

Oracle , full-table-scans (FTS) problem

http://www.dba-oracle.com/t_sql_like_clause_index_usage.htm

Indexing when using the SQL "like" clause can be tricky because the wildcard "%" operator can invalidate the index.  For example a last_name index would be OK with a "like ‘SMI%’" query, but unusable with "like ‘%SMI%’.

Solutions to this issue of a leading wildcard can be addressed in several ways::

Burleson Consulting 說:

These unnecessary full-table scans are a problem:
1. Large-table full-table scans increase the load on the disk I/O sub-system

2. Small table full table scans (in the data buffer) cause high consistent gets and drive-up CPU consumption

非必要的 full-table-scans 造成幾個問題 : 大的資料表會增加 disk I/O , 小的資料表則是增加 CPU 消耗. 所以這個現象可以拿來觀察資料庫系統的問題點.

Tuning Oracle Full-table Scans

http://www.orafaq.com/node/39

Oracle hint 用法 , database 優化 tunning

常見Oracle HINT的用法:

1. /*+ALL_ROWS*/
表明對語句塊選擇基於開銷的優化方法,並獲得最佳吞吐量,使資源消耗最小化.
例如:
SELECT /*+ALL+_ROWS*/ EMP_NO,EMP_NAM,DAT_IN FROM BSEMPMS WHERE EMP_NO=’SCOTT’;
2. /*+FIRST_ROWS*/
表明對語句塊選擇基於開銷的優化方法,並獲得最佳響應時間,使資源消耗最小化.
例如:
SELECT /*+FIRST_ROWS*/ EMP_NO,EMP_NAM,DAT_IN FROM BSEMPMS WHERE EMP_NO=’SCOTT’;

3. /*+CHOOSE*/
表明如果數據字典中有訪問表的統計資料,將基於開銷的優化方法,並獲得最佳的吞吐量;
表明如果數據字典中沒有訪問表的統計資料,將基於規則開銷的優化方法;
例如:
SELECT /*+CHOOSE*/ EMP_NO,EMP_NAM,DAT_IN FROM BSEMPMS WHERE EMP_NO=’SCOTT’;

4. /*+RULE*/
表明對語句塊選擇基於規則的優化方法.
例如:
SELECT /*+ RULE */ EMP_NO,EMP_NAM,DAT_IN FROM BSEMPMS WHERE EMP_NO=’SCOTT’;

5. /*+FULL(TABLE)*/
表明對表選擇全局掃描的方法.
例如:
SELECT /*+FULL(A)*/ EMP_NO,EMP_NAM FROM BSEMPMS A WHERE EMP_NO=’SCOTT’;

6. /*+ROWID(TABLE)*/
提示明確表明對指定表根據ROWID進行訪問.
例如:
SELECT /*+ROWID(BSEMPMS)*/ * FROM BSEMPMS WHERE ROWID>=’AAAAAAAAAAAAAA’
AND EMP_NO=’SCOTT’;

7. /*+CLUSTER(TABLE)*/
提示明確表明對指定表選擇簇掃描的訪問方法,它只對簇對像有效.
例如:
SELECT /*+CLUSTER */ BSEMPMS.EMP_NO,DPT_NO FROM BSEMPMS,BSDPTMS
WHERE DPT_NO=’TEC304′ AND BSEMPMS.DPT_NO=BSDPTMS.DPT_NO;

8. /*+INDEX(TABLE INDEX_NAME)*/
表明對錶選擇索引的掃描方法.
例如:
SELECT /*+INDEX(BSEMPMS SEX_INDEX) USE SEX_INDEX BECAUSE THERE ARE FEWMALE BSEMPMS */ FROM BSEMPMS WHERE SEX=’M’;

select /*+INDEX(TBL_TOP_KEYWORD_DAY TBL_TOP_KEYWORD_DAY_XDATE)*/
to_char(xdate,’YYYY-WW’) X1 , sum(val) Y1
from TBL_TOP_KEYWORD_DAY

 

9. /*+INDEX_ASC(TABLE INDEX_NAME)*/
表明對錶選擇索引升序的掃描方法.
例如:
SELECT /*+INDEX_ASC(BSEMPMS PK_BSEMPMS) */ FROM BSEMPMS WHERE DPT_NO=’SCOTT’;

10. /*+INDEX_COMBINE*/
為指定表選擇位圖訪問路經,如果INDEX_COMBINE中沒有提供作為參數的索引,將選擇出位圖索引的布爾組合方式.
例如:
SELECT /*+INDEX_COMBINE(BSEMPMS SAL_BMI HIREDATE_BMI)*/ * FROM BSEMPMS
WHERE SAL<5000000 emp_no=”SCOTT” sex=”M” dpt_no=”V.DPT_NO”>V.AVG_SAL;

20. /*+NO_MERGE(TABLE)*/
對於有可合併的視圖不再合併.
例如:
SELECT /*+NO_MERGE(V) */ A.EMP_NO,A.EMP_NAM,B.DPT_NO FROM BSEMPMS A (SELECT DPT_NO,AVG(SAL) AS AVG_SAL FROM BSEMPMS B GROUP BY DPT_NO) V WHERE A.DPT_NO=V.DPT_NO AND A.SAL>V.AVG_SAL;

21. /*+ORDERED*/
根據表出現在FROM中的順序,ORDERED使ORACLE依此順序對其連接.
例如:
SELECT /*+ORDERED*/ A.COL1,B.COL2,C.COL3 FROM TABLE1 A,TABLE2 B,TABLE3 C WHERE A.COL1=B.COL1 AND B.COL1=C.COL1;

22. /*+USE_NL(TABLE)*/
將指定表與嵌套的連接的行源進行連接,並把指定表作為內部表.
例如:
SELECT /*+ORDERED USE_NL(BSEMPMS)*/ BSDPTMS.DPT_NO,BSEMPMS.EMP_NO,BSEMPMS.EMP_NAM FROM BSEMPMS,BSDPTMS WHERE BSEMPMS.DPT_NO=BSDPTMS.DPT_NO;

23. /*+USE_MERGE(TABLE)*/
將指定的表與其他行源通過合併排序連接方式連接起來.
例如:
SELECT /*+USE_MERGE(BSEMPMS,BSDPTMS)*/ * FROM BSEMPMS,BSDPTMS WHERE BSEMPMS.DPT_NO=BSDPTMS.DPT_NO;

24. /*+USE_HASH(TABLE)*/
將指定的表與其他行源通過hash連接方式連接起來.
例如:
SELECT /*+USE_HASH(BSEMPMS,BSDPTMS)*/ * FROM BSEMPMS,BSDPTMS WHERE BSEMPMS.DPT_NO=BSDPTMS.DPT_NO;

25. /*+DRIVING_SITE(TABLE)*/
強制與ORACLE所選擇的位置不同的表進行查詢執行.
例如:
SELECT /*+DRIVING_SITE(DEPT)*/ * FROM BSEMPMS,DEPT@BSDPTMS WHERE BSEMPMS.DPT_NO=DEPT.DPT_NO;

26. /*+LEADING(TABLE)*/
將指定的表作為連接次序中的首表.

27. /*+CACHE(TABLE)*/
當進行 full scan 時,CACHE提示能夠將表的檢索塊放置在緩衝區緩存中最近最少列表LRU的最近使用端
例如:
SELECT /*+FULL(BSEMPMS) CAHE(BSEMPMS) */ EMP_NAM FROM BSEMPMS;

28. /*+NOCACHE(TABLE)*/
當進行 full scan 時,CACHE提示能夠將表的檢索塊放置在緩衝區緩存中最近最少列表LRU的最近使用端
例如:
SELECT /*+FULL(BSEMPMS) NOCAHE(BSEMPMS) */ EMP_NAM FROM BSEMPMS;

29. /*+APPEND*/
直接插入到表的最後,可以提高速度.
insert /*+append*/ into test1 select * from test4 ;

30. /*+NOAPPEND*/
通過在插入語句生存期內停止並行模式來啟動常規插入.
insert /*+noappend*/ into test1 select * from test4 ;