Answers:
在这种独特情况下,MyISAM可以使InnoDB更快的唯一方法
读取时,可以从.MYI文件读取一次MyISAM表的索引,并将其加载到MyISAM密钥缓存中(由key_buffer_size调整大小)。如何使MyISAM表的.MYD读起来更快?有了这个:
ALTER TABLE mytable ROW_FORMAT=Fixed;
我在过去的帖子中写了这个
好的,InnoDB呢?InnoDB是否进行任何磁盘I / O查询?令人惊讶的是,是的!您可能以为我为之疯狂,但这是绝对正确的,即使对于SELECT查询也是如此。此时,您可能想知道“ InnoDB在世界上如何进行磁盘I / O查询?”
这一切都可以回溯到InnoDB,它是一个具有ACID投诉事务存储引擎。为了使InnoDB具有事务性,它必须支持I
in ACID
,即隔离。维护事务隔离的技术是通过MVCC(Multiversion并发控制)完成的。简单来说,InnoDB会在事务尝试更改数据之前记录其数据。在哪里记录?在系统表空间文件中,更好地称为ibdata1。这需要磁盘I / O。
既然InnoDB和MyISAM都执行磁盘I / O,那么哪些随机因素决定了谁更快?
DELETEs
和引起的数据碎片UPDATEs
因此,在繁重的读取环境中,如果有足够的数据被写入ibdata1中包含的撤消日志中以支持事务行为,则具有固定行格式的MyISAM表有可能胜过InnoDB从InnoDB缓冲池中读出的数据施加在InnoDB数据上。
仔细计划您的数据类型,查询和存储引擎。数据增长后,移动数据可能会变得非常困难。只是问问Facebook ...
在一个简单的世界中,MyISAM的读取速度更快,InnoDB的写入速度更快。
一旦开始引入混合读/写,由于其行锁定机制,InnoDB的读取速度也会更快。
几年前,我写了一个MySQL存储引擎的比较, 直到今天仍然如此,概述了MyISAM和InnoDB之间的独特区别。
以我的经验,应该将InnoDB用于除读取大量缓存表以外的所有内容,在缓存表中,由于损坏而丢失数据并不那么重要。
为了增加此处涉及两个引擎之间机械差异的响应,我提出了一个经验速度比较研究。
就纯速度而言,MyISAM并不总是比InnoDB快,但以我的经验,在PURE READ工作环境中,它往往要快2.0到2.5倍。显然,这并不适合所有环境-正如其他人所写的那样,MyISAM缺少事务和外键之类的东西。
我在下面做了一些基准测试-我使用python进行循环,并使用timeit库进行时间比较。出于兴趣,我还包括了内存引擎,尽管它仅适用于较小的表(The table 'tbl' is full
超过MySQL内存限制时,您会不断遇到),但它可提供最佳的整体性能。我查看的四种选择是:
首先,我使用以下SQL创建了三个表
CREATE TABLE
data_interrogation.test_table_myisam
(
index_col BIGINT NOT NULL AUTO_INCREMENT,
value1 DOUBLE,
value2 DOUBLE,
value3 DOUBLE,
value4 DOUBLE,
PRIMARY KEY (index_col)
)
ENGINE=MyISAM DEFAULT CHARSET=utf8
在第二个和第三个表中用“ MyISAM”代替“ InnoDB”和“内存”。
查询: SELECT * FROM tbl WHERE index_col = xx
结果:平局
它们的速度大致相同,并且所期望的要选择的列数是线性的。InnoDB的似乎稍微比的MyISAM快但这的确是微不足道的。
码:
import timeit
import MySQLdb
import MySQLdb.cursors
import random
from random import randint
db = MySQLdb.connect(host="...", user="...", passwd="...", db="...", cursorclass=MySQLdb.cursors.DictCursor)
cur = db.cursor()
lengthOfTable = 100000
# Fill up the tables with random data
for x in xrange(lengthOfTable):
rand1 = random.random()
rand2 = random.random()
rand3 = random.random()
rand4 = random.random()
insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
cur.execute(insertString)
cur.execute(insertString2)
cur.execute(insertString3)
db.commit()
# Define a function to pull a certain number of records from these tables
def selectRandomRecords(testTable,numberOfRecords):
for x in xrange(numberOfRecords):
rand1 = randint(0,lengthOfTable)
selectString = "SELECT * FROM " + testTable + " WHERE index_col = " + str(rand1)
cur.execute(selectString)
setupString = "from __main__ import selectRandomRecords"
# Test time taken using timeit
myisam_times = []
innodb_times = []
memory_times = []
for theLength in [3,10,30,100,300,1000,3000,10000]:
innodb_times.append( timeit.timeit('selectRandomRecords("test_table_innodb",' + str(theLength) + ')', number=100, setup=setupString) )
myisam_times.append( timeit.timeit('selectRandomRecords("test_table_myisam",' + str(theLength) + ')', number=100, setup=setupString) )
memory_times.append( timeit.timeit('selectRandomRecords("test_table_memory",' + str(theLength) + ')', number=100, setup=setupString) )
查询: SELECT count(*) FROM tbl
结果:MyISAM获胜
这说明了MyISAM和InnoDB之间的巨大区别-MyISAM(和内存)跟踪表中的记录数,因此此事务速度很快,且O(1)。在我研究的范围内,InnoDB计数所需的时间随着表的大小而呈超线性增加。我怀疑在实践中观察到的许多MyISAM查询的提速是由于类似的影响。
码:
myisam_times = []
innodb_times = []
memory_times = []
# Define a function to count the records
def countRecords(testTable):
selectString = "SELECT count(*) FROM " + testTable
cur.execute(selectString)
setupString = "from __main__ import countRecords"
# Truncate the tables and re-fill with a set amount of data
for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:
truncateString = "TRUNCATE test_table_innodb"
truncateString2 = "TRUNCATE test_table_myisam"
truncateString3 = "TRUNCATE test_table_memory"
cur.execute(truncateString)
cur.execute(truncateString2)
cur.execute(truncateString3)
for x in xrange(theLength):
rand1 = random.random()
rand2 = random.random()
rand3 = random.random()
rand4 = random.random()
insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
cur.execute(insertString)
cur.execute(insertString2)
cur.execute(insertString3)
db.commit()
# Count and time the query
innodb_times.append( timeit.timeit('countRecords("test_table_innodb")', number=100, setup=setupString) )
myisam_times.append( timeit.timeit('countRecords("test_table_myisam")', number=100, setup=setupString) )
memory_times.append( timeit.timeit('countRecords("test_table_memory")', number=100, setup=setupString) )
查询: SELECT * FROM tbl WHERE value1<0.5 AND value2<0.5 AND value3<0.5 AND value4<0.5
结果:MyISAM获胜
在这里,MyISAM和内存的性能大致相同,对于较大的表,它们比InnoDB快50%。这种查询似乎使MyISAM的好处最大化。
码:
myisam_times = []
innodb_times = []
memory_times = []
# Define a function to perform conditional selects
def conditionalSelect(testTable):
selectString = "SELECT * FROM " + testTable + " WHERE value1 < 0.5 AND value2 < 0.5 AND value3 < 0.5 AND value4 < 0.5"
cur.execute(selectString)
setupString = "from __main__ import conditionalSelect"
# Truncate the tables and re-fill with a set amount of data
for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:
truncateString = "TRUNCATE test_table_innodb"
truncateString2 = "TRUNCATE test_table_myisam"
truncateString3 = "TRUNCATE test_table_memory"
cur.execute(truncateString)
cur.execute(truncateString2)
cur.execute(truncateString3)
for x in xrange(theLength):
rand1 = random.random()
rand2 = random.random()
rand3 = random.random()
rand4 = random.random()
insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
cur.execute(insertString)
cur.execute(insertString2)
cur.execute(insertString3)
db.commit()
# Count and time the query
innodb_times.append( timeit.timeit('conditionalSelect("test_table_innodb")', number=100, setup=setupString) )
myisam_times.append( timeit.timeit('conditionalSelect("test_table_myisam")', number=100, setup=setupString) )
memory_times.append( timeit.timeit('conditionalSelect("test_table_memory")', number=100, setup=setupString) )
结果:InnoDB获胜
对于此查询,我为子选择创建了一组附加表。每行仅是两列BIGINT,一列具有主键索引,一列不具有任何索引。由于表很大,因此我没有测试内存引擎。SQL表创建命令是
CREATE TABLE
subselect_myisam
(
index_col bigint NOT NULL,
non_index_col bigint,
PRIMARY KEY (index_col)
)
ENGINE=MyISAM DEFAULT CHARSET=utf8;
在第二个表中,再次用“ MyISAM”代替“ InnoDB”。
在此查询中,我将选择表的大小保留为1000000,而是更改了子选择列的大小。
在这里,InnoDB轻松获胜。到达合理的尺寸表后,两个引擎都随子选择的尺寸线性缩放。索引加快了MyISAM命令的速度,但有趣的是对InnoDB的速度影响很小。subSelect.png
码:
myisam_times = []
innodb_times = []
myisam_times_2 = []
innodb_times_2 = []
def subSelectRecordsIndexed(testTable,testSubSelect):
selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT index_col FROM " + testSubSelect + " )"
cur.execute(selectString)
setupString = "from __main__ import subSelectRecordsIndexed"
def subSelectRecordsNotIndexed(testTable,testSubSelect):
selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT non_index_col FROM " + testSubSelect + " )"
cur.execute(selectString)
setupString2 = "from __main__ import subSelectRecordsNotIndexed"
# Truncate the old tables, and re-fill with 1000000 records
truncateString = "TRUNCATE test_table_innodb"
truncateString2 = "TRUNCATE test_table_myisam"
cur.execute(truncateString)
cur.execute(truncateString2)
lengthOfTable = 1000000
# Fill up the tables with random data
for x in xrange(lengthOfTable):
rand1 = random.random()
rand2 = random.random()
rand3 = random.random()
rand4 = random.random()
insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
cur.execute(insertString)
cur.execute(insertString2)
for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:
truncateString = "TRUNCATE subselect_innodb"
truncateString2 = "TRUNCATE subselect_myisam"
cur.execute(truncateString)
cur.execute(truncateString2)
# For each length, empty the table and re-fill it with random data
rand_sample = sorted(random.sample(xrange(lengthOfTable), theLength))
rand_sample_2 = random.sample(xrange(lengthOfTable), theLength)
for (the_value_1,the_value_2) in zip(rand_sample,rand_sample_2):
insertString = "INSERT INTO subselect_innodb (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")"
insertString2 = "INSERT INTO subselect_myisam (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")"
cur.execute(insertString)
cur.execute(insertString2)
db.commit()
# Finally, time the queries
innodb_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString) )
myisam_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString) )
innodb_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString2) )
myisam_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString2) )
我认为所有这一切的基本含义是,如果您真正关心速度,则需要对正在执行的查询进行基准测试,而不是对哪种引擎更合适做任何假设。
SELECT * FROM tbl WHERE index_col = xx
-以下两个因素可能导致图表出现更多变化:主键与辅助键;索引是否缓存。
SELECT COUNT(*)
在添加WHERE
子句之前,它是MyISAM的明显赢家。
哪个更快?两者可能更快。YMMV。
您应该使用哪个?InnoDB-崩溃安全等