MySQL v4.1.14, ia32 (currently running on win2k but once "in production" will likely be on NetBSD. (tho still 32 bit intel architecture).
I've been reading up on the different Storage Engine options trying to determine my best choice for performance (speed only.....reliability is less of a concern as an occasional crash or even 100% data loss will have a minimal impact for my needs. Downtime while restoring from backup won't really be an issue). I know my question is very table, data, and usage-specific, so I will try to fully explain how the final databse will be used.
Once I populate the tables, no data will be added.
There will be one main table consisting of the following field types:
10 single-character fields, 1 2-byte integer, all non-unique, and one unique 3-byte int so 15 bytes per record + index size:
(Char(1),Char(1),Char(1),C
har(1),Cha
r(1),Char(
1),Char(1)
,Char(1),C
har(1),Cha
r(1),2-byt
e-integer,
3-byte-int
eger)
Only the last column (3-byte-integer) will be unique so therefore will be my primary key.
All the Char(1) columns will be indexed with duplicates allowed.
The main table will always have exactly 649,740 records.
Approximately 300 SELECTs will be performed on that initial table, creating 300 temporary tables. (probably too many to all fit in RAM, but I had difficulty estimating what my index size will be (and therefore total table size) with any of the different Engines. Each Char(1) will be one of only 14 unique characters or <NULL> so it should compress very well...wish I could use 4-bit fields instead of a whole byte, seems I'm wasting more than twice the space I actually need.
Examples of the kinds of querries those 300 SELECTs will be:
all rows containing "J" and "s" in any of the Char(1) columns
all rows not containing "6", "g", "h", or "t" in any Char(1) column
Each of those querries will be returning up to 585,000 rows (although usually around 100,000 rows).
Most of the time, ~300 more querries (similar to the above examples) will be done against each those results (for a total of 90,000 more SELECTs), again with the results being up to about 90% of the size of the temporary table created from the prior querries, but usually the results will be closer to 17% of the querried-table's size. These 90,000 resultant tables can be dropped one by one as soon as a calculation is done on the results (and before the next of the 90,000 SELECTs are sent, so I don't need resources to hold 90,000 temp tables!)
The above types of querries will happen 5 times, with each time getting longer SELECT statements typically as follows:
Round 1: I'd request all rows containing either of 2 different characters in any of the Char(1) fields
Round 2: all rows containing and of 5-7 characters in any Char(1) field.
Round 3: 6-8 chars
Round 4: 7-9 chars
Round 5: 8-10 chars
Each time a new Table is created in each Round <x> above from the results of the first 300 resulting tables, the last table can be dropped as future querries will be run against the latest-created one. I wish I could have worded that better....it'll go like this (Sorry for the pseudo-code that follows, but it seemed it'd be easier to follow than that last sentence!):
300 temp tables are initially created from the main table.
Loop, 300
{
Do a Select on the first of the above 300 temp tables
Client will run a calculation on the resulting table, then drop the table
}
Drop the first temp table
Loop, 300
{
Do the same 300 SELECTs on the second of the 300 temp tables
Calculate and drop
}
....
until all 90,000 iterations are copmlete and all 300 temp tables are deleted.
This process of 90,000 iterations for each of the 5 rounds (450,000 SELECTs total) will repeat continuously for as long as the database is in use (which I expect to be around 5-6 hours a day....which is why it doesn't matter if it crashes and I have to spend 20-30 mins to go downstairs and reboot/restore/whatever...
all I'd lose is the last querries performed and could easilly pick up where I left off after it's back up)
All SELECTs will be done from a command line client, and only one user will be accessing the database at any time.
Currently I'm using the 'doze command line client that comes w/ the MySQL distribution, but will use something else if it would speed things up.
The network connection from the client to the server would be a dedicated 100mbit full duplex circuit (and I could bump up to gigabit if necessary)). If it would help, I could run the client on the same machine as mysqld....but the calculations the client machine will be doing will take about 20% of the CPU cycles.
I've pretty much never typed a SQL command in my life until yesterday (although I've done some small time stuff w/ gui windoze products (access & foxpro) and have read a fair amount of 'nix docs over the years covering at least the basics of db's...if I only remember one thing I've ever read about databases, it'd have to be that initial table and database design critically impacts performance and usability, hence this question for you fellow experts)
Thanks!!!
*Phew*......All that just to ask:
"Which Storage Engine should I use?"