bous All American 11215 Posts user info edit post |
i'm helping with the development of an application and have come across an area where a SQL query on 30,000 items takes around 6 minutes.
SQLite.NET
update season SET HasLocalFilesTemp=1 where exists( select local_episodes.EpisodeFilename from local_episodes where SeriesID = season.SeriesID and SeasonIndex = season.SeasonIndex and local_episodes.LocalImportProcessed = 1 )
any ideas on how to make this more efficient? 5/5/2009 8:40:34 PM |
confusi0n All American 5076 Posts user info edit post |
omg u work 4 hulu!$%@ 5/5/2009 8:48:53 PM |
Ernie All American 45943 Posts user info edit post |
I know little about SQL optimization, but I'm sure you've created indexes on the appropriate columns. 5/5/2009 8:54:23 PM |
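For what it's worth, if you have the sqlite3 shell handy you can see what indexes already exist with PRAGMA index_list('local_episodes'); and PRAGMA index_list('season');. If the lookup columns from the subquery aren't covered, a plain index is the usual first step, something like this (the index name is made up):
CREATE INDEX IF NOT EXISTS ix_local_episodes_series ON local_episodes (SeriesID);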
bous All American 11215 Posts user info edit post |
season has String PRIMARY KEY as ID
that's all i'm seeing for indexes. i didn't create the database
local_episodes has String PRIMARY KEY as EpisodeFilename 5/5/2009 8:58:55 PM |
BigMan157 no u 103354 Posts user info edit post |
i know nothing about sqlite.net and generally suck at queries, but this may work if it supports multiple tables in updates
update season, local_episodes SET season.HasLocalFilesTemp=1 where local_episodes.SeriesID = season.SeriesID and local_episodes.SeasonIndex = season.SeasonIndex and local_episodes.LocalImportProcessed = 1
[Edited on May 5, 2009 at 9:03 PM. Reason : and if that doesn't work i'd try a JOIN, and if that doesn't work i'd give up] 5/5/2009 9:02:33 PM |
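If SQLite won't accept two table names after UPDATE, a SQLite-friendly way to express the same join-based update might be to filter on rowid; a rough sketch against the column names above, untested against this schema:
UPDATE season SET HasLocalFilesTemp=1 WHERE rowid IN ( SELECT season.rowid FROM season JOIN local_episodes ON local_episodes.SeriesID = season.SeriesID AND local_episodes.SeasonIndex = season.SeasonIndex WHERE local_episodes.LocalImportProcessed = 1 )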
bous All American 11215 Posts user info edit post |
didn't like that - it errored at the first comma 5/5/2009 10:01:20 PM |
qntmfred retired 40819 Posts user info edit post |
your primary keys are strings?
reminds me of my last job. we inherited a database where all the dates were strings and what would normally be about 5 normalized tables was shoved into an xml blob attached to the primary table
[Edited on May 5, 2009 at 10:06 PM. Reason : .] 5/5/2009 10:04:52 PM |
Shaggy All American 17820 Posts user info edit post |
drop *; 5/5/2009 10:12:44 PM |
bous All American 11215 Posts user info edit post |
here's the thing... i have full control over changing this if need be.
should i change the current string primary key to a regular indexed column, and create a new int primary key that is auto_increment? 5/5/2009 10:58:53 PM |
Ernie All American 45943 Posts user info edit post |
Quote : | "reminds me of my last job. we inherited a database where all the dates were strings and what would normally be about 5 normalized tables was shoved into an xml blob attached to the primary table" |
Ha
I'm dealing with the exact same thing now. Columns for names and IDs, columns for names + IDs, columns for descriptions, columns for names + IDs + descriptions, and on and on and on.
I want to destroy the guy who made these tables.
[Edited on May 5, 2009 at 11:03 PM. Reason : just dump the string key and set an auto_inc PK, and index the columns you'll be querying heavily] 5/5/2009 11:02:39 PM |
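In SQLite that means rebuilding the table, since ALTER TABLE can't change an existing primary key. A rough sketch, guessing at the types and at which other columns exist beyond the ones mentioned in the thread:
CREATE TABLE season_new ( SeasonKey INTEGER PRIMARY KEY AUTOINCREMENT, ID TEXT, SeriesID TEXT, SeasonIndex INTEGER, HasLocalFilesTemp INTEGER );
INSERT INTO season_new (ID, SeriesID, SeasonIndex, HasLocalFilesTemp) SELECT ID, SeriesID, SeasonIndex, HasLocalFilesTemp FROM season;
DROP TABLE season;
ALTER TABLE season_new RENAME TO season;
CREATE INDEX ix_season_series_season ON season (SeriesID, SeasonIndex);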
Tiberius Suspended 7607 Posts user info edit post |
It looks like at least one of those columns has fairly low cardinality; it's hard to say without knowing more about the contents of the database, but if you're not already you'd probably get better performance combining SeriesID and SeasonIndex in a composite index. 5/6/2009 1:34:18 PM |
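Assuming the column names from the original query, that would be something along the lines of (index name made up):
CREATE INDEX ix_local_episodes_lookup ON local_episodes (SeriesID, SeasonIndex, LocalImportProcessed);
which lets the EXISTS subquery hit an index instead of walking the whole local_episodes table for every season row.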
philihp All American 8349 Posts user info edit post |
It seems to me that if your data has grown to the point where speed matters, you should consider not using SQLite anymore. This may or may not increase the speed, and you may not even be able to do this in SQLite .NET:
UPDATE s SET haslocalfilestemp = 1 FROM season s INNER JOIN local_episodes l ON s.seriesid = l.seriesid AND s.seasonindex = l.seasonindex AND l.localimportprocessed = 1
or
UPDATE ( SELECT haslocalfilestemp FROM season s INNER JOIN local_episodes l ON s.seriesid = l.seriesid AND s.seasonindex = l.seasonindex WHERE l.localimportprocessed = 1 ) SET haslocalfilestemp = 1 5/6/2009 3:27:10 PM |
skokiaan All American 26447 Posts user info edit post |
Quote : | "update season, local_episodes SET season.HasLocalFilesTemp=1 where local_episodes.SeriesID = season.SeriesID and local_episodes.SeasonIndex = season.SeasonIndex and local_episodes.LocalImportProcessed = 1" |
I like this one. all that other shit seems too complicated. do some simple debugging on this to fix the error
Probably only need one table name (season) after the update 5/6/2009 7:26:21 PM |
Noen All American 31346 Posts user info edit post |
You need to move away from SQLite.
Your Primary Keys should be unique GUIDs, and stored as a numeric data type. This alone should massively speed up your query.
Next, do what Tiberius said:
"combining SeriesID and SeasonIndex in a composite index." Though I would abstract this.
Create a new table (since I'm going to guess SQLite doesn't support views) holding LocalImportProcessed, using SeriesID and SeasonIndex as the composite index (I'm guessing SeriesID is a shared identifier between season and local_episodes).
Then, as you process imports (whatever those are), it's easy to insert into this table, and it's an order of magnitude faster to query this composite table to update HasLocalFilesTemp=1.
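A rough sketch of that idea, with the table name made up and the types guessed:
CREATE TABLE season_imports ( SeriesID TEXT, SeasonIndex INTEGER, PRIMARY KEY (SeriesID, SeasonIndex) );
Insert a row into it whenever an import finishes, and the update becomes:
UPDATE season SET HasLocalFilesTemp=1 WHERE EXISTS ( SELECT 1 FROM season_imports WHERE season_imports.SeriesID = season.SeriesID AND season_imports.SeasonIndex = season.SeasonIndex )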
Although honestly, why is this being polled in the first place? Why isn't HasLocalFilesTemp=1 being set at the same time that local_episodes.LocalImportProcessed is set to 1? 5/6/2009 9:17:18 PM |
CaelNCSU All American 7144 Posts user info edit post |
Autogenerate a number that doesn't require a full table scan.
God knows how that algorithm is implemented in SQLite if it's that slow. 5/7/2009 12:12:13 AM |
philihp All American 8349 Posts user info edit post |
^okay that doesn't make any sense at all.
turning your IDs from strings to numbers is a red herring, it's going to save you what... 10-20 bytes per row? you won't see much of a benefit to doing it unless your tables are in the tens of millions of rows. yes, it's a good habit to get into, but this is not the source of much inefficiency in THIS query. not unless the rows in those tables are a few megs each, or you have the slowest disk IO known to man.
what SQLite is probably doing, is a full table scan on the local_episodes table for every row in the season table; essentially running that exists subquery a butt-load of times.
try to do what i said above (the second example is the Oracle syntax), or what skokiaan suggested (which is the MySQL or SQL Server syntax for the same thing) 5/7/2009 12:46:07 AM |
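You can test that theory directly if you can run the statement through the sqlite3 command-line tool, by prefixing it with EXPLAIN QUERY PLAN:
EXPLAIN QUERY PLAN UPDATE season SET HasLocalFilesTemp=1 WHERE EXISTS ( SELECT local_episodes.EpisodeFilename FROM local_episodes WHERE SeriesID = season.SeriesID AND SeasonIndex = season.SeasonIndex AND local_episodes.LocalImportProcessed = 1 );
If the subquery line reports a full scan of local_episodes rather than a search using an index (the exact wording depends on the SQLite version), that's the table-scan-per-row problem.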
confusi0n All American 5076 Posts user info edit post |
the smaller your PK is the easier it is to keep an index in memory
as you alluded... the numbers posted by bous suggest something much more awful going on, which is why i haven't contributed anything useful to this point. you can't expect to ask for help to optimize a query when people have to guess at the schema, data distribution, etc. It's a waste of time.
That and I highly suspect this database is being read from a 5.25" floppy drive.
[Edited on May 7, 2009 at 12:52 AM. Reason : th] 5/7/2009 12:52:06 AM |
Stimwalt All American 15292 Posts user info edit post |
Quote : | "You need to move away from SQLite." |
Quote : | "you should consider not using SQLite anymore" |
Quote : | "you can't expect to ask for help to optimize a query when people have to guess at the schema, data distribution, etc. Its a waste of time. " |
I especially agree with the last quotation. Do you have an ERD (Entity-Relationship Diagram)? Still, benchmark all of the scripts posted and let us know which, if any, perform better. I agree that the best approach is to change to SQL Server. If it's an option in SQLite, take a screenshot and post your execution plan after running your script, along with any of the performance details included (like Physical Operation, Logical Operation, Cached Plan Size, Actual Number of Rows, Estimated I/O Cost). It may be a waste of time if you are expecting a true fix, but at least you'll have a better understanding of where the latency is taking place. Look for index or table scans, Bookmark Lookups, Filters, Sorts.
[Edited on May 7, 2009 at 3:29 PM. Reason : -] 5/7/2009 3:20:42 PM |
Noen All American 31346 Posts user info edit post |
Quote : | "turning your IDs from strings to numbers is a red herring, it's going to save you what... 10-20 bytes per row?" |
It's not about the actual data storage, it's about:
Quote : | "what SQLite is probably doing, is a full table scan on the local_episodes table for every row in the season table; essentially running that exists subquery a butt-load of times." |
I'm 99% sure that's what is happening based on the execution time. To eliminate the problem he needs to be using a lookup table, and to do that reliably he needs GUIDs (which he should have anyway) to ensure data consistency.
I'm basing all my suggestions on the assumption that SQLite doesn't support anything useful on the DB to analyze or truly solve the problem. 5/7/2009 4:06:35 PM |
LimpyNuts All American 16859 Posts user info edit post |
Yeah, SQLite is super gay, but this looks like it comes from the MP-TV Series plugin to MediaPortal.
I know this because I've fucked with it myself too. Rewriting the entire god damned plug-in to achieve some minor functionality improvement is quite possibly the shittiest idea you could suggest. Especially when they release an update and he has to rewrite that too to get the functionality back.
I like the comments:
Quote : | "WARNING: INDEX HAS TO BE INCLUDED FIRST ( I suck at SQL )" |
repeated several times in the source] 5/8/2009 1:36:16 AM |
skokiaan All American 26447 Posts user info edit post |
if it's a plugin, sounds like an embedded db is the correct solution.
in any case, if OP hasn't solved the problem by now, he is either lazy or dumb. Doesn't seem like that big of a deal to figure out. 5/8/2009 2:29:25 AM |