- StormyKnight wrote:
- One thing, Billy, that XBMC does is store a checksum for each folder. I'm not sure how it is created, but it must be faster to check whether a folder has changed by recalculating this sum than by actually walking the folder contents. Just a thought, if the time to check for new movies does take a large part of the overall scraping time. This won't help, of course, if all the user's movies are in one folder.
Another thought I had: when actually downloading the fanart/posters, is much time spent waiting for responses from IMDB? What I'm getting at is, if you were able to send off 5 or so requests for 5 different posters for 5 different movies, could that allow you to have 5 concurrent connections and so hopefully download those 5 posters quicker than 5 sequentially requested posters would be? Web browsers do the same thing when loading a page with multiple components.
Or is there a limit on the number of concurrent connections to IMDB or TVDB?
There is a folder monitor actually built into .NET (the FileSystemWatcher class) which runs in the background and raises an event if, for example, a file is added to a directory. I am not sure how well it works since I have not had to use it before, but I do believe that this is the way to go in the future.
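For comparison, the per-folder checksum idea from the quoted post could look something like the sketch below. This is only an illustration in Python, not how XBMC actually builds its checksum (the quote itself says that is unknown). The function names `folder_fingerprint` and `folder_changed` are made up for this example, and the choice of hashing each entry's name, size and modification time is an assumption.

```python
import hashlib
import os


def folder_fingerprint(path):
    """Hash the name, size and modification time of each direct entry in a folder."""
    digest = hashlib.sha256()
    # Sort by name so the result does not depend on directory listing order.
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        info = entry.stat()
        line = f"{entry.name}|{info.st_size}|{info.st_mtime_ns}\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def folder_changed(path, stored_fingerprint):
    """Return True when the folder no longer matches the fingerprint saved earlier."""
    return folder_fingerprint(path) != stored_fingerprint
```

Note that this still lists the folder once. What it saves is opening files or re-checking each movie against the library when nothing has changed.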
I did at one point experiment with having 4 separate threads scraping movies. There was no noticeable speed increase, which I put down to IMDB only allowing a single connection per IP.
When scraping a new movie, MC does the following:
Search Google for movie on IMDB website (Very Fast and accurate)
Load main IMDB movie page for main details (Very Fast)
Load actors page (Very fast)
Load Trailer Page and get trailer url page (Very fast)
Load trailer URL page (Very fast)
Load all IMDB poster pages (If add thumbnail urls to nfo is enabled) (Can be anything from 1-20 pages) (Very fast)
Load IMDB main plot page (Very fast)
Get fanart from TMDB (Can be very slow)
Get poster from TMDB (Can be very slow)
Get full list of posters from TMDB for nfo urls (If add TMDB thumbnail urls to nfo is enabled) (Can be very slow)
Get full list of posters from IMPD for nfo urls (If add IMPD thumbnail urls to nfo is enabled) (Speed is usually ok)
Get full list of posters from IMPA for nfo urls (If add IMPA thumbnail urls to nfo is enabled) (Speed is usually ok)
Once each page is loaded, MC scrapes the appropriate information from it; the time this takes is negligible.
As you can see the scraper has to perform quite a bit of work to compile a fully populated nfo file.
There could be a small amount of optimisation for the IMDB scraper, for example taking the actors from the main movie page when people have selected to scrape 15 or fewer actors, but the benefit would be hardly noticeable. On my 10 meg connection it takes between 5 and 8 seconds to scrape the main body from IMDB, then anywhere between 15 and 60 seconds to actually download the hi-res fanart and poster.
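If anyone wants to check where the time goes on their own connection, each step can be wrapped in a timer. The following is a rough Python sketch, not MC's code: `timed` and `slowest_first` are names invented for the example, and the `time.sleep` calls are stand-ins for real page loads and downloads.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record how long the enclosed block takes, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def slowest_first(timings):
    """Return (label, seconds) pairs sorted from slowest to fastest."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    timings = {}
    with timed("IMDB main page", timings):
        time.sleep(0.01)  # stand-in for loading and scraping a page
    with timed("TMDB fanart", timings):
        time.sleep(0.05)  # stand-in for a large image download
    for label, seconds in slowest_first(timings):
        print(f"{label}: {seconds:.3f}s")
```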
It would be possible to create additional threads to scrape IMDB, MPDB, TMDB and IMPA separately, but the complexity of this is, in my opinion, not worth the slight benefit.
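For anyone curious what one thread per source might look like, here is a minimal Python sketch using a thread pool. It is not MC's implementation. `scrape_sources` and `make_fake_fetcher` are hypothetical names, and the fetchers only sleep instead of contacting the real sites. If a site really does allow only one connection per IP, as I suspected with IMDB, this would only help across different sites, not for several requests to the same one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_sources(fetchers, title):
    """Run one fetcher per source in its own thread and collect results by source name."""
    # max(1, ...) because ThreadPoolExecutor rejects max_workers=0.
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fetch, title) for name, fetch in fetchers.items()}
        # result() blocks until that source has finished (or re-raises its error).
        return {name: future.result() for name, future in futures.items()}


def make_fake_fetcher(source, delay):
    """Hypothetical stand-in for a real scraper: sleeps instead of using the network."""
    def fetch(title):
        time.sleep(delay)
        return f"{source} data for {title}"
    return fetch


if __name__ == "__main__":
    fetchers = {
        "IMDB": make_fake_fetcher("IMDB", 0.05),
        "MPDB": make_fake_fetcher("MPDB", 0.05),
        "TMDB": make_fake_fetcher("TMDB", 0.10),
        "IMPA": make_fake_fetcher("IMPA", 0.05),
    }
    start = time.perf_counter()
    results = scrape_sources(fetchers, "Example Movie")
    elapsed = time.perf_counter() - start
    for name, data in results.items():
        print(f"{name}: {data}")
    print(f"total: {elapsed:.2f}s")
```

With independent sources the total time should be close to the slowest single source rather than the sum of all of them, which is the most this approach can gain.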