Add single movie file to be scraped

New User Posts : 4 Join date : 2009-07-13

Currently you have to run a Media Search to add new movies to be scraped, and if you have a large library that takes some time. It would be great if there were an 'Add Media' feature so that scraping could start right away instead of waiting minutes for all of the media folders to be scanned.

thx
m

Subject: Re: Add single movie file to be scraped Sat Jul 25, 2009 10:48 am

For me, MC took 7 seconds to scan the one folder holding my 100 movies, and those movies are on a NAS on a 100mbit connection.

Have you tried measuring the time it takes just to check for new files, by running a Media Scan after all of your movies have already been scraped, i.e. running the Media Scan a second time?

If that is quick, then the minutes you are referring to are probably the time taken to actually scrape a new movie and download the posters/fanart.

If not, can you give us an idea of your setup: the number of folders, whether they are on a NAS or local to your PC, etc.?

Cheers

Admin Posts : 1326 Join date : 2008-09-20

Most of my movies are stored on a cheap GIGANAS and I have to admit it does take quite a while to search the system for new media.

I'm not sure what I can do about this. Are you talking about browsing for new media?

I was thinking of adding a drag and drop system at some point in the future for individual files, or even making use of the .NET folder monitor feature, although the latter would require quite a bit of work and will not happen anytime soon.

Subject: Re: Add single movie file to be scraped Sat Jul 25, 2009 3:27 pm

One thing, Billy, that XBMC does is store a checksum for each folder. I'm not sure how it is created, but it must be faster to check whether a folder has changed by recalculating this sum than by actually walking the folder contents. Just a thought, if the time to check for new movies does take up a large part of the overall scraping time. This won't help, of course, if all of the user's movies are in one folder.
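To make the per-folder checksum idea concrete, here is a minimal sketch. It is not XBMC's actual method (the post does not say how that checksum is built); it simply assumes the checksum is a hash over each entry's name, size and modification time. The name `folder_fingerprint` and the choice of SHA-256 are made up for the example.

```python
import hashlib
import os


def folder_fingerprint(path):
    """Return a hex digest summarising the direct contents of `path`.

    Hypothetical helper: hashes each entry's name, size and modification
    time, so adding, removing or changing a file changes the digest.
    """
    digest = hashlib.sha256()
    # Sort entries so the digest does not depend on listing order.
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        info = entry.stat()
        line = f"{entry.name}|{info.st_size}|{info.st_mtime_ns}\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()
```

A scanner could store one digest per folder and only look for new movies in folders whose digest differs from the stored one. Note that this sketch still lists the folder, so any saving would come from skipping the per-file checks, not from avoiding the directory read.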

Another thought I had: when actually downloading the fanart/posters, is much time spent waiting for responses from IMDB? What I'm getting at is that if you were able to send off 5 or so requests for 5 different posters for 5 different movies, could that allow you to have 5 concurrent connections and so hopefully download those 5 posters quicker than 5 sequentially requested posters would be? Web browsers do the same thing when loading a page with multiple components.

Or is there a limit on the number of concurrent connections to IMDB or TVDB?
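For illustration, the "5 requests at once" idea could look roughly like the sketch below. It makes no real network calls: `fetch_poster` is a made-up stand-in that just sleeps, and `fetch_all` and `max_connections` are names invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_poster(movie):
    """Stand-in for a real HTTP download; sleeps to mimic network latency."""
    time.sleep(0.1)
    return f"{movie}.jpg"


def fetch_all(movies, max_connections=5):
    """Fetch posters for all movies using up to `max_connections` threads."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() preserves input order even though the downloads overlap.
        return list(pool.map(fetch_poster, movies))


movies = ["movie1", "movie2", "movie3", "movie4", "movie5"]
posters = fetch_all(movies)
```

Whether this is any quicker in practice depends on exactly the question asked above: if the server only serves one connection per client at a time, the overlapping requests just queue up.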

New User Posts : 4 Join date : 2009-07-13

thx guys;
I am on an 11mb wireless network connecting to a small Mac mini server, which is connected to a Western Digital 1 TB hard disk. Read/write to the hard disk is very good, but when I click Media Search it takes 2-3 secs for each folder and I have 28 folders (over 600 movies in total), so it does take quite a bit of time to go through them all. The actual scraping doesn't take long at all; I'm on a fast connection to the internet.
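Putting the figures above together as a quick back-of-the-envelope check (the variable names are just for the example):

```python
folders = 28
seconds_per_folder = (2, 3)  # low and high estimates from the post

low, high = (folders * s for s in seconds_per_folder)
print(f"Media Search scan time: {low}-{high} seconds")
```

So each Media Search spends roughly one to one and a half minutes on folder scanning alone before any scraping starts.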

overall no complaints... I love the tool.

thx billy

Admin Posts : 1326 Join date : 2008-09-20

StormyKnight wrote: One thing, Billy, that XBMC does is store a checksum for each folder. I'm not sure how it is created, but it must be faster to check whether a folder has changed by recalculating this sum than by actually walking the folder contents. Just a thought, if the time to check for new movies does take up a large part of the overall scraping time. This won't help, of course, if all of the user's movies are in one folder.

Another thought I had: when actually downloading the fanart/posters, is much time spent waiting for responses from IMDB? What I'm getting at is that if you were able to send off 5 or so requests for 5 different posters for 5 different movies, could that allow you to have 5 concurrent connections and so hopefully download those 5 posters quicker than 5 sequentially requested posters would be? Web browsers do the same thing when loading a page with multiple components.

Or is there a limit on the number of concurrent connections to IMDB or TVDB?

There is a folder monitor function built into .NET which runs in the background and raises an event if, for example, a file is added to a directory. I am not sure how well it works since I have not had to use it before, but I do believe this is the way to go in the future.
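The .NET feature described here is presumably `System.IO.FileSystemWatcher` (an assumption; the post does not name the class). Python's standard library has no direct event-based equivalent, so the sketch below only illustrates the end result, picking out newly added files, by comparing two directory snapshots. The helper names and the list of extensions are invented for the example.

```python
import os


def list_media(folder, extensions=(".avi", ".mkv", ".mp4")):
    """Return the set of media file paths directly inside `folder`."""
    return {
        entry.path
        for entry in os.scandir(folder)
        if entry.is_file() and entry.name.lower().endswith(extensions)
    }


def new_files(previous, folder):
    """Return (added, current), where `added` holds files not seen before."""
    current = list_media(folder)
    return current - previous, current
```

A real folder monitor would be told about the new file by the operating system instead of polling, but either way only the added files need to be handed to the scraper.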

I did at one point experiment with having 4 separate threads scraping movies. There was no noticeable speed increase, and I put it down to IMDB only allowing a single connection per IP.
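The effect described can be reproduced with a small simulation. This is only a model: it assumes the server really does serve one request per client at a time (the post's own guess, not a confirmed IMDB limit), and it stands a lock and a short sleep in for the server and the request. All names are invented for the example.

```python
import threading
import time

REQUEST_SECONDS = 0.05
server_slot = threading.Lock()  # models "one connection per IP"


def scrape(movie):
    """Pretend to scrape one movie; only one request may run at a time."""
    with server_slot:
        time.sleep(REQUEST_SECONDS)


def timed_run(movies, thread_count):
    """Scrape all movies with `thread_count` workers; return elapsed seconds."""
    queue = list(movies)
    queue_lock = threading.Lock()

    def worker():
        while True:
            with queue_lock:
                if not queue:
                    return
                movie = queue.pop()
            scrape(movie)

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


movies = [f"movie{i}" for i in range(8)]
one_thread = timed_run(movies, 1)
four_threads = timed_run(movies, 4)
print(f"1 thread: {one_thread:.2f}s, 4 threads: {four_threads:.2f}s")
```

Because every request has to hold the same lock, the four-thread run cannot finish any sooner than the single-thread run; the extra threads just spend their time waiting.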

When scraping a new movie MC does the following:
Search Google for movie on IMDB website (Very fast and accurate)
Load main IMDB movie page for main details (Very fast)
Load actors page (Very fast)
Load trailer page and get trailer URL page (Very fast)
Load trailer URL page (Very fast)
Load all IMDB poster pages (If add thumbnail urls to nfo is enabled) (Can be anything from 1-20 pages) (Very fast)
Load IMDB main plot page (Very fast)
Get fanart from TMDB (Can be very slow)
Get poster from TMDB (Can be very slow)
Get full list of posters from TMDB for nfo urls (If add TMDB thumbnail urls to nfo is enabled) (Can be very slow)
Get full list of posters from MPDB for nfo urls (If add MPDB thumbnail urls to nfo is enabled) (Speed is usually ok)
Get full list of posters from IMPA for nfo urls (If add IMPA thumbnail urls to nfo is enabled) (Speed is usually ok)
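The steps above can be sketched as a simple ordered plan, with the optional steps switched on by settings. This is a hypothetical model for illustration, not MC's actual code: the function and parameter names are invented, and the third poster source is assumed to be the MPDB site named later in the thread.

```python
def scrape_steps(imdb_thumb_urls=False, tmdb_thumb_urls=False,
                 mpdb_thumb_urls=False, impa_thumb_urls=False):
    """Return the ordered (site, page) loads for one movie (hypothetical)."""
    steps = [
        ("Google", "search for the movie on IMDB"),
        ("IMDB", "main movie page"),
        ("IMDB", "actors page"),
        ("IMDB", "trailer page"),
        ("IMDB", "trailer URL page"),
    ]
    # The optional steps only run when the matching nfo option is enabled.
    if imdb_thumb_urls:
        steps.append(("IMDB", "poster pages"))
    steps.append(("IMDB", "main plot page"))
    steps.append(("TMDB", "fanart"))
    steps.append(("TMDB", "poster"))
    if tmdb_thumb_urls:
        steps.append(("TMDB", "full poster list"))
    if mpdb_thumb_urls:
        steps.append(("MPDB", "full poster list"))
    if impa_thumb_urls:
        steps.append(("IMPA", "full poster list"))
    return steps
```

With every option off a movie still needs 8 page loads, and with every option on it needs 12, which is why a single scrape involves several different sites.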

Once each page is loaded, MC scrapes the appropriate information from it; the time this takes is negligible.

As you can see the scraper has to perform quite a bit of work to compile a fully populated nfo file.

There could be a small amount of optimisation for the IMDB scraper, for example taking the actors from the main movie page when people have selected to scrape 15 or fewer actors, but the benefit would be hardly noticeable. On my 10meg connection it takes between 5 and 8 seconds to scrape the main body from IMDB, then anywhere between 15 and 60 seconds to actually download the hi-res fanart and poster.
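A quick calculation with those figures shows where the time goes (the function name is just for the example):

```python
def breakdown(imdb_seconds, artwork_seconds):
    """Return (total seconds, percentage of the total spent on artwork)."""
    total = imdb_seconds + artwork_seconds
    return total, round(100 * artwork_seconds / total)


best = breakdown(5, 15)   # fastest IMDB scrape, fastest artwork download
worst = breakdown(8, 60)  # slowest IMDB scrape, slowest artwork download
print(f"best case: {best[0]}s per movie, {best[1]}% on artwork")
print(f"worst case: {worst[0]}s per movie, {worst[1]}% on artwork")
```

On these numbers a movie takes 20 to 68 seconds in total, and the fanart and poster download accounts for roughly 75% to 88% of it, so trimming the IMDB part has little effect.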

It would be possible to create additional threads to scrape IMDB, MPDB, TMDB and IMPA separately, but the complexity of this is, in my opinion, not worth the slight benefits.

Admin Posts : 1326 Join date : 2008-09-20

Just for information, I have just finished a drag and drop feature. Supported media items can be dropped onto the list individually or in groups, and while these are being scraped additional media can be dropped.

There are a few issues that I am not quite sure how to deal with, such as items from folders that have not been added to MC. For the moment these will have nfo, tbn and fanart files created but will not be added to the main list, since it could cause confusion when they are not there on restart.

This runs on its own thread, so it can be run alongside the normal media scraper, although as proof of my argument above it is no faster than scraping on a single thread; they just slow each other down waiting for IMDB and TMDB.

Subject: Re: Add single movie file to be scraped Sun Jul 26, 2009 3:31 am

Thanks Billy, I appreciate the explanation and I now understand the difficulties involved a little better.

Senior Member Posts : 223 Join date : 2008-12-08

billyad2000 wrote: Just for information, I have just finished a drag and drop feature. Supported media items can be dropped onto the list individually or in groups, and while these are being scraped additional media can be dropped.

There are a few issues that I am not quite sure how to deal with, such as items from folders that have not been added to MC. For the moment these will have nfo, tbn and fanart files created but will not be added to the main list, since it could cause confusion when they are not there on restart.

This runs on its own thread, so it can be run alongside the normal media scraper, although as proof of my argument above it is no faster than scraping on a single thread; they just slow each other down waiting for IMDB and TMDB.

Perhaps threads could be kicked off for getting posters, trailers etc. so that at least this part can be done in the background.

Admin Posts : 1326 Join date : 2008-09-20

dbareis wrote:

Perhaps threads could be kicked off for getting posters, trailers etc. so that at least this part can be done in the background.

Maybe eventually, but it would only lead to maybe a 10% improvement in scraping speed, which does not really add up to much unless you are scraping 1000s of movies.

If people really want to see an enormous improvement, then all I can suggest is making a donation to TMDB. Maybe if they were better funded they would be able to afford better equipment, hosting etc.
