scans/ rescans

Got an idea? Missing something? Post your feature request here.
Lars
Contributor
Posts: 36
Joined: 01 Jul 2014, 23:06
Has thanked: 23 times
Been thanked: 23 times

Post by Lars

a few remarks on my personal experience lately. i use the binhex-madsonic docker on an unraid server; before that i used botez-madsonic on the same server, and even before that a subsonic docker on the same machine.
i have a largish music-only collection of about 12tb / 100k albums. under subsonic a scan used to take about 3hrs. since i changed to madsonic i am looking at 30+hrs with the botez docker and around 15hrs with the binhex docker, which is quite a drastic change for the worse.
i have to say the improvements in gui responsiveness are quite impressive with the last 5.1 final releases, so my thanks for that and the many other little improvements. and my big thanks for madsonic in general - it makes subsonic so much more useful!!!!

anyway, doing some research and thinking about the problem, i have some questions / suggestions. since my programming skills and my knowledge of how database engines work are rudimentary at best, please correct me where i am wrong, or consider it for future releases.
- first, i noticed a huge discrepancy between reads and writes during a scan (there are about 7 writes for every read). i assume this is due to the extended functions of madsonic vs. subsonic!? since much of it should still be based on the same information in tags etc., is there much room to optimize the db tables?
- ms and ss are built with hypersql, which is based on java. can that be a limiting factor in performance? java, to my knowledge, is not really known for speed, and when used by multiple programs (at least under linux) it often enough causes crashes (which i haven't had yet) and minor problems.
- during my searches for solutions i read about increasing the memory limit. i raised it to about 1500mb - no real noticeable improvement there. i also read a post somewhere on the ss forums by somebody who rewrote it for himself from hypersql to postgresql, claiming a major performance increase.
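
On the write-heavy scan mentioned in the first point: one general way a scanner can cut write overhead is to group inserts into a few transactions instead of committing after every track. The sketch below only illustrates that idea. It uses Python's built-in sqlite3 as a stand-in (not HyperSQL), and the `media_file` table and its columns are hypothetical, not Madsonic's actual schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical media table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media_file (path TEXT PRIMARY KEY, artist TEXT, album TEXT)"
    )
    return conn


def insert_one_by_one(conn, rows):
    """Commit after every row: one transaction per track."""
    for row in rows:
        conn.execute("INSERT INTO media_file VALUES (?, ?, ?)", row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Write rows in chunks: one transaction per chunk of batch_size rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back on an exception
            conn.executemany("INSERT INTO media_file VALUES (?, ?, ?)", chunk)


# Made-up sample data standing in for scanned tracks.
rows = [
    (f"/music/artist{i % 50}/album{i % 500}/track{i}.mp3",
     f"artist{i % 50}",
     f"album{i % 500}")
    for i in range(5000)
]
```

Both functions end up storing the same rows; the difference is only how many commits the database has to perform. Whether this would help in Madsonic's case depends on how its scanner actually writes, which this thread does not show.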

now to the actual request / question - would a db change from hypersql to postgresql be likely to accomplish that? if so, why not switch for better performance? how complicated would it be? is it something that could be considered for future releases? has anybody attempted that for madsonic and has some information on how performance compares?

i also think something must be fundamentally wrong when a simple rescan (just adding the new stuff) takes as long as a full scan of your media (it does - i tried it). a simple rescan should take substantially less time. a good example of that is kplaylist (which i used for many years before switching to ss a couple of years back, and recently to ms, due to the stagnation in kplaylist development). there a full scan used to take about 2hrs, a rescan as little as 5min!!!
to me as a layman that seems like a much better use of the db.

looking forward to answers, explanations, ideas, or commitments from ppl with the ability to do it, to find out :)

cheers, L
GJ51
Contributor
Posts: 192
Joined: 15 Dec 2012, 17:52
Has thanked: 42 times
Been thanked: 83 times

Re: scans/ rescans

Post by GJ51

There was a fork that used Postgres db a little over a year ago, but the developer seems to have abandoned the project. Although I prefer the Madsonic fork, some users are pretty happy with the fork called "Music Cabinet." Many claimed the db was faster and more reliable than the db used in Subsonic.

I've been a pretty loyal fan of using Madsonic on Windows server running Apache Tomcat. It's been stable and fast enough for me but I don't have anywhere near the size library you have.

Anyway, John Foliot's site runs Music Cabinet and is open to guests if you want a look.

http://108.0.157.240/login.view

I don't know what his hardware is, but it appears to be a Windows platform with a library close to 530 GB.

I guess that's small by comparison, but still fairly large for avg. users.

There are also many factors affecting performance that you might experiment with. It might be faster if you sort by folder names rather than tags, or vice versa. There are so many settings in MS that it's hard to tell what affects performance and how - all of which could be magnified grossly on a 12 TB library.

The good news is that no matter how fast you sort it, it's still gonna take you almost 14 years to listen to each track once, unless you sleep 8 hours a day, which would make it closer to 20 years. :ugeek:
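
For anyone who wants to check the arithmetic behind that estimate, here is a rough sketch. The thread gives only the library size (12 TB); the average bitrate of 220 kbit/s is an assumption picked for illustration, and TB is taken as 10^12 bytes.

```python
def listening_years(library_bytes, bitrate_kbps, hours_per_day=24.0):
    """Years needed to play every byte once at a constant average bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    seconds_of_audio = library_bytes / bytes_per_second
    days = seconds_of_audio / (hours_per_day * 3600)
    return days / 365.25


LIBRARY_BYTES = 12 * 10**12  # 12 TB, decimal terabytes
ASSUMED_KBPS = 220           # assumed average bitrate, not stated in the thread

nonstop = listening_years(LIBRARY_BYTES, ASSUMED_KBPS)
awake_only = listening_years(LIBRARY_BYTES, ASSUMED_KBPS, hours_per_day=16)
print(f"nonstop: {nonstop:.1f} years, 16 h/day: {awake_only:.1f} years")
```

Under those assumptions this comes out at a little under 14 years of nonstop playback and a little under 21 years at 16 hours a day, in line with the figures above. A library with a lot of lossless files would have a higher average bitrate and therefore fewer listening hours.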

Post by Lars

hi GJ51,

thx for the reply first of all.

i looked into music cabinet a long time ago and it wasn't bad. i just never really liked it :) . also, from reading around about the db speed problem i came across some posts claiming that mc development has pretty much come to a standstill. definitely not a software option i would consider; that was the reason i moved on from kplaylist at the time, which i liked very much. it didn't have the 'nice' gui, but it had a lot of the features we later saw again in the madevil fork of subsonic. anyway, i like the madsonic fork and the way it is going a lot. so basically i have committed myself to it for the foreseeable future: it is actively developed (most important - usually for the better), the responses to questions here are lightning fast compared to the subsonic community, and it basically covers everything i need. sometimes i have some issue, and the next day when i want to post about it i have kind of forgotten what it was (basically saying - it wasn't important, otherwise i would remember what put me off ;) )
i am really just trying to figure out the media scan time issue.
i have no issues with the stability of madsonic at all. it just runs in my environment, with unraid running madsonic as a docker app. as mentioned before, i used a 5 beta first (even that was fine) and am now running the latest stable 5.1 release, which solved many of my gui responsiveness problems. all of that is well within acceptable time frames now (in the next stable version i envision changes happening the moment i think of them.... just kidding)
the media scan is my big issue. i just ran a normal rescan after adding a lot of stuff; it took just under 11hrs. i usually scan by folder names instead of tags, because everything is named exactly the way i would like it to show (plus i would think it is faster - just gut instinct, no hard data there really).
i am also wondering (even though i doubt it, based on the hardware usage numbers i get) whether there are huge differences between docker use vs. a win install vs. a linux install - so i hope some other ppl chime in here as well.
since i usually do a once-a-week update (rescan) i can somehow live with the 11hr time frame. i just think there must be a general underlying issue causing it, even more so given that a 'normal' rescan takes basically as much time as a full scan. that is something i have established a couple of times now by clearing the db completely, doing a new scan from zero, and running a 'rescan' or update the next day (with no or very few new items added)! the two times are within minutes of each other. now, that might be a hypersql vs. 'xyz' database engine issue, or just a matter of how functions are applied within hypersql (i am - as stated before - the wrong person to come to final conclusions on that). but i think it would be great if somebody here with a good understanding of this kind of stuff would wrap his head around it and check it out!!!

and as for your last remark, "The good news is that no matter how fast you sort it, it's still gonna take you almost 14 years to listen to each track once, unless you sleep 8 hours a day, which would make it closer to 20 years. :ugeek:" - yep, you are right :D - but if you wanna listen to this new album you just pushed up..... you get the idea :roll:

cheers, L

Post by GJ51

Thanks for the feedback. You have an interesting problem. You might check to see if you have "Fast access mode" checked on the Settings/Media Folders screen. If it's checked then supposedly you won't see new files until after the next scan, which, of course, implies that if it's unchecked you should be able to see new files as soon as they're added (???) I think.

Ultimately, the biggest impact on performance is probably hardware power (CPU crunch power with high bandwidth, and fast disk arrays), but it does seem strange that Madsonic would take 4X longer than Subsonic.

Another option is multiple sites, perhaps one trimmed down and dedicated to newer additions, but that involves more hardware or a VM and new or duplicated libraries.

I see Marty has read the thread so maybe he'll chime in regarding the difference in the scan speed at some point.

One other thought - if your storage is local on the host machine rather than in network storage, you would get better performance, unless you're on something like InfiniBand. I am at the point where managing our network and moving storage is really hampered by network throughput. We're now upwards of 50TB total storage capacity, and moving the big pieces takes a lot of time. So in your case, with a 12TB library, you'll probably get the best performance if the Madsonic installation is on the storage LM. We have our media on a 10TB RAID array and access it from multiple VMs - but my total music library is only about 8k albums, so performance is not an issue. A 12 TB library is in a totally different category, with a lot of things to sort out to get good response and performance.

Keep us posted if you get an idea or setup that improves the situation.

Post by Lars

GJ51 wrote: Thanks for the feedback. You have an interesting problem. You might check to see if you have "Fast access mode" checked on the Settings/Media Folders screen. If it's checked then supposedly you won't see new files until after the next scan, which, of course, implies that if it's unchecked you should be able to see new files as soon as they're added (???) I think.
no, it isn't checked. and yes, you are right - i can find it, knowing what i am looking for - 14 other ppl can't, because they don't know it was added ;)
GJ51 wrote: Ultimately, the biggest impact on performance is probably hardware power (CPU crunch power with high bandwidth, and fast disk arrays), but it does seem strange that Madsonic would take 4X longer than Subsonic.
true, but this server runs with just one objective: music storage under unraid6, with madsonic as a docker implementation the only addition. i think the athlon II x4 with 8gb ram should be plenty of hardware. the disk arrays run partly off the gigabyte motherboard's sata3 ports and partly off an additional lsi sata3 pcie x8 card, so basically all of it is sata3 wd-green 3tb drives. the only exception is the cache drive (which also holds the docker and madsonic files), a seagate raptor. looking at system usage numbers, everything on the hardware end is way below 20% of capacity even during scans - that should not really be a limiting factor, i think.
and as i mentioned before, my hardware did not change between the different installs of ss and ms.
GJ51 wrote: Another option is multiple sites, perhaps one trimmed down and dedicated to newer additions, but that involves more hardware or a VM and new or duplicated libraries.
as you just said, it involves more hardware first of all, plus inconvenience and possible confusion for my other users :) - i like to stick with the KISS approach (Keep It Simple, Stupid)
GJ51 wrote: I see Marty has read the thread so maybe he'll chime in regarding the difference in the scan speed at some point.
i noticed too, maybe he has an idea. on the other hand i would rather have him concentrate on the next awesome releases, and hope some db-engine guru here will take a shot at the issue. i am still convinced that it is a db engine issue. maybe a different db engine would improve things, maybe some major db optimization is all it needs.
GJ51 wrote: One other thought - if your storage is local on the host machine rather than in network storage, you would get better performance, unless you're on something like InfiniBand. I am at the point where managing our network and moving storage is really hampered by network throughput. We're now upwards of 50TB total storage capacity, and moving the big pieces takes a lot of time. So in your case, with a 12TB library, you'll probably get the best performance if the Madsonic installation is on the storage LM. We have our media on a 10TB RAID array and access it from multiple VMs - but my total music library is only about 8k albums, so performance is not an issue. A 12 TB library is in a totally different category, with a lot of things to sort out to get good response and performance.
it is all physically on the same machine! no nas etc. involved. so basically the only network connection involved is to my client machine, where i look at the gui, which has no influence whatsoever on the actual madsonic performance during scans etc. that is all the same box, same motherboard - can't get closer together. of course, should i win the lotto one day i might upgrade all the storage to ssds and the rest of the hardware to pro-server quality. until that happens it is what it is.
GJ51 wrote:Keep us posted if you get an idea or setup that improves the situation.
i sure will. and thanx for your continuing interest, ideas and support. it is highly appreciated!

cheers, L


ps: i also kinda hope that some other ppl with large media collections (music or video) chime in here at some point with their experiences re. scan times etc. - i keep wondering if it might be a docker / unraid issue by any chance. but i am also following the limetech forum closely on that and do not really see an indication of it so far.

Post by Lars

just an idea here - thinking about the problem.
since it seems to scan everything all the time, is it possible to change / add a command for the db to only scan new stuff??? along the lines of: ignore everything up to the date of the last scan, just scan new stuff!?
basically it should stop the db engine from even looking into files from former scans and just add (read) files added since then. everything else should be covered by older scans anyway.
as for any older stuff - i do not really care.... it might become an issue for a full scan.....

cheers. L
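
The "only scan new stuff" idea above can be sketched roughly as follows. This is a minimal illustration, not how Madsonic or Subsonic actually scan: it assumes the time of the last scan is stored somewhere as epoch seconds, and the function name and extension list are made up for the example.

```python
import os


def find_new_files(root, last_scan_time, extensions=(".mp3", ".flac", ".ogg")):
    """Return audio files under root whose mtime is later than last_scan_time.

    last_scan_time is in seconds since the epoch. Only these files would
    need their tags read and their database rows written.
    """
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip covers, playlists, etc.
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > last_scan_time:
                new_files.append(path)
    return sorted(new_files)
```

Two caveats with this approach: every file still gets a `stat` call (cheap compared with parsing tags, but not free on 100k albums), and a file copied in with its old modification time preserved would be missed until the next full scan.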
Madsonic
Administrator
Posts: 984
Joined: 07 Dec 2012, 03:58
Answers: 7
Has thanked: 1201 times
Been thanked: 470 times

Post by Madsonic

Thank you Lars for this discussion!
Lars wrote: - first, i noticed a huge discrepancy between reads and writes during a scan (there are about 7 writes for every read). i assume this is due to the extended functions of madsonic vs. subsonic!? since much of it should still be based on the same information in tags etc., is there much room to optimize the db tables?
Yes, there is much room for improvement. The next version will do this much better, via a directory-based scraper scan function. This will also optimize the scan time.
Lars wrote: - ms and ss are built with hypersql, which is based on java. can that be a limiting factor in performance? java, to my knowledge, is not really known for speed, and when used by multiple programs (at least under linux) it often enough causes crashes (which i haven't had yet) and minor problems.
Up to version 5.1, Madsonic was greatly expanded on the basis of the Subsonic DB layout. With the next version, many functions will be implemented from scratch, so a new DB layout will be implemented on the basis of the new HSQLDB 2.3 version for more performance.
Lars wrote: - during my searches for solutions i read about increasing the memory limit. i raised it to about 1500mb - no real noticeable improvement there. i also read a post somewhere on the ss forums by somebody who rewrote it for himself from hypersql to postgresql, claiming a major performance increase.
now to the actual request / question - would a db change from hypersql to postgresql be likely to accomplish that? if so, why not switch for better performance? how complicated would it be? is it something that could be considered for future releases? has anybody attempted that for madsonic and has some information on how performance compares?
Deployment with HSQLDB is simply unbeatable. Distribution with Postgres across multiple platforms is quite complicated,
and it does not bring a performance boost compared to development optimized for HSQLDB.

Best regards,
Madevil
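
The post above does not describe how the planned directory-based scan works. One common technique that goes by that description is to remember each directory's modification time from the previous scan and only revisit directories whose time has changed. The sketch below illustrates that general idea only; it is an assumption, not Madsonic's implementation, and the function name and snapshot format are made up.

```python
import os


def changed_directories(root, previous_mtimes):
    """Compare directory mtimes against a snapshot from the previous scan.

    previous_mtimes maps directory path -> st_mtime_ns from the last run
    (an empty dict on the first run). Returns (changed_dirs, new_snapshot);
    only files inside changed_dirs would then need to be re-read.
    """
    current = {}
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime_ns
        current[dirpath] = mtime
        if previous_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
    return sorted(changed), current
```

On typical filesystems a directory's modification time changes when entries are added, removed, or renamed directly inside it, so this catches newly added albums and tracks. It does not catch a file whose tags were edited in place, and the walk itself still visits every directory; the saving comes from skipping the per-file tag reads and database writes in unchanged directories.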

Post by Lars

thx for the input mate! :)
very much appreciated!
the mentioned changes for the next version sound good, i guess we will have to wait and see how they will improve the performance.

cheers, L