Need help with this phpBB site

Posted: Mon Feb 25, 2008 9:07 am
by 1-d
Hello to the great KoalaBear and the other posters,

I am struggling with setting up PicaLoader to make it work with this phpBB-based website. The starting address is

http://www.candidsplash.com/viewforum.php?f=2

Let me first say that the site may not be suitable for minors, although there is no nudity or hardcore stuff.

The forum lists 50 threads per page (the URL of each thread always includes the 'viewtopic.php?t=' string) and there are currently 3 pages within the section. When you click on a thread, that's where things get a little complicated. Depending on the poster, sometimes full-size pics are embedded directly in the thread, and other times thumbnail pics are linked to full-size pics on an external site (such as imageshack).

I do have a user account and am already using it for manual login. But no matter what I do, I can't get PicaLoader to connect even to the thread/topic pages. Whenever I run the task, the start page gets read but that's it. No pictures are retrieved whatsoever.

KB, you were awesome helping me out earlier. Could you use your magic wand one more time? :wink:

Posted: Mon Feb 25, 2008 11:25 am
by KoalaBear
Start URL:http://www.candidsplash.com/viewforum.php?f=2
Picture Filter Profile:All pictures except thumbnail
Check Site requires a password
Check Manual login URL
Manual Login URL:http://www.candidsplash.com/login.php
Page URL Include Filters:&topicdays=0&start=\d+$;viewtopic\.php\?t=\d+$;\.jpg$
Picture URL Include Filters:\.jpg
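
To see which pages those Page URL Include Filters would let through, here is a rough Python sketch. It assumes the filter string is a semicolon-separated list of regular expressions and that a URL is included when any one of them matches somewhere in it; that is an assumption about PicaLoader's behaviour, not something confirmed in this thread. The function name `url_included` and the sample URLs are made up for illustration.

```python
import re

# Filter string copied from the settings above. Splitting on ';' and treating
# a URL as included when ANY pattern matches is an assumption, not confirmed
# PicaLoader behaviour.
PAGE_FILTERS = r"&topicdays=0&start=\d+$;viewtopic\.php\?t=\d+$;\.jpg$"


def url_included(url, filter_string):
    """Return True if any ';'-separated regex in filter_string matches url."""
    patterns = filter_string.split(";")
    return any(re.search(p, url) for p in patterns)


base = "http://www.candidsplash.com/"

# Hypothetical URLs of the kinds a phpBB forum typically produces.
samples = [
    base + "viewforum.php?f=2&topicdays=0&start=50",  # next forum index page
    base + "viewtopic.php?t=123",                     # a thread
    base + "viewtopic.php?t=123&view=next",           # thread URL with extra parameter
    base + "profile.php?mode=viewprofile&u=7",        # unrelated page
]

results = {url: url_included(url, PAGE_FILTERS) for url in samples}

for url, ok in results.items():
    print("follow" if ok else "skip  ", url)
```

Under these assumptions the first two URLs are followed and the last two are skipped, because the trailing `$` rejects thread URLs that carry extra parameters after the topic number.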

Posted: Mon Feb 25, 2008 4:44 pm
by 1-d
Worked like a charm... again! :D You are simply amazing, master KB. How do you know so much about these things?

I am also trying to understand the filters you used here, just out of curiosity. Who knows, if I could grasp the underlying principles I wouldn't need to bother you every time I come across similarly structured sites. The HTML parser script is way out of my league, so I am not even going to try. :)

I think I understand most of the filter entries, except for the \d+$ part. Could you explain to me how it works? I also wonder if there are any common rules regarding page/picture URL filters when dealing with this type of board-based site.

Again, much obliged.

Posted: Tue Feb 26, 2008 8:20 am
by KoalaBear
\d means one digit character
\d+ means one or more digit characters
$ means match the end of the URL

so \d+$ means the URL ends with one or more digit characters
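
The same rule can be tried out in Python, whose `re` module reads \d, + and $ the same way for this pattern. This is only a sketch for experimenting, not PicaLoader's own code; the helper name `ends_with_digits` and the sample URLs are made up.

```python
import re

# \d+ : one or more digit characters
# $   : end of the string (here, the end of the URL)
pattern = re.compile(r"\d+$")


def ends_with_digits(url):
    """Return True if the URL ends with one or more digits."""
    return pattern.search(url) is not None


# Hypothetical URLs for illustration.
print(ends_with_digits("viewtopic.php?t=4711"))            # digits at the very end
print(ends_with_digits("viewtopic.php?t=4711&view=next"))  # digits, but not at the end
print(ends_with_digits("viewforum.php?f=2"))               # a single digit is enough

# The matched text is the whole trailing run of digits.
match = pattern.search("viewtopic.php?t=4711")
print(match.group())
```

The second URL fails because something follows the digits, which is why the filter picks up plain thread links but not variants with extra parameters.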

Posted: Thu Feb 28, 2008 4:51 am
by 1-d
Thanks, KoalaBear. I actually looked at the help file and read through the regular expressions section, so I kinda guessed it, but you made it all clear. :)