KrawlSite

Description:

KrawlSite is a web crawler/spider/offline browser/download manager application. It is a KPart component with its own shell, so it can run standalone in its shell or be embedded into KPart-aware applications like Konqueror.
To integrate it with Konqueror, open the File Associations page in the configuration dialog, select the text/html MIME type, and in the embedded viewers list choose KrawlSite_Part. Now when you right-click on a web page in Konqueror, you'll see KrawlSite in the "Preview in" menu. Selecting it embeds the component into Konqueror as in the second screenshot. The first screenshot shows the shell in which the component runs, and the third shows the configuration dialog.
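
For the curious, this is standard KDE 3 KPart loading: a host application asks KLibLoader for the part's factory and creates the part through it. A minimal sketch, not taken from KrawlSite's sources (the library name libkrawlsitepart appears in the build log in the comments below; that the part derives from KParts::ReadOnlyPart is an assumption):

#include <qwidget.h>
#include <klibloader.h>
#include <kparts/part.h>

// Load the KrawlSite part into a host widget, the same way a KPart-aware
// application like Konqueror embeds it. Returns 0 if the library or its
// factory cannot be found.
// Assumption: the part derives from KParts::ReadOnlyPart.
KParts::ReadOnlyPart* loadKrawlSitePart(QWidget* parentWidget)
{
    KLibFactory* factory = KLibLoader::self()->factory("libkrawlsitepart");
    if (!factory)
        return 0;
    QObject* obj = factory->create(parentWidget, "krawlsite_part",
                                   "KParts::ReadOnlyPart");
    return static_cast<KParts::ReadOnlyPart*>(obj);
}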

If you like it, please rate it as good. :)

Feel free to send in your bug reports and comments; I'll look into them when I have some spare time.

Also, I am lousy at creating icons, so if someone out there likes this application (a lot), please make an icon for this app. I'll include your name in the credits. :)

TIP
To use this app to download tutorials, turn offline mode on and start crawling from the tutorial's first page. If the start page is the TOC, set the crawl depth to 1; if the start page has the TOC along with the first chapter, set the crawl depth to 0. If each chapter page only has next and previous links, set the crawl depth to the number of chapters.
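
To make those depth numbers concrete, here is a minimal sketch of depth-limited crawling in plain C++. It is illustrative only, not KrawlSite's code; fetchPage and extractLinks are hypothetical stand-ins:

#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a real crawler would issue an HTTP GET here
// and parse the returned HTML for links.
std::string fetchPage(const std::string& url) { return std::string(); }
std::vector<std::string> extractLinks(const std::string& html)
{
    return std::vector<std::string>();
}

// Depth-limited breadth-first crawl. Depth 0 fetches only the start page
// (say, a TOC plus the first chapter on one page); depth 1 also fetches
// every page the start page links to (each chapter behind a TOC); depth N
// follows chains of next/previous links N hops deep.
void crawl(const std::string& startUrl, int maxDepth)
{
    std::queue<std::pair<std::string, int> > frontier;
    std::set<std::string> visited;
    frontier.push(std::make_pair(startUrl, 0));
    visited.insert(startUrl);
    while (!frontier.empty()) {
        std::string url = frontier.front().first;
        int depth = frontier.front().second;
        frontier.pop();
        std::string html = fetchPage(url); // saved to disk in offline mode
        if (depth == maxDepth)
            continue;                      // at the limit: don't follow links
        std::vector<std::string> links = extractLinks(html);
        for (size_t i = 0; i < links.size(); ++i)
            if (visited.insert(links[i]).second)
                frontier.push(std::make_pair(links[i], depth + 1));
    }
}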

I'd like to put all this information in the handbook, but for lack of time I haven't been able to do so. If you understand the functionality and are willing to write the handbook, please contact me.

If someone builds an RPM for this, please contact me so that I can link to your RPM from this page. Many thanks!
Last changelog:

15 years ago

ver 0.7
Finally!
* crash-free (AFAIK!), especially after KDE 3.4 came around
* support for HTML frames
* better UI

patch to v 0.6
* fixes a bug that crashed the app
* fixes a bug in multiple-job mode

ver 0.6
This one took a long time to come out, but it removes almost all of the bugs that caused the app to crash intermittently, apparently without any reason! There's one KNOWN BUG:
* If icon thumbnail previews are generated in real time as files are created/deleted, the app crashes. This has something to do with the internal implementation of the file browser (a KDE component), so to remove this bug I'll have to write my own component (a lot of work), or I am doing something wrong with it (I will look into it). Thumbnail previews are disabled by default (but can be enabled from the context menu).
changes:
* almost crash-proof :) (see above)
* new file browser, much cleaner to use
* more work on the leech mode, so it's easier to use as a download manager
If you use this app with some regularity, I strongly suggest upgrading from 0.5.1, not for any major new features but for a much easier, crash-free experience. :)
Last of all, thanks for bearing with the crashes. I know it must have been exasperating.

ver 0.5.1
* corrected a bug in leech mode

ver 0.5
Some more features:
* leech mode finally functional. In leech mode, the app simply parses the HTML file and presents the links and images as checkable items; select the files to download and save them to disk. Handy when you need to download 20-30 links (files) from a list of 50-100, rather than right-clicking and saving a link 30 times.
* multiple-job support with a drop-target window. Click on the drop-target window and drop URLs onto it; you can then configure each URL with different crawl settings, i.e. you can crawl the first URL to depth 1 in offline mode while crawling the second URL to depth 2 in simple mode, and so on. By default each URL takes the current main settings.
* notification window that notifies when all jobs have completed
* user can jump to the next link (in case the current link is unresponsive) or to the next dropped URL, and pause and restart crawling
* UI improvements (hopefully!) :-)

ver 0.4.1
* corrected a bug in downloading external links.

ver 0.4
0.4 is a huge jump from 0.3. Almost everything has been spruced up and some new features have been added, though leech mode is still unimplemented.
changes:
* total rework of offline-mode browsing; links are now correctly cross-linked
* handles dynamic content correctly
* tar file support fully functional. It turned out tougher to implement than I initially thought, thanks to the tar:/ protocol; the archive tool in Konqueror is really simplistic and doesn't do the job right. My version does. :-)
* regular-expression parsing to correctly parse HTML pages; it can parse through almost 12,000 links (in one page) in no time :-) (see the sketch after this list)
* a proper file manager with drag support
* spruced-up URL list view
* quick-set options available on the page
* UI improvements.
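
As an illustration of the regex approach (a sketch of the technique, not the app's actual parser), pulling href targets out of a page can look like this:

#include <iostream>
#include <regex>
#include <string>

// Pull href targets out of raw HTML with a regular expression.
// Illustrative only: a real crawler's patterns would also need to handle
// img src attributes, unquoted values, and relative-URL resolution.
int main()
{
    std::string html =
        "<a href=\"ch1.html\">Chapter 1</a>"
        "<a href='ch2.html'>Chapter 2</a>";
    std::regex hrefRe("href\\s*=\\s*[\"']([^\"']+)[\"']",
                      std::regex::icase);
    std::sregex_iterator it(html.begin(), html.end(), hrefRe), end;
    for (; it != end; ++it)
        std::cout << (*it)[1] << '\n';  // prints ch1.html then ch2.html
    return 0;
}

A single pass with one compiled pattern is what keeps churning through thousands of links per page cheap.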

ver 0.3
* offline browser mode added. Crawl through a site with this setting on and the app modifies the links in the parsed files to point to local files where they exist on the local disk (see the sketch after this list).
* improved error reporting. Errors encountered are reported in a separate window in real time.
* file types can be excluded (don't download these file types) or exclusive (only download these file types besides text/html)
* UI improvements in the main window and config dialog
* web archive support - not working completely. More complicated than I initially thought; right now it only creates a compressed tarball.
* leech mode - not implemented as yet
* more code cleanup
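
The offline-mode rewrite boils down to something like the following sketch. It is illustrative only; localPathFor is a hypothetical stand-in for the crawler's own URL-to-file mapping:

#include <fstream>
#include <regex>
#include <string>

// Hypothetical mapping from a crawled URL to the path it was saved under.
// Placeholder body: the real crawler knows where it wrote each file.
std::string localPathFor(const std::string& url)
{
    return "./" + url.substr(url.rfind('/') + 1);
}

// True if the crawl already saved this URL's target to the local disk.
bool existsLocally(const std::string& url)
{
    std::ifstream f(localPathFor(url).c_str());
    return f.good();
}

// Rewrite href targets to local paths where a local copy exists,
// leaving links to pages that were never downloaded untouched.
std::string rewriteForOffline(const std::string& html)
{
    std::regex hrefRe("(href\\s*=\\s*\")([^\"]+)(\")", std::regex::icase);
    std::string out;
    std::string::const_iterator last = html.begin();
    std::sregex_iterator it(html.begin(), html.end(), hrefRe), end;
    for (; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);   // copy the text before this link
        std::string target = m[2].str();
        out += m[1].str();
        out += existsLocally(target) ? localPathFor(target) : target;
        out += m[3].str();
        last = m[0].second;
    }
    out.append(last, html.end());       // copy the trailing text
    return out;
}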

ver 0.2
* major code cleanup
* ugly Qt event-loop hack replaced with an elegant threaded model
* ugly crashes due to the ugly Qt event-loop hack removed
* minor UI improvements

linux3114a

16 years ago

I have removed it due to an abnormal end (crash) given by this version on KDE 3.4.

Sorry

linpete

16 years ago

hey - thanks a lot for this fine app.
0.5.1 now works for me on SUSE 9.1 with KDE 3.2.1.

I built an RPM using checkinstall but I don't have webspace :-(

pete

eyecon

16 years ago

GREAT IDEA!!

Compiles fine (if I do NOT use --disable-debug --enable-final) but freezes on URL entry/Enter.

dusham

16 years ago

cast-align -Wconversion -Wchar-subscripts -O2 -fno-exceptions -fno-check-new -fno-common -o libkrawlsitepart.la -rpath /usr/local/kde/lib/kde3 -module -avoid-version -module -no-undefined -Wl,--no-undefined -Wl,--allow-shlib-undefined -R /usr/local/kde/lib -R /usr/local/qt/lib -R /usr/X11R6/lib -L/usr/X11R6/lib -L/usr/local/qt/lib -L/usr/local/kde/lib krawlsite_part.lo krawlsitepartwidget.lo krawlconfig.lo krawler.lo logbrowser.lo krawlarchiver.lo part_mainw.lo fileincexoptions.lo krawloptions.lo localdiroptions.lo urloptions.lo logWidget.lo krawler.moc.lo krawlarchiver.moc.lo -lkparts -lkio
.libs/krawler.o(.text+0x66d): In function `Krawler::run(void)':
: undefined reference to `Krawler::debug(QString)'
.libs/krawler.o(.text+0x72d): In function `Krawler::run(void)':
: undefined reference to `Krawler::debug(QString)'
.libs/krawler.o(.text+0x815): In function `Krawler::run(void)':
: undefined reference to `Krawler::debug(QString)'
.libs/krawler.o(.text+0x8a2): In function `Krawler::stopCrawlingClicked(void)':
: undefined reference to `Krawler::debug(QString)'
.libs/krawler.o(.text+0xca4): In function `Krawler::continueKrawling(void)':
: undefined reference to `Krawler::debug(QString)'
.libs/krawler.o(.text+0x283e): In function `Krawler::imgSrcList(QString &, KURL, bool)':
: undefined reference to `Krawler::kurlAttributes(KURL, QString &, QString &, QString &)'
.libs/krawler.o(.text+0x289a): In function `Krawler::imgSrcList(QString &, KURL, bool)':
: undefined reference to `Krawler::substring(QString, int, int)'
.libs/krawler.o(.text+0x2976): In function `Krawler::imgSrcList(QString &, KURL, bool)':
: undefined reference to `Krawler::stringWithinQuotes(QString, QString, QString, QString &, int, bool)'
.libs/krawler.o(.text+0x2ca8): In function `Krawler::hrefList(QString &, KURL, bool)':
: undefined reference to `Krawler::kurlAttributes(KURL, QString &, QString &, QString &)'
.libs/krawler.o(.text+0x2d04): In function `Krawler::hrefList(QString &, KURL, bool)':
: undefined reference to `Krawler::substring(QString, int, int)'
.libs/krawler.o(.text+0x2d7a): In function `Krawler::hrefList(QString &, KURL, bool)':
: undefined reference to `Krawler::stringWithinQuotes(QString, QString, QString, QString &, int, bool)'
collect2: ld returned 1 exit status
make[2]: *** [libkrawlsitepart.la] Error 1

dusham

16 years ago

This app looks good, but 'make' gives me errors:

krawlsitepartwidget.cpp: In method `void KrawlSitePartWidget::createNewFolder()':
krawlsitepartwidget.cpp:755: ambiguous overload for `bool ? const char[11] : const QString'
krawlsitepartwidget.cpp:755: candidates are: operator ?:(bool, QString, QString)
krawlsitepartwidget.cpp:755: operator ?:(bool, basic_string, string)
krawlsitepartwidget.cpp:755: operator ?:(bool, const char *, const char *)
make[2]: *** [krawlsitepartwidget.lo] Error 1
make[2]: Leaving directory `/home/corel/download/inet/krawlsite/src'

Any fix for this?

I'm using Debian Woody, gcc 2.95.4 (no problems with KrawlSite 0.3).

Thank you in advance.

wireframe01

16 years ago

I compiled it using gcc 3.3.2, and it compiles correctly over here; I can't see why it should fail at the line number you have provided.
I can only suggest upgrading your gcc. Sorry about that.

wireframe01

16 years ago

Or replace line 755 in krawlsitepartwidget.cpp with this (wrapping both branches of the ?: in QString gives them the same type, which resolves the ambiguous overload):
QString input_value = new_folder_index == 0 ? QString("New Folder") : QString("New Folder" + QString("%1").arg(new_folder_index));

Khan

16 years ago

Hey wireframe, I'm the guy who contacted you via Gmail about that URL with the CGIs. Any luck figuring out how to crawl that site yet?

http://library.brickshelf.com/cgi-bin/gallery.cgi?f=9

wireframe01

16 years ago

Yeah, I thought I'd reply to that mail, but I saw this here, and for the benefit of others I'd like to reply here. Version 0.4 does the job: start crawling from the link you've posted here with the crawl depth at 1, put cgi in the dynamic content extensions list, and you are good to go. :-)
In fact, I am crawling the link you have given here and I am getting all the images and the PDF files.

jowilly

16 years ago

This app looks very promising.

But it does not work. When I click on Start Krawling, it starts, and after 3 seconds it comes back with "done"... but it didn't do anything...

Also, you need to clean up the UI a little: there is not enough room for some of the widgets in the config dialog, so half of them are cut off; on the main window we should be able to move the middle bar; the help menu is not in the right place; etc.

But this is a KDE app that has been needed for a long time... coming at last!

Thanks !

philr

16 years ago

I have the same problem. I know it's not a malformed URL because I copied and pasted it from the browser address bar.

Phil.

wireframe01

16 years ago

Could you also tell me what the URL was? I'll look into it.

philr

16 years ago

http://www.annierivieccio.com/

She's my sister, honestly :-)

wireframe01

16 years ago

Ah, frames... no support for that yet. Added it to my todo list.

wireframe01

16 years ago

I guess some error must have occurred, like a malformed URL or something. I need to improve the error-reporting mechanism from the current scheme, where I just flash it on the statusbar.
And yeah, the UI... I realize that requires some more work. I will put a splitter in between the views.
:)

jowilly

16 years ago

In v 0.3, Settings / Krawl mode, there is not enough room on the right:

- I see only 1 pixel of the letter "g" in "Simple Crawling"
- of the default crawl depth selection arrows, I see only half of the up/down buttons
- "Travers out to external links b" -> what comes after the "b"? I don't see it.
---

- when I do not enter http:// in the URL, it says: Malformed URL
- it seems to crawl and download something, but does not save any files to my folder? Also, I see nothing in the crawl statistics...

Details
version: 0.7
updated: Dec 04 2005
added: Dec 07 2004
System Tags: app, software