rewritten webarchiver plugin

Various KDE 1.-4. Improvements

Description:

An improved replacement for the webarchiver plugin found in kdeaddons. Compared to the old webarchiver, this one can handle pages with frames and nested stylesheets. Please note that JavaScript may or may not work in archived webpages.

Last changelog:

11 years ago

--- 2008-02-25

* Version r3/3.5.8 to r4/3.5.9
* Fixed failed assertion if KHTML does not parse a STYLE area
* use "Verify" cache strategy if working together with the original http slave

--- 2007-11-20

* Version r2/3.5.7 to r3/3.5.8
* fixed crashes on webpages that somehow create internal DOM nodes without children.
* fixed handling of webpages that have more than one style area.
* very small performance and error handling optimizations.

-- 2007-06-07

* Version r2/3.5.6 to r2/3.5.7
* no changes in the code itself

-- 2007-03-21

* new patches against KDE 3.5.6.
* Much stricter and more secure URL checking
* several bug fixes
* last directory is now remembered in save dialog

-- 2006-06-20

* new patches against KDE 3.5.3. No changes in the code itself

--

* new patches against KDE 3.5.2. No changes in the code itself
* updated README
* webarchiver sources packaged as patches, not as tar archives

DanaKil

11 years ago

I'd love it if the original URL of the file (the web URL) could be saved in the .war file; maybe as a comment of the tar.gz archive or something (I don't know if this kind of archive can embed comments), and be accessible through the properties of the file.

Maybe a personal comment could be useful too (a note about this page...)

It's also annoying to have to click a button to close the dialog when the save is finished.


I really hope your improvements will be part of KDE soon :)

maps4711

11 years ago

It is already saved, although a bit hidden. Open the archived page in Konqueror and press Ctrl-U (or select View->Show Source from the menu). The original URL is saved inside an HTML comment at the top.

About Meta-information: Sounds interesting, but I have not yet looked at how it is handled by KDE.
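
A minimal sketch of looking at that comment from a shell instead, assuming the archive is a gzip-compressed tar file whose main page is stored as index.html. The helper name war_head and the file name in the usage line are made up for illustration, and the exact form of the comment is not specified here:

```shell
# war_head ARCHIVE [LINES]: print the first LINES lines (default 5) of
# index.html straight from the archive, without unpacking it to disk.
# Assumption: the archive member is named exactly "index.html".
war_head() {
    tar -xzOf "$1" index.html | head -n "${2:-5}"
}

# Usage (hypothetical file name):
# war_head webpage-archive.war
```

If the archive was written by this plugin, the URL comment should show up within the first few lines of the output.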

halux

12 years ago

I think it would be a good idea to name the extension kwar, webar or something like that.

.war is used by JSP servers and Java application servers like Tomcat, JBoss and WebLogic. At least for me this is not optimal, because the MIME actions in KDE are then wrong as well. (Those wars are zip files.)

Felix

maps4711

12 years ago

If .war gets renamed then I guess all users with their existing .war file
collection would start revolting.

I don't know if there is a way to tell KDE that the same file extension refers
to two different file types.

GameMage

12 years ago

I'm guessing you haven't explored the properties bit of Konqueror much then. Right-click a file and select Properties; to the right of the type there should be a tool icon. Clicking that lets you alter a file's MIME type, including adding extra extensions to its list. It's also accessible via Konqueror's settings and KControl.

maps4711

12 years ago

You are talking about the other way round: a certain file type (for example a Word document)
has one or more file extensions (*.doc; *.DOC).

The problem here is that two different file types (KDE web archive, Java archives) share
the same extension (*.war).

But your post made me look at it again. In the mime type property dialog it is possible to
add another application to a filetype and give it higher priority over the default one.
In this case it means adding the Java application server to the list of applications that
handle files of type web archive.

A drawback is that KDE will always default-open .war files with the Java server regardless
of whether the file is a web or Java archive.
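
Because both kinds of file carry the same extension, one way to tell them apart is to look at the content rather than the name. A rough sketch, assuming that KDE web archives are gzip-compressed tar files and that Java archives are zip files; the function name war_kind is made up for illustration:

```shell
# war_kind FILE: classify a .war file by its first two bytes.
# gzip streams start with 1f 8b; zip files start with "PK" (50 4b).
war_kind() {
    # Read the first two bytes as hex digits without separators.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b) echo "gzip" ;;     # probably a KDE web archive (tar.gz)
        504b) echo "zip" ;;      # probably a Java web application archive
        *)    echo "unknown" ;;
    esac
}

# Usage (hypothetical file name):
# war_kind webpage-archive.war
```

A wrapper script could use the result to hand the file to Konqueror or to the Java server, which would avoid relying on the default application for *.war.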

manik

12 years ago

Firefox has MAFF and Konqi has war. However, while testing these I found war to be easier, quicker and more responsive than MAFF.

It would be great if Firefox could be made to understand war by writing a plugin for it, following the MAF plugin code. What do you say?

maps4711

12 years ago

* I rarely use Firefox ;-)
* good idea, but no time, sorry.

DanaKil

12 years ago

hi,
maybe someone knows whether the WAR format is a KDE-only format or whether it is used by other DEs too (like GNOME).

I think it's KDE-only, no? Isn't there any unified format for Linux?

(and many thanks for these improvements, really)

maps4711

12 years ago

.war is the plain .tar.gz format in disguise; you can extract it with e.g.

$ mkdir webpage && cd webpage
$ tar -xzf ../webpage-archive.war
$ <open index.html with your browser>

manik

12 years ago

It is a zip file. Just unzip it and open with firefox if you please!

avuton

12 years ago

One thing I find really annoying is webpages that have a shrunken picture which, when clicked, uses Java magic to pop up a window with a larger picture in it. This isn't saved by the web archiver; is there any way to get support for this?

maps4711

12 years ago

There are two problems:

1) Images or other things loaded by JavaScript, Java or plugins can change
unpredictably every time the webpage is viewed. For example, an embedded
script may load a different image on the first of each month. Therefore, the
webarchiver is not able to know beforehand what may lurk inside a
JavaScript block.

2) The design of the new (and AFAIK old) webarchiver is to be able to make
a snapshot of the current webpage only (as far as it is possible). What
you are looking for is a tool that also downloads hyperlinked pages and images.

It is possible but a time-consuming task to add that to the webarchiver and,
frankly, I don't want to add such bloat, because there are already tools
out there that do this job. I suggest you use 'wget' or a similar program
that can download webpages recursively.
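
As a rough illustration of the wget route, the sketch below fetches a page together with the pages it links to directly and the images and stylesheets they need. The function name mirror_page, the WGET override and the URL in the usage line are made up for this example; the options themselves are standard GNU wget options.

```shell
# mirror_page URL [DEPTH]: download URL, follow links DEPTH levels deep
# (default 1), fetch images and stylesheets, and rewrite links so the
# local copy can be browsed offline.
mirror_page() {
    # WGET can be overridden, e.g. WGET=echo prints the options instead
    # of downloading anything.
    "${WGET:-wget}" --recursive --level="${2:-1}" \
        --page-requisites --convert-links --no-parent "$1"
}

# Usage (hypothetical URL):
# mirror_page http://www.example.org/gallery/index.html
```

Note that wget does not execute JavaScript either, so links that are only built by a script may still be missed.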

yglodt

13 years ago

File a wishlist item at bugs.kde.org and
attach your patch to it, this will make
it more visible to the maintainers of the
code you improved!

maps4711

13 years ago

I did so a few times, but was constantly hitting a KDE feature freeze.
Someone pointed me to kde-apps.org to get it out so people can test it.
In the meantime I was too busy / lazy doing other things (working
for example :-)

But anyway, this is a good idea.

http://bugs.kde.org/show_bug.cgi?id=98695
http://bugs.kde.org/show_bug.cgi?id=118475

ziuchkov

13 years ago

This is a much-needed improvement! Please keep up your work with this!

maps4711

13 years ago

I have been using these patches privately for about a year because I really wanted them.
So chances are good I will support them in future KDE versions :-)

Details

Version: 3.5.9-r4
Updated: Feb 25 2008
Added: Dec 15 2005