Converting PDF to HTML
(posted by Fred Emmott at 2007-10-25 18:34:08)
Earlier today, I saw the question "Can imagemagick convert pdf to html?" on IRC. In response, I present the world's worst PDF->HTML converter:
#!/bin/sh
BN=$(basename $1 .pdf)
pdf2ps $1
pstopnm -xsize 800 $BN.ps
cat > $BN.html <<EOF
<html>
<head>
<title>$1</title>
</head>
<body>
EOF
for file in $BN*.ppm; do
  out=$(basename $file .ppm).png
  convert $file $out
  echo "<img src='data:image/png;base64,$(base64 $out)'>" >> $BN.html
done
cat >> $BN.html <<EOF
</body>
</html>
EOF
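The only vaguely clever part of the script is inlining each page image as a base64 data: URI, so the HTML file needs no separate image files. Here is a minimal sketch of just that step. The img_tag helper and the throwaway input file are made up for illustration and are not part of the script above, and stripping newlines with tr is an assumption that your base64 wraps its output (some implementations do).

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): print an <img> tag
# that embeds the file given as $1 as a base64 data: URI.
img_tag() {
    # tr removes the line wrapping that some base64 implementations add
    printf "<img src='data:image/png;base64,%s'>\n" "$(base64 < "$1" | tr -d '\n')"
}

# Demo with a throwaway file holding the text "hello"; real use would pass a PNG.
tmp=$(mktemp)
printf 'hello' > "$tmp"
tag=$(img_tag "$tmp")
echo "$tag"   # prints <img src='data:image/png;base64,aGVsbG8='>
rm -f "$tmp"
```

The demo payload is obviously not a real PNG; it is only there to show the shape of the tag.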
Comments
Quotes
Posted at 2007-10-26 07:37:03 GMT +0000 by "Anonymous"
You're missing quotes around the filenames. Try passing a file with spaces in its name to your script to see what I mean. And pdftohtml has been around for ages, so your script is not really the first thing to do that...
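To illustrate the quoting point, here is a small sketch of what the shell does with an unquoted variable that contains a space. The count_args helper and the filename are made up for the demonstration; nothing here touches a real PDF.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

f="my report.pdf"   # made-up filename containing a space

unquoted=$(count_args $f)     # split into two words: "my" and "report.pdf"
quoted=$(count_args "$f")     # stays a single argument

echo "unquoted: $unquoted argument(s)"   # prints "unquoted: 2 argument(s)"
echo "quoted:   $quoted argument(s)"     # prints "quoted:   1 argument(s)"
```

The same splitting happens to $1, $BN, $file and $out in the script, so a command like pdf2ps would receive two nonexistent filenames instead of one real one.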
Re: Quotes
Posted at 2007-10-26 15:10:43 GMT +0000 by "Fred Emmott" (http://fredemmott.co.uk/)
It's not meant to be of practical use. It's a joke; there's a reason I labelled it "the world's worst".
Updated version
Posted at 2007-10-25 19:37:59 GMT +0000 by "Fred Emmott" (http://fredemmott.co.uk/)
An updated version that actually produces readable text is at http://files.fredemmott.co.uk/pdf2html.sh