Converting PDF to HTML
(posted at 2007-10-25 18:34:08 UTC)
Earlier today, I saw the question "Can imagemagick convert pdf to html?" in IRC. In response, I present the world's worst PDF->HTML converter:
#!/bin/sh
BN=$(basename $1 .pdf)
pdf2ps $1
pstopnm -xsize 800 $BN.ps
cat > $BN.html <<EOF
<html>
<head>
<title>$1</title>
</head>
<body>
EOF
for file in $BN*.ppm; do
	out=$(basename $file .ppm).png
	convert $file $out
	echo "<img src='data:image/png;base64,$(base64 $out)'>" >> $BN.html
done
cat >> $BN.html <<EOF
</body>
</html>
EOF
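The one mildly interesting step in the script is that each page image is embedded directly in the HTML as a base64 data: URI rather than linked as a separate file. Here is a rough sketch of just that step, assuming a base64 command is on the PATH; the function name data_uri_img is made up for illustration and is not part of the original script.

```shell
#!/bin/sh
# data_uri_img FILE
# Print an <img> tag that embeds FILE inline as a base64 PNG data: URI.
# (Hypothetical helper name; it does not check that FILE really is a PNG.)
data_uri_img() {
	# Some base64 implementations wrap their output across lines;
	# tr removes the newlines so the URI stays on a single line.
	printf "<img src='data:image/png;base64,%s'>\n" \
		"$(base64 < "$1" | tr -d '\n')"
}
```

Calling data_uri_img page001.png >> out.html (both file names are placeholders) would append one self-contained image tag, at the cost of roughly a third more bytes than the PNG itself.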
Comments
Updated version
Posted at 2007-10-25 19:37:59 UTC by "Fred Emmott"
An updated version that actually produces readable text is at http://files.fredemmott.co.uk/pdf2html.sh
Quotes
Posted at 2007-10-26 07:37:03 UTC by "Anonymous"
You're missing quotes around the filenames. Try passing a file with spaces in its name to your script to see what I mean. Also, pdftohtml has been around for ages, so your script is not really the first thing to do that...
Re: Quotes
Posted at 2007-10-26 15:10:43 UTC by "Fred Emmott"
It's not meant to be of practical use. It's a joke; there's a reason I labelled it "the world's worst".
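For anyone curious about the quoting point raised in this thread, the effect is easy to see in isolation. This is a small sketch, not a fix to the script; the file name "my report.pdf" and the helper count_args are invented for the demonstration.

```shell
#!/bin/sh
# count_args: print how many arguments it received.
# (Hypothetical helper, used only to make word splitting visible.)
count_args() {
	echo "$#"
}

f="my report.pdf"    # hypothetical file name containing a space

count_args $f        # unquoted: the shell splits on the space, prints 2
count_args "$f"      # quoted: passed through as one word, prints 1

# With quotes, basename sees the whole name and strips the suffix.
basename "$f" .pdf   # prints: my report
```

Without the quotes, basename and pdf2ps in the original script would each receive "my" and "report.pdf" as separate arguments.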