The Problem
Let’s say you need to copy a website to a CD so that it can be used offline. wget takes care of the HTML pages and images for you:
wget -m -k -K -E http://www.mysite.com
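For readers who find the short flags opaque, here is the same invocation spelled out with wget's long option names. This is a sketch only: mirror_site is a hypothetical wrapper name, and --adjust-extension is the spelling used by wget 1.12 and later (older releases call it --html-extension).

```shell
# Hypothetical wrapper around the wget call above, using long option names.
#   --mirror            (-m) recursive download with timestamping
#   --convert-links     (-k) rewrite links so they work from the local copy
#   --backup-converted  (-K) keep the original of each converted file as *.orig
#   --adjust-extension  (-E) save HTML pages with an .html suffix
mirror_site() {
    wget --mirror --convert-links --backup-converted --adjust-extension "$1"
}
```

Calling mirror_site http://www.mysite.com should behave like the short-flag command above.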
However, if the site contains YouTube videos and you want those to work offline, you’ll need to do a little more work.
Download Resources
You’ll need three pieces of software:
- JW Player, a free (for non-commercial use) FLV player, with the key feature of allowing Flash video to be played from a local directory. Alternative Flash players (e.g. flowplayer) that I tried did not allow this by default, instead requiring you to alter the security settings in Flash and/or recompile the SWF.
- SWFObject 1.5, an easy-to-use and standards-friendly method to embed Flash content, utilizing one small JavaScript file. My example below is specific to SWFObject 1.5, although you should be able to easily adapt it to SWFObject 2.
- youtube-dl, a Python script for downloading from YouTube.
Once you have downloaded all three, assemble them in the same directory:
- jwplayer.swf
- swfobject.js
- youtube-dl
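Before running anything, it may be worth confirming that the three files really are in place. The following is a minimal sketch; check_prereqs is a hypothetical helper name, not part of any of the three packages.

```shell
# Hypothetical helper: report which of the three required files are missing
# from the given directory (defaults to the current directory).
# Returns 0 when everything is present, 1 otherwise.
check_prereqs() {
    dir=${1:-.}
    missing=0
    for f in jwplayer.swf swfobject.js youtube-dl
    do
        if [ ! -f "$dir/$f" ]; then
            echo "Missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_prereqs || exit 1. Because the script below runs ./youtube-dl directly, that file also needs to be executable (chmod +x youtube-dl).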
The Script
The following script operates on the directory created by the wget invocation above: it downloads the Flash videos and swaps out the YouTube object tags with the markup needed for JW Player.
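#!/bin/bash
# Directory created by the wget invocation above
DIR=whatever_wget_created
# Let unmatched globs expand to nothing, so the loops skip a missing *.htm or *.html
shopt -s nullglob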
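# Add the swfobject.js script tag to every page and copy the player files over
function use_local_video_player {
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
        echo "Adding script tag to $file"
        # Note: \n in the replacement text relies on GNU sed
        sed 's%<head>%<head>\n<script type="text/javascript" src="swfobject.js"></script>%' "$file" > tmp
        mv tmp "$file"
    done
    cp jwplayer.swf "$DIR"
    cp swfobject.js "$DIR"
}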
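# Collect the YouTube embed URLs from every page and download each video
function mirror_videos {
    : > vids    # start with an empty URL list
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
        echo "Looking for video in $file"
        grep embed "$file" >> vids
    done
    # Keep only the URL from each embed tag's src attribute
    perl -i -n -e 'print "$1\n" if /embed src="([^"]*)" type=/' vids
    while read -r line
    do
        ./youtube-dl "$line"
    done < vids
}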
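# Move the downloaded videos into place, then replace each YouTube object tag
# with a div plus the SWFObject call that plays the local FLV in JW Player
function swap_out_video_refs {
    mkdir -p "$DIR/youtube"
    mv ./*.flv "$DIR/youtube"
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
        # $1 = width, $2 = height, $3 = video ID (also used as the FLV file name)
        perl -i -p -e 's/<object width="([0-9]*)" height="([0-9]*)".*com\/v\/([^&"]*)\&hl.*<\/object>/<div id="$3"><\/div><script type="text\/javascript">var so = new SWFObject("jwplayer.swf","mpl","$1","$2","9"); so.addVariable("file","youtube\/$3.flv"); so.write("$3");<\/script>/' "$file"
    done
}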
use_local_video_player
mirror_videos
swap_out_video_refs
Limitations
This script is limited, as it was written to address a specific problem that I had. Here are some of its limitations. Please comment here if you add improvements.
- YouTube only – this only works with YouTube, not with Vimeo et al. The following references (while also specific to YouTube) may give you some leads on how to proceed: 1 2
- Non-recursive – the script only operates on .html and .htm files in the top-level directory. Add recursion at will.
- Video only – JW Player can play audio as well. The script could be enhanced to look for any kind of Flash-embedded MP3, and swap out the player with the offline version.
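As a starting point for the recursion mentioned above, one option is to let find produce the list of pages instead of the two globs. This is a sketch under assumptions, not a drop-in fix: list_pages and process_pages are hypothetical names, and file names containing newlines are not handled.

```shell
# Hypothetical helper: print every .htm/.html file under a directory, one per line
list_pages() {
    find "$1" -type f \( -name '*.htm' -o -name '*.html' \)
}

# Example loop that could stand in for: for file in $DIR/*.htm $DIR/*.html
process_pages() {
    list_pages "$1" | while IFS= read -r file
    do
        echo "Processing $file"
    done
}
```

Note that pages in subdirectories would also need adjusted relative paths to swfobject.js, jwplayer.swf and the youtube directory, since the script as written places those in the top-level directory only.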