How To: Locally Mirror A Site With YouTube Content

 

The Problem

Let’s say that you need to copy a website to a CD so that it can be used offline. wget takes care of the HTML pages and images for you:

wget -m -k -K -E http://www.mysite.com
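
If the short flags are hard to read, the same invocation can be sketched with long options. The wrapper name `mirror_site` and the `WGET` override are hypothetical, added only for illustration; `--adjust-extension` is what `-E` is called in wget 1.12 and later (older releases name it `--html-extension`).

```shell
# Hypothetical wrapper: the same four flags as above, written as long options.
#   --mirror            (-m) recursive download with timestamping
#   --convert-links     (-k) rewrite links so that they work locally
#   --backup-converted  (-K) keep a .orig copy of each file before converting it
#   --adjust-extension  (-E) save HTML pages with an .html suffix
mirror_site() {
    # Set WGET=echo to print the arguments instead of downloading anything.
    "${WGET:-wget}" --mirror --convert-links --backup-converted --adjust-extension "$1"
}
```

Calling `mirror_site http://www.mysite.com` should then behave like the one-liner above.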

However, if the site contains YouTube videos and you want those to work offline, you’ll need to do a little more work.

Download Resources

You’ll need three pieces of software:

  • JW Player, a Flash video (FLV) player that is free for non-commercial use. Its key feature is that it can play Flash video from a local directory. The alternative Flash players that I tried (e.g. Flowplayer) did not allow this by default; they required you to alter the security settings in Flash and/or recompile the SWF.
  • SWFObject 1.5, an easy-to-use and standards-friendly method of embedding Flash content that uses one small JavaScript file. My example below is specific to SWFObject 1.5, although you should be able to adapt it to SWFObject 2 easily.
  • youtube-dl, a Python script for downloading videos from YouTube.

Once you have downloaded all of these, put the following files in one directory (the script below expects to be run from it):
  • jwplayer.swf
  • swfobject.js
  • youtube-dl
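
Before running anything, it can help to confirm that the three files are actually in place. The following is a minimal sketch of such a check; the function name `check_prereqs` is hypothetical and not part of any of the tools above.

```shell
# Print the name of each required file that is missing from the given
# directory (default: the current one). Returns non-zero if any is missing.
check_prereqs() {
    local dir=${1:-.} missing=0 name
    for name in jwplayer.swf swfobject.js youtube-dl
    do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Since the script calls `./youtube-dl` directly, that file also needs to be executable (`chmod +x youtube-dl`).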

The Script

The following script operates on the directory created by the wget invocation above: it downloads the Flash videos and swaps out the YouTube object tags for the markup that JW Player needs.

#!/bin/bash

# Directory created by wget; by default it is named after the host, e.g. www.mysite.com
DIR=whatever_wget_created

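function use_local_video_player {

    # Insert a swfobject.js script tag right after <head> in every page.
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
        [ -f "$file" ] || continue   # skip a glob pattern that matched nothing
        echo "Adding script tag to $file"
        # perl rather than sed: "\n" in a replacement is not portable across sed versions
        perl -i -p -e 's%<head>%<head>\n<script type="text/javascript" src="swfobject.js"></script>%' "$file"
    done
    cp jwplayer.swf swfobject.js "$DIR"
}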

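function mirror_videos {

    # Collect every line that contains an embed tag into the file "vids".
    : > vids
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
        [ -f "$file" ] || continue   # skip a glob pattern that matched nothing
        echo "Looking for video in $file"
        grep embed "$file" >> vids
    done

    # Reduce each collected line to the URL in the embed tag's src attribute.
    perl -i -p -e 's/.*embed src="([^"]*)" type=.*/$1/' vids

    # Download each video; youtube-dl saves it in the current directory.
    while read -r line
    do
        ./youtube-dl "$line"
    done < vids
}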

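function swap_out_video_refs {

    # Relies on youtube-dl having named each download <video id>.flv
    mkdir -p "$DIR/youtube"
    mv *.flv "$DIR/youtube"
    for file in "$DIR"/*.htm "$DIR"/*.html
    do
       [ -f "$file" ] || continue   # skip a glob pattern that matched nothing
       # Only matches an <object>...</object> block that sits on a single line.
       perl -i -p -e 's/<object width="([0-9]*)" height="([0-9]*)".*com\/v\/(.*)\&hl.*<\/object>/<div id="$3"><\/div><script type="text\/javascript">var so = new SWFObject("jwplayer.swf","mpl","$1","$2","9"); so.addVariable("file","youtube\/$3.flv"); so.write("$3");<\/script>/' "$file"
    done
}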

use_local_video_player
mirror_videos
swap_out_video_refs

Limitations

This script is obviously somewhat limited, as it was written to address a specific problem that I had. Here are some of its limitations. Please comment here if you add improvements.
  • YouTube only – this only works with YouTube, not with Vimeo et al. The following references (while also specific to YouTube) may give you some leads on how to proceed: 1 2
  • Non-recursive – the script only operates on .html and .htm files in the top-level directory. Add recursion at will.
  • Video only – JW Player can play audio as well. The script could be enhanced to look for any kind of Flash-embedded MP3 and swap out the player for the offline version.
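
As a starting point for the recursion mentioned above, the top-level globs could be replaced with `find`. This is a rough sketch with hypothetical function names (`list_html_files`, `process_all`). Note that the script writes relative paths (`swfobject.js`, `jwplayer.swf`, `youtube/`) into each page, so pages in subdirectories would also need those paths adjusted.

```shell
# List every .htm/.html file under the given directory, at any depth.
list_html_files() {
    find "$1" -type f \( -name '*.htm' -o -name '*.html' \)
}

# Recursive stand-in for: for file in $DIR/*.htm $DIR/*.html
# (file names containing newlines are not handled by this simple read loop)
process_all() {
    local file
    list_html_files "$1" | while IFS= read -r file
    do
        echo "Would process $file"
    done
}
```

Replacing the `echo` with the body of one of the script's loops gives a recursive version of that step.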