Data Mining

Written by skoch. Filed under work.

The current project at work had me tapping into the YouTube API. I quickly realized it wouldn’t be practical to have each client request a shit ton of thumbnails each time. So building a little system to cache the thumbs, say, once a day turned out to be a fun task. Basically it’s a little PHP script that does three things.

First, run the query in a loop (so as not to exhaust memory; PHP was maxing out when a single query requested too many thumbs at once) and build two files from the results. The first is an XML file that gets fed to the front end. It contains each video ID along with a path to its thumbnail which, instead of pointing at YouTube, points at the local machine. The second is a simple CSV containing the same data, except the thumbnail path is the one on the YouTube servers.
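To make the two-file idea concrete, here is a rough sketch. It is in Python rather than the PHP I actually used, and everything named in it is made up for illustration: `fake_fetch_page` stands in for one API query, and the element names, the `thumbs` directory and the `.jpg` extension are assumptions, not what the real script writes.

```python
import csv
import io
import xml.etree.ElementTree as ET


def fake_fetch_page(start, count):
    """Hypothetical stand-in for one API query.

    Returns up to `count` (video_id, remote_thumb_url) pairs from `start`.
    """
    catalogue = [
        (f"vid{i}", f"https://example.com/thumbs/vid{i}.jpg") for i in range(5)
    ]
    return catalogue[start:start + count]


def build_outputs(fetch_page, page_size=2, local_dir="thumbs"):
    """Page through the results and return (xml_text, csv_text)."""
    root = ET.Element("videos")
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer, lineterminator="\n")

    start = 0
    while True:
        # Ask for a small batch at a time so memory use stays flat.
        batch = fetch_page(start, page_size)
        if not batch:
            break
        for video_id, remote_url in batch:
            # XML for the front end: the thumbnail path is the local copy.
            node = ET.SubElement(root, "video", id=video_id)
            ET.SubElement(node, "thumb").text = f"{local_dir}/{video_id}.jpg"
            # CSV for the downloader: the thumbnail path is the remote one.
            writer.writerow([video_id, remote_url])
        start += page_size

    return ET.tostring(root, encoding="unicode"), csv_buffer.getvalue()


xml_text, csv_text = build_outputs(fake_fetch_page)
```

The point of keeping both files is that the front end never sees a YouTube URL, while the download step never has to parse XML.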

Once these files are written to the server, the second step is to clear the existing cache of thumbnails. The third is to run a Perl script (from PHP!) which reads the CSV and ‘curl’s each thumbnail from the YouTube servers to the local machine, and the cache is rebuilt. It’s fun to learn that one can ‘curl’ in a loop. I’d used curl before, but only from the command line.
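The clear-then-download step looks roughly like this. Again, this is a Python sketch and not the Perl script itself: it uses `urllib.request` where the real thing shells out to curl, and the function names and the `.jpg` extension are assumptions. It also assumes the video IDs are safe to use as file names.

```python
import csv
import io
import os
import shutil
import urllib.request


def http_get(url):
    """Default fetcher: return the raw bytes served at `url`."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def refresh_cache(csv_text, cache_dir, fetch=http_get):
    """Empty `cache_dir`, then save every thumbnail listed in `csv_text`.

    Each CSV row is expected to be: video_id,remote_thumbnail_url
    Returns the list of file paths written.
    """
    # Clear the old cache so stale thumbnails do not linger.
    shutil.rmtree(cache_dir, ignore_errors=True)
    os.makedirs(cache_dir)

    saved = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        video_id, remote_url = row
        target = os.path.join(cache_dir, f"{video_id}.jpg")
        with open(target, "wb") as handle:
            handle.write(fetch(remote_url))
        saved.append(target)
    return saved
```

Passing the fetcher in as an argument makes it easy to try the loop without touching the network, for example `refresh_cache(csv_text, "thumbs", fetch=lambda url: b"")`.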

It’s a fun task which will be set up, I believe, as a cron job. Otherwise someone would have to run it by hand on a regular basis, or whenever, for example, the search terms change.
