Data Mining
The current project at work had me tapping into the YouTube API. What I realized was that it wouldn’t be practical to have each client request a shit ton of thumbnails from YouTube every time. So building a little system to cache the thumbs, say, once a day turned out to be a fun task. Basically it’s a little PHP script that does three things: build two index files, clear the old cache, and pull down fresh thumbnails.
First, run the query in a loop (so as not to exhaust memory … PHP was maxing out when a single query requested too many thumbs at once) and build two external files from the results. The first is an XML file which is fed to the front end. It contains each video ID along with a path to its thumbnail, which points at the local machine instead of YouTube. The second is a simple CSV file containing the same data, except that the thumbnail path is the one on the YouTube servers.
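A minimal sketch of that first step, written in Python rather than the PHP of the actual script. Everything named here is an assumption for illustration: `fetch_page` is a hypothetical stand-in for one paged API query (it returns canned data instead of calling YouTube), and the element names, file layout and `cache/thumbs` directory are made up.

```python
import csv
import xml.etree.ElementTree as ET
from itertools import count


def fetch_page(start, size):
    """Hypothetical stand-in for one paged API query.

    Returns a list of (video_id, remote_thumbnail_url) pairs; canned data
    is used here so the sketch runs without network access.
    """
    fake_results = [(f"vid{i}", f"https://img.example.com/vid{i}.jpg") for i in range(7)]
    return fake_results[start:start + size]


def build_index(xml_path, csv_path, local_dir="cache/thumbs", page_size=3, fetch=fetch_page):
    """Page through the results and write the XML and CSV index files."""
    root = ET.Element("videos")
    total = 0
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Ask for one small page at a time so no single response gets large.
        for page in count():
            batch = fetch(page * page_size, page_size)
            if not batch:
                break
            for video_id, remote_url in batch:
                # XML for the front end: thumbnail path on the local machine.
                ET.SubElement(root, "video", id=video_id, thumb=f"{local_dir}/{video_id}.jpg")
                # CSV for the downloader: thumbnail URL on the remote server.
                writer.writerow([video_id, remote_url])
                total += 1
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return total
```

Calling `build_index("thumbs.xml", "thumbs.csv")` writes both files and returns the number of videos indexed. Only the small XML tree stays in memory for the whole run; each page of results is dropped as soon as it has been written out.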
Once these files are written to the server, I can clear the existing cache of thumbnails and then call a Perl script (from PHP!) which reads the CSV and ‘curl’s each thumbnail from the YouTube servers to the local machine, rebuilding the cache. It’s fun to learn that one can ‘curl’ in a loop. I’d used curl before, but only from the command line.
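The clear-and-refill step could look roughly like the sketch below. It is not the Perl script from the post: it uses Python's `urllib.request.urlretrieve` in place of curl, and it assumes a two-column CSV (video ID, remote URL) and a `.jpg` file name per video, both of which are guesses. The `download` parameter is only there so the function can be exercised without network access.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def refresh_cache(csv_path, cache_dir, download=urllib.request.urlretrieve):
    """Empty the thumbnail cache, then refill it from the CSV index."""
    cache = Path(cache_dir)
    # Throw away the previous run's thumbnails and start from an empty directory.
    shutil.rmtree(cache, ignore_errors=True)
    cache.mkdir(parents=True, exist_ok=True)

    saved = []
    with open(csv_path, newline="") as fh:
        # Assumed layout: one "video_id,remote_url" row per thumbnail.
        for video_id, remote_url in csv.reader(fh):
            target = cache / f"{video_id}.jpg"
            try:
                download(remote_url, str(target))
                saved.append(target)
            except OSError as err:
                # urllib's URLError and HTTPError are OSError subclasses;
                # one failed thumbnail should not abort the whole run.
                print(f"skipped {video_id}: {err}")
    return saved
```

`refresh_cache("thumbs.csv", "cache/thumbs")` returns the list of files that were actually saved, which makes it easy to log how many thumbnails a run picked up.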
It’s a fun task which will be set up, I believe, as a cron job. Otherwise someone would have to run it by hand on a regular basis, or whenever, for example, the search terms change.