<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tim Akinbo's Blog &#187; Programming</title>
	<atom:link href="http://blog.timakinbo.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.timakinbo.com</link>
	<description>the web, mobile technology and location based services as I see it</description>
	<lastBuildDate>Sun, 25 Jul 2010 09:26:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>How it was done &#8211; Nigerian postcode data extraction</title>
		<link>http://blog.timakinbo.com/2010/07/25/how-it-was-done-nigerian-postcode-data-extraction/</link>
		<comments>http://blog.timakinbo.com/2010/07/25/how-it-was-done-nigerian-postcode-data-extraction/#comments</comments>
		<pubDate>Sun, 25 Jul 2010 09:26:46 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Nigeria]]></category>
		<category><![CDATA[postcodes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=99</guid>
		<description><![CDATA[In my previous post, I talked about open data and how making the Nigerian postcode data open and more accessible has a wide potential for powering several applications. I&#8217;ve received several comments on Facebook with even more examples on how that data could be useful. In this post, I would share how this extraction was [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I talked about open data and how making the Nigerian postcode data open and more accessible has wide potential to power many applications. I&#8217;ve received several comments on Facebook with even more examples of how that data could be useful.</p>
<p>In this post, I will share how the extraction was done and how similar extraction scripts, or scrapers, can be written.</p>
<p>The first step in every scraping project I begin is to understand the HTTP dialog for the website I want to scrape. So I attempt to answer questions like these:</p>
<ol>
<li>Does the application need me to log in?</li>
<li>Is it sensitive to certain HTTP features like cookies or referrers?</li>
<li>What URLs do I access to view the content I want to extract?</li>
<li>What variables can I set to change the view and specify what I want?</li>
</ol>
<p>You can answer these questions with tools that let you view this dialog. I personally like to use <a href="http://getfirebug.com/">Firebug</a> for this task.</p>
<div id="attachment_100" class="wp-caption alignnone" style="width: 310px"><a href="http://blog.timakinbo.com/wp-content/uploads/2010/07/http-dialog-nigeriapostcodes.com_-e1280048721758.png"><img class="size-medium wp-image-100" title="HTTP Dialog - Nigeriapostcodes.com" src="http://blog.timakinbo.com/wp-content/uploads/2010/07/http-dialog-nigeriapostcodes.com_-300x203.png" alt="HTTP Dialog" width="300" height="203" /></a><p class="wp-caption-text">Click for a larger version</p></div>
<p>After you&#8217;ve determined the HTTP dialog, you can write your script to do the extraction. You can write scrapers in any language, provided it can retrieve HTTP resources and parse HTML. The parsing aspect of a scraper is usually the most interesting part because a lot of parsing libraries choke when they encounter badly formed HTML.</p>
<p>In the code snippet below, I used <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> for parsing the HTML and Python&#8217;s <a href="http://docs.python.org/library/urllib2.html">urllib2</a> for the HTTP communication.</p>
<p>The code is <a href="http://gist.github.com/484852">available on GitHub</a> and although it changes as more functionality is added, you can view the revision log of the gist to see the history of changes.</p>
<p><script src="http://gist.github.com/484852.js"> </script></p>
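<p>As a rough illustration of the fetch-and-parse pattern described above, here is a minimal Python 3 sketch. It is not the code in the gist: it assumes the data sits in HTML table cells, and it swaps in the standard library&#8217;s <em>urllib.request</em> and <em>html.parser</em> for urllib2 and BeautifulSoup so that it runs without extra packages. The names <em>CellCollector</em>, <em>fetch</em> and <em>parse_rows</em> are made up for this example.</p>
<pre class="brush: python">
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class CellCollector(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row also ends any cell that was left unclosed.
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def fetch(url, referer=None):
    """Download a page, optionally sending a Referer header."""
    headers = {"Referer": referer} if referer else {}
    with urlopen(Request(url, headers=headers)) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_rows(html):
    """Return the non-empty table rows found in an HTML string."""
    parser = CellCollector()
    parser.feed(html)
    return [row for row in parser.rows if row]
</pre>
<p>Calling <em>parse_rows(fetch(url))</em> with a URL found while studying the HTTP dialog would return one list of cell strings per table row. The optional <em>referer</em> argument is there for sites that check that header.</p>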
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2010/07/25/how-it-was-done-nigerian-postcode-data-extraction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When unique isn&#8217;t really unique</title>
		<link>http://blog.timakinbo.com/2009/11/24/when-unique-isnt-really-unique/</link>
		<comments>http://blog.timakinbo.com/2009/11/24/when-unique-isnt-really-unique/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 10:53:23 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=74</guid>
		<description><![CDATA[Sometimes, an auto-generated number isn&#8217;t enough and what you really need is a unique identifier. Several people have different techniques for generating their unique identifiers. My favorite has been generating a random number and then hashing it through the md5 hash generator. Here&#8217;s an example I was once using: &#60;?php $unique_identifier = md5(rand(100000, 999999)); ?&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, an auto-generated number isn&#8217;t enough and what you really need is a unique identifier. People use different techniques to generate their unique identifiers. My favorite had been to generate a random number and hash it with md5. Here&#8217;s an example I once used:</p>
<pre class="brush: php">&lt;?php $unique_identifier = md5(rand(100000, 999999)); ?&gt;</pre>
<p>The problem with this is that it allows only 900,000 possible values (every integer from 100000 to 999999), and hashing with md5 does not add any: the same input always produces the same hash. I didn&#8217;t realize my error until I started getting MySQL integrity constraint errors for a unique column that stored that value.</p>
<p>I switched to a more robust solution:</p>
<pre class="brush: php">&lt;?php $unique_identifier = md5(uniqid(rand(), true)); ?&gt;</pre>
<p>The <em>uniqid</em> function generates an identifier based on the current time in microseconds, here with a <em>rand()</em> prefix and additional entropy (<em>true</em>).</p>
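<p>To get a feel for how quickly the first snippet runs into duplicates, the birthday-problem approximation can be evaluated in a few lines. This is a sketch in Python rather than PHP, it assumes every value in the range is equally likely, and <em>collision_probability</em> is a name made up for the example.</p>
<pre class="brush: python">
import math

# Number of distinct integers rand(100000, 999999) can return (both ends inclusive).
POSSIBLE_VALUES = 999999 - 100000 + 1


def collision_probability(draws, space=POSSIBLE_VALUES):
    """Approximate chance that at least two of `draws` random picks coincide."""
    # Birthday-problem approximation: 1 - exp(-n(n-1) / 2N)
    return 1.0 - math.exp(-draws * (draws - 1) / (2.0 * space))


for draws in (100, 1000, 5000):
    print(draws, round(collision_probability(draws), 3))
</pre>
<p>Under these assumptions the chance of at least one duplicate is already above 40 percent after 1,000 identifiers, which would explain why the duplicate errors appeared.</p>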
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2009/11/24/when-unique-isnt-really-unique/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Using git hooks to check syntax errors</title>
		<link>http://blog.timakinbo.com/2009/09/25/using-git-hooks-to-check-syntax-errors/</link>
		<comments>http://blog.timakinbo.com/2009/09/25/using-git-hooks-to-check-syntax-errors/#comments</comments>
		<pubDate>Fri, 25 Sep 2009 07:57:33 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[drupal]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=53</guid>
		<description><![CDATA[Git is currently my favorite source code versioning tool and while I used Subversion, I knew about something called hooks that I never used. Essentially, hooks allow you to execute custom scripts when you perform certain actions on your repository like committing files, pulling updates and so on. This is very useful as you [...]]]></description>
			<content:encoded><![CDATA[<p>Git is currently my favorite source code versioning tool. Back when I used Subversion, I knew about a feature called hooks but never used it.</p>
<p>Essentially, hooks allow you to execute custom scripts when you perform certain actions on your repository like committing files, pulling updates and so on. This is very useful: you can write a hook script to, say, automatically FTP a file to your web server when a change has been made.</p>
<p>A whole lot of really cool hook scripts have been written and if you use any code versioning tools, you should check out the ones that have been written for the tool you use.</p>
<p>In particular, I find that developers sometimes check in code that has syntax errors. This happens in environments where there are no strict code testing rules. It can be really annoying when you or someone else does this and you have to fix it and then commit again&#8230; not professional at all. So I was glad to come across <a href="http://phpadvent.org/2008/dont-commit-that-error-by-travis-swicegood" target="_blank">this post</a> by Travis Swicegood, which lists code that runs PHP lint on your PHP files before committing them to the repository. PHP lint (php -l) checks the syntax of your code and either reports that it found no errors or prints the offending line.</p>
<p>For one of the projects I&#8217;m working on, I had to change line 11 of Travis&#8217; code to read:</p>
<p><code>$filename_pattern = '/\.(php|engine|theme|install|inc|module|test)$/';</code></p>
<p>instead of</p>
<p><code>$filename_pattern = '/\.php$/';</code></p>
<p>If you&#8217;ve done Drupal coding, you&#8217;ll quickly recognize those extensions :)</p>
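<p>For readers who would rather write the hook in Python, the idea can be sketched roughly as follows. This is not Travis&#8217; code: the function names are made up for the example, and it assumes that <em>git</em> and <em>php</em> are on the PATH and that linting the working copy of each staged file is good enough.</p>
<pre class="brush: python">
import re
import subprocess

# Same extension list as the modified $filename_pattern above.
FILENAME_PATTERN = re.compile(r"\.(php|engine|theme|install|inc|module|test)$")


def needs_lint(filename):
    """Return True when the file extension marks the file as PHP source."""
    return FILENAME_PATTERN.search(filename) is not None


def staged_files():
    """List files added, copied or modified in the index for the next commit."""
    output = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"]
    )
    return output.decode("utf-8").splitlines()


def main():
    """Lint every staged PHP file; return non-zero so git aborts on failure."""
    failed = [
        name
        for name in staged_files()
        if needs_lint(name) and subprocess.call(["php", "-l", name]) != 0
    ]
    return 1 if failed else 0
</pre>
<p>To try it, save the script as an executable <em>.git/hooks/pre-commit</em> file and have it end by passing the result of <em>main()</em> to <em>sys.exit</em>. Git cancels the commit when the pre-commit hook exits with a non-zero status.</p>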
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2009/09/25/using-git-hooks-to-check-syntax-errors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BOMs can really drive you nuts!</title>
		<link>http://blog.timakinbo.com/2009/09/14/boms-can-really-drive-you-nuts/</link>
		<comments>http://blog.timakinbo.com/2009/09/14/boms-can-really-drive-you-nuts/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 19:40:18 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=50</guid>
		<description><![CDATA[A BOM is an acronym for byte-order mark and is essentially used to tell the type of encoding of a data stream or file without having to explicitly specify it (for instance, through the content-type header in HTTP response). I&#8217;d been having a particular issue with an API I built for a web application I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[<p>BOM stands for <strong>byte-order mark</strong>: a short byte sequence at the start of a data stream or file that indicates its encoding, so that the encoding doesn&#8217;t have to be specified explicitly (for instance, through the Content-Type header of an HTTP response).</p>
<p>I&#8217;d been having a particular issue with an API I built for a web application I&#8217;m managing and I just couldn&#8217;t figure out what was wrong until I used the API from an application that displayed hex output.</p>
<p>So I put the application in debug mode and watched the communication stream. The API was supposed to return an integer value but instead, I noticed an <em>EF BB BF</em> hex sequence (the UTF-8 BOM) being prepended to the output. All attempts to remove this were futile.</p>
<p>Since I used the Symfony framework for the application, I suspected that it might be a bug and googled for a solution. The best answer I could get was to upgrade the framework, which I did but the problem persisted.</p>
<p>My big break came when I made another API call to a separate method and got the correct output. I traced the problem back to a PHP file that had been saved as UTF-8 with a BOM and that was included when making the buggy API call. Opening this file and saving it in ANSI format solved the problem.</p>
<p>So next time you run into a problem like this, make sure to check the encoding of your source files. Better still, you can use a tool like <a href="http://en.wikipedia.org/wiki/Iconv" target="_blank">iconv</a> to convert your files to the appropriate character encoding format.</p>
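<p>Checking for the mark can also be scripted. The following Python sketch is one possible way to detect and strip a UTF-8 BOM. The function names are made up for this example, and it only handles the UTF-8 mark (EF BB BF), not the UTF-16 or UTF-32 ones.</p>
<pre class="brush: python">
import codecs


def has_utf8_bom(data):
    """Return True when the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path):
    """Rewrite the file at `path` without its BOM; return True if it changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as handle:
        handle.write(strip_utf8_bom(data))
    return True
</pre>
<p>Running <em>strip_bom_from_file</em> over each source file in a project would report which files carried the mark. It overwrites files in place, so it is best tried on a copy first.</p>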
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2009/09/14/boms-can-really-drive-you-nuts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Git over non-standard SSH ports</title>
		<link>http://blog.timakinbo.com/2009/09/14/using-git-over-non-standard-ssh-ports/</link>
		<comments>http://blog.timakinbo.com/2009/09/14/using-git-over-non-standard-ssh-ports/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 13:01:46 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[SSH]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=43</guid>
		<description><![CDATA[I&#8217;ve configured some deployment servers to use SSH over non-standard SSH ports and that can really be a problem when you want to use that with git. No matter what you do, git would always attempt to connect through the standard SSH port 22. There was really no point in scratching my head and trying [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve configured some deployment servers to use SSH over non-standard ports and that can really be a problem when you want to use them with git. No matter what you do, git always attempts to connect through the standard SSH port 22.</p>
<p>There was really no point in scratching my head and trying to pull my hair out in order to fix this. So I googled for a solution. This <a href="http://infovore.org/archives/2008/10/13/pulling-from-git-over-a-non-standard-ssh-port/" target="_blank">post</a> gave me a hint but wasn&#8217;t helpful enough so I decided to write a blog post on how to go about solving this.</p>
<p>You&#8217;ll have to configure the server as a host entry in your <em>.ssh/config</em> file. Here&#8217;s an example. Simply replace the <em>#*#</em> placeholders with the actual values.</p>
<pre>
Host #hostname#
  User #username#
  Hostname #hostname#
  Port #non-standard port#
</pre>
<p>And if you prefer to use private keys to log in:</p>
<pre>
Host #hostname#
  User #username#
  Hostname #hostname#
  Port #non-standard port#
  PreferredAuthentications publickey
  IdentityFile "#path_to_private_key#"
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2009/09/14/using-git-over-non-standard-ssh-ports/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Migrating from SVN to GIT</title>
		<link>http://blog.timakinbo.com/2009/09/03/migrating-from-svn-to-git/</link>
		<comments>http://blog.timakinbo.com/2009/09/03/migrating-from-svn-to-git/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 06:25:02 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[migration]]></category>
		<category><![CDATA[scm]]></category>
		<category><![CDATA[svn]]></category>

		<guid isPermaLink="false">http://blog.timakinbo.com/?p=37</guid>
		<description><![CDATA[One of the big benefits of using source code versioning (or source code management systems) is that it allows you to maintain a history of all the changes in code and allows for easy collaboration amongst several developers on the same code base. SCMs will allow you to answer the following questions: Who made what change [...]]]></description>
			<content:encoded><![CDATA[<p>One of the big benefits of using source code versioning (or source code management systems) is that it allows you to maintain a history of all the changes in code and allows for easy collaboration amongst several developers on the same code base. SCMs will allow you to answer the following questions:</p>
<ol>
<li>Who made the change?</li>
<li>What change was made?</li>
<li>When was the change made?</li>
</ol>
<p>If you&#8217;re working on a project with a number of developers (this helps even if you&#8217;re the only developer), you&#8217;ll want to know when a change to the source code breaks the application (this happens a lot). Not only that, having an SCM allows you to easily revert that change. This post is not about SCMs but I thought it necessary to provide a little background information.</p>
<p>I recently started using git and have really loved it (although it&#8217;s not perfect, it does the job well). I prefer git over subversion (svn) mainly because it is distributed: you don&#8217;t need a connection to a server in order to commit to the repository.</p>
<p>Recently, I thought about migrating my svn repositories to git and I found a wonderful resource for that <a href="http://www.simplisticcomplexity.com/2008/03/05/cleanly-migrate-your-subversion-repository-to-a-git-repository/" target="_blank">here</a>. I&#8217;m not going to repeat what is said there. Instead, I&#8217;ll just point out a couple of things I had to do differently.</p>
<p>First of all, depending on the way your git is installed, these are basically the commands to run:</p>
<p><code> git svn init svn://server/repo/trunk/ --no-metadata<br />
git config svn.authorsfile /path/to/svn-authors<br />
git svn fetch</code></p>
<p>The difference here is to use <em>git svn</em> instead of <em>git-svn</em>. Also, if you happen to have <em>blank</em> authors (as I had in mine), then your svn-authors file should contain an entry similar to this:</p>
<p><code>(no author) = Firstname Lastname &lt;emailaddress&gt;</code></p>
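<p>If the repository has many committers, a first draft of the svn-authors file can be generated from the output of <em>svn log --quiet</em>. The Python sketch below is one way to do that. The function names are made up for this example, and it assumes each revision line has the usual <em>r12 | name | date</em> layout.</p>
<pre class="brush: python">
def authors_from_svn_log(log_text):
    """Collect the distinct author names from `svn log --quiet` output."""
    authors = set()
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Revision lines look like: r12 | tim | 2009-09-01 ... ; skip the rest.
        is_revision = fields[0].startswith("r") and fields[0][1:].isdigit()
        if len(fields) in (3, 4) and is_revision:
            authors.add(fields[1])
    return sorted(authors)


def authors_file(authors, placeholder):
    """Render one 'svn name = placeholder' line per author, ready for editing."""
    return "".join("{0} = {1}\n".format(name, placeholder) for name in authors)
</pre>
<p>Passing the log text to <em>authors_from_svn_log</em> and the result to <em>authors_file</em>, with a placeholder such as <code>Firstname Lastname &lt;emailaddress&gt;</code>, gives one line per committer, including <em>(no author)</em> when some revisions have a blank author. Each placeholder then has to be replaced by hand with the real name and email address.</p>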
]]></content:encoded>
			<wfw:commentRss>http://blog.timakinbo.com/2009/09/03/migrating-from-svn-to-git/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
