<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Che Hodgins &#187; xss</title>
	<atom:link href="http://www.chehodgins.com/tag/xss/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.chehodgins.com</link>
	<description>Musings on Web Development</description>
	<lastBuildDate>Tue, 24 Nov 2009 02:39:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Simplifying Data Filtering</title>
		<link>http://www.chehodgins.com/php/simplifying-data-filtering/</link>
		<comments>http://www.chehodgins.com/php/simplifying-data-filtering/#comments</comments>
		<pubDate>Wed, 06 May 2009 12:33:38 +0000</pubDate>
		<dc:creator>chehodgins</dc:creator>
				<category><![CDATA[php]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[xss]]></category>

		<guid isPermaLink="false">http://www.chehodgins.com/?p=226</guid>
		<description><![CDATA[This post doesn&#8217;t strictly follow my weekly PECL package series per se, but is related by the fact that the subject was briefly an experimental PECL package.
Reinventing the wheel. This is something that programmers do over and over and over again. I have come up with a few hypotheses as to why this is the [...]]]></description>
			<content:encoded><![CDATA[<p>This post doesn&#8217;t strictly follow my weekly PECL package series per se, but is related by the fact that the subject was briefly an experimental PECL package.</p>
<p>Reinventing the wheel. This is something that programmers do <a href="http://thedailywtf.com/Articles/zzGeneralFunctions.aspx">over</a> and <a href="http://thedailywtf.com/Articles/CommonUtils-and-the-Inadequate-javalang.aspx">over</a> and <a href="http://thedailywtf.com/Articles/Lets-All-Reinvent-the-Wheel-Again,-and-More.aspx">over</a> again. I have come up with a few hypotheses as to why this is the case:</p>
<ul>
<li>Doesn&#8217;t know any better. This is the type of programmer who wants to start writing code right away and doesn&#8217;t check whether a solution already exists.</li>
<li>Doesn&#8217;t trust the &#8220;wheel&#8221;. This is the person who would rather write something from scratch because they don&#8217;t trust anyone else&#8217;s code.</li>
<li>Can&#8217;t find the &#8220;wheel&#8221;. This is the person who takes a quick look (e.g. one Google search) and decides they must write it themselves.</li>
</ul>
<p>I know several people from each of the aforementioned categories. Some are simply clueless with regard to reusability and others are just hard-headed. When I first studied software engineering I loved just sitting down with a can of Coke and typing as much code as possible, as fast as possible. I remember doing pair programming and having my partner comment &#8220;You are typing too fast&#8221;, and responding with a clever smile. In university I was even forced to re-implement data structures, such as linked lists, to grasp the basics of how they work. In an educational context I think this is a good idea, but not when you are working for real, i.e. at an actual company. In my case, I eventually slowed down my typing, thought through what I wanted to do first, did some research, and only then proceeded.</p>
<p><strong>Back to the subject at hand</strong></p>
<p>How many times have you wanted to validate an e-mail address?</p>
<p>How many times have you wanted to sanitize input?</p>
<p>How many times have you wanted to validate anything, really?</p>
<p>For the first question, my typical thought process would be to work out which characters are allowed and which are not, decide that a regular expression is ideal for the situation, and Google it. I would then check whether any PEAR packages can help out, such as <a href="http://pear.php.net/package/Validate">Validate</a>. If I were using a framework, such as Zend Framework, I would check what it offers; ZF has many classes for validation and filtering. Meanwhile, our wheel reinventers would start writing a regex, invariably forgetting or simplifying the rules, or ending up with a <a href="http://gravitonic.com/files/talks/php-quebec-2009/regex-clinic.pdf">page-long regular expression</a>. Those who Google, say, &#8220;php email validation&#8221; are presented with over 1 million results containing spiffy regular expressions.</p>
<p><strong>It gets simpler</strong></p>
<p>Available since PHP 5.1 and bundled with PHP as of 5.2 (late 2006), the <a href="http://ca3.php.net/manual/en/book.filter.php">PHP Filter extension</a> makes it much simpler. This extension provides several functions that perform two types of filtering: validation and sanitization. Validation can be done on e-mail addresses, IP addresses, URLs, values matched against a regular expression, and more. Data can be sanitized with many filters, but most importantly it can be escaped in a way similar to htmlentities().</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">filter_var</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'test@test.com'</span><span style="color: #339933;">,</span> FILTER_VALIDATE_EMAIL<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// returns test@test.com</span><br />
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">filter_var</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'test@test'</span><span style="color: #339933;">,</span> FILTER_VALIDATE_EMAIL<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// returns false</span></div></td></tr></tbody></table></div>
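<p>The same call covers the other validators mentioned above. Here is a minimal sketch, assuming only the filter constants that ship with the extension; the sample values and the regular expression are made up for illustration. Results are shown with var_dump() rather than echo, because echoing false prints nothing:</p>

```php
<?php
// Sample inputs are hypothetical; only the filter constants come from the extension.

// IP addresses: the input string is returned when valid, false otherwise.
var_dump(filter_var('192.168.0.1', FILTER_VALIDATE_IP));
var_dump(filter_var('999.1.1.1', FILTER_VALIDATE_IP));

// URLs.
var_dump(filter_var('http://www.example.com/path', FILTER_VALIDATE_URL));

// Integers within a range: a valid value comes back converted to int.
$range = array('options' => array('min_range' => 1, 'max_range' => 100));
var_dump(filter_var('42', FILTER_VALIDATE_INT, $range));
var_dump(filter_var('420', FILTER_VALIDATE_INT, $range));

// Matching against a regular expression (illustrative order-number format).
$pattern = array('options' => array('regexp' => '/^[A-Z]{3}-[0-9]{4}$/'));
var_dump(filter_var('ABC-1234', FILTER_VALIDATE_REGEXP, $pattern));
```

<p>Because a valid integer such as 0 is falsy in PHP, it is safer to compare the result against false with === than to rely on a loose truth test.</p>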
<p>How easy is that? It doesn&#8217;t get any easier than that. Actually, it can. Suppose we want to validate our data from HTTP GET or POST:</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">filter_input</span><span style="color: #009900;">&#40;</span>INPUT_GET<span style="color: #339933;">,</span> <span style="color: #0000ff;">'email'</span><span style="color: #339933;">,</span> FILTER_VALIDATE_EMAIL<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// validates $_GET['email']</span></div></td></tr></tbody></table></div>
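<p>One detail worth handling: filter_input() has three possible outcomes, not two. It returns null when the parameter was not sent at all, false when it was sent but failed validation, and the filtered value otherwise. A small sketch of telling them apart; email_status() is a hypothetical helper written for this example, not part of the extension:</p>

```php
<?php
// Hypothetical helper: maps the three possible filter_input() outcomes to a label.
function email_status($result)
{
    if ($result === null) {
        return 'missing';   // the parameter was not sent at all
    }
    if ($result === false) {
        return 'invalid';   // the parameter was sent but failed validation
    }
    return 'valid';         // $result holds the validated address
}

// In a web request this reads the "email" query-string parameter.
$email = filter_input(INPUT_GET, 'email', FILTER_VALIDATE_EMAIL);
echo email_status($email), "\n";
```

<p>Note that filter_input() reads the request data as it was received, so values assigned to $_GET later in the script are not seen by it.</p>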
<p>This extension also supports sanitization of data, e.g. the removal of invalid characters. This is especially useful for preventing XSS attacks, and it handles <a href="http://shiflett.org/blog/2005/dec/google-xss-example">character encoding issues</a> correctly.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;?php<br />
echo &quot;Welcome, &quot; . $_GET['name']; // Not good. Try ?name=&lt;script&gt;alert('xss');&lt;/script&gt;<br />
echo &quot;Welcome, &quot; . filter_input(INPUT_GET, 'name', FILTER_SANITIZE_STRING); // Safe now :) Tags are stripped and quotes are encoded</div></td></tr></tbody></table></div>
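<p>FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so on current versions it is safer to escape the value on output instead of stripping tags. A hedged sketch of two ways to do that; $name is a stand-in for $_GET['name'] so the snippet also runs from the command line, and the payload is written with hex escapes only so that the literal tag never appears in this page&#8217;s markup:</p>

```php
<?php
// Stand-in for $_GET['name']. "\x3C" and "\x3E" are the angle brackets,
// so this is the same script-tag payload used in the example above.
$name = "\x3Cscript\x3Ealert('xss');\x3C/script\x3E";

// Option 1: the filter extension HTML-encodes quotes, angle brackets and ampersands.
$viaFilter = filter_var($name, FILTER_SANITIZE_SPECIAL_CHARS);

// Option 2: htmlspecialchars() with ENT_QUOTES escapes the same characters.
$viaEscape = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Neither string contains a raw angle bracket any more, so the browser
// displays the payload as text instead of running it.
echo "Welcome, " . $viaFilter . "\n";
echo "Welcome, " . $viaEscape . "\n";
```

<p>Unlike the tag-stripping filter, both options keep what the visitor typed and only change how it is displayed.</p>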
<p><strong>Conclusion</strong></p>
<p>The filter extension provides an easy way to sanitize and validate input. The way it works may not suit everyone&#8217;s needs: the default behavior is not what everyone wants and there are <a href="http://ca3.php.net/manual/en/ref.filter.php#75734">quirks</a>. Still, most of the functions allow fine-grained option settings (even callbacks), making this extension easy to use yet customizable for specific needs. For some reason, few people seem to use this extension or even know that it exists. For something that&#8217;s been bundled with PHP for several years, this is unfortunate.</p>
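<p>To illustrate the option settings and callbacks mentioned above, here is a rough sketch that checks several fields in one call with filter_var_array(). The field names, the age range and the normalize_username() callback are all assumptions made up for this example:</p>

```php
<?php
// Hypothetical callback: trims and lowercases a username, then accepts
// 3 to 16 letters, digits or underscores. Returning false marks it invalid.
function normalize_username($value)
{
    $value = strtolower(trim($value));
    return preg_match('/^[a-z0-9_]{3,16}$/', $value) ? $value : false;
}

// One filter definition per field.
$definition = array(
    'email'    => FILTER_VALIDATE_EMAIL,
    'age'      => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 18, 'max_range' => 120),
    ),
    'username' => array(
        'filter'  => FILTER_CALLBACK,
        'options' => 'normalize_username',
    ),
);

// Stand-in for submitted form data.
$input = array(
    'email'    => 'test@test.com',
    'age'      => '17',
    'username' => '  Che_H ',
);

$result = filter_var_array($input, $definition);
var_dump($result);
```

<p>In a real request, filter_input_array(INPUT_POST, $definition) accepts the same definition and reads directly from the submitted form.</p>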
<p>Additional reading on the filter extension: <a href="http://devzone.zend.com/node/view/id/1113">Zend Developer Zone</a> and <a href="http://phpro.org/tutorials/Filtering-Data-with-PHP.html">PHPro.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chehodgins.com/php/simplifying-data-filtering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
