Che Hodgins // Musings on Web Development

Quick review of online backup services

My [quick] personal review of the online backup services listed in Lifehacker’s “Five Best Online Backup Tools” article.

I’m looking for: cheap, encrypted, ~90GB storage, data that won’t change much once uploaded, data rarely downloaded.

Crash Plan:
- Incremental upload (i.e. can be paused), nice UI.
- Personal encryption key
- 4.50$/month

Mozy:
- Ugly interface + horrible upload process (buggy)
- Personal encryption key
- 5.95$/month

Dropbox:
- Very nice, except:
- Too expensive (9.99$/month)
- 50GB limit

Jungle Disk:
- No incremental upload, rough UI.
- Tedious signup with Amazon Payments
- Personal encryption key
- Provider selection (Amazon or Rackspace)
- Pay per GB (.15$/GB up, .17$/GB down)

Carbonite:
- Only local hard drive
- Had to google to find uninstall (shame on you)
- 4.58$/month

So I’m going with Crash Plan…

php -v

PHP 5.3.0 (cli) (built: Jun 30 2009 13:24:04)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2009 Zend Technologies

Yes, it has arrived.

Today is a big day for open source software. Let’s look at some notable releases:

This is one of those days where I realize more and more that open source software (and notably the PHP community) is thriving and I love it!

Advanced Geolocation

I write this in honor of the Firefox web browser. I still remember when it was first released November 9th, 2004, and gave me hope for a better, nicer, non-IE world. Today, Firefox 3.5 is released. Building upon nearly 5 years of success, they have continued innovating and I thank them for making the web a better place.

Feel free to skip to the demo.

I previously wrote about using IP based Geolocation. Although this method is widely used, the downsides are obvious: inaccurate results, proxies, false positives, and a lack of privacy control for the end user.

The Future of Geolocation

The new generation of browsers is implementing the Geolocation API specification. This gives the browser the job of figuring out where you are. There are some positive points and negative points to this. Firstly, the position of the user can be more accurate. In IP-based Geolocation, the only data available is the IP address. The browser has access to much more precise data such as WiFi networks and GPS devices (iPhone!). Secondly, privacy settings. The browser should be able to ask the user if they will allow such information to be shared, ideally even the level of accuracy that should be shown. This is possible if implemented in the browser. One negative point is something we are all familiar with: cross-browser compatibility. Different implementations in different web browsers will make developers miserable, but hey, that’s what standards are for, right? :)

Browser Support

As of today a few web browsers support geolocation. Here’s the status of the mainstream browsers:

  • Firefox: Available in version 3.5, released today.
  • iPhone Safari: Available in OS3.
  • IE: Experimental in version 8.
  • Opera: Available in nightly builds since March 2009.
  • Chrome: Available through Google Gears API
  • Safari: Unknown.

Here is how to request a user’s location (the navigator.geolocation.getCurrentPosition() call near the end is the key line):

function knownLocation(position) {

    var latitude, longitude;
    if (position.coords) { // iPhone
        latitude = position.coords.latitude;
        longitude = position.coords.longitude;
    } else { // Firefox
        latitude = position.latitude;
        longitude = position.longitude;
    }

    var div = document.getElementById('geo');
    div.innerHTML = div.innerHTML + "Latitude: " + latitude + "<br/>Longitude: " + longitude;
}

function unknownLocation() {
    var div = document.getElementById('geo');
    div.innerHTML = div.innerHTML + "Unknown Location";
}

window.onload = function() {
    var div = document.getElementById('geo');
    div.innerHTML = div.innerHTML + "Browser: " + navigator.appName + " (" + navigator.appVersion + ")<br/>";

    if (navigator.geolocation) {
        navigator.geolocation.getCurrentPosition(knownLocation, unknownLocation);
    } else {
        div.innerHTML = div.innerHTML + "Browser not supported";
    }
};

This then prompts the user for their approval:

Geolocation in Firefox 3.5

Similarly, on the iPhone:

Conclusion

This is really cool. With geolocation implemented in the browser, great precision can be achieved. For example, on the iPhone the GPS is used, so visiting the demo page from my living room gives different coordinates than when visiting from my kitchen. I can imagine many useful applications of this. Another thing I love is that I can deny my location to certain sites, which I will absolutely do.

A thought on Ignorance

I just saw this post on friendfeed:

“I love Drupal and Joomla, but it’s too bad they’re written in PHP… which is kind of an antique compared to newer stuff like Ruby on Rails… the next generation of cool CMS platforms will probably be running something like Rails, not PHP.”

Normally I would disregard this kind of ignorance but the fact that some people were agreeing with him pissed me off :)

I began working feverishly on preparing a response to this “media guy”, arming myself mostly with Terry Chay blog posts. In the end I cooled off and decided to write a quick blog post about it.

I won’t defend PHP, as there is plenty of evidence on the web that can do this for me. I just think we should be skeptical when “cool”, “antique”, and “next-generation” are used in the same sentence by social media junkies.

Free and Fast Geolocation in PHP

Geo* (as I call them) are the web technologies that provide a link between online content and Earth’s geography. Examples include Geocoding (finding latitude/longitude based on street addresses), Geotagging (tagging media with latitude/longitude coordinates), and Geolocation (finding latitude/longitude of a computer).

Geolocation is a particularly cool technique because it allows you to estimate a person’s geographic location, thus allowing you to provide a custom tailored experience on your website, among other things. This can be useful as much as it can be annoying. There are several methods of Geolocation, some as simple as asking the user where they are located. This article focuses on adding IP based Geolocation to your PHP website for free all the while keeping it fast.

Problems

If IP addresses are to be used to determine a person’s physical location then a few possible problems come to mind:

  • How accurate is the mapping between an IP address and a geographical location?
    • From maxmind.com’s Geolocation service: “99.8% accurate on a country level, 90% accurate on a state level, and 83% accurate for the US within a 25 mile radius.” From some research, the matching is done using either the address of the ISP that owns that IP [link], or by buying the data from websites that ask for users’ locations [link].
  • What about users behind proxies?
    • Some Geolocation databases flag the IPs of potential anonymous proxy servers.
    • Most proxy servers send X-Forwarded-For and Client-IP headers that you can use.

This is not perfect, but in many cases the approximate geographical location of a user can be inferred.
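
The proxy point above can be sketched in PHP. This is a minimal sketch rather than a complete solution: the function name guessClientIp() is made up for illustration, and it assumes the headers show up under the usual HTTP_X_FORWARDED_FOR and HTTP_CLIENT_IP keys of $_SERVER. Both headers are supplied by the client or the proxy and are trivial to spoof, so the result should be treated as a hint only.

```php
<?php
// Hypothetical helper: pick the most plausible client IP from request data.
// Pass $_SERVER in a real request; a plain array works for testing.
function guessClientIp(array $server)
{
    $candidates = array();

    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain: client, proxy1, proxy2
        foreach (explode(',', $server['HTTP_X_FORWARDED_FOR']) as $ip) {
            $candidates[] = trim($ip);
        }
    }
    if (!empty($server['HTTP_CLIENT_IP'])) {
        $candidates[] = trim($server['HTTP_CLIENT_IP']);
    }

    foreach ($candidates as $ip) {
        // Keep only well-formed addresses outside private and reserved ranges
        $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
        if (filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false) {
            return $ip;
        }
    }

    // Fall back to the address of whoever connected to us directly
    return isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : null;
}

$ip = guessClientIp(array(
    'HTTP_X_FORWARDED_FOR' => '10.0.0.1, 72.30.81.165',
    'REMOTE_ADDR'          => '192.168.1.1',
));
echo $ip, "\n"; // 72.30.81.165
```

The private 10.0.0.1 entry is skipped and the first public address in the chain is used for the lookup.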

Demo time

This demo will use the free Geolocation database provided by Maxmind.com. I believe this is the ideal choice for normal (i.e. not Facebook) websites for several reasons:

  • It is free (there is a paid version with higher accuracy)
  • It is fast. They report up to 1 million queries per second on 1 machine.
  • It is extensible. The database can be upgraded to the paid version by just replacing the binary.
  • They like developers. They provide implementations in over 10 different programming languages, with benchmarks.
  • Their website is full of valuable information. They provide benchmarks, an explanation of how they collect their data, and more. I haven’t seen this with any other IP Geolocation services.

There are two options for us PHP developers: the pure PHP library or a PECL package implementing the C library. For reasons that will be discussed below, the PECL package will be used. If you do not want to use a PECL package or are on a hosted server, then you can download the pure PHP classes here.

First, the GeoIP C library must be downloaded (link) and installed. Note that this can be installed on Windows as well. No special options are needed to install it:

mbpro:GeoIP-1.4.6 chehodgins$ sudo ./configure
mbpro:GeoIP-1.4.6 chehodgins$ sudo make
mbpro:GeoIP-1.4.6 chehodgins$ sudo make install



Then the PECL package can be installed:

mbpro:~ chehodgins$ sudo pecl install geoip
downloading geoip-1.0.7.tar ...
...
You should add "extension=geoip.so" to php.ini



Next, add this extension to php.ini (i.e. extension=geoip.so), restart apache and check out phpinfo():

GeoIP in phpinfo()

The final step before writing code is to download the actual database. It is updated monthly so remember to stay up to date. The directory that should contain the file is OS dependent, so create a quick PHP script to see where the directory is:

ini_set('display_errors', true);
error_reporting(E_ALL | E_STRICT);
$result = geoip_record_by_name('72.30.81.165');
var_dump($result);



Gives us:

Determine the binary directory


Now save the binary to the directory mentioned in the PHP warning, reload your script, and the warning should disappear. Let’s try again with some more code:

ini_set('display_errors', true);
error_reporting(E_ALL | E_STRICT);

$functions = get_extension_funcs('geoip');
var_dump($functions);

$result = geoip_record_by_name('72.30.81.165');
var_dump($result);



Gives:

array
0 => string 'geoip_database_info' (length=19)
1 => string 'geoip_country_code_by_name' (length=26)
2 => string 'geoip_country_code3_by_name' (length=27)
3 => string 'geoip_country_name_by_name' (length=26)
4 => string 'geoip_continent_code_by_name' (length=28)
5 => string 'geoip_org_by_name' (length=17)
6 => string 'geoip_record_by_name' (length=20)
7 => string 'geoip_id_by_name' (length=16)
8 => string 'geoip_region_by_name' (length=20)
9 => string 'geoip_isp_by_name' (length=17)
10 => string 'geoip_db_avail' (length=14)
11 => string 'geoip_db_get_all_info' (length=21)
12 => string 'geoip_db_filename' (length=17)
13 => string 'geoip_region_name_by_code' (length=25)
14 => string 'geoip_time_zone_by_country_and_region' (length=37)

array
'continent_code' => string 'NA' (length=2)
'country_code' => string 'US' (length=2)
'country_code3' => string 'USA' (length=3)
'country_name' => string 'United States' (length=13)
'region' => string 'CA' (length=2)
'city' => string 'Sunnyvale' (length=9)
'postal_code' => string '94089' (length=5)
'latitude' => float 37.4249000549
'longitude' => float -122.007400513
'dma_code' => int 807
'area_code' => int 408



With only an IP address we can easily get the country, postal code, longitude and latitude, and even the area code of the user.
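
As a small usage sketch, the record can be turned into something displayable. The helper name describeLocation() is hypothetical; the array keys are the ones shown in the dump above, and the function_exists() guard is only there so the snippet also runs on a machine without the extension.

```php
<?php
// Hypothetical helper: build a short "City, Region, Country" label from a
// record shaped like the geoip_record_by_name() output shown above.
function describeLocation($record)
{
    if (!is_array($record)) {
        return 'Unknown location'; // lookup failed or extension missing
    }

    $parts = array();
    foreach (array('city', 'region', 'country_name') as $key) {
        if (!empty($record[$key])) {
            $parts[] = $record[$key];
        }
    }

    return $parts ? implode(', ', $parts) : 'Unknown location';
}

// Only call the extension when it is actually loaded.
$record = function_exists('geoip_record_by_name')
    ? geoip_record_by_name('72.30.81.165')
    : false;

echo describeLocation($record), "\n";
```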

Performance

I initially thought that the PECL version would outperform the pure PHP version by a small percentage. I was wrong. The PECL version was much faster. Here are some informal benchmarks.

| Library | Iterations | Total | Avg per request | Notes |
| --- | --- | --- | --- | --- |
| PECL GeoIP | 10,000 | 0.7s | 0.07ms | |
| Pure PHP | 10,000 | 49.2s | 4.92ms | |
| PECL GeoIP | 1 | 0.08ms | 0.08ms | Typical real world usage |
| Pure PHP | 1 | 2.4ms | 2.4ms | Typical real world usage |

As a validation of my results I benchmarked the pure PHP library being used in a web application and had comparable results to my benchmarks (5.9ms per IP lookup versus the 2.4ms above).
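
For anyone who wants to repeat the measurement, a timing loop along these lines is enough. This is a sketch, not the exact script behind the numbers above; averageMs() is a name made up here, and the closure at the bottom is a stand-in workload that should be swapped for the lookup being measured.

```php
<?php
// Hypothetical helper: run $callback $iterations times and return the
// average wall-clock time per call, in milliseconds.
function averageMs($callback, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback);
    }
    $elapsedSeconds = microtime(true) - $start;

    return ($elapsedSeconds * 1000) / $iterations;
}

// Stand-in workload; replace the body with the lookup to benchmark,
// e.g. a geoip_record_by_name() call when the extension is installed.
$avg = averageMs(function () {
    ip2long('72.30.81.165');
}, 10000);

printf("%.5f ms per call\n", $avg);
```

The per-request column is simply total time divided by iterations, so 49.2s over 10,000 lookups works out to 4.92ms each.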

Conclusion

Because of the ease of implementation, the low cost, and the minimal performance losses, there is much to be gained by adding IP Geolocation to your web application. The PECL package is the ideal configuration because it provides a faster experience with less code to maintain. The pure PHP library is nonetheless still relatively fast and thus still worth it. This is still far from a perfect solution. False positives can occur, anonymous proxies mess everything up, and IP addresses are constantly changing. Also, what about users who simply do not want to share their location? There are privacy issues. This is currently a hot topic: the W3C Geolocation API is being actively worked on, and Mozilla, Opera and others are making efforts to improve location awareness on the web, something I am looking forward to.

More reading:

Interesting Geolocation presentation
GeoIP functions in the PHP manual
Cool Geo* stuff at Y!

Simplifying Data Filtering

This post doesn’t strictly follow my weekly PECL package series per se, but is related by the fact that the subject was briefly an experimental PECL package.

Reinventing the wheel. This is something that programmers do over and over and over again. I have come up with a few hypotheses as to why this is the case:

  • Doesn’t know any better. This is the type of programmer that wants to start writing code right away and doesn’t wonder if a solution already exists.
  • Doesn’t trust the “wheel”. This is the person who would rather write something from scratch because they don’t trust anyone else’s code.
  • Can’t find the “wheel”. This is when the person takes a quick look (e.g. 1 Google search) and decides they must write it themselves.

I know several people from each of the aforementioned categories. Some are simply clueless with regard to reusability and others just have a hard head. When I first studied software engineering I loved just sitting down with a can of coke and typing as much code as possible, as fast as possible. I remember doing pair programming and having my partner comment “You are typing too fast”, and responding with a clever smile. In University I was even forced to re-implement data structures, such as linked lists, to grasp the basics of how they work. In an educational context I think this is a good idea, but not when you are working for real, i.e. in a real company. In my case, I eventually slowed down my typing and thought through what I wanted to do first, did some research, and then proceeded.

Back to the subject at hand

How many times have you wanted to validate an e-mail address?

How many times have you wanted to sanitize input?

How many times have you wanted to validate anything really?

For the first question, my typical thought process would be to think about which characters are allowed and which are not, decide that a regular expression would be ideal for this situation, and google it. I would then check if any PEAR packages can help out, such as Validate. If I was using a framework, such as Zend Framework, I could check out what it offers. For ZF, there are many, many classes pertaining to validation and filtering. Meanwhile, our wheel reinventers would start writing a regex, invariably forgetting or simplifying the rules, or ending up with a page-long regular expression. Those who Google, say, “php email validation” are presented with over 1 million results containing spiffy regular expressions.

It gets simpler

Available since PHP 5.1 and bundled in PHP as of 5.2 (late 2006), the PHP Filter extension makes it way simpler. This extension provides several functions that allow you to do two types of filtering: validation and sanitization. Validation can be done on emails, IPs, URLs, regexes, and more. Data can be sanitized using many filters, most importantly in a way similar to htmlentities().

<?php
// var_dump() is used instead of echo because echoing false prints nothing
var_dump(filter_var('test@test.com', FILTER_VALIDATE_EMAIL)); // returns the string test@test.com
var_dump(filter_var('test@test', FILTER_VALIDATE_EMAIL)); // returns false

How easy is that? It doesn’t get any easier than that. Actually, it can. Suppose we want to validate our data from HTTP GET or POST:

<?php
echo filter_input(INPUT_GET, 'email', FILTER_VALIDATE_EMAIL); // validates $_GET['email']

This extension also supports sanitization of data, e.g. the removal of invalid characters. This is especially useful to prevent XSS attacks and handles character encoding issues fine.

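<?php
echo "Welcome, " . $_GET['name']; // Not good. Try ?name=<script>alert('xss');</script>
// FILTER_SANITIZE_STRING strips tags and encodes quotes. It is deprecated as of PHP 8.1;
// htmlspecialchars() or FILTER_SANITIZE_FULL_SPECIAL_CHARS is the usual replacement.
echo "Welcome, " . filter_input(INPUT_GET, 'name', FILTER_SANITIZE_STRING); // Safe now :)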

Conclusion

The filter extension provides an easy way to sanitize and validate input. The way it works may not suit everyone’s needs: The default behavior is not what everyone wants and there are quirks. Still, most of the functions allow fine grained option settings (even callbacks), making this extension easy to use yet customizable for specific needs. For some reason I don’t see many people use this extension, let alone know that it exists. For something that’s been bundled with PHP for several years this is unfortunate.
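
To illustrate the fine-grained options and callbacks mentioned above, here is a short sketch. The input values and the 0 to 130 range are arbitrary examples; the filter constants and the options array layout are the ones documented for the extension.

```php
<?php
// Validate an integer within a range via the 'options' array.
$range = array('options' => array('min_range' => 0, 'max_range' => 130));

$age    = filter_var('42', FILTER_VALIDATE_INT, $range);  // integer 42
$tooOld = filter_var('999', FILTER_VALIDATE_INT, $range); // false, out of range

// Validate an IP address, rejecting private ranges such as 192.168.x.x.
$publicIp  = filter_var('72.30.81.165', FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE);
$privateIp = filter_var('192.168.0.10', FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE);

// FILTER_CALLBACK hands the value to any callable for custom handling.
$name = filter_var('  che HODGINS ', FILTER_CALLBACK, array(
    'options' => function ($value) {
        return ucwords(strtolower(trim($value)));
    },
));

var_dump($age, $tooOld, $publicIp, $privateIp, $name);
```

Note that validation filters return the (possibly converted) value on success and false on failure, so results should be compared with === false rather than tested loosely.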

Additional reading on the filter extension: here and here.

Sorting out your PHP includes using Inclued

This is the third edition of my weekly PECL package series. Check out my Scream article as well as my Sphinx article to learn about these extensions.

If you have ever inherited spaghetti code or worse, written spaghetti code, the following article is for you. This article is an introduction to the Inclued PECL extension. It helps answer the common question “Where is this include coming from?”, something that I’ve asked myself before when working on some projects.

This extension works by overriding an opcode in Zend, allowing it to log information regarding which files are being included, and from where. This information can be collected using a single function named inclued_get_data() or by setting inclued.dumpdir in php.ini to dump the data of each request.

The final step involves graphing this data to get a view of the include hierarchy. This can be done by converting the JSON encoded output into a dot language file, and then converting it to an image or viewing it with an application such as Graphviz.
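
To make the conversion step concrete, here is a rough sketch of turning include data into dot. It is not the gengraph.php script that ships with the package. The function name includesToDot() is made up, and the sample array assumes each entry under 'includes' carries 'fromfile' and 'opened_path' keys, which is my understanding of what inclued_get_data() returns; check the real output before relying on it.

```php
<?php
// Hypothetical converter: one dot edge per include, from the including
// file to the included file. Assumes 'fromfile' and 'opened_path' keys.
function includesToDot(array $data)
{
    $lines = array('digraph includes {');
    $includes = isset($data['includes']) ? $data['includes'] : array();

    foreach ($includes as $inc) {
        if (empty($inc['fromfile']) || empty($inc['opened_path'])) {
            continue; // skip entries that do not describe a resolved file
        }
        $lines[] = sprintf(
            '    "%s" -> "%s";',
            str_replace('"', '\\"', $inc['fromfile']),
            str_replace('"', '\\"', $inc['opened_path'])
        );
    }
    $lines[] = '}';

    return implode("\n", $lines) . "\n";
}

// Sample data in the assumed shape; the paths are invented for the example.
$sample = array('includes' => array(
    array('fromfile' => '/var/www/index.php', 'opened_path' => '/var/www/lib/db.php'),
    array('fromfile' => '/var/www/index.php', 'opened_path' => '/var/www/lib/view.php'),
    array('fromfile' => '/var/www/lib/view.php', 'opened_path' => '/var/www/lib/db.php'),
));

echo includesToDot($sample);
```

Saved to a file, text in this form is what the dot command line tool turns into an image.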

To start, we need to install the inclued PECL extension:

mbpro:~ chehodgins$ sudo pecl install inclued-alpha
downloading inclued-0.1.0.tar ...
[...]
install ok: channel://pecl.php.net/inclued-0.1.0

Then add the following to php.ini and restart Apache:

extension=inclued.so
inclued.enabled=1
inclued.dumpdir=/tmp

Next, in our web browser we load the page that we wish to analyze. A file named inclued.*.* will be added to /tmp. We will convert this to a dot file using the gengraph.php script that is included in the PECL package:

mbpro:tmp chehodgins$ php /usr/local/lib/php/gengraph.php -i inclued.00196.2
Written inclued.out.dot...
To generate images: dot -Tpng -o inclued.png inclued.out.dot

Now we can either create a PNG using the dot command or simply open the dot file with Graphviz.

mbpro:tmp chehodgins$ dot -Tpng -o ~/Documents/inclued.png inclued.out.dot

And a super nice graph of the includes is generated as an image. Here is the graph of the includes in Wordpress (click to view fullscreen):

Inclued run on Wordpress

Notice that there are a lot of includes, but in general there appears to be order. Now let’s check out osCommerce:

Inclued on osCommerce

This also looks decent. What about Magento?

Inclued in Magento

Holy crap, that’s a lot of includes!

In conclusion, the inclued PECL extension can be useful in many situations, from trying to understand how and
why a file is being included, to reorganizing your includes by seeing the dependencies. If anything, it can be an
easy way to show off your application/framework’s include structure.

Search improvements using Sphinx, MySQL and PECL

This is the second edition of my weekly PECL package series. See last week’s post to learn about the Scream extension.

This week’s topic will be on Full-Text searching using Sphinx, specifically with the PHP client extension written by Antony Dovgal and released as a 1.0 PECL package in late January 2009.

Background

Sphinx is an open source full-text search engine. It provides an alternative to MySQL full-text searching. Its main features include high search speed (avg query is under 0.1 sec on 2-4 GB text collections), high scalability (up to 100 GB of text, up to 100 M documents on a single CPU) and, most importantly, native support for MySQL (MyISAM and InnoDB) and PostgreSQL. It has also proven its worth, considering that it is used on web sites such as Craigslist, Netlog, and The Pirate Bay.

Sphinx Install

There are two methods of using Sphinx in PHP: using the PHP API or using the native libraries with the PECL package. We will of course be covering the PECL version :)

Installing a basic version of Sphinx is easy:

mbpro:sphinx-0.9.8.1 chehodgins$ sudo ./configure --prefix /usr/local/share/sphinx --with-mysql /usr/local/share/mysql/
mbpro:sphinx-0.9.8.1 chehodgins$ sudo make
mbpro:sphinx-0.9.8.1 chehodgins$ sudo make install

Next, a data source and an index must be defined in the sphinx.conf configuration file. I have added a table named `track` in my MySQL database with 7.8 million track names.

mbpro:etc chehodgins$ sudo cp sphinx.conf.dist sphinx.conf
mbpro:etc chehodgins$ sudo vi sphinx.conf

In sphinx.conf:

source track
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        = root
    sql_db          = test
    sql_port        = 3306

    sql_sock        = /Applications/MAMP/tmp/mysql/mysql.sock
    sql_query_pre   = SET NAMES utf8

    # the data to be indexed
    sql_query       = SELECT id, name, length, year FROM track;
}

index track_index
{
    # document source(s) to index
    source          = track

    # index files path and file name, without extension
    # mandatory, path must be writable, extensions will be auto-appended
    path            = /usr/local/share/sphinx/var/data/track_index

    min_word_len    = 1
}

We can now index our data and start the sphinx server:

mbpro:sphinx chehodgins$ sudo bin/indexer track_index
mbpro:sphinx chehodgins$ sudo /usr/local/share/sphinx/bin/searchd

Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/share/sphinx/etc/sphinx.conf'...
creating server socket on 0.0.0.0:3312

The data indexing took 1 minute on 7.8 million rows (204 MB of data) at a speed of 116655.16 docs/sec! Note that indexing should be done at regular intervals, depending on how fresh the data is required to be.

PHP/PECL Install

With our data indexed we must now get access to the Sphinx API. This is done using the Sphinx PECL extension. Before installing the PECL package we must install libsphinxclient, which is included in the Sphinx distribution:

mbpro:libsphinxclient chehodgins$ cd sphinx-0.9.8.1/api/libsphinxclient/
mbpro:libsphinxclient chehodgins$ LIBTOOLIZE=glibtoolize sudo ./buildconf.sh
mbpro:libsphinxclient chehodgins$ sudo ./configure && make install

Now we are ready to install the PECL package:

mbpro:~ chehodgins$ sudo pecl install sphinx

Now it must be added to php.ini:

extension=sphinx.so

Restart apache and check that it is installed:

Sphinx in phpinfo

Now it’s simply a matter of using the Sphinx function reference on php.net to query your dataset.

<?php

$sphinx = new SphinxClient();
$sphinx->setServer("localhost", 3312);
$sphinx->setMatchMode(SPH_MATCH_ALL);
$sphinx->setMaxQueryTime(500); // Limit query to 500 milliseconds
$sphinx->setLimits(0, 10, 1000); // return first 10 results

$result = $sphinx->query('Ride the Lightning');
var_dump($result['matches']);

echo $result['total_found'] . ' total results found.';
?>

Thanks to the Sphinx log, you can see that the query executed in .042 seconds:

[Sat Apr 18 01:33:58.878 2009] 0.042 sec [all/0/rel 160 (0,10)] [*] Ride the Lightning

Conclusion

The example was kept simple, but queries can be refined even more using SQL-like methods of the Sphinx API. Notably, setGroupBy() will do the equivalent of GROUP BY and ORDER BY. Also, setFilter() will add extra filtering on other columns in the dataset.
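
As a sketch of what that refinement might look like, the snippet below narrows the earlier query to a few years and groups the matches by year. It rests on assumptions that are not part of the setup above: year would have to be declared as an attribute in sphinx.conf (for example with sql_attr_uint = year), because columns that are not declared as attributes are indexed as full-text fields and cannot be filtered on. The function name and the years are illustrative only, and the class_exists() guard lets the file load on a machine without the extension.

```php
<?php
// Hypothetical helper; assumes 'year' is a Sphinx attribute (see note above).
function searchTracksInYears($query, array $years)
{
    if (!class_exists('SphinxClient')) {
        return null; // Sphinx PECL extension not loaded
    }

    $sphinx = new SphinxClient();
    $sphinx->setServer('localhost', 3312);
    $sphinx->setMatchMode(SPH_MATCH_ALL);
    $sphinx->setLimits(0, 10, 1000);

    // Rough equivalent of: WHERE year IN (...)
    $sphinx->setFilter('year', array_map('intval', $years));

    // Rough equivalent of: GROUP BY year, largest groups first
    $sphinx->setGroupBy('year', SPH_GROUPBY_ATTR, '@count desc');

    $result = $sphinx->query($query, 'track_index');
    if ($result === false) {
        // query() returns false on failure; surface the reason
        trigger_error($sphinx->getLastError(), E_USER_WARNING);
    }

    return $result;
}

var_dump(searchTracksInYears('Ride the Lightning', array(1984, 1985, 1986)));
```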

This is the tip of the iceberg of the different ways that Sphinx can be used. The easy integration with MySQL combined with the ease of setup make it a logical next step when MySQL’s Full-Text indexing performance degrades. It also appears capable of scaling to the needs of the top-tiered websites out there. As such, I would seriously consider Sphinx when looking for solutions to your searching needs.

Finally, it would be worthwhile to explore alternatives such as Lucene (Java), Solr (Java), and Marjory (PHP).

Weekly PECL Package – Scream

This is the first of what is planned to be a weekly post on a more or less random PECL package. The idea is for me to get to know some PECL packages in more detail and for you to get to know some PECL packages in more detail – without losing your precious time.

For the first edition of this series I will cover the relatively new PECL package aptly named Scream. The purpose of this extension is to, well, scream. It will disable the silence operator (@) so that any hidden errors will still be shown. After this, you may scream at whoever used the silence operator in the first place, thus the name Scream (just kidding?).

Let’s get started…

che-hodginss-macbook-pro:~ chehodgins$ sudo pecl install scream-alpha

After a few minutes…

Build process completed successfully
Installing '/usr/local/lib/php/extensions/no-debug-non-zts-20060613/scream.so'
install ok: channel://pecl.php.net/scream-0.1.0

Great, now add it to php.ini and restart apache:

extension=scream.so
scream.enabled=1

Check phpinfo and we are ready to go:

I'm new to macs and just discovered taking screenshots of portions of the screen (Apple key ⌘ + Shift + 4). Very cool.

Now we will borrow some code from some open source projects that use the silence operator and see what happens.

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL | E_STRICT);

echo 'starting... ';

// Initialize
$host = $user = $password = $sock = $port = $errno = $errstr = $response = '';

// From Joomla!
if (!($resource = @mysql_connect( $host, $user, $password, true ))) {
// ...
}

// From Wordpress
$response .= @ fread ( $sock, 8192 );

// From Joomla!
@ dl('bz2.so');

// From Wordpress
$sock = @fsockopen($host, $port, $errno, $errstr);

echo "done.\n";

?>



With scream.enabled = 0 we get this lovely output:

che-hodginss-macbook-pro:www chehodgins$ php -f scream.php
starting... done.
che-hodginss-macbook-pro:www chehodgins$

And with scream.enabled = 1:

che-hodginss-macbook-pro:www chehodgins$ php -f scream.php
starting...
Warning: fread(): supplied argument is not a valid stream resource in /Users/chehodgins/www/scream.php on line 17

Warning: dl(): Unable to load dynamic library '/Applications/MAMP/bin/php5/lib/php/extensions/no-debug-non-zts-20050922/bz2.so' - (null) in /Users/chehodgins/www/scream.php on line 20

Warning: fsockopen() expects parameter 2 to be long, string given in /Users/chehodgins/www/scream.php on line 23
done.

che-hodginss-macbook-pro:www chehodgins$


It is obvious at this moment that in general it is not advisable to use the silence operator. Most PHP programmers have been burnt by this a few times and are usually much harsher towards those who think this feature is useful. I can recall spending lots of time bug hunting before finding an @ which led me to a simple error. It’s a painful experience; don’t do it.

As a programmer you may already steer clear of the silence operator but much code is inherited. Because of the simplicity of the silence operator it can be hard to track down where it is used in your code. Try searching for ‘@’ in one of your projects, how many thousands of results do you get? That is one reason to install this on your dev box and find those tough bugs before they hit production.

And just in case you are still not convinced, check out “Five reasons the shut-up operator (@) should be avoided” by Derick Rethans.

Welcome to Gmail?

February 25, 2009 | bugs

Not even Google’s software is free of defects. Luckily this one is not dangerous.

Go to gmail.com, if you are signed in then sign out. Click “sign up for gmail”. You will see the usual account creation form.

Now, let’s say you remembered that you already have a Google account, so click on “sign in here” at the top. Sign in as usual and you are greeted with this lovely message:

gmail_intro

Aww shucks, thanks for the welcome…

Now let me get back to real work.

PHP processing after the request has completed

January 30, 2009 | php

If you want to perform any time sensitive processing without the user waiting on you, you can use the following technique.

<?php
ob_start();

// Perform usual tasks

$response = ob_get_clean();
header("Connection: close");
header("Content-Length: " . strlen($response));
echo $response;
flush();

// Perform time sensitive tasks
// The users web browser is done loading
// To test: sleep(15);

?>

This is useful for situations where an “as-it-happens” action must be performed, but you can’t afford to make the user wait.

For example, every time a user logs in to your website you want to tweet the action on Twitter. This method will ensure an instantaneous update while removing the complexity of using a background task to manage these actions.

Beware if you use ob_gzhandler() as your callback function for ob_start(): there are encoding issues.

Personally I don’t really like this solution because of the risk of the user’s browser ignoring the headers and waiting anyway. I’d rather keep these types of operations separate from the usual page loads.
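
For what it is worth, when PHP runs under PHP-FPM there is a built-in function, fastcgi_finish_request(), that releases the client without relying on headers. That is a different technique from the one shown above, not a drop-in for it, and it is unavailable under mod_php or the CLI. A minimal sketch, with sendAndContinue() as a made-up helper name:

```php
<?php
// Hypothetical helper: send the response body, then try to release the client.
// Returns true when the FPM-only function was available and used.
function sendAndContinue($body)
{
    ignore_user_abort(true); // keep running even if the client disconnects

    echo $body;

    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request(); // flushes output and closes the connection
        return true;
    }

    flush(); // best effort on other SAPIs; the browser may still wait
    return false;
}

$released = sendAndContinue("Thanks for logging in!\n");

// Slow work goes here (posting to Twitter, sending mail, ...).
```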

Why does JPEG rock the web?

November 14, 2008 | web

In short, because it is the ideal format for most situations.

This webcomic summarizes the views of many developers when it comes to the question of which image formats to use on the web.

For those who don’t get the joke, JPEG is a lossy format, meaning that each time the file is edited and re-saved there is a degradation in quality. That’s why the JPEG side of the comic contains visual artifacts.

JPEG’s popularity is helped by the fact that it is the default image file format of most digital cameras, making it the most common image format on the web.

I love JPEGs too, but I understand that they are not the solution to every problem. In general I believe it’s simple to determine which format to use:

  • If your image has many colors (such as pictures) I recommend using JPEG.
  • If small file size is important or your image has fewer colors (like icons, buttons) I recommend trying PNG8, which produces a 256 color PNG. As Stoyan Stefanov points out, the human eye has difficulty telling the difference between 200 and 1000 colors. If it looks good then go for it.

I recently set up a command line solution for converting and compressing images from JPEG to PNG8 using ImageMagick and PNGCrush.

convert ~/jpgs/file.jpg PNG8:~/pngs/file.png
pngcrush -rem gAMA -rem cHRM -rem iCCP -rem sRGB ~/pngs/file.png ~/crushed/file.png

By also stripping out the color correction I was able to reduce the size on average by 50% for 290×290 image sizes without any visible loss of quality. This can give a real front-end performance boost on web pages that contain many images.

Will Spam Ever Disappear?

November 13, 2008 | web

Maybe, but for now forget about it.

Today some servers in California were shut down for spam-related activities, effectively dropping the amount of spam sent by roughly 66%. Wow! 2/3 of the world’s spam eliminated in one swift blow.

Unfortunately for us, this is but a mild setback for spammers. They will regroup and find a new home to host their activities.

It is still unclear whether the hosting provider, McColo Corporation, will even be held legally responsible even though they are known as being friends of the bad guys, specifically with their involvement in botnets.

A study recently showed that 1 out of 12.5 million spam emails were responded to. This corresponds to a response rate of 8×10^-8, or 0.000008%. Not worth it for spammers, you think? The study estimates that this corresponds to revenue of 7,000$ per day. It’s fair to say that quantity comes before quality.

There seems to be money to be made, and until this is no longer true spam will stay with us.

For some reason or another this news reminded me of a great Woody Allen quote from Annie Hall:

[In California]
Annie Hall: It’s so clean out here.
Alvy Singer: That’s because they don’t throw their garbage away, they turn it into television shows.
