Che Hodgins // Musings on Web Development

Free and Fast Geolocation in PHP

Thursday, May 21, 2009 | geolocation, pecl, php

Geo* (as I call them) are the web technologies that provide a link between online content and Earth’s geography. Examples include Geocoding (finding latitude/longitude based on street addresses), Geotagging (tagging media with latitude/longitude coordinates), and Geolocation (finding the latitude/longitude of a computer).

Geolocation is a particularly cool technique because it lets you estimate a person’s geographic location and so provide a custom-tailored experience on your website, among other things. This can be as annoying as it is useful. There are several methods of Geolocation, some as simple as asking users where they are. This article focuses on adding IP-based Geolocation to your PHP website for free while keeping it fast.

Problems

If IP addresses are to be used to determine a person’s physical location, then a few possible problems come to mind:

  • How accurate is the mapping between an IP address and a geographical location?
    • From maxmind.com’s Geolocation service: “99.8% accurate on a country level, 90% accurate on a state level, and 83% accurate for the US within a 25 mile radius.” From some research, the mapping is built either from the address of the ISP that owns the IP [link], or from data bought from websites that ask users for their locations [link].
  • What about users behind proxies?
    • Some Geolocation databases flag the IPs of potential anonymous proxy servers.
    • Most proxy servers send X-Forwarded-For and Client-IP headers that you can use.

This is not perfect, but in many cases the approximate geographical location of a user can be inferred.
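As a rough sketch of the proxy point above, a site could look at the forwarding headers before falling back to the connecting address. The helper name guess_client_ip is hypothetical and not part of any GeoIP library. It assumes the headers arrive in a $_SERVER-style array as HTTP_X_FORWARDED_FOR and HTTP_CLIENT_IP. These headers are supplied by the client or proxy and can be forged, so the result is only a hint.

```php
<?php
// Hypothetical helper (not part of GeoIP): pick the most plausible public
// client IP from a $_SERVER-style array. Forwarding headers can be forged,
// so treat the result as a hint only.
function guess_client_ip(array $server): ?string
{
    $candidates = [];

    // X-Forwarded-For may hold a comma-separated chain: "client, proxy1, proxy2".
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        $candidates = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
    }
    if (!empty($server['HTTP_CLIENT_IP'])) {
        $candidates[] = trim($server['HTTP_CLIENT_IP']);
    }

    // Accept the first candidate that is a valid, public (non-private,
    // non-reserved) address.
    $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    foreach ($candidates as $ip) {
        if (filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false) {
            return $ip;
        }
    }

    // Fall back to the address of the machine that actually connected.
    $remote = $server['REMOTE_ADDR'] ?? null;
    if ($remote !== null && filter_var($remote, FILTER_VALIDATE_IP) !== false) {
        return $remote;
    }
    return null;
}
```

In a page this would be called as guess_client_ip($_SERVER), and the returned address passed to whichever lookup function is in use.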

Demo time

This demo will use the free Geolocation database provided by Maxmind.com. I believe this is the ideal choice for normal (i.e. not Facebook) websites for several reasons:

  • It is free (there is a paid version with higher accuracy)
  • It is fast. They report up to 1 million queries per second on one machine.
  • It is extensible. The database can be upgraded to the paid version by just replacing the binary.
  • They like developers. They provide implementations in over 10 different programming languages, with benchmarks.
  • Their website is full of valuable information. They provide benchmarks, an explanation of how they collect their data, and more. I haven’t seen this with any other IP Geolocation services.

There are two options for us PHP developers: the pure PHP library or a PECL package that wraps the C library. For reasons discussed below, the PECL package will be used. If you do not want to use a PECL package or are on a hosted server, then you can download the pure PHP classes here.

First, the GeoIP C library must be downloaded (link) and installed. Note that it can be installed on Windows as well. No special options are needed to install it:

mbpro:GeoIP-1.4.6 chehodgins$ sudo ./configure
mbpro:GeoIP-1.4.6 chehodgins$ sudo make
mbpro:GeoIP-1.4.6 chehodgins$ sudo make install



Then the PECL package can be installed:

mbpro:~ chehodgins$ sudo pecl install geoip
downloading geoip-1.0.7.tar ...
...
You should add "extension=geoip.so" to php.ini



Next, add this extension to php.ini (i.e. extension=geoip.so), restart Apache, and check phpinfo():

[Screenshot: GeoIP in phpinfo()]



The final step before writing code is to download the actual database. It is updated monthly, so remember to stay up to date. The directory that should contain the file is OS dependent, so create a quick PHP script to see where it is:

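<?php
// Show all errors so the GeoIP warning about the missing database is visible.
ini_set('display_errors', true);
error_reporting(E_ALL | E_STRICT);
// With no database installed yet, this call triggers the warning we want to read.
$result = geoip_record_by_name('72.30.81.165');
var_dump($result);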



Gives us:

[Screenshot: Determine the binary directory]


Now save the database file to the directory mentioned in the PHP warning, reload your script, and the warning should disappear. Let’s try again with some more code:
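Once the extension is loaded, the warning is not the only way to find the expected paths. The sketch below assumes that geoip_db_get_all_info() returns one entry per database type with 'available', 'description' and 'filename' keys, as the PHP manual describes. The helper summarize_geoip_databases is a hypothetical name introduced here for illustration.

```php
<?php
// Hypothetical helper: given an array shaped like the result of
// geoip_db_get_all_info(), list each database file the extension looks for
// and whether it was found.
function summarize_geoip_databases(array $info): array
{
    $lines = [];
    foreach ($info as $db) {
        $status = !empty($db['available']) ? 'found' : 'MISSING';
        $lines[] = sprintf('%-8s %s (%s)', $status, $db['filename'], $db['description']);
    }
    return $lines;
}

// Only call into the extension when it is actually loaded.
if (function_exists('geoip_db_get_all_info')) {
    echo implode(PHP_EOL, summarize_geoip_databases(geoip_db_get_all_info())), PHP_EOL;
}
```

Any line marked MISSING names a file that still has to be downloaded into place for the matching lookup functions to work.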

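<?php
// Show all errors while experimenting.
ini_set('display_errors', true);
error_reporting(E_ALL | E_STRICT);

// List every function the geoip extension provides.
$functions = get_extension_funcs('geoip');
var_dump($functions);

// Look up the full city-level record for one IP address.
$result = geoip_record_by_name('72.30.81.165');
var_dump($result);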



Gives:

array
0 => string 'geoip_database_info' (length=19)
1 => string 'geoip_country_code_by_name' (length=26)
2 => string 'geoip_country_code3_by_name' (length=27)
3 => string 'geoip_country_name_by_name' (length=26)
4 => string 'geoip_continent_code_by_name' (length=28)
5 => string 'geoip_org_by_name' (length=17)
6 => string 'geoip_record_by_name' (length=20)
7 => string 'geoip_id_by_name' (length=16)
8 => string 'geoip_region_by_name' (length=20)
9 => string 'geoip_isp_by_name' (length=17)
10 => string 'geoip_db_avail' (length=14)
11 => string 'geoip_db_get_all_info' (length=21)
12 => string 'geoip_db_filename' (length=17)
13 => string 'geoip_region_name_by_code' (length=25)
14 => string 'geoip_time_zone_by_country_and_region' (length=37)

array
'continent_code' => string 'NA' (length=2)
'country_code' => string 'US' (length=2)
'country_code3' => string 'USA' (length=3)
'country_name' => string 'United States' (length=13)
'region' => string 'CA' (length=2)
'city' => string 'Sunnyvale' (length=9)
'postal_code' => string '94089' (length=5)
'latitude' => float 37.4249000549
'longitude' => float -122.007400513
'dma_code' => int 807
'area_code' => int 408



With only an IP address we can easily get the country, postal code, longitude and latitude, and even the area code of the user.
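A record like the one above still needs some care before it is shown to a user, because a lookup can fail or return empty fields. Here is a minimal sketch, assuming the record has the keys shown in the output above and that a failed lookup yields false. The helper describe_location is a hypothetical name, not part of the extension.

```php
<?php
// Hypothetical helper: turn a record shaped like the geoip_record_by_name()
// result above into a short label, tolerating failed lookups and empty fields.
function describe_location($record): string
{
    // A failed lookup is expected to give false rather than an array.
    if (!is_array($record)) {
        return 'Unknown location';
    }

    // Keep only the non-empty parts, most specific first.
    $parts = array_filter([
        $record['city'] ?? '',
        $record['region'] ?? '',
        $record['country_name'] ?? '',
    ], 'strlen');

    return $parts ? implode(', ', $parts) : 'Unknown location';
}
```

For the sample record above this gives "Sunnyvale, CA, United States". The same fallback string covers both a failed lookup and a record with no usable fields.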

Performance

I initially thought that the PECL version would outperform the pure PHP version by a small percentage. I was wrong. The PECL version was much faster. Here are some informal benchmarks.

Library     Iterations  Total   Avg per request  Notes
PECL GeoIP  10,000      0.7s    0.007ms
Pure PHP    10,000      49.2s   4.92ms
PECL GeoIP  1           0.08ms  0.08ms           Typical real world usage
Pure PHP    1           2.4ms   2.4ms            Typical real world usage

As a validation of my results I benchmarked the pure PHP library being used in a web application and had comparable results to my benchmarks (5.9ms per IP lookup versus the 2.4ms above).
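The script used for these benchmarks is not shown, so the following is only a sketch of how such a measurement could be taken, not a reconstruction of it. The helper time_lookups is a hypothetical name. It accepts any callable that takes an IP string, so the PECL function or a wrapper around the pure PHP library can be timed the same way.

```php
<?php
// Hypothetical timing helper: run $lookup $iterations times and report the
// total and average wall-clock time in milliseconds.
function time_lookups(callable $lookup, int $iterations): array
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $lookup('72.30.81.165');
    }
    $totalMs = (microtime(true) - $start) * 1000;

    return [
        'iterations' => $iterations,
        'total_ms'   => $totalMs,
        'avg_ms'     => $iterations > 0 ? $totalMs / $iterations : 0.0,
    ];
}

// Example: time the PECL function when the extension is loaded.
if (function_exists('geoip_record_by_name')) {
    print_r(time_lookups('geoip_record_by_name', 10000));
}
```

Running it once with 10,000 iterations and once with a single iteration mirrors the two cases in the table. The single-iteration case includes any one-time cost of opening the database.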

Conclusion

Because of the ease of implementation, the low cost, and the minimal performance cost, there is much to be gained by adding IP Geolocation to your web application. The PECL package is the ideal configuration because it is faster and leaves less code to maintain. The pure PHP library is nonetheless still relatively fast and thus still worth using.

This is still far from a perfect solution. False positives can occur, anonymous proxies mess everything up, and IP addresses are constantly changing. Also, what about users who simply do not want to share their location? There are privacy issues. This is currently a hot topic: the W3C Geolocation API is being actively worked on, and Mozilla, Opera and others are working to improve location awareness on the web, something I am looking forward to.

More reading:

Interesting Geolocation presentation
GeoIP functions in the PHP manual
Cool Geo* stuff at Y!
