Monthly Archive for April, 2009
This is the third edition of my weekly PECL package series. Check out my Scream article as well as my Sphinx article to learn about these extensions.
If you have ever inherited spaghetti code or, worse, written it, this article is for you. It is an introduction to the Inclued PECL extension, which helps answer the common question “Where is this include coming from?”, something I have asked myself on more than one project.
This extension works by overriding an opcode in Zend, allowing it to log information regarding which files are being included, and from where. This information can be collected using a single function named inclued_get_data() or by setting inclued.dumpdir in php.ini to dump the data of each request.
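As a rough illustration of the in-script route, the sketch below groups the include records by the file that issued them. The record keys ('filename', 'fromfile', 'fromline') are my assumption about what inclued_get_data() returns, and group_includes_by_source() is a hypothetical helper, not part of the extension. When the extension is not loaded the script falls back to a hand-written sample so it still runs.

```php
<?php
// Hypothetical helper: map each including file to the files it pulled in.
// Assumption: every record carries 'fromfile' and 'filename' keys.
function group_includes_by_source($data)
{
    $grouped = array();
    $records = (is_array($data) && isset($data['includes'])) ? $data['includes'] : array();
    foreach ($records as $record) {
        $grouped[$record['fromfile']][] = $record['filename'];
    }
    return $grouped;
}

// Use live data when the extension is loaded, otherwise a hand-written sample.
if (function_exists('inclued_get_data')) {
    $data = inclued_get_data();
} else {
    $data = array('includes' => array(
        array('operation' => 'include', 'filename' => 'x.php', 'fromfile' => '/tmp/z.php', 'fromline' => 2),
    ));
}

print_r(group_includes_by_source($data));
```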
The final step involves graphing this data to get a view of the include hierarchy. This can be done by converting the dumped data (a serialized copy of what inclued_get_data() returns) into a dot language file, and then converting that to an image or viewing it with an application such as Graphviz.
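To make the conversion less mysterious, here is a minimal sketch of turning include records into dot source. It is not the gengraph.php script that ships with the package, only an illustration. The includes_to_dot() name and the record keys ('fromfile', 'filename', 'opened_path') are assumptions on my part.

```php
<?php
// Hypothetical helper: emit one dot edge per include, from includer to included file.
// Assumption: records carry 'fromfile' and 'filename', and optionally 'opened_path'.
function includes_to_dot($data)
{
    $lines = array('digraph includes {');
    $records = (is_array($data) && isset($data['includes'])) ? $data['includes'] : array();
    foreach ($records as $record) {
        // Prefer the resolved path when it is present.
        $target = !empty($record['opened_path']) ? $record['opened_path'] : $record['filename'];
        $lines[] = sprintf(
            '    "%s" -> "%s";',
            addcslashes($record['fromfile'], '"\\'),
            addcslashes($target, '"\\')
        );
    }
    $lines[] = '}';
    return implode("\n", $lines) . "\n";
}

$sample = array('includes' => array(
    array('filename' => 'x.php', 'opened_path' => '/tmp/x.php', 'fromfile' => '/tmp/z.php', 'fromline' => 2),
));
echo includes_to_dot($sample);
```

Saving that output to a file gives something the dot command can render.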
To start, we need to install the inclued PECL extension:
| mbpro:~ chehodgins$ sudo pecl install inclued-alpha
downloading inclued-0.1.0.tar ...
[...]
install ok: channel://pecl.php.net/inclued-0.1.0 |
Then add the following to php.ini and restart Apache:
| extension=inclued.so
inclued.enabled=1
inclued.dumpdir=/tmp |
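Before loading the page, it can save some head scratching to confirm that PHP actually picked up these settings. The sketch below is a generic check using only core PHP functions; describe_extension() is a hypothetical helper name. Keep in mind that the command line and Apache may read different php.ini files, so run it through the web server if that is where you are profiling.

```php
<?php
// Hypothetical helper: report whether an extension is loaded and what its
// ini settings currently resolve to.
function describe_extension($name, array $settings)
{
    $report = array('loaded' => extension_loaded($name));
    foreach ($settings as $setting) {
        // ini_get() returns false for settings PHP does not know about.
        $report[$setting] = ini_get($setting);
    }
    return $report;
}

var_dump(describe_extension('inclued', array('inclued.enabled', 'inclued.dumpdir')));
```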
Next, in our web browser we load the page that we wish to analyze. A file named inclued.*.* will be added to /tmp. We will convert this to a dot file using the gengraph.php script that is included in the PECL package:
| mbpro:tmp chehodgins$ php /usr/local/lib/php/gengraph.php -i inclued.00196.2
Written inclued.out.dot...
To generate images: dot -Tpng -o inclued.png inclued.out.dot |
Now we have the choice of either creating a PNG with the dot command or simply opening the dot file in a Graphviz viewer.
| mbpro:tmp chehodgins$ dot -Tpng -o ~/Documents/inclued.png inclued.out.dot |
And a super nice graph of the includes is generated as an image. Here is the graph of the includes in WordPress:

Inclued run on Wordpress
Notice that there are a lot of includes, but in general there appears to be order. Now let’s check out osCommerce:

Inclued on osCommerce
This also looks decent. What about Magento?

Inclued in Magento
Holy crap, that's a lot of includes!
In conclusion, the inclued PECL extension can be useful in many situations, from trying to understand how and
why a file is being included, to reorganizing your includes by seeing the dependencies. If anything, it can be an
easy way to show off your application/framework’s include structure.
Tags: inclued, pecl, php
This is the second edition of my weekly PECL package series. See last week’s post to learn about the Scream extension.
This week’s topic will be on Full-Text searching using Sphinx, specifically with the PHP client extension written by Antony Dovgal and released as a 1.0 PECL package in late January 2009.
Background
Sphinx is an open source full-text search engine. It provides an alternative to MySQL full-text searching. Its main features include high search speed (avg query is under 0.1 sec on 2-4 GB text collections), high scalability (up to 100 GB of text, up to 100 M documents on a single CPU) and, most importantly, native support for MySQL (MyISAM and InnoDB) and PostgreSQL. It has also proven its worth, considering that it is used on web sites such as Craigslist, Netlog, and The Pirate Bay.
Sphinx Install
There are two methods of using Sphinx in PHP: using the PHP API or using the native libraries with the PECL package. We will of course be covering the PECL version.
Installing a basic version of Sphinx is easy:
| mbpro:sphinx-0.9.8.1 chehodgins$ sudo ./configure --prefix=/usr/local/share/sphinx --with-mysql=/usr/local/share/mysql/
mbpro:sphinx-0.9.8.1 chehodgins$ sudo make
mbpro:sphinx-0.9.8.1 chehodgins$ sudo make install |
Next, a data source and an index must be defined in the sphinx.conf configuration file. I have added a table named `track` to my MySQL database with 7.8 million track names.
| mbpro:etc chehodgins$ sudo cp sphinx.conf.dist sphinx.conf
mbpro:etc chehodgins$ sudo vi sphinx.conf |
In sphinx.conf:
| source track
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass = root
sql_db = test
sql_port = 3306
sql_sock = /Applications/MAMP/tmp/mysql/mysql.sock
sql_query_pre = SET NAMES utf8
# the data to be indexed
sql_query = SELECT id, name, length, year FROM track;
}
index track_index
{
# document source(s) to index
source = track
# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
path = /usr/local/share/sphinx/var/data/track_index
min_word_len = 1
} |
We can now index our data and start the Sphinx server:
| mbpro:sphinx chehodgins$ sudo bin/indexer track_index
mbpro:sphinx chehodgins$ sudo /usr/local/share/sphinx/bin/searchd
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/share/sphinx/etc/sphinx.conf'...
creating server socket on 0.0.0.0:3312 |
Indexing took 1 minute for 7.8 million rows (204 MB of data), at a speed of 116655.16 docs/sec! Note that indexing should be repeated at regular intervals, depending on how fresh the data is required to be.
PHP/PECL Install
With our data indexed we must now get access to the Sphinx API. This is done using the Sphinx PECL extension. Before installing the PECL package we must install libsphinxclient, which is included in the Sphinx distribution:
| mbpro:libsphinxclient chehodgins$ cd sphinx-0.9.8.1/api/libsphinxclient/
mbpro:libsphinxclient chehodgins$ LIBTOOLIZE=glibtoolize sudo ./buildconf.sh
mbpro:libsphinxclient chehodgins$ sudo ./configure && sudo make install |
Now we are ready to install the PECL package:
| mbpro:~ chehodgins$ sudo pecl install sphinx |
Now it must be added to php.ini (extension=sphinx.so).
Restart Apache and check that it is installed:

Sphinx in phpinfo
Now it’s simply a matter of using the Sphinx function reference on php.net to query your dataset.
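| <?php
$sphinx = new SphinxClient();
$sphinx->setServer("localhost", 3312); // host and port that searchd is listening on
$sphinx->setMatchMode(SPH_MATCH_ALL); // every query word must match
$sphinx->setMaxQueryTime(500); // Limit query to 500 milliseconds
$sphinx->setLimits(0, 10, 1000); // return first 10 results, keeping at most 1000 matches
$result = $sphinx->query('Ride the Lightning');
if ($result === false) {
// query() returns false on failure; getLastError() says why
die('Search failed: ' . $sphinx->getLastError());
}
// 'matches' may be absent when nothing was found
var_dump(isset($result['matches']) ? $result['matches'] : array());
echo $result['total_found'] . ' total results found.';
?> |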
Thanks to the Sphinx log, you can see that the query executed in 0.042 seconds:
| [Sat Apr 18 01:33:58.878 2009] 0.042 sec [all/0/rel 160 (0,10)] [*] Ride the Lightning |
Conclusion
The example was kept simple, but queries can be refined even more using SQL-like methods of the Sphinx API. Notably, setGroupBy() will do the equivalent of GROUP BY and ORDER BY. Also, setFilter() will add extra filtering on other columns in the dataset.
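As a sketch of what that refinement might look like, the snippet below filters on year and groups the results by year. It assumes `year` has been declared as an attribute in sphinx.conf (for example with sql_attr_uint = year), which the configuration shown earlier does not do. The search_tracks_by_year() helper and the sample years are made up for illustration. The live part only runs when the sphinx extension is loaded.

```php
<?php
// Hypothetical helper: search track names, restricted to certain years and
// grouped by year. Assumes "year" is declared as an attribute in sphinx.conf.
function search_tracks_by_year($sphinx, $query, array $years)
{
    // Keep only documents whose "year" attribute is one of $years.
    $sphinx->setFilter('year', $years);
    // One result per year, largest groups first.
    $sphinx->setGroupBy('year', SPH_GROUPBY_ATTR, '@count desc');
    return $sphinx->query($query, 'track_index');
}

// Only runs when the sphinx PECL extension is loaded.
if (class_exists('SphinxClient')) {
    $sphinx = new SphinxClient();
    $sphinx->setServer('localhost', 3312);
    $result = search_tracks_by_year($sphinx, 'lightning', array(1984, 1985));
    if ($result === false) {
        echo 'Search failed: ' . $sphinx->getLastError() . "\n";
    } else {
        echo $result['total_found'] . " results found.\n";
    }
}
```

Passing the client in as an argument keeps the helper easy to try against a stub object when no searchd is running.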
This is just the tip of the iceberg of the different ways that Sphinx can be used. The easy integration with MySQL combined with the ease of setup makes it a logical next step when MySQL’s full-text indexing performance degrades. It also appears capable of scaling to the needs of the top-tier websites out there. As such, I would seriously consider Sphinx when looking for solutions to your searching needs.
Finally, it would be worthwhile to explore alternatives such as Lucene (Java), Solr (Java), and Marjory (PHP).
Tags: pecl, php, sphinx
This is the first of what is planned to be a weekly post on a more or less random PECL package. The idea is for me to get to know some PECL packages in more detail and for you to get to know some PECL packages in more detail – without losing your precious time.
For the first edition of this series I will cover the relatively new PECL package aptly named Scream. The purpose of this extension is to, well, scream. It disables the silence operator (@) so that any hidden errors are still shown. After this, you may scream at whoever used the silence operator in the first place, thus the name Scream (Just kidding?).
Let's get started…
| che-hodginss-macbook-pro:~ chehodgins$ sudo pecl install scream-alpha |
After a few minutes…
| Build process completed successfully
Installing '/usr/local/lib/php/extensions/no-debug-non-zts-20060613/scream.so'
install ok: channel://pecl.php.net/scream-0.1.0 |
Great, now add it to php.ini and restart Apache:
| extension=scream.so
scream.enabled=1 |
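The php.ini entry is probably not the only switch. As far as I can tell, scream.enabled can also be changed at runtime with ini_set(), which is handy when you only want to scream around one suspicious block of code. Treat the sketch below as an assumption to verify on your own setup. set_scream() is a hypothetical helper and the file path is made up.

```php
<?php
// Hypothetical helper: switch scream on or off for the current request.
// Returns true when the setting was applied, false when it was not
// (for example because the scream extension is not loaded).
function set_scream($enabled)
{
    if (!extension_loaded('scream')) {
        return false;
    }
    return ini_set('scream.enabled', $enabled ? '1' : '0') !== false;
}

set_scream(true);
// With scream active, this warning should be shown despite the @.
$fh = @fopen('/no/such/file', 'r');
set_scream(false);
```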
Check phpinfo and we are ready to go:

I'm new to macs and just discovered taking screenshots of portions of the screen (Apple key ⌘ + Shift + 4). Very cool.
Now we will borrow some code from some open source projects that use the silence operator and see what happens.
| <?php
ini_set('display_errors', 1);
error_reporting(E_ALL | E_STRICT);
echo 'starting... ';
// Initialize
$host = $user = $password = $sock = $port = $errno = $errstr = $response = '';
// From Joomla!
if (!($resource = @mysql_connect( $host, $user, $password, true ))) {
// ...
}
// From Wordpress
$response .= @ fread ( $sock, 8192 );
// From Joomla!
@ dl('bz2.so');
// From Wordpress
$sock = @fsockopen($host, $port, $errno, $errstr);
echo "done.\n";
?> |
With scream.enabled = 0 we get this lovely output:
| che-hodginss-macbook-pro:www chehodgins$ php -f scream.php
starting... done.
che-hodginss-macbook-pro:www chehodgins$ |
And with scream.enabled = 1:
| che-hodginss-macbook-pro:www chehodgins$ php -f scream.php
starting...
Warning: fread(): supplied argument is not a valid stream resource in /Users/chehodgins/www/scream.php on line 17
Warning: dl(): Unable to load dynamic library '/Applications/MAMP/bin/php5/lib/php/extensions/no-debug-non-zts-20050922/bz2.so' - (null) in /Users/chehodgins/www/scream.php on line 20
Warning: fsockopen() expects parameter 2 to be long, string given in /Users/chehodgins/www/scream.php on line 23
done.
che-hodginss-macbook-pro:www chehodgins$ |
It should be obvious by now that, in general, it is not advisable to use the silence operator. Most PHP programmers have been burnt by it a few times and are usually much harsher towards those who think this feature is useful. I can recall spending lots of time bug hunting before finding an @ which led me to a simple error. It's a painful experience, don’t do it.
As a programmer you may already steer clear of the silence operator, but much code is inherited. Because the silence operator is a single character, it can be hard to track down where it is used in your code. Try searching for ‘@’ in one of your projects: how many thousands of results do you get? That is one reason to install this on your dev box and find those tough bugs before they hit production.
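A plain text search drowns in e-mail addresses and docblock tags. One way around that, sketched here with PHP's built-in tokenizer, is to only count an ‘@’ when the parser sees it as an operator. find_silence_operators() is a hypothetical helper and the sample source is made up. It complements Scream rather than replacing it.

```php
<?php
// Hypothetical helper: return the line numbers on which the silence
// operator (@) is used in a piece of PHP source.
function find_silence_operators($source)
{
    $lines = array();
    $line = 1;
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens carry their starting line; move past any newlines they contain.
            $line = $token[2] + substr_count($token[1], "\n");
        } elseif ($token === '@') {
            // Single-character tokens have no line info, so use the running count.
            $lines[] = $line;
        }
    }
    return $lines;
}

$source = <<<'PHP'
<?php
// contact: me@example.com
$email = 'me@example.com';
$fh = @fopen('/no/such/file', 'r');
PHP;

// The @ characters in the comment and the string are ignored; only line 4 is reported.
print_r(find_silence_operators($source));
```

To scan a real project, feed it file_get_contents() of each .php file.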
And just in case you are still not convinced, check out “Five reasons the shut-up operator (@) should be avoided” by Derick Rethans.
Tags: pecl, php