<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Che Hodgins &#187; sphinx</title>
	<atom:link href="http://www.chehodgins.com/category/sphinx/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.chehodgins.com</link>
	<description>Musings on Web Development</description>
	<lastBuildDate>Tue, 24 Nov 2009 02:39:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Search improvements using Sphinx, MySQL and PECL</title>
		<link>http://www.chehodgins.com/php/search-improvements-using-sphinx-mysql-and-pecl/</link>
		<comments>http://www.chehodgins.com/php/search-improvements-using-sphinx-mysql-and-pecl/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 13:21:42 +0000</pubDate>
		<dc:creator>chehodgins</dc:creator>
				<category><![CDATA[pecl]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sphinx]]></category>

		<guid isPermaLink="false">http://www.chehodgins.com/?p=164</guid>
		<description><![CDATA[This is the second edition of my weekly PECL package series. See last week&#8217;s post to learn about the Scream extension.
This week&#8217;s topic will be on Full-Text searching using Sphinx, specifically with the PHP client extension written by Antony Dovgal and released as a 1.0 PECL package in late January 2009.
Background
Sphinx is an open source [...]]]></description>
			<content:encoded><![CDATA[<p><strong>This is the second edition of my weekly PECL package series. See <a href="http://www.chehodgins.com/programming/php/weekly-pecl-package-scream/">last week&#8217;s post</a> to learn about the Scream extension.</strong></p>
<p>This week&#8217;s topic will be on Full-Text searching using Sphinx, specifically with the <a href="http://pecl.php.net/package/sphinx">PHP client extension</a> written by Antony Dovgal and released as a 1.0 PECL package in late January 2009.</p>
<p><strong>Background</strong></p>
<p>Sphinx is an open source full-text search engine and an alternative to MySQL&#8217;s built-in full-text searching. Its main features include high search speed (the average query takes under 0.1 sec on 2-4 GB text collections), high scalability (up to 100 GB of text, up to 100 M documents on a single CPU) and, most importantly, <a href="http://www.sphinxsearch.com/docs/current.html#intro">native support for MySQL</a> (MyISAM and InnoDB) and PostgreSQL. It has also proven its worth in production on web sites such as <a href="http://jeremy.zawodny.com/blog/archives/010869.html">Craigslist</a>, <a href="http://www.slideshare.net/_jayme/scaling-optimizing-search-on-netlog-presentation">Netlog</a>, and <a href="http://www.sphinxsearch.com/powered.html">The Pirate Bay</a>.</p>
<p><strong>Sphinx Install</strong></p>
<p>There are two methods of using Sphinx in PHP: using the PHP API or using the native libraries with the PECL package. We will of course be covering the PECL version <img src='http://www.chehodgins.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Installing a basic version of Sphinx is easy:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mbpro:sphinx-0.9.8.1 chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> .<span style="color: #000000; font-weight: bold;">/</span>configure <span style="color: #660033;">--prefix</span>=<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>share<span style="color: #000000; font-weight: bold;">/</span>sphinx <span style="color: #660033;">--with-mysql</span>=<span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>share<span style="color: #000000; font-weight: bold;">/</span>mysql<span style="color: #000000; font-weight: bold;">/</span><br />
mbpro:sphinx-0.9.8.1 chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">make</span><br />
mbpro:sphinx-0.9.8.1 chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">make</span> <span style="color: #c20cb9; font-weight: bold;">install</span></div></td></tr></tbody></table></div>
<p>Next, a data source and an index must be defined in the sphinx.conf configuration file. I have added a table named `track` to my MySQL database, containing 7.8 million track names.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mbpro:etc chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">cp</span> sphinx.conf.dist sphinx.conf<br />
mbpro:etc chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">vi</span> sphinx.conf</div></td></tr></tbody></table></div>
<p>In sphinx.conf:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;height:500px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">source track<br />
{<br />
type                                    = mysql<br />
<br />
sql_host                                = localhost<br />
sql_user                                = root<br />
sql_pass                                = root<br />
sql_db                                  = test<br />
sql_port                                = 3306<br />
<br />
sql_sock                                = /Applications/MAMP/tmp/mysql/mysql.sock<br />
sql_query_pre                 = SET NAMES utf8<br />
<br />
# the data to be indexed<br />
sql_query  = SELECT id, name, length, year FROM track;<br />
<br />
}<br />
<br />
index track_index<br />
{<br />
# document source(s) to index<br />
source                  = track<br />
<br />
# index files path and file name, without extension<br />
# mandatory, path must be writable, extensions will be auto-appended<br />
path                    = /usr/local/share/sphinx/var/data/track_index<br />
<br />
min_word_len            = 1<br />
}</div></td></tr></tbody></table></div>
<p>We can now index our data and start the sphinx server:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mbpro:sphinx chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> bin<span style="color: #000000; font-weight: bold;">/</span>indexer track_index<br />
mbpro:sphinx chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>local<span style="color: #000000; font-weight: bold;">/</span>share<span style="color: #000000; font-weight: bold;">/</span>sphinx<span style="color: #000000; font-weight: bold;">/</span>bin<span style="color: #000000; font-weight: bold;">/</span>searchd<br />
<br />
Sphinx 0.9.8.1-release <span style="color: #7a0874; font-weight: bold;">&#40;</span>r1533<span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
Copyright <span style="color: #7a0874; font-weight: bold;">&#40;</span>c<span style="color: #7a0874; font-weight: bold;">&#41;</span> 2001-2008, Andrew Aksyonoff<br />
<br />
using config <span style="color: #c20cb9; font-weight: bold;">file</span> <span style="color: #ff0000;">'/usr/local/share/sphinx/etc/sphinx.conf'</span>...<br />
creating server socket on 0.0.0.0:<span style="color: #000000;">3312</span></div></td></tr></tbody></table></div>
<p>Indexing the 7.8 million rows (204 MB of data) took about 1 minute, at a speed of 116655.16 docs/sec! Note that indexing should be repeated at regular intervals, depending on how fresh the data is required to be.</p>
<p><strong>PHP/PECL Install</strong></p>
<p>With our data indexed, we must now get access to the Sphinx API. This is done using the Sphinx PECL extension. Before installing the PECL package we must install libsphinxclient, which is included in the Sphinx distribution:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mbpro:libsphinxclient chehodgins$ <span style="color: #7a0874; font-weight: bold;">cd</span> sphinx-0.9.8.1<span style="color: #000000; font-weight: bold;">/</span>api<span style="color: #000000; font-weight: bold;">/</span>libsphinxclient<span style="color: #000000; font-weight: bold;">/</span><br />
mbpro:libsphinxclient chehodgins$ <span style="color: #007800;">LIBTOOLIZE</span>=glibtoolize <span style="color: #c20cb9; font-weight: bold;">sudo</span> .<span style="color: #000000; font-weight: bold;">/</span>buildconf.sh<br />
mbpro:libsphinxclient chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> .<span style="color: #000000; font-weight: bold;">/</span>configure <span style="color: #000000; font-weight: bold;">&amp;&amp;</span> <span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">make</span> <span style="color: #c20cb9; font-weight: bold;">install</span></div></td></tr></tbody></table></div>
<p>Now we are ready to install the PECL package:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">mbpro:~ chehodgins$ <span style="color: #c20cb9; font-weight: bold;">sudo</span> pecl <span style="color: #c20cb9; font-weight: bold;">install</span> sphinx</div></td></tr></tbody></table></div>
<p>The extension must then be enabled in php.ini:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #007800;">extension</span>=sphinx.so</div></td></tr></tbody></table></div>
<p>Restart Apache and check that the extension is loaded:</p>
<div id="attachment_165" class="wp-caption aligncenter" style="width: 645px"><a href="http://www.chehodgins.com/wp-content/uploads/2009/04/picture-2.png"><img class="size-full wp-image-165" title="sphinx in phpinfo" src="http://www.chehodgins.com/wp-content/uploads/2009/04/picture-2.png" alt="Sphinx in phpinfo" width="635" height="128" /></a><p class="wp-caption-text">Sphinx in phpinfo</p></div>
<p>Now it&#8217;s simply a matter of using the <a title="Sphinx api reference" href="http://php.net/sphinx">Sphinx function reference</a> on php.net to query your dataset.</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
<br />
<span style="color: #000088;">$sphinx</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> SphinxClient<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #000088;">$sphinx</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setServer</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;localhost&quot;</span><span style="color: #339933;">,</span> 3312<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #000088;">$sphinx</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setMatchMode</span><span style="color: #009900;">&#40;</span>SPH_MATCH_ALL<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #000088;">$sphinx</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setMaxQueryTime</span><span style="color: #009900;">&#40;</span>500<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// Limit query to 500 milliseconds</span><br />
<span style="color: #000088;">$sphinx</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setLimits</span><span style="color: #009900;">&#40;</span>0<span style="color: #339933;">,</span> 10<span style="color: #339933;">,</span> 1000<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// return first 10 results</span><br />
<br />
<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$sphinx</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">query</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Ride the Lightning'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #990000;">var_dump</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'matches'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">echo</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'total_found'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">' total results found.'</span><span style="color: #339933;">;</span><br />
<span style="color: #000000; font-weight: bold;">?&gt;</span></div></td></tr></tbody></table></div>
<p>The Sphinx query log shows that the query executed in 0.042 seconds:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #7a0874; font-weight: bold;">&#91;</span>Sat Apr <span style="color: #000000;">18</span> 01:<span style="color: #000000;">33</span>:<span style="color: #000000;">58.878</span> <span style="color: #000000;">2009</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #000000;">0.042</span> sec <span style="color: #7a0874; font-weight: bold;">&#91;</span>all<span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">0</span><span style="color: #000000; font-weight: bold;">/</span>rel <span style="color: #000000;">160</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">0</span>,<span style="color: #000000;">10</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span><span style="color: #000000; font-weight: bold;">*</span><span style="color: #7a0874; font-weight: bold;">&#93;</span> Ride the Lightning</div></td></tr></tbody></table></div>
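<p>The query script assumes that the search always succeeds. A slightly more defensive sketch follows: it checks the return value of query(), then looks the matched document ids up in MySQL, since Sphinx returns ids and attributes rather than the indexed text. The PDO connection details are assumptions copied from the sphinx.conf source block (MAMP socket, root/root, database test), and the empty() check on the matches key is a precaution on my part rather than documented behaviour:</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;?php<br />
<br />
$sphinx = new SphinxClient();<br />
$sphinx-&gt;setServer("localhost", 3312);<br />
$sphinx-&gt;setMatchMode(SPH_MATCH_ALL);<br />
$sphinx-&gt;setLimits(0, 10, 1000);<br />
<br />
// Search only the index defined in sphinx.conf<br />
$result = $sphinx-&gt;query('Ride the Lightning', 'track_index');<br />
<br />
// query() returns false on failure<br />
if ($result === false) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;die('Sphinx error: ' . $sphinx-&gt;getLastError());<br />
}<br />
<br />
if (empty($result['matches'])) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;die('No tracks found.');<br />
}<br />
<br />
// Matches are keyed by document id, i.e. the id column of sql_query<br />
$ids = array_keys($result['matches']);<br />
<br />
// Assumed connection details, taken from the source block in sphinx.conf<br />
$db = new PDO('mysql:unix_socket=/Applications/MAMP/tmp/mysql/mysql.sock;dbname=test', 'root', 'root');<br />
$marks = implode(',', array_fill(0, count($ids), '?'));<br />
$stmt = $db-&gt;prepare("SELECT id, name FROM track WHERE id IN ($marks)");<br />
$stmt-&gt;execute($ids);<br />
<br />
// Index the rows by id so they can be printed in the order Sphinx ranked them<br />
$names = array();<br />
foreach ($stmt-&gt;fetchAll(PDO::FETCH_ASSOC) as $row) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;$names[$row['id']] = $row['name'];<br />
}<br />
foreach ($ids as $id) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (isset($names[$id])) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo $id . ': ' . $names[$id] . "\n";<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}<br />
?&gt;</div></td></tr></tbody></table></div>
<p>Passing the index name as the second argument to query() restricts the search to track_index instead of every index the server knows about.</p>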
<p><strong>Conclusion</strong></p>
<p>The example was kept simple, but queries can be refined further using the SQL-like methods of the Sphinx API. Notably, setGroupBy() will do the equivalent of GROUP BY and ORDER BY. Also, setFilter() will add extra filtering on other columns in the dataset, provided those columns are declared as attributes in sphinx.conf.</p>
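<p>As a rough sketch of those two methods, the snippet here first filters and then groups on year. It assumes that the line sql_attr_uint = year has been added to the `track` source in sphinx.conf and that the index has been rebuilt, because filtering and grouping only work on attributes. The keyword and the year 1984 are merely illustrative values:</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;?php<br />
<br />
$sphinx = new SphinxClient();<br />
$sphinx-&gt;setServer("localhost", 3312);<br />
$sphinx-&gt;setMatchMode(SPH_MATCH_ALL);<br />
<br />
// Assumption: year is declared as an attribute (sql_attr_uint = year)<br />
// Roughly: WHERE year = 1984<br />
$sphinx-&gt;setFilter('year', array(1984));<br />
$result = $sphinx-&gt;query('Lightning', 'track_index');<br />
if ($result === false) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;die('Sphinx error: ' . $sphinx-&gt;getLastError());<br />
}<br />
echo $result['total_found'] . " matching tracks from 1984\n";<br />
<br />
// Roughly: GROUP BY year ORDER BY COUNT(*) DESC<br />
$sphinx-&gt;resetFilters();<br />
$sphinx-&gt;setGroupBy('year', SPH_GROUPBY_ATTR, '@count desc');<br />
$result = $sphinx-&gt;query('Lightning', 'track_index');<br />
if ($result === false) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;die('Sphinx error: ' . $sphinx-&gt;getLastError());<br />
}<br />
<br />
if (!empty($result['matches'])) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;foreach ($result['matches'] as $match) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// @groupby holds the group value, @count the number of matches in that group<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo $match['attrs']['@groupby'] . ': ' . $match['attrs']['@count'] . "\n";<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}<br />
?&gt;</div></td></tr></tbody></table></div>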
<p>This is only the tip of the iceberg of the different ways that Sphinx can be used. The easy integration with MySQL, combined with the ease of setup, makes it a logical next step when MySQL&#8217;s Full-Text indexing <a href="http://pooteeweet.org/blog/1359">performance degrades</a>. It also appears capable of scaling to the needs of the top-tier websites out there. As such, I would seriously consider Sphinx when looking for a solution to your search needs.</p>
<p>Finally, it would be worthwhile to explore alternatives such as <a href="http://lucene.apache.org/java/docs/">Lucene</a> (Java), <a title="Solr search" href="http://lucene.apache.org/solr/">Solr</a> (Java), and <a href="http://code.google.com/p/marjory/">Marjory</a> (PHP).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chehodgins.com/php/search-improvements-using-sphinx-mysql-and-pecl/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
