<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Pumping Up Your Applications with Xapian Full-Text Search</title>
	<atom:link href="http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/</link>
	<description>The Ramblings of a Freelance Software Developer</description>
	<pubDate>Sun, 12 Oct 2008 13:41:11 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: dperales</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6804</link>
		<dc:creator>dperales</dc:creator>
		<pubDate>Sat, 13 Sep 2008 00:25:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6804</guid>
		<description>Hey, I've got a problem: it seems that when I do "author:john wayne", "john wayne" is not treated as a whole term.
I tried "author:'john wayne'", but nothing. What could the problem be?

I should add that when I retrieve all terms, "john wayne" is shown as a whole, and "author:john" is not returning anything.</description>
		<content:encoded><![CDATA[<p>Hey, I&#8217;ve got a problem: it seems that when I do &#8220;author:john wayne&#8221;, &#8220;john wayne&#8221; is not treated as a whole term.<br />
I tried &#8220;author:&#8217;john wayne&#8217;&#8221;, but nothing. What could the problem be?</p>
<p>I should add that when I retrieve all terms, &#8220;john wayne&#8221; is shown as a whole, and &#8220;author:john&#8221; is not returning anything.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david petar novakovic: attempted axiomatisation</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6713</link>
		<dc:creator>david petar novakovic: attempted axiomatisation</dc:creator>
		<pubDate>Fri, 04 Jan 2008 10:20:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6713</guid>
		<description>&lt;strong&gt;Installing xapian and its bindings on OS X 10.5 Leopard...&lt;/strong&gt;

Well I&#8217;ve had an interesting few days where I got so frustrated with Leopard that I switched back to Tiger. Upon trying to install many libs that I need in Tiger I realised that Leopard is actually great for developers and had to switch back to i...</description>
		<content:encoded><![CDATA[<p><strong>Installing xapian and its bindings on OS X 10.5 Leopard&#8230;</strong></p>
<p>Well I&#8217;ve had an interesting few days where I got so frustrated with Leopard that I switched back to Tiger. Upon trying to install many libs that I need in Tiger I realised that Leopard is actually great for developers and had to switch back to i&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sridhar Ratnakumar</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6695</link>
		<dc:creator>Sridhar Ratnakumar</dc:creator>
		<pubDate>Fri, 26 Oct 2007 08:02:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-6695</guid>
		<description>&lt;code&gt;stem_word&lt;/code&gt; is &lt;a href="http://www.xapian.org/docs/deprecation.html" rel="nofollow"&gt;deprecated&lt;/a&gt;. Replace that line with &lt;code&gt;stemmer(term.group()),&lt;/code&gt;.</description>
		<content:encoded><![CDATA[<p><code>stem_word</code> is <a href="http://www.xapian.org/docs/deprecation.html" rel="nofollow">deprecated</a>. Replace that line with <code>stemmer(term.group()),</code>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard Boulton</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-5675</link>
		<dc:creator>Richard Boulton</dc:creator>
		<pubDate>Tue, 22 May 2007 06:28:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-5675</guid>
		<description>Just to add to the comments about thread safety - while a Xapian database can indeed only safely be accessed from one thread at a time, we (the Xapian developers) have ensured that the cost of creating a Xapian database object is very low (it basically involves opening a few files, and no complex processing), and there are no global variables used or anything like that, so it is perfectly reasonable to open the database for each search request and handle each request in a separate thread.  I've built many multi-threaded applications in Python which search Xapian databases.

You are restricted to having only one instance of a database open for writing at a time, but as many instances of a database for reading as you like can be open (from a single process, or from many).

We intentionally removed thread handling from the innards of Xapian because it was error-prone and imposed an overhead on all searches, whether single or multi threaded.</description>
		<content:encoded><![CDATA[<p>Just to add to the comments about thread safety - while a Xapian database can indeed only safely be accessed from one thread at a time, we (the Xapian developers) have ensured that the cost of creating a Xapian database object is very low (it basically involves opening a few files, and no complex processing), and there are no global variables used or anything like that, so it is perfectly reasonable to open the database for each search request and handle each request in a separate thread.  I&#8217;ve built many multi-threaded applications in Python which search Xapian databases.</p>
<p>You are restricted to having only one instance of a database open for writing at a time, but as many instances of a database for reading as you like can be open (from a single process, or from many).</p>
<p>We intentionally removed thread handling from the innards of Xapian because it was error-prone and imposed an overhead on all searches, whether single or multi threaded.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Lu</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4210</link>
		<dc:creator>Chris Lu</dc:creator>
		<pubDate>Sun, 22 Apr 2007 05:17:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4210</guid>
		<description>I would argue Lucene is faster, because it is thread-safe. The number of concurrent search threads Xapian can handle is limited by the number of processes you create beforehand, while Lucene has no such limit.

And the misconception that Java == slow is so last century...

Chris Lu
http://www.dbsight.net</description>
		<content:encoded><![CDATA[<p>I would argue Lucene is faster, because it is thread-safe. The number of concurrent search threads Xapian can handle is limited by the number of processes you create beforehand, while Lucene has no such limit.</p>
<p>And the misconception that Java == slow is so last century&#8230;</p>
<p>Chris Lu<br />
<a href="http://www.dbsight.net" rel="nofollow">http://www.dbsight.net</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sally</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4206</link>
		<dc:creator>sally</dc:creator>
		<pubDate>Sun, 22 Apr 2007 02:12:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4206</guid>
		<description>Hello Nadav,

Thanks for doing a great job of explaining Xapian.

Can you please tell me what the merits are of using Xapian as opposed to, say, Lucene or some other full-text search engine?

I'm in the process of looking for a robust, very scalable full-text search engine, and my product will be using the engine through its APIs.

Do you have any opinions? I was leaning towards Lucene, but maybe I should look further. (My concern with Lucene is that, since it is Java-based, it may not be fast enough.)

Thanks

Sally</description>
		<content:encoded><![CDATA[<p>Hello Nadav,</p>
<p>Thanks for doing a great job of explaining Xapian.</p>
<p>Can you please tell me what the merits are of using Xapian as opposed to, say, Lucene or some other full-text search engine?</p>
<p>I&#8217;m in the process of looking for a robust, very scalable full-text search engine, and my product will be using the engine through its APIs.</p>
<p>Do you have any opinions? I was leaning towards Lucene, but maybe I should look further. (My concern with Lucene is that, since it is Java-based, it may not be fast enough.)</p>
<p>Thanks</p>
<p>Sally</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nadav Samet</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4114</link>
		<dc:creator>Nadav Samet</dc:creator>
		<pubDate>Thu, 19 Apr 2007 05:58:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4114</guid>
		<description>Hi Mark,

Great question. index() takes a period argument (say, 1 hour) and then operates on all documents changed in that period. So it is meant to run as a cron job every hour.

I believe that on a high-load system, indexing documents in batches is faster than indexing on every change.

If we ran the indexer in the same process as the search server, it would block searches while it indexes (and vice versa). Running the indexer in its own process could be an alternative to cron, but I'm not sure what the benefits would be.</description>
		<content:encoded><![CDATA[<p>Hi Mark,</p>
<p>Great question. index() takes a period argument (say, 1 hour) and then operates on all documents changed in that period. So it is meant to run as a cron job every hour.</p>
<p>I believe that on a high-load system, indexing documents in batches is faster than indexing on every change.</p>
<p>If we ran the indexer in the same process as the search server, it would block searches while it indexes (and vice versa). Running the indexer in its own process could be an alternative to cron, but I&#8217;m not sure what the benefits would be.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4088</link>
		<dc:creator>Mark</dc:creator>
		<pubDate>Wed, 18 Apr 2007 18:04:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-4088</guid>
		<description>Is there any reason why you didn't implement the indexer on Twisted?
The way you implemented it, there is no incremental update to the index. I am trying to index objects as they are created or updated, and am thinking of using Twisted. Any thoughts on how I might approach this?

Thank you.</description>
		<content:encoded><![CDATA[<p>Is there any reason why you didn&#8217;t implement the indexer on Twisted?<br />
The way you implemented it, there is no incremental update to the index. I am trying to index objects as they are created or updated, and am thinking of using Twisted. Any thoughts on how I might approach this?</p>
<p>Thank you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: thesamet</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-3602</link>
		<dc:creator>thesamet</dc:creator>
		<pubDate>Sun, 01 Apr 2007 22:18:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-3602</guid>
		<description>Hi Peter,

Great question. The short answer is: yes, only one search query is handled at a time, but &lt;a href="http://www.thesamet.com/multitasking.png" rel="nofollow"&gt;sometimes multiple threads don't help too much&lt;/a&gt;. Here is the long answer.

Let's start with a word about Twisted servers. In general, single-threaded servers can handle many requests at the same time. The basic idea is that their execution flow is event-driven, instead of thread-driven (or system scheduler driven). All events are handled on the same thread, and the assumption is that handling an event is a short operation.

As Xapian is not thread-safe by design, there is no way to interleave accesses to the database, and each query is handled from start to finish, regardless of the server's implementation.

So if needed, this solution can be scaled by starting several instances of the search server (possibly even on several machines) and load-balancing.

As experience has taught me, you never become Google overnight. It's best to stick with the simplest thing as long as it works.

I use the &lt;a href="http://txfx.net/code/wordpress/subscribe-to-comments/" rel="nofollow"&gt;Subscribe to Comments plug-in&lt;/a&gt;.

Nadav</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>Great question. The short answer is: yes, only one search query is handled at a time, but <a href="http://www.thesamet.com/multitasking.png" rel="nofollow">sometimes multiple threads don&#8217;t help too much</a>. Here is the long answer.</p>
<p>Let&#8217;s start with a word about Twisted servers. In general, single-threaded servers can handle many requests at the same time. The basic idea is that their execution flow is event-driven, instead of thread-driven (or system scheduler driven). All events are handled on the same thread, and the assumption is that handling an event is a short operation.</p>
<p>As Xapian is not thread-safe by design, there is no way to interleave accesses to the database, and each query is handled from start to finish, regardless of the server&#8217;s implementation.</p>
<p>So if needed, this solution can be scaled by starting several instances of the search server (possibly even on several machines) and load-balancing.</p>
<p>As experience has taught me, you never become Google overnight. It&#8217;s best to stick with the simplest thing as long as it works.</p>
<p>I use the <a href="http://txfx.net/code/wordpress/subscribe-to-comments/" rel="nofollow">Subscribe to Comments plug-in</a>.</p>
<p>Nadav</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter</title>
		<link>http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-3600</link>
		<dc:creator>Peter</dc:creator>
		<pubDate>Sun, 01 Apr 2007 21:19:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.thesamet.com/blog/2007/02/04/pumping-up-your-applications-with-xapian-full-text-search/#comment-3600</guid>
		<description>When you say: "single-threaded implementation, the Twisted framework makes it extremely", do you mean only one search query can be handled at any one time? If there is no concurrency, how does this scale in the internet world? Also, what plugin are you using for this blog that allows "subscribe to comments via email"?

Thank you</description>
		<content:encoded><![CDATA[<p>When you say: &#8220;single-threaded implementation, the Twisted framework makes it extremely&#8221;, do you mean only one search query can be handled at any one time? If there is no concurrency, how does this scale in the internet world? Also, what plugin are you using for this blog that allows &#8220;subscribe to comments via email&#8221;?</p>
<p>Thank you</p>
]]></content:encoded>
	</item>
</channel>
</rss>

