Rob Garrett - Blogs

Welcome to Rob Garrett - Blogs Sign in | Join | Help
in Search
Google

Software/Technology Discussion

Software and Technology Tid-bits

.TEXT Search - How To

I've just completed an "advanced searching" module for .TEXT

Just before Christmas I had added an elementary searching feature to my blogs that simply looked for a substring in the title and body of blog posts. Over the holiday I found time to enhance search to be a little more intelligent, so as to ignore text inside HTML tags. I cover the implementation specifics below....

I should start by indicating that my advanced search feature is somewhat involved. I wanted to author a solution that would perform most of the searching in SQL Server but still tie in nicely with the .TEXT framework. I achieved this by writing a SQL server extended stored procedure in C++ that called a C# in-process COM engine to perform the searching work.

The search source code can be downloaded here, and the code changes to .TEXT are illustrated below.

RobGarrett.Com.BlogSearch.dll - Extended Stored Procedure.

Anyone who tells you that Extended Stored Procedures has written a few of them before. ESP's are typically written in native C++ and can be fiddly to get right. Debugging them can be a real pain because SQL Server retains a handle to ESP DLLs when loaded and will not let go of them until the server is stopped, which means pulling down SQL Server for each new compile step after debugging. Fortunately, I had written an ESP before and knew what I was doing, so I installed a local copy of SQL Server, complete with .TEXT blog database, which I could tear down as often as I liked.

Extended Stored Procedures work much like regular stored procedures, they usually involve a query to data tables in the database and return or update a set of rows before completing. ESP's are stored and executed as DLLs, whereas regular stored procedures are stored and compiled with the database. In this case, I wanted an ESP that would return me a list of post identifiers from the blog_Content table, which matched a passed search term. My ESP is passed the blog identifier (each hosted blog in .TEXT has a unique identifier) along with the search term.

Once passed a valid search term and blog ID, my ESP will search the database for all posts containing the search term in the title or text body. Of those posts returned, the ESP will search the title and body looking for the search term in text that is not contained in tags braces ('<' and '>') - which is where the C# in-process COM DLL comes in. When my algorithm is satisfied that the post contains the search term in viewable text it returns the ID of the post as a data row, otherwise it skips the post and moves on to the next.

I could have written the entire search operation in the ESP, but chose to call out to a C# in-proc COM server to perform the physical database query and text searching, for a couple of  reasons:

* Since all ESP code runs in the process of SQL Server, any bad code will cause the server to shutdown - rendering my blogs off line. It's very easy to write unintentional bugs in C++ and to do bad things with memory pointers. C# provides more of a safety net, the framework takes better care of memory management.

* Executing SQL queries against the database in C++ requires using DBLIB (if not using a third party DAL or ODBC library dependency), which can make code complicated and prone to crash, when done wrong. C# provides much nicer access to the database through ADO.NET.

Calling out to a C# assembly required some work - I provided a COM interface wrapper around the assembly using .NET COM Interop and then called the assembly using IDispatch (more on calling C# from COM in a later post).

RobGarrett.Com.BlogSearchCom.dll - In-Proc COM DLL.

The code in this DLL was real easy. I used ADO.NET to connect with SQL Server and execute a stored procedure to return the title, body and ID of blog posts that contained a search term (anywhere in the post). I filter each post returned from the executed stored procedure, based on whether the search term was found in side HTML tag braces of the title or body text or not. If the search term exists my algorithm appends the post ID to a comma separated list, which is returned to the ESP and converted to a set of row data results.

In addition to the search filtering code, I also included code to self register the assembly as a COM component in the registry, add  the assembly to the GAC and create a type library. This code is executed when when passing the assembly path as an argument to InstallUtil.exe.

* After compiling both BlogSearch and BlogSearchCom DLLs, copy them to the Binn directory of the SQL Server installation.
* Run InstallUtil.exe RobGarrett.Com.BlogSearchCom.dll
* Add a new extended stored procedure to the master database, called xp_BlogSearch, with path to RobGarrett.Com.BlogSearch.dll in the SQL Server Binn directory.
* Give the database user running .TEXT, execute permission on the ESP just created.

The in-proc DLL requires a registry entry to access to the .TEXT database:

* Create the registry entry: HKEY_LOCAL_MACHINE\SOFTWARE\RobGarrett.Com\DotTextWeb.
* Add the string value ConnectionStr to the above key. Set the value of the connection string to the same as that used by .TEXT, in the Web.config file.

blog_SimpleSearch - Called from In-Proc DLL
.

This stored procedure searches the posts and comments for entries containing the search term anywhere in the text body or title. This SP also searches comments and returns the ID of the original post. Add it to the .TEXT database by executing the following SQL:

CREATE PROC dbo.blog_SimpleSearch
(
 @SearchTerm NVARCHAR(50),
 @BlogID INT
)
AS
 
CREATE TABLE #Results (ID INT)
INSERT INTO #Results
SELECT blog_Content.ID FROM blog_Content
WHERE blog_Content.PostType=1 AND blog_Content.BlogID = @BlogID AND blog_Content.PostConfig & 1 = 1
      AND (blog_Content.Title LIKE '%' + @SearchTerm + '%' OR
           blog_Content.Text LIKE '%' + @SearchTerm + '%')
INSERT INTO #Results
SELECT blog_Content.ParentID FROM blog_Content
WHERE blog_Content.PostType=3 AND blog_Content.BlogID = @BlogID AND blog_Content.PostConfig & 1 = 1
      AND (blog_Content.Title LIKE '%' + @SearchTerm + '%' OR
           blog_Content.Text LIKE '%' + @SearchTerm + '%') AND blog_Content.ParentID NOT IN
(SELECT DISTINCT ID FROM #Results)
 
SELECT blog_Content.ID, blog_Content.Title, blog_Content.Text FROM
#Results, blog_Content where #Results.ID = blog_Content.ID
ORDER BY blog_Content.DateAdded DESC
 
DROP TABLE #Results


blog_Search - Called from .TEXT.

This stored procedure calls the ESP and is called from .TEXT, add it to the .TEXT database by executing the following SQL:

CREATE PROC dbo.blog_Search -- '',0
(
      @SearchTerm NVARCHAR(50),
      @BlogID INT
)
AS
 
CREATE TABLE #results (ID INT)
INSERT INTO #results (ID)
EXEC master..xp_BlogSearch @SearchTerm, @BlogID
 
SELECT blog_Content.BlogID, blog_Content.[ID], blog_Content.Title, blog_Content.DateAdded, blog_Content.[Text], blog_Content.[Description],
blog_Content.SourceUrl, blog_Content.PostType, blog_Content.Author, blog_Content.Email, blog_Content.SourceName, blog_Content.DateUpdated, blog_Content.TitleUrl,
blog_Content.FeedBackCount, blog_Content.ParentID, blog_Content.PostConfig,
blog_Content.EntryName FROM blog_Content, #results
WHERE blog_Content.PostType=1 AND blog_Content.ID = #results.ID AND blog_Content.PostConfig & 1 = 1
AND blog_Content.BlogID = @BlogID
ORDER BY blog_Content.DateAdded DESC
 
DROP TABLE #results

Changes to .TEXT.

The following changes are required to be made to the .TEXT source code, I am using version  0.95.2004.102 as my baseline.

Add the following code to the Cacher class in Dottext.Common\Data\Cacher.cs:

private static readonly string EntrySearchKey = "EntrySearch:Terms{0}Blog{1}";
public static EntryCollection GetSearch(string searchTerms, CacheTime ct, HttpContext context)
{
    string key = string.Format(EntrySearchKey,searchTerms.Replace(" ", "_"),BlogID(context));
    EntryCollection search = null;
    try
    {
        search = (EntryCollection)context.Cache[key];
    }
    catch (InvalidCastException)
    {
        // Not sure why it's doing this, but what the hell.
        search = null;
    }
    if(search == null)
    {
        search = Entries.GetPostsBySearch(searchTerms);
        if(search != null)
        {
            Cacher.CacherCache(key,context,searchTerms,ct);
        }
    }
    return search;
}

Add the following code to DataDTOProvider class in Dottext.Framework\Data\DataDTOProvider.cs:

public EntryCollection GetEntriesBySearch(string searchTerms)
{
    IDataReader reader = DbProvider.Instance().GetPostsBySearch(searchTerms);
    try
    {
        EntryCollection ec = DataHelper.LoadEntryCollection(reader);
        return ec;
    }
    finally   
    {
        reader.Close();
    }
}

Add the following code to the IDbProvider interface in Dottext.Framework\Data\IDbProvider.cs:

IDataReader GetPostsBySearch(string searchTerms);

Add the following code to the IDTOProvider interface in Dottext.Framework\Data\IDTOProvider.cs:

EntryCollection GetEntriesBySearch(string searchTerms);

Add the following code to the SqlDataProvider class in Dottext.Framework\Data\SqlDataProvider.cs:

public IDataReader GetPostsBySearch(string searchTerms)
{
    SqlParameter[] p =
    {
          SqlHelper.MakeInParam("@SearchTerm",SqlDbType.NVarChar,50,searchTerms),BlogIDParam};
          return GetReader("blog_Search",p);   

        }
}

Add the following code to the Entries class in Dottext.Framework\Entries.cs::

public static EntryCollection GetPostsBySearch(string searchTerms)
{
    return DTOProvider.Instance().GetEntriesBySearch(searchTerms);
}

Create a Search.ascx file in each of the skin directories in DottextWeb\Skins, with the following code:

<%@ Control Language="c#" AutoEventWireup="false" Inherits="Dottext.Web.UI.Controls.Search" %>
<%@ Register TagPrefix="uc1" TagName="EntryList" Src="EntryList.ascx" %>

Create Search.cs in DottextWeb\UI\Controls with the following code:

namespace Dottext.Web.UI.Controls
{
      using System;
      using Dottext.Common.Data;

      ///


     
/// Search Control.
     
///
     
public  class Search : Dottext.Web.UI.Controls.BaseControl
      {          
            protected Dottext.Web.UI.Controls.EntryList Results; 
            protected override void OnLoad(EventArgs e)
            {
                  base.OnLoad (e);
                
                  string searchTerms = Request["q"];
                  if (null != searchTerms && searchTerms.Trim().Length > 0)
                  {
                        Results.EntryListItems = Cacher.GetSearch(searchTerms,CacheTime.Short,Context);
                        Results.EntryListTitle = string.Format("Search Results for {0} ", searchTerms.Trim());
                        Dottext.Web.UI.Globals.SetTitle("Blog Search Results", Context);
                  }
            }
      }
} 

Add the following code to the UIText class in DottextWeb\UI\UIText.cs:

public static string Search
{
    get
   
{
        return GetSafeConfig("Search","Search");
    }
}

Add the following line to the Web.config file in the HttpConfiguration\HttpHandlers section:

<
HttpHandler pattern = "^(?:/\w+\/(\w|\s|\.)+\/search\.aspx)$" controls="Search.ascx" />

* Recompile the complete .TEXT solution.
* Replace the installation on your server with the binaries and content files from this new hybrid version.

Make the following changes to the CSS and New/Announcements options in your blog admin. This is a little naughty, but it works, and it's easy to reconfigure the search box without making code future code changes.

CSS:

.BlogSearch
{
    border-style: solid;
    border-width: 1px;
    border-color: #1649B0;
    width: 100px;
}

News/Announcements:
(Change the URL in the redirect function to reflect the path in your blog - search.aspx will not exists, an Http handler kicks in when this URL is requested).

<H3>Search</H3>
<table cellpadding="5" cellspacing="2" width="100%" ID="Table1">
<tr>
<td align="center" valign="middle">
<input class="BlogSearch" type="text" name="searchBox" value="" ID="searchBox" onkeypress="return KeyPress(event);" maxlength="50"> 
<input type="button" value="Go" onclick="blogSearch();" ID="Button1" NAME="Button1"></td></tr></table>

<script type="text/javascript">
function KeyPress(evt)
{
var keyCode;
if (evt)
keyCode = evt.keyCode ? evt.keyCode : evt.which;
else if (window.event)
keyCode = window.event.keyCode;
if (keyCode == 13)
{ blogSearch(); return false; }
else
return true;
}

function blogSearch()
{
var input = document.getElementById('searchBox');
redirect(input.value);
}

function redirect(terms)
{
if (Trim(terms).length > 0)
window.location = '/blogs/rant/search.aspx?q=' + terms;
}

function Trim(s)
{
// Remove leading spaces and carriage returns

while ((s.substring(0,1) == ' ') || (s.substring(0,1) == '\n') || (s.substring(0,1) == '\r'))
{
s = s.substring(1,s.length);
}

// Remove trailing spaces and carriage returns

while ((s.substring(s.length-1,s.length) == ' ') || (s.substring(s.length-1,s.length) == '\n') || (s.substring(s.length-1,s.length) == '\r'))
{
s = s.substring(0,s.length-1);
}
return s;
}
</script>


-------

That about covers it, this solution is not the most elegant, but it works. Feel free to drop me a comment or email me with questions or better suggestions.

Share this post: Email it! | bookmark it! | digg it! | reddit!
Published Tuesday, December 28, 2004 6:19 PM by Rob Garrett

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

TrackBack said:

February 2, 2005 1:45 PM
 

TrackBack said:

February 2, 2005 1:54 PM
 

David Starr said:

Here is another solution to searching .Text.

http://www.elegantcode.com/dottextweb/articles/462.aspx
February 5, 2005 9:40 PM
 

TrackBack said:

March 5, 2005 5:03 PM
 

Mads Grønfeldt said:

I have used your excellent search implementation at my blog, and have made an entry concerning the quirks I identified during the implementation. See this link: http://blog.garbagecollector.dk/archive/2005/03/05/169.aspx
March 5, 2005 5:10 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

Blurb


Head Shot
Rob Garrett is a British Expat living in Maryland USA. Rob is a trained software engineer and experienced in Windows .NET development.

Rob enjoys listening to Rock music, posting to blogs, driving in the country with the sunroof open, beer (not in conjunction with country driving) and spending time with his family.

This Blog

Syndication

Powered by Community Server, by Telligent Systems