Search, Hold the Server

Because whichElement.com is a content site, it was pretty important that it have search, which was pretty hard considering that I didn’t want any server-side components involved.

At first I thought I would just let Google index the site, and hook up a Google search box on the site to solve the problem. That was certainly an option.  I thought I would have to do some SEO magic to make it happen correctly, but it was doable. In fact, Ray had solved this problem already.

But then I got to thinking, wouldn’t it be cooler to rise to the challenge of a search without a server? Why, yes, yes it would. I broke up my needs into two parts:

  • An index of the site’s content
  • A mechanism for searching the index and displaying the results.

I kicked around a few ideas, but finally settled on the idea of creating a JSON file that had an array of objects with title, url, summary, and condensed content info. If I had such a file, all I would have to do is search through the JSON to find results. So the second part of my search was a snap.  All I had to do was:

  • Pull down the JSON file
  • Run searches against that JSON file
  • Present the results

All of this was pretty easy to do, and not revolutionary.
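
As a rough illustration of those three steps, here is a minimal sketch rather than the site's actual search code. It assumes the index is an array of objects with `title`, `url`, `summary`, and `content` fields, as described above. The function names, the sample entries, and the use of `fetch` to pull down the file are all assumptions made for the example.

```javascript
// Hypothetical sample of the index: an array of { title, url, summary, content }.
var sampleIndex = [
  {
    title: "The article element",
    url: "/elements/article.html",
    summary: "When to reach for article.",
    content: "Use article for self-contained compositions such as blog posts."
  },
  {
    title: "The section element",
    url: "/elements/section.html",
    summary: "When to reach for section.",
    content: "Use section for thematic groupings of content, typically with a heading."
  }
];

// Step 1: pull down the JSON file (assumes an environment that provides fetch).
function loadIndex(url) {
  return fetch(url).then(function (response) {
    return response.json();
  });
}

// Step 2: keep entries where every query term appears somewhere in the
// title, summary, or content (case-insensitive substring match).
function searchIndex(index, query) {
  var terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) {
    return [];
  }
  return index.filter(function (entry) {
    var haystack = [entry.title, entry.summary, entry.content]
      .join(" ")
      .toLowerCase();
    return terms.every(function (term) {
      return haystack.indexOf(term) !== -1;
    });
  });
}

// Step 3: turn matches into simple display strings.
function formatResults(results) {
  return results.map(function (entry) {
    return entry.title + " (" + entry.url + ")";
  });
}
```

In a page this might be wired up as `loadIndex("search-index.json").then(function (index) { ... })`, where the file name is hypothetical. Plain substring matching keeps the sketch short, but it does no ranking or stemming.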

The difficult part was making the index in the first place. The added difficulty was that I wanted to use JavaScript for everything, so I couldn’t just use a shell script or some other easy way of indexing the files. Basically I wanted to remove barriers to entry, and an OS X shell script would create an obstacle for Windows-based HTML developers who want to get involved.

Trying to do this with JavaScript in a browser was very hard. While there is a File API for the browser, you can’t use it to point at arbitrary directories on the file system, like the site itself; you can only really point it at a sandboxed space. This made indexing kinda impossible.

I said I had to use JavaScript, but I didn’t say I had to use a browser. Enter Rhino, a JavaScript interpreter written in Java. Rhino gave me the ability to call Java file I/O classes from JavaScript, which allowed for easy indexing of the content. Now this might be a bit of a cheat, since I am basically calling Java, which is a decidedly server-side technology in this case. I rationalized my way out of it. ANT is required to build the project, but knowing how to fire off an ANT build and being forced to write full Java are two different things. I’d love to hear if any of you are put off by this.

Rhino gave me the ability to run JavaScript from the command line, or from ANT. Since we publish whatever gets checked in to the GitHub repository, and we publish that code through ANT, I could just reindex as part of the build whenever new content comes in. New content causes a reindex, so the index is always up to date, and it is generated on my terms – only JavaScript.

What it actually does:

  • Reads in all HTML files in the site
  • Filters out the ones that I don’t want in search results
  • Grabs the title, url, and content from each
  • Writes out this content to JSON on disk

It’s not perfect. Search is pretty primitive – I don’t know how far it will scale. But for now, I have a pretty cool solution to my problem.

Here’s the indexer code:

http://snipplr.com/json/63962

CFHTTP equivalent in Java. Really, Java, Really?

I was talking to my boss Kevin about how concise ColdFusion makes certain rote tasks, and he mentioned trying to duplicate CFHTTP in Java. He talked about how it went on for line after line after line. I figured he was talking about something on the order of 2 or 3 times as much code.

He forwarded me a post on making HTTP GET and POST requests in Java. As the post shows, it takes 12 lines of code just to import all of the classes you need. When all is said and done, it takes about 30 or so more lines of code to actually make a GET request. So it takes about 42 lines of Java code to duplicate the functionality that can be called in 1 line of ColdFusion using CFHTTP. I never noticed it was that big a difference. All that versus:

 

 

Now, I’m sure there are easier ways of doing this. And after you build the class and method once, you can just reuse the object repeatedly. But in this day and age of SOA, SOAP, and REST, that seems like something that should be built into the language.

I’m not trying to make this a Java-bashing post. Really I’m not. Java can do lots of things ColdFusion cannot. In fact, Java networking is this verbose because it has more options; it can do low-level socket communication. I know, because when I’ve needed that in ColdFusion, I’ve dropped down to Java to write it.

But as developers, I think there are lots of places where we don’t add value, but are still forced to work:

  • Getting reporting data out of a database? You add value by writing good complex SQL, but not by writing the database connection code.
  • In basic database applications, you add value by designing the database, but not by writing CRUD code.
  • In a REST and SOAP world, you add value by mashing up services people haven’t thought of combining, not by making the HTTP call to get the data.

I know frameworks, libraries, code reuse, and other encapsulation techniques are ways around this.

ColdFusion as an abstraction layer on top of Java is another way. It’s the way I do it. And because I do it that way, I never have to write more code than I have to for an HTTP GET request, or email, or database connection, or .NET integration, or Exchange call, or Spreadsheet creation, or …