Wikipedia Showdown

I’ve written before about how the amount of content in Wikipedia on certain subjects sometimes disturbs me (here and here). I have the next few days off and wanted to work my brain a little, so I wrote this little application that compares how many characters are written about any two subjects in Wikipedia. So, for example, you can discover that more is written about “blankbabied” than ““. Check it out, then come up with and comment on your own crazy showdowns.

Wikipedia Showdown!

To do this, I relied on CFHTTP to fetch the pages. I’ve included the CFC that handles grabbing the content from Wikipedia in the extended entry.

<cffunction access="public" name="stripHTML" output="false" returntype="string" hint="Removes HTML from input string.">
<cfargument name="str" type="string" hint="String to clean." required="yes">
<!--- Replace every tag (anything between < and >) with an empty string. --->
<cfreturn REReplaceNoCase(arguments.str,"<[^>]*>","","ALL")>
</cffunction>

<cffunction access="public" name="weigh" output="true" returntype="struct">
<cfargument name="search_term" type="string" required="yes" hint="The search term.">

<!--- Keep the result struct local to this call. --->
<cfset var return_struct=StructNew()>

<cfset search_url="">
<cfset return_struct.searchTerm=arguments.search_term>

<!--- Post the search term to Wikipedia's search form. --->
<cfhttp url="#search_url#" method="post" delimiter="," resolveurl="no">
<cfhttpparam type="formfield" name="search" value="#arguments.search_term#" />
</cfhttp>

<!--- Landing on the search results page means there is no article for the term. --->
<cfif findNoCase("Search - Wikipedia, the free encyclopedia", cfhttp.FileContent)>
<cfset return_struct.contents="There are no records for that search term.">
<cfset return_struct.length=0>
<cfset return_struct.url="">
<cfreturn return_struct>
</cfif>

<cfset contents=cfhttp.FileContent>

<!--- The article text sits between the "bodyContent" and "catlinks" markers. --->
<cfset contents_start=FindNoCase("bodyContent", contents)>
<cfset contents_end=FindNoCase("catlinks", contents)>
<cfset contents_len=(contents_end-contents_start)>

<cfset contents=Mid(contents,contents_start, contents_len)>
<cfset contents=stripHTML(contents)>
<!--- Trim the marker residue: 14 characters at the start, 9 at the end. --->
<cfset contents_crap=14+9>

<cfset contents=Mid(contents, 14, Len(contents) - contents_crap)>

<cfset return_struct.contents=contents>
<cfset return_struct.length=Len(contents)>

<!--- The article URL follows the "Retrieved from" footer text. --->
<cfset return_struct.url="">
<cfset retrieved_location=FindNoCase("Retrieved from",contents)>
<!--- Mid() needs a positive start position, so skip this when the footer is missing. --->
<cfif retrieved_location GT 0>
<cfset article_url=Mid(contents, retrieved_location, Len(contents) - retrieved_location)>

<cfset article_url=replace(article_url,"Retrieved from", "", "ALL")>
<cfset article_url=replace(article_url,"""", "", "ALL")>
<cfset article_url=trim(article_url)>
<cfset return_struct.url=article_url>
</cfif>

<cfreturn return_struct>
</cffunction>
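
The showdown itself is just a comparison of two weigh() results. The code for that part isn't included in this post, so what follows is only a rough sketch of how it could look, assuming it lives in the same CFC as weigh(). The function name showdown, its arguments first_term and second_term, and the winner and margin keys are all made up for illustration.

```cfml
<!--- Hypothetical helper, not part of the original CFC. --->
<cffunction access="public" name="showdown" output="false" returntype="struct" hint="Compares the character counts of two search terms.">
<cfargument name="first_term" type="string" required="yes" hint="First search term.">
<cfargument name="second_term" type="string" required="yes" hint="Second search term.">

<cfset var result=StructNew()>

<!--- Weigh each term using the weigh() function from the same CFC. --->
<cfset result.first=weigh(arguments.first_term)>
<cfset result.second=weigh(arguments.second_term)>

<!--- The term with more characters wins; an empty winner means a tie. --->
<cfif result.first.length GT result.second.length>
<cfset result.winner=result.first.searchTerm>
<cfelseif result.second.length GT result.first.length>
<cfset result.winner=result.second.searchTerm>
<cfelse>
<cfset result.winner="">
</cfif>

<!--- How many characters separate the two articles. --->
<cfset result.margin=Abs(result.first.length - result.second.length)>

<cfreturn result>
</cffunction>
```

A term with no article comes back from weigh() with a length of 0, so in this sketch it loses to any term that has one.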



10 thoughts on “Wikipedia Showdown”

  1. I think someone should take away your laptop when you have vacation days.

    But I’m impressed that you’ve been blogging it up lately. With other people’s blogs you have to clear out the cobwebs before reading.

