I’ve written before about how the amount of content in Wikipedia on certain subjects sometimes disturbs me (here and here). I have the next few days off and was looking to work my brain a little, so I wrote this little application that compares how many characters are written about any two subjects in Wikipedia. For example, you can discover that more is written about “blankbabied” than “zombo.com”. So check it out, then come up with your own crazy showdowns and comment on them.
Wikipedia Showdown!
To do this, I relied on CFHTTP. I’ve included the CFC that handles grabbing the input from Wikipedia in the extended entry.
<cfcomponent>
    <cffunction access="public" name="stripHTML" output="false" returntype="string" hint="Removes HTML from input string.">
        <cfargument name="str" type="string" hint="String to clean." required="yes">
        <cfreturn REReplaceNoCase(arguments.str, "<[^>]*>", "", "ALL")>
    </cffunction>
    <cffunction access="public" name="weigh" output="true" returntype="struct">
        <cfargument name="search_term" type="string" required="yes" hint="The search term.">
        <!--- Var-scope the locals so concurrent calls do not share state. --->
        <cfset var search_url = "http://en.wikipedia.org/wiki/Special:Search">
        <cfset var return_struct = StructNew()>
        <cfset var contents = "">
        <cfset var contents_start = 0>
        <cfset var contents_end = 0>
        <cfset var contents_len = 0>
        <cfset var contents_crap = 0>
        <cfset var retrieved_location = 0>
        <cfset var article_url = "">
        <cfset return_struct.searchTerm = arguments.search_term>
        <!--- Post the term to Wikipedia's search form. --->
        <cfhttp url="#search_url#" method="post" delimiter="," resolveurl="no">
            <cfhttpparam type="formfield" name="search" value="#arguments.search_term#" />
        </cfhttp>
        <cfset contents = cfhttp.FileContent>
        <cfset contents_start = FindNoCase("bodyContent", contents)>
        <cfset contents_end = FindNoCase("catlinks", contents)>
        <!--- Landing on the search results page, or missing either marker, means there is no article to weigh. --->
        <cfif FindNoCase("Search - Wikipedia, the free encyclopedia", contents) OR contents_start EQ 0 OR contents_end LTE contents_start>
            <cfset return_struct.contents = "There are no records for that search term.">
            <cfset return_struct.length = 0>
            <cfset return_struct.url = "">
            <cfreturn return_struct>
        </cfif>
        <!--- Keep only the text between the two markers, then drop the tags. --->
        <cfset contents_len = (contents_end - contents_start)>
        <cfset contents = Mid(contents, contents_start, contents_len)>
        <cfset contents = stripHTML(contents)>
        <!--- Trim the leftover marker text at both ends. --->
        <cfset contents_crap = 14 + 9>
        <cfset contents = Mid(contents, 14, Len(contents) - contents_crap)>
        <cfset return_struct.contents = contents>
        <cfset return_struct.length = Len(contents)>
        <!--- The article URL follows the "Retrieved from" footer text, when that footer is present. --->
        <cfset return_struct.url = "">
        <cfset retrieved_location = FindNoCase("Retrieved from", contents)>
        <cfif retrieved_location GT 0>
            <cfset article_url = Mid(contents, retrieved_location, Len(contents) - retrieved_location)>
            <cfset article_url = Replace(article_url, "Retrieved from", "", "ALL")>
            <cfset article_url = Replace(article_url, """", "", "ALL")>
            <cfset return_struct.url = Trim(article_url)>
        </cfif>
        <cfreturn return_struct>
    </cffunction>
</cfcomponent>
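If you don't have a ColdFusion server handy, the same weigh-and-compare idea can be sketched in Python. This is only a rough illustration, not the code behind the showdown page: the function names `strip_html`, `article_text`, and `showdown` are made up for this example, the `bodyContent` and `catlinks` markers are borrowed from the CFC, and it works on HTML strings you hand it rather than fetching anything from Wikipedia.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def strip_html(text: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def article_text(page_html: str,
                 start_marker: str = "bodyContent",
                 end_marker: str = "catlinks") -> str:
    """Return the tag-free text between the two markers, or "" if either is missing."""
    start = re.search(re.escape(start_marker), page_html, re.IGNORECASE)
    end = re.search(re.escape(end_marker), page_html, re.IGNORECASE)
    if start is None or end is None or end.start() <= start.start():
        return ""
    # Skip to the end of the tag that holds the start marker.
    tag_close = page_html.find(">", start.start())
    if tag_close == -1 or tag_close >= end.start():
        return ""
    body_start = tag_close + 1
    # Back up to the opening "<" of the tag that holds the end marker.
    tag_open = page_html.rfind("<", body_start, end.start())
    body_end = tag_open if tag_open != -1 else end.start()
    return strip_html(page_html[body_start:body_end]).strip()


def showdown(term_a: str, html_a: str, term_b: str, html_b: str) -> str:
    """Compare two pages by character count and announce the winner."""
    len_a = len(article_text(html_a))
    len_b = len(article_text(html_b))
    if len_a == len_b:
        return f'"{term_a}" ties "{term_b}" at {len_a} characters.'
    if len_a > len_b:
        winner, loser, high, low = term_a, term_b, len_a, len_b
    else:
        winner, loser, high, low = term_b, term_a, len_b, len_a
    return f'"{winner}" defeats "{loser}"\n{high} characters to {low} characters.'
```

Unlike the CFC, which trims a fixed number of characters from each end after slicing, this sketch looks for the tag boundaries around each marker, so it should not depend on the exact markup around them.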
Hmm…who would win the Terry v. Janice smackdown?
But really, Terry, how could you compete against http://en.wikipedia.org/wiki/Janice?
“janice” defeats “terry”
5642 characters to 585 characters.
I lose.
I think someone should take away your laptop when you have vacation days.
But I’m impressed that you’ve been blogging it up lately. With other people’s (blog.alig.net) blogs you have to clear out the cobwebs before reading.
Talk about cobwebs, you should check out: http://bobz01.blogspot.com/.
Ryan, I tried once to get the laptop away, but he bit my arm and then looked at me suspiciously all day. It really wasn’t worth the trouble.
You’re sitting right next to me! There is nothing stopping me from…
EEeekk! My shoulder, my shoulder! Am I going to need shots?
Disclaimer: Commenter is not responsible for any bodily harm caused by comment.
Was that too late?
Nope, too late. Janice is already in the fetal position, foaming at the mouth. But in fairness, it may be unrelated to my bite.
Hey! Keep it down Cronkright….