Cyril Dangerville: November 2005 Archives


November 28, 2005

[Weblog Research] How does weblog technology work?

Blogs work like websites, except they have a specific use of dynamic pages, databases and templates. Read more for further details.

A blog is a website first and foremost, i.e. a bunch of webpages hosted on an HTTP (HyperText Transfer Protocol) server.
Footnote
{
HTTP is the request/response protocol used between the client (typically your web browser) and the server to deliver webpages to the end user. (HTTP is an application-layer protocol.)
}
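To make this request/response exchange concrete, here is a minimal Python sketch using only the standard library. It starts a throwaway server on your own machine and fetches one page from it. The page content, the `HelloHandler` name and the `fetch_once` helper are made up for the illustration; a real blog sits behind a full server such as Apache or IIS.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content, only for this illustration.
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with the same page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/index.html")  # the client's request
        response = conn.getresponse()       # the server's response
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once()
    print(status)
    print(body.decode("utf-8"))
```

The browser does the same thing as `fetch_once`, except that it also renders the HTML it receives.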

There are two HTTP servers that dominate the market: Apache (70%) and IIS (Internet Information Services).
Footnote
{
Apache is open source, led by the Apache Software Foundation and sponsored by IBM. IIS is proprietary, published by Microsoft.
}

To understand how blogs work, you need to understand the client-server model, which is the foundation of all Internet services. Its most common version is the 3-tier client-server model. The figures below show the 3-tier client-server architecture and a real-world version of it.
three-tier.gif
client-serverArchitecture.gif

The client (browser) requests the web page stored on the remote machine through the server software. The server locates this file, then...

• Case 1: the webpage is a static webpage (extension html, htm), which means it is created once and for all. The content does not change unless the webmaster (or anybody else allowed to) decides to change it. In this case, the server just sends it to the browser. The browser then interprets the HTML (HyperText Markup Language) or XHTML (eXtensible HTML, meant to replace HTML) code of the page and displays it on your machine.

• Case 2: the webpage is dynamic (extension php, asp, aspx, jsp, etc.). In such a page, some code in a specific programming language is embedded in the webpage and executed by the web/application server when the client requests the page.

Footnote
{
Several server-side programming languages are available. These are the most popular: PHP (PHP: Hypertext Preprocessor, a recursive acronym), which is open source; Java with JSP (JavaServer Pages) from Sun Microsystems; ASP (Active Server Pages) and ASP.NET, its latest evolution, from Microsoft. ASP.NET is a web development platform like JSP, but unlike JSP, which only supports Java code, ASP.NET supports several programming languages (C#, VB.NET, J# and others).
}

The HTML code is generated and the result is sent to the client. Such pages usually contain some static HTML code and some special tags indicating that the code within them is in another language: the appropriate module has to execute it, and the resulting HTML has to be merged into the source page.
The contents of the dynamic page depend on the query passed to the program. In particular, the program may use the parameters in the query to query the database and retrieve the relevant information, and finally use it to build the HTML page.

Footnote
{
Three database/SQL server products prevail: MySQL (open source, very popular), PostgreSQL (open source, enterprise-level, much more sophisticated than MySQL and therefore more complex) and Oracle (proprietary, leader in the business world, for very large databases).
}

Think of search engines for which you specify keywords on a web page. Your request is sent to the server with the keywords in the query, and the database is queried as a result.

• Case 3: this case is useful to understand the process. It is just a simpler version of case 2, where the client request is sent directly to a script (a short program), not a webpage, written in a language like Perl. The script produces (prints) the HTML code to be sent to the client. No tags here, just instructions executed in sequence.
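To see cases 2 and 3 in action, here is a rough sketch of a script that builds an HTML page from the query sent by the client. It is written in Python rather than Perl, and it uses an in-memory SQLite database in place of MySQL or Oracle. The table name `pages`, the `keyword` parameter and the sample rows are all invented for the example.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def build_database():
    """Create a tiny in-memory database standing in for the real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("Apache basics", "How to set up an HTTP server"),
            ("Blog templates", "Main, archive and article pages"),
            ("PHP tutorial", "Embedding code in an HTML page"),
        ],
    )
    return conn


def render_results(query_string, conn):
    """Build the HTML page for a query string such as 'keyword=blog'."""
    # 1. Read the parameters passed in the query.
    params = parse_qs(query_string)
    keyword = params.get("keyword", [""])[0]
    # 2. Use them to query the database (placeholders avoid SQL injection).
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title FROM pages WHERE title LIKE ? OR body LIKE ? ORDER BY title",
        (pattern, pattern),
    ).fetchall()
    # 3. Produce the HTML code to be sent back to the client.
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    return (
        "<html><body>"
        f"<h1>Results for {html.escape(keyword)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


if __name__ == "__main__":
    print(render_results("keyword=blog", build_database()))
```

The same query string with a different keyword gives a different page, which is exactly what makes the page dynamic.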

Now that you have the client-server architecture in mind, there is one last point to explain before you understand how blogs work.
“Blogs differ from traditional web sites in that, rather than being composed of many individual pages connected by hyperlinks, they are composed of a few templates (usually Main Page, Archive Page, and Individual Article/Item Page), into which content is fed from a database.” (Wikipedia, Weblog, http://en.wikipedia.org/wiki/Weblog, retrieved 11/26/2005).
Indeed, to keep it simple, a blog site has at least 3 dynamic pages (Main, Archive, Individual Article/Item). The form of each page follows the appropriate template, and the content is built from a database of articles: the articles you have already submitted through the Individual Article Page, the page where you edit your article. When you submit your article, a program is executed by the webpage (cf. case 2 above) and fills the database with information about the article, such as the date, the author, the title, etc. You will notice on the main page that articles are displayed in reverse chronological order, which is very specific to weblogs. Again, this main page executes code that fetches data about:
- you as the author: because this is your blog, you want to see your articles
- the dates of the articles, to sort them in reverse chronological order
- some extra things, less significant.

You get the idea.
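For the curious, the main page logic can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `articles` table, the two templates and the function names are hypothetical, and a real blog platform stores much more than a title, an author and a date.

```python
import html
import sqlite3
from string import Template

# Hypothetical templates: the form of the page, without any content.
MAIN_TEMPLATE = Template("<html><body><h1>$blog_title</h1>\n$entries</body></html>")
ENTRY_TEMPLATE = Template("<div><h2>$title</h2><p>$date, by $author</p></div>\n")


def create_blog_db():
    """Create an empty in-memory database of articles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, author TEXT, posted TEXT)")
    return conn


def submit_article(conn, title, author, posted):
    """What happens on submission: the article's data goes into the database."""
    conn.execute("INSERT INTO articles VALUES (?, ?, ?)", (title, author, posted))


def render_main_page(conn, author, blog_title):
    """Fetch the author's articles, newest first, and feed them to the template."""
    rows = conn.execute(
        "SELECT title, author, posted FROM articles "
        "WHERE author = ? ORDER BY posted DESC",  # reverse chronological order
        (author,),
    ).fetchall()
    entries = "".join(
        ENTRY_TEMPLATE.substitute(
            title=html.escape(title), author=html.escape(name), date=posted
        )
        for title, name, posted in rows
    )
    return MAIN_TEMPLATE.substitute(
        blog_title=html.escape(blog_title), entries=entries
    )


if __name__ == "__main__":
    db = create_blog_db()
    # Dates are stored as ISO strings (YYYY-MM-DD) so that they sort correctly.
    submit_article(db, "SPAM + BLOG = SPLOG", "Cyril", "2005-11-23")
    submit_article(db, "How does weblog technology work?", "Cyril", "2005-11-28")
    print(render_main_page(db, "Cyril", "November 2005 Archives"))
```

The archive page and the individual article page would work the same way, each with its own template and its own query.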

References:
1. www.webdevelopersnotes.com, The Client Server Architecture, 2005, http://www.webdevelopersnotes.com/basics/client_server_architecture.php3
2. See http://www.dcs.bbk.ac.uk/~mick/academic/networks/msc/programming/server.shtml for extra references.

See Alok’s former blog for extra information.

November 23, 2005

SPAM + BLOG = SPLOG

... = SPAM 2.0!
I recently read an article about a web phenomenon that I deem worth bringing to your attention: the “splog”. (It is new for the least geeky of us; I actually read the article a month ago.) What the hell is that? The definition, the good and the evil, the causes and effects, and the nota bene: all you (unconsciously) wanted to know about SPLOGS...

A splog is to weblogs what spam is to email. Why spam blogs? What’s the big deal?

I. WHY SPLOGS?

1) Spam filters for emails give spammers headaches.
As spam filters become more and more effective, spammers are looking for new targets.

2) Blogs are growing exponentially.
According to the search engine Technorati, there are more than 20 million blogs on the web, and 80,000 created every day. Ooh, I guess we have a new target.

3) Blogs enable spammers to improve the ranking of their website in Google or other search engines.
To illustrate my statement, Joe will be my man. Joe is a SPAMMER. And not the dumbest. (All characters are entirely fictional, blablabla, I don’t take responsibility for any fortuitous connection to CICS geeks or other real persons.) What is Joe’s purpose? To drag you to his website (to advertise, to sell you products, or to make you give away critical information about yourself, which is called phishing), no matter the cost. What is one of Joe’s best tactics to drag you to his site? To get an excellent ranking in Google’s results (or those of other major search engines like Yahoo!, MSN, Altavista), because this increases the probability that you encounter a link to Joe’s website.
How to improve the ranking? Well, first, you have to know that search engines compute the ranking of the results according to the number of links targeting a given website. To keep it simple, the more links point at your website, the better you will be ranked in the results. Of course, this depends on the keywords specified in the request as well. That’s why the links have to be associated with keywords that are relevant to the activity of your website. A legitimate way to get a better ranking is to invite partner websites, usually related to your activity, to link to your website. You can cut a deal by offering to link to their website from yours in exchange. What if your so-called partner has no interest in trading links with you because it is already so big that it doesn’t need you to grow? (You may have to pay them, or pay Google too.)
What if you are Joe the spammer, you use your website for your evil purposes and nobody wants to deal with you?
Well, Joe has the solution. Joe built a tool (a program) that generates blogs automatically on platforms like Blogger, Google’s blogging service. This tool can create blogs, register them and insert content into them. Joe uses this tool to create blogs full of ranking-friendly keywords and full of links (typically hypertext) to his website, for free. The ranking-friendly keywords can refer to the activity of Joe’s website. They can also be the names of well-known bloggers. Finally, a new trend consists of using keywords that generate juicy AdSense advertisements (AdSense is Google’s program that dynamically integrates advertisements into a webpage according to keywords found in the content of the page).
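Here is a toy Python illustration of why Joe's trick pays off, under the "to keep it simple" assumption that ranking is just a count of inbound links. Real search engines weigh links in far more sophisticated ways, and every site name below is made up.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank sites by how many distinct pages link to them, most linked first.

    `links` is a list of (source_page, target_site) pairs.
    """
    # set() drops exact duplicates so the same page is not counted twice.
    counts = Counter(target for _source, target in set(links))
    # Sort by descending count, then by name to get a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical links created by real people.
honest_links = [
    ("blog-a.example/post1", "goodshop.example"),
    ("blog-b.example/post7", "goodshop.example"),
    ("news.example/story", "goodshop.example"),
    ("blog-a.example/post2", "joe.example"),
]

# Joe's tool generates 50 splogs, each linking to his website.
splog_links = [(f"splog{i}.example/post", "joe.example") for i in range(50)]

if __name__ == "__main__":
    print(rank_by_inbound_links(honest_links))
    print(rank_by_inbound_links(honest_links + splog_links))
```

Before the splogs, the honest shop comes first with 3 links against 1. After them, Joe's site jumps to the top with 51 links, without a single human having recommended it.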
blogger.gif
google_adsense.gif


II. THE COUNTER-ATTACK

To avoid being overwhelmed with splogs, some search engines have simply stopped indexing weblogs. Google seems to have developed techniques that more or less tell splogs from genuine blogs, but has not said anything publicly on the subject. Blogger also provides a “Flag” button that lets any user report abuse on other users’ blogs.
Nevertheless, “flagging a splog” remains about as ineffective as captchas at preventing splogs from thriving. Cf. Wikipedia for a good definition of captcha.
Captcha of Dr. Gillette's email address: captcha.jpg
Worse, splogs look more and more like genuine “human” blogs, and splog filters tend to lump splogs and real blogs together!


III. THE FACTS

1) Fightsplog, a blog dedicated to fighting splogs, listed 2,763 pornographic splogs created by the same person!

2) According to ZDNet.fr, out of the 1.3 million posts or comments made on weblogs every day, 50,000 (roughly 4%) are spam. Most of them are comments posted automatically by robots (programs).

3) Google indicated last week that its blog hosting service Blogger had been infested with 13,000 splogs.

N.B. (just for fun, I mean, just to add value…): originally, the word “splog” comes from New Zealand and refers to a specific kind of shoe, halfway between clogs and sheepskin slippers.
splogscolours.jpg