Measuring the Spanish Blogosphere (COAST)
Fernando Tricas (ftricas-at-unizar.es, Departamento de Informática e Ingeniería de Sistemas, U. Zaragoza, Spain)
Víctor R. Ruiz (rvr-at-blogalia.com, Blogalia.com)
Juan J. Merelo (jj-at-merelo.net, Depto. Arquitectura y Tecnología de Computadores, U. Granada, Spain)
Accepted for 2nd International Conference of the COST Action A20, Towards New Media Paradigms: Content, Producers, Organisations and Audiences.
Weblogs, or blogs, are a new form of diffusion of content and knowledge. They can be, and have been, used for many purposes, ranging from the expression of adolescent angst (which has on some occasions been used to dismiss the value of the whole phenomenon) to the diffusion of higher-level knowledge in some areas of research, such as that offered by Dave Winer on blogs themselves, or by collective blogs such as PlanetMath. All weblogs together constitute the blogosphere, a term that designates the set of websites as well as their authors; thus, the blogosphere is the community of bloggers, people or collectives who share chronologically ordered information and opinions.
Focusing on the object of this study, it can be said that, in general, the Spanish blogosphere has not yet reached critical mass. Moreover, the main reference of the Spanish-speaking blogosphere is still the English-speaking web; most of the links found point outside the Spanish-speaking web.
In particular, it is still quite uncommon for news items seen or generated in the Spanish blogosphere to become popular throughout it; when this happens, most of the time it is because they reproduce items from the English blogosphere. There is also an "increasing returns" phenomenon: most bloggers concentrate in a few weblog hosting sites (such as Blogalia or Barrapunto), which might affect the link space of the whole blogosphere.
This paper presents the experience of the authors in developing blogging tools, especially the Blogómetro, an open source program (available from blogometro.sf.net) that measures the link space of the Spanish-speaking blogosphere daily, in the same way that BlogDex or DayPop do for the English (or maybe global) one. We also show and analyze data gathered from the end of 2002 to the first quarter of 2003, and try to provide some ideas about how information spreads through this new medium.
Weblogs, or blogs, are a new form of diffusion of content and knowledge. This paper presents the experience of the authors in developing blogging tools, in particular the Blogómetro, an open source program that measures the link space of the Spanish-speaking blogosphere daily, in roughly the same way that BlogDex or DayPop do for the English (or maybe global) one. We also show and analyze data gathered from the end of 2002 to the first quarter of 2003, and try to provide some ideas about how information spreads through this new medium.
The rest of the paper is organized as follows: the next sections introduce weblogs and their origin (sections 2 through 4) and give a brief approach to their sociology (sections 5 to 8). Section 9 describes how we gathered data for this study; first results are shown in section 10, while section 11 shows some collective phenomena emerging in this blogosphere. Finally, section 12 presents our conclusions.
2. Brief history of weblogs
History tells us that long before the WWW became mainstream, its inventor, Tim Berners-Lee, was running a weblog that compiled links to the new web sites appearing on the Internet, while the size of his invention was still small.
The term weblog was coined by Jorn Barger (Robot Wisdom) in 1997, and is a combination of the words web and log, as in logbook. Weblog is often shortened to blog, and usually pronounced we-blog. In 1998, Jesse James Garrett began to compile a list of sites similar to his weblog, Infosift. Other well-known blog pioneers are Dave Winer (Scripting News) and Cameron Barrett (Camworld). In 1999, some blog-hosting companies appeared and the blog boom began. The first service was Pitas, followed by the highly successful Blogger and the lesser-known Edit This Page, Groksoup and Velocinews. By April 2000, almost three hundred weblogs were tracked in Garrett's list.
Nowadays, there are millions of bloggers around the world. Blogger.com is growing at a rate of about a thousand new sites per day, and the biggest Internet companies are investing in this field. For example, Google bought Pyra Labs, the makers of Blogger, in March 2003, while AOL and Microsoft announced in May 2003 that they were working on blogging software.
3. Brief Spanish history of blogs
Eduardo Arcos, one of the early Spanish webloggers, says [1]:
I think that the Spanish weblog phenomenon began in mid-1999, with Tremendo, Subte, Beto y Gustavo. Some months after, others like us followed.
Two of the blogs mentioned here no longer exist. Almost at the same time, Barrapunto was set up as the first Spanish collective weblog. It started as a clone of Slashdot, a site with news about Linux, open source software, technology and related topics. However, the developers of Barrapunto implemented two new concepts, which made it fairly unique:
- MiBarrapuntos. It allowed people to create their own personal or collaborative Barrapunto-like sites. It was a more advanced tool than Slashcode's journals.
- Ecolutions. A mechanism to replicate posts between blogs, for further commenting.
These features were dropped in the latest incarnation of Barrapunto, launched recently. In 2000, Eduardo Arcos started bitacoras.net, a site devoted to introducing beginners to the blogging world. It hosts a blog directory and is one of the must-visit sites of the Spanish blogosphere.
At the beginning of 2002, Blogalia.com was developed from scratch as a personal project of one of us, Víctor R. Ruiz, to provide a simple and usable tool for Spanish-speaking people. It has become a fairly populated site, with a very active community. Libertonia was also started in 2002, mainly dedicated to Linux and open source topics; it also hosts personal journals, which are, in fact, weblogs.
While the Spanish weblog community has grown strongly since 2000, mainly among Linux user groups, personal blogs have not grown as fast. However, their number is increasing very quickly in 2003 (possibly exponentially; the blogs known to us have increased from several hundred in 2002 to around six thousand in 2003).
4. Anatomy of a weblog
But what is a weblog? The concept is still evolving and, quite obviously, is the object of heated arguments. Recently, there has been some discussion about the words weblog and blog [2]. Open source developers call any Slashdot-like site a weblog; in contrast, bloggers use the words blog and bitácora to refer to any Blogger-like site. Slashdot and Barrapunto are run by groups of managers and users, while Blogger pages are mainly used as personal sites. In any case, both share a basic style. Rebecca Blood gave this description of weblogs in 2000 [3]:
Their editors present links both to little-known corners of the web and to current news articles they feel are worthy of note. Such links are nearly always accompanied by the editor's commentary.
Traditionally, a post, posting or story is a short text containing one or more links to interesting sites. This is not a rule, however: today, a post can be a long article, a poem or even pictures, without any links or comments. More recently, in May 2003, Dave Winer [4] defined a weblog as:
[...] a hierarchy of text, images, media objects and data, arranged chronologically, that can be viewed in an HTML browser. There's a little more to say. The center of the hierarchy, in some sense, is a sequence of weblog 'posts'.
In this paragraph, Winer mentions the main characteristic of a weblog: the chronological order. The most recent posts appear first.
Most weblog services export weblog posts in machine-readable formats such as RDF (Resource Description Framework) and RSS (Rich Site Summary). Using special programs called aggregators, these files can be retrieved periodically in order to track new posts in tens of blogs.
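As a rough illustration of what an aggregator does with such a file, the following sketch parses a small RSS 2.0 document and reports the items it has not seen before. The feed contents, the example.org URLs and the function name new_posts are hypothetical; a real aggregator would also download each feed periodically and would have to cope with the other feed formats in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, standing in for a feed downloaded from a blog.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/2</link>
    </item>
  </channel>
</rss>"""


def new_posts(feed_xml, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links.

    seen_links is updated in place, so a later call with the same feed
    reports nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


# Links already retrieved on an earlier visit (assumed for the example).
seen = {"http://blog.example.org/1"}
print(new_posts(SAMPLE_FEED, seen))
```

Keeping the set of already seen links between runs is what lets the aggregator show only new posts on each visit.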
5. Who blogs?
Today, almost anyone: from children to the elderly, from geeks to journalists. Although there are not many statistics, in some countries (for instance, Poland) women's share of blogging is higher than men's. In other places, like Iran, men blog three times as often as women.
6. Why are blogs important?
The first generation of bloggers was made up of people skilled in technology. Most of them have enough computer knowledge to develop, install and modify weblog software such as Slashcode, PHP Nuke, Greymatter, Movable Type, etc. But the revolution began with free blog hosting services, like Blogger, which do not require any previous knowledge of programming languages or HTML editing. This is more about easy and cheap content management systems and less about a writing style. Bill Gates [5] said in April 2003:
That sort of bottom-up publishing capability has really exploded in a certain way. Blogging [lets you] decide if, essentially, your regular diary being there, being accessible to everyone, is a very important thing.
Blogs are democratizing the Internet. They are giving a digital voice to millions of people, who can now publish anything, anywhere. Running an authoritative site no longer requires digerati; anyone can produce and consume content, which might draw readers and money away from the big portals if they do not offer blogging services (and we are sure that they will). Blogs are not small media anymore. Personal blogs like Scripting News or Joi Ito's have tens of thousands of unique visitors per day, more than some e-newspapers in Spain.
Weblogs are also ruling Google. Google's PageRank algorithm, which sorts the results of a search, scores a site more highly the more links it receives. Weblogs frequently link to each other, so they carry a lot of weight in Google searches. This enables a technique called Google-bombing, which Spanish blogs used, for instance, to demand political responsibility for the Prestige catastrophe.
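To make the idea of link-based scoring more concrete, here is a minimal sketch of the textbook power-iteration formulation of PageRank on a toy graph. It is not Google's production algorithm; the damping factor of 0.85, the number of iterations, the function name pagerank and the page names are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three blogs all link to the same campaign page.
TOY_WEB = {
    "blog_a": ["campaign_page"],
    "blog_b": ["campaign_page", "blog_a"],
    "blog_c": ["campaign_page"],
    "campaign_page": [],
}
ranks = pagerank(TOY_WEB)
top = max(ranks, key=ranks.get)
print(top)
```

In this toy graph the page that every blog links to ends up with the highest score, which is the effect a Google-bombing campaign relies on.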
7. Relationship between journalism and blogs
Most of the previous and current media coverage of the weblog phenomenon has dismissed or ridiculed it, calling blogging amateur journalism. However, blogs are used by professional journalists in several ways. Some big media groups are setting up blogs for their own journalists. MSNBC currently hosts dozens of weblogs for its columnists. In May 2003, Dan Gillmor (Silicon Valley) used his blog to gather expert information and opinions about a story he later published. Others are using blogs to publish the full text of interviews that are edited down for the printed version. In Spain there are some columnists running blogs (e.g. Enrique Meneses and Javier Armentia). In Argentina, Clarín recently started a weblog (Conexiones).
Most media outlets recently published news about the personal diaries phenomenon in the middle of the Iraq war. Some of these diaries covered the conflict, compiling links to news alerts. One of the most visited blogs was Where is Raed?, in which an Iraqi called Salam Pax (an alias) posted his thoughts from Baghdad until he lost his Internet connection due to air raids. In May 2003, The Guardian announced that it had hired Salam Pax as a columnist for its digital edition.
The relationship between blogs and journalism is growing. There is an increasing interest in the academic area about weblogs, as can be seen in the blogs of University of Navarra professors José Luis Orihuela (e-Cuaderno) and Ramón Salaverría (e-periodistas, recently terminated).
8. Blogómetro: raiders of the lost news
In an article published in the Spanish newspaper ABC [6], Borja González wrote (our translation):
Maybe we are at the beginning of a blog era, in which there will be a growing number of news and opinions available on the Internet and not in the traditional media. Doubtless, there will be posts with valuable information but, at the same time, it will be harder to isolate the relevant information.
Some tools have been developed to extract relevant information from among the cacophony of thousands of blogs. The first project to go live was BlogDex (a project of the MIT Media Lab). This engine retrieves new links that have appeared in the blogosphere and ranks them. Other similar projects, like DayPop and PopDex, soon followed. The Spanish version of BlogDex is the Blogómetro, an open source implementation developed and run by the authors of this paper. It can be used to track hot topics. However, as will be discussed later, Spanish blogs don't link very often, so other techniques may need to be developed. These tools are freely available to both the curious and journalists.
All data in this study have been taken from the Blogómetro, a site hosted at Blogalia that every day posts fresh links taken from the list of blogs that are considered "Spanish-speaking". The Blogómetro works as follows: every day, early in the morning, a spider written in Python crawls all the blogs in the list. It scrapes the links from the raw HTML of each blog and stores them in a database if they have not been seen before.
A link is considered new or fresh if its URL has not been seen before in that particular blog. This means that if a blog refers to another several times by its URL, it counts as a single reference; it also means that links included in the blogroll (a list of blogs that the author considers interesting, which appears in the blog permanently) are counted only once during the lifetime of the data. The DBMS is PostgreSQL, a free, open source program that is easily interfaced with languages such as Perl or Python. The database contains only two tables: one for the blogs themselves, with URL and description, and another for the URLs. Data have been stored for nearly six months, from November 15, 2002, to May 10, 2003. The database was purged of self-links, that is, links from a blog to its own pages (which is not always possible; it works only when such links include the blog's URL).
10. Big Numbers
Figure 1. Evolution of the number of fresh links per day. A certain periodicity seems to be present; it might correspond roughly to weekly cycles. Big spikes occur when sets of new blogs are added, so they are artifacts of the data collection procedure.
During the period of study, 128857 links were observed (excluding self-links), which yields an average of 728 a day, that is, an average of 0.47 links per blog per day; this figure should be read bearing in mind that not all blogs were present at the beginning of the study. If we assume that each new story posts a new link, that means around 1000 new stories a day in the Spanish blogosphere. Not a big deal, but at least we know the ballpark. It also shows that most of the blogs accounted for are active, that is, posting; on average, we could say that about half of them post a story a day, but, of course, that figure is as true as any other statistic. It could be that there are a few very active blogs, while the others are just there, without updating anything.
If self-links are included in order to complete the picture, since we are not able to purge all of them (and we need to take into account that almost every posting has a link to itself, usually called a permalink), a total of 207768 links were harvested, which yields an average of 1173 a day, that is, 1.14 per blog per day.
What was the most popular link during that period? Unsurprisingly, the first 20 links or so are taken by banners that have appeared in weblogs, popular weblogs such as Slashdot (number 2), directories such as bitacoras.net, and blog hosting sites and software such as Slash, Blogger, Blogspot and Blogalia. Barrapunto starts to show up here, first by itself, and then by having its banners as the most pointed-to links. The first real link is one of our blogs, Cuaderno de bitácora, by Víctor R. Ruiz, with 101 links, followed by Mini-D, with 81 links, and then Libro de Notas - Prestige (75 links), a (quite critical) page on the Prestige wreck, which was part of a campaign to google-bomb the word prestige. It obviously succeeded. It should be noted that this last site, which corresponds to a single event, has been demoted since our last study [7].
How are traditional media seen or used within the blogosphere? Apparently, El Mundo, a national daily, is the most popular, with 616 links during this period. Next come an internet-only daily, Libertad Digital, with 162 links, and Periodista Digital (204 links), which usually makes selected content from El País (and others) publicly available under the "fair quoting" provision of the copyright act. It is surprising to see that two of the three most popular "traditional" media in the Spanish blogosphere are pure-play internet outfits. The rest of the Spanish press follows: El País, another newspaper, comes next with 129 links, but it should be noted that in December 2002 it changed to a pay-per-view model, which obviously made it lose some internet readers. Other popular newspapers are La Vanguardia (108 links) and ABC (89 links). We would like to emphasize that none of the so-called 'confidenciales', or confidentials, Drudge-report-like media which are mainly devoted to gossip and unconfirmed news, have any relevance within the blogosphere; none of them gets more than a few links during this period. In that sense, traditional media do not have anything to fear.
As would be expected, blogs themselves are popular in the blogosphere, as is shown in table 1.
Most of them are collective blogs, that is, places where several "editors" put news on the main page, or directories. Many of them are also related to free software, showing its strong relation with the whole blogging phenomenon in the Spanish-speaking community.
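The daily averages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that both the first and the last day of the collection period (November 15, 2002 and May 10, 2003) are counted; that counting convention is our assumption, not something stated in the text.

```python
from datetime import date

# Collection period given in the text; counting both endpoints is an assumption.
start, end = date(2002, 11, 15), date(2003, 5, 10)
days = (end - start).days + 1

links_without_self = 128857  # links observed, excluding self-links
links_with_self = 207768     # links harvested, including self-links

print(days)
print(links_without_self / days)
print(links_with_self / days)
```

Under that assumption the period spans 177 days, and the two quotients have integer parts 728 and 1173, in line with the averages reported in the text.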
11. Collective phenomena
All blogs in Spanish might constitute a community; linking, as its name implies, creates a link, and it means that the linker, at least, reads the linkee, and maybe follows it. By reading a blog, and maybe commenting on it, reader and writer constitute a kind of community. In many cases, whole constellations of blogs follow, link and comment on each other. This gives rise to some collective phenomena, the first of which are ad-hoc clusters, as is shown in figure 2.
Figure 2. Graph of the main component of the Spanish blogosphere, drawn by Valdis Krebs using InFlow 3.0. It includes only those blogs that are related by 5 or more links. Closeness in the map corresponds to a close relation. Besides the layout, it should be noted how some blogs, such as ZaragozaWireless, act as bridges between the 'main' component and another set of wireless blogs. fernand0.blogalia.com, one of our blogs (Tricas'), also acts as a nodal point; it is almost in the middle, and links very different communities.
This map shows the overall structure of a good part of the blogosphere, the part whose members are most closely linked to each other. It shows most of the phenomena usual in social networks (as described in the book Linked, by Albert-László Barabási): there is a closely interrelated core, made up, for instance, of BlogPocket, Libertonia, and our own blogs, Atalaya and Reflexiones, and composed of many blogs with many incoming and outgoing links; there are bridges, such as the one spanned by the 7bytes.net sites, which join parts of the network that would otherwise be separated; and there are tendrils that project out of the inner core, which represent blogs that are related to each other and loosely joined to the rest, such as the wireless tendril, bottom left.
This network structure gives rise to another emergent phenomenon: some blogs are linked more profusely than others. It has been shown [8,9] that the global blogosphere follows a power law with an exponent close to 1 in absolute value, namely -0.83. The scale of the global blogosphere is at least two orders of magnitude bigger than the Spanish one, so the multiplying constant should be different. Nevertheless, we took the same measure on the available data, plotting the number of incoming links against the blogs, ordered by descending number of links.
Figure 3. Power law fit to experimental data. Blogs, ordered by number of incoming links, are represented on the x axis, while the number of links is represented on the y axis. Both axes are logarithmic in scale. Data have been fitted to a power law; however, the fit is not good, as can be seen in the graph. The best fit is to the function $g(x) = 627.019\,x^{-0.692869}$.
In the case of the studied blogs, during the time of observation, this power law phenomenon is not observed; the fit is not good, since the best positioned blogs receive too few links (the fitted curve lies above the experimental one), and the worst positioned too many (the experimental curve lies above the fitted one). In our opinion, this is an indication of the lack of maturity of the Spanish blogosphere, with no clear winners attracting most of the incoming links.
In this paper, we have tried to present blogs as a new communication medium, and then the results of our study, carried out over several months, of the Spanish-speaking blogs. Links are the main currency in the blogosphere and, by studying them, we can uncover some collective behavior, and measure it quite precisely. This study has shown the impressive increase in the number of blogs and posts, and the rise of some centers and other superstructures within that community. We have also tried to see how traditional media are seen within this blogosphere, and what kind of articles attract its attention. In the future, as we have done in the past [7], we will try to continue studying these phenomena; the Blogómetro will continue working, and we will try to draw some conclusions about the dynamic nature of the blogosphere, not just static snapshots such as this one.
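For readers who want to repeat this kind of measurement, the sketch below fits a power law to a ranked list of incoming-link counts by ordinary least squares on the logarithms of rank and count. The fitting procedure used for Figure 3 is not stated in the text, so this method is an assumption and may give somewhat different parameters; the function name fit_power_law and the input data are also illustrative, the latter being generated from the fitted curve itself rather than taken from the Blogómetro.

```python
import math


def fit_power_law(counts):
    """Fit counts[rank - 1] ~ a * rank**b by least squares in log-log space.

    counts must be positive and ordered by descending number of links.
    Returns the pair (a, b).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope of the line in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept, mapped back to linear scale
    return a, b


# Synthetic data lying exactly on the curve quoted in the caption of Figure 3.
synthetic = [627.019 * rank ** -0.692869 for rank in range(1, 201)]
a, b = fit_power_law(synthetic)
print(a, b)
```

On data that follow a power law exactly, the fit recovers the constant and the exponent; on real link counts, the residuals at the top and bottom ranks show where the curve departs from the law, as discussed above.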
[1] Eduardo Arcos. Do we live in a small world, May 2003.
[2] Awacate. ¿Qué es un blog?, May 2003.
[3] Rebecca Blood. Weblogs: a history and perspective, September 2000.
[4] Dave Winer. What makes a weblog a weblog?, May 2003.
[5] Bill Gates. Remarks by Bill Gates, Newspaper Association of America Annual Convention, Seattle, Washington, April 29, 2003.
[6] Víctor R. Ruiz. Blogs vs periodistas profesionales en el ABC, March 2003. The original article is no longer available.
[7] Fernando Tricas; Víctor R. Ruiz; Juan Julián Merelo. Do we live in an small world? Measuring the Spanish-speaking blogosphere. In Proceedings BlogTalk Conference, 2003.
[8] Jason Kottke. Screw the power law, embrace the power law.
[9] Clay Shirky. Power laws, weblogs, and inequality.