Vossiuspers UvA is an imprint of Amsterdam University Press.
This edition is established under the auspices of the Universiteit van Amsterdam.
This publication was made possible in part by a grant received from the Mondriaan Interregeling for the Digital Methods Initiative.
This is inaugural lecture 339, published in this series of the University of Amsterdam.
Cover design: Crasborn BNO, Valkenburg a/d Geul
Lay-out: JAPES, Amsterdam
Cover illustration: Carmen Freudenthal, Amsterdam
ISBN 978 90 5629 593 6
e-ISBN 978 90 4851 128 0
All rights reserved. Without limiting the rights under copyright reserved above, no part of this book may be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), without the written permission of both the copyright owner and the author of this book.
the Chair of New Media & Digital Culture
Situating Digital Methods in Internet Research
Arguably, there is an ontological distinction between the natively digital and the digitized; that is, between the objects, content, devices and environments ‘born’ in the new medium, as opposed to those which have ‘migrated’ to it. Should the current methods of study change, however slightly or wholesale, given the focus on objects and content of the medium? The research program proposed here thereby engages with ‘virtual methods’ importing standard practices from the social sciences and the humanities. The distinction between the natively digital and the digitized also could apply to current research methods. What type of Internet research may be performed with digitized methods (such as online surveys and directories) compared to those that are natively digital (such as recommendation systems and folksonomy)?
Second, I propose that Internet research may be put to new uses, given an emphasis on natively digital methods as opposed to the digitized. I will strive to shift the attention from the opportunities afforded by transforming ink into bits, and instead inquire into how research with the Internet may move beyond the study of online culture alone. How to capture and analyze hyperlinks, tags, search engine results, archived websites, and other digital objects? What may one learn from how online devices (e.g. engines and recommendation systems) make use of the objects, and how may such uses be repurposed for social and cultural research? Ultimately, I propose a research practice which grounds claims about cultural change and societal conditions in online dynamics, introducing the term ‘online groundedness.’ The overall aim is to rework method for Internet research, developing a novel path of study: digital methods.
To date, the methods employed in Internet research have served to critique the persistent idea of the Internet as a virtual realm apart. Such thinking arose from the discourse surrounding virtual reality in the late 1980s and early 1990s, and the Internet came to stand for a virtual realm, with opportunities for redefining consciousness, identity, corporality, community, citizenry and (social movement) politics.1 Indeed, in 1999, in one of the first efforts to synthesize Internet research, the communications scholar Steve Jones invited researchers to move beyond the perspective of the Internet as a realm apart, and opened the discussion of method.2 How would social scientists study the Internet, if they were not to rely on the approaches associated with it to date: human-computer interaction, social psychology and cybercultural studies?3 In their ground-breaking work on Internet usage in Trinidad and Tobago, the ethnographers Daniel Miller and Don Slater challenged the idea of cyberspace as a realm apart where all ‘inhabiting’ it experienced its identity-transforming affordances, regardless of physical location.4 Slater and Miller grounded the Internet, arguing that Trinis appropriated the medium to fit their own cultural practices. Although it was a case study, the overall thrust of the research was its potential for generalization. If Trinis were using the Internet to stage Trini culture, the expectation is that other cultures are doing the same.
The important Virtual Society? program (1997-2002) marked another turning point in Internet research, debunking the myth of cyberspace’s transformative capacities through multiple empirical studies about Internet users. The program ultimately formulated five ‘rules of virtuality’.5 In what is now the classic digital divide critique, researchers argued that the use of new media is based on one’s situation (access issues), and the fears and risks are unequally divided (skills issues). With respect to the relationship between the real and the virtual, virtual interactions supplement rather than substitute for the ‘real,’ and stimulate more real interaction, as opposed to isolation and desolation. Finally, the research found that identities are grounded in both the online and the offline. Significantly, the program settled on approaches subsequently characterized as virtual methods, with an instrumentarium for studying users. Surveys, interviews, observation and participant-observation became the preferred methods of inquiry. In the humanities, subsequent user studies – concentrating on the amateur, the fan, and the ‘produser’ – also are grappling with the real and virtual divide, seeking to demonstrate and critique the reputational status of online culture.6 The argument advanced here is that virtual methods and user studies in the social sciences and the humanities have shifted the attention away from the data of the medium, and the opportunities for study of far more than online culture.
How may one rethink user studies with data (routinely) collected by software? User studies to date have relied on accounts favoring observation, interviews and surveys, owing, in one reading, to the difference in armatures between social scientific and humanities computing, on the one hand, and the large commercial companies, with their remarkable data collection achievements, on the other. In a sense, Google, Amazon and many other dominant Web devices are already conducting user studies, however infrequently the term is used. User inputs (preferences, search history, purchase history, location) are captured and analyzed so as to tailor results. Taking a lead from such work, new media theorist Lev Manovich has called for a methodological turn in Internet research, at least in the sense of data collection. With ‘cultural analytics,’ named after Google Analytics, the proposal is to build massive collection, storage and analytical facilities for humanities computing.7 One distinguishing feature of the methodological turn is its marked departure from the reliance on (negotiated) access to commercial data sets, e.g. AOL’s set of users’ search engine queries, Linden Lab’s set of the activities of millions of users in Second Life, or Sony’s for Everquest, however valuable the findings have been.8
In a sense, the research program is one answer to the question, what would Google do? The programs could be situated in the larger context of the extent and effects of ‘googlization’. Until now, the googlization critique has examined the growing ‘creep’ of Google, its business model and its aesthetics, across information and knowledge industries.9 Library science scholars in particular concern themselves with the changing locus of access to information and knowledge (from public shelves and stacks to commercial servers). The ‘Google effect’ also may be couched in terms of supplanting surfing and browsing with search. It also may be studied in terms of the demise of the expert editor, and the rise of the back-end algorithm, themes to which I will return later. Here, however, the point is that such effects also may be studied in terms of models for research – ones that seek to replicate the scale of data collection as well as analysis.
The proposal I am putting forward is more modest, yet still in keeping with what are termed registrational approaches to user studies. Online devices and software installed on the computer (e.g. browsers) register users’ everyday usage. Browser histories would become a means to study use. The larger contention is that data collection, in the methodological turn described above, could benefit from thinking about how computing may have techniques which can be appropriated for research. Thus the proposal is to consider first and foremost the availability of computing techniques.
I would like to suggest inaugurating a new era in Internet research, which no longer concerns itself with the divide between the real and the virtual. It concerns a shift in the kinds of questions put to the study of the Internet. The Internet is employed as a site of research for far more than just online culture. The issue no longer is how much of society and culture is online, but rather how to diagnose cultural change and societal conditions using the Internet. The conceptual point of departure for the research program is the recognition that the Internet is not only an object of study, but also a source. Knowledge claims may be made on the basis of data collected and analyzed by devices such as search engines. One of the more remarkable examples is Google Flu Trends, a non-commercial (Google.org) project launched in 2008, which anticipates local outbreaks of influenza by counting search engine queries for flu, flu symptoms and related terms, and ‘geo-locating’ the places where the queries have been made. It thereby challenges existing methods of data collection (emergency room reports), and reopens the discussion of the Web as anticipatory medium, far closer to the ground than one might expect.10
Where did the ‘grounded Web,’ and its associated geo-locative research practice, originate? The ‘end of cyberspace’ as a placeless space (as Manuel Castells put it) may be located in the technical outcomes of the famous Yahoo lawsuit, brought by two non-governmental organizations in France in 2000.11 At the time, French Web users were able to access the Nazi memorabilia pages on Yahoo.com in the United States, and the French organizations wanted the pages blocked – in France. IP-to-geo (address location) technology was developed specifically to channel content nationally; when one types google.com into a browser in France, now google.fr is returned by default. This ‘grounding’ of the Web has been implemented by major content-organizing projects such as YouTube; online television is served geographically, too.
Diagnostic work such as Google Flu Trends, whereby claims about societal conditions are made on the basis of captured Internet practices, leads to new theoretical notions. For the third era of Internet research, the digital methods program introduces the term online groundedness, in an effort to conceptualize research which follows the medium, captures its dynamics, and makes grounded claims about cultural and societal change. Indeed, the broader theoretical goal of digital methods is to rethink the relationship between the Web and the ground. Like the ethnographers before them, the researchers in the UK Virtual Society? program needed to visit the ground in order to study the Web. Here the digital methods research program actually complicates the sequence in which one’s findings are grounded.12 For example, journalism has methodological needs, now that the Internet has become a significant meta-source, where the traditional question normally concerns the trustworthiness of a source. Snowballing from source to source was once a social networking approach to information-checking, methodologically speaking. Who else should I speak to? That question comes at the conclusion of the interview, if trust has been built. The relationship between ‘who I should speak to’ and ‘who else do you link to’ is asymmetrical for journalism, but the latter is what search engines ask when recommending information. How to think through the difference between source recommendations from verbal and online links? Is search the beginning of the quest for information, ending with some grounded interview reality beyond the net, whereby we maintain the divide between the real and the virtual? Or is that too simplistic? Our ideal source set divide (real and virtual, grounded or googled) raises the question of what comes next. What do we ‘look up’ upon conclusion of the interview to check the reality? The Internet may not be changing the hierarchy of sources for some (e.g. the restrictions on citing Wikipedia in certain educational settings), but it may well be changing the order of checking, and the relationship of the Web to the ground.
I developed the notion of online groundedness after reading a study conducted by the Dutch newspaper NRC Handelsblad. The investigation into right-wing and extremist groups in the Netherlands explored whether the language used was becoming harsher over time, perhaps indicating a ‘hardening’ of right-wing and hate culture more generally. Significantly, the investigators elected to use the Internet Archive, over an embedded researcher (going native), or the pamphlets, flyers and other ephemera at the Social History Institute.13 They located and analyzed the changes in tone over time on right-wing as well as extremist sites, finding that right-wing sites were increasingly employing more extremist language. Thus the findings made about culture were grounded through an analysis of websites. Most significantly, the online became the baseline against which one might judge a societal condition.
Follow the Medium: The Digital Methods Research Program
Why follow the medium? A starting point is the recognition that Internet research is often faced with unstable objects of study. The instability is often discussed in terms of the ephemerality of websites and other digital media, and the complexities associated with fixing them, to borrow a term from photography. How to make them permanent, so that they can be carefully studied? In one approach, vintage hardware and software are maintained so as to keep the media ‘undead.’ Another technique, as practiced in game environments, addresses ephemerality through simulation/emulation, which keeps the nostalgic software, like Atari games, running on current hardware. The ephemerality issue, however, is much larger than issues of preservation. The Internet researcher is often overtaken by events of the medium, such as software updates that ‘scoop’ one’s research.
As a research practice, following the medium, as opposed to striving to fix it, may also be discussed in a term borrowed from journalism and the sociology of science – ‘scooping.’ Being the first to publish is to ‘get the scoop.’ ‘Being scooped’ refers to someone else having published the findings first. Sociologist of science Michael Lynch has applied this term to the situation in which one’s research subjects come to the same or similar conclusions as the researchers, and go on record with their findings first. The result is that the ‘[research subjects] reconfigure the field in which we previously thought our study would have been situated’.14 In Internet research, being scooped is common. Industry analysts, watchdogs and bloggers routinely coin terms (googlization) and come to conclusions which shape ongoing academic work. I would like to argue, however, that scooping is also done by the objects themselves, which are continually reconfigured. For example, Facebook, the social networking site, has been considered a ‘walled garden’ or relatively closed community system, where by default only ‘friends’ can view information and activities concerning other friends. The walled garden is a series of concentric circles: a user must have an account to gain access, must ‘friend’ people to view their profiles, and must change privacy default settings to let friends of friends view one’s own profile. Maximum exposure is opening profiles to friends of friends. In March 2009, Facebook changed a setting; users may now make their profile open to all other users with accounts, as opposed to just friends, or friends of friends, as in its previous configuration. Which types of research would be ‘scooped’ by Facebook’s flipping a switch? Facebook serves as one notable example of the sudden reconfiguration of a research object, which is common to the medium.
More theoretically, following the medium is a particular form of medium-specific research. Medium specificity is not only how one sub-divides disciplinary commitments in media studies according to the primary objects of study: film, radio, television, etc. It is also a particular plea to take seriously ontological distinctiveness, though the means by which the ontologies are constructed differ. To the literary scholar and media theorist Marshall McLuhan, media are specific in how they engage the senses.15 Depth, resolution and other aesthetic properties have effects on how actively or passively one processes media. One is filled by media, or one fills it in. To the cultural theorist Raymond Williams, medium specificity lies elsewhere. Media are specific in the forms they assume – forms shaped by the dominant actors to serve interests.16 For example, creating ‘flow,’ the term for how television sequences programming so as to keep viewers watching, boosts viewer ratings and advertising. Thus, to Williams, media are not a priori distinct from one another, but can be made so. To Katherine Hayles, the specificity of media resides in their materiality; a book specifies, whilst text does not.17 Her proposal for media-specific analysis is a comparative media studies program, which takes materially instantiated characteristics of media (such as hypertext in digital media), and enquires into their (simulated) presence in other media (such as print). One could take other media traits and study them across media. For example, as Alexander Galloway has argued, flow is present not only in radio and television, but also on the Web, where dead links disrupt surfing.18
Hayles’s point of departure may be seen in Matthew Fuller’s work on Microsoft Word and Adobe Photoshop, which studies how particular software constrains or enables text.19 To Fuller, a Microsoft document or a Photoshop image are specific outputs of software, distinct from some document or some image. An accompanying research program would study the effects of (software) features, as Lev Manovich also points to in his work on the specificity of computer media. With these media Manovich’s ontology moves beyond the outputs of media (Hayles’s hypertextual print, Fuller’s Word document and Photoshop image).20 Computer media are metamedia in that they incorporate prior media forms, which is in keeping with the remediation thesis put forward by Jay David Bolter and Richard Grusin.21 Yet, to Manovich, computer media not only refashion the outputs of other media; they also embed their forms of production.
The medium specificity put forward here lies not so much in McLuhan’s sense engagement, Williams’s socially shaped forms, Hayles’s materiality, or other theorists’ properties and features. Rather, it is situated in method. Previously I described such work as ‘Web epistemology’.22 On the Web, information, knowledge and sociality are organized by recommender systems – algorithms and scripts that prepare and serve up orders of URLs, media files, friends, etc. In a sense, Manovich has shifted the discussion in this direction, both with the focus on forms of production (method as craft) as well as with the methodological turn associated with the cultural analytics initiative. I would like to take this turn further, and propose that the under-interrogated methods of the Web also are worthy of study, both in and of themselves as well as in the effects of their spread to other media, e.g. TV shows recommended to TiVo users on the basis of their profiles.
Initial work in the area of Web epistemology arose within the context of the politics of search engines.23 It sought to consider the means by which sources are adjudicated by search engines. Why, in March of 2003, were the US White House, the Central Intelligence Agency, the Federal Bureau of Investigation, the Heritage Foundation and news organizations such as CNN the top returns for the query ‘terrorism’? The answer lies somewhat in how hyperlinks are handled. Hyperlinks, however, are but one digital object, to which may be added: the thread, tag, PageRank, Wikipedia edit, robots.txt, post, comment, trackback, pingback, IP address, URL, whois, timestamp, permalink, social bookmark and profile. In no particular order, the list goes on. The proposal is to study how these objects are handled, specifically, in the medium, and learn from medium method.
In the following, I would like to introduce a series of medium objects, devices, spaces, as well as platforms, first touching briefly on how they are often studied with digitized methods and conceptual points of departure from outside the medium. Subsequently, I would like to discuss the difference it makes to research if one were to follow the medium – by learning from and reapplying how digital objects are treated by devices, how websites are archived, how search engines order information and how geo-IP location technology serves content nationally or linguistically. What kinds of research can be performed through hyperlink analysis, repurposing insights from dominant algorithms? How to work with the Internet Archive for social research? Why capture website histories? How may search engine results be studied so as to display changing hierarchies of credibility, and the differences in source reliance between the Web, the blogosphere and the news sphere? Can geo-IP address location technology be reworked so as to profile countries and cultures? How may the study of social networking sites reveal cultural tastes and preferences? How are software robots changing how quality content is maintained on Wikipedia? What would a research bot do? Thus, from the micro to the macro, I treat the hyperlink, website, search engine and spheres (including national webs). I finally turn to social networking sites, as well as Wikipedia, and seek to learn from these profiling and bot cultures (respectively), and rethink how to deploy them analytically. The overall purpose of following the medium is to reorient Internet research to consider the Internet as a source of data, method and technique.
How is the hyperlink most often studied? There are at least two dominant approaches to studying hyperlinks: hypertext literary theory and social network theory, including small world and path theory.24 To literary theorists of hypertext, sets of hyperlinks form a multitude of distinct pathways through text. The surfer, or clicking text navigator, may be said to author a story by choosing routes (multiple clicks) through the text.25 Thus the new means of authorship, as well as the story told through link navigation, are both of interest. For small world theorists, the links that form paths show distance between actors. Social network analysts use pathway thought, and zoom in on how the ties, uni-directional or bi-directional, position actors.26 There is a special vocabulary that has been developed to characterize an actor’s position, especially an actor’s centrality, within a network. For example, an actor is ‘highly between’ if there is a high probability that other actors must pass through him to reach each other.
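The notion of being ‘highly between’ can be made concrete with a small sketch. What follows is one common formalization from social network analysis (shortest-path betweenness), offered as an illustration rather than as the measure used in any study cited here. The function names and the toy network are hypothetical.

```python
from collections import deque
from itertools import permutations


def bfs_counts(graph, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                # First time we reach nxt: record its distance.
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                # node lies on a shortest path to nxt.
                sigma[nxt] += sigma[node]
    return dist, sigma


def betweenness(graph):
    """Sum, over ordered pairs (s, t), the share of shortest s-t paths passing through v.

    graph maps every actor to a list of the actors it is tied to.
    Because pairs are ordered, an undirected tie structure counts each pair twice.
    """
    info = {n: bfs_counts(graph, n) for n in graph}
    score = {n: 0.0 for n in graph}
    for s, t in permutations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # t cannot be reached from s
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical star network: every path between b, c and d runs through a.
star = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
scores = betweenness(star)
```

In the toy network the hub is the only actor with a non-zero score, which is the sense in which others ‘must pass through him to reach each other.’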
How do search engines treat links? Arguably, theirs is a scientometric (and associational sociology) approach. As with social network analysis, the interest is in actor positioning, but not necessarily in terms of distance from one another, or the means by which an actor may be reached through networking. Rather, ties are reputational indicators, and may be said to define actor standing. Additionally, the approach does not assume that the ties between actors are friendly, or otherwise have utility, in the sense of providing empowering pathways, or clues for successful networking.
Here I would like to explore how engines treat links as markers of impact and reputation. How may an actor’s reputation be characterized by the types of hyperlinks given and received? Actors can be profiled not only through the quantity of links received, and the quantity received from others who themselves have received many links, as in the basic search engine algorithm; they may also be profiled by examining which particular links they give and receive.27 In previous research, my colleagues and I found linking tendencies among domain types, i.e. governments tend to link to other governmental sites only; non-governmental sites tend to link to a variety of sites, occasionally including critics. Corporate websites tend not to link, with the exception of collectives of them – industry trade sites and industry ‘front groups’ do link, though. Academic and educational sites typically link to partners and initiatives they have created. Taken together, these linking proclivities of organizational types display an everyday ‘politics of association’.28 For example, in work my colleagues and I conducted initially in 1999, we found that while Greenpeace linked to governmental sites, government did not link back. Novartis, the multinational corporation, linked to Greenpeace, and Greenpeace did not link back. When characterizing an actor according to inlinks and outlinks, one notices whether there is some divergence from the norms, and more generally whether particular links received may reveal something about an actor’s reputation. A non-governmental organization receiving a link from a governmental site could be construed as a reputation booster, for example.29
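Profiling an actor by inlinks and outlinks, and noticing which links are not returned, can be sketched in a few lines. The sketch below assumes that the links have already been captured as (source, target) pairs; the function names and the edge list are hypothetical, and the edges merely echo the 1999 example rather than reproduce its data.

```python
def link_profile(links, actor):
    """List whom an actor links to (outlinks) and who links to the actor (inlinks)."""
    link_set = set(links)
    return {
        "outlinks": sorted(t for s, t in link_set if s == actor),
        "inlinks": sorted(s for s, t in link_set if t == actor),
    }


def unreciprocated_links(links):
    """Return the links (source, target) for which no link runs back from target to source."""
    link_set = set(links)
    return sorted((s, t) for s, t in link_set if (t, s) not in link_set)


# Illustrative edge list only; the two NGO entries are invented to show a returned link.
links = [
    ("greenpeace", "government"),
    ("novartis", "greenpeace"),
    ("ngo_a", "ngo_b"),
    ("ngo_b", "ngo_a"),
]

profile = link_profile(links, "greenpeace")
one_way = unreciprocated_links(links)
```

Read against the norms per organizational type, the one-way links are where the ‘politics of association’ becomes visible.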
Apart from capturing the micro-politics of hyperlinks, analysis of links also may be put to use in more sophisticated sampling work. Here the distinction between digitized and natively digital method stands out in greater relief. The Open Net Initiative at the University of Toronto conducts Internet censorship research by building lists of websites (from online directories such as the Open Directory Project and Yahoo). The researchers subsequently check whether the sites are blocked in a variety of countries. It is important work that sheds light on the scope as well as technical infrastructure of state Internet censorship practices worldwide.30 In the analytical practice, sites are grouped by category: famous bloggers, government sites, human rights sites, humor, women’s rights, etc.; there are approximately forty categories. Thus censorship patterns may be researched by site type across countries.
The entire list of websites checked per country (some 3000) is a sample, covering of course only the smallest fraction of all websites as well as those of a particular subject category. How would one sample websites in a method following the medium, learning from how search engines work (link analysis) and repurposing it for social research? My colleagues and I contributed to the Open Net Initiative work by employing a method which crawls all the websites in a particular category, captures the hyperlinks from the sites, and locates additional key sites (by co-link analysis) that are not on the lists. I dubbed the method ‘dynamic URL sampling’, in an effort to highlight the difference between manual URL-list compilation, and more automated techniques of finding significant URLs. Once the new sites are found, they are checked for connection stats (through proxies initially, and later perhaps from machines located in the countries in question), in order to determine whether they are blocked. In the research project on ‘social, political and religious’ websites in Iran, researchers and I crawled all the sites in that ONI category, and through hyperlink analysis, found some thirty previously unknown blocked sites. Significantly, the research was also a page-level analysis (as opposed to host only), with one notable finding being that Iran was not blocking the BBC news front page (as ONI had found), but only its Persian-language page. The difference between the two methods of gathering lists of websites for analysis – manual directory-style work and dynamic URL sampling – shows the contribution of medium-specific method.
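The co-link step in dynamic URL sampling may be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the research: the crawl is assumed to have been done already and delivered as a mapping from each listed site to the URLs it links to, the threshold of two linking sites is an assumption, and the sketch works at host level whereas the research described above also worked at page level. All names and URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Reduce a URL to its lower-cased host name."""
    return urlparse(url).netloc.lower()


def colink_candidates(outlinks_by_seed, min_seeds=2):
    """Find hosts, absent from the starting list, that receive links from several listed sites.

    outlinks_by_seed maps each listed (seed) URL to the URLs found on that site.
    min_seeds is the assumed co-link threshold.
    """
    seed_hosts = {host_of(seed) for seed in outlinks_by_seed}
    counts = Counter()
    for targets in outlinks_by_seed.values():
        # Count each target host at most once per linking site.
        hosts = {host_of(t) for t in targets} - seed_hosts
        hosts.discard("")  # relative links carry no host
        counts.update(hosts)
    return sorted(h for h, n in counts.items() if n >= min_seeds)


# Hypothetical crawl output for three listed sites.
crawl = {
    "http://seed-one.example/": ["http://new-site.example/a", "http://seed-two.example/"],
    "http://seed-two.example/": ["http://new-site.example/b", "http://other.example/"],
    "http://seed-three.example/": ["http://other.example/x", "/about"],
}

candidates = colink_candidates(crawl)
```

The hosts returned are the newly found candidates, which would then be checked for blocking in the countries in question.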
Up until now, investigations into websites have been dominated by user and ‘eyeball studies,’ where attempts at a navigation poetics are met with such sobering ideas as ‘don’t make me think’.31 Many methods for studying websites are located over the shoulder, where one observes navigation or the use of a search engine, and later conducts interviews with the subjects. In what one may term classic registrational approaches, a popular technique is eye-tracking. Sites load and eyes move to the upper left of the screen, otherwise known as the golden triangle. The resulting heat maps provide site redesign cues. For example, Google.com has moved its services from above the search box (tabs) to the top left corner of the page (menu). Another dominant strand of website studies lies in feature analysis, where sites are compared and contrasted on the basis of levels of interactivity, capacities for user feedback, etc.32 The questions concern whether a particular package of features results in more users, and more attention. In this tradition, most notably in the 9/11 special collection, websites are often archived for further study. Thus much of the work lies in the archiving of sites prior to the analysis.
One of the crucial tasks ahead is further reflection upon the means by which websites are captured and stored, so as to make available the data upon which findings are based. Thus the digital methods research program engages specifically with the website as archived object, made accessible, most readily, through the Internet Archive’s Wayback Machine. The research program specifically asks which types of website study are enabled and constrained by the Wayback Machine.
In order to answer that question, the work first deconstructs, or unpacks, the Internet Archive and its Wayback Machine. In which sense does the Internet Archive, as an object formed by the archiving process, embed particular preferences for how it is used, and for the type of research performed using it? Indeed, Web archiving scholar Niels Brügger has written: ‘[U]nlike other well-known media, the Internet does not simply exist in a form suited to being archived, but rather is first formed as an object of study in the archiving, and it is formed differently depending on who does the archiving, when, and for what purpose.’33 The idea that the object of study is constructed by the very means by which it is tamed, and captured by method and technique, is a classic point from the sociology and philosophy of science and elsewhere.34 Thus the initial research questions are, which methods of research are privileged by the specific form assumed by the Web archive, and which are precluded? For example, when one uses the Internet Archive (archive.org), what stands out for everyday Web users accustomed to search engines is not so much the achievement of the existence of an archived Internet. Rather, the user is struck by how the Internet is archived, and, particularly, how it is queried. One queries a URL, as opposed to keywords, and receives a list of stored pages associated with the URL from the past. In effect, the Internet Archive, through the interface of the Wayback Machine, has organized the story of the Web into the histories of individual websites.
Which research approaches are favored by the current organization of websites by the Internet Archive? With the Wayback Machine, one can study the evolution of a single page (or multiple pages) over time; for example, by reading or collecting snapshots from the dates when a page was indexed. How can such an arrangement of historical sites be put to use? Previously I mentioned the investigative reporting work done by NRC Handelsblad in their analysis of the rise of extremist language in the Netherlands. The journalists read some hundred websites from the Internet Archive, some dating back a decade. It is work that should be built upon, methodologically as well as technically. One could scrape the pages of the right-wing and extremist sites from the Internet Archive, place the text (and images) in a database, and systematically query it for the presence of particular keywords over time. As NRC Handelsblad did, one could determine changes in societal conditions through the analysis of particular sets of archived sites.
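Querying archived pages for particular keywords over time might look like the sketch below. It assumes the snapshots have already been retrieved and reduced to (year, text) pairs; the retrieval itself, the choice of keywords, and the normalization per thousand words are assumptions made for illustration, and only single-word keywords are matched. The function name and the placeholder texts are hypothetical.

```python
import re
from collections import defaultdict


def keyword_rate_by_year(snapshots, keywords):
    """Return {year: keyword occurrences per 1000 words} for archived page texts.

    snapshots is an iterable of (year, text) pairs; keywords are single words.
    """
    terms = {k.lower() for k in keywords}
    hits = defaultdict(int)
    words = defaultdict(int)
    for year, text in snapshots:
        tokens = re.findall(r"\w+", text.lower())
        words[year] += len(tokens)
        hits[year] += sum(1 for tok in tokens if tok in terms)
    # Normalize by text volume so that years with more archived text are comparable.
    return {y: 1000 * hits[y] / words[y] for y in sorted(words) if words[y]}


# Placeholder texts stand in for archived pages.
snapshots = [
    (2000, "alpha beta gamma delta"),
    (2000, "alpha alpha beta beta"),
    (2005, "alpha alpha alpha beta"),
]

rates = keyword_rate_by_year(snapshots, ["alpha"])
```

A rising rate across years would be one rough indicator of the kind of change in tone the journalists read for by hand.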
How else to perform research with the Internet Archive? The digital methods program has developed means to capture the history of sites by taking snapshots and assembling them into a movie, in the style of time-lapse photography.35 To demonstrate how to use the Internet Archive for capturing such evolutionary histories, my colleagues and I took snapshots of Google’s front pages from 1998 up to the end of 2007. The analysis concerned the subtle changes made to the interface, in particular the tabs. We found that the Google directory project, organizing the Web by topic, undertaken by human editors, has been in decline. After its placement on the Google front page in 2001, it was moved in 2004 under the ‘more’ button, and in 2006 under ‘even more.’ By late 2007, with the removal of the ‘even more’ option, one had to search Google in order to find its directory.36 The larger issue of the demise of the human editor, read in this case from the evolution of Google’s interface, has far-reaching implications for how knowledge is collected and ordered. Indeed, after examining Google, researchers and I turned to Yahoo, the original Web directory, and found that there, too, the directory had been replaced by the back-end algorithm. In examining the outputs of a query in the directory, we also learned that at Yahoo the results are no longer ordered alphabetically, in the egalitarian style of information and source ordering inherited from encyclopedias. Yahoo is listing its directory sources according to popularity, in the well-known style of recommendation systems more generally.
Are the histories of search engines, captured from their interface evolutions, indicating changes in how information and knowledge are ordered more generally? A comparative media studies approach would be useful, with one of the more poignant cases being the online newspaper. With the New York Times, for example, articles are still placed on the front page and in sections, but are also listed by ‘most emailed’ and ‘most blogged’, providing a medium-specific recommender system for navigating the news. The impact of recommender systems – the dominant means on the Web by which information and knowledge are ordered – may also be studied through user expectations. Are users increasingly expecting Web-like orderings at archives, libraries, tourist information centers and other sites of knowledge and information queries?
The study of search engines was jolted by the now infamous AOL search engine data release in 2006, where 500,000 users’ searches over three months were put online, with frightening and often salacious press accounts about the level of intimate detail revealed about searchers, even if their histories are made anonymous and decoupled from geography (no IP address). One may interpret the findings from the AOL case as a shift in how one considers online presence, if that remains the proper term. A person may be ‘googled’, and his or her self-authored presence often appears at or towards the top of the returns. Generally speaking, what others have written about a person would appear lower down in the rankings. However, with search engine queries stored, a third set of traces could come to define an individual. This opens up intriguing policy questions. How long may an engine company keep search histories? Thus search engines are being studied in the legal arena, especially in terms of how data retention laws may be applied to search histories.
Previously, I mentioned another strand in search engine studies, summed up in the term googlization. It is a political-economy style critique, considering how Google’s free-service-for-profile model may be spreading across industries and (software) cultures. I have covered the critique elsewhere, striving to propose a research agenda for googlization scholars which includes front-end and back-end googlization. Front-end googlization would include the study of the information politics of the interface (including the demise of the human-edited directory). Back-end googlization concerns the rise of the algorithm that recommends sources hierarchically, instead of alphabetically, as mentioned above. The significance of studying the new information hierarchies of search engines also should be viewed in light of user studies. A small percentage of users set preferences to more than ten results per page; typically they do not look past the first page of results; and they increasingly click the results appearing towards the top.37 Thus the power of search engines lies in the combination of their ranking practices (source inclusion in the top results) together with the users’ apparent ‘respect’ for the orderings (not looking further). Google’s model also relies on registrational interactivity, where a user’s preferences as well as history are registered, stored and employed, increasingly, to serve customized results. Prior to the Web and search engine algorithms and recommendation systems, interactivity was ‘consultational,’ with pre-loaded information ‘called up’.38 A query would return the same information for all users at any given time. Now the results are dynamically generated, based on one’s registered preferences, history and location.
The different orders of sources and things served by engines are under-studied, largely because they are not stored, and made available for research, apart from the AOL data release, or other negotiated agreements with search engine companies. Google once made available an API (application programming interface) allowing data collection. A limited number of queries could be made per day, and the results repurposed. Researchers relying on the API were scooped by Google when it discontinued the service in late 2006. With its reintroduction in a different form in 2009, Google emphasized, however, that automated queries and the permanent storage of results violated the terms of service. How to study search engine results under such conditions? Now we scrape Google, and post a notice appreciating Google’s forbearance.39
What may be found in Google’s search engine results? As I have remarked, search engines, a crucial point of entry to the Web, are epistemological machines in the sense that they crawl, index, cache and ultimately order content. Earlier I described the Web, and particularly a search engine-based Web, as a potential collision space for alternative accounts of reality.40 The phrasing built upon the work of the sociologist C. Wright Mills, who characterized the purpose of social research as ‘no less than to present conflicting definitions of reality itself’.41 Are engines placing alternative accounts of reality side by side, or do the results align with the official and the mainstream? Storing and analyzing search engine results could answer such questions. Such has been the purpose of the software project called the Issue Dramaturg, so called for the potential drama within the top results, whereby sites may climb to or suddenly fall from the top. It is important to point out that top engine placements are highly sought after; organizations make use of search engine optimization techniques so as to boost site visibility. There are white hat and black hat techniques; that is, those accepted by engines and those that prompt engines to delist websites from results until there is compliance again with engine etiquette.
In the Issue Dramaturg project, my team stored Google search engine results for the query ‘9/11’ as well as other keywords for two purposes. The one is to enquire into source hierarchies, as described above. Which sources are privileged? Which are ‘winning’ the competition to be the top sources returned for particular queries? The other purpose has been to chart particular sources, in the approach to engine studies I have termed ‘source distance’. For the query 9/11, how far from the top of the engine returns are such significant actors as the New York City government and the New York Times? Are such sources prominent, or do they appear side by side with sources that challenge more official and familiar views? Apart from the New York City government and the New York Times, another actor we have monitored is the 9/11 truth movement (911truth.org). For months between March and September 2007, the 9/11 truth movement’s site appeared in the top five results for the query 9/11, and the other two were well below result fifty. In mid-September 2007, around the anniversary of the event, there was drama. 911truth.org fell precipitously to result two hundred, and subsequently out of the top one thousand, the maximum number of results served by Google. We believe that is one of the first fully documented cases of the apparent removal of a website in Google – from a top five placement for six months to a sub-one thousand ranking.42 The case leads to questions of search engine result stability and volatility, and opens up an area of study.
However dominant it may be, there are more search engines than Google’s Web search. What is less appreciated perhaps is that there are other dominant engines per section or sphere of the Web. For the blogosphere, there is Technorati; for the newssphere, Google News; and for the tagosphere or social bookmarking space, Delicious. Indeed, thinking of the Web in terms of spheres refers initially to the name of one of the most well-known, the blogosphere, as well as to scholarship that seeks to define another realm, the Web sphere.43 The sphere in blogosphere refers in spirit to the public sphere; it also may be thought of as the geometrical form, where all points on the surface are the same distance from the center or core. One could think about such an equidistant measure as an egalitarian ideal, where every blog, or even every source of information, is knowable by the core, and vice versa. On the Web, however, it has been determined that certain sources are central. They receive the vast majority of links as well as hits. Following such principles as the rich get richer (aka Pareto power law distributions), the sites receiving attention tend to garner only more. The distance between the center and other nodes may only grow, with the ideal of a sphere being a fiction, though a useful one. I would like to suggest an approach examining the question of distance from core to periphery, and operationalize it as the measure of differences in rankings between sources per sphere. Cross-spherical analysis is a digital method for measuring and learning from the distance between sources in different spheres on the Web.
Conceptually, a sphere is considered to be a device-demarcated source set, i.e. the pure PageRank of all sources on the Web (most influential sites by inlink count), or indeed analogous pageranks of all sources calculated by the dominant engines per sphere, such as Technorati, Google News and Delicious. Thus, to study a sphere, we propose first to allow the engines to demarcate it. In sphere analysis, one considers which sources are most influential, not only overall but per query. Cross-spherical analysis compares the sources returned by each sphere for the same query. It can therefore be seen as comparative ranking research. Most importantly, with cross-spherical analysis, one may think through the consequences of each engine’s treatment of links, freshness, tags, etc. Do particular sources tend to be in the core of one sphere, and not in others? What do comparisons between sources, and source distances, across the spheres tell us about the quality of the new media? What do they tell us about current informational commitments in particular cultures?
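The comparison of rankings across spheres can be sketched in a few lines. The sketch assumes that each engine’s results for the same query have already been collected as an ordered list of sources; the sphere labels and the function name are hypothetical, chosen only for illustration.

```python
def cross_spherical_table(rankings):
    """Line up the rank of every source in every sphere for one query.

    rankings: dict mapping a sphere label (e.g. "web", "news", "blogs")
              to that sphere's ordered list of sources, top result first.
    Returns a dict: source -> {sphere: 1-based rank, or None if absent}.
    """
    # Collect sources in order of first appearance, without duplicates.
    sources = []
    for ranked in rankings.values():
        for source in ranked:
            if source not in sources:
                sources.append(source)

    table = {}
    for source in sources:
        table[source] = {
            sphere: (ranked.index(source) + 1 if source in ranked else None)
            for sphere, ranked in rankings.items()
        }
    return table
```

A source ranked near the top in one sphere and absent (None) in another is a candidate for the core-versus-periphery questions raised above. How to weigh a missing source against a low-ranked one is left open in this sketch.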
In a preliminary analysis, my colleagues and I studied which animals are most associated with climate change on the (English-language) Web, in the news and in the blogosphere. We found that the Web has the most diverse set of animals associated with climate change. The news favored the polar bear, and the blogosphere amplified, or made more prominent, the selection in the news sphere. Here we cautiously concluded that the Web may be less prone to the creation of media icons than the news, which has implications for studies of media predicated upon a publicity culture. The blogosphere, moreover, appeared parasitically connected to the news as opposed to providing an alternative to it.
As mentioned above, Internet research has been haunted by the virtual/real divide. One of the reasons for such a divide pertains to the technical arrangements of the Internet, and how they became associated with a virtual realm, cyberspace. Indeed, there was meant to be something distinctive about cyberspace, technologically.44 The protocols and principles, particularly packet switching and the end-to-end principle, initially informed the notion of cyberspace as a realm free from physical constraints. The Internet’s technical indifference to the geographical location of its users spawned ideas not limited to placeless-ness. In its very architecture, the Internet also supposedly made for a space untethered to the nation-states, and their divergent ways of treating flows of information. One recalls the famous quotation attributed to John Gilmore, co-founder with John Perry Barlow of the Electronic Frontier Foundation: ‘The Internet treats censorship as a malfunction, and routes around it’.45 Geography, however, was built into cyberspace from the beginning, if one considers the locations of the original thirteen root servers, the unequal distributions of traffic flows per country, as well as the allotment of IP addresses in ranges, which later enabled the application of geo-IP address location technology to serve advertising and copyright needs. Geo-IP technology, as well as other technical means (aka locative technology), also may be put to use for research that takes the Internet as a site of study, and inquires into what may be learned about societal conditions across countries. In the digital methods research program, my colleagues and I have dubbed such work national Web studies.
Above I discussed the research by British ethnographers, who grounded cyberspace through empirical work on how Caribbean Internet users appropriated the medium to fit their own cultural practices. This is of course national Web studies, although with observational methods (from outside of the medium). To study the Web, nationally, one also may inquire into data routinely collected by large enterprises, such as Alexa’s top sites by country (according to traffic). Which sites are visited most frequently per country, and what does site visitation say about a country’s informational culture? Alexa pioneered registrational data collection with its toolbar, which users installed in their browsers. The toolbar provided statistics about the Website loaded in the browser, such as its freshness. All websites the user loaded, or surfed, also would be logged, and the logged URLs would be compared with the URLs already in the Alexa database. Those URLs not in the database would be crawled, and fetched. Thus was born the Internet Archive.
The Internet Archive (1996- ) was developed during the period of Internet history that one could term cyberspace. (I have developed periodizations of Internet history elsewhere, and will not further elaborate here.)46 To illustrate the design and thought behind the Internet Archive, and the national Web archives sprouting up in many countries, it may be useful to point out that the Internet Archive was built for surfing – an Internet usage type that arguably has given way to search.47 At the Wayback Machine of the Internet Archive, type in a single URL, view available pages, and browse them. If one reaches an external link, the Internet Archive looks up the page closest in date to the site one is exiting, and loads it. If no site exists in the Internet Archive, it connects to the live website. It is the continuity of flow, from Website to Website, that is preserved.48 National Web archives, on the other hand, have ceased to think of the Web in terms of cyberspace. Instead, their respective purposes are to preserve national Webs. For the purposes of contributing method to Internet research, the initial question is, how would one demarcate a national Web?
At the National Library in the Netherlands, for example, the approach is similar to that of the Internet censorship researchers, discussed above. It is a digitized method, that is, a directory model, where an expert chooses significant sites based on editorial criteria. These sites are continually archived with technology originally developed in the Internet Archive project. At the time of writing, approximately one thousand national websites are archived in the Netherlands – a far cry from what is saved in the Internet Archive.49 In accounting for the difference in approaches and outcomes of the two projects, I would like to observe that the end of the virtual, and the end of cyberspace, have not been kind to Web archiving; the return of the nation-state and the application of certain policy regimes (especially copyright) have slowed efforts dramatically. Would digital methods aid in redressing the situation? I would like to invite national Web archivists to consider a registrational approach, e.g. the Alexa model adapted for a national context.
Social Networking Sites & Post-demographics
‘We define social networking websites here as sites where users can create a profile and connect that profile to other profiles for the purposes of making an explicit personal network.’50 Thus begins the study of American teenage use of such sites as MySpace and Facebook, conducted for the Pew Internet & American Life Project. Surveys were taken. 91% of the respondents use the sites to ‘manage friendships’; less than a quarter use the sites to ‘flirt’. Other leading research into social networking sites considers such issues as presenting oneself and managing one’s status online, the different ‘social classes’ of users of MySpace and Facebook, and the relationship between real-life friends and ‘friended’ friends.51 Another set of work, often from software-making arenas, concerns how to make use of the copious amounts of data contained in online profiles, especially interests and tastes. I would like to dub this latter work ‘post-demographics.’ Post-demographics could be thought of as the study of the data in social networking platforms, and, in particular, how profiling is, or may be, performed. Of particular interest here are the potential outcomes of building tools on top of profiling platforms. What kinds of findings may be made from mashing up the data, or what may be termed meta-profiling?
Conceptually, with the ‘post’ prefixed to demographics, the idea is to stand in contrast to how the study of demographics organizes groups, markets and voters in a sociological sense. It also marks a theoretical shift from how demographics have been used ‘bio-politically’ (to govern bodies) to how post-demographics are employed ‘info-politically,’ to steer or recommend certain information to certain people.52 The term post-demographics also invites new methods for the study of social networks, where the traditional demographics of race, ethnicity, age, income, and educational level – or derivations thereof such as class – give way to tastes, interests, favorites, groups, accepted invitations, installed apps and other information comprising an online profile and its accompanying baggage. That is, demographers normally would analyze official records (births, deaths, marriages) and survey populations, with census-taking being the most well known of those undertakings. Profilers, however, have users input data themselves in platforms that create and maintain social relations. They capture and make use of information from users of online platforms.
Perhaps another means of distinguishing between the two types of thought and practice is with reference to the idea of digital natives, those growing up with online environments, and unaware of life prior to the Internet, especially with the use of manual systems that came before it, like a library card catalog.53 The category of digital natives, however, takes a generational stance, and in that sense is a traditional demographic way of thinking. The post-demographic project would be less interested in new digital divides (digital natives versus non-natives) and the emergent narratives surrounding them (e.g. moral panics) than in how profilers recommend information, cultural products, events or other people (friends) to users, owing to common tastes, locations, travel destinations and more. There is no end to what could be recommended, if the data are rich and stored. How to study the data?
With post-demographics, the proposal is to make a contribution to Internet research by learning from those profilers and researchers who both collect and harvest (or scrape) social networking sites’ data for further analysis or software-making, such as mash-ups. How do social networking sites make their data available to profilers? Under the developers’ menu item at Facebook, for example, one logs in and views the fields available in the API (or application programming interface). Sample scripts are provided, as in ‘get friends of user number x,’ where x is yourself. Thus the available scripts generally follow the privacy culture, in the sense that the user decides what the profiler can see. It becomes more interesting to the profiler when many users allow access, by clicking ‘I agree’ on a third-party application.
Another set of profiling practices is not interested in personal data per se, but rather in tastes and especially taste relationships. One may place many profiling activities in the category of depersonalized data analysis, including Amazon’s seminal recommendation system, where it is not highly relevant which person also bought a particular book, but rather that people have done so. Supermarket loyalty cards and the databases storing purchase histories similarly employ depersonalized information analysis, where, as with Amazon, of interest are the quantity of particular items purchased as well as the purchasing relationships (which chips with which soft drink). Popular products are subsequently boosted. Certain combinations may be shelved together.
While they do not describe themselves as such, of course the most significant post-demographic machines are the social networking platforms themselves, collecting user tastes, and showing them to others, be they other friends, everyday people-watchers or profilers. Here I would like to describe briefly one piece of software my research team built on top of the large collection device, MySpace, and the kinds of post-demographic analytical practices which resulted.
Elfriendo.com is the outcome of reflecting on how to make use of the profiles on the social networking platform, MySpace. At Elfriendo.com, enter a single interest, and the tool creates a new profile on the basis of the profiles of people expressing that single interest. One may also compare the compatibility of interests, i.e. whether one or more interests, tunes, movies, TV shows, books and heroes are compatible with other ones. Is Christianity compatible with Islam, in the sense that those people with one of the respective interests listen to the same music and watch the same television programs? Elfriendo answers those sorts of questions by analyzing sets of friends’ profiles, and comparing interests across them. Thus a movie, TV show, etc. has an aggregate profile, made up of other interests. (To wit, Eminem, the rapper, appears in both the Christianity and Islam aggregate profiles, in early February 2009.) One also may perform a semblance of post-demographic research with the tool, gaining an appreciation of relational taste analysis with a social networking site, more generally.54
It is instructive to state that MySpace is more permissive and less of a walled garden than Facebook, in that it allows the profiler to view a user’s friends (and his/her friends’ profiles), without one’s having friended anybody. Thus, one can view all of Barack Obama’s friends, and their profiles. Here, in an example, one queries Elfriendo for Barack Obama as well as John McCain, and the profiles of their respective sets of friends are analyzed. The software counts the items listed by the friends under interests, music, movies, TV shows, books and heroes. What does this relational taste counting practice yield? The results provide distinctive pictures of the friends of the two presidential candidates campaigning in 2008. The compatibility level between the interests of the friends of the two candidates is generally low. The two groups share few interests. The tastes of the candidates’ friends are not compatible for movies, music, books and heroes, though for TV shows the compatibility is 16%. There seem to be particular media profiles for each candidate’s set of friends, where those of Obama watch the Daily Show, and those of McCain watch Family Guy, Top Chef and America’s Next Top Model. Both sets of friends watch Lost. The findings may be discussed in terms of voter post-demographics, in that the descriptions of voter profiles are based on media tastes and preferences as opposed to educational levels, income and other standard indicators.
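The counting practice described above can be illustrated with a brief sketch. The text does not state how Elfriendo computes its compatibility percentage, so the measure below (the share of items two aggregate profiles have in common, out of all items in either) is an assumption made for illustration, as are the function names and the profile layout.

```python
from collections import Counter


def aggregate_profile(profiles, field, top_n=10):
    """Build an aggregate profile: the most listed items under one field.

    profiles: list of dicts, each mapping a field name (e.g. "tv") to the
              list of items a friend has entered (an assumed layout).
    """
    counts = Counter()
    for profile in profiles:
        # Count each item once per friend, ignoring case and stray spaces.
        counts.update({item.strip().lower() for item in profile.get(field, [])})
    return [item for item, _ in counts.most_common(top_n)]


def compatibility(items_a, items_b):
    """Assumed measure: shared items as a percentage of all distinct items."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

One would build an aggregate profile per candidate from the friends’ profiles and then pass the two item lists to `compatibility`. Other overlap measures would give different percentages, so figures produced this way are not comparable with those reported above.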
At present, approaches to the study of Wikipedia have followed from certain qualities of the online encyclopedia, all of which appear counter-intuitive at first glance. One example is that Wikipedia is authored by so-called amateurs, yet is surprisingly encyclopedia-like, not only in form but in accuracy.55 The major debate concerning the quality of Wikipedia vis-à-vis Encyclopedia Britannica has raised questions relevant to digital methods, in that the Web-enabled collective editing model has challenged the digitized work of a set of experts. However, research has found that there is only a tiny ratio of editors to users in Web 2.0 platforms, including Wikipedia. This is otherwise known as the myth of user-generated content.56 Wikipedia co-founder Jimbo Wales has often remarked that the dedicated community is indeed relatively small, at just over 500 members. Thus the small cadre of Wikipedia editors could be considered a new elite, leading to exercises in relativizing the alleged differences between amateurs and experts, such as through a study of the demographics of Wikipedians.57 Another example of a counter-intuitive aspect of Wikipedia is that the editors are unpaid, yet committed and highly vigilant. The vigilance of the crowd, as it is termed, is something of a mythical feature of a quality-producing Web, until one considers how vigilance is performed. Who is making the edits? One approach to the question lies in the Wikiscanner project (2007- ), developed by Virgil Griffith while studying at the California Institute of Technology. The Wikiscanner outs anonymous editors by looking up the IP address of the editor and checking it against a database with the IP address locations (geoIP technology). Wikipedia quality is ensured, to Griffith, by scandalizing editors making self-serving changes, such as a member of the Dutch royal family, who embellished an entry and made the front page of the newspaper after a journalist used the tool.
How else are vandals kept at bay on Wikipedia, including those experimenters and researchers making erroneous changes to an entry, or creating a new fictional one, in order to keep open the debate about quality?58 Colleagues and I have contributed to work about the quality of Wikipedia by introducing the term networked content.59 It refers to content held together by human authors and non-human tenders, including bots and alert software which revert edits or notify Wikipedians of changes made. Indeed, when looking at the statistics available on Wikipedia on the number of edits per Wikipedian user, it is remarkable to note that the bots are by far the top editors. The contention, which is being researched in the digital methods program, is that the bots and the alert software are significant agents of vigilance, maintaining the quality of Wikipedia.
From the Wikiscanner project and the bots statistics related above, it is worth emphasizing that Wikipedia is a compendium of network activities and events, each logged and made available as large data sets. Wikipedia also has in-built reflection or reflexivity, as it shows the process by which an entry has come into being, something missing from encyclopedias and most other finished work more generally. One could study the process by which an entry matures; the materials are largely the revision history of an entry, but also its discussion page, perhaps its dispute history, its lock-downs and re-openings. Another approach to utilizing Wikipedia data would rely on the edit logs of one or more entries, and repurpose the Wikiscanner’s technical insights by looking up where the edits have been made. ‘The places of edits’ show subject matter concerns and expertise by organization and by country.
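Looking up the places of edits can be sketched as follows. The sketch assumes the anonymous editors’ IP addresses have been taken from an entry’s edit log, and that a table of address ranges and their owners is at hand; both the table and the function name are hypothetical stand-ins, not the Wikiscanner’s actual database or code.

```python
import ipaddress
from collections import Counter


def places_of_edits(edit_ips, ranges):
    """Tally edits per organization (or country) by IP address range.

    edit_ips: list of IP address strings taken from an edit log.
    ranges:   dict mapping a CIDR block (e.g. "192.0.2.0/24") to an
              owner label; a hypothetical stand-in for a geoIP database.
    """
    networks = [(ipaddress.ip_network(cidr), owner)
                for cidr, owner in ranges.items()]
    tally = Counter()
    for ip in edit_ips:
        address = ipaddress.ip_address(ip)
        # Use the first range containing the address, else mark as unknown.
        owner = next((o for net, o in networks if address in net), "unknown")
        tally[owner] += 1
    return dict(tally)
```

The resulting tally per owner is one way of reading which organizations or countries concern themselves with a given entry. A real lookup table would be far larger and would need a faster search than the linear scan used here.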
Conclusion. The End of the Virtual – Grounding Claims
My aim is to set into motion a transformation in how and why one performs research using the Internet. The first step is to move the discussion away from the limitations of the virtual (how much culture and society are online) to the limitations of current method (how to study culture and society, and ground findings with the Internet).
I would like to conclude with a brief discussion of these limitations in Internet research as well as a proposal for renewal. First, the end of cyberspace and its placeless-ness, and the end of the virtual as a realm apart, are lamentable for particular research approaches and other projects. In a sense, the real/virtual divide served specific research practices.60 Previously I mentioned that Internet archiving thrived in cyberspace, and more recently, it suffers without it. Where cyberspace once enabled the idea of massive website archiving, the grounded Web and the national Webs are shrinking the collections.
Second, I have argued that one may learn from the methods employed in the medium, moving the discussion of medium-specific theory from ontology (properties and features) to epistemology (method). The Internet, and the Web more specifically, have their ontological objects, such as the link and the tag. Web epistemology, among other things, is the study of how these natively digital objects are handled by devices. The insights from such a study lead to important methodological distinctions, as well as insights about the purpose of Internet research. Where the methodological distinction is concerned, one may view current Internet methods as those that follow the medium (and the dominant techniques employed in authoring and ordering information, knowledge and sociality) and ones that remediate or digitize existing method. The difference in method may have significant outcomes. One reason for the fallowing of the Web archiving efforts may lie in the choice of a digitized method (editorial selection) over a digital one (registrational data collection), such as that employed in the original Internet Archive project, where sites surfed by users were recorded. Indeed, I have employed the term digital methods so that researchers may consider the value and the outcomes of one approach over another. As a case in point, the choice of dynamic URL sampling over the editorial model could be beneficial to Internet censorship research, as I discussed.
Third, and finally, I have argued that the Internet is a site of research for far more than online culture and its users. With the end of the virtual/real divide, however useful, the Internet may be rethought as a source of data about society and culture. Collecting it and analyzing it for social and cultural research requires not only a new outlook about the Internet, but method, too, to ground the findings. Grounding claims in the online is a major shift in the purpose of Internet research, in the sense that one is not so much researching the Internet, and its users, as studying culture and society with the Internet. I hope you will join me in this urgent project.
Barlow, 1996; Benedict, 1991; Dibbell, 1998; Rheingold, 1991; Rheingold, 1993; Shaviro, 2008; Stone, 1995; Turkle, 1995.
Jenkins, 2006; Keen, 2007; Bruns, 2008.
Manovich, 2007. See also Manovich, 2008.
Contractor, 2009; Lazer et al., 2009.
Jeanneney, 2007; Vaidhyanathan, 2007; Rogers, 2009.
10. Rogers, 2003.
11. Castells, 1996; Goldsmith & Wu, 2006; Rogers, 2008.
12. Marres & Rogers, 2008.
13. NRC Handelsblad, 2007.
14. Lynch, 1997.
15. McLuhan, 1964.
16. Williams, 1974.
17. Hayles, 2004.
18. Galloway, 2004.
19. Fuller, 2003.
20. Manovich, 2008.
21. Bolter & Grusin, 1999.
22. Rogers, 2004.
23. Introna & Nissenbaum, 2000.
24. Landow, 1994; Watts, 1999; Park & Thelwall, 2003.
25. Elmer, 2001.
26. Krebs, 2002.
27. cf. Beaulieu, 2005.
28. Marres & Rogers, 2000; Rogers, 2002.
29. The Issue Crawler software, with particular allied tools, has been developed specifically to perform such hyperlink analysis. The software crawls websites, and links are gathered and stored. The crawler-analytical modules are adaptations from scientometrics (co-link analysis) and social networking analysis (snowball). Once a network is located with the Issue Crawler, individual actors may be profiled, using the actor profiler tool. The actor profiler shows, in a graphic representation, the inlinks and outlinks of the top ten network actors. The other technique for actor profiling relies on a scraper that would capture all outlinks from a site, and a scraper of a search engine, the Yahoo inlink ripper, which provides a list of the links made to a website.
30. Deibert et al., 2008.
31. Krug, 2000; Dunne, 2005.
32. Foot & Schneider, 2006.
33. Brügger, 2005, 1.
34. Latour & Woolgar, 1986; Knorr-Cetina, 1999; Walker, 2005.
35. Screen-capturing software has been employed previously for the analysis of Wikipedia
pages, showing the evolution of entries and thus how Wikipedians build knowledge.
36. The ‘even more’ button returned to the interface of Google.com in 2008.
37. Spink & Jansen, 2004.
38. Jensen, 1999.
39. The notice appears on the credits page of the Issue Dramaturg, http://issuedramaturg.
40. Rogers, 2004.
41. C. Wright Mills, 1971, 212; Rogers & Marres, 2002.
42. Rogers, 2009.
43. Foot & Schneider, 2002; Schneider & Foot, 2002.
44. Chun, 2006.
45. Boyle, 1997.
46. Rogers, 2008.
47. Shirky, 2005.
48. Galloway, 2004.
49. Weltevrede, 2009.
50. Lenhart & Madden, 2007.
51. Boyd & Ellison, 2007.
52. Foucault, 1998; Rogers, 2004.
53. Prensky, 2001.
54. One gains a sense of how analysis may be performed, and the kinds of findings that may be made, because Elfriendo captures the top 100 profiles, thus providing an indication, as opposed to a grounded finding from a proper sampling procedure.
55. Giles, 2005.
56. Swartz, 2006.
57. Van Dijck, 2009.
58. Chesney, 2006; Read, 2006; Magnus, 2008.
59. Niederer, 2009.
60. For the edits may be traced.
Barlow, J. P., ‘A Declaration of the Independence of Cyberspace,’ Davos, Switzerland,
1996, http://homes.eff.org/~barlow/Declaration-Final.html (accessed 28 January 2009)
Beaulieu, A., ‘Sociable Hyperlinks: An Ethnographic Approach to Connectivity’, in: C.
Hine (ed.), Virtual Methods: Issues in Social Research on the Internet. Berg, Oxford, 2005, pp. 183-197
Benedict, M., ‘Cyberspace: Some Proposals’, in: M. Benedict (ed.), Cyberspace – First Steps.
MIT Press, Cambridge, MA, 1991, pp. 119-224
Bolter, J. D. and R. Grusin, Remediation: Understanding New Media. MIT Press, Cambridge,
Boyd, D. and N. Ellison, ‘Social network sites: Definition, history, and scholarship,’ in:
Journal of Computer-Mediated Communication, 13(1), 2007
Boyle, J., ‘Foucault in Cyberspace’, in: Univ. Cincinnati Law Review, 66, 1997, pp. 177-205
Brügger, N., Archiving Websites: General Considerations and Strategies. Centre for Internet Re-
Bruns, A., Blogs, Wikipedia, Second Life, and Beyond: From Production to Produsage. Peter Lang,
Castells, M., The Information Age: Economy, Society and Culture – The Rise of the Network Society.
Chesney, T., ‘An empirical examination of Wikipedia’s credibility’, in: First Monday, 11(11),
Chun, W., Control and Freedom: Power and Paranoia in the Age of Fiber. MIT Press, Cambridge,
Contractor, N., ‘Digital Traces: An Exploratorium for Understanding and Enabling Social
Networks’, presentation at the annual meeting of the American Association for the Advancement of Science (AAAS), 2009
Deibert, R., J. Palfrey, R. Rohozinski, and J. Zittrain (eds.), Access Denied: The practice and policy of global Internet filtering. MIT Press, Cambridge, MA, 2008
Dibbell, J., My Tiny Life: Crime and Passion in a Virtual World. Henry Holt, New York, 1998
van Dijck, J., ‘Users Like You: Theorizing Agency in User-Generated Content’, in: Media,
Culture and Society, 31(1), 2009, pp. 41-58
Dunne, A., Hertzian Tales: Electronic Products, Aesthetic Experience, and Critical Design. MIT
Elmer, G., ‘Hypertext on the Web: The Beginnings and Ends of Web Path-ology’, in: Space
Elmer, G., Profiling Machines. MIT Press, Cambridge, MA, 2004
Foot, K. and S. Schneider, ‘Online Action in Campaign 2000: An Exploratory Analysis of
the U.S. Political Web Sphere’, in: Journal of Broadcast and Electronic Media, 46(2), 2002, pp. 222-244
Foot, K. and S. Schneider, Web Campaigning. MIT Press, Cambridge, MA, 2006
Foucault, M., The History of Sexuality Vol. 1: The Will to Knowledge. Penguin, London, 1998
Fuller, M., Behind the Blip: Essays on the Culture of Software. Autonomedia, Brooklyn, 2003
Galloway, A., Protocol: How Control Exists After Decentralization. MIT Press, Cambridge, MA,
Giles, J., ‘Internet encyclopedias go head to head’, in: Nature, 438, 2005, pp. 900-901
Goldsmith, J. and T. Wu, Who Controls the Internet? Illusions of a Borderless World. Oxford,
Hayles, K., ‘Print Is Flat, Code Is Deep: The Importance of Media-Specific Analysis’, Poetics
Hine, C., Virtual Ethnography. Sage, London, 2000
Hine, C. (ed.), Virtual Methods: Issues in Social Research on the Internet. Berg, Oxford, 2005
Introna, L. and H. Nissenbaum, ‘Shaping the Web: Why the Politics of Search Engines Matters’, The Information Society, 16(3), 2000, pp. 1-17
Jeanneney, J.-N., Google and the Myth of Universal Knowledge. University of Chicago Press,
Jenkins, H., Convergence Culture: Where Old and New Media Collide. NYU Press, New York,
Jensen, J., ‘Interactivity: Tracking a New Concept in Media and Communication Studies’,
in: P. Mayer (ed.), Computer Media and Communication. Oxford University Press, Oxford, 1999, pp. 160-188
Jones, S., ‘Studying the Net: Intricacies and Issues’, in: S. Jones (ed.), Doing Internet Research: Critical Issues and Methods for Examining the Net. Sage, London, 1999, pp. 1-28
Keen, A., The Cult of the Amateur: How Today's Internet is Killing Our Culture. Nicholas Brealey,
Knorr-Cetina, K., Epistemic Cultures. Harvard University Press, Cambridge, MA, 1999
Krebs, V., ‘Mapping Networks of Terrorist Cells’, in: Connections, 24(3), 2002, pp. 43-52
Krug, S., Don’t Make Me Think! A Common Sense Approach to Web Usability. New Riders,
Landow, G., Hyper/Text/Theory. Johns Hopkins University Press, Baltimore, MD, 1994
Latour, B. and S. Woolgar, Laboratory Life. Princeton University Press, Princeton, NJ, 1986
Lazer, D. et al., ‘Computational Social Science’, in: Science, 323, 2009, pp. 721-723
Lenhart, A. and M. Madden, ‘Social Networking Websites and Teens’, Pew Internet Project Data Memo, Pew Internet & American Life Project, Washington, DC, 2007
Lynch, M., ‘A sociology of knowledge machine’, in: Ethnographic Studies, 2, 1997, pp. 16-38
Magnus, P.D., ‘Early response to false claims in Wikipedia’, in: First Monday, 13(9), 2008
Manovich, L., ‘Cultural Analytics.’ unpublished ms., www.manovich.net/cultural_analy-
Manovich, L., Software Takes Command. unpublished ms., www.manovich.net/ (accessed 10
Marres, N. and R. Rogers, ‘Depluralising the Web, Repluralising Public Debate. The GM
Food Debate on the Web,’ in: R. Rogers (ed.), Preferred Placement. Jan van Eyck Editions, Maastricht, 2000, pp. 113-135
Marres, N. and R. Rogers, ‘Subsuming the Ground: How Local Realities of the Ferghana
Valley, Narmada Dams and BTC Pipeline are put to use on the Web’, Economy & Society, 37(2), 2008, pp. 251-281
McLuhan, M., Understanding Media: The Extensions of Man. McGraw Hill, New York, 1964
Miller, D. and D. Slater, The Internet: An Ethnographic Approach. Berg, Oxford, 2000
Mills, C. Wright, The Sociological Imagination. Penguin, Harmondsworth, 1971
Niederer, S., ‘Wikipedia and the Composition of the Crowd,’ unpublished ms., 2009
NRC Handelsblad, 28 August 2007
Park, H. and M. Thelwall, ‘Hyperlink Analyses of the World Wide Web: A Review’, in: Journal of Computer-Mediated Communication, 8(4), 2003
Prensky, M., ‘Digital Natives, Digital Immigrants’, On the Horizon, 9(5), 2001
Read, B., ‘Can Wikipedia Ever Make the Grade?’ Chronicle of Higher Education, 53(10),
Rheingold, H., Virtual Reality: Exploring the Brave New Technologies. Summit, New York, 1991
Rheingold, H., The Virtual Community: Homesteading on the Electronic Frontier. Addison-Wes-
Rogers, R., ‘Operating Issue Networks on the Web,’ in: Science as Culture, 11(2), 2002,
Rogers, R. and N. Marres, ‘French scandals on the Web, and on the streets: A small experiment in stretching the limits of reported reality’, in: Asian Journal of Social Science, 30(2), 2002, pp. 339-353
Rogers, R., ‘The Viagra Files: The Web as Anticipatory Medium’, in: Prometheus, 21(2),
Rogers, R., Information Politics on the Web. MIT Press, Cambridge, MA, 2004
Rogers, R., ‘The Politics of Web Space,’ unpublished ms., 2008
Rogers, R., ‘The Googlization Question, and the Inculpable Engine’, in: F. Stalder and K. Becker (eds.), Deep Search: The Politics of Search Engines. Transaction Publishers, Edison, NJ, 2009
Schneider, S. and K. Foot, ‘Online structure for political action: Exploring presidential Web
sites from the 2000 American election’, Javnost, 9(2), 2002, pp. 43-60
Shaviro, S., ‘Money for Nothing: Virtual Worlds and Virtual Economies’, in: M. Ipe (ed.),
Virtual Worlds. The Icfai University Press, Hyderabad, 2008, pp. 53-67.
Shirky, C., ‘Ontology is Overrated: Categories, Links, and Tags’, The Writings of Clay
Shirky, 2005, www.shirky.com/writings/ontology_overrated.html (accessed 28 January 2009)
Spink, A. and B.J. Jansen, Web Search: Public Searching on the Web. Kluwer, Dordrecht, 2004
Stone, A.R., The War of Desire and Technology at the Close of the Mechanical Age. MIT Press,
Sunstein, C., Infotopia: How Many Minds Produce Knowledge. Oxford University Press, New
Swartz, A., ‘Who writes Wikipedia?’ Raw Thoughts blog entry, 4 September 2006, www.aaronsw.com/weblog/whowriteswikipedia/ (accessed 22 August 2008)
Turkle, S., Life on the Screen: Identity in the Age of the Internet. Simon & Schuster, New York,
Vaidhyanathan, S., ‘Where is this book going?’ The Googlization of Everything Blog, 25
book_going.php (accessed 22 December 2008)
Walker, J., ‘Feral Hypertext: When Hypertext Literature Escapes Control’, Proceedings of
the Sixteenth ACM conference on Hypertext and Hypermedia, 6-9 September 2005, Salzburg, Austria, pp. 46-53
Watts, D., Small Worlds. Princeton University Press, Princeton, 1999
Weltevrede, E., Thinking Nationally with the Web: A Medium-Specific Approach to the National Turn in Web Archiving. M.A. thesis, University of Amsterdam, 2009
Williams, R., Television: Technology and Cultural Form. Fontana, London, 1974
Woolgar, S., ‘Five Rules of Virtuality’, in: S. Woolgar (ed.), Virtual Society? Technology, Cyberbole, Reality. Oxford University Press, Oxford, 2002, pp. 1-22