Web Graph Sociology Research Initiative


This section lists important papers and reports relevant to web graph sociology. Each entry includes an abstract or excerpt from the paper and a link to the full document. Documents are divided into the following categories: Social Impact of the Internet, Network Theory, Defining Cyberspace, Web Graph Analysis, and Network Visualization.

Social Impact of the Internet

Electronic Communities: Global Village or Cyberbalkans? (pdf)
Marshall Van Alstyne, Erik Brynjolfsson, MIT Sloan School, March, 1997

Just as separation in physical space, or basic balkanization, can divide geographic groups, we find that separation in virtual space, or cyberbalkanization can divide special interest groups. In certain cases, the latter can be more fragmented. We introduce several formal indices of balkanization then show both algebraically and graphically the conditions under which these indices will rise or fall with different levels of access. (p. 3)

We do not argue that increased balkanization must result from increased connectivity. On the contrary, we believe that the Internet has enormous potential to elevate the nature of human interaction. Indeed, we find that if preferences favor diversity, the same mechanisms might reduce balkanization. However, our analysis also indicates that, other factors being equal, all that is required for increased balkanization is that preferred interactions are more focused than existing interactions. Thus, we examine critically the claim that a global village is the inexorable result of increased connectivity.

Sidewalks in Cyberspace (pdf)
Noah Zatz, Harvard Journal of Law & Technology
Volume 12, Number 1 Fall 1998, p. 54

The sticking points remain audience scarcity and "balkanization." Even though Web pages can in principle be accessed by millions of people at relatively low cost, it may nonetheless be difficult to get them to visit a specific site. They may simply have no way of knowing the site exists or have no interest in seeking it out. Public forums not only allow access in principle to large numbers of people but they permit speakers to seek out their audiences. Indeed, they facilitate a degree of communication among members of the public by mere juxtaposition in the same place; there is important social value simply in seeing that other kinds of people exist and in retaining some degree of familiarity through jostling on a subway, passing by on the sidewalk, or waiting in line together at the post office. These sorts of very casual encounters are the ones most distant from the current structure of cyberspace, in which one never sees any trace of the individuals simultaneously using the same ISP or interacting with the same website, except when the cyber-place is specifically constructed to enable such interactions.

The Internet and Civil Society
Peter Levine, Jul 11, 2002, 8:00am web posting

As Cass Sunstein notes in Republic.com, when groups consisting of dissimilar people deliberate, they move toward compromise positions, and sometimes reach consensus. But when ideologically similar people deliberate, they drift toward their own ideology's extreme limits. Since the Internet encourages citizens to select discussion partners from across the country or around the world, one can expect to see islands of committed believers isolating themselves from conversation with those of differing views. Such groups will likely come before our national institutions with firm demands, but without having learned to compromise or to respect other people’s views. This nightmare will not come to pass if news organs continue to attract large and diverse audiences, either by surviving offline or by moving successfully onto the Internet. So far, certain Web sites have indeed found ways to reach mass publics with general news and diverse debates. However, the Internet presents many dangers for traditional bridge-building media, especially metropolitan daily newspapers, which are likely to lose market share to niche Web sites.

Internet Paradox Revisited
Robert Kraut, Sara Kiesler, Bonka Boneva, Jonathon Cummings, Vicki Helgeson, and Anne Crawford
Carnegie Mellon University
May 4, 2001, Version 14.0

Kraut et al. (1998) reported small but reliable negative effects of using the Internet on measures of social involvement and psychological well-being among Pittsburgh families in 1995-1996. We called the effects a “paradox” because participants in the sample used the Internet heavily for communication, which generally has positive effects. In a 3-year follow-up of the original sample, we find that negative effects dissipated over the total period. We also report findings from a longitudinal study in 1998-99 of new computer and television purchasers. This new sample experienced overall positive effects of using the Internet on communication, social involvement, and well-being. Using the Internet generally predicted better outcomes for extraverts or those with more social support but worse outcomes for introverts or those with less support. Although using the Internet had slightly different benefits for teens and adults, controlling for age does not change the main conclusions.

Expanding the public sphere through computer-mediated communication:
Political discussion about abortion in a Usenet newsgroup.
Schneider, S. M. (1997). Unpublished PhD dissertation, MIT, Cambridge, MA.

Newsgroups are unquestionably a component of the informal zone of the public sphere. Thus, it is suggested that the definition of the public sphere be expanded to include all forms of “associational space,” providing the opportunity for citizens to converse with each other. Even those forms of associational space with no clearly identified political activity resulting from the discussions contribute to the opinion- and will-formation exercise that is the function of the public sphere in a democratic society. Usenet newsgroups provide extensive opportunities for individuals to comment freely and autonomously on topics of public concern, and more importantly to engage in public discourse with other citizens about these issues.

Network Theory

The structure and function of complex networks
M.E.J. Newman, Santa Fe Institute, 2003

A network is a set of items, which we will call vertices or sometimes nodes, with connections between them, called edges (Fig. 1). Systems taking the form of networks (also called “graphs” in much of the mathematical literature) abound in the world. Examples include the Internet, the World Wide Web, social networks of acquaintance or other connections between individuals, organizational networks and networks of business relations between companies, neural networks, metabolic networks, food webs, distribution networks such as blood vessels or postal delivery routes, networks of citations between papers, and many others (Fig. 2). This paper reviews recent (and some not-so-recent) work on the structure and function of networked systems such as these.
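The vertex-and-edge definition in this excerpt can be sketched in a few lines. This is only an illustration, assuming an undirected network stored as an adjacency map. The vertex names and the helper names `build_graph` and `degrees` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected adjacency map {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degrees(adj):
    """Number of edges attached to each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


# Hypothetical example: four vertices joined by four edges.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
graph = build_graph(edges)
print(degrees(graph))
```

An adjacency map is one of several possible representations. An edge list or an adjacency matrix would serve equally well for a network this small.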

A Short FAQ to Social Networks: A Non-Technical Elementary Primer
Charles Kadushin, Working Background Paper for The CERPE Workshop May 21-26, 2000

Tie Strength and the Impact of New Media
Caroline Haythornthwaite, published in the Proceedings of the Hawaii International Conference On System Sciences, January 3-6, 2001, Maui, Hawaii.

This paper presents a perspective on the impact and use of new media that focuses on the strength of the interpersonal tie connecting communicators. Research shows that more strongly tied pairs communicate more frequently, maintain more and different kinds of relations, and use more media to communicate. It is argued that where ties are strong, communicators adapt their use of media and expand to other media to support the exchanges important to their tie; where ties are weak, communicators rely on few means of contact (often only one), and depend on media and protocols established by others. It is theorized that dependence on a common, widely used medium makes a weak tie network vulnerable to dissolution and reformulation following changes to that medium; by contrast strong ties are more robust under conditions of change since their connection rests on multiple relations and media.

FACETED ID/ENTITY: Managing representation in a digital world (pdf)
by Danah Boyd, MIT Media Lab Master's Thesis, 2002

See chapter 7 on Social Network Fragments:

The social network of most people is quite large; manual studies of social networks have found that people average approximately 1500 ties of all different strengths (Killworth et al. 1990). Because of the ephemeral nature of people’s connections, there are even more ties documented in email; many digital connections are so tangential that offline researchers would not even consider them. The quantity of ties impacts the dimensionality of one’s network, because rarely do people maintain social networks where all members of their network are unaware of all others. Instead, there are many different types of ties between the different members of one’s network. By simply trying to imagine what the graph of such a system would be, it is easy to realize that this largely dimensional dataset is quite hard to comprehend. In response, Social Network Fragments seeks to make this information accessible through an interactive visualization.

Defining Cyberspace

The Cyber Space–Time Continuum: Meaning and Metaphor
Mihalache, Adrian (2002) The Information Society, 18:293–301

As far as information exchange is concerned, the cyber space–time continuum takes over from real life only the concept of discourse, that is, the organization of signs in a significant pattern. Discourses in cyberspace have multiple sources and multiple targets. The time of the few centralized emitters is gone. Every cyber-person is wrung between the drive for expression and the thirst for information. You have to silence the other and lure him to pay attention to you, that is, to give you his time, the most precious resource in cyberspace. On the other hand, you are also prepared to listen; sometimes you are even craving an answer. Consequently, there is a permanent exchange of information, but it is not directly reciprocal. I may not require your information content while delighting in giving you mine. I may speak to you, then listen to another person. The only thing that mediates all informational exchanges is time. One buys my information and pays with his time, I buy my information and pay with my time. Time plays in cyberspace the part that money plays in real life; "time is money" indeed—it is the abstract equivalent of any merchandise, be it information, knowledge, or whatever, the sign of all the signs. The discourses do not fight each other; more often than not they just pass along each other, like ships in the dark. They slide along the fabric of cyberspace, moving with the time. (p. 300)

The Meaning of the Web
Jim Falk 1998 The Information Society, 14:285-293

It is difficult even to define the Net. Is it the full set of protocols through which computers communicate with each other, together with the hardware links, or is it merely the set of communication hubs, the computers that connect those hubs? From the user’s point of view, of course, it is none of these things. It is the resources and experiences that it makes available that give the Net its distinctive character and attraction and are the foundations of its meaning.

Hyperbole over Cyberspace:
Self-presentation & Social Boundaries in Internet Home Pages and Discourse
Wynn and Katz (1997), The Information Society, 13(4): 297-328.

Futurist sensationalism, journalistic attention, constructivist theory, and appeal to technical determinism, all make the genre of literature on cyberspace, described as postmodern, visible and possibly influential. This paper takes issue with assertions in this literature that Internet communication alters cultural processes by changing the basis of social identity, and that it provides alternate realities that displace the socially grounded ones of everyday synchronous discourse. A main theme of the postmodern perspective on cyberspace is that Internet technology liberates the individual from the body, and allows the separate existence of multiple aspects of self which otherwise would not be expressed and which can remain discrete rather than having to be resolved or integrated as in ordinary social participation. The concepts under review presume a prior definition of self as a psychological unity, when the term is open to many definitions including the one that the self is a product of varying social contexts and is normally managed to accommodate them. Arguments from phenomenological hermeneutics are available to counter the plausibility of programming multiple selves, as the postmodern literature on cyberspace suggests can be done. The notion of fragmentation contradicts a substantial body of theory in social interaction based in the premise of coconstruction. Evidence of the socially grounded nature of interaction exists everywhere in cyberspace. Empirical examples include: list discourse that illustrates the situated significance of authentic identity in Internet professional groups; secondary research suggesting electronic communication is most successful as one genre in a communication repertoire; cases of home page self-presentation mediated through socially defined links; and evidence that the "virtualness" and alleged anonymity of Internet are illusory and therefore could not over time support a plausibly disembodied, depoliticized, fragmented "self".

The Market Logic of Information
Phil Agre, Knowledge, Technology, and Policy 13(1), 2001, pages 67-77.

... experience is making clear that the idea of cyberspace is misleading, and that the uses of advanced information technologies do not constitute a space apart from ordinary reality (Wynn and Katz 1997). Quite the contrary, for most purposes advanced information technologies are deeply bound up with the rest of the world. Virtual reality is simply one end of a spectrum of applications, each of which embeds networked hardware and software into the world in a different way.... Notions of online community are equally misleading. People do form social bonds through new digital media. But communities are analytically prior to the particular technologies that their members use. The worldwide community of stamp collectors, for example, existed long before the Internet, and it conducts its collective life through many media besides the Internet. Its members even meet face-to-face. The same goes for the thousands of other communities of shared interest -- professions, associations, extended families, political parties, and so on. Use of the Internet might change the dynamics of these communities, and many striking stories of community change have been recorded. But Internet use is embedded in something larger. Even communities that form online often develop other means of interaction, for example by holding caucuses at conferences. In each case, the notion of cyberspace limits our vision by directing our attention to a small corner of a large phenomenon: the interaction and coevolution between new technologies and the institutional orders and ways of life in which they are embedded.

Seeing Cyberspace: The Electrical Infrastructure is Architecture
Brian Thomas Carroll, 2001

Through architectural language, one can see the otherwise intangible Cyberspace materialized in the power, media, and technological systems of the Electrical Infrastructure. In so doing, pressing issues such as war, energy inefficiency, global warming, pollution, and economic instability can be structurally related to the seemingly separate online experience of the Internet. Identifying this relationship can help to educate and organize citizens who want to address common yet otherwise ignored needs of the representative human public.

Web Graph Analysis

All-American Issues: Seven Stories From The Homeland (pdf)
Proceedings of the News about Networks workshop, prepared by Richard Rogers, Govcom.org Foundation, Amsterdam, 2003.

"Doing without News?" is the main thread running through a series of specific research projects undertaken at the workshop, using data collection, analysis and visualisation software tools. The report provides an introduction to some of the arguments about why one may desire to ‘do without news’. Subsequently, it describes each of the projects, including the data collected, the methods employed and the results that were sought, and eventually found. It also contains the info-graphics created during the workshop that aid in telling the stories.

Hyperlink Network Analysis:
A New Method for the Study of Social Structure on the Web
Han Woo Park, 2003 CONNECTIONS 25(1): 49-61 © 2003 INSNA

This paper identifies hyperlink network analysis (HNA) as a newly emerging methodology. It suggests that social (or communication) structures on the web may be analyzed based on the hyperlinks among websites. Hyperlink network analysis has advantages in describing emerging structures among social actors on the web. In order to examine what constitutes hyperlink network analysis, this paper reviews prior research on the topic. Further, it describes the data-gathering techniques for those interested in hyperlink network analysis.

You are what you link
Adamic, L.A. and E. Adar. 2001. Presented to the 10th annual International World Wide Web Conference, Hong Kong.

The Web has become a massive repository of a wide variety of data. Hidden in this pile of data is information about web users, in the form of the text on their personal homepages, links to and from those homepages, and users' mailing lists. Our study of the publicly available user information revealed a complex network of interpersonal links and a rich mining ground for studying social phenomena. We have developed a web application to gather such data and present it in an easily browsable form.

Measuring National borders on the world wide web.
Halavais, A. 2000. Master's thesis.

It is difficult for us to think of a network without imagining a space in which that network is inscribed. Likewise, “space” does not exist outside of an arrangement of objects or ideas. It is what comes between objects and is therefore key in describing relationships and arrangements. Throughout the course of this thesis, the ideas of space and of structure commingle, providing an interstitial foundation. But this foundation is only visible through the tensions it creates with earlier social regimes. We recognize that concepts of space have changed for large groups of people through anecdotal tales of trans-oceanic friendships and business relationships. We see the evidence of a more networked society in guides to becoming a “free agent” in the business world and observations that we are “bowling alone.” Just as a Bronze Age metallurgist must have judged his own progress against his isolated Neolithic neighbors, we only see the inklings of a new age within an élite “net set,” which bears much in common with the “jet set” that came before it. It is in this difference that we might identify the diffusion of a new social structure. This difference also creates the stresses of a complex social system in the midst of a state change.

Networks and flows of content on the World Wide Web
Halavais, A (2003). International Communication Association. San Diego, May.

The internet represents an embarrassment of riches to the social scientist. As interactions increasingly move online, the potential for recording and analyzing these interactions also rises exponentially. The availability of this data now in many cases taxes our abilities to collect and analyze it, to attach meaning to it. The World Wide Web represents just one facet of the great number of interactions that occur online. One would think that because it is relatively static and public when compared with other uses of the Internet, the Web represents a fairly easy target of study. On the contrary, the scholar interested in understanding the content and use of the World Wide Web is faced with a number of critical difficulties, many of them related to the scale of the data available, and others related to the ever-evolving content found there. These challenges have been noted before (Mitra & Cohen, 1999, inter alia); here it is proposed how some of the technologies that have emerged as part of the Web may be usefully employed in analyzing the Web.

The Web as a Graph
Ravi Kumar et al., 2000, Proc. 19th ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, PODS

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
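The directed-graph view described in this abstract can be illustrated with a minimal sketch, assuming hyperlinks are given as (source page, target page) pairs. The page names and the helper `in_out_degrees` are hypothetical, and the example shows only a basic degree measurement, not the models proposed in the paper.

```python
from collections import Counter


def in_out_degrees(links):
    """Count incoming and outgoing hyperlinks for each page.

    `links` is an iterable of (source, target) pairs, one per hyperlink.
    """
    links = list(links)
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    return in_deg, out_deg


# Hypothetical crawl of four pages.
links = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p3")]
in_deg, out_deg = in_out_degrees(links)
print("in-degree:", dict(in_deg))
print("out-degree:", dict(out_deg))
```

Because the edges are directed, a page can be heavily linked to without linking out at all, which is why the two counts are kept separate.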

Trawling the Web for Emerging Cyber-Communities
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
Computer Networks (Amsterdam, Netherlands: 1999)

The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newsgroups and commercial web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities -- those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
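One graph-theoretic signature often associated with this kind of community search is the complete bipartite core: a set of i "fan" pages that all link to the same j "center" pages. The sketch below is a brute-force illustration of that idea on a toy graph, not the scalable procedure the paper describes. The page names, the `out_links` map, and the function `bipartite_cores` are all assumptions made for the example.

```python
from itertools import combinations


def bipartite_cores(out_links, i, j):
    """Enumerate (fans, centers) pairs where every fan links to every center.

    Brute force over all i-subsets of pages, so only suitable for tiny graphs.
    `out_links` maps each page to the set of pages it links to.
    """
    cores = []
    for fans in combinations(sorted(out_links), i):
        # Pages that every fan in this subset links to.
        shared = set.intersection(*(set(out_links[f]) for f in fans))
        shared -= set(fans)  # a page cannot be both fan and center here
        for centers in combinations(sorted(shared), j):
            cores.append((fans, centers))
    return cores


# Hypothetical toy graph: three fans share two centers.
out_links = {
    "f1": {"c1", "c2", "x"},
    "f2": {"c1", "c2"},
    "f3": {"c1", "c2", "y"},
    "g": {"x"},
}
print(bipartite_cores(out_links, 3, 2))
```

The subset enumeration grows combinatorially with the number of pages, so a real crawl would need pruning and other engineering of the kind the abstract alludes to.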

Generating Web Graphs with Embedded Communities
Vivek Tawde et al, 2004 WWW2004, May 17–22, 2004, New York, NY USA, ACM.

The ideal model for the web graph should encompass local as well as global characteristics of the web. Such a model will not only be useful in studying global structure of the web but also in characterizing behavior of local components. It can be utilized to study the global effect of evolution of local components and vice versa (e.g., the effect of creating a community, merging of two or more communities, and splitting of communities on the web graph). This will be of tremendous value because it will make it possible to predict the effect of a particular phenomenon in the graph on various characteristics of the graph and its components. Such a comprehensive model of the web graph will make it possible to reason about changes in properties of the graph in local or global contexts. It will also be a useful model to the research community which extensively studies algorithms for ranking web pages based on connectivity (e.g. HITS [14], PageRank [3]). A modified algorithm can be tested on this model first to get an idea about its performance before actually testing it on the real web which can be an expensive task in terms of computation and resources.

Self-Organization and Identification of Web Communities
G. W. Flake, S. Lawrence, C. L. Giles, and F. M. Coetzee
IEEE Computer, 35(3), 66–71, 2002

Identification of communities on the web is significant for several reasons. Practical applications include automatic web portals and focused search engines, content filtering, and complementing text-based searches. More importantly, global community identification allows for analysis of the entire web and the objective study of relationships within and between communities (for example, scientific disciplines or countries). Such research could provide insight into the organization and interests of sectors of society, which individual members reflect by their linking practices. For example, links between scientific disciplines may allow more timely identification of emerging interdisciplinary connections.

Inferring Web Communities from Link Topology
Gibson, Kleinberg, and Raghavan 1998 UK Conference on Hypertext

Our analysis of the link structure of the WWW suggests that the on-going process of page creation and linkage, while very difficult to understand at a local level, results in structure that is considerably more orderly than is typically assumed. Thus it gives us a global understanding of the way in which independent users have built connections to one another, and a basis for predicting the way in which on-line communities in less computer-oriented disciplines will develop as they become increasingly "wired."

An approach to build a cyber-community hierarchy
P. Krishna Reddy et al, Institute of Industrial Science, University of Tokyo

In this paper we propose an approach to extract community structures in the Web by considering a community structure as a group of content creators that manifests itself as a set of interlinked pages.... A high-level community is abstracted as a Dense Bipartite Graph (DBG) over a set of low-level communities. Using the proposed approach, a community hierarchy can be constructed for the given data set that generalizes a large number of low-level communities into a few high-level communities.... We believe that the extracted community hierarchy facilitates easy analysis of the low-level communities ... and provides a way to understand the sociology of the web.

Evaluating Emergent Collaboration on the Web
Loren Terveen, Will Hill 1998, Computer Supported Cooperative Work

Links between web sites can be seen as evidence of a type of emergent collaboration among web site authors. We report here on an empirical investigation into emergent collaboration. We developed a webcrawling algorithm and tested its performance on topics volunteered by 30 subjects. Our findings include:

Collective dynamics of 'small-world' networks
Watts & Strogatz, Nature, Vol. 393 | June 4, 1998, pp. 440-442.

Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks ‘rewired’ to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them ‘small-world’ networks, by analogy with the small-world phenomenon.
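The rewiring idea in this abstract can be sketched as follows. This is a rough illustration, not the authors' code: it assumes a ring lattice of `n` vertices, each joined to its `k` nearest neighbours, with each lattice edge rewired to a random endpoint with probability `p`. The function names, the `seed` parameter, and the choice to average path length over reachable pairs only are assumptions made for the example.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Regular ring: each vertex is linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, k, p, seed=0):
    """Return a copy in which each lattice edge is rewired with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in new[v] and rng.random() < p:
                # Candidate endpoints: no self-loops, no duplicate edges.
                choices = [u for u in range(n) if u != v and u not in new[v]]
                if choices:
                    u = rng.choice(choices)
                    new[v].discard(w)
                    new[w].discard(v)
                    new[v].add(u)
                    new[u].add(v)
    return new


def clustering(adj):
    """Mean fraction of a vertex's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # such vertices contribute zero
        linked = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * linked / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per vertex)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(20, 4)
rewired = rewire(lattice, 4, 0.2, seed=1)
for name, g in (("lattice", lattice), ("rewired", rewired)):
    print(name, round(clustering(g), 3), round(mean_path_length(g), 3))
```

Comparing the two printed rows for several values of `p` is one way to explore the middle ground the abstract describes. On a graph this small the effect is noisy, so results from a single seed should be read with caution.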

Network Visualization

Visualizing Social Networks
Linton Freeman, 2000, Journal of Social Structure (Carnegie Mellon)

There have been five fairly distinct phases in the development and use of point and line displays in social network analysis. First, beginning in the 1930s, graphic images were produced by hand. They were ad hoc and their success varied with the insight and artistic skill of their creator. Second, in the early 1950s, investigators began to turn to the use of standard computational procedures to produce images. Third, in the 1970s, computers became widely available and began to be used to produce machine drawn images automatically. Fourth, in the 1980s, the presence of personal computers encouraged investigators to develop images that could be displayed on monitors and in color. And fifth and finally, in the 1990s, the availability of browsers and the World Wide Web opened up all sorts of new possibilities for graphic display. I will review this history and the present state of the art in this paper.

Graph Visualization and Navigation in Information Visualization: A Survey
Ivan Herman, Member, IEEE Computer Society, Guy Melançon, and M. Scott Marshall

This is a survey on graph visualization and navigation techniques, as used in information visualization. Graphs appear in numerous applications such as web browsing, state-transition diagrams, and data structures. The ability to visualize and to navigate in these potentially large, abstract graphs is often a crucial part of an application. Information visualization has specific requirements, which means that this survey approaches the results of traditional graph drawing from a different perspective.

Macroscope Manifesto
Jonathan Schull, 2002

The profound importance of branching and interconnecting proliferative patterns has been established and celebrated by Malthus, Darwin, Mendel, Watson/Crick, William James, Richard Dawkins, Tim Berners-Lee, and many others, but the problem remains under-appreciated and under-studied. The intra- and inter-disciplinary importance of this topic is increasingly clear to evolutionary theorists and graph theorists, to applied scientists such as epidemiologists and social network analysts, to educators, and even to those who must worry that distributed terrorist networks conspire over global networks of communication to release genetically-engineered bio-weapons into unprotected populations.

As pervasive as they are, these patterns are often overlooked because they exist in spatial, physical, temporal, and informational domains to which our sensory systems are not tuned. The purpose of this project is to make these phenomena visible, and studiable. We want to build a “macroscope” and apply it to a number of empirical research domains.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.