Cover: Iraq. Kadimain, third holy city of the Shite Moslems [i.e., Muslims]. Road to Kadimain from Baghdad. 1932 (Library of Congress, LC-DIG-matpc-16058)

Please, use the latest version at

My dissertation, Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunni World (661-1300 ce), turned out to be more about the method of computational reading than about anything else, but the results are most exciting in terms of the prospects that this method opens. After less than two years of development, this method allows getting almost instantaneous insights into a great number of historical issues. Although technologically the approach has been developed practically from scratch, in spirit it follows in the footsteps of the quantitative method that has been used by the scholars of Islam since the 1970s. In its current state the method is best suited for analyzing biographical data from social, chronological and geographical perspectives, yet the complexity of analytical tasks can be increased ad infinitum. Computational reading is flexible, scalable and fast beyond comparison with conventional methods. Dwelling on these properties should offer a glimpse into the prospects of its further implementation.

The flexibility of computational reading allows asking various historical questions by designing analytical algorithms of any complexity. (It should be stressed that even though the method puts a lot of emphasis on the use of technology, its effective implementation requires traditional training in Near Eastern studies.) Although the emphasis in the dissertation was primarily on the analysis of biographical data (dates, names and toponyms), computational reading also allows for the analysis of complex textual evidence.

Age statements from Taʾrīkh al-islām. The left image shows the chronological fluctuation of the average lifespan, while the image on the right shows the chronological distribution of age statements (darker areas mean more age statements).

For example, we can get a glimpse into the age structure of Islamic élites through the computational analysis of age statements that often occur in biographies. Analyzing the most frequent types of such statements in Taʾrīkh al-islām, my experimental algorithm yields ages for over 5,100 individuals and shows that over a period of almost seven centuries the average lifespan fluctuated between 67 and 80 lunar years (Age Statements, left), clearly going down when age statements become more and more frequent, after c. 350/962 CE (Age Statements, right). Onomastic and toponymic synsets that allow re-grouping data using social, religious and geographical parameters may shed light on the age structure of different social groups and local communities. With minor modifications, this analytical algorithm can be applied to other sources as well. For example, the Hadīyat al-ʿārifīn offers age data on about 1,650 Islamic authors (out of approximately 8,800) and a very cursory glance at the results shows that longevity was indeed characteristic of religious scholars,1 while most of the short-lived authors are usually found in the field of poetry and fine literature, where talent and audacity seem to have been more important than networks and perseverance. The ability to collect such data from numerous biographical collections will help to advance the study of the demography of the Islamic world.
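The age-statement analysis described above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the pattern `wa-lahu <digits> sana` is a hypothetical, transliterated stand-in (real age statements are spelled out in Arabic words and come in several types), and the sample records, the function names and the 50-year period are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern: a toy transliterated age statement
# with digits. Real statements are written out in Arabic words.
AGE_PATTERN = re.compile(r"wa-lahu (\d+) sana")


def extract_age(biography):
    """Return the stated age as an int, or None if no statement is found."""
    match = AGE_PATTERN.search(biography)
    return int(match.group(1)) if match else None


def average_age_by_period(records, period=50):
    """Average the stated ages per period of death years (AH)."""
    ages = defaultdict(list)
    for death_year, biography in records:
        age = extract_age(biography)
        if age is not None:
            # Floor the death year to the start of its period.
            start = (death_year // period) * period
            ages[start].append(age)
    return {start: sum(v) / len(v) for start, v in sorted(ages.items())}


# Invented sample records: (death year AH, biography text).
sample = [
    (310, "tuwuffiya sanat 310 wa-lahu 80 sana"),
    (340, "māta wa-lahu 70 sana"),
    (360, "tuwuffiya sanat 360"),  # no age statement, so it is skipped
    (420, "māta wa-lahu 66 sana"),
]
averages = average_age_by_period(sample)
print(averages)
```

Counting the ages collected per period, rather than averaging them, would give the kind of frequency distribution shown in the right-hand image.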

In a similar manner, algorithms can be devised for a more complex analysis of onomastic data that would allow, for example, reproducing Bulliet’s study of conversion.2 The very fact that this study is still criticized3 more than three decades after its publication shows that Bulliet’s model of conversion cannot be discarded through a critique of where it fails, if otherwise it still remains plausible and coherent. The old model will remain standing until an equally plausible alternative can be offered. The flexibility of computational reading will allow re-testing the original model of conversion on new biographical collections, experimenting with its variations and developing a new one.

The emphasis in the dissertation was primarily on biographical collections; however, computational reading can be applied to texts of any genre, although it works best for texts that show structural regularities of some kind. For example, one can design algorithms that will allow tracing the usage of Qurʾānic verses over the entire digital corpus of Islamic sources. Such a large-scale study of how the Qurʾān was quoted and interpreted by different authors will allow us to improve our understanding of how different aspects of the Islamic scripture were coming to prominence depending on historical circumstances. The same can be done for Prophetic traditions (sing. ḥadīth), where computational reading will be particularly helpful for the analysis of the chains of transmitters. Compendia of legal decisions (sing. fatwá) can also be analyzed in the same manner, and the exploratory analysis of possible correlations between the topics of legal decisions, locales and periods will most likely reveal unexpected commonalities and differences between regional communities of Muslims, as well as offer a unique perspective on the long-term regional development of Islamic law. Likewise, interesting experiments can be designed for the study of classical Arabic poetry. Considering that the meter can be identified computationally,4 the scholars of Arabic poetry can look for correlations between meters and themes, and, of course, put their discoveries in geographical and chronological perspectives.5
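As a rough illustration of tracing quotations across a corpus, the sketch below strips vocalization marks so that vocalized and unvocalized spellings compare equal, and then looks for exact matches. The function names and the two-text toy corpus are hypothetical. A working tool would also need further orthographic normalization (alif and hamza variants, for example) and some tolerance for partial or paraphrased quotations, none of which is attempted here.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks such as the Arabic short vowels, shadda and sukun."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def find_quotations(phrase, corpus):
    """Return the ids of the texts that contain the phrase after normalization."""
    needle = strip_diacritics(phrase)
    return [
        text_id
        for text_id, text in corpus.items()
        if needle in strip_diacritics(text)
    ]


# Invented toy corpus: one vocalized text with the phrase, one without it.
corpus = {
    "text_a": "قال بِسْمِ اللَّهِ ثم",
    "text_b": "كتاب آخر",
}
hits = find_quotations("بسم الله", corpus)
print(hits)
```

Recording the author, date and place of each matching text alongside its id would be the natural next step toward the chronological and geographical comparisons mentioned above.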

The scalability of computational reading allows testing whether the same historical questions yield similar results when asked of new sources—this is done by applying existing analytical algorithms to new sources. For example, the already devised complex means of identifying preachers and passages relevant to preaching can be effectively applied to local biographical collections and local histories, which will allow us to get a more detailed idea of the chronology of different preaching practices in particular regions of the Islamic world, and simultaneously test whether regional representation in Taʾrīkh al-islām corresponds to that of local sources.

Computational reading is fast. It does take a great deal of time to put together the essentials (devising algorithms, compiling synsets, reformatting sources), but once they are ready, the results can be produced almost instantaneously. The results can be easily regenerated if analytical algorithms require adjustments or new sources are added to the corpus; and it does not matter whether analytical algorithms are applied to a single text or to the entire digital corpus of classical Arabic, which already significantly exceeds 400 million words. In most cases the results come in volumes that are significant enough to trace historical patterns.

Geographical Networks of the Legal Schools. Legend: Yellow cores and numbers on the left show individuals strongly associated with regions in question; yellow husks and numbers on the right show individuals who visited regions in question. NB: each map has its own scale.

The volume of structured data that has been generated so far from Taʾrīkh al-islām alone is sufficient for dozens of studies that will allow advancing our understanding of the social history of the pre-modern Islamic world. Most of these data remained outside this dissertation project, but to give an idea of these “byproducts,” we can take a quick look at the results for the major Sunnī legal schools. The figure above shows that each school had a distinct geographical network. In and of themselves these geographies are hardly surprising and largely agree with what the students of Islam have already discovered over the last century or so. (It is worth highlighting, however, that these maps are but a circumstantial result of two years of research by a graduate student.) At the same time, reformatted into graphs and chronological maps, similar to the ones that were used in the part on preaching and preachers, these data can give the scholars of Islam a much more subtle picture of how these geographical networks were changing over time, where and when they flourished, stagnated, and declined. The use of hierarchical lists of geographical entities (toponymic synsets) allows taking a more detailed view of these geographical networks and analyzing connections not only between provinces, but also between urban centers and even city quarters. By putting data on all four legal schools on the same chronological maps we can get a glimpse into how these schools were coexisting with each other in different regional clusters. The figure below should give an idea of how the “relative weights” of the schools were changing over time in major regional clusters during the period of 470–670/1078–1272 CE.

“Relative Weights” of the Legal Schools in Regional Clusters. Legend: Solid core shows individuals strongly associated with regions in question; semi-transparent husks show individuals who visited regions in question.
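The idea of rolling places up through a toponymic synset and comparing the schools within each region can be sketched as follows. This is an illustration under stated assumptions, not the procedure behind the maps: the `PARENT` hierarchy, the sample records and the definition of a “relative weight” as a simple share of recorded individuals are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toponymic synset: each place points to its parent entity
# (city quarter -> city -> province).
PARENT = {
    "Karkh": "Baghdad",
    "Baghdad": "Iraq",
    "Basra": "Iraq",
    "Damascus": "Syria",
}


def region_of(place):
    """Climb the hierarchy until a top-level region is reached."""
    while place in PARENT:
        place = PARENT[place]
    return place


def relative_weights(records):
    """Return, per region, each school's share of the individuals recorded there."""
    counts = defaultdict(Counter)
    for school, place in records:
        counts[region_of(place)][school] += 1
    return {
        region: {school: n / sum(c.values()) for school, n in c.items()}
        for region, c in counts.items()
    }


# Invented sample records: (legal school, place of association).
records = [
    ("Hanafi", "Karkh"),
    ("Hanafi", "Basra"),
    ("Shafii", "Baghdad"),
    ("Hanbali", "Baghdad"),
    ("Shafii", "Damascus"),
]
weights = relative_weights(records)
print(weights)
```

Stopping the climb in `region_of` one level earlier would give the same comparison at the level of urban centers instead of provinces, and adding a death year to each record would allow the weights to be computed per period.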

Needless to say, in the same manner one can trace the chronology and geography of any social group that can be identified in the sources through relevant onomastic elements or more complex textual descriptions. One of the major advantages of the computational approach is that instead of artificially imposing chronological and geographical boundaries, one can discover periods and regions that are important to specific phenomena, practices, or social groups.

Somewhat ironically, the advantages of computational reading pose problems. The volume of results generated with this method is overwhelming. The visualization of data with tables, graphs, and maps is helpful for getting meaningful insights into findings, but comprehension and interpretation of these data will require collaborative efforts and decades of more traditional research. Fortunately, computational reading also allows marshaling all relevant textual evidence for close reading.


  1. Bulliet determines an average lifespan of 78 lunar years (Bulliet, Richard W. “A Quantitative Approach to Medieval Muslim Biographical Dictionaries.” Journal of the Economic and Social History of the Orient 13, no. 2 (April 1, 1970), p. 200); Nawas gives 80 lunar years (Nawas, John. “Development of the Islamic Religious Sciences.” al-Masaq 11 (1999), p. 161, also see fn. 8 for more references); and Şentürk gives 79.82 (Şentürk, Recep. Narrative Social Structure: Anatomy of the Hadith Transmission Network, 610-1505. Stanford, Calif.: Stanford University Press, 2005, p. 65). In all three cases the emphasis is strongly on the religious élites, and even more so on the transmitters of ḥadīth, for whom longevity was one of the most important characteristics; the coverage of Taʾrīkh al-islām is, of course, not limited to any specific group.
  2. Bulliet, Richard W. Conversion to Islam in the Medieval Period: An Essay in Quantitative History. Cambridge: Harvard University Press, 1979.
  3. Most recently: Wasserstein, David J. “Where Have All the Converts Gone? Difficulties in the Study of Conversion to Islam in al-Andalus.” Al-Qanṭara 33, no. 2 (February 11, 2013): 325–342.
  4. For one such tool see The Encyclopaedia of Arabic Poetry by Cultural Foundation, Abu Dhabi (UAE), reviewed in detail by Michael Bonner and Maxim Romanov at
  5. Similar studies in the history of English fiction have already yielded a number of interesting and unexpected discoveries. For example, Jockers, Matthew L. Macroanalysis: Digital Methods and Literary History. 1st Edition. University of Illinois Press, 2013; Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2007; Moretti, Franco. Distant Reading. 1st ed. Verso, 2013.

Posted by Maxim Romanov

Research fellow (PhD in Near Eastern Studies, U of Michigan, 2013) at the Humboldt Chair for Digital Humanities [Institut für Informatik], University of Leipzig. He studies Islamic historical texts with computational methods, currently focusing on the analysis of multivolume biographical and bibliographical collections.