The World Migration Network: rankings, groups and gravity models

Samuel Martin
  • Jun 3, 2015



Abstract

Human migration is a growing field of investigation. The phenomenon is accelerated by globalization, which calls for efficient methods of analysis. This contribution studies world human migrations across several decades from a novel network perspective. Tools developed in various fields, such as PageRank, community detection and gravity models, are analysed and applied to diverse aspects of migrations.

I. Introduction

Human migrations have been demonstrated to exert a large influence on the economy [29], education [11], health [16] and labour market [27] of a country. As such, they constitute a growing field of research [23]. Network science, now a common tool in sociology [6], mobility [9], trade [20], etc., has not been used for an extensive study of the migration phenomenon, except for [13]. Migration data, however, have a natural network structure, where nodes represent countries and edges represent migratory flows, weighted by the number of migrants recorded over a given period. A network perspective is therefore natural in this context.

In this paper we focus more particularly on three topics of interest:

1) Ranking: Different countries attract widely different numbers of migrants. An empirical measure of attractiveness is simply the in-degree, i.e. the total number of immigrants into a country. However, this measure is noisy and varies strongly with time. A measure of attractiveness can also be composed from different possible explanatory factors, ranging from wealth to education level or living conditions in the destination country [14]. But such a measure requires extra data and clarification of assumptions. In this work we propose a robust ranking, based solely on the migration data itself, in the manner that is used e.g. to rank the attractiveness of webpages on the web [24] or the influence of an economy in the world [17].

2) Group detection: One shortcoming of the ranking analysis is that it does not consider the relations that a country can have with others. The group detection analysis overcomes this limitation by detecting groups of countries having strong relations with each other. The goal is to gather countries linked by important migration flows into the same group. Group, or "community", detection is a mature field of network science. Applying such methods can highlight unexpected relations between countries.

3) Prediction of future migrations: A highly studied topic in the migration literature consists in elaborating so-called "gravity models" (because of some resemblance with Newton's law of gravitation) describing and predicting the flow of migrants between two countries from explanatory factors such as distance, GDP of both countries, etc. We propose new formulas based on the ranking measures and the community partition introduced in this paper, and achieve good accuracy with a relatively restricted number of factors.

Before performing such an analysis, reliable data about migrations are needed. The work presented here is mainly based on the data of the World Bank [30], which contains about 17 million migrations that took place between 1990 and 2000. This represents about 740 000 migrations per country, including islands and dependent states. Other data from the World Bank, such as population, GDP, etc., are also used.

This paper is organized as follows. In Section II, the PageRank is described for the task of ranking countries, as an alternative to simple methods which present some shortcomings. Section III discusses group detection. Finally, Section IV presents our different gravity models.

II. Ranking analysis

A. Simple methods

A simple way to rank countries is to consider only the number of immigrants. Applied to the migration data, this method gives the ranking presented in Table I.

Rank  Country             Rank  Country
1     United States       6     Canada
2     Russian Federation  7     Ukraine
3     Germany             8     Saudi Arabia
4     France              9     United Kingdom
5     India               10    Australia

Table I: Most attractive countries according to the number of immigrants.

This ranking highlights a big shortcoming of the method: highly populated countries are favoured too much compared to sparsely populated ones. This occurs because the more populated a country is, the more likely it is to receive migrants.

One way to solve this problem is to divide the number of immigrants by the population of the destination country. The result is presented in Table II.
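The two simple rankings above can be illustrated with a minimal sketch. The country labels, flows and populations below are invented toy values, not figures from the World Bank data, and the helper names (`in_degree`, `per_capita`, `ranking`) are hypothetical.

```python
from collections import defaultdict

def in_degree(flows):
    """Total number of immigrants per destination country."""
    totals = defaultdict(float)
    for (_origin, destination), migrants in flows.items():
        totals[destination] += migrants
    return dict(totals)

def per_capita(totals, population):
    """Immigrants divided by the population of the destination country."""
    return {country: totals[country] / population[country] for country in totals}

def ranking(scores):
    """Countries sorted from the highest score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (invented): (origin, destination) -> number of migrants.
toy_flows = {("A", "B"): 100, ("C", "B"): 50, ("A", "C"): 30, ("B", "C"): 10}
toy_population = {"A": 500, "B": 1000, "C": 100}

immigrants = in_degree(toy_flows)
by_count = ranking(immigrants)                              # large country B comes first
by_share = ranking(per_capita(immigrants, toy_population))  # small country C comes first
```

On this toy network the two criteria give opposite orders, which mirrors the reversal discussed in the text: the raw count favours the populated country, while the per-capita share favours the small one.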

Rank  Country               Rank  Country
1     Kuwait                6     Northern Mariana Islands
2     Qatar                 7     Cayman Islands
3     United Arab Emirates  8     Macao SAR, China
4     Monaco                9     Falkland Islands
5     Andorra               10    Virgin Islands (US)

Table II: Most attractive countries according to the number of immigrants divided by the population.

Here, the reverse problem is observed: sparsely populated countries are now favoured too much.

One way to get rid of the population is to build a ranking according to the ratio of immigration to emigration. Mathematically, it is expressed as:

ratio = in-degree / out-degree.

This gives Table III.

Rank  Country               Rank  Country
1     Qatar                 6     United States
2     Mayotte               7     Cayman Islands
3     United Arab Emirates  8     Gabon
4     Saudi Arabia          9     French Guiana
5     Djibouti              10    Andorra

Table III: Most attractive countries according to the ratio in-degree / out-degree.

A last problem remains: the importance of countries is not taken into account. The importance of a country can be defined in a recursive way. If a country receives people from an important country, it gains more importance than if it receives them from a less important one. Ignoring this aspect puts good local attractors at the top at the expense of global attractors. Table III illustrates this situation. Several countries like Mayotte, French Guiana, Gabon or Djibouti are good attractors only locally. However, the goal is to obtain a ranking of more global attractors. To do so, a solution is to take into account the importance of the countries the incoming people come from.

This directly leads to a more sophisticated method to rank countries: the PageRank.

B. PageRank

The idea of the PageRank [8] is related to the random walk. Let us consider a random person who moves from country to country following the flow of people and who sometimes decides to go to any country at random. Given that weighted flows are considered, if the number of migrants going from one country to another is high, the probability that this random person follows this flow is high too. Assuming that this person moves an infinite number of times, the PageRank of a country is defined as the proportion of time this random person has spent in this country. This is also referred to in the literature as the stationary probability.

The PageRank thereby embodies the idea that not all connections are equal. The more important the country a migrant comes from, the more importance it gives to the country the migrant goes to. Furthermore, it is not the population that is taken into account but only the proportion of migrants of the different countries.

Mathematically, to compute the PageRank we need to follow these steps:

1) Transform the migration adjacency matrix (the matrix storing the migrations for each pair of countries) into a stochastic matrix.
2) Add the probability to go to any country.
3) Find the eigenvector of the highest eigenvalue $\lambda$.

Then, using the Perron-Frobenius theorem [7], it can be shown that $\lambda$ is equal to 1 and is unique. The left eigenvector related to this eigenvalue is the PageRank. According to Perron-Frobenius, the PageRank always exists and, as mentioned above, it is equal to the stationary probability.

This leads us to the computation of the following system of equations:

$G^{T} \pi = \pi$    (1)

where $G$ is the matrix obtained from the steps above and $\pi$ is the PageRank. This system can be solved with iterative methods such as the power method [12]. It is the method we used to compute the PageRank.

Using the PageRank to characterise the countries, we obtain Table IV.

Rank  Country         Rank  Country
1     United States   6     France
2     Canada          7     West Bank and Gaza
3     United Kingdom  8     Mexico
4     Germany         9     Puerto Rico
5     Australia       10    Saudi Arabia

Table IV: Most attractive countries according to their PageRank.

All of these countries have a plausible reason to be in this top ten:

  • High HDI (Human Development Index): Australia (0.906, 2nd), United States (0.897, 3rd), Canada (0.879, 6th), Germany (0.864, 12th), France (0.846, 18th), United Kingdom (0.833, 22nd).
  • High GDP (Gross Domestic Product): United States (1st), Germany (3rd), United Kingdom (4th), France (5th), Canada (8th), Mexico (9th), Australia (14th), Saudi Arabia (23rd).
  • Tax haven or offshore financial center: Puerto Rico.
  • Oil-producing country: Saudi Arabia.
  • Special event [26][10]: West Bank and Gaza.
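The three computation steps of Section II.B can be sketched as a power iteration. This is a minimal sketch under stated assumptions, not the authors' implementation: the probability of following the flows (`damping = 0.85`) is the value commonly used for web PageRank and is not given in this paper, countries without any recorded emigrant are assumed to jump uniformly, and the function name `pagerank` and the toy matrices are hypothetical.

```python
def pagerank(flows, damping=0.85, iterations=100):
    """Power iteration on a weighted migration matrix.

    flows[i][j] is the number of migrants moving from country i to country j.
    damping is the (assumed) probability of following the flows; with
    probability 1 - damping the random person jumps to any country.
    """
    n = len(flows)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Step 2: uniform probability of going to any country.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            out = sum(flows[i])
            for j in range(n):
                # Step 1: row-normalise the flows into transition probabilities.
                # Assumption: a country without emigrants jumps uniformly.
                p = flows[i][j] / out if out > 0 else 1.0 / n
                # Step 3: one multiplication by the transposed transition matrix.
                new_rank[j] += damping * rank[i] * p
        rank = new_rank
    return rank

# Toy networks (invented): a symmetric cycle and a hub receiving most flows.
cycle = [[0, 10, 0], [0, 0, 10], [10, 0, 0]]
hub = [[0, 1, 1], [5, 0, 0], [5, 0, 0]]
cycle_rank = pagerank(cycle)
hub_rank = pagerank(hub)
```

In the cycle every country ends up with the same score, while in the second network country 0, which receives all emigrants of the two others, obtains the highest score. A ranking of repulsiveness can be obtained from the same function by transposing `flows` first.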

A further plausible reason is that several of these countries are major destinations of emigration from the United States: Mexico (16%), Canada (12.4%), Puerto Rico (11.2%), United Kingdom (7.6%), Germany (5.3%), France (3.8%).

Concerning this last point, as the United States is by far the most attractive country according to the PageRank evaluation, the major emigration flows coming from this country give the destination countries a good place in the ranking. Nevertheless, other attractive countries rise to the top of the ranking for other reasons. The PageRank thereby gives a good ranking of the attractiveness of countries.

C. Extensions of the model

PageRank can be used in different ways to obtain other rankings:

1) Reversing all the edges of the migration graph (immigrants for a country become emigrants and vice versa) and applying the PageRank gives a ranking of repulsiveness.
2) Combining the PageRank attractiveness with the PageRank repulsiveness by taking their ratio allows both rankings to be represented together. This result is presented in Figure 1, where green countries are attractive and red countries are repulsive.

Figure 1: Map of ratio of PageRanks.

III. Group detection

A. K-core

There are several ways to define groups of countries having strong relations with each other. For instance, it is possible to form groups only by considering the strength with which some nodes are connected to the others. This is the idea of the first method explained here: the k-core [3].

Definition III.1 (k-core). Given a simple graph G = (V, E) and a threshold k, the k-core of G is the maximal subgraph of G in which all the nodes have a degree of at least k.

However, our graph is directed and weighted. Some modifications are therefore needed. First, to take the weights into account, we can replace each edge of cost n by n edges of cost 1. An edge (n1, n2) of weight 3 will give 3 edges from n1 to n2. Additionally, instead of pruning the nodes according to their total degree, we consider the in-degree and the out-degree separately. This gives another definition of the k-core, better adapted to the migration graph.

Definition III.2 (Directed k-core). Given a directed graph G = (V, E) and a threshold k, the directed k-core of G is the maximal subgraph of G in which all the nodes have a weighted in-degree and a weighted out-degree of at least k.

One method to compute the k-core is to recursively prune all the nodes whose in-degree or out-degree is lower than the threshold k. Using this algorithm, we obtain the map presented in Figure 2.

Figure 2: Map of k-cores for different k.

On this map, the darker the colour, the higher the k-core order and the stronger the connection between these countries. A preconceived idea may be that the United States or Canada, present at the top of the PageRank ranking, belong to the densest k-core. The map invalidates this idea. Instead, we observe a (1 000 000)-core with six countries: India, Kazakhstan, Pakistan, Russia, Ukraine, Uzbekistan.

One might ask why these countries are in a denser k-core than countries such as the United States, known to be a strong importer and exporter of migrants and well ranked. The difference between the United States and these countries is that the latter have their exchange relations concentrated on a few countries, while the exchanges of the former are more distributed.

B. Communities

A family of methods to group countries is based on the concept of communities. To understand it, let us first define what a graph clustering is.

Definition III.3 (Graph clustering). A graph clustering is a classification of the nodes of the graph into groups where the distribution of the nodes tends to optimise a configuration where:

  • The connections between nodes within the same group are strong.
  • The connections between nodes of different groups are weak.

That brings us to the definition of communities.

Definition III.4 (Community). A community is a group of vertices found with a graph clustering method.

The challenge behind the community approach is to form clusters efficiently. A solution proposed by Newman and Girvan [22] considers the community problem from another point of view. They designed a metric, called the modularity (Q), whose purpose is to measure the quality of a partition of a graph into communities. The intuition behind this measure is to compare the density of the connections within a community with the density expected for the same community partition [5]. The expected density is obtained by considering a randomised graph having the same number of nodes, where every node keeps the same degree but where the edges are placed randomly.

The higher the modularity Q, the better the partitioning. The problem of finding the best communities thereby becomes that of maximising the modularity. First, we need a computable expression for Q. For a directed weighted graph such as the migration network, an expression of the modularity is given by the formula [19]:

$Q = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{in} k_j^{out}}{m} \right] \delta(c_i, c_j)$    (2)

with

  • $A_{ij}$ the weight of the edge from node i to node j.
  • $m = \sum_{i,j} A_{ij}$, the sum of the weights of all the edges.
  • $k_i^{in}$ the weighted in-degree of node i.
  • $k_i^{out}$ the weighted out-degree of node i.
  • $c_i$ the community of node i.
  • $\delta(c_i, c_j) = 1$ if i and j are in the same community, 0 otherwise.
  • $\frac{k_i^{in} k_j^{out}}{m}$ corresponds to the expected weight of an edge between nodes i and j in a random graph having the same degree configuration as ours.

The task is now to find the partitioning producing the highest modularity Q. A naive solution is to consider all the partitions and select the one having the highest Q. However, the problem of finding the optimal partition is known to be NP-complete. For this reason, solutions that require enumerating all the partitions are infeasible in practice.

Another solution is the algorithm proposed by Blondel et al., called the Louvain method [5]. Compared to the other algorithms existing in the literature, the bottleneck of the Louvain method is not the computation time but the size of the network, which can exceed the storage capacity. Using this algorithm, we obtain the map in Figure 3.

Figure 3: Communities map.

This map exhibits four groups of countries:

  • West Africa.
  • A portion of the Middle East, the Indian subcontinent and the south of the Far East.
  • Countries belonging mainly to the former USSR.
  • A last community covering the rest of the world.

One may ask why Ethiopia is in the same community as the former USSR countries (in purple). The reason can be explained by historical facts: Ethiopia has had longstanding relations with Russia since the 17th century [21].

This observation shows a first interest of this kind of map: discovering hidden facts by highlighting particular situations.

IV. Gravity model

The goal of the gravity model is to predict new migrations by using known parameters. There are two main families of gravity models:

  • The bilateral gravity models [1], where only the origin and the attractiveness of the destination are considered.
  • The multilateral gravity models [4], where not only the origin and the attractiveness of the destination are taken into account but also the attractiveness of the alternative destinations.

This paper focuses on bilateral gravity models. To elaborate such models, the first step is to design an expression defining the attractiveness between entities. In the case of migration, it is mainly defined by some relevant characteristics of countries and the relations between them. Depending on the model, we can for example consider:

  • Geographic factors: distances between countries, common boundaries, etc.

  • Linguistic factors: common language, English spoken, etc.
  • Socio-economic factors: population, GDP, HDI, etc.
  • Historic factors: old migrations, former colony, etc.
  • Specific factors: polygamy, military service, etc.

Several gravity models using these kinds of parameters have already been developed in the literature [2], [18], [25]. Such models have three main purposes:

  • Predicting migrations in the future.
  • Applying them in the present to find missing migration data.
  • Understanding the main factors that explain the migration flows.

A. Mathematical concept of gravity model

As the gravity model generalises Newton's formula, it can be written as:

$F_{ij} = \beta_0 \, A_{1ij}^{\beta_1} A_{2ij}^{\beta_2} A_{3ij}^{\beta_3} \cdots A_{Nij}^{\beta_N} \, E_{ij}$

with

  • $F_{ij}$ the flow we want to analyse.
  • $\beta_0$ a constant.
  • $A_{nij}$, n = 1, ..., N, the parameters used to explain the flow.
  • $\beta_n$, n = 1, ..., N, the exponents of the different parameters.
  • $E_{ij}$ an error term with an expectation of 1 ($E(E_{ij} \mid A_{1ij}, \ldots, A_{Nij}) = 1$).

B. The zero-value and heteroscedasticity problems

The problem with the previous expression, once it is log-linearised for regression, is that we cannot consider the zero values of $F_{ij}$. Indeed, the logarithm of zero is not a finite value, but we would like our regression model to take account of as many observations as possible. In the migration data set, half of the migrations have a zero value. Moreover, a problem that cannot be solved concerns the zero values in the parameters. This can happen when data are missing for a factor or when the past migration between two countries is zero. As most of the expressions are logarithmic, the observations are not taken into account in these situations either.

Another major issue in regression is the presence of heteroscedasticity, the non-homogeneity of the variance of the predicted values. The problem is that it is hard to predict and to guess the form of this heteroscedasticity. Heteroscedasticity does not change the bias of the regression but it biases the standard errors. Silva and Tenreyro [28] proposed several estimators based on different forms of heteroscedasticity. They compared them in different cases and showed that the PPML (Poisson pseudo-maximum likelihood) gives the best results, also compared to LS (Least Squares) regression. Furthermore, they proved that the data do not have to be Poisson at all and that the model naturally deals with the zero-value problem of the dependent variable.

This is the reason why we use this method as well.

C. Our gravity model

Our motivation is to build a gravity model which is general and easy to explain. For example, taking parameters such as polygamy, the military service rate or fertility into account seems more arbitrary than taking account of the GDP or the population.

To test the different explanatory factors, we computed a set of models, each using a unique combination of parameters. The idea is to analyse the possible correlations between the different parameters.

Compared to the models already present in the literature, we built new ones using mathematical parameters which have until now never been used for this purpose:

  • The PageRank of the destination country, which is a measure of the attractiveness of the destination.
  • The inverted PageRank of the origin country, which is a measure of the repulsiveness of the origin.
  • The community of countries, which is a measure of the intensity of connections between countries.

In addition to these three parameters, we also used more classical parameters which are relevant in existing models.

D. Analysis of results

The best way to assess the efficiency of a model is to compare it with others. For this purpose, we use McFadden's pseudo R-squared metric [15], which is a measure of the quality of a PPML regression.

One of the parameters used is the past migrations. However, we developed both a model containing this parameter and a model without it. This choice was made in order to highlight the importance of the different factors that push people to migrate. In that sense, taking account of past migrations biases this analysis.

Now, let us compare our results with existing models, first with past migrations, then without:

Author                       # parameters  R-squared
Artuc, Docquier et al. [2]   15            0.872-0.898
Our model                    7             0.9565

Figure 4: Comparison between our model and the Artuc, Docquier et al. model with past migrations.
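As an aside on the estimation method of Section IV.B, the following minimal sketch shows how a Poisson pseudo-maximum likelihood fit accepts zero flows. It is an illustration under assumptions, not the regression used for the results of this paper: it handles a single explanatory factor, it solves the likelihood equations with Newton's method and step halving, and the function name `fit_ppml` and all data are invented.

```python
import math

def fit_ppml(x, y, iterations=50):
    """Fit y ~ exp(b0 + b1 * x) by Poisson pseudo-maximum likelihood.

    x holds the regressor (e.g. the logarithm of one gravity factor) and y the
    non-negative flows, zeros included. The x values must not all be equal.
    """
    def loglik(b0, b1):
        # Poisson log-likelihood, up to a constant independent of (b0, b1).
        return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi)
                   for xi, yi in zip(x, y))

    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the constant model
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton direction: inverse information matrix times gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step while it would decrease the log-likelihood.
        step, current = 1.0, loglik(b0, b1)
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1) < current:
            step /= 2
        if step <= 1e-8:
            break
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1

# Invented data: noise-free flows generated with b0 = 0.5 and b1 = 1.2 ...
x_exact = [0.0, 0.5, 1.0, 1.5, 2.0]
y_exact = [math.exp(0.5 + 1.2 * xi) for xi in x_exact]
exact_fit = fit_ppml(x_exact, y_exact)
# ... and a small sample containing a zero flow, which a log-linear fit would drop.
x_zero, y_zero = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0]
zero_fit = fit_ppml(x_zero, y_zero)
```

Here `b1` plays the role of the exponent of the factor and `exp(b0)` the role of the constant in the gravity expression of Section IV.A. The observation with a zero flow is kept in the second fit because the flow itself, not its logarithm, enters the likelihood.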

Author                       # parameters  R-squared
Lewer and Van den Berg [18]  10            0.663
Ramos and Surinach [25]      13            0.634
Our model                    10            0.7457

Figure 5: Comparison between our model and existing models without past migrations.

In both cases, our model gives better results. This confirms the validity of the previous sections, where the new parameters were introduced. To the best of our knowledge, even with more specific databases and with more parameters than we are using, no model of international migrations competes with ours.

V. Conclusion

This paper used and modified several well-known mathematical concepts of network science to analyse migration flows and demonstrated their validity by applying them to econometric models, which gave excellent results according to the metrics commonly used in this research field. Three major points in the field of migrations were developed: ranking countries, grouping them into consistent groups, and elaborating innovative and competitive gravity models for migrations by using the results from the former sections.

So far, methods from network science have not been much used to analyse migrations, except for [13]. We showed, however, that such methods lead to consistent results. Network science can thereby be a new way to analyse migrations.

In many ways, one can go further in the analysis. One idea is to distinguish between the migrations of highly and less educated people. This would take the "brain drain" effect into account and improve our models. Besides, using time series regressions could improve the ability of our model to predict future migrations and populations.

VI. Acknowledgments

We would like to thank the people who directly or indirectly contributed to this work. Firstly, our supervisor, Jean-Charles Delvenne, for his continuous support. Then, Jean-François Carpantier and Yves Deville for their reading of this work. And finally, Frédéric Docquier, Yves-Alexandre de Montjoye and Luc Rocher for the discussions we had.

References

[1] J. E. Anderson. The Gravity Model. Annual Review of Economics, 3(1):133-160, Sept. 2011. annurev-economics-111809-125114.
[2] E. Artuc, F. Docquier, C. Ozden, and C. R. Parsons. A Global Assessment of Human Capital Mobility: the Role of non-OECD Destinations. docquier/oxlight.htm.
[3] V. Batagelj and M. Zaversnik. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049, pages 1-9, 2002.
[4] S. Bertoli and J. F.-h. Moraga. Multilateral Resistance to Migration. (5958), 2011.
[5] V. Blondel, J.-l. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech. (2008) P10008.
[6] S. P. Borgatti. 2-Mode Concepts in Social Network Analysis. Social Networks 19 (1997) 243-269.
[7] S. P. Boyd. Course EE363: Linear Dynamical Systems: Perron-Frobenius Theory, 2008.
[8] M. Chiang. Networked Life: 20 Questions and Answers. (April), 2012.
[9] A. De Montis, M. Barthélemy, A. Chessa, and A. Vespignani. The structure of interurban traffic: a weighted network analysis. Environment and Planning B: Planning and Design, 34(5):905-924, 2007.
[10] A. Di Bartolomeo, T. Jaulin, and D. Perrin. Palestine. (July), 2011.
[11] C. Dustmann and A. Glitz. Migration and Education. 2011.
[12] L. Eldén. A Note on the Eigenvalues of the Google Matrix. pages 1-3, 2003.
[13] G. Fagiolo and M. Mastrorillo. International migration network: Topology and modeling. Physical Review E, 88(1):012812, July 2013.
[14] GeoHive. Human Development Index. http://www.geohive.com/earth/gen_hdi.aspx.
[15] Institute for digital research and education - UCLA. What are pseudo R-squared? faq/general/Psuedo_RSquareds.htm.
[16] M. Kristiansen, A. Mygind, and A. Krasnik. Health effects of migration. Danish medical bulletin, 54, 2007.
[17] D. L.Ermann.
[18] J. J. Lewer and H. Van den Berg. A gravity model of immigration. Economics Letters, 99(1):164-167, Apr. 2008. http://
[19] Y. Liu, Q. Liu, and Z. Qin. Community Detecting and Feature Analysis in Real Directed Weighted Social Networks. Journal of Networks, 8(6):1432-1439, June 2013. http://ojs.
[20] L. T. Luca De Benedictis. The world trade network. 15 September 2010.
[21] Ministry of Foreign Affairs of Ethiopia. Ethiopia-Russia relations.
[22] M. E. J. Newman. The mathematics of networks. pages 1-12.
[23] OECD UNDESA. World Migration in Figures. (October):1-6, 2013.
[24] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1998.
[25] R. Ramos and J. Suriñach. A gravity model of migration between ENC and EU. 2013.
[26] T. M. Rempel. Palestinian Refugees in the West Bank and the Gaza Strip, 2006. http://www. palestinian-refugees-in-the-west-bank-and-the-gaza/ alldocuments.
[27] M. Ruhs and C. Vargas-Silva. The Labour Market Effects of Immigration. The migration observatory, 2014.
[28] J. M. C. S. Silva and S. Tenreyro. The log of gravity. 88(November):641-658, 2006.
[29] The Levin Institute - The State University of New York. Globalization101: Economic effects of migration, 2014. http://www.
[30] World Bank Group, C. Ozden, C. R. Parsons, M. Schiff, and T. L. Walmsley. World Bank Economic Review: Where on Earth is Everybody? The Evolution of Global Bilateral Migration 1960-2000. Technical report. data-catalog/global-bilateral-migration-database.
