NamSor for GEOINT

Applying onomastics to decode identities and understand territorial issues using open data and big data sources. Presentation delivered at the Geographical Society (Paris).
Published on: Mar 3, 2016
Source: www.slideshare.net


Transcripts - NamSor for GEOINT

  • 1. APPLIED ONOMASTICS FOR DECODING IDENTITY AND TERRITORIAL ISSUES. Elian CARSENAT, NamSor Applied Onomastics, 2015-09-12
  • 2. (Introduction: presentation of the AKAF47 case)
  • 3. Founder Bio. Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as a consultant and managed business and IT projects in London, Paris, Moscow and Shanghai. In 2012, Elian created NamSor, sociolinguistic software that mines 'Big Data' to better understand international flows of money, ideas and people. http://fr.linkedin.com/in/eliancarsenat/en
  • 4. NamSor sorts Names. Names are meaningful: we use sociolinguistics to extract their semantics and deliver actionable intelligence. Names reflect cultural identity. NamSor data-mining software recognizes the linguistic or cultural origin of names in any alphabet or language, with fine granularity and high accuracy.
  • 5. Mining 3 million Twitter names to map diasporas. Who are they, where are they and what are they doing? Source: Twitter; Visualization: CartoDB; Data Mining: NamSor
  • 6. Flow view – who travels where? (Onoma = origin inferred from the name)
      Source          Target  Type       Id  Onoma          Weight
      United Kingdom  France  Directed   16  Great Britain      37
      Spain           France  Directed   55  Spain              14
      United States   France  Directed   75  Great Britain      12
      Turkey          France  Directed   79  Turkey             11
      Brazil          France  Directed   87  Portugal           10
      United Kingdom  France  Directed  112  Ireland             9
      Italy           France  Directed  152  Italy               7
      Switzerland     France  Directed  226  France              5
      Belgium         France  Directed  247  France              5
      United Kingdom  France  Directed  258  France              5
      Mexico          France  Directed  287  Spain               4
      Ireland         France  Directed  317  Great Britain       4
      United Kingdom  France  Directed  333  Italy               4
      United States   France  Directed  375  France              4
      Source: Twitter; Visualization: Gephi; Data Mining: NamSor
  • 7. Mapping Talents in Cancer Research (in collaboration with French INSERM). Thomson Reuters Web of Science (6 countries, 250k scientists, 50k papers). “Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.” Source: WoS; Data Mining: INSERM with NamSor
  • 8. Cancer Research in Poland and Slovenia: examining the ‘brain drain’. In the Polish corpus, we look at co-authors with Polish names affiliated abroad. Top countries: 1. US, 2. Great Britain, 3. Germany. In the Slovenian corpus, we look at co-authors with Slovenian names affiliated abroad. Top countries: 1. Great Britain, 2. US, 3. Germany. Source: WoS; Data Mining: INSERM with NamSor
  • 9. WORK IN PROGRESS – INDIA
  • 10. “Incredible India” – 1.2 billion people. Indian onomastics by State/Union Territory. Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC
  • 11. GUJARAT: mapping onomastics by district. Source: Voters List; Visualization: Google Fusion Tables; Data Mining: NamSor
      Map of Gujarat's 33 districts: 1 Kachchh, 2 Banaskantha, 3 Patan, 4 Mahesana, 5 Sabarkantha, 6 Gandhinagar, 7 Ahmedabad, 8 Surendranagar, 9 Rajkot, 10 Jamnagar, 11 Porbandar, 12 Junagadh, 13 Amreli, 14 Bhavnagar, 15 Anand, 16 Kheda, 17 Panchmahal, 18 Dahod, 19 Vadodara, 20 Narmada, 21 Bharuch, 22 Surat, 23 Dangs, 24 Navsari, 25 Valsad, 26 Tapi, 27 Arvalli, 28 Morbi, 29 Devbhumi Dwarka, 30 Gir Somnath, 31 Botad, 32 Mahisagar, 33 Chhotaudepur.
      Legend: L62454:497, L63405:400, L79998:394, L184529:18, L511960:527, L437236:1278, L256619:111, L294948:886, L85680:416, L184530:134.
  • 12. ASSAM: Karbi Anglong, within district. Inter-caste marriages?
      Sample records. Input: FirstName, LastName, "parent is" (father or husband), FirstParent, LastParent. Output: clusterId, clusterParentId. (The names are mis-encoded in the source and are omitted.)
        clusterId     clusterParentId   parent is
        L25354:253    L64958:2797       husband
        L47490:1593   L64958:2797       father
        L28582:1209   L47490:1593       husband
        L23643:669    L35593:510        father
        L23643:669    L35593:510        father
        L47490:1593   L35593:510        father
        L23643:669    L35593:510        husband
        L35593:510    L47490:1593       father
        L23643:669    L47490:1593       father
      Pivot table (filter: parent is = husband; values: count of serial). Column labels: L47490:1593, L116370:3612, L54332:2031, L184096:2297, L35593:510, L168871:1819, L135664:4438, L51271:837. Row labels with their counts:
        L23643:669: 6931, 84, 5099, 15, 2069, 28, 791, 1924
        L151415:3559: 18, 212, 11, 6446, 19, 1217, 55, 6
        L28582:1209: 5132, 68, 3565, 10, 1494, 17, 592, 1323
        L116370:3612: 66, 10283, 38, 72, 40, 321, 137, 29
        L9839:442: 2491, 60, 1851, 9, 774, 11, 321, 660
        L168871:1819: 7, 263, 6, 361, 8, 2730, 24, 4
        L23642:141: 1198, 8, 822, 2, 375, 4, 156, 332
        L25354:253: 1181, 12, 932, 375, 7, 100, 323
        L135664:4438: 20, 154, 5, 22, 19, 44, 2212, 3
        L87032:1210: 11, 315, 13, 51, 14, 141, 37, 9
        L90333:3644: 3, 204, 2, 31, 190, 5
        L184096:2297: 13, 1735, 3, 84, 11, 1
        L87031:697: 4, 136, 4, 12, 3, 137, 4, 5
        L14495:131: 614, 10, 432, 167, 4, 68, 163
        L63724:1422: 17, 83, 10, 34, 34, 28, 96, 6
        L98994:891: 31, 161, 46, 21, 19, 59, 21, 5
      Chart: "ASSAM: Karbi Anglong district names clustered". Legend: L116370:3612, L23643:669, L151415:3559, L47490:1593, L28582:1209, L54332:2031, L184096:2297, L168871:1819, L9839:442, L135664:4438, L87032:1210, L90333:3644, L35593:510, L51271:837, L63724:1422, L154797:1168, L64959:1796, L23642:141, L87031:697, L6536:295, L98994:891, L25354:253, L64958:2797, L30570:2614, L90334:1189, L95839:287, L100510:366, L121390:783, Other.
      Source: Voters List; Data Mining: NamSor
  • 13. USE CASE – BOSTON CITY
  • 14. US Census vs NamSor geo-demographics. In July 2015, the US Government announced new rules that will require all cities and towns receiving federal housing funds to assess patterns of segregation. The New York Times has published interactive maps of Boston geo-demographics, which we can compare with the information inferred by NamSor.
  • 15. US Census Race Map of Boston. http://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html
  • 16. Using the Voters List. US Census: 1 pixel = 40 inhabitants; Voters List: 1 pixel = 1 voter. Source: Boston Voters List; Visualization: ESRI; Data Mining: NamSor
  • 17. Voters List: zooming further into 051200. US Census vs Voters List + NamSor. Source: Boston Voters List; Visualization: ESRI; Data Mining: NamSor
  • 18. Breaking down ‘White’ and ‘Asian’ into Portuguese, Spanish, Italian, Indian, Pakistani, Chinese, ... Source: Boston Voters List; Visualization: ESRI; Data Mining: NamSor
  • 19. Using the NamSor API. Option 1: Online; Option 2: RapidMiner Extension
  • 20. Thank you! Elian CARSENAT, elian.carsenat@namsor.com, Phone: +33 6 52 77 99 07. July 2013, Embassy of Lithuania in Paris
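
Slide 4 says NamSor recognizes the linguistic or cultural origin of a name, but the presentation does not say how. As a rough illustration of the general idea only, here is a minimal character-trigram naive Bayes classifier. The technique, the training pairs and the origin labels are assumptions chosen for the sketch; this is not NamSor's model or data.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Lower-case the name, add boundary markers and cut it into character n-grams."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramOriginClassifier:
    """Multinomial naive Bayes over character n-grams, with add-one smoothing."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.class_counts: Counter[str] = Counter()
        self.ngram_counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, samples: list[tuple[str, str]]) -> NgramOriginClassifier:
        for name, origin in samples:
            grams = char_ngrams(name, self.n)
            self.class_counts[origin] += 1
            self.ngram_counts[origin].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name: str) -> str:
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_origin, best_score = "", -math.inf
        for origin, count in self.class_counts.items():
            grams = self.ngram_counts[origin]
            denom = sum(grams.values()) + vocab_size
            # Log prior plus the smoothed log likelihood of each n-gram in the name.
            score = math.log(count / total)
            for gram in char_ngrams(name, self.n):
                score += math.log((grams[gram] + 1) / denom)
            if score > best_score:
                best_origin, best_score = origin, score
        return best_origin


# Toy, hypothetical training pairs (name, origin label).
TRAINING = [
    ("Kowalski", "Polish"), ("Nowak", "Polish"),
    ("Wisniewski", "Polish"), ("Lewandowski", "Polish"),
    ("Rossi", "Italian"), ("Russo", "Italian"),
    ("Ferrari", "Italian"), ("Esposito", "Italian"),
]

classifier = NgramOriginClassifier().fit(TRAINING)
print(classifier.predict("Zielinski"))  # Polish
print(classifier.predict("Rossini"))    # Italian
```

A production system would need far more training data per origin and would also have to handle non-Latin alphabets, which trigrams over Unicode strings allow in principle.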
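
Slide 5 maps diasporas from the names of about 3 million Twitter users but shows no method. The following is one simple way to summarise such data, sketched under our own assumptions. Each record is a (location country, inferred name origin) pair, and `home_origin` is a hypothetical lookup giving the origin label treated as local in each country. Every other origin is counted as diaspora.

```python
from collections import Counter, defaultdict


def diaspora_profile(records, home_origin, top=3):
    """Summarise, per location country, the share and main origins of non-local names.

    records: iterable of (location_country, inferred_origin) pairs.
    home_origin: dict mapping a country to the origin label treated as local there.
    """
    by_country = defaultdict(Counter)
    for country, origin in records:
        by_country[country][origin] += 1

    profile = {}
    for country, origins in by_country.items():
        total = sum(origins.values())
        local = home_origin.get(country)
        foreign = Counter({o: c for o, c in origins.items() if o != local})
        profile[country] = {
            "users": total,
            "diaspora_share": sum(foreign.values()) / total,
            "top_origins": foreign.most_common(top),
        }
    return profile


# Hypothetical toy records, not figures from the slide.
records = [("France", "France")] * 6 + [("France", "Portugal")] * 3 + [("France", "Italy")]
print(diaspora_profile(records, {"France": "France"})["France"])
# {'users': 10, 'diaspora_share': 0.4, 'top_origins': [('Portugal', 3), ('Italy', 1)]}
```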
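
The slide 6 table uses the column layout of a Gephi edge list (Source, Target, Type, Id, Weight), plus an extra Onoma attribute, which we read as the origin inferred from the user's name. The sketch below shows how such a file could be produced, assuming each observation is a (source country, target country, onoma) triple. The aggregation and the sequential ids are our choices, not necessarily how the original table was built.

```python
import csv
import io
from collections import Counter


def flow_edges_csv(observations):
    """Aggregate (source, target, onoma) triples into a weighted, directed edge list (CSV text)."""
    weights = Counter(observations)
    # Heaviest flows first; ties broken alphabetically for a stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Id", "Onoma", "Weight"])
    for edge_id, ((source, target, onoma), weight) in enumerate(ranked):
        writer.writerow([source, target, "Directed", edge_id, onoma, weight])
    return buffer.getvalue()


# Hypothetical observations.
obs = [("United Kingdom", "France", "Great Britain")] * 3 + [("Spain", "France", "Spain")] * 2
print(flow_edges_csv(obs))
```

With these observations, the first data row is `United Kingdom,France,Directed,0,Great Britain,3`. Keeping Onoma as its own column lets Gephi treat it as an edge attribute, for colouring or filtering.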
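
Slide 7 quotes the finding that scientists' names correlate with whom they publish. One simple way to quantify that is offered below as our own illustration, not the method of the INSERM/NamSor paper. It compares two shares:

- the share of co-author pairs whose names have the same inferred origin;
- the share expected if authors were paired at random, approximated as the sum of squared origin shares.

`origin_of` stands in for a name-origin classifier and is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_homophily(papers, origin_of):
    """Return (observed, expected) shares of same-origin co-author pairs.

    papers: list of author lists, one per paper.
    origin_of: dict mapping each author to an inferred name origin.
    """
    same = pairs = 0
    for authors in papers:
        for a, b in combinations(authors, 2):
            pairs += 1
            if origin_of[a] == origin_of[b]:
                same += 1
    observed = same / pairs if pairs else 0.0

    # Random-pairing baseline from the origin mix of distinct authors.
    distinct = {author for authors in papers for author in authors}
    shares = Counter(origin_of[a] for a in distinct)
    expected = sum((n / len(distinct)) ** 2 for n in shares.values())
    return observed, expected


papers = [["A1", "A2"], ["A1", "B1"], ["B1", "B2"]]
origin_of = {"A1": "Polish", "A2": "Polish", "B1": "Slovenian", "B2": "Slovenian"}
print(coauthor_homophily(papers, origin_of))
```

Here two of the three pairs share an origin (observed 2/3), against 0.5 expected from the origin mix. An observed share well above the baseline would indicate that co-authorship clusters by name origin.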
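
Slide 8 ranks the countries where co-authors with Polish (or Slovenian) names are affiliated abroad. Assume each authorship record has already been reduced to an (inferred origin, affiliation country) pair. The ranking itself is then a filtered count. The record layout and labels here are hypothetical.

```python
from collections import Counter


def brain_drain_destinations(authorships, origin, home_country, top=3):
    """Rank foreign affiliation countries of authors whose names have the given origin.

    authorships: iterable of (inferred_origin, affiliation_country) pairs.
    """
    abroad = Counter(
        country
        for name_origin, country in authorships
        if name_origin == origin and country != home_country
    )
    return [country for country, _ in abroad.most_common(top)]


# Hypothetical records, not the WoS corpus.
records = (
    [("Polish", "Poland")] * 5
    + [("Polish", "US")] * 4
    + [("Polish", "Great Britain")] * 2
    + [("Polish", "Germany")]
    + [("Slovenian", "US")] * 3
)
print(brain_drain_destinations(records, "Polish", "Poland"))
# ['US', 'Great Britain', 'Germany']
```

This counts authorships, so prolific authors weigh more. Deduplicate by author first if the goal is to count people.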
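
Slide 10 lists the scripts found in Indian names, and a name's script has to be identified before the name can be classified. Below is a minimal heuristic using only Python's standard library:

1. Normalise the name to NFC.
2. For each letter or combining mark, take the first word of its Unicode character name (for example, 'DEVANAGARI LETTER RA').
3. Keep the most frequent word.

`unicodedata` does not expose the Unicode Script property, so this is an approximation. The Unicode character names happen to use the same labels as the slide, including ORIYA for Odia.

```python
import unicodedata
from collections import Counter


def detect_script(name: str) -> str:
    """Return the most frequent script label among the letters and marks of a name."""
    scripts = Counter()
    for ch in unicodedata.normalize("NFC", name):
        # Letters (L*) and combining marks such as vowel signs (M*) carry script information;
        # spaces, digits and punctuation are ignored.
        if unicodedata.category(ch)[0] in ("L", "M"):
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"


for sample in ["Ramesh Patel", "रमेश", "રમેશ", "ரமேஷ்", "রমেশ"]:
    print(sample, detect_script(sample))
```

The loop prints LATIN, DEVANAGARI, GUJARATI, TAMIL and BENGALI for the five samples. A name mixing scripts gets the majority label.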
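
Slide 11 colours Gujarat's districts using voter-list names. Assume every voter has already been assigned to a name cluster (the slide's legend uses NamSor cluster ids). The data behind such a choropleth can then be as simple as the dominant cluster and its share in each district. The district and cluster labels in the usage example are placeholders.

```python
from collections import Counter, defaultdict


def dominant_cluster_by_district(voters):
    """Map each district to (most frequent cluster, its share of the district's voters).

    voters: iterable of (district, cluster_id) pairs.
    """
    counts = defaultdict(Counter)
    for district, cluster in voters:
        counts[district][cluster] += 1
    result = {}
    for district, clusters in counts.items():
        cluster, n = clusters.most_common(1)[0]
        result[district] = (cluster, n / sum(clusters.values()))
    return result


voters = (
    [("district_a", "cluster_1")] * 3
    + [("district_a", "cluster_2")]
    + [("district_b", "cluster_2")] * 2
)
print(dominant_cluster_by_district(voters))
# {'district_a': ('cluster_1', 0.75), 'district_b': ('cluster_2', 1.0)}
```

The share matters as much as the winner: a district where the top cluster holds 30% reads very differently from one where it holds 90%.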
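
In the slide 12 sample, clusterParentId appears to be the cluster of the parent's name, and the 'parent is' column says whether that parent is the father or the husband. The pivot table counts husband links by cluster. Below is a minimal sketch of that cross-tabulation, plus the share of husband links that join two different clusters. Treating a name cluster as a proxy for caste or community is an assumption; the slide itself phrases inter-caste marriage as a question.

```python
from collections import Counter


def husband_crosstab(records):
    """Cross-tabulate (own cluster, husband's cluster) and return the inter-cluster share.

    records: iterable of (cluster_id, cluster_parent_id, parent_is) triples,
    where parent_is is "father" or "husband".
    """
    table = Counter(
        (cluster, parent_cluster)
        for cluster, parent_cluster, parent_is in records
        if parent_is == "husband"
    )
    total = sum(table.values())
    inter = sum(n for (own, husband), n in table.items() if own != husband)
    return table, (inter / total if total else 0.0)


records = [
    ("c1", "c1", "husband"),
    ("c1", "c2", "husband"),
    ("c2", "c2", "husband"),
    ("c2", "c1", "father"),  # father links are left out of the marriage table
]
table, inter_share = husband_crosstab(records)
print(inter_share)  # 0.3333333333333333
```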
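
Slide 16 contrasts a census map at 1 pixel = 40 inhabitants with a voter-list map at 1 pixel = 1 voter. When only an area total is known, a common way to draw such a dot-density map is to place total / people_per_dot dots at random inside the area's polygon. Below is a minimal sketch using ray-casting point-in-polygon and rejection sampling. The coordinates are toy values, and the ESRI maps on the slide were not necessarily made this way.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a horizontal ray from (x, y) with the polygon's edges."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density(count, people_per_dot, polygon, rng=None):
    """Place round(count / people_per_dot) random dots inside the polygon."""
    rng = rng if rng is not None else random.Random(0)
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    n_dots = round(count / people_per_dot)
    dots = []
    while len(dots) < n_dots:
        # Rejection sampling: draw in the bounding box, keep points inside the polygon.
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# An L-shaped toy "tract" with 1,000 residents.
tract = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(len(dot_density(1000, 40, tract)))  # 25
print(len(dot_density(1000, 1, tract)))   # 1000
```

The same area yields 25 dots at the census scale and 1,000 at one dot per person, which is why the voter-list map can show much finer detail.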
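
Slide 18 splits the census categories 'White' and 'Asian' into finer origins. Once each voter's name has an inferred origin, the breakdown is a grouping step. The mapping below simply follows the examples in the slide title and is illustrative, not a complete or official correspondence. Origins without a mapping fall into 'Other'.

```python
from collections import Counter, defaultdict

# Illustrative mapping taken from the examples on the slide; not exhaustive.
CATEGORY_OF = {
    "Portuguese": "White", "Spanish": "White", "Italian": "White",
    "Indian": "Asian", "Pakistani": "Asian", "Chinese": "Asian",
}


def breakdown(origins, category_of=CATEGORY_OF):
    """Group inferred name origins under coarse census-style categories."""
    groups = defaultdict(Counter)
    for origin in origins:
        groups[category_of.get(origin, "Other")][origin] += 1
    return {category: dict(counts) for category, counts in groups.items()}


print(breakdown(["Italian", "Chinese", "Italian", "Portuguese", "Basque"]))
# {'White': {'Italian': 2, 'Portuguese': 1}, 'Asian': {'Chinese': 1}, 'Other': {'Basque': 1}}
```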
