Lexical networks from Finnish news articles

The dataset includes lexical networks centered on keywords related to migration. The networks are built from Finnish news articles extracted from the dataset described in Pluralistic Recommendation in News [1]. Each keyword is represented as a (keyword_lemma, POS) tuple, written as keyword_lemma_POS, e.g., migrant_NOUN, migrant_ADJ, emigrant_NOUN, emigrant_ADJ, izbjeglica_NOUN, izbjeglica_ADJ, imigrant_NOUN, imigrant_ADJ, migrirati_VERB. For each occurrence of a keyword lemma, the four preceding and the four following words are extracted. The nodes of the networks are annotated with weighted attributes derived from sentiment and emotion lexicons, namely the NRC lexicons.

[1] Pluralistic Recommendation in News, url
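
The extraction step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's pipeline. It assumes the articles are already lemmatised and POS-tagged into (lemma, POS) pairs. Each keyword/context pair inside the ±4 window is counted as a weighted edge, and lexicon scores are attached to nodes by lemma. The keyword set, the helpers `ego_edges` and `annotate_nodes`, the toy English tokens, and the lexicon values are all hypothetical. How the published networks actually weight the NRC attributes is not specified on this page.

```python
from collections import Counter

# Hypothetical keyword set, using the keyword_lemma_POS naming described above.
KEYWORDS = {"migrant_NOUN", "migrant_ADJ"}
WINDOW = 4  # four preceding and four following tokens


def ego_edges(tokens, keywords=KEYWORDS, window=WINDOW):
    """Count (keyword, context word) co-occurrences inside a +/- window.

    tokens: list of (lemma, pos) pairs for one text, in reading order.
    Returns a Counter mapping (keyword_node, context_node) -> count.
    """
    labels = [f"{lemma}_{pos}" for lemma, pos in tokens]
    edges = Counter()
    for i, label in enumerate(labels):
        if label not in keywords:
            continue
        start = max(0, i - window)
        stop = min(len(labels), i + window + 1)
        for j in range(start, stop):
            if j != i:  # skip the keyword occurrence itself
                edges[(label, labels[j])] += 1
    return edges


def annotate_nodes(edges, lexicon):
    """Attach lexicon scores to every node in the edge list.

    lexicon: hypothetical NRC-style mapping lemma -> {category: score}.
    Nodes whose lemma is missing from the lexicon get an empty dict.
    """
    nodes = {}
    for pair in edges:
        for node in pair:
            lemma = node.rsplit("_", 1)[0]  # drop the POS suffix
            nodes[node] = dict(lexicon.get(lemma, {}))
    return nodes


# Toy input; real input would be lemmatised Finnish text.
tokens = [("new", "ADJ"), ("migrant", "NOUN"), ("arrive", "VERB"),
          ("at", "ADP"), ("border", "NOUN")]
edges = ego_edges(tokens)
# Illustrative scores only, not values taken from the NRC lexicons.
nodes = annotate_nodes(edges, {"border": {"trust": 1.0}})
print(edges)
print(nodes)
```

The window is clipped at the text boundaries, so a keyword near the start or end of a text simply has fewer than eight context words.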

Data and Resources
  • finnish_egoNet_w4 (jsonl)
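
The resource is distributed as JSON Lines, i.e., one JSON object per line. Below is a minimal Python reader, offered as a sketch. The record schema is not described on this page, so the code makes no assumption about field names. The helper names and the local file name used in the usage note are hypothetical.

```python
import json


def parse_jsonl_lines(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_no}: {exc}") from exc


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a UTF-8 JSONL file."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_jsonl_lines(fh)
```

For example, `records = list(read_jsonl("finnish_egoNet_w4.jsonl"))` followed by `records[0].keys()` shows which fields a record actually contains.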

Personal Data Attributes

Description: Information related to personal data

Anonymisation Methodology: News article IDs are replaced by random UUIDs generated with uuid.uuid4()
Anonymised: Pseudo-anonymised
Children Data: No
Cross Border Authorised: No
Data Protection Impact Assessment: No
Ethics Committee Approval: No
General Data: Yes
Informed Consent Template: No
Non Personal Data Explanation: The dataset contains networks of words extracted from news articles.
Personal Data: No
Personal data was manifestly made public by the data subject: N/A (Not applicable)
Sensitive Data: No
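
The anonymisation step can be illustrated with Python's standard `uuid` module. This sketch assumes, beyond what is stated above, that each distinct article ID is mapped to one UUID so that records from the same article stay linked. `pseudonymise_ids` is a hypothetical helper, not the author's code.

```python
import uuid


def pseudonymise_ids(article_ids):
    """Map each distinct article ID to a random UUID4 string.

    Repeated occurrences of the same ID reuse the same UUID, so links
    between records survive while the original ID is hidden.
    """
    mapping = {}
    for article_id in article_ids:
        if article_id not in mapping:
            mapping[article_id] = str(uuid.uuid4())
    return mapping


mapping = pseudonymise_ids(["a1", "a2", "a1"])
print(mapping)
```

Anyone who keeps the returned mapping can reverse the substitution. This is why such a replacement counts as pseudonymisation rather than full anonymisation.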
Additional Info
Accessibility: Both
Accessibility Mode: Download
Availability: Online
Basic rights: Download
Creation Date: 2023-11-29
Creator: Laura Pollacci
Dataset Citation: Pluralistic Recommendation in News, url
Dataset Re-Use Safeguards: None
Field/Scope of use: Non-commercial research only
Format: jsonl
Group: Migration Studies; Societal Debates and Misinformation
IP/Copyrights: University of Pisa
Language: Finnish (fin)
License term: 2023-11-29 / 2030-11-29
Manifestation Type: Virtual
Processing Degree: Secondary
Retention Period: 2030-11-29
Semantic Coverage: Lexical networks, news articles, migration
SoBigData Node: SoBigData IT; SoBigData EU
Sublicense rights: No
Territory of use: Worldwide
Thematic Cluster: Social Data [SD]; Social Network Analysis [SNA]; Text and Social Media Mining [TSMM]
Time Coverage: 2016-01-01 / 2021-12-31
Spatial Extent (GeoJSON): {"type":"Point", "coordinates":[27.04980432987213,64.86328125]}
system:type: Dataset
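
The spatial extent is stored as a GeoJSON Point. GeoJSON (RFC 7946) orders coordinates as [longitude, latitude], so this point lies at roughly 64.86° N, 27.05° E, within Finland. Below is a small hypothetical helper for reading it in Python, as a sketch.

```python
import json

# The spatial extent recorded above, copied verbatim.
SPATIAL = '{"type":"Point", "coordinates":[27.04980432987213,64.86328125]}'


def point_lat_lon(geojson_text):
    """Return (latitude, longitude) from a GeoJSON Point string.

    GeoJSON positions are [longitude, latitude], so the order is swapped.
    """
    geom = json.loads(geojson_text)
    if geom.get("type") != "Point":
        raise ValueError("expected a GeoJSON Point")
    lon, lat = geom["coordinates"][:2]
    return lat, lon


print(point_lat_lon(SPATIAL))
```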
Management Info
Author: Laura Pollacci
Maintainer: Laura Pollacci
Version: 1
Last Updated: 29 November 2023, 16:48 (CET)
Created: 29 November 2023, 15:49 (CET)