Update: Cybersecurity researcher Troy Hunt examined the data in this leak and found it to contain a mix of authentic data scraped from LinkedIn users and email addresses constructed from individuals’ names, as explained further below. Additionally, the threat actor behind the leak has now released a total of 35 million lines of data from LinkedIn users, and this has been uploaded to the Have I Been Pwned database.
A database containing 2.5 million rows of data on LinkedIn Premium users has been shared for free on a popular clearnet hacking forum.
Scraping is the process of using automated tools to extract large amounts of data from websites, typically involving crawlers and bots that can evade anti-scraping measures by mimicking human-like user behavior.
Although scraping violates LinkedIn’s terms of service, and the job-oriented social media platform has fought the practice in court, many threat actors continue to engage in the activity to demonstrate their capacity to bypass protections or to make a profit.
In this case, user ‘USDoD’ has freely shared the CSV database, which contains recent (2023) data for LinkedIn Premium users, including the following information, among other things:
- Full names
- Email addresses
- LinkedIn profile IDs and URLs
- Job titles
- Employer names
- Education history
- Languages spoken
- Brief professional summaries
While most of the above is already publicly accessible to LinkedIn users, the inclusion of email addresses makes this leak valuable to cybercriminals. This information can be used for correlating email addresses with other leaks to find common passwords, narrow down the scope of brute-forcing attacks, or simply enable phishing. Also, having sensitive information combined into an indexable form makes it a lot easier for malicious actors to leverage it in social engineering attacks or perform identity fraud.
USDoD notes that the freely shared database contains information on prominent people such as government employees, members of non-governmental organizations, and staff of educational institutions and finance firms, and that it generally concerns high-ranking individuals.
Email address visibility is determined by the user’s setting, which defines the degree of connections allowed to access this sensitive information. The scraper used in this case either bypassed these settings for all users or posed as a close connection, which is an unrealistic scenario.
RestorePrivacy has contacted LinkedIn to ask about the validity of the leaked data and how the threat actors could scrape multiple email addresses for each exposed user, but we have yet to hear back.
In August 2023, the UK’s Information Commissioner’s Office (ICO) and eleven data protection authorities from Canada, China, Australia, Switzerland, Norway, Argentina, and New Zealand issued a statement that called on social media platforms to implement more robust defenses against automated scraping. The statement reminded the internet platforms that any data collection happening without the user’s consent may be considered an infringement of data protection laws like the GDPR in Europe.
Update on the LinkedIn data scrape
Cybersecurity researcher Troy Hunt has examined the data and found it to contain a mix of real and fabricated information. Troy writes on his blog:
“This data is a combination of information sourced from public LinkedIn profiles, fabricated emails address and in part (anecdotally based on simply eyeballing the data this is a small part), the other sources in the column headings above. But the people are real, the companies are real, the domains are real and in many cases, the email addresses themselves are real.” (Troy Hunt)
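To make the idea of email addresses "constructed from individuals' names" more concrete, here is a minimal Python sketch of how someone reviewing their own exposure might flag an address that merely follows a common name-based pattern. The pattern list and the function names (`candidate_local_parts`, `looks_name_derived`) are illustrative assumptions, not the method Troy Hunt or the threat actor used.

```python
import re


def candidate_local_parts(first: str, last: str) -> set[str]:
    """Return common name-based local parts (the text before '@').

    The patterns below are assumptions chosen for illustration only.
    """
    f = re.sub(r"[^a-z]", "", first.lower())
    l = re.sub(r"[^a-z]", "", last.lower())
    if not f or not l:
        return set()
    return {
        f"{f}.{l}",     # jane.doe
        f"{f}{l}",      # janedoe
        f"{f[0]}{l}",   # jdoe
        f"{f[0]}.{l}",  # j.doe
        f"{f}_{l}",     # jane_doe
        f"{l}.{f}",     # doe.jane
        f,              # jane
    }


def looks_name_derived(full_name: str, email: str) -> bool:
    """True if the email's local part matches one of the assumed patterns."""
    parts = full_name.split()
    if len(parts) < 2 or "@" not in email:
        return False
    local = email.rsplit("@", 1)[0].lower()
    return local in candidate_local_parts(parts[0], parts[-1])


if __name__ == "__main__":
    print(looks_name_derived("Jane Doe", "jane.doe@example.com"))
    print(looks_name_derived("Jane Doe", "jd1987@example.com"))
```

A match does not prove that an address was fabricated, since many genuine corporate addresses follow exactly these patterns. That overlap is consistent with Hunt's observation that in many cases the email addresses themselves are real.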
And to add another twist to this interesting story, the threat actor behind the leak has now released even more data allegedly sourced from LinkedIn.
In the latest leak, the user ‘USDoD’ is offering the “Full” LinkedIn database for free, which allegedly contains 35 million lines.
We’ll keep an eye out for any more developments on this latest LinkedIn data scrape leak and update the article with any new findings.
This article was last updated on November 7, 2023 with further analysis on the data sample and an update on the threat actor releasing more data, allegedly containing 35 million lines from LinkedIn.