Data from 700 million LinkedIn users has been put up for sale online, making this one of the largest LinkedIn data leaks to date. After analyzing the data and making contact with the seller, we have updated this article with more information, including how the data was obtained and the possible impact on LinkedIn users.
UPDATE: LinkedIn has confirmed via email to RestorePrivacy that the data was obtained from their servers, as well as from other sources. And contrary to some reports, LinkedIn is NOT denying that data was harvested from their servers. They point out, however, that some data was also obtained from other sources.
Many people trust LinkedIn with all sorts of private data, hoping and trusting that the information remains in safe hands. But is this trust warranted? So far in 2021, we have already seen two separate incidents where bad actors have exploited the professional networking platform to harvest vast amounts of user data.
The implications of this are far-ranging, from identity theft to phishing attacks, social engineering attacks, and more. Before we dive into the consequences of this leak, let’s first examine what happened.
What happened exactly?
On June 22nd, a user of a popular hacker forum advertised data from 700 million LinkedIn users for sale. The user of the forum posted a sample of the data that includes 1 million LinkedIn users. We examined the sample and found it to contain the following information:
- Email addresses
- Full names
- Phone numbers
- Physical addresses
- Geolocation records
- LinkedIn username and profile URL
- Personal and professional experience/background
- Genders
- Other social media accounts and usernames
The user claims that the complete database contains the personal information of 700 million LinkedIn users. Since LinkedIn has 756 million users, according to its website, this would mean that more than 92% of all LinkedIn users can be found in these records.
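The share quoted above can be checked with a quick calculation. This is only a sketch of the arithmetic, using the two figures from this article (the seller's claimed record count and the member count LinkedIn reports on its website); the variable names are ours.

```python
# Figures as stated in the article; neither has been independently verified here.
claimed_records = 700_000_000   # records the seller claims to hold
reported_members = 756_000_000  # member count according to LinkedIn's website

share = claimed_records / reported_members
print(f"{share:.1%}")  # prints 92.6%
```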
Below is a small section of the sample we examined to show you how much information one record can contain:
Based on our analysis and cross-checking data from the sample with other publicly available information, it appears all data is authentic and tied to real users. Additionally, the data does appear to be up to date, with samples from 2020 to 2021.
While we did not find login credentials or financial data in the samples we examined, there is still a treasure trove of information for bad actors to exploit for financial gain, as we’ll explain more below.
How was the data obtained?
We reached out directly to the user who put the data up for sale on the hacking forum. He claims the data was obtained by exploiting the LinkedIn API to harvest information that people upload to the site.
Below is one interaction we had with the threat actor on Telegram. You can see that he is asking $5,000 for the complete data set, and stating that the data was acquired through the LinkedIn API.
However, LinkedIn has emailed us an explanation, stating that not all of the data could have been acquired through the LinkedIn API. Instead, some of the data likely came from other sources.
LinkedIn has also issued a public statement, in which they note that their “initial investigation has found that this data was scraped from LinkedIn and other various websites.”
Everything remains up for sale at this time.
Official response from LinkedIn
We have also reached out to LinkedIn for comment on this latest data leak. They have confirmed that the data was scraped from their servers, as well as other sources, but are also claiming that “no private LinkedIn member data was exposed.” And note that the definition of “private data” is surely subjective.
Our teams have investigated a set of alleged LinkedIn data that has been posted for sale. We want to be clear that this is not a data breach and no private LinkedIn member data was exposed. Our initial investigation has found that this data was scraped from LinkedIn and other various websites and includes the same data reported earlier this year in our April 2021 scraping update.
– LinkedIn’s full statement can be found here.
It is important to note that LinkedIn is not denying that data was harvested from their servers. They are simply pointing out that:
- Some of the data was also obtained from “other various websites”.
- They do not consider your LinkedIn data that was exposed to be “private”.
So what is the definition of “private data” and what expectation of privacy do you have when you upload data to LinkedIn?
Possible impact of this latest LinkedIn data leak
While this latest LinkedIn leak did not contain any financial records or login credentials, there are still serious consequences. This is because it puts 700+ million people at risk of:
- identity theft
- phishing attempts
- social engineering attacks
- hacked accounts
Cybercriminals can use the information found in the leaked files with other data in order to create full detailed profiles of their potential victims. Additionally, bad actors can use the available data, particularly usernames, emails, and personal information, to gain access to other accounts.
Above all else, this information exposes LinkedIn users to a higher risk of exploitation by bad actors.
And once your private data is leaked, there’s no getting it back.
Should companies be financially liable when your data is exposed?
This leads us to an interesting question. Should companies be held liable when user data is exploited by bad actors?
In this specific case, it does not appear that LinkedIn servers were hacked or that there was a full “breach” in the traditional sense of the term. Instead, the data was harvested by threat actors through LinkedIn’s own API (application programming interface).
How much privacy should one really expect on a social networking site?
When others have your data, it puts you at risk
We’ve said it before and we’ll say it again: any business, individual, or entity that has control over your private data puts you at risk. Whether this risk is minimal or vast depends on the data, who is securing it, and the consequences of it being lost.
To minimize this risk, you need to limit the amount of data that is available to others.
This could include getting off of all social networks entirely, or limiting the information you share. Using products and services that don’t harvest your personal information for profit is also crucial. We have reviewed some of the best options with:
- Secure browsers that respect your privacy and don’t collect your data for advertising networks
- Secure and private email services that don’t sell access to your inbox or scan your emails and attachments
- Private search engines that respect your privacy
And of course, you should remain vigilant to all potential attacks while continuing to safeguard your personal information.
LinkedIn has banned me, claiming I had excessive page views when I barely use it. It seems that a lot of people suffer from the same issue. I wonder if this is related to the user data leak, since they claimed the leaked data was scraped through the API. Users like me become victims for no good reason.
I don’t quite understand. LinkedIn is saying that this was not a breach, but you are saying that they are confirming a breach. Here is the quote from the linked press release from your article (linked April release):
“We have investigated an alleged set of LinkedIn data that has been posted for sale and have determined that it is actually an aggregation of data from a number of websites and companies. It does include publicly viewable member profile data that appears to have been scraped from LinkedIn. This was not a LinkedIn data breach, and no private member account data from LinkedIn was included in what we’ve been able to review.”
Did you read the article?
We discussed LinkedIn’s statement and their stance on the issue.
Isn’t it quite suspicious that they ask only 5K for such a valuable dataset?!
Could it be that this is just a web scrape and aggregation from multiple sites, as LinkedIn suggests?
It’s just public info from their API, I can scrape this, no news here.
My accounts were hacked in GitHub and Microsoft using android cloned Apple devices without my permission or knowledge of use or setup and using my google hacked accounts stolen by android
Given that LinkedIn was hacked, what is it advising its users to do (e.g., change passwords, use 2FA, etc.)?
That being said, what steps are they taking aside from saying only certain servers were hacked?
So basically LinkedIn is saying that they have a lot of personal data (though they don’t call it that) available on their API.
They should be clearer about what exactly was obtained via the API, because this makes it look like there is a serious flaw on their end, either by having the data available or by not securing it properly.
And it’s not like they are a small tech company with no $ to invest in security.
One more reason to quit Big Tech. Glad for sites like yours that help us do it.
Is there a website link that a person can check if their information was hacked???
Hi Xenia, I recommend the website https://haveibeenpwned.com, which is a project from cybersecurity expert Troy Hunt.
But checking information there (or on similar websites) means sharing our data with them, which is then stored/logged.
This is certainly a valid concern, but HIBP never stores or logs data that is entered on the site. It is a project that is run by a reputable person in the cybersecurity community (Troy Hunt). See here:
https://haveibeenpwned.com/Privacy
I checked. Take a look at the Logging section:
These logs may include information entered into a form by the user.
(sorry I couldn’t post it as a response to your comment)
Yes that is interesting because it also says above that:
“Searching for an email address or phone number only ever retrieves the data from storage then returns it in the response, the searched data is never explicitly stored anywhere.”
I will look into this more.
Correct me if I’m wrong, but I believe the haveibeenpwned (HIBP) protocol for password searches is (more or less) to do the initial hash client side, then take the first 5 characters of that hash and use them to query the HIBP database, which returns the pwned hashes that match those first 5 characters. The client then checks that set of matching hashes for a full match. Thus the HIBP server never knows your full hash or whether it’s been pwned.
It’s outlined on the HIBP site under “When you search Pwned Passwords”:
“The password is hashed client-side with the SHA-1 algorithm then only the first 5 characters of the hash are sent to HIBP per the Cloudflare k-anonymity implementation.”
However I’m not sure how email domains are handled.
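The password lookup described in this comment can be sketched roughly as follows. This is a minimal illustration, not HIBP's own client code: it assumes the public Pwned Passwords range endpoint at `https://api.pwnedpasswords.com/range/` and its `SUFFIX:COUNT` response lines, and the function names (`split_hash`, `count_in_range`, `pwned_count`) are ours. It covers password searches only; it says nothing about how email searches are handled.

```python
import hashlib
import urllib.request
from typing import Tuple

# Assumed endpoint: the public Pwned Passwords range API.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Hash locally with SHA-1, then split into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, range_body: str) -> int:
    """Search a 'SUFFIX:COUNT' response body for our suffix; 0 means no match."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Send only the 5-char prefix over the network; match the suffix locally."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "k-anonymity-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range(suffix, response.read().decode("utf-8"))
```

Calling `pwned_count("password")` would perform one network request carrying only the prefix `5BAA6`. The comparison against the remaining 35 characters happens on the client, which is why the server never sees the full hash.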
It’s most likely an internal programmer who is selling the data. Since the companies are doing it, some employees also take advantage of it. Officially, of course, it was hacked or scraped. True for all big tech companies. They all know each other, and there is an entire black market for buying and selling behind closed doors. Don’t be surprised if you find your search queries somehow in media articles.
If you have an email newsletter or anything else containing a lot of your articles, please email me to show me how to get your newsletters sent to my email address. It is [redacted].
THANK YOU VERY MUCH.
Hi Cliff, we are working on this and will have a newsletter available later in the year. In the meantime, check back here for updates.
Is it allowed to know the website of the said leak and user?
This was leaked and sold on RaidForums.
Let’s not forget about all of the Sales Professionals, like myself, who rely on LinkedIn to make a living. We pay a monthly subscription fee for access to LinkedIn profiles. And there’s limits on how many we can contact per month. Based on what you’re saying, we’re paying a premium to receive only a fraction of information that was scraped.
Let’s say it is my fault for posting personal information about myself on a social media site. Again, unlike FB, Twitter and others, I’m paying LinkedIn, so I would expect it to have some sort of safeguards in place. As far as monthly subscriptions go, LinkedIn is not cheap. Otherwise, why wouldn’t we go and buy some of these 700 million profiles which are available online and have more information than what we’re provided with as paying customers?
I wouldn’t have even known about this if it wasn’t for your post!
Hey LinkedIn, how about a notification to let me know that the past 20 years of my work history will soon be used against me in a social engineering attack.
More needs to come of this.
@LinkedIn Customer
I feel so sorry for your circumstance;
And I am appreciative of your choice to share your experience on a privacy-respecting site.
Reading the press release linked above, I thought LinkedIn’s response inadequate.
LinkedIn describe this event as “not a breach,” and;
I believe they should rightly call this an “incident.”
Incident response protocol defines three levels; incident, event and breach.
An “incident” is defined by an actor violating site policies.
LinkedIn state their policies have been violated.
The law and framework surrounding privacy is a mess;
It is inadequate and ineffective.
LinkedIn’s security team would describe this occurrence as an “incident”;
Then the PR team, “not a breach;”
Stop Lying To Us!
What a well-researched and informative article!
The question raised in the article regarding users’ expectations of privacy when using social media has raised great debate in the chat. I believe that this is quite a conflicting question.
A question for Sven: When viewing the sample, were the data types revealed consistent for each user? If they were, this may demonstrate a lack of user control over what data they choose to make public.
I don’t have a LinkedIn account so I cannot speak from experience, but some of those data types, such as “location_geo”, “inferred_salary” and “inferred_years_experience”, don’t sound like something a user would willingly or knowingly choose to make available to be shared and harvested.
When an organization places in their privacy policy terms to the effect of “We can share your PII with our partners and affiliates,” what does this actually mean? Who are these partners? And by what methods do organizations share this?
This is a fantastic piece of investigative journalism.
I am not on LinkedIn and I don’t know what the law is; yet I am outraged.
Hi BobeX,
> “were the data types revealed consistent for each user?”
No, the data varied for each user. Some people gave LinkedIn more data, including other email handles, date of birth, addresses, other social media accounts, etc., while others did not. So there was some variability between the different records we examined.
> “We can share your PII with our partners and affiliates,” what does this actually mean?
I’m not a lawyer and have not studied LinkedIn’s privacy policy. But we find this phrase, or variations of this phrase, with many different applications and services. To me, this usually means “we will share your data with anyone we want” – and this can be all sorts of groups, businesses, and organizations. It really opens the door up to your data going places you may not want. We often see this in the mobile app market, where free apps are actually data collection tools in disguise.
If there is a crack in the barrel it is considered a leak, if someone fills their cup from the barrel’s tap, it is considered ok, so why when someone fills another barrel from the first barrel’s tap, is it considered a leak?
Whatever you put on social media, it’s out there, so don’t put things up that you are not prepared to “leak”.
People who use LinkedIn typically share their information WITHIN their professional network.
I did not upload my information on LinkedIn for it to be traded and sold on hacker forums, to be used by scammers around the world impersonating me or trying to hijack my accounts. Do you want scammers to have your mobile phone, work email, professional history and geolocation data? Hell no. I did not expect this lack of privacy, but I guess I should going forward. Scary really.
I bet the hacker is a paid Microsoft person, using “leaked” LinkedIn data as a new revenue stream from LinkedIn for Microsoft without getting their hands dirty by selling it to data brokers. Well, they probably are doing that too.
How much are you willing to wager ? :p
Unfortunately, in this job market LinkedIn is used by most potential hiring companies to check out my profile (experience, recommendations, etc.) and connections. That’s even after I provide my resume and fill out an online application.
“Should companies be financially liable when your data is exposed?”
Er, they are liable. The GDPR applies to everyone in the world, not just Europeans. If you have had your data exposed by LinkedIn, you have had your human right to privacy breached, and can claim compensation. Usually such claims are between €7000 and €15000 for minor, “no injury” claims like this one.
The applicability of GDPR is not universal. It is based on subject nationality and data location. So US citizens with DATA in US Datacenters ARE NOT PROTECTED under GDPR. This is because Article 3 of the GDPR, which defines the law’s territorial scope, states that it applies to companies in the EU/EEA and companies outside of the EU/EEA that serve (or track the data of) EU/EEA residents. So you HAVE TO BE AN EU/EEA resident.
No social media site is fully safe for us. This is my personal thought.
Question: How much privacy should one really expect on a social networking site?
Answer: Not much at all.
I’m retired. I don’t need Linkedin for anything. In fact I think it’s just a liability at this point in my life. And as I get older I’m sure more people will try to scam me. Time to pull the plug and delete everything under my account, not worth the risks.
Um…. I put my professional profile up there on LinkedIn on purpose… I didn’t expect any privacy at all… I don’t see how anyone would?
@Bernd
Have you seen Jim Browning’s YouTube channel?
Commonly fraudsters are only playing confidence tricks that date back to Ur or Rome or Greece.
Have a look, these people often aren’t sophisticated, they just pretend to be.
I guess for most people who need to use LinkedIn for their career, giving up all the personal info to the company may be worth the risks. But not for me. The same could be said of any social network I guess.